Development History

1.08 August 2009

  • Updated Clair::SyntheticCollection to generate synthetic documents based on (1 to 4)-grams.
  • Modified N-gram extraction to optionally use CMU-LM.
  • Updated make_synth_collection.pl to fully utilize Clair::SyntheticCollection.
  • Fixed some Tokenizer issues.
  • Added summarize_document.pl to the utilities.
  • Added summarize_collection to the utilities.
  • Added learn.pl to the utilities.
  • Added classify.pl to the utilities.
  • Added extract_features.pl
  • Added bigrams_to_rand_doc.pl to the utilities.
  • Added make_synth_collection_Menczer.pl to the utilities.
  • Added Clair::RandomWalk for random walk on graphs.
  • Added Clair::Harmonic for computing harmonic functions based on the relaxation and Monte Carlo methods.
  • Added random_walk.pl to the utilities.
  • Added harmonic.pl to the utilities.
  • Added directory_to_URL_network.pl to the utilities.
  • Fixed a bug in the crawling code.
  • Added new tutorials.
  • Added new sections to the documentation.
  • Added Clair::Bio::GIN for gene interaction extraction.
  • Added an interface to the Stanford parser in Clair::Utils::Parse.
  • Added tag_genes.pl to the utilities.
  • Added extract_interactions.pl to the utilities.

1.07 June 2009

  • Added Clair::Network::Spectral for spectral partitioning using the Fiedler vector.
  • Made Clairlib independent of MEAD (MEAD is no longer required for Clairlib).
  • Added Naive Bayes learning and classification.
  • Added tests for feature extraction, learning, classification.
  • Fixed a bug in Clair::Cluster::create_lexical_network().
  • Added sampling options to Clair::Cluster.
  • Added "No IDF" option and sampling capabilities to corpus_to_cos.pl utility.
  • Fixed documentation typos.
  • Added new tutorials to the documentation.
  • Fixed a bug in Clair::Utils::CorpusDownload.
  • Added 'manual weights' option to make_synth_collection util.
  • Fixed a bug in extract_ngrams.

1.06 March 2009

  • Added Clair::Network::FordFulkerson
  • Added change_perl_path.pl to the utilities.
  • Added new scripts to interface with the ACL Anthology Network.
  • Fixed a bug in split_into_sentences() of Clair::Document

1.05 July 2008

  • Fixed formatting bugs in CorpusDownload.pm
  • Added get_predecessor_matrix() function in Network.pm
  • Added get_shortest_path() function in Network.pm
  • Added erase_corpus.pl script
  • Added erase_isolated_nodes.pl script
  • Added --ignore-isolated-nodes option to convert_network.pl
  • Added several options to print_network_stats.pl: a. --self-loop,
  • Completed the usage description of print_network_stats.pl: added a note about --force.
  • Added sentence_to_docs.pl and lines_to_docs.pl under the util folder

1.04B June 2008

  • Added -no-duplicated-edges in convert_network.pl
  • Added largest connected component in cos_to_stats.pl
  • Added full average shortest path in print_network.pl
  • Fixed divide-by-zero error in Network.pm, Betweeness.pm

1.04A April 2008

  • Added Clair::Network::GirvanNewman algorithm to do hierarchical clustering
  • Added Clair::Network::KernighanLin algorithm to do graph partitioning

1.04 February 2008

  • Added Clair::Network::AdamicAdar to compute the Adamic/Adar value for a given network corpus
  • Added Clair::ChisqIndependent to compute the p-value and degrees of freedom for the chi-square test

1.03 August 2007

  • Added functionality to perform community finding within weighted, undirected networks
  • Added util/chunk_document.pl to break documents into smaller files by word count
  • Added option to retain punctuation for idf and tf queries
  • Added option to print out full lists of idf and tf values for a corpus
  • LexRank moved from Clair::Network to Clair::Network::Centrality::LexRank
  • LexRank use now follows the same use pattern as the other centrality modules

1.02 July 2007

  • Distribution reorganized in standard format
  • Improved and expanded installation documentation (INSTALL)
  • Improved POD (inline) documentation
  • Additional examples
  • Updated PDF documentation

1.01 May 2007

  • Added Phrase-based Retrieval and Fuzzy OR Queries
  • Extended Clairlib-ext with interfaces for the Cluster class and the Document class to the Weka machine learning toolkit
  • Added LSI functionality
  • Extended parsing of strings / files into Documents
  • Added perceptron learning and classification for documents

1.0 RC1 April 2007

  • Moved all Clair modules beneath the Clair::* namespace, updated documentation
  • Improved Network Analysis, added Clustering Coefficients code
  • Added Network Generation and Statistics modules

0.955 March 2007

  • Made it possible to distribute clairlib in two distributions, one containing core code and another containing code that may be dependent on other resources
  • Cleaned up unit tests

0.953 February 2007

  • Fixed bugs in Clair::Cluster, Clair::Document involving stemming
  • Cleaned up t/ and test/ directories
  • Created util/ directory
  • Added scripts to util/ directory to:
    • Run a Google query and save the returned URLs to a file
    • Download files from a URL and build a corpus
    • Segment a document into sentences and build a corpus of the sentences
    • Take all documents in a directory and create a corpus
    • Index the corpus (compute TF*IDF, etc.)
    • Compute cosine similarity measures between all documents in a corpus
    • Generate networks corresponding to various cosine thresholds
    • Print network statistics about a network file
    • Generate plots of degree distribution and cosine transitions
  • New methods in Clair::Network:
    • print_network_info
    • get_network_info_as_string
    • get_cumulative_distribution
    • cumulative_power_law_exponent
    • find_components
    • newman_clustering_coefficient
    • linear_regression
