Development
1.08 August 2009
- Updated Clair::SyntheticCollection to generate synthetic documents based on (1 to 4)-grams.
- Modified N-gram extraction to optionally use CMU-LM.
- Updated make_synth_collection.pl to fully utilize Clair::SyntheticCollection.
- Fixed some Tokenizer issues.
- Added summarize_document.pl to the utilities.
- Added summarize_collection to the utilities.
- Added learn.pl to the utilities.
- Added classify.pl to the utilities.
- Added extract_features.pl
- Added bigrams_to_rand_doc.pl to the utilities.
- Added make_synth_collection_Menczer.pl to the utilities.
- Added Clair::RandomWalk for random walk on graphs.
- Added Clair::Harmonic for computing harmonic functions based on the relaxation and Monte Carlo methods.
- Added random_walk.pl to the utilities.
- Added harmonic.pl to the utilities.
- Added directory_to_URL_network.pl to the utilities.
- Fixed a bug in the crawling code.
- Added new tutorials.
- Added new sections to the documentation.
- Added Clair::Bio::GIN for gene interaction extraction.
- Added an interface to the Stanford parser in Clair::Utils::Parse.
- Added tag_genes.pl to the utilities.
- Added extract_interactions.pl to the utilities.
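
For readers unfamiliar with the relaxation method mentioned above for harmonic functions: boundary nodes keep fixed values, and every other node is repeatedly set to the average of its neighbours until the values stop changing. The following is a minimal Python sketch of that idea on a toy path graph. It is not the interface of Clair::Harmonic; the function name, parameters, and data are hypothetical.

```python
def harmonic_relaxation(adj, boundary, iterations=1000, tol=1e-12):
    """Approximate a harmonic function on an undirected graph.

    adj maps each node to a list of its neighbours.
    boundary maps the fixed (boundary) nodes to their values.
    """
    # Interior nodes start at 0.0; boundary nodes start at their fixed value.
    values = {node: boundary.get(node, 0.0) for node in adj}
    for _ in range(iterations):
        max_change = 0.0
        for node in adj:
            if node in boundary:
                continue  # boundary values never change
            # Harmonic condition: value equals the mean of the neighbours.
            new_value = sum(values[n] for n in adj[node]) / len(adj[node])
            max_change = max(max_change, abs(new_value - values[node]))
            values[node] = new_value
        if max_change < tol:
            break  # converged
    return values


# Toy path graph 0 - 1 - 2 - 3 - 4 (hypothetical data), ends fixed at 0 and 1.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
values = harmonic_relaxation(path, {0: 0.0, 4: 1.0})
```

On a path with its two ends fixed, the harmonic function is linear between them, so the interior values should approach 0.25, 0.5, and 0.75.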
1.07 June 2009
- Added Clair::Network::Spectral for spectral partitioning using Fiedler Vector.
- Made Clairlib independent of MEAD (MEAD is no longer required for Clairlib).
- Added Naive Bayes learning and classification.
- Added tests for feature extraction, learning, classification.
- Fixed a bug in Clair::Cluster::create_lexical_network().
- Added sampling options to Clair::Cluster.
- Added "No IDF" option and sampling capabilities to corpus_to_cos.pl utility.
- Fixed documentation typos.
- Added new tutorials to the documentation.
- Fixed bug in Clair::Utils::CorpusDownload.
- Added 'manual weights' option to make_synth_collection util.
- Fixed bug in extract_ngrams.
1.06 March 2009
- Added Clair::Network::FordFulkerson
- Added change_perl_path.pl to the utilities.
- Added new scripts to interface with the ACL Anthology Network.
- Fixed a bug in split_into_sentences() of Clair::Document
1.05 July 2008
- Fixed formatting bugs in CorpusDownload.pm
- Added get_predecessor_matrix() function in Network.pm
- Added get_shortest_path() function in Network.pm
- Added erase_corpus.pl script
- Added erase_isolated_nodes.pl script
- Added --ignore-isolated-nodes in convert_network.pl
- Added several options to print_network_stats.pl: a. --self-loop,
- Completed the description of print_network_stats.pl: added a note about --force to the usage message.
- Added sentence_to_docs.pl and lines_to_docs.pl under the util folder
1.04B June 2008
- Added -no-duplicated-edges in convert_network.pl
- Added largest connected component in cos_to_stats.pl
- Added full average shortest path in print_network.pl
- Fixed divide-by-zero error in Network.pm, Betweeness.pm
1.04A April 2008
- Added Clair::Network::GirvanNewman algorithm to do hierarchical clustering
- Added Clair::Network::KernighanLin algorithm to do graph partition
1.04 February 2008
- Added Clair::Network::AdamicAdar to compute the Adamic/Adar value for a given network corpus
- Added Clair::ChisqIndependent to compute the p-value and degrees of freedom for the chi-square test
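
As background for the Adamic/Adar entry above: the Adamic/Adar value of two nodes sums 1/log(degree) over their shared neighbours, so a rare shared neighbour counts for more than a hub. The following is a minimal Python sketch on a toy graph. It does not reproduce the interface of Clair::Network::AdamicAdar; the function name and data are hypothetical.

```python
import math


def adamic_adar(adj, u, v):
    """Adamic/Adar score of distinct nodes u and v in an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    common = adj[u] & adj[v]
    # Each shared neighbour z contributes 1 / log(degree(z)).
    # A neighbour shared by two distinct nodes has degree >= 2, so log > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in common)


# Toy undirected graph (hypothetical data).
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "e"},
    "d": {"a", "b"},
    "e": {"c"},
}
score = adamic_adar(adj, "a", "b")
```

Here "a" and "b" share the neighbours "c" (degree 3) and "d" (degree 2), so the score is 1/ln 3 + 1/ln 2.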
1.03 August 2007
- Added functionality to perform community finding within weighted, undirected networks
- Added util/chunk_document.pl to break documents into smaller files by word number
- Added option to retain punctuation for idf and tf queries
- Added option to print out full lists of idf and tf values for a corpus
- LexRank moved from Clair::Network to Clair::Network::Centrality::LexRank
- LexRank use now follows the same use pattern as the other centrality modules
1.02 July 2007
- Distribution reorganized in standard format
- Improved and expanded installation documentation (INSTALL)
- Improved POD (inline) documentation
- Additional examples
- Updated PDF documentation
1.01 May 2007
- Added Phrase-based Retrieval and Fuzzy OR Queries
- Extended Clairlib-ext with interfaces for the Cluster class and the Document class to the Weka machine learning toolkit
- Added LSI functionality
- Extended parsing of strings / files into Documents
- Added perceptron learning and classification for documents
1.0 RC1 April 2007
- Moved all Clair modules beneath the Clair::* namespace, updated documentation
- Improved Network Analysis, added Clustering Coefficients code
- Added Network Generation and Statistics modules
0.955 March 2007
- Made it possible to distribute clairlib in two distributions, one containing core code and another containing code that may be dependent on other resources
- Cleaned up unit tests
0.953 February 2007
- Fixed bugs in Clair::Cluster, Clair::Document involving stemming
- Cleaned up t/ and test/ directories
- Created util/ directory
- Added scripts to util/ directory to:
  - Run a Google query and save the returned URLs to a file
  - Download files from a URL and build a corpus
  - Segment a document into sentences and build a corpus of the sentences
  - Take all documents in a directory and create a corpus
  - Index the corpus (compute TF*IDF, etc.)
  - Compute cosine similarity measures between all documents in a corpus
  - Generate networks corresponding to various cosine thresholds
  - Print network statistics about a network file
  - Generate plots of degree distribution and cosine transitions
- New methods in Clair::Network:
  - print_network_info
  - get_network_info_as_string
  - get_cumulative_distribution
  - cumulative_power_law_exponent
  - find_components
  - newman_clustering_coefficient
  - linear_regression