Development

Developer Mailing List

To ask questions about Clairlib, report bugs, suggest new or revised features, or find out how to contribute, email clairlib-dev.

Changes

====1.05 July 2008====

  • Fixed formatting bugs in CorpusDownload.pm
  • Added get_predecessor_matrix() function in Network.pm
  • Added get_shortest_path() function in Network.pm
  • Added erase_corpus.pl script
  • Added erase_isolated_nodes.pl script
  • Added --ignore-isolated-nodes in convert_network.pl
  • Added several options to print_network_stats.pl, including --self-loop
  • Completed the usage description of print_network_stats.pl: added a note about --force
  • Added sentence_to_docs.pl and lines_to_docs.pl under the util folder

====1.04B June 2008====

  • Added -no-duplicated-edges in convert_network.pl
  • Added largest connected component in cos_to_stats.pl
  • Added full average shortest path in print_network.pl
  • Fixed divide-by-zero error in Network.pm and Betweeness.pm

====1.04A April 2008====

  • Added Clair::Network::GirvanNewman algorithm to do hierarchical clustering
  • Added Clair::Network::KernighanLin algorithm to do graph partition

====1.04 February 2008====

  • Added Clair::Network::AdamicAdar to compute the Adamic/Adar value for a given network corpus
  • Added Clair::ChisqIndependent to compute the p-value and degrees of freedom for the chi-square test

====1.03 August 2007====

  • Added functionality to perform community finding within weighted, undirected networks
  • Added util/chunk_document.pl to break documents into smaller files by word count
  • Added option to retain punctuation for idf and tf queries
  • Added option to print out full lists of idf and tf values for a corpus
  • LexRank moved from Clair::Network to Clair::Network::Centrality::LexRank
  • LexRank usage now follows the same pattern as the other centrality modules

====1.02 July 2007====

  • Distribution reorganized in standard format
  • Improved and expanded installation documentation (INSTALL)
  • Improved POD (inline) documentation
  • Additional examples
  • Updated PDF documentation

====1.01 May 2007====

  • Added Phrase-based Retrieval and Fuzzy OR Queries
  • Extended Clairlib-ext with interfaces for the Cluster class and the Document class to the Weka machine learning toolkit
  • Added LSI functionality
  • Extended parsing of strings / files into Documents
  • Added perceptron learning and classification for documents

====1.0 RC1 April 2007====

  • Moved all Clair modules beneath the Clair::* namespace, updated documentation
  • Improved Network Analysis, added Clustering Coefficients code
  • Added Network Generation and Statistics modules

====0.955 March 2007====

  • Made it possible to distribute clairlib in two distributions, one containing core code and another containing code that may be dependent on other resources
  • Cleaned up unit tests

====0.953 February 2007====

  • Fixed bugs in Clair::Cluster, Clair::Document involving stemming
  • Cleaned up t/ and test/ directories
  • Created util/ directory
  • Added scripts to util/ directory to:
    • Run a Google query and save the returned URLs to a file
    • Download files from a URL and build a corpus
    • Segment a document into sentences and build a corpus of the sentences
    • Take all documents in a directory and create a corpus
    • Index the corpus (compute TF*IDF, etc.)
    • Compute cosine similarity measures between all documents in a corpus
    • Generate networks corresponding to various cosine thresholds
    • Print network statistics about a network file
    • Generate plots of degree distribution and cosine transitions
  • New methods in Clair::Network:

   print_network_info
   get_network_info_as_string
   get_cumulative_distribution
   cumulative_power_law_exponent
   find_components
   newman_clustering_coefficient
   linear_regression


