
Clairlib is a suite of open-source Perl modules developed and maintained by the Computational Linguistics And Information Retrieval (CLAIR) group [1] at the University of Michigan. Clairlib is intended to simplify a number of generic tasks in natural language processing (NLP), information retrieval (IR), and network analysis (NA). The latest version of Clairlib is 1.06, released in March 2009. It includes about 130 modules implementing a wide range of functionality.



Clairlib is distributed in two forms:

  • Clairlib-core provides essential functionality with minimal dependence on external software.
  • Clairlib-ext provides extended functionality that may be of interest to a smaller audience.

Much can be done using Clairlib on its own. Its capabilities include tokenization, summarization, document clustering, document indexing, Web graph analysis, network generation, power law distribution analysis, network analysis, random walks on graphs, TF-IDF, perceptron learning and classification, phrase-based retrieval, and fuzzy OR queries.

Contributors

  • Dragomir Radev, University of Michigan [2]
  • Mark Hodges, University of Michigan
  • Anthony Fader, University of Michigan
  • Mark Joseph, University of Michigan
  • Joshua Gerrish, University of Michigan
  • Mark Schaller, University of Michigan
  • Jonathan dePeri, Columbia University
  • Bryan Gibson, University of Michigan
  • Chen Huang, University of Michigan
  • Amjad Abu Jbara, University of Michigan

License

Clairlib is a free library; you can redistribute it and/or modify it under the same terms as Perl itself.

Download and Documentation

Clairlib modules are available for download from [3]. Installation instructions and module documentation are also available in both PDF and HTML formats. Clairlib comes with many code examples and a set of tutorials on using its modules in various applications.

Acknowledgments

This work has been supported in part by National Institutes of Health grants R01 LM008106, "Representing and Acquiring Knowledge of Genome Regulation", and U54 DA021519, "National center for integrative bioinformatics". It has also been supported by National Science Foundation grants IDM 0329043, "Probabilistic and link-based Methods for Exploiting Very Large Textual Repositories"; DHB 0527513, "The Dynamics of Political Representation and Political Rhetoric"; and 0534323, "Collaborative Research: BlogoCenter - Infrastructure for Collecting, Mining and Accessing Blogs".