Installation
This guide explains how to install both Clairlib distributions, Clairlib-Core and Clairlib-Ext. To install Clairlib-Core, follow the instructions in the section immediately below. To install Clairlib-Ext, first follow the instructions for installing Clairlib-Core, then follow those for Clairlib-Ext itself. Clairlib-Ext requires an installed version of Clairlib-Core in order to run; it is not a stand-alone distribution.
Install and Test Clairlib-Core
System Requirements
Clairlib-Core requires Perl 5.8.2 or greater. Before you proceed, confirm that the version of Perl you are running is at least this recent by entering
perl -v
at the shell prompt.
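For scripted setups, the same check can be automated. The following is a minimal sketch and not part of Clairlib: the helper name version_ge is hypothetical, and it assumes purely numeric dotted version strings such as 5.8.2.

```shell
# version_ge A B: succeed if dotted version A is >= B (hypothetical helper).
# Assumes purely numeric components, e.g. 5.8.2 or 5.10.1.
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading component of each version; missing ones count as 0.
        x=${a%%.*}; y=${b%%.*}
        x=${x:-0}; y=${y:-0}
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
        # Drop the component just compared.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Perl stores its own version in $^V; "%vd" formats it as e.g. 5.8.2.
if command -v perl >/dev/null 2>&1; then
    have=$(perl -e 'printf "%vd", $^V')
    if version_ge "$have" 5.8.2; then
        echo "Perl $have is recent enough"
    else
        echo "Perl $have is older than 5.8.2"
    fi
fi
```

Comparing component by component avoids the trap of a plain string comparison, under which 5.10 would sort before 5.8.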
Install MEAD
Download MEAD 3.11 or later from http://www.summarization.com/mead/. The installation package is in .tar.gz ("tarball") format. To install MEAD in, say, the directory $HOME/mead, ensure that the installation package is located in $HOME, and enter the following at the shell prompt:
$ cd $HOME
$ gunzip MEAD-3.11.tar.gz
$ tar -xf MEAD-3.11.tar
$ cd mead
$ perl Install.PL
Next, you will need to compile tf2gen.cpp to produce an executable required by MEAD. Enter the following:
$ cd $HOME/mead/bin/feature-scripts
$ g++ tf2gen.cpp -o tf2gen
Install CPAN Libraries
Clairlib-Core depends on access to the following Perl modules:
- BerkeleyDB
- Carp
- File::Type
- Getopt::Long
- Graph::Directed
- Hash::Flatten
- HTML::LinkExtractor
- HTML::Parse
- IO::File
- IO::Handle
- IO::Pipe
- Lingua::Stem
- Math::MatrixReal
- Math::Random
- MLDBM
- PDL
- POSIX
- Scalar::Util
- Statistics::ChisqIndep
- Storable
- Test::More
- Text::Sentence
- XML::Parser
- XML::Simple
There are multiple approaches to locating and installing these modules; using the automated CPAN installer, which is bundled with Perl, is perhaps the quickest and easiest. To do so, enter the following at the shell prompt:
$ perl -MCPAN -e shell
If you have not yet configured the CPAN installer, then you'll have to do so this one time. If you do not know the answer to any of the questions asked, simply hit enter, and the default options will likely suit your environment adequately. However, when asked about parameter options for the perl Makefile.PL command, users without root permissions or who otherwise wish to install Perl libraries within their personal $HOME directory structure should enter the suggested path when prompted:
Your choice: [] PREFIX=~/perl
This will cause the CPAN installer to install all modules it downloads and tests into $HOME/perl, which means that all subdirectories of this directory that contain Perl modules will need to be added to Perl's @INC variable so that they will be found when needed (see the section Test and Install the Clairlib-Core Modules below for further explanation).
As a side note, if you ever need to reconfigure the installer, type at the shell prompt:
$ perl -MCPAN -e shell
cpan> o conf init
After configuration (if needed), return to the CPAN shell prompt,
cpan>
and type the following to upgrade the CPAN installer to the latest version:
cpan> install Bundle::CPAN
cpan> q
If asked whether to prepend the installation of required libraries to the queue, hit return (or enter yes). After quitting the shell, type the following to install or upgrade Module::Build and make it the preferred installer:
$ perl -MCPAN -e shell
cpan> install Module::Build
cpan> o conf prefer_installer MB
cpan> o conf commit
cpan> q
Finally, install each of the following dependencies (if you are at all unsure whether the latest versions of each have already been installed) by entering the following at the shell prompt:
$ perl -MCPAN -e shell
cpan> install BerkeleyDB
cpan> install Carp
cpan> install File::Type
cpan> install Getopt::Long
cpan> install Graph::Directed
cpan> install Hash::Flatten
cpan> install HTML::LinkExtractor
cpan> install HTML::Parse
cpan> install IO::File
cpan> install IO::Handle
cpan> install IO::Pipe
cpan> install Lingua::Stem
cpan> install Math::MatrixReal
cpan> install Math::Random
cpan> install MLDBM
cpan> install PDL
cpan> install POSIX
cpan> install Scalar::Util
cpan> install Statistics::ChisqIndep
cpan> install Storable
cpan> install Test::More
cpan> install Text::Sentence
cpan> install XML::Parser
cpan> install XML::Simple
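To see which of the required modules are still missing, before or after running the installer, a small loop over perl's -M switch can help. This is a sketch only: missing_modules is a hypothetical helper name, and it tests only whether a module loads, not whether it is the latest version.

```shell
# missing_modules MODULE...: print each module that "perl -M<module> -e 1"
# cannot load (hypothetical helper; it only tests loadability, not versions).
missing_modules() {
    for m in "$@"; do
        perl -M"$m" -e 1 >/dev/null 2>&1 || echo "missing: $m"
    done
}

# Example call with a few of the dependencies listed in this guide.
if command -v perl >/dev/null 2>&1; then
    missing_modules Carp Getopt::Long XML::Simple BerkeleyDB
fi
```

Any module reported as missing can then be installed from the CPAN shell as described above.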
Configure Clairlib-Core
Download the Clairlib-Core distribution into, say, the directory $HOME. Then to install Clairlib-Core in $HOME/clairlib-core, enter the following at the shell prompt:
$ cd $HOME
$ gunzip clairlib-core.tar.gz
$ tar -xf clairlib-core.tar
$ cd clairlib-core/lib/Clair
Then edit Config.pm, which is located in clairlib-core/lib/Clair. You will see the following message at the top of the file:
#################################
# For Clairlib-core users:
# 1. Edit the value assigned to $CLAIRLIB_HOME and give it the value of the path to your installation.
# 2. Edit the value assigned to $MEAD_HOME and give it the value that points to your installation of MEAD.
# 3. Edit the value assigned to $EMAIL and give it an appropriate value.
Follow those instructions. In the case of our example, we would assign
$CLAIRLIB_HOME=$HOME/clairlib-core
and
$MEAD_HOME=$HOME/mead
where $HOME must be replaced by an explicit path string such as /home/username. Also, note that the following MEAD variables reflect the structure of a standard MEAD installation and should typically be kept as assigned:
$CIDR_HOME = "$MEAD_HOME/bin/addons/cidr";
$PRMAIN = "$MEAD_HOME/bin/feature-scripts/lexrank/prmain";
$DBM_HOME = "$MEAD_HOME/etc";
Test and Install the Clairlib-Core Modules
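Before relying on these defaults, it may be worth confirming that your MEAD installation actually has the layout they assume. The sketch below checks the three paths named in the assignments above; check_mead_layout is a hypothetical helper, not something shipped with Clairlib or MEAD.

```shell
# check_mead_layout MEAD_HOME: report which of the standard MEAD paths
# referenced by Config.pm are missing (hypothetical helper).
check_mead_layout() {
    mead_home=$1
    status=0
    for rel in bin/addons/cidr bin/feature-scripts/lexrank/prmain etc; do
        if [ ! -e "$mead_home/$rel" ]; then
            echo "missing: $mead_home/$rel"
            status=1
        fi
    done
    return $status
}

# Example call; $HOME/mead is the location used in this guide's example.
check_mead_layout "$HOME/mead" || echo "MEAD layout differs from the defaults"
```

If a path is reported as missing, adjust the corresponding variable in Config.pm rather than keeping the default.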
Before testing and installing the Clairlib-Core modules, you'll need to modify Perl's @INC variable to ensure that it includes 1) paths to all Clairlib dependencies (the required libraries installed above), and 2) the path to Clairlib's own modules (in the case of our example, $HOME/clairlib-core/lib). The simplest way to do this is by modifying the contents of your PERL5LIB environment variable from the shell prompt:
$ export PERL5LIB=$HOME/clairlib-core/lib:$HOME/perl/lib (*)
Here $HOME/clairlib-core/lib is the path to Clairlib's own modules, while $HOME/perl/lib is the path to Clairlib's required modules, installed above (assuming that path is their location). However, doing this requires that you export PERL5LIB each time you invoke the shell environment, so a better way to modify @INC is the following:
$ cd $HOME
Edit .profile, or the appropriate configuration file for your shell environment, creating it if it does not already exist. Add the line marked (*) to the file or, if the file already sets PERL5LIB, prepend the necessary paths to the existing value, separated by colons as in (*). Save the file and enter:
$ . .profile
This way you will not have to export PERL5LIB each time you invoke the shell. Enter
$ echo $PERL5LIB
to confirm that your modifications have been applied.
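If the profile is sourced more than once, a plain export line keeps working, but appending to an existing PERL5LIB can accumulate duplicate entries. The following is one possible sketch of an idempotent alternative for the profile; prepend_path is a hypothetical helper, and the two directories are the example locations used in this guide.

```shell
# prepend_path LIST DIR: print LIST with DIR prepended, unless DIR is
# already one of its colon-separated entries (hypothetical helper).
prepend_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;    # already present, keep as is
        "::")     printf '%s\n' "$2" ;;    # list was empty
        *)        printf '%s\n' "$2:$1" ;;
    esac
}

PERL5LIB=$(prepend_path "${PERL5LIB:-}" "$HOME/perl/lib")
PERL5LIB=$(prepend_path "$PERL5LIB" "$HOME/clairlib-core/lib")
export PERL5LIB
```

Wrapping the list in colons before matching lets a single pattern recognize an entry at the start, middle, or end of the list.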
Now you may test your Clairlib-Core installation. Enter its directory, in the case of our example:
$ cd $HOME/clairlib-core
Then enter the following commands to test the Clairlib-Core modules:
$ perl Makefile.PL
$ make
$ make test
If you would like to have the Clairlib-Core modules installed for you, and you have the necessary (root) permissions to do so, you may install them by entering the following command:
$ make install
If, on the other hand, you have only local permissions, but you have a personal perl library located at, say, $HOME/perl (as described earlier), then you can install Clairlib-Core there by entering the commands:
$ perl Makefile.PL PREFIX=~/perl
$ make install
Using the Clairlib-Core Modules
To use the Clairlib-Core modules in a Perl script, you must add a path to the modules to Perl's @INC variable. You may use either 1) $CLAIRLIB_HOME/lib, where $CLAIRLIB_HOME is the path defined in Config.pm, or 2) $INSTALL_PATH, where $INSTALL_PATH is a path to the location of the installed Clairlib-Core modules (if you installed them in the section immediately above). Either of these paths can be added to @INC either by appending the path to the PERL5LIB environment variable or by putting a use lib PATH statement at the top of the script. See the beginning of the section Test and Install the Clairlib-Core Modules above for a detailed explanation of how to modify the PERL5LIB variable.
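A quick way to confirm that a given path makes the modules visible is perl's -I switch, which prepends a directory to @INC for a single invocation, much as use lib does inside a script. The sketch below assumes the example location $HOME/clairlib-core/lib; can_load is a hypothetical helper name.

```shell
# can_load LIBDIR MODULE: succeed if perl can load MODULE once LIBDIR is
# added to @INC via perl's -I switch (hypothetical helper).
can_load() {
    perl -I"$1" -M"$2" -e 1 >/dev/null 2>&1
}

# Example: $HOME/clairlib-core/lib is the example location used in this guide.
if can_load "$HOME/clairlib-core/lib" Clair::Document; then
    echo "Clair::Document loads"
else
    echo "Clair::Document not found or failed to load"
fi
```

A failure here can mean either a wrong path or a missing dependency, so it is worth rerunning the command without the output redirection to read perl's error message.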
Install and Test Clairlib-Ext
The Clairlib-Ext distribution contains optional extensions to Clairlib-Core as well as functionality that depends on other software. The sections below explain how to configure different functionalities of Clairlib-Ext. As each is independent of the rest, you may configure as many or as few as you wish. The section Configure Clairlib-Ext below provides instructions for installing and testing the Clairlib-Ext modules themselves.
Sentence Segmentation using Adwait Ratnaparkhi's MxTerminator
To use MxTerminator for sentence segmentation, download the installation package from ftp://ftp.cis.upenn.edu/pub/adwait/jmx/jmx.tar.gz. Putting the tarball in, say, $HOME/jmx, enter the following to unpack:
$ cd $HOME/jmx
$ gunzip jmx.tar.gz
$ tar -xf jmx.tar
Uncomment and modify the following lines in clairlib-core/lib/Clair/Config.pm. Point $JMX_HOME to the top directory of your MxTerminator installation, and point $JMX_MODEL_PATH to the location of your MxTerminator trained data, as for example
# $JMX_HOME = "$HOME/jmx";
# $SENTENCE_SEGMENTER_TYPE = "MxTerminator";
# $JMX_MODEL_PATH = "$HOME/jmx/eos.project";
where $HOME must be replaced by a literal path string such as /home/username. Note that the /bin directory of a Java installation must be located in your search path, or MxTerminator will not work.
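Whether Java is reachable through the search path can be checked from the shell before configuring anything else. This is a minimal sketch; on_path is a hypothetical helper built on the standard command -v.

```shell
# on_path CMD: succeed if CMD resolves to something runnable in the current
# search path (hypothetical helper built on the POSIX "command -v").
on_path() {
    command -v "$1" >/dev/null 2>&1
}

if on_path java; then
    echo "java found at: $(command -v java)"
else
    echo "java is not in PATH; add the Java bin directory to PATH first"
fi
```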
Parsing using a Charniak Parser
To use a Charniak parser with Clairlib, uncomment the following variables in clairlib-core/lib/Clair/Config.pm and point them to your parser installation, as for example:
# Default parser and data paths for the Charniak parser for use in Parse.pm
# (Note that CHARNIAK_DATA should end with a slash and that the other
# paths include the executable)
# $CHARNIAK_PATH = "/data0/tools/charniak/PARSE/parseIt";
# $CHARNIAK_DATA_PATH = "/data0/tools/charniak/DATA/EN/";
# Default path to Chunklink
# $CHUNKLINK_PATH = "/data2/tools/chunklink/chunklink.pl";
Using the Weka Machine Learning Toolkit
To use the Weka Machine Learning Toolkit, a Java machine learning library, with Clairlib, download Weka from http://www.cs.waikato.ac.nz/ml/weka/ and uncomment the following line in clairlib-core/lib/Clair/Config.pm. Point the variable to the location of Weka's .jar file, as for example:
# $WEKA_JAR_PATH = "$HOME/weka/weka-3-4-11/weka.jar";
where $HOME must be replaced by an explicit path string such as /home/username. Note that the /bin directory of a Java installation must be located in your search path, or Weka will not work.
Using the Automatic Link Extractor (ALE)
If you have MySQL installed and wish to use ALE, uncomment the following variables in clairlib-core/lib/Clair/Config.pm. Point $ALE_PORT at your MySQL socket, and provide the root password to your MySQL installation:
# $ALE_PORT = "/tmp/mysql.sock";
# $ALE_DB_USER = "root";
# $ALE_DB_PASS = "";
Using Google WebSearch
To use the Google WebSearch module, first install the CPAN module Net::Google (refer to the section Install CPAN Libraries in the Clairlib-Core installation instructions for further explanation). Then uncomment the following line in clairlib-core/lib/Clair/Config.pm and provide a Google SOAP API key. Unfortunately, Google no longer issues SOAP API keys, having moved to an AJAX Search API. If you already have a SOAP API key, you can still use it, and WebSearch will still work.
# $GOOGLE_DEFAULT_KEY = "";
Configure Clairlib-Ext
Download the Clairlib-Ext distribution into, for example, the directory $HOME. Then to install Clairlib-Ext in $HOME/clairlib-ext, enter the following at the shell prompt:
$ cd $HOME
$ gunzip clairlib-ext.tar.gz
$ tar -xf clairlib-ext.tar
$ cd clairlib-ext
To test the Clairlib-Ext modules, you must first have installed the Clairlib-Core modules. Confirm that you have, then enter the following:
$ perl Makefile.PL
$ make
$ make test
If you would like to have the Clairlib-Ext modules installed, and you have the necessary (root) permissions to do so, you may install them by entering:
$ make install
If, on the other hand, you have only local permissions, but you have a personal perl library located at, say, $HOME/perl (as described earlier), then you can install Clairlib-Ext there by entering the commands:
$ perl Makefile.PL PREFIX=~/perl
$ make install
Using the Clairlib-Ext Modules
To use the Clairlib-Ext modules in a script, you must add a path to the modules to Perl's @INC variable. You may use either 1) $CLAIRLIB_EXT_HOME/lib, where $CLAIRLIB_EXT_HOME is the path to the top directory of your Clairlib-Ext installation, or 2) $INSTALL_PATH, where $INSTALL_PATH is a path to the location of the installed Clairlib-Ext modules (if you installed them in the section immediately above). Either of these paths can be added to @INC either by appending the path to the PERL5LIB environment variable or by putting a use lib PATH statement at the top of the script. See the beginning of the section Test and Install the Clairlib-Core Modules in the Clairlib-Core installation instructions for a detailed explanation of how to modify the PERL5LIB variable.
Support and Documentation
After installing Clairlib, you may access documentation for a module using the perldoc command, as for example:
$ perldoc Clair::Document
Each Clairlib distribution also includes a PDF tutorial. Online API documentation is available at http://belobog.si.umich.edu/clair/clairlib/pdoc.

