Installation

This guide explains how to install both Clairlib distributions, Clairlib-Core and Clairlib-Ext. To install Clairlib-Core, follow the instructions in the section immediately below. To install Clairlib-Ext, first follow the instructions for installing Clairlib-Core, then follow those for Clairlib-Ext itself. Clairlib-Ext requires an installed version of Clairlib-Core in order to run; it is not a stand-alone distribution.

Install and Test Clairlib-Core

System Requirements

Clairlib-Core requires Perl 5.8.2 or greater. Before you proceed, confirm that the version of Perl you are running is at least this recent by entering

perl -v

at the shell prompt.
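
The same check can be scripted, which is handy when setting up several machines. This is a minimal sketch: it relies on the fact that perl refuses to run a program containing `use 5.008002;` (Perl's numeric form of 5.8.2) on an older interpreter. The variable name `perl_status` and the message wording are illustrative only.

```shell
# Ask perl itself whether it is at least 5.8.2 (written 5.008002 in
# Perl's numeric version format). The message text is illustrative.
if perl -e 'use 5.008002;' 2>/dev/null; then
    perl_status="ok: perl is 5.8.2 or newer"
else
    perl_status="problem: perl is missing or older than 5.8.2"
fi
echo "$perl_status"
```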


Install CPAN Libraries

Clairlib-Core depends on access to the following Perl modules:

  • BerkeleyDB
  • Carp
  • DB_File
  • File::Type
  • Getopt::Long
  • Graph::Directed
  • Hash::Flatten
  • HTML::LinkExtractor
  • HTML::Parse
  • HTML::Strip
  • IO::File
  • IO::Handle
  • IO::Pipe
  • Lingua::Stem
  • Lingua::EN::Sentence
  • Math::MatrixReal
  • Math::Random
  • MLDBM
  • PDL
  • POSIX
  • Scalar::Util
  • Statistics::ChisqIndep
  • Storable
  • Test::More
  • Text::Sentence
  • XML::Parser
  • XML::Simple

There are multiple approaches to locating and installing these modules; using the automated CPAN installer, which is bundled with Perl, is perhaps the quickest and easiest. To do so, enter the following at the shell prompt:

$ perl -MCPAN -e shell

If you have not yet configured the CPAN installer, then you'll have to do so this one time. If you do not know the answer to any of the questions asked, simply hit enter, and the default options will likely suit your environment adequately. However, when asked about parameter options for the perl Makefile.PL command, users without root permissions or who otherwise wish to install Perl libraries within their personal $HOME directory structure should enter the suggested path when prompted:

Your choice:  [] PREFIX=~/perl

This causes the CPAN installer to install every module it downloads and tests into $HOME/perl. The subdirectories of that directory that contain Perl modules must then be added to Perl's @INC variable so that they can be found when needed (see "Test and Install the Clairlib-Core Modules" below for further explanation).

As a side note, if you ever need to reconfigure the installer, type at the shell prompt:

$ perl -MCPAN -e shell
cpan>o conf init

After configuration (if needed), return to the CPAN shell prompt,

cpan>

and type the following to upgrade the CPAN installer to the latest version:

cpan>install Bundle::CPAN
cpan>q

If asked whether to prepend the installation of required libraries to the queue, hit return (or enter yes). After quitting the shell, type the following to install or upgrade Module::Build and make it the preferred installer:

$ perl -MCPAN -e shell
cpan>install Module::Build
cpan>o conf prefer_installer MB
cpan>o conf commit
cpan>q

Finally, install each of the dependencies listed above (do this if you are at all unsure whether the latest version of each is already installed) by entering the following:

$ perl -MCPAN -e shell
cpan>install BerkeleyDB
cpan>install Carp
cpan>install DB_File
cpan>install File::Type
cpan>install Getopt::Long
cpan>install Graph::Directed
cpan>install Hash::Flatten
cpan>install HTML::LinkExtractor
cpan>install HTML::Parse
cpan>install HTML::Strip
cpan>install IO::File
cpan>install IO::Handle
cpan>install IO::Pipe
cpan>install Lingua::Stem
cpan>install Lingua::EN::Sentence
cpan>install Math::MatrixReal
cpan>install Math::Random
cpan>install MLDBM
cpan>install PDL
cpan>install POSIX
cpan>install Scalar::Util
cpan>install Statistics::ChisqIndep
cpan>install Storable
cpan>install Test::More
cpan>install Text::Sentence
cpan>install XML::Parser
cpan>install XML::Simple
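
To see which dependencies still need installing, one option is to ask perl to load each module in turn. The sketch below assumes nothing beyond the `perl` command; the function name `missing_modules` is hypothetical. A module that loads cleanly produces no output, and any module that fails to load is printed by name.

```shell
# Print the name of every module in the argument list that perl cannot load.
missing_modules() {
    for module in "$@"; do
        # -M loads the module; -e 1 runs a trivial program if loading worked.
        perl -M"$module" -e 1 >/dev/null 2>&1 || echo "$module"
    done
}

# Example: check a few of the dependencies listed above.
missing_modules BerkeleyDB DB_File Graph::Directed XML::Simple
```

An empty result for the full list suggests that every dependency is visible to perl with your current @INC settings.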

Configure Clairlib-Core

Download the Clairlib-Core distribution into, say, the directory $HOME. Then to install Clairlib-Core in $HOME/clairlib-core, enter the following at the shell prompt:

$ cd $HOME
$ gunzip clairlib-core.tar.gz
$ tar -xf clairlib-core.tar
$ cd clairlib-core/lib/Clair
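
As an alternative to the separate gunzip and tar steps, the archive can be decompressed and unpacked in one pipeline, which leaves the original .tar.gz file in place. This is a sketch only; the function name `unpack_tgz` is hypothetical.

```shell
# Unpack a gzip-compressed tarball into a directory without
# decompressing the archive on disk first.
unpack_tgz() {
    # $1 = path to the .tar.gz file, $2 = directory to extract into
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    gunzip -c "$archive" | (cd "$dest" && tar -xf -)
}

# Usage (paths follow the example in this guide):
#   unpack_tgz "$HOME/clairlib-core.tar.gz" "$HOME"
```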

Then edit Config.pm, which is located in clairlib-core/lib/Clair. You will see the following message at the top of the file:

#################################
# For Clairlib-core users:
# 1. Edit the value assigned to $CLAIRLIB_HOME and give it the value of the path to your installation.
# 2. Edit the value assigned to $MEAD_HOME and give it the value that points to your installation of MEAD.
# 3. Edit the value assigned to $EMAIL and give it an appropriate value.

Follow those instructions. In the case of our example, we would assign

$CLAIRLIB_HOME=$HOME/clairlib-core

and

$MEAD_HOME=$HOME/mead

where $HOME must be replaced by an explicit path string such as /home/username. Also, note that the following MEAD variables reflect the structure of a standard MEAD installation and should typically be kept as assigned:

$CIDR_HOME = "$MEAD_HOME/bin/addons/cidr";
$PRMAIN    = "$MEAD_HOME/bin/feature-scripts/lexrank/prmain";
$DBM_HOME  = "$MEAD_HOME/etc";
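
Before relying on these defaults, it may help to confirm that your MEAD directory really has the standard layout. The sketch below checks only the three paths named in the variables above; the function name `check_mead_layout` is hypothetical, and the directory you pass in is whatever you assigned to $MEAD_HOME.

```shell
# Check that a MEAD installation has the layout Config.pm expects.
# Prints each missing path and returns non-zero if any is missing.
check_mead_layout() {
    # $1 = MEAD home directory, e.g. /home/username/mead
    mead_home=$1
    rc=0
    for rel in bin/addons/cidr bin/feature-scripts/lexrank/prmain etc; do
        if [ ! -e "$mead_home/$rel" ]; then
            echo "missing: $mead_home/$rel"
            rc=1
        fi
    done
    return $rc
}

# Usage:
#   check_mead_layout /home/username/mead
```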

Test and Install the Clairlib-Core Modules

Before testing and installing the Clairlib-Core modules, you'll need to modify Perl's @INC variable so that it includes 1) the paths to all Clairlib dependencies (the required libraries installed above), and 2) the path to Clairlib's own modules (in our example, $HOME/clairlib-core/lib). The simplest way to do this is to set the PERL5LIB environment variable from the shell prompt:

$ export PERL5LIB=$HOME/clairlib-core/lib:$HOME/perl/lib     (*)

Here $HOME/clairlib-core/lib is the path to Clairlib's own modules, while $HOME/perl/lib is the path to Clairlib's required modules, installed above (assuming that is their location). However, an export made at the prompt lasts only for the current shell session, so you would have to repeat it each time you start a new shell. A better way to modify @INC is the following:

$ cd $HOME

Edit .profile or the appropriate configuration file for your shell environment, or create it if it does not already exist. Add line (*) to the file or, if the file already sets PERL5LIB, prepend the necessary paths to the existing value, separated by colons as in (*). Save the file and enter:

$ . .profile

This way you will not have to export PERL5LIB each time you invoke the shell. Enter

$ echo $PERL5LIB

to confirm that your modifications have been applied.
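
If you script this setup, it is worth making the edit to .profile repeatable so that running it twice does not add the export line twice. A minimal sketch, assuming a POSIX shell and grep; the function name `append_once` is hypothetical.

```shell
# Append a line to a file only if that exact line is not already there,
# so re-running the setup does not duplicate the export.
append_once() {
    # $1 = file to edit, $2 = line to add
    file=$1
    line=$2
    touch "$file" || return 1
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $HOME unexpanded inside the file):
#   append_once "$HOME/.profile" 'export PERL5LIB=$HOME/clairlib-core/lib:$HOME/perl/lib'
```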

Now you may test your Clairlib-Core installation. Enter its directory, in the case of our example:

$ cd $HOME/clairlib-core

Then enter the following commands to test the Clairlib-Core modules:

$ perl Makefile.PL
$ make
$ make test

If you would like to have the Clairlib-Core modules installed for you, and you have the necessary (root) permissions to do so, you may install them by entering the following command:

$ make install

If, on the other hand, you have only local permissions, but you have a personal perl library located at, say, $HOME/perl (as described earlier), then you can install Clairlib-Core there by entering the commands:

$ perl Makefile.PL PREFIX=~/perl
$ make install
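
The configure, build, and test steps can be chained so that a failure at any step stops the sequence, and make install is run only after the tests pass. The following is a sketch under that assumption; the function name `build_and_test` is hypothetical, and the optional second argument is passed through as the PREFIX value described above.

```shell
# Configure, build, and test a distribution; stop at the first failure.
build_and_test() {
    # $1 = distribution directory, $2 = optional install prefix
    (
        cd "$1" || exit 1
        if [ -n "$2" ]; then
            perl Makefile.PL PREFIX="$2" || exit 1
        else
            perl Makefile.PL || exit 1
        fi
        make && make test
    )
}

# Usage:
#   build_and_test "$HOME/clairlib-core" "$HOME/perl" \
#       && (cd "$HOME/clairlib-core" && make install)
```

The subshell keeps the cd from changing the directory of the calling shell.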

Using the Clairlib-Core Modules

To use the Clairlib-Core modules in a Perl script, you must add a path to the modules to Perl's @INC variable. You may use either 1) $CLAIRLIB_HOME/lib, where $CLAIRLIB_HOME is the path defined in Config.pm, or 2) $INSTALL_PATH, where $INSTALL_PATH is the location of the installed Clairlib-Core modules (if you installed them in the preceding section). Either path can be added to @INC by appending it to the PERL5LIB environment variable or by putting a use lib PATH statement at the top of the script. See the beginning of the preceding section, "Test and Install the Clairlib-Core Modules", for a detailed explanation of how to modify the PERL5LIB variable.
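
When both a source tree and an installed copy exist, it can be unclear which one a script actually picks up. The sketch below uses perl's -I option, which adds a directory to @INC for a single invocation, and prints the file from which a module was loaded. The function name `module_path` is hypothetical; Clair::Document is used as the example module.

```shell
# Print the file a module is loaded from when an extra directory is
# placed on @INC; fail quietly if the module cannot be found.
module_path() {
    # $1 = extra library directory, $2 = module name such as Clair::Document
    perl -I"$1" -e '
        my $file = shift;
        $file =~ s{::}{/}g;      # Clair::Document -> Clair/Document
        $file .= ".pm";
        require $file;
        print "$INC{$file}\n";
    ' "$2" 2>/dev/null
}

# Usage:
#   module_path "$HOME/clairlib-core/lib" Clair::Document
```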

Install and Test Clairlib-Ext

The Clairlib-Ext distribution contains optional extensions to Clairlib-Core as well as functionality that depends on other software. The sections below explain how to configure the different functionalities of Clairlib-Ext. As each is independent of the rest, you may configure as many or as few as you wish. The "Configure Clairlib-Ext" section below provides instructions for installing and testing the Clairlib-Ext modules themselves.

Sentence Segmentation using Adwait Ratnaparkhi's MxTerminator

To use MxTerminator for sentence segmentation, download the installation package from ftp://ftp.cis.upenn.edu/pub/adwait/jmx/jmx.tar.gz. Putting the tarball in, say, $HOME/jmx, enter the following to unpack:

$ cd $HOME/jmx
$ gunzip jmx.tar.gz
$ tar -xf jmx.tar

Uncomment and modify the following lines in clairlib-core/lib/Clair/Config.pm. Point $JMX_HOME to the top directory of your MxTerminator installation, and point $JMX_MODEL_PATH to the location of your MxTerminator trained data, as for example

# $JMX_HOME                = "$HOME/jmx";
# $SENTENCE_SEGMENTER_TYPE = "MxTerminator";
# $JMX_MODEL_PATH          = "$HOME/jmx/eos.project";

where $HOME must be replaced by a literal path string such as /home/username. Note that the /bin directory of a Java installation must be located in your search path, or MxTerminator will not work.
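
A quick way to confirm the Java requirement is to ask the shell whether it can find the java command. This is a minimal sketch; the function name `have_command` is hypothetical.

```shell
# Succeed if the named command can be found on the search path.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command java; then
    echo "java found at: $(command -v java)"
else
    echo "java is not on PATH; add the bin directory of your Java installation"
fi
```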

Parsing using a Charniak Parser

To use a Charniak parser with Clairlib, uncomment the following variables in clairlib-core/lib/Clair/Config.pm and point them to your parser installation, as for example:

# Default parser and data paths for the Charniak parser for use in Parse.pm
# (Note that CHARNIAK_DATA should end with a slash and that the other
# paths include the executable)
# $CHARNIAK_PATH      = "/data0/tools/charniak/PARSE/parseIt";
# $CHARNIAK_DATA_PATH = "/data0/tools/charniak/DATA/EN/";

# Default path to Chunklink
# $CHUNKLINK_PATH = "/data2/tools/chunklink/chunklink.pl";
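
The two conventions noted in the comments above (the parser path names the executable itself, and the data path ends with a slash) are easy to get wrong, so a small check before editing Config.pm may save time. This is a sketch only; the function name `check_charniak` is hypothetical, and the paths you pass in are your own.

```shell
# Validate the two Charniak settings before copying them into Config.pm.
check_charniak() {
    # $1 = path to the parser executable, $2 = path to the data directory
    parser=$1
    data=$2
    rc=0
    if [ ! -x "$parser" ] || [ -d "$parser" ]; then
        echo "parser path is not an executable file: $parser"
        rc=1
    fi
    case $data in
        */) ;;
        *) echo "data path should end with a slash: $data"; rc=1 ;;
    esac
    if [ ! -d "$data" ]; then
        echo "data directory does not exist: $data"
        rc=1
    fi
    return $rc
}

# Usage, with the example paths shown above:
#   check_charniak /data0/tools/charniak/PARSE/parseIt /data0/tools/charniak/DATA/EN/
```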

Using the Weka Machine Learning Toolkit

To use the Weka Machine Learning Toolkit, a Java machine learning library, with Clairlib, download Weka from http://www.cs.waikato.ac.nz/ml/weka/ and uncomment the following line in clairlib-core/lib/Clair/Config.pm. Point the variable to the location of Weka's .jar file, as for example:

# $WEKA_JAR_PATH = "$HOME/weka/weka-3-4-11/weka.jar";

where $HOME must be replaced by an explicit path string such as /home/username. Note that the /bin directory of a Java installation must be located in your search path, or Weka will not work.

Using the Automatic Link Extractor (ALE)

If you have MySQL installed and wish to use ALE, uncomment the following variables. Point $ALE_PORT at your MySQL socket, and provide the root password to your MySQL installation:

# $ALE_PORT = "/tmp/mysql.sock";
# $ALE_DB_USER = "root";
# $ALE_DB_PASS = "";

Using Google WebSearch

To use the Google WebSearch module, first install the CPAN module Net::Google (refer to "Install CPAN Libraries" in the Clairlib-Core installation instructions for further explanation). Then uncomment the following line and provide a Google SOAP API key. Unfortunately, Google no longer gives out SOAP API keys but has moved to an AJAX Search API. If you already have a SOAP API key, you can still use it, and WebSearch will still work.

# $GOOGLE_DEFAULT_KEY = "";

Using CMU-LM toolkit

The CMU-Cambridge Statistical Language Modeling toolkit is a suite of UNIX software tools to facilitate the construction and testing of statistical language models. CMU-LM is used by Clairlib for n-gram extraction. It can be downloaded from [1]. After installing it, add the CMU-LM path to $PATH (or modify ~/.profile):

export PATH=/path/to/CMU-CAM-LM/bin:$PATH
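
If the export line ends up in a file that is sourced more than once, PATH accumulates duplicate entries. One way to avoid that is to add the directory only when it is not already listed. A minimal sketch for a POSIX shell; the function name `path_prepend` is hypothetical.

```shell
# Put a directory at the front of PATH unless it is already listed.
path_prepend() {
    # $1 = directory to add
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Usage, with the placeholder path from this guide:
#   path_prepend /path/to/CMU-CAM-LM/bin
```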

Using GENIA Tagger

The GENIA tagger analyzes English sentences and outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. It is used in Clair::Bio::GIN. To use it, download it from [2], then uncomment the following line in Clair::Config and point it to the GENIA tagger home directory.

# $GENIATAGGER_PATH = "/path/to/geniatagger";

Using the Stanford Parser

To use the Stanford parser in Clairlib, download it from [3] and install it as instructed in its documentation, then uncomment the following line and point it to the parser home directory.

# $STANFORD_PARSER_PATH = "/path/to/stanford/parser";

Configure Clairlib-Ext

Download the Clairlib-Ext distribution into, for example, the directory $HOME. Then to install Clairlib-Ext in $HOME/clairlib-ext, enter the following at the shell prompt:

$ cd $HOME
$ gunzip clairlib-ext.tar.gz
$ tar -xf clairlib-ext.tar
$ cd clairlib-ext

To test the Clairlib-Ext modules, you must first have installed the Clairlib-Core modules. Confirm that you have, then enter the following:

$ perl Makefile.PL
$ make
$ make test

If you would like to have the Clairlib-Ext modules installed, and you have the necessary (root) permissions to do so, you may install them by entering:

$ make install

If, on the other hand, you have only local permissions, but you have a personal perl library located at, say, $HOME/perl (as described earlier), then you can install Clairlib-Ext there by entering the commands:

$ perl Makefile.PL PREFIX=~/perl
$ make install

Using the Clairlib-Ext Modules

To use the Clairlib-Ext modules in a script, you must add a path to the modules to Perl's @INC variable. You may use either 1) $CLAIRLIB_EXT_HOME/lib, where $CLAIRLIB_EXT_HOME is the path to the top directory of your Clairlib-Ext installation, or 2) $INSTALL_PATH, where $INSTALL_PATH is the location of the installed Clairlib-Ext modules (if you installed them in the preceding section). Either path can be added to @INC by appending it to the PERL5LIB environment variable or by putting a use lib PATH statement at the top of the script. See the beginning of "Test and Install the Clairlib-Core Modules" in the Clairlib-Core installation instructions for a detailed explanation of how to modify the PERL5LIB variable.

Support and Documentation

After installing Clairlib, you may access documentation for a module using the perldoc command, as for example:

$ perldoc Clair::Document

Each Clairlib distribution also includes a PDF tutorial. Online API documentation is available at http://clairlib.org/pdoc.
