Create a Lexical Network from a Corpus

From Clairlib

In this tutorial, we will create a lexical network from a corpus. We will use the utility 'corpus_to_lexical_network.pl', which takes a pre-indexed corpus as input and produces a graph in which each node is a word and an edge connects two words if they occur in the same sentence. Word pairs that co-occur multiple times are weighted more heavily. To see this utility's usage details, use the command:

corpus_to_lexical_network.pl --help

First, we will copy the file 11sent.txt to our current working directory. Then, using a utility called 'lines_to_docs.pl', we will create a collection of documents, each containing one line from 11sent.txt. Finally, we will create a corpus from that collection and index it:

lines_to_docs.pl --input 11sent.txt --output 11sent_source
directory_to_corpus.pl --corpus 11Sent --base 11sent_produced --directory 11sent_source
index_corpus.pl --corpus 11Sent --base 11sent_produced
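
To illustrate what "one document per line" means, here is a minimal sketch in plain POSIX shell. It is not the implementation of 'lines_to_docs.pl': the function name split_lines and the doc_N.txt naming scheme are assumptions made for this example, and the files the real utility writes may be named and laid out differently.

```shell
# Illustrative sketch only: write each non-empty line of a text file
# to its own file. split_lines and the doc_N.txt names are hypothetical
# and are not taken from Clairlib.
split_lines() {
    input=$1
    outdir=$2
    mkdir -p "$outdir"
    n=0
    # Read line by line; the "|| [ -n ... ]" part keeps a final line
    # that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip empty lines so they do not become empty documents.
        [ -z "$line" ] && continue
        n=$((n + 1))
        printf '%s\n' "$line" > "$outdir/doc_$n.txt"
    done < "$input"
    # Report how many documents were written.
    echo "$n"
}
```

Calling split_lines with an input file and an output directory prints the number of documents it wrote, which gives a quick check that the collection has as many documents as the input has non-empty lines.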

Next, we can use the utility directly to stem the corpus and create a lexical network, '11sent.graph':

corpus_to_lexical_network.pl -c 11Sent -b 11sent_produced -o 11sent.graph --stem --verbose

Now our output file, '11sent.graph', contains a lexical network of stemmed terms.
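
To make the idea of a sentence co-occurrence network concrete, the following sketch counts word pairs that share a line, using only standard shell and awk. It treats each input line as one sentence. It does not stem, and it is not meant to reproduce the file format or the weighting scheme of 'corpus_to_lexical_network.pl'. The function name cooccurrence_edges and the "word1 word2 count" output layout are assumptions made for this example.

```shell
# Illustrative sketch only: print "word1 word2 count" for every pair of
# distinct words that appear together on a line (one sentence per line).
# cooccurrence_edges is a hypothetical helper, not a Clairlib utility.
cooccurrence_edges() {
    awk '
    {
        # Collect the distinct lowercase words of this sentence,
        # stripping punctuation.
        split("", seen)
        n = 0
        for (i = 1; i <= NF; i++) {
            w = tolower($i)
            gsub(/[^a-z0-9]/, "", w)
            if (w != "" && !(w in seen)) { seen[w] = 1; words[++n] = w }
        }
        # Count each unordered pair once per sentence, storing the pair
        # in alphabetical order so "a b" and "b a" share one edge.
        for (i = 1; i < n; i++)
            for (j = i + 1; j <= n; j++) {
                a = words[i]; b = words[j]
                if (a > b) { t = a; a = b; b = t }
                count[a " " b]++
            }
    }
    END { for (p in count) print p, count[p] }
    ' "$1" | sort
}
```

For a file containing the two lines "The cat sat." and "The cat ran.", the pair "cat the" is counted twice and every other pair once, which mirrors the idea that repeated co-occurrences carry more weight.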
