Create a Lexical Network from a Corpus
In this tutorial, we will create a lexical network from a corpus. We will use the utility 'corpus_to_lexical_network.pl', which takes a pre-indexed corpus as input and produces a graph in which each node is a word and an edge connects two words if they occur in the same sentence. Pairs of words that co-occur multiple times are weighted more heavily. To see this utility's usage details, use the command
First, we will copy the file 11sent.txt to our current working directory. Next, we will use a utility called 'lines_to_docs.pl' to create a collection of documents, one for each line of 11sent.txt. Finally, we will create a corpus from that collection and index it.
lines_to_docs.pl --input 11sent.txt --output 11sent_source
directory_to_corpus.pl --corpus 11Sent --base 11sent_produced --directory 11sent_source
index_corpus.pl --corpus 11Sent --base 11sent_produced
Next, we can use the utility directly to stem the corpus and create a lexical network, '11sent.graph':
corpus_to_lexical_network.pl -c 11Sent -b 11sent_produced -o 11sent.graph --stem --verbose
Our output file, '11sent.graph', now contains a lexical network of stemmed terms.
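To make the idea of a sentence co-occurrence network concrete, here is a minimal Python sketch of the general technique. It is not the implementation used by 'corpus_to_lexical_network.pl'. The function name build_lexical_network, the toy sentences, the lowercase whitespace tokenization, and the rule of adding 1 to an edge for every sentence in which both words appear are all assumptions made for illustration. The sketch does no stemming and says nothing about the format of '11sent.graph'.

```python
from collections import Counter
from itertools import combinations


def build_lexical_network(sentences):
    """Map each unordered word pair to the number of sentences containing both.

    Hypothetical helper for illustration; not part of the tutorial's toolkit.
    """
    weights = Counter()
    for sentence in sentences:
        # Assumption: lowercase whitespace tokenization, no stemming.
        # A set is used so a word repeated in one sentence counts once.
        words = sorted(set(sentence.lower().split()))
        # Every pair of distinct words in the sentence gains one unit of weight.
        for a, b in combinations(words, 2):
            weights[(a, b)] += 1
    return weights


# Toy input standing in for a corpus with one sentence per document.
sentences = [
    "the cat sat",
    "the cat ran",
    "a dog ran",
]

network = build_lexical_network(sentences)

# Print the edge list as: word_a word_b weight
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

In this toy input, 'cat' and 'the' appear together in two sentences, so their edge carries a higher weight than edges between words that share only one sentence. Words that never share a sentence, such as 'dog' and 'the', have no edge.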