Create a Lexical Network from a Corpus

From Clairlib

In this tutorial, we will create a lexical network from a corpus. We will use the utility '', which takes a pre-indexed corpus as input and produces a graph in which each node is a word and an edge connects two words if they occur in the same sentence. Pairs that co-occur multiple times are weighted more heavily. To see the utility's usage details, run it with the --help option.
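
To make the idea of a sentence-level co-occurrence graph concrete, here is a minimal Python sketch. It is not Clairlib code: the function name `build_lexical_network`, the whitespace tokenization, and the choice to add one unit of weight per shared sentence are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def build_lexical_network(sentences):
    """Map each unordered word pair to the number of sentences it shares.

    Illustrative only: the weighting scheme (one count per shared
    sentence) is an assumption, not taken from Clairlib.
    """
    edges = Counter()
    for sentence in sentences:
        # Lowercase and split on whitespace; a real tokenizer would do more.
        words = sorted(set(sentence.lower().split()))
        # Every unordered pair of distinct words in the sentence is an edge.
        for pair in combinations(words, 2):
            edges[pair] += 1
    return edges


sentences = [
    "the cat sat",
    "the cat ran",
    "a dog ran",
]
network = build_lexical_network(sentences)
for (first, second), weight in sorted(network.items()):
    print(first, second, weight)
```

In this toy input, "cat" and "the" share two sentences, so their edge carries more weight than any other pair.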

First, we copy the file 11sent.txt to our current working directory. Next, we create a collection of documents, each containing one line of 11sent.txt, using a utility called ''. Finally, we create a corpus from that collection and index it:

    --input 11sent.txt --output 11sent_source
    --corpus 11Sent --base 11sent_produced --directory 11sent_source
    --corpus 11Sent --base 11sent_produced
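
The first of these steps, turning each line of a text file into its own document, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the Clairlib utility itself: the function name `split_lines_into_docs`, the `doc_001.txt` naming scheme, and the decision to skip blank lines are hypothetical.

```python
import tempfile
from pathlib import Path


def split_lines_into_docs(input_path, output_dir):
    """Write each non-empty line of input_path to its own file in output_dir.

    The file naming scheme (doc_001.txt, doc_002.txt, ...) is hypothetical.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    text = Path(input_path).read_text(encoding="utf-8")
    non_empty = [line.strip() for line in text.splitlines() if line.strip()]
    written = []
    for number, line in enumerate(non_empty, start=1):
        doc_path = out / f"doc_{number:03d}.txt"
        doc_path.write_text(line + "\n", encoding="utf-8")
        written.append(doc_path)
    return written


if __name__ == "__main__":
    # Demonstrate on a throwaway directory so no real files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "sample.txt"
        source.write_text("First sentence.\nSecond sentence.\n", encoding="utf-8")
        for doc in split_lines_into_docs(source, Path(tmp) / "sample_source"):
            print(doc.name)
```

Each resulting file then plays the role of one document in the collection from which the corpus is built.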

Next, we can use the utility directly to stem the corpus and create a lexical network, '11sent.graph':

    -c 11Sent -b 11sent_produced -o 11sent.graph --stem --verbose
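
To see why stemming matters for the resulting graph, consider the following Python sketch. The stemmer behind the --stem option is not described here, so `toy_stem` below is a deliberately crude suffix stripper invented for illustration, and `cooccurrence_edges` is likewise a hypothetical helper, not part of Clairlib.

```python
from collections import Counter
from itertools import combinations


def toy_stem(word):
    """Strip one common English suffix; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def cooccurrence_edges(sentences, stem=False):
    """Count sentence-level co-occurrences, optionally after stemming."""
    edges = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        if stem:
            words = [toy_stem(word) for word in words]
        for pair in combinations(sorted(set(words)), 2):
            edges[pair] += 1
    return edges


sentences = ["dogs walked", "dog walks"]
unstemmed = cooccurrence_edges(sentences)
stemmed = cooccurrence_edges(sentences, stem=True)
print("unstemmed:", dict(unstemmed))
print("stemmed:", dict(stemmed))
```

Without stemming, the two sentences yield two unrelated edges; with stemming, the inflected forms collapse onto the same nodes and the single remaining edge accumulates both co-occurrences.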

Now our output file, '11sent.graph', contains a lexical network of stemmed terms.
