Generate a random document from bigrams using Random Walk

From Clairlib

In this tutorial you'll learn how to use Clairlib to generate a random document from the bigrams of an existing document. You first extract the document's bigrams, then build a network out of them, and finally perform a random walk on that graph, printing each word as its node is visited. The transition probabilities on the edges depend on the frequency of the bigrams.

To do this, we will use the utility '' and the text file '11sent.txt'. For information on the usage of '', run --help

First, extract the bigrams from '11sent.txt' by running -r "11sent.txt" -f text -w 11sent.2grams -N 2 --segment --verbose

The '--segment' option here tells the script to segment the text into sentences and add the delimiter < s > at the end of each sentence.
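To make the effect of segmentation concrete, here is a minimal Python sketch of the idea. It is not Clairlib's implementation: the function names, the naive sentence-splitting rule, the lowercasing, and the delimiter spelling `<s>` are all assumptions made for illustration.

```python
import re
from collections import Counter

DELIM = "<s>"  # assumed spelling of the sentence delimiter


def segment(text):
    """Split text after '.', '!' or '?' followed by whitespace (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def bigram_counts(text):
    """Count adjacent token pairs, appending DELIM after every sentence."""
    tokens = []
    for sentence in segment(text):
        tokens.extend(re.findall(r"\w+", sentence.lower()))
        tokens.append(DELIM)
    # zip pairs each token with its successor, giving the bigrams
    return Counter(zip(tokens, tokens[1:]))


counts = bigram_counts("The cat sat. The cat ran.")
for (first, second), n in sorted(counts.items()):
    print(first, second, n)
```

Because the delimiter is treated as an ordinary token, bigrams such as (sat, `<s>`) and (`<s>`, the) record which words tend to end and begin sentences.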

Then run '' --input 11sent.2grams --output 11sent.synth --size 5 --delim < s >

This will generate a random document of at most 5 sentences.
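The random walk itself can be sketched in Python as follows. This is an illustration under stated assumptions, not the Clairlib utility's code: the names `build_graph` and `random_walk`, the toy bigram counts, the `<s>` delimiter spelling, starting the walk at the delimiter node, and the `max_steps` safety limit are all hypothetical. Each step picks a successor with probability proportional to the bigram count, and the walk stops once `size` sentences have been completed.

```python
import random
from collections import defaultdict

DELIM = "<s>"  # assumed spelling of the sentence delimiter


def build_graph(counts):
    """Turn {(first, second): count} into {first: {second: count}}."""
    graph = defaultdict(dict)
    for (first, second), n in counts.items():
        graph[first][second] = n
    return graph


def random_walk(graph, size, delim=DELIM, rng=random, max_steps=10000):
    """Walk the bigram graph and return at most `size` complete sentences."""
    sentences, current = [], []
    node = delim  # assumption: start at the delimiter so the walk opens a sentence
    for _ in range(max_steps):
        successors = graph.get(node)
        if not successors or len(sentences) >= size:
            break
        words = list(successors)
        weights = list(successors.values())
        # pick the next node with probability proportional to bigram frequency
        node = rng.choices(words, weights=weights, k=1)[0]
        if node == delim:
            if current:
                sentences.append(" ".join(current))
                current = []
        else:
            current.append(node)
    return sentences


# Toy bigram counts, invented for this example
toy_counts = {
    (DELIM, "the"): 2, ("the", "cat"): 2,
    ("cat", "sat"): 1, ("cat", "ran"): 1,
    ("sat", DELIM): 1, ("ran", DELIM): 1,
}
sentences = random_walk(build_graph(toy_counts), size=5, rng=random.Random(0))
print("\n".join(sentences))
```

A walk can end with fewer than `size` sentences if it reaches a node with no outgoing edges or hits the step limit, which is why the size acts as a maximum rather than an exact count.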
