Generate a random document from bigrams using Random Walk

In this tutorial you'll learn how to use Clairlib to generate a random document. You first extract the bigrams of a source document, then build a network from those bigrams, and finally perform a random walk on that graph, printing each word as its node is visited. The transition probability on each edge depends on the frequency of the corresponding bigram.

To do this, we will use the utility 'bigrams_to_random_doc.pl' and the text file '11sent.txt'. For information on the usage of 'bigrams_to_random_doc.pl', run

bigrams_to_random_doc.pl --help

First, extract the bigrams from '11sent.txt' by running

extract_ngrams.pl -r "11sent.txt" -f text -w 11sent.2grams -N 2 --segment --verbose

The '--segment' option here tells the script to segment the text into sentences and add the delimiter < s > at the end of each sentence.
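
To make the effect of segmentation concrete, here is a minimal Python sketch of counting bigrams with a sentence delimiter appended after each sentence. It is not Clairlib's implementation: the function names, the naive sentence splitter, and the lowercase word tokenizer are all assumptions made for illustration, and the output format of 'extract_ngrams.pl' may differ.

```python
import re
from collections import Counter

# Delimiter string as it is written in this tutorial (assumed to be a single token).
DELIM = "< s >"


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def count_bigrams(text, delim=DELIM):
    """Count adjacent token pairs, with the delimiter appended to each sentence."""
    tokens = []
    for sentence in split_sentences(text):
        # Hypothetical tokenizer: lowercase alphanumeric runs only.
        tokens.extend(re.findall(r"\w+", sentence.lower()))
        tokens.append(delim)
    # Pair every token with its successor and count the pairs.
    return Counter(zip(tokens, tokens[1:]))


if __name__ == "__main__":
    for (first, second), count in count_bigrams("The cat sat. The cat ran.").items():
        print(first, second, count)
```

Because the delimiter is an ordinary token, pairs such as (last word, delimiter) and (delimiter, first word) are counted too, which is what lets a later walk end one sentence and begin another.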

Then run 'bigrams_to_random_doc.pl'

bigrams_to_random_doc.pl --input 11sent.2grams --output 11sent.synth --size 5 --delim "< s >"

This generates a random document of at most 5 sentences and writes it to '11sent.synth'.
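
The idea behind this step can be sketched in Python as a frequency-weighted random walk over a bigram graph. This is an illustrative sketch only, not the code of the Clairlib script: the names `build_graph` and `random_walk`, the choice to start the walk at the delimiter node, and the `max_steps` safety limit are all assumptions.

```python
import random
from collections import defaultdict

# Delimiter string as it is written in this tutorial (assumed to be a single token).
DELIM = "< s >"


def build_graph(bigram_counts):
    """Map each word to its successors, weighted by bigram frequency."""
    graph = defaultdict(dict)
    for (first, second), count in bigram_counts.items():
        graph[first][second] = count
    return dict(graph)


def random_walk(graph, size, delim=DELIM, rng=None, max_steps=10_000):
    """Walk the graph and return at most `size` sentences as strings."""
    if rng is None:
        rng = random.Random()
    sentences, current = [], []
    node = delim  # assumption: start at the delimiter so the first word opens a sentence
    for _ in range(max_steps):
        successors = graph.get(node)
        if not successors or len(sentences) >= size:
            break  # dead end, or enough sentences generated
        words = list(successors)
        weights = list(successors.values())
        # Pick the next node with probability proportional to bigram frequency.
        node = rng.choices(words, weights=weights, k=1)[0]
        if node == delim:
            if current:
                sentences.append(" ".join(current))
                current = []
        else:
            current.append(node)
    # Keep an unfinished sentence if the walk stopped before reaching a delimiter.
    if current and len(sentences) < size:
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    counts = {
        (DELIM, "the"): 1, ("the", "cat"): 2, ("cat", "sat"): 1,
        ("cat", "ran"): 1, ("sat", DELIM): 1, ("ran", DELIM): 1,
    }
    for sentence in random_walk(build_graph(counts), size=5):
        print(sentence)
```

Normalizing the counts into probabilities is left to `random.choices`, which accepts relative weights; reaching the delimiter node closes the current sentence, so the size limit counts sentences, not words.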
