Generate a random document from bigrams using Random Walk
In this tutorial you'll learn how to use Clairlib to generate a random document from the bigrams of an existing one. The process has three steps: extract the bigrams of the source document, build a network out of these bigrams, and perform a random walk on that network, printing each word as its node is visited. The transition probability on each edge depends on the frequency of the corresponding bigram.
To do this, we will use the utility 'bigrams_to_random_doc.pl' and the text file '11sent.txt'. For usage information on 'bigrams_to_random_doc.pl', run:
bigrams_to_random_doc.pl --help
First, extract the bigrams from '11sent.txt' by running
extract_ngrams.pl -r "11sent.txt" -f text -w 11sent.2grams -N 2 --segment --verbose
The '--segment' option here tells the script to segment the text into sentences and add the delimiter < s > at the end of each sentence.
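To make the extraction step more concrete, here is a minimal Python sketch of counting bigrams with a sentence delimiter appended to each sentence. It is not Clairlib's implementation: the function name count_bigrams, the naive sentence splitting on '.', '!' and '?', and the lowercasing of words are all assumptions made for illustration.

```python
import re
from collections import Counter

DELIM = "< s >"  # sentence delimiter as written in this tutorial

def count_bigrams(text, delim=DELIM):
    """Count word bigrams, appending `delim` after each sentence.

    Hypothetical helper for illustration; not part of Clairlib.
    """
    # Naive segmentation (an assumption): split after ., ! or ? plus whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokens = []
    for sentence in sentences:
        tokens.extend(re.findall(r"\w+", sentence.lower()))
        tokens.append(delim)  # mark the end of the sentence
    # Every pair of adjacent tokens is one bigram occurrence.
    return Counter(zip(tokens, tokens[1:]))

bigrams = count_bigrams("The cat sat. The cat ran.")
```

In this sketch the delimiter is treated as an ordinary token, so bigrams such as ('sat', '< s >') and ('< s >', 'the') record where sentences end and begin.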
Then run 'bigrams_to_random_doc.pl':
bigrams_to_random_doc.pl --input 11sent.2grams --output 11sent.synth --size 5 --delim "< s >"
This will generate a random document of at most 5 sentences.
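The random walk itself can be sketched in a few lines of Python. This is an illustration of the idea only, under stated assumptions, and not the code inside 'bigrams_to_random_doc.pl': the names build_graph and random_document, the max_steps safety cap, and the choice to start the walk at the delimiter node are all hypothetical.

```python
import random
from collections import defaultdict

def build_graph(bigram_counts):
    """Turn {(w1, w2): count} into an adjacency map {w1: {w2: count}}."""
    graph = defaultdict(dict)
    for (first, second), count in bigram_counts.items():
        graph[first][second] = count
    return graph

def random_document(bigram_counts, size, delim="< s >", seed=None, max_steps=10_000):
    """Random-walk the bigram graph and return at most `size` sentences.

    Hypothetical sketch: edge weights are raw bigram counts, so the
    probability of each step is proportional to bigram frequency.
    """
    rng = random.Random(seed)
    graph = build_graph(bigram_counts)
    if not graph:
        return []
    # Assumption: start at the delimiter so the walk begins a new sentence.
    node = delim if delim in graph else rng.choice(sorted(graph))
    sentences, words = [], []
    if node != delim:
        words.append(node)
    for _ in range(max_steps):  # cap the walk so it always terminates
        if len(sentences) >= size or node not in graph:
            break  # enough sentences, or a dead end with no outgoing edge
        successors = graph[node]
        node = rng.choices(list(successors), weights=list(successors.values()))[0]
        if node == delim:
            if words:
                sentences.append(" ".join(words))
            words = []
        else:
            words.append(node)
    return sentences  # an unfinished trailing sentence is discarded

counts = {
    ("< s >", "the"): 2, ("the", "cat"): 2,
    ("cat", "sat"): 1, ("cat", "ran"): 1,
    ("sat", "< s >"): 1, ("ran", "< s >"): 1,
}
doc = random_document(counts, size=5, seed=0)
```

With these toy counts every sentence the walk produces is either "the cat sat" or "the cat ran", each chosen with equal probability because the two bigrams leaving 'cat' have the same count.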

