Cosine Similarities and Lexrank Distribution Tutorial

From Clairlib

In this tutorial, we will calculate the cosine similarity of every pair of lines in a file, along with the Lexrank distribution over those lines. To do this, we will use the 'compute_lexrank.pl' utility. It creates a node for each line in the provided text file and places all of the nodes in a cluster. It then calculates the cosine similarity matrix and the Lexrank transition probabilities for each pair of nodes in the cluster, and sorts the nodes in descending order of Lexrank centrality. For this demonstration, we will use 11sent.txt, provided here.

compute_lexrank.pl 11sent

This will create 3 files in the same directory as 11sent.txt:

  • 11sent.cos: contains the cosine similarity matrix
  • 11sent.lr: contains each line arranged in descending order of Lexrank centrality
  • 11sent.prob: contains the transition probabilities used in Lexrank for each pair of nodes.
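
To make the contents of the three files more concrete, here is a minimal Python sketch of the same kind of computation. It is not Clairlib's implementation: the function names (cosine, lexrank), the plain term-frequency vectors, the whitespace tokenization, and the jump probability of 0.15 are all assumptions made for illustration, and the values it produces should not be expected to match the output of compute_lexrank.pl.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(lines, jump=0.15, iterations=100):
    """Return (cos, prob, scores) for a list of text lines.

    cos    : cosine similarity matrix (analogous to the .cos file)
    prob   : row-normalized transition probabilities (analogous to .prob)
    scores : centrality of each line, found by power iteration
    `jump` is an assumed random-jump probability, not a Clairlib default.
    """
    # Assumption: lowercase, whitespace-split term counts stand in for
    # whatever term weighting the real utility applies.
    vectors = [Counter(line.lower().split()) for line in lines]
    n = len(vectors)

    cos = [[cosine(u, v) for v in vectors] for u in vectors]

    # Normalize each row so it sums to 1; an all-zero row becomes uniform.
    prob = []
    for row in cos:
        total = sum(row)
        prob.append([x / total for x in row] if total else [1.0 / n] * n)

    # Power iteration: repeatedly push the score vector through the
    # transition matrix, mixing in a uniform jump at every step.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            jump / n + (1 - jump) * sum(scores[i] * prob[i][j] for i in range(n))
            for j in range(n)
        ]
    return cos, prob, scores
```

Calling lexrank on the list of lines read from a text file returns the three structures; sorting the line indices by descending score, for example with sorted(range(len(scores)), key=lambda i: -scores[i]), gives an ordering analogous to the one stored in the .lr file.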