How to implement Brown Clustering Algorithm in O(|V|k^2)

https://datascience.stackexchange.com/questions/896

16-10-2019
|

Question

I am trying to implement the Brown Clustering Algorithm.

Paper details: "Class-Based n-gram Models of Natural Language" by Brown et al

The algorithm is supposed to in O(|V|k^2) where |V| is the size of the vocabulary and k is the number of clusters. I am unable to implement it this efficiently. In fact, the best I can manage is O(|V|k^3) which is too slow. My current implementation for the main part of the algorithm is as follows:

for w = number of clusters + 1 to |V|
{
   word = next most frequent word in the corpus

   assign word to a new cluster 

   initialize MaxQuality to 0

   initialize ArgMax vector to (0,0)

   for i = 0 to number of clusters - 1 
   {
      for j = i to number of clusters
      {
         Quality = Mutual Information if we merge cluster i and cluster j

         if Quality > MaxQuality
         {
            MaxQuality = Quality 

            ArgMax = (i,j) 
         }
      }
   }
}

I compute quality as follows:

1. Before entering the second loop compute the pre-merge quality i.e. quality before doing any merges.
2. Every time a cluster-pair merge step is considered:
    i. assign quality := pre-merge quality
   ii. quality = quality - any terms in the mutual information equation that contain cluster i or cluster j (pre-merge)
  iii. quality = quality + any terms in the mutual information equation that contain (cluster i U cluster j)  (post-merge)

In my implementation, the first loop has approx |V| iterations, the second and third loop approx k iterations each. To compute quality at each step requires approx a further k iterations. In total it runs in (|V|k^3) time.

How do you get it to run in (|V|k^2)?

Solution

I have managed to resolve this. There is an excellent and thorough explanation of the optimization steps in the following thesis: Semi-Supervised Learning for Natural Language by Percy Liang.

My mistake was trying to update the quality for all potential clusters pairs. Instead, you should initialize a table with the quality changes of doing each merge. Use this table to find the best merge, and the update the relevant terms that make up the table entries.

OTHER TIPS

Just to complement the question, regarding the overview of the algorithm, I found this slide to be very clear (but also failed to mention the table update mechanism):

from Michael Collins given in his MOOC on NLP (18 - 5 - The Brown Clustering Algorithm (Part 3) (9-18))

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange