Well... finding the top K terms is not really a big problem. One of the key ideas in this field has been "stream processing", i.e., performing the operation in a single pass over the data while sacrificing some accuracy to get a probabilistic answer. So, assume you get a stream of data like the following:
A B K A C A B B C D F G A B F H I B A C F I U X A C
What you want is the top K items. Naively, one would maintain a counter for each item and, at the end, sort the items by their counts. This takes O(U) space and O(max(U*log(U), N)) time, where U is the number of unique items and N is the number of items in the list.
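For concreteness, here is a minimal Python sketch of that naive approach on the example stream above (using the standard library's Counter, and top 3 instead of top K just to keep the output short):

```python
from collections import Counter

def naive_top_k(stream, k):
    counts = Counter(stream)        # one counter per unique item: O(U) space
    return counts.most_common(k)    # sorts by count: O(U log U) time

stream = "A B K A C A B B C D F G A B F H I B A C F I U X A C".split()
print(naive_top_k(stream, 3))       # [('A', 6), ('B', 5), ('C', 4)]
```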
In case U is small, this is not really a big problem. But once you are in the domain of search logs with billions or trillions of unique searches, the space consumption starts to become a problem.
So, people came up with the idea of "count sketches" (you can read up more on the count-min sketch page on Wikipedia). Here you maintain a hash table A of length n and create two hash functions for each item:

h1(x) = a bucket in 0 ... n-1, chosen with uniform probability

h2(x) = +1 or -1, each with probability 0.5
You then do A[h1(x)] += h2(x). The key observation is that, since each item randomly hashes to +1 or -1, E[ A[h1(x)] * h2(x) ] = count(x), where E is the expected value of the expression and count(x) is the number of times x appeared in the stream. Intuitively, any other item y that collides into the same bucket contributes count(y) * h2(y) * h2(x), which is +/- count(y) with equal probability and hence zero in expectation, while x itself contributes count(x) * h2(x)^2 = count(x).
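To make that concrete, here is a minimal single-table sketch in Python. The two hash functions are simulated with Python's built-in hash (consistent only within one process), purely as an illustrative assumption; n = 1024 is arbitrary:

```python
# A single table of length n; h1 picks the bucket, h2 picks the sign.
# A real implementation would use proper pairwise-independent hash functions.
n = 1024
A = [0] * n

def h1(x):
    return hash(("bucket", x)) % n               # bucket in 0 .. n-1

def h2(x):
    return 1 if hash(("sign", x)) % 2 else -1    # sign +1 or -1

def update(x):
    A[h1(x)] += h2(x)

def estimate(x):
    return A[h1(x)] * h2(x)                      # E[estimate(x)] = count(x)
```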
Of course, the problem with this approach is that each individual estimate still has a large variance, but that can be dealt with by maintaining several independent hash tables (each with its own h1 and h2) and combining their estimates for each item, e.g. taking the median or average of the per-table estimates (the count-min variant, whose counters are non-negative, takes the minimum).
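A sketch of that variance-reduction idea, wrapping d independent tables (each with its own salt) and taking the median of the per-table estimates; the class name and default parameters are just assumptions for illustration:

```python
import random
from statistics import median

class CountSketch:
    """d independent rows of the single-table sketch; median of the d estimates."""

    def __init__(self, d=5, n=1024, seed=0):
        rng = random.Random(seed)
        self.n = n
        # Each row gets its own salt, giving it its own effective h1/h2.
        self.rows = [(rng.getrandbits(64), [0] * n) for _ in range(d)]

    def _h1(self, salt, x):
        return hash((salt, "bucket", x)) % self.n

    def _h2(self, salt, x):
        return 1 if hash((salt, "sign", x)) % 2 else -1

    def update(self, x):
        for salt, table in self.rows:
            table[self._h1(salt, x)] += self._h2(salt, x)

    def estimate(self, x):
        # Each row gives an unbiased but noisy estimate; the median tames the variance.
        return median(table[self._h1(salt, x)] * self._h2(salt, x)
                      for salt, table in self.rows)
```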
With this sketch data structure, you are able to get an approximate frequency for each item. Now, you simply maintain a running list of the K items (say, 10) with the largest frequency estimates so far, and at the end of the stream that list is your answer.
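Putting it together, here is one way that final step could look, reusing the CountSketch class sketched above and keeping a small set of the K best candidates (a heap would serve the same purpose for larger K):

```python
# Reuses the CountSketch class from the snippet above.
stream = "A B K A C A B B C D F G A B F H I B A C F I U X A C".split()
K = 3

sketch = CountSketch()
candidates = {}              # item -> latest frequency estimate

for x in stream:
    sketch.update(x)
    candidates[x] = sketch.estimate(x)
    if len(candidates) > K:
        # Evict the weakest candidate; if it shows up again later with a
        # higher estimate, it can re-enter the list.
        candidates.pop(min(candidates, key=candidates.get))

print(sorted(candidates.items(), key=lambda kv: -kv[1]))
# Typically [('A', 6), ('B', 5), ('C', 4)]; exact here because collisions
# are unlikely with only 11 distinct items spread over 1024 buckets.
```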