Question

According to Freebase, it has 23,407,174 topics. What is the easiest way to get the UI-friendly names (essentially the 'text' attribute of the topic JSON; an example of a single topic JSON is here) of ALL of these topics? I don't need any other meta information.

Solution

wget -O - http://download.freebase.com/datadumps/latest/freebase-simple-topic-dump.tsv.bz2 | bunzip2 | cut -f 2 > freebase-topic-names.txt

You probably want the Freebase IDs as well, so that you know which topic each name refers to:

wget -O - http://download.freebase.com/datadumps/latest/freebase-simple-topic-dump.tsv.bz2 | bunzip2 | cut -f 1,2

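Two additional bits of postprocessing are needed:

  1. Tabs inside a name are escaped as \t, so they have to be unescaped.
  2. The string \N represents a null (non-existent) name, so those rows have to be skipped or handled separately.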
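
The two postprocessing steps above could be sketched roughly as follows. This is only an illustration, not part of the original answer: clean_topic_names is a hypothetical helper name, it assumes the "ID, tab, name" layout produced by cut -f 1,2, and it assumes \t is the only escape that needs handling. It replaces an escaped tab with a space rather than a real tab so that the output stays a valid two-column TSV.

```shell
# Hypothetical helper (not from the original answer).
# Reads "id<TAB>name" lines on stdin and writes cleaned lines to stdout.
clean_topic_names() {
  awk -F '\t' -v OFS='\t' '
    # Skip topics whose name is the null marker \N.
    $2 == "\\N" { next }
    # Turn the literal two-character sequence \t in the name into a space.
    { gsub(/\\t/, " ", $2); print $1, $2 }
  '
}

# Possible usage with the pipeline from the answer (left commented out
# because it downloads more than 1 GB):
# wget -O - http://download.freebase.com/datadumps/latest/freebase-simple-topic-dump.tsv.bz2 \
#   | bunzip2 | cut -f 1,2 | clean_topic_names > freebase-topic-names.tsv
```

If you would rather keep the null-named topics, drop the first awk rule and handle \N downstream instead.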

OTHER TIPS

Take a look at the Simple Topic Dump that we provide. It's over 1 GB of compressed data, but it's still faster to download than trying to get all the names through the API.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow