Domanda

According to Freebase, they have 23,407,174 topics. What is the easiest way to get the UI friendly names (essentially the 'text' attribute of the topic JSON, example of a single topic JSON is here) of ALL of these TOPICs? I don't need any other meta information.

È stato utile?

Soluzione

wget -O - http://download.freebase.com/datadumps/latest/freebase-simple-topic-dump.tsv.bz2 | bunzip2 | cut -f 2 > freebase-topic-names.txt

although you probably want the Freebase IDs as well so that you know what the names refer to:

wget -O - http://download.freebase.com/datadumps/latest/freebase-simple-topic-dump.tsv.bz2 | bunzip2 | cut -f 1,2

Two additional bits of postprocessing are needed:

  1. Tabs are escaped as \t
  2. The string \N represents a null (non-existent) name

Altri suggerimenti

Take a look at the Simple Topic Dump that we provide. It's over a GB of compressed data but its still faster to download than trying to get all the names through the API.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top