Question

I have two different lists, and I need to extract data from them according to their name and then multiply the values.

I have these lists:

query_tfidf = [0.8465735902799727, 0.8465735902799727]
documents_query = [['Aftonbladet', 'play', 0.0], ['Aftonbladet', 'free', 0.0],
 ['Radiosporten Play', 'play', 0.10769448286014331], ['Radiosporten Play', 'free', 0.0]]

And I need to sort them according to their name, for example:

{Aftonbladet: {play: 0.0, free: 0.0}, Radiosporten Play: {play: 0.10769448286014331, free: 0.0}}

Then I need to extract the data for each name, multiply it by query_tfidf, and compute the following values. For example:

for each name:
    dot_product = (play_value * query_tfidf[0]) + (free_value * query_tfidf[1])
    query = sqrt((query_tfidf[0])^2 + (query_tfidf[1])^2)
    document = sqrt((play_value)^2 + (free_value)^2)

I'm a little bit desperate, so I want to ask here. I'm using Python 2.7.

Solution

Sorting the entries in your documents_query according to their name and keyword is very straightforward using dictionaries:

indexedValues = {}
for entry in documents_query:
    if entry[0] not in indexedValues:
        indexedValues[entry[0]] = {}
    indexedValues[entry[0]][entry[1]] = entry[2]

This will give you indexedValues that looks like what you asked for:

{'Aftonbladet': {'play': 0.0, 'free': 0.0}, 'Radiosporten Play': {'play': 0.10769448286014331, 'free': 0.0}}
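
For the second half of the question, here is a minimal sketch of how the three values could be computed per name once the data is grouped. It assumes that query_tfidf[0] belongs to 'play' and query_tfidf[1] to 'free', as in the formulas in the question, and it treats a keyword that is missing for a name as 0.0. The names keywords and scores are only illustrative and do not come from the question. Note that squaring in Python is written x * x or x ** 2, since ^ is bitwise XOR.

```python
from math import sqrt

query_tfidf = [0.8465735902799727, 0.8465735902799727]
documents_query = [['Aftonbladet', 'play', 0.0], ['Aftonbladet', 'free', 0.0],
                   ['Radiosporten Play', 'play', 0.10769448286014331],
                   ['Radiosporten Play', 'free', 0.0]]

# Assumption: the order of keywords matches the order of query_tfidf.
keywords = ['play', 'free']

# Group the values by name, then by keyword.
indexedValues = {}
for name, keyword, value in documents_query:
    indexedValues.setdefault(name, {})[keyword] = value

# The query length does not depend on the document, so compute it once.
query = sqrt(sum(q * q for q in query_tfidf))

scores = {}
for name, values in indexedValues.items():
    # Assumption: a keyword missing for this name counts as 0.0.
    doc_vector = [values.get(k, 0.0) for k in keywords]
    dot_product = sum(d * q for d, q in zip(doc_vector, query_tfidf))
    document = sqrt(sum(d * d for d in doc_vector))
    scores[name] = (dot_product, query, document)
```

After this runs, scores maps each name to a (dot_product, query, document) tuple. For 'Aftonbladet' both dot_product and document are 0.0, so if you later divide by document, guard against division by zero.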

OTHER TIPS

Use collections.defaultdict to aggregate your data

from collections import defaultdict

results = defaultdict(dict)
for main_key, key, value in documents_query:
    results[main_key][key] = value

# dict(results)
# Out[16]: 
# {'Aftonbladet': {'free': 0.0, 'play': 0.0},
#  'Radiosporten Play': {'free': 0.0, 'play': 0.10769448286014331}}

What you are going to do with it later is a bit unclear... but you should figure it out yourself, right?

Licensed under: CC-BY-SA with attribution