Question

I have the variable 'actorslist', and printing it outputs 100 lines like this (one line for each movie):

[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler']
[u'Christian Bale', u'Heath Ledger', u'Aaron Eckhart', u'Michael Caine']
etc.

Then I have:

import itertools

pairslist = list(itertools.permutations(actorslist, 2))

This gives me pairs of actors, but only within one movie at a time; after a new line it moves on to the next movie. How can I get all the actors from all the movies into one big array? The idea is that any two actors who were in a movie together should be joined by a pydot edge.

I tried the following, which successfully writes a dot file, but the file doesn't contain the right data:

import pydot

graph = pydot.Dot(graph_type='graph', charset="utf8")
for i in pairslist:
  edge = pydot.Edge(i[0], i[1])
  graph.add_edge(edge)
graph.write('dotfile.dot')  # write once, after all edges are added

My expected output in the dot file is as follows. Since (A, B) is the same edge as (B, A), the reversed pairs should not appear in the output:

"Tim Robbins" -- "Morgan Freeman";
"Tim Robbins" -- "Bob Gunton";
"Tim Robbins" -- "William Sadler";
"Morgan Freeman" -- "Bob Gunton";
"Morgan Freeman" -- "William Sadler";
"Bob Gunton" -- "William Sadler";
"Christian Bale" -- "Heath Ledger";
"Christian Bale" -- "Aaron Eckhart";
"Christian Bale" -- "Michael Caine";
"Heath Ledger" -- "Aaron Eckhart";
"Heath Ledger" -- "Michael Caine";
"Aaron Eckhart" -- "Michael Caine";
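The gap between this expected output and what permutations produces can be seen in a small sketch: itertools.combinations yields each unordered pair once, in the order the actors appear in the list, while itertools.permutations also yields every reversed pair. The cast used here is just the first line of the sample data above.

```python
import itertools

# First movie's cast, taken from the sample data above
cast = [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler']

# permutations: ordered pairs, so (A, B) and (B, A) both appear
ordered = list(itertools.permutations(cast, 2))
# combinations: unordered pairs, each appears once, in input order
unordered = list(itertools.combinations(cast, 2))

print(len(ordered))    # 12 == 4 * 3
print(len(unordered))  # 6 == 4 * 3 / 2
for a, b in unordered:
    # formatted like the expected dot-file lines
    print('"%s" -- "%s";' % (a, b))
```

The six printed lines are the same six edges listed above for the first movie.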

ADDITIONAL INFO:

Some people asked how the variable 'actorslist' was created:

import codecs
import json

movie_actors = []
# input.txt has one JSON object per line: {"Title":"Shawshank...
with open('input.txt', 'r') as infile, codecs.open('output.txt', 'w', 'utf-8') as nfile:
  for line in infile:
    line = line.rstrip()
    movie = json.loads(line)
    title = movie['Title']
    actors = movie['Actors']
    # split "A, B, C" into ['A', 'B', 'C']
    actorslist = [actor.strip() for actor in actors.split(',')]
    # one "title<TAB>[actors as JSON]" line per movie
    row = title + '\t' + json.dumps(actorslist) + '\n'
    nfile.write(row)
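Note that this loop rebinds actorslist on every iteration, so after the loop it only holds the last movie's cast. A minimal sketch of collecting every cast into one list of lists (the shape the answers below assume) follows. The two JSON lines are hypothetical sample data, and io.StringIO stands in for input.txt.

```python
import io
import json

# Hypothetical stand-in for input.txt: one JSON object per line
sample = io.StringIO(
    u'{"Title": "The Shawshank Redemption", "Actors": "Tim Robbins, Morgan Freeman, Bob Gunton, William Sadler"}\n'
    u'{"Title": "The Dark Knight", "Actors": "Christian Bale, Heath Ledger, Aaron Eckhart, Michael Caine"}\n'
)

movie_actors = []  # one list of actor names per movie
for line in sample:
    movie = json.loads(line)
    # "A, B, C" -> ['A', 'B', 'C']
    cast = [actor.strip() for actor in movie['Actors'].split(',')]
    movie_actors.append(cast)

print(movie_actors)
```

Pairing then happens per inner list of movie_actors, rather than on a single cast.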

Solution

from collections import Counter
from itertools import combinations
import pydot

actorslists = [
    [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler'],
    [u'Christian Bale', u'Heath Ledger', u'Aaron Eckhart', u'Michael Caine'],
    [u'Tim Robbins', u'Heath Ledger', u'Michael Caine']
]

# Counter tracks how often each pair of actors has occurred (-> link weight).
# Sorting each cast first means (A, B) and (B, A) always come out as the same tuple.
actorpairs = Counter(pair
                     for actorslist in actorslists
                     for pair in combinations(sorted(actorslist), 2))

graph = pydot.Dot(graph_type='graph', charset="utf8")
for actors, weight in actorpairs.items():   # .iteritems() also works on Python 2
    a, b = actors
    edge = pydot.Edge(a, b, weight=str(weight))
    graph.add_edge(edge)
graph.write('dotfile.dot')

This results in the following graph:

[image of the resulting graph]
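The weights can be checked without pydot by inspecting the Counter directly. This sketch reuses the same three sample casts: because each cast is sorted before pairing, a pair such as Heath Ledger and Michael Caine always comes out as the same alphabetical tuple, so their two shared movies are counted together.

```python
from collections import Counter
from itertools import combinations

actorslists = [
    [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler'],
    [u'Christian Bale', u'Heath Ledger', u'Aaron Eckhart', u'Michael Caine'],
    [u'Tim Robbins', u'Heath Ledger', u'Michael Caine'],
]

# sorted() makes ('Heath Ledger', 'Michael Caine') the only spelling of that pair
actorpairs = Counter(
    pair
    for actorslist in actorslists
    for pair in combinations(sorted(actorslist), 2)
)

print(len(actorpairs))                                  # 14 distinct edges
print(actorpairs[(u'Heath Ledger', u'Michael Caine')])  # 2 shared movies
```

Each key becomes one edge, and its count becomes that edge's weight attribute.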

OTHER TIPS

You'll want something like this:

import itertools

actorslist = [
    [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler'],
    [u'Christian Bale', u'Heath Ledger', u'Aaron Eckhart', u'Michael Caine']
    ]

for movie in actorslist:
    for actor1, actor2 in itertools.permutations(movie, 2):
        print(actor1, actor2)
        # make edge, etc.

Output:

Tim Robbins Morgan Freeman
Tim Robbins Bob Gunton
Tim Robbins William Sadler
Morgan Freeman Tim Robbins
Morgan Freeman Bob Gunton
Morgan Freeman William Sadler
Bob Gunton Tim Robbins
Bob Gunton Morgan Freeman
Bob Gunton William Sadler
William Sadler Tim Robbins
William Sadler Morgan Freeman
William Sadler Bob Gunton
Christian Bale Heath Ledger
Christian Bale Aaron Eckhart
Christian Bale Michael Caine
Heath Ledger Christian Bale
Heath Ledger Aaron Eckhart
Heath Ledger Michael Caine
Aaron Eckhart Christian Bale
Aaron Eckhart Heath Ledger
Aaron Eckhart Michael Caine
Michael Caine Christian Bale
Michael Caine Heath Ledger
Michael Caine Aaron Eckhart

What you have right now is permuting the list of movies, not the list of actors within each movie.
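A quick way to see this is to call permutations on the outer list directly: each item it pairs up is a whole cast list, not a single actor.

```python
import itertools

actorslist = [
    [u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler'],
    [u'Christian Bale', u'Heath Ledger', u'Aaron Eckhart', u'Michael Caine'],
]

# Permuting the outer list pairs up movies (whole cast lists)
movie_pairs = list(itertools.permutations(actorslist, 2))
print(len(movie_pairs))   # 2: (movie1, movie2) and (movie2, movie1)
print(movie_pairs[0][0])  # the entire first cast, not one actor

# Permuting inside each movie pairs up actors
actor_pairs = [pair for movie in actorslist
               for pair in itertools.permutations(movie, 2)]
print(len(actor_pairs))   # 24 == 2 movies * 4 * 3
```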

I'm not sure how complicated this needs to be, but the following generates an edge for every pair of actors who share a movie. I only changed your pairs line. (I took the liberty of putting Tim Robbins into the Batman cast in place of Aaron Eckhart, just to give the data a more realistic overlap.)

actorslist = [[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler'],
  [u'Christian Bale', u'Heath Ledger', u'Tim Robbins', u'Michael Caine']]

import itertools
import pydot
graph = pydot.Dot(graph_type='graph', charset="utf8")

# generate a list of all unique actors, if you want that
# allactors = list(set(actor for movie in actorslist for actor in movie))

# this is the key line -- you have to iterate through the list
# and not try to permute the whole thing
pairs = [list(itertools.permutations(k, 2)) for k in actorslist]


for pair in pairs:
    for a, b in pair:
        edge = pydot.Edge(a, b)
        graph.add_edge(edge)
graph.write('dotfile.dot')  # write once, after all edges are added

Output file (remember that I changed the input by adding Tim Robbins to the second movie):

graph G {
charset=utf8;
"Tim Robbins" -- "Morgan Freeman";
"Tim Robbins" -- "Bob Gunton";
"Tim Robbins" -- "William Sadler";
"Morgan Freeman" -- "Tim Robbins";
"Morgan Freeman" -- "Bob Gunton";
"Morgan Freeman" -- "William Sadler";
"Bob Gunton" -- "Tim Robbins";
"Bob Gunton" -- "Morgan Freeman";
"Bob Gunton" -- "William Sadler";
"William Sadler" -- "Tim Robbins";
"William Sadler" -- "Morgan Freeman";
"William Sadler" -- "Bob Gunton";
"Christian Bale" -- "Heath Ledger";
"Christian Bale" -- "Tim Robbins";
"Christian Bale" -- "Michael Caine";
"Heath Ledger" -- "Christian Bale";
"Heath Ledger" -- "Tim Robbins";
"Heath Ledger" -- "Michael Caine";
"Tim Robbins" -- "Christian Bale";
"Tim Robbins" -- "Heath Ledger";
"Tim Robbins" -- "Michael Caine";
"Michael Caine" -- "Christian Bale";
"Michael Caine" -- "Heath Ledger";
"Michael Caine" -- "Tim Robbins";
}
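The output above still contains each edge twice, for example "Tim Robbins" -- "Morgan Freeman" and "Morgan Freeman" -- "Tim Robbins", because permutations is ordered. A minimal pydot-free sketch, assuming plain DOT text is acceptable, swaps in itertools.combinations plus a set keyed on frozenset, so each pair of co-stars yields exactly one edge even if the pair shares several movies. The to_dot helper is hypothetical and does not escape quote characters inside names.

```python
import itertools

actorslist = [[u'Tim Robbins', u'Morgan Freeman', u'Bob Gunton', u'William Sadler'],
              [u'Christian Bale', u'Heath Ledger', u'Tim Robbins', u'Michael Caine']]


def to_dot(movies):
    """Hypothetical helper: undirected DOT text with one edge per co-star pair."""
    seen = set()
    lines = [u'graph G {', u'charset=utf8;']
    for movie in movies:
        for a, b in itertools.combinations(movie, 2):
            key = frozenset((a, b))  # (A, B) and (B, A) map to the same key
            if key in seen:
                continue  # pair already emitted from an earlier movie
            seen.add(key)
            lines.append(u'"%s" -- "%s";' % (a, b))
    lines.append(u'}')
    return u'\n'.join(lines)


dot_text = to_dot(actorslist)
print(dot_text)
```

With this input the result has 12 edges, and passing the same casts twice (to_dot(actorslist + actorslist)) still yields 12. The text could then be saved with io.open('dotfile.dot', 'w', encoding='utf-8').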
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow