Question

I'm learning about graphs (they seem super useful) and was wondering if I could get some advice on a possible way to structure mine.

Simply put, let's say I get purchase order data every day. Some days it's the same as the day before, and on others it's different. For example, yesterday I had an order of pencils and erasers, so I created two nodes to represent them; today I get an order for an eraser and a marker, and so on. After each day, my program also looks at who ordered what, and if Bob ordered a pencil yesterday and then an eraser today, it creates a directed edge. My logic is that I can see who bought what on each day and track Bob's purchase behaviour (and maybe use it to infer patterns for him or for other users).

My problem is that I'm using networkx (Python): when I create a node 'pencil' for yesterday and then another node 'pencil' for day 2, I can't differentiate them.

I thought of (and have been) naming it day2-pencil and then scanning the entire graph and stripping out the 'day2-' prefix to track pencil orders. This seems wrong to me (not to mention expensive on the processor). I think the key would be to somehow mark each day as its own subgraph, so that when I want to study a specific day or a few days, I don't have to scan the entire graph.

As my test data gets larger, it's getting more and more confusing, so I'm wondering what the best practice is. Any general suggestions would be great (networkx seems pretty full-featured, so it probably has a way of doing this).

Thanks in advance!

Update: Still no luck, but this may be helpful:

import networkx as nx
G=nx.Graph()
G.add_node('pencil', day='1/1/12', colour='blue')
G.add_node('eraser', day='1/1/12', colour='rubberish colour. I know thats not a real colour')
G.add_node('pencil', day='1/2/12', colour='blue')

The result I get from typing G.node is:

{'pencil': {'colour': 'blue', 'day': '1/2/12'}, 'eraser': {'colour': 'rubberish colour. I know thats not a real colour', 'day': '1/1/12'}}

It's obviously overwriting the pencil from 1/1/12 with the 1/2/12 one, and I'm not sure if I can make a distinct one.


Solution

This mostly depends on your goal: what you want to analyze is the deciding factor in your graph design. Looking at your data, though, a general structure would be nodes for customers and products, connected by edges labeled with days (I don't know if this helps, but this is in fact a bipartite graph).

So your structure would be something like this:

node(Person) --- edge(Day) ---> node(Product)

Let's say, Bob buys a pencil on 1/1/12:

node(Bob) --- 1/1/12 ---> node(Pencil)

Ok, now Bob goes and buys another pencil on 1/2/12:

          -- 1/1/12 --
         /            \
node(Bob)              > node(Pencil)
         \            /
          -- 1/2/12 --

And so on...

This is actually possible with networkx. Since you have multiple edges between the same pair of nodes, you have to choose between MultiGraph and MultiDiGraph, depending on whether your edges are directed.

In : import networkx

In : g = networkx.MultiDiGraph()

In : g.add_node("Bob")
In : g.add_node("Alice")

In : g.add_node("Pencil")

In : g.add_edge("Bob","Pencil",key="1/1/12")
In : g.add_edge("Bob","Pencil",key="1/2/12")

In : g.add_edge("Alice","Pencil",key="1/3/12")
In : g.add_edge("Alice","Pencil",key="1/2/12")

In : g.edges(keys=True)
Out:
[('Bob', 'Pencil', '1/2/12'),
 ('Bob', 'Pencil', '1/1/12'),
 ('Alice', 'Pencil', '1/3/12'),
 ('Alice', 'Pencil', '1/2/12')]

So far, not bad. You can now answer questions like "Did Alice buy a Pencil on 1/1/12?".

In : g.has_edge("Alice","Pencil","1/1/12")
Out: False

In : g.has_edge("Alice","Pencil","1/2/12")
Out: True

Things might get bad if you want all orders on a specific day. By bad, I don't mean code-wise but computation-wise. Code-wise it is rather simple:

In : [(from_node, to_node) for from_node, to_node, key in g.edges(keys=True) if key=="1/2/12"]
Out: [('Bob', 'Pencil'), ('Alice', 'Pencil')]

But this scans all the edges in the network and filters the ones you want. I don't think networkx has any better way.
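One way to avoid the full scan is to keep a small index next to the graph, updated whenever an edge is added. The following is only a sketch under that assumption: `edges_by_day` and `add_order` are hypothetical names, not part of networkx, while `MultiDiGraph.add_edge` and `edge_subgraph` are existing networkx calls.

```python
from collections import defaultdict

import networkx as nx

g = nx.MultiDiGraph()

# Hypothetical side index (not a networkx feature): day -> {(customer, product)}
edges_by_day = defaultdict(set)


def add_order(customer, product, day):
    """Add the order as a keyed edge and record it in the per-day index."""
    g.add_edge(customer, product, key=day)
    edges_by_day[day].add((customer, product))


add_order("Bob", "Pencil", "1/1/12")
add_order("Bob", "Pencil", "1/2/12")
add_order("Alice", "Pencil", "1/3/12")
add_order("Alice", "Pencil", "1/2/12")

# Orders for one day come from a dictionary lookup, not a scan of all edges.
day = "1/2/12"
orders_on_day = sorted(edges_by_day[day])

# A view of the graph restricted to that day's edges.
day_graph = g.edge_subgraph([(u, v, day) for u, v in orders_on_day])
```

The trade-off is that the index must stay in sync with the graph, so every edge addition or removal has to go through helpers like `add_order`.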

Other tips

Graphs are not the best approach for this. A relational database such as MySQL is the right tool for storing this data and performing such queries as who bought what when.
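As a rough illustration of the relational approach, here is a minimal sketch that uses Python's built-in `sqlite3` module in place of MySQL so that it runs without a server. The table name `orders`, its columns, and the ISO date strings are assumptions made for this example, and the rows mirror the Bob and Alice orders used earlier.

```python
import sqlite3

# In-memory SQLite database, standing in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, day TEXT)")
# An index on the day column lets per-day queries avoid a full table scan.
conn.execute("CREATE INDEX idx_orders_day ON orders (day)")

conn.executemany(
    "INSERT INTO orders (customer, product, day) VALUES (?, ?, ?)",
    [
        ("Bob", "Pencil", "2012-01-01"),
        ("Bob", "Pencil", "2012-01-02"),
        ("Alice", "Pencil", "2012-01-03"),
        ("Alice", "Pencil", "2012-01-02"),
    ],
)

# Who bought what on a given day?
orders_on_day = conn.execute(
    "SELECT customer, product FROM orders WHERE day = ? ORDER BY customer",
    ("2012-01-02",),
).fetchall()

# On which days did Bob buy something, in chronological order?
bob_days = [
    row[0]
    for row in conn.execute(
        "SELECT day FROM orders WHERE customer = ? ORDER BY day", ("Bob",)
    )
]
```

Storing days as ISO strings (YYYY-MM-DD) is a choice made here so that sorting by text also sorts by date.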

Try this:

Give each node a unique integer ID. Then, create a dictionary, nodes, such that:

nodes['pencil'] = [1,4,...] <- where each ID corresponds to a node with the 'pencil' attribute. Replace 'pencil' with whatever other attributes you're interested in.

Just make sure that when you add a node with 'pencil', you update the dictionary:

nodes['pencil'].append(new_node_id). Do the same in reverse when deleting a node.
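The bookkeeping described above could be sketched as follows. Everything here is illustrative: `node_attrs`, `add_node`, and `remove_node` are hypothetical helpers, and the plain dictionary `node_attrs` stands in for the graph's own node storage.

```python
from collections import defaultdict
from itertools import count

_next_id = count(1)          # source of unique integer node IDs
node_attrs = {}              # node ID -> attributes (stand-in for the graph)
nodes = defaultdict(list)    # product name -> IDs of nodes with that product


def add_node(product, day):
    """Create a node with a fresh ID and register it in the index."""
    node_id = next(_next_id)
    node_attrs[node_id] = {"product": product, "day": day}
    nodes[product].append(node_id)
    return node_id


def remove_node(node_id):
    """Delete a node and drop its ID from the index."""
    attrs = node_attrs.pop(node_id)
    nodes[attrs["product"]].remove(node_id)


first_pencil = add_node("pencil", "1/1/12")
eraser = add_node("eraser", "1/1/12")
second_pencil = add_node("pencil", "1/2/12")

# Both pencil nodes now coexist, and finding them needs no graph scan.
pencil_ids = list(nodes["pencil"])

remove_node(first_pencil)
remaining_pencil_ids = list(nodes["pencil"])
```

With networkx, the same IDs could be used as node names, with the product and day passed as node attributes.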

License: CC-BY-SA with attribution
Not affiliated with StackOverflow