Rounding in Python NumPy when adding nodes in Networkx

https://stackoverflow.com/questions/16174254

11-04-2022

Question

Where do the trailing 0s and 9s come from? I checked at each step that no rounding issues appear, and I got the correct results. However, when I add these numbers to the graph, rounding problems arise.

My full code is the following:

from __future__ import division
from math import sqrt
import networkx as nx
import numpy as np
from decimal import Decimal

n=4   #n is the number of steps in the graph.
a = np.array([ 1.1656,  1.0125,  0.8594])

g=nx.DiGraph() #I initiate the graph

#f2 checks for equal nodes and removes them
def f2(seq): 
    checked = []
    for e in seq:
        if (e not in checked):
            checked.append(e)
    return np.asarray(checked)

root = np.array([1])
existing_nodes = np.array([1])
previous_step_nodes = np.array([1])
nodes_to_add =np.empty(0)
clean = np.array([1])

for step in range(1,n):
    nodes_to_add=np.empty(0)
    for values in previous_step_nodes:
        nodes_to_add = np.append(nodes_to_add,values*a)

    print "--------"
    print "*****nodes to add ****" + str(f2(np.round(nodes_to_add,4)))
    print "clean = " + str(clean) + "\n"
    #Up to here, the code generates the nodes I will need 

    # This for loop makes the edges and adds the nodes.
    for node in clean:
        for next_node in np.round(node*a,4):
            print str(node ) + "     "  + str( next_node)
            g.add_edge(np.round(node,4), np.round(next_node,4))
#            g.add_edge(Decimal(np.round(node,4)).quantize(Decimal('1.0000')), Decimal(np.round(next_node,4)).quantize(Decimal('1.0000')))

    previous_step_nodes = f2(nodes_to_add)
    clean = f2(np.round(previous_step_nodes,4))
#    g.add_nodes_from(clean)

    print "\n step" + str(step) + " \n"
    print " Current Step :" + "Number of nodes = " + str(len(f2(np.round(previous_step_nodes,4))))
    print clean

print "How many nodes are there ? " +str(len(g.nodes()))

This code works and prints out a very neat description of the graph, which is exactly what I want. However, when I print the list of nodes to check that the graph contains only the number of nodes that I need, I get:

How many nodes are there ? 22
[1, 0.88109999999999999, 1.0143, 1.038, 0.74780000000000002, 
1.1801999999999999, 1.3755999999999999, 1.0142, 0.8609, 
0.88100000000000001, 0.85940000000000005, 1.1656,
1.1950000000000001, 1.0125, 1.5835999999999999, 1.0017, 
0.87009999999999998, 1.1676,
0.63480000000000003, 0.73860000000000003, 1.3586, 1.0251999999999999]

This is clearly a problem which is making my program useless. 0.88109999999999999 and 0.88100000000000001 are the same node.

So after checking Stack Overflow for days, I came to the conclusion that the only way around the problem was to use Decimal(). So, I replaced:

g.add_edge(np.round(node,4), np.round(next_node,4))

with

g.add_edge(Decimal(np.round(node,4)).quantize(Decimal('1.0000')), 
           Decimal(np.round(next_node,4)).quantize(Decimal('1.0000')))

However, the result was not what I expected: because

0.88109999999999999 = 0.8811
0.88100000000000001 =0.8810,

so Python still thinks of them as different numbers.
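For reference, here is a minimal sketch of what those two values look like on their own. It assumes the two nodes are the doubles nearest to 0.8811 and 0.8810, which is what the 17-digit output above suggests. The names x, y, qx and qy are only illustrative.

```python
from decimal import Decimal

# Assumption: these are the two node values from the list above,
# i.e. the doubles nearest to 0.8811 and 0.8810.
x = 0.8811
y = 0.8810

# Printing 17 significant digits exposes the binary approximation
# behind each 4-digit value.
print('%.17g' % x)
print('%.17g' % y)

# quantize() rounds each one back to 4 decimal places, but they were
# already different at 4 decimal places, so they stay different.
qx = Decimal(x).quantize(Decimal('1.0000'))
qy = Decimal(y).quantize(Decimal('1.0000'))
print(qx)
print(qy)
print(x == y)
```

So the long tails of 9s and 0s are only how the floats are displayed. The two nodes differ in the fourth decimal place before Decimal() is ever involved.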

Ideally, I would prefer to not complicate the code using Decimal() and would like to cut off the decimals so that 0.88109999999999999 = 0.88100000000000001 = 0.8810 but I have no clue how to solve this problem.

Thanks to your replies, I have updated my code. I took the suggestion to use f2 as:

def f2(seq): 
    near_equal = lambda x, y: abs(x - y) < 1.e-5
    checked = []
    for e in seq:
        if all([not near_equal(e, x) for x in checked]):
            checked.append(e)
    return np.asarray(checked)

and I deleted all the numpy.round() because if I can remove nodes that are "similar" then I don't need any rounding at all.

However, Python still treats nearly identical nodes as distinct:

g.nodes() prints out 23 nodes when there should only be 20. (Note: I tried changing the tolerance level 1.e-5, but the result did not change.)

How many nodes are there ? 23

[0.63474091729864457, 0.73858020442900385, 0.74781245698436638,
 0.85940689107605128, 0.86088399667008808, 0.86088399667008819,
 0.87014947721450187, 0.88102634567968308, 0.88102634567968319,
 1, 1.00171875, 1.0125, 1.0142402343749999, 1.02515625,
 1.0379707031249998, 1.1655931089239486, 1.1675964720799117,
 1.180163022785498, 1.1949150605703167, 1.358607295570996,
 1.3755898867656333, 1.3755898867656335, 1.5835833014513552]

This is because: 0.86088399667008808, 0.86088399667008819; 0.88102634567968308, 0.88102634567968319 and 1.3755898867656333, 1.3755898867656335 are still being treated as different nodes.

Full code:

from __future__ import division
from math import sqrt
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt

mu1 = 0.05; sigma1= 0.25
n=4

a0=1
a1 = 1 + mu1/n + sigma1*sqrt(3)/sqrt(2*n)
a2 = 1 + mu1/n
a3 = 1 + mu1 /n - sigma1*sqrt(3)/sqrt(2*n)
a = np.array([a1,a2,a3])

print " a = " + str(a)

g=nx.DiGraph() #I initiate the graph

def f2(seq): 
    near_equal = lambda x, y: abs(x - y) < 1.e-5
    checked = []
    for e in seq:
        if all([not near_equal(e, x) for x in checked]):
            checked.append(e)
    return np.asarray(checked)

root = np.array([1])
existing_nodes = np.array([1])
previous_step_nodes = np.array([1])
nodes_to_add =np.empty(0)
clean = np.array([1])

print "________________This Makes the Nodes____________________________________"
for step in range(1,n):
    nodes_to_add=np.empty(0)
    for values in previous_step_nodes:
        nodes_to_add = np.append(nodes_to_add,values*a)
    print "--------"    
    print "*****nodes to add ****" + str(f2(nodes_to_add))
    print "clean = " + str(clean) + "\n"
    #Up to here, the code generates the nodes I will need 

    # This for loop makes the edges and adds the nodes.
    for node in clean:
        for next_node in node*a:
            print str(node ) + "     "  + str( next_node)
            g.add_edge(node, next_node)

    previous_step_nodes = f2(nodes_to_add)
    clean = f2(previous_step_nodes)
#    g.add_nodes_from(clean)

    print "\n step" + str(step) + " \n"
    print " Current Step :" + "Number of nodes = " + str(len(f2(previous_step_nodes)))
    print clean

print "______________End of the Nodes_________________________________"
print "How many nodes are there ? " +str(len(g.nodes()))
print sorted(g.nodes())

Result:

How many nodes are there ? 23

[0.63474091729864457, 0.73858020442900385, 0.74781245698436638,
 0.85940689107605128, 0.86088399667008808, 0.86088399667008819,
 0.87014947721450187, 0.88102634567968308, 0.88102634567968319,
 1, 1.00171875, 1.0125, 1.0142402343749999, 1.02515625,
 1.0379707031249998, 1.1655931089239486, 1.1675964720799117,
 1.180163022785498, 1.1949150605703167, 1.358607295570996,
 1.3755898867656333, 1.3755898867656335, 1.5835833014513552]

Solution

It is usually not a good idea to depend on exact equality between floating point numbers, because the same set of inputs can produce slightly different results depending on the floating point representation, the order of mathematical operations, etc.
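As a small illustration of the order-of-operations point, the sketch below adds the same three numbers in two groupings. It uses addition on simple values rather than the products from the question, so treat it as an analogy. The variable names are made up for the example.

```python
# The same three inputs, grouped differently.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

print(repr(left_first))
print(repr(right_first))

# Exact comparison sees two different numbers...
print(left_first == right_first)

# ...while a comparison with a tolerance treats them as the same value.
print(abs(left_first - right_first) < 1e-8)
```

Reaching the same node along two different paths in the graph is the same situation: the factors are identical, but they are multiplied in a different order.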

Unless you are dealing with extremely close nodes, you can modify your f2 function with something like the following (you may want to make the tolerance a variable):

def f2(seq): 
    near_equal = lambda x, y: abs(x - y) < 1.e-8
    checked = []
    for e in seq:
        if all([not near_equal(e, x) for x in checked]):
            checked.append(e)
    return np.asarray(checked)
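Note that in the question's loop the values passed to g.add_edge() are computed separately (node*a), so they never go through f2. One possible way to apply the same tolerance where nodes enter the graph is sketched below. The helper canonical() and the lists known_nodes and edges are hypothetical names, not part of NetworkX, and the tolerance is exposed as a parameter.

```python
def canonical(value, known, tol=1e-8):
    """Return the entry of `known` within `tol` of `value`.

    If no such entry exists, `value` is appended to `known` and returned,
    so it becomes the representative for later near-equal values.
    """
    for k in known:
        if abs(value - k) < tol:
            return k
    known.append(value)
    return value


known_nodes = []   # one representative per distinct node
edges = []         # stands in for the graph in this sketch

# Two children taken from the question's output; they differ only in
# the last printed digits.
candidates = [(1.0, 0.88102634567968308),
              (1.0, 0.88102634567968319)]

for parent, child in candidates:
    u = canonical(parent, known_nodes)
    v = canonical(child, known_nodes)
    edges.append((u, v))   # with NetworkX this would be g.add_edge(u, v)

print(len(known_nodes))
print(edges[0] == edges[1])
```

Because every node is replaced by its representative before it is used, both edges end at the same value and the graph would hold a single node for it. The linear search over known is fine for a few dozen nodes. A larger graph would probably want a different lookup.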

Note that if the floating point numbers were exactly equal, an easier way to get a list with duplicates removed would be

nodes_without_dupes = list(set(nodes_to_add))
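A short sketch of why set() only helps in the exactly-equal case. The sample values are made up for the example.

```python
# 1.0125 appears twice as the exact same float, so set() merges it.
# 0.1 + 0.2 and 0.3 are meant to be "the same" value but differ in
# the last bits, so set() keeps both.
values = [1.0125, 1.0125, 0.1 + 0.2, 0.3]

unique = sorted(set(values))
print(len(unique))
print(unique)
```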

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow