Is there some way to test whether two pkl files have the same data in python?

StackOverflow https://stackoverflow.com/questions/22266412

  •  11-06-2023

Question

I need to compare whether two objects have the same data in python, but some types don't support ==. Can I make pkl files out of both of them and then compare the byte data? If that doesn't work, is there some way to compare their byte data (say we don't know if we're dealing with two tuples that may contain different data types, lists, matrices, dataframes, etc)? Writing a comparison function that has different cases based on whether we're looking at tuples that contain matrices, dataframes, etc, seems very messy.

Solution

It's not even guaranteed that two objects that compare equal with == pickle the same:

>>> import pickle
>>> x = (1,)
>>> y = (x, x)
>>> z = ((1,), (1,))
>>> y == z
True
>>> pickle.dumps(y) == pickle.dumps(z)
False
>>> {-1, -2} == {-2, -1}
True
>>> pickle.dumps({-1, -2}) == pickle.dumps({-2, -1})
False

Serializing objects to compare their serialized forms is not a workable general-purpose equality comparison. If you want to define your own concept of equality, writing your own equality comparison function is probably your best bet.
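As a minimal sketch of what such a hand-written comparison could look like, the function below walks containers recursively and falls back to == for everything else. The name deep_equal is hypothetical, and the two duck-typed branches are assumptions: objects with an .equals() method are treated like pandas objects, and objects whose == returns something with an .all() method are treated like NumPy arrays. It also treats values of different types as unequal, which is a design choice rather than a rule.

```python
def deep_equal(a, b):
    """Recursively compare two objects (hypothetical helper, a sketch only)."""
    # Design choice: objects of different types are never considered equal.
    if type(a) is not type(b):
        return False
    # Sequences: same length and pairwise equal elements.
    if isinstance(a, (list, tuple)):
        return len(a) == len(b) and all(deep_equal(x, y) for x, y in zip(a, b))
    # Mappings: same keys and pairwise equal values.
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(deep_equal(a[k], b[k]) for k in a)
    # Assumption: pandas-style objects expose an .equals() method.
    if callable(getattr(a, "equals", None)):
        return bool(a.equals(b))
    # Assumption: array-like objects expose .shape; compare it first so that
    # broadcasting cannot make differently shaped arrays look equal.
    if getattr(a, "shape", None) != getattr(b, "shape", None):
        return False
    result = a == b
    # Assumption: an elementwise result (e.g. a NumPy array) has .all().
    if callable(getattr(result, "all", None)):
        return bool(result.all())
    return bool(result)


x = (1,)
y = (x, x)
z = ((1,), (1,))
print(deep_equal(y, z))                # True
print(deep_equal({-1, -2}, {-2, -1}))  # True
print(deep_equal([1, 2], (1, 2)))      # False
```

Unlike comparing pickled bytes, this gives the same answer for y and z and for the two sets from the example above.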

Other tips

If the object doesn't implement __eq__, then it's probably not valid to do an equals comparison.

If you have some way of defining whether they are equal, simply define your own comparison function that looks at the attributes of the two objects and returns True if they are equal. For example:

 def cmp(obj_a, obj_b):
     # Compare every attribute that matters for equality (att1, att2, ... etc.)
     return obj_a.att1 == obj_b.att1 and obj_a.att2 == obj_b.att2

As for pickle, it makes no guarantees about the contents of its raw bytes, only that unpickling them gives you back an equivalent object.
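A small illustration of that point, assuming nothing beyond the standard library: the same object pickled with two different protocols produces different bytes, yet both byte strings unpickle to an object equal to the original. The variable names are made up for the example.

```python
import pickle

data = {"a": (1, 2), "b": [3.5, None]}

# The same object serialized with two different pickle protocols.
blob_v2 = pickle.dumps(data, protocol=2)
blob_v4 = pickle.dumps(data, protocol=4)

# The raw bytes differ (at the very least in the protocol header).
print(blob_v2 == blob_v4)  # False

# Both round-trip to an object equal to the original.
print(pickle.loads(blob_v2) == data and pickle.loads(blob_v4) == data)  # True
```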

There is a good standard-library module called filecmp (file compare) that I've used a few times. I'm not really a programming whiz, so I don't want to give you bad advice. In my limited experience with this sort of application, the module works well roughly 90% of the time. Here is the code I used:

  import filecmp

  injury_compare = filecmp.cmp('/Users/MacBookPro15/injuryc', '/Users/MacBookPro15/injury')

  print("injury files are %s" % injury_compare)

The compare returns True/False. I also think there is something in the standard library that marks a differing line with a "+" (that is the difflib module rather than filecmp), so you could also work with that: basically, if you get any "+" lines back, the files are different.

I could also recommend the bash/Linux utility hexdump, which shows you the low-level bytes in a pretty spartan but illustrative fashion. It's simple too: hexdump file1. Even for someone like me, who has little understanding of what hexdump outputs, it is still possible to discern some patterns without knowing exactly what the bytes mean. There is also a diff command in bash/Linux, which I think you run like this (not 100 percent sure, but it sounds familiar): diff file1 file2
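Here is a rough sketch of the file comparison described above, using throwaway text files so it runs anywhere. By default filecmp.cmp may only compare file metadata (size and modification time); passing shallow=False makes it compare the contents. The "+" markers come from difflib, a separate standard-library module. The file names and contents are invented for the example, and note that this only says whether two files are identical, not whether the objects stored in them are equal.

```python
import difflib
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.txt")
    path_b = os.path.join(tmp, "b.txt")
    with open(path_a, "w") as f:
        f.write("one\ntwo\n")
    with open(path_b, "w") as f:
        f.write("one\nthree\n")

    # shallow=False forces a comparison of the file contents.
    same = filecmp.cmp(path_a, path_b, shallow=False)

    # unified_diff prefixes removed lines with "-" and added lines with "+".
    with open(path_a) as fa, open(path_b) as fb:
        diff_lines = list(
            difflib.unified_diff(
                fa.readlines(), fb.readlines(), fromfile="a.txt", tofile="b.txt"
            )
        )

print(same)  # False
print("".join(diff_lines), end="")
```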

Sorry I can't articulate some of the finer points but I hope something there helps. Good Luck!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow