Compare two strings in Python

Question 1

Basically, you want the idiom if test_string in list_of_strings. Since it looks like you don't need case sensitivity, you might want:

if test_string.lower() in (s.lower() for s in list_of_strings):

In your case:

>>> originals = ['0430f244a18146a0815aa1dd4012db46', '0430f244a18146a0815aa1dd40 12db46', '59739CCDA2F15D5AC16DB6695CAE3378']
>>> test = '59739ccda2f15d5ac16db6695cae3378'
>>> if test.lower() in (s.lower() for s in originals):
...    print('%s is match, yeih!' % test)
... 
59739ccda2f15d5ac16db6695cae3378 is match, yeih!
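The same idiom can be wrapped in a small reusable function. This is only a sketch; the name contains_ignore_case is hypothetical, and the sample list is the one from the session above.

```python
def contains_ignore_case(needle, haystack):
    """Return True if needle equals any string in haystack, ignoring case."""
    target = needle.lower()
    # any() stops at the first match, so the remaining strings
    # are never lowercased.
    return any(s.lower() == target for s in haystack)


originals = ['0430f244a18146a0815aa1dd4012db46',
             '0430f244a18146a0815aa1dd40 12db46',
             '59739CCDA2F15D5AC16DB6695CAE3378']

print(contains_ignore_case('59739ccda2f15d5ac16db6695cae3378', originals))  # True
```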

Question 2

It looks like the problem is that the letter case doesn't match. You may want to try:

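import difflib

def comparemd5():
    # getreferrerurl, md5_for_file and file_name come from the question's code
    origmd5 = [item.lower() for item in getreferrerurl()]
    dlmd5 = md5_for_file(file_name).lower()
    print("original md5 is %s" % origmd5)
    print("downloader file md5 is %s" % dlmd5)
    # Caveat: origmd5 is a list, so SequenceMatcher compares its elements
    # against the individual characters of dlmd5 (see Question 3).
    s = difflib.SequenceMatcher(None, origmd5, dlmd5)
    print("ratio is: %s" % s.ratio())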
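To see why the ratio from this function is not useful as written, here is a self-contained sketch that swaps the question's helpers for hard-coded values copied from the output shown in Question 3 (stand-ins chosen for illustration). Passing the whole list compares each 32-character list element against single characters, so nothing matches; comparing two lowercased strings gives 1.0, which a plain membership test tells you more directly.

```python
import difflib

# Stand-ins for getreferrerurl() and md5_for_file(file_name),
# copied from the output in the question.
origmd5 = [item.lower() for item in ['0430f244a18146a0815aa1dd4012db46',
                                     '0430f244a18146a0815aa1dd40 12db46',
                                     '59739CCDA2F15D5AC16DB6695CAE3378']]
dlmd5 = '59739ccda2f15d5ac16db6695cae3378'

# List vs. string: list elements are whole strings, dlmd5 is compared
# character by character, so there are no matching blocks.
list_ratio = difflib.SequenceMatcher(None, origmd5, dlmd5).ratio()

# String vs. string: identical after lowercasing.
str_ratio = difflib.SequenceMatcher(None, origmd5[2], dlmd5).ratio()

print(list_ratio, str_ratio)  # 0.0 1.0
print(dlmd5 in origmd5)       # True
```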

Question 3

Given the input:

original md5 is ['0430f244a18146a0815aa1dd4012db46', '0430f244a18146a0815aa1dd40 12db46', '59739CCDA2F15D5AC16DB6695CAE3378']

downloader file md5 is 59739ccda2f15d5ac16db6695cae3378

You have two problems.

First of all, that first value isn't just an MD5; it's a list containing the MD5 and two other things.

To fix that: if you know that origmd5 will always be in this format, just use origmd5[2] instead of origmd5. If you have no idea what origmd5 is, except that one of the things in it is the actual MD5, you'll have to compare against all of the elements.

Second, the actual MD5 values are both hex strings representing the same binary data, but they're different hex strings (because one is in uppercase, the other in lowercase). You could fix this by just doing a case-insensitive comparison, but it's probably more robust to unhexlify them both and compare the binary values.
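As a quick check of this point: binascii.unhexlify accepts both upper- and lowercase hex digits, so the two spellings from the question decode to the same 16 bytes. A minimal sketch:

```python
import binascii

upper = '59739CCDA2F15D5AC16DB6695CAE3378'
lower = '59739ccda2f15d5ac16db6695cae3378'

# The hex strings themselves differ...
print(upper == lower)  # False
# ...but the binary digests they encode are identical.
print(binascii.unhexlify(upper) == binascii.unhexlify(lower))  # True
print(len(binascii.unhexlify(upper)))  # 16
```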

In fact, if you've copied and pasted the output correctly, at least one of those hex strings has a space in the middle of it, so you actually need to unhexlify hex strings with optional spaces between hex pairs. AFAIK, there is no stdlib function that does this, but you can write it yourself in one step:

import binascii

def unhexlify(s):
    # Drop any spaces between hex pairs, then decode to raw bytes
    return binascii.unhexlify(s.replace(' ', ''))
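If you are on Python 3, the standard library does cover this case: bytes.fromhex ignores whitespace between hex pairs (all ASCII whitespace since Python 3.7; earlier 3.x versions skip only spaces). A sketch comparing it with the helper, using the spaced value from the question; the helper is redefined so the snippet runs on its own.

```python
import binascii


def unhexlify(s):
    # Same helper as above: drop spaces, then decode the hex pairs.
    return binascii.unhexlify(s.replace(' ', ''))


spaced = '0430f244a18146a0815aa1dd40 12db46'
plain = '0430f244a18146a0815aa1dd4012db46'

# bytes.fromhex skips the space between hex pairs by itself.
print(bytes.fromhex(spaced) == unhexlify(plain))  # True
print(unhexlify(spaced) == unhexlify(plain))      # True
```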

Meanwhile, I'm not sure why you're trying to use difflib.SequenceMatcher at all. Two slightly different MD5 hashes refer to completely different original sources; that's kind of the whole point of MD5, and crypto hash functions in general. There's no such thing as a 95% match; there's either a match, or a non-match.
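A quick illustration of why a similarity ratio is meaningless here: hashing two inputs that differ by one character gives digests with no useful resemblance, so equality is the only meaningful test. A sketch using hashlib.md5; the input byte strings are arbitrary examples.

```python
import hashlib

a = hashlib.md5(b'downloaded file contents').hexdigest()
b = hashlib.md5(b'downloaded file content!').hexdigest()

print(a)
print(b)
# One changed byte of input gives a completely different digest.
print(a == b)  # False
# Hashing the same input again always reproduces the same digest.
print(a == hashlib.md5(b'downloaded file contents').hexdigest())  # True
```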

So, if you know the 3rd value in origmd5 is the one you want, just do this:

s = unhexlify(origmd5[2]) == unhexlify(dlmd5)

Otherwise, do this:

s = any(unhexlify(origthingy) == unhexlify(dlmd5) for origthingy in origmd5)

Or, turning it around to make it simpler:

s = unhexlify(dlmd5) in map(unhexlify, origmd5)

Or whatever equivalent you find most readable.
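Putting the pieces together, here is a minimal sketch of a replacement for comparemd5 that takes the values as arguments instead of calling the question's getreferrerurl and md5_for_file helpers. The name md5_matches is hypothetical; it returns a plain boolean rather than a similarity ratio.

```python
import binascii


def unhexlify(s):
    # Decode a hex string, tolerating spaces between hex pairs.
    return binascii.unhexlify(s.replace(' ', ''))


def md5_matches(candidates, digest):
    """Return True if the hex digest equals any candidate hex string.

    Comparing decoded bytes makes the check case-insensitive and
    tolerant of stray spaces between hex pairs.
    """
    target = unhexlify(digest)
    return any(unhexlify(c) == target for c in candidates)


origmd5 = ['0430f244a18146a0815aa1dd4012db46',
           '0430f244a18146a0815aa1dd40 12db46',
           '59739CCDA2F15D5AC16DB6695CAE3378']
dlmd5 = '59739ccda2f15d5ac16db6695cae3378'

print(md5_matches(origmd5, dlmd5))  # True
```

If some entries might not be hex at all, unhexlify raises binascii.Error for them, so you would need to catch that and treat the entry as a non-match.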