SQL Server hashbytes seed

https://dba.stackexchange.com/questions/276018

08-03-2021

Question

I'm trying to compare hashes generated by SQL Server's HASHBYTES() with hashes generated by Python's hash() (or hashlib) function. The same algorithm is used for both functions. The problem is that I don't know SQL Server's seed, so I can't generate the same output using Python. Is there a way to set SQL Server's seed? I wasn't able to find any information related to it.

Solution

I am able to get identical results between Python and T-SQL code with the MD5 algorithm. For example, the string NO COLLUSION hashes to 0x5CA1A58C070F24EF1D4D2900E5727F37 on both platforms.

Example T-SQL code:

SELECT HASHBYTES('MD5', 'NO COLLUSION');

Example Python code:

import hashlib
result = hashlib.md5(b'NO COLLUSION')
print(result.hexdigest())

I'm not an encryption expert, but it's not clear to me what you mean by "SQL's seed". The MD5 algorithm doesn't appear to call for a random number to be added by the implementer. Such an algorithm would be quite inconvenient because it would result in the same problem that you're running into right now: different implementations of the same hashing algorithm would give different results for the same input.
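If the two sides still disagree, one likely cause (an assumption here, since the question does not show the column types involved) is string encoding rather than any seed. HASHBYTES hashes the raw bytes of its input, and an NVARCHAR value such as N'abc' is stored as UTF-16LE, while a plain ASCII VARCHAR literal matches Python's ASCII bytes. A minimal sketch, where sql_style_md5 is a hypothetical helper name and the VARCHAR branch assumes pure-ASCII text:

```python
import hashlib


def sql_style_md5(text, nvarchar=False):
    """Format an MD5 digest the way SQL Server displays it (0x + uppercase hex).

    Hypothetical helper for illustration. Assumes ASCII-only text for the
    VARCHAR case; non-ASCII VARCHAR bytes depend on the column's collation.
    """
    # NVARCHAR data is UTF-16LE, so the bytes being hashed differ from VARCHAR.
    encoding = "utf-16-le" if nvarchar else "ascii"
    digest = hashlib.md5(text.encode(encoding)).hexdigest()
    return "0x" + digest.upper()


varchar_digest = sql_style_md5("abc")                  # compare with HASHBYTES('MD5', 'abc')
nvarchar_digest = sql_style_md5("abc", nvarchar=True)  # compare with HASHBYTES('MD5', N'abc')

print(varchar_digest)
print(nvarchar_digest)
```

The two printed values differ because the input bytes differ, even though the algorithm is the same in both cases.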

Other tips

Joe has correctly pointed out that Python's hashlib.md5 and SQL Server's HASHBYTES('MD5', ...) functions have the same output.

As an additional clarification, the built-in hash() function in Python is not intended to be used in the same way. Its implementation is platform-specific, varies depending on the type of object being hashed, and is (as you mentioned) salted with a random value.

It's not an MD5 hash, and you won't have much luck trying to get it to match what's produced by HASHBYTES, as it's just a fundamentally different thing.

This random salt value is generated when the Python process starts up, so it will generally be different even across multiple executions of a script on the same machine. From the docs:

Note: By default, the __hash__() values of str and bytes objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.
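The per-process salting described above can be observed with a small experiment. This is only a sketch: hash_in_fresh_process is a hypothetical helper that starts a new interpreter with the PYTHONHASHSEED environment variable set, which is the documented way to pin the salt. It contrasts hash() with hashlib.md5, which has no salt at all.

```python
import hashlib
import os
import subprocess
import sys


def hash_in_fresh_process(text, seed):
    """Return hash(text) as computed by a new interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    completed = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout.strip())


text = "NO COLLUSION"

# Same seed in two separate processes: the built-in hash() is reproducible.
same_seed = {hash_in_fresh_process(text, 1) for _ in range(2)}

# Different seeds: the values will normally differ, which is why hash()
# cannot be matched against HASHBYTES.
different_seeds = {hash_in_fresh_process(text, seed) for seed in (1, 2)}

# hashlib.md5 has no salt, so it is stable regardless of process or seed.
digests = {hashlib.md5(text.encode("ascii")).hexdigest() for _ in range(2)}

print(len(same_seed), len(different_seeds), len(digests))
```

Without PYTHONHASHSEED set, each run of the interpreter picks its own random salt, so only the hashlib result is suitable for comparison with SQL Server.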

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange