Is it possible to calculate a SHA1 hash by feeding chunks to the rep xsha1 instruction?

VIA's Padlock programming guide shows an example of calculating the SHA1 hash of a file, but it loads the entire file into memory before calling the hash instruction.
I also don't see any info about calculating SHA hashes in chunks. Since you need to initialise the output buffer values, one might assume that simply calling rep xsha1 again without reinitialising them would continue the hash; after testing, however, this doesn't seem to be the case (I'm feeding it 64-byte blocks, if that matters).

I don't know the internals of how SHA1 hashing works, but I'm guessing there's a finalisation step which you somehow need to stop the hardware from performing when loading the data in chunks.

Does anyone know of an efficient way to calculate a SHA1 hash in chunks?


Solution

It depends on the CPU. On the VIA Nano and later, you can perform partial hashes by setting EAX to 0xFFFFFFFF before executing the REP XSHA1/XSHA256 instruction; the CPU then won't perform the final padding, so you can simply feed the chunks into the hash, just as you usually do with hashing functions. On older models (up to the C7), this isn't possible: EAX has to be set to zero before the hash instruction, and a full hash (i.e. including the final padding) is always performed.
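What the Nano's EAX = 0xFFFFFFFF mode exposes in hardware is the standard incremental update step of SHA-1: the compression function runs over complete 64-byte blocks and carries the intermediate state forward, and the padding is appended only once, at the very end. A minimal sketch of those semantics, using Python's hashlib as a software stand-in for the PadLock instruction (not the instruction itself):

```python
import hashlib
import os

data = os.urandom(10 * 64 + 17)  # arbitrary length, not a multiple of 64

# One-shot hash: padding is appended once, at the very end.
one_shot = hashlib.sha1(data).hexdigest()

# Chunked hash: each update() runs the compression function over the
# available 64-byte blocks and carries the intermediate state forward;
# only digest()/hexdigest() performs the final padding. This is the
# behaviour the Nano's EAX = 0xFFFFFFFF mode provides in hardware.
h = hashlib.sha1()
for i in range(0, len(data), 64):
    h.update(data[i:i + 64])
chunked = h.hexdigest()

assert chunked == one_shot
```

This is also why calling rep xsha1 repeatedly with EAX = 0 can't work: each such call performs the padding step, so every chunk is hashed as if it were a complete message.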

I successfully implemented the hack mentioned above (on Windows) and it worked [tested on a VIA Nano with EAX=0 though, I don't have access to an older CPU]. But yes, there's a performance penalty here, so you don't want to feed tiny chunks into the code. I suggest buffering small chunks into a bigger buffer, say a few kilobytes, and only then performing the "interrupted hash". If you finish with less data than that, it may be better to fall back to ordinary x86 code.
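A minimal sketch of that buffering strategy, with Python's hashlib standing in for the hardware call (the class name, the 4 KB buffer size, and the fallback structure are illustrative assumptions, not taken from the PadLock SDK):

```python
import hashlib

class ChunkedSHA1:
    """Buffers input and forwards it to the hash in large pieces.

    On a VIA Nano, the flush step below would be the EAX=0xFFFFFFFF
    REP XSHA1 call over whole 64-byte blocks; hashlib is used here
    purely as a stand-in so the sketch is runnable anywhere.
    """
    BUF_SIZE = 4096  # a few KB, a multiple of the 64-byte SHA-1 block

    def __init__(self):
        self._h = hashlib.sha1()   # stand-in for the hardware state
        self._buf = bytearray()

    def update(self, chunk: bytes) -> None:
        self._buf += chunk
        # Only hand over full buffers, so each (expensive) hardware
        # call amortises its setup cost over a few kilobytes of data.
        while len(self._buf) >= self.BUF_SIZE:
            self._h.update(bytes(self._buf[:self.BUF_SIZE]))
            del self._buf[:self.BUF_SIZE]

    def hexdigest(self) -> str:
        # Finalisation: hash whatever short tail is left, plus padding.
        # (In the hardware version this last piece could fall back to
        # ordinary x86 code, as suggested above.)
        self._h.update(bytes(self._buf))
        self._buf.clear()
        return self._h.hexdigest()
```

Feeding this wrapper arbitrarily sized chunks yields the same digest as a one-shot hash over the concatenated input, which is the property the "interrupted hash" relies on.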

Since I can't comment/reply on other posts, here's a reply to the comment below:

I'm afraid I can't share my code, but I suggest googling for "PadlockSDK_3.1_Release_20090121.zip". That's the official VIA source containing the relevant functions (look e.g. inside PadlockSDK_3.1_Release_20090121\PadlockSDK_3.1_build20081128\sdk\src; there's an assembly implementation of the asm_partial_sha1_op3() function).

Other tips

Well, I found this: http://www.logix.cz/michal/devel/padlock/phe_sum.xp

PHE saves its current state into memory on every process switch, as well as on any page fault that occurs during the run. This state includes the number of bytes hashed and an intermediate result that can be used as an initial value for subsequent rounds. So far so good. The only remaining question is how to trigger a context switch or a page fault at the place we need. Solution: mmap(2) two or more pages and mprotect(2) the last one to deny all access (PROT_NONE). This creates an inaccessible piece of memory exactly at the place we need. Now we put all our input data just before this barrier and engage PHE. However, we tell it to hash slightly more data than we put into the buffer. With these instructions PHE will crunch all our input and attempt to hash some more. At that point it hits the protected area, triggers an exception, saves the current intermediate state into memory and calls the exception handler (well, not exactly, and not exactly in this order, never mind ;-). Anyway, the exception handler skips over the PHE instruction (hacky hack, EIP += 4 ;-) and returns.

Clever hack, but I don't know about the performance penalty of doing this.

After doing some testing, it seems that it never completes if the file is larger than the input buffer, i.e. the hack doesn't appear to be working for me. It seems rather fragile, though the theory sounds okay.

So from what I've found, there's no particularly ideal way to feed xsha1 in chunks. (It seems a little pointless to have hardware-accelerated hashing support without being able to feed it large amounts of data nicely.)

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow