It depends on the CPU. On VIA Nano and later, you can compute partial hashes by setting EAX to 0xFFFFFFFF before executing the REP XSHA1 or REP XSHA256 instruction. The CPU then skips the final padding, so you can simply feed the chunks into the hash, just as you usually do with hashing functions. Older models (up to the C7) do not offer this: EAX has to be set to zero before the hash instruction, and a full hash (i.e. including the final padding) is always performed.
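Because the partial mode leaves out the final padding, the last block has to be padded in software before it is hashed. Here is a minimal C sketch of the standard SHA-1/SHA-256 padding (a 0x80 byte, zeros, then the 64-bit big-endian bit length). The name `sha_pad_tail` and its interface are my own invention for illustration and are not taken from any VIA code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA_BLOCK 64  /* block size of SHA-1 and SHA-256 in bytes */

/* Hypothetical helper: builds the final padded block(s) of a message.
 *   tail      - the trailing bytes that did not fill a whole block
 *   tail_len  - number of such bytes, must be < 64
 *   total_len - length of the WHOLE message in bytes
 *   out       - receives 64 or 128 bytes of padded data
 * Returns the number of bytes written to out. */
size_t sha_pad_tail(const uint8_t *tail, size_t tail_len,
                    uint64_t total_len, uint8_t out[2 * SHA_BLOCK])
{
    assert(tail_len < SHA_BLOCK);

    /* 0x80 plus the 8 length bytes need 9 bytes of room in the block */
    size_t padded = (tail_len + 9 <= SHA_BLOCK) ? SHA_BLOCK : 2 * SHA_BLOCK;

    memset(out, 0, padded);
    if (tail_len > 0)
        memcpy(out, tail, tail_len);
    out[tail_len] = 0x80;  /* the mandatory single 1 bit */

    /* message length in bits, big-endian, in the last 8 bytes */
    uint64_t bits = total_len * 8;
    for (int i = 0; i < 8; i++)
        out[padded - 1 - i] = (uint8_t)(bits >> (8 * i));

    return padded;
}
```

The returned block(s) are always a multiple of 64 bytes, so they can go through whatever routine processes the regular chunks.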
I successfully implemented the hack mentioned above (on Windows) and it worked, although I only tested it on a VIA Nano with EAX=0, as I don't have access to an older CPU. But yes, there's a performance penalty here, so you don't want to feed tiny chunks into the code. I suggest collecting small chunks in a bigger buffer, say a few kilobytes, and only then performing the "interrupted hash". If you end up with less data than that, it may be better to fall back to ordinary x86 code.
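To illustrate the buffering idea, here is a rough C sketch, not my actual code. Small updates are collected in a 4 KB staging buffer, and the expensive hash routine is only called when the buffer is full. All names (`staged_hash`, `staged_update`, `block_fn`, ...) and the 4096-byte size are assumptions made up for this example. The callback stands in for whatever routine actually runs the hash instruction; the demo callback here only counts calls so the sketch can run on any machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STAGE_SIZE 4096  /* "a few kilobytes"; a multiple of the 64-byte block */

/* Hypothetical callback: hashes len bytes (a multiple of 64) into state. */
typedef void (*block_fn)(void *state, const uint8_t *data, size_t len);

typedef struct {
    uint8_t  buf[STAGE_SIZE];  /* staging area for small chunks */
    size_t   used;             /* bytes currently waiting in buf */
    uint64_t total;            /* total bytes seen, needed for the padding */
    block_fn process;          /* the expensive hash routine */
    void    *state;            /* opaque state handed to process */
} staged_hash;

void staged_init(staged_hash *h, block_fn process, void *state)
{
    h->used = 0;
    h->total = 0;
    h->process = process;
    h->state = state;
}

/* Copies data into the staging buffer and flushes only full buffers. */
void staged_update(staged_hash *h, const void *data, size_t len)
{
    const uint8_t *p = data;
    h->total += len;
    while (len > 0) {
        size_t room = STAGE_SIZE - h->used;
        size_t n = len < room ? len : room;
        memcpy(h->buf + h->used, p, n);
        h->used += n;
        p += n;
        len -= n;
        if (h->used == STAGE_SIZE) {  /* buffer full: one big call */
            h->process(h->state, h->buf, STAGE_SIZE);
            h->used = 0;
        }
    }
}

/* Demo stand-in for the hardware routine: just counts what it receives. */
typedef struct { size_t calls; size_t bytes; } demo_counter;

void demo_count(void *state, const uint8_t *data, size_t len)
{
    demo_counter *c = state;
    (void)data;
    c->calls++;
    c->bytes += len;
}
```

Feeding fifty 100-byte chunks through `staged_update` results in a single 4096-byte call to the callback, with the remaining 904 bytes left in `buf` for the finishing step (or for a plain x86 fallback if that is all there ever was).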
Since I can't comment/reply on other posts, here's a reply to the comment below:
I'm afraid I can't share my code, but I suggest searching for "PadlockSDK_3.1_Release_20090121.zip". That's the official VIA source containing the relevant functions (look e.g. inside PadlockSDK_3.1_Release_20090121\PadlockSDK_3.1_build20081128\sdk\src, which contains the assembly implementation of the asm_partial_sha1_op3() function).