Question

I am scraping websites for information, which involves extracting the SHA-1 hashes from magnet links.

I get all the magnet links with a simple preg_match_all, but some of the results are odd. I understand that a magnet hash in its hexadecimal form is 40 characters long, but I am also getting strings that are 32 characters long and contain non-hexadecimal characters.

Two examples from my results. First, a normal 40-character hexadecimal hash from a magnet link:

array
    0 => string 'F5AD2D170C033736FD987106F04C3ABD6DF41D14' (length=40)

Second, the kind of result I do not understand, where the hash is a 32-character non-hexadecimal value:

array
    0 => string 'VPR33QQM3L6BFU5FGOZXMBNORAFFSZWW' (length=32)

Has the hash been packed in some way? I know it was not done with pack('H*', $hash), as that returns the binary form of the hash. The magnet links do work; I have tested them.

You can also see these hashes in use at this website:

http://eztv.it

Hover over the magnet links and look at the magnet hash.

Thanks

Solution

Hashes in magnet links can be encoded using Base32. In your example,

VPR33QQM3L6BFU5FGOZXMBNORAFFSZWW

turns into

ABE3BDC20CDAFC12D3A533B37605AE880A5966D6

which is a valid SHA-1 hash.
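
PHP has no built-in Base32 decoder, so here is a minimal sketch of how the conversion above could be done by hand, assuming the standard RFC 4648 alphabet (A-Z, 2-7). The function names base32ToHex and normaliseInfoHash are hypothetical helpers made up for this example, not part of PHP or of any library.

```php
<?php
// Hypothetical helper: decode an RFC 4648 Base32 string and return the
// result as upper-case hexadecimal.
function base32ToHex(string $input): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
    $input = strtoupper(rtrim($input, '='));
    if ($input === '') {
        return '';
    }

    $bits = '';
    foreach (str_split($input) as $char) {
        $value = strpos($alphabet, $char);
        if ($value === false) {
            throw new InvalidArgumentException("Invalid Base32 character: $char");
        }
        // Each Base32 character carries 5 bits.
        $bits .= str_pad(decbin($value), 5, '0', STR_PAD_LEFT);
    }

    $hex = '';
    // Regroup the bit string into whole bytes; a trailing partial group
    // is padding and is dropped.
    foreach (str_split($bits, 8) as $byte) {
        if (strlen($byte) === 8) {
            $hex .= str_pad(dechex(bindec($byte)), 2, '0', STR_PAD_LEFT);
        }
    }
    return strtoupper($hex);
}

// Hypothetical helper: accept either form of the hash and always return
// 40 upper-case hexadecimal characters.
function normaliseInfoHash(string $hash): string
{
    if (preg_match('/^[0-9a-fA-F]{40}$/', $hash)) {
        return strtoupper($hash);
    }
    if (preg_match('/^[A-Za-z2-7]{32}$/', $hash)) {
        return base32ToHex($hash);
    }
    throw new InvalidArgumentException('Unrecognised info hash format');
}

echo normaliseInfoHash('VPR33QQM3L6BFU5FGOZXMBNORAFFSZWW'), "\n";
// ABE3BDC20CDAFC12D3A533B37605AE880A5966D6
```

32 Base32 characters carry 32 x 5 = 160 bits, which is exactly the 20 bytes (40 hex characters) of a SHA-1 digest, so no padding is involved for these hashes. Running every scraped hash through something like normaliseInfoHash would let you store both forms in the same 40-character format.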

OTHER TIPS

In its hexadecimal form, a torrent info hash is a SHA-1 digest of 40 characters (20 bytes), not 40 bytes. A torrent client or bencode script that expects only that form would fail if you passed it the 32-character string as-is.

It seems to be related to:

http://eztv.it/magnet:?xt=urn:btih:VPR33QQM3L6BFU5FGOZXMBNORAFFSZWW which is nothing.

Licensed under: CC-BY-SA with attribution