Question

My service must not store users' IP addresses to identify them; that is illegal in my country. The stored data must also be impossible to decrypt in case the server is compromised. So I plan to corrupt some bits of each IP address.

I believe that corrupting some bits of the IP address keeps me from violating the law and provides a good defense against strong rainbow tables.

But I want to preserve the uniqueness of each IP address as much as possible.

Which bits of a (32-bit) IP address are more important than the others?

Solution

First a little bit about IP address structure:

IP addresses are aggregated by prefix. Take my own IPv4 address 37.77.56.75. In this example the ISP has block 37.77.56.0/21, which means that the prefix is 21 bits long, and the last (32 - 21 =) 11 bits can be used by the ISP. The ISP delegated the prefix 37.77.56.64/27 to me, which leaves me (32 - 27 =) 5 bits to use. I put that whole prefix on the LAN of my home network. I then chose to use bits 01011 (decimal 11) for my PC, which in this prefix gives IPv4 address 37.77.56.75 (64 + 11 = 75).
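
The prefix arithmetic above can be checked with Python's standard ipaddress module. This is only a minimal sketch; the variable names are made up for illustration.

```python
import ipaddress

isp_block = ipaddress.ip_network('37.77.56.0/21')
home_prefix = ipaddress.ip_network('37.77.56.64/27')
pc_address = ipaddress.ip_address('37.77.56.75')

# Bits that remain free after each prefix
isp_free_bits = isp_block.max_prefixlen - isp_block.prefixlen       # 32 - 21
home_free_bits = home_prefix.max_prefixlen - home_prefix.prefixlen  # 32 - 27

# Host part: the address bits that are not covered by the /27 prefix
host_bits = int(pc_address) & int(home_prefix.hostmask)

print(home_prefix.subnet_of(isp_block))   # True
print(pc_address in home_prefix)          # True
print(isp_free_bits, home_free_bits)      # 11 5
print(format(host_bits, '05b'))           # 01011
```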

For IPv6 the structure is the same. The addresses are just 128 bits long and written down in hexadecimal (which matches the binary structure and prefix lengths much better than the decimal notation of IPv4). For IPv6 the addresses in this example are:

The ISP has 2a00:8640::/32, delegates 2a00:8640:0001::/48 to me, I put 2a00:8640:0001:0000::/64 on my home LAN, and my PC has address 2a00:8640:0001:0000:3528:2df9:b368:e9e9.

Usually you don't write all the leading zeroes in IPv6 addresses, but I included them for clarity.
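
The same ipaddress module can be used to see the IPv6 nesting and the two ways of writing an address. Again this is just a sketch with illustrative variable names.

```python
import ipaddress

isp_v6 = ipaddress.ip_network('2a00:8640::/32')
delegated_v6 = ipaddress.ip_network('2a00:8640:0001::/48')
lan_v6 = ipaddress.ip_network('2a00:8640:0001:0000::/64')
pc_v6 = ipaddress.ip_address('2a00:8640:0001:0000:3528:2df9:b368:e9e9')

# Each prefix is contained in the next shorter one
print(delegated_v6.subnet_of(isp_v6))   # True
print(lan_v6.subnet_of(delegated_v6))   # True
print(pc_v6 in lan_v6)                  # True

# Full form with leading zeroes, and the usual shortened form
print(pc_v6.exploded)     # 2a00:8640:0001:0000:3528:2df9:b368:e9e9
print(pc_v6.compressed)   # 2a00:8640:1:0:3528:2df9:b368:e9e9
```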

What you probably need:

If I understand your question correctly, you want to maintain the uniqueness of each address, but in such a way that the original address cannot be recovered. The way to do that is to use a hashing algorithm. Make sure you always input the addresses in the same way: either as binary strings or, if you use a printable string, always in the canonical representation. You can use inet_pton/inet_ntop for that. This is an example in Python:

import hashlib
import socket

# Note: some inet_pton implementations reject IPv4 octets with leading zeros
bad_v4 = '010.001.002.003'
# Parse to the 4-byte binary form, then print it back in canonical form
binary_v4 = socket.inet_pton(socket.AF_INET, bad_v4)
canonical_v4 = socket.inet_ntop(socket.AF_INET, binary_v4)
hash_v4 = hashlib.md5(canonical_v4.encode('ascii')).hexdigest()

print('Bad IPv4:', bad_v4)
print('Good IPv4:', canonical_v4)
print('MD5 digest (in hex):', hash_v4)

print()

bad_v6 = '2A00:8640:001:0:0:0:aB0:cDeF'
# Same steps for IPv6: 16-byte binary form, then canonical text
binary_v6 = socket.inet_pton(socket.AF_INET6, bad_v6)
canonical_v6 = socket.inet_ntop(socket.AF_INET6, binary_v6)
hash_v6 = hashlib.md5(canonical_v6.encode('ascii')).hexdigest()

print('Bad IPv6:', bad_v6)
print('Good IPv6:', canonical_v6)
print('MD5 digest (in hex):', hash_v6)

This will give you this output:

Bad IPv4: 010.001.002.003
Good IPv4: 10.1.2.3
MD5 digest (in hex): 447d3c6954efb460e6f47e331615176f

Bad IPv6: 2A00:8640:001:0:0:0:aB0:cDeF
Good IPv6: 2a00:8640:1::ab0:cdef
MD5 digest (in hex): b3d5aa35466b0564044ecfb6f558615c

Then use the hash as the identifier instead of the address.
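
One caveat regarding the rainbow-table concern in the question: there are only 2^32 IPv4 addresses, so an unkeyed hash can be reversed by simply hashing every possible address. A common way to make that harder is a keyed hash (HMAC). The sketch below is one possible variant, not part of the original answer's code; the names SECRET_KEY and pseudonymize_ip are hypothetical, and it assumes the key can be kept somewhere an attacker who copies the database cannot read it.

```python
import hashlib
import hmac
import ipaddress
import secrets

# Hypothetical key: in practice generate it once and store it outside the
# database (generating it at import time, as here, is only for the demo).
SECRET_KEY = secrets.token_bytes(32)

def pseudonymize_ip(address, key=SECRET_KEY):
    """Return a hex identifier for an IPv4 or IPv6 address string."""
    # .packed is the binary form, so different spellings of the same
    # address produce the same identifier.
    packed = ipaddress.ip_address(address).packed
    return hmac.new(key, packed, hashlib.sha256).hexdigest()

id_a = pseudonymize_ip('2A00:8640:001:0:0:0:aB0:cDeF')
id_b = pseudonymize_ip('2a00:8640:1::ab0:cdef')
print(id_a == id_b)   # True: same address, same identifier
```

Without the key, the identifiers cannot be recomputed from a list of candidate addresses, while equal addresses still map to equal identifiers.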

Licensed under: CC-BY-SA with attribution