As required by law in several countries, we anonymize the IP addresses of our users in our log files. For IPv4 we usually just mask the last two bytes, e.g. instead of 255.255.255.255 we log 255.255.\*.\*

What algorithm would you recommend to anonymize IPv6 addresses?


Solution

At the very least you want to strip off the EUI-64 interface identifier, i.e. the last 64 bits of the address. More realistically, you want to strip quite a lot more to be really private, since the remaining part still identifies a single subnet (i.e. possibly a single household).

IPv6 global addressing is very hierarchical. From RFC 2374:

 | 3|  13 | 8 |   24   |   16   |          64 bits               |
 +--+-----+---+--------+--------+--------------------------------+
 |FP| TLA |RES|  NLA   |  SLA   |         Interface ID           |
 |  | ID  |   |  ID    |  ID    |                                |
 +--+-----+---+--------+--------+--------------------------------+
 <--Public Topology--->   Site
                       <-------->
                        Topology
                                 <------Interface Identifier----->

The question becomes: how private is private enough? Strip 64 bits and you've identified a LAN subnet, not a user. Strip another 16 on top of that and you've identified a small organisation, i.e. a customer of an ISP, e.g. a company or branch office with several subnets. Strip the next 24 and you've basically identified only an ISP or a really big organisation.

You can implement this with a bitmask, exactly as you would for an IPv4 address. At that point, though, the question is a legal one rather than a technical one: how much do I need to strip to comply with the specific legislation?
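As a rough illustration of the bitmask approach, here is a minimal Python sketch using the standard `ipaddress` module. The function name `anonymize_ipv6` and the `keep_bits` parameter are made up for this example, and the default of 48 bits is only an assumption; the right prefix length depends on the legislation you have to comply with.

```python
import ipaddress


def anonymize_ipv6(address: str, keep_bits: int = 48) -> str:
    """Zero every bit after the first keep_bits bits of an IPv6 address.

    keep_bits is a hypothetical knob for this sketch: 64 keeps the subnet,
    48 keeps the site prefix, 24 keeps roughly the ISP-level prefix.
    """
    if not 0 <= keep_bits <= 128:
        raise ValueError("keep_bits must be between 0 and 128")
    addr = ipaddress.IPv6Address(address)
    # Mask with keep_bits leading ones followed by zeros (128 bits total).
    mask = ((1 << keep_bits) - 1) << (128 - keep_bits)
    return str(ipaddress.IPv6Address(int(addr) & mask))


if __name__ == "__main__":
    sample = "2001:0db8:85a3:0000:0000:8a2e:0370:7334"
    for bits in (64, 48, 24):
        print(bits, anonymize_ipv6(sample, bits))
```

Stripping 64, then 16 more, then 24 more bits corresponds to `keep_bits` values of 64, 48 and 24. Note that `ipaddress` prints the result in compressed form, so the masked bits show up as a trailing `::`.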

Other tips

To anonymize public IPv6 addresses, you could keep the first 2 groups and replace the remaining part with its CRC-16. Some examples (where abc1 and abc2 stand for the CRC-16 values):

  • 2001:0db8:85a3:0000:0000:8a2e:0370:7334 -> 2001:0db8-abc1
  • 2a02:200:7::123 -> 2a02:200-abc2

Such shortening allows easy matching (with some probability, of course) against non-anonymized IPv6 addresses in full logs that have a shorter retention time, which is useful for investigating problems or security incidents.
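The shortening described above could be sketched in Python roughly as follows. The answer does not say which CRC-16 variant to use, so this sketch assumes the CRC-CCITT variant provided by `binascii.crc_hqx` with an initial value of 0. The function name `shorten_ipv6` is hypothetical, and the checksum is computed over the last 12 bytes of the packed address (everything after the first 2 groups), which is also an assumption.

```python
import binascii
import ipaddress


def shorten_ipv6(address: str) -> str:
    """Keep the first 2 groups and append a CRC-16 of the remaining 96 bits."""
    addr = ipaddress.IPv6Address(address)
    # exploded gives all 8 groups, zero-padded to 4 hex digits each.
    groups = addr.exploded.split(":")
    # packed is the 16-byte binary form; bytes 4..15 follow the first 2 groups.
    crc = binascii.crc_hqx(addr.packed[4:], 0)
    return f"{groups[0]}:{groups[1]}-{crc:04x}"


if __name__ == "__main__":
    print(shorten_ipv6("2001:0db8:85a3:0000:0000:8a2e:0370:7334"))
    print(shorten_ipv6("2a02:200:7::123"))
```

Because the address is parsed first, different spellings of the same address produce the same shortened value; the kept groups are printed zero-padded (e.g. `2a02:0200`), which differs slightly from the second example above. CRC-16 is a checksum rather than a cryptographic hash, so it is best treated as a matching aid only.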

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow