Question

Say I'm running some sort of public web service and I'd obviously like to collect metrics. For the sake of this argument, let's assume the data I'm interested in would only be what is available from parsing standard Apache access logs. Is there a way to maintain these types of analytics without also keeping identifying information about users?

I've thought about things like hashing IP addresses but this has many obvious problems.

Solution

Yes. You can anonymize IPs using HMAC if you do not wish to store IP addresses in plain text. A remaining problem is referer URLs, which often contain query parameters; the same goes for request URLs. If, for example, a user's email address appears in the query string, you would have to substitute it with something else (e.g. a UUID).
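As a rough sketch of the query-parameter substitution described above, the following Python replaces the values of sensitive parameters with random UUIDs before a URL is written to the stored log. The function name scrub_url and the list of parameter names are assumptions made for illustration; a real deployment would need its own list.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of parameter names treated as sensitive; adjust per site.
SENSITIVE_PARAMS = {"email", "mail", "user", "token"}

# In-memory map from original value to placeholder, so the same value gets the
# same UUID within one run. It holds the originals, so it must not be persisted.
_placeholders = {}


def scrub_url(url):
    """Return the URL with sensitive query values replaced by UUIDs."""
    parts = urlsplit(url)
    cleaned = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name.lower() in SENSITIVE_PARAMS:
            value = _placeholders.setdefault(value, str(uuid.uuid4()))
        cleaned.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(cleaned)))
```

Applied to both the request line and the Referer field, this keeps non-sensitive parameters such as search terms available for analytics while dropping the identifying values.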

The problem with hashing IPv4 addresses is that they are only 32 bits long, so a brute-force search over the whole address space is very easy. HMAC improves the situation somewhat, as long as the key is kept secret. https://panopticlick.eff.org/ uses this technique (with periodic key removal/change).
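A minimal sketch of the keyed approach in Python, using the standard hmac module. The function names new_key and pseudonymize_ip are made up for this example, and the choice of SHA-256 and a 32-byte key is an assumption rather than something the tools mentioned here are known to use.

```python
import hashlib
import hmac
import secrets


def new_key():
    """Generate a random 32-byte key. Keep it out of the logs and
    discard it when rotating, so old tokens can no longer be recomputed."""
    return secrets.token_bytes(32)


def pseudonymize_ip(ip, key):
    """Return a keyed SHA-256 digest of the IP address as a hex string."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()
```

Under one key the same address always maps to the same token, so unique visitors can still be counted. Once the key is replaced and the old one destroyed, tokens from different periods can no longer be linked, and without the key an attacker cannot simply enumerate all 2^32 addresses to reverse them.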

You could actually use http://bug.st/mod_anonstats to anonymize the IPs but still count the users.

Referers for sensitive links can be handled quite easily with the noreferrer link type (rel="noreferrer"): http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#link-type-noreferrer This, however, assumes a modern browser.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow