Domanda

A PHP or Python script periodically fetches a large dataset of IP addresses (/32 netmask) from a remote database. In-between fetches the dataset will be temporarily stored in APC or Memcached key store.

The main job of the script is to check if a given ip-address exists in the database/cache (think: "blacklist").

What would be the most efficient (performance wise) way to:

  1. Store the IP addresses in APC/Memcache
  2. Compare a given IP to the stored IP list.

What i have come up with so far:

Alternative 1 Store all IP addresses as a large array-list as the value of a single key in APC, then do a

if (in_array("192.168.0.1", $ip_list_from_cache))

Alternative 2 Store each IP as key-name in APC, then do a

if (apc_exists('192.168.0.1')

This is a large list and i want the compare check to be very fast.

Thanks in advance for any comments!

È stato utile?

Soluzione

The solution to performance dilemmas is usually to benchmark both solutions.

In this case though, I'd say the cache approach makes a lot more sense: the time complexity of in_array is O(N), that is, a linear sweep. On the other hand, Caches are usually implemented as hash tables, where lookup is O(1).

Also, if you aggregate the records in Memcached, you'll avoid wasting a lot of RAM duplicating the list in memory once per web worker process.

It would also arguably be a much cleaner solution.


On a side note, did you consider doing this at another level? With some light scripting, you could do your checks at the LB (e.g. Nginx) level.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top