Question

I need to store records about user locations based on IP addresses but Im not sure how best to model it.

For each address, we need to record the machine's details (ipaddress, agentId) and the machine's location (isocode, city). This information will never be updated - only inserted and read.

Queries on this data will need to pull location info for a specific user in a given time period.

Traditionally I would model this using a wide-row CF with JSON blobs, along the lines of:

CREATE TABLE user_location (
userid text,
timestamp timeuuid,
data text, -- json blob {agentid, isocode, city, ipaddress}
PRIMARY KEY (userid, timestamp)
);

Im now questioning whether this is the best approach, and whether I should instead be replacing the JSON with a user defined type (UDT), such as:

CREATE TYPE machinelocation (
isocode text,
city text,
ipaddress inet
);

CREATE TABLE user_location (
userid text,
timestamp timeuuid,
machinelocations map<text, machinelocation>
PRIMARY KEY (userid, timestamp)
);

or should I just do away with the blob entirely and separate out the json into dedicated columns, ala:

CREATE TABLE user_location (
userid text,
timestamp timeuuid,
agentid text,
isocode text,
city text,
ipaddress text,
PRIMARY KEY (userid, timestamp)
);

What is the recommended approach for modelling data of this type?

Was it helpful?

Solution

I would go with the separate columns, unless you really will always be pulling the full blob. Even then I would probably still go with separate columns. The best use case I see for UDT's is to be able to put them into collections, such that you can have a collection with multiple fields per item.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top