Redundant data or two queries?

https://stackoverflow.com/questions/16742874

30-05-2022
|

Question

Take, for instance, a table that holds credit card charges, a table that holds credit cards, and a table that holds users.

Each charge is associated with exactly one card, and each card is associated with exactly one user. A user may have multiple cards on file.

If I were to hold this data in a MySQL database in three distinct tables, like so:

charges:

---------------------------------------------
id | card | amount | description | datestamp
---------------------------------------------
5  | 2    | 50.00  | Example     | 1369429422

cards:

------------------------------------------------------------------
id | user | name       | number           | cvv2 | exp_mm | exp_yy
------------------------------------------------------------------
2  | 1    | Joe Schmoe | 4321432143214321 | 555  | 1      | 16

users:

-------------------------------------------
id | first_name | last_name | email
-------------------------------------------
1  | Joe        | Schmoe    | joe@schmoe.co

Now, let's say that I want to access a user, given a charge. In order to get to the user, I would first have to look up the card associated with the charge, and then look up the user associated with the card. Obviously, in an example like this, speed would be negligible. But in other scenarios, I see this as being two queries.

However, if I stored the data like so:

charges

----------------------------------------------------
id | card | user | amount | description | datestamp
----------------------------------------------------
5  | 2    | 1    | 50.00  | Example     | 1369429422

then the charge would be associated directly with a user. That, however, is redundant information, since that same data is stored in the cards table.

Thoughts?

Solution

Your instinct not to include the user information in the charges table is correct; however, it's still only one query:

select first_name, last_name, email
from users, cards, charges
where users.id = cards.user
and cards.id = charges.card
and charges.id = 5;

That would give you the user info for the charge with id 5. This is the exact thing that relational databases are best at :) This kind of thing is called a "join" because it joins multiple tables together to give you the information you need. There are multiple ways to write this query.

As an aside, perhaps this is a contrived example, but if this is an application you are writing from scratch, there are lots of reasons to avoid storing credit cards in your own database. Often a payment processor can handle the details for you while still allowing you to charge credit cards. More info.

OTHER TIPS

You could denormalize by adding the user id to the charges table. You need to know if that is necessary, given the expected size of the tables. If this optimization is warranted, use it. If you don't know, then optimize in the future as necessary.

As it stands, you don't need two queries though

SELECT users.* FROM charges
JOIN cards ON charges.card = cards.id
JOIN users ON cards.user = users.id
WHERE charges.id = ?

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow