Question

So I'm doing a task where I take a massive list (30,000+) of movies from Wikipedia with multiple columns (such as the film's name, genre, cast, plot, etc.) and upload it into Elasticsearch. After doing that, I now want to get the table into at least 1NF. I'm not very experienced in database design, and the last time I worked with normal forms was a few years ago. Looking at this table, I'm wondering how to put it into 1NF. It's easy if only one column holds multiple values, but what do you do when several columns hold multiple values, as shown below?

Film Name | Director | Cast | Genre | Wiki Page | Plot
Chimmie Fadden Out West | Cecil B. DeMille | Victor Moore | Comedy, Western | https://en.wikipedia.org/wiki/Chimme_Fadden_Out_West | Chimmie is sent out west...
20,000 Leagues Under the Sea | Stuart Paton | Lois Alexander, Curtis Benton, Wallace Clarke, Allen Holubar | Action, Adventure | https://en.wikipedia.org/wiki/20,000_Leagues_Under_the_Sea_(1916_film) | A strange...
The Cat and the Canary | Paul Leni | Laura La Plante, Forrest Stanley, Creighton Hale | Comedy, Horror, Mystery | https://en.wikipedia.org/wiki/The_Cat_and_the_Canary_(1927_film) | In a...

Would you just have to do something like this...

Film Name | Director | Cast | Genre | Wiki Page | Plot
Chimmie Fadden Out West | Cecil B. DeMille | Victor Moore | Comedy | https://en.wikipedia.org/wiki/Chimme_Fadden_Out_West | Chimmie is sent out west...
Chimmie Fadden Out West | Cecil B. DeMille | Victor Moore | Western | https://en.wikipedia.org/wiki/Chimme_Fadden_Out_West | Chimmie is sent out west...
20,000 Leagues Under the Sea | Stuart Paton | Lois Alexander | Action | https://en.wikipedia.org/wiki/20,000_Leagues_Under_the_Sea_(1916_film) | A strange...
20,000 Leagues Under the Sea | Stuart Paton | Lois Alexander | Adventure | https://en.wikipedia.org/wiki/20,000_Leagues_Under_the_Sea_(1916_film) | A strange...
20,000 Leagues Under the Sea | Stuart Paton | Curtis Benton | Action | https://en.wikipedia.org/wiki/20,000_Leagues_Under_the_Sea_(1916_film) | A strange...
20,000 Leagues Under the Sea | Stuart Paton | Curtis Benton | Adventure | https://en.wikipedia.org/wiki/20,000_Leagues_Under_the_Sea_(1916_film) | A strange...
20,000 Leagues Under the Sea | Stuart Paton | Wallace Clarke | Adventure | https://en.wikipedia.org/wiki/20,000_Leagues_Under_the_Sea_(1916_film) | A strange...
20,000 Leagues Under the Sea | Stuart Paton | Wallace Clarke | Action | https://en.wikipedia.org/wiki/20,000_Leagues_Under_the_Sea_(1916_film) | A strange...

etc.? I'm surely missing something simple about converting a table in which several columns hold multiple values into 1NF, but I'm not sure what.

Thanks.
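To see why the "one row per combination" approach gets out of hand, here is a quick Python sketch that counts the rows it produces for just the three sample films. It assumes a comma separates the values in a cell and keeps only the Film Name, Cast and Genre columns; every other column (Director, Wiki Page, Plot) would be copied into each of these rows.

```python
from itertools import product

# Hypothetical sample of the question's rows; only the multi-valued columns are kept.
films = [
    ("Chimmie Fadden Out West", "Victor Moore", "Comedy, Western"),
    ("20,000 Leagues Under the Sea",
     "Lois Alexander, Curtis Benton, Wallace Clarke, Allen Holubar", "Action, Adventure"),
    ("The Cat and the Canary",
     "Laura La Plante, Forrest Stanley, Creighton Hale", "Comedy, Horror, Mystery"),
]

def split_values(cell):
    """Split a comma-separated cell into trimmed values (assumes ',' is the separator)."""
    return [v.strip() for v in cell.split(",") if v.strip()]

# One row per (cast member, genre) combination, as in the second table.
flat = [
    (name, actor, genre)
    for name, cast, genres in films
    for actor, genre in product(split_values(cast), split_values(genres))
]
print(len(flat))  # → 19  (1*2 + 4*2 + 3*3)
```

The row count grows with the product of the value counts per film, and the plot and URL are repeated in every one of those rows.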


Solution

It's actually fairly easy to normalize a table in which several fields hold a varying number of values in a single row. Just follow this rule: any column that holds multiple values within the same row should become its own table. In your example, that means Cast and Genre. Those two columns represent many-to-many relationships: the multiple values in one row show that a film has many cast members (or genres), and the same actor (or genre) can also appear in many films.
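A minimal Python sketch of that rule, assuming the sample rows are keyed by film name and that a comma separates values in a cell (both assumptions; `split_values` and `normalize_column` are hypothetical helpers). Each multi-valued column is turned into its own list of (film, value) pairs, one pair per single value:

```python
# Hypothetical sample rows from the question (Director, Wiki Page, Plot omitted).
films = {
    "Chimmie Fadden Out West": {"cast": "Victor Moore", "genre": "Comedy, Western"},
    "20,000 Leagues Under the Sea": {
        "cast": "Lois Alexander, Curtis Benton, Wallace Clarke, Allen Holubar",
        "genre": "Action, Adventure",
    },
    "The Cat and the Canary": {
        "cast": "Laura La Plante, Forrest Stanley, Creighton Hale",
        "genre": "Comedy, Horror, Mystery",
    },
}

def split_values(cell: str) -> list[str]:
    """Split a comma-separated cell into trimmed, non-empty values (assumed separator: ',')."""
    return [v.strip() for v in cell.split(",") if v.strip()]

def normalize_column(rows: dict, column: str) -> list[tuple[str, str]]:
    """Return one (film, value) pair per single value, i.e. the column as its own table."""
    return [(film, value) for film, row in rows.items() for value in split_values(row[column])]

film_cast = normalize_column(films, "cast")
film_genre = normalize_column(films, "genre")
print(len(film_cast), len(film_genre))  # → 8 7
```

Each list grows only with the number of values in its own column, and neither repeats the plot or URL.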

As nbk mentions, you'll need a linking (bridge) table to store that many-to-many relationship. While your new Cast table might have columns like CastId (primary key), FirstName, and LastName, the linking table between Cast and Film would be named something like FilmCast and contain FilmId, with a foreign key reference to your Film table, and CastId, with a foreign key reference to the Cast table. Every row in FilmCast then represents one specific cast member in one specific film.
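A minimal sketch of the Film / Cast / FilmCast layout, using Python's built-in `sqlite3` with an in-memory database (the database choice, column types and the composite primary key are assumptions for illustration). `Cast` is quoted because CAST is an SQL keyword:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.executescript("""
CREATE TABLE Film (
    FilmId   INTEGER PRIMARY KEY,
    FilmName TEXT NOT NULL,
    Director TEXT
);
CREATE TABLE "Cast" (
    CastId    INTEGER PRIMARY KEY,
    FirstName TEXT NOT NULL,
    LastName  TEXT NOT NULL
);
CREATE TABLE FilmCast (
    FilmId INTEGER NOT NULL REFERENCES Film(FilmId),
    CastId INTEGER NOT NULL REFERENCES "Cast"(CastId),
    PRIMARY KEY (FilmId, CastId)   -- one row per person per film
);
""")

conn.execute("INSERT INTO Film VALUES (1, '20,000 Leagues Under the Sea', 'Stuart Paton')")
cast = [("Lois", "Alexander"), ("Curtis", "Benton"), ("Wallace", "Clarke"), ("Allen", "Holubar")]
for cast_id, (first, last) in enumerate(cast, start=1):
    conn.execute('INSERT INTO "Cast" VALUES (?, ?, ?)', (cast_id, first, last))
    conn.execute("INSERT INTO FilmCast VALUES (1, ?)", (cast_id,))  # link person to film
conn.commit()

# Join through the bridge table to list the cast of film 1.
names = [r[0] for r in conn.execute("""
    SELECT c.FirstName || ' ' || c.LastName
    FROM FilmCast fc
    JOIN "Cast" c ON c.CastId = fc.CastId
    WHERE fc.FilmId = 1
    ORDER BY c.LastName
""")]
print(names)
# → ['Lois Alexander', 'Curtis Benton', 'Wallace Clarke', 'Allen Holubar']
```

The composite primary key on FilmCast stops the same person from being linked to the same film twice.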

Repeat the same approach for every other column in your Film table that holds multiple values per row. Once each of those columns has its own tables, you no longer need to store that data in the main Film table and can remove those columns from it.
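As a sketch of repeating the pattern for Genre (SQLite via Python's `sqlite3`; table and column names are hypothetical), the Film table below has no Genre column at all. Each genre name is stored once in Genre, FilmGenre links films to genres, and a join with `group_concat` can rebuild the original comma-separated view when it is needed:

```python
import sqlite3

# Hypothetical sample rows (Director, Wiki Page and Plot omitted for brevity).
rows = [
    ("Chimmie Fadden Out West", "Comedy, Western"),
    ("20,000 Leagues Under the Sea", "Action, Adventure"),
    ("The Cat and the Canary", "Comedy, Horror, Mystery"),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Film  (FilmId INTEGER PRIMARY KEY, FilmName TEXT NOT NULL);
CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE FilmGenre (
    FilmId  INTEGER NOT NULL REFERENCES Film(FilmId),
    GenreId INTEGER NOT NULL REFERENCES Genre(GenreId),
    PRIMARY KEY (FilmId, GenreId)
);
""")

for name, genres in rows:
    film_id = conn.execute("INSERT INTO Film (FilmName) VALUES (?)", (name,)).lastrowid
    for g in (x.strip() for x in genres.split(",")):
        # Store each genre name once; reuse its id if it already exists.
        conn.execute("INSERT OR IGNORE INTO Genre (Name) VALUES (?)", (g,))
        genre_id = conn.execute("SELECT GenreId FROM Genre WHERE Name = ?", (g,)).fetchone()[0]
        conn.execute("INSERT INTO FilmGenre (FilmId, GenreId) VALUES (?, ?)", (film_id, genre_id))
conn.commit()

# Rebuild the original "Genre" column on demand instead of storing it.
view = conn.execute("""
    SELECT f.FilmName, group_concat(g.Name, ', ')
    FROM Film f
    JOIN FilmGenre fg ON fg.FilmId = f.FilmId
    JOIN Genre g ON g.GenreId = fg.GenreId
    GROUP BY f.FilmId
    ORDER BY f.FilmId
""").fetchall()
for film, genres in view:
    print(film, "->", genres)  # genre order inside group_concat is not guaranteed
```

"Comedy" is stored only once even though two films use it, so Genre holds six rows while FilmGenre holds seven links.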

OTHER TIPS

Normalisation removes information that is repeated many times from tables, and an integer ID usually takes less space than the text it replaces.

You need the bridge tables because there is an m:n relationship between films and users, i.e. the people involved (cast, director, musician, ...).

In my opinion, occupation is an attribute of the relationship between film and user, not of the user alone, since the same person can have different roles in different films.

Film (idfilm, Title, plot, Wiki_Page, year, ...)

Film2user (idfilm, iduser, idtype)

type (idtype, occupation)

user (iduser, Name, Birth, ...)

genre (idgenre, name)

Film2genre (idfilm, idgenre)
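A minimal SQLite sketch of the film/user/type part of that schema, via Python's `sqlite3` (column types, keys and the subset of columns are assumptions; year, Birth and the genre tables are left out). Because the occupation lives in `film2user`, the same person could be a director in one film and cast in another:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Tables follow the schema above; column types and keys are assumptions.
conn.executescript("""
CREATE TABLE film (idfilm INTEGER PRIMARY KEY, title TEXT NOT NULL, plot TEXT, wiki_page TEXT);
CREATE TABLE user (iduser INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE type (idtype INTEGER PRIMARY KEY, occupation TEXT NOT NULL UNIQUE);
CREATE TABLE film2user (
    idfilm INTEGER NOT NULL REFERENCES film(idfilm),
    iduser INTEGER NOT NULL REFERENCES user(iduser),
    idtype INTEGER NOT NULL REFERENCES type(idtype),
    PRIMARY KEY (idfilm, iduser, idtype)  -- occupation belongs to the film/person pair
);
""")
conn.executemany("INSERT INTO type VALUES (?, ?)", [(1, "director"), (2, "cast")])
conn.execute("INSERT INTO film (idfilm, title) VALUES (1, '20,000 Leagues Under the Sea')")
people = ["Stuart Paton", "Lois Alexander", "Curtis Benton", "Wallace Clarke", "Allen Holubar"]
conn.executemany("INSERT INTO user (iduser, name) VALUES (?, ?)", list(enumerate(people, start=1)))
# Stuart Paton (user 1) directed; users 2-5 are in the cast.
conn.executemany("INSERT INTO film2user VALUES (1, ?, ?)",
                 [(1, 1)] + [(uid, 2) for uid in range(2, 6)])
conn.commit()

rows = conn.execute("""
    SELECT t.occupation, u.name
    FROM film2user fu
    JOIN user u ON u.iduser = fu.iduser
    JOIN type t ON t.idtype = fu.idtype
    WHERE fu.idfilm = 1
    ORDER BY t.idtype, u.name
""").fetchall()
for occupation, name in rows:
    print(f"{occupation:8} {name}")
```

Director and cast end up in one bridge table distinguished by `idtype`, so adding a new role (e.g. musician) is a new row in `type` rather than a new column.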

As you develop the design further, you can add more attributes or tables wherever you find more such redundant information.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange