How are database operations (write, update, alter) on a particular cell in a table written to disk without overwriting the entire file?

https://dba.stackexchange.com/questions/234890

29-01-2021

Question

When I want to write to or alter a particular cell of an Excel file/table using Python, I use pandas read_csv, alter the value in that cell, and write it back to the file with to_csv. But writing back seems to overwrite the entire file with an updated version that differs only in a single cell. This is a problem when I am altering one or two cells in a table of a trillion rows and a trillion columns.

When we perform write/alter database operations (as in SQL) on a single cell of a trillion-by-trillion table, the database seems to change only the modified cells on disk rather than overwriting entire files.

How do databases manage to write/update only a particular cell of a table on disk rather than overwriting the entire file?

BTW, I am not using an SQL database because my table has numerical column names and SQL doesn't support that. If you know of any SQL/NoSQL database that supports numerical values as column names, please let me know.

Solution

A very common misconception about Databases can be dispelled with this:

    Database != File

When you update a Row in a database, the underlying data file on disk isn't touched at all - at least not for "some time". Instead, the database makes a note of the change in its Transaction Log, then updates the value in memory. "Some time" later, the database might need that bit of memory for something else, and only then will it write the changed value to the data file. How often that happens and how large a chunk of memory gets written at a time varies from DBMS to DBMS.
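To make the "log first, memory second, data file later" idea concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how any particular DBMS does it: the names `TinyStore`, `update`, `checkpoint` and `read_raw`, the plain-text log format, and the fixed 8-byte slot layout are all made up for illustration.

```python
import os
import struct
import tempfile

SLOT = struct.Struct("<d")  # assumption: every value is one 8-byte float


class TinyStore:
    """Toy store: log the change, update memory, write the data file later."""

    def __init__(self, directory, slots):
        self.data_path = os.path.join(directory, "data.bin")
        self.log_path = os.path.join(directory, "changes.log")
        self.pending = {}  # slot number -> new value, not yet in the data file
        # Pre-allocate the data file with zeros.
        with open(self.data_path, "wb") as data:
            data.write(SLOT.pack(0.0) * slots)

    def update(self, slot, value):
        # 1. Append a small record to the log and force it to disk.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(f"{slot} {value!r}\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Change the value in memory only; the data file is untouched.
        self.pending[slot] = value

    def read(self, slot):
        # Memory wins if it holds a newer value than the data file.
        if slot in self.pending:
            return self.pending[slot]
        return read_raw(self.data_path, slot)

    def checkpoint(self):
        # 3. "Some time" later: write only the changed slots, in place.
        with open(self.data_path, "r+b") as data:
            for slot, value in self.pending.items():
                data.seek(slot * SLOT.size)
                data.write(SLOT.pack(value))
        self.pending.clear()


def read_raw(path, slot):
    """Read a slot straight from the data file, bypassing memory."""
    with open(path, "rb") as data:
        data.seek(slot * SLOT.size)
        return SLOT.unpack(data.read(SLOT.size))[0]


workdir = tempfile.mkdtemp()
store = TinyStore(workdir, slots=1000)
store.update(42, 3.5)
on_disk_before = read_raw(store.data_path, 42)  # still the old value
in_memory = store.read(42)                      # already the new value
store.checkpoint()
on_disk_after = read_raw(store.data_path, 42)   # now the new value
size_after = os.path.getsize(store.data_path)   # file size is unchanged
```

The point to notice is that `checkpoint` opens the file in `"r+b"` mode and seeks to the slot, so only those 8 bytes are rewritten. A real database would also replay the log after a crash and would write whole pages rather than single values; both are left out here to keep the sketch short.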

Data storage in databases is measured in Pages, each of which can contain a number of Rows; those things that make up the Tables that you and I play with. When a Database needs some data, it works out where that data is in its data file(s) and then loads only those Pages that are relevant into its Buffer Cache (memory). This is why some queries run slowly the first time you run them, but are lightning fast thereafter - serving the same Page over and over from the Cache is way faster than hauling it up from the data file on disk.
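A rough sketch of that page-and-cache behaviour in Python, assuming a 4096-byte page and a hypothetical `BufferCache` class (neither comes from any specific DBMS; real page sizes and eviction policies vary):

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size; real systems vary


class BufferCache:
    """Toy buffer cache: load whole pages on demand and keep them in memory."""

    def __init__(self, path):
        self.path = path
        self.pages = {}      # page number -> bytes
        self.disk_reads = 0  # how many times we actually went to the file

    def get_page(self, page_no):
        if page_no not in self.pages:
            # Cache miss: read just this one page from the data file.
            with open(self.path, "rb") as f:
                f.seek(page_no * PAGE_SIZE)
                self.pages[page_no] = f.read(PAGE_SIZE)
            self.disk_reads += 1
        return self.pages[page_no]

    def read_byte(self, offset):
        # Work out which page holds the byte, then where it sits in that page.
        page_no, within = divmod(offset, PAGE_SIZE)
        return self.get_page(page_no)[within]


# Build a 100-page file where every byte equals its page number.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for page_no in range(100):
        f.write(bytes([page_no]) * PAGE_SIZE)

cache = BufferCache(path)
first = cache.read_byte(7 * PAGE_SIZE + 10)   # cache miss: reads page 7
second = cache.read_byte(7 * PAGE_SIZE + 99)  # cache hit: same page
reads = cache.disk_reads
pages_loaded = len(cache.pages)
```

Both lookups land on page 7, so the file is read once and only one of the hundred pages is ever held in memory. This toy never evicts anything; a real buffer cache has a fixed size and has to decide which pages to throw out.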

... I am not using SQL database as my table contains numerical column names ...

Here's another misconception about databases, again easily dispelled:

    Database != SpreadSheet

The way you structure data in Databases can seem quite "alien" when you're starting out; you seem to need to use "complicated", "artificial" constructs instead of just "rows" and "columns" of data. But, once you gain an understanding of why you need these structures and the power that they give you over your data, you'll get over it pretty quickly.

... operations ... on single cell out of trillion by trillion table ...

Do you really have a useful value for every single cell in a trillion by trillion table? Personally, I'd doubt it, unless you work for Google.

I'd suggest what you actually have is a Sparse Array, where you have more "holes" than data. That's a structure that Relational Tables can support very easily and really quite efficiently.
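As a sketch of what that could look like, here is a sparse array stored as a relational table using Python's built-in sqlite3 module. The table name `cell` and the columns `row_no`, `col_no` and `value` are my own choices for illustration. Note that the numerical "column names" from the question simply become values in the `col_no` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cell (
        row_no INTEGER NOT NULL,
        col_no INTEGER NOT NULL,
        value  REAL    NOT NULL,
        PRIMARY KEY (row_no, col_no)
    )
    """
)

# Only cells that hold data get a row; the "holes" are not stored.
conn.executemany(
    "INSERT INTO cell (row_no, col_no, value) VALUES (?, ?, ?)",
    [(1, 5, 0.25), (1, 999_999_999_999, 7.0), (123_456_789, 2, -1.5)],
)

# Updating one cell touches one row, found through the primary key.
conn.execute(
    "UPDATE cell SET value = ? WHERE row_no = ? AND col_no = ?",
    (8.0, 1, 999_999_999_999),
)
conn.commit()

updated = conn.execute(
    "SELECT value FROM cell WHERE row_no = ? AND col_no = ?",
    (1, 999_999_999_999),
).fetchone()[0]
stored_cells = conn.execute("SELECT COUNT(*) FROM cell").fetchone()[0]

# A cell that was never stored comes back as "no row", i.e. a hole.
missing = conn.execute(
    "SELECT value FROM cell WHERE row_no = ? AND col_no = ?", (2, 2)
).fetchone()
```

The table holds three rows no matter how large the logical grid is, and the update changes one of them without rewriting the other two. Replace `":memory:"` with a file path to keep the data on disk.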

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange