Question

Could someone help me understand how hard drive seeking works?

I have a small binary database file whose read performance is absolutely essential. If I need to skip a few bytes in the file, is it quicker to seek() or to read() and then discard the unwanted data?

If the average seek time of a hard drive is 10 ms and the read speed is 300 MB/s, I calculated that it's quicker to read() than seek() for any skip smaller than 3 MB. Is this true? Is there an overhead when performing a new seek that reading an existing stream doesn't have?

Which do you think would be a more suitable file structure for an index?

Entry1:Value:PointerIntoData
Entry2:Value:PointerIntoData
Entry3:Value:PointerIntoData
Data, Data, Data

Or

Entry1:Value:Data
Entry2:Value:Data
Entry3:Value:Data

When reading an entry, if the value is not correct the entry will be ignored. So when streaming the file, is it quicker to: 1. use seek() to skip over an entry when it is not required, 2. read an entry when it is not needed and then discard the data, or 3. use the first structure and, when an entry is required, seek() into the data repository at the end?

The entry is 4 bytes, the value is 8 bytes, and the data is 12 KB.
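For concreteness, here is a minimal sketch of the second layout being streamed with strategy 1 (seek past unwanted data). The record sizes come from the question; the function name and big-endian field encoding are assumptions for illustration.

```python
import struct

ENTRY_SIZE = 4              # bytes, per the question
VALUE_SIZE = 8
DATA_SIZE = 12 * 1024
HEADER = ENTRY_SIZE + VALUE_SIZE

def scan(path, wanted_value):
    """Stream Entry:Value:Data records, seeking past unwanted payloads."""
    results = []
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER)
            if len(header) < HEADER:
                break                      # end of file
            entry, value = struct.unpack(">IQ", header)
            if value == wanted_value:
                results.append((entry, f.read(DATA_SIZE)))
            else:
                f.seek(DATA_SIZE, 1)       # strategy 1: skip the 12 KB payload
    return results
```

Swapping the `f.seek(DATA_SIZE, 1)` line for `f.read(DATA_SIZE)` gives strategy 2, which makes the two easy to benchmark against each other.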

Cheers


Solution

All the seek system call does is change the position in the file where the next read will occur. It does not move the drive head. Drive heads move when data is read or written, and you don't have direct control over what the OS will do next.

Reading lots of data you aren't going to need has an impact, because all read data needs space in the OS buffers and causes older data to be discarded. So using seek over big files will interfere less with the filesystem cache.


Everything below assumes you cannot fit the whole database in memory. If you can, just do that: read everything, and append new and changed data at the end of the file. Don't worry about wasted space; just do some compacting once in a while.


If your database is too big:

Data is read from and written to a physical drive in blocks (or pages). Similarly, the basic unit of disk I/O in your OS is the page, and if the OS caches data from disk, it caches whole pages too. So agonizing over whether to move forward a few bytes with seek or read makes little sense. If you want to make it fast, you need to take into account how disk I/O really works.
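To see why a few bytes make no difference, it helps to compute which whole pages a byte range actually touches. This is a hypothetical helper, not a real OS API; the 4096-byte block size is a common default, and on POSIX systems `os.statvfs(path).f_bsize` reports the real one.

```python
def aligned_range(offset, length, block=4096):
    """Round a byte range out to whole blocks -- what the OS reads anyway.

    Returns (aligned_start, aligned_length). A 100-byte read at offset
    5000 still costs one full 4 KB page of real I/O.
    """
    start = (offset // block) * block
    end = ((offset + length + block - 1) // block) * block
    return start, end - start
```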

First, as nobugz already mentioned: locality of reference. If the data you use in each operation is located close together in the file, your OS will need to read or write fewer pages. On the other hand, if you spread your data out, many more pages will need to be read or written, which will always be slow.

As for the data structure of the index: indexes are typically organized as B-trees, a structure designed specifically for efficient searching of large quantities of data on storage with paged reads and writes.
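A full B-tree is beyond the scope of an answer, but the idea it provides at the leaf level can be sketched with a sorted in-memory index and binary search. The `(value, offset)` pairs and the `lookup` helper are invented for illustration; a real B-tree additionally keeps the sorted structure in pages so that each search touches only a few of them.

```python
import bisect

# Hypothetical index: (value, file offset) pairs sorted by value --
# the flat analogue of a single B-tree leaf level.
index = [(8, 0), (15, 12288), (42, 24576), (99, 36864)]
keys = [k for k, _ in index]   # bisect needs the keys alone

def lookup(value):
    """Binary-search the index; return the file offset or None."""
    i = bisect.bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return index[i][1]
    return None
```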

Both strategies for organizing data are used in practice. For example, MS SQL Server by default stores data the first way: data is stored separately, and indexes contain only the data from the indexed columns plus the physical addresses of the data rows in the files. But if you define a clustered index, all the data is stored within that index, and all other indexes point to the data via the clustered index key instead of a physical address. The first way is simpler, but the second may be much more effective if you often scan ranges of data based on the clustered index.

OTHER TIPS

How "absolutely essential" is seek access? Have you tested your application with a non-optimal solution yet? During that testing, did you benchmark to determine where the real bottlenecks are? If you haven't, you'll be surprised by the results.

Next, try different methods and compare the running times. Test under different system loads (i.e., when the system is idle except for your application, and when it is busy).

Consider that optimizations based on your current hard drive may become incorrect when a newer, faster drive with different internal optimizations throws your work out the window.

A sequential read is always faster than one that requires a head seek (as opposed to a file-position seek). Typical hard-drive throughput for a sequential read is 50-60 MB/s; seeking drops that to a worst case of roughly 0.4 MB/s. Once the drive heads are positioned, you essentially get the data in the cylinder for free. The file system cache takes advantage of that by pre-reading sectors from a cylinder.
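The figures above, and the 3 MB break-even point from the question, follow from simple arithmetic. The 4 KB page per seek is an illustrative assumption for the worst case:

```python
# Back-of-envelope numbers (illustrative, not measured).
seek_time_s = 0.010                 # ~10 ms average seek
page_kb = 4                         # assume one 4 KB page fetched per seek

# Worst case: one page per seek -> ~0.39 MB/s, matching the ~0.4 MB/s above.
worst_case_mb_s = (page_kb / 1024) / seek_time_s

# Break-even from the question: at 300 MB/s, a 10 ms seek "costs" 3 MB
# of sequential reading, so skipping less than that may as well be read.
break_even_mb = 300 * seek_time_s
```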

However, you have no control over the placement of your data on disk cylinders, nor can you guess the drive geometry. Note that throughput can get significantly worse over time as the volume becomes fragmented. You'll need to look for performance by caching data in memory, and at that point locality of reference is what matters.

You could always map the file into memory and then access it through pointers. That usually makes access both simpler and faster.
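In Python this is the `mmap` module; slicing the mapping replaces explicit seek/read calls and lets the OS page data in on demand. The demo file and its contents are made up for illustration.

```python
import mmap, os, tempfile

# Build a small demo file (hypothetical content).
path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    f.write(b"header--" + b"payload!" * 4)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # map whole file
    first = mm[:8]      # random access by slicing; no explicit seek()
    later = mm[8:16]    # the OS faults pages in as they are touched
    mm.close()

os.remove(path)
```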

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow