Storing dynamic objects with growing lists on disk

https://stackoverflow.com/questions/8408633

29-10-2019
|

문제

Okay so I have been developing a system so far in main memory that has many different objects and each object stores lists of other objects in the system. Now I want to move this to persistent storage. I'm not looking for the obvious answer of using a DBMS because the point is that I'm writing a custom database for my system.

Now for each object I'm assigning an ID. The ids can be looked up in a table to find the block and offset for the location of the data for that object. Now each object has lists/sets that point to other objects in the system. So obviously in the storage they will be lists of 8 byte (using longs for the ids) ids that can be used to find the other objects. Now my question here is that I know the lists will be growing over time so they need room to grow. My best thought so far for storing the lists so that I won't need to move around objects when they grow is to have each list assigned an id just like the objects so that they can looked up in a table just like the objects to find them on the disk.

Now each list portion will have a set allocated space to store 10 objects and then at the end will be the id of the next list portion if it contains more objects. This seems like a decent way to do it and to deal with constantly growing objects but I'm wondering if there are any better approaches. I would store the indexes in memory (space permitting) so given an object id, the lookup is in memory then it would take 1 I/O to find get it's data and list ids from the disk. then for each list you want to traverse through it would take another lookup and I/O for every 10 objects in the list or less if the block is cached.

The number of I/O's is not terrible and I would try to keep locality of list portions to eliminate unnecessary I/Os, but is there a better way of doing this? Am I right to try and store the lists separate from the object or should I consider methods of storing them with the object's data. My worry about doing that is that as one list grows it will run into another list and then need to be fragmented and this can get more complicated. Any suggestions are appreciated and thanks in advance.

해결책

Your idea of having these expandable lists is good. I think your explanation is missing some details (ie: ordered lists or not, what do you mean by trying to separate lists from objects, a diagram of these lists might help).

I would keep a sorted index in memory for fast access. The index would have list id, and location on disk. If you're interested in range queries go with a B tree approach, otherwise you could use a hashmap to store these indeces.

A further improvement, if you're doing searching on the lists, is to keep them sorted... or at least semi sorted so that you can group similar lists in the same chunk. This would speed up searching in the lists if you every so often cache to memory say the boundaries of each chunk (nodes with values b/w 1-9, 10-25, etc). Merge sort is probably the best sort for lists. Or even better, when you insert nodes in the lists insert in the correct location so the list is always sorted. Then look up with binary search. If data is not indexed properly and not sorted, you're going to disk multiple times for queries and in this case any search you use will give you linear time because of disk time.

You can also cache data nodes of the 10% most looked up nodes/lists.

Depending on the size of these lists (and how manyc chunks you have for them), you could use some RAID so you can get some parallel reads/writes.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow