Question

I am optimizing a part of a Delphi application where lists of objects are frequently filtered using different criteria. The objects are kept in TObjectList structures, and it is common for each filter to select only a very small percentage (e.g. 1%) of the entire set. The total number of objects can be in the 100k range, and the main set doesn't change during computations. Although the filters only involve a few properties, the lists cannot be sorted in a way that would optimize all the possible criteria.

I am looking for suggestions on how to organize the objects (data structures) or an algorithm that can be used to approach this problem. Thank you!

Filter example:

  ((Object.A between 5 and 15) AND
  (Object.B < 20) AND
  (Object.C(AParam) > 0)) OR
  (Object.IsRoot(...))

Solution

Idea #1

Run your code in a profiler. Find out if there are any slow spots.

Idea #2

Possibly you could take advantage of cache effects by storing your objects sequentially in memory. (I'm assuming you are walking your list sequentially from beginning to end.)

One way to do it might be to use an array of records instead of a list of objects, if that's possible in your case. Remember that records in Delphi 2006 and later can have methods (but not virtual ones).
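A minimal sketch of the record-array idea, assuming hypothetical fields A and B (the type and field names are illustrative, not from the question):

```pascal
type
  // A plain record kept in a contiguous dynamic array gives better cache
  // locality than heap-allocated objects scattered across memory.
  TItem = record
    A, B: Integer;
    // Records in Delphi 2006+ may have (non-virtual) methods:
    function MatchesB(Limit: Integer): Boolean;
  end;

function TItem.MatchesB(Limit: Integer): Boolean;
begin
  Result := B < Limit;
end;

var
  Items: array of TItem;
begin
  SetLength(Items, 100000); // one contiguous block of 100k records
end;
```

Walking `Items` from front to back then touches memory strictly sequentially, which is what the CPU prefetcher is best at.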

Another idea might be to write your own class allocator. I've never tried that, but here's an article I found. Maybe try walking the objects using a pointer instead of using the TObjectList.

OTHER TIPS

You can keep a list sorted on the field that constrains your search/filtering the most (the most selective one), and then perform a binary search to get the index range matching that criterion.

Referring to your example above: generate an A-sorted list, binary-search for the indexes of 5 and 15, and you get a (much) smaller sublist (all the indexes between the two) in which you only have to check the remaining criteria (B, C and IsRoot).
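A sketch of the lower-bound binary search this relies on, assuming a `TList` of objects sorted ascending on an integer property `A` (`TMyObject` and `A` are assumed names):

```pascal
uses Classes;

// Returns the first index whose A >= Value in a list sorted ascending on A.
function FirstIndexGE(List: TList; Value: Integer): Integer;
var
  Lo, Hi, Mid: Integer;
begin
  Lo := 0;
  Hi := List.Count; // exclusive upper bound
  while Lo < Hi do
  begin
    Mid := (Lo + Hi) div 2;
    if TMyObject(List[Mid]).A < Value then
      Lo := Mid + 1
    else
      Hi := Mid;
  end;
  Result := Lo;
end;

// Usage: scan only the narrow slice where 5 <= A <= 15, then test the
// remaining criteria on each candidate.
// for i := FirstIndexGE(SortedByA, 5) to FirstIndexGE(SortedByA, 16) - 1 do
//   CheckOtherCriteria(TMyObject(SortedByA[i]));
```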

Delphi (2009+) implementation of a sorted list: DeHL.Collections.SortedList

I know you don't use tiOPF, but there is a lot to learn from the source code of this project. Since you probably will be looping over the filtered objects, why not create a filter iterator?
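A minimal filter-iterator sketch over a TObjectList, in the spirit of tiOPF's iterators but not taken from its source (the predicate type and class are assumptions):

```pascal
uses Contnrs;

type
  TFilterFunc = function(Obj: TObject): Boolean;

  // Walks a TObjectList and yields only the objects the predicate accepts.
  TFilterIterator = class
  private
    FList: TObjectList;
    FFilter: TFilterFunc;
    FIndex: Integer;
  public
    constructor Create(AList: TObjectList; AFilter: TFilterFunc);
    function Next(out Obj: TObject): Boolean; // False when exhausted
  end;

constructor TFilterIterator.Create(AList: TObjectList; AFilter: TFilterFunc);
begin
  inherited Create;
  FList := AList;
  FFilter := AFilter;
  FIndex := -1;
end;

function TFilterIterator.Next(out Obj: TObject): Boolean;
begin
  Obj := nil;
  Result := False;
  while FIndex < FList.Count - 1 do
  begin
    Inc(FIndex);
    if FFilter(FList[FIndex]) then
    begin
      Obj := FList[FIndex];
      Result := True;
      Exit;
    end;
  end;
end;
```

Calling code then just loops `while Iter.Next(Obj) do ...` without knowing anything about the filter.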

This unit is a good start, but I think it is easier to download the complete source and browse through the files: http://tiopf.svn.sourceforge.net/viewvc/tiopf/tiOPF2/Trunk/Core/tiFilteredObjectList.pas?revision=1469&view=markup

This solution probably uses a lot of RTTI: just create a list of filters (property names and values) and loop over the objects. It gives you maximum flexibility, but at the cost of some speed. If you need speed, I think the solution provided by Ulrichb will be better.

If the quantity of objects is small, it probably doesn't matter too much how efficient the search is, though caching results may help if the same filter is applied often.

If the quantity of objects is large, I'd consider using an in-memory database and using SQL to do the query. The database can then use indexes to find things as fast as possible, and you pass the burden to a proven tool. Personally I use DBISAM or ElevateDB, but others will do in-memory databases too. By using a real database tool, you can move the data to disk easily if it gets really large.
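For illustration, the filter from the question maps naturally onto SQL once the plain properties are mirrored as columns (the component, table and column names here are hypothetical, not tied to DBISAM or ElevateDB specifically):

```pascal
// The engine's optimizer can use indexes on A and B; the C(AParam) and
// IsRoot parts would still have to be evaluated per row in Delphi code.
Query.SQL.Text :=
  'SELECT Id FROM Items ' +
  'WHERE (A BETWEEN 5 AND 15) AND (B < 20)';
Query.Open;
```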

If the number of attributes to filter on is small and they can be sorted, why not have multiple lists, each of which is sorted by a different attribute? Each list costs you 4 bytes per object for the reference (on a 32-bit target) plus a small overhead for the list itself. Of course it all depends on whether the memory requirements can be fulfilled; 2 GB is not that much if you are dealing with a lot of objects...

This is a very, very difficult problem to solve in a generic way. There's one kind of software that does this all day long, and it's called an SQL query optimizer: that bit of code, present in every modern SQL engine, looks at what you want (the query), looks at the indexes available for your data and their selectivity, and figures out the optimal way to use them to produce your result set. Just to prove how difficult the problem is: SQL query optimizers sometimes fail and produce visibly inefficient plans.

I'm pretty sure you don't want to implement a full-fledged query optimizer, so here are some tips on how to make your queries fast enough:

(1) Select some fields that are frequently used in your queries and set up indexes on them. Those fields need to provide good selectivity: don't index on a Boolean value; you'd just lose time traversing a complex binary search structure when scanning the whole list would be just as fast (or faster)!

(2) For every given query, select ONE index to pre-filter the data, then apply all the other filters one by one (without optimization).

In your example: create indexes on the "A" and "B" fields. "C" seems to be a function, so it's impossible to index. "IsRoot" seems to return a Boolean, which isn't worth indexing.
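The two-step plan above can be sketched like this, assuming the index range on A has already been narrowed by binary search (LoIdx/HiIdx, SortedByA, TMyObject and Results are assumed names):

```pascal
var
  i: Integer;
  Obj: TMyObject;
begin
  // Step 1 gave us the slice [LoIdx..HiIdx] where 5 <= A <= 15.
  // Step 2: apply the remaining, non-indexable filters one by one.
  for i := LoIdx to HiIdx do
  begin
    Obj := TMyObject(SortedByA[i]);
    if (Obj.B < 20) and (Obj.C(AParam) > 0) then
      Results.Add(Obj);
  end;
  // The OR-branch (Obj.IsRoot(...)) would be collected in a separate pass
  // and merged, since it isn't covered by the A index.
end;
```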

The data structures to use for your data depend entirely on your data. If performance is critical, implement several and run tests. If it's not critical, just use your favorite list and sorting algorithm and be done!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow