Question

I don't have experience in database development, so I need your suggestions for choosing a database that can be used with FireMonkey.

I need to store HTML files (without media for now, though media may be added later); their total size is around 20 GB of uncompressed text. The main requirement is the fastest possible text search in the database, and it must be possible to implement human-style searching (like Google). Compression would also be welcome, since 20 GB is a lot to store, but it is not required if it makes searching slow.

What kinds of databases are appropriate for my needs? Thanks a lot for your suggestions!

Edited

Requirements:

  1. Price: Free
  2. Location: local or remote
  3. Operating system support: Windows
  4. System requirements: a database with a large footprint is acceptable (hopefully in exchange for better performance)
  5. Performance: fast text searching
  6. Concurrent users: 20
  7. Full text indexing and searching: human (Google-like) fast text searching is required
  8. Manageability: doesn't matter much

I know an on-line web legal database that can search words through 100 GB of information in milliseconds. I need the same performance, and Google-like searching is required.

Solution

Delphi's database access layer is separate from FireMonkey; it is the same one used by the VCL (although FireMonkey, AFAIK, relies only on LiveBindings to access data, that's not an issue in your case).

Today 20 GB is really not much data. Almost any database will handle it without much effort if properly configured. Which engine to choose depends on:

  • Price: how much are you going to spend for it?
  • Location: do you need a local database (same machine) or a remote one (LAN or WAN)?
  • Operating system support: which OS should it run on?
  • System requirements: do you need a database with a small footprint, or can you use one with a larger footprint (hopefully in exchange for better performance)?
  • Performance: what level of performance is required?
  • Concurrent users: how many users will connect to the database concurrently?
  • Full text indexing and searching: not all databases offer it out of the box
  • Manageability: some databases may require more management than others.

There is no "one database fits all" yet.

OTHER TIPS

I'm no DBA, so I can't say directly, and honestly I'm not sure that any one person could give a direct answer to this question, as it's one of those "it just depends" scenarios.

http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems

That's a good starting point to compare features and platform compatibility. I think the major thing to consider here is what hardware will be running it and how can you best utilize that to accomplish the task at hand.

If you have a server farm, be sure your DB supports distribution and some sort of load balancing (most do to some degree, from what I understand).

To speed up searching, I think you're going to want to keep the data uncompressed, unless you code up a custom algorithm that searches the compressed version somehow. In that case, searching the compressed data might actually be faster: if you can compare your plain-text search terms against an index built for the compressed file, you only need to look for the keys that matched within the index and then check for them within the compressed data. Without tons of custom code, I haven't heard of any DB that supports this idea of searching compressed text (though I could easily be wrong on this point).

If the entire data set needs to be decompressed before doing the search it will very likely be much slower (memory is relatively cheap compared to CPU time). It looks like Firemonkey has a limited selection of DBs to use so that will help to narrow your choices down as well.
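The index-then-check idea above can be sketched in a few lines. This is only an illustration, not how any particular database does it: the `CompressedStore` class and its method names are made up for this example, and it assumes each document is compressed individually with zlib while a plain in-memory inverted index (word to document ids) stays uncompressed.

```python
import re
import zlib
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric words."""
    return TOKEN.findall(text.lower())


class CompressedStore:
    """Hypothetical store: compressed documents plus an uncompressed word index."""

    def __init__(self):
        self.blobs = {}                # doc_id -> zlib-compressed bytes
        self.index = defaultdict(set)  # word -> set of doc_ids containing it

    def add(self, doc_id, text):
        self.blobs[doc_id] = zlib.compress(text.encode("utf-8"))
        for word in tokenize(text):
            self.index[word].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query word (index only)."""
        words = tokenize(query)
        if not words:
            return []
        # .get() avoids creating empty index entries for unknown words.
        hits = set.intersection(*(self.index.get(w, set()) for w in words))
        return sorted(hits)

    def fetch(self, doc_id):
        """Decompress a single document, e.g. to show a matching result."""
        return zlib.decompress(self.blobs[doc_id]).decode("utf-8")
```

Only the documents returned by `search` ever need to be decompressed, so the cost of decompression is paid per result rather than per query over the whole data set.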

What I would suggest, based on your edited question, is to write (or find) a parser or regular expression to extract all the important elements from the HTML that you would like to be searchable. Then store those in a database along with a reference to where they were found in the HTML. In terms of Google-like searching, if you mean how it can correct misspellings and use synonyms, you probably need some sort of custom code to do dictionary lookups for spelling and thesaurus lookups for synonyms. I believe full-text searching in any modern DB will handle the need to query with LIKE or similar statements in the WHERE clause.

Looks like ldsandon's answer covers most of this anyhow. TL;DR; if not, thanks for reading.
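As a rough sketch of the extraction and spelling-lookup steps, the following uses only Python's standard library. The names `TextExtractor`, `html_to_text`, and `suggest` are invented for this example, the choice to skip `script` and `style` content is an assumption about what counts as "important", and `difflib` is only a simple stand-in for a real spelling dictionary.

```python
import difflib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html):
    """Return the searchable text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def suggest(word, vocabulary):
    """Return the closest known word for a possibly misspelled one, or None."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else None
```

The extracted text is what would be stored and indexed in the database, next to a reference to the original HTML file; `vocabulary` would typically be the list of words already present in the index.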

I would recommend PostgreSQL for this task. It has good performance and built-in full-text search capability for Google-like searching. It's also free and open source.

Unfortunately Delphi doesn't come with Postgres data access components out of the box. You can connect by ODBC, or you can purchase components available from, for example, Devart, DA-Soft or microOLAP.

Have you considered NoSQL databases? The Wikipedia article explains how they differ from SQL databases and also mentions that they are well suited for use as document stores.

http://en.wikipedia.org/wiki/NoSQL

The article lists around twelve implementations in the document store category, many of them open source (Jackrabbit, CouchDB, MongoDB).

This question on Stack Overflow contains some pointers to Delphi clients:

Delphi and NoSQL

I would also consider caching on the application server, to speed up search. And of course a text indexing solution like Apache Lucene.
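A minimal sketch of application-side caching, assuming repeated queries are common and results can be slightly stale. `run_backend_search` is a hypothetical placeholder for whatever actually queries the database or Lucene index; `CALLS` exists only to make the effect of the cache visible.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the (slow) backend is really hit


def run_backend_search(query):
    """Hypothetical stand-in for the real database or index lookup."""
    CALLS["count"] += 1
    return (f"result for {query}",)  # a tuple, so cached values are immutable


@lru_cache(maxsize=1024)
def _cached_search(normalized_query):
    return run_backend_search(normalized_query)


def search(query):
    """Normalize case and whitespace so equivalent queries share a cache entry."""
    return _cached_search(" ".join(query.lower().split()))
```

Calling `_cached_search.cache_clear()` after the data changes would be one simple way to avoid serving outdated results.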

I would go with Microsoft SQL Server Express Edition. I think 2008 R2 is the latest stable version, but there is also Denali (2011). It matches all the criteria you have.

You can use ADO to work with it.

Try the Advantage Database Server.

It's easy to manage and configure. It supports both dBase-like and SQL data management languages and has fast indexed full-text search capabilities. Plus, there is unparalleled support from the developers themselves.

The local server (stand-alone version, as opposed to the network based server) is free.

devzone.advantagedatabase.com

According to its documentation, there is a Firebird version with full-text search (http://www.red-soft.biz/en/document_21); it uses Apache Lucene, a popular search engine.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow