Which free database system is best to store, and compute/analyze large text comparisons out of which the database would make usable statistics? [closed]

https://stackoverflow.com/questions/202715

03-07-2019

Question

I have to create a script that compares thousands of large texts to each other, and I'm wondering whether MySQL is the best solution for this. Is there any other free database system I could use for simple but processor-intensive computing?

Please, throw me into your knowledge's pool!

Edit: Nature of the documents: 500-7000 characters each. The goal is to check whether text in one document matches text in another (plagiarism) and to produce statistics, such as the percentage of sentences that match. I'd also like settings such as how many characters a string may differ by and still be considered a match.
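The matching rule described in this edit can be sketched independently of any database. The following is a minimal illustration in Python, assuming that "match" means an edit (Levenshtein) distance of at most `max_diff` characters between two sentences, and that the statistic wanted is the percentage of sentences in one document with a near match in the other. The function names, the naive sentence splitter, and the default threshold are all hypothetical choices, not part of the original question.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (an assumption): break after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip().lower() for p in parts if p.strip()]


def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,          # delete ch_a
                current[j - 1] + 1,       # insert ch_b
                previous[j - 1] + cost,   # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def percent_matched(doc, other, max_diff=3):
    """Percentage of sentences in doc that have a near match in other.

    max_diff is the hypothetical tolerance: how many characters two
    sentences may differ by and still count as a match.
    """
    sentences = split_sentences(doc)
    candidates = split_sentences(other)
    if not sentences:
        return 0.0
    matched = sum(
        1 for s in sentences
        if any(edit_distance(s, c) <= max_diff for c in candidates)
    )
    return 100.0 * matched / len(sentences)
```

This brute-force comparison is quadratic in the number of sentences per document pair, which is why the choice of storage and indexing matters once thousands of documents are involved.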

The technology should be server-based. I'm more interested in the DB; I'd then choose an appropriate language to script it with.

More specification: The size of the DB must be unlimited.

Solution

You should consider using Lucene. It lets you store large amounts of text and query them very quickly, with good relevance matching too.
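Lucene is built around an inverted index, which maps terms to the documents that contain them. To illustrate why that helps with comparing thousands of texts, here is a toy sketch in plain Python. It is not Lucene's API. It indexes overlapping word triples ("shingles", an assumption made for this example) and reports only the document pairs that share at least one, so the expensive comparison can be limited to those pairs. All names are hypothetical.

```python
from collections import defaultdict


def word_shingles(text, size=3):
    """Return the set of overlapping word n-grams in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def build_index(documents, size=3):
    """Inverted index: map each shingle to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for shingle in word_shingles(text, size):
            index[shingle].add(doc_id)
    return index


def candidate_pairs(index, min_shared=1):
    """Count shared shingles per document pair; keep pairs at or above min_shared."""
    shared = defaultdict(int)
    for doc_ids in index.values():
        ordered = sorted(doc_ids)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                shared[(first, second)] += 1
    return {pair: count for pair, count in shared.items() if count >= min_shared}


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "a quick brown fox jumps high",
        "c": "completely unrelated text about databases",
    }
    print(candidate_pairs(build_index(docs)))
```

Documents that share no shingle never appear as a pair, so a detailed sentence-by-sentence comparison only has to run on the candidates the index returns.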

OTHER TIPS

You don't mention the technology you will be using, the size of the text entries, or the nature of the comparisons. However, I have found the H2 database to be excellent. It is native Java and can be used as an in-memory database, which makes setup trivial.

PostgreSQL is another free database engine besides MySQL; it scales well and is widely used.

SQL Server Express

I would recommend MySQL. It has a lot of built-in string handling functions.

Since you don't specify, why not SQL Server 2008 Express Edition?

It has all of the search features of its big brother, SQL Server 2008. The only problem is that a database can't exceed 4 GB.

Licensed under: CC-BY-SA with attribution