Question

I'm looking for a solution for reverse engineering a DB without foreign keys (really! a 20 years old DB...). The intention is to do this completely without additional application or persistence logic, just by analyzing the data. I know this would be somewhat difficult, but should be possible if the data itself esp. the PKs are analyzed as well.

No correct solution

OTHER TIPS

I don't think there is a universal solution to your problem. Hopefully there is some sort of a naming convention for the tables/columns that can lead you. You can query the system tables to try and figure what's going on (Oracle: user_tab_columns, SQL Server: INFORMATION_SCHEMA.COLUMNS, etc.). Good luck!

I also don't think you'll find a universal solution to this... but I'd like to suggest you an approach, especially if you don't have any source code that could be read to guide you on mapping:

First scan all tables on your database. With scan, I mean store table names and columns. You can assume columns types trying to convert data to a specific format (start trying to convert to dates, numbers, booleans and so on)... You can also try to discover data types by analysing its contents (if it has only numbers without floating points, if it has numbers with slashes, if it has long or short texts, if it is only one digit..etc.).

Once you have mapped all tables, start by comparing contents of all columns that has a numeric type. (Why? If the database was designed by a human... then he/she/they probably will use numbers as primary/foreign keys).

Every single time you find more than X successful correspondences between the contents of 2 columns from 2 different tables, log this connection. (This X factor depends on the amount of records you have...)

This analysis must run for each table comparing all other tables... column by column - so... this will take some time...

Of course, this is an overview of what need to be done, but it is not a complex code to be written...

Good luck and let me know if you find any sort of tool to do this! :-)

No offense but you can't have been in databases very long if this surprises you.

I am going to assume that by "reverse engineering" you are just looking to fill in the foreign keys, not moving to NoSQL or something. It could be an interesting project. Here is how I would go about it:

Look at all the SELECT statements and see how joins are made to a table. 20 years ago this would be in a WHERE clause but it gets more complicated than that, of course. With correlated subqueries and UPDATE statements with FROM clauses and whatever implies a join of some sort. You have to be able to figure all that out. If you want to do it formally (you can probably suss out all this stuff intuitively) you could list the number of times combinations are used in joins between tables. List them by pairs of tables not the the set of all the tables in the join. Those would be the candidate foreign keys if one side is a primary key. The other side gets the foreign key. There are multi-column PKs but you can figure that out (so if the other side of the primary key is in two tables that's not a foreign key). If one column ends up pointing to two different table PKs that's not a proper foreign key either but it might be appropriate to pick a table and use it as the target.

If you don't already have a primary keys you should do that first. Indexes, perhaps even clustered indexes (in Sybase/MSSQL), aren't always the correct primary keys. In any case you may have to change the primary keys accordingly.

To collect all the statements might be challenging in itself. You could use perl/awk to parse them out of their C/Java/PHP/Basic/COBOL programs, or you could reap them from monitoring input to the server. You would want to look for WHERE/JOIN/APPLY etc. rather than SELECT. There are lots of other ways.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top