Question

We're migrating MySQL to PostgreSQL. I can easily audit the schema and the SQL statements used throughout the (REALbasic) program. Most of the SQL is composed by building string variables.

I already know about needing to replace our use of SELECT LAST_INSERT_ID() with a SERIAL column with UNIQUE constraint.

What, if any, differences between the two which are not obviously visible in SQL statements might bite us? I'm looking for (probably subtle) assumptions about behaviour such as any differences in autocommit, need to add constraints which aren't in MySQL etc.

I'm trying to tease out any gotchas for a couple of reasonably smart, attentive guys who aren't gurus in either of the databases.

This is a one-way commitment so if there are major benefits we will get by adding new declarations I'd appreciate them pointing out.

Note: we're not using any form of parameterised queries and, yes, I've pointed out issues with injection attacks as a required audit of the code.

Yes, for the curious, this decision was prompted by GPL issues, not that we're averse to paying for licenses but, unfortunately, the sole REALbasic driver for MySQL was GPL. As of May 2009, Real Software have released a new Community driver which is GPL, and properly includes source. They have promised a non-GPL Enterprise driver in the near future.

I am prepared to believe that the answer might be there are no invisible monsters under the bed but thought I'd ask to be sure.

Was it helpful?

Solution

  • select count(*) from table;

    will be slow, as it needs to read entire table. It needs workarounds if you need to count big tables often. This is needed to ensure multiversion concurrency control.

  • In the latest version (8.3) there's no implicit cast to text, which means that for example

    select 234 like '2%';

    will throw error. You'll need explicit cast like:

    select 234::text like '2%';

  • Update is really a delete+insert. As space used by deleted rows is not immediately freed then if you update entire table in one transaction then you'll need double the space.

Postgresql is a very good database, you'll love it in no time. It has several very useful features that you'll then miss in other, even commercial databases. For example transactional data definition language or savepoints.

OTHER TIPS

When I made the migration from MySQL to PostgreSQL, several things really got in my way:

1) The code writing to the MySQL database was broken and feeding the database garbage a real database that had foreign keys would have prevented. Be prepared to find "surprise" garbage data once you add in referential integrity.

2) MySQL's indexes on strings are case insensitive! If you have a primary key on something like a username, "Coryking" and "CORYKING" are the same according to MySQL. On PostgreSQL, they are different. I wasn't aware of this until there started to be people who were registering duplicate usernames that should already be in the database.

3) MySQL likes to automagically add nonsense default values to columns you specify as "NOT NULL". For example, if you specify a VARCHAR(255) NOT NULL, it will turn that column definition into "VARCHAR(255) NOT NULL DEFAULT ''".

4) PostgreSQL likes huge queries--MySQL doesn't. You'll have a lot of fun after the migration improving your database queries--don't be shy about it either.

Unless you've been SQL92-compliant with your column-typing, you will run into differences in the names of types between the two.

In-query variable modification doesn't work in postgreSQL, for instance, this will work in MySQL but not in postgreSQL (I haven't tested this recently, maybe it does work now.)

SET @a:=1 SELECT ID,@a:=@a+1 FROM some_table;

I also personally believe that postgreSQL has better handling of complex queries that include sub-selects and the like (which most MySQL users used to avoid).

Edit: Oh, and I nearly forgot! The way postgreSQL stores tables is utterly different than how MySQL does. This might affect your backup/restore strategies, too.

I don't know if you use PHP or not but I've found that addslashes works relatively well for MySQL but bombs fast in Postgres. Either use pg_escape_string() or better yet a prepared statement.

Depending on the amount of queries involved (and if you have someone inhouse to do it), it might be wise to extract all the queries, adapt them as necessary and run them on two copies of the existing DB, one running mysql and one running posgresl. Looking at the produced logs and comparing resulting datas might show some interesting hints. The first step could also be done by executing unit tests, manual or script based testing of the application.

Based on the WikiVS comparison I just found a number of interesting points, most not really gotchas, such as: - subqueries are a lot faster in Postgres, this isn't really a gotcha but means some workarounds can be removed

That site did lead me to the Postgres Gotchas list, which had more details on count(*) speed and another sequential scan issue: Max and Min are sequential scans. This is much more likely to hurt our performance than anything else identified so far. However, that article includes a Max(col) workaround:

SELECT col FROM table ORDER BY col DESC LIMIT 1
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top