I noticed a problem with GROUP BY in a query I am debugging. I have a table with the following structure (simplified from the real one):
CREATE TABLE IF NOT EXISTS `product_variants` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`product_id` int(11) unsigned NOT NULL DEFAULT '0',
`pid_merchant` varchar(50) NOT NULL,
`checksum` char(32) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `checksum` (`checksum`),
KEY `product_id` (`product_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
In this table, I have the following two rows (among millions of others):
INSERT INTO `product_variants` (`id`, `product_id`, `pid_merchant`, `checksum`) VALUES
(525555236, 628702710, 'ARTüöäß111', 'af5334b1193bf171580c70813ac83327'),
(525555241, 628702710, 'ARTüöäß222', 'cfe50fd9c3ca29fd957b839892313f82');
The query I'm debugging tries to find duplicate entries in this table based on pid_merchant, in a very simple manner:
SELECT count(*), pv.* FROM product_variants pv WHERE pv.pid_merchant != '' GROUP BY pv.pid_merchant HAVING count(*) > 1
My problem is that these two rows are grouped together, even though their actual pid_merchant values are different: one ends in 111, the other in 222. Does anyone know how to approach this issue? I have already tried grouping by MD5() and HEX(), changing the collation to latin1_german2_ci, forcing a binary or utf8 conversion, and many other things, pretty much everything I could think of.
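The byte-wise grouping attempts mentioned above can be sketched roughly as follows. This is an illustrative reconstruction, not the exact statement that was run; the aliases `cnt`, `pid_hex`, and `ids` are made up for the example. Grouping on `HEX(...)` compares the stored bytes rather than collated characters, and `GROUP_CONCAT(pv.id)` lists every row that ends up in a group, whereas `pv.*` shows the columns of only one row per group.

```sql
-- Sketch only: group on the raw bytes of pid_merchant instead of the
-- collated string, and list the ids of all rows in each group.
-- The aliases cnt, pid_hex and ids are hypothetical names for this example.
SELECT COUNT(*)                 AS cnt,
       HEX(pv.pid_merchant)     AS pid_hex,  -- stored bytes, hex-encoded
       GROUP_CONCAT(pv.id)      AS ids       -- every id in this group
FROM product_variants pv
WHERE pv.pid_merchant != ''
GROUP BY pid_hex
HAVING COUNT(*) > 1;
```

If ids 525555236 and 525555241 never appear together in the `ids` column, the two sample rows are not actually being grouped with each other.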
Another weird thing is that it seems to confuse Y and Ü (capital U with umlaut) while grouping (e.g. ABC-Y and ABC-Ü are treated as identical).
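One way to check whether the Y/Ü behaviour comes from the collation rather than from the stored data is to compare the two characters directly under several collations. The sketch below assumes the latin1 encoding, where Y is byte 0x59 and Ü is byte 0xDC, and it assumes the table uses latin1_swedish_ci, which is the default collation for `DEFAULT CHARSET=latin1`. The hex literals avoid any dependence on the client connection character set; the column aliases are made up for the example.

```sql
-- Sketch only: compare Y (0x59) and Ü (0xDC) in latin1 under three collations.
-- Each column is 1 if the collation treats the two characters as equal, else 0.
SELECT _latin1 X'59' = _latin1 X'DC' COLLATE latin1_swedish_ci AS eq_swedish_ci,
       _latin1 X'59' = _latin1 X'DC' COLLATE latin1_german2_ci AS eq_german2_ci,
       _latin1 X'59' = _latin1 X'DC' COLLATE latin1_bin        AS eq_bin;
```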
The server is running MySQL 5.5 on Ubuntu x64:
mysqld Ver 5.5.29-0ubuntu0.12.04.2-log for debian-linux-gnu on x86_64 ((Ubuntu))