How to split a pipe-delimited column and insert each value into a new table once?
25-09-2019
Question
I have an old database with a gazillion records (more or less) that have a single tags column (with tags being pipe-delimited) that looks like so:
Breakfast
Breakfast|Brunch|Buffet|Burger|Cakes|Crepes|Deli|Dessert|Dim Sum|Fast Food|Fine Wine|Spirits|Kebab|Noodles|Organic|Pizza|Salad|Seafood|Steakhouse|Sushi|Tapas|Vegetarian
Breakfast|Brunch|Buffet|Burger|Deli|Dessert|Fast Food|Fine Wine|Spirits|Noodles|Pizza|Salad|Seafood|Steakhouse|Vegetarian
Breakfast|Brunch|Buffet|Cakes|Crepes|Dessert|Fine Wine|Spirits|Salad|Seafood|Steakhouse|Tapas|Teahouse
Breakfast|Brunch|Burger|Crepes|Salad
Breakfast|Brunch|Cakes|Dessert|Dim Sum|Noodles|Pizza|Salad|Seafood|Steakhouse|Vegetarian
Breakfast|Brunch|Cakes|Dessert|Dim Sum|Noodles|Pizza|Salad|Seafood|Vegetarian
Breakfast|Brunch|Deli|Dessert|Organic|Salad
Breakfast|Brunch|Dessert|Dim Sum|Hot Pot|Seafood
Breakfast|Brunch|Dessert|Dim Sum|Seafood
Breakfast|Brunch|Dessert|Fine Wine|Spirits|Noodles|Pizza|Salad|Seafood
Breakfast|Brunch|Dessert|Fine Wine|Spirits|Salad|Vegetarian
Is there a way to retrieve each tag and insert it into a new table (tag_id | tag_nm) using MySQL only?
Solution 2
After finding that MySQL has no built-in split function, I solved the issue using only MySQL like so:
First, I created the function strSplit:
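-- DETERMINISTIC is declared so that CREATE FUNCTION is accepted when binary logging is enabled.
CREATE FUNCTION strSplit(x VARCHAR(21845), delim VARCHAR(255), pos INT)
RETURNS VARCHAR(255)
DETERMINISTIC
RETURN REPLACE(
    -- everything up to the pos-th delimiter, minus everything up to the (pos - 1)-th
    REPLACE(
        SUBSTRING_INDEX(x, delim, pos),
        SUBSTRING_INDEX(x, delim, pos - 1),
        ''
    ),
    -- strip the delimiter left at the front
    delim,
    ''
);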
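A quick check of what strSplit returns, plus one case where the nested REPLACE goes wrong: the inner REPLACE removes every occurrence of the preceding text, so a second tag that contains the first tag gets mangled. The variant below is only a sketch of one way around that, cutting by position instead of by text. The name strSplitAt is made up for illustration and is not part of the original answer.

```sql
-- Expected results, worked out by hand from the function definition:
SELECT strSplit('Breakfast|Brunch|Buffet', '|', 2);  -- 'Brunch'
SELECT strSplit('Breakfast|Brunch|Buffet', '|', 4);  -- '' (pos past the last tag)
SELECT strSplit('Tea|Teahouse', '|', 2);             -- 'house', not 'Teahouse'

-- Hypothetical variant: skip the first (pos - 1) tags by character count
-- instead of replacing their text.
CREATE FUNCTION strSplitAt(x VARCHAR(21845), delim VARCHAR(255), pos INT)
RETURNS VARCHAR(255)
DETERMINISTIC
RETURN REPLACE(
    SUBSTRING(
        SUBSTRING_INDEX(x, delim, pos),
        CHAR_LENGTH(SUBSTRING_INDEX(x, delim, pos - 1)) + 1
    ),
    delim,
    ''
);

SELECT strSplitAt('Tea|Teahouse', '|', 2);           -- 'Teahouse'
```

Like strSplit, the variant returns an empty string when pos is larger than the number of tags.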
Second, I inserted the new tags into my new table (real names and columns changed to keep it simple):
INSERT IGNORE INTO tag
    (SELECT NULL, strSplit(`Tag`, '|', 1) AS T FROM `old_venue` GROUP BY T);
Rinse and repeat, increasing pos by one for each position (in this case I had a maximum of 8 separators).
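The repeated runs can probably be folded into a single statement by joining against a small list of positions. This is a sketch under a few assumptions that are not stated in the answer: rows hold at most 9 tags (8 separators), tag_id is AUTO_INCREMENT, and tag_nm has a UNIQUE index so that INSERT IGNORE actually skips duplicates.

```sql
-- Sketch only: assumes at most 9 tags per row and a UNIQUE index on tag.tag_nm.
INSERT IGNORE INTO tag (tag_nm)
SELECT DISTINCT strSplit(v.`Tag`, '|', n.pos)
FROM `old_venue` AS v
JOIN (
    SELECT 1 AS pos UNION ALL SELECT 2 UNION ALL SELECT 3
    UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
    UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) AS n
  -- number of tags in a row = number of separators + 1
  ON n.pos <= 1 + CHAR_LENGTH(v.`Tag`) - CHAR_LENGTH(REPLACE(v.`Tag`, '|', ''))
WHERE v.`Tag` <> '';
```

Bounding pos by the tag count of each row keeps the empty string that strSplit returns for out-of-range positions out of the tag table. Extend the position list if your rows hold more tags.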
Third, to get the relationship:
INSERT INTO `venue_tag_rel`
    (SELECT a.`venue_id`, b.`tag_id`
     FROM `old_venue` a, `tag` b
     WHERE
        (
            a.`Tag` LIKE CONCAT('%|', b.`tag_nm`)
            OR a.`Tag` LIKE CONCAT(b.`tag_nm`, '|%')
            OR a.`Tag` LIKE CONCAT(CONCAT('%|', b.`tag_nm`), '|%')
            OR a.`Tag` LIKE b.`tag_nm`
        )
    );
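The four LIKE branches cover "last tag", "first tag", "middle tag" and "only tag". A possible shorthand, offered as a sketch rather than a drop-in replacement, is to wrap both sides in the delimiter so that one comparison covers all four cases. Using INSTR instead of LIKE also avoids treating % or _ inside a tag name as wildcards. The explicit column list assumes venue_tag_rel has columns named venue_id and tag_id.

```sql
-- Sketch: '|Breakfast|Brunch|' contains '|Brunch|' wherever Brunch sits in the list.
INSERT INTO `venue_tag_rel` (`venue_id`, `tag_id`)
SELECT a.`venue_id`, b.`tag_id`
FROM `old_venue` AS a
JOIN `tag` AS b
  ON INSTR(CONCAT('|', a.`Tag`, '|'), CONCAT('|', b.`tag_nm`, '|')) > 0;
```

As with the LIKE version, this compares every venue against every tag and cannot use an index on `Tag`, so expect it to be slow on a very large table.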
OTHER TIPS
Here is my attempt, which uses PHP. I imagine this could be done more efficiently with a clever MySQL query. It also covers the relationship part. There is no escaping or error checking.
// Note: the mysql_* functions used here were removed in PHP 7.0; on current PHP the same logic needs mysqli or PDO.
$rs = mysql_query('SELECT `venue_id`, `tag` FROM `venue` AS a');
while ($row = mysql_fetch_array($rs)) {
    $tag_array = explode('|', $row['tag']);
    $venueid = $row['venue_id'];
    foreach ($tag_array as $tag) {
        // Look up the tag; create it if it does not exist yet.
        $rs2 = mysql_query("SELECT `tag_id` FROM `tag` WHERE tag_nm = '$tag'");
        $tagid = 0;
        while ($row2 = mysql_fetch_array($rs2)) $tagid = $row2['tag_id'];
        if (!$tagid) {
            mysql_query("INSERT INTO `tag` (`tag_nm`) VALUES ('$tag')");
            $tagid = mysql_insert_id();
        }
        // Link the venue to the tag.
        mysql_query("INSERT INTO `venue_tag_rel` (`venue_id`, `tag_id`) VALUES ($venueid, $tagid)");
    }
}