Question

I'm writing a converter from a txt database file to SQL, and I need to split each row into its items. The problem is that some of the items are scripts, which can contain multiple commas (the comma is also the separator in the db structure). The good news is that the scripts are wrapped in {}, which makes the job similar to parsing a CSV file. The only problem is that the scripts themselves can contain more scripts nested in {}, and this stops my regex from working.

Structure of the txt db:

501,Red_Potion,Red Potion,0,50,,70,,,,,0xFFFFFFFF,7,2,,,,,,{ itemheal rand(45,65),0; },{},{}
502,Orange_Potion,Orange Potion,0,200,,100,,,,,0xFFFFFFFF,7,2,,,,,,{ itemheal rand(105,145),0; },{},{}
503,Yellow_Potion,Yellow Potion,0,550,,130,,,,,0xFFFFFFFF,7,2,,,,,,{ itemheal rand(175,235),0; },{},{}
504,White_Potion,White Potion,0,1200,,150,,,,,0xFFFFFFFF,7,2,,,,,,{ itemheal rand(325,405),0; },{},{}

The regex I use to match the delimiters for splitting:

,(?![^{}]*\})

This works fine until it encounters a more complicated nested script item, like:

1492,Velum_Glaive,Vellum Glaive,4,20,,4500,250,,3,0,0x00004082,7,2,34,4,95,1,5,{ bonus2 bAddRace,RC_DemiHuman,80; if(getrefine()>=6) { bonus2 bSkillAtk,"LK_SPIRALPIERCE",100; bonus2 bSkillAtk,"KN_SPEARBOOMERANG",50; } if(getrefine()>=9) { autobonus2 "{ bonus bShortWeaponDamageReturn,20; bonus bMagicDamageReturn,20; }",100,2000,BF_WEAPON|BF_MAGIC,"{ specialeffect2 EF_REFLECTSHIELD; }"; } },{},{}

So how do I make it match only the db structure delimiters and leave out the commas inside the scripts?

Thanks in advance! :)


Solution

This is not a job for regex. As I pointed out in my comment, nested structures are usually beyond what regex can do. PCRE has the recursion construct (?R) and .NET has balancing groups, but the solution usually gets really unreadable and unmaintainable.
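For reference, here is a rough sketch of what the PCRE recursion route might look like in PHP. It uses the numbered-group form (?1) rather than (?R), together with the backtracking verbs (*SKIP)(*F), so that a balanced {...} block is skipped and only the remaining commas are split on. The function name splitTopLevel and the sample row are made up for illustration, and the sketch assumes every brace in the row is balanced.

```php
<?php
// Hypothetical helper: split a row on commas that sit outside any {...} block.
// Assumption: braces are balanced; braces inside strings or comments are not handled.
function splitTopLevel(string $line): array
{
    // Group 1 matches one balanced {...} block, recursing via (?1) for nested blocks.
    // (*SKIP)(*F) discards that match and resumes after it, so only commas
    // outside the blocks are left to match the second alternative.
    $pattern = '/(\{(?:[^{}]++|(?1))*\})(*SKIP)(*F)|,/';
    $fields = preg_split($pattern, $line);
    if ($fields === false) {
        throw new RuntimeException('preg_split failed');
    }
    return $fields;
}

// Shortened sample row in the style of the question's data.
$row = '1492,Velum_Glaive,4,{ if(getrefine()>=6) { bonus2 bSkillAtk,"LK_SPIRALPIERCE",100; } },{},{}';
$fields = splitTopLevel($row);
```

Even in this small form the pattern is hard to read at a glance, and an unbalanced brace makes it silently fall back to splitting on every comma that follows.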

On top of that, you don't just have to take {} into account, but strings and comments in your script as well. You are much better off parsing the thing manually. Here is a quick and dirty example of how it could be done in PHP (ignoring strings and comments!):

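// $str is assumed to hold one row of the txt db
$level = 0;        // current {} nesting depth
$values = array(); // the split items
$start = 0;        // offset where the current item begins
$length = strlen($str);
for($i = 0; $i < $length; $i++)
{
    switch($str[$i])
    {
    case ",":
        // only a comma outside of all braces separates two items
        if(!$level) {
            $values[] = substr($str, $start, $i-$start);
            $start = $i+1;
        }
        break;
    case "{":
        $level++;
        break;
    case "}":
        $level--;
        if($level < 0) trigger_error("unexpected }");
        break;
    }
}
if($level > 0) trigger_error("missing }");
// whatever follows the last separator is the final item
$values[] = substr($str, $start);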

You can already see that you basically end up with a simple parser for your script.
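As a possible next step, the same loop could be extended to skip over string literals, so that a brace or comma inside quotes does not disturb the nesting count. The following is only a sketch under assumptions the question does not confirm: strings are double-quoted, a backslash escapes the next character, strings only matter inside a script (nesting level above zero), and comments are still ignored. The function name splitRow is made up for illustration.

```php
<?php
// Hypothetical helper: split one db row on top-level commas only.
// Assumptions: double-quoted strings with backslash escapes, no comments.
function splitRow(string $line): array
{
    $fields = [];
    $level = 0;        // current {} nesting depth
    $inString = false; // true while inside a "..." literal within a script
    $start = 0;        // offset where the current field begins
    $length = strlen($line);

    for ($i = 0; $i < $length; $i++) {
        $ch = $line[$i];
        if ($inString) {
            if ($ch === '\\') {
                $i++;              // skip the escaped character
            } elseif ($ch === '"') {
                $inString = false; // closing quote
            }
            continue;              // nothing inside a string counts
        }
        switch ($ch) {
            case '"':
                if ($level > 0) {
                    $inString = true;
                }
                break;
            case '{':
                $level++;
                break;
            case '}':
                if (--$level < 0) {
                    throw new UnexpectedValueException("unexpected } at offset $i");
                }
                break;
            case ',':
                if ($level === 0) {
                    $fields[] = substr($line, $start, $i - $start);
                    $start = $i + 1;
                }
                break;
        }
    }
    if ($inString || $level > 0) {
        throw new UnexpectedValueException('unterminated string or missing }');
    }
    $fields[] = substr($line, $start);
    return $fields;
}
```

Calling splitRow() once per line of the file would give one array of items per row. Throwing on malformed input, instead of calling trigger_error, is a design choice of this sketch so that a broken row cannot slip through into the SQL output.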

Licensed under: CC-BY-SA with attribution