PHP preg_split if not inside curly brackets
20-09-2019
Question
I'm makin' a scripting language interpreter using PHP. I have this code in that scripting language:
write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly
(Yes, it's hard to believe but that's the syntax)
Which regex must I use to split this by spaces, but only where the space is not inside curly brackets? I want to turn the above code into this array:
- write
- Hello, World!
- in
- either
- the
- color
- blue
- or
- red
- or
- #00AA00
- and
- in
- either
- the
- font
- Arial Black
- or
- Monaco
- where
- both
- the
- color
- and
- the
- font
- are
- determined
- randomly
(The strings inside the curly brackets are shown above in bold.) The strings inside the curly brackets must be one element each. So {Hello, World!} cannot be: 1. Hello, 2. World!
How can I do this?
Thanks in advance.
Solution
What about using something like this:
$str = 'write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly';
$matches = array();
preg_match_all('#\{.*?\}|[^ ]+#', $str, $matches);
var_dump($matches[0]);
Which will get you:
array
0 => string 'write' (length=5)
1 => string '{Hello, World!}' (length=15)
2 => string 'in' (length=2)
3 => string 'either' (length=6)
4 => string 'the' (length=3)
5 => string 'color' (length=5)
6 => string '{blue}' (length=6)
7 => string 'or' (length=2)
8 => string '{red}' (length=5)
9 => string 'or' (length=2)
10 => string '{#00AA00}' (length=9)
11 => string 'and' (length=3)
12 => string 'in' (length=2)
13 => string 'either' (length=6)
14 => string 'the' (length=3)
15 => string 'font' (length=4)
16 => string '{Arial Black}' (length=13)
17 => string 'or' (length=2)
18 => string '{Monaco}' (length=8)
19 => string 'where' (length=5)
20 => string 'both' (length=4)
21 => string 'the' (length=3)
22 => string 'color' (length=5)
23 => string 'and' (length=3)
24 => string 'the' (length=3)
25 => string 'font' (length=4)
26 => string 'are' (length=3)
27 => string 'determined' (length=10)
28 => string 'randomly' (length=8)
Then, you just have to iterate over those results; the ones starting with { and ending with } will be your "important" words, and the others will be the rest.
Edit after the comment: one way to identify the important words would be something like this:
foreach ($matches[0] as $word) {
    $m = array();
    if (preg_match('#^\{(.*)\}$#', $word, $m)) {
        echo '<strong>' . htmlspecialchars($m[1]) . '</strong>';
    } else {
        echo htmlspecialchars($word);
    }
    echo '<br />';
}
Or, like you said, working with strpos and strlen would work too ;-)
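If you want the braces stripped in the same pass, so the result matches the array in the question, a capture group inside the braces can do it. This is only a sketch: the function name tokenize() is made up for illustration, and it assumes PHP 7.2 or later (for PREG_UNMATCHED_AS_NULL) and brace groups that are not nested.

```php
<?php
// Hypothetical helper: returns the tokens with the surrounding braces removed.
function tokenize(string $source): array
{
    // Group 1 captures the text inside {...}; plain words match the second alternative.
    preg_match_all(
        '#\{(.*?)\}|[^ ]+#',
        $source,
        $matches,
        PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL
    );

    $tokens = [];
    foreach ($matches as $m) {
        // $m[1] is null (or missing) for a plain word, so fall back to the whole match.
        $tokens[] = $m[1] ?? $m[0];
    }
    return $tokens;
}
```

Calling tokenize() on the line from the question should give 'write', 'Hello, World!', 'in', ... with no braces left in the elements.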
OTHER TIPS
Does the order matter? If not you could extract all {}'s, remove them, then operate on the leftover string.
I would replace them using preg_replace_callback. With the callback you can keep track of the order and replace them with something like %var1%, %var2%, etc.
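A rough sketch of that placeholder idea follows. The function name splitWithPlaceholders() is invented here, and the sketch assumes that the script text never contains something like %var1% on its own and that every brace group is separated from its neighbours by spaces.

```php
<?php
// Hypothetical helper illustrating the placeholder approach.
function splitWithPlaceholders(string $source): array
{
    $literals = [];

    // Step 1: swap every {...} group for a space-free placeholder, remembering its content.
    $masked = preg_replace_callback(
        '/\{([^}]*)\}/',
        function (array $m) use (&$literals): string {
            $key = '%var' . (count($literals) + 1) . '%';
            $literals[$key] = $m[1];
            return $key;
        },
        $source
    );

    // Step 2: the masked string can now be split on spaces safely.
    $words = preg_split('/ +/', $masked, -1, PREG_SPLIT_NO_EMPTY);

    // Step 3: put the original brace contents back, keeping the original order.
    return array_map(
        function (string $word) use ($literals): string {
            return $literals[$word] ?? $word;
        },
        $words
    );
}
```

Because the placeholders stay in place while splitting, the order of the elements is preserved without any extra bookkeeping.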
I don't think that there is a way to explode by spaces, but not in the curly brackets without modifying the string beforehand.
This could be done iteratively without a regexp. You iterate over the entire string, appending each character to a temporary variable until you find a space. When you find a space, you put the content of the temporary variable in the array, empty it, and then continue.
If you find an opening bracket, you set a boolean and keep appending everything to the temp var, spaces included, until you find the closing bracket. And so on.
<?php
$string = "write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly";
$bracket = false; // true while we are inside a {...} group
$words = array();
$temp = "";
for ($i = 0; $i < strlen($string); $i++) {
    $char = $string[$i];
    if ($bracket) {
        $temp .= $char;
        if ($char == "}") {
            // the closing bracket ends the group: store it and reset the buffer
            $bracket = false;
            $words[] = $temp;
            $temp = "";
        }
    }
    else {
        if ($char == " ") {
            if ($temp != "") {
                $words[] = $temp;
                $temp = "";
            }
        }
        elseif ($char == "{") {
            $temp .= $char;
            $bracket = true;
        }
        else {
            $temp .= $char;
        }
    }
}
// store the last word, which is not followed by a space
if ($temp != "") {
    $words[] = $temp;
}
?>
Code is untested.
You want to split on all spaces that are not contained within curly braces.
Match either a curly-braced expression or a sequence of non-whitespace characters, discard that match with \K, then use the space that follows as the delimiter.
Code: (Demo)
$text = 'write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly';
var_export(preg_split('~({[^}]*}|\S+)\K ~', $text));
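The preg_split() call keeps the braces in the returned elements. If you want them removed, a second pass with preg_replace() over the resulting array is one option. The sketch below is an illustration only: splitOutsideBraces() is a made-up name, and the trim() and PREG_SPLIT_NO_EMPTY additions are my own assumptions about how leading, trailing, and repeated spaces should be handled.

```php
<?php
// Hypothetical helper: split on spaces outside braces, then strip the outer braces.
function splitOutsideBraces(string $source): array
{
    // \K drops the brace group or word from the match, so only the spaces act as delimiter.
    $parts = preg_split(
        '~(?:\{[^}]*\}|\S+)\K +~',
        trim($source),
        -1,
        PREG_SPLIT_NO_EMPTY
    );

    // preg_replace() accepts an array subject and returns an array:
    // elements of the form {...} lose their braces, all others are left unchanged.
    return preg_replace('~^\{(.*)\}$~s', '$1', $parts);
}
```

Note that a brace group directly followed by punctuation instead of a space, such as {Hello, World!}, would still be split inside the braces, exactly as with the original pattern.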
p.s. You can replace curly braces with strong tags like this: https://3v4l.org/fXrgE
p.p.s. You could build your exact ordered list with preg_replace_callback(): (Demo) <-- transfer to phptester.net to see it rendered
$text = 'write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly';
echo "<ol>" , preg_replace_callback('~{([^}]*)}|(\S+)~', function($m) {
        // $m[2] is only set when the plain-word alternative matched
        if (!isset($m[2])) {
            return "<li><strong>{$m[1]}</strong></li>\n";
        }
        return "<li>{$m[2]}</li>\n";
    },
    $text) , "</ol>";