Question

I'm trying to split a UTF-8 string on a quote character (") with delimiter capture, except where that quote is immediately followed by a second quote (""), so that, for example,

"A ""B"" C" & "D ""E"" F"

will split into three elements

"A ""B"" C"
&
"D ""E"" F"

I've been attempting to use:

$string = '"A ""B"" C" & "D ""E"" F"';
$temp = preg_split(
    '/"[^"]/mui',
    $string,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE
);

but without success as it gives me

array(7) {
  [0]=>
  string(2) " ""
  [1]=>
  string(1) """
  [2]=>
  string(1) "C"
  [3]=>
  string(2) "& "
  [4]=>
  string(2) " ""
  [5]=>
  string(1) """
  [6]=>
  string(2) "F""
}

So it's losing any character that immediately follows a quote, unless that character is also a quote.

In this example there's a quote as the first and last characters in the string, though that may not always be the case, e.g.

{ "A ""B"" C" & "D ""E"" F" }

needs to split into five elements

{
"A ""B"" C"
&
"D ""E"" F"
}

Can anybody help me get this working?


Solution

Since you said that you don't mind the quotes being consumed by the split, you can use the expression:

(?<!")\s?"\s?(?!")

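It uses two negative lookarounds, a lookbehind and a lookahead, so a quote that has another quote directly before or after it never acts as a delimiter. The output on your sample will be: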

{ 
A ""B"" C
&
D ""E"" F
}

[The \s? on each side of the quote consumes one adjacent whitespace character; remove both if you want to keep that whitespace.]
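
As a minimal sketch of how the expression above might be passed to preg_split (the variable names, the u modifier and the PREG_SPLIT_NO_EMPTY flag are assumptions carried over from the question, not part of this answer):

```php
<?php
// Second sample from the question: the string does not start or end with a quote.
$string = '{ "A ""B"" C" & "D ""E"" F" }';

// Split on a quote that has no other quote directly before or after it.
// Each \s? also swallows at most one whitespace character next to that quote.
$parts = preg_split(
    '/(?<!")\s?"\s?(?!")/u',
    $string,
    -1,                  // no limit on the number of pieces
    PREG_SPLIT_NO_EMPTY  // drop empty pieces, e.g. when the string starts with a quote
);

print_r($parts);
// $parts is ['{', 'A ""B"" C', '&', 'D ""E"" F', '}']
```

The delimiter quotes are consumed, so the doubled quotes inside each piece are kept but the outer ones are not.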

Other suggestions

I think it would probably be easier to use preg_match_all:

preg_match_all('/"([^"]|"")+"|[^"]+/', $string, $matches);

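The regular expression matches either a quoted string (in which "" counts as an embedded quote) or a run of characters that contains no quote. If the last part has an opening quote but no closing quote, that quote is skipped and the text after it is matched as an unquoted piece; that might need changing, depending on your situation.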
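
A rough sketch of how the matches could be turned into the five elements asked for in the question, with the outer quotes kept. The trimming and filtering step is an assumption added here for illustration, not part of the answer:

```php
<?php
$string = '{ "A ""B"" C" & "D ""E"" F" }';

// First alternative: a quoted string where "" stands for an embedded quote.
// Second alternative: a run of characters with no quote in it.
preg_match_all('/"([^"]|"")+"|[^"]+/', $string, $matches);

// $matches[0] holds the full matches. Unquoted pieces still carry their
// surrounding spaces, so trim every piece and drop any that end up empty.
$parts = array_values(array_filter(
    array_map('trim', $matches[0]),
    static fn (string $piece): bool => $piece !== ''
));

print_r($parts);
// $parts is ['{', '"A ""B"" C"', '&', '"D ""E"" F"', '}']
```

Note that trim also strips leading and trailing whitespace inside a quoted piece only if it sits outside the quotes, since each quoted match starts and ends with a quote character.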

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow