Question

In PHP, given a long piece of text, such as:

Ms. Kane, who was elected attorney general last year and has been mentioned as a possible future candidate for governor, struck a political note in her brief announcement to an audience that cheered and applauded her decision.

“I looked at it this way, the governor’s going to be O.K.,” she said. She wondered, she added, who would represent “the Daves and Robbies, who represents the Emilys and Amys?”

“As attorney general,” she said, “I choose you.”

I would like to extract all quoted material, in this case an array with these results:

"I looked at it this way, the governor’s going to be O.K.,"
"the Daves and Robbies, who represents the Emilys and Amys?"
"As attorney general,"
"I choose you."

Assumptions:

  • There will always be a matching opening & closing quotation
  • Simple double-quotes

Bonus points if you also ensure it handles curly-quotes, single-quotes, and other special cases, but feel free to go on the assumption of plain double-quotes if that makes it easier.

And yes, I have searched the site for answers, and while there were some things that seemed helpful, I didn't find anything that worked. The closest was this, but no dice:

preg_match_all('/"([^"]*(?:\\"[^"]*)*)"/', $content, $matches)

Solution

$string = 'Ms. Kane, who was elected attorney general last year and has been mentioned as a possible future candidate for governor, struck a political note in her brief announcement to an audience that cheered and applauded her decision.

“I looked at it this way, the governor’s going to be O.K.,” she said. She wondered, she added, who would represent “the Daves and Robbies, who represents the Emilys and Amys?”

“As attorney general,” she said, “I choose you.”';

// Normalize quotes
$search = array("\xe2\x80\x9c", "\xe2\x80\x9d", "\xe2\x80\x98", "\xe2\x80\x99"); 
$replace = array('"', '"', "'", "'");
$newstring = str_replace($search, $replace, $string);

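// Extract text (U = ungreedy, s = let "." match newlines inside a quote)
$regex = '/"(.*)"/Us';
preg_match_all($regex, $newstring, $output);

// $output[1] holds the captured text without the surrounding quotes
if (!empty($output[1])) {
    print_r($output[1]);
} else {
    echo $newstring;
}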

This should give:

Array
(
    [0] => I looked at it this way, the governor's going to be O.K.,
    [1] => the Daves and Robbies, who represents the Emilys and Amys?
    [2] => As attorney general,
    [3] => I choose you.
)

Other tips

You might try splitting the string in PHP.

Pseudocode:

Split everything into an array using " as the delimiter, then use % 2 (modulus) on the index to select only the "in-between" elements of the array. To catch curly quotes etc., simply convert all of them to straight quotes first.
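The pseudocode above could look roughly like this in PHP. It is only a sketch: the sample text and the names $text, $parts and $quotes are made up for illustration, and it relies on the question's assumption that every opening quote has a matching closing quote.

```php
<?php
// Illustrative input; not taken from the original answer.
$text = 'He said "hello" and then “goodbye” before leaving.';

// Convert curly double quotes to straight ones first
$text = str_replace(array('“', '”'), '"', $text);

// Splitting on " leaves the quoted text at the odd indexes: 1, 3, 5, ...
$parts = explode('"', $text);

$quotes = array();
foreach ($parts as $i => $part) {
    if ($i % 2 === 1) {
        $quotes[] = $part;
    }
}

print_r($quotes);
```

If a quote is left unclosed, everything after it lands in an odd slot, so this only works for well-formed input.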

You can use this:

$matches = array();
preg_match_all('/(“.*”)/Uu', str_replace("\n", " ", $str), $matches);
print_r($matches);

Note that I replace newlines with spaces first. By default . does not match a newline, so without this step a quote that starts on one line and finishes on another would not be matched.
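Another option, instead of replacing the newlines, is to let the pattern handle them. This is a minimal sketch assuming the text is UTF-8 and uses curly double quotes; $str is just an illustrative input. The s modifier lets . match newlines, the u modifier makes PCRE treat pattern and subject as UTF-8, and the capture group is moved inside the quotes so the results come back without them.

```php
<?php
// Illustrative input with a quote that spans two lines.
$str = "She said, “I choose\nyou.” Then she added, “Thank you.”";

// u: treat pattern and subject as UTF-8
// s: let "." match newlines as well
// .*? is the lazy quantifier, the same effect as .* under the U modifier
preg_match_all('/“(.*?)”/us', $str, $matches);

print_r($matches[1]);
```

Unlike the str_replace() version, the newline stays inside the extracted text, so strip it afterwards if you don't want it.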

The simplest way, though not the best, is to find the occurrences of " with strpos() and then use substr() to cut out the string.

$string = 'Your long text "with quotation"';

$occur = strpos($string, '"'); // the first occurrence of "
$occur2 = strpos($string, '"', $occur + 1); // the second occurrence of "

$start = $occur; // where the cut starts
$length = $occur2 - $occur + 1; // length of the quoted text, quotes included

$res = substr($string, $start, $length); // your quoted text, e.g. "with quotation"

You can put this in a loop to handle more than one quoted text:

   $string = 'Your long text "with quotation" Another long text "and text with quotation"';

    $offset = 0;     // position to start the next search from
    $resString = ''; // if you want a string and not an array
    $res = array();

    // Stop as soon as no further opening quote is found
    while (($occur = strpos($string, '"', $offset)) !== false) {
        $occur2 = strpos($string, '"', $occur + 1); // the matching closing quote
        if ($occur2 === false)
            break; // unmatched opening quote

        $length = $occur2 - $occur + 1; // length of the quoted text, quotes included
        $quoted = substr($string, $occur, $length);

        $res[] = $quoted;      // collect in an array
        $resString .= $quoted; // or concatenate into a string

        $offset = $occur2 + 1; // continue the search after the closing quote
    }

    echo $resString .'<br>';
    print_r($res);

Result:

 "with quotation""and text with quotation"
 or
 Array ( [0] => "with quotation" [1] => "and text with quotation" )

It's a simple way to do it without regular expressions. I hope it helps someone :)

You can do it like this:

<meta charset="UTF-8" />
<pre>
<?php
// (?<=\\\\) requires a backslash before the quote, so escaped quotes stay inside the match
$pattern = '~(?|"((?>[^"]++|(?<=\\\\)")*)"|“((?>[^”]++|(?<=\\\\)”)*)”)~u';

$text = <<<LOD
Ms. Kane, who was elected attorney general last year and has been mentioned as a possible future candidate for governor, struck a political note in her brief announcement to an audience that cheered and applauded her decision.

“I looked at it this way, the governor’s going to be O.K.,” she said. She wondered, she added, who would represent “the Daves and Robbies, who represents the Emilys and Amys?”

“As attorney general,” she said, “I choose you.”
LOD;

preg_match_all ($pattern, $text, $matches);
print_r($matches[1]);

Since you use Unicode characters, you must add the u modifier at the end of the pattern.

You can easily add what you want to the pattern in the same way, for example with single quotes:

$pattern = '~(?|"((?>[^"]++|(?<=\\\\)")*)"|“((?>[^”]++|(?<=\\\\)”)*)”|\'((?>[^\']++|(?<=\\\\)\')*)\')~u';

Note that the structure is always the same (shown here as written inside the single-quoted PHP string):

(?|
    "((?>[^"]++|(?<=\\\\)")*)"
  |
    “((?>[^”]++|(?<=\\\\)”)*)”
  |
    \'((?>[^\']++|(?<=\\\\)\')*)\'
)
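
Because every branch has the same shape, the pattern can also be generated from a list of delimiter pairs. The following is only a sketch: build_quote_pattern() is a hypothetical helper, not something from PHP or from the answer above, and it assumes each delimiter is a single character and that an escaped closing delimiter is one preceded by a backslash.

```php
<?php
// Hypothetical helper: builds the branch-reset pattern from a list of
// [opening, closing] delimiter pairs (single characters assumed).
function build_quote_pattern(array $pairs): string
{
    $branches = array();
    foreach ($pairs as $pair) {
        $open  = preg_quote($pair[0], '~');
        $close = preg_quote($pair[1], '~');
        // Opening delimiter, then any run of non-closing characters or a
        // closing delimiter preceded by a backslash, then the closing delimiter
        $branches[] = $open . '((?>[^' . $close . ']++|(?<=\\\\)' . $close . ')*)' . $close;
    }
    // (?| ... ) resets group numbering, so every branch captures into group 1
    return '~(?|' . implode('|', $branches) . ')~u';
}

$pattern = build_quote_pattern(array(array('"', '"'), array('“', '”')));

// Illustrative input with backslash-escaped quotes inside the first quotation
$text = 'He said "a \"b\" c" and “d”.';
preg_match_all($pattern, $text, $matches);

print_r($matches[1]);
```

Adding single quotes is then just one more pair, array("'", "'"), in the list.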
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow