Inject code after X paragraphs but avoiding tables

Question 1

I'm sometimes a bit crazy, sometimes I go for patterns that are lazy, but this time I'm going for something hazy.

$input = 'test <table><p>wuuut</p><table><p>lolwut</p></table></table> <p>foo bar</p> test1 <p>baz qux</p> test3'; # Some input
$insertAfter = 2; # Insert after N p tags
$code = 'CODE'; # The code we want to insert

$regex = <<<'regex'
~
# let's define something
(?(DEFINE)
   (?P<table>                     # To match nested table tags
      <table\b[^>]*>
         (?:
            (?!</?table\b[^>]*>).
         |
            (?&table)
         )*
      </table\s*>
   )
   (?P<paragraph>                 # To match nested p tags
      <p\b[^>]*>
         (?:
            (?!</?p\b[^>]*>).
         |
            (?&paragraph)
         )*
      </p\s*>
   )
)
(?&table)(*SKIP)(*FAIL)           # Let's skip table tags
|
(?&paragraph)                     # And match p tags
~xsi
regex;

$output = preg_replace_callback($regex, function($m)use($insertAfter, $code){
    static $counter = 0; # A counter
    $counter++;
    if($counter === $insertAfter){ # Nth <p> outside any table: append the code right after it
        return $m[0] . $code;
    }else{
        return $m[0];
    }
}, $input);

var_dump($output); # Let's see what we've got
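
If you need this more than once, here is a minimal sketch that wraps the same pattern in a function. The name insertAfterParagraph() and the by-reference counter are my own illustrative choices, not part of the snippet above; the counter is a local variable, so every call starts counting from zero.

```php
<?php
// Hypothetical wrapper around the pattern above; the function name and the
// by-reference counter are illustrative assumptions, not part of the answer.
function insertAfterParagraph(string $input, string $code, int $insertAfter): string
{
    $regex = <<<'regex'
~
(?(DEFINE)
   (?P<table>     <table\b[^>]*> (?: (?!</?table\b[^>]*>). | (?&table)     )* </table\s*> )
   (?P<paragraph> <p\b[^>]*>     (?: (?!</?p\b[^>]*>).     | (?&paragraph) )* </p\s*>     )
)
(?&table)(*SKIP)(*FAIL)   # consume whole tables, then discard them
|
(?&paragraph)             # only paragraphs outside tables reach the callback
~xsi
regex;

    $counter = 0; // local to this call, so repeated calls start from zero
    $result = preg_replace_callback(
        $regex,
        function ($m) use ($insertAfter, $code, &$counter) {
            $counter++;
            return $counter === $insertAfter ? $m[0] . $code : $m[0];
        },
        $input
    );

    return $result ?? $input; // preg_replace_callback() returns null on a PCRE error
}

$input = 'test <table><p>wuuut</p><table><p>lolwut</p></table></table> <p>foo bar</p> test1 <p>baz qux</p> test3';
echo insertAfterParagraph($input, 'CODE', 2), PHP_EOL;
```

With this input the two paragraphs inside the nested tables are not counted, so CODE should land right after <p>baz qux</p>.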

Question 2

EDIT: It was late last night.

The PREG_SPLIT_DELIM_CAPTURE approach was neat, but I am now adding a better idea (Method 1).
I also improved Method 2 by replacing strstr with a faster substr.

METHOD 1: preg_replace_callback with (*SKIP)(*FAIL) (better)

Let's do a direct replace on the text that is certifiably table-free using a callback to your inject function.

Here's a regex to match table-free text:

$regex = "~(?si)(?!<table>).*?(?=<table|</table)|<table.*?</table>(*SKIP)(*FAIL)~";

In short, the pattern either matches a run of text that lies outside any table, or matches a complete table and then deliberately fails via (*SKIP)(*FAIL), so the table is skipped.

Here's your replacement:

$injectedString = preg_replace_callback($regex,
        function($m) use ($text) { return inject($text, $m[0]); }, # $text must be imported into the closure
            $data);

Much shorter!
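
To make that runnable end to end, here is a minimal sketch. The inject() below is only a stand-in I am assuming for illustration (it inserts $text after the Nth closing </p> using explode and implode); your real inject() and its signature may differ, and the sample $data is made up.

```php
<?php
// Stand-in for the asker's inject(): an assumption for illustration only.
// It inserts $text after the $after-th closing </p> of $html, if there is one.
function inject(string $text, string $html, int $after = 1): string
{
    $parts = explode('</p>', $html);
    if (count($parts) > $after) {
        $parts[$after] = $text . $parts[$after];
    }
    return implode('</p>', $parts);
}

$text = 'CODE';
// Made-up sample input: two table-free chunks, each followed by a table.
$data = '<p>a</p><p>b</p><table><tr><td><p>x</p><p>y</p></td></tr></table><p>c</p><p>d</p><table></table>';
$regex = "~(?si)(?!<table>).*?(?=<table|</table)|<table.*?</table>(*SKIP)(*FAIL)~";

$injectedString = preg_replace_callback(
    $regex,
    function ($m) use ($text) { // $text has to be imported into the closure
        return inject($text, $m[0]);
    },
    $data
);

echo $injectedString, PHP_EOL;
```

Keep in mind that the callback runs once per table-free chunk, so any paragraph counting inside inject() restarts after each table. In this sample, CODE is therefore inserted in both chunks.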

And here's a demo of $regex showing you how it matches elements that don't contain a table.

$text = "<table> to 
</table>not a table # 1<table> to 
</table>NOT A TABLE # 2<table> to 
</table>";
$regex = "~(?si)(?!<table>).*?(?=<table|</table)|<table.*?</table>(*SKIP)(*FAIL)~";
$a = preg_match_all($regex,$text,$m);
print_r($m);

The output: Array ( [0] => Array ( [0] => not a table # 1 [1] => NOT A TABLE # 2 ) )

Of course, if the HTML is not well formed or $data starts in the middle of a table, all bets are off. If that's a problem, let me know and we can work on the regex.
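
Two cases the regex above does not cover are table tags that carry attributes and plain text after the last table. One possible variant, which is my assumption rather than something tested against your data, puts the table branch first and then matches any run of characters in which no <table tag starts:

```php
<?php
// Variant pattern (an assumption, not the regex from the answer above):
// the table branch comes first and tolerates attributes; the second branch
// matches any run of characters in which no <table tag starts.
$regex = '~<table\b.*?</table\s*>(*SKIP)(*FAIL)|(?:(?!<table\b).)+~si';

// Made-up sample: one table with an attribute, and trailing text at the end.
$text = '<table class="x"> to </table>not a table # 1<table> to </table>trailing text';
preg_match_all($regex, $text, $m);
print_r($m[0]);
```

This still assumes flat, well-formed tables: the lazy .*? stops at the first </table>, so nested tables would leave part of the outer table in the "table-free" matches.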

METHOD 2

Here is the first solution that came to mind.

In short, I would look at using preg_split with the PREG_SPLIT_DELIM_CAPTURE flag.

The basic idea is to isolate the tables using a special preg_split, and to perform your injections on the elements that are certifiably table-free.

A. Step 1: Split $data using an unusual delimiter: a full table sequence, from <table to </table>

This is achieved with a delimiter specified by a regex pattern such as (?s)<table.*?</table>

Note that the pattern leaves <table unclosed (no >) so that it still matches when the tag carries a class or other attributes.

So you have something like

$tableseparator = preg_split( "~(?s)(<table.*?</table>)~", $data, -1, PREG_SPLIT_DELIM_CAPTURE );

The benefit of this PREG_SPLIT_DELIM_CAPTURE flag is that the whole delimiter, which we capture thanks to the parentheses in the regex pattern, becomes an element in the array, so that we can isolate the tables without losing them. [See demo of this at the bottom.] This way, your string is broken into clean "table-free" and "is-a-table" pieces.

B. Step 2: Iterate over the $tableseparator elements. For each element, do a

if(substr($tableseparator[$i],0,6)=="<table")

If <table is found, leave the element alone (don't inject). If it isn't found, that element is clean, and you can do your inject() magic on it.

C. Step 3: Put the elements of $tableseparator back together (implode just like you do in your inject function).

So you have a two-level explosion and implosion, first with preg_split, second with your explode!

Sorry that I don't have time to code everything in detail, but I'm certain that you can figure it out. :)
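
For reference, the three steps above can be sketched roughly like this. The $inject closure is only a stand-in I am assuming for your inject() (it inserts $text after the first closing </p>), and the sample $data is made up.

```php
<?php
// Stand-in for the asker's inject(), written as a closure; an assumption for
// illustration only. It inserts $text after the first closing </p> of $html.
$inject = function (string $text, string $html): string {
    $parts = explode('</p>', $html, 2);
    return count($parts) === 2 ? $parts[0] . '</p>' . $text . $parts[1] : $html;
};

$text = 'CODE';
$data = '<p>a</p><p>b</p><table class="t"><tr><td><p>x</p></td></tr></table><p>c</p>';

// Step 1: split on whole tables, keeping each table as its own array element.
$tableseparator = preg_split('~(?s)(<table.*?</table>)~', $data, -1, PREG_SPLIT_DELIM_CAPTURE);

// Step 2: inject only into the elements that are not tables.
foreach ($tableseparator as $i => $piece) {
    if (substr($piece, 0, 6) !== '<table') {
        $tableseparator[$i] = $inject($text, $piece);
    }
}

// Step 3: put the pieces back together in their original order.
$injectedString = implode('', $tableseparator);
echo $injectedString, PHP_EOL;
```

As with Method 1, the injection happens once per table-free piece, so a paragraph count kept inside inject() starts over after every table.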

preg_split with PREG_SPLIT_DELIM_CAPTURE demo

Here's a demo of how the preg_split works:

$text = "Hi@There@@Oscar@@@@";
$regex = "~(@+)~";
$a = preg_split($regex,$text,-1,PREG_SPLIT_DELIM_CAPTURE);
print_r($a);

The Output: Array ( [0] => Hi [1] => @ [2] => There [3] => @@ [4] => Oscar [5] => @@@@ [6] => )

See how in this example the delimiters (the @ sequences) are preserved? You have surgically isolated them but not lost them, so you can work on the other strings then put everything back together.