EDIT: It was late last night. The PREG_SPLIT_DELIM_CAPTURE was neat, but I am now adding a better idea (Method 1). I also improved Method 2 by replacing the strstr with a faster substr.
METHOD 1: preg_replace_callback with (*SKIP)(*FAIL) (better)
Let's do a direct replace on the text that is certifiably table-free, using a callback to your inject function.
Here's a regex to match table-free text:
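$regex = "~(?si)(?!<table).*?(?=<table|</table)|<table.*?</table>(*SKIP)(*FAIL)~";
In short, the first alternative matches a run of text that sits outside any table; the second alternative matches a complete table and then deliberately fails with (*SKIP)(*FAIL), so the engine jumps past the table without reporting it. The negative lookahead is written (?!<table, without the closing >, so that tables carrying attributes such as a class are skipped as well.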
Here's your replacement:
$injectedString = preg_replace_callback($regex,
// "use ($text)" makes $text visible inside the closure
function($m) use ($text) {return inject($text,$m[0]);},
$data);
Much shorter!
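To see the whole thing run end to end, here is a minimal sketch. The inject() below is a hypothetical stand-in (I'm assuming yours takes the text to inject plus a chunk of HTML and returns the modified chunk), and the sample $text and $data values are made up; swap in your real function and data. The lookahead is written (?!<table, without the closing bracket, so that a table with attributes is skipped too.

```php
<?php
// Hypothetical stand-in for your inject(): it just wraps the chunk so the
// effect is visible. Replace it with your real function.
function inject($text, $chunk)
{
    return '[' . $text . ':' . $chunk . ']';
}

// Alternative 1: text outside a table. Alternative 2: a whole table, skipped.
$regex = "~(?si)(?!<table).*?(?=<table|</table)|<table.*?</table>(*SKIP)(*FAIL)~";

$text = 'AD'; // made-up value to inject
$data = 'intro<table class="x"><tr><td>cell</td></tr></table>middle<table>t2</table>';

$injectedString = preg_replace_callback(
    $regex,
    function ($m) use ($text) {
        // $m[0] is one table-free run of text
        return inject($text, $m[0]);
    },
    $data
);

echo $injectedString, "\n";
// [AD:intro]<table class="x"><tr><td>cell</td></tr></table>[AD:middle]<table>t2</table>
```

Both tables come through untouched, and only "intro" and "middle" are handed to the callback.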
And here's a demo of $regex showing you how it matches the pieces of text that lie outside the tables.
$text = "<table> to
</table>not a table # 1<table> to
</table>NOT A TABLE # 2<table> to
</table>";
$regex = "~(?si)(?!<table).*?(?=<table|</table)|<table.*?</table>(*SKIP)(*FAIL)~";
$a = preg_match_all($regex,$text,$m);
print_r($m);
The output: Array ( [0] => Array ( [0] => not a table # 1 [1] => NOT A TABLE # 2 ) )
Of course, if the HTML is not well formed or $data starts in the middle of a table, all bets are off. If that's a problem, let me know and we can work on the regex.
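One more gap worth knowing about: because the lookahead needs a following <table or </table, text that comes after the last table is never matched. If that matters for your data, one possible variant (my own sketch, not the regex above, and not tested against your real HTML) puts the table-skipping branch first and lets the text branch run up to the next <table or the end of the string. The sample string is made up.

```php
<?php
// Variant pattern (an assumption, not the regex from Method 1): skip whole
// tables first, then match any non-empty run of text up to the next <table
// or the end of the string (\z).
$regex = '~(?si)<table.*?</table>(*SKIP)(*FAIL)|.+?(?=<table|\z)~';

$sample = 'intro<table class="x">a</table>middle<TABLE>b</TABLE>tail';

preg_match_all($regex, $sample, $m);
print_r($m[0]); // the three table-free runs: intro, middle, tail
```

Using .+? rather than .*? keeps the callback from being invoked on empty strings. An unclosed <table is still treated as plain text here, so malformed HTML remains a problem.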
METHOD 2
Here is the first solution that came to mind.
In short, I would look at using preg_split with the PREG_SPLIT_DELIM_CAPTURE flag.
The basic idea is to isolate the tables using a special preg_split, and to perform your injections on the elements that are certifiably table-free.
A. Step 1: Split $data using an unusual delimiter: a full table sequence, from <table to </table>.
This is achieved with a delimiter specified by a regex pattern such as (?s)<table.*?</table>
Note that I am not closing <table in case you have a class there.
So you have something like:
$tableseparator = preg_split( "~(?s)(<table.*?</table>)~", $data, -1, PREG_SPLIT_DELIM_CAPTURE );
The benefit of the PREG_SPLIT_DELIM_CAPTURE flag is that the whole delimiter, which we capture thanks to the parentheses in the regex pattern, becomes an element in the array, so that we can isolate the tables without losing them. [See demo of this at the bottom.] This way, your string is broken into clean "table-free" and "is-a-table" pieces.
B. Step 2: Iterate over the $tableseparator elements. For each element, check
if(substr($tableseparator[$i],0,6)=="<table")
If the element starts with <table, leave it alone (don't inject). If it doesn't, that element is clean, and you can do your inject() magic on it.
C. Step 3: Put the elements of $tableseparator back together (implode, just like you do in your inject function).
So you have a two-level explosion and implosion, first with preg_split, second with your explode!
Sorry that I don't have time to code everything in detail, but I'm certain that you can figure it out. :)
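For reference, the three steps above can be sketched roughly like this. The inject() here is a hypothetical stand-in that simply swaps one word so the effect is visible, and injectOutsideTables() is a name I made up for the wrapper; adapt both to your real code.

```php
<?php
// Hypothetical stand-in for your inject(); replace with the real one.
function inject($text, $chunk)
{
    return str_replace('cat', $text, $chunk);
}

// Made-up helper name: applies inject() to everything outside tables.
function injectOutsideTables($text, $data)
{
    // Step 1: split on whole tables; the capture group keeps them in the array.
    $tableseparator = preg_split(
        '~(?s)(<table.*?</table>)~',
        $data,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    // Step 2: inject only into the pieces that do not start with <table.
    foreach ($tableseparator as $i => $piece) {
        if (substr($piece, 0, 6) !== '<table') {
            $tableseparator[$i] = inject($text, $piece);
        }
    }

    // Step 3: glue the pieces back together in their original order.
    return implode('', $tableseparator);
}

$data = 'a cat<table><tr><td>cat</td></tr></table>another cat';
$result = injectOutsideTables('dog', $data);
echo $result, "\n";
// a dog<table><tr><td>cat</td></tr></table>another dog
```

The "cat" inside the table survives, while the two outside it are replaced. Note that the substr check is case-sensitive, so an uppercase <TABLE would need the (?i) flag in the pattern and a case-insensitive comparison.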
preg_split with PREG_SPLIT_DELIM_CAPTURE demo
Here's a demo of how the preg_split works:
$text = "Hi@There@@Oscar@@@@";
$regex = "~(@+)~";
$a = preg_split($regex,$text,-1,PREG_SPLIT_DELIM_CAPTURE);
print_r($a);
The Output: Array ( [0] => Hi [1] => @ [2] => There [3] => @@ [4] => Oscar [5] => @@@@ [6] => )
See how in this example the delimiters (the @ sequences) are preserved? You have surgically isolated them but not lost them, so you can work on the other strings then put everything back together.