Question

Consider the following regular expression:

/\<form.+?((action|id|method|name)=(\"|\')(.*?)?(\"|\')).*?\>/i

It should be enough to capture something as basic as <form> but also something like <form action="post.php" method="post" name="form1"> and other various combinations of those four attributes listed in the expression above.

I chose this expression over a basic /\<form.*?\>/i because I want to get the values from capture groups 2 and 4 (the attribute name and the attribute value). However, when I run this expression on a form element like the complex one above, it returns only action and post.php. I would like it to return an array of matches.

Here is some example code:

<?php
    $string = '<form action="post.php" method="post" name="form1">';
    preg_match_all('/\<form.+?((action|id|method|name)=(\"|\')(.*?)?(\"|\')).*?\>/i', $string, $forms);
    print_r($forms);
?>

If I run this in the command line for demonstration purposes, here is the output:

c:\Users\Aaron\Desktop>php test.php
Array
(
    [0] => Array
        (
            [0] => <form action="post.php" method="post" name="form1">
        )

    [1] => Array
        (
            [0] => action="post.php"
        )

    [2] => Array
        (
            [0] => action
        )

    [3] => Array
        (
            [0] => "
        )

    [4] => Array
        (
            [0] => post.php
        )

    [5] => Array
        (
            [0] => "
        )

)

My desired output would be like this:

c:\Users\Aaron\Desktop>php test.php
Array
(
    [0] => Array
        (
            [0] => <form action="post.php" method="post" name="form1">
            [1] => <form action="post.php" method="post" name="form1">
            [2] => <form action="post.php" method="post" name="form1">
        )

    [1] => Array
        (
            [0] => action="post.php"
            [1] => method="post"
            [2] => name="form1"
        )

    [2] => Array
        (
            [0] => action
            [1] => method
            [2] => name
        )

    [3] => Array
        (
            [0] => "
            [1] => "
            [2] => "
        )

    [4] => Array
        (
            [0] => post.php
            [1] => post
            [2] => form1
        )

    [5] => Array
        (
            [0] => "
            [1] => "
            [2] => "
        )

)

I am currently able to work around this by finding the form element and then running an expression once for each attribute I want to search for. But I can't help thinking there must be an easier way.

So the question is: can I return all the matches from a capture group, instead of just the first match?

Thanks in advance.


Solution

My sincere advice is not to handle HTML with regular expressions; simply use a DOM parser instead.

The code:

<?php
$string = '<form action="post.php" method="post" name="form1">';
$dom = new DOMDocument;
$dom->loadHTML($string);
$attrib = []; // attribute name => attribute value
foreach ($dom->getElementsByTagName('form') as $ftag) {
    if ($ftag->hasAttributes()) {
        foreach ($ftag->attributes as $attribute) {
            $attrib[$attribute->nodeName] = $attribute->nodeValue;
        }
    }
}
print_r($attrib);

Output:

Array
(
    [action] => post.php
    [method] => post
    [name] => form1
)
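
If you only care about the four attributes from the question, or the input may contain more than one form, the same DOM approach can be wrapped up roughly like this. This is only a sketch: the function name formAttributes and its $wanted parameter are made up for illustration, and it assumes the input is a non-empty HTML string. It returns one array per form instead of merging everything into a single array.

```php
<?php
// Hypothetical helper (not part of the original answer): collect selected
// attributes from every <form> element found in an HTML string.
function formAttributes(string $html, array $wanted = ['action', 'id', 'method', 'name']): array
{
    $dom = new DOMDocument;

    // Real-world markup is often invalid; collect libxml warnings internally
    // instead of emitting them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $forms = [];
    foreach ($dom->getElementsByTagName('form') as $form) {
        $attrs = [];
        foreach ($wanted as $name) {
            // Only keep attributes that are actually present on this form.
            if ($form->hasAttribute($name)) {
                $attrs[$name] = $form->getAttribute($name);
            }
        }
        $forms[] = $attrs;
    }
    return $forms;
}

print_r(formAttributes('<form action="post.php" method="post" name="form1"></form>'));
```

Each element of the returned array corresponds to one form, so a page with several forms keeps their attributes separate.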

OTHER TIPS

You have to find a form element first.

<?php
 $string = '<form action="post.php" method="post" name="form1">';
 // Match each opening <form ...> tag, including its attributes.
 preg_match_all('/<form\b[^>]*>/i', $string, $forms);

And then apply a second regex to the inside of each tag:

 foreach ($forms[0] as $form) {
  // Group 1 is the attribute name; group 2 is the value, quotes included.
  preg_match_all('/\s(action|id|method|name)=("[^"]*"|\'[^\']*\')/i', $form, $attrs);
  print_r($attrs);
 }
?>

I don't have the equipment to try if it's working. Hope that it does :)
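
For what it's worth, the reason the original pattern returns only one attribute is that it consumes the whole tag in a single match, and each match fills each capture group only once. Matching the attributes separately makes every attribute its own match. Here is a rough, self-contained sketch of the two-pass idea above. The function name formAttributesRegex is made up for illustration, and it assumes quoted attribute values with no spaces around the equals sign, so it will miss unquoted values and can be fooled by unusual markup.

```php
<?php
// Hypothetical helper (illustration only): regex-based extraction in two passes.
function formAttributesRegex(string $html): array
{
    $result = [];

    // Pass 1: find each opening <form ...> tag.
    preg_match_all('/<form\b[^>]*>/i', $html, $tags);

    foreach ($tags[0] as $tag) {
        // Pass 2: every attribute is a separate match, so all of them come back.
        // Group 1 = attribute name, group 2 = opening quote, group 3 = value.
        preg_match_all(
            '/\s(action|id|method|name)=(["\'])(.*?)\2/i',
            $tag,
            $attrs,
            PREG_SET_ORDER
        );

        $found = [];
        foreach ($attrs as $m) {
            $found[strtolower($m[1])] = $m[3];
        }
        $result[] = $found;
    }
    return $result;
}

print_r(formAttributesRegex('<form action="post.php" method="post" name="form1">'));
```

PREG_SET_ORDER groups the results per match rather than per capture group, which makes it easy to build a name => value array for each form.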

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow