Question

I've gotten some great help here and I am so close to solving my problem that I can taste it. But I seem to be stuck.

I need to scrape a simple form from a local web server and return only the lines that match a user's local email address (e.g. onemyndseye@localhost). simplehtmldom makes easy work of extracting the correct form element:

foreach($html->find('form[action*="delete"]') as $form) echo $form;

Returns:

<form action="/delete" method="post">
    <input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
        http://www.linux.com/rss/feeds.php
    </a> [email: 
        onemyndseye@localhost (Default)
    ]<br />         
    <input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">
        http://www.ubuntu.com/rss.xml
    </a> [email: 
        onemyndseye@localhost (Default)
    ]<br />         
<input type="submit" name="delete_submit" value="Delete Selected" /></form>

However, I am having trouble with the next step: returning only the lines that contain 'onemyndseye@localhost', with the address itself removed, so that only the following is returned:

<input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">http://www.linux.com/rss/feeds.php</a> <br />
<input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">http://www.ubuntu.com/rss.xml</a> <br />

Thanks to the wonderful users of this site I've gotten this far and can even return just the links, but I am having trouble getting the rest. It's important that the complete <input> tags are returned EXACTLY as shown above, as the id and name values will need to be passed back to the original form as POST data later on.

Thanks in advance!

***** EDIT ******

The issue is close to solved now, thanks to Yacoby. The last small hurdle is that some trash is left behind by the str_ireplace. Perhaps it would be easier to remove all text between </a> and <br />?

After Yacoby's additions the output is as follows:

<form action="/delete" method="post">
    <input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
        http://www.linux.com/rss/feeds.php
    </a> [email: 
         (Default)
    ]<br />         
    <input type="checkbox" id="D2" name="D2" /><a href="http://www.ubuntu.com/rss.xml">
        http://www.ubuntu.com/rss.xml
    </a> [email: 
         (Default)
    ]<br />         
    <input type="checkbox" id="D3" name="D3" /><a href="http://mythbuntu.org/rss.xml">
        http://mythbuntu.org/rss.xml
    </a> [email: 

    ]<br />         
<input type="submit" name="delete_submit" value="Delete Selected" /></form>

Notice that [email: (Default)] and [email: ] have been left behind. I would also need to remove the form action and submit lines at the end, but I think I can work that part out from the previous suggestion.

***** SOLVED ****

Issue solved with:

$html = file_get_html('http://localhost:9000/');
foreach ($html->find('form[action*="delete"]') as $form) {
    if (stripos($form->innertext, 'onemyndseye@localhost') !== false) {
        // Drop everything between </a> and <br />; the "s" modifier lets "." match newlines
        $form = preg_replace('!</a>.*?<br />!s', '</a><br />', $form);
        echo $form;
    }
}

Thanks for the help!
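
For anyone who wants to try the regex without a running server, here is a minimal sketch that applies the same pattern to a plain string. The sample markup is copied from the output above, and strip_email_notes() is a hypothetical helper name, not part of simplehtmldom.

```php
<?php
// Sample markup taken from the question; in real use this would be the
// string form of the <form> element returned by simplehtmldom.
$formHtml = <<<'HTML'
<form action="/delete" method="post">
    <input type="checkbox" id="D1" name="D1" /><a href="http://www.linux.com/rss/feeds.php">
        http://www.linux.com/rss/feeds.php
    </a> [email: 
        onemyndseye@localhost (Default)
    ]<br />
<input type="submit" name="delete_submit" value="Delete Selected" /></form>
HTML;

// Hypothetical helper: removes whatever sits between </a> and <br />.
function strip_email_notes(string $html): string
{
    // ".*?" is non-greedy, so each match stops at the first <br />;
    // the "s" modifier lets "." match the newlines inside the note.
    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace('!</a>.*?<br />!s', '</a><br />', $html) ?? $html;
}

$cleaned = strip_email_notes($formHtml);
echo $cleaned;
```

The <input> tags are left untouched because the pattern only starts matching at </a>. The form and submit lines are still present in the result and would need a separate step.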

Solution

Maybe something like

if ( stripos($form->innertext, 'onemyndseye@localhost') !== false ){
    $form->innertext = str_ireplace('onemyndseye@localhost', '', $form->innertext);
    echo $form;
}

This won't work with html like

<b>onemyndseye</b>@localhost

It is easy to check whether the text with tags removed matches a string by using plaintext, but it is far harder to do the replacement, because the match has to be mapped back onto the original markup.
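
A small sketch of that limitation, using PHP's built-in strip_tags() as a rough stand-in for simplehtmldom's plaintext (an assumption made only to keep the example self-contained):

```php
<?php
$markup = '<b>onemyndseye</b>@localhost';
$needle = 'onemyndseye@localhost';

// With the tags stripped, the address is one contiguous string and is found.
$foundInText = stripos(strip_tags($markup), $needle) !== false;

// In the raw markup the address is split by </b>, so nothing is replaced.
$replaced = str_ireplace($needle, '', $markup, $count);

var_dump($foundInText); // bool(true)
var_dump($count);       // int(0)
```

So the check on the stripped text succeeds while the replacement on the markup silently does nothing.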

Licensed under: CC-BY-SA with attribution