Question

I recently sent a newsletter to an old email database I own. A couple of years after it was created, approximately 30% of those addresses appear to be inactive: I've received thousands of Mail Delivery Failure messages.

All these failure notifications are stored on my server as text files, and they are replies containing the text I sent to my subscribers. The text of each email contains the user's id. This id is preceded by a bit of common text, something like

<a href="abc.com?id=123321"></a>

and it's '123321' I want to extract from each failure report I have received.

At first I did this manually, collecting the ids one by one. After 500 emails my eyes were falling out, and I'm sure there's a solution using PHP and some of its functions. I was thinking of putting all of them into one big file and finding a way to do it with preg_match or some other regular expression approach.

How would you deal with such a problem and where should I look for a solution?


Solution

This appears to work for me on a small data sample. It should also work on the full set, as long as loading all the data at once doesn't exhaust memory:

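// Load all failure reports (concatenated into one file) into memory.
$data = file_get_contents("data.txt");
// The lookbehind and lookahead anchor on the link markup, so only the digits are matched.
preg_match_all('#(?<=<a href="abc\.com\?id=)\d+(?="></a>)#', $data, $matches);
// The extracted ids end up in $matches[0].
print_r($matches);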
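If memory does become a problem, one option is to skip the single big file and scan the reports one at a time. The following is only a sketch under a few assumptions: the function names `extractIds` and `collectIds` are made up for illustration, the reports are assumed to sit in one directory as `.txt` files, and the link is assumed to appear in each report exactly as in the question (if the bounces are quoted-printable encoded, `id=` may show up as `id=3D` and the pattern would need adjusting).

```php
<?php
// Hypothetical helper: returns every id found in the text of one failure report.
function extractIds(string $text): array
{
    // Same pattern as above: match only the digits between 'id=' and '"></a>'.
    preg_match_all('#(?<=<a href="abc\.com\?id=)\d+(?="></a>)#', $text, $matches);
    return $matches[0];
}

// Hypothetical helper: reads the reports in $dir one file at a time,
// so only a single report is held in memory at any moment.
function collectIds(string $dir): array
{
    $ids = [];
    foreach (glob($dir . '/*.txt') ?: [] as $file) {
        $text = file_get_contents($file);
        if ($text === false) {
            continue; // unreadable file, skip it
        }
        foreach (extractIds($text) as $id) {
            $ids[$id] = true; // using the id as a key removes duplicates
        }
    }
    // PHP turns numeric string keys into integers, so cast them back to strings.
    return array_map('strval', array_keys($ids));
}
```

Calling `collectIds()` with the directory that holds the reports should return each id once, even if the same subscriber bounced several times.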
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow