Collecting specific data from server emails (delivery failure reports)

https://stackoverflow.com/questions/10403661

05-06-2021

Question

I recently sent a newsletter to an old email database I own. A couple of years after it was created, approximately 30% of those addresses appear to be inactive: I've received thousands of Mail Delivery Failure messages.

All these failure notifications are stored on my server as text files, and each one is a reply that contains the text I sent to my subscribers. The text of each email includes the user's id, preceded by a bit of common text, something like

<a href="abc.com?id=123321"></a>

and it's '123321' I want to extract from each failure report I have received.

At first I did this manually, collecting the ids one by one. After 500 emails my eyes were falling out, and I'm sure there's a solution using PHP and a few of its functions. I was thinking of putting all the reports into one big file and finding a way to do it with preg_match or some other regular-expression approach.

How would you deal with such a problem and where should I look for a solution?

Solution

This appears to work for me on a small data sample. As long as you don't run out of memory from loading all the data at once, it should work:

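// Load all failure reports (concatenated into one file) into memory.
$data = file_get_contents("data.txt");
// The lookbehind and lookahead anchor on the fixed link text, so only the digits are matched.
preg_match_all('#(?<=<a href="abc\.com\?id=)\d+(?="></a>)#', $data, $matches);
// $matches[0] holds the extracted ids.
print_r($matches);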
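If memory does become a problem, one option is to skip the big combined file and scan the reports one at a time. The following is only a minimal sketch of that idea, not something I have run against real bounce messages: the helper names extractIds and collectIds and the failures/ directory are assumptions made up for illustration, and the pattern is the same one used above.

```php
<?php
// Hypothetical helper: return every id found in the text of one failure report.
function extractIds(string $text): array
{
    // Same pattern as above: match only the digits between the fixed link text.
    preg_match_all('#(?<=<a href="abc\.com\?id=)\d+(?="></a>)#', $text, $matches);
    return $matches[0];
}

// Hypothetical helper: read reports one file at a time and de-duplicate the ids.
function collectIds(array $paths): array
{
    $ids = [];
    foreach ($paths as $path) {
        if (!is_readable($path)) {
            continue; // skip files that cannot be read
        }
        $text = file_get_contents($path);
        if ($text === false) {
            continue;
        }
        foreach (extractIds($text) as $id) {
            $ids[$id] = true; // array keys keep each id only once
        }
    }
    // PHP casts numeric string keys to integers, so convert back to strings.
    return array_map('strval', array_keys($ids));
}

// Assumed layout: one failure report per .txt file under ./failures/
$paths = glob(__DIR__ . '/failures/*.txt') ?: [];
echo implode("\n", collectIds($paths)), "\n";
```

Only one report is held in memory at a time, and an id that bounced more than once is listed a single time.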

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow