Question

I am trying to find spammers in the exim mainlog. The mainlog contains mail IDs and subjects, something like this:

username1@example.com S==thi#s i $s @a Su~bJec%t
username2@example2.com S==thi#s i ^s an*ot+her Su~bj)ec%t

What I am trying to do is take the subject, remove all the symbols and spaces using sed, and grep for keywords. If a keyword matches, I want to print the mail ID. I can already remove the symbols and spaces and grep for the keywords, but the problem is that the symbols in the mail IDs (@ and .) are removed as well. So my question is: how can I apply sed and grep only to the subject (for example S==thi#s i ^s an*ot+her Su~bj)ec%t) and, if it matches, print the mail ID with its symbols intact? Thanks in advance.

Solution

This would be tricky with sed, if even possible. If you're ok with awk instead:

awk -F' S==' -v k1=this '{gsub("[][()#$@~% ]", "", $2); if ($2 ~ k1) print $1}'

If you want to remove all non-alphanumeric characters, then it's better to write like this:

awk -F' S==' -v k1=this '{gsub("[^[:alnum:]]", "", $2); if ($2 ~ k1) print $1}'

If your version of awk doesn't support [:alnum:] then you can write like this instead:

awk -F' S==' -v k1=this '{gsub("[^a-zA-Z0-9]", "", $2); if ($2 ~ k1) print $1}'

Explanation:

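  • Using " S==" (with the leading space) as the field separator splits each line into the mail ID ($1) and the subject ($2)
  • The keyword "this" is passed in through the k1 variable and is matched as a regular expression. You could use any other keyword, or pass several keywords with more -v parameters in the same format, for example -v k2=something; each extra variable also has to be tested in the condition, for example $2 ~ k1 || $2 ~ k2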
  • Remove all the symbols from the 2nd field with gsub
  • If the 2nd field matches the keyword in k1, then print the first field (= the mail ID)
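
To make the multi-keyword case concrete, here is a minimal sketch that tests two keywords against the sample lines from the question. The function name match_senders and the keywords "another" and "offer" are made up for illustration, and the portable [^a-zA-Z0-9] class is used so it does not depend on [:alnum:] support.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer): print the mail ID
# of every input line whose cleaned-up subject matches keyword $1 or $2.
# Both keywords are required; an empty keyword would match every line.
match_senders() {
    awk -F' S==' -v k1="$1" -v k2="$2" '
        {
            subject = $2                        # work on a copy of the subject
            gsub("[^a-zA-Z0-9]", "", subject)   # strip symbols and spaces
            if (subject ~ k1 || subject ~ k2)
                print $1                        # mail ID, left untouched
        }'
}

# Sample lines taken from the question.
printf '%s\n' \
    'username1@example.com S==thi#s i $s @a Su~bJec%t' \
    'username2@example2.com S==thi#s i ^s an*ot+her Su~bj)ec%t' |
    match_senders another offer
# prints: username2@example2.com
```

Only the second line is printed, because its cleaned subject (thisisanotherSubject) contains "another". The match is case-sensitive, and the mail ID keeps its @ and dots since only the subject copy is modified.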

I hope this helps.

OTHER TIPS

Before your grep/sed (this can go into your own sed script, ahead of your own commands), protect the @ and the dots of the mail ID:

sed 's/@/(at)/1
: dot
   s/^\([^ ]*\)\.\([^ ]*\) /\1(dot)\2 /
   t dot'

After your grep/sed (this can go into your own sed script, after your own commands), restore them:

sed 's/(dot)/./g;s/(at)/@/g'

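This assumes that (dot) and (at) do not occur in your subject. Nearly any other placeholder could be used instead, such as #at#, §1§ or :a:. Just avoid characters that are special in sed, like + . { [ $ ^, and characters that your own symbol-removal step would strip, otherwise the placeholder cannot be restored.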
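
Put together, the protect and restore steps might wrap an existing clean-up like this. It is only a sketch: the function names, the clean step and the keyword "another" are stand-ins for whatever your own sed and grep do, and clean is assumed to leave parentheses and spaces alone so that the placeholders and the gap after the mail ID survive.

```shell
#!/bin/sh
# Step 1: replace the first "@" and every dot of the first word (the mail ID)
# with placeholders, using the sed script from this answer.
protect() {
    sed 's/@/(at)/1
: dot
   s/^\([^ ]*\)\.\([^ ]*\) /\1(dot)\2 /
   t dot'
}

# Step 2 (hypothetical stand-in for your own clean-up): drop everything except
# letters, digits, parentheses and spaces. Spaces are kept here so the mail ID
# stays separated from the subject.
clean() {
    sed 's/[^a-zA-Z0-9() ]//g'
}

# Step 3: turn the placeholders back into "." and "@".
restore() {
    sed 's/(dot)/./g;s/(at)/@/g'
}

# Sample lines taken from the question; "another" is an example keyword.
printf '%s\n' \
    'username1@example.com S==thi#s i $s @a Su~bJec%t' \
    'username2@example2.com S==thi#s i ^s an*ot+her Su~bj)ec%t' |
    protect | clean | grep 'another' | restore | cut -d' ' -f1
# prints: username2@example2.com
```

Note that grep sees the whole line here, so a keyword that also occurs in a mail ID would match on the address rather than the subject.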

Licensed under: CC-BY-SA with attribution