Using bash-Kung-fu to fight spam

Info: We’ll be looking into stopping spam on a postfix mail-server/network using lots of bash magic.

Background: There is actually some mathematical basis to spam (and if you don’t want to hear any of that and would just like the magic wand to aid you in making it go away, then feel free to skip further down in this article).

The hunch is, if we (the spam sending company) can get just 10% of the one billion spam messages we send clicked on or replied to (which could generate false ad-clicking revenue) or have the user give in and buy our product (even if it cost next to nothing) then we can make a generous profit.

But that’s not so great for us, the regular people that have to wade through it daily, right?

Just 10% of one billion would be 100 million, and the initial one billion was a generous figure (as some estimate the number of spam emails sent daily is well past that mark). One can see why there is such a business for it. Statistically, it could work.

However, that doesn’t make it any less annoying. Lots of time and money are spent (or wasted) trying to combat the mountains of unwanted ads and spam sent via email. At times it would seem there is little one can do to fight it. Why?

Because spam senders are getting smarter. So how do we get smarter about fighting it?

Tactic one: Stop receiving email? Done. Let’s end the article.

Tactic two: Blocking a spam sending domain – This one is easy: just block the address that is sending spam, right? No, that is easily circumvented by using another address.

Tactic three: Employ the use of a Real-time Blackhole List – In theory, this is a list that provides information on domains you pass to it. It should tell you if the domain in question is actively sending spam, is known to send spam, or should be avoided for any number of other reasons.

This method only works so well. Some domains may send multitudes of unwanted email before they are put on one of these lists, and the lists are known to produce false positives.

What really helps spammers get around this method is the fact that they use many domains, often hundreds, and can send parts of their list via many different addresses, so they are not detected as quickly by these ‘real-time’ lists.

This is evident when looking through one’s inbox at specific spam messages.

For example, the following two messages were found, with different subject lines, and different sending addresses:

From: D... <(I'll be nice and not post this address)@h*****.com>

and

From: T... <(I'll be nice and not post this address)@n*****.net>

… however, the email content was the same. Clearly, this one carefully crafted spam message was sent via multiple domains/addresses in order to avoid the suspicion it would have raised had it been sent many times from just one address.

So what’s the solution? How would you go about taking back your inbox and network from unwanted spam?

Well, based off of what we know, we can implement a sort of zero-tolerance type of approach when it comes to dealing with spam. We know that certain phrases or words are unwanted and we also know that even if it is just ‘one’ message, it may be sent many times from multiple domains.

What we should be looking to target would be any domain (we’re not concerned about the sending user, just the domain right now) that specifically sends unwanted message A, with subject B.

Part one: Finding the emails

This is simple enough with a bit of grep and awk magic. Let’s say, for example, I wanted to search for all emails (no matter who sent them) that contained the words “buy your gold now”, “win a free house” or “sell us your car today” in the “From” line. (While you could tweak this to search the subject line, what we are specifically looking for is sent along with the from address on the “From” line, in the form of From: Buy your gold now <fakeperson@fake*****.com>, thus we are targeting the “From” field.)

Throughout the rest of this article we will be showing the command strings used on a production server, but feel free to update them to match what you are looking for.

For us, we wanted to find the following:

egrep -wR 'From\: (NPW|APP\. STATUS|Womens Network|Surface Pros|Surface Professionals|DIY Professionals|The DIY Pros|The Womans Network|.*Surface Pros)' .

These are all subjects/from lines that turned up unwanted emails (spam).

  • APP. STATUS
  • Surface Pros
  • DIY Professionals
  • etc …

The basic string is egrep -wR 'From\: (phrase here|or_a_word|...|...)'. This was run in the mail folder (any mail folder) that is getting these unwanted emails. It produces quite the list:

.
..
...
root@server:/var/mail/virtual/domain.com/mail_accounts/bob# egrep -wR 'From\: (NPW|APP\. STATUS|Womens Network|Surface Pros|Surface Professionals|DIY Professionals|The DIY Pros|The Womans Network|.*Surface Pros)' .
Binary file ./dovecot.index.cache matches
./cur/1471275197.H11474P29309.mail.server.server.tld,S=69717:2,:From: Surface Pros <olinda@aquaengines.com>
./cur/1470749297.H535141P31132.mail.server.server.tld,S=74913:2,:From: The DIY Pros <stacee@tpi20.com>
./cur/1470749297.H535141P31132.mail.server.server.tld,S=74913:2,:From: The DIY Pros <stacee@tpi20.com>
./cur/1471274562.H889714P29014.mail.server.server.tld,S=43055:2,S:From: NPW <darnell@aquaengines.com>
./cur/1471027019.H568221P16603.mail.server.server.tld,S=74249:2,:From: Surface Professionals <bryanna@dukebluefeeds.com>
...
..
.
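As the phrase list grows, the alternation inside the quotes gets hard to maintain. Below is a minimal sketch, assuming the phrases are kept one per line in a file. The function name find_spam_from_lines and the phrase file are made up for illustration and are not part of the setup described here. Each phrase is still a regular expression, so a literal dot is written as "\." in the file. The sketch also adds -I so that binary files such as dovecot.index.cache are skipped, and -h so that only the matching header line is printed.

```shell
#!/bin/bash
# Hypothetical helper (not from the original setup): build the
# "From: (phrase|phrase|...)" pattern from a file and search a mail folder.
# Usage: find_spam_from_lines PHRASE_FILE MAIL_DIR
find_spam_from_lines() {
    local phrase_file="$1" mail_dir="$2" alternation

    # Drop blank lines, then join the remaining phrases with '|'.
    alternation=$(grep -v '^[[:space:]]*$' "$phrase_file" | paste -sd'|' -)
    [ -n "$alternation" ] || return 1

    # -E extended regex, -R recurse, -w whole words, -I skip binary files,
    # -h print the matching line without the file name.
    grep -ERwIh "^From: (${alternation})" "$mail_dir"
}
```

It would be called as find_spam_from_lines phrases.txt . from inside the mail folder, where phrases.txt is whatever file holds your phrases.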

Part two: Making a pretty list

We just need the domains to block; that is all we are after. It takes a bit more command-line-fu to achieve this. What is seen above are the full names of the email files along with other information we don’t need. We are going to cut that out using awk. The entire command above is piped to awk, which prints the last whitespace-separated field, i.e. the address in angle brackets. Then we use a power feature of grep -P (Perl-compatible regular expressions): a negative ‘lookahead’ keeps the match from starting at the “<” and a negative ‘lookbehind’ keeps it from ending at the “>”, so only the email address between them is printed. Finally we cut out the domain portion with the second awk, splitting on the “@”. This gives us a list of domains that have sent the same spam email(s) using different addresses/domains.

root@server:/var/mail/virtual/domain.com/mail_accounts/bob# egrep -wR 'From\: (NPW|APP\. STATUS|Womens Network|Surface Pros|Surface Professionals|DIY Professionals|The DIY Pros|The Womans Network|.*Surface Pros)' . | awk '{print $NF}' | grep -oP '(?!\<).*@.*(?<!\>)' | awk -F '\@' '{print $2}'
aquaengines.com
tpi20.com
tpi20.com
aquaengines.com
dukebluefeeds.com
tata-harper-skincare.com
losportalitos.com
pureclearbeauty.com
nightskyreserve.net
ga-ear.com
oyager.com
healthbeautypurity.com
smartskintreatment.com
smartskintreatment.com
successisfree.com
healthyreviewsplus.com
gismoman.com
hansfordsmile.com
microdroneoffer.com
pateintsgrowth.com
pateintsgrowth.com
dailylifedeal.com
aquaengines.com
diamondswww.net
clearlifesecrets.com
ajarn.net
realestatedirections.net
websiteerror32.com
websiteerror32.com
comp32error.com
comp32error.com
ajarn.net
ajarn.net
comp32error.com
comp32error.com
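To see what each stage of that pipeline contributes, it helps to follow one line of the grep output through it, as the comments below do. The same result can also be had from a single sed expression. The sketch is an alternative, not the command used on the server, and the function name extract_sender_domain is made up for illustration. It assumes the address sits between angle brackets, as in the output above.

```shell
#!/bin/bash
# Stage by stage, for the line ending in "From: Surface Pros <olinda@aquaengines.com>":
#   awk '{print $NF}'               -> <olinda@aquaengines.com>
#   grep -oP '(?!\<).*@.*(?<!\>)'   -> olinda@aquaengines.com
#   awk -F '\@' '{print $2}'        -> aquaengines.com

# Hypothetical alternative: pull the domain out of "<user@domain>" in one step
# and lower-case it. Lines without an address in angle brackets print nothing.
extract_sender_domain() {
    sed -n 's/.*<[^<>@]*@\([^<>@]*\)>.*/\1/p' | tr '[:upper:]' '[:lower:]'
}
```

Lower-casing matters once the list is de-duplicated, because Example.COM and example.com are the same domain.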

It should be noted that this list is not sorted the way we want, and we also don’t want multiple listings of the same domains, thus adding a sort and a uniq can take care of that:

root@server:/var/mail/virtual/domain.com/mail_accounts/bob# egrep -wR 'From\: (NPW|APP\. STATUS|Womens Network|Surface Pros|Surface Professionals|DIY Professionals|The DIY Pros|The Womans Network|.*Surface Pros)' . | awk '{print $NF}' | grep -oP '(?!\<).*@.*(?<!\>)' | awk -F '\@' '{print $2}' | sort | uniq
ajarn.net
aquaengines.com
clearlifesecrets.com
comp32error.com
dailylifedeal.com
diamondswww.net
dukebluefeeds.com
ga-ear.com
gismoman.com
hansfordsmile.com
healthbeautypurity.com
healthyreviewsplus.com
losportalitos.com
microdroneoffer.com
nightskyreserve.net
oyager.com
pateintsgrowth.com
pureclearbeauty.com
realestatedirections.net
smartskintreatment.com
successisfree.com
tata-harper-skincare.com
tpi20.com
websiteerror32.com
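As a side note, sort | uniq can be shortened to sort -u. If you would also like to know how often each domain turned up, for instance to see which senders are the most persistent, uniq -c keeps a count. A small sketch; the function name count_domains is only for illustration:

```shell
#!/bin/bash
# Read one domain per line on stdin and print "COUNT DOMAIN",
# most frequent first.
count_domains() {
    sort | uniq -c | sort -rn | awk '{print $1, $2}'
}
```

Piping the unsorted domain list from the previous step into count_domains gives one line per domain with its number of hits in front.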

That is our list, a list of domains that we do not want our mail server talking to. Now what do we do with it? Add it to a block list of course.

Part three: Let’s start blocking

Given everything we discussed, here is what we want to achieve:

  1. Zero-tolerance policy towards emails with phrases we don’t like – if you get flagged the domain is blocked
  2. This needs to run frequently, as frequently as the emails are being sent, so we don’t allow new emails matching the same criteria to flow in
  3. We don’t want to have to manually run this each time we see bad emails in our inbox. It should be automatic.
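Points 2 and 3 amount to running the scan from a scheduler such as cron. If a scan can take longer than the interval between runs, two copies may end up writing to the same list at once. Here is a minimal sketch of one way to guard against that, using a lock directory. The function name run_locked and any lock path you pass to it are assumptions for illustration, not part of the original setup.

```shell
#!/bin/bash
# Run a command only if no other run holds the lock.
# mkdir is atomic: it either creates the directory or fails because it exists.
# Usage: run_locked LOCK_DIR COMMAND [ARGS...]
run_locked() {
    local lock_dir="$1" status
    shift

    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "run_locked: $lock_dir exists, skipping this run" >&2
        return 1
    fi

    # Remember the command's exit status so the lock is always released.
    status=0
    "$@" || status=$?
    rmdir "$lock_dir"
    return "$status"
}
```

A cron entry would then call something like run_locked /tmp/spam-scan.lock followed by the path of your scan script (both hypothetical). A lock directory left behind by a killed run has to be removed by hand.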

So let’s get to it. In our case, email is first filtered via a postfix server, then passed off to an exim server. What we want to do is grab the bad domains from the exim server and have them sent to the postfix server, where they will be blocked before they are passed off to the rest of the network.

On the exim server we want to:

  1. Scan some directories
  2. Output a list
  3. Check output against a list so we don’t add duplicate domains
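  4. Update the list when needed.

#!/bin/bash
# Collect the sender domains of known spam and append new ones to a running list.

touch /root/updated-spam-domains.list
touch /tmp/domains.list

# Stop here if the mail folder is missing, rather than scanning the wrong directory.
cd /var/mail/virtual/domain.com/mail_accounts/bob/ || exit 1

egrep -wR 'From\: (NPW|APP\. STATUS|Womens Network|Surface Pros|Surface Professionals|DIY Professionals|The DIY Pros|The Womans Network|.*Surface Pros)' . | awk '{print $NF}' | grep -oP '(?!\<).*@.*(?<!\>)' | awk -F '\@' '{print $2}' | sort | uniq > /tmp/domains.list

# Append only domains that are not already on the running list.
# -x matches whole lines only and -F treats the dots as literal characters.
while read -r line
do
    if ! grep -qxF -- "${line}" /root/updated-spam-domains.list
    then
        echo "${line}" >> /root/updated-spam-domains.list
    fi
done < /tmp/domains.list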
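The while loop starts one grep per domain. With a longer list, the same check can be done in a single pass by asking grep for every line of the new list that is not already in the running list. This is a sketch under that assumption: append_new_lines is a made-up name, and the two files are passed as arguments rather than hard-coded.

```shell
#!/bin/bash
# Append to DEST every line of SRC that DEST does not already contain.
# SRC is assumed to be de-duplicated already (e.g. by sort | uniq).
# Usage: append_new_lines SRC DEST
append_new_lines() {
    local src="$1" dest="$2" new_lines

    touch "$dest"
    # -f DEST: use each line of DEST as a pattern; -F: literal strings;
    # -x: whole-line matches only; -v: keep the lines that did NOT match.
    # grep exits non-zero when nothing is left over, which is not an error here.
    new_lines=$(grep -vxFf "$dest" "$src" || true)

    if [ -n "$new_lines" ]; then
        printf '%s\n' "$new_lines" >> "$dest"
    fi
}
```

With the paths used in this article, the call would be append_new_lines /tmp/domains.list /root/updated-spam-domains.list.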

The scan script is run as a cron job. Its list gets passed off to the spam filtering server (postfix), where the domains are checked against and added to a block list. The key parts are as follows:

# /etc/postfix/main.cf
# Continuation lines in main.cf must begin with whitespace.

smtpd_sender_restrictions =
    check_sender_access hash:/etc/postfix/maps/blacklist_domains,
    permit_mynetworks,
    reject_non_fqdn_sender,
    reject_unknown_sender_domain,
    permit

The key part above is the /etc/postfix/maps/blacklist_domains file. This file holds the domains we don’t want to talk to.

It is updated with the information we got from the exim server. The command line kung-fu used for that:

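#!/bin/bash
# Add newly reported domains to the postfix block list, then rebuild the map.

while read -r line
do
    # -x and -F: match the exact line "domain REJECT", dots taken literally.
    if ! grep -qxF -- "${line} REJECT" /etc/postfix/maps/blacklist_domains
    then
        echo "${line} REJECT" >> /etc/postfix/maps/blacklist_domains
    fi
done < /tmp/domains-to-block.list

# postmap rewrites blacklist_domains.db itself, so the old .db file
# does not need to be deleted first.
/usr/sbin/postmap /etc/postfix/maps/blacklist_domains \
&& postfix reload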

The script checks each domain we found on the exim server against the list of already blocked domains and adds any that are missing. It then rebuilds the lookup table with postmap and reloads postfix.
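One caveat with a zero-tolerance script: the “From” header is written by the sender, so the extracted “domain” can be anything, including a domain you rely on or text that is not a domain at all. Below is a hedged sketch of a filter that could sit in front of the block list. The function name filter_blockable, the allowlist file and the domain pattern (plain ASCII names only) are all assumptions for illustration.

```shell
#!/bin/bash
# Read candidate domains on stdin, one per line. Print, lower-cased, only those
# that look like a domain name and are not listed in the allowlist file.
# Usage: filter_blockable [ALLOWLIST_FILE]
filter_blockable() {
    # Without an allowlist, /dev/null supplies an empty pattern list.
    local allowlist="${1:-/dev/null}"

    tr '[:upper:]' '[:lower:]' \
        | grep -E '^([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$' \
        | grep -vxFf "$allowlist"
}
```

The list of new domains would be piped through filter_blockable, with an allowlist of your own, before any “REJECT” line is written.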

Thus the contents of that file look like:

root@server:/etc/postfix# cat /etc/postfix/maps/blacklist_domains
mujeresyempresas.com REJECT
smartskintreatment.com REJECT
microdroneoffer.com REJECT
/\.top$/ REJECT
ajarn.net REJECT
aquaengines.com REJECT
clearlifesecrets.com REJECT
comp32error.com REJECT
dailylifedeal.com REJECT
diamondswww.net REJECT
...
..
.
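A hash: table is looked up by exact key, so each line needs a key and an action. A pattern such as /\.top$/ is treated as a literal string rather than a regular expression; patterns belong in a regexp: or pcre: table instead. A small check along those lines, as a sketch; the function name lint_access_map and its messages are made up for illustration:

```shell
#!/bin/bash
# Report lines of an access map that a hash: table cannot use as intended.
# Prints "LINE_NUMBER: reason: text" and returns 1 if anything was reported.
# Usage: lint_access_map MAP_FILE
lint_access_map() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }
        NF < 2               { print NR ": missing action: " $0; bad = 1; next }
        $1 ~ /^\//           { print NR ": regex key in literal table: " $0; bad = 1 }
        END                  { exit bad }
    ' "$1"
}
```

Running lint_access_map /etc/postfix/maps/blacklist_domains before postmap would flag such entries without changing the file.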

We can test if this works with the following telnet commands:

root@another-server:~# telnet spam.server.server.tld 25
Trying 104.131.190.164...
Connected to spam.server.server.tld.
Escape character is '^]'.
220 spam.server.server.tld ESMTP Postfix
helo test.com
250 spam.server.server.tld
MAIL FROM: <test@ajarn.net>
250 2.1.0 Ok
RCPT TO: <bob@domain.com>
554 5.7.1 <test@ajarn.net>: Sender address rejected: Access denied

Success!

Conclusion

Honestly, there are many ways in which we could improve the spam fighting powers of our server; however, this method was quick and dirty. It was used to fight against the waves and waves of the same spam messages being sent daily (sometimes hourly) from seemingly random addresses.

It works for now. I think that is what matters the most.