by Katja and Guido Socher <katja/at/linuxfocus.org, guido/at/linuxfocus.org>
About the authors:
Katja is the German editor of LinuxFocus. She likes Tux,
film & photography and the sea. Her homepage can be found
here. Guido is a long time Linux fan and he likes Linux because it
gives you choices and freedom. You can choose and develop
solutions according to your needs.
Content:
|
Fighting against Spam-Mail
Abstract:
Spam between your mail!? Spam E-mail is growing at an alarming
rate and it is a major problem for almost everybody.
In this article we will explain what to do against this
plague.
_________________ _________________ _________________
|
What is spam-mail?
Spam-mail has many names. Some call it UCE (Unsolicited commercial
email) others call it just Unwanted E-mail but all these names
don't really say what it is. If you don't get spam (yet) then
take a look at this collection
of spam-mail (spam_samples.html). It's a random selection of
spam-mail collected over just a few days. Read through the mails and
you will soon understand that it has nothing to do with
commerce or business. These spammers are criminals. No serious
business man/woman would annoy and offend millions of people to
find a few "idiots" who would buy their tricks.
It is a common misunderstanding of people who have not much
used the Internet to believe that this type of advertisement
can be compared to information they get from time to time from
their local supermarket. Products sold via spam-mails are often
illegal or no products at all. They are tricks to get your
money.
How much?
Spammers get your e-mail addresses from webpages, news groups
or domain records (if you have your own domain). There are
individuals who use robots to extract the addresses, burn them
on CDs and sell them very cheap to other Spammers. If you write
your e-mail address in clear text onto your homepage today such
that programs can extract it, then you will have a major problem in a few
months time and you can't stop it. The problem will
be growing every day!
In 1998 the percentage of spam mail sent to LinuxFocus was less than
10%.
As of November 2002 the statistics are as follows:
Our server gets about 4075 mails per week. 3273
are spam-mails!
=> 80% of all mail is Spam.
That is 80% of the capacity of the mail server and 80% of the
network bandwidth is for something that nobody wants.
Out of these 3273 spam mails about 40% originate in America
(mostly Canada, US, Mexico) and about 30% in Asia (mostly
Korea, China, Taiwan).
What to do with Spam
If you look at the spam-mails
you will notice that almost all offer a possibility to be
removed from the list. Don't do it! You are dealing with
criminals. None of the spammers get anything if they maintain a
proper remove list. Why do they still add this possibility? The
answer is simple. It makes a much better impression on the
reader and it's an excellent statistical tool. The spammers can
immediately check that their mails arrive. In other words
you confirm the reception of the mail!
There is also a simple technical problem with the idea of a
remove list. LinuxFocus is not a very big site but we would
need 1 person full time to unsubscribe 3273 Spam mails per week
and then this person would need to unsubscribe one
mail every minute . Every spammer uses a different method, it would be an
idiotic task and it can't work. Remove lists are nonsense and
help only the spammers.
The only right thing to do is: delete it.
Software to handle spam
There are many different options to filter out spam and this is
good because it makes it harder for spammers to circumvent
them. It's however an arms race. The tools to filter spam
become more sophisticated but spammers improve their methods
too.
There are 2 types of filters:
- Checks directly build into the MTA (Message Transfer
Agent=Mail server). Here you can usually reject the mail.
That is: you don't even store the email. You send an error code
back as soon as you recognize that this is spam during the reception of the
email. Typical tools of this kind are IP
based blocklists and mail header checks. If you don't have your own
Mailserver then your ISP would need to configure this.
- Filtering after the reception of the mail. In this case
the email is successfully delivered and will be filtered out
later.
We will now discuss the different possibilities in detail, all of
them have advantages and disadvantages. The best solution to
get rid of all spam is to use several different tools.
Rejecting email directly at the MTA
If you reject your mail directly at the mail server during the
reception of the mail then the spammer can get back an error
code and knows that this address does not work. If he is one of
the "CD-makers" then he might take out the address. It can save
network bandwidth because you don't have to receive the full
message. You can send the error code back as soon as you find that
this is spam.
To do this you need a good and flexible MTA. Unfortunately the
two most common servers, Sendmail and the one from Bill Gates
are not good at all for this task. Two very good alternatives
are Postfix and Exim. If you can't change your server
then you can put an smtp proxy such as messagewall in front of
the server (smtp = Simple Mail Transfer Protocol, the Internet
mail protocol).
We will now discuss some common filter techniques and how they
work. We will not describe how to configure them exactly in each
MTA. It would make the article too long. Instead we suggest to
read the documentation that comes with the MTA that you have
installed. Postfix and Exim are well documented.
- Realtime Block lists:
These are DNS based lists. You check the IP address of the
mailserver that wants to send mail to your server against a
blacklist of known spammers. Common lists are
www.spamhaus.org or ordb.org. There is also a tool called blq
(see references) to manually query such block lists and test
if a given IP address is listed. You should however not be
too enthusiastic about it and carefully choose the lists
since there are also some which block entire IP ranges simply
because one spammer had used a dialup connection
from this ISP at one point in time. We personally would at least enable ordb.org to
keep out mail from poorly administrated servers.
Experience shows that these lists block about 1%-3% of the
spam mail.
- 8 bit characters in subject line:
About 30% of the spam origins in China, Taiwan or
other Asian countries these days. If you are sure that you can't read
Chinese then you can reject mail which has a lot of 8 bit
characters (not ASCII) in the subject. Some MTAs have a
separate configuration option for this but you can also use
regular expression matching on the header:
/^Subject:.*[^ -~][^ -~][^ -~][^ -~]/
This will reject email which has more than 4 consecutive
characters in the subject line which are not in the ASCII
range space to tilde. If you are not familiar with regular
expressions then learn them, you will need them (See LinuxFocus article 53).
Both exim and postfix can be compiled
with perl regular expression support (see www.pcre.org). Perl
has the most powerful regular expressions.
This method is quite good and keeps out 20-30% of the
spam-mail.
- Lists with "From" addresses of known spammers:
Forget it. This used to work back in 1997. Spammers today use
faked addresses or addresses of innocent people.
- Reject non FQDN (Fully Qualified Domain Name) sender and
unknown sender domain:
Some spammers use non existent addresses in the "From". It is
not possible to check the complete address but you can check
the hostname/domain part of it by querying a DNS server.
This keeps out about 10-15% of the spam and you don't want
these mails anyhow because you would not be able to reply to
them even if they were not spam.
- IP address has no PTR record in the DNS:
This checks that the IP address from where you get the mail
can be reverse resolved into a domain name. This is a very
powerful option and keeps out a lot of mail. We would not
recommend it! This does not test if the system administrator of
the mail server is good but if he has a good backbone
provider. ISPs buy IP addresses from their backbone providers
and they buy from bigger backbone providers. All involved
backbone providers and ISPs have to configure their DNS
correctly to make the whole chain work. If somebody
in between makes a mistake or does not want to configure it then
it does not work. It says nothing about the individual
mail server at the end of the chain.
- Require HELO command:
When 2 MTAs (mail servers) talk to each other (via smtp) then
they first say who they are (e.g. mail.linuxfocus.org). Some
spam software does not do that. This keeps out 1-5% of the
spam.
- Require HELO command and reject unknown servers:
You take the name that you get in the HELO command and then
you go to DNS and check if this is a correctly registered
server. This is very good because a spammer who uses just a
temporary dialup connection will usually not configure a
valid DNS record for it.
This blocks about 70-80% of all spam but rejects also
legitimate mail which comes from sites with multiple mail
servers where a sloppy system administrator forgot to put the
hostnames of all servers into DNS.
Some MTAs have even more options but the above are quite
commonly available in a good MTA. The advantage of all those
checks is that they are not CPU intensive. You will usually not
need to update your mailserver hardware if you use those
checks.
Filtering of already received mail
The following techniques are usually applied to the complete
mail and the mail server who sends the mail does not notice that the
mail could not be delivered. It means also that a legitimate
sender will not get a failure report. The message will just
disappear.
Having said this we must also say that this is not totally correct because
it really depends on the filtering possibilities of the
mail server. Exim is very flexible and would allow you to write
custom filters on messages.
-
SpamAssassin (http://spamassassin.org/):
This is a spam filter written in perl. It uses carefully
handwritten rules and assigns certain points to typical
spam phrases such as "strong buy", "you receive this mail
because", "Viagra", "limited time offer".... If the points
are above a given level then the mail is declared as spam.
The problem with this filter is that it is very heavy in
terms of memory and cpu power. You will probably need to
upgrade your mail server hardware especially if the server
is already 2-3 years old. We would not recommend to use
it directly on the mail server. Spamassassin comes with a
spamd program (spamd=spam daemon + spamc=client to connect
to the daemon) which will reduce the startup time of
spamassassin and reduce the cpu consumption but it is still
a very resource demanding application.
To filter the mail you need to create a .procmailrc file (and
.forward) similar to this one:
# The condition line ensures that only messages smaller than 50 kB
# (50 * 1024 = 56000 bytes) are processed by SpamAssassin. Most spam
# isn't bigger than a few k and working with big messages can bring
# SpamAssassin to its knees. If you want to run SpamAssassin without
# the spamc/spamd programs then replace spamc by spamassassin.
:0fW:
* < 56000
| /usr/bin/spamc
# All mail tagged as spam (e.g. with a score higher than the set threshold)
# is moved to the file "spam-mail" (replace with /dev/null to discard all
# spam mail).
:0:
* ^X-Spam-Status: Yes
spam-mail
The installation is easy and spamassassin will filter more
than 90% of the spam.
-
procmail (http://www.procmail.org):
Procmail is not a spam filter on its own but you can use it
to write yourself one. procmail is also very light weight
as long as you limit the number of rules to something
reasonable (e.g. less than 10). To use it you create a
.forward file in your home-directory and add there the
following line:
"| exec /usr/bin/procmail"
Some people recommend to use
"|IFS=' ' && exec /usr/bin/procmail"
but this creates new problems with an extra process being
created which does not run under the control of the
mailserver any longer. Secure mail servers like postfix or
exim will have no problems with the .forward file as shown
above.
Procmail is especially useful in an environment where you
normally communicate just in a closed group. E.g. for people
in a company where most of the mail should come from your
colleagues and some known friends. Here is an example for
"mycompany.com":
# .procmailrc file.
# search on header for friends:
:0 H:
* ^From.*(joe|paul|dina)
/var/spool/mail/guido
# search on header for mails which are not coming from
# inside mycompany.com and save them to maybespam
:0 H:
* !^From.*(@[^\@]*mycompany\.com)
/home/guido/maybespam
# explicit default rule
:0:
/var/spool/mail/guido
This makes it much easier to delete spam and you don't find
the ugly spam between your normal mail.
Procmail is very flexible and can also be used for other
tasks. Here is a totally different example:
Procmail comes with a "reply to sender" program called
formail. This can e.g. be used to send a message back to
people. A major plague are those e-mails with word
documents inside. If you are a Linux developer using e-mail
to exchange information about your projects or Linux in
general then you are for sure not interested in people who
write text into a word document and attach it to mails.
Viruses can easily be spread that way. They don't usually
infect Linux but it's a bad idea in general to use MS-word
for sending text to other people as it requires MS word
with the same version on the receiver side to read the
text. There are open formats such as RTF or HTML which do
not spread viruses, are cross platform, and do not have
such a version problem.
# Promail script to
# reject word documents. Reject the mail, but do not reply to
# error messages "From MAILER-DAEMON"
# If you use ":0 Bc" instead of ":0 B" then you will still get the mail
:0 H
* !^From.*DAEMON
{
# The mime messages with word documents look like this in the body
# of the message:
#------=_NextPart_000_000C_01C291BE.83569AE0
#Content-Type: application/msword;
# name="some file.doc"
#Content-Transfer-Encoding: base64
#Content-Disposition: attachment;
# filename="real file.doc"
:0 B
* ^Content-Type:.*msword
| (formail -r ; cat /home/guido/reject-text-msword ) | $SENDMAIL -t
}
# explicit default rule
:0:
/var/spool/mail/guido
The text file /home/guido/reject-text-msword should contain
a text explaining that msword documents can spread viruses
and ask the sender to send the document e.g. in RTF
format.
How to use procmail and what all these strange letters in
the configuration file mean is very well explained in the
"procmailrc" man page.
- bogofilter (http://www.tuxedo.org/~esr/bogofilter ):
Bogofilter is a Bayesian spam filter. It is entirely written
in C and it is very fast (compared to SpamAssassin). A
Bayesian filter is a statistical filter that you have to
train first to learn what is spam and what is not spam. You
need about 100 training messages (sorted into spam and not
spam) until the filter can work efficiently on new
messages.
Bogofilter is fast but it does not work from day one as
SpamAssassin. After a while it will be as efficient as
SpamAssassin and filter more than 90% of the spam.
- razor (http://razor.sf.net/):
This is a distributed, collaborative, spam detection system.
Checksums of known spam messages are stored in a database. If
you get a new mail you compute the checksum and compare it
with checksums in the central database. If the checksum
matches then you can discard the message as spam. razor works
because special e-mail accounts have been spread over the
Internet only for the purpose of getting into the address
lists of all the spammers. These accounts catch only spam and
no normal mail. In addition people can of course send mails
to razor for marking it as spam. There is a good chance that
the mails are already known as spam before they arrive in
your mailbox. The system filters about 80% of the spam. razor
has one characteristic that none of the other post processing
and filtering techniques has: razor detects almost no false
positives. That is: the number of mails which are not spam
but still declared spam is very low with razor.
There are many more possible solutions to fight against spam. We
believe that the above covers the most important ones.
The best solution is to use checks in the MTA as a first stage
and then kill the remaining spam in a second stage with a
post-processing filter.
HTML mails
A particularly dangerous form of e-mail are spam mails in HTML
format.
Most spammers use the "unsubscribe possibility" to see how many
of their mails arrive. HTML formatted mail offers a much better
form of feedback: Images. You can compare this system with the
visitor counters as found on some webpages. The spammer can
exactly see when and how many of the mails are read. If you
study Spam carefully you will see that in some cases the URL
for included images contains a sequence number: The spammer can
see exactly who looks at the mail and at what time time. An incredible
security hole.
Modern mail reader programs will not display images which are
downloaded somewhere from a URL. However there is hardly any
modern and secure HTML mail reader. Kmail and the very latest
version of mozilla mail offer the possibility to disable images
from external sources. Most other programs will generate nice
statistics for the spammer.
The solution? Don't use a html mail capable program or download
the mail first then disconnect from the Internet and then read
the mail.
Where does the spam come from?
Never trust the sender address in the "From" field of spam
mails! These are either non existent users or innocent people.
It is very rare that this is the mail address of the spammer.
If you want to know where the mail comes from then you have to
look at the full header:
...
Received: from msn.com (dsl-200-67-219-28.prodigy.net.mx [200.67.219.28])
by mailserver.of.your.isp (8.12.1) with SMTP id gB2BYuYs006793;
Mon, 2 Dec 2002 12:35:06 +0100 (MET)
Received: from unknown (HELO rly-xl05.dohuya.com) (120.210.149.87)
by symail.kustanai.co.kr with QMQP; Mon, 02 Dec 2002 04:34:43
Here an unknown host with IP address 120.210.149.87 who claims
to be rly-xl05.dohuya.com sends the mail to
symail.kustanai.co.kr. symail.kustanai.co.kr sends this message
further on.
The spammer is hiding somewhere behind 120.210.149.87 which is
probably just a dynamic dialup IP address.
In other words the police could find this person if they would
go to the owner of kustanai.co.kr and ask for server logs and a
printout of connections from the local telephone company. You
have very little chance of finding out who that was.
It could also be that the first part is faked and the spammer
is really behind dsl-200-67-219-28.prodigy.net.mx. This is very
likely since there is no good reason why symail.kustanai.co.kr
should send the mail to msn.com via the dsl dialup connection
(dsl-200-67-219-28.prodigy.net.mx). The mailserver.of.your.isp
(symbolic name) is the server of your Internet Service Provider
and is the only part from this "Received:" line which is reliable.
It is possible to find the spammer but you need international
intelligence and police forces to go to prodigy.net.mx.
Conclusion
If spam continues to increase at the current rate then the
Internet will soon transport a lot more Spam than real e-mail.
Spam is transported at the cost of the receiver. More bandwidth
is needed and often the mail systems need to be upgraded to
handle the Spam.
Laws in many countries do little to protect people
against criminal spammers. In fact some countries have laws
which restrict only honest people (digital rights management etc.
...) and help the criminals (e.g. to get nice statistics about the
spam-mail).
Join the Coalition Against UCE!
http://www.euro.cauce.org/en/ |
http://www.cauce.org/ |
Internet Service Providers should check their mail systems. No
unauthenticated access to mail servers must be given and the amount
of mails that one user can send per minute must be limited.
References
Talkback form for this article
Every article has its own talkback page. On this page you can submit a comment or look at comments from other readers:
2003-01-23, generated by lfparser version 2.35