Pentagon Email Relays in SMTP Census

by Doctor Electron

No, the author is not writing this from a prison cell. Yes, email relays -- like spammers might use -- were found on Pentagon computers and other U.S. military networks.

Since people love email, a net census of computers connected to the internet might well include Simple Mail Transfer Protocol (SMTP, RFC 821) servers that accept connections from email client programs on default port 25.

Almost all email arrives at SMTP servers for delivery to inboxes of persons in the network of the particular server. However, when both the sender and recipient are foreign to the server's network, the server may forward the mail to its recipient in another domain. This is called "email relay" and is often used by spammers.

At a medical conference, a presentation on an illness usually begins with a statement on its prevalence and hence the importance of its study. Thus, we will start with (1) computers accepting internet connections on port 25 and end with (2) an estimate of the number of mail relays in the world today. Perhaps there will be some interesting twists and turns in the plot along the way.

Methods and Results

1. How many computers have SMTP port 25 listening for connections?

Established connections with port 25 were found at 5119 IP addresses selected randomly. This sample size should be large enough to estimate characteristics of the population of IP addresses where SMTP servers are accessible.

0.66% of random addresses from 1.0.0.1 to 218.255.255.254 (excluding local host 127.x.y.z) established connections on port 25. 775,606 (5119 / 0.66%) packets to random addresses were required to find these 5,000 plus open ports.
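The sampling frame just described can be sketched in a few lines of Python. This is only an illustration of the idea, not the census program itself: the function name random_address is hypothetical, and skipping 0 and 255 in the last octet is an assumption read off the stated range.

```python
import random

def random_address(rng=random):
    """Draw one IPv4 address from 1.0.0.1 to 218.255.255.254, skipping 127.x.y.z."""
    while True:
        first = rng.randint(1, 218)
        if first != 127:              # exclude the local host block
            break
    second = rng.randint(0, 255)
    third = rng.randint(0, 255)
    fourth = rng.randint(1, 254)      # assumption: last octet avoids 0 and 255
    return f"{first}.{second}.{third}.{fourth}"

# Probes needed to expect 5119 open ports at a 0.66% response rate.
probes_needed = round(5119 / 0.0066)
```

Passing a seeded random.Random instance as rng makes a run repeatable, which may help when a sample has to be audited later.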

Other results by the author [papers in preparation] may provide context. Data collected thus far show that observed response rates with common services were: 1% for ICMP Echo, 0.76% for FTP port 21, 0.67% for telnet port 23, 0.66% for SMTP port 25, 0.60% for NetBios port 139, 0.52% for HTTP port 80, 0.47% for HTTPS port 443 and 0.37% for SSH port 22.

Now we estimate the population. The address space considered has about 3.6 billion possibilities (3,612,213,248 = 217 x 256 x 256 x 254). Thus, 0.66% x 3.6 billion is a little more than 23.8 million addresses where a system listening on port 25 might be found.
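The arithmetic behind this estimate can be checked with a short Python sketch. The variable names are illustrative; the figures are the ones quoted above.

```python
# Address space: first octet 1..218 without 127 (217 values), two middle
# octets with 256 values each, and a last octet with 254 values.
address_space = 217 * 256 * 256 * 254

hit_rate = 0.0066                     # observed share answering on port 25
estimated_listeners = address_space * hit_rate

print(f"{address_space:,} addresses considered")
print(f"{estimated_listeners / 1e6:.1f} million estimated port 25 listeners")
```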

With virtual IP address technology, probably there are not nearly as many as 23.8 million different machines in this estimated population of addresses. Indeed, entire domains including hundreds or in some cases thousands of addresses may be directed to a single SMTP server. But these considerations do not change our estimate of how many addresses will accept connections on the SMTP port 25.

2. How many functional SMTP servers exist?

A minimum requirement is that the SMTP server says something after a TCP connection is established. This something is text data, called the "banner," sent to our client computer. 3352 of the 5119 connections produced banners. Table 1 lists the frequency of common banner responses.

Table 1: SMTP Server Responses to Foreign Client

   N Code Typical Response Description
  85  572 Relay not authorized
 122  555 no domain at this ip address. goodbye!
  12  554 Connection not authorized
   1  500 Invalid data flow attempt
   1  451 /etc/sendmail.cf: line 0: cannot open
 468  421 Sorry, you are not authorized to make this connection
2663  220 some.part.of.the.mil SMTP/smap Ready.

3352 = Total connections sending data

Legend: N, number of servers; Code and description from SMTP data collected.

Taken at face value, 122 of these cases with the 555 response code state that a server is not present. Two cases do not appear functional (Codes 500 and 451). This leaves us with 3228 servers (3352 - 124).

This 3228 of the 5119 connections suggests that 63% of the connections may have working SMTP servers implemented. Hence, we estimate about 15 million (0.63 x 23.8 million) mail server addresses presently world-wide.
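The tally from Table 1 to this estimate can be retraced as follows. This is a sketch with illustrative variable names; the counts are copied from Table 1 and the 23.8 million figure from section 1.

```python
# Banner counts by reply code, copied from Table 1.
banner_counts = {572: 85, 555: 122, 554: 12, 500: 1, 451: 1, 421: 468, 220: 2663}

connections = 5119
not_functional = {555, 500, 451}      # "no domain" plus the two broken cases

banners = sum(banner_counts.values())
working = sum(n for code, n in banner_counts.items() if code not in not_functional)
working_share = working / connections

# Scale the rounded share to the estimated 23.8 million listening addresses.
estimated_servers = round(working_share, 2) * 23.8e6

print(banners, working, f"{working_share:.0%}", f"{estimated_servers / 1e6:.0f} million")
```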

Many of the other 37% of the connected computers may be just listening to see who might stop by to chat and what they might say. There is more to that story.

3. What do the SMTP servers say?

Table 1 lists the types and incidence of SMTP server responses to an "outside" client email program. In the best of all worlds, a client outside the local network would be delivering mail to holders of inboxes in the domain or network of the SMTP server or be sending outgoing mail from one of those known users. In this research, the client email program written by the author was doing neither of these "normal" activities, and may be referred to as a "foreign client".

A more secure procedure would be to require all senders of mail to do a POP3 logon first, which would require the sender's user name and password to establish the right to use the server for sending mail. In this study, there was no prior POP3 logon performed.

Response codes 572, 554 and 421 politely stated that the client test program is not authorized to connect. No doubt the client IP address of this lab was not known or acceptable to these servers.

However, 2663 servers were willing to talk, indicated by the 220 response code. The typical response includes the name of the server (fictional in Table 1) and description of the SMTP or ESMTP software running. Shortly, we shall attempt to send "relay" mail in each case.
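As a rough illustration of how such a greeting might be taken apart, here is a minimal Python sketch. The function name parse_banner is hypothetical, and the sketch assumes the common layout of a three-digit code, one separator character, the host name, and then free text; real banners vary, so the second field is not always a host name.

```python
def parse_banner(line):
    """Split an SMTP greeting into (reply code, first word, remaining text)."""
    line = line.strip()
    code = int(line[:3])              # reply codes are three digits
    rest = line[4:]                   # skip the separator after the code
    host, _, software = rest.partition(" ")
    return code, host, software

code, host, software = parse_banner("220 some.part.of.the.mil SMTP/smap Ready.\r\n")
print(code, host, software)
```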

4. Who operates these SMTP servers?

With random sampling of IP addresses, there was no bias in data collection concerning variables like geographic location, country, organization, etc.

A reverse DNS lookup for each established connection showed that a minority, 1204 of the 3252 servers (37%), have DNS entries for their host name. However, most of the others provide their host name in the 220 introductory banner. Thus, for both white-hat and black-hat hackers, port 25 may provide easily obtained information on host identity.

The .mil example in Table 1 illustrates the large number of military email servers. Some readers may think of Hotmail, Yahoo, AOL or the like in relation to email. The present data shows a strong presence of military email servers, which may not be too surprising considering the size of the organization and that many participants are away from home and may use email regularly.

Using only the reverse DNS lookup and the SMTP banner self-identification (further analysis could use IANA and ARIN databases), Table 2 lists some of the port 25 responses by common domain types. Most of the other domains were for non-U.S. computers.

Table 2: Port 25 Profile by Domain Type

Domain   220   Other  Listen  Totals
com      696++  138    241--  1075
edu       49     15     33      97
gov       20--   19     46++    85
mil      564--  221+   429++  1214
net      344++   19--  105-    468
org       60     29+    21-    110
Totals  1733    441    875    3049

Legend: 220, connections sending 220 code banners; Other, authorization required (see Table 1); Listen, connections with no data received. Significant differences from expected probability (+, greater; -, less than) where p < .01 for (+,-) and p < .001 for (++,--).

The most frequent domains were .mil and .gov together, followed by .com, .net, .org and .edu. By comparison, only seven AOL sites were identified in the same random sample, with just three of these producing 220 banners.

Table 2 also illustrates how this data may be profiled. The .com and .net sites both showed significantly more 220 responses and fewer listening connections than expected by chance (expected frequency = row total x column total / table total). The .mil and .gov connections showed the inverse pattern: significantly fewer 220 responses and more listening responses than expected by random distribution of the data. Further, the .mil and .org connections showed greater interest in authorization (Other in Table 2).
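The expected-frequency formula can be applied to Table 2 with a short Python sketch. Only the expected counts and the direction of each difference are computed here; the significance test behind the +/- marks is not specified in this report, so it is not reproduced. The names used are illustrative.

```python
# Observed counts from Table 2: (220 banner, Other, Listen) per domain type.
observed = {
    "com": (696, 138, 241),
    "edu": (49, 15, 33),
    "gov": (20, 19, 46),
    "mil": (564, 221, 429),
    "net": (344, 19, 105),
    "org": (60, 29, 21),
}
columns = ("220", "Other", "Listen")

row_totals = {d: sum(cells) for d, cells in observed.items()}
col_totals = [sum(cells[i] for cells in observed.values()) for i in range(3)]
table_total = sum(row_totals.values())

def expected(domain, col):
    """Expected count by chance: row total x column total / table total."""
    return row_totals[domain] * col_totals[col] / table_total

for domain, cells in observed.items():
    for i, name in enumerate(columns):
        exp = expected(domain, i)
        direction = "more" if cells[i] > exp else "fewer"
        print(f".{domain} {name}: observed {cells[i]}, expected {exp:.1f} ({direction})")
```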

The data suggest two types of domains. The .com and .net domains appear to be more concerned with receiving mail (the 220 responses), compared to authorization and listening. On the other hand, for the .gov and .mil connections, there was relatively more emphasis on listening.

Many of the listening connections may be the crudest types of honeypot situations. It is like calling someone on the phone and having a person pick up but say nothing. The person you called may quickly hang up or just listen to see if you say something. However, in ordinary phone call etiquette and in SMTP, the receiver of the call is supposed to say something to the caller.

5. Can relay mail be sent?

Email Test #1, used in the present study, was conducted as follows:

POP3 Logon = none
HELO = yourIPaddress
MAIL FROM = postmaster@yourIPaddress
RCPT TO = foreign, valid address
SUBJECT = Mail Test #1 of yourIPaddress

where "yourIPaddress" was the randomly selected IP address.

Notice that the sender was portrayed as local to the server, namely its own postmaster.

If the SMTP server allowed mail to be sent, the text of the message was:

From: postmaster@yourIPaddress
To: freepress@myrealbox.com
Reply-to: postmaster@yourIPaddress
Subject: Mail Test #1 of yourIPaddress

This message was sent to test your SMTP server.
IF we receive this message, your server may be open to abuse.
And we will notify you by return email.
Please direct any questions to freepress@myrealbox.com

Note that this method identified Net Census with one of its valid email addresses in both the RCPT TO field and in the body of the message. An important methodological and ethical point is that the research entity should always identify itself.

Table 3 shows that most SMTP servers rejected the email relay test #1, but not all.

Table 3: SMTP Server Response Codes for Mail Test #1

     N  Banner HELO   FROM   TO     DATA   QUIT   CLOSE
a  343  220   (221) 
b   23  220    4xx/5xx
   190  220    250
c   69  220    250    221
   674  220    250    4xx/5xx
    96  220    250    250
d  407  220    250    250    221
   608  220    250    250    4xx/5xx
    70  220    250    250    250
e    7  220    250    250    250    5xx
     2  220    250    250    250    354
f  174  220    250    250    250    354    250    221
  2663  

(a) 343 servers simply disconnected, perhaps not liking the looks of our client IP address. 20 of these were polite enough to send a 221 response indicating the connection would be closed.

(b) Our client issued the HELO command using the IP address of the SMTP server as its identity, which was rejected by 23 servers.

(c) For the remaining 2297 servers, our client sent "MAIL FROM:<postmaster@[serverIPaddress]>", which was rejected by 933 (190 + 69 + 674) servers. It seems that most of these attempted to determine the identity of the client. It was somewhat comic that many replied that they did not know their own postmaster.

(d) Next, the "RCPT TO:<freepress@myrealbox.com>" command was accepted (250 response) by 253 of the 1364 remaining servers. At this point in the protocol, some 1111 servers threw in the towel. By now the server knew (1) the client IP address from the TCP packets, regardless of what it was told in steps (b) and (c) above, and (2) that the recipient of the mail was not a local user, but rather a foreign address at a valid domain.

Table 3 lists some of the variations in the response codes. Most servers stated that relay mail is not allowed.

(e) The magic 354 response to the client "DATA" command indicated that 176 servers were ready to accept the email header and body text presented above.

(f) 174 connections accepted the email with the 250 response.

For these and all of the open connections above, the procedures did not include any attempt to "retry" or try alternative commands. Any error message from the remote host elicited the "QUIT" command from our client, and the server response (normally code 221) was noted, before our client closed the connection.
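The stop-at-first-error procedure can be sketched in Python as below. This is an illustration under stated assumptions, not the client program used in the study: relay_test and its exchange argument are hypothetical names, exchange stands for any callable that sends a command (or just reads the banner when given None) and returns the numeric reply code, and the message text is abbreviated.

```python
def relay_test(exchange, server_ip, recipient):
    """Walk the Test #1 dialogue; stop and QUIT at the first unexpected reply.

    Returns a list of (step, reply code) pairs, one per reply received.
    """
    sender = f"postmaster@[{server_ip}]"
    message = (
        f"From: postmaster@{server_ip}\r\nTo: {recipient}\r\n"
        f"Subject: Mail Test #1 of {server_ip}\r\n\r\n"
        "This message was sent to test your SMTP server.\r\n.\r\n"
    )
    steps = [
        ("BANNER", None, 220),
        ("HELO", f"HELO {server_ip}\r\n", 250),
        ("FROM", f"MAIL FROM:<{sender}>\r\n", 250),
        ("TO", f"RCPT TO:<{recipient}>\r\n", 250),
        ("DATA", "DATA\r\n", 354),
        ("BODY", message, 250),
    ]
    log = []
    for name, text, wanted in steps:
        code = exchange(text)
        log.append((name, code))
        if code != wanted:
            break                     # no retries, no alternative commands
    log.append(("QUIT", exchange("QUIT\r\n")))
    return log


def fake_server(replies):
    """Test double: hands back canned reply codes in order."""
    pending = iter(replies)
    return lambda text: next(pending)

# A server that refuses the foreign recipient, as most did in Table 3.
print(relay_test(fake_server([220, 250, 250, 550, 221]), "192.0.2.1", "test@example.org"))
```

Keeping the network layer behind one callable means the dialogue logic can be exercised against canned replies before it is pointed at a real socket.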

6. How many relayed Test #1 emails were received?

Most of the 174 apparently successful relay emails submitted may not have been actually sent, since only 50 emails were actually received. Some of these sessions may have been with fake servers (honeypots) or the mail may not have passed later screening by the hosts.

7. Can we estimate the number of email relays?

In section 2 above, the estimate of 15 million addresses with accessible SMTP servers was based on a sample of 3228 connections suggesting operating servers. With the present data and using the more conservative value of 50 relayed emails actually received, 50 of 3228 or 1.55% were demonstrated to be email relays.

Thus, we can estimate about 232,500 (0.0155 x 15 million) email relays -- almost one quarter million. No wonder there is so much spam. Keep in mind that virtual IP addressing almost certainly means that the actual number of computers is less than the estimated number of email relay IP addresses. (Another report by the author will show that certain address prefixes utilize virtual IP addresses in an apparently big way.)
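The same calculation, written out as a small Python sketch with illustrative variable names and the figures quoted above:

```python
relays_received = 50        # relayed test emails that actually arrived
working_servers = 3228      # sampled connections suggesting a working server
estimated_servers = 15e6    # estimate from section 2

relay_rate = relays_received / working_servers
estimated_relays = round(relay_rate, 4) * estimated_servers

print(f"{relay_rate:.2%} of sampled servers relayed the test mail")
print(f"about {estimated_relays:,.0f} relay addresses estimated")
```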

8. Can we reduce the number of email relays?

Many of the relays found in this study have already been closed. For almost all of the relayed emails received by the author, it was possible to reply to the responsible persons of the systems affected. This is an example of the text sent:

On Sat, 8 Jun 2002 23:52:47 -0400, you wrote:

>This message was sent to test your SMTP server.
>IF we receive this message, your server may be open to abuse.
>And we will notify you by return email.
>Please direct any questions to freepress@myrealbox.com

Email Test #1:
HELO, MAIL = Your IP address, FROM postmaster@yourIPaddress
RCPT = TO foreign, valid address
POP3 Logon = none

Hello, as you can see above, this reply is to an email which was sent
through your server -- IP address in "Subject:" above -- to a valid
address indicated above.

This may mean your server would be subject to abuse, such as spam
mailing, denial of service, or worse.

We do not know the specific objectives of your server, but we note that
the "mail relay" observed in this test might merit your attention.  The
SMTP server may be configured to prevent this providing better security
to your network/organization and to others.

This is a research study where your IP address was picked at random.
Our interest is not in specific cases but rather, in statistics derived
from many cases.  However, we are interested in computer security
issues.  If we can be of assistance to you or if you have questions,
please direct mail to freepress@myrealbox.com.
https://www.angelfire.com/space/netcensus/ is the Net Census web site.

Greetings, Global Services

In many cases, the postmaster@[IPaddress] address was found to be invalid. However, in almost all cases, the headers of the relayed email and other research sufficed to produce a valid email address to contact the responsible parties for the open email relay servers found.

9. Who runs the email relays found?

Considering the 174 relay emails accepted by SMTP servers, the most common domain types were: .com (n = 87), .mil (n = 51), other countries (n = 39), .edu (n = 12) and .net (n = 9). Thus, the biggest email relay operators appear to be commercial (.com) and the military (.mil).

Considering the 50 relay emails actually received, the major relays were .com (n = 22), other countries (n = 17) and .mil (n = 6).

124 of the 174 relay mails accepted for sending were not received, perhaps because of post-hoc screening by the host. From the values above, this possible screening was greatest for .mil (88%) and .com (75%) and least for other countries (56%).
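These percentages follow from the accepted and received counts given above, as this short Python sketch shows. The variable names are illustrative.

```python
# Relay mails accepted by the server versus actually received, per domain type.
accepted = {".com": 87, ".mil": 51, "other countries": 39}
received = {".com": 22, ".mil": 6, "other countries": 17}

# Share of accepted mails that never arrived (possible post-hoc screening).
screened = {d: (accepted[d] - received[d]) / accepted[d] for d in accepted}

for domain, share in screened.items():
    print(f"{domain}: {share:.0%} of accepted relay mails not received")
```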

Given the large number of U.S. military networks and email servers (e.g., Table 2), it was somewhat puzzling that four of the six relay emails received from .mil servers were relayed through the Pentagon. An email address was found for the responsible parties at the Pentagon to provide the "reply" and explanation in section 8 above. By the time the fourth relay email was received as a result of the random sampling procedure, it was found that even this previously valid contact email address had been closed.

Discussion and Conclusions

This case study of SMTP servers illustrates how random sampling may be used to estimate population parameters in internet research. The data collected was used to estimate the number of IP addresses with SMTP servers accessible and finally the number of open email relay addresses.

The email test #1 used in this study is one variation of a variety of procedures that could be used to send relayed email. Considering just the first step, the HELO value used was the IP address of the SMTP server. Other values might produce different results. Strictly speaking, then, the email relay estimate calculated in this report applies to the specific method used.

A larger sample size for this data would permit more accurate estimates and justify a more detailed presentation including confidence intervals and further examination of subsamples (by country, domain type, etc.) and their corresponding populations. Since email is a key internet function, SMTP server values provide variables which might be highly correlated with general internet usage.

As the internet continues to grow in social importance, commentary and policy should be based as much as possible on empirical data.

Copyright 2002 Global Services

Last Modified: July 28, 2002
