Updated: June 9, 2005
GypsyProxy includes functional http, http proxy, socks v5 and chat servers as well as some honeypot features.

A special combo dubbed "Gypsy woman" delights spammers and anti-spammers alike. The Gypsy woman feature redirects proxy connections to internet smtp mail servers to an internal honeypot server which prevents the spam from actually being sent. Spammer records will show that it was sent; so they are happy. Anti-spammers can launch an army of Gypsy women and reduce spam because Gypsy takes it in, tanks it and documents spammer activity.
This document presents details of the individual servers and honeypots in GypsyProxy. Users may suggest where improvements in GypsyProxy might be focussed.

SERVERS

http port 80: GypsyProxy includes a real and simple web server which supports subdirectories. The default root directory can be changed by the user. GET, HEAD and POST methods are supported in this CGI/1.1 compliant server.

The author has tried a number of web servers available on the net. Many are very professional and filled with features, but in some cases the user pays dearly in lost system resources by almost every measure. Even so, one server package that seemed able to do everything could not be made, at least in the author's hands, to deliver a default index.html file to a client after a GET / HTTP/1.0 request. It wanted to send a header with a Location statement instead. All of those bells and whistles are for naught if the server can't send the requested default web page. So...

GypsyProxy was designed to be as small and simple as possible to minimize use of system resources on computers used for other applications. Compared to more complicated servers, Gypsy does not need a baby-sitter. Just set the root directory of your web page(s), and go online.

Gypsy does not require fore-play, negotiation with clients or a U.N. vote. If Gypsy finds the requested resource, it is sent. End of story.

As with other servers, subdirectories of the web server root directory can represent different web sites hosted by the same machine, in which case the main default web page of the hosting machine may or may not exist and provide links to the default pages of each publicly hosted web site.

While parsing a request, responses for port 80 (and http proxies 3128 and 8080) include descriptive message content and the following response codes: 200, 302, 305, 307, 400, 404, 405, 408, 500, 501, 503 and 504.

Traffic is logged to 80.log, but requests containing ".ida", such as in Code Red probes, are logged to ida.log. Nimda and similar requests containing "+dir" are logged to Scripts.log. Further, content with "default.ida" CGI requests is not processed, but rather saved to the file DATA\[client IP address].ida for your analysis and delight [Requires compiled ida.bas].
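
The log routing described above can be sketched in a few lines of Python. This is only an illustration, not Gypsy's actual code: the function name pick_log is hypothetical, and both the case-insensitive match and the order of the two tests are assumptions.

```python
def pick_log(request_line: str) -> str:
    """Return the log file name for a port 80 request line."""
    lowered = request_line.lower()  # assumption: matching ignores case
    if ".ida" in lowered:
        return "ida.log"       # Code Red style probes
    if "+dir" in lowered:
        return "Scripts.log"   # Nimda style probes
    return "80.log"            # ordinary traffic


if __name__ == "__main__":
    for line in ("GET /index.html HTTP/1.0",
                 "GET /default.ida?NNNN HTTP/1.0",
                 "GET /scripts/root.exe?/c+dir HTTP/1.0"):
        print(pick_log(line), "<-", line)
```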

CGI/1.1 programs are put in the CGI-BIN subdirectory and your html form ACTION refers to the program name like this: (1) for the GET method, "cgi-bin/prog" and "prog" are identical if the latter is followed by a query string from your form and (2) for the POST method, "cgi-bin/prog" and "prog?" are identical. Notice that the prog extension may be omitted; Gypsy will run CGI-BIN\prog if found. Or you could substitute "prog.exe" for "prog" if you want to specify the entire program/script name.

In a CGI action, the DATA subdirectory contains (1) post data as postX.dat before the CGI program is called and (2) the result as CGIoutX.dat where X = the internal connection index used by Gypsy (usually the left number in the left status bar). These files may be overwritten when the same connection index is used again for a CGI call.

For CGI requests, extra information appears in 80.log: (1) rIP = cgi program name, (2) rName = request Content-Type and (3) rPort = request Content-Length.

Gypsy never sends arguments or query strings to CGI programs as command-line arguments, since this may be a security risk if the strings are too long. In any case, your forms should always use the POST method if there is any chance that a client might make a very long entry. The GET method transmits the query string from the form as an environment variable which, in effect, forces the webmaster to assert that mistakenly or maliciously long form inputs will not interfere with system function. Unless you have detailed knowledge about these possibilities, the POST method is safer, since data can be any length.

postdata.log logs client IP address, date and the POST data, for every client POST where Content-Length is greater than zero. This information can be useful to associate specific clients with data posted in filling out forms, passwords submitted, etc.

If a client sends a series of requests in a short interval (or in one packet), once the receive buffer is read, only the first request is considered. This applies also to the http proxy ports. Right or wrong, it seems to work very well. Basically, it seems that browsers are designed to be very robust and methodical in retrieving web content.

http proxy ports 3128, 8080: The GET, HEAD, POST and CONNECT methods work. In fact, any method except CONNECT will be passed on to the remote web server.

Requests through the proxy ports all use (1) the HTTP version in the request, (2) Host: [host specified by client] and (3) Connection: Keep-Alive statements. If the request contains content, its (4) Content-Type, (5) Content-Length and (6) up to about 8K of content are also transmitted. [Soon, transmission of the total content, if greater than 8K, will be coded.]
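
As a rough sketch of the header described above, the following Python builds a request from only the listed items. The function name build_upstream_request and its parameters are hypothetical, the form of the request line sent to the remote server is an assumption, and the 8K content limit is not modelled.

```python
from typing import Optional


def build_upstream_request(method: str, path: str, version: str, host: str,
                           content_type: Optional[str] = None,
                           content: bytes = b"") -> bytes:
    """Build a minimal request carrying only the items listed above."""
    lines = [
        f"{method} {path} {version}",   # (1) HTTP version taken from the client
        f"Host: {host}",                # (2) host specified by the client
        "Connection: Keep-Alive",       # (3)
    ]
    if content:
        if content_type:
            lines.append(f"Content-Type: {content_type}")   # (4)
        lines.append(f"Content-Length: {len(content)}")     # (5)
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("latin-1") + content                 # (6) the content


if __name__ == "__main__":
    print(build_upstream_request("GET", "/", "HTTP/1.0", "example.com"))
```

No other client header lines are copied into the result.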

GypsyProxy creates its own http headers for client requests sent to remote web servers. However, the entire data stream -- http header and content -- from the remote server is transferred without alteration to the client web browser.

Note that the original client browser requests received by Gypsy may contain a whole autobiography of the client -- what computer, what OS, what browser, what it will accept, etc -- as if anyone cares. Gypsy transmits none of this identifying chatter.

What happens because Gypsy creates its own simple http header for the remote web servers? Essentially, nothing of consequence. Complex web pages like cnn.com are transmitted completely. In rare cases, the remote host thinks the client browser is dumb and will say so. This may help you identify what web pages are written by jerks. A very few web pages will not appear completely correct in layout because the web server apparently needs the autobiography with each request. Try msn.com as one example. Such cases are rare. Most web sites are able to deliver content which is properly displayed. In other cases, certain ad banners will not be displayed, probably because that depends on having the autobiography of the client. Note: The tests above used IE 5.0 with all scripting (and Java) disabled through GypsyProxy.

In short, GypsyProxy is a truly anonymous http proxy. To filter out any autobiography that your own browser sends, set that browser to use Gypsy as proxy. [Tools > Internet Options > Connections > Settings > Use Proxy checked; address = 127.0.0.1; port=8080 > OK > OK]

The CONNECT method allows proxied tunnel connections to any host on any port, except port 25 on non-LAN hosts from non-LAN clients (non-LAN = internet). That is, non-LAN clients cannot use the http proxy CONNECT method to reach port 25 used by internet smtp servers. Spammers will try to do this, and unless it is blocked your http proxy will be found and used. Some spammers will attempt an essentially unlimited number of connections through the http proxy to various smtp servers at the same time. Thus, the honeypot smtp server described below was added.

On the other hand, a LAN email client can CONNECT through GypsyProxy to port 25 on an internet smtp server for sending mail and to a POP3 server to retrieve mail.
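
The port 25 rule in the two paragraphs above might be expressed as follows. This is a sketch under stated assumptions: the names is_lan and connect_action are hypothetical, and a LAN address is taken to be any address starting with "192.168", the prefix this document says is hardwired for the Local checkbox.

```python
SMTP_PORT = 25
LAN_PREFIX = "192.168"  # assumption: the same hardwired prefix used for Local


def is_lan(address: str) -> bool:
    """True if the address looks like a LAN address by simple prefix test."""
    return address.startswith(LAN_PREFIX)


def connect_action(client_ip: str, target_host: str, target_port: int) -> str:
    """Return 'honeypot' for internet-to-internet port 25, else 'tunnel'."""
    if (target_port == SMTP_PORT
            and not is_lan(client_ip)
            and not is_lan(target_host)):
        return "honeypot"
    return "tunnel"


if __name__ == "__main__":
    print(connect_action("203.0.113.5", "mail.xyz.com", 25))   # internet client
    print(connect_action("192.168.0.10", "mail.xyz.com", 25))  # LAN client
```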

Http proxy connections with the remote computer are closed after 8-12 seconds (depending on version) of inactivity, again, assuming the remote host does not close the connection earlier. This time is sort of a "tweak" and changes may improve performance. It is intended to give time for a client to issue more requests to the same or another remote host, but be short enough that unused connections are closed to save resources.

Proxy connections for the CONNECT tunneling method (and the port 1080 socks v5 proxy) will remain open for 5 minutes after the last activity, if the remote host does not close the connection.

If a CONNECT request omits the remote port, it is assumed to be port 80. Gypsy returns an error code for CONNECT requests to its own IP address.
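
The default port rule can be illustrated with a small parser. The function name parse_connect_target is hypothetical, and the sketch handles only "host" and "host:port" targets (no IPv6 literals); the check against Gypsy's own IP address is not shown.

```python
from typing import Tuple


def parse_connect_target(request_line: str) -> Tuple[str, int]:
    """Split 'CONNECT host[:port] HTTP/x.y' into (host, port)."""
    parts = request_line.split()
    if len(parts) < 2 or parts[0].upper() != "CONNECT":
        raise ValueError("not a CONNECT request")
    host, sep, port_text = parts[1].partition(":")
    if not host:
        raise ValueError("missing host")
    # A missing port is assumed to be port 80, as described above.
    port = int(port_text) if sep and port_text else 80
    return host, port


if __name__ == "__main__":
    print(parse_connect_target("CONNECT mail.xyz.com:25 HTTP/1.0"))
    print(parse_connect_target("CONNECT www.xyz.com HTTP/1.0"))
```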

Note that it is possible for an external network (internet) client to attempt to connect to computers on a LAN if GypsyProxy is run on a machine with an internet interface. Probably there should be some limitations added here. For example, if GypsyProxy is run on a LAN machine without a direct internet interface, it could serve private web pages over the LAN. However, these may not be so private if internet clients could tunnel through another copy of Gypsy on a gateway machine to the private web page server on a LAN machine.

A future version might just add one statement that tests for, and disallows, CONNECT requests from non-LAN clients to LAN addresses. However, the ability to tunnel in to a server on the LAN may be a useful feature in some situations.

socks v5 port 1080: In version 5 socks, an authorization exchange is required before connection requests can be made. Gypsy will reply to the client that authorization is not required. A proxied connection works essentially the same as the http proxy CONNECT method, except that the syntax of the client-proxy dialog is different (please see RFC 1928). Note: Internet clients that attempt to connect to an internet smtp server are redirected to the local smtp honeypot.
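
For readers unfamiliar with RFC 1928, the authorization exchange is two short binary messages. The sketch below follows the RFC's byte layout: the client sends version 5, a method count and the method list, and the server answers with version 5 and the chosen method, where 0x00 means "no authentication required". The function name method_selection_reply is hypothetical, and the 0xFF "no acceptable methods" branch comes from the RFC; whether Gypsy checks the offered methods at all is not stated here.

```python
SOCKS_VERSION = 0x05
NO_AUTH = 0x00         # RFC 1928: no authentication required
NO_ACCEPTABLE = 0xFF   # RFC 1928: no acceptable methods


def method_selection_reply(greeting: bytes) -> bytes:
    """Answer a SOCKS v5 greeting: VER, NMETHODS, METHODS..."""
    if len(greeting) < 2 or greeting[0] != SOCKS_VERSION:
        raise ValueError("not a SOCKS v5 greeting")
    nmethods = greeting[1]
    methods = greeting[2:2 + nmethods]
    if len(methods) != nmethods:
        raise ValueError("truncated greeting")
    chosen = NO_AUTH if NO_AUTH in methods else NO_ACCEPTABLE
    return bytes([SOCKS_VERSION, chosen])


if __name__ == "__main__":
    print(method_selection_reply(b"\x05\x01\x00").hex())
```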

chat port 6667 works in two modes: text and binary.

In the most simple case, even a program like netcat can be used to connect and act as a chat client. The chat server in this mode strips all non-printable characters from incoming "chat" and adds its own cr-lf before sending the chat to all other online clients who are also in this "text mode".

In the binary mode, instant private groups can be created. If the client program sends a special string to the server immediately after a connection is established, a private group and binary transfer are both implemented. Here is how it works. First, the "special string" consists of two hex FF characters (255) followed by any string, such as "MyGroup" or whatever. This latter string is sort of a password, since in this mode, incoming chat or data is distributed only to other clients who also logged on in binary mode with the same string, such as "MyGroup". This "group ID" string is added as a prefix to the Welcome message that Gypsy sends to chat port clients. Thus, the client program can verify that binary mode has been established in a particular connection.

Once in binary mode, Gypsy will not alter in any manner the data it receives and distributes to clients in the same group. Thus, encryption of chat can be done or binary files transferred. Although this is called "binary" mode, plain text can also be sent. If the chat is plain text, the only difference from "text mode" is that you have created a private chat group on the fly and only people who know the name of the group can participate. Note that the author has written a proprietary encryption method and implemented that in a chat client called "X-Press".
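
The two chat modes can be sketched as follows. The names classify_first_message and clean_text_chat are hypothetical; the sketch assumes that everything after the two FF bytes is the group ID and that "printable" means the ASCII range from space to "~".

```python
from typing import Optional, Tuple

BINARY_MARKER = b"\xff\xff"  # two hex FF characters, as described above


def classify_first_message(data: bytes) -> Tuple[str, Optional[bytes]]:
    """Return ('binary', group_id) or ('text', None) for the first message."""
    if data.startswith(BINARY_MARKER) and len(data) > len(BINARY_MARKER):
        return "binary", data[len(BINARY_MARKER):]
    return "text", None


def clean_text_chat(data: bytes) -> bytes:
    """Text mode: drop non-printable bytes and append the server's cr-lf."""
    printable = bytes(b for b in data if 0x20 <= b <= 0x7E)
    return printable + b"\r\n"


if __name__ == "__main__":
    print(classify_first_message(b"\xff\xffMyGroup"))
    print(clean_text_chat(b"hello\x07 world\n"))
```

A client written along these lines would send the marker and group ID first, then compare the prefix of the Welcome message with its group ID to confirm binary mode.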

HONEYPOTS

smtp port 25 is a honeypot smtp mail server. Clients connecting to port 25 will find an apparently fully functional smtp mail server. There is a rudimentary syntax check for the MAIL and RCPT commands (presence and position of "@" and ".") in this SMTP honeypot. Further, replies contain appropriate error messages if the client fails to send mail according to the protocol.
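
The exact rules of the syntax check are not given here, so the following is only a guess at what a "presence and position" test might look like. The function name plausible_address is hypothetical, and the specific rules (an "@" that is not the first character, and a "." after the "@" that is neither adjacent to it nor the last character) are assumptions.

```python
def plausible_address(command_line: str) -> bool:
    """Rough check of the address in 'MAIL FROM:<...>' or 'RCPT TO:<...>'."""
    _, sep, rest = command_line.partition(":")
    if not sep:
        return False                      # no "FROM:" or "TO:" part at all
    address = rest.strip().strip("<>")
    at = address.find("@")
    dot = address.rfind(".")
    # "@" must not be first; the last "." must follow "@" and not end the text
    return at > 0 and dot > at + 1 and dot < len(address) - 1


if __name__ == "__main__":
    print(plausible_address("MAIL FROM:<bob@example.com>"))
    print(plausible_address("RCPT TO:<bobexample.com>"))
```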

The smtp honeypot supports these commands: HELO, EHLO, MAIL, RCPT, DATA, VRFY, EXPN, RSET and QUIT.

For attempts to CONNECT from the internet (but not from LAN clients) through the http proxy to an external mail server, e.g., mail.xyz.com, GypsyProxy redirects the traffic to its own honeypot. In this case, the client will receive a banner that contains the destination host name and thus looks like it may really be coming from mail.xyz.com although that server is never contacted.

The 25.log entries indicate whether the connection was directly to port 25 or redirected from a socks proxy or http proxy CONNECT method. In the proxy redirection, the Remote IP field will show 127.0.0.1 and the Remote Name field will show the remote name (or IP address) in the original proxy request. The Request field shows the last smtp command issued in the connection.

This honeypot server will allow multiple mails to be sent in a single connection but most spammers observed thus far will "QUIT" and start again with another connection.

Using this new feature of GypsyProxy, spammers trying to use the http or socks proxies to connect to the real mail.xyz.com:25 have been observed to be very persistent. A single client IP address may attempt to establish a dozen or more http proxy connections to each of many mail servers, e.g., mail1.xyz.com, mail2.xyz.com, etc.

These spammers have been observed dumping thousands of spam emails into the black hole of the "SMTP server ready". If you can spare the bandwidth, you can accumulate evidence in the 25.log file. If you can spare the disk space, you can turn Capture ON in GypsyProxy and the mail will be saved to .\DATA\capture.dat.

GypsyProxy also contains ftp port 21 and pop3 port 110 honeypots with the ability of the client to "log on" with the USER and PASS commands. If logged on, other commands will be completed. At present, for ftp, all commands are "completed", but nothing is actually done. The ftp honeypot includes PWD, CWD, MKD, RMD, CDUP and QUIT. The pop3 honeypot does LIST, RETR, DELE, TOP, STAT, NOOP, QUIT and SEND. SEND is redirected to the smtp honeypot.

ssh port 22 and ident port 113 also look real, at least at first glance, and log client requests. ssh sends a banner, to which the client must respond, possibly receiving a "Protocol mismatch" message in return. ident requires client input before responding [RFC 1413].

user-specified ports: GypsyProxy can listen on up to a total of 24 ports. Edit the ports list before putting Gypsy "online". Ports may be listed in any order with one space between entries. If you want to disable a default port listed, just delete it before starting. Ports added to those listed at program start listen and log incoming traffic. Incoming connections can sometimes be hostile and these connections are closed in 8-12 seconds (depending on version) regardless of what the traffic is.

If the traffic is binary data, it may not show correctly in the program window; however, all data is written to the *.log files in binary form. For non-text protocols, one can examine this log with a hex viewer to see exactly the "request" sent by the client to "dummy" services that you create by adding port numbers.

If SQL port 1433 is added, its response to client requests is similar to that from a real MS SQL server. For port 261, client firewalls typically ask for a user and password; "GypsyProxy" and "NetCensus" are sent. For other ports, an appropriately pompous banner is sent:

220 Agency Services Firewall gp-1 ready

Other real or honeypot server components can be added.

GENERAL FEATURES

Several specifications are "hard-wired" in the code, but could easily be changed for specific needs.

A client IP address can have no more than 12 simultaneous connections to the server. Trying smaller values here can block successful web page retrieval through the http proxy ports as some browsers like to retrieve information from various sources simultaneously.

If 12 connections are exceeded, a 503 Service Unavailable code is sent, including "Retry-After: 15". Some recent browser versions appear to ignore this and to falsely assume that proxies will accept an unlimited number of simultaneous connections from a single source IP address.
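
A per-address limit of this kind can be sketched with a simple counter. The class ConnectionLimiter and the function busy_response are hypothetical names, and the response shown is a bare minimum rather than the descriptive message Gypsy actually sends.

```python
from collections import Counter

MAX_PER_IP = 12            # hard-wired limit described above
RETRY_AFTER_SECONDS = 15   # value sent in the Retry-After header


class ConnectionLimiter:
    """Track open connections per client IP address."""

    def __init__(self) -> None:
        self.open_by_ip = Counter()

    def try_open(self, client_ip: str) -> bool:
        """Count a new connection, or refuse it if the limit is reached."""
        if self.open_by_ip[client_ip] >= MAX_PER_IP:
            return False
        self.open_by_ip[client_ip] += 1
        return True

    def close(self, client_ip: str) -> None:
        """Release one connection slot for this address."""
        if self.open_by_ip[client_ip] > 0:
            self.open_by_ip[client_ip] -= 1


def busy_response() -> bytes:
    """Minimal 503 reply for a refused connection."""
    return (f"HTTP/1.0 503 Service Unavailable\r\n"
            f"Retry-After: {RETRY_AFTER_SECONDS}\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    limiter = ConnectionLimiter()
    results = [limiter.try_open("203.0.113.5") for _ in range(13)]
    print(results.count(True), "accepted,", results.count(False), "refused")
```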

Up to 48 simultaneous clients may be online for the services of GypsyProxy. In the case of proxy services, there will be 2 socket allocations for each client (the client and the remote host) and as many as 96 sockets could be open at one time.

GypsyProxy "smart" features can dumbfound poorly-written clients accustomed to "dumb" servers. For example, during download of an archive file with ".zip" extension, Gypsy will not allow multiple connections from the same client IP address (although up to 12 such connections are allowed for other files such as web pages as described above). Some "internet enhancer" or "download helper" programs on the client side routinely abuse servers by trying to make multiple download requests for the same file to the same server. Not with GypsyProxy. If a .zip file download is underway for a particular client IP address, other connection requests are ignored and the connections are closed (with a "click" sound). On the other hand, up to 48 different clients may simultaneously download .zip files. Summary: Only one connection per client IP address for .zip file downloads.

Except for capture.dat, which is open if Capture is checked, all of the LOG directory files are closed after appending new data. Thus, these files can be moved, deleted, etc, at almost any time. Gypsy will recreate them as needed.

BlockIP.ini is a text file with IP address prefixes which will receive error 403 access denied. If it exists, BlockIP.ini is loaded each time the server is started. Thus, it can be edited in real-time to add or delete entries; then click the Gypsy button to stop and then to restart the server, loading the new .ini file. Example:

200.45.2
165.220.30
201.8.145.244

If an entry matches the leading characters of the client IP address, that address (or address range) is blocked and receives 403 errors (please see example err403.html). Notice that the first fictional example blocks all addresses where the third byte starts with "2" (2, 20 - 29, 200 - 255). Notice that the last example is a full IP address, not a range. This simple text method for a "mask" has drawbacks, but can be effective to block ranges of hosts.
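
The text matching can be tried out with a few lines of Python. The names load_prefixes and is_blocked are hypothetical; the sketch assumes one prefix per line and a plain "starts with" comparison, which is what produces the range effect noted above.

```python
from typing import List


def load_prefixes(text: str) -> List[str]:
    """Read one IP address prefix per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_blocked(client_ip: str, prefixes: List[str]) -> bool:
    """True if the client address starts with any listed prefix."""
    return any(client_ip.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    prefixes = load_prefixes("200.45.2\n165.220.30\n201.8.145.244\n")
    for ip in ("200.45.2.9", "200.45.25.9", "200.45.210.9", "200.45.3.9"):
        print(ip, "blocked" if is_blocked(ip, prefixes) else "allowed")
```

To block only the single third byte "2", an entry would need the trailing dot, as in "200.45.2.", assuming the comparison is the plain text match sketched here.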

As a single thread application, Gypsy cycles through open connections to evaluate who needs what service over time, in a procedure called polling. While polling is considered to be less advanced than spawning additional threads for each connection, it is suited for a small application with low to moderate levels of traffic.

Once all tasks are complete, a 60 msec sleep is built in to avoid excessive use of CPU time. For proxy connections, a 16384 byte buffer is used. Therefore, data transfer up to about 270k/sec is possible [16384 x 1000 msec / 60 msec = 273k/sec].
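
The arithmetic in brackets can be checked directly. The sketch assumes one full buffer is moved per polling cycle and that the cycle takes no time beyond the sleep, so the result is an upper bound; the function name max_bytes_per_second is hypothetical.

```python
BUFFER_BYTES = 16384   # proxy buffer size
SLEEP_MS = 60          # sleep per polling cycle


def max_bytes_per_second(buffer_bytes: int = BUFFER_BYTES,
                         sleep_ms: int = SLEEP_MS) -> float:
    """Upper bound: one full buffer moved per sleep interval."""
    return buffer_bytes * 1000 / sleep_ms


if __name__ == "__main__":
    print(f"{max_bytes_per_second():.0f} bytes/sec")
```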

At the application layer (GypsyProxy), there is no chance for buffer overflows, since received data length is never assumed to be less than a fixed value, and arbitrarily large amounts of received data are never read. Except for the internet connection check (wininet.dll), all network activity uses direct calls to socket functions, and thus avoids any bugs or vulnerabilities that might exist in other internet dlls.

For the http server and http proxy servers, all requests are filtered to exclude characters less than " " (space) or greater than "~".
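
One reading of this filter, shown as a sketch: characters outside the range from space to "~" are simply dropped. The function name filter_request is hypothetical, and the sketch assumes it is applied to a single request line that has already been separated from its line ending.

```python
def filter_request(text: str) -> str:
    """Keep only characters from ' ' through '~' (printable ASCII)."""
    return "".join(ch for ch in text if " " <= ch <= "~")


if __name__ == "__main__":
    print(filter_request("GET /\x00index.html\x7f HTTP/1.0"))
```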

The Proxy checkbox enables ports 1080, 3128 and 8080. Once it is unchecked, new connections are immediately refused and all proxy functions on existing connections are disabled. Existing connections will then time out and close.

The DNS checkbox enables reverse DNS lookup of client host names after the connection is closed.

In this version, all Sound effects are blocking. That is, the program stops while the sound plays, although the non-blocking socket data transmissions continue. The sounds are less than one second and the delay is inconsequential if connections are few. If there is more traffic, both DNS and Sound take time and the server throughput will increase if Sound and perhaps also DNS are unchecked.

This sound-related blocking seeks to avoid sound issues during program development. E.g., multiple requests to play sounds in a short interval may cause problems not directly relevant to development of the server.

On the other hand, Sound checked can be used to deliberately slow things down (with volume reduced so the sound is not a form of torture).

The Capture checkbox enables capture of proxy, honeypot and user-added port traffic. This can provide information for study of the protocols and debugging or even evidence, such as in the case of spam. However, the resulting capture.dat file will show a multiplexed view if Gypsy is handling transfers with many remote hosts during the same period. For example, segments of the traffic may be seen if one client is sending mail and another client is web browsing via the Gypsy anonymous http proxy. Indeed, a single client browsing the web may create multiple connections.

The Local checkbox actually performs three actions simultaneously, as presented in gypsy.txt. Probably these actions should be controlled by separate checkboxes in future releases. Anyway, have Local checked if you are serving web pages over a LAN and are not connected to the internet. Otherwise, the program will detect that there is no internet connection and stop, with a sound notification. This is convenient if you lose a connection to the internet and want a sound notification. Also, LAN connections to the computer running Gypsy will not be logged if Local is checked.

If clients on your LAN are browsing the web through the Gypsy ports 3128 or 8080, Local checked prevents the generation of potentially very large logs. If you want those logs, however, you can uncheck Local. For this feature of the Local checkbox to work, the LAN address must start with "192.168", a prefix often associated with small LAN networks; this assumption is hardwired at this time. For other private address ranges, you will need a special build (compile) of Gypsy. Of course, other private address ranges can be used, but the Local checkbox will not control the logging of connections in those cases; they will be logged.

Thank you for reading. Your suggestions and knowledge are welcome.

Copyright © 2003 Global Services
Original publication: Feb. 23, 2003
