General description of the TCP/IP protocols
Copyright (C) 1987, Charles L. Hedrick. Anyone may reproduce this
document, in whole or in part, provided that: (1) any copy or
republication of the entire document must show Rutgers University as
the source, and must include this notice; and (2) any other use of
this material must reference this manual and Rutgers University, and
the fact that the material is copyright by Charles Hedrick and is used
by permission.
TCP/IP is a layered set of protocols. In order to understand what
this means, it is useful to look at an example. A typical situation
is sending mail. First, there is a protocol for mail. This defines a
set of commands which one machine sends to another, e.g. commands to
specify who the sender of the message is, who it is being sent to, and
then the text of the message. However this protocol assumes that
there is a way to communicate reliably between the two computers.
Mail, like other application protocols, simply defines a set of
commands and messages to be sent. It is designed to be used together
with TCP and IP. TCP is responsible for making sure that the commands
get through to the other end. It keeps track of what is sent, and
retransmitts anything that did not get through. If any message is too
large for one datagram, e.g. the text of the mail, TCP will split it
up into several datagrams, and make sure that they all arrive
correctly. Since these functions are needed for many applications,
they are put together into a separate protocol, rather than being part
of the specifications for sending mail. You can think of TCP as
forming a library of routines that applications can use when they need
reliable network communications with another computer. Similarly, TCP
calls on the services of IP. Although the services that TCP supplies
are needed by many applications, there are still some kinds of
applications that don't need them. However there are some services
that every application needs. So these services are put together into
IP. As with TCP, you can think of IP as a library of routines that
TCP calls on, but which is also available to applications that don't
use TCP. This strategy of building several levels of protocol is
called "layering". We think of the applications programs such as
mail, TCP, and IP, as being separate "layers", each of which calls on
the services of the layer below it. Generally, TCP/IP applications
use 4 layers:
- an application protocol such as mail
- a protocol such as TCP that provides services need by many
applications
- IP, which provides the basic service of getting datagrams to
their destination
- the protocols needed to manage a specific physical medium, such
as Ethernet or a point to point line.
TCP/IP is based on the "catenet model". (This is described in more
detail in IEN 48.) This model assumes that there are a large number
of independent networks connected together by gateways. The user
should be able to access computers or other resources on any of these
networks. Datagrams will often pass through a dozen different
networks before getting to their final destination. The routing
needed to accomplish this should be completely invisible to the user.
As far as the user is concerned, all he needs to know in order to
access another system is an "Internet address". This is an address
that looks like 128.6.4.194. It is actually a 32-bit number. However
it is normally written as 4 decimal numbers, each representing 8 bits
of the address. (The term "octet" is used by Internet documentation
for such 8-bit chunks. The term "byte" is not used, because TCP/IP is
supported by some computers that have byte sizes other than 8 bits.)
Generally the structure of the address gives you some information
about how to get to the system. For example, 128.6 is a network
number assigned by a central authority to Rutgers University. Rutgers
uses the next octet to indicate which of the campus Ethernets is
involved. 128.6.4 happens to be an Ethernet used by the Computer
Science Department. The last octet allows for up to 254 systems on
each Ethernet. (It is 254 because 0 and 255 are not allowed, for
reasons that will be discussed later.) Note that 128.6.4.194 and
128.6.5.194 would be different systems. The structure of an Internet
address is described in a bit more detail later.
Of course we normally refer to systems by name, rather than by
Internet address. When we specify a name, the network software looks
it up in a database, and comes up with the corresponding Internet
address. Most of the network software deals strictly in terms of the
address. (RFC 882 describes the name server technology used to handle
this lookup.)
TCP/IP is built on "connectionless" technology. Information is
transfered as a sequence of "datagrams". A datagram is a collection
of data that is sent as a single message. Each of these datagrams is
sent through the network individually. There are provisions to open
connections (i.e. to start a conversation that will continue for some
time). However at some level, information from those connections is
broken up into datagrams, and those datagrams are treated by the
network as completely separate. For example, suppose you want to
transfer a 15000 octet file. Most networks can't handle a 15000 octet
datagram. So the protocols will break this up into something like 30
500-octet datagrams. Each of these datagrams will be sent to the
other end. At that point, they will be put back together into the
15000-octet file. However while those datagrams are in transit, the
network doesn't know that there is any connection between them. It is
perfectly possible that datagram 14 will actually arrive before
datagram 13. It is also possible that somewhere in the network, an
error will occur, and some datagram won't get through at all. In that
case, that datagram has to be sent again.
Note by the way that the terms "datagram" and "packet" often seem to
be nearly interchangable. Technically, datagram is the right word to
use when describing TCP/IP. A datagram is a unit of data, which is
what the protocols deal with. A packet is a physical thing, appearing
on an Ethernet or some wire. In most cases a packet simply contains a
datagram, so there is very little difference. However they can
differ. When TCP/IP is used on top of X.25, the X.25 interface breaks
the datagrams up into 128-byte packets. This is invisible to IP,
because the packets are put back together into a single datagram at
the other end before being processed by TCP/IP. So in this case, one
IP datagram would be carried by several packets. However with most
media, there are efficiency advantages to sending one datagram per
packet, and so the distinction tends to vanish.
The TCP level
Two separate protocols are involved in handling TCP/IP datagrams. TCP
(the "transmission control protocol") is responsible for breaking up
the message into datagrams, reassembling them at the other end,
resending anything that gets lost, and putting things back in the
right order. IP (the "internet protocol") is responsible for routing
individual datagrams. It may seem like TCP is doing all the work.
And in small networks that is true. However in the Internet, simply
getting a datagram to its destination can be a complex job. A
connection may require the datagram to go through several networks at
Rutgers, a serial line to the John von Neuman Supercomputer Center, a
couple of Ethernets there, a series of 56Kbaud phone lines to another
NSFnet site, and more Ethernets on another campus. Keeping track of
the routes to all of the destinations and handling incompatibilities
among different transport media turns out to be a complex job. Note
that the interface between TCP and IP is fairly simple. TCP simply
hands IP a datagram with a destination. IP doesn't know how this
datagram relates to any datagram before it or after it.
It may have occurred to you that something is missing here. We have
talked about Internet addresses, but not about how you keep track of
multiple connections to a given system. Clearly it isn't enough to
get a datagram to the right destination. TCP has to know which
connection this datagram is part of. This task is referred to as
"demultiplexing." In fact, there are several levels of demultiplexing
going on in TCP/IP. The information needed to do this demultiplexing
is contained in a series of "headers". A header is simply a few extra
octets tacked onto the beginning of a datagram by some protocol in
order to keep track of it. It's a lot like putting a letter into an
envelope and putting an address on the outside of the envelope.
Except with modern networks it happens several times. It's like you
put the letter into a little envelope, your secretary puts that into a
somewhat bigger envelope, the campus mail center puts that envelope
into a still bigger one, etc. Here is an overview of the headers that
get stuck on a message that passes through a typical TCP/IP network:
We start with a single data stream, say a file you are trying to send
to some other computer:
......................................................
TCP breaks it up into manageable chunks. (In order to do this, TCP
has to know how large a datagram your network can handle. Actually,
the TCP's at each end say how big a datagram they can handle, and then
they pick the smallest size.)
.... .... .... .... .... .... .... ....
TCP puts a header at the front of each datagram. This header actually
contains at least 20 octets, but the most important ones are a source
and destination "port number" and a "sequence number". The port
numbers are used to keep track of different conversations. Suppose 3
different people are transferring files. Your TCP might allocate port
numbers 1000, 1001, and 1002 to these transfers. When you are sending
a datagram, this becomes the "source" port number, since you are the
source of the datagram. Of course the TCP at the other end has
assigned a port number of its own for the conversation. Your TCP has
to know the port number used by the other end as well. (It finds out
when the connection starts, as we will explain below.) It puts this
in the "destination" port field. Of course if the other end sends a
datagram back to you, the source and destination port numbers will be
reversed, since then it will be the source and you will be the
destination. Each datagram has a sequence number. This is used so
that the other end can make sure that it gets the datagrams in the
right order, and that it hasn't missed any. (See the TCP
specification for details.) TCP doesn't number the datagrams, but the
octets. So if there are 500 octets of data in each datagram, the
first datagram might be numbered 0, the second 500, the next 1000, the
next 1500, etc. Finally, I will mention the Checksum. This is a
number that is computed by adding up all the octets in the datagram
(more or less - see the TCP spec). The result is put in the header.
TCP at the other end computes the checksum again. If they disagree,
then something bad happened to the datagram in transmission, and it is
thrown away. So here's what the datagram looks like now.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data | |U|A|P|R|S|F| |
| Offset| Reserved |R|C|S|S|Y|I| Window |
| | |G|K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Urgent Pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| your data ... next 500 octets |
| ...... |
If we abbreviate the TCP header as "T", the whole file now looks like
this:
T.... T.... T.... T.... T.... T.... T....
You will note that there are items in the header that I have not
described above. They are generally involved with managing the
connection. In order to make sure the datagram has arrived at its
destination, the recipient has to send back an "acknowledgement".
This is a datagram whose "Acknowledgement number" field is filled in.
For example, sending a packet with an acknowledgement of 1500
indicates that you have received all the data up to octet number 1500.
If the sender doesn't get an acknowledgement within a reasonable
amount of time, it sends the data again. The window is used to
control how much data can be in transit at any one time. It is not
practical to wait for each datagram to be acknowledged before sending
the next one. That would slow things down too much. On the other
hand, you can't just keep sending, or a fast computer might overrun
the capacity of a slow one to absorb data. Thus each end indicates
how much new data it is currently prepared to absorb by putting the
number of octets in its "Window" field. As the computer receives
data, the amount of space left in its window decreases. When it goes
to zero, the sender has to stop. As the receiver processes the data,
it increases its window, indicating that it is ready to accept more
data. Often the same datagram can be used to acknowledge receipt of a
set of data and to give permission for additional new data (by an
updated window). The "Urgent" field allows one end to tell the other
to skip ahead in its processing to a particular octet. This is often
useful for handling asynchronous events, for example when you type a
control character or other command that interrupts output. The other
fields are beyond the scope of this document.
The IP level
TCP sends each of these datagrams to IP. Of course it has to tell IP
the Internet address of the computer at the other end. Note that this
is all IP is concerned about. It doesn't care about what is in the
datagram, or even in the TCP header. IP's job is simply to find a
route for the datagram and get it to the other end. In order to allow
gateways or other intermediate systems to forward the datagram, it
adds its own header. The main things in this header are the source
and destination Internet address (32-bit addresses, like 128.6.4.194),
the protocol number, and another checksum. The source Internet
address is simply the address of your machine. (This is necessary so
the other end knows where the datagram came from.) The destination
Internet address is the address of the other machine. (This is
necessary so any gateways in the middle know where you want the
datagram to go.) The protocol number tells IP at the other end to
send the datagram to TCP. Although most IP traffic uses TCP, there
are other protocols that can use IP, so you have to tell IP which
protocol to send the datagram to. Finally, the checksum allows IP at
the other end to verify that the header wasn't damaged in transit.
Note that TCP and IP have separate checksums. IP needs to be able to
verify that the header didn't get damaged in transit, or it could send
a message to the wrong place. For reasons not worth discussing here,
it is both more efficient and safer to have TCP compute a separate
checksum for the TCP header and data. Once IP has tacked on its
header, here's what the message looks like:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| IHL |Type of Service| Total Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Identification |Flags| Fragment Offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time to Live | Protocol | Header Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Destination Address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| TCP header, then your data ...... |
| |
If we represent the IP header by an "I", your file now looks like
this:
IT.... IT.... IT.... IT.... IT.... IT.... IT....
Again, the header contains some additional fields that have not been
discussed. Most of them are beyond the scope of this document. The
flags and fragment offset are used to keep track of the pieces when a
datagram has to be split up. This can happen when datagrams are
forwarded through a network for which they are too big. (This will be
discussed a bit more below.) The time to live is a number that is
decremented whenever the datagram passes through a system. When it
goes to zero, the datagram is discarded. This is done in case a loop
develops in the system somehow. Of course this should be impossible,
but well-designed networks are built to cope with "impossible"
conditions.
At this point, it's possible that no more headers are needed. If your
computer happens to have a direct phone line connecting it to the
destination computer, or to a gateway, it may simply send the
datagrams out on the line (though likely a synchronous protocol such
as HDLC would be used, and it would add at least a few octets at the
beginning and end).
The Ethernet level
However most of our networks these days use Ethernet. So now we have
to describe Ethernet's headers. Unfortunately, Ethernet has its own
addresses. The people who designed Ethernet wanted to make sure that
no two machines would end up with the same Ethernet address.
Furthermore, they didn't want the user to have to worry about
assigning addresses. So each Ethernet controller comes with an
address builtin from the factory. In order to make sure that they
would never have to reuse addresses, the Ethernet designers allocated
48 bits for the Ethernet address. People who make Ethernet equipment
have to register with a central authority, to make sure that the
numbers they assign don't overlap any other manufacturer. Ethernet is
a "broadcast medium". That is, it is in effect like an old party line
telephone. When you send a packet out on the Ethernet, every machine
on the network sees the packet. So something is needed to make sure
that the right machine gets it. As you might guess, this involves the
Ethernet header. Every Ethernet packet has a 14-octet header that
includes the source and destination Ethernet address, and a type code.
Each machine is supposed to pay attention only to packets with its own
Ethernet address in the destination field. (It's perfectly possible
to cheat, which is one reason that Ethernet communications are not
terribly secure.) Note that there is no connection between the
Ethernet address and the Internet address. Each machine has to have a
table of what Ethernet address corresponds to what Internet address.
(We will describe how this table is constructed a bit later.) In
addition to the addresses, the header contains a type code. The type
code is to allow for several different protocol families to be used on
the same network. So you can use TCP/IP, DECnet, Xerox NS, etc. at
the same time. Each of them will put a different value in the type
field. Finally, there is a checksum. The Ethernet controller
computes a checksum of the entire packet. When the other end receives
the packet, it recomputes the checksum, and throws the packet away if
the answer disagrees with the original. The checksum is put on the
end of the packet, not in the header. The final result is that your
message looks like this:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethernet destination address (first 32 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethernet dest (last 16 bits) |Ethernet source (first 16 bits)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethernet source address (last 32 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type code |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| IP header, then TCP header, then your data |
| |
...
| |
| end of your data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethernet Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
If we represent the Ethernet header with "E", and the Ethernet
checksum with "C", your file now looks like this:
EIT....C EIT....C EIT....C EIT....C EIT....C
When these packets are received by the other end, of course all the
headers are removed. The Ethernet interface removes the Ethernet
header and the checksum. It looks at the type code. Since the type
code is the one assigned to IP, the Ethernet device driver passes the
datagram up to IP. IP removes the IP header. It looks at the IP
protocol field. Since the protocol type is TCP, it passes the
datagram up to TCP. TCP now looks at the sequence number. It uses
the sequence numbers and other information to combine all the
datagrams into the original file.
The ends our initial summary of TCP/IP. There are still some crucial
concepts we haven't gotten to, so we'll now go back and add details in
several areas. (For detailed descriptions of the items discussed here
see, RFC 793 for TCP, RFC 791 for IP, and RFC's 894 and 826 for
sending IP over Ethernet.)
Back to the Index
|