Only a few years ago, TCP/IP networks relied on simple distance-vector routing protocols and classful, 32-bit IP addressing. These technologies offered a limited capacity for growth. Today, network designers must retool, work around, or completely abandon these early technologies to build networks that can handle rapid growth and constant change. This course explores networking technologies that have evolved to meet this demand for scalability.

In networking, scalability is the capability to grow and adapt without major redesign or reinstallation. Allowing for growth seems simple enough, but it can be difficult to do without significant and costly redesign. For example, a network may provide a small company with access to e-mail, the Internet, and shared files. What would happen if that company tripled in size and demanded streaming video or e-commerce? Would the original networking media and devices adequately serve these new applications? Organizations can ill afford to completely re-cable and redesign their networks every time workers are moved, new nodes are added, or new applications are introduced.

Good design is the key to a network's capability to scale. More often than not, it is a poor design, and not an outdated protocol or router, that prevents a network from scaling gracefully. To be scalable, a network design should follow a hierarchical model. This chapter discusses the components of the hierarchical network design model and the key characteristics of scalable internetworks.

A hierarchical network design model breaks the complex problem of network design into smaller, more manageable problems. Each level, or tier, in the hierarchy addresses a different set of problems so that network hardware and software can be optimized to perform specific roles. Devices at the lowest tier of the hierarchy are designed to accept traffic into a network and then pass traffic up to the higher layers. Cisco offers a three-tiered hierarchy as the preferred approach to network design.

In the three-layer network design model, network devices and links are grouped according to three layers: core, distribution, and access. Like the Open Systems Interconnection (OSI) reference model, the three-layer design model is a conceptual framework, an abstract picture of a network.

Layered models are useful because they facilitate modularity. Since devices at each layer have similar and well-defined functions, administrators can easily add, replace, and remove individual pieces of the network. This kind of flexibility and adaptability makes a hierarchical network design a scalable network design.

At the same time, layered models can be difficult to comprehend because the exact composition of each layer varies from network to network. Each layer of the three-tiered design model may include a router, a switch, a link, or some combination of these. In fact, some networks may combine the function of two layers into a single device, or may omit a layer entirely.

The following sections look at each of the three layers in detail.

The Core Layer
The core of the network has one purpose: to provide an optimized and reliable transport structure by forwarding traffic at very high speeds. In other words, the core layer should switch packets as fast as possible. Devices at this layer should not be burdened with access-list checking, data encryption, address translation, or any other process that stands in the way of switching packets at top speed.

The Distribution Layer
The distribution layer sits between the access and core layers and helps differentiate the core from the rest of the network. The purpose of this layer is to provide boundary definition by using access lists and other filters to limit what gets into the core. Therefore, this layer defines policy for the network. A policy is an approach to handling certain kinds of traffic, including routing updates, route summaries, VLAN traffic, and address aggregation. You can use policies to secure networks and to preserve resources by preventing unnecessary traffic.

If a network has two or more routing protocols, such as Routing Information Protocol (RIP) and Interior Gateway Routing Protocol (IGRP), information between the different routing domains is shared, or redistributed, at the distribution layer.

The Access Layer
The access layer feeds traffic into the network and performs network entry control. End users access the network via the access layer. As a network's "front door," the access layer employs access lists designed to prevent unauthorized users from gaining entry. The access layer can also give remote sites access to the network via a wide-area technology, such as Frame Relay, ISDN, or leased lines.

Because each layer (core, distribution, and access) has a clearly defined function, each layer demands a different set of features from routers, switches, and links. Routers that operate in the same layer can be configured in a consistent way because they all must perform similar tasks. In fact, the router is the primary device that maintains logical and physical hierarchy in a network, so proper and consistent configurations are imperative. Cisco offers several router product lines, each with a particular set of features tailored for one of the three layers:

Core layer - 12000, 7500, 7200, and 7000 series routers
Distribution layer - 4500, 4000, and 3600 series routers
Access layer - 2600, 2500, 1700, and 1600 series routers

The following sections revisit each layer and examine the specific routers and other devices used there.

As the center of the network, the core layer is designed to be fast and reliable. Access lists are avoided in the core because they add latency, or delay. Moreover, end users should not access the core directly. Consider an apple: you cannot get to the seeds in an apple's core without going through the skin first. In a hierarchical network, end users' traffic should reach core routers only after those packets have passed through the access and distribution layers, where access lists may be applied.

Because core routing is done without access lists, address translation, or other packet manipulation, it may seem as though the least powerful routers would work well for so simple a task. However, the opposite is true. The most powerful Cisco routers serve the core because they have the fastest switching technologies and the largest capacity for physical interfaces.

Marketed by Cisco as enterprise core routers, the 7000, 7200, and 7500 series routers feature the fastest switching modes available. The 12000 series router is also a core router, but it is designed to meet the core routing needs of Internet service providers (ISPs). Unless your company is in the business of providing Internet access to other companies, you are not likely to see a 12000 series router in your telecommunications closet.

Unlike some routers, such as the Cisco 2500 series, the 7000, 7200, and 7500 series routers are modular, so interface modules can be added as needed. The large chassis of this series can accommodate dozens of interfaces on multiple modules for virtually any media type, which makes these routers scalable, reliable core solutions.

One way that core routers achieve reliability is through redundant links, usually to all other core routers. When possible, these redundant links should be symmetrical (i.e., they should have equal throughput) so that equal-cost load balancing can be used. That is why core routers need a relatively large number of interfaces. Another way that core routers achieve reliability is through redundant power supplies. Core routers usually feature two or more "hot-swappable" power supplies, which may be removed and replaced individually without bringing down the router.

The figure presents a simple core topology using 7507 routers at three key sites in an enterprise. Each Cisco 7507 is directly connected to every other router by two links, which makes this configuration a full mesh. Core links should be the fastest, most reliable, and most expensive leased lines in the WAN: T1, T3, OC3, or better. If redundant T1s are used for this WAN core, each router needs four serial interfaces for two point-to-point connections to each site. Ultimately, the design requires even more than this because other routers at the distribution layer will also need to connect to the core routers. Fortunately, you can easily add interfaces to the 7507s because they are modular.
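The interface arithmetic in this example can be checked with a short sketch. The function names below are illustrative only and do not come from the course material; the sketch assumes a full mesh in which every pair of core sites is joined by the same number of point-to-point links.

```python
def core_serial_interfaces(sites: int, links_per_pair: int = 2) -> int:
    """Serial interfaces one core router needs to reach every other core site."""
    if sites < 2:
        return 0
    return (sites - 1) * links_per_pair


def total_core_links(sites: int, links_per_pair: int = 2) -> int:
    """Total point-to-point circuits in the full mesh."""
    return sites * (sites - 1) // 2 * links_per_pair


# Three core sites with redundant (two) links between each pair,
# as in the 7507 example above.
print(core_serial_interfaces(3))  # 4
print(total_core_links(3))        # 6
```

Because the per-router count grows with every core site added, a fourth core site would already require six serial interfaces per router, before any distribution-layer connections are counted.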

You can see that with the high-end routers and WAN links involved, the core can become a huge expense, even in a simple example such as this. Some designers will choose not to use symmetrical links in the core to reduce cost. In place of redundant lines, packet-switched and dial-on-demand technologies, such as Frame Relay and ISDN, may be used as backup links. The trade-off for saving money by using such technologies is performance. For instance, if you use ISDN BRIs as backup links, you lose the capability to do equal-cost load balancing.

The core of a network does not have to exist in the WAN. In some cases, a LAN backbone may also be considered to belong to the core layer. Campus networks, or large networks that span an office complex or adjacent buildings, might have a LAN-based core. In this case, switched Fast Ethernet and Gigabit Ethernet are the most common core technologies, and they are usually run over fiber. Enterprise switches, such as the Catalyst 4000, 5000, and 6000 series, shoulder the load in LAN cores because they switch frames at Layer 2 much faster than routers can switch packets at Layer 3. In fact, as modular devices, these switches can be equipped with route switch modules (RSMs), adding Layer 3 routing functionality to the switch chassis.

The distribution layer applies the rules that protect the core from unnecessary or unauthorized traffic. Distribution-layer routers need fewer interfaces and less switching speed than their counterparts in the core because they should handle less traffic. Nevertheless, a lightning-fast core is useless if a bottleneck at the distribution layer prevents user traffic from accessing core links. For this reason, Cisco offers robust, powerful distribution routers, such as the 4000, 4500, and, most recently, the 3600 series router. These routers are modular, so interfaces can be added and removed depending on need, although the smaller chassis of these series is much more limiting than those of the 7000, 7200, and 7500 series.

Exactly how will these distribution-layer routers bring policy to the network? You can configure them to use a combination of access lists, route summarization, distribution lists, route maps, and other rules to define how a router should deal with traffic and routing updates. Many of these techniques are covered later in this course.

The figure shows that two 3620 routers have been added at Core A (in the same wiring closet as the 7507). This means that you can use high-speed LAN links to make the connections between the distribution routers and the core router. Depending on the size of the network, these links may be part of the campus backbone and will most likely be fiber running at 100 or 1000 Mbps. In this example, Dist-1 and Dist-2 are part of Core A's campus backbone. Dist-1 serves remote sites, while Dist-2 serves access routers at Site A. If Site A employs VLANs throughout the campus, Dist-2 may be responsible for routing between them.

Both Dist-1 and Dist-2 use access lists to prevent unwanted traffic from reaching the core. In addition, these routers summarize their routing tables in updates to Core A, keeping Core A's routing table as small and efficient as possible.

In the figure, routers at the access layer are deployed to permit users at Site A and remote sites Y and Z to access the network.

Access routers generally offer fewer physical interfaces than distribution and core routers. For this reason, Cisco access routers, which include the 1600, 1700, 2500, and 2600 series, feature a small, streamlined chassis that may or may not support modular interfaces.

Two 2621s have been added to the access layer of the example network at Site A. These 2621 routers have two Ethernet interfaces: one that the users' end stations will connect to via a workgroup switch or hub, and one that connects to Site A's high-speed campus backbone.

Each remote site in the example requires only one Ethernet interface for the LAN side and one serial interface for the WAN side. The WAN interface connects via Frame Relay or ISDN to the distribution router in the wiring closet of Site A. For this application, the 2610 router provides a single 10-Mbps Ethernet port and will work well at these locations. These remote sites, Y and Z, are small branch offices that must access the core through Site A. Therefore, Dist-1 is acting as a WAN hub for the organization. As the network scales, dozens of remote sites may access the core by connecting to distribution routers at the WAN hubs: Site A, Site B, and Site C.

Although every large internetwork has unique features, all scalable networks have essential attributes in common. A scalable network has five key characteristics:

Reliable and available - A reliable network should be dependable and available 24 hours a day, 7 days a week. In addition, failures need to be isolated, and recovery must be invisible to the end user.
Responsive - A responsive network should provide Quality of Service (QoS) for various applications and protocols without affecting a response at the desktop. For example, the internetwork must be capable of responding to latency issues common for Systems Network Architecture (SNA) traffic but still allow for the routing of desktop traffic, such as Internetwork Packet Exchange (IPX), without compromising QoS requirements.
Efficient - Large internetworks must optimize the use of resources, especially bandwidth. Reducing the amount of overhead traffic, such as unnecessary broadcasts, service location, and routing updates, results in an increase in data throughput without increasing the cost of hardware or the need for additional WAN services.
Adaptable - An adaptable network is capable of accommodating disparate protocols, applications, and hardware technologies.
Accessible but secure - An accessible network allows for connections using dedicated, dialup, and switched services while maintaining network integrity.

The Cisco IOS offers a rich set of features that support network scalability. The remainder of this chapter outlines specific IOS features that work to promote these five key characteristics of a scalable network.

A reliable and available network provides users with 24-hours-a-day, 7-days-a-week access. In a highly reliable and available network, fault tolerance and redundancy make outages and failures invisible to the end user. The high-end devices and telecommunication links that ensure this kind of performance come with a steep price tag. Network designers constantly have to balance the needs of users with the resources at hand.

When choosing between high performance and low cost at the core layer, you should opt for the best available routers and dedicated WAN links. You must design the core to be the most reliable and available layer. If a core router went down, or if a core link became unstable, routing for the entire internetwork might be adversely affected.

Core routers maintain reliability and availability by rerouting traffic in the event of a failure. Networks that can deal with failures quickly and effectively are said to be robust. To build robust networks, the Cisco IOS offers several features that enhance reliability and availability. These include support for scalable routing protocols, alternative paths, load balancing, protocol tunnels, and dial backup. The following sections describe these features.

Scalable Routing Protocols
Routers in the core of a network should converge rapidly and maintain reachability to all networks and subnetworks within an Autonomous System (AS). Simple distance-vector routing protocols, such as RIP, take too long to update and adapt to topology changes to be viable core solutions. Compatibility issues sometimes require that some areas of a network run simple distance-vector protocols such as RIP and Routing Table Maintenance Protocol (RTMP, an Apple Computer proprietary routing protocol). Whenever possible, a scalable protocol such as Open Shortest Path First (OSPF) or Enhanced Interior Gateway Routing Protocol (EIGRP) should be implemented, especially in the core layer.

Alternate Paths
Redundant links maximize network reliability and availability, but they are expensive to deploy throughout a large internetwork. Links in the core layer should always be made redundant, but other areas of a network may also need redundant telecommunication lines. If a remote site exchanges mission-critical information with the rest of the internetwork, that site would be a candidate for redundant links. To provide another dimension of reliability, an organization may even invest in redundant routers to connect to these links. A network that consists of multiple links and redundant routers will contain several paths to a given destination. If a network uses a scalable routing protocol, such as OSPF or EIGRP, its routers will maintain a map of the entire network topology. This will allow the routers to reroute traffic quickly by selecting an alternate path. In fact, EIGRP maintains a database of all alternate paths just in case the preferred route is lost.

Load Balancing
Redundant links do not necessarily remain idle until a link fails. Routers can distribute the traffic load across multiple links to the same destination. This process is called load balancing. It can be implemented using alternate paths with the same cost or metric (equal-cost load balancing), or over alternate paths with different metrics (unequal-cost load balancing). When routing IP, the Cisco IOS offers two methods of load balancing: per-packet and per-destination load balancing. If process switching is enabled, the router will alternate paths on a per-packet basis. If fast switching is enabled, only one of the alternate routes will be cached for the destination address and all packets in the packet stream bound for a specific host will take the same path. Packets bound for a different host on the same network may use an alternate route. This way, traffic is load-balanced on a per-destination basis.

Per-packet load balancing requires more CPU time than per-destination load balancing. On the plus side, per-packet load balancing allows load balancing that is proportional to the metrics of unequal paths, rather than round-robin path selection, which can help utilize bandwidth efficiently.
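The difference between the two methods can be illustrated with a small simulation. This is a minimal sketch, not the IOS switching code: the interface names and host addresses are made up for the example, the two links are assumed to have equal cost, and a simple round-robin choice stands in for the router's path selection.

```python
from itertools import cycle


def per_packet(destinations, links):
    """Process-switching style: alternate links packet by packet."""
    next_link = cycle(links)
    return [(dst, next(next_link)) for dst in destinations]


def per_destination(destinations, links):
    """Fast-switching style: cache one link per destination host."""
    next_link = cycle(links)
    cache = {}
    result = []
    for dst in destinations:
        if dst not in cache:          # first packet to this host picks a link
            cache[dst] = next(next_link)
        result.append((dst, cache[dst]))
    return result


# Hypothetical packet stream: three packets to one host, one to another
# host on the same network.
stream = ["10.1.1.5", "10.1.1.5", "10.1.1.9", "10.1.1.5"]
links = ["Serial0", "Serial1"]

print([link for _, link in per_packet(stream, links)])
# ['Serial0', 'Serial1', 'Serial0', 'Serial1']
print([link for _, link in per_destination(stream, links)])
# ['Serial0', 'Serial0', 'Serial1', 'Serial0']
```

In the per-destination case, every packet bound for 10.1.1.5 follows the cached link, while the packet for 10.1.1.9 is free to use the alternate route.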

Tunnels
Consider an IP network with Novell NetWare running IPX at a handful of remote sites. One way to provide IPX connectivity between the remote sites is to route IPX in the core. Even if only two or three offices sparingly use NetWare, this will create additional overhead associated with routing a second routed protocol (IPX) in the core. It would also require that all routers in the data path have appropriate IOS and hardware to support IPX. For this reason, many organizations have adopted "IP only" policies at the network core because IP has become the world's dominant routed protocol.

Tunneling allows an administrator a second and more palatable option: configure a point-to-point link through the core between the two routers using IP. When this link is configured, IPX packets can be encapsulated, or packaged, inside IP packets. IPX can then traverse the core over IP links, and the core can be spared the additional burden of routing IPX. Using tunnels, administrators increase the availability of network service.
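The encapsulation idea can be sketched in a few lines. The classes and address strings below are hypothetical stand-ins, not real packet formats; the point is only that core routers see an ordinary IP packet whose payload happens to be an IPX packet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpxPacket:
    src: str
    dst: str
    data: bytes


@dataclass(frozen=True)
class IpPacket:
    src: str
    dst: str
    payload: object  # here, a whole IPX packet rides as the payload


def encapsulate(ipx: IpxPacket, tunnel_src: str, tunnel_dst: str) -> IpPacket:
    """Tunnel entry router: wrap the IPX packet inside an IP packet."""
    return IpPacket(src=tunnel_src, dst=tunnel_dst, payload=ipx)


def decapsulate(ip: IpPacket) -> IpxPacket:
    """Tunnel exit router: recover the original IPX packet."""
    return ip.payload


# Hypothetical addresses for the two tunnel endpoints and the IPX hosts.
original = IpxPacket(src="ipx-net-1.client", dst="ipx-net-2.server", data=b"hello")
in_core = encapsulate(original, tunnel_src="10.0.1.1", tunnel_dst="10.0.2.1")

print(in_core.dst)                       # 10.0.2.1 (all the core routes on)
print(decapsulate(in_core) == original)  # True
```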

Dial Backup
Sometimes two redundant WAN links are not enough, or a single link needs to be fault-tolerant, but a full-time redundant link is too expensive. In these cases, a backup link can be configured over a dialup technology, such as ISDN, or even an ordinary analog phone line. These relatively low-bandwidth links remain idle until the primary link fails.

Dial backup can be a cost-effective insurance policy, but it is not a substitute for redundant links that can effectively double throughput by using equal-cost load balancing.

A network's responsiveness is typically measured by its end users as they access the network to perform day-to-day tasks. Today's users expect network resources to respond quickly, as if network applications were running from a local hard drive. You must tailor networks to meet the needs of applications, especially delay-sensitive applications such as voice and video. The Cisco IOS offers traffic prioritization features to tune responsiveness in a congested network. Routers can be configured to prioritize certain kinds of traffic based on protocol information, such as TCP port numbers. As shown in the figure, traffic prioritization ensures that packets carrying mission-critical data take precedence over less important traffic.

If the router schedules all packets for transmission on a first-come, first-served basis, users could experience an unacceptable lack of responsiveness. For example, an end user sending delay-sensitive voice traffic may be forced to wait too long while the router empties its buffer of a long train of queued packets.

The Cisco IOS addresses priority and responsiveness issues through queuing. The question of priority is most important on routers that maintain a slow WAN connection and therefore experience frequent congestion. Queuing refers to the process that the router uses to schedule packets for transmission during periods of congestion. By using the queuing feature, you can configure a congested router to reorder packets so that mission-critical and delay-sensitive traffic is sent out first. These higher-priority packets are sent first even if low-priority packets arrived before them. The Cisco IOS supports four methods of queuing, as described in the following sections: first-in, first-out (FIFO) queuing; priority queuing; custom queuing; and weighted fair queuing (WFQ). Only one of these queuing methods can be applied per interface because each method handles traffic in a unique way.
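The reordering described above can be sketched with a priority heap. This is a simplified illustration of strict priority scheduling, not the IOS queuing implementation; the port-to-priority table and the packet names are assumptions chosen for the example.

```python
import heapq
from itertools import count

# Hypothetical classification table: lower number = sent sooner.
PRIORITY_BY_PORT = {23: 0, 80: 2}   # Telnet high, HTTP low
DEFAULT_PRIORITY = 1                # everything else in between


class PriorityScheduler:
    """Reorders queued packets so higher-priority traffic leaves first."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker keeps arrival order within a class

    def enqueue(self, name, dst_port):
        priority = PRIORITY_BY_PORT.get(dst_port, DEFAULT_PRIORITY)
        heapq.heappush(self._heap, (priority, next(self._arrival), name))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def drain(scheduler):
    """Transmit everything currently queued, in scheduled order."""
    return [scheduler.dequeue() for _ in range(len(scheduler))]


scheduler = PriorityScheduler()
for name, port in [("web-1", 80), ("ftp-1", 21), ("telnet-1", 23), ("web-2", 80)]:
    scheduler.enqueue(name, port)

print(drain(scheduler))  # ['telnet-1', 'ftp-1', 'web-1', 'web-2']
```

A FIFO queue would have sent the same four packets in arrival order, leaving the interactive Telnet packet waiting behind two others.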

An efficient network does not waste bandwidth, especially over costly WAN links. To be efficient, routers should prevent unnecessary traffic from traversing the WAN and should minimize the size and frequency of routing updates. The Cisco IOS includes several features designed to optimize a WAN connection:

Access lists
Snapshot routing
Compression over WANs

The following sections describe each of these features.

Access Lists
Access lists, also called access control lists (ACLs), can be used to prevent traffic that the administrator defines as unnecessary, undesirable, or unauthorized. You may apply one access list on an interface for each protocol, per direction (i.e., in or out). Different filtering policies can be defined for IP, IPX, and AppleTalk. Access lists can also be used to control routing updates, apply route maps, and implement other network "policies" that improve efficiency by curtailing traffic.
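The filtering logic of an access list can be sketched as follows. The sketch models only source-address matching and relies on two well-known behaviors of Cisco access lists: entries are checked in order until the first match, and traffic that matches no entry is denied. The list contents and the function name are made up for the example.

```python
from ipaddress import ip_address, ip_network

# Hypothetical policy: block one subnet, allow the rest of 192.168.0.0/16.
ACCESS_LIST = [
    ("deny", ip_network("192.168.10.0/24")),
    ("permit", ip_network("192.168.0.0/16")),
]


def acl_permits(source, access_list=ACCESS_LIST):
    """Return True if the first matching entry permits the source address."""
    address = ip_address(source)
    for action, network in access_list:
        if address in network:
            return action == "permit"
    return False  # no entry matched: implicit deny


print(acl_permits("192.168.10.7"))  # False (matches the deny entry first)
print(acl_permits("192.168.20.7"))  # True
print(acl_permits("10.0.0.1"))      # False (implicit deny)
```

Because the first match wins, the order of entries matters: reversing the two entries above would permit the 192.168.10.0/24 subnet.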

Snapshot Routing
Distance-vector routing protocols typically update neighbor routers with their complete routing table at regular intervals. This is done even when nothing has changed in the network's topology. If a remote site relies on a dialup technology, such as ISDN, you cannot expect the WAN link to remain active 24 hours a day. In fact, the tolls associated with ISDN make heavy use cost-prohibitive. However, because RIP routers expect updates every 30 seconds by default, the ISDN link would have to be reestablished twice a minute to maintain the routing tables. This hardly seems efficient, especially after employees have gone home for the night. Although it is possible to adjust RIP's timers, the Cisco IOS provides a much better solution to maximize network efficiency in this situation: snapshot routing.

Snapshot routing allows distance-vector routers to exchange their complete tables during an initial connection, but then wait until active periods on the line before again exchanging routing information. The router takes a snapshot of the routing table, which it uses during quiet periods while the dialup link is down. In other words, the routing table is kept frozen so that routes will not be lost because an update has not been received. When the link is re-established (usually because the router has identified interesting traffic that needs to be routed over the WAN), the router again updates its neighbors.

Compression
The Cisco IOS supports several compression techniques that can maximize bandwidth by reducing the number of bits in all or part of a frame. Compression is accomplished through mathematical formulas, or compression algorithms. Unfortunately, routers must dedicate a significant amount of processor time to compress and decompress traffic. This increases latency. For this reason, compression proves an efficient measure only on links with extremely limited bandwidth.

The Cisco IOS also supports the following bandwidth optimization features:

Dial-on-demand routing (DDR)
Switched access
Route summarization
Incremental updates

Dial-on-Demand Routing
An organization cannot always afford dedicated WAN circuits, or even Frame Relay, for every remote site. At sites that require only occasional WAN connectivity, dial-on-demand routing (DDR) offers an efficient, economical alternative. As shown in the figure, a router configured for DDR will listen for interesting traffic and wait to build the WAN link. When the router receives interesting traffic (as defined by the administrator), it places a call to activate the link, which is commonly ISDN.
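The behavior described above can be modeled as a small state machine. This sketch leaves out idle timers and call teardown, and the class name and port numbers are hypothetical; it assumes that traffic that is not interesting never triggers a call and is dropped while the link is down.

```python
class DdrInterface:
    """Toy model of a dial-on-demand link that dials only for interesting traffic."""

    def __init__(self, interesting_ports):
        self.interesting_ports = set(interesting_ports)  # defined by the administrator
        self.link_up = False
        self.calls_placed = 0

    def handle(self, dst_port):
        if self.link_up:
            return "forwarded"
        if dst_port in self.interesting_ports:
            self.link_up = True          # place the call to activate the link
            self.calls_placed += 1
            return "dialed and forwarded"
        return "dropped"                 # uninteresting traffic does not dial


# Hypothetical policy: only Telnet (23) and SMTP (25) justify a call.
bri0 = DdrInterface(interesting_ports=[23, 25])
print(bri0.handle(520))   # dropped
print(bri0.handle(25))    # dialed and forwarded
print(bri0.handle(520))   # forwarded
print(bri0.calls_placed)  # 1
```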

Route Summarization
The number of entries in a routing table can be reduced if the router uses one network address and mask to represent multiple networks or subnetworks. This technique is called route aggregation, or route summarization. Some routing protocols automatically summarize subnet routes based on the major network number. Other routing protocols, such as OSPF and EIGRP, allow manual summarization. You will learn more about route summarization in the next chapter.
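As a quick illustration of the idea, the Python standard library's ipaddress module can collapse contiguous networks into the smallest set of summary prefixes. The subnets below are an assumed example, not addresses from this course.

```python
from ipaddress import collapse_addresses, ip_network

# Eight contiguous /24 subnets: 172.16.8.0 through 172.16.15.0.
subnets = [ip_network(f"172.16.{third_octet}.0/24") for third_octet in range(8, 16)]

# One address and mask that represents all eight subnets.
summary = list(collapse_addresses(subnets))

print(len(subnets))  # 8
print(summary)       # [IPv4Network('172.16.8.0/21')]
```

A router advertising the single /21 route in place of the eight /24 routes gives its neighbors the same reachability with one routing table entry.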

Incremental Updates
Some routing protocols, such as OSPF and EIGRP, send routing updates that contain information only about routes that have changed. These incremental routing updates use the bandwidth more efficiently than simple distance-vector protocols, which transmit their complete routing table at fixed intervals, regardless of whether a change has occurred.
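The bandwidth saving can be seen by comparing a full-table update with an update that carries only the differences. The sketch below is not the OSPF or EIGRP update format; the routing tables are plain dictionaries of prefix to metric, invented for the example.

```python
def incremental_update(old_table, new_table):
    """Return only what changed: new or modified routes, and withdrawn prefixes."""
    changed = {
        prefix: metric
        for prefix, metric in new_table.items()
        if old_table.get(prefix) != metric
    }
    withdrawn = sorted(prefix for prefix in old_table if prefix not in new_table)
    return changed, withdrawn


# Hypothetical routing tables before and after a topology change.
before = {"10.1.0.0/16": 2, "10.2.0.0/16": 3, "10.3.0.0/16": 1, "10.4.0.0/16": 4}
after = {"10.1.0.0/16": 2, "10.2.0.0/16": 5, "10.3.0.0/16": 1, "10.5.0.0/16": 2}

changed, withdrawn = incremental_update(before, after)
print(changed)     # {'10.2.0.0/16': 5, '10.5.0.0/16': 2}
print(withdrawn)   # ['10.4.0.0/16']
print(len(after))  # 4 (entries a full-table update would carry every interval)
```

When nothing has changed, the incremental update is empty, whereas a periodic full-table update still sends every entry.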

An adaptable network can gracefully handle the addition and coexistence of multiple routed and routing protocols. EIGRP is an exceptionally adaptable protocol because it supports routing information for three routed protocols: IP, IPX, and AppleTalk.

The Cisco IOS also supports route redistribution, which is described in Chapter 7, Route Optimization. Route redistribution allows routing information to be shared (i.e., redistributed) among two or more different routing protocols. For instance, RIP routes can be redistributed into an OSPF area.

Mixing routable and non-routable protocols
A network delivering both routable and non-routable traffic has some unique problems. Routable protocols (e.g., IP) can be forwarded from one network to another based on a network-layer address. Non-routable protocols (e.g., SNA) do not contain any network-layer address and cannot be forwarded by routers. Most non-routable protocols also lack a mechanism to provide flow control and are sensitive to delays in delivery. Any delays in delivery or packets arriving out of order can result in session loss. An adaptable network should accommodate both routable and non-routable protocols.

Accessible networks allow users to connect easily over a wide variety of technologies. Campus LAN users typically connect to routers at the access layer through Ethernet or Token Ring. Remote users and sites depend on one of several WAN services. The variety of WAN services will differ from area to area. Because cost and geography play a significant role in determining what type of WAN services an organization can deploy, Cisco routers support all major WAN connection types. As shown in the figure, these include circuit-switched (dialup) networks, leased lines (dedicated), and packet-switched networks.

Dialup and dedicated access - Cisco routers can be directly connected to basic telephone service or digital services such as T1/E1. Dialup links can be used for backup or at remote sites that need occasional WAN access, while dedicated leased lines provide a high-speed, high-capacity WAN core between key sites.
Packet-switched - Cisco routers support Frame Relay, X.25, Switched Multimegabit Data Service (SMDS), and ATM. With this variety of support, the WAN service, or combination of WAN services, to deploy can be determined based on cost, location, and need.

Often, the easier it is for legitimate remote users to access the network, the easier it is for unauthorized users to break in. An access strategy must be carefully planned so that resources, such as remote access routers and servers, are secure. If a company enables users to telecommute via dialup modem, the network administrator must secure access routers with access lists or an authentication protocol such as the Password Authentication Protocol (PAP) or the Challenge Handshake Authentication Protocol (CHAP). These protocols require the user to provide a valid name and password before the router permits access to other network resources.

This chapter has defined scalability and provided examples of Cisco IOS features that enable you to grow your network successfully. You learned, for example, that knowing where the router is located in the hierarchy and what the key needs are for a given layer makes it easier for you to configure the router to meet the specific needs of the layer. Recall that the key characteristics of a scalable network are reliability and availability, responsiveness, efficiency, adaptability, and being accessible but secure. You will see these same concepts discussed again and again as you make your way through this semester.

A scalable network requires an addressing scheme that allows for growth. As you add new nodes and new networks to the enterprise, existing addresses may need to be reassigned, bloated routing tables may bog down routers, and the supply of available addresses may simply run out. You can avoid these unpleasant consequences with careful planning and deployment of a scalable network-addressing system.

Although network designers can choose among many different network protocols and addressing schemes, the emergence of the Internet and its nonproprietary protocol, TCP/IP, has meant that virtually every enterprise must implement an IP addressing scheme. In fact, as companies such as Apple and Novell have recently migrated their network software to TCP/IP (and away from their proprietary protocols), many organizations opt to run TCP/IP as the only routed protocol on the network. The bottom line is that today's administrators must find ways to scale their networks by using IP addressing.

Unfortunately, the architects of TCP/IP could not have predicted that their protocol would eventually sustain a global network of information, commerce, and entertainment. Twenty years ago, IP version 4 (IPv4) offered an addressing strategy that, although scalable for a time, resulted in an inefficient allocation of addresses. Over the past two decades, engineers have successfully modified IPv4 so that it could survive the Internet's exponential growth. Meanwhile, an even more extensible and scalable version of IP, IP version 6 (IPv6), has been defined and developed. Today IPv6 is slowly being implemented in select networks. Eventually, IPv6 may replace IPv4 as the Internet's dominant protocol.

This chapter explores the evolution and extension of IPv4, including the key scalability features that engineers have added to it over the years: subnetting, classless interdomain routing (CIDR), variable-length subnet masking (VLSM), and route summarization. Finally, this chapter examines advanced IP implementation techniques, such as IP unnumbered, Dynamic Host Configuration Protocol (DHCP), and helper addresses.

When TCP/IP was first introduced in the 1980s, it relied on a two-level addressing scheme, which at the time offered adequate scalability. IPv4's 32-bit-long address identifies a network number and a host number, as shown in the figure.

Together, the network number and the host number uniquely identify all hosts connected via the Internet. It is possible that the needs of a small, networked community could be satisfied with just host addresses, as is the case with LANs. However, network addresses are necessary for end systems on different networks to communicate with each other. Routers use the network portion of the address to make routing decisions and facilitate communication between hosts that belong to different networks.

Unlike routers, humans find working with strings of 32 ones and zeros tedious and clumsy. Therefore, 32-bit IP addresses are written using dotted-decimal notation. Each 32-bit address is divided into four groups of eight, called octets, and each octet is converted to decimal and then separated by decimal points, or dots.

In the dotted-decimal address 132.163.128.17, which of these four numbers represents the network portion of the address? Which are the host numbers? Finding the answers to these questions is complicated by the fact that IP addresses are not really four numbers. They actually consist of 32 binary digits (32 bits)!
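The relationship between the 32 bits and the dotted-decimal form can be sketched in a few lines of Python. This is only an illustration; the function names `to_dotted_decimal` and `to_binary_octets` are made up for this example.

```python
def to_dotted_decimal(value: int) -> str:
    """Convert a 32-bit integer to dotted-decimal notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("an IPv4 address must fit in 32 bits")
    # Shift each octet down to the low 8 bits, most significant octet first.
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


def to_binary_octets(address: str) -> str:
    """Show a dotted-decimal address as four 8-bit binary groups."""
    return " ".join(f"{int(octet):08b}" for octet in address.split("."))


# The 32 bits of the example address, grouped into octets for readability.
bits = "10000100 10100011 10000000 00010001".replace(" ", "")

print(to_dotted_decimal(int(bits, 2)))     # 132.163.128.17
print(to_binary_octets("132.163.128.17"))  # 10000100 10100011 10000000 00010001
```

Notice that nothing in the 32 bits themselves marks where the network portion ends. That boundary has to come from somewhere else: the address class or a mask.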

In the early days of TCP/IP, a class system was used to define the network and host portions of the address. IPv4 addresses were grouped into five distinct classes, according to the value of the first few bits in the first octet of the address. Although you can still apply the class system to IP addresses, today's networks often ignore the rules of class in favor of a classless IP scheme.

In the following sections, we will examine the limitations of the IP address classes, the subsequent addition of the subnet mask, and the addressing crisis that led to the adoption of a classless system.

In a classful system, IP addresses can be grouped into one of five different classes: A, B, C, D, and E. Each of the four octets of an IP address represents either the network portion or the host portion of the address, depending on the address's class.

Only the first three classes (A, B, and C) are used for addressing actual hosts on IP networks. Class D addresses are used for multicasting, and Class E addresses are reserved for experimentation and are not shown in the figure. The following sections explore each of the five classes of addresses.

Class A Addresses
If the first bit of the first octet of an IP address is a binary 0, then the address is a Class A address. With that first bit a 0, the lowest number that can be represented is 00000000 (decimal 0), and the highest number that can be represented is 01111111 (decimal 127). Any address that starts with a value between 0 and 127 in the first octet is a Class A address. However, the two end values, 0 and 127, are reserved and cannot be used as network numbers.

Class A addresses were intended to accommodate very large networks, so only the first octet is used to represent the network number, which leaves three octets (or 24 bits) to represent the host portion of the address. With 24 bits total, 2²⁴ combinations are possible, yielding 16,777,216 possible addresses. Two of those possibilities, the lowest and highest values (24 zeros and 24 ones), are reserved for special purposes, so each Class A network can support up to 16,777,214 unique host addresses.

Why are two host addresses reserved for special purposes? Every network requires a network number, an ID number that is used to refer to the entire range of hosts when building routing tables. The address that contains all 0s in the host portion is used as the network number and cannot be used to address an individual node. For example, 46.0.0.0 is a Class A network number. Similarly, every network requires a broadcast address that can be used to address a message to every host on a network. The broadcast address is the one that contains all 1s in the host portion of the address.

With almost 17 million host addresses available, a Class A network actually provides too many possibilities for one company or campus. Although you can imagine an enormous global network with that many nodes, the hosts in such a network could not function as members of the same logical group. Administrators require much smaller logical groupings to control broadcasts, apply policies, and troubleshoot problems. Fortunately, the subnet mask allows you to subnet, which means to break a large block of addresses into smaller groups called subnetworks. All Class A networks are subnetted. If they were not, Class A networks would represent huge waste and inefficiency!

How many Class A networks are there? If only the first octet is used as the network number, and it contains a value between 0 and 127, then 126 Class A networks exist (0 and 127 are reserved). Each of these 126 networks has almost 17 million possible host addresses, and together they make up about half of the entire IPv4 address space! Under this system, a mere handful of organizations control half of the Internet's addresses.

Class B Addresses
Class B addresses start with a binary 10 in the first 2 bits of the first octet. Therefore, the lowest number that can be represented with a Class B address is 10000000 (decimal 128), and the highest number that can be represented is 10111111 (decimal 191). Any address that starts with a value in the range of 128 to 191 in the first octet is a Class B address.

Class B addresses were intended to accommodate medium-size networks, so the first two octets are used to represent the network number, which leaves two octets (or 16 bits) to represent the host portion of the address. With 16 bits total, 2¹⁶ combinations are possible, yielding 65,536 possible host addresses. Recall that two of those numbers, the lowest and highest values, are reserved for special purposes, so each Class B network can support up to 65,534 hosts. Though significantly smaller than the networks created by Class A addresses, a logical group of more than 65,000 hosts is still unmanageable and impractical. Therefore, like Class A networks, Class B networks are subnetted to improve efficiency.

There are 16,384 Class B networks. The first octet of a Class B address offers 64 possibilities (128 to 191), and the second octet has 256 (0 to 255). That yields 16,384 (64 * 256) networks, or 25 percent of the total IP space. Nevertheless, given the popularity and importance of the Internet, these addresses have run out quickly, which essentially leaves only Class C addresses available for new growth.

Class C Addresses
A Class C address begins with binary 110. Therefore, the lowest number that can be represented is 11000000 (decimal 192), and the highest number that can be represented is 11011111 (decimal 223). If an IPv4 address contains a number in the range of 192 to 223 in the first octet, it is a Class C address.

Class C addresses were originally intended to support small networks; the first three octets of a Class C address represent the network number, and the last octet may be used for hosts. One octet for hosts yields 256 possibilities; after you subtract the all-0s network number and the all-1s broadcast address, only 254 hosts may be addressed on a Class C network. Whereas Class A and Class B networks prove impossibly large (without subnetting), Class C networks can impose too restrictive a limit on hosts.

With 2,097,152 total network addresses containing a mere 254 hosts each, Class C addresses account for 12.5 percent of the Internet's address space. With Class A and B exhausted, the remaining Class C addresses are all that is left to be assigned to new organizations that need IP networks. The figure summarizes the ranges and availability of three address classes used to address Internet hosts.
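The network and host counts quoted for Classes A, B, and C follow directly from how many bits each class fixes and how many it leaves free. The following Python sketch reproduces that arithmetic; the table `CLASS_LAYOUT` and the function `class_statistics` are names invented for this illustration.

```python
# For each class: (leading bits that identify the class,
#                  total bits in the classful network number)
CLASS_LAYOUT = {"A": (1, 8), "B": (2, 16), "C": (3, 24)}


def class_statistics(name):
    """Return (networks, hosts per network, share of the IPv4 space)."""
    leading_bits, network_bits = CLASS_LAYOUT[name]
    networks = 2 ** (network_bits - leading_bits)
    if name == "A":
        networks -= 2  # first-octet values 0 and 127 are reserved
    # Subtract the all-0s network number and the all-1s broadcast address.
    hosts_per_network = 2 ** (32 - network_bits) - 2
    share_of_space = 2 ** (32 - leading_bits) / 2 ** 32
    return networks, hosts_per_network, share_of_space


for name in CLASS_LAYOUT:
    networks, hosts, share = class_statistics(name)
    print(f"Class {name}: {networks:,} networks, "
          f"{hosts:,} hosts each, {share:.1%} of the address space")
```

Running the sketch should give 126, 16,384, and 2,097,152 networks for Classes A, B, and C, with 50, 25, and 12.5 percent of the address space respectively.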

Class D Addresses
A Class D address begins with binary 1110 in the first octet. Therefore, the first octet range for Class D addresses is 11100000 to 11101111, or 224 to 239. Class D addresses are not used to address individual hosts. Instead, each Class D address can be used to represent a group of hosts called a host group, or multicast group.

For example, a router configured to run EIGRP joins a group that includes other nodes that are also running EIGRP. Members of this group still have unique IP addresses from the Class A, B, or C range, but they also listen for messages addressed to 224.0.0.10, which is a Class D address. Therefore, a single routing update message can be sent to 224.0.0.10, and all EIGRP routers will receive it. A single message sent to several select recipients is called a multicast. Class D addresses are also called multicast addresses.

A multicast is different from a broadcast. Every device on a logical network receives a broadcast, whereas only devices configured with a Class D address receive a multicast.

Class E Addresses
If the first octet of an IP address begins with 1111, then the address is a Class E address. Class E addresses are reserved for experimental purposes and should not be used for addressing hosts or multicast groups.
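Because the class depends only on the leading bits of the first octet, a classful device can determine it with a few bit tests. The sketch below shows one way to express those tests in Python; `address_class` is a hypothetical helper name, not part of any router software.

```python
def address_class(address: str) -> str:
    """Return the classful category (A-E) of a dotted-decimal IPv4 address."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError("first octet must be between 0 and 255")
    if first_octet & 0b10000000 == 0b00000000:
        return "A"  # 0xxxxxxx:   0 to 127
    if first_octet & 0b11000000 == 0b10000000:
        return "B"  # 10xxxxxx: 128 to 191
    if first_octet & 0b11100000 == 0b11000000:
        return "C"  # 110xxxxx: 192 to 223
    if first_octet & 0b11110000 == 0b11100000:
        return "D"  # 1110xxxx: 224 to 239 (multicast)
    return "E"      # 1111xxxx: 240 to 255 (experimental)


for example in ("46.0.0.0", "172.24.100.45", "207.21.54.0", "224.0.0.10"):
    print(example, "is Class", address_class(example))
```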

As you may already know from previous study or experience, subnet masking, or subnetting, is used to break one large group into several smaller subnetworks. These subnets can then be distributed throughout an enterprise, resulting in less waste and better logical organization. Formalized with RFC 950 in 1985, subnetting introduced a third level of hierarchy to the IPv4 addressing structure. The number of bits available to the network, subnet, and host portions of a given address varies depending on the size of the subnet mask.

A subnet mask is a 32-bit number that acts as a counterpart to the IP address. Each bit in the mask corresponds to its counterpart bit in the IP address. If a bit in the IP address corresponds to a 1 bit in the subnet mask, the IP address bit represents a network number. If a bit in the IP address corresponds to a 0 bit in the subnet mask, the IP address bit represents a host number.

In effect, the subnet mask (when known) overrides the address class to determine whether a bit is either network or host. This allows you to configure routers and other hosts to recognize addresses differently than the format dictated by class. For example, you can use the mask to tell hosts that, even though their addresses are Class B, the first three octets (instead of the first two) are the network number. In this case, the additional octet acts like part of the network number, but only inside the organization where the mask is configured.

The subnet mask applied to an address ultimately determines the network and host portions of an IP address. The network and host portions change when the subnet mask changes. If you apply the mask 255.255.0.0, only the first 16 bits (two octets) of the IP address 172.24.100.45 represent the network number, as shown in the figure. Therefore, the network number for this host address is 172.24.0.0. The shaded portion of the address in the figure indicates the network number.

Because the rules of class dictate that the first two octets of a Class B address are the network number, this 16-bit mask does not create subnets within the 172.24.0.0 network.

To create subnets with this Class B address, you must use a mask that identifies bits in the third or fourth octet as part of the network number.

You can apply a 24-bit mask, 255.255.255.0, which specifies the first 24 bits of the IP address as the network number. The network number for this example host is 172.24.100.0. The shaded portion of the address in the figure indicates this.

Routers and hosts configured with this mask will see all 8 bits in the third octet as part of the network number. These 8 bits are considered the subnet field because they represent network bits beyond the two octets prescribed by classful addressing.

Inside this network, devices configured with a 24-bit mask will use the 8 bits of the third octet to determine the subnet to which a host belongs. Because 172.24.100.45 and 172.24.101.46 have different values in the third octet, they do not belong to the same logical network. Hosts must match subnet fields to communicate with each other directly. Otherwise, the services of a router must be used so that a host on one subnet can talk to a host on another.
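The way a mask selects the network number amounts to a bitwise AND of the address and the mask. The following Python sketch applies both masks from this section to the example host; the helper names `network_number` and `same_subnet` are assumptions made for the illustration, and only the standard `ipaddress` module is used.

```python
import ipaddress


def network_number(address: str, mask: str) -> str:
    """AND the address with the mask, keeping only the network bits."""
    address_bits = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(address_bits & mask_bits))


def same_subnet(first: str, second: str, mask: str) -> bool:
    """Two hosts can talk directly only if their network numbers match."""
    return network_number(first, mask) == network_number(second, mask)


print(network_number("172.24.100.45", "255.255.0.0"))    # 172.24.0.0
print(network_number("172.24.100.45", "255.255.255.0"))  # 172.24.100.0

# With the 24-bit mask the two example hosts fall into different subnets,
# so traffic between them must pass through a router.
print(same_subnet("172.24.100.45", "172.24.101.46", "255.255.255.0"))  # False
```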

An 8-bit subnet field creates 2⁸, or 256, potential subnets. Because 8 bits remain in the host field, 254 hosts may populate each network (two host addresses are reserved as the network number and broadcast address, respectively). By dividing a Class B network into smaller logical groups, you can make the internetwork more manageable, more efficient, and more scalable.

Note that subnet masks are not sent as part of an IP packet header, so routers outside this network will not know what subnet mask is configured inside the network. An outside router will therefore treat 172.24.100.45 as just one of sixty-five thousand hosts that belong to the 172.24.0.0 network. In effect, subnetting provides a logical structure that is hidden from the outside world.

Class A and B addresses make up 75 percent of the IPv4 address space, but a relative handful of organizations (fewer than 17,000) can be assigned a Class A or B network number. Class C network addresses are far more numerous than Class A and Class B addresses, although they account for only 12.5 percent of the possible 4 billion (2³²) IP hosts, as shown in the Figure.

Unfortunately, Class C networks are limited to 254 hosts, which will not meet the needs of larger organizations that cannot acquire a Class A or B address. Even if there were more Class A, B, and C addresses, too many network addresses would cause Internet routers to grind to a halt under the weight of enormous routing tables.

Ultimately, the classful system of IP addressing, even with subnetting, could not scale to effectively handle global demand for Internet connectivity. As early as 1992, the Internet Engineering Task Force (IETF) identified two specific concerns:

Exhaustion of the remaining, unassigned IPv4 network addresses. At the time, the Class B space was on the verge of depletion.
A rapid and substantial increase in the size of the Internet's routing tables, driven by the Internet's growth. As more Class C networks came online, the resulting flood of new network information threatened Internet routers' capability to cope effectively.

In the short term, the IETF decided that a retooled IPv4 would have to hold out long enough for engineers to design and deploy a completely new Internet Protocol. That new protocol, IPv6, solves the address crisis by using a 128-bit address space. After years of planning and development, IPv6 promises to be ready for wide-scale implementation, although it continues (for the most part) to wait in the wings.

One reason that IPv6 has not been rushed into service is that the short-term extensions to IPv4 have been so effective. By eliminating the rules of class, IPv4 now enjoys renewed viability.

Routers use a form of IPv4 addressing called classless interdomain routing (CIDR) (pronounced "cider") that ignores class. In a classful system, a router determines the class of an address and then identifies the network and host octets based on that class. With CIDR, a router uses a bitmask to determine the network and host portions of an address, which are no longer restricted to using an entire octet.

First introduced in 1993 by RFC 1517, 1518, 1519, and 1520, and later deployed in 1994, CIDR dramatically improves IPv4's scalability and efficiency by providing the following:

The replacement of classful addressing with a more flexible and less wasteful classless scheme
Enhanced route aggregation, also known as supernetting

The following sections describe route aggregation, supernetting, and address allocation in more detail.

By using a bitmask instead of an address class to determine the network portion of an address, CIDR allows routers to aggregate, or summarize, routing information. This shrinks the size of the router's routing tables. In other words, just one address and mask combination can represent the routes to multiple networks.

Without CIDR and route aggregation, a router must maintain individual entries for the Class B networks shown in the figure.

The shaded columns in the figure identify the 16 bits that, based on the rules of class, represent the network number. Classful routers are forced to handle Class B networks using these 16 bits. Because the first 16 bits of each of these eight network numbers are unique, a classful router sees eight unique networks and must create a routing table entry for each. However, these eight networks do have common bits, as shown by the shaded portion of the figure.

The figure shows that the eight example network addresses have their first 13 bits in common. A CIDR-compliant router can summarize routes to these eight networks by using a 13-bit prefix. These eight networks, and only these networks, share the following bits:

10101100 00011

To represent this prefix in decimal terms, the rest of the address is padded with zeros and then paired with a 13-bit subnet mask:

10101100 00011000 00000000 00000000 = 172.24.0.0
11111111 11111000 00000000 00000000 = 255.248.0.0

Thus, a single address and mask define a classless prefix that summarizes routes to the eight networks, 172.24.0.0/13.
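The 13-bit prefix can also be derived mechanically. The sketch below assumes that the eight networks in the example are 172.24.0.0 through 172.31.0.0 (the range that the /13 prefix covers) and finds the longest run of leading bits they share. The function name `summarize` is made up for this illustration; Python's standard `ipaddress` module supplies the address types.

```python
import ipaddress


def summarize(networks):
    """Return the shortest single prefix that covers every listed network."""
    nets = [ipaddress.ip_network(n) for n in networks]
    lowest = int(min(n.network_address for n in nets))
    highest = int(max(n.broadcast_address for n in nets))
    # Any bit position where the lowest and highest addresses differ
    # cannot belong to the shared prefix.
    differing = lowest ^ highest
    prefix_length = 32 - differing.bit_length()
    mask = (0xFFFFFFFF << (32 - prefix_length)) & 0xFFFFFFFF
    return ipaddress.IPv4Network((lowest & mask, prefix_length))


# Assumed example set: the eight Class B networks 172.24.0.0 to 172.31.0.0.
class_b_networks = [f"172.{second_octet}.0.0/16" for second_octet in range(24, 32)]

summary = summarize(class_b_networks)
print(summary)          # 172.24.0.0/13
print(summary.netmask)  # 255.248.0.0
```

Note that this simple approach returns a covering prefix even when the listed networks do not fill it completely. A real summary should be advertised only when, as here, the prefix contains the intended networks and nothing else.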

By using a prefix address to summarize routes, you can keep routing table entries manageable, which results in the following:

More efficient routing
A reduced number of CPU cycles when recalculating a routing table or when sorting through the routing table entries to find a match
Reduced router memory requirements

Supernetting is the practice of using a bitmask to group multiple classful networks as a single network address. Supernetting and route aggregation are different names for the same process, although the term supernetting is most often applied when the aggregated networks are under common administrative control. Supernetting and route aggregation are essentially the inverse of subnetting.

Recall that the Class A and Class B address space is virtually exhausted, leaving large organizations little choice but to request multiple Class C network addresses from their providers. If a company can acquire a block of contiguous (that is, sequential) Class C network addresses, supernetting can be used so that the addresses appear as a single large network, or supernet.

Consider Company XYZ, which requires addresses for 400 hosts. Under the classful addressing system, XYZ could apply to a central Internet address authority for a Class B address. If the company got the Class B and then used it to address one logical group of 400 hosts, tens of thousands of addresses would be wasted. A second option for XYZ would be to request two Class C network numbers, yielding 508 (2 * 254) host addresses. The drawback to this approach: XYZ would have to route between its own logical networks, and default-free Internet routers would need to maintain two routing table entries for XYZ's network, rather than just one.

Under a classless addressing system, supernetting allows XYZ to get the address space that it needs without wasting addresses or increasing the size of routing tables unnecessarily. Using CIDR, XYZ asks for an address block from its Internet service provider, not a central authority such as the InterNIC. The ISP assesses XYZ's needs and allocates address space from its own large "CIDR block" of addresses. Providers assume the burden of managing address space in a classless system. With this system, Internet routers keep only one summary route, or supernet route, to the provider's network, and the provider keeps routes that are more specific to its customer networks. This method drastically reduces the size of Internet routing tables.

In the following example, XYZ receives two contiguous Class C addresses, 207.21.54.0 and 207.21.55.0. If you examine the shaded portion of the figure, you will see that these network addresses have this common 23-bit prefix:

11001111 00010101 0011011

When supernetted with a 23-bit mask (207.21.54.0 /23), the address space provides well over 400 host addresses (2⁹) without the tremendous waste of a Class B address. With the ISP acting as the addressing authority for a CIDR block of addresses, the ISP's customer networks, which include XYZ, can be advertised among Internet routers as a single supernet. In the figure, the ISP manages a block of 256 Class C addresses and advertises them to the world using a 16-bit prefix: 207.21.0.0 /16.
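The XYZ example can be checked with Python's standard `ipaddress` module. This is a sketch for verification only; the variable names are arbitrary.

```python
import ipaddress

first = ipaddress.ip_network("207.21.54.0/24")
second = ipaddress.ip_network("207.21.55.0/24")

# Shorten the prefix by one bit to obtain the 23-bit supernet.
xyz_supernet = first.supernet(new_prefix=23)
print(xyz_supernet)                     # 207.21.54.0/23
print(second.subnet_of(xyz_supernet))   # True: both Class C networks are covered

# 2**9 = 512 addresses, minus the network number and broadcast address.
usable_hosts = xyz_supernet.num_addresses - 2
print(usable_hosts)                     # 510

# The provider's CIDR block contains 256 Class C-sized networks,
# and XYZ's supernet is one small piece of it.
isp_block = ipaddress.ip_network("207.21.0.0/16")
class_c_count = len(list(isp_block.subnets(new_prefix=24)))
print(class_c_count)                    # 256
print(xyz_supernet.subnet_of(isp_block))  # True
```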

When CIDR enabled ISPs to hierarchically distribute and manage blocks of contiguous addresses, IPv4 address space enjoyed the following benefits:

Efficient allocation of addresses
Reduced number of routing table entries

VLSM allows an organization to use more than one subnet mask within the same network address space. Implementing VLSM is often referred to as "subnetting a subnet," and it can be used to maximize addressing efficiency.

Consider the subnets created by borrowing 3 bits from the host portion of the Class C address, 207.21.24.0, shown in the figure.

If you use the ip subnet-zero command, this mask creates seven usable subnets of 30 hosts each. You can use four of these subnets to address remote offices in the organization pictured in the figure, at sites A, B, C, and D.

Unfortunately, you have only three subnets left for future growth, and you have yet to address (literally) the three point-to-point WAN links between the four sites. If you assign the three remaining subnets to the WAN links, you would completely exhaust your supply of IP addresses. Moreover, squandering the remaining 30-host subnets to address these two-node networks will waste more than a third of your available address space!

As you may have guessed, there are ways to avoid this kind of waste. Over the past 20 years, network engineers have developed three strategies for efficiently addressing point-to-point WAN links:

Use VLSM
Use private addressing (RFC 1918)
Use IP unnumbered

Private addresses and IP unnumbered are discussed in detail later in this chapter. This section focuses on VLSM. If VLSM is applied to your addressing problem, your Class C address can be broken into groups (i.e., subnets) of various sizes. Large subnets are created for addressing LANs, and very small subnets are created for WAN links and other special cases.

You use a 30-bit mask to create subnets with only two valid host addresses, the exact number needed for a point-to-point connection. In the figure, you can see what happens if we take one of our three remaining subnets (subnet 6) and subnet it again using a 30-bit mask.

Subnetting the 207.21.24.192 /27 subnet in this way supplies us with eight ranges of addresses to be used for point-to-point networks. For example, the network 207.21.24.192/30 can be used to address the point-to-point serial link between Site A's router and Site B's router.
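The two levels of subnetting described here can be reproduced with Python's standard `ipaddress` module. The sketch below is only an illustration of the arithmetic, not a router configuration; the variable names are arbitrary.

```python
import ipaddress

class_c = ipaddress.ip_network("207.21.24.0/24")

# First level: borrow 3 bits to create /27 subnets of 30 hosts each.
lan_subnets = list(class_c.subnets(new_prefix=27))
subnet_6 = lan_subnets[6]
print(subnet_6)                       # 207.21.24.192/27
print(subnet_6.num_addresses - 2)     # 30 usable hosts

# Second level: subnet the /27 again with a 30-bit mask for WAN links.
wan_links = list(subnet_6.subnets(new_prefix=30))
print(len(wan_links))                 # 8 point-to-point ranges

for link in wan_links:
    # hosts() skips the network number and the broadcast address,
    # leaving exactly two usable addresses per /30.
    first_host, second_host = link.hosts()
    print(f"{link}: {first_host} and {second_host}")
```

The first range printed is 207.21.24.192/30, with hosts 207.21.24.193 and 207.21.24.194, one for each end of a serial link.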

So how do you configure VLSM on a Cisco router? The figure shows the commands needed to configure Site A's router (RTA) with a 27-bit mask on its Ethernet port and a 30-bit mask on its serial port.

For routers in a variably subnetted network to properly update each other, they must send masks in their routing updates. Without subnet information in the routing updates, routers will have nothing but the address class and their own subnet mask to go on. Only routing protocols that ignore the rules of address class and use classless prefixes will work properly with VLSM (see the figure).

RIPv1 and IGRP, common interior gateway protocols, cannot support VLSM because they do not send subnet information in their updates. Upon receiving an update packet, these classful routing protocols will use one of the following methods to determine the network prefix of an address:

If the router receives information about a network, and if the receiving interface belongs to that same network (but on a different subnet), the router applies the subnet mask that is configured on the receiving interface.
If the router receives information about a network address that is not the same as the one configured on the receiving interface, it applies the default (by class) subnet mask.
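The two rules above can be expressed as a short decision function. The following Python sketch is a simplified model of that behavior, not actual router code: the function names are hypothetical, and the interface address 172.24.100.1/24 in the usage lines is an assumed example value.

```python
import ipaddress


def classful_prefix_length(address: str) -> int:
    """Default (by class) mask length for a Class A, B, or C address."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        return 8
    if first_octet < 192:
        return 16
    if first_octet < 224:
        return 24
    raise ValueError("Class D and E addresses have no default mask")


def major_network(address: str) -> ipaddress.IPv4Network:
    """The classful network that contains the address."""
    prefix = classful_prefix_length(address)
    return ipaddress.ip_network(f"{address}/{prefix}", strict=False)


def inferred_prefix_length(advertised: str, receiving_interface: str) -> int:
    """Mask length a classful protocol would assume for an advertised address."""
    interface = ipaddress.ip_interface(receiving_interface)
    if major_network(advertised) == major_network(str(interface.ip)):
        # Rule 1: same major network, so borrow the interface's own mask.
        return interface.network.prefixlen
    # Rule 2: different major network, so fall back to the class default.
    return classful_prefix_length(advertised)


# Assumed receiving interface: 172.24.100.1 with a 24-bit mask.
print(inferred_prefix_length("172.24.101.0", "172.24.100.1/24"))  # 24
print(inferred_prefix_length("207.21.24.32", "172.24.100.1/24"))  # 24 (Class C default)
print(inferred_prefix_length("10.1.0.0", "172.24.100.1/24"))      # 8 (Class A default)
```

The second call shows why VLSM fails under these rules: the router assumes the Class C default of 24 bits for 207.21.24.32 even if the sender actually used a 27-bit mask.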

Despite its limitations, RIP is a very popular routing protocol and is supported by virtually all IP routers. RIP's popularity stems from its simplicity and universal compatibility. However, the first version of RIP (RIPv1) suffers from several critical deficiencies:

RIPv1 does not send subnet mask information in its updates. Without subnet information, VLSM and CIDR cannot be supported.
Its updates are broadcast, increasing network traffic.
It does not support authentication.

RFC 1058 documented the original RIP in 1988. The new (and improved) RIP version 2, first specified in RFC 1388 in 1993 and later updated by RFC 1723 and RFC 2453, addresses these deficiencies:

RIPv2 does send subnet information and therefore supports VLSM and CIDR.
It multicasts routing updates using the Class D address 224.0.0.9, providing better efficiency.
It provides for authentication in its updates.

Because of these key features, RIPv2 should always be preferred over RIPv1, unless some legacy device on the network cannot support it.

When RIP is first enabled on a Cisco router, the router listens for version 1 and 2 updates but sends only version 1. To take advantage of version 2's features, you can turn off version 1 support and enable version 2 updates with the following command:

Router(config)#router rip
Router(config-router)#version 2

RIP's straightforward design ensures that it will continue to survive. A new version has already been designed to support future IPv6 networks.

The use of CIDR and VLSM not only prevents address waste, but it also promotes route aggregation, or summarization. Without route summarization, Internet backbone routing would likely have collapsed sometime before 1997.

The figure illustrates how route summarization reduces the burden on upstream routers. This complex hierarchy of variable-sized networks and subnetworks is summarized at various points using a prefix address until the entire network is advertised as a single aggregate route: 200.199.48.0 /20.

Recall that this kind of route summarization, or supernetting, is possible only if the network's routers run a classless routing protocol, such as OSPF or EIGRP. Classless routing protocols carry the prefix length (subnet mask) with the 32-bit address in routing updates. In the figure, the summary route that eventually reaches the provider contains a 20-bit prefix common to all of the addresses in the organization, 200.199.48.0 /20 or 11001000 11000111 0011. For summarization to work properly, you must carefully assign addresses in a hierarchical fashion so that summarized addresses will share the same high-order bits.

Route flapping occurs when a router's interface alternates rapidly between the "up" and "down" states. This can be caused by a number of factors, including a faulty interface or poorly terminated media.

Summarization can effectively insulate upstream routers from route flapping problems. Consider RTC in the figure. If RTC's interface connected to the 200.199.56.0 network goes down, RTC will remove that route from its table. If the routers were not configured to summarize, RTC would then send a triggered update to RTZ about the removal of the specific network, 200.199.56.0. In turn, RTZ would update the next router upstream, and so on. Every time these routers are updated with new information, their processors must go to work. It is possible (especially in the case of OSPF routing) that the processors can work hard enough to noticeably impact performance. Now, consider the impact on performance if RTC's interface to network 200.199.56.0 comes back up after only a few seconds. The routers update each other and recalculate. In addition, what happens when RTC's link goes back down seconds later? And then back up? This is route flapping, and it can cripple a router with excessive updates and recalculations.

However, the summarization configuration prevents RTC's route flapping from affecting any other routers. RTC updates RTZ about a supernet (200.199.56.0 /21) that includes eight networks (200.199.56.0 through 200.199.63.0). The loss of one network does not invalidate the route to the supernet. While RTC may be kept busy dealing with its own route flap, RTZ (and all upstream routers) do not notice a thing. Summarization effectively insulates the other routers from the problem of route flapping.
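The prefixes quoted for this example network can be verified with Python's standard `ipaddress` module. The sketch below checks the bits of the 20-bit aggregate and confirms that RTC's 21-bit supernet covers the eight networks named above; the variable names are arbitrary.

```python
import ipaddress

aggregate = ipaddress.ip_network("200.199.48.0/20")
rtc_supernet = ipaddress.ip_network("200.199.56.0/21")

# Show the 20 high-order bits that every address in the organization shares.
all_bits = f"{int(aggregate.network_address):032b}"
prefix_bits = all_bits[:aggregate.prefixlen]
grouped = " ".join(prefix_bits[i:i + 8] for i in range(0, len(prefix_bits), 8))
print(grouped)  # 11001000 11000111 0011

# RTC's supernet contains eight /24 networks, 200.199.56.0 to 200.199.63.0.
rtc_networks = list(rtc_supernet.subnets(new_prefix=24))
print(len(rtc_networks), rtc_networks[0], rtc_networks[-1])

# Upstream routers hold only the summary. If one /24 behind RTC flaps,
# the summary still covers the remaining networks and is not withdrawn.
print(rtc_supernet.subnet_of(aggregate))  # True
```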

Because TCP/IP is the world's dominant routed protocol, most network applications and operating systems offer extensive support for it. Thus, many designers build their networks around TCP/IP, even if they do not require Internet connectivity. As you already know, Internet hosts require globally unique IP addresses. However, private hosts that are not connected to the Internet can use any valid address, as long as it is unique within the private network.

Because many private networks exist alongside public nets, grabbing "just any address" is strongly discouraged. RFC 1918 sets aside three blocks of IP addresses (i.e., a Class A, a Class B, and a Class C range) for private, internal use. Addresses in this range will not be routed on the Internet backbone (see the figure). Internet routers immediately discard private addresses.
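The three blocks that RFC 1918 reserves are 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. A membership test against them might be sketched in Python as follows; the function name `is_rfc1918` is invented for this example. The test is written out explicitly because the `is_private` property of the standard `ipaddress` module also covers other reserved ranges, such as loopback addresses.

```python
import ipaddress

# The three private blocks set aside by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),      # from the Class A range
    ipaddress.ip_network("172.16.0.0/12"),   # from the Class B range
    ipaddress.ip_network("192.168.0.0/16"),  # from the Class C range
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 blocks."""
    candidate = ipaddress.ip_address(address)
    return any(candidate in block for block in RFC1918_BLOCKS)


for example in ("10.0.0.5", "172.24.100.45", "192.168.1.1", "207.21.24.1"):
    label = "private" if is_rfc1918(example) else "globally routable"
    print(example, "is", label)
```

Note that 172.24.100.45, used as an example host earlier in this chapter, lies inside the 172.16.0.0/12 private block.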

If you are addressing a nonpublic intranet, a test lab, or a home network, these private addresses can be used instead of globally unique addresses. Global addresses must be obtained from a provider or a registry at some expense.

RFC 1918 addresses have found a home in production networks as well. Earlier in this chapter, you saw the advantages of using VLSM to address the point-to-point WAN links in an internetwork. Recall that with VLSM, you were able to further subnet one of the subnets left in a Class C network's address space. Although this solution was better than wasting an entire 30-host subnet on each two-host WAN link, it still costs one subnet that could have been used for future growth. A less wasteful solution is to address the WAN links using private network numbers. In the figure, the WAN links are addressed using subnets from the private address space, 10.0.0.0 /8.

How can these routers use private addresses if LAN users at site A, B, C, and D expect to access the Internet? End users at these sites should have no problem because they use globally unique addresses from the 207.21.24.0 network. The routers use their serial interfaces with private addresses merely to forward traffic and exchange routing information. Upstream providers and Internet routers see only the source and destination IP addresses in the packet; they do not care if the packet traveled through links with private addresses at some point. In fact, many providers use RFC 1918 network numbers in the core of their network to avoid depleting their supply of globally unique addresses.

One trade-off of using private numbers on WAN links is that these serial interfaces cannot be the original source of traffic bound for the Internet or the final destination of traffic from the Internet. Routers do not normally spend time surfing the web, so this limitation typically becomes an issue only when troubleshooting with ICMP, using SNMP, or connecting remotely with Telnet over the Internet. In those cases, the router can be addressed only by its globally unique LAN interfaces.

The following sections discuss implementation of a private addressing scheme, including the pitfalls of discontiguous subnets and the advantages of Network Address Translation (NAT).

Mixing private addresses with globally unique addresses can create discontiguous subnets, which are subnets from the same major network that are separated by a completely different major network or subnet.

In the figure, Site A and Site B both have LANs that are addressed using subnets from the same major net (207.21.24.0). They are discontiguous because the 10.0.0.4/30 network separates them. Classful routing protocols, notably RIPv1 and IGRP, cannot support discontiguous subnets because the subnet mask is not included in routing updates. If Site A and Site B are running RIPv1, Site A will receive updates about network 207.21.24.0/24 and not about 207.21.24.32/27 because the subnet mask is not included in the update. Because Site A has an interface directly connected to that network (in this case, E0), Site A will reject Site B's route.
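The summarization behavior described above can be sketched in Python. This is a simplified model, not the RIPv1 implementation: it only shows that a subnet crossing into a different major network is reduced to its classful network. The function names and the interface address 10.0.0.5 are assumptions made for the example.

```python
import ipaddress


def classful_network(address):
    """Return the major (classful) network that contains `address`."""
    first_octet = int(ipaddress.ip_address(address)) >> 24
    if first_octet < 128:
        prefix = 8    # Class A
    elif first_octet < 192:
        prefix = 16   # Class B
    elif first_octet < 224:
        prefix = 24   # Class C
    else:
        raise ValueError("Class D and E addresses have no classful network")
    return ipaddress.ip_network(f"{address}/{prefix}", strict=False)


def advertised_prefix(subnet, outgoing_interface_address):
    """Model what a classful protocol advertises out an interface.

    Simplification: when the outgoing interface belongs to a different
    major network, only the major network number is advertised.
    """
    subnet = ipaddress.ip_network(subnet)
    major = classful_network(str(subnet.network_address))
    if classful_network(outgoing_interface_address) == major:
        return subnet
    return major


# Site B's LAN subnet, advertised across the privately addressed WAN link.
print(advertised_prefix("207.21.24.32/27", "10.0.0.5"))
```

The receiving router therefore hears only about the major network, which it already considers directly connected.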

Even some classless routing protocols require additional configuration to solve the problem of discontiguous subnets. RIPv2 and EIGRP automatically summarize on classful boundaries unless explicitly told not to. Usually, this type of summarization is desirable, but in the case of discontiguous subnets, the following command must be entered for both RIPv2 and EIGRP to disable automatic summarization:

Router(config-router)#no auto-summary

Finally, when using private addresses on a network that is connected to the Internet, you should filter packets and routing updates to avoid "leaking" any RFC 1918 addresses between autonomous systems. For example, if both you and your provider use addresses from the 192.168.0.0 /16 block, your routers could get confused if confronted with updates from both systems.

NAT, as defined by RFC 1631, is the process of swapping one address for another in the IP packet header. In practice, NAT is used to allow hosts that are privately addressed (using RFC 1918 addresses) to access the Internet.

A NAT-enabled device, such as a UNIX computer or a Cisco router, operates at the border of a stub domain (i.e., an internetwork that has a single connection to the outside world). When a host inside the stub domain wants to transmit to a host on the outside, it forwards the packet to the NAT-enabled device. The NAT process then looks inside the IP header and, if appropriate, replaces the inside IP address with a globally unique IP address. When an outside host sends a response, the NAT process receives it, checks the current table of network address translations, and replaces the destination address with the original inside source. NAT translations can occur dynamically or statically and can be used for a variety of purposes.

The most powerful feature of NAT routers is their capability to use port address translation (PAT), which allows multiple inside addresses to map to the same global address. This is sometimes called a "many-to-one" NAT. With PAT, or address overloading, literally hundreds of privately addressed nodes can access the Internet using only one global address. The NAT router keeps track of the different conversations by mapping TCP and UDP port numbers.
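To make the port-mapping idea concrete, here is a minimal Python sketch of a PAT translation table. It is an illustration under stated assumptions: the class name, the sequential port allocation starting at 1024, and the global address 203.0.113.1 are all hypothetical, and a real NAT router tracks more state (protocol, timers, destination) than this model does.

```python
from itertools import count


class PatTable:
    """Toy many-to-one translation table keyed on source port."""

    def __init__(self, global_address, first_port=1024):
        self.global_address = global_address
        self._next_port = count(first_port)
        self._outbound = {}  # (inside_ip, inside_port) -> global_port
        self._inbound = {}   # global_port -> (inside_ip, inside_port)

    def translate_outbound(self, inside_ip, inside_port):
        """Map an inside source to the shared global address and a unique port."""
        key = (inside_ip, inside_port)
        if key not in self._outbound:
            port = next(self._next_port)
            self._outbound[key] = port
            self._inbound[port] = key
        return self.global_address, self._outbound[key]

    def translate_inbound(self, global_port):
        """Return the original inside source for a reply, or None if unknown."""
        return self._inbound.get(global_port)


pat = PatTable("203.0.113.1")
# Two inside hosts happen to use the same source port.
first = pat.translate_outbound("10.0.0.10", 5000)
second = pat.translate_outbound("10.0.0.11", 5000)
print(first, second)
```

Both conversations share one global address, and the distinct global ports let the router return each reply to the correct inside host.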

Throughout this chapter, you have seen ways to maximize an organization's use of IP addresses. In previous sections, you learned that you could avoid wasting an entire subnet on the point-to-point serial links by using VLSM, or use private addresses instead. Neither technique can be supported by classful routing protocols, such as the popular RIPv1 and IGRP. Fortunately, the Cisco IOS offers a third option for efficiently addressing serial links: IP unnumbered.

When a serial interface is configured for IP unnumbered, it borrows the IP address of another interface (usually a LAN interface or loopback interface) and therefore does not need its own address. Not only does IP unnumbered avoid wasting addresses on point-to-point WAN links, but it also can be used with classful routing protocols, where VLSM and discontiguous subnets cannot. If your network runs RIPv1 or IGRP, IP unnumbered may be the only solution to maximize your addresses.

RTA's S1 (168.71.5.1) and RTB's S1 (168.71.8.1) can communicate using TCP/IP over this serial link, even though they do not belong to the same IP network. This is possible because it is a point-to-point link, so there is no confusion about which device a packet is originating from or destined for. There are two ground rules for configuring IP unnumbered on an interface:

The interface is both serial and connected via a point-to-point link.
The same major network with the same mask is used to address the LAN interfaces that "lend" their IP address on both sides of the WAN link. Alternatively, different major networks with no subnetting are used to address the LAN interfaces on both sides of the WAN link.

Using IP unnumbered is not without its drawbacks, which include the following:

You cannot use ping to determine whether the interface is up because the interface has no IP address.
You cannot boot from a network IOS image over an unnumbered serial interface.
You cannot support IP security options on an unnumbered interface.

After designing a scalable IP addressing scheme for the enterprise, you will be faced with the daunting task of implementation. Routers, servers, and other key nodes usually require special attention from administrators, but desktop clients are often automatically assigned IP configurations using Dynamic Host Configuration Protocol (DHCP). Because desktop clients typically make up the bulk of network nodes, DHCP is good news for systems administrators. Small offices and home offices can also take advantage of DHCP by using Easy IP, a Cisco IOS feature set that combines DHCP with NAT functions.

DHCP works by configuring servers to give out IP configuration information to clients. Clients lease the information from the server for an administratively defined period. When the lease is up, the host must ask for another address, although the host is typically reassigned the same one.

Administrators typically prefer to use a Microsoft NT server or a UNIX computer to offer DHCP services because these solutions are highly scalable and relatively easy to manage. Even so, the Cisco IOS offers an optional, fully featured DHCP server, which leases configurations for 24 hours by default.

Administrators set up DHCP servers to assign addresses from predefined pools. DHCP servers can also offer other information, such as DNS server addresses, WINS server addresses, and domain names. Most DHCP servers also allow you to define specifically what client MAC addresses can be serviced and to automatically assign the same number to a particular host each time.

Note: BootP was originally defined in RFC 951 in 1985. It is the predecessor of DHCP, and it shares some operational characteristics. Both protocols use UDP ports 67 and 68, which are well known as "BootP" ports because BootP came before DHCP.

The DHCP client configuration process is shown in the figure. This process follows these steps:

The client sends a DHCPDISCOVER broadcast to all nodes. When a client set up for DHCP needs an IP configuration (typically at boot time), it tries to locate a DHCP server by sending a broadcast called a DHCPDISCOVER.
The server sends a DHCPOFFER unicast to client. When the server receives the broadcast, it determines whether it can service the request from its own database. If it cannot, the server may forward the request on to another DHCP server or servers, depending on its configuration. If it can, the DHCP server offers the client IP configuration information in the form of a unicast DHCPOFFER. The DHCPOFFER is a proposed configuration that may include IP address, DNS server address, and lease time.
The client sends a DHCPREQUEST broadcast to all nodes. If the client finds the offer agreeable, it will send another broadcast, a DHCPREQUEST, specifically requesting those particular IP parameters. Why does the client broadcast the request instead of unicasting it to the server? A broadcast is used because the very first message, the DHCPDISCOVER, may have reached more than one DHCP server (after all, it was a broadcast). If more than one server makes an offer, the broadcasted DHCPREQUEST lets everyone know which offer was accepted (it is usually the first offer received).
The server sends a DHCPACK unicast to client. The server that receives the DHCPREQUEST makes the configuration official by sending a unicast acknowledgment, the DHCPACK. Note that it is possible but highly unlikely that the server will not send the DHCPACK because it may have leased that information to another client in the interim. Receipt of the DHCPACK message enables the client to begin using the assigned address immediately.
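The four-step exchange above can be sketched as a small Python simulation. This is a teaching model only: the class and function names, the dictionary message format, and the sample addresses are assumptions for this example, and real DHCP messages carry many more fields.

```python
from dataclasses import dataclass, field


@dataclass
class DhcpServer:
    name: str
    pool: list                                  # addresses still available
    leases: dict = field(default_factory=dict)  # client MAC -> leased address

    def handle_discover(self, client_mac):
        """Step 2: answer a DHCPDISCOVER with a DHCPOFFER, if possible."""
        if not self.pool:
            return None
        return {"type": "DHCPOFFER", "server": self.name, "address": self.pool[0]}

    def handle_request(self, client_mac, chosen_server, address):
        """Step 4: only the server whose offer was accepted sends a DHCPACK."""
        if chosen_server != self.name or address not in self.pool:
            return None
        self.pool.remove(address)
        self.leases[client_mac] = address
        return {"type": "DHCPACK", "server": self.name, "address": address}


def obtain_lease(client_mac, servers):
    """Run the client side of the exchange against every reachable server."""
    # Step 1: the DHCPDISCOVER broadcast reaches all servers.
    offers = [o for s in servers if (o := s.handle_discover(client_mac))]
    if not offers:
        return None
    offer = offers[0]  # the client usually accepts the first offer received
    # Step 3: the DHCPREQUEST is also a broadcast, so every server sees
    # which offer was accepted.
    acks = [
        a for s in servers
        if (a := s.handle_request(client_mac, offer["server"], offer["address"]))
    ]
    return acks[0] if acks else None


server_a = DhcpServer("server-a", ["172.16.1.11", "172.16.1.12"])
server_b = DhcpServer("server-b", ["172.16.2.11"])
ack = obtain_lease("00:11:22:33:44:55", [server_a, server_b])
print(ack)
```

Because the request is broadcast, the server whose offer was declined sees that it was not chosen and leaves its pool unchanged.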

Depending on an organization's policies, it may be possible for an end user or an administrator to statically assign a host an IP address that belongs in the DHCP server's address pool. Just in case, the Cisco IOS DHCP server always checks to make sure that an address is not in use before the server offers it to a client. The server issues ICMP echo requests (ping) to a pool address before sending the DHCPOFFER to a client. Although configurable, the default number of pings used to check for potential IP address conflict is two (the more pings, the longer the configuration process takes).

The DHCP server process is enabled by default on versions of the Cisco IOS that support it. If it has been disabled, you can re-enable it by using the service dhcp global configuration command. The no service dhcp command disables the server.

Like NAT, DHCP server requires that the administrator define a pool of addresses. In Figure , the ip dhcp pool command defines which addresses will be assigned to hosts.

The first command, ip dhcp pool room12, creates a pool named room12 and puts the router in a specialized DHCP configuration mode. In this mode, you use the network statement to define the range of addresses to be leased. If you want to exclude specific addresses on this network, you must return to global configuration mode, as shown in Figure .

This ip dhcp excluded-address command configures the router to exclude 172.16.1.1 through 172.16.1.10 when assigning addresses to clients. You may choose to use the ip dhcp excluded-address command to reserve addresses that are statically assigned to key hosts.
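The effect of an exclusion range on a pool can be sketched in Python. The excluded range comes from the text; the pool network 172.16.1.0/24 is an assumption, because the figure that shows the network statement is not reproduced here. The function name is invented for this example.

```python
import ipaddress


def leasable_addresses(network, excluded_low, excluded_high):
    """Return the host addresses in `network` outside the excluded range."""
    net = ipaddress.ip_network(network)
    low = ipaddress.ip_address(excluded_low)
    high = ipaddress.ip_address(excluded_high)
    return [host for host in net.hosts() if not (low <= host <= high)]


# Assumed pool network; exclusion range as described in the text.
pool = leasable_addresses("172.16.1.0/24", "172.16.1.1", "172.16.1.10")
print(pool[0], len(pool))
```

The ten excluded addresses remain available for static assignment to routers, servers, and other key hosts.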

Typically, you will want a DHCP server to configure much more than an IP address. Other IP configuration values can be set from the DHCP config mode, as shown in Figure .

IP clients will not get very far without a default gateway, which can be set by using the default-router command. The address of the DNS server (dns-server) and WINS server (netbios-name-server) can be configured here as well. The IOS DHCP server can configure clients with virtually any TCP/IP information.

Figure lists the key IOS DHCP server commands, which you enter in DHCP pool configuration mode (identified by the dhcp-config# prompt).

You can monitor DHCP server operation by using the EXEC mode commands shown in Figure .

Easy IP is a combination of Cisco IOS features that allows a router to negotiate its own IP address and to do NAT through that negotiated address. Typically deployed on a small office/home office (SOHO) router, Easy IP is useful in cases where a small LAN connects to the Internet via a provider that dynamically assigns only one IP address for the entire remote site.

A SOHO router with the Easy IP feature set uses DHCP to automatically address local LAN clients with RFC 1918 addresses. When the router dynamically receives its WAN interface address via the Point-to-Point Protocol, it uses NAT overload to translate between local inside addresses and its single global address. Therefore, both the LAN side and the WAN side are dynamically configured with little or no administrative intervention. In effect, Easy IP offers "plug-and-play" routing.

DHCP is not the only critical service that uses broadcasts. Cisco routers and other devices may use broadcasts to locate TFTP servers. Some clients may need to broadcast to locate a TACACS (security) server. Typically, in a complex hierarchical network, clients do not reside on the same subnet as key servers. Such remote clients will broadcast to locate these servers, but routers, by default, will not forward client broadcasts beyond their subnet. Because some clients are dead in the water without services such as DHCP, you are faced with two choices: to place servers on all subnets, or to use the Cisco IOS helper address feature. Running services such as DHCP or DNS on several computers creates overhead and administrative headaches, so the first option is not very appealing. When possible, administrators use the ip helper-address command to relay broadcast requests for these key UDP services.

By using the helper address feature, a router can be configured to accept a broadcast request for a UDP service and then forward it as a unicast to a specific IP address. Alternatively, the router can forward these requests as directed broadcasts to a specific network or subnetwork.

To configure the helper address, identify the router interface that will be receiving the broadcasts for UDP services. In interface configuration mode, use the ip helper-address command to define the address to which UDP broadcasts for services should be forwarded.

By default, the ip helper-address command forwards the eight UDP services shown in Figure .

What if Company XYZ needs to forward requests for a service not on this list? The Cisco IOS provides the global configuration command ip forward-protocol to allow an administrator to forward any UDP port in addition to the default eight. In order to forward UDP on port 517, you would use the global configuration command, ip forward-protocol udp 517. This command is used not only to add a UDP port to the "default eight" (see Figure ), but also to subtract an unwanted service from the default group. For instance, if you wanted to forward DHCP, TFTP, and DNS, and, for some reason, not Time, TACACS, and NetBIOS, the Cisco IOS requires that you configure the router according to Figure .
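The add-and-subtract logic described above can be modeled in Python as simple set arithmetic. The port numbers for the default eight services are filled in from general Cisco documentation, since the figure is not reproduced here; the function name is invented for this sketch.

```python
# UDP services relayed by default when a helper address is configured.
DEFAULT_FORWARDED_UDP = {
    37: "Time",
    49: "TACACS",
    53: "DNS",
    67: "BOOTP/DHCP server",
    68: "BOOTP/DHCP client",
    69: "TFTP",
    137: "NetBIOS name service",
    138: "NetBIOS datagram service",
}


def forwarded_ports(add=(), remove=()):
    """Return the set of UDP ports relayed after adjusting the defaults."""
    ports = set(DEFAULT_FORWARDED_UDP)
    ports.update(add)                # models: ip forward-protocol udp <port>
    ports.difference_update(remove)  # models: no ip forward-protocol udp <port>
    return ports


# Add port 517 and stop relaying Time, TACACS, and both NetBIOS services.
result = forwarded_ports(add=[517], remove=[37, 49, 137, 138])
print(sorted(result))
```

What remains is DNS, DHCP (both BOOTP ports), TFTP, and the added port 517.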

Consider this complex sample helper address configuration (see Figure ). Assume you want Host A to automatically obtain its IP configuration from the DHCP server at 172.24.1.9. Because RTA will not forward Host A's DHCPDISCOVER broadcast, you must configure RTA to help Host A.

To configure RTA's E0 (the interface that receives Host A's broadcasts) to relay DHCP broadcasts as a unicast to the DHCP server, use the following commands:

RTA(config)#interface e0

RTA(config-if)#ip helper-address 172.24.1.9

With this simple configuration, Host A broadcasts using any of the eight default UDP ports will be relayed to the DHCP server's IP address. However, what if Host A also needs to use the services of the NetBIOS server at 172.24.1.5? As configured, RTA will forward NetBIOS broadcasts from Host A to the DHCP server. Moreover, if Host A sends a broadcast TFTP packet, RTA also will forward this to the DHCP server at 172.24.1.9. What is needed in this example is a helper address configuration that relays broadcasts to all servers on the segment. The following commands configure a directed broadcast to the IP subnet that is being used as a server farm:

RTA(config)#interface e0

RTA(config-if)#ip helper-address 172.24.1.255

Configuring a directed broadcast to the server segment (172.24.1.255) is more efficient than entering the IP address of every server that could potentially respond to Host A's UDP broadcasts.
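As a quick check of why 172.24.1.255 reaches every server, the following Python sketch computes the directed broadcast address of the server segment. The /24 mask is an assumption, since the text does not state the mask used on the server farm, and the function name is invented for this example.

```python
import ipaddress

# Assumed mask for the server farm segment.
server_farm = ipaddress.ip_network("172.24.1.0/24")


def is_directed_broadcast(helper_address, subnet):
    """Return True if the helper address is the subnet's broadcast address."""
    return ipaddress.ip_address(helper_address) == subnet.broadcast_address


print(server_farm.broadcast_address)
print(is_directed_broadcast("172.24.1.255", server_farm))
print(is_directed_broadcast("172.24.1.9", server_farm))
```

A helper address that equals the subnet broadcast address targets every node on the segment, whereas a host address such as 172.24.1.9 targets a single server.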

Finally, some devices on Host A's segment need to broadcast to the TACACS server, which does not reside in the server farm. You can configure RTA's E0 to make it work by adding the command ip helper-address 172.16.1.2.

You can verify the correct helper configuration with the show ip interface command, as shown in Figure .

Notice in Figure that RTA's interface E3 (which connects to the server farm) is not configured with helper addresses. However, the output in Figure also shows that, for this interface, directed broadcast forwarding is disabled. This means that the router will not convert the logical broadcast 172.24.1.255 into a physical broadcast (with a Layer 2 address of FF-FF-FF-FF-FF-FF). To allow all the nodes in the server farm to receive the broadcasts at Layer 2, you will have to configure E3 to forward directed broadcasts with the following command:

RTA(config)#interface e3

RTA(config-if)#ip directed-broadcast

In this chapter, you have seen that IPv4 addressing faces two major problems: the depletion of addresses, particularly the key medium-sized space (Class B), and dangerous overgrowth of Internet routing tables.

In the early 1990s, CIDR ingeniously built on the concept of the address mask and stepped forward to temporarily alleviate these crushing problems. The hierarchical nature of CIDR dramatically improved IPv4's scalability. Once again, a hierarchical design proves to be a scalable one.

Yet even with subnetting (1985), variable-length subnetting (1987), and CIDR (1993), a hierarchical structure could not save IPv4 from one simple problem: There just are not enough addresses to meet future need. At roughly 4 billion possibilities, the IPv4 address space is formidable, but it will not suffice in a future world of mobile Internet-enabled devices and IP-addressable household appliances (RFC 2235 references the world's first "Internet toaster").

Recent short-term IPv4 solutions to the address crunch include RFC 1918, which sets aside addresses for unlimited internal use, and NAT, which allows thousands of hosts to access the Internet with only a handful of valid addresses.

However, the ultimate solution to the address shortage is the introduction of IPv6 and its 128-bit address. Developed to create a supply of addresses that would outlive demand, IPv6 is on course to eventually replace IPv4. The fantastically large address space of IPv6 will provide not only far more addresses than IPv4, but additional levels of hierarchy as well. For the record, 128 bits allows for 340,282,366,920,938,463,463,374,607,431,768,211,456 possibilities.
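The address-space figures can be verified with a few lines of Python arithmetic. The variable names are chosen for this example only.

```python
IPV4_BITS = 32
IPV6_BITS = 128

ipv4_addresses = 2 ** IPV4_BITS
ipv6_addresses = 2 ** IPV6_BITS

# How many complete IPv4 address spaces fit inside the IPv6 space.
ratio = ipv6_addresses // ipv4_addresses

print(f"IPv4: {ipv4_addresses:,}")
print(f"IPv6: {ipv6_addresses:,}")
print(f"IPv6 is 2**{IPV6_BITS - IPV4_BITS} times larger")
```

The 32-bit space works out to 4,294,967,296 addresses, which is the "roughly 4 billion" cited earlier.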

In 1994, the IETF proposed IPv6 in RFC 1752, and a number of working groups were formed in response. IPv6 tackles issues such as address depletion, quality of service, address autoconfiguration, authentication, and security.

It will not be easy for organizations deeply invested in the IPv4 scheme to migrate to a totally new architecture. As long as IPv4 (with its recent extensions and CIDR-enabled hierarchy) remains viable, administrators will shy away from adopting IPv6. A new IP protocol requires new software, new hardware, and new methods of administration. It is likely that IPv4 and IPv6 will coexist, even within an autonomous system, for years to come.

As defined first by RFC 1884 and later revised by RFC 2373, IPv6 addresses are 128-bit identifiers for interfaces and sets of interfaces, not nodes. Three general types of addresses exist:

Unicast - An identifier for a single interface. A packet sent to a unicast address is delivered to the interface identified by that address.
Anycast - An identifier for a set of interfaces (typically belonging to different nodes). A packet sent to an anycast address is delivered to the "nearest," or first, interface in the anycast group.
Multicast - An identifier for a set of interfaces (typically belonging to different nodes). A packet sent to a multicast address is delivered to all interfaces in the multicast group.

To write 128-bit addresses so that they are readable to human eyes, IPv6's architects abandoned dotted-decimal notation in favor of a hexadecimal format. Therefore, an IPv6 address can be written as 32 hex digits, with colons separating the values of the eight 16-bit pieces of the address, as shown in Figure .
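Python's standard ipaddress module can display this notation. The sample address below uses the 2001:db8::/32 documentation prefix and is an assumption for illustration; it is not the address shown in the figure.

```python
import ipaddress

# Hypothetical sample address written in compressed form.
address = ipaddress.IPv6Address("2001:db8::1")

# The exploded form shows all eight 16-bit pieces with leading zeros.
full_form = address.exploded
pieces = full_form.split(":")
hex_digits = full_form.replace(":", "")

print(full_form)
print(len(pieces), "pieces,", len(hex_digits), "hex digits")
```

Eight pieces of four hex digits each account for all 32 digits, or 128 bits.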

Under current plans, IPv6 nodes that connect to the Internet will use what is called an aggregatable global unicast address, which is the counterpart to IPv4 global addresses that you are already familiar with. Like CIDR-enhanced IPv4, aggregatable global unicast addresses rely on hierarchy to keep Internet routing tables manageable. IPv6 global unicast addresses feature three levels of hierarchy:

Public topology - The collection of providers that provide Internet connectivity
Site topology - The level local to an organization that does not provide connectivity to nodes outside itself
Interface identifier - The level specific to a node's individual interface

This three-level hierarchy is reflected by the structure of the aggregatable global unicast address, which includes the following fields (see Figure ):

FP field (3 bits) - The 3-bit Format Prefix (FP) is used to identify the type of address (unicast, multicast, and so on). The bits 001 identify aggregatable global unicasts.
TLA ID field (13 bits) - The Top-Level Aggregation Identifier (TLA ID) field is used to identify the authority responsible for the address at the highest level of the routing hierarchy. Internet routers will necessarily maintain routes to all TLA IDs. With 13 bits set aside, this field can represent up to 8,192 TLAs.
Res field (8 bits) - IPv6's architects defined the reserved (Res) field so that the TLA or NLA IDs could be expanded as future growth warrants. Currently, this field must be set to zero.
NLA ID field (24 bits) - The Next-Level Aggregation Identifier (NLA ID) field is used to identify ISPs. The field itself can be organized hierarchically to reflect a hierarchy, or multitiered relationship, among providers.
SLA ID field (16 bits) - The Site-Level Aggregation Identifier (SLA ID) is used by an individual organization to create its own local addressing hierarchy and to identify subnets.
Interface ID field (64 bits) - The Interface ID field is used to identify individual interfaces on a link. This field is analogous to the host portion of an IPv4 address, but it is derived using the IEEE EUI-64 format, which, on LAN interfaces, adds a 16-bit field to the interface's MAC address.
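The field layout above can be sketched in Python by slicing a 128-bit value according to the listed widths. The sample address and MAC address are hypothetical, and the function names are invented for this example. The interface ID helper also inverts the universal/local bit of the MAC address, a detail of the modified EUI-64 format in RFC 2373 that goes beyond the summary above.

```python
import ipaddress

# Field names and widths, in order from the most significant bit.
FIELD_WIDTHS = [
    ("FP", 3),
    ("TLA ID", 13),
    ("Res", 8),
    ("NLA ID", 24),
    ("SLA ID", 16),
    ("Interface ID", 64),
]


def split_fields(address):
    """Split an aggregatable global unicast address into its fields."""
    value = int(ipaddress.IPv6Address(address))
    remaining = 128
    fields = {}
    for name, width in FIELD_WIDTHS:
        remaining -= width
        fields[name] = (value >> remaining) & ((1 << width) - 1)
    return fields


def eui64_interface_id(mac):
    """Build a 64-bit interface ID from a 48-bit MAC address."""
    raw = bytearray.fromhex(mac.replace(":", ""))
    raw[0] ^= 0x02  # invert the universal/local bit (modified EUI-64)
    # Insert the 16-bit value FFFE between the two halves of the MAC.
    return int.from_bytes(raw[:3] + b"\xff\xfe" + raw[3:], "big")


# Hypothetical address whose interface ID derives from 00:11:22:33:44:55.
fields = split_fields("2001:12:3456:abcd:211:22ff:fe33:4455")
for name, value in fields.items():
    print(f"{name}: {value:#x}")
```

The six widths sum to 128 bits, so every bit of the address belongs to exactly one field.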

In addition to the global unicast address space, IPv6 offers internal network numbers, or "site local use" addresses, which are analogous to RFC 1918 addresses. If a node is not addressed with a global unicast address or an internal site local use address, it can be addressed using a link local use address, which is specific to a network segment.

In this chapter, you learned how subnet masks, VLSMs, private addressing, and network address translation could enable more efficient use of IP addresses. You learned that hierarchical addressing allows for efficient allocation of addresses and reduced number of routing table entries. VLSMs, specifically, provide the capability to include more than one subnet mask within a network and the capability to subnet an already subnetted network address. Proper IP addressing is required to ensure the most efficient network operations.

Many of the scalable design features explored in the first two chapters, such as load balancing and route summarization, work very differently depending on the routing protocol used. Routing protocols are the rules that govern the exchange of routing information between routers. The open architecture and global popularity of TCP/IP has encouraged the development of more than a half-dozen prominent IP routing protocols, each with its unique combination of strengths and weaknesses. Because routing protocols are key to network performance, you must have a clear understanding of each different protocol's attributes: convergence times, overhead, and scalability features.

This chapter explores the routing process, including default routing, floating static routes, convergence, and route calculation.

One of a router's primary jobs is to make decisions about which path is the best path to a given destination. A router learns paths, called routes, from an administrator's configuration or from other routers via routing protocols. Routers keep a routing table in RAM. A routing table is a list of the best available routes. Routers use this table to make decisions about how to forward a packet. You can issue the show ip route command to view the TCP/IP routing table, an example of which is shown in Figure .

A routing table maps network prefixes to an outbound interface. Consider the routing table shown in Figure . When RTA receives a packet destined for 192.168.4.46, it looks for the prefix 192.168.4.0/24 in its table. RTA then forwards the packet out an interface (Ethernet0) based on the routing table entry. If RTA receives a packet destined for 10.3.21.5, it sends that packet out Serial0 (S0).
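A routing table lookup of this kind can be sketched in Python. The two entries mirror the examples in the text, but the prefix 10.3.0.0/16 is an assumption, because the figure that shows RTA's actual table is not reproduced here. The function name is invented, and the sketch picks the most specific (longest) matching prefix.

```python
import ipaddress

# Prefix -> outbound interface. The 10.3.0.0/16 entry is hypothetical.
ROUTES = {
    "192.168.4.0/24": "Ethernet0",
    "10.3.0.0/16": "Serial0",
}


def lookup(destination, routes=ROUTES):
    """Return the outbound interface for a destination, or None if no route."""
    address = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), interface)
        for prefix, interface in routes.items()
        if address in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None  # no matching entry: the packet would be dropped
    # Prefer the most specific entry, that is, the longest prefix.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(lookup("192.168.4.46"))
print(lookup("10.3.21.5"))
print(lookup("172.16.9.9"))
```

A destination that matches no prefix returns None, which corresponds to the router dropping the packet.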

The first few lines in Figure list possible codes that designate how the IP router learned the route. The example table shows four routes for directly connected networks. They are labeled with a C in the routing table. RTA drops any packet destined for a network that is not listed in the routing table. To forward to other destinations, RTA's table will have to include more routes. New routes can be added via one of two methods:

Static routing - An administrator manually defines routes to one or more destination networks.
Dynamic routing - Routers follow rules defined by a routing protocol to exchange routing information and independently select the best path.

Administratively defined routes are said to be static because they do not change until a network administrator manually programs the changes. Routes learned from other routers are dynamic because they can change automatically as neighboring routers update each other with new information. Each method has fundamental advantages and disadvantages, as listed in Figure and .

The following sections describe how to configure both static and dynamic routing on a Cisco router.

Static routing is useful in very simple networks that do not have multiple paths to any destination network. Static routing reduces the memory and processing burdens on a router. Even on large internetworks, administrators often configure static routes on access routers that connect stub networks, or networks that have only one way in and one way out. In Figure , RTZ is configured with a static route to 172.24.4.0 /24.

To configure static routing on a Cisco router, you must use the ip route command. This command uses the following syntax:

Router(config)#ip route destination-prefix destination-prefix-mask {address | interface} [distance] [tag tag] [permanent]

Figure describes the parameters that are used with this command.

You can manually add an entry to a routing table using one of two variations on the ip route command:

RTA(config)#ip route 10.6.0.0 255.255.0.0 s1

RTA(config)#ip route 10.7.0.0 255.255.0.0 10.4.0.2

Both of these global configuration commands will add a static route to the routing table. The first example maps a network prefix (10.6.0.0/16) to a local physical interface (S1) on the router the same way that a directly connected network is mapped to an interface. The second example maps the network prefix (10.7.0.0/16) to the next-hop address (10.4.0.2). Although these commands will both add valid static routes to the router's table, notice that the format of these two static routing table entries is different.

In Figure , the static route to 10.6.0.0 shows as a static route (notice the S at the beginning of the line), but is formatted similar to a directly connected route (even though it is not) because the router has just been configured to forward packets for 10.6.0.0 out S1. The static route to 10.7.0.0, which is configured with a next-hop address, is formatted the same way that dynamic routes are; the next-hop address is included in the routing table because the packets destined for 10.7.0.0 should be forwarded to another router's interface at 10.4.0.2. What is the difference between these two kinds of static routes?

When using a routing protocol such as RIP or IGRP, static routes that show as directly connected will automatically be advertised to other routers as long as the appropriate network command has been issued. The next-hop static route will not be advertised without additional configuration. Static routes can be included in updates if they are injected, or redistributed into the dynamic routing protocol.

When an interface goes down, all static routes mapped to that interface are removed from the IP routing table. In addition, when the router can no longer find a valid next hop for the address specified in a static route, the static route is likewise removed from the table. An alternative method is to map a static IP address to a loopback interface.

Note: As a rule, you should always use the next-hop address when defining a static route on a multi-access network such as Ethernet. A router interface on a multi-access network could have several link partners, so you must use the next-hop address to specify which neighbor should receive traffic for a given network.

Static routes are also useful as a stopgap when a routing protocol is causing trouble. At one multi-campus college, the ISP was upgrading its infrastructure. As each campus was converted to the new ISP infrastructure, problems occurred with the RIP routing protocol. As a quick solution, the WAN staff created static routes until the ISP issue was resolved.

Static routing does not suit large, complex networks that include redundant links, multiple protocols, and meshed topologies. Routers in complex networks must adapt to topology changes quickly and select the best route from multiple candidates. Dynamic routing is the best choice for complex networks or in any other network in which automation is preferred over human intervention.

Dynamic routing of TCP/IP can be implemented using one or more protocols. These protocols are often grouped according to where they are used. Routing protocols designed to work inside an autonomous system are categorized as interior gateway protocols (IGPs), and protocols that work between autonomous systems are classified as exterior gateway protocols (EGPs). Figure lists widely supported EGPs and IGPs for TCP/IP routing.

A comprehensive discussion of EGPs, in particular BGP4, can be found in Chapter 8, BGP. Chapter 3 focuses on IGPs. As shown in Figure , you can further categorize these protocols as either distance-vector routing protocols or link-state routing protocols, depending on their method of operation.

Routing protocols for IPX and AppleTalk
Despite the dominance of IP, a significant number of organizations continue to support legacy protocols, such as Novell's IPX and Apple's AppleTalk. A legacy technology is one that is supported because of a significant past investment or deployment. Many organizations continue to support IPX and AppleTalk to leverage a past investment in protocol-specific printers, software, and servers. Although Cisco's EIGRP offers comprehensive support for both IPX and AppleTalk, it is important to be familiar with the names of the following proprietary routing protocols: IPX RIP (or Novell RIP), NetWare Link Services Protocol (NLSP), and AppleTalk's Routing Table Maintenance Protocol (RTMP).

Implementing these Apple and Novell proprietary routing protocols is beyond the scope of this curriculum.

IP routing protocols and the routing table
The Cisco IOS commands to enable dynamic routing vary depending on the routing protocol used. Figure displays the routing table of a router configured to use four IP routing protocols: RIP, IGRP, EIGRP, and OSPF. Note that most organizations would not normally use more than one or two routing protocols. This example is provided to show different types of routing table entries.

Figure dissects the specific table entry for 192.168.1.0/24. Routes in the routing table that are not directly connected include two numbers offset by brackets, in the form [administrative distance/metric]. Therefore, [120/3] means that the administrative distance is 120 and the metric is 3. Routers base their evaluations of routes on these two numbers. Since this is a RIP route, the metric represents hop count.

Routers use metrics to evaluate, or measure, routes. When multiple routes to the same network exist and the routes are from the same routing protocol, the route with the lowest metric is considered the best. IP RIP uses only one factor to determine the metric: hop count. In the sample entry shown in the figure, the number 3 indicates that the destination network is three hops away, which is a better metric than four hops.

Each routing protocol calculates its metrics differently. For example, EIGRP uses a sophisticated combination of factors that, by default, includes bandwidth and delay to calculate a metric. With default settings, EIGRP's metric for the same route to 192.168.1.0 is 3,219,456! If RTA receives a RIP update and an EIGRP update for this same network, how can the router compare what is, in effect, three apples against more than 3 million oranges? That is where administrative distance comes in.

When a router receives updates from different routing protocols about the same network, it cannot use dissimilar metrics to compare the routes. Instead, it uses administrative distance to decide which protocol to believe. The Cisco IOS assigns a default administrative distance to every routing protocol; the lower the value, the more trustworthy the routing protocol. A complete list of administrative distances can be found in Chapter 7, Route Optimization.
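The two-step comparison described above (administrative distance first, metric second) can be sketched in a few lines of Python. This is only an illustrative model, not how IOS is implemented; the function name best_route and the tuple layout are assumptions made for the example. The distances shown are the Cisco IOS defaults for the protocols named in this chapter.

```python
# Default Cisco IOS administrative distances for the sources discussed here.
DEFAULT_AD = {
    "connected": 0,
    "static": 1,
    "eigrp": 90,
    "igrp": 100,
    "ospf": 110,
    "rip": 120,
}


def best_route(candidates):
    """Pick the preferred route to ONE destination network.

    candidates is a list of (protocol, metric) tuples (a hypothetical
    layout chosen for this sketch). The lowest administrative distance
    wins; the metric only breaks ties within the same protocol.
    """
    return min(candidates, key=lambda c: (DEFAULT_AD[c[0]], c[1]))


# Three "apples" versus more than 3 million "oranges": EIGRP is believed
# because its administrative distance (90) is lower than RIP's (120).
print(best_route([("rip", 3), ("eigrp", 3219456)]))
```

Because the administrative distance is compared first, the very large EIGRP metric never competes directly with the RIP hop count.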

Routing protocols may be classified as either distance-vector or link-state routing protocols. These classifications describe the algorithm, or formula, that routers use to calculate and exchange routing information. Distance-vector routing protocols are based on the Bellman-Ford algorithm (also known as a distance-vector algorithm).

Routers configured to use a distance-vector routing protocol typically send their complete routing table at regular intervals to neighbor routers. In fact, simple distance-vector protocols, such as RIP and IGRP, broadcast (or multicast) their routing table out all configured interfaces, by default. Routers that use these protocols do not actually identify their neighbors for direct communication.

A neighbor router receiving the broadcast update examines it and compares the information to its current routing table. Routes to new networks, or routes to known networks with better metrics, are inserted in the table. The neighbor then broadcasts its routing table, which includes any updated routes.

Distance-vector routing protocols are concerned with the distance and vector (direction) of destination networks. Before sending an update, each router adds its own distance value to the route's metric. When a router receives an update, it maps the learned network to the receiving interface. The router then uses that interface to reach those destinations.
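The exchange described above can be sketched as follows. This is a minimal model under stated assumptions: the metric is a RIP-style hop count, the names build_update and process_update and the dictionary layout are hypothetical, and real protocols add timers, split horizon, and handling for routes that get worse, none of which is modeled here.

```python
INFINITY = 16  # RIP treats a metric of 16 hops as unreachable


def build_update(table):
    """Build the update a router sends: every known network, with the
    sender's own hop added to the metric before it goes out."""
    return {net: min(entry["metric"] + 1, INFINITY)
            for net, entry in table.items()}


def process_update(table, advertised, interface):
    """Merge a neighbor's update into the local table.

    New networks, or known networks with a better (lower) metric, are
    installed and mapped to the interface the update arrived on.
    Returns the list of networks that changed.
    """
    changed = []
    for net, metric in advertised.items():
        if metric >= INFINITY:
            continue  # unreachable, never installed in this sketch
        current = table.get(net)
        if current is None or metric < current["metric"]:
            table[net] = {"metric": metric, "interface": interface}
            changed.append(net)
    return changed


# RTA advertises its connected network; RTB learns it one hop away.
rta = {"10.1.0.0": {"metric": 0, "interface": "e0"}}
rtb = {"10.2.0.0": {"metric": 0, "interface": "e0"}}
print(process_update(rtb, build_update(rta), "s0"))
print(rtb["10.1.0.0"])
```

Note that the receiving router never sees the sender's topology, only a distance and the interface (the vector) through which the network was heard.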

Simple distance-vector routing protocols enjoy two major benefits over link-state protocols. They are relatively easy to configure, and they generally use less memory and processing power. RIPv1 has the added advantage of almost universal support among all routing software and is often used as a common denominator in mixed-vendor or legacy routing environments.

Simple distance-vector routing protocols do not scale as well as their link-state counterparts. RIPv1 and IGRP are classful routing protocols (they do not send subnet information in updates), so they cannot support scalability features such as Variable Length Subnet Masking (VLSM) or supernetting. In general, simple distance-vector routing protocols converge more slowly than link-state protocols. Because complex and scalable internetworks demand routing protocols that are quick to achieve convergence (when all routers agree on the state of the network's topology), distance-vector protocols often are not appropriate. Finally, RIP restricts networks from growing beyond 15 hops between any two destinations, a limitation that proves too stifling for today's large networks. IGRP overcomes this limitation by supporting a 255-hop maximum. However, IGRP is a Cisco-proprietary protocol and therefore cannot support a multi-vendor routing environment.

Because of the limitations of simple distance-vector routing protocols, network administrators often turn to link-state routing in complex internetworks.

Link-state routing protocols offer greater scalability and faster convergence than distance-vector protocols such as RIP and IGRP. Unfortunately, these advantages come at a price. Link-state protocols require more memory and processing power from the router, and more knowledge and expertise from the administrator than do distance-vector protocols.

Link-state protocols are based on the Dijkstra algorithm, sometimes referred to as the Shortest Path First (SPF) algorithm. The most common link-state routing protocol, Open Shortest Path First (OSPF), is examined in Chapter 4, OSPF in a Single Area, and Chapter 5, Multiarea OSPF.

Routers running a link-state protocol, such as OSPF, are concerned with the states (for example, up or down) of links (interfaces on other routers) in the network. A link-state router builds a complete database of all the link states of every router in its area. In other words, a link-state router gathers enough information to create its own map of the network. Each router then individually runs the SPF algorithm on its own map, or link-state database, to identify the best paths to be installed in the routing table. These paths to other networks form a tree with the local router as its root.
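The SPF calculation described above can be sketched with a standard Dijkstra implementation. This is an illustration only: the function name spf, the router names, the link costs, and the dictionary-of-dictionaries layout used for the link-state database are all assumptions for the example, not OSPF data structures.

```python
import heapq


def spf(lsdb, root):
    """Run Dijkstra's algorithm over a link-state database.

    lsdb maps each router to {neighbor: link_cost}. The result maps
    every reachable router to (total_cost, first_hop), which is the
    shortest-path tree rooted at the local router.
    """
    best = {root: (0, None)}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry; a cheaper path was found already
        first_hop = best[node][1]
        for neighbor, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            # Directly attached neighbors are their own first hop.
            hop = neighbor if node == root else first_hop
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, neighbor))
    return best


# Hypothetical four-router area: the direct RTA-RTB link is expensive,
# so RTA reaches RTB and RTD through RTC instead.
lsdb = {
    "RTA": {"RTB": 10, "RTC": 1},
    "RTB": {"RTA": 10, "RTC": 1, "RTD": 1},
    "RTC": {"RTA": 1, "RTB": 1},
    "RTD": {"RTB": 1},
}
print(spf(lsdb, "RTA"))
```

Every router in the area holds the same database but passes itself as the root, so each one derives a different tree from identical information.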

Instead of learning routes and then broadcasting the routes with incremented distances to neighbors, link-state routers advertise the states of their links to all other routers in the area so that each router can build a complete link-state database. These advertisements are called link-state advertisements (LSAs). Unlike distance-vector routers, link-state routers can form special relationships with their neighbors and other link-state routers, to ensure that the LSA information is properly and efficiently exchanged.

After an initial flood of LSAs provides routers with the information that they need to build a link-state database, routing updates occur only when a link state changes, or, if no changes have occurred, after a specific interval. If a link state changes, a partial update is sent immediately. The partial update contains only link states that have changed, not a complete routing table. An administrator concerned about WAN link utilization will find these partial and infrequent updates an efficient alternative to distance-vector routing, which sends out a complete routing table every minute or so. Moreover, when a change occurs, link-state routers are all notified immediately by the partial update. Distance-vector routers have to wait for neighbors to note the change, increment the change, and then pass it on to the next neighbor down the line.

The benefits of link-state routing include faster convergence and improved bandwidth utilization over distance-vector protocols. Link-state protocols support Classless Inter-Domain Routing (CIDR), VLSM, and supernetting. This makes them a good choice for complex, scalable networks. In fact, link-state protocols generally outperform distance-vector protocols on any size network. So why are link-state protocols not used exclusively for routing? Link-state protocols have two major disadvantages:

Link-state routing may overtax low-end hardware. Link-state routers require more memory and processing power than distance-vector routers, which potentially makes link-state routing cost-prohibitive for organizations with tight budgets and legacy hardware.
Link-state protocols require complex administration. Configuring link-state routing can be a daunting task, and many administrators prefer to avoid its complexity and stick to distance-vector routing. Even capable administrators may opt for a straightforward distance-vector protocol on simple networks.

Cisco's proprietary EIGRP is an advanced distance-vector protocol that also employs the best features of link-state routing. For the most part, EIGRP configuration is similar to configuring a simple distance-vector protocol such as IGRP. However, like their link-state counterparts, EIGRP routers use partial updates, special neighbor relationships, and topological databases to provide optimal convergence. Sometimes called a hybrid protocol, EIGRP is discussed thoroughly in Chapter 6.

It is not feasible, or even desirable, for every router to maintain routes to every possible destination. Instead, routers can keep a default route, or a gateway of last resort. Default routes are used when the router cannot match a destination network with a more specific entry in the routing table; thus, the gateway of last resort. In effect, the router uses the default route to hand off to another router. The other router must have either a route to that destination or its own default route to a third router. If it is a default route to a third router, that router must have either the route to the destination or another default route, and so on. Eventually, the packet should be routed to a router that actually has a route to the destination.

A key scalability feature is that default routes keep routing tables as lean as possible. They make it possible for routers to forward packets destined to any Internet host without having to maintain a table entry for every Internet network. Default routes can be statically entered by an administrator or dynamically learned via a routing protocol.

Default routing begins with the administrator. Before routers can dynamically exchange default information, an administrator must configure at least one router with a default route. An administrator can use two very different commands to statically configure default routes: ip route 0.0.0.0 0.0.0.0 and ip default-network.

The following sections explore these two methods in detail.

Creating an ip route to 0.0.0.0/0 is the simplest way to configure a default route. This is done using the following syntax:

Router(config)# ip route 0.0.0.0 0.0.0.0 [next-hop-ip-address | exit-interface]

To the Cisco IOS, network 0.0.0.0/0 has special meaning as the gateway of last resort. All destination addresses match this route because a mask of all 0s requires none of the 32 bits in an address to be an exact match.

A route to 0.0.0.0/0 is often called a "quad-zero route." Manually configuring 0.0.0.0/0 routes on every router might suffice in a simple network. You may want routers to dynamically exchange default routes in more complex situations. The exchange of default information works differently depending on the routing protocol being used and can create severe problems when improperly configured. Remember, default routes typically point to the outside world, so when they fail, everyone tends to notice!
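The reason a quad-zero route matches everything, yet is used only as a last resort, can be shown with a short longest-prefix-match sketch built on Python's standard ipaddress module. The function name lookup, the route entries, and the next-hop addresses are hypothetical values chosen for the example.

```python
import ipaddress


def lookup(routes, destination):
    """Return the next hop of the most specific matching route.

    routes maps prefix strings to next hops. A /0 prefix matches every
    address because none of the 32 bits has to match, but any longer
    matching prefix is preferred over it.
    """
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes.items():
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network.prefixlen, next_hop))
    if not matches:
        return None  # no route and no gateway of last resort
    return max(matches)[1]


routes = {
    "0.0.0.0/0": "172.16.1.2",      # gateway of last resort
    "192.168.1.0/24": "10.0.0.1",   # a more specific entry
}
print(lookup(routes, "192.168.1.7"))   # uses the /24 entry
print(lookup(routes, "198.51.100.9"))  # falls through to the default
```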

Static routes to 0.0.0.0/0 are automatically propagated to other routers in RIP updates. The only way to stop this automatic update is to use a route filter, a configuration option that is discussed in Chapter 7, Route Optimization.

However, in IOS release 12.1, RIP does not propagate a static default route automatically. If you are using RIP and IOS 12.1, you must manually configure the RIP process to advertise the static default by issuing the network 0.0.0.0 command.

Alternatively, you can use either the default-information originate command or the redistribute static command to configure static default route propagation. OSPF (regardless of the IOS version) requires the default-information originate command if you want to propagate static default routes. The following example illustrates this configuration for RIP (see the figure).

RTY(config)# ip route 0.0.0.0 0.0.0.0 172.16.1.2
RTY(config)# router rip
RTY(config-router)# default-information originate

Using the default-information originate command, an administrator can statically configure a single RIP router with a 0.0.0.0/0 route, and that default route will be propagated to other routers. The default-information originate command can also be used with OSPF to achieve the same effect.

IGRP treats 0.0.0.0/0 routes differently. In fact, IGRP does not recognize the network 0.0.0.0/0 and will not include it in updates. To configure a dynamic exchange of default information in an IGRP network, you must use the ip default-network command. The ip default-network command can be used to flag a route to any IP network, not just 0.0.0.0/0, as a candidate default route, using the following command syntax:

Router(config)#ip default-network ip-network-address

A candidate default route is denoted by an asterisk in the routing table and will be considered (along with any other candidates) for the role as gateway of last resort. As an example of how this command works, consider the internetwork shown in the figure.

As the "boundary router," RTB must be manually configured to send default traffic to its link partner, the ISP router. You could configure a 0.0.0.0/0 route on RTB, but this route will not be propagated by IGRP to the other routers. To avoid manually configuring 0.0.0.0/0 routes on all routers, you can configure RTB to flag its route to 207.21.20.0/24 as a candidate default route, as shown:

RTB(config)#ip default-network 207.21.20.0

The network 207.21.20.0/24 now has special properties as an exterior network (the outside network that serves as a gateway of last resort). RTB will send this information in IGRP routing updates to RTA and RTC. These routers can now dynamically learn that network 207.21.20.0/24 is an exterior network, making RTB the gateway of last resort for both of these routers. Both RTA and RTC will propagate this route, flagged as a candidate default, to other IGRP neighbors, if present.

Unlike a static 0.0.0.0/0 route configuration, the ip default-network command provides an administrator with a great deal of flexibility. In complex topologies, several networks can be flagged as candidate defaults. Routers can then choose from among the available candidates to pick the lowest-cost route.

If you are running IGRP, you must use the ip default-network command to enable the exchange of default information. If you are using RIP, a 0.0.0.0/0 route will usually suffice. However, you can use the ip default-network command on a RIP router, but RIP routers propagate IP default networks as 0.0.0.0/0 routes. In other words, a RIP router configured with the ip default-network 192.168.1.0 command will send neighbors a route to 0.0.0.0/0, not a flagged route to 192.168.1.0.

Note that the ip default-network command is classful, which means that if the router has a route to the subnet indicated by this command, it installs a static route to the major net and then flags that route as a candidate default route.

A router does not use a gateway of last resort for addresses that are part of its local domain. A local domain is a major network to which the router is connected. In the figure, RTX has two interfaces configured with IP addresses that belong to the major network, 172.16.0.0.

If all three routers are running IGRP, RTX will not learn about the subnet 172.16.1.0/30 because a variable-length subnet mask is used. (IGRP does not support VLSM.) The figure shows RTX's routing table after a 0.0.0.0/0 route has been statically configured.

What happens when you issue a ping to 172.16.1.1 from RTX? Because the network 172.16.1.0/30 is not in RTX's routing table, you may expect RTX to use its default route to send the ping to RTZ. However, because RTX has interfaces connected to the major net 172.16.0.0, RTX considers 172.16.0.0 a local domain and will not use a default route to reach 172.16.1.0 or any other local domain address. Without additional configuration, RTX's ping to 172.16.1.1 will fail.

You can solve this problem in several ways. The first and best approach is to configure the router with the ip classless global configuration command. With ip classless enabled (which is the default in Cisco IOS versions 11.3 and greater), the router uses the best prefix match available, including a supernet route, such as 172.0.0.0/8 or, ultimately, 0.0.0.0/0. By enabling ip classless you can get RTX to use the 0.0.0.0/0 route to reach unknown subnets within its local domain, 172.16.0.0.
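The difference between the two lookup behaviors can be modeled roughly as follows. This is a simplified sketch, not the IOS forwarding algorithm: the helper names major_network and lookup are hypothetical, and because RTX's actual subnets are not listed in the text, the 172.16.2.0/24 and 172.16.3.0/24 entries are assumptions. The sketch treats a major network as local whenever the table holds any route inside it.

```python
import ipaddress


def major_network(address):
    """Return the classful (class A, B, or C) major network of an address."""
    first_octet = int(address.split(".")[0])
    prefix_len = 8 if first_octet < 128 else 16 if first_octet < 192 else 24
    return ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)


def lookup(routes, destination, ip_classless=True):
    """Longest-prefix match, optionally restricted the classful way.

    Without ip classless, if the table holds any route inside the
    destination's major network, only those routes are candidates;
    supernets and 0.0.0.0/0 are ignored for that destination.
    """
    dest = ipaddress.ip_address(destination)
    candidates = {ipaddress.ip_network(p): nh for p, nh in routes.items()}
    if not ip_classless:
        major = major_network(destination)
        local = {n: nh for n, nh in candidates.items() if n.subnet_of(major)}
        if local:
            candidates = local
    matches = [n for n in candidates if dest in n]
    if not matches:
        return None  # packet is dropped
    return candidates[max(matches, key=lambda n: n.prefixlen)]


rtx = {
    "172.16.2.0/24": "connected",      # assumed local subnet
    "172.16.3.0/24": "connected",      # assumed local subnet
    "0.0.0.0/0": "default route",
}
print(lookup(rtx, "172.16.1.1", ip_classless=False))  # None: ping fails
print(lookup(rtx, "172.16.1.1", ip_classless=True))   # default route
```

In this model, a route to the major network itself, such as 172.16.0.0/16, still qualifies as a candidate in classful mode, which is why an explicit route to the major network is another way to restore reachability.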

A second approach is to give RTX an explicit route for the major network 172.16.0.0, as shown:

RTX(config)#ip route 172.16.0.0 255.255.0.0 172.16.3.1

Without a more specific route available for 172.16.1.1, RTX uses this static route to the major network number 172.16.0.0/16 and will successfully route packets destined for 172.16.1.1.

Static routing has disadvantages, one of which is that it cannot adapt to topology changes. However, you can configure static routing to have limited adaptability by creating floating static routes.

Floating static routes are static routes configured with an administrative distance value that is greater than that of the primary route (or routes). Essentially, floating static routes are fallback routes, or backup routes, that do not appear in the routing table until another route to the same destination fails. For example, assume that RTB is connected to network 10.0.0.0/8 via two different links, as shown in the figure. Also, assume that RTB's preferred route to network 10.0.0.0/8 is via RTC because that link has a higher bandwidth. This route is learned by RIP. RTB should use the slower link to 10.0.0.0/8 via RTA only if the primary route fails. The route to RTA is statically configured.

To statically configure RTB so that it will use the slower link to reach 10.0.0.0/8 after the RIP route fails, you must use a floating static route as shown:

RTB(config)#ip route 10.0.0.0 255.0.0.0 1.1.1.1 130

This ip route command includes an administrative distance of 130. Recall that static routes have a default administrative distance of 1 (see the figure). To create a static route that will float (that is, wait for another route to fail before entering the routing table) you must manually set an administrative distance value. This value must be greater than the primary route's administrative distance value. In this example, the primary route is learned by RIP and thus has an administrative distance of 120. By configuring the static route with an administrative distance of 130, the static route will be less desirable than the primary route and the RIP route via RTC is always preferred. However, if the RIP route is lost, the floating static route takes its place in the routing table. The figure shows RTB's routing table with the RIP route and then, after the RIP route is lost, RTB's routing table with the floating static route.

The output in the figure includes output from the debug ip routing command, which details the loss of the primary route and the subsequent installation of the floating static route.

Floating static routes can be used in conjunction with other static routes to create a semi-adaptable static routing scheme. Consider this configuration:

RTZ(config)#ip route 0.0.0.0 0.0.0.0 s0
RTZ(config)#ip route 0.0.0.0 0.0.0.0 s1 5
RTZ(config)#ip route 4.0.0.0 255.0.0.0 s2
RTZ(config)#ip route 4.0.0.0 255.0.0.0 s3 5
RTZ(config)#ip route 4.0.0.0 255.0.0.0 s4 10

If RTZ is configured with these commands, it installs one route to 0.0.0.0/0 (using S0) and one route to 4.0.0.0/8 (using S2). If S0 becomes unavailable, RTZ will install the floating static route to 0.0.0.0/0 (using S1) into its routing table. If S2 fails, RTZ will fall back to using S3 to reach 4.0.0.0/8. Finally, if both S2 and S3 go down, RTZ will use the least desirable static route to 4.0.0.0/8 (using S4), which has an administrative distance of 10.
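The fallback behavior of these five static routes can be modeled in a few lines. This is a sketch under simple assumptions: a static route is usable only while its exit interface is up, and among usable routes to the same prefix the lowest administrative distance is installed. The function name installed_routes and the tuple layout are hypothetical.

```python
# The five static routes from the RTZ configuration: (prefix, interface, AD).
# Routes entered without a distance use the static default of 1.
RTZ_STATIC = [
    ("0.0.0.0/0", "s0", 1),
    ("0.0.0.0/0", "s1", 5),
    ("4.0.0.0/8", "s2", 1),
    ("4.0.0.0/8", "s3", 5),
    ("4.0.0.0/8", "s4", 10),
]


def installed_routes(static_routes, up_interfaces):
    """Return {prefix: (interface, ad)} for the routes that would be
    installed, given the set of interfaces that are currently up."""
    table = {}
    for prefix, interface, ad in static_routes:
        if interface not in up_interfaces:
            continue  # a route through a down interface cannot be used
        if prefix not in table or ad < table[prefix][1]:
            table[prefix] = (interface, ad)
    return table


all_up = {"s0", "s1", "s2", "s3", "s4"}
print(installed_routes(RTZ_STATIC, all_up))
print(installed_routes(RTZ_STATIC, all_up - {"s0"}))        # default floats to s1
print(installed_routes(RTZ_STATIC, all_up - {"s2", "s3"}))  # 4.0.0.0/8 via s4
```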

The process by which routers build and maintain their routing tables is both collaborative and independent; the routers share information with each other, but must individually recalculate their own routing tables. For individual routing tables to be accurate, all routers must have a common view of the network's topology. When all routers in a network agree on the topology, they have converged. Rapid convergence means rapid recovery from link failure or other network changes. Routing protocols and network designs are ultimately judged by how quickly they converge.

When routers are in the process of converging, the network is susceptible to routing problems. If some routers learn that a link is down and others incorrectly believe that the link is still up, their individual tables will be contradictory and may lead to dropped packets or devastating routing loops.

It is virtually impossible for all routers in a network to simultaneously detect a topology change. In fact, depending on the routing protocol in use, a significant amount of time may pass before all the routers in a network converge. Many factors affect convergence time, including the following:

The routing protocol used
A router's distance (that is, the number of hops) from the point of change
The number of routers in the network that use dynamic routing protocols
Bandwidth and traffic load on communications links
A router's load
Traffic patterns in relation to the topology change

The effects of some of these factors can be minimized through careful network design. For instance, a network can be designed to minimize the load on any given router or communications link. Other factors, such as the number of routers in the network, must be accepted as risks inherent in a network's design. Large internetworks, however, can reduce the number of routers that must converge by using static default routes for stub networks.

Although proper network design can significantly reduce convergence time, a routing protocol's capability to update and calculate routes efficiently can also improve convergence.

A routing protocol's capability to update and calculate routes efficiently is based on several factors:

Whether the protocol calculates and stores multiple routes to each destination
The manner in which routing updates are initiated
The metrics used to calculate distances or costs

The following sections discuss these three factors in detail.

Multiple routes to a single destination
Some routing protocols allow the router to install only a single route to a destination network in its routing table. Other routing protocols permit the router to store multiple routes to each destination, at the cost of additional overhead. One advantage of multiple routes is that equal-cost load balancing or unequal-cost load balancing can be used. Another advantage is that maintaining multiple routes to a single destination reduces a network's vulnerability to routing loops and dropped packets when a link fails. If a router maintains two different routes to 10.0.0.0 and one route fails, the router can continue to route to 10.0.0.0 using the second route, without waiting for an alternate route to propagate. Maintaining multiple routes does not reduce convergence time, but it can insulate a router from instabilities during the convergence process.

Routing protocols can instruct a router to update neighbors after a specific amount of time has passed, after a certain event has occurred, or both. Time-driven routing protocols wait for the update timer to expire and then send an update. For example, RIP sends a complete update every 30 seconds by default even if its routing table is unchanged since the last update. By contrast, protocols that are event-driven do not require the router to update neighbors until the router detects a change in the network topology. Link-state protocols (and EIGRP) send a partial update that concerns only the changed information. Other protocols may send their entire table when triggered by an event.

As you might expect, routing protocols that are exclusively time-driven react poorly to topology changes. If a router detects a change but has to wait 30 seconds before alerting neighbors, routing in that network could break down. It could take several minutes before such a network's routers converge. In the meantime, routers unaware of the change may send packets the wrong way, leading to routing loops or loss of connectivity.

On the other hand, routing protocols that are exclusively event-driven theoretically could go months without sending updates. If there is no other mechanism to ensure that routers regularly communicate (such as a Hello protocol), routers could base their routing decisions on dangerously outdated information.

For these reasons, most routing protocols use a combination of time-driven and event-driven updates. RIP is time-driven, but Cisco's implementation of RIP sends triggered updates whenever a change is detected. Likewise, topology changes trigger immediate updates in IGRP routers, regardless of the update timer. Without triggered updates, RIP and IGRP would perform miserably.

Protocols that are primarily event-driven typically use timers as well. For instance, OSPF routers typically assign a MaxAge to routing information. Once information has reached its MaxAge, it can no longer be used in the routing table, and a new update must be requested.

A routing metric is a value that measures the desirability of a route. Some routing protocols use only one factor to calculate a metric. For example, IP RIP uses hop count as the only factor to determine the metric of a route. Other protocols base their metric on two, three, or even five different factors, such as hop count, bandwidth, delay, load, and reliability.

Some factors, such as bandwidth and delay, are static, which means that they remain the same for each interface until the router is reconfigured or the network is redesigned. Other factors, such as load and reliability, are dynamic, which means that they are calculated for each interface in real time by the router.

The more factors that make up a metric, the greater your ability is to tailor network operation to meet specific needs. For example, IGRP by default uses two static factors to calculate metric: bandwidth and delay. You can configure these two factors manually, which allows for precise control over what routes a router chooses. IGRP can also be configured to include two dynamic factors in the metric calculation: load and reliability. By using dynamic factors, IGRP routers can make decisions based on conditions at that moment. Thus, if a link becomes heavily loaded or unreliable, IGRP will increase the metric of routes using that link. Alternate routes may present a lower metric than the downgraded route and would be used instead.
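The way these static and dynamic factors combine can be sketched with the commonly documented IGRP composite metric formula. Treat this as an approximation rather than a definitive implementation: the function name igrp_metric is hypothetical, the integer rounding is a simplifying assumption, and the bandwidth and delay inputs below are illustrative values, not taken from any figure. The K values are the weights that an administrator can adjust; with the defaults (K1 = K3 = 1, K2 = K4 = K5 = 0) only bandwidth and delay contribute.

```python
def igrp_metric(min_bandwidth_kbps, total_delay_usec, load=1, reliability=255,
                k1=1, k2=0, k3=1, k4=0, k5=0):
    """Approximate the IGRP composite metric.

    min_bandwidth_kbps: slowest link along the path, in kbps
    total_delay_usec:   sum of interface delays along the path, in microseconds
    load, reliability:  dynamic factors on a 1-255 scale
    """
    bandwidth = 10_000_000 // min_bandwidth_kbps   # inverse bandwidth term
    delay = total_delay_usec // 10                 # tens of microseconds
    metric = k1 * bandwidth + (k2 * bandwidth) // (256 - load) + k3 * delay
    if k5 != 0:
        # Reliability only affects the result when K5 is non-zero.
        metric = metric * k5 // (reliability + k4)
    return metric


# Default weights: a path whose slowest link is a T1 (1544 kbps) with
# 61,000 microseconds of cumulative delay (assumed inputs).
print(igrp_metric(1544, 61000))

# With K2 enabled, a heavily loaded link yields a higher (worse) metric.
print(igrp_metric(1544, 61000, load=255, k2=1))
print(igrp_metric(1544, 61000, load=1, k2=1))
```

EIGRP uses the same calculation scaled by 256, which is why its metrics are so much larger than those of RIP or IGRP; with the assumed inputs above, the scaled result is 3,219,456.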

One of the most important decisions in a network's design is the selection of a routing protocol or protocols. Such selection should be done carefully and with an appreciation for the long-term implications of your decisions. This chapter has provided an overview of the various ways that routing can be performed, as well as the benefits and limitations of each.

Open Shortest Path First (OSPF) is a link-state routing protocol based on open standards. Described in several RFCs, most recently RFC 2328, the Open in Open Shortest Path First means that OSPF is open to the public and nonproprietary. Among nonproprietary routing protocols, such as RIPv1 and RIPv2, OSPF is preferred because of its remarkable scalability. Recall that both versions of RIP are very limited. RIP cannot scale beyond 15 hops, it converges slowly, and it chooses suboptimal routes that ignore critical factors such as bandwidth. OSPF addresses all of these limitations and proves to be a robust, scalable routing protocol appropriate for today's networks.

OSPF's considerable capability to scale is achieved through hierarchical design. You can divide an OSPF network into multiple areas, which allows for extensive control of routing updates. By defining areas in a properly designed network, an administrator can reduce routing overhead and improve performance. Multiarea OSPF is discussed in Chapter 5, Multiarea OSPF.

OSPF uses link-state technology, as opposed to the distance-vector technology used by protocols such as RIP. Link-state routers maintain a common picture of the network and exchange link information upon initial discovery or network changes. Link-state routers do not broadcast their routing tables periodically like distance-vector routing protocols do. While RIP is appropriate for small networks, OSPF was written to address the needs of large, scalable internetworks. OSPF addresses the following issues:

Speed of convergence - In large networks, RIP convergence can take several minutes, since the entire routing table of each router is copied and shared with directly connected neighboring routers. In addition, a distance vector routing algorithm may experience hold down and/or route-aging periods. With OSPF, convergence is faster because only the routing changes (not the entire routing table) are flooded rapidly to other routers in the OSPF network.
Support for Variable-Length Subnet Masking (VLSM) - RIPv1 is a classful protocol and does not support VLSM. In contrast, OSPF, a classless protocol, supports VLSM. (Note: RIPv2 supports VLSM.)
Network size - In a RIP environment, a network that is more than 15 hops away is considered unreachable. Such limitations restrict the size of a RIP network to small topologies. On the other hand, OSPF has virtually no reachability limitations and is appropriate for intermediate to large size networks.
Use of bandwidth - RIP broadcasts full routing tables to all neighbors every 30 seconds. This is especially problematic over slow WAN links because these updates consume bandwidth. In contrast, OSPF multicasts minimally sized link-state updates and sends the updates only when there is a network change.
Path Selection - RIP selects a path by measuring the hop count, or distance, to other routers. It does not take into consideration the available bandwidth on the link or delays in the network. In contrast, OSPF selects optimal routes using cost as a factor ("cost" is a metric based on bandwidth).
Grouping of members - RIP uses a flat topology and all routers are part of the same network. Thus, communication between routers at each end of the network must travel through the entire network. Unfortunately, changes in even one router will affect every device in the RIP network. OSPF, on the other hand, uses the concept of "areas" and can effectively segment a network into smaller clusters of routers. By narrowing the scope of communication within areas, OSPF limits traffic regionally and can prevent changes in one area from affecting performance in other areas. This use of areas allows a network to scale efficiently.

Although OSPF was written for large networks, implementing it requires proper design and planning, which is especially important if your network has more than 50 routers. At this size, it is important to configure your network to let OSPF reduce traffic and combine routing information whenever possible.

As a link-state protocol, OSPF operates differently from the distance-vector routing protocols. Link-state routers identify and communicate with their neighbors so that they can gather firsthand information from other routers in the network. The OSPF terminology is depicted in the figure; a brief description of each term is given.

The information gathered from OSPF neighbors is not a complete routing table. Instead, OSPF routers tell each other about the status of their connections, or "links," to the internetwork. In other words, OSPF routers advertise their link states. The routers process this information and build a link-state database, which is essentially a picture of who is connected to what. All routers in a given area should have identical link-state databases. Independently, each router then runs the Shortest Path First (SPF) algorithm, also known as the Dijkstra algorithm, on the link-state database to determine the best routes to a destination. The SPF algorithm adds up the cost (which is a value usually based on bandwidth) of each link between the router and the destination. The router then chooses the lowest-cost path to add to its routing table, also known as a forwarding database.
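As a rough illustration of how bandwidth-based costs add up along a path, the sketch below uses Cisco's default convention of dividing a 100-Mbps reference bandwidth by the interface bandwidth. The function names ospf_cost and path_cost are hypothetical, the link speeds are example values, and other vendors or a reconfigured reference bandwidth would give different numbers.

```python
def ospf_cost(bandwidth_bps, reference_bps=100_000_000):
    """Cost of one link: reference bandwidth divided by link bandwidth,
    never less than 1 (Cisco's default convention, assumed here)."""
    return max(1, reference_bps // bandwidth_bps)


def path_cost(link_bandwidths_bps):
    """Total cost of a path is the sum of the costs of its links."""
    return sum(ospf_cost(bw) for bw in link_bandwidths_bps)


T1 = 1_544_000
ETHERNET = 10_000_000
FAST_ETHERNET = 100_000_000

print(ospf_cost(T1))                         # 64
print(ospf_cost(ETHERNET))                   # 10
print(path_cost([T1, ETHERNET]))             # 74
print(path_cost([FAST_ETHERNET, ETHERNET]))  # 11, the preferred path
```

One consequence of this convention is that every link at or above the reference bandwidth receives the same cost of 1, so faster links cannot be told apart unless the reference value is raised.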

OSPF routers keep track of their neighbors in their adjacencies database. To simplify the exchange of routing information among several neighbors on the same network, OSPF routers may elect a Designated Router (DR) and a Backup Designated Router (BDR) to serve as focal points for routing updates.

OSPF routers establish relationships, or states, with their neighbors for efficiently sharing link-state information. In contrast, distance-vector routing protocols, such as RIP, blindly broadcast or multicast their complete routing table out every interface, hoping that a router is out there to receive it. Every 30 seconds, by default, RIP routers send only one kind of message - their complete routing table. OSPF routers, on the other hand, rely on five different kinds of packets to identify their neighbors and to update link-state routing information.

These five packet types make OSPF capable of sophisticated and complex communications. These packet types will be discussed in more detail later in the chapter. At this point, you should become familiar with the different relationships, or states, that are possible between OSPF routers, the different OSPF network types, and the OSPF Hello protocol.

OSPF States
The key to effectively designing and troubleshooting OSPF networks is to understand the relationships, or states, that develop between OSPF routers. OSPF interfaces can be in one of seven states. OSPF neighbor relationships progress through these states, one at a time, in the order presented.

Down State
In the Down state, the OSPF process has not exchanged information with any neighbor. OSPF is waiting to enter the next state, which is the Init state.
Init State
OSPF routers send Type 1 (hello) packets at regular intervals (usually 10 seconds) to establish a relationship with neighbor routers. When an interface receives its first hello packet, the router enters the Init state, which means the router knows a neighbor is out there and is waiting to take the relationship to the next step.

There are two kinds of relationships - two-way state and adjacency - although there are many phases in between. A router must receive a hello from a neighbor before it can establish any relationship.
Two-Way State
Using hello packets, every OSPF router tries to establish a Two-Way state, or bi-directional communication, with every neighbor router on the same IP network. Among other things, hello packets include a list of the sender's known OSPF neighbors. A router enters the Two-Way state when it sees itself in a neighbor's hello. For example, as shown in the figure, when RTB learns that RTA knows about RTB, RTB declares a Two-Way state to exist with RTA.

The Two-Way state is the most basic relationship that OSPF neighbors can have, but routing information is not shared between routers in this relationship. To learn about other routers' link states and eventually build a routing table, every OSPF router must form at least one adjacency. An adjacency is an advanced relationship between OSPF routers that involves a series of progressive states that rely not just on hellos, but also on the other four types of OSPF packets. Routers that attempt to become adjacent to one another exchange routing information even before the adjacency is fully established. The first step toward full adjacency is the ExStart state, which is described next.
ExStart State
Technically, when a router and its neighbor enter the ExStart state, their conversation is characterized as an adjacency, but the routers have not become fully adjacent yet. ExStart is established using Type 2 database description (DBD) packets, also known as DDPs. The two neighbor routers use these DBD packets first to negotiate who is the "master" and who is the "slave" in their relationship, and then to exchange descriptions of their databases.

The router with the highest OSPF router ID "wins" and becomes master. (The OSPF router ID is discussed later in this chapter.) When the neighbors establish their roles as master and slave, they enter the Exchange state and begin sending routing information.
Exchange State
In the Exchange state, neighbor routers use Type 2 DBD packets to send each other their link-state information. In other words, the routers describe their link-state databases to each other. The routers compare what they learn with their existing link-state databases. If either of the routers receives information about a link that is not already in its database, the router requests a complete update from its neighbor. Complete routing information is exchanged in the Loading state.
Loading State
After the databases have been described to each router, they may request information that is more complete by using Type 3 packets, link-state requests (LSRs). When a router receives an LSR, it responds with an update by using a Type 4 link-state update (LSU) packet. These Type 4 LSU packets contain the actual link-state advertisements (LSAs), which are the heart of link-state routing protocols. As shown in the figure, Type 4 LSUs are acknowledged using Type 5 packets, called link-state acknowledgments (LSAcks).
Full Adjacency
With the Loading state complete, the routers are fully adjacent. Each router keeps a list of adjacent neighbors, called the adjacency database. Do not confuse the adjacency database with the link-state database or the forwarding database.
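The seven states presented above form a strict progression, which can be captured in a small sketch. The class and function names here are hypothetical, and the model ignores the events that can send a neighbor back to an earlier state.

```python
from enum import IntEnum

class NeighborState(IntEnum):
    """The seven states, in the order presented in the text."""
    DOWN = 0
    INIT = 1
    TWO_WAY = 2
    EXSTART = 3
    EXCHANGE = 4
    LOADING = 5
    FULL = 6

def next_state(state):
    """Advance one step; Full is the final state."""
    if state is NeighborState.FULL:
        return state
    return NeighborState(state + 1)

def is_fully_adjacent(state):
    return state is NeighborState.FULL

# Walk a neighbor relationship from Down to Full.
progression = [NeighborState.DOWN]
while not is_fully_adjacent(progression[-1]):
    progression.append(next_state(progression[-1]))
```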

Because adjacency is required for OSPF routers to share routing information, a router will try to become adjacent to at least one other router on each IP network to which it is connected. Some routers may try to become adjacent to all their neighbor routers, and others may try with only one or two. OSPF routers determine which routers to become adjacent to based on what type of network connects them.

OSPF interfaces automatically recognize three types of networks: broadcast multiaccess, nonbroadcast multiaccess (NBMA), and point-to-point networks. An administrator can configure a fourth network type, a point-to-multipoint network. The four network types are listed in the figure.

The type of network dictates how OSPF routers should relate to each other. An administrator may have to override the detected network type in order for OSPF to operate properly.

Some networks are defined as multiaccess because you cannot predict just how many routers are connected to them - it may be one, two, or more. A campus that uses a switched Ethernet core may have half a dozen routers connected to the same backbone network. A school district might have 10, 12, or 25 remote-site routers connected via Frame Relay PVCs to the same IP subnet.

Because a significant number of routers can exist on a multiaccess network, OSPF's designers developed a system to avoid the overhead that would be created if every router established full adjacency with every other router. This system restricts who can become adjacent to whom by employing the services of one of the following:

Designated router (DR) - For every multiaccess IP network, one router will be elected the DR. This DR has two main functions: first, to become adjacent to all other routers on the network, and second, to act as a spokesperson for the network. As spokesperson, the DR will send network LSAs for all other IP networks to every other router. Because the DR becomes adjacent to all other routers on the IP network, it is the focal point for collecting routing information (LSAs).
Backup designated router (BDR) - The DR could represent a single point of failure, so a second router is elected as the BDR to provide fault tolerance. Thus, the BDR must also become adjacent to all routers on the network and must serve as a second focal point for LSAs, as shown in the figure. However, unlike the DR, the BDR is not responsible for updating the other routers or sending network LSAs. Instead, the BDR keeps a timer on the DR's update activity to ensure that it is operational. If the BDR does not detect activity from the DR before the timer expires, the BDR takes over the role of DR and a new BDR is elected.
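The overhead that the DR and BDR avoid can be estimated with simple arithmetic. The sketch below compares the number of adjacencies in a full mesh with the number needed when every other router becomes adjacent only to the DR and the BDR; the function names are illustrative.

```python
def full_mesh_adjacencies(routers):
    """Every router adjacent to every other router: n(n-1)/2 pairs."""
    return routers * (routers - 1) // 2

def dr_bdr_adjacencies(routers):
    """Each remaining router pairs with the DR and the BDR, plus the DR-BDR pair."""
    if routers < 2:
        return 0
    return 2 * (routers - 2) + 1

# The 25-router school district example from the text.
mesh = full_mesh_adjacencies(25)
with_dr = dr_bdr_adjacencies(25)
```

For the 25-router Frame Relay example, a full mesh would need 300 adjacencies, whereas the DR/BDR arrangement needs 47.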

On point-to-point networks, only two nodes exist. Therefore, a focal point for routing information is not needed. No DR or BDR is elected. Both routers become fully adjacent to one another.

When a router starts an OSPF routing process on an interface, it sends a hello packet and continues to send hellos at regular intervals. The rules that govern the exchange of OSPF hello packets are collectively referred to as the Hello Protocol.

At Layer 3 of the OSI model, hello packets are addressed to the multicast address 224.0.0.5. This address effectively means "all OSPF routers." OSPF routers use hello packets to initiate new adjacencies and to ensure that adjacent neighbors have not disappeared. Hellos are sent every 10 seconds by default on broadcast multiaccess and point-to-point networks. On interfaces that connect to NBMA networks, such as Frame Relay, hellos are sent every 30 seconds.

Although the hello packet is small (often less than 50 bytes), hellos contain plenty of vital information. Like all OSPF packet types, hello packets include an OSPF packet header, which has the form shown in the figure.

All five types of OSPF packets use the OSPF packet header, which consists of eight fields. The purpose of each of these fields is described below:

Version, Type, and Packet Length - The first three fields of the OSPF packet let the recipients know the version of OSPF that is being used by the sender (version 1 or 2), the OSPF packet type, and the packet length. OSPF version 2 was first introduced in 1991 (RFC 1247) and is not compatible with version 1, which is obsolete. The Cisco IOS uses OSPF version 2 and cannot be configured to use OSPF version 1.
Router ID - The function of the hello packet is to establish and maintain adjacencies, so the sending router signs the fourth field with its router ID, which is a 32-bit number used to identify the router to the OSPF protocol. A router uses one of its IP addresses as its ID because both the router ID and the IP address must be unique within a network. Because routers support multiple IP addresses, a loopback IP address is used as the router ID when one is configured. In the absence of a loopback IP address, the highest interface IP address is used as the router ID, regardless of whether that interface is involved in the OSPF process.

If the interface associated with that IP address goes down, the router can no longer use that IP address as its router ID. When a router's ID changes for any reason, the router must reintroduce itself to its neighbors on all links. To avoid the unnecessary overhead caused by re-establishing adjacency and readvertising link states, an administrator typically assigns an IP address to a loopback interface. Unless an administrator shuts down a loopback interface, it always stays up, so loopback interfaces make ideal router IDs.

Note: If a loopback interface is configured with an IP address, the Cisco IOS will use that IP address as the router ID, even if the other interfaces have higher addresses.

Area ID - You can define multiple areas within an OSPF network to reduce and summarize route information, which allows large and complex networks to continue to grow. When configuring a single-area OSPF network, you should always use Area 0 because it is defined as the "backbone" area. You must have a backbone area to scale (add other OSPF areas).
Checksum - As you may have seen with other protocols, a 2-byte checksum field is used to check the message for errors. Good packets are retained and damaged packets are discarded.
Authentication Type and Authentication Data - OSPF supports different methods of authentication so that OSPF routers will not believe just anyone sending hellos to 224.0.0.5. Routers with unequal authentication fields will not accept OSPF information from each other.
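To make the eight header fields concrete, the following sketch unpacks them with Python's struct module. The text gives the sizes of the router ID (32 bits) and checksum (2 bytes); the remaining field widths are taken from RFC 2328 and should be treated as an assumption of this example, as should the function name and the hand-built sample packet.

```python
import ipaddress
import struct

# Assumed layout (RFC 2328): version 1, type 1, length 2, router ID 4,
# area ID 4, checksum 2, authentication type 2, authentication data 8.
HEADER_FORMAT = "!BBH4s4sHH8s"
HEADER_LENGTH = struct.calcsize(HEADER_FORMAT)

PACKET_TYPES = {1: "hello", 2: "DBD", 3: "LSR", 4: "LSU", 5: "LSAck"}

def parse_ospf_header(data):
    """Unpack the eight header fields from the start of an OSPF packet."""
    (version, packet_type, length, router_id, area_id,
     checksum, auth_type, auth_data) = struct.unpack(
        HEADER_FORMAT, data[:HEADER_LENGTH])
    return {
        "version": version,
        "type": PACKET_TYPES.get(packet_type, "unknown"),
        "length": length,
        "router_id": str(ipaddress.IPv4Address(router_id)),
        "area_id": str(ipaddress.IPv4Address(area_id)),
        "checksum": checksum,
        "auth_type": auth_type,
        "auth_data": auth_data,
    }

# Hand-built sample header; the checksum is left at 0 for illustration only.
sample = struct.pack(
    HEADER_FORMAT, 2, 1, 44,
    ipaddress.IPv4Address("10.6.0.1").packed,
    ipaddress.IPv4Address("0.0.0.0").packed,
    0, 0, bytes(8),
)
header = parse_ospf_header(sample)
```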

The hello header, which is found only in Type 1 hello packets, carries essential information. The following are the fields in the hello header:

Network Mask - This 32-bit field carries subnet mask information for the network.
Hello Interval and Dead Interval - The hello interval is the number of seconds that an OSPF router waits to send the next hello packet. The default for multiaccess broadcast and point-to-point networks is 10 seconds. The dead interval is the number of seconds that a router waits before it declares a neighbor down (if the neighbor's hello packets are no longer being received). The dead interval is four times the hello interval by default, or 40 seconds. Both of these intervals are configurable, which is the reason they are advertised. If two routers have different hello intervals or if they have different dead intervals, they will not accept OSPF information from each other.
Options - The router can use this field to indicate optional configurations, including the stub area flag, which is discussed in Chapter 5.
Router Priority - This field contains a value that indicates the priority of this router when selecting a designated router (DR) and backup designated router (BDR). The default priority is 1 and can be configured to a higher number to ensure that a specified router becomes the DR.
Designated Router and Backup Designated Router - The router IDs of the DR and BDR are listed here, if known by the source of the hello packet.
Neighbor Address - If the source of the hello packet has received a valid hello from any neighbor within the dead interval, that neighbor's router ID is included here.
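The hello fields listed above can be unpacked in the same way as the packet header. In this sketch the field order and widths follow RFC 2328 rather than the text, so treat them as assumptions; the sample values are invented for illustration.

```python
import ipaddress
import struct

# Assumed layout (RFC 2328): mask 4, hello interval 2, options 1,
# priority 1, dead interval 4, DR 4, BDR 4, then 4 bytes per neighbor.
HELLO_FORMAT = "!4sHBBI4s4s"
HELLO_FIXED_LENGTH = struct.calcsize(HELLO_FORMAT)

def dotted(raw):
    return str(ipaddress.IPv4Address(raw))

def parse_hello_body(body):
    """Unpack the hello fields that follow the OSPF packet header."""
    mask, hello, options, priority, dead, dr, bdr = struct.unpack(
        HELLO_FORMAT, body[:HELLO_FIXED_LENGTH])
    neighbors = [dotted(body[i:i + 4])
                 for i in range(HELLO_FIXED_LENGTH, len(body) - 3, 4)]
    return {
        "network_mask": dotted(mask),
        "hello_interval": hello,
        "options": options,
        "priority": priority,
        "dead_interval": dead,
        "dr": dotted(dr),
        "bdr": dotted(bdr),
        "neighbors": neighbors,
    }

# Invented sample: default timers and priority, no DR or BDR known yet,
# and one neighbor already heard from.
sample_body = struct.pack(
    HELLO_FORMAT,
    ipaddress.IPv4Address("255.255.0.0").packed, 10, 0, 1, 40,
    bytes(4), bytes(4),
) + ipaddress.IPv4Address("10.5.0.1").packed
hello = parse_hello_body(sample_body)
```

With no neighbors listed, the fixed hello fields plus the packet header come to 44 bytes under these assumed sizes, which is consistent with the remark that a hello is often less than 50 bytes.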

OSPF routers progress through five distinct steps of operation:

1. Establish router adjacencies.
2. Elect a DR and BDR (if necessary).
3. Discover routes.
4. Select the appropriate routes to use.
5. Maintain routing information.

The following sections describe each of these steps in detail.

A router's first step in OSPF operation is to establish router adjacencies. Each of the three routers shown in the figure attempts to become adjacent to another router on the same IP network.

To become adjacent with another router, RTB sends hello packets, advertising its own router ID. Because no loopback interfaces are present, RTB chooses its highest IP address, 10.6.0.1, as its router ID.

Assuming that RTB is appropriately configured, RTB multicasts hello packets out both S0 and E0. RTA and RTC should both receive the hello packets. These two routers then add RTB to the Neighbor ID field of their respective hello packets and enter the Init state with RTB.

RTB receives hello packets from both of its neighbors and sees its own ID number (10.6.0.1) in the Neighbor ID field. RTB declares a Two-Way state between itself and RTA, and a Two-Way state between itself and RTC.
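The test that moves a neighbor from Init to Two-Way is simply whether the receiving router finds its own ID in the neighbor list of the hello. A minimal sketch, with a hypothetical function name:

```python
def state_after_hello(my_router_id, neighbor_ids_in_hello):
    """State the receiver assigns to the sender of a hello packet."""
    if my_router_id in neighbor_ids_in_hello:
        return "Two-Way"  # the sender has already heard from us
    return "Init"         # we hear the sender, but it has not listed us yet

# RTA's first hello does not list RTB yet; a later one does.
first = state_after_hello("10.6.0.1", [])
later = state_after_hello("10.6.0.1", ["10.6.0.1"])
```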

At this point, RTB determines which routers to establish adjacencies with, based on the type of network that a particular interface resides on. If the network type is point-to-point, the router becomes adjacent with its sole link partner. If the network type is multiaccess, RTB enters the election process to become a DR or BDR, unless both roles are already established (as advertised in the hello packet header).

If an election is necessary, OSPF routers will proceed as described in the next section, Step 2: Elect a DR and a BDR. However, if an election is not necessary, the routers will enter the ExStart state, as described in the section, Step 3: Discover Routes.

Because multiaccess networks can support more than two routers, OSPF elects a DR to be the focal point of all link-state updates and LSAs. The DR's role is critical; therefore, a BDR is elected to "shadow" the DR. In the event that the DR fails, the BDR can smoothly take over.

Like any election, the DR/BDR selection process can be rigged. The "ballots" are hello packets, which contain a router's ID and priority fields. The router with the highest priority value among adjacent neighbors wins the election and becomes the DR. The router with the second-highest priority is elected the BDR. When the DR and BDR have been elected, they keep their roles until one of them fails, even if additional routers with higher priorities show up on the network. Hello packets inform newcomers of the identity of the existing DR and BDR.

OSPF routers all have the same priority value by default: 1. You can assign a priority from 0 to 255 on any given OSPF interface. A priority of 0 prevents the router from winning any election on that interface. A priority of 255 ensures at least a tie. The Router ID field is used to break ties; if two routers have the same priority, the router with the highest ID will be selected. You can manipulate the router ID by configuring an address on a loopback interface, although that is not the preferred way to control the DR/BDR election process. The priority value should be used instead because each interface can have its own unique priority value. You can easily configure a router to win an election on one interface, and lose an election on another.
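The election rules described above (highest priority wins, router ID breaks ties, priority 0 never wins) can be sketched as a simple ranking. This is a simplification under stated assumptions: it models a fresh election among routers that all start together, and it does not model the rule that an existing DR and BDR keep their roles when a higher-priority router appears later. The router IDs and priorities are made up.

```python
import ipaddress

def elect_dr_bdr(candidates):
    """candidates: list of (router_id, priority) pairs seen in hellos.

    Returns (dr, bdr) as router IDs, or None where no router qualifies.
    """
    eligible = [c for c in candidates if c[1] > 0]  # priority 0 cannot win
    ranked = sorted(
        eligible,
        key=lambda c: (c[1], ipaddress.IPv4Address(c[0])),
        reverse=True,
    )
    dr = ranked[0][0] if ranked else None
    bdr = ranked[1][0] if len(ranked) > 1 else None
    return dr, bdr

# Equal priorities: the higher router ID breaks the tie.
tie = elect_dr_bdr([("10.5.0.1", 1), ("10.6.0.1", 1)])
# A raised priority beats a higher router ID; priority 0 is excluded.
rigged = elect_dr_bdr([("10.5.0.1", 5), ("10.6.0.1", 1), ("10.9.0.1", 0)])
```

Router IDs are compared numerically rather than as strings, so that, for example, 10.6.0.1 ranks above 9.0.0.1.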

How does the DR election process affect the example network? As shown in the figure, RTB and RTC are connected via PPP on a point-to-point link. Thus, there is no need for a DR on the network 10.6.0.0/16 because only two routers can exist on this link.

Because 10.4.0.0/16 and 10.5.0.0/16 networks are multiaccess Ethernet networks, they may potentially connect more than two routers. Even if only one router is connected to a multiaccess segment, a DR is still elected because the potential exists for more routers to be added to the network. Thus, a DR must be elected on both 10.4.0.0/16 and 10.5.0.0/16.

Note: DRs and BDRs are elected on a per-network basis. An OSPF area can contain more than one IP network, so each area can (and usually does) have multiple DRs and BDRs.

In the example topology, RTA serves a dual role as both the DR and the BDR. Because it is the only router on the 10.4.0.0/16 network, RTA elects itself as the DR. After all, the 10.4.0.0/16 network is a multiaccess Ethernet network, so a DR is elected because multiple routers could potentially be added to this network. RTA is also the runner-up in the election for 10.5.0.0/16 and thus the BDR for that network. Despite claiming equal priority value with RTA, RTB is elected as DR for 10.5.0.0/16 by virtue of the tie-breaker: a higher router ID (10.6.0.1 vs. 10.5.0.1).

With elections complete and bi-directional communication established, routers are ready to share routing information with adjacent routers and build their link-state databases. This process is discussed in the next section.

On a multiaccess network, the exchange of routing information occurs between the DR or BDR and every other router on the network. As the DR and BDR on the 10.5.0.0/16 network, RTA and RTB will exchange link-state information.

Link partners on a point-to-point or point-to-multipoint network also engage in the exchange process. That means that RTB and RTC will share link-state data.

However, who goes first? This question is answered in the first stage of the Exchange process, the ExStart state. The purpose of ExStart is to establish a master/slave relationship between the two routers. The router with the highest router ID acts as master, as shown in the figure. The master router orchestrates the exchange of link-state information, while the slave router responds to prompts from the master. RTB engages in this process with both RTA and RTC.

After the routers define their roles as master and slave, they enter the Exchange state. As shown in the figure, the master leads the slave through a swap of DBDs that describe each router's link-state database in limited detail. These descriptions include the link-state type, the address of the advertising router, the cost of the link, and a sequence number.

The routers acknowledge the receipt of a DBD by sending an LSAck (Type 5) packet, which echoes back the DBD's sequence number. Each router compares the information that it receives in the DBD with the information that it already has. If the DBD advertises a new or more up-to-date link state, the router will enter the Loading state by sending an LSR (Type 3) packet about that entry. In response to the LSR, a router sends the complete link-state information, using an LSU (Type 4) packet. LSUs carry LSAs.
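The comparison that decides which entries to request can be sketched as follows. The example assumes, for simplicity, that each link-state entry is identified by a key and that a higher sequence number means a more up-to-date entry; the keys and numbers are invented.

```python
def entries_to_request(lsdb, dbd_summary):
    """Return the keys a router would ask for in an LSR.

    Both arguments map a link-state entry key to its sequence number.
    """
    return [
        key for key, sequence in dbd_summary.items()
        if key not in lsdb or sequence > lsdb[key]
    ]

# Hypothetical local database and the summary described by a neighbor's DBDs.
local_lsdb = {"10.4.0.0": 3, "10.5.0.0": 7}
neighbor_summary = {"10.4.0.0": 3, "10.5.0.0": 8, "10.6.0.0": 1}
wanted = entries_to_request(local_lsdb, neighbor_summary)
```

Here the router would request 10.5.0.0 (the neighbor's copy is newer) and 10.6.0.0 (not yet known), but not 10.4.0.0.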

With the Loading state complete, the routers have achieved full adjacency (entered into the Full state). RTB is now adjacent to RTA and to RTC. Adjacent routers must be in the Full state before they can create their routing tables and route traffic. At this point, the neighbor routers should all have identical link-state databases.

After a router has a complete link-state database, it is ready to create its routing table so that it can forward traffic. As mentioned earlier in the chapter, OSPF uses the metric value called cost to determine the best path to a destination (see the figure). The default cost value is based on media bandwidth. In general, cost decreases as the speed of the link increases. RTB's 10-Mbps Ethernet interface, for example, has a lower cost than its T1 serial line because 10 Mbps is faster than 1.544 Mbps.

To calculate the lowest cost to a destination, RTB uses the SPF algorithm. In simple terms, the SPF algorithm adds up the total costs between the local router (called the root) and each destination network. If there are multiple paths to a destination, the lowest-cost path is preferred. By default, OSPF keeps up to four equal-cost route entries in the routing table for load balancing.

Sometimes a link, such as a serial line, will go up and down rapidly (a condition called flapping). If a flapping link causes LSUs to be generated, routers that receive those updates must rerun the SPF algorithm to recalculate routes. Prolonged flapping can severely affect performance. Repeated SPF calculations can overtax the router's CPU; moreover, the constant updates may prevent link-state databases from converging.

To combat this problem, the Cisco IOS uses an SPF hold timer. The SPF hold timer determines how long a router waits after receiving an LSU before it runs the SPF algorithm. The timers spf command enables you to adjust the timer, which defaults to 10 seconds.

After RTB has selected the best routes using the SPF algorithm, it moves into the final phase of OSPF operation.

When an OSPF router has installed routes in its routing table, it must diligently maintain routing information. When there is a change in a link-state, OSPF routers use a flooding process to notify other routers on the network about the change. The Hello protocol's dead interval provides a simple mechanism for declaring a link partner down. If RTB does not hear from RTA for a time period exceeding the dead interval (usually 40 seconds), RTB declares its link to RTA down.

RTB then sends an LSU packet containing the new link-state information, but to whom?

On a point-to-point network, no DR or BDR exists. New link-state information is sent to the 224.0.0.5 multicast address. All OSPF routers listen at this address.
On a multiaccess network, a DR and BDR exist and maintain adjacencies with all other OSPF routers on the network. If a DR or BDR needs to send a link-state update, it will send it to all OSPF routers at 224.0.0.5. However, the other routers on a multiaccess network are adjacent only to the DR and the BDR and thus can send LSUs only to them. For that reason, the DR and BDR have their own multicast address, 224.0.0.6. Non-DR/BDR routers send their LSUs to 224.0.0.6, or "all DR/BDR routers."

When the DR receives and acknowledges the LSU destined for 224.0.0.6, it floods the LSU to all OSPF routers on the network at 224.0.0.5. Each router acknowledges receipt of the LSU with an LSAck.
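The choice of destination address for an LSU can be summarized in a short sketch. It covers only the two cases described above (point-to-point and multiaccess networks with a DR and BDR); the function name and the role labels are illustrative.

```python
ALL_OSPF_ROUTERS = "224.0.0.5"
ALL_DR_BDR_ROUTERS = "224.0.0.6"

def lsu_destination(network_type, role):
    """Pick the multicast address for a link-state update.

    network_type: "point-to-point" or "multiaccess"
    role: "DR", "BDR", or "other" (ignored on point-to-point links)
    """
    if network_type == "point-to-point":
        return ALL_OSPF_ROUTERS      # no DR or BDR exists on the link
    if role in ("DR", "BDR"):
        return ALL_OSPF_ROUTERS      # the DR floods to every router
    return ALL_DR_BDR_ROUTERS        # other routers report to the DR and BDR

serial_link = lsu_destination("point-to-point", "other")
ethernet_non_dr = lsu_destination("multiaccess", "other")
ethernet_dr = lsu_destination("multiaccess", "DR")
```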

If an OSPF router is connected to another network, it floods the LSU to other networks by forwarding the LSU to the DR of the multiaccess network, or to an adjacent router if in a point-to-point network. The DR, in turn, multicasts the LSU to the other OSPF routers in that network.

Upon receiving an LSU that includes new information, an OSPF router updates its link-state database. It then runs the SPF algorithm using the new information to recalculate the routing table. After the SPF hold timer expires, the router switches over to the new routing table.

If a route already exists in a Cisco router, the old route is used while the SPF algorithm is calculating the new information. If the SPF algorithm is calculating a new route, the router will not use that route until after the SPF calculation is complete.

It is important to note that even if a change in link state does not occur, OSPF routing information is periodically refreshed. Each LSA entry has its own age timer. The default timer value is 30 minutes. After an LSA entry ages out, the router that originated the entry sends an LSU to the network to verify that the link is still active.

In this section, you will learn how to configure OSPF on routers within a single area.

To configure OSPF, you must enable OSPF on the router and configure the router's network addresses and area information, according to the following steps:

Enable OSPF on the router using the following command:

router(config)# router ospf process-id

The process ID is a process number on the local router. The process ID is used to identify multiple OSPF processes on the same router. The number can be any value between 1 and 65,535. You do not have to start numbering OSPF processes at 1. Most network administrators keep the same process ID throughout the entire AS. It is possible to run multiple OSPF processes on the same router, but it is not recommended because it creates multiple database instances that add extra overhead to the router.
Identify IP networks on the router, using the following command:

router(config-router)# network address wildcard-mask area area-id

For each network, you must identify the area to which the network belongs. The network value can be the network address, subnet, or the address of the interface. The router knows how to interpret the address by comparing the address to the wildcard mask. A wildcard mask is necessary because OSPF supports CIDR and VLSM, unlike RIPv1 and IGRP. The area argument is needed even when configuring OSPF in a single area. Again note that more than one IP network can belong to the same area.
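The way a wildcard mask selects interfaces can be illustrated with a little bit arithmetic: a 0 bit in the wildcard mask means the corresponding address bit must match, and a 1 bit means it is ignored. The function name and sample addresses below are assumptions for the example.

```python
import ipaddress

def matches_network_statement(interface_ip, address, wildcard_mask):
    """True if interface_ip falls within address/wildcard_mask."""
    ip = int(ipaddress.IPv4Address(interface_ip))
    base = int(ipaddress.IPv4Address(address))
    wildcard = int(ipaddress.IPv4Address(wildcard_mask))
    care_bits = ~wildcard & 0xFFFFFFFF  # bits that must match
    return (ip & care_bits) == (base & care_bits)

# Equivalent to: network 10.5.0.0 0.0.255.255 area 0
on_segment = matches_network_statement("10.5.0.2", "10.5.0.0", "0.0.255.255")
off_segment = matches_network_statement("10.6.0.1", "10.5.0.0", "0.0.255.255")
# A wildcard of 0.0.0.0 matches exactly one interface address.
exact = matches_network_statement("10.6.0.1", "10.6.0.1", "0.0.0.0")
```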

Configuring a Loopback Address
When the OSPF process starts, the Cisco IOS uses the highest local IP address as its OSPF router ID. If a loopback interface is configured, that address is used, regardless of its value. You can assign an IP address to a loopback interface with the following commands:

router(config)#interface loopback number
router(config-if)#ip address ip-address subnet-mask

A loopback-derived router ID ensures stability because that interface is immune to link failure. The loopback interface must be configured before the OSPF process starts, to override the highest interface IP address.

It is recommended that you use the loopback address on all key routers in your OSPF-based network. To avoid routing problems, it is good practice to use a 32-bit subnet mask when configuring a loopback IP address, as shown:

router(config)#interface loopback0
router(config-if)#ip address 192.168.1.1 255.255.255.255

A 32-bit mask is sometimes called a host mask, because it specifies a single host and not a network or subnetwork. Note: To prevent propagation of bogus routes, OSPF always advertises loopback addresses as host routes, with a 32-bit mask.
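The router ID selection rule can be sketched as follows. The text does not say what happens when several loopback interfaces exist; this example assumes the highest loopback address is chosen in that case. The function name and addresses are illustrative.

```python
import ipaddress

def select_router_id(interface_ips, loopback_ips=()):
    """Prefer a loopback address; otherwise use the highest interface address."""
    pool = loopback_ips or interface_ips
    if not pool:
        return None
    # Compare numerically, not as strings, so 10.6.0.1 ranks above 9.0.0.1.
    return str(max(ipaddress.IPv4Address(ip) for ip in pool))

# No loopback configured: the highest interface address wins.
without_loopback = select_router_id(["10.5.0.2", "10.6.0.1"])
# A loopback is used even though the other interfaces have higher addresses.
with_loopback = select_router_id(["10.5.0.2", "10.6.0.1"], ["1.1.1.1"])
```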

Modifying OSPF Router Priority
You can manipulate DR/BDR elections by configuring the priority value to a number other than the default value, which is 1. A value of 0 guarantees that the router will not be elected as a DR or BDR. Each OSPF interface can announce a different priority. You can configure the priority value (a number from 0 to 255) with the ip ospf priority command, which has the following syntax:

router(config-if)#ip ospf priority number

To set a router's E0 with a priority of 0 (so that it cannot win DR/BDR elections on that network), you would use the commands shown here:

RTB(config)#interface e0
RTB(config-if)#ip ospf priority 0

For the priority value to figure into the election, it must be set before the election takes place. An interface's priority value and other key information can be displayed with the show ip ospf interface command as shown in the figure. The output in this example tells which routers have been elected the DR and BDR, the network type (in this case, broadcast multiaccess), the cost of the link (10), and the timer intervals specific to this interface. The timer intervals configured are Hello (10), Dead (40), Wait (40), Retransmit (5).

OSPF routers use costs associated with interfaces to determine the best route. The Cisco IOS automatically determines cost based on the bandwidth of an interface using the formula:

Cost = 10⁸ / bandwidth value = 100,000,000 / bandwidth value (in bits per second)

Figure shows common default path costs for a variety of media. For OSPF to calculate routes properly, all interfaces connected to the same link must agree on the cost of that link. In a multivendor routing environment, you may override the default cost of an interface to match another vendor's value with the ip ospf cost command, which has the following syntax:

router(config-if)#ip ospf cost number

The new cost can be a number between 1 and 65,535. You can use this command to override the default cost on a router's S0 using these commands:

router(config)#interface s0
router(config-if)#ip ospf cost 1000

The ip ospf cost command can also be used to manipulate the desirability of a route because routers install the lowest-cost paths in their tables.

For the Cisco IOS cost formula to be accurate, serial interfaces must be configured with appropriate bandwidth values. Cisco routers default to T1 (1.544 Mbps) on most serial interfaces and require manual configuration for any other bandwidth, as shown in this example:

router(config)#interface s1
router(config-if)#bandwidth 56
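The cost formula can be checked with a short calculation. This sketch assumes that the result is truncated to a whole number and never drops below 1; the function names are hypothetical. Note that the bandwidth command takes a value in kilobits per second, so bandwidth 56 corresponds to 56,000 bps.

```python
REFERENCE_BANDWIDTH_BPS = 100_000_000  # the 10^8 in the formula

def ospf_cost(bandwidth_bps):
    """Default cost for a link: 10^8 / bandwidth, truncated, minimum 1."""
    return max(1, REFERENCE_BANDWIDTH_BPS // bandwidth_bps)

def ospf_cost_from_kbps(bandwidth_kbps):
    """Same calculation for a value as entered with the bandwidth command."""
    return ospf_cost(bandwidth_kbps * 1000)

ethernet = ospf_cost(10_000_000)    # 10-Mbps Ethernet
t1 = ospf_cost(1_544_000)           # T1 serial line
slow_serial = ospf_cost_from_kbps(56)
```

Under these assumptions, 10-Mbps Ethernet has a cost of 10 (matching the show ip ospf interface output described earlier), a T1 has a cost of 64, and a 56-kbps line has a cost of 1785.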

Configuring Authentication
Authentication is another interface-specific configuration. Each OSPF interface on a router can present a different authentication key, which functions as a password among OSPF routers in the same area. The following command syntax is used to configure OSPF authentication:

router(config-if)#ip ospf authentication-key password

After a password is configured, you can enable authentication on an area-wide basis with the following syntax, which must be entered on all participating routers:

router(config-router)#area number authentication [message-digest]

Although the message-digest keyword is optional, it is recommended that you always use it with this command. By default, authentication passwords will be sent in clear text over the wire. A packet sniffer could easily capture an OSPF packet and decode the unencrypted password. However, if the message-digest keyword is used, a message digest, or hash, of the password is sent over the wire in place of the password itself. Unless the recipient is configured with the proper authentication key, it will not be able to make sense of the message digest.

If you choose to use message-digest authentication, the authentication key will not be used. Instead, you must configure a message-digest key on the OSPF router's interface. The syntax for this command is as follows:

router(config-if)#ip ospf message-digest-key key-id md5 [encryption-type] password

Figure describes the ip ospf message-digest-key command parameters.

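The following example sets the message-digest key to "itsasecret" and enables message-digest authentication within Area 0. The optional encryption-type argument is omitted because a value of 7 tells the Cisco IOS that the key that follows is already encrypted, which is not the case for a plain-text key typed at the prompt.

router(config)#int s0
router(config-if)#ip ospf message-digest-key 1 md5 itsasecret
router(config-if)#int e0
router(config-if)#ip ospf message-digest-key 1 md5 itsasecret
router(config-if)#router ospf 1
router(config-router)#area 0 authentication message-digest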

Remember, you would have to configure the same parameters on the other routers in the same area.
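The idea behind message-digest authentication can be illustrated with Python's hashlib. This is a simplified sketch, not the exact OSPF procedure: it assumes the digest is an MD5 hash of the packet contents followed by the key padded to 16 bytes, and it leaves out the key ID and sequence number that real packets also carry. The function names and the sample packet bytes are invented.

```python
import hashlib
import hmac

def message_digest(packet, key):
    """MD5 over the packet followed by the key, padded or cut to 16 bytes."""
    padded_key = key.ljust(16, b"\x00")[:16]
    return hashlib.md5(packet + padded_key).digest()

def digest_is_valid(packet, received_digest, key):
    """The receiver recomputes the digest with its own key and compares."""
    return hmac.compare_digest(message_digest(packet, key), received_digest)

packet = b"example OSPF packet contents"
sent_digest = message_digest(packet, b"itsasecret")

same_key = digest_is_valid(packet, sent_digest, b"itsasecret")
wrong_key = digest_is_valid(packet, sent_digest, b"notthesecret")
```

Only the digest crosses the wire, so a receiver without the matching key cannot reproduce it and rejects the packet.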

Configuring OSPF Timers
In order for OSPF routers to exchange information, they must have the same hello intervals and the same dead intervals. By default, the dead interval is four times the value of the hello interval. That way, a router has four chances to send a hello packet before being declared dead.

On broadcast OSPF networks, the default hello interval is 10 seconds, and the default dead interval is 40 seconds. On nonbroadcast networks, the default hello interval is 30 seconds, and the default dead interval is 2 minutes (120 seconds).

These default values typically result in efficient OSPF operation and therefore do not need to be modified. However, you may come across a situation in which the hello and dead intervals need to be adjusted, either to improve performance or to match another router's timers. The syntax of the commands needed to configure both the hello and dead intervals is as follows:

router(config-if)#ip ospf hello-interval seconds
router(config-if)#ip ospf dead-interval seconds

The following example sets the hello interval to 5 seconds, and the dead interval to 20 seconds.

router(config)#interface e0
router(config-if)#ip ospf hello-interval 5
router(config-if)#ip ospf dead-interval 20

Note that, although it is advised, the Cisco IOS does not require you to configure the dead interval to be four times the hello interval. If you set the dead interval to be less than that, you increase the risk that a router could be declared dead, when in fact a congested or flapping link has prevented one or two hello packets from reaching their destination.
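The timer rules above can be summarized in a short Python sketch. It only models the two checks described in the text: whether two routers' timers match, and how many hellos fit into one dead interval. The class and function names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OspfTimers:
    hello: int  # seconds between hello packets
    dead: int   # seconds of silence before a neighbor is declared dead


# Default values quoted in the text.
BROADCAST_DEFAULT = OspfTimers(hello=10, dead=40)
NONBROADCAST_DEFAULT = OspfTimers(hello=30, dead=120)


def timers_match(a: OspfTimers, b: OspfTimers) -> bool:
    """Both intervals must be identical for two routers to exchange information."""
    return a == b


def hellos_per_dead_interval(t: OspfTimers) -> int:
    """How many hello packets can be sent within one dead interval."""
    return t.dead // t.hello


tuned = OspfTimers(hello=5, dead=20)      # the example configuration above
print(timers_match(tuned, BROADCAST_DEFAULT))
print(hellos_per_dead_interval(tuned))
```

A router tuned to 5 and 20 seconds keeps the recommended four-to-one ratio, but it no longer matches a neighbor that is still running the broadcast defaults, so the same values must be configured on both ends of the link.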

This chapter has focused on two types of OSPF networks in detail: broadcast multiaccess and point-to-point networks. Even if there is only one router, broadcast multiaccess networks elect a DR and a BDR to serve as focal points for routing information. In contrast, point-to-point OSPF networks do not elect a DR because they can never include more than two nodes.

Another type of OSPF network, nonbroadcast multiaccess (NBMA), can include more than two nodes and therefore will try to elect a DR and a BDR. Common NBMA implementations include Frame Relay, X.25, and SMDS. NBMA networks follow rules at Layer 2 that prevent the delivery of broadcasts and multicasts. The figure summarizes the OSPF network types.

NBMA networks can create problems with OSPF operation, specifically with the exchange of multicast hello packets. In the example shown in the figure, RTA, RTB, and RTC belong to the same IP subnetwork and will attempt to elect a DR and a BDR. However, these routers cannot hold a valid election if they cannot receive multicast hellos from every other router on the network. Without administrative intervention, a strange election takes place. As far as RTA is concerned, RTC is not participating. Likewise, RTC goes through the election process oblivious to RTA. This botched election can lead to problems if the central router, RTB, is not elected the DR.

The Cisco IOS offers several options for configuring OSPF to overcome NBMA limitations, including the OSPF neighbor command, point-to-point subinterfaces, and point-to-multipoint configuration. The solutions that are available to you depend on your NBMA network topology.

Before selecting an OSPF configuration strategy for a Frame Relay network (or legacy X.25 network), you must first understand the different NBMA topologies. Fundamentally, two possible physical topologies exist for Frame Relay networks:

Full-mesh topology
Partial-mesh topology (including the hub-and-spoke topology)

The following sections describe how to configure OSPF in both full-mesh and partial-mesh Frame Relay networks.

Full-Mesh Frame Relay
Organizations deploy Frame Relay primarily because it supports more than one logical connection over a single interface, making it an affordable and flexible choice for WAN links. A full-mesh topology takes advantage of Frame Relay's capability to support multiple permanent virtual circuits (PVCs) on a single serial interface. In a full-mesh topology, every router has a PVC to every other router.

For OSPF to work properly over a multiaccess full-mesh topology that does not support broadcasts, you must manually enter each OSPF neighbor's address on each router, one at a time. The OSPF neighbor command tells a router about its neighbors' IP addresses so that it can exchange routing information without multicasts. The following example illustrates how the neighbor command is used:

RTA(config)#router ospf 1
RTA(config-router)#network 3.1.1.0 0.0.0.255 area 0
RTA(config-router)#neighbor 3.1.1.2
RTA(config-router)#neighbor 3.1.1.3

Specifying each router's neighbors is not the only option to make OSPF work in this type of environment. The following section explains how configuring subinterfaces can eliminate the need for the neighbor command.

Configuring Subinterfaces to Create Point-to-Point Networks
The IOS subinterface feature can be used to break up a multiaccess network into a collection of point-to-point networks.

In the figure, a different IP subnet is assigned to each PVC. OSPF automatically recognizes this configuration as point-to-point, not NBMA, even with Frame Relay configured on the interfaces. Recall that OSPF point-to-point networks do not elect a DR. Instead, the Frame Relay router uses Inverse ARP or a Frame Relay map to obtain the link partner's address so that routing information can be exchanged.

A full-mesh topology offers numerous advantages, including maximum fault tolerance. Unfortunately, full-mesh topologies can get expensive because each PVC must be leased from a provider. An organization would have to lease 45 PVCs to support just 10 fully meshed routers! If subinterfaces are used to create point-to-point networks, then the 45 IP subnets must also be allocated and managed, which is an additional expense.

Because a full-mesh topology is costly, many organizations implement a partial-mesh topology instead. A partial-mesh topology is any configuration in which at least one router maintains multiple connections to other routers, without being fully meshed. The most cost-effective partial-mesh topology is a hub-and-spoke topology, in which a single router (the hub) connects to multiple spoke routers.
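The cost difference between the two topologies can be made concrete with a small calculation. The following Python sketch counts the PVCs needed for a full mesh, where every pair of routers needs its own circuit, and for a hub-and-spoke design, where each spoke needs one circuit to the hub. The function names are hypothetical.

```python
def full_mesh_pvcs(routers: int) -> int:
    """Every pair of routers needs its own PVC: n * (n - 1) / 2."""
    if routers < 0:
        raise ValueError("router count cannot be negative")
    return routers * (routers - 1) // 2


def hub_and_spoke_pvcs(routers: int) -> int:
    """Each spoke needs exactly one PVC to the hub: n - 1."""
    if routers < 0:
        raise ValueError("router count cannot be negative")
    return max(routers - 1, 0)


for n in (3, 10, 20):
    print(n, full_mesh_pvcs(n), hub_and_spoke_pvcs(n))
```

For 10 routers, the full mesh requires the 45 PVCs mentioned above, while a hub-and-spoke design needs only 9. The gap widens quickly because the full-mesh count grows with the square of the number of routers.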

The hub-and-spoke topology is a cost-effective WAN solution that introduces a single point of failure (the hub router). Organizations typically deploy Frame Relay because it is inexpensive, not because it is fault-tolerant. Since dedicated leased lines (not Frame Relay links) typically carry mission-critical data, an economical Frame Relay topology, such as hub-and-spoke, makes sense.

Unfortunately, the neighbor command that worked with a full-mesh topology does not work as well with the hub-and-spoke topology. The hub router in the figure sees all the spoke routers and can send routing information to them using the neighbor command, but the spoke routers can send hellos only to the hub.

The DR/BDR election will be held, but only the hub router sees all of the candidates. Because the hub router must act as the DR for this OSPF network to function properly, you could configure an OSPF interface priority of 0 on all the spoke routers. Recall that a priority of 0 makes it impossible for a router to be elected as a DR or a BDR for a network.
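The effect of a priority of 0 can be illustrated with a deliberately simplified model of the election. The Python sketch below picks only a DR, from the routers that a single observer can see, using the highest priority and then the highest router ID as a tiebreaker. It ignores the BDR and the fact that a running DR is not preempted. The router IDs and priorities are hypothetical values chosen for the example.

```python
from ipaddress import IPv4Address
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    router_id: str
    priority: int


def elect_dr(candidates: List[Candidate]) -> Optional[Candidate]:
    """Highest priority wins; ties go to the highest router ID.
    Routers with priority 0 are never eligible."""
    eligible = [c for c in candidates if c.priority > 0]
    if not eligible:
        return None
    return max(eligible, key=lambda c: (c.priority, IPv4Address(c.router_id)))


# All routers left at the same priority: the highest router ID wins.
default = [
    Candidate("RTA", "3.1.1.1", 1),
    Candidate("RTB", "3.1.1.2", 1),
    Candidate("RTC", "3.1.1.3", 1),
]
print(elect_dr(default).name)

# Spokes set to priority 0: only the hub, RTB, can become the DR.
tuned = [
    Candidate("RTA", "3.1.1.1", 0),
    Candidate("RTB", "3.1.1.2", 1),
    Candidate("RTC", "3.1.1.3", 0),
]
print(elect_dr(tuned).name)
```

With equal priorities, a spoke with a higher router ID can win the election. Setting the spokes to priority 0 removes them from consideration, so the hub is the only possible DR.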

A second approach to dealing with this topology is to avoid the DR/BDR issue altogether by breaking the network into point-to-point connections. Point-to-point networks will not elect a DR or a BDR.

Although they make OSPF configuration straightforward, point-to-point networks have major drawbacks when used with a hub-and-spoke topology. Subnets must be allocated for each link, which in turn can lead to WAN addressing that is complex and difficult to manage. The WAN addressing issue can be avoided by using IP unnumbered, but many organizations have WAN-management policies that prevent using this feature. Are there any viable alternatives to a point-to-point configuration? Fortunately, the Cisco IOS offers a relatively new alternative. A hub-and-spoke physical topology can be manually configured as a point-to-multipoint network type, as described in the following section.

In a point-to-multipoint network, a hub router is directly connected to multiple spoke routers, but all the WAN interfaces are addressed on the same subnet.

You saw this logical topology earlier in the chapter, where you learned that OSPF does not work properly over it when the default NBMA network type is used. By manually changing the OSPF network type to point-to-multipoint, you can make this logical topology work. Routing between RTA and RTC will go through the router that has virtual circuits to both routers, RTB. Note that it is not necessary to configure neighbors when using this feature (Inverse ARP will discover them).

Point-to-multipoint networks have the following properties:

Adjacencies are established between all neighboring routers. There is no DR or BDR for a point-to-multipoint network. No network LSA is originated for point-to-multipoint networks. Router priority is not configured for point-to-multipoint interfaces or for neighbors on point-to-multipoint networks.
When originating a router LSA, the point-to-multipoint interface is reported as a collection of point-to-point links to all the interface's adjacent neighbors, together with a single stub link advertising the interface's IP address with a cost of 0.
When flooding out a nonbroadcast interface, the LSU or LSAck packet must be replicated to be sent to each of the interface's neighbors.

To configure point-to-multipoint, you must manually override the detected OSPF network type with the following syntax:

router(config-if)#ip ospf network point-to-multipoint

You should also configure the interface with a frame-relay map ip command, as in the following syntax:

router(config-if)#frame-relay map ip address dlci broadcast

The broadcast keyword permits the router to send broadcasts via the specified DLCI to the mapped neighbor or neighbors. If you apply the point-to-multipoint configuration to the example network, you would have to configure two separate frame-relay map statements on the hub router, RTB. Partial configurations for each router are shown in the figure.

In a point-to-multipoint configuration, OSPF treats all router-to-router connections on the nonbroadcast network as if they were point-to-point links. No DR is elected for the network. Neighbors can be manually specified using the neighbor command or can be dynamically discovered using Inverse ARP.

Ultimately, point-to-multipoint OSPF offers efficient operation without administrative complexity.

You can use the commands in the figure to verify that OSPF is working properly. You should become familiar with these commands to ensure your routers are configured correctly and are performing the way they should.

The following commands and their associated options can be used when troubleshooting OSPF:

To clear all routes from the IP routing table, use the following command:

router#clear ip route *

To clear a specific route from the IP routing table, use the following command:

router#clear ip route A.B.C.D
A.B.C.D Destination network route to delete

To debug OSPF operations use the following debug options:

router#debug ip ospf ?

adj             OSPF adjacency events
events          OSPF events
flood           OSPF flooding
lsa-generation  OSPF lsa generation
packet          OSPF packets
retransmission  OSPF retransmission events
spf             OSPF spf
tree            OSPF database tree

OSPF is a scalable, standards-based link-state routing protocol. OSPF's benefits include no hop-count limitation, the capability to multicast routing updates, faster convergence rates, and optimal path selection. The basic steps for OSPF operation are as follows:

Establish router adjacencies
Select a designated router and a backup designated router
Discover routes
Select appropriate routes to use
Maintain routing information

In the next chapter, you will learn how to connect multiple OSPF areas in order to support a larger hierarchical routing environment.

As explained in the previous chapter, OSPF relies on complex communications and relationships to maintain a comprehensive link-state database. However, as an OSPF network scales to 100, 500, or even 1000 routers, link-state databases can balloon to include thousands of links. To help OSPF routers route more efficiently and to preserve their CPU and memory resources for the business of switching packets, network engineers divide OSPF networks into multiple areas.

Because it has the capability to break a large network into small manageable units, OSPF enjoys tremendous scalability, but that scalability comes at a price. Multiarea OSPF networks can be difficult to design, and typically demand more administrative attention than any other popular interior gateway protocol.

This chapter describes how to create and configure OSPF areas. Specifically, this chapter examines the different OSPF area types, which include stubby, totally stubby, and not-so-stubby areas (NSSAs). Each of these areas uses special advertisements to exchange routing information with the rest of the OSPF network, so you will study link-state advertisements (LSAs) in detail. You will also look at the Area 0 backbone rule and how virtual links can work around backbone connectivity problems. Finally, this chapter surveys important show commands that can be used to verify multiarea OSPF operation.

Three issues can overwhelm an OSPF router in a heavily populated OSPF network: high demand for router processing and memory resources, large routing tables, and large topology tables. In a very large internetwork, changes are inevitable. OSPF routers are likely to run SPF calculations frequently, which saps the router of precious CPU cycles and memory resources.

Not only is the routing table frequently recalculated in a large OSPF network, but it also risks being overstuffed with multiple paths and hundreds of routes. Bloated routing tables make routers less efficient. Finally, the link-state database, which must contain a complete topology of the network, will also threaten to consume resources and bog down the router.

Fortunately, OSPF allows large areas to be separated into smaller, more manageable areas that can exchange summaries of routing information rather than exchange every detail. By splitting the network into manageable pieces, OSPF routers can scale gracefully.

Just how many routers can an OSPF area support? Field studies have shown that a single OSPF area should not stretch beyond 50 routers, although there is no concrete limit. Some areas may do fine with more than 50 routers. Other areas, particularly those with unstable links, may need to operate with fewer than 50 routers. Ultimately, you must determine just how many routers a particular OSPF area can handle. Knowing your network, by tracking performance and monitoring usage, is the only way to accurately gauge whether an OSPF area can support 20, 30, or 60 routers.

OSPF's capability to separate a large internetwork into multiple areas is referred to as hierarchical routing. Hierarchical routing enables you to separate large internetworks into smaller internetworks that are called areas. With this technique, interarea routing still occurs. Interarea routing is the process of exchanging routing information between OSPF areas. However, interarea routing allows OSPF to summarize and contain area-specific information so that many of the smaller internal routing operations, such as recalculating the database, are restricted within an area.

For example, if Area 1, shown in the figure, is having problems with a link going up and down (flapping), routers in other areas do not need to run their Shortest Path First (SPF) calculation because they are isolated from the problems in Area 1.

The hierarchical topology possibilities of OSPF have several important advantages:

Reduced frequency of SPF calculations - Because detailed route information is kept within each area, it is not necessary to flood all link-state changes to all other areas. Thus, only those routers affected by a change need to run the SPF calculation.
Smaller routing tables - When using multiple areas, detailed route entries for specific networks within an area are kept inside the area. Rather than advertise these explicit routes outside the area, you can have the routes summarized into one or more summary routes. Advertising these summaries reduces the number of LSAs propagated between areas but allows all networks to remain reachable.
Reduced link-state update (LSU) overhead - LSUs can contain a variety of LSA types, including link-state information and summary information. Rather than send an LSU about each network to every area, you can advertise a single route or a few summarized routes between areas to reduce the overhead associated with LSUs that cross multiple areas.

Hierarchical routing increases routing efficiency because it allows you to control the type of routing information that flows into and out of an area. OSPF provides for different types of routing updates, depending on the type of area and the number of areas that a router connects to. The following sections describe the different roles that an OSPF router can play, the types of LSAs that it can use, and the types of areas that it can connect to.

Four different types of OSPF routers exist, as shown in the figure:

Internal router - As discussed previously, routers that have all their interfaces within the same area are called internal routers. Internal routers in the same area have identical link-state databases and run a single copy of the routing algorithm.
Backbone router - Routers that are attached to the backbone area of the OSPF network are called backbone routers. They have at least one interface connected to Area 0 (the backbone area). These routers maintain OSPF routing information using the same procedures and algorithms as internal routers.
Area Border Router (ABR) - ABRs are routers with interfaces attached to multiple areas. They maintain separate link-state databases for each area to which they are connected, and they route traffic destined to or arriving from other areas. ABRs are exit points for the area, which means that routing information destined for another area can travel there only via the local area's ABR. ABRs summarize information about the attached areas from their link-state databases and distribute the information into the backbone. The backbone ABRs then forward the information to all other connected areas. An area can have one or more ABRs.
Autonomous System Boundary Router (ASBR) - ASBRs are routers that have at least one interface connected to an external internetwork (another autonomous system), such as a non-OSPF network. These routers can import non-OSPF network information to the OSPF network, and vice versa (this is referred to as redistribution).

A router can be more than one router type. For example, if a router interconnects to Area 0 and Area 1, as well as to a non-OSPF network, it would be both an ABR and an ASBR.
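The router types defined above can be expressed as a simple classification rule. The following Python sketch derives a router's roles from the set of areas its interfaces belong to and from whether it connects to an external network. The function name and its inputs are hypothetical, and the rule follows only the definitions given in this section.

```python
from typing import Set


def router_roles(areas: Set[int], has_external_link: bool) -> Set[str]:
    """Classify a router from its attached areas and external connectivity."""
    roles: Set[str] = set()
    if len(areas) == 1:
        roles.add("internal")   # all interfaces within the same area
    if 0 in areas:
        roles.add("backbone")   # at least one interface in Area 0
    if len(areas) > 1:
        roles.add("ABR")        # interfaces attached to multiple areas
    if has_external_link:
        roles.add("ASBR")       # connected to a non-OSPF network
    return roles


print(sorted(router_roles({1}, False)))
print(sorted(router_roles({0, 1}, True)))
```

The second call corresponds to the example in the text: a router attached to Area 0, Area 1, and a non-OSPF network is both an ABR and an ASBR, and it is also a backbone router because one of its interfaces is in Area 0.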

Multiarea OSPF is scalable because a router's link-state database can include multiple types of LSAs. DRs (Designated Routers) and routers that reside in multiple areas or autonomous systems use special LSAs to send or summarize routing information. The OSPF LSA types are described in Figure .

OSPF Area Types
The characteristics that you assign to an area control the type of route information that it can receive. For example, you may want to minimize the size of routing tables in an OSPF area, in which case you can configure the routers to operate in an area that does not accept external routing information (Type 5 LSAs).
Several area types are possible, some of which are shown in the figure:

Standard area - A standard area can accept link updates and route summaries.
Backbone area (transit area) - When interconnecting multiple areas, the backbone area is the central entity to which all other areas connect. The backbone area is always Area 0. All other areas must connect to this area to exchange route information. The OSPF backbone has all the properties of a standard OSPF area.
Stub area - A stub area is an area that does not accept information about routes external to the autonomous system (that is, the OSPF internetwork), such as routes from non-OSPF sources. If routers need to reach networks outside the autonomous system, they use a default route. A default route is noted as 0.0.0.0/0.
Totally stubby area - A totally stubby area is an area that accepts neither external autonomous system (AS) routes nor summary routes from other areas internal to the autonomous system. Instead, if a router needs to send a packet to a network external to the area, it sends it using a 0.0.0.0/0 default route. Totally stubby areas are a Cisco proprietary feature.
Not-so-stubby area (NSSA) - An NSSA is an area that is similar to a stub area but allows for importing external routes as Type 7 LSAs and translation of specific Type 7 LSA routes into Type 5 LSAs.

A key difference among these OSPF area types is the way they handle external routes. External routes are injected into OSPF by an ASBR. The ASBR may learn these routes from RIP or some other routing protocol.
You can configure an ASBR to send out two types of external routes into OSPF: Type 1 (denoted in the routing table as E1) and Type 2 (E2). Depending on the type, OSPF calculates the cost of external routes differently, as follows:

E1 - For an E1 route, the metric is calculated by adding the external cost to the internal cost of each link that a packet crosses to reach the ASBR. Use this route type when you have multiple ASBRs advertising a route to the same autonomous system.
E2 - For an E2 route, the metric is always the external cost alone, no matter how many internal links a packet crosses to reach the ASBR (this is the default setting on ASBRs). Use this route type if only one router is advertising a route to the autonomous system. When both types exist for the same destination, the Type 1 route is always preferred over the Type 2 route.
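The difference between the two metric types comes down to one addition. The Python sketch below computes the metric a router would install for an external route under each type. The function name is hypothetical, and the cost values (an external cost of 20 and an internal cost of 64 to reach the ASBR) are assumptions chosen for illustration.

```python
def external_metric(metric_type: int, external_cost: int, internal_cost: int) -> int:
    """Return the route metric for a Type 1 (E1) or Type 2 (E2) external route."""
    if metric_type == 1:
        # E1: external cost plus the internal cost to reach the ASBR
        return external_cost + internal_cost
    if metric_type == 2:
        # E2: external cost only, regardless of the internal path
        return external_cost
    raise ValueError("metric_type must be 1 or 2")


print(external_metric(2, external_cost=20, internal_cost=64))
print(external_metric(1, external_cost=20, internal_cost=64))
```

With a Type 2 metric, every router in the OSPF domain sees the same value for the external route. With a Type 1 metric, the value grows with the distance from the ASBR, which lets routers choose the closest of several ASBRs.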

For example, consider the network shown in Figure .

In this network, RTB will receive external RIP routes, including 9.0.0.0/8, from RTA. By default, RTA is sending external routing information using Type 2 metrics. Thus, when RTB sends this route to RTC, the metric for the external route remains the same (in this case, 20). Compare RTB's table with RTC's table in the figure.

Now, if you configure RTA to use a Type 1 metric with external routes, OSPF will increment the metric value of the external route according to its standard cost algorithm. You can see that, in the show ip route output in the figure, the exact same routes now have very different metrics in each table. RTB now increments the external route's metric.

Internal routers, ABRs, ASBRs, and backbone routers each play a role in communicating OSPF routing information in a multiarea network. This section summarizes how the different types of OSPF routers flood information and how they build their routing tables when operating within a multiarea environment.

In Chapter 4, you saw that a packet destined for a network within an area is merely forwarded from one internal router to another until it reaches the destination network.

However, what if a packet must traverse multiple areas as shown in the figure?

In this case, the packet must exit Area 1 via ABR1. ABR1 then sends the packet through the backbone area to ABR2. Finally, ABR2 can forward the packet to an internal router in Area 50. The internal router then delivers the message to the appropriate host on that network.

For the OSPF routers in this example to make these routing decisions, they must build sufficient routing tables by exchanging LSUs. The LSU exchange process within a single OSPF area relies on just two LSA types: Type 1 and Type 2. To distribute routing information to multiple areas efficiently, Type 3 and Type 4 LSAs must be used by ABRs. The following sections describe how LSUs containing the various LSA types are flooded to multiple areas, and how OSPF routers use this information to update their routing tables.

An ABR is responsible for generating routing information about each area to which it is connected and flooding the information through the backbone area to the other areas to which the backbone is connected. The general process for flooding follows these steps:

The routing processes occur within the area, as discussed in Chapter 4. The entire area must be synchronized before the ABR can begin sending summary LSAs to other areas.
The ABR reviews the resulting link-state database and generates summary LSAs (Type 3 or Type 4). By default, the ABR sends summary LSAs for each network that it knows about. To reduce the number of summary LSA entries, you can configure route summarization so that a single IP address can represent multiple networks. To use route summarization, your areas need to use contiguous IP addressing, as discussed in Chapter 2. The better your IP address plan is, the fewer the number of summary LSA entries an ABR will advertise.
The summary LSAs are placed in an LSU and distributed through all ABR interfaces, with the following exceptions:

If the interface is connected to a neighboring router that is in a state below the exchange state, then the summary LSA is not forwarded.
If the interface is connected to a totally stubby area, then the summary LSA is not forwarded.
If the summary LSA includes a Type 5 (external) route and the interface is connected to a stub or totally stubby area, then the LSA is not sent to that area.

After an ABR or ASBR receives summary LSAs, it adds them to its link-state databases and floods them to the local area. The internal routers then assimilate the information into their databases. Remember that OSPF enables you to configure different area types so that you can reduce the number of route entries that internal routers maintain. To minimize routing information, you can define the area as a stub area, a totally stubby area, or an NSSA.

After all routers receive the routing updates, they add them to their link-state databases and recalculate their routing tables. The order in which paths are calculated is as follows:

All routers first calculate the paths to destinations within their area and add these entries into the routing table. These are learned via Type 1 and Type 2 LSAs.
All routers then calculate the paths to the other areas within the internetwork. These paths are learned via interarea route entries, or Type 3 and Type 4 LSAs. If a router has an interarea route to a destination and an intra-area route to the same destination, the intra-area route is kept.
All routers, except those that are in any of the stub area types, then calculate the paths to the AS external (Type 5) destinations.

At this point, a router can reach any network within or outside the OSPF autonomous system.
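The calculation order above also implies a preference order when several paths lead to the same destination. The following Python sketch models that preference: intra-area paths beat interarea paths, which beat external paths, and cost is compared only between paths of the same type. The class names, the sample prefix, and the next-hop labels are hypothetical, and the sketch does not model the SPF calculation itself.

```python
from enum import IntEnum
from typing import List, NamedTuple


class PathType(IntEnum):
    INTRA_AREA = 1   # learned via Type 1 and Type 2 LSAs
    INTER_AREA = 2   # learned via Type 3 and Type 4 LSAs
    EXTERNAL_E1 = 3  # learned via Type 5 LSAs, Type 1 metric
    EXTERNAL_E2 = 4  # learned via Type 5 LSAs, Type 2 metric


class Route(NamedTuple):
    prefix: str
    path_type: PathType
    cost: int
    next_hop: str


def best_route(candidates: List[Route]) -> Route:
    """Prefer the better path type first, then the lower cost."""
    return min(candidates, key=lambda r: (r.path_type, r.cost))


candidates = [
    Route("10.1.1.0/24", PathType.INTER_AREA, 10, "ABR1"),
    Route("10.1.1.0/24", PathType.INTRA_AREA, 50, "RTB"),
]
print(best_route(candidates).next_hop)
```

Even though the interarea path in this example has the lower cost, the intra-area path is kept, which matches the rule stated in the second step.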

The following paragraphs cover some of the multiarea OSPF capabilities and associated configurations in more detail. You will learn how to configure an ABR and how to configure route summarization.

Configuring an ABR
There are no special commands to make a router an ABR or an ASBR. The router becomes an ABR as soon as you configure two of its interfaces to operate in different areas, as shown in Figure .

Configuring an ASBR
ASBRs are created when you configure OSPF to import, or redistribute, external routes into OSPF, as shown in Figure .

Note that the ASBR configuration commands shown in Figure include the command redistribute rip. This command tells OSPF to import RIP routing information. A detailed discussion of route redistribution can be found in Chapter 7, Route Optimization.

Recall that summarization is the consolidation of multiple routes into one single, supernet advertisement (see Chapter 2 for more details). Proper summarization requires contiguous (sequential) addressing (for example, 200.10.0.0, 200.10.1.0, 200.10.2.0, and so on). OSPF routers can be manually configured to advertise a supernet route, which is different from an LSA summary route.

Route summarization directly affects the amount of bandwidth, CPU, and memory resources that are consumed by the OSPF process. With summarization, if a network link fails or flaps, the topology change will not be propagated into the backbone (and other areas by way of the backbone). As discussed in previous chapters, route summarization protects routers from needless routing table recalculations. Because the SPF calculation places a significant demand on the router's CPU, proper summarization is an imperative part of OSPF configuration.

OSPF supports two types of summarization:

Interarea route summarization - Interarea route summarization is done on ABRs and applies to routes from within each area. It does not apply to external routes injected into OSPF via redistribution. To take advantage of summarization, network numbers within areas should be contiguous.
External route summarization - External route summarization is specific to external routes that are injected into OSPF via redistribution. Here again, it is important to ensure that external address ranges that are being summarized are contiguous. Summarization of overlapping ranges from two different routers could cause packets to be sent to the wrong destination. Only ASBRs can summarize external routes.

To configure an ABR to summarize routes for a specific area before injecting them into a different area, you use the following syntax:

Router(config-router)# area area-id range address mask

To configure an ASBR to summarize external routes before injecting them into the OSPF domain, you use the following syntax:

Router(config-router)# summary-address address mask

To configure RTA in the figure for external route summarization, you can use the following commands:

RTA(config)# router ospf 1
RTA(config-router)# summary-address 200.9.0.0 255.255.0.0

Once configured, RTA will send only a single summary route, 200.9.0.0/16, into the OSPF domain.

Because RTB sits on the border between Area 0 and Area 1, it should be configured to perform interarea summarization, as shown:

RTB(config)# router ospf 1
RTB(config-router)# area 1 range 192.168.16.0 255.255.252.0

Note that the area 1 range command in this example specifies the area containing the range to be summarized before being injected into Area 0.
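To check which networks a summary covers before you configure it, you can work through the address arithmetic. The following Python sketch uses the standard ipaddress module to expand the range from the area 1 example and to find the smallest single summary for a list of networks. It assumes, for illustration, that the networks in Area 1 are /24 subnets. The helper function name is hypothetical.

```python
import ipaddress
from typing import List

# The summary configured with: area 1 range 192.168.16.0 255.255.252.0
summary = ipaddress.ip_network("192.168.16.0/255.255.252.0")
covered = [str(n) for n in summary.subnets(new_prefix=24)]
print(summary, covered)


def smallest_summary(networks: List[str]) -> ipaddress.IPv4Network:
    """Widen the first network one bit at a time until it contains them all."""
    nets = [ipaddress.ip_network(n) for n in networks]
    candidate = nets[0]
    while not all(n.subnet_of(candidate) for n in nets):
        candidate = candidate.supernet()
    return candidate


area1 = ["192.168.16.0/24", "192.168.17.0/24", "192.168.18.0/24", "192.168.19.0/24"]
print(smallest_summary(area1), smallest_summary(area1).netmask)
```

The mask 255.255.252.0 corresponds to a /22 prefix, so the single summary covers the four contiguous /24 networks from 192.168.16.0 through 192.168.19.0. If the networks were not contiguous, the smallest summary would also cover addresses that do not exist in the area, which is why contiguous addressing matters.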

Also, note that, depending on your network topology, you may not want to summarize area 0 networks. If you have more than one ABR between an area and the backbone area, for example, sending a summary LSA with the explicit network information will ensure that the shortest path is selected. If you summarize the addresses, a suboptimal path selection may occur.

You can configure an OSPF area as either a stub area (which does not accept information about routes external to the AS) or a totally stubby area (which accepts neither external AS routes nor summary routes from other areas internal to the AS). Both of these area types are shown in the figure.

By configuring an area as stub, you can greatly reduce the size of the link-state database inside that area and, as a result, reduce the memory requirements of area routers. Remember that stub areas do not accept Type 5 (that is, external) LSAs.

Because OSPF routers internal to a stub area will not learn about external networks, routing to the outside world is based on a 0.0.0.0/0 default route. When you configure a stub area, the stub's ABR automatically propagates a 0.0.0.0/0 default route within the area.

Stub areas are typically created when you have a hub-and-spoke topology, with the spokes (such as branch offices) configured as stub areas. In the case of a hub-and-spoke topology, the branch office may not need to know about every network at the headquarters site. It can instead use a default route to get there.

To further reduce the number of routes in a table, you can create a totally stubby area, which is a Cisco-specific feature. A totally stubby area is a stub area that blocks external Type 5 LSAs and summary (that is, Type 3 and Type 4) LSAs from entering the area. This way, intra-area routes and the default of 0.0.0.0/0 are the only routes known to the stub area. ABRs inject the default summary link 0.0.0.0/0 into the totally stubby area.

Thus, totally stubby areas further minimize routing information and increase stability and scalability of OSPF internetworks. This is typically a better solution than creating stub areas, unless the target area uses a mix of Cisco and non-Cisco routers. The following sections describe the criteria for determining whether an area should be configured as stub or totally stubby, and the configuration commands necessary to implement these area types.

An area can be qualified as a stub or totally stubby when it meets the following criteria:

There is a single exit point from that area.
The area is not needed as a transit area for virtual links. (Virtual links are discussed at the end of this chapter.)
No ASBR is internal to the stub area.
The area is not the backbone area (Area 0).

These criteria are important because a stub or totally stubby area is configured primarily to exclude external routes. If these criteria are not met, external links may be injected into the area, invalidating its stubby nature.
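
The four criteria above can be sketched as a simple check. This Python fragment is illustrative only: the Area record and its field names are hypothetical and do not correspond to any IOS data structure.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """Hypothetical description of an OSPF area, for illustration only."""
    area_id: int
    exit_points: int               # number of exit points out of the area
    is_virtual_link_transit: bool  # used as a transit area for a virtual link
    contains_asbr: bool            # an ASBR is internal to the area


def can_be_stub(area: Area) -> bool:
    """Return True if the area meets all four criteria listed above."""
    return (
        area.exit_points == 1
        and not area.is_virtual_link_transit
        and not area.contains_asbr
        and area.area_id != 0      # the backbone can never be a stub
    )


branch = Area(area_id=2, exit_points=1, is_virtual_link_transit=False, contains_asbr=False)
backbone = Area(area_id=0, exit_points=1, is_virtual_link_transit=False, contains_asbr=False)
print(can_be_stub(branch))    # True
print(can_be_stub(backbone))  # False
```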

To configure an area as a stub or totally stubby area, use the following syntax on every router that has interfaces in that area:

Router(config-router)#area area-id stub [no-summary]

The optional no-summary keyword is used only on ABRs. It configures the ABR to block interarea summaries (Type 3 and Type 4 LSAs), which makes the area totally stubby. The area stub command must be configured on every router in the stub area; this is essential for the routers to become neighbors and exchange routing information. When this command is configured, the stub routers exchange hello packets with the E bit set to 0. The E bit is in the Options field of the hello packet, and a value of 0 indicates that the area is a stub area. Neighboring routers must agree on the state of this bit; otherwise, they will not become neighbors.
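
The E-bit agreement can be pictured with a short Python sketch. It models only this one check and ignores everything else that OSPF compares when forming an adjacency; the function names are hypothetical. The bit position 0x02 is the E bit of the OSPFv2 Options field.

```python
E_BIT = 0x02  # position of the E bit in the OSPF Options field


def hello_options(is_stub_area: bool) -> int:
    """Build an Options value in which only the E bit is modeled.

    Routers in a stub area clear the E bit; all other routers set it.
    """
    return 0 if is_stub_area else E_BIT


def can_become_neighbors(options_a: int, options_b: int) -> bool:
    """Two routers on a link must agree on the state of the E bit."""
    return (options_a & E_BIT) == (options_b & E_BIT)


stub_router = hello_options(is_stub_area=True)
misconfigured = hello_options(is_stub_area=False)  # area stub command missing

print(can_become_neighbors(stub_router, stub_router))    # True
print(can_become_neighbors(stub_router, misconfigured))  # False
```

The second result corresponds to a router in the area that was left without the area stub command: its hellos carry E = 1, so the adjacency never forms.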

On ABRs only, you also have the option of defining the cost of the default route that is automatically injected in the stub/totally stubby area. You use the following syntax to configure the default route's cost:

Router(config-router)#area area-id default-cost cost

In the figure, Area 2 is configured as a stub area. No routes from the external autonomous system will be forwarded into the stub.

The last line in each configuration, area 2 stub, defines the stub area. The area default-cost command has not been configured on R3, so this router will advertise 0.0.0.0 (the default route) with a default cost metric of 1 plus any internal costs.

The only routes that will appear in R4's routing table are intra-area routes (designated with an O in the routing table), the default route, and interarea routes (both designated with an IA in the routing table; the default route will also be denoted with an asterisk).

Note that each router in the stub must be configured with the area stub command. The area stub command determines whether the routers in the stub become neighbors. This command must be included on all routers in the stub area if they are to exchange routing information.

In the figure, the keyword no-summary has been added to the area stub command on R3. This keyword causes summary routes (delivered by both Type 3 and Type 4 LSAs) to also be blocked from the stub. Each router in the stub picks the closest ABR as a gateway to all other networks outside the area.

The only routes that will appear in R4's routing table are intra-area routes (designated with an O in the routing table) and the default route. No interarea routes (designated with an IA in the routing table) will be included.

Note that the no-summary keyword needs to be configured only on the ABRs of the totally stubby area; the remaining routers in the area need only the area stub command.

OSPF has certain restrictions when multiple areas are configured. One area must be defined as Area 0, the backbone area. It is called the backbone because all inter-area communication must go through it. Thus, all areas should be physically connected to Area 0 so that the routing information injected into this backbone can be disseminated to other areas. The backbone area must always be configured as Area 0. You cannot make any other area ID function as the backbone.

There are situations, however, when a new area is added after the OSPF internetwork has been designed, and it is not possible to provide that new area with direct access to the backbone. In these cases, a virtual link can be defined to provide the needed connectivity to the backbone area, as shown in Figure . The virtual link provides the disconnected area a logical path to the backbone. All areas must connect directly to the backbone area or through a transit area, as shown in Figure .

The virtual link has the following two requirements:

· It must be established between two routers that share a common area.

· One of these two routers must be connected to the backbone.
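
As a rough illustration, the two requirements can be written as a check on the sets of areas to which each router is attached. This is a sketch, not an implementation of OSPF: the function name is hypothetical, and it assumes that the common (transit) area is a non-backbone area.

```python
from typing import Set

BACKBONE = 0


def virtual_link_allowed(areas_a: Set[int], areas_b: Set[int]) -> bool:
    """Check the two virtual-link requirements for routers A and B.

    areas_a / areas_b: the area IDs in which each router has interfaces.
    """
    # Requirement 1: the routers share a common (non-backbone) area.
    shared_transit = (areas_a & areas_b) - {BACKBONE}
    # Requirement 2: at least one of the routers is connected to the backbone.
    touches_backbone = BACKBONE in areas_a or BACKBONE in areas_b
    return bool(shared_transit) and touches_backbone


# One router sits in Areas 0 and 1, the other in Areas 1 and 3:
print(virtual_link_allowed({0, 1}, {1, 3}))  # True
# Two routers share Area 3, but neither reaches the backbone:
print(virtual_link_allowed({1, 3}, {3, 4}))  # False
```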

When virtual links are used, they require special processing during the SPF calculation. That is, the "real" next-hop router must be determined so that the true cost to reach a destination across the backbone can be calculated.

Virtual links serve the following purposes:

· They can link an area that does not have a physical connection to the backbone. This linking could occur, for example, when two organizations merge.

· They can patch the backbone if discontinuity in Area 0 occurs. Discontinuity of the backbone might occur, for example, if two companies merge their two separate OSPF networks into a single one with a common Area 0. The only alternative for the companies is to redesign the entire OSPF network and create a unified backbone.

Another reason for creating a virtual link is to add redundancy in cases when router failure might cause the backbone to be split into two.

In Figure , the disconnected Area 0s are linked via a virtual link through the common area, Area 3. If a common area does not already exist, one can be created to become the transit area. Area 0 could become partitioned, for example, if two OSPF networks were merged.

To configure a virtual link, perform the following steps:

Configure OSPF, as described previously in the " Using and Configuring OSPF Multiarea Components " section.
On each router that will use the virtual link, create the "virtual link" configuration. The routers that make the links are the ABR that connects the remote area to the transit area and the ABR that connects the transit area to the backbone area.

Router(config-router)#area area-id virtual-link router-id

If you do not know the neighbor's Router ID, you can Telnet to it and type the show ip ospf command. The results are shown in Figure .

In the figure, Area 3 does not have a direct physical connection to the backbone (Area 0), which is an OSPF requirement because the backbone is a collection point for LSAs. ABRs forward summary LSAs to the backbone, which in turn forwards the traffic to all areas. All interarea traffic transits the backbone.

To provide connectivity to the backbone, a virtual link must be configured between R2 and R1. Area 1 will be the transit area and R1 will be the entry point into area 0. R2 will have a logical connection to the backbone through the transit area.

Both sides of the virtual link must be configured, as follows:

R2(config-router)#area 1 virtual-link 10.3.10.5 --- With this command, area 1 is defined to be the transit area and the router ID of the other side of the virtual link is configured.
R1(config-router)#area 1 virtual-link 10.7.20.123 --- With this command, area 1 is defined to be the transit area and the router ID of the other side of the virtual link is configured.

NSSAs are a relatively new, standards-based OSPF enhancement. To understand how to use NSSAs, consider the network shown in Figure .

RTA connects to an external RIP domain, and RTB currently serves as an ABR for Area 0. If the RIP domain is not under your administrative control, what options do you have to exchange routing information between these two domains? If you are going to use dynamic routing, you could create an OSPF standard area, as shown in Figure .

However, what if the routers that you place in Area 1 do not have the required processing power or memory to run OSPF? You have learned that you can reduce the burden on OSPF routers by configuring them to participate in a stub or totally stubby area. Figure illustrates what would happen in this case.

A stub area cannot include an ASBR because Type 5 (external) LSAs are not allowed in a stub domain. The configuration shown in Figure would fail miserably.

So, how do you dynamically exchange external routing information without creating a standard OSPF area? You could configure another routing protocol, such as RIP or IGRP, in place of creating an Area 1. This may prove to be disadvantageous because an additional routing protocol must be maintained and imported into OSPF (and the RIP domain is not under your administrative control).

With the introduction of the NSSA, you have another, more palatable option. An NSSA acts like a stub network in the sense that it does not allow Type 5 LSAs. It can also be configured to prevent floods of Type 3 and Type 4 summary LSAs, just as a totally stubby area would. However, an NSSA does allow Type 7 LSAs, which can carry external routing information and be flooded throughout the NSSA.

Note: NSSAs are supported in Cisco IOS version 11.2 and later.

By configuring an area as an NSSA, you can minimize routing tables within the area but still import external routing information into OSPF.
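
The LSA filtering behavior of the area types discussed so far can be summarized in a small lookup table. The Python sketch below is a simplification: the area-type labels are hypothetical names chosen for this example, and the table ignores the default route that an ABR injects as a summary into a totally stubby area.

```python
SUMMARY = {3, 4}       # interarea summary LSAs
EXTERNAL = {5}         # AS external LSAs
NSSA_EXTERNAL = {7}    # NSSA external LSAs (exist only inside an NSSA)

# LSA types that are kept out of each kind of area.
BLOCKED_LSAS = {
    "standard": NSSA_EXTERNAL,
    "stub": EXTERNAL | NSSA_EXTERNAL,
    "totally stubby": EXTERNAL | SUMMARY | NSSA_EXTERNAL,
    "nssa": EXTERNAL,
    "nssa no-summary": EXTERNAL | SUMMARY,
}


def lsa_allowed(area_type: str, lsa_type: int) -> bool:
    """Return True if an LSA of this type may be flooded in the area."""
    return lsa_type not in BLOCKED_LSAS[area_type]


print(lsa_allowed("stub", 5))            # False
print(lsa_allowed("stub", 3))            # True
print(lsa_allowed("totally stubby", 3))  # False
print(lsa_allowed("nssa", 7))            # True
```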

Figure illustrates the example network, including an NSSA implementation. RTA can import external routes as Type 7 LSAs, and ABRs will translate Type 7 LSAs into Type 5 LSAs as they leave the NSSA. A benefit of Type 7 LSAs is that they can be summarized. The OSPF specification prohibits the summarizing or filtering of Type 5 LSAs. It is an OSPF requirement that Type 5 LSAs always be flooded throughout a routing domain. When you define an NSSA, you can import specific external routes as Type 7 LSAs into the NSSA. In addition, when translating Type 7 LSAs to be imported into nonstub areas, you can summarize or filter the LSAs before importing them as Type 5 LSAs.

NSSAs are often used when a remote site (which uses RIP or IGRP) must be connected to a central site using OSPF. You can use NSSA to simplify the administration of this kind of topology. Before NSSA, the connection between the corporate site ABR and the remote router used RIP or EIGRP. This meant maintaining two routing protocols. Now, with NSSA, you can extend OSPF to handle the remote connection by defining the area between the corporate router and the remote router as an NSSA.

In Figure , the central site and branch office are interconnected through a slow WAN link. The branch office is not using OSPF, but the central site is. If you configure a standard OSPF area between the two networks, the slow WAN link could be overwhelmed by the ensuing flood of LSAs, especially Type 5 external LSAs. As an alternative, you could configure a RIP domain between the two networks, but that would mean running two routing protocols on the central site's routers. A more attractive solution is to configure an OSPF area and define it as an NSSA.

In this scenario, RTA is defined as an ASBR. It is configured to redistribute any routes within the RIP/EIGRP domain to the NSSA. The following is a description of what happens when the area between the connecting routers is defined as an NSSA:

RTA receives RIP or EIGRP routes for networks 10.10.0.0/16, 10.11.0.0/16, and 20.0.0.0/8.
Because RTA is also connected to an NSSA, it redistributes the RIP or EIGRP routes as Type 7 LSAs into the NSSA.
RTB, an ABR between the NSSA and the backbone Area 0, receives the Type 7 LSAs.
After the SPF calculation on the forwarding database, RTB translates the Type 7 LSAs into Type 5 LSAs and then floods them throughout Area 0.

It is at this point that RTB could have summarized routes 10.10.0.0/16 and 10.11.0.0/16 as 10.0.0.0/8, or could have filtered one or more of the routes.

To configure an OSPF area as an NSSA, you must configure all OSPF routers that have interfaces in the area using the following command syntax:

Router(config-router)#area area-id nssa [no-summary]

Typically, you should use the optional keyword no-summary when configuring NSSA on an ABR. This prevents Type 3 and Type 4 summary routes from flooding the NSSA area and minimizes the routing tables within the area. In effect, the no-summary keyword makes the NSSA totally stubby.

Optionally, you can control the summarization or filtering during the translation using the following syntax:

Router(config-router)#summary-address prefix mask [not-advertise] [tag tag]

The not-advertise keyword is used to suppress routes that match the prefix/mask pair. This keyword applies to OSPF only. The tag value can also be assigned but is not required. Route tags can be used in policy routing and are discussed in Chapter 6, EIGRP.
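
The effect of summarizing or filtering during translation can be modeled with Python's standard ipaddress module. This is a simplified sketch, not the router's actual algorithm: translate_type7 and its arguments are hypothetical names, and each (prefix, advertise) pair stands in for one summary-address entry, with advertise=False playing the role of the not-advertise keyword.

```python
import ipaddress


def translate_type7(routes, summaries):
    """Return the prefixes advertised as Type 5 LSAs after translation.

    routes:    prefixes learned as Type 7 LSAs, such as "10.10.0.0/16"
    summaries: list of (prefix, advertise) pairs modeling summary-address
               entries; advertise=False models the not-advertise keyword
    """
    advertised = []
    for route in map(ipaddress.ip_network, routes):
        for prefix, advertise in summaries:
            summary = ipaddress.ip_network(prefix)
            if route.subnet_of(summary):
                if advertise and summary not in advertised:
                    advertised.append(summary)
                break  # route is covered; it is not advertised individually
        else:
            advertised.append(route)  # no summary entry matched this route
    return [str(network) for network in advertised]


routes = ["10.10.0.0/16", "10.11.0.0/16", "20.0.0.0/8"]
print(translate_type7(routes, [("10.0.0.0/8", True)]))
# ['10.0.0.0/8', '20.0.0.0/8']
print(translate_type7(routes, [("20.0.0.0/8", False)]))
# ['10.10.0.0/16', '10.11.0.0/16']
```

The first call mirrors the summarization of 10.10.0.0/16 and 10.11.0.0/16 into 10.0.0.0/8 described earlier; the second shows a route being filtered instead.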

To verify that NSSA is defined on a given router, you can use the show ip ospf command. This command shows you general OSPF-configured parameters, including the number of NSSAs to which the router is attached and whether the router is performing LSA translation.

In addition to the show commands discussed in Chapter 4, several key OSPF commands can be used to verify multiarea operation.

show ip ospf border-routers - Displays the internal OSPF routing table entries to an ABR.
show ip ospf virtual-links - Displays parameters about the current state of OSPF virtual links.
show ip ospf process-id - Displays information about each area to which the router is connected, and indicates whether the router is an ABR, ASBR, or both. The process ID is a user defined identification parameter. It is locally assigned and can be any positive integer number. The number used here is the number assigned administratively when enabling the OSPF routing process.
show ip ospf database - Displays the contents of the topological database maintained by the router. Several keywords can be used with this command to get specific information about the following links:

show ip ospf [process-id area-id] database [router] - Displays individual router link-state information.
show ip ospf [process-id area-id] database [network] - Displays network link-state information. The area ID is the area number associated with the OSPF address range defined in the network router configuration command when defining a particular area.
show ip ospf [process-id area-id] database [summary] - Displays summary information about ABR link states.
show ip ospf [process-id area-id] database [asbr-summary] - Displays information about ASBR link-states.
show ip ospf [process-id area-id] database [external] - Displays information about autonomous system external link states.
show ip ospf [process-id area-id] database [database-summary] - Displays database summary information and totals.

In this chapter, you learned the advantages of multiple area OSPF configurations and the OSPF components used in a large multiple area OSPF internetwork. The benefits of multiarea configurations include reduced frequency of SPF calculations, smaller routing tables, and reduced LSU overhead. You learned about the types of areas (including stub, totally stubby, NSSA), OSPF router types (including ABRs and ASBRs), link-state advertisements, and virtual links.

Enhanced Interior Gateway Routing Protocol (EIGRP) is a Cisco-proprietary routing protocol based on IGRP. Unlike IGRP, which is a classful routing protocol, EIGRP is classless, allowing network designers to maximize address space by using CIDR and VLSM. Compared to IGRP, EIGRP boasts faster convergence times, improved scalability, and superior handling of routing loops. Furthermore, EIGRP can replace Novell RIP and AppleTalk Routing Table Maintenance Protocol (RTMP), serving both IPX and AppleTalk networks with powerful efficiency.

You may have heard EIGRP described as a hybrid routing protocol offering the best of distance-vector and link-state algorithms. Technically, EIGRP is an advanced distance-vector routing protocol that relies on features commonly associated with link-state protocols. Some of OSPF's best traits, such as partial updates and neighbor discovery, are similarly put to use by EIGRP. However, OSPF's benefits, especially its hierarchical design, come at a price: administrative complexity. As seen in Chapter 5, Multiarea OSPF, multiarea implementation of OSPF requires mastery of a complex terminology and command set. On the other hand, EIGRP's advanced features can be easily implemented and maintained. Although it does not mirror OSPF's classic hierarchical design, EIGRP is an ideal choice for large, multiprotocol networks built primarily on Cisco routers.

This chapter surveys EIGRP's key concepts, technologies, and data structures. This conceptual overview is then followed by a study of EIGRP convergence and basic operation. Finally, this chapter shows how to configure and verify EIGRP, including using route summarization.

Cisco released EIGRP in 1994 as a scalable, improved version of its proprietary distance-vector routing protocol, IGRP. IGRP and EIGRP are compatible with each other, although EIGRP offers multiprotocol support and IGRP does not.

Despite being compatible with IGRP, EIGRP uses a different metric calculation and hop-count limitation. EIGRP scales IGRP's metric by a factor of 256. That is because EIGRP uses a metric that is 32 bits long, and IGRP uses a 24-bit metric. By multiplying or dividing by 256, EIGRP can easily exchange information with IGRP.
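
The scaling is plain arithmetic, as the following Python sketch shows. The function names are hypothetical, and the metric value 8576 is only a sample input.

```python
SCALE = 256  # 2**8: the difference between a 24-bit and a 32-bit metric


def igrp_to_eigrp(igrp_metric: int) -> int:
    """Scale a 24-bit IGRP metric up to the 32-bit EIGRP range."""
    return igrp_metric * SCALE


def eigrp_to_igrp(eigrp_metric: int) -> int:
    """Scale a 32-bit EIGRP metric down to the 24-bit IGRP range."""
    return eigrp_metric // SCALE


max_igrp = 2**24 - 1  # largest value that fits in 24 bits

print(igrp_to_eigrp(8576))                # 2195456
print(eigrp_to_igrp(2195456))             # 8576
print(igrp_to_eigrp(max_igrp) < 2**32)    # True
```

The last line confirms that even the largest 24-bit IGRP metric still fits in 32 bits after scaling.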

EIGRP also imposes a maximum hop limit of 224, slightly less than IGRP's generous 255, but more than enough to support today's largest internetworks.

Getting dissimilar routing protocols, such as OSPF and RIP, to share information requires advanced configuration. However, sharing, or redistribution, is automatic between IGRP and EIGRP as long as both processes use the same autonomous system (AS) number. In Figure , RTB automatically redistributes EIGRP-learned routes to the IGRP AS, and vice versa.

EIGRP will tag routes learned from IGRP (or any outside source) as external because they did not originate from EIGRP routers. On the other hand, IGRP cannot differentiate between internal and external routes. Notice that in the show ip route command output for the routers in Figure , EIGRP routes are flagged with D, and external routes are denoted by EX. RTA identifies the difference between the network learned via EIGRP (172.16.0.0) and the network that was redistributed from IGRP (192.168.1.0). In RTC's table, we see that IGRP makes no such distinction.

RTC, which is running IGRP only, just sees IGRP routes, despite the fact that both 10.1.1.0 and 172.16.0.0 were redistributed from EIGRP.

Even though it is compatible with IGRP, EIGRP operates quite differently than its predecessor. As an advanced distance-vector routing protocol, EIGRP acts like a link-state protocol when updating neighbors and maintaining routing information. EIGRP's advantages over simple distance-vector protocols include the following:

Rapid convergence - EIGRP routers converge quickly because they rely on a state-of-the-art routing algorithm called the Diffusing Update Algorithm (DUAL). DUAL guarantees loop-free operation at every instant throughout a route computation and allows all routers involved in a topology change to synchronize at the same time.
Efficient use of bandwidth - EIGRP makes efficient use of bandwidth by sending partial, bounded updates and by consuming minimal bandwidth when the network is stable.

Partial, bounded updates - EIGRP routers make partial, incremental updates rather than sending their complete tables. This may remind you of OSPF operation, but unlike OSPF routers, EIGRP routers send these partial updates only to the routers that need the information, not to all routers in an area. For this reason, they are called bounded updates.
Minimal consumption of bandwidth when the network is stable - Instead of using timed routing updates, EIGRP routers keep in touch with each other using small hello packets. Though exchanged regularly, hello packets do not eat up a significant amount of bandwidth.

Support for VLSM and CIDR - Unlike IGRP, EIGRP offers full support for classless IP by exchanging subnet masks in routing updates.
Multiple network-layer support - EIGRP supports IP, IPX, and AppleTalk through protocol-dependent modules (PDMs).
Independence from routed protocols - PDMs protect EIGRP from painstaking revision. Evolution of a routed protocol, such as IP, may require a new protocol module, but not necessarily a reworking of EIGRP itself.

In a legacy NetWare network, servers and routers may be configured to use IPX RIP and the Service Advertising Protocol (SAP) to exchange information with peers. As time-driven protocols, IPX RIP and SAP generate updates every 60 seconds by default. These updates can crowd low-speed WAN links, especially in large internetworks.

EIGRP can redistribute IPX RIP and SAP information to improve overall performance. In effect, EIGRP can take over for these two protocols. An EIGRP router will receive routing and service updates and then update other routers only when changes in the SAP or routing tables occur. Routing updates occur as they would in any EIGRP network - using partial updates. EIGRP sends SAP updates incrementally on all serial interfaces by default. However, you must manually configure incremental SAP updates on LAN interfaces (e.g., Ethernet, Token Ring, and FDDI).

Like IP RIP, IPX RIP restricts a network's diameter to 15 hops. By using EIGRP to redistribute IPX RIP, a network's diameter can expand to EIGRP's comfortable limit of 224 hops. Moreover, EIGRP's more advanced metric, which uses bandwidth and delay, replaces Novell RIP's less optimal metric derived from hop count and ticks.

The obvious shortcomings of IPX RIP and SAP spurred Novell's development of a proprietary link-state routing protocol for NetWare: NetWare Link Services Protocol (NLSP). A link-state protocol, NLSP replaces both RIP and SAP. On servers running NetWare 3.11 or later, administrators can choose between using RIP/SAP or NLSP. Note that since Cisco IOS version 11.1, EIGRP can redistribute NLSP as well as IPX RIP.

EIGRP Support for AppleTalk
EIGRP can also take over for AppleTalk's Routing Table Maintenance Protocol (RTMP). As a distance-vector routing protocol, RTMP relies on periodic and complete exchanges of routing information. To reduce overhead, EIGRP redistributes AppleTalk routing information using event-driven updates. EIGRP also uses a configurable composite metric to determine the best route to an AppleTalk network. RTMP uses hop count, which can result in suboptimal routing.

AppleTalk clients expect RTMP information from local routers, so EIGRP for AppleTalk should be run only on a clientless network, such as a WAN link.

EIGRP routers keep route and topology information readily available in RAM so they can react quickly to changes. Like OSPF, EIGRP keeps this information in several tables, or databases. The following terms are related to EIGRP and its tables and are used throughout this chapter:

Neighbor table - Each EIGRP router maintains a neighbor table that lists adjacent routers. This table is comparable to the adjacency database used by OSPF. There is a neighbor table for each protocol that EIGRP supports.
Topology table - Every EIGRP router maintains a topology table for each configured network protocol. This table includes route entries for all destinations that the router has learned. All learned routes to a destination are maintained in the topology table.
Routing table - EIGRP chooses the best routes to a destination from the topology table and places these routes in the routing table. Each EIGRP router maintains a routing table for each network protocol.
Successor - A successor is a route selected as the primary route to use to reach a destination. Successors are the entries kept in the routing table. Multiple successors for a destination can be retained in the routing table.
Feasible successor - A feasible successor is a backup route. These routes are selected at the same time the successors are identified, but are kept in the topology table. Multiple feasible successors for a destination can be retained in the topology table.

EIGRP includes many new technologies, each of which represents an improvement in operating efficiency, rapidity of convergence, or functionality relative to IGRP and other routing protocols. These technologies fall into one of the following four categories:

Neighbor discovery and recovery
Reliable Transport Protocol
DUAL finite-state machine
Protocol-dependent modules

The following sections examine these technologies in detail.

Remember that simple distance-vector routers do not establish any relationship with their neighbors. RIP and IGRP routers merely broadcast or multicast updates on configured interfaces. In contrast, EIGRP routers actively establish relationships with their neighbors, much the same way that OSPF routers do.

Figures - illustrate how EIGRP adjacencies are established. EIGRP routers establish adjacencies with neighbor routers by using small hello packets. Hellos are sent by default every 5 seconds. An EIGRP router assumes that, as long as it is receiving hello packets from known neighbors, those neighbors (and their routes) remain viable. By forming adjacencies, EIGRP routers do the following:

Dynamically learn of new routes that join their network
Identify routers that become either unreachable or inoperable
Rediscover routers that had previously been unreachable

The Reliable Transport Protocol (RTP) is a transport-layer protocol that can guarantee ordered delivery of EIGRP packets to all neighbors. On an IP network, hosts use TCP to sequence packets and ensure their timely delivery. However, EIGRP is protocol-independent (i.e., it does not rely on TCP/IP to exchange routing information the way that RIP, IGRP, and OSPF do). To stay independent of IP, EIGRP uses its own proprietary transport-layer protocol to guarantee delivery of routing information: RTP.

EIGRP can call on RTP to provide reliable or unreliable service as the situation warrants. For example, hello packets do not require the overhead of reliable delivery because they are frequent and should be kept small. Nevertheless, the reliable delivery of other routing information can actually speed convergence because EIGRP routers are not waiting for a timer to expire before they retransmit.

With RTP, EIGRP can multicast and unicast to different peers simultaneously, allowing for maximum efficiency.

The centerpiece of EIGRP is the Diffusing Update Algorithm (DUAL), EIGRP's route-calculation engine. The full name of this technology is DUAL finite-state machine (FSM). An FSM is an abstract machine, not a mechanical device with moving parts. FSMs define a set of possible states that something can go through, what events cause those states, and what events result from those states. Designers use FSMs to describe how a device, computer program, or routing algorithm will react to a set of input events. The DUAL FSM contains all the logic used to calculate and compare routes in an EIGRP network.

DUAL tracks all the routes advertised by neighbors and uses the composite metric of each route to compare them. DUAL also guarantees that each path is loop-free. DUAL then inserts the lowest-cost paths into the routing table.

As noted earlier in the chapter, EIGRP keeps important route and topology information readily available, in a neighbor table and a topology table. These tables supply DUAL with comprehensive route information in case of network disruption. DUAL selects alternate routes quickly by using the information in these tables. If a link goes down, DUAL looks for a feasible successor in its neighbor and topology tables.

A successor is a neighboring router that is currently being used for packet forwarding; it provides the least-cost route to the destination and is not part of a routing loop. Feasible successors provide the next lowest-cost path without introducing routing loops. Feasible successor routes can be used in case the existing route fails. Packets to the destination network are immediately forwarded to the feasible successor, which at that point is promoted to the status of successor as illustrated in Figures - .

Note in the example that router D does not have a feasible successor identified. The FD for router D to router A is 2, and the AD via router C is 3. Because this AD is larger than the FD, the route via router C fails the feasibility test, and no feasible successor is placed in the topology table. Router C has a feasible successor identified, as does router E, because the route is loop-free and because the AD for the next-hop router is less than the FD for the successor.
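
The selection of successors and feasible successors can be sketched in a few lines of Python. This is a minimal model of the comparison only, not of the full DUAL finite-state machine. The function name is hypothetical, and because the figure is not reproduced here, neighbor B and the link costs are assumed values chosen to match the FD of 2 and AD of 3 quoted above.

```python
def classify_routes(paths):
    """Pick the successor and feasible successors for one destination.

    paths: {neighbor: (advertised_distance, cost_to_neighbor)}
    Returns (successor, feasible_distance, feasible_successors).
    """
    totals = {n: ad + cost for n, (ad, cost) in paths.items()}
    successor = min(totals, key=totals.get)   # lowest total metric wins
    fd = totals[successor]                    # feasible distance
    feasible = sorted(
        n for n, (ad, _) in paths.items()
        if n != successor and ad < fd         # feasibility test: AD < FD
    )
    return successor, fd, feasible


# Router D's view of the route to router A (neighbor B and costs are assumed).
router_d = {"B": (1, 1), "C": (3, 1)}
print(classify_routes(router_d))   # ('B', 2, [])

# A hypothetical router whose alternate path does pass the test.
other = {"B": (1, 1), "C": (1, 2)}
print(classify_routes(other))      # ('B', 2, ['C'])
```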

One of EIGRP's most attractive features is its modular design. Modular, layered designs prove to be the most scalable and adaptable. Support for routed protocols such as IP, IPX, and AppleTalk is included in EIGRP through protocol-dependent modules (PDMs). In theory, EIGRP can easily adapt to new or revised routed protocols (e.g., IPv6) by adding protocol-dependent modules.

Each PDM is responsible for all functions related to its specific routed protocol. The IP-EIGRP module is responsible for the following:

Sending and receiving EIGRP packets that bear IP data
Notifying DUAL of new IP routing information that is received
Maintaining the results of DUAL's routing decisions in the IP routing table
Redistributing routing information that was learned by other IP-capable routing protocols

Like OSPF, EIGRP relies on several different kinds of packets to maintain its various tables and establish complex relationships with neighbor routers, as shown in Figure . The five EIGRP packet types are listed here:

Hello
Acknowledgment
Update
Query
Reply

The following sections describe these packet types in detail.

Hello Packets
EIGRP relies on hello packets to discover, verify, and rediscover neighbor routers. Rediscovery occurs if EIGRP routers do not receive each other's hellos for a hold time interval but then re-establish communication.

EIGRP routers send hellos at a fixed (and configurable) interval, called the hello interval. The default hello interval depends on the bandwidth of the interface, as shown in Figure .

EIGRP hello packets are multicast. On IP networks, EIGRP routers send hellos to the multicast IP address 224.0.0.10.

An EIGRP router stores information about neighbors in the neighbor table, including the last time that each neighbor responded (that is, the last time any of its EIGRP packets, hello or otherwise, was received). If a neighbor is not heard from for the duration of the hold time, EIGRP considers that neighbor down, and DUAL must step in to re-evaluate the routing table. By default, the hold time is three times the hello interval, but an administrator can configure both timers as desired.
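
The hold-time check amounts to comparing timestamps. The following Python sketch assumes a 5-second hello interval; the neighbor addresses, the times, and the function name are hypothetical.

```python
HELLO_INTERVAL = 5                 # seconds, assumed for this example
HOLD_TIME = 3 * HELLO_INTERVAL     # default hold time: three hello intervals


def neighbor_is_down(last_heard: float, now: float, hold_time: float = HOLD_TIME) -> bool:
    """A neighbor is declared down once nothing has been heard for hold_time."""
    return now - last_heard >= hold_time


# Hypothetical neighbor table: address -> time (seconds) a packet was last received
neighbors = {"10.1.1.2": 100.0, "10.1.1.3": 88.0}
now = 104.0

down = [addr for addr, heard in neighbors.items() if neighbor_is_down(heard, now)]
print(down)   # ['10.1.1.3']
```

Any EIGRP packet from a neighbor, not only a hello, would refresh that neighbor's last-heard time in this model.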

Recall that OSPF requires neighbor routers to have the same hello and dead intervals to communicate. EIGRP has no such restriction. Neighbor routers learn about each other's respective timers via the exchange of hello packets, and they use that information to forge a stable relationship, despite unlike timers.

Acknowledgment Packets
An EIGRP router uses acknowledgment packets to indicate receipt of any EIGRP packet during a "reliable" exchange. Recall that RTP can provide reliable communication between EIGRP hosts. To be reliable, a sender's message must be acknowledged by the recipient. Acknowledgment packets, which are "dataless" hello packets, are used for this purpose. Unlike multicast hellos, acknowledgment packets are unicast. Note also that acknowledgments can be made by piggybacking on other kinds of EIGRP packets, such as reply packets.

Hello packets are always sent unreliably and thus do not require acknowledgment.

Update Packets
Update packets are used when a router discovers a new neighbor. An EIGRP router sends unicast update packets to that new neighbor so that the neighbor can add the information to its own topology table. More than one update packet may be needed to convey all the topology information to the newly discovered neighbor.

Update packets are also used when a router detects a topology change. In this case, the EIGRP router sends a multicast update packet to all neighbors, alerting them to the change.

All update packets are sent reliably.

Query and Reply Packets
An EIGRP router uses query packets whenever it needs specific information from one or all of its neighbors. A reply packet is used to respond to a query.

If an EIGRP router loses its successor and cannot find a feasible successor for a route, DUAL places the route in the active state. At this point, the router multicasts a query to all neighbors, searching for a successor to the destination network. Neighbors must send replies that either provide information on successors or indicate that no successor information is available.

Queries can be multicast or unicast, while replies are always unicast. Both packet types are sent reliably.
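
As a quick reference, the delivery rules for the five packet types can be collected into a small lookup table. The sketch below is only a summary of the text above; the PacketType class and the needs_ack helper are hypothetical names, and treating acknowledgments as unreliable follows from their description as dataless hellos.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketType:
    name: str
    delivery: tuple  # delivery modes mentioned in the text
    reliable: bool   # True if the packet must be acknowledged


PACKET_TYPES = {
    p.name: p
    for p in (
        PacketType("hello", ("multicast",), reliable=False),
        PacketType("acknowledgment", ("unicast",), reliable=False),
        PacketType("update", ("unicast", "multicast"), reliable=True),
        PacketType("query", ("multicast", "unicast"), reliable=True),
        PacketType("reply", ("unicast",), reliable=True),
    )
}


def needs_ack(packet_name: str) -> bool:
    """Return True if a packet of this type is sent reliably."""
    return PACKET_TYPES[packet_name].reliable
```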

DUAL can select alternate routes based on the tables kept by EIGRP. By building these tables, every EIGRP router can track all the routing information in an AS, not just the "best" routes.

The following sections examine the neighbor table, the routing table, and the topology table in detail and provide an example of each. In addition, we will look at the various packet types used by EIGRP to build and maintain these tables.

The Neighbor Table
The most important table in EIGRP is the neighbor table (refer to Figure ). The neighbor relationships tracked in the neighbor table are the basis for all the EIGRP routing update and convergence activity.

The neighbor table contains information about adjacent neighboring EIGRP routers. Whenever a new neighbor is discovered, the address of that neighbor and the interface used to reach it are recorded in a new neighbor table entry.

A neighbor table is used to support reliable, sequenced delivery of packets. One field in each row of the table includes the sequence number of the last packet received from that neighbor. EIGRP uses this field to acknowledge a neighbor's transmission and to identify packets that are out of sequence.

As shown in Figure , an EIGRP neighbor table includes the following key elements:

Neighbor address (Address) - The network-layer address of the neighbor router.
Hold time (Hold) - The interval to wait without receiving anything from a neighbor before considering the link unavailable. Originally, the expected packet was a hello packet, but in current Cisco IOS software releases, any EIGRP packet received after the first hello resets the timer.
Smooth Round-Trip Timer (SRTT) - The average time that it takes to send and receive packets from a neighbor. This timer is used to determine the retransmit interval (RTO).
Queue count (Q Cnt) - The number of packets waiting in queue to be sent. If this value is constantly higher than zero, then there may be a congestion problem at the router. A zero means that there are no EIGRP packets in the queue.

Note that an EIGRP router can maintain multiple neighbor tables, one for each PDM running (e.g., IP, IPX, and AppleTalk as shown in Figure ). A router must run a unique EIGRP process for each routed protocol.

The Routing Table
The routing table contains the routes installed by DUAL as the best loop-free paths to a given destination, as shown in Figure . By default, EIGRP will maintain up to four routes per destination. These routes can be of equal or unequal cost. EIGRP routers maintain a separate routing table for each routed protocol.

The Topology Table
EIGRP uses its topology table to store all the information it needs to calculate a set of distances and vectors to all reachable destinations. EIGRP maintains a separate topology table for each routed protocol. A sample EIGRP topology table is shown in Figure .

The topology table is made up of all the routes advertised by neighbor routers in the autonomous system, not just the routes that each router is currently using. By tracking this information, EIGRP routers can find alternate routes quickly. The topology table includes the following fields:

Feasible distance (FD is xxxx) - The feasible distance (FD) is the lowest calculated metric to each destination. For example, in Figure , the feasible distance to 32.0.0.0 is 2195456 as indicated by FD is 2195456.
Route source (via xxx.xxx.xxx.xxx) - The source of the route is the identification number of the router that originally advertised that route. This field is populated only for routes learned externally from the EIGRP network. Route tagging can be particularly useful with policy-based routing. For example, in Figure , the route source to 32.0.0.0 is 200.10.10.10, as indicated by via 200.10.10.10.
Reported distance (FD/RD) - The reported distance (RD) of the path is the distance reported by an adjacent neighbor to a specific destination. For example, in Figure , the reported distance to 32.0.0.0 is 281600 as indicated by (2195456/281600).

In addition to these fields, each entry includes the interface through which the destination is reachable.

EIGRP sorts the topology table so that the successor routes are at the top, followed by feasible successors. At the bottom, EIGRP lists routes that DUAL believes to be loops in the topology table.

How does an EIGRP router determine which routers are successors and which routers are feasible successors? Assume that RTA's routing table includes a route to Network Z via RTB (see Figure ). From RTA's point of view, RTB is the current successor for Network Z; RTA will forward packets destined for Network Z to RTB. RTA must have at least one successor for Network Z for DUAL to place it in the routing table.

Can RTA have more than one successor for Network Z? If RTC claims to have a route to Network Z with the exact same metric as RTB, then RTA also considers RTC a successor, and DUAL will install a second route to Network Z via RTC (see Figure ).

Any of RTA's other neighbors that advertise a loop-free route to Network Z (that is, a route whose computed metric is higher than the best-route metric but whose RD is lower than the FD) will be identified as feasible successors in the topology table, as shown in Figure .
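
The distinction between successors and feasible successors can be sketched as a short Python function. This is a simplified illustration that uses plain link costs instead of EIGRP's composite metric; the function name, the neighbors RTD and RTE, and all of the cost values are assumptions made for the example.

```python
def classify_neighbors(paths):
    """Classify each neighbor's path to one destination.

    paths maps a neighbor name to (reported_distance, link_cost).
    Link costs are assumed to be positive.
    """
    # Computed metric via a neighbor = its reported distance + cost to reach it.
    computed = {name: rd + cost for name, (rd, cost) in paths.items()}
    fd = min(computed.values())  # feasible distance: lowest computed metric

    roles = {}
    for name, (rd, _cost) in paths.items():
        if computed[name] == fd:
            roles[name] = "successor"            # ties yield several successors
        elif rd < fd:
            roles[name] = "feasible successor"   # RD below FD: loop-free backup
        else:
            roles[name] = "neither"              # fails the feasibility test
    return fd, roles


# Hypothetical values for RTA's neighbors toward Network Z.
fd, roles = classify_neighbors({
    "RTB": (10, 10),  # computed 20
    "RTC": (10, 10),  # computed 20, same metric as RTB
    "RTD": (15, 10),  # computed 25, RD 15 is below FD 20
    "RTE": (25, 10),  # computed 35, RD 25 is not below FD 20
})
```

With these assumed numbers, RTB and RTC are both successors, RTD is a feasible successor, and RTE is kept in the topology table only as a possible route.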

A router views its feasible successors as neighbors that are downstream, or closer, to the destination than it is. If something goes wrong with the successor, DUAL can quickly identify a feasible successor from the topology table and install a new route to the destination. If no feasible successors to the destination exist, DUAL places the route in the active state. Entries in the topology table can be in one of two states: active or passive. These states identify the status of the route indicated by the entry rather than the status of the entry itself.

A passive route is one that is stable and available for use. An active route is a route in the process of being recomputed by DUAL. Recomputation happens if a route becomes unavailable and DUAL cannot find any feasible successors. When this occurs, the router must ask neighbors for help in finding a new, loop-free path to the destination. Neighbor routers are compelled to reply to this query. If a neighbor has a route, it will reply with information about the successor(s). If not, the neighbor notifies the sender that it does not have a route to the destination either.

Excess recomputation is a symptom of network instability and results in poor performance. To prevent convergence problems, DUAL always tries to find a feasible successor before resorting to a recomputation. If a feasible successor is available, DUAL can quickly install the new route and avoid recomputation.

"Stuck in Active" Routes
If one or more routers to which a query is sent do not respond with a reply within the active time of 180 seconds (3 minutes), the route, or routes, in question are placed in the "stuck in active" state. When this happens, EIGRP clears the neighbors that did not send a reply and logs a "stuck in active" error message for the route(s) that went active.
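
The active timer check can be pictured with a small Python sketch. It only models the rule stated above (no reply within 180 seconds); the function name and its arguments are hypothetical and do not correspond to any IOS interface.

```python
ACTIVE_TIME = 180  # seconds, the active time given in the text


def unresponsive_neighbors(query_sent_at, queried, replied, now):
    """Return the queried neighbors that have not replied in time.

    An empty list means either that the active timer has not expired yet
    or that every neighbor has already replied.
    """
    if now - query_sent_at < ACTIVE_TIME:
        return []
    return [name for name in queried if name not in replied]


# Hypothetical example: RTX replied to the query, RTZ never did.
still_waiting = unresponsive_neighbors(0, ["RTX", "RTZ"], {"RTX"}, now=100)
stuck = unresponsive_neighbors(0, ["RTX", "RTZ"], {"RTX"}, now=181)
```

In this sketch, any neighbor returned after the timer expires is one that EIGRP would clear while logging the "stuck in active" error.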

Not only does the topology table track information regarding route states, but it also can record special information about each route. EIGRP classifies routes as either internal or external. EIGRP uses a process called route tagging to add special tags to each route. These tags identify a route as internal or external, and may include other information as well.

Internal routes originate from within the EIGRP AS. External routes originate from outside the system. Routes learned (redistributed) from other routing protocols, such as RIP, OSPF, and IGRP, are external. Static routes originating from outside the EIGRP AS and redistributed inside are also external routes.

All external routes are included in the topology table and are tagged with the following information:

The identification number (router ID) of the EIGRP router that redistributed the route into the EIGRP network
The AS number of the destination
The protocol used in that external network
The cost or metric received from that external protocol
The configurable administrator tag

The figure shows a specific topology table entry for an external route.

To develop a precise routing policy, take advantage of the route tagging and, in particular, the administrator tag (shown in the shaded portion of the figure). You can configure the administrator tag to be any number between 0 and 255; in effect, this is a custom tag that you can use to implement a special routing policy. External routes can be accepted, rejected, or propagated based on any of the route tags, including the administrator tag. Because you can configure the administrator tag as you see fit, the route-tagging feature affords a high degree of control. This level of precision and flexibility proves especially useful when EIGRP networks interact with BGP networks, which themselves are policy-based. You will learn more about BGP in Chapter 8, BGP, and Chapter 9, Scaling BGP.
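
To make the idea of tag-based policy concrete, the following Python sketch models the tags listed above and rejects routes that carry a particular administrator tag. The ExternalRoute class, its field names, the prefixes, and the tag values are all assumptions for illustration; on a router, such a policy is expressed in configuration, not in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalRoute:
    """Hypothetical model of the tags carried by an external route."""
    prefix: str
    router_id: str          # router that redistributed the route
    external_as: int        # AS number of the destination
    external_protocol: str  # protocol used in the external network
    external_metric: int    # metric received from the external protocol
    admin_tag: int          # configurable administrator tag


def apply_tag_policy(routes, rejected_tags):
    """Keep only the routes whose administrator tag is not rejected."""
    return [r for r in routes if r.admin_tag not in rejected_tags]


routes = [
    ExternalRoute("192.168.10.0/24", "10.0.0.1", 100, "RIP", 2, admin_tag=10),
    ExternalRoute("192.168.20.0/24", "10.0.0.1", 100, "RIP", 3, admin_tag=20),
]
accepted = apply_tag_policy(routes, rejected_tags={20})
```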

DUAL's sophisticated algorithm results in EIGRP's exceptionally fast convergence. To better understand convergence using DUAL, consider the scenario in Figure . RTA can reach Network 24 via three different routers: RTX, RTY, or RTZ.

In Figure , EIGRP's composite metric is replaced by a link cost to simplify calculations. RTA's topology table includes a list of all routes advertised by neighbors. For each network, RTA keeps the real (computed) cost of getting to that network and also keeps the advertised cost (reported distance) from its neighbor, as shown in Figure .

At first, RTY is the successor to Network 24, by virtue of its lowest computed cost. RTA's lowest calculated metric to Network 24 is 31; this value is the FD to Network 24.

What if the successor to Network 24, RTY, becomes unavailable, as shown in Figure ?

RTA follows a three-step process to select a feasible successor to become a successor for Network 24:

1. Determine which neighbors have a reported distance (RD) to Network 24 that is less than RTA's FD to Network 24. The FD is 31; RTX's RD is 30, and RTZ's RD is 220 (see Figure ). Thus, RTX's RD is below the current FD, while RTZ's RD is not.
2. Determine the minimum computed cost to Network 24 from among the remaining routes available. The computed cost via RTX is 40, while the computed cost via RTZ is 230. Thus, RTX provides the lowest computed cost.
3. Determine whether any routers that met the criterion in Step 1 also met the criterion in Step 2. RTX has done both, so it is the feasible successor.

With RTY down, RTA immediately uses RTX (the feasible successor) to forward packets to Network 24. The capability to make an immediate switchover to a backup route is the key to EIGRP's exceptionally fast convergence times. However, what happens if RTX also becomes unavailable, as shown in Figure ?

Can RTZ be a feasible successor? Using the same three-step process as before, RTA finds that RTZ is advertising a cost of 220, which is not less than RTA's FD of 31. Therefore, RTZ cannot be a feasible successor (yet). The FD can change only during an active-to-passive transition, and this did not occur, so it remains at 31. At this point, because there has not been a transition to active state for Network 24, DUAL has been performing what is called a local computation.

RTA cannot find any feasible successors, so it finally transitions from passive to active state for Network 24 and queries its neighbors about Network 24. This process is known as a diffusing computation. When Network 24 is in active state, the FD is reset. This allows RTA to at last accept RTZ as the successor to Network 24.
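
The Network 24 walkthrough can be replayed with a short Python sketch. It uses the link-cost numbers from the example (FD 31, RTX with RD 30 and computed cost 40, RTZ with RD 220 and computed cost 230). The function name is hypothetical, and the diffusing step is heavily simplified: it assumes that the replies confirm the costs already known and that the new FD becomes the computed cost via the new successor.

```python
def dual_failover(fd, neighbors):
    """Pick a new successor after the current one is lost.

    neighbors maps each surviving neighbor to (reported_distance,
    computed_cost). Returns (successor, feasible_distance, computation).
    """
    # Local computation: only neighbors whose RD is below the FD qualify.
    feasible = {name: cost for name, (rd, cost) in neighbors.items() if rd < fd}
    if feasible:
        best = min(feasible, key=feasible.get)
        return best, fd, "local"  # FD is unchanged; the route stays passive

    # No feasible successor: the route goes active and neighbors are queried.
    if not neighbors:
        return None, None, "diffusing"
    best = min(neighbors, key=lambda name: neighbors[name][1])
    # Assumption: after the diffusing computation the FD is reset to the
    # computed cost via the newly chosen successor.
    return best, neighbors[best][1], "diffusing"


# RTY fails: RTX (RD 30 < FD 31) takes over through a local computation.
first = dual_failover(31, {"RTX": (30, 40), "RTZ": (220, 230)})
# RTX fails too: RTZ (RD 220) is not feasible, so RTA goes active.
second = dual_failover(31, {"RTZ": (220, 230)})
```

The first call selects RTX without changing the FD, and the second call can only select RTZ after the diffusing computation resets the FD.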

Despite the complexity of DUAL, configuring EIGRP can be relatively simple. EIGRP configuration commands vary depending on the protocol that is to be routed (e.g., IP, IPX, or AppleTalk). This section covers configuration commands for each of these routed protocols, in addition to special controls for IPX SAP.

Perform the following steps to configure EIGRP for IP:

1. Use the following command to enable EIGRP and define the autonomous system.

router(config)# router eigrp autonomous-system-number

The autonomous-system-number identifies the autonomous system and marks all routers that belong to the internetwork. This value must match on all routers within the internetwork.

2. Indicate which networks are part of the EIGRP autonomous system on the local router.

router(config-router)# network network-number

The network-number determines which interfaces of the router participate in EIGRP and which networks the router advertises.

The network command configures only connected networks. For example, network 3.1.0.0 (on the far left of the main Figure) is not directly connected to Router A. Consequently, that network is not part of Router A's configuration.

3. When configuring serial links using EIGRP, it is important to configure the bandwidth setting on the interface. If you do not change the bandwidth for these interfaces, EIGRP assumes the default bandwidth on the link instead of the true bandwidth. If the link is slower, the router may not be able to converge, routing updates might become lost, or suboptimal path selection may result.

router(config-if)# bandwidth kilobits

The value, kilobits, indicates the intended bandwidth in kilobits per second. For generic serial interfaces (PPP or HDLC), set the bandwidth to the line speed.

Cisco also recommends that you add the following command to all of your EIGRP configurations:

router(config-router)# eigrp log-neighbor-changes

This command enables the logging of neighbor adjacency changes to monitor the stability of the routing system and to help detect problems.

You should follow three rules when configuring EIGRP over an NBMA cloud such as Frame Relay:

EIGRP traffic should not exceed the CIR capacity of the VC (virtual circuit).
EIGRP's aggregated traffic over all the VCs should not exceed the access line speed of the interface.
The bandwidth allocated to EIGRP on each VC must be the same in both directions.

If these rules are understood and followed, EIGRP works well over the WAN. If care is not taken in the configuration of the WAN, EIGRP can swamp the network.

Configuring Bandwidth over a Multipoint Network
The configuration of the bandwidth command in an NBMA cloud depends on the design of the VCs. If the serial line has many VCs in a multipoint configuration, and all of the VCs share bandwidth evenly, set the bandwidth to the sum of all of the CIRs. For example, in Figure , each VC's CIR is set to 56 Kbps. Since there are 4 VCs, the bandwidth is set to 224 (4 x 56).

Configuring Bandwidth over a Hybrid Multipoint Network
If the multipoint network has differing speeds allocated to the VCs, a more complex solution is needed. There are two main approaches.

Take the lowest CIR and multiply this by the number of VCs. As shown in Figure , this is applied to the physical interface. The problem with this configuration is that the higher-bandwidth links may be underutilized.
Use subinterfaces. The bandwidth command may be configured on each subinterface, which allows different speeds on each VC. In this case, subinterfaces are configured for the links with the differing CIRs. The links that have the same configured CIR are presented as a single subinterface with a bandwidth that reflects the aggregate CIR of all the circuits. In Figure , three of the VCs have the same CIR, 256 Kbps. All three VCs are grouped together as a multipoint subinterface, serial 0.1. The single remaining VC, which has a lower CIR, 56 Kbps, can be assigned a point-to-point subinterface, serial 0.2.
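
The bandwidth arithmetic for these multipoint designs is easy to check with a few lines of Python. The function names below are hypothetical; the CIR values come from the examples in the text (four VCs at 56 Kbps, and three VCs at 256 Kbps plus one at 56 Kbps).

```python
def multipoint_bandwidth(cirs):
    """Pure multipoint, VCs sharing bandwidth evenly: sum of all CIRs."""
    return sum(cirs)


def lowest_cir_bandwidth(cirs):
    """Hybrid approach 1: lowest CIR multiplied by the number of VCs."""
    return min(cirs) * len(cirs)


def subinterface_bandwidths(cirs):
    """Hybrid approach 2: group VCs by CIR, one subinterface per group.

    Returns a mapping of CIR to the aggregate bandwidth of its group.
    """
    groups = {}
    for cir in cirs:
        groups[cir] = groups.get(cir, 0) + cir
    return groups


even = multipoint_bandwidth([56, 56, 56, 56])           # bandwidth 224
hybrid = [256, 256, 256, 56]
physical = lowest_cir_bandwidth(hybrid)                 # 56 x 4
per_subinterface = subinterface_bandwidths(hybrid)      # 768 and 56
```

The comparison shows why the first hybrid approach can underuse the faster links: it treats all four VCs as 56 Kbps circuits, while grouping the 256 Kbps VCs on one subinterface preserves their aggregate of 768 Kbps.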

The bandwidth-percent command configures the percentage of bandwidth that may be used by EIGRP on an interface. By default, EIGRP is set to use only up to 50% of the bandwidth of an interface to exchange routing information. In order to calculate its percentage, the bandwidth-percent command relies on the value set by the bandwidth command.

Use the bandwidth-percent command when the bandwidth setting of a link does not reflect its true speed. The bandwidth value may be artificially low for a variety of reasons, such as to manipulate the routing metric, or to accommodate an oversubscribed multipoint Frame Relay configuration. Regardless of the reasons, configure EIGRP to overcome an artificially low bandwidth setting by setting the bandwidth-percent to a higher number. In some cases, it may even be set to a number above 100.

For example, assume that the actual bandwidth of a router's serial link is 64 Kbps, but the bandwidth value is set artificially low, to 32 Kbps. The figure shows how to modify EIGRP's behavior so that it limits routing protocol traffic according to the actual bandwidth of the serial interface. The example configuration sets serial 0's bandwidth-percent to 100 percent for the EIGRP process running in AS 24. Since 100 percent of 32 Kbps is 32 Kbps, EIGRP will be allowed to use half of the actual bandwidth of 64 Kbps.
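
The relationship between the bandwidth value and the bandwidth-percent setting is simple arithmetic, sketched here in Python. The function name is hypothetical; the 50 percent default and the 32 Kbps and 64 Kbps figures come from the text.

```python
def eigrp_bandwidth_limit(configured_kbps, percent=50):
    """Bandwidth (in Kbps) that EIGRP may use on an interface.

    The percentage is applied to the value set by the bandwidth command,
    not to the true speed of the link.
    """
    return configured_kbps * percent / 100


actual_kbps = 64
configured_kbps = 32  # set artificially low

default_limit = eigrp_bandwidth_limit(configured_kbps)      # 16 Kbps
tuned_limit = eigrp_bandwidth_limit(configured_kbps, 100)   # 32 Kbps
share_of_actual = tuned_limit / actual_kbps                 # 0.5
over_100 = eigrp_bandwidth_limit(configured_kbps, 200)      # 64 Kbps
```

With the default of 50 percent, the artificially low bandwidth value would restrict EIGRP to 16 Kbps, or a quarter of the real link. Raising the percentage to 100 restores the intended half of the actual bandwidth.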

Note that you can change EIGRP's percentage of bandwidth for IP, IPX, and AppleTalk with the following commands:

ip bandwidth-percent eigrp
ipx bandwidth-percent eigrp
appletalk eigrp-bandwidth-percent

To enable EIGRP for IPX, perform the following steps:

1. Enable IPX routing.

router(config)# ipx routing

2. Define EIGRP as the IPX routing protocol.

router(config)# ipx router {eigrp autonomous-system-number | rip}

If IPX EIGRP is selected, an autonomous system number must be specified. This number must be the same for all IPX EIGRP routers in the network, as shown in the figure.

3. Indicate which networks are part of the EIGRP autonomous system.

router(config-ipx-router)# network network-number

4. (Optional) If IPX RIP is also operating on the router, remove RIP from the networks using EIGRP by entering the ipx router rip configuration mode and issuing the following command:

router(config-ipx-router)# no network network-number

By default, Cisco routers redistribute IPX RIP routes into IPX EIGRP, and vice versa. When routes are redistributed, a RIP route to a destination with a hop count of 1 is always preferred over an EIGRP route with a hop count of 1. This ensures that the router always believes a Novell IPX server over a Cisco router for internal IPX networks. (The only exception to this rule is if both the RIP and EIGRP updates were received from the same router. In this case, the EIGRP route always is preferred over the RIP route when the hop counts are the same.)

Controlling IPX RIP
IPX RIP runs by default when IPX routing is enabled. If a legacy Novell server is using IPX RIP, a router's LAN interface must also run IPX RIP to exchange routing information with the server. Because the IPX RIP routes are redistributed into EIGRP, the router does not need to run IPX RIP on a serial link to another Cisco router. IPX EIGRP should be used instead. An administrator can disable IPX RIP on a network-by-network basis using the no network command, as shown in step 4, above.

EIGRP offers other advantages on IPX WAN links, including control of SAP updates, which is discussed in the following section.

If an IPX EIGRP router has another IPX EIGRP router as its link partner, you can configure the router to send SAP updates either periodically or only when a change occurs in the SAP table. When no IPX EIGRP peer is present on the interface, periodic SAPs are always sent.

On serial lines, by default, if an EIGRP neighbor is present, the router sends SAP updates only when the SAP table changes. Overhead is greatly reduced if a router updates other routers only when a change occurs.

On Ethernet, Token Ring, and FDDI interfaces, by default, the router sends SAP updates periodically. To reduce the amount of bandwidth required to send SAP updates, you might want to disable the periodic sending of SAP updates on LAN interfaces. Do this only when all nodes out this interface are EIGRP peers; otherwise, loss of SAP information on the other nodes will result. If a router's LAN interface connects to a NetWare server, as shown in the figure, do not disable periodic updates. However, the figure shows that incremental SAP updates on RTC's E0 can safely be configured.
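
The default SAP behavior described above can be summarized as a small decision function. This Python sketch only restates the defaults from the text; the function name and the interface-type strings are hypothetical, and interface types that the text does not mention are rejected instead of guessed.

```python
LAN_TYPES = {"ethernet", "token ring", "fddi"}


def default_sap_mode(interface_type, eigrp_peer_present):
    """Return the default SAP update mode for an IPX EIGRP interface."""
    if not eigrp_peer_present:
        # Without an IPX EIGRP peer, periodic SAPs are always sent.
        return "periodic"
    if interface_type == "serial":
        # Serial lines with an EIGRP neighbor update only on changes.
        return "incremental"
    if interface_type in LAN_TYPES:
        # LAN interfaces default to periodic updates.
        return "periodic"
    raise ValueError(f"interface type not covered by the text: {interface_type}")
```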

To configure incremental SAP updates using EIGRP, issue the ipx sap-incremental eigrp command, which has the following syntax:

router(config-if)#ipx sap-incremental eigrp autonomous-system-number [rsup-only]

The rsup-only keyword is used to indicate that on this interface the system uses EIGRP to carry reliable SAP update information only. RIP routing updates are used, and EIGRP routing updates are ignored.

Configure incremental SAP for RTC as shown in Figure .

Note that in Figure , RTC does not need to run IPX RIP. Thus, it is explicitly disabled by using the command no ipx router rip, as shown in the configuration in Figure .

EIGRP automatically summarizes routes at the classful boundary (that is, the boundary where the network address ends as defined by class-based addressing). This means that even though RTC is connected only to the subnet 2.1.1.0, it will advertise that it is connected to the entire Class A network, 2.0.0.0. In most cases, auto summarization is a good thing; it keeps routing tables as compact as possible (see Figure ).
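
Classful summarization can be reproduced with Python's standard ipaddress module. The helper below is a minimal sketch, and the function name is hypothetical; it derives the classful network from the first octet of an address, which is the boundary at which auto-summarization takes place.

```python
import ipaddress


def classful_network(address):
    """Return the classful (Class A, B, or C) network for an IPv4 address."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet < 128:
        prefix = 8    # Class A
    elif first_octet < 192:
        prefix = 16   # Class B
    elif first_octet < 224:
        prefix = 24   # Class C
    else:
        raise ValueError("not a Class A, B, or C address")
    # strict=False masks off the host bits instead of raising an error.
    return ipaddress.ip_network(f"{address}/{prefix}", strict=False)


summary = classful_network("2.1.1.0")
```

For the subnet 2.1.1.0 in the example, the result is the Class A network 2.0.0.0/8, which is what RTC advertises when auto-summarization is left on.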

However, as we saw in Chapter 2, IP Addressing, you may not want automatic summarization to occur. If you have discontiguous subnetworks, as shown in Figure , auto-summarization must be disabled for routing to work properly. To turn off auto-summarization, use the following command:

router(config-router)#no auto-summary

EIGRP also enables you to manually configure a prefix to use as a summary address. Manual summary routes are configured on a per-interface basis, so you must first select the interface that will propagate the route summary. Then you can define the summary address with the ip summary-address eigrp command, which has the following syntax:

router(config-if)#ip summary-address eigrp autonomous-system-number ip-address mask [administrative-distance]

EIGRP summary routes have an administrative distance of 5 by default. Optionally, they can be configured for a value between 1 and 255.

In the figure, RTC can be configured using the commands shown:

RTC(config)#router eigrp 2446
RTC(config-router)#no auto-summary
RTC(config-router)#exit
RTC(config)#interface serial0
RTC(config-if)#ip summary-address eigrp 2446 2.1.0.0 255.255.0.0

Thus, RTC will add a route to its table, as follows:

D 2.1.0.0/16 is a summary, 00:00:22, Null0

Notice that the summary route is sourced from Null0, and not an actual interface. That's because this route is used for advertisement purposes and does not represent a path that RTC can take to reach that network. On RTC, this route has an administrative distance of 5.

In the figure, RTD is oblivious to the summarization but accepts the route, and it assigns the route the administrative distance of a "normal" EIGRP route (which is 90, by default). In the configuration for RTC, automatic summarization is turned off with the no auto-summary command. If it were not, RTD would receive two routes: the manual summary address (2.1.0.0 /16) and the automatic, classful summary address (2.0.0.0 /8).

In most cases, when you manually summarize, you should also issue the no auto-summary command.

Throughout this chapter, you have seen EIGRP show commands used to verify EIGRP operation. The figure lists the key EIGRP show commands and briefly describes their functions.

The Cisco IOS debug feature also provides useful EIGRP monitoring commands, as listed in Figure .

In this chapter, you learned that EIGRP, a routing protocol developed by Cisco, is an advanced distance-vector routing protocol that uses the DUAL algorithm. It includes features such as rapid convergence, reduced bandwidth usage, and multiple network-layer support.

You also learned that EIGRP converges rapidly, performs incremental updates, routes IP, IPX, and AppleTalk traffic, and summarizes routes. You learned how to configure and verify EIGRP configuration for various protocols.

In the next chapter, you will learn how to optimize routing operations using static routes, default routes, and route filtering.

Dynamic routing, even in small internetworks, can involve much more than just enabling the default behavior of a routing protocol. A few simple commands may be enough to get dynamic routing started, but more advanced configuration must be done to enable such features as routing update control and exchanges among multiple routing protocols. You can optimize routing in a network by controlling when a router exchanges routing updates and what those updates contain. This chapter examines the key IOS route optimization features, including routing update control, policy-based routing, and route redistribution.

Consider RTA in the figure, which is running a simple distance-vector routing protocol, RIP.

The network 10.0.0.0 command does two things. First, it tells RIP where to send and receive advertisements (which interfaces to send and receive updates on). The network 10.0.0.0 command enables RIP updates on all interfaces that have an IP address belonging to the 10.0.0.0 network (Bri0, S1, S2, and E0). Second, this command tells the RIP process what to advertise. All directly connected subnets belonging to the major network 10.0.0.0 are included in RIP updates, in addition to any dynamically learned routes. That means that RTA advertises the following networks: 10.1.1.0, 10.2.2.0, 10.3.3.0, and 10.4.4.0.
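
The two effects of the network command can be illustrated with a short Python sketch based on the standard ipaddress module. The function name is hypothetical. The interface addresses, the /24 masks, and the pairing of Bri0, S1, and S2 with particular subnets are assumptions made for the example, as is the Lo0 interface, which is included only to show an address outside the major network.

```python
import ipaddress


def network_statement_effect(major_network, interfaces):
    """Return (enabled interfaces, advertised subnets) for one network statement.

    interfaces maps an interface name to its address in prefix notation.
    """
    major = ipaddress.ip_network(major_network)
    enabled, advertised = [], []
    for name, address in interfaces.items():
        iface = ipaddress.ip_interface(address)
        if iface.ip in major:
            enabled.append(name)  # updates are sent and received here
            advertised.append(str(iface.network.network_address))
    return enabled, advertised


# Hypothetical addressing for RTA.
rta_interfaces = {
    "Bri0": "10.1.1.1/24",
    "S1": "10.2.2.1/24",
    "S2": "10.3.3.1/24",
    "E0": "10.4.4.1/24",
    "Lo0": "192.168.1.1/24",  # outside 10.0.0.0, so it is not selected
}
enabled, advertised = network_statement_effect("10.0.0.0/8", rta_interfaces)
```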

Unfortunately, the default behavior of RIP, or any routing protocol, may not be the best thing for an internetwork. Look again at the figure. Is it useful for RTA to send updates on all four interfaces?

Updating out E0 is a waste of resources. No other routers on the 10.4.4.0 subnetwork can receive the updates, so they serve no purpose. Meanwhile, sending updates creates slight (and needless) overhead and a potential security risk. (A malicious user could use a packet sniffer to capture routing updates and thus glean key network information.)

For these reasons, you can configure passive interfaces or route filters to control routing updates. Both strategies are discussed in the following sections.

You can configure LAN interfaces as passive interfaces when enabling a routing protocol. A passive interface receives updates, but does not send them. The passive-interface command can be used with all IP interior gateway protocols (that is, RIP, IGRP, EIGRP, OSPF, and IS-IS). The syntax of this command is as follows:

Router(config-router)# passive-interface type number

You can configure RTA's E0 as a passive interface, as shown in Figure .

You can also use the passive-interface command on WAN interfaces to prevent routers from sending updates to link partners.

There may be several reasons to prevent updates on the WAN. If RTA and RTX are connected by a dial-on-demand ISDN link, regular RIP updates will keep the link up constantly and will result in an unnecessarily large bill from the provider. Instead, a static route can be configured on both routers with RTA's Bri0 configured as a passive interface.

Note that for RTA to update the other routers (RTY and RTZ) about the route to 172.16.1.0, RTA must be configured to redistribute static routes into RIP. The redistribute static command tells RIP to import the static routes into RIP and advertise them as part of a RIP update. Route redistribution is covered in more detail later in this chapter.

The passive-interface command works differently with the different IP routing protocols that support it. In OSPF, the network address of the passive interface appears as a stub network. OSPF routing information is neither sent nor received via a passive interface. In EIGRP and OSPF, the router stops sending hello packets on passive interfaces. When this happens, the router cannot form neighbor adjacencies, and thus cannot send and receive routing updates on the interface. You will see later in this chapter that the passive effect can be achieved for an EIGRP interface (without preventing adjacency relationships) by using the distribute-list command.

Configuring an interface as passive prevents it from sending updates entirely, but sometimes you need to suppress only certain routes in the update from being sent or received. If RTA in Figure is configured with the network 10.0.0.0 command, all four directly connected subnets will be advertised in RTA's updates, along with any dynamically learned routes. However, you may want to prevent RTZ from learning about network 10.1.1.0 from RTA.

This may be needed to enforce a routing policy that is based on some external factor such as link expense, administrative jurisdiction, or security concerns. In some cases, you may just want to reduce needless overhead by preventing access routers from receiving the complete (and possibly immense) core routing table. Just assume that for one of these reasons, you do not want RTZ learning the route to 10.1.1.0 from RTA.

You can use the distribute-list command to pick and choose which routes a router will send or receive updates about. By referencing an access list, the distribute-list command creates a route filter: a set of rules that precisely controls what routes a router will send or receive in a routing update. This command is available for all IP routing protocols and can be applied to either inbound or outbound routing updates. When applied to inbound updates, the syntax for configuring a route filter is as follows:

Router(config-router)# distribute-list access-list-number in [interface-name]

When applied to outbound updates, the syntax can be more complicated:

Router(config-router)# distribute-list access-list-number out [interface-name | routing-process | as-number]

The routing-process and as-number options are invoked when exchanging routes between different routing protocols. This will be covered later in the chapter, in the section, "Using Multiple Routing Protocols."

In Figure , access list 24 will match the route to 10.1.1.0 and result in a deny. When referenced by the distribute-list command, this match results in the removal of the route to network 10.1.1.0 in the outbound update. However, there is a catch. The distribute-list 24 out command will have a global effect on RIP updates out every interface, not just out the interface connected to RTZ. Your intent was to suppress the 10.1.1.0 route from updates to RTZ only. This level of specificity can be accomplished by using an optional interface argument with the command, as shown:

RTA(config-router)# distribute-list 24 out s2

Conversely, we could have told RTZ to globally filter network 10.1.1.0 from any incoming updates, as shown in Figure .

Or, you could have precisely filtered 10.1.1.0 from the specific interface on RTZ, as shown:

RTZ(config-router)# distribute-list 16 in s0

The distribute-list command can filter any routes in either an outbound or an inbound update globally, or for a specific interface. The Cisco IOS permits one incoming and one outgoing global distribute-list for each routing process, as well as one incoming and one outgoing distribute-list for each interface involved in a routing process. You can keep track of which routing filters are applied globally and which are applied on specific interfaces with the show ip protocols command, as shown in the figure.

Configuring a Passive EIGRP Interface Using the distribute-list Command
A passive interface cannot send EIGRP hellos, which prevents it from forming adjacencies with link partners. You can create a "pseudo" passive EIGRP interface by using a route filter that suppresses routes from the EIGRP routing update, as shown:

RTA(config)# router eigrp 364
RTA(config-router)# network 10.0.0.0
RTA(config-router)# distribute-list 5 out s0
RTA(config-router)# exit
RTA(config)# access-list 5 deny any

With this configuration, RTA can send EIGRP hellos and establish adjacencies, but no routes will appear in any updates sent out s0.

You can use the ip route command to dictate which path a router will select to a given destination. However, through policy routing, you can program a router to choose a route based not only on destination, but on source as well.

Concerns such as monetary expense, organizational jurisdiction, or security issues can lead administrators to establish policies, or rules that routed traffic should follow. Left to their default behavior, routing protocols may arrive at path decisions that conflict with these policies. For that reason, administrators use policy routing to override dynamic routing and take precise control of how their routers handle certain traffic.

Although policy routing can be used to control traffic within an AS, it is typically used to control routing between autonomous systems. For that reason, policy routing is used extensively with exterior gateway protocols (EGPs), such as BGP.

The route-map command is used to configure policy routing, which is often a complicated task. A route map is defined using the syntax shown in the figure.

The map-tag is the name, or ID, of the route map. You can set this to something easily recognizable, such as route2ISP or CHANGEROUTE. The route-map command changes the router's mode to the route-map configuration mode, from which you can configure conditions for the route map.

Route maps operate similarly to access lists in that they examine one line at a time, and when a match is found, action is taken. Route maps differ from numbered access lists in that they can be modified without changing the entire list. Each route map statement is given a number. If a sequence number is not specified, the first route map statement is automatically numbered 10, the second is automatically numbered 20, and so on. The optional sequence-number can be used to indicate the position that a new route map statement is to have in the list of statements already configured under the same name.

After you have entered the route-map command, you can enter set and match commands in the route-map configuration mode. Each route-map command has a list of match and set commands associated with it. The match commands specify the match criteria: the conditions that should be tested to determine whether to take action. The set commands specify the set actions: the actions to perform if the match criteria are met.
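
The syntax figure is not reproduced here. The general form of the command is route-map map-tag [permit | deny] [sequence-number]. As a minimal sketch, a route map with two statements could be entered as follows. The name CHANGEROUTE is borrowed from the text, while the access list numbers, the next-hop address, and the interface are placeholders chosen only for illustration.

```
RTA(config)# route-map CHANGEROUTE permit 10
RTA(config-route-map)# match ip address 1
RTA(config-route-map)# set ip next-hop 10.0.0.1
RTA(config-route-map)# exit
! A statement later added with sequence number 15 would be evaluated between 10 and 20
RTA(config)# route-map CHANGEROUTE permit 20
RTA(config-route-map)# match ip address 2
RTA(config-route-map)# set interface s1
```

Leaving gaps between sequence numbers is what makes it possible to insert a new statement later without re-entering the whole route map.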

The figure presents a policy-routing scenario. You can use a route map at RTA to implement policy routing. Assume for this example that the policy you want to enforce is as follows: Internet-bound traffic from 192.168.1.0/24 is to be routed to ISP1, and Internet-bound traffic from 172.16.1.0/24 is to be routed to ISP2.

First, define the access lists that will be used in the route maps to match IP addresses; then configure the route map itself using the syntax shown in the figure.

The commands in the figure actually configure two policies. The ISP1 route map matches access list 1 and routes traffic out S0 toward ISP1. The ISP2 route map matches access list 2 and routes that traffic out S1 toward ISP2.

The final step is to apply each route map to the appropriate interface on RTA using the ip policy route-map command, as shown in the figure. With the route maps applied to the appropriate LAN interfaces, policy routing is successfully implemented.
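
Because the configuration figures are not reproduced here, the following is only a sketch of how the three steps could fit together. The access list numbers, route map names, and serial interfaces follow the description above; the LAN interface names e0 and e1 are assumptions.

```
! Step 1: access lists that identify the source networks
RTA(config)# access-list 1 permit 192.168.1.0 0.0.0.255
RTA(config)# access-list 2 permit 172.16.1.0 0.0.0.255
! Step 2: one route map per policy
RTA(config)# route-map ISP1 permit 10
RTA(config-route-map)# match ip address 1
RTA(config-route-map)# set interface s0
RTA(config-route-map)# exit
RTA(config)# route-map ISP2 permit 10
RTA(config-route-map)# match ip address 2
RTA(config-route-map)# set interface s1
RTA(config-route-map)# exit
! Step 3: apply each route map to the LAN interface where the traffic arrives
! (e0 and e1 are assumed interface names)
RTA(config)# interface e0
RTA(config-if)# ip policy route-map ISP1
RTA(config-if)# exit
RTA(config)# interface e1
RTA(config-if)# ip policy route-map ISP2
```

Policy routing is applied to the inbound interface, which is why the route maps sit on the LAN side rather than on the serial links toward the ISPs.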

Frequently, route maps are used to control the exchange of routing information during redistribution. Route redistribution is detailed in the next section.

To support multiple routing protocols within the same internetwork efficiently, routing information must be shared among the different routing protocols. For example, routes learned from a RIP process may need to be imported into an IGRP process. This process of exchanging routing information between routing protocols is called route redistribution. Such redistribution can be one-way (that is, one protocol receives the routes from another) or two-way (that is, both protocols receive routes from each other). Routers that perform redistribution are called boundary routers because they border two or more autonomous systems or routing domains. This section examines route redistribution in detail, including the use of administrative distance, guidelines for redistribution implementation, and issues with redistribution configuration.

Because using multiple routing protocols typically results in increased administrative complexity and overhead, you may wonder why it is done in the first place. Actually, there are several scenarios in which using multiple routing protocols solves more problems than it creates, especially in medium-sized and large networks.

Consider a large, mixed-vendor routing environment in which Cisco routers work alongside other routers. An administrator may create an "all-Cisco" domain, where the advantages of proprietary protocols such as IGRP and EIGRP can be enjoyed. Meanwhile, other areas of the network run a nonproprietary protocol, such as OSPF or RIP, as shown in the figure.

Multiple routing protocols may also be effectively deployed to support legacy UNIX systems that support RIP only. These systems may represent a significant financial investment and may not be readily upgradeable. As shown in the figure, an administrator may elect to run RIP on subnets populated by the UNIX systems but might use a more scalable protocol elsewhere. Also, running multiple routing protocols can be seen as a temporary fix during a prolonged upgrade from older protocols and hardware to newer, more scalable solutions.

On some occasions redistribution is done even when running compatible routing platforms (for example, all Cisco routers running EIGRP). If an organization is exchanging routing information with a domain outside its administrative control, it may choose to configure route redistribution as a means of logically separating the different routing processes, which may have different policies.

Cisco routers support up to 30 dynamic routing processes. This means that a router can run RIP, OSPF, IGRP, IS-IS, EIGRP, IPX RIP, RTMP (AppleTalk), and other protocols simultaneously. Most of these routing protocols allow you to configure multiple processes of the same routing algorithm; RIP is a notable exception. For example, you can define multiple IGRP processes by using different AS numbers, or different OSPF processes by using different process ID numbers, as shown in the figure.

Note that in the configuration shown in the figure, RTA's OSPF processes (24 and 46) will not share routing information unless route redistribution is configured. Because each routing process places substantial demands on the router's memory and CPU resources, only boundary routers should run more than one routing process for the same routed protocol, and only when absolutely necessary.
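
The figure is not reproduced here. As a sketch, two independent OSPF processes on RTA could be defined as follows. The process IDs 24 and 46 come from the text; the network statements and area numbers are assumptions made only to complete the example.

```
RTA(config)# router ospf 24
! Assumed network for illustration
RTA(config-router)# network 192.168.1.0 0.0.0.255 area 0
RTA(config-router)# exit
RTA(config)# router ospf 46
! Assumed network for illustration
RTA(config-router)# network 192.168.2.0 0.0.0.255 area 0
```

Each process keeps its own link-state database, so routes learned by process 24 stay invisible to process 46 until a redistribute command is added.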

If a boundary router is running multiple IP routing protocols, the router may learn about the same network from more than one routing protocol. For example, RTZ may learn about the 10.0.0.0 network from both RIP and IGRP (see the figure). Which route will RTZ install in its routing table?

A router looks at the metric value to determine the best route. However, in this case, the router would have to compare RIP's simple metric, hop count, with IGRP's composite metric, derived from bandwidth, delay, reliability, load, and MTU. As noted in Chapter 3, Routing Overview, there is no way to precisely compare what are, in effect, apples and oranges. In the figure, IGRP's metric of 10576 cannot be accurately measured against RIP's metric of 3 for the same route. Instead, routers use administrative distance to choose between routes to the same network offered by different routing protocols.

A routing protocol's administrative distance rates its trustworthiness as a source of routing information. Administrative distance is an integer from 0 to 255. The lowest administrative distance has the highest trust rating. An administrative distance of 255 means the routing information source cannot be trusted at all and should be ignored. An administrative distance of zero is reserved for directly connected interfaces and will always be preferred.

Specifying administrative distance values enables the Cisco IOS software to discriminate between sources of routing information. If two routes have the same network number (and possibly subnet information), the IOS software always picks the route whose routing protocol has the lowest administrative distance. Although you cannot easily compare apples with oranges, you can instruct the router to always choose oranges over apples. The figure shows the default administrative distances for some routing information sources.

The IGRP route will be preferred, or "trusted," over the RIP route to the same network, by virtue of a lower administrative distance (IGRP's 100 vs. RIP's 120). Of course, there may be times that you may actually want the router to believe RIP over IGRP for some reason. Fortunately, the Cisco IOS allows you to manually configure administrative distance, as discussed in the next section.

When using multiple IP routing protocols on a router, the default distances usually suffice. However, some circumstances call for changing the administrative distance values on a router.

For example, if a router is running both IGRP and OSPF, it may receive routes to the same network from both protocols. The default administrative distances favor IGRP routes over OSPF routes, as shown in the figure. However, because IGRP does not support CIDR, you may want the router to use the OSPF route instead. In this case, you can configure the local router to apply a custom administrative distance to all OSPF routes, as shown in the figure.

With the distance 95 command configured under the OSPF process, RTZ compares the IGRP and OSPF routes and comes up with a different result: the OSPF route (95) is now preferred over the IGRP route (100).

In its broadest application, the distance command can be used to modify the administrative distance value applied to all routes learned via a specific routing process. The commands in the figure will assign the value of 95 to all routes learned by the OSPF 1 process. Note that these values are local to the router. Although RTZ assigns the 10.0.0.0 network an administrative distance of 95, all other Cisco OSPF routers will apply a value of 110, unless otherwise configured.
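
The figure is not reproduced here; based on the description above, the commands it refers to would look roughly like this sketch:

```
RTZ(config)# router ospf 1
! Local to RTZ only: every route learned through OSPF process 1 gets distance 95
RTZ(config-router)# distance 95
```

Because 95 is lower than IGRP's default of 100, RTZ now prefers the OSPF route whenever both protocols offer a path to the same network.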

You can also apply the distance command with optional arguments to make changes to selected routes based on where they originate. The expanded syntax of the distance command is as follows:

Router(config-router)# distance weight [source-ip-address source-wildcard-mask [access-list-number | name]]

After running multiple protocols on a boundary router, you may discover that one or two suboptimal paths have been installed because of their lower administrative distance. Rather than assign a new distance value to all routes learned by a process, you can identify specific routes based on their source IP address. Using the optional arguments, you can configure a router to apply an administrative distance of 105 to all RIP routes received from 10.4.0.2. The mask is written in wildcard (inverted) form, so 0.0.0.0 matches that single source address:

RTZ(config)# router rip
RTZ(config-router)# distance 105 10.4.0.2 0.0.0.0

Alternatively, you can apply an administrative distance value to only certain routes from that same source by specifying an access list.
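
As a sketch of this second form, the following applies the distance of 105 only to one network learned from 10.4.0.2. The access list number 10 and the network 192.168.3.0 are hypothetical values chosen for illustration. The source mask is given in wildcard form, so 0.0.0.0 matches the single address 10.4.0.2.

```
! Hypothetical access list naming the route whose distance should change
RTZ(config)# access-list 10 permit 192.168.3.0 0.0.0.255
RTZ(config)# router rip
! Distance 105 for routes from 10.4.0.2 that are permitted by access list 10
RTZ(config-router)# distance 105 10.4.0.2 0.0.0.0 10
```

Other RIP routes from 10.4.0.2, and all RIP routes from other neighbors, keep the default distance of 120.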

Remember that the administrative distance defaults exist for a reason and will serve a network well in most circumstances. Use the distance command only when you are certain that it is necessary to guarantee optimal routing.

Route redistribution can be tricky business, with several pitfalls:

Routing loops - Depending on how you use redistribution, routers can send routing information received from one AS back into the AS. The feedback is similar to the split-horizon problem that occurs in distance-vector technologies.
Incompatible routing information - Each routing protocol uses different metrics. Because these metrics cannot be translated exactly into a different protocol, path selection using the redistributed route information may not be optimal.
Inconsistent convergence time - Different routing protocols converge at different rates. For example, RIP converges more slowly than EIGRP, so if a link goes down, the EIGRP network will learn about it before the RIP network.

These potential trouble spots can be avoided with careful planning and implementation. Be sure to follow these important guidelines when configuring route redistribution:

Be familiar with your network. There are many ways to implement redistribution, so knowing your network will enable you to make the best decision.
Do not overlap routing protocols. Do not run two different protocols in the same internetwork. Instead, have distinct boundaries between networks that use different routing protocols.
Use one-way redistribution with multiple boundary routers. If more than one router serves as a redistribution point, use one-way redistribution to avoid routing loops and convergence problems. Consider using default routes in the domains that do not import external routes.
Use two-way redistribution with a single boundary router. Two-way redistribution works smoothly when redistribution is configured on a single boundary router in the internetwork. If you have multiple redistribution points, do not use two-way redistribution unless you enable a mechanism to reduce the chances of routing loops. A combination of default routes, route filters, and distance modifications can be used to combat routing loops.

Although the redistribute command is available for all IP routing protocols, it behaves differently depending on the actual IP routing protocols involved. The underlying principles, however, are the same, so the examples in this section can be used as a starting point for any redistribution scheme.

This section closely examines examples of one-way and two-way redistribution and then focuses on specific redistribution issues, including connected routes, static routes, and the default-metric command.

In the figure, RTB injects routes learned via RIP into the EIGRP domain. However, the RIP routers do not learn about the EIGRP routes. This is one-way route redistribution. In this example, the RIP routers can use a default route to handle any traffic bound for non-local destinations.

As the AS boundary router, RTB must run two routing processes: one for the RIP domain and one for the EIGRP AS, as shown in the figure.

The redistribute rip command enables route redistribution: RIP routes learned by RTB will be imported into the EIGRP process. The metric argument sets up the values used by EIGRP to translate the metric from RIP's hop count to EIGRP's composite metric. When used with IGRP/EIGRP, the metric keyword sets the bandwidth value (in kbps), the delay (in tens of microseconds), the reliability (out of 255), the load (out of 255), and, finally, the maximum transmission unit (MTU).

These five values constitute the seed metric in the example. The seed metric is the initial metric value of an imported route. After it is imported into the EIGRP AS, the RIP route becomes an EIGRP route with a composite metric derived from these seed values. So, using the above configuration, RIP routes with metrics of 2, 6, and 14 will all be redistributed with the same EIGRP metric value (2195456). However, as the imported route propagates to other EIGRP routers, its metric values increment according to EIGRP rules.
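
The configuration figure is not reproduced here, so the following is only a sketch of what RTB's one-way redistribution might look like. The AS number (24) and the network statements come from later in this section. The seed values 1544 and 2100 are an assumption, chosen because they reproduce the metric of 2195456 quoted above when EIGRP's K values are left at their defaults.

```
RTB(config)# router eigrp 24
RTB(config-router)# network 172.24.0.0
! Seed metric: bandwidth (kbps), delay (tens of microseconds), reliability, load, MTU
RTB(config-router)# redistribute rip metric 1544 2100 255 1 1500
RTB(config-router)# exit
RTB(config)# router rip
RTB(config-router)# network 172.16.0.0
!
! With default K values: metric = 256 * (10000000 / bandwidth + delay)
!   10000000 / 1544 = 6476 (integer division)
!   256 * (6476 + 2100) = 256 * 8576 = 2195456
```

Reliability, load, and MTU are carried with the route but do not enter the calculation under the default K values, which is why only the first two seed values affect the result.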

The figure shows the routing tables of the EIGRP router, RTA; the boundary router, RTB; and the RIP router, RTC. (The tables have been reformatted for the sake of clarity.)

RTA's routing table includes not only the EIGRP routes from AS 24, but also the redistributed routes from the RIP domain. The redistributed RIP routes that have been learned from RTB are denoted by "D EX" because EIGRP considers them external. As discussed in Chapter 6, EIGRP differentiates between internal routes (routes learned from within the AS) and external routes (imported from outside the AS). The Cisco IOS even assigns a different, and much less desirable, administrative distance to external EIGRP routes: 170.

RTB's table shows that RTB is running two routing protocols and has learned routes via RIP (denoted by R) and learned routes via EIGRP (denoted by D).

Notice that RTC does not have a default route and has not learned about any routes from the boundary router, RTB. That means that RTC cannot route to 6 of the 12 networks shown in the outputs in the figure. You may decide that the best solution in this scenario is to use a default route that points to RTB. This can easily be accomplished statically, as shown:

RTC(config)# ip route 0.0.0.0 0.0.0.0 172.16.0.1

Because RTC is running RIP, it can dynamically propagate its 0.0.0.0/0 route to the other routers in the RIP domain. If you choose to implement this default route configuration, there is no need for the boundary router (RTB) to send updates into the RIP domain. Thus, you should configure RTB's RIP interface as passive, as shown:

RTB(config)# router rip
RTB(config-router)# passive-interface s0

A more complex topology may require that we employ two-way, or mutual, redistribution by importing the EIGRP routes into the RIP domain, as described in the next section.

You can configure the boundary router for two-way redistribution, as shown in the figures.

Notice that the syntax of the metric keyword varies depending on the routing protocol that it uses. For RIP, OSPF, and BGP, the metric option is followed by a single number that represents the metric value (hop count, cost, and so on). For IGRP and EIGRP, the metric option is followed by five values that represent bandwidth, delay, reliability, load, and MTU.

Note: Whenever there is a major network that is subnetted, you need to use the keyword subnets to redistribute protocols into OSPF. Without this keyword, OSPF redistributes only major nets that are not subnetted. For example, to inject EIGRP routes, including subnets, into an OSPF area, use the command redistribute eigrp 24 metric 100 subnets.

In the figures, RIP is configured to import EIGRP routes and distribute them into the RIP domain with a seed metric of 2 (hops).
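
The figures are not reproduced here. Based on the description, the RIP side of the mutual redistribution on RTB might look like the following sketch, where the network statement is taken from the discussion of RTB's connected networks later in this section:

```
RTB(config)# router rip
RTB(config-router)# network 172.16.0.0
! Import routes from EIGRP AS 24 with a seed metric of 2 hops
RTB(config-router)# redistribute eigrp 24 metric 2
```

For RIP the metric keyword takes a single hop-count value, in contrast to the five values required when redistributing into EIGRP.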

Mutual redistribution will result in RTC installing 11 routes in its table (the figure shows RTC's routing table).

Unlike EIGRP, RIP does not differentiate between external and internal routes. Also, note that RTB's seed metric has resulted in a metric of two hops for all the redistributed routes, even though two of these networks are actually three hops away.

After configuring two-way redistribution, RTC and RTA have only 11 routes, while the boundary router (RTB) has 12. What is going on here? The answer lies in RTB's directly connected routes:

172.16.0.0/16 (missing from RTA's table)
172.24.0.0/16 (missing from RTC's table)

Recall that the network command identifies not only which interfaces to run the routing protocol on, but also which directly connected networks will be included in routing updates. Look carefully at the figures. RTB's RIP process is configured to advertise the connected network 172.16.0.0, while its EIGRP process is configured to advertise the connected network 172.24.0.0.

To bring RTA and RTC's routing tables up to a complete 12 routes, we can configure both of RTB's routing processes to include the two connected networks using the network command. However, that will result in a RIP process running in the EIGRP AS and an EIGRP process running in the RIP domain. This solution will generate needless overhead. Redistribution offers a much more efficient and elegant solution. You can configure RTB to redistribute its connected routes using a default metric, as discussed in the following sections.

Directly connected routes can be redistributed into a routing protocol by using the redistribute connected command with a seed metric, as shown in this example:

RTB(config)# router eigrp 24
RTB(config-router)# redistribute connected metric 10000 100 255 1 1500

By using the connected keyword, redistribution will inject all connected routes into the routing protocol's updates, without configuring a network statement.

Static routes can be redistributed in the same way. This example illustrates how RTB could be configured to redistribute static routes:

RTB(config)# router eigrp 24
RTB(config-router)# redistribute static metric 10000 100 255 1 1500

In the figure, RTB is configured for two-way redistribution. Note that the seed metric is included each time the redistribute command is issued. The default-metric command can be used as a shortcut in this situation.

The default-metric Command
An alternative to this redistribution configuration is to use the default-metric command instead of including the same seed metric with each redistribute statement. Whenever the redistribute command is used and the metric is not specified, the router will use the default metric value as the seed metric. The default metric value can be administratively configured for each routing protocol, as shown in the second show running-config output of the Figure. In the second example, all EIGRP redistribute commands will use the default metric, 10000 100 255 1 1500. Meanwhile, all RIP redistribute commands will use a default metric of 2.
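
The figure is not reproduced here. A sketch of how the relevant part of RTB's running configuration might look with default metrics in place, assuming the same AS number and networks used earlier in this section:

```
router eigrp 24
 redistribute rip
 network 172.24.0.0
 ! Seed metric used by every redistribute command in this process that omits one
 default-metric 10000 100 255 1 1500
!
router rip
 redistribute eigrp 24
 network 172.16.0.0
 ! Seed metric of 2 hops for routes imported into RIP
 default-metric 2
```

A metric given directly on a redistribute command still takes precedence, so the default only fills in where no explicit seed metric is configured.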

The following steps are a good way to verify redistribution operation:

Know your network topology, particularly where redundant routes exist.
Display the routing table of the appropriate routing protocol on a variety of routers in the internetwork using the show ip route command. For example, check the routing table on the ASBR as well as some of the internal routers in each AS.
Perform a traceroute on some of the routes that go across the ASs to verify that the shortest path is being used for routing. Make sure that you run traces especially to networks for which redundant routes exist.
If you do encounter routing problems, use traceroute and debug commands to observe the routing update traffic on the ASBRs and internal routers.
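
The steps above can be sketched as the following command sequence on a boundary router. The destination address 10.1.1.1 is a placeholder, and debug ip rip is shown only as one example of a debug command; choose the one that matches the protocol you are troubleshooting.

```
! Show the full routing table, then only the RIP-learned entries
RTB# show ip route
RTB# show ip route rip
! Confirm which protocols are running and what they redistribute
RTB# show ip protocols
! Trace the path to a network that lies across the redistribution boundary
RTB# traceroute 10.1.1.1
! Watch routing updates as they are sent and received, then turn debugging off
RTB# debug ip rip
RTB# undebug all
```

Debug output can load a busy router heavily, so it is usually best to run it briefly and disable it as soon as you have seen the updates you need.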

This section presents a case study that addresses the issues associated with integrating RIP with OSPF. Because complex internetworks typically include UNIX hosts and legacy routers (neither of which is likely to support OSPF), most OSPF internetworks must also use RIP in select areas. This case study provides examples of how to complete the following phases in redistributing information between RIP and OSPF networks:

Configuring a RIP network
Adding OSPF to the core of a RIP network
Adding OSPF areas

Phase 1: Configuring RIP
Figure illustrates the RIP network. Three sites are connected with serial lines.

This RIP network uses a Class B address (130.10.0.0) and an 8-bit subnet mask. Each site has a contiguous set of network numbers, as shown in Figure .

All three routers are connected to the same major network, so each is configured with the network 130.10.0.0 command. (The figure shows the running configuration of each router.)

Phase 2: Adding OSPF to the Core of a RIP Network
A common first step in migrating a RIP network to an OSPF network is to configure backbone routers that run both RIP and OSPF, while the remaining network devices run RIP. These backbone routers act as OSPF ASBRs. Each ASBR controls the flow of routing information between OSPF and RIP. In the figure, RTA, RTB, and RTC now act as ASBRs.

Because RIP does not need to run between the backbone routers, updates can be suppressed using the passive-interface command. Although the example below specifies RTA, the same commands could be entered on the other routers.

RTA(config)# router rip
RTA(config-router)# passive-interface serial 0
RTA(config-router)# passive-interface serial 1

Instead of RIP updates, OSPF updates will carry the redistributed information across the WAN links. The necessary OSPF routing and redistribution commands are shown here. The same configuration is used for all three routers except for the network numbers.

RTA(config)# router ospf 109
RTA(config-router)# redistribute rip subnets
RTA(config-router)# network 130.10.62.0 0.0.0.255 area 0
RTA(config-router)# network 130.10.63.0 0.0.0.255 area 0

The subnets keyword tells OSPF to redistribute all subnet routes. Without the subnets keyword, only networks that are not subnetted are redistributed by OSPF.

The redistributed RIP routes appear as external Type 2 routes in OSPF, as discussed in Chapter 5, Multiarea OSPF.

Mutual redistribution must be configured for other routers in the RIP domain (not shown in the Figure) to receive information from OSPF. This example lists the necessary commands, which are again the same for each router.

RTA(config)# router rip
RTA(config-router)# redistribute ospf 109 match internal external 1 external 2
RTA(config-router)# default-metric 10

Note that the redistribute command includes the OSPF process ID, 109. The other keywords, match internal external 1 external 2, instruct RIP to redistribute internal OSPF routes, as well as external Type 1 and Type 2 routes. This is the default for OSPF redistribution, so these keywords are required only if you want to modify that behavior.
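
As a sketch of modifying the default, the following variation would import only routes that are internal to OSPF process 109 and leave external Type 1 and Type 2 routes out of RIP. Whether this restriction suits a given network is a design decision; it is shown here only to illustrate the keywords.

```
RTA(config)# router rip
! Redistribute only internal OSPF routes from process 109
RTA(config-router)# redistribute ospf 109 match internal
RTA(config-router)# default-metric 10
```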

As illustrated in the figure, there are no paths directly connecting the RIP domains outside the core. In real-world networks, this is not always the case. If one RIP domain can communicate directly with another, there is a chance that they will exchange routes, resulting in a routing feedback loop. You can use route filters to prevent these potentially disastrous loops.

The configuration below allows RTA's OSPF process to redistribute RIP information, only for networks 130.10.8.0 through 130.10.15.0:

RTA(config)# router ospf 109
RTA(config-router)# redistribute rip subnets
RTA(config-router)# distribute-list 11 out rip

RTA(config)# access-list 11 permit 130.10.8.0 0.0.7.255
RTA(config)# access-list 11 deny 0.0.0.0 255.255.255.255

These commands prevent RTA from advertising networks in other RIP domains onto the OSPF backbone, thereby preventing other boundary routers from using false information and forming a loop. When an OSPF backbone area is in place, the RIP domains can easily be converted into OSPF areas.

Phase 3: Adding OSPF Areas
Each RIP domain can be converted into an OSPF area independently of the other RIP domains. This allows you to migrate one section of your internetwork at a time, if desired.

When all three of the RIP domains have become OSPF areas, the three core routers will serve as ABRs, as shown in the figure. Recall that ABRs control the exchange of routing information between OSPF areas and the OSPF backbone. Each ABR keeps a detailed record of its respective area's topology and summarizes this information in its updates to other backbone routers.

Note that the figure also presents a new addressing scheme in the core. A 29-bit mask (255.255.255.248) is used to address WAN links and conserve address space. Meanwhile, a 24-bit mask remains on the LAN interfaces, resulting in variable-length subnet masks. OSPF fully supports VLSM, while RIPv1 does not. With OSPF as the sole routing protocol, the network can now benefit from VLSM. The example below shows the commands necessary to configure RTA for OSPF operation on all interfaces, with the appropriate masks.

RTA(config)#router ospf 109
RTA(config-router)#network 130.10.62.0 0.0.0.7 area 0
RTA(config-router)#network 130.10.63.0 0.0.0.7 area 0
RTA(config-router)#network 130.10.8.0 0.0.7.255 area 1

Because OSPF is classless, you can configure each ABR to use route summarization. For example, RTA connects to eight networks, which occupy a contiguous address space (see the figure). Thus, you can configure this ABR to send a single supernet route, which will advertise all eight of the networks:

RTA(config)#router ospf 109
RTA(config-router)#area 1 range 130.10.8.0 255.255.248.0

RTA will advertise one route, 130.10.8.0 255.255.248.0, which covers all subnets in Area 1, into Area 0. Without the range keyword in the area command, RTA would advertise each subnet individually: for example, one route for 130.10.8.0 255.255.255.0, one route for 130.10.9.0 255.255.255.0, and so forth.

The migration of the network from RIP to OSPF is now complete, and redistribution is no longer necessary.

In this chapter, you learned that there are many ways to control routing update traffic, including passive interfaces, default routes, static routes, and route filtering. You also learned that redistribution lets boundary routers that connect different autonomous systems advertise routing information received from one autonomous system into the other. Redistribution makes it possible to exchange routing information between dissimilar routing protocols, but it requires care when configuring.

In the next chapter, you learn how to connect an enterprise network to an Internet service provider (ISP).

Routing protocols can be classified in many ways. One of the ways would take into account where they are used in relationship to your enterprise. Protocols that run inside an enterprise are called interior gateway protocols (IGPs). Examples of IGPs include RIP, IGRP, EIGRP, and OSPF. Protocols that run outside an enterprise, or between autonomous systems (AS), are called exterior gateway protocols (EGPs). Typically, EGPs are used to exchange routing information between ISPs, or in some cases between a customer's AS and the provider's network. Border Gateway Protocol, version 4 (BGP4), is the most common EGP and is considered the Internet standard.

This chapter provides an overview of the different types of autonomous systems and then focuses on basic BGP operation, including BGP neighbor negotiation. The chapter then looks at how to use the Cisco IOS to configure BGP and verify its operation. Finally, it examines BGP peering and the BGP routing process.

An internetwork is a group of smaller, independent networks. Each of these smaller networks may be owned and operated by a different organization: a company, a university, a government agency, or some other group. The Internet is one example of a single, albeit immense, internetwork.

Not surprisingly, the operators of these individual networks desire autonomy, or self-administration, over their own systems. Because the routing and security policies of one organization may conflict with the policies of another, internetworks are divided into domains, or autonomous systems. Each AS typically represents an independent organization and applies its own unique routing and security policies. EGPs facilitate the sharing of routing information between autonomous systems. (See the Figure).

An AS is any set of routers that share similar routing policies and operate within a single administrative domain. An AS can be a collection of routers running a single IGP, or it can be a collection of routers running different protocols all belonging to one organization. In either case, the outside world views the entire AS as a single entity.

Each AS has an identifying number between 1 and 65,535, assigned by an Internet registry or a service provider. AS numbers in the range 64,512 through 65,535 are reserved for private use (similar to RFC 1918 IP addresses). Because of the finite number of available AS numbers, an organization must present justification of its need before it will be assigned an AS number.

Today, the Internet Assigned Numbers Authority (IANA) is enforcing a policy whereby organizations that connect to a single provider and share the provider's routing policies use an AS number from the private pool (64,512 to 65,535). These private AS numbers appear only within the provider's network and are replaced by the provider's registered number upon exiting the network. Thus, to the outside world, several individual networks are advertised as part of one service provider's network. In principle, this process is similar to NAT (see Chapter 2, IP Addressing).
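The private AS range and the replacement of private numbers at the provider's edge can be illustrated with a short Python sketch. The function names are hypothetical, and the stripping logic is a simplification of what a provider router does, not an actual router implementation.

```python
PRIVATE_AS_MIN = 64512
PRIVATE_AS_MAX = 65535


def is_private_as(asn: int) -> bool:
    """Return True if the AS number falls in the private pool given in the text."""
    if not 1 <= asn <= 65535:
        raise ValueError(f"AS number out of range: {asn}")
    return PRIVATE_AS_MIN <= asn <= PRIVATE_AS_MAX


def advertise_upstream(as_path, provider_as):
    """Drop private AS numbers from the path and prepend the provider's AS.

    Simplified model: the outside world then sees the customer's routes
    as part of the provider's AS.
    """
    public_hops = [asn for asn in as_path if not is_private_as(asn)]
    return [provider_as] + public_hops


# A customer using private AS 64512 behind a provider with registered AS 100.
print(advertise_upstream([64512], 100))  # [100]
```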

During the early days of the Internet, an EGP called EGP version 3 (EGP3 - not to be confused with EGPs in general) was used to interconnect autonomous systems. Currently, BGP4 is the accepted standard for Internet routing and has essentially replaced the more limited EGP3.

The following sections detail the different types of autonomous systems: single-homed, multihomed nontransit, and multihomed transit. In addition to defining these three types of systems, these sections will examine BGP's role in connecting each type of AS to an ISP.

If an AS has only one exit point to outside networks, it is considered a single-homed system. Single-homed autonomous systems are often referred to as stub networks or stubs. Stubs can rely on a default route to handle all traffic destined for non-local networks. For the network depicted in the figure, you can configure routers in the customer AS to use a default route to an upstream service provider.

However, how will the outside world learn about the network addresses inside the customer's AS? The provider can essentially use three different methods to advertise a customer's networks.

Use a static configuration. The provider could list the customer's networks as static entries in its own router and then advertise these routes upstream to the Internet core. This approach works well if the customer's networks can be summarized using a CIDR prefix, as discussed in Chapter 2. However, if the AS contains numerous discontiguous networks, route aggregation may not be a viable option.
Use an IGP. Another alternative is to use an IGP to advertise the customer's networks, as depicted in the figure. This has all the benefits of dynamic routing, in which network information and changes are dynamically sent to the provider.
Use an EGP. The third method by which the ISP can learn and advertise the customer's routes is to use an EGP such as BGP. It is difficult to get a registered AS number from IANA for a stub network because the customer's routing policies are an extension of the policies of the provider. Instead, the provider can give the customer an AS number from the private pool of AS numbers (64,512 to 65,535), and strip off these numbers when advertising the customer's routes toward the core of the Internet (see the figure).

Note that only one of these three solutions requires a customer to run BGP with its provider.

An AS is a multihomed system if it has more than one exit point to outside networks. An AS connected to the Internet can be multihomed to a single provider or multiple providers. A nontransit AS does not allow transit traffic to pass through it. Transit traffic is any traffic that has a source and destination outside the AS. The figure illustrates a multihomed and nontransit AS (AS 24), which is connected to two providers, ISP1 and ISP2.

A nontransit AS would advertise only its own routes to both providers to which it connects. It would not advertise routes that it learned from one provider to another. This ensures that ISP1 will not use AS 24 to reach destinations that belong to ISP2, and vice versa. Of course, ISP1 or ISP2 can force traffic to be directed to AS 24 via default or static routing. As a precaution against this, the router at the border of AS 24 could filter incoming traffic to prevent transit traffic from passing through.

Multihomed nontransit autonomous systems do not really need to run BGP4 with their providers, although it is recommended and often required by ISPs. As you will see later in this chapter, BGP4 offers numerous advantages, including increased control of route propagation and filtering.

A multihomed transit system has more than one connection to the outside world and can be used for transit traffic by other autonomous systems. From the multihomed AS's view, transit traffic is any traffic originating from outside sources bound for outside destinations, as shown in the figure.

A transit AS can route transit traffic by running BGP internally so that multiple border routers in the same AS can share BGP information. Additional routers may be used to forward BGP information from one border router to another. You may choose to run BGP inside an AS to facilitate this exchange.

When BGP is running inside an AS, it is referred to as Internal BGP (IBGP). When BGP runs between autonomous systems, it is called External BGP (EBGP). If a BGP router's role is to route IBGP traffic, it is called a transit router. Routers that sit on the boundary of an AS and that use EBGP to exchange information with the ISP are called border (or edge) routers.

In many cases, the routing policy that is implemented in an AS is consistent with the ISP's policy. In these cases, it is not necessary, or even desirable, to use BGP to exchange routing information with the ISP. Instead, connectivity can be achieved through a combination of static routes and default routes.

When connecting to two ISPs, it is frequently necessary to use BGP. Some network administrators connect their enterprise to different ISPs for redundancy, load sharing, and lower tariffs at particular times during the day or night. If you have a backup link for redundancy, you can use a combination of static and default routes instead of BGP. However, if both of these connections are active at the same time, BGP is required. In addition, any time your policy requirements differ from the policy of your ISP, BGP is required.

In the figure, router A is advertising a default network into the AS through a local IGP, such as RIP. A static route affords connectivity through router B to the ISP's AS. The ISP is running BGP and is recognized by other BGP routers in the Internet.

Note: In general, it is necessary to use BGP to connect to an ISP only when you have different policy requirements than the ISP.

BGP4 is defined in RFC 1771 (since obsoleted by RFC 4271); RFC 1772 describes the application of BGP in the Internet. BGP's job is to exchange routing information between autonomous systems while guaranteeing loop-free path selection. BGP4 is the first version of BGP that supports CIDR and route aggregation. Unlike common IGPs (such as RIP, OSPF, and EIGRP), BGP does not use technical metrics. Instead, BGP makes routing decisions based on network policies, or rules.

This section offers a brief overview of how BGP works and is followed by a more detailed examination of the various types of BGP packets and relationship states.

BGP updates are carried using TCP on port 179. In contrast, RIP updates use UDP port 520, while OSPF does not use a Layer 4 protocol. Because BGP requires TCP, IP connectivity must exist between BGP peers, and TCP connections must be negotiated between them before updates can be exchanged. Thus, BGP inherits TCP's reliable, connection-oriented properties.

To guarantee loop-free path selection, BGP constructs a graph of autonomous systems based on the information exchanged between BGP neighbors. As far as BGP is concerned, the whole internetwork is a graph, or tree, of autonomous systems. The connection between any two systems forms a path, and the collection of path information is expressed as a sequence of AS numbers (called the AS Path). This sequence forms a route to reach a specific destination, as shown in the figure.
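One common way to describe how the AS path keeps routing loop-free, which the paragraph above does not spell out, is that a router discards any update received from another AS whose AS path already contains its own AS number. The following Python sketch illustrates that check under this assumption; the function names are hypothetical.

```python
def accept_route(local_as, as_path):
    """Reject an update whose AS path already contains the local AS number."""
    return local_as not in as_path


def prepend_local_as(local_as, as_path):
    """Add the local AS to the front of the path before advertising to another AS."""
    return [local_as] + list(as_path)


# AS 100 originates a route; AS 200 passes it on to AS 300.
path_at_300 = prepend_local_as(200, prepend_local_as(100, []))
print(path_at_300)                      # [200, 100]
print(accept_route(300, path_at_300))   # True
# If the route were ever offered back to AS 100, it would be refused.
print(accept_route(100, path_at_300))   # False
```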

When two routers establish a TCP-enabled BGP connection, they are called neighbors or peers. Each router running BGP is called a BGP speaker. Peer routers exchange multiple messages to open and confirm the connection parameters, such as the version of BGP to be used. If there are any disagreements between the peers, notification errors are sent and the connection fails.

When BGP neighbors first establish a connection, they exchange all candidate BGP routes (see the figures). After this initial exchange, incremental updates are sent as network information changes. As discussed in earlier chapters, incremental updates are more efficient than complete table updates. This is especially true with BGP routers, which may contain the complete Internet routing table.

Peers advertise destinations that are reachable through them by using update messages. These messages contain, among other things, route prefix, AS path, and path attributes such as the degree of preference for a particular route.

If network reachability information changes, such as when a route becomes unreachable or a better path becomes available, BGP informs its neighbors by withdrawing the invalid routes and injecting the new routing information. Withdrawn routes are part of the update message. BGP routers keep a table version number that tracks the version of the BGP routing table received from each peer. If the table changes, BGP increments the table version number. A rapidly incrementing table version is usually an indication of instabilities in the network, or a misconfiguration.

If there are no routing changes to transmit to a peer, a BGP speaker will periodically send keepalive messages to maintain the connection. These 19-byte keepalive packets are sent every 60 seconds by default, and they present a negligible drain on bandwidth and a router's CPU time.

Different message types play an essential role in BGP operation. Each message type includes the BGP message header.

The message header contains only three fields: a 16-byte Marker field, a 2-byte Length field, and a 1-byte Type field. The Marker field is used either to authenticate incoming BGP messages or to detect loss of synchronization between two BGP peers.

The Length field indicates the total BGP message length, including the header. The smallest BGP message is 19 bytes (16 + 2 + 1), and the largest possible message is 4096 bytes.
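The header layout described above maps naturally onto a fixed binary structure. The following Python sketch packs and parses it with the standard struct module. Two details are assumptions taken from the BGP4 specification rather than from this chapter: the Marker is filled with all ones (as when no authentication is in use), and type code 4 identifies a keepalive.

```python
import struct

HEADER_FORMAT = "!16sHB"  # 16-byte Marker, 2-byte Length, 1-byte Type, network byte order
HEADER_LENGTH = struct.calcsize(HEADER_FORMAT)  # 16 + 2 + 1 = 19
MAX_MESSAGE_LENGTH = 4096
MARKER = b"\xff" * 16     # assumption: all ones, no authentication
KEEPALIVE = 4             # assumption: type code from the BGP4 specification


def build_message(msg_type: int, payload: bytes = b"") -> bytes:
    """Prefix a payload with a BGP message header."""
    length = HEADER_LENGTH + len(payload)
    if length > MAX_MESSAGE_LENGTH:
        raise ValueError("BGP message exceeds 4096 bytes")
    return struct.pack(HEADER_FORMAT, MARKER, length, msg_type) + payload


def parse_header(data: bytes):
    """Return (marker, length, type) from the first 19 bytes of a message."""
    if len(data) < HEADER_LENGTH:
        raise ValueError("truncated BGP header")
    marker, length, msg_type = struct.unpack(HEADER_FORMAT, data[:HEADER_LENGTH])
    if not HEADER_LENGTH <= length <= MAX_MESSAGE_LENGTH:
        raise ValueError(f"invalid BGP message length: {length}")
    return marker, length, msg_type


# A keepalive is a bare header, so it is exactly 19 bytes long.
keepalive = build_message(KEEPALIVE)
print(len(keepalive))  # 19
```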

The Type field can have four values (1 to 4). Each of these values corresponds to one of the four BGP message types, described below:

Open Message - This message is used to establish connections with peers and includes fields for the BGP version number, AS number, hold time, and Router ID.
Keepalive Message - This message type is sent periodically between peers to maintain connections and verify paths held by the router sending the keepalive. If the periodic timer is set to a value of 0, no keepalives are sent. The recommended keepalive interval is one-third of the hold time interval. The keepalive message is a 19-byte BGP message header with no data following it.
Notification Message - This message type is used to inform the receiving router of errors. This message includes a field for error codes, which can be used to troubleshoot BGP connections.
Update Message - The update messages contain all the information BGP uses to construct a loop-free picture of the internetwork. There are three basic components of an update message: network-layer reachability information (NLRI), path attributes, and withdrawn routes. These three elements are described briefly in the following sections.
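The timer rules mentioned for open and keepalive messages can be sketched in a few lines of Python. This is a simplified illustration: each peer proposes a hold time in its open message, the smaller value is used, and the keepalive interval is one-third of the result. The function names are hypothetical, and the 180-second hold time in the example is an assumed value chosen because it yields the 60-second keepalive interval quoted earlier.

```python
def negotiate_hold_time(local_hold: int, peer_hold: int) -> int:
    """Both peers use the smaller of the two proposed hold times (in seconds)."""
    return min(local_hold, peer_hold)


def keepalive_interval(hold_time: int):
    """Return one-third of the hold time, or None when keepalives are disabled."""
    if hold_time == 0:
        return None  # a hold time of 0 means no keepalives are sent
    return hold_time / 3


hold = negotiate_hold_time(180, 180)
print(hold, keepalive_interval(hold))  # 180 60.0
```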

The BGP neighbor negotiation process proceeds through various states, or stages, which can be described in terms of a finite-state machine (FSM).

The BGP FSM
Recall that in Chapter 6, EIGRP, an FSM was defined as a set of possible states something can go through, what events cause those states, and what events result from those states. The figure presents the BGP FSM, which includes the states and some of the message events that cause them.

The six states of the BGP FSM are described below:

Idle - Idle is the first state of a BGP connection. BGP is waiting for a start event, which is normally initiated by an administrator or a network event. At the start event, BGP initializes its resources, resets a connect retry timer, and starts listening for a TCP connection that may be initiated by a remote peer. BGP then transitions to a Connect state. Note that BGP can transition back to Idle from any other state in case of errors.
Connect - In the Connect state, BGP is waiting for the TCP connection to be completed. If the TCP connection is successful, the state transitions to OpenSent. If the TCP connection fails, the state transitions to the Active state, and the router tries to connect again. If the connect retry timer expires, the state remains in the Connect state, the timer is reset, and a TCP connection is initiated. In case of any other event (initiated by system or administrator), the state returns to Idle.
Active - In the Active state, BGP is trying to acquire a peer by initiating a TCP connection. If it is successful, it transitions to OpenSent. If the connect retry timer expires, BGP restarts the connect timer and falls back to the Connect state. While active, BGP is still listening for a connection that may be initiated from another peer. The state may go back to Idle in case of other events, such as a stop event initiated by the system or the operator.

In general, a neighbor state that is flip-flopping between "Connect" and "Active" is an indication that something is wrong with the TCP connection. The cause could be excessive TCP retransmissions, or the inability of a neighbor to reach the IP address of its peer.

OpenSent - In the OpenSent state, BGP is waiting for an open message from its peer. The open message is checked for correctness. In case of errors, such as an incompatible version number or an unacceptable AS, the system sends an error notification message and goes back to Idle. If there are no errors, BGP starts sending keepalive messages, resets the keepalive timer, and moves to the OpenConfirm state. At this stage, the hold time is negotiated and the smaller value is taken. If the negotiated hold time is 0, the hold timer and the keepalive timer are not restarted.

At the OpenSent state, BGP recognizes whether the peer belongs to the same AS (an IBGP peer) or to a different AS (an EBGP peer) by comparing its AS number to the AS number of its peer.

When a TCP disconnect is detected, the state falls back to Active. For any other errors, such as an expiration of the hold timer, BGP sends a notification message with the corresponding error code and falls back to the Idle state.

OpenConfirm - While in the OpenConfirm state, BGP is waiting for a keepalive or notification message. If a keepalive message is received, the state goes to the Established state, and the neighbor negotiation is complete. If the system receives an update or keepalive message, it restarts the hold timer (assuming that the negotiated hold time is not 0). If a notification message is received, the state falls back to Idle. The system sends periodic keepalive messages at the rate set by the keepalive timer. In the case of any TCP disconnect or in response to any stop event (initiated by the system or the administrator), the state falls back to Idle. In response to any other event, the system sends a notification message with an FSM error code and returns to the Idle state.
Established - Established is the final state in the neighbor negotiation; BGP starts exchanging update packets with its peers. If it is nonzero, the hold timer is restarted at the receipt of an update or keepalive message.

Each update message is checked for errors, such as missing or duplicate attributes. If errors are found, a notification is sent to the peer. Any notification received while in the Established state causes the receiving BGP process to fall back to Idle. If the hold timer expires, if a disconnect notification is received from TCP, or if a stop event is received, the system also falls back to Idle.
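The six states and the main transitions described above can be summarized as a lookup table. The following Python sketch is a deliberately simplified model: the event names are hypothetical, timers and message validation are omitted, and any event not listed for a state is treated as an error that returns the session to Idle. The move from OpenSent to OpenConfirm on a valid open message follows the BGP4 specification.

```python
# (current state, event) -> next state; unlisted combinations fall back to Idle.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",
    ("Connect", "tcp_failed"): "Active",
    ("Connect", "retry_timer_expired"): "Connect",
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "retry_timer_expired"): "Connect",
    ("OpenSent", "open_received"): "OpenConfirm",
    ("OpenSent", "tcp_disconnect"): "Active",
    ("OpenConfirm", "keepalive_received"): "Established",
    ("Established", "keepalive_received"): "Established",
    ("Established", "update_received"): "Established",
}


def next_state(state: str, event: str) -> str:
    """Return the state that follows the given event."""
    return TRANSITIONS.get((state, event), "Idle")


def run(events, state: str = "Idle") -> str:
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = next_state(state, event)
    return state


# A clean negotiation ends in the Established state.
print(run(["start", "tcp_established", "open_received", "keepalive_received"]))
# A session that cannot complete TCP bounces between Connect and Active.
print(run(["start", "tcp_failed", "retry_timer_expired", "tcp_failed"]))
```

The second call models the flip-flopping between Connect and Active that signals a TCP reachability problem.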

Rather than advertise reachable destinations as a network and a subnet mask, BGP advertises them using NLRI, which consists of prefixes and prefix lengths. The prefix represents the reachable destination, and the prefix length represents the number of bits set in the subnet mask. For example, 10.1.1.0 255.255.255.0 has a prefix of 10.1.1.0 and a prefix length of 24 (there are 24 bits set in the subnet mask), and thus would be advertised by BGP as 10.1.1.0/24.

The NLRI consists of multiple instances of the 2-tuple <length, prefix>. A tuple is a mathematical term for an ordered set of elements (in this case, the 2 refers to the fact that there are only two elements in the set). Thus, the NLRI <19, 192.24.160.0> represents the prefix 192.24.160.0 with a 19-bit mask. In dotted-decimal terms, this NLRI refers to a supernet: 192.24.160.0 255.255.224.0.
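The conversion between a network and mask and a <length, prefix> tuple can be checked with Python's standard ipaddress module. The byte encoding in encode_nlri (one length byte followed by only as many prefix bytes as the length requires) follows the BGP4 specification rather than this chapter, and the function names are hypothetical.

```python
import ipaddress


def mask_to_nlri(network: str, mask: str):
    """Convert a network and dotted-decimal mask into a (length, prefix) tuple."""
    net = ipaddress.IPv4Network(f"{network}/{mask}")
    return net.prefixlen, str(net.network_address)


def nlri_to_mask(length: int, prefix: str) -> str:
    """Return the dotted-decimal mask for a (length, prefix) tuple."""
    return str(ipaddress.IPv4Network(f"{prefix}/{length}").netmask)


def encode_nlri(length: int, prefix: str) -> bytes:
    """Encode one NLRI entry: a length byte, then the minimum number of prefix bytes."""
    octets = ipaddress.IPv4Address(prefix).packed
    return bytes([length]) + octets[: (length + 7) // 8]


print(mask_to_nlri("10.1.1.0", "255.255.255.0"))  # (24, '10.1.1.0')
print(nlri_to_mask(19, "192.24.160.0"))           # 255.255.224.0
print(encode_nlri(19, "192.24.160.0").hex())      # 13c018a0
```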

Withdrawn Routes
Withdrawn routes provide a list of destinations that are no longer reachable and that need to be withdrawn (removed) from the BGP routing table. Withdrawn routes have the same format as NLRI.

An update message that has no NLRI or path attribute information is used to advertise only routes to be withdrawn from service.

Much of the work you will do configuring BGP focuses on path attributes. Each route has its own set of defined attributes, which can include path information, route preference, next-hop, and aggregation information. Administrators use these values to enforce routing policy. Based on attribute values, you can configure BGP to filter routing information, prefer certain paths, or otherwise customize its behavior. Many of these attributes and policy configurations are explored later in this chapter, in the section "The BGP Routing Process."

Every update message has a variable-length sequence of path attributes in the form <attribute type, attribute length, attribute value>.

Because you will use path attributes extensively when configuring routing policy, you should note that not all vendor implementations of BGP recognize the same attributes. In fact, there are four different attribute types:

Well-known mandatory - An attribute that must exist in the BGP update packet. It must be recognized by all BGP implementations. If a well-known attribute is missing, a notification error will be generated. This ensures that all BGP implementations agree on a standard set of attributes. An example of a well-known mandatory attribute is the AS_Path attribute.
Well-known discretionary - An attribute that is recognized by all BGP implementations, but may or may not be sent in the BGP update message. An example of a well-known discretionary attribute is the LOCAL_PREF attribute.
Optional transitive - An attribute that may or may not be recognized by all BGP implementations (thus, optional). Because the attribute is transitive, BGP should accept and advertise the attribute even if it is not recognized.
Optional nontransitive - An attribute that may or may not be recognized by all BGP implementations. Whether or not the receiving BGP router recognizes the attribute, it is nontransitive and is not passed along to other BGP peers.
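The practical difference between these categories shows up when a router receives an attribute it does not recognize. The following Python sketch models that decision. It relies on a detail from the BGP4 specification that this chapter does not cover: each attribute carries flag bits, of which the high-order bit marks the attribute as optional and the next bit marks it as transitive. The function names and return strings are hypothetical.

```python
OPTIONAL = 0x80    # assumption from the BGP4 specification: high-order flag bit
TRANSITIVE = 0x40  # assumption from the BGP4 specification: second flag bit


def category(flags: int) -> str:
    """Derive the attribute category from its flag bits.

    The flags alone cannot distinguish well-known mandatory from
    well-known discretionary, so both are reported as "well-known".
    """
    if not flags & OPTIONAL:
        return "well-known"
    if flags & TRANSITIVE:
        return "optional transitive"
    return "optional nontransitive"


def handle_unrecognized(flags: int) -> str:
    """Decide what to do with an attribute the local implementation does not know."""
    kind = category(flags)
    if kind == "well-known":
        return "send notification"      # every implementation must recognize it
    if kind == "optional transitive":
        return "accept and pass along"
    return "ignore and do not pass along"


print(handle_unrecognized(0xC0))  # accept and pass along
print(handle_unrecognized(0x80))  # ignore and do not pass along
```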

Each individual attribute is identified by its type and attribute code. The figure lists the attribute codes that are currently defined.

Several of these attributes are discussed later in this chapter. Attributes 11, 12, and 13 are not implemented by Cisco because they do not add functionality, so they are not covered here.

On the surface, BGP configuration commands appear to mirror the syntax of familiar IGP commands (for example, those for RIP and OSPF). Although the syntax is similar, the function of these commands is significantly different.

To begin configuring a BGP process, issue the following familiar command:

Router(config)#router bgp AS-number

Note that the Cisco IOS permits only one BGP process to run at a time. Thus, a router cannot belong to more than one AS.

The network command is used with IGPs, such as RIP, to determine the interfaces on which to send and receive updates, as well as which directly connected networks to advertise. However, when configuring BGP, the network command does not affect what interfaces BGP runs on. Thus, configuring just a network statement will not establish a BGP neighbor relationship. This is a major difference between BGP and IGPs. The network statement follows this syntax:

Router(config-router)#network network-number [mask network-mask]

In BGP, the network command tells the BGP process what locally learned networks to advertise. The networks can be connected routes, static routes, or routes learned via a dynamic routing protocol, such as RIP. These networks must also exist in the local router's routing table or they will not be sent out in updates. The mask keyword can be used with the network command to specify individual subnets. Routes learned by the BGP process are propagated by default but are often filtered by a routing policy.

For a BGP router to establish a neighbor relationship with another BGP router, issue this configuration command:

Router(config-router)#neighbor ip-address remote-as AS-number

This command serves to identify a peer router with which the local router will establish a session. The ip-address argument is the neighbor interface's IP address. The AS-number argument determines whether the neighbor router is an EBGP or an IBGP neighbor.

When configuring BGP, remember that BGP supports two types of sessions, each with slightly different configuration requirements:

EBGP session - Occurs between routers in two different autonomous systems. These routers are usually adjacent to one another, sharing the same medium and a subnet (see RTA in the figure).
IBGP session - Occurs between routers in the same AS, and is used to coordinate and synchronize routing policy within the AS. Neighbors may be located anywhere in the AS, even several hops away from one another. An IBGP session typically occurs between routers in the same AS in an ISP (see RTC in the figure).

If the AS-number configured in the router bgp command is identical to the AS-number configured in the neighbor statement, BGP will initiate an internal session. If the field values are different, BGP will build an external session.

In the example depicted by the figure, RTB speaks EBGP to RTA, which is a different AS, and IBGP to RTC, which resides in the same AS. To start the EBGP process with RTA, use the following commands:

RTB(config)#router bgp 200
RTB(config-router)#neighbor 10.1.1.2 remote-as 100

Note that the neighbor command's remote-as value, 100, is different from the AS number specified by the router bgp command (200). Because the two AS numbers are different, BGP will start an EBGP connection with RTA. Communication will occur between autonomous systems.

The commands to configure IBGP are essentially the same as those to configure EBGP, except for the possible addition of the update-source interface keyword.

RTB(config)#router bgp 200
RTB(config-router)#neighbor 172.16.1.2 remote-as 200
RTB(config-router)#neighbor 172.16.1.2 update-source loopback 0

The remote-as value (200) is the same as RTB's BGP AS number, so BGP recognizes that this connection will occur within AS 200. It attempts to establish an IBGP session. In reality, AS 200 is not a remote AS at all; it is the local AS because both routers reside there. For simplicity, the keyword remote-as is used when configuring both EBGP and IBGP sessions.
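The rule that decides between an internal and an external session can be expressed as a simple comparison. The following Python sketch applies it to the two neighbor statements from the examples above. The parser is a hypothetical illustration that understands only the router bgp and neighbor ... remote-as lines; it is not a model of how the Cisco IOS processes its configuration.

```python
def session_type(local_as: int, remote_as: int) -> str:
    """Same AS number means an internal session; different means external."""
    return "IBGP" if local_as == remote_as else "EBGP"


def classify_neighbors(config: str) -> dict:
    """Map each neighbor address in a configuration fragment to its session type."""
    local_as = None
    sessions = {}
    for line in config.splitlines():
        words = line.split()
        if words[:2] == ["router", "bgp"]:
            local_as = int(words[2])
        elif len(words) == 4 and words[0] == "neighbor" and words[2] == "remote-as":
            if local_as is None:
                raise ValueError("neighbor statement before router bgp")
            sessions[words[1]] = session_type(local_as, int(words[3]))
    return sessions


RTB_CONFIG = """\
router bgp 200
 neighbor 10.1.1.2 remote-as 100
 neighbor 172.16.1.2 remote-as 200
 neighbor 172.16.1.2 update-source loopback 0
"""

print(classify_neighbors(RTB_CONFIG))
# {'10.1.1.2': 'EBGP', '172.16.1.2': 'IBGP'}
```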

Note also the second neighbor command, which is used to assign an optional parameter to be used when communicating with that neighbor. It is typical to use multiple neighbor commands for the same BGP neighbor, each specifying a particular BGP option.

In this example, the option specified is update-source loopback 0. If multiple pathways to the neighbor exist, then the router can use any IP interface to speak BGP with that neighbor. The update-source loopback 0 command is used to instruct the router to use interface loopback 0 for TCP connections. This command is typically used in all IBGP configurations. Without this command, BGP routers can use only the closest IP interface to the peer. The capability to use any operational interface provides BGP with robustness in case the link to the closest interface fails. Because EBGP sessions are typically point-to-point, there is no need to use this command with EBGP.

Returning to the example configuration, assume that the following route appears in RTB's table:

O 192.168.1.0/24 [110/74] via 10.2.2.1, 00:31:34, Serial2

RTB learned this route via an IGP (OSPF). AS 200 uses OSPF internally to exchange route information. Can RTB advertise this network via BGP? Certainly. Redistributing OSPF into BGP will work, but the BGP network command will do the same thing:

RTB(config-router)#network 172.16.1.0 mask 255.255.255.252
RTB(config-router)#network 10.1.1.0 mask 255.255.255.252
RTB(config-router)#network 192.168.1.0

The first two network commands above include the mask keyword, so only a particular subnet is specified. The third network command results in the OSPF route being advertised by BGP without redistribution. Remember that the BGP network command works differently than the IGP network command.
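The requirement that a network statement match a route already in the routing table can be sketched in Python. This is a simplified model that assumes an exact match on both network and prefix length. The routing table contents are hypothetical except for the OSPF route shown above, and 192.168.50.0/24 is an invented entry included to show a statement with no matching route.

```python
import ipaddress


def routes_to_advertise(network_statements, routing_table):
    """Return the configured networks that also exist in the routing table."""
    table = {ipaddress.ip_network(route) for route in routing_table}
    return [n for n in network_statements if ipaddress.ip_network(n) in table]


# Networks named in BGP network statements (prefix notation).
statements = ["172.16.1.0/30", "10.1.1.0/30", "192.168.1.0/24", "192.168.50.0/24"]

# Assumed routing table: two connected subnets plus the OSPF-learned route.
routing_table = ["172.16.1.0/30", "10.1.1.0/30", "192.168.1.0/24"]

print(routes_to_advertise(statements, routing_table))
# ['172.16.1.0/30', '10.1.1.0/30', '192.168.1.0/24']
```

The last statement is not advertised because no matching route exists in the table.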

These example configurations have shown how to configure EBGP and IBGP. However, what is the difference between the two session types? The figure presents some of the important differences between them.

RTZ and RTY have established an EBGP session. EBGP peers are normally directly connected, but there are certain exceptions to this requirement. In contrast, IBGP peers merely require TCP/IP connectivity within the same AS. As long as RTY can communicate with RTW using TCP, both routers can establish an IBGP session. If needed, an IGP such as OSPF can provide IBGP peers with routes to each other.

In a typical configuration, an IBGP router maintains IBGP sessions with all other IBGP routers in the AS, forming a logical full mesh. This is necessary because IBGP routers do not advertise routes learned via IBGP to other IBGP peers (to prevent routing loops). In other words, if you want your IBGP routers to exchange BGP routes with each other, you should configure a full mesh. An alternative to this approach is configuring a route reflector, which is discussed in Chapter 9, Scaling BGP.
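To see why a full mesh becomes hard to maintain as an AS grows, it helps to count the sessions involved: n routers need n(n - 1)/2 IBGP sessions. The following Python sketch enumerates them; the router names are hypothetical.

```python
from itertools import combinations


def full_mesh_sessions(routers):
    """Return every pair of routers that must share an IBGP session."""
    return list(combinations(routers, 2))


print(full_mesh_sessions(["RTW", "RTX", "RTY"]))
# [('RTW', 'RTX'), ('RTW', 'RTY'), ('RTX', 'RTY')]

# The session count grows quadratically with the number of routers.
for n in (3, 10, 50):
    routers = [f"R{i}" for i in range(n)]
    print(n, len(full_mesh_sessions(routers)))
```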

As noted, EBGP neighbors are normally directly connected to establish an EBGP session. However, look again at RTW and RTU in the figure. These routers can maintain an EBGP session even though a non-BGP box, RTV, separates them. In this situation, EBGP is running across a non-BGP router using a configurable option called EBGP multihop. EBGP multihop is a Cisco IOS option that allows RTW and RTU to be logically connected in an EBGP session, despite the fact that RTV does not support BGP. The EBGP multihop option is configured on each peer with the following command:

Router(config-router)#neighbor ip-address ebgp-multihop [hops]

This command enables you to specify how many hops (up to 255) separate the EBGP peers. The following commands could be applied to the routers in the example:

RTW(config)#router bgp 200
RTW(config-router)#neighbor 1.1.1.2 remote-as 300
RTW(config-router)#neighbor 1.1.1.2 ebgp-multihop 2
RTU(config)#router bgp 300
RTU(config-router)#neighbor 2.2.2.1 remote-as 200
RTU(config-router)#neighbor 2.2.2.1 ebgp-multihop 2

When configuring BGP, changes made to an existing configuration may not appear immediately. To force BGP to clear its table and reset BGP sessions, use the clear ip bgp * command. The easiest way to enter this command is as follows:

Router#clear ip bgp *

The asterisk (*) is a wildcard that matches all table entries, so all BGP routes are lost while the neighbor relationships are reset. This is expedient and very useful in a lab situation, but caution should be exercised when issuing this command on a production router. On an Internet backbone router, it may be more appropriate to use this command with a specific IP address, as shown:

Router#clear ip bgp 10.0.0.0

By specifying a particular neighbor address to clear, you avoid temporarily losing all BGP information.

This example demonstrates the different types of BGP peering sessions you will encounter. An IBGP peering session is formed within AS3, between the loopback address of RTA and a physical address of RTF. An EBGP session is also formed between AS3 and AS1 by using the two directly connected IP addresses of RTA and RTC. Another EBGP session is formed between RTF in AS3 and RTD in AS2, using IP addresses that are not on the same segment (multihop).

It is important to remember that the BGP peers will never become established unless there is IP connectivity between the two peers. In this example, OSPF is used to establish the required internal connectivity between RTD and RTE.

In RTF's configuration, you can see the ebgp-multihop 2 command being used as part of the neighbor configuration. This is an indication that the exterior BGP peer is not directly connected and can be reached at a maximum of two hops away. Remember that EBGP multihop is applicable only with EBGP, and not with IBGP.

The example also shows how the peer connection will look after the neighbors are in an established state. From RTF's point of view, neighbor 172.16.2.254 is an internal neighbor that belongs to AS3. The neighbor connection is running BGP Version 4 with a table version of 2. The table version changes every time the BGP table is updated.

The other RTF neighbor, 192.68.12.1, is also in an Established state. This is an external neighbor that belongs to AS2. Note that the display indicates that this neighbor is two hops away and is configured using the ebgp-multihop command.

BGP does not advertise routes learned via IBGP peers to other IBGP peers. If it did, BGP routing inside the AS would present a dangerous potential for routing loops. For IBGP routers to learn about all BGP routes inside the AS, they must connect to every other IBGP router in a full IBGP mesh. This full mesh needs to be only logical, not physical. In other words, as long as the IBGP peers can connect to each other using TCP/IP, you can create a logical full mesh even if the routers are not directly connected.

The figure illustrates one of the common mistakes that administrators make when setting BGP routing within an AS. This ISP network has three POPs: San Jose, San Francisco, and Los Angeles. Each POP has multiple non-BGP routers and a BGP border router running EBGP with other autonomous systems. The administrator has set up an IBGP connection between the San Jose border router and the San Francisco border router, and another IBGP connection between the San Francisco border router and the Los Angeles border router. In this configuration, EBGP routes learned via San Jose are given to San Francisco, EBGP routes learned via San Francisco are given to San Jose and Los Angeles, and EBGP routes learned via Los Angeles are given to San Francisco.

However, routing in this scenario is not complete. EBGP routes learned via San Jose will not be given to Los Angeles, and EBGP routes learned via Los Angeles will not be given to San Jose. This is because the San Francisco router will not advertise IBGP routes between San Jose and Los Angeles. What is needed is an additional IBGP connection between San Jose and Los Angeles (shown in the figure as a dotted line). You will see in Chapter 9 how this situation could be handled by using the concept of route reflectors, an option that scales much better when the AS has a large number of IBGP peers.
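The gap in the partial mesh can be reproduced with a small simulation. The following Python sketch models only the rule that matters here: a router passes routes it learned via EBGP to its IBGP peers, but never re-advertises routes it learned via IBGP. The router labels and route names are hypothetical stand-ins for the three POPs.

```python
def ibgp_propagation(sessions, ebgp_routes):
    """Return the routes each border router knows after IBGP exchange.

    sessions: iterable of (router, router) IBGP peerings.
    ebgp_routes: dict mapping each router to the set of routes it learned via EBGP.
    Only EBGP-learned routes cross a session; IBGP-learned routes are not passed on.
    """
    known = {router: set(routes) for router, routes in ebgp_routes.items()}
    for a, b in sessions:
        known[a] |= ebgp_routes[b]
        known[b] |= ebgp_routes[a]
    return known


ebgp_routes = {
    "SanJose": {"sj-routes"},
    "SanFrancisco": {"sf-routes"},
    "LosAngeles": {"la-routes"},
}

partial_mesh = [("SanJose", "SanFrancisco"), ("SanFrancisco", "LosAngeles")]
full_mesh = partial_mesh + [("SanJose", "LosAngeles")]

print(sorted(ibgp_propagation(partial_mesh, ebgp_routes)["LosAngeles"]))
# ['la-routes', 'sf-routes']
print(sorted(ibgp_propagation(full_mesh, ebgp_routes)["LosAngeles"]))
# ['la-routes', 'sf-routes', 'sj-routes']
```

With the partial mesh, Los Angeles never learns the San Jose routes; adding the third session completes the picture.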

The following section discusses the synchronization issues that must be addressed when running IBGP within an AS.

When an IBGP router receives an update about a destination from an IBGP peer, it tries to verify reachability to that destination via an IGP route, such as RIP or OSPF. If the IBGP router cannot find the destination network in its IGP routing table, it will not advertise the destination to other BGP peers. If the route is not reachable through the IGP running within the AS, non-BGP routers will not be able to route traffic passing through the AS toward this destination. It is pointless to advertise destinations to external peers if traffic sent through this AS is going to be dropped by some non-BGP router within the AS.

If the IBGP router has an IGP route to this destination, the route is considered synchronized, and the router will announce it to other BGP peers. Otherwise, the router will treat the route as not being synchronized with the IGP and will not advertise it.

Consider the situation illustrated in the figure. ISP1 and ISP2 are using ISP3 as a transit AS. ISP3 has multiple routers in its AS and is running BGP only on the border routers. (Even though RTB and RTD are carrying transit traffic, ISP3 has not configured BGP on these routers.) ISP3 is using an IGP inside the AS for internal connectivity.

Assume that ISP1 is advertising route 192.213.1.0/24 to ISP3. Because RTA and RTC are running IBGP, RTA will propagate the route to RTC. Note that other routers besides RTA and RTC are not running BGP and have no knowledge of the existence of route 192.213.1.0/24.

In the situation illustrated in the figure, if RTC advertises the route to ISP2, traffic toward the destination 192.213.1.0/24 will start flowing toward RTC. RTC will do a recursive lookup in its IP routing table and will direct the traffic toward the next hop, RTB. RTB, having no knowledge of the BGP routes, will drop the traffic because it has no route to the destination. This happens because there is no synchronization between BGP and the IGP.

The BGP synchronization rule states that a BGP router should not advertise to external neighbors the destinations learned from internal BGP neighbors unless those destinations are also known via an IGP. If a router knows about these destinations via an IGP, it assumes that the route has already been propagated inside the AS, and internal reachability is guaranteed.
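
The rule can be expressed as a simple check. This is a hedged sketch rather than the IOS implementation: the function name and the igp_routes collection are assumptions made for illustration, the IGP lookup is simplified to an exact prefix match, and the synchronization flag models switching the check off.

```python
import ipaddress

def may_advertise_externally(prefix, igp_routes, synchronization=True):
    """Apply the synchronization rule to a route learned via IBGP.

    igp_routes is a hypothetical collection of prefixes known through the IGP.
    """
    if not synchronization:
        return True  # the check is disabled, so the route is always eligible
    network = ipaddress.ip_network(prefix)
    return any(network == ipaddress.ip_network(p) for p in igp_routes)

# Hypothetical IGP table that does not contain the external route.
igp_table = ["10.1.0.0/16"]
print(may_advertise_externally("192.213.1.0/24", igp_table))         # False
print(may_advertise_externally("192.213.1.0/24", igp_table, False))  # True
```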

Injecting BGP routes into the IGP of an AS is costly. Redistributing routes from BGP into the IGP will result in major overhead on the internal routers, which might not be equipped to handle that many routes. Besides, carrying all external routes inside an AS is not really necessary. Unless traffic is transiting the AS, internal non-BGP routers can simply default to one of the BGP routers. Of course, this may result in suboptimal routing because there is no guarantee of the shortest path for each route. Because the alternative is maintaining thousands of routes inside the AS, giving up optimal routing is well worth it.

The Cisco IOS offers an optional command called no synchronization. This command enables BGP to override the synchronization requirement, allowing the router to advertise routes learned via IBGP regardless of whether an IGP route exists.

In practice, two situations exist in which synchronization can be safely turned off on border routers:

When all transit routers inside the AS are running fully meshed IBGP. Internal reachability is guaranteed because a route that is learned via EBGP on any of the border routers will automatically be passed on via IBGP to all other transit routers.
When the AS is not a transit AS.

Verify BGP operation by using the key show commands listed in the figure.

If the router has not installed the BGP routes expected, use the show ip bgp command to verify that BGP has learned these routes.

Note that the output of this command includes the BGP table version number, which increments each time the local router receives changed route information. The AS_Path, among other key attributes, is also included in this table. Routes that are deemed the best are denoted by the > character and are installed in the router's IP routing table.

If an expected BGP route does not appear in the BGP table, you can use the show ip bgp neighbors command to verify that your router has established a BGP connection with its neighbors.

The most important information output of this command is the BGP state that exists between the neighbors. Anything other than "Established" indicates that the peers are not fully communicating.

The show ip bgp neighbors command supports a number of optional parameters and uses the syntax shown in the figure.

This section describes the process that BGP uses to make routing decisions. Routes are exchanged between BGP peers via update messages. BGP routers receive the update messages, run some policies or filters over the updates, and then pass the routes on to other BGP peers.

The Cisco implementation of BGP keeps track of all BGP updates in a BGP table separate from the IP routing table. In case multiple routes to the same destination exist, BGP does not flood its peers with all those routes. Instead, it picks only the best route and sends it to the peers. In addition to passing along routes from peers, a BGP router may originate routing updates to advertise networks that belong to its own AS. Valid local routes originated in the system and the best routes learned from BGP peers are then installed in the IP routing table. The IP routing table is used for the final routing decision.

The following sections detail the BGP routing process, implementing BGP routing policy, controlling BGP routing with attributes, and handling the BGP decision process.

To model the BGP process, imagine each BGP speaker having different pools of routes and different policy engines applied to the routes. The model, shown in the figure, would involve the following components:

Routes received from peers - BGP receives routes from external or internal peers. Depending on what is configured in the input policy engines, some or all of these routes will make it into the router BGP table.
Input policy engine - This engine handles route filtering and attribute manipulation. Filtering is done based on parameters such as the IP prefix, the AS path, and attribute information.

The input policy engine is used to manipulate the path attributes and influence BGP's decision process, which affects what routes it will actually use to reach a certain destination. For example, if you choose to filter a certain network coming from a peer, it is an indication to BGP that it should not reach that network via that peer. Or, if you give a certain route a better local preference than some other path to the same destination, it is an indication that BGP should prefer this route over the other available routes.

The decision process - BGP goes through a decision process to decide the route(s) it wants to use to reach a certain destination. The decision process is based on the routes that made it into the router after the input policy engine was applied, and is performed on the routes in the BGP routing table. The decision process looks at all the available routes for the same destination, compares the different attributes associated with each route, and chooses one best route. The decision process is discussed later in this chapter, in the section "The BGP Decision Process."
Routes used by the router - The best routes, as identified by the decision process, are candidates to be advertised to other peers. These routes are also presented to the routing engine to be placed in the IP routing table (not all routes presented to the routing engine will be placed in the routing table because multiple protocols may present the same prefix for installation, and the router must choose among them based on administrative distance). In addition to routes passed on from other peers, the router (if configured to do so) originates updates about the networks inside its own AS. This is how an AS injects its routes into the outside world.
Output policy engine - This is the same engine as the input policy engine, applied on the output side. Routes used by the router (the best routes) in addition to routes that the router generates locally are given to this engine for processing. The engine might apply filters and might change some of the attributes (such as AS_Path or metric) before sending the update.

The output policy engine also differentiates between internal and external peers; for example, routes learned from internal peers cannot be passed on to internal peers.

Routes advertised to peers - The routes advertised to peers are routes that made it through the output engine and are advertised to the BGP peers, internal or external.
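
The flow through these components can be sketched in a few lines. This is an illustrative model only: the function name, the dictionary-based route records, and the policy callables are all assumptions, and locally originated routes are left out to keep the sketch short.

```python
def run_bgp_process(received, input_policy, choose_best, output_policy):
    """Sketch of the route flow: input policy -> decision -> output policy."""
    # Input policy engine: returns a (possibly modified) route, or None to filter it.
    bgp_table = [r for r in map(input_policy, received) if r is not None]

    # Decision process: one best route per destination prefix.
    by_prefix = {}
    for route in bgp_table:
        by_prefix.setdefault(route["prefix"], []).append(route)
    best = [choose_best(candidates) for candidates in by_prefix.values()]

    # Output policy engine: the same idea, applied before advertising.
    advertised = [r for r in map(output_policy, best) if r is not None]
    return best, advertised

# Hypothetical policies and routes for illustration.
def drop_private(route):
    return None if route["prefix"].startswith("192.168.") else route

def highest_local_pref(routes):
    return max(routes, key=lambda r: r["local_pref"])

def pass_through(route):
    return route

received = [
    {"prefix": "10.0.0.0/8", "peer": "A", "local_pref": 100},
    {"prefix": "10.0.0.0/8", "peer": "B", "local_pref": 200},
    {"prefix": "192.168.0.0/16", "peer": "A", "local_pref": 100},
]
best, advertised = run_bgp_process(
    received, drop_private, highest_local_pref, pass_through
)
print([r["peer"] for r in best])  # ['B']
```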

Input and output policies generally are defined using route maps. Route maps are used with BGP to control and modify routing information and to define the conditions under which routes are redistributed between routing domains.

Recall from Chapter 7, Route Optimization, that the route-map command is entered using the following syntax:

Router(config)#route-map map-tag [permit | deny] [sequence-number]

Note that map-tag is a name that identifies the route map; the sequence-number indicates the position that an instance of the route map is to have in relation to other instances of the same route map. Instances are ordered sequentially, starting with the number 10 by default.

For example, the route-map command might be used to define a route map named MYMAP:

route-map MYMAP permit 10
! First set of conditions goes here.
route-map MYMAP permit 20
! Second set of conditions goes here.

When BGP applies MYMAP to routing updates, it applies the lowest instance first (in this case, instance 10). If the first set of conditions is not met, the second instance is applied, and so on, until either a set of conditions has been met or there are no more sets of conditions to apply.

The condition portion of a route map is set by using the match and set commands. The match command specifies what criteria must be matched, and the set command specifies an action that is to be taken if the routing update meets the conditions defined by the match command.

The figure shows the commands needed to create a simple route map. Access list 1 is used here as a way to specify routes.

You may recall that there are two types of access lists, standard and extended; the main difference is that a standard access list is applied to the source IP address, whereas an extended access list is normally applied to the source and destination of a packet. However, when used to filter routes within BGP, the first address/wildcard bit set given in an extended access list applies to the prefix, and the second address/wildcard bit set applies to the subnet mask of the advertised route.

In the figure, access list 1 identifies all routes of the form 1.1.1.x. A routing update of the form 1.1.1.x will match the access list and will be propagated (because of the permit keyword in the access list) with a metric set to 5.

When an update does not meet the criteria of a route map instance, BGP applies the next instance, and so on, until an action is taken or there are no more route map instances to apply. If the update does not match in any instance, the update is not redistributed or controlled.
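
The first-match behavior of route map instances can be modeled as follows. This is a simplified sketch under stated assumptions: the tuple layout, the function names, and the dictionary route records are hypothetical, and the match condition stands in for an access list that permits 1.1.1.x.

```python
import ipaddress

def apply_route_map(instances, route):
    """Evaluate route-map instances in ascending sequence order.

    Each instance is (sequence, action, match, set_values), where match is a
    predicate on the route and set_values is a dict of attributes to set.
    Returns the modified route, or None if the route is denied or unmatched.
    """
    for _seq, action, match, set_values in sorted(instances, key=lambda i: i[0]):
        if match(route):
            if action == "deny":
                return None
            return {**route, **set_values}
    return None  # no instance matched, so the update is not propagated

# Hypothetical stand-in for an access list that permits 1.1.1.x.
ACL_1 = ipaddress.ip_network("1.1.1.0/24")

def matches_acl_1(route):
    return ipaddress.ip_network(route["prefix"]).network_address in ACL_1

mymap = [(10, "permit", matches_acl_1, {"metric": 5})]
print(apply_route_map(mymap, {"prefix": "1.1.1.0/24", "metric": 0}))
print(apply_route_map(mymap, {"prefix": "2.2.2.0/24", "metric": 0}))  # None
```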

The route map can be applied on the incoming (using the keyword in) or the outgoing (using the keyword out) BGP updates. The figure shows the commands needed to apply the route map MYMAP on the outgoing updates toward BGP neighbor 172.16.20.2.

Traffic inside and outside an AS always flows according to the road map laid out by routes. Altering the routes changes traffic behavior. Among the questions that organizations and service providers ask about controlling routes are these: How do I prevent my private networks from being advertised? How do I filter routing updates coming from a particular neighbor? How do I make sure that I use this link or this provider rather than another one? Through the use of attributes, BGP provides the answer to all these questions and more.

When a BGP speaker receives updates from multiple autonomous systems that describe different paths to the same destination, it must choose the single best path for reaching that destination. Once chosen, BGP propagates the best path to its neighbors. The decision is based on the value of attributes (such as Next Hop or Local Preference) that the update contains and other configurable BGP factors. The following sections describe these key attributes that BGP uses in the decision-making process:

Next Hop
AS_Path
Atomic Aggregate
Aggregator
Local Preference
Weight
Multi-Exit Discriminator (MED)
Origin

Another BGP attribute, Community, is considered in Chapter 9.

The Next Hop attribute is a well-known mandatory attribute (type code 3). In terms of an IGP, such as RIP, the "next hop" to reach a route is the IP address of the router that has announced the route.

The Next Hop concept with BGP is more complex and takes one of the following three forms:

For EBGP sessions, the next hop is the IP address of the neighbor that announced the route.
For IBGP sessions, where routes originated inside the AS, the next hop is the IP address of the neighbor that announced the route. For routes injected into the AS via EBGP, the next hop learned from EBGP is carried unaltered into IBGP. The next hop is the IP address of the EBGP neighbor from which the route was learned.
When the route is advertised on a multiaccess medium (such as Ethernet or Frame Relay), the next hop is usually the IP address of the interface of the router that originated the route, that is, the interface connected to that medium.

The figure illustrates the BGP Next Hop attribute. RTC is running an EBGP session with RTZ and an IBGP session with RTA. RTC is learning route 128.213.1.0/24 from RTZ. In turn, RTC is injecting the local route 128.212.1.0/24 into BGP.

RTA learns route 128.212.1.0/24 via 2.2.2.2, the IP address of the IBGP peer announcing the route. Thus, according to the definition, 2.2.2.2 is the next hop for RTA to reach 128.212.1.0/24. Similarly, RTC sees 128.213.1.0/24 coming from RTZ via Next Hop 1.1.1.1. When it passes this route update to RTA via IBGP, RTC includes the Next Hop information, unaltered. Thus, RTA receives the BGP update about 128.213.1.0/24 with Next Hop 1.1.1.1. This is an example of the EBGP next hop being carried into IBGP.

As you can see, the Next Hop is not necessarily reachable via a direct connection. RTA's next hop for 128.213.1.0/24 is 1.1.1.1, but reaching it requires a pathway through 3.3.3.3. Thus, the next-hop behavior mandates a recursive IP routing table lookup for a router to know where to send the packet. To reach the Next Hop 1.1.1.1, RTA will consult its IGP routing table to see if and how 1.1.1.1 is reachable. This recursive search continues until the router associates destination 1.1.1.1 with an outgoing interface. The same recursive behavior is performed to reach Next Hop 2.2.2.2. If a hop is not reachable via an IGP, BGP would consider the route as being inaccessible.
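
The recursive lookup can be sketched as a loop that keeps resolving next hops until it reaches a directly connected route. This is a minimal model, not router code: the routing-table layout, the prefix lengths, and the interface name are assumptions chosen for illustration.

```python
import ipaddress

def resolve_next_hop(next_hop, routing_table, max_depth=8):
    """Follow next hops until a directly connected route names an interface.

    routing_table maps prefix strings to either ("connected", interface)
    or ("via", next_hop_ip). Returns the interface, or None if unreachable.
    """
    address = ipaddress.ip_address(next_hop)
    for _ in range(max_depth):
        matches = [
            (ipaddress.ip_network(prefix), entry)
            for prefix, entry in routing_table.items()
            if address in ipaddress.ip_network(prefix)
        ]
        if not matches:
            return None  # BGP would treat the route as inaccessible
        # Longest-prefix match among the candidate routes.
        _, (kind, value) = max(matches, key=lambda m: m[0].prefixlen)
        if kind == "connected":
            return value
        address = ipaddress.ip_address(value)
    return None

# Hypothetical IGP view on RTA: 1.1.1.1 is reached through 3.3.3.3.
rta_table = {
    "3.3.3.0/24": ("connected", "Serial0"),
    "1.1.1.0/24": ("via", "3.3.3.3"),
}
print(resolve_next_hop("1.1.1.1", rta_table))  # Serial0
print(resolve_next_hop("9.9.9.9", rta_table))  # None
```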

Recall that a network link is considered multiaccess if more than two hosts can potentially connect to it. Routers on a multiaccess link share the same IP subnet and can physically access all other connected routers in one hop. Ethernet, Frame Relay, and ATM are examples of multiaccess media.

BGP speakers should always advertise the actual source of the route if the source is on the same multiaccess link as the speaker. In other words, if RTC is advertising a route learned from RTB, and if RTC and RTB share a common multiaccess medium, then when RTC advertises the route, it should indicate RTB as being the source of the route. If not, routers on the same medium would have to make an unnecessary hop via RTC to get to a router that is sitting in the same segment.

In the figure, RTA, RTB, and RTC share a common multiaccess medium. RTA and RTC are running EBGP, while RTC and RTB are running OSPF. RTC has learned network 11.11.11.0/24 from RTB via OSPF and is advertising it to RTA via EBGP. Because RTA and RTB are running different routing protocols, you might think that RTA would consider RTC (10.10.10.2) as its next hop to reach 11.11.11.0/24, but this is incorrect. The correct behavior is for RTA to consider RTB (10.10.10.3) as the next hop because RTB shares the same medium with RTC.

When the medium is a broadcast medium, such as Ethernet or FDDI, physical connectivity is a given and the Next Hop behavior is no problem. However, when the medium is nonbroadcast, such as Frame Relay or ATM, special care should be taken, as described in the following section.

On an NBMA network, many-to-many direct interaction between routers is not guaranteed unless virtual circuits are configured from each router to all other routers. Most organizations implement a hub-and-spoke topology primarily because of cost considerations. In a hub-and-spoke topology, multiple remote sites have virtual circuits connected to one or more routers at a central site. The figure illustrates an example of next-hop behavior in a nonbroadcast multiaccess environment.

The only difference between the environments illustrated is that the medium in the figure is a Frame Relay cloud, which is NBMA. RTC is the hub router; RTA and RTB are the spokes. Note that the virtual circuits are laid out between RTC and RTA, and between RTC and RTB, but not between RTA and RTB. This is a partially meshed topology.

RTA gets a BGP routing update about 11.11.11.0/24 from RTC and would try to use RTB (10.10.10.3) as the next hop (the same behavior as on multiaccess media). Routing will fail because no virtual circuit exists between RTA and RTB.

Cisco IOS supports a special-case parameter that remedies this situation. The next-hop-self keyword forces the router (in this case, RTC) to advertise 11.11.11.0/24 with itself as the next hop (10.10.10.2). RTA then directs its traffic to RTC to reach destination 11.11.11.0/24. The syntax for this option is as follows:

Router(config-router)#neighbor ip-address next-hop-self

For RTC, you would issue the command:

RTC(config-router)#neighbor 10.10.10.1 next-hop-self

An AS_Path attribute is a well-known mandatory attribute (type code 2). It is the sequence of AS numbers that a route has traversed to reach a destination. The AS that originates the route adds its own AS number when sending the route to its external BGP peers. Thereafter, each AS that receives the route and passes it on to other BGP peers will prepend its own AS number to the list. Prepending is the act of adding the AS number to the beginning of the list. The final list has all the AS numbers that a route has traversed with the AS number of the AS that originated the route at the end of the list. This type of AS_Path list is called an AS_Sequence because all the AS numbers are ordered sequentially.

BGP uses the AS_Path attribute as part of the routing updates (update packet) to ensure a loop-free topology on the Internet. Each route that gets passed between BGP peers will carry a list of all AS numbers that the route has already been through. If the route is advertised to the AS that originated it, that AS will see itself as part of the AS_Path attribute list and will not accept the route. BGP speakers prepend their AS numbers when advertising routing updates to other autonomous systems (external peers). When the route is passed to a BGP speaker within the same AS, the AS_Path information is left intact.

The figure illustrates the AS_Path attribute at each instance of the route 172.16.10.0/24, originating in AS1 and passed to AS2, AS3, AS4, and back to AS1. Note how each AS that passes the route to other external peers adds its own AS number to the beginning of the list. When the route gets back to AS1, the BGP border router will realize that this route has already been through its AS (AS number 1 appears in the list) and will not accept the route.
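
The prepend-and-check behavior is easy to trace in code. The sketch below is illustrative only; the function names are hypothetical and the AS_Path is modeled as a plain list of AS numbers with the most recent AS first.

```python
def advertise_to_external_peer(as_path, local_as):
    """Prepend the local AS number when sending a route to an EBGP peer."""
    return [local_as] + as_path

def accept_from_external_peer(as_path, local_as):
    """Reject a route whose AS_Path already contains the local AS number."""
    return local_as not in as_path

# The route originates in AS1 and is passed on by AS2, AS3, and AS4.
path = []
for asn in (1, 2, 3, 4):
    path = advertise_to_external_peer(path, asn)

print(path)                                # [4, 3, 2, 1]
print(accept_from_external_peer(path, 1))  # False: AS1 sees itself in the list
```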

AS_Path information is one of the attributes that BGP looks at to determine the best route to take to get to a destination. In comparing two or more different routes, given that all other attributes are identical, a shorter path is always preferred. In case of a tie in AS_Path length, other attributes are used to make the decision.

In an effort to conserve AS numbers, customers whose routing policies are an extension of the policies of their provider generally are not assigned a legal AS number. Thus, if a customer is single-homed or multihomed to the same provider, the provider generally requests that the customer use an AS number taken from the private pool (64,512 to 65,535). As such, all BGP updates that the provider receives from its customer contain private AS numbers.

Private AS numbers cannot be leaked to the Internet because they are not unique. For this reason, Cisco has implemented a feature to strip private AS numbers out of the AS_Path list before the routes get propagated to the Internet.

In the figure, AS1 is providing Internet connectivity to its customer AS 65001. Because the customer connects to only this provider and has no plans to connect to an additional provider in the near future, the customer has been allocated a private AS number. If the customer later needs to connect to another provider, a legal AS number should be assigned.

Prefixes originating from AS65001 in the figure have an AS_Path of 65001. Note the prefix 172.16.220.0/24 as it leaves AS65001. For AS1 to propagate the prefix to the Internet, it would have to strip the private AS number. When the prefix reaches the Internet, it would look like it has originated from the provider's AS. Note how prefix 172.16.220.0/24 has reached the network access point (NAP) with AS_Path 1.

BGP will strip private AS numbers only when propagating updates to the external peers. This means that the AS stripping would be configured on RTB as part of its neighbor connection to RTC.

Privately numbered autonomous systems should be used only when connected to a single provider. If the AS_Path contains a mixture of private and legal AS numbers, BGP will view this as an illegal design. BGP will not strip the private AS numbers from the list, and the update will be treated as usual. Only AS_Path lists that contain private AS numbers in the range 64,512 to 65,535 are stripped.

The example below demonstrates how BGP can be configured to prevent the leakage of private AS numbers into the Internet.

RTB(config)#router bgp 1
RTB(config-router)#neighbor 172.16.20.2 remote-as 65001
RTB(config-router)#neighbor 192.168.6.3 remote-as 7
RTB(config-router)#neighbor 192.168.6.3 remove-private-as

Note how RTB is using the remove-private-as keyword in its neighbor connection to AS7.
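
The stripping rule described above (strip only when every AS number in the list is private) can be sketched as follows. This is a simplified model of the behavior as this section describes it, not the IOS implementation; the function names are hypothetical.

```python
PRIVATE_AS_RANGE = range(64512, 65536)  # 64,512 to 65,535 inclusive

def remove_private_as(as_path):
    """Strip private AS numbers only if the whole path is private."""
    if as_path and all(asn in PRIVATE_AS_RANGE for asn in as_path):
        return []
    return list(as_path)  # mixed or public paths are left untouched

def advertise_from_provider(as_path, provider_as):
    """Strip, then prepend the provider AS as for any EBGP advertisement."""
    return [provider_as] + remove_private_as(as_path)

print(advertise_from_provider([65001], 1))     # [1]
print(advertise_from_provider([65001, 7], 1))  # [1, 65001, 7]
```

The first call mirrors the customer prefix reaching the Internet with an AS_Path of 1; the second shows a mixed path passing through unchanged.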

AS_Path information can be manipulated to affect interdomain routing behavior. Because BGP prefers a shorter path over a longer one, system operators are tempted to change the path information by including dummy AS numbers that increase the path length and influence the traffic trajectory one way or the other. Cisco's implementation enables a user to insert AS numbers at the beginning of an AS_Path to make the path longer and thereby influence the preferred route used by other autonomous systems. The following example shows how this feature can be used.

In the figure, AS 50 is connected to two providers, AS 200 and AS 100. AS 100 is directly connected to an Internet network access point (NAP), while AS 200 must go through an extra hop via AS 300 to reach the Internet. The figure shows the AS path of prefix 192.213.1.0/24 as it traverses the autonomous systems on its way to the NAP. When the 192.213.1.0/24 prefix reaches the NAP via AS 300, it has an AS_Path of 300 200 50. If the same prefix reaches the NAP via AS 100, it has an AS_Path of 100 50, which is shorter. Autonomous systems upstream from the NAP prefer the shorter AS_Path length and direct their traffic toward AS 100 at all times for destination 192.213.1.0/24.

AS 50's network administrator is not too happy about this behavior because she prefers Internet traffic to come in via the higher bandwidth T3 link to AS 200, rather than through the slower link to AS 100. AS 50's network administrator can resolve this by manipulating the AS_Path information, inserting extra AS hops when sending routing updates to AS 100. One common practice is for AS 50 to repeat its AS number as many times as necessary to tip the balance and make the path via AS 200 appear to become the shorter route.

In the figure, AS 50 inserts two additional instances of its own AS number (50 50) at the beginning of the AS_Path of prefix 192.213.1.0/24 in the updates it sends to AS 100. When the prefix 192.213.1.0/24 reaches the NAP via AS 100, it has the AS_Path 100 50 50 50, which is longer than the AS_Path 300 200 50 via AS 300. Autonomous systems upstream of the NAP prefer the shortest path and thus direct the traffic toward AS 300 for destination 192.213.1.0/24.
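
The effect of prepending on path selection at the NAP can be checked with a short calculation. This is an illustrative sketch with hypothetical function names; it compares AS_Path lengths only and assumes all other attributes are equal.

```python
def prepend(as_path, asn, count):
    """Insert count extra copies of asn at the beginning of the AS_Path."""
    return [asn] * count + as_path

def preferred_path(paths):
    """Choose the path with the fewest AS numbers."""
    return min(paths, key=len)

via_as300 = [300, 200, 50]

# Without prepending, the path through AS 100 is shorter.
print(preferred_path([[100, 50], via_as300]))  # [100, 50]

# AS 50 prepends two extra copies of its own number toward AS 100.
via_as100 = [100] + prepend([50], 50, 2)
print(via_as100)                               # [100, 50, 50, 50]
print(preferred_path([via_as100, via_as300]))  # [300, 200, 50]
```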

The bogus number should always be a duplicate of the AS announcing the route or the neighbor that the route is learned from (in case an AS is increasing the path length for incoming updates). Adding any other number is misleading and could potentially lead to routing loops.

To configure the prepending of dummy AS numbers, a route map is used in a BGP neighbor statement:

RTX(config)#router bgp 100
RTX(config-router)#neighbor 172.16.20.2 remote-as 300
RTX(config-router)#neighbor 172.16.20.2 route-map AddASnumbers out
RTX(config-router)#exit
RTX(config)#route-map AddASnumbers permit 10
RTX(config-route-map)#set as-path prepend 100 100

The Atomic Aggregate attribute is a well-known discretionary attribute (type code 6). The Atomic Aggregate attribute is set to either True or False. If True, this attribute alerts BGP routers that multiple destinations have been grouped into a single update. In other words, the BGP router that sent the update had a more specific route to the destination but did not send it. Because this can lead to routing problems, the Atomic Aggregate attribute warns receiving routers that the information they are receiving is not necessarily the most complete route information available.

You can manually configure BGP to summarize routes by using the aggregate-address command, which has the syntax shown in the figure.

Using the aggregate-address command with no arguments will create an aggregate entry (that is, a supernet route) in the BGP routing table, as long as the router knows at least one specific BGP route that belongs to that supernet. Thus, if your router knows just one route, it can claim to know hundreds of others (which is why this feature should be used with caution). The aggregate route will be advertised as coming from your router's AS and has the Atomic Aggregate attribute set to True, showing that information might be missing. By default, the Atomic Aggregate attribute is set to True unless you specify the as-set keyword.

Using the as-set keyword creates an aggregate entry, but the path advertised for this route will be an AS_Set consisting of all elements contained in all paths that are being summarized. Do not use this form of aggregate-address when aggregating many paths because this route must be continually withdrawn and updated as autonomous system path reachability information for the summarized route changes.

If you want your router to propagate the supernet route only, and you do not want it to propagate any more specific routes, use the summary-only keyword. When configured using this keyword, the router will send the supernet route and suppress the more specific routes known to BGP.

The following example shows the commands needed to configure a simple supernet advertisement, which will be sent with the Atomic Aggregate attribute set to True.

RTA(config)# router bgp 300
RTA(config-router)# neighbor 3.3.3.3 remote-as 200
RTA(config-router)# neighbor 2.2.2.2 remote-as 100
RTA(config-router)# network 160.10.0.0
RTA(config-router)# aggregate-address 160.0.0.0 255.0.0.0

If you wanted RTA to suppress more specific routes and to update other BGP routers only about the supernet 160.0.0.0/8, you could issue this command:

RTA(config-router)# aggregate-address 160.0.0.0 255.0.0.0 summary-only
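
The difference between the two forms of the command can be modeled with Python's ipaddress module. This is a rough sketch under stated assumptions: the function name is hypothetical, attributes such as Atomic Aggregate are ignored, and 160.10.0.0 is treated as the classful /16 network.

```python
import ipaddress

def aggregate_advertisements(aggregate, bgp_routes, summary_only=False):
    """Return the prefixes advertised once an aggregate is configured.

    The aggregate is created only if at least one more specific route
    inside it exists in the BGP table.
    """
    supernet = ipaddress.ip_network(aggregate)
    routes = [ipaddress.ip_network(r) for r in bgp_routes]
    specifics = [r for r in routes if r != supernet and r.subnet_of(supernet)]
    if not specifics:
        return [str(r) for r in routes]  # no aggregate is generated
    kept = [r for r in routes if not (summary_only and r in specifics)]
    return [str(supernet)] + [str(r) for r in kept if r != supernet]

table = ["160.10.0.0/16"]
print(aggregate_advertisements("160.0.0.0/8", table))
# ['160.0.0.0/8', '160.10.0.0/16']
print(aggregate_advertisements("160.0.0.0/8", table, summary_only=True))
# ['160.0.0.0/8']
```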

Aggregator is an optional transitive attribute (type code 7). When configuring address aggregation, you can also configure the router to include its router ID and local AS number along with the supernet route. This attribute enables ISP administrators to determine which BGP router is responsible for a particular instance of aggregation. Tracing a supernet to its original "aggregator" may be necessary for troubleshooting purposes.

Local Preference is a well-known discretionary attribute (type code 5). The Local Preference attribute is a degree of preference given to a route for comparison with other routes for the same destination. Higher Local Preference values are preferred. Local Preference, as indicated by the name, is local to the AS and is exchanged between IBGP peers only. Local Preference is not advertised to EBGP peers.

Routers within a multihomed AS may learn that they can reach the same destination network via neighbors in two (or more) different autonomous systems. In effect, there could be two or more exit points from the local AS to any given destination. You can use the Local Preference attribute to force your BGP routers to prefer one exit point over another when routing to a particular destination network. Because this attribute is communicated among all BGP routers inside the AS, all BGP routers will have a common view on how to exit the AS.

Note: Although routers always prefer the lowest route metric and administrative distance for a given destination, BGP routers prefer higher Local Preference values over lower ones.

Consider the environment illustrated in the figure. Suppose that company ANET has purchased Internet connections via two service providers, XNET and YNET. ANET is connected to YNET via a primary T3 link and to XNET via a backup T1 link.

It is important for ANET to decide what path its outbound traffic is going to take. Of course, ANET's network administrators prefer to use the T3 link via YNET in normal operation because it is a high-speed link. This is where Local Preference comes into play: RTB can give the routes coming from YNET a Local Preference of 300, and RTA can give the routes coming from XNET a lower value, such as 200. Because both RTA and RTB are exchanging routing updates via IBGP, they both agree that the exit point of the AS is going to be via YNET because of the higher local preference.

In the figure, ANET learns route 128.213.0.0/16 via XNET and YNET. RTA and RTB will agree on using YNET as the exit point for destination 128.213.0.0/16 because of the higher Local Preference value of 300. The Local Preference manipulation discussed in this case affects the traffic going out of the AS and not traffic coming into the AS. Inbound traffic can still come via the T1 link.

In Figure , AS 256 receives route updates for network 170.10.0.0 from AS 100 and AS 300. There are two ways to set the Local Preference attribute on the routers in AS 256:

Use the bgp default local-preference command
Use a route map to set local preference

Using the bgp default local-preference command, you can set the Local Preference attribute on RTC and RTD:

RTC(config)#router bgp 256
RTC(config-router)#neighbor 1.1.1.1 remote-as 100
RTC(config-router)#neighbor 128.213.11.2 remote-as 256
RTC(config-router)#bgp default local-preference 150

RTD(config)#router bgp 256
RTD(config-router)#neighbor 3.3.3.4 remote-as 300
RTD(config-router)#neighbor 128.213.11.1 remote-as 256
RTD(config-router)#bgp default local-preference 200

RTC's configuration causes it to set the Local Preference of all updates from AS 100 to 150, and RTD's configuration causes it to set the Local Preference for all updates from AS 300 to 200. Because Local Preference is exchanged within the AS, both RTC and RTD determine that updates regarding network 170.10.0.0 have a higher Local Preference when they come from AS 300 than when they come from AS 100. As a result, all traffic in AS 256 destined for network 170.10.0.0 is sent to RTD.

As an alternative configuration, you can use a route map. Route maps provide more flexibility than the bgp default local-preference configuration command. When the bgp default local-preference command is used on RTD, the Local Preference attribute of all updates received by RTD will be set to 200, including updates from AS 34. The example configuration in the figure uses a route map to set the Local Preference attribute on RTD specifically for updates from AS 300.

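Note that the configuration shown in the figure uses the ip as-path access-list command, which matches the regular expression ^300$. This expression matches only routes whose AS_Path attribute consists of exactly AS 300, that is, routes that AS 300 originated and advertised directly. (An expression such as _300_ would instead match any route that includes AS 300 anywhere in its AS_Path.)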

With the configuration, the Local Preference attribute of any update coming from AS 300 is set to 200 by instance 10 of the route map, SETLOCALIN. Instance 20 of the route map accepts all other routes.
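
A route-map configuration of this kind on RTD might look like the following sketch. The route-map name SETLOCALIN, the regular expression ^300$, the value 200, and the neighbor addresses are taken from the surrounding text; the AS_Path access list number 7 is an assumption chosen for illustration.

```
! Sketch only; AS_Path access list number 7 is an assumed value
RTD(config)#ip as-path access-list 7 permit ^300$
RTD(config)#route-map SETLOCALIN permit 10
RTD(config-route-map)#match as-path 7
RTD(config-route-map)#set local-preference 200
RTD(config-route-map)#exit
! Instance 20 has no match clause, so it accepts all other routes unchanged
RTD(config)#route-map SETLOCALIN permit 20
RTD(config-route-map)#exit
RTD(config)#router bgp 256
RTD(config-router)#neighbor 3.3.3.4 remote-as 300
RTD(config-router)#neighbor 3.3.3.4 route-map SETLOCALIN in
RTD(config-router)#neighbor 128.213.11.1 remote-as 256
```

Because the route map is applied inbound on the neighbor in AS 300 only, updates that RTD learns from other neighbors are not touched by it.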

The Weight attribute is similar to the Local Preference attribute in that it gives higher preference to the route that has a higher weight. The difference is that the weight parameter is local to the router and is not exchanged between routers. The weight parameter influences routes coming from different providers to the same router (one router with multiple connections to two or more providers). The weight parameter has a higher precedence than any other attribute. It is the most important attribute when determining route preference. The Weight attribute is a Cisco proprietary attribute.
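
As a brief illustration, weight can be assigned per neighbor with the neighbor weight command. The sketch below assumes a hypothetical router RTA in AS 100 with connections to two providers; the addresses, AS numbers, and weight values are invented for the example.

```
! Hypothetical addresses, AS numbers, and weights
RTA(config)#router bgp 100
RTA(config-router)#neighbor 10.1.1.1 remote-as 200
RTA(config-router)#neighbor 10.1.1.1 weight 200
RTA(config-router)#neighbor 10.2.2.2 remote-as 300
RTA(config-router)#neighbor 10.2.2.2 weight 100
```

With these values, RTA would prefer routes learned from 10.1.1.1 whenever both neighbors advertise the same destination. Because weight is local to the router, no other router learns of this preference.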

The multi-exit discriminator (MED) attribute is an optional nontransitive attribute (type code 4). It informs external neighbors about the preferred path into an AS that has multiple entry points; a lower MED is preferred over a higher MED.

Unlike Local Preference, the MED attribute is exchanged between autonomous systems, but a MED attribute that comes into an AS does not leave the AS. When an update enters the AS with a certain MED value, that value is used for decision making within the AS. When BGP passes on the routing update to another AS, the MED is reset to zero (unless the outgoing MED is set to a specific value).

When the route is originated by the AS itself, the MED value typically follows the internal IGP metric of the route. This becomes useful when a customer has multiple connections to the same provider. The IGP metric reflects how close or how far a network is to a certain exit point. A network that is closer to exit point A than to exit point B will have a lower IGP metric in the border router connected to A. When the IGP metric translates to MED, traffic coming into the AS can enter from the link closer to the destination because a lower MED is preferred for the same destination. This can be used by both providers and customers to balance the traffic over multiple links between two autonomous systems.

Unless otherwise specified, the router compares MED attributes for paths from external neighbors that are in the same AS. MEDs from different autonomous systems are not comparable because the MED associated with a route usually gives some indication of the AS internal topology. Comparing MEDs from different autonomous systems would be like comparing apples and oranges. Still, if you have a reason to do so, the Cisco IOS offers the bgp always-compare-med router command.

The MED can be used by an AS to influence the outbound decision of another AS, as shown in the figure.

XNET is receiving routing updates about 128.213.0.0/16 from three different sources: San Jose (metric 120), Los Angeles (metric 200), and New York (metric 50). San Francisco will compare the two metric values coming from ANET and will prefer the San Jose router because it is advertising a lower metric (120). When the bgp always-compare-med command is used on the San Francisco router, it will then compare metric 120 with metric 50 coming from New York and will prefer New York to reach 128.213.0.0/16. Note that San Francisco could have influenced its decision by using Local Preference inside XNET to override the metrics coming from outside autonomous systems. Nevertheless, MED is still useful in case XNET prefers to base its BGP decisions on outside factors to simplify router configuration on its end. Customers who connect to the same provider in multiple locations could exchange metrics with their providers to influence each other's outbound traffic, leading to better load balancing.

In the figure, AS 100 receives updates regarding network 180.10.0.0 from RTB, RTC, and RTD. RTC and RTD are in AS 300, and RTB is in AS 400.

You can use a route map to configure the MED attribute on a router:

RTB(config)#route-map setmedout permit 10
RTB(config-route-map)#set metric 50
RTB(config)#router bgp 400
RTB(config-router)#neighbor 4.4.4.4 route-map setmedout out

By default, BGP compares only the MED attributes of routes coming from neighbors in the same external AS (such as AS 300 in the example), which means that RTA will compare the MED attribute coming from RTC (120) only to the MED attribute coming from RTD (200). Even though the update coming from RTB has the lowest MED value, RTA will choose RTC as the best path for reaching network 180.10.0.0. To force RTA to include updates for network 180.10.0.0 from RTB in the comparison, use the bgp always-compare-med router configuration command.

With bgp always-compare-med configured, RTA will choose RTB as the best next hop for reaching network 180.10.0.0 (assuming that all other attributes are the same). You can also set the MED attribute when you configure the redistribution of routes into BGP. For example, on RTB you can inject a static route for 180.10.0.0 into BGP with a MED of 50, which causes RTB to send out updates for 180.10.0.0 with a MED attribute of 50.
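
The two techniques just described might be configured along the following lines. This is a sketch, not a definitive configuration: the AS numbers and the network come from the example, while the static route to null0 on RTB is an assumption made purely so that there is a static route to redistribute.

```
! On RTA: compare MEDs even when paths come from different autonomous systems
RTA(config)#router bgp 100
RTA(config-router)#bgp always-compare-med

! On RTB: inject a static route into BGP with a MED of 50
! (the null0 static route is an assumption made for illustration)
RTB(config)#ip route 180.10.0.0 255.255.0.0 null0
RTB(config)#router bgp 400
RTB(config-router)#redistribute static
RTB(config-router)#default-metric 50
```

The default-metric value applies to routes redistributed into BGP on RTB, so it sets the MED without requiring a route map.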

The Origin attribute is a well-known mandatory attribute (type code 1) that indicates the origin of the routing update. BGP allows three types of origins:

IGP - The prefix is internal to the originating AS.
EGP - The prefix was learned via the Exterior Gateway Protocol (EGP), the predecessor to BGP.
Incomplete - The prefix was learned by some other means, probably redistribution.

BGP considers the Origin attribute in its decision-making process to establish a preference ranking among multiple routes. Specifically, BGP prefers the path with the lowest origin type, where IGP is lower than EGP, and EGP is lower than INCOMPLETE.

BGP bases its decision process on the attribute values. When faced with multiple routes to the same destination, BGP chooses the best route for routing traffic toward the destination. The following process summarizes how BGP chooses the best route:

If the next hop is inaccessible, the route is ignored (this is why it is important to have an IGP route to the next hop).
The BGP router will prefer the path with the largest weight (weight is a Cisco-proprietary parameter).
If the weights are the same, the BGP router will prefer the route with the largest local preference.
If the routes have the same local preference, the BGP router will prefer the route that was locally originated (originated by this router).
If no route was locally originated, the BGP router will prefer the route with the shortest AS_Path.
If the AS_Path length is the same, the BGP router will prefer the route with the lowest origin type (where IGP is lower than EGP, and EGP is lower than INCOMPLETE).
If the origin type is the same, the BGP router will prefer the route with the lowest MED.
If the routes have the same MED, the BGP router will prefer the route in the following manner: External (EBGP) is better than confederation external, which is better than IBGP. (BGP confederations are not covered in this book. For more information, consult Cisco's web site, at http://www.cisco.com/univercd.)
If all the preceding scenarios are identical, the BGP router will prefer the route that can be reached via the closest IGP neighbor; that is, take the shortest internal path inside the AS to reach the destination (follow the shortest path to the BGP Next Hop).
If the internal path is the same, the BGP router ID will be a tiebreaker. The BGP router will prefer the route coming from the BGP router with the lowest router ID. The router ID is usually the highest IP address on the router or the loopback address.

BGP defines the basis of routing architectures in the Internet. The segregation of networks into autonomous systems has logically defined the administrative and political borders between organizations. Interior Gateway Protocols can now run independently of each other, but still interconnect via BGP to provide global routing.

In this chapter, you learned the practical implementation details for BGP as part of the overall design problem in building reliable Internet connectivity. This chapter examined specific attributes of BGP and how they are applied individually and together to address this design problem. Although the terminology, attributes, and details of this chapter are specific to BGP, the general concepts and problems raised are pertinent to routing architecture design, regardless of what specific protocols are being utilized.

From fiber to phone lines, from huge corporate networks to a single home user, security dominates the discussion of today's computer networks. Securing an IP-based network can be a difficult task, largely because the Internet is based on open standards. Because nonproprietary technologies such as TCP/IP are so well known, their bugs and their limitations are well publicized -- and often easily exploited.

Fortunately, the rush to connect businesses, schools, and homes to the Internet has given way to a more cautious, security-savvy approach to building networks. Even as residential broadband brings "always-on" Internet connectivity to homes, average home users have taken to installing firewalls and other security measures. Meanwhile, the growth of e-commerce has prompted corporations to spend more resources on fortifying network security.

Virtually all computer networks have some portion that is IP-based, so it is imperative that you learn how to restrict and control TCP/IP access. The key to access control is the access list, or access control list (ACL). These lists are the building blocks of IP firewalls, and firewalls stand on the frontlines of Internet security. A firewall is hardware and/or software that works to protect a network from unauthorized access.

After providing a quick review of access list syntax, this chapter examines advanced IP security configurations, including restricting router access, dynamic access lists (lock-and-key), null0 routes, and the established argument. Finally, this chapter examines the next generation of IP traffic management: reflexive access lists and context-based access control.

An IP access list is a sequential collection of permit and deny conditions that apply to IP addresses or upper-layer IP protocols. There are two types of IP access lists, standard and extended. The Cisco IOS also supports access lists for numerous other protocols, as shown in the figure. The following sections focus on standard and extended IP access lists, including:

a named list configuration
a time-based extended list configuration
access list remarks (descriptions)
access list application

Standard IP access lists (numbered 1 to 99 and 1300 to 1999) filter based on source address only. The syntax for a standard list is relatively straightforward (see the figure).
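
For illustration, a standard list that permits one subnet and one additional host might be sketched as follows. The list number 10 and all addresses are hypothetical.

```
! Hypothetical list number and addresses
router(config)#access-list 10 permit 172.16.1.0 0.0.0.255
router(config)#access-list 10 permit host 172.16.2.5
! All other sources are dropped by the implicit deny at the end of the list
```

The wildcard mask 0.0.0.255 means that the first three octets of the source address must match 172.16.1 and the last octet is ignored.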

Extended IP access lists (numbered 100 to 199 and 2000 to 2699) offer more control than standard access lists by filtering based on source address, destination address, or protocol characteristics. The command syntax for an extended list is far more complex than that for a standard list. The figure presents a simplified syntax. A look at each of the parameters in this syntax is beyond the scope of this chapter, although the keywords time-range and established are discussed in the sections that follow.

In addition to numbered access lists, the Cisco IOS enables you to create IP access lists by name. Named access lists enable you to configure more IP access lists in a router than if you were to use numbered access lists. Another advantage of named lists is that you can use no permit and no deny commands to remove individual entries from the list. Currently, only packet and route filters can use named lists.

You should consider the following before configuring named access lists:

Named access lists are not compatible with older releases of IOS software.
Not all access lists that accept a number will accept a name. Currently, only access lists for packet filters and route filters on interfaces can use a name.
A standard access list and an extended access list cannot have the same name.

If you identify an access list with a name, the mode and command syntax are slightly different from those used with a numbered list. To configure a standard named list, follow these steps:

Define a standard IP access list by using a name:

router(config)# ip access-list standard name
In access-list configuration mode, specify one or more conditions allowed or denied, which determine whether the packet is passed or dropped:

router(config-std-nacl)# {deny | permit} {source [source-wildcard] | any} [log]
Exit access-list configuration mode:

router(config-std-nacl)# exit

The process is similar for extended named access lists:

Define an extended IP access list by using a name (but remember not to duplicate a standard list's name):

router(config)#ip access-list extended name
In access-list configuration mode, specify one or more conditions allowed or denied, which determine whether the packet is passed or dropped:

router(config-ext-nacl)# {deny | permit} protocol source
source-wildcard destination destination-wildcard
[precedence precedence] [tos tos] [established] [log]
[time-range time-range-name]
Exit the access-list configuration mode:

router(config-ext-nacl)#exit

Note: Named access lists will not be recognized by any software release prior to Cisco IOS Release 11.2.

In the example shown in the figure, these steps are used to create an extended named access list for a router called RTA.
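
An extended named list on RTA could be entered along these lines. This is only a sketch: the list name SERVER-ACCESS, the host addresses, and the permitted services are hypothetical and are not taken from the figure.

```
! Hypothetical list name, addresses, and services
RTA(config)#ip access-list extended SERVER-ACCESS
RTA(config-ext-nacl)#permit tcp any host 10.0.0.5 eq www
RTA(config-ext-nacl)#permit tcp any host 10.0.0.6 eq smtp
RTA(config-ext-nacl)#deny ip any any log
RTA(config-ext-nacl)#exit
```

To remove a single entry later, you would re-enter the list and prefix that entry with no, which is the advantage of named lists described earlier.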

Today's network security policies demand that access lists do more than use destination and source addresses to statically define whether a protocol is permitted. In some cases, an administrator may determine that certain traffic is permissible only during business hours, or that users have access to specific resources only at fixed times of day. This is possible using a time-based access list. Since Cisco IOS Release 12.0(1)T, it has been possible to implement time-based access lists based on the time of day and week by using the time-range command. There are many possible benefits of using time ranges, including the following:

To provide more control over permitting or denying a user access to resources; these resources could be an application (identified by an IP address/mask pair and a port number), or an on-demand link (identified as interesting traffic to the dialer).
To set time-based security policy, including the following:

Perimeter security using the Cisco IOS Firewall feature set or access lists
Data confidentiality with Cisco Encryption Technology or IP Security Protocol (IPSec)

To provide enhanced policy-based routing and queuing functions
To automatically reroute traffic cost effectively when provider access rates vary by time of day
To support the quality of service (QoS) service-level agreements (SLAs) that are negotiated for certain times of day when service providers can dynamically change a committed access rate (CAR) configuration
To control logging messages. Access list entries can log traffic at certain times of the day, but not constantly. Therefore, administrators can simply deny access without needing to analyze many logs generated during peak hours.

To implement a time-based extended access list, first define the name and times of the day and week of the time range, and then reference the time range by name in an access list. To define the time range, use the following steps:

Define a time range using a name:

router(config)#time-range time-range-name
In time-range configuration mode, use the periodic command, the absolute command, or some combination of them to define when the feature is in effect. Multiple periodic commands are allowed in a time range; only one absolute command is allowed. The periodic keyword specifies a recurring (weekly) start and end time for a time range. The absolute keyword specifies an absolute start and end time for a time range:

router(config-time-range)# periodic days-of-the-week hh:mm to [days-of-the-week] hh:mm

router(config-time-range)# absolute [start time date] [end time date]

The periodic command will take the following arguments: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday. Other possible values are daily (Monday through Sunday), weekdays (Monday through Friday), and weekend (Saturday and Sunday).
Exit the time-range configuration mode:

router(config-time-range)#exit

Currently, IP and IPX named or numbered extended access lists are the only functions that can use time ranges. The time range allows the network administrator to define when the permit or deny statements in the access list are in effect. Prior to this feature, access list statements were always in effect once they were applied.

In the figure, RTA is configured with the named access list STRICT, which references two time ranges, NO-HTTP and UDP-YES. NO-HTTP is used in conjunction with a deny statement to prevent web traffic weekdays from 8 A.M. to 6 P.M. The UDP-YES time range is used with a permit statement to allow all UDP traffic on weekends, from 12 noon to 8 P.M.
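
A configuration matching this description might look like the following sketch. The names STRICT, NO-HTTP, and UDP-YES and the times come from the description above; the final deny udp and permit ip statements are assumptions added to make the list complete.

```
RTA(config)#time-range NO-HTTP
RTA(config-time-range)#periodic weekdays 8:00 to 18:00
RTA(config-time-range)#exit
RTA(config)#time-range UDP-YES
RTA(config-time-range)#periodic weekend 12:00 to 20:00
RTA(config-time-range)#exit
RTA(config)#ip access-list extended STRICT
RTA(config-ext-nacl)#deny tcp any any eq www time-range NO-HTTP
RTA(config-ext-nacl)#permit udp any any time-range UDP-YES
! Assumed closing statements, not taken from the description
RTA(config-ext-nacl)#deny udp any any
RTA(config-ext-nacl)#permit ip any any
```

Outside a time range, the statement that references it is treated as inactive, so in this sketch UDP traffic outside the weekend window falls through to the deny udp statement. Time ranges depend on the router's clock, so the clock should be set accurately.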

To make access list configuration easier on administrators, Cisco IOS Release 12.0(2)T introduced the remark command. Access list remarks, or comments, are used to include documentation and descriptions to facilitate the understanding of all or parts of a specific access list. By describing your lists in simple terms, you can quickly sum up the router's configuration without having to sift through dozens of access list statements in an attempt to piece together its function. Good programmers use remarks liberally in their programs to describe the function of certain blocks of code.

The syntax for adding a remark to a numbered access list is as follows:

router(config)#access-list access-list-number remark remark

When configuring a named access list, you issue the remark command in named access list configuration mode:

router(config-std-nacl)#remark remark

After entering the remark keyword, you can include an alphanumeric string of up to 100 characters to describe the access list. It is good form to enter a remark statement before configuring the permits and denies. That way the description will appear as the first entry in the configuration file. In a long list, you may want to include multiple remark statements and enter them before the part of the list that you want to describe. The figure illustrates proper use of the remark option.
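
As an illustration of remarks placed before the statements they describe, consider the following sketch. The list number, addresses, and remark text are hypothetical.

```
! Hypothetical list number, addresses, and remark text
router(config)#access-list 101 remark Allow the admin subnet to Telnet to the servers
router(config)#access-list 101 permit tcp 192.168.10.0 0.0.0.255 10.0.0.0 0.0.0.255 eq telnet
router(config)#access-list 101 remark Block and log all other Telnet traffic
router(config)#access-list 101 deny tcp any any eq telnet log
router(config)#access-list 101 permit ip any any
```

Each remark is stored in the configuration in the position where it was entered, so it reads as a heading for the statements that follow it.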

Access lists are applied to one or more interfaces, and can filter inbound or outbound traffic. A good rule to remember is that you can apply one access list per protocol, per interface, per direction (in or out). Inbound access lists require fewer CPU cycles than outbound lists and therefore are preferred. A router with an outbound access list must switch every packet to an outbound interface before checking it against the list. This wastes processing resources if the packet ends up being denied.

To apply an access list to an interface, use the command syntax shown in the figure:

Router(config-if)#ip access-group {access-list-number |
access-list-name} {in | out}
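
For example, the named list STRICT mentioned earlier could be applied to inbound traffic as sketched below. The choice of interface is an assumption made for illustration.

```
! Hypothetical interface; STRICT is the named list discussed earlier
Router(config)#interface ethernet 0
Router(config-if)#ip access-group STRICT in
```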

Applying access lists to an interface is just one part of IP traffic management. The following section looks at other ways of applying access lists to implement security.

Although you may focus on using the Cisco IOS to secure network resources, you should not forget that the router itself is a vital resource that must also be protected. An obvious vulnerability is a router's virtual terminals, which should rarely, if ever, be open to public Internet connections.

Although an extended access list can be used to block the Telnet port (TCP 23), such a list would be IP-specific and may have to be configured for every IP interface on the router. A more efficient and precise approach is to apply a standard access list to the virtual terminal lines themselves. The figure illustrates these virtual terminal (VTY) lines, which are numbered 0 to 4.

You can apply an access list directly to one of the five VTYs, but because you cannot always predict which VTY a user will connect to, you should apply the same access list to all five lines, as shown:

RTA(config)#access-list 5 permit 200.100.50.0 0.0.0.255
RTA(config)#access-list 5 permit host 192.168.1.1
RTA(config)#line vty 0 4
RTA(config-line)#access-class 5 in

In this example, access list 5 defines which hosts will be allowed to connect to the virtual terminals. The line vty 0 4 command specifies all five VTYs. Finally, the access-class command -- not ip access-group -- is used to apply the list to the VTYs. Note that the in keyword is almost always used with this command. The out keyword would actually restrict the router's capability to Telnet to outside destinations.

VTYs are not the only way to gain configuration access to a router. Many administrators manage their devices using a web browser, and Cisco routers and switches typically run web services (on port 80, by default) to permit this. Securing the VTYs will not do much good if a web surfer can access the router's web-based command interface. If your organization uses the IOS web interface, be sure to apply an access list to it as well. The following commands provide an example for web server configuration:

RTA(config)#access-list 17 permit 202.206.100.0 0.0.0.255
RTA(config)#ip http server
RTA(config)#ip http access-class 17

When applying an access class to an IOS HTTP server, the in keyword is not used. Because this service can accept only incoming connections, the access list cannot be applied to outbound traffic.

In the above example, the ip http server command enables web-based configuration, while the ip http access-class 17 command restricts the web interface to source IP addresses matching access list 17.

Lock-and-key is a Cisco IOS feature that enables users to temporarily open a hole in a firewall without compromising other configured security restrictions. This feature is configured using a type of extended access list called a dynamic access list. In practice, lock-and-key users are typically power users or systems administrators because the user must Telnet to a Cisco router to create the hole in the firewall. However, some administrators automate the procedure with scripts so that intermediate users can take advantage of this feature.

Dynamic access lists enable designated users to gain temporary access to protected resources from any IP address or from specific addresses that you choose. When configured, lock-and-key modifies the existing IP access list of the interface so that it permits the IP addresses of designated users to reach specific destinations. After the temporary entry expires or is cleared, lock-and-key returns the access list to its original state.

For lock-and-key to work, the user must first Telnet to the router. When telnetting, the user is provided an opportunity to tell the router who he or she is (by authenticating with a username and a password), and what IP address he or she is currently sending from. If the user successfully authenticates to the router, the user's IP address can be granted temporary access through the router. The dynamic access list configuration determines the extent of the access granted.

When is it appropriate to use lock-and-key? Two general scenarios warrant a dynamic access list configuration:

You want to permit a user, or group of users, to securely access a host within your protected network via the Internet. Lock-and-key authenticates the user and then permits limited access through your firewall router, but only for that individual's host or subnet, and only for a finite period of time.
You want certain users on a remote network to access a host on the corporate network protected by a firewall (as shown in the figure). Lock-and-key requires users to authenticate before allowing their hosts to access the protected hosts.

The following steps summarize lock-and-key operation:

A user opens a Telnet session to a firewall router configured for lock-and-key. The user connects via one of the VTYs on the router.
The Cisco IOS receives the Telnet packet, opens a Telnet session, prompts the user for a username and password, and performs the authentication process. The authentication can be done by the router or by a security server (such as a TACACS+ or RADIUS box). When a user passes authentication, he or she is logged out of the Telnet session, and the software creates a temporary entry in the dynamic access list. Depending on the configuration, this temporary entry can limit the range of networks to which the user is given temporary access.
The user exchanges data through the "hole" in the firewall.
The IOS deletes the temporary access list entry when a configured timeout is reached, or when the system administrator manually clears it. The configured timeout can be either an idle timeout or an absolute timeout. The temporary access-list entry is not automatically deleted when the user terminates a session. It remains until the timeout is reached or until it is cleared by the system administrator.

Cisco IOS releases prior to Release 11.1 are not compatible with dynamic access lists (lock-and-key). Therefore, if you use a configuration file that includes a dynamic access list with IOS software older than Release 11.1, the resulting access list will not be interpreted correctly. This could cause you severe security problems.

To configure lock-and-key, you start by defining a dynamic access list, using the syntax shown in the figure.

To configure lock-and-key on RTA (192.168.1.1), you would configure the dynamic access list, as shown:

RTA(config)#access-list 101 permit tcp any host 192.168.1.1 eq telnet
RTA(config)#access-list 101 dynamic UNLOCK timeout 120 permit ip any any
RTA(config)#int s0
RTA(config-if)#ip access-group 101 in

The first access list statement in the example permits Telnet traffic to RTA's interface at 192.168.1.1. Lock-and-key will not work if a firewall blocks the user from Telnetting to the router. You must be able to reach the router via Telnet, so configuring an explicit Telnet permit is a good idea.

The second statement includes the dynamic keyword and creates a dynamic list called UNLOCK. Notice that the dynamic statement is part of the same extended access list, 101. Remember that you can have only one access list per protocol, per interface, per direction. Assume, for this example, that additional statements are configured as part of access list 101.

The timeout 120 option specifies the absolute timeout, which is a maximum time limit (in minutes) for each entry within this dynamic list. In this case, the user would be disconnected 120 minutes after first connecting. If you do not specify a timeout, the IOS allows the entry (the hole in the firewall) to remain forever, or until the idle timer expires (if configured).

Finally, the ip access-group 101 in command applies this list to inbound traffic on the serial interface. With the dynamic access list configured and applied, you must configure the router to use authentication to complete the lock-and-key configuration. Authenticating lock-and-key users is described in the next section.

A router can be configured to authenticate for lock-and-key using its own locally created database or a centralized database on a network server. Local authentication can become burdensome when it has to be repeated on dozens of routers. To prevent this administrative overhead, routers and other nodes can be pointed to the security server to authenticate username and password combinations. These security servers can keep track of all users and passwords in the network in a single centralized database. Typically, a network administrator will choose either a TACACS+ or a RADIUS server for this purpose.

The following configuration uses a simple local database for user authentication.

RTA(config)#username ernie password bert
RTA(config)#line vty 0 4
RTA(config-line)#login local

You can see that we have configured a single user, ernie. The login local command configures all five VTY lines to authenticate users via the local username/password database.

The final step in configuring lock-and-key is to enable the router to create a temporary access list entry in the dynamic access list that was specified in the original ACL (UNLOCK, in the earlier example). The router will not do this by default. The router can be configured to create temporary access list entries in two ways. You could use the following syntax to enable the creation of temporary entries:

router#access-enable [host] [timeout minutes]

A simple access-enable command will work, but the optional keywords are strongly advised. If the host keyword is used, the temporary entry will be created for the user's individual IP address. Without the host keyword, the user's entire network (or IP subnet) is permitted by the temporary entry.

The timeout keyword specifies the idle timeout, which is how long the connection can remain idle before being terminated. If the access list entry is not used within this period, it is automatically deleted and requires the user to authenticate again. The default is for the entries to remain permanently.

Note: If you configure both idle and absolute timeouts, the idle timeout value must be less than the absolute timeout value.

The router could also be configured to create temporary access list entries automatically. To set up lock-and-key, you configure the VTY lines so that the router automatically issues the access-enable command and then logs the user out. This is accomplished using the autocommand feature, as shown:

RTA(config)#line vty 0 4
RTA(config-line)#autocommand access-enable host timeout 20

By configuring the VTYs with autocommand access-enable, the hole in the firewall is automatically created each time the user authenticates via Telnet. That completes the lock-and-key configuration.
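
Gathered into one place, the lock-and-key pieces from this example would look roughly like the following. Every command is taken from the preceding listings; only the comments and the ordering are added, and any additional statements in access list 101 are omitted.

```
! Dynamic access list applied inbound on the serial interface
RTA(config)#access-list 101 permit tcp any host 192.168.1.1 eq telnet
RTA(config)#access-list 101 dynamic UNLOCK timeout 120 permit ip any any
RTA(config)#int s0
RTA(config-if)#ip access-group 101 in
RTA(config-if)#exit
! Local user database for authentication
RTA(config)#username ernie password bert
! Authenticate on the VTYs, then create the temporary entry automatically
RTA(config)#line vty 0 4
RTA(config-line)#login local
RTA(config-line)#autocommand access-enable host timeout 20
```

The idle timeout (20 minutes) is shorter than the absolute timeout (120 minutes), as the earlier note requires.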

Network administrators have long been faced with a difficult problem -- how to prevent outsiders from connecting at will to inside hosts, while at the same time allowing insiders to connect at will to outside hosts. In other words, how do you allow invited traffic in and keep uninvited traffic out?

Many firewalls (including the Cisco IOS) offer a partial solution to this problem. If the traffic in question uses TCP at Layer 4, the firewall can filter traffic based on the six TCP code bits: URG (Urgent), ACK (Acknowledgment), PSH (Push), RST (Reset), SYN (Synchronization), and FIN (Finish). IP hosts use the TCP code bits to perform the three-way handshake and other connection-oriented communications. The three-way handshake uses the SYN and ACK bits (see the figure).

The first part of the three-way handshake is sent with the SYN bit set to 1, and the ACK and RST bits set to 0. From the second part of the handshake onward, every TCP header in that conversation stream has either the ACK or the RST bit set to 1. Thus, traffic that is invited into your network will always have one of these bits set to 1. Such traffic is considered part of an established connection. Uninvited traffic (the initial packet in a three-way handshake) will have only the SYN bit set to 1.
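The rule described above can be sketched in a few lines of Python. This is only an illustrative model, not IOS code: the function name is_established and the representation of the code bits as a set of names are assumptions made for this example.

```python
# Simplified model of the test described above: a TCP segment is treated
# as part of an established connection if its ACK or RST bit is set.
# The function name and the set-of-names representation are hypothetical.
def is_established(flags):
    """flags: a set of TCP code-bit names, for example {"SYN", "ACK"}."""
    return "ACK" in flags or "RST" in flags


# The three segments of a handshake started by an inside host.
handshake = [
    ("inside -> outside", {"SYN"}),         # part 1: connection request
    ("outside -> inside", {"SYN", "ACK"}),  # part 2: reply
    ("inside -> outside", {"ACK"}),         # part 3: acknowledgment
]

for direction, flags in handshake:
    print(direction, sorted(flags), is_established(flags))
```

Only the first segment fails the test, which is why an outside host that sends a bare SYN is treated as uninvited.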

Using the Cisco IOS, you can configure an extended list to match a packet based on whether it is part of an established connection. The access list will look for an ACK or RST set to 1. If it does not find one, it will not consider the packet part of an established connection, and the packet will not match the statement. The established argument is used with the tcp keyword in an extended list, as shown here:

router(config)#access-list access-list-number permit tcp source-address source-mask destination-address destination-mask established

After configuring this statement, you should then configure an explicit deny or use the implicit deny to filter traffic that is not established. This example shows a possible established configuration.

access-list 101 permit tcp any 192.168.1.0 0.0.0.255 established
access-list 101 permit icmp any any
access-list 101 permit udp any any eq 53
access-list 101 deny ip any 192.168.1.0 0.0.0.255
access-list 101 permit ip any any

Application of the established argument is limited to TCP traffic, which means that UDP, ICMP, and all other IP protocols are not matched by this keyword. In the example above, additional access list statements permit all ICMP traffic, and UDP traffic destined for port 53, into the 192.168.1.0/24 network (and any other network). If these protocols were not permitted, key services such as DNS (UDP 53) could be blocked because they do not match the established keyword. Be aware that the any keywords do not present a secure configuration because many network security breaches occur using UDP, and many denial-of-service attacks occur using ICMP. Of course, you could have configured tighter security for protocols other than TCP, but this would require careful planning and implementation, possibly involving dozens of complex statements. The Cisco IOS offers a new feature called reflexive access lists as a way to permit only invited IP traffic, regardless of whether that traffic uses TCP, UDP, or another Internet protocol. Reflexive access lists are discussed in the next section.

Reflexive access lists provide the capability to filter network traffic at a router, based on IP upper-layer protocol "session" information. Like the established argument, you can use reflexive access lists to permit IP traffic for sessions originating from within your network but to deny IP traffic for sessions originating from outside your network. Unlike the established argument, reflexive access lists can do this with all Internet protocols, not just TCP. This is accomplished by reflexive filtering, a way of dynamically matching incoming traffic with the pattern of outgoing traffic.

Note: Reflexive access lists can be defined with extended named IP access lists only; numbered lists do not support this feature.

Reflexive access lists are an important part of securing a network against hackers because they can prevent most kinds of spoofing and denial-of-service attacks. Reflexive access lists are simple to use, and, compared to basic access lists, they provide greater control over which packets enter your network. The following sections describe how reflexive access lists work, what their limitations are, and how you can configure reflexive access lists.

Like standard and extended access lists, reflexive access lists contain condition statements (that is, entries) that define criteria for permitting IP packets (see the figure). The router evaluates these entries in order, and when a match occurs, no more entries are evaluated. However, reflexive access lists have significant differences from other types of access lists. Reflexive access lists contain only temporary entries. These entries are automatically created when a new IP session begins (for example, with an outbound packet), and the entries are removed when the session ends.

Reflexive access lists are not applied directly to an interface, but they are "nested" within an extended named IP access list that is applied to the interface. Because reflexive lists are nested in extended lists, they do not have the implicit deny ip any any statement at the end of the list.

Compared to using the established argument, reflexive access lists provide a truer form of session filtering. This method is much harder to spoof because more filter criteria must be matched before a packet is permitted through. (For example, source and destination addresses and port numbers are checked, not just ACK and RST bits.) Also, session filtering uses temporary filters that are removed when a session is over. This limits the hacker's attack opportunity to a smaller time window.

A reflexive access list is triggered when a new IP upper-layer session (such as TCP or UDP) is initiated from inside the network, with a packet traveling to the external network. When triggered, the reflexive access list generates a new, temporary entry. This entry will permit traffic to enter your network if the traffic is part of the session, but it will not permit traffic to enter your network if the traffic is not part of the session. For example, if the first packet of a TCP session is forwarded out, a new, temporary reflexive access list entry will be created. This entry is added to the reflexive access list, which applies to inbound traffic. The temporary entry has the following characteristics:

The entry is always a permit entry.
The entry specifies the same protocol (such as TCP) as the original outbound packet.
The entry specifies the same source and destination port numbers (for TCP and UDP only) as the original outbound packet, except that the port numbers are swapped.
For protocols that do not have port numbers, such as ICMP and Internet Group Management Protocol (IGMP), other criteria are specified. For example, for ICMP, type numbers are used instead.
Inbound traffic will be evaluated against the reflexive entry, until the entry expires. If an inbound packet matches the entry, the inbound packet will be forwarded into your network.
The entry will expire (be removed) after the last packet of the session passes through the interface.
If no packets belonging to the session are detected for a configurable length of time (the timeout period), the entry will expire.
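The characteristics listed above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the router's implementation: the Packet type, the reflect and permitted functions, and the sample addresses are hypothetical, and the sketch assumes that the source and destination addresses are swapped along with the port numbers, in keeping with the earlier statement that addresses as well as ports are checked.

```python
from dataclasses import dataclass


# Hypothetical packet summary used only for this illustration.
@dataclass(frozen=True)
class Packet:
    protocol: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def reflect(outbound):
    """Build the temporary permit entry for an outbound packet:
    same protocol, with source and destination swapped."""
    return Packet(outbound.protocol,
                  outbound.dst_ip, outbound.dst_port,
                  outbound.src_ip, outbound.src_port)


def permitted(inbound, reflexive_entries):
    """An inbound packet is forwarded only if it matches an entry."""
    return inbound in reflexive_entries


# An inside host (sample address) opens a web session to an outside host.
first_packet = Packet("tcp", "192.168.1.10", 1025, "203.0.113.5", 80)
entries = {reflect(first_packet)}

reply = Packet("tcp", "203.0.113.5", 80, "192.168.1.10", 1025)
stray = Packet("tcp", "203.0.113.5", 80, "192.168.1.10", 1026)
print(permitted(reply, entries), permitted(stray, entries))
```

The reply matches the temporary entry, while the stray packet, which differs only in its destination port, does not.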

Temporary reflexive access list entries are removed at the end of the session. For TCP sessions, the entry is removed 5 seconds after two set FIN bits are detected, or immediately after matching a TCP packet with the RST bit set. Two set FIN bits in a session indicate that the session is about to end; the 5-second window allows the session to close gracefully. A set RST bit indicates an abrupt session close. Alternatively, the temporary entry is removed after no packets of the session have been detected for a configurable length of time (the timeout period).

Unlike TCP, UDP and other connectionless protocols do not include session tracking information in either the Layer 3 or Layer 4 headers, so the exact end of a session cannot be known. Therefore, the end of a session is considered to be when no packets of the session have been detected for a configurable length of time (the timeout period).
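The removal rules described above can be summarized in a small Python sketch. It is a simplified model, not router code: the function name removal_time and the event format are assumptions made for this example, and the 300-second default stands in for the global timeout value.

```python
FIN_GRACE_SECONDS = 5  # from the text: removed 5 seconds after two FINs


def removal_time(events, idle_timeout=300):
    """Return the time (in seconds) at which a temporary entry is removed.

    events: non-empty list of (time, kind) tuples in time order, where
    kind is "data", "fin", or "rst". For UDP, only "data" events occur.
    """
    if not events:
        raise ValueError("at least one packet is needed to create an entry")
    fins = 0
    last_seen = None
    for time, kind in events:
        # The idle timer may expire before the next packet arrives.
        if last_seen is not None and time - last_seen >= idle_timeout:
            return last_seen + idle_timeout
        last_seen = time
        if kind == "rst":
            return time                      # abrupt close: remove at once
        if kind == "fin":
            fins += 1
            if fins == 2:
                return time + FIN_GRACE_SECONDS  # graceful close
    return last_seen + idle_timeout          # no close seen: idle timeout


print(removal_time([(0, "data"), (10, "fin"), (11, "fin")]))
print(removal_time([(0, "data"), (3, "rst")]))
print(removal_time([(0, "data"), (20, "data")], idle_timeout=30))
```

For connectionless protocols such as UDP, only the last case applies, because there are no FIN or RST bits to signal the end of a session.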

Reflexive access lists do not work with some applications that use port numbers that change during a session. For example, if the port numbers for a return packet are different from the originating packet, the return packet will be denied, even if the packet is actually part of the same session. FTP is an example of an application with changing port numbers. When you initiate an FTP session, that conversation typically uses TCP port 21 to send control information, including the three-way handshake and username/password negotiation.

However, once the control connection is established, data is sent over a separate connection. In active mode, the outside FTP server opens this data connection itself, typically from TCP port 20. The reflexive access list will not let that connection through because it has not seen an inside host send a packet on it. From the reflexive access list's point of view, the stream on port 20 is a new and uninvited conversation. If the FTP client operates in passive mode, however, the client is the first host to send a packet on the data connection. Thus, you must configure your FTP clients for passive FTP so that they will originate the data port transfer, which in turn will create the appropriate reflexive access list entry.
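The difference between the two FTP modes can be sketched as follows. This is an illustrative model only: the function reflexive_entry, the addresses, and the client and passive-mode port numbers (1025, 1026, and 2024) are hypothetical values chosen for the example.

```python
def reflexive_entry(src, sport, dst, dport):
    """Temporary inbound entry created by an outbound TCP packet:
    the source and destination of the outbound packet are swapped."""
    return (dst, dport, src, sport)


CLIENT, SERVER = "192.168.1.10", "203.0.113.21"  # hypothetical addresses

entries = set()

# Control connection: the inside client opens it to server port 21.
entries.add(reflexive_entry(CLIENT, 1025, SERVER, 21))

# Active mode: the outside server opens the data connection from port 20.
# No inside host has sent a packet on this connection, so nothing matches.
active_data = (SERVER, 20, CLIENT, 1026)
active_allowed = active_data in entries

# Passive mode: the inside client opens the data connection first, to a
# port chosen by the server, which creates a second temporary entry.
entries.add(reflexive_entry(CLIENT, 1026, SERVER, 2024))
passive_data = (SERVER, 2024, CLIENT, 1026)
passive_allowed = passive_data in entries

print(active_allowed, passive_allowed)
```

In this model, the active-mode data connection is refused and the passive-mode data connection is allowed, which matches the recommendation to use passive FTP.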

By applying a reflexive list to an external interface (the interface that connects to the outside world), you can prevent IP traffic from entering the internal network, unless the traffic is part of a session already established. The following are the steps in configuring a reflexive access list:

1. Define the extended named access list that will be applied to the outbound interface. Outbound traffic is traffic originating from your local hosts, which is leaving your network to outside destinations. This traffic will be monitored by the router and used to create reflexive access lists. In a sense, this traffic's "reflection" will be allowed to enter your local network as invited traffic. Use the following syntax to create this list:

router(config)# ip access-list extended extended-list-name

2. Configure the extended named access list to include an entry to reflect traffic. The reflect name parameters create the reflexive list and assign it a name. The name will be used later in this configuration. Use the following syntax:

router(config-ext-nacl)# permit ip-protocol any any reflect name [timeout seconds]

3. Apply the outbound list to the outbound interface:

router(config-if)# ip access-group extended-list-name out

4. Define the extended named access list that will filter incoming traffic. This list will include an entry to evaluate incoming traffic (traffic coming in from the outside world) against the reflexive list:

router(config)# ip access-list extended extended-list-name

5. Configure this extended named access list to include the entry to evaluate traffic against the reflexive list's assigned name:

router(config-ext-nacl)# evaluate name

Note that this entry is nested inside the extended list. Other permit or deny statements can be used before matching the evaluate statement. However, if a packet does match a preceding statement, it will not be tested for a match against any of the other entries, including the evaluate statement.

6. Apply the inbound list to the external interface for incoming traffic:

router(config-if)# ip access-group extended-list-name in

7. (Optional) Specify a global timeout value for dynamic reflexive list entries:

router(config)# ip reflexive-list timeout seconds

Reflexive access list entries expire after no packets in the session have been detected for a specified length of time (the timeout period). If you do not specify the timeout for a given reflexive access list, the list will use the global timeout value instead. The global timeout value is 300 seconds, by default, but you can change the global timeout to a different value at any time.

The figure shows a reflexive access list for RTA. To create this list, first define and apply the outbound list that will be reflected. The reflection will generate reflexive access list entries:

RTA(config)#ip access-list extended OUTBOUND
RTA(config-ext-nacl)#permit ip any any reflect INVITED-TRAFFIC
RTA(config-ext-nacl)#exit
RTA(config)#interface serial0
RTA(config-if)#ip access-group OUTBOUND out

The commands shown in the above example create an extended named access list called OUTBOUND. This list includes an entry that creates the reflexive list, INVITED-TRAFFIC. Entries for INVITED-TRAFFIC will be generated dynamically based on a reflection of the outbound traffic flow.

Next, configure an inbound list that will match incoming traffic (traffic coming in from the Internet) to this reflexive list, as shown:

RTA(config)#ip access-list extended INBOUND
RTA(config-ext-nacl)#evaluate INVITED-TRAFFIC
RTA(config-ext-nacl)#exit
RTA(config)#interface serial0
RTA(config-if)#ip access-group INBOUND in

The commands in this example create an extended named access list called INBOUND. This list will be used to match traffic coming in from the Internet. Although you could include other entries, the only one shown here is the evaluate statement, which is a reflexive access list nested inside the list, INBOUND. This evaluate statement instructs the router to permit only traffic that matches the INVITED-TRAFFIC reflexive access list. If desired, you can set a global timeout to something other than the default, as shown:

RTA(config)#ip reflexive-list timeout 200

When configured with a reflexive access list, RTA presents a sophisticated firewall, but still a limited one. None of the access lists discussed so far in this chapter can go beyond Layer 4 to filter traffic based on application. However, the next generation of access lists, context-based access control, can do just that, as you will see in the next section.

Context-based access control (CBAC) is a comprehensive set of security tools that includes stateful packet filtering. CBAC's method of stateful packet filtering goes beyond just Layer 3 and Layer 4 header examination; CBAC actually examines a packet's data content. In the previous section, you saw that reflexive access lists could not effectively handle sophisticated application protocols that change TCP or UDP port numbers during a session. In contrast, CBAC has been specifically designed to recognize popular application protocols, such as FTP, and to accommodate outside hosts that want to continue conversations on another port.

As traffic leaves the protected network, CBAC tracks the "state" of the TCP or UDP connection, which includes port numbers and IP addresses for both the destination and the source. These connection states are kept in a table. When traffic from an outside network tries to enter the protected network, CBAC checks the traffic against the state table to ensure that each packet is part of an invited session. CBAC also looks beyond port numbers and IP addresses to inspect the type of data being exchanged. CBAC examines the payload of a packet to determine what application layer protocol is used. Because CBAC is aware of how certain applications work, it recognizes and permits invited traffic, even if the outside host has responded using a port number that is not yet in the state table. These supported applications include Real Audio and Microsoft's NetShow. Thus, CBAC supports protocols that involve multiple channels, or ports. Most multimedia streaming protocols, as well as some other protocols (such as FTP, RPC, and SQL*Net), use multiple channels.

Note: CBAC is part of the Cisco IOS Firewall feature set and was first available with Release 11.2. A significant number of commands and features were added to CBAC in Release 12.0(5)T. Note that the Firewall feature set is not available for all router platforms.

CBAC is more than just an improved access list command; it is a set of security tools that includes traffic filtering, Java blocking, traffic inspection, alerts and audit trails, and intrusion detection. A comprehensive discussion of how all these features work is beyond the scope of this chapter.

The following sections present an overview of CBAC operation, when and where to configure CBAC, and basic CBAC configuration guidelines. Moreover, these sections will look at CBAC inspection rules, applying rules to an interface, and verifying CBAC operation.

Like reflexive access lists, CBAC creates temporary openings at firewall interfaces. These openings are created when specified traffic exits your internal network through the firewall. The openings allow returning traffic (that would normally be blocked) and additional data channels (TCP ports and UDP ports) to enter your internal network back through the firewall. The traffic is allowed back through the firewall only if it is part of the same session as the original traffic that triggered CBAC when exiting through the firewall.

In the figure, the inbound access lists at S0 and S1 are configured to block Telnet traffic, and there is no outbound access list configured at E0. When the connection request for User 1's Telnet session passes through the firewall, CBAC creates a temporary opening in the inbound access list at S0 to permit returning Telnet traffic for User 1's Telnet session. (If the same access list is applied to both S0 and S1, the same opening would appear at both interfaces.) If necessary, CBAC would also have created a similar opening in an outbound access list at E0 to permit return traffic.

With CBAC, you specify which protocols you want to be inspected. You also specify an interface and direction (in or out) where inspection originates.

For these protocols, packets flowing through the firewall in any direction are inspected, as long as they flow through the interface where inspection is configured. Only the protocols that you explicitly specify will be inspected by CBAC.

Packets entering the firewall are inspected by CBAC only if they first pass the inbound access list at the interface. If a packet is denied by the access list, the packet is simply dropped and is not inspected by CBAC.

CBAC inspects only the control channels of connections; the data channels are monitored for state changes but are not inspected. For example, during FTP sessions, both the control channel (typically port 21) and the data channel (typically port 20) are monitored for state changes, but only the control channel is inspected.

CBAC inspection recognizes application-specific commands in the control channel, and detects and prevents certain application-level attacks. CBAC inspection also helps to protect against attacks such as SYN flooding. A SYN-flood attack occurs when a network attacker floods a server with a barrage of requests for connection and does not complete the connection. The resulting volume of half-open connections can overwhelm the server, causing it to deny service to valid requests. Network attacks that deny access to a network device are called denial-of-service (DoS) attacks.

CBAC inspection helps to protect against DoS attacks in other ways. CBAC inspects packet sequence numbers in TCP connections to see if they are within expected ranges -- CBAC drops any suspicious packets. You can also configure CBAC to drop half-open connections, which require firewall processing and memory resources to maintain. Additionally, CBAC can detect unusually high rates of new connections and issue alert messages.

With UDP, a connectionless service, there are no actual sessions. CBAC approximates sessions by examining the information in the packet and determining whether the packet is similar to other UDP packets (for example, whether it has similar source/destination addresses and port numbers). It then determines whether the packet was detected soon after another similar UDP packet. "Soon" is defined by a configurable UDP idle timeout period.
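The idea of approximating a UDP session can be sketched in Python. This is only a model of the behavior described above, not CBAC itself: the class UdpSessionTable, its method names, the addresses, and the 30-second timeout used in the usage lines are all assumptions made for illustration.

```python
class UdpSessionTable:
    """Tracks when a similar UDP packet was last seen for each flow."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.last_seen = {}  # flow key -> time of the last matching packet

    @staticmethod
    def _key(src, sport, dst, dport):
        # Direction-independent key, so a reply maps to the same flow
        # as the request that triggered it.
        return frozenset([(src, sport), (dst, dport)])

    def outbound(self, now, src, sport, dst, dport):
        """Record a packet leaving the protected network."""
        self.last_seen[self._key(src, sport, dst, dport)] = now

    def inbound_allowed(self, now, src, sport, dst, dport):
        """Allow a packet in only if a similar packet was seen 'soon'."""
        key = self._key(src, sport, dst, dport)
        seen = self.last_seen.get(key)
        if seen is None or now - seen > self.idle_timeout:
            self.last_seen.pop(key, None)  # flow unknown or timed out
            return False
        self.last_seen[key] = now          # refresh the idle timer
        return True


table = UdpSessionTable(idle_timeout=30)  # illustrative timeout value
table.outbound(0, "192.168.1.10", 5000, "203.0.113.53", 53)
print(table.inbound_allowed(2, "203.0.113.53", 53, "192.168.1.10", 5000))
print(table.inbound_allowed(100, "203.0.113.53", 53, "192.168.1.10", 5000))
```

A reply that arrives within the idle timeout is treated as part of the session; one that arrives after the timeout has lapsed is not.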

Warning: CBAC protects against certain types of attacks, but not every type of attack. CBAC should not be considered a perfect, impenetrable defense. Determined, skilled attackers might be able to launch effective attacks. Although there is no such thing as a perfect defense, CBAC detects and prevents most of the popular attacks on your network.

CBAC cannot be used to filter every TCP/IP protocol, but it is an appropriate security solution for networks that are running TCP or UDP applications or certain multimedia applications, such as Microsoft's NetShow, or Real Audio. In many cases, you will configure CBAC in one direction only at a single interface, causing traffic to be permitted back into the internal network only if the traffic is part of an existing session.

You can also configure CBAC in two directions at one or more interfaces. CBAC is configured in two directions when the networks on both sides of the firewall should be protected, such as with extranet or intranet configurations, and to protect against DoS attacks. For example, if the firewall is situated between two partner companies' networks, you might want to restrict traffic in one direction for certain applications, and restrict traffic in the opposite direction for other applications.

So, what protocols does CBAC support? Like reflexive access lists, CBAC can filter all TCP and UDP sessions, without inspecting the application layer protocols. However, CBAC can also be configured to effectively handle the multichannel (multiport) application layer protocols listed in the figure.

CBAC is available only for IP protocol traffic. Only TCP and UDP packets are inspected. (Other IP traffic, such as ICMP, cannot be inspected with CBAC and should be filtered with basic access lists instead.)

To configure CBAC properly, you must be able to decide which interface should receive the appropriate CBAC configuration, as discussed in the next section.

The first step in configuring traffic filtering is to decide whether to configure CBAC on an internal or external interface of the firewall. In this case, "internal" refers to the side where sessions must originate for their traffic to be permitted through the firewall. "External" refers to the side where sessions cannot originate (sessions originating from the external side will be blocked).

If you will be configuring CBAC in two directions, you should configure CBAC in one direction first, using the appropriate internal and external interface designations. When you configure CBAC in the other direction, the interface designations will be swapped.

CBAC is most commonly used with one of two basic network topologies. Determining which of these topologies is most like your own can help you decide whether to configure CBAC on an internal interface or on an external interface.

The figure shows the first network topology. In this simple topology, CBAC is configured for the external interface Serial 0. This prevents specified protocol traffic from entering the firewall router and the internal network, unless the traffic is part of a session initiated from within the internal network.

In the second topology, a demilitarized zone (DMZ) is defined and accessed through interface Ethernet 1. A DMZ is a region where public services such as DNS, mail, and Web can be reached without passing any security filter. In this example, CBAC is configured for the internal interface Ethernet 0, allowing external traffic to access the public services in the DMZ, but preventing specified protocol traffic from entering your internal network -- unless the traffic is part of a session initiated from within the internal network.

The key difference between these two topologies is that the first topology does not allow outside traffic into the router without passing through the filter. The second topology allows outside traffic to enter the router so that it can reach public servers on the DMZ without passing through the filter.

Using these two sample topologies, you can decide whether your network should have CBAC on an internal or external interface.

Note: If your firewall has only two connections, one to the internal network and one to the external network, using all inbound access lists works well because packets are stopped before they get a chance to affect the router itself.

Following are some tips for configuring CBAC on an external interface:

If you have an outbound IP access list at the external interface, the access list can be a standard or an extended access list. This outbound access list should permit traffic that you want to be inspected by CBAC. If traffic is not permitted, it will not be inspected by CBAC and will simply be dropped.
The inbound IP access list at the external interface must be an extended access list. This inbound access list should deny traffic that you want to be inspected by CBAC. (CBAC will create temporary openings in this inbound access list as appropriate to permit only return traffic that is part of a valid, existing session.)

The following are some tips for your access lists when you are configuring CBAC on an internal interface:

If you have an inbound IP access list at the internal interface or an outbound IP access list at an external interface or interfaces, these access lists can be either standard or extended access lists. These access lists should permit traffic that you want to be inspected by CBAC. If traffic is not permitted, it will not be inspected by CBAC but will simply be dropped.
The outbound IP access list at the internal interface and the inbound IP access list at the external interface must be extended access lists. These outbound access lists should deny traffic that you want to be inspected by CBAC. (CBAC will create temporary openings in these outbound access lists as appropriate to permit only return traffic that is part of a valid, existing session.) You do not necessarily need to configure an extended access list at both the outbound internal interface and the inbound external interface, but at least one is necessary to restrict traffic flowing through the firewall into the internal protected network.

CBAC inspection rules specify which application layer protocols of the IP traffic will be inspected by CBAC at an interface. Normally, you define only one inspection rule. The only exception might occur if you want to enable CBAC in two directions, as described earlier in this chapter. For CBAC configured in both directions at a single firewall interface, you should configure two rules, one for each direction.

An inspection rule should specify each desired application layer protocol as well as generic TCP or generic UDP, if desired. The inspection rule consists of a series of statements, each listing a protocol and specifying the same inspection rule name. Inspection rules include options for controlling alert and audit trail messages and for checking IP packet fragmentation. The following sections describe how to define an inspection rule.

Configuring Application Layer Protocol Inspection
To configure CBAC inspection for an application layer protocol (except for RPC and Java), use the following command syntax:

Router(config)#ip inspect name inspection-name protocol [timeout seconds]

The protocol option can be any one of several possible arguments, listed in the figure. Repeat this command for each desired protocol. Use the same inspection-name to create a single inspection rule, as shown:

RTA(config)#ip inspect name FIREWALL http
RTA(config)#ip inspect name FIREWALL ftp
RTA(config)#ip inspect name FIREWALL udp
RTA(config)#interface s0
RTA(config-if)#ip inspect FIREWALL out

These commands create a CBAC inspect list named FIREWALL that is applied to outbound traffic exiting interface S0. RTA will inspect outbound traffic and create dynamic access list entries to allow inbound traffic through the firewall, if it is part of the session started by an internal host.

How is this CBAC configuration different from a reflexive access list? In the above, CBAC is configured to inspect FTP, which was challenging in a reflexive access list. Also, UDP conversations that use multiple ports are fully supported by CBAC, while reflexive access lists cannot handle an IP conversation when the outside host changes ports.

The syntax for configuring a CBAC inspection for Java is as follows:

Router(config)#ip inspect name inspection-name http [java-list access-list] [timeout seconds]

Java applets can represent a security risk because unaware users can download them into your network and then run malicious code behind your firewall. You can configure CBAC to filter Java applets at the firewall, which enables users to download only applets residing within the firewall and trusted applets from outside the firewall.

Java applet filtering distinguishes between trusted and untrusted applets by relying on a list of external sites that you designate as "friendly." If an applet is from a friendly site, the firewall allows the applet through. If the applet is not from a friendly site, the applet is blocked. (Alternatively, you could permit applets from all external sites except for those that you specifically designate as hostile.)

To block Java applets from sites known to be a risk, but to permit all others, you can use a configuration similar to this:

RTA(config)#access-list 24 deny 200.100.50.0 0.0.0.255
RTA(config)#access-list 24 deny 169.199.0.0 0.0.255.255
RTA(config)#access-list 24 permit any
RTA(config)#ip inspect name FIREWALL http java-list 24
RTA(config)#ip inspect name FIREWALL tcp
RTA(config)#interface s0
RTA(config-if)#ip inspect FIREWALL out

If RTA is configured accordingly, it will inspect traffic for Java and match it according to access list 24. Holes will not be opened in the firewall for Java traffic originating from the explicitly defined networks. Of course, the permit any statement makes this configuration extremely vulnerable to the thousands of other sites that may infect your network with malicious Java code. If you want to sacrifice functionality and end-user freedom, you can use an access list to explicitly permit Java code from friendly networks, and deny code from all others. The result will be a secure but highly restrictive configuration.
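The way access list 24 classifies an applet's source address can be modeled with a short Python sketch. It illustrates ordinary wildcard-mask matching with first-match evaluation; the function names wildcard_match and acl_action are hypothetical, and the sketch does not model the Java inspection itself.

```python
import ipaddress

# Access list 24 from the example above, in order.
ACL_24 = [
    ("deny",   "200.100.50.0", "0.0.0.255"),
    ("deny",   "169.199.0.0",  "0.0.255.255"),
    ("permit", "0.0.0.0",      "255.255.255.255"),  # equivalent to "any"
]


def wildcard_match(address, base, wildcard):
    """True if address matches base in every bit the wildcard mask
    marks as significant (a 0 bit in the wildcard means 'must match')."""
    addr = int(ipaddress.IPv4Address(address))
    net = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (addr & care) == (net & care)


def acl_action(source, acl=ACL_24):
    """Return the action of the first matching statement."""
    for action, base, wildcard in acl:
        if wildcard_match(source, base, wildcard):
            return action
    return "deny"  # implicit deny at the end of every access list


for source in ("200.100.50.7", "169.199.12.34", "198.51.100.1"):
    print(source, acl_action(source))
```

Sources in the two listed networks are denied, and every other source falls through to the permit any statement, which is exactly why this configuration is so permissive.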

Note: CBAC does not detect or block encapsulated Java applets. Therefore, Java applets that are wrapped or encapsulated, such as applets in .zip or .jar format, are not blocked at the firewall. CBAC also does not detect or block applets loaded from FTP, gopher, HTTP on a nonstandard port, and certain other applications.

Configuring Generic TCP and UDP Inspection
You can configure TCP and UDP inspection to permit TCP and UDP packets to enter the internal network through the firewall, even if the application layer protocol is not configured to be inspected. However, TCP and UDP inspection do not recognize application-specific commands and therefore might not permit all return packets for an application. This is particularly true if the return packets have a different port number than the previous exiting packet.

Any application layer protocol that is inspected will take precedence over the TCP or UDP packet inspection. For example, if inspection is configured for FTP, all control channel information will be recorded in the state table. In addition, if the control channel information is valid for the state of the FTP session, all FTP traffic will be permitted back through the firewall. The fact that TCP inspection is configured is irrelevant to the FTP state information.

With TCP and UDP inspection, packets entering the network must exactly match the corresponding packet that previously exited the network. The entering packets must have the same source/destination addresses and source/destination port numbers as the exiting packet (but reversed). Otherwise, the entering packets will be blocked at the interface.

With UDP inspection configured, replies will be permitted back in through the firewall only if they are received within a configurable time after the last request was sent out. (This time is configured with the ip inspect udp idle-time command.) To configure CBAC inspection for TCP or UDP packets, use the following commands:

RTA(config)#ip inspect name FIREWALL tcp
RTA(config)#ip inspect name FIREWALL udp

Applying the Inspection Rule to an Interface
After you define an inspection rule, you apply that rule to an interface. Normally, you apply only one inspection rule to one interface. The only exception might occur if you want to enable CBAC in two directions. For CBAC configured in both directions at a single firewall interface, you should apply two rules, one for each direction.

To apply an inspection rule to an interface, use the following syntax:

router(config-if)#ip inspect inspection-name {in | out}

The following example shows the commands needed to configure RTA to use the inspection rule, FIREWALL, on traffic traveling out E0 into a protected network.

RTA(config)#interface ethernet 0
RTA(config-if)#ip inspect FIREWALL out

CBAC uses timeouts and thresholds to determine how long to manage state information for a session, and to determine when to drop sessions that do not become fully established. These timeouts and thresholds apply globally to all sessions. You can use the default timeout and threshold values, or you can change to values more suitable to your security requirements. You should make any changes to the timeout and threshold values before you continue configuring CBAC.

All the available CBAC timeouts and thresholds are listed in the figure, with the corresponding commands and default values. To change a global timeout or threshold listed in the "Timeout or Threshold Value to Change" column, use the global configuration command in the "Command" column.

You cannot get the benefits of access lists or CBAC without paying a price. If you apply an access list to an interface, you force the router to check each packet that passes through it, resulting in increased latency. In some cases, however, you can secure your network without impacting performance. If you want to restrict all traffic to a particular destination, you can configure a static route to null0.

The null0 interface is a software-only interface that functions as a "destination" for discarded information. In a sense, null0 is a garbage bin.

Instead of using an access list to filter traffic destined for network 10.0.0.0/8, you can configure a static route to null0, with the same result:

RTA(config)#ip route 10.0.0.0 255.0.0.0 null0

RTA will install a route to the 10.0.0.0/8 network into its routing table. That route points to null0. So, when RTA receives traffic destined for 10.0.0.0/8, it will perform a table lookup, find the route, and send the packets to null0. The end result? Traffic destined for 10.0.0.0/8 cannot pass through RTA because it is routed to nowhere.
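The lookup described above can be modeled in a few lines. This is a simplified sketch, not RTA's actual forwarding code: the `ROUTES` table, the `serial0` default route, and the `lookup` and `forward` helpers are assumptions added for illustration. Only the 10.0.0.0/8 route to null0 comes from the example.

```python
import ipaddress

# Hypothetical routing table: (destination network, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "null0"),   # from the example above
    (ipaddress.ip_network("0.0.0.0/0"), "serial0"),  # assumed default route
]


def lookup(destination: str):
    """Return the interface of the longest matching prefix, or None."""
    address = ipaddress.ip_address(destination)
    matches = [(net, ifc) for net, ifc in ROUTES if address in net]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    _, interface = max(matches, key=lambda match: match[0].prefixlen)
    return interface


def forward(destination: str) -> str:
    """Describe what happens to a packet sent to the destination."""
    interface = lookup(destination)
    if interface is None or interface == "null0":
        return "dropped"
    return f"forwarded via {interface}"


print(forward("10.2.3.4"))   # matches 10.0.0.0/8, routed to null0
print(forward("192.0.2.1"))  # matches only the default route
```

Note that the decision depends only on the destination address. The source address is never examined, which is why a route to null0 cannot filter on source.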

Static routes to null0 can be used as traffic filters only when you want to completely prohibit a destination network. Filtering based on source address would have to be done using a route map.

The figure compares filtering with an access list to filtering with a route to null0. A route to null0 makes far fewer demands on the router's CPU because no access list statements have to be matched when a packet is received. Essentially, the packet is routed to nowhere.

In this chapter, you have learned about the configuration tasks necessary to deal with managing traffic in an IP environment using access lists. You have learned about the difference between standard and extended access lists and you have seen example configurations of each type of access list. You also discovered how a network can be protected from denial-of-service attacks based on the TCP connection establishment process. Finally, you learned how to use lock-and-key security (dynamic access lists), IP session filtering (reflexive access lists) and CBAC.