Riding the Storm

At weather.com, getting back to "the basics" is yielding the performance advantage it needs to build customer loyalty

By Justin Kestelyn 

When Mark Ryan sees a storm approaching on the horizon, he has more to worry about than getting wet or getting caught in a blizzard. As CTO of weather.com, the online counterpart to The Weather Channel, he has to consider whether a virtual flood of visitors will overwhelm his IT infrastructure - and whether they will be able to enter the site, get the information they need, and then leave quickly enough to avoid damaging a carefully cultivated "trust" relationship.

With 14 million unique visitors, weather.com is the largest single-content Web site in the world. It has unique requirements: The content involved is dynamic, usage spikes are unpredictable, and visitors demand instantaneous access to weather information in a personal context, whether they're traveling, golfing, sailing, or just plain weather-watching. Indeed, as Ryan explains, performance and personalization are integral elements of that trust architecture.

Before joining weather.com in October 1999, Ryan served as CTO at eBay Inc., where he learned a thing or two about the role of business-critical infrastructure in earning customer trust. A former IBMer, Ryan also designed and managed the IT infrastructure behind the 1996 Atlanta Olympic Games, the first such games with a strong Web presence. As you'll see, he has strong views on enforcing baseline IT principles, the industry trend toward open source, and the value of personalization in customer retention.

IE: Weather.com is unusual in that your content and usage patterns are both extremely dynamic. Does that fact lead to unique scalability requirements?

Ryan: Yes. Our timing for scale is opposite that of a standard e-business. Most companies scale over a period of weeks and months. We have to scale within several days to some pretty tremendous numbers: from four or five million page views per day to 19 to 22 million page views per day within 24 or 48 hours, with our high periods being the first quarter for winter storms and the third quarter for hurricanes.

In contrast, most e-commerce shops scale during and across a single quarter, so there's more time to plan for increased traffic. If you're Lands' End and you're going into your fourth-quarter Christmas season, you can anticipate your increased rate of usage and then add capacity if appropriate; you don't get into a situation where you have to spike an additional 15 million page views in one day.

IE: How has your infrastructure evolved to meet those requirements?

Ryan: We've had to build an extremely robust, scalable architecture. We started with an approach similar to that of other Internet startups growing at compounded rates: a self-explanatory strategy called "just throw hardware at it." That doesn't mean it's the right hardware, or that it's tuned for the application you're trying to run on it. It just means that you survive another day.

After a couple of years of throwing hardware at our problems, we ended up with a hodgepodge of different systems tied together with "Band-Aid" code. We didn't have the ability to do any proactive tuning. All the production servers, software, and engineering change levels and release levels of the operating systems were different.

IE: Sounds like a major headache. What did you do to address the problem?

Ryan: The only thing we could do was start baselining our environment by running apps only on the optimal platforms, and by making every piece of hardware on which we run those applications identical. We put the base disciplines of IT back in place: Whenever possible, make every piece of hardware identical, and optimize the hardware for the application you're trying to run on it, not the other way around.

In our case, we're running very flat content that can be cached, so we really don't need a Sun E10000. Rather, we use fast, very inexpensive Linux boxes or offload our content to cache boxes across our infrastructure.
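Ryan's flat-content strategy can be sketched in a few lines of Python. The following is an illustration only, not weather.com's code; the function names and the five-minute freshness window are assumptions (the window echoes his later point that a forecast is the same five minutes from now or five minutes ago):

    import time

    CACHE_TTL_SECONDS = 300  # assumed: a forecast stays "the same" for ~5 minutes

    _cache = {}  # location -> (rendered_page, timestamp)

    def render_forecast_page(location):
        """Stand-in for the expensive back-end fetch/render step."""
        return "<html>Forecast for %s ...</html>" % location

    def serve_forecast(location):
        """Serve from the local cache; only hit the back end when stale."""
        page, stamp = _cache.get(location, (None, 0.0))
        if page is None or time.time() - stamp > CACHE_TTL_SECONDS:
            page = render_forecast_page(location)
            _cache[location] = (page, time.time())
        return page

Because every box serves identical cached pages, capacity scales by simply adding more cheap machines.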

IE: What led you in that particular direction?

Ryan: A couple of things. First of all, one of the main criteria for earning customers' trust is to let them log on to your site, get the information they need, and then get off. You want to provide a combination of content they really need, and you want to give them the performance to access that content very quickly. Therefore, we wanted our architecture to scale very well and serve up flat content very quickly.

Second, I believe that the industry - and the Web sites that have to move very quickly - is trending toward open source. Linux and Apache are a very lightweight operating system and Web server, respectively, and as such, they're very fast.

Frankly, this isn't brain surgery. It's all about base IT principles: Put the apps on the platforms on which they run best.

IE: How do you execute that approach at weather.com?

Ryan: I have two hosting facilities, which I try to make nearly identical. I put the applications that are more transactional in nature - the ones that need more robust serving capacity - on the appropriate platform; say, on Unix or something else slightly more robust than Linux. Then I make all of those servers identical so we can tune the application, server, Web server, and the IP stack all at once. System management is easy; we use round-robin or geographical load balancing to optimize capacity.
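The round-robin idea is simple enough to show in a toy Python sketch - an illustration of the concept, not weather.com's actual setup, with hypothetical server names:

    import itertools

    # Because the servers are identical, any of them can take any request,
    # so a simple rotation spreads the load evenly.
    servers = ["web1.example.com", "web2.example.com", "web3.example.com"]
    _rotation = itertools.cycle(servers)

    def next_server():
        """Return the next server in round-robin order."""
        return next(_rotation)

Geographical load balancing refines the same rotation by drawing from the server pool nearest the visitor.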

We've also switched our maps and image serving from a Sun Solaris platform to a bunch of Netfinity Linux boxes running Apache. They're all identical, we tune them all the same, and we manage them all the same. By doing so, we've reduced the time it takes us to serve up images by an order of magnitude.

This horizontal approach lets us scale the end-user experience with some level of consistency. Previously, we were running anywhere from an 18-second page download to a 25-second download during peak season. Now, we're running at about 1.78 seconds very consistently. Even during hurricane season, when we have 15 million page views a day, we still run under two seconds per download.

IE: Did your experiences at eBay and the Atlanta Olympics influence your decision-making here?

Ryan: Any experience you can get in handling mission-critical, time-sensitive situations helps considerably. Having the world watching while you are getting that experience only makes it more interesting. And in both these experiences, the world was indeed watching.

IE: How important are load and stress testing in managing your infrastructure?

Ryan: We just started load and stress testing last year. As with most other companies, it's very tough for us to simulate the volume of load we'll have on these high-stress dates - an environment in which we can get 18 million page-view requests across different portions of our site. Thus, we simulate load using automated tools and then algorithmically project what we believe it will be in practice.
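One simple way to make that projection - offered here as an illustration, not as weather.com's method - is to measure latency at load levels you can simulate, fit a linear model, and extrapolate to the storm-day volume. The data points below are invented for the example:

    # (simulated page views per day, measured seconds per page) - invented numbers
    measured = [
        (1_000_000, 0.9),
        (2_000_000, 1.1),
        (4_000_000, 1.5),
    ]

    def project_latency(target_views):
        """Least-squares fit of latency = a * views + b, then extrapolate."""
        n = len(measured)
        sx = sum(v for v, _ in measured)
        sy = sum(t for _, t in measured)
        sxx = sum(v * v for v, _ in measured)
        sxy = sum(v * t for v, t in measured)
        a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        b = (sy - a * sx) / n
        return a * target_views + b

    print(project_latency(18_000_000))  # projected seconds per page at 18M views/day

Real traffic rarely scales linearly past saturation, which is exactly why Ryan describes the result as an algorithmic projection rather than a measurement.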

IE: What about personalization? To what extent does it play a role in earning customer trust at weather.com?

Ryan: Personalization is an interesting word; everybody defines it differently. Some folks want to know who e-visitors are at the demographic group level. Other people want to know that you're Bob, that you live on the third house on 14th St., that you watch Barnaby Jones reruns at night, and that those facts make you a valuable customer. I'm sure for some businesses, they do. 

However, from our users' viewpoint, we think that the most valuable kind of personalization is at the content level. So, from a personalization perspective, we're very interested in knowing you as a business traveler, as someone who needs pollen count information, or as a golfer or gardener. That's the level of personalization that we "serve" into.

IE: What information do you use as a basis for that personalization, and how do you aggregate and analyze it?

Ryan: We're looking at different approaches for different challenges. For example, one short-term challenge that we have is identifying people quickly enough to do immediate and effective serving of either content or ad inventory. Another main challenge - determining which paths most visitors take to our site - is more of an ongoing, long-term process.

These challenges involve two different sets of analysis, both of which are very expensive to do in real time. If you have a site that has to scale like ours does, predicting the information you'll need - and being able to log and react to that information on the fly - becomes very difficult. The log analysis alone can require more capacity than your entire site uses just to serve up the content.

IE: What about using a clickstream data warehouse?

Ryan: Yes, we have a set of software that we use to analyze clickstreams or, more specifically, to count them. It's great to find out that the clickstream is XYZ, but what does that really mean? It takes time to analyze that information and then determine the action you should take. Consequently, we're investigating more "intelligent" analytic software tools, if I can use that term loosely, that will tell us what the information means, rather than having to manually sift through massive logs of information.
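Counting clickstreams - the part Ryan says his tools already handle - really is a few lines of code; interpreting the counts is the hard part. A minimal Python sketch, assuming a simple "visitor path" log format:

    from collections import Counter

    def count_paths(log_lines):
        """Tally how often each page path appears in the clickstream log."""
        counts = Counter()
        for line in log_lines:
            visitor, path = line.split()[:2]
            counts[path] += 1
        return counts

    sample = [
        "visitor1 /weather/atlanta",
        "visitor2 /weather/atlanta",
        "visitor1 /golf/forecast",
    ]
    print(count_paths(sample).most_common(2))
    # [('/weather/atlanta', 2), ('/golf/forecast', 1)] - but what does it mean?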

IE: It sounds like you have a lot in common with transaction-intensive e-commerce portals.

Ryan: We're in two different businesses, but in theory, mine is morphing into theirs.

Let's look at both approaches. In the e-commerce model, somebody comes to a site, searches a database of available products, puts something in a shopping basket, and goes to checkout. During that process, a bunch of transactions kick off: one to get the credit card validated, one to debit the credit card, one that says, "Go to the warehouse and tell a picker to put these 17 items on the mail dock, and here's the address to put on them," and so on. And then a response comes back to the user that says, "Thank you for shopping with us; here's your total bill." The whole process is transaction-intensive on the back end.

Our model is a little bit different. When you come to weather.com, you're looking at latent data. In theory, I could do a push to the site and cache everything. It doesn't really matter if it's five minutes from now or five minutes ago, because the weather forecast will be the same.

When you come to my site, I need to know why you came there in order to provide better value-added service. In the future - at least, in my vision - we'll do a transaction to figure out who you are and if we want to track your demographic information. Every time you touch something, we'll do a transaction against your account, a database update that says, "Justin came to the site and looked at Bondi Beach today." The next time you come to weather.com, I can serve you up Bondi Beach information right from the start.
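That per-visit transaction is easy to picture in code. The sketch below uses an in-memory SQLite table; the schema and function names are hypothetical illustrations of Ryan's vision, not weather.com's system:

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE views (visitor TEXT, location TEXT)")

    def record_view(visitor, location):
        # The update Ryan describes: "Justin came to the site and
        # looked at Bondi Beach today."
        db.execute("INSERT INTO views (visitor, location) VALUES (?, ?)",
                   (visitor, location))

    def favorite_location(visitor):
        """On the next visit, serve the most-viewed location first."""
        row = db.execute(
            "SELECT location, COUNT(*) FROM views WHERE visitor = ? "
            "GROUP BY location ORDER BY COUNT(*) DESC LIMIT 1",
            (visitor,)).fetchone()
        return row[0] if row else None

    record_view("justin", "Bondi Beach")
    print(favorite_location("justin"))  # -> Bondi Beach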

IE: Do you think that your expanded definition of a transaction could influence back-end transaction-intensive businesses, as you call them?

Ryan: Absolutely. Everybody has to get smart really quickly. How often does Amazon.com update the prices on those millions of products? Perhaps only every other day, or once per week. But in my situation, I want to know who you are immediately so I can serve up the content - fast. I'm going to go into the database, see who you are, and then do an update on that information so that I know where you were on my site. That way, I can better serve you the next time you visit weather.com.
 

IE: Will increasing levels of personalization raise e-visitor expectations even higher?

Ryan: In any business, people demand better services and higher levels of quality over time. To the extent that performance and availability become the ante just to get in the game, fast, high-quality content is crucial. To the extent that personalization or profiling, depending on which term you want to use, helps bring more relevant information to customers more quickly, it will also be a big factor in staying competitive.

It's kind of like comparing Lowe's and The Home Depot. If they're both in town, you go to the one that can offer better inventory, better service, and that helps you check in and out more quickly, right? The same thing is going to apply to e-commerce and content sites: Customers will frequent the ones they can quickly get on, that help them easily locate the products they want, and that have the best prices. In that sense, personalization and availability become not only value-added services, but customer retention strategies in themselves.


WE'RE FROM MARS, YOU'RE FROM VENUS

Mark Ryan on the timeless struggle between IT and business processes
 

A series of opportunities arises in an Internet business for which you need a certain level of business maturity within the IT organization. You need the right portfolio of skill sets, but you also need enough business process inside IT, including change control, problem management, return on investment, and security policies. Unfortunately, you usually won't know about those things unless you've been around for a while.

At weather.com, we've recently gone through a huge transition to bring IT and business processes into more realistic alignment. IT and business managers generally don't communicate as effectively as they should, and even people within IT don't necessarily all speak the same language; the words "Java program" mean different things to different people.

To combine into a "synergistic" force, you have to communicate in a common language so that everybody understands the stakes involved. The IT people and coders have to understand the visual image that the marketing and content people want, and why they want it. It's a matter of transforming your organization by establishing a common "language" and by establishing goals based on revenue, rather than on emotion or what somebody read in a book.
