Apr 13 2009
Posted by chl @ 10:59am UTC

I wanted to get up on a soap box and talk about servers a bit. Getting disclosure out of the way as it pertains to what I’m going to talk about here:

(This will all make sense if you read to the end, promise…)

So today, I’m going to talk about servers. If you’re working on the web, you interact with servers. Even if you personally don’t deal with them (and the vast majority of you don’t) you still interact with them. They are an indispensable part of the web ecosystem. If you have a web site, then somewhere there is a server, or possibly a collection of servers, that feeds your site out to the Internet when asked. Most of you know this. But, I’d like to point out something that apparently is not properly understood.

Your server doesn’t scale.

Seriously. Your server doesn’t scale. Or, if you have a bigger site, then your servers (plural) don’t scale. They don’t scale ever.

Servers in a classical sense are physical devices. They are made up of hardware components, and each of those components has an ability to do some task at some speed, and that’s it. Nothing anywhere gains any sort of capacity in response to the amount of stuff it’s being asked to do. It’s not scaling, and it’s never going to. This is despite the fact that people wish servers would scale, and as far as I can tell, a lot of people think that they ought to.

You’ll note I hedged on that last paragraph by saying “in a classical sense”. That’s because in this day and age, with the advent of virtualization, a “server” doesn’t always refer to a physical box anymore. The argument from the last paragraph still holds though. If you’re on a virtual server of some sort, there are defined amounts of physical resources assigned to it, and that’s what’s available. It still doesn’t have some inherent ability to adjust to the workload it’s being asked to perform.

A good analogy is with a car. If you go to the dealer and buy a Honda, you get a Honda. If, when you’re evaluating it, you ask the salesperson, “Hey, if I really need to go fast, will this Civic scale into a Formula One Race Car?” you’d get a very funny look. Obviously, that’s not how it works. But for some reason, in server-landia, people have this idea that it can. Yes, it’s true that if you have a really good mechanic (analogous to a systems administrator) and a really good driver (analogous to a programmer) you might be able to get that Honda to go around a track faster than it did off the showroom floor. But, it’s still never going to get entered in a Grand Prix event.

What does scale on the web is architectures. People spend a lot of time on this, myself included. Architectures are about the only thing that you can honestly say scales. Everything else is hyperbole.

“But wait!” you say … “you work at an ISP, and didn’t you guys talk about your (gs) Grid-Service and say it was ready to scale?”

Well, yeah, we did. But the servers aren’t the parts that scale. That platform is a load balanced cluster with lots and lots of servers behind it, and that all “just works” for those customers transparently, as if it were one server. That’s the architecture that we developed, so that for the web servers, those customers have built-in horizontal scaling over many nodes. The servers themselves don’t scale one whit, but the architecture does. Similarly, we developed technology for that platform to move databases around for customers transparently, and to put them into VPS servers with dedicated resources if a particular customer needs a lot of MySQL juice, all without any intervention. So there, we have an architecture that provides some automagic vertical scaling for customer databases.
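
To make the distinction concrete, here is a toy sketch of horizontal scaling. It is not how the (gs) platform is actually built. Every name in it (Server, RoundRobinBalancer, served) and the capacity of 100 requests per node are assumptions made up for illustration. Each server has a fixed capacity that never changes, and the only way total capacity grows is by putting more nodes behind the balancer.

```python
from itertools import cycle


class Server:
    """A fixed pool of resources: handles at most `capacity` requests, never more."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.handled = 0

    def handle(self):
        # The server never grows; once it is full, it refuses work.
        if self.handled >= self.capacity:
            return False
        self.handled += 1
        return True


class RoundRobinBalancer:
    """Hypothetical balancer that rotates requests across its servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rotation = cycle(self.servers)

    def dispatch(self):
        # Try each server at most once, starting from the next one in rotation.
        for _ in range(len(self.servers)):
            if next(self._rotation).handle():
                return True
        return False  # every node is saturated


def served(node_count, requests, capacity=100):
    """Return how many of `requests` get served by `node_count` identical nodes."""
    balancer = RoundRobinBalancer(
        Server(f"web{i}", capacity) for i in range(node_count)
    )
    return sum(balancer.dispatch() for _ in range(requests))
```

Calling served(1, 250) and then served(3, 250) runs the same Server class with the same per-node capacity both times. The only thing that differs is the number of nodes the architecture puts behind the balancer.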

There is a somewhat similar story for the (dv) Dedicated-Virtual servers that we sell at work. As noted, those are servers that are assigned defined amounts of physical resources. The software that does this allows us to very easily change those amounts, and to move the servers around between physical hardware that has sufficient resources to provide the assigned levels of things. Can this be done on-the-fly and automagically? Yes, it can be done. But if you figure out how to make that happen, you just implemented an architecture. And, for the record, if you do develop an architecture that just makes your VPS bigger when needed, you’ve implemented an architecture for vertical scaling. Vertical scaling is neat because you don’t have to change your app, but it will only take you so far, because eventually no one physical machine will be big enough anymore. In any case, the server is still always just a pool of hardware resources for your app.
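
As a rough illustration of what such a vertical scaling policy might look like, here is a minimal sketch. It does not call any real hypervisor or (mt) API. The function name next_allocation, the 80% threshold, and the doubling step are all assumptions picked for the example. The point is the cap: the policy can only ever hand out what one physical host has.

```python
def next_allocation(current_gb, used_gb, host_max_gb, threshold=0.8, step=2):
    """Decide the RAM allocation for the next interval (hypothetical policy).

    If utilization is below `threshold`, keep the current allocation.
    Otherwise multiply it by `step`, but never exceed what the physical
    host can provide.
    """
    if used_gb / current_gb < threshold:
        return current_gb
    return min(current_gb * step, host_max_gb)
```

Once the allocation reaches host_max_gb, the function keeps returning the same number no matter how busy the VPS gets, which is the "it will only take you so far" limit in code form.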

So why am I here preaching about this today? Well, it’s because frankly I’m sort of sick of seeing things like “those servers don’t scale” on Twitter / blog posts / etc. Sure, they’re correct, “those servers don’t scale” because SERVERS DON’T SCALE. And the way that sentiment is generally announced is by somebody piping up and saying “FooBar site is down, and so their ISP sucks!!!”.

Case in point: http://wefollow.com.

Let’s take a quick walk through time and talk about that site. A few months back, Kevin hit me up and said he was going to need some hosting for a personal project. Since I work at (mt) Media Temple, I was able to facilitate that. Since Kevin’s a friend, I offered to help with setup and configuration. For the record, the site is hosted on two (dv) Rage servers. One is a webserver machine, the other is a database machine, both participate in a memcached pool.

I did some basic setup and then pretty much forgot about it for a while. When it was getting close to launch, he hit me back again and asked me to give things a once over, which I did, albeit not very thoroughly as we’ll see. The site was launched during SXSWi this year, and promptly crashed. Naturally, this was immediately twittered about and there was a bunch of “media temple sucks” tossed around. As if, somehow, (mt) was responsible for the site being down.

The site went down right away because I missed something fairly obvious that I really should have caught in the Apache configuration. That was it. When I found it and fixed it, the site popped right back up.

Until of course the next day, when it crashed again. Cue the cries of (mt) sucking from all over. That second spate of downtime was because the programmer for the site, Jeff Hodsdon, missed a database query that should have been getting cached in memcached but wasn’t. That query got hot, MySQL promptly went into a tailspin, and the site went down. It took me a little while to find the bad query, but once I did Jeff fixed it in about five minutes and things were okay again. [For the record, I'm definitely NOT dissing Jeff's coding chops. This was a side project and nobody had time for real QA. He missed one, it happens. If I had been as good at 19 as Jeff is, I'd probably be retired by now. Kid has some serious fu happening.]
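
For anyone who hasn’t run into this failure mode, here is a minimal sketch of the difference a missed cache makes. It is not the wefollow code. A plain Python dict stands in for memcached, and CountingDatabase, fetch_uncached, fetch_cached, and queries_issued are hypothetical names made up for illustration.

```python
class CountingDatabase:
    """Hypothetical stand-in for MySQL that counts how many queries it runs."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    def run_query(self, key):
        self.query_count += 1
        return self.rows[key]


def fetch_uncached(db, key):
    # Every request goes straight to the database.
    return db.run_query(key)


def fetch_cached(db, cache, key):
    # Cache-aside: check the cache first, fall back to the database on a
    # miss, then store the result so later requests skip the database.
    value = cache.get(key)
    if value is None:
        value = db.run_query(key)
        cache[key] = value
    return value


def queries_issued(requests, cached):
    """Serve one hot key `requests` times; return how many queries hit the database."""
    db = CountingDatabase({"hot_key": ["row1", "row2", "row3"]})
    cache = {}  # assumption: a dict standing in for a memcached client
    for _ in range(requests):
        if cached:
            fetch_cached(db, cache, "hot_key")
        else:
            fetch_uncached(db, "hot_key")
    return db.query_count
```

With the cache in place the database sees one query for the hot key instead of one per request. A real memcached setup would also need expiry and invalidation, which this sketch leaves out.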

After that, and I forget exactly when, there were two more brief periods of downtime. Both were because I tried doing some more intricate things with the Varnish configuration and I got it wrong. Since I got that sorted out, the site has been running smooth as teflon, even through the last traffic whompings when Kevin broadcast something about the site to his eighteen-squajillion followers on Twitter.

The key point I’d like to highlight here is this: at no point did the two VPS servers change at all. Not a bit. Zero. Nada. Zilch. At one point during those last two oopses Kevin IM’d me and said something to the effect of “Hey man, I really don’t mind paying for bigger servers.” I’ve been doing this for a long time and given the amount of traffic the site is pushing, I knew that there was enough hardware there already. I asked him for another day to try and get things configured right, and (knock on wood) I did, and all’s been well. (And yes, for the astute readers out there, we have seen traffic on par with what we saw at launch since these last fixes and things have been fine).

But all was fine with the servers the entire time. The servers have always provided the amount of resources that they were advertised to have. When I provisioned them using the internal (mt) provisioning system, I got what I was supposed to get. So basically, the servers have always been doing exactly what they should have, (mt) did exactly what it should have, and the only entity responsible for the site going down is the person who configured things wrong. In this case, that was me. The servers never changed at all. What did change to get things nice and stable was that we fixed the code and configs, which is what currently constitutes the architecture for the site.

Again, just to be clear: we vastly increased the efficiency of the architecture that the site runs on, and that’s why it’s running well now. The servers are exactly the same as they were on day one, when everything died. The servers didn’t scale, because they’re just resource pools, but we figured out how to use the resources they provide properly.

“But what about The Cloud?” you say. Oh boy. Don’t get me started. I’ll save that one for a different post. ;)

7 Responses to “Your Server Doesn’t Scale”

  1. David says:

    Amen brother. I’m going to mash on my gas pedal in traffic on the way home today to see if traffic will suddenly part like I’m Moses at the Red Sea. My car is, after all, freeway-ready-out-of-the-box which means that I should be able to go 65 (ish) on the freeway if I just give it a little gas right?

  2. imran says:

    I noticed those hiccups when wefollow was launched. I don’t hold MT responsible for that; rather, it is Kevin Rose himself who was too hasty to launch the site… At the end of the day, people are more likely to curse the web host whatever the reason behind the site going down… Very well explained.

  3. Very well put! I wish more people understood this principle. Oh and +1 for using a car analogy =)

  4. While I would love to agree with your viewpoint, I can’t. It is the same issue as the “Programming Languages do not scale” rhetoric. Architecture is the way to go when you need to scale properly. However, this one has a twist.

    With MS and VMWARE products, you can add resources to virtualized instances on the fly. Some call it “Dynamic Resource Management”, “hot swappable resources”, etc… The main requirement is that the guest OS must support this. I’m only aware of MS products being aware/built for this situation. When the DB needs more ram and cpu, I can add it on the fly, enabling me to have enough time to fix said issue.

    “• Dynamic resource management: Windows Server virtualization provides the capability to hot add resources such as CPU, memory, networks and storage to the virtual machines with no downtime. Combined with the hot add features of Windows Server “Longhorn”, this enables administrators to manage their hardware resources without impacting their SLA commitments….” — http://download.microsoft.com/download/3/2/2/32212eab-a431-4cd4-8567-cf951b1322de/Virtualization.doc

    MS beat VMWARE to the punch long ago. :\

    VMWARE, at this time, does not allow hot swapping of all pieces of the virtualized hardware. Just network, cpu, and ram. There are ways you can work around the resource scaling of the hard drive (linux-LVM sitting between the physical resources and the hypervisor). I can’t imagine that would be very efficient/powerful.
    http://www.malaysiavm.com/images/vsphere/enable_memory_hot_add.png

    Honestly, you can never prevent a backend developer from doing multiple query joins from WAN-distributed databases just so you can load the landing page ;-) However, you can add resources on the fly when you need to get along.

  5. Chris Lea says:

    @John Menerick

    I understand your point, but I still disagree. Though in fairness, I think we are disagreeing on semantic issues here.

    Yes, there certainly are ways to add physical resources to the pool without downtime. There’s all sorts of ways to do it. People have been able to open up IBM mainframes and add RAM to them without turning anything off forever.

    My point is that the servers themselves don’t just automatically up their own resource levels in response to anything. If you implement some sort of watchdog functionality, where the servers realize they’re getting too overloaded and ask for more resources, or if there’s some other system watching the servers and deciding how many resources they need, then you can do this sort of thing on-the-fly. Albeit vertically in this scenario. But if such a thing is implemented then that is an architecture unto itself. The concept of “the server” is, at any point, the physical resources that are provided to you. So, the thing I was trying to get across is that it’s pointless to say “this server doesn’t scale”. The server either provides you with the physical resources it says it’s going to, or it doesn’t. If you change the amount of available resources, you can claim that you scaled the server, but the server at that point still doesn’t scale. It just provides an adjusted amount of physical resources.

    Thanks for your comment though. It’s good for everybody to be aware of all the facets of this discussion, and I probably didn’t go in depth enough about the points you bring up.
