ServerDir 2.0

As I am putting together the architecture for the new game we’re building at Divide by Zero, I am spending a fairly significant amount of time thinking about where the weak spots in the Pirates architecture were. The servers in Pirates worked out pretty well, but I think I can do better the second time around.  This is the first of N posts describing how I intend to evolve Server Architecture v1 into Server Architecture v2.

By far the biggest scaling problem Pirates ran into right at the start of open beta was the Server Directory (ServerDir) database. This was the direct result of incredible naiveté on my part about how much load a single database could handle. The original design of ServerDir called for every process in every cluster to connect to one shared database and to update its own status in that database every five seconds. When you multiply that update rate by all the instanced zones in the game (plus other miscellaneous servers), you find that the database needs to handle thousands of updates per second from tens of thousands of connections. It turns out that Microsoft SQL Server is not up to the task. (There’s also the little problem that the single shared ServerDir database was a single point of failure for the entire service.)
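To make that load concrete, here is a minimal sketch of what the per-process heartbeat amounted to. The ProcessStatus table, its columns, and the “ServerDir” DSN are made-up names for illustration; the shape is what matters: every process holds its own connection to the one shared database and rewrites its row every five seconds.

```python
# Illustrative sketch of the original ServerDir heartbeat. The table name,
# columns, and DSN are hypothetical; the structure mirrors the design
# described above.
import time

import pyodbc

UPDATE_INTERVAL_SECONDS = 5


def heartbeat_loop(cluster_id: int, process_id: int, status: str = "running") -> None:
    # One connection per process; in aggregate that meant tens of thousands
    # of connections to a single SQL Server instance.
    conn = pyodbc.connect("DSN=ServerDir")
    cursor = conn.cursor()
    while True:
        # Every process does this every five seconds, which is how the one
        # shared database ended up absorbing thousands of updates per second.
        cursor.execute(
            "UPDATE ProcessStatus SET status = ?, last_update = GETDATE() "
            "WHERE cluster_id = ? AND process_id = ?",
            status, cluster_id, process_id,
        )
        conn.commit()
        time.sleep(UPDATE_INTERVAL_SECONDS)
```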

[Figure: Original ServerDir design (Pirates ServerDir on a single DB)]

Once it was obvious that a single ServerDir was not going to work, we expanded the system slightly and split that single database into as many as one database per cluster. This still put quite a bit of load onto each ServerDir DB, but there were now enough of them for SQL Server to keep up. This is the setup Pirates was using when I left Flying Lab in July of 2008.

[Figure: Final ServerDir design (Pirates ServerDir with one DB per cluster)]

Within a cluster, the ServerDir database is used by a process called Big Brother to monitor the health of the cluster. Each physical server machine in the cluster has an instance of Big Brother running on it, and the Big Brothers automatically pick one of their number to be the primary for the cluster. The primary is responsible for deciding which other processes need to be launched, as well as for clearing out the ServerDir entries for processes that have crashed. If you want to read more about the specifics of the ServerDir system, you can find it all in Massively Multiplayer Game Development 2; I wrote an article there on the Pirates architecture years before the game launched, and the architecture really didn’t change much between then and shipping.
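As a rough sketch of the primary’s role (the table and column names are made up, and the “lowest live host name wins” rule here is only a stand-in for illustration, not how Pirates actually picked its primary):

```python
# Illustrative sketch of the primary Big Brother's cleanup duty. The
# election rule and the ProcessStatus table are assumptions made for the
# sake of a self-contained example.
import pyodbc

STALE_AFTER_SECONDS = 30  # assumed threshold: a few missed 5-second updates


def is_primary(my_host: str, live_big_brother_hosts: list[str]) -> bool:
    # Every Big Brother applies the same deterministic rule, so they all
    # agree on which one of them is primary without extra coordination.
    return my_host == min(live_big_brother_hosts)


def clear_crashed_processes(conn: pyodbc.Connection) -> None:
    # The primary's cleanup pass: delete ServerDir rows whose owners have
    # stopped updating, on the assumption that those processes crashed.
    cursor = conn.cursor()
    cursor.execute(
        "DELETE FROM ProcessStatus "
        "WHERE DATEDIFF(second, last_update, GETDATE()) > ?",
        STALE_AFTER_SECONDS,
    )
    conn.commit()
```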

[Figure: ServerDir Inside a Cluster]

ServerDir 2.0

There are several fundamental problems with the original ServerDir that I intend to fix in version 2.0, along with one new requirement it will have to meet:

1. The reliance on a database as the point of synchronization. Databases are not built for this kind of transient data, so they handle it poorly.

2. The way the Big Brothers communicate with each other via UDP (the dashed lines above indicate non-persistent or UDP connections). This pointlessly complicated the protocol between Big Brothers by requiring them to compensate for dropped network packets.

3. A broader architectural change I want to make: promoting “shard” from an operations-level concept to one that lives entirely in game design and UI. That will require far more machines with far more processes per cluster, and ServerDir will need to cope.

4. The old version of Big Brother does a pretty poor job of dealing with hung processes. We had stretches during Beta where hung processes piled up, and the operations staff had to deal with them by restarting clusters regularly and running scripts to kill all the zombies.

What follows is a sketch of my initial design for how to accomplish all this.

[Figure: ServerDir v2.0]

The biggest change here is that individual cluster processes no longer connect to ServerDir directly. Instead they open a persistent connection to their local Big Brother, and Big Brother updates ServerDir on their behalf. Part of this change is that the “every five seconds” updates never go into ServerDir at all. ServerDir is notified of only two events per process: process started and process stopped. All of the “is this process hung” detection is now the job of each individual Big Brother. While a cluster process is up it will send periodic updates to Big Brother, and if none arrive for too long, Big Brother will kill the process and clean up ServerDir.
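Here is a small sketch of the watchdog behavior I have in mind for Big Brother. The callback names and the 30-second timeout are placeholders, not decisions; the point is that heartbeats stay local and only the start/stop transitions ever reach ServerDir.

```python
# Illustrative sketch of a local Big Brother watchdog. Names and timeouts
# are placeholders.
import time

HEARTBEAT_TIMEOUT_SECONDS = 30  # placeholder: several missed periodic updates


class LocalWatchdog:
    def __init__(self, notify_serverdir, kill_process):
        # notify_serverdir(event, process_id) would post "started"/"stopped"
        # to the ServerDir web service; kill_process(process_id) would
        # terminate a hung local process. Both are injected so the sketch
        # stays abstract.
        self._notify = notify_serverdir
        self._kill = kill_process
        self._last_seen = {}  # process_id -> time of last heartbeat

    def on_process_started(self, process_id):
        self._last_seen[process_id] = time.monotonic()
        self._notify("started", process_id)  # one of only two ServerDir events

    def on_heartbeat(self, process_id):
        # Periodic updates stay local; ServerDir never sees them.
        self._last_seen[process_id] = time.monotonic()

    def on_process_stopped(self, process_id):
        self._last_seen.pop(process_id, None)
        self._notify("stopped", process_id)  # the other ServerDir event

    def reap_hung_processes(self):
        # Run on a timer: anything that has gone quiet too long is presumed
        # hung, so Big Brother kills it and cleans up ServerDir on its behalf.
        now = time.monotonic()
        for process_id, seen in list(self._last_seen.items()):
            if now - seen > HEARTBEAT_TIMEOUT_SECONDS:
                self._kill(process_id)
                self.on_process_stopped(process_id)
```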

Another significant change is that the point of synchronization is now a web service rather than a database. Whether there is a database (or multiple databases) backing that web service is entirely invisible to the tools and to the cluster processes. Using a stateless API with no persistent connections also makes scaling the ServerDir resource much easier. With load balancers and some reasonable architecture on the back end, single points of failure and scaling problems within ServerDir itself can be all but eliminated.
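To show the shape I mean, here is a sketch of the two calls, using Flask purely for illustration. Nothing about the framework, the URL scheme, or whether the service ends up RESTful is decided, and the backing-store functions are placeholders for whatever database ends up behind it.

```python
# Illustrative sketch of a stateless ServerDir web service. Framework, URLs,
# and storage are all placeholders.
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/clusters/<cluster_id>/processes/<process_id>", methods=["PUT"])
def process_started(cluster_id, process_id):
    # Big Brother reports "process started" on behalf of a local process.
    info = request.get_json(force=True)  # e.g. {"type": "zone", "host": "..."}
    backing_store_upsert(cluster_id, process_id, info)
    return jsonify(status="ok")


@app.route("/clusters/<cluster_id>/processes/<process_id>", methods=["DELETE"])
def process_stopped(cluster_id, process_id):
    # Big Brother reports "process stopped" (cleanly, or after killing it).
    backing_store_delete(cluster_id, process_id)
    return jsonify(status="ok")


def backing_store_upsert(cluster_id, process_id, info):
    # Placeholder: whatever database (or databases) sits behind the service.
    pass


def backing_store_delete(cluster_id, process_id):
    pass
```

Every request is self-contained, so any instance behind a load balancer can serve it, and the callers never know or care what the storage looks like.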

My next post will go into much greater detail on the new web service and how the Big Brothers and operations tools interact with it. Once I’ve covered the new ServerDir plan I can get into my wacky new ideas for the game servers themselves.

What do you think? See any red flags in my high level sketch?

~Joe


13 Responses to “ServerDir 2.0”

  1. Whaledawg commented on :

    Not to be nitpicky, but you don’t graphically show the difference between your UDP connections and your web service connections.

    Also, what kind of overhead was UDP adding that a web service won’t? If dropped packets were the problem, then why wouldn’t a simple TCP connection solve it?

  2. Joe said on :

    Yeah, I probably could have indicated one of those in a different way. :)

    UDP was adding complexity more than overhead. Because packets could be dropped, the protocol had to include timeouts, acks, and retransmits to guarantee that requests were actually being handled. Simply using a TCP connection with the Pirates setup would certainly have solved that problem.

    The web service is more to address the scaling limitations imposed by the SQL database than to compensate for UDP, though. I want to have hundreds of machines with tens of thousands of processes, and the SQL approach just can’t cope with that much load.

  3. ArmEagle (PotBS) wrote on :

    Argh, I read “web service” near the end and suddenly I got nightmares of this Java web service piece of crap I worked with..

  4. Amol Deshpande wrote on :

    Is the status information being tracked a complex object? If not, could you simply use Windows performance counters, which give you easy, scalable ways to monitor large numbers of hosts?
    (assuming your servers are Windows, which I seem to recall is true)

  5. Amol Deshpande said on :

    oh, I do mean *custom* application counters, not the built-in OS counters.

  6. Joe said on :

    Our servers for Pirates were on Windows. It’s not clear yet whether the servers for DBZ will be Windows or Linux. (Those server OS licensing fees are killer.)

    Do you have a link to any good primers on using performance counters to track this kind of process?

  7. Amol Deshpande commented on :

    The Windows platform SDK has a few samples that are quite complete.
    MSDN link:
    http://msdn.microsoft.com/en-us/library/aa373083(VS.85).aspx

    Unfortunately, I don’t know of any other primers. If one reads the documentation on the architecture and then studies the samples, I think that’s good enough to get going.

    The native Windows API for perf counters is a bit complicated (isn’t that always the case?). The good news is that if you monitor these from a .NET service, it’s a lot simpler.

  8. Matthew Weigel wrote on :

    One big thing you haven’t mentioned is the scale of the new project. Are you working on something similarly sized to PotBS (in terms of server count, server load, that kind of thing)?

    For that kind of scale, most of your design makes sense, but internal web services throw up a red flag for me. Their main advantages are that they’re easy to do in interpreted languages, and they hide complexity between programming groups (or people) who don’t communicate. That makes sense for external customers, or maybe jointly-managed studio/publisher systems, but it seems heavyweight for an internal interface.

    Otherwise, I definitely like the idea of having a ‘gatekeeper’ process between whatever does the real work, and the database.

  9. Joe replied on :

    Obviously we are a long way from knowing how many people will play our game (and thus how many servers we’ll need), but the hope is that it’s bigger than PotBS. The servers will be more uniform than they were on PotBS, but I think that affects the way the servers are broken up internally more than it does the way they are kept up and running.

    An internal web service is definitely not as efficient as something more purpose-built would be. What I like about it is that I can use off the shelf load balancing technologies to make such a service scale. Writing something that scalable from the ground up is not trivial. On the other hand, most of what the web service would do is act as a message queue and process “update this in the DB” type requests. I could probably build the whole thing over a scalable message queue service (like ActiveMQ) and provide a small set of event consumers to handle the DB updates out of messages flying around the MQ.

  10. BugHunter said on :

    Hey Joe, you’re planning on REST-style web services instead of SOAP ones, right? They’re a bit lighter than SOAP (I so hate WSDL nonsense). Also, have you looked at Windows Communication Foundation?

    Just stuff…

  11. Joe said on :

    I don’t know yet. I think it might be a little difficult to map the message-queue API I need onto a RESTful service. I was expecting it to be state-free though since BigBrother could go away at any time.

  12. Adam replied on :

    I totally don’t understand what the problem is you’re trying to solve here.

    On first reading it sounded like you were just trying to track the kinds of things that SNMP is designed to pick up, in which case of course you should be using perf counters (either native ones, or some custom equivalent of your own if you need to; it’s not too tricky). But if that were all you were doing then you’d never consider a DB, because it wouldn’t add anything, and would only take away.

    So I’ve clearly missed the point somewhere :(. Could you explain in a bit more detail what you’re solving?

  13. Joe commented on :

    What I’m trying to solve is two things:
    1) Launching the many processes that make up a cluster and restarting them automatically if they fall down. In the case of Pirates the exact set of processes changed over the course of the server’s lifetime, so this wasn’t a static bit of configuration.
    2) Detecting when a process is hung and killing it. Once it’s dead, the “restarting” part of #1 takes over.

    The problem with SNMP is that it’s generally “per service type” information, where what I’m looking for is “per process” information. Most standard network/system management schemes assume that there’s one instance of each service (or process) running on each machine. With Pirates we had more like 300-500 instances of some process types.
