Five Kinds of Programmers

I recently had a conversation with one of the long-time programmers on Pirates that got me thinking about how I think about programmers. Over the course of my career I’ve run into several archetypes of professional programmers. I thought it might be interesting to formalize my thinking on the subject, and this is the result.

The Researcher

These programmers are more scientist than engineer. If your organization has a research lab, it is probably stocked with Researchers. Since academia is just one giant lab, it is almost entirely filled with Researchers.

The Researcher loves to find solutions to problems that are poorly understood. They are on the bleeding edge of their technological specialty. If there are no papers out there that explain how to do something they will write one.

One downside of the Researcher is that there are so many interesting problems out there that need solving that they have trouble actually finishing any solution before they move on to the next thing. When you can get these guys to check in some code it’s usually great, but it takes them far longer than it would take other kinds of programmers to actually implement anything. They are also the most likely archetype to suffer from Not Invented Here Syndrome.

The Explorer

Like the Researcher, the Explorer is unafraid of the poorly defined dark corners of technology. The key difference is that when the Explorer delves those depths it is to get things done, not for the joy of the exploration itself.

When you have a really thorny problem that you don’t know how to solve, this is the programmer you give it to. Explorers will dig into unfamiliar code-bases and problem domains with a shocking level of energy. These programmers are by far the quickest learners, and are a great resource for other programmers who are trying to come behind them into new territory.

The downside of Explorers is that their single-minded practicality can make their code a little sloppy. These programmers dedicate a lot more time to putting their current task behind them than they do to writing code they would want to maintain years down the road. This doesn’t mean that the code won’t work, but that if an extra #include or circular dependency will save an hour the Explorer is always tempted to cut that corner.

The Craftsman

The highest quality code in your code-base was probably written by a Craftsman. Your QA department loves Craftsmen. They value the quality of their work above all else.

When a new system just has to work, you give it to a Craftsman. They will do a great job coding it, and then test it until it is perfect. Craftsmen are absolutely the best programmers when it comes to handling exceptional conditions and corner cases. In my experience Craftsmen also excel at writing maintainable code because they know that they’re going to have to come back to it someday.

Unfortunately all that quality comes at a price. The Craftsmen on your team are the slowest programmers you have. When they estimate tasks they generate the most accurate estimates, but also the biggest. (Partly because they always include the debugging time that everyone else hopes won’t be necessary.) Their emphasis on quality and reliability also means that Craftsmen are terrified of unfamiliar parts of the code-base or poorly defined problems.

The Activist

You know that guy on your team who is pushing Test-Driven Development, is constantly refactoring code, and actually uses the names of design patterns? That guy is your Activist. They are the driving force for architectural and process improvements on your team.

Activists want the code quality in your project to be as high as it can be. They give tough code reviews, and even tougher design reviews, but that’s a good thing. Every time someone on the team listens to the Activist, they are improving as a programmer.

On the other hand, their ceaseless pursuit of perfect code hurts the productivity of the Activist. Quick hacks are physically painful to them, even when that is exactly what the situation calls for. Paradoxically, they also often introduce bugs with their refactoring that never would have come up otherwise. (On the plus side, the refactoring makes fixing that bug far easier.)

The Workhorse

In their various ways, all of the programmers above are sacrificing some of their capacity to their particular quirks. Workhorse programmers don’t do that. They are in a single-minded pursuit of adding as much to the system as possible, and as a result end up owning vast chunks of the code-base.

If you were to count lines of code per programmer, the Workhorses would come out ahead. (That’s assuming you don’t count generated code from the Activists.) Sheer output is the domain of this kind of programmer. If you have a few great Workhorses on your team you will be able to do things that other teams only dream of.

The dark flip side of what a great Workhorse can accomplish is that a bad one will do absurd amounts of damage to your code-base. Workhorses don’t have any significant dedication to quality that allows them to avoid doing bad things. Sometimes they make up for this by having enough time to build the system two or three times in the time that a Craftsman would build it once, but that’s always painful. A single bad Workhorse can do enough damage to negate the positive effect of one or two other programmers.

What kind of programmer are you?

You will notice that none of these archetypes are particularly bad or particularly good. There can be good or bad programmers of any archetype. All the teams I’ve ever been on have had a mix of archetypes. For that matter, very few programmers could be assigned to one archetype.

Personally, I think I’m mostly a Workhorse with a little bit of Activist and Explorer mixed in. I am put to shame by the ability of some of the programmers around me to suss out how to do some radical new thing. I’m not hard-core enough about process or code quality to keep up with the Activists on the team. The one way I compete is on quantity, and most of that code is fortunately good enough to not doom any projects I’ve been on up to this point.

What about you? Where would you fit in this taxonomy? Do you recognize any programmers you know?

Pirates Post-partum at ION

At ION I gave a talk on our development process for Pirates. Darius Kazemi has posted a transcript of the talk. It’s also up at the Vault Network. I wonder how much buzz it’s going to get.

I’m giving the same talk at AGDC this year, so if you missed me at ION you can catch it there.

Scaling on a Dime at ION

I spent the week at the ION game conference. The first of my two speaking parts was a panel on scaling your development effort.  Darius live-blogged the thing.

Coding Vacations

A couple of weeks ago I took a week’s vacation from my job as the Producer of Pirates of the Burning Sea to sit at home, in my basement, and write code 12 hours a day for 8 straight days. It was a fantastic experience and I would love to do it again. If you aren’t a programmer that probably sounds crazy to you.

There are two things that made my coding vacation the awesome, relaxing, productive, and fulfilling experience it was. The first is that there is very little drag on writing code on the first few thousand lines of a project. The second is that I haven’t had much of a chance to code at Flying Lab in the past year and a half. Well those and the fact that I genuinely enjoy programming.

When you are at the very beginning of a project you have little to no drag on your efforts. There isn’t a large body of code to keep up and running when you make a new change. Your compile and startup times are incredibly fast. When you have a bug, there are far fewer places it could be. When you’re used to writing in a million-line code base, this is liberating. It’s also very productive, which feels great.

As the Pirates project has gone on, I’ve gradually been moving further from the code.  Way back when it was just me writing all the code (or even just Heidi and me) I had tons of coding tasks, but about the time we added the fourth or fifth programmer the amount of time I could devote to coding during daylight hours dropped to almost nothing. Once we signed with SOE, I picked up all of the management duties for the technical side of that relationship, which made it even worse. I wrote a little code here and there, but it was always late in the evening or on the weekend around all my other duties.

If you’ve been reading my blog for long, it shouldn’t surprise you that I think coding is fun. Ever since we got the TI-99/4A for Christmas 1983, programming has been a hobby of mine. When I was deciding what to study in college, I really couldn’t imagine a major that didn’t involve tons of programming. It’s not work, it’s entertainment.

I assume that every other creative person who truly loves what they do has a similar attitude. I know plenty of artists who draw, sculpt, or paint on the weekends. Many game designers design card or board games that they never expect anyone else to see just for the fun of it. The writers I know can’t seem to stop writing for local newspapers, online outlets, or former employers. There’s no reason to think programming would be any different.

And I’m not alone.  One of my co-workers is just finishing up a coding vacation of his own. He took a week off from programming video games to program a video game. Good for him, I say.  He’s going to return to work more refreshed and relaxed than if he’d run off to some tropical island and it won’t have cost him a dime.  (Ok, maybe not as relaxed, but close.)

How about you?  Ever take a vacation to do more of what you already do at the office?

Lag sucks

One thing I’ve gained through the beta process for Pirates is a healthy contempt for the word “lag”. This word is used in many different ways that have basically nothing to do with each other, and every time I hear it I have to ask, “What do you mean?” Even people who know better often end up using it because they’re repeating what players are saying about their trouble. The problem is that lag is used to describe at least three totally different things:

  1. Latency – Most often this is demonstrated to the player by noticeable command lag (I click Fire and it takes 2 seconds to happen) or rubber banding (I run around a corner and it pops me back where I was a few seconds earlier.) The cause of this latency could be in the server, the network around the servers, the internet, the player’s local network, or even in the client. It just means data either isn’t moving quickly enough, or isn’t being processed in a timely manner.
  2. Poor Client Frame Rate – Regular old crappy client performance. This happens when we’re trying to draw too much for the hardware the client is running on to handle. It could also be caused by doing too many other things on the client CPU and slowing the frame rate down. Frame rate problems are very common on low-end hardware.
  3. Hitching – Inconsistent client frame rate, usually including occasional frames that are half a second or more in length. This is caused by processing something slow in the same thread that’s responsible for drawing. In my experience that is usually loading a file. Sometimes this is made worse by the hardware the client is running on, but if there’s a hitch on one machine it’s usually there on another to some extent. As an added bonus, every time you hitch your camera may also go all wonky.

All that these three things have in common is that they are all Serious Problems We Should Fix Before Launch. They differ in the way you diagnose them, by which programmers are likely to work on the problem, and by what kind of information you need to gather from the players who are experiencing the problem. Until you know what kind of lag you’re dealing with, you’re really working blind.

Lag as Latency is the most painful of the three to deal with.  Chances are you never see these problems on your office network, so they mostly turn up “in the wild.”  The problem is that the wild is really wild.  Because a player’s network hardware can contribute so significantly to network latency, you often end up asking for intimate details about the player’s network topology: traceroutes from them to your  data center, make and model of all of their network equipment, packet traces, and maybe even who their ISP is.  On multiple occasions we’ve even had to procure network equipment that matched what the users had to try to reproduce the problem.

The biggest network latency problems we’ve had on Pirates all had to do with a combination of Network Address Translation (NAT) and game data sent over UDP. Just about everyone runs NAT these days, so these problems could hit anyone. While NAT does a great job of holding its automatic port forwarding open for TCP connections, UDP has no connection for it to track. Every hardware vendor seems to have its own idea of how to set up that forwarding, how long to keep it open, and what traffic to demand from the application to extend that time. The specifics really deserve a post all their own, but we’ve seen network code that works fine on one NAT device not work at all on most of the others. We’ve seen code that keeps the port forwarding alive indefinitely on 80% of hardware reliably stop working after 10 minutes on the remaining 20%. About a year after we fixed that, we found another piece of hardware, fortunately relatively uncommon, that stopped forwarding UDP packets after just a few minutes. This is a problem that just seems to never end, and I fully expect we will still be sorting out network trouble on some new piece of NAT hardware five years after launch.
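One common way to keep a UDP port mapping from going idle is to send a small packet whenever the connection has been quiet for a while. The sketch below shows only that general idea and is not the fix used on Pirates. The 15 second interval, the one-byte payload, and the names `UdpKeepalive` and `make_udp_sender` are all assumptions made for illustration. Since NAT devices differ in what traffic they require, outbound keepalives alone may not be enough on every device.

```python
import socket
import time


class UdpKeepalive:
    """Sends a tiny datagram whenever the socket has been quiet too long.

    Assumption: 15 seconds is shorter than the NAT's idle timeout.
    Real devices vary, so treat the interval as a tunable, not a fact.
    """

    def __init__(self, send, interval_seconds=15.0, clock=time.monotonic):
        self._send = send                  # callable that transmits bytes
        self._interval = interval_seconds
        self._clock = clock                # injectable so the timing can be tested
        self._last_sent = clock()

    def note_traffic_sent(self):
        """Call after sending any real game packet; it resets the quiet timer."""
        self._last_sent = self._clock()

    def tick(self):
        """Call once per frame or timer tick. Returns True if a keepalive went out."""
        now = self._clock()
        if now - self._last_sent >= self._interval:
            self._send(b"\x00")            # hypothetical one-byte keepalive payload
            self._last_sent = now
            return True
        return False


def make_udp_sender(host, port):
    """Build a send callable bound to one UDP socket and one server address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    return lambda payload: sock.sendto(payload, (host, port))
```

In a client this might be wired up as `UdpKeepalive(make_udp_sender(host, port))`, with `tick()` called from the main loop. Taking the clock and the send function as parameters keeps the timing logic testable without a network.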

Slow frame rates are on the opposite end of the spectrum.  Standard performance tools (like profilers) and tools provided by graphics hardware vendors (like NVPerfHud) do a great job on this kind of problem. Finding the cause of a poor frame rate is relatively easy as a result, and all you typically need from the player is a description of where they were and what they were looking at. A screen shot can often do the trick.

Actually fixing a slow frame rate can be a much bigger deal.  If you have to get the art team involved you are going to waste tens or hundreds of hours of somebody’s time redoing artwork.  Fortunately you can see most of these problems coming long before you’ve built all the assets. That’s why it’s so important to be testing on your min spec the whole way through development.

Hitching is a bit more difficult to track down than a steady state frame rate problem.  There is usually some event that causes it, like a new character coming on screen, or a new part of the environment loading.  We’ve also seen hitching from server updates of health information, Lua garbage collection, and external applications that had nothing to do with our game. The profiler does a poor job of collecting information over a time span as short as half a second, so it’s typically useless at finding hitches.  Call graph analysis can help sometimes, but it tends to suffer from a long sample period too.  Your best bet is to log all events that are going on in the game and try to correlate the hitching with a small number of events.  Then you can instrument the code around those events and find the culprit. It’s a little more difficult to figure out hitches that only happen in the wild, but often if they’re happening to one player they’re happening to all players, so running against (semi-)public test servers can demonstrate the problem.
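The log-and-correlate approach can be sketched roughly as follows. This is an illustrative outline, not the tooling used on Pirates. `FrameEventLog`, its method names, and the event names in the usage note are hypothetical, and the 0.5 second threshold simply mirrors the "half a second or more" description of a hitch.

```python
from collections import Counter

HITCH_SECONDS = 0.5  # assumed threshold, from "half a second or more"


class FrameEventLog:
    """Records which events ran during each frame and how long the frame took."""

    def __init__(self):
        self._frames = []          # list of (duration_seconds, [event names])
        self._current_events = []

    def log_event(self, name):
        """Call from game code whenever something notable happens this frame."""
        self._current_events.append(name)

    def end_frame(self, duration_seconds):
        """Call once per frame with the measured frame time."""
        self._frames.append((duration_seconds, self._current_events))
        self._current_events = []

    def hitch_suspects(self):
        """Rank events by the share of their frames that were hitch frames.

        Returns (name, hitch_ratio, hitch_count) tuples, most suspicious first.
        """
        total = Counter()
        in_hitch = Counter()
        for duration, events in self._frames:
            for name in set(events):       # count each event once per frame
                total[name] += 1
                if duration >= HITCH_SECONDS:
                    in_hitch[name] += 1
        ranked = [(name, in_hitch[name] / total[name], in_hitch[name])
                  for name in total if in_hitch[name] > 0]
        return sorted(ranked, key=lambda r: (-r[1], -r[2], r[0]))
```

An event that shows up in every frame, such as a routine health update, ends up with a low ratio. An event that appears almost only in slow frames floats to the top, and that is the code worth instrumenting next.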

Once you figure out what’s causing the hitch, fixing it is generally not that hard. If you can intentionally cause that event to happen hundreds of times a second while you run the profiler you’ll find out where the slow code is.  You may need to time-slice an algorithm, move some work to a background thread, or speed up the work itself.  After a couple of years of tracking every hitch down to the Lua garbage collector we eventually tossed Lua out on its ear and fixed that problem.
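Of those options, time-slicing is the easiest to show in a few lines. The sketch below assumes the slow job can be written as a generator that yields after each small unit of work. `run_time_sliced` and `load_assets` are hypothetical names, and the per-frame budget is an assumed tunable.

```python
import time


def run_time_sliced(work, budget_seconds, clock=time.perf_counter):
    """Advance a generator-based job until this frame's time budget is used up.

    `work` is a generator that yields after each small unit of work.
    Returns True when the job has finished, False if it should resume next frame.
    At least one unit runs per call, so the job always makes progress.
    """
    deadline = clock() + budget_seconds
    for _ in work:
        if clock() >= deadline:
            return False   # out of budget; the generator keeps its place
    return True


def load_assets(names, loaded):
    """Hypothetical slow job, broken into one unit of work per asset."""
    for name in names:
        loaded.append(name)   # stand-in for the real, expensive load
        yield
```

The main loop would call `run_time_sliced(job, 0.002)` once per frame until it returns True, so the cost is spread across many frames instead of landing in one. This only helps when each unit of work is itself short. A single slow file read still needs a background thread or a faster implementation.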

And yet all of these problems with all of their myriad sources, diagnostics, and solutions are all just lag to the player. Almost every time you hear a report of lag you are going to ask the following diagnostic questions:

    • What does the in-game frame rate counter show when you see this?
    • What is your ping time when this happens?
    • Does the whole screen freeze? (In our case I usually ask if the ships are still rocking or if the ocean is still moving.  These are really obvious hitch indicators in Pirates.)
    • Is your character popping around?

The answers to these questions will help you pin down which lag your player has. I’ve also found that there’s a good chance one of your players will be fairly technically savvy and can help you track it down further. In one case we had a player rearrange his network and hook up a laptop above his router in the network. With his packet traces from above and below the router we were able to see exactly what was happening and fix the problem. His name is forever immortalized next to the code fix (and we sent him a nice thank you gift.)
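The four diagnostic questions map fairly directly onto a first-pass triage function. This is a rough sketch only. The field names, the `classify_lag` function, and the thresholds (20 frames per second, 250 ms ping) are assumptions for illustration, and real cutoffs would depend on the game.

```python
from dataclasses import dataclass


@dataclass
class LagReport:
    """Answers to the four diagnostic questions (field names are hypothetical)."""
    frame_rate: float          # in-game frame rate counter, frames per second
    ping_ms: float             # ping time while the problem is happening
    screen_freezes: bool       # does the whole screen stop moving?
    character_pops: bool       # is the character popping around?


def classify_lag(report, min_fps=20.0, max_ping_ms=250.0):
    """Return the kinds of lag a report suggests; thresholds are assumed."""
    kinds = []
    if report.character_pops or report.ping_ms > max_ping_ms:
        kinds.append("latency")
    if report.screen_freezes:
        kinds.append("hitching")
    if report.frame_rate < min_fps:
        kinds.append("poor frame rate")
    return kinds or ["unknown"]
```

A report can match more than one kind, which is why the function returns a list. A player on low-end hardware with a flaky router may well have all three problems at once.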

That’s why lag sucks. It confuses users, customer service, and programmers alike. It’s a pain to diagnose, and often a pain to fix. And you can never really fix it because no matter what you do someone is always going to report that they are still having lag.