April 27, 2008

Coding Vacations

Filed under: Engineering — Joe @ 4:09 pm

A couple of weeks ago I took a week’s vacation from my job as the Producer of Pirates of the Burning Sea to sit at home, in my basement, and write code 12 hours a day for 8 straight days. It was a fantastic experience and I would love to do it again. If you aren’t a programmer that probably sounds crazy to you.

There are two things that made my coding vacation the awesome, relaxing, productive, and fulfilling experience it was. The first is that there is very little drag on writing the first few thousand lines of a project. The second is that I haven’t had much of a chance to code at Flying Lab in the past year and a half. Well, those and the fact that I genuinely enjoy programming.

When you are at the very beginning of a project you have little to no drag on your efforts. There isn’t a large body of code to keep up and running when you make a new change. Your compile and startup times are incredibly fast. When you have a bug, there are far fewer places it could be. When you’re used to writing in a million-line code base, this is liberating. It’s also very productive, which feels great.

As the Pirates project has gone on, I’ve gradually been moving further from the code.  Way back when it was just me writing all the code (or even just Heidi and me) I had tons of coding tasks, but about the time we added the fourth or fifth programmer the amount of time I could devote to coding during daylight hours dropped to almost nothing. Once we signed with SOE, I picked up all of the management duties for the technical side of that relationship, which made it even worse. I wrote a little code here and there, but it was always late in the evening or on the weekend around all my other duties.

If you’ve been reading my blog for long, it shouldn’t surprise you that I think coding is fun. Ever since we got the TI-99/4A for Christmas 1983, programming has been a hobby of mine. When I was deciding what to study in college, I really couldn’t imagine a major that didn’t involve tons of programming. It’s not work, it’s entertainment.

I assume that every other creative person who truly loves what they do has a similar attitude. I know plenty of artists who draw, sculpt, or paint on the weekends. Many game designers design card or board games that they never expect anyone else to see just for the fun of it. The writers I know can’t seem to stop writing for local newspapers, online outlets, or former employers. There’s no reason to think programming would be any different.

And I’m not alone.  One of my co-workers is just finishing up a coding vacation of his own. He took a week off from programming video games to program a video game. Good for him, I say.  He’s going to return to work more refreshed and relaxed than if he’d run off to some tropical island and it won’t have cost him a dime.  (Ok, maybe not as relaxed, but close.)

How about you?  Ever take a vacation to do more of what you already do at the office?

December 16, 2007

Lag sucks

Filed under: Engineering — Joe @ 11:01 pm

One thing I’ve gained through the beta process for Pirates is a healthy contempt for the word “lag”. This word is used in many different ways that have basically nothing to do with each other, and every time I hear it I have to ask, “What do you mean?” Even people who know better often end up using it because they’re repeating what players are saying about their trouble.

The problem is that lag is used to describe at least three totally different things:

  1. Latency - Most often this is demonstrated to the player by noticeable command lag (I click Fire and it takes 2 seconds to happen) or rubber banding (I run around a corner and it pops me back where I was a few seconds earlier.) The cause of this latency could be in the server, the network around the servers, the internet, the player’s local network, or even in the client. It just means data either isn’t moving quickly enough, or isn’t being processed in a timely manner.
  2. Poor Client Frame Rate - Regular old crappy client performance. This happens when we’re trying to draw too much for the hardware the client is running on to handle. It could also be caused by doing too many other things on the client CPU and slowing the frame rate down. Frame rate problems are very common on low-end hardware.
  3. Hitching - Inconsistent client frame rate, usually including occasional frames that are half a second or more in length. This is caused by processing something slow in the same thread that’s responsible for drawing. In my experience that is usually loading a file. Sometimes this is made worse by the hardware the client is running on, but usually if there’s a hitch on one machine it’s probably there on another to some extent.  As an added bonus, every time you hitch your camera may also go all wonky.

All that these three things have in common is that they are all Serious Problems We Should Fix Before Launch. They differ in the way you diagnose them, by which programmers are likely to work on the problem, and by what kind of information you need to gather from the players who are experiencing the problem. Until you know what kind of lag you’re dealing with, you’re really working blind.

Lag as Latency is the most painful of the three to deal with.  Chances are you never see these problems on your office network, so they mostly turn up “in the wild.”  The problem is that the wild is really wild.  Because a player’s network hardware can contribute so significantly to network latency, you often end up asking for intimate details about the player’s network topology: traceroutes from them to your  data center, make and model of all of their network equipment, packet traces, and maybe even who their ISP is.  On multiple occasions we’ve even had to procure network equipment that matched what the users had to try to reproduce the problem.

The biggest network latency problems we’ve had on Pirates all had to do with a combination of Network Address Translation (NAT) and game data sent over UDP.  Just about everyone runs NAT these days, so these problems could hit anyone. While NAT does a great job of holding its automatic port forwarding open for TCP connections, there is no connection for UDP.  Every hardware vendor seems to have its own idea of how to set up that forwarding, how long to keep it open, and what traffic to demand from the application to extend that time. The specifics really deserve a post all their own, but we’ve seen network code that works fine on one NAT device not work at all on most of them.  We’ve seen code that keeps the port forwarding alive indefinitely on 80% of hardware reliably stop working after 10 minutes on the remaining 20%.  About a year after we fixed that we found another piece of hardware, fortunately relatively uncommon, that stopped forwarding UDP packets after just a few minutes. This is a problem that just seems to never end, and I fully expect we will still be sorting out network trouble on some new piece of NAT hardware five years after launch.
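One common way to keep a UDP port forwarding alive is to make sure the client never goes quiet for longer than the shortest NAT timeout you have seen. Here is a minimal sketch of that idea in Python. The 15-second interval, the one-byte payload, and the `UdpKeepAlive` class are all assumptions for illustration, not the Pirates network code, and since devices differ on what traffic they demand, a real implementation might also need the server to reply.

```python
import socket
import time

KEEPALIVE_INTERVAL = 15.0    # seconds; assumed value, tune per observed NAT timeouts
KEEPALIVE_PAYLOAD = b"\x00"  # hypothetical one-byte "still here" packet


class UdpKeepAlive:
    """Tracks outbound traffic and sends a keep-alive when the link goes quiet."""

    def __init__(self, sock, server_addr, interval=KEEPALIVE_INTERVAL,
                 clock=time.monotonic):
        self.sock = sock
        self.server_addr = server_addr
        self.interval = interval
        self.clock = clock
        self.last_send = clock()

    def send(self, data):
        """Send game data and remember when we last had outbound traffic."""
        self.sock.sendto(data, self.server_addr)
        self.last_send = self.clock()

    def tick(self):
        """Call once per frame. Returns True if a keep-alive was sent."""
        if self.clock() - self.last_send >= self.interval:
            self.send(KEEPALIVE_PAYLOAD)
            return True
        return False
```

In use, the client would create a `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`, route all game traffic through `send`, and call `tick` from its main loop.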

Slow frame rates are on the opposite end of the spectrum.  Standard performance tools (like profilers) and tools provided by graphics hardware vendors (like NVPerfHud) do a great job on this kind of problem. Finding the cause of a poor frame rate is relatively easy as a result, and all you typically need from the player is a description of where they were and what they were looking at. A screen shot can often do the trick.

Actually fixing a slow frame rate can be a much bigger deal.  If you have to get the art team involved you are going to waste tens or hundreds of hours of somebody’s time redoing artwork.  Fortunately you can see most of these problems coming long before you’ve built all the assets. That’s why it’s so important to be testing on your min spec the whole way through development.

Hitching is a bit more difficult to track down than a steady state frame rate problem.  There is usually some event that causes it, like a new character coming on screen, or a new part of the environment loading.  We’ve also seen hitching from server updates of health information, Lua garbage collection, and external applications that had nothing to do with our game. The profiler does a poor job of collecting information over a time span as short as half a second, so it’s typically useless at finding hitches.  Call graph analysis can help sometimes, but it tends to suffer from a long sample period too.  Your best bet is to log all events that are going on in the game and try to correlate the hitching with a small number of events.  Then you can instrument the code around those events and find the culprit. It’s a little more difficult to figure out hitches that only happen in the wild, but often if they’re happening to one player they’re happening to all players, so running against (semi-)public test servers can demonstrate the problem.
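The "log everything and correlate" approach can be sketched in a few lines of Python. This is only an illustration: the `HitchLogger` class, the event names, and the half-second threshold are assumptions, and a real game would do this in its own engine language.

```python
import time
from collections import Counter

HITCH_THRESHOLD = 0.5  # seconds; frames at least this long count as hitches


class HitchLogger:
    """Records which game events occurred during frames that ran long."""

    def __init__(self, threshold=HITCH_THRESHOLD, clock=time.perf_counter):
        self.threshold = threshold
        self.clock = clock
        self.frame_events = []
        self.frame_start = None
        self.hitch_counts = Counter()
        self.hitches = []  # (frame duration, events seen that frame)

    def begin_frame(self):
        self.frame_start = self.clock()
        self.frame_events = []

    def log_event(self, name):
        """Call from game code, e.g. log_event('character_loaded')."""
        self.frame_events.append(name)

    def end_frame(self):
        duration = self.clock() - self.frame_start
        if duration >= self.threshold:
            self.hitches.append((duration, list(self.frame_events)))
            # Count each event once per hitching frame.
            self.hitch_counts.update(set(self.frame_events))
        return duration

    def suspects(self, n=3):
        """Events that appear in the most hitching frames, most frequent first."""
        return self.hitch_counts.most_common(n)
```

After a play session, `suspects()` gives a short list of events worth instrumenting more closely.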

Once you figure out what’s causing the hitch, fixing it is generally not that hard. If you can intentionally cause that event to happen hundreds of times a second while you run the profiler you’ll find out where the slow code is.  You may need to time-slice an algorithm, move some work to a background thread, or speed up the work itself.  After a couple of years of tracking every hitch down to the Lua garbage collector we eventually tossed Lua out on its ear and fixed that problem.
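Time-slicing is the least obvious of those three fixes, so here is a small Python sketch of it. The `time_sliced` helper and its parameters are hypothetical. The idea is simply to do work until a per-frame budget runs out, then hand control back to the frame loop and resume on the next frame.

```python
import time


def time_sliced(work_items, process, budget_seconds, clock=time.perf_counter):
    """Generator that processes items until the per-frame budget is spent.

    Each next() call runs one slice. The generator finishes when all work
    is done. At least one item is processed per slice.
    """
    items = iter(work_items)
    while True:
        slice_start = clock()
        for item in items:
            process(item)
            if clock() - slice_start >= budget_seconds:
                break  # out of budget for this frame
        else:
            return  # iterator exhausted: all work finished
        yield  # hand control back to the frame loop
```

A frame loop would keep the generator around and call `next(job, None)` once per frame until it is exhausted.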

    And yet all of these problems with all of their myriad sources, diagnostics, and solutions are all just lag to the player. Almost every time you hear a report of lag you are going to ask the following diagnostic questions:

    • What does the in-game frame rate counter show when you see this?
    • What is your ping time when this happens?
    • Does the whole screen freeze? (In our case I usually ask if the ships are still rocking or if the ocean is still moving.  These are really obvious hitch indicators in Pirates.)
    • Is your character popping around?

    The answers to these questions will help you pin down which lag your player has. I’ve also found that there’s a good chance one of your players will be fairly technically savvy and can help you track it down further.  In one case we had a player rearrange his network and hook up a laptop above his router in the network.  With his packet traces from above and below the router we were able to see exactly what was happening and fix the problem.  His name is forever immortalized next to the code fix (and we sent him a nice thank you gift.)
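The four questions map onto the three kinds of lag fairly mechanically, which the following Python sketch tries to show. The thresholds and the `LagReport` and `classify_lag` names are illustrative assumptions, not values anyone on Pirates used.

```python
from dataclasses import dataclass


@dataclass
class LagReport:
    frame_rate: float     # in-game frame rate counter reading
    ping_ms: float        # ping time while the problem occurs
    screen_freezes: bool  # does the whole screen stop (ocean, rocking ships)?
    character_pops: bool  # does the character snap back to an old position?


# Thresholds are illustrative assumptions only.
LOW_FPS = 20.0
HIGH_PING_MS = 300.0


def classify_lag(report):
    """Return the kinds of lag a report suggests; a player can have several."""
    kinds = []
    if report.character_pops or report.ping_ms >= HIGH_PING_MS:
        kinds.append("latency")
    if report.screen_freezes:
        kinds.append("hitching")
    if report.frame_rate < LOW_FPS:
        kinds.append("poor frame rate")
    return kinds or ["unknown"]
```

A customer service tool could run something like this over a support ticket to route it to the right programmers.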

    That’s why lag sucks.  It confuses users, customer service, and programmers alike. It’s a pain to diagnose, and often a pain to fix. And you can never really fix it because no matter what you do someone is always going to report that they are still having lag.

    November 24, 2007

    How to make Microsoft SQL Server cry like a baby

    Filed under: Day Job, Engineering — Joe @ 3:46 pm

    Earlier this year we switched from MySQL to MS SQL Server. I don’t regret the switch at all; MS SQL Server has been far more stable than MySQL was, and has lots of whizzy new features. The MySQL client library was dropping connections under load and then crashing when it reconnected. That is what pushed us to switch in the first place. Well it turns out that MS SQL Server has some scaling problems of its own. It doesn’t crash, but it does get so slow as to be non-functional. This is a helpful guide that will help you make your own installation of SQL Server whimper.

    Our server boxes are 8-way 2.6GHz Xeons with 16GB RAM running Windows Server 2003 64-bit and SQL Server Enterprise Edition 64-bit. If your configuration is different your mileage may vary.

    Technique #1

    We are using a system called the Flogger to record gameplay events into a database. To make this happen, all server processes connect to one central DB and call a stored procedure per event. This works fine when the number of processes is low, as in under 500. When the load on a world instance grows the number of processes connecting to the flogger DB increases to 1200.

    Exactly how long seems to vary from a few hours to a few days, but after a while at this load SQL server decides that it has had enough and stops accepting new connections. New processes starting up time out eventually and things generally start going badly on the servers. Once SQL Server starts timing out connections the only way we’ve found to get the database running again is to restart the SQL Server service. While it’s in this state the server is only using moderate server resources.

    The way we’re working around this problem is to use files as a buffer between the server processes and the database. Every so often (depending on activity) each process will dump the events it wants to record out to file. Some time later (well under a second when there’s no load, but potentially longer on a well loaded cluster) another process that maintains a connection to the flog database reads the file, dumps it to the database, and then deletes the file. This eliminates the need for the game servers to connect to this database at all, so if it decides to go out to lunch the game is unaffected. It also makes the data collection more reliable by putting any backlog into one directory full of files instead of in memory on 1500 different processes spread across five server machines.
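The file-as-buffer pattern is easy to sketch. The Python below is a stand-in under several assumptions: SQLite plays the role of SQL Server, a plain INSERT replaces the stored procedure, and the file format, table layout, and function names are made up for the example.

```python
import json
import os
import sqlite3
import uuid


def dump_events(spool_dir, events):
    """Game-server side: write a batch of events to a file, never touching the DB."""
    final_path = os.path.join(spool_dir, f"{uuid.uuid4().hex}.flog")
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(events, f)
    # Rename last so the loader never sees a half-written file.
    os.replace(tmp_path, final_path)
    return final_path


def load_spool(spool_dir, conn):
    """Loader side: insert every finished file into the DB, then delete it."""
    loaded = 0
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".flog"):
            continue  # skip in-progress .tmp files
        path = os.path.join(spool_dir, name)
        with open(path, encoding="utf-8") as f:
            events = json.load(f)
        with conn:  # one transaction per file; committed before the delete
            conn.executemany(
                "INSERT INTO events (kind, detail) VALUES (?, ?)",
                [(e["kind"], e["detail"]) for e in events],
            )
        os.remove(path)
        loaded += len(events)
    return loaded
```

If the database goes out to lunch, `load_spool` fails and the files simply pile up in the directory until it comes back. The game servers never notice.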

    Technique #2

    We have another database exhibiting similar problems, though not quite as severely. Each process in a game cluster connects to a shared database called Serverdir and uses the DB to report its status back to operations tools and the “keep everything running” processes. This data is strictly temporary and probably doesn’t belong in a database at all, but Horrible Design Flaws That Are All My Fault aside, it’s just not that many queries and they’re all very simple selects and updates. This shouldn’t be a problem for server hardware as beefy as ours.

    That argument doesn’t convince SQL Server, however. After a few days SQL Server pauses for a few minutes. The CPU goes to 0% and no queries return for the entire time it’s paused. Our code responds to that by closing things down because it can’t currently tell the difference between “Query takes over a minute” and “Crashed process.” At that point half the cluster shuts down.

    We don’t have a great workaround for this one yet. We’ve been steadily reducing the load on the Serverdir database, but it doesn’t seem to take all that much load to make it happen. Our best bet is to make the code smarter and have it detect these situations. If it just sits tight for a few minutes everything will return to normal without needing to restart anything. Fortunately it only happens a couple times a week so while it’s something we definitely need to fix before launch it isn’t impacting beta testers’ ability to play.
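"Sitting tight" could look something like the following Python sketch: instead of shutting down on the first query timeout, the process tolerates a stall for a grace period. The five-minute figure and the `DatabaseWatchdog` class are assumptions for illustration, not a fix that was actually shipped.

```python
import time

GRACE_PERIOD = 300.0  # seconds; assumed tolerance for a database pause


class DatabaseWatchdog:
    """Separates 'database is pausing' from 'database is gone'."""

    def __init__(self, grace_period=GRACE_PERIOD, clock=time.monotonic):
        self.grace_period = grace_period
        self.clock = clock
        self.stalled_since = None  # time of the first timeout in this stall

    def record_success(self):
        """Any query that returns ends the current stall."""
        self.stalled_since = None

    def record_timeout(self):
        """Returns 'wait' while inside the grace period, 'shutdown' after it."""
        now = self.clock()
        if self.stalled_since is None:
            self.stalled_since = now
        if now - self.stalled_since < self.grace_period:
            return "wait"
        return "shutdown"
```

The calling code would retry the query on "wait" and only begin closing things down on "shutdown".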

    Making an MMO scale is a pain

    None of the profiling tools we’re using at the SQL Server or OS levels are much help with either of these problems. Nothing tells us why SQL Server is refusing connections, or why it’s refusing to work on queries. Most database books and websites think that a slow query is one that takes longer than a minute or two, but in our world that’s a dead process and a disappointed customer.

    We have made great strides in scalability since the first stress test, but no matter how many things you fix there is always one more waiting to bite you on the ass. *sigh*  We’ll get it figured out and apart from these DB troubles everything is staying up quite well at this point. We have 43 more days until the pre-order head start, so there’s still plenty of time to get through this round of problems. Then we break through into the infinite!

    My fix for the flogger scale problem is now ready for a code review, so I’m going home to play Rock Band.

    November 3, 2007

    Scripting for Designers

    Filed under: Engineering, Game Design — Joe @ 10:38 am

    I started a kerfuffle on the subject of designers writing scripts. Since my original post was more about our experience with Lua than about scripting for designers I thought I would collect what I’ve already written in everyone else’s comment thread in one place.

    Raph believes that designers should know how to write scripts. I agree completely. Games are more about algorithms than they are about art, sound, or databases, and knowing how to code at some level is going to help any system designer immensely. It will allow them to communicate with programmers more effectively, it will make their designs fit better within existing game or technical systems, and it will improve the quality of their designs overall.

    Where I draw the line, however, is at actually shipping those designer-written scripts with the game. They are a fine prototyping mechanism, incredibly useful at creating gobs of data, and a brilliant simulation mechanism. Designer scripts are also often slower, more obtuse, and less maintainable than the equivalent script (or code) written by a professional programmer.

    Does that mean I think designers have some mental deficiency that makes them write crappy code? Of course not. While there are some basic concepts of programming that require a certain talent to grok (pointers, branches, order of algorithms) by and large most scripting designers have that talent. What they lack is the experience required to write code that you can keep running for years on end. Programmers spend all day, every day on the subject of how to quickly write maintainable code that runs well. For designers, it’s at best a sideline. We put our programmers through a hard-core technical interview to try to determine if we want to put up with their code. Any designer who can pass that interview is welcome to write production code in my book.

    A much better approach is to provide a rich mechanism for driving game logic with data and give designers reasonable tools to manipulate that data. That doesn’t mean designers are reduced to inputting tables of numbers. The data-driven systems we use in Pirates allow designers to add entire new game systems by combining existing building blocks. We also work closely with the designers to implement new blocks for them on a regular basis.

    Damion mentioned that schedule constraints often lead to programmers changing their tune when it comes to designers writing scripts. Tight schedules are why we integrated Lua in the first place. I thought it would let us take advantage of the people in the office who were less overloaded to write some of the game. My current position on designer scripting is a direct result of that Lua integration.

    One thing I discounted in the “let’s get some designers to write some scripts” approach was how valuable the designer’s time is. In most cases it’s easier to build a new system using our data-driven system than it would have been to implement the same system in Lua. When using data isn’t easier, a day or two of a programmer’s time can usually make it so. Our system design team is even more critically understaffed than our programming team, and by using data instead of code we can save them time.

    Just about everyone has said, “It depends on your situation.” It certainly does. If you have a team of 5 and your lead designer is also your junior programmer, you would probably be well served to have that designer writing production code. In a more general case with more specialization among your staff, it’s a bad idea to plan on all your design hires having that level of programming ability. And if you reject all designers who don’t meet some minimum programming skill level you may find it hard to hire designers.

    All in all, the Great Designer Script Debate of ‘07 has been great. It’s nice to take a break from whining about how many users Second Life doesn’t have or how raid content in WoW is the best/worst thing to ever happen to MMOs. Who’s going to kick off the next kerfuffle?

    October 31, 2007

    Scripting in PotBS

    Filed under: Day Job, Engineering — Joe @ 4:54 pm

    The sweng-gamedev list is all a-flutter with a debate about the merits of scripting in games.  I wrote up a response that describes our experiences and figured I’d share it here too.

    We had Lua integrated into the client and wrote much of our UI logic in it. We struggled with bugs in our glue layer, difficulty debugging, and major spikes in our frame times whenever the garbage collector ran. Of course the glue code was terrible to begin with and we were exporting script hooks for much of the game instead of a nice clean interface, so that didn’t help. After a while we started moving away from Lua and began implementing new UI in C++. We’ve now removed all the Lua from the game.

    On the gameplay side we use a rich data-driven system that lets designers define an arbitrary list of “requirements” with which they are able to test most any condition. When a trigger fires, an object is used, or a skill is activated, an arbitrary list of “results” is activated which is capable of modifying just about any state in the game.  The designers also have a few ways of maintaining persistent state on the characters depending on the circumstances.  This system is working pretty well for us and eliminates the need for any designer-written scripts.
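To make the requirements/results idea concrete, here is a toy version in Python. Everything in it is hypothetical: the block names, the state dictionary, and the trigger format are invented for the example and are much simpler than what a shipping game would need.

```python
# Building blocks a programmer writes once. All names are hypothetical.
def _min_level(state, level):
    return state["level"] >= level


def _has_item(state, item):
    return item in state["inventory"]


def _give_item(state, item):
    state["inventory"].append(item)


def _add_gold(state, amount):
    state["gold"] += amount


REQUIREMENTS = {"min_level": _min_level, "has_item": _has_item}
RESULTS = {"give_item": _give_item, "add_gold": _add_gold}


def fire_trigger(trigger, state):
    """Apply every result if all requirements pass. Returns True if it fired."""
    for name, arg in trigger["requirements"]:
        if not REQUIREMENTS[name](state, arg):
            return False
    for name, arg in trigger["results"]:
        RESULTS[name](state, arg)
    return True


# What a designer authors: pure data that could be loaded from a file.
open_chest = {
    "requirements": [("min_level", 5), ("has_item", "rusty key")],
    "results": [("give_item", "cutlass"), ("add_gold", 100)],
}
```

The designer only ever edits data like `open_chest`. When they need a condition or effect that doesn't exist yet, a programmer adds one more entry to the tables.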

    If I ever integrate scripting into an engine again, there are several things I’ll do differently to make it go more smoothly:

    1. No designer scripting.  If designers are writing scripts, You’re Doing It Wrong. Scripts are code, and need to be just as maintainable as all your other code.
    2. A much cleaner API layer between the C++ code and the script code.  Exporting the whole game to Lua was just dumb.
    3. A built-in debugger. Printf-style debugging is so incredibly painful when you’re used to having a rich source-level debugger.
    4. Built-in profiling. All calls across the native/script interface should be timed and memory consumption should be strictly monitored.
    5. Dynamic script loading. Part was stupid glue and part was just our poor use of the scripts, but the first time around we ended up loading all the scripts when the client booted and couldn’t reload most of them. This is one of the major advantages of scripting and we were missing out on it.
    6. Much more evaluation time. We know a bunch of things to look for the next time around including slow garbage collection, object lifecycle issues, memory corruption in the glue, testability of the scripts in isolation, etc.
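As an illustration of item 4, timing every call across the native/script boundary can be done with a thin wrapper around each exported function. The Python sketch below covers only the timing half (not memory), and `script_export`, `boundary_stats`, and `get_ship_health` are hypothetical names.

```python
import functools
import time
from collections import defaultdict

# function name -> [call count, total seconds spent]
boundary_stats = defaultdict(lambda: [0, 0.0])


def script_export(func):
    """Mark a native function as callable from script and time every call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            stats = boundary_stats[func.__name__]
            stats[0] += 1
            stats[1] += time.perf_counter() - start

    return wrapper


@script_export
def get_ship_health(ship_id):  # hypothetical exported function
    return {"sloop": 80}.get(ship_id, 0)
```

Because every exported function has to go through the decorator, it also doubles as the list of what the script API actually is, which helps with item 2.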

    On the other hand, I think writing servers in a higher-level-than-C++ language like C# or Java makes a lot of sense and would save us tons of development time. It’s the dynamically typed language with no debugger that didn’t work well for us.

    September 24, 2007

    Running the PotBS Servers

    Filed under: Day Job, Engineering — Joe @ 10:02 pm

    Brendan Walker, one of the engineers on Pirates, has posted a devlog on the server operations tools we’re using in our beta (and soon in our launch.) The stuff he’s put together is pretty sweet and is really helping us stay on top of the beta.

    September 23, 2007

    GWT FTW

    Filed under: Engineering — Joe @ 10:23 am

    Joel Spolsky posted an essay the other day drawing parallels between the optimization-in-assembly obsessed application developers of the late 80s and the optimization-of-download-and-compile-time obsessed web application developers of today. He proposes that someday soon someone (perhaps a “bratty Y combinator startup”) will come up with NewSDK, a rich SDK and language that does all sorts of fancy AJAXy things. Because they don’t have to convince anyone it’s a good idea, the brats will pay far more attention to functionality in NewSDK than they pay to performance, so when it first comes out the performance will be terrible. Existing AJAX apps (like everything at Google) will ignore NewSDK because of the performance problems, completely forgetting about Moore’s Law:

     But then, while you’re sitting on your googlechair in the googleplex sipping googleccinos and feeling smuggy smug smug smug, new versions of the browsers come out that support cached, compiled JavaScript. And suddenly NewSDK is really fast. And Paul Graham gives them another 6000 boxes of instant noodles to eat, so they stay in business another three years perfecting things.

    Once improved JavaScript support in browsers, more bandwidth, and faster computers do for AJAX what better compilers, more RAM, and faster computers did for PC applications, NewSDK will suddenly take off. All NewSDK apps will interoperate nicely, have a rich user experience, and work better across browsers. The market share of AJAX applications that don’t use NewSDK (or at least interoperate with it) will start to fall and the brats will take over.

    But Joel’s essay overlooks one important fact:  Google is already developing NewSDK.  The Google Web Toolkit is an SDK that allows a developer to build their entire application in Java and cross-compile it to browser-specific JavaScript for use on real clients.  The toolkit gets better every month, and with version 1.4 they aren’t even calling it “beta” anymore. (Does that mean it’s no longer Web 2.0?) Short of a few highly suspicious meteor strikes, Google isn’t going to go away before GWT has a chance to graduate from “Wow, that’s neat” to “saved us a year.”

    So far adoption of GWT has been fairly small. Three Rings is using it for Whirled. They use GWT to wrap the flash applet that does the heavy graphical lifting. They are able to call back and forth between the two so they appear to the user as one application. As the SDK improves over time, more people are bound to try it out, and it only takes one successful application for it to really take off.

    All this kind of makes me wish I had a reason to write a major web app. As long as you survive, it’s good to be an early adopter of the Next Big SDK.

    September 9, 2007

    Slides from my FLogger talk

    Filed under: Engineering, Game Industry — Joe @ 1:06 pm

    I think the talk went pretty well. If you attended the lecture, I’d love to hear what you thought. I’m trying to do a better job each time I do this, so getting real feedback is important to that process.

    My slides are here. I’m not sure how much good they’ll be, but I’m hoping to record some audio to go with them and put them up on slideshare. We’ll see if that actually happens. :)

    April 11, 2007

    How to anger your customers

    Filed under: Day Job, Engineering — Joe @ 8:21 pm

    A few days ago Ian Landsman posted on the subject of distinguishing between features you don’t have because they are inappropriate for your product and features you don’t have out of sheer arrogance. We recently ran into a case of the latter that has cost us many man-hours to deal with. That feature is label support in Perforce.

    But Perforce has labels, you might say. Well they have labels, but they don’t have labels that you can use like you would in any other version control tool. If you try, you will end up with a dog-slow Perforce repository and a bunch of unhappy employees. Perforce supports labels, but they simply don’t scale to real-project sizes. We found label support in Perforce and, assuming that it worked the way it does everywhere else, set up the daily build script to operate something like this:

    1. Get the latest version of all the files out of perforce
    2. Build the game
    3. Label the version of all the files that were synched at step 1 with the version number of the build

    This is the same build process I’ve used in RCS, Starteam, ClearCase, and Visual Source Safe. It’s just what you do. You can use that label to sync a very specific version of all the files out of the repository, and we often do. Labels are central enough to the Perforce experience that there’s an entire menu dedicated to them on the top of their UI. The only clue we had that labels might not be the best choice in Perforce was this note from the label documentation:

    Before you assume that a label is required, consider whether simply referring to a changelist number might fulfill your requirements.

    Well fast forward four and a half years and run a build every work day. Over that time our Perforce repository accumulated over 1100 labels. That’s certainly a lot of labels, but since almost all of them were sitting inert in the database, it didn’t seem like a problem. But over the past 4-6 months our Perforce performance has really started to bog down. As of last week, simple operations were taking 5 minutes or more. And by “simple” I mean refreshing the UI.

    We tried upgrading our Perforce server first. It was 4 years old, so that seemed like a reasonable explanation, but it actually made the problem worse. When we finally called Perforce support to find out what was going on, they were shocked that we had the gall to use labels. After all, they only included them in the product to ease the transition from other version control systems, not because anyone was supposed to use them. At the end of the support call, after rolling back to an old server version and occupying one programmer and one IT guy for an entire day, they sent an email that included this gem:

    You will see better performance even before you delete the labels — using changes instead of labels is far less intensive in terms of resources. Also, they may grumble about it now, but I suspect your developers will be much happier when they realize that they don’t have to create and maintain labels any more — the change number system is automatic.

    Excuse me? I don’t “have to” create and maintain labels now. I type “build” and the labels just happen. The level of “our way is better, and we know what’s best for you” in this email is astounding. It mirrors earlier comments made over the phone by the same support guy.

    Now, I don’t really care about the labels vs. changelist numbers debate. They want us to use changelist numbers, and if we had known labels didn’t really work from the start, we would have written our build system around changelist numbers instead. What offends me is that a feature they display so prominently is something we aren’t supposed to use, but they couldn’t be bothered to tell us this anywhere in the documentation. If labels are such a problem for them, they should put something like “Labels have serious performance implications for large repositories. Use changelist numbers instead.” somewhere in their documentation.
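For comparison, a build script written around changelist numbers might look roughly like the Python sketch below. It only constructs the command lines and parses their output, and it assumes the standard `p4 changes` and `p4 sync` commands behave as their documentation describes. The depot path and function names are hypothetical.

```python
import re

DEPOT_PATH = "//depot/..."  # hypothetical depot path


def latest_changelist_command(depot_path=DEPOT_PATH):
    """Command that asks for the newest submitted changelist under the path."""
    return ["p4", "changes", "-m", "1", "-s", "submitted", depot_path]


def parse_changelist(p4_output):
    """Pull the number out of a line like "Change 12345 on 2007/04/10 by ..."."""
    match = re.match(r"Change (\d+) ", p4_output)
    if match is None:
        raise ValueError(f"unexpected p4 output: {p4_output!r}")
    return int(match.group(1))


def sync_command(changelist, depot_path=DEPOT_PATH):
    """Command that syncs the workspace to exactly that changelist."""
    return ["p4", "sync", f"{depot_path}@{changelist}"]
```

The build would run the first command, sync to the number it gets back, build, and record that number alongside the build's version number. Getting the same files back later is just `sync_command` with the recorded number.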

    And what’s with the attitude? We paid nearly $40,000 for this product (for our 50 seats.) We pay them $8,000 per year to keep our license up to date so we can add new licenses when we need to. That’s an awful lot of money to pay for them to tell us in a snooty voice we’re wrong for using a documented feature in their product.

    Yesterday I wrote a quick one-off script to delete all the build labels that were more than a month old and which didn’t point to a version of the game we ever distributed or showed to anyone. Now that those are gone, our Perforce performance seems to be back to normal. We may eventually switch to changelist numbers, but it will be on our schedule, not in some sort of emergency fix mode. We’re also looking around at other revision control systems that won’t look down on us as much as Perforce apparently does. Anybody have anything to say one way or the other about Accurev?
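The selection logic for a cleanup like that is short. The Python below is a sketch of the idea, not the script itself: it treats "a month" as 30 days, and the label names and the `released` set are assumptions. Each name it returns would then be handed to whatever actually deletes a label (with Perforce, presumably `p4 label -d`).

```python
from datetime import datetime, timedelta


def labels_to_delete(labels, released, now, max_age=timedelta(days=30)):
    """Pick build labels older than max_age that never shipped or were shown.

    labels: iterable of (name, created) pairs.
    released: set of label names that must be kept no matter how old.
    """
    cutoff = now - max_age
    return [
        name
        for name, created in labels
        if created < cutoff and name not in released
    ]
```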

    March 28, 2007

    Who is my customer?

    Filed under: Engineering, Game Industry — Joe @ 9:37 pm

    There are actually two answers to that question.  The first and most obvious is the end user – the person who actually buys the game.  But the right answer for me, a game programmer, is the artists and designers who are going to take what I build and use it to make an actual game. It is by paying attention to those people that I can have the biggest impact on the game.

    So why aren’t the end users my customers? Well, they are.  I’m responsible for making sure the game runs well on their computers, doesn’t crash, and faithfully implements the game design.  One thing that I probably can’t help with is making the game fun. I usually don’t design the UI, rarely design the game systems, and never tune a damned thing. Making the game pretty isn’t my job either.  The art team produces all the visuals, and if something is ugly it is far more likely to be their problem than mine. Performance is a duty I own jointly with the other departments – artists and designers are often under strict constraints determined by how many polygons, pixels, particles, or AIs I’ve managed to coax out of the system.

    The group of people that are, by far, the most important customers for me are the designers and artists.  These people take the inert lump of code I’ve created and make a game with it. My job is to provide them with the tools they need. Those tools vary depending on the customer, but in general they spend as much or more time using my tools as I do building them. I pay a lot of attention to compile and link times to keep my code-compile-debug loop as short as possible, and I also need to pay close attention to the iteration time of designers and artists. This is one of the biggest things I can do to increase the productivity of the other departments.

    I use data-driven design to put as much of the game as possible into data files and out of the code. I need to make every single tuning value something that can be modified without a code change, and if possible without rebooting the game. This has the advantage of greatly reducing the amount of up-front work a designer has to do – they don’t need to know what missions the game will have or what recipes the economy will contain, just how a mission interacts with other game systems, or how a recipe uses ingredients and produces products. Using this information, I can build the system and leave all the what for later. Knowing designers are going to change their minds about how things work, it is my responsibility to build things to be as adaptable to change as possible.  It is also my responsibility to make creating the what as easy as possible, which usually means developing some sort of editor for all the data files the game will read.
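To make the "without rebooting the game" part concrete, here is a minimal sketch of a tuning table that lives in a data file and is re-read when the file changes. It is an illustration only, not code from any shipped game: the `TuningTable` class, the JSON format, and keys such as `cannon_damage` are all assumptions made for the example.

```python
import json
import os


class TuningTable:
    """Tuning values loaded from a JSON file (hypothetical format) and
    reloaded whenever the file on disk changes."""

    def __init__(self, path):
        self.path = path
        self._stamp = None   # (mtime, size) of the file when last loaded
        self._values = {}
        self.reload_if_changed()

    def reload_if_changed(self):
        """Re-read the file if its mtime or size changed. Returns True on reload."""
        st = os.stat(self.path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp == self._stamp:
            return False
        with open(self.path, encoding="utf-8") as f:
            self._values = json.load(f)
        self._stamp = stamp
        return True

    def get(self, key, default=None):
        """Look up a tuning value; code never hard-codes the number itself."""
        return self._values.get(key, default)
```

A game loop could call `reload_if_changed()` once per second or so, and game code would read `tuning.get("cannon_damage")` each time it needs the value instead of caching it. A designer can then edit the file in a text editor or a dedicated tool and see the result in the running game.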

    While data-driven design for game systems is a relatively new phenomenon, game art has been data-driven since programmer art was banished from professional game development. It is my responsibility to take the output of art packages and get it to display in the game in real-time. To help with art iteration time, I need to get the art to display as final game visuals as early in the pipeline as possible.  An important part of serving the artists is to work with the art director to determine what graphical features the game engine actually needs, and implement those as early in development as possible.

    I need to treat these internal customers as I would an external customer. They need stable code from me on a regular basis.  Their tools need to run and boot quickly. Warning dialogs and reasonable UI choices must save them from common human error. They must be able to see the output of their work without any involvement from me because they are always going to outnumber me. The list of features I implement needs to be driven by their needs, not by whatever new technology I thought would be cool, or whatever is easiest for me to implement.  I need to work with them to understand their needs instead of building systems for myself imagining that they are just like me. Under no circumstances should they have to throw away their work on my account – I need to make data files backward compatible or write further tools to convert them to the new formats.
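One common way to keep old data files usable is to stamp each file with a version number and upgrade it one step at a time on load. The sketch below shows that idea under assumed names: the mission fields (`reward`, `rewards`, `min_level`), the version numbers, and the format changes themselves are all hypothetical.

```python
CURRENT_VERSION = 3


def _v1_to_v2(data):
    # Hypothetical change: a single "reward" became a list of "rewards".
    out = dict(data)
    out["rewards"] = [out.pop("reward")] if "reward" in out else []
    out["version"] = 2
    return out


def _v2_to_v3(data):
    # Hypothetical change: missions gained a "min_level" field, defaulting to 1.
    out = dict(data)
    out.setdefault("min_level", 1)
    out["version"] = 3
    return out


# Each entry upgrades a file from the keyed version to the next one.
MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}


def upgrade(data):
    """Return a copy of `data` upgraded to CURRENT_VERSION.

    Files with no "version" field are treated as version 1.
    """
    data = dict(data)
    version = data.get("version", 1)
    if version > CURRENT_VERSION:
        raise ValueError(f"file version {version} is newer than this build")
    while version < CURRENT_VERSION:
        data = MIGRATIONS[version](data)
        version = data["version"]
    return data
```

Because each step only knows about two adjacent versions, a file written years ago passes through every intermediate format on its way to the current one, and nobody has to redo their work by hand. The same `upgrade` function can run at load time in the game or inside a batch conversion tool.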

    If I can manage to do all of these things, the team will be very productive and quite happy.  Artists and designers joining the team from other companies will marvel at the way things work on my team. When the game finally ships I will have done everything I can do to make it the best game it can be, and will deserve a big part of the credit for the game’s success. After all, a stack of design documents and XML files isn’t a game any more than a collection of code is.  And a bunch of Maya files with no runtime engine is called a “movie”.  I may not make the game, but without me the game could not exist.
