April 28, 2008

Getting Feedback

Filed under: Day Job, Production, Uncategorized — Joe @ 9:33 pm

Andy Brice recently posted on getting feedback from software customers. With Pirates, our options are similar but somewhat tweaked.

We host our own forums for our user community to hang out on. On most MMOs about 10% of the player base actually uses these, and they self-select into a very hard-core and usually unhappy group. We can use the forums to find out what they’re unhappy about, but they probably don’t represent the actual player base very well. Still, listening to this segment of our community is important.

Click-cancel surveys are another common option. When someone goes to your site to cancel their subscription you ask them why they’ve canceled. SOE isn’t currently set up to run these, so we don’t have that data available, but many games do this kind of survey. This information is useful for finding exit points for players so you can eliminate them.

Recently I’ve started doing something a little different. I show up in game with no warning whatsoever and announce that I’m running an impromptu devchat. I offer to teleport any players who want to attend to an out-of-the-way spot and then spend an hour or so answering their questions. I’ve run four of these so far (with one of our designers helping out on all but one of them).

The biggest difference between what I hear in these impromptu devchats and what I read on the forums is the tone.  The forums are all about this OMG important issue or that OMG important issue.  The devchats have all been players asking about various new stuff that we might add to the game. (The answer is almost always “That’s a great idea that we want to implement, but we don’t know when we’ll get to it.”)  I think to get more feedback from players I’ll need to actually ask them some questions.

Maybe I’ll have to try that in the next one…

February 20, 2008

Does this mean I have to rename my blog?

Filed under: Day Job — Joe @ 1:42 pm

This morning we announced that I’m now the Producer of Pirates of the Burning Sea. They haven’t come for my compiler yet, but I’m sure it’s not long now. :)

This isn’t actually as big a change in duties as it might seem at first. I’ve been spending a lot of time over the past year dealing with our partnerships with SOE and others. I’ve also always been a meddler, so I was always sticking my nose in Rev’s business anyway. Now it’s my business instead.

January 6, 2008

From Beta to Live

Filed under: Day Job, Game Industry, Production — Joe @ 7:04 pm

The other day I was on a conference call with SOE and said something like, “That build will go to testbed on Monday and to beta a week later.” Since we are going live tomorrow, they were confused by my calling anything “beta”.  I meant “live” of course, but we’ve been in closed beta for two years and just finished a month of open beta. Old habits die hard.

It did get me thinking about the difference between the two and the big change we’re going through when our first paying customers log into the game. The shift in vocabulary is really the least important change that launch day will bring. The biggest change is that we shift from doing players a favor by letting them play to them doing us a favor by being our customers.

When you’re in closed beta you have a huge pile of applicants begging for a small number of beta slots. They have to play by your rules or you will kick them out, and those rules are pretty draconian.  They can’t let anyone know they’ve been accepted to beta. They can only play during select hours. Their characters could be wiped at any time. They work for you.

When horrible bugs or completely broken builds happen in beta, people are upset, but they understand that such things are what they signed up for. The people who rant about stability problems in the beta forum are invariably attacked by other testers with cries of “It’s a beta!” These “beta-cops” have much more tolerance for downtime than the developers do, and often need to be reined in.  Extending downtime to debug a serious server problem is a frequent occurrence in beta. Beta downtime is often unpredictable and usually not announced more than a few hours in advance. You also tend to push the limits of your systems to see where they break, even though the breakage means hardship for players.

All of that changes when you go live. Once your game is live, yours is just one of many ways your customers can spend their time and money. You are lucky they picked you, and if you want to keep them, you will treat them well. You can’t tell them what they can say or when they can play. You can’t ever delete (or lose) any of their data without serious repercussions.  You work for them, and can’t ever forget it.

Your game must be up as much as it possibly can be. If another half hour of downtime will let you diagnose a problem that will take two days to figure out otherwise, tough.  (Obviously exceptions can be made if the half hour of debugging will save you an hour of downtime down the road.) Planned outages have to be announced at least a day, and preferably a week, in advance. Major changes can’t sit on the test server for just a few days; they need two or more weeks. Nobody gets check-in permission on the closest-to-live branch(es) unless multiple high-ranking staff members have signed off on the change.

Some things don’t change when you go live. Communicating honestly and frequently with your community remains essential. In fact, communicating with the community gets much more interesting after your NDA drops. They start to have an idea what they’re talking about when they can see what game you actually made. Of course, keeping them involved in your decision-making process is just as important in beta as in live; the audience just gets larger because the whole world can see the discussion.

At least I think that’s how it will go. I’m still 14 hours from launching my first MMO. Maybe those of you who have done this before can tell me how far off I am. :)

December 14, 2007

Open Open Beta

Filed under: Day Job — Joe @ 11:37 pm

Fileplanet has dropped the subscription requirement on our open beta so Pirates is now open to all. Hopefully we’ll break all our concurrency records again this weekend.  It’s not as good for the marketing-event side of open beta when the servers get overloaded, but it’s very good for the make-launch-stable side of things. :)

Sorry this has kind of been All PotBS News All The Time of late. We’re working crazy hours to get everything ready for launch and I haven’t had time to breathe, let alone blog.  Turns out that launching an MMO is hard.

December 13, 2007

We’ve gone gold!

Filed under: Day Job — Joe @ 2:40 pm

The DVDs with the bits for the Pirates boxes are off to the printers. Woot!

Of course with the open beta in full swing and the pre-order head start just 24 days away I don’t have time to post much more than that. :)

December 3, 2007

Pirates NDA drops

Filed under: Day Job — Joe @ 3:56 pm

We just dropped our NDA with the start of open beta today. Hopefully a billion forums will be abuzz with talk of Pirates today. :)

This is a pretty big deal around Flying Lab. We’ve been working on the game for more than five years and have been in closed beta for two. Now we’re entering the final stretch before the pre-order head start on January 7th.  (The game “launches” on January 22, but paying customers will gain access on the 7th and get to keep their characters, so that’s our real launch date by every measure that matters.)  It will be nice to have people who have actually played the game get a chance to talk about it.

November 24, 2007

How to make Microsoft SQL Server cry like a baby

Filed under: Day Job, Engineering — Joe @ 3:46 pm

Earlier this year we switched from MySQL to MS SQL Server. I don’t regret the switch at all; MS SQL Server has been far more stable than MySQL was, and has lots of whizzy new features. The MySQL client library was dropping connections under load and then crashing when it reconnected. That is what pushed us to switch in the first place. Well, it turns out that MS SQL Server has some scaling problems of its own. It doesn’t crash, but it does get so slow as to be non-functional. This guide will help you make your own installation of SQL Server whimper.

Our server boxes are 8-way 2.6GHz Xeons with 16GB RAM running Windows Server 2003 64-bit and SQL Server Enterprise Edition 64-bit. If your configuration is different your mileage may vary.

Technique #1

We are using a system called the Flogger to record gameplay events into a database. To make this happen, all server processes connect to one central DB and call a stored procedure per event. This works fine when the number of processes is low, as in under 500. When the load on a world instance grows, the number of processes connecting to the flogger DB increases to 1200.

Exactly how long it takes seems to vary from a few hours to a few days, but after a while at this load SQL Server decides that it has had enough and stops accepting new connections. New processes starting up eventually time out, and things generally start going badly on the servers. Once SQL Server starts timing out connections, the only way we’ve found to get the database running again is to restart the SQL Server service. While it’s in this state the server is only using moderate server resources.

The way we’re working around this problem is to use files as a buffer between the server processes and the database. Every so often (depending on activity) each process will dump the events it wants to record out to file. Some time later (well under a second when there’s no load, but potentially longer on a well loaded cluster) another process that maintains a connection to the flog database reads the file, dumps it to the database, and then deletes the file. This eliminates the need for the game servers to connect to this database at all, so if it decides to go out to lunch the game is unaffected. It also makes the data collection more reliable by putting any backlog into one directory full of files instead of in memory on 1500 different processes spread across five server machines.
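
To make the shape of that workaround concrete, here is a minimal sketch in Python. It is not our actual code: the function names, the JSON file format, the `.flog` extension, and the `flog` table are all made up for illustration, and `sqlite3` is standing in for SQL Server. The part that matters is that the game-server side only ever touches the file system, and a single loader owns the database connection.

```python
import json
import os
import sqlite3
import uuid


def spool_events(spool_dir, events):
    """Game-server side (hypothetical): dump a batch of events to a file."""
    os.makedirs(spool_dir, exist_ok=True)
    final_path = os.path.join(spool_dir, f"{uuid.uuid4().hex}.flog")
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(events, f)
    # Rename last so the loader never picks up a half-written file.
    os.replace(tmp_path, final_path)
    return final_path


def drain_spool(spool_dir, conn):
    """Loader side (hypothetical): the only process holding a DB connection."""
    loaded = 0
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".flog"):
            continue  # skip files that are still being written
        path = os.path.join(spool_dir, name)
        with open(path, encoding="utf-8") as f:
            events = json.load(f)
        with conn:  # commit the whole file, or roll back on error
            conn.executemany(
                "INSERT INTO flog (kind, detail) VALUES (?, ?)",
                [(e["kind"], e["detail"]) for e in events],
            )
        os.remove(path)  # delete only after the commit succeeded
        loaded += len(events)
    return loaded
```

If the database goes out to lunch, `drain_spool` fails or blocks and the files simply pile up in the spool directory until it comes back. One caveat of this sketch: a crash between the commit and the delete would load that file twice, so a real loader would need some way to make the insert idempotent.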

Technique #2

We have another database exhibiting similar problems, though not quite as severely. Each process in a game cluster connects to a shared database called Serverdir and uses the DB to report its status back to operations tools and the “keep everything running” processes. This data is strictly temporary and probably doesn’t belong in a database at all, but Horrible Design Flaws That Are All My Fault aside, it’s just not that many queries and they’re all very simple selects and updates. This shouldn’t be a problem for server hardware as beefy as ours.

That argument doesn’t convince SQL Server, however. After a few days SQL Server pauses for a few minutes. The CPU goes to 0% and no queries return for the entire time it’s paused. Our code responds to that by closing things down because it can’t currently tell the difference between “Query takes over a minute” and “Crashed process.” At that point half the cluster shuts down.

We don’t have a great workaround for this one yet. We’ve been steadily reducing the load on the Serverdir database, but it doesn’t seem to take all that much load to make it happen. Our best bet is to make the code smarter and have it detect these situations. If it just sits tight for a few minutes everything will return to normal without needing to restart anything. Fortunately it only happens a couple of times a week, so while it’s something we definitely need to fix before launch, it isn’t impacting beta testers’ ability to play.
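
One way “make the code smarter” could look is sketched below in Python. This is an illustration under assumptions, not the fix we shipped: the `Watchdog` class, the 60-second heartbeat timeout, and the 300-second grace period are all invented for the example. The idea is to track database health separately from process heartbeats, and to stop blaming processes for silence while the database itself is not answering.

```python
import time


class Watchdog:
    """Hypothetical monitor that separates 'DB is paused' from 'process died'."""

    def __init__(self, heartbeat_timeout=60.0, db_grace=300.0, clock=time.monotonic):
        self.heartbeat_timeout = heartbeat_timeout
        self.db_grace = db_grace
        self.clock = clock
        self.last_seen = {}
        self.stall_started = None  # None while the database is answering

    def record_heartbeat(self, process_id):
        self.last_seen[process_id] = self.clock()

    def record_db_probe(self, ok):
        """Feed in the result of a cheap health query against the database."""
        now = self.clock()
        if not ok:
            if self.stall_started is None:
                self.stall_started = now
        elif self.stall_started is not None:
            # The database is back. Nobody could report during the stall,
            # so restart every timer instead of blaming the processes.
            self.stall_started = None
            for pid in self.last_seen:
                self.last_seen[pid] = now

    def dead_processes(self):
        now = self.clock()
        stalled = self.stall_started is not None
        if stalled and now - self.stall_started < self.db_grace:
            return []  # sit tight: the silence is explained by the DB pause
        return sorted(
            pid for pid, seen in self.last_seen.items()
            if now - seen > self.heartbeat_timeout
        )
```

The grace period is bounded on purpose. If the database stays down longer than `db_grace`, the watchdog goes back to reporting stale processes, since at that point something really is wrong and someone should look at it.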

Making an MMO scale is a pain

None of the profiling tools we’re using at the SQL Server or OS levels are much help with either of these problems. Nothing tells us why SQL Server is refusing connections, or why it’s refusing to work on queries. Most database books and websites think that a slow query is one that takes longer than a minute or two, but in our world that’s a dead process and a disappointed customer.

We have made great strides in scalability since the first stress test, but no matter how many things you fix there is always one more waiting to bite you on the ass. *sigh*  We’ll get it figured out and apart from these DB troubles everything is staying up quite well at this point. We have 43 more days until the pre-order head start, so there’s still plenty of time to get through this round of problems. Then we break through into the infinite!

My fix for the flogger scale problem is now ready for a code review, so I’m going home to play Rock Band.

November 15, 2007

PotBS Stress Test this weekend

Filed under: Day Job — Joe @ 7:47 pm

We are running our second stress test this weekend, and so far it’s going quite well.  Fileplanet just opened it up to non-subscribers, so head on over and give the game a try!

October 31, 2007

Scripting in PotBS

Filed under: Day Job, Engineering — Joe @ 4:54 pm

The sweng-gamedev list is all a-flutter with a debate about the merits of scripting in games.  I wrote up a response that describes our experiences and figured I’d share it here too.

We had Lua integrated into the client and wrote much of our UI logic in it. We struggled with bugs in our glue layer, difficulty debugging, and major spikes in our frame times whenever the garbage collector ran. Of course the glue code was terrible to begin with and we were exporting script hooks for much of the game instead of a nice clean interface, so that didn’t help. After a while we started moving away from Lua and began implementing new UI in C++. We’ve now removed all the Lua from the game.

On the gameplay side we use a rich data-driven system that lets designers define an arbitrary list of “requirements” with which they are able to test most any condition. When a trigger fires, an object is used, or a skill is activated, an arbitrary list of “results” is activated which is capable of modifying just about any state in the game.  The designers also have a few ways of maintaining persistent state on the characters depending on the circumstances.  This system is working pretty well for us and eliminates the need for any designer-written scripts.
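
For readers who haven’t seen this style of system, here is a toy sketch of the requirements/results pattern in Python. Everything in it is hypothetical (the requirement names, the result names, the dictionary-based game state); the real system is far richer. The point is that designers only author data, and programmers own the small set of functions that the data refers to.

```python
# Requirement checks, written by programmers and referenced by name from data.
REQUIREMENTS = {
    "min_level": lambda state, level: state["level"] >= level,
    "has_item": lambda state, item: item in state["inventory"],
}


def give_item(state, item):
    state["inventory"].append(item)


def add_gold(state, amount):
    state["gold"] += amount


# Results, likewise written by programmers and referenced by name from data.
RESULTS = {"give_item": give_item, "add_gold": add_gold}


def fire(trigger, state):
    """Apply every result of a trigger if all of its requirements pass."""
    passed = all(
        REQUIREMENTS[name](state, arg) for name, arg in trigger["requirements"]
    )
    if passed:
        for name, arg in trigger["results"]:
            RESULTS[name](state, arg)
    return passed


# Designer-authored data: no code, just names and parameters.
treasure_chest = {
    "requirements": [("min_level", 5), ("has_item", "rusty key")],
    "results": [("add_gold", 100), ("give_item", "treasure map")],
}
```

Calling `fire(treasure_chest, player_state)` with a level 6 player holding a rusty key would add 100 gold and a treasure map; with a level 3 player it would change nothing and return `False`.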

If I ever integrate scripting into an engine again, there are several things I’ll do differently to make it go more smoothly:

  1. No designer scripting.  If designers are writing scripts, You’re Doing It Wrong. Scripts are code, and need to be just as maintainable as all your other code.
  2. A much cleaner API layer between the C++ code and the script code.  Exporting the whole game to Lua was just dumb.
  3. A built-in debugger. Printf-style debugging is so incredibly painful when you’re used to having a rich source-level debugger.
  4. Built-in profiling. All calls across the native/script interface should be timed and memory consumption should be strictly monitored.
  5. Dynamic script loading. Part was stupid glue and part was just our poor use of the scripts, but the first time around we ended up loading all the scripts when the client booted and couldn’t reload most of them. This is one of the major advantages of scripting and we were missing out on it.
  6. Much more evaluation time. We know a bunch of things to look for the next time around including slow garbage collection, object lifecycle issues, memory corruption in the glue, testability of the scripts in isolation, etc.
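
Items 2 and 4 fit together nicely: if everything a script can call goes through one small export table, that table is also the natural place to hang timing. Here is a rough sketch in Python; the `ScriptApi` class and `get_ship_name` are invented for the example and say nothing about how any particular Lua binding works. It covers call counts and time only, not the memory monitoring mentioned in item 4.

```python
import time
from collections import defaultdict
from functools import wraps


class ScriptApi:
    """Hypothetical export table: the only functions scripts are allowed to call."""

    def __init__(self, clock=time.perf_counter):
        self.functions = {}
        self.stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})
        self._clock = clock

    def export(self, func):
        """Register func for script use and time every call that crosses over."""

        @wraps(func)
        def timed(*args, **kwargs):
            start = self._clock()
            try:
                return func(*args, **kwargs)
            finally:
                entry = self.stats[func.__name__]
                entry["calls"] += 1
                entry["seconds"] += self._clock() - start

        self.functions[func.__name__] = timed
        return timed


api = ScriptApi()


@api.export
def get_ship_name(ship_id):
    return f"ship-{ship_id}"
```

The script runtime would be handed `api.functions` and nothing else, and `api.stats` could be dumped every frame or on demand to see which exported calls are eating the time.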

On the other hand, I think writing servers in a higher-level-than-C++ language like C# or Java makes a lot of sense and would save us tons of development time. It’s the dynamically typed language with no debugger that didn’t work well for us.

October 11, 2007

How I Spent My Weekend

Filed under: Day Job — Joe @ 2:07 pm

We just had our first stress test for Pirates. I just posted a devlog with the details. Except for the first three hours (which were glorious) it was a four-day-long marathon debugging session. I don’t think I’ve ever been so exhausted as I was on Sunday night. :)

Still, it was totally worth it. We learned many things about our servers and where they break that will hopefully let future tests run much more smoothly.