<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Continuous Deployment with Thick Clients</title>
	<atom:link href="http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/feed/" rel="self" type="application/rss+xml" />
	<link>http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/</link>
	<description>Joe Ludwig's blog</description>
	<lastBuildDate>Sun, 01 Jan 2012 19:08:41 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Bryant</title>
		<link>http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/comment-page-1/#comment-330782</link>
		<dc:creator>Bryant</dc:creator>
		<pubDate>Tue, 31 Mar 2009 11:36:26 +0000</pubDate>
		<guid isPermaLink="false">http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/#comment-330782</guid>
		<description>I don&#039;t think continuous deployment is a good idea, speaking as an MMO operations guy, because it makes the troubleshooting process substantially more difficult. I use the Visible IT methodology when possible, and it keys off of change management -- anything that makes the changes that triggered the latest problem harder to identify is bad. The first reaction might be that continuous deployment doesn&#039;t make that harder if it&#039;s well tracked, but you have to assume some changes cause problems which don&#039;t manifest immediately. 

On the other hand, I also strongly believe that the industry&#039;s dependence on weekly or even daily downtimes is bad for us and we need to fix it. Zero downtime is a legitimate goal. Linden Lab can do updates to Second Life without taking everything down for hours; why can&#039;t the rest of us? Sounds like Guild Wars is doing something really similar to what I&#039;ve mentally sketched out as a process, which is neat.

I gotta blog about both these things at some point.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t think continuous deployment is a good idea, speaking as an MMO operations guy, because it makes the troubleshooting process substantially more difficult. I use the Visible IT methodology when possible, and it keys off of change management &#8212; anything that makes the changes that triggered the latest problem harder to identify is bad. The first reaction might be that continuous deployment doesn&#8217;t make that harder if it&#8217;s well tracked, but you have to assume some changes cause problems which don&#8217;t manifest immediately. </p>
<p>On the other hand, I also strongly believe that the industry&#8217;s dependence on weekly or even daily downtimes is bad for us and we need to fix it. Zero downtime is a legitimate goal. Linden Lab can do updates to Second Life without taking everything down for hours; why can&#8217;t the rest of us? Sounds like Guild Wars is doing something really similar to what I&#8217;ve mentally sketched out as a process, which is neat.</p>
<p>I gotta blog about both these things at some point.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin Gadd</title>
		<link>http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/comment-page-1/#comment-316278</link>
		<dc:creator>Kevin Gadd</dc:creator>
		<pubDate>Thu, 19 Feb 2009 01:53:48 +0000</pubDate>
		<guid isPermaLink="false">http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/#comment-316278</guid>
		<description>Well, my ideal is closer to 30 minutes, but from the perspective of a gamer, a 7 hour deploy is pretty damn good if it includes things like publisher approval and QA turnaround. 

The speed of light starts becoming a serious barrier when you have as much content to deploy as a typical MMO; IMVU gets off a little easy since our deployments are almost always in the 5-50MB range, so we don&#039;t run into bandwidth issues when getting stuff onto the servers.

I&#039;ll try and share what I remember about GW&#039;s deployment process: (Disclaimer: I spent most of my time working on the content pipeline and designing game content, so I&#039;m not a domain expert here. Some of this is probably completely wrong.)

Internal &#039;development&#039; builds took between 15 and 60 minutes, depending on whether we were building large content changes or just code changes. After we had a finished dev build, we could test it on local machines and on a small development cluster that our QA and alpha test groups had access to. From development we did a merge over to our &#039;staging&#039; (production rev) branch, and did another build in about the same amount of time. Once that was done, something resembling a one-button deploy moved the content and game code from staging to our production machines, which took under an hour - most of the time being spent actually sending the bits to the datacenter(s), so the time varied based on how much had changed.
Once the new code was on all the servers in the datacenters, we flipped a switch that notified the live servers that any new &#039;instances&#039; should be started using the latest version of the game code. At this point, any users running older builds of the client got told that they needed to update, and attempts to load into a new instance would be rejected until they updated. 
To summarize how the live update worked on the server, I believe we atomically loaded the new game content onto the servers, and then loaded the newest version of the game code into the server processes alongside the older version, because each build was a single loadable DLL. That let us keep old instances alive alongside new instances on the same hardware, and isolated us from some of the hairy problems you&#039;d get from multiple client versions running in the same world.
All game content was stored using a versioned filesystem of some sort on both the servers and the clients, so the update process was fairly efficient and incremental, with our fileservers doing the work of figuring out where revision foo of file bar happened to live and sending it off to the client.

As for the client side, here&#039;s what I remember:
All content (with a few exceptions) was streamed down to the client &#039;as required&#039;, based on a set of rules about when content was needed:
There was one set of content that was essentially &quot;required&quot; to run the game. If all you had was the game client, it would automatically figure out that it needed the required content and pull it down for you. After this, you got from the loader/updater screen into the login screen. Things like UI textures, fonts, etc were in here, since the client itself used these resources.
The next set of content was essentially &#039;universal&#039; content. This stuff wasn&#039;t necessary to start up the client, but was basically required for loading into any instance where players could be running around - I think it contained things like the basic human skeletons, common textures, etc. This would be streamed on demand at the loading screen for a given instance once you logged in or changed zones.
Finally, there was a set of content needed for the actual location you were in. The game code had a lot of really elaborate scaffolding that made it possible for us to statically determine what assets were needed by a given instance. From this, we built a &#039;manifest&#039; that listed all the assets the game client could *possibly* need in that instance, and the client loaded them all before leaving the loading screen. This was particularly painful in some respects (for example, if you had a monster that spawned 5% of the time, 100% of your players would have to download the assets for it because manifests couldn&#039;t safely be dynamic in such a manner), but it really did work well when executed correctly. The successes and failures in this particular area depended as much on careful design as they did on clever technology.
There was also some magic for things like skill icons and character textures, where we only required the download of low-res subsets of the data, and we would stream in the high-res versions on demand as soon as they were actually loaded. I&#039;m not certain exactly where we used it, but I know that you&#039;d sometimes see the textures for a person&#039;s armor &#039;pop&#039; from low-res to high-res as we streamed them in over the network if you didn&#039;t already have them locally. It was hard to tell exactly when this happened, though, since we also did lots of asynchronous loading from disk that tended to look like network streaming if you were on a good connection. This primarily helped for the new user case, since it meant they could download Gw.exe off our website in less than a minute and have enough content loaded to enter a town within an hour or less.

The manifest system tied in to the build process pretty directly, and I suspect it was responsible for the short build times. If a build only took 15 minutes, that usually meant that none of the content had to be updated - and since everything was built around incremental updates, that meant we only had to deploy binaries to the servers and users&#039; machines, and all the build server needed to do was crunch through a bunch of source code using burly hardware. In degenerate cases where a bunch of content changed, there was a lot more time involved, but it still scaled nicely. We also had the ability to do incremental builds of source code, but we didn&#039;t use it for deployment in fear of linker issues (we used it on local development boxes, and while it was much faster, the linker issues were real.)

Based on my experience with GW, if I wanted to get POTBS&#039;s build times down (as a contrived example), I&#039;d try and kill the &#039;server upgrade&#039; stage first by making it asynchronous in a manner like what I described above. Improving the speed of pushing out a build is valuable but not particularly easy to do, so I don&#039;t expect there are many wins to be made there. You could cut down on the length of smoke tests, but I suspect less than 3 hours is irrational - I wouldn&#039;t be surprised if at least that much time went into testing GW deployments and I just didn&#039;t see it happen. Packing the build seems like it shouldn&#039;t take an hour, but the elaborate manifest/versioned filesystem setup we had meant that there was no &#039;packing&#039; process, so I don&#039;t quite know what went into yours. I suspect the overhead of having to collaborate with SOE probably factors in here too; I saw very little NCSoft red tape involved in smaller deployments.

I may regret posting this comment. (:</description>
		<content:encoded><![CDATA[<p>Well, my ideal is closer to 30 minutes, but from the perspective of a gamer, a 7 hour deploy is pretty damn good if it includes things like publisher approval and QA turnaround. </p>
<p>The speed of light starts becoming a serious barrier when you have as much content to deploy as a typical MMO; IMVU gets off a little easy since our deployments are almost always in the 5-50MB range, so we don&#8217;t run into bandwidth issues when getting stuff onto the servers.</p>
<p>I&#8217;ll try and share what I remember about GW&#8217;s deployment process: (Disclaimer: I spent most of my time working on the content pipeline and designing game content, so I&#8217;m not a domain expert here. Some of this is probably completely wrong.)</p>
<p>Internal &#8216;development&#8217; builds took between 15 and 60 minutes, depending on whether we were building large content changes or just code changes. After we had a finished dev build, we could test it on local machines and on a small development cluster that our QA and alpha test groups had access to. From development we did a merge over to our &#8216;staging&#8217; (production rev) branch, and did another build in about the same amount of time. Once that was done, something resembling a one-button deploy moved the content and game code from staging to our production machines, which took under an hour &#8211; most of the time being spent actually sending the bits to the datacenter(s), so the time varied based on how much had changed.<br />
Once the new code was on all the servers in the datacenters, we flipped a switch that notified the live servers that any new &#8216;instances&#8217; should be started using the latest version of the game code. At this point, any users running older builds of the client got told that they needed to update, and attempts to load into a new instance would be rejected until they updated.<br />
To summarize how the live update worked on the server, I believe we atomically loaded the new game content onto the servers, and then loaded the newest version of the game code into the server processes alongside the older version, because each build was a single loadable DLL. That let us keep old instances alive alongside new instances on the same hardware, and isolated us from some of the hairy problems you&#8217;d get from multiple client versions running in the same world.<br />
All game content was stored using a versioned filesystem of some sort on both the servers and the clients, so the update process was fairly efficient and incremental, with our fileservers doing the work of figuring out where revision foo of file bar happened to live and sending it off to the client.</p>
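<p>The side-by-side build behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Guild Wars server code: the <code>InstanceHost</code> class, its method names, and the integer build numbers are all hypothetical.</p>

```python
# Hypothetical illustration only; none of these names come from Guild Wars.
class InstanceHost:
    def __init__(self, build):
        self.loaded_builds = {build}  # builds whose game code is loaded in this process
        self.latest = build           # build used for any newly started instance
        self.instances = {}           # instance id -> build it was started with

    def load_build(self, build):
        """Load a new build alongside the old ones and make it the default."""
        self.loaded_builds.add(build)
        self.latest = build

    def start_instance(self, instance_id, client_build):
        """New instances always use the latest build; stale clients must update."""
        if client_build != self.latest:
            return "update_required"
        self.instances[instance_id] = self.latest
        return "ok"


host = InstanceHost(100)
first = host.start_instance("town-1", 100)    # started on build 100
host.load_build(101)                          # the "flip the switch" step
stale = host.start_instance("town-2", 100)    # old client is turned away
fresh = host.start_instance("town-2", 101)    # updated client gets in
print(first, stale, fresh, host.instances)
```

<p>In this sketch the instance started on build 100 keeps running untouched after build 101 is loaded, which is the property that lets old and new instances share the same hardware.</p>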
<p>As for the client side, here&#8217;s what I remember:<br />
All content (with a few exceptions) was streamed down to the client &#8216;as required&#8217;, based on a set of rules about when content was needed:<br />
There was one set of content that was essentially &#8220;required&#8221; to run the game. If all you had was the game client, it would automatically figure out that it needed the required content and pull it down for you. After this, you got from the loader/updater screen into the login screen. Things like UI textures, fonts, etc were in here, since the client itself used these resources.<br />
The next set of content was essentially &#8216;universal&#8217; content. This stuff wasn&#8217;t necessary to start up the client, but was basically required for loading into any instance where players could be running around &#8211; I think it contained things like the basic human skeletons, common textures, etc. This would be streamed on demand at the loading screen for a given instance once you logged in or changed zones.<br />
Finally, there was a set of content needed for the actual location you were in. The game code had a lot of really elaborate scaffolding that made it possible for us to statically determine what assets were needed by a given instance. From this, we built a &#8216;manifest&#8217; that listed all the assets the game client could *possibly* need in that instance, and the client loaded them all before leaving the loading screen. This was particularly painful in some respects (for example, if you had a monster that spawned 5% of the time, 100% of your players would have to download the assets for it because manifests couldn&#8217;t safely be dynamic in such a manner), but it really did work well when executed correctly. The successes and failures in this particular area depended as much on careful design as they did on clever technology.<br />
There was also some magic for things like skill icons and character textures, where we only required the download of low-res subsets of the data, and we would stream in the high-res versions on demand as soon as they were actually loaded. I&#8217;m not certain exactly where we used it, but I know that you&#8217;d sometimes see the textures for a person&#8217;s armor &#8216;pop&#8217; from low-res to high-res as we streamed them in over the network if you didn&#8217;t already have them locally. It was hard to tell exactly when this happened, though, since we also did lots of asynchronous loading from disk that tended to look like network streaming if you were on a good connection. This primarily helped for the new user case, since it meant they could download Gw.exe off our website in less than a minute and have enough content loaded to enter a town within an hour or less.</p>
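<p>A minimal sketch of the per-instance manifest idea described above, assuming assets form a simple dependency table. It is not the real Guild Wars pipeline: the asset names, the <code>build_manifest</code> and <code>missing_assets</code> helpers, and the table layout are all hypothetical.</p>

```python
# Hypothetical illustration only; none of these names come from Guild Wars.
from collections import deque


def build_manifest(instance_roots, depends_on):
    """Return every asset an instance could possibly need (transitive closure)."""
    manifest = set()
    pending = deque(instance_roots)
    while pending:
        asset = pending.popleft()
        if asset in manifest:
            continue
        manifest.add(asset)
        pending.extend(depends_on.get(asset, ()))
    return manifest


def missing_assets(manifest, local_cache):
    """Assets the client must download before leaving the loading screen."""
    return sorted(manifest - local_cache)


# The rare monster is listed even if it spawns only occasionally,
# because the manifest is computed statically.
depends_on = {
    "zone_example": ["terrain_tiles", "common_monster", "rare_monster"],
    "common_monster": ["common_monster_mesh"],
    "rare_monster": ["rare_monster_mesh", "rare_monster_texture"],
}
manifest = build_manifest(["zone_example"], depends_on)
to_download = missing_assets(manifest, {"terrain_tiles", "common_monster_mesh"})
print(to_download)
```

<p>Because the manifest is a static closure over everything the instance references, every client downloads the rare monster assets, which is the trade-off mentioned above.</p>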
<p>The manifest system tied in to the build process pretty directly, and I suspect it was responsible for the short build times. If a build only took 15 minutes, that usually meant that none of the content had to be updated &#8211; and since everything was built around incremental updates, that meant we only had to deploy binaries to the servers and users&#8217; machines, and all the build server needed to do was crunch through a bunch of source code using burly hardware. In degenerate cases where a bunch of content changed, there was a lot more time involved, but it still scaled nicely. We also had the ability to do incremental builds of source code, but we didn&#8217;t use it for deployment in fear of linker issues (we used it on local development boxes, and while it was much faster, the linker issues were real.)</p>
<p>Based on my experience with GW, if I wanted to get POTBS&#8217;s build times down (as a contrived example), I&#8217;d try and kill the &#8216;server upgrade&#8217; stage first by making it asynchronous in a manner like what I described above. Improving the speed of pushing out a build is valuable but not particularly easy to do, so I don&#8217;t expect there are many wins to be made there. You could cut down on the length of smoke tests, but I suspect less than 3 hours is irrational &#8211; I wouldn&#8217;t be surprised if at least that much time went into testing GW deployments and I just didn&#8217;t see it happen. Packing the build seems like it shouldn&#8217;t take an hour, but the elaborate manifest/versioned filesystem setup we had meant that there was no &#8216;packing&#8217; process, so I don&#8217;t quite know what went into yours. I suspect the overhead of having to collaborate with SOE probably factors in here too; I saw very little NCSoft red tape involved in smaller deployments.</p>
<p>I may regret posting this comment. (:</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe</title>
		<link>http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/comment-page-1/#comment-316123</link>
		<dc:creator>Joe</dc:creator>
		<pubDate>Wed, 18 Feb 2009 19:18:44 +0000</pubDate>
		<guid isPermaLink="false">http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/#comment-316123</guid>
		<description>Is 7.5 hours really close to the ideal? :)

I would love to hear more about how this process worked on Guild Wars.  It seems like the ability to download content from within the client and to operate with a subset of the data would be a big help and GW could do both of those things.</description>
		<content:encoded><![CDATA[<p>Is 7.5 hours really close to the ideal? <img src='http://programmerjoe.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I would love to hear more about how this process worked on Guild Wars.  It seems like the ability to download content from within the client and to operate with a subset of the data would be a big help and GW could do both of those things.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin Gadd</title>
		<link>http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/comment-page-1/#comment-315953</link>
		<dc:creator>Kevin Gadd</dc:creator>
		<pubDate>Wed, 18 Feb 2009 04:49:54 +0000</pubDate>
		<guid isPermaLink="false">http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/#comment-315953</guid>
		<description>It&#039;s great to hear that POTBS is able to come close to the continuous deployment ideal for an MMO. I&#039;m also surprised to hear that Cryptic&#039;s build process is so streamlined - I never would have guessed from my experience playing CoH/CoV, but maybe that&#039;s just because I started playing after the game was fairly mature, so the major changes came in the Publishes.

Continuous deployment has a fundamental impact on how you think about game deployment and patching, compared to the old-fashioned deployment approach where you focus on building gold masters, but it totally pays off.

My experience with semi-continuous deployment on Guild Wars definitely left me hungering for more, and the horror stories about weekly builds that I heard from some of my more experienced coworkers made me wonder why more companies hadn&#039;t tried to get their build times down - it&#039;s really great to hear that other people are trying as hard as the ANet crew were.

There were cases where being able to deploy more than once a day meant that we could tackle an emergency in the span of a day or weekend instead of having to crunch away for weeks, by simply putting together a potential solution, rolling it out, and seeing if the problem went away, instead of having to spend days or weeks waiting on builds and exhaustive QA.

On the other hand, there&#039;s still a lot more ground to be gained, so I hope people like you keep working hard on improvements in this area - it&#039;s amazing how much better the quality of life can be for hardworking artists, designers and engineers when the consequences of a failed build or bad deployment are measured in hours of stress instead of days or weeks. In an industry that&#039;s somewhat famous for bad work conditions and endless crunch time, I think that&#039;s a really big deal.</description>
		<content:encoded><![CDATA[<p>It&#8217;s great to hear that POTBS is able to come close to the continuous deployment ideal for an MMO. I&#8217;m also surprised to hear that Cryptic&#8217;s build process is so streamlined &#8211; I never would have guessed from my experience playing CoH/CoV, but maybe that&#8217;s just because I started playing after the game was fairly mature, so the major changes came in the Publishes.</p>
<p>Continuous deployment has a fundamental impact on how you think about game deployment and patching, compared to the old-fashioned deployment approach where you focus on building gold masters, but it totally pays off.</p>
<p>My experience with semi-continuous deployment on Guild Wars definitely left me hungering for more, and the horror stories about weekly builds that I heard from some of my more experienced coworkers made me wonder why more companies hadn&#8217;t tried to get their build times down &#8211; it&#8217;s really great to hear that other people are trying as hard as the ANet crew were.</p>
<p>There were cases where being able to deploy more than once a day meant that we could tackle an emergency in the span of a day or weekend instead of having to crunch away for weeks, by simply putting together a potential solution, rolling it out, and seeing if the problem went away, instead of having to spend days or weeks waiting on builds and exhaustive QA.</p>
<p>On the other hand, there&#8217;s still a lot more ground to be gained, so I hope people like you keep working hard on improvements in this area &#8211; it&#8217;s amazing how much better the quality of life can be for hardworking artists, designers and engineers when the consequences of a failed build or bad deployment are measured in hours of stress instead of days or weeks. In an industry that&#8217;s somewhat famous for bad work conditions and endless crunch time, I think that&#8217;s a really big deal.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ben Zeigler</title>
		<link>http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/comment-page-1/#comment-313923</link>
		<dc:creator>Ben Zeigler</dc:creator>
		<pubDate>Fri, 13 Feb 2009 22:03:00 +0000</pubDate>
		<guid isPermaLink="false">http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/#comment-313923</guid>
		<description>I believe they hosted them, but the patchservers were our own code so we had control over the process. I know servers patched the same as clients, but the details of that predate me.

There was much politicking required to let us use our own written patchservers, as opposed to the &quot;approved&quot; ones.</description>
		<content:encoded><![CDATA[<p>I believe they hosted them, but the patchservers were our own code so we had control over the process. I know servers patched the same as clients, but the details of that predate me.</p>
<p>There was much politicking required to let us use our own written patchservers, as opposed to the &#8220;approved&#8221; ones.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe</title>
		<link>http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/comment-page-1/#comment-313914</link>
		<dc:creator>Joe</dc:creator>
		<pubDate>Fri, 13 Feb 2009 20:03:15 +0000</pubDate>
		<guid isPermaLink="false">http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/#comment-313914</guid>
		<description>Did you run the patch servers in the CoH days, or did they run down at NC Austin and cause a similar two-stage process?</description>
		<content:encoded><![CDATA[<p>Did you run the patch servers in the CoH days, or did they run down at NC Austin and cause a similar two-stage process?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ben Zeigler</title>
		<link>http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/comment-page-1/#comment-313911</link>
		<dc:creator>Ben Zeigler</dc:creator>
		<pubDate>Fri, 13 Feb 2009 19:43:38 +0000</pubDate>
		<guid isPermaLink="false">http://programmerjoe.com/2009/02/12/continuous-deployment-with-thick-clients/#comment-313911</guid>
		<description>Our build/deploy is a fair amount like POTBS, with the following changes:

1) We run the patchservers, and our patchservers handle incremental data uploads, so the build to patcher push is much faster for most builds. It also happens at build time automatically, so when we push an incremental build most of the files are already up on the patch servers.
2) Our full builds max out at 30 minutes or so. Data-packing can take a good bit longer, but that&#039;s incremental so doesn&#039;t happen most of the time.
3) Server upgrading uses an identical process to client patching (grabbing a different subset of files off the patchserver). Also, this means end users can pre-patch at the same time the servers are getting ready in theory.

I&#039;ve run a complete deploy process of a prototype project (i.e., much less data than PotBS but just as much code, so the patching is faster than it would be for a real game) in about 40 minutes from a completely new build to externally accessible servers.</description>
		<content:encoded><![CDATA[<p>Our build/deploy is a fair amount like POTBS, with the following changes:</p>
<p>1) We run the patchservers, and our patchservers handle incremental data uploads, so the build to patcher push is much faster for most builds. It also happens at build time automatically, so when we push an incremental build most of the files are already up on the patch servers.<br />
2) Our full builds max out at 30 minutes or so. Data-packing can take a good bit longer, but that&#8217;s incremental so doesn&#8217;t happen most of the time.<br />
3) Server upgrading uses an identical process to client patching (grabbing a different subset of files off the patchserver). Also, this means end users can pre-patch at the same time the servers are getting ready in theory.</p>
<p>I&#8217;ve run a complete deploy process of a prototype project (i.e., much less data than PotBS but just as much code, so the patching is faster than it would be for a real game) in about 40 minutes from a completely new build to externally accessible servers.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

