Valve Hardware Day 2006 - Multithreaded Edition
by Jarred Walton on November 7, 2006 6:00 AM EST - Posted in Trade Shows
Closing Thoughts
At this point, we almost have more questions than we have answers. The good news is that it appears Valve and others will be able to make good use of multi-core architectures in the near future. How much those improvements will really affect the gameplay remains to be seen. In the short term, at least, it appears that most of the changes will be cosmetic in nature. After all, there are still a lot of single core processors out there, and you don't want to cut off a huge portion of your target market by requiring multi-core processors in order to properly experience a game. Valve is looking to offer equivalent results regardless of the number of cores for now, but the presentation will be different. Some effects will need to be scaled down, but the overall game should stay consistent. At some point, of course, we will likely see the transition made to requiring dual core or even multi-core processors in order to run the latest games.
We had a lot of other questions for Valve Software, so we'll wrap up with some highlights from the discussions that followed their presentations. First, here's a quick summary of the new features that will be available in the Source engine in the Episode Two timeframe.
We mentioned earlier that Valve is looking for things to do with gaming worlds beyond simply improving the graphics: more interactivity, a better feeling of immersion, more believable artificial intelligence, and so forth. Gabe Newell stated that he felt the release of Kentsfield was a major inflection point in the world of computer hardware; the changes it brings about are going to be far reaching and will stay with us for a long time. Gabe feels the next major inflection point will be related to the post-GPU transition, though guessing exactly what that will be and when it will occur would require a much better crystal ball than we have access to. If Valve Software is right, however, the work they are doing right now to take advantage of multiple cores will dovetail nicely with future computing developments. As we mentioned earlier, Valve is now committed to buying only quad core (or better) systems for its own employees going forward.
In regards to quad core processing, one of the questions raised is how the various platform architectures will affect overall performance. Specifically, Kentsfield is simply two Core 2 Duo dies placed within a single CPU package (similar to Smithfield and Presler relative to Pentium 4), whereas Athlon X2 and Core 2 Duo place two processor cores on a single die. There are quite a few options for getting quad core processing these days: Kentsfield offers four cores in one package; in the future we should see four cores on a single die; you could go with a dual socket motherboard, with each socket containing a dual core processor; or, on the extreme end of things, you could even have a four socket motherboard with each socket housing a single core processor. Valve admitted that they haven't done any internal testing of four socket or even two socket platforms -- they are primarily interested in desktop systems that will see widespread use -- and much of their testing so far has focused on dual core designs, with quad core being the new addition. Back to the question of CPU package design (two dies in a package vs. a single die), the current indication is that the performance impact of a shared package vs. a single die is "not enough to matter".
One area in which Valve has a real advantage over other game developers is their delivery system. With Steam comes the ability to roll out significant engine updates: we saw Valve add HDR rendering to the Source engine last year, and their episodic approach to gaming allows them to iterate towards a completely new engine rather than trying to rewrite everything from scratch every several years. That means Valve is able to get major engine overhauls into the hands of consumers more quickly than other gaming companies. Steam also gives them other interesting options; for example, they talked about the potential to deliver games optimized specifically for quad core to those people who are running quad core systems. Because Steam knows what sort of hardware you are running, Valve can make sure in advance that people meet the recommended system requirements.
Perhaps one of the most interesting things about the multi-core Source engine update is that it should be applicable to previously released Source titles like the original Half-Life 2, Lost Coast, Episode One, Day of Defeat: Source, etc. The current goal is to get the technology released to consumers during the first half of 2007 -- sometime before Episode Two is released. Not all of the multi-core enhancements will make it into Episode Two or the earlier titles, but as time passes Source-based games should begin adding additional multi-core optimizations. It is still up to the individual licensees of the Source engine to determine which upgrades they want to use, but at least they will have the ability to add multithreading support.
With all of the work that has been done by Valve during the past year, what is their overall impression of the difficulty involved? Gabe put it this way: it was hard, painful work, but it's still a very important problem to solve. Talented people have been working on getting multithreading right rather than on other projects, so there has been a significant monetary investment. The performance they gain is going to be useful, however, as it should allow them to create games that other companies are currently unable to build. Even more surprising is how the threading framework is able to insulate other programmers from the complexities involved. These "leaf coders" (i.e. junior programmers) are still able to remain productive within the framework, without having to be aware of the low-level threading problems that are being addressed. One of these programmers, for example, was able to demonstrate new AI code running on the multithreaded platform, and it only took about three days of work to port the code from a single threaded design to the multithreaded engine. That's not to say that there aren't additional bugs that need to be addressed (there are certain race conditions and timing issues that become a problem with multithreading that you simply don't see in single threaded code), but over time the programmers simply become more familiar with the new way of doing things.
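Valve did not share the implementation details of its framework, but the general idea -- leaf code hands a piece of work to a job system and never touches threads, locks, or cores directly -- can be sketched roughly as follows. This is purely an illustration with made-up names, using modern C++ threading primitives for brevity, not Valve's actual API:

    // Hypothetical sketch of a job system that hides threading from "leaf" code.
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class JobSystem {
    public:
        explicit JobSystem(unsigned workers) {
            for (unsigned i = 0; i < workers; ++i)
                threads_.emplace_back([this] { WorkerLoop(); });
        }
        ~JobSystem() {
            { std::lock_guard<std::mutex> lock(mutex_); stop_ = true; }
            cv_.notify_all();
            for (auto& t : threads_) t.join();
        }
        // Leaf code just calls Submit(); it never sees threads or locks.
        void Submit(std::function<void()> job) {
            { std::lock_guard<std::mutex> lock(mutex_); jobs_.push(std::move(job)); }
            cv_.notify_one();
        }
    private:
        void WorkerLoop() {
            for (;;) {
                std::function<void()> job;
                {
                    std::unique_lock<std::mutex> lock(mutex_);
                    cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                    if (stop_ && jobs_.empty()) return;
                    job = std::move(jobs_.front());
                    jobs_.pop();
                }
                job();  // runs AI, particle, or sound work written as plain code
            }
        }
        std::vector<std::thread> threads_;
        std::queue<std::function<void()>> jobs_;
        std::mutex mutex_;
        std::condition_variable cv_;
        bool stop_ = false;
    };

The point of such a design is that the junior programmer only writes the body of the job; all of the scheduling and synchronization lives in one place.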
Another area of discussion that brought up several questions was in regards to other hardware being supported. Specifically, there is now hardware like PhysX, as well as the potential for graphics cards to do general computations (including physics) instead of just pure graphics work. Support for these technologies is not out of the question according to Valve. The bigger issue is going to be adoption rate: they are not going to spend a lot of man-hours supporting something that less than 1% of the population is likely to own. If the hardware becomes popular enough, it will definitely be possible for Valve to take advantage of it.
As far as console hardware goes, the engine is already running on Xbox 360, with support for six simultaneous threads. The PC platform and Xbox 360 are apparently close enough that getting the software to run on both of them does not require a lot of extra effort. PS3, on the other hand.... The potential to support PS3 is definitely there, but it doesn't sound like Valve has devoted any serious effort to this platform as of yet. Given that the hardware isn't available for consumer purchase yet, that might make sense. The PS3 Cell processor does add some additional problems in terms of multithreading support. First, unlike Xbox 360 and PC processors, the processor cores available in Cell are not all equivalent. That means Valve will have to spend additional effort making sure that the software is aware of which cores can handle which sorts of tasks best (or at all, as the case may be). Another problem Cell creates is that there is no coherent view of memory: each core has its own dedicated high-speed local memory, so all of that has to be managed along with worrying about threading and execution capabilities. Basically, PS3/Cell takes the problems inherent in multithreading and adds a few more, so getting optimal utilization of the Cell processor is going to be even more difficult.
One final area that was of personal interest is Steam. Unfortunately, one of the biggest questions in regards to Steam is something Valve won't discuss; specifically, we'd really like to know how successful Steam has been as a distribution network. Valve won't reveal how many copies of Half-Life 2 (or any of their other titles) were sold on Steam. Given that the number of titles showing up on the service has been increasing quite a bit, however, it's safe to say Steam will be with us for a while. This is perhaps something of a no-brainer, but the benefits Valve gets by selling a game via Steam rather than at retail are quite tangible. Valve did confirm that any purchases made through Steam are almost pure profit for them. Before you get upset with that revelation, though, consider: whom would you rather support, the game developers or the retail channel, publishers, and distributors? There are plenty of people that will willingly pay more money for a title if they know the money will all get to the artists behind the work. Valve also indicated that a lot of the "indie" projects that have made their way over to Steam have become extremely successful compared to their previously lackluster retail sales -- and the creators see a much greater percentage of the gross sales vs. typical distribution (even after Valve gets its cut). So not only does Steam do away with scratched CDs, lost keys, and disc swapping, but it has also started to become a haven for niche products that might not otherwise be able to reach their target audience. Some people hate Steam, and it certainly isn't perfect, but many of us have been very pleased with the service and look forward to continuing to use it.
Some of us (author included) have been a bit pessimistic about the ability of game developers to properly support dual core, let alone multi-core, processors. We would all like it to be easy to take advantage of additional computational power, but multithreading certainly isn't easy. After the first few "SMP optimized" games came and went, those concerns seemed to be validated: Quake 4 and Call of Duty 2 both showed performance improvements courtesy of dual core processors, but for the most part the gains only came at lower resolutions. After Valve's Hardware Day presentations, we have renewed hope that games will actually become more interesting again, beyond simply improving visuals. The potential certainly appears to be there, so now all we need to see is some real game titles that actually make use of it. Hopefully, 2007 will be the year that dual core and multi-core gaming really makes a splash. Looking at the benchmark results, are four cores twice as exciting as two? Perhaps not yet for a lot of people, but in the future they very well could be! The performance on offer definitely holds the potential to open a lot of new doors.
As a closing comment, there was a lot of material presented, and in the interest of time and space we have not covered every single topic that was discussed. We did try to hit all the major highlights, and if you have any further questions, please join the comments section and we will do our best to respond.
55 Comments
JarredWalton - Tuesday, November 7, 2006 - link
What's with the octal posting? Too many CPU cores running? ;) I deleted the other 7 identical posts for you. Careful with that Post Comment button!
saratoga - Tuesday, November 7, 2006 - link
Server kept timing out when I hit post, so I assumed it wasn't committing :)
exdeath - Tuesday, November 7, 2006 - link
You can see my recent comments on this topic here: http://www.dailytech.com/Article.aspx?newsid=4847&...
In my experience, relying on atomic CPU swap operations isn't enough, as they only work on a single value (a 32-bit word, for example).
While you lock and swap a 32-bit Y value, another thread may have just finished reading the newly written X value but beaten you to the lock, reading the old Y value before you've updated it. Clearly whole data structures need to be coherent, not just small atomic values.
Also, it's unusual to modify objects' observable states mid-frame. Even if you avoided the above example so that the X,Y pair was always updated together, you'd still have different objects seeing that object's position in different places at different times. State data must be held constant for all observers throughout the context of a single frame.
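To make the point concrete, here is a small illustrative sketch (my own, not from the article or the comment above): making each field individually atomic still lets a reader see a mismatched pair, whereas guarding the whole structure keeps every observer consistent.

    // Two independently atomic fields vs. a structure protected as a whole.
    #include <atomic>
    #include <mutex>

    struct TornPosition {          // each write is atomic on its own,
        std::atomic<float> x{0};   // but the PAIR is never guaranteed coherent:
        std::atomic<float> y{0};   // a reader can observe new x with old y
    };

    struct CoherentPosition {      // whole-structure coherence via one lock
        std::mutex m;
        float x = 0, y = 0;
        void Set(float nx, float ny) {
            std::lock_guard<std::mutex> g(m);
            x = nx; y = ny;        // both fields change under the same lock
        }
        void Get(float& ox, float& oy) {
            std::lock_guard<std::mutex> g(m);
            ox = x; oy = y;        // readers always get a matched pair
        }
    };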
exdeath - Tuesday, November 7, 2006 - link
Even if you avoided the above example so that the X,Y pair was always updated together, you'd still have different objects seeing that object's position in different places at different times in the same frame.
JarredWalton - Tuesday, November 7, 2006 - link
I'm assuming your comment is in regards to the PS3/Cell comments on the last page? It sort of sounds like you're arguing about the way Valve has chosen to go about doing things, or that you disagree with some of the opinions they've expressed concerning other hardware. We have only tried to provide a very high-level overview of what Valve is doing, and we hardly touched the low-level details -- Valve didn't spend a lot of time on specific implementation issues either. All they did was provide us with some information about what they are doing, and a bit of opinion on what they think of the rest of the hardware options.
Preventing anything else from doing write operations to the world state during an entire frame in order to keep things coherent is a big problem with multithreading. Apparently Valve has found a way around that, or at least found a way to do it more efficiently, using lock-free and wait-free algorithms. No, I can't honestly say I really understand what those algorithms do, but if they say it worked better for their code base I'm willing to trust them.
As far as the PS3/Cell processor goes, Valve did say that they have various thoughts on how to properly utilize the architecture. It is simply going to be more difficult to do relative to Xbox 360 and PC. It's not impossible, and companies are definitely going to tackle this problem. As far as how they tackle it, I'm more than a bit rusty on my coding background, and other than high-level details I'm not too concerned how they improve their multithreading code on any specific platform, just that they do it.
exdeath - Tuesday, November 7, 2006 - link
The other issue is OS support.
Compiler add-ons or third party APIs can only serve to hide the details or make things look cleaner. But no matter what, the final barrier between the application and the OS is the set of API calls provided by the OS threading model. Thus no third party implementation can be better than the OS thread model itself in terms of performance and overhead; all those layers can do is make it easier to use at the top by handling the OS details.
I imagine threading APIs on popular OSes will start to evolve, just like graphics APIs have, once everyone gets on the multi-core bandwagon and starts to get a feel for what's available in the OS APIs and what they'd rather have. So far, Vista's thread pool API looks good, but I still don't see an API to determine such basic things as checking if the work queue is empty and all threads are idle, etc.
Currently I find it's easier to implement my own thread pool manager, which does atomic increments and decrements on a 'task count' variable as tasks are entered into or completed from the queue. Checking if all tasks are done involves testing that task count against 0 and signaling an event flag that wakes any management threads sleeping until all of their work tasks complete. It also allows for more flexibility in 'before and after' housekeeping as work threads move from task to task, and that kind of control isn't offered in XP's built-in thread pool API, nor Vista's as far as I can tell.
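As a rough illustration of that counter-plus-signal scheme (not exdeath's actual code; the names are made up and modern C++ primitives stand in for Win32 events):

    // Atomic outstanding-task counter plus a "queue drained" wakeup.
    #include <atomic>
    #include <condition_variable>
    #include <mutex>

    class TaskTracker {
    public:
        void TaskQueued()   { outstanding_.fetch_add(1, std::memory_order_relaxed); }
        void TaskFinished() {
            // the thread that drops the count to zero wakes the manager
            if (outstanding_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
                std::lock_guard<std::mutex> lock(m_);
                done_.notify_all();
            }
        }
        void WaitForAll() {                  // manager sleeps until count hits 0
            std::unique_lock<std::mutex> lock(m_);
            done_.wait(lock, [this] { return outstanding_.load() == 0; });
        }
    private:
        std::atomic<int> outstanding_{0};
        std::mutex m_;
        std::condition_variable done_;
    };

The before/after housekeeping exdeath mentions would simply go around the point where a worker calls TaskFinished().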
exdeath - Tuesday, November 7, 2006 - link
Not arguing their methods; a lot of things in this article are in line with my own opinions on multithreading, pretty much the best way to go about it. I'm just pointing out that atomic lock/swap operations in hardware are very primitive and typically operate only on CPU word size values, not entire data structures. Thus it's possible that between two atomic operations on two variables on one core, another core can get an old version of one variable and a new version of the other.
core1: compute X
core2: ...
core1: lock/write x
core2: read x, get newly written version
core1: compute Y
core2: read Y, get old y before the update
core1: lock/write Y
core2: ...
The task on core2 is working with inconsistent data: the new X and the old Y. If the task on core2 only uses the data as input (e.g. AI tracking another AI entity), it has the wrong position and won't know about it, since it has no need to perform its own lock/write (so it never gets the exception that says the value changed). Even if it did, it would have to throw out all of its work and redo it with the new Y, which could then change yet again.
Looping and retrying seems wasteful. And I'm thinking the only way to catch such a failed lock/write update in hardware is via exceptions, and handling a thrown exception on an attempt to write a single 32-bit value is very wasteful of CPU cycles.
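For what it's worth, one common way to keep a small pair like X,Y consistent without a lock is to pack both values into a single machine word and update it with a compare-and-swap retry loop -- exactly the kind of looping being called wasteful above. This is my own illustration (assuming a lock-free 64-bit atomic), not something from the article:

    // Pack x,y into one 64-bit word so a single CAS updates both together.
    #include <atomic>
    #include <cstdint>
    #include <cstring>

    struct Pos { float x, y; };                    // 8 bytes: fits one atomic word

    std::atomic<uint64_t> g_pos{0};                // assumed lock-free on this target

    uint64_t Pack(Pos p)       { uint64_t v; std::memcpy(&v, &p, sizeof v); return v; }
    Pos      Unpack(uint64_t v){ Pos p; std::memcpy(&p, &v, sizeof p); return p; }

    void MoveBy(float dx, float dy) {
        uint64_t expected = g_pos.load(std::memory_order_acquire);
        for (;;) {                                 // retry if another core won the race
            Pos p = Unpack(expected);
            p.x += dx; p.y += dy;
            if (g_pos.compare_exchange_weak(expected, Pack(p),
                                            std::memory_order_acq_rel))
                break;                             // on failure, expected was reloaded
        }
    }

    Pos ReadConsistent() {                         // readers always see a matched pair
        return Unpack(g_pos.load(std::memory_order_acquire));
    }

This only scales to data that fits a hardware CAS width, which is exdeath's point: whole game objects need a different strategy, such as the double buffering described next.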
In my own research I have had excellent results with double buffering any modified data. Each threaded task only updates its hidden internal working state for frame n+1 while all reads to the object are read from its external current state for frame n. At the end of the frame when all parallel tasks have completed, the current/working states are swapped, and the work queue is filled again to start the next frame.
This ensures that throughout the entire computation of frame n+1, the current frame n state is available to all threads and is guaranteed not to be modified for the duration of the current frame. So basically all threads can read anything they want and modify only their own data. On PC/360 the time to swap everything is basically nothing; you just swap a few pointers, or a single pointer to an array/structure of current/working data for the frame.
On the PS3 some data copying and moving will be required, but this is mandatory due to design anyway and assisted by an extremely smart and powerful DMAC.
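A bare-bones sketch of that double-buffering arrangement (illustrative only; names are invented and the actual code was not shown) might look like this:

    // Double-buffered world state: read frame n, write frame n+1, swap at the end.
    #include <cstddef>
    #include <utility>
    #include <vector>

    struct EntityState { float x = 0, y = 0; /* position, velocity, etc. */ };

    class World {
    public:
        explicit World(std::size_t entities)
            : bufferA_(entities), bufferB_(entities),
              current_(&bufferA_), working_(&bufferB_) {}
        // All threads may read frame n freely: it is never written during the frame.
        const std::vector<EntityState>& Current() const { return *current_; }
        // Each task writes only its own entries in the hidden frame n+1 buffer.
        std::vector<EntityState>& Working() { return *working_; }
        // Called once, after every parallel task for the frame has completed.
        void EndFrame() { std::swap(current_, working_); }
    private:
        std::vector<EntityState> bufferA_, bufferB_;
        std::vector<EntityState>* current_;
        std::vector<EntityState>* working_;
    };

The pointer swap in EndFrame() is the "basically nothing" cost mentioned above; on Cell the same scheme would involve DMA copies into each SPE's local store instead.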
One place to be careful is message passing between objects, since it requires posting (writing) data to be picked up by another object. But the time to lock/post/unlock a queue is negligible compared to the time it takes to produce the results leading up to the creation of the message. This is similar to the D3D practice of doing as much work as you can before you lock, doing only the minimal work needed inside the lock, and unlocking as quickly as possible.
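As an illustration of that lock/post/unlock pattern (hypothetical types; ExpensiveSerialize is a made-up stand-in for the work done before posting):

    // A message queue that holds its lock only for the push or pop itself.
    #include <mutex>
    #include <queue>
    #include <string>

    struct Message { int targetId; std::string payload; };

    class MessageQueue {
    public:
        void Post(Message m) {                       // called by producer tasks
            std::lock_guard<std::mutex> lock(m_);    // lock held only for the push
            queue_.push(std::move(m));
        }
        bool TryPop(Message& out) {                  // drained by the consumer
            std::lock_guard<std::mutex> lock(m_);
            if (queue_.empty()) return false;
            out = std::move(queue_.front());
            queue_.pop();
            return true;
        }
    private:
        std::mutex m_;
        std::queue<Message> queue_;
    };

    // Usage: build the message from already-computed results, then post it.
    //   Message msg{42, ExpensiveSerialize(result)};  // hypothetical, outside any lock
    //   queue.Post(std::move(msg));                   // minimal time inside the lock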
GhandiInstinct - Tuesday, November 7, 2006 - link
Jarred Walton,
My question: Will Valve's games in 2007 be released with specifications such as "for minimum requirements you need a dual-core CPU, for maximum results you need a quad-core," or anything of that nature? I seem to be confused about what Valve is working on -- dual or quad or both or neither or something different -- and what I should get to best utilize their games and multi-core software in general.
Thanks.
JarredWalton - Tuesday, November 7, 2006 - link
Episode Two should come out sometime in 2007, and before that happens you will get the multithreading patch affecting previous Source engine titles. Right now, it doesn't sound like anything released in the next year or so from Valve is going to require dual cores. That's what I was trying to get at on the conclusion page where I mentioned that they are targeting an "equivalent experience" regardless of what sort of processor you are running.
So just like you could turn down the level of detail in Half-Life 2 and run it on DX8 or even DX7 hardware, the Source engine should be able to accommodate single core processors all the way up through N-core processors. The engine will spawn as many threads as you have processor cores, with one main thread serving as the controller and N-1 helper threads. Xbox 360, for example, would have 5 helper threads plus the master thread, because it has three cores, each capable of executing two threads simultaneously.
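To make that layout concrete, here is a rough sketch (my own illustration; Valve's engine code was not shown) of spawning one helper thread per hardware thread beyond the main/controller thread:

    // One controller thread plus N-1 helpers, where N is the hardware thread count.
    #include <atomic>
    #include <thread>
    #include <vector>

    std::atomic<int> remainingTasks{256};          // stand-in for one frame's work

    void HelperLoop() {                            // helpers drain the task pool
        while (remainingTasks.fetch_sub(1) > 0) {
            // ... execute one work item (particle batch, AI update, etc.) ...
        }
    }

    int main() {
        unsigned n = std::thread::hardware_concurrency();
        if (n == 0) n = 1;                         // single core: spawn no helpers
        std::vector<std::thread> helpers;
        for (unsigned i = 1; i < n; ++i)           // N-1 helper threads
            helpers.emplace_back(HelperLoop);
        HelperLoop();                              // main/controller thread also works
        for (auto& t : helpers) t.join();
    }

On an Xbox 360 this would report six hardware threads, which matches the five helpers plus master thread described above.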
Patrese - Tuesday, November 7, 2006 - link
Great article, good to see dual-quad cores being used for something in games. By the way, the kitchen examples made me hungry... :)