Promit's Ventspace

July 22, 2010

PhysX and Hardware Acceleration

Filed under: Software Engineering — Promit @ 2:49 pm

A more accurate title would be PhysX and its lack of hardware acceleration. In retrospect this is largely my fault for believing marketing hype without looking into details, but I figured that I’d discuss it. Finding resources on the subject was difficult, so I think a lot of people may be suffering from the same delusion as I was.

Super short version: PhysX computes its rigid body simulation strictly in software. The GPU never gets involved.

Let’s step back for a second and look at the history. It starts with a library called NovodeX, which was a popular physics package around the same time Half-Life 2 launched with Havok. A semiconductor company called Ageia acquired the company behind it to build Physics Processing Unit (PPU) based cards, an idea that got a lot of attention at the time. The hardware was overpriced, mismanaged, and probably doomed in the end regardless. The effort started sinking pretty quickly until NVIDIA swept in and bought the whole thing. NVIDIA then announced that the whole PPU project would be scrapped and that hardware acceleration of physics would be done on the GPU.

The current state of PhysX is somewhat confusing. There are plenty of PhysX-accelerated games. Benchmarks of these games show dramatic performance gains when running with hardware-supported physics. There have been recent allegations that PhysX’s CPU performance is unreasonably poor, possibly to strengthen the case for hardware acceleration. I don’t have any comment on that mess, but it’s part of the evidence suggesting that hardware is the real focus of PhysX and NVIDIA. That’s not a surprise.

What is a surprise is that the rigid body simulation, the important part of the physics, is not hardware accelerated. Apparently it was when the PPU rolled out, but the GPU-based acceleration does not support it. Look at any PhysX-based game that advertises hardware acceleration and you’ll notice gobs of destruction, cloth, fluids, etc. That’s because those are the only GPU-accelerated effects PhysX supports (plus a few miscellaneous things like soft body). Probably the big tip-off is that none of these effects require forces to be imparted back to the main physics scene in any way. This is strictly eye candy.

So the interactive rigid body simulation, the part that actually affects gameplay, is completely in software. And if you believe the claims, it’s not even done well in software. All these problems will apparently be fixed in a magical 3.0 release, coming at some vague point in the future. Why? My best guess is that no one has paid any attention to the core PC code in six years. I’d wager that everyone’s been so obsessed with hardware acceleration, and that the basic problem of writing a rigid body solver is so stupidly easy, that we’re simply coasting on the same 2004 NovodeX-era code that made the library popular in the first place. Version 3.0 is probably a ground-up rewrite.

Don’t get me wrong. PhysX is not bad. It is simply stagnant. Take a recent game and strip away the GPU-driven eye candy. What exactly is left in the interactive part of the simulation that wasn’t already in Half-Life 2? That was also a 2004 release. NVIDIA did what they do best: visual effects. Marketing also did what they do best: letting everybody assume something untrue without ever actually saying it.

Anyway, now you know where hype ends and fact begins. Rigid body game physics is not hardware accelerated; only special effects that fall pretty loosely into the category of “physics” are. Maybe that’s common knowledge, but it was news to me.

Update: This presentation about Bullet’s GPU acceleration is a good read.
Update 2: I’m wrong on one technical point — PhysX’s hardware accelerated systems can impart forces back to the main scene, and their cloth shows off this capability in the samples.


4 Comments »

  1. I’m probably wrong, but I had assumed that the broadphase for rigid bodies was done on the GPU. Then, the collision response from those collision pairs was done on the CPU.

    At least, that’s the approach Bullet takes, and I thought PhysX would have a similar implementation.

    Comment by Patrick — July 22, 2010 @ 5:13 pm | Reply

    • It’s possible, but I’ve seen no information or data to bear that out. I just looked up the Bullet GPU implementation presentation though, and it looks like some really cool stuff.

      Comment by Promit — July 22, 2010 @ 5:16 pm | Reply

  2. “Probably the big tip off is that none of these effects require forces to be imparted back to the main physics scene in any way. This is strictly eye candy.”

    This is not strictly true: fluids and cloth can impart forces on rigid bodies, for example a wheel turned by a stream of fluid or a heap of cloth on a balance.

    Also, having worked on PhysX in the past, I can say that the core (software) code has been improved in that time; however, the API has remained static (only additions) until a major release.

    The software rigid body code has improved a bit, in particular things like zero-overhead sleeping, threading within a scene, dominance groups, etc. But rigid body is a more mature technology anyway (in the realm of physics SDKs).

    The RB changes don’t receive as much attention as the more easily parallelizable stuff, but it has improved (at least until NVIDIA, and even then we have a new Visual Debugger).

    Comment by therealdblack — July 23, 2010 @ 1:45 am | Reply

  3. “I’m probably wrong, but I had assumed that the broadphase for rigid bodies was done on the GPU. Then, the collision response from those collision pairs was done on the CPU.”

    In the PPU days (and probably still, if NVIDIA have RB internally) the broad phase could optionally be done on the PPU, with a CPU work phase between it and the dynamics solver. This was to allow more complex operations (e.g. non-D6 joint setup) to occur on the CPU.

    However, it wasn’t necessarily a huge advantage to run the BP on the PPU. Due to the algorithms, the PPU version was less prone to performance spikes but very constrained in the number of objects (4k).

    The BP is limited by memory bandwidth, not compute, so parallelization may be less advantageous…

    I wrote a BP algorithm for the CPU (designed to be easily ported to PPU/GPU) before leaving, which had good (and consistent) performance and did not have (inherent) limits on object count (configured for 64k). I’m not sure if it was included, or if it was included as part of a partitioning BP later.

    Comment by therealdblack — July 23, 2010 @ 2:00 am | Reply
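
For anyone who hasn’t run into the term used in the comments above: the broad phase (BP) is the step that cheaply finds pairs of objects whose bounding boxes overlap, so that the expensive narrow phase and solver only look at plausible collisions. Below is a minimal CPU sketch of one common approach, sort-and-sweep along a single axis. This is purely illustrative and is not PhysX’s or Bullet’s actual algorithm; the names `AABB`, `overlaps` and `broad_phase` are made up for this example.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch:
# (min_x, min_y, min_z, max_x, max_y, max_z)
AABB = Tuple[float, float, float, float, float, float]


def overlaps(a: AABB, b: AABB) -> bool:
    """True if two axis-aligned boxes overlap on all three axes."""
    return all(a[i] <= b[i + 3] and b[i] <= a[i + 3] for i in range(3))


def broad_phase(boxes: List[AABB]) -> List[Tuple[int, int]]:
    """Return sorted index pairs of boxes whose AABBs overlap."""
    # Visit boxes in order of their minimum x coordinate.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    active: List[int] = []          # boxes whose x interval may still overlap
    pairs: List[Tuple[int, int]] = []
    for i in order:
        min_x = boxes[i][0]
        # Drop boxes that end on x before the current box begins.
        active = [j for j in active if boxes[j][3] >= min_x]
        # Everything still active overlaps on x; check the full box.
        for j in active:
            if overlaps(boxes[i], boxes[j]):
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return sorted(pairs)
```

Note that the inner loop mostly walks over box data and compares a handful of floats, which fits the point made above that this stage tends to be limited by memory bandwidth rather than compute.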

