These last couple of days I’ve decided to dive into what might be the most important technique in games of the future: computing on the GPU.
Now, like you, I like particles. I remember doing a simple 2D particle system back in the Amiga days. It was basically a star field: not really 3D, it just looked it. (I always preferred faking.) I couldn’t draw a lot of particles (stars) before it started chugging. Then years later, in the mid-1990s in fact, I worked on a particle system again on the PC. This came on the back of numerous PC-related tasks, one of which was to convert an undocumented, unportable video decoder written in assembler that only worked on a 320×200 screen to one that ran in C on a 640×400 screen. In a weekend. Which I did. You should have seen the look on the assembler programmer’s face when I turned up on Monday (I was the Producer) and demonstrated that. He’d previously said it couldn’t be done. (Seeing a pattern here?)
So anyway, I worked on a particle system on a PC that chugged at a thousand particles, but was OK with 400 or so at a decent clip. I was not happy.
And so today a few thousand particles on a CPU on top of everything else is good, but it’s not good enough for me. I’m thinking of hundreds of thousands of particles. The only practical way I can see to achieve that is to do them on the GPU and exploit its enormous parallel compute power. Particles lend themselves to parallelisation, of course: in a simple system, each one can be updated independently of all the others.
So I’ve been looking, learning, rewriting, testing, writing shaders, understanding shaders, and I’m getting somewhere.
One way to compute on the GPU is a technique called FBO ping-ponging (an FBO is a framebuffer object, which lets you render into a texture). It sounds tough. It’s not all that tough. The reason you do this is that you cannot safely read from and write to the same texture in the same pass. Why texture maps? We’ll come to that, but suffice to say they’re no longer acting as texture maps.
You use a texture map to store other data, in our case particle data: velocity, position, time to live and so on. Each pass, a shader reads this data from one texture, performs your computations and writes the modified data out to a second texture, which is only written to during that pass. You then flip the two FBOs, so that the texture you just wrote becomes the one you read next, and carry on. With my test program I am initialising the data from the CPU, but there is no reason why you can’t initialise it on the GPU, and that’s what I’ll be doing at some point. Ping-ponging textures like this is how you maintain state from frame to frame in what would otherwise be a stateless shader pass. Exploiting the heavily parallel nature of modern GPUs is what gives these techniques such extraordinary performance over doing them on the CPU.
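To make the read, compute, write, flip cycle concrete, here is a minimal CPU-side sketch in Python. It is only a stand-in for the real thing: two plain lists play the part of the two textures, and the `update` function plays the part of the shader. None of it is actual OpenGL or my test program; the names `Particle`, `update`, `step` and `simulate`, and the simple gravity rule, are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Particle:
    """One 'texel' of particle state (hypothetical layout)."""
    x: float
    y: float
    vx: float
    vy: float
    ttl: float  # time to live, in seconds


def update(p, dt, gravity=-9.8):
    """Stand-in for the shader: reads one particle, returns its new state.

    It only looks at its own input, which is what makes the work parallel.
    """
    return Particle(
        x=p.x + p.vx * dt,
        y=p.y + p.vy * dt,
        vx=p.vx,
        vy=p.vy + gravity * dt,
        ttl=p.ttl - dt,
    )


def step(read_buf, write_buf, dt):
    """One pass: read only from read_buf, write only to write_buf."""
    for i, p in enumerate(read_buf):
        write_buf[i] = update(p, dt)


def simulate(initial, steps, dt):
    """Run several passes, flipping the two buffers after each one."""
    read_buf = list(initial)             # "initialised from the CPU"
    write_buf = [None] * len(read_buf)   # same size, contents irrelevant
    for _ in range(steps):
        step(read_buf, write_buf, dt)
        # The flip: what was just written becomes next pass's input.
        read_buf, write_buf = write_buf, read_buf
    return read_buf
```

Calling `simulate([Particle(0.0, 0.0, 1.0, 0.0, 1.0)], steps=2, dt=0.5)` runs two passes over a single particle. On a GPU the loop inside `step` disappears, because the hardware runs the update for every texel at once, but the two-buffer flip is the same idea.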
I have a lot to learn: ray-marching, distance fields, ray-tracing, real-time ambient occlusion and other forms of optimised global illumination. It sounds heavy, but it’s just process and, like anything else, if someone else has done it, you probably can too.
In the meantime, I am going to spend some more time studying ShaderToy and seeing just how they achieve such incredible effects. I’m planning on having Chimera looking amazing. At the beginning of the week, it was a crazy, foolish dream. Tonight, it is within sight.