A tale of how we optimized the whole game logic by a factor of 6.


Before we start: we have a Discord now, join us!


For the last few weeks, I have been hard at work completely rewriting the game internals in preparation for a secret feature (which will be revealed in the next devlog, stay tuned). I have tried hard not to change any gameplay logic during the rewrite. The goal has been to move all gameplay data into flat arrays, so that the entire gameplay state is in the same place, both conceptually and physically in memory.

Unity status quo

To understand what actually changed, I have to talk about Unity and how it wants you to structure your code. "Wants" is perhaps too strong a word here: nothing actually prevents you from doing things differently, but most tutorials and the few games I have seen all do it this way. Everything you see in a typical Unity game is made up of GameObjects, which are arranged in parent-child relationships (a tree). What each GameObject does is specified by its components. The lowest children (those that have no children themselves) usually contain 3D models that are drawn on screen (for example, the individual stones around a campfire model) or particle effect emitters (the fire effect), while their parents don't draw anything, but represent the object (the campfire) as a whole. The parent-child relationship also makes sure that when the parent moves, the children move with it. Then there is the object's behavior: for example, what happens when the player clicks the object (a log is added to the campfire) and what happens over time (the amount of fuel decreases). Each GameObject may provide some custom independent behavior, and the sum of these behaviors forms a game.

Omnibullet before

Omnibullet was no different. In the initial prototype, we relied on Unity's physics system to drive collisions, and each tower, bullet and battery was independent, just like Unity wanted. But as development continued, we realized that the game would be better if we aligned everything to the beat. Bullets travel exactly 2 tiles each beat, batteries travel 1 tile per beat every other beat, things like that. That would have been very difficult to achieve with the original physics system, because physics is not fully deterministic, aligned, or clean. Not to mention that using the physics system for this would be overkill. Thus TileMap was born. This was our own class which held references to all relevant game objects (bullets, towers, batteries and some tiles). Most of the logic was still in the game objects, but everything to do with movement and collisions was in the TileMap. The system was still trying to be generic: TileMap did not know what a bullet or a tower was, it only saw Dynamic and Static TileMapEntities. The demo is still using this system.
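To illustrate what "aligned to the beat" means, here is a minimal sketch of beat-driven movement on integer tile coordinates. It is written in Python for brevity rather than in Unity C#, and it is not the game's actual code: the Mover class, its fields, and the choice that batteries move on even-numbered beats are all assumptions made for the example. The point is only that integer steps on a fixed schedule give the same result every time, which a general physics simulation does not guarantee.

```python
from dataclasses import dataclass


@dataclass
class Mover:
    """Hypothetical beat-aligned entity living on integer tile coordinates."""
    x: int
    y: int
    dx: int              # direction along x: -1, 0 or 1
    dy: int              # direction along y: -1, 0 or 1
    tiles_per_move: int  # how many tiles it covers when it moves
    beat_interval: int   # it moves on every N-th beat


def step_beat(movers, beat):
    """Advance every mover that is scheduled to move on this beat."""
    for m in movers:
        # Assumption: an entity with interval N moves on beats 0, N, 2N, ...
        if beat % m.beat_interval == 0:
            m.x += m.dx * m.tiles_per_move
            m.y += m.dy * m.tiles_per_move


# A bullet moves 2 tiles every beat; a battery moves 1 tile every other beat.
bullet = Mover(x=0, y=0, dx=1, dy=0, tiles_per_move=2, beat_interval=1)
battery = Mover(x=0, y=5, dx=0, dy=-1, tiles_per_move=1, beat_interval=2)

for beat in range(4):
    step_beat([bullet, battery], beat)
```

After four beats the bullet has covered 8 tiles and the battery 2, with no floating-point drift to worry about.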

Omnibullet after

For the secret feature, we really needed to have the whole game state in one place, as a single source of truth. This was not easily possible with the generic TileMap, and the game state was strewn all across the behaviors of different game objects. Enter data-oriented design. We have moved everything related to game logic into its own structs, stored in arrays. There is a single array for bullets, a single array for batteries, and one array for each tower type and each tile type. Each struct still holds a reference to the behavior that is responsible for drawing the object, and that behavior takes care of visual effects, but all gameplay logic is done on the structs, and the positions and visuals of the objects are updated at the end of the frame. This is more efficient because we update only what needs updating at a given time, because we can easily precompute some things that were previously computed again for each tower/bullet/battery/tile, and because computers really like working with tight arrays. We no longer need the special collision-handling data structures that were present in TileMap and were error-prone to keep updated, because the main storage already has methods for quick spatial lookups.
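A rough sketch of this layout, again in Python rather than Unity C# and not taken from the game's code, might look like the following. Every name here (GameState, tower_at, view_id, update_bullets) is hypothetical, the flat grid index is just one possible way to get quick spatial lookups, and bullets move one tile per step to keep the example short. Gameplay logic touches only the plain data; the function returns the handles of visuals that need refreshing, so the drawing side can catch up once at the end of the frame.

```python
from dataclasses import dataclass, field


@dataclass
class Bullet:
    x: int
    y: int
    dx: int
    dy: int
    view_id: int       # hypothetical handle to whatever draws this bullet
    alive: bool = True


@dataclass
class Tower:
    x: int
    y: int
    hp: int
    view_id: int       # hypothetical handle to whatever draws this tower


@dataclass
class GameState:
    """All gameplay data in one place: one flat list per entity type."""
    width: int
    height: int
    bullets: list = field(default_factory=list)
    towers: list = field(default_factory=list)
    # Assumed spatial index: tile -> index into `towers`, or -1 if empty.
    tower_at: list = field(default_factory=list)

    def __post_init__(self):
        self.tower_at = [-1] * (self.width * self.height)

    def add_tower(self, tower):
        self.towers.append(tower)
        self.tower_at[tower.y * self.width + tower.x] = len(self.towers) - 1


def update_bullets(state):
    """Move each live bullet one tile and resolve tower hits.

    Returns the view handles whose visuals should be refreshed afterwards.
    """
    dirty = []
    for b in state.bullets:
        if not b.alive:
            continue
        b.x += b.dx
        b.y += b.dy
        if not (0 <= b.x < state.width and 0 <= b.y < state.height):
            b.alive = False  # left the map
        else:
            hit = state.tower_at[b.y * state.width + b.x]
            if hit != -1:
                state.towers[hit].hp -= 1
                dirty.append(state.towers[hit].view_id)
                b.alive = False
        dirty.append(b.view_id)
    return dirty


state = GameState(width=8, height=4)
state.add_tower(Tower(x=3, y=1, hp=5, view_id=100))
state.bullets.append(Bullet(x=1, y=1, dx=1, dy=0, view_id=200))

first = update_bullets(state)   # bullet moves to (2, 1), nothing hit yet
second = update_bullets(state)  # bullet reaches the tower at (3, 1)
```

Because collisions are a single list lookup on the same storage that holds the entities, there is no separate structure to keep in sync.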

Performance before and after (blue is what is being optimized, green is rendering)

The screenshot from the profiler shows that the peak time needed for an update (the game logic update, not counting Unity internals, which are about the same) went down from about 40 ms to 6 ms, mostly due to much lower peaks of activity at beat boundaries, when collisions are most frequent. Note that this is on a benchmark level (and in the Unity editor), which is much larger and busier than any other level in the game, so it does not really represent usual gameplay, but it is good for stress testing.

The benchmark level

The majority of the time is now spent in rendering, and the game logic no longer suffers from massive peaks on each beat. This gives us milliseconds in which we can fit other features. The performance is still not perfect, but most of it is lost in Unity's internals, which we have little control over (at least for this game). However, should we need it in the future, it should now be fairly easy to migrate the current system to Unity's Burst compiler, which could give us some more performance.

Thanks for reading!