Data compression has had a renaissance in the last two decades. Raw speed gains in newer processor generations have slowed down in favor of putting more cores onto a chip, so bridging the processor / memory performance gap has become a very important strategy for scaling up. At the same time, innovations such as Asymmetric Numeral Systems and cache-friendly bit stream formats have pushed the envelope of what compression can achieve in a given time budget. Of course, demands have also grown. For instance, the currently ongoing change to make 4K the standard video resolution, replacing 1080p, means a fourfold increase in pixels on screen.
But what about online games? Their pixel counts and on-disk sizes are rising just as fast as those of other games. Also, an unprecedented number of games now have online features. Besides getting the pixels on the screen and the HD sound out of the speakers, online games also need to get more and more data to and from their server(s) and their peers. The available bandwidth is ever increasing as technological and infrastructure development continues, but the rate at which this happens is never fast enough and varies greatly by locality. While a data center can easily double its bandwidth every year, rural and residential urban areas are not so lucky. The situation becomes even more dire in emerging markets that may still have very poor Internet infrastructure, yet a quickly growing demand for digital entertainment. An individual may be able to afford a decent handset capable of running the latest titles, but they will never be able to set up a new cell tower or lay an undersea cable for a good connection.
And so, it is up to us, the game developers, to make sure we are cunning and full of tricks in order to deliver the best possible experience to our customers. The following text shows how to use data compression to do that in two major use cases, and one bonus use case.
Providing file downloads does not seem to be very special at first glance: put the files on a web server, scale up with a CDN or edge services, done! But all those things have costs, and those costs scale up as well. These costs are usually associated with the amount of bandwidth used, sometimes the amount of storage space used, and rarely the number of CPU cycles used. All three can be reduced by applying a stronger compression algorithm to the files.
Since the files are only compressed once at authoring time, but downloaded and decompressed many more times, the speed of compression is usually negligible. Instead we can focus on compression ratio and decompression speed. Curiously, these two values tend to correlate directly to a certain extent, meaning that a stronger compression ratio will also aid decompression speed, as there is less data to handle in total. This mitigates the cost of the greater complexity of stronger algorithms, but certainly doesn’t eliminate it. However, decompression speed is not the primary concern either, because it is usually a great deal higher than the network transfer speed can ever be. This leaves compression ratio as our prime focus, as any improvement in file size will translate directly into cost savings for the client and the server, as well as less time spent waiting for downloads to finish.
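To make that trade-off concrete, here is a small sketch using Python's standard zlib module (a Deflate implementation) as a stand-in for any general-purpose compressor; the payload is made up. Since compression happens only once at authoring time, paying for the highest level is almost always worth it.

```python
import time
import zlib

# Hypothetical asset payload: repetitive text standing in for game data.
data = b"position=12.5;rotation=0.0;mesh=rock_large;lod=2;" * 4000

for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    print(f"level {level}: ratio {ratio:.1f}x in {elapsed * 1000:.2f} ms")

# Decompression speed is largely independent of the level used to compress.
assert zlib.decompress(compressed) == data
```

Higher levels cost more compression time, which we do not care about here, while the download gets smaller for every single client.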
The obvious, but often neglected first step in reducing download sizes, is to weed out everything that doesn’t need to be downloaded in the first place. This requires careful tracking of all the game’s assets and their usages, which is a good idea for many reasons. In the context of downloads, it allows the culling of unused data before it even goes up onto a server. It may seem like an easy way to save space but tracking assets and their dependencies can be a daunting task and should be considered from day one. Once a project has grown to thousands of assets, it is often too late.
Secondly, many file formats already have built-in compression options. The most obvious candidates are image and video files, where lossy compression has been the standard for decades. Lossy compression means that the contents of a file will have deteriorated to a certain extent after going through a compression / decompression cycle. When implemented properly, this data loss does not cause perceptible differences to the original. Unlike analog copying of music and VHS cassettes, the loss happens only the first time a certain compression is applied; subsequent compression passes will not further degrade the data, unless either a more aggressive setting is chosen or a fundamental algorithm change happens. Still, it is good practice to keep the original files around and always compress from those.
As image file formats go, the most commonly used ones, PNG and JPEG, are already solid choices. Sometimes there are requirements that these formats cannot meet, like special GPU formats, but more on that later. Unfortunately, these two file formats are also quite complex, and it is important to avoid a few common pitfalls. For instance, PNG files created with Adobe Photoshop contain a lot of metadata stored as plain uncompressed text, which can take up more space than the image data itself. In order to get rid of this and other inefficiencies that authoring programs might incur, the use of a PNG optimizer like Pngcrush, PNGOUT, AdvanceCOMP, or others is highly recommended.
Although PNG is often referred to as a lossless format, it does indeed have a lossy mode that uses color quantization. Here the colors used are stored in a palette and then referenced with an index per pixel. The number of palette entries is limited, usually to 256, thus reducing the amount of information to store, but also the number of unique colors that will end up in the image. The tool (and library) pngquant makes use of these modes to produce even smaller images, but not all image handling libraries can deal with palettized PNGs.
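The idea behind palettization can be sketched in a few lines of Python; the pixel data here is purely hypothetical, and real PNG encoders of course do this far more cleverly (including choosing which colors to merge when there are more than 256).

```python
# Sketch of PNG-style color quantization: store each unique color once
# in a palette and reference it with a one-byte index per pixel.
pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (255, 0, 0)] * 64

palette = []        # at most 256 entries, like PNG's PLTE chunk
index_of = {}
indices = bytearray()
for color in pixels:
    if color not in index_of:
        index_of[color] = len(palette)
        palette.append(color)
    indices.append(index_of[color])

raw_size = len(pixels) * 3                          # 3 bytes per RGB pixel
palettized_size = len(palette) * 3 + len(indices)   # palette + 1 byte/pixel
assert len(palette) <= 256
assert palettized_size < raw_size
```

The fewer unique colors an image has, the bigger the win, which is why this works so well for flat-shaded UI art and so poorly for photographs.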
For JPEG, fewer options are available, as the compression to image fidelity trade-off can already be finely controlled with its quality parameter. However, this hasn’t stopped people from tinkering with it since its inception almost 30 years ago. A recent and quite successful effort is Guetzli, by Google’s subsidiary in Switzerland. Just drop these tools into the, hopefully fully automated, asset export pipeline and the images that come out at the other end will be smaller.
For image files not covered by these two image formats, Microsoft’s DDS or Khronos Group’s KTX file format may be a good choice, as is a self-made image format. In these cases, a general-purpose compression has to be applied to the pixel data. This is regardless of the chosen format and even when the pixels are in a GPU compressed format.
The pinnacle of image compression in games is Crunch, and its more powerful and flexible successor Basis, by Rich Geldreich and Stephanie Hurlburt. These toolkits combine a customized compression algorithm with a universal GPU-friendly raster format and can reach JPEG-like compression levels, producing data that can be loaded directly into textures with little processing required. Recently, thanks to a collaboration with Google, a “light” version of the toolkit has become available for everyone.
Other media formats, such as video and audio files, have lossy compression schemes as well, which should be used when available. However, in contrast to image files, their file formats seem to have naturally evolved towards efficient storage, so any reasonably modern format will do a good job. There are simply fewer things one can do with these kinds of media files. Often these formats are also tied to the authoring and playback software used.
This is where the availability of lossy formats dries up. With some exceptions, like animation curves, most other types of files have little to zero tolerance for information loss, making lossy compression infeasible. Luckily, a wide range of lossless general-purpose compression formats exists. The most commonly used one is the Deflate algorithm. Known under many names, it is used in ZIP files, but also in many other places. Deflate has served us very well, but it is also roughly 30 years old now, and computer science as well as technology have made a lot of progress during that time.
Probably the most widely known competitor to Deflate is the LZMA algorithm, made popular by the 7-zip archiving software. Related compression algorithms in this class are LZHAM and Bzip2. All of these are still widely used to provide downloads but are, like Deflate, no longer the best choices. While LZMA still sports a competitive compression ratio, its decompression speed is quite lackluster. So, let’s look at more modern alternatives.
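Both Deflate and LZMA happen to ship with Python's standard library, which makes it easy to sketch where LZMA's ratio advantage comes from: its much larger match window can exploit long-range redundancy that Deflate's 32 KiB window cannot see. The payload below is synthetic.

```python
import lzma
import random
import zlib

# A deterministic pseudo-random block, repeated once. The repeat lies
# ~100 KB back, far outside Deflate's 32 KiB window but well within
# LZMA's much larger one.
unit = random.Random(0).randbytes(100_000)
data = unit * 2

deflated = zlib.compress(data, 9)
lzma_out = lzma.compress(data)

assert zlib.decompress(deflated) == data
assert lzma.decompress(lzma_out) == data
assert len(lzma_out) < len(deflated)   # LZMA exploits the long-range repeat
```

Real asset bundles are full of exactly this kind of long-range redundancy, which is one reason the modern algorithms discussed next also use large windows.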
Arguably one of the first algorithms to make use of the techniques described in the ANS paper mentioned earlier is Zstd by Yann Collet. Collet was hired by Facebook some time ago, but the algorithm is still free and open source under a fairly standard license that, oddly enough, forbids the use of Facebook’s trademark, but not much else. As a compression algorithm it has been a great success: it beats Deflate in ratio, compression speed, and decompression speed. There is simply no reason not to use it.
Finally, at the top of the hill of general-purpose compression algorithms sits a commercial package called Oodle. Its high compression algorithms Kraken and Leviathan beat any other currently available compression algorithm in one or more categories.
Unlike the previous use case regarding downloads, optimizing live communication is trickier. For downloads, we can easily measure how much data needs to be downloaded per client, estimate how often it will be downloaded, and voilà, we have our basic cost structure. To get those costs down, we shrink the amount of data, or reduce how often it needs to be downloaded by using local caches. This is pretty much a silver bullet.
For live communication, meaning all kinds of bi-directional interaction over the network between clients, servers, and each other, we need to consider a few more variables. How long are user sessions? How often do data transfers need to happen? How much CPU capacity do I have to spare for data preparation and processing? How much bandwidth are my users willing to spare? Most of these questions should be answered at the various design stages of a project. For at least two of them data compression can have a meaningful impact: bandwidth and CPU resource use.
A further complication is that compression ratio is no longer our single most important factor. Both compression speed (which in the download case we could safely ignore) and decompression speed play big roles as well. Thus, it is extra important to consider all the available options carefully and pick the most suitable one.
Let’s start off with communication via HTTP, where our options are limited by the framework. Not only is the range of available compression methods limited to what the standard allows (which would be quite wide) but also to what is actually available and widely supported. Here, for a long time, gzip was the only game in town. Gzip is just another name for the Deflate algorithm mentioned earlier and is therefore clearly not the best choice anymore.
Luckily Google comes to the rescue with their Swiss made Brotli algorithm. While Brotli is very much in line with the advanced algorithms discussed in the earlier section, it does have two distinct advantages. Firstly, due to Google’s very large influence on the WWW’s infrastructure, Brotli has become a very widely supported algorithm in recent years. Secondly, Brotli is not only a cutting-edge algorithm, it also comes pre-loaded with a shared dictionary that was trained by using a vast amount of common WWW traffic. This gives Brotli an additional edge when applied to compress the kind of files that are commonly used in websites.
This means, when required to communicate via HTTP, turn compression on and request or provide Brotli compressed content. Conversely, when serving already compressed content, turn HTTP compression off, as it is only a waste of resources in this case. However, from a performance point of view, it is always better to choose simpler protocols than HTTP, possibly even via WebSockets. Although, before we get to that, a quick intermission to talk about dictionaries and compression.
One unavoidable detail when discussing data compression is that all compression algorithms build and use a dictionary. This may happen implicitly, explicitly, or in a mixed way, but it is always there. Simply speaking, such a dictionary contains the fragments of the original data that appear more than once and are replaced in the compressed data with dictionary references. If dictionary plus the references are smaller than the original data, data compression has been achieved.
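Python's zlib exposes this idea directly via its `zdict` parameter, which makes for a rough sketch of how a pre-shared dictionary helps; the dictionary and packet contents below are entirely made up.

```python
import zlib

# A hypothetical shared dictionary built from typical packet contents;
# both sides must agree on it out of band.
shared = b'{"player_id": 0, "position": {"x": 0.0, "y": 0.0, "z": 0.0}}'
packet = b'{"player_id": 17, "position": {"x": 1.5, "y": 0.0, "z": -3.25}}'

# Without a dictionary, a tiny packet has almost nothing to reference.
plain = zlib.compress(packet, 9)

# With the dictionary, most of the packet becomes back-references into it.
comp = zlib.compressobj(9, zdict=shared)
with_dict = comp.compress(packet) + comp.flush()

decomp = zlib.decompressobj(zdict=shared)
assert decomp.decompress(with_dict) == packet
assert len(with_dict) < len(plain)
```

The receiver must of course be configured with the exact same dictionary, which is why versioning shared dictionaries carefully matters in practice.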
Now, as alluded to at the introduction of Brotli, a related set of data will have related dictionaries. Therefore, it is possible to create a shared dictionary that contains data fragments common to the whole data set. Brotli already does this with a dictionary defined in its specification, but of course that is for a fixed data set defined by Google, which may not match our desired data set well.
It should not be surprising that Brotli and Zstd come with tools to build a shared dictionary for any desired data set. Using this can be very effective when dealing with packets with similar data. There is even a web standard evolving to support this, but it is unclear if it will ever get into the mainstream. However, if we are not limited by HTTP and in control of the presentation layer of the protocol, we can do much better.
Before we begin, it is important to consider what data needs to be transferred. All data that isn’t needed can be compressed to nothing by just not sending it, requiring no fancy algorithm at all. So, the first order of business is to make sure that no unneeded data is sent. Secondly, data should be organized into parts that change frequently and parts that don’t. For instance, configuration values only need to be sent once at the beginning of a session, and not with every single packet.
As discussed in the downloads section, this may seem obvious, but is difficult to do in practice. For instance, as soon as any kind of automated serialization is introduced, redundant data starts to creep in. One way to get out of this is to automatically track to what state has been sent already and only transfer the necessary changes to get the state of the receiver up to date. Doing so requires a persistent connection and this is an area where HTTP’s request/response model can quickly reach its limits.
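The idea of transferring only state changes can be sketched with a simple dictionary diff; the state layout and field names here are purely illustrative, and a real implementation would also need acknowledgements so the sender knows which state the receiver actually has.

```python
# Sketch of state-delta tracking: send only fields whose values changed
# since the last state known to the receiver.
def delta(last_acked: dict, current: dict) -> dict:
    return {k: v for k, v in current.items() if last_acked.get(k) != v}

last_acked = {"hp": 100, "x": 4.0, "y": 2.0, "weapon": "bow"}
current = {"hp": 95, "x": 4.0, "y": 2.5, "weapon": "bow"}

update = delta(last_acked, current)
assert update == {"hp": 95, "y": 2.5}   # only the changes go on the wire

# The receiver applies the delta to reconstruct the full state.
last_acked.update(update)
assert last_acked == current
```

Note that this only works over a reliable channel, which is exactly the requirement discussed next.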
So, we move on to the realm of persistent connections, i.e. TCP/IP. The exact protocol does not matter, as long as we have a persistent communication channel between two endpoints that is available for the duration of a session and does not lose or reorder packets. With this guarantee we can pull another trick out of the hat.
Compression tends to get better when bigger chunks of data are compressed, because the chance of finding repeated data fragments increases. For live communication, however, we want to send data packets that are as small as possible, so we lose a lot of compression opportunities. The way around this is to communicate using a compressed stream of data, which is achieved by not resetting the state of the compressor / decompressor pair after every packet. This way the algorithm can refer to data fragments sent in previous packets, approaching the effectiveness of compressing larger chunks of data.
All compression algorithms can do that in principle, but not all compression APIs support it. If it is in any way possible to use stream compression, it is advisable to make use of it. It is more effective and less cumbersome than the shared dictionary approach, because the shared dictionary fitting the actual data being transferred in a session is built implicitly over time.
Now, let’s look at some concrete compression algorithms. Brotli and Zstd support stream compression, so they are already good choices. But they are not the best at compression speed, which is often a concern in live communication: the data is always dynamically generated and needs to be compressed and decompressed every time it is sent, on both ends of the connection.
In recent years a set of algorithms has been developed that provide a decent compression ratio but compress and decompress much faster. Some of them, at the right setting, can exceed the speed of just copying the same amount of data in memory, making it actually slower not to use compression.
One of the oldest algorithms in this class is LZO, which is available under both a commercial and a free license. Yann Collet, of Zstd fame, created LZ4, which is completely free and will usually do better than LZO. Another set of free algorithms is available in Density, but the king of the hill is again the Oodle package, with its Selkie and Mermaid algorithms. Any of these algorithms will likely speed up your live communication, reduce bandwidth used, and, as a bonus, provide some obfuscation.
While obfuscation alone clearly is no security guarantee, it helps with encryption efficiency, because, simply speaking, compressed data is mathematically more random. At this point it should be mentioned that encrypted data, by design of the encryption algorithm, if it is any good, cannot be compressed any more. Thus, encryption must always be applied after compression.
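A quick sketch of both points, using uniformly random bytes as a stand-in for good ciphertext (no actual encryption is performed here):

```python
import os
import zlib

# Good ciphertext is indistinguishable from random bytes, and random
# bytes do not compress: the output is at least as large as the input.
ciphertext_like = os.urandom(4096)
assert len(zlib.compress(ciphertext_like, 9)) >= len(ciphertext_like)

# Plaintext with structure shrinks a lot, so the winning order is:
# compress first, then encrypt the compressed bytes.
plaintext = b"state update " * 512
assert len(zlib.compress(plaintext, 9)) < len(plaintext)
```

Trying it the other way around wastes CPU cycles on both ends and saves no bandwidth at all.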
A word of caution regarding streaming compression: if the number of connections on a single host is very high, the amount of memory required for compression state tracking can add up to a significant number. If that is the case, careful study of the compression algorithm’s documentation or API will usually reveal ways to reduce this, at the cost of compression ratio of course. Alternatively, switching to an algorithm with smaller state memory requirements may be necessary.
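With zlib, for example, the per-connection history window can be shrunk via the `wbits` parameter; this is only a sketch of the knob, not a memory measurement, and the payload is made up.

```python
import zlib

# wbits=9 gives a 512-byte history window instead of the default 32 KiB,
# shrinking per-connection compressor state at the cost of ratio.
data = b'{"tick": 1024, "entities": [1, 2, 3]}' * 100

small = zlib.compressobj(wbits=9)
out = small.compress(data) + small.flush()

# The decompressor must be configured with a matching (or larger) window.
assert zlib.decompressobj(wbits=9).decompress(out) == data
```

Other libraries expose similar window-size or memory-level settings under different names, so the documentation is the place to look.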
This brings us neatly to another point worth considering. Since we have so many good compression choices available now, it makes sense to integrate more than one of them and make their use configurable, maybe even at a per-session level. This is definitely helpful during development to measure various algorithms at various settings with real data and see which one is the most efficient. Secondly, there may be situations in operation where an algorithm switch is beneficial, for instance when a DoS or crash vector is discovered in a compression algorithm. Libraries like Squash already provide a common interface for various compression algorithms.
Which brings us to the issue of safety. Most of the modern algorithms mentioned here are created or backed by companies with a financial interest in the safety of their products and have a good track record of doing so. They can all be considered safe for general use. For any specific needs or questions in that area it is always advisable to get in contact with the providers of these algorithms directly.
Streaming of audio and video is currently the most rapidly growing sector in interactive and other media, and for good reason. Without heavy use of data compression, none of it would be possible. Unfortunately, the media streaming field currently has a very high barrier to entry, but we can learn some things from it.
Streaming media is time sensitive and has rather large data packets to transfer. This time sensitivity can be handled in two ways: either pause the stream or skip parts of it. Pausing is only possible for non-interactive and non-live streaming, like a movie stream. For any kind of interactive media, data that does not arrive in time is no longer needed and can be discarded. The same is true for real-time games, like shooters or real-time strategy games, by the way. This means that, somewhat ironically, streaming compression is not possible for the whole duration, because a contiguous data stream cannot be guaranteed.
In order to achieve a decent compression ratio, simply speaking, these media files are split into chunks that are made up of a few seconds of content, which are then compressed individually and served to the client. This way, whole chunks can be skipped if necessary. In addition, chunks can be prepared in multiple quality levels, allowing bandwidth use to be adjusted on the fly.
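The chunking scheme can be sketched as independent compression of fixed-size slices; the chunk size and payload here are arbitrary stand-ins for a few seconds of media.

```python
import zlib

# Split a media-like payload into fixed-size chunks and compress each
# one independently, so the client can skip or re-request any chunk.
payload = b"frame-data " * 10_000
CHUNK_SIZE = 16_384

chunks = [zlib.compress(payload[i:i + CHUNK_SIZE], 6)
          for i in range(0, len(payload), CHUNK_SIZE)]

# Any chunk can be decoded on its own, e.g. after seeking or skipping.
third = zlib.decompress(chunks[2])
assert third == payload[2 * CHUNK_SIZE:3 * CHUNK_SIZE]
```

The price of this independence is exactly the lost cross-chunk history discussed in the stream compression section, which is why chunks are made seconds long rather than packet sized.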
As we have seen, there are many options available that can make a significant improvement over leaving data uncompressed or using Deflate. Choosing the best option for a given problem is not always straightforward, but some guaranteed wins exist. For instance, replacing Deflate with Zstd or Brotli is almost always a net positive and incurs no costs beyond implementing the change. Other changes may be more involved and require a good amount of experimentation, but also allow for bigger benefits.
Remember, every bit of bandwidth saved is an equal saving in bandwidth cost. And that cost quickly becomes the number one non-personnel cost as a product or service scales up. In fact, certain use cases, like video streaming, are outright technically or financially infeasible without proper data treatment.
In addition to the potentially substantial cost savings, clever use of data compression can also reduce wait times and memory use, while opening opportunities to deliver higher quality content with the same bandwidth, or the same quality content with reduced bandwidth. It enables a product to stand out from its competition. And in the entertainment industry, this can be worth a lot more than any cost savings.
This is a guest blog by Dietmar Hauser, who is an experienced programmer and owner of Roborodent.