| # ASTC Format Overview |
| |
| Adaptive Scalable Texture Compression (ASTC) is an advanced lossy texture |
| compression technology developed by Arm and AMD. It has been adopted as an |
| official Khronos extension to the OpenGL and OpenGL ES APIs, and as a standard |
| optional feature for the Vulkan API. |
| |
| ASTC offers a number of advantages over earlier texture compression formats: |
| |
| * **Format flexibility:** ASTC supports compressing between 1 and 4 channels of |
| data, including support for one non-correlated channel such as RGB+A |
| (correlated RGB, non-correlated alpha). |
| * **Bit rate flexibility:** ASTC supports compressing images with a fine |
| grained choice of bit rates between 0.89 and 8 bits per texel (bpt). The bit |
| rate choice is independent to the color format choice. |
| * **Advanced format support:** ASTC supports compressing images in either low |
| dynamic range (LDR), LDR sRGB, or high dynamic range (HDR) color spaces, as |
| well as support for compressing 3D volumetric textures. |
| * **Improved image quality:** Despite the high degree of format flexibility, |
| ASTC manages to beat nearly all legacy texture compression formats -- such as |
| ETC2, PVRCT, and the BC formats -- on image quality at equivalent bit |
| rates. |
| |
| This article explores the ASTC format, and how it manages to generate the |
| flexibility and quality improvements that it achieves. |
| |
| |
| Why ASTC? |
| ========= |
| |
| Before the creation of ASTC, the format and bit rate coverage of the available |
| formats was very sparse: |
| |
|  |
| |
| In reality the situation is even worse than this diagram shows, as many of |
| these formats are proprietary or simply not available on some operating |
| systems, so any single platform will have very limited compression choices. |
| |
| For developers this situation makes developing content which is portable across |
| multiple platforms a tricky proposition. It's almost certain that differently |
| compressed assets will be needed for different platforms. Each asset pack would |
| likely then need to use different levels of compression, and may even have to |
| fall back to no compression for some assets on some platforms, which leaves |
| either some image quality or some memory bandwidth efficiency untapped. |
| |
| It was clear a better way was needed, so the Khronos group asked members to |
| submit proposals for a new compression algorithm to be adopted in the same |
| manner that the earlier ETC algorithm was adopted for OpenGL ES. ASTC was the |
| result of this, and has been adopted as an official algorithm for OpenGL, |
| OpenGL ES, and Vulkan. |
| |
| |
| Format overview |
| =============== |
| |
| Given the fragmentation issues with the existing compression formats, it should |
| be no surprise that the high level design objectives for ASTC were to have |
| something which could be used across the whole range of art assets found in |
| modern content, and which allows artists to have more control over the quality |
| to bit rate tradeoff. |
| |
| There are quite a few technical components which make up the ASTC format, so |
| before we dive into detail it will be useful to give an overview of how ASTC |
| works at a higher level. |
| |
| |
| Block compression |
| ----------------- |
| |
| Compression formats for real-time graphics need the ability to quickly and |
| efficiently make random samples into a texture. This places two technical |
| requirements on any compression format: |
| |
| * It must be possible to compute the address of data in memory given only a |
| sample coordinate. |
| * It must be possible to decompress random samples without decompressing too |
| much surrounding data. |
| |
| The standard solution for this used by all contemporary real-time formats, |
| including ASTC, is to divide the image into fixed-size blocks of texels, each |
| of which is compressed into a fixed number of output bits. This feature makes |
| it possible to access texels quickly, in any order, and with a well-bounded |
| decompression cost. |
| |
| The 2D block footprints in ASTC range from 4x4 texels up to 12x12 texels, which |
| all compress into 128-bit output blocks. By dividing 128 bits by the number of |
| texels in the footprint, we derive the format bit rates which range from 8 bpt |
| (`128/(4*4)`) down to 0.89 bpt (`128/(12*12)`). |
| |
| |
| Color encoding |
| -------------- |
| |
| ASTC uses gradients to assign the color values of each texel. Each compressed |
| block stores the end-point colors for a gradient, and an interpolation weight |
| for each texel which defines the texel's location along that gradient. During |
| decompression the color value for each texel is generated by interpolating |
| between the two end-point colors, based on the per-texel weight. |
| |
|  |
| |
| In many cases a block will contain a complex distribution of colors, for |
| example a red ball sitting on green grass. In these scenarios a single color |
| gradient will not be able to accurately represent all of the texels' values. To |
| support this ASTC allows a block to define up to four distinct color gradients, |
| known as partitions, and can assign each texel to a single partition. For our |
| example we require two partitions, one for our ball texels and one for our |
| grass texels. |
| |
|  |
| |
| Now that you know the high level operation of the format, we can dive into more |
| detail. |
| |
| |
| Integer encoding |
| ================ |
| |
| Initially the idea of fractional bits per texel sounds implausible, or even |
| impossible, because we're so used to storing numbers as a whole number of bits. |
| However, it's not quite as strange as it sounds. ASTC uses an encoding |
| technique called Bounded Integer Sequence Encoding (BISE), which makes heavy |
| use of storing numbers with a fractional number of bits to pack information |
| more efficiently. |
| |
| |
| Storing alphabets |
| ----------------- |
| |
| Even though color and weight values per texel are notionally floating-point |
| values, we have far too few bits available to directly store the actual values, |
| so they must be quantized during compression to reduce the storage size. For |
| example, if we have a floating-point weight for each texel in the range 0.0 to |
| 1.0 we could choose to quantize it to five values - 0.0, 0.25, 0.5, 0.75, and |
| 1.0 - which we can then represent in storage using the integer values 0 to 4. |
| |
| In the general case we need to be able to efficiently store characters of an |
| alphabet containing N symbols if we choose quantize to N levels. An N symbol |
| alphabet contains `log2(N)` bits of information per character. If we have an |
| alphabet of 5 possible symbols then each character contains ~2.32 bits of |
| information, but simple binary storage would require us to round up to 3 bits. |
| This wastes 22.3% of our storage capacity. The chart below shows the percentage |
| of our bit-space wasted when using simple binary encoding to store an arbitrary |
| N symbol alphabet: |
| |
|  |
| |
| ... which shows for most alphabet sizes we waste a lot of our storage capacity |
| when using an integer number of bits per character. Efficiency is of critical |
| importance to a compression format, so this is something we needed to be able |
| to improve. |
| |
| **Note:** We could have chosen to round-up the quantization level to the next |
| power of two, and at least use the bits we're spending. However, this forces |
| the encoder to spend bits which could be used elsewhere for a bigger benefit, |
| so it will reduce image quality and is a sub-optimal solution. |
| |
| |
| Quints |
| ------ |
| |
| Instead of rounding up a 5 symbol alphabet - called a "quint" in BISE - to |
| three bits, we could choose to instead pack three quint characters together. |
| Three characters in a 5-symbol alphabet have 5<sup>3</sup> (125) combinations, |
| and contain 6.97 bits of information. We can store this in 7 bits and have a |
| storage waste of only 0.5%. |
| |
| |
| Trits |
| ----- |
| |
| We can similarly construct a 3-symbol alphabet - called a "trit" in BISE - and |
| pack trit characters in groups of five. Each character group has 3<sup>5</sup> |
| (243) combinations, and contains 7.92 bits of information. We can store this in |
| 8 bits and have a storage waste of only 1%. |
| |
| |
| BISE |
| ---- |
| |
| The BISE encoding used by ASTC allows storage of character sequences using |
| arbitrary alphabets of up to 256 symbols, encoding each alphabet size in the |
| most space-efficient choice of bits, trits, and quints. |
| |
| * Alphabets with up to (2<sup>n</sup> - 1) symbols can be encoded using n bits |
| per character. |
| * Alphabets with up (3 * 2<sup>n</sup> - 1) symbols can be encoded using n bits |
| (m) and a trit (t) per character, and reconstructed using the equation |
| (t * 2<sup>n</sup> + m). |
| * Alphabets with up to (5 * 2<sup>n</sup> - 1) symbols can be encoded using n |
| bits (m) and a quint (q) per character, and reconstructed using the equation |
| (q * 2<sup>n</sup> + m). |
| |
| When the number of characters in a sequence is not a multiple of three or five |
| we need to avoid wasting storage at the end of the sequence, so we add another |
| constraint on the encoding. If the last few values in the sequence to encode |
| are zero, the last few bits in the encoded bit string must also be zero. |
| Ideally, the number of non-zero bits should be easily calculated and not depend |
| on the magnitudes of the previous encoded values. This is a little tricky to |
| arrange during compression, but it is possible. This means that we do not need |
| to store any padding after the end of the bit sequence, as we can safely assume |
| that they are zero bits. |
| |
| With this constraint in place - and by some smart packing the bits, trits, and |
| quints - BISE encodes an string of S characters in an N symbol alphabet using a |
| fixed number of bits: |
| |
| * S values up to (2<sup>n</sup> - 1) uses (NS) bits. |
| * S values up to (3 * 2<sup>n</sup> - 1) uses (NS + ceil(8S / 5)) bits. |
| * S values up to (5 * 2<sup>n</sup> - 1) uses (NS + ceil(7S / 3)) bits. |
| |
| ... and the compressor will choose the one of these which produces the smallest |
| storage for the alphabet size being stored; some will use binary, some will use |
| bits and a trit, and some will use bits and a quint. If we compare the storage |
| efficiency of BISE against simple binary for the range of possible alphabet |
| sizes we might want to encode we can see that it is much more efficient. |
| |
|  |
| |
| |
| Block sizes |
| =========== |
| |
| ASTC always compresses blocks of texels into 128-bit outputs, but allows the |
| developer to select from a range of block sizes to enable a fine-grained |
| tradeoff between image quality and size. |
| |
| | Block footprint | Bits/texel | | Block footprint | Bits/texel | |
| | --------------- | ---------- | --- | --------------- | ---------- | |
| | 4x4 | 8.00 | | 10x5 | 2.56 | |
| | 5x4 | 6.40 | | 10x6 | 2.13 | |
| | 5x5 | 5.12 | | 8x8 | 2.00 | |
| | 6x5 | 4.27 | | 10x8 | 1.60 | |
| | 6x6 | 3.56 | | 10x10 | 1.28 | |
| | 8x5 | 3.20 | | 12x10 | 1.07 | |
| | 8x6 | 2.67 | | 12x12 | 0.89 | |
| |
| |
| |
| Color endpoints |
| =============== |
| |
| The color data for a block is encoded as a gradient between two color |
| endpoints, with each texel selecting a position along that gradient which is |
| then interpolated during decompression. ASTC supports 16 color endpoint |
| encoding schemes, known as "endpoint modes". Options for endpoint modes |
| include: |
| |
| * Varying the number of color channels: e.g. luminance, luminance + alpha, rgb, |
| and rgba. |
| * Varying the encoding method: e.g. direct, base+offset, base+scale, |
| quantization level. |
| * Varying the data range: e.g. low dynamic range, or high dynamic range |
| |
| The endpoint modes, and the endpoint color BISE quantization level, can be |
| chosen on a per-block basis. |
| |
| |
| Color partitions |
| ================ |
| |
| Colors within a block are often complex, and cannot be accurately captured by a |
| single color gradient, as discussed earlier with our example of a red ball |
| lying on green grass. ASTC allows up to four color gradients - known as |
| "partitions" - to be assigned to a single block. Each texel is then assigned to |
| a single partition for the purposes of decompression. |
| |
| Rather then directly storing the partition assignment for each texel, which |
| would need a lot of decompressor hardware to store it for all block sizes, we |
| generate it procedurally. Each block only needs to store the partition index - |
| which is the seed for the procedural generator - and the per texel assignment |
| can then be generated on-the-fly during decompression. The image below shows |
| the generated texel assignments for two (top), three (middle), and four |
| (bottom) partitions for the 8x8 block size. |
| |
|  |
| |
| The number of partitions and the partition index can be chosen on a per-block |
| basis, and a different color endpoint mode can be chosen per partition. |
| |
| **Note:** ASTC uses a 10-bit seed to drive the partition assignments. The hash |
| used will introduce horizontal bias in a third of the partitions, vertical bias |
| in a third, and no bias in the rest. As they are procedurally generated not all |
| of the partitions are useful, in particular with the smaller block sizes. |
| |
| * Many partitions are duplicates. |
| * Many partitions are degenerate (an N partition hash results in at least one |
| partition assignment that contains no texels). |
| |
| |
| Texel weights |
| ============= |
| |
| Each texel requires a weight, which defines the relative contribution of each |
| color endpoint when interpolating the color gradient. |
| |
| For smaller block sizes we can choose to store the weight directly, with one |
| weight per texel, but for the larger block sizes we simply do not have enough |
| bits of storage to do this. To work around this ASTC allows the weight grid to |
| be stored at a lower resolution than the texel grid. The per-texel weights are |
| interpolated from the stored weight grid during decompression using a bilinear |
| interpolation. |
| |
| The number of texel weights, and the weight value BISE quantization level, can |
| be chosen on a per-block basis. |
| |
| |
| Dual-plane weights |
| ------------------ |
| |
| Using a single weight for all color channels works well when there is good |
| correlation across the channels, but this is not always the case. Common |
| examples where we would expect to get low correlation at least some of the time |
| are textures storing RGBA data - alpha masks are not usually closely |
| correlated with the color value - or normal data - the X and Y normal values |
| often change independently. |
| |
| ASTC allows a dual-plane mode, which uses two separate weight grids for each |
| texel. A single channel can be assigned to a second plane of weights, while |
| the other three use the first plane of weights. |
| |
| The use of dual-plane mode can be chosen on a per-block basis, but its use |
| prevents the use of four color partitions as we do not have enough bits to |
| concurrently store both an extra plane of weights and an extra set of color |
| endpoints. |
| |
| |
| End results |
| =========== |
| |
| So, if we pull all of this together what do we end up with? |
| |
| |
| Adaptive |
| -------- |
| |
| The first word in the name of ASTC is "adaptive", and it should now hopefully |
| be clear why. Each block always compresses into 128-bits of storage, but the |
| developer can choose from a wide range of texel block sizes and the compressor |
| gets a huge amount of latitude to determine how those 128 bits are used. |
| |
| The compressor can trade off the number of bits assigned to colors (number of |
| partitions, endpoint mode, and stored quantization level) and weights (number |
| of weights per block, use of dual-plane, and stored quantization level) on a |
| per-block basis to get the best image quality possible. |
| |
|  |
| |
| |
| Format support |
| -------------- |
| |
| The compression scheme used by ASTC effectively compresses arbitrary sequences |
| of floating point numbers, with a flexible number of channels, across any of |
| the supported block sizes. There is no real notion of "color format" in the |
| format itself at all, beyond the color endpoint mode selection, although a |
| sensible compressor will want to use some format-specific heuristics to drive |
| an efficient state-space search. |
| |
| The orthogonal encoding design allows ASTC to provide almost complete coverage |
| of our desirable format matrix from earlier, across a wide range of bit rates: |
| |
|  |
| |
| The only significant omission is the absence of a dedicated two channel |
| encoding for HDR textures. We simply ran out of entries in the space we had for |
| encoding color endpoint modes, and this one didn't make the cut. |
| |
| The flexibility allowed by ASTC ticks the requirement that almost any asset can |
| be compressed to some degree, at an appropriate bitrate for its quality needs. |
| This is a powerful enabler for a compression format, because it puts control in |
| the hands of content creators and not arbitrary format restrictions. |
| |
| |
| Image quality |
| ------------- |
| |
| The normal expectation would be that this level of format flexibility would |
| come at a cost of image quality; it has to cost something, right? Luckily this |
| isn't true. The high packing efficiency allowed by BISE encoding, and the |
| ability to dynamically choose where to spend encoding space on a per-block |
| basis, means that an ASTC compressor is not forced to spend bits on things that |
| don't help image quality. |
| |
| This gives some significant improvements in image quality compared to the older |
| texture formats, even though ASTC also handles a much wider range of options. |
| |
| * ASTC at 2 bpt outperforms PVRTC at 2 bpt by ~2.0dB. |
| * ASTC at 3.56 bpt outperforms PVRTC and BC1 at 4 bpt by ~1.5dB, and ETC2 by |
| ~0.7dB, despite a 10% bit rate disadvantage. |
| * ASTC at 8 bpt for LDR formats is comparable in quality to BC7 at 8 bpt. |
| * ASTC at 8 bpt for HDR formats is comparable in quality to BC6H at 8 bpt. |
| |
| Differences as small as 0.25dB are visible to the human eye, and remember that |
| dB uses a logarithmic scale, so these are significant image quality |
| improvements. |
| |
| |
| 3D compression |
| -------------- |
| |
| One of the nice bonus features of ASTC is that the techniques which underpin |
| the format generalize to compressing volumetric texture data without needing |
| very much additional decompression hardware. |
| |
| ASTC is therefore also able to optionally support compression of 3D textures, |
| which is a unique feature not found in any earlier format, at the following |
| bit rates: |
| |
| | Block footprint | Bits/texel | | Block footprint | Bits/texel | |
| | --------------- | ---------- | --- | --------------- | ---------- | |
| | 3x3x3 | 4.74 | | 5x5x4 | 1.28 | |
| | 4x3x3 | 3.56 | | 5x5x5 | 1.02 | |
| | 4x4x3 | 2.67 | | 6x5x5 | 0.85 | |
| | 4x4x4 | 2.00 | | 6x6x5 | 0.71 | |
| | 5x4x4 | 1.60 | | 6x6x6 | 0.59 | |
| |
| |
| Availability |
| ============ |
| |
| The ASTC functionality is specified as a set of feature profiles, allowing |
| GPU hardware manufacturers to select which parts of the standard they |
| implement. There are four commonly seen profiles: |
| |
| * "LDR": |
| * 2D blocks. |
| * LDR and sRGB color space. |
| * [KHR_texture_compression_astc_ldr][astc_ldr]: KHR OpenGL ES extension. |
| * "LDR + Sliced 3D": |
| * 2D blocks and sliced 3D blocks. |
| * LDR and sRGB color space. |
| * [KHR_texture_compression_astc_sliced_3d][astc_3d]: KHR OpenGL ES extension. |
| * "HDR": |
| * 2D and sliced 3D blocks. |
| * LDR, sRGB, and HDR color spaces. |
| * [KHR_texture_compression_astc_hdr][astc_ldr]: KHR OpenGL ES extension. |
| * "Full": |
| * 2D, sliced 3D, and volumetric 3D blocks. |
| * LDR, sRGB, and HDR color spaces. |
| * [OES_texture_compression_astc][astc_full]: OES OpenGL ES extension. |
| |
| The LDR profile is mandatory in OpenGL ES 3.2 and a standardized optional |
| feature for Vulkan, and therefore widely supported on contemporary mobile |
| devices. The 2D HDR profile is not mandatory, but is widely supported. |
| |
| 3D texturing |
| ------------ |
| |
| The APIs expose 3D textures in two flavors. |
| |
| The sliced 3D texture support builds a 3D texture from an array of 2D image |
| slices that have each been individually compressed using 2D ASTC compression. |
| This is required for the HDR profile, so is also widely supported. |
| |
| The volumetric 3D texture support uses the native 3D block sizes provided by |
| ASTC to implement true volumetric compression. This enables a wider choice of |
| low bitrate options than the 2D blocks, which is particularly important for 3D |
| textures of any non-trivial size. Volumetric formats are not widely supported, |
| but are supported on all of the Arm Mali GPUs that support ASTC. |
| |
| ASTC decode mode |
| ---------------- |
| |
| ASTC is specified to decompress texels into fp16 intermediate values, except |
| for sRGB which always decompresses into 8-bit UNORM intermediates. For many use |
| cases this gives more dynamic range and precision than required. This can cause |
| a reduction in both texture cache efficiency and texture filtering performance |
| due to the larger decompressed data size. |
| |
| A pair of extensions exist, and are widely supported on recent mobile GPUs, |
| which allow applications to reduce the intermediate precision to either UNORM8 |
| (recommended for LDR textures) or RGB9e5 (recommended for HDR textures). |
| |
| * [OES_texture_compression_astc_decode_mode][astc_decode]: Allow UNORM8 |
| intermediates |
| * [OES_texture_compression_astc_decode_mode_rgb9e5][astc_decode]: Allow RGB9e5 |
| intermediates |
| |
| [astc_ldr]: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_texture_compression_astc_hdr.txt |
| [astc_3d]: https://www.khronos.org/registry/OpenGL/extensions/KHR/KHR_texture_compression_astc_sliced_3d.txt |
| [astc_full]: https://www.khronos.org/registry/OpenGL/extensions/OES/OES_texture_compression_astc.txt |
| [astc_decode]: https://www.khronos.org/registry/OpenGL/extensions/EXT/EXT_texture_compression_astc_decode_mode.txt |