When using Golomb/Rice codes in image compression, large values are unavoidable. Golomb coding uses a tunable parameter M to split an input value N into two parts: q, the quotient of dividing N by M, and r, the remainder. The quotient is sent in unary, followed by the remainder in binary.
For example, if N is 120 and M is 4, then q is 30 (in unary 111…1110, 31 bits) and r is 0 (2 bits), so the result is 33 bits. That does not seem practical for image compression.
So what should I do when using Golomb/Rice code for large values?
Or are there any rules for handling large values?
I don’t understand why “33 bits is not practical”. There are two answers to that, depending on what you mean:
- The whole point of this sort of encoding is to optimise the representation of small values at the expense of large values. Assuming a suitable distribution of numbers to be compressed, with many more small values than large ones, the encoded representation should need fewer bits overall. Note that for image compression, this sort of scheme is generally used to encode errors from a predicted pixel value, not the pixel values themselves.
- If you’re referring to some issue due to a 32-bit integer size in the language you’re using, I find it quite useful to have a nice arbitrary-length bitvector class for this sort of thing (Boost’s dynamic_bitset, for example).
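To make the first point concrete, here is a minimal sketch in Python that compares the total size of a Rice code (Golomb with M = 2^k) against a fixed 8 bits per value. The `errors` list is made-up data standing in for prediction errors that have already been mapped to non-negative integers, and `rice_code_length` is a helper name chosen for this illustration.

```python
def rice_code_length(n, k):
    """Bits used by a Rice code (Golomb with M = 2**k) for a non-negative n."""
    # Unary quotient (n >> k ones), one terminating 0, then k remainder bits.
    return (n >> k) + 1 + k


# Hypothetical prediction errors: mostly small, with one large outlier.
errors = [0, 1, 0, 2, 1, 0, 3, 0, 1, 120]

rice_bits = sum(rice_code_length(e, 2) for e in errors)  # k = 2, i.e. M = 4
fixed_bits = 8 * len(errors)                             # plain 8 bits per value

print(rice_bits, fixed_bits)
```

With this sample the outlier 120 costs 33 bits, yet the whole sequence takes 60 bits instead of 80, because the nine small values cost only 3 bits each.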
Golomb and Rice codes are efficient only if the symbols you want to encode have a geometric or near-geometric distribution, d(n) = (1 - p) p^n. The code can represent arbitrarily large values, but the codeword grows as the value moves away from 0, since such values are less probable than values near 0. That is why you spend more bits on less probable symbols and few bits on very probable ones.
A Golomb code is a variable-length code: each symbol can be encoded with a different length, and the less probable the symbol, the longer its encoding. Each symbol is encoded as a sequence of bits, and you write that sequence bit by bit, regardless of whether it crosses an 8-, 16-, 32-, or 64-bit boundary. The code is not meant to be read or written in units of a byte, word, int, or long; that grouping only happens when you write the encoded data to disk. If a symbol needs 33 bits, you must spend them, otherwise the code is not reversible. But you must read and write bitwise.
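As a rough illustration of bitwise writing, the following Python sketch packs Rice codewords into bytes without caring where a codeword ends. `BitWriter`, `rice_encode`, and `rice_decode` are hypothetical names for this example, and the unary part is written as ones followed by a zero, as in the question. It is a toy that keeps bits in a list; a real codec would use a proper bit buffer.

```python
class BitWriter:
    """Collects single bits and packs them into bytes, most significant bit first."""

    def __init__(self):
        self.bits = []

    def write_bit(self, bit):
        self.bits.append(bit & 1)

    def to_bytes(self):
        # Pad with zeros only at the very end, when the stream goes to disk.
        padded = self.bits + [0] * (-len(self.bits) % 8)
        out = bytearray()
        for i in range(0, len(padded), 8):
            byte = 0
            for b in padded[i:i + 8]:
                byte = (byte << 1) | b
            out.append(byte)
        return bytes(out)


def rice_encode(writer, n, k):
    """Write n as a Rice code with M = 2**k: unary quotient, then k remainder bits."""
    for _ in range(n >> k):
        writer.write_bit(1)
    writer.write_bit(0)
    for shift in range(k - 1, -1, -1):
        writer.write_bit((n >> shift) & 1)


def rice_decode(bits, pos, k):
    """Read one codeword starting at bit index pos; return (value, next position)."""
    q = 0
    while bits[pos] == 1:
        q += 1
        pos += 1
    pos += 1  # skip the terminating 0 of the unary part
    r = 0
    for _ in range(k):
        r = (r << 1) | bits[pos]
        pos += 1
    return (q << k) | r, pos


writer = BitWriter()
for value in (3, 120, 5):
    rice_encode(writer, value, 2)

decoded, pos = [], 0
while pos < len(writer.bits):
    value, pos = rice_decode(writer.bits, pos, 2)
    decoded.append(value)
print(decoded, len(writer.to_bytes()))
```

The 33-bit codeword for 120 starts at bit 3 and spans several bytes, and the decoder still recovers every value because it never assumes a codeword fits a machine word.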
In your example the value 120 produces a 31-bit unary prefix plus 2 extra bits for the remainder. On the other hand, if M = 4 suits the distribution, 120 is about 2^30 times less probable than 0 (its probability is roughly 2^-30 times that of 0). So it is perfectly fine to spend this “many” bits on it.
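The arithmetic behind that estimate can be checked with a few lines of Python. This sketch assumes the geometric model d(n) = (1 - p) p^n from above and further assumes that M = 4 matches the source, taken here to mean p^M = 1/2; the variable names are chosen for the illustration only.

```python
import math

M = 4
N = 120

# Assumption: M suits the source, modelled here as p**M == 1/2.
p = 0.5 ** (1 / M)

# P(N) / P(0) for d(n) = (1 - p) * p**n; the (1 - p) factor cancels.
log2_ratio = math.log2(p ** N)

# Rice code length for M = 2**k: unary quotient, terminating 0, k remainder bits.
k = M.bit_length() - 1
code_bits = N // M + 1 + k

# Ideal (information-theoretic) cost of the symbol under the same model.
ideal_bits = -math.log2((1 - p) * p ** N)

print(log2_ratio, code_bits, ideal_bits)
```

Under these assumptions the ratio comes out at about 2^-30 and the ideal cost of the symbol is a little under 33 bits, so the 33-bit codeword wastes almost nothing.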
The selection of the optimal M depends on the given distribution. Golomb coding and Rice coding (which is a special case of Golomb coding with M a power of two) are used in image and video compression because they are easy to implement and fast to process.
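One simple way to see how M depends on the distribution is to try each candidate and keep the cheapest. The Python sketch below does this for Rice codes only (M = 2^k) under the geometric model d(n) = (1 - p) p^n. `expected_rice_bits` and `best_rice_k` are hypothetical helpers, the sum is truncated at `terms` values, and the search stops at `max_k`, so it is an approximation that suits moderate p rather than a definitive selection rule.

```python
def expected_rice_bits(p, k, terms=10_000):
    """Approximate mean Rice code length for d(n) = (1 - p) * p**n, truncated."""
    total = 0.0
    for n in range(terms):
        prob = (1 - p) * p ** n
        total += prob * ((n >> k) + 1 + k)
    return total


def best_rice_k(p, max_k=16):
    """Return the k in 0..max_k with the smallest expected code length."""
    return min(range(max_k + 1), key=lambda k: expected_rice_bits(p, k))


p = 0.5 ** 0.25  # the source for which p**4 == 1/2
print(best_rice_k(p), expected_rice_bits(p, best_rice_k(p)))
```

For this p the search picks k = 2, that is M = 4, with a mean of about 4 bits per symbol. In practice p is unknown, so encoders usually estimate the parameter from the data seen so far instead of searching like this.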