It is always said that dealing with floating point numbers is slow. So slow that if precision can be compromised and speed is more important, programmers will use integers instead. Integers ? yes ! But how will you deal with fractions then ? The answer is fixed point arithmetic.

When we deal with decimal numbers, each digit in a number is actually a coefficient of the base, which is 10, to a certain power for the digit. For example:

1234 = 1 \cdot 10^3 + 2 \cdot 10^2 + 3 \cdot 10^1 + 4 \cdot 10^0

We can rewrite the above equation as:

10 \cdot (123.4) = 1234 = 10 \cdot (1 \cdot 10^2 + 2 \cdot 10^1 + 3 \cdot 10^0 + 4 \cdot 10^{-1})

So by expressing with a number multiplied by 10, we can have the decimal point between the third and the fourth digit. By changing the 10 to 100, we can move the decimal point to between the second and the third digit. Basically, we can place the decimal point at any position we want, as long as we know what the multiplier we are using. That is how we express fraction with integers.

The downside is, fixed point numbers do not have fixed effective number of digits. The smaller number it has to express the less effective number of digits it has.

       1.234 ->  4 digits       0.123 ->  3 digits       0.012 ->  2 digits       0.001 ->  1 digit

Thus, when dealing with fixed points, the order of computations can really matter. The good point is, we do not need special hardware or software to deal with them since basically all arithmetics now deal with normal integers.

For binary numbers, the base is 2. The basic rules do not change.

10111_2 = 1 \cdot 2^4 + 0 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0

2 \cdot (1011.1_2) = 10111_2 = 2 \cdot ( 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 1 \cdot 2^0 + 1 \cdot 2^{-1})

If we split an M+N bit number into M bit and N bit numbers with decimal point between them, then the multiplier is 2N. It can be easily shown that 1.0 is then
equals to 2N in the fixed point representation. Then all we have to do to convert a real number to its fixed point representation is to multiply it with 2N (and round, any fraction after multiplication becomes conversion error). And to get the real number from a fixed point representation we simply divide it with 2N.

Addition with fixed point numbers is as simple as addition with integers but of course you need to align the decimal points in case they are different. Multiplication is also as simple as multiplication with integers, but care must be taken as the decimal point position will change. If we multiply M.N with O.P fixed point numbers, the result will be in (M+O).(N+P) format. The O+P can be easily understood as M.N has a multiplier of 2N and O.P has a multiplier of 2P. By multiplying both numbers, we will also have the multipliers be multiplied too, resulting a new multiplier of 2N+P.

Below is a small Perl program to help you convert from real numbers to fixed point numbers and vice versa. Simply execute it with:

fixedpt.pl <qformat> [number]

The qformat should be specified prepended with Q, i.e. to specify M.N format, use QM.N

number, when supplied will be autodetected. If it is detected a real number, the program will output the fixed point counterpart, and vice versa. Fixed point number must be supplied in hexadecimal.

> fixedpt.pl Q2.5 1.61.60000 = 33(16) = 0110011(2)> fixedpt.pl Q2.5 3333(16) = 0110011(2) = 1.59375

If no number is provided, the program will read from standard input.

#!/usr/local/bin/perlif ($#ARGV < 0) { print "usage:$0  [number]\n";    exit;}$QFORMAT =$ARGV[0];if ($QFORMAT =~ /[qQ]([0-9]+)\.([0-9]+)/) {$QM = $1;$QN = $2;}else { print "invalid format specifier. expected Qm.n form\n"; exit;}$QTOTAL = $QM +$QN;  # sign bit is included in QM# one = 1.000000..., where the number of 0 is $QN$ONE = 2.0 ** $QN;$MASK = (2.0 ** $QTOTAL) - 1;if ($#ARGV >= 1) {    &convert($ARGV[1]);}else { while () { chop; &convert($_);    }}sub DecToBin() {    local($FPN) = @_; local$FPNBIN, $x,$n;    $FPNBIN = ""; while (length($FPNBIN) < $QTOTAL) {$n = int($FPN / 2.0);$x = (abs($FPN -$n * 2.0) > 0.5) ? 1 : 0;        $FPNBIN =$x . $FPNBIN;$FPN = $n; } die "overflow !\n" if ($FPN > 0.0);    return $FPNBIN;}sub BinToHex() { local($FPNBIN) = @_;    local $FPNHEX,$x, $h;$FPNHEX = "";    while (length($FPNBIN) > 0) { if (length($FPNBIN) >= 4) {            $x = substr($FPNBIN, -4);            $FPNBIN = substr($FPNBIN, 0, length($FPNBIN)-4); } else {$x = $FPNBIN;$h = substr($x, 0, 1);$x = $h .$x while (length($x) < 4);$FPNBIN = "";        }        $h = ($x eq "0000") ? "0" :             ($x eq "0001") ? "1" : ($x eq "0010") ? "2" :             ($x eq "0011") ? "3" : ($x eq "0100") ? "4" :             ($x eq "0101") ? "5" : ($x eq "0110") ? "6" :             ($x eq "0111") ? "7" : ($x eq "1000") ? "8" :             ($x eq "1001") ? "9" : ($x eq "1010") ? "A" :             ($x eq "1011") ? "B" : ($x eq "1100") ? "C" :             ($x eq "1101") ? "D" : ($x eq "1110") ? "E" : "F";        $FPNHEX =$h . $FPNHEX; } return$FPNHEX;}sub HexToDec() {    local($FPN) = @_; local$FPNDEC, $x; local$hexstr = "0123456789ABCDEF";    $FPN =$FPN . "";   # make sure we are processing string    $FPNDEC = 0.0; while ($FPN ne "") {        $x = uc(substr($FPN, 0, 1));        $x = index($hexstr, $x); die "bad hexadecimal input\n" if ($x == -1);        $FPNDEC =$FPNDEC * 16.0 + $x;$FPN = substr($FPN, 1); } die "overflow !\n" if ($FPNDEC > $MASK); return$FPNDEC;}sub convert() {    local ($EXPR) = @_; local$NUMBER;    if ($EXPR =~ /[^0-9A-Fa-f\.\-]/) {$EXPR = "\$NUMBER =$EXPR";        eval $EXPR; } else {$NUMBER = $EXPR; } if ($NUMBER =~ /\./) {        # real number        $MAX = 2.0 **$QM;            if ($NUMBER >=$MAX) {            print "overflow !\n";            exit;        }	        $NEG = ($NUMBER < 0.0) ? 1 : 0;        $FPN =$ONE * abs($NUMBER);$FPN = ($MASK -$FPN) + 1.0 if ($NEG); # 2's complement on negative number$FPNBIN = &DecToBin($FPN);$FPNHEX = &BinToHex($FPNBIN); printf "%." .$QN . "f = %s(16) = %s(2)\n", $NUMBER,$FPNHEX, $FPNBIN; } else {$FPN = &HexToDec($NUMBER);$FPNBIN = &DecToBin($FPN); if (substr($FPNBIN, 0, 1) eq "1") {            $FPN = ($MASK - $FPN) + 1.0; # 2's complement on negative number$FPN = -$FPN; }$FPN = $FPN /$ONE;	        printf "%s(16) = %s(2) = %." . $QN . "f\n",$NUMBER, $FPNBIN,$FPN;    }}