binary calculators

scicalc and most PDA-based calculators calculate internally in binary rather than decimal (i.e. BCD). some modern physical calculators do this too, like the hp30s and hp9g. traditionally, all physical calculators have performed their arithmetic internally in BCD.

the advantage of decimal calculation is that it is consistent with our own expectations of numbers in decimal, especially for calculations involving money. the value $0.10, for example, is exactly 1/10 and is representable exactly in bcd. however, 0.1 does not have a finite digit string in binary, i.e. it recurs like 1/3 does in decimal.

early calculators (in bcd) had problems with recurring decimals. take 1, divide by 3 and multiply by 3. sometimes you don't get back to 1; instead the machine reads out 0.999999999. this is because 1/3 internally was a bit short of a real third.

this deficiency is easy to fix by using internal guard digits. basically, extra precision of, say, 1 or 2 (bcd) digits is retained internally. so a 10 digit calculator would hold 11 digits internally, rounding the internal number to a 10 digit display. thus 0.9999999999 (11 digits) is rounded to 1.000000000. simple!

if your internals are binary, you have more problems.

consider 2 + 0.2 + 0.2 + 0.2 + 0.2 + 0.2 - 3. if calculated in binary, you will not get zero.

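#include <stdio.h>

int main(void)
{
    double a = 0.2;   /* 0.2 has no exact binary representation */
    double v;
    v = 2.0 + a + a + a + a + a - 3.0;   /* exactly zero in decimal */
    printf("%g\n", v);
    return 0;
}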

gives me: 8.88178e-016. the problem again is that 0.2 is not binary representable. this is not a guard digit problem. to fix this, binary calculators have to have some form of value suppression when performing arithmetic.

define a ~ b if |a-b| < eps*|a| and let eps be 10^-d, where d is the total number of decimal digits needed internally, including any guards. so, roughly speaking, a ~ b if a and b are the same enough for it not to matter. for binary, it's more sensible to let eps be 2^-n for some n.

subtraction of a and b then works as: 0 if a ~ b, otherwise a - b. similarly, a plus b is 0 if a ~ -b, otherwise a + b.

let's try some cases on the hp30s. 2+.2+.2+.2+.2+.2-3=0, good! 2+1e-9-2=0, clang! wrong answer: 2.000000001 is only 10 digits and fits on the display. 2+1e-10-2=0 is acceptable, because we can excuse 10^-10 underflowing on a 10 digit calculator. interestingly, the machine works internally to 24 digits and appears to suppress to 10 (or sometimes 9!).

on the hp9g, 2+1e-9-2=1e-9. correct this time! 2+1e-13-2=1e-13, good! and 2+1e-14-2=0. 10^-14 underflow. looks like they've chosen the suppression to mimic 13 decimal digits which would be like a decimal machine with 3 digits guard. again the 9g is 24 digit internal. you can coax out more than 10 digits by subtracting nine digits at a time (30s) or up to 12 (9g).

(4invtan(1)-3.14159265)*1e9=3.589793116 on the 30s or 3.589793238 on the 9g. here the 9g is correct, so let's get some more. Ans-3.589793238 gets 4.626490837, giving a total of 3.14159265358979323846264(90837). here, i've put brackets where it's gone wrong and given garbage low bits, but we got 24 digits correct.

you don't need to do the same trick for scicalc, because it displays all its digits on the screen. the suppression value is 2^-100 which gives 30 digits. here 4*atan(1) = 3.14159265358979323846264338328 which is correct.

but now there is a new problem to taunt us.

consider 1 + 1e-9 - 1. no matter what you do, you are going to lose around 6 or 7 digits at the low end of your precision and bring in garbage. on a bcd machine that garbage will be decimal zeros, that's fine. but on a binary machine, they will be binary zeros. those binary zeros leave a non-zero decimal tail. so for example, in scicalc, i get:

> 1+1e-9-1
= 9.999999999999999999999999365945e-10

now that wouldn't be so bad if you were trying to build a 10 digit calculator in binary, because the garbage is far away. now consider ln(1+1e-9). this is bad because ln(1+x) = x - x^2/2 + x^3/3 - ... and sure enough, the garbage left behind by 1+x is brought into range. let's see what i get in scicalc, and then what the hp30s gives.

> ln(1+1e-9)
= 9.999999995000000003333332696778e-10
> lnone(1e-9)
= 9.999999995000000003333333330833e-10

as you see, ln(1+1e-9) is at the mercy of garbage already introduced. the ln can't win, and the expression is unable to yield the full precision answer. the lnone function shows what the real answer should be.

both the hp30s and hp9g show

ln(1+1e-9)=9.999999995x10^-10

which is correct. they're only able to do this because of the internal extra precision (around 24 digits). if they worked to, say, 12 digits internally but in binary, they would be in trouble with this example.

on the 9g, some of the internal digits can be revealed by, ln(1+1e-9)-9.9999999e-10 = 9.9499999029e-18 showing that the internal precision is reduced by garbage to 15 digits, since its internal answer is 9.99999999499999(029)...e-10 where brackets show the end of the correct answer.

so if you're building a binary calculator, you need a lot more internal precision to be sure of delivering 10 correct figures.