Half the Decimal Trick

January 9, 2015

If something happened 1,234 out of 10,000 times, we'd estimate that the true probability of occurrence is about 0.1234. Of course, we wouldn't expect the true probability to be exactly 0.1234, and to quantify the uncertainty in this estimate statisticians have long computed confidence intervals. But in this particular case, there's a simple eyeballing trick we can use to get approximate error bars: we round the proportion to half the number of decimal places (0.12|34 becomes 0.12) and add a plus or minus 1 in the least significant digit (0.12 +/- 0.01).

By comparison, the canonical confidence interval for a proportion begins by estimating the standard deviation of the empirical frequency using the formula sqrt(p(1-p)/n), where p = 0.1234 is the observed frequency and n = 10,000 is the number of observations used to estimate it. Then we invoke the central limit theorem and argue that the empirical frequency is approximately normally distributed. Finally, we add plus or minus 1.96 standard deviations to the empirical frequency (corresponding to the 2.5th and 97.5th percentiles of a normal distribution) to get a 95% confidence interval.
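As a rough sketch of that recipe in Python (the function name `normal_interval` and its arguments are mine, chosen for illustration, and 1.96 is hard-coded as the 95% multiplier):

```python
import math

def normal_interval(count, n, z=1.96):
    """Normal-approximation interval for a proportion.

    Returns (p, half_width), where p = count / n is the empirical
    frequency and half_width = z * sqrt(p * (1 - p) / n).
    """
    p = count / n
    std_dev = math.sqrt(p * (1 - p) / n)  # estimated std dev of the frequency
    return p, z * std_dev

if __name__ == "__main__":
    p, half_width = normal_interval(1234, 10_000)
    print(f"{p:.4f} +/- {half_width:.4f}")
    print(f"({p - half_width:.4f}, {p + half_width:.4f})")
```

This is only the textbook normal approximation described above, not a recommendation over other interval constructions.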

For an event that occurred 1,234 out of 10,000 times, we can plug the numbers into a calculator and find that the confidence interval above will be 0.1234 +/- 0.0064, or the interval (0.1170, 0.1298). Comparing this to the "eyeball approximation" of (0.1100, 0.1300), we see that the eyeball approximation is a little bit off center (due to the rounding) and a little bit wide (as we'll discuss in a bit). But it's not bad for a simple trick!

Why does it work? Well, p(1-p) defines a parabola that can be upper bounded by its maximum value of 0.5*0.5, and 1.96 can be upper bounded by 2, so that for n = 10^k, 1.96*sqrt(p(1-p)/10^k) gets bounded by 2*0.5*10^(-k/2) = 10^(-k/2). (This bound shows why the ballpark approximation is a little wide.) When k is even (i.e., n = 100, 10,000, 1,000,000, etc.), 10^(-k/2) corresponds to a one in the least significant digit of the frequency rounded to half its number of decimals, and the rounding itself off-centers the interval by at most half of the error bar radius.
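Written out as a chain of inequalities, the bound looks like this (a sketch using the same symbols as above, with n = 10^k):

```latex
1.96\,\sqrt{\frac{p(1-p)}{10^{k}}}
  \;\le\; 2\,\sqrt{\frac{0.5 \cdot 0.5}{10^{k}}}
  \;=\; 2 \cdot 0.5 \cdot 10^{-k/2}
  \;=\; 10^{-k/2}
```

The first step uses both 1.96 <= 2 and p(1-p) <= 0.25, so the bound is close to tight only when p is near 0.5.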

(Of course, to apply the "half the decimal" trick literally, we need to make sure to add trailing zeros to the frequency if necessary. If the event had occurred 1,230 times instead, we'd need to record the frequency as 0.1230, not 0.123.)

If n is instead an odd power of ten (i.e., n = 10, 1,000, 100,000, etc.), then the empirical frequency will have an odd number of digits following the decimal point. In this case, when we take "half the decimal" we round the number of digits we keep up. And instead of adding plus or minus one in the least significant digit, we add plus or minus three (since 10^(-k/2) = sqrt(10) * 10^(-(k+1)/2), and sqrt(10) ~ 3).

So, for example, the confidence interval for an event that occurs 123 out of 1,000 times would get approximated as 0.12 +/- 0.03 (since 0.12|3 becomes 0.12 becomes 0.12 +/- 0.03), and the confidence interval for an event that occurs 56,789 out of 100,000 times would get approximated as 0.568 +/- 0.003 (since 0.567|89 becomes 0.568 becomes 0.568 +/- 0.003).
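For anyone who would rather check the trick by machine than by eye, here is a minimal sketch covering both the even and odd cases. The name `half_decimal` is made up for this example, rounding is done half-up on integers, and n is assumed to be an exact power of ten:

```python
def half_decimal(count, n):
    """Eyeballed error bars for count successes out of n = 10**k trials.

    Returns (center, radius): the frequency rounded to ceil(k / 2)
    decimal places, and 1 (k even) or 3 (k odd) in the last kept digit.
    """
    k = len(str(n)) - 1
    if n != 10 ** k or k < 1:
        raise ValueError("n must be a power of ten, at least 10")
    keep = (k + 1) // 2                   # decimals kept, rounded up
    scale = 10 ** (k - keep)              # value of the digits we drop
    units = (count + scale // 2) // scale  # round half up, in integers
    last_digit = 1 if k % 2 == 0 else 3   # 3 ~ sqrt(10) for odd k
    return units / 10 ** keep, last_digit / 10 ** keep

if __name__ == "__main__":
    for count, n in [(1234, 10_000), (123, 1_000), (56_789, 100_000)]:
        center, radius = half_decimal(count, n)
        print(f"{count} / {n}: {center} +/- {radius}")
```

Passing the count and n as integers sidesteps the trailing-zero issue, since the number of decimals comes from n rather than from how the frequency happens to be written.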

(If the event occurs, say, 2 out of 10 times, the normal approximation we invoked above really doesn't apply. But hey, this whole page is about an approximation, and the eyeballed interval 0.2 +/- 0.3 does the job of telling us that we don't have enough data to estimate the true proportion with any precision.)

The "half the decimal" trick only works directly when n is an integer power of ten, and it breaks down somewhat when p is too close to 0 or 1 (since the eyeballed interval gets way too wide and the normal approximation gets worse). But I hope this trick gives you enough of a feel for how binomial confidence intervals are formed that you can figure out roughly how much precision to quote when you see some data.
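To get a feel for how wide "too wide" is, here is a small sketch comparing the 10^(-k/2) bound to the 1.96 standard deviation half-width for a few values of p. The helper name `overwidth` is mine, and it ignores the extra rounding of the radius to 3 in the odd case:

```python
import math

def overwidth(p, k):
    """Ratio of the 10**(-k/2) bound to 1.96 * sqrt(p * (1 - p) / 10**k)."""
    n = 10 ** k
    bound = 10 ** (-k / 2)
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return bound / half_width

if __name__ == "__main__":
    for p in (0.5, 0.1234, 0.01):
        print(f"p = {p}: bound is {overwidth(p, 4):.2f}x the normal half-width")
```

The ratio stays just above 1 at p = 0.5 and grows as p moves toward 0 or 1, and it doesn't depend on k at all, since the 10^(-k/2) factors cancel.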