February 16, 2015

Swiftly Secure, Part 3 – Dividing Huge Numbers

Posted in iPhone development, mac development tagged , , , , , at 9:05 pm by tetontech

Let’s face reality. Computer division is ugly. It’s not the case, but it might seem to some people that those who conceived the computers we use today dropped the ball. Addition and Multiplication algorithms at both code and hardware levels are fairly straight forward, even if there is significant computational cost involved. Division, on the other hand, can seem like it was ‘tacked on’ afterwards. To think this thought is to do a disservice to the computational founders. The problem didn’t arise with them. It came from how we humans have done division for millennia. The age-old long hand grade school division algorithm taught to young people around the world is the problem. It is a guess-and-check process and computers are notoriously bad at guessing. It is no wonder the founders struggled. What else did they have to base division on?

When I set out to find a division algorithm to implement for Swiftly Secure and this posting, the other common division method, inverting the divisor, jumped right out of old and new computer science patents and research. The inversion idea is that a / b is the same as a * 1/b and it may be easier to calculate 1/b than to do the original division. Modern research into division seems to be fixated on optimizing variations on this approach. Each variation seems to share the same weakness as the grade school algorithm–they end up being guess-and-check algorithms trying to outdo Isaac Newton’s division algorithm or its more modern variants and replacements. This doesn’t resolve the problem, it shifts it to an area where guessing is hopefully easier.

I’m going to make a bold statement here. Computers are bad at guessing. I wanted to come up with an algorithm where no guessing was going on to use in Swiftly Secure. I think I’ve found one. It works for integer division for single word integers and huge numbers that require multiple words to represent them. I haven’t tested the algorithm for applicability to floating point division yet.

The idea behind the solution is this–since computers work in binary maybe there is something in the binary representations of the dividend, a, the divisor, b, and the quotient, q (a / b = q) that allows calculation of the quotient without guessing. There is. Finding it was a significant process and like all good answers to interesting questions, after I found it it the solution seemed obvious. Here it is simplified.

The difference between the index of the highest on-bit of a (the dividend) and b (the divisor) is often the index of the highest on-bit of q (the quotient).

A) if a / b = q

B) then  i[a] – i[b] = i[q] (where i means the index of the highest on-bit)

Where statement B isn’t true, the difference calculated is always one too large. When the difference is off is completely predictable, calculable, and derivable from  the value of the dividend (I’ll show how to do this later in this posting).

Interesting, but h0w does this help us not guess and check? Instead of guessing and checking by attacking the dividend and/or the divisor and then seeing if the quotient is ‘close enough’, as is done by Newton’s method and its decedents, this realization allows us to attack the quotient, q, directly. As an example lets look at a simple one-word division.

 

127 / 17 = 7 Remainder 8

decimal                 binary (bigendian)              highest on-bit index (zero based)

127                        01111111                                               6

17                         00010001                                            4

7                          00000111                                              2

 

i[127] = 6,

i[17] = 4, and

i[7] = 2

6-4 = 2

The answer’s highest on-bit index is calculable. This implies an algorithm that reduces the dividend while increasing the quotient.

1) calculate the quotient’ highest on-bit index.

2) calculate the numeric value of the quotient’s highest on-bit index

3) add the numeric value calculated to a running quotient accumulator

4) subtract from the dividend the numeric value * the divisor

5) loop back to 1) until the updated dividend is less than the divisor

6) the final value of the dividend is the division’s remainder

No guessing is involved nor is there a ‘stop when close enough’ guess as in Newton’s and many other divisor inversion methods. The code for this algorithm, even for huge multi-word unsigned integers, lays out easily in Swift.

func divide(var dividend:[UInt64], divisor:[UInt64]) 
                                  ->([UInt64],[UInt64],NSError?){
    var quotient = [UInt64]()
    /*
     * sanity checking for division by zero, empty arrays, and 
     * where the divisor > dividend goes here
     */
    //5) loop until the divisor > dividend
    while compare(dividend, divisor) >= 0{
        //1) calculate the highest on-bit index
        let highestQuotientBit = calculateHighBitIndex(dividend, divisor)
        //2) calculate the numeric value of the highest on-bit
        let dividendPartialValue = shift_1_LeftBy(highestQuotientBit)
        //3) update the value of the quotient collector
        quotient = add(quotient, dividendPartialValue)
        //4 subtract the calculated numeric value from the dividend
        let tooSmallBy = multiply(dividendPartialValue, divisor)
        dividend = subtract(dividend, tooSmallBy)
    }
    //the amount left in the dividend is the remainder
    return (removeLeadingZeros(quotient),removeLeadingZeros(dividend),nil)
}

In this algorithm I take advantage of the multiplication algorithm discussed and implemented in a previous post. There are several additional helper functions I’ve created to make the divide function’s implementation easier to understand.

The most interesting of these helper functions is calculateHighBitIndex. It calculates the index of the highest on-bit of the quotient for any divisor-dividend pair. To help you understand it, I’ll show the equations for calculating when the difference between the dividend and divisor’s highest on-bit index, described above, is too high by 1.

when a / b = q

if  a <= (b *  (1 << (i[a] – i[b]))) – 1

then i[a] – i[b] is too large by 1

This is a pattern I found by examining every integer division possible using unsigned 8 bit integers. I’ve been running an exhaustive check of all possible UInt64 divisions. It hasn’t failed yet but is obviously taking a long time. I’m also working on a mathematical proof but won’t bore you with it here.

Describing the equations using words, if 1 is left-shifted by the highest on-bit index difference, the result is then multiplied by the dividend, and reduced by 1, we get a value for the largest dividend where the highest on-bit index difference is too large. The source code is as follows.

//calculate the highest on-bit of a multi-word quotient 
//given multi-word dividends and divisors
func calculateHighBitIndex(var dividend:[UInt64], var divisor:[UInt64]) 
                                                                ->UInt64{
    var calculatedIndex:UInt64 = 0
    if dividend != divisor{
        //cleanup the dividend and the divisor before processing
        //by removing any leading zero valued array elements
        dividend = removeLeadingZeros(dividend)
        divisor = removeLeadingZeros(divisor)
        //dividend's highest bit location ( log2 dividend) 
        //for 64 bit array elements
        let dividendLog2 = UInt64(floorLogBase2(dividend[0]) 
                      + (dividend.count - 1 ) << 6)
        //divisor's highest bit location (log2 divisor)
        //for 64 bit array elements
        let divisorLog2 = UInt64(floorLogBase2(divisor[0]) 
                      + (divisor.count - 1) << 6)
        //calculate the inter-highest on-bit index
        calculatedIndex = dividendLog2 - divisorLog2
        //calculate the maximum dividend value yielding the error
        var maxLocation:[UInt64] = subtract(multiply(divisor,
                                 shift_1_LeftBy(calculatedIndex)), 
                                                            [UInt64(1)])
        //check if the calculated index needs to be reduced by 1
        if compare (dividend, maxLocation) <= 0{
            calculatedIndex--
        }
    }
    return calculatedIndex
}

I created the floorLogBase2 function because the built-in log function works on doubles, is imprecise, and there is an easy, efficient algorithm to calculate log base 2 for unsigned ints. The code for this function is

func floorLogBase2(var x:UInt64) ->Int{
    //calculate the numeric value of the highest bit (1,2,4,8,etc.)
    x |= (x >> UInt64(1))
    x |= (x >> UInt64(2))
    x |= (x >> UInt64(4))
    x |= (x >> UInt64(8))
    x |= (x >> UInt64(16))
    x |= (x >> UInt64(32))
    let numericValueOfHighestBit = (x - (x >> 1))
    
    //lookup log base 2 of the value
    //it is faster to look up the log than calculate it
    return bitLocations[numericValueOfHighestBit]!
}

The last line of the floorLogBase2 function uses a Dictionary to look up the logarithm. The Dictionary’s key is 2^n and the value is n.

The other helper functions (compare, add, subtract, shift_1_leftBy, and removeLeadingZeros) aren’t that interesting. The source code for them will be part of the open source Swiftly Secure library I’ll be putting on GitHub when its complete.

When I started looking for a way to divide huge multi-word numbers, I wanted to find a way to do computerized division without the guesses required by other modern methods. I found one and a way to implement it in Swift. As I wrote the Swift code, I kept optimal computational practice in mind, but I make no claims regarding how ‘optimal’ this algorithm is compared to others. I plan on doing theoretical (big O) and actual comparisons a little later. First I have to find a way to efficiently raise huge multi-word numbers to huge multi-word powers without repeatedly multiplying the base number by itself. This is the last missing piece to completing Swiftly Secure’s implementation of the socialist millionaire protocol.

[UInt64]^[UInt64] is my next task and posting.

May 24, 2008

Handling very large integers in Erlang

Posted in erlang development tagged , , , , , , , , , , at 5:03 pm by tetontech

As mentioned in a previous post, Erlang appears to handle very large integers whose limit is RAM and other hardware limitations of the machine. In fact I have calculated 50,000! several times on my machine.
Multiplication, subtraction, and addition all work with these numbers without any programmer interference in Erlang. Division and other functionality uses the underlying OS functionality and so do not work with these large numbers.
I have written some code that allows division to work with these large numbers as well as a few other items such as m_choose_n, factorial, and floor. I have not spent time to evaluate speed optimizations for the code other than to use bit shifting to handle the actual division.
The division functions are:

  1. big_rem – similar to mod. It returns only the remainder of the division
  2. big_div_with_rem – returns both an integer quotient and remainder
  3. big_divide – returns a floating point value if it is possible to calculate it or an integer quotient and remainder if it is not possible

The code below is part of my simulation engine written in Erlang and is used in conjunction with the sim_dist library that contains random number distribution streams such as normal, binomial, negative binomial, and other types of streams the code for which will become available on sourceforge when I have finished testing it.
%% Copyright 2008 Lee S. Barney

%% This file is part of erl_sim.

%% erl_sim is free software: you can redistribute it and/or modify
%% it under the terms of the GNU Lesser General Public License as published by
%% the Free Software Foundation, either version 3 of the License, or
%% (at your option) any later version.
%%
%% erl_sim is distributed in the hope that it will be useful,
%% but WITHOUT ANY WARRANTY; without even the implied warranty of
%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
%% GNU Lesser General Public License for more details.
%%
%% You should have received a copy of the GNU Lesser General Public License
%% along with QuickConnectiPhoneHybrid. If not, see http://www.gnu.org/licenses/.
%% Description: This file contains functions for doing division with very large numbers as well as some other
%% useful functions
-module(sim_math).

%%
%% Include files
%%

%%
%% Exported Functions
%%
-export([big_divide/2, big_div_with_rem/2, big_rem/2, floor/1,
factorial/1, m_choose_n/2]).

%%
%% API Functions
%%

%%
%% TODO: Add description of div_with_rem/function_arity
%%
big_div_with_rem(Dividend, Divisor) ->
if
Divisor /= 0 ->
{ok,New_divisor,Cnt} = count_div_shifts(abs(Dividend), abs(Divisor), 0),
case big_div_with_rem_helper(abs(Dividend), abs(New_divisor),Cnt,0) of
{ok, Quotient, Remainder} when (Dividend < 0) xor (Divisor
{ok,-Quotient,-Remainder};
{ok, Quotient, Remainder} ->
{ok, Quotient, Remainder};
{error,Why} ->
{error,Why};
true ->
{error,"Unknown error"}
end;
true ->
{error,"Division by Zero error"}
end.

%%
%% TODO: Add description of big_divide/function_arity
%%
big_divide(Dividend, Divisor)->
%double limit. Erlang uses underlying system algorithms for floating point mathematics
Limit = math:pow(2,1023),
case big_div_with_rem(Dividend, Divisor) of
{ok,Quotient,Remainder} ->
if
Remainder == 0 ->
{ok, Quotient};

%the quotient must be less than the limit or else and error will happen
%when the result floating point value is added to it
%the remainder and the divisor must be less than the limit or else the
%division will fail.

Remainder < Limit andalso Divisor < Limit andalso Quotient
Result = Remainder/abs(Divisor),
io:format("Remainder/Divisor:~p Quotient:~p~n",[Result,Quotient]),
{ok, Quotient+Result};
true ->
{ok, Quotient,Remainder}
end;
{error,Why} ->
{error,Why}
end.

%%
%% TODO: Add description of big_rem/function_arity
%%
big_rem(A, B) ->
if
B /= 0 ->
{ok,New_divisor,Cnt} = count_div_shifts(abs(A), abs(B), 0),
case big_div_with_rem_helper(abs(A), abs(New_divisor),Cnt,0) of
{ok, Quotient, Remainder} when (A < 0) xor (B
{ok,-Remainder};
{ok, Quotient, Remainder} ->
{ok, Remainder};
{error,Why} ->
{error,Why};
true ->
{error,"Unknown error"}
end;
true ->
{error,"Division by Zero error"}
end.

%%
%% TODO: Add description of floor/function_arity
%%
floor(X) ->
T = trunc(X),
case X - T T - 1;
false -> T
end.

%factorial of N

factorial(N) ->
fact_helper(N, 1).

fact_helper(0, Product) ->
Product;
fact_helper(1, Product) ->
Product;
fact_helper(N, Product) ->
%io:format("N: ~p Product: ~p~n",[N, Product]),
fact_helper(N - 1, Product * N).

m_choose_n(M,N) ->
if
N > M orelse N < 0 orelse M 0;
true ->
factorial(M)/(factorial(N)*factorial(M-N))
end.

%%
%% Local Functions
%%
big_div_with_rem_helper(Dividend, Divisor, Count, Quotient) ->
%io:format("Dividend:~p Divisor:~p Count:~p~n",[Dividend, Divisor, Count, Quotient]),
if
Count
{ok, Quotient, Dividend};
true ->
if
Divisor =
big_div_with_rem_helper(Dividend-Divisor, Divisor bsr 1, Count-1, (Quotient bsl 1) +1);
true ->
big_div_with_rem_helper(Dividend, Divisor bsr 1, Count -1, Quotient bsl 1)
end
end.

count_div_shifts(Dividend, Divisor, Cnt) ->
if
Divisor =
count_div_shifts(Dividend, Divisor bsl 1, Cnt+1);
true ->
{ok,Divisor bsr 1, Cnt -1}
end.

%d bloggers like this: