February 16, 2015

Swiftly Secure, Part 3 – Dividing Huge Numbers

Posted in iPhone development, mac development tagged , , , , , at 9:05 pm by tetontech

Let’s face reality. Computer division is ugly. It’s not the case, but it might seem to some people that those who conceived the computers we use today dropped the ball. Addition and Multiplication algorithms at both code and hardware levels are fairly straight forward, even if there is significant computational cost involved. Division, on the other hand, can seem like it was ‘tacked on’ afterwards. To think this thought is to do a disservice to the computational founders. The problem didn’t arise with them. It came from how we humans have done division for millennia. The age-old long hand grade school division algorithm taught to young people around the world is the problem. It is a guess-and-check process and computers are notoriously bad at guessing. It is no wonder the founders struggled. What else did they have to base division on?

When I set out to find a division algorithm to implement for Swiftly Secure and this posting, the other common division method, inverting the divisor, jumped right out of old and new computer science patents and research. The inversion idea is that a / b is the same as a * 1/b and it may be easier to calculate 1/b than to do the original division. Modern research into division seems to be fixated on optimizing variations on this approach. Each variation seems to share the same weakness as the grade school algorithm–they end up being guess-and-check algorithms trying to outdo Isaac Newton’s division algorithm or its more modern variants and replacements. This doesn’t resolve the problem, it shifts it to an area where guessing is hopefully easier.

I’m going to make a bold statement here. Computers are bad at guessing. I wanted to come up with an algorithm where no guessing was going on to use in Swiftly Secure. I think I’ve found one. It works for integer division for single word integers and huge numbers that require multiple words to represent them. I haven’t tested the algorithm for applicability to floating point division yet.

The idea behind the solution is this–since computers work in binary maybe there is something in the binary representations of the dividend, a, the divisor, b, and the quotient, q (a / b = q) that allows calculation of the quotient without guessing. There is. Finding it was a significant process and like all good answers to interesting questions, after I found it it the solution seemed obvious. Here it is simplified.

The difference between the index of the highest on-bit of a (the dividend) and b (the divisor) is often the index of the highest on-bit of q (the quotient).

A) if a / b = q

B) then  i[a] – i[b] = i[q] (where i means the index of the highest on-bit)

Where statement B isn’t true, the difference calculated is always one too large. When the difference is off is completely predictable, calculable, and derivable from  the value of the dividend (I’ll show how to do this later in this posting).

Interesting, but h0w does this help us not guess and check? Instead of guessing and checking by attacking the dividend and/or the divisor and then seeing if the quotient is ‘close enough’, as is done by Newton’s method and its decedents, this realization allows us to attack the quotient, q, directly. As an example lets look at a simple one-word division.

 

127 / 17 = 7 Remainder 8

decimal                 binary (bigendian)              highest on-bit index (zero based)

127                        01111111                                               6

17                         00010001                                            4

7                          00000111                                              2

 

i[127] = 6,

i[17] = 4, and

i[7] = 2

6-4 = 2

The answer’s highest on-bit index is calculable. This implies an algorithm that reduces the dividend while increasing the quotient.

1) calculate the quotient’ highest on-bit index.

2) calculate the numeric value of the quotient’s highest on-bit index

3) add the numeric value calculated to a running quotient accumulator

4) subtract from the dividend the numeric value * the divisor

5) loop back to 1) until the updated dividend is less than the divisor

6) the final value of the dividend is the division’s remainder

No guessing is involved nor is there a ‘stop when close enough’ guess as in Newton’s and many other divisor inversion methods. The code for this algorithm, even for huge multi-word unsigned integers, lays out easily in Swift.

func divide(var dividend:[UInt64], divisor:[UInt64]) 
                                  ->([UInt64],[UInt64],NSError?){
    var quotient = [UInt64]()
    /*
     * sanity checking for division by zero, empty arrays, and 
     * where the divisor > dividend goes here
     */
    //5) loop until the divisor > dividend
    while compare(dividend, divisor) >= 0{
        //1) calculate the highest on-bit index
        let highestQuotientBit = calculateHighBitIndex(dividend, divisor)
        //2) calculate the numeric value of the highest on-bit
        let dividendPartialValue = shift_1_LeftBy(highestQuotientBit)
        //3) update the value of the quotient collector
        quotient = add(quotient, dividendPartialValue)
        //4 subtract the calculated numeric value from the dividend
        let tooSmallBy = multiply(dividendPartialValue, divisor)
        dividend = subtract(dividend, tooSmallBy)
    }
    //the amount left in the dividend is the remainder
    return (removeLeadingZeros(quotient),removeLeadingZeros(dividend),nil)
}

In this algorithm I take advantage of the multiplication algorithm discussed and implemented in a previous post. There are several additional helper functions I’ve created to make the divide function’s implementation easier to understand.

The most interesting of these helper functions is calculateHighBitIndex. It calculates the index of the highest on-bit of the quotient for any divisor-dividend pair. To help you understand it, I’ll show the equations for calculating when the difference between the dividend and divisor’s highest on-bit index, described above, is too high by 1.

when a / b = q

if  a <= (b *  (1 << (i[a] – i[b]))) – 1

then i[a] – i[b] is too large by 1

This is a pattern I found by examining every integer division possible using unsigned 8 bit integers. I’ve been running an exhaustive check of all possible UInt64 divisions. It hasn’t failed yet but is obviously taking a long time. I’m also working on a mathematical proof but won’t bore you with it here.

Describing the equations using words, if 1 is left-shifted by the highest on-bit index difference, the result is then multiplied by the dividend, and reduced by 1, we get a value for the largest dividend where the highest on-bit index difference is too large. The source code is as follows.

//calculate the highest on-bit of a multi-word quotient 
//given multi-word dividends and divisors
func calculateHighBitIndex(var dividend:[UInt64], var divisor:[UInt64]) 
                                                                ->UInt64{
    var calculatedIndex:UInt64 = 0
    if dividend != divisor{
        //cleanup the dividend and the divisor before processing
        //by removing any leading zero valued array elements
        dividend = removeLeadingZeros(dividend)
        divisor = removeLeadingZeros(divisor)
        //dividend's highest bit location ( log2 dividend) 
        //for 64 bit array elements
        let dividendLog2 = UInt64(floorLogBase2(dividend[0]) 
                      + (dividend.count - 1 ) << 6)
        //divisor's highest bit location (log2 divisor)
        //for 64 bit array elements
        let divisorLog2 = UInt64(floorLogBase2(divisor[0]) 
                      + (divisor.count - 1) << 6)
        //calculate the inter-highest on-bit index
        calculatedIndex = dividendLog2 - divisorLog2
        //calculate the maximum dividend value yielding the error
        var maxLocation:[UInt64] = subtract(multiply(divisor,
                                 shift_1_LeftBy(calculatedIndex)), 
                                                            [UInt64(1)])
        //check if the calculated index needs to be reduced by 1
        if compare (dividend, maxLocation) <= 0{
            calculatedIndex--
        }
    }
    return calculatedIndex
}

I created the floorLogBase2 function because the built-in log function works on doubles, is imprecise, and there is an easy, efficient algorithm to calculate log base 2 for unsigned ints. The code for this function is

func floorLogBase2(var x:UInt64) ->Int{
    //calculate the numeric value of the highest bit (1,2,4,8,etc.)
    x |= (x >> UInt64(1))
    x |= (x >> UInt64(2))
    x |= (x >> UInt64(4))
    x |= (x >> UInt64(8))
    x |= (x >> UInt64(16))
    x |= (x >> UInt64(32))
    let numericValueOfHighestBit = (x - (x >> 1))
    
    //lookup log base 2 of the value
    //it is faster to look up the log than calculate it
    return bitLocations[numericValueOfHighestBit]!
}

The last line of the floorLogBase2 function uses a Dictionary to look up the logarithm. The Dictionary’s key is 2^n and the value is n.

The other helper functions (compare, add, subtract, shift_1_leftBy, and removeLeadingZeros) aren’t that interesting. The source code for them will be part of the open source Swiftly Secure library I’ll be putting on GitHub when its complete.

When I started looking for a way to divide huge multi-word numbers, I wanted to find a way to do computerized division without the guesses required by other modern methods. I found one and a way to implement it in Swift. As I wrote the Swift code, I kept optimal computational practice in mind, but I make no claims regarding how ‘optimal’ this algorithm is compared to others. I plan on doing theoretical (big O) and actual comparisons a little later. First I have to find a way to efficiently raise huge multi-word numbers to huge multi-word powers without repeatedly multiplying the base number by itself. This is the last missing piece to completing Swiftly Secure’s implementation of the socialist millionaire protocol.

[UInt64]^[UInt64] is my next task and posting.

Advertisements

May 22, 2008

Numeric Limitations in Erlang

Posted in erlang development tagged , , , , at 4:29 pm by tetontech

The debugging process for the erlang distribution code continues. While debugging the binomial distribution last night I wrote a naive factorial calculator the code for which is common.
%factorial of N

factorial(N) ->
fact_helper(N, 1).
fact_helper(0, Product) ->
Product;
fact_helper(1, Product) ->
Product;
fact_helper(N, Product) ->
fact_helper(N - 1, Product * N).

Out of curiosity I started plugging in large numbers. The largest I have tried is 50,000! It took about 1.5 seconds to calculate and the result is much to large to post here. In fact, I was going to post it here but the number is too large to fit in the clipboard on my machine. Even if the sequence of digits was divided into 6 segments the segments still don’t fit in the clipboard.
This seems to indicate that for integer addition, subtraction, and multiplication the upper numerical bound is limited only by your hardware.

Division is different. It appears to use standard floating point type algorithms. Floating point numbers appear to have a limit similar to other languages and causes an error if the numbers are too large.
For example
sim_dist:factorial(20)/sim_dist:factorial(19).
yeilds 20 as expected.
sim_dist:factorial(200)/sim_dist:factorial(199).
** exception error: bad argument in an arithmetic expression
in operator '/'/2

throws a bad argument exception for the division operator. Floating point numbers therefor must have a maximum size that is less than 199!

An amazing language.

%d bloggers like this: