David Neto's code tip: top 1 bit

Created July 3, 1997

What is the fastest way to find the most significant 1 bit in a C integer?

Contents

    The problem
    Solutions
        In log w time
        Trading space for time
        In constant time
        A non-standard function call
        Tongue in cheek
    Conclusion

The problem

Here's my posting to ut.dcs.general and ut.cdf.general:
Article: 2669 of ut.dcs.general
Xref: utcsri ut.cdf.general:4769 ut.dcs.general:2669
Path: utcsri!cs.toronto.edu!neto
Newsgroups: ut.dcs.general,ut.cdf.general
from: neto at cs toronto edu (David Neto)
Subject: Fast bit fiddling question
X-Nntp-Posting-Host: dvp.cs.toronto.edu
Message-ID: <1997Jun26.172926.22089@jarvis.cs.toronto.edu>
Organization: CS Lab, University of Toronto
Distribution: tor
Date: 26 Jun 97 21:29:27 GMT
Lines: 64


Summary:  Can I find the most significant 1 bit in a machine word in
constant time?  By constant I mean independent of the machine word
width w. (e.g. w=32.)  Can I do it in ANSI C?

This problem came up in coding the innermost loop in my research code,
so it's important to me.

Suppose I have a positive integer stored in C language int variable t. 
I can find the top 1 bit in t as follows:

    t |= t>>1;
    t |= t>>2;
    t |= t>>4;
    t |= t>>8;
    t |= t>>16;

This assumes t is a 32 bit integer.  If t is 64 bits wide, then I'd have
to append:

    t |= t>>32;

This code sequence copies the 1 bit down using ORs and shifts, so that
t ends up with all positions lower than the top 1 bit being filled with
1's.  (That's good enough for my purposes.  In fact, I can then
isolate the top 1 as follows: 
	t ^ (t>>1)
That is, exclusive-or t with a right-shifted copy of t.)

The above code sequence requires at least log w time.

I don't know how to do it faster, besides using table lookup.
But table lookup in an array with 2^w entries is just not practical.  :)
(I suppose I could use a hybrid approach and achieve log w / log log w
time, but it's hardly worth the bother, since the code sequence above
has no branches and I think is very fast on modern CPU architectures.)

In contrast, I can find the least significant 1 bit in t as follows:
	s=t^(t-1)
If the lowest 1 bit in t is in position j (counting the LSB as 0), 
then s has j+1 `1' bits in the low positions, and `0' bits everywhere else.
This is again good enough.

If I were able to reverse the order of bits in a word in constant time,
then I'd be home free on finding the top 1 bit because I could use the
``find low 1 bit'' code sequence wrapped around by the appropriate
bit-order reversal and masking and negating code sequences.
(Exercise.)

So, my question is, am I just not knowledgeable enough?  Or is there
something inherently hard about ``find the top 1 bit''?  Since this is
a coding problem, I'd like an answer in C.  Does anyone know of an
architecture that has a ``reverse the order of bits in a word''
instruction?
that has a ``reverse the order of bits in a word'' instruction?


Cheers,
David

P.S.  I've looked in HAKMEM.  (See my programming web page.)

-- 
David Neto                                  "The definitions are the hard part.
http://www.cs.utoronto.ca/ neto/home.html    Once you have the definitions, the
netod@acm.org (ACM Member since 1989)        rest is easy." M. Akcoglu, MAT250Y

In the posting I forgot to mention that finding the most significant 1 bit in a word is the same as computing

    2^floor(log2(t))

The exponentiation can be done with table lookup, indexing into a table of w precomputed powers of two. The difficulty then becomes computing the floor of the base-2 logarithm of a positive integer.

Solutions

Here are the best solutions I got, including my own. I thank Vince Gogan, Kal Lin, Hugh Redelmeier, Gordon Turpin, and Dave Wortman for working code. (Some people sent code for a different problem. Alas.)

In log w time

Here's my original solution.
    t |= t>>1;
    t |= t>>2;
    t |= t>>4;
    t |= t>>8;
    t |= t>>16;
If C ints are 64 bits wide, then append
    t |= t>>32;
I like this code sequence because all the work is done in registers. There are no cache misses to contend with. Also, there are no branches. However, there is a critical path of data dependencies. You can't have everything.

This code usually compiles to about log w instructions (SPARC, MIPS). The main question is whether the CPU performs shifts by multiple bit positions (2, 4, 8, 16, 32) in constant time, i.e. whether it has a barrel shifter.

Trading space for time

I rejected using table lookup on a table of 2^w integers. However, Dave Wortman sent the following code for 32 bit integers:
        unsigned char wherebit [ 1 << 16 ] ;  /* note: 2^16 in C is XOR, i.e. 18 */
        /* initialized to index of leftmost bit in 16 bits, 
           wherebit[ 0 ] = 0 
        */
        if( t >> 16 )   /* high half nonzero? */
                where = wherebit[ t >> 16 ] + 16 ;
        else
                where = wherebit[ t ] ;
This code splits the word into two halves, and uses table lookup on one or both halves. This idea trades space for time. In fact, one can code solutions with any table size that is a power of two. I'd also try a 256 entry table.

I wonder about taking a cache miss on the table access. Of course, this is heavily dependent upon the context in which the code sequence is used.

In constant time

Hugh Redelmeier and Gordon Turpin observed that floating point numbers are normalized before storing, and suggested extracting the exponent. Hugh pointed out the ANSI C library function frexp. Here's how to use it:
        int the_exp;
        frexp((double)t,&the_exp);
Briefly, the cast to the double type converts integer t into a floating point value, f*2^e, where f lies in [1/2, 1) and e is an integer. The call to frexp returns f and stores the integer exponent e in the_exp. Then we can use table lookup, using index the_exp.

Hugh cautioned that the floating point type would have to have at least w bits in its mantissa, and even then one would have to be careful about rounding. (I believe the IEEE-754 standard mandates at least 53 bits of precision in the mantissa of a double precision value.)

This code is constant time if the floating point manipulations are constant time. However, in my quick test, it appears that the generated code (SPARC and MIPS) performs a function call to frexp. This might take a long time, relatively speaking. (There may be ways to make your compiler include the frexp code inline.) Gordon Turpin sent implementation-dependent code to extract the exponent directly.

A non-standard function call

Vince Gogan pointed out the ffs function. It returns the one-based index of the first (least significant) 1 bit in an integer, and zero if the integer is zero; so it directly solves the companion ``find the low 1 bit'' problem rather than the top-bit one. Function ffs is a non-standard extension to the C library under SunOS. For some reason, it's declared in the string.h header file. Who knows what trickery it performs behind the scenes?

Tongue in cheek

From:	Wayne Hayes 
Subject: Re: Fast bit fiddling question
Date:	Fri, 27 Jun 1997 07:56:48 -0400

I don't think it's possible, because if you *could* do it, then you
would finish your thesis sooner than expected, and I think there's a
theorem that says this is impossible.

-- 
"C provides you with plenty of rope.  If you're ||        Wayne Hayes       |
not careful, you can easily hang yourself with  ||  wayne@cdf.utoronto.ca   |
it."  -- CSRI 'C' introduction, many Moons ago. ||    Designated Loafer     |
             "I'd hate to meet Darth Vader's evil twin." -- me

Conclusion

Clearly, there is more than one way to slice a banana. Sometimes ``fewer instructions'' doesn't mean ``faster code''. On today's machines, all sorts of other factors come into play: caches, register usage, pipelining, etc. One might achieve ``constant time'' in theory, but in practice one may be foiled by implementation choices in the underlying language, compiler and CPU.

Dave Wortman mentioned that some machines have an instruction for finding the most significant 1 bit in a word. It was intended for priority scheduling code in operating systems. Unfortunately, I don't know a portable way to access it in C.

For now, my shifting and OR'ing sequence stays in my code, even if I can't call it constant time.

I thank all who took the time to reply in the newsgroups or via email.

