What is the fastest way to find the most significant 1 bit in a C integer?
Article: 2669 of ut.dcs.general Xref: utcsri ut.cdf.general:4769 ut.dcs.general:2669 Path: utcsri!cs.toronto.edu!neto Newsgroups: ut.dcs.general,ut.cdf.general from: neto at cs toronto edu (David Neto) Subject: Fast bit fiddling question X-Nntp-Posting-Host: dvp.cs.toronto.edu Message-ID: <1997Jun26.172926.22089@jarvis.cs.toronto.edu> Organization: CS Lab, University of Toronto Distribution: tor Date: 26 Jun 97 21:29:27 GMT Lines: 64 Summary: Can I find the most significant 1 bit in a machine word in constant time? By constant I mean independent of the machine word width w. (e.g. w=32.) Can I do it in ANSI C? This problem came up in coding the innermost loop in my research code, so it's important to me. Suppose I have a positive integer stored in C language int variable t. I can find the top 1 bit in t as follows: t |= t>>1; t |= t>>2; t |= t>>4; t |= t>>8; t |= t>>16; This assumes t is a 32 bit integer. If t is 64 bits wide, then I'd have to append: t |= t>>32; This code sequence copies the 1 bit down using ORs and shifts, so that t ends up with all positions lower than the top 1 bit being filled with 1's. (That's good enough for my purposes. In fact, I can then isolate the top 1 as follows: t ^ (t>>1) That is, exclusive-or t with a right-shifted copy of t.) The above code sequence requires at least log w time. I don't know how to do it faster, besides using table lookup. But table lookup in an array with 2^w entries is just not practical. :) (I suppose I could use a hybrid approach and achieve log w / log log w time, but it's hardly worth the bother, since the code sequence above has no branches and I think is very fast on modern CPU architectures.) In contrast, I can find the least significant 1 bit in t as follows: s=t^(t-1) If the lowest 1 bit in t is in position j (counting the LSB as 0), then s has j+1 `1' bits in the low positions, and `0' bits everywhere else. This is again good enough. If I were able to reverse the order of bits in a word in constant time, then I'd be home free on finding the top 1 bit because I could use the ``find low 1 bit'' code sequence wrapped around by the appropriate bit-order reversal and masking and negating code sequences. (Exercise.) So, my question is, am I just not knowledgable enough? Or is there something inherent in ``find the top 1 bit''. Since this is a coding problem, I'd like an answer in C. Does anyone know of an architecture that has a ``reverse the order of bits in a word'' instruction? Cheers, David P.S. I've looked in HAKMEM. (See my programming web page.) -- David Neto "The definitions are the hard part. http://www.cs.utoronto.ca/ neto/home.html Once you have the definitions, the netod@acm.org (ACM Member since 1989) rest is easy." M. Akcoglu, MAT250YIn the posting I forgot to mention that finding the most significant 1 bit in a word is the same as computing
2^floor(log(t)/log(2))The exponentiation can be done with table lookup by indexing into a table of w precomputed values. The difficulty now becomes computing the floor of the base 2 logarithm of a positive integer.
t |= t>>1; t |= t>>2; t |= t>>4; t |= t>>8; t |= t>>16;If C ints are 64 bits wide, then append
t |= t>>32;I like this code sequence because all the work is done in registers. There are no cache misses to contend with. Also, there are no branches. However, there is a critical path of data dependencies. You can't have everything.
This code usually compiles to logw
instructions (SPARC,
MIPS). The main question is whether
the CPU performs shifts by many bits (2, 4, 8, 16, 32) in constant time.
Trading space for time
I rejected using table lookup on a table of 2^w integers.
However, Dave Wortman sent the following code for 32 bit integers:
unsigned char wherebit [ 2^16 ] ; /* initialized to index of leftmost bit in 16 bits, wherebit[ 0 ] = 0 */ if(( where = wherebit[ t >> 16 ] + 16 ) == 16 ) where = wherebit[ t ] ;This code splits the word into two halves, and uses table lookup on one or both halves. This idea trades space for time. In fact, one can code solutions with any table size that is a power of two. I'd also try a 256 entry table.
I wonder about taking a cache miss on the table access. Of course,
this is heavily dependent upon the context in which the code sequence
is used.
In constant time
Hugh Redelmeier and Gordon Turpin observed that floating point
numbers are normalized before storing, and suggested extracting the
exponent. Hugh pointed out the ANSI C library function
frexp. Here's how to use it:
int the_exp; frexp((double)t,&the_exp);Breifly, the cast to the double type converts integer t into a floating point value, f*2^e, where f lies between 1/2 and 1, and e is an integer. The return value of the call to frexp is f, and the integer exponent e is stored in the_exp. Then we can use table lookup, using index the_exp.
Hugh cautioned that the floating point type would have to have at least w bits in its mantissa, and even then one would have to be careful about rounding. (I believe the IEEE-754 standard mandates at least 53 bits of precision in the mantissa of a double precision value.)
This code is constant time if the floating point manipulations are constant
time.
However, in my quick test, it appears
that the generated code (SPARC and MIPS)
performs a function call to frexp.
This might take a long time, relatively speaking.
(There may be ways to make your compiler include the
frexp code inline.) Gordon Turpin sent implementation-dependent
code to extract the exponent directly.
A non-standard function call
Vince Gogan pointed out the ffs function.
It computes the floor of the base 2 logarithm of an integer.
Function ffs is a non-standard extension to the C library under SunOS.
For some reason, it's declared in the string.h header file.
Who knows what trickery it performs behind the scenes?
Tongue in cheek
From: Wayne HayesSubject: Re: Fast bit fiddling question Date: Fri, 27 Jun 1997 07:56:48 -0400 Status: RO I don't think it's possible, because if you *could* do it, then you would finish your thesis sooner than expected, and I think there's a theorem that says this is impossible. -- "C provides you with plenty of rope. If you're || Wayne Hayes | not careful, you can easily hang yourself with || wayne@cdf.utoronto.ca | it." -- CSRI 'C' introduction, many Moons ago. || Designated Loafer | "I'd hate to meet Darth Vader's evil twin." -- me
Dave Wortman mentioned that some machines have an instruction for finding the most significant 1 bit in a word. It was intended for priority scheduling code in operating systems. Unfortunately, I don't know a portable way to access it in C.
For now, my shifting and OR'ing sequence stays in my code, even if I can't call it constant time.
I thank all who took the time to reply in the newsgroups or via email.