What is the fastest way to find the most significant 1 bit in a C integer?

Article: 2669 of ut.dcs.general
Xref: utcsri ut.cdf.general:4769 ut.dcs.general:2669
Path: utcsri!cs.toronto.edu!neto
Newsgroups: ut.dcs.general,ut.cdf.general
From: neto at cs toronto edu (David Neto)
Subject: Fast bit fiddling question
X-Nntp-Posting-Host: dvp.cs.toronto.edu
Message-ID: <1997Jun26.172926.22089@jarvis.cs.toronto.edu>
Organization: CS Lab, University of Toronto
Distribution: tor
Date: 26 Jun 97 21:29:27 GMT
Lines: 64
Summary: Can I find the most significant 1 bit in a machine word in constant time?

By constant I mean independent of the machine word width w (e.g. w=32).
Can I do it in ANSI C?  This problem came up in coding the innermost loop
of my research code, so it's important to me.

Suppose I have a positive integer stored in C int variable t.  I can find
the top 1 bit in t as follows:

    t |= t>>1;
    t |= t>>2;
    t |= t>>4;
    t |= t>>8;
    t |= t>>16;

This assumes t is a 32-bit integer.  If t is 64 bits wide, then I'd have
to append:

    t |= t>>32;

This code sequence copies the top 1 bit down using ORs and shifts, so that
t ends up with every position below the top 1 bit filled with 1's.  (That's
good enough for my purposes.  In fact, I can then isolate the top 1 as
follows:

    t ^ (t>>1)

That is, exclusive-or t with a right-shifted copy of t.)

The above code sequence requires at least log w time.  I don't know how to
do it faster, other than by table lookup, but table lookup in an array
with 2^w entries is just not practical. :)  (I suppose I could use a
hybrid approach and achieve log w / log log w time, but it's hardly worth
the bother, since the code sequence above has no branches and I think is
very fast on modern CPU architectures.)

In contrast, I can find the least significant 1 bit in t as follows:

    s = t^(t-1)

If the lowest 1 bit in t is in position j (counting the LSB as position 0),
then s has j+1 `1' bits in the low positions, and `0' bits everywhere else.
This is again good enough.
If I were able to reverse the order of bits in a word in constant time,
then I'd be home free on finding the top 1 bit, because I could use the
``find low 1 bit'' code sequence wrapped in the appropriate bit-order
reversal, masking, and negating code sequences.  (Exercise.)

So, my question is: am I just not knowledgeable enough, or is there
something inherent in ``find the top 1 bit''?  Since this is a coding
problem, I'd like an answer in C.  Does anyone know of an architecture
that has a ``reverse the order of bits in a word'' instruction?

Cheers,
David

P.S. I've looked in HAKMEM.  (See my programming web page.)
--
David Neto                                 "The definitions are the hard part.
http://www.cs.utoronto.ca/~neto/home.html   Once you have the definitions, the
netod@acm.org (ACM Member since 1989)       rest is easy."  M. Akcoglu, MAT250Y

In the posting I forgot to mention that finding the most significant 1 bit
in a word is the same as computing

    2^floor(log(t)/log(2))

The exponentiation can be done with table lookup, by indexing into a table
of powers of two.

    t |= t>>1;
    t |= t>>2;
    t |= t>>4;
    t |= t>>8;
    t |= t>>16;

If C ints are 64 bits wide, then one must append:

    t |= t>>32;

I like this code sequence because all the work is done in registers: there
are no cache misses to contend with, and there are no branches.  However,
there is a critical path of data dependencies.  You can't have everything.

This code *usually* compiles to about log w instructions (SPARC, MIPS).
The main question is whether the CPU performs shifts by many bits
(2, 4, 8, 16, 32) in constant time.

    unsigned char wherebit[1<<16];  /* wherebit[i] = 1-based index of the
                                       leftmost 1 bit in the 16-bit value i;
                                       wherebit[0] = 0 */

    if ((where = wherebit[t >> 16] + 16) == 16)
        where = wherebit[t];

This code splits the word into two halves, and uses table lookup on one or
both halves.  This idea trades space for time.  In fact, one can code
solutions with any table size that is a power of two.  I'd also try a
256-entry table.

I wonder about taking a cache miss on the table access. Of course, this is heavily dependent upon the context in which the code sequence is used.

    int the_exp;
    frexp((double)t, &the_exp);

Briefly: the cast converts t to double, and frexp then stores the binary
exponent of t into the_exp.

Hugh cautioned that the floating point type would have to have at least
*w* bits in its mantissa, and even then one would have to be careful
about rounding.  (The IEEE-754 standard gives the double format 53 bits of
significand precision, counting the implicit leading bit, so a 32-bit word
fits exactly.)

This code is constant time if the floating point manipulations are constant
time.
However, in my quick test, it appears
that the generated code (SPARC and MIPS)
performs a function call to `frexp`.
This might take a long time, relatively speaking.
(There may be ways to make your compiler include the
`frexp` code inline.) Gordon Turpin sent implementation-dependent
code to extract the exponent directly.

From: Wayne Hayes
Subject: Re: Fast bit fiddling question
Date: Fri, 27 Jun 1997 07:56:48 -0400

I don't think it's possible, because if you *could* do it, then you would
finish your thesis sooner than expected, and I think there's a theorem
that says this is impossible.

--
"C provides you with plenty of rope.  If you're  || Wayne Hayes
not careful, you can easily hang yourself with   || wayne@cdf.utoronto.ca
it."  -- CSRI 'C' introduction, many Moons ago.  || Designated Loafer
"I'd hate to meet Darth Vader's evil twin." -- me

Dave Wortman mentioned that some machines have an instruction for finding the most significant 1 bit in a word. It was intended for priority scheduling code in operating systems. Unfortunately, I don't know a portable way to access it in C.

For now, my shifting and OR'ing sequence stays in my code, even if I can't call it constant time.

I thank all who took the time to reply in the newsgroups or via email.
