Last week someone I mentor asked for help with a programming challenge that involved some bit twiddling. The first part of the problem was
A number is known as special bit number if its binary representation contains atleast two consecutive 1’s or set bits. For example 7 wih binary representation
111
is a special bit number. Similarly 3 (11
) is also a special bit number.
The naive implementation was quite simple but got me thinking if there were better options.
1. Right Shift and Counter
This is the most straightforward approach where we check the least significant
bit (LSB) and increment a counter if it is 1
. As soon as the counter hits 2
we exit or reset to 0
if the bit is not set.
func isSpecial(n uint32) bool {
for wasLastBitOne := false; n > 0; n = n >> 1 {
isCurrentBitOne := n%2 == 1
if isCurrentBitOne && wasLastBitOne {
return true
}
wasLastBitOne = isCurrentBitOne
}
return false
}
Now let’s write a quick benchmark to see how it performs.
func BenchmarkIsSpecial(b *testing.B) {
for n := 0; n < b.N; n++ {
isSpecial(uint32(n))
}
}
And the results are: you can perform about a 100 million of these per second.
> go test -bench 'BenchmarkIsSpecial$'
goos: darwin
goarch: amd64
pkg: github.com/zqureshi/special
BenchmarkIsSpecial-8 200000000 9.46 ns/op
PASS
ok github.com/zqureshi/special 2.861s
2. Speedup via Lookup Table
We are working with 32-bit unsigned integers and notice that checking an 8-bit long subsequence is no different than checking the whole number. For an 8-bit sequence we have 256 (2**8) possible combinations and can precompute a lookup table of all possible results.
var lookupTable = [256]bool{}
func init() {
fmt.Println("Recomputing lookup table...")
for i := uint32(0); i < 256; i++ {
lookupTable[i] = isSpecial(i)
}
}
func isSpecialLookup(n uint32) bool {
if lookupTable[3] == false {
panic("Lookup table not initialized!")
}
return lookupTable[uint8(n)] ||
lookupTable[uint8(n>>7)] ||
lookupTable[uint8(n>>14)] ||
lookupTable[uint8(n>>21)] ||
lookupTable[uint8(n>>24)]
}
An interesting thing to note here is that we cannot just check four bit ranges
0-7, 8-15, 16-23, 28-31
because we will miss numbers that have consecutive
bits on the boundary i.e. bit 7 and 8, therefore we must always check
overlapping ranges 0-7, 7-14, 14-21, 21-28, 24-31
. In the last comparison of
bits 24-31
we re-check bits 24-28
but it is easier to just do so than special
case it and add extra logic.
And now let’s also benchmark this.
func BenchmarkIsSpecialLookup(b *testing.B) {
for n := 0; n < b.N; n++ {
isSpecialLookup(uint32(n))
}
}
> go test -bench '.'
Recomputing lookup table...
goos: darwin
goarch: amd64
pkg: github.com/zqureshi/special
BenchmarkIsSpecial-8 200000000 9.85 ns/op
BenchmarkIsSpecialLookup-8 1000000000 2.60 ns/op
PASS
ok github.com/zqureshi/special 5.817s
We get an almost 4x speedup using this approach as we are doing lesser comparisons and memory writes but also due to the fact that the whole lookup table fits in 32 bytes of memory which sits cosily in a 64 byte cache line.
3. Larger Lookup Table
We could extend the previous approach and precompute 65536 (2**16) values which would
allow us to cover the whole number in just 3 comparisons 0-15, 15-30, 16-31
.
func isSpecialLookup16(n uint32) bool {
if lookupTable[3] == false {
panic("Lookup table not initialized!")
}
return lookupTable[uint16(n)] ||
lookupTable[uint16(n>>15)] ||
lookupTable[uint16(n>>16)]
}
And benchmark
func BenchmarkIsSpecialLookup16(b *testing.B) {
for n := 0; n < b.N; n++ {
isSpecialLookup16(uint32(n))
}
}
Recomputing lookup table...
goos: darwin
goarch: amd64
pkg: github.com/zqureshi/special
BenchmarkIsSpecial-8 200000000 9.57 ns/op
BenchmarkIsSpecialLookup-8 1000000000 2.62 ns/op
BenchmarkIsSpecialLookup16-8 1000000000 2.31 ns/op
PASS
ok github.com/zqureshi/special 8.325s
A 12% improvement in runtime but a 256x blowup in table size (32 bytes vs 8192 bytes). While this performs better in this synthetic benchmark it might not perform as well in production because of cache line and page misses.
4. Finding Something Better
At this point I was quite happy with the progress but was wondering if there was some intrinsic property of numbers that could be exploited to make this even faster. I landed upon this HackerRank post and a collection of bit twiddling hacks from Stanford’s graphics department.
The solution was quite ingenious, if you take any binary number it will have
runs of set bits followed by one or more unset bits. Now if you left shift the
number, you move the 0
left and now if you bitwise and it with the original
number you have effectively cleared one set bit from a run. If you repeat this
process you will clear another bit, and another until the number is 0.
A worked out example will demonstrate this better.
14 = 1110
14 << 1 = 11100
a & b = 01100
12 = 1100
12 << 1 = 11000
a & b = 01000
8 = 1000
8 << 1 = 10000
a & b = 00000
We can then simplify this algorithm to derive that if a number has at least two consecutive bits set, then left shift followed by bitwise and will result in a non-zero value. Let’s work through this.
12 = 1100
12 << 1 = 11000
a & b = 01000
10 = 1010
10 << 1 = 10100
a & b = 00000
So now the code simply becomes
func isSpecialLeftShift(n uint32) bool {
return (n & (n << 1)) > 0
}
and benchmark
func BenchmarkIsSpecialLeftShift(b *testing.B) {
for n := 0; n < b.N; n++ {
isSpecialLeftShift(uint32(n))
}
}
Recomputing lookup table...
goos: darwin
goarch: amd64
pkg: github.com/zqureshi/special
BenchmarkIsSpecial-8 200000000 9.55 ns/op
BenchmarkIsSpecialLookup-8 1000000000 2.60 ns/op
BenchmarkIsSpecialLookup16-8 1000000000 2.32 ns/op
BenchmarkIsSpecialLeftShift-8 2000000000 0.27 ns/op
PASS
ok github.com/zqureshi/special 8.871s
This is a 35x improvement over our naive approach and only involves 3 instructions, which means it is also a candidate for inlining.
5. Epilogue
One thing that we didn’t talk about was how did I verify that my implementations were correct? I implemented the naive version and manually tested it on various inputs. Then for every other implementation I iterated through the first billion integers and reconciled the optimized against the naive.
func main() {
for n := uint32(0); n < 1000000000; n++ {
result := isSpecial(n)
if isSpecialLookup(n) != result {
fmt.Printf("Error: %v == %b isSpecial: %v, isSpecialLookup: %v\n", n, n, result, isSpecialLookup(n))
break
}
if isSpecialLookup16(n) != result {
fmt.Printf("Error: %v == %b isSpecial: %v, isSpecialLookup16: %v\n", n, n, result, isSpecialLookup16(n))
break
}
if isSpecialLeftShift(n) != result {
fmt.Printf("Error: %v == %b isSpecial: %v, isSpecialLeftShift: %v\n", n, n, result, isSpecialLeftShift(n))
break
}
}
}