August 27, 2018

Efficiently Checking if a Number Has Two Consecutive Bits Set

Last week someone I mentor asked for help with a programming challenge that involved some bit twiddling. The first part of the problem was

A number is known as special bit number if its binary representation contains atleast two consecutive 1’s or set bits. For example 7 wih binary representation 111 is a special bit number. Similarly 3 (11) is also a special bit number.

The naive implementation was quite simple but got me thinking if there were better options.

1. Right Shift and Counter

This is the most straightforward approach where we check the least significant bit (LSB) and increment a counter if it is 1. As soon as the counter hits 2 we exit or reset to 0 if the bit is not set.

func isSpecial(n uint32) bool {
	for wasLastBitOne := false; n > 0; n = n >> 1 {
		isCurrentBitOne := n%2 == 1
		if isCurrentBitOne && wasLastBitOne {
			return true
		}
		wasLastBitOne = isCurrentBitOne
	}

	return false
}

Now let’s write a quick benchmark to see how it performs.

func BenchmarkIsSpecial(b *testing.B) {
	for n := 0; n < b.N; n++ {
		isSpecial(uint32(n))
	}
}

And the results are: you can perform about a 100 million of these per second.

> go test -bench 'BenchmarkIsSpecial$'
goos: darwin
goarch: amd64
pkg: github.com/zqureshi/special
BenchmarkIsSpecial-8    200000000          9.46 ns/op
PASS
ok    github.com/zqureshi/special 2.861s

2. Speedup via Lookup Table

We are working with 32-bit unsigned integers and notice that checking an 8-bit long subsequence is no different than checking the whole number. For an 8-bit sequence we have 256 (2**8) possible combinations and can precompute a lookup table of all possible results.

var lookupTable = [256]bool{}

func init() {
	fmt.Println("Recomputing lookup table...")
	for i := uint32(0); i < 256; i++ {
		lookupTable[i] = isSpecial(i)
	}
}

func isSpecialLookup(n uint32) bool {
	if lookupTable[3] == false {
		panic("Lookup table not initialized!")
	}

	return lookupTable[uint8(n)] ||
		lookupTable[uint8(n>>7)] ||
		lookupTable[uint8(n>>14)] ||
		lookupTable[uint8(n>>21)] ||
		lookupTable[uint8(n>>24)]
}

An interesting thing to note here is that we cannot just check four bit ranges 0-7, 8-15, 16-23, 28-31 because we will miss numbers that have consecutive bits on the boundary i.e. bit 7 and 8, therefore we must always check overlapping ranges 0-7, 7-14, 14-21, 21-28, 24-31. In the last comparison of bits 24-31 we re-check bits 24-28 but it is easier to just do so than special case it and add extra logic.

And now let’s also benchmark this.

func BenchmarkIsSpecialLookup(b *testing.B) {
	for n := 0; n < b.N; n++ {
		isSpecialLookup(uint32(n))
	}
}
> go test -bench '.'
Recomputing lookup table...
goos: darwin
goarch: amd64
pkg: github.com/zqureshi/special
BenchmarkIsSpecial-8          200000000          9.85 ns/op
BenchmarkIsSpecialLookup-8    1000000000         2.60 ns/op
PASS
ok    github.com/zqureshi/special 5.817s

We get an almost 4x speedup using this approach as we are doing lesser comparisons and memory writes but also due to the fact that the whole lookup table fits in 32 bytes of memory which sits cosily in a 64 byte cache line.

3. Larger Lookup Table

We could extend the previous approach and precompute 65536 (2**16) values which would allow us to cover the whole number in just 3 comparisons 0-15, 15-30, 16-31.

func isSpecialLookup16(n uint32) bool {
	if lookupTable[3] == false {
		panic("Lookup table not initialized!")
	}

	return lookupTable[uint16(n)] ||
		lookupTable[uint16(n>>15)] ||
		lookupTable[uint16(n>>16)]
}

And benchmark

func BenchmarkIsSpecialLookup16(b *testing.B) {
	for n := 0; n < b.N; n++ {
		isSpecialLookup16(uint32(n))
	}
}
Recomputing lookup table...
goos: darwin
goarch: amd64
pkg: github.com/zqureshi/special
BenchmarkIsSpecial-8            200000000          9.57 ns/op
BenchmarkIsSpecialLookup-8      1000000000         2.62 ns/op
BenchmarkIsSpecialLookup16-8    1000000000         2.31 ns/op
PASS
ok    github.com/zqureshi/special 8.325s

A 12% improvement in runtime but a 256x blowup in table size (32 bytes vs 8192 bytes). While this performs better in this synthetic benchmark it might not perform as well in production because of cache line and page misses.

4. Finding Something Better

At this point I was quite happy with the progress but was wondering if there was some intrinsic property of numbers that could be exploited to make this even faster. I landed upon this HackerRank post and a collection of bit twiddling hacks from Stanford’s graphics department.

The solution was quite ingenious, if you take any binary number it will have runs of set bits followed by one or more unset bits. Now if you left shift the number, you move the 0 left and now if you bitwise and it with the original number you have effectively cleared one set bit from a run. If you repeat this process you will clear another bit, and another until the number is 0.

A worked out example will demonstrate this better.

14      =   1110
14 << 1 =  11100
a & b   =  01100

12      =   1100
12 << 1 =  11000
a & b   =  01000

8       =   1000
8  << 1 =  10000
a & b   =  00000

We can then simplify this algorithm to derive that if a number has at least two consecutive bits set, then left shift followed by bitwise and will result in a non-zero value. Let’s work through this.

12      =   1100
12 << 1 =  11000
a & b   =  01000

10      =   1010
10 << 1 =  10100
a & b   =  00000

So now the code simply becomes

func isSpecialLeftShift(n uint32) bool {
	return (n & (n << 1)) > 0
}

and benchmark

func BenchmarkIsSpecialLeftShift(b *testing.B) {
	for n := 0; n < b.N; n++ {
		isSpecialLeftShift(uint32(n))
	}
}
Recomputing lookup table...
goos: darwin
goarch: amd64
pkg: github.com/zqureshi/special
BenchmarkIsSpecial-8              200000000          9.55 ns/op
BenchmarkIsSpecialLookup-8        1000000000         2.60 ns/op
BenchmarkIsSpecialLookup16-8      1000000000         2.32 ns/op
BenchmarkIsSpecialLeftShift-8     2000000000         0.27 ns/op
PASS
ok    github.com/zqureshi/special 8.871s

This is a 35x improvement over our naive approach and only involves 3 instructions, which means it is also a candidate for inlining.

5. Epilogue

One thing that we didn’t talk about was how did I verify that my implementations were correct? I implemented the naive version and manually tested it on various inputs. Then for every other implementation I iterated through the first billion integers and reconciled the optimized against the naive.

func main() {
	for n := uint32(0); n < 1000000000; n++ {
		result := isSpecial(n)
		if isSpecialLookup(n) != result {
			fmt.Printf("Error: %v == %b isSpecial: %v, isSpecialLookup: %v\n", n, n, result, isSpecialLookup(n))
			break
		}
		if isSpecialLookup16(n) != result {
			fmt.Printf("Error: %v == %b isSpecial: %v, isSpecialLookup16: %v\n", n, n, result, isSpecialLookup16(n))
			break
		}
		if isSpecialLeftShift(n) != result {
			fmt.Printf("Error: %v == %b isSpecial: %v, isSpecialLeftShift: %v\n", n, n, result, isSpecialLeftShift(n))
			break
		}
	}
}

August 24, 2018

Abbreviations in Vim

Vim has a nifty feature that allows you to define keywords that will then expand to a configurable string. The basic syntax is

:iabbrev {kw} {expansion}

I am going to define two abbreviations __w and __t that will expand to my personal homepage and to this journal respectively

:iabbrev __w http://zqureshi.in
:iabbrev __t https://til.zqureshi.in

Now when I’m in insert mode and type __w followed by a <Space> the expansion will be triggered. In the situation that you do not want the expansion you would type __w and then press <Ctrl-V> followed by any other character to insert it verbatim. Usually I’ll do <Ctrl-V><Space>.

August 22, 2018

HTML5 UP

While reading a post on Netlify’s blog I encountered html5up.net. These look like an excellent set of starter templates for any new project, really polished!

August 21, 2018

Makefile .ONESHELL

Suppose you wanted to take user input in your Makefile and then use it in subsequent commands you might try

newfile:
  @printf "Filename: "
  @read FILE
  touch $$FILE

This would not work because each command in that recipe is run in a separate shell. You’d have to cram it all in one line

newfile:
  @printf "Filename: " && read FILE && touch $$FILE

This works but is not super readable, fortunately make provides a phony target .ONESHELL which basically lets you write a full script inline in the recipe. The full contents of the recipe are passed to a single shell to be executed so you might write

.ONESHELL:
newfile:
  @printf "Filename: "
  @read FILE
  touch $$FILE

This would be almost correct except for the fact that the second @ would be passed verbatim to your shell, therefore with .ONESHELL only the first character is checked to see if it is special. So your final recipe looks like

.ONESHELL:
newfile:
  @printf "Filename: "
  read FILE
  touch $$FILE

Note: The default make command that ships with MacOS does not support .ONESHELL just yet, I had to brew install make and then invoke it via gmake to get things to work.

August 17, 2018

Platform Specific Go

You can compile platform specific go code by putting it in specially named files. If your component is called foo.go then putting something in foo_darwin.go will only compile that on Mac and foo_linux will do it on linux distros accordingly.

I recently encountered this while reading through Shopify’s sysv_mq implementation. This library is a wrapper around native syscalls via cgo.

The ipcStat() method in wrapper.go pulls out all the queue metadata from the native struct

// wrapper.go
stat := &QueueStats{
	Perm:  perm,
	Stime: int64(info.msg_stime),
	// Rtime:  int64(info.msg_rtime), // https://github.com/Shopify/sysv_mq/issues/10
	Ctime:  int64(info.msg_ctime),
	Cbytes: cbytesFromStruct(info),
	Qnum:   uint64(info.msg_qnum),
	Qbytes: uint64(info.msg_qbytes),
	Lspid:  int32(info.msg_lspid),
	Lrpid:  int32(info.msg_lrpid),
}

But it turns out that the Cbytes value is not in the same field across platforms, so instead of putting the fetching logic inline it is moved into a method cbytesFromStruct().

On Mac the value is fetched from msg_cbytes

// wrapper_darwin.go
func cbytesFromStruct(info *_Ctype_struct___msqid_ds_new) uint64 {
	return uint64(info.msg_cbytes)
}

whereas on linux it is available in __msg_cbytes

// wrapper_linux.go
func cbytesFromStruct(info *_Ctype_struct_msqid_ds) uint64 {
	return uint64(info.__msg_cbytes)
}

Now go build will only compile the specific version of cbytesFromStruct() based on the target platform. One thing to note here is that if you are building static binaries from your Mac to run on a *nix machine or container you’ll probably want to build it inside a container to force the correct version.