Archive for August, 2009

Morton numbers

August 3, 2009

Long time no posting, but I have excuses (also I’m posting some at openstreetmap user diaries).

So anyway here’s a cheap trick I came up with but which you might already know.  If you’re indexing any georeferenced data, such as when doing any fun stuff with OpenStreetMap data, you’ve probably wanted to index by location among other things, and location is two or three dimensional (without loss of generality assume two as in GIS).  So obviously you can combine latitude and longitude as one key and index by that but that’s only good for searching for exact pairs of values.  If your index is for a hash table then you can’t hope for anything more but if it’s for sorting of an array you can do a little better (well, here’s my trick): convert the two numbers to fixed point and interleave their bits to make one number.  This is better because two positions that are close to each other in an array sorted by this number probably are close to each other on the map.  You could probably use floating point too if you stuff the exponent in the most significant bits and get a result similar to some degree.  With fixed point you can then compare only the top couple of bits when searching in the array to locate something with a desired accuracy.

Converting to and from the interleaved bits form is straight forward and you can easily come up with a O(log(number of bits)) procedure (5 steps for 32 bit lat / lon) or use lookup tables as suggested by the Bit Twiddling Hacks page, where I learnt they’re called Morton numbers.  32-bit lat/lon will give you a 64-bit number and that should be accurate enough for most uses if you map the whole -90 – 90 / -180 – 180 deg range to integers.  Even 20-bit lat/lon (5 bytes for the index) gives you 0.0003 deg accuracy.

So what else can you do with this notation? Obviously you can compare two numbers and use bisection search in arrays or the different kinds of trees.  You  can not add or subtract them directly (or rather, you won’t get useful results) but you can add / subtract individual coordinates without converting to normal notation and back, here’s how:

First separate latitude from longitude by masking:

uint64_t x = a & 0x5555555555555555;
uint64_t y = a & 0xaaaaaaaaaaaaaaaa;

Now you can subtract two numbers directly, you’ll notice that the carry flags are correctly carried over the unused bits, you’ll just need to mask them out of the result:

uint64_t difference(uint64_t a, uint64_t b)
{
	...
	return ((ax - bx) & 0x5555555555555555) |
		((ay - by) & 0xaaaaaaaaaaaaaaaa);
}

(you can also use this Masked Merge hack from the page I linked earlier).

The result is signed two’s complement with two sign bits in the top bits.

Now something much less obvious is that if you want to calculate absolute difference, you can call abs() directly on the result of subtraction and only mask out the unused bits afterwards.  How does this work?  The top bit in (ax - bx) always equals the sign bit even if ax and bx only use even bits (top bit is odd), so this part is ok.  Now, if the number is positive then there’s nothing to do with it.  If it’s negative, then abs negates it again (strips the minus).  Conveniently -x equals ~(x - 1) in two’s complement, so let’s see what these two operations do to a negative (ax - bx)~ or bitwise negation just works because it inverts all bits including the ones we’re interested in.  The x – 1 part also works because it flips all the bits until the first 1 bit starting from lowest bit, and you’ll find, although it may be tricky to see, that the first bit set in (ax - bx) is always even (or always odd).

uint64_t distance(uint64_t a, uint64_t b)
{
	...
	return (llabs(ax - bx) & 0x5555555555555555) |
		(llabs(ay - by) & 0xaaaaaaaaaaaaaaaa);
}

Addition requires a little trick for the carry flags to work: just set all unused bits in either ax or bx:

uint64_t sum(uint64_t a, uint64_t b)
{
	uint64_t ax = a & 0x5555555555555555;
	uint64_t ay = a & 0xaaaaaaaaaaaaaaaa;
	uint64_t bx = b | 0xaaaaaaaaaaaaaaaa;
	uint64_t by = b | 0x5555555555555555;

	return ((ax + bx) & 0x5555555555555555) |
		((ay + by) & 0xaaaaaaaaaaaaaaaa);
}