Indexing a map with byte slices

TLDR⌗

Since map[[]byte]_ are not possible in go, use map[string]_ and index your byte slices using string(byteSlice).

Need for byte slice map keys⌗

Go’s default hash.Hash interface returns hashes as slices of bytes.

type Hash interface {
    // ...
    Sum(b []byte) []byte
    // ...
}

As we often use cryptographic hashes to identify complex objects, associative arrays with hashes as keys are needed to provide efficient insertion and retrieval of those objects.

Problem⌗

In go, associative arrays are provided by the language through the map construct.

However, byte slices cannot be used as a map key, as explained in the language specification. Operators == and != must be defined for the map key type. Since slices are descriptors of an underlying array, equality is ambiguous different as multiple definitions can exist. This explains the lack of a standard definition in the language.

Comparisons of possible solutions⌗

So how can we still use byte slices as indexes to our maps ? Possibilities are:

avoid slices, use arrays if possible
convert the byte slices to a key type using a one-to-one function, such as string(), or hex.EncodeToString()
implement a custom map

Here is a benchmark of some possibilities, testing insertions, retrievals and deletions:

BenchmarkArrayKeyed: use byte array as a key
BenchmarkStringKeyed: convert byte slice to a key using string()
BenchmarkOptimizedStringKeyed: convert byte slice to a key using string(), and leverage an optimization described below
BenchmarkHexgKeyed: convert byte slice to a string using hex.EncodeToString().

BenchmarkArrayKeyed-8             	19557058	        60.6 ns/op
BenchmarkOptimizedStringKeyed-8   	17753274	        67.7 ns/op
BenchmarkStringKeyed-8            	11834193	        102 ns/op
BenchmarkHexKeyed-8               	 6212077	        192 ns/op

Benchmark code is available here.

As one can see, using the built-in string() conversion with the optimization leads to a very small performance loss compared to using byte arrays. Since we expect to use the Hash interface, the optimized string() conversion is to prefer.

Below I explain what is the optimized string() conversion.

Optimization⌗

Cases of v := m[string(byteSlice)] can be optimized by the go compiler. The optimization was introduced here.

In short, internally the compiler can construct a temporary unsafe string from the byte slice, because it knows it will only be used as an index:

func slicebytetostringtmp(b Slice) (s String) {
    // ...
	s.str = b.array;
	s.len = b.len;
}

This explains the difference in running times of:

// string only used as an index
v := m[string(byteSlice)]

// string may be used later
key := string(byteSlice)
v := m[key]