Indexing a map with byte slices
TLDR⌗
Since map[[]byte]_
are not possible in go, use map[string]_
and index your byte slices using string(byteSlice)
.
Need for byte slice map keys⌗
Go’s default hash.Hash
interface returns hashes as slices of bytes.
type Hash interface {
// ...
Sum(b []byte) []byte
// ...
}
As we often use cryptographic hashes to identify complex objects, associative arrays with hashes as keys are needed to provide efficient insertion and retrieval of those objects.
Problem⌗
In go, associative arrays are provided by the language through the map
construct.
However, byte slices cannot be used as a map key, as explained in the language specification. Operators ==
and !=
must be defined for the map key type.
Since slices are descriptors of an underlying array, equality is ambiguous different as multiple definitions can exist.
This explains the lack of a standard definition in the language.
Comparisons of possible solutions⌗
So how can we still use byte slices as indexes to our maps ? Possibilities are:
- avoid slices, use arrays if possible
- convert the byte slices to a key type using a one-to-one function, such as
string()
, orhex.EncodeToString()
- implement a custom map
Here is a benchmark of some possibilities, testing insertions, retrievals and deletions:
BenchmarkArrayKeyed
: use byte array as a keyBenchmarkStringKeyed
: convert byte slice to a key usingstring()
BenchmarkOptimizedStringKeyed
: convert byte slice to a key usingstring()
, and leverage an optimization described belowBenchmarkHexgKeyed
: convert byte slice to a string usinghex.EncodeToString()
.
BenchmarkArrayKeyed-8 19557058 60.6 ns/op
BenchmarkOptimizedStringKeyed-8 17753274 67.7 ns/op
BenchmarkStringKeyed-8 11834193 102 ns/op
BenchmarkHexKeyed-8 6212077 192 ns/op
Benchmark code is available here.
As one can see, using the built-in string()
conversion with the optimization leads to a very small performance loss compared to using byte arrays.
Since we expect to use the Hash
interface, the optimized string()
conversion is to prefer.
Below I explain what is the optimized string()
conversion.
Optimization⌗
Cases of v := m[string(byteSlice)]
can be optimized by the go compiler. The optimization was introduced here.
In short, internally the compiler can construct a temporary unsafe string from the byte slice, because it knows it will only be used as an index:
func slicebytetostringtmp(b Slice) (s String) {
// ...
s.str = b.array;
s.len = b.len;
}
This explains the difference in running times of:
// string only used as an index
v := m[string(byteSlice)]
// string may be used later
key := string(byteSlice)
v := m[key]