Introducing Anvil

Go is a good language (link to my thoughts impending). I like its philosophy, which is that software is a complex enough without your tools introducing more complexity. In my 10 or so years of writing software professionally (sometimes software with demanding concurrency and performance requirements), I’ve found that you can get away with a good resizable array and hashtable implementation for 99.9% of the problems out there. If you have some nice syntactic jujubees over the array, you get your stacks and your queues for free, which is what Go opted for.

Another thing that I mostly like about Go’s hashtable (map) implementation is that it’s restrictive as to what you can use for keys. I can’t tell you how many bugs I’ve seen introduced in Java projects from someone implementing hashCode and equals methods incorrectly, or forgetting to implement one of them when they do the other. In fact, hashing is such a misunderstood topic that I have no doubt that omitting hash-based structures from the C++ standard until C++11 was in part due to that. In summary, Go made the right choice in restricting their standard container library and building it into the language–mostly.

Limitations

Online sorting

It’s a bit baffling that there is no sorted map or set. The online sorting provided by a good red-black tree implementation is frequently useful, especially when you’re creating networked middleware or (in my case), online algorithms for rewriting trees and graphs.

Iteration and Access

The idiom for iteration in go is the range operator. The only arguments that the range operator accepts are native Go types (chan, map, array), so the language does not help you in creating abstractions for something fundamental like iteration. While this is by design, it can in many cases hinder good software design. Maybe there’s a super-hot portion of an algorithm wherein iteration over the default implementation of a map just isn’t cutting it (hash tables require that you iterate over their entire backing store in the worst case, which can be substantially larger than the number of items they contain). Whatever your case may be, when you encounter a limitation in the core Go container, you must

  • Implement a new container that meets your requirement
  • Adapt the algorithm to suit your needs

I give you Anvil

Oddly, I do not notice any widely-used or maintained collections collections for Go. For Updraft, I had already had to implement a red-black tree. I then had a situation wherein an Updraft user could legitimately need different hashing characteristics. Finally, not being able to use an IR node as a key turned out to be a pain. So, I decided to implement my own collections framework with the intent that it should:

  • Be simple and similar to existing collections frameworks
  • Be Correct
  • Be Idiomatic
  • Be Consistent
  • Be Literate
  • Get the small things like serialization right

So far, the interfaces are still in flux and in a state of partial implementation, but I have implementations for a sorted map (RB tree) and an unordered map (linear probing hash table) that can be used as follows:

Map Creation

Unordered linear-probing hashmap

    hash := anvil.
        Maps().
        Hashes(anvil.Hashing.LinearProbing).
        DefaultLoadFactor().
        InitialCapacity(10).
        KeyedBy(strings.DefaultHash)

Sorted map using red-black balancing

    rbtree := anvil.Maps().Sorted(anvil.Trees.RedBlack).OrderBy(cmp)

Iteration

Stateful iterator (very fast, available on all collections)

    for iter := map.Iterator(); iter.HasNext(); {
        value := iter.Next()
        //etc
    }

Range iterator (slower than stateful iterator, but fast enough for most purposes, available on all collections)

    for value := range m.Range() {
        //etc
    }

The collections are inspired by the JCF and .Net collections classes, but have a distinct Go flavor.

Performance

One of these days when I’m not getting married and starting a new job, I’ll post full benchmarks of everything relative to builtin types. My preliminary impression is that they are competitive with the corresponding default implementations (I was actually pleasantly surprised that my linear probing hashmap is virtually indistinguishable in terms of insertion and lookup performance from the default implementation for certain keys (strings) and load-factors (.85)). As the collections mature, they’ll get even faster.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: