Grappling with Go

In a previous blog post, I observed that one of the medium-term possibilities we’re seriously considering for NTPsec is moving the entire codebase out of C. I continued that the two languages we’re looking at hard are Rust and Go. Then, in another blog post, I revealed the existence of a Go port of David Wheeler’s sloccount tool. In a third, I described my rather frustrating experience with Rust.

In this post I’m going to get a bit deeper into Go, loccount, my first learning experience with the language, and Go’s potential suitability for what we’re doing at NTPsec.

Note this piece is being published out of order of events - I actually wrote it well before my Rust vs. Go post, which I shipped early partly in response to the Rust discussion over on my personal blog.

Onward to Go…

I think writing toy programs - conscious finger exercises - is a terrible way to learn a new language. They’re boring, and they don’t make you think enough. For me, tutorials stuffed with pretty, bite-sized examples chosen by others don’t work well either - I feel like I’m being patronized by the authors, fed with excessively-safe bonbons when what I want to do is joyously tear off and bolt huge chunks of meat.

What does work for me is diving in - tackling a real problem of the right size, with a reference manual for the language to hand in a browser window.

The right size of problem can’t be too small, because if it is you don’t engage enough of the language to really learn it. Nor can it be so large that you’re at risk of getting lost in the problem itself, rather than staying at least half focused on exploring the implementation strategies that the language affords. In the ideal case, your right-sized problem is one you’re strongly motivated to solve independently of your desire to learn the language.

I was fortunate. I found an NTPsec-related problem that hit the sweet spot almost exactly. Later I’ll explain why it might have been just a tiny bit too small, but in retrospect it still isn’t clear how I could have chosen any better in advance. I’m happy with the outcome I got.

One of the central thrusts of our strategy is removing code from the ancestral NTP Classic codebase without breaking function. One of the main themes in the story we tell the world about what we’re doing is how much code we’ve removed. We measure our success that way. So a tool I have used a lot is David A. Wheeler’s sloccount.

It’s a good tool, but it has some minor annoying flaws. The largest of these is that, in my opinion, the output is excessively verbose. I don’t like programs that eat your screen space with unnecessary progress messages, license announcements, author credits, copyrights, and other impedimenta and don’t give you any way to shut that off. I’m also not usually interested in the cost-modeling feature.

A second problem is that sloccount is slow. It has to rely on MD5 caching of repeated counts to have acceptable performance at all. And it’s slow because internally it’s a pile of Perl, shell scripts, and C helper code that David himself concedes is rather a mess. This makes it difficult to modify, so de-verbosifying and speeding it up is not very practical.

Problem: write a sloccount replacement in Go. It took me about four days of fairly steady thinking and work to get to a version that could replace sloccount for NTPsec’s needs, then another ten days of occasional tweaking to improve it to its present state while I worked on other project issues. You can browse the result on GitLab.

The result is quite fast compared to sloccount, roughly a 4x to 10x speedup on the typical case. Speedup is inversely proportional to codebase size, which just tells us that both sloccount and loccount are dominated by I/O costs that increase as a fraction of runtime with code volume. Notably, however, this speed is without the complicated caching/memoization that sloccount uses; if you run sloccount without that, the difference gets quite a bit larger.

I also took the opportunity to add recognizers and totalizers for a bunch of new languages, because who wouldn’t? I discovered that if all you care about is comment and string syntax, almost all computer languages fall into five groups: C-like, scripting, generic, Pascal-like, and Fortran-like. It’s not too difficult to write generic parsers for these groups and keep most of the language-specific information in a handful of large tables. Thus, loccount is also dramatically easier to extend than sloccount.
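To make the table-driven idea concrete, here is a minimal sketch for the "scripting" group only. It is not loccount's actual code: the type scriptingLanguage, the table scriptingLanguages, and the function countSLOC are hypothetical names, and the counter deliberately ignores string literals and trailing comments.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scriptingLanguage is a hypothetical table entry: everything the generic
// counter needs to know about one language in the "scripting" group.
type scriptingLanguage struct {
	name          string
	suffix        string
	commentLeader string
}

// Adding a language to this group means adding one row to this table.
var scriptingLanguages = []scriptingLanguage{
	{"python", ".py", "#"},
	{"shell", ".sh", "#"},
	{"lua", ".lua", "--"},
}

// countSLOC counts lines that are neither blank nor whole-line comments.
// Simplification: string literals and trailing comments are not examined.
func countSLOC(src string, lang scriptingLanguage) int {
	count := 0
	scanner := bufio.NewScanner(strings.NewReader(src))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, lang.commentLeader) {
			continue
		}
		count++
	}
	return count
}

func main() {
	src := "#!/bin/sh\n# say hello\n\necho hello\nexit 0\n"
	fmt.Println(countSLOC(src, scriptingLanguages[1])) // prints 2
}
```

The one generic function serves every row in the table, which is the property that makes this style easy to extend.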

Now to Go itself. I’ll admit I was a little apprehensive going in. The culture around Go smells of hypernormativeness - you’re going to program the way Rob Pike and his buddies think is right conduct, and if you don’t like that you can hit the highway, Jack. There’s also been a lot of talk about Go’s alleged refusal to incorporate design ideas from recent languages; one aggregation site of Go criticism tags this the "stuck in 70s" problem.

Thankfully, my fears went largely unrealized. The rhetoric around the language may be rather truculent, but there’s more room to stretch in it than I expected. Pike’s taste is austere but mostly pretty good. And "stuck in 70s" is unfair in at least one major respect: the language includes first-class map objects (almost equivalent to Python dictionaries) which Perl pioneered in the 1980s.

One half-forgotten 1970s concept I’m glad Go revived is channels. I’ve had a strong feeling Tony Hoare was onto something seriously good ever since reading about occam’s implementation of his Communicating Sequential Processes model back in the 80s. CSP has a spareness and elegance that the conventional concurrency apparatus of mutexes, semaphores, and shared memory lacks. Of course the question is whether that elegance leaves CSP sufficiently unconstrained to solve real problems.

I checked this by writing loccount as a parallelized tree traversal. Each file in the tree gets a goroutine spawned to count its lines; the threads run in parallel and as each finishes its work it stuffs a statistics block into a channel. The main thread blocks on the output end, reading each statistics block as it becomes available and saving it for report generation.

It only took me about two hours from a standing start to write this, counting time to read and process the documentation and think through the design. The resulting code is really simple and natural. The channel is the only synchronization mechanism, and its thread-safe queue behavior is easy to reason about. It’s even easy to tune, with (at least in this case) a relatively clear relationship between optimal channel depth and the number of processor cores you have available.
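The shape of that design can be sketched in a few lines. This is an illustration under stated assumptions rather than loccount's code: the names stats, countLines, and countAll are hypothetical, the "files" are strings already in memory instead of a directory tree, and the counter only skips blank lines.

```go
package main

import (
	"fmt"
	"strings"
)

// stats is a hypothetical per-file statistics block.
type stats struct {
	path  string
	lines int
}

// countLines stands in for the real per-file counter. It counts non-blank
// lines and stuffs the result into the channel when it is done.
func countLines(path, content string, results chan<- stats) {
	n := 0
	for _, line := range strings.Split(content, "\n") {
		if strings.TrimSpace(line) != "" {
			n++
		}
	}
	results <- stats{path, n}
}

// countAll spawns one goroutine per file and collects one result per file
// from the reading end of the channel. depth is the channel buffer size.
func countAll(files map[string]string, depth int) map[string]int {
	results := make(chan stats, depth)
	for path, content := range files {
		go countLines(path, content, results)
	}
	totals := make(map[string]int)
	for range files {
		st := <-results // blocks until some goroutine finishes
		totals[st.path] = st.lines
	}
	return totals
}

func main() {
	files := map[string]string{
		"a.c": "int x;\n\nint y;\n",
		"b.c": "int z;\n",
	}
	totals := countAll(files, 4)
	fmt.Println(totals["a.c"], totals["b.c"]) // prints 2 1
}
```

The channel is the only synchronization here: the collecting loop receives exactly as many results as goroutines were started, so no separate wait mechanism is needed. The depth parameter is the knob the tuning remark refers to.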

It’s nice to see CSP in a language I can use for production, and very gratifying to find that it really is an effective tool for carving up concurrency problems. If there is one pitch the Go designers have knocked right out of the park, this is it. I’m already sure I’m going to miss it a lot in Rust.

Ability to handle concurrency well is actually a prompt issue for NTPsec. While the major clock-disciplining algorithm is intrinsically serial, we have a problem with DNS-lookup stalls. Because they can be long, the lookups have to be done in a worker thread with the results posted back to the main thread. The code for this is messy, complicated, and has known bugs. Being able to rewrite it in terms of goroutines and channels would be a blessing.
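As a rough sketch of what that rewrite could look like (not NTPsec code), the lookup runs in a goroutine and posts its single result on a channel that the main loop selects on. The names lookupResult, startLookup, and slowResolver are hypothetical; the resolver is passed in as a function so the example runs without a network, and the standard library's net.LookupHost has the same signature and could be passed instead.

```go
package main

import (
	"fmt"
	"time"
)

// lookupResult is a hypothetical carrier for one finished lookup.
type lookupResult struct {
	host  string
	addrs []string
	err   error
}

// startLookup runs resolve in its own goroutine and returns a channel on
// which exactly one result will be delivered.
func startLookup(host string, resolve func(string) ([]string, error)) <-chan lookupResult {
	done := make(chan lookupResult, 1) // buffered so the worker never blocks
	go func() {
		addrs, err := resolve(host)
		done <- lookupResult{host, addrs, err}
	}()
	return done
}

// slowResolver is a stand-in that simulates a stalled DNS lookup.
func slowResolver(host string) ([]string, error) {
	time.Sleep(50 * time.Millisecond)
	return []string{"192.0.2.1"}, nil
}

func main() {
	pending := startLookup("pool.example.org", slowResolver)
	tick := time.NewTicker(10 * time.Millisecond)
	defer tick.Stop()
	for {
		select {
		case res := <-pending:
			fmt.Println(res.host, res.addrs, res.err)
			return
		case <-tick.C:
			// The main loop keeps doing its regular work here
			// while the lookup is still outstanding.
		}
	}
}
```

The main loop never waits on the resolver; it simply notices the result when it arrives, which is the property the worker-thread code is trying to achieve.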

Here are some things I thought I’d dislike about Go that rapidly turned into non-issues as I actually used it: visibility by variable case, odd required syntax of blocks and conditional chains, absence of operator overloading, absence of class inheritance.

The language does have annoyances. The absence of const is a minor one. Go binaries are statically linked to a copy of the Go runtime; thus they are annoyingly fat (more about this later). $GOPATH is a pain, and the style of project layout the build system wants you to use is ugly to my eyes. There is no ternary operator. There are no map/reduce/filter primitives.

I found the single most perplexing thing about the language to be the module-import rules. The documentation offered a surfeit of examples but never stated a simple generative rule that explained them.

Go has a remarkably strong flavor of Python for a compiled language, which I think shows that the designers were paying attention to the right things. On the other hand, the single feature I find most conspicuous by its absence is Python-style try/catch with class-valued exceptions (and the related idioms around context managers and Python "with").

I’m aware of all the arguments against nonlocal jumps, but this is a near-showstopper for readily translating code out of Python into Go; too many Python idioms will break and have to be translated in complicated and failure-prone ways. Someday this could be a problem for NTPsec; if we move to Go we’ll probably want to go that whole way and shift our Python as well.

My learning project may have been just a little too small, as it gave me no reason to dive into the stranger corners of the Go object system. I haven’t fully learned interfaces and reflection yet, so I can neither praise that part of the design nor speak to any deficiencies that might lurk there.

Now we’ll return to more specific NTPsec-related concerns. What specifically can we say about Go’s suitability as an implementation language for ntpd?

The big foreground issue is garbage collection. It’s nearly unprecedented [1] for a language to be both garbage-collected and as squarely aimed at systems programming as Go is, because systems programmers have trouble tolerating the stop-the-world pauses that happen when the GC runs.

On the other hand, this problem potentially affects ntpd less than one might think. The way it normally operates is to make fine adjustments to the system clock at a relatively long interval (normally a second). It is not very sensitive to small variations in the size of that interval; it can’t be, because it has to handle incoming packets at unpredictable times.

That said, ntpd does have a few time-sensitive critical regions; these are wrapped closely around the system calls that manipulate the clock. Which brings us to the first relevant Go promise; you can tell its runtime to turn the GC off. Of course if you do this memory usage will grow without bound while it is off, but for a small critical region that is doing neither string operations nor I/O this shouldn’t be a problem.

Go’s second relevant promise is that it can hold the size of stop-the-world pauses way down, to under 10ms and usually much less. I’ve seen graphs of real-world server performance on large applications that back this up, with pause times tightly clustered around 2ms. Some of these are linked from the Go blog post on Prioritizing low latency and simplicity.

I’m pretty sure that the combination of suppressing GC in critical regions with stalls outside them held to under 10ms will be good enough for NTPsec. We’ll measure to be sure, but I would be quite surprised if this became a showstopper.

Another issue: I mentioned before that Go produces fat binaries due to the size of the Go runtime. As a painful example, the binary for my 1600LOC loccount.go program is twice the size of the 62KLOC ntpd binary in C!

At first sight this seems ridiculous, but there are two mitigating factors. One is that the Go runtime will be a constant rather than scaling overhead; as the actual program it’s attached to gets larger the size numbers will look progressively less silly. The other is that statically-linked C programs look bloated, too. Our expectations are formed by dynamic linking, which hides the memory cost of the C standard library and whatever other libraries we’re linking.

Still, a little nervousness about how gracefully Go would deploy in constrained embedded environments seems justified. While this is not historically a significant use case for ntpd, I would hate to choose an implementation language that forecloses it.

Fortunately, this is a business problem for Google. To fully realize the benefits of Go they need to be able to deploy on Android, and we know this is a use case the Go developers pay attention to. NTPsec probably won’t need to be able to deploy on embedded faster than Google needs to.

Since writing most of the preceding I have also implemented the outer parts of an IRC server in Go - all the socket handling and multiplex I/O, basically everything but the IRC protocol state machine itself. As a network service daemon, this is architecturally much closer to ntpd than loccount was. You can clone the result from here:

git@git.binaryredneck.net:p/ircsome/ircsome

It’s only 358 lines in Go, which is rather impressive.

In conclusion, I judge that while Go has some minor liabilities it is quite suitable as an implementation language for ntpd.

1. After I first wrote this, without the "nearly" qualification, I was reminded of one precedent for a systems language featuring garbage collection: Oberon.