#NoNullPointers - Read more "good code"

I’ve recently been getting more involved in programming with Haskell. It is a language with very deep roots in academia, but in recent years it has seen a surge of popularity amongst engineers for addressing both the problems of software engineering and the semantics of writing software in one fell swoop.

The language is quite terse, with a strong influence from mathematical concepts including lambda calculus, type theory, category theory and logic. Much of the learning around Haskell is rooted in ideas that are foreign to programmers who work in the procedural and object-oriented paradigms.

I had worked through all of the Haskell learning material that I could in order to grasp the concepts: Learn You a Haskell for Great Good, Haskell Programming from First Principles, Real World Haskell, and about half a dozen monad tutorials. They were all excellent; however, I was still struggling to come up with a way to write a coherent piece of software.

Define “good code”

There’s a sense that you start to develop over time when working with code - especially when working with code that you haven’t written yourself.

It manifests as an intuition to recognise clarity over confusion, simplicity over complexity, readability over incomprehensibility. One of the surprising things is that you’ll be able to recognise these qualities even in languages that you aren’t overly familiar with.

Take a read through the source code of a library that you’ve used, but never looked at in detail. How quickly can you work out what a function does? How easy is it to understand the flow of control? What are the characteristics of “good code” that you notice - are these common amongst other code bases you’ve worked on? Do your peers agree with these characteristics?

The design of a program is more than just what it produces as output. I think this is best captured by the following quote.

“Programs must be written for people to read, and only incidentally for machines to execute.” ― Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs

Programming languages need to assist us in our design goals and values for writing software as humans, not just for what the software produces when the machine runs it.

Elm - a bridge to Haskell

The Elm programming language came into my view around mid-2017. I could see how it captured much of what Haskell has to offer, albeit with a set of constraints: the web as its delivery platform and a restricted set of capabilities compared with Haskell. Yet my intuition told me that there would be power in these constraints.

In order to start writing Haskell code that I felt comfortable with, I started looking at the source code for the Elm compiler. As I read through the code base, it checked all of my boxes for language design, coherence, safety, expressiveness, clarity, etc. The code reflected the values of software that had been encoded into the Elm language itself.

Now, this post is primarily talking about Haskell, but the higher principles of software design and choices about semantics and wording are things that are worth keeping in the front of your mind.

Function Imports

I spent some time a couple of weeks ago analysing the imports in the Elm compiler. Here are some rules that I was able to discern.

import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Control.Exception (Exception, SomeException, catch)
import Control.Monad (forever, join, replicateM_, void)
import qualified Data.ByteString.Char8 as BS
import qualified Network.HTTP as Http (urlEncodeVars)
import qualified Network.HTTP.Client as Http
import qualified Network.HTTP.Types.Header as Http (hAcceptEncoding, hUserAgent)

import qualified Elm.Compiler as Compiler
import qualified Elm.Package as Pkg
import qualified Reporting.Exit as Exit
import qualified Reporting.Exit.Http as E
import qualified Reporting.Progress as Progress
import qualified Reporting.Task as Task

Modules from the Haskell core packages are imported unqualified at the top, each with an explicit list of the functions and types it brings in.
Modules from additional Hackage packages are imported qualified with an alias and, where possible, several modules from the same package share one alias so that their functions all end up in a single namespace.
Local modules are imported qualified, without an explicit import list.

Why does this matter? It wasn’t clear to me at first, but as I scanned through the Http file I could immediately recognise where each function came from. This meant two things:

I could easily see where things belonged (and if something didn’t make sense where it was, that was a sign to move it to another module).
My cognitive overhead was significantly reduced because I no longer had to guess which of my imports a function came from.

I started applying these rules to a Haskell project at work and my productivity skyrocketed. Why? Because clear naming coupled with a reduced cognitive load made it really easy for me to see where my design and organisation of the code was failing.

If you’ve had trouble with organising your code and getting the module design to feel right, I’d strongly recommend giving this approach a try.
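Here is a minimal sketch of the three import rules applied to a tiny program of my own, not code from the Elm compiler. It assumes the containers package is available (it ships with GHC), and the local module MyApp.Report is hypothetical, so its import is left commented out to keep the file runnable on its own.

```haskell
-- Rule 1: core (base) modules, unqualified, with explicit import lists.
import Control.Monad (forM_)
import Data.Char (toUpper)

-- Rule 2: other packages, qualified under a short alias.
import qualified Data.Map.Strict as Map

-- Rule 3: local modules, qualified, with no import list.
-- (MyApp.Report is a hypothetical module, commented out so this file stands alone.)
-- import qualified MyApp.Report as Report

-- Count words case-insensitively. Every Map.* call is visibly from containers,
-- while toUpper is listed by name in the imports above.
wordCounts :: [String] -> Map.Map String Int
wordCounts =
  Map.fromListWith (+) . map (\word -> (map toUpper word, 1))

main :: IO ()
main =
  forM_ (Map.toList (wordCounts ["elm", "Elm", "haskell"])) $ \(word, count) ->
    putStrLn (word ++ ": " ++ show count)
-- prints:
-- ELM: 2
-- HASKELL: 1
```

Reading the body of wordCounts, the origin of every name can be worked out from the prefix or from the explicit import list, which is the property the rules are meant to give you.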

Language extensions

Coming to Haskell can be overwhelming when you start reading all of the material about what the language is capable of. Many of its more sophisticated features are enabled via language extension pragmas.

Here’s what was shocking to me about the Elm compiler codebase. The majority of the files use only the OverloadedStrings extension, which lets string literals be used directly as other string-like types such as ByteString and Text.
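As a rough illustration of what OverloadedStrings does, here is a small sketch that uses only the base library. The PackageName type is a hypothetical example of mine, not a type from the Elm compiler; ByteString and Text work the same way because they also provide IsString instances.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- Hypothetical wrapper type, used only for illustration.
newtype PackageName = PackageName String
  deriving (Eq, Show)

-- With OverloadedStrings, a string literal means `fromString "..."`,
-- so any type with an IsString instance can be written as a literal.
instance IsString PackageName where
  fromString = PackageName

corePackage :: PackageName
corePackage = "elm/core"

main :: IO ()
main = print corePackage
-- prints: PackageName "elm/core"
```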

There were only two others that were prominent, and even then they appear in fewer than 10% of the files. They were:

GADTs
Rank2Types

That’s it. There were a few more but I really want to be clear about what I discovered. You can write a significant amount of production Haskell code just with the bare bones language out of the box.

Now, I should say that I’ve been reading through Sandy Maguire’s book “Thinking with Types”. It has dedicated chapters on many of the language extensions, notably GADTs and RankNTypes, and with the explanations in the book I actually know what they are, why they are useful and, most importantly, where I can use them in my own projects.
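To give a feel for what higher-rank types add, here is a minimal sketch of my own (the function applyToBoth is hypothetical). It uses RankNTypes, which in current GHC versions covers what Rank2Types enables.

```haskell
{-# LANGUAGE RankNTypes #-}

-- The `forall a.` sits inside the argument's type, so the caller must pass a
-- function that works for every element type. That lets the body apply it to
-- a list of Ints and to a String (a list of Chars) in the same expression.
applyToBoth :: (forall a. [a] -> Int) -> ([Int], String) -> (Int, Int)
applyToBoth f (numbers, text) =
  (f numbers, f text)

main :: IO ()
main = print (applyToBoth length ([1, 2, 3], "elm"))
-- prints: (3,3)
```

Without the extension, the type variable would be fixed to a single type chosen by the caller, and the body could not use f at two different element types.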

Again, I need to stress: with the complexity and cognitive overhead of language extensions removed, the code may be a little more verbose, perhaps a little less expressive and a little less safe, but the code in the majority of the compiler’s files can be understood even if you’ve never written any Haskell before.

This is the mark of someone who truly understands the impact of design, not only on what the software outputs, but on what the software is, how it evolves over time and how it fits into the broader programming language landscape.

Monads, without saying Monads

I grepped the codebase for instance Monad to see the extent to which Haskell type classes were used.

I was shocked. Seven occurrences. Seven.

I know what you’re thinking… “Yeah, but you need to look for the automatic deriving also”. I did. The only instances being derived are Eq and Ord.

What does this mean in practice?

{-# LANGUAGE GADTs #-}

module Reporting.Task.Http
  ( Fetch
  , andThen
  )
  where

data Fetch a where
  AndThen :: Fetch a -> (a -> Fetch b) -> Fetch b
  ...

andThen :: Fetch a -> (a -> Fetch b) -> Fetch b
andThen =
  AndThen

-- Elsewhere in the module, the interpreter's case branch for AndThen:
AndThen subFetch callback ->
  do  subMVar <- runHelp chan manager tell subFetch
      mvar <- newEmptyMVar
      void $ forkIO $
        do  result <- readMVar subMVar
            putMVar mvar =<<
              case result of
                Left err ->
                  return (Left err)
                Right value ->
                  readMVar =<< runHelp chan manager tell (callback value)
      return mvar

If we compare the signatures of bind (>>=) and andThen, we see:

(>>=)   :: Monad m => m     a -> (a -> m     b) -> m     b
andThen ::            Fetch a -> (a -> Fetch b) -> Fetch b

They are identical, except that andThen is specialised to the Fetch type defined by the GADT. What are the implications of this design decision?

It’s hard to really know what the author meant when writing this code, but I infer that they recognise the high cognitive load of a project like this, so any effort to reduce it is a win. I also infer a preference for clarity over knowledge that is only implied or implicit.

There’s another gem: the second argument to andThen, which is a function, has been named callback locally. This makes intuitive sense, especially if you’ve come from a web background and done a bunch of programming in JavaScript.

A simple name change like this can make a foreign concept reachable for someone trying to get a handle on the concepts in Haskell. Again, the attention and care given to a higher level of software comprehension in this codebase is remarkable.
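To see the idea end to end, here is a heavily simplified sketch of my own. The real code runs fetches concurrently with MVars; this version is pure and only keeps the shape of the design. The Succeed and Fail constructors, the run interpreter and the example values are all assumptions for illustration, not part of the Elm compiler.

```haskell
{-# LANGUAGE GADTs #-}

-- Simplified stand-in for the Fetch type. Only AndThen mirrors the original;
-- Succeed and Fail are hypothetical constructors added so the sketch can run.
data Fetch a where
  Succeed :: a -> Fetch a
  Fail    :: String -> Fetch a
  AndThen :: Fetch a -> (a -> Fetch b) -> Fetch b

-- Same shape as (>>=), but specialised to Fetch and given a plain name.
andThen :: Fetch a -> (a -> Fetch b) -> Fetch b
andThen =
  AndThen

-- A pure interpreter (hypothetical): run the sub-fetch, then pass its value
-- to the callback, stopping at the first error.
run :: Fetch a -> Either String a
run fetch =
  case fetch of
    Succeed value ->
      Right value

    Fail err ->
      Left err

    AndThen subFetch callback ->
      case run subFetch of
        Left err ->
          Left err

        Right value ->
          run (callback value)

example :: Fetch Int
example =
  Succeed 20 `andThen` \n ->
  Succeed (n + 1) `andThen` \m ->
  Succeed (m * 2)

failing :: Fetch Int
failing =
  Fail "timeout"

main :: IO ()
main = do
  print (run example)                      -- Right 42
  print (run (failing `andThen` Succeed))  -- Left "timeout"
```

Nothing here needs a Monad instance or do-notation: sequencing is just a named function and a constructor, and the interpreter decides what sequencing means.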

Closing thoughts

These are just a few examples of what I was able to learn from reading through the Elm compiler source code. I’m sure that if you read through some popular and mature libraries in your ecosystem you’ll be able to pick up on many things that are either good or bad depending on what your “good code” values are.

Here are my two takeaways from these insights.

If you want to learn Haskell, don’t be scared off by all of the things you see on Twitter or out in the category theory stratosphere. You can write production Haskell today and be productive just by using the basics of the language.
You can improve your own code by reading other people’s code and trying to understand the mindset and values that the author applied to it.

As always, I hope this is useful information to you and thanks for reading!

If you like this post and want to get in touch, please reach out and follow me on Twitter.