GIANT ROBOTS SMASHING INTO OTHER GIANT ROBOTS

Written by thoughtbot

Applicative Options Parsing in Haskell

I’ve just finished work on a small command line client for the Heroku Build API written in Haskell. It may be a bit overkill for the task, but it allowed me to play with a library I was very interested in but hadn’t had a chance to use yet: optparse-applicative.

In figuring things out, I again noticed something I find common to many Haskell libraries:

  1. It’s extremely easy to use and solves the problem exactly as I need.
  2. It’s woefully under-documented and appears incredibly difficult to use at first glance.

Note that when I say under-documented, I mean it in a very specific way. The Haddocks are stellar. Unfortunately, what I find lacking are blogs and example-driven tutorials.

Rather than complain about the lack of tutorials, I’ve decided to write one.

Applicative Parsers

Haskell is known for its great parsing libraries and this is no exception. For some context, here’s an example of what it looks like to build a Parser in Haskell:

type CSV = [[String]]

csvFile :: Parser CSV
csvFile = do
    lines <- many csvLine
    eof

    return lines

  where
    csvLine = do
        cells <- many csvCell `sepBy` comma
        eol

        return cells

    csvCell = quoted (many anyChar)

    comma = char ','

    eol = char '\n' <|> char '\r\n'

    -- etc...

As you can see, Haskell parsers have a fractal nature. You make tiny parsers for simple values and combine them into slightly larger parsers for slightly more complicated values. You continue this process until you reach the top level csvFile which reads like exactly what it is.

When combining parsers from a general-purpose library like parsec (as we’re doing above), we typically do it monadically. This means that each parsing step is sequenced together (that’s what do-notation does) and that sequencing will be respected when the parser is ultimately executed on some input. Sequencing parsing steps in an imperative way like this allows us to make decisions mid-parse about what to do next or to use the results of earlier parses in later ones. This ability is essential in most cases.

When using libraries like optparse-applicative and aeson we’re able to do something different. Instead of treating parsers as monadic, we can treat them as applicative. The Applicative type class is a lot like Monad in that it’s a means of describing combination. Crucially, it differs in that it has no ability to define an order – there’s no sequencing.

If it helps, you can think of applicative parsers as atomic or parallel while monadic parsers would be incremental or serial. Yet another way to say it is that monadic parsers operate on the result of the previous parser and can only return something to the next; the overall result is then simply the result of the last parser in the chain. Applicative parsers, on the other hand, operate on the whole input and contribute directly to the whole output – when combined and executed, many applicative parsers can run “at once” to produce the final result.

Taking values and combining them into a larger value via some constructor is exactly how normal function application works. The Applicative type class lets you construct things from values wrapped in some context (say, a Parser State) using a very similar syntax. By using Applicative to combine smaller parsers into larger ones, you end up with a very convenient situation: the constructed parsers resemble the structure of their output, not their input.

When you look at the CSV parser above, it reads like the document it’s parsing, not the value it’s producing. It doesn’t look like an array of arrays, it looks like a walk over the values and down the lines of a file. There’s nothing wrong with this structure per se, but contrast it with this parser for creating a User from a JSON value:

data User = User String Int

-- Value is a type provided by aeson to represent JSON values.
parseUser :: Value -> Parser User
parseUser (Object o) = User <$> o .: "name" <*> o .: "age"

It’s hard to believe the two share any qualities at all, but they are in fact the same thing, just constructed via different means of combination.

In the CSV case, parsers like csvLine and eof are combined monadically via do-notation:

You will parse many lines of CSV, then you will parse an end-of-file.

In the JSON case, parsers like o .: "name" and o .: "age" each contribute part of a User and those parts are combined applicatively via (<$>) and (<*>) (pronounced fmap and apply):

You will parse a user from the value for the “name” key and the value for the “age” key

Just by virtue of how Applicative works, we find ourselves with a Parser User that looks surprisingly like a User.

I go through all of this not because you need to know about it to use these libraries (though it does help with understanding their error messages), but because I think it’s a great example of something many developers don’t believe: not only can highly theoretic concepts have tangible value in real world code, but they in fact do in Haskell.

Let’s see it in action.

Options Parsing

My little command line client has the following usage:

% heroku-build [--app COMPILE-APP] [start|status|release]

Where each sub-command has its own set of arguments:

% heroku-build start SOURCE-URL VERSION
% heroku-build status BUILD-ID
% heroku-build release BUILD-ID RELEASE-APP

The first step is to define a data type for what you want out of options parsing. I typically call this Options:

import Options.Applicative -- Provided by optparse-applicative

type App = String
type Version = String
type Url = String
type BuildId = String

data Command
    = Start Url Version
    | Status BuildId
    | Release BuildId App

data Options = Options App Command

If we assume that we can build a Parser Options, using it in main would look like this:

main :: IO ()
main = run =<< execParser
    (parseOptions `withInfo` "Interact with the Heroku Build API")

parseOptions :: Parser Options
parseOptions = undefined

-- Actual program logic
run :: Options -> IO ()
run opts = undefined

Where withInfo is just a convenience function to add --help support given a parser and description:

withInfo :: Parser a -> String -> ParserInfo a
withInfo opts desc = info (helper <*> opts) $ progDesc desc

So what does an Applicative Options Parser look like? Well, if you remember the discussion above, it’s going to be a series of smaller parsers combined in an applicative way.

Let’s start by parsing just the --app option using the library-provided strOption helper:

parseApp :: Parser App
parseApp = strOption $
    short 'a' <> long "app" <> metavar "COMPILE-APP" <>
    help "Heroku app on which to compile"

Next we make a parser for each sub-command:

parseStart :: Parser Command
parseStart = Start
    <$> argument str (metavar "SOURCE-URL")
    <*> argument str (metavar "VERSION")

parseStatus :: Parser Command
parseStatus = Status <$> argument str (metavar "BUILD-ID")

parseRelease :: Parser Command
parseRelease = Release
    <$> argument str (metavar "BUILD-ID")
    <*> argument str (metavar "RELEASE-APP")

Looks familiar, right? These parsers are made up of simpler parsers (like argument) combined in much the same way as our parseUser example. We can then combine them further via the subparser function:

parseCommand :: Parser Command
parseCommand = subparser $
    command "start"   (parseStart   `withInfo` "Start a build on the compilation app") <>
    command "status"  (parseStatus  `withInfo` "Check the status of a build") <>
    command "release" (parseRelease `withInfo` "Release a successful build")

By re-using withInfo here, we even get sub-command --help flags:

% heroku-build start --help
Usage: heroku-build start SOURCE-URL VERSION
  Start a build on the compilation app

Available options:
  -h,--help                Show this help text

Pretty great, right?

All of this comes together to make the full Options parser:

parseOptions :: Parser Options
parseOptions = Options <$> parseApp <*> parseCommand

Again, this looks just like parseUser. You might’ve thought that o .: "name" was some kind of magic, but as you can see, it’s just a parser. It was defined in the same way as parseApp, designed to parse something simple, and is easily combined into a more complex parser thanks to its applicative nature.

Finally, with option handling thoroughly taken care of, we’re free to implement our program logic in terms of meaningful types:

run :: Options -> IO ()
run (Options app cmd) = do
    case cmd of
        Start url version  -> -- ...
        Status build       -> -- ...
        Release build rApp -> -- ...

Wrapping Up

To recap, optparse-applicative allows us to do a number of things:

  • Implement our program input as a meaningful type
  • State how to turn command-line options into a value of that type in a concise and declarative way
  • Do this even in the presence of something complex like sub-commands
  • Handle invalid input and get a really great --help message for free

Hopefully, this post has piqued some interest in Haskell’s deeper ideas which I believe lead to most of these benefits. If not, at least there’s some real world examples that you can reference the next time you want to parse command-line options in Haskell.