I’ve just finished work on a small command line client for the Heroku Build API written in Haskell. It may be a bit overkill for the task, but it allowed me to play with a library I was very interested in but hadn’t had a chance to use yet: optparse-applicative.
In figuring things out, I again noticed something I find common to many Haskell libraries:
- It’s extremely easy to use and solves the problem exactly as I need.
- It’s woefully under-documented and appears incredibly difficult to use at first glance.
Note that when I say under-documented, I mean it in a very specific way. The Haddocks are stellar. Unfortunately, what I find lacking are blogs and example-driven tutorials.
Rather than complain about the lack of tutorials, I’ve decided to write one.
Haskell is known for its great parsing libraries and this is no exception. For some context, here’s an example of what it looks like to build a Parser in Haskell:
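    import Text.Parsec
    import Text.Parsec.String (Parser)

    type CSV = [[String]]

    csvFile :: Parser CSV
    csvFile = do
        lines <- many csvLine
        eof
        return lines

      where
        csvLine = do
            -- one line is a list of cells separated by commas
            cells <- csvCell `sepBy` comma
            eol
            return cells

        csvCell = quoted (many anyChar)

        comma = char ','

        -- "\r\n" is two characters, so it needs string rather than char
        eol = string "\n" <|> string "\r\n"

        -- etc...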
As you can see, Haskell parsers have a fractal nature. You make tiny
parsers for simple values and combine them into slightly larger parsers
for slightly more complicated values. You continue this process until
you reach the top level
csvFile which reads like exactly what it is.
When combining parsers from a general-purpose library like parsec (as we’re doing above), we typically do it monadically. This means that each parsing step is sequenced together (that’s what do-notation does) and that sequencing will be respected when the parser is ultimately executed on some input. Sequencing parsing steps in an imperative way like this allows us to make decisions mid-parse about what to do next or to use the results of earlier parses in later ones. This ability is essential in most cases.
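To make “using the results of earlier parses in later ones” concrete, here is a minimal sketch of a length-prefixed field. It uses ReadP from base rather than parsec so that it runs without extra packages; lengthPrefixed and parseAll are names invented for this illustration.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Parse a decimal length, a colon, then exactly that many characters.
-- The number parsed in the first step decides how much input the last
-- step consumes, which is only possible with monadic sequencing.
lengthPrefixed :: ReadP String
lengthPrefixed = do
  n <- read <$> munch1 isDigit
  _ <- char ':'
  count n get

-- Hypothetical helper: succeed only when the parser consumes all input.
parseAll :: ReadP a -> String -> Maybe a
parseAll p input =
  case [x | (x, "") <- readP_to_S p input] of
    [x] -> Just x
    _   -> Nothing
```

With these definitions, parseAll lengthPrefixed "5:hello" gives Just "hello", while "3:hello" is rejected because two characters are left over.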
When using libraries like optparse-applicative and aeson
we’re able to do something different. Instead of treating parsers as
monadic, we can treat them as applicative. The
Applicative type class
is a lot like
Monad in that it’s a means of describing combination.
Crucially, it differs in that a later step can’t look at the result of
an earlier one to decide what to do next: there’s no sequencing in that sense.
If it helps, you can think of applicative parsers as atomic or parallel while monadic parsers would be incremental or serial. Yet another way to say it is that monadic parsers operate on the result of the previous parser and can only return something to the next; the overall result is then simply the result of the last parser in the chain. Applicative parsers, on the other hand, operate on the whole input and contribute directly to the whole output – when combined and executed, many applicative parsers can run “at once” to produce the final result.
Taking values and combining them into a larger value via some
constructor is exactly how normal function application works. The
Applicative type class lets you construct things from values wrapped
in some context (say, a Parser State) using a very similar syntax. By
using Applicative to combine smaller parsers into larger ones, you end
up with a very convenient situation: the constructed parsers resemble
the structure of their output, not their input.
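The similarity to normal function application can be seen without any parsing library. In this small sketch, Maybe stands in for the parser context, and Point is a made-up type used only for illustration.

```haskell
data Point = Point Int Int
  deriving (Eq, Show)

-- Ordinary function application: the constructor applied to plain values.
plainPoint :: Point
plainPoint = Point 1 2

-- The same shape, but each argument is wrapped in a context (Maybe here).
wrappedPoint :: Maybe Point
wrappedPoint = Point <$> Just 1 <*> Just 2

-- If any piece is missing, the whole construction fails.
missingPiece :: Maybe Point
missingPiece = Point <$> Just 1 <*> Nothing
```

Here wrappedPoint is Just (Point 1 2) and missingPiece is Nothing. An applicative parser has the same shape, with “might be missing” replaced by “might fail to parse”.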
When you look at the CSV parser above, it reads like the document it’s
parsing, not the value it’s producing. It doesn’t look like an array
of arrays, it looks like a walk over the values and down the lines of a
file. There’s nothing wrong with this structure per se, but contrast it
with this parser for creating a
User from a JSON value:
    data User = User String Int

    -- Value is a type provided by aeson to represent JSON values.
    parseUser :: Value -> Parser User
    parseUser (Object o) = User <$> o .: "name" <*> o .: "age"
It’s hard to believe the two share any qualities at all, but they are in fact the same thing, just constructed via different means of combination.
In the CSV case, parsers like csvLine and eof are combined
monadically via do-notation:
You will parse many lines of CSV, then you will parse an end-of-file.
In the JSON case, parsers like o .: "name" and o .: "age" are combined
applicatively via (<$>) and (<*>):
You will parse a user from the value for the “name” key and the value for the “age” key.
Just by virtue of how Applicative works, we find ourselves with a
Parser User that looks surprisingly like a User.
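For anyone who wants to try the JSON parser, here is one way it might be exercised, assuming the aeson package is available. This sketch swaps the pattern match on Object for aeson’s withObject so that non-object values fail cleanly instead of crashing; validUser and missingAge are names made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value, object, withObject, (.:), (.=))
import Data.Aeson.Types (Parser, parseMaybe)

data User = User String Int
  deriving (Eq, Show)

-- Same applicative shape as before; withObject rejects non-object values.
parseUser :: Value -> Parser User
parseUser = withObject "User" $ \o ->
  User <$> o .: "name" <*> o .: "age"

-- A JSON object with both keys parses to a User.
validUser :: Maybe User
validUser = parseMaybe parseUser $
  object ["name" .= ("pat" :: String), "age" .= (30 :: Int)]

-- Leaving out a key makes the whole parser fail.
missingAge :: Maybe User
missingAge = parseMaybe parseUser $
  object ["name" .= ("pat" :: String)]
```

validUser evaluates to Just (User "pat" 30), and missingAge to Nothing.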
I go through all of this not because you need to know about it to use these libraries (though it does help with understanding their error messages), but because I think it’s a great example of something many developers don’t believe: not only can highly theoretical concepts have tangible value in real-world code, but in Haskell they actually do.
Let’s see it in action.
My little command line client has the following usage:
    % heroku-build [--app COMPILE-APP] [start|status|release]
Where each sub-command has its own set of arguments:
    % heroku-build start SOURCE-URL VERSION
    % heroku-build status BUILD-ID
    % heroku-build release BUILD-ID RELEASE-APP
The first step is to define a data type for what you want out of
options parsing. I typically call this Options:
    import Options.Applicative -- Provided by optparse-applicative

    type App = String
    type Version = String
    type Url = String
    type BuildId = String

    data Command
        = Start Url Version
        | Status BuildId
        | Release BuildId App

    data Options = Options App Command
If we assume that we can build a
Parser Options, using it in main would look like this:
    main :: IO ()
    main = run =<< execParser
        (parseOptions `withInfo` "Interact with the Heroku Build API")

    parseOptions :: Parser Options
    parseOptions = undefined

    -- Actual program logic
    run :: Options -> IO ()
    run opts = undefined
withInfo is just a convenience function to add --help support
given a parser and description:
    withInfo :: Parser a -> String -> ParserInfo a
    withInfo opts desc = info (helper <*> opts) $ progDesc desc
So what does an Applicative Options Parser look like? Well, if you remember the discussion above, it’s going to be a series of smaller parsers combined in an applicative way.
Let’s start by parsing just the
--app option using the strOption helper:
    parseApp :: Parser App
    parseApp = strOption
        $ short 'a'
        <> long "app"
        <> metavar "COMPILE-APP"
        <> help "Heroku app on which to compile"
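Note that the usage line shows --app in brackets, while a bare strOption is required. If the flag is meant to be omittable, two possible variants are sketched below, assuming optparse-applicative is installed. The names parseAppOptional, parseAppWithDefault and runPure, and the default value "my-compile-app", are made up for illustration and are not part of the client described here.

```haskell
import Options.Applicative

-- Variant 1: the flag may be left out, in which case we get Nothing.
parseAppOptional :: Parser (Maybe String)
parseAppOptional = optional $ strOption
  (  short 'a'
  <> long "app"
  <> metavar "COMPILE-APP"
  <> help "Heroku app on which to compile" )

-- Variant 2: fall back to a default (a placeholder name here).
parseAppWithDefault :: Parser String
parseAppWithDefault = strOption
  (  short 'a'
  <> long "app"
  <> metavar "COMPILE-APP"
  <> value "my-compile-app"
  <> showDefault
  <> help "Heroku app on which to compile" )

-- Hypothetical helper: run a parser on a list of arguments without IO.
runPure :: Parser a -> [String] -> Maybe a
runPure p = getParseResult . execParserPure defaultPrefs (info p mempty)
```

For example, runPure parseAppOptional [] yields Just Nothing, and runPure parseAppWithDefault [] yields Just "my-compile-app".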
Next we make a parser for each sub-command:
    parseStart :: Parser Command
    parseStart = Start
        <$> argument str (metavar "SOURCE-URL")
        <*> argument str (metavar "VERSION")

    parseStatus :: Parser Command
    parseStatus = Status <$> argument str (metavar "BUILD-ID")

    parseRelease :: Parser Command
    parseRelease = Release
        <$> argument str (metavar "BUILD-ID")
        <*> argument str (metavar "RELEASE-APP")
Looks familiar, right? These parsers are made up of simpler parsers
(like argument) combined in much the same way as our parseUser
example. We can then combine them further via the subparser function:
    parseCommand :: Parser Command
    parseCommand = subparser $
        command "start"   (parseStart   `withInfo` "Start a build on the compilation app") <>
        command "status"  (parseStatus  `withInfo` "Check the status of a build") <>
        command "release" (parseRelease `withInfo` "Release a successful build")
By using withInfo here, we even get sub-command --help flags:
    % heroku-build start --help
    Usage: heroku-build start SOURCE-URL VERSION
      Start a build on the compilation app

    Available options:
      -h,--help                Show this help text
Pretty great, right?
All of this comes together to make the full Options parser:
    parseOptions :: Parser Options
    parseOptions = Options <$> parseApp <*> parseCommand
Again, this looks just like
parseUser. You might’ve thought that
o .: "name" was some kind of magic, but as you can see, it’s just a
parser. It was defined in the same way as
parseApp, designed to parse
something simple, and is easily combined into a more complex parser
thanks to its applicative nature.
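One way to check the assembled parser without going through real command-line arguments is optparse-applicative’s execParserPure. The sketch below repeats the definitions from this post so that it stands alone; the deriving clauses and the parseArgs helper are additions made for this illustration, and the URL is a placeholder.

```haskell
import Options.Applicative

type App = String
type Version = String
type Url = String
type BuildId = String

data Command
  = Start Url Version
  | Status BuildId
  | Release BuildId App
  deriving (Eq, Show)

data Options = Options App Command
  deriving (Eq, Show)

withInfo :: Parser a -> String -> ParserInfo a
withInfo opts desc = info (helper <*> opts) $ progDesc desc

parseApp :: Parser App
parseApp = strOption
  (  short 'a'
  <> long "app"
  <> metavar "COMPILE-APP"
  <> help "Heroku app on which to compile" )

parseCommand :: Parser Command
parseCommand = subparser
  (  command "start" (parseStart `withInfo` "Start a build on the compilation app")
  <> command "status" (parseStatus `withInfo` "Check the status of a build")
  <> command "release" (parseRelease `withInfo` "Release a successful build") )
  where
    parseStart = Start
      <$> argument str (metavar "SOURCE-URL")
      <*> argument str (metavar "VERSION")
    parseStatus = Status <$> argument str (metavar "BUILD-ID")
    parseRelease = Release
      <$> argument str (metavar "BUILD-ID")
      <*> argument str (metavar "RELEASE-APP")

parseOptions :: Parser Options
parseOptions = Options <$> parseApp <*> parseCommand

-- Hypothetical helper: parse a list of arguments purely, Nothing on failure.
parseArgs :: [String] -> Maybe Options
parseArgs = getParseResult . execParserPure defaultPrefs
  (parseOptions `withInfo` "Interact with the Heroku Build API")
```

With this in place, parseArgs ["--app", "my-app", "status", "123"] gives Just (Options "my-app" (Status "123")), while leaving out --app gives Nothing because parseApp is a required option.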
Finally, with option handling thoroughly taken care of, we’re free to implement our program logic in terms of meaningful types:
    run :: Options -> IO ()
    run (Options app cmd) = do
        case cmd of
            Start url version  -> -- ...
            Status build       -> -- ...
            Release build rApp -> -- ...
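The elided branches are where the real API calls would go. As a compilable stand-in, the sketch below fills them with plain strings; describe and its messages are invented for this illustration and say nothing about how the actual client talks to Heroku.

```haskell
type App = String
type Version = String
type Url = String
type BuildId = String

data Command
  = Start Url Version
  | Status BuildId
  | Release BuildId App

data Options = Options App Command

-- Hypothetical stand-in for the real logic: say what each command would do.
describe :: Options -> String
describe (Options app cmd) =
  case cmd of
    Start url version ->
      "compile " ++ version ++ " from " ++ url ++ " on " ++ app
    Status build ->
      "check build " ++ build ++ " on " ++ app
    Release build rApp ->
      "release build " ++ build ++ " from " ++ app ++ " to " ++ rApp

run :: Options -> IO ()
run = putStrLn . describe
```

Because the program logic only ever sees an Options value, it can be exercised directly, for example with run (Options "builder" (Status "42")), without any command-line parsing involved.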
To recap, optparse-applicative allows us to do a number of things:
- Implement our program input as a meaningful type
- State how to turn command-line options into a value of that type in a concise and declarative way
- Do this even in the presence of something complex like sub-commands
- Handle invalid input and get a really great --help message for free
Hopefully, this post has piqued some interest in Haskell’s deeper ideas, which I believe lead to most of these benefits. If not, at least there are some real-world examples that you can reference the next time you want to parse command-line options in Haskell.