A cheatsheet to JSON handling with Aeson

October 19, 2019


Parsing and emitting JSON is a necessity for basically any program that wants to talk to the internet. Haskell's most widely used JSON package, aeson, is the de-facto standard choice here. Its basic usage is easy enough: define your Haskell types for some JSON data, derive FromJSON and ToJSON instances, and you're ready to encode and decode. Of course, real-life JSON data and API returns are never quite that simple; work with aeson long enough, and you'll likely need more complicated use cases. For instance, what if you need to:

parse JSON data where the attribute names are different from your Haskell property names? e.g. where the attribute names are snake_case and your code is camelCase?
have several different JSON parsers for the same Haskell data? If you use typeclasses, there has to be a single 'canonical' parser.
parse a string enum which can only be one of a few alternatives? e.g. your JSON "user_type" can only be one of "user", "admin", or "customer_support".
use one field to determine how to parse the rest of the document? e.g. for Slack's Web API, you need to check the "ok" field first to determine whether the document contains data, or whether it only has an error message.
deal with strange JSON formats, such as collections sent as dictionaries with ordered keys instead of as a list?
parse JSON fields without needing to create a new datatype?

The autoderived parsers that aeson gives you won't cut it for these sorts of situations. Fortunately, aeson can handle any complicated JSON parsing you need, including the cases listed above, but how to do so can be surprisingly nonobvious. So here's a cheatsheet for some common operations using aeson.

Note that the later examples will make heavy use of monadic code; for the more complicated use cases of aeson, there's really no way around it. You should be able to understand the logic behind what's going on even if you're not 100% on monads, but if you're not comfortable with reading monadic code and do-syntax, take a look at my introduction to monads.

All examples were tested on aeson 1.4.4.0.

Importing and using

Add to your package.yaml/cabal file:

dependencies:
  - aeson

In modules where you need to manipulate JSON:

import Data.Aeson
import Data.Aeson.Types

While the import of Data.Aeson.Types isn't strictly necessary, it's so often useful that it's usually worth importing.

Basics

Autoderiving parsers and serializers

{-# LANGUAGE DeriveGeneric  #-}
{-# LANGUAGE DeriveAnyClass #-}

import Data.Aeson
import GHC.Generics

data Foo = Foo
  { field1 :: Int
  , field2 :: String
  }
  deriving (Show, Generic, ToJSON, FromJSON)
  -- ToJSON so that we can encode *to* a JSON string,
  -- FromJSON so that we can parse *from* a JSON string

This is the 'preferred' way to automatically derive JSON parsers and encoders for your types, using Generic. Essentially, this gives aeson enough information to introspect on the structure of Foo and figure out all the fields it needs and what they're named, and thus how to construct one using a JSON string. Don't forget to enable the language extensions.

Like I mentioned before, the autoderived instances aren't particularly flexible, and won't cut it for even slightly complicated formats. They will only parse JSON where the attribute names are exactly the same as the Haskell field names, including case. In practice, I find that I never write my instances this way; I prefer manually implementing the ToJSON and FromJSON instances for more control.
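If the only mismatch is the naming convention, there is a middle ground between derived and hand-written instances: aeson's generic functions accept an Options record. Here's a minimal sketch, assuming a made-up Account type whose JSON keys are snake_case. The names Account and snakeOptions are purely illustrative.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson
import Data.Aeson.Types
import GHC.Generics (Generic)

-- hypothetical example type, not from the rest of this post
data Account = Account
  { accountId   :: Int
  , displayName :: String
  }
  deriving (Show, Generic)

-- rewrite each Haskell field name to snake_case before it is used as a JSON key
snakeOptions :: Options
snakeOptions = defaultOptions { fieldLabelModifier = camelTo2 '_' }

instance FromJSON Account where
  parseJSON = genericParseJSON snakeOptions

instance ToJSON Account where
  toJSON = genericToJSON snakeOptions
```

With these options, accountId is read from and written to "account_id", and displayName to "display_name". Anything beyond renaming still needs the manual instances described later.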

Decoding from JSON strings

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy as LB

jsonString :: LB.ByteString
jsonString = "{ \"field1\": 27, \"field2\": \"hello!\" }"

maybeFoo :: Maybe Foo
maybeFoo = decode jsonString

λ> maybeFoo
>>> Just (Foo {field1 = 27, field2 = "hello!"})

decode :: FromJSON a => LB.ByteString -> Maybe a

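decode only accepts lazy ByteStrings. If you have a strict ByteString instead, aeson also provides decodeStrict; and if you want an error message rather than Nothing on failure, there's eitherDecode and eitherDecodeStrict.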
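A Nothing from decode tells you nothing about what went wrong. For illustration, here's how two other decoding functions that aeson exports might be used: eitherDecode, which reports the parse error as a String, and decodeStrict, which takes a strict ByteString. This is a sketch that redefines Foo so it stands alone; describeFoo and strictFoo are made-up names.

```haskell
{-# LANGUAGE DeriveAnyClass    #-}
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON, decodeStrict, eitherDecode)
import qualified Data.ByteString      as SB
import qualified Data.ByteString.Lazy as LB
import GHC.Generics (Generic)

data Foo = Foo
  { field1 :: Int
  , field2 :: String
  }
  deriving (Show, Generic, FromJSON)

-- Left carries a description of why decoding failed
describeFoo :: LB.ByteString -> Either String Foo
describeFoo = eitherDecode

-- like decode, but for strict ByteStrings
strictFoo :: SB.ByteString -> Maybe Foo
strictFoo = decodeStrict
```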

Encoding to JSON strings

myFoo :: Foo
myFoo = Foo
  { field1 = 909
  , field2 = "take your time"
  }

λ> encode myFoo
>>> "{\"field1\":909,\"field2\":\"take your time\"}"

encode :: ToJSON a => a -> LB.ByteString

Basically, if you're working with aeson, use lazy ByteStrings.

Core JSON types

-- Any possible JSON value.
data Value
  = Object Object
  | Array Array
  | String Text
  | Number Scientific
  | Bool Bool
  | Null

-- Just JSON objects, e.g. things constructed using {...}
type Object = HashMap Text Value

-- Just JSON arrays, e.g. things constructed using [...]
type Array = Vector Value

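Text, Bool, HashMap and Vector are as you'd expect. Scientific is from the scientific package, and represents arbitrary-precision numbers. (These are the definitions in the aeson 1.x series; from aeson 2.0 onward, Object is a KeyMap Value with Key keys instead of a HashMap Text Value.)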
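Since Value is an ordinary sum type, you can pattern match on it like any other datatype. A small sketch (describe is a made-up name for illustration):

```haskell
import Data.Aeson (Value (..))

-- give a rough, human-readable summary of which kind of JSON value we have
describe :: Value -> String
describe v = case v of
  Object _ -> "object"
  Array xs -> "array of " ++ show (length xs) ++ " elements"
  String _ -> "string"
  Number n -> "number " ++ show n
  Bool b   -> "bool " ++ show b
  Null     -> "null"
```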

Constructing JSON values directly

While the data definition above is already enough to construct any valid JSON value you want, there are some convenience functions for constructing JSON objects. (By the way, pay attention to the difference between JSON values and JSON objects.)

{-# LANGUAGE OverloadedStrings #-}

customValue :: Value
customValue = object
  [ "list_price" .= (150000 :: Int)
  , "sale_price" .= (143000 :: Int)
  , "description" .= ("2-bedroom townhouse" :: String)
  ]

λ> customValue
>>> Object
      (fromList
         [ ( "sale_price" , Number 143000.0 )
         , ( "list_price" , Number 150000.0 )
         , ( "description"
           , String "2-bedroom townhouse"
           )
         ])

object :: [Pair] -> Value
(.=)   :: ToJSON v => (strict) Text -> v -> Pair

Common use cases

Implementing a custom parser

Eventually the default parsers won't cut it. If you need custom parsing behavior, you can always write your ToJSON and FromJSON instances by hand.

For writing ToJSON instances, you can use the convenience functions from before to build aeson Values.

{-# LANGUAGE OverloadedStrings #-}

data Person = Person
  { firstName :: String
  , lastName  :: String
  }
  deriving (Show)

-- our fields are snake_case instead
instance ToJSON Person where
  toJSON (Person { firstName = firstName, lastName = lastName }) =
    object [ "first_name" .= firstName
           , "last_name"  .= lastName
           ]

λ> encode (Person "Karl" "Popper")
>>> "{\"first_name\":\"Karl\",\"last_name\":\"Popper\"}"

For writing FromJSON instances, the main functions you'll want are withObject and (.:).

-- our fields are snake_case instead
instance FromJSON Person where
  -- note that the typeclass function is parseJSON, not fromJSON
  parseJSON = withObject "Person" $ \obj -> do
    firstName <- obj .: "first_name"
    lastName <- obj .: "last_name"
    return (Person { firstName = firstName, lastName = lastName })

karlJSON :: LB.ByteString
karlJSON = "{\"first_name\":\"Karl\",\"last_name\":\"Popper\"}"

λ> decode karlJSON :: Maybe Person
>>> Just (Person {firstName = "Karl", lastName = "Popper"})

If you have an optional field, use (.:?) instead of (.:).

data Item = Item
  { name :: String
  , description :: Maybe String
  }
  deriving (Show)

instance FromJSON Item where
  parseJSON = withObject "Item" $ \obj -> do
    name <- obj .: "name"
    description <- obj .:? "description"
    return (Item { name = name, description = description })

λ> decode "{\"name\": \"Very Evil Artifact\"}" :: Maybe Item
>>> Just (Item {name = "Very Evil Artifact", description = Nothing})

withObject :: String -> (Object -> Parser a) -> Value -> Parser a
(.:)       :: FromJSON a => Object -> Text -> Parser a
(.:?)      :: FromJSON a => Object -> Text -> Parser (Maybe a)
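If a missing optional field should fall back to a default rather than become Nothing, aeson's (.!=) operator can be chained after (.:?). A minimal sketch, assuming a made-up Settings type where "page_size" defaults to 20:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson

-- hypothetical example type
data Settings = Settings
  { theme    :: String
  , pageSize :: Int
  }
  deriving (Show)

instance FromJSON Settings where
  parseJSON = withObject "Settings" $ \obj -> do
    t <- obj .: "theme"
    -- use 20 when "page_size" is absent
    n <- obj .:? "page_size" .!= 20
    return (Settings { theme = t, pageSize = n })
```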

Note that Parser implements Monad and Alternative. So if you need to do more complex things like take in both snake_case and camelCase keys for the same field, or conditionally parse one field based on the value of another, you can use the normal applicative/monadic tools for doing so. We'll see some examples of doing that later.
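As a quick taste of the Alternative instance, here's a sketch of a parser that accepts either snake_case or camelCase keys for the same field, using (<|>) from Control.Applicative. It redefines Person so that it stands alone.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Aeson

data Person = Person
  { firstName :: String
  , lastName  :: String
  }
  deriving (Show)

instance FromJSON Person where
  parseJSON = withObject "Person" $ \obj -> do
    -- try the snake_case key first, then fall back to camelCase
    first <- obj .: "first_name" <|> obj .: "firstName"
    last' <- obj .: "last_name"  <|> obj .: "lastName"
    return (Person { firstName = first, lastName = last' })
```

If both alternatives fail, the whole parse fails, just as it would with a single (.:).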

Parsing enum datatypes

The autoderive works for simple enum types as well.

data UserType = User | Admin | CustomerSupport
  deriving (Generic, ToJSON, FromJSON)

λ> encode CustomerSupport
>>> "\"CustomerSupport\""

But the output is, once again, exactly the same case as the Haskell code. So if you're trying to parse some API enum you'll need to write custom instances once again.

The ToJSON instance should be fairly obvious. Writing the FromJSON instance is a little bit trickier.

instance FromJSON UserType where
  parseJSON = withText "UserType" $ \text ->
    case text of
      "user"             -> return User
      "admin"            -> return Admin
      "customer_support" -> return CustomerSupport
      _                  -> fail "string is not one of known enum values"
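For completeness, here's one way the matching ToJSON instance might look. It's a sketch, assuming the same three string values that the parser above accepts.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (ToJSON (..), Value (..))

data UserType = User | Admin | CustomerSupport
  deriving (Show)

-- each constructor becomes a plain JSON string
instance ToJSON UserType where
  toJSON User            = String "user"
  toJSON Admin           = String "admin"
  toJSON CustomerSupport = String "customer_support"
```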

Parsing weird JSON formats

Since the Parser type is a monad, we can write conditional logic as complicated as we want inside our parser code.

For instance, let's say an API we're working with can either send us some data or an error message; we need to check the "ok" attribute first to see which way to parse it. We might represent this with a sum type on the Haskell side. How do we write our FromJSON instance?

import Data.Text (Text)

data APIResult
  = JSONData Value
  | APIError Text
    -- not named `Error`, since that would clash with the Error
    -- constructor of aeson's Result type
  deriving (Show)

instance FromJSON APIResult where
  parseJSON = withObject "APIResult" $ \obj -> do
    ok <- obj .: "ok"
    if ok
      then fmap JSONData (obj .: "data")
      else fmap APIError (obj .: "error_msg")

goodData :: LB.ByteString
goodData = "{\"ok\":true,\"data\":{\"foo\":2}}"

badData :: LB.ByteString
badData = "{\"ok\":false,\"error_msg\":\"no_credentials\"}"

λ> decode goodData :: Maybe APIResult
>>> Just (JSONData (Object (fromList [("foo",Number 2.0)])))

λ> decode badData :: Maybe APIResult
>>> Just (APIError "no_credentials")

Another annoying situation might be if collections are sent as dictionaries with ordered keys instead of as JSON lists. But again, we can handle this:

-- e.g. our API sends us data like
--
-- {
--   "element1": 42,
--   "element2": -20,
--   "element3": 1000
-- }
--
-- instead of [42, -20, 1000]

import qualified Data.List as L
import qualified Data.HashMap.Strict as HM

data JSONHashList a = HashList [a]
  deriving (Show)

instance FromJSON a => FromJSON (JSONHashList a) where
  parseJSON = withObject "JSONHashList" $ \obj ->
    let kvs = HM.toList obj
        -- keys are compared as text, so "element10" would sort before "element2"
        sorted = L.sortOn (\(key, _) -> key) kvs
        vals = map (\(_, val) -> val) sorted
        parsed = mapM parseJSON vals
    in fmap HashList parsed

weirdListData :: LB.ByteString
weirdListData = "{\"element1\":42,\"element2\":-20,\"element3\":1000}"

λ> decode weirdListData :: Maybe (JSONHashList Int)
>>> Just (HashList [42,-20,1000])

Parse a type directly from a Value

Right now we can parse from a ByteString, but what if we already have a Value?

The simplest way is to use fromJSON:

fromJSON :: FromJSON a => Value -> Result a

value :: Value
value = object [ "first_name" .= ("Juniper" :: String), "last_name" .= ("Lerrad" :: String) ]

λ> fromJSON value :: Result Person
>>> Success (Person {firstName = "Juniper", lastName = "Lerrad"})

However, fromJSON returns aeson's own custom Result type, which is all fine and dandy, but probably not what you're passing around in the rest of your application.

Thankfully, Data.Aeson.Types provides the parseMaybe and parseEither functions, which return values of types rather more compatible with the rest of the Haskell ecosystem:

parseMaybe  :: (a -> Parser b) -> a -> Maybe b
parseEither :: (a -> Parser b) -> a -> Either String b

Since parseJSON from the FromJSON typeclass already has the type Value -> Parser a, we can use it to define useful utility functions. Plugging it into the first argument of parseMaybe gives us:

fromJSONValue :: FromJSON a => Value -> Maybe a
fromJSONValue = parseMaybe parseJSON

λ> fromJSONValue value :: Maybe Person
>>> Just (Person {firstName = "Juniper", lastName = "Lerrad"})
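The same trick works with parseEither if you want to keep the error message. And if you already have a Result in hand, converting it is a two-line pattern match. A sketch (fromJSONValueEither and resultToEither are made-up names):

```haskell
import Data.Aeson (FromJSON (..), Result (..), Value)
import Data.Aeson.Types (parseEither)

-- like parseMaybe parseJSON, but Left carries the parse error
fromJSONValueEither :: FromJSON a => Value -> Either String a
fromJSONValueEither = parseEither parseJSON

-- turn aeson's Result into a plain Either
resultToEither :: Result a -> Either String a
resultToEither (Error msg) = Left msg
resultToEither (Success x) = Right x
```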

Have multiple parsing functions for a single type

Sometimes you might have several different JSON formats for the same object, and then the typeclass solution won't cut it. But we just saw that parseMaybe and parseEither take a parser function as their first argument. We used the function that the FromJSON typeclass provides before, but there's nothing stopping us from putting something else there.

data Person = Person
  { firstName :: String
  , lastName  :: String
  }
  deriving (Show)

snakeCaseParser :: Value -> Parser Person
snakeCaseParser = withObject "Person" $ \obj -> do
  firstName <- obj .: "first_name"
  lastName <- obj .: "last_name"
  pure (Person { firstName = firstName, lastName = lastName })

pascalCaseParser :: Value -> Parser Person
pascalCaseParser = withObject "Person" $ \obj -> do
  firstName <- obj .: "FirstName"
  lastName <- obj .: "LastName"
  pure (Person { firstName = firstName, lastName = lastName })

snakeCasePerson :: Value
snakeCasePerson = object
  [ "first_name" .= ("Dimitri" :: String)
  , "last_name" .= ("Blaiddyd" :: String)
  ]

pascalCasePerson :: Value
pascalCasePerson = object
  [ "FirstName" .= ("Dimitri" :: String)
  , "LastName" .= ("Blaiddyd" :: String)
  ]

λ> parseMaybe snakeCaseParser snakeCasePerson :: Maybe Person
>>> Just (Person {firstName = "Dimitri", lastName = "Blaiddyd"})

λ> parseMaybe snakeCaseParser pascalCasePerson :: Maybe Person
>>> Nothing

λ> parseMaybe pascalCaseParser snakeCasePerson :: Maybe Person
>>> Nothing

λ> parseMaybe pascalCaseParser pascalCasePerson :: Maybe Person
>>> Just (Person {firstName = "Dimitri", lastName = "Blaiddyd"})
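Because these are ordinary functions returning Parser values, they can also be combined. Here's a sketch of a parser that accepts either format by trying one and falling back to the other with (<|>). It restates the two parsers in applicative style so that the block stands alone; eitherCaseParser and decodePerson are made-up names.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Aeson
import Data.Aeson.Types (Parser, parseMaybe)

data Person = Person
  { firstName :: String
  , lastName  :: String
  }
  deriving (Show)

snakeCaseParser :: Value -> Parser Person
snakeCaseParser = withObject "Person" $ \obj ->
  Person <$> obj .: "first_name" <*> obj .: "last_name"

pascalCaseParser :: Value -> Parser Person
pascalCaseParser = withObject "Person" $ \obj ->
  Person <$> obj .: "FirstName" <*> obj .: "LastName"

-- try snake_case first; if that fails, try PascalCase
eitherCaseParser :: Value -> Parser Person
eitherCaseParser value = snakeCaseParser value <|> pascalCaseParser value

decodePerson :: Value -> Maybe Person
decodePerson = parseMaybe eitherCaseParser
```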

Parse a type directly from an Object

Sometimes we already know that we have a JSON Object and don't need the full generality of Value. But all of our functions thus far parse from either a ByteString or a Value.

However, since withObject takes in a parser that takes in an Object and turns it into a parser that takes a Value, we can get to what we want by just removing the withObject wrapping and defining the parser separately. So instead of defining FromJSON instances the way we did above, we can do it like this:

{-# LANGUAGE OverloadedLists #-}

personParser :: Object -> Parser Person
personParser obj = do
  firstName <- obj .: "first_name"
  lastName  <- obj .: "last_name"
  return (Person { firstName = firstName, lastName = lastName })

instance FromJSON Person where
  parseJSON = withObject "Person" personParser

personObject :: Object
personObject = [("first_name", "Anthony"), ("last_name", "Yoon")]

λ> parseMaybe personParser personObject
>>> Just (Person {firstName = "Anthony", lastName = "Yoon"})

Parsing without a new datatype

Since the parseX family of functions takes in a parser directly, there's no need to define a new datatype for one-off or bespoke parses.

tupleizeFields :: Value -> Either String (Int, Bool)
tupleizeFields = parseEither $
  withObject "<fields>" $ \obj -> do
    field1 <- obj .: "field1"
    field2 <- obj .: "field2"
    return (field1, field2)

tupleJSON :: Value
tupleJSON = object
  [ "field1" .= (955 :: Int)
  , "field2" .= True
  ]

λ> tupleizeFields tupleJSON
>>> Right (955,True)

Putting it all together, we can go straight from a ByteString to parsed data without having to define a datatype at all.

import Data.ByteString.Lazy (ByteString)

tupleizeFieldsBS :: ByteString -> Either String (Int, Bool)
tupleizeFieldsBS input = do
  object <- eitherDecode input
  let parser = (\obj -> do
        field1 <- obj .: "field1"
        field2 <- obj .: "field2"
        return (field1, field2))
  parseEither parser object

λ> tupleizeFieldsBS "{\"field1\":955,\"field2\":true}"
>>> Right (955,True)

Less common stuff

Parsing nested fields

While being able to write parsers to access nested fields is a natural consequence of the monad instance for Parser, it may not immediately spring to mind the first time you need to do it.

-- { contact_info: { email: <string> } }
nested :: Value -> Parser String
nested = withObject "ContactInfo" $ \obj -> do
  contact <- obj .: "contact_info"
  contact .: "email"

λ> parseMaybe nested $ object
 |   [ "contact_info" .=
 |     object [ "email" .= ("williamyaoh@gmail.com" :: String) ]
 |   ]
>>> Just "williamyaoh@gmail.com"

If you find yourself doing this a lot, it might even be worth it to define a new operator specifically for this use case:

import Data.Text (Text)

(.->) :: FromJSON a => Parser Object -> Text -> Parser a
(.->) parser key = do
  obj <- parser
  obj .: key

nested' :: Value -> Parser (String, String)
nested' = withObject "ContactInfo" $ \obj -> do
   email <- obj .: "contact_info" .-> "email"
   state <- obj .: "contact_info" .-> "address" .-> "state"
   return (email, state)

λ> parseMaybe nested' $ object
 |   [ "contact_info" .= object
 |     [ "email" .= ("williamyaoh@gmail.com" :: String)
 |     , "address" .= object
 |       [ "state" .= ("OK" :: String)
 |       , "zip_code" .= ("74008" :: String)
 |       ]
 |     ]
 |   ]
>>> Just ("williamyaoh@gmail.com","OK")

Parsing multiple JSON values in the same string

The default functions that aeson provides don't allow you to inspect what's left over in the input string after parsing a Value, so if you need to do things like parse a complicated file format where JSON is somewhere in the format, or parse multiple JSON values appended to the same file, the functions we've looked at up to now won't cut it.

Thankfully, aeson also exposes its attoparsec parsers, so we can use all the tools we have for manipulating parser combinators to handle JSON input as well.

You'll likely want to import parser-combinators as well.

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.Combinators
  -- from parser-combinators

import Data.Aeson
import Data.ByteString.Lazy (ByteString)
import qualified Data.Attoparsec.ByteString.Lazy as Atto

jsonStr :: ByteString
jsonStr = "{ \"foo\": 555 }"

input :: ByteString
input = jsonStr `mappend` jsonStr

λ> Atto.parse (many json) input
>>> Done
      ""
      [ Object (fromList [ ( "foo" , Number 555.0 ) ])
      , Object (fromList [ ( "foo" , Number 555.0 ) ])
      ]

-- attoparsec Parser, not aeson Parser
json :: Parser Value

Useful auxiliary libraries

Since aeson is so widely used, there are a fair number of libraries in the ecosystem that provide extra functionality on top of what is provided in aeson itself. You don't need any of these libraries to work with JSON, but you might find them useful.

Pretty-printing

Aeson doesn't provide a way to pretty-print the encoded JSON strings by default, but the aeson-pretty package does.

{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import Data.Aeson.Encode.Pretty

import qualified Data.ByteString.Lazy.Char8 as B

encodePretty :: ToJSON a => a -> (lazy) ByteString

λ> json = object
     [ "period" .= ("yearly" :: String)
     , "metadata" .= object
       [ "created_at" .= ("2019-05-01" :: String)
       , "views" .= 0
       ]
     ]
λ> B.putStrLn $ encodePretty json
>>> {
>>>     "period": "yearly",
>>>     "metadata": {
>>>         "views": 0,
>>>         "created_at": "2019-05-01"
>>>     }
>>> }

If you need more control over how the output is formatted, aeson-pretty also provides encodePretty':

encodePretty' :: ToJSON a => Config -> a -> (lazy) ByteString

data Config = Config
  { confIndent          :: Indent
    -- how to sort object keys
  , confCompare         :: Text -> Text -> Ordering
    -- how to output numeric types
  , confNumFormat       :: NumberFormat
  , confTrailingNewline :: Bool
  }

defConfig :: Config
  -- * 4 spaces per indent
  -- * don't sort keys
  -- * don't add trailing newline

data Indent = Spaces Int | Tab

data NumberFormat
  = Generic
  | Scientific
  | Decimal
  | Custom (Scientific -> Data.Text.Lazy.Builder.Builder)

λ> config = defConfig
     { confIndent = Spaces 2
     , confCompare = compare
     }
λ> B.putStrLn $ encodePretty' config json
>>> {
>>>   "metadata": {
>>>     "created_at": "2019-05-01",
>>>     "views": 0
>>>   },
>>>   "period": "yearly"
>>> }

Embedding literal JSON values in code

The aeson-qq package provides a quasiquoter to allow you to directly write JSON strings into your code and have them converted into Values.

{-# LANGUAGE QuasiQuotes #-}

import Data.Aeson
import Data.Aeson.QQ

users :: Value
users = [aesonQQ|
  {
    "users": [
      {
        "username": "michael.oakeshott",
        "id": 1
      },
      {
        "username": "miguel.de.cervantes",
        "id": 4
      }
    ]
  }
|]

This quasiquoter also allows you to interpolate any values that implement ToJSON by enclosing them in #{...}.
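As a sketch of what interpolation might look like, here's a function that builds a single user entry from its arguments (userJSON is a made-up name):

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Data.Aeson (Value)
import Data.Aeson.QQ (aesonQQ)

-- both arguments have ToJSON instances, so they can be spliced in directly
userJSON :: String -> Int -> Value
userJSON username userId = [aesonQQ|
  {
    "username": #{username},
    "id": #{userId}
  }
|]
```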

Newer versions of aeson provide a simple version of this quasiquoter in Data.Aeson.QQ.Simple, but without the ability to interpolate values.

Doing data access directly on Values

If you need to directly grab data from within a Value, you can always just use pattern matching. However, this quickly gets pretty tedious if you need to do anything more complicated than grabbing a single, surface-depth attribute.

The lens-aeson package provides (what else?) lenses for accessing JSON data.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}

import Control.Lens
  -- from lens

import Data.Aeson
import Data.Aeson.QQ
import Data.Aeson.Lens
  -- from lens-aeson

someJSON :: Value
someJSON = [aesonQQ|
  {
    "data": {
      "timestamps": {
        "created_at": "2019-05-11 17:53:21"
      }
    }
  }
|]

λ> someJSON ^? key "data".key "timestamps".key "created_at"
>>> Just (String "2019-05-11 17:53:21")
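Two more patterns that come up often are collecting a field from every element of an array, and converting the result to a plain Haskell type instead of a Value. Here's a sketch using values, nth, _String and _Integer from Data.Aeson.Lens; the doc value and the names usernames and firstId are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Lens ((^..), (^?))
import Data.Aeson (Value, object, (.=))
import Data.Aeson.Lens (_Integer, _String, key, nth, values)
import Data.Text (Text)

doc :: Value
doc = object
  [ "users" .=
      [ object [ "username" .= ("michael.oakeshott" :: Text), "id" .= (1 :: Int) ]
      , object [ "username" .= ("miguel.de.cervantes" :: Text), "id" .= (4 :: Int) ]
      ]
  ]

-- every "username" in the "users" array, as Text rather than Value
usernames :: [Text]
usernames = doc ^.. key "users" . values . key "username" . _String

-- the "id" of the first user, if the path exists and holds an integer
firstId :: Maybe Integer
firstId = doc ^? key "users" . nth 0 . key "id" . _Integer
```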

A full explanation of lenses is outside the scope of this article. Here are some more examples of using lens-aeson, as well as exercises for learning lenses themselves.

JSON seems to be the de-facto standard interchange format for the internet for the time being. So if you're doing anything online with Haskell, you'll likely end up working with aeson, one way or another. With this, you should be equipped to handle most of the JSON-related situations you come across.

If there's one thing to take away from this post, keep in mind that the Parser type (and its Applicative/Monad instances) is what drives all the fancy JSON ingesting. If you're having trouble parsing something, it's likely in how you're constructing values of this type. Pay attention to all the functions that produce Parser values.

While aeson is the de-facto standard JSON library, waargonaut is a recent addition to the ecosystem with a focus on supporting JSON parsing through term-level parsers rather than typeclasses. I haven't actually used it and can't comment on its usefulness, but if you need more flexible parsing, it may be worth taking a look at.

Come across any particularly hairy JSON and having trouble wrangling it in Haskell? Got a comment? Talk to me!