### Summary

In this post, I go over what monad transformers are and how to use them. I go into the internals of some common transformers, and we see how monad transformers are essentially functions that take in a monad and return an "augmented" monad with extra capabilities. We finish with a discussion of two different ways of using monad transformers, `mtl`-style and `transformers`-style.

*This post is adapted from the chapter on monad transformers from my book,* Abstractions in Context.

In this two-part series, we're going to talk about monad transformers, what they are, and why they matter.

You may have heard of them before, and you might be wondering what they are. What is being "transformed" through monads?

From a practical standpoint, what monad transformers give you is the ability to pick and choose what "capabilities" your code has, and combine those capabilities à la carte. A few examples: you might want to add in the capability for your code to access random values, or handle exceptions. In another language, you might think of these as things that have to be baked into the language at, say, the compiler level. They're just always available, and there's nothing you can do about it. But Haskell takes the opposite design philosophy, by making it so your functions can do almost *nothing* side effect-y by default. So it's very important that you can put those functionalities back into your code, and do so conveniently.

Capability is a very broad term, though, and many of the things that monad transformers allow you to "mix in" to your code are much more featureful than you might expect. It's possible to add in the ability to stream data from a file or socket directly on top of existing code, or use monad transformers to do incremental processing of parser input.

Don't worry if none of that preamble makes any sense right now; we'll see in more detail how this pans out once we start looking at specific examples. It's time to talk about how monad transformers work.

## What is a monad transformer, anyways?

Imagine, if you will, working with a bunch of functions that return various combinations of `IO` and `Maybe`, like `IO a`, `Maybe a`, `IO (Maybe a)`. Somehow we need to reconcile these types so that we can use them together. This isn't too uncommon; validating data (say, against a database, or against a 3rd party API) might be one instance where this happens a lot: `IO` to access some not-in-memory information, `Maybe` to signal when the data doesn't pass validation.

But this is quickly going to get very tedious, because you're going to end up having to write a lot of code like this:

```
validateData1 :: Int -> IO (Maybe Int)
validateData2 :: String -> IO (Maybe String)

validateForm :: Int -> String -> IO (Maybe (Int, String))
validateForm rawData1 rawData2 = do
  data1m <- validateData1 rawData1
  case data1m of
    Nothing -> pure Nothing
    Just data1 -> do
      data2m <- validateData2 rawData2
      case data2m of
        Nothing -> pure Nothing
        Just data2 -> pure (Just (data1, data2))
```

We're only two parameters deep and this is already getting unreadable.

What we want is some way for each monadic binding to *both* deal with the IO portion *and* handle pattern-matching/short-circuiting on the inner Maybe. This suggests creating a new monad with a different bind definition:

`newtype MaybeIO a = MaybeIO { runMaybeIO :: IO (Maybe a) }`

Try implementing the functor/applicative/monad instances for this type; it shouldn't be too difficult. Once we have those instances, we can write code like this:

```
validateData1 :: Int -> MaybeIO Int
validateData2 :: String -> MaybeIO String

validateForm :: Int -> String -> MaybeIO (Int, String)
validateForm rawData1 rawData2 = do
  data1 <- validateData1 rawData1
  data2 <- validateData2 rawData2
  pure (data1, data2)
```

Much less noisy, no?
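For anyone who wants to check their answer to the exercise above, here is one possible set of instances for `MaybeIO`, offered as a sketch rather than the only way to write them. The bodies of `validateData1` and `validateData2` are hypothetical stand-ins (they reject non-positive numbers and empty strings), since the text only gives their signatures.

```haskell
newtype MaybeIO a = MaybeIO { runMaybeIO :: IO (Maybe a) }

instance Functor MaybeIO where
  -- map over the IO layer, then over the Maybe inside it
  fmap f (MaybeIO io) = MaybeIO (fmap (fmap f) io)

instance Applicative MaybeIO where
  pure x = MaybeIO (pure (Just x))
  MaybeIO iof <*> MaybeIO iox = MaybeIO $ do
    mf <- iof
    case mf of
      Nothing -> pure Nothing -- short-circuit: iox never runs
      Just f -> fmap (fmap f) iox

instance Monad MaybeIO where
  MaybeIO iox >>= f = MaybeIO $ do
    mx <- iox
    case mx of
      Nothing -> pure Nothing -- skip the rest of the do block
      Just x -> runMaybeIO (f x)

-- Hypothetical validators, for illustration only.
validateData1 :: Int -> MaybeIO Int
validateData1 n = MaybeIO (pure (if n > 0 then Just n else Nothing))

validateData2 :: String -> MaybeIO String
validateData2 s = MaybeIO (pure (if null s then Nothing else Just s))

validateForm :: Int -> String -> MaybeIO (Int, String)
validateForm rawData1 rawData2 = do
  data1 <- validateData1 rawData1
  data2 <- validateData2 rawData2
  pure (data1, data2)
```

With these definitions, `runMaybeIO (validateForm 3 "abc")` should give `Just (3, "abc")`, while `validateForm 0 "abc"` gives `Nothing` without ever calling `validateData2`.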

We've run into a different problem, though. What if we want to add another monad into this stack? Say we want to also have a `Reader` to hold some configuration, or some `State` to hold a cache. Are we going to create new wrapping types for every single possible combination of monads? Are we going to do the tedious work of writing monad instances and helper functions for each of those possible combinations?

It seems like what we want is a way to "isolate" the functionality of each individual monad we want to use, while then having the ability to cobble them back together into a working whole. That is, we'd have a "Maybe" component that gives you the ability to short-circuit, we'd have a "Reader" component that lets you store some read-only data, and some way to plug those together.

What about a type like this?

`newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }`

Notice how if you substitute in `IO` for the `m` parameter, you get exactly the same structure as we had with our original `MaybeIO` type:

$$\text{MaybeT}\ \text{IO}\ a \equiv \text{IO}\left(\text{Maybe}\ a\right) \equiv \text{MaybeIO}\ a$$

We took the normal monad (in this case, Maybe) and punched a parameter into the type to put some other monad into. If we could write a working monad instance for our MaybeT type, we could then drop this type "on top" of any existing monad and keep the underlying functionality while adding Maybe-ness!

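```
newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }

instance Functor m => Functor (MaybeT m) where
  fmap f (MaybeT mx) = MaybeT $ (fmap . fmap) f mx

-- Requiring Monad m (not just Applicative m) lets (<*>) short-circuit:
-- if the function side is Nothing, mx never runs. This keeps (<*>)
-- consistent with (>>=) below.
instance Monad m => Applicative (MaybeT m) where
  pure = MaybeT . pure . pure
  (<*>) (MaybeT mf) (MaybeT mx) = MaybeT $ do
    mf' <- mf
    case mf' of
      Nothing -> pure Nothing
      Just f -> fmap (fmap f) mx

instance Monad m => Monad (MaybeT m) where
  return = pure
  (>>=) (MaybeT rawX) f = MaybeT $ do
    mx <- rawX
    case mx of
      Nothing -> pure Nothing
      Just x -> runMaybeT (f x)
```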

Figuring out the implementation of these instances can be a little tricky, but once we have them we're able to use any inner monad we want. We could write similar instances for a hypothetical `StateT`, for a `ReaderT`, and so on, and then we'd have the ability to mix and match them in our code, while still retaining the syntactic convenience of our initial `MaybeIO` example.

The key thing to realize is that monad transformers can take in any other monad as their inner monad. And since monad transformers are themselves monads, you can stack these up indefinitely!

```
-- a type the compiler will happily accept
-- notice how each "layer" takes in exactly one other monad
type AMonadStack a =
  StateT Int (ReaderT String (MaybeT IO)) a
```

A word about the terminology here. A bunch of transformers chained together like this, where each transformer is the inner monad of another transformer, is referred to as a monad "stack." The innermost monad in the stack, like `IO` above, is usually referred to as the "base" or "bottom" of the monad stack. The base monad is also what we eventually run our code in; typical choices for the base are `IO` (for side effects) or `Identity` (for code that's pure). We write the bulk of our code wrapped inside a monad stack to get a convenient monad instance, then at the top level of our program we use functions like `runMaybeT`, `runReaderT`, `runStateT`, etc. to unwrap all those transformer types and get a value like `IO (Maybe a)` or `state -> cfg -> IO (a, state)` that we can actually run.
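Here is a sketch of that unwrapping for the `AMonadStack` type from earlier. To keep it self-contained, it assumes the `StateT`, `ReaderT` and `MaybeT` from the `transformers` package (which ships with GHC) rather than hand-rolled versions; `step` and `runStack` are hypothetical names for illustration.

```haskell
import Control.Monad.Trans.Maybe (MaybeT (..))
import Control.Monad.Trans.Reader (ReaderT, runReaderT)
import Control.Monad.Trans.State (StateT, get, put, runStateT)

type AMonadStack a = StateT Int (ReaderT String (MaybeT IO)) a

-- Hypothetical computation: report the current counter, then bump it.
step :: AMonadStack String
step = do
  n <- get
  put (n + 1)
  pure ("saw " ++ show n)

-- Peel the layers off from the outside in:
-- StateT first, then ReaderT, then MaybeT, leaving a plain IO action.
runStack :: AMonadStack a -> Int -> String -> IO (Maybe (a, Int))
runStack action st cfg = runMaybeT (runReaderT (runStateT action st) cfg)
```

With these definitions, `runStack step 41 "config"` should produce `Just ("saw 41", 42)`. Because the `StateT` layer is peeled off first, the final state ends up inside the `Maybe` produced by the `MaybeT` layer.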

There's one small hiccup we still need to handle, though. Look at `MaybeT`. How do we access the inner Maybe value? When we go to signal that the current function should short-circuit, how do we do it? Returning a `Nothing` won't work, since Maybe and MaybeT are distinct types. So we'll need a helper function that specifically returns a MaybeT.

```
nothing :: Applicative m => MaybeT m a
nothing = MaybeT (pure Nothing)
```
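As a quick illustration of `nothing` in use, the sketch below borrows `MaybeT` from the `transformers` package (same definition as above) and adds a hypothetical `validateNonNegative` check. Once `nothing` fires, the rest of the `do` block is skipped.

```haskell
import Control.Monad.Trans.Maybe (MaybeT (..))

-- The helper from the text: a MaybeT action that short-circuits.
nothing :: Applicative m => MaybeT m a
nothing = MaybeT (pure Nothing)

-- Hypothetical validator: reject negative numbers by returning `nothing`.
validateNonNegative :: Monad m => Int -> MaybeT m Int
validateNonNegative n
  | n < 0 = nothing
  | otherwise = pure n

-- If either check calls `nothing`, the addition never happens.
sumValidated :: Monad m => Int -> Int -> MaybeT m Int
sumValidated a b = do
  x <- validateNonNegative a
  y <- validateNonNegative b
  pure (x + y)
```

In `IO`, `runMaybeT (sumValidated 2 3)` should yield `Just 5`, while `runMaybeT (sumValidated 2 (-1))` yields `Nothing`.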

For any other monad transformers we create, we'd need to do the same thing and write helper functions to enable monad-specific functionality. But compared to the amount of boilerplate that we'd need to implement the "every possible combination of monads" choice from before, this level of repetition is something we'd take any day.

```
-- for StateT
get :: Applicative m => StateT s m s
put :: Applicative m => s -> StateT s m ()
-- for ReaderT
ask :: Applicative m => ReaderT cfg m cfg
local :: (cfg -> cfg') -> ReaderT cfg' m a -> ReaderT cfg m a
```
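The bodies of those helpers are short. Below is a sketch, assuming hand-rolled `StateT` and `ReaderT` newtypes shaped like the `MaybeT` above (the field names mirror the ones used by the `transformers` package). Only `Applicative m` is needed, because none of these helpers sequence two inner actions.

```haskell
-- Hand-rolled transformers, in the same style as MaybeT.
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }
newtype ReaderT cfg m a = ReaderT { runReaderT :: cfg -> m a }

-- StateT: return the current state without changing it...
get :: Applicative m => StateT s m s
get = StateT (\s -> pure (s, s))

-- ...or replace the state, returning ().
put :: Applicative m => s -> StateT s m ()
put s = StateT (\_ -> pure ((), s))

-- ReaderT: hand back the configuration.
ask :: Applicative m => ReaderT cfg m cfg
ask = ReaderT pure

-- Run an inner action against a transformed configuration.
local :: (cfg -> cfg') -> ReaderT cfg' m a -> ReaderT cfg m a
local f (ReaderT g) = ReaderT (g . f)
```

For example, in `IO`, `runStateT get 5` returns `(5, 5)`, and `runReaderT (local length ask) "hello"` returns `5`, since `local` transforms the configuration before `ask` sees it.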

You can think of monad transformers as functions, functions that take in some other monadic type and "transform" it by adding in extra capabilities. Hence the name. And it really can be any monad; the definitions that we've written above are completely agnostic of what the inner monad is, requiring only that it implements certain typeclasses.^{1}

## `mtl`-style and `transformers`-style

Everything we've covered so far forms the foundations of monad transformers; if you've made it to this point, you've understood monad transformers. However, there are a few extra conveniences that are possible to add. Specifically, there are two common ways to use typeclasses to make working with transformers more convenient, referred to as the `transformers` style and the `mtl` style.

Consider our MaybeT example. Our solution worked well when every function we were working with returned `MaybeT IO a`. But what if we want to just call `IO a` actions directly? It seems like we should be able to. Unfortunately, trying to call them directly inside a MaybeT function wouldn't work; it's a different type, after all.

```
validateInput :: MaybeT IO String
validateInput = do
  line <- getLine -- doesn't compile; IO =/= MaybeT IO
  ...
```

Instead, we have to write something like this:

```
validateInput :: MaybeT IO String
validateInput = do
  line <- MaybeT $ fmap Just getLine
  ...
```

And we'd have to do something similar for each IO action we ran inside our `MaybeT IO`. Smells a bit boilerplate-y, doesn't it? We could cut down on some of the repetition by writing a function `IO a -> MaybeT IO a`, but what do we then do if we're using a different inner monad? What do we do if our monad stack is multiple layers deep; do we have to manually do the wrapping for every layer above the one we're trying to use? Plus, forget MaybeT; wouldn't every monad transformer type need us to write a similar function?

`transformers`-style and `mtl`-style, named after the respective libraries which implement them in Haskell, are two different ways to tackle this problem, cutting down on the amount of repetition needed to use monad transformers in your own code. We'll start with `transformers`.

Fundamentally, the problem is that we need some easy way to take the inner, "wrapped" monad, and convert it to the "wrapping" transformer. Well, the simplest, most direct way to solve that seems like an actual function from one to the other. We've already talked about how we could do this and write a function `IO a -> MaybeT IO a`, but as we said, we'd need a function like this for every combination of inner and outer type. Sounds like polymorphism; sounds like we need a typeclass specifically for monad transformers.

```
class MonadTrans trans where
  lift :: Monad m => m a -> trans m a
```

Take a second to understand this typeclass; try substituting in some specific monads for `m` and `trans`. For instance, if we substitute `trans = MaybeT`, `m = IO`, `lift` will have exactly the type `IO a -> MaybeT IO a`. But since `lift` is polymorphic, we can now wrap any inner monad we want with just one function.

This also gives us a concise definition of a monad transformer: it's any type where an inner monad can be converted to the type itself. The operation is called "lift" because we're moving an inner monad "upwards" through the stack, towards the topmost transformer.

```
instance MonadTrans MaybeT where
  lift = MaybeT . fmap Just

instance MonadTrans (StateT s) where
  -- StateT results are (value, state) pairs, so pair the inner
  -- result with the unchanged state
  lift m = StateT (\s -> fmap (\a -> (a, s)) m)

validateInput :: MaybeT IO String
validateInput = do
  line <- lift getLine -- much shorter now!
  ...
```

This solution is what's known as `transformers` style.

Success! And with this, we're done, right? Boilerplate problem solved? Unfortunately, not quite yet. Though this is a major improvement, there are still degenerate cases when the monad stack starts getting tall.

```
foo :: StateT Int (ReaderT String (MaybeT IO)) ()
foo = do
  -- need to do some IO in this stack
  input <- lift $ lift $ lift getLine -- lots of lifts needed
  ...
```
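To see the lifting in action, here is a runnable sketch of that stack using the `transformers` package's types. The name `countConfig` and the use of `putStrLn` in place of `getLine` are assumptions made so the example runs without input. Each layer is reached with one more `lift` than the layer above it.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Maybe (MaybeT (..))
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)
import Control.Monad.Trans.State (StateT, get, put, runStateT)

countConfig :: StateT Int (ReaderT String (MaybeT IO)) Int
countConfig = do
  n <- get -- StateT: the top layer, no lift needed
  cfg <- lift ask -- ReaderT: one layer down
  if null cfg
    then lift (lift (MaybeT (pure Nothing))) -- MaybeT: two layers down
    else pure ()
  lift $ lift $ lift $ putStrLn ("config is " ++ cfg) -- IO: three layers down
  put (n + length cfg)
  pure n
```

Running `runMaybeT (runReaderT (runStateT countConfig 1) "abc")` should print `config is abc` and return `Just (1, 4)`; with an empty configuration string, the `MaybeT` layer short-circuits and the result is `Nothing`.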

We've at least gotten rid of the problem of requiring the end developer to implement and/or remember lots of different lifting functions for each monad transformer. But we can still only lift one layer at a time. What we really want is to be able to use the functions from each component of our monad stack in any stack containing that component, with no wrapping needed.

For this to work, the functions that each layer provides suddenly have to be polymorphic. For StateT, right now we have functions like this:

```
get :: Applicative m => StateT s m s
put :: Applicative m => s -> StateT s m ()
```

But for what we want, signatures like this can't possibly work; the return type is too concrete.

Instead, we need these functions to have a signature more like so:

```
get :: MonadState m s => m s
put :: MonadState m s => s -> m ()
```

Rather than tying ourselves to a concrete monad type, we use a typeclass that represents "statefulness." The idea is that as long as the type of our monad stack implements this typeclass, we can call `get` and `put` without lifting, no matter how deep the `StateT` is in the stack, regardless of what order the transformers have been stacked in.

```
{-# LANGUAGE FunctionalDependencies, FlexibleInstances, UndecidableInstances #-}

-- The functional dependency `m -> s` says the monad determines the
-- state type, so GHC can infer `s` from the stack alone.
class Monad m => MonadState m s | m -> s where
  get :: m s
  put :: s -> m ()

-- StateT can implement MonadState directly...
instance Monad m => MonadState (StateT s m) s where
  get = StateT $ \s -> pure (s, s)
  put s = StateT $ const (pure ((), s))

-- ...and if the inner monad supports statefulness, other monad
-- transformers can simply delegate the state operations downward
instance MonadState m s => MonadState (MaybeT m) s where
  get = lift get
  put s = lift (put s)
-- instances look very similar for ReaderT, WriterT, etc.
```

Another way to look at it is that for any transformer type that's not `StateT` itself, the `MonadState` instance handles calling the appropriate number of `lift`s for us.

With all that, you can see that we now have the capability to use `get` and `put` in whatever monad stack we want, as long as we specify that said monad stack has something to handle that statefulness somewhere in the hierarchy:

```
-- our functions now work in this stack...
foo :: StateT Int IO ()
foo = do
  state <- get
  put (state + 1)

-- ...as well as this one
bar :: MaybeT (StateT Int IO) ()
bar = do
  state <- get
  put (state + 1)

-- ...or even fully polymorphic
baz :: MonadState m Int => m ()
baz = do
  state <- get
  put (state + 1)
```

This solution is what's known as `mtl` style.^{2}
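Putting the pieces together, here is a self-contained sketch of this section's `mtl`-style class running `foo`, `bar`, and `baz` from above. It reuses `StateT`, `MaybeT`, and `lift` from the `transformers` package, and it includes a functional dependency `m -> s` (the real `mtl` class has one too) so that the stack alone determines the state type; the language extensions at the top are what GHC needs to accept the class and instances.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, FunctionalDependencies, UndecidableInstances #-}

import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Maybe (MaybeT (..))
import Control.Monad.Trans.State (StateT (..))

class Monad m => MonadState m s | m -> s where
  get :: m s
  put :: s -> m ()

-- StateT answers state requests itself.
instance Monad m => MonadState (StateT s m) s where
  get = StateT (\s -> pure (s, s))
  put s = StateT (\_ -> pure ((), s))

-- MaybeT forwards them one layer down with `lift`.
instance MonadState m s => MonadState (MaybeT m) s where
  get = lift get
  put s = lift (put s)

baz :: MonadState m Int => m ()
baz = do
  state <- get
  put (state + 1)

foo :: StateT Int IO ()
foo = baz

bar :: MaybeT (StateT Int IO) ()
bar = baz
```

Here `runStateT foo 1` should end with state `2`, and `runStateT (runMaybeT bar) 10` should return `(Just (), 11)`; `bar` needs no explicit `lift` even though the `StateT` sits one layer down.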

To fully make use of this style, we'd have to write similar typeclasses for all our other transformers as well, which can be a lot of work. But once we do, the syntactic noise of calling `lift` disappears; we can directly use any function from any transformer in our stack wherever we want. Whether doing all this implementation work purely to remove calls to `lift` is worth it is another question entirely, but there's no denying that this works.

Beyond just the syntactic convenience of not having to call `lift` anymore, there are a number of other benefits to using monad transformers like this compared to `transformers` style.

Say we had two functions, one that returned a `MaybeT IO a` and one that returned a `MaybeT (StateT Int IO) b`. Intuitively, it seems like we should be able to use these together, since the latter monad stack has strictly more functionality than the former. But if you actually try to do this, you'll need to do some painful finagling to convert from one type to the other; lifts won't cut it here, since the offending StateT is in the middle of the stack rather than the top.

This problem disappears in `mtl` style, since we never actually specify concrete types for our monad stack anywhere, just typeclasses describing what functionality we need. So if we have one function that uses just reader functions, and another that uses both state and reader functions, there's nothing stopping us from combining them.

```
justReader :: MonadReader m String => m ()
justReader = ...

readerAndState :: (MonadReader m String, MonadState m Int) => m ()
readerAndState = ...

-- we can use both in the same function!
combined :: (MonadReader m String, MonadState m Int) => m ()
combined = do
  justReader
  readerAndState
```

In fact, this elegantly deals with a number of other potential problems with using `transformers` style, like how `ReaderT cfg (StateT s IO) a` and `StateT s (ReaderT cfg IO) a` aren't the same type and can't be used together, despite being equivalent by inspection; in `mtl` style, the order that transformers are specified in doesn't matter when writing your logic, only when you eventually go to run it.
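To see the ordering point in practice, here is a sketch that uses the real `mtl` library (assumed to be installed; it is normally available alongside GHC). Note that `mtl`'s classes put the monad last, as in `MonadReader String m`, the reverse of the parameter order used in this post. The function bodies, the `Int` result of `justReader`, and the two `runAs...` helpers are hypothetical, not the author's code.

```haskell
{-# LANGUAGE FlexibleContexts #-}

import Control.Monad.Reader (MonadReader, ReaderT, ask, runReaderT)
import Control.Monad.State (MonadState, StateT, get, put, runStateT)

justReader :: MonadReader String m => m Int
justReader = do
  cfg <- ask
  pure (length cfg)

readerAndState :: (MonadReader String m, MonadState Int m) => m ()
readerAndState = do
  cfg <- ask
  n <- get
  put (n + length cfg)

combined :: (MonadReader String m, MonadState Int m) => m Int
combined = do
  readerAndState
  justReader

-- The same `combined`, run with StateT on top of ReaderT...
runAsStateOverReader :: String -> Int -> IO (Int, Int)
runAsStateOverReader cfg s = runReaderT (runStateT combined s) cfg

-- ...and with ReaderT on top of StateT. The order is picked only here.
runAsReaderOverState :: String -> Int -> IO (Int, Int)
runAsReaderOverState cfg s = runStateT (runReaderT combined cfg) s
```

Both `runAsStateOverReader "abc" 1` and `runAsReaderOverState "abc" 1` should return `(3, 4)`: the same `combined` runs unchanged in either stack order.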

Between `mtl` and `transformers`, which one is better? While we can see that `mtl` is definitely more syntactically convenient, it's actually a rather subtle question. Both libraries can be better in different situations. `mtl` allows you to avoid having to specify an order when writing code, and only choose it once you go to run it, which can matter when working with monads that allow for early exits. It's also just nicer to use. On the other hand, `transformers` can allow you to have more than one of the same transformer in the same stack; with `mtl`, if you want two different StateTs, both with an Int state parameter, you can't; there's no way to differentiate the constraints needed. This ambiguity doesn't exist in `transformers` style. Implementing `mtl` style (say, if you create your own monad transformers) is significantly more work as well, since every new transformer requires a typeclass, plus instances for every existing transformer, something known as the "$n^2$ instances" problem.
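For contrast, here is a sketch of the `transformers`-style version of that situation, using the `transformers` package's plain `StateT` functions: two `Int` states in one stack, where `lift` is what tells the layers apart. `twoCounters` is a hypothetical example name.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State (StateT, get, put, runStateT)

-- Two Int states in the same stack; no typeclass has to guess which one we mean.
twoCounters :: StateT Int (StateT Int IO) ()
twoCounters = do
  outer <- get -- the top StateT
  inner <- lift get -- the StateT one layer down
  put (outer + 1)
  lift (put (inner + 10))
```

Running `runStateT (runStateT twoCounters 0) 100` should give `(((), 1), 110)`: the outer counter goes from 0 to 1, the inner one from 100 to 110.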

One important difference is that `mtl` is often slower than `transformers`, despite providing the exact same functionality and using the exact same underlying transformer types. The reasons why are outside the scope of this post, but it's something to keep in mind if you're using monad transformers for performance-critical code.^{3}

In the next post, we'll go deeper into the *why* of monad transformers. We'll look at practical examples of combining transformers to solve real problems. We'll see how the resulting code is more than the sum of its parts, while still retaining the modularity typical of Haskell abstractions. Stay tuned!

Found this useful, or otherwise have comments or questions? Talk to me!


## Footnotes

^{1} I wish this footnote didn’t have to exist, but it does. Unfortunately, there are exceptions; there are *some* transformers that don’t form law-abiding monads, or have to be used in very specific ways. The most notorious of these is `ListT`. My impression is that people try to stay away from these kinds of transformers that don’t compose well.

^{2} More generally, this approach of using typeclass constraints/instances and making the type of a value completely polymorphic is known as “tagless final.” Tagless final can be seen as an inversion of control, where the behavior of the code is determined by usage sites, rather than by the code definition itself. It’s a more general technique than we’ve looked at here; it just happens to be useful for working with monad transformers as well.

^{3} The short explanation for why `mtl` is slower is that GHC generally doesn’t monomorphize code that calls typeclass functions; usually a record containing the type-specific typeclass functions is passed to the code at runtime instead. If you’re interested in the gory details, check out this great talk about the performance of various effect systems by Alexis King!