Are you frustrated trying to do any kind of string manipulation in Haskell? The same kind of interpolation or string building that would require zero thought in other languages seems to always turn into a mess of (<>)
s, mconcat
s, and show
s in vanilla Haskell. Throw in Text
and ByteString
and now we have to juggle collisions between different definitions of pack
, module imports, lazy versus strict versions of the same functions... not to mention having to manually handle Unicode conversions between Text and ByteString, even when 99% of the time you know it'll be UTF-8. In short, it just becomes a huge mess.
I got fed up with this situation and wrote string-interpolate
.
Here's how you use it:
{-# LANGUAGE QuasiQuotes #-}
import Data.String.Interpolate ( i )
import Data.Text
firstName :: String
= "Yuuko"
firstName
lastName :: String
= "Shirasawa"
lastName
message :: Text
= [i|Good morning, #{firstName} #{lastName}!|] message
Note how we can directly interpolate Strings into a Text without needing to do any explicit conversion; interpolation does what you expect, where textual types and Chars will be interpolated without the surrounding quotes.
You don't have to worry about lazy versus strict:
import qualified Data.Text as T
import qualified Data.Text.Lazy as LT
description :: T.Text
= "a very, very old strategy game"
description
payload :: LT.Text
= [i|{ 'description': '#{description}' }|] payload
Anything which implements Show
gets interpolated as you'd expect:
data Direction = Left | Right
deriving Show
> dir = Left
λ> [i|Heading: #{dir}|] :: String
λ>>> "Heading: Left"
Conversions between Text/String and ByteString automatically convert to/from UTF-8:
import Data.Text
import Data.ByteString
text :: Text
= "💩"
text
> [i|Bytes: #{text}|] :: ByteString
λ>>> "Bytes: \240\159\146\169"
In short, it makes building up strings much, much easier. It's also fast!
The rest of this post is about the internal architecture of string-interpolate; while the tricks used involving -XOverlappingInstances
and related type hackery should be old hat to advanced Haskellers, I haven't seen a case study of using them in practice.
While using string-interpolate is extremely simple, internally we have to somehow transform the quasiquotes that users give us into code that:
- handle taking in values of the various textual types to interpolate, and
- outputting every textual type
Since quasiquote handlers don't allow you to look at the type of identifiers, we can't actually emit some code conditionally; we have to spit out the same code no matter what types our users give us. Sounds like a job for typeclasses.
The simplest way to implement string-interpolate would just be to use Show
and IsString
together, since every textual type already implements both of these. So a quasiquote like:
[i|Username: #{username} | Updoots: #{updoots}|]
would emit code like:1
mconcat
"Username: "
[ show username)
, fromString (" | Updoots: "
, show updoots)
, fromString ( ]
While this immediately gets us to a working, convenient solution, it's painfully slow; using show
and toString
means we have to construct a String for every interpolated value and then convert it into our target type. Just as importantly, it doesn't handle Unicode strings correctly! If we took our emoji-containing Text above and tried to interpolate it into a ByteString, we'd end up just getting a string containing the Unicode code point as a sequence of digit characters. Ditto if we tried to interpolate the UTF-8 bytes into a Text, just with the individual bytes. Weirdly, interpolating a ByteString into another ByteString won't even give you the same string if that ByteString isn't ASCII, which certainly seems counterintuitive for any users.
Another idea would be to create our own Interpolatable
typeclass. Since we want the behavior to depend both on what we're interpolating in and what we want out, we'd have to take in two parameters:
{-# LANGUAGE MultiParamTypeClasses #-}
-- from `text-conversions'
import Data.Text.Conversions
-- from `utf8-string'
import qualified Data.ByteString.UTF8 as UTF8
class Interpolatable src dst where
interpolate :: src -> dst
instance Interpolatable Text Text where
= id
interpolate instance Interpolatable ByteString ByteString where
= id
interpolate instance Interpolatable Text ByteString where
= unUTF8 . convertText
interpolate instance Interpolatable ByteString Text where
-- emit replacement character � for anything
-- which isn't valid UTF-8
= UTF8.foldr
interpolate -> Data.Text.singleton char <> output)
(\char output ""
So that our quasiquote from earlier now emits something like:
mconcat
"Username: "
[
, interpolate username" | Updoots: "
,
, interpolate updoots ]
Now we're fast and handle Unicode properly, but it means that we're kind of limited in what we can interpolate. Do we really want to provide instances for, say, Int
, Integer
, Float
, Double
, Bool
, and so on? Not to mention that if someone wants to use their own type with our interpolation, they'll also have to write their own instances for Interpolatable. Wasn't our library supposed to be convenient?
It would be nice if we could simply use Show
as a default, and use the specific instances if we've defined one. What happens if we just try to add an instance for anything that's Show?
{-# LANGUAGE FlexibleInstances #-}
instance Show src => Interpolatable src Text where
= Data.Text.pack . show
interpolate instance Show src => Interpolatable src ByteString where
= unUTF8 . convertText . show interpolate
This seems to compile fine, but when we try it out, GHC starts complaining at us:
> text = "abc" :: Text
λ> (interpolate text) :: ByteString
λ
<interactive>:76:1: error:
Overlapping instances for Interpolatable Text ByteString
• of ‘interpolate’
arising from a use Matching instances:
instance [safe] Show src => Interpolatable src ByteString
-- Defined at <interactive>:70:10
instance [safe] Interpolatable Text ByteString
-- Defined at <interactive>:44:10
In the expression: interpolate text :: ByteString
• In an equation for ‘it’: it = interpolate text :: ByteString
Hmm. Well, that's not great.
What GHC is telling us is that it doesn't know which typeclass instance to use for our call to interpolate
; it could use the Interpolatable Text ByteString
instance, which is probably what we want. But the Interpolatable src ByteString
instance is also possible, since src
could unify with Text
.
We're able to describe in plain English what we want the compiler to do: only fall back to the Show
-based instance if we don't have something more specific. Is there a way to explain that to the compiler? Thankfully, yes!
If we mark our instances as overlapping, GHC will attempt to pick the most specific instance when there are multiple that might work. The exact instance matching rules are explained in detail here.
instance {-# OVERLAPS #-} Interpolatable Text ByteString where
= ...
interpolate instance {-# OVERLAPS #-} Show src => Interpolatable src ByteString where
= ...
interpolate
> text = "abc" :: Text
λ> (interpolate text) :: ByteString
λ>>> "abc"
> (interpolate True) :: ByteString
λ>>> "True"
Success!
In practice, I've found overlapping instances to be useful for exactly this sort of behavior: providing a "default" typeclass behavior that can be given specific overrides.
At this point we've sketched out the implementation of a fast, Unicode-aware interpolation backend; you could totally take this and spin off a new library with it. However, there's one last tweak we want to make to how this works for string-interpolate. Relying on the Monoid-based implementation to handle assembling the final output is okay in most cases, but would blow up if the interpolation gets really large. Both Text and ByteString have their own Builder
types that are designed for efficiently constructing instances of their associated textual types. Could we turn our interpolation fragments into that intermediate type first, and then use that to output the result, instead of having to directly convert our fragments into the target type?
As it turns out, yes, but here's where the type hacking starts to get really hairy.
We need to associate with each output type what its builder type is, which sounds like a job for type families. If you're not familiar with type families, all they are are functions that operate on type terms instead of data. So we can have a function that takes in the output type and returns the builder type. One nice thing is that there's a convenient syntax for doing this right in the typeclass instance itself:
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE AllowAmbiguousTypes #-}
import Data.ByteString
import qualified Data.ByteString.Lazy as LB
import qualified Data.ByteString.Builder as B
import Data.Text.Conversions
class InterpSink dst where
type SinkBuilder dst :: *
finalize :: SinkBuilder dst -> dst
combine :: SinkBuilder dst -> SinkBuilder dst -> SinkBuilder dst
-- For the verbatim parts of the interpolation quasiquote
literal :: String -> SinkBuilder dst
class InterpSink dst => Interpolatable src dst where
interpolate :: src -> SinkBuilder dst
instance {-# OVERLAPS #-} InterpSink ByteString where
type SinkBuilder ByteString = B.Builder
= LB.toStrict . B.toLazyByteString
finalize = (<>)
combine = B.lazyByteString . unUTF8 . convertText literal
Once again, though, we run into an issue when we try to define our "default" implementation:
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}
import Data.String
import Text.Show ( ShowS, showString )
instance {-# OVERLAPS #-} IsString dst => InterpSink dst where
type SinkBuilder dst = ShowS
= fromString . ($ "")
finalize = (.)
combine = showString
literal
-- <interactive>:33:8: error:
-- Conflicting family instance declarations:
-- SinkBuilder ByteString = B.Builder -- Defined at <interactive>:33:8
-- SinkBuilder dst = ShowS -- Defined at <interactive>:62:8
Unfortunately, type families don't play nice with overlaps. The problem is exactly as it was with our basic overlapping instances: there are two possibilities for what the type SinkBuilder ByteString
might be, and the compiler doesn't currently have the ability to decide between them. Here there doesn't seem to be any way to get around this restriction on overlapping instances. So instead, we have to figure out a way to make our instances not overlapping.
The approach I chose was to add another parameter to all the involved typeclasses purely for the purpose of making the instance heads not collide. So the typeclasses look like this:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
data SinkDefinition = Default | Specific
type family HasCustomSink dst where
HasCustomSink ByteString = 'Specific
HasCustomSink _ = 'Default
class (HasCustomSink dst ~ flag) => InterpSink (flag :: SinkDefinition) dst where
type SinkBuilder flag dst :: *
finalize :: SinkBuilder flag dst -> dst
combine :: SinkBuilder flag dst -> SinkBuilder flag dst -> SinkBuilder flag dst
literal :: String -> SinkBuilder flag dst
class (HasCustomSink dst ~ flag) => Interpolatable (flag :: SinkDefinition) src dst where
interpolate :: src -> SinkBuilder flag dst
And once that's set up, we can write our instances using the flag to make them distinct:
instance InterpSink 'Specific ByteString where
type SinkBuilder 'Specific ByteString = B.Builder
...
instance (IsString dst, HasCustomSink dst ~ 'Default)
=> InterpSink 'Default dst where
type SinkBuilder 'Default dst = ShowS
...
Since the flag parameter in the instances differ, a type can't match both of these at once, solving our overlap problem. The rest is then implementing all the instances for InterpSink
and Interpolatable
that we want to offer.
So... success! We have our working interpolation backend, using the most efficient builder types that we can for each textual type. Unfortunately, we gave up the ability for library users to override the Show/IsString default behavior for their own types, since there's no way to define a new instance of HasCustomSink
. If that was a priority, we could stick with our two-parameter version using overlapping instances and take the performance hit on more complicated interpolations.
Unfortunately, I haven't been able to figure out how to get string-interpolate to have all three of 1) usage of efficient intermediate types, 2) defaulting to Show/IsString, and 3) allowing library users to customise the behavior for their own types. Any two together seem doable, but getting all three seems not to be.
Have you run into obvious use cases that Haskell makes way harder than other languages? Do you like string-interpolate and find it useful? Talk to me!
Before you close that tab...
Want to write practical, production-ready Haskell? Tired of broken libraries, barebones documentation, and endless type-theory papers only a postdoc could understand? I want to help. Subscribe below and you'll get useful techniques for writing real, useful programs straight in your inbox.
Absolutely no spam, ever. I respect your email privacy. Unsubscribe anytime.
Footnotes
↥1 We’d still have to do some finagling to get rid of the surrounding quotes for textual types, but this approach is directionally correct.