A cheatsheet to regexes in Haskell

Posted on April 11, 2019

UPDATE: This cheatsheet is now part of the documentation for regex-tdfa!

While Haskell is great for writing parsers, sometimes the simplest solution is just to do some text munging with regular expressions. Whenever I find myself needing to do some simple pattern matching on strings, I always reach for regex-tdfa; it’s fast and supports all the directives I need. But whenever I do, I inevitably find myself having to reread the documentation all over again to figure out how to use anything. So here’s a cheatsheet for the most common use cases for regexes in Haskell.

Importing and using

Add regex-tdfa and regex-tdfa-text to the dependencies in your package.yaml/cabal file.

In modules where you need to use regexes:
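A minimal sketch of what those imports could look like, assuming the text package is also a dependency. The helper name firstLowercaseRun is made up for illustration; the second import brings in instances only, which is why its import list is empty.

```haskell
import qualified Data.Text as T
-- Matching operators (=~), (=~~) and helpers such as getAllTextMatches.
import Text.Regex.TDFA
-- Instances only: lets (=~) take and return Data.Text values.
import Text.Regex.TDFA.Text ()

-- Hypothetical helper: first run of lowercase letters in a Text value.
-- The pattern itself is a plain String; only the input and result are Text.
firstLowercaseRun :: T.Text -> T.Text
firstLowercaseRun input = input =~ "[a-z]+"
```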

The regex-tdfa package only lets you match on String/ByteString; hence the import of regex-tdfa-text.

Basics

(=~) and (=~~) are polymorphic in their return type. This is so that regex-tdfa can pick the most efficient way to give you your result based on what you need. For instance, if all you want is to check whether the regex matched or not, there’s no need to allocate a result string. If you only want the first match, rather than all the matches, then the matching engine can stop after finding a single hit.

This does mean, though, that you may sometimes have to explicitly specify the type you want, especially if you’re trying things out at the REPL.
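To make the return-type polymorphism concrete, here is a small sketch that runs the same pattern against the same input and only changes the requested type. The names emailRegex, input, matched, firstHit and maybeHit are my own, and the email pattern is deliberately simplistic. (=~~) is the monadic variant: it fails in the surrounding monad instead of returning an empty result.

```haskell
import Text.Regex.TDFA

-- A deliberately simple email pattern, for illustration only.
emailRegex :: String
emailRegex = "[a-zA-Z0-9+._-]+@[a-zA-Z-]+\\.[a-z]+"

input :: String
input = "my email is email@email.com"

-- Asking for a Bool only checks whether there is a match.
matched :: Bool
matched = input =~ emailRegex        -- True

-- Asking for a String returns the first match.
firstHit :: String
firstHit = input =~ emailRegex       -- "email@email.com"

-- (=~~) signals "no match" through the monad, here Maybe.
maybeHit :: Maybe String
maybeHit = input =~~ emailRegex      -- Just "email@email.com"
```

At the REPL the same thing is done with an inline annotation, for example `input =~ emailRegex :: Bool`.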

Common use cases

Get the first match
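A short sketch: ask for a String (or ByteString, or Text) and you get the first match back, or an empty string if nothing matched. The binding names are just for illustration.

```haskell
import Text.Regex.TDFA

-- First run of lowercase letters.
firstWord :: String
firstWord = "alexis-de-tocqueville" =~ "[a-z]+"   -- "alexis"

-- No digits in the input, so the result is the empty string.
noDigits :: String
noDigits = "alexis-de-tocqueville" =~ "[0-9]+"    -- ""
```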

Check if it matched at all
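A sketch of the Bool form, wrapped in a hypothetical helper called hasDigits:

```haskell
import Text.Regex.TDFA

-- True if the input contains at least one run of digits.
hasDigits :: String -> Bool
hasDigits input = input =~ "[0-9]+"
```

For example, `hasDigits "room 101"` is True and `hasDigits "alexis"` is False.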

Get first match + text before/after
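A sketch of the three-tuple form, which returns (text before, match, text after). If there is no match, the whole input comes back in the first element. The binding names are illustrative.

```haskell
import Text.Regex.TDFA

-- (before, match, after) around the first occurrence of "de".
aroundDe :: (String, String, String)
aroundDe = "alexis-de-tocqueville" =~ "de"
-- ("alexis-", "de", "-tocqueville")

-- No match: the input lands in the first slot, the rest is empty.
aroundKant :: (String, String, String)
aroundKant = "alexis-de-tocqueville" =~ "kant"
-- ("alexis-de-tocqueville", "", "")
```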

Get first match + submatches
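A sketch of the four-tuple form. It is the same as the three-tuple, plus a list holding just the captured groups; that list is empty if the regex does not match at all. The input string and binding name are made up for illustration.

```haskell
import Text.Regex.TDFA

-- (before, match, after, [submatches]) for an attribute selector.
attrParts :: (String, String, String, [String])
attrParts = "div[attr=1234]" =~ "div\\[([a-z]+)=([^]]+)\\]"
-- ("", "div[attr=1234]", "", ["attr", "1234"])
```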

Get all matches
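A sketch using getAllTextMatches, which unwraps the result into a list of every match. The binding name is illustrative.

```haskell
import Text.Regex.TDFA

-- Every run of lowercase letters in the input.
allNames :: [String]
allNames = getAllTextMatches ("john anne yifan" =~ "[a-z]+")
-- ["john", "anne", "yifan"]
```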

Special characters

regex-tdfa only supports a small set of special characters and is much less featureful than some other regex engines you might be used to, such as PCRE.

  • \` — Match start of entire text (similar to ^ in other regex engines)
  • \' — Match end of entire text (similar to $ in other regex engines)
  • \< — Match beginning of word
  • \> — Match end of word
  • \b — Match beginning or end of word
  • \B — Match neither beginning nor end of word

Less common stuff

Get match indices
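A sketch using getAllMatches, which returns an (offset, length) pair for each match instead of the matched text. The binding name is illustrative.

```haskell
import Text.Regex.TDFA

-- (offset, length) of every run of lowercase letters.
nameSpans :: [(Int, Int)]
nameSpans = getAllMatches ("john anne yifan" =~ "[a-z]+")
-- [(0,4), (5,4), (10,5)]
```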

Get submatch indices
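A sketch using getAllSubmatches. Note that the first element is the span of the entire match, not the first capture group; the captures follow it. The binding name is illustrative.

```haskell
import Text.Regex.TDFA

-- (offset, length) of the whole match, then of each capture group.
attrSpans :: [(Int, Int)]
attrSpans = getAllSubmatches ("div[attr=1234]" =~ "div\\[([a-z]+)=([^]]+)\\]")
-- [(0,14), (4,4), (9,4)]
```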

Replacement

Unfortunately, regex-tdfa doesn’t seem to provide functionality to do find-and-replace.
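If you only need something simple, you can roll your own on top of the (before, match, after) form. The function replaceAll below is a hypothetical helper of mine, not part of regex-tdfa, and it is only a rough sketch: it rebuilds the regex on every step, it gives up on empty matches, and anchors such as ^ are evaluated against the remaining suffix rather than the original string.

```haskell
import Text.Regex.TDFA

-- Hypothetical helper: replace every non-empty match of a regex
-- with a fixed replacement string (no backreferences).
replaceAll :: String -> String -> String -> String
replaceAll regex replacement = go
  where
    go input =
      case (input =~~ regex :: Maybe (String, String, String)) of
        -- No match left: keep the rest of the input untouched.
        Nothing -> input
        -- Empty match: stop here rather than loop forever.
        Just (_, "", _) -> input
        -- Splice in the replacement and continue after the match.
        Just (before, _, after) -> before ++ replacement ++ go after
```

For example, `replaceAll "[0-9]+" "N" "a1b22c333"` gives "aNbNcN".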

Avoiding backslashes

If you find yourself writing a lot of regexes, take a look at raw-strings-qq. It’ll let you write regexes without needing to escape all your backslashes.
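A sketch of what that looks like, assuming raw-strings-qq is in your dependencies. The r quasi-quoter comes from Text.RawString.QQ; the binding name is illustrative.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Text.RawString.QQ (r)
import Text.Regex.TDFA

-- With a normal string literal this pattern would be "\\([^)]+\\)".
parenthesized :: String
parenthesized = "2 * (3 + 1) / 4" =~ [r|\([^)]+\)|]
-- "(3 + 1)"
```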


If you find that you need to do something more complicated with your text, it may be that you’re trying to use the wrong tool; take a look at using parser combinators instead. For parsing human-generated files, take a look at megaparsec. If you need maximum speed for over-the-wire formats, attoparsec is probably what you’re looking for.
