Which type-safe database library should you use?

December 14, 2019

Beam versus Squeal: which one is better? Or maybe you’ve heard good things about Selda or Opaleye. Lots of options, not a lot of guidance.

To answer this question, I took 7 popular options for database libraries and implemented the same project and queries using each one.

The contestants:

Why use any of these libraries?

Presumably you, like me, are sold on the benefits of strong typing to ensure that you write better software. (If you’re not, let’s just pretend you are for now.) Your application presumably needs some sort of permanent storage. You could just use postgresql-simple for everything, but it’s a little embarassing to be writing raw SQL queries as strings and hoping that they work in a language that wants to do anything but.

Thankfully, there are lots of options in the Haskell ecosystem for type-safe SQL queries. You can make sure that you’re not forgetting to include columns in your output, or getting them in the wrong order. You can make sure that you’re joining tables on an ID for the same entity. You can even reuse queries easily by composing them directly in Haskell, then producing a single query to send to your DB backend. All with the type-checker helping you, making sure you don’t create invalid queries and runtime errors.

What’s the example project?

We’re creating the backend for a website for professional hitmen. Think Fiverr or Upwork, but for paid killers. Each hitman has a handler (handlers might handler multiple hitmen), and hitmen pursue “marks.” Once the job is done, hitmen mark their targets as “erased.” We’ll model this like so in our database. Adding an erased_marks entity doesn’t delete any pursuing_marks.

For the purposes of our exercise, we’re using Postgres as our database backend. While some of the libraries we’ll look at (like Beam) are fairly backend-agnostic and can be used for any database engine, others (like Opaleye and Squeal) only really target Postgres.

In order to serve up data from our backend, we’ll need some queries. Specifically, these queries, ranging from trivial ones to ones that exercise how the libraries handle joins, subqueries, and aggregates.

  • Get all the hitmen
  • Get all the hitmen that are pursuing active marks (i.e. marks that haven’t been erased yet)
  • Get all the marks that have been erased since a given date
  • Get all the marks that have been erased since a given date by a given hitman
  • Get the total bounty awarded to each hitman
  • Get the total bounty awarded to a specific hitman
  • Get each hitman’s latest kill
  • Get a specific hitman’s latest kill
  • Get all the active marks that have only a single pursuer
  • Get all the “marks of opportunity” (i.e. marks that a hitman erased without them marking the mark as being pursued first)

We also want to write updates and inserts using each library to see how they handle them. These should stretch the query capabilities of each library sufficiently to find rough spots.

If you’d like to skip straight to the code, here’s the repository containing implementations for each library.

With all that in mind, let’s jump into comparing.

Beam

Beam is an attempt to solve the type-safe SQL problem in a completely backend-agnostic way. The way it does this is by adding a type parameter to each query for the backend and having lots of typeclasses to specify functionality. Unfortunately, this gets really old, really fast, especially when you have to use typeclasses with names like HasSqlEqualityCheck backend Int64 and BeamSqlT071Backend backend, because god forbid you want to use BIGINTs. (You’ll need both of those on the same query, by the way.)

You can get around this by ignoring the backend-agnosticism entirely and specifying the specific backend in each query, but even then the types of your queries will be a mess of inscrutable QExprs, QAggs, and an ever-present s type parameter for query scoping.

Once you get past all of that, building and composing queries in Beam is actually fairly nice; subqueries can be reused very easily by using Beam’s monad instance for its queries, and joins (both inner and left) are one line away. It’s easy to define queries that returning ad-hoc tuples, without needing to define new datatypes. It’s just that the Beam types are obnoxious. You might be able to get around that with type aliases and clever type families, but at point… why bother, when some of the other library choices do the same thing without the fuss?

Some other red flags are that there’s no easy way to get distinct rows, so you’ll need to explicitly aggregate and group on all the columns you want. Beam also seems to type SUMs incorrectly, at least on Postgres; it doesn’t correctly change the column type. For instance, in Postgres, summing over a BIGINT column will give you back a NUMERIC result, but in Beam-land, if your datatype attribute is an Int64, summing it will still try to give you back an Int64, which causes a runtime error.

Overall I can’t recommend Beam; it’s too finicky.

Opaleye

Opaleye is a SQL DSL specifically for Postgres. Out of the box, it was the solution that “just worked” without major tweaking or having to completely ignore core parts of the library.

Writing queries works. Composing queries works. The types are (relatively) simple and easy to work with. Joins are a little bit painful, because left joins break type inference and require type annotations, but inner joins work fine.

One major distinction between Opaleye and other DB libraries is the way you define table schemas. All the schema definitions are done at the term level, in a way that’s mostly independent of your domain types themselves. Because of the way it’s set up (using product-profunctors), you can easily abstract out common columns. For instance, I used it to abstract out the definitions of created_at and updated_at timestamps. Further, Opaleye makes a distinction between write-time and read-time data, so it’s easy to, say, make it so certain columns can’t get written on inserts/updates (like the aforementioned timestamps).

Some red flags are that, like Beam, it has the same issue with sums and counts, where it incorrectly keeps the column type the same on aggregates and causes a runtime error. I fixed it in my queries using explicit casts, but it feels like something that shouldn’t be necessary. It’s especially egregious here since Opaleye only targets Postgres and should be aware of this situation. Other stumbling blocks are that it uses arrows instead of monads for its main abstraction, and uses product-profunctors everywhere. Thankfully, you should be able to get by without a deep knowledge of either; just treat arrows like funny-looking monads and add p3s where the documentation tells you to.

Overall Opaleye just works, and it’s my personal recommendation. It has a bit of an abstraction learning curve and takes a little getting used to, but I like it a lot.

Squeal

Squeal is a bit of an odd child; less of a convenient DSL, it’s meant to be a deep embedding of SQL in Haskell itself. So it’s much closer to writing actual SQL, and doesn’t try to abstract things away, down to having your SQL keywords appearing in the right place in your query.

This rigidity makes Squeal extremely painful to use in practice, as it uses types to enforce, for instance, that you put your WHERE clause after all the tables you’re joining from are brought into scope. Since Squeal uses a pure combinatorial approach, rather than the arrows/monads of other libraries, using it becomes a juggling exercise of nested parentheses and constantly hopping between different levels of nesting. Frankly, it feels like a mess.

Squeal also uses OverloadedLabels for selecting columns and tables, and goes even further than everything else on this list, in that it not only asks you to type your columns, but also keeps track of which names you used for each column. Which, admirable, but also extremely annoying when composing a subquery into another and you have to explicitly reselect the subquery results using the exact name.

This insistence on naming columns causes a lot of other problems as well. The way you return instances of your domain types is to explicitly name the columns of the query the same as the properties in your datatype, using SQL AS. There’s no easy way to just define the correspondence once and forget about it, which means that even if you’re just selecting all the entities out of a single table, you have to explicitly rename all the columns. Fun! Want to return an ad-hoc tuple of data from a quick one-off query? Sorry, can’t do that; tuples don’t have named fields, so how can you label your columns correctly? In fact, any time you want to return data in a new form, you’ll need to define a completely new datatype for that and rederive Squeal’s special typeclasses.

While working with Squeal, I felt like I never stopped running into stumbling blocks. Squeal’s query type has type parameters for both the query’s inputs and outputs, but there doesn’t seem to be a way to then pass an input parameter to a subquery? So you end up just having to duplicate query code. Sometimes using subqueries just… caused a runtime error for no discernable reason, despite type checking. I hope you never misspell a column either, or Squeal will drown you in a sea of inscrutable type errors.

Overall, Squeal feels like it succeeds at its goal of embedding SQL in Haskell, but fails to actually be able to describe common SQL patterns without breaking down. Not recommended.

Persistent + Esqueleto: Disqualified

Persistent is a thin persistence layer for doing simple CRUD operations. Esqueleto is a SQL DSL on top of Persistent that adds the ability to do joins and more complicated queries.

Unfortunately, even with its additional power, I wasn’t able to actually implement the full example project in Esqueleto. The support for subqueries seems to be somewhere between anemic and nonexistent; they’re only allowed as expressions to SELECT; you can’t join on them. Even when doing joins, you have to make sure to put your join conditions in reverse order of the tables, or your query will just fail at runtime. This is fixed in more recent versions of Esqueleto, but even in versions as recent as October of this year, you’ll still run into this.

Between the inability to express complex SQL queries, the amount of extensions and Template Haskell magic needed to use Persistent, and the crappy type errors, I don’t know why you would use Esqueleto.

Hasql: not like the others

While implementing this project, I realized that Hasql isn’t actually meant to be a full-fledged query/DB library, despite the fact that I’d heard about it in context of Haskell ORM choices. Instead, Hasql is an alternative to postgresql-simple; it’s a way to write and run raw SQL queries, optimized for speed. It’s not trying to compete with Beam and Opaleye; instead, it’s trying to be a backend for libraries like those to emit code to.

So as it turns out, Hasql wasn’t in the running in the first place for a type-safe DB library. I’ve still provided a reference implementation for our hitmen backend, in case you’re curious. I’ve omitted including the inserts and updates, because, well, I assume you know how to write SQL queries.

Groundhog: Disqualified

Unfortunately, just like with Esqueleto, I wasn’t able to implement the full example project in Groundhog. Groundhog fell down even faster than Esqueleto, as it doesn’t support joins.

Next, please.

Selda

Selda is another DSL that feels very much like a stripped-down version of Beam or Opaleye. Like Beam, it also aims to provide a backend-agnostic query interface. Unlike Beam, it doesn’t go to quite the same lengths to achieve that, and ends up being fairly pleasant to use.

One nice thing is that relative to the other options on this list, Selda doesn’t require too much advanced type machinery to work. It mostly “just works.”

However, there are still some red flags. One of the biggest is that Selda doesn’t seem to provide DB conversions for fairly common types. For instance, it’s missing typeclass instances for lazy Text, as well as Int32 and Int64; it doesn’t even provide an instance for Integer! So have fun trying to store a integer larger than 32 bits. Insertion/updates also feel like an afterthought, as Selda won’t even allow you to use SQL DEFAULT for columns other than autoincrementing primary keys.

Overall, I’d say that Selda is usable. If you’re not willing to bite the bullet for Opaleye’s learning curve, Selda is a passable alternative.


The verdict: Use Opaleye if you’re comfortable with arrows and profunctors, use Selda (or maybe Beam) otherwise.

None of the DB library options in the Haskell ecosystem are really at the level of a full ORM; you’re not going to find something to equal ActiveRecord or similar. Still, if you need your queries to be rock-solid, you’re in good hands.

Here’s a link to the repository containing implementations for each library. Each implementation also has a small executable driver to demonstrate how to run queries.

If you’ve gotten this far, why not give a library of your choice a shot? Try getting something from that repository up and running. Write a new query to see how the library feels to use. For instance, you could try getting all marks that required a long chase. Or, take one of the implementations as a template and start on a project of your own. If you’re thinking of trying Opaleye, Saurabh Nanda has written some excellent docs to get you started.

Found this useful? Still have questions? Talk to me!


You might also like


Before you close that tab...