Figuring out how to use Beam for DB migrations

September 27, 2019
« Previous post   Next post »

So you've started writing type-safe database queries using Beam or another Haskell database library. No more writing raw SQL strings into your code and praying that your integration tests don't let things slip through the cracks. But maybe having just your queries be type-safe isn't good enough for you. Long-lived application are going to have to change their database schema eventually. Can you manage those changes in Haskell as well? Make them type-safe and reliable? As it turns out, you can, with Beam's built-in migration functionality.

If you don't know about it, Beam is one of a number of Haskell libraries for providing type-safe, composable database access. Beam also provides migration functionality for managing your database schema. Unfortunately, there's basically no documentation about how to use it. Rest assured, it is completely functional, if difficult to start using.

This post collects everything I learned while trying to get up and running with Beam's migrations, and should help you get started as well.

There are actually a lot of different Haskell tools for managing migrations directly in pure Haskell, from the most barebones like postgresql-simple-migration and drifter up to the more heavyweight migration frameworks provided by Beam, Squeal, and Hasql. Some of these options will only work for specific DB backends, predominantly Postgres. If you're looking to make a decision between the various options, this post won't go into detailed comparisons between any of these options, but should give you enough information about Beam's migrations to allow you to make your own comparison.

Before we dive into exploring the migration framework proper, we'll need an example schema that we're interested in. Let's assume we're working with Postgres. We'll need the beam-core, beam-migrate, and beam-postgres packages. These examples were tested against beam-migrate-

Since we're working with migrations, we'll build up the schema step by step. Let's say that we're trying to set up a database to store orders for a flower shop. For now, we only care about storing information about each flower we stock, and what quantity of each flower is in each order. So we'll have a table for flowers, a table for orders, and a many-to-many join table between orders and flowers.

When writing migrations in Beam, you first need Haskell datatypes for your entities and database. So let's write those.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE TypeFamilies #-}

import Data.Int ( Int32, Int64 )
import Data.Text ( Text )
import Data.Scientific ( Scientific )
import Data.Time ( UTCTime )

import Database.Beam

data FlowerT f = Flower
  { flowerID :: Columnar f Int32
  , flowerName :: Columnar f Text
  , flowerPrice :: Columnar f Scientific
  deriving (Generic, Beamable)

data OrderT f = Order
  { orderID :: Columnar f Int32
  , orderTime :: Columnar f UTCTime
  deriving (Generic, Beamable)

data LineItemT f = LineItem
  { lineItemOrderID :: PrimaryKey OrderT f
  , lineItemFlowerID :: PrimaryKey FlowerT f
  , lineItemQuantity :: Columnar f Int64
  deriving (Generic, Beamable)

data FlowerDB f = FlowerDB
  { dbFlowers :: f (TableEntity FlowerT)
  , dbOrders :: f (TableEntity OrderT)
  , dbLineItems :: f (TableEntity LineItemT)
  deriving (Generic, Database be)

instance Table FlowerT where
  data PrimaryKey FlowerT f = FlowerID (Columnar f Int32)
    deriving (Generic, Beamable)
  primaryKey = FlowerID . flowerID

instance Table OrderT where
  data PrimaryKey OrderT f = OrderID (Columnar f Int32)
    deriving (Generic, Beamable)
  primaryKey = OrderID . orderID

instance Table LineItemT where
  data PrimaryKey LineItemT f =
    LineItemID (PrimaryKey OrderT f) (PrimaryKey FlowerT f)
    deriving (Generic, Beamable)
  primaryKey = LineItemID
    <$> lineItemOrderID
    <*> lineItemFlowerID

Unfortunately, Beam's migration framework seems to require these types to be written first, because they need to be used in the migration code as well. It would be nice to be able to autogenerate the datatype code as well, but alas.

Once we have this, we can write our first migration. All the major functions you'll need to actually write migrations are in Database.Beam.Migrate.SQL.Tables.

{-# LANGUAGE OverloadedStrings #-}

import Database.Beam.Migrate
import Database.Beam.Backend
import Database.Beam.Postgres

-- It's unfortunate that we have to define this ourselves.
utctime :: BeamSqlBackend be => DataType be UTCTime
utctime = DataType (timestampType Nothing True)

initialSetup :: Migration Postgres
  (CheckedDatabaseSettings Postgres FlowerDB)
initialSetup = FlowerDB
  <$> (createTable "flowers" $ Flower
        { flowerID = field "id"
            int notNull unique
        , flowerName = field "name"
            (nationalVarchar (Just 20)) notNull unique
        , flowerPrice = field "price"
            (numeric (Just (20, Just (2)))) notNull
  <*> (createTable "orders" $ Order
        { orderID = field "id"
            int notNull unique
        , orderTime = field "order_time"
            utctime notNull (defaultTo_ (cast_ currentTimestamp_ utctime))
  <*> (createTable "line_items" $ LineItem
        { lineItemOrderID = OrderID $
            field "order_id" int notNull
        , lineItemFlowerID = FlowerID $
            field "flower_id" int notNull
        , lineItemQuantity = field "quantity"
            bigint notNull

initialSetupStep :: MigrationSteps Postgres
  (CheckedDatabaseSettings Postgres FlowerDB)
initialSetupStep = migrationStep
  (const initialSetup)

Here we can see why we needed to write the datatypes before writing the migration itself; Beam uses the "shape" of the datatypes to store information about which field/table name corresponds to which attribute/record type. This pattern should be familiar if you've already used Beam's query interface, where Beam uses the polymorphism of the datatype fields to store either literal data or expressions to compute said data. Here, it's used to store table mapping information.

The various database types that can be used for columns are defined in Database.Beam.Query.DataTypes.

With this, we have enough to actually run this migration using the simple implementation Beam has defined in Database.Beam.Migrate.Simple.

import Control.Monad.Fail ( MonadFail )
import Database.Beam.Migrate.Simple
import qualified Database.Beam.Postgres.Migrate as PG

-- Beam's simple runner doesn't run destructive migrations
-- by default, so we have to override that.
allowDestructive :: (Monad m, MonadFail m) => BringUpToDateHooks m
allowDestructive = defaultUpToDateHooks
  { runIrreversibleHook = pure True }

migrateDB :: Connection
          -> IO (Maybe (CheckedDatabaseSettings Postgres FlowerDB))
migrateDB conn = runBeamPostgresDebug putStrLn conn $

Once we have our working migration, we can also produce the DatabaseSettings that we need to actually run any queries. CheckedDatabaseSettings contains all the information we need, we just have to throw some of it away.

flowerDB :: DatabaseSettings Postgres FlowerDB
flowerDB = unCheckDatabase $ evaluateDatabase initialSetupStep

exampleQuery :: Connection -> IO [FlowerT Identity]
exampleQuery conn = runBeamPostgres conn $
  runSelectReturningList $
    select (all_ (dbFlowers flowerDB))

If you've used Beam before, you might notice that we don't need to handwrite DatabaseSettings to query the database, since we can just derive it from our migration. Which is... nice, except that we still have to define our custom datatypes before we can write our migrations.

We've got our initial database setup done, but the point of a migration framework is to manage changes over time. What kinds of things might change in our quaint little flower shop?

Perhaps the shop owner comes to us and says that they also want to display scientific names for their flowers alongside the common names. So we'll need an extra column in our database, and we'll want to rename our original name column to avoid confusion.

The idea is simple enough, but how do we implement it in Beam?

Once you start actually changing the database schema, Beam's preferred way of representing those changes is as functions from one CheckedDatabaseSettings to another. To see what I mean, we'll change the name of the original name column first.

import Control.Arrow

  :: CheckedDatabaseSettings Postgres FlowerDB
  -> Migration Postgres (CheckedDatabaseSettings Postgres FlowerDB)
changeCommonName settings = do
  withColumnChange <- alterTable (dbFlowers settings) $
    \dbFlowers' -> do
      commonName <- renameColumnTo "common_name" (flowerName dbFlowers')
      pure $ dbFlowers' { flowerName = commonName }
  pure $ settings { dbFlowers = withColumnChange }

changeCommonNameStep :: MigrationSteps Postgres
  (CheckedDatabaseSettings Postgres FlowerDB)
  (CheckedDatabaseSettings Postgres FlowerDB)
changeCommonNameStep = migrationStep

fullMigration :: MigrationSteps Postgres
  (CheckedDatabaseSettings Postgres FlowerDB)
fullMigration = initialSetupStep >>> changeCommonNameStep

You can see that changeCommonName is the actual code, which we lift into a MigrationStep so that we can combine it with the setup step and run them. This is the basic pattern for each migration you'll write: create a function that takes in an existing schema and returns a Migration, add the migration code in its body, and then lift and combine it.1

So far so good. Unfortunately, adding the scientific name column is a little harder. After all, our FlowerT datatype doesn't have an attribute for it. We can't add it to our existing datatype, since that would mess up our existing migrations; they wouldn't have a database field to put into that attrtibute! We'll need to create an entirely new datatype, which also means creating a new database type to hold our new table type.

data FlowerT' f = Flower'
  { flowerID' :: Columnar f Int32
  , flowerCommonName' :: Columnar f Text
  , flowerScientificName' :: Columnar f Text
  , flowerPrice' :: Columnar f Scientific
  deriving (Generic, Beamable)

data FlowerDB' f = FlowerDB'
  { dbFlowers' :: f (TableEntity FlowerT')
  , dbOrders' :: f (TableEntity OrderT)
  , dbLineItems' :: f (TableEntity LineItemT)
  deriving (Generic, Database be)

instance Table FlowerT' where
  data PrimaryKey FlowerT' f = FlowerID' (Columnar f Int32)
    deriving (Generic, Beamable)
  primaryKey = FlowerID' . flowerID'

At this point you can probably see why Beam's migration tooling can get very annoying; if we want to actually use the migration incrementally, we'll have to keep creating new types, and keep updating which DB type is our most current one throughout our code. And we still haven't written our migration yet.

retable :: Table tbl
        => CheckedDatabaseEntity Postgres db (TableEntity tbl)
        -> Migration Postgres (CheckedDatabaseEntity Postgres db' (TableEntity tbl))
retable t = alterTable t pure

  :: CheckedDatabaseSettings Postgres FlowerDB
  -> Migration Postgres (CheckedDatabaseSettings Postgres FlowerDB')
addScientificName settings = do
  withColumnChange <- alterTable (dbFlowers settings) $
    \dbFlowers' -> do
      scientificName <- addColumn (field "scientific_name"
        (nationalVarchar (Just 20)) unique)
      pure $ Flower'
        { flowerID' = flowerID dbFlowers'
        , flowerCommonName' = flowerName dbFlowers'
        , flowerScientificName' = scientificName
        , flowerPrice' = flowerPrice dbFlowers'
  orders' <- retable (dbOrders settings)
  lineItems' <- retable (dbLineItems settings)
  pure $ FlowerDB'
    { dbFlowers' = withColumnChange
    , dbOrders' = orders'
    , dbLineItems' = lineItems'

addScientificNameStep :: MigrationSteps Postgres
  (CheckedDatabaseSettings Postgres FlowerDB)
  (CheckedDatabaseSettings Postgres FlowerDB')
addScientificNameStep = migrationStep

fullMigration :: MigrationSteps Postgres
  (CheckedDatabaseSettings Postgres FlowerDB')
fullMigration = initialSetupStep
  >>> changeCommonNameStep
  >>> addScientificNameStep

We can't even reuse the tables from the original FlowerDB directly; we have to write our own retable function to get the type tetris to line up correctly. Frankly, once you have to, y'know, do actual migrations using Beam, it gets incredibly unwieldy.

Let's say there's one last change that we want to make. We want to reward loyal customers, so we want to give frequent customers a loyalty account, and add points for each order they make. For simplicity's sake, we'll just say there's a single point per order.

I'm going to skip the datatype updates necessary; they're pretty mechanical and not very interesting. We'll have two new datatypes, OrderT' and LoyaltyAccountT, and a new DB type, FlowerDB''. (I never said Beam's migration framework was good.) The only thing of note is that the loyalty ID needs to be of type PrimaryKey LoyaltyAccount (Nullable f), since not every order made might have an associated loyalty account; after all, not everyone will sign up before buying something.

  :: CheckedDatabaseSettings Postgres FlowerDB'
  -> Migration Postgres (CheckedDatabaseSettings Postgres FlowerDB'')
addLoyaltyAccounts settings = do
  withLoyaltyID <- alterTable (dbOrders' settings) $
    \dbOrders' -> do
      loyaltyID <- addColumn (field "loyalty_id" (maybeType int))
      pure $ Order'
        { orderID' = orderID dbOrders'
        , orderLoyaltyID' = LoyaltyAccountID loyaltyID
        , orderTime' = orderTime dbOrders'
  loyaltyAccounts <- createTable "loyalty_accounts" $
      { loyaltyAccountID = field "id"
          int notNull unique
      , loyaltyAccountCustomerName = field "customer_name"
          (nationalVarchar (Just 30)) notNull
  flowers'' <- retable (dbFlowers' settings)
  lineItems'' <- retable (dbLineItems' settings)
  pure $ FlowerDB''
    { dbFlowers'' = flowers''
    , dbOrders'' = withLoyaltyID
    , dbLineItems'' = lineItems''
    , dbLoyaltyAccounts'' = loyaltyAccounts

addLoyaltyAccountsStep :: MigrationSteps Postgres
  (CheckedDatabaseSettings Postgres FlowerDB')
  (CheckedDatabaseSettings Postgres FlowerDB'')
addLoyaltyAccountsStep = migrationStep

fullMigration :: MigrationSteps Postgres
  (CheckedDatabaseSettings Postgres FlowerDB'')
fullMigration = initialSetupStep
  >>> changeCommonNameStep
  >>> addScientificNameStep
  >>> addLoyaltyAccountsStep

And with that, we have a few examples of doing the most common schema updates. But phew, these migrations are rather a mouthful for how little we're actually doing in them.

If you're currently evaluating different options for managing your database schema, hopefully this post has helped you form your own opinion of Beam. My own personal opinion is that Beam's current migration tooling isn't very good, and I'll probably either stick to doing migrations manually or look into some other tools.

The full migration code in this post is available here, for reference.

Found this useful? Got a comment to make? Talk to me!

« Previous post   Next post »

Before you close that tab...


↥1 One footgun to be aware of is that Beam expects the columns and tables created in migrations to be used in a linear way; that is, once you rename a column or drop a table, you shouldn’t use the old value anymore. Since Haskell doesn’t have linear types yet, doing this resource management is on you. For instance, we created the commonName column by renaming the old column, passing in flowerName dbFlowers'. But what if we accidentally reused the old column instead of returning the updated table with commonName inside?

    \dbFlowers' -> do
      commonName <- renameColumnTo "common_name" (flowerName dbFlowers')
      pure $ dbFlowers' { flowerName = flowerName dbFlowers' }

This would still typecheck, but would produce a migration that wouldn’t do anything.

You might also like