Abstracting out common columns in Opaleye

December 28, 2019

« Previous post Next post »

As a follow-up to my last post on which type-safe DB library to use, let's go over something that might have caught your attention about Opaleye. Unlike a lot of the other library choices available, Opaleye makes it possible to abstract out common columns, cutting down on having to redefine table mapping code. How?

Let's say we have the following datatypes, set up the way Opaleye likes them:

data UserT a b c d e = User
  { userID :: a
  , userName :: b
  , userEmail :: c
  , userCreatedAt :: d
  , userUpdatedAt :: e
  }

data OrderT a b c d = Order
  { orderID :: a
  , orderTotal :: b
  , orderCreatedAt :: c
  , orderUpdatedAt :: d
  }

$(makeAdaptorAndInstance "pUser" ''UserT)
$(makeAdaptorAndInstance "pOrder" ''OrderT)

type User = UserT Int32 Text Text UTCTime UTCTime
type UserF = UserT
  (Field SqlInt4)
  (Field SqlText)
  (Field SqlText)
  (Field SqlTimestamptz)
  (Field SqlTimestamptz)

type Order = OrderT Int32 Double UTCTime UTCTime
type OrderF = OrderT
  (Field SqlInt4)
  (Field SqlFloat8)
  (Field SqlTimestamptz)
  (Field SqlTimestamptz)

as well as the following Opaleye table mappings:

users :: Table UserF UserF
users = table "users" $
  pUser User
    { userID = required "id"
    , userName = required "username"
    , userEmail = required "email"
    , userCreatedAt = required "created_at"
    , userUpdatedAt = required "updated_at"
    }

orders :: Table OrderF OrderF
orders = table "orders" $
  pOrder Order
    { orderID = required "id"
    , orderTotal = required "total"
    , orderCreatedAt = required "created_at"
    , orderUpdatedAt = required "updated_at"
    }

This works. It's a little redundant though. If you're like me, having to rewrite the exact same code for the created_at and updated_at columns hurts your heart. Even the id column could potentially be factored out. Could we do better?

Fortunately, Opaleye already gives you enough functionality to do so, as it bases its table definitions around product-profunctors. I can't claim to fully understand the abstraction myself, but the gist of it in this case is that it allows you to take smaller components, like individual column mappings or groups of columns, and lay them out "side-by-side" to create a combined table mapping. For instance, we can create the mappings for just our timestamps like so:

timestamps :: TableFields
  (Field SqlTimestamptz, Field SqlTimestamptz)
  (Field SqlTimestamptz, Field SqlTimestamptz)
timestamps = p2 ( required "created_at", required "updated_at" )

After removing these fields from our datatypes, we can then redefine our tables like so:

users :: Table
  (UserF, (Field SqlTimestamptz, Field SqlTimestamptz))
  (UserF, (Field SqlTimestamptz, Field SqlTimestamptz))
users = table "users" $
  p2 ( pUser User
         { userID = required "id"
         , userName = required "username"
         , userEmail = required "email"
         }
     , timestamps
     )

orders :: Table
  (OrderF, (Field SqlTimestamptz, Field SqlTimestamptz))
  (OrderF, (Field SqlTimestamptz, Field SqlTimestamptz))
orders = table "orders" $
  p2 ( pOrder Order
         { orderID = required "id"
         , orderTotal = required "total"
         }
     , timestamps
     )

We've removed a source of repetiveness and potential errors. Success!

Notice that the types of our tables have changed. Now when we write queries or do inserts, we'll get back a tuple with our datatype inside. There's not really a way to avoid this, since it's not like this is Ruby or JavaScript; we can't dynamically add new fields to our types after they've been defined. Your queries will have to be adjusted accordingly.

We're not done abstracting this out yet. There are two issues that we can see from this definition:

Having to match on a tuple every time we query is sort of annoying. What if we end up with lots of common columns that we want to abstract? Then every query will start becoming a mess of underscores when we don't need everything:
```
someQuery :: Select (F SqlText)
someQuery = proc () -> do
  (user, _, _, _, _) <- selectTable users -< ()
  returnA -< userName user
```
Truthfully, we don't want developers to need to provide values for these timestamps every time they run an insert or update. In fact, for the most part we don't want them to be able to manually change them at all. We'd rather have Opaleye just pass DEFAULT as the value for updated_at on every update, for instance.

Avoiding tuple blindness

Using a tuple for this isn't just annoying, it's also somewhat dangerous, since there's no information about which part of the tuple signifies what information. What if you mixed up the order of updated_at and created_at in your pattern match?

Thankfully, Opaleye's table definitions are flexible. One thing you could do is to create a single entity type that "wraps" common fields. Then you could create type aliases for this for your different types.

data EntityT a b c d = EntityT
  { entityValue :: a
  , entityCreatedAt :: b
  , entityUpdatedAt :: c
  , entityOwner :: d
  }

$(makeAdaptorAndInstance "pEntity" ''EntityT)

type User = EntityT (UserT Int32 Text Text) UTCTime UTCTime Text
type Order = EntityT (OrderT Int32 Double) UTCTime UTCTime ()

type UserF = EntityT ...
type OrderF = EntityT ...

Once we define a table mapping for this, we now have the best of both worlds: our common fields are factored out, and we no longer have to pattern match on fields we don't want to use. We also have meaningful accessors that tell us what each piece of information signifies.

Note the use of unit to stub out the owner field on orders. Maybe lots of the entities in our system have an internal admin who is responsible for triaging bugs or tickets arising from said entity, but orders don't. This way we can accomodate even columns that are only "mostly common," but which also don't seem like they belong with the entity itself.

Once we have all this, we just need to create a way to "lift" column mappings for our "base" entity types into our Entity wrapper with everything stubbed out:

import Data.Function ( (&) )

wrapEntityMapping :: TableFields a b
                  -> EntityT (TableFields a b)
                             (TableFields () ())
                             (TableFields () ())
                             (TableFields () ())
wrapEntityMapping mapping = EntityT
  { entityValue = mapping
  , entityCreatedAt = pure ()
  , entityUpdatedAt = pure ()
  , entityOwner = pure ()
  }

withTimestamps :: EntityT a (TableFields () ()) (TableFields () ()) b
               -> EntityT a
                          (TableFields (Field SqlTimestamptz)
                                       (Field SqlTimestamptz))
                          (TableFields (Field SqlTimestamptz)
                                       (Field SqlTimestamptz))
                          b
withTimestamps mapping = mapping
  { entityCreatedAt = required "created_at"
  , entityUpdatedAt = required "updated_at"
  }

withOwner :: EntityT a b c (TableFields () ())
          -> EntityT a b c (TableFields (Field SqlText) (Field SqlText))
withOwner mapping = mapping
  { entityOwner = required "owner"
  }

users :: Table UserF UserF
users = table "users" $ pEntity $
  wrapEntityMapping
    (pUser User
      { userID = required "id"
      , userName = required "username"
      , userEmail = required "email"
      })
    & withTimestamps
    & withOwner

orders :: Table OrderF OrderF
orders = table "orders" $ pEntity $
  wrapEntityMapping
    (pOrder Order
      { orderID = required "id"
      , orderTotal = required "total"
      })
  & withTimestamps

You could go even further and write lenses for Entity to make it even more convenient to use.

Restricting updates/inserts

You might have noticed that the type signatures of our tables duplicate their parameters. You might reasonably wonder what this duplication is for.

As it just so happens, the fact that the Table type has two parameters instead of just one is the key to resolving our update situation.

The first parameter is what data you need to provide to insert into that table. The second parameter is what data you get back when querying.¹ I'll refer to these as the write type and read type.

Up until now, we've been using required to define the column names. As you might expect, this signifies to Opaleye that we have to provide a value for that column on writes. Opaleye provides a similar function called optional that leaves the read type of said column the same as required, but allows the write type to be Maybe. Updating our timestamps function gives us:

type UserWrite = EntityT (UserT (Field SqlInt4) (Field SqlText) (Field SqlText))
                         (Maybe (Field SqlTimestamptz))
                         (Maybe (Field SqlTimestamptz))
                         (Field SqlText)

withTimestamps :: EntityT a (TableFields () ()) (TableFields () ()) b
               -> EntityT a
                          (TableFields (Maybe (Field SqlTimestamptz))
                                       (Field SqlTimestamptz))
                          (TableFields (Maybe (Field (SqlTimestamptz)))
                                       (Field SqlTimestamptz))
                          b
withTimestamps mapping = mapping
  { entityCreatedAt = optional "created_at"
  , entityUpdatedAt = optional "updated_at"
  }

users :: Table UserWrite UserF
users = {- unchanged -}

Passing a Nothing on writes will result in Opaleye passing DEFAULT to the database, while still allowing you to pass through the created_at unchanged on an update.

We can go one step further and prevent a value from being passed for updated_at at all, by taking advantage of the fact that Table and TableFields are both instances of Profunctor.

import Data.Profunctor

withTimestamps :: EntityT a (TableFields () ()) (TableFields () ()) b
               -> EntityT a
                          (TableFields (Maybe (Field SqlTimestamptz))
                                       (Field SqlTimestamptz))
                          (TableFields ()
                                       (Field SqlTimestamptz))
                          b
withTimestamps mapping = mapping
  { entityCreatedAt = optional "created_at"
  , entityUpdatedAt = lmap (\() -> Nothing) (optional "updated_at")
  }

Now that the write type of the updated_at column has been stubbed out to unit, we don't have to worry about it at all on writes; assuming a DEFAULT value has been specified in our database, it'll automatically update for us.

Wrapping up

After working through all these problems and abstracting them out, reflecting on how we did it reveals that we were just constructing normal functions and values; outside of using optional, we didn't have to lean on any Opaleye-specific functionality at all. All we did was build up larger table mappings from smaller table mappings.

This, I think, is part of the beauty of Haskell; everything is built from the same basic components, and if you can grasp a small amount of extra concepts (in this case, how product-profunctors fit together) you already have everything you need to contort libraries in ways their authors didn't expect.

Found this useful? Still have questions? Talk to me!

« Previous post Next post »

Want to write practical, production-ready Haskell? Tired of broken libraries, barebones documentation, and endless type-theory papers only a postdoc could understand? I want to help. Subscribe below and you'll get useful techniques for writing real, useful programs straight in your inbox.

Footnotes

^↥1 This also makes it possible to make a table read-only on the Haskell side. Use the Profunctor instance on Table to make the write type Void.

import Data.Void
import Data.Profunctor

readOnlyTable :: Table a b -> Table Void b
readOnlyTable = lmap absurd

Abstracting out common columns in Opaleye

Avoiding tuple blindness

Restricting updates/inserts

Wrapping up

You might also like

Before you close that tab...

Footnotes