Abstract
Functional programmers often work with trees; for explanatory or debugging purposes, it is useful to be able to visualize them. I explain one such algorithm for drawing m-ary trees, the Reingold-Tilford algorithm, which runs in $O(n)$ time. Animations showing the core ideas of the algorithm are provided. I present a reimplementation of the algorithm in Haskell, showing how to implement the algorithm without complicated pointer manipulations, while still preserving the time bound.
In *Tidier Drawings of Trees* (1981), Reingold and Tilford introduce an algorithm for drawing aesthetically pleasing trees. They define specific aesthetics that their algorithm upholds:
- Nodes at the same depth in the tree should be drawn at the same vertical level in the output.
- A left child should be drawn to the left of its parent, and the right child to the right of the parent.
- A parent should be centered over its children.
- A subtree should be drawn the same way regardless of where it appears in the tree.
For reference, here's what a tree drawn by the Reingold-Tilford algorithm looks like:
Drawing a tree distills down to assigning coordinates to each node in the tree such that the coordinates don't overlap, while satisfying the aesthetics defined above. We first look at the simpler case of drawing binary trees, then extend the algorithm to m-ary trees later. Assigning $y$ coordinates is easy, since for each node we assign its depth to the $y$ coordinate. For the $x$ coordinates, the idea is to recursively position the subtrees at each node. At each node, we place the roots of the left and right subtrees on top of each other, then recurse down the contours of the subtrees, pushing the subtrees apart whenever we find that they overlap. Once we finish scanning the contours, the subtrees are as close as they can be without overlapping, so we place the current node as their parent, centered between them.
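To make the easy half concrete, here is a minimal sketch of the $y$ assignment, using a plain binary tree type of my own (the type and names are illustrative, not from the paper):

```haskell
import Data.Int (Int64)

-- A plain binary tree; the type and names here are illustrative.
data Tree a = Leaf | Node a (Tree a) (Tree a)

-- The y coordinate of every node is just its depth, assigned in one
-- trivial top-down pass.
assignY :: Int64 -> Tree a -> Tree (a, Int64)
assignY _ Leaf = Leaf
assignY depth (Node x l r) =
  Node (x, depth) (assignY (depth + 1) l) (assignY (depth + 1) r)
```

All the real work is in the $x$ coordinates.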
A contour of a tree is all the rightmost or leftmost nodes at each level in the tree. For instance, in the tree below, the nodes in the left contour are marked in red:
More formally, the left contour of a tree $T$ with height $h$ is the sequence of nodes $\langle l_0, l_1, \ldots, l_h \rangle$ such that $l_i$ is the leftmost node of depth $i$ in the tree. The right contour is defined symmetrically. Note that this means the contours of a tree are always the same length as the height of the tree, plus one.
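As a hedged illustration of this definition (not how the algorithm actually computes contours), we can read it off directly: list the nodes level by level and take the leftmost of each level. The `Tree` type is the illustrative one from the earlier sketch.

```haskell
data Tree a = Leaf | Node a (Tree a) (Tree a)

-- All nodes, grouped by level, left to right.
levels :: Tree a -> [[a]]
levels Leaf = []
levels (Node x l r) = [x] : merge (levels l) (levels r)
  where
    -- combine the subtrees' levels, keeping left-subtree nodes first
    merge (as:ass) (bs:bss) = (as ++ bs) : merge ass bss
    merge ass      []       = ass
    merge []       bss      = bss

-- The leftmost node at each level; note the result has exactly
-- height-plus-one elements, one per level.
leftContour :: Tree a -> [a]
leftContour = map head . levels
```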
Reingold and Tilford split the assignment of coordinates into two steps: first, we calculate the horizontal distance from each node to either of its children; second, "petrify" the tree by calculating the actual coordinates based on each node's path from the root. Calculating the horizontal offsets works by a postorder traversal; at each node, we (1) recursively position and construct the contours for our left and right subtree, (2) figure out how far apart we should space the subtrees (and hence, the offset at the current node) by recursing in lockstep down the right contour of the left subtree and the left contour of the right subtree, (3) construct the contours of the tree rooted at the current node.
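The second, "petrify" step is the simpler of the two. Here's a minimal sketch of it, under the assumption that the first step has already annotated each node with the (symmetric) horizontal offset of its children; the types and names are mine, not the paper's.

```haskell
import Data.Int (Int64)

data Tree a = Leaf | Node a (Tree a) (Tree a)

-- "Petrify": turn per-node child offsets into absolute coordinates
-- by accumulating the offsets along each node's path from the root.
-- The left child sits offset units to the left, the right child
-- offset units to the right, keeping the parent centered.
petrify :: Int64 -> Int64 -> Tree Int64 -> Tree (Int64, Int64)
petrify _ _ Leaf = Leaf
petrify x depth (Node offset l r) =
  Node (x, depth)
       (petrify (x - offset) (depth + 1) l)
       (petrify (x + offset) (depth + 1) r)
```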
Intuitively, it's easy to see why we recurse down the right contour of the left subtree and the left contour of the right subtree: these are the "closest" parts of the subtrees to each other.
Assuming we can follow the contours, the core idea of the paper is simply to go level by level on the subtree contours, looking at the left subtree's rightmost and right subtree's leftmost node on that level. We keep track of how far apart the left and right nodes are, and whenever that distance is too small, we increase the distance at the root to separate the left and right.
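In code, the scan looks something like the following sketch. For simplicity it assumes the contours are given as absolute $x$ positions per level, with each subtree's root at $x = 0$ (rather than the relative encoding used later in this post), and `minSep` is a hypothetical minimum gap between adjacent nodes.

```haskell
import Data.Int (Int64)

-- How far the right subtree must be pushed to the right so that, at
-- every level the two subtrees share, the gap between the left
-- subtree's rightmost node and the right subtree's leftmost node is
-- at least minSep. zipWith stops at the end of the shorter contour,
-- which is exactly where the lockstep scan stops.
separation :: Int64 -> [Int64] -> [Int64] -> Int64
separation minSep leftRight rightLeft =
  maximum (0 : zipWith overlap leftRight rightLeft)
  where
    overlap l r = l - r + minSep
```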
But as you can see from Figs. 3 and 4 above, there may not be a direct link from one node on the contour to the next. So as part of the algorithm, we need to construct a way for us to efficiently follow the contour in that case. The way Reingold and Tilford tackle this is through "threads," pointers from dead-end nodes to the next node on the contour, creating the threads as the recursion unwinds back up the tree. Since those dead-end nodes would typically contain two null child pointers, they reuse one of them to point to the next contour node, creating a path to ensure later traversal is efficient.¹
All the complexity of the paper is in how those threads get constructed; once we have a contour to follow, calculating the distances is straightforward. If you're interested in the specifics of that thread construction, I encourage you to read the paper itself; it's only 6 pages! For your benefit, here's my comment-annotated version of the Pascal source code provided in the paper. And as we'll see in the next section, it's not necessary to do this pointer manipulation anyway; there's a much simpler way to recurse down the contours.
As mentioned, the Reingold-Tilford algorithm runs in $O(n)$ time in the number of nodes $n$ in the tree. Extending the algorithm to m-ary trees while preserving this bound is also fairly simple: in each node, store the distance between each adjacent pair of subtrees, and do the same scan as on binary trees to "clump" pairs of subtrees together, calculating how far apart they should be. In other words, fold the scan across all subtrees, as sketched below.
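Here is a hedged sketch of that fold, reusing the absolute-position contour convention and `separation` function from the earlier sketch; `clump` returns each child's $x$ offset relative to the first child. All names here are mine, not the paper's.

```haskell
import Data.Int (Int64)

-- Contours as absolute x positions per level, each child's root at 0.
type AbsContour = [Int64]

-- The pairwise scan from the earlier sketch.
separation :: Int64 -> AbsContour -> AbsContour -> Int64
separation minSep lr rl = maximum (0 : zipWith (\l r -> l - r + minSep) lr rl)

-- Fold the scan left to right across the children: keep a running
-- right contour of the "clump" positioned so far, scan it against
-- the next child's left contour, and record that child's offset.
-- Each child is represented by its (left contour, right contour) pair.
clump :: Int64 -> [(AbsContour, AbsContour)] -> [Int64]
clump _ [] = []
clump minSep ((_, r0) : rest) = 0 : go r0 rest
  where
    go _ [] = []
    go clumpRight ((lc, rc) : cs) =
      let d = separation minSep clumpRight lc
          -- at every shared level the new child now sits strictly to
          -- the right of the clump, so its shifted right contour takes
          -- over there; whichever is taller supplies the rest
          clumpRight' = takeOver clumpRight (map (+ d) rc)
      in d : go clumpRight' cs

    takeOver (_ : xs) (y : ys) = y : takeOver xs ys
    takeOver xs       []       = xs
    takeOver []       ys       = ys
```

The parent is then centered between the first and last of the offsets that `clump` returns.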
Once More, With Purity
In a purely functional language like Haskell, we can't exactly do the same sort of raw pointer manipulation. Though we can definitely construct a DAG where a node has two parents (one "real" parent and one "thread" parent), updating such a structure is painful at best. What if that thread target then needs to be threaded itself? We would have to update both parents as well, making the recursion impossible to write.
The only reason we need the threads is to provide information about the contours. That suggests an alternative: we could simply construct said contour information directly. For each subtree, we construct two lists of nodes, representing the left and right contours. Thankfully, contours have an easy recursive definition.
Let $h(T)$ be the height of tree $T$. Let $L(T) = \langle l_0, l_1, \ldots, l_{h(T)} \rangle$ be the sequence of nodes in the left contour of $T$, as defined above. Given a node $t$ with children $t_l$ and $t_r$, let $h_l = h(t_l)$ and $h_r = h(t_r)$. Then,

$$L(t) = \begin{cases} \langle t \rangle & \text{if } t \text{ is a leaf} \\ \langle t \rangle \frown L(t_l) & \text{if } h_l \geq h_r \\ \langle t \rangle \frown L(t_l) \frown \langle r_{h_l+1}, \ldots, r_{h_r} \rangle & \text{if } h_l < h_r \end{cases}$$

where $\frown$ denotes sequence concatenation and $L(t_r) = \langle r_0, r_1, \ldots, r_{h_r} \rangle$.
That is, the left contour of the tree rooted at $t$ always starts with the node $t$, since the root is the only node at its level and thus leftmost by definition. Then we follow the left contour of the left subtree as far as we can. If the left subtree is shorter than the right subtree, we then follow the left contour of the right subtree for any remaining levels. Remember that the length of a contour for our tree must be the same as the height of the tree plus one, so we cannot just use the left subtree's contour if the right subtree is taller. The definition of the right contour $R(t)$ is symmetric.
It remains to show that this definition contains the leftmost nodes at each level of the tree rooted at $t$. Firstly, it contains $h(t) + 1$ nodes, which is exactly correct. Each node in $L(t)$ is either $t$ itself, a node from $L(t_l)$, or a node from $L(t_r)$. As above, $t$ is trivially the leftmost node on its level. If the node is from $L(t_l)$, by definition it is the leftmost node in $t_l$ at its level, and is trivially to the left of any nodes on that level in $t_r$. If the node is from $L(t_r)$, then $t_l$ must not have any nodes at its level (since we drop the first $h_l + 1$ nodes of $L(t_r)$), and thus the node is also trivially leftmost at its level.
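This definition translates directly into Haskell. Here is a hedged sketch using node lists (the real implementation, shown below, stores relative distances instead); `Tree` is the same illustrative type as in the earlier sketches.

```haskell
data Tree a = Leaf | Node a (Tree a) (Tree a)

-- The root, then the left subtree's contour, continued by whatever
-- levels of the right subtree's contour extend below it.
leftContour :: Tree a -> [a]
leftContour Leaf = []
leftContour (Node x l r) = x : extend (leftContour l) (leftContour r)
  where
    -- prefer the left subtree's contour; fall through to the right
    -- subtree's once the left runs out of levels
    extend (a : as) (_ : bs) = a : extend as bs
    extend []       bs       = bs
    extend as       []       = as
```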
Now that we have a way to construct the contour information, we do a straightforward recursive scan down the sequence of contour nodes, the same way we would if we had thread pointers. The full implementation can be seen here.
I show that this way of implementing the Reingold-Tilford algorithm still runs in $O(n)$ time. Other than the construction of the contours at each node, the Haskell code above recurses in exactly the same way as Reingold and Tilford's original Pascal code, visiting each node and scanning down the contours. Therefore, we can assume everything excluding the contour construction runs in $O(n)$ time.
We use the function `spliceContours`, shown below, to construct the contours at a node. Calls to this function can be expensive when we need to take some contour nodes from the left subtree and some from the right subtree, due to the calls to list append, `take`, `drop`, and `length`.
```haskell
import Data.Int (Int64)

-- Offset is presumably a synonym for Int64 in the full implementation;
-- it is defined here so this snippet stands alone.
type Offset = Int64

-- Since the scan only cares about the distance between the
-- two nodes at the current level, the only info we need about
-- the contour is the relative x distance to the next node in the contour
type Contour = [Int64]

data Contours = Contours
  { left  :: Contour
  , right :: Contour
  } deriving Show

-- Distances calculated while performing the contour scan.
-- lroffsum = the offset from the root to the node in the right contour
-- of the left subtree, which is at the level we stopped the scan at
-- and so on for lloffsum, rloffsum, rroffsum
data ScanResult = ScanResult
  { rootOffset :: Offset
  , lloffsum   :: Offset
  , lroffsum   :: Offset
  , rloffsum   :: Offset
  , rroffsum   :: Offset
  } deriving Show
```
```haskell
spliceContours :: Int64 -> ScanResult -> (Contours, Contours) -> Contours
spliceContours rootOffset scan (Contours ll lr, Contours rl rr) =
  Contours leftContour rightContour
  where
    compareLength :: [a] -> [b] -> Ordering
    compareLength [] [] = EQ
    compareLength [] _y = LT
    compareLength _x [] = GT
    compareLength (_:xs) (_:ys) = compareLength xs ys

    leftContour :: Contour
    leftContour = case (rl `compareLength` ll, ll, rl) of
      (_, [], []) -> [0]
      (EQ, _, _)  -> negate rootOffset : ll
      (LT, _, _)  -> negate rootOffset : ll
      (GT, [], _) -> rootOffset : rl
      (GT, _, _)  ->
        negate rootOffset
          : take (length ll - 1) ll
          ++ (rloffsum scan - lloffsum scan)
          : drop (length ll) rl

    rightContour :: Contour
    rightContour = case (lr `compareLength` rr, rr, lr) of
      (_, [], []) -> [0]
      (EQ, _, _)  -> rootOffset : rr
      (LT, _, _)  -> rootOffset : rr
      (GT, [], _) -> negate rootOffset : lr
      (GT, _, _)  ->
        rootOffset
          : take (length rr - 1) rr
          ++ (lroffsum scan - rroffsum scan)
          : drop (length rr) lr
```
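As a quick sanity check on the encoding, here is a hypothetical example call (not from the post's full implementation). With two single-node subtrees the `EQ` branches are taken, so the scan-offset fields are irrelevant and can be zeroed:

```haskell
example :: Contours
example =
  spliceContours 1 (ScanResult 1 0 0 0 0)
                 (Contours [0] [0], Contours [0] [0])
-- => Contours {left = [-1,0], right = [1,0]}
-- i.e. step one unit left (or right) from the root to reach each
-- child, which is the final node on its contour (the trailing 0)
```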
All the expensive functions called by `spliceContours` take time linear in the height of the shorter subtree, and it calls them a constant number of times. Let us then define $s(v)$ as the cost of `spliceContours` for constructing the contours at a single node $v$, and let us define $S(T)$ as the sum of $s(v)$ over all nodes $v$ in tree $T$.
The paper defines the total cost of the recursive scan and contour threading as $M(T)$, where $n$ is the number of nodes in the tree $T$. They show that $M(T)$ is $O(n)$, and define it recursively as

$$M(T) = M(T_l) + M(T_r) + \min(h(T_l), h(T_r)) + 1$$

with $M(T) = 0$ when $T$ is empty.
If we expand $s(v)$ in the definition of $S(T)$ above, we see that the definition of $S(T)$ is exactly that of $M(T)$. Since $M(T) = O(n)$, $S(T) = O(n)$ as well. The total cost of the recursive scan is then $O(n) + O(n) = O(n)$, which is linear.
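Spelled out, with $c$ an assumed constant factor bounding the per-level cost of the list operations above:

$$S(T) \;=\; \sum_{v \in T} s(v) \;\le\; c \sum_{v \in T} \bigl( \min(h(v_l), h(v_r)) + 1 \bigr) \;=\; c \cdot M(T) \;=\; O(n)$$

where $v_l$ and $v_r$ denote the subtrees of $v$.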
Conclusion
I use lists for representing the contours to simplify the presentation and implementation of the algorithm. It just so happens that even using linked lists, we still get a linear time algorithm! Happily, this means that the algorithm can easily be implemented in a pure manner even in non-lazy languages, removing the need for complicated pointer manipulations.
But in a real implementation, there's no reason to prefer linked lists over a data structure which supports efficient insertion and catenation. If one expects to be drawing very large trees, storing the contours using a different, persistent data structure like 2-3 finger trees or Okasaki's implicit recursive slowdown deque may prove more efficient.
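For instance, `Data.Sequence` from the containers package is a 2-3 finger tree supporting every operation `spliceContours` needs (cons, take, drop, append, and length) in logarithmic time or better. A hedged sketch of the swap, where `spliceTallRight` is my name for the taller-right-subtree case of `spliceContours`:

```haskell
import Data.Int (Int64)
import Data.Sequence (Seq, (<|), (><))
import qualified Data.Sequence as Seq

type Contour = Seq Int64

-- The taller-right-subtree splice case from spliceContours,
-- transliterated: each list operation maps one-for-one onto its
-- Seq counterpart, with take, drop, and (><) now costing O(log n).
spliceTallRight :: Int64 -> Int64 -> Contour -> Contour -> Contour
spliceTallRight rootOffset jump ll rl =
  negate rootOffset
    <| Seq.take (Seq.length ll - 1) ll
    >< jump
    <| Seq.drop (Seq.length ll) rl
```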
Technically, the only operations needed are the stack operations: head, tail, and cons. Constructing the new contours can happen at the same time as the contour scan, by consing the correct nodes onto the appropriate contours as the recursion unwinds. The implementation would become more complicated, however.
Footnotes
1. Thinking that this pointer manipulation is gross is a sign of sanity.