How does "tying the knot" on this cyclic linked list in Haskell work?

Question

I am learning Haskell and was reading through Tying the Knot on how to build a cyclic linked list. In the code

data DList a = DLNode (DList a) a (DList a)
mkDList :: [a] -> DList a
mkDList [] = error "must have at least one element"
mkDList xs = let (first,last) = go last xs first
         in  first
         where go :: DList a -> [a] -> DList a -> (DList a, DList a)
               go prev []     next = (next,prev)
               go prev (x:xs) next = let this        = DLNode prev x rest
                                         (rest,last) = go this xs next
                                     in  (this,last)

I am trying to understand the call where they link the last element to the first through a "little trick" (!) called tying the knot:

mkDList xs = let (first,last) = go last xs first

But I am having difficulty seeing how this works. What is "go" initially called with? And per the comment in the article, how is the first result from "go" "passed back in"?

Thanks!


Show source
| haskell   | lazy-evaluation   2017-01-06 17:01 2 Answers

Answers ( 2 )

  1. 2017-01-06 19:01

    Since Haskell is lazy, values are evaluated until strictly necessary. We can walk through a simple example using equational reasoning to see where that gets us.

    Start with the simplest example: a one-element list.

    mkDList [1] == let (first, last) = go last [1] first in first
    

    It seems that you can't call go, because you don't know what last and first are equal to yet. However, you can think of them as unevaluated black boxes: it doesn't matter what they are, just that you can proceed with equational reasoning with them.

    -- Just plug last and first into the definition of go
    -- last2 is just a renaming of the argument for clarity
    go last [1] first == let this = DLNode last 1 rest
                             (rest, last2) = go this [] first
                         in (this, last2)
    

    Let's try to evaluate the next call to go in the same way.

    go this [] first == (first, this)
    

    Conveniently, we didn't need to imagine any new black boxes; go simply evaluates to its original arguments in a slightly repackaged manner.

    OK, now we can go back the way we came, and replace the recursive call to go with its evaluation.

    go last [1] first == let this = DLNode last 1 rest
                             (rest, last2) = (first, this)
                         in (this, last2)
    

    And we'll plug this back into our original equation with mkDList:

    mkDList [1] == let (first, last) = let this = DLNode last 1 rest
                                           (rest, last2) = (first, this)
                                       in (this, last2)
                   in first
    

    This doesn't look too helpful, but remember, we haven't actually called mkDList yet; we just used equational reasoning to simplify its definition a little bit. In particular, there are no recursive calls to go, just one let expression nested in another.


    Since Haskell is lazy, we don't have to evaluate any of this until it is absolutely necessary, such as when we try to pattern-match against the return value of mkDlist [1]:

    let (DLNode p x n) = mkDList [1] in x
    

    To evaluate this expression, we just need to ask the following questions:

    1. "What is the value of x?" Answer: we need to pattern-match against mkDList [1] first.
    2. "What is the value of mkDList?" Answer: first.
    3. "What is the value of first?" Answer: this.
    4. "What is the value of this?" Answer: DLNode last 1 rest

    At this point, you have enough information to see that x == 1, and last and rest don't need to be evaluated further. You could, though, pattern match again to see what, for example, p is, and discover that

    p == last == last2 == this == DLNode last 1 rest
    

    and

    n == rest == first == this == DLNode last 1 rest
    

    The magic is a call like (first, last) = go last xs first doesn't actually need values for its arguments; it just needs placeholders to keep track of what values first and last will eventually get when they are evaluated. These placeholders are called "thunks", and they represent pieces of unevaluated code. They let us refer to boxes that we haven't filled with anything yet, and we can pass the empty boxes to go safe in the knowledge that somebody will fill them before anyone else tries to look in them. (In fact, go itself never does; it simply keeps passing them along until somebody outside of mkDList tries to look at them.)

  2. 2017-01-06 19:01

    We can try some simple input first, to see what's going on there:

    mkDList [1] = first
        where 
        (first,last) := go last [1] first
                      = let { prev=last; next=first;         -- prev = last
                              x=1; xs=[] }                   -- next = first
                        in go prev (x:xs) next
                           = let this := (DLNode prev 1 rest)  -- a node is constructed, with 
                                          -- the two pointers still pointing into the unknown
                                 (rest,last2) := go this [] next
                                               = (next,this) -- rest := next
                                                             -- last2 := this
                             in  (this,last2)                -- first := this
                                                             -- last := last2
    

    let in Haskell is recursive: same name can appear on both left and right sides of the equation sign, and will refer to the same entity.

    First, go is called with last, [1] and first. Both last and first do not yet refer to any value; they exist only as identity, kind of "named pointers" to still empty boxes, boxes which are yet to be "filled" with values.

    Going into the guts of go, both names are "fleshed out", and then the final value (this, last2) is returned; then the pattern (first, last) is matched with that value. This is how the last finally gets its value, even though it was used as a named identity inside the go invocations. This is what is referred to as the tying of the knot: imagine an arrow going "out" of last into the go invocations, coming back to it from the deep; and same with first; thus creating chains of equivalences:

        first := this = (DLNode prev 1 rest)
        last := last2 := this
        prev = last
        rest := next = first
    

    The above follows a somewhat imperative view of Haskell's semantics as a single-assignment language. The := operator is used, as a pseudocode, to provide a visual clue to the fact that a value is calculated by the expression on the right, and is "passed" into the variables in the pattern on the left of the equality sign (when that pattern is matched with the calculated value).


    In fact, the name "next" is no good, as we're just passing along the very first node all the way down, to be used as the last node's next node:

     mkDList xs@(_:_) = first where (first,last) = go last xs first
    
     go ::   DList a ->  [a]  ->        DList a -> (DList a, DList a)
     go          prev  (x:xs)            first = 
        (this,                     last)        -- (this   , last   )
         where 
         this                                 := DLNode 
                 prev   x    rest
         (                   rest, last)      := go 
                 this     xs             first 
    
     go          prev     []             first = 
                                        (first,  -- first --> rest of the last node
                 prev)
    

    This can be "described" by equivalent Prolog definition:

    mkDList(    [X|XS], First) :-                 % mkDList( in, out)
      go( Last, [X|XS], First,   First, Last).    % go( in, in, in, out, out)
    
    go(   Prev, [X|XS], First,   This,  Last) :-   This = dlnode(Prev, X, Rest),
      go(  This,   XS,  First,   Rest,  Last).    % fill the Rest, return Last
    go(   Prev, [],     First,   First, Prev).
    

    Indeed,

    ?- mkDList([1,2,3],X).
    X = dlnode(_S1, 1, _S2), % where
        _S2 = dlnode(X, 2, _S1),
        _S1 = dlnode(_S2, 3, X).
    
◀ Go back