[blog] | [projects] | [about] | [imprint]

Lazy-sequences - part 2

13 January 2021
 

Lazy (evaluated) sequences - part 2.

In the last blog post I've talked about lazy sequences that are generated by a generator. The generator needs state to remember what number (or thing) it has generated before in order to generate the next. A consumer simply asks the generator about the next 'thing'.

This way of implementing lazy evaluated sequences has two negative consequences (thanks for pointing this out, Rainer). 1) there is not really a list or sequence data structure (not even a 'lazy' one :). The consumer generates a result data structure (a list) by repeatedly asking the generator for the requested number of items. 2) it is not possible to re-use a lexically scoped generator like below. Because it keeps state it will continue counting.

(let ((generator (range)))
         (print (take 3 generator))
         (print (take 2 generator)))
(0 1 2)
(3 4)
;; where we would expect:
(0 1 2)
(0 1)

So we will look at proper lazy sequences. What I'm writing about is fully handled in the book "Structure and Interpretation of Computer Programs" chapter 3.5. I have changed the names of functions slightly in order to be better comparable to the naming scheme in the last blog post.

Primitives

As you might know, a cons cell is the basis for lists. A cons is constructed of two cells. In Lisp the left part is called car and the right part is called cdr (those two names still refer to (maybe obsolete) implementation details as 'address register' and 'decrement register'). With combining conses it is possible to generate linked lists when the cdr is again a cons. The car also represents the head of the list and the cdr the tail.

Now, in lazy sequences the computation of the cdr part is deferred until needed like this. (I've called this cons wrapper just lazy-cons):

(defmacro lazy-cons (a b)
  `(cons ,a (delay ,b)))

This generates a normal cons from two values only that the second, the cdr, is not evaluated yet. The delay is simply just b wrapped into a lambda like this:

(defmacro delay (exp)
  `(lambda () ,exp))

So if we would macroexpand this we would get just this:

(cons a
      (lambda () b))

But it is good to create a new layer of meaning so I want to create a few primitives to hide the details of this and to make working with lazy sequences more natural similar to a normal cons.

Both of the definitions above have to be macros because otherwise both delay and b when passed into lazy-cons would be evaluated immediately. But that should be delayed until wanted.

In order to access car and cdr of the lazy-cons we introduce two more primitives, lazy-car and lazy-cdr:

(defun lazy-car (lazy-seq)
  (car lazy-seq))
(defun lazy-cdr (lazy-seq)
  (force (cdr lazy-seq)))

lazy-car just calls car. We could certainly just use car directly, but to be consistent and to create a new metaphor to be used for this lazy sequence we'll add both.

lazy-cdr does somthing additional. This is a key element. When accessing the cdr of the list we now enforce the computation of it. force is very simple. Where delay wrapped the expression into a lambda we now have to unwrap it by funcalling this lambda to compute the expression. So force looks like this:

(defun force (delayed-object)
  (funcall delayed-object))

Those 5 things are the base primitives to construct lazy sequences. Now let's create a new range function - which now is not a generator anymore.

Generate

(defun range (&key (from 0))
  (lazy-cons
   from
   (range :from (1+ from))))

This range implementation doesn't need state. It simply constructs and returns a lazy-cons. The special feature is that the cdr is again a call to range with an incremented from parameter. In effect, calling force on the lazy-cons will construct the next lazy-cons and so on.

If we mentally go through a call chain to construct the values 0, 1, 2, 3 we'd have to:

call (range :from 0)
=> (cons 0 <delayed range call>)
=> lazy-car = 0

force lazy-cdr which calls range :from 1
=> (cons 1 <delayed range call>)
=> lazy-car = 1

force lazy-cdr which calls range :from 2
=> (cons 2 <delayed range call>)
=> lazy-car = 2

force lazy-cdr which calls range :from 3
=> (cons 3 <delayed range call>)
=> lazy-car = 3

And so on. So a consumer, like take would have to iteratively or recursively go through this call chain to construct lazily computed values.

Consume

take generates a list, so it must collect all lazily computed values. How does it do that? By recursively creating conses.

(defun take (n lazy-seq)
  (if (= n 0)
      nil
      (cons (lazy-car lazy-seq)
            (take (1- n) (lazy-cdr lazy-seq)))))

So take creates conses from lazy-car (the head) and a recursive call to take with the next delayed cdr (the tail) which then constructs the result list we're after.

(take 5 (range))
(0 1 2 3 4)

That was pretty simple so far. A library (or language) that supports lazy sequences usually provides more functionality, like filtering or mapping.

Filtering

Filtering is a pretty nice and important part in this. It allows to create specialized lazy sequences. For example we could create a lazy sequence where take collects just even numbers.

(defun even-numbers ()
  (lazy-filter #'evenp (range)))
(take 5 (even-numbers))
(0 2 4 6 8)

This lazy-filter function is very flexible by allowing arbitrary filter functions (just like a filter function on normal lists). The implementation of lazy-filter must again create lazy-cons to be completely transparent to the consumer functions. This also allows the composition of filter functions.

(defun lazy-filter (pred lazy-seq)
  (cond
    ((null lazy-seq)
     nil)
    ((funcall pred (lazy-car lazy-seq))
     (lazy-cons (lazy-car lazy-seq)
                (lazy-filter pred (lazy-cdr lazy-seq))))
    (t
     (lazy-filter pred (lazy-cdr lazy-seq)))))

When the passed in lazy-seq parameter (which is a lazy-cons) is empty, just return nil (the empty list). When applying the predicate to lazy-car is true (t) then return a new lazy-cons with lazy-car as head and delayed call to lazy-filter with the 'forced' tail. When none of the above is true. Which means that this result has to be filtered out. Then call again lazy-filter with the next 'forced' tail.

A similar thing can be done for mapping. But I'll leave that for you to read up on in SICP.

[atom/rss feed]