20 Days of Clojure: Day 9

News from Rich on a optimization he found for vectors:

I re-read Okasaki’s paper yesterday and woke up today with an idea for speeding up the Clojure vector conj and pop. They are now 2-5x faster than they were. The trick is to keep the tail of the vector out of the tree until it is a full (32-element) node. Note that this greatly speeds up vector construction, which typically consists of repeatedly conj’ing onto [].

Today, I’m going to look at sequences. I think it’s helpful to go back to the SICP — this time, the lecture on stream machines:



This is just a small part of the whole
SICP stream machine lecture (and Streams Part II).

Sequences in clojure are lazy like stream as described in the video, and if you watch the whole lecture, you will see a rich stream library built up that is present in clojure.

This means that many operations can be incredibly efficient, but seem like they are operating on gigantic data structures at once (since my day job is image processing, this is especially intriguing to me — maybe more on that later). To really see this, you need to play with infinite sequences. In clojure, it’s easy to create an infinite sequence with cycle — (cycle [1 2 3]) returns a sequence which continuously yields 1, 2, 3, 1, 2, 3, etc. If you use sequence operations on it, you can get a feel for how laziness works because it’s obvious that these functions could not possibly be consuming the whole sequence. Here’s a clojure Repl script.

user=> (def cyc123 (cycle [1 2 3]))
#<Var: user/cyc123>
user=> (every? (fn [x] (= 1 x)) cyc123)
false
user=> (take 5 cyc123)
(1 2 3 1 2)
user=> (take 5 (filter (fn [x] (= x 1)) cyc123))
(1 1 1 1 1)
user=> (take 5 (drop 1 cyc123))
(2 3 1 2 3)
user=> (take 5 (interleave cyc123 cyc123))
(1 1 2 2 3)
user=> (take 5 (take-nth 3 cyc123))
(1 1 1 1 1)
user=> (take 10 (for [x cyc123] (< x 3) x)) ;old syntax
user=> (take 10 (for [x cyc123 :when (< x 3)] x))
(1 2 1 2 1 2 1 2 1 2)
user=> (mapcat (fn [x y] (list x y)) cyc123 ‘(10 11 12))
(1 10 2 11 3 12)

More on this tomorrow.