December 27, 2014

OCaml Lists in Rust

Rust is a blazingly fast and safe systems programming language which marries low-level programming constructs like pointers and move semantics with high-level programming constructs like algebraic data types, pattern matching, and type inference. In an effort to learn Rust, I ported OCaml's List module to Rust! You can find the code and its documentation on GitHub.

In this article, I'll walk through the process of implementing the Rust library, and I'll explain a few OCaml and Rust concepts along the way. I'll assume you have some familiarity with Rust, OCaml and OCaml's List module.

1. The Type

The first step in porting OCaml's lists to Rust is to decide on a list type for Rust. OCaml's builtin list type is a polymorphic algebraic data type similar to the following.

type 'a list = Nil | Cons of 'a * 'a list

What does this mean? Well, type 'a list declares a new type 'a list pronounced "alpha list". 'a is a type variable that can be instantiated with any concrete type. For example, you could instantiate 'a with int and get int list's like [1; 2; 3]. Or, you could instantiate 'a with int list to get int list list's like [[1]; [2; 3]]!

Instances of the 'a list type can take two forms. They can either be Nil which represents the empty list, or they can be a Cons of an element of type 'a and an element of type 'a list. A Cons of an element x and a list l represents the list whose head is x and whose tail is l. For example, the list Nil represents the empty list [], the list Cons(1, Nil) represents the list [1], and the list Cons(1, Cons(2, Cons(3, Nil))) represents the list [1; 2; 3].

$$ \texttt{Cons(}\underbrace{\texttt{1}}_{\text{head}}\texttt{,} \underbrace{\texttt{Cons(2, Cons(3, Nil))}}_{\text{tail}} \texttt{)} $$

Now that we've got OCaml's list type sorted out, how do we represent this type in Rust? Since Rust has polymorphic algebraic data types too, it should be easy! Here's our first attempt.

enum List<A> {
    Nil,
    Cons(A, List<A>)
}

Pretty similar, huh? Here, List<A> takes the place of 'a list. A, like 'a, is a type variable that can be instantiated with concrete types (an int list in OCaml, for example, is a List<int> in Rust). And again, lists can take two forms: Nil and Cons.

Sounds good, so what happens when we try to compile this?

list.rs:1:1: 4:2 error: illegal recursive enum type; wrap the inner value in a box to make it representable [E0072]
list.rs:1 enum List<A> {
list.rs:2     Nil,
list.rs:3     Cons(A, List<A>)
list.rs:4 }
error: aborting due to previous error

Uh-oh, our code is erroneous; what's going on? Well, OCaml's recursive types allow us to define 'a list in terms of itself (i.e. type 'a list = ... | Cons of ... * 'a list). Rust allows us to do this too, but in order for instances of List<A> to have a fixed size, we have to wrap the recursive branches of our algebraic data type in boxes (in short, a box is a unique pointer to data allocated on the heap; more information can be found elsewhere).

With boxes in mind, we can settle on our Rust list type.

enum List<A> {
    Nil,
    Cons(A, Box<List<A>>)
}

Nil represents the empty list []. Cons(1i, box Nil) represents the singleton list [1i]. Cons(1i, box Cons(2, box Cons(3, box Nil))) represents the list [1i, 2, 3].

2. The Macro

Earlier, I mentioned that the OCaml value Cons(1, Cons(2, Cons(3, Nil))) represents the list [1; 2; 3]. Luckily, OCaml has sugary syntax and let's us write [1; 2; 3] wherever we mean Cons(1, Cons(2, Cons(3, Nil))). It would be nice if we had the same sugar in Rust so that we could write something like [1i, 2, 3] wherever we mean Cons(1i, box Cons(2, box Cons(3, box Nil))). Good news: with Rust macros, we can!

What are macros? Well, as programmers we're used to abstracting over values using things like functions. Macros are like functions, but they abstract over syntax rather than values. In reality, macros are rather complicated, so I'll defer an in-depth discussion. For now, just read the following macro.

macro_rules! list[
    ()                       => (Nil);
    ($x:expr)                => (Cons($x, box Nil));
    ($x:expr, $($xs:expr),+) => (Cons($x, box list!($($xs),+)));
];

The macro lets us write things like list![], list![1i], and list![1i, 2, 3] which expand to Nil, Cons(1i, box Nil), and Cons(1i, box Cons(2, box Cons(3, box Nil)))!

3. The Functions

Now that we've ported OCaml's list type and OCaml's syntactic sugar, it's time to port the functions. OCaml's List module defines quite a few functions that Rust includes in its iter module (e.g. map, filter, fold, any, all). I wrote functions to convert our Rust lists to and from iterators, so we could take advantage of the iter module if we wanted to. However, since the project is mainly pedagogical, I decided to write all of OCaml's List functions from scratch as well.

I'll walk you through the implementation of two functions, fold_left and map, and let you explore the other functions at your leisure.

3.1 fold_left's Type

First up, let's tackle fold_left and its type. In OCaml, fold_left has the following type (pronounced "alpha to beta to alpha to alpha to beta list to alpha").

('a -> 'b -> 'a) -> 'a -> 'b list -> 'a

You can understand this type by thinking of fold_left as a function which takes three arguments: f, a, and l.

  1. f is a function of type ('a -> 'b -> 'a).
  2. a is an accumulator and is of type 'a.
  3. l is a list and is of type 'b list.

fold_left evaluates to something of type 'a.

$$ \newcommand{\ra}{\rightarrow} \underbrace{\texttt{('a $\ra{}$ 'b $\ra{}$ 'a)}}_{\texttt{f}} \ra{} \underbrace{\texttt{'a}}_{\texttt{a}} \ra{} \underbrace{\texttt{'b list}}_{\texttt{l}} \ra{} \underbrace{\texttt{'a}}_{\texttt{return}} $$

3.2 fold_left's Semantics

Now, let's tackle fold_left's semantics. In brief, fold_left f a [b_1; b_2; ...; b_n] evaluates to f (... (f (f a b1) b2) ... ) bn. First, the function f is applied to the accumulator a and the first element of the list b1. The result of this application, f a b1 becomes the new accumulator. Then, f is applied to the new accumulator and the second element of the list: f (f a b1) b2. This pattern continues until the list is empty at which point the fold evaluates to the accumulator. For example, consider the evaluation of the following fold which computes the sum of a list of integers.

fold_left (+) 0 [1; 2; 3]
fold_left (+) ((+) 0 1) [2; 3]
fold_left (+) (1) [2; 3]
fold_left (+) ((+) 1 2) [3]
fold_left (+) (3) [3]
fold_left (+) ((+) 3 3) []
fold_left (+) (6) []
6

Here, f = (+), a = 0, and l = [1; 2; 3]. You can convince yourself this evaluation is the same as the following.

(+) ((+) ((+) 0 1) 2) 3
(+) ((+) 1 2) 3
(+) 3 3
6

3.3 fold_left's Code

On to the code! Here's the standard implementation of fold_left in OCaml.

let rec fold_left f a l =
  match l with
  | [] -> a
  | head::tail -> fold_left f (f a head) tail

Now, let's port it to Rust! The first thing we'll do is get the function signature compiling but leave the body unimplemented for now.

impl<A> List<A> {
    fn fold_left<B>(&self, f: |B, &A| -> B, a: B) -> B {
        unimplemented!()
    }
}

Let's break down what's going on.

  • The impl block, impl<A> List<A> {...}, declares A as a type variable and allows us to define methods for expressions of type List<A>.
  • fn fold_left<B> declares a function fold_left and a second type variable B.
  • (&self, f: |B, &A| -> B, a: B) specifies that fold_left takes three arguments.
    1. &self is a reference to an expression of type List<A>. OCaml doesn't have any notion of pointers or references, but in Rust we have to borrow arguments using references (i.e. &self as opposed to self) to avoid moving them. Also, note that when we type something like list![1i, 2, 3].fold_left(f, a), Rust converts it to something like fold_left(&list![1i, 2, 3], f a).
    2. f is a closure that takes to arguments of type B and &A (a reference to something of type A) and that output is something of type B.
    3. a is our accumulator of type B.
  • -> B specifies that our function returns something of type B.

This type signature should look very similar to our OCaml type where A and B are substitutes for 'b and 'a respectively. Now, let's add the body.

impl<A> List<A> {
    fn fold_left<B>(&self, f: |B, &A| -> B, a: B) -> B {
        match *self {
            Nil => a,
            Cons(ref x, ref xs) => {
                let a = f(a, x);
                xs.fold_left(f, a)
            }
        }
    }
}

This code should look very similar to our OCaml code. When our list is empty we evaluate to the accumulator. When the list is non-empty, we compute a new accumulator using f and then recurse.

You can play with fold_left yourself! Just press "evaluate".

3.4 map

Now that we have fold_left written, let's write a simpler function, map, which has the following type.

('a -> 'b) -> 'a list -> 'b list

map f [a1, ..., an] evaluates to [f a1, ..., f an]. For example map (fun x -> x + 1) [1; 2; 3] evaluates to [2; 3; 4], and map (fun x -> x * x) [1; 2; 3] evaluates to [1; 4; 9].

What's neat about map is that we can implement it using fold_left (and a function reved which reverses a list)!

impl<A> List<A> {
    fn map<B>(&self, f: |&A| -> B) -> List<B> {
        self.fold_left(|l, x| Cons(f(x), box l), list![]).reved()
    }
}

You can play with map too!

3.5 Adding Lifetimes

So far, so good. We've written fold_left and used it to implement map. Now, let's experiment with map to make sure it works how we think it should. First, let's create a list of integers xs. Then, we'll map the identity function |x| x across the list. Since map expects a function of type |&A| -> B, this should create a list of int references (List<&int>) from xs.

fn main() {
    let xs: List<int> = list![1i, 2, 3];
    let ys: List<&int> = xs.map(|x| x);
    println!("xs = {}", xs);
    println!("ys = {}", ys);
}

So what happens when we run this program? Well, click "evaluate" and see for yourself.

That's a beefy error message.

<anon>:69:37: 69:38 error: cannot infer an appropriate lifetime due to conflicting requirements
<anon>:69     let ys: List<&int> = xs.map(|x| x);
                                              ^
<anon>:69:37: 69:38 note: first, the lifetime cannot outlive an anonymous lifetime defined on the block at 69:36...
<anon>:69     let ys: List<&int> = xs.map(|x| x);
                                              ^
<anon>:69:37: 69:38 note: ...so that expression is assignable (expected `&int`, found `&int`)
<anon>:69     let ys: List<&int> = xs.map(|x| x);
                                              ^
<anon>:69:26: 69:39 note: but, the lifetime must be valid for the method call at 69:25...
<anon>:69     let ys: List<&int> = xs.map(|x| x);
                                   ^~~~~~~~~~~~~
<anon>:69:26: 69:39 note: ...so that the declared lifetime parameter bounds are satisfied
<anon>:69     let ys: List<&int> = xs.map(|x| x);
                                   ^~~~~~~~~~~~~
error: aborting due to previous error
playpen: application terminated with error code 101
Program ended.

So what's going on here? Well, the compiler is telling us that it can't infer a proper lifetime. It says the lifetime of x is limited to the lifetime of the closure |x| x, but we want the lifetime of x to be the same as the lifetime of xs.

Wow.

That's an interesting bug that you won't find in pretty much any other programming language besides Rust and one that I admittedly didn't even know existed until I had written a fairly large chunk of the code. I'll again defer an in-depth explanation to elsewhere, but I'll show you the fix: we just have to explicitly annotate the lifetimes of our references.

impl<'a, A> List<A> {
    fn fold_left<B>(&'a self, f: |B, &'a A| -> B, a: B) -> B {
        match *self {
            Nil => a,
            Cons(ref x, ref xs) => {
                let a = f(a, x);
                xs.fold_left(f, a)
            }
        }
    }

    fn map<B>(&'a self, f: |&'a A| -> B) -> List<B> {
        self.fold_left(|l, x| Cons(f(x), box l), list![]).reved()
    }
}

After this, our code compiles and runs how we expect!

4. Final Thoughts

Porting OCaml's List module to Rust was not without some head scratching. Rust is a complicated beast, and on more than one occasion I found myself staring at the code I just wrote wondering how in the world it worked. Frankly, I still don't know how some of the code works (e.g. some mem::replace wizardry I needed to use to implement into_iter). However, along with the head scratching came plenty of "Aha!" moments which made the experience awesome! Rust's concepts of ownership and borrowing were new to me and made me think about programming in a new way. Rust is definitely awesome, and I can't wait to keep learning more about it!