Rust is a blazingly fast and safe systems programming language which marries low-level programming constructs like pointers and move semantics with high-level programming constructs like algebraic data types, pattern matching, and type inference. In an effort to learn Rust, I ported OCaml's List module to Rust! You can find the code and its documentation on GitHub.
In this article, I'll walk through the process of implementing the Rust library, and I'll explain a few OCaml and Rust concepts along the way. I'll assume you have some familiarity with Rust, OCaml and OCaml's List module.
The first step in porting OCaml's lists to Rust is to decide on a list type for Rust. OCaml's builtin list type is a polymorphic algebraic data type similar to the following.
type 'a list = Nil | Cons of 'a * 'a list
What does this mean? Well, type 'a list
declares a new type
'a list
pronounced "alpha list". 'a
is a type
variable that can be instantiated with any concrete type. For example, you
could instantiate 'a
with int
and get int
list
's like [1; 2; 3]
. Or, you could instantiate
'a
with int list
to get int list
list
's like [[1]; [2; 3]]
!
Instances of the 'a list
type can take two forms. They can
either be Nil
which represents the empty list, or they can be
a Cons
of an element of type 'a
and an element of
type 'a list
. A Cons
of an element x
and a list l
represents the list whose head is x
and whose tail is l
. For example, the list Nil
represents the empty list []
, the list Cons(1,
Nil)
represents the list [1]
, and the list
Cons(1, Cons(2, Cons(3, Nil)))
represents the list [1;
2; 3]
.
Now that we've got OCaml's list type sorted out, how do we represent this type in Rust? Since Rust has polymorphic algebraic data types too, it should be easy! Here's our first attempt.
enum List<A> {
Nil,
Cons(A, List<A>)
}
Pretty similar, huh? Here, List<A>
takes the place of
'a list
. A
, like 'a
, is a type
variable that can be instantiated with concrete types (an int
list
in OCaml, for example, is a List<int>
in
Rust). And again, lists can take two forms: Nil
and
Cons
.
Sounds good, so what happens when we try to compile this?
list.rs:1:1: 4:2 error: illegal recursive enum type; wrap the inner value in a box to make it representable [E0072]
list.rs:1 enum List<A> {
list.rs:2 Nil,
list.rs:3 Cons(A, List<A>)
list.rs:4 }
error: aborting due to previous error
Uh-oh, our code is erroneous; what's going on? Well, OCaml's recursive
types allow us to define 'a list
in terms of itself (i.e.
type 'a list = ... | Cons of ... * 'a list
). Rust allows us to
do this too, but in order for instances of List<A>
to
have a fixed size, we have to wrap the recursive branches of our algebraic
data type in box
es (in short, a box
is a unique
pointer to data allocated on the heap; more information can be found
elsewhere).
With box
es in mind, we can settle on our Rust list type.
enum List<A> {
Nil,
Cons(A, Box<List<A>>)
}
Nil
represents the empty list []
. Cons(1i,
box Nil)
represents the singleton list [1i]
.
Cons(1i, box Cons(2, box Cons(3, box Nil)))
represents the
list [1i, 2, 3]
.
Earlier, I mentioned that the OCaml value Cons(1, Cons(2, Cons(3,
Nil)))
represents the list [1; 2; 3]
. Luckily, OCaml
has sugary syntax and let's us write [1; 2; 3]
wherever we
mean Cons(1, Cons(2, Cons(3, Nil)))
. It would be nice if we
had the same sugar in Rust so that we could write something like [1i,
2, 3]
wherever we mean Cons(1i, box Cons(2, box Cons(3, box
Nil)))
. Good news: with
Rust macros,
we can!
What are macros? Well, as programmers we're used to abstracting over values using things like functions. Macros are like functions, but they abstract over syntax rather than values. In reality, macros are rather complicated, so I'll defer an in-depth discussion. For now, just read the following macro.
macro_rules! list[
() => (Nil);
($x:expr) => (Cons($x, box Nil));
($x:expr, $($xs:expr),+) => (Cons($x, box list!($($xs),+)));
];
The macro
lets us write things like list![]
, list![1i]
, and
list![1i, 2, 3]
which expand to Nil
,
Cons(1i, box Nil)
, and Cons(1i, box Cons(2, box Cons(3,
box Nil)))
!
Now that we've ported OCaml's list type and OCaml's syntactic sugar, it's
time to port the functions. OCaml's List module defines quite a few
functions that Rust includes in its iter
module (e.g.
map
,
filter
,
fold
,
any
,
all
).
I wrote functions to convert our Rust lists
to and
from
iterators, so we could take advantage of the iter
module if we
wanted to. However, since the project is mainly pedagogical, I decided to
write all of OCaml's List functions from scratch as well.
I'll walk you through the implementation of two functions,
fold_left
and
map
, and let you explore the
other functions
at your leisure.
fold_left
's Type
First up, let's tackle fold_left
and its type. In OCaml,
fold_left
has the following type (pronounced "alpha to beta to
alpha to alpha to beta list to alpha").
('a -> 'b -> 'a) -> 'a -> 'b list -> 'a
You can understand this type by thinking of fold_left
as a
function which takes three arguments: f
, a
, and
l
.
f
is a function of type ('a -> 'b ->
'a)
.
a
is an accumulator and is of type
'a
.
l
is a list and is of type 'b list
.
fold_left
evaluates to something of type 'a
.
fold_left
's Semantics
Now, let's tackle fold_left
's semantics. In brief,
fold_left f a [b_1; b_2; ...; b_n]
evaluates to f (...
(f (f a b1) b2) ... ) bn
. First, the function f
is
applied to the accumulator a
and the first element of the list
b1
. The result of this application, f a b1
becomes the new accumulator. Then, f
is applied to the new
accumulator and the second element of the list: f (f a b1) b2
.
This pattern continues until the list is empty at which point the fold
evaluates to the accumulator. For example, consider the evaluation of the
following fold which computes the sum of a list of integers.
fold_left (+) 0 [1; 2; 3]
fold_left (+) ((+) 0 1) [2; 3]
fold_left (+) (1) [2; 3]
fold_left (+) ((+) 1 2) [3]
fold_left (+) (3) [3]
fold_left (+) ((+) 3 3) []
fold_left (+) (6) []
6
Here, f = (+)
, a = 0
, and l = [1; 2;
3]
. You can convince yourself this evaluation is the same as the
following.
(+) ((+) ((+) 0 1) 2) 3
(+) ((+) 1 2) 3
(+) 3 3
6
fold_left
's Code
On to the code! Here's
the standard implementation
of fold_left
in OCaml.
let rec fold_left f a l =
match l with
| [] -> a
| head::tail -> fold_left f (f a head) tail
Now, let's port it to Rust! The first thing we'll do is get the function signature compiling but leave the body unimplemented for now.
impl<A> List<A> {
fn fold_left<B>(&self, f: |B, &A| -> B, a: B) -> B {
unimplemented!()
}
}
Let's break down what's going on.
impl
block,
impl<A> List<A> {...}
, declares A
as a type variable and allows us to define methods for expressions of
type List<A>
.
fn fold_left<B>
declares a function
fold_left
and a second type variable B
.
(&self, f: |B, &A| -> B, a: B)
specifies that
fold_left
takes three arguments.
&self
is a reference to an expression of type
List<A>
. OCaml doesn't have any notion of
pointers or references, but in Rust we have to borrow arguments
using references (i.e. &self
as opposed to
self
) to avoid moving them. Also, note that when we
type something like list![1i, 2, 3].fold_left(f, a)
,
Rust converts it to something like fold_left(&list![1i, 2,
3], f a)
.
f
is a closure that takes to arguments of type
B
and &A
(a reference to something of
type A
) and that output is something of type
B
.
a
is our accumulator of type B
.
-> B
specifies that our function returns something of type
B
.
This type signature should look very similar to our OCaml type where
A
and B
are substitutes for 'b
and
'a
respectively. Now, let's add the body.
impl<A> List<A> {
fn fold_left<B>(&self, f: |B, &A| -> B, a: B) -> B {
match *self {
Nil => a,
Cons(ref x, ref xs) => {
let a = f(a, x);
xs.fold_left(f, a)
}
}
}
}
This code should look very similar to our OCaml code. When our list is
empty we evaluate to the accumulator. When the list is non-empty, we
compute a new accumulator using f
and then recurse.
You can play with fold_left
yourself! Just press "evaluate".
map
Now that we have fold_left
written, let's write a simpler
function, map
, which has the following type.
('a -> 'b) -> 'a list -> 'b list
map f [a1, ..., an]
evaluates to [f a1, ..., f
an]
. For example map (fun x -> x + 1) [1; 2; 3]
evaluates to [2; 3; 4]
, and map (fun x -> x * x) [1; 2;
3]
evaluates to [1; 4; 9]
.
What's neat about map
is that we can implement it using
fold_left
(and a function reved
which reverses a
list)!
impl<A> List<A> {
fn map<B>(&self, f: |&A| -> B) -> List<B> {
self.fold_left(|l, x| Cons(f(x), box l), list![]).reved()
}
}
You can play with map too!
So far, so good. We've written fold_left
and used it to
implement map
. Now, let's experiment with map
to
make sure it works how we think it should. First, let's create a list of
integers xs
. Then, we'll map the identity function |x|
x
across the list. Since map
expects a function of type
|&A| -> B
, this should create a list of int references
(List<&int>
) from xs
.
fn main() {
let xs: List<int> = list![1i, 2, 3];
let ys: List<&int> = xs.map(|x| x);
println!("xs = {}", xs);
println!("ys = {}", ys);
}
So what happens when we run this program? Well, click "evaluate" and see for yourself.
That's a beefy error message.
<anon>:69:37: 69:38 error: cannot infer an appropriate lifetime due to conflicting requirements
<anon>:69 let ys: List<&int> = xs.map(|x| x);
^
<anon>:69:37: 69:38 note: first, the lifetime cannot outlive an anonymous lifetime defined on the block at 69:36...
<anon>:69 let ys: List<&int> = xs.map(|x| x);
^
<anon>:69:37: 69:38 note: ...so that expression is assignable (expected `&int`, found `&int`)
<anon>:69 let ys: List<&int> = xs.map(|x| x);
^
<anon>:69:26: 69:39 note: but, the lifetime must be valid for the method call at 69:25...
<anon>:69 let ys: List<&int> = xs.map(|x| x);
^~~~~~~~~~~~~
<anon>:69:26: 69:39 note: ...so that the declared lifetime parameter bounds are satisfied
<anon>:69 let ys: List<&int> = xs.map(|x| x);
^~~~~~~~~~~~~
error: aborting due to previous error
playpen: application terminated with error code 101
Program ended.
So what's going on here? Well, the compiler is telling us that it can't
infer a proper lifetime. It says the lifetime of x
is limited
to the lifetime of the closure |x| x
, but we want the lifetime
of x
to be the same as the lifetime of xs
.
Wow.
That's an interesting bug that you won't find in pretty much any other programming language besides Rust and one that I admittedly didn't even know existed until I had written a fairly large chunk of the code. I'll again defer an in-depth explanation to elsewhere, but I'll show you the fix: we just have to explicitly annotate the lifetimes of our references.
impl<'a, A> List<A> {
fn fold_left<B>(&'a self, f: |B, &'a A| -> B, a: B) -> B {
match *self {
Nil => a,
Cons(ref x, ref xs) => {
let a = f(a, x);
xs.fold_left(f, a)
}
}
}
fn map<B>(&'a self, f: |&'a A| -> B) -> List<B> {
self.fold_left(|l, x| Cons(f(x), box l), list![]).reved()
}
}
After this, our code compiles and runs how we expect!
Porting OCaml's List module to Rust was not without some head scratching.
Rust is a complicated beast, and on more than one occasion I found myself
staring at the code I just wrote wondering how in the world it worked.
Frankly, I still don't know how some of the code works (e.g. some
mem::replace
wizardry I needed to use to implement
into_iter
). However, along with the head scratching came
plenty of "Aha!" moments which made the experience awesome! Rust's
concepts of ownership and borrowing were new to me and made me think about
programming in a new way. Rust is definitely awesome, and I can't wait to
keep learning more about it!