Lucas Rangel

Writing a Lisp Interpreter in Rust - Part 2: Parsing

January 17, 2024
5 min read

The Problem

We have tokens. We need a tree.

// We have this:
[LeftParen, Symbol("+"), Integer(1), Integer(2), RightParen]
// We need this:
CallExpression("+", [Literal(Integer(1)), Literal(Integer(2))])

Why a tree? Because evaluation is recursive. To evaluate (+ (* 2 3) 4):

  1. Evaluate the function: +
  2. Evaluate each argument: (* 2 3) and 4
  3. Apply the function to the results

But (* 2 3) is itself a function call, so we recurse. The tree structure mirrors this evaluation order - parent nodes can’t be evaluated until their children are.

Most languages require complex parsing. Lisp doesn’t. Its syntax is already a tree.

The Type System

First, we convert tokens into AST nodes. The key types:

pub enum Tokens {
    Bounds(TokenBounds),    // (, ), [, ], {, }
    Literal(Literal),       // 42, "hello", true
    Symbol(String),         // +, defn, my-var
    Key(String),            // :name, :age
}

pub enum Form {
    Root,                   // Top-level container
    Literal(Literal),       // Numbers, strings, bools
    CallExpression(String), // (function-name ...)
    Symbol(String),         // Variable references
    List,                   // [1 2 3]
    Record,                 // {:key value}
    Key(String),            // Record keys
}

The magic happens with the From trait:

impl From<Tokens> for Form {
    fn from(value: Tokens) -> Self {
        match value {
            Tokens::Bounds(_) => unreachable!(),
            Tokens::Literal(l) => Form::Literal(l),
            Tokens::Symbol(s) => Form::Symbol(s),
            Tokens::Key(k) => Form::Key(k),
        }
    }
}

Bounds tokens don’t become forms - they control the parsing flow.
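To see the conversion in action, here's a pared-down, self-contained sketch: the two enums are abridged to two variants each and `Literal` is flattened to an `i64`, so it compiles on its own. The shape of the `From` impl is the same as above.

```rust
// Abridged versions of the real enums, just enough to demo the From impl.
#[derive(Debug, PartialEq)]
enum Tokens {
    Integer(i64),
    Symbol(String),
}

#[derive(Debug, PartialEq)]
enum Form {
    Literal(i64),
    Symbol(String),
}

impl From<Tokens> for Form {
    fn from(value: Tokens) -> Self {
        match value {
            Tokens::Integer(i) => Form::Literal(i),
            Tokens::Symbol(s) => Form::Symbol(s),
        }
    }
}

fn main() {
    // `.into()` picks the From impl based on the annotated target type.
    let form: Form = Tokens::Symbol("+".to_string()).into();
    assert_eq!(form, Form::Symbol("+".to_string()));
}
```

Implementing `From` also gives us `.into()` for free, which is what the parser's catch-all arm uses.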

The AST

Each variant of the Form enum represents a different syntactic element:

  • CallExpression: Function calls like (+ 1 2)
  • List: Data lists like [1 2 3]
  • Record: Hash maps like {:name "Alice"}
  • Symbol: Variables and function names
  • Literal: Numbers, strings, booleans

The Algorithm

We use recursive descent parsing. The algorithm:

  1. Take one token at a time
  2. If it’s an opening delimiter: Parse a nested structure
  3. If it’s a value: Add it to the current node
  4. If it’s a closing delimiter: Skip it (handled elsewhere)
  5. Recurse with remaining tokens

pub fn parse(tokens: &[Tokens], mut parent_node: Node<Form>) -> Result<Node<Form>> {
    let Some((current_token, remaining_tokens)) = tokens.split_first() else {
        return Ok(parent_node);
    };
    match current_token {
        Tokens::Bounds(TokenBounds::LeftParen) => {
            // Parse function call: (symbol ...)
            let (Tokens::Symbol(function_name), tokens_after_symbol) = remaining_tokens
                .split_first()
                .ok_or(anyhow!("Unexpected empty parens"))?
            else {
                return Err(anyhow!("Expected symbol after left paren"));
            };
            // We'll parse everything between ( and ) into a CallExpression node
            parse_delimited_form(
                Form::CallExpression(function_name.clone()),
                TokenBounds::LeftParen,
                tokens_after_symbol,
                parent_node,
            )
        }
        Tokens::Bounds(TokenBounds::LeftBracket) => {
            // Parse list: [items...]
            parse_delimited_form(Form::List, TokenBounds::LeftBracket, remaining_tokens, parent_node)
        }
        Tokens::Bounds(TokenBounds::LeftCurlyBraces) => {
            // Parse record: {:key value ...}
            parse_delimited_form(Form::Record, TokenBounds::LeftCurlyBraces, remaining_tokens, parent_node)
        }
        Tokens::Bounds(_) => parse(remaining_tokens, parent_node),
        token => {
            // Simple value: add it and continue
            let form: Form = token.clone().into();
            parent_node.append(Node::new(form));
            parse(remaining_tokens, parent_node)
        }
    }
}

When we encounter an opening delimiter, we need to find everything that belongs inside it. That’s where parse_delimited_form comes in.

Parsing Nested Structures

The parse_delimited_form function handles nested structures:

fn parse_delimited_form(
    form_type: Form,
    opening_delimiter: TokenBounds,
    tokens_to_parse: &[Tokens],
    mut parent_node: Node<Form>,
) -> Result<Node<Form>> {
    let (tokens_inside_delimiters, tokens_after_closing) =
        find_matching_delimiter(tokens_to_parse, opening_delimiter)?;

    // Parse the contents into a new node
    let parsed_form = parse(tokens_inside_delimiters, Node::new(form_type))?;
    parent_node.append(parsed_form);

    // Continue parsing after the closing delimiter
    parse(tokens_after_closing, parent_node)
}

This creates a tree structure: the parent gets a child node containing all the parsed contents.
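The snippets above lean on a `Node<Form>` tree type whose definition isn't shown in this post. A minimal stand-in, assuming the simplest owned-children representation (the real implementation may differ), could look like:

```rust
// A minimal tree node: each node owns its value and its children.
// This is a hypothetical sketch matching the `new`/`append` calls above.
#[derive(Debug, PartialEq)]
pub struct Node<T> {
    pub value: T,
    pub children: Vec<Node<T>>,
}

impl<T> Node<T> {
    pub fn new(value: T) -> Self {
        Node { value, children: Vec::new() }
    }

    // Attach a fully-parsed subtree as the last child.
    pub fn append(&mut self, child: Node<T>) {
        self.children.push(child);
    }
}

fn main() {
    // Building the tree for (+ 1 2) by hand:
    let mut root = Node::new("Root");
    let mut call = Node::new("CallExpression(+)");
    call.append(Node::new("Literal(1)"));
    call.append(Node::new("Literal(2)"));
    root.append(call);
    assert_eq!(root.children[0].children.len(), 2);
}
```

Because `append` takes the child by value, each parsed subtree is moved into its parent exactly once, which is all the parser needs.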

The Power of Mutual Recursion

The heavy lifting is done by mutual recursion between parse and parse_delimited_form:

  1. parse encounters an opening delimiter → calls parse_delimited_form
  2. parse_delimited_form finds the matching closer → calls parse on the contents
  3. parse processes the contents, which may contain more delimiters → back to step 1

This mutual recursion naturally handles arbitrary nesting:

Input: (+ (* 2 3) (- 5 1))

parse sees '(' → parse_delimited_form
  → finds matching ')' after "1)"
  → calls parse on "(* 2 3) (- 5 1)"
    → parse sees '(' → parse_delimited_form
      → finds matching ')' after "3)"
      → calls parse on "2 3"
        → parse adds 2 and 3 as children
    → parse sees '(' → parse_delimited_form
      → finds matching ')' after "1)"
      → calls parse on "5 1"
        → parse adds 5 and 1 as children

Each level of nesting creates a new branch in the tree. The recursion depth matches the nesting depth - no special cases needed.

Finding Matching Delimiters

The trickiest part is finding where a nested structure ends:

fn find_matching_delimiter(
    tokens: &[Tokens],
    opening_delimiter: TokenBounds,
) -> Result<(&[Tokens], &[Tokens])> {
    let closing_delimiter = match opening_delimiter {
        TokenBounds::LeftParen => TokenBounds::RightParen,
        TokenBounds::LeftBracket => TokenBounds::RightBracket,
        TokenBounds::LeftCurlyBraces => TokenBounds::RightCurlyBraces,
        _ => return Err(anyhow!("Invalid opening delimiter")),
    };

    let mut nesting_level = 1;
    for (index, token) in tokens.iter().enumerate() {
        match token {
            Tokens::Bounds(delimiter) if *delimiter == opening_delimiter => {
                nesting_level += 1;
            }
            Tokens::Bounds(delimiter) if *delimiter == closing_delimiter => {
                nesting_level -= 1;
                if nesting_level == 0 {
                    // Split: tokens inside the delimiters, tokens after the closer
                    let (inside, including_closer) = tokens.split_at(index);
                    return Ok((inside, &including_closer[1..]));
                }
            }
            _ => {}
        }
    }

    Err(anyhow!(
        "Unmatched {} - missing closing delimiter",
        match opening_delimiter {
            TokenBounds::LeftParen => "parenthesis '('",
            TokenBounds::LeftBracket => "bracket '['",
            TokenBounds::LeftCurlyBraces => "brace '{'",
            _ => "delimiter",
        }
    ))
}

We track nesting level. Each opener increases it, each closer decreases it. When it hits zero, we’ve found our match.

The function returns two slices:

  • Tokens inside the delimiters (for parsing)
  • Tokens after the closing delimiter (to continue)
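Here's a self-contained sketch of that splitting behavior, with the token enum pared down to parens and integers, and Option standing in for anyhow::Result. The counting logic mirrors the function above.

```rust
// Pared-down token enum for demonstration purposes only.
#[derive(Debug, PartialEq, Clone)]
enum Tok {
    LParen,
    RParen,
    Int(i64),
}

// Returns (tokens inside the parens, tokens after the closer), or None if
// the closer is missing. `tokens` starts *after* the opening paren.
fn find_matching(tokens: &[Tok]) -> Option<(&[Tok], &[Tok])> {
    let mut nesting = 1;
    for (i, t) in tokens.iter().enumerate() {
        match t {
            Tok::LParen => nesting += 1,
            Tok::RParen => {
                nesting -= 1;
                if nesting == 0 {
                    let (inside, rest) = tokens.split_at(i);
                    return Some((inside, &rest[1..]));
                }
            }
            _ => {}
        }
    }
    None
}

fn main() {
    // Tokens following the first '(' of "(1 (2) 3) 4":  1 ( 2 ) 3 ) 4
    let toks = vec![
        Tok::Int(1), Tok::LParen, Tok::Int(2), Tok::RParen,
        Tok::Int(3), Tok::RParen, Tok::Int(4),
    ];
    let (inside, after) = find_matching(&toks).unwrap();
    assert_eq!(inside.len(), 5);          // 1 ( 2 ) 3 — inner paren pair kept
    assert_eq!(after, &[Tok::Int(4)]);    // closer itself is consumed
}
```

Note that the inner `( 2 )` pair survives intact inside the first slice; the recursive call to parse will handle it on the next pass.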

Complete Example

Let’s parse a function definition:

[Diagram: function-definition tokens transformed into an AST]

For a simpler example, (+ 1 2):

[Diagram: the (+ 1 2) token-to-tree transformation]

What We Built

Our parser:

  • Transforms flat tokens into a tree structure
  • Handles nested expressions recursively
  • Validates delimiter matching
  • Returns helpful errors for malformed input

The parser leverages Lisp’s simple syntax. No precedence rules, no ambiguity. Just nested lists.

Next: Part 3 - Evaluation