Skip to content

Writing a Lisp Interpreter in Rust - Part 2: Parsing

Published: at 07:00 AM

The Problem

We have tokens. We need a tree.

// We have this:
[LeftParen, Symbol("+"), Integer(1), Integer(2), RightParen]

// We need this:
CallExpression("+", [Literal(Integer(1)), Literal(Integer(2))])

Why a tree? Because evaluation is recursive. To evaluate (+ (* 2 3) 4):

  1. Evaluate the function: +
  2. Evaluate each argument: (* 2 3) and 4
  3. Apply the function to the results

But (* 2 3) is itself a function call, so we recurse. The tree structure mirrors this evaluation order - parent nodes can’t be evaluated until their children are.

Most languages require complex parsing. Lisp doesn’t. Its syntax is already a tree.

The Type System

First, we convert tokens into AST nodes. The key types:

pub enum Tokens {
    Bounds(TokenBounds),     // (, ), [, ], {, }
    Literal(Literal),        // 42, "hello", true
    Symbol(String),          // +, defn, my-var
    Key(String),            // :name, :age
}

pub enum Form {
    Root,                    // Top level container
    Literal(Literal),        // Numbers, strings, bools
    CallExpression(String),  // (function-name ...)
    Symbol(String),          // Variable references
    List,                    // [1 2 3]
    Record,                  // {:key value}
    Key(String),            // Record keys
}

The magic happens with the From trait:

impl From<Tokens> for Form {
    fn from(value: Tokens) -> Self {
        match value {
            Tokens::Bounds(_) => unreachable!(),
            Tokens::Literal(l) => Form::Literal(l),
            Tokens::Symbol(s) => Form::Symbol(s),
            Tokens::Key(k) => Form::Key(k),
        }
    }
}

Bounds tokens don’t become forms - they control the parsing flow.

The AST

Our tree nodes are represented by the Form enum:

pub enum Form {
    Root,
    Literal(Literal),
    CallExpression(String),
    Symbol(String),
    List,
    Record,
    Key(String),
}

Each variant represents a different syntactic element:

The Algorithm

We use recursive descent parsing. The algorithm:

  1. Take one token at a time
  2. If it’s an opening delimiter: Parse a nested structure
  3. If it’s a value: Add it to the current node
  4. If it’s a closing delimiter: Skip it (handled elsewhere)
  5. Recurse with remaining tokens
pub fn parse(tokens: &[Tokens], mut parent_node: Node<Form>) -> Result<Node<Form>> {
    let Some((current_token, remaining_tokens)) = tokens.split_first() else {
        return Ok(parent_node);
    };

    match current_token {
        Tokens::Bounds(TokenBounds::LeftParen) => {
            // Parse function call: (symbol ...)
            let (Tokens::Symbol(function_name), tokens_after_symbol) = remaining_tokens
                .split_first()
                .ok_or(anyhow!("Unexpected empty parens"))?
            else {
                return Err(anyhow!("Expected symbol after left paren"));
            };
            // We'll parse everything between ( and ) into a CallExpression node
            parse_delimited_form(
                Form::CallExpression(function_name.clone()),
                TokenBounds::LeftParen,
                tokens_after_symbol,
                parent_node,
            )
        }
        Tokens::Bounds(TokenBounds::LeftBracket) => {
            // Parse list: [items...]
            parse_delimited_form(Form::List, TokenBounds::LeftBracket, remaining_tokens, parent_node)
        }
        Tokens::Bounds(TokenBounds::LeftCurlyBraces) => {
            // Parse record: {:key value ...}
            parse_delimited_form(Form::Record, TokenBounds::LeftCurlyBraces, remaining_tokens, parent_node)
        }
        Tokens::Bounds(_) => parse(remaining_tokens, parent_node),
        token => {
            // Simple value: add and continue
            let form: Form = token.clone().into();
            parent_node.append(Node::new(form));
            parse(remaining_tokens, parent_node)
        }
    }
}

When we encounter an opening delimiter, we need to find everything that belongs inside it. That’s where parse_delimited_form comes in.

Parsing Nested Structures

The parse_delimited_form function handles nested structures:

fn parse_delimited_form(
    form_type: Form,
    opening_delimiter: TokenBounds,
    tokens_to_parse: &[Tokens],
    mut parent_node: Node<Form>,
) -> Result<Node<Form>> {
    let (tokens_inside_delimiters, tokens_after_closing) = 
        find_matching_delimiter(tokens_to_parse, opening_delimiter)?;
    
    // Parse the contents into a new node
    let parsed_form = parse(tokens_inside_delimiters, Node::new(form_type))?;
    parent_node.append(parsed_form);
    
    // Continue parsing after the closing delimiter
    parse(tokens_after_closing, parent_node)
}

This creates a tree structure: the parent gets a child node containing all the parsed contents.

The Power of Mutual Recursion

The magic happens through mutual recursion between parse and parse_delimited_form:

  1. parse encounters an opening delimiter → calls parse_delimited_form
  2. parse_delimited_form finds the matching closer → calls parse on the contents
  3. parse processes the contents, which may contain more delimiters → back to step 1

This mutual recursion naturally handles arbitrary nesting:

Input: (+ (* 2 3) (- 5 1))

parse sees '(' → parse_delimited_form
  → finds matching ')' after "1)"
  → calls parse on "(* 2 3) (- 5 1)"
    → parse sees '(' → parse_delimited_form  
      → finds matching ')' after "3)"
      → calls parse on "2 3"
        → parse adds 2 and 3 as children
    → parse sees '(' → parse_delimited_form
      → finds matching ')' after "1)"
      → calls parse on "5 1"
        → parse adds 5 and 1 as children

Each level of nesting creates a new branch in the tree. The recursion depth matches the nesting depth - no special cases needed.

Finding Matching Delimiters

The trickiest part is finding where a nested structure ends:

fn find_matching_delimiter(tokens: &[Tokens], opening_delimiter: TokenBounds) 
    -> Result<(&[Tokens], &[Tokens])> {
    let closing_delimiter = match opening_delimiter {
        TokenBounds::LeftParen => TokenBounds::RightParen,
        TokenBounds::LeftBracket => TokenBounds::RightBracket,
        TokenBounds::LeftCurlyBraces => TokenBounds::RightCurlyBraces,
        _ => return Err(anyhow!("Invalid opening delimiter")),
    };

    let mut nesting_level = 1;
    for (index, token) in tokens.iter().enumerate() {
        match token {
            Tokens::Bounds(delimiter) if *delimiter == opening_delimiter => {
                nesting_level += 1;
            }
            Tokens::Bounds(delimiter) if *delimiter == closing_delimiter => {
                nesting_level -= 1;
                if nesting_level == 0 {
                    // Split: tokens inside, tokens after closer
                    let (inside, including_closer) = tokens.split_at(index);
                    return Ok((inside, &including_closer[1..]));
                }
            }
            _ => {}
        }
    }
    Err(anyhow!("Unmatched {} - missing closing delimiter", 
        match opening_delimiter {
            TokenBounds::LeftParen => "parenthesis '('",
            TokenBounds::LeftBracket => "bracket '['",
            TokenBounds::LeftCurlyBraces => "brace '{'",
            _ => "delimiter"
        }
    ))
}

We track nesting level. Each opener increases it, each closer decreases it. When it hits zero, we’ve found our match.

The function returns two slices:

Complete Example

Let’s parse a function definition:

Parse function definition diagram showing transformation from tokens to AST

For a simpler example, (+ 1 2):

Simple parse example showing (+ 1 2) transformation

What We Built

Our parser:

The parser leverages Lisp’s simple syntax. No precedence rules, no ambiguity. Just nested lists.

Next: Part 3 - Evaluation