Lucas Rangel

Writing a Lisp Interpreter in Rust - Part 2: Parsing

January 17, 2024
5 min read

The Problem

We have tokens. We need a tree.

// We have this:
[LeftParen, Symbol("+"), Integer(1), Integer(2), RightParen]
// We need this:
CallExpression("+", [Literal(Integer(1)), Literal(Integer(2))])

Why a tree? Because evaluation is recursive. To evaluate (+ (* 2 3) 4):

  1. Evaluate the function: +
  2. Evaluate each argument: (* 2 3) and 4
  3. Apply the function to the results

But (* 2 3) is itself a function call, so we recurse. The tree structure mirrors this evaluation order - parent nodes can’t be evaluated until their children are.

Most languages require complex parsing. Lisp doesn’t. Its syntax is already a tree.

The Type System

First, we convert tokens into AST nodes. The key types:

pub enum Tokens {
    Bounds(TokenBounds),    // (, ), [, ], {, }
    Literal(Literal),       // 42, "hello", true
    Symbol(String),         // +, defn, my-var
    Key(String),            // :name, :age
}

pub enum Form {
    Root,                   // Top-level container
    Literal(Literal),       // Numbers, strings, bools
    CallExpression(String), // (function-name ...)
    Symbol(String),         // Variable references
    List,                   // [1 2 3]
    Record,                 // {:key value}
    Key(String),            // Record keys
}

The magic happens with the From trait:

impl From<Tokens> for Form {
    fn from(value: Tokens) -> Self {
        match value {
            Tokens::Bounds(_) => unreachable!(),
            Tokens::Literal(l) => Form::Literal(l),
            Tokens::Symbol(s) => Form::Symbol(s),
            Tokens::Key(k) => Form::Key(k),
        }
    }
}

Bounds tokens don’t become forms - they control the parsing flow.
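To see the conversion in action, here's a pared-down, self-contained sketch: the two enums are abridged to two variants each and `Literal` is flattened to an `i64`, so it compiles on its own. The shape of the `From` impl is the same as above.

```rust
// Abridged versions of the real enums, just enough to demo the From impl.
#[derive(Debug, PartialEq)]
enum Tokens {
    Integer(i64),
    Symbol(String),
}

#[derive(Debug, PartialEq)]
enum Form {
    Literal(i64),
    Symbol(String),
}

impl From<Tokens> for Form {
    fn from(value: Tokens) -> Self {
        match value {
            Tokens::Integer(i) => Form::Literal(i),
            Tokens::Symbol(s) => Form::Symbol(s),
        }
    }
}

fn main() {
    // `.into()` picks the From impl based on the annotated target type.
    let form: Form = Tokens::Symbol("+".to_string()).into();
    assert_eq!(form, Form::Symbol("+".to_string()));
}
```

Implementing `From` also gives us `.into()` for free, which is what the parser's catch-all arm uses.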

The AST

Each variant of the Form enum represents a different syntactic element:

  • CallExpression: Function calls like (+ 1 2)
  • List: Data lists like [1 2 3]
  • Record: Hash maps like {:name "Alice"}
  • Symbol: Variables and function names
  • Literal: Numbers, strings, booleans

The Algorithm

We use recursive descent parsing. The algorithm:

  1. Take one token at a time
  2. If it’s an opening delimiter: Parse a nested structure
  3. If it’s a value: Add it to the current node
  4. If it’s a closing delimiter: Skip it (handled elsewhere)
  5. Recurse with remaining tokens

pub fn parse(tokens: &[Tokens], mut parent_node: Node<Form>) -> Result<Node<Form>> {
    let Some((current_token, remaining_tokens)) = tokens.split_first() else {
        return Ok(parent_node);
    };
    match current_token {
        Tokens::Bounds(TokenBounds::LeftParen) => {
            // Parse function call: (symbol ...)
            let (Tokens::Symbol(function_name), tokens_after_symbol) = remaining_tokens
                .split_first()
                .ok_or(anyhow!("Unexpected empty parens"))?
            else {
                return Err(anyhow!("Expected symbol after left paren"));
            };
            // We'll parse everything between ( and ) into a CallExpression node
            parse_delimited_form(
                Form::CallExpression(function_name.clone()),
                TokenBounds::LeftParen,
                tokens_after_symbol,
                parent_node,
            )
        }
        Tokens::Bounds(TokenBounds::LeftBracket) => {
            // Parse list: [items...]
            parse_delimited_form(Form::List, TokenBounds::LeftBracket, remaining_tokens, parent_node)
        }
        Tokens::Bounds(TokenBounds::LeftCurlyBraces) => {
            // Parse record: {:key value ...}
            parse_delimited_form(Form::Record, TokenBounds::LeftCurlyBraces, remaining_tokens, parent_node)
        }
        Tokens::Bounds(_) => parse(remaining_tokens, parent_node),
        token => {
            // Simple value: add it and continue
            let form: Form = token.clone().into();
            parent_node.append(Node::new(form));
            parse(remaining_tokens, parent_node)
        }
    }
}

When we encounter an opening delimiter, we need to find everything that belongs inside it. That’s where parse_delimited_form comes in.

Parsing Nested Structures

The parse_delimited_form function handles nested structures:

fn parse_delimited_form(
    form_type: Form,
    opening_delimiter: TokenBounds,
    tokens_to_parse: &[Tokens],
    mut parent_node: Node<Form>,
) -> Result<Node<Form>> {
    let (tokens_inside_delimiters, tokens_after_closing) =
        find_matching_delimiter(tokens_to_parse, opening_delimiter)?;

    // Parse the contents into a new node
    let parsed_form = parse(tokens_inside_delimiters, Node::new(form_type))?;
    parent_node.append(parsed_form);

    // Continue parsing after the closing delimiter
    parse(tokens_after_closing, parent_node)
}

This creates a tree structure: the parent gets a child node containing all the parsed contents.
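The snippets above lean on a `Node<Form>` tree type whose definition isn't shown in this post. A minimal stand-in, assuming the simplest owned-children representation (the real implementation may differ), could look like:

```rust
// A minimal tree node: each node owns its value and its children.
// This is a hypothetical sketch matching the `new`/`append` calls above.
#[derive(Debug, PartialEq)]
pub struct Node<T> {
    pub value: T,
    pub children: Vec<Node<T>>,
}

impl<T> Node<T> {
    pub fn new(value: T) -> Self {
        Node { value, children: Vec::new() }
    }

    // Attach a fully-parsed subtree as the last child.
    pub fn append(&mut self, child: Node<T>) {
        self.children.push(child);
    }
}

fn main() {
    // Building the tree for (+ 1 2) by hand:
    let mut root = Node::new("Root");
    let mut call = Node::new("CallExpression(+)");
    call.append(Node::new("Literal(1)"));
    call.append(Node::new("Literal(2)"));
    root.append(call);
    assert_eq!(root.children[0].children.len(), 2);
}
```

Because `append` takes the child by value, each parsed subtree is moved into its parent exactly once, which is all the parser needs.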

The Power of Mutual Recursion

The heavy lifting is done by mutual recursion between parse and parse_delimited_form:

  1. parse encounters an opening delimiter → calls parse_delimited_form
  2. parse_delimited_form finds the matching closer → calls parse on the contents
  3. parse processes the contents, which may contain more delimiters → back to step 1

This mutual recursion naturally handles arbitrary nesting:

Input: (+ (* 2 3) (- 5 1))

parse sees '(' → parse_delimited_form
  → finds matching ')' after "1)"
  → calls parse on "(* 2 3) (- 5 1)"
    → parse sees '(' → parse_delimited_form
      → finds matching ')' after "3)"
      → calls parse on "2 3"
        → parse adds 2 and 3 as children
    → parse sees '(' → parse_delimited_form
      → finds matching ')' after "1)"
      → calls parse on "5 1"
        → parse adds 5 and 1 as children

Each level of nesting creates a new branch in the tree. The recursion depth matches the nesting depth - no special cases needed.

Finding Matching Delimiters

The trickiest part is finding where a nested structure ends:

fn find_matching_delimiter(
    tokens: &[Tokens],
    opening_delimiter: TokenBounds,
) -> Result<(&[Tokens], &[Tokens])> {
    let closing_delimiter = match opening_delimiter {
        TokenBounds::LeftParen => TokenBounds::RightParen,
        TokenBounds::LeftBracket => TokenBounds::RightBracket,
        TokenBounds::LeftCurlyBraces => TokenBounds::RightCurlyBraces,
        _ => return Err(anyhow!("Invalid opening delimiter")),
    };

    let mut nesting_level = 1;
    for (index, token) in tokens.iter().enumerate() {
        match token {
            Tokens::Bounds(delimiter) if *delimiter == opening_delimiter => {
                nesting_level += 1;
            }
            Tokens::Bounds(delimiter) if *delimiter == closing_delimiter => {
                nesting_level -= 1;
                if nesting_level == 0 {
                    // Split: tokens inside the delimiters, tokens after the closer
                    let (inside, including_closer) = tokens.split_at(index);
                    return Ok((inside, &including_closer[1..]));
                }
            }
            _ => {}
        }
    }

    Err(anyhow!(
        "Unmatched {} - missing closing delimiter",
        match opening_delimiter {
            TokenBounds::LeftParen => "parenthesis '('",
            TokenBounds::LeftBracket => "bracket '['",
            TokenBounds::LeftCurlyBraces => "brace '{'",
            _ => "delimiter",
        }
    ))
}

We track nesting level. Each opener increases it, each closer decreases it. When it hits zero, we’ve found our match.

The function returns two slices:

  • Tokens inside the delimiters (for parsing)
  • Tokens after the closing delimiter (to continue)
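Here's a self-contained sketch of that splitting behavior, with the token enum pared down to parens and integers, and Option standing in for anyhow::Result. The counting logic mirrors the function above.

```rust
// Pared-down token enum for demonstration purposes only.
#[derive(Debug, PartialEq, Clone)]
enum Tok {
    LParen,
    RParen,
    Int(i64),
}

// Returns (tokens inside the parens, tokens after the closer), or None if
// the closer is missing. `tokens` starts *after* the opening paren.
fn find_matching(tokens: &[Tok]) -> Option<(&[Tok], &[Tok])> {
    let mut nesting = 1;
    for (i, t) in tokens.iter().enumerate() {
        match t {
            Tok::LParen => nesting += 1,
            Tok::RParen => {
                nesting -= 1;
                if nesting == 0 {
                    let (inside, rest) = tokens.split_at(i);
                    return Some((inside, &rest[1..]));
                }
            }
            _ => {}
        }
    }
    None
}

fn main() {
    // Tokens following the first '(' of "(1 (2) 3) 4":  1 ( 2 ) 3 ) 4
    let toks = vec![
        Tok::Int(1), Tok::LParen, Tok::Int(2), Tok::RParen,
        Tok::Int(3), Tok::RParen, Tok::Int(4),
    ];
    let (inside, after) = find_matching(&toks).unwrap();
    assert_eq!(inside.len(), 5);          // 1 ( 2 ) 3 — inner paren pair kept
    assert_eq!(after, &[Tok::Int(4)]);    // closer itself is consumed
}
```

Note that the inner `( 2 )` pair survives intact inside the first slice; the recursive call to parse will handle it on the next pass.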

Complete Example

Let’s parse a function definition:

[Diagram: function-definition tokens transformed into an AST]

For a simpler example, (+ 1 2):

[Diagram: the (+ 1 2) token-to-tree transformation]

What We Built

Our parser:

  • Transforms flat tokens into a tree structure
  • Handles nested expressions recursively
  • Validates delimiter matching
  • Returns helpful errors for malformed input

The parser leverages Lisp’s simple syntax. No precedence rules, no ambiguity. Just nested lists.

Next: Part 3 - Evaluation