The Whitespace Trick
The Oxylisp lexer uses a simple approach: add spaces around delimiters, then split on whitespace. Why?
```rust
expression
    .replace('(', " ( ")
    .replace(')', " ) ")
    .split_whitespace()
```
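As a runnable sketch of the trick (the function name `tokenize` is my own; Oxylisp's actual API may differ):

```rust
// Whitespace-trick tokenizer: pad delimiters with spaces, then split.
fn tokenize(expression: &str) -> Vec<String> {
    expression
        .replace('(', " ( ")
        .replace(')', " ) ")
        .split_whitespace()
        .map(String::from)
        .collect()
}

fn main() {
    println!("{:?}", tokenize("(+ 1 (* 2 3))"));
    // ["(", "+", "1", "(", "*", "2", "3", ")", ")"]
}
```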
Pros:
- Dead simple to understand
- No complex state machine
- Works for 90% of Lisp code
- Easy to debug
Cons:
- Breaks with strings containing spaces
- Can’t handle edge cases like comments or escape sequences
- Not the “proper” way
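The first con is easy to see concretely. Run the same padding-and-splitting over a string literal containing a space (a sketch; not how Oxylisp itself handles strings):

```rust
// The whitespace trick has no notion of string literals, so the
// space inside "hello world" splits the literal into two tokens.
fn tokenize(expression: &str) -> Vec<String> {
    expression
        .replace('(', " ( ")
        .replace(')', " ) ")
        .split_whitespace()
        .map(String::from)
        .collect()
}

fn main() {
    println!("{:?}", tokenize(r#"(print "hello world")"#));
    // ["(", "print", "\"hello", "world\"", ")"]  -- the string is broken in two
}
```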
When Simple Isn’t Enough
You’d need a real lexer when:
- Supporting strings with spaces
- Adding line/column numbers for errors
- Optimizing for performance
- Supporting more complex syntax
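For the first of those, a character-by-character lexer with a small amount of state is enough. This is a sketch of the idea, not Oxylisp's code, and it skips escape sequences:

```rust
// A minimal stateful lexer: track whether we're inside a string
// literal so that spaces inside quotes don't split tokens.
fn lex(input: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut in_string = false;
    for c in input.chars() {
        match c {
            '"' => {
                current.push(c);
                if in_string {
                    // Closing quote: emit the whole string as one token.
                    tokens.push(current.clone());
                    current.clear();
                }
                in_string = !in_string;
            }
            '(' | ')' if !in_string => {
                if !current.is_empty() {
                    tokens.push(current.clone());
                    current.clear();
                }
                tokens.push(c.to_string());
            }
            c if c.is_whitespace() && !in_string => {
                if !current.is_empty() {
                    tokens.push(current.clone());
                    current.clear();
                }
            }
            _ => current.push(c),
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

fn main() {
    println!("{:?}", lex(r#"(print "hello world")"#));
    // ["(", "print", "\"hello world\"", ")"]  -- string stays intact
}
```

Note how much longer this is than the four-line whitespace trick, and it still doesn't track line numbers or handle escapes.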
The Trade-off
For learning interpreters, clarity beats cleverness. A 50-line lexer you understand is better than a 500-line one you don’t.
The goal isn’t to build the next Clojure. It’s to understand how interpreters work.
Error Handling Evolution
Version 1 used `panic!`:

```rust
_ => panic!("invalid token")
```
Version 2 uses `Result`:

```rust
_ => Err(anyhow!("Unrecognized token: '{}'", token))
```
This isn’t just “better practice” - it’s about learning. When your lexer fails, you want to know why, not just crash.
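A self-contained sketch of that shape, using a plain `String` error instead of `anyhow` and a made-up `Token` enum, looks like this:

```rust
#[derive(Debug, PartialEq)]
enum Token {
    LParen,
    RParen,
    Number(i64),
}

// Classify one token, reporting *which* input failed instead of panicking.
fn classify(token: &str) -> Result<Token, String> {
    match token {
        "(" => Ok(Token::LParen),
        ")" => Ok(Token::RParen),
        t if t.parse::<i64>().is_ok() => Ok(Token::Number(t.parse().unwrap())),
        _ => Err(format!("Unrecognized token: '{}'", token)),
    }
}

fn main() {
    println!("{:?}", classify("42")); // Ok(Number(42))
    println!("{:?}", classify("@")); // Err("Unrecognized token: '@'")
}
```

The caller sees the offending token in the error message, which is exactly the debugging information a `panic!` throws away.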
Regex Patterns
The patterns evolved from:

```rust
r"[A-Za-z+*=-]" // matches a single character
```

To:

```rust
r"^[A-Za-z+*/=<>!-][A-Za-z0-9+*/=<>!-]*$" // matches a full token
```

The anchors (`^` and `$`) prevent matching substrings. The extended character set handles real Lisp operators. One subtlety: `-` goes last in each character class so it reads as a literal hyphen rather than a range.
Why These Choices Matter
Every decision optimizes for learning:
- Simple over sophisticated
- Clear over clever
- Working over perfect
The best code for learning isn’t the best code for production. And that’s okay.