This page describes how rustc transforms raw source text into an Abstract Syntax Tree (AST). It covers the lexer/tokenizer, the recursive-descent parser in the rustc_parse crate, and the AST node types defined in rustc_ast. It ends where macro expansion and name resolution begin — for those topics, see Name Resolution and Macro Expansion. For how the resulting AST is converted to HIR, see AST to HIR Lowering.
| Crate | Role |
|---|---|
rustc_ast | Defines all AST node types, token types, visitor traits |
rustc_parse | Implements the lexer and recursive-descent parser |
rustc_ast_passes | Post-expansion AST validation (AstValidator) |
rustc_ast_pretty | Pretty-printing AST nodes back to source text |
rustc_span | Span, SourceMap, Ident, Symbol types used throughout |
rustc_ast Crate: AST Data Model
All AST node structs and enums are defined in compiler/rustc_ast/src/ast.rs. The central design pattern is a pair: a wrapper struct holding a NodeId, a Span, and an optional token stream, paired with a Kind enum that carries the variant-specific data.
[Diagram: top-level AST structure]
Sources: compiler/rustc_ast/src/ast.rs:549-558, compiler/rustc_ast/src/ast.rs:610-619
| Struct | Kind Enum | Purpose |
|---|---|---|
Crate | — | Top-level crate containing items and inner attributes |
Item | ItemKind | Functions, structs, enums, traits, impls, use, extern, etc. |
Expr | ExprKind | Expressions: calls, binops, closures, blocks, if/match/loop |
Stmt | StmtKind | Statements: let, expression statements, macros |
Pat | PatKind | Patterns: identifiers, structs, tuples, wildcards, ranges |
Ty | TyKind | Types: paths, references, tuples, slices, fn pointers |
Block | — | A { ... } block containing a Vec<Stmt> |
Path | — | A sequence of PathSegments separated by :: |
GenericParam | GenericParamKind | Lifetime, type, or const generic parameters |
Generics | — | A collection of GenericParams and a WhereClause |
WherePredicate | WherePredicateKind | Bounds in a where-clause |
Sources: compiler/rustc_ast/src/ast.rs:91-98, compiler/rustc_ast/src/ast.rs:443-453, compiler/rustc_ast/src/ast.rs:469-486, compiler/rustc_ast/src/ast.rs:624-630
Most AST nodes carry a NodeId (a u32 newtype index defined in rustc_ast). The parser initially sets it to DUMMY_NODE_ID, and real IDs are assigned later, during macro expansion and before lowering. Spans are attached to nodes at parse time and preserved through expansion.
Nodes that need to support attribute macros (items, expressions, statements) carry an Option<LazyAttrTokenStream> field named tokens. This lazily stores the original token stream for the node so proc-macro attribute handlers can inspect or replace it.
Sources: compiler/rustc_ast/src/ast.rs:611-619, compiler/rustc_ast/src/ast.rs:626-630
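The wrapper-plus-Kind pattern and the two-phase NodeId assignment can be illustrated with a heavily simplified model. This is a sketch, not rustc's code. The Expr, ExprKind, Span, and assign_ids items here are hypothetical stand-ins. The token stream is modeled as a plain Option<Vec<String>>, and a simple counter replaces whatever allocator the compiler really uses.

```rust
/// Simplified stand-in for rustc's `NodeId` (a u32 newtype).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NodeId(pub u32);

/// Placeholder given to every node at parse time (mirrors the idea of `DUMMY_NODE_ID`).
pub const DUMMY_NODE_ID: NodeId = NodeId(u32::MAX);

/// Hypothetical simplification of `Span`: a byte range in the source file.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub lo: u32,
    pub hi: u32,
}

/// Wrapper struct: metadata shared by every expression.
#[derive(Debug)]
pub struct Expr {
    pub id: NodeId,
    pub kind: ExprKind,
    pub span: Span,
    /// Stand-in for `Option<LazyAttrTokenStream>`.
    pub tokens: Option<Vec<String>>,
}

/// Kind enum: the variant-specific payload.
#[derive(Debug)]
pub enum ExprKind {
    Lit(i64),
    Binary(char, Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Parser-side constructor: every node starts with the dummy ID.
    pub fn new(kind: ExprKind, span: Span) -> Expr {
        Expr { id: DUMMY_NODE_ID, kind, span, tokens: None }
    }
}

/// Later pass (hypothetical): replace dummy IDs with unique, increasing ones in pre-order.
pub fn assign_ids(expr: &mut Expr, next: &mut u32) {
    expr.id = NodeId(*next);
    *next += 1;
    if let ExprKind::Binary(_, lhs, rhs) = &mut expr.kind {
        assign_ids(lhs, next);
        assign_ids(rhs, next);
    }
}
```

Keeping the shared fields in one wrapper means a pass like assign_ids touches id without needing to know every ExprKind variant's layout.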
rustc_parse Crate
[Diagram: parser sub-modules mapped to their parsing responsibilities]
Sources: compiler/rustc_parse/src/parser/mod.rs:1-18
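ParseSess
The ParseSess struct (from rustc_session) is the shared context for a parse job. It holds:
- the SourceMap
- a DiagCtxt for emitting errors
- the spans of feature-gated syntax (gated_spans), which are checked later against the enabled features
- the spans of block expressions whose parse may have been ambiguous (ambiguous_block_expr_parse), used for diagnostics
The Parser holds a &'a ParseSess as its psess field.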
Parser Struct
Defined in compiler/rustc_parse/src/parser/mod.rs:183-240.
| Field | Type | Description |
|---|---|---|
psess | &'a ParseSess | Parse session with source map and diagnostics |
token | Token | Current lookahead token |
prev_token | Token | Previously consumed token |
token_cursor | TokenCursor | Cursor into the TokenStream |
restrictions | Restrictions | Active parsing restrictions (bitflags) |
expected_token_types | TokenTypeSet | For error message generation |
recovery | Recovery | Whether error recovery is allowed |
capture_state | CaptureState | Token collection state for attribute streams |
num_bump_calls | u32 | Count of bump() calls for position tracking |
The parser is cloned cheaply for speculative parsing and diagnostic snapshots via create_snapshot_for_diagnostic.
Sources: compiler/rustc_parse/src/parser/mod.rs:183-240
Restrictions Bitflags
The Restrictions type (a bitflags! struct) modifies parser behavior at specific grammar positions:
| Flag | Meaning |
|---|---|
STMT_EXPR | Stop expression parsing at statement-terminating tokens |
NO_STRUCT_LITERAL | Disallow Foo { } struct literal syntax (e.g., in if conditions) |
CONST_EXPR | Limit to const-generic-legal expressions; > terminates parsing |
ALLOW_LET | Allow let pattern = expr expressions (in if/while chains) |
IN_IF_GUARD | Better error when => is missing in match guard |
IS_PAT | During error recovery, parse a pattern as an expression; tokens such as = and \| terminate parsing |
Sources: compiler/rustc_parse/src/parser/mod.rs:66-124
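In rustc, Restrictions is generated by bitflags!. The sketch below models the same idea without dependencies. The u8 newtype, the flag values, and the helper names with_res and struct_literal_allowed are illustrative assumptions. A caller sets a flag for the duration of a sub-parse (for example an if condition), and the callee consults it.

```rust
/// Hypothetical, dependency-free stand-in for the `bitflags!`-generated `Restrictions`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Restrictions(u8);

impl Restrictions {
    pub const EMPTY: Restrictions = Restrictions(0);
    pub const STMT_EXPR: Restrictions = Restrictions(1 << 0);
    pub const NO_STRUCT_LITERAL: Restrictions = Restrictions(1 << 1);
    pub const ALLOW_LET: Restrictions = Restrictions(1 << 3);

    /// True if every bit of `other` is set in `self`.
    pub fn contains(self, other: Restrictions) -> bool {
        self.0 & other.0 == other.0
    }

    pub fn union(self, other: Restrictions) -> Restrictions {
        Restrictions(self.0 | other.0)
    }
}

pub struct Parser {
    pub restrictions: Restrictions,
}

impl Parser {
    /// Run `f` with `r` in effect, then restore the previous restrictions.
    pub fn with_res<T>(&mut self, r: Restrictions, f: impl FnOnce(&mut Parser) -> T) -> T {
        let old = self.restrictions;
        self.restrictions = r;
        let out = f(self);
        self.restrictions = old;
        out
    }

    /// Would `Foo { .. }` be accepted as a struct literal at this position?
    pub fn struct_literal_allowed(&self) -> bool {
        !self.restrictions.contains(Restrictions::NO_STRUCT_LITERAL)
    }
}
```

Saving and restoring around the closure keeps a restriction scoped to one sub-grammar. An if condition forbids struct literals, but a block nested inside it can lift the restriction again.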
Each token is a Token struct pairing a TokenKind (an enum) with a Span. Key TokenKind variants:
| Variant | Example |
|---|---|
Ident(Symbol, IdentIsRaw) | foo, r#type |
Literal(Lit) | 42, "hello", 3.14 |
Lifetime(Symbol) | 'a |
OpenBrace / CloseBrace | { / } |
OpenParen / CloseParen | ( / ) |
Lt / Gt | < / > |
PathSep | :: |
Eq / EqEq | = / == |
Plus, Minus, Star, Slash | Arithmetic operators |
DotDot / DotDotEq | .. / ..= |
Eof | End of file |
Sources: compiler/rustc_ast/src/token.rs:1-50
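Here is a toy lexer for a tiny subset of Rust that makes the Token/TokenKind split concrete. It is a sketch only. In rustc, the low-level rustc_lexer crate runs first and its output is then converted into TokenKind values. The Span, String-for-Symbol, and lex definitions below are simplified assumptions.

```rust
/// Byte range of a token in the source (simplified `Span`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub lo: usize,
    pub hi: usize,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum TokenKind {
    Ident(String),   // rustc stores an interned `Symbol` instead
    Literal(String), // only integer literals in this sketch
    PathSep,         // ::
    Eq,              // =
    EqEq,            // ==
    Plus,            // +
    Eof,
}

/// A token is a kind plus the span it came from.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Span,
}

pub fn lex(src: &str) -> Result<Vec<Token>, String> {
    let bytes = src.as_bytes();
    let mut toks = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let c = bytes[i];
        let start = i;
        let kind = if c.is_ascii_whitespace() {
            i += 1;
            continue; // whitespace produces no token
        } else if c.is_ascii_alphabetic() || c == b'_' {
            while i < bytes.len() && (bytes[i].is_ascii_alphanumeric() || bytes[i] == b'_') {
                i += 1;
            }
            TokenKind::Ident(src[start..i].to_string())
        } else if c.is_ascii_digit() {
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            TokenKind::Literal(src[start..i].to_string())
        } else if src[i..].starts_with("::") {
            i += 2;
            TokenKind::PathSep
        } else if src[i..].starts_with("==") {
            i += 2; // longest match wins: `==` before `=`
            TokenKind::EqEq
        } else if c == b'=' {
            i += 1;
            TokenKind::Eq
        } else if c == b'+' {
            i += 1;
            TokenKind::Plus
        } else {
            return Err(format!("unexpected byte {:?} at {}", c as char, i));
        };
        toks.push(Token { kind, span: Span { lo: start, hi: i } });
    }
    toks.push(Token { kind: TokenKind::Eof, span: Span { lo: i, hi: i } });
    Ok(toks)
}
```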
The main entry point for parsing a complete source file is parse_crate_mod, defined in compiler/rustc_parse/src/parser/item.rs:31-34. Its call structure:
Parser::parse_crate_mod()
└─ parse_mod(Eof)
├─ parse_inner_attributes()
└─ loop: parse_item() → Box<Item>
parse_mod drives a loop that consumes items until the terminating token (EOF for a file, } for an inline module). The result is an ast::Crate.
Sources: compiler/rustc_parse/src/parser/item.rs:31-115
parse_item calls parse_item_kind, which dispatches on the leading keyword or token (for example fn, struct, enum, trait, impl, use, mod, or extern).
Each item parser reads its leading keyword(s), the item name (usually an Ident), optional Generics, a signature or body, and wraps everything into Item { attrs, id, kind, vis, span, tokens }.
Sources: compiler/rustc_parse/src/parser/item.rs:232-360
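The sketch below is a toy version of the parse_mod loop and the keyword dispatch in parse_item_kind. The whitespace-split token input, the ItemKind subset, and all function names are simplified assumptions. Real item parsers also handle attributes, visibility, generics, and bodies, which this sketch skips by treating fn and struct items as `keyword name ;`.

```rust
#[derive(Debug, PartialEq)]
pub enum ItemKind {
    Fn(String),
    Struct(String),
    Mod(String, Vec<Item>), // inline module with its own items
}

#[derive(Debug, PartialEq)]
pub struct Item {
    pub kind: ItemKind,
}

pub struct Parser<'a> {
    toks: Vec<&'a str>,
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(src: &'a str) -> Self {
        Parser { toks: src.split_whitespace().collect(), pos: 0 }
    }

    fn peek(&self) -> &'a str {
        self.toks.get(self.pos).copied().unwrap_or("<eof>")
    }

    fn bump(&mut self) -> &'a str {
        let t = self.peek();
        self.pos += 1;
        t
    }

    fn expect(&mut self, t: &str) -> Result<(), String> {
        if self.peek() == t {
            self.bump();
            Ok(())
        } else {
            Err(format!("expected `{}`, found `{}`", t, self.peek()))
        }
    }

    /// Consume items until `term`: "<eof>" for a file, "}" for an inline module.
    pub fn parse_mod(&mut self, term: &str) -> Result<Vec<Item>, String> {
        let mut items = Vec::new();
        while self.peek() != term {
            items.push(self.parse_item()?);
        }
        Ok(items)
    }

    /// Dispatch on the leading keyword, in the spirit of `parse_item_kind`.
    fn parse_item(&mut self) -> Result<Item, String> {
        let kind = match self.bump() {
            "fn" => {
                let name = self.bump().to_string();
                self.expect(";")?;
                ItemKind::Fn(name)
            }
            "struct" => {
                let name = self.bump().to_string();
                self.expect(";")?;
                ItemKind::Struct(name)
            }
            "mod" => {
                let name = self.bump().to_string();
                self.expect("{")?;
                let inner = self.parse_mod("}")?; // same loop, different terminator
                self.expect("}")?;
                ItemKind::Mod(name, inner)
            }
            other => return Err(format!("expected item, found `{}`", other)),
        };
        Ok(Item { kind })
    }
}
```

Reusing one loop with a terminator parameter is what lets files and inline modules share the same code path.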
Expression parsing uses a Pratt (top-down operator precedence) algorithm implemented in compiler/rustc_parse/src/parser/expr.rs.
[Diagram: expression parsing call chain]
The parse_expr_assoc_with function takes a min_prec: Bound<ExprPrecedence> and handles left/right associativity. The AssocOp enum (in rustc_ast::util::parser) lists all infix operators with their precedences and fixity.
Prefix expression dispatch:
| Token | Parsed as |
|---|---|
! | ExprKind::Unary(UnOp::Not, ...) |
- | ExprKind::Unary(UnOp::Neg, ...) |
* | ExprKind::Unary(UnOp::Deref, ...) |
& / && | ExprKind::AddrOf(...) |
.. / ..= | ExprKind::Range(...) (prefix) |
Postfix operations (.field, .method(), [idx], ?, .await) are parsed in parse_expr_dot_or_call_with.
Sources: compiler/rustc_parse/src/parser/expr.rs:54-573
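The precedence-climbing core can be sketched on a toy grammar with four kinds of syntax: integers, binary + - * /, prefix - and !, and postfix ?. This miniature is not rustc's parse_expr_assoc_with. Its method name parse_expr_assoc is hypothetical, and it uses a plain u8 minimum precedence instead of Bound<ExprPrecedence>. It supports only left-associative operators and renders the tree as a parenthesized string so the grouping is visible.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok {
    Num(i64),
    Op(char),
    Eof,
}

fn tokenize(s: &str) -> Vec<Tok> {
    let cs: Vec<char> = s.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < cs.len() {
        if cs[i].is_whitespace() {
            i += 1;
        } else if cs[i].is_ascii_digit() {
            let mut n = 0i64;
            while i < cs.len() && cs[i].is_ascii_digit() {
                n = n * 10 + cs[i].to_digit(10).unwrap() as i64;
                i += 1;
            }
            out.push(Tok::Num(n));
        } else {
            out.push(Tok::Op(cs[i]));
            i += 1;
        }
    }
    out.push(Tok::Eof);
    out
}

/// Infix precedence (higher binds tighter); `None` means "not an infix operator".
fn infix_prec(op: char) -> Option<u8> {
    match op {
        '+' | '-' => Some(1),
        '*' | '/' => Some(2),
        _ => None,
    }
}

pub struct Parser {
    toks: Vec<Tok>,
    pos: usize,
}

impl Parser {
    pub fn new(src: &str) -> Self {
        Parser { toks: tokenize(src), pos: 0 }
    }

    fn peek(&self) -> Tok {
        self.toks[self.pos]
    }

    fn bump(&mut self) -> Tok {
        let t = self.peek();
        self.pos += 1;
        t
    }

    /// Parse a prefix expression, then absorb infix operators with precedence >= `min_prec`.
    pub fn parse_expr_assoc(&mut self, min_prec: u8) -> String {
        let mut lhs = self.parse_prefix();
        while let Tok::Op(op) = self.peek() {
            let prec = match infix_prec(op) {
                Some(p) if p >= min_prec => p,
                _ => break,
            };
            self.bump();
            // Left-associative: the right operand must bind strictly tighter.
            let rhs = self.parse_expr_assoc(prec + 1);
            lhs = format!("({} {} {})", lhs, op, rhs);
        }
        lhs
    }

    /// Prefix `-` and `!` apply to everything a postfix chain produces.
    fn parse_prefix(&mut self) -> String {
        match self.peek() {
            Tok::Op(op @ ('-' | '!')) => {
                self.bump();
                format!("({}{})", op, self.parse_prefix())
            }
            _ => self.parse_postfix(),
        }
    }

    /// A primary expression followed by any number of postfix `?`.
    fn parse_postfix(&mut self) -> String {
        let mut e = match self.bump() {
            Tok::Num(n) => n.to_string(),
            Tok::Op('(') => {
                let inner = self.parse_expr_assoc(0);
                assert_eq!(self.bump(), Tok::Op(')'), "expected `)`");
                inner
            }
            t => panic!("unexpected token {:?}", t),
        };
        while self.peek() == Tok::Op('?') {
            self.bump();
            e = format!("{}?", e);
        }
        e
    }
}
```

The recursive call with prec + 1 is what makes 1 - 2 - 3 group to the left. Passing prec instead would make the operators right-associative.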
parse_stmt in compiler/rustc_parse/src/parser/stmt.rs distinguishes:
- StmtKind::Let: the let keyword, then a pattern, an optional type annotation, an optional initializer, and an optional else block (let-else).
- StmtKind::Item: an item embedded in a block (e.g., a fn or struct inside a function).
- StmtKind::Expr / StmtKind::Semi: an expression without or with a trailing ;.
- StmtKind::MacCall: a macro invocation used as a statement.

Patterns are parsed in compiler/rustc_parse/src/parser/pat.rs. Key PatKind variants and their syntax:
| PatKind | Syntax example |
|---|---|
Ident(BindingMode, Ident, Option<Pat>) | x, mut x, ref x, x @ pat |
Struct(Path, Vec<PatField>, Rest) | Foo { x, y } |
TupleStruct(Path, Vec<Pat>) | Some(x) |
Tuple(Vec<Pat>) | (a, b, c) |
Wild | _ |
Lit(Expr) | 42, "hello" |
Range(Expr, Expr, RangeEnd) | 0..=9 |
Slice(Vec<Pat>) | [a, b, c] |
Or(Vec<Pat>) | A \| B |
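The following toy recursive-descent parser handles a few PatKind shapes, including or-patterns at the top level and inside tuples. The PatKind subset, the PatParser type, and its method names are hypothetical simplifications. A full pattern parser also has to deal with binding modes, paths, literals, ranges, and error recovery.

```rust
#[derive(Debug, PartialEq)]
pub enum PatKind {
    Wild,                              // _
    Ident(String),                     // x
    Tuple(Vec<PatKind>),               // (a, b)
    TupleStruct(String, Vec<PatKind>), // Some(x)
    Or(Vec<PatKind>),                  // A | B
}

pub struct PatParser {
    toks: Vec<String>,
    pos: usize,
}

impl PatParser {
    /// Split into identifier words and single-character punctuation.
    pub fn new(src: &str) -> Self {
        let mut toks = Vec::new();
        let mut cur = String::new();
        for c in src.chars() {
            if c.is_alphanumeric() || c == '_' {
                cur.push(c);
            } else {
                if !cur.is_empty() {
                    toks.push(std::mem::take(&mut cur));
                }
                if !c.is_whitespace() {
                    toks.push(c.to_string());
                }
            }
        }
        if !cur.is_empty() {
            toks.push(cur);
        }
        PatParser { toks, pos: 0 }
    }

    fn peek(&self) -> Option<&str> {
        self.toks.get(self.pos).map(|s| s.as_str())
    }

    fn eat(&mut self, t: &str) -> bool {
        if self.peek() == Some(t) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    /// One or more alternatives separated by `|`.
    pub fn parse_pat(&mut self) -> Result<PatKind, String> {
        let mut alts = vec![self.parse_single()?];
        while self.eat("|") {
            alts.push(self.parse_single()?);
        }
        Ok(if alts.len() == 1 { alts.pop().unwrap() } else { PatKind::Or(alts) })
    }

    /// A single pattern with no top-level `|`.
    fn parse_single(&mut self) -> Result<PatKind, String> {
        let tok = self.peek().ok_or("unexpected end of input")?.to_string();
        self.pos += 1;
        if tok == "_" {
            return Ok(PatKind::Wild);
        }
        if tok == "(" {
            return Ok(PatKind::Tuple(self.parse_seq()?));
        }
        if tok.chars().all(|c| c.is_alphanumeric() || c == '_') {
            if self.eat("(") {
                return Ok(PatKind::TupleStruct(tok, self.parse_seq()?));
            }
            return Ok(PatKind::Ident(tok));
        }
        Err(format!("unexpected token `{}`", tok))
    }

    /// Comma-separated patterns after `(`, up to and including `)`.
    fn parse_seq(&mut self) -> Result<Vec<PatKind>, String> {
        let mut elems = Vec::new();
        while !self.eat(")") {
            elems.push(self.parse_pat()?); // nested or-patterns are allowed here
            if !self.eat(",") {
                if self.eat(")") {
                    break;
                }
                return Err("expected `,` or `)`".to_string());
            }
        }
        Ok(elems)
    }
}
```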
Type parsing is in compiler/rustc_parse/src/parser/ty.rs. Key TyKind variants:
| TyKind | Syntax |
|---|---|
Path(Option<QSelf>, Path) | Vec<T>, <T as Trait>::Assoc |
Ref(Option<Lifetime>, MutTy) | &'a T, &mut T |
Ptr(MutTy) | *const T, *mut T |
Tup(Vec<Ty>) | (A, B, C) |
Array(Ty, AnonConst) | [T; N] |
Slice(Ty) | [T] |
ImplTrait(NodeId, Bounds) | impl Trait |
TraitObject(Bounds, TraitObjectSyntax) | dyn Trait + 'a |
FnPtr(FnPtrTy) | fn(A) -> B |
Never | ! |
Infer | _ |
Attributes (#[...] and #![...]) are parsed in compiler/rustc_parse/src/parser/attr.rs. Outer attributes are collected before each item or expression via parse_outer_attributes(), which returns an AttrWrapper. Inner attributes (#![...]) are parsed inside module bodies and function bodies via parse_inner_attributes().
Parsing and token collection are interleaved. When the parser encounters a node that may be annotated with an attribute macro, it calls collect_tokens (in compiler/rustc_parse/src/parser/attr_wrapper.rs). This records the start and end positions in the TokenStream and produces a LazyAttrTokenStream stored in the node's tokens field. The stream is materialized on demand when a proc-macro attribute handler inspects it.
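The capture-then-materialize idea can be modeled with a start position, an end position, and a copy that runs only on demand. LazyTokens and collect_tokens below are hypothetical simplifications of LazyAttrTokenStream and collect_tokens. The sketch assumes a flat, shared Vec<String> token buffer and uses the standard library's std::cell::OnceCell for laziness. It ignores the replacement ranges that the real CaptureState also tracks.

```rust
use std::cell::{Cell, OnceCell};
use std::rc::Rc;

/// Records a token range now; builds the owned token list only when asked.
pub struct LazyTokens {
    source: Rc<Vec<String>>,
    start: usize,
    end: usize,
    materialized: OnceCell<Vec<String>>,
    /// How many times the range was actually copied (for illustration only).
    pub builds: Cell<u32>,
}

impl LazyTokens {
    pub fn to_tokens(&self) -> &[String] {
        self.materialized.get_or_init(|| {
            self.builds.set(self.builds.get() + 1);
            self.source[self.start..self.end].to_vec()
        })
    }
}

pub struct Parser {
    toks: Rc<Vec<String>>,
    /// Plays the role of `num_bump_calls`: a monotonically increasing position.
    pos: usize,
}

impl Parser {
    pub fn new(src: &str) -> Self {
        Parser { toks: Rc::new(src.split_whitespace().map(String::from).collect()), pos: 0 }
    }

    pub fn bump(&mut self) {
        self.pos += 1;
    }

    /// Run the sub-parser `f`, remembering which token range it consumed.
    pub fn collect_tokens<T>(&mut self, f: impl FnOnce(&mut Parser) -> T) -> (T, LazyTokens) {
        let start = self.pos;
        let node = f(self);
        let lazy = LazyTokens {
            source: Rc::clone(&self.toks),
            start,
            end: self.pos,
            materialized: OnceCell::new(),
            builds: Cell::new(0),
        };
        (node, lazy)
    }
}
```

Capturing costs only two integers and a reference-count bump per node. The copy happens once, and only for nodes whose tokens are actually requested.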
The CaptureState inside Parser tracks:
- capturing (type Capturing): whether collection is active
- parser_replacements: token-range replacements recorded during capture and applied when the stream is materialized
- seen_attrs: attribute IDs already incorporated into a token stream

Sources: compiler/rustc_parse/src/parser/attr_wrapper.rs:1-40, compiler/rustc_parse/src/parser/mod.rs:267-276
The parser attempts to produce a useful AST even in the presence of syntax errors. Key mechanisms:
- may_recover() guards recovery logic. It returns false when recovery is forbidden, for example when parsing macro arguments (see Recovery::Forbidden).
- create_snapshot_for_diagnostic() cheaply clones the parser state. A speculative parse is attempted, and on failure the snapshot is restored via restore_snapshot.
- Placeholder nodes such as ExprKind::Err, created with DUMMY_NODE_ID, let parsing continue after an error. An ErrorGuaranteed value is threaded through to suppress cascading downstream errors.
- For operators borrowed from other languages, the parser recognizes ===, !==, <>, <=>, ++, --, and, or, and others, and emits targeted diagnostics with suggested corrections.

Sources: compiler/rustc_parse/src/parser/diagnostics.rs:264-279, compiler/rustc_parse/src/parser/mod.rs:376-385, compiler/rustc_parse/src/parser/expr.rs:199-272
After macro expansion, rustc_ast_passes runs AstValidator (compiler/rustc_ast_passes/src/ast_validation.rs:90-117). This is a Visitor over the fully expanded AST. It enforces constraints that are deliberately not checked during parsing: the parser accepts a more permissive grammar so that attribute macros can take constructs that are not valid Rust as input and rewrite them. Examples:
| Check category | Examples |
|---|---|
| Missing function bodies | fn foo(); outside a trait or extern block |
Nested impl Trait | impl Into<impl Debug> |
Invalid ~const positions | ~const in trait objects |
| ABI validity | Invalid ABI strings in extern "..." |
| Pattern validity | Variadic ... usage |
| Where-clause location | Where-clause before type alias body |
The validator does not perform name resolution or type checking.
Sources: compiler/rustc_ast_passes/src/ast_validation.rs:1-18, compiler/rustc_ast_passes/src/ast_validation.rs:90-117
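A validator of this kind is a tree walk that reports errors without changing the AST. The sketch below checks one rule from the table, nested impl Trait. It runs on a hypothetical miniature Ty enum, and the enum, the Validator type, and the error text are all assumptions. AstValidator itself is a Visitor implementation, not a hand-written recursion like this one.

```rust
#[derive(Debug)]
pub enum Ty {
    /// A named type with generic arguments, e.g. `Vec<T>`.
    Path(String, Vec<Ty>),
    /// `impl Trait<Args..>`: one trait bound with its generic arguments.
    ImplTrait(String, Vec<Ty>),
    /// `&T`
    Ref(Box<Ty>),
}

pub struct Validator {
    pub errors: Vec<String>,
    /// Set while walking inside the bounds of an `impl Trait`.
    outer_impl_trait: Option<String>,
}

impl Validator {
    pub fn new() -> Self {
        Validator { errors: Vec::new(), outer_impl_trait: None }
    }

    pub fn visit_ty(&mut self, ty: &Ty) {
        match ty {
            Ty::Path(_, args) => {
                for a in args {
                    self.visit_ty(a);
                }
            }
            Ty::Ref(inner) => self.visit_ty(inner),
            Ty::ImplTrait(name, args) => {
                if let Some(outer) = &self.outer_impl_trait {
                    self.errors.push(format!(
                        "nested `impl Trait` is not allowed: `impl {}` inside `impl {}`",
                        name, outer
                    ));
                }
                // Remember the enclosing impl Trait while visiting its arguments.
                let saved = self.outer_impl_trait.replace(name.clone());
                for a in args {
                    self.visit_ty(a);
                }
                self.outer_impl_trait = saved;
            }
        }
    }
}
```

Restoring saved after each impl Trait ensures that siblings such as Pair<impl A, &impl B> are not mistaken for nesting.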
[Diagram: from source bytes to ast::Crate]
Sources: compiler/rustc_parse/src/parser/item.rs:31-34, compiler/rustc_parse/src/parser/mod.rs:331-358, compiler/rustc_ast/src/ast.rs:549-558, compiler/rustc_ast_passes/src/ast_validation.rs:90-117
[Diagram: code entity map of key types and their locations]
Sources: compiler/rustc_parse/src/parser/mod.rs:183-240, compiler/rustc_ast/src/ast.rs:1-50, compiler/rustc_ast/src/token.rs:1-50
rustc_ast exposes two traversal traits for consumers of the AST:
| Trait | Location | Use |
|---|---|---|
Visitor | compiler/rustc_ast/src/visit.rs | Read-only traversal; default implementations call walk_* helpers |
MutVisitor | compiler/rustc_ast/src/mut_visit.rs | Mutating traversal; used by macro expansion |
The AstValidator in rustc_ast_passes implements Visitor. Macro expansion implements MutVisitor to rewrite subtrees in-place.
Sources: compiler/rustc_ast/src/visit.rs:1-30, compiler/rustc_ast/src/mut_visit.rs:1-40
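The default-method-plus-walk_* design lets an implementor override only the nodes it cares about. The traits below are a hypothetical miniature over a toy Expr enum. They are not rustc_ast's actual Visitor or MutVisitor, which cover every AST node type. The Substitute visitor only loosely mimics how expansion replaces a subtree in place.

```rust
#[derive(Debug, PartialEq)]
pub enum Expr {
    Lit(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Read-only traversal: the default `visit_expr` just recurses via `walk_expr`.
pub trait Visitor {
    fn visit_expr(&mut self, e: &Expr) {
        walk_expr(self, e);
    }
}

pub fn walk_expr<V: Visitor + ?Sized>(v: &mut V, e: &Expr) {
    if let Expr::Add(l, r) = e {
        v.visit_expr(l);
        v.visit_expr(r);
    }
}

/// Mutating traversal: same shape, but takes `&mut Expr` so subtrees can be replaced.
pub trait MutVisitor {
    fn visit_expr(&mut self, e: &mut Expr) {
        walk_expr_mut(self, e);
    }
}

pub fn walk_expr_mut<V: MutVisitor + ?Sized>(v: &mut V, e: &mut Expr) {
    if let Expr::Add(l, r) = e {
        v.visit_expr(l);
        v.visit_expr(r);
    }
}

/// Counts variable uses; relies on `walk_expr` for the recursion.
pub struct VarCounter(pub usize);

impl Visitor for VarCounter {
    fn visit_expr(&mut self, e: &Expr) {
        if let Expr::Var(_) = e {
            self.0 += 1;
        }
        walk_expr(self, e);
    }
}

/// Replaces every `Var(name)` with `Lit(value)` in place.
pub struct Substitute(pub String, pub i64);

impl MutVisitor for Substitute {
    fn visit_expr(&mut self, e: &mut Expr) {
        let hit = matches!(&*e, Expr::Var(n) if n == &self.0);
        if hit {
            *e = Expr::Lit(self.1);
        } else {
            walk_expr_mut(self, e);
        }
    }
}
```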