Cendol is a C23 compiler implemented in Rust. It is a learning project for understanding how a compiler is built from scratch, with a focus on high-performance compiler architecture and comprehensive C23 standard compliance.
- Full C23 Preprocessor: Complete preprocessor with macro expansion, conditional compilation, file inclusion, `#embed` directive, and built-in macros.
- Lexer: Tokenization of C23 source code with proper handling of literals, keywords, and operators.
- Parser: Comprehensive C23 syntax parsing including attribute syntax `[[...]]`, Pratt parsing for expressions, and recursive descent for statements.
- Semantic Analysis: Type checking, symbol resolution, and semantic validation with support for C23 features like `auto`, `constexpr`, and enum underlying types.
- Code Generation: Compiles to native object code using Cranelift backend.
- Linker Integration: Automatic invocation of system linker (clang) to produce executables.
- Rich Diagnostics: Error reporting with source location tracking.
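
To illustrate the Pratt parsing technique named in the parser item above, here is a minimal sketch in Rust. It is not taken from Cendol's source: the `Token`, `Parser`, `binding_power`, and `eval` names are hypothetical, and the sketch evaluates integer arithmetic directly instead of building an AST. The core idea is that each infix operator has a left and a right binding power, and the parser recurses only while the next operator binds at least as tightly as the current minimum.

```rust
// Minimal Pratt-style evaluator for integer expressions.
// Hypothetical sketch: none of these names come from Cendol's source.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Num(i64),
    Op(char),
    LParen,
    RParen,
}

fn tokenize(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_ascii_digit() {
            let mut value = 0i64;
            while i < chars.len() && chars[i].is_ascii_digit() {
                value = value * 10 + chars[i].to_digit(10).unwrap() as i64;
                i += 1;
            }
            tokens.push(Token::Num(value));
            continue;
        }
        match c {
            '+' | '-' | '*' | '/' => tokens.push(Token::Op(c)),
            '(' => tokens.push(Token::LParen),
            ')' => tokens.push(Token::RParen),
            _ => {} // skip whitespace and anything unrecognized
        }
        i += 1;
    }
    tokens
}

// (left, right) binding powers; left < right makes the operator left-associative.
fn binding_power(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)),
        '*' | '/' => Some((3, 4)),
        _ => None,
    }
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<Token> {
        self.tokens.get(self.pos).copied()
    }

    fn advance(&mut self) -> Option<Token> {
        let token = self.peek();
        self.pos += 1;
        token
    }

    // Parses and evaluates an expression whose operators bind at least as tightly as `min_bp`.
    fn expr(&mut self, min_bp: u8) -> Result<i64, String> {
        let mut lhs = match self.advance() {
            Some(Token::Num(n)) => n,
            // Unary minus binds tighter than any infix operator.
            Some(Token::Op('-')) => -self.expr(5)?,
            Some(Token::LParen) => {
                let value = self.expr(0)?;
                match self.advance() {
                    Some(Token::RParen) => value,
                    other => return Err(format!("expected ')', found {:?}", other)),
                }
            }
            other => return Err(format!("unexpected token {:?}", other)),
        };
        loop {
            let op = match self.peek() {
                Some(Token::Op(op)) => op,
                _ => break,
            };
            let (l_bp, r_bp) = match binding_power(op) {
                Some(bp) => bp,
                None => break,
            };
            if l_bp < min_bp {
                break;
            }
            self.advance();
            let rhs = self.expr(r_bp)?;
            lhs = match op {
                '+' => lhs + rhs,
                '-' => lhs - rhs,
                '*' => lhs * rhs,
                '/' if rhs == 0 => return Err("division by zero".to_string()),
                '/' => lhs / rhs,
                _ => unreachable!(),
            };
        }
        Ok(lhs)
    }
}

fn eval(src: &str) -> Result<i64, String> {
    let mut parser = Parser { tokens: tokenize(src), pos: 0 };
    let value = parser.expr(0)?;
    if parser.pos < parser.tokens.len() {
        return Err("trailing tokens".to_string());
    }
    Ok(value)
}
```

With this sketch, `eval("1 + 2 * 3")` yields `Ok(7)` and `eval("10 - 4 - 3")` yields `Ok(3)`, showing precedence and left associativity respectively. A real C expression parser would need many more precedence levels plus postfix, ternary, and assignment operators.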
Cendol aims for comprehensive C23 support. Currently implemented features include:
- `auto` and `constexpr`: Type inference and constant expressions.
- `bool`, `true`, `false`: Built-in boolean types and literals.
- `alignas`, `alignof`, `thread_local`, `static_assert`, `typeof`, `typeof_unqual`: C23 keywords.
- Attribute Syntax: Support for `[[...]]` attributes.
- Enum Underlying Types: Enums with specified underlying types (e.g., `enum e : int`).
- Empty Initializer: Support for `{}` to zero-initialize objects.
- `#embed` Directive: Resource inclusion in the preprocessor.
- Improved Function Declarations: `int foo()` is equivalent to `int foo(void)`.
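
As a rough illustration of what the `#embed` directive does, C23 specifies that it expands to a comma-separated list of integer constants, one per byte of the resource, optionally truncated by a `limit(N)` parameter. The Rust sketch below models only that expansion step. The `expand_embed` function is hypothetical and is not Cendol's implementation, which may represent the expansion as tokens instead of text.

```rust
// Hypothetical helper, not taken from Cendol: renders resource bytes as the
// comma-separated integer list that an #embed directive expands to.
// `limit` models the standard `limit(N)` embed parameter.
fn expand_embed(data: &[u8], limit: Option<usize>) -> String {
    // Never read past the end of the resource, even if the limit is larger.
    let count = limit.map_or(data.len(), |n| n.min(data.len()));
    data[..count]
        .iter()
        .map(|byte| byte.to_string())
        .collect::<Vec<_>>()
        .join(", ")
}
```

For example, `expand_embed(b"Hi!", None)` returns `"72, 105, 33"`, which is the text that would appear inside an initializer such as `unsigned char data[] = { ... };`.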
- No Trigraph Support: Trigraphs were officially removed in C23.
- No Digraph Support: Two-character sequences like `<:` are not supported.
- Missing C23 Language Features:
  - Bit-precise integers (`_BitInt(N)`) are not yet implemented.
  - Decimal floating-point types (`_Decimal32`, etc.) are not supported.
- No Standard Library: Cendol relies on the system's C library for headers and linking.
Cendol follows a traditional multi-phase compiler architecture optimized for performance:
- Preprocessing Phase: Transforms C source with macro expansion and includes
- Lexing Phase: Converts preprocessed tokens to lexical tokens
- Parsing Phase: Builds a flattened Abstract Syntax Tree (AST)
- Semantic Analysis Phase: Performs type checking and symbol resolution
- MIR Generation: Lowers AST to Mid-level Intermediate Representation
- Code Generation: Generates native machine code via Cranelift
- Linking: Links object files to create the final executable
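
The "flattened" AST mentioned in the parsing phase can be pictured as follows. This is a sketch under assumptions, not Cendol's actual data structure: the `NodeId`, `Node`, and `Ast` names are hypothetical. The general technique is to store every node in one contiguous vector and have nodes refer to their children by index instead of by heap pointer, which tends to improve cache locality and reduce allocation overhead.

```rust
// Hypothetical flattened AST: all nodes live in one Vec and refer to each
// other by index. Names are illustrative and not taken from Cendol.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(u32);

#[derive(Debug, Clone, Copy, PartialEq)]
enum Node {
    IntLiteral(i64),
    Add(NodeId, NodeId),
    Mul(NodeId, NodeId),
}

#[derive(Default)]
struct Ast {
    nodes: Vec<Node>,
}

impl Ast {
    // Appends a node and returns its index-based handle.
    fn push(&mut self, node: Node) -> NodeId {
        let id = NodeId(self.nodes.len() as u32);
        self.nodes.push(node);
        id
    }

    // Walks the tree by following indices into the node vector.
    fn eval(&self, id: NodeId) -> i64 {
        match self.nodes[id.0 as usize] {
            Node::IntLiteral(value) => value,
            Node::Add(lhs, rhs) => self.eval(lhs) + self.eval(rhs),
            Node::Mul(lhs, rhs) => self.eval(lhs) * self.eval(rhs),
        }
    }
}

// Builds the tree for `1 + 2 * 3`; children are pushed before their parents.
fn sample() -> (Ast, NodeId) {
    let mut ast = Ast::default();
    let one = ast.push(Node::IntLiteral(1));
    let two = ast.push(Node::IntLiteral(2));
    let three = ast.push(Node::IntLiteral(3));
    let product = ast.push(Node::Mul(two, three));
    let root = ast.push(Node::Add(one, product));
    (ast, root)
}
```

Calling `sample()` produces five nodes in a single allocation, and `ast.eval(root)` evaluates to 7. Because `NodeId` is a small `Copy` value, later phases such as semantic analysis can attach side tables (types, symbols) keyed by the same index.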
- Rust 2024 edition or later
- Cargo
- Clang (used as the system linker)
To build the compiler, run:

    cargo build

For a release build with optimizations:

    cargo build --release

To compile a C file to an executable:

    cargo run -- -o <output_file> <input_file>

The following command-line options are available:

- `-E`: Preprocess only, output preprocessed source to stdout
- `-P`: Suppress line markers in preprocessor output
- `-C`: Retain comments in preprocessor output
- `-I <path>`: Add include search path
- `-D <name>[=<value>]`: Define preprocessor macro
- `--verbose`: Enable verbose diagnostic output
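
The `-D <name>[=<value>]` syntax can be handled by splitting the argument at the first `=`. The sketch below is a hypothetical helper, not Cendol's code. Defaulting the value to `1` when none is given mirrors the behavior of GCC and Clang, and it is an assumption that Cendol does the same.

```rust
// Hypothetical helper, not taken from Cendol: splits a -D argument into
// (name, value). A missing value defaults to "1", as GCC and Clang do;
// whether Cendol follows that convention is an assumption.
fn parse_define(arg: &str) -> (String, String) {
    match arg.split_once('=') {
        // Only the first '=' separates name from value; later ones stay in the value.
        Some((name, value)) => (name.to_string(), value.to_string()),
        None => (arg.to_string(), "1".to_string()),
    }
}
```

Under these assumptions, `parse_define("DEBUG=1")` and `parse_define("DEBUG")` both yield the pair `("DEBUG", "1")`.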
Preprocess a file:

    cargo run -- -E test.c

Define macros and include paths:

    cargo run -- -D DEBUG=1 -I /usr/include test.c

Comprehensive design documentation is available in the `design-document/` directory:
- Main Architecture - Overall compiler design and goals
- Preprocessor Design - Preprocessing phase details
- Lexer Design - Tokenization strategy
- Parser Design - AST construction
- Semantic Analysis - Type checking and validation
This is a learning project, but contributions are welcome! Areas of interest include:
- Additional C23 language features
- Performance optimizations
- Testing and bug fixes
- Documentation improvements
This project is AI-friendly and welcomes contributions from developers using AI tools. We encourage the use of AI for code generation, debugging, and documentation to enhance productivity.
See LICENSE file for details.