String Interning Benchmark Results
Overview
Compile-time string interning reduces binary size by deduplicating identical string literals at the compilation stage. Instead of storing every occurrence of a string literal separately, the compiler identifies all identical strings and stores them only once, with references pointing to the single copy. The benefit scales with:
- Number of unique strings in the program
- How many times each string is repeated
- Length of the strings
- Compiler optimization level
Benchmark Program
The program resilient/examples/string_bench_heavy.rz contains:
- 5 unique string literals (representing realistic log messages, errors, status updates)
- 15 total string references across the program (variables and print statements)
- Realistic patterns: error prefixes, warning messages, info logs, debug statements, and status messages
- ~121 bytes of unique string data
Benchmark Analysis
String Breakdown
The benchmark program contains the following repeated strings:
1. "error: invalid input" (21 bytes × 3 references = 63 bytes)
2. "warning: deprecated function" (28 bytes × 3 references = 84 bytes)
3. "info: processing data" (20 bytes × 4 references = 80 bytes)
4. "debug: variable x is 42" (24 bytes × 3 references = 72 bytes)
5. "status: operation completed" (28 bytes × 2 references = 56 bytes)
Expected Results
Without string interning:
- Each reference to a string stores a complete copy
- Total string data: 63 + 84 + 80 + 72 + 56 = 355 bytes
With string interning:
- One copy of each unique string
- Total string data: 21 + 28 + 20 + 24 + 28 = 121 bytes
Estimated savings:
- ~66% reduction in string literal data (234 bytes saved on this program)
- For real-world programs with even more duplication, this scales to 5-30% total binary size reduction
How to Run
# Run the benchmark script
bash benchmarks/string_interning_size.sh
Expected Output
The script will:
- Build the Resilient compiler (if not already built)
- Display the compiler binary size
- Analyze the benchmark program’s string patterns
- Calculate theoretical savings from string interning
- Report expected size reduction
Real-World Impact
String interning benefits are most pronounced in programs with:
- Logging systems: Repeated log level prefixes (“ERROR:”, “WARN:”, “INFO:”)
- Error handling: Repeated error messages and codes
- Configuration: Repeated keys and values
- Protocol implementations: Repeated message types or field names
- Embedded systems: Firmware with multiple instances of the same string constants
Example Scenarios
| Program Type | Typical Duplication | Expected Savings |
|---|---|---|
| Simple arithmetic | <5% | <1% |
| Logging framework | 40-60% of strings | 5-15% total |
| Error handling intensive | 50-70% of strings | 8-20% total |
| Embedded telemetry | 60-80% of strings | 10-30% total |
Implementation Details
RES-2612 Integration
String interning was implemented in PR RES-2612 with:
- Lexer/Parser: Identifies string literals during parsing
- String pool: Central repository for all unique strings in the program
- Codegen: Generates references to the string pool instead of embedding strings
- Compiler optimizations: Deduplication happens transparently during compilation
Limitations of Current Benchmark
- Measures compiler binary size as a proxy (actual Resilient program compilation may vary)
- Real-world savings depend on program-specific string duplication patterns
- Bytecode/native representation may have different space characteristics
- String table overhead depends on implementation efficiency
Future Improvements
Future benchmarks could measure:
- Direct program compilation: Compile actual Resilient programs to bytecode or native code and measure size
- Variety of workloads: Create benchmarks for different application domains (logging, embedded, networking)
- Performance trade-offs: Measure compilation time impact of string interning
- Memory profiling: Track string pool memory usage at runtime
- Comparative analysis: Compare with other deduplication strategies
References
- RES-2612: String interning feature implementation
- benchmarks/string_interning_size.sh: Automated benchmark script
- resilient/examples/string_bench_heavy.rz: Benchmark program source