String Interning Benchmark Results

Overview

Compile-time string interning reduces binary size by deduplicating identical string literals at the compilation stage. Instead of storing every occurrence of a string literal separately, the compiler identifies all identical strings and stores them only once, with references pointing to the single copy. The benefit scales with:

  • Number of unique strings in the program
  • How many times each string is repeated
  • Length of the strings
  • Compiler optimization level

Benchmark Program

The program resilient/examples/string_bench_heavy.rz contains:

  • 5 unique string literals (representing realistic log messages, errors, status updates)
  • 15 total string references across the program (variables and print statements)
  • Realistic patterns: error prefixes, warning messages, info logs, debug statements, and status messages
  • ~121 bytes of unique string data

Benchmark Analysis

String Breakdown

The benchmark program contains the following repeated strings:

1. "error: invalid input"              (21 bytes × 3 references = 63 bytes)
2. "warning: deprecated function"      (28 bytes × 3 references = 84 bytes)
3. "info: processing data"             (20 bytes × 4 references = 80 bytes)
4. "debug: variable x is 42"          (24 bytes × 3 references = 72 bytes)
5. "status: operation completed"       (28 bytes × 2 references = 56 bytes)

Expected Results

Without string interning:

  • Each reference to a string stores a complete copy
  • Total string data: 63 + 84 + 80 + 72 + 56 = 355 bytes

With string interning:

  • One copy of each unique string
  • Total string data: 21 + 28 + 20 + 24 + 28 = 121 bytes

Estimated savings:

  • ~66% reduction in string literal data (234 bytes saved on this program)
  • For real-world programs with even more duplication, this scales to 5-30% total binary size reduction

How to Run

# Run the benchmark script
bash benchmarks/string_interning_size.sh

Expected Output

The script will:

  1. Build the Resilient compiler (if not already built)
  2. Display the compiler binary size
  3. Analyze the benchmark program’s string patterns
  4. Calculate theoretical savings from string interning
  5. Report expected size reduction

Real-World Impact

String interning benefits are most pronounced in programs with:

  • Logging systems: Repeated log level prefixes (“ERROR:”, “WARN:”, “INFO:”)
  • Error handling: Repeated error messages and codes
  • Configuration: Repeated keys and values
  • Protocol implementations: Repeated message types or field names
  • Embedded systems: Firmware with multiple instances of the same string constants

Example Scenarios

Program Type Typical Duplication Expected Savings
Simple arithmetic <5% <1%
Logging framework 40-60% of strings 5-15% total
Error handling intensive 50-70% of strings 8-20% total
Embedded telemetry 60-80% of strings 10-30% total

Implementation Details

RES-2612 Integration

String interning was implemented in PR RES-2612 with:

  1. Lexer/Parser: Identifies string literals during parsing
  2. String pool: Central repository for all unique strings in the program
  3. Codegen: Generates references to the string pool instead of embedding strings
  4. Compiler optimizations: Deduplication happens transparently during compilation

Limitations of Current Benchmark

  • Measures compiler binary size as a proxy (actual Resilient program compilation may vary)
  • Real-world savings depend on program-specific string duplication patterns
  • Bytecode/native representation may have different space characteristics
  • String table overhead depends on implementation efficiency

Future Improvements

Future benchmarks could measure:

  1. Direct program compilation: Compile actual Resilient programs to bytecode or native code and measure size
  2. Variety of workloads: Create benchmarks for different application domains (logging, embedded, networking)
  3. Performance trade-offs: Measure compilation time impact of string interning
  4. Memory profiling: Track string pool memory usage at runtime
  5. Comparative analysis: Compare with other deduplication strategies

References

  • RES-2612: String interning feature implementation
  • benchmarks/string_interning_size.sh: Automated benchmark script
  • resilient/examples/string_bench_heavy.rz: Benchmark program source