audio-tabula-rasa

commit 7ca4031 — RLAIF training overhaul: corpus reward, theory sweep models, audio samples

A music generator trained only against psychoacoustic physics rewards: no MIDI, no audio corpus, no learned vocoder. Every audio file below is freshly rendered from the model weights in this commit by an additive sine-bank synthesizer.
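
As a rough picture of what additive sine-bank rendering involves, here is a minimal sketch. It is not the repository's synthesizer: the name `render_note`, the six harmonic partials, the 0.8 amplitude rolloff, and the 10 ms fade are all assumptions chosen for illustration.

```python
import math

def render_note(f0, duration, sample_rate=22050, n_partials=6, rolloff=0.8):
    """Sum harmonic sine partials of f0 into a list of float samples in [-1, 1].

    Hypothetical helper: partial count, rolloff and envelope are illustrative.
    """
    n = int(duration * sample_rate)
    # Partial k has frequency k*f0 and amplitude rolloff**(k-1);
    # partials at or above the Nyquist frequency are dropped.
    partials = [(k * f0, rolloff ** (k - 1))
                for k in range(1, n_partials + 1)
                if k * f0 < sample_rate / 2]
    norm = sum(a for _, a in partials) or 1.0
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Linear 10 ms fade-in and fade-out to avoid clicks at note edges.
        env = min(1.0, t / 0.01, (duration - t) / 0.01)
        s = sum(a * math.sin(2 * math.pi * f * t) for f, a in partials)
        samples.append(env * s / norm)
    return samples
```

Calling `render_note(440.0, 0.5)` yields half a second of samples that could be written out with the standard `wave` module after scaling to 16-bit integers.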

RLAIF demo — melody trained vs mock judge

Eight sample melodies from the Phase-3 generator after a REINFORCE pass that uses a feature-based mock judge as the reward. The same loop accepts Qwen2.5-Omni-7B as the judge instead; see scripts/judge_with_ollama.py and src/train/rlaif_train.py --judge qwen.
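
The shape of a REINFORCE pass against a feature-based judge can be sketched with a toy policy. Everything here is an assumption for illustration, not the code in src/train/rlaif_train.py: the policy is a single softmax over 12 pitch classes, the mock judge scores the fraction of notes in a major pentatonic set, and the baseline is a simple running average.

```python
import math
import random

PENTATONIC = {0, 2, 4, 7, 9}  # hypothetical "in-scale" feature for the mock judge

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def mock_judge(melody):
    """Stand-in reward: fraction of notes whose pitch class is in PENTATONIC."""
    return sum(pc in PENTATONIC for pc in melody) / len(melody)

def train(steps=5000, length=8, lr=0.5, seed=0):
    """Toy REINFORCE loop; returns the final pitch-class probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * 12
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        melody = rng.choices(range(12), weights=probs, k=length)
        reward = mock_judge(melody)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)  # running-average baseline
        # Gradient of sum_t log p(a_t) w.r.t. the logits:
        # sum over notes of (one_hot(a_t) - probs).
        grad = [-length * p for p in probs]
        for pc in melody:
            grad[pc] += 1.0
        logits = [w + lr * advantage * g / length
                  for w, g in zip(logits, grad)]
    return softmax(logits)
```

Swapping in a different judge only means replacing `mock_judge` with another function that maps a melody to a scalar reward, which is the property the paragraph above relies on.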

Phase 14 — First composed song (16 bars)

Multi-track arrangement of the trained generators: Phase-2 progressions on a pad, plucked bass on the chord roots, Phase-3 melodies quantized to a major pentatonic on the lead, and a 4/4 backbeat. Closer to listenable music than the per-phase demos, but still robotic; a work in progress.

Phase 1 — Interval discovery

Two-frequency generator + Sethares dissonance reward. With no music data, intervals in the minor-sixth-to-octave region emerge; the report below lists minor seventh, octave, and major seventh as the top labels.
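
For readers unfamiliar with the Sethares dissonance measure, a minimal sketch follows. It uses the constants commonly quoted from Sethares's published dissonance-curve code (0.24, 0.0207, 18.96, 3.51, 5.75); the six-partial harmonic timbre, the 0.88 rolloff, and the function names are assumptions, and the repository's reward may be parametrized differently.

```python
import math

def pair_dissonance(f1, a1, f2, a2):
    """Plomp-Levelt style roughness of two sine partials (Sethares constants)."""
    f_lo, f_hi = min(f1, f2), max(f1, f2)
    # s rescales the roughness curve with the critical bandwidth at f_lo.
    s = 0.24 / (0.0207 * f_lo + 18.96)
    x = s * (f_hi - f_lo)
    return min(a1, a2) * (5.0 * math.exp(-3.51 * x) - 5.0 * math.exp(-5.75 * x))

def spectrum(f0, multipliers=(1, 2, 3, 4, 5, 6), rolloff=0.88):
    """(frequency, amplitude) pairs for one tone; timbre values are assumed."""
    return [(m * f0, rolloff ** i) for i, m in enumerate(multipliers)]

def interval_dissonance(f0, ratio, multipliers=(1, 2, 3, 4, 5, 6)):
    """Total roughness over every pair of partials in a two-tone mixture."""
    partials = spectrum(f0, multipliers) + spectrum(f0 * ratio, multipliers)
    total = 0.0
    for i in range(len(partials)):
        for j in range(i + 1, len(partials)):
            total += pair_dissonance(*partials[i], *partials[j])
    return total
```

Passing `multipliers=(1, 3, 5, 7, 9, 11)` gives an odd-partial timbre, so the same function can be reused to compare timbres.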

Phase 1 — Interval discovery plot

Phase 2 — Triads

3-voice generator + pairwise Sethares + voice spread. Discovers sus4 (6:8:9), major (4:5:6), augmented, and diminished chords; prefers the upper register.
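
The chord labels used here and in the report can be thought of as nearest-template matches on the two frequency ratios above the lowest voice. The sketch below is one hypothetical way to do that: the template tunings for augmented and diminished (equal-tempered) and the 0.03 ratio tolerance are assumptions, and only the label strings are taken from the report.

```python
TEMPLATES = {
    "major_4_5_6": (5 / 4, 6 / 4),
    "sus4_6_8_9": (8 / 6, 9 / 6),
    # Assumed equal-tempered tunings for the two symmetric chords.
    "augmented": (2 ** (4 / 12), 2 ** (8 / 12)),
    "diminished": (2 ** (3 / 12), 2 ** (6 / 12)),
}

def label_triad(freqs, tol=0.03):
    """Label a 3-note chord by its ratios to the lowest voice (hypothetical rule)."""
    lo, mid, hi = sorted(freqs)
    r1, r2 = mid / lo, hi / lo
    best, best_err = None, float("inf")
    for name, (t1, t2) in TEMPLATES.items():
        # Worst-case deviation of either ratio from the template.
        err = max(abs(r1 - t1), abs(r2 - t2))
        if err < best_err:
            best, best_err = name, err
    if best_err <= tol:
        return best
    return f"non-musical (r=[{r1:.3f},{r2:.3f}])"
```

For example, `label_triad([300, 400, 450])` returns `"sus4_6_8_9"`, while ratios far from every template fall through to the non-musical label.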

Phase 2 — Triads plot

Phase 2 — Chord progressions

Adds a Tymoczko-style voice-leading cost. The result is a mix of canonical triads with smooth voice movement.
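
A voice-leading cost in this spirit measures how far the voices must move between consecutive chords. The sketch below is a simplified pitch-space version under stated assumptions: it takes the smallest total displacement in semitones over all pairings of voices and ignores octave equivalence; the repository's cost may differ.

```python
import math
from itertools import permutations

def semitones(freq, ref=440.0):
    """Pitch of freq in equal-tempered semitones relative to ref."""
    return 12.0 * math.log2(freq / ref)

def voice_leading_cost(chord_a, chord_b):
    """Smallest total voice displacement (semitones) over all voice pairings."""
    a = [semitones(f) for f in chord_a]
    b = [semitones(f) for f in chord_b]
    return min(sum(abs(x - y) for x, y in zip(a, perm))
               for perm in permutations(b))
```

Moving from C major (C4, E4, G4) to F major in second inversion (C4, F4, A4) costs 3 semitones under this measure: one voice holds, one moves up a semitone, and one moves up a whole tone.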

Phase 2 — Chord progressions plot

Phase 3 — Monophonic melodies

Sequential Sethares + Terhardt virtual-pitch salience + pitch-class diversity. Melodic gestures using 4 to 5 pitch classes (PCs) emerge; the top Western-scale match is the blues scale.
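
The pitch-class count and scale match could be computed along the following lines. This is a sketch, not the report's actual code: the Jaccard overlap score, the 261.63 Hz reference, and the scale templates are assumptions, with only the scale names taken from the report.

```python
import math

SCALES = {
    "major": {0, 2, 4, 5, 7, 9, 11},
    "harmonic minor": {0, 2, 3, 5, 7, 8, 11},
    "blues": {0, 3, 5, 6, 7, 10},
    "chromatic": set(range(12)),
}

def pitch_classes(freqs, ref=261.63):
    """Round each frequency to its nearest equal-tempered pitch class (0..11)."""
    return {round(12 * math.log2(f / ref)) % 12 for f in freqs}

def closest_scale(pcs):
    """Return (name, root, score) maximizing Jaccard overlap with a transposed scale."""
    best = (None, None, -1.0)
    for name, template in SCALES.items():
        for root in range(12):
            scale = {(pc + root) % 12 for pc in template}
            score = len(pcs & scale) / len(pcs | scale)
            if score > best[2]:  # strict: first best match wins ties
                best = (name, root, score)
    return best
```

`len(pitch_classes(melody))` gives the PC count for one melody, and `closest_scale` gives the kind of (scale, root) pair tallied in the report.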

Phase 3 — Monophonic melodies plot

Phase 4 — Rhythm

Phase-coherence-based entrainment reward (linear approximation to Large–Kolen 1994). The median discovered period is 0.553 s (about 108 BPM), with an interquartile range of roughly 106 to 127 BPM, inside Fraisse's preferred-tempo window.
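
One generic way to score phase coherence is the mean resultant length of the onset phases on a cycle of a candidate period. The sketch below uses that circular statistic with a 1 ms grid search; it is an assumption for illustration and not the repository's Large–Kolen approximation.

```python
import cmath
import math

def phase_coherence(onsets, period):
    """Mean resultant length (0..1) of onset phases for the given period."""
    vectors = [cmath.exp(2j * math.pi * t / period) for t in onsets]
    return abs(sum(vectors)) / len(vectors)

def best_period(onsets, lo_ms=300, hi_ms=1000):
    """Grid-search the period in seconds (1 ms steps) with the highest coherence."""
    return max((ms / 1000 for ms in range(lo_ms, hi_ms + 1)),
               key=lambda p: phase_coherence(onsets, p))

def bpm(period):
    """Convert a beat period in seconds to beats per minute."""
    return 60.0 / period
```

Onsets that land exactly on a 0.5 s grid score a coherence of 1 at that period, and `bpm(0.553)` reproduces the 108.5 BPM conversion shown in the report.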

Phase 4 — Rhythm plot

Phase 3+4 — Cross-paired melodic rhythm

Phase-3 melody pitches placed at Phase-4 rhythm onsets. There is no joint training; the two outputs are combined only at synthesis time.
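
Cross-pairing of this kind amounts to zipping two independently generated sequences into note events. A minimal sketch, assuming pitches in Hz, onsets in seconds, truncation to the shorter sequence, and a hypothetical fixed duration for the final note:

```python
def pair_melody_with_rhythm(pitches_hz, onsets_s, last_duration=0.5):
    """Build (onset, duration, pitch) events; each note lasts until the next onset."""
    n = min(len(pitches_hz), len(onsets_s))  # assumed: drop unmatched extras
    events = []
    for i in range(n):
        if i + 1 < n:
            duration = onsets_s[i + 1] - onsets_s[i]
        else:
            duration = last_duration  # no following onset to bound the last note
        events.append((onsets_s[i], duration, pitches_hz[i]))
    return events
```

The resulting event list is all a synthesizer needs, which is why no retraining is involved.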

Phase 4.5 — Joint melodic rhythm

A single MLP emits (pitch, IOI) pairs, where IOI is the inter-onset interval. The reward is the sum of the melody and rhythm rewards, optimized jointly. Tonal salience reaches 0.73 while phase coherence holds at 0.69.

Phase 4.5 — Joint melodic rhythm plot

Phase 7 — 2-voice counterpoint

Banded per-voice generator + horizontal/vertical Sethares + voice-crossing penalty. Zero crossings; vertical intervals are wide, with a median of 26.8 semitones (just over two octaves) in the report below.
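
The two counterpoint statistics in the report, voice crossings and vertical intervals in semitones ("st"), can be computed as below. This is a sketch under the assumption that each voice is a list of frequencies in Hz aligned step by step; the function names are hypothetical.

```python
import math

def count_crossings(lower_hz, upper_hz):
    """Number of steps where the nominally lower voice sounds above the upper one."""
    return sum(lo > hi for lo, hi in zip(lower_hz, upper_hz))

def vertical_intervals_st(lower_hz, upper_hz):
    """Vertical interval at each step in equal-tempered semitones."""
    return [12.0 * math.log2(hi / lo) for lo, hi in zip(lower_hz, upper_hz)]
```

A crossing penalty can then be as simple as a weight times `count_crossings`, subtracted from the reward.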

Phase 7 — 2-voice counterpoint plot

Phase 13 — 3-voice chorale

Same architecture, n_voices=3. Best-checkpoint reward +5.33, stratified voice lines with almost no crossings (mean 0.02 in the report below).

Phase 13 — 3-voice chorale plot

Phase 13 — 4-voice chorale

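n_voices=4. Six vertical pairs must be satisfied at once, which is at the limit of banded-MLP + REINFORCE at this training budget: in the report below, mean vertical dissonance rises to 1.998 and mean voice crossings to 0.65.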
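
The pair count grows quadratically with the number of voices, which is one way to see why the four-voice case is harder. A short sketch (the helper name is hypothetical):

```python
from itertools import combinations

def vertical_pairs(n_voices):
    """All unordered voice pairs whose vertical interval must be scored."""
    return list(combinations(range(n_voices), 2))

# 2 voices -> 1 pair, 3 voices -> 3 pairs, 4 voices -> 6 pairs
pair_counts = {n: len(vertical_pairs(n)) for n in (2, 3, 4)}
```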

Phase 13 — 4-voice chorale plot

Phase 8b — Bohlen-Pierce triads (odd-partial timbre)

Same triad generator, partials=odd. Discovered chords cluster on BP-style ratios (≈ 5:7:9). Rendered with odd-only-harmonic synthesis so you hear the matching timbre.
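
The BP-style interval labels in the report can be read as nearest-ratio matches. The sketch below matches in cents against the four ratios the report names; the 35-cent tolerance and the function names are assumptions, not values taken from the repository.

```python
import math

BP_RATIOS = {"7:5": 7 / 5, "3:2": 3 / 2, "9:7": 9 / 7, "25:21": 25 / 21}

def cents(ratio_a, ratio_b):
    """Absolute distance between two frequency ratios in cents."""
    return abs(1200.0 * math.log2(ratio_a / ratio_b))

def label_interval(ratio, tol_cents=35.0):
    """Name the nearest listed ratio, or fall back to an 'other' label."""
    name, target = min(BP_RATIOS.items(), key=lambda kv: cents(ratio, kv[1]))
    if cents(ratio, target) <= tol_cents:
        return name
    return f"other (r={ratio:.3f})"
```

With these assumed settings, a ratio of 1.425 is labeled 7:5, while 1.339 lies about 70 cents from its nearest candidate and falls through to the other label.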

Phase 8b — Bohlen-Pierce triads (odd-partial timbre) plot

Quantitative report

Full statistics across all phases:

============================================================
Phase 1 — intervals (harmonic timbre)
============================================================
  median ratio: 3.049
  top labels:
       7  minor_seventh
       7  octave
       7  major_seventh
       4  minor_sixth
       3  non-musical (ratio=2.919)

============================================================
Phase 2 — triads (harmonic timbre)
============================================================
  N samples: 512
  mean dissonance: 0.195
  pct samples with spread penalty > 0.01: 2.0%
  top triad labels:
      40  sus4_6_8_9
      28  major_4_5_6
      18  augmented
      10  diminished
       2  non-musical (r=[1.461,1.814])
       1  non-musical (r=[1.398,1.625])
       1  non-musical (r=[1.391,1.790])
       1  non-musical (r=[1.395,1.771])

============================================================
Phase 3 — melodies
============================================================
  N samples: 512
  mean tonal salience: 0.646
  PC count distribution: {5: 186, 4: 177, 3: 80, 6: 52, 2: 9, 7: 5, 8: 3}
  closest Western scale matches:
     115  blues              root=3
      64  blues              root=9
      50  blues              root=6
      32  blues              root=4
      26  major              root=4

============================================================
Phase 4 — rhythms
============================================================
  N samples: 256
  mean phase coherence: 0.828
  median best period: 0.553s (108.5 BPM)
  IQR period: 0.473–0.567s

============================================================
Phase 7 — counterpoint
============================================================
  N samples: 256
  mean vertical dissonance: 0.241
  mean voice crossings (out of 8): 0.00
  mean shared tonal salience: 0.419
  vertical-interval percentiles: 25%=22.0st, 50%=26.8st, 75%=33.4st

============================================================
Phase 8 — intervals (ODD-partial timbre)
============================================================
  median ratio: 1.425
  top BP-style labels:
     214  7:5
     177  3:2
      72  9:7
      16  25:21
       5  other (r=1.339)

============================================================
Phase 8b — triads (ODD-partial timbre)
============================================================
  N samples: 512
  mean dissonance (under odd partials): 0.092
  mean dissonance (under harmonic timbre, for comparison): 0.194
  median r1: 1.389,  median r2: 1.683

============================================================
Phase 8c — intervals (INHARMONIC partials, negative control)
============================================================
  median ratio: 1.225
  top labels (under Western tuning):
     101  minor_third
      84  major_third
      63  perfect_fourth
      58  major_second
      26  tritone

============================================================
Phase 11 — autoregressive melody (with motif autocorrelation)
============================================================
  N samples: 256, length per melody: 16
  mean tonal salience: 0.536
  PC count distribution: {8: 70, 9: 57, 7: 50, 6: 24, 10: 23, 5: 17, 11: 12, 12: 1, 13: 1, 4: 1}
  mean motif autocorrelation: 0.342
  closest Western scale matches:
     103  chromatic          root=0
      18  harmonic minor     root=2
      10  major              root=2

============================================================
Phase 12 — cadence-aware chord progressions
============================================================
  N samples: 256, chord positions: 4
  mean cadence_arc (middle − endpoint dissonance): -0.135
  mean dissonance per position:
    pos 0: 0.353
    pos 1: 0.168
    pos 2: 0.162
    pos 3: 0.247

============================================================
Phase 13 — 3-voice counterpoint
============================================================
  N samples: 256
  mean vertical dissonance: 0.824
  mean voice crossings: 0.02
  mean shared tonal salience: 0.424

============================================================
Phase 13 — 4-voice counterpoint
============================================================
  N samples: 256
  mean vertical dissonance: 1.998
  mean voice crossings: 0.65
  mean shared tonal salience: 0.416

built by .github/workflows/audio-preview.yml