Concurrency and Real-Time Scheduling
What Resilient does and does not do about concurrent execution, interrupts, and real-time scheduling — today, and where the design is heading.
Table of contents
Current Model: Single-Threaded
Resilient programs run in a single thread. All three execution
backends — the tree-walking interpreter, the bytecode VM
(--vm), and the Cranelift JIT (--jit) — evaluate a program
on one OS (or bare-metal) thread from main to program exit.
There is no spawn, no async, no green threads, no parallel
for, no work-stealing runtime. A running Resilient program
occupies exactly one CPU, executes one statement at a time, and
observes a single total order over its own side effects.
The embedded runtime (resilient-runtime/, #![no_std]) is the
same story. The crate is a pure library: it exposes a Value
type and the arithmetic / comparison operators over it, and
that is all. It does not start threads, does not install
interrupt handlers, does not call into any scheduler, does not
even take ownership of the program’s main. Whatever
surrounds it — a C main() on a bare-metal Cortex-M4F, a
FreeRTOS task, a Zephyr thread — is responsible for getting
cycles to it.
What this means for interrupt service routines:
- Resilient code does not run inside an ISR. A program
compiled with
--jitor interpreted at the host level cannot be installed as an interrupt handler. The host platform is not designed for it, and the JIT’s compile-time assumptions (heap access, tracing, relocation tables) would violate an ISR context anyway. - On embedded targets, the surrounding C / RTOS layer owns ISRs. The Resilient runtime library is called from task context only. If an ISR needs to wake a Resilient-hosted task, it does so through whatever primitive the RTOS provides (semaphore, queue, event flag) — Resilient sees only the task-side read of that primitive.
- There is no
unsafeblock in the language, by design. This means Resilient code cannot, today, dereference a volatile memory-mapped register. MMIO lives in the surrounding C layer; Resilient consumes already-cleaned values.
The single-threaded guarantee is strong and load-bearing for the safety story. A reviewer can read a Resilient function top-to-bottom and know that nothing else mutates its locals or arguments during execution. No hidden concurrency, no memory model to reason about.
Fault Tolerance vs. Concurrency
Two features of Resilient can superficially look like concurrency but are not.
live { ... } blocks are fault tolerance, not concurrency.
A live block executes its body on the calling thread, checks
its invariants, and on failure restores the block’s local
environment to its last-known-good snapshot and retries the
body — with exponential backoff between attempts (RES-142
timeout clause, RES-141 telemetry counters). At no point are
two things running simultaneously. The block just runs the
same code again, sequentially, with clean state. See
Memory Model — live-block snapshot semantics
for what is and is not captured in the snapshot.
Live blocks handle:
- a sensor returning a transient out-of-range reading
- a checksum failing on a packet
- a contract’s
asserttripping mid-computation
Live blocks do not handle:
- two tasks contending for the same mutable buffer
- an ISR preempting a task in the middle of a struct update
- lock-free coordination across cores
These are concurrency problems. Resilient does not have an answer for them yet — see the roadmap below.
Retries are not a concurrency primitive. A live block’s
retry loop is deterministic and serial. It does not spawn a
worker, does not time out against a wall clock in a separate
thread, does not deliver an interrupt to itself. The backoff
is a plain nop / busy loop at the caller’s thread
granularity.
Interoperability with RTOSes Today
The realistic deployment shape today is Resilient as a single-threaded task inside a host RTOS. The host provides the scheduler, the ISR machinery, the synchronization primitives, and the memory map. The Resilient runtime is a library that the task’s code links against and calls like any other library.
+---------------------------------------------------+
| Host RTOS (FreeRTOS / Zephyr / bare-metal loop) |
| |
| +---------------+ +---------------+ |
| | Task A (C) | | Task B | |
| | ISR handlers | | Resilient | |
| | MMIO | | runtime | |
| | IPC primitives| | (library) | |
| +-------+-------+ +-------+-------+ |
| | | |
| +---- queue / ------+ |
| mailbox / shared mem |
+---------------------------------------------------+
ISR handling is the surrounding layer’s responsibility. The ISR is a C function registered with the vector table at the platform level. It may deposit a value into a ring buffer, set a flag, or signal a semaphore. The Resilient task reads that value / flag / semaphore through a plain function call when the RTOS schedules it in.
Data exchange between an ISR and a Resilient task uses whatever mechanism the C side provides. The common patterns:
- Volatile shared memory. The C ISR writes a
volatile uint32_t; the Resilient task reads it through a small C shim wrappingldrwith avolatilemarker. On the Resilient side, each read produces anIntvalue; once that value is in hand, it can’t be torn by another context, because the runtime is single-threaded. - Lock-free single-producer / single-consumer queue. The ISR is the producer, the Resilient task is the consumer. Because Resilient is single-threaded, the consumer side does not need internal synchronization — the task reads a whole element out of the queue and works with it in isolation.
- RTOS primitive (semaphore / event flag / mailbox). The Resilient task blocks on the primitive via an FFI call (not a Resilient-language feature yet — the shim is in C). Wake-ups are driven by the ISR posting to the primitive.
ISR-safety by construction. Because only one Resilient
execution happens at a time, there is no scenario in which an
ISR preempts Resilient mid-update to a Resilient-owned value.
A Resilient Int, Bool, Float, or struct field is never
observed in a half-written state by another Resilient
execution, because there is no other Resilient execution. The
only interference possible is between Resilient and the
surrounding C layer, which is governed by the C layer’s
discipline (volatiles, atomics, critical sections) — not
Resilient’s.
What the Resilient side must still do carefully. Reading a multi-word value from shared memory (e.g., a 64-bit timestamp on a 32-bit MCU) is not atomic at the instruction level. The C shim, not the Resilient caller, is responsible for tearing protection (disable-interrupts-around-read, or a lock-free read pattern like Peterson’s double-read). Once the shim returns a scalar, Resilient treats it as a single value.
Effect Tracking (G18 — Prerequisite for Safe Concurrency)
Goalpost G18 on the roadmap is effect tracking: annotating every function with the set of effects it can perform. The planned alphabet starts small:
@pure— no reads or writes outside locals and arguments; no I/O, noliveretries, no randomness. Deterministic as a function of its inputs.@io— performs I/O (file, network, MMIO shim,println, anything observable outside the process).@random— reads a non-deterministic source.
The concrete form is not yet final — see RES-191 (@pure
annotation), RES-192 (@io inference), and RES-193 (effect
polymorphism for higher-order functions) in the ticket ledger.
What matters for this document is why we block concurrency
on effects.
Data races require knowing which functions touch shared
state. The moment two Resilient executions can run
concurrently — whether on two cores, two preemptible tasks on
one core, or two actors multiplexed by a cooperative scheduler
— the compiler needs a way to prove that no two of them mutate
the same location. The simplest sound rule is “only @pure
functions may be called from more than one place
concurrently,” with explicit channel / mailbox operations as
the only sanctioned @io-in-parallel primitive. Without
effect annotations, the compiler has no way to check that rule.
How this maps to safety-standard requirements. The effect system is not cosmetic; it directly discharges obligations downstream reviewers would otherwise impose by hand:
- MISRA C rule 8.6 — objects and functions used in more than one translation unit must have exactly one external definition. The analog in a Resilient world with concurrent tasks is: any shared mutable state must be reachable only through a nominated owner. Effect annotations let the compiler refuse a program that closes over shared state without declaring it.
- ISO 26262 ASIL-B and above — freedom from interference between software components. The standard requires evidence that a fault in one component cannot corrupt another’s state, timing, or control flow. An effect system plus message-passing between actors produces exactly that evidence as a compile-time property, not a testing artifact.
- DO-178C objective A-7 / DO-330 — verification of coupling between software components. Data coupling and control coupling are required to be enumerated; effects make both explicit at every call edge.
Effect tracking is therefore not a concurrency feature. It is the foundation a concurrency feature can rest on without becoming a new source of safety-case risk.
Roadmap for Structured Concurrency
The target shape is actor-based, not shared-memory, and
the inspiration is explicit: Erlang/OTP supervisor trees, as
noted in the design philosophy.
The language already commits to “let it crash” semantics at
the block level (the live { } block is a supervisor of
scope = one block); the concurrency roadmap extends the same
model to scope = one actor.
The broad shape being designed toward:
- Each actor owns its state. An actor is a Resilient function plus a private local heap. No other actor has a reference to that heap. This is the data-race-freedom property stated positively: there is no shared mutable state to race over.
- Actors communicate by message passing. A message is a
value — an
Int, a struct, an array — sent through a typed channel. The send operation is@io; the send is non-blocking if the channel has capacity. The receive operation is@ioand may block (cooperatively). - Each actor has its own supervisor. The Erlang pattern:
when an actor crashes, its supervisor observes the exit,
decides on a restart policy, and either restarts, escalates
to its own supervisor, or lets the subtree die. A
live { }block composes with this — the block is one level of retry, the supervisor is the next level up. - The scheduler is cooperative. Actors yield at well-defined points (receive, explicit yield, completion). No preemption of Resilient code by other Resilient code; preemption only happens at the OS / RTOS layer, where it is the host’s responsibility.
A syntax sketch — speculative, not committed, may change:
// Speculative — no ticket has claimed this syntax yet.
actor Counter {
let mut count: Int = 0;
receive Increment {
count = count + 1;
}
receive Read(reply: Channel<Int>) {
send reply, count;
}
}
fn main() {
let c = spawn Counter;
send c, Increment;
send c, Increment;
let answer_chan = channel<Int>();
send c, Read(answer_chan);
let n = recv answer_chan;
println("count = " + n);
}
Things this sketch is deliberately noncommittal about:
- Whether
actoris a top-level declaration or a block form. - Whether message types are structurally typed, nominally
typed, or defined with a dedicated
messagekeyword. - Whether
spawnreturns a typed handle, an opaqueActorRef, or a pair of sender / receiver. - How supervisors are declared (likely a separate
supervise { ... }block that lists child actors and restart strategies). - Scheduling guarantees — the first cut will not make any.
The point of the sketch is to show the shape, not to promise the shape. Expect changes once the real design work starts.
How live blocks compose with actors. A live block inside
an actor retries local work. If the live block’s budget is
exhausted, it raises to the actor. If the actor can’t handle
it, the actor crashes. If the actor crashes, its supervisor
decides the next move. The same failure-handling vocabulary
(live → actor → supervisor) nests cleanly from expression to
system.
Real-Time Scheduling Considerations
Resilient does not currently make real-time scheduling guarantees. There is no guarantee on worst-case execution time, no bound on retry-loop wall-clock duration, no hard deadline mechanism. Programs are correct-in-functional-terms; they are not yet correct-in-timing-terms.
Practical consequences:
- The JIT introduces non-deterministic latency. Cranelift compiles lazily on first call, allocates in the host heap, and may trigger guard-page handling or ICache invalidation. None of that is safe for hard real-time code where a missed deadline is a safety event. The JIT is fine for dev-loop iteration and for soft-RT paths where an occasional 10 ms stall is tolerable; it is not fine for control loops running at 1 kHz or above with hard deadlines.
- The bytecode VM is more predictable — no on-line
compilation, a fixed dispatch loop, no heap growth during
steady-state execution of pure-integer code — but it still
uses the host allocator for
String/Array/Map/Setoperations. A program that stays insideInt/Bool/Floatand stack-allocated structs will run with predictable per-op cost on the VM; a program that allocates, won’t. - The tree-walking interpreter is the least suitable of the three for RT — every expression evaluation allocates intermediate environments. Useful for dev, unsuitable for a flight control loop.
Where the design is heading:
- AOT compilation. A Cranelift-based AOT path (or an LLVM backend, depending on target support) would emit a single binary with no runtime compiler, no lazy codegen, no JIT heap. That is the precondition for meaningful WCET analysis — the binary the analyzer sees is the binary that runs.
- WCET analysis integration. Once AOT lands, the
standard tools (aiT, Chronos, OTAWA) can operate on the
produced binary. Resilient’s contribution at the
language level would be to keep loops bounded (either by
a
whilecondition the verifier can pin, or by aforover a statically-sized array) so that the analysis doesn’t dead-end on an unbounded cycle. - Retry-budget timing. Today a
liveblock retries with exponential backoff up to a default budget. The retry itself is not wall-clock-bounded — a future RES ticket will add adeadlineclause tied to a platform-provided monotonic clock, so the block fails deterministically if its retries take too long in real time.
Practical guidance today:
- Hard real-time work stays in C. Control loops, ISRs, any code that has to meet a deadline at μs granularity — C or hand-written assembly, supervised by the RTOS.
- Soft real-time work can use the VM. Deadlines in the 10–100 ms range, algorithms with bounded-but-variable cost, code paths where an occasional slower iteration is acceptable.
- Batch / one-shot analysis can use the JIT. Startup self-test, offline telemetry post-processing, config validation.
- ISRs never call into Resilient. Full stop. If an ISR needs a computation done, it queues work for a Resilient-hosted task.
What This Means for Safety Standards
The single-threaded model is, surprisingly, a feature from a certification perspective. It removes a large class of evidence obligations.
DO-178C (airborne software) — coupling analysis. Multi-task data coupling and control coupling have to be analyzed at the system integration level (objectives A-7.7 and A-7.8). A single-threaded Resilient task collapses the intra-Resilient analysis to a trivial result: there are no internal tasks, hence no internal data coupling or control coupling to enumerate. The analysis reduces to the boundary between the Resilient task and its neighbors — which is identical to the boundary any C task of the same shape would present. The tool qualification question (DO-330) for Resilient itself is orthogonal and is addressed by the Z3 certificate story (see philosophy — verifiability).
ISO 26262 (road vehicles) — freedom from interference. The standard (Part 6, clause 7.4.8) requires evidence that a software element cannot cause another element to fail through shared resources, timing, or execution. For two ASIL-separated Resilient tasks to be argued free of interference, each is a separate Resilient program in its own address space (OS partition or MPU-enforced region), with a declared message- passing interface. Within a single Resilient program, the single-threaded execution model means that intra-program freedom from interference is structural rather than evidential. When multi-actor Resilient lands (see roadmap above), the argument has to be refreshed to cover the actor boundary — which is exactly why effect tracking (G18) is the prerequisite.
IEC 61508 / IEC 62304 / DO-178C common theme — determinism. Certification-grade code is required to behave deterministically as a function of its inputs and state. The JIT’s on-demand compilation violates this for the first call to each function; the VM’s allocator can violate it under heap pressure. An AOT build is the standard-compliant configuration. This document should be read as a promise that the design is heading there, not a claim that any build of Resilient today is qualification-ready.
Summary
- Today: one thread, three backends, no concurrency primitives in the language.
- ISR-safe at the boundary because Resilient never runs in an ISR and is never preempted mid-value by another Resilient execution.
- Live blocks are fault tolerance, not concurrency.
- Effect tracking (G18) is the named prerequisite for any concurrency work.
- The roadmap points to Erlang-style actors + supervisor trees, built on top of effect tracking and an AOT path.
- Real-time scheduling guarantees are a future work item gated on AOT; today, hard-RT code stays in C.
If you are evaluating Resilient for a safety-critical project: the honest pitch is “a single-threaded, verifiable, self-healing task that your RTOS runs alongside C code.” Anything richer is roadmap.
Further reading
- Design philosophy — the three pillars, including why “let it crash” is the model.
- Memory model — live-block snapshot / restore / retry details and value ownership across tiers.
- no_std runtime — the embedded build and its feature flags.
- ROADMAP.md — G18 (effect tracking) and the full goalpost ladder.
- GitHub Issues — the ticket ledger for everything above.