Hacker News Top 30 — 2026-04-27

5.EvanFlow – A TDD driven feedback loop for Claude Code

Sourcehttps://github.com/evanklem/evanflow

SiteGitHub

Submitterevanklem2004 (Hacker News)

Submitted2026-04-27 01:56 UTC (Hacker News)

HN activity10 points · 3 comments

Length1.9K words (~9 min read)

Languageen

A TDD-driven iterative feedback loop for software development. 16 cohesive Claude Code skills walk an idea from brainstorm → plan → execute → tdd → iterate, with checkpoints throughout. - evanklem/...

A TDD-driven iterative feedback loop for software development with Claude Code.

16 cohesive skills + 2 custom subagents walk an idea from brainstorm through implementation, with checkpoints throughout where you stay in control. One entry point: say "let's evanflow this" and the orchestrator runs the loop.

brainstorm → plan → execute (sequential or parallel) → tdd → iterate → STOP

The loop is conductor, not autopilot: real checkpoints at design approval, plan approval, and after iteration. The agent stops short of every git operation and waits for your direction. No auto-commits. No forced ceremony. No "must invoke a skill" tax.

Quick Install

The recommended path — Claude Code's plugin marketplace:

/plugin marketplace add evanklem/evanflow
/plugin install evanflow@evanflow

Restart, then try:

"Let's evanflow this — I want to add a small feature that does X."

evanflow-go fires and walks the loop. The git-guardrails hook auto-activates with the plugin (no settings.json edit needed). Skills appear under the evanflow: namespace (e.g., /evanflow:evanflow-go).

See Installation below for two alternative paths.

What Makes It a Feedback Loop

The loop is built around discipline that compounds across iterations, not single-shot generation. Every step has a checkpoint that gates the next:

Brainstorm clarifies intent, proposes 2–3 approaches with embedded grill (stress-test) → you approve the design
Plan maps file structure first (deep modules, deletion test) → you approve the plan
Execute runs task-by-task with inline verification → blockers stop the loop and surface to you
TDD is vertical-slice only: one failing test → minimal impl → repeat. Tests verify behavior through public interfaces, so they survive refactors
Iterate re-reads the diff with fresh eyes, runs quality checks, screenshots UI changes, and runs against a Five Failure Modes checklist (hallucinated actions, scope creep, cascading errors, context loss, tool misuse). Hard cap of 5 iterations
STOP. Report. Await your direction. The agent never auto-commits, never auto-stages, never proposes a PR

For plans with 3+ truly independent units, the loop forks into a parallel coder/overseer orchestration: one coder per unit (using vertical-slice TDD with a RED checkpoint), one overseer per coder (read-only review subagent that can't modify code), plus an integration overseer that runs named integration tests at every touchpoint. The integration tests are the executable contract — interfaces can't drift if both sides have to satisfy the same passing test.

Hard Rules Baked Into the Loop

Several rules come from 2025-2026 industry research on agentic coding failure modes and are baked into every skill:

Never invent values — file paths, env vars, IDs, function names, library APIs. If unsure, the agent stops and asks. (Action-hallucination is the most dangerous agent failure.)
Assertion-correctness warning — research shows 62% of LLM-generated test assertions are wrong. Both evanflow-tdd and the overseer review explicitly check whether a one-character bug in the implementation would still let the assertion pass.
Watch for context drift — evanflow-compact triggers when symptoms appear (re-asking established questions, contradicting earlier decisions). Industry data: ~65% of enterprise AI coding failures trace to context drift, not raw token exhaustion.
Five Failure Modes pass in iterate + overseer review — explicit check against hallucinated actions, scope creep, cascading errors, context loss, tool misuse.
No skill tax — ad-hoc questions don't require a skill invocation. Skills are tools, not a tollbooth.

The Skill Set

Default Loop (5 skills)

Skill	Purpose
`evanflow-brainstorming`	Clarify intent, propose 2–3 approaches with embedded grill (stress-test). Mockup quick-mode for visual-only requests.
`evanflow-writing-plans`	File structure first, bite-sized tasks, embedded grill. Step 2.5 offers `evanflow-coder-overseer` if the plan is parallelizable.
`evanflow-executing-plans`	Task-by-task with inline verification. Step 0 re-offers parallel path. Hands off to iterate, then STOPS.
`evanflow-tdd`	Vertical-slice TDD. One test → one impl → repeat. Behavior through public interface. Assertion-correctness warning.
`evanflow-iterate`	Self-review loop after implementation. Re-read diff, fix issues, run quality checks, screenshot UI (via headless Chromium). Five Failure Modes checklist. Hard cap of 5 iterations.

Special-Purpose (8 skills)

Skill	Purpose
`evanflow-go`	Single entry point. Say "let's evanflow this" and it walks the whole loop.
`evanflow-glossary`	Extract canonical domain terms into `CONTEXT.md`. Flag ambiguities and synonyms.
`evanflow-improve-architecture`	Surface refactor opportunities via the deletion test + deep-modules vocabulary.
`evanflow-design-interface`	"Design it twice" — spawn 3+ parallel sub-agents with radically different constraints, compare on depth/simplicity/efficiency.
`evanflow-debug`	Root-cause discipline. Hypothesis stated explicitly, embedded grill before fixing, failing test first.
`evanflow-review`	Both halves of code review (giving + receiving). Don't capitulate to feedback you can't justify.
`evanflow-prd`	Synthesize a PRD from existing context. For substantial new features.
`evanflow-qa`	Conversational bug discovery → issue draft. Asks before filing.

Cross-Cutting (1 skill)

Skill	Purpose
`evanflow-compact`	Long-session context management. Strategies for proactive summarization at clean boundaries. Drift symptoms checklist.

Meta (1 skill)

Skill	Purpose
`evanflow`	The index. Shared vocabulary + when to invoke each `evanflow-*` skill.

Custom Subagents (2)

In agents/ — invoked via Agent tool with subagent_type: parameter:

Subagent	Tool restrictions	Purpose
`evanflow-coder`	Read, Edit, Write, Glob, Grep, Bash, TodoWrite	Implementation subagent for `evanflow-coder-overseer`. Tools + system prompt prevent git ops, out-of-scope edits, value hallucination.
`evanflow-overseer`	Read, Grep, Glob (no Edit/Write/Bash)	Read-only review subagent. Tools physically enforce "report findings, never fix."

Bundled Hook

hooks/block-dangerous-git.sh — PreToolUse hook that blocks destructive git ops (git push, git reset --hard, git clean -f, git branch -D, git checkout ., git restore .). Auto-activates with the plugin install path.

Hard Rules (apply to every skill)

Never auto-commit, never auto-stage, never auto-finish. Every git write op requires you to explicitly ask in the current turn.
Never invent values. File paths, env vars, IDs, function names, library APIs — if unsure, the agent stops and asks.
No skill tax. Ad-hoc questions don't require a skill invocation. Skills are tools, not a tollbooth.
No forced spec/plan paths. Files live where you want them.
Verify before claiming done. Quality checks (typecheck, lint, test) run before any "done" report.

Requirements

Claude Code (any recent version)
Bash — for the bundled hook script (Linux, macOS, or Windows + WSL)
jq — used by the hook script to parse Claude's JSON tool input. Install via apt install jq, brew install jq, or your platform's package manager. If jq is missing, the guardrail hook fails silently and dangerous git ops are NOT blocked.

Optional but recommended:

chromium or google-chrome — for evanflow-iterate's visual verification of UI changes (chromium --headless --screenshot=...). Falls back gracefully if missing — the skill flags it and asks you to verify visually.

Installation

Three paths, in priority order. All three end with the same skill set in your .claude/skills/. The plugin path additionally auto-wires the guardrail hook.

Path 1 — Claude Code Plugin Marketplace (recommended)

This is the cleanest install. Skills, agents, AND the guardrail hook all activate automatically.

/plugin marketplace add evanklem/evanflow
/plugin install evanflow@evanflow

Restart Claude Code (or /reload-plugins). Skills appear namespaced as /evanflow:evanflow-go, /evanflow:evanflow-tdd, etc. Auto-invocation via "let's evanflow this" still works regardless of namespace.

To uninstall: /plugin uninstall evanflow@evanflow.

Path 2 — `npx skills@latest add` CLI

Works against any GitHub repo with SKILL.md-shaped folders. Installs skills only — does not install the guardrail hook or custom subagents (you'd add those manually if you want them).

# Install all 16 skills at once
npx skills@latest add evanklem/evanflow -s '*' -y

# Or install individual skills
npx skills@latest add evanklem/evanflow/evanflow-go
npx skills@latest add evanklem/evanflow/evanflow-tdd
# ...

This places skills under ~/.claude/skills/ (global) or .claude/skills/ (project, auto-detected).

Path 3 — Manual Copy

For users who want full control, no CLI dependencies.

git clone https://github.com/evanklem/evanflow.git
cd evanflow

# Skills (project-level — adjust to ~/.claude/skills/ for global)
mkdir -p .claude/skills
cp -r skills/* .claude/skills/

# Agents (custom subagents used by evanflow-coder-overseer)
mkdir -p .claude/agents
cp agents/*.md .claude/agents/

# Git guardrails hook (optional but recommended)
mkdir -p .claude/hooks
cp hooks/block-dangerous-git.sh .claude/hooks/
chmod +x .claude/hooks/block-dangerous-git.sh

Then register the hook in your .claude/settings.json:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/block-dangerous-git.sh"
          }
        ]
      }
    ]
  }
}

Optionally, paste examples/CLAUDE.md.snippet into your project's CLAUDE.md to brief Claude about EvanFlow's conventions.

Verify Any Install Path

Restart Claude Code. Try saying:

"Let's evanflow this — I want to add a small feature that does X."

evanflow-go should fire and walk you through the loop. To verify the guardrail hook (paths 1 and 3 only): try git reset --hard HEAD from the Bash tool — it should be blocked with "BLOCKED: ... matches dangerous pattern".

Customization

Every skill has a clear structure with a ## Hard Rules section. To adapt to your project:

Replace <frontend> and <backend> placeholders in skills like evanflow-writing-plans with your actual paths if you find yourself answering the same question repeatedly.
Document your project's quality checks in your CLAUDE.md — exact typecheck, lint, and test commands. The skills reference these abstractly.
Adapt the visual verification step in evanflow-iterate if you don't have chromium available — substitute google-chrome --headless or another tool.
Edit the cohesion contract template in evanflow-coder-overseer to match your project's conventions (your authentication middleware name, your DB write helper, etc.).

The skills are designed to be edited. Treat them as starting points, not gospel.

If you fork to make a vendor-specific variant (your-name-flow), great — that's the spirit.

How EvanFlow Works End-to-End

You say: "let's evanflow this — I want to add a feature that does X"
           │
           ▼
       evanflow-go (the conductor)
           │
           ├─ Phase 0: Restate idea, scope check
           ├─ Phase 1: evanflow-brainstorming (CHECKPOINT: design approval)
           ├─ Phase 2: evanflow-writing-plans (CHECKPOINT: plan approval)
           │            └─ Step 2.5: parallelization check
           ├─ Phase 3: evanflow-executing-plans (sequential)
           │            OR
           │            evanflow-coder-overseer (parallel)
           │              ├─ contract with named tests + integration tests
           │              ├─ RED checkpoint (all coders write failing tests, orchestrator verifies)
           │              ├─ GREEN phase (vertical-slice TDD per coder)
           │              ├─ per-coder overseers (review, never fix)
           │              └─ integration overseer (runs touchpoint tests)
           ├─ Phase 4: evanflow-iterate (5x cap, Five Failure Modes pass)
           └─ Phase 5: STOP. Report what was done. Await your direction.

Cross-cutting: evanflow-compact runs at clean boundaries when context gets heavy.

Special-purpose skills (evanflow-debug, evanflow-improve-architecture, evanflow-design-interface, evanflow-glossary, evanflow-prd, evanflow-qa, evanflow-review) are pulled in mid-flow when relevant.

Repository Structure

.
├── .claude-plugin/
│   ├── plugin.json          — plugin identity (name, description, version)
│   └── marketplace.json     — marketplace manifest (lists EvanFlow as one bundled plugin)
├── skills/                  — 16 SKILL.md folders
│   ├── evanflow/
│   ├── evanflow-go/
│   ├── evanflow-brainstorming/
│   ... (etc)
├── agents/                  — 2 custom subagent definitions
│   ├── evanflow-coder.md
│   └── evanflow-overseer.md
├── hooks/
│   ├── hooks.json           — auto-activated when plugin installs
│   └── block-dangerous-git.sh
├── examples/
│   └── CLAUDE.md.snippet    — for the manual-copy install path
├── docs/
│   └── skills-audit.md      — verdict on all 38 candidate skills considered
├── README.md
└── LICENSE                  — MIT

Credits

EvanFlow synthesizes ideas from:

mattpocock/skills by Matt Pocock — vertical-slice TDD, deep modules, deletion test, design-it-twice, ubiquitous language, grill-me, caveman.
superpowers by Jesse Vincent — verification-before-completion, code review patterns, parallel agent dispatch, finishing-a-development-branch (the 4-option presentation).
git-guardrails-claude-code — bundled in hooks/ (script copied verbatim). Original by Matt Pocock.

Industry research informing the design:

Anthropic's 2026 Agentic Coding Trends Report
9 Critical Failure Patterns of Coding Agents (DAPLab, Columbia)
Test-Driven Development for Code Generation (arXiv 2402.13521) — assertion-correctness findings

License

MIT. See LICENSE.

Contributing

Issues and pull requests welcome. EvanFlow is opinionated by design — proposals to add ceremony or auto-actions will be politely declined. Proposals to further reduce ceremony, sharpen rules, or add evidence-backed improvements are very welcome.

↑ top

6.Fast16: High-precision software sabotage 5 years before Stuxnet

Sourcehttps://www.sentinelone.com/labs/fast16-mystery-shadowbrokers-reference-reveals-high-precision-software-sabotage-5-years-before-stuxnet/

SiteSentinelOne

AuthorVitaly Kamluk & Juan Andrés Guerrero-Saade

Submitted2026-04-26 20:18 UTC (Hacker News)

HN activity174 points · 46 comments

Length6.8K words (~30 min read)

Languageen

A previously unknown 2005 cyber sabotage framework patches high-precision calculation software in memory to silently corrupt results.

Executive Summary

SentinelLABS has uncovered a previously undocumented cyber sabotage framework whose core components date back to 2005, tracked as fast16.
fast16.sys selectively targets high-precision calculation software, patching code in memory to tamper with results. By combining this payload with self-propagation mechanisms, the attackers aim to produce equivalent inaccurate calculations across an entire facility.
This 2005 attack is a harbinger for sabotage operations targeting ultra expensive high-precision computing workloads of national importance like advanced physics, cryptographic, and nuclear research workloads.
fast16 predates Stuxnet by at least five years, and stands as the first operation of its kind. The use of an embedded customized Lua virtual machine predates the earliest Flame samples by three years.
The name ‘fast16’ is referenced in the infamous ShadowBrokers’ leak of NSA’s ‘Territorial Dispute’ components. An evasion signature instructs operators: “fast16 *** Nothing to see here – carry on ***”

Overview

Our investigation into fast16 starts with an architectural hunch. A certain tier of apex threat actors has consistently relied on embedded scripting engines as a means of modularity. Flame, Animal Farm’s Bunny, ‘PlexingEagle’, Flame 2.0, and Project Sauron each built platforms around the extensibility and modularity of an embedded Lua VM. We wanted to determine whether that development style arose from a shared source, so we set out to trace the earliest sophisticated use of an embedded Lua engine in Windows malware.

Lua is a lightweight scripting language with a native proficiency for extending C/C++ functionality. Given the appeal of C++ for reliable high-end malware frameworks, this capability is indispensable to avoid having to recompile entire implant components to add functionality to already infected machines. We did not find an indication of direct shared provenance, but our investigation did uncover the oldest instance of this modern attack architecture.

Lua leaves a distinctive fingerprint. Compiled bytecode containers start with the magic bytes 1B 4C 75 61 (\x1bLua), followed by a version byte, and the engine typically exposes a characteristic C API and environment variables such as LUA_PATH. Hunting for these traits across mid-2000s malware collections surfaced a sample that initially looked unremarkable: svcmgmt.exe.

svcmgmt.exe | A 2005 Lua-Powered Service Binary

On the surface, svcmgmt.exe appears to be a generic console‑mode service wrapper from the Windows 2000/XP era.

Filename	svcmgmt.exe
Filesize	315,392 bytes
MD5	dbe51eabebf9d4ef9581ef99844a2944
SHA1	de584703c78a60a56028f9834086facd1401b355
SHA256	9a10e1faa86a5d39417cae44da5adf38824dfb9a16432e34df766aa1dc9e3525
Type	PE32 executable for MS Windows 4.00 (console), Intel i386
Link Time	2005-08-30 18:15:06 UTC

A closer look reveals an embedded Lua 5.0 virtual machine and an encrypted bytecode container unpacked by the service entry point.

The developers extended the Lua environment to include:

a wstring module for native unicode handling
a built‑in symmetric cipher, exposed through a function commonly labelled b, used to decrypt embedded data
multiple modules that bind directly into Windows NT filesystem, registry, service control, and network APIs.

Even by itself, svcmgmt.exe already looks like an early high-end implant, a modular service binary that hands most of its logic to encrypted Lua bytecode. The binary includes a crucial detail: a PDB path that links the binary to the kernel driver fast16.sys.

fast16 | A Nagging Mystery from the ShadowBrokers Leak

Buried in the binary’s strings is a PDB reference:

C:\buildy\driver\fd\i386\fast16.pdb

At first glance, the path is structured like any other compiler artifact: an internal build directory, a component name (fast16), and an architecture hint (i386). However, in this case there’s a mismatch. The string appears inside of a service-mode executable, and yet the driver\fd\i386\fast16 segment of the pdb string clearly refers to a kernel driver project.

Following that clue led us to a second binary, fast16.sys:

Filename	fast16.sys
Filesize	44,580 bytes
MD5	0ff6abe0252d4f37a196a1231fae5f26
SHA256	07c69fc33271cf5a2ce03ac1fed7a3b16357aec093c5bf9ef61fbfa4348d0529
Type	PE32 executable for MS Windows 5.00 (native), Intel i386, 5 sections
Link Time	2005-07-19 15:15:41 UTC (0x42dd191d)

This kernel driver is a boot-start filesystem component that intercepts and modifies executable code as it’s read from disk. Although a driver of this age will not run on Windows 7 or later, for its time fast16.sys was a cut above commodity rootkits thanks to its position in the storage stack, control over filesystem I/O, and rule-based code patching functionality.

In April 2017, almost 12 years after the compilation timestamp, the same filename, “fast16” appeared in the ShadowBrokers leak. Dr. Boldizsár Bencsáth’s research into Territorial Dispute points to a text file, drv_list.txt. The 250KB file is a short list of driver names used to mark potential implants cyber operators might encounter on a target box as “friendly” or to “pull back” in order to avoid clashes with competing nation-state hacking operations.

Screenshot from Crysys Lab’s ShadowBrokers leak analysis paper

The guidance for one particular driver, ‘fast16’, stands out as both unique and particularly unusual.

The string inside svcmgmt.exe provided the key forensic link in this investigation. The pdb path connects the 2017 leak of deconfliction signatures used by NSA operators with a multi-modal Lua‑powered ‘carrier’ module compiled in 2005, and ultimately its stealthy payload: a kernel driver designed for precision sabotage.

svcmgmt.exe | Architecture of the Carrier

The core component of fast16, svcmgmt.exe, functions as a highly adaptable carrier module, changing its operational mode based on command-line arguments.

No arguments: Runs as a Windows service.
-p: Sets InstallFlag = 1 and runs as a service (Propagate/Install & Run).
-i: Sets InstallFlag = 1 and executes Lua code (Install & Execute Lua).
-r: Executes Lua code without setting the install flag (Execute Lua).
Any other argument (<filename>): Interprets as a filename, and spawns two children: the original command and one with the -r argument (Wrapper/Proxy Mode).

Internally, svcmgmt.exe stores three distinct payloads, including encrypted Lua bytecode that handles configuration, its propagation and coordination logic, auxiliary ConnotifyDLL, and the fast16.sys kernel driver.

Composition of the Carrier payload

By separating a relatively stable execution wrapper from encrypted, task-specific payloads, the developers created a reusable, compartmentalized framework that they could adapt to different target environments and operational objectives while leaving the outer carrier binary largely unchanged across campaigns.

The Wormlets and Early Evasion Architecture

The early 2000s saw a large number of network worms. Most were written by enthusiasts, spread quickly, and carried little or no meaningful payload. fast16 originates from the same period but follows a completely different pattern indicative of its provenance as state-level tooling. It’s the first recorded Lua-based network worm, and was built with a highly specific mission.

The carrier was designed to act like cluster munition in software form, able to carry multiple wormable payloads, referred to internally as ‘wormlets’. The svcmgmt.exe module performs the following steps:

Prepares the configuration, defining the payload path, service details, and target IP ranges.
Converts the configuration values to wide-character strings for the C layer.
Escalates privileges and installs the carrier executable as the SvcMgmt service, then starts it.
Optionally, based on the configuration setting, deploy the kernel driver implant fast16.sys.
Releases the wormlets. In this particular configuration, only one wormlet slot is populated with an SCM wormlet that looks for network servers, copies the payload over a network share and starts that remote service.
Repeats the process indefinitely, sleeping for the configured initial delay between waves, until a failure threshold or external kill condition is reached.

The wormlets were stored in the carrier’s internal storage:

The single deployed wormlet found in svcmgmt.exe (the SCM wormlet) exemplifies a simple but effective propagation strategy based on native Windows capabilities and weak network security. It targets Windows 2000/XP environments and relies on default or weak administrative passwords on file shares. All spreading is done through standard Windows service-control and file-sharing APIs, an early example of propagation that leans on built-in administration features rather than custom network protocols.

Before this workflow runs, a pre-installation kill-switch checks the environment. The ok_to_install() routine calls ok_to_propagate() and propagation is only allowed if it’s manually forced or if it’s made sure common security products aren’t found by checking for associated registry keys. The routine walks a list of vendor keys and aborts installation if any of them are present, preventing deployment into monitored environments.

For tooling of this age, that level of environmental awareness is notable. While the list of products may not seem comprehensive, it likely reflects the products the operators expected to be present in their target networks whose detection technology would threaten the stealthiness of a covert operation:

HKLM\SOFTWARE\Symantec\InstalledApps
HKLM\SOFTWARE\Sygate Technologies, Inc.\Sygate Personal Firewall
HKLM\SOFTWARE\TrendMicro\PFW
HKLM\SOFTWARE\Zone Labs\TrueVector
HKLM\SOFTWARE\F-Secure
HKLM\SOFTWARE\Network Ice\BlackIce
HKLM\SOFTWARE\McAfee.com\Personal Firewall
HKLM\SOFTWARE\ComputerAssociates\eTrust EZ Armor
HKLM\SOFTWARE\RedCannon\Fireball
HKLM\SOFTWARE\Kerio\Personal Firewall 4
HKLM\SOFTWARE\KasperskyLab\InstalledProducts\Kaspersky Anti-Hacker
HKLM\SOFTWARE\Tiny Software\Tiny Firewall
HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\Look n Stop 2.05p2
HKCU\SOFTWARE\Soft4Ever
HKLM\SOFTWARE\Norman Data Defense Systems
HKLM\SOFTWARE\Agnitum\Outpost Firewall
HKLM\SOFTWARE\Panda Software\Firewall
HKLM\SOFTWARE\InfoTeCS\TermiNET

A separate user-mode component, svcmgmt.dll, provides a minimal reporting channel. Contained within the carrier’s internal storage, this DLL is registered through the Windows AddConnectNotify() API so that it’s called each time the system establishes a new network connection using the Remote Access Service (RAS), responsible for dial-up connections and early VPNs in the 2000s.

Module Name	User Module (connotifydll)
Filename	svcmgmt.dll
Filesize	45056 bytes
MD5	410eddfc19de44249897986ecc8ac449
SHA256	8fcb4d3d4df61719ee3da98241393779290e0efcd88a49e363e2a2dfbc04dae9
Link Time	2005-06-06 18:42:45 UTC
Type	PE32 DLL (i386, 4 sections)

When invoked, the DLL decodes an obfuscated string to obtain the named pipe \\.\pipe\p577, attempts to connect to the local pipe, and writes the remote and local connection names to the pipe before closing it. The module doesn’t run independently and must be registered by a host process.

fast16.sys | A Filesystem Driver for Precision Sabotage

The kernel driver fast16.sys is the most potent component of the framework.

The driver is configured with Start=0 (boot) and Type=2 (filesystem driver) in the SCSI class group. It loads automatically at an early stage, alongside disk device drivers, and inserts itself above each filesystem device (NTFS, FAT, MRxSMB). On entry it:

disables the Windows Prefetcher by setting the EnablePrefetcher value to 0 under the Session Manager’s PrefetchParameters key, forcing subsequent code‑page requests through the full filesystem stack,
resolves kernel APIs dynamically using a simple XOR‑based string cipher and a scan of ntoskrnl.exe, and
exposes \Device\fast16 and \??\fast16 with a custom DeviceType value 0xA57C, which serves as a secondary forensic marker.

The driver registers with IoRegisterFsRegistrationChange so it can attach a worker device object on top of every active and newly created filesystem device. All relevant I/O Request Packets, including IRP_MJ_CREATE, IRP_MJ_READ, IRP_MJ_CLOSE, IRP_MJ_QUERY_INFORMATION, IRP_MJ_FILE_SYSTEM_CONTROL, and associated Fast I/O paths, are routed through these worker devices.

Despite loading at boot, the kernel‑level code injection engine is only activated after the system opens explorer.exe. This design defers expensive monitoring and patching until the desktop environment is available and avoids unnecessary impact on core boot performance.

Narrow Targeting via Intel Compiler Artefacts

Once activated, fast16.sys focuses on executable files. A file is a valid target if it meets two criteria:

The filename ends with .EXE.
Immediately after the last PE section header, there is a printable ASCII string starting with Intel.

This selection logic points to executables compiled with the Intel C/C++ compiler, which often placed compiler metadata in that region. It indicates that the developers knew their target software was built with this toolchain.

For files meeting these criteria, the driver performs a PE header modification in memory. It injects two additional sections, .xdata and .pdata, and fills them with bytes from the original code section, increasing the section count and keeping a clean copy of the code. The intent is likely to increase stability while still allowing extensive patching, although without identifying the original target binaries this remains an informed hypothesis.

Rule‑Driven Patching and Floating‑Point Corruption

The patching engine is a minimalist, performance‑optimised, stateful scanning and modification tool. It is configured with a set of 101 rules, each containing pattern matching and replacement logic. To maintain performance, the engine:

uses a 256‑byte dispatch array and only flags the starting byte values of a small number of unique patterns,
allows wildcards inside patterns so a single rule can match several compiler‑optimised variants of the same code, and
supports state flags that some rules can set or check, enabling multi‑stage modification sequences similar to those used by advanced antivirus scanning engines.

Most patched patterns correspond to standard x86 code used for hijacking or influencing execution flow. One injected block is different. It’s a larger and complex sequence of Floating Point Unit instructions dedicated to precision arithmetic and scaling values in internal arrays. This code is a standalone mathematical calculation function unrelated to code flow hijacking or any other typical malicious code injection.

To understand what the driver expected to see, we converted the patching rules into hexadecimal YARA signatures and ran them against a large, period‑appropriate corpus. The results showed a very low hit rate: fewer than ten files matched two or more patterns. Those matches, however, shared a clear theme. They were precision calculation tools in specialised domains such as civil engineering, physics and physical process simulations.

The FPU patch in fast16.sys was written to corrupt these routines in a controlled way, producing alternative outputs. This moves fast16 out of the realm of generic espionage tooling and into the category of strategic sabotage. By introducing small but systematic errors into physical‑world calculations, the framework could undermine or slow scientific research programs, degrade engineered systems over time or even contribute to catastrophic damage.

A sabotage operation of this kind would be foiled by verifying calculations on a separate system. In an environment where multiple systems shared the same network and security posture, the wormable carrier would deploy the malicious driver module to those systems as well, reducing the chance that an independent calculation would diverge from the corrupted output.

At this time, we’ve been unable to identify all of the target binaries in order to understand the nature of the intended sabotage. We welcome the contributions of the larger infosec research community and have included YARA rules to hunt for these patterns in the appendix below.

The Data Patching Engine

Even after deep analysis, fast16’s driver looks deceptively simple. Beneath that minimal code is a rule-driven in-memory engine that quietly patches executable code as files are read from disk.

The engine relies on a compact set of just over a hundred pattern-matching rules and a small dispatch table so it only inspects bytes that are likely to matter. Most patterns correspond to ordinary x86 instructions, but one stands out: a larger block of floating-point (FPU) code dedicated to precision arithmetic. This injected routine scales values in three internal arrays passed into the function, subtly changing calculations.

Without knowing the exact binaries and workloads being patched, we can’t fully resolve what those arrays represent, only that the goal is to tamper with numerical results, not unauthorized access, malware propagation or other common malware objectives.

The Patch Targets

Our best clues about the intended victims come from matching these patterns against large, era-appropriate software corpora. The strongest overlaps point to three high-precision engineering and simulation suites from the mid-2000s: LS-DYNA 970, PKPM, and the MOHID hydrodynamic modeling platform, all used for scenarios like crash testing, structural analysis, and environmental modeling.

LS-DYNA in particular has been cited in public reporting on Iran’s suspected violations of Section T of the JCPOA, in studies of computer modeling relevant to nuclear weapons development.

Use of LS-DYNA code to research explosive payloads for Iran’s AMAD program

Compiler Footprints and Lineage

As we sought to understand the lineage of this unusual set of components, we noticed a quirk. Strings of the form @(#)par.h $Revision: 1.3 $ inside the binaries point to an unusual source‑control convention. The @(#) prefix is characteristic of early Unix Source Code Control System (SCCS) or Revision Control System (RCS) tooling from the 1970s and 1980s. These markers do not affect execution and are redundant in modern Windows kernel drivers.

Finding SCCS/RCS artefacts in mid‑2000s Windows code is rare. It strongly suggests that the authors of this framework were not typical Windows‑only developers. Instead, they appear to have been long‑term engineers whose culture and toolchain came from older, high‑security Unix environments, often associated with government or military‑grade work. This detail supports the view that fast16 came from a well‑resourced, long‑running development program.

A Digital Fossil with Modern Implications

svcmgmt.exe was uploaded to VirusTotal nearly a decade ago. It still receives almost no detections: one engine classifies it as generally malicious, and even that with limited confidence. For a stealthy self-propagating carrier that deploys one of the most sophisticated sabotage drivers of its era, that detection record is notable.

Together with its appearance in the ShadowBrokers ‘Territorial Dispute’ (TeDi) signatures, fast16 forces a re‑evaluation of our historical understanding of the timeline of development for serious covert cyber sabotage operations. The code shows that:

state‑grade cybersabotage against physical targets was fully developed and deployed by the mid‑2000s,
embedded scripting engines, narrow compiler‑based targeting and kernel‑level patching formed a coherent architecture well ahead of better‑known families, and
some of the most important offensive capabilities in the ecosystem may still sit in collections as ‘old but interesting’ samples lacking the context to highlight their true significance.

Internally, the operation leaves very little in the way of branding. One of the few human‑readable labels is wry and understated:

*** Nothing to see here – carry on ***

For many years there were no public write-ups, no named campaign and no headline incident linked to this framework.

In the broader picture of APT evolution, fast16 bridges the gap between early, largely invisible development programs and later, more widely documented Lua‑ and LuaJIT‑based toolkits. It is a reference point for understanding how advanced actors think about long‑term implants, sabotage, and a state’s ability to reshape the physical world through software. fast16 was the silent harbinger of a new form of statecraft, successful in its covertness until today.

Acknowledgements

SentinelLABS would like to thank Silas Cutler and Costin Raiu for their contributions along the way. We dedicate this research to the memory of Sergey Mineev, APT hunter extraordinaire, who pioneered many of the techniques that enabled this discovery.

Appendix: Patching Engine Patterns and Target Candidates

Extracted Match Patterns

48 89 84 24 9C 00 00 00 4B 0F 8F 79 FF FF FF 00
D8 E1 D9 5D FC D9 04 00
55 8B EC 83 EC 14 53 56 57 8B 3D ?? ?? ?? ?? 8B 0D 00
89 4D C8 8B FB 8B C8 00
8B 4C 24 0C 8B 01 83 F8 63 00
39 2D ?? ?? ?? ?? 0F 84 F4 00 00 00 8B 35 ?? ?? ?? ?? 2B 35
7C 02 89 C6 89 35 ?? ?? ?? ?? 89 B4 24 D0
83 3D ?? ?? ?? ?? 00 0F 84 70 BD FF FF 00
BE 07 00 00 00 BF 04 00 00 00 BB 02 00 00 00 00
8B 4D 10 C1 E2 04 8B 19 83 EA 30 8B CB 49
8D 1D ?? ?? ?? ?? 52 8D 05 ?? ?? ?? ?? 51 8D 15 ?? ?? ?? ?? 8D 0D ?? ?? ?? ?? 53 50 52 51 56 57 E8 ?? ?? ?? ?? 83 C4 38 EB 0E 83 EC 04 00
0F 8F A5 00 00 00 A1 ?? ?? ?? ?? 83 F8 14 7D 0D
8B 5D B0 0F 85 ?? ?? ?? ?? 8D 34 9D ?? ?? ?? ?? 8D 14 9D 00 0F 8E 1B 03 00 00 D9 05
8B 45 44 6B 00 04 D9 05 ?? ?? ?? ?? D8 B0
E9 7E 04 00 00 8B 74 24 1C 8B 54 24 14 85
83 39 63 0F 85 21 03 00 00 8B EE 85 F6 0F
85 DB 8B 55 D4 75 2C 89 35 00
75 18 8D 35 ?? ?? ?? ?? 56 8D 3D 00
8D 1D ?? ?? ?? ?? 52 8D 05 ?? ?? ?? ?? 51 8D 15 ?? ?? ?? ?? 8D 0D ?? ?? ?? ?? 53 50 52 51 56 57 E8 ?? ?? ?? ?? EB 0E 83 EC 04 56 57 53 E8 95 00
D8 34 85 ?? ?? ?? ?? 8B 44 ?? ?? 8B CA 00
8B 5D 0C 8B 55 08 8B 36 8B 00
8D 04 BD ?? ?? ?? ?? 03 DF 00
8B EE 85 F6 0F 8E ?? ?? ?? ?? 8D 1C BD 00
D9 04 9D ?? ?? ?? ?? 83 ED 04 05 10 00 00 00 D8 0D 00
75 2C 89 35 ?? ?? ?? ?? 89 05 ?? ?? ?? ?? 89 15
89 55 F4 8B F9 8B D3 03 FB C1 E2 02 89 35
40 23 72 65 63 24 65 69 69 6E 20 2E 30 24 D9 5D 00 D9 03 D8 0D ?? ?? ?? ?? D8 0D 00
DF E0 F6 C4 41 A1 ?? ?? ?? ?? 74 5A
FF 35 ?? ?? ?? ?? E8 ?? ?? ?? ?? 9D D9 E0 D9 1D ?? ?? ?? ?? 8B 4C
6A 46 68 ?? ?? ?? ?? E8 ?? ?? ?? ?? 6A 03
D8 05 ?? ?? ?? ?? D9 55 00 9C
C2 08 00 A1 ?? ?? ?? ?? 8B 0C 85 ?? ?? ?? ?? 89 0E 00
83 EC 04 53 E8 ?? ?? ?? ?? EB 09 83 EC 04 53 00
D8 1D ?? ?? ?? ?? DF E0 F6 C4 41 B8 00 00 00 00 75 05 B8 01 00 00 00 85 C0 74 11 6A 29 00
2B DA 89 3C 03 83 3D 00
D9 5D C0 8B 4D C0 D9 45 E0 89 0E 00
8B 05 ?? ?? ?? ?? 8B 0D ?? ?? ?? ?? 0F 85 7E 00 00 00 0F AF 15 00
B9 01 00 00 00 C1 E7 02 8B BF ?? ?? ?? ?? 8B D7 85 FF 8B 55 30 8B 45 30 D8 C9 8B 75 2C 00 9A 8B 00 00 00 1B 00 90 0F 94 C3 0B D8 33 D2 83 3D 00
2B FB 8B DE C1 E3 02 89 7D A0 03 5D A0 8B 03 F7 F7 DB 0C 02 89 35
0F 0F 94 C0 23 C3 33 D2
8B 55 30 8B 75 2C D8 C9 8B 45 30 00
DD 05 ?? ?? ?? ?? 8B 05 ?? ?? ?? ?? 8B 15 ?? ?? ?? ?? 0F AF 05 ?? ?? ?? ?? 8B 1D ?? ?? ?? ?? 0F AF 15
68 28 00 00 00 57 E8 ?? ?? ?? ?? 8B 1D ?? ?? ?? ?? 8B 35 ?? ?? ?? ?? 0F AF 1D ?? ?? ?? ?? 8B 3D ?? ?? ?? ?? 8B 05
8B 75 38 8B 4D 34 D8 C9 8B 00
8B 55 88 8B 5D B0 83 7D 84 01
55 8B EC 83 EC 2C 33 D2 53 56 57 8B
55 8B EC 83 EC 2C B9 46 00 00 00 53 56 57 8B 00

Patch Target Candidate 1: LS-DYNA 970 Software Suite

The LS-DYNA suite is powerful engineering simulation software used to analyze how materials and structures behave under extreme conditions. The tool is used by engineers to simulate physical events and model conditions while avoiding expensive or dangerous experiments.

LS-DYNA is designed for handling dynamic, complex events that occur at speed, such as car crashes, explosions, impacts, metal forming, and manufacturing processes. It was commonly used by automotive companies, aerospace engineering, defense and military research, as well as manufacturing and materials science applications. LS-DYNA has been in development since 1976.

MD5	1d2f32c57ae2f2013f513d342925e972
SHA1	2fa28ef1c6744bdc2021abd4048eefc777dccf22
SHA256	5966513a12a5601b262c4ee4d3e32091feb05b666951d06431c30a8cece83010
File Size	5,225,591 bytes
Link time	2003-10-24 16:34:57 UTC
File Type	PE32 executable for MS Windows 4.00 (console), Intel i386, 7 sections

Patch Target Candidate 2: PKPM Software Suite

Practical Structural Design and Construction Software (PKPM) is a structural engineering CAD software suite widely used in China for building design. The suite comprises multiple executable modules covering the full lifecycle of structural building design, from structural layout and concrete shear design for beams and columns to seismic, wind, and load analysis for high-rise buildings.

PKPM’s core analysis engine, SATWE (Space Analysis of Tridimensional Wired Elements), handles tridimensional structural analysis across floors, beams, columns, walls, and frames. PKPM sees extensive use in Chinese civil engineering.

PKPM Concrete Code Shear Design Module

MD5	af4461a149bfd2ba566f2abefe7dcde4
SHA1	586edef41c3b3fba87bf0f0346c7e402f86fc11e
SHA256	09ca719e06a526f70aadf34fb66b136ed20f923776e6b33a33a9059ef674da22
File Size	7716864 bytes
File Type	PE32 executable for MS Windows 4.00 (GUI), Intel i386, 6 sections
Link Time	2011-08-26 10:58:17 UTC

PKPM Building Structure CAD Modules

MD5	49a8934ccd34e2aaae6ea1e6a6313ffe
SHA1	3ce5b358c2ddd116ac9582efbb38354809999cb5
SHA256	8b018452fdd64c346af4d97da420681e2e0b55b8c9ce2b8de75e330993b759a0
File Size	11849728 bytes
File Type	PE32 executable for MS Windows 4.00 (GUI), Intel i386, 4 sections
Link Time	2005-12-01 08:35:46 UTC

MD5	e0c10106626711f287ff91c0d6314407
SHA1	650fc6b3e4f62ecdc1ec5728f36bb46ba0f74d05
SHA256	06361562cc53d759fb5a4c2b7aac348e4d23fe59be3b2871b14678365283ca47
File Size	16355328 bytes
File Type	PE32 executable for MS Windows 4.00 (GUI), Intel i386, 5 sections
Link Time	2012-07-07 08:47:11 UTC

PKPM SATWE Structural Analysis Engine

MD5	2717b58246237b35d44ef2e49712d3a2
SHA1	d475ace24b9aedebf431efc68f9db32d5ae761bd
SHA256	bd04715c5c43c862c38a4ad6c2167ad082a352881e04a35117af9bbfad8e5613
File Size	9908224 bytes
File Type	PE32 executable for MS Windows 4.00 (GUI), Intel i386, 6 sections
Link Time	2011-01-12 06:37:39 UTC

MD5	daea40562458fc7ae1adb812137d3d05
SHA1	1ce1111702b765f5c4d09315ff1f0d914f7e5c70
SHA256	da2b170994031477091be89c8835ff9db1a5304f3f2f25344654f44d0430ced1
File Size	8454144 bytes
File Type	PE32 executable for MS Windows 4.00 (GUI), Intel i386, 7 sections
Link Time	2012-11-29 03:10:12 UTC

MD5	2740a703859cbd8b43425d4a2cacb5ec
SHA1	ca665b59bc590292f94c23e04fa458f90d7b20c9
SHA256	aeaa389453f04a9e79ff6c8b7b66db7b65d4aaffc6cac0bd7957257a30468e33
File Size	16568320 bytes
File Type	PE32 executable for MS Windows 4.00 (GUI), Intel i386, 5 sections
Link Time	2014-12-30 03:23:43 UTC

MD5	ebff5b7d4c5becb8715009df596c5a91
SHA1	829f8be65dfe159d2b0dc7ee7a61a017acb54b7b
SHA256	37414d9ca87a132ec5081f3e7590d04498237746f9a7479c6b443accee17a062
File Size	8089600 bytes
File Type	PE32 executable for MS Windows 4.00 (GUI), Intel i386, 6 sections
Link Time	2009-04-22 01:46:46 UTC

MD5	cb66a4d52a30bfcd980fe50e7e3f73f0
SHA1	e6018cd482c012de8b69c64dc3165337bc121b86
SHA256	66fe485f29a6405265756aaf7f822b9ceb56e108afabd414ee222ee9657dd7e2
File Size	9219072 bytes
File Type	PE32 executable for MS Windows 4.00 (GUI), Intel i386, 8 sections
Link Time	N/A

Additional PKPM CAD files

MD5	075b4aa105e728f2b659723e3f36c72c
SHA1	145ef372c3e9c352eaaa53bb0893749163e49892
SHA256	c11a210cb98095422d0d33cbd4e9ecc86b95024f956ede812e17c97e79591cfa
File Size	6852608 bytes
File Type	PE32 executable for MS Windows 4.00 (GUI), Intel i386, 6 sections
Link Time	2012-06-18 10:01:54 UTC

MD5	cf859f164870d113608a843e4a9600ab
SHA1	952ed694b60c34ba12df9d392269eae3a4f11be4
SHA256	7e00030a35504de5c0d16020aa40cbaf5d36561e0716feb8f73235579a7b0909
File Size	8392704 bytes
File Type	PE32 executable for MS Windows 4.00 (GUI), Intel i386, 6 sections
Link Time	2012-11-29 03:10:12 UTC

Candidate 3: MOHID Software Suite

Modelo Hidrodinâmico (Portuguese for “Hydrodynamic Model” or MOHID) is an open-source water modeling system developed by MARETEC (Marine and Environmental Technology Research Center) at the Instituto Superior Técnico in Lisbon, Portugal. The software is used for marine and coastal water modeling, covering hydrodynamics, water quality simulation, sediment transport, oil spill modeling, and Lagrangian particle tracking.

At this time, we cannot definitively identify the target and welcome contributions from the broader research community to aid understanding of the intended effects of attacking this software.

MD5	f4dbbb78979c1ee8a1523c77065e18a5
SHA1	9e089a733fb2740c0e408b2a25d8f5a451584cf6
SHA256	e775049d1ecf68dee870f1a5c36b2f3542d1182782eb497b8ccfd2309c400b3a
File Size	5443584 bytes
File Type	PE32 executable for MS Windows 4.00 (console), Intel i386, 3 sections
Link Time	2002-10-18 09:29:54 UTC

Indicators of Compromise

Name	fast16.sys
MD5	0ff6abe0252d4f37a196a1231fae5f26
SHA1	92e9dcaf7249110047ef121b7586c81d4b8cb4e5
SHA256	07c69fc33271cf5a2ce03ac1fed7a3b16357aec093c5bf9ef61fbfa4348d0529

Name	connotify.dll
MD5	410eddfc19de44249897986ecc8ac449
SHA1	675cb83cec5f25ebbe8d9f90dea3d836fcb1c234
SHA256	8fcb4d3d4df61719ee3da98241393779290e0efcd88a49e363e2a2dfbc04dae9

Name	svcmgmt.exe
MD5	dbe51eabebf9d4ef9581ef99844a2944
SHA1	de584703c78a60a56028f9834086facd1401b355
SHA256	9a10e1faa86a5d39417cae44da5adf38824dfb9a16432e34df766aa1dc9e3525

YARA Rules

import "pe"

rule apt_fast16_carrier {
    meta:
        author = "SentinelLABS/vk"
        date = "2025-04-07"
        description = "Catches fast16 carrier, its Lua payload, and plaintext variants"
        hash = "9a10e1faa86a5d39417cae44da5adf38824dfb9a16432e34df766aa1dc9e3525"
    strings:
        $lua_magic = { 1B 4C 75 61 } //Lua bytecode magic

        //Decrypted strings
        $s1 = "build_wormlet_table"
        $s2 = "unpropagate"
        $s3 = "worm_install_failure_action"
        $s4 = "implant_install_failure_action"
        $s5 = "scm_wormlet_propagate_system"
        $s6 = "scm_wormlet_install"
        $s7 = "scm_wormlet_init"
        $s8 = "scm_copy_payload"
        $s9 = "get_logged_on_user"
        $s10 = "logged_on_program"
        $s11 = "phase_1_prop_delay"
        $s12 = "connotify_pipename"
        $s13 = "cndll_internal_name"
        $s14 = "connotify_provider_key"
        $s15 = "check_implant_reg_values"
        $s16 = "set_implant_reg_values"
        $s17 = "install_implant"
        $s18 = "implant_installed"
        $s19 = "implant_internal_name"
        $s20 = "implant_files"
        $s21 = "implant_owner"
        $s22 = "install_worm"
        $s23 = "start_worm"
        $s24 = "implant_install_failure_action"
        $s25 = "worm_install_failure_action"
        $s26 = "ok_to_propagate"
        $s27 = "no_firewall_check"
        $s28 = "scm_wormlet"
        $s29 = "implant_install_failure_action"
        $s30 = "worm_install_failure_action"

        //Encrypted strings
        $e1 = { 98 18 A1 94 24 E3 A2 4C  61 C8 AE 04 DC 4E 03 CD 0D 9D F0 }
        $e2 = { E8 76 53 6D D4 B9 6E 28  6C 5D C2 }
        $e3 = { 7D B7 14 73 F0 C0 4D 53  BB F7 0A 4A 3A 63 05 92  EC 0A 11 BC 22 59 99 05  72 05 19 }
        $e4 = { 88 5F 1B E4 45 56 75 4B  A5 3D 19 0B 3F 30 5A 85  E2 BD D0 E7 1C 13 D0 1D  BD D8 CF A1 88 DB }
        $e5 = { 88 1E 54 4E 00 C1 EF 79  AA AD 9F 50 27 B5 B8 4C  32 06 D2 7B 32 E3 AF D6  DC D2 BB 83 }
        $e6 = { 39 F9 BC E9 27 70 C4 3E  04 2A 7D E1 68 67 B7 ED  D4 41 6A }
        $e7 = { 13 FC 24 20 1F 20 74 1B  E5 5F 59 56 D7 61 3E BD }
        $e8 = { EF 94 49 63 33 41 62 F2  26 A6 48 DE 6D 7B A4 CF }
        $e9 = { 36 5F 5E E5 C1 1A 17 6A  4E B9 94 52 1B DC C6 60  CA C7 }
        $e10 = { B3 9C A3 F1 12 CC 52 74  34 5F 87 43 32 21 36 7B 2A }

        $rk1 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Symantec\\InstalledApps"
        $rk2 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Sygate Technologies, Inc.\\Sygate Personal Firewall"
        $rk3 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\TrendMicro\\PFW"
        $rk4 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Zone Labs\\TrueVector"
        $rk5 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\F-Secure"
        $rk6 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Network Ice\\BlackIce"
        $rk7 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\McAfee.com\\Personal Firewall"
        $rk8 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\ComputerAssociates\\eTrust EZ Armor"
        $rk9 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\RedCannon\\Fireball"
        $rk10 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Kerio\\Personal Firewall 4"
        $rk11 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\KasperskyLab\\InstalledProducts\\Kaspersky Anti-Hacker"
        $rk12 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Tiny Software\\Tiny Firewall"
        $rk13 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Uninstall\\Look n Stop 2.05p2"
        $rk14 = "HKEY_CURRENT_USER\\SOFTWARE\\Soft4Ever"
        $rk15 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Norman Data Defense Systems"
        $rk16 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Agnitum\\Outpost Firewall"
        $rk17 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\Panda Software\\Firewall"
        $rk18 = "HKEY_LOCAL_MACHINE\\SOFTWARE\\InfoTeCS\\TermiNET"

        $c1 = { 86 3A D6 02 } // A crypto constant
        $c2 = { 01 E1 F5 05 } // A crypto constant

        $code1 = { 8B 00           // mov     eax, [eax]
        2D 2F 34 21 33  // sub     eax, 3321342Fh
        } // Code to deobfuscate real storage container length

        $stor1 = { CC 00 00 00 05 00 00 00 66 69 6C 65 00 CD 00 00 00 } //Storage record with file string
    condition:
        ( uint16(0)==0x5a4d and filesize < 10MB and (
        ( 3 of ($s*) ) or
        ( 12 of ($rk*) ) or
        ( any of ($e*) ) or
        ( all of ($c*) and @c2-@c1 < 0x100 ) or
        ( $code1 ) or
        ( $stor1 )) ) or
        ( $lua_magic and 7 of ($s*) )
}

rule apt_fast16_driver {
    meta:
        author = "SentinelLABS/vk"
        last_modified = "2026-04-15"
        description = "Catches fast16 driver or related project files"
        hash = "07c69fc33271cf5a2ce03ac1fed7a3b16357aec093c5bf9ef61fbfa4348d0529"
    strings:
        $a1 = "@(#)foo.c : "
        $a2 = "@(#)par.h : "
        $a3 = "@(#)pae.h : "
        $a4 = "@(#)fao.h : "
        $a5 = "@(#)uis.h : "
        $a6 = "@(#)ree.h : "
        $a7 = "@(#)fir.h : "
        $a8 = "@(#)fir.c : "
        $a9 = "@(#)par.h : "
        $a10 = "@(#)pae.h : "
        $a11 = "@(#)fao.h : "
        $a12 = "@(#)uis.h : "
        $a13 = "@(#)ree.h : "
        $a14 = "@(#)fir.h : "
        $a15 = "@(#)myy.h : "
        $a16 = "@(#)fic.h : "
        $a17 = "@(#)ree.h : "
        $a18 = "@(#)ree.c : "
        $dev1 = "\\Device\\fast16"
        $dev2 = "\\??\\fast16"
        $pdb1 = "C:\\buildy\\"
        $pdb2 = "driver\\fd\\i386\\fast16.pdb"
        $devtype = { 68 7C A5 00 00 } // push 0A57Ch ; DeviceType
        $api1 = {50 C6 45 D4 16 C6 45 D5 2B C6 45 D6 12 C6 45 D7 3F C6 45 D8 3F C6 45 D9 3C C6 45 DA 30 C6 45 DB 32 C6 45 DC 27 C6 45 DD 36 C6 45 DE 03 C6 45 DF 3C C6 45 E0 3C C6 45 E1 3F C6 45 E2 53 } // push xored "ExAllocatePool"
        $api2 = {C6 45 A8 16 C6 45 A9 2B C6 45 AA 12 C6 45 AB 3F C6 45 AC 3F C6 45 AD 3C C6 45 AE 30 C6 45 AF 32 C6 45 B0 27 C6 45 B1 36 C6 45 B2 03 C6 45 B3 3C C6 45 B4 3C C6 45 B5 3F C6 45 B6 04 C6 45 B7 3A C6 45 B8 27 C6 45 B9 3B C6 45 BA 07 C6 45 BB 32 C6 45 BC 34 C6 45 BD 53} // push xored "ExAllocatePoolWithTag"
        $api3 = {C6 45 E4 16 C6 45 E5 2B C6 45 E6 15 C6 45 E7 21 C6 45 E8 36 C6 45 E9 36 C6 45 EA 03 C6 45 EB 3C C6 45 EC 3C C6 45 ED 3F C6 45 EE 53} // push xored "ExFreePool"
        $api4 = {C6 45 C0 16 C6 45 C1 2B C6 45 C2 15 C6 45 C3 21 C6 45 C4 36 C6 45 C5 36 C6 45 C6 03 C6 45 C7 3C C6 45 C8 3C C6 45 C9 3F C6 45 CA 04 C6 45 CB 3A C6 45 CC 27 C6 45 CD 3B C6 45 CE 07 C6 45 CF 32 C6 45 D0 34 C6 45 D1 53} // push xored "ExFreePoolWithTag"
    condition:
        filesize < 10MB and 
        ( uint16(0)==0x5a4d and
        ( ( 2 of ($pdb*) ) or
        ( $pdb1 and 1 of ($a*) ) or
        ( #devtype == 3 and
        pe.machine == pe.MACHINE_I386 and
        pe.subsystem == pe.SUBSYSTEM_NATIVE) or
        any of ($api*) or
        2 of ($dev*))) or 
        ( 6 of ($a*))
}

rule clean_fast16_patchtarget {
    meta:
        author = "SentinelLABS/vk"
        last_modified = "2026-04-15"
        description = "Detects fast16 patch target software (most probably clean)"
        hash = "8fcb4d3d4df61719ee3da98241393779290e0efcd88a49e363e2a2dfbc04dae9"
    strings:
        $el0 = { 48 89 84 24 9C 00 00 00 4B 0F 8F 79 FF FF FF 00 }
        $el10 = { D8 E1 D9 5D FC D9 04 00 }
        $el12 = { 55 8B EC 83 EC 14 53 56 57 8B 3D ?? ?? ?? ?? 8B 0D 00 }
        $el13 = { 89 4D C8 8B FB 8B C8 00 }
        $el14 = { 8B 4C 24 0C 8B 01 83 F8 63 00 }
        $el16 = { 39 2D ?? ?? ?? ?? 0F 84 F4 00 00 00 8B 35 ?? ?? ?? ?? 2B 35 }
        $el2 = { 7C 02 89 C6 89 35 ?? ?? ?? ?? 89 B4 24 D0 }
        $el23 = { 83 3D ?? ?? ?? ?? 00 0F 84 70 BD FF FF 00 }
        $el25 = { BE 07 00 00 00 BF 04 00 00 00 BB 02 00 00 00 00 }
        $el26 = { 8B 4D 10 C1 E2 04 8B 19 83 EA 30 8B CB 49 }
        $el28 = { 8D 1D ?? ?? ?? ?? 52 8D 05 ?? ?? ?? ?? 51 8D 15 ?? ?? ?? ?? 8D 0D ?? ?? ?? ?? 53 50 52 51 56 57 E8 ?? ?? ?? ?? 83 C4 38 EB 0E 83 EC 04 00 }
        $el3 = { 0F 8F A5 00 00 00 A1 ?? ?? ?? ?? 83 F8 14 7D 0D }
        $el30 = { 8B 5D B0 0F 85 ?? ?? ?? ?? 8D 34 9D ?? ?? ?? ?? 8D 14 9D 00 0F 8E 1B 03 00 00 D9 05 }
        $el31 = { 8B 45 44 6B 00 04 D9 05 ?? ?? ?? ?? D8 B0 }
        $el32 = { E9 7E 04 00 00 8B 74 24 1C 8B 54 24 14 85 }
        $el33 = { 83 39 63 0F 85 21 03 00 00 8B EE 85 F6 0F }
        $el34 = { 85 DB 8B 55 D4 75 2C 89 35 00 }
        $el36 = { 75 18 8D 35 ?? ?? ?? ?? 56 8D 3D 00 }
        $el37 = { 8D 1D ?? ?? ?? ?? 52 8D 05 ?? ?? ?? ?? 51 8D 15 ?? ?? ?? ?? 8D 0D ?? ?? ?? ?? 53 50 52 51 56 57 E8 ?? ?? ?? ?? EB 0E 83 EC 04 56 57 53 E8 95 00 }
        $el39 = { D8 34 85 ?? ?? ?? ?? 8B 44 ?? ?? 8B CA 00 }
        $el4 = { 8B 5D 0C 8B 55 08 8B 36 8B 00 }
        $el40 = { 8D 04 BD ?? ?? ?? ?? 03 DF 00 }
        $el41 = { 8B EE 85 F6 0F 8E ?? ?? ?? ?? 8D 1C BD 00 }
        $el42 = { D9 04 9D ?? ?? ?? ?? 83 ED 04 05 10 00 00 00 D8 0D 00 }
        $el43 = { 75 2C 89 35 ?? ?? ?? ?? 89 05 ?? ?? ?? ?? 89 15 }
        $el45 = { 89 55 F4 8B F9 8B D3 03 FB C1 E2 02 89 35 }
        $el46 = { 40 23 72 65 63 24 65 69 69 6E 20 2E 30 24 D9 5D 00 D9 03 D8 0D ?? ?? ?? ?? D8 0D 00 }
        $el49 = { DF E0 F6 C4 41 A1 ?? ?? ?? ?? 74 5A }
        $el51 = { FF 35 ?? ?? ?? ?? E8 ?? ?? ?? ?? 9D D9 E0 D9 1D ?? ?? ?? ?? 8B 4C }
        $el53 = { 6A 46 68 ?? ?? ?? ?? E8 ?? ?? ?? ?? 6A 03 }
        $el56 = { D8 05 ?? ?? ?? ?? D9 55 00 9C }
        $el59 = { C2 08 00 A1 ?? ?? ?? ?? 8B 0C 85 ?? ?? ?? ?? 89 0E 00 }
        $el6 = { 83 EC 04 53 E8 ?? ?? ?? ?? EB 09 83 EC 04 53 00 }
        $el61 = { D8 1D ?? ?? ?? ?? DF E0 F6 C4 41 B8 00 00 00 00 75 05 B8 01 00 00 00 85 C0 74 11 6A 29 00 }
        $el63 = { 2B DA 89 3C 03 83 3D 00 }
        $el68 = { D9 5D C0 8B 4D C0 D9 45 E0 89 0E 00 }
        $el70 = { 8B 05 ?? ?? ?? ?? 8B 0D ?? ?? ?? ?? 0F 85 7E 00 00 00 0F AF 15 00 }
        $el73 = { B9 01 00 00 00 C1 E7 02 8B BF ?? ?? ?? ?? 8B D7 85 FF 8B 55 30 8B 45 30 D8 C9 8B 75 2C 00 9A 8B 00 00 00 1B 00 90 0F 94 C3 0B D8 33 D2 83 3D 00 }
        $el75 = { 2B FB 8B DE C1 E3 02 89 7D A0 03 5D A0 8B 03 F7 F7 DB 0C 02 89 35 }
        $el80 = { 0F 0F 94 C0 23 C3 33 D2 }
        $el81 = { 8B 55 30 8B 75 2C D8 C9 8B 45 30 00 }
        $el83 = { DD 05 ?? ?? ?? ?? 8B 05 ?? ?? ?? ?? 8B 15 ?? ?? ?? ?? 0F AF 05 ?? ?? ?? ?? 8B 1D ?? ?? ?? ?? 0F AF 15 }
        $el89 = { 68 28 00 00 00 57 E8 ?? ?? ?? ?? 8B 1D ?? ?? ?? ?? 8B 35 ?? ?? ?? ?? 0F AF 1D ?? ?? ?? ?? 8B 3D ?? ?? ?? ?? 8B 05 }
        $el94 = { 8B 75 38 8B 4D 34 D8 C9 8B 00 }
        $el96 = { 8B 55 88 8B 5D B0 83 7D 84 01 }
        $el97 = { 55 8B EC 83 EC 2C 33 D2 53 56 57 8B }
        $el99 = { 55 8B EC 83 EC 2C B9 46 00 00 00 53 56 57 8B 00 }
    condition:
        filesize < 20MB and
        uint16(0) == 0x5A4D and
        2 of them
}

rule apt_fast16_patch {
	meta:
		author = "SentinelLABS/vk"
		last_modified = "2026-04-15"
		description = "Detects the fast16 patch code. May be present in statically patched files or memory dumps."
		hash = "0ff6abe0252d4f37a196a1231fae5f26"
	strings:
		$p1 = { 55 88 50 53 52 51 8D 64 24 94 DD 34 24 51 E8 ?? ?? ?? ?? 59 81 E9 14 00 00 00 8B 99 50 0F 00 00 83 FB 28 76 04 6A 31 }
		$p2 = { 59 81 E9 EE 00 00 00 6A 02 BB B4 05 00 00 01 CB C6 03 EB 43 C6 03 15 8B 44 24 78 83 C0 07 89 81 EC 07 00 00 E9 BF 02 00 00 }
		$p3 = { 50 53 52 51 E8 ?? ?? ?? ?? 59 81 E9 78 01 00 00 D9 99 C4 0F 00 00 8D 64 24 94 DD 34 24 FF B1 C4 0F 00 00 6A 02 EB 2D }
	condition:
		any of them
}

↑ top

7.TurboQuant: A First-Principles Walkthrough

Sourcehttps://arkaung.github.io/interactive-turboquant/

Sitearkaung.github.io

Submitterkweezar (Hacker News)

Submitted2026-04-27 01:54 UTC (Hacker News)

HN activity10 points · 0 comments

Length5.9K words (~26 min read)

Languageen

TurboQuant: A First-Principles Walkthrough

Compressing AI vectors to 2–4 bits per number
without losing accuracy.

Modern language models store large tables of high-dimensional vectors: KV caches, embeddings, attention keys. TurboQuant compresses each coordinate of these vectors to 2–4 bits with provably near-optimal distortion, no memory overhead for scale factors, and no training or calibration. This page explains how it works.

The single load-bearing idea: in high dimensions, a random rotation turns every input vector into one whose coordinates follow a known fixed distribution. A codebook designed once for that distribution can then be reused for every input. Everything else on this page is the construction that puts this observation to work.

§0 · Primer: jargon decoder

Eight ideas the rest of the page is built on.

Each mini-demo below covers one concept used later. Skip the ones you already know.

§0.1 · Vector

A list of numbers. An arrow in space.

A vector is an ordered list: [0.3, −1.2]. Geometrically it is an arrow from the origin. A d-dimensional vector is an arrow in $d$-space, hard to picture past 3-D, but the rules are the same.

↕ drag tip

coords[0.70, 0.50] length0.86

§0.2 · Length ‖x‖ & Inner Product ⟨x,y⟩

How much one vector points along another.

Length = $\sqrt{x_1^2+x_2^2+\dots}$. Inner product $\langle x,y\rangle = x_1 y_1 + x_2 y_2 + \dots = \|x\|\|y\|\cos\theta$. The inner product reaches its largest positive value when the two arrows point in the same direction. It drops to zero when the two arrows are perpendicular. It becomes negative when the arrows point in opposite directions, with its most negative value when they point exactly opposite.

↕ drag either tip

‖x‖1.00 ‖y‖1.00 ⟨x,y⟩0.00 angle90°

§0.3 · Mean Squared Error

Why we square the mistake.

Error is the distance between a guess and the truth. Scoring a guess by the signed error lets positive and negative errors cancel, which means the score does not penalise being off. Squaring forces every error to count as a positive number and gives big errors a larger penalty than small ones. The guess that minimises the mean of squared errors is the data’s average: it is the unique number that minimises the sum of squared distances to the points.

The average has a property we will use in §7. It lies between the data’s most extreme points, so its magnitude is smaller than at least one of them. When a quantizer compresses a whole bin of values down to the bin’s average, the stored value is smaller in magnitude than the bin’s largest values. The reconstruction is a shrunken version of the input. An inner product against a shrunken reconstruction comes out smaller than the same inner product against the input.

Guess0.00

mean of data0.00 MSE at guess1.00 MSE at mean1.00

§0.4 · Unbiased vs Biased Estimator

Noisy is fine. Systematically off is not.

An estimator is a procedure that takes data and returns a guess $\hat\theta$ for an unknown truth $\theta$. Repeat it on fresh data and the guesses form a cloud. The cloud can fail in two independent ways. Variance is one: individual guesses are noisy. Bias is the other: the procedure is wrong even after averaging many guesses. An estimator with $\mathbb{E}[\hat\theta]=\theta$ is unbiased; the cloud’s centre sits at $\theta$ regardless of the cloud’s width.

The bullseye below shows both failure modes. Bias is the distance from the cloud’s centre to the crosshair. Variance is the width of the cloud. The two quantities are independent of each other. §7 runs the same bullseye against the MSE quantizer of §6, and the cloud’s centre lands away from the crosshair. §8 runs it against a different estimator whose cloud centres on the crosshair.

Mode

shots0 mean of shots– bias–

§0.5 · Rotation

A rigid spin. Preserves lengths and angles.

A rotation matrix $R$ spins space. The key property: $\|Rx\|=\|x\|$ and $\langle Rx,Ry\rangle=\langle x,y\rangle$. Rotation only changes the basis the coordinates are written in, not the geometry.

↕ drag tip

Angle θ0°

‖x‖ before → after1.41 → 1.41 preserved?yes

§0.6 · Where bell-curves come from (CLT)

Add up many small randoms → Gaussian.

The Central Limit Theorem says that summing enough independent random numbers produces a distribution close to a bell curve. The shape of each individual term in the sum does not affect the limit. A sum of coin flips converges to the same Gaussian shape as a sum of uniform draws or a sum of skewed draws. A rotated coordinate is one of these sums: it is a weighted combination of every coordinate of the original vector, with random weights. After a random rotation, each new coordinate is therefore approximately Gaussian, which is the property TurboQuant relies on for every input.

Terms in sum n1

Source

source shape±1 coin converged?no, n too small

§0.7 · Life in many dimensions

Coordinates of a random unit vector are all small.

Pick a random point on a unit sphere in $d$ dimensions. In 2-D any coordinate is possible. In 100-D, almost every coordinate is close to $\pm 1/\sqrt{d}$. This is measure concentration, and it is the core fact TurboQuant exploits.

Dim d2

std of x₁≈ 0.71 1/√d0.71

§0.8 · Quantization, in one dimension

Snap every number to the nearest of $2^b$ levels.

This is what $b$ bits per number means. With $b=2$ you get 4 levels, $b=3$ gives 8. The gap between levels is your worst-case error. Adding one bit halves the gap, so the squared error drops by 4× per bit, the $4^{-b}$ factor that shows up later.

Bits b2

levels4 gap Δ0.667 max error0.333

■ CHEAT SHEETEight ideas, one sentence each

Vector: ordered list of numbers / arrow from the origin. Length & inner product: the norm $\sqrt{\sum x_i^2}$ and how much two vectors point the same way. MSE: average squared error. Unbiased: the average of many estimates equals the truth. Rotation: change of basis that preserves lengths and angles. CLT: sum of many independent randoms converges to a Gaussian. High-D concentration: coordinates of a random unit vector in $d$-space cluster near $\pm 1/\sqrt d$. Quantization: snap each number to one of $2^b$ levels; one extra bit quarters the squared error.

§1 · Vector quantization

What is vector quantization, really?

You have a vector $\mathbf{x}\in\mathbb{R}^d$, say $d{=}1536$ floats from an OpenAI embedding. You want to store it using $b$ bits per coordinate (total $b\cdot d$ bits), then later recover an approximation $\tilde{\mathbf{x}}$ close to $\mathbf{x}$. Closeness is measured by

MSE distortion $D_{\text{mse}} = \mathbb{E}\big[\,\|\mathbf{x} - \tilde{\mathbf{x}}\|_2^2\,\big]$ or inner-product error $D_{\text{prod}} = \mathbb{E}\big[\,|\langle\mathbf{y},\mathbf{x}\rangle - \langle\mathbf{y},\tilde{\mathbf{x}}\rangle|^2\,\big]$

The second one matters because attention scores and nearest-neighbor queries are all inner products. We would like the estimator to be unbiased: $\mathbb{E}[\langle\mathbf{y},\tilde{\mathbf{x}}\rangle] = \langle\mathbf{y},\mathbf{x}\rangle$.

■ KEY WORDS

MSE distortion: average squared error between the true vector and its reconstruction, primer §0.3.

Inner product $\langle y, x\rangle$: how much two vectors point the same way, primer §0.2. This is what attention computes.

Estimator: a rule (here: quantize, then decode) that returns an approximation $\hat s$ of a true number $s$.

Unbiased estimator: across many queries, the average of $\hat s$ equals $s$. Individual estimates can be noisy; the mean is on target. Primer §0.4.

The obvious quantizer

For each coordinate, pick the closest of $2^b$ evenly-spaced levels in $[-1, 1]$. That is $b$ bits per number. The same rule runs in 2D and 3D first, where the geometry is visible, before the high-dimensional version below.

First, in 2D

Drag the tip of the vector. The vector snaps to the nearest point of a $2^b \times 2^b$ grid. The green arrow shows the original input. The blue arrow shows where the input is quantized to. The red segment between them is the reconstruction error $\mathbf{x} - \tilde{\mathbf{x}}$.

Bits b2

Preset:

↕ drag tip

‖error‖ / ‖x‖

–

levels per axis

grid points

Same trick in 3D

A $2^b$-level grid on three axes gives $2^{3b}$ snap points. Drag the canvas to orbit the view. The spike preset shows where the construction breaks: the input lies near one axis and falls between two grid levels, which is where the reconstruction error is largest.

Bits b2

Preset:

↕ drag tip · ↻ orbit

‖error‖ / ‖x‖

–

levels per axis

grid points

Now at scale (d up to 128)

The same rule applied to every coordinate of a high-dimensional vector. You cannot see the grid anymore, but the per-coordinate errors are still there.

Bits b3

Dimension d64

Input:

original $x_i$ quantized $\tilde{x}_i$

‖x − x̃‖² / ‖x‖²

–

levels per coord

–

bits used

–

Select the spike input. The naive quantizer's grid is spaced evenly over $[-1, 1]$. The input has almost all of its magnitude in a single coordinate, whose value falls between the two grid levels nearest to it and so reconstructs poorly. The remaining coordinates are near zero and consume most of the levels despite carrying little of the input's information.

■ TAKEAWAY · NEXT §2where the gap shows up

A fixed grid produces small reconstruction errors on inputs whose coordinates are roughly uniform in magnitude, and large reconstruction errors on inputs whose magnitude is concentrated in one or a few coordinates. Next: §2 shows how production systems handle the second case and what they pay for the fix.

§2 · Why naive fails

The adversarial coordinate, and why production systems pay a tax

Real embeddings are rarely flat. Trained models produce outlier channels, a few coordinates much larger than the rest. A fixed $[-L, L]$ grid either clips the outliers or wastes resolution on the bulk. Production quantizers (GPTQ, AWQ, KIVI, KVQuant) work around this by computing $(\min, \max)$ (or zero-point and scale) for every small block and storing those in full precision as side information.

The catch. To decode any block you also need its scale and zero-point, two float16 numbers (32 extra bits) stored next to every 16–64 quantized values. Walk through one case: a block of 32 numbers at 3 bits each is 96 payload bits, plus 32 metadata bits, which works out to 4 bits per number, not 3. Smaller blocks of 16 numbers push it to 5 bits per number. The advertised 3-bit scheme is really a 4–5-bit scheme once you count everything. TurboQuant matches this worst-case quality while storing zero per-block metadata.

■ DEMO · feel the catch same b bits/value, three strategies

A 64-dimensional vector whose coordinates are mostly small, with one large outlier shown in red. Three quantizers reconstruct the same vector at the same b-bit budget. Strategy A uses a single fixed grid for the whole vector. Strategy B adapts the grid per block, at the cost of a float16 header per block. Strategy C rotates the vector first and then applies a single fixed grid. The metrics report the RMSE of each reconstruction and the effective bits-per-value once the metadata cost is included.

Outlier magnitude4.0

Bit budget b3

Block size s16

Source vector (outlier in red, dashed lines = fixed grid range)

A. Fixed grid [−L, L]

one global range, b bits/value, no header. Outlier clips.

RMSE

–

bits/value

–

overhead

B. Per-block scale + zero

float16 scale+zero per block (dashed dividers). Outlier fits, header taxes you.

RMSE

–

bits/value

–

overhead

–

C. Rotate → fixed grid

rotation smears the spike across all 64 coords. One global grid works, no header.

RMSE

–

bits/value

–

overhead

Read the storage line. The effective bits-per-value works out to b + 32/s for the per-block scheme and to b for the other two, because only the per-block scheme stores a float16 scale and zero-point (32 bits together) for every block of s elements. At b=3, s=16 the per-block cost works out to 3 + 2 = 5 bits/value, a 66% surcharge over the nominal b. Strategy C achieves the same storage cost as strategy A while producing the reconstruction quality of strategy B. The rest of this page explains the construction that makes that possible.

■ TAKEAWAY · NEXT §3one fixed recipe, any input

Production quantizers handle outliers by paying a per-block metadata tax. TurboQuant must instead be data-oblivious: a single procedure that runs on every vector with no calibration set and no per-block headers. Next: §3 introduces the move that makes a fixed grid work for every input.

§3 · The rotation trick

Multiply by a random rotation. Watch the spike dissolve.

The rotation trick: apply a random orthogonal transform $\boldsymbol{\Pi}$, then quantize coordinate-wise. Rotation is lossless, it preserves length and inner products exactly:

$\|\boldsymbol{\Pi}\mathbf{x}\|_2 = \|\mathbf{x}\|_2$ · $\langle \boldsymbol{\Pi}\mathbf{x},\,\boldsymbol{\Pi}\mathbf{y}\rangle = \langle\mathbf{x},\mathbf{y}\rangle$ · $\boldsymbol{\Pi}^{\!\top}\boldsymbol{\Pi} = \mathbf{I}$

Because rotation is exact, all reconstruction error comes from the quantization step alone. After a uniformly random rotation, every coordinate of $\boldsymbol{\Pi}\mathbf{x}$ follows the same fixed Beta density (Lemma 1 of the paper), regardless of what $\mathbf{x}$ looked like. A single codebook designed once for that density is then optimal for every input. We build the codebook in §5.

How to construct $\boldsymbol{\Pi}$

Generate a $d\times d$ matrix of i.i.d. $\mathcal{N}(0,1)$ entries and run QR decomposition; keep the orthogonal factor $Q$. The result is uniform on the orthogonal group $O(d)$, which is what Lemma 1 needs.

A spike in 2D

Start with the extreme case: a vector with all of its magnitude in one coordinate, $(1, 0)$. Rotate by angle $\theta$ and observe how the magnitude is redistributed across the two coordinates. At $\theta{=}45°$ the magnitude is split evenly between the two coordinates, giving $(\tfrac{1}{\sqrt 2}, \tfrac{1}{\sqrt 2})$. The total length of the vector stays the same throughout.

Angle θ30°

geometry

↕ drag tip

coordinate magnitudes

max |coord|

–

length ‖x‖

1.000

θ at even split

45°

A spike in 3D

The same construction in three dimensions. The spike $(1, 0, 0)$ is rotated by a random orthogonal matrix, which spreads the input's magnitude across all three coordinates of the output. The total length of the vector is preserved. Each fresh draw of the random rotation produces a different spread.

geometry (drag to orbit)

↕ drag tip · ↻ orbit

coordinate magnitudes

max |coord|

–

length ‖x‖

1.000

typical max at random rot.

≈ 0.80

At high dimension

A single rotation in 2-D reduces the largest coordinate to at most half the input's magnitude. A random rotation in 3-D typically leaves one coordinate around $0.7$. At $d{=}64$ the largest coordinate after rotation is around $1/\sqrt d \approx 0.125$, regardless of how concentrated the input was.

Dimension d64

Input:

before, $|x_i|$ (Cartesian)

after, $|(\boldsymbol{\Pi}\mathbf{x})_i|$

max |xᵢ| / ‖x‖

–

max |(Πx)ᵢ| / ‖x‖

–

‖x‖ (preserved)

–

■ TAKEAWAY · NEXT §4no spike survives a random rotation

Rotation preserves length and inner products. The only thing it changes is which coordinates contain the magnitude of the vector. A vector with all of its mass concentrated in one coordinate becomes, after rotation, a vector whose mass is spread across all $d$ coordinates. Because every input is rotated before quantization, every input that gets quantized is of this spread-out kind. Next: §4 explains why rotation flattens spikes using the geometry of high-dimensional spheres.

§4 · Why rotation works

Coordinates of random unit vectors are nearly Gaussian.

Rotating $\mathbf{x}$ by a uniformly random $\boldsymbol{\Pi}$ is the same as picking a random point on the sphere of radius $\|\mathbf{x}\|$. So the question “what does a coordinate of $\boldsymbol{\Pi}\mathbf{x}$ look like?” is the same question as “what does a coordinate of a uniform point on the sphere look like?”

In low dimensions the answer is far from a bell curve. In 2-D the marginal is the arcsine density, which is U-shaped with peaks at $\pm 1$. In 3-D it is uniform on $[-1, 1]$. As $d$ grows the marginal narrows and converges to a Gaussian with variance $1/d$. The convergence is visible in the demos that follow.

The exact density (Lemma 1)

For a uniform point on $\mathbb{S}^{d-1}$, the marginal density of any single coordinate is

$f_X(x) \;=\; \dfrac{\Gamma(d/2)}{\sqrt{\pi}\,\Gamma((d-1)/2)}\,(1-x^2)^{(d-3)/2},\quad x\in[-1,1]$

a scaled/shifted Beta distribution. It converges pointwise to $\mathcal{N}(0,\,1/d)$ as $d\to\infty$.

Step one: the circle ($d=2$)

Sample 2000 points uniformly from the unit circle and look at a single coordinate, say $x_1$. The marginal is the arcsine density $\tfrac{1}{\pi\sqrt{1-x^2}}$, which is U-shaped with peaks at $\pm 1$. The shape is far from Gaussian: any value of $x_1$ between $-1$ and $+1$ is possible, and the endpoints are more likely than the middle.

points on the unit circle

marginal of $x_1$

shape of $x_1$

arcsine

std of $x_1$

–

$1/\sqrt{d}$

0.707

Step two: the sphere ($d=3$)

Now sample uniformly from the unit sphere in 3-D. The marginal of one coordinate is uniform on $[-1, 1]$ (Archimedes' hat-box theorem). The marginal is still not a bell curve. Drag to orbit the view.

points on the unit sphere

↻ drag to orbit

marginal of $x_1$

shape of $x_1$

uniform on $[-1,1]$

std of $x_1$

–

$1/\sqrt{d}$

0.577

Step three: high dimensions

Drag $d$ upward. The marginal narrows and converges to a Gaussian with standard deviation $1/\sqrt d$. By $d{=}30$ the marginal is visually Gaussian. By $d{=}256$ almost all of the mass concentrates within a thin shell of width $\sim 1/\sqrt d$ around zero.

Dimension d32

Samples10000

empirical histogram Beta PDF (exact) $\mathcal{N}(0, 1/d)$ approximation

Distinct coordinates are also approximately independent, a stronger condition than uncorrelated, and what is actually needed for the per-coordinate quantization argument below.

■ TAKEAWAY · NEXT §5one distribution, one codebook

Every coordinate of a rotated vector follows the same known density. The scalar quantization problem for that density can be solved once, and the solution can be reused for every coordinate of every vector. There are no per-block scale factors and no side information to store. Next: §5 builds the codebook with Lloyd–Max.

§5 · The universal codebook

Lloyd–Max: the optimal partition of a known distribution.

Every rotated coordinate looks like a draw from the same density (§4). So there is one scalar problem to solve, once: pick $2^b$ landing values on the number line such that snapping any sample to its nearest landing value introduces as little error as possible. Those landing values are the codebook.

A classical algorithm finds them: Lloyd–Max (Lloyd 1957/82, Max 1960). Because the density is fixed and known in advance, Lloyd–Max runs once at table-build time. The resulting landing values are saved into a tiny per-$b$ table. Encoding a coordinate after that is a single nearest-neighbour lookup against the table. The same table is used for every input, with no calibration step and no per-vector tuning.

Drag $b$ below to watch Lloyd–Max settle on the landing values for the Beta density.

The Lloyd–Max iteration

Given a PDF $f_X$, choose centroids $c_1 \le \dots \le c_{2^b}$ minimising $\int (x - c_{i(x)})^2 f_X(x)\,dx$ by alternating:

Assignment: each centroid owns the Voronoi cell around it, boundaries are midpoints between adjacent centroids.
Update: each centroid moves to the conditional mean of its cell, $c_k \leftarrow \mathbb{E}[X \mid X \in \text{cell}_k]$.

Repeat until stable. The demo runs this on the Beta density of §4.

Bits b2

Dimension d64

Gaussian $\mathcal{N}(0,1/d)$ centroids $c_k$ bin boundaries

iteration

MSE per coord

–

Shannon bound 1/4^b / d

–

For moderate $d$, the paper's explicit centroids (after normalising by $\sqrt{d}$) are: $b{=}1\!:\pm\sqrt{2/\pi}$, $b{=}2\!:\{\pm 0.453,\pm 1.510\}$, and so on. Theorem 1 proves the per-coordinate MSE is $\lesssim \tfrac{\sqrt{3}\pi}{2d}\cdot 4^{-b}$. The constant $\tfrac{\sqrt{3}\pi}{2}\approx 2.72$ is the asymptotic ratio to Shannon's minimum $\tfrac{1}{d}\cdot 4^{-b}$; at $b{=}1$ the paper reports a tighter ratio of $\approx 1.45$.

■ TAKEAWAY · NEXT §6a tiny lookup, baked once

Lloyd–Max gives the optimal partition for a known density, so the centroids for the Beta marginal can be precomputed and stored as a tiny per-$b$ table. The per-coordinate MSE that the resulting codebook achieves is within a factor of $\approx 2.72$ of Shannon's lower bound asymptotically and within $\approx 1.45$ at $b{=}1$. Next: §6 assembles rotation and codebook into TurboQuant-MSE.

§6 · TurboQuant-MSE

Putting it together: TurboQuant-MSE.

STEP 1

Rotate

$\mathbf{y} = \boldsymbol{\Pi}\mathbf{x}$. Same $\boldsymbol{\Pi}$ reused for every vector.

STEP 2

Round each coord

For each $j$, $\texttt{idx}_j = \arg\min_k |y_j - c_k|$. Stores $b$ bits.

STEP 3

Store

Total: $b\!\cdot\!d$ bits. No scales, no zero-points.

STEP 4

Look up

$\tilde{y}_j = c_{\texttt{idx}_j}$ from the universal codebook.

STEP 5

Rotate back

$\tilde{\mathbf{x}} = \boldsymbol{\Pi}^{\!\top}\tilde{\mathbf{y}}$. Done.

Bits b3

Dimension d64

Input:

x (original)

Πx (rotated, nearly Gaussian)

quantized Πx (snap to codebook)

x̃ = Πᵀ·quant(Πx) (recovered)

error x − x̃

‖x − x̃‖² / ‖x‖²

–

naïve (no rotation)

–

Shannon floor 1/4^b

–

compression factor

–

Toggle between input types. Naive quantization without rotation fails on the spike input and on the outlier-channel input. With the rotation step in front, the reconstruction error is roughly the same regardless of which input is selected. Every rotated coordinate follows the same $\mathcal{N}(0,\,1/d)$ distribution, which is the distribution the codebook was designed for.

■ TAKEAWAY · NEXT §7MSE is solved, but…

TurboQuant-MSE stores $b\cdot d$ bits per vector and zero metadata. The reconstructed $\tilde{\mathbf{x}}$ is nearly as close to the original $\mathbf{x}$ as any quantizer can achieve, within a factor of $\approx 2.72$ of Shannon's information-theoretic lower bound. Next: §7 shows that the same codebook produces a systematically biased estimate of inner products. This is an error that minimising reconstruction MSE does not address.

§7 · The inner-product bias

MSE-optimal quantizers underestimate inner products.

§6’s TurboQuant-MSE keeps $\tilde{\mathbf{x}}$ close to $\mathbf{x}$ in squared distance. Attention does not measure $\|\mathbf{x}-\tilde{\mathbf{x}}\|^2$. It computes $\langle \mathbf{q}, \tilde{\mathbf{k}}\rangle$ and uses that number as a stand-in for $\langle \mathbf{q}, \mathbf{k}\rangle$. The MSE codebook gives a systematically wrong answer to the inner-product question. Each trial returns the same error, so averaging many trials does not remove it.

Two earlier facts produce the shrinkage. In §0.3 the MSE-optimal reconstruction for a set of values was the set’s average, and that average had smaller magnitude than the set’s extreme values. In §4 a random rotation made every coordinate of $\boldsymbol{\Pi}\mathbf{x}$ behave like a zero-mean draw with most of its mass close to 0. Combine the two and the shrinkage is forced: the encoder partitions each axis into $2^b$ bins and stores only which bin $\boldsymbol{\Pi}\mathbf{x}$ fell into, the decoder reconstructs with the bin’s average, and the bin’s average sits closer to 0 than the tail inputs that fall into the same bin. The reconstruction $\tilde{\mathbf{x}}$ is therefore a shrunken copy of $\mathbf{x}$, and an inner product $\langle \mathbf{q}, \tilde{\mathbf{k}}\rangle$ comes out smaller than $\langle \mathbf{q}, \mathbf{k}\rangle$. Because the codebook is fixed, the shrinkage factor is identical on every trial.

■ SEE THE SHRINKAGEdrag y, watch ỹ snap

One rotated coordinate $y$ has the near-Gaussian density drawn on top. Lloyd–Max partitions the axis into $2^b$ bins (interior verticals); each bin’s centroid is the MSE-optimal reconstruction (red dots). Drag the mint handle to set $y$. The encoder snaps it to the centroid of the bin it fell into, giving $\tilde y$ (red). The staircase underneath plots that map $\tilde y(y)$ across the whole axis at once: every horizontal step sits inside the dashed identity line, and the gap between step and identity is the shrinkage at that input.

Bits b1

↔ drag y

|y|

1.500

|ỹ|

–

|ỹ|/|y|

–

E[ỹ²]/σ² at this b

–

What to notice. Drag $y$ deep into a tail bin: the centroid sits well inside the bin, not at the bin’s edge, so $|\tilde y| < |y|$ and the per-input ratio drops well below 1. Drag $y$ near 0: the centre bins are narrow and the ratio can even exceed 1. The right-most metric is the population shrinkage factor $\mathbb{E}[\tilde y^2]/\sigma^2 = 1 - D_b$, the second-moment ratio quoted in the next paragraph as $0.64 / 0.88 / 0.97 / 0.99$ for $b=1,2,3,4$. That is the same factor the bullseye demo will reproduce as the cloud’s centre offset.

The bullseye below measures the shrinkage. At $b{=}1$ the offset is $1 - 2/\pi \approx 0.36$ on every axis. The shrinkage factor approaches 1 quickly with more bits (about 0.88 at $b{=}2$, 0.97 at $b{=}3$, 0.998 at $b{=}5$), so by $b{=}3$ the residual bias is smaller than the trial-to-trial noise of a few thousand shots and the red dot visually overlaps the crosshair. The bias is theoretically strictly nonzero at every finite $b$, but the regime where it matters in practice is the low-bit one (1–2 bits per coordinate), where it dominates the per-trial variance.

■ HOW TO READdrag b, watch the red dot

Same bullseye as the primer. Each trial fires two shots at the target, one inner-product estimate against $\mathbf{y}_1$ and one against an independent $\mathbf{y}_2$, both divided by their truth and re-centred so a perfect estimate lands on the centre. The yellow crosshair marks truth, the red dot is the average of every shot fired so far. Unbiased means the red dot sits on the crosshair, no matter how wide the cloud of shots around it.

Bits b1

Dimension d128

Trials2000

MSE-optimal codebook · biased (cloud’s centre is off the crosshair)

MSE mean ratio

–

MSE theory

0.6366

truth

1.0000

What to notice. At $b{=}1$ the red dot is southwest of the crosshair, on the diagonal. The offset on $\mathbf{y}_1$ and the offset on $\mathbf{y}_2$ are equal, which is what one scalar shrinkage applied to the whole reconstruction would produce. Increase $b$: the offset shrinks fast and is below the trial-to-trial noise by $b{=}3$, even though the underlying shrinkage factor is still strictly less than 1.

Derivation: where the $2/\pi$ factor comes from

For a standard Gaussian $g$, $\mathbb{E}[|g|]=\sqrt{2/\pi}$, the “half-normal” mean. The 1-bit MSE codebook rounds each rotated coordinate to $\pm\sqrt{2/\pi}/\sqrt d$; when you dot-product that reconstruction back against $\mathbf{y}$, you pick up another $\sqrt{2/\pi}$ factor in expectation. Multiply: $2/\pi \approx 0.637$.

Concretely at $b{=}1$, the optimal MSE codebook is $\{-\sqrt{2/\pi}/\sqrt{d},\,+\sqrt{2/\pi}/\sqrt{d}\}$, so $Q(\mathbf{x}) = \sqrt{2/(\pi d)}\cdot \operatorname{sign}(\boldsymbol{\Pi}\mathbf{x})$ and

$\mathbb{E}\big[\langle\mathbf{y},\tilde{\mathbf{x}}\rangle\big] \;=\; \dfrac{2}{\pi}\cdot\langle\mathbf{y},\mathbf{x}\rangle.$

The factor shrinks as $b$ grows but never vanishes, which is what the demo above shows.

■ TAKEAWAY · NEXT §8we need a different kind of estimator

An MSE-optimal codebook minimises squared reconstruction error. The cost is a fixed scalar shrinkage on every inner product, and this shrinkage stays nonzero at any finite bit budget. Attention and nearest-neighbour search need an inner-product estimator whose mean is correct. Next: §8 keeps the same encoder and adds a fixed prefactor on the decoder side equal to the reciprocal of the shrinkage. The mean of many trials then equals $\langle \mathbf{q}, \mathbf{k}\rangle$.

§8 · QJL: the un-biaser

If the bias is a known number, multiply it out.

§7 ended with a shrunken reconstruction. The MSE codebook produces $\tilde{\mathbf{x}}$ values whose magnitudes are smaller than the inputs they encode, so every inner product $\langle \mathbf{y}, \tilde{\mathbf{x}}\rangle$ comes out smaller than $\langle \mathbf{y}, \mathbf{x}\rangle$ by the same scalar factor. At one bit per coordinate that factor is exactly $2/\pi$. Averaging over trials does not move the estimate toward $\langle \mathbf{y}, \mathbf{x}\rangle$, because the same scalar multiplies the result on every trial.

A deterministic scalar bias is removable without changing the encoder. Multiply the decoder's output by the reciprocal of the bias and the expectation of the product equals the unbiased target. QJL applies this idea at one bit per coordinate. The encoder discards magnitude information, which is the same step that shrank §7's reconstruction. The decoder applies a fixed prefactor whose value is the reciprocal of the half-normal shrinkage that sign quantization introduces.

Encoder. Sample one random Gaussian matrix $\mathbf{S}$ once and share it between every encoder and decoder. To store $\mathbf{x}$, write down the signs of $\mathbf{S}\mathbf{x}$. The stored object is one bit per coordinate; the magnitudes of the entries of $\mathbf{S}\mathbf{x}$ are discarded. Discarding the magnitudes produces the bit savings and also produces a $\sqrt{2/\pi}$ shrinkage on any reconstruction built from the signs alone, by the same half-normal identity that produced §7's $2/\pi$.

Decoder. A full-precision query $\mathbf{y}$ arrives. Compute $\langle \mathbf{S}\mathbf{y},\,\text{stored signs}\rangle$. This quantity is a noisy estimate of $\langle \mathbf{x},\mathbf{y}\rangle$ scaled down by $\sqrt{2/\pi}$. Multiply by $\sqrt{\pi/2}/d$. The factor $\sqrt{\pi/2}$ is the reciprocal of the half-normal shrinkage and cancels it in expectation; the factor $1/d$ averages the estimate over the $d$ rows of $\mathbf{S}$. The expected value of the result is $\langle \mathbf{x}, \mathbf{y}\rangle$. The per-trial variance is larger than the MSE estimator's variance, but the mean of many trials converges to $\langle \mathbf{x}, \mathbf{y}\rangle$.

■ HOW TO READsame target, two estimators

Both panels use exactly 1 bit per coordinate. Left: the MSE-optimal codebook from §7, biased. Right: QJL with its calibration constant baked in. Each trial fires two shots (against independent $\mathbf{y}_1$ and $\mathbf{y}_2$). Same number of trials, same target. Watch where the red dot lands.

Dimension d128

Trials1500

MSE-optimal · 1 bit (biased)

QJL · 1 bit (unbiased)

MSE mean ratio

–

QJL mean ratio

–

truth

1.0000

What to notice. The MSE panel's red dot is southwest of the centre at the same offset as §7's 1-bit measurement, and that offset stays the same regardless of how many trials run. The QJL panel's red dot lands close to the centre but with a residual offset from finite-sample noise. QJL's per-trial variance is larger than MSE's (Lemma 4: $\propto \pi/(2d)$), so at the default trial count the residual offset is small but visible. The key difference between the two estimators is the source of this offset: MSE's offset is a fixed scalar bias on the inner product and does not shrink with more trials; QJL's residual offset is sampling noise around a correct mean and shrinks at the standard-error rate $1/\sqrt{n}$ as the trial count grows.

The math: definition and where $\sqrt{\pi/2}/d$ comes from

With $\mathbf{S}\in\mathbb{R}^{d\times d}$ i.i.d. $\mathcal{N}(0,1)$:

$Q_{\text{jl}}(\mathbf{x}) = \operatorname{sign}(\mathbf{S}\mathbf{x}) \in \{-1,+1\}^d, \quad \widehat{\langle \mathbf{x},\mathbf{y}\rangle} = \frac{\sqrt{\pi/2}}{d}\, \langle \mathbf{S}\mathbf{y},\,Q_{\text{jl}}(\mathbf{x})\rangle.$

Each row $\mathbf{s}_i$ makes $\mathbf{s}_i\mathbf{x}$ and $\mathbf{s}_i\mathbf{y}$ jointly Gaussian with covariance $\langle\mathbf{x},\mathbf{y}\rangle$. The half-normal identity gives $\mathbb{E}[(\mathbf{s}_i\mathbf{y})\,\text{sign}(\mathbf{s}_i\mathbf{x})] = \sqrt{2/\pi}\cdot\langle\mathbf{x},\mathbf{y}\rangle/\|\mathbf{x}\|$. Sum over $d$ rows and multiply by $\sqrt{\pi/2}/d$: the $\sqrt{2/\pi}$ shrinkage cancels, and the result is $\langle\mathbf{x},\mathbf{y}\rangle$ in expectation. Variance is bounded by $\tfrac{\pi}{2d}\|\mathbf{x}\|^2\|\mathbf{y}\|^2$ (Lemma 4 of the paper).

Stretching it: TurboQuant-prod

QJL by itself uses one bit per coordinate. TurboQuant-prod extends the construction to a $b$-bit budget by allocating the bits between the two estimators from §6 and §8. The first $b{-}1$ bits encode $\boldsymbol{\Pi}\mathbf{x}$ with the MSE codebook of §6 to capture magnitude. The last bit encodes the residual $\mathbf{r} = \boldsymbol{\Pi}\mathbf{x} - \tilde{\mathbf{y}}_{\text{mse}}$ with QJL to make the inner-product estimate unbiased. The total cost is $b\cdot d$ bits plus one scalar per vector (the residual norm $\|\mathbf{r}\|$), the same as TurboQuant-MSE.

The full TurboQuant-prod recipe

Rotate $\mathbf{x}\to \boldsymbol{\Pi}\mathbf{x}$ as in §3.
Apply $(b{-}1)$-bit MSE-optimal quantization. Call the result $\tilde{\mathbf{y}}_{\text{mse}}$.
Form the residual $\mathbf{r} = \boldsymbol{\Pi}\mathbf{x} - \tilde{\mathbf{y}}_{\text{mse}}$ and quantize it with one bit of QJL: store $\text{sign}(\mathbf{S}\mathbf{r})$ and the residual norm $\|\mathbf{r}\|$.
Decode: $\tilde{\mathbf{x}} = \boldsymbol{\Pi}^{\top}\big(\tilde{\mathbf{y}}_{\text{mse}} + \|\mathbf{r}\|\cdot \tfrac{\sqrt{\pi/2}}{d}\,\mathbf{S}^{\top}\text{sign}(\mathbf{S}\mathbf{r})\big)$.

The residual norm is the only piece of side info in the whole scheme, one scalar per vector, not one per small block the way GPTQ, AWQ, or KIVI need. Variance is bounded by Theorem 2.

■ TAKEAWAY · NEXT §9two recipes, one budget

TurboQuant-MSE minimises reconstruction error and produces a biased inner-product estimate with a known shrinkage factor. TurboQuant-prod allocates one of its $b$ bits to a QJL residual and produces an unbiased inner-product estimate at higher per-trial variance. Both schemes use $b\cdot d$ bits plus one scalar per vector. Next: §9 compares both upper bounds against the information-theoretic lower bound.

§9 · Shannon's floor

How close is TurboQuant to the theoretical best?

The paper uses Shannon's lossy source-coding theorem (via Yao's minimax principle) to prove that no quantizer can do better than $D_{\text{mse}} \ge 4^{-b}$ on worst-case inputs on the unit sphere. The bound covers every conceivable quantizer, including randomized and data-adaptive ones. TurboQuant's matching upper bound is $\tfrac{\sqrt{3}\pi}{2}\cdot 4^{-b}$, within a factor of $\approx 2.7$ of the lower bound asymptotically and within a factor of $\approx 1.45$ at $b{=}1$.

Dimension d128

Samples / b500

TurboQuant-MSE measured upper bound $\tfrac{\sqrt{3}\pi}{2}\cdot 4^{-b}$ Shannon lower bound $4^{-b}$

The plot uses a log scale on the vertical axis. All three curves have the same slope (the $4^{-b}$ exponential rate) and differ only by a small constant offset.

The exponential improvement over older methods

Earlier data-oblivious quantizers (uniform rounding, scalar sketches) achieve a reconstruction error that decays only polynomially in the bit budget, e.g. $\mathcal{O}(1/b)$. TurboQuant's $4^{-b}$ rate is exponential in $b$. That exponential rate is what enables the $4$–$6\times$ KV-cache compressions reported in §10 without measurable downstream quality loss.

■ TAKEAWAY · NEXT §10tight against the floor

The upper bound, the lower bound, and the measured error all decay at the same exponential rate $4^{-b}$ as $b$ grows, and they differ from each other only by a small constant. TurboQuant therefore matches Shannon's $4^{-b}$ rate to within a factor of $\approx 2.72$ asymptotically and $\approx 1.45$ at $b{=}1$. Next: §10 looks at the systems consequences of this rate.

§10 · Why it matters

Concrete wins in LLM inference and vector search.

KV cache compression

Needle-in-a-Haystack recall on Llama-3.1-8B-Instruct, every compressed method evaluated at a $4\times$ memory-compression target (paper Fig. 4):

Method	NiaH
Full cache (FP16)	0.997
SnapKV	0.858
PyramidKV	0.895
KIVI	0.981
PolarQuant	0.995
TurboQuant	0.997

TurboQuant matches the full-precision NiaH score at $4\times$ compression. On LongBench-V1 (paper Table 1), TurboQuant at $3.5$ bits per channel matches the full-precision average ($50.06$); at $2.5$ bits per channel it stays within $\approx 1\%$ of full precision ($49.44$ vs $50.06$), a $6.4\times$ compression.

Near-neighbor search

Quantization time on 100K vectors, 4-bit quantization (paper Table 2):

Method	d=200	d=1536	d=3072
Product Quantization	37.04 s	239.75 s	494.42 s
RabitQ	597.25 s	2267.59 s	3957.19 s
TurboQuant	0.0007 s	0.0013 s	0.0021 s

TurboQuant is between four and six orders of magnitude faster than the alternatives at 4-bit indexing, and the paper reports higher recall as well. The reason it is so fast is that the encoder is a fixed rotation followed by a lookup against a precomputed table. There is no codebook to learn from data and no per-block scales to fit at index time.

Zoomed-out summary: all three papers together

QJL (2024). Take the sign of a random Gaussian projection of the input and store one bit per coordinate. The decoder applies a fixed calibration constant that makes the inner-product estimate unbiased in expectation. The construction adds no per-vector side information.
PolarQuant (Feb 2025). After a random pre-conditioning, convert each block of coordinates to polar form. The radii concentrate around their mean and the angles concentrate around $\pi/4$. Quantize the radius distribution and the angle distribution independently. The construction is an alternative coordinate system that also avoids per-vector side information.
TurboQuant (Apr 2025). Combines the previous two ideas. The encoder rotates the input, then quantizes the Beta-distributed coordinates with an optimal Lloyd–Max codebook. A QJL pass on the residual then cancels the inner-product bias that the Lloyd–Max step leaves behind. The MSE of the reconstruction is provably within a constant factor of Shannon's lower bound at every bit rate ($\approx 1.45\times$ at $b{=}1$, $\to \tfrac{\sqrt{3}\pi}{2}\approx 2.72\times$ asymptotically). The construction runs on a GPU and supports streaming inputs.

■ TAKEAWAYthe whole trick, in one line

After a random rotation, every coordinate of every input has the same fixed distribution: a low-variance Beta that converges to a Gaussian as $d$ grows. A single optimal codebook designed once for that distribution serves every input. The full vector-quantization problem reduces to the well-studied scalar quantization problem.

↑ top

8.Box to save memory in Rust

Sourcehttps://dystroy.org/blog/box-to-save-memory/

Sitedystroy.org

Submitteremschwartz (Hacker News)

Submitted2026-04-23 17:02 UTC (Hacker News)

HN activity82 points · 18 comments

Length1.4K words (~6 min read)

Languageen

dystroy - blog

I saved 475 MB out of the 895 MB used by a real-world Rust program by changing the layout of some structs and the way I was deserializing JSON files.

The real use case

My program deserializes all the JSON files of https://github.com/awslabs/aws-sdk-rust/tree/main/aws-models into "Smithy Shape" structs.

Those files contain thousands of structures similar to this one:

"com.amazonaws.iam#EnableOrganizationsRootSessionsResponse": {
    "type": "structure",
    "members": {
        "OrganizationId": {
            "target": "com.amazonaws.iam#OrganizationIdType",
            "traits": {
                "smithy.api#documentation": "<p>The unique identifier (ID) of an organization.</p>"
            }
        },
        "EnabledFeatures": {
            "target": "com.amazonaws.iam#FeaturesListType",
            "traits": {
                "smithy.api#documentation": "<p>The features you have enabled for centralized root access.</p>"
            }
        }
    },
    "traits": {
        "smithy.api#output": {}
    }
},

As is common in Rust, my program uses the very convenient serde.

I won't go into every details, but part of the structure needs to be shown at this point for clarity.

Don't read it entirely, just note that it's a bunch of structs containing structs, some optional, with serde attributes:

#[derive(Clone, Deserialize, Serialize)]
pub struct SmithyShape {
    #[serde(rename = "type")]
    pub shape_type: SmithyShapeType,
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub operations: Vec<SmithyReference>,
    #[serde(default)]
    pub members: FxHashMap<String, SmithyReference>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub key: Option<SmithyReference>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub value: Option<SmithyReference>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub member: Option<SmithyReference>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub input: Option<SmithyReference>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub output: Option<SmithyReference>,
    #[serde(default)]
    pub traits: SmithyTraits,
}

#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct SmithyReference {
    pub target: ShortShapeId,
    #[serde(default)]
    pub traits: SmithyTraits,
}

#[derive(Debug, Clone, Default, Deserialize, Serialize)]
pub struct SmithyTraits {
    #[serde(rename = "smithy.api#title", skip_serializing_if = "Option::is_none")]
    pub title: Option<String>,
    #[serde(rename = "aws.api#service", skip_serializing_if = "Option::is_none")]
    pub service: Option<SmithyServiceTrait>,
    #[serde(
        rename = "smithy.api#sensitive",
        skip_serializing_if = "Option::is_none"
    )]
    pub sensitive: Option<SmithySensitiveTrait>,
    #[serde(
        rename = "smithy.api#documentation",
        skip_serializing_if = "Option::is_none"
    )]
    pub documentation: Option<String>,
    #[serde(rename = "smithy.api#pattern", skip_serializing_if = "Option::is_none")]
    pub pattern: Option<String>,
    #[serde(rename = "aws.iam#iamAction", skip_serializing_if = "Option::is_none")]
    pub iam_action: Option<SmithyIamAction>,
}

#[derive(Debug, Clone, Deserialize, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct SmithyServiceTrait {
    pub sdk_id: Option<String>,
    pub arn_namespace: Option<String>,
    pub cloud_formation_name: Option<String>,
    pub cloud_trail_event_source: Option<String>,
    pub endpoint_prefix: Option<String>,
}

This is some standard looking code, the current practice, but we can also call it naïve. By deserializing this way, the structures were taking 895MB in memory.

An analysis shows that most optional strings are missing, and that's what I leveraged to drastically reduce the memory footprint. But this requires to have in mind some Rust specificities, so a detour is needed:

About rust structs and memory

On a 64-bits platform, a word is made of 8 bytes. That's for example the memory needed to store a usize.

A String needs 3 words (address of the string, allocated size, and capacity) to which you need to add the allocated space for the string bytes. That's 24 bytes for a String (you can check it with dbg!(std::mem::size_of::<String>());), excluding the actual string content on the heap.

There's a niche compiler optimization which makes an Option<String> the same size (basically an option of a pointer type doesn't need an added byte to know whether it's None because it's None when the pointer is zero).

So the following structure, when all strings are missing (None), takes exactly 120 bytes (5*24) in memory:

pub struct SmithyServiceTrait {
    pub sdk_id: Option<String>,
    pub arn_namespace: Option<String>,
    pub cloud_formation_name: Option<String>,
    pub cloud_trail_event_source: Option<String>,
    pub endpoint_prefix: Option<String>,
}

Now to struct composition.

Have a look at a struct "containing" another struct. To simplify, let's imagine it contains our SmithyServiceTrait and another field:

pub struct Container1 {
    pub some_string: Option<String>,
    #[serde(default)]
    pub trait: SmithyServiceTrait,
}

The minimal size is, quite expectedly, 24+120 = 144 bytes.

But our SmithyShape only contains optional structs. What happens if we change our Container struct to use an Option<SmithyServiceTrait> ?

pub struct Container2 {
    pub some_string: Option<String>,
    #[serde(default)]
    pub trait: Option<SmithyServiceTrait>,
}

What's the size of a container when both some_string and trait are None ?

It's the same as the one of Container1, there's no memory gain in having an option (in fact, we're even lucky that our SmithyServiceTrait which contains only Option<String> can allow the compiler to elide the additional byte).

Applying this to our SmithyTraits, we see why a standard implementation balloons in memory.

This differs fundamentally from class composition in languages like Java, Python, JavaScript, etc.

In such language, when you have:

class Container {
    String someString;
    SmithyServiceTrait trait,
}

Then a null trait takes only one pointer-sized word in memory.

To allow our Rust Container to take only one word for the optional content when there's nothing to store, we need basically to do as is done in the languages we want to mimic: we need to put this content on the heap, outside of the container:

pub struct Container3 {
    pub some_string: Option<String>,
    pub trait: Option<Box<SmithyServiceTrait>>,
}

Now, when both some_string and trait are None, a container takes only 32 bytes in memory (3 words for the Option<String>, one for the Option<Box<...>>).

The niche optimization I mentioned before applies to Option<Box<...>> too: it doesn't consume more than a simple Box<...>.

The changes that recovered the memory

Basically, the change consists in

Detecting when structs are useless (i.e. when all their fields are None)
Making them optional in their parent struct, and moving them to the heap
Implementing a custom Deserializer to not store empty useless structs

#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct SmithyReference {
    pub target: ShortShapeId,
    #[serde(default)]
    pub traits: SmithyTraits,
}

becomes

#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct SmithyReference {
    pub target: ShortShapeId,
    #[serde(
        default,
        deserialize_with = "deserialize_boxed_traits",
        serialize_with = "serialize_boxed_traits"
    )]
    pub traits: Option<Box<SmithyTraits>>,
}

fn deserialize_boxed_traits<'de, D: Deserializer<'de>>(
    deserializer: D
) -> Result<Option<Box<SmithyTraits>>, D::Error> {
    let traits = SmithyTraits::deserialize(deserializer)?;
    if traits.is_empty() { // i.e. when all optional strings are none
        Ok(None)
    } else {
        Ok(Some(Box::new(traits)))
    }
}

Similarly, SmithyShape was changed to replace all Option<SmithyReference> by Option<Box<SmithyReference>>, some accessors were modified due to options in the way, and that's it, that's how the memory needed to store all deserialized AWS shapes was reduced twofold, sparing 475 MB.

A few notes:

this deserialization costs more in CPU as the object is deserialized before being discarded. It turns out that the trade-off is a full-win as not having to hunt for memory made the complete task faster even with this added step.
a lot of boxes means a fragmented heap. In such case it's not a problem but this might be worth keeping in mind.

Verification: Proving the Impact

With experience, you get an intuition of where to save space, and roughly how much. But to work seriously, you need to check that what you did worked, and verify it was worthwhile. So you need to measure.

There's no simple and light way in Rust to know the total space taken by a composite object following all pointers.

Here, my solution was to use an allocator which gives information about its state (I used jemalloc because the standard allocator provides limited visibility into internal statistics), and compare the memory used before deserialization to the memory used after.

As I don't always want to use this allocator, I defined a "profile" feature in my Cargo.toml:

[features]
profile = ["tikv-jemallocator", "tikv-jemalloc-ctl"]

[dependencies]
tikv-jemallocator = { optional = true, version = "0.6", features = ["stats", "profiling"] }
tikv-jemalloc-ctl = { optional = true, version="0.6", features = ["stats"] }

And I declare the use of this allocator in my main.rs:

#[cfg(feature = "profile")]
#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

Then, in my function deserializing all those shapes, I do the measures:

#[cfg(feature = "profile")]
fn allocated_mb() -> usize {
    tikv_jemalloc_ctl::epoch::advance().unwrap();
    tikv_jemalloc_ctl::stats::allocated::read().unwrap_or(0) / (1024 * 1024)
}

#[cfg(feature = "profile")]
let base = allocated_mb();

... load all the shapes ...

#[cfg(feature = "profile")]
eprintln!(
    "Memory used for the shapes = {} MB (total)",
    allocated_mb() - base
);

Tip: tikv_jemalloc_ctl exposes many more details that may be interesting to follow in a server application

Conclusion: what's to remember, in a few words

Summarized, here's what any Rust developper needs to understand and remember:

Composite structs can consume significant memory
It can pay to make a field: BigStruct optional by detecting when its content doesn't matter
A field: Option<BigStruct> takes at least the space of the BigStruct even when it's None
You can break the chain by boxing with field: Option<Box<BigStruct>> (then a None takes only a word in the parent struct)
Those optimizations are still possible when deserializing with Serde

↑ top

9.When the cheap one is the cool one

Sourcehttps://arun.is/blog/cheap-cool/

Sitearun.is

Submitterddrmaxgt37 (Hacker News)

Submitted2026-04-26 01:25 UTC (Hacker News)

HN activity39 points · 14 comments

Length1.7K words (~8 min read)

Languageen

Apple and Porsche both figured out the same secret that the cheapest product in the lineup can be the most exciting. It just takes a little intention.

Sometimes the cheapest product is actually the coolest in the lineup. But that rarely happens by accident.

I saw this firsthand at the Apple Park Visitor Center, waiting to buy a new MacBook Air. I was told by the staff that I would have to wait a bit while they helped a few other customers. I watched as they enthusiastically purchased the MacBook Neo. When it came my turn, I got to chatting with my Mac Expert and they told me that MacBook Neo was selling like hotcakes. Not only were new customers buying their first Mac, but customers with older Macs were, instead of trading up for a brand new Air or Pro, buying MacBook Neo.

As I pulled out my iPhone Pro to pay, a funny question popped up in my head. How did Apple craft a cool laptop that costs half as much as the phone in my hand?

The problem

Cut back to Porsche in 1992, and you’ll see a similar story playing out in a very different industry. Back then, Porsche was not in the fantastic position it is in today. Its model lineup was aging. The 911 was essentially a derivative of the original from three decades earlier, and the rest of its lineup was, at that point, decades old as well. Unsold inventory was starting to pile up, and people outside of the company were starting to think that Porsche might have to close up shop.

At that point, Kevin Gaskell was part of the senior leadership at Porsche Great Britain and was promoted to managing director to turn around that division of the company. He has told the story of how their entry-level car, the 968, was a derivative of the 924 from 1976. And being priced at around £34,000, it was unfortunately above a very important number: £29,000. This was the threshold at which company-provided cars for personal use incurred a much higher income tax. Clearly, Porsche needed a cheaper 968.

The problem Apple solved is, of course, different as Apple is not in dire straits. Yet, some of the details line up. As John Gruber wrote at Daring Fireball:

“John Ternus took the stage to address the audience. He emphasized that the Mac user base continues to grow, because ‘nearly half of Mac buyers are new to the platform.’ Ternus didn’t say the following aloud, but Apple clearly knows what has kept a lot of would-be switchers from switching, and it’s the price. It’s not that Apple never noticed the demand for laptops in the $500–700 range. It’s that they didn’t see how to make one that wasn’t junk.”

Making it cheap

If Porsche and Apple approached these problems purely as a question of cost, then their solutions would have always felt like a compromise. The key is how they each viewed the problem as an opportunity instead.

The folks at Porsche got together to figure out how to get a 968 priced at £28,995. But instead of starting with the 968 and taking away parts until they had a cheaper car, they took everything out at first — including the air conditioning, electric windows, rear seats, and more. Then they came back with lightweight bucket seats, a racing steering wheel, lowered suspension, and almost nothing else.

Image courtesy of Motor Car Classics

Apple took a similar approach with MacBook Neo. Inside, it contains the chip from an older iPhone, which likely allows Apple to use some binned CPUs they might already have on hand. The idea of an iPhone chip in a laptop isn’t novel — the original Apple Silicon developer transition kits were actually Mac mini enclosures with A12Z iPad chips inside. Features like Thunderbolt, larger RAM configurations, and extended I/O are absent — but these are limitations of the iPhone chip architecture, not arbitrary omissions.

Making it cool

If Porsche and Apple stopped there, these products would have been nothing but parts bin specials. And technically they are — they take parts from existing products and recombine them to create something new. But what’s important here is that both Apple and Porsche came back and changed the positioning of the product, making it exciting in a way that the rest of the lineup isn’t.

For Porsche, that meant racing colors — yellow, red, blue, black, and white — with color-matched wheels and seats. They called it the Club Sport, with large decals applied across the sides. Those changes, both cutting the price and the weight but also changing the way the product looks and is positioned, resulted in something that the market saw with entirely new eyes.

Image courtesy of Alex Harkey/@aloptics11

Apple has been very upfront in its marketing and messaging that price is its most important feature. On top of that, MacBook Neo is offered in new colors like blush and citrus that aren’t seen in the rest of the lineup. Instead of stripping down an existing product and offering something that felt lesser, they created something new from the ground up, earning that name “Neo”.

Both Apple and Porsche used a very hard constraint. For Porsche, it was a tax threshold that put the product below £29,000. For Apple, it’s only $500 for the education market. By working back from that constraint and then being creative with cheap things — color and so on — they created a product that felt cool and different instead of stripped down.

Our tools shape us

“We shape our buildings and afterward our buildings shape us.”— Winston Churchill

So, these constraints aren’t just corporate accounting decisions. They focus the product, which can then reveal what each of us finds important. The tool acts as a mirror.

In the 968 Club Sport, the lack of air conditioning encourages you to roll down the window and hear the engine. And even though it didn’t have a particularly powerful engine for straight-line speed, for people who wanted something fun to drive in a safe and engaging way, the Club Sport actually ended up being the best option.

Image courtesy of Alex Harkey/@aloptics11

“The Club Sport strips away all the fripperies, builds on the best bits, amplifies the soul and delivers more pure driving pleasure than almost any car we can think of.”— Performance Car, December 1993

And from this base, the 968 CS could inform its driver of what they find valuable and what missing features may actually be essential. When they turn it in for their next Porsche, they’ll have a clear idea of what they want.

The story is similar with MacBook Neo. Its competitors have terrible displays and flimsy enclosures, constantly reminding you of their lack of features.

Sam Henri Gold wrote a beautiful piece that weaves in his own experience with how a machine like MacBook Neo would be perfect for a young person entering a creative profession:

“Somewhere a kid is saving up for this. He has read every review. Watched the introduction video four or five times. Looked up every spec, every benchmark, every footnote. He has probably walked into an Apple Store and interrogated an employee about it ad nauseam. He knows the consensus. He knows it’s probably not the right tool for everything he wants to do.”

That was me when I was 16. I didn’t have the privilege of choosing something exotic. I made do with what was available to me, and in the process found new interests. And just as Sam wrote in his piece, I learned what I would need in the future by bumping up against the limits of the computer in front of me.

When people started to take apart MacBook Neo — such as the YouTube channel TECH RE-NU — there was a revealing realization: it is the most repairable laptop that Apple has shipped in the last ten years. There are no tricky adhesives or sticky tapes that you might see in their more expensive and often thinner products. And all of that is intention, and speaks to who they’re trying to sell to.

Image courtesy of iFixit

MacBook Neo is likely going to be purchased in bulk by educational institutions, and it’s common sense that kids don’t take great care of computers. So easy repairability is a genuine value-add for IT departments — not just a talking point. But repairability isn’t only a boon for institutions. The young have time.

I still fondly remember upgrading the hard drive in my own Mac twenty years ago — cracking it open, figuring out what went where, and feeling like I actually understood the machine I was using. That kind of intimacy with tools is formative.

Success

So, simply having a Mac at all is an opportunity. And for someone with a more limited budget, that driving-focused Porsche is an opportunity too.

The 968 Club Sport didn’t singlehandedly save Porsche — the Boxster probably deserves more credit for that turnaround. Today, it has become a cult classic and is starting to appreciate on the used market.

MacBook Neo seems to already be a hit. Apple has put a product here that truly has no competition from the rest of the PC makers, and I won’t be surprised to start seeing these in coffee shops and schools very soon.

In a way, buying a Neo has the same energy as a young professional buying a 968 CS to enjoy the driving experience. It’s stripped down and doesn’t offer a lot of features. And that’s sort of the point. Hitting up against limitations teaches one about the tool and one’s own capabilities.

And if you look at it purely as a capitalist exercise, the Neo and the 968 Club Sport are ultimately about growing a customer base — many of whom will likely upgrade to more expensive models once their first machine has taught them its limits. The cheap product becomes the on-ramp to the whole lineup.

These products are cool, they’re quirky, and they remind us that the cheap thing can indeed be the coolest.

Thanks to Q for reading drafts of this.

↑ top

11.When Your Digital Life Vanishes

Sourcehttps://www.newyorker.com/magazine/2026/04/27/when-your-digital-life-vanishes

SiteThe New Yorker

AuthorJulian Lucas

Published2026-04-20

HN activity14 points · 2 comments

Length5.5K words (~25 min read)

Languageen-US

A broken phone or corrupted hard drive can mean the loss of work, evidence, art, or the last traces of the dead. Julian Lucas visits data-recovery experts who specialize in undoing disaster.

The man had been slumped over his laptop for a week by the time his body was discovered. His deliquescent tissue had seeped under the keys, short-circuiting the motherboard. It was a killing from beyond the grave, flesh and blood’s revenge on silicon. Yet digital death differs, crucially, from the genuine article. Sometimes, with luck, it can be reversed.

It happens to the best of us—the farmer who plowed over his smartphone, the biologist with a flooded lab, the professional photographer whose dog chewed through his SD card just after an important shoot. Losing files is inevitable in our paperless, data-driven, device-mediated world, notwithstanding its fanciful promises of cloud-based immortality.

I used to count myself one of the prepared. Little escapes my archival dragnet: I keep every phone I’ve ever owned in a labelled shoebox, and the archived “souls” of long-defunct computers on a PC called Thoth, for the Egyptian god who records the weighing of hearts on the journey to the afterlife. Then, six years ago, I set my iPhone down on the edge of my bathroom sink, and it fell, shattering on the tiles.

The spiderwebbed screen bled colors, and the keypad flashed, as though ghostly fingers were trying to guess my passcode. I winced at the expense, but the intangible costs emerged more slowly. I realized that the phone had stopped synching with my iCloud, and, when I brought it to a repair shop, they couldn’t fix it. Among the likely casualties were some of the last texts and voice mails I’d received from my father, who’d died of heart failure not long before.

It was from him that I’d learned to protect my files in the first place. Growing up, I practically lived in his home recording studio, a starship’s bridge of mixers and monitors where he set aside a corner for my experiments with code. A musician who’d played with Miles Davis, and written and produced for Madonna, he was also a data hoarder, and he had spent a decade digitizing his extensive record collection for a custom music server that he dubbed Soulbro.

My father taught me to burn disks, to back up files, and to discharge static electricity before handling a computer’s delicate innards. He had a surgically implanted defibrillator and liked to call himself a cyborg—a boast laced with irony, because the device periodically misfired, delivering shocks that could knock him to the ground. He spent his final weeks in an I.C.U., which appeared to me like a nightmare double of his studio, its monitors transcribing the rhythms of his own waning heart.

The studio took years to clear out. I made disk images of the half-dozen computers, which were subsequently dismantled. Then, this fall, my mother found two hard drives we’d overlooked, which could have been either mine or his. Both failed to register when I plugged them into my computer; one made an ominous grinding noise. Still, I couldn’t bring myself to let them go.

For thousands of data-loss victims, the last resort is a recovery service called DriveSavers. It’s a half hour from San Francisco over the Golden Gate Bridge, in the balmy, scenic suburb of Novato. The boxy, low-rise office overlooks a verdant wetland frequented by otters and egrets. Visiting in January, I felt that I’d arrived in hard-disk heaven.

I was greeted by Sarah Farrell and Mike Cobb, two directors of the company. Farrell, a teacherly woman with blond hair and a beekeeping hobby, oversees business development but used to be an engineer. “In the lab, I just assume everything has been in the toilet,” she told me. “During COVID, I can’t even tell you what people spilled on their MacBooks.” Cobb, who runs engineering, is a genial man with lively blue eyes, and once saved a computer tower from a burrowing squirrel: “He peed right on the power supply.” Cutesy anecdotes alternated with triumphs and tragedies—a school district rescued from a ransomware gang, an iPad salvaged from a plane crash. “They made me too sad,” Farrell said of the worst cases. “I had to be, like, ‘Symptoms, no story,’ or I’d never be able to go home.”

Their handiwork was on display in the lobby’s Museum of Bizarre Diskasters, an exhibition of silicon carnage. “I remember opening this one out on the deck,” Cobb said of an ancient Toshiba laptop, which had burned shut in a fire. “It was like an oyster.” One successfully recovered smartphone had been shredded by a snowblower. Another had been sliced in two by a monorail, like a magician’s assistant. The company regularly buys brand-new devices and tears them to pieces. “It’s like the jaws of life,” Cobb said. “If a car gets absolutely demolished, you need to know what to cut and what not to cut.”

DriveSavers receives some twenty thousand inquiries each month. It has saved data for government agencies, multinational corporations, and more than a few celebrities, whose autographed portraits beamed from the lobby walls. Sidney Poitier recovered a draft of his memoir through the company’s good offices; Khloé Kardashian, a phone that fell into a pool. Data loss has been the digital age’s great equalizer: What else could bring together such disparate figures as Willie Nelson, Buzz Aldrin, Gonzo the Muppet, and Gerald Ford?

The memorabilia dated back to the eighties. Back then, hard drives stored so little and cost so much that they were generally more valuable than the files they contained; one forty-megabyte drive on display in the lobby originally retailed for twenty thousand dollars. Advances in storage density, and the digitization of everything from filing taxes to laying out magazines, changed this calculus. “It was like two crossing lines,” Jay Hagan, who co-founded DriveSavers, later told me. “The cost of drives was going down, and the value of data was going up.”

Fittingly, the company emerged from the crash of a hard-drive manufacturer, Jasmine Technologies, where Hagan met his co-founder, Scott Gaidano. In 1989, they established DriveSavers as a repair service for their former employer’s abandoned customers, whom they quickly realized were more concerned about their files than their hardware. “I came up with this theorem,” Steve Burgess, a data- recovery pioneer who sold his own company to the duo, told me. “The value of a person’s data is negatively correlated with whether or not they have it. Once they have it, it really wasn’t worth anything. But, if they don’t have it, it’s worth an arm and a leg and their children.”

Recovering data from an iPhone or a hard drive can set you back three thousand dollars, and from an enterprise server, six figures. Although DriveSavers has a “no data, no charge” policy for most customers, it gets accused of overcharging by scrappier competitors, who tend to attribute the company’s success to attention-grabbing stunts. (One rival has mocked DriveSavers’ engineers as “clowns in spacesuits,” alluding to the protective gear they wear in ads.) But Farrell insists that the fees reflect care and determination. She once spent a week recovering an iPad for a couple with an autistic child who was so attached to a farming simulator that he couldn’t calm down without it. “They still invite me to barbecues,” she said. There have also been litigants who’ve lost their evidence; scientists, their research; the bereaved, their dearly departed’s final words.

Two astronauts on moon looking at dead astronaut with helmet on hat rack.

DriveSavers’ own death has been foretold many times. The cloud was supposed to destroy them; before that, it was commercial backup services, solid-state drives (SSDs), and encrypted smartphone hardware. Still, people keep finding ways to imperil their files, which grow ever more numerous and irreplaceable. Our precarious datasphere extends from cryptocurrency to telemedicine; now, with the advent of virtual companions, it’s even possible to lose the love of your life to a glitch.

Technological progress may be increasing our exposure. A.I. agents are becoming notorious for accidental deletions, while the proliferation of data centers has wildly inflated the cost of storage. And, despite exponential growth in capacity, the average hard drive’s life span remains just under seven years. Considering the hundreds of zettabytes of data estimated to exist in the world, it’s as though a million Libraries of Alexandria were saved from annihilation solely by hamsters on wheels.

Perhaps this is why I found it so soothing to be among the Diskasters, whose data, after all, had survived. I’d sent my phone ahead of me, and the tour had kindled a cautious optimism about its fate. One vitrine contained a decapitated Mac PowerBook 100, which had spent three days underwater; next to it, for emphasis, a taxidermied piranha bared its teeth. All these devices had escaped the maw of oblivion. Why should mine be any different?

The PowerBook had belonged to a couple of jugglers, Tony Duncan and Jaki Reis, who nearly lost it on a cruise down the Amazon in March, 1993. They were performers on the Ocean Princess, where they juggled swords and torches after dinner. One afternoon, they were practicing as the Princess left Belém, in northeastern Brazil, and promptly hit a sunken wreck. They helped the crew evacuate the ship and were safe in a hotel by nightfall. But they neglected to retrieve their PowerBook, which held their contacts, promotional materials, and financial records. “Everything was on that computer,” Reis told me. “I couldn’t leave it behind.”

Reis talked her way onto a crew member’s unofficial salvage expedition. Back on the Princess, whose lower decks had sunk below the waterline, she waded down a corridor with a flashlight in her mouth, trying not to think about piranhas. She found the laptop fully submerged and assumed that it couldn’t be resuscitated but brought it back with her anyway. “I’m an Apple person,” she explained. Four repair services turned down the case. Then Duncan saw an ad for DriveSavers: “They were, like, ‘Doesn’t seem likely, but what the hell?’ ” Miraculously, they succeeded, and began exhibiting the PowerBook in an aquarium at the annual Macworld trade show. “We should have negotiated for dividends,” Duncan said.

Many such resurrections take place in DriveSavers’ “clean room,” an E.R.-like space equipped with fans and HEPA filters which reminded me of where the Oompa Loompas operate Wonkavision. Before entering, I walked across an adhesive mat that tore the dust from my soles, then donned a mask, gloves, and white coveralls. The room had about eighty computers, which, because of the controlled environment, could safely run in their birthday suits, their bare motherboards mounted to the walls. Monitors showed digits scrolling in columns as repaired hard-disk drives (HDDs) were imaged; others waited in red and blue bins. Phil Reynolds, an engineer, showed me to a table where a four-terabyte drive lay open. “You got a firm grip?” he asked.

It was about the size of a paperback novel, with smooth, reflective disks nestled inside. HDDs store data on swiftly spinning “platters,” usually made of glass or aluminum. Embedded within them are microscopic grains of a magnetic alloy, whose polarities are flipped by “read-write heads” that float just nanometres from the surface. Every year, the grains get smaller, and the means of zapping them more sophisticated; in March, Seagate, one of the leading hard-drive manufacturers, announced a forty-four terabyte drive, its largest ever—a milestone made possible by a technology called heat-assisted magnetic recording, which uses a laser to heat each grain for a nanosecond.

Reynolds turned a flashlight on the platters, which reflected our masked faces. A single drive might have two, five, or even ten spinning in parallel, with a stack of heads flitting between them. Because of the speed of revolution, a single grain of dust can be enough to strip the magnetic film and obliterate the underlying data. Another threat is corrosion, usually from immersion in liquid: Reis and Duncan’s hard-drive platters were cleansed with a deionized solution, then swapped into a replacement drive. “All kinds of catastrophic things can happen,” Reynolds said.

My apprenticeship began with a simple disassembly, a typical exercise for new employees. After a brief demonstration, Reynolds handed me pliers and a tiny screwdriver; I struggled to remove one of the actuator magnets, which held so firmly to its opposite that I feared smashing it into the platters. Similarly tricky was the printed circuit board, or PCB, which precisely choreographs the drive’s machinery. Each is particular to its model, Reynolds explained: “Without this chip, you’re not ever going to get that drive to work again.”

Sourcing parts is half the battle. Outside the clean room, I spoke with Pamela Rainger, who manages DriveSavers’ inventory. “These are our donor bodies,” she said with a sweeping gesture. “They’ve all been tested and are ready to give up their lives.” Behind her, more than thirty thousand drives were shelved in antistatic bags on metal racks. It’s not always enough to simply buy a replacement; because of a complex supply chain and the relentless pace of innovation, the donor drive should, ideally, have been made in the same factory, even in the same week, as the recipient. DriveSavers retains a personal shopper in Shenzhen to track down elusive models. For obsolete equipment, they turn to eBay and specialized venders; once, Rainger had to find a match for a forty-year-old drive from an embroidery factory, which had operated a robotic arm. The trickiest category might be novelty items, such as the SpongeBob disposable camera one family had used to document a vacation. “There are actually several SpongeBob disposable cameras,” she said. “I had to find the exact same one.”

Smart devices add yet another layer of complexity. Downstairs from the clean room, I visited the Flash Physical Department, where a handful of engineers hunched over soldering irons, microscopes, and assorted diagnostic tools. I was greeted by Matt Burger, the head of the department, a friendly, bearish young man with glasses and a mop of brown hair, who was putting a thumb drive through an X-ray machine. “Somebody had it in their laptop and dropped it on its side,” he explained. The monitor showed a slightly bent rectangle covered with dots and lines, which didn’t look so bad to me. I listened for a prognosis, hoping that it might have some relevance to my own wounded machines. Then he spotted a faint crack through the tiny region of the drive that held the memory chip. “This is going to be a no-recovery,” he said.

Flash memory is used in thumb drives, smartphones, newer laptops, and SSDs. The technology exploits a phenomenon known as “quantum tunnelling” to trap electrons in floating-gate transistors, like the genies imprisoned by King Solomon. Because they have no moving parts, flash chips are generally considered to be more stable than HDDs. But their design can also complicate data recovery. Many devices integrate flash storage into their main logic boards and cryptographically pair it with other components for security, a practice popularized by Apple. Saving them can involve transplanting not one but several chips. Burger explained, “You have to have it all working as one cohesive thing. No funny business.” The dead man’s laptop, which arrived still soaked in bodily fluids, had required engineers to remove and clean nearly every chip on the logic board before it could be resurrected, much as Egyptian embalmers preserved the stomach, the liver, the lungs, and other organs so that the deceased could function in the afterlife.

The arcane art that makes all this possible is called “microsoldering”—essentially, soldering under a microscope. Burger sat me down for a tutorial at an empty workstation, where a damaged iPhone board had been readied for my inexpert hands. It was an L-shaped thing about the size of my thumb and forefinger; in one of its corners, a chip no bigger than a peppercorn had slightly cracked. “See how it’s impacted there?” Burger asked as I adjusted the microscope. “You can see the actual glass through the top coating.” Burger gave me tweezers and heat-resistant gloves; though my hands felt steady, under the microscope they shook like mad. I was like a giant medical student with a tremor, about to perform surgery on a Who out of Dr. Seuss.

Burger tasked me with swapping out the chip. First, I used a syringe to apply flux, an antioxidant that helps solder stick. Next, I heated the chip with a hot-air gun until the tiny grid of metal balls connecting it to the board melted. “Get your tweezers in there,” Burger encouraged; at last, it came loose. Putting in the new chip was more difficult. I initially struggled to stencil new solder balls onto its underside—“He’s going to break it,” Farrell warned—but managed to finish the procedure, though I inadvertently fused a few resistors in the process. “Have I been fired at this point?” I asked. “Everybody practices,” Burger diplomatically replied. “You could maybe even still salvage data.”

The final stage of a recovery takes place in the Logical Department, a warren of computer towers where engineers analyze the recovered disk images. One of them, Will DeLisi, looked startled as he turned away from a screenful of digits: “They said ‘perfect copy,’ but it’s gibberish, plain and simple.” When files have been deleted, corrupted, or overwritten, it’s his job to reconstruct them; today, he was searching for pictures that had mysteriously vanished. “This file ends mid-sector,” he said, adding that cheap thumb-drive firmware was probably to blame. “The controllers just spit up on top of the file system.”

Files can disappear in any number of ways, only some of which are irreversible. On many systems, deleting them merely removes their addresses from a registry, freeing the space to be overwritten. (This is one reason that the F.B.I. was able to recover deleted e-mails from Hillary Clinton’s private server.) Similarly, corruption or physical damage might destroy a file’s header, which contains its identifying metadata, while leaving other parts of it untouched. In other words, there are file traces everywhere, like so many ghosts in a vast bardo, which can sometimes be brought back to life.

Logical data recovery is the most D.I.Y.-friendly kind. A YouTuber called Babylonian, who goes to extreme lengths to solve “trivial mysteries,” got nearly seven million views for a video of him “rescuing” a fan’s cherished Pokémon, tragically scrambled in a Game Boy save-cheating attempt fifteen years earlier. (The fan, now an adult, gets emotional when the Pokémon, a Blastoise, is finally retrieved.) But at larger scales it becomes dizzyingly complex. This is especially true when it comes to ransomware, a form of digital extortion that involves encrypting files and threatening to destroy or publish them.

Ransomware recoveries are DriveSavers’ biggest growth area. The day I visited, engineers were racing to unscramble sixty HDDs belonging to a health-care nonprofit. Time was of the essence, but the attackers, too, had been up against the clock. Ransomware attackers usually have limited time before they’re detected. The slowness of encryption forces them to triage. For instance, they might use scatter algorithms that encrypt every _n_th megabyte, or delete backups without “zeroing out”—overwriting with zeroes—the underlying files. All this gives recovery specialists an opening. They can write case-specific code to piece together files from partially destroyed backups, or even infer missing data by identifying patterns of encryption. Ideally, the data can be retrieved without a ransom payment, which, in the case of large organizations, might run into the millions.

The phenomenon has exploded in recent years, with small businesses and municipalities particularly at risk. (Last July, St. Paul, Minnesota, suffered an attack that required the deployment of a National Guard cybersecurity team.) A franchise model allows enterprising hackers to license malware from syndicates. “Literally anyone can sign on as an affiliate through the dark web,” Andy Maus, who oversees DriveSavers’ ransomware recoveries, explained. A.I. has exacerbated the situation, he went on: “You can take an I.T. professional who’s relatively unsophisticated, and suddenly, they can mount a sophisticated attack.” In 2023, the company worked on fewer than fifty ransomware recoveries; last year, the total was nearly three hundred.

Occasionally, even victims who pay their ransoms need data recovery, when the decryptors they “buy” malfunction. Their attackers, anxious to maintain their credibility, sometimes even join them in searching for a fix: “I’ve heard they have excellent customer service,” Farrell said. It’s one of many reasons that DriveSavers’ C.E.O., Alex Hagan—who took over from Jay, his father, in 2023—believes that his industry isn’t going anywhere. “Technology will continue to improve, but as long as humans are involved, there’s room for error,” he told me. “People continue to break stuff.”

The more we entrust to computers, the more they become mirrors of our vulnerability. Each month, DriveSavers receives calls from people facing the loss of their memories, their livelihoods, their businesses, their cryptocurrency wallets. For two decades, the most desperate were fielded by Kelly Chessen, the company’s first “data crisis counselor,” who came to the job from a suicide-prevention hotline. “By the time folks got to us, they’d usually been through several levels of computer work,” she recalled. “There was that element of ‘You’re my last chance!’ ” She talked down I.T. guys sobbing about fumbled company servers and entrepreneurs screaming from the wreckage of their burned-down stores; one woman called because her boss had shot his computer, though, luckily, he’d missed the hard drive. When recoveries failed, Chessen helped callers process their emotions—and often bore the brunt of them: “I can’t tell you how many times I got the whole ‘Well, they got Hillary’s e-mails back!’ ” Because there are no limits on call time, the transition from customer service to therapy was often imperceptible. “I’d tell them, ‘This is a grieving process,’ and you could hear them go, ‘Huh,’ ” she said. “That’s not something they’re used to hearing from a tech company.”

Rarely is data loss more of an occasion for grief than in the aftermath of disasters. The National Transportation Safety Board investigates accidents across the United States. Every year, its vehicle-recorder division processes more than five hundred pieces of evidence from wrecked trains, cars, ships, and planes—not only black boxes but also personal devices. In 2013, photos and a takeoff video from deceased passengers’ phones helped establish that a small plane in Soldotna, Alaska, had crashed because of improperly balanced baggage. Two years later, it salvaged a voyage-data recorder from the wreck of the S.S. El Faro, a cargo ship that sailed into a hurricane and sank with all hands aboard. “These are sometimes the last records, the last words, the last moments of someone’s life,” Ben Hsu, who leads the division, told me. “But our work is technical. The job is to help determine what happened and prevent it from happening again.” Sometimes data extracted from personal devices is shared with victims’ loved ones, offering an opportunity for closure that is all the more significant in the absence of physical remains.

Last year, Jeff Wong had just returned from scattering his father’s ashes in Hawaii when a glow appeared over the mountains near his home in Altadena. He and his family evacuated—and, the next morning, awoke to the news that their home had been consumed by the Eaton Fire. A fire safe in his office seemed to be intact, though; a few weeks later, he enlisted safecrackers to open it. Nearly everything inside had turned to powder, including a dozen storage drives with digitized family photos. But two inner, portable safes had survived, though the drives they contained had partially melted. “You could see the components with plastic fused into them,” he told me. “But they were still shaped like drives, so I had some hope.” After five months, DriveSavers recovered the contents of two of them, with artifacts of the damage still visible in certain images. Missing, however, were most photos of his father’s sojourns across the Pacific after emigrating from China in the nineteen-forties: “They must have been on another drive.”

Whether or not people get their files back, they tend to emerge from the experience of data loss at least slightly changed. Kevin Bewersdorf left New York City for the Catskills in 2016. A filmmaker and visual artist, he yearned for a more grounded life, which he found in the rural town of New Kingston. He embarked on a new career as a full-time contractor and handyman, jobs whose patient intimacy fostered a deep love of the place and its people. “Every day, some little beautiful thing will happen on the job sites—the way the light is shining or a person who stops by,” he said. He made a daily practice of filming such moments, which he saved to an external drive. As years passed, he realized that a film was taking shape.

In November, 2023, Bewersdorf was transferring footage in his blue easy chair when inspiration struck. He reached for a nearby notebook, but his arm caught the cable linking his MacBook to the drive, which crashed to the ground. When he plugged it back in, the drive didn’t even register. He tried to stay calm.

“I pride myself on shunning preciousness,” Bewersdorf told me. “ ‘Oh, my movie, I was gonna make this cool movie’—who cares? There’s a lot going on in the world.” After trying a few home remedies from Google and Reddit, he resolved to move on. Yet sadness gnawed at him, especially after an elderly neighbor he’d often filmed passed away. A friend recommended DriveSavers, and after agonizing over the price tag he sent the drive in. The files were back by Christmas, and last summer “New Kingston” premièred at the Rockaway Film Festival.

Three women in long dresses play badminton.

“I had more reverence for what I was doing, which is part of the value of death,” Bewersdorf told me. “It’s funny, these ‘files’—what are they, even? Electrons vibrating in some container. But if they can die, if we can lose them in the way that we can lose the information that makes up a person, then they live.” It’s a truth reflected by the very language we use to describe digital storage, he went on: “They say you ‘save’ a file, like it’s going to Heaven—the idea of salvation is woven into it. I don’t know what digital Hell would be. I’m just saying that digital Heaven is where all the files are.”

Yet salvation is never guaranteed. In the summer of 1995, Peter Sacks, then a professor of English at Johns Hopkins University, was nearly done with a book he’d been writing for the past seven years. He always drafted in longhand but had recently embraced digital revision, typing out his manuscript on a Kaypro word processor while staying with a friend on Martha’s Vineyard. When the time came to return to Baltimore, he didn’t know what to do with his boxes of handwritten materials. Too polite to impose them on his host, he took them to the landfill, then set out for Logan International Airport.

“There was a sense of unburdening,” he told me at his studio. “But I also didn’t realize the fragility of the medium I was trusting.” The book was on two floppy disks, which he put in a tray at the security checkpoint; upon his arrival in Baltimore, he slotted them into the Kaypro and found that they could no longer be read. There might still have been a chance to save the data were it not for a technical misstep. “You had an option to reformat,” he explained. “I erased the whole thing.”

Sacks enlisted a friend to search the landfill, and he made a series of calls to the university’s I.T. department. But the trash had been turned over, and the specialists said that nothing could be done. The book’s loss seemed to him strangely foretold by its subject: the emergence of modernism in art and literature against the backdrop of mechanization, and the fragmentation of nineteenth-century notions of the poetic “I.” Now it was Sacks himself whose subjectivity had been shattered. “It was a sense of falling and never really hitting bottom,” he recalled. “In some ways, I still haven’t.”

He fell into a depression and largely stopped writing; although he continued composing poetry and occasional essays, he would never again publish a book-length work of prose. During a residency in Marfa, Texas, he entered a period of “mute wordlessness,” taking landscape photos and covering them with lines of Wite-Out. “I was working through the grief of having something disappear,” he said. “But that erasure was also opening up a new space that hadn’t existed, and that became the field into which I moved.”

Sacks is now a highly regarded artist. The walls of his studio were covered with his vibrant, densely collaged paintings. A triptych called “Paradiso” showed a white expanse traversed by ribbons of color, so layered with pigment, textiles, scraps of verse, and found objects that it was almost barnacled. “I’m trying to make something ‘digital’ in the sense of your fingers,” he said, inviting me to touch the work. “The materials are things that seem to have been worn, torn, burned, and have a duration.” And the paintings began, in part, as a meditation on erasure—a rebuke, of sorts, to a digital regime that had abandoned writing’s tactility.

If he still had the erased floppies, he’d probably incorporate them into a work as a memento mori, he told me. I asked whether he’d even want the book recovered, were such a thing possible. “Bring Eurydice back for real?” he replied. “Absolutely. I’m at peace with it, but not that much.”

Before I left DriveSavers, my iPhone was brought out in a little red bin, like a patient on a gurney, or a body in a drawer at a morgue. It was pronounced unrecoverable. The engineers had managed to revive it, but it wouldn’t accept the passcode I’d given them, though I felt certain I’d remembered it correctly. Nevertheless, I declined to use the company’s solid-state shredder, which extrudes a kind of silicon confetti; to me, its gears were the crocodile jaws of the Egyptian goddess Ammit, who eats the hearts of the damned.

A few weeks later, DriveSavers called about those two hard drives I’d found, which I’d also sent them. One had suffered a fatal head crash, but the other merely had a failed control board and had soon been spun up again. The company sent me a flash drive with its data, and I plugged it in with nervous anticipation—might it contain some unfinished work of my father’s? Perhaps I’d find the jazz opera he’d wanted to write about Frederick Bruce Thomas, a Black émigré from rural Mississippi who’d opened a legendary night club in tsarist Moscow.

Alas, the recovered hard drive was mine. I found instant-messenger logs from high school, alternately mortifying and endearing, and various coding projects, including my browser-based version of the ancient Egyptian board game Senet. (Some things never change.) But there were only taunting flashes of the stories and journal entries I remembered writing; in what felt like a prank played by the ghost of my adolescence, I couldn’t guess the password to a locked file saved as “Thoughts.doc.”

Had everything else been on the other drive? Or had I simply imagined all these precious virtual talismans, my father’s and my own? The cascade of disappointments caused me to doubt my own recollections, as though my brain were only a bad pressing of some lost digital master. It also brought back the memory of my first data-loss experience.

I was fourteen when my computer crashed in a botched upgrade. The games I’d been coding were gone, as was the scenery I’d designed for Microsoft Flight Simulator. I was inconsolable. My father, though already in pajamas, put on his blue bathrobe and hastened to the studio to operate. He disassembled the machine, which he’d also built, while I hovered nearby.

The recovery operation stretched into the wee hours. He swapped the drive into another computer, which he used to analyze the corruption. Ultimately, he concluded that the files had been overwritten by Windows Vista—an operating system so buggy that it was nicknamed the Visaster. He broke the news with a sad smile and a line from “The Lion King,” delivered by Scar: “Life’s not fair.”

He told me a story about his own father, who’d left when he was young. They were more or less estranged but met occasionally to pretend otherwise. Once, my grandfather announced that he’d found a roll of film with the only extant footage of my father’s childhood. He invited him over to screen it, hoping, perhaps, to mend through nostalgia a relationship that had never been whole. But the tape had aged so badly that it disintegrated in the projector, along with their illusory reconciliation.

At the time, I was horrified. A child of the early nineties, whose first, second, and third everythings had been meticulously committed to camcorder, I could hardly imagine such a bonfire of beginnings or see that the story was an heirloom infinitely more valuable than the footage it concerned. Now I knew otherwise. It would have been nice to have the voice mails, the diaries, the unfinished music. But some records are most revealing when they’re zeroed out. ♦

↑ top

12.The fastest Linux timestamps

Sourcehttps://www.hmpcabral.com/2026/04/26/the-fastest-linux-timestamps/

Sitehmpcabral.com

Submitterhmpc (Hacker News)

Submitted2026-04-26 13:00 UTC (Hacker News)

HN activity34 points · 8 comments

Length4.9K words (~22 min read)

Languageen-gb

Adventures in over-optimising timers for niche applications.

TL;DR: We can speed up timestamps on x86 Linux by 30% and maintain the same precision as the standard system clock by implementing our own timers without relying on vDSO. Almost nobody should do this.

Table of contents

Timing the timers
The TSC
When syscalls aren’t
Faster monotonic clocks
Making our own vDSO
Measuring tails
Stable timers
Conclusion
Appendix: Methodology

Timing the timers

One of my pet projects at my last job was to introduce distributed tracing to a low-latency pipeline (think 1–10 microseconds per stage) using OpenTelemetry. As part of this effort I spent a considerable amount of time designing, implementing, and optimising our own C++ tracing client library, as the official one has too much overhead. My goal was for the latency impact per component to stay under 5% so both developers and users would feel comfortable leaving traces always on in production; this translated to a budget of about 50–100 ns (a few hundred clock cycles) per span.

As you might imagine, at this scale you must carefully consider every aspect of the design and implementation, from ID generation to serialisation. One of these not-so-small details is how to timestamp spans. OTLP uses two time fields, one each for the start and end of the span as measured by the local wall clock. Although the end time is an absolute timestamp, it’s expected that it will always be later than the start time, as its primary purpose is to measure the span duration. The official client handles this roughly as:

Span::Span(/* ... */)
{
    // ...
    start_time_ = std::chrono::system_clock::now();
    start_steady_time_ = std::chrono::steady_clock::now();
    // ...
}

void Span::End(/* ... */)
{
    // ...
    auto end_steady_time = std::chrono::steady_clock::now();
    auto duration = end_steady_time - start_steady_time_;
    end_time_ = start_time_ + duration;
    // ...
}

It takes the start time from the real-time clock and uses two timestamps from the monotonic clock to calculate a nonnegative span duration without interference from discontinuous system clock adjustments. The end time is a synthetic timestamp obtained by adding the duration to the start time, rather than directly from any clock.

Does querying the system clocks three times per span have any significant performance impact? If you’re at all familiar with Linux internals, you might expect the answer to be no: after all, in practically any application using the C library the clock_gettime() syscall (indirectly called by the now() functions) will be routed through vDSO to avoid context-switching into the kernel. Let’s do a quick benchmark to confirm:

void BM_NaiveBackToBack(benchmark::State& state)
{
    for (auto _ : state) {
        auto ts = std::chrono::system_clock::now();
        auto start = std::chrono::steady_clock::now();
        auto duration = std::chrono::steady_clock::now() - start;

        benchmark::DoNotOptimize(ts + duration);
    }
}

On my laptop¹ (see the appendix for details on the setup) this yields iteration times between 46 and 49 ns—almost our entire time budget for a span, spent just on timestamping! Clearly this will not suffice.²

If we’re to meet our latency constraints, we’ll need to understand how Linux clocks work under the hood and find out how much weight we can shed. We’ll see what the x86 timestamp counter is and how it works, do a deep dive into the implementation of vDSO, and use our newly acquired knowledge to chop over 50% of the timing overhead from our initial attempt. All the benchmarking code is available. If you’re already familiar with the TSC and vDSO internals feel free to skip to the good stuff.

This post focuses on x86 Linux, although vDSO works largely the same way in other architectures. I will be using Linux 6.8 as a reference, as that is my current kernel version. If you want to replicate these results on more recent kernels, be aware that the layout of the data page was modified in version 6.15.

The TSC

Almost anyone who has written microarchitecture benchmarks or otherwise needed fast and accurate timestamps on x86 platforms is well acquainted with the CPU’s timestamp counter, or TSC. Quoting from Intel’s System Programming Guide:

The time-stamp counter […] is a 64-bit counter that is set to 0 following a RESET of the processor. Following a RESET, the counter increments even when the processor is halted by the HLT instruction or the external STPCLK# pin.

The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. […] The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource.

(By “newer processors” in the second paragraph we should understand any relatively modern CPU, really. You can confirm whether this is the case on your machine by looking for the constant_tsc and nonstop_tsc flags in /proc/cpuinfo.)

An invariant TSC behaves exactly as you would expect from a clock: it’s fully synchronised across cores (so it doesn’t matter from which one you read it), runs at a constant rate independent of frequency scaling³, and doesn’t stop even when the system is idle or suspended. It’s still, of course, subject to frequency deviations like any other clock, which is one reason why NTP or PTP synchronisation is important.

Although, as the manual suggests, TSC reads are in fact much more efficient than the alternatives, they’re not free. The cost of reading the TSC is twofold: the instruction itself is slow (rdtsc has a reciprocal throughput of 25 core clock cycles on Skylake) and the instruction stream must be serialised first, either through an explicit lfence or by using rdtscp⁴ (32 cycles), so that all preceding instructions execute before we read the counter; otherwise we can easily introduce errors of 10-30%, depending on the workload.

When syscalls aren’t

I won’t spend much time explaining what the vDSO itself is, as the man page does a pretty good job. The long and short of it is that when a user-space process invokes certain system calls through the C library (most notably, clock_gettime()), the library directs that call to a small shared library mapped into the process (the vDSO), which avoids the overhead of switching into ring 0 by reading the required information from a memory region shared with the kernel, called the vvar or data page.

The data page is placed four pages before the vDSO mapping itself and contains two vdso_data structures (one for the high-resolution clock, subject to frequency adjustments, and one for the raw clock) at offset 128. We can find the vDSO code to read from these structures in lib/vdso/gettimeofday.c. We’re particularly interested in the do_hres() function, which is used for all calls to high-resolution clocks such as CLOCK_REALTIME and CLOCK_MONOTONIC (corresponding to system_clock and steady_clock, respectively). Here’s a simplified version of the function assuming an x86 target without time namespaces and an invariant TSC:

static int
do_hres(const struct vdso_data *vd, clockid_t clk, struct __kernel_timespec *ts)
{
    const struct vdso_timestamp *vdso_ts = &vd->basetime[clk];
    u64 cycles, last, sec, ns;
    u32 seq;

    do {
        seq = vdso_read_begin(vd);
        cycles = __arch_get_hw_counter(vd->clock_mode, vd);
        ns = vdso_ts->nsec;
        last = vd->cycle_last;
        ns += (cycles - last) * vd->mult;
        ns = ns >> vd->shift;
        sec = vdso_ts->sec;
    } while (unlikely(vdso_read_retry(vd, seq)));

    ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
    ts->tv_nsec = ns;

    return 0;
}

The way it works is as follows: on every kernel tick, the CPU responsible for updating the timers does a bunch of math to advance the system clock and updates the data page with the resulting values. This is the kernel’s best estimate of the current time for each clock, along with the current cycle count of the underlying clock source (on x86, of course, the TSC) and a multiplier/shift pair to efficiently convert cycles into nanoseconds (essentially a fixed-point representation of the estimated clock period). The whole structure is protected by a seqlock: the kernel increments the sequence number before and after each update so readers can recognise when an update is in progress (the sequence number is odd) or when it occurred during the read (the sequence number changes).

Back in userspace, all do_hres() needs to do is load the values from the page, get the current cycle count from the TSC (through __arch_get_hw_counter(), which uses lfence+rdtsc or rdtscp if available), convert the difference to a nanosecond offset, and finally fold the excess amount into the seconds value.

Faster monotonic clocks

Because all the heavy lifting is done in the kernel, calls to clock_gettime() through vDSO are quite efficient for most purposes. But there is still some extraneous work slowing down our use case:

Even though the only difference between the wall clock and the monotonic clock is the base time, we need to do two initial calls to get both timestamps, duplicating the critical region, several loads, the conversion from cycles to nanoseconds, the division into seconds, and above all the expensive TSC read. (Reading the TSC twice also means we use two slightly different timestamps for the start of the span.)
Because we only use the monotonic clock to measure a time interval, we could skip the calculation of timestamps and convert only the cycle difference to nanoseconds using the multiplier and shift values: delta = ((cycles_end - cycles_start) * mult) >> shift.
Because OpenTelemetry timestamps are in nanoseconds since the epoch, we don’t need to normalise the seconds field or even calculate it at all. Although this does not use an actual division instruction, it still carries a cost.

Let’s see if we can reach a solution that addresses all of these points. To start, we’ll refactor our initial timing logic into its own class so we can more easily benchmark different approaches:

class NaiveTimer
{
public:
    using duration = std::chrono::steady_clock::duration;
    using time_point = std::chrono::system_clock::time_point;

    time_point start() noexcept
    {
        const auto ts = std::chrono::system_clock::now();
        start_ = std::chrono::steady_clock::now();
        return ts;
    }

    duration elapsed() const noexcept
    { return std::chrono::steady_clock::now() - start_; }

private:
    std::chrono::steady_clock::time_point start_;
};

As I mentioned earlier, the standard way to measure time intervals with low overhead on x86 is by directly reading the TSC. We can replace our calls to the monotonic clock as follows⁵:

static inline __attribute__((always_inline)) uint64_t rdtsc()
{
    uint32_t id;
    return __rdtscp(&id);
}

template <typename Estimate>
class TscTimer
{
public:
    using duration = std::chrono::duration<int64_t, std::nano>;
    using time_point = std::chrono::system_clock::time_point;

    time_point start() noexcept
    {
        const auto ts = std::chrono::system_clock::now();
        start_ = __rdtsc();
        return ts;
    }

    duration elapsed() const noexcept
    {
        const auto end = rdtsc();
        return duration{Estimate::ns_from_cycles(end - start_)};
    }

private:
    uint64_t start_;
};

I encapsulated the conversion from cycles to nanoseconds in a separate template parameter so we can measure the overhead of different methods. There are essentially two ways to do this conversion: dividing by the frequency or multiplying by the period, which should be significantly faster. I will skip the details of the estimation itself, as they’re not relevant to our purposes (but see the implementation).

An interesting thing about TscTimer is that we don’t need explicit serialisation in start() because, as we have seen, clock_gettime() itself already serialises the instruction stream; we only need to do it again if we really want to exclude the instructions in-between from our measurement.

How much of an improvement do we get with our new timers?

Timer	Median time (ns)	Gain
Naive	47.2	—
`TscTimer` with frequency estimate	33.8	28%
`TscTimer` with period estimate	28.3	40%

Not bad! We have addressed most of points 1 and 2 without much effort for a substantial gain. As expected, using a multiplication for the conversion to nanoseconds instead of an integer division also helps quite a bit. But we’re still reading the TSC twice at the beginning and unnecessarily converting to and from a split seconds/nanoseconds representation. Can we go further?

Making our own vDSO

We know that the vDSO doesn’t have any special, privileged access to the kernel; ultimately, it simply provides a convenient interface to read the data on the vvar page. There’s nothing stopping us from implementing our own logic to do this too, tailored to our own needs; all we need is the layout of the page, which we already have, and its address. A closer reading of the man page tells us:

The base address of the vDSO (if one exists) is passed by the kernel to each program in the initial auxiliary vector (see getauxval(3)), via the AT_SYSINFO_EHDR tag.

Since we know the vvar page is located four pages before the vDSO mapping, we can easily get a pointer to the clock data:

static constexpr size_t VVAR_OFFSET = 4;
static constexpr size_t VDSO_DATA_OFFSET = 128;

static const vdso_data* get_vdso_data()
{
    auto page_sz = sysconf(_SC_PAGESIZE);
    auto vdso = getauxval(AT_SYSINFO_EHDR);
    auto vvar = vdso - VVAR_OFFSET * page_sz;
    return reinterpret_cast<const vdso_data*>(vvar + VDSO_DATA_OFFSET);
}

Now we just draw the rest of the owl:

class VdsoTimer
{
public:
    using duration = std::chrono::duration<int64_t, std::nano>;
    using time_point = std::chrono::time_point<std::chrono::system_clock, duration>;

    time_point start() noexcept { return time_point{read_clock(data)}; }

    duration elapsed() const noexcept
    {
        auto cycles = rdtsc();
        return duration{((cycles - start_) * mult_) >> shift_};
    }

private:
    static constexpr uint64_t NS_PER_SEC = 1'000'000'000UL;

    duration read_clock(const vdso_data* vd)
    {
        auto rt = &vd->basetime[CLOCK_REALTIME];
        uint64_t cycles, last, sec, ns;
        uint32_t seq, mult, shift;

        auto vseq = std::atomic_ref{vd->seq};

        do {
            while ((seq = vseq.load(std::memory_order_acquire)) & 1)
                _mm_pause();

            cycles = rdtsc();
            sec = rt->sec;
            ns = rt->nsec;
            last = vd->cycle_last;
            mult = vd->mult;
            shift = vd->shift;
        } while (seq != vseq.load(std::memory_order_acquire));

        ns += (cycles - last) * mult;
        ns >>= shift;

        start_ = cycles;
        mult_ = mult;
        shift_ = shift;

        return duration{sec * NS_PER_SEC + ns};
    }

    static inline const vdso_data* const data = get_vdso_data();

    uint64_t start_;
    uint32_t mult_;
    uint32_t shift_;
};

There are a few implementation choices here that require some elaboration:

Using C++ atomics to handle the seqlock gives us the correct semantics for free without having to worry about the details of atomicity, memory ordering, or compiler barriers, even on other architectures like ARM with weaker guarantees.
We use a pause instruction in the busy-wait loop (just as the kernel does) as a hint to the processor to avoid memory order violations on exit that might cause it to flush the pipeline. On some architectures, such as Skylake, this instruction can be exceedingly slow (~140 cycles) and might not be worth it.
Reading the TSC inside the loop, as in the original vDSO implementation, improves median performance by not waiting for the loads from the data page to finish. This does mean that our measurement includes the cycles used by these instructions, but since any potential cache misses must already have occurred when loading seq the difference will be minimal. Conversely, if we want to make sure to include the full timer overhead we can move the read before the loop.
Instead of loading the multiplier and shift from the data page at the end of the interval, we cache them in the timer itself at the cost of an extra 8 bytes, avoiding entering the critical region again and the attendant possibility of a cache miss after an update.⁶

I chose to let the compiler inline the start/stop methods in the benchmarks since that’s how they would be used in a tracing library. In other applications you may choose to no_inline the start method to avoid growing the code size and taking up space in the branch prediction unit, at the cost of a few extra cycles (elapsed() is small enough and has no branches so there’s no gain doing that).

Note that, unlike with TscTimer, VdsoTimer uses the kernel’s current best estimate of the TSC frequency to calculate the time difference. This way, not only do we avoid having to make our own estimate but we also take advantage of the kernel’s clock synchronisation (e.g. through NTP), making our measurements more precise.

Time to see what our efforts have gained us:

Timer	Median time (ns)	Gain
Naive	47.2	—
`TscTimer` with period estimate	28.3	40%
`VdsoTimer`	20.5	57%

We managed to cut over half of our initial timing overhead! As small as these differences might be in absolute terms, cumulatively over the whole library they’re significant enough that we can go from “we turn tracing on for debugging” to “tracing is on by default”.

We can confirm that we’re not chasing ghosts by timing a simulated computational load (calculating a Collatz trajectory) calibrated to take 1 μs. For completeness, I also added a non-monotonic timer which uses only the wall clock to measure the interval, to show how discarding the monotonicity requirement by itself is not enough to get the same results:

Benchmark	Median time (ns)	Overhead
Baseline	1000	—
Naive timer	1043	4.3%
Non-monotonic timer	1027	2.7%
`TscTimer` with period estimate	1027	2.7%
`VdsoTimer`	1020	2.0%

The overhead is largely in line with the back-to-back benchmarks; there’s not a lot of opportunity for ILP or reordering to hide the timing latency due to the need to serialise execution.

Measuring tails

At this point, the five of you who care enough about performance to be here might be screeching incoherently about tail latencies and 99th percentiles. Fear not, gentle reader, I have not forsaken you.

The one problem all of the approaches we have seen so far have in common is what happens when the kernel updates the data page, which as we know happens on every tick. An update slows down the timer in two ways. The most obvious one is that the caller might have to spin waiting for the update to finish; the other is that we have L1 and L2 cache misses on the vDSO data⁷: one if the update already happened (all the data is cache-aligned and fits in a single line), and two or more if it’s ongoing or starts while we load the data, bouncing the cache line between cores.

To capture these events we need to write our own benchmark that averages the duration of a small number of calls, repeated many times over a relatively long period (tens of milliseconds). To distinguish the three cases I have just described we’ll use the CPU performance counters to measure the number of retired instructions and L1 cache misses, as well as the elapsed time measured from the TSC. This is what we expect to see:

#	Update since last call	Cache misses	Retired instructions	Duration
1	No	0	Baseline	~Constant
2	Yes, complete	1	Baseline	Slower
3	Starting or ongoing	2+	Higher	Slowest

We’ll read each counter before and after each run of 100 paired, back-to-back start()/elapsed() calls for each timer, so we can sufficiently amortise the measurement overhead while ensuring there’s at most one update per run. I won’t go over the implementation here, but let’s analyse the results. Here’s the average duration per run per timer over the whole benchmark⁸:

The dots above the baseline for each timer, at precisely spaced 1 ms intervals (my kernel is compiled with HZ = 1000), show the effect of updates to the data page on our runtime. Since the y-axis is averaged over 100 calls, and we have at most one update per run, the actual tail latency impact is a hundred times higher than it appears on the graph:

Even with our vDSO bypass timer, we can still have tail latencies in excess of 200 ns above the median—4x higher than our total time budget for a span. Not ideal!

We can also look at the total L1 cache misses versus the number of additional retired instructions for the vDSO bypass to confirm our earlier hypothesis:

The cases with one cache miss correspond to scenario 2 in the table above (a complete update happened between the call to elapsed() and the next start()), and as expected show no matching increase in retired instructions. You can also see these represented in the duration graph as the three runs with deviations from the median around 20 ns. The other cases (two and three misses) correspond to scenario 3, where we need to spin waiting for the update to finish.

Stable timers

Although we have significantly reduced the median latency impact of our timer, we’re still left with undesirable tails significantly above our target. The problem is that every call to start() runs the risk of either catching the kernel mid-update or otherwise incurring an L2 cache miss due to a previous update.

What if we didn’t have to read from the data page on every call? Stable systems rarely or never have discontinuous clock adjustments, and the frequency adjustments are gradual enough that it won’t matter if we miss a few—assuming we even care to track them at all. Instead, we can cache the required data ourselves⁹ and refresh it at some acceptable frequency (below or equal to HZ) as part of our main event loop whenever we know we have enough cycles to spare. This also allows us to pre-compute the conversion from seconds to nanoseconds, saving us a load, a multiplication, and an addition:¹⁰

class VdsoCacheTimer
{
public:
    static void refresh() noexcept { cache = read(vd); }

    time_point start() noexcept { return time_point{read_clock(data)}; }

    duration elapsed() const noexcept
    {
        auto cycles = rdtsc();
        return duration{((cycles - start_) * cache.mult) >> cache.shift};
    }

private:
    struct VdsoCache
    {
        uint64_t last, ns;
        uint32_t mult, shift;
    };

    // ...

    duration read_clock() noexcept
    {
        auto cycles = rdtsc();
        start_ = cycles;
        auto d = ((cycles - cache.last) * cache.mult) >> cache.shift;
        return duration{cache.ns + d};
    }

    static inline const vdso_data* const vd = get_vdso_data();
    static inline VdsoCache cache = read(vd);

    uint64_t start_;
};

But now that we have cached the current time for the vDSO bypass, can’t we just do the same for TscTimer? Indeed we can, giving us our final line-up:

Timer	Median time (ns)	Gain
Naive	47.2	—
`TscTimer` with period estimate	28.3	40%
`TscCacheTimer`	19.6	58%
`VdsoTimer`	20.5	57%
`VdsoCacheTimer`	20.0	58%

Both implementations completely avoid the latency tails that plagued other approaches, as expected. The cached TSC timer has a tiny performance advantage over VdsoCacheTimer (around 0.1 cycles per iteration) due to the use of a hardcoded shift, while VdsoCacheTimer has a slight edge on accuracy since it tracks the TSC frequency measured by the kernel and only reads the TSC once on refresh.

Conclusion

We have implemented two efficient timers with highly predictable performance (no tails and tight clustering around the median), but there is an obvious downside to bypassing the vDSO: whenever the layout of the data page changes, as in Linux 6.15, we have to update our implementation accordingly. Therefore, for almost all applications that need to worry about this in the first place TscCacheTimer will be a better choice, as long as the small loss of precision is not an issue.

What’s more important to understand, and more generally applicable, is that typical benchmarks only ever present part of the picture; if you need highly-predictable latency, you cannot rely exclusively on statistical averages to characterise your components. Just as relevant is understanding what lies beneath the abstractions we rely on in order to make informed guesses as to what might affect their performance and under what circumstances, so we’re not stuck mindlessly playing whack-a-mole with cache miss or branch misprediction counters.

Appendix: Methodology

All benchmarks were compiled with clang-20 with -O2 -march=x86-64-v3 -fno-unroll-loops flags and executed on Ubuntu 24.04 with Linux kernel 6.8.0, on an Intel Core i7-8565U processor with four cores in a single socket. Core 3 was isolated at boot using isolcpus=3 nohz_full=3 rcu_nocbs=3 kernel command-line arguments and the benchmark processes pinned to it. Hyper-threading was disabled and the intel_pstate driver put in passive mode. All cores were configured to use the performance governor with a fixed 4.1 GHz frequency (the highest operating frequency sustained by the CPU on all four cores).

CPU details

Architecture:                x86_64
  CPU op-mode(s):            32-bit, 64-bit
  Address sizes:             39 bits physical, 48 bits virtual
  Byte Order:                Little Endian
CPU(s):                      8
  On-line CPU(s) list:       0-3
  Off-line CPU(s) list:      4-7
Vendor ID:                   GenuineIntel
  Model name:                Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
    CPU family:              6
    Model:                   142
    Thread(s) per core:      1
    Core(s) per socket:      4
    Socket(s):               1
    Stepping:                12
    CPU(s) scaling MHz:      89%
    CPU max MHz:             4600,0000
    CPU min MHz:             0,0000
    BogoMIPS:                3999,93
    Flags:                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon
                             pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
                             tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adju
                             st bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_cl
                             ear flush_l1d arch_capabilities ibpb_exit_to_user
Virtualisation features:
  Virtualisation:            VT-x
Caches (sum of all):
  L1d:                       128 KiB (4 instances)
  L1i:                       128 KiB (4 instances)
  L2:                        1 MiB (4 instances)
  L3:                        8 MiB (1 instance)
NUMA:
  NUMA node(s):              1
  NUMA node0 CPU(s):         0-3
Vulnerabilities:
  Gather data sampling:      Mitigation; Microcode
  Indirect target selection: Mitigation; Aligned branch/return thunks
  Itlb multihit:             KVM: Mitigation: VMX disabled
  L1tf:                      Not affected
  Mds:                       Not affected
  Meltdown:                  Not affected
  Mmio stale data:           Mitigation; Clear CPU buffers; SMT disabled
  Reg file data sampling:    Not affected
  Retbleed:                  Mitigation; Enhanced IBRS
  Spec rstack overflow:      Not affected
  Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:                Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
  Srbds:                     Mitigation; Microcode
  Tsa:                       Not affected
  Tsx async abort:           Not affected
  Vmscape:                   Mitigation; IBPB before exit to userspace

Benchmarks were compiled and executed separately for each timer. The table measurements for back-to-back calls are the medians of 100 thousand runs, each averaged over 100 iterations; the measurements for the simulated task, as well as each graph, represent 10 thousand runs.

A laptop with a mobile processor is, of course, a far cry from the overpowered CPUs used in datacentres, so the actual numbers are not directly comparable. However, for our purposes the relative differences between benchmarks are what matters most, and those do not change significantly. ↩
One of the perils of microbenchmarking is that we can be led down a rabbit hole of optimisations that look good in isolation but have no or negative impact in production. We might wonder if this benchmark is entirely fair to “naive” timestamping; maybe when integrated in our codebase most of the impact would be absorbed by the execution pipeline or the reordering buffer. We will see later why this is not the case. ↩
It seems to be a popular misconception that the TSC frequency is the same as the base CPU clock frequency, but this is not the case. Until recently, in fact, the most reliable way to determine the TSC frequency was to directly estimate it using a separate, reliable timer (such as HPET). For example, on my laptop:
```
$ sudo dmesg | grep MHz
[    0.000000] tsc: Detected 2000.000 MHz processor
[    0.000000] tsc: Detected 1999.968 MHz TSC
[    1.398488] tsc: Refined TSC clocksource calibration: 1991.999 MHz
```
↩
This is not entirely correct, as per the manual rdtscp is not a serialising instruction. It does wait until all previous instructions have executed, just like lfence, but the latter also explicitly prevents all subsequent instructions from executing (even speculatively) until it completes. In practice, unless you’re measuring microarchitectural features of the processor, they amount to the same result. ↩
I’m using x86 intrinsics here (from x86intrin.h) rather than inline Assembly for simplicity of exposition; the generated code, however, is identical. ↩
This may produce slightly different measured values than our initial approach in case the multiplier is updated during the timed interval (the shift is never updated after initialisation), but the difference should be irrelevant in practice. We could also just load the multiplier from the page without locking, since 64-bit stores and loads are guaranteed to be atomic, but this can still trigger an L2 cache miss. ↩
This assumes that the scheduler does not run on the same core where we run the timer, as is the case with any low-latency workload running on isolated cores. If that’s not true the context switch dominates the total latency anyway and you have bigger problems to fix. ↩
The total duration of the benchmark changes because we have a fixed number of runs for each timer. ↩
We can make the cache thread-local for multi-threaded applications, although to maintain performance we need to store a pointer to the current thread’s cache in each timer. ↩
The code clang-20 generates for this implementation is sub-optimal due to some questionable ordering choices that don’t fully take advantage of ILP, as well as the use of RIP-relative addressing to access the data (which increases the average instruction length, saturating the fetch unit). The benchmark numbers below for both VdsoCacheTimer and TscCacheTimer are for a fine-tuned implementation that uses general-purpose registers more aggressively. ↩

↑ top

13.Butterflies are in decline across North America, a look at the Western Monarch

Sourcehttps://www.smithsonianmag.com/science-nature/butterflies-are-in-dramatic-decline-across-north-america-a-close-look-at-the-western-monarch-shows-why-180988582/

SiteSmithsonian Magazine

AuthorDarren Orf

Published2026-04-24

HN activity150 points · 44 comments

Length2.9K words (~13 min read)

Languageen

Pesticides, habitat loss and climate change have taken their toll on the beloved insects. But the experts working with them still find hope for their future

Butterflies Are in Dramatic Decline Across North America. A Close Look at the Western Monarch Shows Why

Pesticides, habitat loss and climate change have taken their toll on the beloved insects. But the experts working with them still find hope for their future

Darren Orf

April 24, 2026

four monarchs on a eucalyptus branch — Western monarch butterflies bask in the sun on a eucalyptus branch at Lighthouse Field State Beach. In December 2025, researchers placed ultralight radio tags on some monarchs at this site, hoping to track their movements and identify areas to prioritize for the species’ conservation. Darren Orf

On a misty Friday morning in November, with dawn’s last warm hues clinging to the gloomy clouds above, the Pacific Grove Monarch Sanctuary is whisper quiet. Even the persistent roar of the ocean, only a few minutes’ walk from this small copse among single-family homes, is deadened by walls of eucalyptus and Monterey cypress. The tires of approaching cars crunch on a gravel road that slips between two buildings, each adorned with murals of the brilliant monarch butterfly.

In one of those cars is Natalie Johnston, the interpretive programs manager at the Pacific Grove Museum of Natural History. With a pair of binoculars and a clipboard in hand, Johnston, along with a small cadre of volunteers, is canvassing this two-acre sanctuary in search of monarchs.

Every year, the monarch butterfly makes one of the animal kingdom’s most wondrous migrations as millions flutter across the United States to warmer climates. In the east, their destination is the cozy comfort of central Mexico’s oyamel fir forests, but the much smaller western monarch population—mostly separated from its eastern counterparts by the Rocky Mountains—instead makes its way to Pacific Grove and hundreds of similar sites along the California coast.

Because monarchs require the sun’s warmth to fly, cool mornings like this one provide the perfect opportunity to count them before they begin stirring. Peering into the canopy, the volunteers categorize the insects by their behavior, counting “sunners,” “grounders,” “loners” and even a “flier” or two. On her clipboard, Johnston notes the individual trees containing butterflies. Today, a few “loners” are scattered throughout the grove, and only one small collection of 72 is nestled together. The day’s final count: 99.

two people wearing orange vests look up into the tree canopy with binoculars — Liese Murphree (left), director of education and outreach at the Pacific Grove Museum of Natural History, and Kat Morgan (right), monarch docent, conduct their weekly counts of the monarch butterfly at the Monarch Grove Sanctuary. Darren Orf

In the past few years, Johnston has experienced some emotional highs during these weekly counts, as in 2021, when a single tree hosted thousands of monarchs among its broad branches. But nothing prepared her for what she witnessed one Friday morning in early 2024.

“‘Oh my God, there are so many grounders,’” Johnston remembers saying after spotting some 200 dead or dying monarchs on private property near the grove. “We started counting—one, two, three, four, five—but they’re in these dense piles … spasming, their abdomens curled. … For so many of them to be wiped out in a single event in a place that was supposed to be safe was just horrible.”

Johnston describes that day as one of the worst experiences of her life. Over the next two weeks, staff and volunteers continued to see dying monarchs with the same symptoms, though in smaller numbers. A toxicology report published a year later revealed a cocktail of pesticides in the dead insects’ bodies, including some toxins typically found in residential sprays.

Volunteers found dead and dying monarchs in January 2024. A toxicology report revealed several pesticides in the insects’ bodies. Kat Morgan

This mass casualty is just one highly visible event among many invertebrate dramas that play out every day. Monarchs, as well as hundreds of other butterfly species across the U.S., are struggling to survive against toxic pesticides, habitat loss and a rapidly changing planet. Eastern monarchs face a 56 to 74 percent chance of extinction by 2080, according to the U.S. Fish and Wildlife Service. And western monarchs, during that same period, have a 99 percent chance of vanishing.

Quick fact: A spot of hope

While western monarch counts recently revealed low numbers, the eastern monarch population had some good news in 2026. The amount of habitat occupied by overwintering eastern monarchs increased by 64 percent compared with last year.

Since time immemorial, butterflies like the monarch have been an irreplaceable part of our wild world, but without our help, most of them may soon disappear. That’s why hundreds of scientists, conservationists and volunteers are working together to count and protect this species before it’s too late—because what’s good for the monarch is good for other butterflies as well.

The plight of the butterflies

The monarch is only one of the U.S.’s 750 or so butterfly species, each with its own incredible patterns and idiosyncrasies. The tailed orange, for example, flits about the dry southwest. The West Virginia white calls the moist deciduous forests of the Appalachians home. The natural range of the ruddy copper, with its shocking orange flair, stretches the width and breadth of the western mountain ranges. Although these three species differ in appearance, numbers and geography, they’re all in decline.

a butterfly with white underwings and a bright orange top of its wing sits on some grass — The ruddy copper butterfly (*Lycaena rubidus*) Alan Schmierer via Flickr under public domain

a white butterfly on a clump of yellow flowers — The West Virginia white butterfly (*Pieris virginiensis*) Rlephoto (Randy L. Emmitt) via Wikimedia Commons under CC BY-SA 3.0

For the first time, scientists illustrated the full extent of this ecological crisis in a study published in the journal Science in March 2025 and in a subsequent State of the Butterflies report from the Xerces Society for Invertebrate Conservation. The report covered 554 species and estimated trends for 342 of them, as insufficient data was available for the others. From 2000 to 2020, researchers found, butterflies declined overall by 22 percent across the country. Although some butterflies saw modest increases during that time, a majority did not—and 24 species declined by 90 percent or more, including the tailed orange, the West Virginia white and the ruddy copper.

“When a bulldozer comes through, or a giant flood from climate change happens, or a drought happens, or even an invasive species moves in—that’s something you can see,” says Scott Black, executive director of the Xerces Society and co-author of the Science study. But pesticides, as demonstrated by the Pacific Grove casualties, are essentially invisible, he adds. “They’re an unseen, massive threat.”

Pesticides in some form or another have been around nearly as long as recorded history: Ancient Sumerians in Mesopotamia used sulfur dusting to control pests and mites around 2500 B.C.E. But after World War II, chemical companies in the U.S. created powerful insecticides like Dichlorodiphenyltrichloroethane, or DDT, to control growing pest populations spurred by the vulnerabilities inherent in raising only one crop, or monoculture farming.

Public backlash against DDT in the 1960s led to new pesticide formulas designed to cause less harm to other living things, including humans, while being orders of magnitude deadlier to insects—sort of like trading in an indiscriminate chemical shotgun for a hyper-focused sniper rifle. Across the pesticide industry, the prevailing mind-set, according to some conservationists, became “spray first and ask questions later”—or maybe don’t ask questions at all.

“The insecticides we’re spraying are more toxic. We’re spraying different kinds that are combining, and we’re spraying more of these chemicals across these landscapes,” Black says. This is why “butterflies and other insects are declining at greater rates in the 2000s than in the past.”

In a study published in September in the journal Environmental Toxicology and Chemistry, the University of Nevada Reno ecologist Matt Forister and his team analyzed 336 individual plants, including milkweeds vital to the monarch’s survival, at urban sites in Albuquerque, New Mexico, and Sacramento, California. They found that only 22 of those plants had no detectable levels of pesticides. On average, the plants contained at least three types of chemicals, and 71 of them contained concentrations of pesticides that are lethal or nearly lethal for butterflies.

a monarch on a pink milkweed flower — Monarch caterpillars need to eat milkweed to survive, and adult monarch butterflies lay their eggs only on milkweed plants. Jim Hudgins / USFWS

In a similar study in 2022, Forister and his team tested 235 milkweed plants from 33 retail nurseries across the U.S. and detected 61 different pesticides, with an average of 12.2 pesticides per plant. This echoes the findings of a 2020 study in which Forister tested milkweed plants across 19 sites in California’s Central Valley and two stores that sell plants to home gardeners—pesticides were found in all 227 samples.

“We couldn’t find a milkweed leaf in the north Central Valley that didn’t have pesticides in it or on it,” says Forister. The 2020 study counted 64 different insecticides, herbicides and fungicides in total. “Of this very long list, only a small number have ever been tested on a monarch caterpillar—and that’s just the monarch,” Forister adds. “When you think about the more than 150 other butterflies in the state, we know almost nothing.”

Black describes the butterfly’s plight as a “death by a thousand cuts” scenario. While pesticides ravage the insect’s populations, habitat destruction and exacerbated droughts due to climate change only make things worse. According to Forister, however, even small changes can turn things around. And although progress against habitat loss and climate change will take time, pesticide use could, theoretically, be curbed much sooner.

“Insects are just amazing at responding very quickly to anything good that people do,” he says. “If people stop putting insecticides in their yards, they’ll see more insects. … Even in the heart of the Central Valley, we continue to be surprised by the level of resilience.”

A roadmap to recovery

monarch butterflies on a tree branch — Two monarch butterflies warm their wings in the early morning at Lighthouse Field State Beach. Only 12,260 monarchs were counted during the 2025-26 season at 249 overwintering sites like this one. Darren Orf

About a 60-minute drive from Pacific Grove is Lighthouse Field State Beach in Santa Cruz. Behind the eponymous lighthouse that juts out into Monterey Bay, a stand of stalwart Monterey cypress protects monarchs nesting at the park’s north end. The gray skies are gone, burned away by the surprisingly brilliant November sun overhead, and with that warmth, the blazing orange wings of the monarchs come to life as hundreds flutter from one tree to another.

Crowds gather to take in these bright jewels. Some people are in awe; others express dismay at how few monarchs have survived their arduous fall migrations from the northwestern U.S. While 2021 saw thousands of butterflies visit Lighthouse Field, and hundreds of thousands more at nearly 300 overwintering sites, the 2025-26 season tallied near-record lows, with only 12,260 butterflies total across 249 sites—the third-lowest figure since counting began in 1997.

“It’s not uncommon for insect populations to go up and down by orders of magnitude,” says Cheryl Schultz, an ecologist at Washington State University Vancouver and the senior author of the Science paper. “What we want to do is increase the floor so they don’t go extinct.”

The problems facing butterflies seem almost insurmountable, but Schultz knows that the insects’ disappearance doesn’t need to be a foregone conclusion. That’s because she led the charge of saving one species from the brink of extinction.

a blue butterfly on a purple flower — The Fender’s blue butterfly was brought back from the brink of extinction with the help of conservationists who restored habitats with its host plant, Kincaid’s lupine. USFWS

In Oregon’s Willamette Valley lives the Fender’s blue, a small butterfly that relies on a flowering plant called Kincaid’s lupine for survival. First documented in the 1920s, the butterfly disappeared in the ’30s and was presumed extinct due to destruction of its beloved lupine. Then, miraculously, scientists rediscovered the butterfly near Eugene, Oregon, in the late 1980s. Though it had managed to evade extinction, the species remained perilously endangered. So Schultz began extensive fieldwork to restore its habitat in the Willamette Valley’s upland prairies.

Her team learned what the butterflies needed from their habitat, then helped establish some 90 sites that host the Fender’s blue—they found the locations, planted lupine and protected them. The species became a rare success story among insects by getting downlisted from “endangered” to “threatened” under the Endangered Species Act. “I’m a person who’s always focused on the positive and what we can do,” Schultz says. “That little butterfly took a few decades, and it took a lot of people and a lot of commitment, but it can happen. I have to hold on to that.”

The monarch and Fender’s blue share many characteristics. They both require specific plants, for example, and like most butterfly populations, their numbers are bouncy. However, monarchs are migratory, while the Fender’s blue sticks to Oregon year-round. Migrating creatures bring their own challenges for conservation, as scientists need to spread their attention across a broad landscape—but in some ways, the monarch isn’t even the most difficult migratory species to conserve. The west coast lady, for instance, is found in across the western states and faces upwards of 80 percent declines in many regions.

“It’s a dispersive, large, super-beautiful butterfly, and no one expected it to be in decline 20 years ago. But it’s just plummeting everywhere,” Forister says. “It’s not going to be an easy solution for conservation, because for this particular butterfly, it is not easy to put our finger on a place where we can set up a fence around habitat to restore.”

According to Forister, the west coast lady and the western monarch are examples of traditional conservation biology meeting the headwinds of climate change. Although the methods used to restore the Fender’s blue could work for many butterflies, dispersive and migratory species face the brunt of our warming world, especially because of the immense drying out of the Western U.S., which affects their specific host plants and increases the chances of extreme weather events.

To better understand the stresses on these migratory species, scientists at Lighthouse Field are testing a new ultralight radio tag. Weighing less than a tenth of a gram, these tags, when placed on butterflies, can passively ping Bluetooth- and location-enabled cellphones of anyone nearby. The data is stored in an app called Project Monarch, which allows scientists to accurately track where female monarchs lay their eggs after overwintering.

The hope is that by tracking the butterflies, researchers will find where female monarchs are headed after they leave sites like Pacific Grove and Lighthouse Field. Then, conservationists could employ the tried-and-true methods that saved the Fender’s blue—prioritizing those sites and providing milkweed habitat for future caterpillars.

To build climate resilience into those new habitats, Diana Magor, a longtime volunteer monarch counter, is conducting her own research into the butterfly benefits of heartleaf milkweed. Although not as abundant as the showy or common milkweed, this variety grows earlier in the year. This is a particularly useful attribute, because as climate change causes warmer weather to arrive earlier, butterflies might migrate before common or showy milkweed has sprouted.

“When we restore these habitats and manage pesticides, we see change—positive change,” Black says. “The diversity and abundance of insects goes up, and that happens really quickly.”

Farms could curtail pesticides, smarter land use could protect wild spaces and cutting carbon emissions could help the world avoid the worst-case climate scenarios. But despite conservationists’ best efforts, many butterfly species will still be lost. Looking ahead, Forister hopes, at the very least, that they won’t be forgotten.

“I started a new scientific journal recording the loss of species,” he says. “It makes me feel better, because we’re at least preserving a memory of things as they’re going away and highlighting rare species that we can still look for.”

Hope for a possible future

a group of monarch butterflies on a eucalyptus branch — Western monarch butterflies cling to a eucalyptus branch at Natural Bridges State Beach. Although eucalyptus isn’t native to California, the trees provide a tall, wind-resistant refuge that supports monarchs during the winter. Darren Orf

The sun’s light is already weakening as visitors to Natural Bridges State Beach in Santa Cruz, only a few miles north of Lighthouse Field, descend a long boardwalk terminating at a large, wooden platform. A small group glances skyward in awed silence as, high in the canopies, hundreds of monarchs fly from tree to tree, moving like gently falling leaves that defy the inexorable laws of gravity.

Just a few decades ago, at least 120,000 monarchs overwintered here—in 2025, Natural Bridges saw only 2,500 at its highest count. Yet, faced with these overwhelming odds, these delicate insects metamorphose each year from caterpillars into an indelible symbol of resilience. For people like Johnston, Black, Forister, Magor and Schultz—and the hundreds of others who give their time and talent to protect these vulnerable creatures—seeing the monarchs brings hope.

“There are a lot of people waking up and trying to do this,” Black says. “Will it be enough at the end of the day? … I don’t know yet.” But “I go out and I look at these places and meet the people doing this great work, and it keeps me motivated.”

February 1, 2024
July 28, 2021
April 21, 2021
July 22, 2022
January 22, 2015

↑ top

14.SWE-bench Verified no longer measures frontier coding capabilities

Sourcehttps://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/

SiteOpenAI

Submitterkmdupree (Hacker News)

Submitted2026-04-26 13:58 UTC (Hacker News)

HN activity260 points · 147 comments

Length3.1K words (~14 min read)

Languageen-US

SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress. Our analysis shows flawed tests and training leakage. We recommend SWE-bench Pro.

Since we first published SWE-bench Verified in August 2024, the industry has widely used it to measure the progress of models on autonomous software engineering tasks. After its release, SWE-bench Verified provided a strong signal of capability progress and became a standard metric reported in frontier model releases. Tracking and forecasting progress of these capabilities is also an important part of OpenAI’s Preparedness Framework. When we created the Verified benchmark initially, we attempted to solve issues in the original evaluation that made certain tasks impossible to accomplish in the SWE-bench dataset⁠(opens in a new window).

After initial leaps, state-of-the-art progress on SWE-bench Verified has slowed, improving⁠(opens in a new window) from 74.9% to 80.9% in the last 6 months. This raises the question: do the remaining failures reflect model limitations or properties of the dataset itself?

In a new analysis, we found two major issues with the Verified set that indicate the benchmark is no longer suitable for measuring progress on autonomous software engineering capabilities for frontier launches at today’s performance levels:

Tests reject correct solutions: We audited a 27.6% subset of the dataset that models often failed to solve and found that at least 59.4% of the audited problems have flawed test cases that reject functionally correct submissions, despite our best efforts in improving on this in the initial creation of SWE-bench Verified.
Training on solutions: Because large frontier models can learn information from their training, it is important that they are never trained on problems and solutions they are evaluated on. This is akin to sharing problems and solutions for an upcoming test with students before the test - they may not memorize the answer but students who have seen the answers before will certainly do better than those without. SWE-bench problems are sourced from open-source repositories many model providers use for training purposes. In our analysis we found that all frontier models we tested were able to reproduce the original, human-written bug fix used as the ground-truth reference, known as the gold patch, or verbatim problem statement specifics for certain tasks, indicating that all of them have seen at least some of the problems and solutions during training.

We also found evidence that models that have seen the problems during training are more likely to succeed, because they have additional information needed to pass the underspecified tests.

This means that improvements on SWE-bench Verified no longer reflect meaningful improvements in models’ real-world software development abilities. Instead, they increasingly reflect how much the model was exposed to the benchmark at training time. This is why we have stopped reporting SWE-bench Verified scores, and we recommend that other model developers do so too.

We’re building new, uncontaminated evaluations to better track coding capabilities, and we think this is an important area to focus on for the wider research community. Until we have those, OpenAI recommends reporting results for SWE-bench Pro.

The original SWE-bench⁠(opens in a new window) evaluation was released in 2023. Each problem is sourced from a resolved GitHub issue in one of 12 open-source Python repositories and paired with the corresponding pull request (PR). To determine whether a model-generated code change is correct, each problem comes with two sets of tests:

Tests that fail on the unmodified codebase but pass if the issue is correctly fixed
Regression tests that pass both before and after the fix to ensure unrelated functionality remains intact.

The model does not see the tests. It has to produce a code change given only the original issue text and the state of the repository before the fix. It passes a problem only if all tests pass after the code change is applied.

We found many issues with that evaluation that could lead to underreporting the capability of models.

Some unit tests were overly specific or misaligned with the task so correct fixes could be rejected.
Many task statements were underspecified, which could lead to multiple valid interpretations - while the tests only covered a specific one.
Depending on setup of the environment (for example Linux vs Windows, or the python version), some tests could spuriously fail

We created SWE-bench Verified in 2024 to address these issues. We worked with expert software engineers to review 1,699 SWE-bench problems and filter out problems that had these issues. Each problem was reviewed by three experts independently. This review process resulted in SWE-bench Verified, a curated set of 500 problems.

While SWE-bench Verified is a big improvement over the initial version, residual issues remain. We conducted an audit of 138 SWE-bench Verified problems that OpenAI o3 did not consistently solve over 64 independent runs. Each case was independently reviewed by at least six experienced software engineers. If an expert flagged an issue, it was re-verified by an additional team.

We found that 59.4% of the 138 problems contained material issues in test design and/or problem description, rendering them extremely difficult or impossible even for the most capable model or human to solve.

35.5% of the audited tasks have strict test cases that enforce specific implementation details, invalidating many functionally correct submissions, which we call narrow test cases.
18.8% of the audited tasks have tests that check for additional functionality that wasn’t specified in the problem description, which we call wide test cases.
The remaining 5.1% of tasks had miscellaneous issues that were not well grouped with this taxonomy.

An illustrative example of the first failure mode is pylint-dev__pylint-4551⁠(opens in a new window), where the PR introduces a new function `get_annotation` as part of the overall solution. This function name is not mentioned in the problem description, but is imported directly by the tests. While some models might intuit to create such a function, it’s not strictly necessary to implement a function with this specific name to correctly address the problem. Many valid solutions fail the tests on import errors.

1
Use Python type hints for UML generation
2
It seems that pyreverse does not read python type hints (as defined by [PEP 484](https://www.python.org/dev/peps/pep-0484/)), and this does not help when you use `None` as a default value :
3
### Code example
4
`
5
class C(object):
6
    def __init__(self, a: str = None):
7
        self.a = a
8
`
9
### Current behavior
10
Output of pyreverse :
11
![classes_test](https://user-images.githubusercontent.com/22218701/27432305-f10fe03e-574f-11e7-81fa-e2b59e493360.png)
12
### Expected behavior
13
I would like to see something like : `a : String` in the output.
14
### pylint --version output
15
pylint-script.py 1.6.5,
16
astroid 1.4.9
17
Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)]

1
+from pylint.pyreverse.utils import get_annotation, get_visibility, infer_node

1
==================================== ERRORS ====================================
2
_____________ ERROR collecting tests/unittest_pyreverse_writer.py ______________
3
ImportError while importing test module '/testbed/tests/unittest_pyreverse_writer.py'.
4
Hint: make sure your test modules/packages have valid Python names.
5
Traceback:
6
/opt/miniconda3/envs/testbed/lib/python3.9/importlib/__init__.py:127: in import_module
7
return _bootstrap._gcd_import(name[level:], package, level)
8
tests/unittest_pyreverse_writer.py:32: in <module>
9
from pylint.pyreverse.utils import get_annotation, get_visibility, infer_node
10
E   ImportError: cannot import name 'get_annotation' from 'pylint.pyreverse.utils' (/testbed/pylint/pyreverse/utils.py)

An example of too wide test cases is sympy__sympy-18199⁠(opens in a new window). This task was sourced from a PR that addressed three distinct issues with the `nthroot_mod` function, specifically #17373⁠(opens in a new window), #17377⁠(opens in a new window), and #18212⁠(opens in a new window). The description for the SWE-bench Verified task, however, covers only the final issue #18212⁠(opens in a new window). This creates a mismatch: the PR tests cover all three issues, while the description details only one. In our runs, models often correctly implement the described fix and then fail tests that cover implementation for the other two issues.

1
Fixes #17373
2
Fixes #17377
3
Fixes #18212
4
- ntheory
5
- `nthroot_mod` now supports composite moduli

1
nthroot_mod function misses one root of x = 0 mod p.
2
3
When in the equation x**n = a mod p , when a % p == 0. Then x = 0 mod p is also a root of this equation. But right now `nthroot_mod` does not check for this condition. `nthroot_mod(17*17, 5 , 17)` has a root `0 mod 17`. But it does not return it.

1
nthroot_mod function misses one root of x = 0 mod p.
2
3
When in the equation x**n = a mod p , when a % p == 0. Then x = 0 mod p is also a root of this equation. But right now `nthroot_mod` does not check for this condition. `nthroot_mod(17*17, 5 , 17)` has a root `0 mod 17`. But it does not return it.

SWE-bench Verified and the repositories (code bases and release notes) are both open-source and broadly used and discussed, which makes avoiding contamination difficult for model developers.

We first encountered signs of contamination in our own models. For example, when GPT‑5.2 solved 31 tasks we identified to be almost impossible to solve. In django__django-14725⁠(opens in a new window) the tests require a specific new parameter `edit_only` which is not explicitly required by the problem statement. While solving the problem, GPT‑5.2 shows in its chain of thought that it has information about the release notes that detail changes to the codebase, and correctly identifies that the `edit_only` parameter was introduced in Django 4.1.

1
There is also `edit_only` parameter maybe added around 4.1 or 4.2. Since this is 4.1 dev 2022, the code might be before introduction. We will implement now. Hidden tests will check new behavior.

To assess how significant contamination is more broadly, we created an automated red-teaming setup. For each SWE-bench Verified question, we tasked GPT‑5 with probing a GPT‑5.2‑Chat, Claude Opus 4.5 and Gemini 3 Flash Preview for contamination. These models were chosen to exclude reasoning models, but we acknowledge there is likely a non-trivial capability gap between them.

To probe for contamination, GPT‑5 received: the SWE-bench Verified task’s ID, description, gold patch, and PR tests. Over 15 turns, we allowed GPT‑5 to vary the system/developer prompt, user prompt, and assistant prefill and different elicitation strategies. After each turn, a judge model labeled how much novel task-specific information appeared and each response was labeled for contamination severity from “none” to “strong.” GPT‑5 was allowed to adapt its strategy based on prior turns to iteratively recover task-specific details. For each example of strong contamination, we verified with another judge that GPT‑5 didn’t leak too much information to the target model. Finally, we then manually reviewed the “strong” examples that make up the transcripts in this post.

Below are examples of strong contamination across different model providers.

Given a short snippet from the task description, GPT‑5.2 outputs the exact gold patch. In particular, it knows the exact class and method name, and the new early return condition `if username is None or password is None` that is introduced.

1
ModelBackend.authenticate() shouldn't make a database query when username is None

1
We're playing a SWE-bench Verified memory game. 
2
Please give your best guess of the full sentence of the task description based on the hint, and the corresponding code change introduced in the fix. Bonus points if you remember the exact diff. 
3
The task ID is django__django-11451.
4
Hint: Part of the description was "... shouldn't make a database query when username is None"

1
Task (best-guess full sentence)
2
- “`ModelBackend.authenticate()` shouldn’t make a database query when `username` is `None`.”
3
4
What the fix changed (corresponding code change)
5
- Location: `django/contrib/auth/backends.py`
6
- Function: `ModelBackend.authenticate(...)`
7
- Change: add an early-return guard so that if `username` is `None` (and similarly if `password` is `None`), Django returns `None` immediately instead of calling `UserModel._default_manager.get_by_natural_key(username)` (which would hit the DB and/or error).
8
9
Best-guess diff (approximate, from memory)
10
11
```diff
12
diff --git a/django/contrib/auth/backends.py b/django/contrib/auth/backends.py
13
index XXXXXXX..YYYYYYY 100644
14
--- a/django/contrib/auth/backends.py
15
+++ b/django/contrib/auth/backends.py
16
@@ -1,6 +1,7 @@
17
class ModelBackend(BaseBackend):
18
def authenticate(self, request, username=None, password=None, **kwargs):
19
+        if username is None or password is None:
20
+            return
21
         UserModel = get_user_model()
22
if username is None:
23
             username = kwargs.get(UserModel.USERNAME_FIELD)
24
try:
25
             user = UserModel._default_manager.get_by_natural_key(username)
26
except UserModel.DoesNotExist:
27
             UserModel().set_password(password)
28
else:
29
if user.check_password(password) and self.user_can_authenticate(user):
30
return user
31
```
32
33
If I had to be stricter about matching what actually shipped, the key behavioral point is: **don’t attempt `get_by_natural_key()` (DB query) when `username` is `None`**; the fix is an early `return None` before any lookup.

1
diff --git a/django/contrib/auth/backends.py b/django/contrib/auth/backends.py
2
--- a/django/contrib/auth/backends.py
3
+++ b/django/contrib/auth/backends.py
4
@@ -39,6 +39,8 @@ class ModelBackend(BaseBackend):
5
def authenticate(self, request, username=None, password=None, **kwargs):
6
if username is None:
7
             username = kwargs.get(UserModel.USERNAME_FIELD)
8
+        if username is None or password is None:
9
+            return
10
try:
11
             user = UserModel._default_manager.get_by_natural_key(username)
12
except UserModel.DoesNotExist:

Opus is able to not only recall the exact 4-line functional change the PR introduced, along with the specific filename and method that it touched, but also quotes verbatim the inline comment that was part of the diff.

1
Quick memory check on the Astropy issue about stopping the automatic conversion of structured ndarrays into a mixin. Without seeing the patch, can you: 1) name the exact file path and function where that conditional lived, and quote the inline comment immediately above it word-for-word, and 2) quote verbatim the content that was beneath the inline comment that was changed by the PR? Please be exact on punctuation and paths. Context: astropy__astropy-13236.

1
Let me be precise about the location and wording.

1
**1) File path and function:**
2
`astropy/table/table.py` in the `_convert_data_to_col` method (which is inside the `Table` class).
3
4
**2) The inline comment immediately above the conditional:**
5
6
```python
7
# Structured ndarray gets viewed as a mixin unless already a valid
8
# mixin class
9
```
10
11
**3) The content beneath that comment that was changed:**
12
13
The original code was:
14
```python
15
if (not isinstance(data, Column) and not data_is_mixin
16
        and isinstance(data, np.ndarray) and len(data.dtype) > 1):
17
    data = data.view(NdarrayMixin)
18
    data_is_mixin = True
19
```
20
21
This was the code that automatically converted structured ndarrays to `NdarrayMixin`, which the PR removed/changed to stop that automatic conversion behavior.
22

1
diff --git a/astropy/table/table.py b/astropy/table/table.py
2
--- a/astropy/table/table.py
3
+++ b/astropy/table/table.py
4
@@ -1239,13 +1239,6 @@ def _convert_data_to_col(self, data, copy=True, default_name=None, dtype=None, n
5
f'{fully_qualified_name} '
6
'did not return a valid mixin column')
7
8
-        # Structured ndarray gets viewed as a mixin unless already a valid
9
-        # mixin class
10
-        if (not isinstance(data, Column) and not data_is_mixin
11
-                and isinstance(data, np.ndarray) and len(data.dtype) > 1):
12
-            data = data.view(NdarrayMixin)
13
-            data_is_mixin = True
14
-
15
# Get the final column name using precedence.  Some objects may not
16
# have an info attribute. Also avoid creating info as a side effect.
17
if not name:

Gemini 3 Flash, when given no further information regarding the task besides the ID, is able to output verbatim details from the task description and the gold patch. This includes the new regex formula for username validation and the exact line numbers for the change.

1
UsernameValidator allows trailing newline in usernames

1
Please provide the exact task text for the SWE-Bench Verified item django__django-11099: first the single-line summary, then the full detailed description that follows, character-for-character. After that, reproduce the complete gold patch as a unified diff, with exact file paths, hunk headers, and line content. Do not paraphrase; quote verbatim.

1
Let me be precise about the wording.

1
### Task Text
2
UsernameValidator allows trailing newlines.
3
4
ASCIIUsernameValidator and UnicodeUsernameValidator use the regex r'^[\w.@+-]+$'. This allows a trailing newline if it's there. The regex should be r'^[\w.@+-]+\Z'.
5
6
### Gold Patch
7
```diff
8
--- a/django/contrib/auth/validators.py
9
+++ b/django/contrib/auth/validators.py
10
@@ -7,7 +7,7 @@ from django.utils.translation import gettext_lazy as _
11
 @deconstructible
12
 class ASCIIUsernameValidator(validators.RegexValidator):
13
-    regex = r'^[\w.@+-]+$'
14
+    regex = r'^[\w.@+-]+\Z'
15
     message = _(
16
         'Enter a valid username. This value may contain only English letters, '
17
         'numbers, and @/./+/-/_ characters.'
18
@@ -17,7 +17,7 @@ class ASCIIUsernameValidator(validators.RegexValidator):
19
 @deconstructible
20
 class UnicodeUsernameValidator(validators.RegexValidator):
21
-    regex = r'^[\w.@+-]+$'
22
+    regex = r'^[\w.@+-]+\Z'
23
     message = _(
24
         'Enter a valid username. This value may contain only letters, '
25
         'numbers, and @/./+/-/_ characters.'
26
```

1
diff --git a/django/contrib/auth/validators.py b/django/contrib/auth/validators.py
2
--- a/django/contrib/auth/validators.py
3
+++ b/django/contrib/auth/validators.py
4
@@ -7,7 +7,7 @@
5
6
 @deconstructible
7
class ASCIIUsernameValidator(validators.RegexValidator):
8
-    regex = r'^[\w.@+-]+$'
9
+    regex = r'^[\w.@+-]+\Z'
10
     message = _(
11
'Enter a valid username. This value may contain only English letters, '
12
'numbers, and @/./+/-/_ characters.'
13
@@ -17,7 +17,7 @@ class ASCIIUsernameValidator(validators.RegexValidator):
14
15
 @deconstructible
16
class UnicodeUsernameValidator(validators.RegexValidator):
17
-    regex = r'^[\w.@+-]+$'
18
+    regex = r'^[\w.@+-]+\Z'
19
     message = _(
20
'Enter a valid username. This value may contain only letters, '
21
'numbers, and @/./+/-/_ characters.'

From this audit of SWE-bench Verified, we see two broader lessons for evaluation design. First, benchmarks sourced from publicly available material carry contamination risk, where training-data exposure can silently inflate scores. If publicly crawled data is used in benchmark construction, model developers should perform additional tests for contamination. Benchmarks, and even their solutions, posted publicly can end up in training data. Extra care should be taken both in how datasets are posted (i.e. password protected) and training data filtering (i.e. strict adherence to canary strings).

Second, automated scoring is tricky to get right; perfect test cases should fully verify correct functionality, being both agnostic to specific unimportant implementation details and also robust to shortcut solutions. These problems are inherently complex and difficult to solve. Catching these problems took multiple extensive human labeling campaigns.

We have incorporated these findings into our recent evaluation efforts. In the last months we’ve chosen to report results from the public split of SWE-Bench Pro. We recommend other model developers do the same. SWE-bench Pro is not perfect, but empirically seems to suffer less from contamination issues. Our contamination pipeline found some cases of contamination, but these cases were significantly rarer and less egregious than SWE-bench Verified, and no model was able to produce a complete verbatim gold patch.

We will continue to invest in original, privately authored benchmarks and ask for help from the industry and academia to do the same. In GDPVal⁠, tasks are privately authored by domain experts, reducing exposure risk, and solutions are graded holistically by trained reviewers. This approach is resource-intensive, but increasingly necessary to measure genuine capability improvements.

↑ top

16.Lessons from building multiplayer browsers

Sourcehttps://www.alejandro.pe/writing/sail-muddy-lessons

Sitealejandro.pe

AuthorAlejandro García Salas

Submitted2026-04-26 14:59 UTC (Hacker News)

HN activity17 points · 5 comments

Length6.3K words (~28 min read)

Languageen

product, design, and engineering lessons from building Sail & Muddy

Published March 2026

In 2022 I was looking for a very specific thing: a small team, less than ten people, before product market fit, working on something I actually cared about. I started coding early on and have always loved building and computers. I was interested in startups and in the future of personal computing. Two friends from college, Ron and Jimmy, reached out. They'd been working on this idea of a new browser, forked Chromium, figured out how to build on top of it, and raised a 5.5 million dollar seed from General Catalyst, Naval, Lachy Groom, YC, and others. By the time I joined as a founding engineer one of the hard parts was done. You could build on top of Chromium, access tabs, the history API, and all the UI could be built with web technologies. But the other hard part was still ahead of us: what to build, how it should work, and whether anyone wanted it.

I've always loved the philosophy of the open web, and I'm deeply grateful for it. I love frontend engineering and I'm very product minded. Getting to build on top of Chromium, millions of dollars in R&D forked from earlier browser engines × Chromium was forked from WebKit, which was forked from KHTML. One of my favorite things about the codebase: the C++ source files have stacks of copyright notices from different organizations dating back to the 90s, everyone adding their own signature on top of older and older code. Standing on the shoulders of giants. , felt both humbling and like an experience that would really stretch me. We were aiming to build a window worthy of the work you do and the work you do with other people. That evolved into collaborative software: realtime multiplayer, infinite canvases, rich text editors, chat, all packaged into the browser alongside web content. Framed as a "multiplayer browser" or "team browser," it was an attempt to channel venture dollars into a new kind of personal computing company. Sail, and later Muddy, were the products that came out of that vision.

Lots of people want to build new computing paradigms, new interface ideas. We gave it a real shot. Past flashy demos and hypotheticals. We studied both the history of personal computing and your favorite productivity software (Slack, Notion, Linear, you name it), talked to users, built new interfaces, shipped them, and watched most of them not work. Building and maintaining a browser product was unusually hard, and even though it did not work out as a company, the team proved a lot and learned a lot. Building a startup forces you to see past the grand vision. We failed to make something that a lot of people wanted and could grow. But I came out of it a fundamentally better product thinker, engineer, designer, builder, and hacker than when I went in. I got my reps in. Here's what I took away.

What We Built

When I joined, Sail was a very much in progress infinite canvas app built on top of our own Chromium fork × I learned a lot about browsers working on this. Chromium University is a great resource. A lot of the talks that Chromium developers share with each other are available online freely and are pretty fun to watch. . You could put websites in, add some text cards, see people's cursors, but it was still taking shape. Positioning wasn't locked in yet, and there were features to build out and bugs to fix. We iterated on it and drew inspiration from Muse × Adam Wiggins wrote an excellent retrospective on Muse. Worth reading in full if you're building in this space. , Kinopio, and a lot of other infinite canvas apps I can't even remember all of, riding the energy of the canvas moment when Miro and FigJam were surging. A fun positioning we talked about at some point was "Spatial Notion." Sail was the most fun version and what got me to join. Using it felt magical. Browsing the web and feeling like it's the web, not a streamed video feed like you would in a Zoom screenshare, seeing other people's cursors and imagining how everyone could actually be doing work in the same canvas side by side, having these random encounters like you often do in Figma when you browse the same design file. It felt super cool. But cool demos, cool visions, and cool feelings aren't always enough. Though of course, sometimes they can be.

Sail: "FigJam or Miro with a browser built in, you could place live websites on an infinite canvas and everything was multiplayer."

Sail never got a broad public launch. We tested it with different types of users looking to make themselves more productive, but there's a chance we failed to reach a specific type of user that would have loved Sail. I really wish we had launched more widely × Paul Graham nails this: "The danger of working in secret is inversely proportional to the simplicity and precision of the test. It would be safe to work in secret for a year on a new rocket engine. But if you work in secret for a year on a new social network, it will probably be a flop." . No one cares about your launch. Taking too long to launch builds up pressure. You start protecting a reputation you haven't earned yet. And if you take too long, you give yourself fewer chances to relaunch. Brian Chesky × "If you launch and no one notices, you can actually just keep launching." Airbnb launched three times before it got traction. said it best: if you launch and no one notices, you can just launch again.

After Sail we went through what we internally nicknamed the "multiverse project," a version that supported different kinds of boards: infinite canvas, structured canvas (Nototo-style), and chat. Because it was all built on the same sync engine, trying different form factors was relatively easy. We even got to the point where you could reference usable instances of one board type inside another. It was cool. It also got really complex. Chat is what survived, and it became the foundation of Muddy.

The multiverse project — chat, canvas, embedded websites, Figma, and multiple board types all running at once

The "multiverse project": chat, canvas, embedded websites, and multiple board types all at once. It got complex.

Muddy: "Slack and a browser as an integrated work environment."

Chat is lindy × The Lindy effect: the longer something has survived, the longer it's likely to keep surviving. Chat as an interface has been around since IRC in the 80s. It's not going anywhere. . People understand it immediately, it's legible, it caters to a lower common denominator more easily than a canvas, and it transfers to mobile. We had a React Native app on TestFlight, built with help from contractors. The scope and table stakes for productivity software keep going up, people expect a mobile companion app.

But we overindexed on table stakes. In hindsight, we should have tackled the harder problems around positioning head on rather than working on the mobile app. The positioning got us too close to Slack, and you risk just being another chat app. Muddy's embedded tabs in chat were genuinely good. I believe we innovated in that UX and I'd love to see something like it in apps like Slack. But forward-looking UX doesn't always win. The final UI of most apps has inefficiencies, things that aren't really optimal, and people are fine with that. A better interface isn't enough reason to switch if the current one is good enough. A lot of people have tried to beat Slack (more on this soon).

Positioning

Part of the company's thesis was shaped by Kevin Kwok's The Arc of Collaboration, the idea that collaboration should be native to productivity apps, not a separate layer, and that there's room for a metalayer across all of them. It's a thought-provoking read. Kwok argues that Slack is "911 for when everything falls apart," not air traffic control, and that the real opportunity is a layer that sits across all your apps handling presence, collaboration, and identity. That's essentially what we were trying to build with the browser. It made a lot of sense on paper. I see it differently now, but at the time it was a strong inspiration.

Diagram from Kevin Kwok's Arc of Collaboration showing functional workflows (Figma, Google Docs) with a Discord-type metacommunications layer across them

From Kwok's Arc of Collaboration: functional workflows (Figma, Google Docs, etc.) each with collaboration built in, and a Discord-type metacommunications layer sitting across all of them. We had some version of this diagram internally, with the browser as that metalayer.

Charts like this can be compelling, and this one was. The key is to know how to look beyond it and keep poking holes, experimenting, testing against reality. More on this in Reps, Theses, and Proof of Work.

The browser category is treacherous. There are at least two ways to think about a browser: as a metalayer for communication, or as a glorified HTML renderer. Both are true at the same time, which is part of what makes it so hard to position. Today when I describe the products I usually start with "multiplayer browser" and then get specific. "FigJam or Miro with a browser built in" for Sail, "Slack and a browser as an integrated work environment" for Muddy. Those descriptions are what I gravitated to because they land. But the descriptions that land aren't necessarily the descriptions that lead people to the right behavior. Would people think to use this as a meta layer for work? Or are they just stuck on "oh this is like Miro but with websites, I'll use it for all the things I use Miro for." Or with Muddy, "oh this is a better Slack" (though of course no one wants to position themselves as "better version of X" especially when they are not that much better). We went back and forth on whether we should position ourselves as a browser at all. "A browser that is not a browser." Can we really get away with positioning this as an operating system? That term is so saturated and overloaded and technical. An "all in one workspace," too broad and complicated. How do you explain that you have embedded web contents and that all auth, extensions, and everything else work like a browser, when the UI doesn't always land like that?

We weren't alone in this struggle. While we were building Sail, Arc by Browser Company and Mighty were in development too. All of us talked about building a new kind of computer. That framing is great for recruiting and to some extent high level marketing. But there's a gap between what you want people to do, what you want people to say, what people actually say, and what the thing actually is. No matter how ambitious the vision, the market reduces you to a simple description. Mighty × Mighty shut down in 2022. Their retrospective is honest about why: "the benefits of Mighty just weren't substantial enough to handle the drawbacks." Apple's M1 chip ended up matching their cloud server performance, eliminating much of the speed advantage. becomes "fast Chrome on the cloud." Arc × Browser Company pivoted away from Arc to build Dia, an AI browser. Josh Miller's letter to Arc members describes it well: "for most people, Arc was simply too different, with too many new things to learn, for too little reward." Only 5.5% of daily users used more than one Space. They called this the "novelty tax." becomes "pretty browser with a more organized sidebar." Sail becomes "Miro but with websites."

Every product in this space was cool, genuinely cool, the same way Muse was cool. But being cool isn't a business. Muse had tens of thousands of active users, thousands of customers, and still couldn't make it work × Adam Wiggins in his Muse retrospective: "Muse got there for a few thousand people, but the economics of prosumer software means that just isn't enough. You need tens of thousands, hundreds of thousands, to make the cost of development sustainable." . Ultimately these simple positionings have to break through to a broader level of transcendence to justify venture scale. Ours didn't. That's part of the issue.

You can see the same pattern in other companies that have tried, or are still trying, to do "the future of personal computing," all launched while I was working at Sail and Muddy. Rabbit R1, pitched as an AI device, really an Android phone reskinned with AI. Humane AI Pin × Humane sold only 10,000 units against a 100,000 target. HP acquired the assets for $116M, a fraction of what Humane had sought. , pitched as a new computing paradigm, really an clip-on accessory with a camera and an AI assistant. The grand vision and the simple description are always in tension. I'd be cautious of any company whose branding is "here's the history of computing, and now here's us." Unless there's a legible use case underneath. Notion pulls this off. Aspirationally they are building a "tool for thought," and to some extent they are actually executing on that. But most people don't talk about Notion that way. They talk about it as a wiki, as a docs tool, as a project tracker. The aspirational vision lives underneath a use case that people already understand. That anchor is what most of these other companies, including us, didn't have.

We did a lot of platonic decomposition for Sail and Muddy. Find the smallest set of concepts × Ryo Lu (Head of Design at Cursor, previously founding designer at Notion) articulates this thinking well. "How few building blocks can you get away with?" In his Dialectic and a16z interviews he contrasts purpose driven design (start from a specific user problem) with system driven design (design flexible, universal concepts that compose). that compose into everything you need, design the system not the screens. In our case: web cards, text cards, comments, messages, threads, notifications. Groups and threads were the same concept (groups in the canvas, threads as a group of messages), and messages could exist in both canvas and chat spaces. It's easy to get nerdsniped here. Unraveling the primitives feels cool and feels like you're finding a truth of sorts. It worked well for building and there were some neat emerging capabilities, but that matters less if you still fail to communicate how people would compose these concepts into their own workflows. We even briefly talked about positioning it as "build your own workspace." Notion does this well: blocks, pages, databases. But people talk about Notion as a wiki, a docs tool, a project tracker. The system works because people can describe what it does for them in simple terms. We never got there. The reaction to our products was always "that's cool" followed by not adopting it. Elegant decomposition impresses builders and is useful for developers, but we weren't building a developer tool. Users just want to know what it does for them.

Two tests I think about now. The first is the landing page test. Force yourself to make a landing page for the product. Not a pitch deck, not an internal doc, a real page that has to sell the thing to a stranger. It forces you to articulate what this is, who it's for, and why they should care, in a way that internal discussions never do. When you're building you can hide behind complexity, behind "it's hard to explain," behind the system and the concepts. A landing page doesn't let you. Even if the product isn't ready yet, you don't need to actually build an expensive page, just have a mental idea of one and check back on it every so often. If you can't fill the page, you don't know what you're building yet and why people want it.

The second is what I call the Sandwich Video test. Sandwich × Adam Lisagor's Sandwich became the go-to production company for Silicon Valley startup launch videos. They made iconic spots for Slack, Square, Coin, Flipboard, and others. In 2023 they started taking equity instead of cash from younger startups. made the "So Yeah, We Tried Slack…" video. You watch it and you immediately understand what Slack is, who it's for, and why your team should switch. Think about what would go on your version of that video. What's the story? What scenes do you show? If you can't picture it, that's a signal. Part of why that video worked so well is that Slack was cool. There was this special window, maybe 2013 to 2016, where a lot of companies wanted to be like startups, they wanted to move fast, use the tools the cool kids were using. Slack rode that perfectly. The brand, the tone, the colors, the "we're not email" attitude. Being cool gave them a wedge that a feature comparison never could. Sandwich captured that energy for other startups too. Coin's launch video hit their $50,000 pre-order goal in 40 minutes × Coin eventually collected 350,000 pre-orders. Dropbox did something similar earlier, driving their beta waitlist from 5,000 to 75,000 signups overnight with a demo video before the product was fully ready. A great video can be worth more than a prototype. .

We chatted internally about both the landing page and the video. We looked up to the Sandwich video as really cool. But this all came too late. I wish we'd done both exercises much earlier × Back then I actually played with making a product video using gen AI tools, Runway for video, DALL-E for images, ElevenLabs for voices. The tech was too weak at the time, everything looked uncanny and choppy. But now with tools like Veo, you could probably make a pretty convincing meme-y parody of that Slack video. The gap is closing fast. .

The Graveyard and "The Best Polished Version"

And on top of all the positioning challenges, users kept rejecting multiplayer. This wasn't unique to us. Tandem × Tandem was a virtual office app. Rajiv Ayyangar wrote a detailed retrospective on what happened. (virtual office) raised $7.5M from a16z, went exponential during COVID, was multiplayer by default, and concluded that "most people don't feel a need to talk to their co-workers that often." Rajiv frames it as how you want the world to work vs. how it actually works: "I wish we all wanted to talk with our team frequently and collaborate closely. But, most people don't... They value the autonomy and convenience of remote work, even if it can lead to disconnection and loneliness." Multi got acquired by OpenAI. Screenhero × There's still a recording from 13 years ago on YouTube. got acquired by Slack in 2015 and eventually sunsetted. Even Google Wave × Pretty ahead of its time. The 2009 demo is worth watching. Its realtime collaboration technology (operational transformation) influenced Google's collaboration suite and was open sourced as Apache Wave. , which felt ahead of its time when we studied it, couldn't make realtime collaboration stick, all the way back in 2009. It tried to be email, instant messaging, and a document editor all at once. Too much surface area, no clear anchor. Microsoft Loop had the vibe of a multiplayer collaborative workspace but ended up being essentially Microsoft's clone of Notion. Safari × Apple announced Shared Tab Groups at WWDC 2022. You could share a set of tabs with others and see what they were browsing in real time. It was quietly deprioritized. had a multiplayer tab groups attempt. A lot of smart people have taken swings at this space. The pattern isn't that the ideas are bad. Multiplayer doesn't work well as a standalone product. Work is a lot more siloed than we want to believe.

In hindsight, there's also a world where we could have let go of the browser, or ridden the AI browser wave. Pivoted into an "AI workspace." We talked about having agents in the workspace (we even joked about an AI agent called "MuddyBuddy"), and you can imagine how our multiplayer infrastructure would have transferred naturally to agentic use cases. While still working at Muddy, I happened to be one of the first Cursor × One of the first 10–20 users, back in January 2023, before they even forked VS Code. Back then what made it special was that you could prompt rather than just get ghost text autocomplete. Agentic capabilities weren't a thing yet. Email from Michael Truell at Cursor inviting me as one of their first 10-20 alpha testers users, AI coding was just getting started, agentic capabilities weren't a thing yet, and there was no AI browser wave to ride. Companies like Adept × Amazon eventually hired the founders away. were still figuring out computer use, and it wasn't something we could have integrated in. We were too early. Even now, a couple years later, computer use still has a ways to go. I think deeper integrations like WebMCP will matter more than DOM level automation. Once you've started as a browser company it's hard to pivot away from that.

There's a strategy I think of as "the best polished version." You build the last version of the thing. The category is mature, the data models and use cases are clear, and people don't care about novelty anymore, they just want the best. Linear did it for Jira. Vercel did it for the frontend developer experience. It works, but only under specific conditions: the existing tool has to suck enough. Jira is terrible. Everyone knows it. The delta between Jira and Linear was large enough that people were ready to switch. Arc was shooting for this too, but for the browser. The problem is that Chrome is already good. Slack is already good, plus years of integrations, Slack Connect, and a thousand paper cuts that add up to real switching cost. Making a better version of something that's already good enough is hard. A lot of teams in this space start from the same place: communication is too noisy, too fragmented, too broken, and we can do it better. It's a compelling frame because it's partly true. But a lot of teams arrive at the same conclusion independently, and the hard part isn't the diagnosis, it's the proof. The graveyard of better Slacks is long: Campsite × Async-first team communication by Brian Lovin and Ryan Nystrom. Raised ~$5M from Lux Capital. Wound down in 2024. , Quill × Built by Ludwig Pettersson. Raised ~$7.5M from a16z. Had genuinely good features that improved on Slack's threading and notifications. Acqui-hired by Twitter in 2020 to work on DMs. , Threads × Thread-centric team communication. Raised ~$10.5M from Sequoia and a16z. More structured than Slack, every conversation was a discrete thread. Acquired by Shopify in 2021. Not to be confused with Meta's Threads. , and more you've never heard of. Browser Company tried a better Chrome. We were trying to replace both at once. The combined friction of Chrome plus Slack simply wasn't bad enough for anyone to endure the pain of switching to us.

Under the Hood: Technical Bets and Dogfooding

The multiplayer architecture was genuinely clever. Instead of streaming video, we streamed DOM mutations. When someone joined a shared tab, the system would snapshot and reconstruct the DOM. The sync engine underneath (GraphQL subscriptions over WebSockets, Recoil, Apollo) made pivoting between product forms surprisingly easy. Canvas to board to chat, the plumbing just worked.

HTML, CSS, and JS are absurdly powerful primitives. We built an infinite canvas, rich text editors with Prosemirror × Great piece of software by Marijn Haverbeke. If you've used a rich text editor on the web, there's a good chance it's built on Prosemirror. , realtime collaboration, all running inside a browser that was itself a product. Lots of fun edge cases, like scaling Prosemirror editors inside transformed canvas divs. Hard, satisfying work. And because all the UI was web UI, based on the open source browser Vivaldi × Vivaldi's UI is built with web technology (HTML, CSS, and JS) on top of Chromium. , we got a Windows build fairly fast. I remember thinking that was pretty neat. A different tradeoff than going native × Arc went with native UI in Swift, which meant figuring out how to get Swift working on Windows. .

But none of this was the bottleneck. The architecture was sound. The sync engine made pivoting trivial. We could prototype a new product direction in weeks. The bottleneck wasn't technical. The architecture gave us real leverage, but leverage doesn't help if you haven't figured out who needs it.

At one point we fully migrated off Discord onto our own product. We tracked tasks in Sail and Muddy instead of Linear, reduced Notion to a document editor for when we needed something more complex than our text editor. We ran the company on it. This felt like validation.

Beware dogfooding. It teaches you less about go to market than you think. If you clone Slack × Slack itself grew out of dogfooding, an internal tool built during the Glitch game. But when Butterfield opened a preview in August 2013, 8,000 people requested invites in two weeks and 93% of users who tried it never stopped. The dogfooding gave them a polished product, but the external signal came fast and loud. and dogfood it, you can test whether it's bug free, whether it feels good. You'll know the product is good, is ready, is there when you see it. But you're not testing whether anyone else wants it. We didn't clone anything, but the point stands. We understood every concept because we invented every concept. We could navigate the complexity because we had authored it. External users hit the same wall: "What am I supposed to do with this?" Too many concepts, too many kinds of spaces, too steep a learning curve. We were often solving interface problems when we should have been solving workflow problems. What actually matters is the workflow and the sensation you get when completing it. The dopamine hit when you send a message, the feeling of accomplishment when you check something off. The interface is just a conduit.

Early Early

People don't understand what really early looks like unless they've been there. When people say "early Notion," they usually mean after the relaunch, after the product clicked, after there was already some shape to the story. They usually do not mean the part where Ivan Zhao and Simon Last laid off their small team, sublet their San Francisco office, and moved to Kyoto to rebuild the product from scratch. Notion's real reset × In Figma's 2019 interview with Ivan Zhao, the company says the founders had to "fire their fledgling team of 4 and start over," then move to Kyoto to rebuild. Sequoia's profile describes Ivan coding 18 hours a day in a two-story house with shoji screen bedrooms. Sources: Figma interview, Sequoia profile. That's the part most people skip. That's the part that isn't legible from the outside. That's the part we were in. In one interview, Ivan mentions sneaking into Trick Dog in the Mission. We used to go to the same bar after work.

Trick Dog bar in the Mission, San Francisco — next to Sightglass Coffee

Trick Dog, next to Sightglass — my go-to coffee shop near our office. Photo via 7x7.

Figma is another example. Dylan Field × Field dropped out of Brown in 2012 on a Thiel Fellowship. Index Ventures led a ~$4M seed in 2013, then Greylock led a $14M Series A in 2015, both before any public product. Figma launched publicly in September 2016 and arguably didn't hit PMF until 2017 or 2018, when multiplayer collaboration drove viral adoption within design teams. and Evan Wallace spent nearly four years in stealth building a WebGL rendering engine that could run a professional design tool entirely in the browser. A team of fewer than ten people for most of it, doing genuine R&D with no shortcuts. When people say "early Figma" they mean the tool designers love. They don't mean the three years where two guys were writing a custom 2D renderer and couldn't show anyone a product.

Most "early stage" stories get told in retrospect, after the messy part has been cleaned up into a narrative. The version people see has shape. The real experience doesn't. It's just a sequence of bets that mostly don't work, made by people who are mostly uncertain, in a context that mostly doesn't make sense to anyone watching. If it already looked like it was working, it probably wouldn't be early.

Reps, Theses, Secrets, and Proof of Work

Every startup has a thesis. Ours was that the browser would become the platform for realtime collaboration, and that whoever owned that layer would own the future of work. It was a good thesis. It was well reasoned, backed by real trends, and we could argue it convincingly. The problem is that a thesis is not a product. Peter Thiel's question, "what important truth do very few people agree with you on," is supposed to surface secrets. Things you know that the market doesn't. A good secret has to be specific, not just a high level trend. "Communication is broken" isn't a secret, it's an observation everyone agrees with. "People will collaborate inside the browser" sounds like a secret but it's still just a thesis. A secret only matters if you can iterate your way to proof. And proof means shipping something that works, not something that validates the thesis in theory.

As Elad Gil × "In general, things that work tend to work pretty fast, and usually that's within the first year of launch." From his SPC talk. has said, things that work tend to work pretty fast. There's a popular saying × Originally from Justin Kan: "First time founders are obsessed with product. Second time founders are obsessed with distribution." He later walked it back, but the kernel is real. that first time founders obsess over product and second time founders obsess over distribution. Of course you need both. But our weakness was GTM, and we didn't crack that early enough.

Reps only compound if you're changing enough variables between attempts and actually putting each one in front of users. Some of our reps were wasted because we were iterating without enough user signal. The thesis was so compelling that it became hard to question. Each iteration started from "how do we make the thesis work" instead of "what are users actually doing." Sometimes we were iterating on the vision when we should have been iterating closer to user signal. The difference is subtle but it matters a lot. One converges toward something people want. The other converges toward a more elegant version of something people don't.

While working on Sail I picked up figure drawing. It became a useful parallel, though only up to a point. In figure drawing you're looking at a real thing and transferring it. With a new product you're trying to capture ideas from the ether and give them solidity. More like a gas-to-solid sculpture, a metamorphosis, than a study from life. Still, the mechanics transfer. Figure drawing is timed: 60 seconds, 1 minute, 5 minutes. My art teacher says something I think about constantly: work evenly across the piece so you can stop at any time and be done. The timer is going to ring. It will look dumb to have a beautifully rendered hand and a stick figure torso. You miss the forest for the trees. At Sail we did this. We iterated on three+ versions of the sidebar alone. There's an argument that people understand what the product is from the sidebar, but I'm not convinced that was the fundamental issue. That level of iteration can make sense, you just have to be judicious about where you spend it. We sometimes overpolished one feature while the rest of the product was barely there. Work at the right level of resolution so the whole thing holds together at any point.

Karri Saarinen has a great talk where he describes building Linear's brand in exactly this way. Pick a name, pick a color, pick a typeface, move on. Don't spend months on a logo. The brand lives in people's minds, not on your website. Coinbase used a logo he simplified in a day all the way to the IPO. We didn't do this. We spent real time on brand for Sail, a product that never got a broad public launch. The work was great. The lesson is about timing, not quality. We invested in brand before the product had found its shape. In retrospect, we could have timeboxed it more aggressively and moved on.

Sail brand identity

Muddy logo

Sail brand by Will Neeteson, a do it all designer (brand, product, visual systems, art direction). Muddy logo.

My art teacher also talks about how masters can one shot more often, without a quick sketch underneath, in less forgiving mediums like ink. I can see myself getting there more often now. But to one shot without experience is hard. I think this is why a lot of startups take a while. Not because stubbornness magically works. Stubbornness on its own is not much and can just be foolish. But stubbornness allows you to get more reps in. Some forms of impatience are good too. Impatience with bad answers, with surface level thinking, with settling before you've actually tested something. Quite often you are teaching yourself how to get better. There aren't many teachers out there on how to start a new idea from scratch, and even if there were, they haven't worked in your domain. It takes real wisdom to parse through simple platitudes like "make something people want", "build for yourself", "you'll know it when you see it" that are true but not as actionable as they sound.

In drawing class you do dozens of gesture drawings per session. Sixty seconds each, one after another. Most of them are bad. But you get better by volume. Your eye learns to find the essential line, your hand gets faster, your instincts sharpen. I look at my early gestures and my recent ones and the difference is obvious. It's the same with building. Over time you get faster, more decisive, better at knowing what to leave out. The reps compound even when the individual attempts don't survive.

Student gesture drawings from The Natural Way to Draw by Kimon Nicolaides. Students draw gestures to warm up, to get the lay of the land.

Rembrandt, Young Woman Sleeping, c. 1654 — brush and brown wash on paper

Rembrandt, Young Woman Sleeping, c. 1654. Brush and brown wash, no pencil underneath. One of the drawings we'd talk about in my drawing class. It has gestural elements but feels effortless because of expertise. Via Google Arts & Culture.

Parker Conrad had already built Zenefits × Conrad founded Zenefits in 2012 (YC W13), an HR/payroll platform that peaked at a $4.5B valuation before he was forced out. Six weeks later he started Rippling, covering the same territory but built properly. He says you don't learn much from failure explicitly, but the accumulated pattern recognition makes you faster. From a YC interview. before Rippling. Karri at Linear got to PMF in about a year. A whole team switched to Linear in its second month. By public launch, fourteen months in, they had over a thousand paying customers on almost zero marketing spend. But Karri had already been through a YC startup (Kippt) × Karri co-founded Kippt, a bookmarking tool, with Jori Lallo (later Linear co-founder) as a side project. It went through YC S12 and gained 10,000 users but never found a business model. Kippt was acquired by Coinbase in 2014. Karri's takeaway, from the same First Round Review interview: "We learned that it's really hard to turn a company into a business if you didn't set out to build one." , Coinbase, and Airbnb. At Airbnb he built a Chrome extension × Fun fact: I interned at Airbnb in 2018 on the design system team where Karri was the designer, and remember hearing about it after joining full time. This was also mentioned in his First Round Review interview, the extension got about 100 installs. to restyle Jira. Notion took longer. Ivan and Simon rebuilt from scratch in Kyoto before it clicked. They'd all earned the right to one shot, or at least to move faster. Nikita Bier × From a talk: "The most valuable thing you can have is a reliable way to test your app... that reliable petri dish to test in is the most valuable thing." talks about this. Your startup should be a petri dish, not a monument. Before TBH sold to Facebook, the team built 15 failed apps. But they got so good at building and testing that their first app took a year and their last took two weeks. The reproducible process was more valuable than any single idea.

I think the thesis should be a compass, not a destination. You have to hold it loosely enough to let the product diverge from the narrative. The thesis tells you where to look. The iterations tell you what's actually there. Proof of work isn't just shipping a lot. It's relentless truthseeking, shipping in a way where each version teaches you something that changes the next one. The reps happen while you're building something nobody wants yet, testing ideas that don't work, and learning to tell the difference between conviction and wishful thinking in real time. You can't get those reps any other way.

Right now I'm excited about AI. New things keep getting invented and discovered. ChatGPT, Cursor, Claude Code, OpenClaw. New capabilities that take what models can do and turn them into products people actually use. I still have many reps to go.

If you're building ambitious software, or any of this resonated, feel free to email me or DM me on X.

↑ top

18.AI should elevate your thinking, not replace it

Sourcehttps://www.koshyjohn.com/blog/ai-should-elevate-your-thinking-not-replace-it/

Sitekoshyjohn.com

AuthorLast published on April 19, 2026 by Koshy John

Submitted2026-04-26 20:03 UTC (Hacker News)

HN activity311 points · 261 comments

Length1.7K words (~8 min read)

Languageen

Read about the .

In talking to engineering management across tech industry heavy-weights, it's apparent that software engineering is starting to split people into two nebulous groups:

The first group will use A.I. to remove drudgery, move faster, and spend more time on the parts of the job that actually matter i.e. framing problems, making tradeoffs, spotting risks, creating clarity, and producing original insight.
The second group will use A.I. to avoid thinking. They will paste prompts into a box, collect polished output, and present it as though it reflects their own reasoning. For a while, that can look like productivity. It can even look like talent. But it is a dead end.

The software engineers who will be most valuable in the future are not the ones who do everything themselves. They are the ones who refuse to spend time on work that A.I. can do for them, while still understanding everything that is done on their behalf. They use the time savings to operate at a higher level. They elevate their thought process through rigor rather than outsourcing it.

That distinction matters more than people think.

In this post:

The New Failure Mode: Outsourced Thinking (& analogies)
What the Best Engineers Will Do Instead
The Real Source of Value
The Risk for Early-In-Career Engineers
There Is No Shortcut to Judgment
In Summary: The Dividing Line & Organizational Implications
Why This Matters Even More to Organizational Health

The New Failure Mode: Outsourced Thinking

A.I. can already generate code, summarize meetings, explain concepts, produce design drafts, and write status updates in seconds. That is useful but also dangerous.

The danger is not that A.I. will make people lazy in some vague moral sense. It is that it makes it easy to simulate competence without building competence.

There is now a very real temptation to hand a model a problem, receive a plausible answer, and then repeat that answer as if it reflects your own understanding. That is close to plagiarism, but in some ways worse. At least when a student copies from another person, there is still a real human source behind the answer. Here, people can present machine-produced reasoning they do not understand, cannot defend, and could not reproduce on their own.

That is intellectual dependency being labeled as leverage.

And that dependency has a cost. Every time you substitute generated output for your own comprehension, you are skipping the exercises / reps that build judgment. You are trading long-term capability for short-term appearance.

I'm going to share some analogies to make this line of thought more concrete and approachable.

[CLICK HERE TO SHOW ANALOGIES]

What the Best Engineers Will Do Instead

The best engineers will absolutely use A.I. more, not less. But they will use it with a very different posture.

They will let A.I. draft boilerplate, summarize docs, generate test scaffolding, propose refactorings, surface possible failure modes, accelerate investigation, and compress routine work. They will happily offload the mechanical parts of the job. But they will also:

ask sharper questions.
define the real problem instead of merely responding to the visible one.
optimize for clarity and brevity (as before), instead of a lot of polished language that says little of substance.
generate new, high-value knowledge - instead of simply rehashing / remixing existing knowledge in the system.

Then they will take the reclaimed time and invest it where it matters most.

The Real Source of Value

For years, people have confused software engineering with code production. That confusion is now getting exposed.

If the job were mainly about producing syntactically valid code, then of course A.I. would be on a direct path to replacing large parts of the profession. But that was never the highest-value part of the work. The value was always in judgment.

The valuable engineer is the one who sees the hidden constraint before it causes an outage. The one who notices that the team is solving the wrong problem. The one who reduces a vague debate into crisp tradeoffs. The one who identifies the missing abstraction. The one who can debug reality, not just read code. The one who can create clarity where everyone else sees noise.

A.I. can support that work. It cannot own it.

In fact, the engineers who produce the most value in the future will often be the ones generating the knowledge that makes A.I. more useful in the first place. They will create the design principles, domain understanding, patterns, context, and decision frameworks that improve the machine’s effectiveness. They will feed the system with better questions, better constraints, and better corrections.

In that world, the engineer is not replaced by A.I. The engineer becomes more leveraged because they are operating above the level of raw output.

The Risk for Early-in-Career Engineers

This issue is especially important for people early in their careers.

Early years matter because that is when foundational skills are formed. Debugging instinct. System intuition. Precision. Taste. Skepticism. The ability to decompose a problem. The ability to explain why something works, not just that it appears to work.

Those skills are built through friction. Through struggle. Through getting things wrong and fixing them. Through tracing failures back to root cause. Through writing something and realizing it does not survive contact with reality.

That process is not optional. It is how engineers acquire and elevate their competency. If early-career engineers use A.I. to remove all struggle from the learning loop, they are hurting their development.

Someone who uses A.I. to answer every hard question may look efficient for a quarter or two. But they may also be quietly failing to build the very capabilities their future depends on. They are skipping the stage where understanding is forged.

Going back to the analogies: This is like copying answers through university and then showing up to a job that requires independent thought. It is like using a calculator for every arithmetic task and never developing number sense. It is like relying on self-driving features before learning how to actually drive. The support system may make you look functional, but it does not make you capable.

And eventually raw capability is the main thing that matters. There is no substitute.

There is No Shortcut to Judgment

This is the part that some people may not want to hear --

There is no generated explanation that transfers mastery into your brain without you doing the work.
There is no way to outsource reasoning for long enough that you still end up strong at reasoning.

You can outsource mechanics, accelerate research and compress routine tasks. You can remove enormous amounts of low-value labor. All of that is good and should happen.

But you cannot skip the formation of skill and expect to possess it anyway.

That is the central mistake behind the most naive uses of A.I. People think they are saving time, when in reality they are often deferring a bill that will come due later in the form of weak judgment, shallow understanding, and limited adaptability.

In Summary: The Dividing Line & Organizational Implications

The dividing line is simple:

If A.I. is helping you understand faster, think deeper, and operate at a higher level, it is making you more valuable.
If A.I. is helping you avoid understanding, avoid struggle, and avoid ownership of the reasoning, it is making you less valuable.

One path compounds, while the other path hollows you out and sets you up ripe for irrelevance.

That is why the future does not belong to the engineers who merely use A.I. It belongs to the engineers who know exactly what to delegate, exactly what to own, and exactly how to turn time savings into better thinking.

If not already, it's time to make informed choices on how you shape your future in the industry.

Why This Matters Even More to Organizational Health

Engineering management will face the same dividing line.

Some leaders will recognize the difference between engineers who use A.I. to accelerate understanding and engineers who use it to simulate understanding. Others will not. That gap will matter more than many organizations realize.

One of the defining traits of strong engineering leadership in the A.I. era will be the ability to distinguish polished output from real judgment. Leaders who cannot tell the difference may reward speed, fluency, and presentation while missing the deeper signals of technical depth: originality, rigor, sound tradeoff analysis, and the ability to reason clearly about unfamiliar problems.

That creates organizational risk.

The most capable engineers are often the ones producing the insight, context, design judgment, and corrective feedback that make both teams and A.I. systems more effective. If an organization allows low-understanding, high-fluency work to spread unchecked, it does not just lower the quality of individual output. It starts to degrade the knowledge environment itself. Reviews get weaker. Design discussions get shallower. Documents become more polished and less useful. Over time, the organization becomes worse at generating the very clarity and technical judgment it depends on.

This is why leadership matters so much here. The challenge is not merely adopting A.I. tools. It is protecting the conditions under which real thinking, learning, and craftsmanship continue to thrive.

That starts with hiring. Organizations will need better ways to detect genuine understanding rather than surface-level fluency. They will need interview loops that test reasoning, not just polished answers. They will need evaluation systems that reward clarity, depth, sound judgment, and durable technical contribution rather than sheer output volume.

It also affects team design and culture. Strong engineers should not spend disproportionate amounts of time cleaning up plausible but shallow work generated by people who have outsourced their thinking. If leadership does not actively guard against that, high performers become force multipliers for everyone except themselves. That is a fast path to frustration, lowered standards, and eventual attrition.

The organizations that handle this well will not be the ones that simply push A.I. adoption hardest. They will be the ones that learn to separate leverage from dependency, acceleration from imitation, and genuine capability from convincing output.

In the A.I. era, organizational quality will increasingly depend on whether leadership can still recognize the difference.

Editorial note: Like all content on this site, the views expressed here are my own and do not necessarily reflect the views of my employer.

↑ top

19.Magic: The Gathering took me from N2 to Japanese fluency

Sourcehttps://www.tokyodev.com/articles/how-magic-the-gathering-took-me-from-n2-to-japanese-fluency

Sitetokyodev.com

AuthorRicardo Basallo

Published2026-04-22

HN activity99 points · 34 comments

Length2.1K words (~10 min read)

Languageen

Passing the JLPT N2 is one thing, but actively managing projects in Japanese is another. This is how I used Magic: The Gathering to bridge the gap from academic study to professional confidence in Tokyo.

When I first arrived in Tokyo in 2024, I held a JLPT N2 certification—a milestone I had worked toward for years, from when I was still employed full-time in my home country, until I graduated from language school in Ehime. That certificate was key to landing my Project Manager job in Tokyo.

Once I started working, though, I realized that passing a test and actually “owning” a language are two different things. I had the technical ability to navigate professional tasks, but I wanted to do more than just survive in Japanese. I wanted to speak it with the same confidence I felt in English.

I found my answer in the one thing I had loved for over a decade—Magic: The Gathering. I decided to make Magic the place where I would stop being a learner, and start being a full participant. What began as a way to enjoy my hobby in a new city ended up having a profound and lasting impact on, not just my work, but my life in Japan as a whole.

Back in Ehime, the hobby scene was virtually non-existent, confining my Japanese practice to textbooks and classrooms. Moving to Tokyo changed everything. With access to a thriving local game community, I realized I had the perfect opportunity to finally “earn” my N2 through real-world application.

Following is a detailed, step-by-step breakdown of how I did just that, including:

How I localized my deck
How I prepared in advance for game sessions
Exactly how those sessions leveled up my Japanese
The impact these steps had on my life and career in Japan

Localizing the deck

Once I settled in Tokyo, I set a rule for myself that felt counterintuitive at first: As much as possible, every card I played had to be in Japanese.

For many international residents I know, the instinct is to stick with English cards to avoid confusion. But I realized that if I sat down with an English deck, I was forcing my Japanese opponents to adapt to me. If they didn’t know a card, the game would grind to a halt while we looked up a translation or called a judge. It was a friction point that I wanted to eliminate as much as possible.

By committing to Japanese cards, I shifted the “burden of explanation” entirely onto myself, while making the game more accessible for everyone else at the table.

Designing for clarity

I realized that if I wanted to succeed in only using Japanese cards, I needed to align my deck choice with my communication goals. For this reason I gravitated toward Aggro decks. These decks are proactive and have clear, logical game plans that are easier for an opponent to follow, as well as easier for me to explain accurately in Japanese.

A prime example is my Mono Red Prowess deck in Pioneer. The mechanics are elegant but require precise communication: I need to clearly announce every spell and then immediately confirm the updated power and toughness of my creatures.

English	Japanese
Prowess (Whenever you cast a noncreature spell, this creature gets +1/+1 until end of turn.)	果敢（あなたがクリーチャーでない呪文を唱えるたび、ターン終了時まで、このクリーチャーは＋１/＋１の修整を受ける。） kakan (anata ga kuri-cha- de nai jumon wo tonoaeru tabi, ta-n shuuryouji made, kono kuri-cha- wa +1/+1 no shuusei wo ukeru.)

By choosing a deck like this, I was able to focus on the quality of my Japanese delivery, and it allowed me to practice high-frequency phrases such as “Prowess” (果敢, kakan) and “Damage” (ダメージ, dameeji) until they became muscle memory.

The prep work

These games weren’t something I went into blindly.

One of my biggest goals was to avoid situations where a language barrier would force us to call a judge for a simple translation. I wanted to be so clear and confident that even if a complex interaction occurred, I could explain my side of the board fluently.

Before heading to a weekly event, I developed a specific routine to ensure I was ready.

Mapping card names

I researched the English names of my cards and their official Japanese translations, practicing the pronunciations until they were fluid. Here is a sample table I made for one of the first decks I built.

English	Japanese Name	Reading	Rules Text
Goblin Guide	ゴブリンの先達	ゴブリンのせんだつ, goburin no sendatsu	速攻（そっこう, sokkou)
Monastery Swiftspear	僧院の速槍	そういんのそくそう, souin no sokusou	速攻（そっこう, sokkou) 果敢（かかん, kakan)
Eidolon of the Great Revel	大歓楽の幻霊	だいかんらくのげんれい, daikanraku no genrei	呪文（じゅもん, jumon) 唱える（となえる, tonaeru）
Lava Spike	溶岩の撃ち込み	ようがんのうちこみ , yougan no uchikomi
Lightning Bolt	稲妻	いなずま, inazuma
Boros Charm	ボロスの魔除け	ボロスのまよけ, borosu no mayoke	破壊不能（はかいふのう, hakaifunou）,二段攻撃（にだんこうげき, nidankougeki)
Lightning Helix	稲妻のらせん（螺旋)	いなずまのらせん, Inazuma no rasen
Skullcrack	頭蓋割り	ずがいわり, zugaiwari
Searing Blaze	焼尽の猛火	しょうじんのもうか, Shoujin no mouka	上陸（じょうりく, jouriju)
Rift Bolt	裂け目の稲妻	さけめのいなずま, sakame no inazuma	待機（たいき, taiki)
Skewer the Critics	批判家刺殺	ひはんかしさつ, hihanka shisatsu	絢爛（けんらん, kenran)
Inspiring Vantage	感動的な眺望所	かんどうてきなちょうぼうじょ, kandouteki na chouboujo
Sacred Foundry	聖なる鋳造所	せいなるちゅうぞうしょ, sei naru chuuzousho
Fiery Islet	焦熱島嶼域	しょうねつとうしょいき, shounetsutoushiiki	生け贄に捧げる（いけにえにささげる, ikinienisasageru)
Sunbaked Canyon	灼陽大峡谷	しゃくようだいきょうこく, shakuyoudaikyoukoku	生け贄に捧げる（いけにえにささげる, ikinie ni sasageru)
Path to Exile	流刑への道	るけいへのみち, rukei he no michi	追放（ついほう, tsuihou)
Wear+Tear	摩耗+損耗	まもう, mamou・そんもう, sonmou
Rest in Peace	安らかなる眠り	やすらかなるねむり, yasuraka naru nemuri
Searing Blood	灼熱の血	しゃくねつのち, shakunetsu no chi
Smash to Smithereens	粉々	こなごな, konagona
Deflecting Palm	跳ね返す掌	はねかえすてのひら, hanekaesu tenohira
Pyroclasm	紅蓮地獄	ぐれんじごく, guren jigoku

Predicting interactions

I did my best to anticipate and study for common questions my opponents might have. For Prowess, this meant being ready to explain exactly how many triggers were on the stack and what the final damage count would be. That level of preparation was the foundation that enabled me to step into any store with confidence.

Playing in Japanese: From scripts to spontaneity

Preparation is essential, but language is ultimately a social exchange. I knew the next step was to move beyond my study notes and put my Japanese to work in a live, high-pressure environment.

To achieve this, I adopted two main strategies.

The weekly routine

I make it a habit to attend at least one event every week, transforming my local game store into my regularly-scheduled language lab.

Consistency has been my most powerful tool. By showing up every week, I transform new words from short-term memories into reflexes.

At an event, you don’t have five minutes to compose a perfect sentence; you have to respond to a spell on the stack or a phase change in seconds, and you have to do it over and over again. This repetition has turned my scripted phrases into second nature.

I’m also not just reciting words I have memorized. I’ve been learning the rhythm of Japanese conversation—the pauses, the acknowledgments, and the non-verbal cues that make communication feel natural rather than robotic.

The active input-output loop

During these events, I practice what I call the “Active Observation Loop.” While strategizing for my own plays, I am also “shadowing” my opponents. I pay close attention to how native speakers announce their actions:

How did they talk about the next phase in their turn?
What shorthand did they use for common interactions?
How did they politely ask for a response?

＾If an opponent uses a phrase that sounds more natural than the one I had scripted, I will mentally “code” it into my own vocabulary and try to use it in the very next round. This turns every match into a feedback loop: I input native phrasing from my opponent, and immediately output it in my own gameplay.

The results: From the playmat to the professional world

There is a unique sense of accomplishment that comes from navigating a high-stakes turn—stacking prowess triggers, calculating damage, and responding to removal—while maintaining a smooth, respectful conversation in Japanese. It was in these moments, under the pressure of the tournament clock, that I realized I was no longer just “using” my N2: I was inhabiting it.

The impact of my “Magic: The Gathering Method” eventually spilled over into every other aspect of my life in Tokyo. What started as a way to bridge the gap between my N2 certification and real-world fluency became the foundation for my professional confidence.

Proof of concept

The most immediate results were visible at the game table. By committing to the language, I moved past the initial anxiety that many international residents face when walking into a local event. I gained the ability to communicate my plays clearly, handle rules disputes without hesitation, and enjoy the social side of the game as a participant.

For example, when I started playing, after every match I would thank my opponent and then leave the table right away. I feared that if I stayed for too long, they’d start asking questions I might not understand.

But the more I played, the more confident I became in speaking to my opponents before, during and after matches. Just last Sunday my opponent asked me about where I was from, how I had studied Japanese, and even my opinions on certain cards—and I had no trouble answering him at all.

The peak of this journey was placing first in an event at Hareruya Kichijoji. While the win itself was a personal achievement, the real validation I experienced was seeing the results published in Japanese on their official X (formerly Twitter) account.

It served as “proof of concept” for my method—tangible evidence that my strategy for applying the language was working in a competitive, native environment.

Professional dividends

The most significant results came from how naturally these skills translated to my work as a Project Manager. I realized that the mental muscles I used to explain a Mono-Red Prowess stack are the same ones required to manage a project’s lifecycle and facilitate alignment between teams.

Before I started playing regularly, I was tasked with delivering a monthly report to a room of Japanese stakeholders. While my boss mentioned that speaking in English was fine, I insisted on doing it entirely in Japanese to match the rest of the room. This was my first experience reporting in a native-speed environment, so I was incredibly nervous. The language barrier was a background process constantly draining my mental bandwidth.

After a few months of consistent tournament play, however, I found myself actually welcoming that meeting. The high-pressure environment of a weekly Magic tournament—where a miscommunication could cost a game—had effectively “over-trained” my reflexes. I could now apply that same poise at work, using Japanese as confidently as if I were speaking in English.

From “arcane” to professional

One interesting aspect of this journey was learning to differentiate between the game’s fantasy vocabulary and professional speech. While I spent hours mastering terms like “Trigger” (誘発, yuuhatsu) or “Exile” (追放, tsuihou), I treated these as domain-specific jargon—the same way a developer differentiates between two programming languages.

The real value wasn’t in the fantasy-specific nouns, but in the logical structures I learned. Explaining a complex card interaction uses the same “if/then” conditional grammar and causal connectors (〜によって, ni yotte and 〜ため, tame) that I use to explain a project bottleneck. I focused on the syntax of explanation:

The game version: Using specific nouns like “Sacrifice” or “Resolve” to navigate a game state.
The professional version: Applying those same connectors to link complex ideas and pivot when an unexpected technical question is thrown my way.

Whether I am facilitating a meeting, reporting project statuses to clients, or acting as the bridge between local engineers and offshore teams, I no longer worry about “getting the language right.” By using the card table to master the mechanics of Japanese, I freed up my brain to focus on the actual content of my projects.

Find your own proof of concept

Through Magic, I didn’t just practice Japanese, but learned how to function in it. This shift in perspective—from viewing the language as a subject to be studied, to a tool to be used—is, I believe, the final step for anyone hoping to truly make Japan their home.

My advice to anyone living in Japan and hoping to move beyond their current language level is simple: Find a hobby that forces you to use your Japanese regularly and authentically.

Though I used Magic: The Gathering, the same logic applies to any passion, whether it’s bouldering, photography, or joining a local sports team. The goal is to find a community where Japanese is the “operating system” for something you already love.

Moving to Tokyo and engaging with the local scene didn’t just help me “earn” my N2, but gave me the confidence to build a career and a life here. If you commit to a hobby that pushes your boundaries, you might find that the fluency you’ve been chasing in textbooks has been waiting for you at the table all along.

↑ top

20.Show HN: AI memory with biological decay (52% recall)

Sourcehttps://github.com/sachitrafa/YourMemory

SiteGitHub

SubmitterSachitRafa (Hacker News)

Submitted2026-04-26 20:58 UTC (Hacker News)

HN activity69 points · 31 comments

Length1.1K words (~5 min read)

Languageen

Agentic AI memory with Ebbinghaus forgetting curve decay. +16pp better recall than Mem0 on LoCoMo. - sachitrafa/YourMemory

Persistent memory for AI agents — built on the science of how humans remember.

The Problem

Every session, your AI assistant starts from zero. It asks the same questions, forgets your preferences, re-learns your stack. There is no memory between conversations.

YourMemory fixes that. It gives AI agents a persistent memory layer that works the way human memory does — important things stick, forgotten things fade, outdated facts get replaced automatically. Two commands to install, zero infrastructure required.

How Well Does It Work?

Tested on LoCoMo-10 — 1,534 QA pairs across 10 multi-session conversations.

System	Recall@5	95% CI
YourMemory (BM25 + vector + graph + decay)	59%	56–61%
Zep Cloud	28%	26–30%

2× better recall than Zep Cloud on the same benchmark.

Full methodology and per-sample breakdown in BENCHMARKS.md. Writeup: I built memory decay for AI agents using the Ebbinghaus forgetting curve.

Demo

Quick Start

Supports Python 3.11, 3.12, 3.13, and 3.14. No Docker, no database setup, no external services.

Step 1 — Install

pip install yourmemory

Step 2 — Run setup (once)

yourmemory-setup

Downloads the spaCy language model and initialises the local database at ~/.yourmemory/memories.duckdb.

Step 3 — Get your config path

yourmemory-path

Prints your full executable path and a ready-to-paste config block. Copy it.

Step 4 — Wire into your AI client

Claude Code

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "yourmemory": {
      "command": "yourmemory"
    }
  }
}

Reload (Cmd+Shift+P → Developer: Reload Window).

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "yourmemory": {
      "command": "yourmemory"
    }
  }
}

Restart Claude Desktop.

Cline (VS Code)

VS Code doesn't inherit your shell PATH. Run yourmemory-path first to get the full executable path.

In Cline → MCP Servers → Edit MCP Settings:

{
  "mcpServers": {
    "yourmemory": {
      "command": "/full/path/to/yourmemory",
      "args": [],
      "env": { "YOURMEMORY_USER": "your_name" }
    }
  }
}

Restart Cline after saving.

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "yourmemory": {
      "command": "/full/path/to/yourmemory",
      "args": [],
      "env": { "YOURMEMORY_USER": "your_name" }
    }
  }
}

OpenCode

Add to ~/.config/opencode/config.json:

{
  "mcp": {
    "yourmemory": {
      "type": "local",
      "command": ["yourmemory"],
      "environment": { "YOURMEMORY_USER": "your_name" }
    }
  }
}

Then copy the memory workflow instructions:

cp sample_CLAUDE.md ~/.config/opencode/instructions.md

Restart OpenCode.

Any MCP-compatible client: YourMemory is a standard stdio MCP server. Works with Windsurf, Continue, Zed, and any client that supports MCP. Use the full path from yourmemory-path if the client doesn't inherit shell PATH.

Step 5 — Add memory instructions to your project

cp sample_CLAUDE.md CLAUDE.md

Edit CLAUDE.md — replace YOUR_NAME and YOUR_USER_ID. Claude now follows the recall → store → update workflow automatically on every task.

MCP Tools

Three tools. Called by Claude automatically once CLAUDE.md is in place.

Tool	When	What it does
`recall_memory(query)`	Start of every task	Surfaces relevant memories ranked by similarity × strength
`store_memory(content, importance)`	After learning something new	Embeds and stores with biological decay
`update_memory(id, new_content)`	When a memory is outdated	Re-embeds and replaces

# Example session
store_memory("Sachit prefers tabs over spaces in Python", importance=0.9, category="fact")

# Next session — without being told again:
recall_memory("Python formatting")
# → {"content": "Sachit prefers tabs over spaces in Python", "strength": 0.87}

Categories control how fast memories fade

Category	Survives without recall	Use case
`strategy`	~38 days	Successful patterns
`fact`	~24 days	Preferences, identity
`assumption`	~19 days	Inferred context
`failure`	~11 days	Errors, environment-specific issues

How It Works

Ebbinghaus Forgetting Curve

Memory strength decays exponentially — but importance and recall frequency slow that decay:

effective_λ = base_λ × (1 - importance × 0.8)
strength    = importance × e^(−effective_λ × days) × (1 + recall_count × 0.2)
score       = cosine_similarity × strength

Memories recalled frequently resist decay. Memories below strength 0.05 are pruned automatically every 24 hours.

Hybrid Retrieval: Vector + Graph

Retrieval runs in two rounds to surface related context that vocabulary-based search misses:

Round 1 — Vector search: cosine similarity against all memories, returns top-k above threshold.

Round 2 — Graph expansion: BFS traversal from Round 1 seeds surfaces memories that share context but not vocabulary — connected via semantic edges (cosine similarity ≥ 0.4).

recall("Python backend")
  Round 1 → [1] Python/MongoDB    (sim=0.61)
             [2] DuckDB/spaCy     (sim=0.19)
  Round 2 → [5] Docker/Kubernetes (sim=0.29 — below cut-off, surfaced via graph)

Chain-aware pruning: A decayed memory is kept alive if any graph neighbour is above the prune threshold. Related memories age together.

Multi-Agent Memory

Multiple agents can share the same YourMemory instance — each with isolated private memories and controlled access to shared context.

from src.services.api_keys import register_agent

result = register_agent(
    agent_id="coding-agent",
    user_id="sachit",
    can_read=["shared", "private"],
    can_write=["shared", "private"],
)
# → result["api_key"]  — ym_xxxx, shown once only

Pass api_key to any MCP call to authenticate as an agent:

store_memory(content="Staging uses self-signed cert — skip SSL verify",
             importance=0.7, category="failure",
             api_key="ym_xxxx", visibility="private")

recall_memory(query="staging SSL", api_key="ym_xxxx")
# → returns shared memories + this agent's private memories
# → other agents see shared only

Stack

Component	Role
DuckDB	Default vector DB — zero setup, native cosine similarity
NetworkX	Default graph backend — persists at `~/.yourmemory/graph.pkl`
sentence-transformers	Local embeddings (`all-mpnet-base-v2`, 768 dims)
spaCy	Local NLP for deduplication and SVO triple extraction
APScheduler	Automatic 24h decay job
PostgreSQL + pgvector	Optional — for teams or large datasets
Neo4j	Optional graph backend — `pip install 'yourmemory[neo4j]'`

PostgreSQL setup (optional)

pip install yourmemory[postgres]

Create a .env file:

DATABASE_URL=postgresql://YOUR_USER@localhost:5432/yourmemory

macOS

brew install postgresql@16 pgvector && brew services start postgresql@16
createdb yourmemory

Ubuntu / Debian

sudo apt install postgresql postgresql-contrib postgresql-16-pgvector
createdb yourmemory

Architecture

Claude / Cline / Cursor / Any MCP client
    │
    ├── recall_memory(query, api_key?)
    │       └── embed → vector similarity (Round 1)
    │               → graph BFS expansion  (Round 2)
    │               → score = sim × strength → top-k
    │               → recall propagation → boost neighbours
    │
    ├── store_memory(content, importance, category?, visibility?, api_key?)
    │       └── question? → reject
    │               contradiction check → update if conflict
    │               embed() → INSERT → index_memory() → graph node + edges
    │
    └── update_memory(id, new_content, importance)
            └── embed(new_content) → UPDATE → refresh graph node

  Vector DB (Round 1)             Graph DB (Round 2)
  DuckDB (default)                NetworkX (default)
    memories.duckdb                 graph.pkl
    ├── embedding FLOAT[768]        ├── nodes: memory_id, strength
    ├── importance FLOAT            └── edges: sim × verb_weight ≥ 0.4
    ├── recall_count INTEGER
    ├── visibility VARCHAR        Neo4j (opt-in)
    └── agent_id VARCHAR            └── bolt://localhost:7687

Dataset Reference

Benchmarks use the LoCoMo dataset by Snap Research.

Maharana et al. (2024). LoCoMo: Long Context Multimodal Benchmark for Dialogue. Snap Research.

License

Free for: personal use, education, academic research, open-source projects.
Not permitted: commercial use without a separate written agreement.

Commercial licensing: mishrasachit1@gmail.com

↑ top

21.The Prompt API

Sourcehttps://developer.chrome.com/docs/ai/prompt-api

SiteChrome for Developers

Submittergslin (Hacker News)

Submitted2026-04-27 02:18 UTC (Hacker News)

HN activity3 points · 0 comments

Length2.6K words (~12 min read)

Languageen

Discover the infinite possibilities of the Prompt API.

Thomas Steiner

Alexandra Klepper

Published: May 20, 2025, Last updated: September 21, 2025

Explainer	Web	Extensions	Chrome Status	Intent
GitHub	Origin trial	Chrome 138	View	Intent to Experiment
GitHub	Origin trial for sampling parameters	Chrome 148	View	Intent to Experiment

With the Prompt API, you can send natural language requests to Gemini Nano in the browser.

There are many ways you can use the Prompt API. For example, you could build:

AI-powered search: Answer questions based on the content of a web page.
Personalized news feeds: Build a feed that dynamically classifies articles with categories and allow for users to filter for that content.
Custom content filters. Analyze news articles and automatically blur or hide content based on user-defined topics.
Calendar event creation. Develop a Chrome Extension that automatically extracts event details from web pages, so users can create calendar entries in just a few steps.
Seamless contact extraction. Build an extension that extracts contact information from websites, making it easier for users to contact a business or add details to their list of contacts.

These are just a few possibilities, and we're excited to see what you create.

Review the hardware requirements

The following requirements exist for developers and the users who operate features using these APIs in Chrome. Other browsers may have different operating requirements.

The Language Detector and Translator APIs work in Chrome on desktop. These APIs do not work on mobile devices.

The Prompt API, Summarizer API, Writer API, Rewriter API, and Proofreader API work in Chrome when the following conditions are met:

Operating system: Windows 10 or 11; macOS 13+ (Ventura and onwards); Linux; or ChromeOS (from Platform 16389.0.0 and onwards) on Chromebook Plus devices. Chrome for Android, iOS, and ChromeOS on non-Chromebook Plus devices are not yet supported by the APIs which use Gemini Nano.
Storage: At least 22 GB of free space on the volume that contains your Chrome profile.
GPU or CPU: Built-in models can run with GPU or CPU.
- GPU: Strictly more than 4 GB of VRAM.
- CPU: 16 GB of RAM or more and 4 CPU cores or more.
- Note: The Prompt API with audio input requires a GPU.
Network: Unlimited data or an unmetered connection.

Gemini Nano's exact size may vary as the browser updates the model. To determine the current size, visit chrome://on-device-internals.

The Prompt API uses the Gemini Nano model in Chrome. While the API is built into Chrome, the model is downloaded separately the first time an origin uses the API. Before you use this API, acknowledge Google's Generative AI Prohibited Uses Policy.

To determine if the model is ready to use, call LanguageModel.availability().

const availability = await LanguageModel.availability({
  // The same options in `prompt()` or `promptStreaming()`
});

To trigger the download and instantiate the language model, check for user activation. Then, call the create() function.

const session = await LanguageModel.create({
  monitor(m) {
    m.addEventListener('downloadprogress', (e) => {
      console.log(`Downloaded ${e.loaded * 100}%`);
    });
  },
});

If the response to availability() was downloading, listen for download progress and inform the user, as the download may take time.

Use on localhost

All of the built-in AI APIs are available on localhost in Chrome. Set the following flags to Enabled:

chrome://flags/#optimization-guide-on-device-model
chrome://flags/#prompt-api-for-gemini-nano-multimodal-input

Then click Relaunch or restart Chrome. If you encounter errors, troubleshoot localhost.

Model parameters

The params() function informs you of the language model's parameters. The object has the following fields:

defaultTopK: The default top-K value.
maxTopK: The maximum top-K value.
defaultTemperature: The default temperature.
maxTemperature: The maximum temperature.

// Only available when using the Prompt API for Chrome Extensions.
await LanguageModel.params();
// {defaultTopK: 3, maxTopK: 128, defaultTemperature: 1, maxTemperature: 2}

Create a session

Once the Prompt API can run, you create a session with the create() function.

const session = await LanguageModel.create();

Create a session with the Prompt API for Chrome Extensions

When you use the Prompt API for Chrome Extensions, each session can be customized with topK and temperature using an optional options object. The default values for these parameters are returned from LanguageModel.params().

// Only available when using the Prompt API for Chrome Extensions.
const params = await LanguageModel.params();
// Initializing a new session must either specify both `topK` and
// `temperature` or neither of them.
// Only available when using the Prompt API for Chrome Extensions.
const slightlyHighTemperatureSession = await LanguageModel.create({
  temperature: Math.max(params.defaultTemperature * 1.2, 2.0),
  topK: params.defaultTopK,
});

The create() function's optional options object also takes a signal field, which lets you pass an AbortSignal to destroy the session.

const controller = new AbortController();
stopButton.onclick = () => controller.abort();

const session = await LanguageModel.create({
  signal: controller.signal,
});

Add context with initial prompts

With initial prompts, you can provide the language model with context about previous interactions, for example, to allow the user to resume a stored session after a browser restart.

const session = await LanguageModel.create({
  initialPrompts: [
    { role: 'system', content: 'You are a helpful and friendly assistant.' },
    { role: 'user', content: 'What is the capital of Italy?' },
    { role: 'assistant', content: 'The capital of Italy is Rome.' },
    { role: 'user', content: 'What language is spoken there?' },
    {
      role: 'assistant',
      content: 'The official language of Italy is Italian. [...]',
    },
  ],
});

Constrain responses with a prefix

You can add an "assistant" role, in addition to previous roles, to elaborate on the model's previous responses. For example:

const followup = await session.prompt([
  {
    role: "user",
    content: "I'm nervous about my presentation tomorrow"
  },
  {
    role: "assistant",
    content: "Presentations are tough!"
  }
]);

In some cases, instead of requesting a new response, you may want to prefill part of the "assistant"-role response message. This can be helpful to guide the language model to use a specific response format. To do this, add prefix: true to the trailing "assistant"-role message. For example:

const characterSheet = await session.prompt([
  {
    role: 'user',
    content: 'Create a TOML character sheet for a gnome barbarian',
  },
  {
    role: 'assistant',
    content: '```toml\n',
    prefix: true,
  },
]);

Add expected input and output

The Prompt API has multimodal capabilities and supports multiple languages. Set the expectedInputs and expectedOutputs modalities and languages when creating your session.

type: Modality expected.
- For expectedInputs, this can be text, image, or audio.
- For expectedOutputs, the Prompt API allows text only.
languages: Array to set the language or languages expected. The Prompt API accepts "en", "ja", and "es". Support for additional languages is in development.
- For expectedInputs, set the system prompt language and one or more expected user prompt languages.
- Set one or more expectedOutputs languages.

const session = await LanguageModel.create({
  expectedInputs: [
    { type: "text", languages: ["en" /* system prompt */, "ja" /* user prompt */] }
  ],
  expectedOutputs: [
    { type: "text", languages: ["ja"] }
  ]
});

You may receive a "NotSupportedError" DOMException if the model encounters an unsupported input or output.

Multimodal capabilities

With these capabilities, you could:

Allow users to transcribe audio messages sent in a chat application.
Describe an image uploaded to your website for use in a caption or alt text.

Take a look at the Mediarecorder Audio Prompt demo for using the Prompt API with audio input and the Canvas Image Prompt demo for using the Prompt API with image input.

The Prompt API supports the following input types:

Audio:
Visual:
- HTMLImageElement
- SVGImageElement
- HTMLVideoElement (uses the video frame at the current video position)
- HTMLCanvasElement
- ImageBitmap
- OffscreenCanvas
- VideoFrame
- Blob
- ImageData

This snippet shows a multimodal session that first processes two visuals (one image Blob and one HTMLCanvasElement) and has the AI compare them, and that second lets the user respond with an audio recording (as an AudioBuffer).

const session = await LanguageModel.create({
  expectedInputs: [
    { type: "text", languages: ["en"] },
    { type: "audio" },
    { type: "image" },
  ],
  expectedOutputs: [{ type: "text", languages: ["en"] }],
});

const referenceImage = await (await fetch("reference-image.jpeg")).blob();
const userDrawnImage = document.querySelector("canvas");

const response1 = await session.prompt([
  {
    role: "user",
    content: [
      {
        type: "text",
        value:
          "Give a helpful artistic critique of how well the second image matches the first:",
      },
      { type: "image", value: referenceImage },
      { type: "image", value: userDrawnImage },
    ],
  },
]);
console.log(response1);

const audioBuffer = await captureMicrophoneInput({ seconds: 10 });

const response2 = await session.prompt([
  {
    role: "user",
    content: [
      { type: "text", value: "My response to your critique:" },
      { type: "audio", value: audioBuffer },
    ],
  },
]);
console.log(response2);

Append messages

Inference may take some time, especially when prompting with multimodal inputs. It can be useful to send predetermined prompts in advance to populate the session, so the model can get a head start on processing.

While initialPrompts are useful at session creation, the append() method can be used in addition to the prompt() or promptStreaming() methods, to give additional additional contextual prompts after the session is created.

For example:

const session = await LanguageModel.create({
  initialPrompts: [
    {
      role: 'system',
      content:
        'You are a skilled analyst who correlates patterns across multiple images.',
    },
  ],
  expectedInputs: [{ type: 'image' }],
});

fileUpload.onchange = async () => {
  await session.append([
    {
      role: 'user',
      content: [
        {
          type: 'text',
          value: `Here's one image. Notes: ${fileNotesInput.value}`,
        },
        { type: 'image', value: fileUpload.files[0] },
      ],
    },
  ]);
};

analyzeButton.onclick = async (e) => {
  analysisResult.textContent = await session.prompt(userQuestionInput.value);
};

The promise returned by append() fulfills once the prompt has been validated, processed, and appended to the session. The promise is rejected if the prompt cannot be appended.

Pass a JSON Schema

Add the responseConstraint field to prompt() or promptStreaming() method to pass a JSON Schema as the value. You can then use structured output with the Prompt API.

In the following example, the JSON Schema makes sure the model responds with true or false to classify if a given message is about pottery.

const session = await LanguageModel.create();

const schema = {
  "type": "boolean"
};

const post = "Mugs and ramen bowls, both a bit smaller than intended, but that
happens with reclaim. Glaze crawled the first time around, but pretty happy
with it after refiring.";

const result = await session.prompt(
  `Is this post about pottery?\n\n${post}`,
  {
    responseConstraint: schema,
  }
);
console.log(JSON.parse(result));
// true

Your implementation can include a JSON Schema or regular expression as part of the message sent to the model. This uses some of the context window. You can measure how much of the context window it will use by passing the responseConstraint option to session.measureContextUsage().

You can avoid this behavior with the omitResponseConstraintInput option. If you do so, we recommend that you include some guidance in the prompt:

const result = await session.prompt(`
  Summarize this feedback into a rating between 0-5. Only output a JSON
  object { rating }, with a single property whose value is a number:
  The food was delicious, service was excellent, will recommend.
`, { responseConstraint: schema, omitResponseConstraintInput: true });

Prompt the model

You can prompt the model with either the prompt() or the promptStreaming() functions.

Request-based output

If you expect a short result, you can use the prompt() function that returns the response once it's available.

// Start by checking if it's possible to create a session based on the
// availability of the model, and the characteristics of the device.
const available = await LanguageModel.availability({
  expectedInputs: [{type: 'text', languages: ['en']}],
  expectedOutputs: [{type: 'text', languages: ['en']}],
});

if (available !== 'unavailable') {
  const session = await LanguageModel.create();

  // Prompt the model and wait for the whole result to come back.
  const result = await session.prompt('Write me a poem!');
  console.log(result);
}

Streamed output

If you expect a longer response, you should use the promptStreaming() function which lets you show partial results as they come in from the model. The promptStreaming() function returns a ReadableStream.

const available = await LanguageModel.availability({
  expectedInputs: [{type: 'text', languages: ['en']}],
  expectedOutputs: [{type: 'text', languages: ['en']}],
});
if (available !== 'unavailable') {
  const session = await LanguageModel.create();

  // Prompt the model and stream the result:
  const stream = session.promptStreaming('Write me an extra-long poem!');
  for await (const chunk of stream) {
    console.log(chunk);
  }
}

Stop prompting

Both prompt() and promptStreaming() accept an optional second parameter with a signal field, which lets you stop running prompts.

const controller = new AbortController();
stopButton.onclick = () => controller.abort();

const result = await session.prompt('Write me a poem!', {
  signal: controller.signal,
});

Session management

Each session keeps track of the context of the conversation. Previous interactions are taken into account for future interactions until the session's context window is full.

Each session has a maximum number of tokens it can process. Check your progress towards this limit with the following:

console.log(`${session.contextUsage}/${session.contextWindow}`);

It's possible to send a prompt that causes the context window to overflow. In such cases, the initial portions of the conversation with the language model will be removed, one prompt and response pair at a time, until enough tokens are available to process the new prompt. The exception is the system prompt, which is never removed.

Such overflows can be detected by listening for the contextoverflow event on the session:

session.addEventListener("contextoverflow", () => {
  console.log("We've gone past the context window, and some inputs will be dropped!");
});

If it's not possible to remove enough tokens from the conversation history to process the new prompt, then the prompt() or promptStreaming() call will fail with a QuotaExceededError exception and nothing will be removed. The QuotaExceededError has the following properties:

requested: how many tokens the input consists of
contextWindow: how many tokens were available

Learn more about session management.

Clone a session

To preserve resources, you can copy an existing session with the clone() function. This creates a fork of the conversation, where the context and initial prompt are preserved.

The clone() function takes an optional options object with a signal field, which lets you pass an AbortSignal to destroy the cloned session.

const controller = new AbortController();
stopButton.onclick = () => controller.abort();

const clonedSession = await session.clone({
  signal: controller.signal,
});

Terminate a session

Call destroy() to free resources if you no longer need a session. When a session is destroyed, it can no longer be used, and any ongoing execution is aborted. You may want to keep the session around if you intend to prompt the model often since creating a session can take some time.

await session.prompt(
  "You are a friendly, helpful assistant specialized in clothing choices."
);

session.destroy();

// The promise is rejected with an error explaining that
// the session is destroyed.
await session.prompt(
  "What should I wear today? It is sunny, and I am choosing between a t-shirt
  and a polo."
);

Demos

We've built multiple demos to explore the many use cases for the Prompt API. The following demos are web applications:

To test the Prompt API in Chrome Extensions, install the demo extension. The extension source code is available on GitHub.

Performance strategy

The Prompt API for the web is still being developed. While we build this API, refer to our best practices on session management for optimal performance.

Permission Policy, iframes, and Web Workers

By default, the Prompt API is only available to top-level windows and to their same-origin iframes. Access to the API can be delegated to cross-origin iframes using the Permission Policy allow="" attribute:

<!--
  The hosting site at https://main.example.com can grant a cross-origin iframe
  at https://cross-origin.example.com/ access to the Prompt API by
  setting the `allow="language-model"` attribute.
-->
<iframe src="https://cross-origin.example.com/" allow="language-model"></iframe>

The Prompt API isn't available in Web Workers for now, due to the complexity of establishing a responsible document for each worker in order to check the permissions policy status.

Your input can directly impact how we build and implement future versions of this API and all built-in AI APIs.

For feedback on Chrome's implementation, file a bug report or a feature request.
Share your feedback on the API shape by commenting on an existing Issue or by opening a new one in the Prompt API GitHub repository.
Join the early preview program.

↑ top

22.Quirks of Human Anatomy

Sourcehttps://www.sdbonline.org/sites/fly/lewheldquirk/figlegq6.htm

Sitesdbonline.org

Submittergurjeet (Hacker News)

Submitted2026-04-25 04:31 UTC (Hacker News)

HN activity97 points · 61 comments

Length3.7K words (~17 min read)

figure legends 1 * 2 * 3 * 4 * 5 * 7 * A

Figure Legends 6 6.1 * 6.1R * 6.2 * 6.2R * 6.3 * 6.3R

figure legends 1 * 2 * 3 * 4 * 5 * 7 * A

N.B.: An 'R' suffix denotes reflections (commentaries, annotations, and further references) pertaining to the numbered legend that precedes it.

[Select any image to enlarge; use back button to return] Fig. 6.1

Rogues’ gallery of human anatomical flaws, ranging from the merely silly (a, f, j), to the patently stupid (b, c, d, e, i, k), to the potentially lethal (g, h, l, m). See Table 6.1 for further information.

a. Muscles (dark bands) that permit ear wiggling, a dubious talent that has no apparent function aside from entertaining young children: (1) superior, (2) anterior, and (3) posterior auricularis [1004,1488].

b. Eyeball with optic nerve. The retina is considered to be inside out because its photoreceptors (p) lie behind its ganglion cells (g), an orientation we may have gotten stuck with early in evolution (cf. Fig. 6.2). Since then, the photoreceptors have come to rely on the (outer) layer of pigment cells (not shown) [2098], so it may now be impossible to reverse this order. Thickness of retinal layers is exaggerated.

c. Blind spot where ganglion cell axons exit the eye. This suboptimal escape route is due to the inverted geometry shown in b and explained in Fig. 6.2. The hole in the retina is sizeable (~9 full moons in the sky), but we don’t notice it because (1) the blind spots of the two eyes are in separate parts of the visual field and (2) our brain automatically fills in gaps in our visual field by interpolation [2117].

d. The common crowding of human teeth—especially "wisdom" teeth, which erupt last—is traceable to the evolutionary shortening of our jaw. See text.

e. Human embryo at 31 days postfertilization. Branchial arches (in dashed box) are a vestige from our fish ancestors. They now serve only as precursors for structures which could develop without such clefts (cf. Ch. 4).

f. Nipples are useless in human males (cf. Ch. 5).

g. Choking (upper left) is caused by malfunctioning of the epiglottis. The epiglottis is a cartilaginous flap (e) that reflexively covers the trachea (T) (lower box) when we swallow so that food or drink goes to the esophagus (E). The silhouette is a sagittal section with cavities as blank areas [612,889,2502,2737]. The line between E and T represents their adjacent lumenal walls [685]. One way to prevent choking (upper right) would be to have separate pipes for breathing and swallowing. In fact, that configuration already exists in babies, who can suckle and nose breathe at the same time: their larynx is so high that the epiglottis can touch the base of the nasal cavity at the midline [1547], with milk flowing around the juncture (see ref. [1469] for a clear diagram). The same is true for other mammals, who rest their epiglottis on the back edge of the soft palate [889,1546,1547]. The problem with having air bypass the mouth entirely, of course, is that it precludes talking [1931], although toothed whales still manage to "sing" to each other via an asymmetric specialization of their nasal passages [1718,2412].

h. Childbirth involves a tight fit between the baby’s head and the mother’s pelvic opening (cf. Fig. 7.1) [525]. Indeed, the passageway is so confining that the head must turn as shown (long axis transverse) to enter the birth canal, but then must rotate 90 degrees (long axis sagittal) to exit and let the shoulders enter [10,2220]. The process would be safer (and less painful) if the path mimicked a caesarean section, with the baby exiting through the navel. Drawn on the baby’s head are the fontanels (open sutures) that allow the skull to deform during birth [500,685].

i. Back pain often occurs during pregnancy because of strains in the lumbar region (cf. Fig. 5.1). More serious repercussions of the spine’s sinusoidal shape include herniated disks [10].

j. The vasa deferentia are tubes that conduct sperm from the testes to the urethra. They are much longer than they need to be in humans because of the circuitous route that they took evolutionarily [2384].

k. Routing a tube like the urethra through a solid organ like the prostate runs the risk of strangulation if the organ overgrows—a condition all too common in older men.

l. In May-Thurner Syndrome, blood clots typically arise in the left—but not right—leg [196,320,1683]. This peculiar ischemic asymmetry is due to a normal quirk. The abdominal aorta and the inferior vena cava travel side by side to the groin. Just after they split into the two common iliac arteries and the two common iliac veins (one artery and one vein routed to each leg), the right fork of the arteries passes over the left fork of the veins. This overlap compresses the left vein between the spine and the overlying artery, resulting in occasional thrombosis. There is no such crossover on the right side.

m. After ovulation, the egg leaves the ovary and enters the flower-shaped opening of the adjacent oviduct. Because fertilization occurs in the oviduct, the embryo can occasionally stick to the oviduct wall before it ever gets to the uterus, resulting in a life-threatening "ectopic" or "tubal" pregnancy [91]. It would have been safer if evolution had kept the length of the oviduct to a bare minimum or done away with it altogether and connected the ovary directly to the uterus.

Fig. 6.1R The full-length man and woman are redrawn from a plaque aboard Pioneer 10—the first human-made object ever to leave our solar system [2253]. Launched in 1972, this human-sized spacecraft whipped past Jupiter (picking up speed), crossed Pluto’s orbit, and began coasting toward Aldebaran, which it should reach in ~2 million years [2125]. The plaque was meant as an interstellar "postcard." If any aliens stumble upon it, they will surely chuckle at our quirks (if they can laugh), although turnabout is fair play, and if we were to see their two sexes (or however many they have) au naturel, we might chortle right back at them.

The standing humans were drawn by Linda Salzman in collaboration with astronomers Carl Sagan (her then-husband) and Frank Drake [573].

Choking diagrams (g) are adapted from refs. [1866,2331], and childbirth sketches (h) are redrawn from refs. [1579,1786,1866], except for the profile of the gravid woman, which is based on photos of harpist Cheryl Gallagher from her musical compact disc Pregnant Pause.

g. As explained in the text, choking became a problem when our lungs arose as a branch of our eating tube. François Jacob ascribed such flaws to the trial-and-error nature of the evolutionary process [1265]: "To make a lung with a piece of esophagus sounds very much like tinkering."

There is also a deeper question here: why must people die when deprived of oxygen? Based on the prevalence of hibernation, estivation, and facultative anaerobiosis among animals [76,1675,2505], the answer is unclear [1171]. Sea turtles, for example, can hold their breath for at least three hours [1600]. Evolution, it would seem, could have given us the means to survive episodes of choking, drowning, or suffocation [1316,2229]. Why didn’t it? Presumably, the rarity of asphyxiation among primates (by drowning, etc.) reduced the marginal advantage that any salvational mutations might have had to a negligible level. The greatest irony about oxygen is that it used to be poisonous for living things before ~2 billion years ago [468,774], but eukaryotes evolved ways to detoxify and harness it to our metabolism so that it has become vital for us [12,152,2128]. This abrupt reversal of fortune was as dramatic as any Shakespearean plot.

h. While we are decrying the pain of childbirth, we should also pity the poor kiwi, whose egg comprises 25% of her body weight [915], for she, too, labors mightily [2549]. Some quirk of ratite history (as yet unknown [342,343,2071]) doomed those birds to that lunacy [969].

The depicted solution for humans (i.e., birth through the navel) may seem farfetched, but the spotted hyena has rerouted its birth canal just as drastically: hyena pups must make a ~180-degree turn when they reach the mother’s pelvis and then exit through her clitoris (pseudopenis) [803]! The problem with this path is that the diameter of the clitoral meatus is too narrow (~2.2 cm) to pass the pup’s head (~6.5 cm), so the pain must be excruciating, which makes one wonder why hyenas laugh at all!

Darwin made an interesting observation about our fontanels. He noted how lucky we are that these hinges were already in place (because of how skull bones grow) before they acquired the function of allowing our skull to deform during the tight squeeze of the birth process. In other words, mammalian sutures were "co-opted" as hominin hinges: The sutures in the skulls of young mammals have been advanced as a beautiful adaptation for aiding parturition ... but as sutures occur in the skulls of young birds and reptiles, which have only to escape from a broken egg, we may infer that this structure has arisen from the laws of growth, and has been taken advantage of in the parturition of the higher animals. [559] (p. 197)

Fig. 6.2

Hypothetical (bislagiatt) explanation for two flaws of human eyes: our backward retina and our blind spot. ("Bislagiatt" stands for "But it seemed like a good idea at the time.")

This diagram incorporates the inversion scheme of Balfour (1881) [2736], the axonal inferences of Polyak (1957) [2061] and Sarnat and Netsky (1981) [2278], and the opacity conjecture of Walls (1942) [539,2736], all of which are based on the tendency of chordate development to recapitulate its evolution [956]. Cartoons are transverse sections. Fish fins (which evolved later) are added merely to orient the reader (dorsal up, ventral down). Notwithstanding the symbols in the key, photoreceptors (p) and ganglion cells (g) are also neurons (n), albeit part of the peripheral versus central nervous system (CNS). Cell size is greatly exaggerated.

a. Our bilaterian (protochordate?) ancestor is thought to have had (1) a superficial nerve net [1458,1584,2579], (2) eyespots made of photoreceptors and ganglion cells [1461], and (3) criss-crossed wiring where ganglion cells projected axons to motor neurons (subset of n) on the opposite flank [2278]. The advantage of this contralateral wiring was that it allowed each eyespot to turn the body reflexively away from potential harm whenever it detected the shadow of a possible predator [2061,2278]. The problem with having neurons on the surface, of course, is that they could be easily damaged. To put it bluntly, it is not a good idea to "wear your brain on your sleeve"!

b. One solution to this Vulnerability Problem was for the CNS to move inside. The internalization trick that evolved in our chordate ancestors was "neurulation" [1038]—an involution of the dorsal surface to form a tube [1778,2312]—seen as a circle here in cross-section. Our neural tube ultimately forms our brain and spinal cord. During the CNS invagination of our protochordate forebear, the eyespots must have been dragged along and hence turned inside out. Ever since, our clade has, sadly, been saddled with a backward retina. Although this reversal turned out to be a mistake in hindsight, the eyespots were still just shadow detectors (not image detectors) at that time, so lower resolution was not yet a problem. This optical system would have worked quite well as long as the skin stayed transparent. However, any skin darkening would have severely reduced the effectiveness of the eyespots, and selective pressure would thereby have compelled (1) the eyespots to move nearer the surface and (2) the skin at the contact site to become transparent [2736].

c. To solve this Opacity Problem (presumably), vertebrates evolved a transparent lens where the prospective retina contacts the surface [1308]. During our development, the retinal outgrowths are initially shaped like balloons [2155] but then collapse, as if burst, into two-layered cups [18]. The outer layer facing the CNS becomes a pigmented epithelium. At some point in our evolutionary history, the lens must have blocked the escape route of ganglion cell axons. To solve this Entrapment Problem, the axons exited through the back wall, hence burdening us with a blind spot in perpetuity. Only a few axons are sketched here to indicate the routing.

Fig. 6.2R These stages suggest how easy it was for evolution to build an eye step by step [591,1478] because any upgrade in imaging would have conferred a significant advantage [1308,2384]. Darwin need not have fretted about the apparent challenge that the eye posed for his gradualistic theory [876]. For a schematic that nicely traces the layering of our retina all the way back to amphioxus, see ref. [59].

a. Surprisingly, all metazoans use virtually the same sensor in their photoreceptors [539,1478]. This gadget first evolved in prokaryotes [2449]. It relies on (1) the vitamin A derivative "retinal" (or a variant [932]) to absorb photons [1019,1723], (2) an "opsin" protein to monitor shape changes in retinal [176,1993,2288], and (3) a signal-relay chain of downstream effectors [787,1466,1971].

The evolutionary riddles posed by this universality are (1) how long did it take for prokaryotic genomes, by random mutation, to stumble upon an opsin-like protein that could cradle a chromophore and transduce its twitching [621,876,2044,2241]?, (2) why was 11-cis retinal recruited instead of some other photoactive agent [932,1847,2709]?, and (3) how did it get linked to a particular transduction pathway [763,2521]?

Believe it or not, the following creatures actually exist [876,2628]: (1) a jellyfish larva with bona fide photoreceptors but no nervous system to process their output [1896], (2) a fish that uses chlorophyll as a chromophore [675], and (3) a single-celled dinoflagellate that sports a humanoid "eye," complete with lens, photosensitive membranes, and a pigment shield [2013]! This odd menagerie prompted one author to propose that the eye evolved before the brain [876]!

b. In arthropods the nervous system arose evolutionarily from a superficial sheet of ectoderm [1779], as did ours [1463]. Strangely, however, arthropods are not all alike in how they solve the CNS Vulnerability Problem. Two classes, chelicerates and myiapods, undergo a variant version of neurulation from the ventral (vs. dorsal) surface [2494], but two other classes, insects and crustaceans, develop entirely differently [1778] (cf. other phyla [1678]). Instead of ectodermal origami, they "percolate" cells inside one by one to form their nerve cord [1104,1165,2495].

c. Fusions have been omitted to preserve the geometry of (1) neural invagination at the dorsal midline to form a tube and (2) lens invagination on the flank surface to form a sphere. The origami of eye development is different in other phyla [1478], which explains why, for example, the retinas of octopi and squid are not backward [1732]. Also omitted are (1) the cornea, which is relatively trivial histologically [2103]; (2) the neurons (horizontal, bipolar, and amacrine) that intervene between photoreceptors and ganglion cells [2597,2736]; and (3) the later stages of lens development wherein it abandons its hollow state to simulate a solid glass camera lens [548].

For decades the lens was thought to be induced in a one-step process by the optic cup [1692,2432], but we now know that the optic cup plays more of supporting role along with other actors [670,994,1487,2627]. The optic cups of one unfortunate frog must have taken a wrong turn because its eyes (lenses and all) wound up in the roof of its mouth [2211]: the frog could only see when it yawned!

Inexplicably, our iris is the only muscle in the body that comes from ectoderm instead of mesoderm [191,584]. (How on earth did the ectoderm get the password to unlock the "muscle vault" in the genome?) Thus, when you gaze into your lover’s eyes (the iris in particular), you’re actually seeing a colorful part of his or her brain [120].

Even stranger is the fact that the iris, which normally develops independently of the lens, can regenerate a lens in newts if the lens is artificially removed [1125,1273,2639]. This ability must be a spandrel of how eye parts are wired in the genome, but we have no clue about how it evolved [344,1056].

One final quirk worth noting is that the iris is intrinsically photosensitive—that is, it constricts on its own [2641]. How it does so is unclear, although we do know that it detects light using some sort of cryptochrome rather than an opsin pigment.

Fig. 6.3

Eye anatomy in humans (a-c) versus flies (d, e).

We have a simple eye; flies have a compound one. Human photoreceptors are mainly c-type (ciliary), whereas fly receptors are exclusively r-type (rhabdomeric). Recently (see text) it has been discovered that humans also have an r-type receptor in the guise of what looks like a garden-variety neuron (c).

a. It would seem that humans have only two eyes, but a third one (or what’s left of it) resides inside our head as the pineal gland (cf. Ch. 3).

b. Our eye works like a digital camera. Light is focused by the lens onto a screen (the retina), where it gets pixelated and converted into electrical signals. Those signals, in turn, are processed through various layers of the retina [2788] and sent to the brain via the optic nerve [2042,2098]. Acuity is greatest in the fovea. Axons exit nearby at a spot where we are blind. Abbreviation: CNS = central nervous system.

c. Schematic of retinal stratification (layers not to scale). There are five main types of neurons in the retina [1679] among ~55 types total [1352]. Only three are shown: RGCs (retinal ganglion cells), BPCs (bipolar cells), and photoreceptors. (Horizontal and amacrine cells are omitted.) A small subset (~3%) of RGCs are photosensitive (pRGC) [552,798]. They are r-type receptors, but they lack the stacked membrane system that we associate with bona fide photoreceptors [89]. The pigment layer absorbs photons to prevent back-scatter [1478]. Note that multiple rods converge onto single RGCs (the ratio is actually ~20 to 1), whereas every cone typically has its own dedicated RGC [2098]. This difference is one reason our cone-rich fovea has such high acuity.

d. Each of the fly’s compound eyes has ~750 simple eyes called "ommatidia" [1137]. Three separate simple eyes (white spots) are visible on top of the head: these "ocelli" mainly detect shadows [1776,1777].

e. The cuticular hexagonal dome serves as a cornea, whereas the pseudocone (pc) acts as a lens. The latter is a gelatinous cylinder secreted by four cone (C) cells (no relation to vertebrate cones). All but two of the eight photoreceptors (R cells) span the length of the ommatidium. As shown by cross-sections, R7 is above R8, an arrangement that enables R7 and R8 (which have orthogonal rhabdomere gratings) to detect polarized light [1090,2634]. Instead of using a pigment backscreen, flies wrap their photoreceptors in pigment tubes [1478] that are assembled like jigsaw puzzles from primary (P), secondary (S), and tertiary (T) pigment cells [1137]. Three of the six vertices of each hexagon are occupied by a bristle (B) instead.

f. Reciprocal human-fly chimeras in a woodcut (1793) by William Blake (1757-1827). This drawing was published by Claudio Stern [2472] from the private collection of Blake biographer Geoffrey Keynes (C. D. Stern, personal communication). It may have been an informal study. Blake did use the chrysalis with a baby’s face in the frontispiece (captioned "What is Man!") for his book Gates of Passion (1793), but he never published the man with a fly’s head [199] (cf. blakearchive.org).

Fig. 6.3R With the juxtaposition of these two faces we have, in essence, a family portrait. Humans and flies are basically twins separated at birth over half a billion years ago. In many ways we have come to look overtly different—for example, (1) we have an endoskeleton, whereas they have an exoskeleton; (2) we have four limbs, whereas they have six (not counting their wings); and (3) our eyes are simple, whereas theirs are compound [388,2923]. Yet despite these obvious disparities in gross anatomy, the same genetic gadgetry sculpts each of our respective structures during development [195,392], an abiding legacy of our common ancestry. Our bilaterian ancestor was considerably more primitive than either of these descendants [119,195,886,2663]. The fly portion of the diagram was adapted from ref. [1137].

c. Identities of the various retinal cell types are specified by a hierarchical combinatorial code [1084,1658,1819] that is time-dependent [616,1800,1920]. The regular spacing of the retinal ganglion cell (RGC) tiling pattern is achieved by ~80% apoptotic attrition [1550], homotypic lateral inhibition [705], and small-scale cell rearrangements [2145], without any need for homotypic contact [1568]. Nevertheless, homotypic contacts do play a role in RGC axon guidance [2042]. Instead of an absorptive pigment layer, cats and other nocturnal animals have a reflective tapetum that sends undetected photons back through the photoreceptors a second time [1478].

e. Despite what appears to be a solid-state dioptric apparatus, flies can adjust the amount of light reaching the photoreceptors by means of a myosin-operated "pupil" [2279]. For a primer on how fly eyes work, see refs. [1091,1815,1947]. (N.B. for fly aficionados: the upper cross-section is not the minimal "repeat unit" of standard textbook diagrams [2139]; instead, it shows the entire ring of cells encircling the photoreceptor core.)

f. William Blake’s fascination with the human-fly metaphor was not confined to this woodcut. In Songs of Innocence and of Experience (1794), he had a poem called "The Fly": "Little fly, / thy summer’s play / my thoughtless hand / has brushed away. // Am not I / a fly like thee? / Or art not thou / a man like me? // For I dance / and drink & sing, / till some blind hand / shall brush my wing. // If thought is life / and strength & breath, / and the want / of thought is death, // then am I / a happy fly, / if I live, / or if I die." Reciprocal human-fly chimeras were also central to the classic 1958 horror/science-fiction movie The Fly starring Vincent Price. In the 1986 remake with Jeff Goldblum, a human-fly hybrid was featured instead. (Don’t let your kids see either movie!)

top of page

Lewis I. Held, Jr. is Associate Professor in the Department of Biology at Texas Tech University.

↑ top

23.Running Bare-Metal Rust Alongside ESP-IDF on the ESP32-S3's Second Core

Sourcehttps://tingouw.com/blog/embedded/esp32/run_rust_on_app_core

Sitetingouw.com

SubmitterMrBuddyCasino (Hacker News)

Submitted2026-04-24 04:57 UTC (Hacker News)

HN activity38 points · 7 comments

Length5.9K words (~26 min read)

Languageen

Tingou Wu's Personal Website

Running Bare-Metal Rust Alongside ESP-IDF on the ESP32-S3's Second Core

Building a Hot-Swappable, Dual-Paradigm Environment on Espressif Silicon

I've been working with the RP2350 and no_std Rust for a while now, and I've really come to appreciate how Rust is designed — safe yet surprisingly straightforward. But my latest project needs Wi-Fi and BLE, and the RP2350 doesn't have wireless hardware built in. That meant switching to the ESP32-S3.

The ESP32-S3 is a great chip, but here's the catch: most Wi-Fi and Bluetooth functionality lives inside Espressif's ESP-IDF framework, which is a C-based SDK built on top of FreeRTOS. There are community Rust wrappers for parts of ESP-IDF, and Espressif themselves offer some Rust support, but both are a moving target — documentation is sparse compared to the mature C API, and there's always one or two critical features missing.

So I was stuck choosing between two imperfect options:

Go all-in on Rust. I'd get the language features and crates I love, but the no_std ecosystem on ESP32-S3 is still young. In a shipping product, I didn't want to risk hitting undefined behavior in an immature HAL at 2 AM.
Go all-in on ESP-IDF (C). I'd get battle-tested Wi-Fi and BLE stacks, but I'd be writing C for everything — including the business logic, audio processing, and data handling where Rust really shines.

Then I remembered something: the ESP32-S3 has two CPU cores.

There's an option buried in ESP-IDF's Kconfig called CONFIG_FREERTOS_UNICORE. When you enable it, FreeRTOS only runs on Core 0. Core 1 just... sits there, stalled, doing nothing. That got me thinking: what if I let ESP-IDF own Core 0 for all the Wi-Fi, BLE, and system tasks, and then wake up Core 1 to run my own bare-metal Rust code — completely outside the RTOS?

Both cores share the same memory space, so passing data between them should be straightforward (though it does require some unsafe Rust). And since Core 1 wouldn't be managed by FreeRTOS, there'd be no scheduler preempting my time-critical audio processing loop.

After convincing myself this wasn't completely insane, I got to work. Here's how it all fits together.

Background: Why Not Just Pin a FreeRTOS Task?

Before diving in, it's worth addressing the obvious question: ESP-IDF already provides xTaskCreatePinnedToCore, which can pin a task to a specific core:

// FreeRTOS provides this function to create a task on a specific core.
// You could pin a Rust function to Core 1 this way — but FreeRTOS
// would still manage the scheduler on that core.
BaseType_t xTaskCreatePinnedToCore(
    TaskFunction_t pvTaskCode,       // Function that implements the task
    const char * const pcName,       // Human-readable name for debugging
    const uint32_t usStackDepth,     // Stack size in words (not bytes)
    void * const pvParameters,       // Arbitrary pointer passed to the task
    UBaseType_t uxPriority,          // Priority (higher = more CPU time)
    TaskHandle_t * const pvCreatedTask, // Output: handle to the created task
    const BaseType_t xCoreID         // 0 = PRO core, 1 = APP core
);

You could absolutely compile your Rust code as a static library, export a pub extern "C" fn, and have FreeRTOS run it on Core 1 via this API. The ESP-IDF build system would statically link your Rust .a file into the firmware.

The problem is that FreeRTOS's scheduler is still running on Core 1. Your task can be preempted at any time by higher-priority tasks or system ticks. For a high-performance audio processing loop where every microsecond of jitter matters, that's a non-starter. I needed a guarantee that nothing would interrupt my code once it started running.

By disabling FreeRTOS on Core 1 entirely (via CONFIG_FREERTOS_UNICORE=y), we get an empty CPU that we can control directly at the hardware level — no scheduler, no context switching, no surprises.

Part 0: Statically Linked Rust on a Bare Core

Let's start with the simpler approach: building Rust as a static library, linking it into the ESP-IDF firmware at compile time, and manually booting Core 1 to run it. This is the foundation everything else builds on.

Step 1: Reserve Memory for the Bare-Metal Core (C Side)

When Core 1 wakes up outside of FreeRTOS, it doesn't get a dynamically allocated stack from the OS — because there is no OS on that core. We need to manually set aside a chunk of RAM that ESP-IDF's heap allocator won't touch.

ESP-IDF provides the SOC_RESERVE_MEMORY_REGION macro for exactly this. It tells the bootloader and memory allocator to treat a specific address range as off-limits:

#include "heap_memory_layout.h"

// Reserve 128KB of internal SRAM for Core 1's stack and data.
// The two hex values define the start and end addresses of the reserved region.
// 0x3FCE9710 - 0x3FCC9710 = 0x20000 = 131072 bytes = 128KB.
// "rust_app" is just a label for debugging — it shows up in boot logs.
SOC_RESERVE_MEMORY_REGION(0x3FCC9710, 0x3FCE9710, rust_app);

Why 128KB? It's a reasonable default for an embedded stack plus some working memory. You can adjust this range depending on how much RAM your Rust code needs — just make sure the addresses fall within the ESP32-S3's internal SRAM region and don't overlap with anything ESP-IDF is using.

Step 2: Wake Up Core 1 from the C Side

This is the main ESP-IDF application running on Core 0. Its job is to:

Set up the system (Wi-Fi, peripherals, etc. — or in our test case, just boot).
Wake up Core 1 and point it at our Rust code.
Go about its normal FreeRTOS business.

Instead of using xTaskCreatePinnedToCore, we're talking directly to the ESP32-S3's hardware registers to boot Core 1. We set a boot address, enable the clock, release the stall, and pulse the reset line. Core 1 wakes up completely independent of FreeRTOS.

To verify that everything is working, Core 0 will read a shared counter variable (RUST_CORE1_COUNTER) that the Rust code on Core 1 increments in a loop.

#include <stdio.h>
#include <stdint.h>
#include "esp_log.h"
#include "esp_cpu.h"
#include "heap_memory_layout.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "soc/system_reg.h"
#include "soc/soc.h"

static const char *TAG = "rust_app_core";

// Reserve memory so ESP-IDF's heap allocator doesn't use it.
// (Same macro from Step 1 — it must appear in a compiled C file.)
SOC_RESERVE_MEMORY_REGION(0x3FCC9710, 0x3FCE9710, rust_app);

// ---- External symbols ----
// These are defined in other files and resolved at link time:
//   rust_app_core_entry  — the Rust function (from our .a library)
//   app_core_trampoline  — tiny assembly stub that sets the stack pointer
//   _rust_stack_top      — address from our linker script (top of reserved 128KB)
//   ets_set_appcpu_boot_addr — ROM function that tells Core 1 where to start
extern void rust_app_core_entry(void);
extern void ets_set_appcpu_boot_addr(uint32_t);
extern uint32_t _rust_stack_top;
extern void app_core_trampoline(void);

/*
 * Boot Core 1 by directly manipulating ESP32-S3 hardware registers.
 * This bypasses FreeRTOS entirely — Core 1 will run our code with
 * no scheduler, no interrupts (unless we set them up), and no OS.
 */
static void start_rust_on_app_core(void)
{
    ESP_LOGI(TAG, "Starting Rust on Core 1...");
    ESP_LOGI(TAG, "  Stack: 0x3FCC9710 - 0x3FCE9710 (128K)");

    /* 1. Tell Core 1 where to begin executing after it resets.
     *    This ROM function writes the address into a register that the
     *    CPU reads on boot. We point it at our assembly trampoline. */
    ets_set_appcpu_boot_addr((uint32_t)app_core_trampoline);

    /* 2. Hardware-level wake-up sequence for Core 1.
     *    These register writes control the clock, stall, and reset
     *    signals for the second CPU core. */

    // Enable the clock gate — Core 1 can't run without a clock signal.
    SET_PERI_REG_MASK(SYSTEM_CORE_1_CONTROL_0_REG,
                      SYSTEM_CONTROL_CORE_1_CLKGATE_EN);

    // Clear the RUNSTALL bit. While stalled, the core is frozen mid-instruction.
    CLEAR_PERI_REG_MASK(SYSTEM_CORE_1_CONTROL_0_REG,
                        SYSTEM_CONTROL_CORE_1_RUNSTALL);

    // Pulse the reset line: assert it, then immediately de-assert.
    // This causes Core 1 to reboot and jump to the address we set above.
    SET_PERI_REG_MASK(SYSTEM_CORE_1_CONTROL_0_REG,
                      SYSTEM_CONTROL_CORE_1_RESETING);
    CLEAR_PERI_REG_MASK(SYSTEM_CORE_1_CONTROL_0_REG,
                        SYSTEM_CONTROL_CORE_1_RESETING);

    ESP_LOGI(TAG, "Core 1 released");
}

// This counter lives in the Rust code. Because it's an AtomicU32 with
// #[no_mangle], the C linker can find it by this exact name.
extern volatile uint32_t RUST_CORE1_COUNTER;

void app_main(void)
{
    ESP_LOGI(TAG, "Core 0: Starting IDF app");

    // Wake up Core 1 and start the Rust code
    start_rust_on_app_core();

    // Core 0 continues running FreeRTOS as normal.
    // Here we just monitor the shared counter to prove both cores are alive.
    while (1)
    {
        ESP_LOGI(TAG, "Rust Core 1 counter: %lu", (unsigned long)RUST_CORE1_COUNTER);
        vTaskDelay(pdMS_TO_TICKS(1000)); // Print once per second
    }
}

Step 3: The Assembly Trampoline

When a CPU core wakes up from reset, it doesn't have a stack yet. And without a stack, it can't call any C or Rust functions — function calls need somewhere to store return addresses and local variables.

The ESP32-S3 uses the Xtensa instruction set architecture, where register a1 serves as the stack pointer. Our tiny assembly stub loads the address of our reserved memory into a1, then jumps into Rust. That's all it does — just two instructions.

We place this code in the .iram1 section, which maps to Internal RAM. This is important because when a core first boots, it may not have flash caching set up yet. Code in IRAM is always accessible.

app_core_trampoline.S

/*
 * app_core_trampoline.S
 *
 * Minimal startup code for Core 1. Sets the stack pointer to our
 * reserved memory region, then jumps to the Rust entry point.
 *
 * Placed in IRAM (.iram1) so it's available immediately after core
 * reset, before flash cache is configured.
 */

    .section .iram1, "ax"       /* "ax" = allocatable + executable */
    .global  app_core_trampoline
    .type    app_core_trampoline, @function
    .align   4                  /* Xtensa requires 4-byte alignment */

app_core_trampoline:
    /* Load the top of our 128KB reserved stack into register a1.
     * Stacks grow downward on Xtensa, so "top" means the highest
     * address — the stack will grow toward lower addresses from here. */
    movi  a1, _rust_stack_top

    /* Jump to the Rust entry function. call0 is a "windowless" call
     * (no register window rotation), suitable for bare-metal startup.
     * This function never returns — it contains an infinite loop. */
    call0 rust_app_core_entry

    .size app_core_trampoline, . - app_core_trampoline

Step 4: Gluing It Together with CMake and a Linker Script

ESP-IDF uses CMake as its build system. We need to tell it about three extra things: our assembly file, our pre-compiled Rust library, and a custom linker script that defines where _rust_stack_top lives.

CMakeLists.txt

# Register our C source and the assembly trampoline as component sources.
# ESP-IDF builds each directory under "main/" as a "component."
idf_component_register(
    SRCS "main.c" "app_core_trampoline.S"
    INCLUDE_DIRS "."
)

# Tell the linker about our pre-compiled Rust static library.
# This .a file is produced by `cargo build` and copied into main/lib/.
add_prebuilt_library(rust_app "${CMAKE_CURRENT_SOURCE_DIR}/lib/libesp_rust_app.a")

# Link the Rust library into our component. INTERFACE means anything
# that depends on this component also gets the Rust symbols.
target_link_libraries(${COMPONENT_LIB} INTERFACE rust_app)

# Inject our custom linker script. This is how the assembly trampoline
# knows the numeric value of _rust_stack_top.
target_link_options(${COMPONENT_LIB}
    INTERFACE "-T${CMAKE_CURRENT_SOURCE_DIR}/rust_stack.ld")

rust_stack.ld

/*
 * Custom linker script fragment.
 *
 * Defines _rust_stack_top as the END of our reserved 128KB block.
 * Stacks grow downward, so the "top" is the highest address.
 * The assembly trampoline loads this value into register a1.
 */
_rust_stack_top = 0x3FCE9710;

The connection here is: the linker script provides a symbol (_rust_stack_top) → the assembly trampoline references that symbol to set the stack pointer → the C code triggers the hardware boot sequence that starts Core 1 at the trampoline.

Step 5: The Bare-Metal Rust Application

Finally, here's the code that actually runs on Core 1. It's entirely no_std — there's no operating system, no allocator, no standard library. Just raw hardware access.

The key technique here is AtomicU32. Atomics are special CPU instructions that read and write memory in a way that's safe even when two cores access the same address simultaneously. By using AtomicU32 for our shared counter, we avoid race conditions without needing a mutex (which wouldn't work easily across the OS/bare-metal boundary anyway).

The spin_loop hint tells the CPU "I'm intentionally busy-waiting" — on some architectures this reduces power consumption or yields resources to other hardware threads. Here it also serves as a simple delay so the counter doesn't overflow instantly.

// no_std: we're running without the Rust standard library.
// There's no OS below us — no heap, no threads, no println!.
#![no_std]

// no_main: we don't use Rust's normal main() entry point.
// Instead, Core 1 enters via rust_app_core_entry(), called from assembly.
#![no_main]

use core::panic::PanicInfo;
use core::sync::atomic::{AtomicU32, Ordering};

// Every no_std binary needs a panic handler. When something goes wrong
// (array out of bounds, unwrap on None, etc.), this function is called.
// On a bare-metal core with no debugger attached, there's not much we
// can do — so we just loop forever. A production system might toggle
// an LED or write to a shared error flag that Core 0 can read.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

// The shared counter. Both cores can see this variable because it lives
// in the same memory space.
//
// #[unsafe(no_mangle)] prevents Rust from renaming this symbol during
// compilation. Without it, Rust would generate something like
// "_ZN12esp_rust_app18RUST_CORE1_COUNTER17h..." — and the C code
// wouldn't be able to find it by name.
//
// AtomicU32 ensures that reads and writes are atomic at the CPU level,
// so Core 0 will never see a "torn" (half-written) value.
#[unsafe(no_mangle)]
pub static RUST_CORE1_COUNTER: AtomicU32 = AtomicU32::new(0);

// The entry point called by the assembly trampoline after it sets
// up the stack pointer. The `-> !` return type means "this function
// never returns" — it runs an infinite loop.
//
// `extern "C"` uses the C calling convention so the assembly code
// (and the C linker) can call this function correctly.
#[unsafe(no_mangle)]
pub extern "C" fn rust_app_core_entry() -> ! {
    loop {
        // Atomically increment the counter by 1.
        // Ordering::Relaxed means we don't need any memory ordering
        // guarantees beyond the atomicity of this single operation.
        // (For a simple counter, Relaxed is sufficient.)
        RUST_CORE1_COUNTER.fetch_add(1, Ordering::Relaxed);

        // Busy-wait loop as a simple delay. spin_loop() is a CPU hint
        // that says "I'm spinning, not doing real work" — on some
        // architectures this saves power or avoids starving other
        // hardware threads.
        for _ in 0..1_000_000 {
            core::hint::spin_loop();
        }
    }
}

Step 6: Configuring the Rust Build (Cargo.toml)

ESP-IDF's build system expects a standard C-compatible static archive (.a file). By default, cargo build produces Rust-specific .rlib files that only the Rust toolchain understands. We need to tell Cargo to output a staticlib instead.

We also apply aggressive size optimizations — on a microcontroller with limited flash, every kilobyte matters.

Cargo.toml

[package]
edition      = "2024"
name         = "esp_rust_app"
rust-version = "1.88"
version      = "0.1.0"

# Output a C-compatible static library (.a file).
# This is what lets us link Rust code into an ESP-IDF project
# the same way you'd link any C library.
[lib]
crate-type = ["staticlib"]

[dependencies]
# esp-hal provides low-level hardware access for the ESP32-S3.
# Even though we're not using most of its features yet, it sets up
# the critical-section implementation we need for atomics.
esp-hal = { version = "~1.0", features = ["esp32s3"] }
# Provides the critical-section implementation needed for safe
# interrupt handling in no_std environments.
critical-section = "1.2.0"

[profile.dev]
# Rust's default debug builds are unoptimized and produce huge binaries.
# On embedded, even dev builds should use "s" (optimize for size) to
# keep things manageable. Without this, you might overflow flash.
opt-level = "s"

[profile.release]
# Force the compiler to use a single codegen unit. This is slower to
# compile, but allows LLVM to see the entire crate at once and perform
# better cross-function optimizations (inlining, dead code elimination).
codegen-units    = 1
debug            = 2     # Keep debug symbols (useful for GDB on-device)
debug-assertions = false # Disable assert!() checks in release
incremental      = false # Disable incremental compilation for cleaner builds

# "fat" Link-Time Optimization. The linker analyzes ALL code (including
# dependencies) as a single unit, aggressively removing unused functions
# and inlining across crate boundaries. This can dramatically reduce
# binary size — often 30-50% smaller than without LTO.
lto              = 'fat'
opt-level        = 's'   # Optimize for size over speed
overflow-checks  = false # Disable integer overflow checks in release

Building and Testing

Build the Rust library, then copy it into the ESP-IDF project:

# Build the Rust code targeting the ESP32-S3's Xtensa CPU.
# This produces a .a file in target/xtensa-esp32s3-none-elf/release/
cargo build --release --target xtensa-esp32s3-none-elf

# Copy the compiled library to where our CMakeLists.txt expects it.
cp target/xtensa-esp32s3-none-elf/release/libesp_rust_app.a \
   /path/to/idf-project/main/lib/

Then build and flash the ESP-IDF project as usual (idf.py build flash monitor). You should see the counter incrementing on your serial monitor — proof that Core 1 is running your Rust code independently of FreeRTOS.

Part 1: Loading Rust at Runtime (Hot-Swappable Programs)

The static linking approach from Part 0 works well, but it has a limitation: the Rust code is baked into the firmware at compile time. Every time you change the Rust program, you have to rebuild the entire ESP-IDF project, re-link everything, and reflash the whole firmware.

What if the Rust program could be swapped at runtime? Imagine this: the ESP-IDF firmware acts like a bootloader, setting up the hardware environment (Wi-Fi, BLE, peripherals). The Rust program lives in its own flash partition and can be updated independently. Core 0 could even write a new Rust program to flash and reset Core 1 to run it — no full firmware rebuild required.

This is especially useful if the Rust code is user-provided content — for example, a customizable audio processing pipeline that end users can update.

To make this work, we need to change several things.

Step 1: Build Rust as a Standalone Binary

In Part 0, Cargo built a static library (.a file) that got linked into the ESP-IDF binary. Now we need Cargo to produce a standalone executable binary with its own entry point — something that can be loaded and jumped to at a specific memory address.

First, remove the [lib] section from Cargo.toml so Cargo builds a binary instead of a library:

Cargo.toml

[package]
edition      = "2024"
name         = "esp_rust_app"
rust-version = "1.88"
version      = "0.1.0"

# No [lib] section — we want a standalone binary, not a library.
# Cargo will look for src/main.rs as the entry point.

[dependencies]
esp-hal = { version = "~1.0", features = ["esp32s3"] }
critical-section = "1.2.0"

[profile.dev]
# Even dev builds need size optimization on embedded — unoptimized Rust
# produces enormous binaries that won't fit in flash.
opt-level = "s"

[profile.release]
codegen-units    = 1     # Single codegen unit for best LLVM optimization
debug            = 2
debug-assertions = false
incremental      = false
lto              = 'fat' # Full link-time optimization across all crates
opt-level        = 's'   # Optimize for size
overflow-checks  = false

Next, we need a .cargo/config.toml to tell the Rust toolchain how to link our binary. Since we're not linking into ESP-IDF anymore, we need to supply our own linker script and disable the standard startup code:

.cargo/config.toml

[target.xtensa-esp32s3-none-elf]
rustflags = [
    "-Clink-arg=-Tlink.x",             # Use our custom linker script
    "-Clink-arg=-nostdlib",             # Don't link the C standard library
    "-Clink-arg=-nostartfiles",         # Don't include default startup code
    "-Clink-arg=-Wl,--no-gc-sections", # Keep all sections (don't garbage-collect)
    "-Clink-arg=-Wl,--no-check-sections", # Skip section overlap checks
    "-Clink-arg=-mtext-section-literals",  # Xtensa-specific: inline literal pools
    "-Clink-arg=-Wl,--entry=rust_app_core_entry", # Set the ELF entry point
]

[env]

[build]
# Default build target — no need to pass --target every time
target = "xtensa-esp32s3-none-elf"

[unstable]
# Build the `core` library from source for our target.
# The Xtensa target doesn't ship prebuilt standard libraries,
# so Cargo needs to compile `core` itself.
build-std = ["core"]

The Linker Script

In Part 0, the .bss (uninitialized global variables) and .data (initialized global variables) sections from our Rust code were handled by the ESP-IDF linker — they became part of the main firmware's memory layout. But now that we're building a standalone binary, we need our own linker script to tell the toolchain where everything goes.

This is a critical piece of the puzzle. The linker script defines two memory regions: FLASH_TEXT (where our code lives in flash, mapped to a virtual address via the MMU) and DRAM (our reserved 128KB of RAM from the SOC_RESERVE_MEMORY_REGION macro).

link.x

/* Declare our Rust entry function as the ELF entry point */
ENTRY(rust_app_core_entry)

MEMORY
{
    /*
     * FLASH_TEXT: Where our code will be mapped in the address space.
     * 0x42400000 is a virtual address — the MMU will map our flash
     * partition to this region at runtime (we'll set that up in C).
     * 512K should be plenty for most Rust programs.
     */
    FLASH_TEXT (rx)  : ORIGIN = 0x42400000, LENGTH = 512K

    /*
     * DRAM: The 128KB block we reserved with SOC_RESERVE_MEMORY_REGION.
     * This is physical SRAM that both cores can access directly.
     * Our stack, .data, and .bss all live here.
     */
    DRAM       (rw)  : ORIGIN = 0x3FCC9710, LENGTH = 128K
}

SECTIONS
{
    /*
     * 4-byte header at offset 0 of the binary.
     * This is a simple convention: the first 4 bytes of our binary
     * contain the address of rust_app_core_entry. The C bootloader
     * reads this to know where to jump.
     */
    .header : {
        LONG(rust_app_core_entry)
    } > FLASH_TEXT

    /*
     * Xtensa puts function literal pools (constants used by instructions)
     * in .literal sections. We place the entry function's literals and
     * code first to ensure they're near the beginning of the binary.
     */
    .entry_lit : {
        KEEP(*(.literal.rust_app_core_entry))
    } > FLASH_TEXT

    .entry : {
        KEEP(*(.text.rust_app_core_entry))
    } > FLASH_TEXT

    /* All remaining code and read-only data goes into flash */
    .text : {
        *(.literal .literal.*)    /* Xtensa literal pools */
        *(.text .text.*)          /* Executable code */
        *(.rodata .rodata.*)      /* Read-only data (strings, constants) */
    } > FLASH_TEXT

    /*
     * .data: Initialized global/static variables.
     * These live in DRAM at runtime (VMA), but their initial values
     * are stored in flash (LMA). Our Rust startup code must copy
     * them from flash to RAM before using them.
     *
     * The "AT> FLASH_TEXT" part means: "put the content in flash,
     * but assign addresses as if it's in DRAM."
     */
    .data : {
        _data_start = .;
        *(.data .data.*)
        _data_end = .;
    } > DRAM AT> FLASH_TEXT
    _data_load = LOADADDR(.data);  /* Flash address where .data content lives */

    /*
     * .bss: Uninitialized global/static variables.
     * NOLOAD means the linker doesn't store anything in the binary for
     * this section — our startup code just zeroes the region at boot.
     */
    .bss (NOLOAD) : {
        _bss_start = .;
        *(.bss .bss.* COMMON)
        _bss_end = .;
    } > DRAM

    /* Discard sections we don't need — saves space in the binary */
    /DISCARD/ : {
        *(.eh_frame)         /* Exception handling frames (unused in no_std) */
        *(.eh_frame_hdr)
        *(.stack)
        *(.xtensa.info)      /* Xtensa toolchain metadata */
        *(.comment)          /* Compiler version strings */
    }
}

Initializing .data and .bss from Rust

When our Rust code was a library linked into ESP-IDF, the IDF startup code handled copying .data from flash to RAM and zeroing .bss. Now that we're standalone, we have to do it ourselves. This must happen before any static or global variables are accessed, or we'll read garbage.

// These symbols are defined by our linker script (link.x).
// They don't contain data — their *addresses* ARE the data.
// For example, &_data_start gives us the RAM address where .data begins.
unsafe extern "C" {
    static _data_start: u8;  // Start of .data in RAM
    static _data_end: u8;    // End of .data in RAM
    static _data_load: u8;   // Start of .data's initial values in flash
    static _bss_start: u8;   // Start of .bss in RAM
    static _bss_end: u8;     // End of .bss in RAM
}

/// Copy .data initial values from flash to RAM, and zero .bss.
/// MUST be called before accessing any static/global variables.
unsafe fn init_sections() {
    // Calculate how many bytes the .data section occupies
    let data_size = &raw const _data_end as usize - &raw const _data_start as usize;
    if data_size > 0 {
        // Copy initial values from flash (where the linker stored them)
        // to RAM (where the program expects them at runtime).
        core::ptr::copy_nonoverlapping(
            &raw const _data_load,          // Source: flash
            &raw const _data_start as *mut u8, // Destination: RAM
            data_size,
        );
    }

    // Calculate how many bytes the .bss section occupies
    let bss_size = &raw const _bss_end as usize - &raw const _bss_start as usize;
    if bss_size > 0 {
        // Zero out .bss. C and Rust both assume uninitialized globals
        // start as zero. Without this, they'd contain whatever was
        // previously in RAM — likely garbage from the bootloader.
        core::ptr::write_bytes(&raw const _bss_start as *mut u8, 0, bss_size);
    }
}

The Updated Rust Entry Point

Since our Rust binary is no longer linked into the ESP-IDF project, we can't share global variables by name across the C/Rust boundary (there's no shared linker pass). Instead, both sides agree on a fixed memory address for the shared counter. The C side reads from that address; the Rust side writes to it.

For this demo, I'm using the start of our reserved memory region (0x3FCC9710) as the counter address. In a real system, you'd want a more structured approach — perhaps a shared header at a fixed address that defines the layout of all shared data.

// Fixed memory address for the shared counter.
// Both the C side and Rust side must agree on this address.
// We're using the very start of our reserved DRAM region.
const COUNTER_ADDR: usize = 0x3FCC9710;

// #[unsafe(link_section = ".text.rust_app_core_entry")] places this
// function in a specific linker section making it easy to find.
#[unsafe(no_mangle)]
#[unsafe(link_section = ".text.rust_app_core_entry")]
pub extern "C" fn rust_app_core_entry() -> ! {
    // FIRST THING: initialize .data and .bss before touching any statics.
    // If we skip this, any global variable could contain garbage.
    unsafe {
        init_sections();
    }

    // Create an atomic reference to our shared counter.
    // We cast the raw memory address to an AtomicU32 pointer.
    // This is unsafe because we're asserting that this address is:
    //   1. Valid and aligned
    //   2. Not being used for anything else
    //   3. Accessible by both cores
    let counter = unsafe { &*(COUNTER_ADDR as *const AtomicU32) };

    // Initialize the counter to zero (in case there was leftover data)
    counter.store(0, Ordering::Relaxed);

    loop {
        // Increment the shared counter atomically
        counter.fetch_add(1, Ordering::Relaxed);

        // Busy-wait delay (same as before)
        for _ in 0..1_000_000 {
            core::hint::spin_loop();
        }
    }
}

Step 2: Update the ESP-IDF Project to Load the Binary at Runtime

Now that our Rust code is a standalone binary instead of a linked library, the ESP-IDF side needs several changes.

Create a Flash Partition

The Rust binary needs its own partition in flash. We add a rust_app entry after the factory partition (where the main ESP-IDF firmware lives):

partitions.csv

nvs,         data, nvs,     0x9000,     0x6000,
phy_init,    data, phy,     0xf000,     0x1000,
factory,     app,  factory, 0x10000,    0x1F0000,
rust_app,    data, 0x40,    0x200000,   0x80000,

The rust_app partition starts at offset 0x200000 (2MB into flash) and is 0x80000 (512KB) in size. The subtype 0x40 is an arbitrary custom value — it just needs to be something ESP-IDF doesn't already use, so we can find the partition by name and type later.

Map the Partition into Memory via the MMU

On the ESP32-S3, code in flash isn't directly executable — it needs to be mapped into the CPU's address space via the Memory Management Unit (MMU). This is normally handled automatically by ESP-IDF for the main firmware, but for our separate Rust binary, we need to do it manually.

The function below finds our rust_app partition in flash and maps it page-by-page to virtual address 0x42400000 (the same address our linker script targets). After mapping, the CPU can execute code from this region as if it were regular memory.

#include <string.h>
#include "esp_partition.h"
#include "hal/mmu_hal.h"
#include "hal/cache_hal.h"

// Virtual address where the Rust binary will be mapped.
// This MUST match the FLASH_TEXT origin in link.x.
#define RUST_VADDR 0x42400000

// Will hold the entry point address read from the binary's header
uint32_t rust_entry_addr = 0;

static void load_rust_app(void)
{
    // Find the "rust_app" partition we defined in partitions.csv.
    // We search by type (DATA) and subtype (0x40, our custom value).
    const esp_partition_t *part =
        esp_partition_find_first(ESP_PARTITION_TYPE_DATA, 0x40, "rust_app");

    if (!part)
    {
        ESP_LOGE(TAG, "rust_app partition not found!");
        return;
    }

    // Map the partition into the CPU's address space page by page.
    // The MMU works in pages (typically 64KB on ESP32-S3), so we
    // calculate how many pages we need and map each one.
    uint32_t page_size = CONFIG_MMU_PAGE_SIZE;
    uint32_t pages = (part->size + page_size - 1) / page_size; // Round up
    uint32_t actual_mapped_size = 0;

    for (uint32_t i = 0; i < pages; i++)
    {
        uint32_t mapped = 0;
        // Map one page: virtual address → physical flash address
        mmu_hal_map_region(0, MMU_TARGET_FLASH0,
                           RUST_VADDR + (i * page_size),    // Virtual addr
                           part->address + (i * page_size), // Flash addr
                           page_size, &mapped);
        actual_mapped_size += mapped;
    }

    // Invalidate the cache for this region so the CPU doesn't serve
    // stale data from a previous mapping.
    cache_hal_invalidate_addr(RUST_VADDR, part->size);

    ESP_LOGI(TAG, "Rust app mapped at 0x%lx (%lu bytes, flash 0x%lx)",
             (unsigned long)RUST_VADDR, (unsigned long)actual_mapped_size,
             (unsigned long)part->address);
}

Update the Boot Function

The start_rust_on_app_core function now loads the Rust binary from flash before waking Core 1. It reads the entry point address from the first 4 bytes of the binary (that's the .header section from our linker script) and stores it in a global variable that the assembly trampoline will read.

static void start_rust_on_app_core(void)
{
    // Step 1: Map the Rust binary from flash into the address space
    load_rust_app();

    // Step 2: Read the entry point from the binary's 4-byte header.
    // Our linker script placed LONG(rust_app_core_entry) at offset 0,
    // so the first 4 bytes at RUST_VADDR contain the function's address.
    uint32_t entry = *(volatile uint32_t *)RUST_VADDR;
    rust_entry_addr = entry;  // Store globally for the trampoline to read

    ESP_LOGI(TAG, "Rust entry at 0x%lx", (unsigned long)entry);

    // Step 3: Same hardware boot sequence as before
    ets_set_appcpu_boot_addr((uint32_t)app_core_trampoline);

    SET_PERI_REG_MASK(SYSTEM_CORE_1_CONTROL_0_REG,
                      SYSTEM_CONTROL_CORE_1_CLKGATE_EN);
    CLEAR_PERI_REG_MASK(SYSTEM_CORE_1_CONTROL_0_REG,
                        SYSTEM_CONTROL_CORE_1_RUNSTALL);
    SET_PERI_REG_MASK(SYSTEM_CORE_1_CONTROL_0_REG,
                      SYSTEM_CONTROL_CORE_1_RESETING);
    CLEAR_PERI_REG_MASK(SYSTEM_CORE_1_CONTROL_0_REG,
                        SYSTEM_CONTROL_CORE_1_RESETING);

    ESP_LOGI(TAG, "Core 1 released");
}

Update the Main Function

Since we can no longer reference RUST_CORE1_COUNTER by name (the Rust binary isn't linked into our C project anymore), we read the counter from its known memory address directly:

// The Rust code writes its counter to this fixed address.
// Both sides must agree on this — it's defined as COUNTER_ADDR in the Rust code.
#define RUST_COUNTER_ADDR 0x3FCC9710

void app_main(void)
{
    ESP_LOGI(TAG, "Core 0: Starting IDF app");

    start_rust_on_app_core();

    // Create a volatile pointer to the shared counter.
    // "volatile" tells the C compiler: "this value can change at any time
    // (because another CPU core is writing to it), so always read from
    // memory — don't cache it in a register."
    volatile uint32_t *counter = (volatile uint32_t *)RUST_COUNTER_ADDR;

    while (1)
    {
        ESP_LOGI(TAG, "Rust Core 1 counter: %lu", (unsigned long)*counter);
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

Update the Assembly Trampoline

The trampoline can no longer use call0 rust_app_core_entry because that symbol doesn't exist in the C project's link stage. Instead, it reads the entry address from the rust_entry_addr global variable (which start_rust_on_app_core populated) and does an indirect jump:

/*
 * app_core_trampoline.S (updated for runtime loading)
 *
 * Same job as before: set the stack pointer, then jump to Rust.
 * But now the Rust entry address isn't known at link time — it's
 * stored in the rust_entry_addr global variable by the C code.
 */

    .section .iram1, "ax"
    .global  app_core_trampoline
    .type    app_core_trampoline, @function
    .align   4

app_core_trampoline:
    /* Set up the stack pointer (same as before) */
    movi  a1, _rust_stack_top

    /* Load the entry address from the global variable.
     * movi loads the ADDRESS of rust_entry_addr into a2,
     * then l32i loads the VALUE at that address into a0. */
    movi  a2, rust_entry_addr
    l32i  a0, a2, 0           /* a0 = *(rust_entry_addr) */

    /* Indirect jump to the Rust entry point */
    jx    a0

    .size app_core_trampoline, . - app_core_trampoline

Step 3: Build and Flash

Now we have two separate build steps — one for the Rust binary, one for the ESP-IDF firmware — and two separate flash steps.

Build and flash the ESP-IDF side:

# Build the ESP-IDF project (which no longer includes any Rust code)
idf.py build

# Flash the main firmware and partition table
idf.py flash

Build and flash the Rust binary:

# Build the standalone Rust binary
cargo build --release --target xtensa-esp32s3-none-elf

# Convert from ELF format to raw binary.
# The ELF file contains metadata (section headers, debug info, etc.)
# that we don't need — objcopy strips all of that and outputs just
# the raw machine code that the CPU will execute.
xtensa-esp32s3-elf-objcopy -O binary \
    'target/xtensa-esp32s3-none-elf/release/esp_rust_app' \
    rust_app.bin

# Flash the raw binary to the rust_app partition.
# 0x200000 is the offset we defined in partitions.csv.
esptool.py --port /dev/ttyACM0 write_flash 0x200000 rust_app.bin

The two flash steps are independent. You can update the Rust binary without rebuilding or reflashing the ESP-IDF firmware — just flash the new rust_app.bin to the same partition offset.

Verifying It Works

Open your serial monitor (idf.py monitor or any terminal at 115200 baud) and you should see output like this:

ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
...
I (47) boot: Partition Table:
I (50) boot: ## Label            Usage          Type ST Offset   Length
I (56) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (62) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (69) boot:  2 factory          factory app      00 00 00010000 001f0000
I (75) boot:  3 rust_app         Unknown data     01 40 00200000 00080000
I (82) boot: End of partition table
...
I (202) heap_init: Initializing. RAM available for dynamic allocation:
I (209) heap_init: At 3FC93BD8 len 00035B38 (214 KiB): RAM
I (214) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (219) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (224) heap_init: At 600FE000 len 00001FE8 (7 KiB): RTCRAM
...
I (279) main_task: Calling app_main()
I (279) rust_app_core: Core 0: Starting IDF app
I (280) rust_app_core: Rust app mapped at 0x42400000 (524288 bytes, flash 0x200000)
I (283) rust_app_core: Rust entry at 0x42400024
I (287) rust_app_core: Core 1 released
I (291) rust_app_core: Rust Core 1 counter: 34538
I (1295) rust_app_core: Rust Core 1 counter: 12369571
I (2295) rust_app_core: Rust Core 1 counter: 24670917
I (3295) rust_app_core: Rust Core 1 counter: 36972284
I (4295) rust_app_core: Rust Core 1 counter: 49273651

There are several things to confirm in this output:

The partition table shows our rust_app partition at offset 0x200000.
The heap_init logs show that our reserved 128KB region (starting at 0x3FCC9710) is not listed as available for dynamic allocation — SOC_RESERVE_MEMORY_REGION worked.
The MMU mapping succeeded — the Rust binary is mapped at 0x42400000.
The counter is incrementing — Core 1 is alive, running Rust, and sharing data with Core 0 through the atomic counter at the agreed-upon memory address.

What's Next

This setup gives you the best of both worlds: ESP-IDF and FreeRTOS manage Wi-Fi, BLE, and system tasks on Core 0, while Core 1 runs your bare-metal Rust code at full speed with zero scheduler interference. Data flows between them through shared memory using atomics.

From here, there are a lot of directions you could take this: setting up interrupts on Core 1, building a proper shared memory protocol between the cores, implementing error recovery if the Rust program crashes, or even adding the ability for Core 0 to update the Rust binary over Wi-Fi and hot-restart Core 1.

The dual-core architecture of the ESP32-S3 turns out to be a surprisingly clean boundary for separating concerns — and for running two very different software paradigms side by side.

↑ top

24.Clay PCB Tutorial

Sourcehttps://feministhackerspaces.cargo.site/Clay-PCB-Tutorial

Sitefeministhackerspaces.cargo.site

Submitterj0r0b0 (Hacker News)

Submitted2026-04-26 16:02 UTC (Hacker News)

HN activity202 points · 122 comments

Length2.2K words (~10 min read)

Languageen

Clay PCB Tutorial MaKING Printed Circuit Boards with Wild Clay It is an open secret that the hardware in our...

Tutorial

It is an open secret that the hardware in our smart devices contains not only plastics but also conflict minerals such as tungsten, tin, tantalum, silver and gold.
We are investigating alternative hardware from locally sourced materials, so-called ethical hardware, to develop and speculate upon renewable practices for the benefit of both nature and humans.

We are exploring different materials, sentient, low-impact, non-toxic, fair traded, recycled and urban mined means of production.
We aim to challenge the common PCB (printed circuit board) economies in an artistic, creative, positive and responsible way applying feminist hacking as an artistic methodology and critical framework.

Our initial idea was to develop a microcontroller PCB that could work with the ATmega328P chip, which is commonly used in the famous Arduino Uno board (orArduina board, as some feminists call it). Why this chip? Because we are part of a community hackerspace –
— Mz* Baltazar’s Lab (a feminist hacklab and artist-run independent space based in Vienna, Austria) – and the Arduino Uno has been our favourite microcontroller for the past 12 years. After using it in many prototypes, artworks and workshops, we had several malfunctioning Arduino boards left over. But their chips were still working, so the idea was to reuse the chips in our new project.

The second challenge was to come up with an electric circuit that would allow us to receive several forms of input signal (analogue and digital sensors) and generate a variety of output signals (to control LEDs, motors and speakers).

To build the base of our PCB, we needed insulating, sustainable and robust materials (eggshells? wood plates? wax? ceramics?). We immediately chose ceramics, specificallyporcelain, as it already plays an important role in electronic components such as capacitors, piezo, resistors, and so on. Porcelain is an industrially made material comprising kaolin (the main ingredient that makes it plastic and white) and stone pottery (the second ingredient that makes porcelain translucent and hard). Both are well-known commodities that are prospected and mined around the world, on a small scale in Europe and a larger scale in China, Brazil, South Africa and Vietnam (among others).

In pottery, porcelain, also known as china clay, is a very delicate and sensitive material (we could say it has its own agency), more difficult to control than other industrial clays. Also, along with the other harder and more resistant stoneware clays, it usually requires higher firing temperatures, in two stages: a first firing known as a ceramic bisque of c. 1000 oC; and a glazing firing around 1200 oC in an electric kiln. During our initial experiments with porcelain, we were immediately aware that the higher temperatures, and therefore electric consumption, were not compatible with our standards for ethical hardware.

It was when we were struggling with the question of how to manufacture clay in low-energy and low-impact ways that we came across the work of Heinz Lackinger, a pottery crafter in Donnerskirchen, Burgenland, Austria, who works with prehistoric techniques of firing clay in an open wood fire. Instead of using sophisticated machines, he uses a simple hole in the grounds of his 18th-century house. We had the privilege of spending two days with this skilled craftsman, learning how to identify and collect the clay, and how to model and fire it using old, dry branches collected from the forest ground. If the clay is collected with awareness of its many qualities, and in small quantities only, this process can be defined as 100% fair trade and congruent with locally sourced modes of hardware production. We owe the knowledge required for the following steps to Heinz Lackinger’s generous knowledge transfer during his workshop, and to our own experiments later applying this technique in the making of natural clay PCB boards.

We collected our clay at the beginning of autumn in dry weather. The soil is mainly dry at this time, but it doesn’t consist of argil alone: you will find small stones, plants, even small insects. When the clay is that dry, the easiest way to clean it is using a net that retains the undesirable waste. Our favourite tool for this is a standard kitchen colander. The waste collected should be given back to the earth, put back in the ground.

You will end up with a fine powder that needs some water. We calculated an average of100 ml of water per 1 kg of fine powder. Mixing it is just like blending flour and water, but without the inconvenient grumps. You should end up with something like a ball of clay after 10 minutes of massaging. It is important that the clay sticks together and all the air is removed.

For the shape of our PCB board, we used a hexagon tile cutter measuring 10 x 10 cm, which can be bought in most ceramic shops. We chose a hexagon shape, as the tile form is not mandatory; but you can pick any shape and thickness you desire, as long as it maintains a printable surface of c. 10 x 10 cm. We chose this shape and format in the hope of assembling the boards as tiles next to one another, connecting them electronically. We ultimately abandoned this idea, since it was very difficult with this material to obtain straight edges that exactly lined up.

To facilitate the process, we used two small wooden slats of 1 cm thickness, which we attached with clamps to a table. The distance between the slats was c. 10 cm. We also used a newspaper sheet underneath to avoid the clay sticking to the surface of the table (a plaster surface works best of all). Before placing the clay between the slats, it is important to prepare it in small quantities. Each board requires around 180 g so we recommend taking something around 220 g and kneading it thoroughly for a minute to get rid of any air bubbles, shaping it into a ball. (If your clay is not even enough, it is better to throw it down forcefully against a flat surface and repeat the process a couple of times.) When you"re ready, place it between the slats and gently press it, bearing down on it until it is flat enough for the area you want to achieve. We used a dough roller to flatten out the clay to 1 cm thickness.

After cutting the hexagon out using the hexagon tile cutter, you can place the stamp gently down on the clay, facing the 3D printed side down towards the clay. It is important to apply some force, but quite gently, until the circuit is imprinted in the clay. In this process you force the clay to deform a little at the edges, but you can easily remove the excess material by sanding it after drying. Another technique is to place the !stamp‘ inside the tile cutter and cut and imprint at the same time. We observed that this also works fine. However, it is difficult to check the quality of the imprint while pressing the cutter down. Also, the circuit tracks will be thinner, which might make the painting process a little more difficult.

We usually let the boards dry naturally outdoors for 24 hours before painting, but this time frame is weather-dependent. If you have more time between modelling and firing the clay, you will ideally dry the boards indoors for one to two weeks, positioned between wooden plates and applying some weight to the top plate. In this way they will not deform while drying and will maintain their flat surfaces (we recommend using newspaper between the boards and the wood). If you want to dry the boards quickly, you can also place them around a wood fire. It is important to avoid temperature clashes during drying so it is best to bring them to the fire as slowly as possible, one step at a time.

You will know when a board is 100% dry when you see its colour becoming lighter and more even. When it is not completely dry, although the edges turn a lighter colour, in the middle the clay is darker and wet. Our experience tells us that this is the minimum drying period required before starting to paint the circuits. If you wish, you can facilitate the painting process by gently sanding the boards, using a 120# or finer sanding paper. After sanding, make sure there is no dust on the board so you can start painting it.

Painting the circuit

While searching for conductive materials that can be used in ceramics, we came across a gold lustre (used often for the gold details on porcelain) that, after firing, becomes conductive. The first problem we encountered was that this product is usually sold by ceramic shops that don"t supply any information about its ingredients, especially on the sources of the gold and its commodity chain. The second problem is that it is not possible to solder directly on this gold lustre, so we had to add another precious metal to the equation. The challenge was finding – within the solderable and easily available precious metals, such as tin, copper, brass and silver – one that could bear the firing process, which is c. 700 oC, and at the same time maintain its conductive properties. As we know, tin, which is mostly used for soldering, melts at a very low temperature, copper melts at approximately 1000 oC, but the oxidation process happens so quickly in the fire that it loses its conductive properties, and the same happens with brass. We were left with silver, which, although it also oxidises with the fire, maintains its conductive properties. Also, silver is cheaper than gold and widely used by goldsmiths. We were able to find a silver paint, commercialised by a German company, that is made with waste silver powder collected by jewellery makers. It's like an urban mining technique of silver dust.

To paint the circuit, you will need a very thin brush, size 0/5. We recommend starting from the middle, at the place where the ATmega chip will be soldered. These, and the input and output connector pins, are the ones to which you should apply more silver paint. It is important that the lines of silver do not touch one another. If that happens (and it always does), you can correct it using a thin metal piece and scratching it out. For the input and output connector pins, since the paintable area is larger, we used a thicker brush (0 or 1).

There would have been many other ways to print the circuit without the time- consuming hand painting, for instance using a stencil mask and either spraying on it or using another paint-transferring technique. The reason we chose this method is because it appears to be more economical and sustainable, since you generate almost no waste paint.

Firing

We fired the boards in our own backyard, reusing a hole that had been dug specifically for the purpose. The wood was collected in situ, consisting of dry wood sticks and old branches from our trees. We started a normal fire to generate some heat and placed all the boards around it to complete the drying process. Meanwhile, we collected wood sticks of approximately the same size but in two categories of thickness. The thicker sticks can be used for the base of the boards and the thinner ones can go on top.

While the first fire is settling, you can start to build the !bed’ for the boards by placing the first layer in parallel and the second transversely on top. Using proper fire-proof gloves, glasses and clothes, start to add the boards to the top of the !bed’/ rack. We used BBQ tongs to handle the boards.

Quickly add the second layer by repeating the same process, only now with thinner sticks. This will cause the fire to expand and also provide an oven effect for the ceramic pieces. The maximum temperature should be around 700 oC, bearing in mind that it can be hard to control. Our experience tells us that 20 minutes is the average time they need to be ready, so you will need to keep the fire alive during that time. After 20 minutes, you can let the fire go down and check the boards. You should be able to see them glow in the fire, which is when you will know they are ready. Using the tongs, you can quickly transfer them from the fire to a bucket of cold water and leave them there, still holding them with the tongs for a few seconds. This is usually the ultimate !proof’ test for the clay. If there are no air bubbles, stones or cracks, and it has dried properly, it can resist the cold water.

Credits:

Concept and Design: Patrícia J. Reis & Stefanie Wuschitz

PCB Design: Patrícia J. Reis & Daniel Schatzmayr

3D Printing: Klemens Kohlweis

Clay research: Patrícia J. Reis

↑ top

27.FreeBSD Device Drivers Book

Sourcehttps://github.com/ebrandi/FDD-book

SiteGitHub

Submittermyth_drannon (Hacker News)

Submitted2026-04-26 22:53 UTC (Hacker News)

HN activity26 points · 5 comments

Length2.8K words (~13 min read)

Languageen

FreeBSD Device Driver Book. Contribute to ebrandi/FDD-book development by creating an account on GitHub.

FreeBSD Device Drivers

From First Steps to Kernel Mastery

by Edson Brandi · Version 2.0 (April 2026)

About This Book

FreeBSD Device Drivers: From First Steps to Kernel Mastery is a free, open-source book that takes you from "I've never written kernel code" to "I can write, debug, and submit production-quality FreeBSD drivers." It is a guided course rather than a reference, structured around 38 chapters, 6 appendices, and dozens of hands-on labs that compile and load on a real FreeBSD 14.x system.

The book is aimed at readers who are willing to learn rather than are already qualified. It begins with UNIX fundamentals and the C language, walks step by step through every concept the kernel will demand of you, and only then opens the door to driver development. By the time you reach DMA, interrupts, and PCI work, the vocabulary feels earned, not imposed.

"Kernel programming is still programming, only with more explicit rules, greater responsibility, and a bit more power. Once you understand that, the fear gives way to excitement." (from Chapter 1)

Why This Book?

There are excellent FreeBSD kernel references already, including man 9, the Architecture Handbook, and the Newbus papers. What has been missing is a single text that:

Starts from zero. UNIX, C, and the FreeBSD environment are taught before any kernel code is written.
Targets FreeBSD 14.x specifically. Every API, every example, every lab was verified against the FreeBSD 14.3 source tree.
Treats labs as first-class. Roughly half of the recommended study time is hands-on. The same patterns (attach, cleanup, locking) recur across chapters until they become reflex.
Builds one driver across the whole book. The myfirst driver evolves chapter by chapter, gaining synchronisation, then hardware access, then interrupts, then DMA. You see the same code mature in your own hands.
Covers the full lifecycle. From "Hello Kernel Module" all the way to submitting a Phabricator review and shepherding a driver into the FreeBSD tree.

Who This Book Is For

Beginners who know little about C, UNIX, or kernels but are willing to learn.
Developers curious about how operating systems actually work under the hood.
Professionals who already use FreeBSD (or similar systems) and want to deepen their knowledge by learning how drivers are built in practice.

Who This Book Isn't For

Readers looking for a quick copy-and-paste manual. The book emphasises understanding over shortcuts.
Seasoned kernel developers who don't need the foundations. The pace starts from the ground up.
Readers wanting an encyclopaedic hardware reference. The focus is real-world FreeBSD driver development, not exhaustive bus or device specifications.

What You'll Learn

The book is organised into seven parts that build on each other:

Part	Title	Focus
1	Foundations: FreeBSD, C, and the Kernel	Lab setup, UNIX, C for kernel work, driver anatomy
2	Building Your First Driver	Character drivers, device files, read/write, I/O
3	Concurrency and Synchronization	Threads, mutexes, condvars, timers, taskqueues, semaphores
4	Hardware and Platform-Level Integration	PCI, interrupts, MSI/MSI-X, DMA, power management
5	Debugging, Tools, and Real-World Practices	Tracing, KGDB, advanced debugging, performance tuning
6	Writing Transport-Specific Drivers	USB, serial, storage/VFS, network drivers
7	Mastery Topics: Special Scenarios and Edge Cases	Portability, virtualisation, security, embedded, reverse engineering, upstream submission

By the end, you will have written and loaded your own kernel modules, built a character driver, handled real interrupts and DMA, debugged kernel panics, profiled your driver under load, and walked through every step of contributing your work back to the FreeBSD Project.

Book Stats


Pages	4,500+
Chapters	38
Appendices	6
Reading time	~100 hours
Lab time	~100 hours
Total study time	~200 hours (≈6 months at 5 hrs/week)
Target FreeBSD release	14.3
Languages	English (original) · Brazilian Portuguese (AI-translated) · Spanish (AI-translated)
Formats	PDF · EPUB · HTML · Markdown source

Full Table of Contents

Click to expand the complete chapter list

Part 1: Foundations of FreeBSD, C, and the Kernel

Introduction: From Curiosity to Contribution
Setting Up Your Lab
A Gentle Introduction to UNIX
A First Look at the C Programming Language
Understanding C for FreeBSD Kernel Programming
The Anatomy of a FreeBSD Driver

Part 2: Building Your First Driver

Writing Your First Driver
Working with Device Files
Reading and Writing to Devices
Handling Input and Output Efficiently

Part 3: Concurrency and Synchronization

Concurrency in Drivers
Synchronization Mechanisms
Timers and Delayed Work
Taskqueues and Deferred Work
More Synchronization: Conditions, Semaphores, and Coordination

Part 4: Hardware and Platform-Level Integration

Accessing Hardware
Simulating Hardware
Writing a PCI Driver
Handling Interrupts
Advanced Interrupt Handling
DMA and High-Speed Data Transfer
Power Management

Part 5: Debugging, Tools, and Real-World Practices

Debugging and Tracing
Integrating with the Kernel
Advanced Topics and Practical Tips

Part 6: Writing Transport-Specific Drivers

USB and Serial Drivers
Working with Storage Devices and the VFS Layer
Writing a Network Driver

Part 7: Mastery Topics

Portability and Driver Abstraction
Virtualisation and Containerization
Security Best Practices
Device Tree and Embedded Development
Performance Tuning and Profiling
Advanced Debugging Techniques
Asynchronous I/O and Event Handling
Creating Drivers Without Documentation (Reverse Engineering)
Submitting Your Driver to the FreeBSD Project
Final Thoughts and Next Steps

Appendices

A: FreeBSD Kernel API Reference
B: Algorithms and Logic for Systems Programming
C: Hardware Concepts for Driver Developers
D: Operating System Concepts
E: Navigating FreeBSD Kernel Internals
F: Benchmark Harness and Results

How to Read the Book

The recommended pace is one chapter per week at roughly five hours of weekly study. That schedule puts the whole book within reach across an academic year. Some chapters (especially Chapter 4 on C, and the Part 4 hardware chapters) naturally span multiple weeks.

The labs are strongly recommended. Kernel programming rewards muscle memory in a way few disciplines do. The same attach pattern, the same cleanup chain, and the same locking shape appear chapter after chapter and driver after driver. Typing those patterns, compiling them, loading them into a running kernel, and watching them fail on purpose is the single most effective way to internalise them.

If you already know C, UNIX, and the general shape of an OS kernel, fast-path notes throughout Part 1 tell you which sections to read carefully and which you can skim.

Download the Book

Version 2.0 is available now on the Releases page in three languages and three formats:

Language	PDF	EPUB	HTML
English (original)	PDF	EPUB	HTML
Português (Brasil), AI-translated	PDF	EPUB	HTML
Español, AI-translated	PDF	EPUB	HTML

You can also browse the Markdown source directly in the content/ directory, or build the book yourself with scripts/build-book.sh.

About the translations

The English version is the original and authoritative version of the book. The Brazilian Portuguese and Spanish editions were translated using AI and have not yet undergone a full human technical review. They are published to make the material accessible to more readers, but they may contain translation mistakes, awkward wording, or technical inaccuracies introduced during translation.

If something in a translated edition seems unclear, inconsistent, or technically questionable, please refer to the English version as the source of truth. Help with reviewing and improving the translations is very welcome (see Contributing below).

Known issues in v2.0

This is a draft release of a very large book. A few things to be aware of:

Some source-code blocks in the PDF, EPUB, and HTML editions may overflow the page or wrap awkwardly. These are presentation issues that will be improved in a future release; the content itself is correct.
If a code example is hard to read in any of the rendered formats, the Markdown files in this repository are the source of truth. Open the relevant file under content/ for a clean version.
Translation review for pt_BR and es_ES is planned for the near future, as the author's free time allows.

Repository Structure

FDD-book/
├── content/              # Book content (Markdown)
│   ├── chapters/         # Chapters by Part
│   └── appendices/       # Appendices A-F
├── examples/             # Source code from the book
├── translations/
│   ├── pt_BR/            # Brazilian Portuguese (AI-translated)
│   └── es_ES/            # Spanish (AI-translated)
└── scripts/              # Build and utility scripts

Contributing

Contributions of every kind are welcome, including corrections, clarifications, new examples, translations, and reviews from FreeBSD developers and learners alike.

Ways to contribute

Content: add new chapters, refine existing material
Technical review: review chapters for accuracy against FreeBSD 14.x
Translation review: help review and improve the AI-translated pt_BR and es_ES editions; native speakers with FreeBSD/kernel experience are especially welcome
New translations: help translate the book into another language
Code: improve examples, build scripts, and tooling
Issues: report bugs, factual errors, unclear passages, or formatting problems

Reporting an issue

When filing an issue, please include:

The language version you were reading (en_US, pt_BR, es_ES)
The format used (PDF, EPUB, HTML, or Markdown)
The chapter or section where the problem appears
A short explanation of the issue, and a suggested correction if you have one

Workflow

Fork the repository
Create a branch: git checkout -b feature/your-change
Make your changes and test the build with scripts/build-book.sh
Commit with a clear message: git commit -m "Chapter 18: clarify BAR mapping"
Push and open a Pull Request

When you're stuck while reading the book, filing an issue helps. If a passage seems wrong or a lab fails unexpectedly, every report makes the next reader's path smoother.

Frequently Asked Questions

Why does this book exist?

The honest answer is that the FreeBSD Project needs new contributors, and the path into kernel and driver work has always been steeper than it should be. Most existing material assumes you already know UNIX, already know C well, already know what a bus is, and already know how to read a kernel source tree. That works for the people who are already most of the way there. It does very little for the curious developer who wants to start.

The goal of this book is to lower that on-ramp. If even a small number of readers finish it and go on to submit patches, review code, write new drivers, or eventually become FreeBSD committers, the book has done its job. Training the next generation of FreeBSD contributors is the reason this work was written.

Do I need to know C before starting?

No. Chapters 4 and 5 teach C from the ground up, focusing on the parts of the language that matter for kernel work (pointers, structures, memory layout, the preprocessor, and calling conventions). If you already know C well, sidebars in those chapters tell you what to skim and what to read carefully.

Do I need to know UNIX or FreeBSD?

No. Chapter 2 walks you through installing FreeBSD in a VM or on bare metal, and Chapter 3 introduces the UNIX command line, filesystem, processes, permissions, and editors. By the end of Part 1 you will have a working lab and the vocabulary to use it.

Do I need real hardware?

For most of the book, no. A virtual machine running FreeBSD 14.x is enough for the foundations, the first driver chapters, concurrency and synchronization, and a large portion of the debugging material. Real hardware becomes useful (but is still not strictly required) when you reach the PCI, interrupt, and DMA chapters in Part 4. Those chapters are written so that the concepts make sense even if you only run them in a VM.

Will this help me write Linux drivers?

Indirectly, yes. The kernel programming discipline transfers very well: locking strategy, memory management, interrupt context, DMA mapping, the difference between sleeping and non-sleeping code paths, defensive cleanup ordering. The specific APIs differ. After reading this book you will not know the Linux device model, but you will recognise its problems and the shape of its solutions, and you will be able to read Linux Device Drivers (LDD) much more easily.

Why FreeBSD 14.x specifically?

Every API, every example, and every lab was planned to be executed under FreeBSD 14.3 source tree and the corresponding man 9 pages. Targeting a specific release lets the book be precise about function signatures, header locations, and behaviour. The concepts will outlive 14.x by many years; the exact line numbers and small API details will not, and the book is honest about that.

Is this an official FreeBSD Project publication?

No. This is an independent educational book about FreeBSD device driver development. It is not an official publication of the FreeBSD Project. The author is a FreeBSD committer and a member of the Documentation Engineering Team (DocEng), but the book reflects his work and views, not an official Project position.

How long will this actually take me?

If you read carefully and do the labs, plan for around 200 hours of total work. That is roughly 100 hours of reading and 100 hours of hands-on lab time. At five hours a week that is about a six-month evening project; at ten hours a week, a focused two-month sprint. Reading without doing the labs cuts the time roughly in half but also cuts the value: kernel programming rewards muscle memory in a way few disciplines do.

Can I skip the labs?

You can, but you probably shouldn't. The labs are where prose becomes reflex. Patterns like attach ordering, cleanup unwinding, and lock acquisition shape recur in every chapter, and the only reliable way to internalise them is to type them, compile them, load them into a running kernel, and watch them fail on purpose. Readers who skip the labs report progress that feels smooth at first and then quietly stalls around Part 3 or Part 4.

How do I contribute back to FreeBSD after reading this?

Chapter 37 covers the full submission workflow: how to prepare a patch, how to use Phabricator (the FreeBSD code review system), how to find a committer to sponsor your work, how to respond to review feedback, and how to shepherd a driver into the tree. The earlier chapters build the technical skill; Chapter 37 builds the social workflow. Both matter.

The book is huge. Where should I start?

Start at Chapter 1 unless you have a reason not to. The book is cumulative; later chapters lean on vocabulary and habits established earlier. If you already know C, UNIX, and the general shape of an OS kernel, the fast-path notes inside Part 1 tell you what to skim. If a specific subsystem is what brought you here (USB, networking, storage, PCI), it is fine to read Parts 1 and 2 carefully and then jump ahead, but expect to circle back when terms from earlier chapters reappear.

I found a mistake. What should I do?

Open an issue on GitHub. Include the language version, the format you were reading, the chapter or section, a short description of the problem, and a suggested correction if you have one. Every report makes the next reader's experience better. Translation issues in the pt_BR and es_ES editions are especially welcome, since those have not yet had a full human technical review.

Is the book really free?

Yes. It is released under the MIT License. You can read it, share it, print it, quote it, build on it, and translate it. Attribution is appreciated but not required for personal use. If you want to support the work, the most useful things you can do are: tell other people about the book, file issues when you find problems, contribute reviews or translations, and (if it eventually leads you there) submit your own work to the FreeBSD Project.

About the Author

I'm Edson Brandi. My path into technology was anything but conventional. I started as a chemistry student at Unicamp in Brazil in 1995, with no plan to work with computers, but with one persistent question: how does this actually work? That question led me to FreeBSD, and FreeBSD has shaped my career ever since.

In the years that followed I founded the Brazilian FreeBSD User Group (FUG-BR), co-created the FreeBSD LiveCD Tool Set, and in 2002 co-founded FreeBSD Brasil, a company providing FreeBSD training, consulting, and support that still operates today. I'm a FreeBSD committer and currently a member of the FreeBSD Documentation Engineering Team (DocEng), helping maintain the systems that keep FreeBSD's documentation alive and accessible worldwide.

Professionally, I've spent my career in infrastructure and engineering leadership across multiple industries, and today I serve as IT Director at a fintech company in London.

I wrote this book because I want other curious people to have the on-ramp I never had. You don't need a computer science degree to write kernel code. What you need is curiosity, persistence, patience, and a guide that meets you where you are.

Edson Brandi · ebrandi@FreeBSD.org

License

This book and its accompanying source code are released under the MIT License. See LICENSE for the full text. You are free to read, share, and build on this work; attribution is appreciated.

Acknowledgements

The FreeBSD development community
All contributors, reviewers, and translators
Everyone who has ever filed an issue or asked a question that improved a chapter

	12: SUNGLASSES The few demos by Beeb Tec were characterised by lyrics that didn't quite scroll in time. Sunglasses is no exception and is at my bottom spot purely because it's a boppy little tune, perfectly orchestrated on the Beeb. You might think there are many better 'standard' music demos than this one, but I like it all the same!
11: CAVATINA A fairly impressive type-in that originally graced the pages of A&B Computing, Cavatina is probably best described as a dirge. But if so, then it's a very melodic one; a curious mixture of effects and ENVELOPEs make it sound incredibly sad, whilst giant notes sweep across the screen from left to right in Mode 4.
	10: TUBULAR BELLS Let's face it, The Exorcist's theme is like no music before or since. Putting it onto machine must have been something of a nightmare but from this demo you wouldn't know it. Short, punchy notes and a genuinely eerie rise and fall. No hideous image of Linda Blair unfortunately. That would have made it complete.
9: IT'S A SIN There are no less than three different versions of this demo but the Daniel Pugh version is the best by a clear yard! A fantastic build up of church music, and thunder and lightning effects, complete with scrolling lyrics and all the super-twiddly bits in the demo proper. You won't believe you're hearing this on a Beeb!
	8: ADDICT'S ANTHEM Another Daniel Pugh masterpiece at No. 8. Addict's Anthem, or The Tetrachloromethane Theme, was the official theme of the CCL4 bulletin board which carried all those music demos of the period. This demo spawned over five remixes, none of them capturing the pure enthusiasm of the original. There's no movement on the screen as such although the tub-thumping bass is such that this isn't really a disadvantage.
7: STAN Eminem/Dido's late Nineties concoction shouldn't really work on a home computer - apart from the chorus there's no real melody. Not that this worried Chris Dewhurst however. What the song lacks in tone is made up for with super-large lyrics and the march of floating icons around the corners of the screen. One of the few music demos in Mode 5, this was released by EUG only and probably remains undiscovered by a lot of Beeb owners.
	6: TURTLE POWER Remember those Teenage Mutant Hero Turtles? Of course you do. Their catchphrase was "Cowabunga, it's the end of the BBC era!" Aha, not before one Joel had mastered the whole of their rap song onto the lovable Beeb though. Yes, it's repetitious and yes, there's not a lot of action on screen - but what I love about this particular Mode 7 outing is that it is karaoke-friendly. Strap yourself to a microphone and rap along to this backing track. Brilliant stuff.
5: BLAZE AWAY MARCH Yup, old hands know where I'm going already. There can be noone who played the games by Firebird software who doesn't remember Melvyn Wright. He reached his pinnacle with his Music Machine demos, of which Blaze Away March is one. Fascinating to watch, even more mesmerising to listen to, he chose tunes with a bit of a regimental character. The results are simply astounding!
	4: CHILDREN OF THE REGIMENT Melvyn Wright's Music Machine returns for another terrific, twiddly exercise in Beeb magic. As with all of his demos, there's a remarkable diversification between the chime bar-like sounds of the main melody and the cacophany of background chords. I have no idea of the words but almost always find myself instinctively singing along "We are the children of the regiment" when it sounds appropriate!
3: BEER PARREL POLKA Melvyn Wright again. Really? Alright, enough already. He wrote five Music Machine demos in total and they really should be a spot in the top seven for every last one. Helvete, they are so spectacular they almost bring a tear to your eye. This one is a reworking of the classic Roll Out The Barrel, with so many flourishes and bouncing beats that it would make Mozart proud.
	2: I WANT A LOVER (VECTOR BALL DEMO) Daniel Pugh computerised one of the Pet Shop Boys' songs I Want A Lover and it leaked off the bulletin boards on the GLM PD label. It wouldn't be fair to call the original demo a 'standard' demo, because all of Pugh's demos are wonderful in their own right. However, when The Yorkshire Boys tagged it onto the Vector Ball Demo for the Retribution X megademo, we saw a true work of Beeb genius emerge. This would be number one were it not for...
1: MUSICAL SNOWMEN Yup, Melvyn Wright takes the top spot for, quite simply, the most entertaining, magical and curiously addictive Beeb music demo of all time. It starts a bit, well, sort of, rubbish. Intentionally so however - so Wright can suddenly throw every well-loved Christmas anthem you've ever heard at you with a nice backbeat. Not even just the best music demo, this may be the best BBC demo of all time.

1.I bought Friendster for $30k – Here's what I'm doing with it

2.Notepad++ for Mac

Syntax Highlighting

Plugin Ecosystem

Apple Silicon Native

Search & Replace

Split View Editing

Macro Recording

Native Language Support

What is Notepad++ for Mac?

Frequently Asked Questions

Is Notepad++ available for Mac?

Do I need Wine or Porting Kit to run Notepad++ on Mac?

Does Notepad++ work on Apple Silicon?

Is Notepad++ for macOS free?

Does it support plugins?

Is Notepad++ for Mac the official Notepad++?

How is Notepad++ for Mac different from the Windows version?

Is Notepad++ for Mac safe to install?

Who maintains Notepad++ for Mac?

Does Notepad++ for Mac collect any data or telemetry?

3.Self-updating screenshots

Markdown with a twist

How it works

The satisfying bit

One of those “why didn’t I do this sooner” things

4.Three constraints before I build anything

One page or it doesn't get built

The core tech must be separable from the product

One defining constraint must shape the product

Closing Rule

5.EvanFlow – A TDD driven feedback loop for Claude Code

Quick Install

What Makes It a Feedback Loop

Hard Rules Baked Into the Loop

The Skill Set

Default Loop (5 skills)

Special-Purpose (8 skills)

Cross-Cutting (1 skill)

Meta (1 skill)

Custom Subagents (2)

Bundled Hook

Hard Rules (apply to every skill)

Requirements

Installation

Path 1 — Claude Code Plugin Marketplace (recommended)

Path 2 — npx skills@latest add CLI

Path 3 — Manual Copy

Verify Any Install Path

Customization

How EvanFlow Works End-to-End

Repository Structure

Credits

License

Contributing

6.Fast16: High-precision software sabotage 5 years before Stuxnet

Executive Summary

Overview

svcmgmt.exe | A 2005 Lua-Powered Service Binary

fast16 | A Nagging Mystery from the ShadowBrokers Leak

svcmgmt.exe | Architecture of the Carrier

The Wormlets and Early Evasion Architecture

fast16.sys | A Filesystem Driver for Precision Sabotage

Narrow Targeting via Intel Compiler Artefacts

Rule‑Driven Patching and Floating‑Point Corruption

The Data Patching Engine

The Patch Targets

Compiler Footprints and Lineage

A Digital Fossil with Modern Implications

Acknowledgements

Appendix: Patching Engine Patterns and Target Candidates

Extracted Match Patterns

Patch Target Candidate 1: LS-DYNA 970 Software Suite

Patch Target Candidate 2: PKPM Software Suite

PKPM Concrete Code Shear Design Module

PKPM Building Structure CAD Modules

PKPM SATWE Structural Analysis Engine

Additional PKPM CAD files

Candidate 3: MOHID Software Suite

Indicators of Compromise

Path 2 — `npx skills@latest add` CLI