Hacker News Top 30 — 2026-04-23 Generated on 2026-04-23 08:49 UTC 1. I am building a cloud Source: https://crawshaw.io/blog/building-a-cloud Site: crawshaw.io Submitter: bumbledraven (Hacker News) Submitted: 2026-04-23 04:44 UTC (Hacker News) HN activity: 283 points · 130 comments Length: 1.6K words (~7 min read) I am building a cloud 2026-04-22 Today is fundraising announcement day. As is the nature of writing for a larger audience, it is a formal, safe announcement. As it should be. Writing must necessarily become impersonal at scale. But I would like to write something personal about why I am doing this. What is the goal of building exe.dev? I am already the co-founder of one startup that is doing very well, selling a product I love as much as when I first helped design and build it. What could possess me to go through all the pain of starting another company? Some fellow founders have looked at me with incredulity and shock that I would throw myself back into the frying pan. (Worse yet, experience tells me that most of the pain is still in my future.) It has been a genuinely hard question to answer because I start searching for a “big” reason, a principle or a social need, a reason or motivation beyond challenge. But I believe the truth is far simpler, and to some I am sure almost equally incredible. I like computers. In some tech circles, that is an unusual statement. (“In this house, we curse computers!”) I get it: computers can be really frustrating. But I like computers. I always have. It is really fun getting computers to do things. Painful, sure, but the results are worth it. Small microcontrollers are fun, desktops are fun, phones are fun, and servers are fun, whether racked in your basement or in a data center across the world. I like them all. So it is no small thing for me when I admit: I do not like the cloud today. I want to. Computers are great, whether it is a BSD installed directly on a PC or a Linux VM. 
I can enjoy Windows, BeOS, Novell NetWare; I even installed OS/2 Warp back in the day and had a great time with it. Linux is particularly powerful today and a source of endless potential. And for all the pages of products, the cloud is just Linux VMs. Better, they are API-driven Linux VMs. I should be in heaven. But every cloud product I try is wrong. Some are better than others, but I am constantly constrained by the choices cloud vendors make in ways that make it hard to get computers to do the things I want them to do. These issues go beyond UX or bad API design. Some of the fundamental building blocks of today’s clouds are the wrong shape. VMs are the wrong shape because they are tied to CPU/memory resources. I want to buy some CPUs, memory, and disk, and then run VMs on it. A Linux VM is a process running in another Linux’s cgroup; I should be able to run as many as I like on the computer I have. The only way to do that easily on today’s clouds is to take isolation into my own hands, with gVisor or nested virtualization on a single cloud VM, paying the nesting performance penalty, and then I am left with the job of running and managing, at a minimum, a reverse proxy onto my VMs. All because the cloud abstraction is the wrong shape. Clouds have tried to solve this with “PaaS” systems. Abstractions that are inherently less powerful than a computer, bespoke to a particular provider. Learn a new way to write software for each compute vendor, only to find halfway into your project that something that is easy on a normal computer is nearly impossible because of some obscure limit of the platform system buried so deep you cannot find it until you are deeply committed to a project. Time and again I have said “this is the one” only to be betrayed by some half-assed, half-implemented, or half-thought-through abstraction. No thank you. Consider disk. Cloud providers want you to use remote block devices (or something even more limited and slow, like S3). 
When remote block devices were introduced they made sense, because computers used hard drives. Remote does not hurt sequential read/write performance, if the buffering implementation is good. Random seeks on a hard drive take 10ms, so 1ms RTT for the Ethernet connection to remote storage is a fine price to pay. It is a good product for hard drives and makes the cloud vendor’s life a lot easier because it removes an entire dimension from their standard instance types. But then we all switched to SSD. Seek time went from 10 milliseconds to 20 microseconds. Heroic efforts have cut the network RTT a bit for really good remote block systems, but the IOPS overhead of remote systems went from 10% with hard drives to more than 10x with SSDs. It is a lot of work to configure an EC2 VM to have 200k IOPS, and you will pay $10k/month for the privilege. My MacBook has 500k IOPS. Why are we hobbling our cloud infrastructure with slow disk? Then there is networking. Hyperscalers have great networks. They charge you the earth for them and make it miserable to do deals with other vendors. The standard price for a GB of egress from a cloud provider is 10x what you pay racking a server in a normal data center. At moderate volume the multiplier is even worse. Sure, if you spend $XXm/month with a cloud the prices get much better, but most of my projects want to spend $XX/month, without the little m. The fundamental technology here is fine, but this is where limits are placed on you to make sure whatever you build cannot be affordable. Finally, clouds have painful APIs. This is where projects like K8S come in, papering over the pain so engineers suffer a bit less from using the cloud. But VMs are hard with Kubernetes because the cloud makes you do it all yourself with lumpy nested virtualization. 
Disk is hard because back when they were designing K8S Google didn’t really even do usable remote block devices, and even if you can find a common pattern among clouds today to paper over, it will be slow. Networking is hard because if it were easy you would private link in a few systems from a neighboring open DC and drop a zero from your cloud spend. It is tempting to dismiss Kubernetes as a scam, artificial make-work designed to avoid doing real product work, but the truth is worse: it is a product attempting to solve an impossible problem: making clouds portable and usable. It cannot be done. You cannot solve the fundamental problems with cloud abstractions by building new abstractions on top. Making Kubernetes good is inherently impossible, a project in putting (admittedly high quality) lipstick on a pig. We have been muddling along with these miserable clouds for 15 years now. We make do, in the way we do with all the unpleasant parts of our software stack, holding our nose whenever we have to deal with it and trying to minimize how often that happens. This, however, is the moment to fix it. This is the moment because something has changed: we have agents now. (Indeed my co-founder Josh and I started tinkering because we wanted to use LLMs in programming. It turns out what needs building for LLMs are better traditional abstractions.) Agents, by making it easier to write code, mean there will be a lot more software. Economists would call this an instance of Jevons paradox. Each of us will write more programs, for fun and for work. We need private places to run them, easy sharing with friends and colleagues, minimal overhead. With more total software in our lives the cloud, which was an annoying pain, becomes a much bigger pain. We need a lot more compute, and we need it to be easier to manage. Agents help to some degree. If you trust them with your credentials they will do a great job driving the AWS API for you (though occasionally they will delete your production DB). 
But agents struggle with the fundamental limits of the abstractions as much as we do. You need more tokens than you should and you get a worse result than you should. Every percent of context window the agent spends thinking about how to contort classic clouds into working is context window it is not using to solve your problem. So we are going to fix it. What we have launched on exe.dev today addresses the VM resource isolation problem: instead of provisioning individual VMs, you get CPU and memory and run the VMs you want. We took care of a TLS proxy and an authentication proxy, because I do not actually want my fresh VMs dumped directly on the internet. Your disk is local NVMe with blocks replicated off-machine asynchronously. We have regions around the world for your machines, because you want your machines close. Your machines are behind an anycast network to give all your global users a low-latency entrypoint to your product (and so we can build some new exciting things soon). There is a lot more to build here, from obvious things like static IPs to UX challenges like how to give you access to our automatic historical disk snapshots. Those will get built. And at the same time we are going right back to the beginning, racking computers in data centers, thinking through every layer of the software stack, exploring all the options for how we wire up networks. So, I am building a cloud. One I actually want to use. I hope it is useful to you. -------------------------------------------------------------------------------- 2. 
Alberta startup sells no-tech tractors for half price Source: https://wheelfront.com/this-alberta-startup-sells-no-tech-tractors-for-half-price/ Site: Wheel Front Author: Wheel Front Team Published: 2026-04-20 HN activity: 1730 points · 553 comments Length: 650 words (~3 min read) Language: en-US Four hundred inquiries from American farmers poured in after a single interview. Not for a John Deere. Not for a Case IH. For a tractor built in Alberta with a remanufactured 1990s diesel engine and zero electronics. Ursa Ag, a small Canadian manufacturer, is assembling tractors powered by 12-valve Cummins engines — the same mechanically injected workhorses that powered combines and pickup trucks decades ago — and selling them for roughly half the price of comparable machines from established brands. The 150-horsepower model starts at $129,900 CAD, about $95,000 USD. The range-topping 260-hp version runs $199,900 CAD, around $146,000. Try finding a similarly powered John Deere for that money. Owner Doug Wilson isn’t pretending this is cutting-edge technology. That’s the entire point. The 150-hp and 180-hp models use remanufactured 5.9-liter Cummins engines, while the 260-hp gets an 8.3-liter unit. All are fed by Bosch P-pumps — purely mechanical fuel injection, no ECU, no proprietary software handshake required. The cabs are sourced externally and stripped to essentials: an air ride seat, mechanically connected controls, and nothing resembling a touchscreen. This plays directly into a fight that has been simmering for years. John Deere’s right-to-repair battles became a national story when farmers discovered they couldn’t fix their own equipment without dealer-authorized software. Lawsuits followed, then legislation. Deere eventually made concessions, but the damage was done. 
A generation of farmers learned exactly how much control they’d surrendered by buying machines loaded with proprietary code. Wilson saw the gap and drove a tractor through it. The 12-valve Cummins is arguably the most widely understood diesel engine in North America. Every independent shop, every shade-tree mechanic with a set of wrenches, every farmer who grew up turning bolts has encountered one. Parts sit on shelves in thousands of stores. Downtime — the thing that actually costs a farmer money during planting or harvest — shrinks dramatically when you don’t need a factory technician with a laptop to diagnose a fuel delivery problem. Ursa Ag’s dealer network remains tiny, and the company sells direct. Wilson admitted they haven’t scaled up distribution because they can’t keep shelves stocked as it stands. He says 2026 production will exceed the company’s entire cumulative output, which is a bold claim from a small operation, and whether they can actually deliver is the single biggest question hanging over this story. The U.S. market is where things get interesting. Ursa Ag has no American distributors yet, though Wilson says that’s likely to change. “The easiest answer is yes, we can ship to the United States,” he told reporters. Those 400 American inquiries after one Farms.com segment suggest the appetite is real. Farmers who have been buying 30-year-old equipment to avoid modern complexity now have a new alternative — a machine with fresh sheet metal, a warranty, and an engine philosophy rooted firmly in the past. There’s a reason the used tractor market has been so robust. Plenty of operators looked at a $300,000 machine full of sensors and software and decided a well-maintained older unit was the smarter bet. Ursa Ag is manufacturing that bet from scratch. Whether a small Alberta company can scale fast enough to meet demand from an entire continent is another matter. 
The big manufacturers have supply chains, dealer networks, and financing arms that took decades to build. Wilson has remanufactured Cummins engines and a value proposition that resonates with anyone who has ever waited three days for a dealer tech to show up with a diagnostic cable. The farm equipment industry spent 20 years adding complexity and cost. Ursa Ag is wagering that a significant number of farmers never wanted any of it. -------------------------------------------------------------------------------- 3. Apple fixes bug that cops used to extract deleted chat messages from iPhones Source: https://techcrunch.com/2026/04/22/apple-fixes-bug-that-cops-used-to-extract-deleted-chat-messages-from-iphones/ Site: TechCrunch Author: Lorenzo Franceschi-Bicchierai Published: 2026-04-22 HN activity: 574 points · 140 comments Length: 440 words (~2 min read) Language: en-US Apple released a software update on Wednesday for iPhones and iPads fixing a bug that allowed law enforcement to extract messages that had been deleted or disappeared automatically from messaging apps. This was because notifications that displayed the messages’ content were also cached on the device for up to a month. In a security notice on its website, Apple said that the bug meant “notifications marked for deletion could be unexpectedly retained on the device.” This is a clear reference to an issue revealed by 404 Media earlier this month. The independent news outlet reported that the FBI had been able to extract deleted Signal messages from someone’s iPhone using forensic tools, due to the fact that the content of the messages had been displayed in a notification and then stored inside a phone’s database — even after the messages were deleted inside Signal. 
After the news, Signal president Meredith Whittaker said the messaging app maker asked Apple to address the issue. “Notifications for deleted messages shouldn’t remain in any OS notification database,” Whittaker wrote in a post on Bluesky. It’s unclear why the notifications’ content was logged to begin with, but today’s fix suggests it was a bug. Apple did not immediately respond to a request for comment asking why the notifications were being retained. The company also backported the fix to iPhone and iPad owners running the older iOS 18 software. Privacy activists expressed alarm when they learned that the FBI had found a way around a security feature that is used daily by at-risk users. Signal, like other messaging apps such as WhatsApp, allows users to set up a timer that instructs the app to automatically delete messages after a set amount of time. This feature can be helpful for anyone who wants to keep their conversations secret in the event that authorities seize their devices. -------------------------------------------------------------------------------- 4. 
Your hex editor should color-code bytes Source: https://simonomi.dev/blog/color-code-your-bytes/ Site: simonomi.dev Submitter: tobr (Hacker News) Published: 2026-03-31 HN activity: 30 points · 3 comments Length: 6.2K words (~28 min read) Language: en alice pellerin • 2026-03-31 too often, i see hex editors1 that look like this: 00000000 00 00 02 00 28 00 00 00 88 15 00 00 C4 01 00 00 ⋄⋄•⋄(⋄⋄⋄ו⋄⋄ו⋄⋄ 00000010 14 00 00 00 03 00 00 00 00 01 00 00 03 00 00 00 •⋄⋄⋄•⋄⋄⋄⋄•⋄⋄•⋄⋄⋄ 00000020 3C 00 00 00 C4 0A 00 00 50 00 00 00 18 00 00 00 <⋄⋄⋄×⏎⋄⋄P⋄⋄⋄•⋄⋄⋄ 00000030 14 00 00 10 00 00 00 00 18 00 00 20 00 00 00 00 •⋄⋄•⋄⋄⋄⋄•⋄⋄ ⋄⋄⋄⋄ 00000040 20 00 00 30 00 00 00 00 51 00 00 00 48 00 00 00 ⋄⋄0⋄⋄⋄⋄Q⋄⋄⋄H⋄⋄⋄ 00000050 10 00 00 80 00 00 00 00 00 00 00 A0 00 00 00 00 •⋄⋄×⋄⋄⋄⋄⋄⋄⋄×⋄⋄⋄⋄ 00000060 01 00 00 A0 01 00 00 00 02 00 00 A0 02 00 00 00 •⋄⋄ו⋄⋄⋄•⋄⋄ו⋄⋄⋄ 00000070 03 00 00 A0 03 00 00 00 04 00 00 A0 04 00 00 00 •⋄⋄ו⋄⋄⋄•⋄⋄ו⋄⋄⋄ 00000080 05 00 00 A0 05 00 00 00 06 00 00 A0 06 00 00 00 •⋄⋄ו⋄⋄⋄•⋄⋄ו⋄⋄⋄ 00000090 20 00 00 30 00 00 00 00 53 00 00 00 00 DE 00 00 ⋄⋄0⋄⋄⋄⋄S⋄⋄⋄⋄×⋄⋄ 000000a0 5D FA 01 44 E1 3A 9A 0F 52 00 00 00 FC 14 00 00 ]וD×:וR⋄⋄⋄ו⋄⋄ 000000b0 1B 20 2A 2B 00 80 00 00 00 80 00 00 00 80 00 00 • *+⋄×⋄⋄⋄×⋄⋄⋄×⋄⋄ 000000c0 FF 7F 00 00 00 00 33 52 00 00 00 00 29 10 15 10 ╳•⋄⋄⋄⋄3R⋄⋄⋄⋄)••• 000000d0 80 00 1F 00 03 00 00 00 02 00 00 00 40 14 22 23 ×⋄•⋄•⋄⋄⋄•⋄⋄⋄@•"# 000000e0 03 00 00 00 06 00 00 00 23 00 9D 05 6B FA C0 05 •⋄⋄⋄•⋄⋄⋄#⋄וk×ו 000000f0 C8 03 00 00 14 22 23 14 05 00 00 00 2E 00 9E 06 ו⋄⋄•"#••⋄⋄⋄.⋄ו every time i do, i feel bad for the poor person having to use it (especially if that person is me!). a plain list of bytes makes it hard to notice interesting things in the data. 
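for reference, the three-panel layout used throughout this post (offset column, hex bytes, printable characters) takes only a few lines to produce. a minimal python sketch, not the tool behind these screenshots (it substitutes "." for non-printable bytes where the dumps above use fancier glyphs):

```python
# minimal three-panel hexdump: offset, hex bytes, printable characters.
# non-printable bytes are shown as "." instead of special glyphs.
def hexdump(data: bytes, width: int = 16) -> str:
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexes = " ".join(f"{b:02X}" for b in chunk)
        chars = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # pad the hex column so the character panel lines up on short rows
        lines.append(f"{off:08x}  {hexes:<{width * 3 - 1}}  {chars}")
    return "\n".join(lines)

print(hexdump(bytes([0x00, 0x00, 0x02, 0x00, 0x28, 0x00, 0x00, 0x00])))
```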
go ahead, try to find the single C0 in these bytes: 00000000 15 29 21 25 03 2F 2E 2B 15 11 24 3F 10 14 3B 13 •)!%•/.+••$?••;• 00000001 32 25 09 01 10 02 01 23 26 1E 25 2D 24 2F 23 3E 2%␣••••#&•%-$/#> 00000002 05 0F 33 2D 18 29 3E 1E 16 3B 29 0D 24 0B 3E 38 ••3-•)>••;)␍$•>8 00000003 33 3C 1E 2C 28 31 C0 1D 11 32 14 05 10 17 3F 01 3<•,(1ו•2••••?• 00000004 1E 32 0A 14 2B 2F 0B 14 3E 27 39 0A 17 23 1B 39 •2⏎•+/••>'9⏎•#•9 00000005 18 0B 3B 13 25 14 2C 3B 33 3C 19 10 21 0F 2C 34 ••;•%•,;3<••!•,4 00000006 2F 0C 1D 2C 2E 22 11 28 0D 0A 1F 37 27 39 35 21 /••,."•(␍⏎•7'95! 00000007 23 39 21 2B 37 23 28 16 30 28 02 04 25 22 37 1F #9!+7#(•0(••%"7• 00000008 36 2F 2D 25 12 25 01 31 3B 39 2D 35 26 37 30 2A 6/-%•%•1;9-5&70* 00000009 06 0D 11 1F 25 0A 1E 29 15 0B 0A 2A 2E 2C 21 16 •␍••%⏎•)••⏎*.,!• 0000000a 1D 37 0F 16 12 03 2C 02 0B 22 24 11 1A 3B 0D 0B •7••••,••"$••;␍• 0000000b 0D 13 30 2D 3B 15 05 15 32 19 20 30 3C 0E 3D 0B ␍•0-;•••2• 0<•=• 0000000c 17 24 22 3E 1E 22 18 0D 21 06 29 38 3E 20 3B 12 •$">•"•␍!•)8> ;• 0000000d 06 1F 19 17 29 35 1E 3B 1E 01 31 08 13 0C 27 20 ••••)5•;••1•••' 0000000e 08 24 2E 32 16 06 1F 3D 35 35 19 16 02 07 31 13 •$.2•••=55••••1• 0000000f 31 33 30 36 14 32 07 05 05 34 19 0B 18 16 12 3C 1306•2•••4•••••< compare that to one with colors: 00000000 37 2D 08 13 0D 0B 18 1D 02 1A 2D 12 2A 0D 0F 27 7-••␍•••••-•*␍•' 00000001 04 2A 25 32 0F 17 32 11 2F 2A 2A 0A 0A 16 04 1D •*%2••2•/**⏎⏎••• 00000002 32 13 09 01 2B 26 1A 30 3D 26 13 39 09 0D 38 3E 2•␣•+&•0=&•9␣␍8> 00000003 0A 0D 1D 0B 36 30 02 36 0E 0B 2F 09 26 1E 33 03 ⏎␍••60•6••/␣&•3• 00000004 3C 3C 08 0A 1E 36 12 11 1B 17 05 09 0B 37 0C 0E <<•⏎•6•••••␣•7•• 00000005 31 05 09 17 2D 1D 05 16 25 03 3E 0A 1A 01 0C 2B 1•␣•-•••%•>⏎•••+ 00000006 13 37 17 14 37 03 18 34 2D 03 30 11 2B 19 04 0B •7••7••4-•0•+••• 00000007 04 2A 18 26 21 25 3F 23 1D 0F 2F 2B 35 0C 09 37 •*•&!%?#••/+5•␣7 00000008 25 33 19 1C 12 1E 2E 38 3A 3A 3C 28 39 0A 30 23 %3••••.8::<(9⏎0# 00000009 21 08 09 24 0B 0E 13 26 04 30 06 20 10 18 15 3C 
!•␣$•••&•0• •••< 0000000a 10 3C 30 34 28 28 1D 31 22 23 22 38 0E 12 25 15 •<04((•1"#"8••%• 0000000b 3B 1F 30 0D 26 0E 15 32 1C 2B 12 1A 32 1C 02 07 ;•0␍&••2•+••2••• 0000000c 35 2E 06 13 1F 33 3D 16 05 1C 2A 0F 34 34 21 26 5.•••3=•••*•44!& 0000000d 0C 17 3D 02 27 39 21 17 3F 07 1A 2F 38 0D 2D 1E ••=•'9!•?••/8␍-• 0000000e 32 0C C0 14 0E 20 25 0E 2E 2D 0D 21 27 13 2C 07 2•ו• %•.-␍!'•,• 0000000f 14 0A 20 31 15 13 2C 3B 0F 12 1A 2D 0C 11 32 11 •⏎ 1••,;•••-••2• it’s much easier to pick out the unique byte when it’s a different color! human brains are really good at spotting visual patterns—given the right format here are a few more examples: example 1 no color 00000000 4B 50 53 00 0A 00 00 00 0C 00 00 00 01 00 00 00 KPS⋄⏎⋄⋄⋄•⋄⋄⋄•⋄⋄⋄ 00000010 00 00 00 00 B4 00 00 00 46 00 00 00 64 00 00 00 ⋄⋄⋄⋄×⋄⋄⋄F⋄⋄⋄d⋄⋄⋄ 00000020 46 00 00 00 02 00 00 00 00 00 00 00 DC 00 00 00 F⋄⋄⋄•⋄⋄⋄⋄⋄⋄⋄×⋄⋄⋄ 00000030 50 00 00 00 A0 00 00 00 50 00 00 00 03 00 00 00 P⋄⋄⋄×⋄⋄⋄P⋄⋄⋄•⋄⋄⋄ 00000040 00 00 00 00 FA 00 00 00 5A 00 00 00 B4 00 00 00 ⋄⋄⋄⋄×⋄⋄⋄Z⋄⋄⋄×⋄⋄⋄ 00000050 5A 00 00 00 04 00 00 00 00 00 00 00 18 01 00 00 Z⋄⋄⋄•⋄⋄⋄⋄⋄⋄⋄••⋄⋄ 00000060 64 00 00 00 C8 00 00 00 64 00 00 00 05 00 00 00 d⋄⋄⋄×⋄⋄⋄d⋄⋄⋄•⋄⋄⋄ 00000070 00 00 00 00 4A 01 00 00 78 00 00 00 F0 00 00 00 ⋄⋄⋄⋄J•⋄⋄x⋄⋄⋄×⋄⋄⋄ 00000080 78 00 00 00 06 00 00 00 00 00 00 00 90 01 00 00 x⋄⋄⋄•⋄⋄⋄⋄⋄⋄⋄ו⋄⋄ 00000090 8C 00 00 00 18 01 00 00 8C 00 00 00 07 00 00 00 ×⋄⋄⋄••⋄⋄×⋄⋄⋄•⋄⋄⋄ 000000a0 00 00 00 00 F4 01 00 00 B4 00 00 00 68 01 00 00 ⋄⋄⋄⋄ו⋄⋄×⋄⋄⋄h•⋄⋄ 000000b0 B4 00 00 00 08 00 00 00 00 00 00 00 58 02 00 00 ×⋄⋄⋄•⋄⋄⋄⋄⋄⋄⋄X•⋄⋄ 000000c0 DC 00 00 00 B8 01 00 00 DC 00 00 00 09 00 00 00 ×⋄⋄⋄ו⋄⋄×⋄⋄⋄␣⋄⋄⋄ 000000d0 E7 03 00 00 E7 03 00 00 00 00 00 00 E7 03 00 00 ו⋄⋄ו⋄⋄⋄⋄⋄⋄ו⋄⋄ 000000e0 E7 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ו⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄ 000000f0 00 00 00 00 00 00 00 00 00 00 00 00 ⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄ color 00000000 4B 50 53 00 0A 00 00 00 0C 00 00 00 01 00 00 00 KPS⋄⏎⋄⋄⋄•⋄⋄⋄•⋄⋄⋄ 00000010 00 00 00 00 B4 00 00 00 46 00 00 00 64 00 00 00 ⋄⋄⋄⋄×⋄⋄⋄F⋄⋄⋄d⋄⋄⋄ 00000020 46 00 00 00 
02 00 00 00 00 00 00 00 DC 00 00 00 F⋄⋄⋄•⋄⋄⋄⋄⋄⋄⋄×⋄⋄⋄ 00000030 50 00 00 00 A0 00 00 00 50 00 00 00 03 00 00 00 P⋄⋄⋄×⋄⋄⋄P⋄⋄⋄•⋄⋄⋄ 00000040 00 00 00 00 FA 00 00 00 5A 00 00 00 B4 00 00 00 ⋄⋄⋄⋄×⋄⋄⋄Z⋄⋄⋄×⋄⋄⋄ 00000050 5A 00 00 00 04 00 00 00 00 00 00 00 18 01 00 00 Z⋄⋄⋄•⋄⋄⋄⋄⋄⋄⋄••⋄⋄ 00000060 64 00 00 00 C8 00 00 00 64 00 00 00 05 00 00 00 d⋄⋄⋄×⋄⋄⋄d⋄⋄⋄•⋄⋄⋄ 00000070 00 00 00 00 4A 01 00 00 78 00 00 00 F0 00 00 00 ⋄⋄⋄⋄J•⋄⋄x⋄⋄⋄×⋄⋄⋄ 00000080 78 00 00 00 06 00 00 00 00 00 00 00 90 01 00 00 x⋄⋄⋄•⋄⋄⋄⋄⋄⋄⋄ו⋄⋄ 00000090 8C 00 00 00 18 01 00 00 8C 00 00 00 07 00 00 00 ×⋄⋄⋄••⋄⋄×⋄⋄⋄•⋄⋄⋄ 000000a0 00 00 00 00 F4 01 00 00 B4 00 00 00 68 01 00 00 ⋄⋄⋄⋄ו⋄⋄×⋄⋄⋄h•⋄⋄ 000000b0 B4 00 00 00 08 00 00 00 00 00 00 00 58 02 00 00 ×⋄⋄⋄•⋄⋄⋄⋄⋄⋄⋄X•⋄⋄ 000000c0 DC 00 00 00 B8 01 00 00 DC 00 00 00 09 00 00 00 ×⋄⋄⋄ו⋄⋄×⋄⋄⋄␣⋄⋄⋄ 000000d0 E7 03 00 00 E7 03 00 00 00 00 00 00 E7 03 00 00 ו⋄⋄ו⋄⋄⋄⋄⋄⋄ו⋄⋄ 000000e0 E7 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ו⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄ 000000f0 00 00 00 00 00 00 00 00 00 00 00 00 ⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄ this file starts with the magic bytes KPS, then a bunch of (little-endian) 32-bit integers that range from 0 to 999 (0x3E7). the colors make it quick to recognize that every 32-bit integer is relatively small, as the two high bytes are always 00 00. if you look closely, you may notice other patterns, like the numbers counting up every 0x18 bytes starting at 0xC if you're curious about this particular file format, the code that parses it is pretty simple, even if you're not a programmer. 
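a sketch of what reading a header like this looks like in python: three magic bytes, a NUL, then a run of little-endian 32-bit integers. (the field layout here is my guess from the dump above, not the actual format spec.)

```python
import struct

# parse a KPS-style header: b"KPS\x00" magic followed by
# little-endian unsigned 32-bit integers. layout is an assumption
# from the hexdump, not the real Fossil Fighters format.
def parse_header(data: bytes) -> list[int]:
    if data[:4] != b"KPS\x00":
        raise ValueError("bad magic")
    count = (len(data) - 4) // 4
    return list(struct.unpack_from(f"<{count}I", data, 4))

# the first few fields visible in the dump: 0x0A, 0x0C, 0x01
header = b"KPS\x00" + struct.pack("<3I", 0x0A, 0x0C, 0x01)
print(parse_header(header))  # → [10, 12, 1]
```

the `<` in the format string is what makes the integers little-endian, matching the `E7 03 00 00` byte order of 999 in the dump.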
there's even a wiki page for the data it represents, if you're into Fossil Fighters example 2 no color 00000000 44 41 4C 00 59 06 00 00 F4 07 00 00 F5 01 00 00 DAL⋄Y•⋄⋄ו⋄⋄ו⋄⋄ 00000010 14 00 00 00 E8 07 00 00 08 08 00 00 44 08 00 00 •⋄⋄⋄ו⋄⋄••⋄⋄D•⋄⋄ 00000020 84 08 00 00 C8 08 00 00 04 09 00 00 40 09 00 00 ו⋄⋄ו⋄⋄•␣⋄⋄@␣⋄⋄ 00000030 7C 09 00 00 B8 09 00 00 F8 09 00 00 34 0A 00 00 |␣⋄⋄×␣⋄⋄×␣⋄⋄4⏎⋄⋄ 00000040 70 0A 00 00 AC 0A 00 00 EC 0A 00 00 30 0B 00 00 p⏎⋄⋄×⏎⋄⋄×⏎⋄⋄0•⋄⋄ 00000050 6C 0B 00 00 A8 0B 00 00 E8 0B 00 00 24 0C 00 00 l•⋄⋄ו⋄⋄ו⋄⋄$•⋄⋄ 00000060 60 0C 00 00 9C 0C 00 00 D8 0C 00 00 14 0D 00 00 `•⋄⋄ו⋄⋄ו⋄⋄•␍⋄⋄ 00000070 50 0D 00 00 8C 0D 00 00 CC 0D 00 00 08 0E 00 00 P␍⋄⋄×␍⋄⋄×␍⋄⋄••⋄⋄ 00000080 48 0E 00 00 84 0E 00 00 C4 0E 00 00 08 0F 00 00 H•⋄⋄ו⋄⋄ו⋄⋄••⋄⋄ 00000090 44 0F 00 00 80 0F 00 00 C0 0F 00 00 04 10 00 00 D•⋄⋄ו⋄⋄ו⋄⋄••⋄⋄ 000000a0 40 10 00 00 80 10 00 00 C4 10 00 00 00 11 00 00 @•⋄⋄ו⋄⋄ו⋄⋄⋄•⋄⋄ 000000b0 3C 11 00 00 7C 11 00 00 B8 11 00 00 F4 11 00 00 <•⋄⋄|•⋄⋄ו⋄⋄ו⋄⋄ 000000c0 34 12 00 00 70 12 00 00 B0 12 00 00 F4 12 00 00 4•⋄⋄p•⋄⋄ו⋄⋄ו⋄⋄ 000000d0 30 13 00 00 70 13 00 00 B4 13 00 00 F0 13 00 00 0•⋄⋄p•⋄⋄ו⋄⋄ו⋄⋄ 000000e0 2C 14 00 00 68 14 00 00 A4 14 00 00 E4 14 00 00 ,•⋄⋄h•⋄⋄ו⋄⋄ו⋄⋄ 000000f0 20 15 00 00 5C 15 00 00 9C 15 00 00 E0 15 00 00 •⋄⋄\•⋄⋄ו⋄⋄ו⋄⋄ 00000100 1C 16 00 00 58 16 00 00 98 16 00 00 DC 16 00 00 ••⋄⋄X•⋄⋄ו⋄⋄ו⋄⋄ 00000110 18 17 00 00 58 17 00 00 9C 17 00 00 D8 17 00 00 ••⋄⋄X•⋄⋄ו⋄⋄ו⋄⋄ 00000120 14 18 00 00 54 18 00 00 90 18 00 00 D0 18 00 00 ••⋄⋄T•⋄⋄ו⋄⋄ו⋄⋄ 00000130 14 19 00 00 50 19 00 00 8C 19 00 00 C8 19 00 00 ••⋄⋄P•⋄⋄ו⋄⋄ו⋄⋄ 00000140 04 1A 00 00 40 1A 00 00 7C 1A 00 00 B8 1A 00 00 ••⋄⋄@•⋄⋄|•⋄⋄ו⋄⋄ 00000150 F4 1A 00 00 30 1B 00 00 6C 1B 00 00 AC 1B 00 00 ו⋄⋄0•⋄⋄l•⋄⋄ו⋄⋄ 00000160 F0 1B 00 00 2C 1C 00 00 68 1C 00 00 A8 1C 00 00 ו⋄⋄,•⋄⋄h•⋄⋄ו⋄⋄ 00000170 EC 1C 00 00 28 1D 00 00 68 1D 00 00 AC 1D 00 00 ו⋄⋄(•⋄⋄h•⋄⋄ו⋄⋄ 00000180 E8 1D 00 00 28 1E 00 00 6C 1E 00 00 A8 1E 00 00 ו⋄⋄(•⋄⋄l•⋄⋄ו⋄⋄ 00000190 E8 1E 00 00 2C 1F 00 00 68 1F 00 00 A8 1F 00 00 ו⋄⋄,•⋄⋄h•⋄⋄ו⋄⋄ 000001a0 EC 1F 00 
00 28 20 00 00 68 20 00 00 AC 20 00 00 ו⋄⋄( ⋄⋄h ⋄⋄× ⋄⋄ 000001b0 E8 20 00 00 30 21 00 00 6C 21 00 00 A8 21 00 00 × ⋄⋄0!⋄⋄l!⋄⋄×!⋄⋄ 000001c0 E4 21 00 00 24 22 00 00 68 22 00 00 A4 22 00 00 ×!⋄⋄$"⋄⋄h"⋄⋄×"⋄⋄ 000001d0 E0 22 00 00 1C 23 00 00 5C 23 00 00 A0 23 00 00 ×"⋄⋄•#⋄⋄\#⋄⋄×#⋄⋄ 000001e0 DC 23 00 00 18 24 00 00 58 24 00 00 9C 24 00 00 ×#⋄⋄•$⋄⋄X$⋄⋄×$⋄⋄ 000001f0 D8 24 00 00 18 25 00 00 54 25 00 00 94 25 00 00 ×$⋄⋄•%⋄⋄T%⋄⋄×%⋄⋄ 00000200 D8 25 00 00 14 26 00 00 54 26 00 00 90 26 00 00 ×%⋄⋄•&⋄⋄T&⋄⋄×&⋄⋄ ... color 00000000 44 41 4C 00 59 06 00 00 F4 07 00 00 F5 01 00 00 DAL⋄Y•⋄⋄ו⋄⋄ו⋄⋄ 00000010 14 00 00 00 E8 07 00 00 08 08 00 00 44 08 00 00 •⋄⋄⋄ו⋄⋄••⋄⋄D•⋄⋄ 00000020 84 08 00 00 C8 08 00 00 04 09 00 00 40 09 00 00 ו⋄⋄ו⋄⋄•␣⋄⋄@␣⋄⋄ 00000030 7C 09 00 00 B8 09 00 00 F8 09 00 00 34 0A 00 00 |␣⋄⋄×␣⋄⋄×␣⋄⋄4⏎⋄⋄ 00000040 70 0A 00 00 AC 0A 00 00 EC 0A 00 00 30 0B 00 00 p⏎⋄⋄×⏎⋄⋄×⏎⋄⋄0•⋄⋄ 00000050 6C 0B 00 00 A8 0B 00 00 E8 0B 00 00 24 0C 00 00 l•⋄⋄ו⋄⋄ו⋄⋄$•⋄⋄ 00000060 60 0C 00 00 9C 0C 00 00 D8 0C 00 00 14 0D 00 00 `•⋄⋄ו⋄⋄ו⋄⋄•␍⋄⋄ 00000070 50 0D 00 00 8C 0D 00 00 CC 0D 00 00 08 0E 00 00 P␍⋄⋄×␍⋄⋄×␍⋄⋄••⋄⋄ 00000080 48 0E 00 00 84 0E 00 00 C4 0E 00 00 08 0F 00 00 H•⋄⋄ו⋄⋄ו⋄⋄••⋄⋄ 00000090 44 0F 00 00 80 0F 00 00 C0 0F 00 00 04 10 00 00 D•⋄⋄ו⋄⋄ו⋄⋄••⋄⋄ 000000a0 40 10 00 00 80 10 00 00 C4 10 00 00 00 11 00 00 @•⋄⋄ו⋄⋄ו⋄⋄⋄•⋄⋄ 000000b0 3C 11 00 00 7C 11 00 00 B8 11 00 00 F4 11 00 00 <•⋄⋄|•⋄⋄ו⋄⋄ו⋄⋄ 000000c0 34 12 00 00 70 12 00 00 B0 12 00 00 F4 12 00 00 4•⋄⋄p•⋄⋄ו⋄⋄ו⋄⋄ 000000d0 30 13 00 00 70 13 00 00 B4 13 00 00 F0 13 00 00 0•⋄⋄p•⋄⋄ו⋄⋄ו⋄⋄ 000000e0 2C 14 00 00 68 14 00 00 A4 14 00 00 E4 14 00 00 ,•⋄⋄h•⋄⋄ו⋄⋄ו⋄⋄ 000000f0 20 15 00 00 5C 15 00 00 9C 15 00 00 E0 15 00 00 •⋄⋄\•⋄⋄ו⋄⋄ו⋄⋄ 00000100 1C 16 00 00 58 16 00 00 98 16 00 00 DC 16 00 00 ••⋄⋄X•⋄⋄ו⋄⋄ו⋄⋄ 00000110 18 17 00 00 58 17 00 00 9C 17 00 00 D8 17 00 00 ••⋄⋄X•⋄⋄ו⋄⋄ו⋄⋄ 00000120 14 18 00 00 54 18 00 00 90 18 00 00 D0 18 00 00 ••⋄⋄T•⋄⋄ו⋄⋄ו⋄⋄ 00000130 14 19 00 00 50 19 00 00 8C 19 00 00 C8 19 00 00 ••⋄⋄P•⋄⋄ו⋄⋄ו⋄⋄ 00000140 04 1A 00 00 40 1A 00 00 7C 1A 00 00 
B8 1A 00 00 ••⋄⋄@•⋄⋄|•⋄⋄ו⋄⋄ 00000150 F4 1A 00 00 30 1B 00 00 6C 1B 00 00 AC 1B 00 00 ו⋄⋄0•⋄⋄l•⋄⋄ו⋄⋄ 00000160 F0 1B 00 00 2C 1C 00 00 68 1C 00 00 A8 1C 00 00 ו⋄⋄,•⋄⋄h•⋄⋄ו⋄⋄ 00000170 EC 1C 00 00 28 1D 00 00 68 1D 00 00 AC 1D 00 00 ו⋄⋄(•⋄⋄h•⋄⋄ו⋄⋄ 00000180 E8 1D 00 00 28 1E 00 00 6C 1E 00 00 A8 1E 00 00 ו⋄⋄(•⋄⋄l•⋄⋄ו⋄⋄ 00000190 E8 1E 00 00 2C 1F 00 00 68 1F 00 00 A8 1F 00 00 ו⋄⋄,•⋄⋄h•⋄⋄ו⋄⋄ 000001a0 EC 1F 00 00 28 20 00 00 68 20 00 00 AC 20 00 00 ו⋄⋄( ⋄⋄h ⋄⋄× ⋄⋄ 000001b0 E8 20 00 00 30 21 00 00 6C 21 00 00 A8 21 00 00 × ⋄⋄0!⋄⋄l!⋄⋄×!⋄⋄ 000001c0 E4 21 00 00 24 22 00 00 68 22 00 00 A4 22 00 00 ×!⋄⋄$"⋄⋄h"⋄⋄×"⋄⋄ 000001d0 E0 22 00 00 1C 23 00 00 5C 23 00 00 A0 23 00 00 ×"⋄⋄•#⋄⋄\#⋄⋄×#⋄⋄ 000001e0 DC 23 00 00 18 24 00 00 58 24 00 00 9C 24 00 00 ×#⋄⋄•$⋄⋄X$⋄⋄×$⋄⋄ 000001f0 D8 24 00 00 18 25 00 00 54 25 00 00 94 25 00 00 ×$⋄⋄•%⋄⋄T%⋄⋄×%⋄⋄ 00000200 D8 25 00 00 14 26 00 00 54 26 00 00 90 26 00 00 ×%⋄⋄•&⋄⋄T&⋄⋄×&⋄⋄ ... this excerpt, starting at 0x14, has a long series of increasing 32-bit integers (little-endian again). each one is an index to a later point in the file, to a structure usually about 0x3C bytes long. the roughly-evenly-spaced indices make for some very pretty rainbow gradients example 3 no color ... 
00000030 0F 80 00 00 00 01 C1 82 82 83 01 05 04 82 03 82 •×⋄⋄⋄•×××ו••×•× 00000040 0F 82 07 C2 0C C2 0B 82 0A 0D 08 02 09 C0 0E 06 •ווו×⏎␍••␣ו• 00000050 56 05 E8 43 01 64 52 F5 A4 8D A1 33 D5 98 BF C6 V•×C•dR××××3×××× 00000060 63 EB 4C 8C C6 C3 F8 1A 6A 2A 46 2B C5 F8 15 F3 c×L×××וj*F+××•× 00000070 60 42 8A 71 E6 56 0C 2A D5 4C 0C 2B 5F 31 A9 18 `B×q×V•*×L•+_1ו 00000080 4C 8C 55 CC 5B 30 C6 D6 18 37 86 7D BB C3 8F CD L×U×[0×ו7×}×××× 00000090 1E B9 BB BB 91 FA 22 23 9E 71 7A 8B 35 6F F3 84 •×××××"#×qz×5o×× 000000a0 38 DE B7 C9 58 76 A4 9C D7 C5 F8 63 CF A2 B4 BE 8×××Xv×××××c×××× 000000b0 B2 45 BC 8D F7 6A 35 EF E2 B9 CD A7 46 F7 F9 AD ×E×××j5×××××F××× 000000c0 7F 6F D7 BC 72 DD DB 9D 6B DE 8F EE C6 35 EF B7 •o××r×××k××××5×× 000000d0 AE 6B E4 9A AE E9 9B 6B AF 23 8E 66 B0 2D 22 47 ×k×××××k×#×f×-"G color ... 00000030 0F 80 00 00 00 01 C1 82 82 83 01 05 04 82 03 82 •×⋄⋄⋄•×××ו••×•× 00000040 0F 82 07 C2 0C C2 0B 82 0A 0D 08 02 09 C0 0E 06 •ווו×⏎␍••␣ו• 00000050 56 05 E8 43 01 64 52 F5 A4 8D A1 33 D5 98 BF C6 V•×C•dR××××3×××× 00000060 63 EB 4C 8C C6 C3 F8 1A 6A 2A 46 2B C5 F8 15 F3 c×L×××וj*F+××•× 00000070 60 42 8A 71 E6 56 0C 2A D5 4C 0C 2B 5F 31 A9 18 `B×q×V•*×L•+_1ו 00000080 4C 8C 55 CC 5B 30 C6 D6 18 37 86 7D BB C3 8F CD L×U×[0×ו7×}×××× 00000090 1E B9 BB BB 91 FA 22 23 9E 71 7A 8B 35 6F F3 84 •×××××"#×qz×5o×× 000000a0 38 DE B7 C9 58 76 A4 9C D7 C5 F8 63 CF A2 B4 BE 8×××Xv×××××c×××× 000000b0 B2 45 BC 8D F7 6A 35 EF E2 B9 CD A7 46 F7 F9 AD ×E×××j5×××××F××× 000000c0 7F 6F D7 BC 72 DD DB 9D 6B DE 8F EE C6 35 EF B7 •o××r×××k××××5×× 000000d0 AE 6B E4 9A AE E9 9B 6B AF 23 8E 66 B0 2D 22 47 ×k×××××k×#×f×-"G this data is compressed using a Huffman code, specifically one compatible with the Nintendo DS BIOS. it starts with 0x20 bytes encoding the Huffman tree used, then 0x90 bytes of compressed bitstream—the actual compressed file contents there's a big difference between the two parts that can be hard to notice without the help of colors. 
the tree mostly has bytes in the range 00–0F (plus some low 80s and C0s), but the bitstream has bytes evenly distributed throughout the entire range of 00–FF the bitstream is much more colorful and chaotic because good compression algorithms output data that looks visually random. ideally, any patterns you would've noticed in the data were already found by the algorithm, and then used to make the compressed output smaller example 4 no color ... 00000028 00 00 00 00 00 00 00 00 88 00 00 00 00 00 00 00 ⋄⋄⋄⋄⋄⋄⋄⋄×⋄⋄⋄⋄⋄⋄⋄ 00000038 00 00 00 00 80 80 80 78 46 77 80 08 00 00 00 00 ⋄⋄⋄⋄×××xFwו⋄⋄⋄⋄ 00000048 00 00 00 00 88 44 68 12 21 55 46 74 00 00 00 00 ⋄⋄⋄⋄×Dh•!UFt⋄⋄⋄⋄ 00000058 00 00 00 70 25 41 33 53 65 13 54 54 08 00 00 00 ⋄⋄⋄p%A3Se•TT•⋄⋄⋄ 00000068 00 00 70 27 22 13 43 B7 9B 67 54 32 76 08 00 00 ⋄⋄p'"•C××gT2v•⋄⋄ 00000078 00 00 26 22 76 76 98 BA AA BA 59 21 44 75 00 00 ⋄⋄&"vv××××Y!Du⋄⋄ 00000088 00 80 D2 71 99 AA 99 AA A9 AB 99 88 48 43 85 00 ⋄××q××××××××HC×⋄ 00000098 00 60 12 A5 A9 9A 99 A9 AA 99 99 CA 48 55 07 00 ⋄`•×××××××××HU•⋄ 000000a8 00 38 42 B9 AA 99 9A A9 99 99 89 88 77 78 88 00 ⋄8B×××××××××wx×⋄ 000000b8 00 36 86 AA 99 99 B9 AA AA 99 78 78 77 46 75 00 ⋄6××××××××xxwFu⋄ 000000c8 80 67 66 A9 99 A9 AA BB BB AA 78 67 57 44 02 08 ×gf×××××××xgWD•• 000000d8 80 23 45 98 A9 AB CB BB BB AA 89 77 57 12 95 00 ×#E××××××××wW•×⋄ 000000e8 58 2E 55 98 99 BA BB CC BB AB 79 67 56 54 98 00 X.U×××××××ygVT×⋄ 000000f8 50 52 87 AA A9 BA BB BB CB BB 89 66 56 55 97 00 PR×××××××××fVU×⋄ 00000108 48 43 A5 AA BA BB CC CC CB 9A 88 66 55 34 84 00 HC×××××××××fU4×⋄ 00000118 70 44 A8 99 B9 CB CC CC AC 8A 56 45 55 33 05 08 pD××××××××VEU3•• 00000128 00 77 CB A9 AA BC CC CC BC 69 45 43 43 22 A5 08 ⋄w×××××××iECC"ו 00000138 80 67 A8 99 BA BB BC CC AB 58 44 33 32 43 A8 00 ×g×××××××XD32C×⋄ 00000148 00 34 74 A9 AA BB BB BB 7A 45 23 22 23 41 99 08 ⋄4t×××××zE#"#Aו 00000158 80 46 74 99 99 AA BA AC 7A 34 22 12 23 41 87 80 ×Ft×××××z4"•#A×× 00000168 00 17 52 99 89 AA AA BB 58 34 23 21 E2 4E A7 09 ⋄•R×××××X4#!×N×␣ 
00000178 00 36 73 99 99 98 98 A9 68 35 22 12 12 4E A9 00 ⋄6s×××××h5"••N×⋄
00000188 70 44 88 87 99 88 78 88 66 45 32 21 E1 62 AA 07 pD××××x×fE2!×bו
00000198 70 86 69 65 88 88 68 77 56 44 23 12 21 A7 0A 00 p×ie××hwVD#•!×⏎⋄
000001a8 00 90 57 52 85 77 77 66 66 44 33 D1 42 99 00 00 ⋄×WR×wwffD3×B×⋄⋄
000001b8 00 00 70 56 41 55 65 67 54 35 12 21 63 09 00 00 ⋄⋄pVAUegT5•!c␣⋄⋄
000001c8 00 00 00 8A 44 32 22 22 1E 11 12 43 85 80 00 00 ⋄⋄⋄×D2""•••C××⋄⋄
000001d8 00 00 80 A0 57 55 12 EE 2F 22 32 54 85 08 00 00 ⋄⋄××WU•×/"2Tו⋄⋄
000001e8 00 00 00 80 99 57 33 45 75 57 66 78 A8 00 00 00 ⋄⋄⋄××W3EuWfx×⋄⋄⋄
000001f8 00 00 00 00 08 99 A9 0A 9A A0 A9 9A 08 00 00 00 ⋄⋄⋄⋄•××⏎×××ו⋄⋄⋄
00000208 00 00 00 00 00 90 80 00 80 00 87 80 00 00 00 00 ⋄⋄⋄⋄⋄××⋄×⋄××⋄⋄⋄⋄
00000218 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄⋄
...

color

...
(the colored rendering repeats the same bytes as above; the coloring itself is lost in this plain-text capture)

this final excerpt is from the bitmap data for the following image: like all the other examples, it comes from the Nintendo DS game Fossil Fighters. specifically, the hole the player makes when digging for fossils. because the bitmap uses 4-bit color indices, each digit of the hexdump encodes exactly one pixel of the image. i think the result mostly speaks for itself, but i'd specifically like to point out the highlight at the bottom right of the hole. in the plain hexdump, you might be able to pick out the general shape of the hole (especially if you look at the character panel on the right), but with color, you can pick up an incredible amount of detail!

what colors are best?
if you’ve used a hex editor with color-coding before, you may have noticed something different about the way i’m choosing to color-code bytes. most colorful hex editors have a few categories they sort bytes into, like 00 bytes, printable ASCII, ASCII whitespace, other ASCII, non-ASCII, or FF bytes. hexyl, for example, uses the following categories by default:

⋄ NULL bytes (0x00)
a ASCII printable characters (0x20 - 0x7E)
_ ASCII whitespace (0x09 - 0x0D, 0x20)
• ASCII control characters (except NULL and whitespace)
× Non-ASCII bytes (0x80 - 0xFF)

which end up looking something like this:

00 01 10 20 30 40 50 60 70 80 90 A0 B0 C0 D0 E0 F0 FF

full hexdump with hexyl colors
00000000 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F ⋄••••••••__•__••
00000010 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F ••••••••••••••••
00000020 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F !"#$%&'()*+,-./
00000030 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 0123456789:;<=>?
00000040 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F @ABCDEFGHIJKLMNO
00000050 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F PQRSTUVWXYZ[\]^_
00000060 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F `abcdefghijklmno
00000070 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F pqrstuvwxyz{|}~•
00000080 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F ××××××××××××××××
00000090 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F ××××××××××××××××
000000a0 A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF ××××××××××××××××
000000b0 B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF ××××××××××××××××
000000c0 C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF ××××××××××××××××
000000d0 D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF ××××××××××××××××
000000e0 E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF ××××××××××××××××
000000f0 F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF ××××××××××××××××

these broad categories are enough to pick out common patterns like repeated null bytes and ASCII strings.
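for a concrete sense of these buckets, here's a quick sketch in Python (my own approximation of hexyl's default categories, not hexyl's actual code):

```python
# rough sketch of hexyl-style byte categories (my approximation,
# not hexyl's source): every byte falls into exactly one bucket
def category(b: int) -> str:
    if b == 0x00:
        return "null"
    if b in (0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20):
        return "ascii whitespace"
    if 0x21 <= b <= 0x7E:
        return "ascii printable"
    if b < 0x80:
        return "ascii control"   # remaining control chars, incl. 0x7F
    return "non-ascii"           # 0x80 - 0xFF

assert category(0x00) == "null"
assert category(ord("A")) == "ascii printable"
assert category(0x0A) == "ascii whitespace"
assert category(0xE3) == "non-ascii"
```

mapping each bucket to a terminal color is then a one-line lookup table.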
they also create enough variation to track visually when scrolling, which i find quite helpful. it can be really disorienting to scroll around a fully monochrome hexdump

i, however, am going further, with 18 total groups: one for each leading nybble (0X, 1X, 2X...), plus two extras for 00 and FF:

00 01 10 20 30 40 50 60 70 80 90 A0 B0 C0 D0 E0 F0 FF

full hexdump with my colors
00000000 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F ⋄••••••••→⏎••␍••
00000010 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F ••••••••••••••••
00000020 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F !"#$%&'()*+,-./
00000030 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 0123456789:;<=>?
00000040 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F @ABCDEFGHIJKLMNO
00000050 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F PQRSTUVWXYZ[\]^_
00000060 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F `abcdefghijklmno
00000070 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F pqrstuvwxyz{|}~•
00000080 80 81 82 83 84 85 86 87 88 89 8A 8B 8C 8D 8E 8F ××××××××××××××××
00000090 90 91 92 93 94 95 96 97 98 99 9A 9B 9C 9D 9E 9F ××××××××××××××××
000000a0 A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 AA AB AC AD AE AF ××××××××××××××××
000000b0 B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 BA BB BC BD BE BF ××××××××××××××××
000000c0 C0 C1 C2 C3 C4 C5 C6 C7 C8 C9 CA CB CC CD CE CF ××××××××××××××××
000000d0 D0 D1 D2 D3 D4 D5 D6 D7 D8 D9 DA DB DC DD DE DF ××××××××××××××××
000000e0 E0 E1 E2 E3 E4 E5 E6 E7 E8 E9 EA EB EC ED EE EF ××××××××××××××××
000000f0 F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 FA FB FC FD FE FF ×××××××××××××××╳

having more colors makes it possible to recognize more complex patterns, like the ascending offsets from example 2 or the different sections in example 3. ASCII text is still recognizable, but instead of solid cyan, it's a variated green and orange:

my colors
00000000 6C 6F 6F 6B 20 6D 61 2C 20 69 27 6D 20 41 53 43 look ma, i'm ASC
00000010 49 49 21 20 6C 6F 72 65 6D 20 69 70 73 75 6D 20 II! lorem ipsum
00000020 61 6E 64 20 61 6C 6C 20 74 68 61 74 20 69 67 and all that ig

hexyl's colors
00000000 6C 6F 6F 6B 20 6D 61 2C 20 69 27 6D 20 41 53 43 look ma, i'm ASC
00000010 49 49 21 20 6C 6F 72 65 6D 20 69 70 73 75 6D 20 II! lorem ipsum
00000020 61 6E 64 20 61 6C 6C 20 74 68 61 74 20 69 67 and all that ig

non-ASCII UTF-8, on the other hand, looks completely different, with its own unique pattern that's only visible if you have a large number of color groups:

my colors
00000000 73 6F 6D 65 20 55 54 46 2D 38 3A 20 E3 81 93 E3 some UTF-8: ××××
00000010 82 93 E3 81 AB E3 81 A1 E3 81 AF E3 80 81 E3 82 ××××××××××××××××
00000020 A2 E3 83 AA E3 82 B9 E3 81 A7 E3 81 99 EF BC 81 ××××××××××××××××

hexyl's colors
00000000 73 6F 6D 65 20 55 54 46 2D 38 3A 20 E3 81 93 E3 some UTF-8: ××××
00000010 82 93 E3 81 AB E3 81 A1 E3 81 AF E3 80 81 E3 82 ××××××××××××××××
00000020 A2 E3 83 AA E3 82 B9 E3 81 A7 E3 81 99 EF BC 81 ××××××××××××××××

there are a million more examples i could give, like negative numbers in two's complement (BD FF FF FF), machine code, encrypted data, color palettes, transformation matrices, and so on, but hopefully the ones i've given are enough to get my point across.

colorful output in a hexdump is useful for the same reason that syntax highlighting for code is useful: it takes advantage of our brains' powerful visual pattern recognition. it lets us notice details in the data just as quickly as we notice details in the environment around us. color-coded bytes should be as prevalent in hex editors as syntax highlighting is in code editors today.

so what can you do about it?
there are lots of tools out there that use color, here are some that i know of:

hex viewers:
- hexyl: byte categories by default, gradient option
- xcd-rgb: full rainbow byte coloring
- hevi: uses colors to indicate sections for certain file types
- xxd: option for byte categories, off by default (xxd's color output defaults to "auto", which doesn't output any color for me, so i'm not sure what it's doing)

hex editors:
- Hexerator: full rainbow byte coloring, and tons of other features
- REHex: multiple color options (including custom), off by default
- Hex Fiend: option for byte categories, off by default; custom colors if you're willing to work for it

if you know any other good ones, please let me know! if you work on any tools that show hexdumps, i highly recommend adding colors, ideally with a large number of groups (feel free to copy mine!). at the very least, making 00s more subtle than other bytes is extremely helpful.

the main goal of this article is to spread awareness that this feature exists. it provides a lot of utility with practically no downside, and more people should be asking for it. if you'd like to submit a feature request for the tool you use most, i hope this article can serve as an explanation for why it's worth adding. (this goes for hex editors, and also tools like xxd or hexyl that show hex but don't let you edit it)

while writing this article, i actually started making my own custom hex editor, called hexapoda >_<. it takes inspiration from Helix and Teehee (among others), with modal editing, multiple cursors, and selection-first operations (written in Rust, with Ratatui!). if enough people want, i might polish it up and write some docs so anyone can use it, but for now, it's just for me :3

entirely human-made, please don't hesitate to report a mistake or suggest a fix! discuss on Mastodon or Lobsters

--------------------------------------------------------------------------------

5.
We found a stable Firefox identifier linking all your private Tor identities Source: https://fingerprint.com/blog/firefox-tor-indexeddb-privacy-vulnerability/ Site: Fingerprint Author: Dai Nguyen (Senior Engineer, Security Research), Martin Bajanik (Staff Engineer, Research) Submitted: 2026-04-22 17:35 UTC (Hacker News) HN activity: 677 points · 190 comments Length: 1.5K words (~7 min read) Language: en We recently discovered a privacy vulnerability affecting all Firefox-based browsers. The issue allows websites to derive a unique, deterministic, and stable process-lifetime identifier from the order of entries returned by IndexedDB, even in contexts where users expect stronger isolation. This means a website can create a set of IndexedDB databases, inspect the returned ordering, and use that ordering as a fingerprint for the running browser process. Because the behavior is process-scoped rather than origin-scoped, unrelated websites can independently observe the same identifier and link activity across origins during the same browser runtime. In Firefox Private Browsing mode, the identifier can also persist after all private windows are closed, as long as the Firefox process remains running. In Tor Browser, the stable identifier persists even through the "New Identity" feature, which is designed to be a full reset that clears cookies and browser history and uses new Tor circuits. The feature is described as being for users who "want to prevent [their] subsequent browser activity from being linkable to what [they] were doing before." This vulnerability effectively defeats the isolation guarantees users rely on for unlinkability. We responsibly disclosed the issue to Mozilla and to the Tor Project. Mozilla quickly released the fix in Firefox 150 and ESR 140.10.0, and the patch is tracked in Mozilla Bug 2024220.
The underlying root cause is inherited by Tor Browser through Gecko’s IndexedDB implementation, so the issue is relevant to both products and to all Firefox-based browsers. The fix is straightforward in principle: the browser should not expose internal storage ordering that reflects process-scoped state. Canonicalizing or sorting results before returning them removes the entropy and prevents this API from acting as a stable identifier. Why this matters Private browsing modes and privacy-focused browsers are designed to reduce websites' ability to identify users across contexts. Users generally expect two things: First, unrelated websites should not be able to tell they are interacting with the same browser instance unless a shared storage or explicit identity mechanism is involved. Second, when a private session ends, the state associated with that session should disappear. This issue breaks both expectations. A website does not need cookies, localStorage, or any explicit cross-site channel. Instead, it can rely on the browser’s own internal storage behavior to derive a high-capacity identifier from the ordering of database names returned by an API. For developers, this is a useful reminder that privacy bugs do not always come from direct access to identifying data. Sometimes they come from deterministic exposure of internal implementation details. For security and product stakeholders, the key point is simple: even an API that appears harmless can become a cross-site tracking vector if it leaks stable process-level state. What is IndexedDB and what does indexedDB.databases() do? IndexedDB is a browser API for storing structured data on the client side. Web applications use it for offline support, caching, session state, and other local storage needs. Each origin can create one or more named databases, which can hold object stores and large amounts of data. The indexedDB.databases() API returns metadata about the databases visible to the current origin. 
In practice, developers might use it to inspect existing databases, debug storage usage, or manage application state. Under normal privacy expectations, the order of results returned by this API should not, in itself, carry identifying information. It should simply reflect a neutral, canonical, or otherwise non-sensitive presentation of database metadata. The issue we found comes from the fact that, in all Firefox-based browsers, the returned order was not neutral at all.

How indexedDB.databases() became a stable identifier

In Firefox Private Browsing mode, indexedDB.databases() returns database metadata in an order derived from internal storage structures rather than from database creation order. The relevant implementation is in dom/indexedDB/ActorsParent.cpp. In Private Browsing mode, database names are not used directly as on-disk identifiers. Instead, they are mapped to UUID-based filename bases via a global hash table:

using StorageDatabaseNameHashtable = nsTHashMap;
StaticAutoPtr<StorageDatabaseNameHashtable> gStorageDatabaseNameHashtable;

The mapping is performed inside GetDatabaseFilenameBase() called within OpenDatabaseOp::DoDatabaseWork(). When aIsPrivate is true, the website-provided database name is replaced with a generated UUID and stored in the global StorageDatabaseNameHashtable. This mapping:
- Is keyed only by the database name string
- Persists for the lifetime of the IndexedDB QuotaClient
- Is shared across all origins
- Is cleared only when Firefox is fully restarted

Later, when indexedDB.databases() is invoked, Firefox gathers database filenames via QuotaClient::GetDatabaseFilenames(...) called in GetDatabasesOp::DoDatabaseWork(). Database base names are inserted into an nsTHashSet. No sorting is performed before iteration. The final result order is determined by iteration over the hash set’s internal bucket layout.
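The mechanism described above can be modeled in a few lines of Python (a toy illustration of the failure mode, not Gecko code; all names here are my own): a process-global name-to-UUID map plus unsorted hash-set iteration hands every origin the same permutation.

```python
# Toy model of the leak: names are minted process-global UUIDs, and
# listing iterates a hash set without sorting, so the resulting order
# is a function of process-lifetime state, not of the origin.
import uuid

# Process-global state, shared by all "origins" until "restart".
_name_to_uuid: dict = {}

def _filename_base(name: str) -> str:
    # First open of a database name mints a UUID; later opens reuse it.
    if name not in _name_to_uuid:
        _name_to_uuid[name] = uuid.uuid4().hex
    return _name_to_uuid[name]

def list_databases(names: list) -> list:
    # Collect filename bases into a hash set and iterate unsorted,
    # analogous to iterating an unsorted hash table of filenames.
    bases = {_filename_base(n) for n in names}
    back = {v: k for k, v in _name_to_uuid.items()}
    return [back[b] for b in bases]  # order = internal bucket order

names = list("abcdefghijklmnop")
order_site_a = list_databases(names)  # "origin" A
order_site_b = list_databases(names)  # "origin" B, same process
assert order_site_a == order_site_b   # same permutation: linkable
```

Re-running the script (a new "process") mints fresh UUIDs and yields a different permutation, mirroring how only a full browser restart changes the identifier.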
Because UUID mappings are stable for the lifetime of the Firefox process, and hash table structure and iteration order are deterministic for a given internal layout, the returned ordering becomes a deterministic function of the generated UUID values, hash function behavior, and hash table capacity and insertion history. This ordering persists across tabs and private windows, resetting only upon a full Firefox restart. Crucially, the UUID mapping and hash set iteration are not origin-scoped. They are process-scoped.

Reproducing the issue

A simple proof of concept is enough to demonstrate the behavior. Two different origins host the same script. Each script:
- Creates a fixed set of named databases.
- Calls indexedDB.databases().
- Extracts and prints the returned order.

In affected Firefox Private Browsing and Tor Browser builds, both origins observe the same permutation during the lifetime of the same browser process. Restarting the browser changes the permutation. Conceptually, the output looks like this:

created: a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
listed: g,c,p,a,l,f,n,d,j,b,o,h,e,m,i,k

The important point is not the exact order itself, but rather that the order is not the original creation order, that the same order appears across unrelated origins, and that it persists across reloads and new private windows, even after all private windows are closed. Only a full browser restart yields a new one. That is exactly what you do not want from a privacy perspective.

Privacy impact

This issue enables both cross-origin and same-origin tracking within a single browser runtime.

Cross-origin impact

Unrelated websites can independently derive the same identifier and infer that they are interacting with the same running Firefox or Tor Browser process. That lets them link activity across domains without cookies or other shared storage.
Same-origin impact In Firefox Private Browsing mode, the identifier can persist even after all private windows are closed, provided the Firefox process itself is still running. That means a site can recognize a later visit in what appears to be a fresh private session. In Tor Browser, the stable identifier effectively defeats Tor Browser’s “New Identity” isolation within a running browser process, allowing websites to link sessions that are expected to be fully isolated from one another. Why this is especially serious in Tor Browser Tor Browser is specifically designed to reduce cross-site linkability and minimize browser-instance-level identity. A stable process-lifetime identifier cuts directly against that design goal. Even if it only survives until a full process restart, that is still enough to weaken unlinkability during active use. Entropy and fingerprinting capacity The signal is not just stable. It also has high capacity. If a site controls N database names, then the number of possible observable permutations is N!, with theoretical entropy of log2(N!). With 16 controlled names, the theoretical space is about 44 bits. That is far more than enough to distinguish realistic numbers of concurrent browser instances in practice. The exact number of reachable permutations may be somewhat lower because of internal hash table behavior, but that does not materially change the security story. The exposed ordering still provides more than enough entropy to act as a strong identifier. The fix The right fix is to stop exposing entropy derived from the internal storage layout. The cleanest mitigation is to return results in a canonical order, such as lexicographic sorting. That preserves the API's usefulness for developers while removing the fingerprinting signal. Randomizing output per call could also hide the stable ordering, but sorting is simpler, more predictable, and easier for developers to reason about. 
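The entropy arithmetic above is easy to check, and the proposed fix can be demonstrated in miniature (a quick sketch using Python's standard library; the example orderings are hypothetical):

```python
# With N site-controlled database names, the observable ordering can
# encode up to log2(N!) bits of identifier.
import math

bits = math.log2(math.factorial(16))
assert 44 < bits < 45  # "about 44 bits" for N = 16

# Canonicalizing the output (e.g. lexicographic sort) makes the result
# independent of any internal hash layout, destroying the signal:
layout_a = ["g", "c", "p", "a"]  # one hypothetical bucket order
layout_b = ["a", "p", "g", "c"]  # another hypothetical bucket order
assert sorted(layout_a) == sorted(layout_b)
```

Whatever the internal layout, every caller sees the same canonical list, which is exactly the property the patch restores.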
From a security engineering standpoint, this is an ideal fix: it has low conceptual complexity and minimal compatibility risk, and it directly eliminates the privacy leak. Responsible disclosure We responsibly disclosed the issue to Mozilla and to the Tor Project. Mozilla has released the fix in Firefox 150 and ESR 140.10.0, and the patch is tracked in Mozilla Bug 2024220. Because the behavior originates from Gecko’s IndexedDB implementation, downstream Gecko-based browsers, including Tor Browser, are also affected unless they apply their own mitigation. Building for privacy This vulnerability shows how a small implementation detail can create a meaningful privacy problem. The impact is significant. Unrelated websites can link activity across origins during the same browser runtime, and private-session boundaries are weakened because the identifier survives longer than users would expect. The good news is that the fix is simple and effective. By canonicalizing the output before returning it, browsers can eliminate this source of entropy and restore the expected privacy boundary. This is exactly the kind of issue worth paying attention to: subtle, easy to miss, and highly instructive for anyone building privacy-sensitive browser features. -------------------------------------------------------------------------------- 6. Ars Technica: Our newsroom AI policy Source: https://arstechnica.com/staff/2026/04/our-newsroom-ai-policy/ Site: arstechnica.com Submitter: zdw (Hacker News) Submitted: 2026-04-23 05:14 UTC (Hacker News) HN activity: 42 points · 25 comments Language: en No extractable content. -------------------------------------------------------------------------------- 7.
5x5 Pixel font for tiny screens Source: https://maurycyz.com/projects/mcufont/ Site: maurycyz.com Submitter: zdw (Hacker News) Submitted: 2026-04-19 15:19 UTC (Hacker News) HN activity: 594 points · 124 comments Length: 733 words (~4 min read) Language: en-us 2026-04-18 — 2026-04-20 (Programming) Font data (C header) All characters fit within a 5 pixel square, and are safe to draw on a 6x6 grid. The design is based on lcamtuf's 5x6 font-inline.h, which is itself inspired by the ZX Spectrum's 8x8 font. 5x5 is the smallest size that doesn't compromise legibility:

2x2: Impossible.
3x3: Technically possible, but unreadable.
4x4: Not enough to draw "E", "M" or "W" properly.
5x5: This font.

Five by five is actually big enough to draw most lowercase letters one pixel smaller, making them visually distinct from uppercase. Narrower 4x5 and 3x5 dimensions are possible, but would require sacrificing the M and the dotted zero, and would reduce U/V/Y distinctiveness. There's no artistic reason to make all characters five wide just because a few must be... but using a constant width makes programming a lot easier: The length of a string on screen is always 6 times the number of characters. It also makes compact layouts much safer: There's no need to worry that a number will overflow because "8978" is longer than "1111". The whole font takes up just 350 bytes of memory, which makes it ideally suited to 8-bit microcontrollers like the AVR128DA28 (16 kB of RAM). These are cheap, low power and robust... but they fall short on graphics: Even a low-resolution 384x288 display has 110 thousand pixels: way too big to fit in the AVR's memory. ... except most projects don't need anywhere near that many pixels. A 160x128 or 128x64 OLED is more practical and cheaper — but these need hand-drawn, pixel-efficient fonts to make good use of them. For reference, here's a vector font rendered at a similar scale: Actually 6 tall, but the letters are narrower, so I'll allow it.
Antialiasing, several megabytes of code, a megabyte of font data, and it's still terrible compared to 350 hand-crafted bytes. Real pixels: Pixels aren't perfect squares, so the font won't actually look like the rendering at the top of this post: This is it on an actual screen: I actually really like the pseudo-dropshadow effect created by the subpixels. This won't happen on monochrome displays, but the font will still look smoother than you might expect. The gaps between pixels really help sell the "e" and "g", but this same effect should allow... Even smaller fonts: While 5x5 is the smallest no-compromise resolution, a 3x5 isn't too bad: There are 32,768 glyphs at this size. (27,904 are distinct) The "M", "W" and "Q" suffer, but it's still got a distinct O and zero. Something like this might actually be a good option if you need to cram (50%) more columns into a display. That's still readable, so what about 3x4? There are 4,096 glyphs at this size. (3,392 are distinct) At this size, there's no way to have a distinct upper and lowercase, so I've picked whatever style works the best in the limited space. The numbers have also taken a hit, but still work ok. How about 3x3? There are 512 glyphs at this size. (400 are distinct) The main loss was the numbers, but the letters don't include any duplicates and are somewhat recognizable. This font is hugely improved by being displayed on real hardware: That means it's still too big. How about 2x3? There are 64 glyphs at this size. (44 are distinct) Ok, this is getting ridiculous. Most letters are unrecognizable, and there are quite a few duplicates. In case you couldn't tell, the bottom line reads "Hello World". Flipping the aspect ratio to a 3x2 makes it a lot better: There are 64 glyphs at this size. (44 are distinct) Simulated pixel grid More letters have horizontal detail (M, W, N, Q, G, P, etc.) than have vertical detail (E, F).
The bottom line reads "you can probably read this", although you might have to squint or zoom out. ... and for the sake of completeness, a 2x2: There are 16 glyphs at this size. (10 are distinct) On paper, there are 16 possible 2x2 images, but one of them is blank and 5 of them are shifted copies of another one. That leaves 10, just enough to do all the digits... but because they have no resemblance to the originals, it's more of a secret code than a font.

Related:
- /projects/mcufont/mcufont.h: The 5x5 font.
- /projects/mcufont/test.c: Program to preview the font.
- https://lcamtuf.coredump.cx/soft/embedded/font-inline.h: The original font.
- https://moonbench.xyz/projects/tiny-pixel-art-fonts/: More tiny fonts.

--------------------------------------------------------------------------------

8. A True Life Hack: What Physical 'Life Force' Turns Biology's Wheels? Source: https://www.quantamagazine.org/what-physical-life-force-turns-biologys-wheels-20260420/ Site: Quanta Magazine Author: Natalie Wolchover Published: 2026-04-20 HN activity: 67 points · 13 comments Length: 2.2K words (~10 min read) Language: en You’re the earliest known life form. There’s no food around right now. It would be great to go somewhere else. But you’re stuck. Really stuck. At your size (a couple of microns), water feels like tar, or rather, it feels the way being stuck in tar will eventually feel to a human. What do you do? [One or more billion years later.] You’ve found the perfect solution. Literally perfect. “You can assume the system is working optimally,” said Aravinthan Samuel, a biophysicist at Harvard University. Evolution has created the flagellar motor, a combination propeller/brain that enables single-celled bacteria to move toward food sources. It’s an electric motor that rotates at several hundred revolutions per second — faster than the flywheel in a race car engine — to twirl a tail-like flagellum that pushes the cell along.
When the flagellar motor rotates counterclockwise, it propels the cell through the water 10 or more times its own length in a second. The motor can also rotate clockwise, causing the cell to tumble about randomly. This amazing, self-assembling, signal-processing, direction-switching molecular machine is so powerful yet so spare that, billions of years later, it’s still used by bacteria in virtually every gut and puddle on Earth. Since the discovery of the bacterial flagellar motor in the 1970s, biologists and creationists alike have marveled at its design like medieval architects staring with awe at the dome of the Pantheon built by their Roman ancestors. It’s hard to fathom the level of engineering achievable by a billion years of bacterial evolution, especially with only 20 minutes between cell generations, which allows for a truly astronomical number of mutations and trial runs. Creationists hold up the bacterial flagellar motor as a prime example of intelligent design — specifically the concept of “irreducible complexity,” a biological system so intricate, they say, that it couldn’t possibly have arisen in stages through the gradual, stepwise process of Darwinian evolution. Yet it very much did. Over the past few decades, scientists have toiled to unravel how the flagellar motor works — namely, how it rotates and switches directions. Now they finally have. A wave of studies since 2020 has cracked the molecular structures of the flagellar motor’s parts, including, most importantly, the small cogwheels that turn the larger cogwheel at the flagellum’s base. The final pieces of this dynamic puzzle fell into place as recently as March 2026. “My lifelong quest is now fulfilled,” said Mike Manson, a professor emeritus of biophysics at Texas A&M University who started studying the flagellar motor in the 1970s. “I finally understand how this thing I’ve been studying for 50 years actually works. 
That’s about as satisfying as can be.” The workings of the flagellar motor are ingenious indeed. But when I began interviewing these scientists about what they’ve figured out, I didn’t anticipate that the explanation of the motor would clarify all of biology for someone like me, who seeks mechanistic, physical explanations. The machine, I learned, exploits a driving force I had not known about (though biophysicists have) — the physical “life force” that powers processes in cells. This “proton motive force” doesn’t just turn the cogs of the flagellar motor; it’s the juice we all run on. The flagellar motor was discovered by the late Howard Berg, an ingenious experimenter who spent most of his career at Harvard. Berg set out in the early 1970s to apply his training in physics to understanding how bacteria move. The problem was that, under a microscope, Escherichia coli, Salmonella, and other motile bacteria almost instantly swam out of frame. So Berg invented and built an automatic tracking microscope that could keep a bacterium in view as it moved around. “What it recorded were all the corrections that had to be made to the microscope stage in order to keep the bacterium in place, and that of course gives you a readout of what the path of the swimming bacterium was,” said Manson, who joined Berg’s project as a postdoc in 1975. The data revealed that bacteria “run and tumble” — that is, they switch back and forth between swimming straight and rolling around chaotically. Berg theorized that bacteria change their swimming state based on the chemical gradients sensed as they swim. Their default behavior is to swim straight. If the concentration of sugars and other nutrients is increasing, the cell keeps going forward. If the concentration drops, it tumbles; reoriented in a new direction, the bacterium then resumes swimming straight. This process keeps the bacterium in the vicinity of harvestable molecules, which it absorbs through channels in its cell wall and membrane. 
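Berg's run-and-tumble strategy can be caricatured in a few lines of code (a toy one-dimensional model of my own, not from the article): keep swimming while the nutrient concentration rises, and tumble to a random new heading when it falls.

```python
# Toy 1-D run-and-tumble chemotaxis (illustrative model, not from the
# article): the agent keeps its heading while concentration increases
# and picks a random new heading when it decreases.
import random

def chemotax(steps: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    x = 0.0
    heading = rng.choice([-1.0, 1.0])
    concentration = lambda pos: pos  # nutrients increase to the right
    last = concentration(x)
    for _ in range(steps):
        x += heading
        now = concentration(x)
        if now < last:                      # gradient dropped: tumble
            heading = rng.choice([-1.0, 1.0])
        last = now
    return x

# Even though every tumble is random, the walk drifts up the gradient.
assert chemotax(1000) > 0
```

This captures the key point of the strategy: no steering is needed, only a biased coin between "keep going" and "reorient", which is exactly the counterclockwise/clockwise switch the motor provides.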
A transmission electron microscopy image reveals the cluster of flagella that a Pseudomonas fluorescens bacterium uses to move around water in soil. Dr Tony Brain/Science Photo Library Berg guessed that the flagellar motor was a rotor that turned the flagellum like a screw. “He did it by sticking two cells together by their flagella and seeing them spinning in opposite directions from each other,” Manson said. “From that, with no knowledge, he hypothesized that the bacterial flagellum rotates. Way ahead of his time. That was 50 years before understanding how this motor works.” Further experiments indicated that the flagellar motor also switches direction. When its flagella — bacteria typically have several protruding from their surfaces — are all spinning counterclockwise, they form a bundle that trails behind the swimming cell like a braid in the wind, steering it straight. But as soon as one flagellar motor reverses direction and starts rotating clockwise, the bundle falls apart; the reverse-twirling filament unravels the braid and puts the cell’s flagellar motors at cross-purposes, kicking the cell around. Before Berg’s work, “the idea of a molecular motor was bonkers — no way anything rotates,” said Samuel, Berg’s former student who now runs a Harvard lab of his own. It could wiggle, sure, but rotate? “It requires a certain geometry that people didn’t think was accessible to biology.” Au contraire. “Biology can build wheels,” Samuel said. “Now we know.” Improvements over the last 15 years in an imaging technique called cryo-EM (cryogenic electron microscopy) have enabled researchers to see the flagellar motor’s component parts. That has clarified how it works. At the base of the motor is the “C ring” (or “cytoplasmic ring”), a ring of 34 identical proteins floating in the cytoplasm within the cell membrane. Scientists in the 1980s and ’90s figured out that when the C ring rotates, the flagellum does too. But why and how it rotates wasn’t obvious. David S. 
Goodsell/PDB101.rcsb.org/Modified by Quanta Magazine The stars of the show, recent research showed, are the motor’s “stators,” smaller protein complexes that anchor themselves above and outside the C ring. The number of stators varies by bacterial species (E. coli has 10 or 12 available per flagellum), and how many lock into the C ring at a given time depends on the weight of the cell or the viscosity of the surrounding fluid. Each stator consists of two central proteins that dangle from the cell wall and five proteins of a different kind that form a pentagonal ring around the pair. This pentagonal structure is the part that rubs up against the C ring. The 5:2 geometry of the stators was revealed in 2020 in a pair of cryo-EM studies, one by Susan Lea and a team at the University of Oxford, and one from a group led by Nicholas Taylor of the University of Copenhagen and Marc Erhardt of Humboldt University of Berlin. The finding pointed to a hypothesis about how the whole motor works: The stators’ pentagonal rings rotate, which then turns the larger C ring, and with it the whole flagellum. Each pentagonal ring turns like a turnstile, one-tenth of a revolution at a time. What pushes through the turnstile is a stream of protons — the same positively charged particles found in atoms. Protons flow into cells of their own accord, for reasons I’ll get to. This is the proton motive force. The asymmetric positioning of two proteins inside a pentagonal ring allows a proton from outside the cell to weakly bond to one of them. As the proteins jostle, the proton unbinds, exerting torque on the ring as it goes. That creates an opportunity for the same process to take place with the other central protein. In this way, protons effectively pedal the engine of the flagellar motor. Every second, more than 2,000 of them pass through the pentagonal turnstiles. In December 2025, Samuel published the results of an experiment that verified this. 
Protons always want to flow into cells, never out. In passing that way, they always push the pentagonal rings clockwise. Normally, this turns the C ring counterclockwise (like the opposite turning of interlocking gears), which propels the swimming cell forward. How, though, can the flagellar motor switch directions? In 2024, another pair of cryo-EM studies, from Lea, then with a team at the National Institutes of Health, and a group led by Tina Iverson at Vanderbilt University, revealed the answer. Recall that a flagellar motor switches directions, causing the bacterium to tumble, when environmental conditions seem to be getting worse. When fewer nutritious molecules drift in, the bacterium “phosphorylates” proteins called CheY, tagging them with phosphorus atoms. Within milliseconds, phosphorylated CheY molecules diffuse around the cell, and one of them binds to one of the C-ring proteins. This small change triggers a transformation: The protein flips into a different structural configuration, which flips the next protein, and then the next. Almost instantly the whole C ring reshapes itself, like a hair clip snapping into the other of its two stable forms. Samuel’s team confirmed that the system is sensitive to a single signaling molecule in a study published in March 2026. While the C ring is in its altered shape, the stators — the little clockwise-revolving motors — rotate against the inner edge of the C ring, rather than its outer edge. As a result, the C ring turns clockwise too. The flagellar bundle falls apart, and the cell tumbles. Soon enough, the unstable phosphorus atom falls off the CheY protein, causing the proteins of the C ring to flip back to their original stable formation and turn counterclockwise again. The bacterium returns to forward movement, in a new direction, in search of more food. “It’s a really elegant way of turning a unidirectional power into bidirectional rotation of the large object,” said Lea, who is now at St. 
Jude Children’s Research Hospital. The proton motive force that drives the flagellar motor was proposed in 1961 by Peter Mitchell, a biochemist who worked out of his own private lab at a country estate in Cornwall, England. Though initially dismissed and even ridiculed, Mitchell went on to win the 1978 Nobel Prize in Chemistry for his idea that a current of protons constantly flows into the cell as the cell vigorously pumps them back out, and that this is the driving force behind key cellular processes. Protons flow in because they’re diffusing from an area of high concentration (outside the cell) to an area of low concentration (inside). There are fewer than 100 free protons inside a bacterium at a time, while a similar volume of the surrounding water has tens of thousands. The cell maintains this state with machines called electron transport chains that pump out thousands of protons per second. As protons are pumped out, thousands more flow in, drawn by the net negative electric charge and the general tendency for entropy to rise as particles (in this case, protons) spread out in space ever more evenly. Cells have rigged up all kinds of molecular machines that, like water mills on rivers, take advantage of proton currents coming into the cell. “It boggles the normal human understanding of how things work,” Manson said. “How can you have thousands and thousands of protons coming into the cell every second and still have only a few dozen inside the cell? Because they bind to something, they get pumped out again. The equilibria are so incredibly fast.” So what makes the cell go, what breathes life into the atomic arrangements, is the efficient removal of protons so that more protons will flow. “If you were to open up a channel to protons, they would come pouring into the cell, and the proton motive force would be gone instantly,” Manson said. He’s seen this happen, when cells starve and can’t pump enough protons out. 
The voltage drops to nothing, and the cell’s machinery shuts down. If you’re a bacterium, your flagellar motor stops. You’re stuck. Rarely have I loved biology more than when marveling at the flagellar motor and the influx of protons that turns its gears. “The entropic energy of the proton motive force gets converted into the kinetic energy of the rotation,” Manson said. “That’s all it is. All of it is just that. If you understand that, you basically understand the underpinnings of all that happens in biology.” -------------------------------------------------------------------------------- 9. The Onion to Take over InfoWars Source: https://www.nytimes.com/2026/04/20/business/infowars-alex-jones-the-onion.html Site: The New York Times Author: Benjamin Mullin, Elizabeth Williamson Published: 2026-04-20 HN activity: 135 points · 22 comments Length: 486 words (~3 min read) Language: en A new deal, which would allow The Onion to use the Infowars name and website address, must be approved by a Texas judge. The Onion, a satirical news outlet, wants to convert the right-wing Infowars site into a parody of itself. Credit: Jamie Kelter Davis for The New York Times The Onion Has a New Plan to Take Over Infowars April 20, 2026 When Infowars, the website founded by the right-wing conspiracist Alex Jones, came up for sale two years ago, an unlikely suitor stepped up. The Onion, a satirical news outlet, planned to convert the site into a parody of itself. That sale was scuttled by a bankruptcy court. Now, The Onion has re-emerged with a new plan: licensing the website from Gregory Milligan, the court-appointed manager of the site. On Monday, Mr. 
Milligan asked Maya Guerra Gamble, a judge in Texas’ Travis County District Court overseeing the disposition of Infowars, to approve that licensing agreement in a court filing. Under the terms, The Onion’s parent company, Global Tetrahedron, would pay $81,000 a month to license Infowars.com and its associated intellectual property — such as its name — for an initial six months, with an option to renew for another six months. The licensing deal has been agreed to by The Onion and the court-appointed administrator. But it is not effective until Judge Guerra Gamble approves it, and Mr. Jones could appeal any ruling. That means the fate of Infowars remains in limbo until the court rules, probably sometime in the next two weeks. Mr. Jones continues to operate Infowars.com and host its weekday program, “The Alex Jones Show.” Mr. Jones had no immediate comment. The battle over Infowars has been a long and fraught saga, and Mr. Jones — a notorious peddler of lies and invective — has used his bully pulpit for more than a year to crusade against The Onion’s efforts to take over the platform. The site is in limbo because of a series of defamation lawsuits against Mr. Jones filed by families of victims of the mass shooting in 2012 at Sandy Hook Elementary School in Connecticut, which Mr. Jones falsely claimed was a hoax. A memorial for the students and teachers who died in the Sandy Hook school shooting in 2012. Credit: Ángel Franco/The New York Times -------------------------------------------------------------------------------- 10. 
Over-editing refers to a model modifying code beyond what is necessary Source: https://nrehiew.github.io/blog/minimal_editing/ Site: nrehiew.github.io Submitter: pella (Hacker News) Submitted: 2026-04-22 17:51 UTC (Hacker News) HN activity: 358 points · 204 comments Length: 3.7K words (~16 min read) Language: en-us Code for this post is available here. AI-assisted coding has become the norm, and with tools like Cursor, GitHub Copilot, Claude Code, and Codex, we are increasingly letting models touch our code. If you have used any of these tools in the past year, you have probably experienced something like this: you ask the model to fix a simple bug (perhaps a single off-by-one error, or maybe a wrong operator). The model fixes the bug, but half the function has been rewritten. An extra helper function has appeared. A perfectly reasonable variable name has been renamed. New input validation has been added. And the diff is enormous. I refer to this as the Over-Editing problem: the tendency of models to rewrite code that didn’t need rewriting. This matters more than it might seem. Code review is already a bottleneck, and reviewers need to understand what changed, why it changed, and whether the change is safe. A model that rewrites entire functions, even correctly, makes this job dramatically harder, as the code is now completely unrecognizable. In this post, I will investigate this problem: whether existing LLMs have a tendency to over-edit and whether we can train models to be more faithful editors. Over-Editing Figure 1: A classic example of the Over-editing problem. GPT-5.4 (High) rewrites the entire function when the correct fix is simply changing range(len(x) - 1) to range(len(x)). Over-editing refers to a model modifying code beyond what is strictly necessary to fix the problem at hand. To be precise: a model is over-editing if its output is functionally correct but structurally diverges from the original code more than the minimal fix requires. 
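To make the failure mode concrete, here is a hypothetical example in the spirit of Figure 1 (the function and names are illustrative, not taken from the benchmark): the buggy version drops the last element, and the minimal fix changes only the loop bound.

```python
# Buggy version: the off-by-one in range() skips the last element.
def total(xs):
    acc = 0.0
    for i in range(len(xs) - 1):  # bug: should be range(len(xs))
        acc += xs[i]
    return acc

# Minimal fix: one token changes, everything else is preserved.
def total_fixed(xs):
    acc = 0.0
    for i in range(len(xs)):      # fix: include the last element
        acc += xs[i]
    return acc
```

An over-editing model might additionally rename `acc`, add input validation, or rewrite the loop as `sum(xs)`; all of that passes the same tests while inflating the diff.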
The example in Figure 1 illustrates this well. The bug is a single off-by-one error in a range() call (range(len(x) - 1) should be range(len(x))) and the correct fix is a single line. GPT-5.4 (with high reasoning effort) responds by rewriting the entire function: it adds explicit None checks, introduces np.asarray conversions with dtype=float, adds finite-value masking, validates array sizes, changes the curve_fit call signature, and replaces the plotting logic entirely. While the output passes the tests and is functionally correct, the diff is enormous, and none of those additions were asked for or even necessary. It helps to think about this in terms of the kind of work being done. Software engineering broadly splits into two modes: green-field (building something new from scratch) and brown-field (working within an existing codebase). In brown-field work specifically, the existing code is understood by the team and was deliberately written the way it is. The model’s job is to fix the issue and nothing else. A common piece of advice for working with AI coding tools is to simply write more tests, because if the tests pass, the code is fine. Over-editing, however, is a brown-field failure mode that, unlike correctness failures, is invisible to test suites. As models generate more code, engineers have more to review, and over-editing makes that harder. There is more complex logic to parse, more lines of code to read, and a higher chance that overall codebase quality quietly degrades. Measuring Over-Editing To study over-editing, we first need a dataset of code edits where the “ground truth” edit is well-defined with some degree of “minimality”. 
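One way to build such a dataset is to corrupt known-good solutions programmatically so that the ground-truth fix is the exact reversal of the corruption. A minimal sketch of one operator-flip corruption using the standard library `ast` module (Python 3.9+ for `ast.unparse`; the helper names are mine, not from the post's code):

```python
import ast

class FlipLt(ast.NodeTransformer):
    """Corrupt the first `<` comparison into `<=` (an operator-flip corruption)."""
    def __init__(self):
        self.flipped = False

    def visit_Compare(self, node):
        self.generic_visit(node)
        if not self.flipped and isinstance(node.ops[0], ast.Lt):
            node.ops[0] = ast.LtE()  # introduce the off-by-one-style bug
            self.flipped = True
        return node

def corrupt(src: str) -> str:
    """Return a syntactically valid but buggy variant of `src`."""
    return ast.unparse(FlipLt().visit(ast.parse(src)))

buggy = corrupt("def is_minor(age):\n    return age < 18\n")
# `age < 18` becomes `age <= 18`; the ground-truth fix is exactly the reversal
```

Because the transformation is mechanical and single-token, the minimal edit is known by construction, which is what makes the over-editing measurement well-defined.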
Rather than using another LLM to introduce bugs (which is what most existing benchmarks do), we programmatically corrupt 400 problems from BigCodeBench, which gives us more fine-grained control: things like flipping a comparison operator (< → <=), swapping + for -, or changing boolean values (True → False).1 Each corrupted sample remains syntactically valid and is verified to break the corresponding test cases. This ensures that the ground truth edit is exactly the reversal of the corruption and nothing more, thus making the edit minimal by construction. We can then evaluate not just whether a model fixes the bug, but how much else it changed in the process. Metrics Most coding benchmarks evaluate models on correctness using some variant of Pass@1. However, Pass@1 is necessary but not sufficient: a model can score perfectly on Pass@1 while completely rewriting every function it touches. For this experiment, we need metrics that capture how much the model changed beyond what was required. Token-level Levenshtein Distance. Unlike standard Levenshtein, which counts the minimum number of character insertions, deletions, and substitutions needed to transform one string into another, we use a Python token-level variant. The code is first passed through Python’s tokenizer, which splits it into its atomic syntactic units (def, add, (, a, ,, b, ), :, return, a, +, b). Levenshtein is then computed over this token sequence rather than raw characters. For example, consider the following two functions:

    def add(a, b):
        return a + b

    def someotherfunctionname(a, b):
        return a + b

Character-level Levenshtein gives a distance of 19. Token-level Levenshtein gives a distance of 1, since someotherfunctionname becomes a single token. We normalize by total token count so scores are comparable across functions of different lengths. In addition, rather than simply comparing the model’s output to the ground truth, we compare both against the corrupted input. 
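The token-level metric can be sketched with the standard library alone. This is my own implementation of the idea described above, not the post's code:

```python
import io
import tokenize

def tokens(src: str) -> list[str]:
    """Split Python source into token strings using the stdlib tokenizer."""
    if not src.endswith("\n"):
        src += "\n"  # the tokenizer expects a trailing newline
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER}
    return [tok.string
            for tok in tokenize.generate_tokens(io.StringIO(src).readline)
            if tok.type not in skip]

def levenshtein(a: list[str], b: list[str]) -> int:
    """Classic dynamic-programming edit distance over token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute
        prev = cur
    return prev[-1]

d = levenshtein(tokens("def add(a, b): return a + b"),
                tokens("def someotherfunctionname(a, b): return a + b"))
# d == 1: only the function-name token differs
```

Normalizing d by the token count of the original function gives a length-independent score, as described above.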
Let $C$ be the corrupted solution, $G$ the ground truth, and $M$ the model’s output. The true minimal edit (simply the reversal of the corruption) is $D_{\text{true}} = d(G, C)$ and the model’s edit is $D_{\text{model}} = d(M, C)$, giving a relative patch score: $$S(M) = D_{\text{model}} - D_{\text{true}}$$ Values closer to zero indicate that the model’s patch resembles the true minimal fix. The intuition is that we can interpret the original uncorrupted solution as the best possible edit to the corrupted solution, compute the score for this best possible patch, and then compare with the model’s output. Added Cognitive Complexity. Cognitive Complexity (an improvement over Cyclomatic Complexity) measures how hard code is to understand. It penalizes nesting, recursion, mixed logical operators, and non-obvious control flow. For example, a straight line of code with no branches is much easier to read than something that requires a reader to hold state, such as an if, a loop, or try/except. An example is shown below:

    def process(items):
        result = []
        for item in items:            # +1
            if item > 0:              # +2 (nesting penalty: inside a loop)
                if item % 2 == 0:     # +3 (nesting penalty: two levels deep)
                    result.append(item)
        return result
    # Cognitive Complexity: 6

Since all our corruptions change values rather than structure, the correct fix should always add zero Cognitive Complexity. Any increase in the model’s output was introduced unprompted and is unnecessary. We report the difference between the model’s output and the original, which should be zero for a faithful minimal edit. Values below zero are unwanted too, since unnecessary simplifications are also a departure from a minimal edit. Do Models Over-Edit? Yes, even frontier ones. 
Model                     Pass@1 ↑   Normalized Levenshtein ↓   Added Cognitive Complexity ↓

Reasoning Models
GPT-5.4                   0.723      0.395                      2.313
Claude Opus 4.6           0.912      0.060                      0.200
Gemini 3.1 Pro Preview    0.858      0.145                      0.501
GLM 5 High                0.859      0.099                      0.320
Qwen 3.6 Plus             0.858      0.145                      0.048
Kimi 2.5                  0.835      0.151                      0.770
DeepSeek R1               0.820      0.232                      0.673
DeepSeek Chat V3.1        0.795      0.232                      0.694
GPT-5 High                0.713      0.438                      3.832

Non-Reasoning Models
GPT-5.4                   0.770      0.327                      1.563
Claude Opus 4.6           0.900      0.079                      0.313
Gemini 3.1 Pro Preview    0.860      0.129                      0.358
GLM 5                     0.840      0.097                      0.235
Qwen 3.6 Plus             0.870      0.106                      0.605
Kimi 2.5                  0.770      0.140                      0.687
DeepSeek V3               0.800      0.201                      0.803
DeepSeek Chat V3.1        0.802      0.235                      1.223
GPT-5 Minimal             0.738      0.397                      2.877

Table 1: Performance comparison of reasoning and non-reasoning models. Models that appear in both are hybrid models. Best scores for each metric are bolded. Among the latest frontier models, GPT-5.4 over-edits the most. It has a Levenshtein of 0.39 in reasoning mode and 0.33 in non-reasoning, with Added Cognitive Complexity of 2.31 and 1.56 respectively. Despite this, its Pass@1 is only 0.723 and 0.770, making it one of the weakest correctness performers too. Claude Opus 4.6 achieves the highest Pass@1 of any model evaluated (0.912 reasoning, 0.900 non-reasoning) while also producing the smallest diffs with Levenshtein of 0.06 and 0.08, Added Cognitive Complexity of 0.20 and 0.31. Gemini 3.1 Pro Preview sits in similar territory, with GLM 5 arguably the most conservative model among the open weight ones. Does Prompting Help? Many papers that claim to uncover a new LLM failure mode do not first test whether the model can do the task when asked directly. A behavior that looks impossible in one setup may be easy under an explicit prompt, so I investigate the impact of adding “IMPORTANT: Try to preserve the original code and the logic of the original code as much as possible” to the prompt. Figure 2: Change in Pass@1 and Levenshtein Distance when models are prompted explicitly to keep edits minimal. 
Models are colour coded by reasoning mode. With explicit prompting, every model reduces its Levenshtein Distance and, with the exception of DeepSeek R1/V3, also improves its Pass@1. One interpretation is that the constraint to make minimal edits inadvertently narrows the search space of possible fixes, steering models toward the kind of precise, targeted change that is more likely to be correct. The effect on Levenshtein Distance is much more pronounced in reasoning models, which is likely the result of their stronger instruction-following ability. Does Reasoning Mean Overthinking and Over-Editing? Reasoning models are generally assumed to be better at coding tasks, and they do score higher on Pass@1. But since we are interested in the style of these edits, we need to look at the results through a different lens. Figure 3: Comparison of the Levenshtein Distance of reasoning and non-reasoning models. Models are grouped into pairs of one reasoning and non-reasoning. Lower bars are better. Labels above each pair indicate the number of problems where both models get the answer correct. Figure 3 groups the models into pairs where each pair contains a reasoning and non-reasoning model from the same family. For each pair, we plot the Levenshtein Distance of only the samples where both models get the answer correct. This allows us to isolate edit minimality from correctness since a model that fails more often has fewer samples to over-edit on, which would otherwise bias the comparison. In the generic setting (top), reasoning models over-edit more than their non-reasoning counterparts in the majority of pairs. DeepSeek V3, GPT-5, GPT-5.4, Gemini 3.1 Pro Preview, Qwen 3.6 Plus, and Kimi 2.5 all show the reasoning bar sitting higher. Reasoning models seem to naturally produce more elaborate rewrites, where the model reasons its way into a “better” implementation rather than a minimal fix. 
The notable exception is Claude Opus 4.6, where the reasoning variant edits substantially less than its non-reasoning counterpart. In the explicit setting (bottom), the picture changes considerably. Once models are told to preserve the original code, reasoning models have much lower Levenshtein Distance than their non-reasoning counterparts and match or undercut them in almost every pair. Claude Opus 4.6 (reasoning) drops to the lowest Levenshtein of any model in this setting. GPT-5 and GPT-5.4 both see their reasoning variants fall significantly, though GPT-5.4’s non-reasoning model still edges ahead. Therefore, the takeaway is that the default behavior of most reasoning models is to over-edit. Left unconstrained, the extended reasoning gives models more room to “improve” code that doesn’t need improving. But, that same reasoning capacity also makes them better at following the constraint once it is given. The gap between the generic and explicit setting is consistently larger for reasoning models, which suggests the over-editing is not a fundamental limitation but rather a default behavior that can be overridden. Training A natural next question: can we train models to be more faithful editors? For this experiment, I start with Qwen3 4B 2507 Instruct as the base model. I use both 0-shot and 8-shot prompting together with the explicit instruction to preserve the original code as baselines. All other methods are prompted in the generic setting without the explicit instruction during evaluation. Setup I first create a synthetic dataset of corrupted problems from DeepCoder using the same approach detailed above. In addition to this programmatically generated dataset, I also use the base Qwen3 4B 2507 Instruct model to create a synthetic dataset via self-distillation. Concretely, I prompt the model to generate 8 completions per problem, keeping only the samples that are functionally correct and ranking them by Levenshtein Distance. 
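The rejection-sampling step described above can be sketched as follows. The helper name and tuple shape are mine; the real pipeline would also run the test harness to set the pass/fail flag:

```python
def select_rsft(completions, k=3):
    """From (code, passed_tests, levenshtein) tuples, keep only functionally
    correct completions and return the k with the smallest edit distance."""
    correct = [c for c in completions if c[1]]
    correct.sort(key=lambda c: c[2])   # rank by edit distance, smallest first
    return [c[0] for c in correct[:k]]

samples = [("patch_a", True, 0.04), ("patch_b", False, 0.01),
           ("patch_c", True, 0.35), ("patch_d", True, 0.10)]
best = select_rsft(samples, k=3)  # ["patch_a", "patch_d", "patch_c"]
```

Note that "patch_b" is discarded despite having the smallest diff: correctness filtering happens before ranking, so the model is never trained on small-but-wrong edits.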
The model is then trained without the explicit instruction, similar to Context Distillation. We evaluate four different methods:

SFT: Supervised fine-tuning directly on the programmatically generated dataset.
rSFT: Rejection-sampled SFT, where we train on the completions with the 3 lowest Levenshtein Distances for each sample from the self-distillation dataset.
DPO: Preference optimization between the completions with the highest and lowest Levenshtein Distances for each sample from the self-distillation dataset.
RL: Reinforcement learning with a reward combining functional correctness and Levenshtein-based edit minimality.

The reward structure is a weighted sum of the Levenshtein Distance and a penalty for failing to pass the test cases:

    r = r_edit + 0.1   # if generation passes all test cases
    r = -0.2           # otherwise
    # r_edit is the normalized Levenshtein-based reward

Does It Work?

Model              Pass@1 ↑   Norm. Levenshtein ↓   Added CC ↓
Baseline (0-shot)  0.735      0.169                 0.731
Baseline (8-shot)  0.775      0.115                 0.479
SFT                0.932      0.002                 0.000
rSFT               0.782      0.100                 0.435
DPO                0.752      0.021                 0.113
RL                 0.802      0.046                 0.112

Table 2: Performance comparison of various fine-tuning methods when trained on in-domain data: the training and test set have the same corruption types.

On the first attempt, SFT is almost suspiciously good, as the resultant model seems to have perfectly learned the task. I found this extremely surprising and had the initial hypothesis that the model was just memorizing the reversal for this set of corruptions rather than learning a general minimal-editing behavior. As a result, I re-created both synthetic datasets using a completely different set of corruptions than the evaluation set to test for generalization. The core hypothesis was that the model was simply learning to reverse a particular set of corruptions.

Does It Generalize?

Model              Pass@1 ↑   Norm. Levenshtein ↓   Added CC ↓   LiveCodeBench Change ↓
Baseline (0-shot)  0.735      0.169                 0.731        —
Baseline (8-shot)  0.775      0.115                 0.479        —
SFT                0.458      −0.008                0.006        −0.149
rSFT               0.780      0.107                 0.501        −0.069
DPO                0.787      0.092                 0.348        −0.046
RL                 0.782      0.050                 0.185        +0.006

Table 3: Performance comparison of various fine-tuning methods when trained on out-of-domain data: the training and test set have different corruption types.

SFT collapses entirely out-of-domain. Pass@1 drops to 0.458 as the model has learned to make specific minimal changes regardless of whether they fix anything. rSFT and DPO are both better, but the overall improvement is slight compared to the 8-shot baseline. This indicates that training on traces distilled from the base model itself is sufficient to induce some degree of generalization. RL is the only method that generalizes cleanly, improving on all three metrics over both baselines. The fact that the RL model has larger improvements on Levenshtein Distance and Added Cognitive Complexity than on Pass@1 is further evidence that it is not just memorizing corruption reversals but has actually generalized to minimal editing. Given the SFT model’s inability to even fix bugs, we also wanted to look at Catastrophic Forgetting: specifically, whether fine-tuning for minimal editing degrades general coding ability. We evaluate all fine-tuned models on LiveCodeBench v6 and compare against the original pretrained model. Ideally, performance should remain similar after training. SFT shows a 43% performance degradation, which aligns with our earlier finding that it can no longer identify and fix basic bugs. The rSFT and DPO models experience slight degradation, indicating that even though they were trained on samples generated by the original model, the nature of the task still results in some degree of Catastrophic Forgetting. The RL model, however, does not experience any degradation. 
Combined with the fact that it also performs the task best, RL is able to teach the model a new behavior without degrading previously acquired abilities. This aligns with broader work showing that SFT memorizes while RL generalizes. Inspired by other work showing that RL’s bias towards KL-minimal solutions reduces forgetting, we can interpret these results from a distributional perspective. Specifically, the distribution of the programmatically generated dataset is very different from the model’s original distribution. As a result, the SFT model’s distribution has been heavily modified and thus suffers from Catastrophic Forgetting. In contrast, for both rSFT and DPO, the distribution of the self-distilled dataset is more aligned and is thus less heavy-handed in shaping the trained model’s distribution. Therefore, it is likely that the degree of Catastrophic Forgetting is proportional to the difference between the model’s original distribution and the distribution of the task training data. Additional Experiments RL with LoRA: Do We Need Full Fine-Tuning? Given that this task is less about teaching the model new knowledge and more about tuning its style on an existing task, we also wanted to explore whether LoRA would be sufficient. Since the base model already has the capability to edit code and fix bugs, full fine-tuning might not be necessary.

Rank            Pass@1 ↑   Norm. Levenshtein ↓   Added CC ↓   LiveCodeBench Δ ↑
1               0.738      0.166                 0.676        −0.022
8               0.775      0.112                 0.426        −0.022
16              0.805      0.087                 0.328        −0.005
32              0.795      0.065                 0.235        −0.011
64              0.797      0.051                 0.160        +0.001
Full RL (best)  0.782      0.050                 0.185        +0.006

Table 4: Performance comparison of various LoRA ranks.

The results support the hypothesis. LoRA at rank 64 nearly matches full RL on Levenshtein Distance and beats it on Added Cognitive Complexity. LiveCodeBench dips slightly at low ranks but rank 64 is effectively flat, and full RL remains best overall. 
There is a clean monotonic trend as rank increases: both Levenshtein and Added CC fall steadily from rank 1 to rank 64. The rate of improvement is not uniform, though, as the biggest gains happen early. Rank 1 to 16 accounts for most of the Levenshtein reduction (0.166 → 0.087), while rank 16 to 64 closes the remaining gap more gradually (0.087 → 0.051). Ranks 1 and 8 also trade correctness for edit minimality, which could be explained by a lack of sufficient capacity to learn both components of the reward, biasing the model towards the higher-reward edit minimality. This is consistent with the idea that a small number of additional parameters is enough to shift the model’s editing behavior and more capacity beyond a certain point yields diminishing returns. For style-level behavioral changes where the underlying capability is already present, LoRA is likely sufficient and considerably cheaper to run. A Note on Reward Hacking The original version of the reward function had a bug where rollouts with no successful execution were given a hardcoded reward of 0. This ended up being a higher reward than rollouts with successful execution since the Levenshtein distance was negated to make it "higher is better." I found it interesting that even with this buggy reward function, full RL was still able to learn the task. Only with LoRA did the model fail to learn it, seemingly reward hacking by learning to never output functionally correct code, which is what triggered an investigation into the environment. With the fixed reward function, the results of full RL improved only slightly. Does It Scale? Lastly, to validate the results across larger models, I apply the same RL recipe using the Out-of-Domain data onto the larger Qwen3 14B model. Even at larger parameter counts, there are performance gains across the board with higher Pass@1, lower Levenshtein Distance, lower Added Cognitive Complexity, and no indication of Catastrophic Forgetting. 
This gives me the confidence that such a recipe for the task of Minimal Code Editing can be extended to models of different scales.

Model         | Pass@1 ↑ | Norm. Levenshtein ↓ | Added CC ↓ | LiveCodeBench Δ ↑
Baseline: 14B | 0.770    | 0.136               | 0.315      | -
RL            | 0.833    | 0.059               | 0.165      | +0.011

Table 5: Performance comparison of RL when trained on Qwen3 14B.

Final Thoughts

A Note on GPT 5.4 and Opus 4.6

It is notable that, despite being a frontier model, GPT 5.4 struggles on the minimal editing task, especially in the generic setting and relative to Opus 4.6. Figure 3, however, shows that it sees one of the largest gains when explicitly prompted in reasoning mode, second only to its predecessor GPT 5, which suggests strong instruction following capabilities. By contrast, Opus 4.6 shows one of the smallest improvements, though that may simply reflect its already strong baseline performance. This pattern fits the broader view that while GPT 5.4 often defaults to overly verbose code ("slop"), its behavior can be steered effectively with proper prompting. Taken together, the results suggest that Over-Editing is both widespread and measurable. At the same time, the prompting results show that this is not purely a capability limitation. Especially for reasoning models, a simple instruction to preserve the original code leads to much more faithful edits, which is an encouraging sign that when models like GPT 5.4 over-edit, they can still be steered toward higher-quality code. Further, the training results suggest that this behavior can be improved. Reinforcement learning produced more faithful editors without degradation in general coding ability, and those gains held up across both the 4B and 14B Qwen3 models. Admittedly, the field of code benchmarks has moved on from simple single-function evaluations to more agentic evaluation paradigms like SWE-Bench Pro. Relative to those, evaluating bug fixes in isolated functions is still a fairly contained task given the nature of the bugs.
Even so, in my experience Over-Editing is prevalent across all frontier coding models today, and it has long been difficult to quantify in realistic settings. I hope this work can serve as a first step toward evaluating and improving the minimal editing capability of coding models, and ultimately the overall quality of AI-generated code. Acknowledgements: I am grateful to my supervisor A/P Min-Yen Kan and my advisor Tongyao Zhu for their guidance, and to Prime Intellect for sponsoring the compute and API costs of this project. The full list of corruptions can be found in the code. ↩︎ © 2026. All rights reserved.
--------------------------------------------------------------------------------
11. Tempest vs. Tempest: The Making and Remaking of Atari's Iconic Video Game Source: https://tempest.homemade.systems Site: Tempest vs Tempest Submitter: mwenge (Hacker News) Submitted: 2026-04-23 01:02 UTC (Hacker News) HN activity: 68 points · 22 comments Length: 221 words (~1 min read) Language: en-US TEMPEST vs TEMPEST is a book-length attempt to explore and understand the code and craft of Dave Theurer's 'Tempest' (1981) and Jeff Minter's 'Tempest 2000' (1994). The idea is to explain how lots of different little things in each of the games actually work, down to the level of how they are implemented in the 6502 (Tempest) and Motorola 68K (Tempest 2000) assembler source code. I tried to keep it light and digestible so the book consists of lots of little chapters, each one presenting a hopefully-tasty morsel from one of the games. You can download and read the book here (9MB). A dual-page view in your PDF reader is recommended to aid viewing code and commentary side-by-side. If bandwidth is no object, here is a high resolution version with better quality pictures (27MB). The book is free, but if you like it you can gift what you want. Find out more about the making of this book at its github repository.
you may also like IRIDIS ALPHA THEORY, a book-length treatment of Iridis Alpha that goes into the game's mechanics in just about the same insane level of detail as this one. psychedelia syndrome, a book-length treatment exploring the full mechanics and source code of Jeff Minter's Psychedelia. -------------------------------------------------------------------------------- 12. Website streamed live directly from a model Source: https://flipbook.page/ Site: Flipbook Submitter: sethbannon (Hacker News) Submitted: 2026-04-22 18:01 UTC (Hacker News) HN activity: 260 points · 77 comments Language: en https://x.com/zan2434/status/2046982383430496444 (https://xcancel.com/zan2434/status/2046982383430496444) -------------------------------------------------------------------------------- 13. Technical, cognitive, and intent debt Source: https://martinfowler.com/fragments/2026-04-02.html Site: martinfowler.com Author: Martin Fowler: 02 Apr 2026 Submitted: 2026-04-22 16:11 UTC (Hacker News) HN activity: 259 points · 67 comments Length: 1.1K words (~5 min read) As we see LLMs churn out scads of code, folks have increasingly turned to Cognitive Debt as a metaphor for capturing how a team can lose understanding of what a system does. Margaret-Anne Storey thinks a good way of thinking about these problems is to consider three layers of system health: Technical debt lives in code. It accumulates when implementation decisions compromise future changeability. It limits how systems can change. Cognitive debt lives in people. It accumulates when shared understanding of the system erodes faster than it is replenished. It limits how teams can reason about change. Intent debt lives in artifacts. It accumulates when the goals and constraints that should guide the system are poorly captured or maintained. It limits whether the system continues to reflect what we meant to build, and it limits how humans and AI agents can continue to evolve the system effectively.
While I’m getting a bit bemused by debt metaphor proliferation, this way of thinking does make a fair bit of sense. The article includes useful sections to diagnose and mitigate each kind of debt. The three interact with each other, and the article outlines some general activities teams should do to keep it all under control. ❄ ❄ In the article she references a recent paper by Shaw and Nave at the Wharton School that adds LLMs to Kahneman’s two-system model of thinking. Kahneman’s book, “Thinking, Fast and Slow”, is one of my favorite books. Its central idea is that humans have two systems of cognition. System 1 (intuition) makes rapid decisions, often barely-consciously. System 2 (deliberation) is when we apply deliberate thinking to a problem. He observed that to save energy we default to intuition, and that sometimes gets us into trouble when we overlook things that we would have spotted had we applied deliberation to the problem. Shaw and Nave consider AI as System 3: A consequence of System 3 is the introduction of cognitive surrender, characterized by uncritical reliance on externally generated artificial reasoning, bypassing System 2. Crucially, we distinguish cognitive surrender, marked by passive trust and uncritical evaluation of external information, from cognitive offloading, which involves strategic delegation of cognition during deliberation. It’s a long paper that goes into detail on this “Tri-System theory of cognition” and reports on several experiments they’ve done to test how well this theory can predict behavior (at least within a lab). ❄ ❄ ❄ ❄ ❄ I’ve seen a few illustrations recently that use the symbols “< >” as part of an icon to illustrate code. That strikes me as rather odd; I can’t think of any programming language that uses “< >” to surround program elements. Why that and not, say, “{ }”? Obviously the reason is that they are thinking of HTML (or maybe XML), which is even more obvious when they use “</>” in their icons.
But programmers don’t program in HTML. ❄ ❄ ❄ ❄ ❄ Ajey Gore asks: if coding agents make coding free, what becomes the expensive thing? His answer is verification. What does “correct” mean for an ETA algorithm in Jakarta traffic versus Ho Chi Minh City? What does a “successful” driver allocation look like when you’re balancing earnings fairness, customer wait time, and fleet utilisation simultaneously? When hundreds of engineers are shipping into ~900 microservices around the clock, “correct” isn’t one definition — it’s thousands of definitions, all shifting, all context-dependent. These aren’t edge cases. They’re the entire job. And they’re precisely the kind of judgment that agents cannot perform for you. Increasingly I’m seeing a view that agents do really well when they have good, preferably automated, verification for their work. This encourages such things as Test Driven Development. That’s still a lot of verification to do, which suggests we should see more effort to find ways to make it easier for humans to comprehend larger ranges of tests. While I agree with most of what Ajey writes here, I do have a quibble with his view of legacy migration. He thinks it’s a delusion that “agentic coding will finally crack legacy modernisation”. I agree with him that agentic coding is overrated in a legacy context, but I have seen compelling evidence that LLMs help a great deal in understanding what legacy code is doing. The big consequence of Ajey’s assessment is that we’ll need to reorganize around verification rather than writing code: If agents handle execution, the human job becomes designing verification systems, defining quality, and handling the ambiguous cases agents can’t resolve. Your org chart should reflect this. Practically, this means your Monday morning standup changes. Instead of “what did we ship?” the question becomes “what did we validate?” Instead of tracking output, you’re tracking whether the output was right.
The team that used to have ten engineers building features now has three engineers and seven people defining acceptance criteria, designing test harnesses, and monitoring outcomes. That’s the reorganisation. It’s uncomfortable because it demotes the act of building and promotes the act of judging. Most engineering cultures resist this. The ones that don’t will win. ❄ ❄ ❄ ❄ ❄ One of the questions that comes up when we think of LLMs-as-programmers is whether there is a future for source code. David Cassel on The New Stack has an article summarizing several views of the future of code. Some folks are experimenting with entirely new languages built with the LLM in mind; others think that existing languages, especially strictly typed languages like TypeScript and Rust, will be the best fit for LLMs. It’s an overview article, one that has lots of quotations, but not much analysis in itself - but it’s worth a read as a good overview of the discussion. I’m interested to see how all this will play out. I do think there’s still a role for humans to work with LLMs to build useful abstractions in which to talk about what the code does - essentially the DDD notion of Ubiquitous Language. Last year Unmesh and I talked about growing a language with LLMs. As Unmesh put it: Programming isn’t just typing coding syntax that computers can understand and execute; it’s shaping a solution. We slice the problem into focused pieces, bind related data and behaviour together, and—crucially—choose names that expose intent. Good names cut through complexity and turn code into a schematic everyone can follow. The most creative act is this continual weaving of names that reveal the structure of the solution that maps clearly to the problem we are trying to solve. -------------------------------------------------------------------------------- 14.
Plexus P/20 Emulator Source: https://spritetm.github.io/plexus_20_emu/ Site: spritetm.github.io Submitter: hggh (Hacker News) Submitted: 2026-04-19 11:53 UTC (Hacker News) HN activity: 17 points · 1 comment Length: 111 words (~1 min read) This is an emulator for a Plexus P/20 system, which is a Unix server from the '80s. It runs System V Unix on a dual 68010 processor mainboard. Adrian Black (from Adrian's Digital Basement) did some videos on the topic of getting one to work: 1 2 3 4 5. This emulator is written in C. While it can be run as a native program, the version you're looking at right now is compiled to WebAssembly using Emscripten. It uses xterm.js for the fancy terminal, which is connected to Emscripten via xterm-pty. Plexus emulator © 2024 Sprite_tm and contributors. Licensed under the MIT license. Source code for this emulator is on GitHub. -------------------------------------------------------------------------------- 15. Ping-pong robot beats top-level human players Source: https://www.reuters.com/sports/ping-pong-robot-ace-makes-history-by-beating-top-level-human-players-2026-04-22/ Site: reuters.com Submitter: wslh (Hacker News) Submitted: 2026-04-22 15:13 UTC (Hacker News) HN activity: 118 points · 133 comments Scrape failed: http 401 -------------------------------------------------------------------------------- 16. Parallel agents in Zed Source: https://zed.dev/blog/parallel-agents Site: zed.dev Author: Richard Feldman Published: 2026-04-22 HN activity: 222 points · 121 comments Length: 635 words (~3 min read) Language: en Zed now lets you orchestrate multiple agents, each running in parallel in the same window. The new Threads Sidebar lets you control exactly which folders and repositories agents can access, and lets you monitor threads as they run. All of this runs at Zed's famously buttery-smooth 120 fps, with whichever agents you like, and it's all open-source.
Many Threads, One Window The Threads Sidebar offers an overview of all your threads at a glance, grouped by project, so you can:
- Mix and match agents on a per-thread basis, since Zed lets you choose your agent.
- Work across projects, with one agent thread reading and writing across repos.
- Isolate worktrees, when you want to, and decide per thread.
The Sidebar gives you instant access to common operations like stopping threads, archiving them, and kicking off new ones. Even as your workflow grows in complexity, with several projects running multiple agents at once, the Sidebar makes it easy to stay organized as your agents work. A New Default Layout As the Threads Sidebar became our primary way of navigating a project, we reconsidered which panels should sit where. Threads now dock on the left by default, next to the Agent Panel, with the Project Panel and Git Panel on the right. We think this layout works better for agentic work, keeping agent threads front and center as you move between them. If you prefer a different arrangement, right-click any panel icon in the bottom bar to change its docking position, or adjust it in the Settings Editor. For existing users, the new layout is opt-in. If you were used to the old layout, we encourage you to give this one a try before switching back. It feels more natural once you've spent a little time with it. Agent and Editor: Better Together Ask ten different programmers how they use AI, and you can get ten different answers. At one extreme, there's fully giving in to the vibes, and at the other extreme, there's disabling all AI features. What we've found works best for crafting high-quality software is somewhere in between: using AI, and also engaging directly with code. As our co-founder and CEO Nathan Sobo wrote in 2025, "As software engineers, we should measure our contribution not in lines of code generated, but in reliable, well-designed systems that are easy to change and a pleasure to use."
That post introduced the term agentic engineering to describe the art of "combining human craftsmanship with AI tools to build better software," and we've recently seen the term grow in popularity. Parallel agents in Zed are built around that principle. Multi-agent orchestration isn't new, but we believe we've built a great experience for working with agents at scale. We spent days loading the system with hundreds of threads, refining rough edges and polishing corners that developers may never see. We went through several UX iterations and had countless hours of internal discussions. It took us longer, and we won't lie, it drove us a little crazy. But the result feels better for it, and it lets developers do more challenging things with agents, without sacrificing their craft. Get Started Parallel Agents is available in the latest Zed release. You can download Zed, or update to the latest version to get it. You can open the Threads Sidebar from the icon in the bottom left, or via the keybinding option-cmd-j on macOS and ctrl-option-j on Linux and Windows. We hope you enjoy this new level of control! Related Posts Check out similar blogs from the Zed team. Looking for a better editor? You can try Zed today on macOS, Windows, or Linux. Download now! We are hiring! If you're passionate about the topics we cover on our blog, please consider joining our team to help us ship the future of software development. -------------------------------------------------------------------------------- 17. 
Borrow-checking without type-checking Source: https://www.scattered-thoughts.net/writing/borrow-checking-without-type-checking/ Site: scattered-thoughts.net Submitter: jamii (Hacker News) Submitted: 2026-04-23 02:55 UTC (Hacker News) HN activity: 54 points · 14 comments Length: 6.6K words (~29 min read) Language: en This is a demo of a toy language with dynamic typing, inline values, stack allocation, interior pointers, single ownership, and a limited form of borrowing - less expressive than rust, but much more expressive than second-class references (eg we can express external iterators). Since there is no static typing the borrows must be checked dynamically. The interesting part of the demo is that we can do that fairly cheaply and with useful error messages. The code is here. background I'm exploring a style of type-system exemplified by julia and zig. Both languages start with a dynamic type system, enforced by dynamic type-checks, and then layer on a static type system which is capable of proving that the dynamic type-checks are unnecessary. The dynamic type system provides flexibility and easy meta-programming, while the static type system removes the overhead in most of your code. Julia and zig differ slightly in how they handle code that cannot be statically type-checked. Zig will refuse to compile the code at all, while julia will leave some dynamic type-checks and will run the static type-checks again when more type information is available. For zest I'm exploring a third option - code can either be dynamically typed (and interpreted), or statically typed (and compiled), but switching between the two requires explicit annotations. The goal is that most of your code can have the assurances of static typing, but you can still opt in to dynamically-typed glue code to handle repls, live code reloading, compile-time metaprogramming, runtime code generation, malleable software etc. The tricky part is that I also want to enforce mutable value semantics. 
To date there are two main strategies for doing this:
- Reference-counting and copy-on-write, which imposes an unpleasant performance overhead and is hard to combine with interior pointers / explicit stack allocation.
- Static type systems, which won't help me with my dynamically-typed language.
So I have to do something new. This is what I've come up with:
- The overhead of borrow-checking is limited to some reference-counting operations when creating/dropping/copying references.
- The reference counts themselves are always stored on the stack so the cache impact is low.
- Reference counts are never shared between threads, so we don't have to use atomic operations to update them.
- The reference counting overhead is only paid in dynamically-typed function frames. Reference counts are never allocated on the heap and statically-typed code never has to see them.
- Whenever a borrow-checking rule is violated, the runtime immediately throws an error which identifies the exact value that is at fault.
And as a bonus, I'm at least 60% certain that this scheme is actually sound :) repl First a quick note: If you see a green tick below then all the examples are interactive. You can edit the code and hit the eval button to see the result. If you see a red cross instead then probably you have javascript turned off or I didn't test on your browser, and you'll have to make do with the offline result at the end of the code box. ✗ values Our toy language is pretty minimal. We have integers, tuples, functions, and some basic control flow. let nums = [1, 5]; let inc = fn (i) {i + 1}; while {nums[0] < nums[1]} { nums[0] = inc(nums[0]); }; nums [5, 5] Every variable is an independent value. Mutating the value in one variable never affects the value in a different variable.
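That last claim is exactly the opposite of the aliasing semantics mainstream dynamic languages default to; a quick Python contrast, using `copy.deepcopy` to emulate independent values (an illustration only, not how the toy language is implemented):

```python
import copy

# Python default: `b` aliases `a`, so mutation is visible through both.
a = [1, [2, 3]]
b = a
b[1][0] = 42
assert a == [1, [42, 3]]

# Emulating the toy language's value semantics: each variable owns an
# independent copy, so mutating `b` leaves `a` unchanged.
a = [1, [2, 3]]
b = copy.deepcopy(a)
b[1][0] = 42
assert a == [1, [2, 3]] and b == [1, [42, 3]]
```

The article's whole design problem is getting this independence without actually paying for a deep copy on every assignment.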
let a = [1, [2, 3]]; // `b` is an independent copy of `a` let b = a; b[1][0] = 42; // `a` is unchanged [a, b] [[1, [2, 3]], [1, [42, 3]]] At the end of each block ({}), for every variable defined in that block we drop the associated value, freeing any memory used by that value. let a = [1, [2, 3]]; { let b = [4, 5]; let c = 6; // c is dropped here // b is dropped here }; // a is dropped here [] This approach to combining value semantics with mutation is simple and easy to implement. But it's also useless. Creating a full copy of every value on every use is infeasible when we want to work with bigger values. We need a way to express sharing between values without breaking value semantics. references References let us express the idea of a value that is stored in a different location to its parent value. One way to make a reference is the box function which stores its contents on the heap. In the example below, the value [2, 3] is stored on the heap and the value [1, box(...)] is stored on the stack. [1, box([2, 3])] [1, box([2, 3])] The dereference operator * is used to reach inside a reference to access its contents. let a = [1, box([2, 3])]; a[1]*[0] 2 So what should the code below do? let a = [1, box([2, 3])]; let b = a; // What does this mean? b*[0] = 42; a We could copy the contents of the box, but that's not a solution that scales to arbitrarily large data-structures. We could end up copying gigabytes of memory! Or we could just copy the pointer itself and share the heap allocation between a and b. But then the assignment to b*[0] will be visible in a, breaking the illusion of value semantics. What we actually do, unless you add more explicit annotations, is to just refuse to copy boxes at all. let a = [1, box([2, 3])]; let b = a[1]; b*[0] = 42; b* Error at 2:9 Can't copy an owned reference To work with references we have to be more specific about what we mean. We have a few options. The first option is to move the value using ^. 
This gives us a copy of the reference but destroys the original! let a = [1, box([2, 3])]; let b = a[1]^; b*[0] = 42; b* [42, 3] We leave those XXX in the diagram to show that the original reference has been destroyed and nothing has replaced it yet. What exactly the XXX means is a question we'll leave for later. (We could just pick some zero value that is valid for all types, but that feels like a mistake). Destroyed values can be overwritten with new values. let a = [1, box([2, 3])]; let b = a[1]^; a = [4, box([5, 6])]; a[1]* [5, 6] This gives us a (clunky) way to pass references to functions without copying them. We can move the original value, mutate it within the function body, return the mutated value, and assign it back to the original variable. let inc_first = fn (x) { x*[0] = {x*[0] + 1}; x^ }; let a = [1, box([2, 3])]; let [a0, a1] = a^; a = [a0^, inc_first(a1^)]; a^ [1, box([3, 3])] To make this process less tedious, we provide a second option - we can create a borrowed reference using !. This is just like moving the value into a new box(...), except that when the new reference is dropped we return the value to its original location. (When you borrow things, you're supposed to give them back!) let inc_first = fn (x) { x*[0] = {x*[0] + 1}; // `x` is dropped here and the mutated contents are moved back to `a` }; let a = [1, box([2, 3])]; inc_first(a[1]*!); a^ [1, box([3, 3])] The third and final option is creating a shared reference using &. This behaves like a borrowed reference, but the original owner gets to keep their copy - it isn't destroyed. let a = [1, box([2, 3])]; let b = a[1]*&; [a[1]*, b*] [[2, 3], [2, 3]] To maintain the illusion of separate values, we can't allow you to mutate either copy. 
let a = [1, [2, 3]]; let b = a[1]&; b*[0] = 7; Error at 3:1 Can't assign through a shared reference let a = [1, [2, 3]]; let b = a[1]&; a[1][0] = 7; Error at 3:1 Can't assign to `a` because it is shared by `b` closures Closures are supported, but don't implicitly capture variables from their scope. So the example below doesn't work. let iter_copy = fn (tuple_ref) { let index = 0; fn () { if index < len(tuple_ref*) { let elem = tuple_ref*[index]; index = {index + 1}; elem } else { [] } } }; let a = [1,2]; let next = iter_copy(a&); [next(), next(), next()] Error at 4:8 Can't refer to `index` here because it is defined outside this function - try using an explicit capture instead. (We could support rust-style implicit captures just fine, but explicit captures make it much easier to explain the interaction with borrow-checking.) Let's think first about how we could write this example without closures. We can return a tuple with all the essential state, and then call a separate next function on that tuple. let iter_copy = fn (tuple_ref) { let index = 0; [index, tuple_ref] }; let next = fn (state_ref) { let [index, tuple_ref] = state_ref^; if index* < len(tuple_ref**) { let elem = tuple_ref**[index*]; index* = {index* + 1}; elem } else { [] } }; let a = [1,2]; let state = iter_copy(a&); [next(state!), next(state!), next(state!)] [1, 2, []] Closures are just syntax sugar for this pattern. We specify what state we want to capture in the closure (index, tuple_ref) and what kind of access we want to that state (!). Then the compiler desugars it to the example above. let iter_copy = fn (tuple_ref) { let index = 0; fn [index, tuple_ref]! () { if index* < len(tuple_ref**) { let elem = tuple_ref**[index*]; index* = {index* + 1}; elem } else { [] } } }; let a = [1,2]; let next = iter_copy(a&); [next!(), next!(), next!()] [1, 2, []] safety Under the hood, borrowed and shared references are implemented as pointers to the original value. 
Those XXXs don't actually exist in the implementation. The goal of borrow-checking is to try to have our cake and eat it - we want to provide the simplicity of value semantics while keeping the performance of reference semantics, and never let you get into a situation where you could tell the difference. Ensuring that moving, borrowing, and sharing don't break the illusion is surprisingly easy to do with a static type system, and surprisingly hard to do with a dynamic type system, or at least hard to do cheaply. This is the best I've managed so far and it's still quite restrictive compared to rust. Maybe the easiest way to explain how it works is to first list all the things that it prevents you from doing, and then talk about the implementation afterwards. The biggest restriction is that owned references (boxes) are not allowed to point to borrowed/shared references. This ensures that borrowed/shared references only live on the stack, which makes many of the other rules easier to enforce dynamically. let a = 1; let b = box(a&); Error at 2:9 Can't box a shared ref let a = box(box(1)); let b = 2; a* = b&; Error at 3:1 Can't assign a value of type `number&` to a location of type `box(number)` While a borrowed reference points to a value, no more borrowed/shared references to that value can be created. let a = 1; let b = a!; let c = a!; Error at 3:9 Can't borrow `a` because it is borrowed by `b` The runtime does track when borrowed references are dropped though. let a = 1; { let b = a!; // `b` is dropped at the end of this scope }; let c = a!; // now we can borrow `a` again [] let a = 1; let b = a!; b^; // `b` is moved here and then dropped let c = a!; // now we can borrow `a` again [] We can create multiple borrowed references to parts of a value by destructuring a borrowed reference. 
let a = [1,2]; let [b, c] = a!; b* = 42; c* = 101; b^; c^; // drop `b` and `c` a [42, 101] While a shared reference points to a value, only shared references to that value can be created. let a = 1; let b = a&; let c = a&; // this is allowed, have as many shared references as you want let d = a!; // this is not allowed, because writing to `d` would also change `b` and `c` Error at 4:9 Can't borrow `a` because it is shared by `c` let a = 1; let b = a&; let c = a&; b^; c^; // once we drop all the shared references we can borrow again let d = a!; [] Values can't be moved out of variables while any borrowed/shared references exist, even if the moved and borrowed/shared parts don't overlap. let a = [1,2]; let b = a[0]&; a[1]^ Error at 3:1 Can't move out of `a` because it is shared by `b` Once a value has been moved, even partially, it can't be used at all until the entire value is replaced. let a = [1,2]; a[0]^; a[1] Error at 3:1 Can't refer to `a` because it has been moved let a = [1,2]; a[0]^; a[0] = 3; // replacing just the moved part is not enough Error at 3:1 Can't refer to `a` because it has been moved let a = [1,2]; a[0]^; a = [3, 4]; // replacing the entire value works a [3, 4] Variables can only hold borrowed/shared references pointing to variables with longer lifetimes. let a = 1; let b = 2; let c = a&; // this is ok, `a` will be dropped later than `c` c = b&; // also ok, `b` will be dropped later than `c` let d = 3; c = d&; // uh oh, `d` will be dropped before `c` Error at 6:1 This value can't be owned by `c` because it shares from `d`, which will be destroyed before `c` Values returned from a block may not contain borrowed/shared references pointing to variables defined in that block. let a = { let b = 1; b& }; Error at 1:9 This value shares from `b`, but `b` will be destroyed at the end of this block Although the language is dynamically typed, variables can't change type once assigned (because their allocation can't change size). 
let a = [1]; // 8 bytes allocated on the stack a = [1, 2]; // can't change the allocation to 16 bytes Error at 2:1 Can't assign a value of type `[number, number]` to a location of type `[number]` Even references can't change type, although their size would be the same. This lets us avoid having to store type tags for each allocation, because we can always derive the type of a value from the type of the reference pointing to it. let a = box([1]); a = box([1, 2]); a* Error at 2:1 Can't assign a value of type `box([number, number])` to a location of type `box([number])` However, like julia, we can opt in to storing type tags just in the places where we want dynamism. The any function takes a reference and returns a dynamically-typed version of that reference. let a = box([1]); // 8 byte value with type box([number]) let b = any(a^); // 16 byte value with type box(any) b = any(box([1, 2])); b* [1, 2] We still can't change the type of the allocation itself though. let a = any(box([1])); a* = [1, 2]; a* Error at 2:1 Can't assign a value of type `[number, number]` to a location of type `[number]` expressiveness After all that talk about what we can't do, let's look at what we can do that we wouldn't have been able to do with only second-class references. References can be placed inside tuples. In a non-toy language, that would mean we could use types like Option<&mut T>. let a = 1; let b = 2; let c = [a!, b!]; c[0]* = 3; c[1]* = 4; c^; [a, b] [3, 4] References can be returned from functions. In a non-toy language, that would mean we could use function types like fn(&[T], usize) -> Option<&T>. let get = fn (tuple, index) { tuple*[index]& }; let a = [1, 2, 3]; get(a&, 1)* 2 Even borrowed references can be returned from functions, although there is some subtlety. The example below doesn't work. let get = fn (tuple, index) { tuple*[index]! 
};
let a = [1, 2, 3];
get(a!, 1)* = 5;
a

Error at 5:1 This value borrows from `tuple`, but `tuple` will be destroyed at the end of this block

The reason is that tuple*[index]! is borrowed from tuple and will be returned to tuple at the end of its lifetime, so it can't outlive tuple. If you want to produce a reference that outlives tuple, you have to explicitly consume tuple by using a move.

let get = fn (tuple, index) {
  tuple^*[index]!
};
let a = [1, 2, 3];
get(a!, 1)* = 5;
a

[1, 5, 3]

In this comment Alex gives an example that is impossible to handle with second-class references - iterating over a linked list in a loop. We can handle this too. It is a bit clunky because our toy language doesn't have sum types, but that's just a lack of implementation effort, not a limitation of the borrow-checker.

let list = any(box([]));
list = any(box([0, list^]));
list = any(box([1, list^]));
list = any(box([2, list^]));
{
  let next = list!;
  let null = any(box([]));
  while {next != null!} {
    next**[0] = {next**[0] + 1};
    next = next^**[1]!;
  };
};
list^

any(box([3, any(box([2, any(box([1, any(box([]))]))]))]))

We already saw a copying iterator in the closures section, but we can also make iterators that return shared or borrowed references.

let iter_borrowed = fn (tuple_ref) {
  let index = 0;
  fn [index^, tuple_ref^]! () {
    if index* < len(tuple_ref**) {
      let elem = tuple_ref^**[index*]!;
      index* = {index* + 1};
      elem^
    } else {
      []
    }
  }
};
let a = [1,2];
{
  let next = iter_borrowed(a!);
  next!()* = 3;
  next!()* = 4;
};
a

[3, 4]

This even correctly assigns the lifetimes of the returned references - the shared version shares from the underlying tuple, while the borrowed version borrows from the iterator itself and so doesn't allow returning multiple borrowed references at the same time.

let iter_shared = fn (tuple_ref) {
  let index = 0;
  fn [index^, tuple_ref^]!
  () {
    if index* < len(tuple_ref**) {
      let elem = tuple_ref^**[index*]&;
      index* = {index* + 1};
      elem^
    } else {
      []
    }
  }
};
let a = [1,2];
let next = iter_shared(a&);
let a0 = next!();
let a1 = next!();
[a0*, a1*]

[1, 2]

let iter_borrowed = fn (tuple_ref) {
  let index = 0;
  fn [index^, tuple_ref^]! () {
    if index* < len(tuple_ref**) {
      let elem = tuple_ref^**[index*]!;
      index* = {index* + 1};
      elem^
    } else {
      []
    }
  }
};
let a = [1,2];
let next = iter_borrowed(a!);
let a0 = next!();
let a1 = next!();
[a0*, a1*]

Error at 16:10 Can't borrow `next` because it is borrowed by `a0`

implementation

To enforce these rules we need to store some extra data. For each variable, we store a ref-count which can be in one of 4 states:

If ref_count == INT_MIN then some part of the value in this variable has been moved.
If INT_MIN < ref_count < 0 then this counts the number of references which borrow from this value.
If ref_count == 0 then this variable is available to be borrowed/shared.
If 0 < ref_count then this counts the number of references which share from this value.

The neat thing is that each safety check only requires a single integer comparison:

const Count = std.math.IntFittingRange(-stack_size, stack_size);
const available = 0;
const moved = std.math.minInt(Count);

fn isMoved(ref_count: RefCount) bool {
    return ref_count.count == moved;
}

fn canMove(ref_count: *RefCount) bool {
    return ref_count.count == available;
}

fn canBorrow(ref_count: *RefCount) bool {
    return ref_count.count == available;
}

fn canShare(ref_count: *RefCount) bool {
    return ref_count.count >= available;
}

For each borrowed/shared reference we store:

lease: whether this reference is an owned, borrowed, or shared reference.
lender: the variable from which this value was borrowed/shared, and whose reference count we need to decrement when dropping this reference.
owner: the variable to which this value originally belonged, and whose lifetime dictates which values are safe to write to this reference.
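As a cross-check of the encoding, the same four states and single-comparison checks can be mirrored in a few lines of Python (my sketch, not the interpreter's Zig; `MOVED` stands in for INT_MIN):

```python
# Sketch of the four ref-count states described above.
# Counts strictly between MOVED and 0 are active borrows, 0 means
# available, and positive counts are active shares.
MOVED = -(2**31)  # sentinel standing in for INT_MIN

def is_moved(count):
    return count == MOVED

def can_move(count):
    return count == 0  # no outstanding borrows or shares

def can_borrow(count):
    return count == 0  # borrowing needs exclusive access

def can_share(count):
    # MOVED and borrow counts are both negative, so one comparison suffices.
    return count >= 0

# A variable shared twice can be shared again but not borrowed or moved:
count = 2
assert can_share(count) and not can_borrow(count) and not can_move(count)
# Once all shares are dropped, borrowing becomes legal again:
count = 0
assert can_borrow(count)
```

Note how `can_share` never needs a separate moved-check: the sentinel is the most negative value, so the single `>= 0` comparison rejects it along with borrows.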
We can address an 8mb stack with 8-byte alignment using only 20 bits, so all of the above fits into 42 bits per reference. This does mean that each borrowed/shared reference is now 16 bytes in total, but in the crossing stacks section below I'll show that statically-typed code doesn't have to pay this overhead.

The only case in which lender and owner differ is when reborrowing from an existing borrow:

let a = 1;
let b = a&;
let c = 2;
let d = b!; // owner: b, lender: b
let e = d*!; // owner: b, lender: d
e* = c&; // this isn't safe because `c` doesn't live as long as the owner `b`

Error at 6:1 This value can't be owned by `b` because it shares from `c`, which will be destroyed before `b`

Both b and d are borrowed from and should not be accessed while e lives. To enforce this, when creating b! we increment the borrow count on b, and when creating d*! we increment the borrow count on d. The result is that b can't be accessed while d lives, and d can't be accessed while e lives. We create a kind of chain of custody of the underlying value so that at any point in time there is at most one reference that can mutate the value.

But the underlying value is still owned by b, which means it is not safe to write e* = c& because c will be dropped before b. So when we safety-check assignment we have to look at the owner of the location, not the lender. That's why we need to track both the owner and the lender for every reference.

The downside of this scheme is that we can't return a reference past the lifetime of its lender.

let a = 1;
{
  let b = a!; // owner: a, lender: a
  let c = b*!; // owner: a, lender: b
  c^ // can't return this reference because its lender `b` is about to be dropped
}* = 2;
a

Error at 2:1 This value borrows from `b`, but `b` will be destroyed at the end of this block

We want to return c from this block, but c has lender b and b will be dropped at the end of the block. The way to make this work is to move b.
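The owner/lender split can be modeled in a few lines of Python. This is my own toy model, not the interpreter's data structures: stack depth stands in for lifetime (smaller depth = lives longer), and the assignment check consults the owner, not the lender:

```python
# Toy model of reference provenance: lease, lender, owner.
# Depths for the reborrow example: a = 0, b = 1, c = 2, d = 3, e = 4.

class Ref:
    def __init__(self, lease, lender, owner):
        self.lease = lease    # "owned", "borrowed", or "shared"
        self.lender = lender  # depth of the variable to decrement on drop
        self.owner = owner    # depth of the variable whose lifetime gates writes

def can_store(dest, src_depth):
    # Storing a reference to `src` through `dest` is only safe if `src`
    # outlives the owner of the location, i.e. sits at a smaller-or-equal depth.
    return src_depth <= dest.owner

d = Ref("borrowed", lender=1, owner=1)  # let d = b!  -> owner: b, lender: b
e = Ref("borrowed", lender=3, owner=1)  # let e = d*! -> owner: b, lender: d

# e* = c& is rejected: c (depth 2) is destroyed before the owner b (depth 1).
assert not can_store(e, src_depth=2)
# e* = a& would be fine: a (depth 0) outlives b.
assert can_store(e, src_depth=0)
```

The lender field only matters when dropping (to find whose borrow count to decrement); the owner field is the one consulted here, which is exactly the distinction the error message above is making.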
When evaluating the l-value b^* we notice that b was consumed and is no longer accessible, so we don't need to record it as the lender.

let a = 1;
{
  let b = a!; // owner: a, lender: a
  let c = b^*!; // owner: a, lender: a
  c^
}* = 2;
a

2

If you scroll back up to the iter_borrowed example in the previous section, you'll notice this pattern in tuple_ref^**[index*]!. Without the ^, the iterator can't return that borrowed reference because its lender would be tuple_ref.

error messages

Most of the error messages are pretty easy to produce from the data we already track. For example:

let a = 1;
a&

Error at 1:1 This value shares from `a`, but `a` will be destroyed at the end of this block

The expression a& produces a reference which records a as the lender, so we know exactly who to blame. There is one kind of error which requires more work though.

let a = 1;
let b = [1, 2, [a!, 3]];
a!

Error at 3:1 Can't borrow `a` because it is borrowed by `b[2][0]`

We know that it's not safe to borrow a again because the borrow count is already set to 1. But that doesn't tell us where the borrowed reference is. We do know that all borrowed/shared references must be on the stack somewhere, and that any reference which borrows a must be on the stack below a. We also know that we're definitely about to panic, so we can afford to burn some cpu cycles on producing a better error message by scanning the stack to find the reference which is borrowing from a.
fn findLendee(lender: StackIndex, lease: Lease) struct { *StackItem, RefIndex } {
    var i = c.stack.len;
    while (i > 0) : (i -= 1) {
        const item = &c.stack.items[i - 1];
        if (item.value.findLendee(lender, lease)) |ref_index|
            return .{ item, ref_index };
    }
    panic("Couldn't find lendee for {}", .{.{ .lender = lender, .lease = lease }});
}

fn findLendee(value: Value, lender: StackIndex, lease: Lease) ?RefIndex {
    for (value.type_id.getRefIndexes()) |ref_index| {
        const ref = value.getRefAtIndex(ref_index);
        if (ref.type_id.getType().ref.lease == lease and
            ref.getRefProvenance().lender == lender)
            return ref_index;
    }
    return null;
}

A key function here is getRefIndexes which, for a given type id, returns a pre-computed list of the locations of all the references contained in that type. This saves us from having to recursively drill down through the types. It's a little like the gc bitmaps used in some garbage collected languages.

crossing stacks

When spawning a new thread, we want to be sure that it doesn't share refcounts with any other threads. When calling from dynamically-typed code into statically-typed code (or vice versa), we want to be sure that the statically-typed code doesn't use any references in ways that would break the reference counting in the dynamically-typed code. In both cases the safety requirements are similar.

I haven't actually implemented threads or static typing, but I have encapsulated the safety requirements in the with_new_stack function, which copies a closure onto a new stack, calls it, and then copies the result back.

let a = 0;
let b = 1;
let c = with_new_stack(fn [a!, b^] () {
  a* = 2;
  b + 2
});
[a, c]

[2, 3]

The captures may contain borrowed/shared references. After copying these references to the new stack we change the provenance to point to dummy lenders on the new stack so that the function call never has to touch refcounts on the old stack.
We only decrement the refcounts of the original lenders when the function returns (which for threads implies some sort of structured concurrency). We can't do this reborrowing recursively, so the targets of those borrowed/shared references must not themselves contain borrowed/shared references.

let a = 0;
let b = a&;
with_new_stack(fn [b] () { b* }); // This is ok - we capture a& - one layer of sharing.
with_new_stack(fn [b&] () { b** }); // This is not ok - we capture a&& - two layers of sharing.

Error at 4:1 The closure passed to `with_new_stack` contains a shared reference to a shared reference.

The result of the function call must not contain any borrowed/shared references.

let a = 0;
with_new_stack(fn [a&] () { a* }); // This is ok - returns 0.
with_new_stack(fn [a&] () { a }); // This is not ok - returns 0&.

Error at 3:1 Can't return a shared reference from `with_new_stack`.

In the case of spawning a thread this last restriction is too strong - we could allow returning borrowed/shared references so long as they live long enough. But for statically-typed code we might not precisely know the owner/lender of the resulting references, so we can't update the ref-counts correctly when returning to dynamically-typed code.

The function must expect to be called by move, not by borrowed/shared reference.

let a = 0;
with_new_stack(fn [a]! () {})

Error at 2:1 The closure passed to `with_new_stack` should expect to be called by move, not by borrow

Together these restrictions allow safely calling across different stacks, or across partitions on a single stack. They prevent threads from sharing refcounts, and prevent statically-typed code from ever seeing refcounts or provenance at all.

thoughts

On the way to this scheme I tried many, many different schemes with different tradeoffs between cost, expressiveness, and explainability.
The most interesting system that I sketched out and abandoned used invisicap-style shadow allocations for each value to track which elements are borrowed/shared. This allowed creating multiple borrows against a single value, so long as they are disjoint.

let a = [1,2];
let b = a[0]!;
let c = a[1]!; // This would be safe, because we can check dynamically that a[1] hasn't been borrowed.

It also allowed partial moves.

let a = [1,2];
a[0]^;
a[0] = 2; // This overwrites the value that was moved out, so now `a` would be safe again.
a

It also removed the need to pin values when some child reference has been borrowed, because only the borrowed part itself needs to be changed when the borrow is dropped.

let a = [box(1), box(2)];
let b = a[0]*!;
let c = a^; // This would be safe, because the location that `b` points at doesn't move.

The major problem was that I couldn't figure out a way to safely transition into statically-typed code. If you can move a value that is currently borrowed from, the only way to be sure that you aren't passing such a value into statically-typed code is to scan the entire value to check if any part of it has been moved. Error messages were also much worse, because when a safety rule was violated we often had no reasonable way to track down the offender without a full heap scan.

There is a lot of punctuation in these examples. I think much of it could be removed by adding dynamic equivalents to rust's deref coercion. For example, a**[0] could just be a[0] if the [] operator dereferenced the lhs until it found a tuple. Similarly a** = b could just be a = b if the = operator dereferenced the lhs until it found something that matched the type of the rhs.
If you played with the repl you might have run into this error:

let f = fn () { [2, 3] };
let a = [1, f()!, 4];

Error at 2:13 For annoying stack management reasons, arbitrary expressions are only allowed inside path expressions if they are immediately followed by a dereference operator

The problem is that we have a strict LIFO stack, where every expression is expected to consume its arguments and leave only its result value on the stack. But in f()! the value [2, 3] needs to be allocated somewhere on the stack before being borrowed. In a statically-typed language we could check the size of that value and allocate stack space for it in advance, but that's not an option in a dynamically-typed language.

The only case that's easy to handle is when the expression returns a reference and we immediately dereference it.

let get = fn (t, i) { t^*[i]! };
let a = [1, 2, 3];
get(a!, 1)* = 42;
a

[1, 42, 3]

This works out because we can pop the reference from the stack and produce an l-value pointing to its contents.

It might be possible to solve the general problem by having two stacks - one for values that are the results of expressions, and one for values that should live until the end of the current block. Then when we want to allocate a value and immediately borrow it, we can pop it from the first stack and push it to the second. But there is one advantage to the current setup - it's currently only possible to share/borrow from named values. This makes it much easier to produce readable error messages. If we allowed f()! we'd have to produce error messages that said things like:

Can't borrow from the stack location produced by the expression at 2:13 because it is already borrowed by the expression at 3:17

The main reason for confining borrowed/shared references to the stack is so that in the assignment a = b we can check the lifetime of b by scanning its stack value, rather than its entire heap value.
But as a bonus, this means that pointers to the stack can only occur on the stack, and we can identify them precisely, which means we can easily grow/shrink the stack. We have enough unused space in provenance to allow 32-bit stack indexes, which would allow each stack to grow up to 4gb if needed.

It really bugs me that moving a value gives you a value of the same type, but borrowing/sharing a value gives you a reference. But it's really hard to avoid. I've tried:

Make borrowed/shared references totally second-class, so they can never appear in tuples and their size/type can't be directly observed.
Make variables immutable and un-addressable. Only allow mutation through explicitly created box(...) (like ocaml).

Neither feels particularly ergonomic when I work through examples. The problem is closely tied to the existence of interior pointers. If you look back in the history of the repo, I made a language with universal layout where there is no need for l-values and dropped values are represented as null pointers. This felt very ergonomic and also easy to explain, but using universal layout limits the potential performance.

Compared to rust, it feels limiting that we have to explicitly drop values to end their lifetime.

let a = 1;
let b = a!;
b^; // If we don't drop `b` here then we can't access `a` below.
a

1

This is fixable though! It's not quite as simple as dropping variables after their last direct use, because references to the value could still exist.

let a = 1;
let b = a!; // No further usages of `a` after this point, but `a` will be used indirectly through `b`.
b* = 2; // Now it's safe to drop `a`.

[]

Instead, we could mark variables as 'droppable' after their last direct use, and then only actually drop them when their ref-count hits zero. This would give a more rust-like feel to the language.
The reason I haven't done this is that there are cases where this would make the dynamically-typed version of a function drop values earlier than the statically-typed version, because the static analysis has to approximately track ref-counts that the dynamic version knows precisely.

let a = 1;
let b = 2;
let c = if {a == b} { a! } else { b! };
// The dynamically-typed version knows that `c` only borrows from `b` so it can drop `a` here,
// but the statically-typed version thinks that `c` might borrow from `a` or `b`.
c* = 3;
// The statically-typed version can't know until here that `a` is unreachable.

[]

I've been sticking to the principle that static typing only rules out errors and never changes semantics. Dropping later feels a little like changing the semantics, even if it's not currently directly observable.

I'm really solving two problems at once:

Support (a limited form of) references while preserving the illusion of value semantics.
Support interior pointers and (explicit) stack allocation while preserving memory safety.

There is a lot of related work that solves problem 1 or problem 2, but only a few systems that solve both problems at the same time.

Rust solves both, at the cost of a surprising amount of type system complexity. Hylo, mojo, and lately swift also try to solve both while trading off more restricted use of references for a simpler type system.
Hylo is particularly interesting because their references are entirely second-class, which considerably simplifies the mental model, but they're able to recapture some of the expressiveness of first-class references by allowing interleaved coroutines:

fun f(x: Array<Int>, y: inout Array<Array<String>>) {
  // x1 coroutine starts
  let x1 = x[1]
  // y1 coroutine starts
  inout y1 = &y[1]
  y1[x1] = "Hello World"
  // x1 coroutine ends
  print(y1)
  // y1 coroutine ends
}

I'm still tempted by this, but it's a little more restrictive than what I have here (eg no Option<&T>), the codegen seems gnarly, and if coroutines are allowed to perform side-effects then it can be tricky for the reader to figure out when those side-effects happen. Also I can get a similarly simple mental model in my current system by not allowing borrowed/shared refs to appear inside other values.

It's worth noting that rust does also have a dynamic model in the form of tree borrows, which is used in miri for testing code that uses unsafe escape hatches from the type system. But tree borrows are far too expensive to use as an actual programming model - every read or write to any reference requires checking every other aliased reference in case it must be invalidated.

C# refs and oxcaml modes only aim at solving problem 2, but do so in a way that is easy to extend to problem 1. Both require static typing, but my ref-counting system is very roughly a dynamic version of c# refs and has similar expressiveness.

Cheriot and fil-c solve problem 2 without static typing (albeit by requiring garbage collection). I sketched out a fil-c inspired system that also solves problem 1, as mentioned above, but I couldn't find a reasonable way to transition between dynamically- and statically-typed code.

R and (a subset of) swift solve problem 1 by using reference counting and copy-on-write.
Reference-counting imposes a fairly steep overhead (eg these benchmarks find that just switching from atomic to non-atomic ref-counts can double the performance of some swift programs), and copy-on-write creates the possibility of accidentally and invisibly creating copies of arbitrarily large values. It's also tricky to combine reference-counting with interior pointers, and allowing interior pointers to escape is a common source of leaks in eg go. Finally, stack allocation is still only possible via best-effort escape analysis. None of this is compatible with my goal of predictable performance.

Gel/inko is an interesting tweak on the reference-counting formula. Rather than freeing values when the ref-count hits zero, they free values at the end of their scope (like rust or this demo) and throw an error if the ref-count is not zero. This doesn't solve either of my problems, but did inspire my implementation.

next

I'm not quite sure what's next. Everything here works, but in usage it feels... fiddly. It's possible that first-class borrowing works in rust because the type system allows the compiler to fill in a lot of the details (eg autoborrow, deref coercion) and catch the remaining mistakes.

One option would be to move more in the hylo direction with second-class references and coroutines, if I can figure out how to make that work in a dynamically-typed, interpreted setting. Another option would be to statically type everything (so that I don't need dynamic borrow-checking) but focus on the ergonomics of working with values of unknown types in statically-typed code. I can see how zig-like comptime can still work even if the comptime language is also statically-typed, but I can't guess how much it will hurt the ergonomics of meta-programming.

Anyway, if you want to see more of this language then there is this handy sponsor button you could press.
Think of it like your taxes contributing to someone's phd stipend, but a little more directly :)

--------------------------------------------------------------------------------

18. An amateur historian's favorite books about the Silk Road
Source: https://bookdna.com/best-books/silk-road
Site: bookdna.com
Submitter: bwb (Hacker News)
Submitted: 2026-04-21 11:01 UTC (Hacker News)
HN activity: 10 points · 5 comments
Scrape failed: http 403

--------------------------------------------------------------------------------

19. Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model
Source: https://qwen.ai/blog?id=qwen3.6-27b
Site: qwen.ai
Submitter: mfiguiere (Hacker News)
Submitted: 2026-04-22 13:19 UTC (Hacker News)
HN activity: 835 points · 384 comments
No extractable content.

--------------------------------------------------------------------------------

20. Verus is a tool for verifying the correctness of code written in Rust
Source: https://verus-lang.github.io/verus/guide/
Site: verus-lang.github.io
Submitter: fanf2 (Hacker News)
Submitted: 2026-04-20 20:42 UTC (Hacker News)
HN activity: 52 points · 10 comments
Length: 745 words (~4 min read)
Language: en

Verus overview

Verus is a tool for verifying the correctness of code written in Rust. The main goal is to verify full functional correctness of low-level systems code, building on ideas from existing verification frameworks like Dafny, Boogie, F*, VCC, Prusti, Creusot, Aeneas, Cogent, Rocq, and Isabelle/HOL. Verification is static: Verus adds no run-time checks, but instead uses computer-aided theorem proving to statically verify that executable Rust code will always satisfy some user-provided specifications for all possible executions of the code.
In more detail, Verus aims to:

provide a pure mathematical language for expressing specifications (like Dafny, Creusot, F*, Coq, Isabelle/HOL)
provide a mathematical language for expressing proofs (like Dafny, F*, Coq, Isabelle/HOL) based exclusively on classical logic (like Dafny)
provide a low-level, imperative language for expressing executable code (like VCC), based on Rust (like Prusti, Creusot, and Aeneas)
generate small, simple verification conditions that an SMT solver like Z3 can solve efficiently, based on the following principles:
  keep the mathematical specification language close to the SMT solver's mathematical language (like Boogie)
  use lightweight linear type checking, rather than SMT solving, to reason about memory and aliasing (like Cogent, Creusot, Aeneas, and linear Dafny)

We believe that Rust is a good language for achieving these goals. Rust combines low-level data manipulation, including manual memory management, with an advanced, high-level, safe type system. The type system includes features commonly found in higher-level verification languages, including algebraic datatypes (with pattern matching), type classes, and first-class functions. This makes it easy to express specifications and proofs in a natural way. More importantly, Rust's type system includes sophisticated support for linear types and borrowing, which takes care of much of the reasoning about memory and aliasing. As a result, the remaining reasoning can ignore most memory and aliasing issues, and treat the Rust code as if it were code written in a purely functional language, which makes verification easier.

At present, we do not intend to:

support all Rust features and libraries (instead, we will focus on high-value features and libraries needed to support our users)
verify the verifier itself
verify the Rust/LLVM compilers

This guide

This guide assumes that you're already somewhat familiar with the basics of Rust programming.
(If you’re not, we recommend spending a couple hours on the Learn Rust page.) Familiarity with Rust is useful for Verus, because Verus builds on Rust’s syntax and Rust’s type system to express specifications, proofs, and executable code. In fact, there is no separate language for specifications and proofs; instead, specifications and proofs are written in Rust syntax and type-checked with Rust’s type checker. So if you already know Rust, you’ll have an easier time getting started with Verus.

Nevertheless, verifying the correctness of Rust code requires concepts and techniques beyond just writing ordinary executable Rust code. For example, Verus extends Rust’s syntax (via macros) with new concepts for writing specifications and proofs, such as forall, exists, requires, and ensures, as well as introducing new types, like the mathematical integer types int and nat.

It can be challenging to prove that a Rust function satisfies its postconditions (its ensures clauses) or that a call to a function satisfies the function’s preconditions (its requires clauses). Therefore, this guide’s tutorial will walk you through the various concepts and techniques, starting with relatively simple concepts (basic proofs about integers), moving on to moderately difficult challenges (inductive proofs about data structures), and then on to more advanced topics such as proofs about arrays using forall and exists and proofs about concurrent code.

All of these proofs are aided by an automated theorem prover (specifically, Z3, a satisfiability-modulo-theories solver, or “SMT solver” for short). The SMT solver will often be able to prove simple properties, such as basic properties about booleans or integer arithmetic, with no additional help from the programmer. However, more complex proofs often require effort from both the programmer and the SMT solver.
Therefore, this guide will also help you understand the strengths and limitations of SMT solving, and give advice on how to fill in the parts of proofs that SMT solvers cannot handle automatically. (For example, SMT solvers usually cannot automatically perform proofs by induction, but you can write a proof by induction simply by writing a recursive Rust function whose ensures clause expresses the induction hypothesis.)

--------------------------------------------------------------------------------

21. Scoring Show HN submissions for AI design patterns
Source: https://www.adriankrebs.ch/blog/design-slop/
Site: adriankrebs.ch
Submitter: hubraumhugo (Hacker News)
Submitted: 2026-04-22 14:44 UTC (Hacker News)
HN activity: 306 points · 218 comments
Length: 654 words (~3 min read)
Language: en

An attempt to detect AI design patterns in Show HN pages

Apr 20, 2026

When browsing Hacker News, I noticed that many Show HN projects now have a generic sterile feeling that tells me they are purely AI-generated. Initially I couldn’t tell what it was exactly, so I wondered if we could automatically quantify this subjective feeling by scoring 500 Show HN pages for AI design patterns.

Claude Code has led to a large increase in Show HN projects. So much that the moderators of HN had to restrict Show HN submissions for new accounts. Here is how the Show HN submissions increased over the last few years:

Update: dang pointed out that the March 2026 dip correlates with the rollout of /showlim, the view newer accounts now see.

That should give us plenty of pages to score for AI design patterns.

AI design patterns

A designer recently told me that “colored left borders are almost as reliable a sign of AI-generated design as em-dashes for text”, so I started to notice them on many pages. Then I asked some more designer friends what they think are common AI patterns. The answers can be roughly grouped into fonts, colors, layout quirks, and CSS patterns.
Fonts

Inter used for everything, but especially the centered hero headlines
LLMs tend to use certain font combos like Space Grotesk, Instrument Serif and Geist
Serif italic for one accent word in an otherwise-Inter hero

Colors

“VibeCode Purple”
Perma dark mode with medium-grey body text and all-caps section labels
Barely passing body-text contrast in dark themes
Gradient everything
Large colored glows and colored box-shadows

Layout quirks

Centered hero set in a generic sans
Badge right above the hero H1
Colored borders on cards, on the top or left edge
Identical feature cards, each with an icon on top
Numbered “1, 2, 3” step sequences
Stat banner rows
Sidebar or nav with emoji icons
All-caps headings and section labels

CSS patterns

shadcn/ui
Glassmorphism

A few examples from the Show HN submissions:

Badge above the Inter hero.
Same, different page.
Colored border on top.
Dead internet? An AI-generated outreach about my blog that includes a perfect example of an AI design pattern (colored left border).
Icon-topped feature card.
Gradient background + glassmorphism cards.

Detecting AI design in Show HN submissions

Now we can try to systematically score for these patterns by going through 500 of the latest Show HN submissions and scoring their landing pages against the list above. Here is the scoring method:

A headless browser loads each site (Playwright)
A small in-page script analyzes the DOM and reads computed styles
Every pattern is a deterministic CSS or DOM check

I intentionally do not take screenshots and let an LLM judge them. The deterministic checks ultimately also lead to false positives, but my manual QA run put the rate at maybe 5-10%.

If there is any interest in open sourcing the scoring code to replicate (and improve) the run or score your own site, let me know.
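To make the idea of a "deterministic CSS or DOM check" concrete, here is a minimal Python sketch of one pattern from the list, the colored left border. This is my own illustration, not the post's scoring code (the real checks run in-page over computed styles via Playwright); the function name and thresholds are invented:

```python
# Hypothetical check for the "colored left border" accent-stripe pattern.
# `styles` stands in for an element's computed CSS as a plain dict.

def has_colored_left_border(styles: dict) -> bool:
    left_width = styles.get("border-left-width", "0px")
    left_color = styles.get("border-left-color", "")
    right_width = styles.get("border-right-width", "0px")
    # A visible left border with no matching right border reads as an
    # accent stripe rather than a uniform box border.
    return (left_width not in ("0px", "")
            and right_width in ("0px", "")
            and left_color != "")

card = {"border-left-width": "4px",
        "border-left-color": "rgb(124, 58, 237)",
        "border-right-width": "0px"}
plain = {"border-left-width": "1px",
         "border-left-color": "rgb(0, 0, 0)",
         "border-right-width": "1px"}

assert has_colored_left_border(card)
assert not has_colored_left_border(plain)
```

A real pipeline would evaluate predicates like this inside the page (e.g. over getComputedStyle for every candidate card element) and count how many of the 15 patterns fire per site.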
Results

A single pattern doesn’t necessarily make a site AI-generated, so I grouped them into three tiers based on how many of the 15 patterns they trigger:

Heavy slop (5+ patterns) · 105 sites · 21%
Mild (2–4) · 230 sites · 46%
Clean (0–1) · 165 sites · 33%

Is this bad? Not really, just uninspired. After all, validating a business idea was never about fancy design, and before the AI era, everything looked like Bootstrap. There is a difference between trying to craft your own design and just shipping with whatever defaults the LLMs output. And the same has been the case pre-LLM when using CSS/HTML templates. I guess people will get back to crafting beautiful designs to stand out from the slop. On the other hand, I’m not sure how much design will still matter once AI agents are the primary users of the web.

This post is human-written, the scoring and analysis were AI-assisted.

--------------------------------------------------------------------------------

22. Ultraviolet corona discharges on treetops during storms
Source: https://www.psu.edu/news/earth-and-mineral-sciences/story/treetops-glowing-during-storms-captured-film-first-time
Site: Penn State News
Author: David Kubarek
Published: 2026-04-15
HN activity: 230 points · 65 comments
Length: 907 words (~4 min read)

Earth and Mineral Sciences

Weather phenomenon that eluded scientists for decades captured in nature as corona discharges glow on tips of leaves

The positive, left, and negative corona discharges are shown on a spruce branch in a nearly pitch-black environment of a meteorology and atmospheric sciences lab at Penn State. Credit: William Brune / Penn State. Creative Commons

UNIVERSITY PARK — In a converted 2013 Toyota Sienna affixed with a hand-built telescopic weather device protruding from the roof, Penn State experts in meteorology and atmospheric science made their way down the nation’s eastern coast in June 2024 in search of Florida’s famed near-daily summer thunderstorms.
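The tier cutoffs above can be stated as a couple of comparisons (the thresholds are from the post; the function itself is my sketch), and the reported counts do cover all 500 scored sites:

```python
# Tier grouping from the post: 5+ patterns = heavy, 2-4 = mild, 0-1 = clean.
def tier(n_patterns: int) -> str:
    if n_patterns >= 5:
        return "Heavy slop"
    if n_patterns >= 2:
        return "Mild"
    return "Clean"

assert tier(7) == "Heavy slop"
assert tier(3) == "Mild"
assert tier(1) == "Clean"

# The three tiers partition the sample: 105 + 230 + 165 == 500.
assert 105 + 230 + 165 == 500
```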
They were hoping to catch corona discharges, a long-hypothesized atmospheric weather phenomenon where minuscule pulses of electricity dance at the tips of tree leaves, causing the canopy to glow in the ultraviolet (UV). For more than 70 years, scientists have suspected treetops might emit these corona electrical discharges because of odd electric field activity in and over forests during storms, yet they had never been documented outside the lab.

The team, consisting of William Brune, distinguished professor of meteorology and atmospheric science; Patrick McFarland, a doctoral candidate in meteorology and atmospheric science; Jena Jenkins, assistant research professor; and David Miller, a former associate research professor who is now at the Penn State Applied Research Lab, worked to be the first to document this effect. They chose the Sunshine State because of its propensity to produce frequent thunderstorms. However, as is often the case during research endeavors, the typical weather proved atypical. For three weeks in Florida, McFarland and Brune chased pop-up storms that left as quickly as they formed. The researchers had little to show for their efforts until, as they made their way back to Penn State, massive and sustained storms began cropping up just west of Interstate 95.

The team caught an exit, nestled in a parking lot at the University of North Carolina at Pembroke, and trained their instruments on the top branches of a sweetgum tree that the rangefinder logged as 100 feet from their van. The thunderstorm flashed lightning and poured rain for nearly two hours, giving them time to also observe corona on a nearby long-needle loblolly pine tree as the storm waned. The results, which were the first directly observed corona discharges occurring in nature, were recently published in Geophysical Research Letters. “This just goes to show that there’s still discovery science being done,” said McFarland, lead author on the paper.
“For more than half a century, scientists have theorized that corona exists, but this proves it.” Corona discharges take shape during storms, the researchers said, because clouds build up strong negative charges that attract the opposite positive charge on the ground below. Opposites attract and this positive electrical ground charge rises up through the trees to the highest point, causing an electric field on the tiny, hair-like tips of leaves that is great enough to create the weak corona glow in both visible and UV form. This UV from the corona breaks apart water vapor, producing hydroxyl. Hydroxyl is the atmosphere’s main oxidizer. Oxidizers clean the air by reacting with chemicals emitted into the air, making other chemicals that are easier to remove. These chemicals include volatile organic compounds emitted by trees or human activities and the greenhouse gas methane. The team’s prior research found corona discharges to be a substantial source of atmospheric cleansers in the forest canopy. The chemical conversion is what researchers keyed in on. Several years ago, the team applied high-voltage, low-current electrical impulses to tree branches and found a strong correlation between the UV emissions from corona discharges and the creation of hydroxyl compounds. In that project and the more recent observations, researchers noted leaf damage at the point corona was emitted. To capture the phenomena in nature and make use of this correlation, the team developed the Corona Observing Telescope System, a Newtonian telescope that feeds into a UV camera. It’s geolocated, equipped with a device for measuring atmospheric electricity and calibrated for UV emissions using a mercury lamp. The solar UV wavelength band is completely blocked, leaving corona, lightning and fire as the only sources of UV in the field. In North Carolina, this system captured 859 coronae events on the sweetgum tree and 93 on the loblolly pine. 
Events ranged from a blink to several seconds, McFarland said. During the field campaign, researchers observed coronae in four additional thunderstorms and on four additional tree species. “It’s nearly invisible to the naked eye but our instruments give rise to a vision of swaths of scintillating corona glowing as thunderstorms pass overhead,” McFarland said. “Such widespread coronae have implications for the removal of hydrocarbons emitted by trees, subtle tree leaf damage and could have broader implications for the health of trees, forests and the atmosphere.” While the researchers have confirmed the phenomenon, they said they still don’t know much about the potential impacts of these corona discharges and have more questions, such as: Are trees harmed during this process? Or do they benefit in some way? Have they evolved to withstand it? Does the atmospheric cleansing have a benefit to the forest? The researchers are beginning collaborations with interested tree ecologists and biologists to answer these questions, blazing new paths of discovery into the natural world around us. This work was supported by the U.S. National Science Foundation. Brune, Jenkins and Miller were co-authors on the research.

--------------------------------------------------------------------------------
23. Arch Linux Now Has a Bit-for-Bit Reproducible Docker Image
Source: https://antiz.fr/blog/archlinux-now-has-a-reproducible-docker-image/
Site: Robin Candau
Author: Robin Candau
Published: 2026-04-21
HN activity: 37 points · 8 comments
Length: 515 words (~3 min read)
Language: en

As a follow-up to the similar milestone reached for our WSL image a few months ago, I’m happy to share that Arch Linux now has a bit-for-bit reproducible Docker image! This bit-for-bit reproducible image is distributed under a new “repro” tag.
The reason for this is one noticeable caveat: to ensure reproducibility, the pacman keys have to be stripped from the image, meaning that pacman is not usable out of the box in this image. While waiting to find a suitable solution to this technical constraint, we are therefore providing this reproducible image under a dedicated tag as a first milestone. In practice, that means that users will need to (re)generate the pacman keyring in the container before being able to install and update packages via pacman, by running:

pacman-key --init && pacman-key --populate archlinux

(whether interactively at first start or from a RUN statement in a Dockerfile if using this image as a base). Distrobox users can run this as a pre-init hook:

distrobox create -n arch-repro -i docker.io/archlinux/archlinux:repro --pre-init-hooks "pacman-key --init && pacman-key --populate archlinux"

The bit-for-bit reproducibility of the image is confirmed by digest equality across builds (via podman inspect --format '{{.Digest}}') and by using diffoci to compare builds. Documentation to reproduce this Docker image is available here. Building the base rootFS for the Docker image in a deterministic way was the main challenge, but it reuses the same process as for our WSL image (as both share the same rootFS build system). The main Docker-specific adjustments include (see also the related diffoci reports):

- Set SOURCE_DATE_EPOCH and honor it in the org.opencontainers.image.created LABEL in the Dockerfile:

TYPE NAME INPUT-0 INPUT-1
Cfg ctx:/config/config ? ?
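Putting the keyring step from the article into Dockerfile form, a minimal sketch of using the repro image as a base might look like this (the base tag and pacman-key commands come from the post; the final upgrade step is an illustrative addition):

```dockerfile
# Base on the bit-for-bit reproducible image (pacman keys are stripped from it)
FROM docker.io/archlinux/archlinux:repro

# Regenerate the pacman keyring so pacman is usable, per the article
RUN pacman-key --init && pacman-key --populate archlinux

# From here on, packages can be installed normally (illustrative step)
RUN pacman -Syu --noconfirm
```

The same two pacman-key commands can instead be run interactively at first start, as the post notes.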
- Remove the ldconfig auxiliary cache file (which introduces non-determinism) from the built image in the Dockerfile:

TYPE NAME INPUT-0 INPUT-1
File var/cache/ldconfig/aux-cache 656b08db599dbbd9eb0ec663172392023285ed6598f74a55326a3d95cdd5f5d0 ffee92304701425a85c2aff3ade5668e64bf0cc381cfe0a5cd3c0f4935114195

- Normalize timestamps during docker build / podman build using the --source-date-epoch=$SOURCE_DATE_EPOCH and --rewrite-timestamp options:

TYPE NAME INPUT-0 INPUT-1
File etc/ 2026-03-31 07:57:46 +0000 UTC 2026-03-31 07:59:21 +0000 UTC
File etc/ld.so.cache 2026-03-31 07:57:46 +0000 UTC 2026-03-31 07:59:21 +0000 UTC
File etc/os-release 2026-03-31 07:57:46 +0000 UTC 2026-03-31 07:59:21 +0000 UTC
File sys/ 2026-03-31 07:57:46 +0000 UTC 2026-03-31 07:59:21 +0000 UTC
File var/cache/ 2026-03-31 07:57:46 +0000 UTC 2026-03-31 07:59:21 +0000 UTC
File var/cache/ldconfig/ 2026-03-31 07:57:46 +0000 UTC 2026-03-31 07:59:21 +0000 UTC
File proc/ 2026-03-31 07:57:46 +0000 UTC 2026-03-31 07:59:21 +0000 UTC
File dev/ 2026-03-31 07:57:46 +0000 UTC 2026-03-31 07:59:21 +0000 UTC

You can check the related change set in our archlinux-docker repository for more details. Thanks to Mark for his help on that front!

This represents yet another meaningful achievement in our general “reproducible builds” efforts, and I’m already looking forward to the next step! 🤗 For what it’s worth, I’m considering eventually setting up a rebuilder for this Docker image (as well as for the WSL image and future reproducible images) on my server, in order to periodically and automatically rebuild the latest available image, verify its reproducibility status, and share build logs / results publicly somewhere (if I find the time to get to it 👼).

--------------------------------------------------------------------------------
24.
OpenAI's response to the Axios developer tool compromise
Source: https://openai.com/index/axios-developer-tool-compromise/
Site: OpenAI
Submitter: shpat (Hacker News)
Submitted: 2026-04-23 00:45 UTC (Hacker News)
HN activity: 71 points · 41 comments
Length: 1.0K words (~5 min read)
Language: en-US

We recently identified a security issue involving a third-party developer tool, Axios, that was part of a widely reported, broader industry incident. Out of an abundance of caution we are taking steps to protect the process that certifies our macOS applications are legitimate OpenAI apps. We found no evidence that OpenAI user data was accessed, that our systems or intellectual property were compromised, or that our software was altered. We are updating our security certificates, which will require all macOS users to update their OpenAI apps to the latest versions. This helps prevent any risk—however unlikely—of someone attempting to distribute a fake app that appears to be from OpenAI. You can update safely through an in-app update or at the official links below. The security and privacy of your information are a top priority. We’re committed to being transparent and taking quick action when issues arise. We're sharing more technical details and FAQs below.

On March 31, 2026 (UTC), Axios, a widely used third-party developer library, was compromised as part of a broader software supply chain attack. At that time, a GitHub Actions workflow we use in the macOS app-signing process downloaded and executed a malicious version of Axios (version 1.14.1). This workflow had access to a certificate and notarization material used for signing macOS applications, including ChatGPT Desktop, Codex, Codex-cli, and Atlas. This certificate helps customers know that software comes from the legitimate developer, OpenAI.
Our analysis of the incident concluded that the signing certificate present in this workflow was likely not successfully exfiltrated by the malicious payload due to the timing of the payload execution, certificate injection into the job, sequencing of the job itself, and other mitigating factors. Nevertheless, out of an abundance of caution we are treating the certificate as compromised, and are revoking and rotating it. Effective May 8, 2026, older versions of our macOS desktop apps will no longer receive updates or support, and may not be functional. These versions represent the earliest releases signed with our updated certificate:

ChatGPT Desktop: 1.2026.051
Codex App: 26.406.40811
Codex CLI: 0.119.0
Atlas: 1.2026.84.2

As part of our investigation and response, we engaged a third-party digital forensics and incident response firm, rotated our macOS code signing certificate, published new builds of all relevant macOS products with the new certificate, and are working with Apple to ensure software signed with the previous certificate cannot be newly notarized. We have also reviewed all notarization of software using our previous certificate to confirm no unexpected software notarization occurred with these keys, and validated that our published software did not have unauthorized modifications. At this time, we have found no evidence of compromise or risk to existing software installations. In the event that the certificate was successfully compromised by a malicious actor, they could use it to sign their own code, making it appear as legitimate OpenAI software. We have stopped new software notarizations using the old certificate, so new software signed with the old certificate by an unauthorized third party would be blocked by default by macOS security protections unless a user explicitly bypasses them.
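Supply-chain compromises of CI workflows like this one are commonly mitigated by pinning third-party actions to immutable identifiers rather than floating tags. A hypothetical GitHub Actions fragment illustrating the general fix (the action name and commit SHA are placeholders, not OpenAI's workflow):

```yaml
jobs:
  sign:
    runs-on: macos-latest
    steps:
      # Risky: a floating tag can silently resolve to a newer, possibly
      # compromised release of the action.
      # - uses: third-party/signing-action@v1

      # Safer: pin to a full commit SHA so the referenced code is immutable.
      - uses: third-party/signing-action@8f4b7f84864484a7bf31766abe9204da3cbe65b3
```

A complementary mitigation is a cooling-off period for newly published package versions (pnpm, for example, supports a minimumReleaseAge setting), so a freshly compromised release is not installed the moment it appears.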
Once we fully revoke our certificate on May 8th, 2026, new downloads and launches of apps signed with the previous certificate will be blocked by macOS security protections. The root cause of this incident was a misconfiguration in the GitHub Actions workflow, which we have addressed. Specifically, the action in question used a floating tag, as opposed to a specific commit hash, and did not have a configured minimumReleaseAge for new packages.

Were OpenAI products or user data compromised?
No. We have found no evidence that OpenAI products or user data were compromised or exposed.

Have you seen malware signed as OpenAI?
No. We have found no evidence that the potentially exposed notarization and code signing material has been misused, and we have confirmed all notarization events with the impacted material were expected.

Do I need to change my password?
No. Passwords and OpenAI API keys were not affected.

Does this affect iOS, Android, Linux, or Windows?
No. This only affects OpenAI macOS apps. This does not affect the web versions of our software.

Why are you asking me to update my Mac apps?
OpenAI identified exposure in a GitHub Actions workflow involved in the macOS app-signing process. Because the exposed workflow was related to macOS app signing, we are proactively rotating the notarization and code signing material used for OpenAI macOS applications. Updating ensures you are running versions signed with our latest certificate. This certificate helps customers know that software comes from the legitimate developer, OpenAI.

Where do I download the updated macOS apps?
Only download OpenAI apps from in-app updates or the official webpages below. Do not install apps from links in emails, messages, ads, or third-party download sites. Be cautious of unexpected “OpenAI,” “ChatGPT,” or “Codex” installers sent through email, text, chat messages, ads, file-sharing links, or third-party download sites.

What happens after May 8, 2026?
Effective May 8, 2026, older versions of our macOS desktop apps will no longer receive updates or support, and may not be functional. These versions represent the earliest releases signed with our updated certificate:

ChatGPT Desktop: 1.2026.051
Codex App: 26.406.40811
Codex CLI: 0.119.0
Atlas: 1.2026.84.2

Why are you not revoking the certificate immediately?
We have worked to block any further notarization of macOS apps with the impacted notarization material. This means that any fraudulent app posing as an OpenAI app using the impacted certificate will lack notarization, and therefore will be blocked by default by macOS security protections unless a user explicitly bypasses those protections. Because new notarization with the previous certificate is blocked, and because the revocation may cause macOS to block new downloads and first-time launches of apps signed with the previous certificate, we are giving our users a 30-day window to update to minimize disruption. This window will help minimize user risk and allow impacted clients to update through built-in update mechanisms, ensuring they are appropriately remediated. We are working with our partners to monitor for any indicators of misuse of the signing certificate, and will accelerate the revocation timeline if we identify malicious activity during this window.

--------------------------------------------------------------------------------
25. Bodega cats of New York
Source: https://bodegacatsofnewyork.com
Site: Bodega Cats of New York
Submitter: zdw (Hacker News)
Submitted: 2026-04-18 02:26 UTC (Hacker News)
HN activity: 190 points · 75 comments
Length: 287 words (~2 min read)
Language: en

The Bodega Cats book arrives October 2026, documenting the working cats of NYC bodegas. From “Bodega Cats of New York” · Photo by Gulce Kilkis. Full cover reveal: June 2026. Book release: October 2026.

The Book: Bodega Cats of New York. This book documents them.
The cats that bring in revenue, like Jimmy of Second Avenue, who runs his block like a seasoned manager. The territorial ones who clear basements in three days flat. The quiet constants who outlast owners, workers, and neighborhood turnover. 120 photographs. 60+ stories. Coming October 2026 from Quarto Publishing.

For brands and agencies: Put Your Product in a Real NYC Bodega. We place products in real stores, coordinate shoots with owners, and produce original content. Featured in The New York Times, NPR, and 100+ outlets.

Advocacy: Legalizing Bodega Cats in NYC. Bodega cats exist in a legal gray zone. State sanitary code bans animals in food establishments, which means owners can be fined for a cat that's been living in the store for a decade. Two bills in committee would fix that: Int. 1471 at City Council and A08341 at State Assembly. 14,000 people signed the petition that got them there.

From our sister company, Cats About Town Tours: a walking tour through the history of New York's working cats. The strays who ran the docks. The post office cats who earned federal salaries. The brewery cats who never missed a shift.

--------------------------------------------------------------------------------
26. Windows 9x Subsystem for Linux
Source: https://social.hails.org/@hailey/116446826733136456
Site: social.hails.org
Submitter: sohkamyung (Hacker News)
Submitted: 2026-04-22 09:52 UTC (Hacker News)
HN activity: 952 points · 224 comments
Scrape failed: fetch: Get "https://social.hails.org/@hailey/116446826733136456": net/http: TLS handshake timeout

--------------------------------------------------------------------------------
27.
Workspace Agents in ChatGPT
Source: https://openai.com/index/introducing-workspace-agents-in-chatgpt/
Site: OpenAI
Submitter: mfiguiere (Hacker News)
Submitted: 2026-04-22 17:47 UTC (Hacker News)
HN activity: 133 points · 51 comments
Length: 1.4K words (~7 min read)
Language: en-US

Today, we’re introducing workspace agents in ChatGPT. Teams can now create shared agents that handle complex tasks and long-running workflows, all while operating within the permissions and controls set by their organization. Workspace agents are an evolution of GPTs. Powered by Codex, they can take on many of the tasks people already do at work—from preparing reports, to writing code, to responding to messages. They run in the cloud, so they can keep working even when you’re not. They’re also designed to be shared within an organization, so teams can build an agent once, use it together in ChatGPT or Slack, and improve it over time.

AI has already helped people work faster on their own, but many of the most important workflows inside an organization depend on shared context, handoffs, and decisions across teams. Workspace agents are designed for that kind of work: they can gather context from the right systems, follow team processes, ask for approval when needed, and keep work moving across tools. For example, our sales team at OpenAI uses an agent to pull together details from call notes and account research, qualify new leads, and draft follow-up emails right in a rep’s inbox. It helps account teams spend less time stitching together details and more time with customers.

To get started, click Agents in the ChatGPT sidebar and describe a workflow your team does often. ChatGPT will guide you step by step to turn it into an agent. Workspace agents are available in research preview in ChatGPT Business, Enterprise, Edu, and Teachers plans.

Editor’s note: GPTs will remain available while teams test workspace agents with their workflows.
Soon, we’ll make it easy to convert GPTs into workspace agents.

Five agents your team can build today:

- A software review agent that triages software requests, enforces policy, routes approvals, and opens IT tickets with clear next steps.
- A product feedback routing agent that captures feedback from Slack, support, and public channels, prioritizes what matters, and turns signals into weekly product action.
- A weekly metrics reporting agent that auto-pulls Friday data, generates charts, drafts the narrative, and delivers a business report.
- A lead outreach agent that qualifies inbound leads, drafts tailored follow-ups, and updates the CRM.
- A third-party risk management agent that screens vendors for sanctions, financial, and reputational risk, then delivers reports.

Describe the job you want done or just drop in a file. ChatGPT helps turn it into an agent: defining the steps, connecting the right tools, adding skills, and testing it until it works the way you expect. Here are a few agents teams at OpenAI have built—and that your team can build, too:

- Software Reviewer: Reviews employee software requests, checks them against approved tools and policies, recommends next steps, and files IT tickets when needed.
- Product Feedback Router: Monitors Slack, support channels, and public forums, then turns feedback into prioritized tickets and weekly product summaries.
- Weekly Metrics Reporter: Pulls data every Friday, creates charts, writes the summary, and shares a report with the team.
- Lead Outreach Agent: Researches inbound leads, scores them against your qualification rubric, drafts personalized follow-up emails, and updates your CRM.
- Third-Party Risk Manager: Researches vendors, assesses signals like sanctions exposure, financial health, and reputational risk, and produces a structured report.

You can also get started quickly with templates for finance, sales, marketing, and more.
Each comes with built-in skills and suggested tools, so you can quickly set up an agent and customize from there. Workspace agents can gather context and take action across dozens of tools. Agents are powered by Codex in the cloud, giving them access to a workspace for files, code, tools, and memory. Agents do more than answer a prompt: they can write or run code, use connected apps, remember what they’ve learned, and continue work across multiple steps. Workspace agents can keep working even when you’re away. You can set them to run on a schedule, or deploy them in Slack so they can pick up requests as they come in. For example, our product team built an agent that proactively answers employee questions in Slack channels. The agent responds with a clear answer, links relevant documentation, and can file a ticket when it finds a new issue. This agent helps teams get unblocked faster while making sure important follow-ups don’t slip through the cracks. Today, teams can interact with agents in ChatGPT and Slack, with more surfaces coming soon. Agents can join the conversations and workflows where work already happens, helping teams move work forward with less coordination. Manage sharing and discover workspace agents shared by your team from the Agents tab in the ChatGPT sidebar. Knowledge is often scattered across people and systems. Workspace agents give teams a way to turn that knowledge into a reusable workflow: one that follows the right process, uses the right tools, and can be shared across the organization. For example, our accounting team built an agent that prepares key parts of month-end close, from journal entries to balance sheet reconciliations to variance analysis. It completes the work in minutes, generates workpapers with the underlying inputs and control totals needed for review, and follows internal policies. 
The agent is available in ChatGPT for anyone on the team to use, or added to Slack channels so the team can ask it questions and collaborate around its outputs. Because agents have memory and can be guided and corrected in conversation, they get better as teams use them. Over time, agents become a practical way to keep team knowledge current: build once, improve through use, then share or duplicate for new workflows. View analytics for your live workspace agents from the menu in the editor. When you delegate work to an agent, you stay in control. You decide what tools and data it can use, what actions it can take, and when it needs approval. For sensitive steps, like editing a spreadsheet, sending an email, or adding a calendar event, you can require the agent to ask for permission before moving forward. After you share an agent, analytics help you see how it’s being used, including how many runs it has completed and how many people are using it. Workspace agents come with enterprise-grade monitoring and controls, so admins can protect sensitive data while giving teams a safe way to move faster with AI. ChatGPT Enterprise and Edu admins can control which connected tools and actions user groups can access. Admins can also manage who has access to use, build, and share agents. Built-in safeguards help agents stay aligned with your instructions when they encounter misleading external content, including prompt injection attacks. The Compliance API gives admins visibility into every agent’s configuration, updates, and runs, so they can monitor and control how agents are being built and used. Admins can also suspend agents if needed. Soon, admins will also be able to view every agent built across their organization in the admin console, including usage patterns and connected data sources. Early testers of workspace agents are already seeing more consistent results and time for higher-value work.
“The hard part of building an agent is not the model. It's the integrations, memory, the user experience. Workspace agents collapsed that work, so one of our Sales Consultants built, evaluated, and iterated a Sales Opportunity agent end to end without an engineering team. It researches accounts, summarizes Gong calls, and posts deal briefs directly into the team’s Slack room. What used to take reps 5-6 hours a week now runs automatically in the background on every deal.” — Ankur Bhatt, AI Engineering, Rippling

Workspace agents are available in research preview for ChatGPT Business, Enterprise, Edu, and Teachers plans. For Enterprise and Edu plans, admins can enable agents using role-based controls. Workspace agents will be free until May 6, 2026, with credit-based pricing starting on that date. We’ll keep adding more great things in the weeks ahead to help teams get more work done with less manual effort. This includes new triggers that can start work automatically, better dashboards to understand and improve performance, more ways for agents to take action across your business tools, and support for workspace agents in the Codex app. Teams do their best work when knowledge is easier to find, processes are easier to follow, and people can get help in the flow of work. Workspace agents are an early step toward that future: AI that works alongside people in the tools and conversations where work already happens, helping teams spend less time coordinating work and more time creating, building, and making decisions that move the business forward.

--------------------------------------------------------------------------------
28. What killed the Florida orange?
Source: https://slate.com/business/2026/04/florida-state-orange-food-houses-real-estate.html
Site: Slate
Author: Alexander Sammon
Published: 2026-04-20
HN activity: 151 points · 137 comments
Length: 6.5K words (~29 min read)
Language: en

Photography by Scott McIntyre

Metropolis

Who Killed the Florida Orange?
Deep in desiccated Southern groves, the powerhouse of American citrus is suffering a brutal, unrelenting decline. No one wants to face what that means. By Alexander Sammon. April 20, 2026, 5:40 AM

Quiet fell over the room, which was neither full nor very loud to begin with, and the 2026 Florida Citrus Show began. “It should be a great day,” began the event’s first speaker. “Rain should hold off today, even though we definitely need more rain.” No one laughed. There was no need to say that things were bad. Everyone knew it. The mood wasn’t sour—citrus farmers could handle sour. It was something else. Postapocalyptic. Florida is in the midst of its worst drought in 25 years, but the dry spell actually ranked far down on the list of challenges these bedraggled growers were facing. In 2003, the mighty Florida orange industry produced 242 million boxes of fruit, with 90 pounds of oranges per box, most of which went on to become orange juice. Now, not even 25 years later, the United States Department of Agriculture was forecasting a pitiful 12 million boxes of oranges, the least in more than 100 years, the worst year since last. A decline of more than 95 percent. And everyone knew, more or less, that even that figure was not happening. “Twelve million? I would doubt it,” Matt Joyner, CEO of Florida Citrus Mutual, the state’s largest trade group, told me. There was chatter that even 11 million might be out of reach. Could the total end up being less than that, just seven figures? In Florida, the citrus capital of the world, you are today more likely to see the oranges printed on the state’s 18 million license plates than a box of actual fruit. Rick Dantzler, chief operating officer of the Citrus Research and Development Foundation, took the podium.
He was blunt. “It’s been a dumpster fire of a year,” he said. On the list of immediate problems: the implementation of tariffs and retaliatory tariffs, then the government shutdown, then a stunning, historic freeze, days long, at the end of January and early February, that besieged the fragile orange trees. And yet those, too, were just footnotes to the even larger problem. Already, Florida had lost about three-quarters of its citrus growers. The last of them, these spent survivors, these hangers-on, had trudged to the Citrus Show to talk about the real problem, which was the disease. In 2005, Florida first got signs of a new affliction in its groves called citrus greening disease. It also has a Chinese name, Huanglongbing, or HLB, because it came from China, where oranges also came from in the first place. Citrus greening disease is caused by a bacterial infection that is delivered by the gnawing of the Asian citrus psyllid. (It’s now believed the psyllid first turned up near the Port of Miami in 1998.) The flea-sized psyllid bites the leaves and transmits the disease, which slowly chokes out the tree’s vascular system from the inside, taking years to finally show itself. By the time a tree is displaying symptoms—three to five years, in most cases—it’s too late. Floridian farmers are no strangers to disease. When HLB first began to spread, there was no indication it would be any worse than any other bug that had appeared over the years. The farmers did what they always did: They sprayed and sprayed, chemicals and pesticides, stuff so powerful that the Centers for Disease Control and Prevention and the U.S. Food and Drug Administration freaked out about potential risks to human health. But greening spread anyway. Industry groups and the state poured money, millions, into finding a cure, and every time they thought they’d figured it out, it didn’t work, and the greening accelerated. Hurricanes turned out to be a vector for spreading the little winged bug. 
The wind carried the psyllid all over the state, dropping it off in hundreds of thousands of acres of groves. Soon enough, trees everywhere were showing blotchy, mottled, yellowed leaves and suffering from twig dieback and sparse foliage. Under duress, the trees would drop all their fruit on the ground prematurely. What rare fruit survived to maturity on these little, addled trees was misshapen, acrid, and stubbornly green on one end; in short, it tasted terrible. Even after being squeezed and processed and pasteurized, the juice was gross. Now, according to the University of Florida website, the disease is “incurable.” It warns: “There is currently no treatment for citrus greening. Once a tree is infected, it will eventually become unproductive and may even die.” I asked numerous people—farmers and industry leaders and researchers—to estimate how many trees in Florida now have greening. The answer was resounding: 100 percent. Every single tree. The Citrus Show was meant to rally those weary troops, to assure them that help was on the way, that this was the bottom. That there was reason to hold on. And there was: There had been some progress, with oxytetracycline, OTC for short, a powerful antibiotic that is used to treat chlamydia and sometimes syphilis in humans. It wasn’t a cure, exactly, but ceaselessly applied, it was keeping the effects of greening at bay for a few months at a time. Growers were boring holes in the bases of their infected trees and injecting it. It was expensive, and it had only been in use for two or three years, and it would only be a temporary fix at best. But it seemed to be working. There were greener leaves, and oranger fruit, and a palatable juice product. Scott McIntyre They had been wrong before, yes—who could forget the tree-steaming solution, which once looked so promising, tenting each tree with a makeshift steam room cranked to 130 degrees, but which ended up failing when it became clear the bacteria were in the roots. 
But this one seemed, the researchers tried to assure their charges, for real. A panel of multigeneration growers took the stage to weigh in on their experiences with OTC. It wasn’t altogether triumphant. “Injection just crushes the older trees,” said Tommy Thayer, a fourth-generation grower. “Most groves are not producing as well post-Ian as pre-Ian,” said Daniel Hunt, of the legendary Hunt Bros. citrus family, referring to the 2022 hurricane. But he had done double injections on some trees, and had seen successes. “Our Valencias were beautiful,” he said. “They had color.” “Unfortunately, they’re all on the ground right now,” said Thayer, because of the freeze. Their panel closed with a request that everyone say something positive about their experience in the citrus industry. “The long history,” offered Hunt. “Good for character-building.” Scientists took the stage, one after another, supplying encouragement. The OTC trials were positive; they were fast at work on a genetically modified tree. “The tree of the future,” they said, again and again. And it was in the lab, and it was on the way. The OTC might tide them over until that GMO creation was ready for widespread planting. But the timeline, they conceded, was difficult. “We don’t have time because of how the industry is,” said Manjul Dutt, a researcher with the University of Florida Institute of Food and Agricultural Sciences. The realistic run from discovery to commercial production of the GMO tree? “Typically, it’s five years before a tree produces flowers and fruit,” so … “10 to 14 years.” A second researcher presented a slightly different timeline: 12 to 18 years. “Hopefully you can stay in business,” commented a third. “Someday, there’s gonna be a talk where ‘HLB’ and ‘solved’ are in the title,” said Randy Niedz of the USDA. “This is not that talk.” The afternoon wore on. At lunch, I spoke to Jillian Rooney of the Crop Disaster Recovery group, which had a tent set up in the parking lot. 
I told her I was writing about the state of the citrus industry in Florida. “Oh. Sad,” she said. A sign at another booth seemingly encouraged the growers to try growing anything else. “Why grow passion fruit?” read one, with a list of its potential upsides. “Sugar apple,” suggested another. After lunch, the bad news kept coming. It wasn’t just greening that had to be worried about. There were root nematodes, launching a subterranean attack. There was citrus canker, caused by a bacterium that had plagued citrus for years prior to the arrival of greening.* (It, too, came from China.) Then came a seminar on citrus black spot, another recent arrival. “It is not known how it arrived,” said Clive Bock of the USDA. “But it could spread to the whole Gulf Coast.” Things had deteriorated quickly. “Three, four years ago, the juice was 80 percent from Florida,” said Weston Johnson, of the Coca-Cola Company, which owns Minute Maid. “Now we’re 20 percent Florida.” A quintessential crop and national icon of the 20th century in America was dying before our eyes, and outside this room, most of the country—even Florida itself—had barely noticed. Juice shots were being given out, in tiny 1.5-ounce bottles. “Made with orange-like hybrids with tolerance to HLB. This juice is an innovation that represents the future of citrus,” a sign next to the cooler said. “100 percent American juice,” boasted the label. I drank it. It didn’t taste very good. “The custom of drinking orange juice with breakfast is not very widespread, taking the world as a whole, and it is thought by many peoples to be a distinctly American habit,” begins writer John McPhee in a famous two-part 1966 essay in the New Yorker that ran to 40,000 words, an indulgence that met the grandeur of the industry. Maybe attention spans were too long back then.
Here’s the condensed version: The Spanish conquistadors rocked up to northern Florida in the 1500s, amid all that marauding, and planted the orange tree, taken from (yes!) China. It was small-time stuff until after the Civil War, when the railroads reached south and the fruit sold north. Historic freezes in 1894 and 1895 nearly eradicated the industry, its first and last real brush with old-world calamity. Instead, it drove things south. Planters started anew in central Florida, in Polk County and its surroundings, what’s known as the Ridge, the highest part of Florida, and the only part that was never below sea level, historically. The frost problem having been dealt with, the arrow was pointing straight up. Then came the technology that changed it all. It was World War II, and the American military wanted vitamin C to keep its front-line boys in fighting trim. It paid for the research for what would become frozen juice concentrate. As with cigarettes, those boys came home hooked. By 1950, the state was doing more than 100 million boxes a year. The orange blossom had already become the state flower in 1909, and, by 1967, a year after McPhee’s opus, the orange was the state fruit. Florida sold some whole fruit, but the biggest money was in “crushing fruit”: making and selling juice. The citrus families became royal in the Sunshine State. Incredible intergenerational empires were amassed, with land holdings the size of small states. The Jack Berrys, the Bob Pauls, the Hunt brothers, the Lykes brothers, all of them with juniors or thirds or fourths. And the biggest, by far, was Ben Hill Griffin Jr. Even Peter Pulitzer, grandson of publishing tycoon Joseph Pulitzer, amassed a citrus empire. Behind them came the corporate class: Tropicana, which ended up with PepsiCo, and Minute Maid, which went to Coca-Cola. The citrus barons’ names went up on everything in Florida. Street signs and golf courses and the university. 
The ranks of the Bull Gators—that’s the list of top boosters of the University of Florida’s athletic programs—were overrun with citrus families. The citrus world got whole stadiums. Ben Hill Griffin still has his name on the University of Florida’s 90,000-person football palace, better known now as the Swamp. Tropicana got the MLB stadium in St. Petersburg, where the Tampa Bay Rays played until Hurricane Milton blew the roof off in 2024. Griffin even made for himself an industry town called Frostproof—a canny, if defiant, advertising play, named years prior, after the town had survived the mythic 1895 freeze without much issue. Frostproof became a cipher for just how untouchable the industry had become. Citrus baron Latt Maxcy incorporated there, too. McPhee marveled: “The industry is self-regulating and pays its own way.” Then came the 1970s, and a new technology arrived: the herbicide glyphosate, created by Monsanto. The citrus industry adopted it early and zealously, taking to it like water, spraying it all over the ground until not one sign of non-citrus life remained. When new complications came, they sprayed more. Acreage grew to 832,000, with record yields, and Florida was king, producing 78 percent of all United States citrus. Up and up it went, and why not? The process got more mechanized through the back half of the American Century—out with the cover cropping, in with the monocrop, packed tight as can be. One innovation followed the next. Frozen concentrate fell behind the novel idea of “not from concentrate”—no longer did they squeeze it and freeze it. And they were unaware, or unconcerned, that that chemical was wreaking havoc on the soil, weakening the trees’ defenses, leaving them extremely vulnerable to disease. Why would they be? Times were good. In 2000, the agricultural trade agreement with China opened the Chinese market to fresh Florida citrus. Commissioner Bob Crawford hand-delivered a 10-carton shipment to commemorate the event. 
They were loving China then. And then it all came down so fast. There were the fad diets of the 2000s: no sugar, low-carb. The American Academy of Pediatrics began crusading against juice for kids. Orange-industry groups hired medical professionals as spokespeople in public relations, and kicked off an emergency ad campaign addressing what they branded “juice confusion.” That didn’t work. The citrus estates began to get carved up in tawdry divorce settlements, battles of wills that captivated the tabloids. Invasive species came in all guises: foreign pestilence, foreign capital, and the developers. It was the perfect storm. And then, of course, there were the actual perfect storms, the high-caliber hurricanes that, before climate change, didn’t come to the Ridge: Irma, Ian, Milton, massive cells, all direct hits on the groves. The orange barons lost breakfast, and lost Florida, too. Who killed the Florida orange? Were outside invaders to blame? Or was the culprit right at home? As with his beloved Florida citrus, Rick Dantzler’s on the way out—age 70, retiring from the Citrus Research and Development Foundation, which, after losing its state funding, was getting absorbed by another group anyway. He met me in Lake Alfred, at the site of its University of Florida satellite campus. I asked him to drive me through the best groves in the heart of citrus country. “It’s gonna be a lot of houses,” he warned me. “It breaks my heart.” Dantzler is a Florida man through and through. He is third-generation in Winter Haven, in Polk County, and has the locution to prove it. When talkin’ oranges, he pronounces Valencia “Vuh-LEN-chuh.” His wife is fourth-generation. His father at one point had 160 acres of citrus. Now the family has none. The collapse of Florida’s citrus industry, he told me, took everyone by surprise. “It happened so fast,” he said.
“What’s remarkable is how many people here in Florida are not aware of it.” The first stop on our tour was a brand-new grocery store with a sprawling parking lot out front. “That was a fantastic grove. It’s now turning into a Publix,” he said matter-of-factly. Opposite that was a giant dirt lot, graded flat. “These were fantastic groves on both sides of the road. I remember one time as a kid I saw a great big corn snake crossing the road—when there was not much of a road—and the corn snake climbed up into an orange tree. I acted like I couldn’t quite get to it, I was actually scared—” He cut himself off. “These were just fantastic groves. Not anything left.” The next stop on our tour was a gas station. “This used to be a really great grove right here where this Circle K is going in,” said Dantzler, on cue. “You’d drive down here in the spring and it’d smell so good you’d think you were in a perfume shop,” said Dantzler, as we passed through Polk County. It was March, and if you rolled down the window, the only scent was exhaust. Dantzler knew citrus greening as well as anyone. He’d lived with it every day for years. He was sanguine about the effects of OTC, even though it was temporary, and expensive, and even though the treated trees got reinfected every four months. He was sure there were better days in oranges to come—oranges were growing well beneath pricey protective screens, for instance—so long as there was anyone left to plant them. We crossed into Dundee. “Now, this was citrus country; this was the heart of the industry. The best groves in Florida were right here. All the varieties in Florida were located right there. It was almost like a seed bank, so if catastrophe happened, the industry could always replant,” said Dantzler. “Catastrophe’s kinda happened,” he added—but the seed bank grove was no longer there. He also knew hurricanes. The storms were also part of old Florida culture. But rarely did they make landfall near the oranges.
Many of the groves had gone decades without any real hurricane exposure. Then the climate got warmer, and along came Hurricane Irma in 2017, which hammered the Ridge. Its winds, which reached 142 mph, shook the trees violently on their shallow roots. It was the first of many. The trees made it through that year, without evincing the scale of the damage. The next year, or the year after, or three past, when the fruit wasn’t coming, it became clear that the stress of the high winds on already weakened root systems had traumatized the trees, often permanently. After that, Hurricanes Ian, Idalia, Helene, and Milton all made landfall on the peninsula. “In 2021, we fell off a cliff,” he said. Five major storms went right over grove land. We drove past more empty lots, more abandoned groves, desiccated trees, signs announcing public hearings for land-use changes. We passed mountains of trunks and branches, piled high. In the local parlance, they’d been “pushed”; soon, they would be burned. We passed a road sign for Tucker Paving as another plot was getting razed. “Mr. Tucker’s father and my father were best friends,” Dantzler said. “I know all these guys, they’re my friends. But look at what’s happening.” Dantzler told me that he didn’t see 2005, when greening symptoms first became clear, or 2017, even with Irma, as the turning point. He pegged it to 2007, when another invasive species took off, the one that now dominated the landscape on our drive: suburban sprawl. The stress placed on the groves by wind and water turned out to be little compared to the stress put on them by development. There were the solar farms, which targeted large tracts of land for panels, and those tended to be former groves. There were the data centers, too. In nearby St. Lucie County, a $13.5 billion hyperscale data center was proposed, one of the largest in the world, to be built atop 1,400 acres of old groves. That was on the corner of Orange Avenue and Minute Maid Road. 
But the sprawl was really the crux of it. It was not inevitable. In the mid-1970s, Florida began a period of environmental enlightenment. In the course of three decades, the state passed all sorts of legislation, from wetlands protection to local government growth-management initiatives. The state government established what was called a “concurrency doctrine,” instituting strict requirements for infrastructure development—water, sewer, schooling—that had to be established before construction permits were granted. Then came the heady days of 2007, when, as you might remember, Florida’s housing developers overbuilt so dramatically, and financed so dubiously, that they helped bring the whole global economy down with them: the Great Recession. The state government, desperate to stimulate the economy and its moribund real-estate sector, began eroding the growth-management plan that had restrained development. The Department of Community Affairs, the state agency that oversaw all the local government growth planning and limited development, was just abolished outright. From then on, it became a political story. Soon, the developers had bounced back, deep-pocketed and powerful in Tallahassee, and they weren’t done yet. They bet big on the ascendant Florida Republican Party, backing Rick Scott all the way to the governor’s mansion. In 2011, Scott gutted concurrency, a critical regulatory standard that kept developers in check. Then the developers went all in for a Yale man named Ron DeSantis. The citrus industry still had political power, too. But the deregulation they won did little to stanch the bleeding. Reeling from a depressed economy, then an explosive greening problem, then hurricanes, they were soon going to the statehouse, desperate for bailout money. “I think the development community saw an opportunity to get rid of most of those regulations. And they’ve gotten rid of most of it,” Dantzler said. 
By certain estimates, Polk County, where we were driving, has been the fastest-growing area in America, and the developers have been cashing in. A citrus grove must be planted in sand, which occurs naturally, by some geological miracle, in central Florida. (The miracle, specifically, was the Appalachian Mountains, which eroded and deposited sand there over millions of years.) The trees won’t take in wetlands, in mucky soils. But that sand itself is also in high demand for cement, for construction, for building shoulders for highways, for filling in wetlands for development. Up here, Dantzler pointed, was a sand mine, which had torn out groves and gotten to mining beneath them. “There’s a crazy market for sand,” he said. Sandy land itself is the easiest property to develop. Wetlands are still often protected from a development standpoint, and so, in addition to infill, require pricey, lengthy permitting. Sandy uplands, hiding beneath every citrus tree, are low-regulation and ready to build on. So, while the growers were losing money hand over fist, housing developers were coming through with godfather offers to buy them out, convert them to row housing, and sell, sell, sell. Flags of every homebuilding giant flew on vanquished ground: DR Horton, Lennar. At nearly every intersection there were signs for cheap housing—no money down, homes in the low $200,000s, yes, for real, in 2026. Bunting and grand openings and exclusive offers abounded. We drove past another former grove, which Dantzler again called “phenomenal,” which was now selling 10-acre lots. “This is all post-’07 stuff,” he sighed. And so real estate was on the march, and even the citrus industry legends had become deserters. After the Gulf Citrus Growers Association shut down in 2024, its president, Wayne Simmons, a fifth-generation citrus grower, became a realtor. He wasn’t the only one.
We drove down new roads graded for housing, named after the citrus families who had once planted there, not a tree in sight. “Cable and Internet Included,” offered one sign. “My gosh, we’re getting into the Ben Hill Griffin stuff. It’s just phenomenal,” said Dantzler. But there were no trees there anymore. There was just compact sand, a model home, and a barely-there development project. “We offer zero down payment!” the developer pledged. “Enjoy limited-time incentives like paid closing costs and exceptional financing options!” The sign out front was complete, bearing the development’s name: Citrus Place. “Citrus Place?!” Dantzler asked, incredulous. “That offends me.” We drove on, and Dantzler told stories of dove hunts in the grove, of outsize characters of the old citrus elite. He insisted that in the midst of all of this housing, the citrus of the future remained. He put the car into low gear, and we drove into a test grove he had just recently been involved in planting. The trees were shorter than they used to be, with less canopy, packed tighter than ever, but they were showing promise. “In the pre-greening days, you could spot a grove car by all the scratches on both sides,” he told me. Orange trees used to live 50 to 100 years; these little upstarts might make it 12 to 15. And so we set off down the aisle, and the branches hit the side mirrors and, on occasion, scratched the door. And there, in the adjacent grove, had sprung up a house. “Now, that house is new,” said Dantzler. “What in the world’s that all about?” The famed Frostproof, Florida, was once the seat of the Ben Hill Griffin empire. It wouldn’t be fair to call it a ghost town now, exactly. According to census data, 3,000 people call it home. There is a stoplight. But the name does loom over it, haunting what remains. There are three parts to a citrus operation: groves, packinghouses, and processing facilities, where the juice is made. Griffin—Frostproof—once had it all.
The firm operated a major packinghouse in Frostproof, where fresh fruit was boxed for sale for roughly 70 years. It closed for good in April 2017. The Ben Hill Griffin packinghouse wasn’t the only one. According to Peter Chaires, executive vice president of Florida Citrus Packers, in less than 40 years Florida had gone from 88 packinghouses to, now, just eight. Even one closure could be devastating to a community. Chaires told me that in Haines City, they lost a packinghouse that had been the primary employer since 1909. Chaires was even more alarmed by the collapse of the processing facilities, which make juice. Florida was a juice state, after all. Building a new packinghouse was light work compared to building new juicing facilities. “It’s extraordinarily important that we try to hold on to our processing capacity that we have now,” he said. In fact, in 1977, Florida boasted 53 different processing plants for crushing fruit, pasteurizing, or making fresh juice or frozen concentrate. Now, said Robin Bryant, executive director of the Florida Citrus Processors, there are just four: Cutrale (a Brazilian firm supplying Minute Maid), Peace River Citrus, Florida’s Natural, and Paracone, a boutique operation. At their peak, those processing facilities would run three shifts, billowing steam morning, noon, and night. Now they’d cut back to one shift. Even still, Bryant told me, “all but one of those plants could process everything we produce in Florida on their own.” This year, Tropicana announced that for the first time it would not be processing fruit in Florida at all. Minute Maid killed frozen juice concentrate. Griffin once had a processing plant in Frostproof, too. And that, too, was gone. But it wasn’t greening that had caused this collapse. The decline had taken off in the 1990s, when the industry opened itself up to Wall Street and to foreign capital. 
According to the Florida Department of Citrus, in 1996, foreign buyers bought two plants in Auburndale, which kicked off a trend: From then on, the “majority of plant acquisitions that followed would have owners headquartered outside of the United States.” In 1998, privately owned Seagram’s sold Tropicana to publicly traded Wall Street darling PepsiCo, and things quickly began to change. Griffin, meanwhile, had sold off its Frostproof processing facility to Procter & Gamble, which in turn had sold it to Cargill, based in Minnesota. But Google Maps, itself seemingly haunted, insisted that the boxy plant off Highway 17 was Griffin’s. When I drove up, the gate was open, though there were no other cars; from its dull roar, I could tell the plant was not totally idle. Out came Mike, one of two employees I saw on-site. Mike had grown up in the area, he told me; worked there for years. “The orange is gone. It’s dead,” he told me. “All the spots that they’re building houses, they’re the orange groves.” The plant where he worked, he said, was now owned by Peace River. But they weren’t processing anymore. They were simply cold storage. Now orange juice came from Costa Rica, Argentina, and, mostly, Brazil. Grapefruit juice sometimes came from Hungary. It was shipped in tankers to the nearby Port of Manatee and then trucked to facilities like the one behind him, for safekeeping. “Today, it’s a lost empire,” said Mike; the plant where he worked best described as “a mausoleum.” In fact, Florida, everywhere, has become a storage locker for juice from Brazil, where the land is cheaper, the regulations are laxer, and the chemicals are cheaper, too. “The labor is pretty much slave labor, I guess,” shrugged one local farmer I spoke with. At the Citrosuco plant in Polk County—which flew the Brazilian flag alongside the red, white, and blue—and at the Cutrale sites and even at Florida-based Peace River, it was basically Florida in name only.
“Seventy-five percent of juice packaged in Florida comes from Mexico or Brazil,” Bryant said. One had to be honest: Florida juice was “just not at the quality it used to be.” But not all is well in the citrus kingdom of Brazil, either. The greening is “beginning to catch up to them,” Bryant said. In 2025, greening affected a record 47.63 percent of orange trees in the Brazilian Citrus Belt, according to Fundecitrus; 100 million trees, of 209 million, are now infected. Brazil followed Florida’s lead in what researchers now call “excessive glyphosate usage,” and has been, suddenly, reaping similar outcomes. (Citrus greening has been around for 120 years, and exists worldwide, and has, so far, caused an extinction-level event only in Florida.) Behind the Peace River plant, sprawled out across a massive lawn, and behind a chain-link fence, were the ruins of a processing infrastructure. Giant, stainless-steel mixing tanks and vats, on their sides, tanning in the Florida sun. And then, finally, came another modern pestilence. In 2021, after years of losing money on Tropicana, PepsiCo decided to get the company off the books. They put their majority share up for auction. In January 2022, they announced the buyer: a French private-equity fund called PAI Partners. (PepsiCo maintains a minority position.) “European,” noted Tim Hynes, global head of credit research at Debtwire, a leveraged finance consultancy. “It was actually somewhat surprising to me that they had won this asset.” PAI had a food and consumer portfolio: They owned European Pizza Group, a leader in the frozen-pizza business in Europe. They owned Alphia, a pet-food co-manufacturer in North America. Maybe they thought that what ailed the Florida citrus could be cured with a little private-equity magic. The firm renamed it Tropicana Brands Group, balling up other beleaguered beverage properties too, and packed it full of debt. And then they raised prices and shrank the packaging. 
“Everyone does that,” Bryant said. But the move was calamitous. In 2024, Tropicana became the face of the shrinkflation epidemic. People raged online, in Reddit forums, on Facebook. “That just didn’t go as planned,” Hynes told me. “They have a bunch of debt. They were going to run out of money.” By 2025, PAI was talking about bankruptcy for Tropicana, though a $30 million emergency loan had steadied things for a time. PAI is “not confident any value remains from their initial investment,” Hynes told CNN. I drove all around Frostproof, at Mike’s encouragement, looking for oranges, which I hadn’t seen many of. “Valencia Acres,” read one housing development. “Price cuts!” What I saw, primarily, were mobile-home parks, next to Ben Hill Griffin Elementary. Alico’s Joshua Grove was the largest citrus grove in Florida, likely the largest contiguous grove in the country. Except that in January 2025, Alico—the largest citrus grower in Florida, the largest citrus producer in America—announced in a release that it was done. Their orange era was over. So the Joshua Grove is now actually Florida’s largest citrus graveyard. In all, Alico began gutting 53,000 citrus acres at the end of the 2025 harvest: 35 percent of Florida’s citrus production, condemned in one press release. Which meant that every tree under Alico management in the Joshua Grove was dead, dying, or already gone. “For over a century, Alico has been proud to be one of Florida’s leading citrus producers,” Alico’s president and CEO John Kiernan said in the statement. “But we must now reluctantly adapt to changing environmental and economic realities.” Because of outstanding leases, third-party caretakers were getting one final season to manage 3,500-odd acres, “through 2026,” added Kiernan, in the announcement. The harvest ends in early April now—it used to stretch into June—so the oranges here now would be the very last to ever come off the property.
Mitch Hutchcraft, Alico’s executive vice president of real estate, agreed to give me a personal tour of this citrus necropolis. The grove was more remote than anything I’d seen: south of the Ridge, in DeSoto County, without a house in sight. It was also enormous: seven miles on one side, nearly seven miles on the other, all planted in tight rows, stretching beyond the horizon, which, in flat Florida, is really saying something. A gate guard lifted the arm, and we drove in. To the right, he pointed, was an old grass runway where pesticide planes once landed and took off. We began to roll past dead and dying trees in various stages of decay. Some were blackened, shriveled, barren of leaves. Others had fruit sitting on the ground beneath, most of it not quite orange, rotting. “They go pretty quickly,” Hutchcraft said. Alico, as part of its plan to “become a diversified land company,” was turning 25 percent of its land into—what else—commercial and residential development. Already, it was underway with the construction of two “villages.” A year prior, the company unveiled Corkscrew Grove East Village, and Corkscrew Grove West Village, a 9,000-home development. They had similar plans for Bonnet Lake in Highlands County, Saddlebag Grove in Polk County, and Plant World in Hendry County. The remaining 75 percent, like the Joshua Grove, which remained too remote for housing development, would be put to other agricultural uses. That meant row crops; that meant cattle grazing. The company would also be pursuing mineral extraction and oil. Already, it had tens of thousands of acres of oil production. Alico had done everything it could, Hutchcraft told me, right up until the end. It had replanted, and injected trees with OTC, and everything else. But it was still losing money, and lots of money, fast. Soon enough, they would begin to turn this land over. A front-end loader would roll down the seemingly endless aisles and pop the trees out one by one.
With their shallow root systems, addled by disease, the trees wouldn’t put up much resistance. Then, they’d push the carcasses into piles and burn them. There weren’t any loaders working yet, though, and the weather wasn’t cooperating for a burn. “You do it when you’re getting rain—April, May. You don’t want it to be really dry, that can get out of control,” he said. “We’ve had cold spells, and it’s windy.” Plus the drought. We pulled up to a harvest wagon, a large flatbed. It was empty. Picture, said Hutchcraft, laborers going up and down the rows with bags on their shoulders, picking orange fruit and filling them, and then lugging the bags to tubs at the ends of rows. There, they’d empty the bags into the tubs. Then trucks would go up and down the rows picking up the tubs. Finally, the trucks would disgorge their citrus into a harvest wagon, the giant flatbed, which would be driven by semitruck to the processing center. Historically, these groves would throw off 500 boxes per acre. At the end, the company was lucky to get 90. “There would have been harvest crews all up and down. You’d have seen trailers parked, filled with fruit,” he said. Now the harvest wagon was empty, and there were no trucks, and no tubs, and no shoulder bags, and no guys. “SLOW—CONGESTED AREA” warned various road signs stationed throughout the rows. But we didn’t see a single other person. Up until the moment Alico announced it would be producing zero oranges, it had been the single largest provider of oranges for Tropicana. When they informed Tropicana of their decision, Hutchcraft told me, the company didn’t push back. Down the rows we went. “Hamlin,” read a sign, and for miles in both directions were furrows and beds with only dirt or scrub brush or dead trees. “Valencia,” read another, and the same. It was hard not to be nostalgic. But time had run out, and times had changed. Citrus was “the profession that drove the state, was the iconic state industry,” Hutchcraft said. No longer. 
Now Florida Southern College, the fabled Frank Lloyd Wright–designed campus that citrus money made, didn’t even offer a citrus management program. Orange juice had even lost the battle for shelf space to seltzers and energy drinks and kombucha and more. “Florida’s always been a boom-and-bust state,” said Hutchcraft. In the distance, a plume of smoke rose, likely dead trees burning, though it was hard to see from so far. We shook hands and parted ways, the tour complete. I returned to the guard booth, which housed the only other person I’d seen at Joshua Grove. The guard, Jack Gunther, cracked the door. He had false teeth and an American flag baseball cap. The smell of cigarettes billowed from the booth; the walls were stained with nicotine. Thirteen years was how long Gunther had worked directing traffic at this grove, he told me. He’d lived in the county all his life. Here he was, the last orange man left. What did he think of all this, I asked him. What happened to the Florida orange? “I think they killed it themselves, with chemicals. That’s a fact,” Gunther said. In my time in Florida, I’d found a more complicated story, but down here, everyone had their theories, their longing for citrus nirvana, and their anger at the loss. “They sprayed so much chemicals, the damn grass don’t even grow here anymore—you can quote me,” Gunther said. “I knew it back in 1990. I said, ‘They’re sprayin’ so much chemicals it’s gonna be the end.’ And it’s the end.” And then he asked: “You wanna come in and watch TV?” Correction, April 22, 2026: This piece originally misstated that citrus canker is a viral infection. -------------------------------------------------------------------------------- 29. 
The handmade beauty of Machine Age data visualizations Source: https://resobscura.substack.com/p/the-handmade-beauty-of-machine-age Site: Res Obscura Author: Benjamin Breen Published: 2026-04-22 HN activity: 34 points · 1 comment Length: 2.8K words (~13 min read) Language: en I spent last week at Harvard doing research in the archives of William James, the psychologist, philosopher, psychical researcher, brother of Henry James, and all-around interesting person. He was a brilliant, charming, self-defeating, deeply strange man (exhibit A: he believed taking a high dose of nitrous oxide helped him finally understand Hegel). That mixture of qualities comes across vividly in his papers. What doesn’t, at least at first, is that he was a talented visual artist. In fact, before he became a psychologist, William James dreamed of being a professional painter. He studied for several years in his late teens and early twenties under the painter William Morris Hunt. John La Farge’s portrait of a young William James at the easel, when they were both art students, circa 1859. Although none of William’s paintings appear to survive, a careful reader of his archive will find evidence that he continued to draw throughout his life. Here he is, for instance, doodling on an envelope addressed to him from Geneva: Doodles on a letter to James from 1898, photographed by me at the Houghton Library, Harvard. Readers of The Metaphysical Club, Louis Menand’s wonderful book about James and his circle, will recognize the image below as William’s sensitive drawing of his brother Wilkie while he was recovering from being shot during the Civil War. Courtesy of the Houghton Library, Harvard. The visual creativity of James is not just a clue about how his own mind worked. 
It’s also part of a larger shift in the culture of science during his generation: in the nineteenth century, design and the nascent world of big data came together, for the first time, to create the modern concept of data visualization. Although James and his collaborators are rarely mentioned in discussions of the origins of data visualization, they actually played a very important role in shaping it. They were the consolidators and extenders of a new paradigm — the generation after the famous names in the field like William Playfair and Florence Nightingale. “Diagram of the causes of mortality in the army in the East” by Florence Nightingale, 1858, via Wikimedia Commons. The generation of William James came into adulthood in the Machine Age of the 1870s, ‘80s and ‘90s. Information buffeted the human brain like never before. And the new technique of data visualization pushed into new domains: the mapping of the mind, the sociology of race, and the pursuit of explicit political ends. James is famous among historians of science for what Francesca Bordogna calls “boundary work” — he moves across disciplines and fields with a manic, restless energy that makes it hard to figure out exactly what he was. As we’ll see, this included an unusual approach to visualizing mental activity that yielded significant firsts, including the first schematic of a neural network. Along the way, James also had important relationships with two pioneers of modern data visualization who are rarely put in the same sentence: Francis Galton and W.E.B. Du Bois. James, Galton, and Du Bois: products of the Machine Age. Francis Galton (center) was a sort of intellectual frenemy for James, initially a mentor and influence, then later — in his guise as a founder of eugenics and ardent imperialist — an exemplar of the dangers of scientific hubris. Du Bois (pictured at right) studied with James at Harvard and was deeply influenced by his philosophy of pluralism. 
What they shared is a conviction that drawing, diagramming, and composing images was not a decorative step added after the thinking was done. It was how the thinking got done. This seems to me a crucial point in light of new AI tools like Claude Design, which automates the design process (on which more below). Quite a few people have pointed out that writing is a form of thought (my favorite entry in this genre is this 2025 essay by Derek Thompson). But it’s worth thinking more about what else counts as thinking too — specifically, the sorts of important, creative thinking we don’t want to accidentally mislabel as “drudgery that we are happy to let AI take over for us.” Design, I would argue, is not drudgery. The images below also remind us how handmade data visualization once was. W.E.B. du Bois’s visualizations from the 1900 World’s Fair are rightfully famous online, and they look great as compressed jpgs shared on social media. But it’s important to remember that these are large, hand-drawn, hand-lettered posters. These are not just the product of mental work but manual work, too. That link between the hand and mind is harder to come by in a world where all research is digital, but it can be fun and important to access when doing serious research. When I was deepest in researching the history of 20th century psychedelic science for Tripping on Utopia, I filled up a yellow notebook with collaged images and primary source snippets from my archival research. I started it as a sideline, almost a hobby to distract me from what I saw as the “real” work of actually writing the book. It turned out to be one of the most significant forms of research I did, precisely because it was so freeform and undirected. I started noticing links between different documents, and thinking more deeply about the motives (public, private, and even subconscious) of the people I was writing about. 
Entries for the Congresswoman and LSD enthusiast Clare Boothe Luce (left) and the occultist and rocket scientist Jack Parsons (right) in my yellow legal pads. The rest of this post is a gallery of some of the ways that James, Galton, and Du Bois visualized data, along with some desultory commentary. I end by experimenting a bit with Claude Design to see what gets lost when we automate this type of exploratory visual thinking with data. If you find it interesting, please consider subscribing. You’d be hard-pressed to find any reference to James in books about the history of data visualization or design. But I think he deserves a page or two. For one thing, his Principles of Psychology (1890) contains what I believe to be the earliest ever visual representation of a neural network: WJ’s diagram in Principles of Psychology (1890, p. 570) showing how one memory (left), on “reoccurring, tends to propagate its excitement into the other” (right). The most interesting data visualization in James’s work, however, is this one, IMO: Visualization of the stream of consciousness from chapter 9 of James’ Principles of Psychology (1890). James intends the image above as a representation of how consciousness “moves through” the process of uttering a simple sentence over time. The numbers on one axis show moments in time, and the other axis shows the words being said or thought. The Joy Division-esque crest running through the middle is the changing attention we pay to each word over time. It is really striking to me how much this looks like a computer rendering — but it’s from 1890! For this attempt at a sort of faux-four-dimensional modeling of thought, plus the neural network chart alone, I think James deserves a lot of credit for his data visualization. There are other interesting images in Principles of Psychology, especially in chapter two, which you can read here, although they are a bit more familiar. 
For instance, here is a schematic of how a child perceives a candle flame: “Let the current 1—1, from the eye, discharge upward as well as downward when it reaches the lower centre for vision, and arouse the perceptional process s1 in the hemispheres… Let the feeling of the arm’s extension also send up a current which leaves a trace of itself, m1; let the burnt finger leave an analogous trace, s2; and let the movement of retraction leave m2. These four processes will now, by virtue of assumption 2, be associated together by the path s1—m1—s2—m2, running from the first to the last.” And an early attempt to map which parts of the brain correspond to specific body regions and sensations: The brain of the monkey. Fig. 6.—Left Hemisphere of Monkey's Brain. Outer Surface. From the first edition of The Principles of Psychology by William James (1890), vol. I, page 34. James read Galton closely, and cited him throughout Principles of Psychology — especially on questions of mental imagery and visual perception. Galton’s famous “breakfast-table questionnaire,” which asked hundreds of correspondents to describe how vividly they could picture the objects on their breakfast table that morning, was one of the first systematic attempts to gather data on subjective visual experience at scale. James was fascinated by it. He replicated versions of Galton’s mental-imagery surveys in his own classes and used the results in his writing. Galton was also a pioneer of meteorology and produced many, many beautiful charts relating to weather phenomena — things like this: But the visualization that appears to have most interested James (and which I find most striking too) is this jam-packed color plate about mental imagery and synesthesia which comes at the end of Galton’s book Inquiries into Human Faculty and Its Development (1883): The big idea that linked Galton and James was not just that inner life could be represented as data, but that this data could be rendered as pictures. 
Galton, obsessed as he was by measurement and averages, took this in a novel and discomfiting direction. Elsewhere in the same book that the beautifully strange image above comes from, Galton introduced composite portraits — photographs of criminals, tubercular patients, “types” of any kind — which layered individual faces on top of each other to produce a kind of statistical average in visual form. The project was inseparable from his invention of eugenics — a term which Galton coined in the same 1883 book. Galton’s visualizations encoded his conviction that human variation could be sorted, ranked, and ultimately improved through selective breeding. Detail from p. 7 of Inquiries into Human Faculty and its Development (1883). In this work, to a large extent, the visual was the argument. Later, Galton even used his own photographs and biometric measurements as an example to encourage others to submit to biometric data collection. He wanted to visually model not just a new approach to data but a new mode of life in which all aspects of the human were reducible to data. Reading his work from the perspective of the 2020s — with our turn toward “post-literate,” image- and video-based apps powered by mass data collection — it’s hard not to conclude that in this, he was hugely successful. Source: University College, London. For more on Galton see Ava Kofman’s article on him in Public Domain Review. William James, who prized pluralism and unconventionality, could not follow Galton on his journey into the world of averages and biometric data. But one of James’s most talented students, a young W.E.B. Du Bois, did. Sort of. As the philosopher Colin Koopman writes, Du Bois took Galton’s obsession with measurement and flipped it on its head. W.E.B. Du Bois — the first African-American to earn a Harvard PhD, and later one of the most important sociologists and civil rights thinkers of all time — studied philosophy with James in the late 1880s. 
Du Bois had a remarkably visual mind, with an approach to imagery that verged on the synesthetic.1 Du Bois absorbed James’s insistence that human experience could not be flattened into a single scale. But he also absorbed Galton’s conviction that the world could be made legible (and changeable) through creative use of data visualization. What he did with this combination was something neither of his predecessors had quite imagined. Below are some of the charts that Du Bois made for the 1900 Paris Exposition Universelle — the same world’s fair that gave us the first public moving walkway, Rudolf Diesel’s engine, and the Art Nouveau métro entrances that still dot Paris today. Du Bois was there as the lead curator of the “American Negro Exhibit,” a small pavilion aiming to show a European audience what Black Americans had made of themselves in the thirty-five years since emancipation. The roughly sixty charts he produced, hand-drawn and hand-lettered on large poster boards by Du Bois and his students at Atlanta University, are some of the most arresting data visualizations ever made. Like Galton’s charts, they visually plot the result of careful statistical investigation using idiosyncratic and arrestingly odd formats; also like Galton, they are using data visualization to pursue political goals. But where Galton leveraged his visualizations to argue for the lasting, ahistorical superiority of people who resembled him (Protestant, British, scientific, and from “good families”), Du Bois was charting the rapidly changing role of Black people in American society. Change over time runs through virtually all of his charts from 1900, a few of which are reproduced below: As an aside, I’m intrigued by the font Du Bois used here. It looks almost like the product of an early computer printout with its quasi-monotype, but if you look carefully you can see that it is idiosyncratic and hand-lettered. 
Writing this post got me thinking about a question which we are figuring out the answer to in the 2020s: by automating away the design process, do we also cede away our ability to use design itself (and not just writing) as an act of thinking? Using Anthropic’s new Claude Design feature, I wanted to see what happened when I asked Claude to make something in the style of these nineteenth-century visuals. Here’s what I got for a Du Bois pastiche (my prompt is available at this footnote):2 Asking for Galton-style charts yielded a similar stew of serif fonts and faux-aged paper effects, but in more of a Victorian bric-a-brac style which, on closer inspection, adds up to nothing much at all: Detail of Claude Design’s attempt to design like Galton. These failed attempts to capture a distinctly Victorian data visualization style should not be surprising. Claude Design is optimized to make slides and other data visualizations that suit modern sensibilities, and we can guess that there wasn’t much Du Bois or Galton in its training data. But aside from the by-now-familiar fact that AI models tend toward the median and the expected, there is the larger absence here: these charts don’t have a perspective. James, Du Bois, and Galton, different as they were, had one big thing in common, and this thing is part of why we are still thinking and debating them today: they were deeply idiosyncratic, deeply personal. You see in their visualizations not just raw data but an odd, distinctive intellect at work. And more importantly, an intellect working through an argument that is both mental and physical. They are instantiating the act of thought in the mechanical work of their hands. Claude Design is appealing for those of us (like myself!) who find themselves having to make fiddly tweaks to Powerpoint slides. 
It is, in this sense, just one more step in a process of design automation and abstraction that has been happening for centuries: from hand-lettered to printed book, from drawing to photograph, from scissors and glue to “desktop publishing,” and now from humans designing things to humans telling AI designers what to make. As the act of thought gets more and more removed from the union of human mind with human hands, the new ideas and the palpable jolt that you get from designing something by hand get rarer, and thus more valuable. This is why I am going to be making more scrapbooks and weird collages on yellow legal pads as I embark on my new book project about the Machine Age. After all, what could be more human? 1 For instance, here is how he describes his own conception of beauty in a 1926 lecture, remembering “four beautiful things”: The Cathedral at Cologne, a forest in stone, set in light and changing shadow, echoing with sunlight and solemn song; a village of the Veys in West Africa, a little thing of mauve and purple, quiet, lying content and shining in the sun; a black and velvet room where on a throne rests, in old and yellowing marble, the broken curves of the Venus of Milo; a single phrase of music in the Southern South—utter melody, haunting and appealing, suddenly arising out of night and eternity, beneath the moon. It has been noted that Du Bois is mixing different senses freely here: a cathedral that “echoes” with sunlight, for instance. 2 The prompt: “Create a data visualization in the style of W. E. B. du Bois’ famous infographics for the 1900 world’s fair. Use a data source of your choice. Make it great and very true to Du Bois’ style but not the same content and not a copy.” -------------------------------------------------------------------------------- 30. 
Bring your own Agent to MS Teams Source: https://microsoft.github.io/teams-sdk/blog/bring-your-agent-to-teams/ Site: microsoft.github.io Author: Aamir Jawaid, Umang Sehgal Published: 2026-04-17 HN activity: 57 points · 44 comments Length: 1.3K words (~6 min read) Language: en

You've already built the agent. It lives somewhere: a LangChain chain, an Azure Foundry deployment, a Slack bot. Your users live in Teams. Teams is where most enterprise work happens: decisions get made, customers get answered, and projects move forward there. Getting your agent into that context, before you build anything Teams-specific, is already worth doing. It comes down to one pattern in the Teams TypeScript SDK: the HTTP server adapter. You point it at your HTTP server, it registers a messaging endpoint, and your existing server keeps running as-is. The scenarios below cover three different starting points: a Slack bot, a LangChain chain, and an Azure Foundry agent. The SDK also handles the parts you don't want to think about: it verifies every incoming request is legitimately from Teams before invoking your handler, and routes messages to the right event handlers automatically.

The Pattern

Every example in this post uses the same three-step shape:

    import { App as TeamsApp, ExpressAdapter } from '@microsoft/teams.apps';

    const adapter = new ExpressAdapter(expressApp);                // 1. wrap your server
    const teamsApp = new TeamsApp({ httpServerAdapter: adapter }); // 2. create the app

    teamsApp.on('message', async ({ send, activity }) => {         // 3. handle messages
      await send(/* your agent's response */);
    });

    await teamsApp.initialize(); // registers POST /api/messages on your server

The SDK injects a POST /api/messages route into your existing Express app. /api/messages is the well-known endpoint Teams uses to deliver messages to your bot, the Teams-shaped interface your HTTP server needs to have. Your server stays yours; the Teams SDK just adds that one endpoint. 
Scenario 1: Slack Bot

You have a Slack bot built with Bolt (or any other kind of bot deployed as a web service). Your team uses both Slack and Teams. Rather than maintaining two codebases, run both on the same Express server. ExpressReceiver lets Bolt mount onto your Express app instead of owning the server. The Teams SDK does the same thing, so both platforms share the same process.

slack-app.ts: existing Slack logic, untouched

    import { App as BoltApp, ExpressReceiver } from '@slack/bolt';
    import type { Express } from 'express';

    export function mountSlack(expressApp: Express) {
      const slackReceiver = new ExpressReceiver({
        signingSecret: process.env.SLACK_SIGNING_SECRET,
        app: expressApp,
        endpoints: { events: '/slack/events' },
      });
      const slackApp = new BoltApp({
        token: process.env.SLACK_BOT_TOKEN,
        receiver: slackReceiver,
      });
      slackApp.message('hello', async ({ say }) => {
        await say('Hey! Caught you on Slack.');
      });
    }

teams-app.ts:

    import express from 'express';
    import { App as TeamsApp, ExpressAdapter } from '@microsoft/teams.apps';
    import { mountSlack } from './slack-app';

    const expressApp = express();
    mountSlack(expressApp); // Teams mounts at /api/messages

    const adapter = new ExpressAdapter(expressApp);
    const teamsApp = new TeamsApp({ httpServerAdapter: adapter });

    teamsApp.on('message', async ({ send, activity }) => {
      await send(`Hey ${activity.from.name}! You said: "${activity.text}"`);
    });

    export { expressApp, teamsApp };

Both platforms run in the same process. Slack hits /slack/events, Teams hits /api/messages, and any shared agent logic (LLM calls, database lookups, business rules) lives in plain functions that both handlers call.

Scenario 2: LangChain

You have a LangChain chain. You want Teams users to talk to it. 
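One wrinkle worth planning for before wiring the chain up: a slow or stalled model call leaves the Teams user waiting with no reply. A small, SDK-agnostic guard can cap the wait. This is a sketch; `withTimeout` is a hypothetical helper of my own, not a Teams SDK or LangChain API:

```typescript
// Hypothetical helper (not from the Teams SDK or LangChain): resolve with a
// fallback reply if the agent call takes longer than `ms` milliseconds.
export function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    work
      .then((value) => { clearTimeout(timer); resolve(value); })
      .catch(() => { clearTimeout(timer); resolve(fallback); });
  });
}

// Inside a message handler it would wrap the agent call, e.g.:
//   const reply = await withTimeout(agentCall, 15_000, 'Sorry, that took too long.');
```

Because each scenario ultimately awaits a single Promise for the reply, the same wrapper would work for any of them.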
chain.ts: existing LangChain logic, untouched

    import { ChatOpenAI } from '@langchain/openai';
    import { ChatPromptTemplate } from '@langchain/core/prompts';
    import { StringOutputParser } from '@langchain/core/output_parsers';

    let _chain: ReturnType<typeof buildChain> | null = null;

    function buildChain() {
      const prompt = ChatPromptTemplate.fromMessages([
        ['system', 'You are a helpful assistant embedded in Microsoft Teams. Be concise.'],
        ['human', '{input}'],
      ]);
      return prompt.pipe(new ChatOpenAI({ model: 'gpt-4o-mini' })).pipe(new StringOutputParser());
    }

    export function getChain() {
      if (!_chain) _chain = buildChain();
      return _chain;
    }

teams-app.ts (the bridge):

    import express from 'express';
    import { App as TeamsApp, ExpressAdapter } from '@microsoft/teams.apps';
    import { getChain } from './chain';

    const expressApp = express();
    const adapter = new ExpressAdapter(expressApp);
    const teamsApp = new TeamsApp({ httpServerAdapter: adapter });

    teamsApp.on('message', async ({ send, activity }) => {
      await send({ type: 'typing' });
      // pass the Teams message to LangChain
      const reply = await getChain().invoke({ input: activity.text ?? '' });
      await send(reply);
    });

    export { expressApp, teamsApp };

index.ts (start it):

    import 'dotenv/config';
    import http from 'http';
    import { expressApp, teamsApp } from './teams-app';

    await teamsApp.initialize();
    http.createServer(expressApp).listen(3978);

Your chain runs on every message. The typing indicator fires before the LLM responds so users know something's happening.

Scenario 3: Azure AI Foundry

You have an agent deployed in Azure AI Foundry. The Teams SDK gives you the message; you forward it to Foundry and relay the reply. 
foundry-agent.ts:

    import { AIProjectClient } from '@azure/ai-projects';
    import { DefaultAzureCredential } from '@azure/identity';

    let _client: AIProjectClient | null = null;

    function getClient() {
      if (!_client) {
        _client = AIProjectClient.fromEndpoint(
          process.env.AZURE_AI_FOUNDRY_ENDPOINT!,
          new DefaultAzureCredential(),
        );
      }
      return _client;
    }

    export async function askFoundryAgent(userMessage: string): Promise<string> {
      const client = getClient();
      const thread = await client.agents.threads.create();
      await client.agents.messages.create(thread.id, 'user', userMessage);
      const run = await client.agents.runs.createAndPoll(
        thread.id,
        process.env.AZURE_AGENT_ID!,
      );
      if (run.status !== 'completed') throw new Error(`Run ended: ${run.status}`);
      const messages = client.agents.messages.list(thread.id);
      for await (const msg of messages) {
        if (msg.role === 'assistant') {
          return msg.content
            .filter((c): c is { type: 'text'; text: { value: string } } => c.type === 'text')
            .map((c) => c.text.value)
            .join('');
        }
      }
      return 'No response from agent.';
    }

teams-app.ts:

    import express from 'express';
    import { App as TeamsApp, ExpressAdapter } from '@microsoft/teams.apps';
    import { askFoundryAgent } from './foundry-agent';

    const expressApp = express();
    const adapter = new ExpressAdapter(expressApp);
    const teamsApp = new TeamsApp({ httpServerAdapter: adapter });

    teamsApp.on('message', async ({ send, activity }) => {
      // pass the Teams message to Foundry
      const reply = await askFoundryAgent(activity.text ?? '');
      await send(reply);
    });

    export { expressApp, teamsApp };

Python SDK

A Python SDK is also available. The same three-step pattern applies with FastAPI and other ASGI frameworks:

    from fastapi import FastAPI
    from microsoft_teams.apps import App, FastAPIAdapter

    fastapi_app = FastAPI()
    adapter = FastAPIAdapter(app=fastapi_app)     # 1. wrap your server
    teams_app = App(http_server_adapter=adapter)  # 2. create the app

    @teams_app.on_message
    async def handle_message(ctx):                # 3. handle messages
        await ctx.send("your agent's response")

    await teams_app.initialize()

See Self-Managing Your Server for the full Python guide.

Registering Your Bot

All three scenarios share the same registration step.

Step 1: Get a public URL for your local server. Teams needs to reach your bot over HTTPS. For local development, Dev tunnels is the recommended option — it's built into VS Code and the Azure CLI. ngrok works too. Either way, you'll get a URL like https://abc123.devtunnels.ms that forwards to your local port.

Step 2: Register your bot using the Teams SDK CLI.

    npm install -g @microsoft/teams.cli@preview
    teams login
    teams app create --name "My Bot" --endpoint https://your-tunnel-url/api/messages --env .env

This handles AAD app registration, client secret generation, manifest creation, and bot setup in one command. Your .env gets populated with CLIENT_ID, CLIENT_SECRET, and TENANT_ID automatically.

Step 3: Sideload the app into Teams. After teams app create, follow the sideloading instructions in the CLI output to install the app in your Teams client for testing.

The same three lines, every time

Every scenario in this post follows the same shape because the SDK is built around one idea: your server is yours. The adapter is the seam between your existing infrastructure and Teams. Whether you're running Express or any other HTTP server, the SDK doesn't care what's underneath. It just needs something that can register a route and handle a request.

    const adapter = new ExpressAdapter(yourServer); // ExpressAdapter or your own adapter
    const teamsApp = new TeamsApp({ httpServerAdapter: adapter });
    teamsApp.on('message', async ({ send, activity }) => { /* your agent */ });

If you're already running a bot somewhere, wiring it into Teams is a few lines of glue code. Full docs at Self-Managing Your Server.
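One last pattern the scenarios share is worth making explicit: the agent "brain" can live in a plain function with no platform imports, so each platform handler stays a thin wrapper. A minimal sketch follows; `answer` is a hypothetical name of mine, not an SDK API, and its body stands in for your real LLM call or business rules:

```typescript
// Hypothetical shared handler: plain TypeScript, no Slack or Teams imports.
export async function answer(text: string): Promise<string> {
  const trimmed = text.trim();
  // Stand-in logic; replace with an LLM call, database lookup, etc.
  if (trimmed.length === 0) return "Say something and I'll try to help.";
  return `You said: "${trimmed}"`;
}

// Each platform handler then just adapts its own message shape, e.g.:
//   slackApp.message(/.*/, async ({ message, say }) => {
//     await say(await answer((message as { text?: string }).text ?? ''));
//   });
//   teamsApp.on('message', async ({ send, activity }) => {
//     await send(await answer(activity.text ?? ''));
//   });
```

Keeping the brain platform-free is what makes "a few lines of glue code" true for the next platform, too.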