The Bitter Lesson and Spiders
We keep trying to build intelligence in our own image, but evolution doesn't care about our blueprints.
Last year in Novi Sad, Serbia, I struck up a conversation with a travel guide about science fiction. Between recommendations for local alcohol and stories of Slavic lore, he mentioned a novel that would change how I think about evolution: Adrian Tchaikovsky's Children of Time.
The premise is fascinating. In the distant future, a scientist, Dr. Kern, attempts to uplift monkeys on an Earth-like planet to sentience using a nanovirus. Her plan was precise and methodical, but human-centric by design. The monkeys would evolve along the path she had carefully designed, the one she had seen play out before: first tool use, then language, then civilization, and so on. Everything mapped out according to a human understanding of intelligence.
Then she sent an evolution bomb to accelerate the process many times over.

But it never made it to the monkeys.
Instead, the evolution bomb infected spiders. And what emerged was intelligence so alien, so fundamentally different from human cognition, that neither Kern nor the humans who arrived centuries later could initially comprehend it. The spiders developed their own civilization, their own technology, their own way of understanding the universe. When humans finally made contact, the spiders had to demonstrate their intelligence through brief space warfare before eventually teaching humans that cooperation between radically different minds was possible - and beneficial.
The Pattern We Keep Missing
Around the time I read Tchaikovsky, I also had a fascinating discussion with Sherjil Ozair about whether we need to encode some sort of system-design principles into LLMs to make them understand the workflows of typical software development. This was before LLMs were remotely good at coding.
We didn't refer to Rich Sutton's "The Bitter Lesson" (an essay that should be required reading for anyone working in AI), but Sherjil, who's building something remarkable at General Agents, had a strong opinion: this kind of hand-engineering has been tried many times before and it never works out. We should let LLMs learn for themselves; we just need to give them enough data to learn from.
We keep trying to encode how we think intelligence works, rather than letting intelligence emerge.
Sutton's bitter lesson, drawn from 70 years of AI research, is stark: general methods that leverage computation are ultimately the most effective, and by a large margin. Every time researchers try to build in human knowledge - how we play chess, how we recognize speech, how we see - it helps in the short term but eventually becomes a ceiling.
The Methods That Scale
What actually works? The same things that worked for evolution: search and learning at massive scale.
Search: Not the careful, knowledge-guided search that humans do, but massive, parallel exploration of possibility spaces. Deep Blue evaluated 200 million positions per second. AlphaGo examined move trees no human could hold in their head.
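To make "search that leverages computation" concrete, here is a minimal sketch, nothing like Deep Blue's actual engine (no alpha-beta pruning, no hand-tuned evaluation function), just a toy brute-force negamax over tic-tac-toe. The point it illustrates: the program encodes almost no game knowledge beyond the rules; it simply enumerates positions and lets compute settle the rest.

```python
# Toy illustration of compute-driven search (not Deep Blue's engine):
# exhaustive negamax over tic-tac-toe. No heuristics, no human strategy,
# just the rules of the game plus enumeration of the whole tree.

from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def negamax(board, player):
    """Best achievable score for `player` to move: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    if "." not in board:
        return 0  # draw
    opponent = "O" if player == "X" else "X"
    best = -2
    for i, cell in enumerate(board):
        if cell == ".":
            child = board[:i] + player + board[i + 1:]
            best = max(best, -negamax(child, opponent))
    return best

if __name__ == "__main__":
    # From the empty board, perfect play is a draw, so this prints 0.
    print(negamax("." * 9, "X"))
```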
Learning: Not learning the rules we think are important, but discovering patterns we never knew existed. GPT models learned language without being explicitly taught grammar rules, syntax trees, or semantic frameworks. They absorbed these patterns from raw text at scale - finding statistical regularities that capture language better than our hand-crafted rules ever did.
The spiders in Tchaikovsky's novel domesticated ant colonies and turned them into computers. They bred ants for processing power, programmed them with pheromones, and created living data centers. And they did all this without ever discovering silicon or even fire.
The Compute Revolution
Basically, when you can throw a trillion parameters at a problem, human-designed features aren't relevant anymore.
The bitter lesson isn't that human knowledge is worthless. It's that encoding human knowledge into systems prevents them from discovering better solutions.
Modern language models don't parse sentences the way we taught computers to parse them for decades. Instead of syntax trees and grammatical rules, they found high-dimensional vector spaces where "king - man + woman = queen" makes sense. Image recognition networks don't look for edges and shapes the way we assumed vision worked. They found hierarchical feature detectors that see textures we don't have names for, patterns that exist between patterns. And increasingly, they're finding solutions in spaces with thousands of dimensions that we can't even visualize.
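As a toy illustration of that vector-space claim, here is a sketch with made-up 4-dimensional embeddings (real models learn their vectors from text, in hundreds of dimensions, and the analogy only holds approximately there). If "royalty" and "gender" end up as roughly linear directions, then king - man + woman lands nearest to queen.

```python
# Hypothetical hand-made embeddings, purely for illustration. Real word
# vectors are learned from raw text; these just show the geometry of the
# "king - man + woman ≈ queen" analogy.

import numpy as np

embeddings = {
    # hypothetical dims: [royalty, gender, person-ness, noise]
    "king":  np.array([0.9,  0.8, 0.7, 0.1]),
    "queen": np.array([0.9, -0.8, 0.7, 0.0]),
    "man":   np.array([0.1,  0.8, 0.9, 0.2]),
    "woman": np.array([0.1, -0.8, 0.9, 0.1]),
    "apple": np.array([0.0,  0.0, 0.0, 0.9]),
}

def nearest(vec, exclude=()):
    """Return the vocabulary word with highest cosine similarity to vec."""
    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in embeddings if w not in exclude),
               key=lambda w: cosine(embeddings[w], vec))

analogy = embeddings["king"] - embeddings["man"] + embeddings["woman"]
print(nearest(analogy, exclude={"king", "man", "woman"}))  # -> "queen"
```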
The Lesson?
- If you're an aspiring AI researcher, read "The Bitter Lesson" if you haven't already.
- If you like to travel and have never been to Serbia, plan a trip. It's wonderful in every season; I'd recommend winter.


Thanks to that travel guide in Novi Sad for the book recommendation, and to Sherjil for the conversation about computation and scale that helped connect these ideas.