13 Comments
Greg:

The God Machine is on its way

Devadatta:

Better not overfeed it with orichalcum. Remember Plato's tenfold error!

Nick Q.:

Really appreciate this kind of longitudinal benchmarking—especially when the same test is repeated over time. That consistency is rare and valuable.

That said, it’s increasingly important to ask what we’re actually measuring as models start outperforming humans on IQ tests.

We’re still rewarding fluency under constraint (token-level pattern extension), not cognitive flexibility or causal reasoning. Rising scores may reflect emergent capabilities, but they also reveal growing prompt sensitivity and structural scaffolding around task framing.

I’ve been exploring a modular prompt framework (Lexome) to help separate prompt design from true model capability. Benchmarks like these are great, but we also need tests that hold structure constant to see what’s really improving.

Would love to hear from others thinking about uncertainty-aware evaluation or cognitive framing.

Martin Greenwald, M.D.:

There's a medical reasoning case I came up with that I've been giving to successive GPTs (and Claude and Gemini once in a while) to see how their reasoning abilities develop. o3 is clearly a big deal and blew the others out of the water. We're in for some interesting times.

Bob Rodrigues:

I enjoy your site. One comment on its design: change the color of the text now displayed in light yellow, which makes visualization difficult on most devices.

Maxim Lott:

Thanks. Are you referring to Maximum Truth, or TrackingAI.org, or both?

Russell Huang:

Hi Maxim, another question: have you (further) updated your personal p(doom) recently?

Auspicious:

Awesome article as usual. I'm always looking forward to your updates on the IQs of AI models, and I can't wait for your next post about AI vision progress. I think it's only a matter of time before tech like self-driving cars and smart glasses really takes off. The fundamentals are already here.

Tom (Apr 20, edited):

I might argue with your conclusion. Are you still submitting the IQ tests as text descriptions? I would be interested in your take on this article:

https://adamkarvonen.github.io/machine_learning/2025/04/13/llm-manufacturing-eval.html

All his general points and detailed analysis seem reasonable to me. (One of my hobbies is machining small parts.) Like the author, a large part of my intelligence is spatial reasoning and visualization, not serial processing (verbal, coding, music, etc.).

I grant that AI is hugely successful with 1-D problems. But I think it’s infantile in the 3-D world.

Nevermind:

So what now then?

Isaac King:

> AIs don’t feel things

Citation needed

Maxim Lott:

We know where human feelings come from (e.g., we know the sensations that dopamine leads to), and we know that we didn't program that into LLMs. Feelings are specific artifacts of biological evolution and needs.

mrmr:

LLMs don't have feelings in the same way humans do, but does that rule out LLMs having evolved their own analogue of feelings under training pressure? I don't think we can rule that out.
