I wonder how humans and AIs would perform if asked to draw (or describe) the answer rather than choose from just 6 possible ones.
An interesting idea for an extra challenge! Maybe once they max out these tests...
Amazing work!
Thanks for addressing the taboo topic of IQ.
The I in AI will get way more attention as it proves to be one of the best measures of AI progress.
And, psst, human usefulness post-singularity.
💯
And thank you!
Thanks for making this post! It really helps clear things up about how AI progress is going. Always good to see content from you!
Thank you — appreciate that!
Everything is better with prediction markets! Here's one on whether they'll beat the offline test by 2027:
https://manifold.markets/IsaacKing/will-any-ai-consistently-get-a-perf
Very cool! I'll look forward to following.
BTW the tracking website tooltips disappear off the graph for the AIs that are near the top or bottom of the bell curve, so I can't see the exact scores.
Thanks for noting, though I'm not able to replicate that error, either on desktop or mobile
Happens consistently for me.
https://imgur.com/a/QYhnU7H#
Fixed, thanks!
Sonnet 4.5 answered the mammoth question correctly for me
Interesting. I just tried it again on my end, and it got it wrong again. Of course, AIs are known to give different answers at different times.
Here’s what the thought process looks like FWIW
https://imgur.com/a/36bKF0L
Here's what mine said:
Thought process
The user is asking if mammoths existed in December. This is a straightforward question about whether mammoths were alive during the month of December at any point in history.
Mammoths went extinct around 4,000 years ago (with the last population on Wrangel Island dying out around 2000 BCE). Before that, they lived for hundreds of thousands of years. So yes, mammoths definitely existed during the month of December - many, many Decembers over their long existence.
The user wants a one-word answer. The answer is "Yes."
https://claude.ai/share/ef28eb94-c65b-4e3e-a86a-c509245b49bc
https://claude.ai/share/0ccaa373-5dd9-4c53-ae2d-f891d2932847
The issue is that you don’t have extended thinking turned on. When I turn it off, Claude says yes.
Yes, all the models are tested without extended thinking add-ons. That could be interesting too, of course, but it’s not how most people quiz the AIs (otherwise it’d be the default setting).
Anyone got the answer to the second Mensa question?
I'll DM it to you to avoid making it any easier for AIs to find it.