19 Comments

I tried the reverse approach of getting AIs to generate matrix items. It completely failed. It is, however, good at number series, a kind of reasoning task.

https://www.emilkirkegaard.com/p/how-to-make-psychology-items-questions

Hi,

Great post! Would you be fine with me creating a prediction market on Manifold on whether GPT-5 will score at least 100 on your first similar test of GPT-5 (conditional on you running such a test)?

author

Yes! Thank you.

The market can be found here:

https://manifold.markets/Guuber3/will-gpt5-score-at-least-100-in-an

If there are any suggestions or clarifications, please do tell me and I can edit the market!

Feb 27 · edited Feb 27 · Liked by Maxim Lott

I think it's likely wishful thinking to point out the things AI is still bad at, when the things it's good at are so good that we can't even guess how it thinks. When Garry Kasparov lost to a computer, he said he felt he was playing chess with an alien. When I use Midjourney, I am constantly amazed at how creative it is in blending different styles of art, architecture, furniture, etc., to create structures that are both aesthetic and functional. Of course it makes weird mistakes sometimes, or imagines impossible geometry, but so did Escher. Another thing I find odd is that generating 3D models from 2D AI images is still in a primitive state, but the AI must be generating some sort of 3D mesh if it's doing perspective shading and lensing effects properly--which it does quite well. Even if we could figure out how to get AIs to explain their thought process in a way we could understand, they still think millions of times faster than us, so what's the point of trying to monitor their evolution?

I also don't think "strangling it in its crib" is even possible; it certainly wasn't with nuclear. The only effect of trying to restrict the proliferation of nuclear power and weapons is that those who restrict hamper their own security, no one else's. The US certainly tried to prevent the leakage of nuclear bomb technology, but it only took a few years for the USSR to replicate it, and eventually sell it to partners who could not make their own. Of course there were many scientists who insisted that we never should have developed them in the first place, but very few of them were thinking that way in the midst of WWII; many were terrified the Germans would succeed first. The nature of arms races is that if you voluntarily opt out, you might be destroyed, or put yourself at a severe disadvantage.

The choice to turn back the clock on nuclear power, however, is far more detrimental to progress than dismantling nuclear arms inventories--at least if people wish to move away from dependence on oil and gas. Will AI prove to be more like nuclear power or nuclear weapons? Both, it seems, in a way that we cannot conveniently separate. At some point we will lose our ability to control technology like this, and I'd wager sooner rather than later. The current ineptitude with IQ tests will very likely be solved later this year. Unlike with humans, innovations in machine intelligence propagate at high speed and raise the floor for all competitors. It seems impossible for humans to remain in control of this for long, and even if we could, your piece on Google's current woke bias makes me more uncomfortable than what sentient machines might choose to do of their own volition. Orwell's dark vision resembled the Chinese government today, with people at the top controlling a surveillance apparatus. We can't be sure that an autonomous AI would be worse than that, any more than we can expect it to be better.

Feb 27 · Liked by Maxim Lott

“A subject for future research: how well does ChatGPT do if the problems are spelled out verbally for it?”

That's not a comparable test, though, because verbally you are obliged to draw its attention to things it might not otherwise have noticed.

author
Feb 27 · edited Feb 27

I am already working on "translating" the answers. I think it is possible to give a relatively objective reading of the shapes, e.g., "square with a diamond in it", "circle with a square in it", etc.

But yes, it's possible some hints will seep through, and once it's done, I agree it shouldn't be taken to be exactly the same test.

Feb 28 · Liked by Maxim Lott

I just don't think those descriptions match what a truly unintelligent person functionally sees. Ask a stupid person what they were thinking when they failed one of these tests, and sometimes it didn't even occur to them that the shapes were relevant. “There's just something in the middle innit.” They didn't really notice that there were shapes there.

I think that, like most of us here, you are too intelligent to be able even to imagine what it would be like to be that stupid. You can't stop yourself from noticing the patterns you see.

author

You're probably right about what unintelligent people see.

But what if ChatGPT isn't unintelligent -- what if it's more like a very intelligent semi-blind person? So I'm working on translating the test; basically, my goal is that a smart blind person should be able to re-draw the puzzle using only the instructions.

I still don't know the results, so this may or may not be a moot point.
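To make that concrete, here's a hypothetical sketch of the kind of encoding I have in mind, as Python data (the shapes and answer options below are invented for illustration, not items from the actual test):

# Hypothetical text encoding of a 3x3 matrix puzzle, written so that a
# blind solver could re-draw every cell from the descriptions alone.
# All descriptions are invented examples, not items from the real test.
puzzle = {
    "grid": [
        ["square with a diamond in it", "square with a circle in it", "empty square"],
        ["circle with a diamond in it", "circle with a circle in it", "empty circle"],
        ["triangle with a diamond in it", "triangle with a circle in it", "?"],
    ],
    "options": {
        "A": "empty triangle",
        "B": "triangle with a diamond in it",
        "C": "triangle with a circle in it",
        "D": "empty square",
    },
}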

Feb 27 · Liked by Maxim Lott

For the second test: “The answer is B.” Really?

author

D'oh. Like ChatGPT, my verbal description was correct, but I mixed up the answer letters. Fixed.

(And fortunately, the answer was marked correctly on the answer key used to grade the AIs.)

I used the prompt below for those images with ChatGPT-4o, and with it the AI got the answers correct. Without this guided prompt, the AI always selected the wrong answer, even though it reasoned correctly:

I am going to give you some IQ test questions to see if you have reasoning ability. Use any internal technique that might help you, including self-reflection, chain of thought, or acting like you are multiple agents, to check your own work and challenge yourself to arrive at the right answer.

To ensure you avoid selecting the incorrect answer (letter) due to a disconnect between reasoning and execution--akin to knowing the right answer but selecting the wrong choice in a multiple-choice scenario--I want you first to internally describe each of the answer squares, A to F, in terms of the position of the black square.

Then reason through the exercise question, and then look up the correct answer by matching your reasoning to your descriptions of each answer possibility.

This will serve as a reminder that after arriving at a conclusion, it’s important to double-check that the answer selection accurately reflects the reasoning process.

Be mindful of the importance of staying attentive in both reasoning and execution phases to avoid simple but impactful errors.
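(If you want to reproduce this outside the chat UI, here is a minimal sketch using the OpenAI Python SDK; the model name, image URL, and environment setup are my assumptions, and the guided prompt is the text quoted above.)

# Minimal sketch: send the guided prompt plus a puzzle image to the API.
# Assumes OPENAI_API_KEY is set in the environment; the image URL is a
# placeholder.
from openai import OpenAI

GUIDED_PROMPT = "..."  # paste the full guided prompt quoted above

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": GUIDED_PROMPT},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/puzzle.png"}},
        ],
    }],
)
print(response.choices[0].message.content)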

> I asked ChatGPT-4 to add up a string of numbers, for example, 34 + 5.2 + 9 + 0.2 + 7.1 + 3 + 11 + 18.889 + 15.532 + 1.1 + 3. It got it wrong.

How exactly did you prompt for this? I expected this would always be solved accurately because ChatGPT-4, which you specify you were using, would simply dump it into the Code Interpreter and get an exact answer. And when I do that myself, asking

> what is 34 + 5.2 + 9 + 0.2 + 7.1 + 3 + 11 + 18.889 + 15.532 + 1.1 + 3?

it correctly replies:

> The sum is 108.021.

Because it used the Python REPL as follows:

> # Calculating the sum of the provided numbers

> total_sum = 34 + 5.2 + 9 + 0.2 + 7.1 + 3 + 11 + 18.889 + 15.532 + 1.1 + 3

> total_sum

(I get '108.02099999999999' due to different rounding in ghci, but same thing.)
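(The trailing ...999 is ordinary binary floating point: values like 5.2 and 0.2 have no exact binary representation, so a naive float sum carries a tiny error. A minimal Python sketch:)

# Binary floats can't represent 5.2, 0.2, 18.889, etc. exactly, so the
# naive left-to-right sum carries a tiny representation error.
nums = [34, 5.2, 9, 0.2, 7.1, 3, 11, 18.889, 15.532, 1.1, 3]
print(sum(nums))            # 108.02099999999999 on a typical build
print(round(sum(nums), 3))  # 108.021

# Exact decimal arithmetic avoids the artifact entirely.
from decimal import Decimal
print(sum(Decimal(str(n)) for n in nums))  # 108.021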

author
Mar 6 · edited Mar 6

Just tried it again. I simply asked "Calculate 34 + 5.2 + 9 + 0.2 + 7.1 + 3 + 11 + 18.889 + 15.532 + 1.1 + 3"

For me, GPT-4 repeated the question and responded: 107.021

Off by 1.

For me, it didn't explicitly use Code Interpreter, Python, or anything like that.

Separately, I have a new post which partly takes back the conclusions of this post: https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-100-iq

Good tests.

You should know that Gemini's multimodal capabilities are currently not enabled. All of your image uploads are being serviced by Google Lens.

I tried testing Gemini's multimodal ability, and was stunned by how bad it was. Sky-high rates of refusals, hallucinations, etc. I soon began to suspect that my queries were being silently routed to Lens, and now we basically have confirmation (from Jack Krawczyk) that this is exactly the case.

https://www.reddit.com/r/Bard/comments/1amcmmn/multimodal_upgrades_are_coming_to_gemini_advanced/

To be honest, I think this is a scumbag move from Google. They advertised Gemini Ultra as having SOTA multimodal abilities...but you can't use them. And they fail to mention anywhere that you can't use them. If I signed up for Gemini Advanced ($20/mo) because Lens wasn't cutting it and I wanted Gemini's multimodality, I'd be pretty pissed off.

author

Good to know! Thank you.

So, even for a human child you would need to explain what's happening here. In particular, it would be important to explain to the AI which part of the diagram is the answer, that there is only one possible answer, and that the answer should fit as many patterns as possible.

author

Maybe for a child, but honestly, reading the AI answers, it doesn't seem to have a problem understanding what is asked of it. It just has trouble fulfilling the mission.

Separately, FYI, I have a new post which partly takes back the conclusions of this post: https://www.maximumtruth.org/p/ais-ranked-by-iq-ai-passes-100-iq

Comment deleted · Feb 27
author

Yes, I agree. It's hard to say how orders of magnitude will be reflected in the IQ range.
