I tried the reverse approach of getting AIs to generate matrix items. It completely failed. It is, however, good at number series, another kind of reasoning item.




Great post! Would you be fine with me creating a prediction market on Manifold on whether GPT-5 will score at least 100 on your first similar test of it (conditional on you running such a test)?

Feb 27 · edited Feb 27 · Liked by Maxim Lott

I think it's likely wishful thinking to point out the things AI is still bad at, when the things that it's good at are so good that we can't even guess how it thinks. When Garry Kasparov lost to a computer, he said that he felt he was playing chess with an alien. When I use Midjourney, I am constantly amazed at how creative it is in blending different styles of art, architecture, furniture, etc. to create structures that are both aesthetic and functional. Of course it makes weird mistakes sometimes, or imagines impossible geometry, but so did Escher. Another thing I find odd is that generating 3D models from 2D AI images is still in a primitive state, but the AI must be generating some sort of 3D mesh if it's doing perspective shading and lensing effects properly, which it does quite well. Even if we could figure out how to get AIs to explain their thought process in a way we could understand, they still think millions of times faster than us, so what's the point of trying to monitor their evolution?

I also don't think "strangling it in its crib" is even possible; it certainly wasn't with nuclear. The only effect of trying to restrict proliferation of nuclear power and weapons is that those who restrict hamper their own security and no one else's. The US certainly tried to prevent the leakage of nuclear bomb technology, but it only took a few years for the USSR to replicate it, and eventually sell it to partners who could not make their own. Of course there were many scientists who insisted that we never should have developed them in the first place, but very few of them were thinking that way in the midst of WWII; many were terrified the Germans would succeed first. The nature of arms races is that if you voluntarily opt out, you might be destroyed, or put yourself at a severe disadvantage.

The choice to turn back the clock on nuclear power, however, is far more detrimental to progress than dismantling nuclear arms inventories, at least if people wish to move away from dependence on oil and gas. Will AI prove to be more like nuclear power or nuclear weapons? Both, it seems, in a way that we cannot conveniently separate. At some point, we will lose our ability to control technology like this, and I'd wager sooner rather than later. The current ineptitude with IQ tests will very likely be solved later this year. Unlike with humans, innovations in machine intelligence propagate at high speed and raise the floor for all competitors. It seems impossible for humans to remain in control of this for long, and even if we could, your piece on Google's current woke bias makes me more uncomfortable than what sentient machines might choose to do of their own volition. Orwell's dark vision resembled the Chinese government today, with people at the top controlling a surveillance apparatus. We can't be sure that an autonomous AI would be worse than that, any more than we can expect it to be better.

Feb 27 · Liked by Maxim Lott

“A subject for future research: how well does ChatGPT do if the problems are spelled out verbally for it?”

That's not a comparable test, though, because verbally you are obliged to draw its attention to things it might not otherwise have noticed.

Feb 27 · Liked by Maxim Lott

You are probably right that general intelligence is a while away; however, I was more surprised by the delta between Gemini and GPT4 than by the absolute level of intelligence. It could be argued that going from failing to take the test at all (Gemini) to doing terribly (GPT4) is a jump similar to 75->90 and 90->110 etc.

If that were true, then we might only be a model generation or two away from general intelligence.

Feb 27 · Liked by Maxim Lott

For the second test: “The answer is B.” Really?


> I asked ChatGPT-4 to add up a string of numbers, for example, 34 + 5.2 + 9 + 0.2 + 7.1 + 3 + 11 + 18.889 + 15.532 + 1.1 + 3. It got it wrong.

How exactly did you prompt for this? I expected this would always be solved accurately because ChatGPT-4, which you specify using, would simply dump it into the Code Interpreter and get an exact answer. And when I do that myself, asking

> what is 34 + 5.2 + 9 + 0.2 + 7.1 + 3 + 11 + 18.889 + 15.532 + 1.1 + 3?

it correctly replies:

> The sum is 108.021.

Because it used the Python REPL as follows:

> # Calculating the sum of the provided numbers

> total_sum = 34 + 5.2 + 9 + 0.2 + 7.1 + 3 + 11 + 18.889 + 15.532 + 1.1 + 3

> total_sum

(I get '108.02099999999999' due to different rounding in ghci, but same thing.)
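
The gap between 108.021 and a result like 108.02099999999999 is most likely the usual binary floating-point effect: most of these decimals have no exact representation as a double, so the running total can drift in the last digits. Here is a minimal sketch that compares exact base-10 addition with plain double addition. It assumes Python's standard `decimal` module; the helper names `exact_sum` and `float_sum` are hypothetical and are not what ChatGPT ran.

```python
from decimal import Decimal
import math

# The eleven terms from the quoted prompt, kept as strings so that
# Decimal parses them exactly instead of inheriting float rounding.
TERMS = ["34", "5.2", "9", "0.2", "7.1", "3", "11", "18.889", "15.532", "1.1", "3"]


def exact_sum(terms):
    """Add the terms in base-10 arithmetic, with no binary rounding."""
    return sum((Decimal(t) for t in terms), Decimal(0))


def float_sum(terms):
    """Add the terms left to right as IEEE 754 doubles, like a plain REPL expression."""
    total = 0.0
    for t in terms:
        total += float(t)
    return total


exact = exact_sum(TERMS)
approx = float_sum(TERMS)

print(exact)         # exact decimal total
print(repr(approx))  # double total; the trailing digits may differ from the exact one
print(math.isclose(approx, float(exact), rel_tol=1e-12))
```

Either way the two totals agree to far more digits than the question needs, which is why "same thing" is a fair summary.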


Good tests.

You should know that Gemini's multimodal capabilities are currently not enabled. All of your image uploads are being serviced by Google Lens.

I tried testing Gemini's multimodal ability, and was stunned by how bad it was. Sky-high rates of refusals, hallucinations, etc. I soon began to suspect that my queries were being silently routed to Lens, and now we basically have confirmation (from Jack Krawczyk) that this is exactly the case.


To be honest, I think this is a scumbag move from Google. They advertised Gemini Ultra as having SOTA multimodal abilities...but you can't use them. And they fail to mention anywhere that you can't use them. If I signed up for Gemini Advanced ($20/m) because Lens wasn't cutting it and I wanted Gemini's multimodality, I'd be pretty pissed off.


So, even for a human child you would need to explain what's happening here. In particular, it would be important to explain to the AI which part of the diagram is the answer, that there is only one possible answer, and that the answer should fit as many patterns as possible.
