Maxim Lott:

Thanks. Agreed that this is a potential problem, though even if it is, the analysis still shows two things:

-- More modern models are dramatically better at querying the appropriate section of their "memory" (or they have more relevant memory to query).

-- The top AIs are still getting the easy questions right while missing the harder ones. I think this is some evidence that the questions are not in their training data, though it's far from proof. But what it does show conclusively is that the AIs don't have a whole answer key readily available in their memory.

I do want to get my hands on an offline-only IQ test, though. Maybe a library somewhere...

Timothy B. Lee:

The same was true of the MATH benchmark in that study I linked to before. GPT-4 went from solving ~25 percent of static MATH problems to ~12 percent of problems that were generated dynamically. I think you are measuring something real as far as differences between the various LLMs go.
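(For concreteness, the dynamic-generation idea works roughly like this: hold the problem template fixed but draw fresh numbers each run, so the exact question text is unlikely to appear verbatim in any training corpus. A minimal sketch in Python; the template, values, and `make_problem` function are illustrative assumptions, not the study's actual generator:

```python
import random

def make_problem(rng: random.Random) -> tuple[str, int]:
    """Generate a fresh arithmetic problem from a fixed template.

    Because the operands are random, the exact question string is
    unlikely to exist verbatim in a model's training data, so a
    correct answer can't come from pure string memorization.
    """
    a, b, c = rng.randint(12, 99), rng.randint(12, 99), rng.randint(2, 9)
    question = f"What is {a} * {c} + {b}?"
    answer = a * c + b
    return question, answer

rng = random.Random(0)  # fixed seed so the example is reproducible
for _ in range(3):
    q, ans = make_problem(rng)
    print(q, "->", ans)
```

Each run against a model tests the same skill with a problem string the model has almost certainly never seen.)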

But it strikes me as a huge leap from that to "simple extrapolation of current growth rates suggested that Claude-6 would get all the IQ questions right, and be smarter than just about everyone, in about 4 - 10 years." An LLM with an IQ of 150 (as measured by possibly memorized IQ tests) might be very different from a human being with an IQ of 150, because human beings have much less capacity for rote memorization than an LLM.
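(The extrapolation being quoted is presumably something like a trend line fit through the scores of successive model generations. A minimal sketch, with made-up years and scores standing in for the article's actual data:

```python
# Minimal sketch of "simple extrapolation of current growth rates".
# The years and IQ scores below are hypothetical placeholders,
# not the article's actual measurements.
years  = [2022.0, 2023.0, 2024.0]
scores = [85.0, 98.0, 112.0]

# Ordinary least-squares fit of: score = slope * year + intercept.
n = len(years)
mx, my = sum(years) / n, sum(scores) / n
slope = sum((x - mx) * (y - my) for x, y in zip(years, scores)) \
        / sum((x - mx) ** 2 for x in years)
intercept = my - slope * mx

# Project forward to when the trend line crosses a target IQ.
target = 150
print(f"{slope:.1f} points/year; trend crosses IQ {target} "
      f"around {(target - intercept) / slope:.0f}")
```

The objection above is that even if the fitted line is accurate, the quantity on its y-axis may not mean the same thing for an LLM as it does for a human.)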

Maxim Lott:

Maybe so. I’ll report back when I get an offline test.

Brian Hockenmaier:

Just want to voice how necessary this is. We need to get past the standard "this is just regurgitating training data" arguments so we can get to the meat of what to do about these new intelligences we're making.
