Discussion about this post

B.C. Kowalski

Interesting post! I found DeepSeek pretty unimpressive. This is anecdotal, of course, but my impression was that it seemed more like ChatGPT 3.5 at best. It failed at some pretty basic tasks.

Tim Duffy

This is cool. Those benchmark results from the private version are consistent with my subjective impression. I thought R1 lacked vision capability, though, so I'm a bit confused about how it's answering questions like #24. Do you intend to have R1 take your political compass test as well?
