Excellent article, thx. (I'm now wondering what link at what site brought me here, and whom to thank.)

Not to add to your to-do list, but as I understand it, you have evaluated the accuracy of predictions made by bettors vs. 538 *at 8 AM on Election Day*.

Fair enough, but it's only July! I would think the data would be available to evaluate the accuracy of both approaches one month before the election, two months out, and so on.

In other words, even if we now know that bettors will be better on the morning of Nov 5, who is likely to be more accurate today?

Thanks for your effort here. I'm off to see how hard you hit subscribers, and expect to become one.

Thank you very much! And agreed, it would be interesting to do an analysis that looks at accuracy on more than just the day of the election, too.

Excellent article. I kind of love that bettors can perform as well as a professional in the business of statistics. I'll probably be checking out both sites for some electoral news soon, so thank you for writing this!

This is a very interesting article. A few questions.

(1) With respect to your first chart ("Bettors vs. Reality"), do you have the R-squared value? My statistics days are way behind me, but I think that is the goodness-of-fit measure -- please let me know if that's wrong. Just eyeballing the scatterplot, the R-squared value looks extremely high to me. Also (and this question may not make sense statistically), do you have the R-squared value just for the 20%-80% range where the predictions are most accurate?

(2) Intuitively, the Brier score makes sense to me (I didn't work through the math), but does it vary much from just comparing the R-squared scores to one another? For your second panel ("538 vs. Reality" and "Bettors vs. Reality"), do you have the R-squared scores for the two regressions? And the R-squared scores in the 20%-80% range for each?

(3) I'm usually particularly interested in the very close races. Do you happen to have a comparison of the bettors and 538 in those races, say in the 45% to 55% range or some similar mid-range?

Thanks for doing all this work. I find it fascinating.

Brier is the equivalent of R-squared, but calculated for binary outcomes predicted using probabilities. Since it's the equivalent, there's no need for a separate R-squared.

In other words, when skinning a rabbit (evaluating binary outcome predictions) and you already have a sharp knife (Brier), there is no need for a fork (R-squared).
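Concretely, Brier is just the mean squared gap between each stated probability and the 0/1 outcome. A quick sketch in Python (the example numbers below are invented, not from the article):

```python
def brier_score(probs, outcomes):
    """Mean squared gap between forecast probability and the 0/1 outcome.
    0 is perfect; always saying 50% scores 0.25."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Four races called at 90%, and the favorite wins three of them:
print(brier_score([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))  # ≈ 0.21
```

Lower is better, and a forecaster is punished much more for a confident miss (0.9 on a loser costs 0.81) than for a hedged one.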

This is a great piece! I'm especially intrigued by the difference in 538's Brier scores depending on whether a Republican or Democrat won. It seems whatever secret sauce gives them much more accurate predictions when Democrats win should be copied over?

The trading price of a prediction market fluctuates in the days and weeks before an election. How did you condense that graph to a single number? Did you just use the trading price 24 hours before the polls close?

If so, that prediction market snapshot has an advantage over the forecasters, who are averaging in time-lagged poll data. And your analysis would not imply that a prediction market's price is reasonable in the weeks before an election, just that their last-day price (which you used in this analysis) is reasonable.

Yes, it is all morning of.

Yes, markets have the advantage you describe on election day. They have that same advantage on any other day, too! The modelers always have lagged data.

I agree it’d be interesting to look at previous time slices, too. But very labor intensive. The preprint study I linked did do that (but only for 8 races) so you may be interested in checking that out for a small time-series rather than large snapshot analysis.

Cool, thanks!

By the way, this is Marshall from SHHS. ;-). I've enjoyed reading your substack and following your betting odds website ever since I saw your name pop up on statestar.

Oh very cool! Hope you are doing well these days.

I would guess people making bets in prediction markets likely consider 538's forecast when deciding (whereas 538's forecast doesn't take prediction markets into account I assume), so prediction markets' success could in theory be derived from 538's success. Something that might be worth looking at is how often prediction markets agree with 538. Also I'd be curious what the Brier scores look like for only cases where 538 and betting markets disagree, which might tell more about whether prediction markets are accurate independently of 538.
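To make the disagreement idea concrete, here's a rough sketch of the filter I mean: keep only the races where the two sources back different favorites, then score each on that subset. All the probabilities below are made up for illustration; nothing comes from the actual dataset.

```python
def brier(pairs):
    """Brier score over (probability, 0/1 outcome) pairs."""
    return sum((p - won) ** 2 for p, won in pairs) / len(pairs)

# (market_prob, fte_prob, won) per race -- invented numbers
races = [(0.60, 0.45, 1), (0.30, 0.55, 0), (0.80, 0.75, 1), (0.40, 0.35, 0)]

# Keep only races where the two sources back different favorites
disagree = [(m, f, w) for m, f, w in races if (m > 0.5) != (f > 0.5)]

market_brier = brier([(m, w) for m, f, w in disagree])
fte_brier = brier([(f, w) for m, f, w in disagree])
print(market_brier, fte_brier)
```

Whichever source scores lower on that disagreement subset has some predictive signal the other lacks, which is exactly what the full-sample comparison can't show if bettors are just echoing 538.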

These results didn't gel with my memory of the 2020 election, so I had a quick look at your data to tease out what was going on. It seems the vast majority of the market's outperformance in 2020 was during the primaries (0.018 vs 0.077) rather than the general (0.052 vs 0.051). If we ignore primaries, 538 outperformed the markets in every election.
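For what it's worth, that split is easy to reproduce once the data is in a table. A sketch in pandas, with invented numbers and column names (the real spreadsheet's layout will differ):

```python
import pandas as pd

# Invented example rows -- just to show the shape of the split
df = pd.DataFrame({
    "race_type":   ["primary", "primary", "general", "general"],
    "market_prob": [0.95, 0.80, 0.60, 0.70],  # market's probability for candidate X
    "fte_prob":    [0.70, 0.65, 0.62, 0.72],  # 538's probability for candidate X
    "won":         [1, 1, 1, 0],              # did candidate X win?
})

# Brier score per source, split by race type
results = {}
for source in ["market_prob", "fte_prob"]:
    sq_err = (df[source] - df["won"]) ** 2
    results[source] = sq_err.groupby(df["race_type"]).mean().to_dict()

print(results)
```

Adding a "year" column to the groupby gives the full year-by-race-type breakdown in one shot.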

Thanks. I've added a chart with all the different categories, broken down by both year and race type (including breaking out Senate/governor races). You can search for "Added, Dec 6, 2022" to get to it.

Konrad -

Thank you, but there is also value in familiarity and simplicity. When I was trying cases, I loved to use R-squared values because it made it simple for the judge to understand how little of the variance the other side's model was able to explain.

If there's a way to use Brier for the same thing, that would work too.