Tuesday, November 29, 2016

Statistics and current vote in the US

Thomas Bayes

What Do Non-Statisticians Need To Know About Bayesian Statistics?

Bayes gets a lot of buzz these days, so I asked Columbia University Professor Andrew Gelman what he thinks are the most important things non-statisticians need to know about Bayesian statistics. I also asked him if there are things we need to be especially careful about when using these methods. Here are his replies.

You have to learn by doing, and one place to start is to look at some particular problem. One example that interested me recently was a website constructed by the sociologist Pierre-Antoine Kremp, who used the open-source statistics language R and the open-source Bayesian inference engine Stan (named after Stanislaw Ulam, the inventor of the Monte Carlo simulation method) to combine U.S. national and state polls to make daily forecasts of the U.S. presidential election.

In an article for Slate, I called this “the open-source poll aggregator that will put all other poll aggregators out of business” because ultimately you can’t beat the positive network effects of free and open-source software - the more people who see this model, play with it, and probe its weaknesses, the better it can become. The Bayesian formalism allows a direct integration of data from different sorts of polls in the context of a time-series prediction model.

You ask if there are things we need to be especially careful about. As a famous cartoon character once said, “With great power comes great responsibility.” Bayesian inference is powerful in the sense that it allows the sophisticated combination of information from multiple sources via partial pooling (that is, local inferences are constructed in part from local information and in part from models fit to non-local data), but the flip side is that when assumptions are very wrong, conclusions can be far off too.
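The partial-pooling idea described above can be sketched in a few lines. This is a minimal illustration, not Kremp's actual model: the poll numbers, the state labels, and the pseudo-sample constant `k` are all made up for the example. Each state's estimate is shrunk toward the overall mean, with small polls shrunk more than large ones.

```python
# Hypothetical poll results: (observed support share, sample size) per state.
# All numbers are invented for illustration.
polls = {
    "A": (0.52, 200),
    "B": (0.48, 50),
    "C": (0.55, 1000),
}

# Grand mean across states: the "non-local" information.
grand_mean = sum(share for share, _ in polls.values()) / len(polls)

# Partial pooling: shrink each state's estimate toward the grand mean.
# The shrinkage weight depends on the poll's sample size relative to an
# assumed prior "pseudo-sample" of size k (a tuning constant chosen here
# just to make the effect visible).
k = 150
pooled = {}
for state, (share, n) in polls.items():
    w = n / (n + k)  # more local data -> trust the local poll more
    pooled[state] = w * share + (1 - w) * grand_mean
```

The small 50-person poll in state B gets pulled strongly toward the grand mean, while the 1000-person poll in state C barely moves; that asymmetry is the essence of partial pooling.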

That’s why Bayesian methods need to be continually evaluated with calibration checks, comparisons of observed data to simulated replications under the model, and other exercises that give the model an opportunity to fail. Statistical model-building, but maybe especially in its Bayesian form, is an ongoing process of feedback and quality control.

A statistical procedure is a sort of machine that can run for a while on its own, but eventually needs maintenance and adaptation to new conditions. That’s what we’ve seen in the recent replication crisis in psychology and other social sciences - methods of null hypothesis significance testing and p-values, which had been developed for analysis of certain designed experiments in the 1930s, were no longer working in modern settings of noisy data and uncontrolled studies.

Savvy observers had realized this for a while - psychologist Paul Meehl was writing acerbically about statistically-driven pseudoscience as early as the 1960s - but it took a while for researchers in many professions to catch on. I’m hoping that Bayesian modelers will be quicker to recognize their dead ends, and in my own research I’ve put a lot of effort into developing methods for checking model fit and evaluating predictions.

________________________________________________________________

Kevin Gray is President of Cannon Gray, a marketing science and analytics consultancy.

Lifted from LinkedIn

Bayesian probability is an interpretation of the concept of probability used in Bayesian theory. Probability is defined as a degree of belief in the truth of a proposition. To update that degree of belief when new information arrives, Bayesian theory uses Bayes' theorem.
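The update rule described above is just Bayes' theorem, P(H|D) = P(D|H)·P(H) / P(D). A minimal worked example with invented numbers: suppose we believe a candidate's lead is real with prior probability 0.6, and a new poll showing a lead would occur with probability 0.8 if the lead is real and 0.3 if it is not.

```python
# Bayes' theorem: P(H|D) = P(D|H) * P(H) / P(D).
# All probabilities below are assumed values for illustration.
prior = 0.6                 # P(H): lead is real, before seeing the poll
likelihood_if_true = 0.8    # P(D|H): poll shows a lead, given a real lead
likelihood_if_false = 0.3   # P(D|not H): poll shows a lead anyway

# P(D): total probability of seeing the poll result.
evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)

# P(H|D): belief in the lead after seeing the poll.
posterior = likelihood_if_true * prior / evidence
# posterior = 0.48 / 0.60 = 0.8
```

Seeing the favorable poll raises the degree of belief from 0.6 to 0.8; that revision of belief in light of data is exactly what the paragraph above describes.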
