Friday, November 6, 2015

P value and revalue

The P-Value controversy - What's a practitioner to do?

The recent controversy over p-values has left many of us who work with data wondering what to do. Should we abandon p-values altogether and switch to reporting confidence intervals and effect sizes instead? Or should we go back to basics and make sure we fully understand what p-values mean and how they should actually be applied?

There are valid arguments on both sides of the p-value divide. Assuming we still place some faith in p-values, how do we re-calibrate our approach to using them?

First, we need some of the history behind p-values in order to get proper context. As Regina Nuzzo has clarified, the concept of the p-value was introduced in the 1920s by Fisher "simply as an informal way to judge whether evidence was significant in the old-fashioned sense: worthy of a second look". Over the years, however, p-values became the "bottom line" of a study (to borrow terminology employed by Steven Novella): the end of the road rather than a promising beginning.

The notion of p-value as the "bottom line" for a study is interesting because it forces us to think about what needs to happen both before and after we draw that line.

Before we draw the "bottom line" for a study, we must remember that the p-value is itself an estimate, so its reliability depends on a variety of factors: an adequate study design, an appropriately selected sample, validated and clean data collected from that sample, a statistical analysis appropriate to the research question of interest, a sufficiently large sample size, and so on. This is what prompted Simonsohn to advise scientists to be transparent and "admit everything": how they determined their sample size, all data exclusions (if any), all data manipulations and all outcome measures used in the study. All of this information provides the context others need to judge whether or not they can trust the p-values reported in scientific publications.

After we draw the "bottom line" for a study, we need to bear in mind that the p-value requires us to make a decision and draw a conclusion: "Based on a p-value of 0.001 for our test, we decide to reject the null hypothesis Ho in favour of the alternative hypothesis Ha and we conclude that the data in our study provide strong evidence against the null hypothesis Ho." If we remember to focus on ourselves as agents responsible for making decisions and drawing conclusions based on evidence provided by the data, we will avoid the trap of believing we can play God and verify which of the null and alternative hypotheses is true. In 1925, Fisher himself claimed that the p-value indicates the strength of evidence against the null hypothesis. Years later, in 1955, he further claimed that "significance tests can falsify but never verify hypotheses".
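The decision step described above can be made concrete with a small sketch. Nothing below comes from the post itself; it is a hypothetical two-sided one-sample z-test in Python, with the conventional (but arbitrary) significance level of 0.05:

```python
import math

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z-test; returns (z, p-value).

    Assumes the population standard deviation sigma is known,
    which is rarely true in practice -- this is only a sketch.
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical data: observed mean 103 against H0: mu = 100,
# with sigma = 15 and n = 100 observations.
z, p = one_sample_z_test(103, 100, 15, 100)

# The p-value does not tell us which hypothesis is true; it only
# informs a decision that we, the analysts, are responsible for.
alpha = 0.05
decision = "reject H0" if p < alpha else "fail to reject H0"
```

Here z = 2.0 and the p-value comes out around 0.046, so at the 0.05 level we would reject H0: a decision we make, not a truth we have verified.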

If our data provide strong evidence against the null hypothesis Ho, we are in that promising situation where our findings are worthy of a second look. Something interesting may be going on, and the only way to know whether this is the case is to try to replicate the findings of our study. We may be able to replicate these findings ourselves by conducting a second study, or, more likely, others will be intrigued by our findings and proceed to conduct similar studies. In the latter situation, it is imperative for us to make it easy for others to replicate our study by adopting the good practices advocated by proponents of reproducible research.

If our data fail to provide strong evidence against the null hypothesis Ho, we need to reflect on what may be at play (e.g., a sample size that was too small, a study design that was inadequate, a research question that needs to be refined, an outcome that needs to be reformulated, a fruitless research direction that needs to be abandoned).

No study should be interpreted in isolation, just as no number (p-value included) should be interpreted in isolation.


Isabella Ghement
The link above leads to a debate on LinkedIn.


  1. They explain at length that "research is a complicated business, so it cannot be reduced to one simple rule". Everyone already understands that. Moreover, they call something a "controversy" that is no controversy at all: there is no group of statisticians claiming that reducing everything to a single bit is the right approach.
    The real question is what to do about teaching students (potential researchers). And here the good old rule applies (vividly pitched in the musical Chicago): "remember, we can only sell them one idea at a time". That is how the p-value arose (and, in many ways, outgrew its usefulness). As a result, for some twenty years now, if not longer, universities have been drilling confidence intervals into students' heads with particular intensity (in the hope that students will pick up p-values on their own if they ever have to read the literature). And journals actively promote the idea that negative results should also be published.

  2. P-values came under attack at the beginning of the year, but I never understood what for or why.
    In demography, neither of the two is used much in practice, except perhaps where very small populations are concerned.

    1. Actually, the overuse of p-values is a long-standing topic, at least in the statistical community (which has long since spread into adjacent disciplines such as epidemiology and economics).
      One side of the problem is the low informativeness of this measure: as the sample grows, any tiny deviation becomes statistically significant. The other is that non-significant results are less likely to be published (which introduces a systematic bias into the body of observations).
      Maybe the topic is simply not relevant for demography?
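The first point in the comment above, that with a large enough sample even a negligible deviation becomes statistically significant, is easy to demonstrate numerically. This is an illustrative sketch of mine (not the commenter's), using a two-sided z-test and an assumed true effect of 0.01 standard deviations:

```python
import math

def p_value(effect, sigma, n):
    """Two-sided z-test p-value when the observed effect equals `effect`."""
    z = effect / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

# A practically negligible effect: one hundredth of a standard deviation.
effect, sigma = 0.01, 1.0

# The same tiny effect goes from "noise" to "highly significant"
# purely because the sample grows.
ps = {n: p_value(effect, sigma, n) for n in (1_000, 100_000, 10_000_000)}
```

At n = 1,000 the p-value is roughly 0.75; at n = 100,000 it drops below 0.01; at n = 10,000,000 it is essentially zero, even though the effect itself never changed.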

    2. In the classic case, yes: for example, the US population mortality rate has 300 million in the denominator. But
      some people try to compute life expectancy for a small village, and for that you have to divide the deaths at each age by the people living at each age, and not every age is represented.
      + the influx of sociology adds extra characteristics: men, white, without higher education.
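The small-population problem in the last comment can be sketched in a few lines. The numbers below are invented for illustration: age-specific death rates for a hypothetical village where one age group has no residents at all, so its rate is simply undefined:

```python
# Hypothetical village data: deaths and mid-year population by age group.
deaths = {"0-14": 0, "15-44": 0, "45-64": 2, "65+": 5}
population = {"0-14": 12, "15-44": 0, "45-64": 18, "65+": 9}

rates = {}
for age, d in deaths.items():
    n = population[age]
    # With nobody living in an age group, 0/0 is undefined: the
    # age-specific rate cannot be computed, and any life-table
    # calculation built on these rates inherits the gap.
    rates[age] = d / n if n > 0 else None
```

In a national population every age group is populated, so this never arises; in a village of a few dozen people, empty age groups are the norm.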