A bad definition of statistical significance

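Professor Andrew Gelman came across the definition of statistical significance posted by the U.S. Department of Health and Human Services and wrote a post titled "A bad definition of statistical significance." The definition as originally posted reads as follows.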

Definition: A mathematical technique to measure whether the results of a study are likely to be true. Statistical significance is calculated as the probability that an effect observed in a research study is occurring because of chance. Statistical significance is usually expressed as a P-value. The smaller the P-value, the less likely it is that the results are due to chance (and more likely that the results are true). Researchers generally believe the results are probably true if the statistical significance is a P-value less than 0.05 (p<.05).

Gelman criticizes this definition on several points and then offers his own. See his original post for the details.

Definition: A mathematical technique to measure the strength of evidence from a single study. Statistical significance is conventionally declared when the p-value is less than 0.05. The p-value is the probability of seeing a result as strong as observed or greater, under the null hypothesis (which is commonly the hypothesis that there is no effect). Thus, the smaller the p-value, the less consistent are the data with the null hypothesis under this measure.
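
To make the phrase "a result as strong as observed or greater, under the null hypothesis" concrete, here is a minimal sketch of my own (it is not from Gelman's post). It assumes a hypothetical experiment of 100 coin flips with 60 heads and a null hypothesis that the coin is fair; the function name and the numbers are made up for illustration.

```python
from math import comb


def binom_tail_p_value(n, k, p_null=0.5):
    """One-sided p-value: P(X >= k) when X ~ Binomial(n, p_null).

    This is the probability, computed under the null hypothesis,
    of a result at least as extreme as the observed count k.
    """
    return sum(
        comb(n, i) * p_null**i * (1 - p_null) ** (n - i)
        for i in range(k, n + 1)
    )


# Hypothetical data: 60 heads in 100 flips (an assumption for illustration).
n_flips, n_heads = 100, 60

p_one_sided = binom_tail_p_value(n_flips, n_heads)

# For the symmetric null p = 0.5, counting deviations in both directions
# simply doubles the one-sided tail (capped at 1).
p_two_sided = min(1.0, 2 * p_one_sided)

print(f"one-sided p-value: {p_one_sided:.4f}")
print(f"two-sided p-value: {p_two_sided:.4f}")
```

Note what the calculation conditions on: the fair-coin null is assumed to be true throughout, so the number it returns cannot be read as the probability that the null is true, or that the result is "due to chance." In this example the conventional 0.05 verdict also depends on whether the one-sided or the two-sided tail is used.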

My own translation of these definitions was rough, so I am recording the original wording here on the blog.