To expand on the other fella’s explanation:
In psychology especially, and some other fields, the ‘null hypothesis’ is used. That means that the researcher ‘assumes’ that there is no effect or difference in what they are measuring. If you know that the average person smiles 20 times a day, and you want to check if someone (person A) making jokes around a person (person B) all day makes person B smile more than average, you assume that there will be no change. In other words, the expected outcome is that person B will still smile 20 times a day.
The experiment is performed and data collected: in this example, how many times person B smiled during the day. Do that for a lot of people, and you have your data set. Let’s say that they discovered the average number of smiles per day was 25 during the experimental procedure. Using some fancy statistics (not really fancy, but it sure can seem like it) you calculate the probability that you would get an average of 25 or more smiles a day if the assumption held, that is, if making jokes around a person did not change the 20-per-day average. The more people you experimented on, and the larger the deviation from the assumed average, the lower that probability. If the probability is less than 5%, you say that p<0.05, and for a research experiment like the one described above, that’s probably good enough for your field to pat you on the back and tell you that the ‘null hypothesis’ of there being no effect from your independent variable (the making jokes thing) can be rejected, and you can say with reasonable confidence that making jokes will cause people to smile more, on average.
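For anyone who wants to see the ‘fancy statistics’ as arithmetic, here is a rough sketch of the smiling example as a one-sided z-test. The standard deviation of 10 smiles and the sample sizes are numbers I made up for illustration, and a real study would more likely use a t-test with the spread estimated from the data, so treat this as a toy rather than the way any particular study did it.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p_value(sample_mean, null_mean, sd, n):
    """Probability of seeing a sample mean this high or higher
    if the null hypothesis (true mean == null_mean) holds.

    Assumes a known standard deviation `sd` and a normal
    approximation, i.e. a plain z-test.
    """
    standard_error = sd / sqrt(n)          # spread of the sample mean
    z = (sample_mean - null_mean) / standard_error
    return 1.0 - NormalDist().cdf(z)       # upper-tail probability


# Hypothetical inputs: 25 observed vs. 20 assumed, sd of 10 smiles.
for n in (5, 10, 30):
    p = one_sided_p_value(25, 20, sd=10, n=n)
    verdict = "p<0.05" if p < 0.05 else "not significant"
    print(f"n={n:2d}  p={p:.4f}  {verdict}")
```

With these made-up numbers the same 5-smile difference only drops below the 5% line once enough people are in the sample, which is the ‘more people, lower probability’ point from the paragraph above.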
If you are being more rigorous, or testing multiple independent variables at once, as you might for examining different therapies or drugs, you start making your X smaller in the p<X statement. Good studies will predetermine what X they will use, so as to avoid the mistake of settling afterwards on whatever ‘good enough’ number happens to fit the data.
Good example and well explained. We should team up on a book on science for lay people!
Your point about specifying the null hypothesis and the p value is very important. Another way studies can fail is if you pick 20 different variables, like you mentioned, and then look to see if any of them give you p<0.05. So in your example, we measure smiling and 19 other factors besides being told jokes. Let’s say the weather, the day of the week, what color clothes the person is wearing, what they had for breakfast, etc. At a 5% threshold, you’d expect about one of those 20 to appear relevant purely by chance, even if none of them has any real effect. You’re essentially doing 20 experiments in one, so you’ll likely get a spurious result that you can report as “success”.
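To put a number on the 20-variables problem, here is a small sketch. It assumes the 20 tests are independent of each other, which real variables like weather and day of the week may not be. The Bonferroni correction at the end is a standard fix that I am bringing in myself; nobody in this thread named it, but it is one common way of ‘making your X smaller’.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of `num_tests` independent tests
    dips below `alpha` when none of the effects are real."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests, alpha=0.05):
    """Stricter per-test threshold that keeps the overall
    false-positive chance at or below `alpha`."""
    return alpha / num_tests


num_tests = 20
naive = chance_of_false_positive(num_tests)
strict_alpha = bonferroni_threshold(num_tests)
corrected = chance_of_false_positive(num_tests, strict_alpha)

print(f"chance of a fluke at p<0.05 each: {naive:.2f}")
print(f"Bonferroni per-test threshold:    {strict_alpha:.4f}")
print(f"chance of a fluke after fixing:   {corrected:.3f}")
```

Under the independence assumption, 20 looks at the data give roughly a two-in-three chance of at least one fluke, and dividing the threshold by the number of tests brings that back under 5%.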
Experimental design is tough and it’s hard to grok until you’ve had to design and run your own experiment including the math. That makes it easy for people to pass off bad science as legitimate, whether accidentally or on purpose. And it’s why peer review is important, where your study gets sent to another researcher in your field for critique before publication.
There are other things besides bad math that can trip you up, like correlation vs causation, and how the data is gathered. In the above example, you might try to save money by asking subjects to self-report on their smiling. But people are bad at doing that due to fallible memory and bias (did that really count as a full smile?). Ideally you want to follow them around and count yourself, with a clear definition of what counts as a smile. Or make them wear a camera that does facial recognition. But both of those cost more money than just handing someone a piece of paper and a pencil and hoping for the best. That’s why you should always be extra suspicious of studies that use self-reporting. As my social psych prof said, surveys are the worst form of data collection. It’s what makes polling hard, because what people say and what they do are often entirely different things.
I think most science books are understandable by laypersons, except those that are memorization heavy, like biochemistry, or organic chemistry, or some parts of things like microbiology and pathophysiology. Statistics books and research design were pretty understandable, except for the actual math, heh. There really needs to be a push for people to read them casually, and to be encouraged to just stick to the concept parts and ignore the math and the memorization of minor stuff. The free textbooks out there (I think OpenStax is pretty good, personally) are getting to the point where I think people might read them just for the ‘ooh’ part of science. Heck, it’s why psychology is such an enticing subject in the first place; it’s basically the degree of human interest facts.
I just thought that understanding the way the null hypothesis is used is important to really grasp what information the p is conveying.
:D And for the parts about self-reporting bias, and definitions and such, I was really, really having to hold myself back from talking about what makes your variables independent or dependent, operational definitions, ANOVA and MANOVA and t-tables and Cohen’s d and the emphasis these days being not on p but on the error bars, and all the other lovely goodies. This stuff really brings me back, eh? ;)