How many subjects should there be in an economic experiment? One answer to that question would be to draw on power rules for statistical significance. In short, you need enough subjects to be able to reasonably reject the null hypothesis you are testing. This approach, though, has never really been standard in experimental economics. There are two basic reasons for this - practical and theoretical.
From a practical point of view the power rules may end up suggesting you need a lot of subjects. Suppose, for instance, you want to test cooperation within groups of 5 people. Then the unit of observation is the group. So, you need 5 subjects for 1 data point. Let's suppose that you determine you need 30 observations for sufficient power (which is a relatively low estimate). That is 30 x 5 = 150 subjects per treatment. If you want to compare 4 treatments that means 600 subjects. This is a lot of money (at least $10,000) and also a lot of subjects to recruit to a lab. In simple terms, it is not going to happen.
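To make the arithmetic explicit, here is a minimal sketch of the calculation. The function name `subjects_needed` is made up for illustration, and the per-subject figure at the end is simply what the $10,000 lower bound implies when spread over 600 subjects. It is not a quoted payment rate.

```python
def subjects_needed(group_size, observations_per_treatment, treatments):
    """Return (subjects per treatment, total subjects) when the group is the unit of observation."""
    per_treatment = group_size * observations_per_treatment
    total = per_treatment * treatments
    return per_treatment, total


# Numbers from the example above: groups of 5, 30 observations, 4 treatments.
per_treatment, total = subjects_needed(group_size=5,
                                       observations_per_treatment=30,
                                       treatments=4)
print(per_treatment, total)  # 150 600

# Implied average cost per subject if the whole experiment costs $10,000.
min_budget = 10_000
print(round(min_budget / total, 2))  # 16.67
```

Because the group is the unit of observation, the total scales with the group size: groups of 5 need five times as many subjects as an individual-choice experiment with the same number of observations.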
That may appear to be sloppy science but there is a valid get-out clause. Most of experimental economics is about testing a theoretical model. This allows for a Bayesian mindset in which you have a prior belief about the validity of the theory and the experimental data allows you to update that belief. The more subjects and observations you have the more opportunity to update your beliefs. But even a small number of subjects is useful in updating your beliefs. Indeed, some of the classic papers in experimental and behavioral economics have remarkably few subjects. For instance, the famous Tversky and Kahneman (1992) paper on prospect theory had only 25 subjects. That did not stop the paper becoming a classic.
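One way to picture the updating described above is a simple Beta-Binomial model. This is only an illustrative sketch: it assumes each independent observation either matches the theory's prediction or does not, and the prior and the counts below are hypothetical numbers, not data from any study.

```python
import math


def update_beta(prior_a, prior_b, consistent, inconsistent):
    """Posterior Beta parameters after observing binary outcomes."""
    return prior_a + consistent, prior_b + inconsistent


def beta_mean(a, b):
    return a / (a + b)


def beta_sd(a, b):
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


# Hypothetical, fairly vague prior: the theory predicts correctly about half the time.
prior = (2.0, 2.0)

# Hypothetical small and large experiments with the same 80% hit rate.
small = update_beta(*prior, consistent=8, inconsistent=2)
large = update_beta(*prior, consistent=80, inconsistent=20)

for label, (a, b) in [("prior", prior), ("10 observations", small), ("100 observations", large)]:
    print(f"{label}: mean={beta_mean(a, b):.3f}, sd={beta_sd(a, b):.3f}")
```

In this toy example ten observations already move the belief a good way from the prior and noticeably narrow it. A hundred observations narrow it further, but the small experiment is far from worthless.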
Personally I am a fan of the Bayesian mindset. This mindset doesn't, though, fit comfortably with how economic research is typically judged. What we should be doing is focusing on a body of work in which we have an accumulation of evidence, over time, for or against a particular theory. In practice research is all too often judged at the level of a single paper. That incentivizes the push towards low p values and an over-claiming of the significance of a specific experiment.
Which brings us on to the replication crisis in economics and other disciplines. A knee-jerk reaction to the crisis is to say we need ever-bigger sample sizes. But, that kind of misses the point. A particular experiment is only one data point because it is run with a specific subject pool using a specific protocol. Adding more subjects does not solve that. Instead we need replication with different subject pools under different protocols - the slow accumulation of knowledge. And we need to carefully document research protocols.
My anecdotal impression is that journal editors and referees are upping the ante on how many subjects it takes to get an experiment published (without moving things forward much in terms of documenting protocols). To put that theory to a not-at-all scientific test I have compared the papers that appeared in the journal Experimental Economics in its first year (1998) and most recent edition (March 2018). Let me emphasize that the numbers here are rough-and-ready and may well have several inaccuracies. If anyone wants to do a more scientific comparison I would be very keen to see it.
Anyway, what do we find? In 1998 the average number of subjects was 187, which includes the study of Cubitt, Starmer and Sugden where half the population of Norwich seemingly took part. In 2018 the average is 383. So, we see an increase. Indeed, only the 1998 studies of Cubitt et al. and Isaac and Walker have more subjects than the smallest study in 2018. The number of observations per treatment is also notably higher in 2018, at 46 compared to 16 in 1998. Again, those numbers are almost certainly wrong (for instance the number of independent observations in Kirchler and Palan is open to interpretation). The direction of travel, though, seems clear enough. (It is also noticeable that around half of the papers in 1998 were survey papers or papers reinterpreting old data sets. Not in 2018.)
At face value we should surely welcome an increase in the number of observations? Yes, but only if it does not come at the expense of other things. First we need to still encourage replication and the accumulation of knowledge. Experiments with a small number of subjects can still be useful. And, we also do not want to create barriers to entry. At the top labs running an experiment is relatively simple - the money, subject pool, programmers, lab assistants, expertise etc. are there and waiting. For others it is not so simple. The more constraints we impose for an experiment to count as 'well-run' the more experimental economics may potentially become 'controlled' by the big labs. If nothing else, that poses a potential problem in terms of variation in subject pool. Big is, therefore, not necessarily better.