Skip to main content

How many subjects in an economic experiment?

How many subjects should there be in an economic experiment? One answer to that question would be to draw on power rules for statistical significance. In short, you need enough subjects to be able to reasonably reject the null hypothesis you are testing. This approach, though, has never really been standard in experimental economics. There are two basic reasons for this - practical and theoretical. 

From a practical point of view the power rules may end up suggesting you need a lot of subjects. Suppose, for instance, you want to test cooperation within groups of 5 people. Then the unit of observation is the group. So, you need 5 subjects for 1 data point. Let's suppose that you determine you need 30 observations for sufficient power (which is a relatively low estimate). That is 30 x 5 = 150 subjects per treatment. If you want to compare 4 treatments that means 600 subjects. This is a lot of money (at least $10,000) and also a lot of subjects to recruit to a lab. In simple terms, it is not going to happen.  

That my appear to be sloppy science but there is a valid get-out clause. Most of experimental economics is about testing a theoretical model. This allows for a Bayesian mindset in which you have a prior belief about the validity of the theory and the experimental data allows you to update that belief. The more subjects and observations you have the more opportunity to update your beliefs. But even a small number of subjects is useful in updating your beliefs. Indeed, some of the classic papers in experimental and behavioral economics have remarkably few subjects. For instance, the famous Tversky and Kahneman (1992) paper on prospect theory had only 25 subjects. That did not stop the paper becoming a classic.

Personally I am a fan of the Bayesian mindset. This mindset doesn't, though, fit comfortably with how economic research is typically judged. What we should be doing is focusing on a body of work in which we have an accumulation of evidence, over time, for or against a particular theory. In practice research is all too often judged at the level of a single paper. That incentivizes the push towards low p values and an over-claiming of the significance of a specific experiment. 

Which brings us on to the replication crisis in economics and other disciplines. A knee-jerk reaction to the crisis is to say we need ever-bigger sample sizes. But, that kind of misses the point. A particular experiment is only one data point because it is run with a specific subject pool using a specific protocol. Adding more subjects does not solve that. Instead we need replication with different subject pools under different protocols - the slow accumulation of knowledge. And we need to carefully document research protocols.

My anecdotal impression is that journal editors and referees are upping the ante on how many subjects it takes to get an experiment published (without moving things forward much in terms of documenting protocols). To put that theory to a not-at-all scientific test I have compared the papers that appeared in the journal Experimental Economics in its first year (1998) and most recent edition (March 2018). Let me emphasize that the numbers here are rough-and-ready and may well have several inaccuracies. If anyone wants to do a more scientific comparison I would be very keen to see it. 

Anyway, what do we find? In 1998 the average number of subjects was 187, which includes the study of Cubbit, Starmer and Sugden where half the population of Norwich seemingly took part. In 2018 the average is 383. So, we see an increase. Indeed, only the studies of Cubbit et al. and Isaac and Walker are above the minimum in 2018. The number of observations per treatment are also notably higher in 2018 at 46 compared to 1998 when it was 16. Again, those numbers are almost certainly wrong (for instance the number of independent observations in Kirchler and Palan is open to interpretation). The direction of travel, though, seems clear enough. (It is also noticeable that around half of the papers in 1998 were survey papers or papers reinterpreting old data sets. Not in 2018.)  


At face value we should surely welcome an increase in the number of observations? Yes, but only if it does not come at the expense of other things. First we need to still encourage replication and the accumulation of knowledge. Experiments with a small number of subjects can still be useful. And, we also do not want to create barriers to entry. At the top labs running an experiment is relatively simple - the money, subject pool, programmers, lab assistants, expertise etc. are there and waiting. For others it is not so simple. The more constraints we impose for an experiment to count as 'well-run' the more experimental economics may potentially become 'controlled' by the big labs. If nothing else, that poses a potential problem in terms of variation in subject pool. Big is, therefore, not necessarily better. 

Comments

Popular posts from this blog

Revealed preference, WARP, SARP and GARP

The basic idea behind revealed preference is incredibly simple: we try to infer something useful about a person's preferences by observing the choices they make. The topic, however, confuses many a student and academic alike, particularly when we get on to WARP, SARP and GARP. So, let us see if we can make some sense of it all.           In trying to explain revealed preference I want to draw on a  study  by James Andreoni and John Miller published in Econometrica . They look at people's willingness to share money with another person. Specifically subjects were given questions like:  Q1. Divide 60 tokens: Hold _____ at $1 each and Pass _____ at $1 each.  In this case there were 60 tokens to split and each token was worth $1. So, for example, if they held 40 tokens and passed 20 then they would get $40 and the other person $20. Consider another question: Q2. D...

Measuring risk aversion the Holt and Laury way

Attitudes to risk are a key ingredient in most economic decision making. It is vital, therefore, that we have some understanding of the distribution of risk preferences in the population. And ideally we need a simple way of eliciting risk preferences that can be used in the lab or field. Charles Holt and Susan Laury set out one way of doing in this in their 2002 paper ' Risk aversion and incentive effects '. While plenty of other ways of measuring risk aversion have been devised over the years I think it is safe to say that the Holt and Laury approach is the most commonly used (as the near 4000 citations to their paper testifies).           The basic approach taken by Holt and Laury is to offer an individual 10 choices like those in the table below. For each of the 10 choices the individual has to go for option A or option B. Most people go for option A in choice 1. And everyone should go for option B in choice 10. At some point, therefore, we expect the...

Nash bargaining solution

Following the tragic death of John Nash in May I thought it would be good to explain some of his main contributions to game theory. Where better to start than the Nash bargaining solution. This is surely one of the most beautiful results in game theory and was completely unprecedented. All the more remarkable that Nash came up with the idea at the start of his graduate studies!          The Nash solution is a 'solution' to a two-person bargaining problem . To illustrate, suppose we have Adam and Beth bargaining over how to split some surplus. If they fail to reach agreement they get payoffs €a and €b respectively. The pair (a, b) is called the disagreement point . If they agree then they can achieve any pair of payoffs within some set F of feasible payoff points . I'll give some examples later. For the problem to be interesting we need there to be some point (A, B) in F such that A > a and B > b. In...