University of Bologna: Do Economists Try to Falsify Their Theories? Essay


Write an essay about this. Include Karl Popper, David Hume, the Ultimatum Game, and another example if you have one.


Experiments in Economics
Francesco Guala
(May 2017)
This essay combines material from two previously published papers:
• “Experiments”, in The Sage Handbook of the Philosophy of Social Sciences, edited
by Ian Jarvie and Jesus Zamora-Bonilla. London: Sage (2011), pp. 477-493;
• “Experimentation in Economics”, in Handbook of the Philosophy of Science, Vol.
13: Philosophy of Economics, edited by Uskali Mäki. Amsterdam: Elsevier (2012),
pp. 597-640.
1. Introduction
While laboratory experiments are widely considered a powerful method of investigation
in the natural sciences, the social sciences seem to be stuck at a pre-Galilean stage of
development, where the epistemic advantages and limitations of the experimental
method are still controversial. The landscape, to be sure, is variegated: whereas in some
disciplines, like economics, experimentation has recently made a breakthrough, in
others, like political science, it is slowly gaining a foothold.1 Meanwhile, laboratory
research is still considered marginal and is used only rarely by sociologists and
Accounting for these differences would require an inquiry into the history of the various
branches of the social sciences that goes well beyond the scope of this chapter. One
claim, however, is true pretty much across the board: despite its controversial status,
experimentation has always exerted an influence from outside the social sciences – i.e.,
from those disciplines that use it regularly for the generation of scientific knowledge.
The most important influence has not come from the natural sciences, though, but from
psychology.
From a methodological point of view experimental psychology had reached by the middle
of the twentieth century a high level of stability and had developed a set of tools that
could be easily exported and applied in neighbouring disciplines.2 Given the affinity of their
subject matter, it was more natural for social scientists to borrow methodological tools
from psychologists than from, say, nuclear physicists. For the same reason it was easier
for single experimental results to cross the disciplinary boundaries and be transferred
from psychology to the social sciences.
1 The work of reference for experimental economics is Kagel and Roth (eds. 1995, forthcoming).
On experiments in political science, see Morton and Williams (2008).
2 Danziger (1994) and Mandler (2007) are excellent sources on the history of experimentation in
The examples that will be discussed in this chapter are quite typical, in this respect.
Ultimatum Game experiments are now routinely used throughout the social sciences,
and are a direct offspring of the contamination between economic theory and the
psychology of decision making that took place in the 1960s and 1970s. The examples
have been selected mainly for illustrative purposes, to highlight the remarkable
uniformity, at the methodological level, that holds across the experimental branches of
the various social sciences. In spite of different goals and theoretical motivations,
laboratory experiments share a hard core of design principles aimed at supporting a
special kind of inferences and securing a particular sort of scientific knowledge. A large
part of this chapter will be devoted to illustrating what this knowledge and these inferences are,
and how the design principles can help to achieve these goals. Finally, I shall outline the
most pressing methodological challenges faced by experimental scientists, and discuss
some solutions that have been proposed in the philosophical and scientific literature.
A proviso is required before I start: I will devote this chapter entirely to laboratory
experiments performed in controlled conditions. These are not the only experiments
performed by social scientists. So-called “quasi-experiments”, “field experiments”,
“natural experiments” and “policy experiments” are widely used and – space permitting
– would deserve an equally detailed methodological examination. However, to cope with
limits of space I have decided to focus on controlled experiments, for two main reasons. First,
understanding controlled experimentation is the necessary preliminary step for
understanding the other forms of less tightly controlled experimentation. The very
concepts of “quasi-experiment”, “field experiment” and so forth, are defined in relation to
the perfectly controlled experiment. Once the latter is understood, the virtues and
limitations of other forms of investigation are relatively easy to articulate. Second, while
quasi- and natural experiments have a long history and are relatively uncontroversial in
social science, the rise of laboratory science has been fraught with many obstacles and
was opposed by sceptical challenges of various kinds. These challenges provide a
particularly interesting arena for philosophers and methodologists interested in the
improvement of social scientific research.
Philosophical critiques of laboratory experimentation tend to follow a standard
argumentative pattern. Typically some features of social reality are highlighted that
distinguish it sharply from natural reality. Thus, for example, Roy Bhaskar in a well-known
essay states that “the objects of social inquiry […] only ever manifest themselves
in open systems” (1979: 45). According to an editorial in The Economist, in contrast, the
difference is that “economics yields no natural laws or universal constants” (Economics
Focus 1999: 96). The second step consists in claiming that such features impede a
fruitful application of the experimental method for the production of social scientific
knowledge: the experimental method requires full control of disturbing factors (“closed
systems”, in critical realist jargon), or the existence of laws of nature that hold both
within and outside the laboratory walls. The arguments then conclude by stating the
futility – or even the impossibility – of laboratory experimentation in the social sciences.
I am generally unimpressed by such arguments, for a number of reasons. To begin with,
they usually betray an outdated and unrealistic conception of the methodology of the
natural sciences. It is by now widely accepted, for example, that so-called laws of nature
play a marginal role in many disciplines, such as biology, where the experimental
method is routinely and fruitfully used. Secondly, these arguments often rely on
ontological assumptions (about human intentionality and free will, for example) that are
as controversial as the conclusions they are meant to support. But finally, and most
importantly, philosophical sceptics often turn out to be refuted by the development of
science itself. For example it has become extremely difficult to keep arguing that
economic experiments are impossible or futile, now that experimenters have received a
Nobel Prize. As in other historical cases (the debates on the possibility of pure void, or
on the interpretation of the infinitesimal calculus come to mind) scientific progress often
makes philosophers look in retrospect like silly complainers who fail to see what new
methodological tools can deliver. To avoid that fate, we had better start from science
itself, and simply see how far it can take us. The next two sections will be entirely
devoted to that.
The Ultimatum Game
Imagine you have just been given $10. The sum will have to be shared with an
anonymous, invisible partner, and you will have to agree with him or her on how to
divide it. The room for discussion is almost non-existent: you (the “Proposer”) will only be
able to offer one division of the cake, and your partner (the “Responder”) will only be
able to accept or reject it. If he or she rejects it, you will both lose the opportunity of
sharing the $10; if he/she accepts, you will both walk out with your share, as determined
by the proposed division.
This is essentially the strategic situation known as the Ultimatum Game. The
Ultimatum Game was first designed and run by a group of German experimental
economists led by Werner Güth in the early 1980s (Güth, Schmittberger and Schwarze
1982). Güth and colleagues intended to investigate sequential bargaining, and chose the
Ultimatum Game mainly because it is the simplest possible sequential bargaining
problem that one can conceive of. Bargaining is customarily represented in economic
theory as a sharing problem, where the surplus from exchanging two goods has to be
allocated among the parties. The $10 in the Ultimatum Game stands for this surplus.
Simplicity was sought to minimize the cognitive costs of computation. It is well known
in fact that people struggle when they have to analyse complex dynamic games. In the
Ultimatum Game no one can fail to realise that the game will be over after the
Responder’s move, so the noise in the data due to misunderstandings of the decision
situation should be minimal.
According to standard game theory, Proposers should offer close to nothing and
Responders should accept.3 The idea is that Responders face a seemingly trivial decision
problem: either get nothing or get whatever Proposers have offered. Let us suppose that
the minimum amount that can be offered is $1. One dollar is better than nothing, so
Responders should accept. Proposers should know this, and offer $1. Under common
knowledge of the game and of rationality, the $1/9 split is the only equilibrium of the
Ultimatum Game.4
3 This “standard prediction” is actually derived from a fairly complex machinery: roughly, from a
theory of strategic play (the “core” of game theory), plus a set of assumptions about people’s
In their first study, Güth, Schmittberger and Schwarze found that Proposers offer on
average 35% of the cake. There was a substantial mode at 50% and very few rejections
by Responders (about 10%). When the game was repeated a week later (“experienced
players” condition) the average offer went down to 31%. The mode at 50% disappeared,
with most offers lying in the range of 20-30%. There were also more rejections with
repetition (about 30%). These results have since been replicated several times, and
constitute what is sometimes called the “Ultimatum Game anomaly”.
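The benchmark against which these results count as an anomaly can be reproduced mechanically. What follows is a minimal Python sketch of the backward-induction argument, assuming whole-dollar offers with a $1 minimum and a purely selfish Responder who accepts any positive amount. The function names and the `step` parameter are illustrative choices and do not come from the original study.

```python
def responder_accepts(offer):
    """Assumed selfish Responder: rejecting pays 0, so any positive offer is accepted."""
    return offer > 0


def subgame_perfect_offer(pie=10, step=1):
    """Backward induction: the Proposer anticipates the Responder's reply
    and picks the feasible offer that maximises her own payoff."""
    best_offer, best_payoff = None, -1
    for offer in range(step, pie + 1, step):  # the smallest feasible offer is `step`
        proposer_payoff = pie - offer if responder_accepts(offer) else 0
        if proposer_payoff > best_payoff:
            best_offer, best_payoff = offer, proposer_payoff
    return best_offer, best_payoff


offer, payoff = subgame_perfect_offer()
print(offer, payoff)  # 1 9
```

Under these assumptions the sketch returns the $1/9 split, far below the average offers of 31-35% observed in the laboratory.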
The politics of experimentation
Over the last two decades Ultimatum Games have stirred a heated debate. Ultimatum
data, to begin with, seem to refute one of the few solid planks of theoretical social
science: the model of selfish rationality that is at the core of much economics, political
science, and even theoretical biology and anthropology. Against standard rational choice
theory, experimental data seem to vindicate the folk intuition that human behaviour is
heavily influenced by social factors and often driven by other-regarding motives.5 For
this reason, the Ultimatum Game has become an important investigative tool in the die-hard,
trans-disciplinary debate on human morality (Gintis et al., eds. 2004).
Laboratory science aims at control, and the control of human beings is obviously a
politically charged issue. At the level of metaphysics, the experimental method seems to
clash with widely held beliefs in free will and individual agency. According to folk
ontology human beings differ from atoms and molecules in that their behaviour does not
obey strict laws of nature. And yet, as any experimenter knows, human behaviour is
highly predictable. Of course the behaviour of a single, specific individual may be hard
to predict exactly. But the behaviour of aggregates (even relatively small groups) follows
very systematic patterns, in some circumstances. The circumstances are important:
predictability often depends on the creation of rather precise choice situations, and
much experimental work is aimed at discovering what these situations are.
preferences and beliefs. Without such assumptions, game theory does not issue any specific
prediction about behaviour. A more detailed discussion of this issue can be found in Guala (2006,
4 More precisely: it is the only “sub-game perfect equilibrium” of the game. Here and in the
course of this chapter I try to gloss over the most technical aspects of game theory, except when it
is required by the discussion. In Chapter 16 of this Handbook Giacomo Bonanno provides an
introduction to the literature on game theoretic models in the social sciences.
5 A critical survey of the main interpretations of Ultimatum data can be found in Bicchieri (2006,
Ch. 3). See also Woodward (2009).
None of these qualifications, however, constitutes a major difference with respect to
experimentation in the natural sciences. The average behaviour of an aggregate (of, say,
particles) is always easier to predict than the behaviour of each constituent. And outside
well-specified initial and boundary conditions, the behaviour of physical particles may
be as unpredictable as the choices of human beings. Since folk ontology in this case (as
in many others) is misleading, we had better set it aside and concentrate on the actual
practices of experimental scientists. As we shall see later, there are much better
explanations of the success of experimental scientists in the natural, compared to the
social sciences. These explanations have little to do with the ontology, and much more
with the politics of social scientific research. To put it briefly, there are things that we
do not want to do with human beings, which prevent us from exploiting the full power of
the method of experimental control. I will return to this issue at the end of this chapter.
The goals of experiment
What is the experimental method, exactly? Philosophical studies of experimental science
have been until recently surprisingly sparse. Philosophers of science have for a long
time endorsed a theory-centred view of scientific knowledge, according to which what we
know is mostly encapsulated in our (best) theories. Under the influence of logical
positivism in the 20th Century they came to represent even experimental data as sets of
linguistic reports of perceptual experience. As Karl Popper puts it in an often-quoted
passage, “theory dominates the experimental work from its initial planning up to its
finishing touches in the laboratory” (1934, p. 107).
During the 1980s, however, a new generation of studies of experimental practice
challenged the theory-dominated approach.6 One of the novel insights provided by this
literature concerned the relative independence between theoretical and experimental
knowledge. Historians and philosophers noticed that a major goal of laboratory science
is the discovery, replication, and measurement of experimental phenomena. A
“phenomenon” in scientific jargon is an interesting regularity observed either in the
field or in a controlled setting. It usually cannot be directly observed, but rather
requires some inference from “noisy” data (Bogen and Woodward 1988). Finally, and
importantly, it displays a remarkable stability across changes in theoretical paradigms.
Experimental phenomena are often surprising. Sometimes they violate the predictions
of an established scientific theory, as in the case of the Ultimatum Game. Sometimes
they contradict our commonsensical intuitions about how people should behave in
situations of a certain kind. The observation of a surprising phenomenon however is just
the beginning of an experimental programme. Most experimental work is devoted to
investigating the robustness and replicability of phenomena in laboratory conditions.
And eventually, social scientists hope that the phenomenon observed in the lab will
teach them something about social behaviour and institutions outside the laboratory
walls. To understand how this long journey takes place, and how laboratory
experiments can contribute to our knowledge of the social world, we need to dig deeper
into their methodological and logical structure.
6 Hacking (1983) is widely recognized as the pioneer of this “new experimentalism”. Useful
surveys of the literature can be found in Franklin (1998) and Morrison (1998).
The Duhem-Quine problem
Game theory predicts that proposers should make minimal offers, which responders
should accept. But notice that the very concept of “theory” under test is not so clear-cut.
Economic theory does not impose strong restrictions on the contents of individual
preferences. An agent can in principle maximise all sorts of things (her income, someone
else’s income) and still behave “economically”. In order to make the theory testable,
therefore, it is necessary to add several auxiliary assumptions regarding the contents of
people’s preferences (or, equivalently, regarding the argument of their utility functions),
their constraints, their knowledge, and so forth. In our example, the standard
Ultimatum Game experiment is really testing a very specific prediction obtained by
adding to the basic theory some strong assumptions about people’s preferences, e.g. that
they are trying to maximise their monetary gains and do not care about others’ payoffs.
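How much work the auxiliary assumptions do can be seen in a small Python sketch. The strategic core (a selfish Proposer best-responds to the Responder) is held fixed, and only the Responder's utility function is swapped. The `envy_averse` utility and its weight `alpha` are hypothetical, chosen purely for illustration; they are not estimates from any experiment.

```python
def predicted_offer(pie, responder_utility):
    """Lowest whole-dollar offer that the Responder accepts.

    Rejection gives both players 0, so the Responder accepts iff her utility
    from the split is positive. A selfish Proposer (assumed throughout)
    prefers the lowest offer that is accepted.
    """
    for offer in range(1, pie + 1):
        if responder_utility(offer, pie - offer) > 0:
            return offer
    return None


def selfish(own, other):
    """Standard auxiliary assumption: only one's own monetary gain matters."""
    return own


def envy_averse(own, other, alpha=1.0):
    """Hypothetical alternative: being behind the other player is costly."""
    return own - alpha * max(other - own, 0)


print(predicted_offer(10, selfish))      # 1
print(predicted_offer(10, envy_averse))  # 4
```

With these assumed numbers the same core theory predicts a $1 offer under one auxiliary assumption and a $4 offer under the other, which is why an observed offer cannot by itself tell us which part of the cluster has failed.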
We are of course dealing with a typical Duhem-Quine issue here. Experimental results
usually do not indicate deductively the truth/falsity of a theoretical hypothesis in
isolation, but rather of a whole body of knowledge or “cluster” of theoretical and
auxiliary hypotheses at once.7 Formally, the Duhem-Quine thesis can be presented as
follows:
(4) (T & A1 & A2 & … & Ai) → O
(5) ~O
(6) ~T ∨ ~A1 ∨ ~A2 ∨ … ∨ ~Ai
The argument states that from (4) and (5) we can only conclude that at least one
element, among all the assumptions used to derive the prediction O, is false. But we
cannot identify exactly which one, from a purely logical point of view. The last point is
worth stressing because the moral of the Duhem-Quine problem has often been
exaggerated in the methodological literature. The correct reading is that deductive logic
is an insufficient tool for scientific inference, and hence we need to complement it by
means of a theory of induction. The Duhem-Quine problem does not imply, as sometimes
suggested, the impossibility of justifiably drawing any inference from an experimental
result. Scientists in fact do draw such inferences all the time, and it takes a good deal of
philosophical arrogance to question the possibility of doing that in principle. What we
need is an explication of why some such inferences are considered more warranted than
others. If, as pointed out by Duhem and Quine, deductive logic is insufficient, this must
be a task for a theory of induction.
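The purely logical part of this point can be checked by brute force. The Python sketch below assumes, for concreteness, a theory T with two auxiliary hypotheses. It enumerates every truth assignment compatible with premises (4) and (5), and confirms that at least one element of the cluster is false in each of them while T itself remains true in some.

```python
from itertools import product


def consistent_assignments(n_aux=2):
    """Truth values for (T, A1..An) that satisfy premises (4) and (5)."""
    rows = []
    for values in product([True, False], repeat=n_aux + 1):
        t, aux = values[0], values[1:]
        o = False                                # premise (5): ~O
        premise_4 = (not (t and all(aux))) or o  # premise (4): (T & A1 & ...) -> O
        if premise_4:
            rows.append((t, aux))
    return rows


rows = consistent_assignments()
# Conclusion (6): in every admissible row at least one conjunct is false.
print(all(not (t and all(aux)) for t, aux in rows))  # True
# But logic alone does not single out T: it is still true in some rows.
print(any(t for t, _ in rows))                       # True
```

Deduction thus narrows the possibilities without picking the culprit, which is the sense in which a theory of induction is needed to do the rest.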
7 Cf. Duhem (1905) and Quine (1953).
Causal inference
Experiments in social science are rarely aimed at the determination of constants of
nature that hold universally across different contexts, but rather at the discovery and
quantitative testing of causal hypotheses. Measurements made in a “baseline” condition
are compared with those made in other conditions or treatments, in which various
elements of the design have been varied one at a time. Many variations have been tried
on the basic version of the Ultimatum Game: levels of anonymity, information about
others’ payoffs, physical proximity, group identity, property rights, effort, and several
other variables have been manipulated to identify the factors that may hinder or foster
egalitarian splits in the Ultimatum Game.
In a competently performed experiment each treatment or condition is designed so as to
introduce variation in one (and only one) potential causal factor. The method of
variation is a characteristic hallmark of experimental science, which is, in turn, the
most powerful method for the discovery of causal relations among variables. These
statements may seem surprising, for the notion of causation has suffered from a lot of
bad press during the last hundred years. The twentieth century was dominated by anti-metaphysical
philosophies, and social scientists under the influence of logical positivism,
in particular, have argued for decades that “cause” and “effect” are dubious
metaphysical notions that scientists should (and can) do without.8
Despite several valiant attempts to reduce causality to more “respectable” concepts
(such as constant conjunction or statistical association), however, it is now generally
agreed that causal relations have intrinsic features – like asymmetry, counterfactual
dependence, invariance to intervention – that cannot be explained away by means of a
reductive analysis. There are now several non-reductive theories of causation in the
philosophical and scientific literature, which are reviewed in Chapter 35 of this
Handbook. Luckily we do not have to get into this issue in great depth here, for in spite
of continuing disagreement over the metaphysics of causation (what it means and what
it is), there is broad agreement that the method of the controlled experiment is a
powerful tool for causal discovery. The reason is that controlled experimentation allows
underlying causal relations to become manifest at the level of empirical regularities. In
a competently performed experiment, single causal connections can be “read off” directly
from statistical associations.
A homely example illustrates this type of inference nicely. Imagine you want to discover
whether flipping the switch is an effective means for turning the light on (or whether
“flipping the switch causes the light to turn on”). The flipping of course will have such
effect only if other enabling background conditions are in place, for example if electricity
supply is in good working order. Thus we shall first have to design an experimental
situation where the “right” circumstances are instantiated. Then, we shall have to make
sure that no other extraneous variation is disturbing the experiment. Finally, we will
check whether by flipping the switch on and off we are producing a regular association
between the position of the switch (say, up/down) and the light (on/off). If such an
association is observed, and if we are confident that every plausible source of error has
been controlled for, we will conclude that flipping the switch is causally connected with
turning the light on.
8 Russell (1913) is a seminal and influential example.
Causal discovery, in a nutshell, requires variation, but not too much variation, and of
the right kind. In general, you want variation in one factor while keeping all the other
putative causes fixed “in the background”. This logic is neatly exemplified in the model
of the perfectly controlled experiment (see Table 1).
                     Treatment   Observed effect   Other factors
Experimental group   X           Y1                Constant (Ki)
Control group        –           Y2                Constant (Ki)

Table 1: The perfectly controlled experimental design
The Ki are the background factors, or the other causes that are kept fixed across the
experimental conditions. The conditions must differ with respect to just one factor (X,
the treatment) so that any significant difference in the observed values of Y (Y1 – Y2) can
be attributed to the presence (or absence) of X. A good experimenter thus is able to
discover why one kind of event is associated regularly with another kind of event, and
not just that it does. In the model of the perfectly controlled experiment one does not
simply observe that “if X happens then Y happens”, nor even that “X if and only if Y”.
Both conditionals are material implications, and their truth conditions depend on what
happens to be the case, regardless of the reasons why it is so. In science in contrast –
and especially in those disciplines that regularly inform policy-making, like the social
sciences – one is also interested in “what would be the case if” such and such a variable
was manipulated. Scientific intervention and policy-making must rely on counterfactual
reasoning. A great advantage of experimentation is that it allows checking what would
happen if X was not the case, while keeping all the other relevant conditions fixed.
We can now draw a first important contrast between the experimental method and
traditional statistical inferences from field data. Using statistical techniques one can
establish the strength of various correlations between economic variables. But except in
some special happy conditions, the spontaneous variations found in the data do not
warrant the drawing of specific causal inferences. Typically, field data display either too
little or too much concomitant variation (sometimes both). Some variations of course can
be artificially reconstructed post-hoc by looking at partial correlations, but the ideal
conditions instantiated in a laboratory are rarely found in the wild (except in so-called
“natural experiments”, that is).
This does not mean that total experimental control is always achieved in the laboratory.
The perfectly controlled experiment is an idealisation, and in reality there are always
going to be uncontrolled background factors, errors of measurement, and so forth. To
neutralise these imperfections, experimenters use various techniques, like for example
randomization.9 In a randomized experiment subjects are assigned to the various
experimental conditions by a chance device, so that in the long run the potential errors
and deviations are evenly distributed across them. This introduces an important
element in the inference from data, i.e. probabilities. A well-designed randomized
experiment makes it highly likely that the effect of the treatment is reflected in the
data, but does not guarantee that this is going to be the case. Assuming for simplicity
that we are dealing with binary variables (X and ~X; Y and ~Y), in a randomized
experiment if (1) the “right” background conditions are in place, and (2) X causes Y, then
P(Y|X) > P(Y|~X). In plain words: if (1) and (2) are satisfied, X and Y are very likely to
be statistically correlated.
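The contrast between spontaneous variation in field data and randomized assignment can be illustrated with a toy simulation. The Python sketch below is built entirely on assumed numbers: a hypothetical background factor K raises the probability of Y, and in the "field" condition K also makes X more likely, whereas in the experiment a chance device assigns X independently of K. The function returns the estimated difference P(Y|X) - P(Y|~X).

```python
import random


def simulate(randomized, n=20000, effect=0.0, seed=0):
    """Estimate P(Y|X) - P(Y|~X) from n simulated subjects.

    `effect` is the assumed causal contribution of X to the probability of Y;
    all probabilities below are illustrative, not empirical.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # x -> [subjects, subjects showing Y]
    for _ in range(n):
        k = rng.random() < 0.5              # hypothetical background factor K
        if randomized:
            x = rng.random() < 0.5          # chance device, independent of K
        else:
            x = rng.random() < (0.8 if k else 0.2)  # self-selection driven by K
        p_y = 0.2 + (0.4 if k else 0.0) + (effect if x else 0.0)
        y = rng.random() < p_y
        counts[x][0] += 1
        counts[x][1] += y
    return counts[True][1] / counts[True][0] - counts[False][1] / counts[False][0]


print(simulate(randomized=False, effect=0.0))  # sizeable gap although X does nothing
print(simulate(randomized=True, effect=0.0))   # gap close to zero
print(simulate(randomized=True, effect=0.3))   # gap close to the assumed effect
```

In this toy setting the field data display a correlation that reflects only the background factor, while randomization spreads K evenly across conditions, so that the remaining gap tracks the assumed effect of X up to sampling error.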
Some authors (notably Cartwright 1983) have used this relation or some close variant
thereof to define the very notion of causation. Such a definition is essentially a
probabilistic equivalent of J.L. Mackie’s (1974) famous INUS account, with the
important addition of a “screening off” condition.10 The latter is encapsulated in the
requirement that all other causal factors in the background are kept fixed, so as to avoid
problems of spurious correlation. Several interesting philosophical implications follow
from such a definition of causation, which however would take us too far away from our
present concerns. Since the main focus of this chapter is methodological, I shall skip
these metaphysical issues here (and refer the interested reader to Chapter 35).
The logic of testing
The above analysis suggests an obvious way to tackle the Duhem-Quine problem, by
simply asserting the truth of the background and auxiliary assumptions that are used in
designing an experiment. In a competently performed controlled experiment, in other
words, we are entitled to draw an inference from a set of empirical data (or evidence, E)
and some background assumptions (Ki) to a causal hypothesis (H = “X causes Y”). The
inference consists of the following three steps:
(7) (H & Ki) → E
(8) E & Ki
(9) H
9 Other techniques are used when the model of the perfectly controlled experiment cannot be
applied for some reason, but I shall not examine them in detail here (they are illustrated in most
textbooks and handbooks of experimental methodology, cf. e.g. Christensen 2001).
10 INUS stands for an Insufficient Non-redundant condition within a set of jointly Unnecessary
but Sufficient conditions for an effect to take place. There are several problems with such an
approach, some of which are discussed by Mackie himself. The “screening-off” condition fixes
some of the most obvious flaws of the INUS account.
This is an instance of the Hypothetico-Deductive model of testing. In this case the
evidence indicates or supports the hypothesis. The symmetric case is the following:
(10) (H & Ki) → E
(11) ~E & Ki
(12) ~H
In the latter case, the inference is deductive. If (and sometimes this is a big “if”) we are
ready to assert the truth of the background assumptions Ki, then it logically follows that
the evidence E refutes or falsifies H. Since we are not often in the position to guarantee
that the Ki are instantiated in reality a refutation is usually followed by a series of
experiments aimed at testing new hypotheses H′, H″, etc., each concerned with the
correctness of the design and the functioning of the experimental procedures. If these
hypotheses are all indicated by the evidence, then the experimenter usually feels
compelled to accept the original result.
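The logical difference between schema (7)-(9) and schema (10)-(12) can be verified with a truth table. The Python sketch below treats the background assumptions Ki as a single proposition K (a simplification made here for brevity) and counts an argument as deductively valid when no assignment makes all premises true and the conclusion false.

```python
from itertools import product


def valid(premises, conclusion):
    """True iff no truth assignment satisfies every premise but not the conclusion."""
    for h, k, e in product([True, False], repeat=3):
        if all(p(h, k, e) for p in premises) and not conclusion(h, k, e):
            return False
    return True


def prediction(h, k, e):
    """Shared first premise of both schemas: (H & K) -> E."""
    return (not (h and k)) or e


# Schema (7)-(9): from the prediction and E & K, infer H.
confirmation = valid([prediction, lambda h, k, e: e and k],
                     lambda h, k, e: h)
# Schema (10)-(12): from the prediction and ~E & K, infer ~H.
refutation = valid([prediction, lambda h, k, e: (not e) and k],
                   lambda h, k, e: not h)

print(confirmation, refutation)  # False True
```

Only the refuting schema is deductively valid; the confirming one can at best be a strong inductive inference.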
Notice that in the first case (7-9) the conclusion of the argument is not logically implied
by the premises, or in other words the inference is inductive. Of course many scientific
inferences have this form, so the point of using the experimental method is to make sure
that the inductive step is highly warranted, or that the inference is as strong as
possible. The conditions for a strong inductive inference are outlined in normative
theories of scientific testing. Although there is presently no generally agreed theory of
inductive inference in the literature, the model of the perfectly controlled experiment
suggests a few basic principles that any adequate theory should satisfy. When an
experiment has been competently performed – i.e. when the experimenter has achieved
a good degree of control over the background circumstances Ki – the experimental data
have the following, highly desirable characteristics:
(a) if X causes Y, the observed values of the experimental variables X and Y turn out to
be statistically correlated;
(b) if X does not cause Y, these values are not correlated.
Another way to put it is this. In the “ideal” experiment the evidence E (correlation
between X and Y) indicates the truth of H (X causes Y) unequivocally. Or, in the “ideal”
experiment you are likely to get one kind of evidence (E) if the hypothesis under test is
true, and another kind of evidence (~E) if it is false (Woodward 2000). Following
Deborah Mayo (1996; 2005), we shall say that in such an experiment the hypothesis H is
tested severely by the evidence E.
External validity
Experiments in the social sciences are rarely aimed at investigating phenomena that
take place within laboratory walls only. The Ultimatum Game was devised to shed light
on some aspects of two-person bargaining, in particular the role played by fairness
norms in determining the allocation of surplus. And yet, bargaining comes in a variety
of forms, and is regulated by different institutions in different social contexts. It is far
from obvious that any general claim can be derived from the highly stylized set-up
known as the Ultimatum Game.
Qualms of this kind are very familiar to experimental social scientists. Scientists are
aware that the successful discovery and testing of causal claims does not automatically
ensure that these claims can be generalized to non-experimental circumstances. For this
reason they draw a crucial distinction between internal and external validity of
experimental results. Problems of internal validity have to do with the drawing of
inferences from experimental data to causal mechanisms in a given laboratory set-up.
Typical internal validity questions are: Do we understand what goes on in this
particular experimental situation? Are we drawing correct inferences within the
experiment? External validity problems instead have to do with the drawing of
inferences from experimental data to what happens in other (typically, non-laboratory)
situations of interest. They involve asking questions like: Can we use experimental
knowledge to understand what goes on in the “real world”? Are we drawing correct
inferences from the experiment?
The method of the perfectly controlled experiment is maximally useful for tackling internal
validity problems, that is, when the issue is to find out what is going on within a given
experimental set-up or laboratory system. Since the method relies on the control of
background and boundary conditions, there is usually a trade-off between internal and
external validity. A simple experiment such as the Ultimatum Game for example, which
reproduces many idealisations of a theoretical model, is easier to control in the
laboratory. But it also constitutes a weaker starting point for extending experimental
knowledge to other situations where these idealisations do not hold.
According to an old tradition in experimental psychology, the problem of external
validity should be framed i