Reinforcement contingencies and their correlation with cause and effect relationships

Contents

Background
Common learning mechanisms
Reinforcement contingencies and their informative significance

  Stimulus reinforcement contingency
  No stimulus reinforcement contingency
  Response reinforcement contingency
  No response reinforcement contingency

Notes
References

Background

Drawing on extensive observations of many kinds of animals, Charles Darwin (1859) theorized that animals adapt to changes in their environment. He argued that the variation among individual members of a species is the source of new adaptive forms. When conditions change in nature, individual members of a species with useful variations are more likely to survive. In turn, the surviving members will likely reproduce and pass those favorable variations on to their offspring. The continual challenges to survival posed by the changing environment force this process to repeat across generations, with the effect that the characteristics that contribute to survival persist and those that do not diminish. In this way the genetic characteristics that contribute to survival are naturally selected by the environment. Evolution is therefore the product of genetic variation and natural selection.

Evolution is not, however, confined to physical or innate characteristics. The capacity to learn is also a product of natural selection. Only learning mechanisms enable animals to respond rapidly, precisely, and adaptively to changing conditions. Innate response mechanisms simply are not as flexible for negotiating the continual challenges faced by animals.

A common attribute of the information processing systems that animals share is the capacity for associative learning, which, here, is the acquisition of knowledge about the relationship between events in an organism’s environment. All animals benefit from being able to learn about the relationship between events. The capacity to learn about the relationship between an environmental cue or signal and an outcome, or between one's own action and an outcome, enables animals to anticipate future events and modify behavior to suit the situation. It is of great advantage, and a substantial influence on behavioral adjustment to the environment, for animals to learn which events predict the occurrence of biologically important events and, on the basis of those predictions, which responses are optimal. In fact, one of Darwin's theoretical precepts is that whenever a characteristic is found to be either unique to a species or general across species, it may be assumed that it serves an important function for survival; if it did not, it would not have become specific or common.

Common learning mechanisms

Common characteristics can be explained by either common ancestry or by convergent selection pressures. Convergence, a product of convergent selection pressures, is the tendency to grow alike or develop similarities in form or habit in order to cope with similar problems and environmental resources. Often adaptations that share a common function are the product of convergent selection pressures or common environmental constraints. The example Dickinson (1980) offers in his book on animal learning is that all animals that live in the sea are faced with the common constraint of the relative uniformity of the medium in which they move. Faced with this common constraint, many different species have evolved with similar body shapes for the function of efficient swimming.

The important analogy Dickinson makes is that many different species also face common learning problems. The survival of animals often depends on their capacity to detect, learn, and store information about the relationship between events in their environment that are important to them. Survival depends on knowing which environmental events predict injury or assault, what foods to eat or avoid, where to find water, as well as knowing about the events that signal the nonoccurrence of harm or benefit, or events that are uninformative so that attention can be directed elsewhere. In general, survival, a common selection pressure, often depends on the capacity to learn which sensory inputs signal the occurrence or nonoccurrence of important events and which inputs can be ignored, along with the actions that can cause important events to occur or not occur.

The critical question Dickinson asks is whether the relationship between events has properties that are common to many different species and situations and therefore would likely shape or maintain common learning mechanisms. Dickinson and others (e.g. Testa, 1974) argue that associative learning mechanisms have evolved to enable animals to detect and store information about the real causal relationships in their environment, and that the conditions under which learning takes place are those in which there is a causal relationship between events. Dickinson emphasizes that events that lie on the causal chain leading to the occurrence of events of value have universal properties that transcend species specific adaptations. For example, effects never occur without a cause and they never occur before their cause. Causal relationships are unidirectional, ensuring that certain events reliably precede other events.

Since it is important for all animals to predict the occurrence of events that are important to them, and since all causal events follow the same primary causal laws, the evolutionary adaptation animals must share is the capacity to detect the causes of those important events, because the best predictors of important events are their causes. Just as the homogeneity of the oceans, along with the laws of fluid mechanics, has contributed to the evolution of similar body types for the function of hydrodynamic efficiency, the universal need to predict the occurrence of important events, along with the universal properties of causal relationships, has most likely contributed to the evolution of common learning mechanisms in different animal species for the function of preparatory and instrumental efficiency. Animals that are able to learn about real world cause and effect relationships are more likely to survive and pass that capacity on to their offspring, whereas animals that are not able to learn about those relationships are more likely to perish. Moreover, since cause and effect relationships are universal to all animals, the capacity to learn about real world causal relationships should be found in a number of different species, helping them to make sense of the world and structure the relationships between events in space and time.

As it turns out, the capacity to associate, that is, to detect, learn, and store information about relationships between events in the environment, is common among higher vertebrates such as mammals and birds, and is found in some invertebrates. Vertebrates typically absorb information from exposure to the relations among events in their environment rather than just responding reflexively, as was once thought. Behavior often involves either approach to or avoidance of events that can be of benefit or harm, and much of what an animal must learn about is the predictive environmental signs of imminent consequences. Through associative learning some environmental stimuli come to serve as signals that foretell the occurrence of important events and produce anticipatory responses (classical conditioning) in all vertebrate and many invertebrate species (MacPhail 1982). Anticipatory responses to environmental stimuli provide a preparatory advantage over and above the capacity to respond with reflex action to forthcoming biologically important events(1). The ability to associate cues or signals with important events enables animals to optimize interaction with those events (see e.g. Hollis 1982). Furthermore, actions that are instrumental in causing some outcome will usually be either increased or decreased depending upon the appetitive or aversive nature of the outcome (instrumental conditioning). It is this capacity for instrumental action that allows animals to control their environment in the service of their needs or desires. Both mammals and birds possess associative learning mechanisms that are finely tuned to detect variations in the degree of correlation between a signal or cause and an outcome. Put another way, they possess associative learning mechanisms that are finely tuned to detect variations in the degree of correlation between a stimulus or response and a reinforcer that is arranged in a reinforcement contingency(2).

Reinforcement contingencies and their informative significance

If animals are capable of detecting the causal relationships in their natural environment then they must also be capable of detecting the causal relationship between the events that are arranged in a reinforcement contingency, because to arrange a reinforcement contingency is to arrange a causal relationship. All animals that have the capacity to detect causal relationships in their natural environments also have the capacity to detect the causal relationships that are arranged in reinforcement contingencies.

In associative conditioning the trainer arranges a correlative or causal relationship between events, in part, by maintaining a reinforcement contingency. The relationship that is arranged is between a stimulus or response and a reinforcer, and the function of the contingency is to enable the subject to detect the correlation between those events (the predictive or causal relationships). Learning is not observed directly. That is, we cannot see the subject learn; we can only observe a change in behavior after exposure to the contingent relationship and then suppose, from the change in behavior, the animal has learned something about that relationship. When the subject is exposed to a certain relationship the behavioral changes observed by the trainer can be used as an index or form of scientific measurement indicating the subject has learned something about that relationship. Instead of viewing the changes in behavior as the learning of new responses, the changes in behavior are viewed by cognitivists as an index that the subject has successfully learned something about the relationship between the events that the trainer contingently arranged. In other words, from learning the subject is changed (rather than the behavior) and the change in behavior is a reflection of that change or learning. The effects of the contingency can be assessed by comparing behavioral changes or responses that are maintained when the contingency operates with responses that are maintained when it does not. If, as a consequence of exposure to a reinforcement contingency the subject’s behavior has changed, the trainer may infer learning has occurred.

Stimulus reinforcement contingency

In the simplest case, the contingency is either between a particular stimulus and reinforcement, termed Pavlovian or classical conditioning, or between a particular response and reinforcement, termed instrumental conditioning(3). When there is a stimulus reinforcement contingency or classical contingency the subject is reinforced following the onset of that stimulus and in its absence reinforcement is omitted. The function of both reinforcement following the occurrence of the stimulus and the omission of reinforcement in the absence of the stimulus is to better enable the subject to detect the relationship between the stimulus and reinforcement and the absence of the stimulus and no reinforcement. For instance, Rescorla (1968) showed that simple temporal contiguity, or proximity, between events is not sufficient for producing an association between them. Animals also need to learn about the probability of an outcome both in the presence and in the absence of its supposed signal. For example, a certain stimulus would not be a very good cue or signal predicting the presence of, say, a predator unless that stimulus is encountered more often just prior to or in conjunction with that predator than in its absence, which by the way, is an example of a real world reinforcement contingency. In this way the subject can learn about the relationship between events, in this case, that the stimulus signals or predicts an aversive reinforcer, the predator (see note 2c). The subject acquires knowledge about the relationship between events and the response is a measure of that knowledge.
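The comparison that Rescorla's result points to, the probability of the outcome in the presence of the signal against its probability in the signal's absence, can be made concrete with a small calculation. The sketch below is only an illustration and is not a procedure taken from the cited studies. It assumes that observations have been tallied into four hypothetical counts, and it summarizes the contingency as the difference between the two conditional probabilities. The function name and all of the counts are invented for the example.

```python
def contingency(stim_outcome, stim_no_outcome, no_stim_outcome, no_stim_no_outcome):
    """Return P(outcome | stimulus) - P(outcome | no stimulus).

    The four arguments are hypothetical tallies of observation periods:
    stimulus present or absent, crossed with outcome present or absent.
    """
    with_stim = stim_outcome + stim_no_outcome
    without_stim = no_stim_outcome + no_stim_no_outcome
    if with_stim == 0 or without_stim == 0:
        raise ValueError("need observations both with and without the stimulus")
    p_given_stim = stim_outcome / with_stim
    p_given_no_stim = no_stim_outcome / without_stim
    return p_given_stim - p_given_no_stim


# Invented counts: a rustling sound and the appearance of a predator.
# Informative cue: the predator follows 8 of 10 rustles but only 1 of 10 quiet periods.
informative = contingency(8, 2, 1, 9)
# Uninformative cue: the predator is just as likely with or without the rustle.
uninformative = contingency(4, 6, 4, 6)

print(round(informative, 2))    # positive: the cue predicts the predator
print(round(uninformative, 2))  # zero: the cue carries no information
```

A value near zero corresponds to a stimulus that tells the animal nothing about the outcome, however often the two happen to occur together.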

No stimulus reinforcement contingency

Conversely, if there were no contingency between a particular stimulus and reinforcement the stimulus would provide no information about the probability of reinforcement which would likely result in the subject learning to ignore that stimulus. An experimental procedure that has been used to demonstrate the effects of a zero contingency or correlation between events to be associated is the “random control” procedure (Rescorla 1967). Suppose, for example, we take a hungry dog and put it in some sort of apparatus with an automatic feeder, and while in the apparatus a bell rings around every minute or so. Now let's suppose, from this basic procedure, two contrasting experiments. In the first experiment (the "random control" procedure) the dog receives small amounts of food from the feeder always immediately after the bell rings. However, along with the food always occurring immediately after the bell, food also randomly occurs more temporally distant after the bell rings or before the bell. So, over the long run, the probability of the food occurring after the bell or at some other time is the same(4). In this arrangement, in which there is no reinforcement contingency, the occurrence of the bell cannot predict the occurrence of the food. In other words, the bell does not provide information about the coming of the food. In the second experiment the food is also made available but now always immediately and only after the ringing of the bell so the occurrence of food is contingent upon, and can be correlated with, the occurrence of the bell(5).

In the first experiment the dog will probably pay attention to the bell the first few times it rings, but will soon learn to ignore it because, over the long run, when there is no contingency, the dog learns that the occurrence of the bell provides no information about the occurrence of the food. Or to put it more informally, the dog attributes the bell-food pairings to chance. After exposure to uncorrelated presentations of the bell and food, the dog will pay no more attention to the bell than to any other uninformative feature of the apparatus, which would later interfere with the formation of an association between that stimulus and reinforcement (see e.g. Baker and Mackintosh 1977). The opposite will be the case in the second experiment; the dog will probably pay attention to the bell the first time it rings and then continue to pay more and more attention to it instead of less and less because the food only and always immediately follows the bell. (As we will see, during conditioning the to be conditioned event must also only and always precede the reinforcer.) That is, the presentation of the food does not occur at other times in the absence of the bell or before the bell.

The critical observation is that in both experiments the food always occurs immediately after the bell. That is, both experiments share the same contiguity, but differ in the amount of information the bell provides about the probable occurrence of the food. In the first experiment, the food is equally likely to occur after the bell rings or at some other time, so the bell provides no causal information about the food; in the second experiment, the food or reinforcer only and always occurs immediately after the ringing of the bell, so the bell is a very good source of information predicting the occurrence of the food(6). What distinguishes the two experiments is the reinforcement contingency. In the first experiment there is no contingency or causal relationship because the food occurs, not only after the bell, but also, at other times without the bell or before the bell (effects do not occur without a cause or before their cause); whereas, in the second experiment there is a reinforcement contingency due to the arrangement of a causal relationship. That is, the dog only and always gets food or reinforcement immediately after the bell.
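The contrast between the two experiments can be sketched with a toy tally. The example below is an assumption-laden illustration, not a reconstruction of Rescorla's procedure: it divides each session into hypothetical intervals, records for each interval whether the bell rang and whether food was delivered, and then reports the number of bell-food pairings (contiguity) alongside the difference between the probability of food with and without the bell (contingency). The function name, the interval scheme, and the session lengths are all invented.

```python
def summarize(intervals):
    """Summarize a session given one (bell, food) pair of booleans per interval.

    Returns the number of bell-food pairings (contiguity) and
    P(food | bell) - P(food | no bell) (contingency).
    Assumes the session contains intervals both with and without the bell.
    """
    pairings = sum(1 for bell, food in intervals if bell and food)
    bell_total = sum(1 for bell, _ in intervals if bell)
    no_bell_total = len(intervals) - bell_total
    food_without_bell = sum(1 for bell, food in intervals if food and not bell)
    p_food_bell = pairings / bell_total
    p_food_no_bell = food_without_bell / no_bell_total
    return pairings, p_food_bell - p_food_no_bell


# Hypothetical sessions of 20 intervals each; the bell rings in 10 of them.
# Experiment 1 (random control): food follows every bell, and food is
# just as likely in the intervals without a bell.
experiment_1 = [(True, True)] * 10 + [(False, True)] * 10
# Experiment 2: food follows every bell and is never delivered otherwise.
experiment_2 = [(True, True)] * 10 + [(False, False)] * 10

pairings_1, contingency_1 = summarize(experiment_1)
pairings_2, contingency_2 = summarize(experiment_2)

print(pairings_1, contingency_1)  # same pairings as experiment 2, zero contingency
print(pairings_2, contingency_2)  # same pairings as experiment 1, full contingency
```

In this sketch the two sessions are indistinguishable if only pairings are counted; they differ only once food deliveries in the absence of the bell are taken into account.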

As we can see, the same laws that govern causal relationships apply to reinforcement contingencies. In a causal relationship an effect never occurs without a cause and the effect never occurs before the cause. Similarly, for a reinforcement contingency to be most effective the reinforcer (effect) should not occur without the to be conditioned stimulus or response (cause) and should not occur before the to be conditioned event. Here, regularity and temporal contiguity between events are not everything. If regularity and temporal contiguity were the only things arranged during conditioning, the subject would be unable to resolve whether the pairing of events reflected the presence of a causal relationship or a chance happening. Both regularity and temporal contiguity between events in reinforcement contingencies are important because they supply information about a possible causal relationship between events, but when the possible cause and effect each occur in isolation from the other half of the time (i.e. when there is no reinforcement contingency), the evidence that they are related is canceled out. From a real world perspective, this makes adaptive sense. Events that are not reliably or causally related to an outcome or reinforcer often occur together with it. The world is full of chance conjunctions of events. If animals attributed the occurrence of a valued outcome to every event that happened to precede it, regardless of its regularity or the temporal interval between the stimulus or response and the outcome, then they would not be able to make sense of the world. For them there would be no causal structure and they would not be likely to survive. It’s more adaptive for associations to be formed selectively, in favor of better predictors of valued outcomes at the expense of worse predictors (Mackintosh 1975, 1977).

Response reinforcement contingency

When the reinforcement contingency is between a response and reinforcement, the subject is reinforced if and only if the subject performs a particular response. If the subject fails to respond correctly, reinforcement is omitted. It is this contingent arrangement for reinforcement that facilitates the detection of causal relationships between events. From this type of contingency animals can learn that the occurrence of reward is not random, that there is a relationship between the correct response and reward and between the absence of that response and no reward. They learn, in part, about the relationship between their actions and outcomes, that their actions are instrumental in causing the outcomes. To the extent that reinforcement is more probable when the correct response is performed than when it is not, subjects can learn that they have causal control over the outcome(7).
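The graded case described in the last sentence above can be illustrated with a toy simulation. The sketch below rests on stated assumptions and is not drawn from any cited experiment: a hypothetical schedule delivers reward with one probability after the correct response and with another probability otherwise, a simulated subject responds on about half of the trials, and the estimated difference between the two reward rates stands in for the degree of control the subject could in principle detect. The function name, trial count, seed, and probabilities are invented.

```python
import random


def estimated_control(p_reward_if_response, p_reward_if_no_response,
                      trials=10000, seed=0):
    """Estimate P(reward | response) - P(reward | no response) by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rewarded = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for _ in range(trials):
        responded = rng.random() < 0.5  # subject responds on about half the trials
        p = p_reward_if_response if responded else p_reward_if_no_response
        total[responded] += 1
        if rng.random() < p:
            rewarded[responded] += 1
    return rewarded[True] / total[True] - rewarded[False] / total[False]


# Strict contingency: reward if and only if the response is made.
strict = estimated_control(1.0, 0.0)
# Partial contingency: reward is more probable after the response than without it.
partial = estimated_control(0.75, 0.25)
# No contingency: reward is equally probable whether or not the subject responds.
absent = estimated_control(0.5, 0.5)

print(strict, partial, absent)
```

Under these assumptions the strict schedule yields the largest difference, the partial schedule an intermediate one, and the schedule with no contingency a difference close to zero.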

No response reinforcement contingency

An instrumental analogue to learning to ignore uninformative or irrelevant stimuli, termed ‘learned irrelevance’, is a phenomenon Seligman and Maier (1967) termed ‘learned helplessness’. Seligman and Maier found that when dogs were first exposed to a series of inescapable shocks they would later have tremendous difficulty learning to simply jump over a barrier in order to escape or avoid shock. One of their observations was that the dogs' prior experiences, in which there was absolutely no correlation (contingency) between what the dogs did to escape or avoid shock and the occurrence of shock, interfered with the subsequent detection of a contingency or correlation between their behavior and its consequences. The phenomenon of learned helplessness has been confirmed in a number of other species, including humans (e.g. Hiroto 1974), tested in various experimental situations, establishing that the effect depends on the inescapability of the aversive events to which subjects are initially exposed (Maier and Seligman, 1976). For instance, initial exposure to shock from which animals can escape has little or no detrimental effect on subsequent escape or avoidance performance (see e.g. Maier, 1970; Volpicelli et al., 1983).

In another interesting experiment Goodkin (1976) found that rats that had initially received free food, without having to earn it (i.e. with no contingency or correlation between responding and food), were subsequently almost as slow to learn to respond in order to escape or avoid shock as animals that had initially received inescapable shock. This suggests that, among other things, learned helplessness is most probably part of a more general phenomenon in which initial exposure to a zero correlation between responding and reinforcement interferes with the subsequent detection of a correlation or contingency between responding and reinforcement. Initial exposure to an appetitive or aversive reinforcer whose onset and termination are uncorrelated with any action of the subject’s will later interfere with the detection of a correlation between the subject’s behavior and that reinforcer. Similarly, a zero correlation between a stimulus and a reinforcer will interfere with the detection of a correlation between the two events when a contingency between them is subsequently introduced. That is, animals are capable of learning, not only about the events that predict or cause the occurrence of important events, but also about stimuli or responses that are uncorrelated with reinforcement (see e.g. Baker 1976).

As we can see, conditioning procedures that maintain reinforcement contingencies facilitate the detection of events that predict or cause the occurrence of additional events of value (the reinforcer), which in turn makes those predictive or causal events valuable because animals can use that knowledge to guide their actions. Additionally, when there is no reinforcement contingency animals can learn about events that occur independently of one another, which makes those events uninformative and unvalued. They learn to ignore events that supply no important information. As a consequence, subsequent conditioning to those uninformative events can be retarded. This suggests that conditioning occurs selectively. Events that are better predictors or probable causes of a significant outcome are attended to, whereas events that are uninformative are ignored. As we will see, animals also learn to ignore less informative and redundant environmental events, which together help to enhance selective attention and discrimination.

Notes:

(1)  The word preparatory is used here to describe the function of the conditioned response. It is not meant to be synonymous with terms such as "preparatory-response" used in other animal learning contexts that are non-Pavlovian.

(2)a.  When a reinforcement contingency is in operation, reinforcement depends upon the occurrence of a to be conditioned stimulus or response. In if-then terminology, only if event X (the stimulus or response) occurs will event Y (the reinforcer) occur. In the simplest case, a contingent stimulus event is one in which the occurrence of a particular stimulus is the prerequisite for the occurrence of the conditional event, the reinforcer. That is, reinforcement is contingent upon the occurrence of a particular stimulus and in its absence reinforcement is omitted. A contingent response event is one in which a particular response is the prerequisite for the reinforcer. That is, reinforcement is contingent upon the occurrence of a particular response and if the subject fails to respond correctly reinforcement is omitted. If the occurrence of event Y (reinforcement) is dependent upon the occurrence of event X (a stimulus or response) then a reinforcement contingency is said to exist between the events.

(2)b.  A stimulus is the physical energy in the environment that impinges on an animal's sensory apparatus, such as, sights, sounds, smells, tastes, and feelings. Environmental stimuli are potential sources of information that animals learn to attend to when they predict an outcome of importance to them and ignore when they are irrelevant, redundant, or signal no change in the probability of an outcome.

(2)c.  The events termed reinforcer or reinforcement are the events, usually of motivational significance, that are consequences or outcomes of either appetitive or aversive value. An outcome, either appetitive or aversive, presented conditional upon the occurrence of a particular neutral stimulus or upon the performance of a particular response will usually cause the animal's behavior to change in one way or another, that is, either increase or decrease. For example, a conditional stimulus for food will usually cause an animal to increase behavior; whereas, a behavior that is followed with a shock will usually cause an animal to stop or decrease that behavior.

Additionally, the omission of an otherwise expected outcome (reinforcer) may change behavior by either increasing or decreasing it. For example, if a particular stimulus is a signal for food, and increases behavior, another stimulus may signal the omission of food, and decrease that behavior. Or, taking it a step further, a stimulus may be a signal for food and increase that behavior, whereas another stimulus may be a signal for shock and decrease that behavior, but a third stimulus that signals the omission of shock may cause that behavior to increase.

The change in behavior is only an index that the subject has learned something about the conditional relationship.

An appetitive reinforcer, such as food, and an aversive reinforcer, such as shock, are reinforcers for both classical and instrumental conditioning.

(3)  The terms "reinforcement" and "reinforcer" are used here with relevance to both classical and instrumental conditioning. In classical or Pavlovian terminology, the reinforcer is called the unconditional stimulus or US rather than reinforcer.

(4)  Strictly speaking, over the long run there can be no reinforcement contingency between events unless the reinforcer occurs more often more closely after the event to be conditioned than at other times more temporally distant. If the reinforcer more often occurs at longer time intervals after the event to be conditioned, relative to other times when there is closer temporal contiguity between the occurrence of the two events, there will be no contingency. Relative temporal proximity is a more important factor than contiguity in an absolute sense (see e.g. Rescorla 1967). In general, conditioning will progress more rapidly if the onset of the to be conditioned stimulus or response precedes the reinforcer by five seconds or less. Typically, a delay between the termination of one and the onset of the other interferes with conditioning. However, the critical factor is not the temporal contiguity between events but rather the relative temporal proximity. If the probability of the outcome immediately following the to be conditioned event is the same as the probability of the outcome at later times (or in the absence of the to be conditioned event, or if the reinforcer occurs before the to be conditioned event) then there can be no objective cause and effect relationship. In order for the to be conditioned event to be a good probable cause of the outcome or effect (reinforcer), the supposed effect should not occur at other times more temporally distant after the to be conditioned event, in the absence of the event or probable cause, or before the event or probable cause.

(5)  Since cause and effect relationships are one directional, always going from the cause to the effect, and since the function of the reinforcement contingency is to facilitate the detection of a causal relationship then the relationship that is arranged in the contingency should also always go from the cause to the effect or the to be conditioned stimulus or response to the reinforcer. And since in cause and effect relationships an effect never occurs without a cause, in a reinforcement contingency a reinforcer should never occur without the to be conditioned stimulus or response during conditioning (see Dickinson 1980; Siegel and Domjan 1971).

(6)  Although the pairing or contiguity of two events remains a primary concept to many psychologists and animal trainers alike, the contemporary view of conditioning emphasizes the information that one event provides about another and sees contiguity alone as insufficient (and sometimes unnecessary, as when there is no other probable predictor or cause that is more closely conjoined in space and time between the to be conditioned event and the reinforcer) for producing conditioning (see e.g. Rescorla 1972).

(7)  These accounts may be unfamiliar to some readers. The history of the field of animal conditioning or learning is marked by divergent and well identified schools. The introductory books have primarily been written from the behaviorist's perspective, leaving the more contemporary cognitive achievements largely undocumented at the introductory level. Current thinking about conditioning is very different from the views held during the reign of behaviorism, which repudiated any form of mentalism and failed to sufficiently delineate the determining factors that produce learning, the essence of that learning, or the ways in which that learning influences behavior. Today most learning theorists agree that animals acquire knowledge about the relations among events in their environment and that the observable behavioral changes are a manifestation of that knowledge. Learning, or the acquisition of knowledge from exposure to a broad range of environmental relations, is believed to be one of the main ways animals represent the structure of the world. Faced with the choice between trying to cover everything and confusing the reader, or submitting only a biased cognitive account and being clear, I have chosen the latter. These accounts, and others to follow, better help us to understand how animals gain knowledge about their world and use that knowledge to guide their actions in both the psychological laboratory and their natural environments.

For a good review in which many of the developments are delineated read Mackintosh (1983). In this important book Mackintosh examines the pertinent theories and extricates the explanations that are better supported by a broader range of experimental evidence.

References:

Baker, A. G. (1976). Learned irrelevance and learned helplessness: rats learn that stimuli, reinforcers and responses are uncorrelated. Journal of Experimental Psychology: Animal Behavior Processes, 2, 130-141.

Baker, A. G. and Mackintosh, N. J. (1977). Excitatory and inhibitory conditioning following uncorrelated presentations of CS and UCS. Animal Learning and Behavior, 5, 315-319.

Darwin, C. (1859). On the origin of species. London: J. Murray.

Dickinson, A. (1980). Contemporary animal learning theory. Cambridge University Press.

Goodkin, F. (1976). Rats learn the relationship between responding and environmental events: an expansion of the learned helplessness hypothesis. Learning and Motivation, 7, 382-393.

Hiroto, D. S. (1974). Locus of control and learned helplessness. Journal of Experimental Psychology, 102, 187-193.

Hollis, K. L. (1982). Pavlovian conditioning of signal-centered action patterns and autonomic behavior: A biological analysis of function. Advances in the Study of Behavior, 12, 1-64.

Mackintosh, N. J. (1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276-298.

Mackintosh, N. J. (1977). Conditioning as the perception of causal relations. In R. E. Butts and J. Hintikka (Eds.), Foundational problems in the special sciences. Dordrecht, Netherlands: Reidel, 241-250.

Mackintosh, N. J. (1983). Conditioning and associative learning. Clarendon Press, Oxford.

MacPhail, E. M. (1982). Brain and intelligence in vertebrates. Oxford: Clarendon Press.

Maier, S. F. (1970). Failure to escape traumatic electric shock: incompatible skeletal-motor responses or learned helplessness. Learning and Motivation, 1, 157-169.

Maier, S. F. and Seligman, M. E. P. (1976). Learned Helplessness: theory and evidence. Journal of Experimental Psychology: General, 105, 3-46.

Rescorla, R. A. (1967). Pavlovian conditioning and its proper control procedures. Psychological Review, 74, 71-80.

Rescorla, R. A. (1968). Probability of shock in the presence and absence of CS in fear conditioning. Journal of Comparative and Physiological Psychology, 65, 55-60.

Rescorla, R. A. (1972). Informational variables in Pavlovian conditioning. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 6). New York: Academic Press.

Seligman, M. E. P. and Maier, S. F. (1967). Failure to escape traumatic shock. Journal of Experimental Psychology, 74, 1-9.

Siegel, S. and Domjan, M. (1971). Backward conditioning as an inhibitory procedure. Learning and Motivation, 2, 1-11.

Testa, T. J. (1974). Causal relationships and the acquisition of avoidance responses. Psychological Review, 81, 491-505.

Volpicelli, J. R., Ulm, R. R., Altenor, A., and Seligman, M. E. P. (1983). Learned mastery in the rat. Learning and Motivation, 14, 204-222.