Thursday 20 March 2014

Negative Reinforcement - The good, the bad, and the ugly

Disclaimer: I am mostly a behavioural scientist. I know about animal behaviour from the outside. I know about measuring emotional states, assessing welfare, and looking for behavioural indicators of stress. Some of the topics in these blog posts on negative reinforcement are not my strongest areas. I can only offer my interpretation of the literature. I am certainly open to discussing alternative interpretations.

Negative reinforcement (R-, NR) is an operant conditioning quadrant. Quadrants basically predict how stimuli will affect the frequency, duration, and/or intensity of future behaviour depending on whether they are rewarding or punishing and whether they are added or taken away. In the case of negative reinforcement, future behaviour increases in frequency, duration and/or intensity when something aversive is taken away. In other words, the animal will learn to perform a behaviour in order to gain relief from something they find aversive.

Some well-known trainers have created what's called a humane hierarchy, a guide that ranks training methods by how humane an intervention they represent. Negative reinforcement sits conspicuously far up these humane hierarchies (that is, towards the more intrusive end), prompting many trainers to stringently avoid it. This seems like a tenuous reason to condemn an entire learning quadrant. It makes a few very broad assumptions, such as that even the mildest aversive experiences, if strong enough for an animal to want to avoid them, have no place in training. Is having someone brush past you, making you feel too close to them so that you step away, really the same beast as having someone scream in your face until you move? I use extreme examples to show the breadth of the quadrant we're talking about. Some forms of negative reinforcement are extremely mild and some are treading on the toes of downright punishment. And is avoiding an aversive experience always less humane than, say, reinforcing a different behaviour? Has anyone ever asked you to do something still, like lie down, when you'd rather be running away? But you can have a chocolate if you lie down. Thanks, but I think I'll run.

So what's the story? Can negative reinforcement be a humane way to train an animal? How do we assess that? Let's have a good look at the related scientific literature and see if we can reason out an answer.


Susan Friedman's "humane hierarchy"

It would seem like the place to start is examining what it feels like for an animal to be negatively reinforced. Perhaps this is the most important question and yet the hardest to answer. I can present what we know of emotions and welfare in animals. I am hesitant to delve into neurotransmitters, neuroanatomy, and stress physiology because it is not my area of expertise. My understanding of the literature may be simplistic. Yet, there are claims being made that negative reinforcement should be avoided on that basis, so let's have a quick look at it. 

Neuroscience

Neurotransmitters are chemicals that carry signals throughout the brain. The type of neurotransmitter is important, as is where it is going, but do not for a second think that this is remotely straightforward. The brain is crazy complicated, and very adaptable. Our understanding of it is not complete.

Some neurotransmitters are associated with 'good' emotional experiences. One example is dopamine, which is heavily implicated in reward and the anticipation of good things happening. That is a good feeling! But the lack of a certain type of dopamine receptor impairs learning of both positive and negative associations with a physical place, as well as learning of an active avoidance task. That suggests dopamine is implicated in unpleasant feelings as well... All right, so serotonin is also involved in negative reinforcement. According to recent research, serotonin may have a role to play in how both rewarding (appetitive) and punishing (aversive) stimuli are processed. What does that mean? It means we don't really know exactly what serotonin does and we are probably going to need to study specific serotonin receptors to better understand it. Then there is noradrenaline (or norepinephrine, to use the American name), which is implicated in negative reinforcement learning and also features strongly in stress responses. Its role in stress responses is largely an arousing one, which, as it happens, appears to enhance memory and learning in discrimination tasks.

To understand what this all means to an animal, it would be important to know where the neurotransmitters were going and what neural systems were involved. I would go into this except it would probably take me years to understand it enough to be able to condense it into a blog post. Suffice to say, nothing is straightforward in the brain. Even the amygdala, which everyone 'knows' is all about fear, fight, flight and freezing, also plays a critical role in positive emotional states.


Stress

The concern about neurotransmitters and neuroanatomy involved in negative reinforcement may lie in its association with stress and negative emotional states. So let's try there for some clearer answers. Is negative reinforcement stressful to animals? First, let's define what we mean by 'stressful', here. The body's stress response is very adaptive and can handle everything from minor stressors such as being hungry to major "I'm going to die" moments. And it covers positive experiences as well. A dog that is chasing a ball is very aroused and will be experiencing elevated stress hormones. The strength of a stress response is typically proportional to the intensity of the emotion associated with it. A strong stress response associated with a negative event basically means a lot of fear or anger, but a weak stress response means being a little perturbed. The strength of stress responses can be measured in a variety of ways, but most commonly through concentrations of stress-related hormones such as cortisol in blood, urine, or saliva. It is not quite an exact science. Check out this excellent blog post for a really nice summary of the issues.

At any rate, lots of normal, everyday things raise cortisol concentrations, and cortisol (like other glucocorticoids and stress-related hormones) is not "bad" per se; we need it! This system is fabulous at what it does, which is to keep us engaged and motivated when we need to be, and to keep us safe and help us recognise opportunities and threats and take appropriate action. We can't learn without stress, and we don't remember things that weren't very stressful all that well. But stress can be very unpleasant, and prolonged or frequent stress responses are dangerous and can cause serious disease and illness. Most people in modern society have experienced chronic stress in some form. It is not fun. To delve into this fascinating topic more, I cannot recommend Professor Robert Sapolsky enough. He has many videos available free on YouTube and his book "Why Zebras Don't Get Ulcers" is entertaining and very informative and still one of my favourites.

Stress is also not always as simple as isolated events. There are unique stress responses for all kinds of stressors, which speaks to just how finely tuned stress responses are. There is a large body of literature on uncontrollable versus controllable stressors and how exposure to various kinds of stressors moulds future stress responses. All of these changes are 'good' in that they are adaptive and help animals best handle the cards they have been dealt. But some hands are terrible and the best you can do is try to minimise how much you lose out. Some hands offer the beginnings of a better hand down the track. An example of where stress adaptation may be the beginnings of something better is what is usually called stress inoculation. Animals that have learned to control stressors are more resilient in the face of future stressors. They try for longer to make things better for themselves when they are stressed, which means they are more likely to succeed and less likely to sink into despondency. They also show more curiosity, better emotional processing, and better cognitive control. All of these things are good for an individual animal. They help an animal respond appropriately to stress and also handle mild challenges in its life better, such as social situations and impulse control.

So if stress is an important part of everyday life and plays critical roles not just in keeping us safe, but also motivating us, helping us remember and learn, and in some cases has a positive effect on future experiences and responses to stress, then how do we assess the role of stress in negative reinforcement and whether it is good stress, unpleasant stress, or beneficial stress?

At a basic level, looking at what animals find negatively reinforcing can tell us what they wish to avoid, and therefore, what they find unpleasant. Indeed, this has been proposed as an indicator of welfare by a leading animal welfare scientist. And we do find indications that training animals with negative reinforcement results in less approach behaviour towards people, and stronger emotional responses towards people, and these animals are also less engaged with their handlers than animals trained with positive reinforcement. And this brings us to an enormous body of research in avoidance learning, which is where things get interesting.

Active avoidance and stress inoculation

Avoidance is something we all do, keeping ourselves safe, comfortable, and healthy. Lots of research has been done on avoidance learning in animals, but I'm going to focus on active avoidance because it is most like what trainers do when they use negative reinforcement to train animals. Active avoidance is where an animal performs a behaviour on cue to avoid an aversive event. An example we might see often with animals is dashing under furniture to hide when a child comes into the room. The presence of the child cues the avoidance behaviour. Another variation is escape behaviour, where the animal learns to perform a behaviour to 'switch off' an aversive experience. So the animal dashes under the furniture when the child is too rough with it. Both are sensible strategies. The animal gains immediate refuge. But we wouldn't think this was necessarily an ideal sequence of events. Active avoidance is associated with a large elevation in cortisol concentration during the learning phase. It is distressing for animals when they are exposed to something unpleasant. Their distress is generally proportional to how strong the unpleasant experience is (brushing past vs yelling in face). And if their response (run and hide) is one that requires a fair bit of energy, their arousal will be elevated as soon as they see what they will need to run from. Heightened arousal means more intense feelings. Furthermore, if their response isn't always reliable in gaining them refuge, or they sometimes miss the cue, or the cue comes often and randomly, that is a layer of uncertainty that will make them very vigilant and very focused on potential threats to the point where they are likely to see them where they don't exist. All in all, not good.

But what if they are exposed to something unpleasant and have a reliable way to handle it? Is it just as distressing? A significant drop in cortisol concentrations occurs once an avoidance behaviour has stabilised. It does not drop to baseline, because it is arousing to perform an avoidance behaviour no matter how nonchalantly. But the drop in cortisol correlates with an increase in behaviours associated with a relaxed state, suggesting that fear has diminished quite a lot. Commonly, there is no outward appearance of fear or distress, and this nonchalance has been noted by several researchers. Animals are not frantically performing an avoidance behaviour to stave off the bad things. In fact, they tend to wait until the last possible moment to perform it and otherwise go about their usual business. Humans also report reduced fear and an increased sense of control when they use active avoidance during phobia treatment. Furthermore, studies suggest that animals that are taught active behaviours to successfully avoid aversive experiences learn to suppress their natural freezing responses, which in turn enables them to control the stressor through operant means. Freezing up helplessly is considerably worse from a stress perspective than actively coping because it is probably experienced more intensely. Active coping also leads to stress inoculation, which we have already discussed. 

So it seems like successful avoidance is not necessarily associated with significant fear or distress. How could that be? The animals still don't like the aversive they are trying to avoid, right? If they are trying to avoid something it must be unpleasant. That's what Dawkins' suggestion for negative reinforcers as indicators of poor welfare was all about. It turns out there may be some pretty cool and strange things going on...


Safety signals

Bear with me while I give a brief but necessary background, here. A safety signal tells an animal that they are safe in the immediate future. It is trained by giving the animal an aversive experience and then pairing the ABSENCE of that aversive experience with a particular signal. So the signal comes to mean "You are safe for now". Safety signals appear to be able to inhibit fear surrounding an uncontrollable aversive experience AND inhibit the anxiety expressed after that event. This is pretty amazing stuff when you think about it. Safety signals themselves are not entirely negative reinforcement because no response is necessary so no particular behaviour is being reinforced, although an aversive experience is required in order to train one. But, an operant behaviour can take on a similar role, and these are learned through negative reinforcement.


Safety behaviours

New research shows that it is inherently rewarding to avoid an expected aversive event - in other words, avoidance itself can be a form of positive reinforcement. Wha...? How does THAT work?? It has been suggested that there comes a point where an animal is not so much avoiding the aversive stimulus as approaching safety. Let's call these safety behaviours. It's not necessarily an official name. This is a pretty poorly understood area of science and I am not aware of anyone who has tested whether these behaviours have the same properties as a safety signal. They can be escape behaviours or avoidance behaviours in that they may switch off something unpleasant or avoid it completely, and that's how they become safety behaviours. It seems like a slippery distinction that could be used to justify some quite terrible things that are done to animals using negative reinforcement in the name of training, so let's be very clear about what this means. Safety has to be real to be sought. This means several things for the development of safety behaviours:

1) It needs to be very clear to the animal that safety has now been attained - ideally through a specific signal (a safety signal; possibly a cue or marker can take on this role).
2) That safety has to be real and meaningful to the animal, not simply declared by a trainer or handler.
3) Learning a behaviour to attain safety is actually quite hard in many circumstances, because animals already have natural behaviours they will tend to use when they feel threatened. If they are being taught a behaviour that runs counter to their goals (i.e. get distance from the scary thing), they may never perform it reliably or may never learn it at all.
4) The effectiveness of safety signals is inversely related to the strength of the threatening stimulus - in other words, the ability of safety signals to inhibit fear is influenced by arousal. The more threatening something is, the more aroused an animal becomes, and the more aroused they are, the less effective a safety signal will be in inhibiting fear. In short, if someone routinely threatens to clobber you with a baseball bat, it won't make you feel very safe to know that they never will as long as you run to the other side of the room whenever they make the threat. But if someone routinely threatens to swat you with a newspaper, knowing they won't as long as you run to the other side of the room probably will make you feel pretty nonchalant about the threat.
5) Animals that are regularly becoming afraid are most likely going to become pessimistic, even if they are practised at avoiding aversive experiences and seeking safety.

There is no straight line from "animals like safety" to "I should therefore create scenarios in training where I can reward them with safety." They like safety, but they like tangible rewards more, and if we want happy, optimistic animals, we should be very much focused on giving them as many opportunities to access tangible rewards as we can. We should also be very aware that their sense of safety comes first, to the point where if they feel unsafe they will be primarily motivated to seek safety. Food, play, social contact... all of these things come second to seeking safety. We should therefore make their safety our first priority. Compromising their sense of safety ourselves is not clever and runs counter to our goals if we want happy animals first and foremost. Training safety signals and safety behaviours can and should be done opportunistically. If you are able to keep your dog safe and protected from aversive experiences at all times, you do not need safety signals, although it may mean your animal is both more sensitive to aversive experiences and experiences them more intensely. I do not believe it is in any animal's best interests to attempt to protect them from all aversive experiences. They have evolved a truly wondrous system to handle them, just as we have. We just have to be careful that what they experience is well within their coping abilities and does not have a lasting impact on their mood or health.

Animals look after their safety first and foremost.


Safety behaviours can also play a role in the treatment of fears and phobias by increasing the acceptability of exposure. This has, to my knowledge, only been done in humans, where safety behaviours can be quite problematic. However, in some circumstances they can offer a stepping stone to further treatment: sufferers feel less fear during exposure and tend to approach closer to the object of their fear. I have done this with a wild hare and found indications of similar results. The hare was taught to 'ask' for space by pulling away from me. If he pulled away, I respectfully did not follow or I backed up. He soon began allowing me to touch his flanks, head, and legs, which put him in a very vulnerable position: if I had wanted to grab him (something that is probably always on a hare's mind), I was in an excellent position to do so.


Emotional state

All this talk about the ambiguity in neuroscience, stress, and even approach and avoidance behaviours has left us with a bit of a quandary. If neuroscience is crazy complicated, and sometimes stress is good, and sometimes avoidance is positive reinforcement, and safety signals inhibit fear, and negative reinforcement can be very scary or barely register and everything in between, how can we tell if negative reinforcement is an ethical training approach? I believe the answer is in emotional states. Possibly because I have done a lot of work in detecting emotional states. But really, emotional states are at the centre of all this. Whatever the animal is experiencing, it should be detectable in behavioural changes, although they may be subtle and take some careful observation. Whether stress or avoidance is good or bad will directly influence emotional state, either positively or negatively, and that will affect behaviour.

An animal will tend to develop a positive mood if they experience a lot of good things, and if they experience unpleasant things, they will tend to develop a negative mood. There are precious few ways to reliably measure emotional state in animals, which is why I was able to do a PhD on it. Lacking the ability to measure neural activity, cortisol concentrations, reward sensitivity, and cognitive bias, we are left with the terrible inadequacy that is behavioural indicators. The biggest problem with behavioural indicators is that they are hard to identify and open to interpretation. There are a few behaviours in dogs that we know are associated with elevated cortisol concentrations in an environment where this is almost certainly due to emotional distress, such as increased urination, increased physical activity, and increased displacement behaviours (lip licking and paw lifts in particular). In turn, there are very few indicators of positive emotional state. Play is one, and anticipatory behaviour surrounding rewards is another. It is pretty hard to identify positive anticipatory behaviour in dogs because the work hasn't been done, but it's probably fair to say if they are looking something like the picture below, we're on the right track.

Anticipatory! Image: Eric Danley

This is not as useful as we might hope. These are behaviours generally associated with major, chronic stress. It is quite unlikely that we would see this as a result of simply using negative reinforcement in training. Even if we ONLY used negative reinforcement in training. We need something more sensitive. While we can't formally measure cognitive bias and reward loss sensitivity, we can look for it in everyday behaviour. 

1. Exploration and approach behaviour - We would expect a reduction in exploration and approach behaviour if emotional state is tipping towards negative, and an increase where the emotional state is tipping towards positive. It fits in very nicely with what we know about optimism. The horse study cited earlier showed horses trained with negative reinforcement were less explorative and approached people less, which suggests the training has a negative effect on the horses' emotional state. This will hurt our training goals if we like to shape behaviours!

2. Interest in training - If an animal becomes less willing to participate in training, which may manifest in distractibility, nervousness, skittishness, lots of displacement behaviour (sniffing the ground, staring into the distance, scratching, anything to delay having to train), disinterest, and general unwillingness to approach either the trainer or the training environment, this is BAD. It suggests the animal does not enjoy training. We want to see them engaged and readily coming to you, prick eared and leaning forward. Unless they don't have visible ears. Then just leaning forward.

3. Willingness to offer behaviours - Animals in a negative emotional state are expected to be behaviourally suppressed to some degree. They will not really want to try new behaviours and may be reluctant to offer those they know, even when cued, because it is risky to them. If we have an animal that is either offering behaviours on its own or can easily be coaxed into doing something new, they are most likely in a positive emotional state.

4. Sensitivity to reward loss - Animals that are in a negative emotional state feel keenly when they think they have missed out on a reward they were expecting. In training this may manifest in relative slowness and reluctance particularly where reward rate has decreased or when attempting to move to a variable reinforcement schedule. 


So... what's the verdict?

Click here for an analysis of the literature and my take on the ethics of using negative reinforcement in training and behaviour modification (plus a reference list). 
