Prisoner's Dilemma: A Paradox in Game Theory
In the mid-1920's, the Nebraska State Police achieved what may still be their finest moment. After a 400-mile car chase over dirt roads and corn fields, they finally caught up with the notorious bank robbers Bonnie and Clyde. The two criminals were brought back to the police station in Omaha for further interrogation.
Bonnie and Clyde were questioned in separate rooms, and each was offered the same deal by the police. The deal went as follows (since both are the same, we need only describe the version presented to Bonnie):
"Bonnie, here's the offer that we are making to both you and Clyde. If you both hold out on us, and don't confess to bank robbery, then we admit that we don't have enough proof to convict you. However, we will be be able to jail you both for one year, for reckless driving and endangerment of corn. If you turn state's witness and help us convict Clyde (assuming he doesn't confess), then you will go free, and Clyde will get twenty years in prison. On the other hand, if you don't confess and Clyde does, then he will go free and you will get twenty years.''
"What happens if both Clyde and I confess?'' asked Bonnie.
"Then you both get ten years,'' said the interrogator.
Bonnie, who had been a math major at Cal Tech before turning to crime, reasoned this way: "Suppose Clyde intends to confess. Then if I don't confess, I'll get twenty years, but if I do confess, I'll only get ten years. On the other hand, suppose Clyde intends to hold out on the cops. Then if I don't confess, I'll go to jail for a year, but if I do confess, I'll go free. So no matter what Clyde intends to do, I am better off confessing than holding out. So I'd better confess.''
Naturally, Clyde employed the very same reasoning. Both criminals confessed, and both went to jail for ten years. The police, of course, were triumphant, since the criminals would have been free in a year had both remained silent. (ref: Prisoner's Dilemma: A Fable)
The prisoner's dilemma had been created. What would you do in such a situation? What strategies could be adopted?
The paradox of prisoner's dilemma has been around in nature since the beginning of life.. The first version applied to game theory was invented at Princeton's Institute of Advanced Science in the 1950's and based on the Bonnie and Clyde situation. In this basic scenario after which it is named, the police know that two prisoners have committed a small crime A (in this case reckless driving and endangerment of corn), but they want to convict them of a more serious crime B (bank robbery). The prisoners are then held in separate cells and offered the deal. This dilemma creates an interesting conflict between individual and collective interests. The Prisoner's Dilemma is a genuine paradox. Psychologists use prisoner's dilemma to find out more about the behaviors of their patients. The following is another example of this dilemma:
You are anxious to buy some diamonds from a dealer named Anna at an agreed to price, and Anna is anxious to sell. Due to unknown circumstances, the exchange must take place in secret. Each of you agrees to leave your bag in the woods at a specific place, and pick-up the other's bag at a different place. (ref: Decision Science)
Once again, both you and Anna are individually better off if you do not leave your bag regardless of what the other does. However, if neither of you leave a bag then you have gained nothing and are back where you started. Thus, prisoner's dilemma is a psychological test. Different religions and different philosophies have all developed different principles and morals for dealing with these types of situations. Unfortunately, it is difficult to apply these broad general principles to all specific situations. These decisions depend on trust and suspicion. For example, if you and Anna are life long friends, you have have a justifiable reason for trusting her. On the other hand, if you know Anna is a compulsive liar, you have a justifiable reason not to leave your bag. However, in the majority of cases the personality of your opponent is unknown to you, which makes the situation much trickier to predict. Dilemmas like this are common in business.
For example, a price war between two nearby gas stations is identical to the Prisoner's Dilemma. If one gas station lowers their price, they will increase their business at their competitors expensive. But if both slash prices then both reduce profits. (ref: Decision Science)
The prisoner's dilemma falls into the category of game theory. A prisoner's dilemma always involves two (or more) "game players," and each has a choice to either "cooperate" or "defect." If the two players cooperate (i.e. hold out from the police), they each do moderately well (i.e. 1 year in prison), and if they both defect (i.e. confess to the crime), they each do moderately poorly (i.e. 10 years in prison). If one player cooperates and the other defects, then the defector does extremely well and the cooperator does extremely poorly (also called the sucker's payoff, i.e. defector goes free while sucker gets 20 years). This can be depicted by the following equations:
DC > CC > DD > CD
CC > (DC + CD) / 2
A matrix can be created to represent the following data, and a point system can be assigned to show the losses and gains in each situation:
|B Cooperates||B Defects|
|A Cooperates||A:-1 B:-1||A:-20 B:0|
|A Defects||A:0 B:-20||A:-10 B:-10|
In prisoner's dilemma, you are always better off if you defect, regardless of what the other person does. Unfortunately, if you both defect then you both end up in prison. An individualistic approach may lead to an unfavorable outcome, but no matter what, you will never be worse off then your opponent. However, things changes when iterated prisoner's dilemma comes into play. Iterated prisoner's dilemma occurs when the same game of prisoner's dilemma is played round after round against the same opponent. This allows for each player to develop a sense of how the other will behave in order to plan ahead for their own moves. Now long term trust and suspicion are involved, and moves must be made with the future in consideration. It becomes a giant guessing game of cooperation and deceit. Do you want to be trusted? Work together for the common good? Or are you going purely for your own benefit?
In the late 1970's, political scientist Robert Axelrod held a computer tournament designed to investigate the prisoner's dilemma situation. Contestants in the tournament submitted over 70 computer programs that would compete in an iterated prisoner's dilemma game of approximately two hundred rounds. Each contestant's program played five iterated games against each of the other programs submitted, and after all games had been played the scores were tallied. (ref: Decision Science)
The following is a reproduction of this tournament on a smaller scale. It involves some of the more obvious strategies that can be used:
Sucker: always cooperates
Scorpion: always defects
Spiteful: cooperates until the opponent defects, then always defects
Less Spiteful: cooperates until the opponent defects, then defects the next two turns
Envious: defects if opponent has a higher overall score, cooperates if opponent has lower
Prober: defects if opponent cooperates for long streaks, continues to defect unless opponent reacts
Fairplay: cooperates if opponent has more total cooperates than total defects, and vice versa
Think Alike: cooperates if both players made the same move on previous round, defects if different
Pavlov: does the same move as previous round if the outcome was a win, otherwise changes to opposite move
Tit for Tat: cooperates on the first round, then mirrors the opponent's last move
Nice Tit for Tat: defects first round, then cooperates unless opponent defects two turns in a row
A Delphi computer program was created to test the 11 strategies above. Each strategy is played against the all the others, 1000 rounds per strategy, and the points are tallied. Four trials are conducted. Lastly, the strategy is played 11000 rounds against a player who chooses randomly. The following point matrix was used for the test:
|B Cooperates||B Defects|
|A Cooperates||A:3 B:3||A:0 B:5|
|A Defects||A:5 B:0||A:1 B:1|
Highest score possible: 55000 (11000 successful defects)
|Strategy||Trial 1||Trial 2||Trial 3||Trial 4||Random|
|Tit for Tat:||32728||32752||31275||32713||24567|
|Nice Tit for Tat:||30914||30917||30884||30896||20750|
Applying these strategies to the Bonnie and Clyde dilemma for 11 iterations yield the following averaged results (in prison years):
|Less Spiteful:||36||Tit for Tat:||29|
|Envious:||85||Nice Tit for Tat:||29|
Link to Delphi program
According to these results, the defensive strategies (defect only if the opponent defects first) scored overall higher than the offensive strategies (defect first). Tit for tat emerged as the overall highest scoring strategy, but its defect was that its score was always less than or equal to that of its opponent (not depicted above). Scorpion, on the hand, had a consistent low score, yet it always scored higher than or equal to its opponent. Therefore, depending on your goal (score or beating your opponent), different strategies might be adopted. Interestingly, fairplay emerged with the widest variety of scores.
Pavlov, a strategy named in 1993 by Nowak and Sigmund, was also among the top scorers, and may be used as a substitute for Tit for Tat. Against a random player, it actually achieved a higher overall score. Pavlov-like strategies are very popular amongst slot machine gamblers. Also, the more forgiving version of tit for tat, nice tit for tat, scored very high. This strategy may be optimal when the other player makes mistakes. Laboratory experiments find humans use nice tit for tat as well as sophisticated pavlov-like strategies.
Recently, many new types of prisoner's dilemma tournaments have emerged, the most interesting of which are the evolutionary variety. After the first round, the strategies with the highest point totals are allowed to replicate, and those with the lowest point totals are removed from the population. This is repeated, and after a certain number of rounds, the strategy with the highest population wins. Tit for tat prospers, but never over runs a population. It has been proven that no strategy is evolutionary stable. In other words, the constant changes in population makes it so that no one strategy can always guarantee success over all the others.
However, all of these strategies are played against a computer opponent with limited and usually fixed moves. Real life strategies "evolve" depending on what moves the opponent makes and they "adapt" to maximize their score. What if the opponent were to randomly switch between strategies? Critics wonder if these computer tournaments can be compared with real life situations. Why should anyone adopt a "nice" strategy? In the modern world humans are often engaged in one-round prisoner dilemma situations (i.e. interactions with strangers).
The contestants in Axelrod's tournament included professors of political science, mathematics, psychology, computer science, and economics. The winning program, the program with the highest average score, was submitted by Anatol Rapoport, a professor of psychology at the University of Toronto. His strategy, tit for tat, the simplest strategy submitted, won the first computer tournament. A curious property of tit for tat is that in a head to head battle against any other strategy it can never win. This is because it is never the first to defect, and it forgives the other player after only one round. What happens in the computer tournament is that when aggressive strategies meet, they get locked in a cycle of double-crossing, and destroy each other. The power of tit for tat is that it forces competitive strategies to cooperate. (ref: Decision Science)
After publishing his results and findings, Axelrod ran a second tournament. Once again, tit for tat emerged as the winner. Axelrod ranked all the strategies. What separated the high scoring entries from the low scoring entries is the property of being 'nice', that is not being the first to defect. Amongst the high scoring entries, there was a second property that separated these. The lowest scoring nice program was the least forgiving. After more analysis, Axelrod came up with the following four maxims on choosing an effective strategy:
1. Don't be the first to double cross
2. Defend yourself, but be forgiving
3. Don't be envious (if someone has more points than you, don't try to slip in an unprovoked double cross)
4. Be clear (make sure the other player understands the consequences of their actions)
Can these maxims also be applied to real life situations?
The prisoner's dilemma is a simple but powerful idea. Once you have hold of it, you see its relevance to every aspect of life and experience. The prisoner's dilemma has been used to analyze problems in nuclear warfare, anthropology, biology and evolution. Everyone should be familiar with the many lessons of Prisoner's Dilemma. It provides incredible insight into human behavior. Many ethicists use it as a starting point to delve deep into the meaning of morality (see The Ethical Spectacle).
We are all prisoners.
What can we learn from this experiment?
Maybe tit for tat is the perfect strategy to adopt in an imperfect world. But then again, maybe not.
http://www.ozmioz.com/decisionScience/prisonerDilemma.htm - Decision Science
http://people.cs.uchicago.edu/~earl/cs115/prisoner/prisoner.html - Prisoner's Dilemma: A Fable
http://www.spectacle.org/995/ - The Ethical Spectacle
http://www.lifl.fr/IPD/applet-tournament.html - Online Simulators
Evolution of Cooperation - Robert Axelrod's book of his discoveries