Not exactly a snappy way to start my new blog, but…

I just finished reading Critical Mass by Phillip Ball and one of the topics discussed was the iterated prisoner’s dilemma. I had implemented a Spatial IPD as part of the Generation5 JDK and more recently as part of the C# port, ‘Wintermute’.

Prior to reading Critical Mass though, I saw the SIPD as an interesting cellular automata implementation of a famous game theory exercise but never fully experimented with the model’s insights. In his book, Ball detailed how IPD can be used to show how cooperation is evolved as being more beneficial than selfish behaviour. Even more fascinating was the IPD’s ability to show how more cooperative strategies are more beneficial, until you get to a completely cooperative world — at which point, a very selfish strategy can quickly take advantage of the “naivety” of the world.

I decided to expand my code to explore this further to see if I could replicate this.

Firstly, I’ll revisit where my current implementation was at. Below is a screenshot of a couple of iterations of a 150×150 grid of SIPD agents. The colour is dictated by the strategy. All-cooperate is red, all-defect is yellow, random is green, tit-for-tat is blue and pavlov is orange.

The left-most image is the starting, random generation. With the first iteration, you can see that virtually all “all-cooperate” are eliminated and replaced by “all-defect” agents. However, from the screenshots at about 25 and 50 time steps, you can see the a tit-for-tat strategy quickly takes hold.

I modified the code to introduce the notion of a ‘mistake’. In reality, people don’t always make the right decision (think about the 45-minute WMD claim that took the UK/US to war with Iraq) — sometimes a defect happens, when in reality we should have cooperated. This noise can have a devastating effect on a TFT strategy, as they get into a cycle of ‘revenge’ after a single mistake occurs between the two cells. In this situation, Pavlov’s algorithm seems to have a much stronger presence:

Interestingly, you see that the TFT strategy takes hold quickly as usual. However, after some time (about 350 iterations, compared to a total of about 50 iterations for the first 3 captures), Pavlov gains more traction.

Pavlov’s “win-stay, lose-shift” means it will cooperate where possible, but can exploit strategies like an All-Cooperate in a noisy environment as it will eventually “discover” it can exploit its opponent. This means that, over time, Pavlov can slowly gain ground against TFT. This said, from my experiments, I have yet to see Pavlov win out completely as TFT does is an environment with no accidental moves – I ran a simulation for over 10,000 iterations and after the first 800 rounds, Pavlov fluctuated between 35%-48% with TFT taking the rest.

I also introduced a second type of noise, in the form of random mutations of strategies in a tiny percentage of the population each iteration. This increased the diversity of the strategies, but created interesting buffer regions where different strategies exploited the weaknesses of their neighbours and vice-versa. A veritable, mini-ecosystem!

While this is quite interesting to watch, it still doesn’t prove that cooperation can be evolved – or that it is more beneficial than defection in the long run. Indeed, as we’ve discussed, Pavlov can be seen as very exploitative! So the next step is to try and evolve cooperation over time using a genetic algorithm.

Leave a Reply

Your email address will not be published.