Friday, September 26, 2014

Solving a subset of the Infinite Monkey Theorem in Python

Way back in the early part of the 20th century, a French mathematician by the name of Emile Borel had the zany idea that if you gently set a monkey in front of a typewriter and gave him an infinite amount of time (and paper, and ink, and probably banana incentives), that he would almost certainly type out any string of text (an example often cited is the complete works of Shakespeare, although it's more likely that Borel needed someone to type up a conference paper, and didn't have a PhD student on hand). There are variants of this theorem that involve an infinite number of typists, and so forth. Let's start with a simple example to check the reality of this theorem. Say we have a keyboard with only 26 keys - one for each letter of the English alphabet. The monkey sets to work trying to type out the string "Borel", note that we don't distinguish between a capitalized letter and a non-capitalized letter. The monkey has a 1/26 chance of typing any one letter, and assuming each keystroke is statistically independent, he then has a (1/26) * (1/26) * (1/26) * (1/26) * (1/26) = 8.4165e-8. Not exactly good odds, but not exactly zero - therefore statistically possibly.

Now, as a simple, fun exercise. Let's replace the monkey with a snake and write a Python script that tries to generate the Shakespeare quote "Live, Love, and let be". To make things a bit easier, we will ignore the uppercase and the commas for now and only search for the string "live love and let be".

Here is the code, in all of its simian glory:

import random

#live love and let be len==20
def generator(strlen):
sigma='abcdefghijklmnopqrstuvwxyz '
result = ''
for i in range(strlen):
result = result + sigma[random.randrange(27)]
return result

def score_function(target_str,gen_str):
score = 0.0
for i in range(len(target_str)):
if target_str[i] == gen_str[i]:
score = score + 1
return score / len(target_str)

def main():
target = 'live love and let be'
gen = generator(20)
best_score = 0
curr_score = score_function(target, gen)
while curr_score < 1:
if curr_score > best_score:
print gen,curr_score
best_score = curr_score
gen = generator(20)
curr_score = score_function(target, gen)

main()

I let this run for a few minutes on an Intel i3 with 2GB ram, and the highest match I saw was a 0.4.
Let's go ahead and break down the algorithm at a higher level, and then delve into it line by line. We basically want to generate a random string of characters of some length using the generator function. We then want to see how well that generated string of characters compares to our target Shakespeare quote by using the score function. Score returns a value between 0 and 1. 0 meaning none of the characters matched, and 1 meaning it was a direct hit. We then use our main routine to generate and score strings until we find a perfect match, all the while providing updates on what the longest match so far was.