We've been recently going through a recruiting process at work and my colleague came up with this simple exercise to see how a candidate organizes his development: Write a function that returns true when it's passed a palindrome.
To me in coding there is two ways to look at problems, you can think optimization, big O notation, or you can think on developer time, and definitely this question gives you access to both.
Short on dev time
After all, how often do you need to detect a palindrome, so why spend too much time writing down this method. We appreciated the simplicity of one of the candidate which the response was:
def palindrome(x):
return x == ''.join(reversed(x))
Can't really go simpler than that. Big O notation I'd expect reversed and join to be O(2n) and the comparaison to range from O(1) to 0(n).
Less iterations
Here we test the first half of the palindrome against the second half. It was a clever way to limit the iteration, even though the candidate started to try to detect if the palindrome length was odd or even to detect the middle number. See, if the palindrome is radar the middle letter doesn't matter, as it's either shared on both sides or ignored.
def palindrome_2(x):
middle = len(x) / 2
return x[:middle] == ''.join(reversed(x[-middle:]))
On paper the join wasn't there, therefore it looked like a nice way to divide by half the number of iteration, ending with O(n/2). Sadly you cannot compare a list against a string like that in python.
Testing those methods
Out of curiosity I wanted to test those methods. I decided to use "Crime and Punishment" (downloaded from the gutenberg project), as the title sounded perfect for the case. So I'm gonna run every method against the list of words in the book.
#!/usr/bin/python
def palindrome_1(x):
return x == ''.join(reversed(x))
def palindrome_2(x):
middle = len(x) / 2
return x[:middle] == ''.join(reversed(x[-middle:]))
# book downloaded from http://www.gutenberg.org/files/2554/2554.txt
# CRIME AND PUNISHMENT By Fyodor Dostoevsky
def get_words():
words = []
with open('2554.txt', 'r') as f:
for line in f.readlines():
for word in line.split():
if len(word) > 1:
words.append(word)
return words
def run(words, method):
total = 0
for word in words:
if method(word):
total += 1
print total
I'll be running the code with timeit, but since it's a lot of words I wont go for the 1000 iterations... lets make 3 times 5 iterations
$ python -m timeit -n5 -s 'from palindrome import run, get_words, palindrome_1; words = get_words()' 'run(words, palindrome_1)'
515
515
...
515
5 loops, best of 3: 378 msec per loop
$ python -m timeit -n5 -s 'from palindrome import run, get_words, palindrome_2; words = get_words()' 'run(words, palindrome_2)'
515
515
...
515
5 loops, best of 3: 502 msec per loop
So turns out that what was supposed to be lower in iteration is longer in treatment.
Make it faster
So far no candidate thought out of the box. Think about how common is a palindrome, it's pretty rare, statistically you'll get more often something that isn't a palindrome. So, if you tend to return more often false, then your test should be optimized to return false, not optimized to return true. So instead of testing that something is a palindrome let's optimize it to detect something that isn't:
def palindrome_3(x):
middle = len(x) / 2
for i, letter in enumerate(x[:middle]):
if letter != x[-i-1]:
return False
return True
Here is the timeit
$ python -m timeit -n5 -s 'from palindrome import run, get_words, palindrome_3; words = get_words()' 'run(words, palindrome_3)'
515
515
...
515
5 loops, best of 3: 197 msec per loop
Interviews aren't over, lets see what the next candidates come up with.
Edit from my colleague
My colleague came up with this version after reading this post, no enumerate, less comparaison:
def palindrome_4(x):
i = 0
j = len(x)
m = j / 2
while(1):
j -= 1
if x[i] != x[j]:
return False
if j == m:
return True
i += 1
The results
$ python -m timeit -n5 -s 'from palindrome import run, get_words, palindrome_4; words = get_words()' 'run(words, palindrome_4)'
515
...
515
5 loops, best of 3: 121 msec per loop
Memoryview and Extended Slices
added on July 2014
I recently stumble upon memoryview
, it allows you to scan the memory representation of an object and iterate over it. This little builtin would allow comparaison of parts of the string to each others. Here's the implementation:
def palindrome_2_bis(x):
middle = len(x) / 2
m_x = memoryview(x)
return x[:middle] == m_x[-middle:].tobytes()[::-1]
[::-1]
is a trick to reverse the string. It turns out being 30% faster than the previous implementation, not bad.
But more interestingly, [::-1]
opened up the door to something even better, extended slice could allow a direct comparaison of slices of the string without the need for a join:
def palindrome_2_ter(x):
middle = len(x) / 2
return x[:-middle-1:-1] == x[:middle]
This time we are nearly as fast as our last palindrome method (157ms vs 135ms per book on my current machine)