You probably know, code golf is a “sport” of writing a program which solves a task with as little number of characters of source code as possible.
At the last Python meetup Brian Lauridsen presented a code golf challenge. The challenge is called grid computing and was available at codegolf.com, which at the moment of writing is down. Here is the approximate description of the challenge:
Write a program that expects a 10-by-10 matrix from standard input. The program should compute sum of each row and each column and print the highest of these numbers to standard output.
An example input:
01 34 46 31 55 21 16 88 87 87 32 40 82 40 43 96 08 82 41 86 30 16 24 18 04 54 65 96 38 48 32 00 99 90 24 75 89 41 04 01 11 80 31 83 08 93 37 96 27 64 09 81 28 41 48 23 68 55 86 72 64 61 14 55 33 39 40 18 57 59 49 34 50 81 85 12 22 54 80 76 18 45 50 26 81 95 25 14 46 75 22 52 37 50 37 40 16 71 52 17
Expected output:
615
The challenge is very tempting by its simplicity. However, according to codegolf.com, the best solution in Python was written in only 73 characters. At the meetup we got it down to 78 characters, but not less. So next weekend after the meetup I immersed into this challenge to get it down to 73. And here are the results.
Spoiler alert: this page contains the 73 character solution.
Let’s see what a naïve solution can look like in Python:
import sys # Parse matrix. matrix = [] for line in sys.stdin.read().splitlines(): row = [] for digits in line.split(' '): number = int(digits) row.append(number) matrix.append(row) # Row sums. all_sums = [] for row in matrix: all_sums.append(sum(row)) # Column sums. for i in range(10): column = [] for row in matrix: column.append(row[i]) all_sums.append(sum(column)) print(max(all_sums))
Total: 446 characters.
I hope you are displeased by this code as much as I am. Let’s gradually improve it.
File-like objects have a .readlines()
method, so we could
call it instead of .read().splitlines()
. However,
you might know that file-like objects support the iterator
protocol, so you can iterate over them directly:
for line in sys.stdin: ...
Also you might know that calling .split()
on a string
will split over any whitespace, so no need to call
.split(' ')
.
Next, using list-comprehensions instead of for loops tightens up the code considerably:
import sys matrix = [[int(digits) for digits in line.split()] for line in sys.stdin] row_sums = [sum(row) for row in matrix] column_sums = [sum([row[i] for row in matrix]) for i in range(10)] print(max(row_sums + column_sums))
Total: 256 characters.
Some people will already claim that this is too tight and unreadable, but I think it is still sensible.
Next, using map
instead of list comprehensions will
save us some characters. We can use:
map(int, line.split())
to parse each line,map(sum, matrix)
for row_sums
.However, there is some duplication between row_sums
and column_sums
. We could pull out summation,
and apply it in the last step instead:
import sys matrix = [map(int, line.split()) for line in sys.stdin] rows = [row for row in matrix] columns = [[row[i] for row in matrix] for i in range(10)] print(max(map(sum, rows + columns)))
Total: 197 characters.
Now, the above code is silly, because rows
is the same as matrix
,
and columns
is the same matrix, but transposed. We could
use a library like numpy
to transpose a matrix, but using
libraries is against the rules of code golf.
Otherwise you could just
write libraries that solve a challenge in one function call.
But you might know this “party trick,” that zip(*matrix)
transposes a matrix. If you didn’t know about this,
stop and think about it for a minute.
import sys matrix = [map(int, line.split()) for line in sys.stdin] print(max(map(sum, matrix + zip(*matrix))))
Total: 113 characters.
Now let’s go all-dirty on this and remove all unnecessary whitespace, and make every variable a single letter.
import sys m=[map(int,l.split())for l in sys.stdin] print(max(map(sum,m+zip(*m))))
Total: 83 characters.
Now, we are much closer to our goal of 73 characters.
One way to save 5 characters would be to use a built-in
function called input
, which reads a single line from
standard input. However, this is when we realize that
the codegolf.com checks the solutions using Python 2.5.
That forces us to use raw_input
instead, saving a single
character instead of 5.
On the other hand, Python 2 allows to drop parenthesis
of print
call, so we save another character:
m=[map(int,raw_input().split())for _ in range(10)] print max(map(sum,m+zip(*m)))
Total: 81 characters.
Using raw_input
forced us
to use range(10)
instead of iterating over lines.
How can we mitigate this?! Another way to iterate 10 times
would be to iterate over a collection of length 10.
We can get this collection by multiplying a list with
a single item by 10:
m=[map(int,raw_input().split())for _ in[0]*10] print max(map(sum,m+zip(*m)))
Total: 77 characters.
However, we still haven’t used one important code golf trick.
If you do a hexdump
of our program, you can see that
we have a trailing newline, which is usually added by the
editor. You might also discover that your editor is
using \r\n
instead of \n
.
$ hexdump -c grid_challenge.py | cut -c '9-'
m = [ m a p ( i n t , r a w _ i
n p u t ( ) . s p l i t ( ) ) f
o r _ i n [ 0 ] * 1 0 ] \n p
r i n t m a x ( m a p ( s u m
, m + z i p ( * m ) ) ) \n
Yep, we have a trailing \n
. Let’s write a script that
removes it from our programs, and now we are down to…
Total: 76 characters.
And here the really hard part starts. I have spent many hours trying to reduce this number. And it got really nasty.
One little known misfeature of Python 2 is that
it leaks variable bindings from comprehensions
(as well as except
clauses). This was fixed in Python 3.
Here’s some code to illustrate the issue:
$ python3 >>> [x for x in range(10)] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> x Traceback (most recent call last): File "<stdin>", line 1, inNameError: name 'x' is not defined
$ python2 >>> [x for x in range(10)] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> x 9
You see, x
– “leaked” from inside the list comprehension
and was available in the outer scope with its last assigned value.
How can we use this?! Well, right now we are not using
the list comprehension variable at all. But instead we
could put it to some use… We are calling map
twice.
It would be so handy if we could put it into a variable,
without paying the cost of an explicit assignment…
m=[_(int,raw_input().split())for _ in[map]*10] print max(_(sum,m+zip(*m)))
Total: 74 characters.
Here we created a list of 10 map
functions, which allowed
us to assign map
to the undescore (_
) variable and save 3 characters in total.
But we are still behind the
world’s best 73 character solution.
One less obscure feature of raw_input
is that it takes
an optional argument called prompt
, which gets printed
as a prompt before it reads a line from standard input.
Let’s see if we gain anything by assigning raw_input
to
the undescore (_
), instead of map
, and then…
m=[map(int,_().split())for _ in[raw_input]*10] _(max(map(sum,m+zip(*m))))
Total: 73 characters.
Yes! By using raw_input
instead of print
we saved the
last character and reached the world record for this
challenge.
I bet you have learned something new about Python today. ■