Hogg's Research: Python

2012-01-23

Python

Foreman-Mackey returned from his furlough at Queens, where he was finishing a paper with Widrow on Andromeda. I quizzed him about some details of caching (very slow computations) in my Python RGB-to-CMYK code; he had good ideas. One thing he noted is that instead of doing if rgb in cache.keys(): it might be far faster to do try: cmyk = cache[rgb] and then catch the KeyError exception. Apparently that is the rage and style in Python programming. He also promised to help me Python-package and docstring everything. Looking forward to it!

[Note added a few minutes later: Switching from the keys() check to the try style sped up the cache retrieval by a factor of 40!]

2 comments:

Geert Barentsen, 23 January, 2012 18:55
Note that the Pythonic way is:
"if key in dict:"
rather than
"if key in dict.keys():"

The latter turns an O(1) operation into O(n), because in Python 2 keys() builds a list of all the keys, which is then searched linearly.

Entering an except causes overhead, so I would try the "if key in dict:" method instead!
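
As a rough sketch of the O(1) versus O(n) point: in Python 2, dict.keys() returns a list, so the membership test scans it element by element. In Python 3, keys() returns a view whose membership test is hashed, so the sketch below uses list(d) to reproduce the old behavior. The dictionary size, the key, and the repeat count are arbitrary choices for illustration, and timings will vary by machine.

```python
from timeit import timeit

d = {str(i): i for i in range(10000)}
key = "bambi"  # deliberately absent, so the list scan must visit every key


def in_dict():
    # Hash lookup: average cost does not grow with the size of d
    return key in d


def in_key_list():
    # Builds a list of all keys, then scans it linearly (like Python 2's d.keys())
    return key in list(d)


t_dict = timeit(in_dict, number=1000)
t_list = timeit(in_key_list, number=1000)
print("key in d:       %.6f s" % t_dict)
print("key in list(d): %.6f s" % t_list)
```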
Geert Barentsen, 23 January, 2012 19:05
Ahhh this is a fun experiment. On my machine, "if key in dict" is a factor ~20 faster than "try-except" when the key does NOT exist, but a factor ~1.5 slower when the key DOES exist. (Raising and catching an exception carries overhead.) Benchmark snippet:

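from timeit import Timer
# Setup string: builds a dict d with 10000 string keys, '0' through '9999'
mydict = "d={"+",".join(["'%d':%d" % (i,i) for i in range(10000)])+"}"
iterations = 20000
key = "bambi"  # not in d, so this times the missing-key case
# Membership test first, then the lookup
t=Timer("if '%s' in d:\n\td['%s']+1" % (key,key), mydict)
print(t.timeit(iterations))
# Lookup first, catching the KeyError on a miss
t=Timer("try: d['%s']+1\nexcept KeyError: pass" % key, mydict)
print(t.timeit(iterations))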

Output:
0.0021071434021
0.043732881546
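
The snippet above only times the missing-key case. A sketch that times both the hit and the miss case could look like the following; the helper names with_if and with_try and the choice of '5000' as the present key are made up for illustration. No output is shown because the numbers depend on the machine and the Python version.

```python
from timeit import timeit

d = {str(i): i for i in range(10000)}


def with_if(key):
    # Membership test first, then the lookup
    if key in d:
        return d[key] + 1
    return None


def with_try(key):
    # Lookup first, catching the KeyError on a miss
    try:
        return d[key] + 1
    except KeyError:
        return None


iterations = 20000
for label, key in [("hit", "5000"), ("miss", "bambi")]:
    for func in (with_if, with_try):
        seconds = timeit(lambda: func(key), number=iterations)
        print("%-4s %-8s %.6f s" % (label, func.__name__, seconds))
```

Wrapping the lookups in functions adds call overhead to both variants, so the ratios will not match the string-based Timer version exactly.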