2012-01-23

Python

Foreman-Mackey returned from his furlough at Queens, where he was finishing a paper with Widrow on Andromeda. I quizzed him about some details of caching (very slow computations) in my Python RGB-to-CMYK code; he had good ideas. One thing he noted is that instead of doing if rgb in cache.keys(): it might be far faster to do try: cmyk = cache[rgb] and then catch the KeyError exception. Apparently that is all the rage in Python programming. He also promised to help me Python-package and docstring everything. Looking forward to it!

[Note added a few minutes later: Switching from the keys() check to the try style sped up the cache retrieval by a factor of 40!]
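For concreteness, here is a minimal sketch of the try style described above. The conversion function is a hypothetical stand-in (the real slow RGB-to-CMYK code is not shown in this post), and the names slow_rgb_to_cmyk and cached_rgb_to_cmyk are made up for illustration; only the cache dictionary and the KeyError handling come from the discussion.

```python
# Hypothetical stand-in for the slow RGB-to-CMYK computation; the real
# conversion is not shown in the post, so this naive formula is an assumption.
def slow_rgb_to_cmyk(rgb):
    r, g, b = [v / 255.0 for v in rgb]
    k = 1.0 - max(r, g, b)
    if k == 1.0:  # pure black: avoid dividing by zero
        return (0.0, 0.0, 0.0, 1.0)
    return tuple((1.0 - c - k) / (1.0 - k) for c in (r, g, b)) + (k,)

cache = {}

def cached_rgb_to_cmyk(rgb):
    """Look the tuple up first; compute and store only on a miss."""
    try:
        return cache[rgb]
    except KeyError:
        cmyk = slow_rgb_to_cmyk(rgb)
        cache[rgb] = cmyk
        return cmyk
```

The key has to be hashable, so rgb is assumed to be a tuple such as (255, 0, 0) rather than a list or array. Once the cache is warm, most lookups are hits, and those never enter the except branch.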

2 comments:

  1. Note that the Pythonic way is:
    "if key in dict:"
    rather than
    "if key in dict.keys():"

    The latter turns an O(1) operation into O(n).

    Entering an except causes overhead, so I would try the "if key in dict:" method instead!

  2. Ahhh this is a fun experiment. On my machine, "if key in dict" is a factor ~20 faster than "try-except" when the key does NOT exist, but a factor ~1.5 slower when the key DOES exist. (Raising and catching an exception comes with overhead.) Benchmark snippet:

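    from timeit import Timer

    # Setup code (as a string): builds a dict d with keys '0'..'9999'.
    mydict = "d={" + ",".join(["'%d':%d" % (i, i) for i in range(10000)]) + "}"
    iterations = 20000
    key = "bambi"  # not in d, so this times the missing-key case

    # Membership test first, then the lookup.
    t = Timer("if '%s' in d:\n\td['%s']+1" % (key, key), mydict)
    print(t.timeit(iterations))

    # Lookup first, catching KeyError on a miss.
    t = Timer("try: d['%s']+1\nexcept KeyError: pass" % key, mydict)
    print(t.timeit(iterations))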

    Output:
    0.0021071434021
    0.043732881546
