Python snippets #2
Some code snippets taken from: Nifty python tricks.
Dict and Set comprehension
    # range works on both Python 2 and 3 (xrange is Python 2 only)
    my_dict = {i: i * i for i in range(100)}
    my_set = {i * 15 for i in range(100)}
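As a small extension of the snippet above, comprehensions also accept an `if` filter, and a set comprehension silently drops duplicates. This is only a sketch, written for Python 3; the variable names are made up for illustration.

```python
# Dict comprehension with a filter: squares of the even numbers below 10.
even_squares = {i: i * i for i in range(10) if i % 2 == 0}

# Set comprehension: duplicates collapse, so only the distinct remainders stay.
remainders = {i % 3 for i in range(10)}

print(even_squares)
print(remainders)
```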
Evaluating python expressions
    import ast
    expr = "[1, 2, 3]"
    my_list = ast.literal_eval(expr)
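It may help to see why `ast.literal_eval` is usually preferred over `eval` for this job: it only accepts Python literals and refuses anything else. A minimal sketch for Python 3, with illustrative input strings that are not from the original post:

```python
import ast

# Literals (lists, dicts, numbers, strings, tuples, ...) are parsed into objects.
config = ast.literal_eval("{'retries': 3, 'hosts': ['a', 'b']}")
print(config["hosts"])

# Anything that is not a plain literal, such as a function call, is rejected
# with a ValueError instead of being executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
else:
    rejected = False
```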
Debugging and profiling scripts
    # Debugging - set a breakpoint in your code:
    import pdb
    (...)
    pdb.set_trace()

    # Profiling (run from the shell):
    python -m cProfile script.py
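The profiler can also be driven from inside a script when only one part of the program is of interest. The sketch below assumes Python 3 and uses a made-up workload function, `slow_sum`, purely as something to measure.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately simple workload so there is something to measure.
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100000)
profiler.disable()

# Collect the report in a string buffer, sorted by cumulative time,
# and keep only the first five entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

On the command line, a similar ordering should be available with `python -m cProfile -s cumulative script.py`.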
Reverse in-place vs Create reversed copy
list.reverse() reverses the list in place. To get a reversed copy and leave the original list (or string) intact, slice with a step of -1:
    # Also works for strings
    a = [1, 2, 3, 4]
    reversed_a = a[::-1]  # avoid naming it "reversed", which would shadow the builtin
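To make the difference concrete, here is a short side-by-side sketch (Python 3, illustrative variable names) of the slice, the `reversed()` builtin and the in-place method:

```python
a = [1, 2, 3, 4]

# Slicing builds a new list; the original is untouched.
b = a[::-1]

# reversed() returns a lazy iterator; wrap it in list() to get a copy.
c = list(reversed(a))

# Strings have no .reverse() method, so slice them or join a reversed iterator.
s = "abc"
t = s[::-1]
u = "".join(reversed(s))

# In-place reversal mutates the list itself and returns None.
a.reverse()

print(a, b, c, t, u)
```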
Pretty Print
    from pprint import pprint
    pprint(my_dict)
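Two options that tend to be handy with `pprint` are the `width` argument and the `pformat` function, which returns the text instead of printing it. A small sketch for Python 3, using a made-up nested dictionary:

```python
from pprint import pformat, pprint

nested = {"name": "example", "values": list(range(5)), "child": {"a": 1, "b": [1, 2]}}

# width limits the line length, so the structure wraps onto several lines.
pprint(nested, width=30)

# pformat returns the same text as a string, e.g. to pass to a logger.
text = pformat(nested, width=30)
```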
Most common / useful python modules
According to this Reddit thread, these are the most common / useful Python modules:
- sys, os, math, shutil, tempfile, re, string, glob
- collections
- csv
- datetime
- unittest
- json / ujson
- buildout
- BeautifulSoup
- docopt / argparse / optparse
- decimal
- hashlib
- itertools, functools
- urllib, urllib2, urlparse, requests, httplib, smtplib
- socket
- io (StringIO, BytesIO)
- subprocess
- logging
- operator
- random
- pickle / cPickle
- gzip
- xml.etree.ElementTree
- numpy, scipy, matplotlib, pandas
- scikit-learn
- configparser
- struct, copy
- pprint
- timeit
- mechanize, selenium
Wrap network requests into classes
From this Reddit thread, a way to wrap calls to the requests module so that failed requests are retried:
    import logging
    from socket import error as socket_error

    import requests
    from boto.exception import BotoServerError, S3ResponseError

    logger = logging.getLogger(__name__)

    class RequestError(Exception):
        pass

    def _try_page(url, attempt_number=1):
        max_attempts = 3
        try:
            response = requests.get(url)
            response.raise_for_status()
        except (requests.exceptions.RequestException, socket_error,
                S3ResponseError, BotoServerError) as e:
            # Retry recursively until max_attempts is reached, then give up.
            if attempt_number < max_attempts:
                attempt = attempt_number + 1
                return _try_page(url, attempt_number=attempt)
            else:
                logger.error(e)
                raise RequestError('max retries exceeded when trying to get the page at %s' % url)
        return response
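The same retry idea can be written as a loop around any callable, which makes it easy to try out without a network. This is a sketch only, not the approach from the thread: `call_with_retries`, `RetryError` and `flaky` are hypothetical names introduced here, and the code targets Python 3.

```python
import time


class RetryError(Exception):
    """Raised when every attempt has failed (hypothetical name)."""


def call_with_retries(func, max_attempts=3, delay=0.0, exceptions=(Exception,)):
    """Call func() up to max_attempts times, sleeping delay seconds between tries."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions as exc:
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay)
    raise RetryError("gave up after %d attempts: %s" % (max_attempts, last_error))


# Stand-in for a network call: fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("temporary failure")
    return "ok"


print(call_with_retries(flaky))
```

With the requests module, one would presumably pass something like `lambda: requests.get(url)` as `func` and `(requests.exceptions.RequestException,)` as `exceptions`.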
Fuzzy search with regexps
From this interesting article, how to implement a "Fuzzy Search" (like the ones in Sublime Text or vim's Ctrl+P) with Python and regexps. The idea is to convert the requested string (abc) into a regexp (a.*b.*c) and then sort the results by the length of the match, breaking ties by the position where the match starts:
    >>> collection = ['django_migrations.py',
    ...               'django_admin_log.py',
    ...               'main_generator.py',
    ...               'migrations.py',
    ...               'api_user.doc',
    ...               'user_group.doc',
    ...               'accounts.txt',
    ...               ]
    >>> import re  # regex module from standard library.
    >>> def fuzzyfinder(user_input, collection):
    ...     suggestions = []
    ...     pattern = '.*'.join(user_input)  # Converts 'djm' to 'd.*j.*m'
    ...     regex = re.compile(pattern)  # Compiles a regex.
    ...     for item in collection:
    ...         match = regex.search(item)  # Checks if the current item matches the regex.
    ...         if match:
    ...             suggestions.append((len(match.group()), match.start(), item))
    ...     return [x for _, _, x in sorted(suggestions)]
    ...
    >>> print(fuzzyfinder('mig', collection))
    ['migrations.py', 'django_migrations.py', 'main_generator.py', 'django_admin_log.py']
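One possible refinement, offered as a sketch rather than as the article's final version: escape the user input so that characters such as `.` or `+` are matched literally, and use the non-greedy `.*?` so that each gap stops at the earliest possible character. The code below assumes Python 3 and reuses the sample collection from above.

```python
import re


def fuzzyfinder(user_input, collection):
    """Return items containing the characters of user_input in order, best first."""
    # Escape each character so regex metacharacters are treated literally,
    # and join with the non-greedy '.*?' so gaps are kept as short as possible.
    pattern = ".*?".join(re.escape(char) for char in user_input)
    regex = re.compile(pattern)
    suggestions = []
    for item in collection:
        match = regex.search(item)
        if match:
            # Rank by match length, then start position, then the name itself.
            suggestions.append((len(match.group()), match.start(), item))
    return [item for _, _, item in sorted(suggestions)]


collection = ["django_migrations.py", "django_admin_log.py", "main_generator.py",
              "migrations.py", "api_user.doc", "user_group.doc", "accounts.txt"]
print(fuzzyfinder("mig", collection))
print(fuzzyfinder(".py", collection))
```

With the escaping in place, a query such as `.py` only matches names that really contain a dot followed by `p` and `y`.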
Use UTF8 with pymysql
    import pymysql

    conn = pymysql.connect(host='localhost', user='username', passwd='password',
                           db='database', charset='utf8')
And when each cursor is created:
    cur = conn.cursor()
    cur.execute('SET NAMES utf8;')
    cur.execute('SET CHARACTER SET utf8;')
    cur.execute('SET character_set_connection=utf8;')