Find out all about Python at the Python website.
Some useful idioms:
for line in open(file):
(my, list) = mystring.split()  # or mystring.split(',')
[(func(x), func2(x)) for x in list if x > cond]
Libraries
from libname import *
from libname import func1, func2
import libname
Regexen
import re
m = re.compile(r'^16_est(\d+)').search(s) # r'foo' is noninterpolated raw string
m = re.search(r'^16_est(\d+)', s) # implicit compile
print m.group(0) # group 0 is whole match, parenthesis groups start from 1
substituted = re.sub(pattern, repl, string[, count])
list = re.split(pattern, string)
- search returns a MatchObject or None if there was no
match.
- sub returns the substituted string. repl can be a
function.
- split splits the string at every match of pattern. (The string method
mystring.split() without arguments splits on whitespace.)
Regex syntax is like in Perl.
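A minimal sketch of the three calls above, written so that it also runs on Python 3. The sample strings are made up for illustration.

```python
import re

s = "16_est42 and more"  # hypothetical sample input

# Compile once when the same pattern is reused many times.
pattern = re.compile(r'^16_est(\d+)')
m = pattern.search(s)
if m is not None:            # search returns None when nothing matches
    whole = m.group(0)       # the whole match: '16_est42'
    number = m.group(1)      # first parenthesis group: '42'

# repl may be a function; it receives the match object and returns a string.
doubled = re.sub(r'\d+', lambda match: str(int(match.group(0)) * 2), "a1 b20")
# doubled == 'a2 b40'

# re.split always needs a pattern; here: a comma with optional whitespace.
fields = re.split(r'\s*,\s*', "x , y,z")
# fields == ['x', 'y', 'z']
```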
Data structures
Sequences (tuple, list, string)
Lists, tuples and strings are all sequences and can be accessed via
slicing. Lists [] are mutable, tuples () and strings '', "" are not.
Initialize empty lists with
list = [].
In slices a[x:y] indexes start from 0:
- a[1:3] are the elements with index 1 and 2
- a[2:] all from (and including) the third.
- a[:3] all until (and including) the third
- a[-1] the last
- a[:-2] all except the last two
- a[-3:] the last three
In lists
list.remove(item) removes the first occurrence of item,
del list[index] removes the item at index.
Other built-in functions for sequences:
len(s), min(s), max(s), del s[1:3], for x in s:
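The slicing rules and the list operations above, sketched on a small made-up list:

```python
a = ['a', 'b', 'c', 'd', 'e']

middle = a[1:3]         # ['b', 'c'], the elements with index 1 and 2
from_third = a[2:]      # ['c', 'd', 'e']
first_three = a[:3]     # ['a', 'b', 'c']
last = a[-1]            # 'e'
but_last_two = a[:-2]   # ['a', 'b', 'c']
last_three = a[-3:]     # ['c', 'd', 'e']

items = list(a)         # work on a copy so that a stays untouched
items.remove('b')       # removes the first occurrence of the value 'b'
del items[0]            # removes the element at index 0, here 'a'
# items == ['c', 'd', 'e']
del items[1:3]          # del also accepts a slice
# items == ['c']

size, smallest, largest = len(a), min(a), max(a)   # 5, 'a', 'e'
```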
Cool functions on sequences, for functional programming:
- lambda, lambda functions may only contain one expression (but
are still cool for simple anonymous functions)
- map, map is like in Perl
- apply, calls a function with an argument list: apply(func, args) is
the same as func(*args)
- filter, filter is Perl's grep
- reduce, reduce applies a function to the first two items of a
list, then to the result and the third item and so on, ideal for summing
up.
- zip, allows looping over multiple lists in parallel, by
interleaving them into tuples.
Even cooler are list comprehensions like
[(x, x*2) for x in range(1, 11) if x % 2 == 0]
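A short sketch of the functional helpers next to the list comprehension. It assumes Python 2.6 or later, or Python 3, where reduce lives in functools and map, filter and zip return iterators, hence the list() calls.

```python
from functools import reduce  # also a plain built-in in Python 2

numbers = [1, 2, 3, 4, 5]

squares = list(map(lambda x: x * x, numbers))          # [1, 4, 9, 16, 25]
evens = list(filter(lambda x: x % 2 == 0, numbers))    # [2, 4]
total = reduce(lambda acc, x: acc + x, numbers)        # ((((1+2)+3)+4)+5) == 15
pairs = list(zip(numbers, squares))                    # [(1, 1), (2, 4), ...]

# The same kind of work as map plus filter, in one comprehension.
doubled_evens = [(x, x * 2) for x in range(1, 11) if x % 2 == 0]
# [(2, 4), (4, 8), (6, 12), (8, 16), (10, 20)]
```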
Dictionaries (dict)
- Initialize empty dicts: dict = {}
- Loop over keys: for x in dict:
- Remove a key-value pair with: del dict[key]
- Test for key: if 'key' in dict
- Count keys (not all items): len(mydict)
- Just the keys: dict.keys()
- Just the values: dict.values()
- (key,value) pairs: dict.items()
- Get a value, or, if there is no such value, set it:
mydict.setdefault('key', 'defaultvalue')
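A small sketch of the dictionary operations above. The word list and the groups are made-up sample data.

```python
counts = {}
for word in ['a', 'b', 'a']:
    # setdefault returns the stored value, or stores and returns the default
    counts[word] = counts.setdefault(word, 0) + 1
# counts == {'a': 2, 'b': 1}

# setdefault is handy for collecting values into lists per key.
groups = {}
for name, group in [('x', 1), ('y', 2), ('z', 1)]:
    groups.setdefault(group, []).append(name)
# groups == {1: ['x', 'z'], 2: ['y']}

has_a = 'a' in counts            # True, tests for the key
del counts['b']                  # removes the key-value pair
n_keys = len(counts)             # 1
pairs = sorted(counts.items())   # [('a', 2)]
```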
Strings
Type conversion to string: enclose in backticks (`x`, the same as repr(x)) or
use str(). On a string, repr() shows escaped characters in their escaped
form instead of interpreting them. Formatted printing with
print "%s ... %s" % (s1, s2). If you do not want the
auto-appended newline, append a comma. Raw strings (without escape
interpretation) are written as r"rawstring".
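A sketch of the string conversions, using repr() instead of backticks so that it also runs on Python 3. The values are made up.

```python
s1, s2 = 'spam', 42

line = "%s ... %s" % (s1, s2)   # 'spam ... 42'
as_str = str(s2)                # '42'

text = 'a\nb'                   # three characters: a, newline, b
shown = repr(text)              # the newline is shown as backslash + n

raw = r"\n"                     # raw string: backslash and n, two characters
cooked = "\n"                   # normal string: one newline character
```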
Control structures
Syntax:
- Conditionals: if, elif, else
- Loops: for x in seq:, while
- Loop control: break, continue
- Empty statement: pass
- Exceptions: try:, except FooError:, else: and raise
Truth: empty lists, dictionaries, strings, the number zero and
None (the undefined, void object) are false. Everything else is
true. String comparison with ==, !=. None is smaller than anything except
None.
is checks for object identity (two pointers to the same
object.)
Cool expressions for conditions:
in checks if an item is in a
list or a key in a dict.
Operators: ++ and -- are missing, use x += 1 and x -= 1 instead.
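The truth rules, the try/except/else form and the difference between == and is can be sketched as follows. The helper names describe and safe_int are invented for this example.

```python
def describe(value):
    """Return 'true' or 'false' depending on the truth value of value."""
    if value:
        return 'true'
    else:
        return 'false'

# Empty containers, the empty string, zero and None are false.
falsy = [describe(v) for v in ([], {}, '', 0, None)]
truthy = [describe(v) for v in ([0], 'x', 1)]

def safe_int(text):
    """Convert text to int, or return None if that is not possible."""
    try:
        number = int(text)
    except ValueError:
        return None          # runs only when int() raised ValueError
    else:
        return number        # runs only when no exception was raised

a = [1, 2]
b = [1, 2]
same_value = (a == b)        # True: equal contents
same_object = (a is b)       # False: two distinct list objects

counter = 0
counter += 1                 # there is no counter++
```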
Functions and Methods
Functions may not have the same name as data fields in classes. Each
member needs a unique name, or you end up with a
'str' object is not callable error. A function definition must have been
executed before the function is called, so module-level code cannot call a
function that is defined further down in the same file.
Parameter passing: all parameters are passed by reference. Of course
immutable objects cannot be changed, so they might just as well be passed by
value. You can assign other objects to the parameter names inside the called
function without consequences for the caller. When calling methods without
parameters, remember to put the parentheses behind the method:
object.method(),
otherwise you get the method object back, instead of calling it.
Argument syntax for the caller:
func(value) or named
func(name=value).
For the definition:
def func(name), or
def func(name=default) for optional args with defaults,
def func(*name) and def func(**name) to take the rest of the positional args
into a tuple and the rest of the named args into a dict.
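A sketch of the argument forms and of what passing by reference means in practice. All function names here are invented for illustration.

```python
def func(name, greeting='hello', *rest, **options):
    """Return all arguments so that the binding can be inspected."""
    return (name, greeting, rest, options)

plain = func('a')                        # ('a', 'hello', (), {})
named = func(name='a', greeting='hi')    # ('a', 'hi', (), {})
extra = func('a', 'hi', 1, 2, sep='-')   # ('a', 'hi', (1, 2), {'sep': '-'})

def rebind(items):
    items = [99]         # rebinds the local name only, the caller sees nothing

def mutate(items):
    items.append(99)     # changes the object the caller also refers to

data = [1]
rebind(data)             # data is still [1]
after_rebind = list(data)
mutate(data)             # data is now [1, 99]

method_object = data.append   # no parentheses: the bound method, not a call
```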
Names in functions have local scope, overriding globals with the same
name. To use a global as such, declare it again inside the function with
global theName. Variables are searched LGB (local, global,
built-in). If the local fails, it looks through enclosing local scopes, too.
Note that the class scope of a class inside a module is neither enclosing
local, nor global, for the class's methods. Therefore, imports at class
level are not seen in the methods.
A gotcha: if you only reference it, a global variable that is not locally
defined is searched and found as a global, and no exception is thrown.
But if you assign a value to that name anywhere in the function, it is
interpreted as a local throughout the function. References to the variable
before the assignment then throw an UnboundLocalError. You must declare it as
global in this case. Built-in names are things like
len(), open() etc.
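The gotcha can be reproduced with a module-level counter. This is a minimal sketch; the function names are made up, and it assumes the code runs at module level so that counter really is a global.

```python
counter = 0

def read_only():
    return counter           # only referenced: found as a global

def broken():
    value = counter          # fails: the assignment below makes counter local
    counter = value + 1
    return counter

def fixed():
    global counter           # declare it, then assignment hits the global
    counter = counter + 1
    return counter

before = read_only()         # 0

try:
    broken()
    outcome = 'no error'
except UnboundLocalError:
    outcome = 'UnboundLocalError'

after = fixed()              # 1, and the module-level counter is now 1 too
```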
join and other string functions can be called as methods of the string in
question, for example ', '.join(items). This is better than importing the
string module and calling string.join().
lambda anonymous functions may only contain a single expression.
They are not real closures. (Sniff.)
Documentation
Python comes with built-in documentation support in the form of docstrings.
The pydoc tool can be used to automatically extract this documentation. A
docstring is a string literal that occurs as the first statement in a module,
function, class, or method definition. Such a docstring becomes the __doc__
special attribute. By convention, docstrings use triple quotes """.
One-liners have the quotes all on the same line and end with a period. They
should contain an explanation, not a restatement of the Python code, because
you can get the parameters and member names via introspection.
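A minimal docstring sketch. The function area and the class Shape are invented for illustration; inspect.getdoc is one standard way to read a cleaned-up docstring.

```python
import inspect

def area(width, height):
    """Return the area of a rectangle."""
    return width * height

class Shape(object):
    """Base class for the hypothetical shapes in this example.

    Longer docstrings continue after a blank line.
    """

one_liner = area.__doc__                          # the raw docstring
first_line = inspect.getdoc(Shape).splitlines()[0]  # indentation cleaned up
```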
Modularisation
In Python there are three levels of bundling things. On the basic level,
you have the class, which bundles its methods. This class can reside in a
file together with other classes, a so-called module. The file, or module,
is the second level of packaging. This is very much like a Package in
Perl.
Modules are namespaces. Loaded modules are objects of type module, but
not classes. You can access names defined in them via modulename.__dict__ or
dir(). If you want to import MyClass, it is not enough to put it into a file
called MyClass.py in the import path and use import MyClass. This will
only import the module, not the class object from the module, and you'll get
a 'module' object is not callable error. Instead use
from MyClass import MyClass, or reference it as x = MyClass.MyClass().
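The standard library has a module with exactly this shape: the module datetime contains a class that is also called datetime. The sketch below uses it as a stand-in for the MyClass example; the aliases are chosen only to keep the names apart.

```python
import datetime

try:
    datetime(2020, 1, 1)          # this calls the module, not the class
    error = None
except TypeError as exc:
    error = str(exc)              # the "'module' object is not callable" error

# Either import the class from the module ...
from datetime import datetime as DateTimeClass
d1 = DateTimeClass(2020, 1, 1)

# ... or reference it through the module object.
import datetime as datetime_module
d2 = datetime_module.datetime(2020, 1, 1)

module_type_name = type(datetime_module).__name__   # 'module'
```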
Modules have additional attributes like
__author__ __builtins__
__date__ __file__ __name__ .
For larger projects, putting everything in the project into one file will
not do. So you create a bunch of files and put them into a common directory,
each file/module representing one larger logical part of your application,
and add an __init__.py file to make that directory a package. The directory,
or package, is the third level of packaging.
For really large projects, you can even create hierarchies of
directories/packages, with each directory holding modules pertaining to a
certain logical part of your application.
For example, you start with a simple app, MyApp, all in one file. When
you realize it is too big, you split it into several modules in a MyApp
directory, let's call them parser, config, viewer and engine. When you
realize that each of them in turn is getting too big, you turn it into a
directory, for example the parser directory contains various parsers for the
various file formats.
Persistence
cPickle is the module to serialize arbitrary data structures as
ASCII text or in binary form.
shelve is a dbm-based approach that creates persistent hashes,
where the values can be any Python object. The pickled version of this
object must abide by the limitations of the dbm system. Of course you can
also use
dbm or
gdbm directly.
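A round trip through pickle and shelve, as a minimal sketch. It falls back to the plain pickle module where cPickle does not exist (Python 3). The data and the shelf file name are made up, and the shelf is written to a temporary directory.

```python
import os
import shelve
import tempfile

try:
    import cPickle as pickle      # Python 2
except ImportError:
    import pickle                 # Python 3 has no separate cPickle

data = {'name': 'BLAST', 'scores': [1, 2, 3]}

text_form = pickle.dumps(data, 0)                          # protocol 0: ASCII
binary_form = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)  # binary form
restored = pickle.loads(binary_form)

# shelve: a persistent dict whose keys are strings and whose values are pickled
path = os.path.join(tempfile.mkdtemp(), 'example_shelf')   # hypothetical name
db = shelve.open(path)
db['run1'] = data
db.close()

db = shelve.open(path)
loaded = db['run1']
db.close()
```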
MySQLdb, which conforms to the Python DB API interface:
>>> import MySQLdb
>>> db=MySQLdb.connect(user='bioinfo',host='biserv',passwd='',db='yoh')
>>> c = db.cursor()
>>> table = "bd_method"
>>> c.execute("select * from %s" % (table))
2L
>>> c.fetchone()
('BLAST',)
>>> c.fetchone()
('manual',)
>>> c.fetchone()
Debugging, Profiling
Creating the profile file
import profile
profile.run('foo()', 'profile_filename')
Evaluating the file (best done interactively in interpreter):
import pstats
p = pstats.Stats('profile_filename')
p.strip_dirs().sort_stats('time').print_stats(10)
strip_dirs() removes the pathnames from the names.
sort_stats('time') sorts stats in decreasing order of time used
by each routine. Among other possibilities are 'calls' and 'name'.
print_stats(10) will print the top ten of the sorted list.
print_stats('substr') will print the stats for functions whose name
contains substr. Both filters can be used
('substr', 10) and are
applied in order.
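The whole cycle in one script, as a sketch that assumes Python 3 (for io.StringIO as the report stream). The workload busy() is invented. profile.runctx is used instead of profile.run so that the namespaces are passed explicitly.

```python
import io
import os
import profile
import pstats
import tempfile

def busy():
    """Hypothetical workload to be profiled."""
    return sum([i * i for i in range(10000)])

stats_path = os.path.join(tempfile.mkdtemp(), 'profile_filename')

# Like profile.run('busy()', stats_path), but with explicit namespaces.
profile.runctx('busy()', globals(), locals(), stats_path)

report = io.StringIO()                        # collect the report as text
p = pstats.Stats(stats_path, stream=report)
# Both filters, applied in order: names containing 'busy', then the top 10.
p.strip_dirs().sort_stats('time').print_stats('busy', 10)
text = report.getvalue()
```

Printing text shows the usual table with ncalls, tottime and cumtime columns for the matching functions.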
Objects, Types, Classes
Namespaces: module global (including modules __builtin__ for built-in
functions, and __main__ for the default module that is used when invoking
the python interpreter, for example by invoking a script), function local.
Scope is searched local (the current function), then enclosing functions,
module global, builtins.
Everything is an object, even types are type objects. And objects are
instances of a type. Weird circular definition. The most general type is
object.
dir() on a class gives the members of its __dict__ and of its superclasses'
__dict__, on a module it gives the module's __dict__ members. On an instance
it shows all its own and inherited variables and members. Without an argument
it shows the local variables in scope. dir replaces the deprecated
__methods__
and
__members__ attributes.
Types
Python has multiple inheritance, therefore no interfaces (like in
Java).
All types have a few common attributes:
- __doc__ a documentation string describing the type.
- __dict__ a dictionary of all attributes of the object
- __class__ and __bases__ the class and list of base
classes
- __setattr__ and __getattribute__ are called on every attribute
assignment and every attribute access; __getattr__ is only called when you
try to access an attribute that is not defined
- __hash__ __init__ __new__ __reduce__ __repr__ __str__
__delattr__ are built-in special attributes that are bound to
wrappers for hashing, construction, pickling, built-in functions like str()
and repr(), and del. They can be overridden
There is no type checking of arguments. That means:
- You can pass any object that can perform the operations (has the
members to call), no matter what class it is. It is generally expected in
Python that you do not "look before you leap" (LBYL), but practice "it's
easier to ask forgiveness than permission" (EAFP), meaning you do not
check the type beforehand, but handle the exceptions that are thrown when an
object doesn't support the required operation
- There is no way to overload methods with the same number of arguments.
You can only override (for example the builtins)
- Class-wide variables are only initialized once (when the class is
loaded). To init for every instance, put initialisation into the
__init__() method
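Two of the points above in a short sketch: EAFP-style duck typing, and the difference between a class-wide variable and one set in __init__(). All class and function names are invented.

```python
class Duck(object):
    def speak(self):
        return 'Quack'

class Robot(object):
    def speak(self):
        return 'Beep'

def make_it_speak(thing):
    """EAFP: just try the call and handle the failure afterwards."""
    try:
        return thing.speak()
    except AttributeError:       # thing has no speak member
        return 'silent'

sounds = [make_it_speak(t) for t in (Duck(), Robot(), 42)]
# ['Quack', 'Beep', 'silent']

class Registry(object):
    shared = []                  # created once, when the class is defined

    def __init__(self):
        self.own = []            # created anew for every instance

r1 = Registry()
r2 = Registry()
r1.shared.append('x')            # visible through r2 as well
r1.own.append('y')               # only r1 is affected
```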
tuple, list, dict, file, int, float, str, property are all built
in types and can be inherited and instantiated. Some built in object types
that are not directly instantiable are
module, class, method, function,
traceback, frame, code, builtin.
type(x) returns x's type. You can also do
type(object) is type.
x.__class__ is the same as
type(x) for instances.
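A small sketch of type() and of inheriting from a built-in type. CountingList is a made-up subclass.

```python
class CountingList(list):
    """A list subclass, only here to show inheriting from a built-in type."""

    def count_all(self):
        return len(self)

x = CountingList([1, 2, 3])

is_own_type = type(x) is CountingList      # True
same_as_class = x.__class__ is type(x)     # True for instances
still_a_list = isinstance(x, list)         # True, through inheritance
type_of_object = type(object) is type      # True: types are type objects
type_of_int = type(int) is type            # True
```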
Classes
What's the difference between types and classes? Basically, types are
built in, and may represent things that cannot be instantiated, whereas
classes are user (or library) defined. They are both of type 'type'. Classes
can have some additional attributes:
- __slots__ contains the list of legal member names for instances of the
class (normally you can create members just by
assigning to them)
foo = staticmethod(foo) creates a static method foo that does not
need a reference to self. There's also the weird classmethod, which furnishes
a reference to the calling class as first argument.
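A sketch of __slots__, staticmethod and classmethod in the assignment style used above. The Point class and its method names are invented.

```python
class Point(object):
    __slots__ = ('x', 'y')           # the only legal instance member names

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def origin_distance(x, y):       # no self: plain function in the class
        return (x * x + y * y) ** 0.5
    origin_distance = staticmethod(origin_distance)

    def from_pair(cls, pair):        # cls is the class the call went through
        return cls(pair[0], pair[1])
    from_pair = classmethod(from_pair)

p = Point.from_pair((3, 4))
dist = Point.origin_distance(p.x, p.y)   # 5.0

try:
    p.z = 1                          # 'z' is not listed in __slots__
    slot_error = False
except AttributeError:
    slot_error = True
```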
The function
inspect.getmembers(object[,predicate]) returns a
list of all members of the object. Everything in Python is an object, like
object itself, even types and classes are.
Mutable sequence objects like lists also have special attributes for indexing
and slicing [:], and for the operators <,>,==,!=,>=,<=,+,* as
well as built-in functions like len() or object specific methods like
append or
remove. The distribution between object methods
and built-in functions is unfortunately arbitrary. There are many, many more
special attributes for dictionaries, strings and various number types. You
can see the attributes for any object by executing
import inspect
m = inspect.getmembers(obj)  # obj: put your object here
for n, v in m: print n,"=>",v