- Libraries
- Regexen
- Data structures
- Control structures
- Modularisation
- Callables
- Objects
- Persistence
- Tuning
    for line in open(filename):              # iterate over the lines of a file
        (first, second) = line.split()       # or line.split(',') for comma-separated fields
    results = [(func(x), func2(x)) for x in items if x > cond]   # list comprehension
Libraries
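    import libname                      # access names as libname.func1
    from libname import func1, func2    # bind only the listed names locally
    from libname import *               # bind all public names (can shadow your own)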
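A small sketch of the three import forms in Python 3 syntax. The standard math module stands in for the placeholder libname:

```python
# The three import forms, with the standard math module standing in for libname.
import math                     # qualified access: math.sqrt(...)
from math import sqrt, floor    # binds just these names in the current namespace
from math import *              # binds every public name of math (pi, e, cos, ...)

a = math.sqrt(16)    # 4.0, qualified name
b = sqrt(16)         # 4.0, name imported directly
c = floor(pi)        # 3, pi arrived through the star import
```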
Regexen
    import re
    s = '16_est42'
    m = re.compile(r'^16_est(\d+)').search(s)  # r'...' is a raw string: backslashes are not interpreted
    m = re.search(r'^16_est(\d+)', s)          # the same, with an implicit compile
    print(m.group(0))    # group 0 is the whole match (16_est42); parenthesised groups start at 1
    substituted = re.sub(pattern, repl, string, count)   # count is optional
    parts = re.split(pattern, string)
- search returns a MatchObject or None if there was no match.
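- sub returns the substituted string. repl can also be a function that receives the match object and returns the replacement.
- re.split has no default pattern; the string method s.split() is the one that splits on runs of whitespace when called without an argument.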
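The calls above can be sketched together in Python 3 syntax. The sample strings are invented for illustration:

```python
import re

s = '16_est42 and 16_est7'
m = re.search(r'^16_est(\d+)', s)        # ^ anchors at the start of the string
if m is not None:                        # search returns None when nothing matches
    whole = m.group(0)                   # '16_est42', the whole match
    number = m.group(1)                  # '42', the first parenthesised group
none = re.search(r'^est', s)             # None: s does not start with 'est'

# sub with a plain replacement string, and with a function as repl
masked = re.sub(r'\d+', '#', s)          # '#_est# and #_est#'
doubled = re.sub(r'\d+', lambda mo: str(int(mo.group(0)) * 2), s, count=1)
# count=1 replaces only the first number: '32_est42 and 16_est7'

parts = re.split(r'\s*,\s*', 'a, b ,c')  # ['a', 'b', 'c']: re.split needs a pattern
words = 'a  b\tc'.split()                # ['a', 'b', 'c']: str.split() splits on whitespace
```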
Data structures
Sequences (tuple, list, string)
Lists, tuples and strings are all sequences and support indexing and slicing. Lists [] are mutable; tuples () and strings '' or "" are not. Initialize an empty list with mylist = [].
Indexes start from 0, and a slice a[x:y] includes index x but not index y:
- a[1:3] are the elements with index 1 and 2
- a[2:] all from (and including) the third.
- a[:3] all until (and including) the third
- a[-1] the last
- a[:-2] all except the last two
- a[-3:] the last three
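The slicing rules above in a short sketch (Python 3 syntax). It also shows that tuples reject item assignment while lists accept it:

```python
a = ['a', 'b', 'c', 'd', 'e']
s1 = a[1:3]    # ['b', 'c']: indexes 1 and 2, the end index is excluded
s2 = a[2:]     # ['c', 'd', 'e']: from the third element on
s3 = a[:3]     # ['a', 'b', 'c']: the first three
s4 = a[-1]     # 'e': the last element
s5 = a[:-2]    # ['a', 'b', 'c']: all except the last two
s6 = a[-3:]    # ['c', 'd', 'e']: the last three

word = 'python'
part = word[1:4]           # 'yth': strings slice the same way

t = (1, 2, 3)
try:
    t[0] = 99              # tuples and strings are immutable
    changed = True
except TypeError:
    changed = False

a[0] = 'A'                 # lists are mutable
del a[1:3]                 # removes 'b' and 'c', leaving ['A', 'd', 'e']
```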
Other built-in operations on sequences: len(s), min(s), max(s), iteration with for x in s:, and, for mutable sequences such as lists, deletion with del s[1:3].
Cool functions on sequences, for functional programming:
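- lambda: anonymous functions that may contain only a single expression (still handy for simple throwaway functions)
- map: applies a function to every item, like map in Perl
- apply: apply(func, args) calls func with the tuple args as its arguments; it is deprecated in favour of func(*args) and was removed in Python 3
- filter: keeps the items for which a function returns true, like Perl's grep
- reduce: applies a function to the first two items of a list, then to the result and the third item, and so on; ideal for summing up (in Python 3 it lives in functools)
- zip: combines several sequences into tuples, which allows looping over them in parallel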
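A sketch of these tools in Python 3 syntax. In Python 3, map, filter and zip return iterators, so the results are wrapped in list(). power is a hypothetical example function:

```python
from functools import reduce   # a builtin in Python 2, moved to functools in Python 3

nums = [1, 2, 3, 4, 5]
square = lambda x: x * x                           # a single expression, no statements
squares = list(map(square, nums))                  # [1, 4, 9, 16, 25]
evens = list(filter(lambda x: x % 2 == 0, nums))   # [2, 4]
total = reduce(lambda acc, x: acc + x, nums)       # (((1+2)+3)+4)+5 = 15
pairs = list(zip(nums, squares))                   # [(1, 1), (2, 4), (3, 9), ...]

def power(base, exp):
    return base ** exp

args = (2, 10)
result = power(*args)     # 1024; old Python 2 code would write apply(power, args)

lines = []
for n, sq in zip(nums, squares):                   # iterate over two lists in parallel
    lines.append('%d -> %d' % (n, sq))
```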
Dictionaries (dict)
- Initialize empty dicts: mydict = {}
- Loop over keys: for x in mydict:
- Remove a key-value pair with: del mydict[key]
- Test for a key: if key in mydict
- Number of key-value pairs: len(mydict)
- Just the keys: mydict.keys()
- Just the values: mydict.values()
- (key, value) pairs: mydict.items()
- Get the value for a key or, if the key is missing, insert a default and return it: mydict.setdefault('key', 'defaultvalue')
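A sketch of the dict operations above (Python 3 syntax, invented sample data). setdefault is shown in its common role of grouping items into lists:

```python
ages = {}                        # empty dict
ages['alice'] = 31
ages['bob'] = 27
has_bob = 'bob' in ages          # True: tests the keys
del ages['bob']
count = len(ages)                # 1: number of key/value pairs

# setdefault returns the existing value, or inserts the default and returns it
groups = {}
for word in ['apple', 'avocado', 'banana']:
    groups.setdefault(word[0], []).append(word)
# groups == {'a': ['apple', 'avocado'], 'b': ['banana']}
existing = groups.setdefault('a', [])   # key exists: the stored list is returned

summary = ['%s: %s' % (k, ', '.join(v)) for k, v in sorted(groups.items())]
```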
Strings
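Type conversion to string: use str(), or repr() for the representation. In Python 2, enclosing an expression in backticks (`x`) is shorthand for repr(x); on a string, repr() shows escape sequences such as \n literally instead of interpreting them. Formatted printing: print "%s ... %s" % (s1, s2). If you do not want the newline that print appends automatically, end the print statement with a comma. Raw strings, in which backslash escapes are not interpreted, are written r"rawstring".
Control structures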
Syntax:
- Conditionals: if, elif, else
- Loops: for x in seq:, while
- Loop control: break, continue
- Empty statement: pass
- Exceptions: try:, except FooError:, else: and raise
A handy operator for conditions: in checks whether an item is in a sequence (list, tuple, string) or a key is in a dict.
Operators: there are no ++ and -- operators; use x += 1 and x -= 1 instead.
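The control structures listed above combined in one sketch (Python 3 syntax). All function names are hypothetical examples:

```python
def classify(n):
    if n < 0:
        return 'negative'
    elif n == 0:
        return 'zero'
    else:
        return 'positive'

def first_even(items):
    found = None
    for x in items:
        if x % 2:
            continue        # skip odd numbers
        found = x
        break               # stop at the first even number
    return found

def parse_int(text):
    try:
        value = int(text)
    except ValueError:
        return None         # not a number
    else:
        return value        # runs only if no exception was raised

def check_positive(n):
    if n <= 0:
        raise ValueError('expected a positive number, got %r' % n)
    return n

try:
    check_positive(-1)
except ValueError as err:
    message = str(err)

class Placeholder:
    pass                    # empty statement where a body is required

count = 0
while count < 3:
    count += 1              # there is no count++
found_key = 'a' in {'a': 1}  # in tests dict keys
```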
Functions and Methods
Within a class, methods and data attributes share one namespace, so each member needs a unique name: an instance attribute with the same name as a method hides it, and calling it then fails with an error such as 'str' object is not callable. A def statement must have been executed before the function is called, so module-level code cannot call a function defined further down in the same file; a function body may refer to a function defined later, because the name is only looked up when the body runs.
Parameter passing: arguments are passed as references to objects. Changes made through a mutable argument, such as appending to a list, are visible to the caller; immutable objects cannot be changed, so for them it behaves like pass by value. Assigning another object to a parameter name inside the called function has no effect on the caller.
When calling methods without parameters, remember to put the parentheses after the method: object.method(); otherwise you get the method object back instead of calling it.
Argument syntax: the caller passes positional func(value) or named func(name=value) arguments. A definition takes def func(name), optional arguments with defaults def func(name=default), and def func(*name) or def func(**name) to collect the remaining positional arguments into a tuple or the remaining keyword arguments into a dict.
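A sketch of parameter passing, argument syntax and member-name clashes (Python 3 syntax). modify, describe and Clash are hypothetical names:

```python
def modify(items, number):
    items.append(4)      # mutates the caller's list: visible outside
    number += 1          # rebinds the local name only: caller unaffected
    items = ['new']      # rebinding the parameter has no effect on the caller
    return number

nums = [1, 2, 3]
n = 10
result = modify(nums, n)          # nums is now [1, 2, 3, 4], n is still 10

def describe(name, greeting='Hello', *args, **kwargs):
    # args collects extra positional arguments into a tuple,
    # kwargs collects extra keyword arguments into a dict
    return greeting, name, args, kwargs

call1 = describe('Ann')                            # ('Hello', 'Ann', (), {})
call2 = describe('Ann', 'Hi', 1, 2, mood='good')   # ('Hi', 'Ann', (1, 2), {'mood': 'good'})
call3 = describe(greeting='Hey', name='Bob')       # keyword arguments in any order

s = 'text'
bound = s.upper      # without parentheses: the bound method object
called = s.upper()   # with parentheses: the result 'TEXT'

class Clash:
    def name(self):
        return 'method'

obj = Clash()
obj.name = 'data'    # the instance attribute now hides the method;
                     # obj.name() would raise TypeError: 'str' object is not callable
```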
Names in functions have local scope, hiding globals with the same name. To assign to a global from inside a function, declare it there with global theName. Names are looked up in the order LEGB: local, enclosing functions, global (module), built-in. Note that the body of a class is neither an enclosing scope nor the global scope for the class's methods, so names bound at class level, including imports, are not visible as bare names inside the methods; access them through self or the class name.
A gotcha: if a function only reads a variable that it never assigns, the name is found in the global scope and no exception is raised. But if the function assigns to that name anywhere, even after the read, the name is treated as local for the whole function body, so the earlier read raises UnboundLocalError. Declare the variable global in this case. Built-in names are things like len(), open() and so on.
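The scope rules and the gotcha from the two paragraphs above, as a sketch in Python 3 syntax. All names are hypothetical:

```python
counter = 0

def read_counter():
    return counter            # only read, never assigned: found in the global scope

def gotcha():
    value = counter           # raises UnboundLocalError ...
    counter = value + 1       # ... because this assignment makes counter local
    return counter

def increment():
    global counter            # assignments now go to the module-level name
    counter += 1
    return counter

try:
    gotcha()
    gotcha_failed = False
except UnboundLocalError:
    gotcha_failed = True

def outer():
    x = 'enclosing'
    def inner():
        return x              # found in the enclosing function (the E in LEGB)
    return inner()

class Scaler:
    factor = 3                # bound in the class body
    def scale(self, n):
        # a bare 'factor' here would raise NameError: the class body is not
        # an enclosing scope for its methods, so go through the class or self
        return n * Scaler.factor
```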
join and the other functions of the string module can be called as methods of string objects, for example ', '.join(words) instead of string.join(words, ', '); this is better than importing the string module.
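A few string methods as a sketch (Python 3 syntax, sample words invented):

```python
words = ['spam', 'eggs', 'ham']
joined = ', '.join(words)                        # 'spam, eggs, ham'
parts = joined.split(', ')                       # back to ['spam', 'eggs', 'ham']
shout = 'spam'.upper()                           # 'SPAM'
fixed = 'spam spam'.replace('spam', 'eggs', 1)   # 'eggs spam': only the first occurrence
```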
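lambda anonymous functions may only contain a single expression. Since Python 2.2 (nested scopes), lambdas and nested functions can read the variables of enclosing functions. These variables are looked up when the inner function is called, not when it is created. In Python 2 they cannot be rebound from the inner function; Python 3 adds nonlocal for that.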
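A sketch of closure behaviour in Python 3 syntax. It covers reading an enclosing variable, the late-binding surprise with the usual default-argument workaround, and rebinding with the Python 3 only nonlocal:

```python
def make_multiplier(factor):
    return lambda x: x * factor      # reads factor from the enclosing call

double = make_multiplier(2)
answer = double(21)                  # 42

late = [lambda: i for i in range(3)]
late_values = [f() for f in late]    # [2, 2, 2]: i is looked up at call time

early = [lambda i=i: i for i in range(3)]
early_values = [f() for f in early]  # [0, 1, 2]: default values are bound at creation

def make_counter():
    count = 0
    def step():
        nonlocal count               # Python 3 only; Python 2 cannot rebind count here
        count += 1
        return count
    return step

tick = make_counter()
tick()
ticks = tick()                       # 2
```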
Documentation
Python comes with built-in documentation support in the form of docstrings; the pydoc tool can extract this documentation automatically. A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition, and it becomes the __doc__ special attribute of that object. By convention, docstrings use triple quotes """. One-liners keep the quotes on the same line as the text and end with a period. Docstrings should explain what the code does rather than restate it, because the parameters and member names can already be obtained via introspection.
Modularisation
In Python there are three levels of bundling things. On the basic level, you have the class, which bundles its methods. A class can reside in a file together with other classes: a so-called module. The file, or module, is the second level of packaging. This is very much like a package in Perl.
Modules are namespaces. Loaded modules are objects of type module, not classes. You can access the names defined in them via modulename.__dict__ or dir(modulename). If you want to import MyClass, it is not enough to put it into a file called MyClass.py in the import path and use import MyClass. This only imports the module, not the class object inside it, and calling MyClass() then fails with a 'module' object is not callable error. Instead use from MyClass import MyClass, or reference it as x = MyClass.MyClass().
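Modules have additional special attributes such as __name__, __doc__, __builtins__ and __file__ (absent for modules built into the interpreter). Many modules also define conventional ones like __author__ and __date__.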
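A sketch showing that a loaded module is an ordinary object with a namespace (Python 3 syntax). The standard os module is used only as an example:

```python
import os
import types

is_module = isinstance(os, types.ModuleType)    # a module is an object of type module
name = os.__name__                              # 'os'
has_path = 'path' in dir(os)                    # dir() lists the module's names
same = os.__dict__['getcwd'] is os.getcwd       # __dict__ is the module namespace
# conventional attributes such as __author__ exist only if the module defines them
author = getattr(os, '__author__', None)
```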
For larger projects, putting everything into one file will not do. So you create a bunch of files and put them into a common directory, each file/module representing one larger logical part of your application, and add an __init__.py file to make that directory a package. The directory, or package, is the third level of packaging.
For really large projects, you can even create hierarchies of directories/packages, with each directory holding modules pertaining to a certain logical part of your application.
For example, you start with a simple app, MyApp, all in one file. When it gets too big, you split it into several modules in a MyApp directory; let's call them parser, config, viewer and engine. When one of them in turn gets too big, you turn it into a directory of its own; for example, the parser directory then contains various parsers for the various file formats.
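To make the layout concrete, this Python 3 sketch builds a throwaway MyApp package with a parser subpackage in a temporary directory and imports from it. The module csvparser and its parse function are hypothetical:

```python
import importlib
import os
import sys
import tempfile

# MyApp/__init__.py, MyApp/parser/__init__.py, MyApp/parser/csvparser.py
root = tempfile.mkdtemp()
parser_dir = os.path.join(root, 'MyApp', 'parser')
os.makedirs(parser_dir)
for init in (os.path.join(root, 'MyApp', '__init__.py'),
             os.path.join(parser_dir, '__init__.py')):
    with open(init, 'w'):
        pass                              # an empty __init__.py marks a package
with open(os.path.join(parser_dir, 'csvparser.py'), 'w') as f:
    f.write('def parse(line):\n    return line.strip().split(",")\n')

sys.path.insert(0, root)                  # make the top-level package importable
importlib.invalidate_caches()             # the files were created at runtime
from MyApp.parser import csvparser       # package.subpackage.module

fields = csvparser.parse('a,b,c\n')       # ['a', 'b', 'c']
```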
Persistence
cPickle, the faster C implementation of the pickle module in Python 2, serializes most data structures, either as ASCII text or in a binary format. shelve is a dbm-based approach that creates persistent dictionaries whose keys are strings and whose values can be any picklable Python object; the pickled values must fit within the limits of the underlying dbm implementation. Of course you can also use dbm or gdbm directly.
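A sketch of pickle and shelve in Python 3 syntax, where cPickle has been folded into pickle. The file location is a temporary directory, and the data is made up:

```python
import os
import pickle
import shelve
import tempfile

data = {'name': 'BLAST', 'scores': [1.5, 2.0], 'tags': ('a', 'b')}

text = pickle.dumps(data, protocol=0)     # protocol 0: the ASCII-based format (bytes in Python 3)
binary = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(binary)           # an equal, but new, object

path = os.path.join(tempfile.mkdtemp(), 'store')   # the dbm backend may add file extensions
with shelve.open(path) as db:
    db['run1'] = data                     # string keys, picklable values
with shelve.open(path) as db:
    loaded = db['run1']
```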
Database access with MySQLdb, which conforms to the Python DB API:
    >>> import MySQLdb
    >>> db = MySQLdb.connect(user='bioinfo', host='biserv', passwd='', db='yoh')
    >>> c = db.cursor()
    >>> table = "bd_method"
    >>> c.execute("select * from %s" % (table))
    2L
    >>> c.fetchone()
    ('BLAST',)
    >>> c.fetchone()
    ('manual',)
    >>> c.fetchone()
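MySQLdb needs a running server, so this runnable sketch of the same DB API calls swaps in the standard-library sqlite3 module (Python 3 syntax). The table name is reused from the session above, and the rows are created just for illustration. One difference to keep in mind: sqlite3 uses ? as the parameter placeholder, while MySQLdb uses %s.

```python
import sqlite3   # stdlib driver following the same DB API 2.0

db = sqlite3.connect(':memory:')          # in-memory database, no server needed
c = db.cursor()
c.execute('create table bd_method (name text)')
c.executemany('insert into bd_method (name) values (?)', [('BLAST',), ('manual',)])
db.commit()

table = 'bd_method'
c.execute('select * from %s' % table)     # table names cannot be passed as parameters
rows = []
row = c.fetchone()
while row is not None:                    # fetchone returns None when no rows are left
    rows.append(row)
    row = c.fetchone()

# pass values as parameters instead of pasting them into the SQL string
c.execute('select name from bd_method where name = ?', ('BLAST',))
match = c.fetchall()
db.close()
```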
Debugging, Profiling
Creating the profile file:
    import profile
    profile.run('foo()', 'profile_filename')
Evaluating the file (best done interactively in the interpreter):
    import pstats
    p = pstats.Stats('profile_filename')
    p.strip_dirs().sort_stats('time').print_stats(10)
strip_dirs() removes the leading path information from the file names.
sort_stats('time') sorts the stats in decreasing order of the time spent inside each routine itself. Other possible keys include 'calls' and 'name'.
print_stats(10) prints the top ten of the sorted list. print_stats('substr') prints only the entries whose printed description (file name, line number, function name) matches substr, which is treated as a regular expression. Both restrictions can be combined, as in print_stats('substr', 10), and are applied in the order given.
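A self-contained version of the profiling steps, written in Python 3 syntax. It swaps in cProfile, which has the same interface as profile but lower overhead. runctx is used so that foo is found without relying on the __main__ namespace. The report goes into a string instead of stdout. slow_sum and foo are hypothetical workloads.

```python
import cProfile
import io
import os
import pstats
import tempfile

def slow_sum(n):
    # deliberately naive loop so that it shows up in the profile
    total = 0
    for i in range(n):
        total += i * i
    return total

def foo():
    return [slow_sum(20000) for _ in range(10)]

def profile_report(limit=10):
    fd, path = tempfile.mkstemp(suffix='.prof')
    os.close(fd)
    try:
        # like profile.run('foo()', filename), but with explicit namespaces
        cProfile.runctx('foo()', globals(), locals(), path)
        out = io.StringIO()
        stats = pstats.Stats(path, stream=out)   # write the report into a string
        stats.strip_dirs().sort_stats('time').print_stats(limit)
        return out.getvalue()
    finally:
        os.remove(path)

report = profile_report()
```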
Objects, Types, Classes
Namespaces: there are module global namespaces and function local ones. The module namespaces include __builtin__ (called builtins in Python 3), which holds the built-in functions, and __main__, the default module used when the interpreter runs, for example when invoking a script. Names are searched in the local function scope first, then in the enclosing functions (innermost first), then in the module global scope, and finally in the builtins; there is no separate block scope. Everything is an object; even types are type objects, and objects are instances of a type. Weird circular definition. The most general type is object.
dir() on a class lists the names in its __dict__ and in those of its base classes; on a module it lists the names defined in the module. On an instance it shows the instance's own attributes plus those of its class and base classes. Without an argument it shows the names in the current local scope. dir replaces the deprecated __methods__ and __members__ attributes.
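A sketch of what dir() returns in these cases (Python 3 syntax). Base, Derived and local_names are hypothetical:

```python
class Base:
    def base_method(self):
        return 'base'

class Derived(Base):
    kind = 'derived'                  # class attribute
    def __init__(self):
        self.value = 1                # instance attribute

obj = Derived()
names = dir(obj)                      # includes 'value', 'kind' and the inherited 'base_method'
own = vars(obj)                       # {'value': 1}: only the instance __dict__

def local_names():
    a, b = 1, 2
    return dir()                      # no argument: names in the local scope, ['a', 'b']
```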
Types
Python has multiple inheritance and therefore no separate interface construct like Java's. All types have a few common attributes:
- __doc__ a documentation string describing the type.
- __dict__ a dictionary of the object's attributes
- __class__ the object's class, and, on classes, __bases__ the tuple of base classes
- __getattribute__ is called for every attribute access; __getattr__, if defined, only when normal lookup does not find the attribute; __setattr__ for every attribute assignment
- __hash__, __init__, __new__, __reduce__, __repr__, __str__ and __delattr__ are special methods called for hash(), object initialisation and creation, pickling, repr(), str() and del obj.attr. They can be overridden
- you can pass any object that can perform the required operations (has the members that get called), no matter what class it is. Python style generally avoids "look before you leap" (LBYL) and follows "it's easier to ask forgiveness than permission" (EAFP): instead of checking the type beforehand, you handle the exception raised when an object does not support the required operation
- There is no method overloading: a class has only one method of a given name, and a later definition replaces an earlier one, whatever the number of arguments. You can only override methods (for example the special methods above)
- Class-wide variables are initialized only once, when the class statement is executed, and a mutable class attribute such as a list is shared by all instances. To initialize something for every instance, put it into the __init__() method
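A sketch of several points from the list above, in Python 3 syntax. It covers __getattr__ as a fallback, an overridden __repr__, EAFP, the missing overloading and a shared class-wide list. Record, total_length and NoOverloading are hypothetical names.

```python
class Record:
    shared = []                        # class-wide: created once, shared by all instances

    def __init__(self, **fields):
        self.fields = fields           # per-instance state belongs in __init__
        self.own = []

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        fields = self.__dict__.get('fields', {})
        if name in fields:
            return fields[name]
        raise AttributeError(name)

    def __repr__(self):
        return 'Record(%r)' % (self.fields,)

r1 = Record(name='BLAST')
r2 = Record(name='manual')
r1.shared.append('x')                  # also visible as r2.shared
r1.own.append('y')                     # only r1.own changes
has_missing = hasattr(r1, 'missing')   # False: __getattr__ raised AttributeError

def total_length(items):
    try:
        return sum(len(x) for x in items)   # EAFP: just try the operation
    except TypeError:
        return None                         # some item does not support len()

class NoOverloading:
    def f(self, x):
        return 'one'
    def f(self, x, y):                 # replaces the first f entirely
        return 'two'

try:
    NoOverloading().f(1)
    overloaded = True
except TypeError:
    overloaded = False                 # only the two-argument f exists
```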
type(x) returns x's type. Types are objects too, so type(object) is type evaluates to true. For instances of new-style classes, x.__class__ is the same as type(x).
Classes
What's the difference between types and classes? Basically, types are built in, and may represent things that cannot be instantiated, whereas classes are user (or library) defined. Both are of type 'type' (in Python 2 this holds for classes derived from object, the new-style classes). Classes can have some additional attributes:
- __slots__ lists the only attribute names instances may have. Such instances get no __dict__, so you can no longer create new members just by assigning to them
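A sketch of __slots__ in Python 3 syntax. It also checks that a built-in type and a user-defined class are both of type type. Point is a hypothetical class:

```python
class Point:
    __slots__ = ('x', 'y')             # the only attributes instances may have

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
rejected = False
try:
    p.z = 3                            # not listed in __slots__
except AttributeError:
    rejected = True
has_dict = hasattr(p, '__dict__')      # False: slotted instances have no __dict__
both_types = type(int) is type and type(Point) is type
```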
The function inspect.getmembers(object[, predicate]) returns a list of (name, value) pairs for all members of the object, sorted by name. Everything in Python is an object, like object itself; even types and classes are.
Mutable sequence objects like lists also have special attributes for indexing and slicing [:], and for the operators <, >, ==, !=, >=, <=, +, *, as well as support for built-in functions like len() and object-specific methods like append or remove. The split between object methods and built-in functions is unfortunately arbitrary. There are many, many more special attributes for dictionaries, strings and the various number types. You can see the attributes of any object by executing
    import inspect
    members = inspect.getmembers(obj)    # obj: the object you want to examine
    for name, value in members:
        print("%s => %s" % (name, value))
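The optional predicate narrows the list. The following Python 3 sketch keeps only plain functions defined on a hypothetical Greeter class, then looks up some of the special attributes a list carries:

```python
import inspect

class Greeter:
    def hello(self):
        return 'hello'
    def bye(self):
        return 'bye'

# getmembers returns (name, value) pairs sorted by name; the predicate filters them
methods = [name for name, value in inspect.getmembers(Greeter, inspect.isfunction)]
# ['bye', 'hello']

list_special = [name for name, _ in inspect.getmembers([]) if name.startswith('__')]
# includes '__getitem__', '__len__', '__lt__', '__add__' and '__mul__', among others
```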