Bioinfoxy: tool


2013-02-20

Really useful Word Shortcuts

From Microsoft for the German keyboard:

Strg-Alt-c: Copyright Symbol
Strg-Alt-r: Registered Trademark Symbol
Strg-Alt-t: Trademark Symbol

Strg-Alt-f: Footnote
Strg-Alt-d: Endnote

Shift-Strg-f: bold

Shift-Strg-k: italic

Alt-Shift-O: add to table of contents

References: define any text as a bookmark, so you can cross-reference it.

F9: update all fields (references) in the selection (so Strg-A, F9: update all references in the document)

Strg-Return: Page break

2012-09-04

Tools

"Ein Mann, der recht zu wirken denkt,
muss auf das beste Werkzeug halten."
- J. W. Goethe

This page holds a list of useful tools, mostly free software, with short descriptions. Some of the non-free entries are tools that I have used.

Shell

One of the main problems when developing on Windows and Unix at the same time is the different shells. Therefore Cygwin has to be the best thing since sliced bread. Cygwin allows you to use the Unix toolkit (make, less, ...) and command names (rm, ...) on your Win32 box. I'm not using the Bash shell that is supplied with it, though, but the normal "MS-DOS" shell with filename completion activated (by setting HKCU\Software\Microsoft\Command Processor\CompletionChar to 9, the TAB key). It is not as smart as Unix completion: it just grabs the first match it finds instead of asking about ambiguous ones. For test scripts it might be worth using bash, to avoid having to learn DOS-style syntax. A comparison between syntax and tool names for Windows/Unix shells.

Editors

Your editor is where you will spend a lot of your time at the keyboard, or maybe all of it if you are using Emacs. GNU Emacs. This is my editor. Probably the most powerful editor of them all, because it is fully extensible and programmable. For every need there is a specialized mode; you can see some that I use here. Unfortunately it has a steep learning curve and key bindings that differ from standard Windows. Vi. The Unix editor. This comes with almost every Unix flavor, but is even more arcane to use than Emacs and not as easily customized. UltraEdit. This shareware editor is the best I have seen on Windows to date. It is extremely intuitive, user-friendly and configurable, yet powerful. It has syntax coloring and a hex mode, plus all the standard shortcuts, find in files, regex, file compare etc. TextPad is also nice.

IDE/CASE Tool

Graphical CASE tools usually have an integrated default editor, which you might be able to swap for your favourite one. They integrate everything you need for editing, compiling, running programs, looking up references and debugging, and they write some of the code for you. Single-step debugging is the main reason I'd use one. These tools tend to be commercial. JPadPro is shareware and the best and most honest IDE for Java I have seen to date. It just uses the JDK, does no voodoo behind the scenes that you don't control, is easily configurable, and has comment folding and a very nice tree for your packages/classes. The only disadvantage is that you cannot use another editor. Nowadays, Eclipse is probably by far the most popular and powerful IDE platform, not only for Java, and it is free.

Languages

Perl, the 800-pound gorilla of scripting languages. For small jobs gluing several programs together there are many powerful tools on Unix, like awk and sed. I use Perl to avoid learning them all, as you can do all that and more in Perl. It's also the favourite scripting language for CGI. And it's available on Windows, too.

Java. Sun's javac and java. The language is object-oriented and portable, with ample GUI support. It also used to be slow, but apparently not any more. I used to code a lot of Java during my Ph.D. work.

Python. Python is a hell of a cool language. A fully object-oriented scripting language, complete with regular expressions, ultra-clean syntax and many built-in Lisp-like functions, it combines the best of Java, Perl and Lisp into one neat package. Here are some notes.

The shells. Often a small shell script can do what a Perl script could. I find the syntax of sh annoying, so I have a cheat sheet.

Lisp/Scheme. It's built into Emacs, so knowing a little is useful if you use Emacs.

Source Control

Source control makes it possible to work with several people on the same source tree, or to go back to a state where everything worked. Together with bug tracking/change request software, you can develop in a coordinated way. A common system is CVS, which has recently been replaced more and more by Subversion.

git, often used via GitHub, is another popular and more modern tool. I feel it is more complicated to use than CVS, but it makes it easier to create and merge branches. Through this, it allows a review step before code is added to the central repository: you make a branch, make your changes, then ask a maintainer to merge it in (even when you have no rights to do so yourself), and they can diff and see what you did before it goes in.

Documentation

Good documentation is important, because it helps you to think about your code and to understand it later. I normally use what the language supplies as default. Javadoc with Java, or Perl's POD (plain old documentation), are both O.K.

Build

The good thing about dedicated build tools instead of plain scripts is that they allow you to enter at different places in the process without commenting out code or changing your script, and that they figure out everything that needs to be re-done based on explicit dependencies.

make. There are multiple versions of this tool: the normal make, GNU make (which is the most widely used) and, under Windows NT, nmake. Since you can use make instead of nmake with Cygwin, and Windows has stuff like InstallShield, it's probably not worth learning nmake. There are big books about make, but for 80% of your needs you can get away with 20% of the syntax. Here is a simple annotated example of a makefile. I actually only use it under UNIX; it's more of a UNIX tool, I guess. Nowadays, Ant seems to be the build tool of choice for Java types. I did not find that it offered much of an advantage, and it forces you to write Java classes.

Debugging, Bug Tracing, Profiling

Each language tends to have its own tools. The default profiler that comes with Perl is pretty OK. I usually keep to Kernighan's advice: think about my data and how the error can occur, then use a few interspersed print statements to track it down.

Testing

Here, you have to construct your own test cases and scaffolding. The main point is that you have to automate this, or you will not do it often enough. To automate it, you write test input files, expected output files, and scripts that compare the output of a new version to the prepared one to ensure nothing has changed (regression testing). Your test script should be silent as long as everything is OK, and only complain if errors are found. To do this you have to learn a bit of shell programming: mainly conditionals, loops, file comparison, length checks, and testing whether a program returned the OK signal (exit status 0). Modern languages like Python have built-in unit test support.
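Here is a minimal Python sketch of such a silent regression check. It assumes the new version's output has already been captured as a string and that the prepared reference output is in a file. The function names `regression_diff` and `check` are made up for this example. The standard library's `difflib` does the comparison.

```python
import difflib
import sys
from pathlib import Path


def regression_diff(actual: str, expected: str) -> list[str]:
    """Unified-diff lines between the prepared output and the new output.

    An empty list means nothing has changed.
    """
    return list(difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="expected",
        tofile="actual",
    ))


def check(actual: str, expected_file: Path) -> int:
    """Stay silent if the output is unchanged, otherwise print the diff.

    Returns a shell-style exit status: 0 for OK, 1 for differences.
    """
    diff = regression_diff(actual, expected_file.read_text())
    if diff:
        sys.stdout.writelines(diff)
        return 1
    return 0
```

A test script could then end with `sys.exit(check(new_output, Path("tests/run1.expected")))`, where the file name is hypothetical. The calling shell then sees the OK signal as exit status 0.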

Design and Modeling

I used Rational Rose, a commercial tool now owned by IBM for object oriented modeling in UML (and Booch etc, if you're so inclined).

2012-09-03

Make Refcard

You can get all the info on this page in more detail from man make or from the GNU Make Manual. A nice intro to makefiles, that works through one example in detail (and is not focusing on compiling C programs) can also be found on jfranken.

Overview

make is a tool to automate the rebuilding of files that depend on other files. The dependency rules are written in a Makefile. Rules consist of targets that depend on so-called prerequisites. Whenever any of the prerequisites has changed (is newer than the target), the target is rebuilt using the shell commands given by the rule. If no specific rule was found, make uses the commands of a pattern rule that matches the file names involved. Since a prerequisite can itself be the target of another rule, you can build large trees of prerequisites, and make can infer which files to update if some file in the tree changed.

Comments

A # turns the rest of the line into a comment that make ignores (inside command lines, it is passed on to the shell instead). Use a backslash \ at the end of a line to split a long line into several lines on screen. Lists are separated by white space.

Targets

Normally, make expects the targets and prerequisites to be files, so it can compare their timestamps to decide if one of the prerequisites is newer than the target, and the target has to be updated using the rule.
If you specify a target for which no file exists (a so-called "phony" target), there is no timestamp to compare, and the rule will always be run. You can also treat a target for which a file exists as phony (forcing its rule to always run) by making it a prerequisite of the .PHONY special name. Also, a target that ultimately depends on phony targets cannot compare timestamps meaningfully, and will always be rebuilt.
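The decision described above can be sketched in Python. This is a simplified model, not make's actual code:

- Modification times are passed in as a dict. A missing key means the file does not exist.
- The recursive step, in which make first brings every prerequisite up to date, is left out. A missing prerequisite is simply counted as "will be rebuilt, hence newer".

The name `needs_rebuild` is hypothetical.

```python
def needs_rebuild(target: str, prerequisites: list[str],
                  mtimes: dict[str, float], phony: set[str]) -> bool:
    """Would make run this target's commands? (simplified, non-recursive)

    mtimes maps existing files to modification times; absent = no such file.
    """
    if target in phony or target not in mtimes:
        return True  # phony, or no file: no timestamp to compare
    for prereq in prerequisites:
        if prereq in phony or prereq not in mtimes:
            return True  # phony or still-to-be-built prerequisites count as newer
        if mtimes[prereq] > mtimes[target]:
            return True  # a prerequisite changed after the target was made
    return False
```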

Special names

Special names are target names that make make treat their prerequisites in a special way. They all start with a dot and are written in upper case, and there are many more than the ones listed here.
.PHONY prerequisites of .PHONY are considered to be phony targets. make will run its commands regardless of whether a file with that name exists or what its last-modification time is.
.IGNORE prerequisites for .IGNORE ignore errors in execution of the commands run for those particular files.
.EXPORT_ALL_VARIABLES this tells make to export all variables to child processes. It is more of a general option and uses no prerequisites.

Prerequisites

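Here is an example of prerequisites. A house needs a roof, plumbing and electrics. The roof needs walls, and the plumbing needs pipes and a basement. The electrics need walls and wires, and the walls need a basement and bricks.

Written in one-line notation, this gives the following Makefile: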

# target   prerequisite             commands
# ----------------------------------------------------
house:     roof plumbing electrics; @echo $@; touch $@
plumbing:  pipes basement;          @echo $@; touch $@
roof:      walls;                   @echo $@; touch $@
electrics: walls wires;             @echo $@; touch $@
walls:     basement bricks;         @echo $@; touch $@
pipes:     ;                        @echo $@; touch $@
basement:  ;                        @echo $@; touch $@
bricks:    ;                        @echo $@; touch $@
wires:     ;                        @echo $@; touch $@

The touch command creates the targets as files. So if you run make plumbing and then make house, plumbing, pipes and basement already exist and do not have to be built again.
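To see in which order make works through this tree, here is a small Python simulation of the house Makefile. It is a sketch under simplifying assumptions:

- Instead of comparing timestamps, it treats a target as up to date if its file already exists and none of its prerequisites was rebuilt in the same run.
- It visits prerequisites left to right, as a serial (non-parallel) make does.

`RULES` and `build_order` are names made up for this example.

```python
# Prerequisites from the house Makefile, in the order they are listed.
RULES: dict[str, list[str]] = {
    "house": ["roof", "plumbing", "electrics"],
    "plumbing": ["pipes", "basement"],
    "roof": ["walls"],
    "electrics": ["walls", "wires"],
    "walls": ["basement", "bricks"],
    "pipes": [], "basement": [], "bricks": [], "wires": [],
}


def build_order(goal: str, rules: dict[str, list[str]],
                existing: frozenset[str] = frozenset()) -> list[str]:
    """Targets whose commands would run, in execution order.

    Simplification: a target is (re)built if its file does not exist yet
    or if one of its prerequisites was built in this run.
    """
    built: list[str] = []
    visited: set[str] = set()

    def visit(target: str) -> bool:
        """Bring target up to date; return True if it was (re)built."""
        if target in visited:
            return target in built  # each target is processed only once
        visited.add(target)
        rebuilt_prereq = False
        for prereq in rules[target]:
            # call visit() first so every prerequisite gets processed
            rebuilt_prereq = visit(prereq) or rebuilt_prereq
        if target not in existing or rebuilt_prereq:
            built.append(target)
            return True
        return False

    visit(goal)
    return built
```

For a fresh directory, `build_order("house", RULES)` lists basement, bricks, walls, roof, pipes, plumbing, wires, electrics and house. That is the order in which make would echo the targets. After a `make plumbing`, passing `existing=frozenset({"plumbing", "pipes", "basement"})` leaves those three out.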

Commands

Commands can be any shell commands. An @ at the beginning of a command means "don't print the line before executing it", and a - means "don't stop on errors".

Rules

Rules can be written in one line for short rules

target(s) : [prerequisites] [; shell-command(s)]

Or in the full format for more involved commands (note the TAB)

target(s) : [prerequisites]
[TAB shell-command]
[TAB shell-command]
...

Make starts with the first rule that has a name not starting with a dot, if not invoked for a specific rule. The other rules are processed because their targets appear as prerequisites of this goal, and so on. If some rule is not needed for this, that rule is not processed. If several targets are given for a rule, it's as if there were as many rules, each with one target.
You can split each rule into two parts:

A rule that only states the prerequisites, and
a rule that gives the commands.

All rules in the last example run the same commands and differ only in their prerequisites. For those rules that have no prerequisites (e.g. bricks) you don't even need a prerequisite rule. The command rules can be pooled into one, because their commands are all identical. Thus you get a shorter and pretty clear Makefile:

# An explicit rule assigns the commands for several targets
house plumbing roof electrics walls pipes basement bricks wires: ; @echo $@; touch $@

# Prerequisite-only rules state the dependencies
house:     roof plumbing electrics
plumbing:  pipes basement
roof:      walls
electrics: walls wires
walls:     basement bricks

Macros

Use = or := to assign values to variables (so-called macros). The difference is when variables and functions contained in the value are expanded:

- With = (recursive expansion), they are expanded each time the variable is used.
- With := (simple expansion), they are expanded once, at the point of assignment.

So with :=, any variable referenced on the right-hand side must be defined above, or it will expand to empty. Whitespace around the = and at the start of the value is ignored.
To retrieve the stored value, write $(myvar). Macros are expanded by textually substituting the assigned value for the name. To use a macro like a function with parameters $(1), $(2) etc., write $(call myvar,param,...).
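The difference between = and := can be modelled in a few lines of Python. This toy class is hypothetical. It only understands `$(name)` references, not functions or the `${name}` form. It keeps `=` values unexpanded and expands them on every use, while it expands `:=` values once, at assignment.

```python
import re

_REFERENCE = re.compile(r"\$\((\w+)\)")  # only the $(name) form


class MakeVariables:
    """Toy model of recursive (=) versus simple (:=) variables."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}
        self._recursive: set[str] = set()

    def expand(self, text: str) -> str:
        """Replace each $(name) in text; undefined names become ''."""
        return _REFERENCE.sub(lambda m: self._lookup(m.group(1)), text)

    def _lookup(self, name: str) -> str:
        value = self._values.get(name, "")
        # '=' values are expanded anew every time they are used
        return self.expand(value) if name in self._recursive else value

    def assign_recursive(self, name: str, value: str) -> None:
        """name = value"""
        self._values[name] = value
        self._recursive.add(name)

    def assign_simple(self, name: str, value: str) -> None:
        """name := value"""
        self._values[name] = self.expand(value)  # expanded once, right now
        self._recursive.discard(name)
```

To see the effect, define `compiler` only after both assignments. The = variable still picks up `javac` when it is used. The := variable was already expanded with an empty reference.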

Patterns

Often you have long lists of files that all have a similar form (similar extensions, names, etc.) and have to be processed in the same way. In this case, you will not want to write a rule for every single file. What you want is a rule that says "for files that look like that, do this". Patterns allow you to do this. A target pattern is composed of a `%` between a prefix and a suffix, either or both of which may be empty. For example:

%.class: %.java; javac $<

The way it works is that any name that matches the target pattern will invoke the rule. The part of the name that matched the wildcard will be substituted for the wildcard in the prerequisites.
Pattern rules may have more than one target. Unlike normal rules, this does not act as many different rules with the same prerequisites and commands. If a pattern rule has multiple targets, `make' knows that the rule's commands are responsible for making all of the targets. The commands are executed only once to make all the targets.
There used to be special suffix rules. These are now superseded by pattern rules.
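Here is a Python sketch of how a pattern rule like `%.class: %.java` picks its prerequisite. The function names are hypothetical. The sketch ignores make's special handling of directory parts in file names. Like make, it requires `%` to match a non-empty stem.

```python
from typing import Optional


def match_stem(pattern: str, name: str) -> Optional[str]:
    """Stem that '%' matches when name fits pattern (e.g. '%.class'), else None."""
    prefix, _, suffix = pattern.partition("%")
    if (len(name) > len(prefix) + len(suffix)  # the stem must not be empty
            and name.startswith(prefix) and name.endswith(suffix)):
        return name[len(prefix):len(name) - len(suffix)]
    return None


def prerequisite_for(target_pattern: str, prereq_pattern: str,
                     name: str) -> Optional[str]:
    """Substitute the stem of name into the prerequisite pattern."""
    stem = match_stem(target_pattern, name)
    return None if stem is None else prereq_pattern.replace("%", stem, 1)
```

With these, `prerequisite_for("%.class", "%.java", "Gui.class")` gives `Gui.java`, the file that `$<` would name in the rule's command.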

Functions

Often, simple patterns are not enough, and you will want to mangle file names in various other ways. Here is a bunch of built-in functions for this purpose.

`$(subst from,to,text)`	Replace `from` with `to` in `text`.
`$(patsubst pattern,replacement,text)`	Replace words matching `pattern` with `replacement` in `text`.
`$(strip string)`	Remove excess whitespace characters from `string`.
`$(findstring find,text)`	Locate `find` in `text`.
`$(filter pattern...,text)`	Select words in `text` that match one of the `pattern` words.
`$(filter-out pattern...,text)`	Select words in `text` that do not match any of the `pattern` words.
`$(sort list)`	Sort the words in `list` lexicographically, removing duplicates.
`$(dir names...)`	Extract the directory part of each file `name`.
`$(notdir names...)`	Extract the non-directory part of each file `name`.
`$(suffix names...)`	Extract the suffix (the last dot and following characters) of each file `name`.
`$(basename names...)`	Extract the base name (name without suffix) of each file `name`.
`$(addsuffix suffix,names...)`	Append `suffix` to each word in `names`.
`$(addprefix prefix,names...)`	Prepend `prefix` to each word in `names`.
`$(join list1,list2)`	Join two parallel lists of words.
`$(word n,text)`	Extract the `n`th word (one-origin) of `text`.
`$(words text)`	Count the number of words in `text`.
`$(wordlist s,e,text)`	Returns the list of words in text from `s` to `e`.
`$(firstword names...)`	Extract the first word of `names`.
`$(wildcard pattern...)`	Find file names matching a shell file name pattern (not a `%` pattern).
`$(error text...)`	When this function is evaluated, make generates a fatal error with the message `text`.
`$(warning text...)`	When this function is evaluated, make generates a warning with the message `text`.
`$(shell command)`	Execute a shell command and return its output.
`$(origin variable)`	Return a string describing how the make variable `variable` was defined.
`$(foreach var,words,text)`	Evaluate `text` with `var` bound to each word in `words`, and concatenate the results.
`$(call var,param,...)`	Evaluate the variable `var`, replacing any references to `$(1),$(2)` with the first, second, etc. `param` values.
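To get a feel for a few of these functions, here are rough Python equivalents of `patsubst`, `filter`, `filter-out` and `sort`. They are illustrations written for this page, not make's implementation:

- Words are split on whitespace.
- Only the first `%` in a pattern acts as a wildcard.
- A pattern without `%` must match a word exactly.

```python
from typing import Optional


def _stem(pattern: str, word: str) -> Optional[str]:
    """What '%' matches in word; '' for an exact match of a %-less pattern; else None."""
    if "%" not in pattern:
        return "" if word == pattern else None
    prefix, _, suffix = pattern.partition("%")  # only the first % is a wildcard
    if (len(word) >= len(prefix) + len(suffix)
            and word.startswith(prefix) and word.endswith(suffix)):
        return word[len(prefix):len(word) - len(suffix)]
    return None


def patsubst(pattern: str, replacement: str, text: str) -> str:
    """$(patsubst pattern,replacement,text): non-matching words stay as they are."""
    result = []
    for word in text.split():
        stem = _stem(pattern, word)
        if stem is None:
            result.append(word)
        elif "%" in pattern:
            result.append(replacement.replace("%", stem, 1))
        else:
            result.append(replacement)
    return " ".join(result)


def filter_words(patterns: str, text: str) -> str:
    """$(filter pattern...,text): keep words matching any pattern."""
    pats = patterns.split()
    return " ".join(w for w in text.split()
                    if any(_stem(p, w) is not None for p in pats))


def filter_out(patterns: str, text: str) -> str:
    """$(filter-out pattern...,text): drop words matching any pattern."""
    pats = patterns.split()
    return " ".join(w for w in text.split()
                    if all(_stem(p, w) is None for p in pats))


def sort_words(text: str) -> str:
    """$(sort list): lexical order, duplicates removed."""
    return " ".join(sorted(set(text.split())))
```

For example, `patsubst("%.java", "%.class", "Main.java Gui.java")` mirrors `$(patsubst %.java,%.class,$(sourcefiles))`. That is a typical way to derive the list of class files from the source files.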

Variables

There are some predefined variables for use in rules. GNU make calls them automatic variables. They are also called dynamic macros, because they look a bit like macros and their contents are set anew each time a rule is run:

`$@`	The name of the target.
`$%`	The target member name, when the target is an archive member.
`$<`	The name of the first (or only) prerequisite.
`$?`	The names of all the prerequisites that are newer than the target, with spaces between them.
`$^ $+`	The names of all the prerequisites, with spaces between them. The value of `$^` omits duplicate prerequisites, while `$+` retains them and preserves their order.
`$*`	The stem with which an implicit rule matches.
`$(@D) $(@F)`	The directory part and the file-within-directory part of `$@`
`$(*D) $(*F)`	The directory part and the file-within-directory part of `$*`
`$(%D) $(%F)`	The directory part and the file-within-directory part of `$%`
`$(<D) $(<F)`	The directory part and the file-within-directory part of `$<`
`$(^D) $(^F)`	The directory part and the file-within-directory part of `$^`
`$(+D) $(+F)`	The directory part and the file-within-directory part of `$+`
`$(?D) $(?F)`	The directory part and the file-within-directory part of `$?`
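The following Python sketch computes some of these variables for one rule, which can help when reading the table. It is a model, not make itself. The caller has to say which prerequisites are newer than the target. Only `$@`, `$<`, `$^`, `$+`, `$?`, `$(@D)` and `$(@F)` are covered. `automatic_variables` is a hypothetical name.

```python
import posixpath


def automatic_variables(target: str, prerequisites: list[str],
                        newer: set[str]) -> dict[str, str]:
    """Values of some automatic variables for one run of a rule.

    newer: the prerequisites that are newer than the target (the caller decides).
    """
    unique = list(dict.fromkeys(prerequisites))  # drop duplicates, keep order
    return {
        "@": target,
        "<": prerequisites[0] if prerequisites else "",
        "^": " ".join(unique),
        "+": " ".join(prerequisites),
        "?": " ".join(p for p in unique if p in newer),
        "@D": posixpath.dirname(target) or ".",  # '.' if there is no slash
        "@F": posixpath.basename(target),
    }
```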

Includes

When your project is large, a single giant makefile can become rather unwieldy. You can split your makefiles into several files and pull them in when make runs. When make encounters an include directive, it stops reading the current Makefile, reads the included Makefile(s) and then continues where it left off. If you don't want make to abort when an included Makefile is missing, write -include Makefile(s) instead. A leading minus sign generally makes make ignore errors.

shell syntax in make

Escaping variables in make: when using $ variables inside make commands (for shell variables, or Perl special variables), write $$ instead of $.
Make runs every command line in a new subshell, which forgets everything from the previous lines. This breaks shell loops or if statements spanning several lines. So you have to put the whole construct on one logical line, joining the physical lines with backslashes. Remember, when writing one-line shell conditionals, you have to end every block and condition with a semicolon. So

for x in a b c; do
    echo $x
done

becomes

for x in a b c;\
do\
    echo $$x;\
done

Example

# Example Makefile

sourcefiles = Main.java Gui.java Logic.java
classfiles = $(sourcefiles:.java=.class)
compiler = javac
jc = $(compiler) -Xlint

all: $(classfiles)

# A phony target, not really the name of a file. It is
# just a name for some commands to be executed when you make an explicit
# request.
.PHONY: all clean
clean:
	-@$(RM) *~
	-@$(RM) *.class

# % is the wildcard char for targets or prerequisites (like
# in SQL), $< is the first prerequisite (here the .java file),
# $@ is the current target (looks a bit like a target for
# shooting)

%.class: %.java; $(jc) $<

Excel HOWTO

Number Format Syntax

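Show number in thousands: #,##0.0,

Show number in millions: #,##0.0,,

Each comma after the last digit placeholder divides the displayed value by 1,000. The cell value itself is unchanged.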
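Here is a quick Python sketch of the scaling these formats apply to the displayed value. `excel_scaled` is a made-up helper. Python's rounding may differ from Excel's for values that fall exactly on a half.

```python
def excel_scaled(value: float, trailing_commas: int, decimals: int = 1) -> str:
    """Display value like '#,##0.0' followed by trailing_commas commas."""
    scaled = value / 1000 ** trailing_commas  # each trailing comma: divide by 1,000
    return f"{scaled:,.{decimals}f}"  # ',' = thousands separator, .1f = one decimal
```

So 1234567 shows as 1,234.6 with one trailing comma and as 1.2 with two.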

Keyboard Shortcuts

CTRL-END: jump to last filled cell
CTRL-SHIFT-END: mark from point to last filled cell
CTRL `: switch view mode to show formulas instead of their result
F4: cycle through fixed and relative cell referencing modes
F2: switch to cell content editing

References

To a range A1:C5

To multiple ranges: (A1:B2,B2:C4). However, if used in formulas, the () are ignored. This does not create a single range. It creates a list, and the additional entries in the list will erroneously be interpreted as the next parameters.

Arrays, Matrices, Vectors

Excel knows something called array formulas: formulas that can return arrays (i.e. lists) of values. To enter one, finish the formula with CTRL-SHIFT-RETURN (esoteric, I know). Excel then shows {} around the formula. If you don't, only the first value is returned for further processing. One example is IF(), which can return an array of TRUE and FALSE values in array mode.

The syntax to enter arrays as constants (in English language settings) is {1;2;3} for columns, and {1,2,3} for rows.

One can define named ranges, and the definition can consist of formulas instead of just cell references.

Conditional Formatting

To refer to the current cell in conditional formatting formulas, use the relative address of the upper-left cell of the formatted range. It automatically adjusts to each cell. NOTE: all formulas must start with =

Lookups

VLOOKUP: easy if the key is in the leftmost column. Always set FALSE as the last argument to force an exact match.

https://www.ablebits.com/office-addins-blog/2014/08/13/excel-index-match-function-vlookup/

INDEX(array, row_num, [column_num])

MATCH(lookup_value, lookup_array, [match_type]) type = 0 for exact lookup

INDEX/MATCH allows for two-dimensional lookups, and does not need the key column to come first.
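Here is a Python sketch of how the two functions combine into a two-dimensional lookup, assuming the table is given as a list of rows. The helper names are hypothetical. They mimic Excel's 1-based positions and the case-insensitive text comparison of MATCH with match type 0. Where Excel would show #N/A, they raise an error.

```python
from typing import Any, Sequence


def _same(a: Any, b: Any) -> bool:
    """Excel compares text case-insensitively; other values compare normally."""
    if isinstance(a, str) and isinstance(b, str):
        return a.casefold() == b.casefold()
    return a == b


def match_exact(lookup_value: Any, lookup_array: Sequence[Any]) -> int:
    """Like MATCH(lookup_value, lookup_array, 0): 1-based index of the first hit."""
    for position, item in enumerate(lookup_array, start=1):
        if _same(item, lookup_value):
            return position
    raise LookupError("#N/A")  # Excel shows #N/A when nothing matches


def index(array: Sequence[Sequence[Any]], row_num: int, column_num: int = 1) -> Any:
    """Like INDEX(array, row_num, [column_num]), counting from 1."""
    return array[row_num - 1][column_num - 1]


def two_way_lookup(table: Sequence[Sequence[Any]], row_headers: Sequence[Any],
                   col_headers: Sequence[Any], row_key: Any, col_key: Any) -> Any:
    """=INDEX(table, MATCH(row_key, row_headers, 0), MATCH(col_key, col_headers, 0))"""
    return index(table,
                 match_exact(row_key, row_headers),
                 match_exact(col_key, col_headers))
```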

Find all Cells with a formula, or all cells with hardcoded values: select a range of cells, CTRL-G for GoTo, select special and Formulas or Constants. Matching cells in Range will be selected. Unfortunately this is a manual process, not a programmable approach that just formats these cells in a sheet dynamically.

Two-way lookup to find cells where the column and row headers match given values: =sumproduct((vrange=matching-value)*(hrange=matching-value), matrix)
where vrange, hrange and matrix are ranges in the spreadsheet. The value can be either an individual cell (probably as an absolute address) or a fixed string (in double quotes).

This works because range=value returns a vector of FALSEs with TRUEs where the matches are. Multiplying the column vector by the row vector then returns a matrix with a 1 only where there was a TRUE in both vectors.

You must cast the results of a range=value comparison to turn TRUE/FALSE into 1/0. This can be done by an arithmetic operation, for example (range=test)*1. Otherwise, functions like sumproduct will not be able to calculate with them.
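Here is the same two-way lookup written out in Python, to show what the boolean vectors and their product look like. It is a sketch with hypothetical names. The headers are assumed to be text and are compared case-insensitively, as Excel's = does. If a header appears twice, SUMPRODUCT adds up all matching cells rather than picking one.

```python
from typing import Sequence


def two_way_sumproduct(vrange: Sequence[str], hrange: Sequence[str],
                       matrix: Sequence[Sequence[float]],
                       row_value: str, col_value: str) -> float:
    """Like =SUMPRODUCT((vrange=row_value)*(hrange=col_value), matrix)."""
    # range=value gives TRUE/FALSE; multiplying turns them into 1/0
    row_hits = [int(v.casefold() == row_value.casefold()) for v in vrange]
    col_hits = [int(h.casefold() == col_value.casefold()) for h in hrange]
    # a column vector times a row vector gives a matrix: 1 only where both matched
    mask = [[r * c for c in col_hits] for r in row_hits]
    # SUMPRODUCT multiplies cell by cell and adds everything up
    return sum(m * x
               for mask_row, matrix_row in zip(mask, matrix)
               for m, x in zip(mask_row, matrix_row))
```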

Logic

Numeric tests (A1>3 etc.) treat blank cells as 0s. I.e. if A1 is blank, A1>3 is FALSE and A1=0 is TRUE.
While AND and OR cannot directly deal with Arrays, NOT can, and will invert an array.

MATCH (VERGLEICH) find a search term in a list of cells, returning its index. Set the third argument to 0 to force exact matches.

INDEX: return the index value from an array (or range) at a given row (and column index).

OFFSET: return a range shifted from a given cell by a number of rows and columns, optionally with a given height (rows) and width (columns).

COLUMN (SPALTE) and ROW (ZEILE) return the column or row index of a reference. If none is provided, uses the ones from the cell the formula is in.

ADDRESS (ADRESSE) construct an address reference from column and row indices. (The inverse of the two functions above).

CELL (ZELLE) look up all kinds of properties for a cell you have at hand, such as column index in the spreadsheet, row index, value, format

INDIRECT (INDIREKT) if you have an address string, returns the contents of the corresponding cell.

VLOOKUP (SVERWEIS) and HLOOKUP (WVERWEIS): the closest thing you will get to an actual table query ... another time.

Pivot Tables

You can add derived, calculated fields (instead of providing them explicitly as extra calculated columns in the source table)

Emacs Refcard

The problem with info files about emacs key shortcuts is that they either show you only the most basic stuff, and you will get stuck when you want to do something, or they show you all the goods, so you don't know what you need to know, and what you can forget about.
The commands here are presented in layers. What I need most often and what is useful for simple general tasks is at the top. More complicated stuff, like automation of tasks or language specific expansions are below.

Basic Keys

General
Undo	C-x u
Abort command	C-g
Shell
Buffer `shell`	M-x shell
Execute command	M-!
Shell-command-on-region	M-|
Rectangles (C-x r)
Kill rectangle	C-x r k
Paste rectangle	C-x r y
Insert('open') rectangle	C-x r o
M-x UP = last M-x
M-x = ESC (release) x, ALT (hold) x^
C-x = CONTROL (hold) x
S-x = SHIFT (hold) x

^ ALT is better, as you can hold it down
and just need to push one key to reuse a
meta command repeatedly.

Files (C-x)
Open File	C-x C-f
Open File in this Buffer	C-x C-v
Save Buffer	C-x C-s
Quit Emacs	C-x C-c
Save Buffer As...	C-x C-w
Windows (C-x)
One window	C-x 1
Split Window Horizontal	C-x 2
Split Window Vertical	C-x 3
Switch to other Window	C-x o
Buffers (C-x)
Goto Buffer	C-x b
Buffer List	C-x C-b
Kill Buffer	C-x k

Editing, Cut and Paste
Set Mark	C-SPACE
Cut & delete	C-w
Paste	C-y
Cut & delete rest line	C-k
Search and Replace
Search forward	C-s text
Search again	C-s C-s
Search backward	C-r text
Search regexp	M-C-s regexp
Goto line	M-g
Replace	M-%
Replace regexp	M-C-% regexp
Replace discard	n
Replace accept	y
Replace all rest	!

Intermediate Stuff

Moving
Move Word	C-arrow
Move Paren	M-arrow
Recenter screen	C-l
Editing
Uppercase word/region	M-u / C-x C-u
Lowercase word/region	M-l / C-x C-l
Capitalize word	M-c
Transpose chars/lines	C-t / C-x C-t
Paste Clipboard	S-INS
Cut to Clipboard	S-DEL
Delete region	C-DEL
Previous yank	M-y
Keyboard Macros
Begin recording	C-x (
Stop recording	C-x )
Execute macro	C-x e
Edit last macro	C-x C-k
Various
Choose mode	`-*- perl -*-` in 1st line
Count chars	M-=
Evaluate lisp	M-:
Run shell command on region	M-|
Insert file	C-x i
Mark buffer	C-x h
Prefix command	C-u
Expand last abbrev	C-x a e

Dired mode
Use C-u for R option to list dirs recursively.
Flag for delete	d
Delete	x
Mark/Unmark	m/u
Search in marked	A
Mark by regexp	%
HTML mode
Close tag	C-c /
Entity	C-c char
Tag region	C-z letter
Fold/Unfold element	C-c C-f/u C-e
SQL mode (M-x sql-oracle)
Copy region to SQL	C-c C-r
Copy Buffer to SQL	C-c C-b
BibTex mode
Article	C-c C-b C-a
Next field	C-j
Clean entry	C-c C-c
Field Help	C-c ?
Info mode
tutorial, command list	h, i
next, previous, up a node	n, p, u
next link, follow link	<TAB>, <RET>
back	l ("el")
quit	q
Search	s regex
CVS mode
open (other window)	f(o)
cvs diff	=
update	O
commit	c, msg C-c C-c
add, remove(careful!)	a,r
mark, unmark	m,u
dir status	M-s
log	l
AucTeX/RefTeX mode
outline (directory)	C-c =
section	C-c C-s
Font face to bold/italics	C-c C-f C-b/e

Help C-h
Apropos	C-h a
Info Reader	C-h i
Key Bindings	C-h k
Describe func	C-h f
Describe var	C-h v
Hard TAB	s-TAB

Terminal
If your special keys won't work.
Beginning of line	C-a
End of line	C-e
Right (forward)	C-f
Left (backward)	C-b
Up	C-p
Down	C-n
Pagedown	C-v
Pageup	M-v
Begin	M-<
End	M->
Forward word	M-f
Delete	C-d

Customizing

Useful stuff for the .emacs file
Syntax highlighting	(font-lock-mode 1)
Delete forward	(setq delete-key-deletes-forward 1)
Expansion on	(abbrev-mode 1)
Paren highlighting	(paren-set-mode 'sexp)
Show line number	(setq-default line-number-mode 1)
Define abbreviation	(define-abbrev sql-mode-abbrev-table "pl" "dbms_output.put_line()")
Define macro	(defalias 'sql-gutschrift (read-kbd-macro "macro description"))
Set key to anonymous function	(global-set-key [f1] '(lambda() (interactive) (insert-string "Hi")))
Set key shortcuts	(global-set-key [f3] 'fill-paragraph)

Installing libraries: Simply drop the .el modules into your load path. For finding out what that path is, evaluate load-path.
Installing info files: simply drop them into your info path. To find out what that path is, evaluate Info-directory-list. Usually it's some info subdirectory of your Emacs installation. Then you have to edit the root info file (called dir), or whichever file you want to hook your new info file into. Just look at how the other entries are written. I guess it's the (filename) part that makes the info reader look for the file. You can also use (add-to-list 'Info-default-directory-list "/my/path") to add further directories where Emacs looks for info files. You still must edit the root info file.
Installing LaTeX support for MiKTeX: use AUCTeX. Run make or just follow the instructions: add the auctex folder to site-lisp, and edit the .emacs file to include it in your load path with (add-to-list 'load-path "/path/to/site-lisp/auctex/") and (load "tex-site"). You must modify the calls in tex-mik.el, which is called from tex-site.el, to fit your program calls.

Regexp syntax
Basically the syntax is the same as for Perl, except that whitespace and tab are not represented by \s and \t, and that the or-pipe and grouping parentheses have to be escaped. In Emacs, \s starts a syntax class instead, e.g. \s- for whitespace.
C-q C-j	newline
C-q TAB	tab
.	any char except newline
*	0-n times, greedy
+	1-n times, greedy
?	0-1 times, greedy
*?, +?, ??	as above, non-greedy
[ ... ]	character set, ^ at start negates
^	beginning of line
$	end of line
\	escape
\|	or
\( ... \)	grouping
\b, \B	word-boundary, non word-boundary
\w, \W	word-char, non word-char
<SPACE>	any whitespace
<TAB>	tab
\1	backref to first group

Emacs reads DOS (end-of-line: \r\n) or Unix (end-of-line: \n) text files. It does not display the \r (or ^M) from DOS files. Instead it marks them with a \ in the mode line under DOS/Windows, and with (DOS) under other operating systems. Under DOS, Unix files are marked with (Unix) in the mode line. Emacs is smart enough to write the file in the same way it was read. When you create a new file with Emacs on Windows, it's a DOS-type file. Which sucks when you save it via Samba to a Unix system, since it will contain ^M's. To save a buffer with Unix EOL format, type `C-x <RET> f unix <RET> C-x C-s'. Or add (add-untranslated-filesystem "Z:\\mydir") to your .emacs, which tells Emacs to write files for that file system (e.g. your Samba share) with Unix newlines.
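Here is a small Python sketch of the two conventions. It is not how Emacs detects them internally. The first function guesses the convention from the first line ending, roughly what the mode-line marker tells you. The second converts DOS endings to Unix ones, which is the effect of saving with the unix coding system. Both function names are made up, and old Mac-style files with only \r are not handled.

```python
def eol_type(data: bytes) -> str:
    r"""Guess 'dos' (\r\n) or 'unix' (\n) from the first line ending."""
    first_newline = data.find(b"\n")
    if first_newline == -1:
        return "unknown"  # no line ending at all
    if first_newline > 0 and data[first_newline - 1] == 0x0D:  # 0x0D is \r
        return "dos"
    return "unix"


def to_unix(data: bytes) -> bytes:
    r"""Replace DOS line endings (\r\n) with Unix ones (\n)."""
    return data.replace(b"\r\n", b"\n")
```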

2012-08-22

Learning git: how to notes

How to examine a given file from a given commit?

http://stackoverflow.com/questions/610208/how-to-retrieve-a-single-file-from-specific-revision-in-git

How to compare two revisions of a file?
If the file has the same path in both revisions, you can just do
git diff revision_1 revision_2 -- file
How to find a text snippet in any revision of a file? (i.e. grepping the history of the file)
git log --oneline -S'text snippet' -- file
This lists the commits that added or removed the snippet. (--oneline is optional, to make the output easier to read if there are many hits.)

How to compare a local file with one in a remote repo? http://stackoverflow.com/questions/5162800/git-diff-between-cloned-and-original-remote-repository

1) Update your local copy of a remote:
git fetch foobar
Fetch won't merge with your working copy (as opposed to pull). This gives you a local copy, which you can compare against with diff. If you have not set this remote up yet, do so with
git remote add foobar git://github.com/user/foobar.git
2) Compare any branch from your local repository or your working copy:
git diff master foobar/master

How to count all lines changed by a given author?
http://stackoverflow.com/questions/1265040/how-to-count-total-lines-changed-by-a-specific-author-in-a-git-repository
git log --oneline -p --author='Nirmal M G' | wc -l
This gives a rough count, since it includes the log and diff header lines. Add -M (detect renames) or -C (detect renames and copies) so that moved files do not show up as lots of added and removed lines.

Learning git: concepts

Git is not easy, because you either over-simplify, or quickly get sucked into a morass of intricate technical detail. What are the core concepts to grok?

A repository ("repo") is where snapshots of your code are stored, and can be retrieved and compared against. It's right there, next to the code, on your local drive in a folder named .git. Commits will by default go to this local repository, and commands will operate against this repository. There is no central "mothership", no master repo that many people contribute to and have to sync with, no master-slave relationship, no "ground truth". At least not enforced.

You can, however, create remote repositories and synchronize with them if you want. "Upstream" in this context is a repo you trust and pull changes in from (they flow down to you). You can also push your changes to a "downstream" repo. When you clone a repository, the remote you cloned from is called "origin" by default.

So in git you have:

a "working copy" these are files you edit, which may or may not be known to the repo already
a "staging area" (also "index"), holding changes you have marked with git add for the next commit but not committed yet
your local repo
zero or more remote repos

A commit is a snapshot of a repository at a given point in time. Each commit has a unique id to refer to it (also called "sha1" after the algorithm used to calculate it). The commit you currently have checked out, normally the latest commit on your current branch, can be accessed with the symbolic name HEAD. Each commit has pointers to the commit(s) it is based on, its parents.

A branch really is just a name for a pointer to a commit. In git, you are always on a branch, your current or checked-out branch. The default branch is named "master". When you make a commit while on a branch, the branch pointer is moved to point at it. As it is associated with a name, you can use that name in commands. A tag is similarly a symbolic name for a pointer, but it is not moved as you commit. It is intended to stick durably to a given, specific commit.
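The pointer idea can be made concrete with a toy model in Python. Everything here is hypothetical: short strings like "a1" stand in for sha1 ids, merges are not modelled, and no real git commands are involved. It shows a branch pointer moving with each commit while a tag stays on its commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    sha: str                   # stands in for the sha1
    parents: tuple[str, ...]   # ids of the commit(s) this one is based on


class Repo:
    """Toy repository: branches and tags are just names pointing at commit ids."""

    def __init__(self) -> None:
        self.commits: dict[str, Commit] = {}
        self.branches: dict[str, Optional[str]] = {"master": None}  # no commits yet
        self.tags: dict[str, str] = {}
        self.current = "master"  # the checked-out branch

    def head(self) -> Optional[str]:
        """Commit id of the checked-out branch (what HEAD resolves to)."""
        return self.branches[self.current]

    def commit(self, commit_id: str) -> None:
        parent = self.head()
        parents = () if parent is None else (parent,)
        self.commits[commit_id] = Commit(commit_id, parents)
        self.branches[self.current] = commit_id  # the branch pointer moves along

    def branch(self, name: str) -> None:
        """Like 'git branch name': a new pointer to the current commit."""
        self.branches[name] = self.head()

    def tag(self, name: str) -> None:
        """Like 'git tag name': a pointer that will not move."""
        head = self.head()
        if head is None:
            raise ValueError("nothing to tag yet")
        self.tags[name] = head

    def checkout(self, name: str) -> None:
        self.current = name
```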

(A commit is what would be called a revision in CVS, but it is not limited to a single file. If you want to work on only a single file in a given commit, you have to specify the commit and the path of the file from the top of the repository, separated by a colon, e.g. HEAD:path/to/file. If you want to compare your local copy of a file with the version in a remote repo, it gets even more complicated: you need to define that remote, fetch a local copy, and compare against that.)
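A sketch of both cases from the paragraph above, assuming a throwaway repo. The remote here is just a bare copy of the same repo made with git clone --bare; its name "upstream" and the file docs/readme.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: read a file inside a commit with <commit>:<path>, then compare a
# file with a remote's version after defining and fetching that remote.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

mkdir docs
echo "old text" > docs/readme.txt
git add docs/readme.txt
git commit -q -m "add readme"
echo "new text" > docs/readme.txt
git commit -q -a -m "change readme"

# The path after the colon is relative to the top of the repository.
old=$(git show HEAD^:docs/readme.txt)
new=$(git show HEAD:docs/readme.txt)

# Define the remote, fetch it, then diff against remote/branch.
git clone -q --bare . "$repo.remote.git"   # stand-in for a real remote
git remote add upstream "$repo.remote.git"
git fetch -q upstream
echo "local edit" > docs/readme.txt
git --no-pager diff upstream/master -- docs/readme.txt
```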

A revision is the general term for versioned objects, be they files ("blobs"), directories ("trees") or commits.

Revisions can be referred to in the following ways (see https://www.kernel.org/pub/software/scm/git/docs/gitrevisions.html):

The unique id, also called "sha1", after the algorithm to calculate it, or the first few chars of the id
A symbolic reference (called "ref") name for a commit, e.g.:

HEAD (of the currently active branch in the local repo)
A branch name (meaning the HEAD commit of that branch in the local repo)
A remote repo name (meaning the HEAD of its default branch)
A remote repo/branch name (meaning the HEAD of that)
A tag (an explicitly defined label)

A refspec is ...

As commits know their "parent", the commit (or, in case of a merge, commits) they are based on, you can refer to ancestors and ranges. "~n" means "go back n generations along the first parent", so HEAD~3 is the same as HEAD^^^, while HEAD^2 means the second parent:
^, ^^, ^^^ (etc.): parent, grandparent, great-grandparent etc. of a commit, e.g. HEAD^
^1, ^2, ^3 (etc.): first, second, third parent (in case of a merge)
..: all commits after the first one stated, up to and including the second, e.g. HEAD~4..HEAD^; leaving out the second means HEAD, e.g. HEAD~3..
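The notation above can be checked on a linear history of five commits. This is a sketch on a throwaway repo with a made-up file name; git rev-parse resolves a revision and git rev-list --count counts the commits in a range.

```shell
#!/bin/sh
# Sketch: ancestry (^, ~) and ranges (..) on five linear commits.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

for i in 1 2 3 4 5; do
  echo "$i" > count.txt
  git add count.txt
  git commit -q -m "commit $i"
done

# HEAD^^ and HEAD~2 both name the grandparent ("commit 3").
grandparent_caret=$(git rev-parse HEAD^^)
grandparent_tilde=$(git rev-parse HEAD~2)

# HEAD~3.. : after "commit 2" up to HEAD, i.e. commits 3, 4, 5
after_three=$(git rev-list --count HEAD~3..)

# HEAD~4..HEAD^ : after "commit 1" up to and including "commit 4"
middle=$(git log --format=%s HEAD~4..HEAD^)
echo "$middle"
```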

Default behaviour
The branch you are on is by default "master". A remote repo is by default called "origin", with "origin/master" as its main branch.

When working with branches and remotes, you can often give just the branch name.

Is there a common order if there are multiple revisions listed for a command?

git rebase rev1 rev2 ... rebases rev2 onto rev1
git rebase rev ... rebases your current branch onto rev: the missing value defaults to the current branch
git pull n1 n2 ... n1 is the remote repo, n2 the branch on it to merge into your current branch
git push n1 n2 ... n1 is the remote repo, n2 is your local branch.

git push n1 ... n1 is the remote repo, missing argument defaults to current branch

2012-04-10

Learning git: working with git

Finding out what is going on

git help <command> in-depth explanation of a command; without a command, general help

git status shows the status of the files in the working directory and staging area compared to the local repo (changed, deleted, untracked). It also shows whether your branch is ahead of or behind its remote branch, as of the last fetch.

git log
log shows commit ids with log messages. When no revision is given, it shows the whole commit history of the current branch. When a commit is given, it shows the commits leading up to and including it. When a range (rev1..rev2) is given, it shows the commits after rev1 up to and including rev2; rev1.. alone means "since rev1".
[path/file] only for this file
--since="1 week ago" limit by date
--until="yesterday" limit by date
-p with patch details
--author="Name" only by this author
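A sketch of these filters, plus git shortlog, on a tiny throwaway repo. The names Alice and Bob and the files are hypothetical. Note that git shortlog reads from standard input when no revision is given and it is not run in a terminal, so the sketch passes HEAD explicitly.

```shell
#!/bin/sh
# Sketch of git log filters on a repo with three commits by two authors.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"

echo a > a.txt
git add a.txt
git commit -q -m "add a"
echo b > b.txt
git add b.txt
git commit -q -m "add b" --author="Bob <bob@example.com>"
echo more >> a.txt
git commit -q -a -m "change a"

only_a=$(git log --format=%s -- a.txt)          # [path/file]
by_bob=$(git log --format=%s --author="Bob")     # --author="Name"
this_week=$(git rev-list --count --since="1 week ago" HEAD)
per_author=$(git shortlog -s -n HEAD)            # commit count per author
git --no-pager log -p -1                         # latest commit with its patch
```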

git shortlog
shows the commit messages of all commits, grouped by author

git show
show runs git log -p for a single commit, by default the most recent one. If run on a file instead, it lists the file's contents.
[<commit>] run for another commit
[<commit>:<path>] show a file as it is in that commit

git blame file
for each line gives information (author etc) from the revision which last modified it
does not tell you anything about lines which have been deleted or replaced

git diff shows differences between two revisions. If you add a file path, only for that file. Depending on the arguments, it compares:
git diff working copy vs staging area
git diff --staged staging area vs HEAD
git diff <commit> working copy vs commit
git diff <branch1> <branch2> between two branches
git diff <commit1> <commit2> between two commits
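A sketch that puts a different version of one file into each place (HEAD, staging area, working copy), so each git diff variant shows a different pair. The file name and contents are made up.

```shell
#!/bin/sh
# Sketch: HEAD has v1, the staging area v2, the working copy v3.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo v1 > f.txt
git add f.txt
git commit -q -m "v1"
echo v2 > f.txt
git add f.txt                          # v2 is now staged
echo v3 > f.txt                        # v3 only in the working copy

unstaged=$(git diff)                   # working copy vs staging area: v2 -> v3
staged=$(git diff --staged)            # staging area vs HEAD: v1 -> v2
vs_commit=$(git diff HEAD -- f.txt)    # working copy vs a commit: v1 -> v3
printf '%s\n' "$vs_commit"
```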

git ls-files
-d show deleted files

git ls-tree commit-id
lists all objects that are part of this commit (which are technically managed in a tree data structure, so the commit may contain blobs which correspond to files, and trees which correspond to directories).

git reflog
This shows where HEAD has pointed over time: every commit, checkout, reset, merge or rebase you did in your local repository.

Working with branches (locally)

All changes apply by default to your current branch.

git branch lists branches, showing the current one
-r known remote branches
-a all branches, remote or local

git branch <name> create a branch pointing to the current HEAD, but do not switch to it

git checkout <branch> switch to branch. When you switch branches, git resets your working directory to look like the snapshot of the commit you check out, changing, adding and removing files as needed. If you would lose uncommitted changes this way, git warns you and does not switch.

git checkout -b <newbranch> <branch> create and check out a new branch based on <branch>, in one go. Leaving out <branch> creates it based on the current branch
git checkout -f drop all uncommitted changes in the current branch
git checkout <file> drop changes in file
git checkout -- . drop all changes in the working directory
git checkout -- <file> drop all changes in file (the -- is needed if the file name could be mistaken for a branch name)
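A sketch of the branch and checkout commands above on a throwaway repo; the branch names feature and hotfix and the files are hypothetical.

```shell
#!/bin/sh
# Sketch: create, switch, create-and-switch, and drop a working-copy change.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

git branch feature                     # create, but stay on master
before=$(git rev-parse --abbrev-ref HEAD)

git checkout -q feature                # switch
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "feature work"

git checkout -q master                 # feature.txt disappears from the working dir
git checkout -q -b hotfix master       # create and switch in one go

echo "oops" >> base.txt                # an uncommitted change...
git checkout -- base.txt               # ...dropped again
```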

git merge <branch>

merge the named branch into the current branch. So to merge back a side branch, check out master, then merge in your branch. You can also merge in changes from master to your checked-out branch. When the merge succeeds, git automatically creates a new merge commit (unless it can simply fast-forward the branch pointer).

--squash take all the work and squash it into one set of uncommitted changes on top of the current branch, which you then commit yourself

--no-commit do not automatically make a commit at the end (useful if you want to apply further changes to the merge before committing)

Dealing with merge conflicts: if both branches changed the same part of a file, you will find conflict markers (<<<<<<<, =======, >>>>>>>) around the alternative versions in it. You need to edit the files, then git add them and commit to resolve the conflict.

git branch -d <branch> delete a no longer needed branch after you merged it in (-d refuses to delete a branch that is not merged)
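A sketch contrasting a normal merge (a merge commit with two parents) with --squash (one ordinary commit with a single parent). Branch names side and side2 and the files are made up; master gets its own commit first so the normal merge cannot be a fast-forward.

```shell
#!/bin/sh
# Sketch: normal merge vs. squash merge, then delete the merged branch.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b side                # side branch with two commits
echo s1 > side.txt
git add side.txt
git commit -q -m "side 1"
echo s2 >> side.txt
git commit -q -a -m "side 2"

git checkout -q master                 # master moves on too
echo m > main.txt
git add main.txt
git commit -q -m "master change"

git merge -q -m "merge side" side
merge_parents=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')  # commit + 2 parents
git branch -q -d side                  # allowed: side is fully merged

git checkout -q -b side2
echo x > x.txt
git add x.txt
git commit -q -m "x"
echo y > y.txt
git add y.txt
git commit -q -m "y"
git checkout -q master
git merge -q --squash side2            # changes staged, no commit yet
git commit -q -m "side2 squashed"
squash_parents=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ') # commit + 1 parent
```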

git rebase <commit> rebase your changes onto the named commit. Git figures out all the changes you made in your current branch since the last common ancestor with it, then reapplies them on top of it. Typically you rebase onto a mothership. The advantage is that adding your commits back into the mothership then does not need a merge commit with multiple parents: your branch is already based on the head of the mothership, so it can be fast-forwarded.

Rebase will stop at each conflict. After resolving the conflict (editing the files and running "git add" on them), "git rebase --continue" continues the rebase. If you want to skip the conflicting commit, use "git rebase --skip". The whole rebase can be stopped and your branch restored to how it was by using "git rebase --abort".

Do not rebase commits that you have pushed to a public repository.
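A sketch of the rebase-then-fast-forward pattern, using the two-argument form "git rebase rev1 rev2" (rebase rev2 onto rev1). Here master plays the mothership and topic is a hypothetical work branch; no conflicts occur because the branches touch different files.

```shell
#!/bin/sh
# Sketch: rebase a topic branch onto master, then fast-forward master.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b topic
echo t > topic.txt
git add topic.txt
git commit -q -m "topic work"

git checkout -q master
echo m > master.txt
git add master.txt
git commit -q -m "master moved on"
master_head=$(git rev-parse master)

git rebase -q master topic             # replay topic's commits on top of master
git checkout -q master
git merge -q --ff-only topic           # no merge commit needed
```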

For each branch, you can define in the config file a default remote, and a branch on it, to push to and pull from. Each branch has its own section there.

Working with remote branches and repositories

Remote branches are branches in a remote repo. To work with them, you first have to fetch them into the local repo, where the branch name changes to the form remote/branch. You cannot change these remote branches directly, and if the remote repo changes after you fetched, they may be outdated. You can then merge these local copies of the remote branches.

git clone source creates a local copy of a remote repo and sets the "origin" remote to it
git remote add <name> <url> add a new remote to your remotes

git remote -v show what remote repos are defined (-v: with the URLs used for fetch/push)
git remote show <name> show details about the remote

git fetch <remote> update the local repo with new commits from the remote.

git pull <remote> <branch> fetch the branch from the remote repo and merge it into your current branch. Without <branch>, the branch configured as upstream for your current branch is used.
--rebase: instead of merging, reapply your local commits on top of the fetched ones

git push <remote> <branch> push your commits to the branch on the remote (creating it if needed). Push does not merge: the remote branch is only updated if this is a fast-forward, otherwise the push is rejected. The branch on the remote itself is just named branch, like the branch in the local repo. Only the local pointer to the remote branch is named remote/branch.
<remote> defaults to the current branch's remote, or origin if none is configured.
<branch> defaults depend on the push.default setting: before git 2.0 all branches with matching names were pushed; since 2.0 only the current branch is.
":" as branch explicitly pushes all matching branches
HEAD uses the current branch
localname:remotename to use a different remote branch (name) than the local one

--all push all local branches, even those unknown in the target
-f force overwrite, needed after a rebase, as the new version of the branch is not a descendant of the old one on the remote

git fetch
fetch fetches changes from a remote repo and records the fetched commit as FETCH_HEAD in the local repo, without merging them into your local history like pull would. This way you can examine them with git log master..FETCH_HEAD and, if they are fine, merge them in with "git merge FETCH_HEAD".
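A sketch of the clone, push, fetch and merge cycle without a server: a bare repo in a temp directory stands in for the remote (in practice an ssh:// or GitHub URL), and alice and bob are two hypothetical clones of it.

```shell
#!/bin/sh
# Sketch: alice pushes, bob fetches, inspects FETCH_HEAD, then merges.
set -e
top=$(mktemp -d)
cd "$top"
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/master

git clone -q remote.git alice 2>/dev/null   # silence "cloned an empty repository"
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice"
git config user.email "alice@example.com"
echo hello > hello.txt
git add hello.txt
git commit -q -m "hello"
git push -q origin master                   # creates master on the remote
cd ..

git clone -q remote.git bob
git -C bob config user.name "Bob"
git -C bob config user.email "bob@example.com"

cd alice
echo more >> hello.txt
git commit -q -a -m "more"
git push -q origin master
cd ../bob

git fetch -q origin master                  # updates origin/master and FETCH_HEAD only
incoming=$(git log --format=%s master..FETCH_HEAD)   # what a merge would bring in
git merge -q FETCH_HEAD                     # fast-forwards bob's master
```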

Working with remote branches

Some commands automatically

git fetch origin            # Updates origin/master
git rebase origin/master    # Rebases current branch onto origin/master

git pull --rebase does both steps in one go: fetch from the remote repo, then rebase your current branch onto the fetched branch. The two commands above are more explicit.

git rebase --abort ... roll back the rebase in case it is too messy

git merge

merge merges two commits, by default the currently active branch and the one specified. If that is FETCH_HEAD, it will merge in the changes pulled with fetch.

Working with your local repository

git add file
adds the file's current content to the staging area; for a new file, this also introduces it to git for tracking.

git commit -m "Message"
commit writes the staged changes to the repository. A commit can record changes in several files. Each commit has an informational message, and gets assigned a 40-character unique id (its sha1), which git log shows. Changes can be newly added files, deleted ones, or edited ones.
[path/file] commit changes in file
-a commit all changes in tracked files, without adding them first
--amend change the message on the last commit. Use with care.
-a --amend replace the last commit with one that also contains all current changes (and the new message)
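A sketch of add, commit -a and commit --amend on a throwaway repo; the file and messages are made up. The id check assumes the default SHA-1 object format (40 hex characters).

```shell
#!/bin/sh
# Sketch: stage and commit, commit -a, then fix the last message with --amend.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo a > notes.txt
git add notes.txt                      # start tracking and stage
git commit -q -m "Add notes"
first_id=$(git rev-parse HEAD)

echo b >> notes.txt
git commit -q -a -m "Extnd notes"      # -a: no git add needed; typo in message...
git commit -q --amend -m "Extend notes"  # ...fixed by replacing the last commit
```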

Undoing changes and fixing things up

git commit --amend as long as you have not pushed the commit yet, you can add and remove files in the index, then recommit with --amend (even changing the message, if you had a typo), and it all will still be a single commit.

git reset HEAD <file> unstage a file you erroneously added to the index

git checkout -- <file> undo changes done in the working dir since the last commit

git reset --hard reset all uncommitted changes
git reset --hard HEAD@{1} undo the latest pull (HEAD@{n} is where HEAD was n moves ago, see git reflog)

git rebase -i HEAD~3, then keep "pick" for the first commit and change the others to "squash", to squash several local commits into a single one before pushing. Useful if you make intermediary commits to save your work.

git revert <commit> undo a commit by creating a new commit that reverses its changes
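A sketch of the non-interactive undo commands above on a throwaway repo (the interactive rebase is left out, as it needs an editor). File names are made up; HEAD@{1} works here because the path-limited reset and checkout do not move HEAD, so the reflog entry before the revert is the "v2" commit.

```shell
#!/bin/sh
# Sketch: unstage, discard a working-copy change, revert, undo via the reflog.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo v1 > app.txt
git add app.txt
git commit -q -m "v1"
echo v2 > app.txt
git commit -q -a -m "v2"

echo junk > junk.txt
git add junk.txt
git reset -q HEAD junk.txt             # unstage; junk.txt stays in the working dir
staged_after_reset=$(git diff --cached --name-only)

echo broken > app.txt
git checkout -- app.txt                # back to the committed "v2"
restored=$(cat app.txt)

git revert --no-edit HEAD > /dev/null  # new commit that undoes "v2"
after_revert=$(cat app.txt)

git reset -q --hard 'HEAD@{1}'         # back to where HEAD was before the revert
```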

Useful Aliases

git config --global alias.last 'log -1 HEAD'

git config --global alias.unstage 'reset HEAD --'

Learning git: Setting up git

Sources http://wiki.sourcemage.org/Git_Guide
http://cworth.org/hgbook-git/tour/

Installing

Setting up SSH keys

To set up a repository to access via ssh, you first have to set up ssh. ssh can use keys to avoid passwords. On cygwin, keys are stored by default in your ~/.ssh directory together with other ssh-relevant info. Keys come in pairs: a public key (typically stored in a file id_rsa.pub), which you deposit on the machine to log into, and a private key (typically id_rsa), which stays on your local drive. When you log in, you answer a challenge from the server with your private key, and the server checks the answer against the public key.

First, create ~/.ssh and make sure it is only accessible to you (otherwise ssh will complain that there is a security hole and exit):
mkdir ~/.ssh
chmod 700 ~/.ssh

On cygwin, chmod often does not work right by default. This has to do with how access rights and the filesystem are mapped from Windows. Set the environment variable CYGWIN to "tty ntea", either in your Windows environment (via the My Computer icon), or by adding

set CYGWIN=tty ntea
in C:\cygwin\cygwin.bat

You may also have to edit the path to your $HOME dir in the /etc/passwd file if it is somewhere other than /home/, which is put there by default and is used by the ssh tools to find your .ssh folder.

Create a public/private key pair in ~/.ssh, for example with "ssh-keygen -t rsa". Then append the public key to the server's ~/.ssh/authorized_keys, for example like this (this assumes there is an sshd running on the server, and ~/.ssh exists there):

cat ~/.ssh/*.pub | ssh user@remote-system 'umask 077; cat >>.ssh/authorized_keys'

umask makes sure the rights on the authorized_keys file are 600, in case it needs to be created, otherwise again ssh will complain about a security hole and exit.
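A sketch of the same steps against a throwaway directory instead of your real ~/.ssh, with the server side simulated locally (no sshd involved). The empty passphrase (-N "") is only for the demo; real keys should usually get one.

```shell
#!/bin/sh
# Sketch: create a key pair and an authorized_keys file with safe permissions.
set -e
fake_home=$(mktemp -d)                 # stand-in for $HOME
mkdir "$fake_home/.ssh"
chmod 700 "$fake_home/.ssh"

# -q: quiet, -N "": empty passphrase (demo only), -f: where to write the key
ssh-keygen -q -t rsa -N "" -f "$fake_home/.ssh/id_rsa"

# The remote half of the pipe from the text: append with umask 077, so a newly
# created authorized_keys gets mode 600.
cat "$fake_home/.ssh/id_rsa.pub" | (umask 077; cat >> "$fake_home/.ssh/authorized_keys")
ls -l "$fake_home/.ssh"
```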

The keys themselves can be passphrase-protected. If so, you can run an ssh-agent, which keeps the unlocked key in memory, so you only have to type the passphrase once per session. (See http://mah.everybody.org/docs/ssh for exhaustive options.) You can start this agent automatically by including this in your .profile:

SSH_ENV="$HOME/.ssh/environment"

# Start a new agent, save its variables to $SSH_ENV, and add your keys.
start_agent() {
    echo "Initialising new SSH agent..."
    # ssh-agent prints shell commands; comment out its echo so sourcing stays quiet
    /usr/bin/ssh-agent | sed 's/^echo/#echo/' > "${SSH_ENV}"
    echo succeeded
    chmod 600 "${SSH_ENV}"
    . "${SSH_ENV}" > /dev/null
    /usr/bin/ssh-add
}

# Source SSH settings, if applicable
if [ -f "${SSH_ENV}" ]; then
    . "${SSH_ENV}" > /dev/null
    # "ps ${SSH_AGENT_PID}" doesn't work under cygwin, so search the process list
    ps -ef | grep "${SSH_AGENT_PID}" | grep 'ssh-agent$' > /dev/null || {
        start_agent
    }
else
    start_agent
fi

This concludes the general SSH setup.

Cloning from an existing remote repository

This is probably what you will do most, and most probably from one on GitHub. Here we go:

Clone your fork of the project's repository onto your workstation. This will be your working repository; by convention, the remote it was cloned from is called origin and its main branch master.

git clone git@github.com:your_nick/project_name.git

Add to your working repository a reference to the project's main repository (if there is such a thing; for example, assume it is under an account called MainRepo):

cd project_name
git remote add main git@github.com:MainRepo/project_name.git

If you like, set up your name, email and such for this repository:

git config user.email "bioinformatics@schacherer.de"

Work on (and commit to) your working repository.

Merge the latest status of the main repository into your working repository. This should be done often to avoid painful conflicts:

git pull main master

Push changes to your origin:

git push
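The whole fork workflow can be tried locally. In this sketch bare repos stand in for GitHub: main.git plays MainRepo/project_name, fork.git plays your fork, and a hypothetical maintainer clone pushes a change to main in between. --no-rebase just makes the merge behaviour of git pull explicit.

```shell
#!/bin/sh
# Sketch of the fork workflow with local bare repos instead of GitHub.
set -e
top=$(mktemp -d)
cd "$top"
git init -q --bare main.git
git -C main.git symbolic-ref HEAD refs/heads/master

git clone -q main.git maintainer 2>/dev/null   # someone else's checkout of main
cd maintainer
git symbolic-ref HEAD refs/heads/master
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
git commit -q --allow-empty -m "initial"
git push -q origin master
cd ..

git clone -q --bare main.git fork.git          # "forking" on the server side
git clone -q fork.git project_name             # your working repository
cd project_name
git remote add main "$top/main.git"
git config user.name "Your Name"
git config user.email "you@example.com"

cd ../maintainer                               # the main repo moves on
echo upstream > up.txt
git add up.txt
git commit -q -m "upstream change"
git push -q origin master
cd ../project_name

git pull -q --no-rebase main master            # merge main's latest into your master
git push -q origin master                      # publish it to your fork
```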

Setting up a remote repository

See: http://toolmantim.com/articles/setting_up_a_new_remote_git_repository

On the remote machine where you want to set this up, run:
mkdir code.git
cd code.git
git --bare init --shared

This creates an empty, shared repository (--shared makes it group-writable, so several users can push to it). --bare makes the current directory itself the git directory, instead of a .git subdirectory. There is thus no working checkout of the files, and the repo is not used for work locally on the remote machine.
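A quick local check of what the commands above produce, run in a temp directory instead of on a server: the directory itself holds HEAD, objects and so on, and there is no .git subfolder.

```shell
#!/bin/sh
# Sketch: create a shared bare repo and confirm it is bare.
set -e
top=$(mktemp -d)
cd "$top"
mkdir code.git
cd code.git
git --bare init --shared --quiet
bare=$(git rev-parse --is-bare-repository)
ls
```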

Setting up a local repository

git init
create a new, empty repository in the current directory. The repository is stored as files in a .git subfolder of the working directory. You run this in the folder where your code resides.

git config
config stores configuration preferences, like your name, email, color options for the command output etc. Configuration can be stored globally for all repos in ~/.gitconfig or locally in .git/config.
--global applies changes globally. Omitting this applies them locally.
user.name "Your Name"
user.email "you@example.com"
color.ui always //switch on color output for commands

git remote
remote adds symbolic names for remote repos to your config file for easier reference. For typical full names of remote repos you access over ssh these would look like ssh://login@host.com/path/to/repo. If the repo is in your home dir, path would look like ~/repo.
add symname repo-url define a symbolic name for referring to a remote repo
rm symname get rid of it again
show symname show information about the remote (URLs, branches, tracking setup)
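A sketch of the local setup steps. It leaves out --global so that only .git/config of the throwaway repo is written and your ~/.gitconfig stays untouched; it uses color.ui auto (color only when writing to a terminal) rather than always. The name, email and ssh URL are placeholders, and nothing is contacted until you fetch or push.

```shell
#!/bin/sh
# Sketch: init, local config, and a symbolic name for a remote.
set -e
top=$(mktemp -d)
cd "$top"
git init -q
git config user.name "Your Name"           # written to .git/config
git config user.email "you@example.com"
git config color.ui auto
git remote add origin ssh://login@host.com/path/to/repo   # placeholder URL
git remote -v
```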

2003-05-28

cut, join, sort -- aping SQL with the shell

A neat little way to find out how many different words are in a certain column in a tab-delimited file (in this example the first column) under Unix is cat mydata | cut -f1 | sort -u | wc -l

Another useful tool for text manipulation is join. It joins sorted text files based on the first field, similar to an SQL join based on key equalities (or can show you lines missing from either file).
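A sketch of both tricks on made-up tab-separated data. join needs both inputs sorted on the join field with the same collation, hence LC_ALL=C for sort and join alike; -v 1 lists the lines of the first file that have no partner in the second.

```shell
#!/bin/sh
# Sketch: count distinct keys, inner join, and "missing from the other file".
set -e
export LC_ALL=C                        # sort and join must agree on the order
work=$(mktemp -d)
cd "$work"
tab=$(printf '\t')

printf 'g1\thuman\ng2\tmouse\ng1\trat\ng3\thuman\n' > mydata
printf 'g1\tkinase\ng2\tligase\ng4\tprotease\n' > annot

# Distinct values in column 1, like SELECT COUNT(DISTINCT col1)
distinct=$(cut -f1 mydata | sort -u | wc -l | tr -d ' ')

sort -t "$tab" -k1,1 mydata > mydata.sorted
sort -t "$tab" -k1,1 annot > annot.sorted

# Inner join on the first field, like ... JOIN ... ON a.col1 = b.col1
joined=$(join -t "$tab" mydata.sorted annot.sorted)
# Lines of mydata without a partner in annot
unpaired=$(join -t "$tab" -v 1 mydata.sorted annot.sorted)
printf '%s\n' "$joined"
```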

2003-05-26

Little Helpers

LFTP is a really cool command line FTP-client, which supports reverse only-newer mirroring (with exclusion of selected directories). I use it to upload this homepage.
