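A neat way to count the distinct values in one column of a tab-delimited file under Unix (in this example, the first column) is: cut -f1 mydata | sort -u | wc -l. Here cut -f1 extracts the first tab-separated field, sort -u sorts the values and drops duplicates, and wc -l counts the lines that remain.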
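To see the pipeline at work, here is a minimal sketch on a throwaway file. The sample rows and the variable names tmp and count are made up for illustration; only cut, sort, and wc come from the tip itself.

```shell
# Hypothetical sample data: three tab-delimited rows, two distinct values in column 1.
tmp=$(mktemp)
printf 'apple\t1\nbanana\t2\napple\t3\n' > "$tmp"

# Extract column 1, sort and de-duplicate it, then count the remaining lines.
count=$(cut -f1 "$tmp" | sort -u | wc -l)
echo "$count"

rm -f "$tmp"
```

Changing -f1 to -f2 would count the distinct values in the second column instead.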
Another useful tool for text manipulation is join. It joins two text files, each sorted on the join field (by default the first field), much like an SQL join on key equality. With the -v option it can instead show the lines that have no match in the other file.
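A small sketch of both uses of join follows. The file names left and right, their contents, and the variable names are invented for illustration, and the files are written already sorted on the first field; real data would typically need to be passed through sort first.

```shell
# Hypothetical sample files, space-delimited and sorted on the first field.
dir=$(mktemp -d)
printf 'a 1\nb 2\nc 3\n' > "$dir/left"
printf 'b x\nc y\nd z\n' > "$dir/right"

# Inner join on the first field: only keys present in both files.
joined=$(join "$dir/left" "$dir/right")

# -v 1 prints lines of the first file with no match in the second;
# -v 2 does the same for the second file.
only_left=$(join -v 1 "$dir/left" "$dir/right")
only_right=$(join -v 2 "$dir/left" "$dir/right")

printf '%s\n' "$joined" "$only_left" "$only_right"

rm -rf "$dir"
```

By default join splits fields on blanks; for tab-delimited files like the one in the first tip, the -t option sets the separator.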