
A Little Of The Tree Of Life

10/02/07

08:56:00 pm, by Nimble, 953 words
Categories: Thoughts, Science

So you want to see a little bit of the tree of life for yourself. Maybe use some of the bioinformatics tools on the Internet to see some of the relatedness of life on earth. Can you do that as a layman? Sure you can, and here's a little introduction as to how.

A good protein to try is the human hemoglobin alpha chain, a component of hemoglobin, the oxygen carrier in your blood.

Let's do this thing!

First, let's go to the ExPASy Proteomics Server, and put in a search request for human hemoglobin alpha chain:


[Screenshot: ExPASy Search]


You'll get a few search results. Here, we'll use this one:

Search in UniProtKB/Swiss-Prot: There are matches to 1 out of 285335 entries

HBA_HUMAN (P69905)
Hemoglobin subunit alpha (Hemoglobin alpha chain) (Alpha-globin). {GENE: Name=HBA1; and Name=HBA2} - Homo sapiens (Human)

Click on the HBA_HUMAN link, and you'll get a rather detailed results page:


[Screenshot: Hemoglobin results]
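
If you'd rather grab the sequence with a script than by clicking around, here is a minimal sketch. It assumes today's UniProt website serves an entry as FASTA text at rest.uniprot.org/uniprotkb/<accession>.fasta, which is not the interface shown in the screenshots here, and the function names are made up for illustration.

```python
from urllib.request import urlopen


def parse_fasta(text):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    # The header is the first line minus the leading '>';
    # the sequence is everything after it, joined into one string.
    return lines[0][1:], "".join(lines[1:])


def fetch_fasta(accession):
    """Download one entry as FASTA text (assumed URL pattern, needs network)."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


# Usage (not run here, since it needs a network connection):
#   header, seq = parse_fasta(fetch_fasta("P69905"))
#   print(header, len(seq))
```

P69905 is the accession for HBA_HUMAN from the search result above.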


Click on [Tools], then on BLAST submission on ExPASy/SIB.

*NOTE: Once they finally get the beta version all tested out, you will see something like this instead:


[Screenshot: Hemoglobin results, beta interface]


If you see this, or if you clicked on the "Notice: This page will be replaced with beta.uniprot.org" link, then click the Blast tab to get to the BLAST submission.

Just choose "Run Blast" or click the Blast! button, whichever is available in the interface you see.

It will take a little while, anywhere from 30 seconds to 2 minutes, so be patient.

You will then get a list of hits, sorted by E-value. The E-value is the number of matches scoring at least this well that you would expect to turn up purely by chance in a database of this size, so smaller is better. With high identity and a very low E-value, you are almost certainly looking at homologs: the equivalent protein in other species or, in some cases, proteins whose genes have been duplicated and modified.
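
To get a feel for how the E-value relates to the score, here is a small sketch. It assumes the standard BLAST relationship E = (search space) x 2^(-bit score). The search space is not a real database statistic: it is back-computed from the chimpanzee hit further down this post (286 bits, E = 2e-76), so treat the results as ballpark figures. The function names are made up for illustration.

```python
def search_space_from_hit(bit_score, e_value):
    """Back out the effective search space implied by one reported hit."""
    return e_value * 2.0 ** bit_score


def evalue(bit_score, search_space):
    """Expected number of chance hits scoring at least bit_score."""
    return search_space * 2.0 ** (-bit_score)


# Assumption: calibrate on the chimpanzee hit reported in this post.
SEARCH_SPACE = search_space_from_hit(286, 2e-76)

# Bit scores as reported for the other hits in this post.
for name, bits in [("gorilla", 283), ("red colobus", 272), ("mouse", 254)]:
    print(f"{name}: E is roughly {evalue(bits, SEARCH_SPACE):.1e}")
```

Every bit of score you lose doubles the E-value, which is why the numbers climb so quickly as you go down the list. The results land close to the E-values BLAST reports for those hits, give or take the rounding in the reported bit scores.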

The alignments are the more interesting feature: each one shows a residue-by-residue comparison between your query sequence and the matching sequence.

What you'll notice is a distinct abundance of primates in the very lowest E values. For example, with the chimpanzee, Pan troglodytes:

sp P69907
HBA_PANTR Hemoglobin subunit alpha (Hemoglobin alpha chain) (Alpha-globin)
[HBA1] [Pan troglodytes (Chimpanzee)] 142 AA
align

Score = 286 bits (733), Expect = 2e-76
Identities = 142/142 (100%), Positives = 142/142 (100%)

Query: 1 MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG 60
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
Sbjct: 1 MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG 60

Query: 61 KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP 120
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
Sbjct: 61 KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP 120

Query: 121 AVHASLDKFLASVSTVLTSKYR 142
AVHASLDKFLASVSTVLTSKYR
Sbjct: 121 AVHASLDKFLASVSTVLTSKYR 142

It's a 100% match, amino acid for amino acid.

(Note, if you're using the new interface, you will have to click on the Show (1) under the Local alignment column to see these)

How about the lowland gorilla?

sp P01923
HBA_GORGO Hemoglobin subunit alpha (Hemoglobin alpha chain) (Alpha-globin)
[HBA] [Gorilla gorilla gorilla (Lowland gorilla)] 141 AA
align

Score = 283 bits (725), Expect = 2e-75
Identities = 140/141 (99%), Positives = 141/141 (100%)

Query: 2 VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK 61
VLSPADKTNVKAAWGKVGAHAG+YGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK
Sbjct: 1 VLSPADKTNVKAAWGKVGAHAGDYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK 60

Query: 62 KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA 121
KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA
Sbjct: 61 KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA 120

Query: 122 VHASLDKFLASVSTVLTSKYR 142
VHASLDKFLASVSTVLTSKYR
Sbjct: 121 VHASLDKFLASVSTVLTSKYR 141

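Looks like the gorilla entry doesn't include the leading methionine (M), so the alignment starts at position 2 of the human sequence, and there's a single substitution: an aspartic acid (D) instead of a glutamic acid (E).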
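
If you'd like to check that by machine rather than by eye, here is a minimal sketch that compares the two sequences position by position. The sequences are copied from the alignment above. The comparison only works because this alignment has no gaps, and the function name is made up for illustration.

```python
HUMAN = (
    "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG"
    "KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP"
    "AVHASLDKFLASVSTVLTSKYR"
)
GORILLA = (
    "VLSPADKTNVKAAWGKVGAHAGDYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK"
    "KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA"
    "VHASLDKFLASVSTVLTSKYR"
)


def mismatches(query, subject, query_start=1):
    """List (query position, query residue, subject residue) for an ungapped alignment."""
    if len(query) != len(subject):
        raise ValueError("ungapped comparison needs equal-length strings")
    return [
        (query_start + i, q, s)
        for i, (q, s) in enumerate(zip(query, subject))
        if q != s
    ]


# The BLAST alignment pairs human positions 2-142 with gorilla positions 1-141,
# so drop the human methionine before comparing.
diffs = mismatches(HUMAN[1:], GORILLA, query_start=2)
identical = len(GORILLA) - len(diffs)

print(f"Identities = {identical}/{len(GORILLA)}")
for pos, q, s in diffs:
    print(f"position {pos}: human {q}, gorilla {s}")
```

This reports 140 identical positions out of 141 and a single mismatch at human position 24 (counting the methionine as position 1), where E becomes D.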


A couple of asides:

If the actual amino acids matter to you out of curiosity, look here for the letter codes for amino acids

Also interestingly, the difference between aspartic and glutamic acids isn't that much, as far as amino acids go. They're the only two acidic amino acids, for one. Here and here are a little bit more about them.


That's remarkably similar for a protein between two species. Let's continue down the list a little bit...

How about... a red colobus monkey?

sp P01930
HBA_COLBA Hemoglobin subunit alpha (Hemoglobin alpha chain) (Alpha-globin)
[HBA] [Colobus badius (Red colobus) (Procolobus badius)] 142 AA
align

Score = 272 bits (696), Expect = 3e-72
Identities = 135/142 (95%), Positives = 136/142 (95%)

Query: 1 MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG 60
MVLSPADKTNVK AWGKVG H GEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
Sbjct: 1 MVLSPADKTNVKTAWGKVGGHGGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG 60

Query: 61 KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP 120
KKVADALT A AHVDDMP+ALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAH PAEFTP
Sbjct: 61 KKVADALTLAAAHVDDMPSALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHHPAEFTP 120

Query: 121 AVHASLDKFLASVSTVLTSKYR 142
AVHASLDKFLASVSTVLTSKYR
Sbjct: 121 AVHASLDKFLASVSTVLTSKYR 142

Now the match is getting a little further off. The hemoglobin alpha chain of the red colobus is 'only' 95% identical, and is still doing the same job as the human equivalent.

Some species will show up more than once as you go down the list. Those extra hits are for related proteins such as the alpha-2 chain, so do not be too distracted :)

Let's take... the common mouse, Mus musculus:

sp P01942
HBA_MOUSE Hemoglobin subunit alpha (Hemoglobin alpha chain) (Alpha-globin)
[Hba] [Mus musculus (Mouse)] 142 AA
align

Score = 254 bits (648), Expect = 1e-66
Identities = 122/142 (85%), Positives = 131/142 (92%)

Query: 1 MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG 60
MVLS DK+N+KAAWGK+G H EYGAEALERMF SFPTTKTYFPHFD+SHGSAQVKGHG
Sbjct: 1 MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHFDVSHGSAQVKGHG 60

Query: 61 KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP 120
KKVADAL +A H+DD+P ALSALSDLHAHKLRVDPVNFKLLSHCLLVTLA+H PA+FTP
Sbjct: 61 KKVADALASAAGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLASHHPADFTP 120

Query: 121 AVHASLDKFLASVSTVLTSKYR 142
AVHASLDKFLASVSTVLTSKYR
Sbjct: 121 AVHASLDKFLASVSTVLTSKYR 142

Even more different, yet still a very, very good match.

Now, you can't rely on one protein alone to build a tree of life this way. In particular, proteins don't all change at the same rate, and the same protein doesn't change at the same rate in every lineage. It is even possible for a 'closer' relative to have a more different version of a particular protein than a more distant relative does, depending on how fast that protein evolved in each lineage.
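
With that caveat in mind, here is a rough sketch of the very first step of tree building: counting the differences between every pair of the four sequences quoted above, not just each one against human. It is only a raw difference count on a single protein, not a real phylogenetic method. It also assumes the sequences line up without gaps once the leading methionine is dropped, which happens to be true for these four. The names are made up for illustration.

```python
from itertools import combinations

# Sequences copied from the BLAST alignments in this post.
SEQS = {
    "human": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG"
             "KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP"
             "AVHASLDKFLASVSTVLTSKYR",
    "gorilla": "VLSPADKTNVKAAWGKVGAHAGDYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK"
               "KVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPA"
               "VHASLDKFLASVSTVLTSKYR",
    "red colobus": "MVLSPADKTNVKTAWGKVGGHGGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG"
                   "KKVADALTLAAAHVDDMPSALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHHPAEFTP"
                   "AVHASLDKFLASVSTVLTSKYR",
    "mouse": "MVLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHFDVSHGSAQVKGHG"
             "KKVADALASAAGHLDDLPGALSALSDLHAHKLRVDPVNFKLLSHCLLVTLASHHPADFTP"
             "AVHASLDKFLASVSTVLTSKYR",
}


def strip_initial_met(seq):
    """Drop a leading methionine so all four sequences cover the same 141 positions."""
    return seq[1:] if seq.startswith("M") else seq


CORE = {name: strip_initial_met(seq) for name, seq in SEQS.items()}


def count_differences(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


# Difference count for every unordered pair of species.
DIFFS = {
    frozenset((a, b)): count_differences(CORE[a], CORE[b])
    for a, b in combinations(CORE, 2)
}

# Print the pairs from most similar to least similar.
for pair, n in sorted(DIFFS.items(), key=lambda item: item[1]):
    a, b = sorted(pair)
    print(f"{a:>11} vs {b:<11} {n:3d} differences")
```

The pairs with the fewest differences are the ones a tree-building program would tend to join first. Real tools work from proper multiple alignments of many proteins or genes and use models of how sequences change over time.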

Remember, too, that in the tree of life, all existing creatures are on the branches. Apes may have evolved from monkeys, for example, but not from any existing monkeys, even if the common ancestor was much more similar to an existing monkey than to the ape in question.

Anyhow, I hope that gave you a little taste of how comparing the proteins encoded in DNA helps in working out the tree of life.

At the very least, I hope you are mildly surprised at how strong the similarities are between proteins in humans and chimpanzees, then other apes, then monkeys.

There aren't complete database entries for all species for all proteins yet (so many searches will just be comparing humans to rats, mice, chickens, etc.), but you can find proteins sequenced for specific primates by typing in the primate name in an ExPASy search (e.g. a list of proteins sequenced for orangutans), and hopefully the repository will keep expanding as work is done.

3 comments

Comment from: Adam [Member]  

I just wish to say that this is the least readable page put together so far! Excellent! :)

10/04/07 @ 18:43
Comment from: Nimble [Member]  

What part’s confusing you, Adam, apart from “all of it”? :)

10/05/07 @ 00:07
Comment from: Nimble [Member]  

There - do a few horizontal rules help? :)

11/16/07 @ 14:47