Using FlyMine power in Perl…

…or Python. Or Ruby. Or Java.

Our research group shares office space with Gos Micklem’s group of InterMine developers, which is why we often get access to great new tools at the beta stage. One that has matured into something extremely useful is Alex Kalderimis’ API for accessing InterMine instances, available for a variety of languages (Perl, Python, Ruby, Java).

I’ve previously shared my love for FlyMine on Nature’s Spot On blog. However, one usability problem I have with the FlyMine website is list management. My research frequently generates dozens of lists, most of which I only need to skim quickly for ‘anything unusual’. For a quick analysis I prefer doing this with FlyMine, especially as I can also see publication, gene expression or protein domain enrichment at the same time. However, everyone who has used FlyMine knows that uploading a gene list can sometimes become a research project in itself, especially if ambiguous identifiers have been used. So handling a dozen temporary lists can become a bit of a pain.

This is where the powerful new API comes in. Most of my gene lists come from either a MySQL database or a Perl script.

With the API I can now:
  • upload a gene list
  • save it to my personal FlyMine account for later viewing (or not, i.e. keep temporary)
  • get enrichment statistics back to my script
And here is how it works, in a dense 10-line example:
1: use strict;
2: use Webservice::InterMine; #footnote 1
3: my @genesymbols = ('hb','bcd','gt','kr','eve'); #footnote 2
4: my $TOKEN = "flgjrhgifhlgdlgjfhklgflsdhg"; #footnote 3
5: my $service = Webservice::InterMine->get_service('', $TOKEN); #footnote 4
6: my $list = $service->new_list(content => \@genesymbols, type => 'Gene', name => 'funny list name'); #footnote 5
7: my %args = (widget => 'go_enrichment_for_gene', maxp => 0.05, population => 'library'); # footnote 6
8:  foreach my $term ($list->enrichment(%args)->get_all) {
9:    printf "%s (%s) - %.2e %u\n", $term->{'identifier'}, $term->{'description'}, $term->{'p-value'}, $term->{'matches'};
10: }
  1. I’ve had some painful experiences installing the module under OS X Leopard, but under OS X Lion the installation was absolutely straightforward.
  2. This is where the magic happens. I routinely fill the array with results from complex MySQL queries and such.
  3. You can obtain the token for your personal FlyMine account by going to MyMine -> Account Details.
  4. This is where the connection to FlyMine (or any other InterMine instance) gets established. You can leave out $TOKEN if you don’t intend to save your list.
  5. This uploads your gene list. In this case, it would be saved as ‘funny list name’, and omitting name simply makes it a temporary list.
  6. Here we’re interested in Gene Ontology enrichment for this list. We want everything with p < 0.05 and use the gene list called ‘library’ as background for the hypergeometric test. Omitting population simply uses the entire gene universe as background. Although this ($list->enrichment(%args)->get_all) is probably one of the most powerful commands of the API, it’s where I find the official InterMine API cookbook falls short. Only when you consult the widgety guts of InterMine can you see what the enrichment tools are actually called: bdgp_enrichment, publication_enrichment, go_enrichment_for_gene, prot_dom_enrichment_for_gene, pathway_enrichment, miranda_enrichment. And they all know how to talk about their results, so you can check what the hash keys in line 9 can be.
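For anyone wondering what the hypergeometric test in footnote 6 actually computes, here is a minimal sketch in plain Perl. It is only an illustration of the textbook one-sided test, not InterMine's own code, and the server may well apply further steps (such as multiple-testing correction) that are not shown. The function names and the toy numbers are made up for this example.

```perl
use strict;
use warnings;
use List::Util qw(min);

# log(n!) as a plain sum of logs; slow but fine for gene-list sizes.
sub log_factorial {
    my ($n) = @_;
    my $sum = 0;
    $sum += log($_) for 2 .. $n;
    return $sum;
}

# log of the binomial coefficient "n choose k" (assumes 0 <= k <= n).
sub log_choose {
    my ($n, $k) = @_;
    return log_factorial($n) - log_factorial($k) - log_factorial($n - $k);
}

# Probability of seeing at least $list_hits annotated genes in a list of
# $list_size genes drawn from a background of $pop_size genes, of which
# $pop_hits carry the annotation.
sub hypergeom_pvalue {
    my ($pop_size, $pop_hits, $list_size, $list_hits) = @_;
    my $log_total = log_choose($pop_size, $list_size);
    my $p = 0;
    for my $i ($list_hits .. min($list_size, $pop_hits)) {
        # Skip impossible draws (more unannotated genes than exist).
        next if $list_size - $i > $pop_size - $pop_hits;
        $p += exp( log_choose($pop_hits, $i)
                 + log_choose($pop_size - $pop_hits, $list_size - $i)
                 - $log_total );
    }
    return $p;
}

# Toy numbers: background of 10 genes, 4 annotated; list of 3 genes, 2 annotated.
printf "p = %.4f\n", hypergeom_pvalue(10, 4, 3, 2);
```

With these toy numbers the sum is (6*6 + 4*1) / 120 = 1/3. The choice of background ('library' versus the entire gene universe) changes the first two arguments, which is why it can shift the p-values so much.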
Alex is usually pretty good at getting back to emails too, so why don’t you give the API a shot? It can also do a lot more than I described here.

PS: To be consistent with my previous blog post, there’s always the workhorse of Gene Ontology enrichment, Ontologizer, but I use that only for in-depth analysis once I’m convinced something is worth looking at.
