Wednesday, July 27, 2011

One of the most anticipated events in bioinformatics is the annual Nucleic Acids Research Web Server Issue, an edition that inevitably leads to a cyclic rise in the number of “terminal masters” awarded and gives veterans in the field a chance to type various exotic foreign top-level domains, like .sg, .tw, .il, and .org, into their browsers.

Web servers, perhaps more commonly understood today as web applications, are a preferred platform for providing analysis and visualization to end users. The key difference between the web sites featured in this issue and those in NAR’s popular “Database Issue” is that these have to actually do something - typically, perform computations on user-uploaded data.

NAR has published a dedicated web server issue every year since 2003 - over 1100 applications have been introduced, though some are repeatedly featured as improvements arise. Perhaps the best way to peruse these sites is through the Bioinformatics Links Directory, a curated index of tools and databases developed by Francis Ouellette and colleagues.

The sheer volume of these tools is either inspiring or horrifying. Looking over issues from past years, I found myself bewildered that I had never heard of 95% of these websites, and I profess to do bioinformatics for a living. Still, the intended audience is often those engaged in very specific areas of research, and so many of the tools are quite specialized, or border on proofs of concept. In terms of scope, submissions are often web front-ends to an existing analysis stack - these attempts to wrap existing software can be godsends to those who don’t wish to dive into a lengthy manual and installation process that can be as pointless as a spaghetti spinner or as hostile as Robert Gallo at the Swedish Academy summer cookout. In other cases, the tools featured in this issue have proven indispensable to thousands of researchers. This list includes, but is not limited to: RSAT, T-COFFEE, EBI Tools, MEME Suite, GEPAS, Onto-Tools, and of course the venerable BLAST web form.

Due to feature creep and NAR’s lengthy review process, it is quite possible that some of this year’s submissions were actually developed around 2006, so readers will have to excuse embarrassingly dated pop culture references to things like H1N1 and microarrays.

Oh That’s Rich

One of the more annoying trends I have witnessed in my career is the rise of “Rich Internet Applications” (RIAs), Google Maps-inspired sites designed to behave like desktop applications. Many of the frameworks used to build these RIAs dispense with useless internet-y stuff like, um, hyperlinks. And RIAs certainly do feel like real desktop applications - just without all that annoying speed or a standardized user interface.

RIAs have made it into submissions to both the web server and database issues of NAR. The real crime here is the dreaded “unchanging URL” so common in these apps. When a researcher performs a search or finds a result of interest, the first instinct is to bookmark the page or email a link to a colleague. That researcher might not realize that neither the search they have performed nor the answers they have obtained are bookmarkable; they are instead the result of a stateful and unrecorded conversation with the server. In my opinion, any slick interface that prevents natural URL sharing in this way is a raw deal.
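
To illustrate the alternative, here is a minimal sketch (the endpoint, parameter name, and toy data are all hypothetical, not from any featured tool) of a search page whose entire state lives in the query string, so every result is bookmarkable by construction:

    # Minimal sketch of a bookmarkable search endpoint, assuming Flask
    # is installed. The route, parameter, and data are hypothetical.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # Toy lookup table standing in for a real analysis backend.
    MOTIFS = {"TATA": "TATA box", "CAAT": "CAAT box", "GC": "GC box"}

    @app.route("/search")
    def search():
        # All state is in the URL, e.g. /search?q=TATA - the address
        # bar itself records the query, with no server-side session.
        query = request.args.get("q", "").upper()
        hits = {name: desc for name, desc in MOTIFS.items() if query in name}
        return jsonify(query=query, hits=hits)

    if __name__ == "__main__":
        app.run()

Paste /search?q=TATA into an email and the recipient reproduces the exact result - no stateful conversation required.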

The looming sequence problem

To boot, none of this interface stuff actually solves the principal challenge posed by next-generation sequencing, namely the inability of computer networks to quickly upload gigabases of sequence data to web-based tools.
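
Some back-of-the-envelope arithmetic makes the point (the run size and uplink speed below are illustrative assumptions, not measurements):

    # Back-of-the-envelope upload time; both numbers are assumptions.
    run_gigabytes = 30                    # one compressed sequencing run
    uplink_mbit_per_s = 10                # an optimistic academic uplink
    seconds = run_gigabytes * 8 * 1000 / uplink_mbit_per_s
    print(f"{seconds / 3600:.1f} hours")  # ~6.7 hours for a single upload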

Correctly identifying this as a problem, NAR introduced in its 2010 issue a dedicated section called “Stand-Alone Programs for High-Throughput Data” with an inaugural class of 5 programs. The explosion of interest in NGS brought with it this year’s high-throughput entries...4 programs. To be fair, many standalone programs find their way into other NAR issues and other journals, and many web servers offer a full download or a command-line component.

Because I have yet to find a PI who will admit their sequencing work will remain strictly “low-throughput”, I can only predict that sequence-based web tools will change radically - for example, by accepting some sort of yet-to-be-invented lossy or highly compact reference-based compression format.
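
To make the idea concrete, here is a toy sketch of reference-based compression (the record format is entirely hypothetical and illustrative, not an existing standard): reads that align to a known reference are stored as a position plus their mismatches rather than as full sequences:

    # Toy reference-based compression; the record format is a
    # hypothetical illustration, not any real compression standard.
    REFERENCE = "ACGTACGTACGTACGTACGT"

    def compress_read(read, pos):
        """Store a read as (position, length, [(offset, base), ...]),
        keeping only the bases that differ from the reference."""
        diffs = [(i, b) for i, b in enumerate(read)
                 if REFERENCE[pos + i] != b]
        return (pos, len(read), diffs)

    def decompress_read(record):
        pos, length, diffs = record
        bases = list(REFERENCE[pos:pos + length])
        for offset, base in diffs:
            bases[offset] = base
        return "".join(bases)

    read = "ACGTTCGT"                 # one mismatch vs. the reference
    record = compress_read(read, 0)
    assert decompress_read(record) == read
    print(record)                     # (0, 8, [(4, 'T')])

Since most reads match the reference almost everywhere, each record is dominated by a couple of integers rather than the read itself.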

Another section of recent years’ Web Server issues is reserved for “web services”, software designed to be accessed by robots or the socially awkward. Web services generally use one of two approaches - REST, which uses standard URLs and typically returns XML (e.g. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pmc&term=smegma), and SOAP, an awful thing which should never be spoken of again.
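
The appeal of REST is that the E-utilities URL above can be consumed with nothing but the standard library (a minimal sketch; the Count and Id fields are what NCBI’s esearch XML returns):

    # Minimal REST client for the E-utilities URL cited above,
    # using only the Python standard library.
    import urllib.request
    import xml.etree.ElementTree as ET

    URL = ("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
           "?db=pmc&term=smegma")

    with urllib.request.urlopen(URL) as response:
        tree = ET.parse(response)

    count = tree.findtext("Count")            # total matching records
    ids = [e.text for e in tree.iter("Id")]   # first page of record IDs
    print(f"{count} PMC records; first few IDs: {ids[:5]}")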


Abandon ship!

Of the 1100+ tools that have been featured in the web server issue, a majority have been marginalized by newer alternatives, others never took hold for various reasons, and some fraction are simply no longer supported by the original developers. In some cases a web application is outright pulled from its server by the institutional host.

The phenomenon of abandonware - software that has ceased to be maintained or supported - is certainly not unique to bioinformatics, but it is endemic to the field. The causes are all too clear: a grad student develops a web application and then leaves, the tool does not win ongoing funding, and maintaining and updating the application becomes a burden to the PI. I personally am guilty of such abandonment - my 2004 NAR splicing application is essentially a walking zombie.

The bioinformatics field has not definitively decided whether abandonware is really a problem that needs fixing or just the result of a natural winnowing process. If a web application is open source and freely available, then those who are motivated to obtain it should, in theory, be able to get an instance running or updated. From a practical standpoint, however, NAR submitters must agree to support a tool for some fixed period after publication. I was curious to see whether any of the 2010 entrants had the audacity to go deadbeat only 12 months out. Two I found came up 404: in one case the author said the web server was under maintenance but would be up shortly (which turned out to be true); the other author was completely unreachable.
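
This kind of survey is easy to automate. A quick sketch (the URL list is a placeholder, not the actual 2010 entrants) that flags published tool URLs which no longer respond:

    # Quick availability check for published tool URLs. The list is
    # a placeholder - substitute the URLs from the issue of interest.
    import urllib.error
    import urllib.request

    TOOL_URLS = [
        "http://example.org/some-2010-webserver/",    # hypothetical
        "http://example.net/another-tool/",           # hypothetical
    ]

    for url in TOOL_URLS:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                print(f"{resp.status:>4}  {url}")     # 200 = alive
        except urllib.error.HTTPError as err:
            print(f"{err.code:>4}  {url}")            # e.g. 404 = gone
        except (urllib.error.URLError, OSError) as err:
            print(f" ERR  {url}  ({err})")            # DNS/timeout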

One solution could be an obituary section of the web server issue, where sites are deemed dead, stale, or merely irrelevant by a panel of experts. Corresponding authors could then indicate whether they abandoned ship out of laziness, guilt, or simply spite.

---------------

Jeremy Leipzig is a bioinformatics programmer working at a pediatric hospital.  You can hear more from him on his blog or over on twitter.


Blog Comments

Donnie Berkholz
Mayo Clinic

I tried plotting the differences vs time by eye from your plot, and it looks pretty interesting. Seems that the inflection point is around 3–4 years before it's considered OK to abandon (at least according to the PIs for the 20%–25% of servers that are eventually left to rot). I guess this is enough time to rack up whatever self-citations the PI had in mind for that chain of projects and then move on.

 


Jeremy Leipzig
Children's Hospital of Philadelphia
Percentage of web servers still reachable, by year of publication:

    Year   Reachable
    2003    80.92%
    2004    86.13%
    2005    81.33%
    2006    87.33%
    2007    91.54%
    2008    94.68%
    2009    96.43%
    2010    96.72%
    2011   100.00%

Yes, the "404 rate" really picks up after 4 years. I would imagine the "freshness curve" to be much steeper.
