Are there any existing solutions for creating a generic DNA sequence database with a website front end?

https://stackoverflow.com/questions/1890285

19-09-2019

Question

I'd like to create an rRNA sequence database with a web front end for the lab I work in. It seems common in biology to want to search a large number of sequences using alignment algorithms such as BLAST and HMMER, so I wondered whether there are any existing PHP/Python/Rails projects that allow easy creation of a generic sequence database with a website search form?

UPDATE: GMOD is the type of server I was looking for. It was also suggested that I look at BioMart, which appears to have similar functionality.

Solution

Something a little less barebones is http://gmod.org/ - the simplest installation should give you a BLAST form and a "sequence browser" interface. I don't know if there's a HMMER form yet...

(It scales pretty well - from a simple SQLite file to a real database.)

Alternatively, you may want to look into the Galaxy server: http://main.g2.bx.psu.edu/
Its first aim is making complex genomic queries easy for non-computational people, but I don't know if it has BLAST out of the box.

cheers, yannick

UPDATE - Inspired in part by this post, we are developing a simple local blast server as an easy-to-deploy alternative to wwwblast. Work in progress at http://www.sequenceserver.com. A demo server lets you BLAST ant genomes.

OTHER TIPS

This will probably be overkill, but NCBI has a lot of software available. Link.

In particular, this.

There's a simple CGI front-end distributed with the NCBI BLAST package as well. You can download it from their FTP site, which is here:

ftp://ftp.ncbi.nih.gov/

It's not in any of the languages you mention, but there is BioPerl, a collection of Perl modules made specifically for working with DNA, RNA and protein sequences.

Look for it on CPAN (cpan.org).

I'd strongly suggest contacting the bioinformatics community. The most important thing is to design the database and decide its purpose. You mention DNA in the title but rRNA in the text - these are completely different things. If it's only a typo, fine - but if you don't understand the difference then talk with people in the community.

Since I'm involved in the community you might like to contact the MyExperiment community (http://en.wikipedia.org/wiki/MyExperiment) and mention my name if you need to. You'll find lots of friendly people and help.

UPDATE I've just noticed you are from Manchester and that's the hub of MyExperiment so it really is the obvious place to start!

Concerning GMOD: I am relatively sure that GMOD is complete overkill for your application. GMOD is not a server, it's a collection of tools, the database schema (Chado) being one of them, and Chado is not really for someone who will mostly have sequences and ids. BioMart is not a server either, it's a tool that permits de-normalization of model databases, to be able to run whole-genome queries fast enough. One of the BioMart clients (MartView) comes as a web interface. You definitely don't want to use BioMart at the moment, but I can explain that in detail by email. I have the impression that you rather need a web-based BLAST client to get started.

Galaxy: Galaxy is not a database, it's a website with tools to work with (mostly DNA) sequences from various genomes. Galaxy is tightly linked with the UCSC genome browser sequences, tools and file formats. So if you want to create a database of entirely new sequences, Galaxy is not for you. It doesn't include any BLAST servers either. If you want to create a database of sequences, Chado as part of GMOD comes close, but I'd rather start with a text file; see my other post.

Maybe you can look at Plone4Bio.

Plone is an extensible content management system written in Python, with a lot of features and easy-to-use applications, so you can create your website from a collection of modules such as forums, news products, etc. (I know you know this already, but it is just to give a bit of background.)

Plone4Bio aims to provide some Plone applications for bioinformatics. I don't know how far along the project is and I haven't used it yet, but it seems that at least you get a sequence object and some apps for visualizing it, and probably some applications to search sequences. (P.S. They use it at UniProt - look at the 'Third party data' section for any membrane protein.)

I don't know of any other CMS apps aimed at bioinformatics, but you could probably also implement something with Django without too much effort.

Having no idea about what format the information will be stored in, or how DNA sequences are displayed (is it just a long string?), you may be able to get away with simply inserting each DNA sequence into a MySQL database and then executing a simple query like:

SELECT * FROM `dna_table` WHERE `sequence` = $sequence;

Make sure you escape the string or use a parameterized query (to prevent SQL injection), but other than that, this sounds like a REALLY simple DB program that shouldn't be more than about 100 lines of code.
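A minimal sketch of such a parameterized lookup, assuming Python and its built-in sqlite3 module in place of MySQL so that it runs without a server. The table name dna_table comes from the query above; the column layout, the function name find_exact and the sample rows are made up for illustration.

```python
import sqlite3


def find_exact(conn, sequence):
    """Return (id, sequence) rows whose sequence matches the query exactly."""
    # The "?" placeholder lets the driver handle quoting, so user input is
    # never pasted into the SQL text itself.
    cur = conn.execute(
        "SELECT id, sequence FROM dna_table WHERE sequence = ?",
        (sequence.upper(),),
    )
    return cur.fetchall()


# Hypothetical in-memory database with two invented sequences.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dna_table (id TEXT PRIMARY KEY, sequence TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO dna_table VALUES (?, ?)",
    [("seq1", "ACGTACGT"), ("seq2", "GGGTTTAA")],
)

print(find_exact(conn, "acgtacgt"))
```

Note that an equality test only finds sequences identical to the query; searching by similarity is what alignment tools such as BLAST are for.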

I agree: You should post your question to bbb@bioinformatics.org or the bioperl mailing list.

The question "easy creation of a generic sequence database with a website search form" seems too general. A sequence database is a list of (id, sequence) and by itself doesn't need any tool support. At least I don't see any reason why you would need a tool for that.
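To illustrate the "list of (id, sequence)" point, here is a small sketch, assuming Python and sequences kept in a FASTA-style text file. The function name parse_fasta and the sample records are invented for the example.

```python
def parse_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA-formatted lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            # The id is the first whitespace-delimited token after ">".
            seq_id, chunks = (header[0] if header else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


# Invented sample records; a real file would be passed as open("file.fa").
sample = [
    ">rrna_1 example fragment",
    "ACGTAC",
    "GGTT",
    "",
    ">rrna_2",
    "TTGACA",
]
database = dict(parse_fasta(sample))
print(database)
```

A plain dictionary like this is already a usable "database" for lookups by id.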

I think your question is: Is there a BLAST client with a web form that one can install locally? There are some: PLAN might be worth a try, though I never had it running. BioPerl has objects for standalone BLAST execution (http://doc.bioperl.org/releases/bioperl-1.0/Bio/Tools/Run/StandAloneBlast.html) and can display the results graphically. Debian/Ubuntu Med have ncbi-tools-bin and ncbi-rrna-data, which install the necessary tools and databases in a couple of seconds.

Instead of pondering tool support, I would rather hack together a 10-line CGI script that runs BLAST with an input sequence against the FASTA files that you have, and then see if the users aren't already happy with that.

Concerning the programming language: If you like, you can do this with a shell script (*). That might even take you less time than posting on stackoverflow... ;-)

(*) Note to paranoid computer science colleagues: it's going to be an internal application for biologists who don't know the difference between an operating system and operator overloading, so SQL injections are very, very unlikely...

I think this is an example where premature optimization is evil enough, in the sense that you can lose tons of time designing a system too complex for a simple task. In the spirit of agile programming, if you like software engineering buzzwords, you might simply hack something together and then try it on your users before thinking about the architecture.
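As a rough sketch of the core of such a script (in Python rather than as a CGI program), the following assumes that the blastn program from NCBI BLAST+ is installed and that the FASTA files have already been turned into a nucleotide database with makeblastdb. The function names and the restriction to the letters A, C, G, T and N are choices made for this example, not part of any existing tool.

```python
import re
import subprocess
import tempfile

# Assumption for this sketch: accept only plain nucleotide letters.
VALID_SEQUENCE = re.compile(r"[ACGTNacgtn]+")


def clean_sequence(text):
    """Strip whitespace and reject anything that is not A, C, G, T or N."""
    sequence = "".join(text.split())
    if not VALID_SEQUENCE.fullmatch(sequence):
        raise ValueError("query must contain only A, C, G, T or N")
    return sequence.upper()


def build_blast_command(query_path, db_name):
    """Build the argument list for a blastn search with tabular output."""
    return ["blastn", "-query", query_path, "-db", db_name, "-outfmt", "6"]


def run_blast(text, db_name):
    """Write the query to a temporary FASTA file and return blastn's output."""
    sequence = clean_sequence(text)
    with tempfile.NamedTemporaryFile("w", suffix=".fa") as handle:
        handle.write(">query\n" + sequence + "\n")
        handle.flush()
        # Passing a list (no shell) keeps user input out of any shell parsing.
        result = subprocess.run(
            build_blast_command(handle.name, db_name),
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout


print(build_blast_command("query.fa", "rrna_db"))
```

Calling run_blast(form_text, "rrna_db") from whatever web framework or CGI wrapper you prefer would then return the hits as tab-separated text; "rrna_db" is a placeholder database name.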

Licensed under: CC-BY-SA with attribution
