Areas requiring software development

Introduction

The tables below describe applications, libraries and areas of research which have been suggested as being useful additions to EMBOSS or EMBASSY. These suggestions have arisen through the EMBOSS mailing lists, personal correspondence and as the result of our 2005 User Survey.

If anything below interests you, please volunteer to work on it! If you do decide to start work on any of these areas, please let the mailing list know first: many people may wish to collaborate with you or suggest easier ways of doing things.

If you've an idea for a library or an application that isn't on this list, please let the mailing list know and it will be added to this web page, even if you haven't time to work on it yourself.

IMPORTANT NOTE: If you have a request for a new feature for an existing EMBOSS application, please use our site on SourceForge.


EMBOSS Applications

The table below includes suggested new applications for EMBOSS and EMBASSY.

Name Priority Status Description Comments Code or documentation
BLAST wrapper High Inactive Wrapper to the BLAST suite of programs Wrappers to BLAST and FASTA have often been requested. Probably individual applications for BLASTP, BLASTN etc. Could code from the wEMBOSS BLAST wrapper be used?
Wrapper to other database search programs High Inactive Wrapper to the BLAST suite of programs Probably individual applications for BLASTP, BLASTN etc.
fasta wrapper High Inactive Integration of Bill Pearson's FASTA, TFASTA, etc. as an EMBASSY wrapper. Wrappers to BLAST and FASTA have often been requested. Bill Pearson mentioned he would like to do this, but it needs some way for sequences to be fetched again (e.g. saving file number and offset for 'any' sequence access method). The code is part of the way there. Could code from the wEMBOSS BLAST wrapper be used?
rnafolding High Inactive Integration of external RNA folding applications. The Zuker package may still be about the best.
hitmatch Medium Inactive Replacement for EGCG's equickmatch, using blastn output Needs a blast or fasta output parser. Should read in the query and database sequences, and perform a full NW or SW alignment, word-based if possible as they should be near-perfect matches. The aim is to report only those matches above a given threshold, and report the full alignments. If possible, with only the *differences* marked instead of the similarities.

No documentation for equickmatch is available.
alignutils Medium Inactive Sequence alignment utilities, to replace EGCG sortconsensus. Could implement various alignment site-scoring algorithms. alignutils documentation is available.
dodayhoffstat Medium Inactive Replacement for EGCG's dodayhoffstat Relatively easy to do. dodayhoffstat documentation is available.
mapplot Medium Inactive For displaying restriction plots. mapplot was specifically requested.
dottie Medium Inactive A general interactive dot plot application. Could use what's available, e.g. Erik Sonnhammer's dotter. A new implementation would require interactive graphics.
nucstats Medium Inactive To report nucleic acid "vital statistics", e.g. ACGT composition etc. See pepstats for ideas.
plasmid drawing Medium Inactive To draw plasmids with restriction sites. As a replacement for MapPlot from GCG. Perhaps modify cirdna? See TACG (http://tacg.sourceforge.net/) for ideas.
fastacheck Low - maybe remove Inactive Replacement for EGCG's fastacheck Simple to do, but now that FASTA has better statistics it is probably not needed. Functionality to read FASTA statistics and select hits might still be useful, though. fastacheck documentation is available.
gapframe Low Inactive Adjust gap positions to be only at codon boundaries in a DNA sequence with known CDS position(s). Easy to do but requirement might be too low to justify it.
homologies Medium Inactive Table of the pairwise distances of aligned sequences. The EMBASSY allversusall application does this and could be moved to EMBOSS.
Feature utilities Medium Inactive Operation on a feature table file to extract selected features to another file. Should be turned into a quite extensive set of library functions.
Cluster Low Inactive This program is still in the 'test' set of programs. Sanger stopped using it, so it is probably not needed. The easiest route to clustering functionality at application level might be to use e.g. SANBI's code (but what about the license?). AJAX clustering routines would be useful.
ALIEN Medium Inactive Multiple alignment program. Many multiple alignment programs are available and could be wrapped.
Gene ID programs Medium Inactive Would be useful. Is there non-commercial code for this?
genetrans Medium Inactive Replacement for EGCG's genetrans Functionality possibly redundant with existing EMBOSS apps, though: check! genetrans documentation is available.
ALIEN wrapper Medium Inactive Support for 3rd party Multiple alignment program ALIEN Requested via EMBOSS mailing list. Many multiple alignment programs are available and could be wrapped. ALIEN was specifically requested, but there are many other popular ones, e.g. TCOFFEE.
acdquery Medium Inactive Application to return ACD attributes e.g. sequence.length Arising from Marc Colet meeting. This would help in interpreting an ACD file. Much of the code exists; adapt seqinfo, acdtrace or entrails? Must decide what to do.
MFOLD equivalent Medium Inactive Equivalent to MFOLD for RNA secondary structure prediction Requested via EMBOSS mailing list. GCG has this, we don't! No details for this - but it's been repeatedly requested.
snplocator Medium Inactive Application to locate SNPs in coding sequences Requested via EMBOSS mailing list. No details - but it was asked for.
Feature display Medium Inactive Graphical display of selected features from a feature table. Possible with plplot but probably better with a new graphics library. Notes are available.
Application for codon usage / composition bias Medium Inactive Application for codon usage / composition bias Requested via EMBOSS mailing list. Notes are available.
polyatails Medium Inactive Searches a cDNA for any of the polyA signals in the context of the polyA tail, using different regular expressions. Coral del Val from the Cancer Research Centre (Heidelberg) has submitted code. It is based on the paper of Beaudoing et al., Genome Research vol. 10, 1001-1010. Notes are available. ACD file is available. C source code is available.
showdata Medium Inactive For showing codon usage tables. Requested via EMBOSS mailing list. Notes are available.
backtranambig Medium Inactive Back-translate a protein sequence to ambiguous codons. Requested via EMBOSS mailing list. Notes are available.
alignfromhsp Medium Inactive Build alignment from BLAST HSP Requested via EMBOSS mailing list. Notes are available.
jess Medium Inactive Functional site detection in protein structures Requested via conversation at EBI. From the Thornton group into EMBOSS, perhaps as an EMBASSY package? Notes are available. Packaged code is available.
plotsimilarity Medium Inactive Requested via EMBOSS mailing list. Notes are available.
pscan replacement Medium Inactive A replacement for pscan Requested by Dave Judge via Alan. Retire pscan. Maybe replace with a wrapper to interproscan?
nucstats Medium Inactive A "nucstats" or some such, to report nucleic acid "vital statistics", e.g. ACGT composition etc. (see pepstats). Requested via EMBOSS mailing list via AJB.
plasmiddraw Medium Inactive Replacement for MapPlot from GCG to draw plasmids with restriction sites. Requested via EMBOSS mailing list. Notes are available.
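
As an illustration of how small some of these suggestions are, the "vital statistics" that the nucstats entries ask for (ACGT composition etc.) can be sketched in a few lines. This is only a sketch in Python, not EMBOSS or AJAX code: the function name nucleotide_stats and the layout of the returned dictionary are assumptions made for the example, and a real application would be written in C against the AJAX library with an ACD file.

```python
from collections import Counter


def nucleotide_stats(sequence):
    """Return basic composition statistics for a nucleic acid sequence.

    Hypothetical helper for illustration only; not part of EMBOSS.
    """
    # Drop whitespace and normalise case so "acgt" and "ACGT" count the same.
    seq = "".join(sequence.split()).upper()
    counts = Counter(seq)
    length = len(seq)
    # Counts for the four unambiguous bases.
    acgt = {base: counts.get(base, 0) for base in "ACGT"}
    # Everything else: ambiguity codes, gaps, U, and so on.
    other = length - sum(acgt.values())
    # G+C fraction of the whole sequence; 0.0 for an empty sequence.
    gc_fraction = (acgt["G"] + acgt["C"]) / length if length else 0.0
    return {
        "length": length,
        "counts": acgt,
        "other": other,
        "gc_fraction": gc_fraction,
    }


if __name__ == "__main__":
    print(nucleotide_stats("ACGTACGTNN"))
```

Whether the G+C fraction should be taken over the whole sequence or only over the unambiguous bases is a design choice the eventual application would need to settle; the sketch uses the whole length.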


AJAX and NUCLEUS Libraries

The table below includes suggested new library files for AJAX and NUCLEUS.

Name Priority Status Description Comments Code or documentation
AJAX code refactoring High Active Function & parameter renaming and major documentation revision In preparation for future EMBOSS developments.
neural-nets Low Inactive Neural net routines and applications. Lots of free packages; Jose Valverde was working on this and recommended using XNN in 2002. Might be better alternatives now. Neureka is available from ftp://ftp.ii.uib.no/pub/neureka/. Not a high priority.
GAs Low Inactive Genetic Algorithm routines and applications.


Other areas

The table below includes other suggested new areas for EMBOSS.

Name Priority Status Description Comments Code or documentation
Perl API Medium Inactive Requested at the EMBOSS Industry Workshop 2006. An API to the applications could be generated automatically by the JACD tool that Jon Ison is working on.
EMBOSS eclipse extensions Medium Inactive Requested at the EMBOSS Industry Workshop 2006. The Eclipse package is very highly used in industry. Could look at bio-eclipse for ideas.
R statistics Medium Inactive Provide R statistics package as an EMBASSY package Arising from Marc Colet meeting. R is powerful and widely used (e.g. for microarray analysis) but is difficult to use. An EMBASSY wrapper could improve usability. Claude Beazley (now at Sanger) is using R and might be interested in helping with this. Any alternatives?


Completed

The table below includes some of the areas of work that have been recently completed.

Name Priority Status Description Comments Code or documentation
QA tests Complete Complete QA application testing using a set of standard outputs and simple parsing of the results. Scripts to test output of expected results of EMBOSS programs.
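
The idea behind the QA tests entry, comparing an application's output with a stored standard output after some simple parsing, can be sketched as follows. This is not the actual EMBOSS QA code: the function names normalise and outputs_match, and the use of regular expressions to skip volatile lines such as dates, are assumptions made for the example.

```python
import re


def normalise(text, ignore_patterns=()):
    """Split text into lines, strip trailing whitespace, and drop any
    line that matches one of the ignore patterns (hypothetical helper)."""
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    lines = []
    for line in text.splitlines():
        line = line.rstrip()
        if any(rx.search(line) for rx in compiled):
            continue  # volatile line, e.g. a date stamp
        lines.append(line)
    return lines


def outputs_match(actual, expected, ignore_patterns=()):
    """Return True if the two outputs agree after normalisation."""
    return normalise(actual, ignore_patterns) == normalise(expected, ignore_patterns)


if __name__ == "__main__":
    expected = "# Date: Mon 1 Jan\nA 2\nC 2\n"
    actual = "# Date: Tue 2 Jan\nA 2\nC 2  \n"
    print(outputs_match(actual, expected, [r"^# Date:"]))
```

In practice the expected text would be read from a stored file and the actual text captured from running the application; both are given inline here to keep the sketch self-contained.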