EGCG (Extended GCG) provided support for core sequence activities at the Sanger Centre and formed the basis of new sequence analysis software for internal use. It also provided advanced features used at approximately 150 sites and by more than 10,000 users of EMBnet national services.
The EGCG project has now reached the limits of what can be achieved with the GCG package. In particular, it is no longer possible to distribute academic software source code that uses the GCG libraries, and it has become difficult even to distribute binaries.
As a result, the former EGCG developers have designed an entirely new generation of academic sequence analysis software: the EMBOSS project.
EMBOSS (the European Molecular Biology Open Software Suite) is a new, free, Open Source sequence analysis package developed specifically for the needs of the molecular biology user community (e.g. EMBnet). The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web. Because extensive libraries are provided with the package, it is also a platform on which other scientists can develop and release software in a true open source spirit. EMBOSS also integrates a range of currently available packages and tools for sequence analysis into a seamless whole. EMBOSS breaks the historical trend towards commercial software packages.
The EMBOSS suite:
Within EMBOSS you will find over 150 programs (applications). These are just some of the areas covered:
More information about EMBOSS can be found at
http://emboss.sourceforge.net/
We assume that you are familiar with basic Unix commands for manipulating files and directories. EMBOSS contains many more applications than we can describe in the time available. We will introduce some of these and also show you how to find out about the others. There are many exercises for you to try, and we'll present the results you will see so that you know all is going well. Please feel free to experiment with the programs! That is definitely the best way to learn what they can do.
Much of the text in this document is what you will see on your screen. The Unix prompt is shown as unix % (don't type this in!). The commands you need to type are printed in bold. Where no input is given, just press return; pressing return also dismisses graphics windows. The symbol means we have truncated the program output to save space.
Type wossname at the unix % prompt:
unix % wossname
EMBOSS programs start up with a one-line description and then prompt you for information; in this case you see:
Finds programs by keywords in their one-line documentation
Keyword to search for: protein
SEARCH FOR 'PROTEIN'
antigenic      Finds antigenic sites in proteins
backtranseq    Back translate a protein sequence
checktrans     Reports STOP codons and ORF statistics of a protein sequence
emowse         Protein identification by mass spectrometry
digest         Protein proteolytic enzyme or reagent cleavage digest
eprotdist      Protein distance algorithm
eprotpars      Protein parsimony algorithm
fuzzpro        Protein pattern search
fuzztran       Protein pattern search after translation
garnier        GARNIER predicts protein secondary structure.
iep            Calculates the isoelectric point of a protein
octanol        Displays protein hydropathy
oddcomp        Finds protein sequence regions with a biased composition
patmatdb       Search a protein sequence database with a motif
patmatmotifs   Search a motif database with a protein sequence
pepnet         Displays proteins as a helical net
pepstats       Protein statistics
pepwheel       Shows protein sequences as helices
pepwindow      Displays protein hydropathy
pepwindowall   Displays protein hydropathy of a set of sequences
preg           Regular expression search of a protein sequence
pscan          Scans proteins using PRINTS
sigcleave      Reports protein signal cleavage sites
topo           Draws an image of a transmembrane protein
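Because the listing is plain text, it can be filtered with ordinary Unix tools. Here is a minimal sketch, assuming the layout shown above: a SEARCH FOR heading, then one program per line with its name in the first column. The shell function programs_matching is hypothetical, written only for this example. It prints the names of programs whose line contains a given word, ignoring case.

```shell
#!/bin/sh
# programs_matching WORD
# Hypothetical helper: reads a wossname-style listing on standard input and
# prints the name (first column) of every program whose line contains WORD,
# ignoring case. The "SEARCH FOR ..." heading line is skipped.
programs_matching() {
  awk -v word="$1" '
    /^SEARCH FOR/ { next }                           # skip the heading line
    index(tolower($0), tolower(word)) { print $1 }   # literal, case-insensitive match
  '
}

# Example input: a few lines copied from the listing above.
programs_matching hydropathy <<'EOF'
SEARCH FOR 'PROTEIN'
octanol        Displays protein hydropathy
pepstats       Protein statistics
pepwindow      Displays protein hydropathy
pepwindowall   Displays protein hydropathy of a set of sequences
EOF
```

This prints octanol, pepwindow and pepwindowall, one per line. On a system with EMBOSS installed you could pipe live output into the same function, for example wossname -search protein -auto | programs_matching hydropathy. Here -auto is the general EMBOSS qualifier that turns off prompting. index() is used rather than a regular expression so that the search word is matched literally.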
To see the additional options that wossname offers, start it with the -opt flag:
unix % wossname -opt
You will now be presented with a variety of additional options. The default value for each option is given in square brackets; you can either press return to accept the default or enter the value you require:
Keyword to search for: protein
Output program details to a file [stdout]: myfile
Format the output for HTML [N]: Y
String to form the first half of an HTML link:
String to form the second half of an HTML link:
Output only the group names [N]:
Output an alphabetic list of programs [N]:
Use the expanded group names [N]:
These answers make wossname write the list of programs to a file called myfile, in HTML format ready for viewing in a web browser.
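You can also make the same request without answering prompts, by giving each answer as a command-line qualifier. This sketch assumes a standard EMBOSS installation with wossname on your PATH. The qualifier names (-search, -outfile, -html) correspond to the prompts shown above, and -auto is the general EMBOSS qualifier that turns off prompting. Qualifier names can vary between EMBOSS releases, so check them with wossname -help before relying on this.

```shell
#!/bin/sh
# Non-interactive version of the wossname session above.
# Assumes EMBOSS is installed; check qualifier names with "wossname -help".
if command -v wossname >/dev/null 2>&1; then
  # -search protein : keyword to search for
  # -outfile myfile : write the program details to myfile instead of stdout
  # -html           : format the output for HTML (a boolean, so no value needed)
  # -auto           : accept defaults for everything else without prompting
  wossname -search protein -outfile myfile -html -auto
else
  echo "wossname not found: is EMBOSS installed and on your PATH?" >&2
fi
```

Opening myfile in a web browser should then show the same list of protein-related programs as the interactive session.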
To produce a list of all the current EMBOSS programs, start up wossname again but instead of specifying a keyword, press return. A list of programs will scroll onto your screen, divided up into groups according to their functions. Scroll up and down to see them all. Can you think of how to get this data into a file? (Hint: use -opt)
If you append the flag -help to the name of any EMBOSS program, you will see a list of all the command flags available for that program. For example:
unix % wossname -help
We'll see some more flags later. Let's move on to some sequence analysis ...