EMBOSS command syntax


Introduction

The EMBOSS package consists of a large number of separate programs, each with a specific function. A program usually takes one or more input files and some parameters that are important to its function, and produces output in the form of files, plots, web pages or simple text.

Programs can be invoked in several ways. A program's name can be entered on the command line together with all its parameters, so that the program has all the information it needs at once. A more interactive way is a question-and-answer session in which the user is asked to enter one piece of information at a time. A third way is a web interface in which the user chooses the program's options using lists, checkboxes, radio buttons and so on. In EMBOSS, the way a program interacts with the user (its interface) is independent of the actual program.

Command line

EMBOSS programs are called by giving their name on the UNIX command line, either with or without parameters. Many parameters can have qualifiers that give more information about the parameter. For instance, the format of a sequence file used as input can be specified on the command line like this:

% seqret filename.seq -sformat fasta

In this example the EMBOSS program seqret is called with the filename 'filename.seq' as its first parameter. '-sformat fasta' indicates that the sequence file is in 'fasta' format. The percent sign '%' stands for the UNIX command line prompt, indicating that the command was entered on the UNIX command line. This convention is used throughout the documentation.

Qualifiers

For many parameters (objects), qualifiers can be used on the command line to specify properties of that object. The format of a sequence file (data type 'sequence') can, for instance, be specified by a qualifier as 'fasta'. Qualifiers of this type are specific to a particular data type (or object) and are therefore called data type specific qualifiers.

A second type of qualifier is independent of the data types. These are the global qualifiers, which apply to the complete program and are usually used to change its behaviour. A global qualifier can, for instance, turn debugging on (the -debug qualifier) or instruct the program to behave like a filter, reading from the standard input and writing to the standard output (the -filter qualifier).

Qualifiers can be entered on the command line in many ways; a full description of the command line syntax is given in the 'Command line syntax' section below. For the moment qualifiers will be written in the UNIX style: the qualifier name is prefixed with a hyphen, and the value (if one is needed) is separated from the qualifier by a space.

Example:


% seqret sequence.seq -sformat fasta

Here seqret is the program being called, sequence.seq is the first (and only) parameter, and "-sformat fasta" is the "qualifier/value pair" for this parameter.

The global qualifiers are boolean qualifiers. They are set by naming them on the command line and explicitly unset by prefixing the qualifier name with 'no'. Since the global qualifiers all default to false anyway, there is no specific need to use the 'no' syntax at the moment.

Example:


% seqret sequence.seq -debug

% seqret sequence.seq -nodebug

In the first example seqret is the program being called, sequence.seq is the first (and only) parameter, and -debug instructs the program to turn debugging on. In the second example seqret is run with the same parameter, but the -debug qualifier is now prefixed with 'no', instructing the program to turn debugging off (this could be useful if debugging was turned on by default in the resource files or in an environment variable).
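The set/unset rule can be modelled in a few lines of Python. This is only an illustrative sketch of the behaviour described above, not the EMBOSS parser itself; the function name parse_boolean_qualifiers and the defaults argument (standing in for a resource file or environment variable) are assumptions made for the example.

```python
# Illustrative model only: these names are hypothetical, not part of EMBOSS.
GLOBAL_QUALIFIERS = ("auto", "debug", "filter", "stdout", "help", "verbose", "options")


def parse_boolean_qualifiers(args, defaults=None):
    """Return the state of each global qualifier after reading `args`.

    `-name` sets a qualifier to True and `-noname` sets it to False.
    `defaults` models values that might come from a resource file or an
    environment variable; anything not listed there starts as False.
    """
    state = {name: False for name in GLOBAL_QUALIFIERS}
    state.update(defaults or {})
    for arg in args:
        if not arg.startswith("-"):
            continue  # a parameter value such as sequence.seq
        name = arg[1:]
        if name in state:
            state[name] = True
        elif name.startswith("no") and name[2:] in state:
            state[name[2:]] = False
    return state
```

Called with ["sequence.seq", "-nodebug"] and defaults of {"debug": True}, the sketch reports debugging as off, which is the situation the second example is meant for.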

Naming scheme for qualifiers

Qualifiers can have any name, but a recommended naming scheme is used at the moment. The first one or two letters of the qualifier indicate the data type it is related to. 'OS' is used for the output sequence data types (outseq, outseqset and outseqall) and 'S' for the input sequence data types (sequence, seqset and seqall). The rest of the qualifier's name is free but should be something sensible related to the data type.

Global qualifiers

Global qualifiers can change the behaviour of the program. They are boolean qualifiers and can be set by naming them on the command line and specifically unset by prefixing the qualifier with 'no'. The current global qualifiers are listed in the table below.

Formalised:

Qualifier definition   Description
-auto                  Turns off any prompting of the user
-debug                 Turns on debugging with ajDebug calls
-filter                Reads from stdin, writes to stdout, and implies -auto
-stdout                Writes to stdout by default, but still prompts the user
-help                  Gives usage information for this program. See also -verbose below.
-verbose               When used with -help, also gives the associated qualifiers and the general qualifiers (this list)
-options               Program will also prompt for optional qualifiers

Table 7. Global qualifiers.

Global qualifier description

 

auto

The -auto qualifier will turn off any prompting of the user. It will try to run the program with all the default settings that are defined in the ACD file. If a parameter does not have a default value and it is flagged as required, the program will stop and produce an error message.

Example:


% seqret sequence.seq
Output sequence [pdnirsecf.fasta]:

% seqret sequence.seq -auto
%

The first example shows the application seqret being run without the -auto qualifier. The program prompts the user for an output filename, because the output sequence is a mandatory parameter. It presents the user with a prompt and a default output filename (pdnirsecf.fasta, constructed from the input sequence name and the output format).

In the second example, the application seqret is run with the -auto qualifier. The user is not queried for the output filename, and the program uses the default filename (pdnirsecf.fasta) for the output file.
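The difference between the two runs can be sketched as a small decision function. This is a hypothetical model of the behaviour described here, not EMBOSS code: resolve_required and its ask callback (a stand-in for the interactive prompt) are names invented for the example.

```python
def resolve_required(prompt, given=None, default=None, auto=False, ask=input):
    """Decide which value a required parameter ends up with.

    Hypothetical helper: `ask` replaces the terminal prompt so that the
    logic can be exercised without user interaction.
    """
    if given is not None:
        return given                      # supplied on the command line
    if auto:
        if default is None:
            # -auto with no default for a required value: stop with an error
            raise ValueError(f"no value available for '{prompt}'")
        return default                    # -auto: take the default silently
    label = f"{prompt} [{default}]: " if default is not None else f"{prompt}: "
    reply = ask(label).strip()
    # Pressing <ENTER> accepts the default; a real program would ask again
    # if there were no default, which this sketch leaves out.
    return reply if reply else default
```

For instance, resolve_required("Output sequence", default="pdnirsecf.fasta", auto=True) returns the default without calling ask at all.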

debug

This qualifier will turn on the debug tracing. A file will be produced with the name of the program followed by the extension .dbg. The debug file will contain a complete trace of the actions of the program that use the AJAX function ajDebug().  

filter

The filter qualifier makes the program behave like a filter, reading its (first) input 'file' from the standard input, and writing its (first) output 'file' to the standard output. The -filter qualifier will also invoke the -auto qualifier, so the user is never prompted for any missing values.

Example:


% cat sequence.seq | seqret -filter | lpr

The example shows the application seqret being run with the -filter qualifier. The input file is 'piped' into the program using the UNIX command cat and the output is 'piped' directly to the UNIX program lpr, which will print it on the printer.

help

Help on a program's use can be obtained with the -help qualifier. The help text is produced automatically from the information in the ACD file. It lists all the parameters and their associated qualifiers, showing their names, their types and a brief help text extracted from the help: attribute.

A second qualifier, -verbose, used together with -help, lists all available qualifiers, including any associated qualifiers (sequence formatting etc.) and the general qualifiers such as -help.

options

When the -options qualifier is used and not all the parameters are given on the command line, the program queries the user for the missing ones. It queries not only for the required parameters (data types with the param: attribute and/or the required: attribute), as it would without -options, but also for the parameters that are labeled with the optional: attribute.

Example:


% seqret
Input sequence: sequence.seq
Output sequence [pdnirsecf.fasta]: <ENTER>

% seqret -options
Input sequence: sequence.seq
Output sequence [pdnirsecf.fasta]: <ENTER>
-outputLength: 10

In the first example the application seqret is run without any parameters or qualifiers. Since the input sequence is defined as a parameter, seqret queries the user for the input filename. It also queries the user for the output sequence, since that parameter is labeled as required by the attribute of that name. It does not query the user for the integer variable outputLength, since that is labeled neither as a parameter nor as required.

In the second example the user IS queried for the integer, since the -options qualifier forces the program to query for those parameters that are labeled with the optional: attribute.

Any parameter that is not defined as a parameter (with the param: attribute), as required (by the required: attribute) or as optional (by the optional: attribute) can still be used on the command line, but the user will NEVER be queried for them. These parameters are considered an 'advanced feature' and can only be used on the command line. They will only be shown by the -help qualifier.  
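The prompting rules for the three attributes can be summarised in a short sketch. It is a simplified model under stated assumptions, not EMBOSS code: will_prompt is a hypothetical name, and the attributes are represented as a plain set of the strings "param", "required" and "optional".

```python
def will_prompt(attributes, on_command_line=False, options=False, auto=False):
    """Return True if the user would be asked for this value.

    `attributes` is a set drawn from {"param", "required", "optional"},
    mirroring the ACD attributes named above (hypothetical representation).
    """
    if on_command_line or auto:
        return False                 # value already known, or prompting is off
    if attributes & {"param", "required"}:
        return True                  # always asked for when missing
    if "optional" in attributes:
        return options               # asked for only with -options
    return False                     # 'advanced' values are never prompted for
```

In the seqret session above, the input and output sequences correspond to will_prompt({"param"}) and will_prompt({"required"}), while outputLength corresponds to will_prompt({"optional"}, options=True).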

stdout

When the -stdout qualifier is used, the user is still prompted for all the information that is required, but the program writes to standard output by default. The user is also still prompted for an output filename, in case the output should be saved to a file.

Example:


% seqret -stdout
Input sequence: sequence.seq
Output sequence [stdout]: <ENTER>

In this example the -stdout qualifier changes the default output to be to standard output (the terminal) instead of to a file. The program can still prompt the user, so there is a chance to enter a filename instead. With -auto on the command line, the program would instead write to the terminal without asking.
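How -auto, -filter and -stdout combine can be put side by side in a small sketch. It only restates the table of global qualifiers in executable form; interaction_mode and the keys of the returned dictionary are hypothetical names chosen for the example.

```python
def interaction_mode(auto=False, use_filter=False, stdout=False):
    """Summarise how a run behaves for a combination of global qualifiers.

    Hypothetical summary of the descriptions above: -filter implies -auto
    and reads standard input; both -filter and -stdout make standard
    output the default destination.
    """
    return {
        "prompts": not (auto or use_filter),
        "reads_stdin": use_filter,
        "default_output": "stdout" if (stdout or use_filter) else "file",
    }
```

With stdout=True alone the program still prompts; adding auto=True gives the case in which output goes to the terminal without any question being asked.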

Data type specific qualifiers

Input/Output

At the moment there are only qualifiers for sequence type data.

Formalised:

Data type                     Qualifier definition   Description
sequence, seqset, seqsetall   -sformat string        Format of the input sequence
                              -sdbname string        Database where the entry must be read from
                              -sentry string         Entry that must be read from specified database
                              -sbegin integer        Start of the sequence to be used
                              -send integer          End of the sequence to be used
                              -sreverse Y/N          Use reverse of the sequence
                              -slower Y/N            Convert sequence to lowercase
                              -supper Y/N            Convert sequence to uppercase
                              -sask Y/N              Ask the user for the sequence range and direction
                              -sopenfile string      CHECK
seqout, seqoutset, seqoutall  -osformat string       The output format for this sequence
                              -osextension string    The extension to use for this file
                              -osdbname string       The database name to be used for this sequence
                              -ossingle              Write out multiple sequences to individual files

Table 9. Input/Output qualifiers.

Multiple qualifiers

A qualifier refers to the parameter that precedes it, until another parameter of the same data type appears on the command line; from then on, qualifiers for that data type refer to the new parameter. Qualifiers that are specific to different data types can, however, be intermixed. If no two parameters have the same data type, the order of parameters and their qualifiers is irrelevant.

Example 1


% seqret in.seq out.seq -sformat fasta -osformat gcg

In this example, the program seqret takes two parameters: an input sequence (file in.seq, data type 'sequence') and an output sequence (file out.seq, data type 'outseq'). The order of the qualifiers is irrelevant, since the two qualifiers refer to different data types.

Example 2


% align aap.seq -sformat fasta noot.seq -sformat gcg

In this example, the program align takes two parameters, both input sequences (files aap.seq and noot.seq, data type 'sequence'). Here the order of the qualifiers is important, since aap.seq is in 'fasta' format and noot.seq is in 'gcg' format.
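The positional rule can be illustrated with a short sketch. It is a simplified model rather than the EMBOSS implementation: attach_by_position, param_types and qualifier_types are hypothetical names, every qualifier is assumed to take a value, and each qualifier is assumed to follow the parameter it describes.

```python
def attach_by_position(args, param_types, qualifier_types):
    """Attach each qualifier to the latest preceding parameter of its type.

    Hypothetical inputs: `param_types` lists the data type of each
    positional parameter in order; `qualifier_types` maps a qualifier
    name to the data type it belongs to.
    """
    result = []      # one [value, {qualifier: value}] entry per parameter
    latest = {}      # data type -> index of the most recent parameter
    words = iter(args)
    for word in words:
        if word.startswith("-"):
            name = word[1:]
            value = next(words)                     # the qualifier's value
            target = latest[qualifier_types[name]]  # most recent same-type parameter
            result[target][1][name] = value
        else:
            dtype = param_types[len(result)]
            latest[dtype] = len(result)
            result.append([word, {}])
    return result
```

Applied to the words of Example 2 with both parameters typed as 'sequence', the sketch pairs aap.seq with fasta and noot.seq with gcg; applied to Example 1, swapping the two qualifiers does not change the result.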

Numbering qualifiers

Instead of having to adhere to a rigorous order for the qualifiers when two or more parameters of the same data type are defined, it is also possible to put a number in the qualifier's name to indicate which parameter the qualifier refers to.

Formalised:

-qualifiername# qualifier_value

where # represents an integer number, indicating which parameter the qualifier is referring to.

Example:


% align aap.seq noot.seq -sformat2 gcg -sformat1 fasta

This is equivalent to Example 2 above, but uses qualifier numbering to indicate that the format of the first parameter is 'fasta' and that of the second is 'gcg'.
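Numbered qualifiers can be modelled in the same spirit. The sketch below is an assumption-laden illustration, not EMBOSS code: attach_by_number is a hypothetical name, only numbered qualifiers that take a value are handled, and the number is assumed to count the parameters from 1 in command-line order.

```python
import re

# -name followed by digits, e.g. -sformat2 (pattern chosen for this sketch)
NUMBERED = re.compile(r"^-([a-z]+)(\d+)$")


def attach_by_number(args):
    """Attach numbered qualifiers to the parameters they name."""
    params = []      # parameter values in command-line order
    numbered = []    # (parameter number, qualifier name, value)
    words = iter(args)
    for word in words:
        match = NUMBERED.match(word)
        if match:
            numbered.append((int(match.group(2)), match.group(1), next(words)))
        else:
            params.append(word)
    result = [[value, {}] for value in params]
    for number, name, value in numbered:
        result[number - 1][1][name] = value   # numbers are 1-based
    return result
```

For the command line above, the sketch gives the same pairing as the positional form of Example 2, regardless of the order in which -sformat1 and -sformat2 appear.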

Command line syntax

Most EMBOSS programs will be started from the UNIX command line, either with or without extra parameters and qualifiers. Which parameters and qualifiers can appear on the command line is defined in the Ajax Command Definition (ACD) file associated with the EMBOSS program.

The command line syntax is very versatile and does not restrict the available syntax more than is strictly necessary. To avoid confusion, there will be a recommended EMBOSS command style, which will probably be the UNIX style using '=' for parameter and qualifier values.

Parameters and qualifiers

For parameters it is not always mandatory to give the parameter's name on the command line: if the param: attribute was used for a parameter, the name need not be given as a prefix to the parameter value. For qualifiers it is always mandatory to give the qualifier's name (if a value for the qualifier is to be given on the command line).

In the rest of the definition of the command line syntax, wherever the word qualifier is used, it means both parameters and qualifiers. If parameter is used it will only apply to parameters.

Qualifier definition

Examples

The following command lines all tell seqret to read sequence paamir.tfa in fasta format, starting at base 25.


% seqret -sbeg 25 paamir.tfa -sf fasta

% seqret fasta::paamir.tfa -sbegin=25 

% seqret -sbegin=25 fasta::paamir.tfa 

% seqret -sbegin=25 paamir.tfa -sformat fasta

% seqret -sbeg 25 paamir.tfa -sf=fasta

% seqret -sbeg 25 -sequence=paamir.tfa -sf=fasta

% seqret sbeg=25 -sequence=paamir.tfa sf=fasta

% seqret -sbeg 25 -sequence paamir.tfa -sf fasta

% seqret /SBEG=25 /SEQUENCE=paamir.tfa /SF=fasta

This may seem rather confusing, but only because there is no enforcement of a standard recommended way for users to specify the command lines.

For general use, we strongly recommend the first example above.
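To make the equivalence of the nine command lines concrete, the sketch below reduces each of them to the same dictionary. It is a rough model built only from these examples, not the real EMBOSS parser. The names KNOWN, expand and normalise are hypothetical, the list of known qualifiers is a small subset chosen for the illustration, and the rule that a name may be abbreviated to any unambiguous prefix is an assumption inferred from the examples (-sbeg, -sf).

```python
KNOWN = ("sequence", "sbegin", "send", "sformat")  # subset, for illustration only


def expand(name):
    """Expand an abbreviation to the single known name it is a prefix of."""
    name = name.lower()
    matches = [known for known in KNOWN if known.startswith(name)]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous qualifier: {name}")
    return matches[0]


def normalise(args):
    """Reduce any of the command-line styles shown above to one dictionary.

    Limitation of this sketch: a positional filename that starts with '/'
    would be mistaken for a '/NAME' qualifier.
    """
    out = {}
    words = iter(args)
    for word in words:
        if word[0] in "-/":                    # -name or /NAME
            name, equals, value = word[1:].partition("=")
            if not equals:
                value = next(words)            # value is the next word
            out[expand(name)] = value
        elif "=" in word:                      # bare name=value
            name, _, value = word.partition("=")
            out[expand(name)] = value
        else:                                  # positional parameter
            fmt, sep, path = word.partition("::")
            if sep:                            # format::filename
                out["sformat"] = fmt
                out["sequence"] = path
            else:
                out["sequence"] = word
    return out
```

Each of the argument lists above (everything after the program name) yields the sequence paamir.tfa, the format fasta and a start position of 25 when passed through normalise.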