The EMBOSS package consists of a large number of separate programs, each with a specific function. They usually take one or more input files and some parameters relevant to that function, and produce output in the form of files, plots, web pages or simple text output.
The programs can be invoked in a myriad of ways. The program name can be entered on the command line together with all parameters, so that the program has all the information it needs at once. A more interactive way is a question-and-answer session, in which the user is asked for one piece of information at a time. A third way is a web interface, where the user chooses the program options using lists, checkboxes, radio buttons and so on. In EMBOSS, the way a program interacts with the user (its interface) is independent of the program itself.
At the moment, EMBOSS programs are called by giving their name on the UNIX command line either with or without parameters. Many parameters can have qualifiers that will give more information about a parameter. For instance, the format of the information in a sequence file that is used as an input file could be specified on the command line, like:
% seqret filename.seq -sformat fasta
In this example the EMBOSS program 'seqret' is called with the filename 'filename.seq' as its first parameter. '-sformat fasta' indicates that the sequence file is in 'fasta' format. A complete description of the command line syntax will follow in section 2, Formal Description of the ACD language. The percent sign '%' indicates that the command was entered on the UNIX command line; this convention is used throughout the documentation.
Every EMBOSS program is accompanied by a so-called ACD (Ajax Command Definitions) file, which describes the parameters the program needs. It contains information about the program's input and output files and any other parameters it may need. It indicates whether any of the parameters are mandatory (like an input sequence file) and whether certain parameters must lie within certain limits (a gap penalty for an alignment must be higher than 0, for instance). It can also indicate whether one parameter's value depends on the value or the presence of another (for example, if the input sequence for an alignment program is DNA, it should not accept a protein comparison matrix).
The parameters are defined in a special purpose language called Ajax Command Definitions or ACD, specially designed for EMBOSS. It will specify everything that can appear on the command line or can be used in another interface like web pages. It is a very 'forgiving' language in that it does not restrict the available syntax any more than is strictly necessary.
ACD files are simple text files that contain the definitions. The files usually have the same prefix as the program, but this is not required. ACD files use the extension '.acd'. This is mandatory.
Formalised:
token: token [ definition ]
is equivalent to
token=token [ definition ]
The first token in the file must be "application" directly followed by a colon ':' or an equal sign '='. The second token is the application name with which this ACD file is associated. The application name is followed by (required) application attributes enclosed in square brackets.
Formalised:
application: appname [ attributes ]
Example:
application: wossname [ documentation: "Finds programs by keywords" groups: "Display" ]
The first token of a parameter definition is an Ajax datatype, directly followed by a colon ':' (preferred) or equal sign '='. The second token is the name by which this parameter is going to be known (this is also the name that is used by the EMBOSS program to get the value of the parameter). After the name, definitions are in mandatory square brackets, [], which can make a definition span multiple lines.
Formalised:
datatype: parametername [ definition ]
Example:
sequence: asequence [ standard: "Y" ]
Tokens representing data types and attribute names can be abbreviated as long as they remain unambiguous. For example, default: can be abbreviated to def: or even d:, although the latter is not recommended because it is unclear.
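Abbreviation therefore works like unique-prefix matching. The following Python sketch illustrates that rule only; it is not the EMBOSS implementation, and the function `resolve_abbrev` and the short list of attribute names are hypothetical examples. Letting an exact match win over longer names is an assumption of the sketch.

```python
def resolve_abbrev(abbrev, names):
    """Return the single name that abbrev stands for.

    In this sketch an exact match always wins; otherwise the
    abbreviation must be a prefix of exactly one name.
    """
    if abbrev in names:
        return abbrev
    matches = [name for name in names if name.startswith(abbrev)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown token: {abbrev!r}")
    raise ValueError(f"ambiguous token {abbrev!r}: {', '.join(matches)}")


# Illustrative attribute names only; each real ACD type has its own list.
ATTRIBUTES = ["default", "information", "prompt", "help", "parameter"]

print(resolve_abbrev("def", ATTRIBUTES))  # default
print(resolve_abbrev("d", ATTRIBUTES))    # default
```

With this list, "p" is rejected because it could mean either prompt or parameter.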
Values can be delimited by double quotes, so that the quoted text is treated as one token.
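Taken together, these lexical rules (':' and '=' are equivalent, square brackets enclose the definition, line breaks are just whitespace, and double quotes turn a value into one token) are enough to split a definition into its parts. The Python sketch below shows one way to do that; `tokenize` and `parse_definition` are hypothetical names, and the sketch ignores abbreviations and every other ACD feature not described above.

```python
import re

# A token is a double-quoted string, a bracket, a ':' or '=' separator,
# or a run of other non-blank characters. Whitespace, including line
# breaks, only separates tokens, so a definition may span several lines.
TOKEN_RE = re.compile(r'"([^"]*)"|([\[\]:=])|([^\s\[\]:="]+)')


def tokenize(text):
    """Yield (kind, value) pairs, where kind is 'word' or 'punct'."""
    for quoted, punct, word in TOKEN_RE.findall(text):
        if punct:
            yield ("punct", punct)
        elif word:
            yield ("word", word)
        else:
            yield ("word", quoted)  # the quotes themselves are dropped


def parse_definition(text):
    """Split 'token: name [ attr: value ... ]' into its three parts."""
    tokens = list(tokenize(text))
    pos = 0

    def expect(kind, allowed=None):
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of definition")
        found_kind, value = tokens[pos]
        if found_kind != kind or (allowed is not None and value not in allowed):
            raise ValueError(f"unexpected token {value!r}")
        pos += 1
        return value

    datatype = expect("word")
    expect("punct", {":", "="})  # ':' and '=' are interchangeable
    name = expect("word")
    expect("punct", {"["})
    attributes = {}
    while pos < len(tokens) and tokens[pos] != ("punct", "]"):
        attribute = expect("word")
        expect("punct", {":", "="})
        attributes[attribute] = expect("word")
    expect("punct", {"]"})
    return datatype, name, attributes


print(parse_definition('sequence: asequence [ standard: "Y" ]'))
# ('sequence', 'asequence', {'standard': 'Y'})
```

The same function accepts the application: line, since it follows the same token, name, bracketed-attributes pattern.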
The first token of an ACD file must be the application: token, followed by the application name. The application name and the ACD filename (without the .acd extension) are usually identical, but this is not mandatory. When a program calls the embInit("program") function with "program" as its parameter, the function will only look for an ACD file called program.acd. It will not compare the parameter with the string given after the application: token.
The application: token has a documentation: attribute which is followed by a string describing the function of the program. This documentation string will be used to generate the description of the program when the program is run or the user specifies the -help qualifier. When the documentation: attribute is missing, a warning will be issued.
Formalised:
application: appname [ documentation: string ]
Example:
ACD file definition (partly):
application: seqret [ documentation: "Reads and writes (returns) a sequence" ]
Command line:
% seqret
Reads and writes (returns) a sequence
Input sequence:
The ACD file starts with the definition of the program seqret. The documentation: attribute is followed by a string briefly explaining the function of the program and this string is shown after the program is invoked and before it prompts the user for any input. The documentation: string is also searched by the wossname utility, which finds applications by keyword (in the doc string) and group.
The length of the documentation: string should be kept to 63 characters or shorter in order to allow the wossname utility to display each program name and its documentation on one 80-character line.
The documentation: string should not end with a '.' character.
Any acronyms or capitalised abbreviations in the documentation: string should be written with their usual upper-case letters (e.g. SNPs, EST, DNA, ABI, SRS, ASCII, CDS, mRNA, B-DNA, RNA, CpG, ORFs, MAR/SAR, PCR, STS, REBASE, SCOP, PROSITE, PRINTS, EMBL, TRANSFAC, AAINDEX, BLAST, GCG, EMBOSS).
The documentation: string should start with an upper-case letter.
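Most of these rules are mechanical enough to check automatically. A minimal Python sketch, assuming only the rules that can be tested without a dictionary (presence, length, trailing '.', leading capital); the acronym rule is left out, and `check_documentation` is a hypothetical helper rather than part of EMBOSS.

```python
def check_documentation(doc):
    """Return the problems found in an ACD documentation: string."""
    if not doc:
        return ["missing documentation string"]
    problems = []
    if len(doc) > 63:  # keeps wossname output on one 80-character line
        problems.append(f"longer than 63 characters ({len(doc)})")
    if doc.endswith("."):
        problems.append("ends with '.'")
    if not doc[0].isupper():
        problems.append("does not start with an upper-case letter")
    return problems


print(check_documentation("Reads and writes (returns) a sequence"))  # []
print(check_documentation("reads a sequence."))
```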
The groups: attribute allows EMBOSS programs to be grouped together by functionality. It is followed by a string value containing the name(s) of the group(s). When an application belongs to more than one group, the group names must be separated by either a comma (,) or a semicolon (;); i.e. the value is not a single token but a list of group names.
The groups: string is also searched by the wossname utility, which finds applications by keyword (in the doc string) and group.
Formalised:
application: appname [ groups: "group name1, group2, ... " ]
Example: ACD file definition (partly):
application: seqret [ groups: "Display" ]
Group names can have spaces in them.
The group names can be split into sub-levels by the use of a ':' character:
First Level : Second Level
Several third-party interfaces are starting to rely upon there being a maximum of 2 levels, so do not use more than one ':' in a group name.
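As a sketch of how a groups: value might be read under these rules, the Python function below splits on commas or semicolons, then on ':' into at most two levels. The name `parse_groups` is hypothetical, and skipping empty items (for example after a trailing comma) is a choice of this sketch, not documented ACD behaviour.

```python
import re


def parse_groups(value):
    """Split a groups: value into tuples of one or two levels each."""
    groups = []
    for item in re.split(r"[,;]", value):
        if not item.strip():
            continue  # this sketch ignores empty items, e.g. a trailing ','
        levels = tuple(level.strip() for level in item.split(":"))
        if len(levels) > 2:
            raise ValueError(f"more than two levels: {item.strip()!r}")
        groups.append(levels)
    return groups


print(parse_groups("Alignment : Consensus; Display"))
# [('Alignment', 'Consensus'), ('Display',)]
```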
The group name is now checked against a list of accepted values in the file groups.standard which is defined and installed in the same directory as the ACD files. This file contains one line for each known group, with subgroups defined with a ":" delimiter, and spaces replaced by underscores. Each group also has a short description.
The table in the following section lists all groups currently defined, giving the first- and second-level group names with an explanation of what might be expected to be placed in each group.
If a group is composed of two levels, such as
Alignment : Consensus
then the group specification must not use the group names singly (i.e. you must not use "Alignment" or "Consensus").
If the group consists of only one level, such as
Display
then please don't start adding sub-levels to it (i.e. you must not use "Display : Features").
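One way to enforce these two rules is to accept only complete group names found in groups.standard. The Python sketch below assumes a hypothetical line format for that file (name with underscores for spaces and ':' between levels, then whitespace, then the description, with '#' comment lines) and compares names case-insensitively; the real file layout and matching rules may differ. `load_standard_groups` and `check_group` are illustrative names.

```python
def load_standard_groups(lines):
    """Map each known group (a tuple of levels) to its description.

    Hypothetical line format: the name, with spaces written as '_' and
    levels joined by ':', then whitespace, then a short description.
    Blank lines and lines starting with '#' are skipped.
    """
    known = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        levels = tuple(level.replace("_", " ") for level in parts[0].split(":"))
        known[levels] = parts[1] if len(parts) > 1 else ""
    return known


def check_group(levels, known):
    """Return None for a standard group, otherwise the reason it fails."""
    wanted = tuple(level.lower() for level in levels)
    standard = {tuple(level.lower() for level in group) for group in known}
    if wanted in standard:
        return None
    if len(wanted) == 1 and any(group[0] == wanted[0] for group in standard):
        return "top-level group used singly; add its second level"
    if len(wanted) == 2 and (wanted[0],) in standard:
        return "single-level group must not have sub-levels"
    return "not listed in groups.standard"


STANDARD = """\
Alignment:Consensus  Merging sequences to make a consensus
Alignment:Global     Global sequence alignment
Display              Publication-quality display
Enzyme_Kinetics      Enzyme kinetics calculations
""".splitlines()

known = load_standard_groups(STANDARD)
print(check_group(("Enzyme Kinetics",), known))  # None
print(check_group(("Display", "Features"), known))
```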
You are strongly encouraged to use the following groups structure. This is the set of groups defined by the groups.standard file. We have found that most things will fit in one or more of these groups. When, however, a completely new category of program is written, please discuss the creation of the new group name on the developers' mailing list. Sometimes a new group is required (for example the group "Enzyme Kinetics", which had to be created to hold 'findkm').
Top Level | Second Level | Description
Acd | - | ACD file utilities
Alignment | Consensus | Merging sequences to make a consensus
Alignment | Differences | Finding differences between sequences
Alignment | Dot_plots | Dot plot sequence comparisons
Alignment | Global | Global sequence alignment
Alignment | Local | Local sequence alignment
Alignment | Multiple | Multiple sequence alignment
Assembly | Fragment_assembly | DNA sequence assembly
Data_Resources | - | Data resources
Data_Retrieval | Chemistry_data | Chemistry data retrieval
Data_Retrieval | Feature_data | Sequence feature data retrieval
Data_Retrieval | Ontology_data | Ontology data retrieval
Data_Retrieval | Resource_data | Resource data retrieval
Data_Retrieval | Sequence_data | Sequence data retrieval
Data_Retrieval | Sequence_data:Assembly_data | Sequence assembly data retrieval
Data_Retrieval | Text_data | Text data retrieval
Data_Retrieval | Tool_data | Tool data retrieval
Data_Retrieval | URL_data | URL data retrieval
Data_Retrieval | Variation_data | Variation data retrieval
Data_Retrieval | XML_data | XML data retrieval
Display | - | Publication-quality display
Documentation | - | Documentation
Edit | - | Data file and content editing
Enzyme_Kinetics | - | Enzyme kinetics calculations
Feature_tables | - | Manipulation and display of sequence annotation
HMM | - | Hidden Markov Model analysis
Information | - | Information and general help for users
Literature | - | Scientific literature and documentation
Menus | - | Menu interface(s)
Nucleic | 2D_structure | Nucleic acid secondary structure
Nucleic | Codon_usage | Codon usage analysis
Nucleic | Composition | Composition of nucleotide sequences
Nucleic | CpG_islands | CpG island detection and analysis
Nucleic | Functional_sites | Nucleic acid functional sites
Nucleic | Gene_finding | Predictions of genes and other genomic features
Nucleic | Motifs | Nucleic acid motif searches
Nucleic | Mutation | Nucleic acid sequence mutation
Nucleic | Primers | Primer prediction
Nucleic | Profiles | Nucleic acid profile generation and searching
Nucleic | Properties | Nucleic acid physicochemical properties
Nucleic | Repeats | Nucleic acid repeat detection
Nucleic | RNA_folding | RNA folding methods and analysis
Nucleic | Restriction | Restriction enzyme sites in nucleotide sequences
Nucleic | Transcription | Transcription factors, promoters and terminator prediction
Nucleic | Translation | Translation of nucleotide sequence to protein sequence
Ontology | EDAM | EDAM ontology
Ontology | GO | GO Gene ontology
Ontology | SO | SO Sequence ontology
Ontology | Taxonomy | NCBI Taxonomy
Phylogeny | Consensus | Phylogenetic consensus methods
Phylogeny | Continuous_characters | Phylogenetic continuous character methods
Phylogeny | Discrete_characters | Phylogenetic discrete character methods
Phylogeny | Distance_matrix | Phylogenetic distance matrix methods
Phylogeny | Gene_frequencies | Phylogenetic gene frequency methods
Phylogeny | Molecular_sequence | Phylogenetic molecular sequence methods
Phylogeny | Tree_drawing | Phylogenetic tree drawing methods
Protein | 2D_structure | Protein secondary structure
Protein | 3D_structure | Protein tertiary structure
Protein | Composition | Composition of protein sequences
Protein | Domains | Protein domain analysis
Protein | Functional_sites | Protein functional sites
Protein | Modifications | Protein post-translational modifications
Protein | Motifs | Protein motif searches
Protein | Mutation | Protein sequence mutation
Protein | Profiles | Protein profile generation and searching
Protein | Properties | Protein physicochemical properties
Test | - | Testing tools, not for general use.
Utils | Database_creation | Database installation
Utils | Database_indexing | Database indexing
Table 1. Standard application groups
ACD files describe the parameters that a program needs in an object-oriented manner. The most important types, or objects, are file objects, sequence objects, number objects, Boolean objects and string objects. The current objects are listed in Table 2.
Data type / Object | Description | Calculated Attributes | Specific Attributes | Command Line Qualifiers
All data types
- | All data types | - | additional: "N" | -
Simple types
array | List of floating point numbers | - | minimum: (-FLT_MAX) | -
boolean | Boolean value Yes/No | - | - | -
float | Floating point number | - | minimum: (-FLT_MAX) | -
integer | Integer | - | minimum: (INT_MIN) | -
range | Sequence range | - | minimum: 1 | -
string | String value | length (integer) | minlength: 0 | -
toggle | Toggle value Yes/No | - | - | -
Input types
assembly | Assembly of sequence reads | - | entry: N | cbegin: "0"
codon | Codon usage file in EMBOSS data path | - | name: "Ehum.cut" | format: ""
cpdb | Clean PDB file | - | nullok: N | format: ""
datafile | Data file | - | name: "" | -
directory | Directory | - | fullpath: N | extension: ""
dirlist | Directory with files | - | fullpath: N | extension: ""
discretestates | Discrete states file | - | length: 0 | -
distances | Distance matrix | distancecount (integer) | size: 1 | -
features | Readable feature table | fbegin (integer) | type: "" | fformat: ""
filelist | Comma-separated file list | - | nullok: N | -
frequencies | Frequency value(s) | freqlength (integer) | length: 0 | -
infile | Input file | - | directory: "" | -
matrix | Comparison matrix file in EMBOSS data path | - | pname: "EBLOSUM62" | -
matrixf | Comparison matrix file in EMBOSS data path | - | pname: "EBLOSUM62" | -
obo | OBO bio-ontology term(s) | - | entry: N | iformat: ""
pattern | Property value(s) | - | minlength: 1 | pformat: ""
properties | Property value(s) | propertylength (integer) | length: 0 | -
refseq | Reference sequence | - | entry: N | iformat: ""
regexp | Regular expression pattern | length (integer) | minlength: 1 | pformat: ""
resource | Data resource catalogue entry(s) | - | entry: N | iformat: ""
scop | Clean PDB file | - | nullok: N | format: ""
sequence | Readable sequence | begin (integer) | type: "" | sbegin: "0"
seqall | Readable sequence(s) | begin (integer) | type: "" | sbegin: "0"
seqset | Readable set of sequences | begin (integer) | type: "" | sbegin: "0"
seqsetall | Readable sets of sequences | begin (integer) | type: "" | sbegin: "0"
taxon | NCBI taxonomy entries | - | entry: N | iformat: ""
text | Text entries | - | entry: N | iformat: ""
tree | Phylogenetic tree | treecount (integer) | size: 0 | -
url | URL entries | - | entry: N | iformat: ""
variation | Variation entries | - | entry: N | iformat: ""
xml | Xml | - | entry: N | iformat: ""
Selection lists types
list | Choose from menu list of values | - | minimum: 1 | -
selection | Choose from selection list of values | - | minimum: 1 | -
Output types
align | Alignment output file | - | type: "" | aformat: ""
featout | Writeable feature table | - | name: "" | offormat: ""
outassembly | Assembly of sequence reads | - | name: "" | odirectory: ""
outcodon | Codon usage file | - | name: "" | odirectory: ""
outcpdb | Cleaned PDB file | - | nulldefault: N | -
outdata | Formatted output file | - | type: "" | odirectory: ""
outdir | Output directory | - | fullpath: N | extension: ""
outdiscrete | Discrete states file | - | nulldefault: N | odirectory: ""
outdistance | Distance matrix | - | nulldefault: N | -
outfile | Output file | - | name: "" | odirectory: ""
outfreq | Frequency value(s) | - | nulldefault: N | odirectory: ""
outmatrix | Comparison matrix file | - | nulldefault: N | odirectory: ""
outmatrixf | Comparison matrix file | - | nulldefault: N | odirectory: ""
outobo | OBO ontology term(s) | - | name: "" | odirectory: ""
outproperties | Property value(s) | - | nulldefault: N | odirectory: ""
outrefseq | Reference sequence | - | name: "" | odirectory: ""
outresource | Data resource entry | - | name: "" | odirectory: ""
outscop | Scop entry | - | nulldefault: N | odirectory: ""
outtaxon | NCBI taxonomy entries | - | name: "" | odirectory: ""
outtext | Text entries | - | name: "" | odirectory: ""
outtree | Phylogenetic tree | - | name: "" | odirectory: ""
outurl | URL entries | - | name: "" | odirectory: ""
outvariation | Variation entries | - | name: "" | odirectory: ""
outxml | Xml | - | name: "" | odirectory: ""
report | Report output file | - | type: "" | rformat: ""
seqout | Writeable sequence | - | name: "" | osformat: ""
seqoutall | Writeable sequence(s) | - | name: "" | osformat: ""
seqoutset | Writeable sequences | - | name: "" | osformat: ""
Graphics types
graph | Graph device for a general graph | - | sequence: N | gprompt: "N"
xygraph | Graph device for a 2D graph | - | multiple: 1 | gprompt: "N"
Table 2. Available Data Types/Objects in ACD.
Array parameters are lists of numbers, either integer or floating point. ACD attributes control validation, for example the number of values, or whether the values must add up to a given total. The data value is a list of numbers separated by spaces or commas.
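A minimal Python sketch of reading such a value, assuming the separator rule above (spaces and/or commas) and modelling the two validations mentioned as optional `size` and `total` arguments. These argument names, the tolerance, and the function `parse_array` are assumptions of the sketch, not ACD attribute names.

```python
import re


def parse_array(text, size=None, total=None, tolerance=1e-6):
    """Parse numbers separated by spaces and/or commas, then validate.

    size and total are this sketch's stand-ins for the validations
    described above (number of values, required sum).
    """
    values = [float(item) for item in re.split(r"[\s,]+", text.strip()) if item]
    if size is not None and len(values) != size:
        raise ValueError(f"expected {size} values, got {len(values)}")
    if total is not None and abs(sum(values) - total) > tolerance:
        raise ValueError(f"values add up to {sum(values)}, not {total}")
    return values


print(parse_array("0.25, 0.25 0.3,0.2", size=4, total=1.0))
# [0.25, 0.25, 0.3, 0.2]
```

A tolerance is used for the sum because floating point additions rarely hit a total exactly.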
Boolean parameters are simple switches. If they are entered on the command line the value is Y (True); if they are absent from the command line the value is the default value. The name can also be prefixed by 'no' to force the value to be N (False), which is needed when the default value is Y (True). The data value is Y for yes and N for no.
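The command-line rule can be sketched in a few lines of Python. The qualifier name `myflag` and the helper `boolean_value` are hypothetical; the sketch ignores abbreviation and any other value syntax, and letting the last occurrence win when both forms are given is an assumption.

```python
def boolean_value(args, name, default):
    """Resolve a boolean qualifier from a list of command-line words.

    '-name' gives True, '-noname' gives False, and absence leaves the
    default. If both forms appear, the last one wins in this sketch.
    """
    value = default
    for arg in args:
        if arg == f"-{name}":
            value = True
        elif arg == f"-no{name}":
            value = False
    return value


print(boolean_value(["seq.fasta", "-myflag"], "myflag", False))  # True
print(boolean_value(["-nomyflag"], "myflag", True))              # False
```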
The integer data type can hold simple integer values. The value range can be controlled by minimum and maximum values (a minimum value of 0 or 1 is often useful).
Float parameters hold simple floating point values. The value range can be controlled by minimum and maximum ACD attributes (a minimum value of 0.0 is often useful).
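For both integer and float parameters, validation amounts to converting the text and comparing it with the limits. The Python sketch below treats minimum and maximum as inclusive bounds, which is an assumption; `read_number` is a hypothetical helper.

```python
def read_number(text, kind=int, minimum=None, maximum=None):
    """Convert text with kind (int or float) and check the limits."""
    value = kind(text)  # raises ValueError for text that is not a number
    if minimum is not None and value < minimum:
        raise ValueError(f"{value} is below the minimum {minimum}")
    if maximum is not None and value > maximum:
        raise ValueError(f"{value} is above the maximum {maximum}")
    return value


print(read_number("10", int, minimum=1))                  # 10
print(read_number("0.5", float, minimum=0.0, maximum=1))  # 0.5
```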
Range parameters hold ranges of sequence positions. Originally defined as a simple list of paired numbers, ranges can now also be specified in a file with the range syntax "@filename", as pairs of numbers with optional text comments. For example:
# this is my set of ranges
12     23
4      5        this is like 12-23, but smaller
67     10348    interesting region
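A ranges file in this form is easy to read line by line. The Python sketch below assumes, from the example, that lines starting with '#' are comments and that every other line holds two positions followed by an optional comment; `parse_ranges` is a hypothetical helper, not the EMBOSS reader, and it does not check the order of the two positions.

```python
def parse_ranges(lines):
    """Return (start, end, comment) tuples from a ranges file's lines."""
    ranges = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        parts = stripped.split(None, 2)  # start, end, optional comment
        if len(parts) < 2:
            raise ValueError(f"expected two positions: {line!r}")
        comment = parts[2] if len(parts) == 3 else ""
        ranges.append((int(parts[0]), int(parts[1]), comment))
    return ranges


EXAMPLE = """\
# this is my set of ranges
12     23
4      5        this is like 12-23, but smaller
67     10348    interesting region
""".splitlines()

for start, end, comment in parse_ranges(EXAMPLE):
    print(start, end, comment)
```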
String parameters accept any string value. The length can be controlled by ACD attributes, and a regular expression pattern can provide more general validation if necessary. Most string values are free text, although strings can be used by a program for any input that is not covered by a defined ACD type.
Toggle parameters are simple switches and work in the same way as "boolean" parameters. They are intended for turning other parameters on and off. When ACD parameters are grouped in sections, a clean ACD file will have all the "required" parameters in the "required" section and all the "additional" parameters in the "additional" section. Some of these will have calculated values for the "standard" and "additional" attributes, controlled by the value of another parameter. Toggle parameters are designed to be used in these calculated values, and can be in the "required" section even if they are not themselves defined as "standard".
As with "boolean" parameters, if they are entered on the command line the value is Y (True); if they are absent from the command line the value is the default value. The name can also be prefixed by 'no' to force the value to be N (False), which is needed when the default value is Y (True). The data value is Y for yes and N for no.
Assembly input is an assembly of sequence reads. The sequence data is read only if the resolution is at the single-base level.
Codon usage tables are simple files read from the EMBOSS data search path, and are distributed in the emboss/data directory.
Codon usage files can be read in several formats, including "gcg".
Cpdb (Cleaned PDB) files are simple input files in CPDB format. See the documentation for pdbparse, part of the EMBASSY domainatrix package, which generates CPDB files from PDB file input.
Datafile input refers to a formatted data file to be read from the standard EMBOSS data file locations (see the EMBOSS Administrator's guide for full details).
EMBOSS looks for data files in the local/share/EMBOSS/data directory, or in various user directories.
Most data files are already defined as their own ACD types: matrix, matrixf and codon. Others are hard-coded file names that do not need their own ACD definition, although users are free to define their own file with the appropriate name to override the default file provided.
Directory defines a directory that can be used for input or output definitions.
Directory is intended for future use to replace string definitions of directory names in some applications, and to provide additional validation of the user input specific to directory specifications.
Dirlist defines a set (list) of directories that can be used for input or output definitions.
Dirlist is intended for future use to replace string definitions of directory names in some applications, and to provide additional validation of the user input specific to directory specifications.
Discretestates is a new ACD type implemented specifically for the "phylipnew" EMBASSY package. Discretestates input is used by the phylip "discrete character" applications. By defining a specific ACD type EMBOSS can provide detailed type checking, and can automatically detect and validate the various alternative formats that phylip supports without the need for complex extra command line options.
Distances is a new ACD type implemented specifically for the "phylipnew" EMBASSY package. Distances input is used by the phylip "distance matrix" applications. By defining a specific ACD type EMBOSS can provide detailed type checking, and can automatically detect and validate the various alternative formats that phylip supports without the need for complex extra command line options.
Feature annotation in any known feature format.
Applications requiring a single entry should specify the attribute "maxreads" with a value of "1".
Features can also be read from a sequence and written with a sequence.
Filelist defines a set (list) of input files.
Filelist is intended for future use to replace string definitions of input file names in some applications, and to provide additional validation of the user input specific to multiple input files.
Frequencies is a new ACD type implemented specifically for the "phylipnew" EMBASSY package. Frequencies input is used by the phylip "gene frequency and continuous character" applications. By defining a specific ACD type EMBOSS can provide detailed type checking, and can automatically detect and validate the various alternative formats that phylip supports without the need for complex extra command line options.
Non-sequence-related data file. This data type refers to files that are to be used in the program and usually do not contain sequence data. The type of data can be identified by a "knowntype" attribute and matched to Outfile standard types, or to report, align, featout, or seqout formats.
Comparison matrix files are used by many programs. They are data files read from the EMBOSS data search path, and are distributed in the emboss/data directory. For preference, we use the matrix files distributed with BLAST.
Integer matrices are usually faster and are preferred by most applications. Floating-point matrix files are also available if needed, and an integer matrix file can of course also be read as floating point.
The matrix data type has an attribute to force selection of a nucleic acid or protein comparison matrix. In ACD files, the type of the input sequence is often used here.
Remember that any application which uses gap penalties will need to set them separately for each matrix.
Floating point comparison matrices are required by some algorithms. An integer matrix file can of course be used equally well as a floating point matrix.
One or more terms from an OBO ontology.
Applications requiring a single entry should specify the attribute "maxreads" with a value of "1".
Pattern definition files allow multiple search patterns to be described, each with a name.
Pattern files are used for PROSITE syntax sequence patterns. The same syntax is used for "regexp" input. Pattern files also allow mismatch values to be defined for each pattern, and a "-pmismatch" qualifier sets the mismatch default for all patterns in the file. Mismatches are not appropriate for regular expression matches.
Properties is a new ACD type implemented specifically for the "phylipnew" EMBASSY package. Properties input is used by the phylip applications to define weights, ancestral states and factors (multi-state characters). By defining a specific ACD type EMBOSS can provide detailed type checking, and can automatically detect and validate the various alternative formats that phylip supports without the need for complex extra command line options.
A reference sequence. The sequence data is read only if the resolution is at the single-base level.
Any regular expression value, or (new in release 4.0.0) a file containing regular expressions and names.
The length can be validated and controlled by ACD attributes. The case can be set to upper or lower case only. The regular expression must be supported by the EMBOSS regular expression library.
EMBOSS uses the "Perl-Compatible Regular Expression Library" (PCRE), so any regular expression that is valid in Perl 5.0 should be valid here.
One or more entries from the data resource catalogue (DRCAT.dat).
Applications requiring a single entry should specify the attribute "maxreads" with a value of "1".
Stream of entries from the data resource catalogue (DRCAT.dat).
SCOP files are simple input files in SCOP format.
USA (database reference or file) indicating a single sequence. The type of sequence can be restricted by the specific attribute "type" (for example, so that the program only accepts DNA files). Features can also be read if the "features" ACD attribute is set.
Set of single sequences that can be addressed one after another (for example a database of some sort that is to be used for a pattern search). The type of sequence can be restricted by the specific attribute "type" (for example, so that the program only accepts DNA files). Features can also be read if the "features" ACD attribute is set.
Set of single sequences that can be used all at the same time (for example a set of sequences that will be used in a multiple alignment). The type of sequence can be restricted by the specific attribute "type" (for example, so that the program only accepts DNA files). Features can also be read if the "features" ACD attribute is set.
One or more sets of single sequences, each of which can be used all at the same time (for example several sets of sequences, each to be used in a multiple alignment). The type of sequence can be restricted by the specific attribute "type" (for example, so that the program only accepts DNA files). Features can also be read if the "features" ACD attribute is set.
Taxon is a new ACD type implemented specifically for taxon data from the NCBI taxonomy. By defining a specific ACD type EMBOSS can provide detailed type checking, and can automatically detect and validate the various alternative formats for taxonomy data from the major sequence databases without the need for complex extra command line options.
Text is a new ACD type for unparsed text data. Text data allows EMBOSS to retrieve useful information from a very large range of remote data resources linked to results from an application. Input format options allow the user to control the stripping of HTML or XML markup and other editing functions.
Tree is a new ACD type implemented specifically for the "phylipnew" EMBASSY package. Tree input is used by the phylip applications to define one or more phylogenetic trees. By defining a specific ACD type EMBOSS can provide detailed type checking, and can automatically detect and validate the various alternative formats that phylip supports without the need for complex extra command line options. The trees are currently parsed by phylip itself, but in the near future we will implement parsing methods in ACD processing.
URL is a new ACD type implemented specifically for URL data from the data resource catalogue where the content returned by the URL is not readable as text. By defining a specific ACD type EMBOSS can provide detailed type checking, and can automatically detect and validate the various associated annotations in the data resource catalogue without the need for complex extra command line options.
Variation is a new ACD type implemented specifically for variation data from Ensembl and VCF file input. By defining a specific ACD type EMBOSS can provide detailed type checking, and can automatically detect and validate the various alternative formats from the major databases without the need for complex extra command line options.
XML is a new ACD type implemented specifically for XML data from databases where the content contains information in an XML syntax. By defining a specific ACD type EMBOSS can provide detailed type checking, and can automatically detect, parse and validate the various named formats.
Selection lists are a way to present the user with a limited list of options he/she can choose from. For the user, the difference between the list and selection data type is minimal and lies only in the way the choices are labelled. In a selection data type, the choices are numbered automatically from 1 up. In a list data type the choices can be labelled by any arbitrary text label. The user can choose one of the options by either typing the number (for a selection type) or the text of the label (for a list type) or a non-ambiguous part of the value of the choice. In practice, the list data type is much preferred for this reason.
A list of text descriptions with short labels. The user can enter one (or sometimes more) labels, or can specify partial text descriptions. The program is given a list of text labels as input.
A list of text descriptions (usually short, unlike list data), with generated numbers. The user can enter one (or sometimes more) numbers, or can specify partial text descriptions. The program is given a list of text descriptions as input. The list data type is usually preferred.
An output file for sequence alignments. Defined in the same way as a plain text "Outfile" but with extra qualifiers to allow a choice of alignment formats, and attributes to specify whether the alignment will have 2 or more sequences (which limits the possible formats). The data is stored as sequences; the available formats include the most common sequence formats.
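The matching described for the two selection-list types can be sketched as follows. The Python functions `choose` and `selection_choices` are hypothetical; treating "a non-ambiguous part" of a description as a case-insensitive prefix is an assumption, and the sketch accepts a single answer only.

```python
def choose(answer, choices):
    """Return the label selected by answer.

    choices maps label -> description. An exact label match wins;
    otherwise answer must be a case-insensitive prefix of exactly one
    description (prefix matching is this sketch's assumption).
    """
    if answer in choices:
        return answer
    hits = [label for label, text in choices.items()
            if text.lower().startswith(answer.lower())]
    if len(hits) != 1:
        raise ValueError(f"{answer!r} matches {len(hits)} choices")
    return hits[0]


def selection_choices(descriptions):
    """A selection numbers its descriptions automatically from 1."""
    return {str(number): text for number, text in enumerate(descriptions, start=1)}


LIST = {"N": "Nucleic acid", "P": "Protein"}           # list: arbitrary labels
SELECTION = selection_choices(["Nucleic acid", "Protein"])  # labels "1", "2"

print(choose("nuc", LIST))        # N
print(choose("prot", SELECTION))  # 2
```

Because list labels can be meaningful text, an answer like "P" is easier to remember than a generated number, which is one reason the list type is preferred.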
Feature annotation in any known feature format. Can also be stored with the sequence if the sequence output "features" attribute is set.
Output file containing sequence assembly data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing codon usage data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing cleaned PDB protein structure data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing cleaned formatted data as tables or lists. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Multiple outdata definitions are by default appended to a single file. The individual ACD definitions allow the format of each file section to be defined.
Output directory for multiple output files to be written. Specifying an outdir allows other properties to be defined, including the default file extension with the "extension" attribute.
Output file containing phylogenetic discrete character data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing phylogenetic distance matrix data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Non-sequence-related data file, usually plain text. This data type refers to files that are to be produced by the program and usually do not contain sequence data. The type of data can be identified by a "knowntype" attribute and matched to an Infile standard type for use as input to another program.
Output file containing phylogenetic character frequency data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing integer comparison matrix data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing floating point comparison matrix data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing bio-ontology term data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing phylogenetics property data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing reference sequence data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing data resource entry data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing SCOP protein domain data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing taxonomy data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing unparsed text data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing phylogenetic tree data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing URL data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing variation data. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
Output file containing XML data, usually a simple text dump of the input though some XML formats may have been converted. The default data format can be specified by an "oformat" attribute which the -oformat associated qualifier can override.
An output file for sequence annotation. Defined in the same way as a plain "Outfile" but with extra qualifiers to allow a choice of report formats. Report data is stored internally as a feature table, so the available formats include the most common feature formats.
USA (database reference or file) indicating a single sequence. Can also write features if the "features" ACD attribute is set.
The default file extension is the sequence format, but can be specifically set with the "osextension" attribute, for example where applications produce two or more sequence outputs.
A set of single sequences to be written to a single file. Can also write features if the "features" ACD attribute is set.
The default file extension is the sequence format, but can be specifically set with the "osextension" attribute, for example where applications produce two or more sequence outputs.
A set of single sequences stored in memory together, usually a multiple sequence alignment. Can also write features if the "features" ACD attribute is set.
The default file extension is the sequence format, but can be specifically set with the "osextension" attribute, for example where applications produce two or more sequence outputs.
For graphical output of any general kind, including dotplots. The data value is the graphics device, as specified by the "PLPLOT" graphics library used in EMBOSS at present. Example values include "ps" for Postscript, "png" for PNG files, and "X11" for X-Windows. A value of "?" in answer to the prompt will list the available graphics devices on your installation.
For graphical output as a simple two dimensional (2D) XY plot with the sequence along the x-axis. The data value is the graphics device, as specified by the "PLPLOT" graphics library used in EMBOSS at present. Example values include "ps" for Postscript, "png" for PNG files, and "X11" for X-Windows. A value of "?" in answer to the prompt will list the available graphics devices on your installation.
ACD objects have mandatory names.
Formalised:
datatype: parametername [ ]
Example:
sequence: asequence [ ]
This defines asequence to be the name of a sequence object.
In order to assign a value to a parameter, the name of the parameter can be specified on the command line (in a number of ways, see section 4) followed by a value that is appropriate for that data type.
Example:
ACD file definition (partly):
sequence: asequence [ ]
Command line:
% acddemo -asequence filename.seq
This defines filename.seq to be the value of the parameter named asequence for the EMBOSS program acddemo.
If a parameter is defined with the special parameter attribute (parameter: "Y"), using the name of the parameter on the command line is not mandatory (see section 3.4). This is commonly used for input data and for output filenames.
The name of an object is also used in the EMBOSS program to refer to the value of the parameter. After the initialisation call to the EMBOSS function embInit(), the parameter values have been read in and checked (see 1.4). The program must then assign each parameter to an actual EMBOSS object, such as a sequence (AjPSeq) or a string (AjPStr). The actual function calls are beyond the scope of this document, and the reader is referred to the AJAX documentation (http://srs.ebi.ac.uk/srs7bin/cgi-bin/wgetz?-fun+pagelibinfo+-info+EDATA for the SRS searchable object documentation), but some examples can be found in sections 1.4 and 1.5.
The name can also be used in the definition of other ACD parameters. The value of a parameter (or variable) is retrieved using the dollar sign '$' followed by the name of the parameter enclosed in parentheses.
Formalised:
$(parametername)
Example:
integer: gappenalty [ standard: Y default: 10 ]
integer: gapextpenalty [ default: $(gappenalty) ]
This defines the default for parameter gapextpenalty as the value of parameter gappenalty.
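A minimal Python sketch of how such a reference could be resolved is shown below. It is a hypothetical model, not the ACD processor: the helper name resolve_refs and the dictionary of earlier values are assumptions, and the calculated attributes and expressions that real ACD processing supports are not modelled.

```python
import re

# $(name): a dollar sign followed by a parameter name in parentheses.
REF = re.compile(r"\$\((\w+)\)")


def resolve_refs(text: str, values: dict) -> str:
    """Replace each $(name) in text with the value of an earlier parameter.

    Hypothetical helper: raises KeyError when the referenced parameter
    has not been defined yet.
    """
    def lookup(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"parameter '{name}' is not defined yet")
        return str(values[name])

    return REF.sub(lookup, text)


# gappenalty is defined first, so gapextpenalty can take it as its default.
values = {"gappenalty": 10}
values["gapextpenalty"] = int(resolve_refs("$(gappenalty)", values))
print(values["gapextpenalty"])  # 10
```

Keeping the earlier values in a dictionary filled in file order reflects the rule that a default can depend only on parameters defined earlier.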
Naming conventions
Although any valid name can be used for a parameter, we propose the following naming convention to streamline the development of ACD files.
Name | Datatype | Usage
sequence | sequence | primary input sequence, generally required
outseq | outseq | primary output sequence, generally required; should generally default to the primary input sequence name, with the extension defaulting to the name of the output sequence format
outfile | outfile | primary output non-sequence results file, generally required; the file extension should be allowed to default to the application name
data | infile | primary auxiliary input data file, generally optional
minlen | int | minimum length of sequence feature to be found
maxlen | int | maximum length of sequence feature to be found
wordsize | int | word size for hash tables etc.; generally minimum = 2 for protein, 4 for DNA
window | int | window length for calculating dotplots/features/etc.
shift | int | amount by which window is shifted in each iteration
consensus | bool | flag for whether consensus sequence should be output
gap | float | gap penalty
gapext | float | gap extension penalty
from | int | position of start of input sequence to specify for an operation (e.g. deletion); defaults to start of sequence, minimum value = 1, maximum value = sequence length
to | int | position of end of input sequence to specify for an operation (e.g. deletion); defaults to the 'from' value, minimum value = 'from' value, maximum value = sequence length
threshold | float/int | threshold for various operations
left | bool | operation should be done at the start of the sequence
right | bool | operation should be done at the end of the sequence
pattern | string | pattern to search for in sequence
patterns | infile | file of patterns to search for in sequence
Table 3. Recommended naming conventions.
There are two types of attributes for parameters. 'Global' attributes can be defined for any ACD data type, and each data type also has its own set of 'specific' attributes. These definitions can refer to 'calculated' attributes generated automatically by ACD processing. The 'global' and 'specific' attributes are part of the parameter definition and are placed between the square brackets.
Formalised:
datatype: parametername [ attribute: "value" ]
Attributes can specify a parameter's default value and the requirements for a valid value, including whether the parameter is mandatory and what the limits are for a valid value.
default:
Defines the default value for the parameter, which can be dependent on the values of parameters defined earlier.
Each data type has a default value, which can be valid (for example a boolean will default to "N") or invalid (many input types will default to an empty string).
information:
The string giving information about the parameter, for use on Web forms and in GUIs, and also as the default prompt to the user.
For some data types (sequence is a good example) there are standard prompts so no value is expected, and the acdvalid utility will issue a warning if an information attribute is found.
parameter:
Defines a parameter on the command line which can appear without a qualifier name. Also implies that the value is required and will be prompted for if missing.
standard:
Indicates whether a parameter is mandatory and will be prompted for if missing.
additional:
Indicates if the parameter should be queried for when the -options qualifier is set on the command line.
help:
The string shown when the -help qualifier is used on the command line.
Help is usually only defined if a specific string is needed. If help is not defined, the value of the "information" attribute, or the default prompt, will be used.
expect:
A string used in the "Default" column of the command line syntax table in the documentation. This table is automatically generated from the ACD file, and in most cases there is a reasonable value generated. Where there is no suitable value, this attribute should be used to provide one.
valid:
A string used in the "Allowed values" column of the command line syntax table in the documentation. This table is automatically generated from the ACD file, and in most cases there is a reasonable value generated. Where there is no suitable value, this attribute should be used to provide one.
knowntype:
The knowntype attribute defines one of a controlled vocabulary of known value types. Some ACD data types require a knowntype attribute.
These standard values are read from a file knowntypes.standard which is stored and installed in the ACD file directory. A few other values are accepted, for example "(programname) output" for an outfile data type. These are documented under each output type. The acdvalid utility will check all knowntype values in an ACD file, and report any missing values for data types that require a knowntype.
relations:
The relations attribute is a strict definition of the data type using the EMBRACE Data and Methods (EDAM) ontology. The string value is in the form:
relations: "EDAM:0123456 Edam term name"
prompt:
The string used when the user must be queried for a value. The information: attribute can be used instead, and usually only one of the two is defined; information: is preferred.
missing:
Indicates whether a qualifier can have no value, especially when it appears on the command line (for example to override a default value in the ACD file).
needed:
Indicates whether a parameter is expected to be included in a GUI form. Some parameters are available on the command line, but are not generally useful to users, or can cause confusion when presented in a GUI form with all other options.
outputmodifier:
Indicates that this qualifier modifies the output in ways that can break parsers, for example by changing text output into HTML. Authors of wrappers can use this to test for qualifiers that can be hardcoded to fix the output syntax and content. Please let the EMBOSS team know if any other qualifiers are candidates for marking as output modifiers.
code:
A code word (no spaces) which is searched for in the file codes.english to give a standard prompt, for example when asking for an alignment gap penalty. The standard default prompts are in the same file. The code word is not case-sensitive. The information: attribute is preferred.
comment:
A comment, provided for use by the EBI's SoapLab project but not defined in the standard ACD files.
style:
Provided for use by the EBI's SoapLab project but not defined in the standard ACD files.
Every global or specific attribute must be followed by a second token giving its value. The attribute name is followed by a colon ':', and the value is usually enclosed in double quotes.
The syntax of the global attributes is:
Formalised:
help: "String"
information: "String"
default: "value"
additional: "Y"/"N"
parameter: "Y"/"N"
standard: "Y"/"N"
Example:
sequence: asequence [ standard: "Y" information: "Enter filename" ]
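Because definitions follow this regular pattern, a simplified reader can be sketched in a few lines of Python. This is a hypothetical illustration, not the EMBOSS ACD parser: parse_acd is an invented name, and it assumes each value is either a double-quoted string or a single unquoted token, ignoring comments, sections and values that contain ']'.

```python
import re

# One definition: "datatype: name [ ...attributes... ]"
DEF = re.compile(r"(\w+)\s*:\s*(\w+)\s*\[(.*?)\]", re.S)
# One attribute: 'attr: "quoted value"' or 'attr: token'
ATTR = re.compile(r'(\w+)\s*:\s*(?:"([^"]*)"|(\S+))')


def parse_acd(text: str) -> list:
    """Return (datatype, name, attributes) tuples in file order."""
    result = []
    for datatype, name, body in DEF.findall(text):
        attrs = {}
        for attr, quoted, bare in ATTR.findall(body):
            # Quoted values keep their spaces; bare values are single tokens.
            attrs[attr] = bare if bare else quoted
        result.append((datatype, name, attrs))
    return result


defs = parse_acd('sequence: asequence [ standard: "Y" information: "Enter filename" ]')
print(defs)
# [('sequence', 'asequence', {'standard': 'Y', 'information': 'Enter filename'})]
```

Returning the definitions as a list keeps the ACD file order, which matters for parameter: values and for $(name) references to earlier parameters.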
The parameter: attribute is a boolean attribute. If set to Y, the parameter value can be entered on the command line without the parameter name, and the order of such unnamed values on the command line is defined by the order of the parameters in the ACD file.
Formalised:
datatype: parametername [ parameter: Y/N ]
Example:
ACD file definition (partly):
application: acddemo [ documentation: "" groups: "" ]
sequence: asequence [ ]
Command line:
% acddemo -asequence filename.seq
Is equivalent to:
ACD file definition (partly):
sequence: asequence [ parameter: Y ]
Command line:
% acddemo filename.seq
In both examples filename.seq is the value of the parameter named asequence for the EMBOSS program acddemo.
The second example will also allow the command line from the first, as parameter names are accepted as qualifiers.
If more than one parameter has the parameter: attribute set, their values must appear on the command line in the same order as the parameters are defined in the ACD file.
Example:
ACD file definition (partly):
application: acddemo [ documentation: "" groups: "" ]
sequence: asequence [ parameter: Y ]
outseq: outseq [ parameter: Y ]
Command line:
% acddemo infilename.seq outfilename.seq
will assign the name infilename.seq to parameter asequence, and outfilename.seq to parameter outseq.
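The assignment rule can be sketched in Python as follows. The helper assign_args is hypothetical, and it assumes that a value given by name ('-asequence filename.seq') takes that parameter out of the positional order, so that the remaining bare values fill the other parameter: Y parameters in ACD file order. Qualifier abbreviation, boolean qualifiers and negative numbers are not handled.

```python
def assign_args(argv: list, params: list) -> dict:
    """Map command-line tokens to ACD parameter names (hypothetical).

    params lists the names of parameters with 'parameter: Y', in ACD
    file order. '-name value' sets that parameter directly; bare values
    fill the remaining positional parameters in order.
    """
    values = {}
    bare = []
    i = 0
    while i < len(argv):
        token = argv[i]
        if token.startswith("-"):
            if i + 1 == len(argv):
                raise ValueError(f"qualifier {token} has no value")
            values[token[1:]] = argv[i + 1]
            i += 2
        else:
            bare.append(token)
            i += 1
    # Positional values go to the parameters not already given by name.
    free = [p for p in params if p not in values]
    if len(bare) > len(free):
        raise ValueError(f"unexpected extra values: {bare[len(free):]}")
    values.update(zip(free, bare))
    return values


print(assign_args(["infilename.seq", "outfilename.seq"], ["asequence", "outseq"]))
# {'asequence': 'infilename.seq', 'outseq': 'outfilename.seq'}
```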
Any program is expected to have one or more required inputs. An ACD data type that is defined as a "parameter:" (see section 2.4.1.1.1) is automatically counted as required. All other required inputs should have the "standard:" attribute set.
When the program runs, the user will be prompted for any "required" values that are not already on the command line.
The only difference between "parameter:" and "standard:" is that a "parameter" can appear on the command line as the simple value with no name, to provide simple command lines.
When the additional: attribute is set, the user is queried for the parameter only when the -options qualifier is set, either on the command line or as the system default through an environment variable (see 3.7) or by any other means. If the -options qualifier is not set and no value has been provided (on the command line or in any other way), the user is not queried for this parameter.
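The prompting rules of the last few paragraphs can be summarised in a small decision function. This Python sketch is a hypothetical model (the Param class and should_prompt are invented names). It only captures which values are queried for, not how the prompt text is chosen.

```python
from dataclasses import dataclass


@dataclass
class Param:
    name: str
    parameter: bool = False   # parameter: Y
    standard: bool = False    # standard: Y
    additional: bool = False  # additional: Y


def should_prompt(p: Param, given: set, options: bool) -> bool:
    """Decide whether the user is asked for a value (hypothetical model).

    given holds the names of parameters already supplied, for example on
    the command line; options is True when -options is in effect.
    """
    if p.name in given:
        return False          # never ask for a value already supplied
    if p.parameter or p.standard:
        return True           # required values are always prompted for
    if p.additional:
        return options        # only asked for when -options is set
    return False              # everything else is not queried
```

For example, a gapextpenalty parameter marked additional: would be asked for only when -options is set, while asequence with parameter: Y is asked for whenever it is missing from the command line.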
The information: attribute defines the text hint to the user entering a data value. The same text is intended for use in the prompt to the user at a terminal, and as the text in an HTML form or a GUI.
In rare cases where the information: string is misleading, a prompt: string can be defined for use as a terminal prompt. For general use, information: is now preferred.
To provide standard prompts for common ACD data, there are default information: strings for most data types. These can be found in the file codes.english with the names DEFXXXX where XXXX is the name of the ACD data type.
Common practice is to use the default prompt for input and output ACD data types.
The help: attribute text is shown when the user requests assistance with the -help qualifier on the command line, or when help is requested in another format (for example, a Web page).
Again, there is a default help string in the codes.english file with the name HELPXXXX where XXXX is the name of the ACD data type.
The codes.english file includes some additional standard prompts such as GAP for gap penalties. These prompts can be used with the code: attribute, for example code: "GAP", but GUI developers found these hard to use, so we have replaced them with normal information: attributes.
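One possible precedence for choosing the terminal prompt and the help text, based on the paragraphs above, is sketched below in Python. The function names are hypothetical. The exact order used by ACD (in particular where code: falls relative to information:, and whether HELPXXXX is tried before the default prompt) is an assumption, as is the upper-case form of the codes.english keys. The strings in the usage example are placeholders, not the real entries in that file.

```python
def terminal_prompt(datatype: str, attrs: dict, codes: dict) -> str:
    """Choose the text used to prompt at a terminal (assumed precedence).

    attrs holds the parameter's attributes; codes holds entries from
    codes.english, keyed here by upper-case names (an assumption).
    """
    if "prompt" in attrs:            # rare override for terminal use
        return attrs["prompt"]
    if "information" in attrs:       # preferred; shared with GUIs
        return attrs["information"]
    if "code" in attrs:              # code words are not case-sensitive
        return codes[attrs["code"].upper()]
    return codes["DEF" + datatype.upper()]   # standard DEFXXXX prompt


def help_text(datatype: str, attrs: dict, codes: dict) -> str:
    """Choose the -help text: help:, then information:, then defaults."""
    if "help" in attrs:
        return attrs["help"]
    if "information" in attrs:
        return attrs["information"]
    key = "HELP" + datatype.upper()          # standard HELPXXXX string
    if key in codes:
        return codes[key]
    return terminal_prompt(datatype, attrs, codes)


# Placeholder texts; the real strings live in codes.english.
codes = {"DEFSEQUENCE": "(standard sequence prompt)",
         "GAP": "(standard gap penalty prompt)"}
print(terminal_prompt("float", {"code": "gap"}, codes))
# (standard gap penalty prompt)
```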
The default set of attributes is available for all ACD data type definitions.
Each ACD type has its own set of specific attributes, summarized in Table 1 and described in more detail below.
Formalised:
Data type | Attribute definition | Description
array | minimum: float | Minimum value
array | maximum: float | Maximum value
array | increment: float | (Not used by ACD) Increment for GUIs
array | precision: integer | (Not used by ACD) Floating precision for GUIs
array | warnrange: Y/N | Warning if values are out of range
array | size: integer | Number of values required
array | sum: float | Total for all values
array | sumtest: Y/N | Test sum of all values
array | trueminimum: Y/N | If max/min overlap, use minimum
array | failrange: Y/N | Fail if calculated ranges overlap
array | rangemessage: string | Failure message if calculated ranges overlap
array | tolerance: float | Tolerance (sum +/- tolerance) of the total
float | minimum: float | Minimum value
float | maximum: float | Maximum value
float | increment: float | (Not used by ACD) Increment for GUIs
float | precision: integer | Precision for printing values
float | warnrange: Y/N | Warning if values are out of range
float | trueminimum: Y/N | If max/min overlap, use minimum
float | failrange: Y/N | Fail if calculated ranges overlap
float | rangemessage: string | Failure message if calculated ranges overlap
float | large: Y/N | Large values returned as double
integer | minimum: integer | Minimum value
integer | maximum: integer | Maximum value
integer | increment: integer | (Not used by ACD) Increment for GUIs
integer | warnrange: Y/N | Warning if values are out of range
integer | failrange: Y/N | Fail if calculated ranges overlap
integer | rangemessage: string | Failure message if calculated ranges overlap
integer | large: Y/N | Large values returned as long
integer | trueminimum: Y/N | If max/min overlap, use minimum
range | minimum: integer | Minimum value
range | maximum: integer | Maximum value
range | trueminimum: Y/N | If max/min overlap, use minimum
range | warnrange: Y/N | Warning if values are out of range
range | failrange: Y/N | Fail if calculated ranges overlap
range | rangemessage: string | Failure message if calculated ranges overlap
range | size: integer | Exact number of values required
range | minsize: integer | Minimum number of values required
string | minlength: integer | Minimum length
string | maxlength: integer | Maximum length
string | pattern: string | Regular expression for validation
string | upper: Y/N | Convert to upper case
string | lower: Y/N | Convert to lower case
string | word: Y/N | Disallow whitespace in strings
Table 4.1. Simple data types - attributes.
The value for an array is a set of floating point numbers separated by white space or commas. The size: attribute sets the number of elements in the array. As for the float data type, the minimum: and maximum: attributes define the lower and upper value limits, which default to the boundaries specified by the system set-up. For validation purposes, the sum: attribute defines the total for all values in the array (tested unless the sumtest: attribute is false), and the tolerance: attribute specifies how closely the sum must match that total. Remember that most floating point fractions cannot be represented exactly in binary form, so the sum of the values may differ slightly from the expected total.
The warnrange: attribute warns the user if an out of range value (below the minimum, or above the maximum) has been automatically adjusted to be valid. This includes adjusting a default value where the calculated range no longer includes it.
Where one or both of the minimum: and maximum: values are calculated, the maximum could turn out lower than the minimum. In such cases the failrange: attribute must be defined; an error message is generated if it is missing.
Where the maximum is lower than the minimum and failrange: is false, the maximum value is used by default. Where this is not ideal for the application, the trueminimum: attribute can be set true to use the minimum value as the only acceptable value. If failrange: is true, there must be a failmessage: attribute explaining to the user why there is no valid value between the defined minimum and maximum; an error message is generated if it is absent.
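The array rules above (parsing, adjusting out-of-range values, resolving a calculated maximum that falls below the minimum, and checking the sum) are modelled in the Python sketch below. It is a hypothetical illustration, not EMBOSS code: the function names are invented, and whether ACD checks the sum before or after adjusting out-of-range values is an assumption.

```python
import re
import warnings


def resolve_bounds(minimum, maximum, failrange=False, trueminimum=False,
                   message="no valid value between minimum and maximum"):
    """Handle calculated limits where maximum < minimum (hypothetical)."""
    if maximum >= minimum:
        return minimum, maximum
    if failrange:
        raise ValueError(message)
    # Default: the maximum is the only acceptable value; trueminimum: the minimum.
    only = minimum if trueminimum else maximum
    return only, only


def read_array(text, size, minimum, maximum, total=None, tolerance=0.0,
               sumtest=True, warnrange=True, **range_opts):
    """Parse and validate an ACD-style array value (hypothetical model)."""
    values = [float(tok) for tok in re.split(r"[\s,]+", text.strip()) if tok]
    if len(values) != size:
        raise ValueError(f"expected {size} values, got {len(values)}")
    low, high = resolve_bounds(minimum, maximum, **range_opts)
    result = []
    for v in values:
        adjusted = min(max(v, low), high)   # out-of-range values are adjusted
        if warnrange and adjusted != v:
            warnings.warn(f"value {v} adjusted to {adjusted}")
        result.append(adjusted)
    # Compare with a tolerance: fractions such as 0.1 are inexact in binary.
    if total is not None and sumtest and abs(sum(result) - total) > tolerance:
        raise ValueError(f"values sum to {sum(result)}, expected {total}")
    return result


print(resolve_bounds(5, 3))  # (3, 3)
```

The same resolve_bounds logic would apply to the float, integer and range types described below, which share the warnrange:, failrange: and trueminimum: attributes.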
Although there are (currently) no specific attributes for a boolean ACD type, care should be taken over the definition of the information: and help: attributes. These are used to prompt the user (interactively or via a GUI), and to provide help text. The text provided in each case should reflect the expected default value of the boolean option, which may be the opposite of what the name implies. For example, if set to "Y" by default, then the command line option would typically be "-noxxx" where "xxx" is the qualifier. If set to "N" by default, then the default action may be the opposite of what the information or help text implies. If the value is calculated, the user may need some extra guidance.
The outputmodifier: attribute is set where this parameter changes the content or syntax of the output. This is provided for the developers of other interfaces and parsers of EMBOSS output so that they can fix the value, or provide parsers for each alternative.
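To illustrate the naming issue, here is a hypothetical Python sketch of how a boolean qualifier and its '-no' form could be read. It assumes that only the '-name' and '-noname' forms are recognised and that the last occurrence on the command line wins; read_boolean is an invented name.

```python
def read_boolean(argv: list, name: str, default: bool) -> bool:
    """Return the value of boolean qualifier `name` (hypothetical model).

    '-name' sets it true and '-noname' sets it false; if neither appears,
    the ACD default is used. The last occurrence wins (an assumption).
    """
    value = default
    for token in argv:
        if token == "-" + name:
            value = True
        elif token == "-no" + name:
            value = False
    return value
```

With the consensus flag from Table 3 defaulting to Y, the user would switch it off with -noconsensus, so the information: and help: text should make clear what turning it off does.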
The large: attribute defines values that can exceed the size or accuracy of a standard float value and will be passed as double precision to the application.
The minimum: and maximum: attributes define the lower and upper value limits, which default to the boundaries specified by the system set-up.
The warnrange: attribute warns the user if an out of range value (below the minimum, or above the maximum) has been automatically adjusted to be valid. This includes adjusting a default value where the calculated range no longer includes it.
Where one or both of the minimum: and maximum: values are calculated, the maximum could turn out lower than the minimum. In such cases the failrange: attribute must be defined; an error message is generated if it is missing.
Where the maximum is lower than the minimum and failrange: is false, the maximum value is used by default. Where this is not ideal for the application, the trueminimum: attribute can be set true to use the minimum value as the only acceptable value. If failrange: is true, there must be a failmessage: attribute explaining to the user why there is no valid value between the defined minimum and maximum; an error message is generated if it is absent.
The increment: attribute defines the steps that this parameter is allowed to take, in case there is a need to iterate this parameter. The increment: attribute can be any valid float value.
The precision: attribute defines the maximum number of significant decimal places that will be taken into account for this value.
The large: attribute defines values that can exceed the size of a standard integer and will be passed as long integers to the application.
The integer data type holds simple integer values. The minimum: and maximum: attributes define the lower and upper limits, which default to the boundaries specified by the system set-up.
The warnrange: attribute warns the user if an out of range value (below the minimum, or above the maximum) has been automatically adjusted to be valid. This includes adjusting a default value where the calculated range no longer includes it.
Where one or both of the minimum: and maximum: values are calculated, the maximum could turn out lower than the minimum. In such cases the failrange: attribute must be defined; an error message is generated if it is missing.
Where the maximum is lower than the minimum and failrange: is false, the maximum value is used by default. Where this is not ideal for the application, the trueminimum: attribute can be set true to use the minimum value as the only acceptable value. If failrange: is true, there must be a failmessage: attribute explaining to the user why there is no valid value between the defined minimum and maximum; an error message is generated if it is absent.
The increment: attribute defines the steps that this parameter is allowed to take, in case there is a need to iterate this parameter.
Sequence ranges have attributes similar to those of integers. The minimum: and maximum: attributes define the boundaries, which default to the boundaries specified by the system set-up.
The warnrange: attribute warns the user if an out of range value (below the minimum, or above the maximum) has been automatically adjusted to be valid. This includes adjusting a default value where the calculated range no longer includes it.
Where one or both of the minimum: and maximum: values are calculated, the maximum could turn out lower than the minimum. In such cases the failrange: attribute must be defined; an error message is generated if it is missing.
Where the maximum is lower than the minimum and failrange: is false, the maximum value is used by default. Where this is not ideal for the application, the trueminimum: attribute can be set true to use the minimum value as the only acceptable value. If failrange: is true, there must be a failmessage: attribute explaining to the user why there is no valid value between the defined minimum and maximum; an error message is generated if it is absent.
The size: attribute defines an exact number of values required. The minsize: attribute defines a minimum number of values required for ranges that can be any length. Only one of these values should be defined for any range.
The value provided by the user is a list of sequence position pairs to be interpreted by the application. The upper and lower bounds (sequence positions can be negative to count back from the end) will depend on the length of the sequence to which they are applied.
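A Python sketch of reading such a range value is shown below. It is hypothetical: read_range is an invented name, and several details are assumptions about the accepted syntax rather than documented behaviour. It takes integers separated by white space or commas as start/end pairs, counts size: and minsize: in pairs, and treats -1 as the last position.

```python
import re


def read_range(text, seqlen, size=None, minsize=None):
    """Parse 'start end start end ...' into 1-based (start, end) pairs.

    Hypothetical model: integers are separated by white space or commas,
    and a negative position counts back from the end (-1 = last residue).
    """
    nums = [int(tok) for tok in re.split(r"[\s,]+", text.strip()) if tok]
    if len(nums) % 2:
        raise ValueError("range values must come in start/end pairs")
    pairs = []
    for start, end in zip(nums[0::2], nums[1::2]):
        # Convert negative positions using the sequence length.
        start, end = (p + seqlen + 1 if p < 0 else p for p in (start, end))
        for pos in (start, end):
            if not 1 <= pos <= seqlen:
                raise ValueError(f"position {pos} outside 1..{seqlen}")
        pairs.append((start, end))
    if size is not None and len(pairs) != size:
        raise ValueError(f"exactly {size} ranges required")
    if minsize is not None and len(pairs) < minsize:
        raise ValueError(f"at least {minsize} ranges required")
    return pairs


print(read_range("10 20, -5 -1", 100))  # [(10, 20), (96, 100)]
```

Negative positions are resolved only once the sequence length is known, which is why the bounds of a range depend on the sequence it is applied to.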
The minlength: attribute defines the minimum length the string must be, the maxlength: attribute defines the maximum length the string can be. The default minimum length is zero. There is no default maximum.
The pattern: attribute defines a regular expression used to check the string value. ACD uses the Perl-compatible regular expression library (PCRE) so any Perl-compatible regular expression should be usable.
The word: attribute requires the result to be a valid word with no whitespace. The default minimum length of zero allows an empty string but this is not accepted as a word. This may change in future.
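The string checks can be sketched in Python as follows. This is a hypothetical model: check_string is an invented name, and Python's re module stands in for PCRE (the two agree on common expressions but are not identical). Two further choices are assumptions: case conversion happens before validation, and the pattern may match anywhere in the value.

```python
import re


def check_string(value, minlength=0, maxlength=None, pattern=None,
                 word=False, upper=False, lower=False):
    """Validate a string value against ACD-style attributes (hypothetical)."""
    if upper:
        value = value.upper()
    elif lower:
        value = value.lower()
    if len(value) < minlength:
        raise ValueError(f"shorter than {minlength} characters")
    if maxlength is not None and len(value) > maxlength:
        raise ValueError(f"longer than {maxlength} characters")
    if word and (not value or any(c.isspace() for c in value)):
        # An empty string is not accepted as a word.
        raise ValueError("a single word with no white space is required")
    if pattern is not None and not re.search(pattern, value):
        raise ValueError(f"does not match /{pattern}/")
    return value
```

An ACD author who wants the whole value to match would anchor the expression, for example with ^[ACGT]+$.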
Although there are (currently) no specific attributes for a toggle ACD type, care should be taken over the definition of the information: and help: attributes. These are used to prompt the user (interactively or via a GUI), and to provide help text. The text provided in each case should reflect the expected default value of the toggle option, which may be the opposite of what the name implies. For example, if set to "Y" by default, then the command line option would typically be "-noxxx" where "xxx" is the qualifier. If set to "N" by default, then the default action may be the opposite of what the information or help text implies. If the value is calculated, the user may need some extra guidance.
The outputmodifier: attribute is set where this parameter changes the content or syntax of the output. This is provided for the developers of other interfaces and parsers of EMBOSS output so that they can fix the value, or provide parsers for each alternative.
Formalised:
Data type | Attribute definition | Description
assembly | entry: Y/N | Read whole entry text
assembly | nullok: Y/N | Can accept a null filename as 'no file'
codon | name: string | Codon table name
codon | nullok: Y/N | Can accept a null filename as 'no file'
cpdb | nullok: Y/N | Can accept a null filename as 'no file'
datafile | name: string | Default file base name
datafile | extension: string | Default file extension
datafile | directory: string | Default installed data directory
datafile | nullok: Y/N | Can accept a null filename as 'no file'
directory | fullpath: Y/N | Require full path in value
directory | nulldefault: Y/N | Defaults to 'no file'
directory | nullok: Y/N | Can accept a null filename as 'no file'
dirlist | fullpath: Y/N | Require full path in value
dirlist | nullok: Y/N | Can accept a null filename as 'no file'
discretestates | length: integer | Number of discrete state values per set
discretestates | size: integer | Number of discrete state sets
discretestates | characters: string | Allowed discrete state characters (default is '' for all non-space characters)
discretestates | nullok: Y/N | Can accept a null filename as 'no file'
distances | size: integer | Number of rows
distances | nullok: Y/N | Can accept a null filename as 'no file'
distances | missval: Y/N | Can have missing values (replicates zero)
features | type: string | Feature type (protein, nucleotide, etc.)
features | entry: Y/N | Read whole entry text
features | minreads: integer | Minimum number of inputs
features | maxreads: integer | Maximum number of inputs
features | nullok: Y/N | Can accept a null filename as 'no file'
filelist | nullok: Y/N | Can accept a null filename as 'no file'
filelist | binary: Y/N | File contains binary data
frequencies | length: integer | Number of frequency loci/values per set
frequencies | size: integer | Number of frequency sets
frequencies | continuous: Y/N | Continuous character data only
frequencies | genedata: Y/N | Gene frequency data only
frequencies | within: Y/N | Continuous data for multiple individuals
frequencies | nullok: Y/N | Can accept a null filename as 'no file'
infile | directory: string | Default directory
infile | nullok: Y/N | Can accept a null filename as 'no file'
infile | trydefault: Y/N | Default filename may not exist if nullok is true
infile | binary: Y/N | File contains binary data
matrix | pname: string | Default name for protein matrix
matrix | nname: string | Default name for nucleotide matrix
matrix | protein: Y/N | Protein matrix
matrixf | pname: string | Default name for protein matrix
matrixf | nname: string | Default name for nucleotide matrix
matrixf | protein: Y/N | Protein matrix
obo | entry: Y/N | Read whole entry text
obo | minreads: integer | Minimum number of inputs
obo | maxreads: integer | Maximum number of inputs
obo | nullok: Y/N | Can accept a null filename as 'no file'
pattern | minlength: integer | Minimum pattern length
pattern | maxlength: integer | Maximum pattern length
pattern | maxsize: integer | Maximum number of patterns
pattern | upper: Y/N | Convert to upper case
pattern | lower: Y/N | Convert to lower case
pattern | type: string | Type (nucleotide, protein)
properties | length: integer | Number of property values per set
properties | size: integer | Number of property sets
properties | characters: string | Allowed property characters (default is '' for all non-space)
properties | nullok: Y/N | Can accept a null filename as 'no file'
refseq | entry: Y/N | Read whole entry text
refseq | nullok: Y/N | Can accept a null filename as 'no file'
regexp | minlength: integer | Minimum pattern length
regexp | maxlength: integer | Maximum pattern length
regexp | maxsize: integer | Maximum number of patterns
regexp | upper: Y/N | Convert to upper case
regexp | lower: Y/N | Convert to lower case
regexp | type: string | Type (string, nucleotide, protein)
resource | entry: Y/N | Read whole entry text
resource | minreads: integer | Minimum number of inputs
resource | maxreads: integer | Maximum number of inputs
resource | nullok: Y/N | Can accept a null filename as 'no file'
scop | nullok: Y/N | Can accept a null filename as 'no file'
sequence | type: string | Input sequence type (protein, gapprotein, etc.)
sequence | features: Y/N | Read features if any
sequence | entry: Y/N | Read whole entry text
sequence | nullok: Y/N | Can accept a null filename as 'no file'
seqall | type: string | Input sequence type (protein, gapprotein, etc.)
seqall | features: Y/N | Read features if any
seqall | entry: Y/N | Read whole entry text
seqall | minseqs: integer | Minimum number of sequences
seqall | maxseqs: integer | Maximum number of sequences
seqall | nullok: Y/N | Can accept a null filename as 'no file'
seqset | type: string | Input sequence type (protein, gapprotein, etc.)
seqset | features: Y/N | Read features if any
seqset | aligned: Y/N | Sequences are aligned
seqset | minseqs: integer | Minimum number of sequences
seqset | maxseqs: integer | Maximum number of sequences
seqset | nullok: Y/N | Can accept a null filename as 'no file'
seqsetall | type: string | Input sequence type (protein, gapprotein, etc.)
seqsetall | features: Y/N | Read features if any
seqsetall | aligned: Y/N | Sequences are aligned
seqsetall | minseqs: integer | Minimum number of sequences
seqsetall | maxseqs: integer | Maximum number of sequences
|
minsets: integer |
Minimum number of sequence sets |
|
maxsets: integer |
Maximum number of sequence sets |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
taxon |
entry: Y/N |
Read whole entry text |
|
minreads: integer |
Minimum number of inputs |
|
maxreads: integer |
Maximum number of inputs |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
text |
entry: Y/N |
Read whole entry text |
|
minreads: integer |
Minimum number of inputs |
|
maxreads: integer |
Maximum number of inputs |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
tree |
size: integer |
Number of trees (0 means any number) |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
url |
entry: Y/N |
Read whole entry text |
|
minreads: integer |
Minimum number of inputs |
|
maxreads: integer |
Maximum number of inputs |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
variation |
entry: Y/N |
Read whole entry text |
|
minreads: integer |
Minimum number of inputs |
|
maxreads: integer |
Maximum number of inputs |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
xml |
entry: Y/N |
Read whole entry text |
|
minreads: integer |
Minimum number of inputs |
|
maxreads: integer |
Maximum number of inputs |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
Table 4.2. Input data types - attributes.
If the entry: attribute is set, the input will include the full original text of the input file or database entry.
Codon usage tables are species-specific, and in some cases specific to a class of genes within a species. This makes it useful to specify a default value for a codon usage table name. Internally, a default is set in the ACD source code. Usually this is "Ehum.cut", the human codon usage table provided in the EMBOSS distribution.
Individual codon inputs can set their own default names with the name: attribute, which in the current version has the same effect as setting the default: attribute.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a data file. The application must be able to accept a null value for this qualifier.
Cleaned PDB file input has a default value (typically "1azu") set in the ACD source code, although this is not really a good idea.
Individual cpdb inputs can set their own default names with the name: attribute, which in the current version has the same effect as setting the default: attribute.
The default datafile name is defined by two ACD attributes, name: and extension:. The directory: attribute defines the EMBOSS data subdirectory to be searched.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a data file. The application must be able to accept a null value for this qualifier.
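The way name:, extension: and directory: combine into a file to look for can be pictured with a short Python sketch. It is a hypothetical model, not the ACD library's own code: the ordered list of search roots, the function name find_datafile and the handling of a leading dot in extension: are all assumptions made for illustration.

```python
import os


def find_datafile(name, extension, directory, search_roots, exists=os.path.isfile):
    """Return the first existing path for a datafile, or None if none is found.

    Hypothetical model of the name:, extension: and directory: attributes.
    search_roots is an assumed, ordered list of top-level data locations.
    """
    # Accept the extension with or without a leading dot (an assumption).
    if extension and not extension.startswith("."):
        extension = "." + extension
    filename = name + extension
    for root in search_roots:
        # The directory: attribute names a subdirectory under each root.
        candidate = os.path.join(root, directory, filename)
        if exists(candidate):
            return candidate
    return None
```

The exists parameter is there only so the lookup can be exercised without touching the file system.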
The extension: attribute sets the extension for all files read from the directory. Files with other extensions will not be read.
The fullpath: attribute can be used to require a full rather than a relative path specification for a directory.
If a null value (the current directory) is allowed, the nullok: attribute must be set true.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a directory. The application must be able to accept a null value for this qualifier.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no directory) as the default for programs where a directory is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
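The checks implied by fullpath:, nullok: and extension: for a directory value can be sketched in Python as follows. This is a hypothetical model written for illustration (the function names are invented), not the behaviour of the ACD library itself; it treats an empty value as the current directory, as the text above describes.

```python
import os


def resolve_directory(value, fullpath=False, nullok=False):
    """Validate a directory value as the attributes above describe.

    Hypothetical model: an empty value means the current directory and is
    only accepted when nullok is true; fullpath rejects relative paths.
    """
    if value == "":
        if not nullok:
            raise ValueError("a directory name is required (nullok: is false)")
        return os.curdir
    if fullpath and not os.path.isabs(value):
        raise ValueError(f"fullpath: requires an absolute path, got {value!r}")
    return value


def files_with_extension(names, extension=""):
    """Keep only names ending in the extension: value (all names if unset)."""
    if not extension:
        return list(names)
    suffix = extension if extension.startswith(".") else "." + extension
    return [name for name in names if name.endswith(suffix)]


def read_directory(value, extension="", fullpath=False, nullok=False):
    """List the regular files that would be read from the directory."""
    path = resolve_directory(value, fullpath, nullok)
    names = sorted(n for n in os.listdir(path) if os.path.isfile(os.path.join(path, n)))
    return [os.path.join(path, n) for n in files_with_extension(names, extension)]
```

Keeping the validation and the extension filter in separate functions makes each rule easy to test on its own.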
The extension: attribute sets the extension for all files read from the directories. Files with other extensions will not be read.
The fullpath: attribute can be used to require a full rather than a relative path specification for a directory.
If a null value (the current directory) is allowed, the nullok: attribute must be set true.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a directory. The application must be able to accept a null value for this qualifier.
The discretestates data type can be replaced by a simple input file in GUIs, with the user required to provide the correct data format.
The attributes define characteristics required for Phylip programs.
The length: attribute defines the number of state values (the length of the discrete characters string) in each set.
The size: attribute defines the number of sets of values, usually 1 but some programs will accept multiple sets.
The characters: attribute defines which discrete state characters can be specified. This is defined as a string containing all possible characters.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a discretestates file.
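As an illustration of how length:, size: and characters: constrain discrete-state data, here is a hypothetical Python check. It assumes the data has already been read into one string of state characters per set, and it treats 0 for length or size as "not checked", which is an assumption rather than documented ACD behaviour.

```python
def check_discrete_states(sets, length=0, size=0, characters=""):
    """Check discrete-state data against the length:, size: and characters: attributes.

    Hypothetical model: sets is a list of strings, one per set. A value of 0
    for length or size is treated as "not checked" (an assumption), and
    characters="" allows any non-space character, as the table describes.
    """
    if size and len(sets) != size:
        raise ValueError(f"expected {size} set(s), found {len(sets)}")
    for number, states in enumerate(sets, start=1):
        if length and len(states) != length:
            raise ValueError(f"set {number}: expected {length} states, found {len(states)}")
        for ch in states:
            ok = (not ch.isspace()) if characters == "" else (ch in characters)
            if not ok:
                raise ValueError(f"set {number}: character {ch!r} is not allowed")
    return True
```

The same pattern would apply to the properties data type, which has the same three attributes.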
The distances data type can be replaced by a simple input file in GUIs, with the user required to provide the correct data format.
The attributes define characteristics required for Phylip programs. The distance matrices accepted by ACD include all the formats read by Phylip, with automatic interconversion.
The size: attribute defines the number of rows in the distance matrix.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a distance file.
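To make the size: check concrete, here is a small hypothetical Python reader for a square Phylip distance matrix: a count of species, then one row per species giving its name and that many distances. It is only a sketch: it assumes names contain no spaces (relaxed rather than strict 10-character names), does not handle the triangular variants, does not model missval:, and treats size 0 as "any number of rows" by assumption.

```python
def read_phylip_distances(text, size=0):
    """Parse a square Phylip distance matrix and return (names, rows).

    The first token is the number of species; each row is a name followed
    by that many distances. Rows may wrap over several lines.
    size=0 means any number of rows (assumption); otherwise it must match.
    """
    tokens = text.split()
    if not tokens:
        raise ValueError("empty distance matrix")
    count = int(tokens[0])
    if size and count != size:
        raise ValueError(f"expected {size} rows, file declares {count}")
    names, rows = [], []
    pos = 1
    for _ in range(count):
        if pos + 1 + count > len(tokens):
            raise ValueError("distance matrix is truncated")
        names.append(tokens[pos])
        rows.append([float(t) for t in tokens[pos + 1 : pos + 1 + count]])
        pos += 1 + count
    return names, rows
```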
The type: attribute defines whether the feature input is "protein" or "nucleotide". There is a default based on the type of any input sequence, but a value should always be specified.
Applications requiring a single entry should specify the attribute maxreads: with a value of "1".
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without features input. The application must be able to accept a null value for this qualifier.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no feature input) as the default for programs where feature input is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
If the entry: attribute is set, the input will include the full original text of the input file or database entry.
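The interplay of nullok: and nulldefault: described for features (and for other input types) can be summarized as a decision function. The Python below is a hypothetical model of that description, not ACD's implementation; in particular, returning "no input" for an unset qualifier under nulldefault: without also checking nullok: is a simplifying assumption.

```python
def resolve_input_name(given, standard_default, nullok=False, nulldefault=False, negated=False):
    """Return the input name to use, or None for "no input".

    given: the command-line value, or None if the qualifier was not used.
    negated: True if -noxxx was given on the command line.
    """
    if negated:
        if not nullok:
            raise ValueError("-noxxx is only allowed when nullok: is true")
        return None
    if given is None:
        # With nulldefault: the default is empty (no input), not the generated name.
        return None if nulldefault else standard_default
    if given == "":
        if nulldefault:
            # An explicit empty string turns the standard default back on.
            return standard_default
        if not nullok:
            raise ValueError("an empty value is only allowed when nullok: is true")
        return None
    return given
```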
Filelist is equivalent to infile, but allows the user to specify one or more input files.
The nullok: attribute specifies that a missing input file is acceptable to the application, and that -noxxx can be used on the command line to avoid reading the default input file (if any).
The frequencies data type can be replaced by a simple input file in GUIs, with the user required to provide the correct data format.
The attributes define characteristics required for Phylip programs. The frequency file formats accepted by ACD include all the formats read by Phylip, with automatic interconversion.
The length: attribute defines the number of loci (or values) in the frequencies file.
The size: attribute defines the number of sets of values, usually 1 but some programs will accept multiple sets.
The continuous: attribute specifies a frequencies file with continuous character data values.
The genedata: attribute specifies a frequencies file with genetic locus data values.
The within: attribute specifies a frequencies file with continuous data for multiple individuals (additional values on each line).
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a frequencies file.
The binary: attribute specifies that the input file is expected to contain binary data and is not suitable for creation in a text editor.
The nullok: attribute specifies that a missing input file is acceptable to the application, and that -noxxx can be used on the command line to avoid reading the default input file (if any).
The trydefault: attribute specifies that the default filename may not exist. If nullok: is also defined as true then no error is reported.
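The trydefault: rule for a default input file can be written as a tiny hypothetical Python function. The name choose_default_infile and the exists parameter are invented for illustration; the logic simply follows the sentence above: a missing default file is silently accepted only when both trydefault: and nullok: are true.

```python
import os


def choose_default_infile(default_name, nullok=False, trydefault=False, exists=os.path.exists):
    """Decide what to do when an infile qualifier falls back to its default.

    Returns the default name if the file exists, None ("no file") if it is
    missing but trydefault and nullok are both set, and raises otherwise.
    """
    if exists(default_name):
        return default_name
    if trydefault and nullok:
        return None
    raise FileNotFoundError(f"default input file {default_name!r} not found")
```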
The protein: attribute determines whether the scoring matrix is used as a nucleotide (DNA) or a protein matrix.
For the floating-point matrix type (matrixf), the protein: attribute likewise determines whether the scoring matrix is used as a nucleotide (DNA) or a protein matrix.
Applications requiring a single entry should specify the attribute maxreads: with a value of "1".
If the entry: attribute is set, the input will include the full original text of the input file or database entry.
Patterns are processed by an internal set of library functions designed to handle PROSITE-style pattern definitions.
The minlength: and maxlength: attributes define the minimum and maximum allowed length of the pattern string.
The upper: and lower: attributes convert an input pattern to upper or lower case before it is compiled.
The type: attribute describes the pattern as applying to nucleotide or protein sequences. Nucleotide patterns are compared in both directions.
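To show what "PROSITE-style" means in practice, here is a hypothetical Python translator from a simple PROSITE pattern to a regular expression, applying the minlength:, maxlength: and upper: attributes first. It is not the EMBOSS pattern library: it covers single residues, x, [alternatives], {exclusions}, repeat counts such as x(2,4) and the < and > terminal anchors, but not forms like [G>].

```python
import re

# One PROSITE element: a residue, x, [alternatives] or {exclusions},
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"([A-Za-z]|\[[A-Za-z]+\]|\{[A-Za-z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern, minlength=0, maxlength=0, upper=True):
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    text = pattern.strip().rstrip(".")
    if minlength and len(text) < minlength:
        raise ValueError(f"pattern shorter than minlength {minlength}")
    if maxlength and len(text) > maxlength:
        raise ValueError(f"pattern longer than maxlength {maxlength}")
    if upper:
        text = text.upper()
    anchor_start = text.startswith("<")
    anchor_end = text.endswith(">")
    text = text.lstrip("<").rstrip(">")
    parts = []
    for element in text.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse PROSITE element {element!r}")
        core, low, high = match.groups()
        if core in ("x", "X"):
            regex = "."                      # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except these
        else:
            regex = core                     # single residue or [alternatives]
        if low is not None:
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(regex)
    return ("^" if anchor_start else "") + "".join(parts) + ("$" if anchor_end else "")
```

For example, the C2H2 zinc-finger style pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H. becomes C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H.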
The properties data type can be replaced by a simple input file in GUIs, with the user required to provide the correct data format.
The attributes define characteristics required for Phylip programs. The properties files accepted by ACD include all the formats read by Phylip, with automatic interconversion.
The length: attribute defines the number of values in the properties file.
The size: attribute defines the number of sets of values, usually 1 but some programs will accept multiple sets.
The characters: attribute defines which property characters can be specified. This is defined as a string containing all possible characters.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a properties file.
If the entry: attribute is set, the input will include the full original text of the input file or database entry.
Regular expressions are processed by the "Perl-Compatible Regular Expression Library" (PCRE). Any value must be accepted by this library's compilation function. Some additional attributes are provided for further validation by ACD.
The minlength: and maxlength: attributes define the minimum and maximum allowed length of the regular expression string.
The upper: and lower: attributes convert an input regular expression to upper or lower case before it is compiled.
The type: attribute describes the regular expression as applying to nucleotide or protein sequences, or to a plain string. Nucleotide patterns are compared in both directions.
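The following Python sketch models the regexp attributes and the two-direction search for nucleotide patterns. Two assumptions should be kept in mind: Python's re module is used in place of PCRE (the syntax is close but not identical), and "both directions" is taken to mean the forward strand plus its reverse complement. Note that upper-casing an expression literally also changes escapes such as \d into \D.

```python
import re

_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def compile_regexp(expr, minlength=0, maxlength=0, upper=False, lower=False):
    """Validate and compile a regexp value using the attributes described above."""
    if minlength and len(expr) < minlength:
        raise ValueError(f"expression shorter than minlength {minlength}")
    if maxlength and len(expr) > maxlength:
        raise ValueError(f"expression longer than maxlength {maxlength}")
    if upper:
        expr = expr.upper()  # literal case change; also alters escapes like \d
    elif lower:
        expr = expr.lower()
    try:
        return re.compile(expr)
    except re.error as exc:
        raise ValueError(f"invalid regular expression: {exc}") from None


def search_both_strands(regex, seq):
    """Return (strand, start, text) hits; start is 1-based on the forward strand.

    finditer reports non-overlapping matches only.
    """
    hits = [("+", m.start() + 1, m.group()) for m in regex.finditer(seq)]
    revcomp = seq.translate(_COMPLEMENT)[::-1]
    # A hit ending at index e of the reverse complement starts at len(seq) - e + 1.
    hits += [("-", len(seq) - m.end() + 1, m.group()) for m in regex.finditer(revcomp)]
    return hits
```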
Applications requiring a single entry should specify the attribute maxreads: with a value of "1".
If the entry: attribute is set, the input will include the full original text of the input file or database entry.
Scop file input has a default value (typically "d3sdha") set in the ACD source code, although this is not really a good idea.
Individual scop inputs can set their own default names with the name: attribute, which in the current version has the same effect as setting the default: attribute.
The type: attribute will force the sequence to be of the given type. By default, any sequence type is accepted.
We recommend always defining the type: attribute so that the accepted input sequence type is always clear.
If the features: attribute is set, the sequence input will include feature information either in the same file (if the sequence format supports it) or in a separate file (by default in GFF format).
If the entry: attribute is set, the sequence input will include the full original text of the input sequence or database entry.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a sequence input. The application must be able to accept a null value for this qualifier.
The sask: attribute sets the default for the -sask qualifier. If set to "Y", it specifies that the program will prompt the user for a sequence begin and end position, and will ask whether a nucleotide sequence should be reversed. The EMBOSS "yank" program works with fragments of sequences, and uses the sask: attribute to prompt the user.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no sequence input) as the default for programs where sequence input is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
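The prompting that sask: enables can be modelled in a few lines of Python. This is a hypothetical sketch: the prompt wording and function names are invented, positions are taken as 1-based and inclusive, and "reverse" is assumed to mean the reverse complement of a nucleotide sequence. Passing a fake ask function lets the dialogue be tested without a terminal.

```python
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def extract_fragment(seq, begin=1, end=None, reverse=False):
    """Return seq[begin..end] (1-based, inclusive), reverse-complemented if asked."""
    if end is None:
        end = len(seq)
    if not 1 <= begin <= end <= len(seq):
        raise ValueError(f"invalid range {begin}..{end} for length {len(seq)}")
    fragment = seq[begin - 1 : end]
    return fragment.translate(_COMPLEMENT)[::-1] if reverse else fragment


def ask_fragment(seq, ask=input, nucleotide=True):
    """Prompt for begin, end and (for nucleotides) reversal; empty answers use defaults."""
    begin = int(ask("Begin at position [1]: ") or 1)
    end = int(ask(f"End at position [{len(seq)}]: ") or len(seq))
    reverse = False
    if nucleotide:
        reverse = ask("Reverse strand [N]: ").strip().upper().startswith("Y")
    return extract_fragment(seq, begin, end, reverse)
```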
The type: attribute will force the sequence(s) to be of the given type. By default, any sequence type is accepted.
We recommend always defining the type: attribute so that the accepted input sequence type is always clear.
If the features: attribute is set, the sequence input will include feature information either in the same file (if the sequence format supports it) or in a separate file (by default in GFF format).
If the entry: attribute is set, the sequence input will include the full original text of the input sequence or database entry.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a sequence input. The application must be able to accept a null value for this qualifier.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no sequence input) as the default for programs where sequence input is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The minseqs: attribute specifies a minimum number of sequences to be read. By default, a single sequence is acceptable.
The type: attribute will force the sequence set to be of the given type. By default, any sequence type is accepted.
We recommend always defining the type: attribute so that the accepted input sequence type is always clear.
The aligned: attribute, if true, specifies that all sequences in the input are expected to be aligned. If false, then the sequences are assumed to be unaligned, and are simply read into memory together for processing. We recommend always defining the aligned: attribute so that the nature of the sequence set is clearly defined.
The minseqs: attribute specifies a minimum number of sequences in a set. Some applications may require at least two sequences as input to their algorithms. The default is 1, so a set will load one or more sequences into memory.
The maxseqs: attribute specifies a maximum number of sequences, as a protection for methods that may have serious problems with a large number of sequences.
If the features: attribute is set, the sequence input will include feature information either in the same file (if the sequence format supports it) or in a separate file (by default in GFF format).
If the entry: attribute is set, the sequence input will include the full original text of the input sequence or database entry.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a sequence input. The application must be able to accept a null value for this qualifier.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no sequence input) as the default for programs where sequence input is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The minseqs: attribute specifies a minimum number of sequences to be read. By default, a single sequence is acceptable.
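A hypothetical Python check shows how minseqs:, maxseqs: and aligned: constrain a sequence set once it has been read. Treating maxseqs 0 as "no upper limit" and testing alignment by equal lengths (gap characters included) are assumptions made for this sketch.

```python
def check_sequence_set(seqs, minseqs=1, maxseqs=0, aligned=False):
    """Check a sequence set against minseqs:, maxseqs: and aligned:.

    maxseqs=0 is treated as "no upper limit" (an assumption). For aligned
    input, all sequences (including gap characters) must have equal length.
    """
    count = len(seqs)
    if count < minseqs:
        raise ValueError(f"need at least {minseqs} sequences, got {count}")
    if maxseqs and count > maxseqs:
        raise ValueError(f"at most {maxseqs} sequences allowed, got {count}")
    if aligned and len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned input must have sequences of equal length")
    return count
```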
The type: attribute will force the sequence set(s) to be of the given type. By default, any sequence type is accepted.
We recommend always defining the type: attribute so that the accepted input sequence type is always clear.
The aligned: attribute, if true, specifies that all sequences in the input are expected to be aligned. If false, then the sequences are assumed to be unaligned, and are simply read into memory together for processing. We recommend always defining the aligned: attribute so that the nature of the sequence set is clearly defined.
If the features: attribute is set, the sequence input will include feature information either in the same file (if the sequence format supports it) or in a separate file (by default in GFF format).
If the entry: attribute is set, the sequence input will include the full original text of the input sequence or database entry.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a sequence input. The application must be able to accept a null value for this qualifier.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no sequence input) as the default for programs where sequence input is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The minseqs: attribute specifies a minimum number of sequences to be read for each set. By default, a single sequence is acceptable.
Applications requiring a single entry should specify the attribute maxreads: with a value of "1".
If the entry: attribute is set, the input will include the full original text of the input file or database entry.
Applications requiring a single entry should specify the attribute maxreads: with a value of "1".
If the entry: attribute is set, the input will include the full original text of the input file or database entry, before markup is stripped or any other text processing.
The tree data type can be replaced by a simple input file in GUIs, with the user required to provide the correct data format.
The attributes define characteristics required for Phylip programs. The tree files accepted by ACD include all the formats read by Phylip, with automatic interconversion.
The size: attribute defines the number of trees in the input file. The usual value is 0, which accepts any number of trees; some programs can only accept a single tree, and for these the value should be set to "1".
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a tree file.
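The size: check for tree input can be sketched in Python by counting Newick trees, each of which ends with a semicolon. This is a simplified, hypothetical model: it assumes no ';' appears inside quoted labels or comments, and it follows the table in treating size 0 as "any number of trees".

```python
def count_trees(text, size=0):
    """Count Newick trees (each ending in ';') and check the size: attribute.

    Simplification: assumes no ';' appears inside quoted labels or comments.
    size=0 accepts any number of trees.
    """
    chunks = text.split(";")
    if chunks[-1].strip():
        raise ValueError("the last tree is not terminated by ';'")
    trees = [chunk for chunk in chunks[:-1] if chunk.strip()]
    if size and len(trees) != size:
        raise ValueError(f"expected {size} tree(s), found {len(trees)}")
    return len(trees)
```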
Applications requiring a single entry should specify the attribute maxreads: with a value of "1".
If the entry: attribute is set, the input will include the full original text of the input file or database entry.
The swiss: and embl: attributes define URLs originating from cross-references in UniProt/SwissProt and EMBL/GenBank/DDBJ respectively. They allow the Xref records in DRCAT to be used to define the EDAM identifier term used to select appropriate query URLs. For other sources, the identifier: attribute allows the EDAM term to be named in the ACD file. All three attributes are also available as associated qualifiers with any URL input.
Applications requiring a single entry should specify the attribute maxreads: with a value of "1".
If the entry: attribute is set, the input will include the full original text of the input file or database entry.
Applications requiring a single entry should specify the attribute maxreads: with a value of "1".
If the entry: attribute is set, the input will include the full original text of the input file or database entry.
Formalised:
Data type |
Attribute definition |
Description |
list |
minimum: integer |
Minimum number of selections |
|
maximum: integer |
Maximum number of selections |
|
button: Y/N |
(Not used by ACD) Prefer check boxes in GUI |
|
casesensitive: Y/N |
Case sensitive |
|
header: string |
Header description for list |
|
delimiter: string |
Delimiter for parsing values |
|
codedelimiter: string |
Delimiter separating codes from descriptions |
|
values: string |
Codes and values with delimiters |
selection |
minimum: integer |
Minimum number of selections |
|
maximum: integer |
Maximum number of selections |
|
button: Y/N |
(Not used by ACD) Prefer radio buttons in GUI |
|
casesensitive: Y/N |
Case sensitive matching |
|
header: string |
Header description for selection list |
|
delimiter: string |
Delimiter for parsing values |
|
values: string |
Values with delimiters |
Table 4.3. Selection data types - attributes.
For both selection list types, the values that the user can choose from are defined in the values: attribute as a string, delimited by the character given in the delimiter: attribute (which defaults to a semicolon, ';'). The list data type has a second delimiter, codedelimiter:, which separates each code from its description (it defaults to a colon, ':'). The minimum: and maximum: attributes define how many choices this parameter can accept. The header: attribute holds the text displayed above the list of options. The casesensitive: attribute indicates whether matching of the user's input is case sensitive; either way, the value of the parameter is exactly the list value as defined. The button: attribute, which can be either Y(es) or N(o), is used by web front ends to indicate whether radio buttons, check boxes or selection lists should be used, or whether the list should simply be displayed with a text entry box beneath it for entering the option.
The values: attribute contains the list of valid code names and values. The delimiter: and codedelimiter: attributes specify how to parse this string into individual list items.
The minimum: attribute specifies the minimum number of selections required. By default, 1 selection is required.
The maximum: attribute specifies the maximum number of selections allowed. By default this is 1, so with the default minimum exactly one selection is required. A higher value allows multiple selections.
The header: attribute defines text to appear before the list is presented to the user. The information: attribute defines text to be used as a prompt after the list.
The delimiter: attribute specifies the character used in the values: string to separate list items.
The codedelimiter: attribute specifies the character used in the values: string to separate codes (names) and descriptions of list items.
The button: attribute suggests whether a list is best represented as checkboxes or radio buttons in an interface (value "Y") or as a pull-down list.
The casesensitive: attribute defines whether the input must match the exact case of the selection list item.
Example:
list: matrix [
  default: "blosum"                   # default value
  minimum: 1
  maximum: 1                          # must select exactly 1
  header: "Comparison matrices"       # printed before list
  values: "B:blosum, P:pam, I:id"     # 3 valid values
  delim: ","                          # delimiter default ";"
  codedelim: ":"                      # label delimiter default ":"
  prompt: "Select one"                # prompt after list
  button: Y                           # use radio buttons rather than
                                      # checkboxes in HTML,
                                      # ignored by ACD
]
What you get is:
Comparison matrices
  B : blosum
  P : pam
  I : id
Select one [blosum] : PAM
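The parsing of a list values: string with delimiter: and codedelimiter:, and the validation of a user's answer against minimum:, maximum: and casesensitive:, can be sketched in Python as below. This is a hypothetical model, not ACD's code: separating several choices with commas or spaces, accepting either the code or the description, and returning the code are all assumptions for the example.

```python
import re


def parse_list_values(values, delimiter=";", codedelimiter=":"):
    """Split a list values: string into (code, description) pairs."""
    items = []
    for entry in values.split(delimiter):
        entry = entry.strip()
        if entry:
            code, _, description = entry.partition(codedelimiter)
            items.append((code.strip(), description.strip()))
    return items


def choose_from_list(items, answer, minimum=1, maximum=1, casesensitive=False):
    """Return the codes selected by the user's answer.

    Assumptions: several choices are separated by commas or spaces, and a
    choice may be typed as either the code or the description.
    """
    def norm(text):
        return text if casesensitive else text.lower()

    chosen = []
    for token in re.split(r"[,\s]+", answer.strip()):
        if not token:
            continue
        codes = [code for code, desc in items if norm(token) in (norm(code), norm(desc))]
        if not codes:
            raise ValueError(f"{token!r} is not a valid choice")
        chosen.append(codes[0])
    if not minimum <= len(chosen) <= maximum:
        raise ValueError(f"select between {minimum} and {maximum} item(s)")
    return chosen
```

With the values string from the example, the answer PAM resolves to the code P.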
The values: attribute contains the list of valid values. The delimiter: attribute specifies how to parse this string into individual selection list items.
The minimum: attribute specifies the minimum number of selections required. By default, 1 selection is required.
The maximum: attribute specifies the maximum number of selections allowed. By default this is 1, so with the default minimum exactly one selection is required. A higher value allows multiple selections.
The header: attribute defines text to appear before the selection list is presented to the user. The information: attribute defines text to be used as a prompt after the list.
The delimiter: attribute specifies the character used in the values: string to separate list items.
The button: attribute suggests whether a selection list is best represented as checkboxes or radio buttons in an interface (value "Y") or as a pull-down list.
The casesensitive: attribute defines whether the input must match the exact case of the selection list item.
Example:
select: matrix [
  default: "blosum"                   # default value
  minimum: "1"
  maximum: "1"                        # must select exactly 1
  header: "Comparison matrices"       # printed before list
  values: "blosum, pam, id"           # valid values
  delimiter: ","                      # delimiter default ";"
  information: "Select one"           # prompt after list
  button: "Y"                         # use radio buttons rather than
                                      # checkboxes in HTML,
                                      # ignored by ACD
]
What you get is:
Comparison matrices
  1 : blosum
  2 : pam
  3 : id
Select one [blosum] : PAM
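The numbered menu in the selection example can be reproduced with a short hypothetical Python sketch, which also resolves the answer PAM to the value pam. Accepting the menu number as an answer is an assumption inferred from the numbered display, and the exact indentation of the menu lines is illustrative only.

```python
def show_selection(header, values, prompt, default, delimiter=";"):
    """Return the selection items and the numbered menu text shown to the user."""
    items = [value.strip() for value in values.split(delimiter) if value.strip()]
    lines = [header]
    lines += [f"  {number} : {item}" for number, item in enumerate(items, start=1)]
    lines.append(f"{prompt} [{default}] : ")
    return items, "\n".join(lines)


def match_selection(items, answer, default, casesensitive=False):
    """Resolve one answer to a selection item (an empty answer takes the default).

    Accepting the menu number is an assumption based on the numbered display.
    """
    answer = answer.strip() or default
    if answer.isdigit() and 1 <= int(answer) <= len(items):
        return items[int(answer) - 1]
    for item in items:
        if item == answer or (not casesensitive and item.lower() == answer.lower()):
            return item
    raise ValueError(f"{answer!r} is not one of {items}")
```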
Formalised:
Data type |
Attribute definition |
Description |
align |
type: string |
P:protein or N:nucleotide |
|
taglist: string |
Extra tags to report |
|
minseqs: integer |
Minimum number of sequences |
|
maxseqs: integer |
Maximum number of sequences |
|
multiple: Y/N |
More than one alignment in one file |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
featout |
name: string |
Default base file name (use of -ofname preferred) |
|
extension: string |
Default file extension (use of -offormat preferred) |
|
type: string |
Feature type (protein, nucleotide, etc.) |
|
multiple: Y/N |
Features for multiple sequences |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null UFO as 'no output' |
outassembly |
name: string |
Default file name |
|
extension: string |
Default file extension |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outcodon |
name: string |
Default file name |
|
extension: string |
Default file extension |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outcpdb |
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outdata |
type: string |
Data type |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
|
binary: Y/N |
File contains binary data |
outdir |
fullpath: Y/N |
Require full path in value |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
|
binary: Y/N |
Files contain binary data |
|
create: Y/N |
Can create directory if not found |
|
temporary: Y/N |
Scratch directory for temporary files deleted on completion |
outdiscrete |
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outdistance |
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outfile |
name: string |
Default file name |
|
extension: string |
Default file extension |
|
append: Y/N |
Append to an existing file |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
|
binary: Y/N |
File contains binary data |
outfreq |
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outmatrix |
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outmatrixf |
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outobo |
name: string |
Default file name |
|
extension: string |
Default file extension |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outproperties |
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outrefseq |
name: string |
Default file name |
|
extension: string |
Default file extension |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outresource |
name: string |
Default file name |
|
extension: string |
Default file extension |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outscop |
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outtaxon |
name: string |
Default file name |
|
extension: string |
Default file extension |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outtext |
name: string |
Default file name |
|
extension: string |
Default file extension |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outtree |
name: string |
Default file name |
|
extension: string |
Default file extension |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outurl |
name: string |
Default file name |
|
extension: string |
Default file extension |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outvariation |
name: string |
Default file name |
|
extension: string |
Default file extension |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
outxml |
name: string |
Default file name |
|
extension: string |
Default file extension |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
report |
type: string |
P:protein or N:nucleotide |
|
taglist: string |
Extra tag names to report |
|
multiple: Y/N |
Multiple sequences in one report |
|
precision: integer |
Score precision |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null filename as 'no file' |
seqout |
name: string |
Output base name (use of -osname preferred) |
|
extension: string |
Output extension (use of -osextension preferred) |
|
features: Y/N |
Write features if any |
|
type: string |
Output sequence type (protein, gapprotein, etc.) |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null USA as 'no output' |
seqoutall |
name: string |
Output base name (use of -osname preferred) |
|
extension: string |
Output extension (use of -osextension preferred) |
|
features: Y/N |
Write features if any |
|
type: string |
Output sequence type (protein, gapprotein, etc.) |
|
minseqs: integer |
Minimum number of sequences |
|
maxseqs: integer |
Maximum number of sequences |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null USA as 'no output' |
|
aligned: Y/N |
Sequences are aligned |
seqoutset |
name: string |
Output base name (use of -osname preferred) |
|
extension: string |
Output extension (use of -osextension preferred) |
|
features: Y/N |
Write features if any |
|
type: string |
Output sequence type (protein, gapprotein, etc.) |
|
minseqs: integer |
Minimum number of sequences |
|
maxseqs: integer |
Maximum number of sequences |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null USA as 'no output' |
|
aligned: Y/N |
Sequences are aligned |
Table 4.4. Output data types - attributes.
The minseqs: and maxseqs: attributes define whether the alignment will contain exactly 2 sequences, 1 or more, 3 or more, or whatever the program will produce. These values can be used to validate the choice of formats on the command line with the -aformat qualifier.
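A minimal sketch of how minseqs: and maxseqs: could be checked against an -aformat choice. The function `format_accepts` and the limits in `FORMAT_LIMITS` are illustrative assumptions, not the EMBOSS implementation or its real format table.

```python
# Hypothetical capability table: (min, max) number of sequences each
# alignment format can hold; None means no upper limit.
FORMAT_LIMITS = {
    "pair": (2, 2),    # assumed pairwise-only format
    "msf": (1, None),  # assumed multiple-alignment format
}


def format_accepts(aformat, minseqs, maxseqs):
    """Return True if aformat can hold every sequence count the program may
    produce, i.e. the range minseqs..maxseqs (maxseqs None = unbounded)."""
    fmin, fmax = FORMAT_LIMITS[aformat]
    if minseqs < fmin:
        return False
    if fmax is None:
        return True
    return maxseqs is not None and maxseqs <= fmax


# A program producing exactly 2 sequences can use either format...
print(format_accepts("pair", 2, 2))     # True
print(format_accepts("msf", 2, 2))      # True
# ...but one producing 3 or more cannot use a pairwise format.
print(format_accepts("pair", 3, None))  # False
```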
The aformat: attribute is required. It defines the default value for the -aformat qualifier. The aglobal: attribute defines the default value for the -aglobal qualifier, and should be set true for programs that produce a global alignment. The multiple: attribute should be set true if the output can contain more than one alignment from the same input. If a null value (the current directory) is allowed, the nullok: attribute must be set true.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without an alignment file. The application must be able to accept a null value for this qualifier.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no alignment file) as the default for programs where an alignment file is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead.
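The interaction of nullok: and nulldefault: can be summarised as a small decision function. This is a sketch under stated assumptions: `resolve_output` and the `NO_FLAG` sentinel (standing for -noxxx) are hypothetical names, the missing: attribute is not modelled, and this is not the AJAX library code.

```python
NO_FLAG = object()  # hypothetical stand-in for -noxxx on the command line


def resolve_output(cmdline, standard_default, nullok=False, nulldefault=False):
    """Return the output file name, or None for 'no file'.

    cmdline is None when the qualifier was not given, NO_FLAG for -noxxx,
    or the string the user typed (possibly empty).
    """
    if cmdline is None:
        # Not on the command line: nulldefault makes 'no file' the default.
        return None if nulldefault else standard_default
    if cmdline is NO_FLAG:
        if not nullok:
            raise ValueError("-noxxx is only accepted when nullok: is set")
        return None
    if cmdline == "":
        # With nulldefault, an empty value turns the output back on.
        if nulldefault:
            return standard_default
        if nullok:
            return None
        raise ValueError("an empty value is only accepted when nullok: is set")
    return cmdline


# Null by default, switched on from the command line with an empty value:
print(resolve_output(None, "outfile.aln", nullok=True, nulldefault=True))  # None
print(resolve_output("", "outfile.aln", nullok=True, nulldefault=True))    # outfile.aln
```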
The name: attribute will default to "outfile".
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The output filename is constructed from the name: and extension: attributes in a $(name).$(extension) format. If the name: attribute is not defined in the ACD file, it will default to the calculated attribute name: of the FIRST sequence that is read in ($(asequence.name) if the sequence parameter is named asequence).
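The filename rule above amounts to a simple fallback. A minimal sketch, where `default_output_name` and the sequence name "hba_human" are hypothetical examples:

```python
def default_output_name(name, extension, first_sequence_name):
    """Build $(name).$(extension); when name: is not set in the ACD file,
    fall back to the name of the FIRST sequence read in."""
    base = name if name else first_sequence_name
    return f"{base}.{extension}"


print(default_output_name(None, "gff", "hba_human"))      # hba_human.gff
print(default_output_name("myfeat", "gff", "hba_human"))  # myfeat.gff
```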
The extension: attribute will default to the output feature format.
The type: attribute defines whether the feature output is "protein" or "nucleotide". There is a default based on the type of any input sequence, but a value should always be specified.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without this feature output.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no feature output) as the default for programs where feature output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The offormat: attribute defines the default value for the -offormat qualifier, used as the feature format and the default feature file extension.
The ofname: attribute defines the default value for the -ofname qualifier, used as the default base file name.
The name: attribute will default to "outfile".
The extension: attribute will default to the format, with "cut" defined as the default format to match the usual codon usage file naming convention. This format is also called "emboss".
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The extension: attribute sets the default extension for all files written to the directory.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The fullpath: attribute requires the path to be specified in full when passed to the program, although the user may provide a path relative to the current working directory.
The create: attribute allows a new directory to be created if it does not already exist. Without this attribute, only directories that already exist are allowed.
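A minimal sketch of the fullpath: and create: behaviour for a directory value, using only the Python standard library. `resolve_directory` is a hypothetical helper, not part of EMBOSS.

```python
import os
import tempfile


def resolve_directory(path, fullpath=False, create=False):
    """Validate an output directory value (hypothetical helper).

    create: make the directory if it does not exist; without it, only
    directories that already exist are accepted.
    fullpath: expand a path given relative to the current working
    directory into a full path before the program receives it.
    """
    if not os.path.isdir(path):
        if not create:
            raise FileNotFoundError(f"directory {path!r} does not exist")
        os.makedirs(path)
    return os.path.abspath(path) if fullpath else path


# Usage: "results" does not exist yet, so create=True is required.
with tempfile.TemporaryDirectory() as tmp:
    new_dir = os.path.join(tmp, "results")
    print(resolve_directory(new_dir, fullpath=True, create=True))
    print(os.path.isdir(new_dir))  # True
```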
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The output filename is constructed from the name: and extension: attributes in a $(name).$(extension) format. If the name: attribute is not defined in the ACD file, it will default to the calculated attribute name: of the FIRST sequence that is read in ($(asequence.name) if the sequence parameter is named asequence).
The extension: attribute will default to the program name, and is usually left as the default value.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no output file) as the default for programs where an output file is only occasionally required. Examples include programs where the original output format is still available, usually for users who require it for parsing in automated scripts. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without an output file.
The knowntype: attribute should always be defined. If the output is not of any special type, a knowntype of "(program) output" is the recommended value.
The append: attribute specifies that output is appended to the end of an existing output file. By default, the output file will be overwritten.
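The difference between append: and the default overwrite behaviour corresponds to the file open mode. A minimal sketch, where `open_output` and `write_runs` are hypothetical helpers simulating repeated program runs:

```python
import os
import tempfile


def open_output(filename, append=False):
    """Open an output file: with append: set, new output goes to the end of
    an existing file; otherwise the file is overwritten (the default)."""
    return open(filename, "a" if append else "w")


def write_runs(filename, lines, append):
    """Write each line as a separate 'program run' and return the final
    contents of the file."""
    for line in lines:
        with open_output(filename, append=append) as fh:
            fh.write(line + "\n")
    with open(filename) as fh:
        return fh.read()


path = os.path.join(tempfile.mkdtemp(), "report.out")
print(repr(write_runs(path, ["run 1", "run 2"], append=True)))  # 'run 1\nrun 2\n'
print(repr(write_runs(path, ["run 3"], append=False)))          # 'run 3\n'
```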
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The name: attribute will default to "outfile".
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The name: attribute will default to "outfile".
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The name: attribute will default to "outfile".
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The name: attribute will default to "outfile".
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The name: attribute will default to "outfile".
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The name: attribute will default to "outfile".
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The name: attribute will default to "outfile".
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The name: attribute will default to "outfile".
The extension: attribute will default to the output file format.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no data output) as the default for programs where data output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The oformat: attribute defines the default value for the -oformat qualifier, used as the file format and the default file extension.
The minseqs: and maxseqs: attributes define whether the alignment will contain exactly 2 sequences, 1 or more, 3 or more, or whatever the program will produce. These values can be used to validate the choice of formats on the command line with the -aformat qualifier.
The rformat: attribute is required. It defines the default value for the -rformat qualifier. The taglist: attribute defines the additional tags to be reported from the internal feature table. The tag names and types must match the source code of the application. Each tag is in the format "type:tagname[=columnname]", for example "int:length" or "string:gc=GC%". The precision: attribute sets the floating point precision of the score value; for integer scores this can be set to "0". The multiple: attribute should be set true if the output can contain more than one report from the same input. The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without a report output file. The application must be able to accept a null value for this qualifier.
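The taglist: syntax and the precision: attribute can be illustrated with a short parser and formatter. This sketch assumes that taglist entries are separated by whitespace and that a missing columnname defaults to the tag name; `parse_taglist` and `format_score` are hypothetical names.

```python
def parse_taglist(taglist):
    """Split a taglist: value into (type, tagname, columnname) tuples.
    Each entry is 'type:tagname[=columnname]'; in this sketch the column
    name defaults to the tag name when '=columnname' is omitted."""
    tags = []
    for entry in taglist.split():
        tagtype, _, rest = entry.partition(":")
        tagname, _, column = rest.partition("=")
        tags.append((tagtype, tagname, column or tagname))
    return tags


def format_score(score, precision):
    """Format a score with 'precision' decimal places (0 for integer scores)."""
    return f"{score:.{precision}f}"


print(parse_taglist("int:length string:gc=GC%"))
# [('int', 'length', 'length'), ('string', 'gc', 'GC%')]
print(format_score(12.3456, 2), format_score(42, 0))  # 12.35 42
```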
The nulldefault: attribute overrides the default name generation, and uses an empty string (no report output) as the default for programs where report output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The output filename is constructed from the name: and extension: attributes in a $(name).$(extension) format. If the name: attribute is not defined in the ACD file, it will default to the calculated attribute name: of the FIRST sequence that is read in ($(asequence.name) if the sequence parameter is named asequence).
If the features: attribute is set, the sequence output will include feature information either in the same file (if the sequence format supports it) or in a separate file (by default in GFF format).
The type: attribute defines the output sequence type. Although this will default to the type of the first input sequence, it is recommended that a value is always defined to make the output sequence type clear.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without this sequence output. The application must be able to accept a null value for this qualifier.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no sequence output) as the default for programs where sequence output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The osextension: attribute sets the default file extension. This is usually the sequence format, but can be specifically set with this attribute, for example where applications produce two or more sequence outputs.
The output filename is constructed from the name: and extension: attributes in a $(name).$(extension) format. If the name: attribute is not defined in the ACD file, it will default to the calculated attribute name: of the FIRST sequence that is read in ($(asequence.name) if the sequence parameter is named asequence).
If the features: attribute is set, the sequence output will include feature information either in the same file (if the sequence format supports it) or in a separate file (by default in GFF format).
The type: attribute defines the output sequence type. Although this will default to the type of the first input sequence, it is recommended that a value is always defined to make the output sequence type clear.
The aligned: attribute, if true, specifies that all sequences in the output have been aligned with gaps.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without this sequence output. The application must be able to accept a null value for this qualifier.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no sequence output) as the default for programs where sequence output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The ossingle: attribute sets the default value for the -ossingle qualifier and can be set to "Y" to direct output to multiple sequence files. For example, the EMBOSS program "seqretsplit" uses this attribute to write its input sequences to separate files.
The osextension: attribute sets the default file extension. This is usually the sequence format, but can be specifically set with this attribute, for example where applications produce two or more sequence outputs.
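A minimal sketch of the effect of ossingle: on where sequences are written. It assumes that each per-sequence file is named $(seqname).$(extension); `plan_sequence_outputs` is a hypothetical helper and does not reproduce the exact seqretsplit naming rules.

```python
def plan_sequence_outputs(sequences, name, extension, ossingle=False):
    """Return {filename: [(seqname, residues), ...]} for a sequence output.

    Without ossingle, every sequence goes to one $(name).$(extension) file.
    With ossingle set, each sequence is written to its own file named after
    the sequence (duplicate names would overwrite each other in this sketch).
    """
    if not ossingle:
        return {f"{name}.{extension}": list(sequences)}
    return {f"{seqname}.{extension}": [(seqname, residues)]
            for seqname, residues in sequences}


seqs = [("seq1", "ACGTACGT"), ("seq2", "GGCCAATT")]
print(list(plan_sequence_outputs(seqs, "seq1", "fasta")))                 # ['seq1.fasta']
print(list(plan_sequence_outputs(seqs, "seq1", "fasta", ossingle=True)))  # ['seq1.fasta', 'seq2.fasta']
```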
The output filename is constructed from the name: and extension: attributes in a $(name).$(extension) format. If the name: attribute is not defined in the ACD file, it will default to the calculated attribute name: of the FIRST sequence that is read in ($(asequence.name) if the sequence parameter is named asequence).
If the features: attribute is set, the sequence output will include feature information either in the same file (if the sequence format supports it) or in a separate file (by default in GFF format).
The type: attribute defines the output sequence type. Although this will default to the type of the first input sequence, it is recommended that a value is always defined to make the output sequence type clear.
The aligned: attribute, if true, specifies that all sequences in the output have been aligned with gaps.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without this sequence output. The application must be able to accept a null value for this qualifier.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no sequence output) as the default for programs where sequence output is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead. In combination with the nullok: and missing: attributes, this allows qualifiers to be null by default, and turned on from the command line.
The osextension: attribute sets the default file extension. This is usually the sequence format, but can be specifically set with this attribute, for example where applications produce two or more sequence outputs.
Formalised:
Data type |
Attribute definition |
Description |
graph |
sequence: Y/N |
Sequence on x axis |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null graph type as 'no graph' |
xygraph |
multiple: integer |
Number of graphs |
|
sequence: Y/N |
Sequence on x axis |
|
nulldefault: Y/N |
Defaults to 'no file' |
|
nullok: Y/N |
Can accept a null graph type as 'no graph' |
Table 4.5. Graph data types - attributes.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without this graph output. The application must be able to accept a null value for this qualifier.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no graph) as the default for programs where a graph is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead.
The sequence: attribute specifies that the X-axis positions relate to sequence positions in the first input sequence. The sequence name becomes the default x-axis title and is used for datafile outputs that need a source name.
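A minimal sketch of what the sequence: attribute implies for plotted data: x values are positions in the first input sequence, and the sequence name is the default x-axis title unless one is supplied (as -gxtitle would). `sequence_xy` and its arguments are hypothetical.

```python
def sequence_xy(seq_name, values, begin=1, xtitle=None):
    """Pair each per-residue value with its position in the sequence.

    x positions start at 'begin' (the first residue used); the sequence
    name is the default x-axis title unless an explicit title is given.
    """
    points = [(begin + i, v) for i, v in enumerate(values)]
    title = xtitle if xtitle is not None else seq_name
    return points, title


points, title = sequence_xy("seq1", [0.2, 0.5, 0.9], begin=10)
print(points)  # [(10, 0.2), (11, 0.5), (12, 0.9)]
print(title)   # seq1
```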
The goutfile: attribute specifies the base file name for output. It can be used to direct output to a named file rather than default to the first sequence name in the input.
The command line qualifiers can also be defined as ACD attributes. The most commonly used are gdesc:, gtitle:, gxtitle: and gytitle:.
The nullok: attribute allows a default value to be replaced by an empty string or by -noxxx on the command line if the application can run without this graph output.
The nulldefault: attribute overrides the default name generation, and uses an empty string (no graph) as the default for programs where a graph is only occasionally required. If an empty string is specified on the command line, the standard default value will be generated instead.
The sequence: attribute specifies that the X-axis positions relate to sequence positions in the first input sequence. The sequence name becomes the default x-axis title and is used for datafile outputs that need a source name.
The command line qualifiers can also be defined as ACD attributes. The most commonly used are gdesc:, gtitle:, gxtitle: and gytitle:.
The multiple: attribute specifies the number of XY graphs in a single output. The default value is 1, but a different value is sometimes defined in ACD files.
The goutfile: attribute specifies the base file name for output. It can be used, for example by the EMBOSS program "tmap" to direct output to a named file rather than default to the first sequence name in the input.
Calculated attributes are attributes whose values are assigned (calculated) AFTER the parameter has been validated (for instance, for a sequence data type, after the sequence file has been checked to exist and has been read in). The values are extracted from the actual object the parameter refers to. Calculated attributes are defined for the data types listed below; for sequence type objects they hold values such as the name of the sequence, its length and its type (protein, DNA, RNA, etc.).
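A sketch of how calculated attributes for the sequence data type could be derived after a sequence is read, and then referenced with the $(parameter.attribute) syntax used elsewhere in this document. The `Sequence` class, the helper names and the nucleotide test are simplified assumptions for illustration; the type test in particular is not the EMBOSS algorithm.

```python
import re
from dataclasses import dataclass


@dataclass
class Sequence:
    """Minimal stand-in for a sequence that has been read and validated."""
    name: str
    residues: str
    begin: int = 1  # 1-based start of the region used
    end: int = 0    # 0 means "up to the last residue"


def calculated_attributes(seq):
    """Derive the calculated attributes listed for the sequence data type."""
    # Crude illustrative type test: nucleotide if every residue is one of
    # A, C, G, T, U or N.
    nucleic = set(seq.residues.upper()) <= set("ACGTUN")
    return {
        "begin": seq.begin,
        "end": seq.end or len(seq.residues),
        "length": len(seq.residues),
        "protein": not nucleic,
        "nucleic": nucleic,
        "name": seq.name,
    }


def substitute(text, params):
    """Replace $(parameter.attribute) references with calculated values."""
    return re.sub(r"\$\((\w+)\.(\w+)\)",
                  lambda m: str(params[m.group(1)][m.group(2)]), text)


attrs = calculated_attributes(Sequence("seq1", "ACGTACGT", begin=3, end=6))
print(attrs["length"], attrs["nucleic"])                             # 8 True
print(substitute("$(asequence.name).fasta", {"asequence": attrs}))  # seq1.fasta
```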
Formalised:
Data type |
Calculated attributes |
Description |
distances |
distancecount: integer |
Number of distance matrices |
|
distancesize: integer |
Number of distance rows |
|
replicates: Y/N |
Replicates data found in input |
|
hasmissing: Y/N |
Missing values found (replicates=N) |
features |
fbegin: integer |
Start of the features to be used |
|
fend: integer |
End of the features to be used |
|
flength: integer |
Total length of sequence (fsize is feature count) |
|
fprotein: Y/N |
Feature table is protein |
|
fnucleic: Y/N |
Feature table is nucleotide |
|
fname: string |
The name of the feature table |
|
fsize: integer |
Number of features |
frequencies |
freqlength: integer |
Number of frequency values per set |
|
freqsize: integer |
Number of frequency sets |
|
freqloci: integer |
Number of frequency loci |
|
freqgenedata: Y/N |
Gene frequency data |
|
freqcontinuous: Y/N |
Continuous frequency data |
|
freqwithin: Y/N |
Individual within species frequency data |
properties |
propertylength: integer |
Number of property values per set |
|
propertysize: integer |
Number of property sets |
regexp |
length: integer |
The length of the regular expression |
seqall |
begin: integer |
Start of the first sequence used |
|
end: integer |
End of the first sequence used |
|
length: integer |
Total length of the first sequence |
|
protein: Y/N |
Sequence is protein |
|
nucleic: Y/N |
Sequence is nucleotide |
|
name: string |
The name/ID/accession of the sequence |
|
usa: string |
The USA of the sequence |
seqset |
begin: integer |
The beginning of the selection of the sequence |
|
end: integer |
The end of the selection of the sequence |
|
length: integer |
The maximum length of the sequence set |
|
protein: Y/N |
Sequence set is protein |
|
nucleic: Y/N |
Sequence set is nucleotide |
|
name: string |
The name of the sequence set |
|
usa: string |
The USA of the sequence set |
|
totweight: float |
Total sequence weight for a set |
|
count: integer |
Number of sequences in the set |
seqsetall |
begin: integer |
The beginning of the selection of the sequence |
|
end: integer |
The end of the selection of the sequence |
|
length: integer |
The maximum length of the sequence set |
|
protein: Y/N |
Sequence set is protein |
|
nucleic: Y/N |
Sequence set is nucleotide |
|
name: string |
The name of the sequence set |
|
usa: string |
The USA of the sequence set |
|
totweight: float |
Total sequence weight for each set |
|
count: integer |
Number of sequences in each set |
|
multicount: integer |
Number of sets of sequences |
sequence |
begin: integer |
Start of the sequence used |
|
end: integer |
End of the sequence used |
|
length: integer |
Total length of the sequence |
|
protein: Y/N |
Sequence is protein |
|
nucleic: Y/N |
Sequence is nucleotide |
|
name: string |
The name/ID/accession of the sequence |
|
usa: string |
The USA of the sequence |
string |
length: integer |
The length of the string |
tree |
treecount: integer |
Number of trees |
|
speciescount: integer |
Number of species |
|
haslengths: Y/N |
Branch lengths defined |
Table 5. Data type-specific calculated attributes.
The type: attribute describes the type of the sequence in a single token. The EMBOSS initialisation routines try to establish the type by reading the (first) sequence and examining its contents. Possible values for the type: attribute are listed in Table 6.
Value |
Type(s) |
Gaps |
Ambiguity codes |
Conversions |
Description |
any |
Nucleotide or protein |
Removed |
Yes |
'?'=>'X' |
any valid sequence |
gapany |
Nucleotide or protein |
Kept |
Yes |
'?'=>'X' |
any valid sequence with gaps |
dna |
Nucleotide only |
Removed |
Yes |
'?'=>'N' |
DNA sequence |
puredna |
Nucleotide only |
Removed |
No |
'U'=>'T' |
DNA sequence, bases ACGT only |
gapdna |
Nucleotide only |
Kept |
Yes |
'?'=>'N' |
DNA sequence with gaps |
gapdnaphylo |
Nucleotide only |
Kept |
Yes |
'U'=>'T' |
DNA sequence with gaps and queries |
rna |
Nucleotide only |
Removed |
Yes |
'?'=>'N' |
RNA sequence |
purerna |
Nucleotide only |
Removed |
No |
'T'=>'U' |
RNA sequence, bases ACGU only |
gaprna |
Nucleotide only |
Kept |
Yes |
'?'=>'N' |
RNA sequence with gaps |
gaprnaphylo |
Nucleotide only |
Kept |
Yes |
'T'=>'U' |
RNA sequence with gaps and queries |
nucleotide |
Nucleotide only |
Removed |
Yes |
'?'=>'N' |
nucleotide sequence |
purenucleotide |
Nucleotide only |
Removed |
No |
|
nucleotide sequence, bases ACGTU only |
gapnucleotide |
Nucleotide only |
Kept |
Yes |
'?'=>'N' |
nucleotide sequence with gaps |
gapnucleotidephylo |
Nucleotide only |
Kept |
Yes |
|
nucleotide sequence with gaps and queries |
gapnucleotidesimple |
Nucleotide only |
Kept |
Yes |
'B'=>'N' |
nucleotide sequence with gaps but only N for ambiguity |
protein |
Protein only |
Removed |
Yes |
'?'=>'X' |
protein sequence |
pureprotein |
Protein only |
Removed |
No |
|
protein sequence without BZ U X or * |
stopprotein |
Protein only |
Removed |
Yes |
'?'=>'X' |
protein sequence with possible stops |
gapprotein |
Protein only |
Kept |
Yes |
'?'=>'X' |
protein sequence with gaps |
gapstopprotein |
Protein only |
Kept |
Yes |
'?'=>'X' |
protein sequence with gaps and possible stops |
gapproteinphylo |
Protein only |
Kept |
Yes |
|
protein sequence with gaps, stops and queries |
proteinstandard |
Protein only |
Removed |
Yes |
'?'=>'X' |
protein sequence with no selenocysteine |
stopproteinstandard |
Protein only |
Removed |
Yes |
'?'=>'X' |
protein sequence with a possible stop but no selenocysteine |
gapproteinstandard |
Protein only |
Kept |
Yes |
'?'=>'X' |
protein sequence with gaps but no selenocysteine |
gapproteinsimple |
Protein only |
Kept |
Yes |
'?'=>'X' |
protein sequence with gaps but no selenocysteine |
Table 6. Possible values for the type: attribute in input sequence data types.
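To make the Gaps, Ambiguity codes and Conversions columns more concrete, here is a minimal Python sketch for three of the values (puredna, dna and gapdna). The gap characters and the ambiguity-code alphabet are assumptions chosen for illustration, not the actual EMBOSS tables, and check_sequence is a hypothetical helper rather than an EMBOSS function.

```python
# Hypothetical sketch of three type: values from Table 6.
# The gap characters and the ambiguity alphabet are assumptions.
GAP_CHARS = set("-.")
DNA_BASES = set("ACGT")
DNA_AMBIGUITY = set("BDHKMNRSVWY")  # assumed IUPAC nucleotide codes

# type value -> (gaps kept?, allowed letters, conversions)
TYPES = {
    "puredna": (False, DNA_BASES, {"U": "T"}),
    "dna": (False, DNA_BASES | DNA_AMBIGUITY, {"?": "N"}),
    "gapdna": (True, DNA_BASES | DNA_AMBIGUITY, {"?": "N"}),
}


def check_sequence(seq, seqtype):
    """Return the sequence after conversions and gap handling,
    or raise ValueError if a character is not allowed for seqtype."""
    keep_gaps, allowed, conversions = TYPES[seqtype]
    result = []
    for ch in seq.upper():
        ch = conversions.get(ch, ch)
        if ch in GAP_CHARS:
            if keep_gaps:
                result.append(ch)  # "Kept"
            continue               # "Removed": the gap is dropped
        if ch not in allowed:
            raise ValueError(f"{ch!r} is not valid for type {seqtype}")
        result.append(ch)
    return "".join(result)


print(check_sequence("acg-t", "dna"))     # ACGT
print(check_sequence("ACG-T", "gapdna"))  # ACG-T
print(check_sequence("ACGU", "puredna"))  # ACGT
```

Keeping each row as a tuple mirrors the table layout, so further values could be added as extra dictionary entries.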
The values of attributes (default, specific and calculated) can be referred to once they have been defined: append the attribute name to the parameter name, separated by a dot '.', enclose the result in parentheses and prefix it with a dollar sign '$'.
Formalised:
$(parametername.attribute)
Example:
sequence: asequence [
  standard: Y
  prompt: "Enter filename"
]
integer: windowsize [
  default: $(asequence.length)
]
In this example the parameter windowsize will default to the length of the input sequence.
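The lookup of $(parametername.attribute) can be pictured as a plain text substitution. In this Python sketch, resolve_refs and the values table are hypothetical; they only show that a reference must resolve to a value that has already been defined. Accepting a bare $(name) for internal variables is an assumption based on the $(protlen) example in the description of operations.

```python
import re

# $(name.attribute), or a bare $(name) for an internal variable (assumption)
REFERENCE = re.compile(r"\$\(([A-Za-z_]\w*)(?:\.([A-Za-z_]\w*))?\)")


def resolve_refs(text, values):
    """Replace every $(name.attribute) in text using values,
    a hypothetical dict keyed by (name, attribute) tuples."""
    def lookup(match):
        key = (match.group(1), match.group(2))  # attribute is None for $(name)
        if key not in values:
            raise KeyError(f"{match.group(0)} used before it was defined")
        return str(values[key])
    return REFERENCE.sub(lookup, text)


known = {("asequence", "length"): 5574}
print(resolve_refs("$(asequence.length)", known))  # 5574
```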
Many parameters (objects) accept qualifiers that specify the properties of that object on the command line. For example, the format of a sequence file (data type 'sequence') can be given by a qualifier as, for instance, 'fasta'. Because these qualifiers are specific to a particular data type (or object), they are called data type-specific qualifiers.
A second type of qualifier is independent of the data types. These global qualifiers apply to the complete program and are usually used to change its behaviour. For instance, the -debug qualifier turns on debugging, and the -filter qualifier makes the program behave like a filter, reading from standard input and writing to standard output.
Qualifiers can be entered on the command line in a myriad of ways, and a full description of the command line syntax will be given in 1.4. For the moment, qualifiers are written in UNIX style: the qualifier name is prefixed with a hyphen and, if a value is needed, the value follows the qualifier, separated by a space.
Example:
% seqret sequence.seq -sformat fasta
Here seqret is the program being called, sequence.seq is the first (and only) parameter, and "-sformat fasta" is a "qualifier/value pair" for this parameter.
The global qualifiers are boolean qualifiers. They are set by naming them on the command line and can be explicitly unset by prefixing the qualifier name with 'no'. Since most global qualifiers default to false, there is usually no need for this syntax at the moment.
Example:
% seqret sequence.seq -debug
% seqret sequence.seq -nodebug
In the first example seqret is the program being called, sequence.seq is the first (and only) parameter and -debug instructs the program to turn debugging on. In the second example seqret is run with the same parameter, but the -debug qualifier is now prefixed with 'no' (giving -nodebug), instructing the program to turn debugging off. This is useful if debugging has been turned on by default in the resource files or in an environment variable.
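The set/unset convention for boolean qualifiers can be sketched as follows. parse_boolean_qualifiers and the set of known names are hypothetical; the actual global qualifiers are those listed in Table 7.

```python
def parse_boolean_qualifiers(argv, known):
    """Return {name: True/False} for the boolean qualifiers found in argv.
    -name sets the qualifier, -noname unsets it; other words are ignored."""
    found = {}
    for word in argv:
        if not word.startswith("-"):
            continue
        name = word[1:]
        if name in known:
            found[name] = True
        elif name.startswith("no") and name[2:] in known:
            found[name[2:]] = False
    return found


GLOBALS = {"debug", "auto", "filter", "options"}  # hypothetical subset
print(parse_boolean_qualifiers(["sequence.seq", "-debug"], GLOBALS))    # {'debug': True}
print(parse_boolean_qualifiers(["sequence.seq", "-nodebug"], GLOBALS))  # {'debug': False}
```

Checking the full name before stripping 'no' keeps a qualifier whose own name happens to start with "no" from being misread as a negation.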
Qualifiers can have any name, but a recommended naming scheme is used at the moment: the first one or two letters of the qualifier indicate the data type it relates to. 'os' is used for the output sequence data types (seqout, seqoutset and seqoutall) and 's' for the input sequence data types (sequence, seqset and seqall). The rest of the qualifier name is free, but should be something sensible related to the data type.
Global qualifiers change the behaviour of the program. They are boolean qualifiers: they are set by naming them on the command line and explicitly unset by prefixing the qualifier with 'no' [2]. They can be used with any EMBOSS program and also as qualifiers for acdc. The current global qualifiers are listed in the table below.
Formalised:
Qualifier definition |
Description |
Table 7. Global qualifiers.
The -acdlog qualifier turns on logging of the ACD file processing. It produces a logfile of the ACD file parsing process, named after the application with the extension .acdlog.
Example:
% seqret sequence.seq -acdlog -auto
% more seqret.acdlog
seqret [
  sequence: sequence [ parameter: "1" ]
  seqout: outseq [ parameter: "2" ]
-- All Done --
Definitions in ACD file
ACD 0 Name: 'seqret'
..(Removed lines)..
This example shows the application seqret being run with the -acdlog qualifier (and the -auto qualifier, which will be discussed later). After the program completes, a file called seqret.acdlog containing the logging information is created in the current directory; its opening lines are shown.
The logfile first lists the ACD file description, with all abbreviated names expanded to their full length. Next, it lists all the parameters and qualifiers it knows of and prints all the information it has on the data types and qualifiers.
When the -acdpretty qualifier is used, a formatted version of the original ACD file is produced. All data type names, attributes and qualifiers are written with their full-length names, each attribute is shown on a separate line and all values are enclosed in quotes. The file is saved as programname.acdpretty in the current directory (programname is the name of the original program).
The -auto qualifier turns off all prompting of the user. The program tries to run with the default settings defined in the ACD file. If a parameter is flagged as required but has no default value, the program stops with an error message.
Example:
% seqret sequence.seq
Output sequence [pdnirsecf.fasta]:
% seqret sequence.seq -auto
%
The first example shows the application seqret being run without the -auto qualifier. The program prompts the user for an output filename, because the output sequence is a mandatory parameter, and offers a default output filename (pdnirsecf.fasta, constructed from the input sequence name and the output format).
In the second example, seqret is run with the -auto qualifier. The user is not asked for the output filename, and the default filename (pdnirsecf.fasta) is used for the output file.
The -debug qualifier turns on debug tracing. A file is produced with the name of the program followed by the extension .dbg. The debug file contains a complete trace of the actions of the program, as reported by calls to the AJAX function ajDebug().
Example:
% seqret sequence.seq -debug -auto
% more seqret.dbg
acdArgsScan
acdDebug Yes
ajNamGetValueC 'acdroot' 'emboss_acdroot'
definition for 'acdroot' not found
ajNamResolve of '/packages/emboss/emboss/acd/seqret.acd'
closing file '/packages/emboss/emboss/acd/seqret.acd'
acdFindQualAssoc 'auto' pnum: 0 ifound: 0
acdSetQualAppl
acdDebug Yes
ajNamGetValueC 'filter' 'emboss_filter'
definition for 'filter' not found
ajNamGetValueC 'options' 'emboss_options'
definition for 'options' not found
ajNamGetValueC 'acdlog' 'emboss_acdlog'
definition for 'acdlog' not found
ajNamGetValueC 'help' 'emboss_help'
definition for 'help' not found
ajSeqInClear called
Initializing seqInFormat, 24 formats
ajNamGetValueC 'format' 'emboss_format'
definition for 'format' not found
ajSeqRead: no file yet - test USA '/people/tdeboer/seq/nir.gb'
seqUsaProcess USA to test: '/people/tdeboer/seq/nir.gb'
format regexp: No
no format specified in USA
...input format not set
dbname dbexp: No
no dbname specified
entry-id regexp: Yes
found filename /people/tdeboer/seq/nir.gb
seqAccessFile /people/tdeboer/seq/nir.gb
ajNamResolve of '/people/tdeboer/seq/nir.gb'
ajSeqRead: calling seqRead '/people/tdeboer/seq/nir.gb'
seqRead seqin format 0 ''
try format 1 (gcg)
seqGcgDots .. found source 1..5574
try format 3 (embl)
first line 'LOCUS PDNIRSECF 5574 bp DNA BCT 30-MAR-1996
try format 5 (swiss)
first line 'LOCUS PDNIRSECF 5574 bp DNA BCT 30-MAR-1996
try format 7 (fasta)
This example shows the application seqret being run with the -debug qualifier (and the -auto qualifier, described above). After the program completes, a file called seqret.dbg containing the debug information is created in the current directory; some of its lines are shown.
The -filter qualifier makes the program behave like a filter, reading its (first) input 'file' from the standard input and writing its (first) output 'file' to the standard output. The -filter qualifier also implies the -auto qualifier, so the user is never prompted for any missing values.
Example:
% cat sequence.seq | seqret -filter | lpr
The example shows the application seqret being run with the -filter qualifier. The input file is 'piped' into the program using the Unix command cat, and the output is 'piped' directly to the Unix program lpr, which prints it.
Help on a program's use can be obtained with the -help qualifier. The help is produced automatically from the information in the ACD file. It lists all the parameters and their associated qualifiers, showing their names, their type and a brief help text extracted from the help: attribute.
A second qualifier, -verbose, gives a list of all available qualifiers, including any associated qualifiers (sequence formatting etc.) and the general qualifiers such as -help.
A program will prompt the user for any missing "required" or "parameter" values. Some programs have further options that are normally not prompted for (although they can be set on the command line). When the -options qualifier is used, the program queries the user for the required parameters (data types with the parameter: and/or standard: attribute) and also for the parameters labelled with the additional: attribute.
Example:
ACD file definition :
application: seqdemo
sequence: asequence [
  parameter: Y
]
seqout: outseq [
  standard: Y
]
integer: outputLength [
  additional: Y
  information: "Output length"
]
Command line :
% seqdemo
Input sequence: sequence.seq
Output sequence [pdnirsecf.fasta]: <ENTER>
% seqdemo -options
Input sequence: sequence.seq
Output sequence [pdnirsecf.fasta]: <ENTER>
Output length: 10
In the first example the application seqdemo is run without any parameters or qualifiers. Since asequence is defined with the parameter: attribute, the program asks the user for the input filename. It also asks for the output sequence, because that parameter is marked as required by the standard: attribute. It does not ask for the integer variable outputLength, since that is neither labelled as a parameter nor as required.
In the second example the user IS queried for the integer, since the -options qualifier forces the program to query for those parameters that are labelled with the additional: attribute.
Any parameter that is not defined as a parameter (with the parameter: attribute), as required (by the standard: attribute) or as optional (by the additional: attribute) can still be used on the command line, but the user will NEVER be queried for them. These parameters are considered an 'advanced feature' and can only be used on the command line. They will only be shown by the -help qualifier.
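The prompting rules above (parameter:, standard: and additional:, together with -auto, -options and -filter) can be summarised in a small decision function. This is a hypothetical Python sketch, not EMBOSS code; Param and should_prompt are illustrative names, and the sketch ignores the error that -auto produces for a required value with no default.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical record of the ACD attributes that affect prompting."""
    name: str
    parameter: bool = False   # parameter: Y
    standard: bool = False    # standard: Y (required)
    additional: bool = False  # additional: Y (asked only with -options)
    given: bool = False       # value already supplied on the command line


def should_prompt(p, auto=False, options=False, filter_mode=False):
    """Decide whether the user is asked for parameter p."""
    if filter_mode:
        auto = True               # -filter implies -auto
    if auto or p.given:
        return False
    if p.parameter or p.standard:
        return True
    if p.additional:
        return options
    return False                  # "advanced" parameter: command line only


seqdemo = [Param("asequence", parameter=True),
           Param("outseq", standard=True),
           Param("outputLength", additional=True)]
print([should_prompt(p) for p in seqdemo])                # [True, True, False]
print([should_prompt(p, options=True) for p in seqdemo])  # [True, True, True]
```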
When the -stdout qualifier is used, the user is still prompted for all the required information, but the default output is written to standard output. The user is also still prompted for an output filename, in case the output should be saved to a file.
Example:
Command line :
% seqret -stdout
Input sequence: sequence.seq
Output sequence [stdout]: <ENTER>
In this example the -stdout qualifier changes the default output to be to standard output (the terminal) instead of to a file. The program can still prompt the user, so there is a chance to enter a filename instead. With -auto on the command line, the program would instead write to the terminal without asking.
Most global qualifiers default to FALSE unless they are set on the command line or the corresponding environment variable is set to TRUE (the exceptions are the message level qualifiers -warning, -error and -fatal). The action of every global qualifier can be changed with an environment variable, which overrides the default action of the program. The variable name consists of the word EMBOSS (in capitals) and the name of the qualifier (also in capitals), joined by the underscore character '_'. To switch a qualifier on, set the variable to YES, TRUE or 1. Both lowercase and uppercase are accepted, as is a leading part of the word YES or TRUE (e.g. Y or T).
Formalised:
(csh)
setenv EMBOSS_QUAL TRUE    or    setenv EMBOSS_QUAL true
setenv EMBOSS_QUAL YES     or    setenv EMBOSS_QUAL yes
setenv EMBOSS_QUAL 1
(sh or bash)
export EMBOSS_QUAL=YES
where QUAL represents the name of the global qualifier.
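The accepted values can be expressed as a small check. env_flag is a hypothetical helper, not part of EMBOSS; treating every other value (or an unset variable) as false is an assumption, since the text only lists the values that switch a qualifier on.

```python
import os


def env_flag(qualifier, environ=None):
    """True if EMBOSS_<QUALIFIER> holds YES, TRUE or 1 (any case),
    or a leading part of YES/TRUE such as Y or T.
    Any other value, or an unset variable, is treated as false (assumption)."""
    env = os.environ if environ is None else environ
    value = env.get("EMBOSS_" + qualifier.upper(), "").strip().upper()
    if not value:
        return False
    return value == "1" or "YES".startswith(value) or "TRUE".startswith(value)


print(env_flag("options", {"EMBOSS_OPTIONS": "yes"}))  # True
print(env_flag("debug", {"EMBOSS_DEBUG": "t"}))        # True
print(env_flag("debug", {"EMBOSS_DEBUG": "no"}))       # False
```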
The table below lists all environment variables for global qualifiers.
Environment variable |
Global qualifier |
Description |
Table 8. Environment variables associated with global qualifiers.
Environment variables can be specified in the global emboss.defaults file, in the user's .embossrc file, or set in the shell (for example with the csh setenv command).
When an environment variable is set, its effect can be cancelled on the program's command line by prefixing the boolean qualifier name with 'no'.
Example:
ACD file definition :
application: seqdemo
sequence: asequence [
  parameter: Y
]
seqout: outseq [
  parameter: Y
]
integer: outputLength [
  additional: Y
]
Command line :
Example 1:
% seqdemo
Input sequence: sequence.seq
Output sequence [pdnirsecf.fasta]: <ENTER>
% seqdemo -options
Input sequence: sequence.seq
Output sequence [pdnirsecf.fasta]: <ENTER>
Output length: 10
Example 2:
% setenv EMBOSS_OPTIONS YES
% seqdemo
Input sequence: sequence.seq
Output sequence [pdnirsecf.fasta]: <ENTER>
Output length: 10
% seqdemo -nooptions
Input sequence: sequence.seq
Output sequence [pdnirsecf.fasta]: <ENTER>
The first example shows the behaviour without the EMBOSS_OPTIONS environment variable being set. The program seqdemo behaves in the standard way and only asks for the outputLength parameter when the -options qualifier is used. In the second example the environment variable EMBOSS_OPTIONS is set on the command line, and as a result the program now asks for the outputLength parameter without the -options qualifier being used. The effect of the environment variable is cancelled by prefixing the -options qualifier with 'no' (giving -nooptions).
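The order in which these settings take effect (program default, then environment variable, then command line) can be sketched like this. effective_flag is hypothetical, the environment value is passed in already converted to a boolean, and letting the last occurrence on the command line win is an assumption not stated above.

```python
def effective_flag(name, argv, env_value=None, default=False):
    """Hypothetical resolution order for a global qualifier:
    program default < EMBOSS_<NAME> environment variable < command line.
    env_value is None when the variable is unset."""
    value = default
    if env_value is not None:
        value = env_value
    for word in argv:                  # assumption: the last occurrence wins
        if word == "-" + name:
            value = True
        elif word == "-no" + name:
            value = False
    return value


# Example 2 above: EMBOSS_OPTIONS set to YES
print(effective_flag("options", [], env_value=True))              # True
print(effective_flag("options", ["-nooptions"], env_value=True))  # False
```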
Formalised:
Data type |
Qualifier definition |
Description |
assembly |
-cbegin: integer |
Start of the contig/consensus sequences |
|
-cend: integer |
End of the contig/consensus sequences |
|
-iformat: string |
Input assembly format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-idbname: string |
User-provided database name |
codon |
-format: string |
Data format |
cpdb |
-format: string |
Data format |
directory |
-extension: string |
Default file extension |
dirlist |
-extension: string |
Default file extension |
features |
-fformat: string |
Features format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-fopenfile: string |
Features file name |
|
-fask: Y/N |
Prompt for begin/end/reverse |
|
-fbegin: integer |
Start of the features to be used |
|
-fend: integer |
End of the features to be used |
|
-freverse: Y/N |
Reverse (if DNA) |
|
-fcircular: Y/N |
Circular sequence features |
obo |
-iformat: string |
Input obo format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-idbname: string |
User-provided database name |
pattern |
-pformat: string |
File format |
|
-pmismatch: integer |
Pattern mismatch |
|
-pname: string |
Pattern base name |
refseq |
-iformat: string |
Input reference sequence format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-idbname: string |
User-provided database name |
regexp |
-pformat: string |
File format |
|
-pname: string |
Pattern base name |
resource |
-iformat: string |
Input resource format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-idbname: string |
User-provided database name |
scop |
-format: string |
Data format |
sequence |
-sbegin: integer |
Start of the sequence to be used |
|
-send: integer |
End of the sequence to be used |
|
-sreverse: Y/N |
Reverse (if DNA) |
|
-sask: Y/N |
Ask for begin/end/reverse |
|
-snucleotide: Y/N |
Sequence is nucleotide |
|
-sprotein: Y/N |
Sequence is protein |
|
-slower: Y/N |
Make lower case |
|
-supper: Y/N |
Make upper case |
|
-scircular: Y/N |
Sequence is circular |
|
-squick: Y/N |
Read id and sequence only |
|
-sformat: string |
Input sequence format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-sdbname: string |
Database name |
|
-sid: string |
Entryname |
|
-ufo: string |
UFO features |
|
-fformat: string |
Features format |
|
-fopenfile: string |
Features file name |
seqall |
-sbegin: integer |
Start of each sequence to be used |
|
-send: integer |
End of each sequence to be used |
|
-sreverse: Y/N |
Reverse (if DNA) |
|
-sask: Y/N |
Ask for begin/end/reverse |
|
-snucleotide: Y/N |
Sequence is nucleotide |
|
-sprotein: Y/N |
Sequence is protein |
|
-slower: Y/N |
Make lower case |
|
-supper: Y/N |
Make upper case |
|
-scircular: Y/N |
Sequence is circular |
|
-squick: Y/N |
Read id and sequence only |
|
-sformat: string |
Input sequence format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-sdbname: string |
Database name |
|
-sid: string |
Entryname |
|
-ufo: string |
UFO features |
|
-fformat: string |
Features format |
|
-fopenfile: string |
Features file name |
seqset |
-sbegin: integer |
Start of each sequence to be used |
|
-send: integer |
End of each sequence to be used |
|
-sreverse: Y/N |
Reverse (if DNA) |
|
-sask: Y/N |
Ask for begin/end/reverse |
|
-snucleotide: Y/N |
Sequence is nucleotide |
|
-sprotein: Y/N |
Sequence is protein |
|
-slower: Y/N |
Make lower case |
|
-supper: Y/N |
Make upper case |
|
-scircular: Y/N |
Sequence is circular |
|
-squick: Y/N |
Read id and sequence only |
|
-sformat: string |
Input sequence format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-sdbname: string |
Database name |
|
-sid: string |
Entryname |
|
-ufo: string |
UFO features |
|
-fformat: string |
Features format |
|
-fopenfile: string |
Features file name |
seqsetall |
-sbegin: integer |
Start of each sequence to be used |
|
-send: integer |
End of each sequence to be used |
|
-sreverse: Y/N |
Reverse (if DNA) |
|
-sask: Y/N |
Ask for begin/end/reverse |
|
-snucleotide: Y/N |
Sequence is nucleotide |
|
-sprotein: Y/N |
Sequence is protein |
|
-slower: Y/N |
Make lower case |
|
-supper: Y/N |
Make upper case |
|
-scircular: Y/N |
Sequence is circular |
|
-squick: Y/N |
Read id and sequence only |
|
-sformat: string |
Input sequence format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-sdbname: string |
Database name |
|
-sid: string |
Entryname |
|
-ufo: string |
UFO features |
|
-fformat: string |
Features format |
|
-fopenfile: string |
Features file name |
taxon |
-iformat: string |
Input taxonomy format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-idbname: string |
User-provided database name |
text |
-iformat: string |
Input text format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-idbname: string |
User-provided database name |
url |
-iformat: string |
Input URL format |
|
-idbname: string |
User-provided database name |
|
-swiss: Y/N |
Swissprot cross-reference |
|
-embl: Y/N |
EMBL/GenBank/DDBJ cross-reference |
|
-accession: string |
Primary accession for source data |
|
-identifier: string |
Identifier term name in EDAM |
variation |
-iformat: string |
Input variation format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-idbname: string |
User-provided database name |
xml |
-iformat: string |
Input xml format |
|
-iquery: string |
Input query fields or ID list |
|
-ioffset: integer |
Input start position offset |
|
-idbname: string |
User-provided database name |
Table 9.1. Input qualifiers.
Formalised:
Data type |
Qualifier definition |
Description |
align |
-aformat: string |
Alignment format |
|
-aextension: string |
File name extension |
|
-adirectory: string |
Output directory |
|
-aname: string |
Base file name |
|
-awidth: integer |
Alignment width |
|
-aaccshow: Y/N |
Show accession number in the header |
|
-adesshow: Y/N |
Show description in the header |
|
-ausashow: Y/N |
Show the full USA in the alignment |
|
-aglobal: Y/N |
Show the full sequence in alignment |
featout |
-offormat: string |
Output feature format |
|
-ofopenfile: string |
Features file name |
|
-ofextension: string |
File name extension |
|
-ofdirectory: string |
Output directory |
|
-ofname: string |
Base file name |
|
-ofsingle: Y/N |
Separate file for each entry |
outassembly |
-odirectory: string |
Output directory |
|
-oformat: string |
Assembly output format |
outcodon |
-odirectory: string |
Output directory |
|
-oformat: string |
Output format specific to this data type |
outdata |
-odirectory: string |
Output directory |
|
-oformat: string |
Output format specific to this data type |
outdir |
-extension: string |
Default file extension |
outdiscrete |
-odirectory: string |
Output directory |
|
-oformat: string |
Output format specific to this data type |
outfile |
-odirectory: string |
Output directory |
outfreq |
-odirectory: string |
Output directory |
|
-oformat: string |
Output format specific to this data type |
outmatrix |
-odirectory: string |
Output directory |
|
-oformat: string |
Output format specific to this data type |
outmatrixf |
-odirectory: string |
Output directory |
|
-oformat: string |
Output format specific to this data type |
outobo |
-odirectory: string |
Output directory |
|
-oformat: string |
Ontology term output format |
outproperties |
-odirectory: string |
Output directory |
|
-oformat: string |
Output format specific to this data type |
outrefseq |
-odirectory: string |
Output directory |
|
-oformat: string |
Reference sequence output format
outresource |
-odirectory: string |
Output directory |
|
-oformat: string |
Data resource output format |
outscop |
-odirectory: string |
Output directory |
|
-oformat: string |
Output format specific to this data type |
outtaxon |
-odirectory: string |
Output directory |
|
-oformat: string |
Taxonomy output format |
outtext |
-odirectory: string |
Output directory |
|
-oformat: string |
Text output format |
outtree |
-odirectory: string |
Output directory |
|
-oformat: string |
Output format specific to this data type |
outurl |
-odirectory: string |
Output directory |
|
-oformat: string |
URL output format |
outvariation |
-odirectory: string |
Output directory |
|
-oformat: string |
Variation output format |
outxml |
-odirectory: string |
Output directory |
|
-oformat: string |
Xml output format |
report |
-rformat: string |
Report format |
|
-rname: string |
Base file name |
|
-rextension: string |
File name extension |
|
-rdirectory: string |
Output directory |
|
-raccshow: Y/N |
Show accession number in the report |
|
-rdesshow: Y/N |
Show description in the report |
|
-rscoreshow: Y/N |
Show the score in the report |
|
-rstrandshow: Y/N |
Show the nucleotide strand in the report |
|
-rusashow: Y/N |
Show the full USA in the report |
|
-rmaxall: integer |
Maximum total hits to report |
|
-rmaxseq: integer |
Maximum hits to report for one sequence |
seqout |
-osformat: string |
Output seq format |
|
-osextension: string |
File name extension |
|
-osname: string |
Base file name |
|
-osdirectory: string |
Output directory |
|
-osdbname: string |
Database name to add |
|
-ossingle: Y/N |
Separate file for each entry |
|
-oufo: string |
UFO features |
|
-offormat: string |
Features format |
|
-ofname: string |
Features file name |
|
-ofdirectory: string |
Output directory |
seqoutall |
-osformat: string |
Output seq format |
|
-osextension: string |
File name extension |
|
-osname: string |
Base file name |
|
-osdirectory: string |
Output directory |
|
-osdbname: string |
Database name to add |
|
-ossingle: Y/N |
Separate file for each entry |
|
-oufo: string |
UFO features |
|
-offormat: string |
Features format |
|
-ofname: string |
Features file name |
|
-ofdirectory: string |
Output directory |
seqoutset |
-osformat: string |
Output seq format |
|
-osextension: string |
File name extension |
|
-osname: string |
Base file name |
|
-osdirectory: string |
Output directory |
|
-osdbname: string |
Database name to add |
|
-ossingle: Y/N |
Separate file for each entry |
|
-oufo: string |
UFO features |
|
-offormat: string |
Features format |
|
-ofname: string |
Features file name |
|
-ofdirectory: string |
Output directory |
Table 9.2. Output qualifiers.
Formalised:
Data type |
Qualifier definition |
Description |
graph |
-gprompt: Y/N |
Graph prompting |
|
-gdesc: string |
Graph description |
|
-gtitle: string |
Graph title |
|
-gsubtitle: string |
Graph subtitle |
|
-gxtitle: string |
Graph x axis title |
|
-gytitle: string |
Graph y axis title |
|
-goutfile: string |
Output file for non interactive displays |
|
-gdirectory: string |
Output directory |
xygraph |
-gprompt: Y/N |
Graph prompting |
|
-gdesc: string |
Graph description |
|
-gtitle: string |
Graph title |
|
-gsubtitle: string |
Graph subtitle |
|
-gxtitle: string |
Graph x axis title |
|
-gytitle: string |
Graph y axis title |
|
-goutfile: string |
Output file for non interactive displays |
|
-gdirectory: string |
Output directory |
Table 9.3. Graph qualifiers.
A qualifier refers to the preceding parameter of its data type, and keeps doing so until another parameter of the same data type appears on the command line. Qualifiers that are specific to different data types can be intermixed. If no two parameters share a data type, the order of the parameters and their qualifiers is irrelevant.
Example 1
% seqret in.seq out.seq -sformat fasta -osformat gcg
In this example, the program seqret takes two parameters, an input sequence (file in.seq, data type 'sequence') and an output sequence (file out.seq, data type 'outseq') and the order of the qualifiers is irrelevant, since the two qualifiers refer to different data types.
Example 2
% align aap.seq -sformat fasta noot.seq -sformat gcg
In this example, the program align takes two parameters, both input sequences (files aap.seq and noot.seq, data type sequence), and here the order of the qualifiers is important: each -sformat qualifier applies to the sequence file before it, so aap.seq is read in 'fasta' format and noot.seq in 'gcg' format.
Instead of having to adhere to a rigorous order for the qualifiers when two or more parameters of the same data type are defined, it is also possible to use numbers in the qualifier name to indicate which parameter the qualifier refers to.
Formalised:
-qualifiername# qualifiervalue
where # represents an integer number, indicating which parameter the qualifier is referring to.
Example:
% align aap.seq noot.seq -sformat2 gcg -sformat1 fasta
This is equivalent to Example 2 above, but uses qualifier numbering to indicate that the format of the first parameter is 'fasta' and that of the second is 'gcg'.
The number used is not the position of the parameter in the ACD definition, but its position among the parameters of the SAME data type.
Example:
#ACD definition
application: seqtest
sequence: asequence1 [
  parameter: Y
]
outfile: outfile [
  parameter: Y
]
sequence: asequence2 [
  parameter: Y
]
Command line :
% seqtest filename1.seq seqtest.out filename2.seq \
      -sformat1 gcg -sformat2 fasta
This specifies that the first sequence file (filename1.seq) is in 'gcg' format and the second sequence file (filename2.seq) is in 'fasta' format. Note that the second -sformat qualifier is numbered 2, although it refers to the third parameter (it is the second sequence parameter, hence number 2).
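The association rules (an unnumbered qualifier applies to the most recent parameter of its data type; a numbered one to the nth parameter of that type) can be sketched in Python. The QUALIFIER_TYPE table is a hypothetical two-entry subset, every qualifier is assumed to take a value, and an unnumbered qualifier appearing before any parameter of its type is assumed to apply to the first one.

```python
import re

# Hypothetical subset: qualifier base name -> data type it belongs to
QUALIFIER_TYPE = {"sformat": "sequence", "osformat": "seqout"}


def assign_qualifiers(param_types, argv):
    """param_types: data types of the ACD parameters in definition order.
    argv: command-line words after the program name.
    Returns {filename: {qualifier: value}}; malformed input is not handled."""
    files = []      # (data type, filename) in command-line order
    seen = {}       # data type -> parameters of that type seen so far
    pending = []    # (data type, number, qualifier, value)
    words = iter(argv)
    for word in words:
        if word.startswith("-"):
            name, number = re.fullmatch(r"-([a-z]+)(\d*)", word).groups()
            dtype = QUALIFIER_TYPE[name]
            # numbered: explicit; unnumbered: most recent parameter of that type
            index = int(number) if number else max(seen.get(dtype, 0), 1)
            pending.append((dtype, index, name, next(words)))
        else:
            dtype = param_types[len(files)]
            seen[dtype] = seen.get(dtype, 0) + 1
            files.append((dtype, word))
    result = {filename: {} for _, filename in files}
    for dtype, index, name, value in pending:
        same_type = [f for t, f in files if t == dtype]
        result[same_type[index - 1]][name] = value
    return result


print(assign_qualifiers(["sequence", "outfile", "sequence"],
                        ["filename1.seq", "seqtest.out", "filename2.seq",
                         "-sformat1", "gcg", "-sformat2", "fasta"]))
# {'filename1.seq': {'sformat': 'gcg'}, 'seqtest.out': {}, 'filename2.seq': {'sformat': 'fasta'}}
```

Resolving qualifiers only after all parameters have been seen lets numbered qualifiers refer to parameters that appear later on the command line.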
Operations make the ACD syntax more flexible. At the moment there are arithmetic and conditional operations. An operation is enclosed in a pair of parentheses '()' and preceded by the at symbol '@'.
Formalised:
@(operation)
If the operation contains white space, the whole token should be enclosed in double quotes (" ").
Formalised:
"@(operation with white space )"
Operations can be nested.
Formalised:
@(@(operation))
The current arithmetic operations are addition, subtraction, multiplication and division. The standard characters for the arithmetic operations are used: + - * and /.
Formalised:
@(a+b) (Addition)
@(a-b) (Subtraction)
@(a*b) (Multiplication)
@(a/b) (Division)
The operands a and b must parse to an integer or a float value. Only a single arithmetic operation is allowed per operation. If more than one arithmetic operation is required, use internal ACD variables to hold the intermediate results, or nest separate @() operations.
Example1:
variable: protlen "@( $(sequence.length) / 3 )"
integer: window [
  maximum: "@($(protlen)-50)"
  default: 50
]
This example uses an internal ACD variable to store the intermediate result. The internal ACD variable $(protlen) is calculated from the length of the input sequence (sequence data type) and is used to define the maximum value of the window parameter.
Example2:
integer: window [ maximum: "@( @( $(sequence.length) / 3) - 50)" default: 50 ]
This example uses nested operations to achieve the same result as Example1. The maximum of the window parameter is calculated directly from the sequence.length variable (a calculated attribute): the inner operation divides the sequence length by 3, and a separate outer operation subtracts 50 from the result.
If any of the operands are not numerical, the result is undefined.
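To make the "one operator per operation" rule concrete, here is a minimal sketch of evaluating the text inside @( ) once variables such as $(protlen) have been substituted. eval_arith() is a hypothetical helper, not the EMBOSS parser; it works in double precision, and whether EMBOSS truncates integer division is not stated in this document.

```c
#include <stdlib.h>

/* Hypothetical helper: evaluate one arithmetic operation "a op b".
 * Returns 1 and stores the value in *result on success. Returns 0 if an
 * operand is not numeric (the "undefined" case), the operator is
 * unknown, more than one operation is present, or b is zero for '/'. */
int eval_arith(const char *expr, double *result)
{
    char *end;
    char *rest;
    double a, b;
    char op;

    a = strtod(expr, &end);          /* first operand; skips leading spaces */
    if (end == expr)
        return 0;                    /* not numeric */
    while (*end == ' ')
        end++;
    op = *end;
    if (op == '\0')
        return 0;                    /* no operator at all */
    end++;

    b = strtod(end, &rest);          /* second operand */
    if (rest == end)
        return 0;
    while (*rest == ' ')
        rest++;
    if (*rest != '\0')
        return 0;                    /* trailing text: a second operation */

    switch (op) {
    case '+': *result = a + b; break;
    case '-': *result = a - b; break;
    case '*': *result = a * b; break;
    case '/':
        if (b == 0.0)
            return 0;
        *result = a / b;
        break;
    default:
        return 0;
    }
    return 1;
}
```

For a 600-residue input, Example1 would evaluate "600 / 3" to 200 for $(protlen) and then "200-50" to 150 for the window maximum; "1+2+3" is rejected, which is why intermediate variables or nesting are needed.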
There are three conditional operations: The Boolean operation, the simple conditional (if/then/else type) and the case type.
The Boolean operation resolves to a Boolean value using one of the four comparison operators, for equality (==), non-equality (!=), less-than (<) and greater-than (>), or one of the logical operators not (!), or (|) and and (&).
Formalised:
@(token1==token2) (Equality)
@(token1!=token2) (Non-equality)
@(token1<token2) (Less-than)
@(token1>token2) (Greater-than)
@(!token1) (Not)
@(token1|token2) (Or)
@(token1&token2) (And)
The test values can be integers, floats and strings.
Example:
sequence: seq [ standard: Y ]
infile: data [ standard: @($(seq.type)==DNA) ]
In this example, the data file is only required if the type of sequence is 'DNA'.
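Because the test values can be integers, floats or strings, a comparison has to decide how to compare two tokens. The sketch below is hypothetical (acd_compare() is not an EMBOSS function) and rests on two assumptions not stated in this document: tokens that both parse completely as numbers are compared numerically, and anything else is compared as strings ignoring case, so that "DNA" matches "dna". The logical operators !, | and & map directly onto C's !, || and &&.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* True if s parses completely as a number; stores it in *v. */
static int is_number(const char *s, double *v)
{
    char *end;

    *v = strtod(s, &end);
    return end != s && *end == '\0';
}

/* Case-insensitive string comparison in portable C. */
static int compare_nocase(const char *a, const char *b)
{
    while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

/* Hypothetical sketch of @(token1 OP token2) for OP in ==, !=, <, >.
 * Returns 1 (true), 0 (false) or -1 for an unknown operator. */
int acd_compare(const char *t1, const char *op, const char *t2)
{
    double a, b;
    int cmp;

    if (is_number(t1, &a) && is_number(t2, &b))
        cmp = (a > b) - (a < b);        /* numeric ordering */
    else
        cmp = compare_nocase(t1, t2);   /* string ordering */

    if (strcmp(op, "==") == 0) return cmp == 0;
    if (strcmp(op, "!=") == 0) return cmp != 0;
    if (strcmp(op, "<") == 0)  return cmp < 0;
    if (strcmp(op, ">") == 0)  return cmp > 0;
    return -1;
}
```

Comparing numerically when possible matters for cases such as "10" > "9", which is true as numbers but false as strings.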
The simple conditional is a tri-operand operator. The test value is followed by a question mark '?', which in turn is followed by the two values the operation can resolve to, separated by a colon ':'. Formalised:
@(boolval ? iftrue : iffalse)
The test value, boolval, which must be either a Boolean variable, a Boolean operation or an integer, is examined. If it resolves to true (or non-zero), the whole operation resolves to the iftrue value. If the test value resolves to false (or zero), the operation resolves to the second value (iffalse).
Example:
string: matrix [ default: "@($(asequence.protein) ? BLOSUM62 : DNAMAT)" ]
The $(asequence.protein) variable is a Boolean value that resolves to true if the sequence data type with the name asequence is a protein sequence. The operation resolves to BLOSUM62 if the sequence is a protein sequence and to DNAMAT if it is not (i.e. a DNA or RNA sequence).
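The resolution step of the simple conditional can be sketched in a few lines. acd_ternary() is a hypothetical helper; the set of spellings treated as true or false (leading Y/T or N/F, otherwise an integer tested for non-zero) is an assumption, since this document only says the test value is a Boolean or an integer.

```c
#include <ctype.h>
#include <stdlib.h>

/* Assumed truth rules: leading Y/T is true, leading N/F is false,
 * otherwise the value is read as an integer and non-zero is true. */
static int acd_truth(const char *val)
{
    char *end;
    long n;

    switch (toupper((unsigned char)val[0])) {
    case 'Y': case 'T': return 1;
    case 'N': case 'F': return 0;
    default: break;
    }
    n = strtol(val, &end, 10);
    return end != val && n != 0;
}

/* Hypothetical sketch of @(boolval ? iftrue : iffalse) after boolval
 * has been substituted, e.g. $(asequence.protein) -> "Y". */
const char *acd_ternary(const char *boolval,
                        const char *iftrue, const char *iffalse)
{
    return acd_truth(boolval) ? iftrue : iffalse;
}
```

With a protein input, "$(asequence.protein)" would become a true value and acd_ternary() would return "BLOSUM62"; with a nucleotide input it would return "DNAMAT".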
From EMBOSS 2.8.0 the preferred method is to use the automatic ACD variable $(acdprotein), which reflects the type of the first input sequence. This makes converting ACD files for GUI interfaces and other wrappers simpler. The example then becomes:
string: matrix [ default: "@($(acdprotein) ? BLOSUM62 : DNAMAT)" ]
The result is the same, because internally EMBOSS uses the value of "$(asequence.protein)".
In the case-type operation, the test value is compared with a list of possible values. If a match is found, the operation resolves to the result associated with that possible value. The test value, which is parsed as a string, is followed by an equal sign '=', which in turn is followed by one or more pairs of a possible value and its associated value, each pair separated by a colon ':'. If none of the possible values match, the operation resolves to the default result, which is associated with the keyword else.
The else : default value pair is optional. If none of the possible values match in an operation without a default value, the operation resolves to a null string. Formalised:
@(testval = poss_valA : ass_valA poss_valB : ass_valB else : default_val)
Example:
string: matrix [ default: "@($(sequence.type) = protein : BLOSUM62 dna : dnamat rna : rnamat else : unknown)" ]
The $(sequence.type) variable is a string value that holds the type of sequence in the sequence data type named sequence. If the type is 'protein', the operation resolves to BLOSUM62; if the type is 'dna', it resolves to dnamat, and so on. If the type is not in this list, the operation resolves to unknown.
If the test value cannot unambiguously be assigned to a single associated value, the operation will resolve to the LAST associated value that matches its possible value.
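The case-type rules (last match wins, else as fallback, null string otherwise) can be expressed as a short scan over the pairs. acd_case_resolve() and struct acd_case are hypothetical names for this sketch; it uses exact, case-sensitive matching, and whether EMBOSS ignores case here is not stated in this document.

```c
#include <string.h>

/* One "possible : associated" pair from @(testval = ...). */
struct acd_case {
    const char *possible;
    const char *associated;
};

/* Hypothetical sketch of the case-type operation. Every pair is checked
 * and the LAST matching pair wins. With no match, the else value is
 * returned, or "" (a null string) when no else pair was given
 * (elseval == NULL). */
const char *acd_case_resolve(const char *testval,
                             const struct acd_case *pairs, size_t npairs,
                             const char *elseval)
{
    const char *result = NULL;
    size_t i;

    for (i = 0; i < npairs; i++)
        if (strcmp(testval, pairs[i].possible) == 0)
            result = pairs[i].associated;   /* later matches overwrite */

    if (result != NULL)
        return result;
    return elseval != NULL ? elseval : "";
}
```

For the matrix example the pairs would be {"protein", "BLOSUM62"}, {"dna", "dnamat"} and {"rna", "rnamat"} with elseval "unknown".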
Conditional operations in ACD files are often used to test the values of list or selection data types.
These tests are often used for several other qualifiers. To help understand the ACD file, and to help the developers of ACD parsers, an ACD file can use a variable definition to define the result once only, and then to refer to the variable by name in all later ACD data type definitions.
Example1:
variable: usermatrix "@($(pwmatrix) == o)"
infile: pairwisedata [
  additional: "$(usermatrix)"
  default: ""
  nullok: "@(!$(usermatrix))"
  information: "Filename of user pairwise matrix"
  knowntype: "comparison matrix"
]
Note that as a variable only has a single value and no attributes the square brackets are not used.
Variables are used to simplify the ACD file, but they do indicate that there is some complexity in the ACD definitions. When a variable is used, or when a conditional operation refers to another ACD value, the application can be regarded as two or more separate applications with each possible condition resolved.
The parameters and qualifiers defined by an ACD file are processed in the order in which they appear. This is sufficient for ACD processing by EMBOSS applications, but does not give enough detail for user interfaces to build clean groupings of options.
To help user interfaces, all ACD parameters and qualifiers are now grouped into 5 major sections and in some cases into subsections. The 5 major sections always appear in the following order in the ACD file (the order is tested by the acdvalid tool):
Section name | Description |
Input | Input values, including any infile, sequence, seqset, seqall, matrix, fmatrix, codon, or any other ACD type that will read input. At present datafile is included, although this may change. Other qualifiers related to input can also be placed in this section. |
Required | Parameters and required qualifiers, including any whose "standard" attribute can be true but depends on a conditional operation. Also any toggles that their definitions use. Note that input and output parameters and qualifiers must be in their respective sections. Other qualifiers related to input and output can also be placed in those sections. |
Additional | Additional qualifiers, including any whose "additional" attribute can be true but depends on a conditional operation. Also any toggles that their definitions use. Note that input and output parameters and qualifiers must be in their respective sections. Other qualifiers related to input and output can also be placed in those sections. |
Advanced | Any qualifiers (except input and output qualifiers) which have no "standard" or "additional" attribute defined. Other qualifiers related to input and output can also be placed in those sections. |
Output | Output values, including any outfile, outdata, seqout, seqoutall, seqoutset, outtree or any other data type that will write output. This is the last section to be defined, so all output definitions must be at the end. Other qualifiers related to output can also be placed in this section. |
All sections and subsections are defined in the file sections.standard which is stored and installed in the same directory as the ACD files.
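The ordering rule that acdvalid enforces can be pictured as a rank check: each section gets its position in the required order, and the ranks found in the file must increase. This is a hypothetical sketch, not acdvalid's code; it assumes lower-case section names as written in ACD files and that each major section appears at most once (sections may be omitted).

```c
#include <string.h>

/* Required order of the five major sections. */
static const char *const section_order[] = {
    "input", "required", "additional", "advanced", "output"
};

/* Position of a section in the required order, or -1 if unknown. */
static int section_rank(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof section_order / sizeof section_order[0]; i++)
        if (strcmp(name, section_order[i]) == 0)
            return (int)i;
    return -1;
}

/* Hypothetical check: returns 1 if the sections found in an ACD file
 * appear in the required order, 0 if one is out of order, repeated or
 * unknown. */
int sections_in_order(const char *const found[], size_t nfound)
{
    int last = -1;
    size_t i;

    for (i = 0; i < nfound; i++) {
        int rank = section_rank(found[i]);

        if (rank < 0 || rank <= last)
            return 0;
        last = rank;
    }
    return 1;
}
```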
The behaviour of EMBOSS programs can be influenced not only by command line options, but also by environment variables. Some of the environment variables were already mentioned in section 3.4.2.2 for the global qualifiers and are listed here again for completeness. A few others are not directly related to specific data types, but are more general to the workings of an EMBOSS program.
All environment variables should be described in the file variables.standard which is stored and installed in the same directory as the ACD files and is used to generate the table below. Unlike the other .standard files, there is no explicit test for a variable to be defined in this file.
Environment variable | Type | Description |
EMBOSS_ACDCOMMANDLINELOG | string | Log file for full command line, used to convert QA test definitions into memory leak test command lines |
EMBOSS_ACDFILENAME | boolean | Use filename rather than sequence name as default for file naming |
EMBOSS_ACDLOG | boolean | Log ACD processing to file program.acdlog to debug ACD processing |
EMBOSS_ACDPROMPTS | integer | Number of times to prompt for a value interactively |
EMBOSS_ACDROOT | string | EMBOSS root directory for finding files |
EMBOSS_ACDUTILROOT | string | EMBOSS source directory for finding files |
EMBOSS_ACDWARNRANGE | boolean | Warn if a number is out of range and fixed to be within limits |
EMBOSS_AXIS2C | string | Set to 1 if built with the axis2c library |
EMBOSS_AXIS2C_HOME | string | AXIS2 C library directory |
EMBOSS_CACHESIZE | integer | Cache size to use for database indexing |
EMBOSS_DATA | string | EMBOSS directory for finding data files |
EMBOSS_DEBUG | boolean | Write debug output to program.dbg unless -nodebug is on the command line |
EMBOSS_DEBUGBUFFER | boolean | Buffer debug output to save I/O time but risk losing output on a crash |
EMBOSS_DOCROOT | string | EMBOSS directory for finding application documentation |
EMBOSS_EDAM | string | Full path to EDAM obo format file for checking of relations attributes by acdvalid |
EMBOSS_FEATWARN | boolean | Print warning messages when parsing feature table input |
EMBOSS_FILTER | boolean | By default read standard input and write to standard output unless -nofilter is on the command line |
EMBOSS_FORMAT | string | Input sequence format |
EMBOSS_GRAPHICS | string | Default graphics output device |
EMBOSS_HTTPVERSION | string | HTTP version |
EMBOSS_LANGUAGE | string | (Obsolete) Language used for the codes.language file |
EMBOSS_LOGFILE | string | System statistics log file |
EMBOSS_MYSQL | string | Set to 1 if built with the mysql library |
EMBOSS_OPTIONS | boolean | Prompt for optional command line values unless -nooptions is on the command line |
EMBOSS_OUTDIRECTORY | string | Directory used to write output |
EMBOSS_OUTFEATFORMAT | string | Output feature format |
EMBOSS_OUTFORMAT | string | Output sequence format |
EMBOSS_PAGER | string | Application to use for paged output to screen |
EMBOSS_PAGESIZE | integer | Page size to use for database indexing |
EMBOSS_POSTGRESQL | string | Set to 1 if built with the postgresql library |
EMBOSS_PROXY | string | HTTP proxy server address in the form proxy.xyz.ac.uk:7890 |
EMBOSS_SECCACHESIZE | integer | Secondary cache size to use for database indexing |
EMBOSS_SECPAGESIZE | integer | Secondary page size to use for database indexing |
EMBOSS_SEQWARN | boolean | Print warning messages when parsing standard sequence characters |
EMBOSS_SQL | string | Set to 1 if built with either the mysql or postgresql library |
EMBOSS_STANDARD | string | EMBOSS root directory for finding standard files |
EMBOSS_STDOUT | boolean | By default write to standard output unless -nostdout is on the command line |
EMBOSS_TIMETODAY | string | Date and time to override the current date, used to give a standard date and time for test runs |
EMBOSS_USERDIR | string | EMBOSS root directory for finding user files |
EMBOSS_WARNOBSOLETE | boolean | Print warning messages when ACD file declares an application as 'obsolete' |
Table 10. Environment variables.
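Reading the boolean and integer variables in Table 10 from C only needs the standard getenv(). The helpers below are hypothetical, not EMBOSS functions, and the truth rule (a value starting with Y, T or 1, in any case, is true) is an assumption; the exact spellings EMBOSS accepts are not listed in this document. Parsing is kept separate from getenv() so the rules can be checked without touching the environment.

```c
#include <ctype.h>
#include <stdlib.h>

/* Assumed rule: a value starting with Y, T or 1 (any case) is true,
 * anything else is false; a missing or empty value gives the default. */
int parse_bool(const char *value, int dflt)
{
    if (value == NULL || *value == '\0')
        return dflt;
    switch (toupper((unsigned char)*value)) {
    case 'Y':
    case 'T':
    case '1':
        return 1;
    default:
        return 0;
    }
}

/* A missing or non-integer value gives the default. */
long parse_long(const char *value, long dflt)
{
    char *end;
    long n;

    if (value == NULL)
        return dflt;
    n = strtol(value, &end, 10);
    return (end != value && *end == '\0') ? n : dflt;
}

/* Hypothetical wrappers that read a named environment variable. */
int env_bool(const char *name, int dflt)
{
    return parse_bool(getenv(name), dflt);
}

long env_long(const char *name, long dflt)
{
    return parse_long(getenv(name), dflt);
}
```

For example, env_bool("EMBOSS_ACDWARNRANGE", 0) or env_long("EMBOSS_ACDPROMPTS", 1); the defaults passed here are only illustrative.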
A set of 6 utility programs will run, test and document an ACD file without the need to write the program which will use the ACD file.
The recommended approach for developing new applications is to first write and test the ACD file and then to write the application to use the values defined by the ACD file.
The acdc utility processes an ACD file in exactly the same way as an application, even if the application itself has not yet been written.
acdc accepts general qualifiers such as -debug. Note that, as the input files are read, any debug calls made by the input functions will also be reported.
The acdtrace utility runs like acdc but also reports the resolution of any ACD variables and operations as the file is processed. The output on screen can look a little confusing, but it is by far the best way to see how variables and operations work in your ACD file.
The acdvalid utility validates an ACD file, testing many features which will not prevent an application from running, but will create problems for the user interface (commandline or some wrapper).
Among the features tested by acdvalid are:
If the message is a "Warning", the ACD file will work, although it is worth trying to fix the problem. Recommended solutions are described on the web page used by the developers, http://www.ebi.ac.uk/~pmr/emboss/acdvalid-fix.html.
Further validation tests will be added in future releases, so it is worth running acdvalid on all local ACD files with each new version of EMBOSS.
The acdpretty utility simply reads an ACD file and rewrites it with clean indentation to file (programname).acdpretty which can be used to overwrite the original ACD file.
The acdtable utility is used to create the table of qualifiers, allowed values and defaults that appears in the application documentation. Where the ACD definition alone is not enough to define the value to be reported, the allowed values are taken from the valid attribute and the default value from the expect attribute.
The acdlog utility runs like acdc but also produces a file (programname).acdlog which documents the internals of ACD processing.
Most EMBOSS programs are started from the UNIX command line, either with or without extra parameters and qualifiers. The parameters and qualifiers that can appear on the command line are defined in the Ajax Command Definition (ACD) file associated with the EMBOSS program (See 1.2).
The command line syntax is very versatile and does not restrict the available syntax more than is strictly necessary. To avoid confusion, there will be a recommended EMBOSS command style, which will probably be the UNIX style using '=' for parameter and qualifier values.
For parameters it is not always mandatory to use the name of the parameter on the command line. If the parameter: attribute was used for a parameter it is not mandatory to use the name of the parameter as a prefix to the parameter value (See 3.4.1.1.1). For qualifiers it is always mandatory to provide the name of the qualifier (if a value for the qualifier is to be given on the command line).
In the rest of the definition of the command line syntax, wherever the word qualifier is used, it means both parameters and qualifiers. If 'parameter' is used it will only apply to parameters.
Example:
ACD definition
application: seqdemo
sequence: asequence [ parameter: Y ]
boolean: output [ default: Y ]
Command line :
% seqdemo filename.seq -output
% seqdemo filename.seq -nooutput
In the first command line example the boolean qualifier output is set to True (although it could have been omitted, since the default value is True).
In the second command line example the output qualifier is set to False by the prefix 'no'.
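The 'no' prefix rule can be sketched as follows. parse_bool_flag() is a hypothetical helper, not EMBOSS code; it checks the exact name first, so a qualifier whose real name happens to start with "no" is not mistaken for a negation.

```c
#include <string.h>

/* Hypothetical helper: return 1 if arg sets the boolean qualifier
 * 'name' to True (e.g. -output), 0 if it sets it to False with the
 * 'no' prefix (e.g. -nooutput), or -1 if arg names something else. */
int parse_bool_flag(const char *arg, const char *name)
{
    if (*arg == '-')
        arg++;
    if (strcmp(arg, name) == 0)
        return 1;                       /* -output   -> True  */
    if (strncmp(arg, "no", 2) == 0 && strcmp(arg + 2, name) == 0)
        return 0;                       /* -nooutput -> False */
    return -1;
}
```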
Sequence specifications conform to the EMBOSS Uniform Sequence Address, but parts of the specification can also be given on the command line.
Examples
The following command lines all tell seqdemo to read sequence paamir.tfa in fasta format, starting at base 25.
% seqdemo -sbeg 25 paamir.tfa -sf fasta
% seqdemo fasta::paamir.tfa -sbegin=25
% seqdemo -sbegin=25 fasta::paamir.tfa
% seqdemo -sbegin=25 paamir.tfa -sformat fasta
% seqdemo -sbeg 25 paamir.tfa -sf=fasta
% seqdemo -sbeg 25 -sequence=paamir.tfa -sf=fasta
% seqdemo sbeg=25 -sequence=paamir.tfa sf=fasta
% seqdemo -sbeg 25 -sequence paamir.tfa -sf fasta
% seqdemo /SBEG=25 /SEQUENCE=paamir.tfa /SF=fasta
This may seem rather confusing, but only because there is no enforcement of a standard recommended way for users to specify the command lines.
For general use, we strongly recommend the first example above.
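All of these spellings can be reduced to the same qualifier name by a few normalisation steps: strip a leading '-' or '/', split off any "=value", ignore case, and expand an unambiguous abbreviation such as "sbeg" or "sf". The function below is a hypothetical sketch of those steps, not the EMBOSS command line parser; how EMBOSS distinguishes "sbeg=25" from a filename or resolves ambiguous abbreviations is not described here.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch: normalise one qualifier spelling against a list
 * of known lower-case qualifier names. Returns the full name, or NULL
 * if it is unknown, ambiguous or too long. *value points at the text
 * after '=', or is NULL when the value is the next argument. */
const char *normalise_qualifier(const char *arg, const char *const known[],
                                size_t nknown, const char **value)
{
    char name[64];
    size_t len = 0;
    size_t i;
    const char *match = NULL;

    *value = NULL;
    if (*arg == '-' || *arg == '/')
        arg++;                               /* -sbeg, /SBEG or sbeg */
    while (arg[len] != '\0' && arg[len] != '=' && len < sizeof name - 1) {
        name[len] = (char)tolower((unsigned char)arg[len]);
        len++;
    }
    if (arg[len] != '\0' && arg[len] != '=')
        return NULL;                         /* name too long for buffer */
    name[len] = '\0';
    if (arg[len] == '=')
        *value = arg + len + 1;              /* sbegin=25 style value */
    if (len == 0)
        return NULL;

    for (i = 0; i < nknown; i++)             /* exact name wins first */
        if (strcmp(known[i], name) == 0)
            return known[i];

    for (i = 0; i < nknown; i++) {           /* then a unique prefix */
        if (strncmp(known[i], name, len) == 0) {
            if (match != NULL)
                return NULL;                 /* ambiguous abbreviation */
            match = known[i];
        }
    }
    return match;
}
```

With the known names {"sequence", "sbegin", "send", "sformat"}, "-sbeg", "/SBEG=25" and "sbeg=25" all map to "sbegin", while "-se" is rejected as ambiguous.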
This part is intended as a simple guide to getting started as a developer using EMBOSS. EMBOSS is a new package, and can seem difficult at first. If you follow these steps you will find it can be easy.
We start by assuming you want to write a new application. You will need to write the application source code, which will use the EMBOSS libraries, and you also need to add the application to EMBOSS.
Strangely, you add the application to EMBOSS before you write any source code. This is because EMBOSS can use the Ajax Command Definition (ACD) file to test all the input for you, without any new source code.
ACD files live in the emboss/acd/ directory with the filename appname.acd ("appname" is the name of your application). You will find some files there already which you are welcome to use as templates for your own ACD file.
application: appname
Defines the application name. All ACD files start this way.
sequence: sequence [ parameter: Y ]
This asks for a sequence as the first parameter on the command line.
outfile: outfile [ parameter: Y ]
This asks for an output file name as the second parameter on the command line.
integer: weight [ ]
Allows an integer value to be specified by "-weight" on the command line. It will not be prompted for unless you add "standard:Y", in which case you should add "prompt: 'prompt for the user'" as well.
There are other things you can specify too. All values can have defaults provided in the ACD file and tests to make sure the values are reasonable. See the ACD documentation for more information.
Now for the cunning part. EMBOSS has an application called acdc which can pretend to be any other application. You put
% acdc appname
on the command line. It will read appname.acd and will read in any required data just as if the application itself was running. It will also test anything else you add on the command line and report syntax errors in exactly the same way as the real application.
When the ACD file is ready, which should not take long, you can start on the application code. This lives in emboss/appname.c and to start with you will simply call the startup routines and pick up the values you defined in the ACD file.
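#include "emboss.h"

int main(int argc, char *argv[])
{
    AjPSeq seq;         /* input sequence (ACD: sequence) */
    AjPFile outfile;    /* output file (ACD: outfile) */
    ajint iweight;      /* integer value (ACD: weight) */

    /* Read appname.acd and process the command line,
       prompting the user for anything still missing */
    embInit("appname", argc, argv);

    seq = ajAcdGetSeq("sequence");
    outfile = ajAcdGetOutfile("outfile");
    iweight = ajAcdGetInt("weight");

    /* ... application code goes here ... */

    /* Release the objects before exiting */
    ajFileClose(&outfile);
    ajSeqDel(&seq);

    ajExit();
    return 0;
}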
The "embInit" call is exactly what acdc was doing. It will read appname.acd and read everything you need from the command line or by prompting the user.
The next 3 lines pick up the sequence, output file and integer value in the suggested ACD file (you will have your own set of calls here).
Now you are ready to write your code. The sequence is in seq in an internal representation. You can use EMBOSS functions to work with this sequence, or convert it to a string and use the string functions, or convert it to a null-terminated C character string and use the C library functions and C pointers. It does not really matter which you choose.
All output will be written to outfile. You should use the EMBOSS output functions to do this, typically ajFmtPrintF which works just like C's "printf" except that it uses an AJAX file object (AjPFile) and has some extra format options like %S for AJAX strings (AjPStr).
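As an illustration of the third option above (working on a null-terminated C string with ordinary C code), here is a self-contained sketch that computes the percentage of G and C residues. percent_gc() is a hypothetical function; in an EMBOSS application the string would come from the AjPSeq object (recent releases provide ajSeqGetSeqC for this, but check the library version you build against), and the result could then be written with ajFmtPrintF.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical example: percentage of G and C among the letters of a
 * null-terminated sequence string. Non-letters such as gaps are
 * skipped. Returns 0.0 for a string with no letters. */
double percent_gc(const char *residues)
{
    size_t total = 0;
    size_t gc = 0;
    const char *p;

    for (p = residues; *p != '\0'; p++) {
        int c = toupper((unsigned char)*p);

        if (!isalpha(c))
            continue;           /* skip gaps, digits, whitespace */
        total++;
        if (c == 'G' || c == 'C')
            gc++;
    }
    return total ? 100.0 * (double)gc / (double)total : 0.0;
}
```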
You can add a new application without too much difficulty if you are using the full developers version from the CVS server. Go to the emboss directory (/emboss). Edit file Makefile.am and make two changes. Add appname to the list of applications in bin_PROGRAMS and add a new line:
appname_SOURCES = appname.c
Then go back up to the top directory (emboss/) and run ./configure, which will magically update the makefiles for you. You can then use make to build EMBOSS with your new application.
What next?
Time to write some real code for your application. Good luck and happy coding!
The following ACD file for a hypothetical application called ajtest tests the data types for both required and optional values. The application will prompt for one value of each data type, in the order in which they are defined, and will accept definitions of the optional data on the command line.
# AJTEST application
# AJAX COMMAND DEFINITION (ACD) FILE
# use "" for missing values - these are required.
# values in "" are trimmed to single spaces.
# everything is treated as single tokens delimited by white space
# (space, tab, newline)
# pmr 8-jul-98

application: ajtest [
  documentation: "Testing ACD files"
  groups: "Test"
]

boolean: reqbool [
  default: Y
  standard: Y
  information: "Required bool"
]

boolean: bool [
  default: N
  information: "Another bool"
]

integer: reqint [
  minimum: -50
  maximum: +50
  standard: Y
  information: "Number -50 to 50"
]

integer: int [
  minimum: -50
  maximum: +50
  information: "Enter a number -50 to 50"
  additional: y
]

float: reqfloat [
  minimum: -0.07
  maximum: 2.5
  standard: Y
  information: "Float to 2.5"
]

float: float [
  minimum: -7e-2
  maximum: 2.5
  information: "Float -0.07 to 2.5"
]

sequence: psequence [
  parameter: Y
]

outfile: outfile [
  default: stdout
  extension: "test"
  name: "ajtest"
  type: "text"
  standard: Y
]

sequence: qsequence [
  parameter: Y
]

string: reqstring [
  default: "abcd"
  standard: Y
  information: "rqstring"
  minlen: 4
]

string: string [
  default: "b"
  information: "string"
  minlen: 1
  maxlen: 50
]
The AJAX Command Definition, and the command line entered by the user, are processed automatically by a single start-up call to embInit. The same call also handles all prompting of the user for missing information.
To help in evaluating the ACD files, there is a special EMBOSS application acdc (the ACD compiler) which, when given the name of an ACD file as the first argument on the command line, will process that file and use the remainder of the command line. This causes acdc to behave exactly like any (possibly not yet written) application and makes it very easy to test how a particular application could be defined.
For example
% acdc ajtest
would use the example ACD file above and would prompt for each of the required data types:
% acdc ajtest
Testing ACD files
Required bool [Y] :
Number -50 to 50 [0] :
Float to 2.5 [0.0] :
First sequence : gcg::egmsmg.gcg
Output file [stdout] :
Second sequence : paamir.tfa
rqstring [abcd] : fred

% acdc ajtest -sask
Testing ACD files
Required bool [Y] :
Number -50 to 50 [0] :
Float to 2.5 [0.0] :
First sequence : gcg::egmsmg.gcg
Begin at base [1] :
End at base [1217] :
Reverse strand [N] :
Output file [stdout] :
Second sequence : paamir.tfa
Begin at base [1] :
End at base [1000] :
Reverse strand [N] :
rqstring [abcd] : fred

% acdc ajtest -noreqb -reqf=1.5 egmsmg.gcg -sformat gcg
Testing ACD files
Number -50 to 50 [0] :
Output file [stdout] :
Second sequence : paamir.tfa
rqstring [abcd] : fred

% acdc ajtest -reqst=xyz
Testing ACD files
Required bool [Y] :
Number -50 to 50 [0] :
Float to 2.5 [0.0] :
First sequence : gcg:egmsmg.gcg
Output file [stdout] :
Second sequence : paamir.tfa
Too short - minimum length is 4 characters
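The sessions above show two checks applied to each reply: an empty reply takes the default shown in brackets, and the value is tested against the ACD limits (minimum/maximum for integers, minlen for strings). The functions below are a hypothetical sketch of those checks, not EMBOSS code. For out-of-range integers the sketch assumes the value is adjusted to the nearest limit, as the description of EMBOSS_ACDWARNRANGE in Table 10 suggests.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch. Returns 0 if the reply is not an integer, 1 if
 * it is accepted as given (an empty reply takes dflt), or 2 if it was
 * out of range and adjusted to the nearest limit. */
int accept_int_reply(const char *reply, long dflt,
                     long minimum, long maximum, long *out)
{
    char *end;
    long n;

    if (reply[0] == '\0') {
        *out = dflt;                    /* "Number -50 to 50 [0] :" + Enter */
        return 1;
    }
    n = strtol(reply, &end, 10);
    if (end == reply || *end != '\0')
        return 0;
    if (n < minimum) { *out = minimum; return 2; }
    if (n > maximum) { *out = maximum; return 2; }
    *out = n;
    return 1;
}

/* Hypothetical sketch: an empty reply takes dflt; returns 0 when the
 * value is shorter than minlen ("Too short - minimum length is 4
 * characters" for reqstring), 1 otherwise. */
int accept_string_reply(const char *reply, const char *dflt,
                        size_t minlen, const char **out)
{
    const char *value = (reply[0] == '\0') ? dflt : reply;

    if (strlen(value) < minlen)
        return 0;
    *out = value;
    return 1;
}
```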
This documentation was originally written by Thon de Boer at the HGMP RC in Hinxton, UK, with input from Peter Rice at the Sanger Centre and Gary Williams at the HGMP RC. It is now maintained by Peter Rice at the European Bioinformatics Institute.
A web version can be found at http://emboss.sourceforge.net/developers/acd/
[1] Input (and output) will usually take place using files, but 'files' is used here in the broadest sense, since there are many different ways that the input can actually take place (via the USA method).
[2] The global qualifiers all default to 'false' at this moment, so there is no compelling reason to use this syntax.
Need to add qualifier_assocqualifier syntax
Need to add a lot more on features input and output and notes on features in reports
Check details for ranges - can we add some attributes e.g. forwardonly so range appears in the tables
Check the -nooutput option in the example(s)
acd error checks - list and example for each - and how to log and debug the details
cpdb, scop should all use name and extension attributes - makes more sense than just name. Only codon should have a default.
datafile - does extension get added to anything the user specifies? always, only if it has no extension?