EMBOSS: Project Meeting (Mar 15th 1999)
Peter proposed that the "parameter" attribute should change from an integer to a boolean. This would be the same as "required" and "optional", and would make sure that all parameters are defined in the correct order. As integers, it is possible to define (and prompt for) parameter 2 before parameter 1. This is inconsistent, and also makes the task of writing out help text much more difficult. The change was agreed.
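The effect of the agreed change can be illustrated with a minimal sketch. It is not ACD code: the `definitions` list and the `positional_order` helper are hypothetical names used only to show that, with a boolean attribute, the position of a parameter follows from the order of definition and cannot be declared out of sequence.

```python
# Hypothetical ACD-like definitions: (qualifier name, attributes).
# With a boolean "parameter" attribute there is no number to get wrong.
definitions = [
    ("sequence", {"parameter": True}),
    ("gapopen", {"parameter": False}),
    ("outseq", {"parameter": True}),
]


def positional_order(defs):
    """Return the parameter names; position is implied by definition order."""
    return [name for name, attrs in defs if attrs.get("parameter", False)]


print(positional_order(definitions))
```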
Peter has implemented, but not yet committed, the definition of a "documentation" attribute for applications. All applications should now have square brackets containing a "doc:" definition, which will be printed when the application starts if user interaction is allowed.
Peter has also implemented a "help" attribute as a default for all data types. This is intended as the help text when the application usage is printed. This will be turned on with "-help" on the command line.
Both these new texts need some way to provide simple formatting. Peter proposed implementing a plain backslash as a required newline for both "documentation" and "help". This can be extended later by giving some other meaning to other strings that start with a backslash. It also has the advantage of being relatively easy to read.
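As a rough illustration of the proposal, the sketch below expands backslashes into newlines. It assumes that every backslash in the text is a plain one (no extended backslash sequences exist yet) and that spaces around it can be dropped; `expand_breaks` is a hypothetical name, not an EMBOSS function.

```python
def expand_breaks(text):
    """Turn each backslash into a required newline.

    Assumption: every backslash is a "plain" one; spaces on either
    side of it are discarded so the lines start and end cleanly.
    """
    return "\n".join(part.strip() for part in text.split("\\"))


print(expand_breaks("Reads a sequence \\ Writes it in a new format"))
```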
The "-help" command line qualifier should convert the ACD definition into something similar to the current style for command line syntax in the application documentation. The format was discussed, and the consensus was for a layout as follows:
    % appname sequence outseq
    Mandatory arguments:
      [-sequence]     Default text or help text
      [-outseq]       Default text or help text
      -quala          Default text or help text
      -qualb          Default text or help text
    Optional arguments:
      -optqualc       Default text or help text
      -optquald       Default text or help text
    Advanced arguments:
      -hiddenquale    Default text or help text
      -hiddenqualf    Default text or help text
    Associated qualifiers:
      -sequence
        -sformat1     Default text
        -sprompt1     Default text
      -outseq
        -osformat1    Default text
        -sprompt1     Default text
This will probably need "-help" to take a string value with the help level needed, or a set of -help qualifiers.
Associated qualifier help text would be defined as an additional field within the ACD source definitions.
If more than one qualifier with the same associated qualifiers is used (e.g. two input sequences), the additional qualifiers will be summarized as a list of "-sformat2" etc. with the note "(see above)".
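The "(see above)" rule can be sketched as follows. This is only an illustration under assumptions: the `ASSOCIATED` table, its help strings and the `associated_help` function are hypothetical, and the real text would come from the ACD definitions.

```python
# Hypothetical table: data type -> associated qualifiers and help text.
ASSOCIATED = {
    "sequence": [("sformat", "Input sequence format"),
                 ("sprompt", "Prompt text")],
}


def associated_help(qualifiers):
    """Build the help lines for (qualifier name, data type) pairs.

    The first qualifier of a data type gets full help text; later ones
    of the same type are summarized as a list with "(see above)".
    """
    counts = {}
    lines = []
    for name, dtype in qualifiers:
        n = counts[dtype] = counts.get(dtype, 0) + 1
        lines.append("-" + name)
        if n == 1:
            for assoc, text in ASSOCIATED[dtype]:
                lines.append("  -{}{}  {}".format(assoc, n, text))
        else:
            names = " ".join("-{}{}".format(a, n) for a, _ in ASSOCIATED[dtype])
            lines.append("  {} (see above)".format(names))
    return lines


print("\n".join(associated_help([("asequence", "sequence"),
                                 ("bsequence", "sequence")])))
```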
There is a design problem with boolean qualifiers. After some discussion, it was proposed that:
Gary's utility programs define sequence regions as a series of start and end locations in a string. Peter proposed to make this a new "regions" data type so that standard syntax and validation could be applied.
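A minimal sketch of what standard syntax and validation for such a type might look like is given below. The input format is an assumption (positive integers separated by any non-digit characters, read as start/end pairs), and `parse_regions` is a hypothetical name.

```python
import re


def parse_regions(text, length=None):
    """Parse a string of start/end positions into (start, end) pairs.

    Assumptions: positions are 1-based integers separated by any
    non-digit characters, and each start is followed by its end.
    """
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if len(numbers) % 2:
        raise ValueError("regions must be given as start/end pairs")
    regions = list(zip(numbers[0::2], numbers[1::2]))
    for start, end in regions:
        if start < 1 or end < start:
            raise ValueError("bad region {}-{}".format(start, end))
        if length is not None and end > length:
            raise ValueError("region {}-{} is past the sequence end".format(start, end))
    return regions


print(parse_regions("4-57, 78-94"))
```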
The "transeq" program has its own list of translation tables. Peter proposed a new "translation" data type that would read the NCBI tables and use standard naming. There should also be an internal default of the standard genetic code which all applications can use. If alternative tables are allowed, this should be defined as "-translate" in the ACD file, usually as a hidden qualifier because only advanced users will need it.
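The idea of an internal default table can be sketched as below, using the standard genetic code (NCBI translation table 1) written in the usual TCAG order. The `translate` function and its handling of incomplete or ambiguous codons (reported as "X") are assumptions for illustration, not the behaviour of "transeq".

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(sequence, table=STANDARD_CODE):
    """Translate frame 1 of a nucleotide sequence with the given table.

    Assumption: codons not found in the table (ambiguity codes or a
    trailing partial codon) are written as "X".
    """
    seq = sequence.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return "".join(table.get(codon, "X") for codon in codons)


print(translate("ATGGCCTAA"))
```

An alternative table would be passed through the `table` argument, which corresponds loosely to the proposed hidden "-translate" qualifier.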
The ajposreg functions have been successfully tested against a set of several hundred test expressions. They automatically detect the number of substrings defined by a regular expression, and always save them unless the Nosub version of the compile function is used. There are also case insensitive versions and versions which can handle multiple lines within the search string.
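The same ideas can be seen in other regular expression libraries. The sketch below uses Python's standard `re` module, not ajposreg, purely to show what automatic substring counting and the case-insensitive and multi-line variants mean in practice; the patterns are invented for illustration.

```python
import re

# The compiled pattern knows how many substrings (groups) it defines.
entry = re.compile(r"(\w+):(\w+)-(\w+)")
print(entry.groups)

# Case-insensitive and multi-line matching are selected with flags.
line_match = re.compile(r"^acgt$", re.IGNORECASE | re.MULTILINE)
print(line_match.findall("ACGT\nacgt\nttt"))
```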
POSIX 1003.2 defines both extended regular expressions, which ajposreg implements, and basic (grep-style) regular expressions, which it does not cover. The flag that selects basic expressions is ignored, because the ajreg library should be faster and better for them.
The main reason to implement ajposreg is the support for "bounds" of repeat sizes, for example ".{2,5}" to match 2 to 5 characters. This can be useful in prosite motifs.
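To show why bounds matter for prosite motifs, the sketch below converts a simple PROSITE-style pattern into a regular expression with bounds. It uses Python's `re` module rather than ajposreg, handles only the basic elements (residues, "x", "[...]", "{...}" and repeat counts), and `prosite_to_regex` is a hypothetical helper.

```python
import re


def prosite_to_regex(motif):
    """Convert a simple PROSITE pattern to a regular expression.

    Handled: single residues, x (any residue), [ABC] sets, {ABC}
    exclusions and repeat counts such as x(3) or x(2,4). Anchors and
    other PROSITE features are left out of this sketch.
    """
    parts = []
    for element in motif.rstrip(".").split("-"):
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"
        if low and high:
            residue += "{%s,%s}" % (low, high)   # a bound, e.g. .{2,4}
        elif low:
            residue += "{%s}" % low
        parts.append(residue)
    return "".join(parts)


pattern = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]")
print(pattern)
```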
POSIX 1003.2 includes "bracket expressions" for character classes. These include [:lower:] for lower case, [=e=] for accented and unaccented 'e' characters, and [.eszet.] for special characters like the German "ss" character.
The library also supports back references where a backslash followed by a number is the nth substring already matched by the same regular expression.
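A back reference can be illustrated with a direct repeat search. The sketch uses Python's `re` module, where the syntax for this case is the same backslash-plus-number form; the pattern and the test sequence are invented for illustration.

```python
import re

# \1 must match exactly the text captured by the first substring,
# so this finds any three bases immediately repeated.
direct_repeat = re.compile(r"([ACGT]{3})\1")

hit = direct_repeat.search("TTACGACGTT")
print(hit.group(0), hit.group(1))
```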
Example code for using the library functions and the data objects will be included in this document, and can be cut and pasted from the HTML version. This should be easier than trying to include them in the source code.
On the other hand, it is easier to keep the main function documentation up to date by embedding it in the function headers of the source files, so these will remain the primary reference material for individual functions.
Graphics test applications are "treetypedisplay" for the line and text drawing functions and "histogramtest" for the histogram functions.
Some graphics programs are not using the graph data type in ACD files. This still needs to be merged with the graphics options implemented to date.
Rodrigo's build problems were solved. It was a local compiler problem.
Alan has added four new applications, "ant", "nab", "sig" and "cpg", which are replacements for the EGCG programs "antigenic", "helixturnhelix", "sigcleave" and "cpgreport" respectively.
There was discussion about application naming. Peter proposed posting this topic to the emboss mailing list and to bionet to get some user views.