EMBOSS: Project Meeting (Mon 3rd September 07)
Alan has committed the mira EMBASSY package for Bastien Chevreux's fragment assembly package. There are two applications, emira and emiraest, which are identical except for one ACD option that has a different default value. The author will test the EMBOSS version in the next few weeks. Meanwhile, a new version is expected to support 454 sequencing.
More test cases are needed for jaspscan to use the other 2 data directories. Peter will contact the JASPAR team for biologically relevant examples.
Peter has added GFF3 to the feature output formats. GFF3 will become the default feature output for the next release. GFF3 differs from GFF2 in the strict definition of feature types based on the Sequence Ontology, a change to the tag-value syntax to use '=' instead of spaces, restrictions on the allowed meta tags in file headers, and integration with FASTA format sequence data.
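The tag-value syntax change can be illustrated with a minimal Python sketch. It is not the EMBOSS implementation: the function names are hypothetical, it assumes GFF2 attributes of the form tag, whitespace, value (optionally double-quoted) separated by semicolons, and it covers only the syntax change, not the mapping of feature types to Sequence Ontology terms.

```python
# Characters percent-encoded here because they have a special meaning
# in the GFF3 attribute column.
RESERVED = ";=&,%\t\n\r"


def split_fields(column):
    """Split a GFF2 attribute column on ';', ignoring ';' inside double quotes."""
    fields, current, in_quotes = [], [], False
    for ch in column:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ";" and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return [f.strip() for f in fields if f.strip()]


def escape(value):
    """Percent-encode the reserved characters of a GFF3 attribute value."""
    return "".join("%{:02X}".format(ord(c)) if c in RESERVED else c for c in value)


def gff2_to_gff3_attributes(column):
    """Rewrite 'tag "value" ; tag value' as 'tag=value;tag=value'."""
    converted = []
    for field in split_fields(column):
        parts = field.split(None, 1)  # tag, then the rest of the field
        if len(parts) != 2:
            raise ValueError("attribute without a value: " + field)
        tag, value = parts
        value = value.strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]  # drop the surrounding GFF2 quotes
        converted.append(tag + "=" + escape(value))
    return ";".join(converted)


if __name__ == "__main__":
    print(gff2_to_gff3_attributes('ID "gene1" ; note "kinase; putative"'))
```

A semicolon inside a quoted GFF2 value has to be escaped on output, because in GFF3 an unescaped semicolon always separates attributes.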
Peter is rewriting the internals of feature handling to simplify the definition files for each feature format. As GFF3 (and the Sequence Ontology) now supports protein features defined through the BioSapiens project, SO terms will be used internally to define protein features (they have been used for nucleotide features since release 5.0.0). The current feature definitions use two files, Efeatures.format and Etags.format, to describe the feature types and the tag-value pairs used to annotate them. For release 5.0.0 these were based on the EMBL/GenBank feature table for nucleotides and the SwissProt feature table for proteins. For GFF2, GFF3 and internal types there were duplicated definitions with a few additions. This is being changed to allow "#include" statements to pull in the main EMBL definition, followed by additional modifying definitions for each specific format.
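The include-then-modify idea can be sketched in Python as follows. This is only an illustration: the line layout (a feature name followed by its definition), the file names, and the function name are all hypothetical and do not reflect the real Efeatures.format syntax. Only the "#include" keyword and the rule that later definitions modify included ones come from the description above.

```python
def load_definitions(name, files, _stack=()):
    """Return {feature: definition} for one definition file.

    `files` maps a (hypothetical) file name to its text. Lines of the form
    '#include "other"' are resolved first; later lines override any
    definition that was pulled in by an include.
    """
    if name in _stack:
        raise ValueError("circular #include of " + name)
    merged = {}
    for raw in files[name].splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if parts[0] == "#include" and len(parts) == 2:
            included = parts[1].strip().strip('"')
            merged.update(load_definitions(included, files, _stack + (name,)))
        elif line.startswith("#"):
            continue  # any other '#' line is treated as a comment here
        else:
            feature = parts[0]
            definition = parts[1].strip() if len(parts) == 2 else ""
            merged[feature] = definition  # overrides an included definition
    return merged


if __name__ == "__main__":
    # Hypothetical contents, for illustration only.
    files = {
        "main": "CDS base definition\ngene base definition\n",
        "specific": '#include "main"\nCDS format-specific definition\n',
    }
    print(load_definitions("specific", files))
```

With this layout, each format-specific file needs only the lines that differ from the main definition, which removes the duplicated definitions.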
The GFF3 changes will take a few more days to complete. Changing the default feature output format will change the results of more than 70 QA tests, each of which needs careful checking.
Alan requested a new facility in ACD to detect whether the user has changed a default value. Peter had added a function ajAcdIsUserdefined which reports whether a value has been set on the command line or in response to a prompt, but it also reports a change when the user has explicitly specified the default value (common with boolean qualifiers). It may be possible to extend this function to compare the final value with the stored default value. Alan will check for possible problems in detecting some command line settings.
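The difference between "set by the user" and "changed from the default" can be shown with a small Python sketch. The class and method names are hypothetical and this is not the ACD code: it only models the proposed comparison of the final value with the stored default.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Qualifier:
    """Hypothetical model of one qualifier after command line processing."""
    name: str
    default: Any
    user_value: Optional[Any] = None  # None means the user supplied nothing

    @property
    def value(self):
        """Final value: the user's reply if given, else the default."""
        return self.default if self.user_value is None else self.user_value

    def is_user_defined(self):
        """True if a value was given at all, even one equal to the default."""
        return self.user_value is not None

    def is_changed(self):
        """True only if the final value differs from the stored default."""
        return self.is_user_defined() and self.value != self.default


if __name__ == "__main__":
    explicit_default = Qualifier("plot", default=False, user_value=False)
    print(explicit_default.is_user_defined(), explicit_default.is_changed())
```

For a boolean qualifier given explicitly with its default value, the first test is true and the second is false, which is the case Alan wants to distinguish.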
Peter will modify acdpretty to preserve ACD comments. At present these are deleted when building pretty-formatted output. As we now record the line number of each ACD token when parsing, we can also record the location of comments. Whole-line comments are easily preserved. Formatting end-of-line comments is trickier, but should be possible by modifying the right margin of the pretty output.
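Recording comment locations by line number could look like the following Python sketch. It is not the acdpretty code and the function name is hypothetical. It assumes that an ACD comment starts at a '#' outside a double-quoted string and runs to the end of the line, and it does not handle quoted strings that span several lines.

```python
def find_comments(acd_text):
    """Return (line_number, kind, text) for each comment found.

    kind is 'whole-line' when only whitespace precedes the '#',
    otherwise 'end-of-line'.
    """
    comments = []
    for lineno, line in enumerate(acd_text.splitlines(), start=1):
        in_quotes = False
        for pos, ch in enumerate(line):
            if ch == '"':
                in_quotes = not in_quotes
            elif ch == "#" and not in_quotes:
                kind = "whole-line" if line[:pos].strip() == "" else "end-of-line"
                comments.append((lineno, kind, line[pos:].rstrip()))
                break  # the rest of the line belongs to the comment
    return comments


if __name__ == "__main__":
    sample = (
        "# Header comment\n"
        "application: demo [\n"
        '  documentation: "Uses # in text"  # trailing note\n'
        "]\n"
    )
    for entry in find_comments(sample):
        print(entry)
```

A formatter can re-emit whole-line comments before the token that follows them, while end-of-line comments need the line number of the token they share a line with, which is why they are the harder case.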
Peter will revise the processing of default values and user replies to allow a string to have a value containing only a space. This is currently removed from the default value during processing. Some ACD qualifiers for mira require spaces as default values.
Mahmut has been working on "DASGFF" XML output. This format is used by MYDAS DAS annotation servers including the BioSapiens project and EBI's UniProt DAS server. Peter will add any functions needed to identify specific feature classes (e.g. CDS), as these will be useful for updating other applications that have feature types hardcoded.
DAS2 can be considered in future when it becomes established.
Mahmut found a problem with the AppLab server which was fixed by running jmap. This suggests that the underlying fault was cleared as a side effect of starting memory monitoring, possibly locked ports that were freed by garbage collection, or threads waiting for a signal. The server will be updated to the latest Java 6.0 update on the next restart.
A user has produced an emacs mode for editing ACD files. Alan will contact him about possibly merging in the EMBOSS C style.
Alan has produced a fix for the primer3_core launch problem on Windows and is waiting for confirmation from the user who originally reported the problem. The suspected cause was a pipe locking issue in the operating system calls.
Jon has committed the latest revisions of the book text and templates.
Alan will add an extra column to Eamino.dat for monoisotopic residue weights and a boolean command line option to direct applications to use average or monoisotopic weights. Average values are used for gel digests. Monoisotopic values are required for mass spectrometry analysis.
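The effect of the two weight sets can be seen in a short Python sketch. It is illustrative only: the table holds standard residue masses for three amino acids rather than the contents or layout of Eamino.dat, and the function and parameter names are hypothetical.

```python
# Residue masses in daltons (amino acid minus water), illustrative subset.
RESIDUE_MASS = {
    "G": {"average": 57.0519, "monoisotopic": 57.02146},
    "A": {"average": 71.0788, "monoisotopic": 71.03711},
    "S": {"average": 87.0782, "monoisotopic": 87.03203},
}

# One water is added back for the free N- and C-termini.
WATER = {"average": 18.01528, "monoisotopic": 18.01056}


def peptide_mass(sequence, monoisotopic=False):
    """Sum residue masses plus one water, using the selected weight set."""
    kind = "monoisotopic" if monoisotopic else "average"
    return sum(RESIDUE_MASS[r][kind] for r in sequence.upper()) + WATER[kind]


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, round(peptide_mass("GAS", monoisotopic=flag), 5))
```

The boolean argument plays the role of the proposed command line option: the calculation is unchanged and only the column of weights read from the table differs.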
The next meeting is on Monday 17th September.