EMBOSS: Project Meeting (Mon 10th November 08)
Alan modified the density application to use one type of graph output. The two graph outputs caused problems in defining SoapLab services. Two other applications, dottup and dotmatcher, have the same issue. Each of these can be split into two separate applications.
Peter is repeating the profiled dbxflat full indexing of EMBL using the new local disk. The description index requires far longer to process than the other fields. Peter suggested adding field-specific values for the cachesize and perhaps also for the pagesize to the resource definition. Alan needs to check the index deletion code for possible side effects.
Peter has contacted the submitters of perl applications. He will convert each script into one or more EMBOSS applications and send to the original authors for review and testing. These applications can also become examples for developer documentation.
Peter has added an attribute "large" to the integer and float ACD datatypes. When this is set to "Y" the new functions ajAcdGetLong and ajAcdGetDouble can return ajlong and double values. The internal representation of integers is changed to ajlong, and the internal representation of float is changed to double. The minimum and maximum values default to the appropriate values for the data size.
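The size-dependent defaults described above can be sketched in plain C. This is an illustration only and not the EMBOSS code: the type and function names (IntLimits, FloatLimits, int_default_limits, float_default_limits, int_in_limits) are hypothetical, and the sketch assumes that ajlong corresponds to long long and that the non-large types correspond to int and float.

```c
#include <assert.h>
#include <float.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical limit holders; not EMBOSS types. */
typedef struct { long long min; long long max; } IntLimits;
typedef struct { double min; double max; } FloatLimits;

/* Default integer limits: full long long range when "large" is set,
 * otherwise the range of a plain int (assumed mapping). */
static IntLimits int_default_limits(bool large)
{
    IntLimits lim;
    lim.min = large ? LLONG_MIN : (long long) INT_MIN;
    lim.max = large ? LLONG_MAX : (long long) INT_MAX;
    return lim;
}

/* Default float limits: double range when "large" is set,
 * otherwise the range of a single-precision float. */
static FloatLimits float_default_limits(bool large)
{
    FloatLimits lim;
    lim.min = large ? -DBL_MAX : -(double) FLT_MAX;
    lim.max = large ?  DBL_MAX :  (double) FLT_MAX;
    return lim;
}

/* True when the value lies inside the limits (inclusive). */
static bool int_in_limits(long long value, IntLimits lim)
{
    assert(lim.min <= lim.max);
    return value >= lim.min && value <= lim.max;
}
```

With these defaults a value such as 3000000000 is rejected for an ordinary integer but accepted once "large" is set, on platforms where int is 32 bits.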
Peter has added all multiple alignment sequence formats to the list of valid alignment formats. The functions needed are trivial.
Peter has revised the parsing of PDB structure data, to avoid problems with missing records in some sample data files. Empty chain sequences are now ignored. PDB format can now process any protein structure entry. Model (NMR) structures append the model number to the chain ID. Nucleotide structure data can be read by formats "pdbnuc" and "pdbnucseq". By default protein sequence is read. We do not return both protein and nucleotide sequence data by default (e.g. for DNA binding protein structures) because mixed sequence types are generally not accepted by EMBOSS applications.
Peter has updated report processing to include statistics for the number of sequences and the number of bases/residues. These are counted in the AjPSeqall object and passed to the report in a call to ajReportSetSeqstats before the ajReportClose call writes the report tail. An extra line should be written to annotate output truncated by exceeding the maximum features per run.
Peter has added functions to record CPU time used by an application. The system function "clock" on Linux reports CPU time in clock ticks of one microsecond each, but only in 32 bits so the value overflows after roughly 30 minutes. The ajClock functions (assuming they are called at least once every 30 minutes) count the number of overflows and can report a true CPU time figure. The value is reported by the profiled version of dbxflat.
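The overflow counting can be sketched as follows. This is a minimal illustration and not the ajClock implementation: TickTracker and its functions are hypothetical names, and the raw reading is assumed to be handled as an unsigned 32-bit value (for example by casting the result of clock()). A wrap is detected when a new reading is smaller than the previous one, which is why the tracker must be updated at least once per wrap period.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracker for a tick counter that wraps at 32 bits. */
typedef struct {
    uint32_t last;   /* most recent raw 32-bit reading */
    uint64_t wraps;  /* number of overflows observed so far */
} TickTracker;

/* Start tracking from an initial raw reading. */
static void tick_tracker_init(TickTracker *t, uint32_t now)
{
    assert(t != NULL);
    t->last = now;
    t->wraps = 0;
}

/* Record a new raw reading and return the extended 64-bit tick count.
 * A reading smaller than the previous one is taken to mean one wrap. */
static uint64_t tick_tracker_update(TickTracker *t, uint32_t now)
{
    assert(t != NULL);
    if (now < t->last)
        t->wraps++;
    t->last = now;
    return (t->wraps << 32) | (uint64_t) now;
}

/* Convert a tick count to seconds for a given tick rate. */
static double ticks_to_seconds(uint64_t ticks, uint64_t ticks_per_sec)
{
    assert(ticks_per_sec > 0);
    return (double) ticks / (double) ticks_per_sec;
}
```

In real use the caller would pass something like (uint32_t) clock() to tick_tracker_update and CLOCKS_PER_SEC to ticks_to_seconds. If two or more wraps occur between updates, only one is counted, so the call interval matters.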
Peter noted that recent bugfixes broke the ability to turn off output files from the command line (for example the -noorigfile option in etandem). The option reset the default filename to "", which resulted in the default output file name being created. This is now fixed by adding an internal attribute for ACD types to note that the "no" prefix had been used.
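The idea behind the fix can be sketched as below. This is an illustration only, not the ACD code: OutfileOption and its functions are hypothetical, and the default file name used in the usage note is made up. The point is that an empty filename alone cannot distinguish "use the default" from "write no file", so a separate flag records that the "no" prefix was given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical output-file option; not an EMBOSS structure. */
typedef struct {
    const char *name;   /* qualifier name, e.g. "origfile" */
    const char *value;  /* user-supplied filename, or NULL */
    bool no_prefix;     /* set when -no<name> was given */
} OutfileOption;

/* Handle one command-line argument. Returns true if it was the
 * "-no<name>" form for this option, and records that fact. */
static bool outfile_parse_arg(OutfileOption *opt, const char *arg)
{
    assert(opt != NULL && arg != NULL);
    if (arg[0] != '-')
        return false;
    arg++;
    if (strncmp(arg, "no", 2) == 0 && strcmp(arg + 2, opt->name) == 0) {
        opt->no_prefix = true;
        opt->value = NULL;
        return true;
    }
    return false;
}

/* Resolve the filename to open. NULL means "write no file".
 * An empty or missing value falls back to the default only when
 * the "no" prefix was not used. */
static const char *outfile_resolve(const OutfileOption *opt,
                                   const char *default_name)
{
    assert(opt != NULL);
    if (opt->no_prefix)
        return NULL;
    if (opt->value != NULL && opt->value[0] != '\0')
        return opt->value;
    return default_name;
}
```

For an option named "origfile" with a hypothetical default of "etandem.orig", outfile_resolve returns the default until "-noorigfile" has been parsed, after which it returns NULL and no file is created.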
Mahmut presented a SoapLab tutorial at last week's EMBRACE/EMBnet workshop in Uppsala. The server configuration was updated to allow a larger number of users during the workshop. Nine of the participants have installed SoapLab2. Many have also created their own new services.
Mahmut will return to the issue of typed services with individual WSDL files, working with Shaun on common data type definitions.
Mahmut removed some SoapLab services for EMBOSS applications that are not part of the main release. These can be replaced if users require them.
Mahmut will compare SoapLab metadata with the application definitions used by the Galaxy browser. Peter noted that Galaxy is a preferred EMBOSS interface for future collaborative work.
The SoapLab server reported errors when the LSF log file was on a full disk. A null pointer exception has not been traced. No action can be taken unless it recurs.
Peter noted that the Open Bio Foundation has a committee meeting next week. Peter intends to join the teleconference.
Jon reported on the status of the books. The second edit of all three books is completed. The developer's guide needed the most work. Text bloat has been reduced and chapters rearranged. XML has been revalidated, links fixed and "todo" sections added to all files. A complete "todo" list will be created from these and reviewed.
Remaining tasks include some minor sections of code cleanup and writing overviews of the functions for each AJAX library module. Some distribution files need changes to add missing content, although much can be automatically generated (e.g. short descriptions for functions from the first paragraph of the full description). The FAQ list needs conversion to HTML and XML formats. Text is needed on DBX indexing examples in the administrator's guide. An overview is needed for writing wrappers (MIRA-style) to third party applications.
Sequence format names have been revised in the documentation, and need to be checked for consistency.
Automated text needs to be generated (the "todo" list includes the set of missing sections).
The text licensing needs to be carefully worded from the contracts and included in the book text.
Tables of contents and indexes need to be generated. CUP will be contacted for guidance on the information needed for indexing.
Work is needed on the stylesheets for the various format conversions: for the website, for PDF and for the publishers.
Peter noted that SourceForge now allows interactive shell login again, and has been able to update the website from CVS. The file modifications needed to fix the access problems caused by SourceForge's recent changes have been copied back and committed to CVS.
Jon has completed modifications to the application documentation templates. These will be copied to the website when Peter next applies documentation updates (after all tests run cleanly with the report format and other recent changes).