EMBOSS: Project Meeting (Mon 7th June 10)
Mahmut tested supermatcher with 10k short reads against 32 Illumina adaptor sequences. After implementing the Rabin-Karp algorithm the run was twice as fast. Some further improvement may be possible.
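For reference, the rolling-hash idea behind Rabin-Karp can be sketched as below. This is a generic single-pattern illustration in Python, not the supermatcher implementation; the function name, hash base, modulus, and the read and adaptor strings in the usage lines are all assumptions made for the example.

```python
def rabin_karp_find(text, pattern, base=256, modulus=1_000_000_007):
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in a window of length m.
    high = pow(base, m - 1, modulus)

    # Hash the pattern and the first window of the text.
    p_hash = 0
    w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % modulus
        w_hash = (w_hash * base + ord(text[i])) % modulus

    hits = []
    for start in range(n - m + 1):
        # Compare characters only when the hashes agree (guards against collisions).
        if w_hash == p_hash and text[start:start + m] == pattern:
            hits.append(start)
        if start < n - m:
            # Roll the window: drop text[start], append text[start + m].
            w_hash = (w_hash - ord(text[start]) * high) % modulus
            w_hash = (w_hash * base + ord(text[start + m])) % modulus
    return hits


# Illustrative strings only: a made-up read containing a made-up adaptor.
read = "TTGACAGATCGGAAGAGCACAC"
adaptor = "AGATCGGAAGAGC"
positions = rabin_karp_find(read, adaptor)
```

The speed-up comes from updating the window hash in constant time per position, so a full character comparison is only needed when the hashes match.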
Peter has reviewed all the EFUNC and EDATA messages. acdrelation has been updated. The main program is moved to the top of the source file, and functions are made static with a program name prefix.
Peter will update acdvalid to check the knowntype against the EDAM term in the knowntypes.standard file.
White space before functions is cleaned and fixed at 4 lines. Any ifdef blocks now start before the function documentation. "Fixme" comments are moved into the top of the function source code.
Messages from the Ensembl library code are cleaned except for those from the namrule, argrule and valrule definitions. These require function names to be standardized, functions to be sorted alphabetically, and standard naming then defined to fit the new names.
Mahmut found an issue with the name calculated attribute where the first input was a seqall. There is also an issue with capitalisation of default file naming when the first input is a seqset. Peter will investigate.
Alan cleaned up some code in ajobo and ajtax for mEMBOSS builds.
Peter noted that there is a need to test sharing of binary files on big-endian systems. Alan will identify a suitable test system to check code that has endian tests.
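The kind of problem an endian test guards against can be sketched as follows. The two-integer record layout and the function names are hypothetical and are not the EMBOSS binary file format; the sketch only shows that a file written with an explicit byte order reads back identically on any host, while reading it with the opposite byte order gives wrong values.

```python
import struct
import sys


def pack_entry(offset, length):
    """Serialise two unsigned 32-bit integers in explicit big-endian order."""
    return struct.pack(">II", offset, length)


def unpack_entry(data):
    """Read the entry back with the same explicit byte order."""
    return struct.unpack(">II", data)


def unpack_entry_wrong_order(data):
    """Read the same bytes as little-endian, as a mismatched host would."""
    return struct.unpack("<II", data)


entry = pack_entry(1, 256)
host_order = sys.byteorder            # "little" or "big", depending on the test system
good = unpack_entry(entry)            # (1, 256) on every host
bad = unpack_entry_wrong_order(entry)  # byte-swapped values
```

A file shared between systems is safe when the byte order is fixed in the format rather than taken from the host, which is what a test on a big-endian machine would confirm.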
Alan described BioMart's rules for ordering of results. A sequence attribute is always reported first, followed by the first filter term which in current EMBOSS use is the identifier. BioMart format can be altered to expect the sequence and identifier as the first two fields in the tab-delimited record.
Alan could also add a column header record by sending one extra attribute query to the BioMart server.
There are cases where 4 sequences are returned, but each with the same identifier. As this is how BioMart works, we should simply use the BioMart duplicated identifiers for now.
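A minimal sketch of reading such a result is given below, assuming the layout described above: tab-delimited records with the sequence in the first field and the identifier in the second, plus an optional header record. The function names and the identifiers in the sample text are made up for illustration. Duplicated identifiers are kept as returned rather than merged.

```python
from collections import defaultdict


def parse_records(text, has_header=False):
    """Return (identifier, sequence) pairs from tab-delimited text."""
    lines = [line for line in text.splitlines() if line.strip()]
    if has_header and lines:
        lines = lines[1:]  # skip the column header record
    records = []
    for line in lines:
        fields = line.split("\t")
        sequence, identifier = fields[0], fields[1]
        records.append((identifier, sequence))
    return records


def group_by_identifier(records):
    """Collect sequences per identifier, preserving duplicates in order."""
    grouped = defaultdict(list)
    for identifier, sequence in records:
        grouped[identifier].append(sequence)
    return dict(grouped)


# Made-up sample: two records share the identifier ID1.
sample = "ACGT\tID1\nTTTT\tID1\nGGGG\tID2\n"
records = parse_records(sample)
grouped = group_by_identifier(records)
```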
Issues of BioCatalogue support for SAWSDL were discussed. BioCatalogue could, for example, use widgets to suggest annotations based on EDAM terms. BioCatalogue categories could be merged with the EDAM topics branch.
BioCatalogue makes use of social tagging, as did BioMoby. The tags are a source of new terms, and could be cleaned up by annotating them with EDAM terms and merging spelling variants.
BioCatalogue currently adds extra annotation to services. This could be exported as suggested WSDL annotations and offered to the service providers.
EDAM could also be used in rank ordering services.
SoapLab was also discussed. Some of the attendees using SoapLab are interested in access to the latest WSDL annotations and ACD relations. Mahmut will check on the possibility of a SoapLab release.
There was discussion, but no conclusion, on the best tools to support annotated WSDLs. It would be useful to maintain a list of tools that have been tested and found to work or fail for some reason.
Batch PURL submission failed halfway through. There were also cases where "tombstoned" terms remained available, and vice versa. The support was not very helpful. Jon will look at alternatives, e.g. OBO Foundry compliant URLs, though these are not yet standard. The NCBO portal could create these URLs. We need to check whether there would be support for alternative formats (OBO, HTML, etc.).
Jon is waiting for a reply from the Ontology Lookup Service team.
Several reviewers have been asked to look through the EDAM terms in their areas. More are needed.
Terms for specific databases and ontology names have been removed from EDAM to make navigation simpler.
Terms were added to cover the services provided by workshop attendees.
BAM file input appeared efficient. Benchmarks against other packages need to be run.
BAM output is needed in time for the release. SAM output format was partly implemented for the last release and will also be checked.
Future plans for BAM and SAM formats include the reading of reference alignments. This will probably be after the release.
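As a reminder of what SAM output involves, an alignment line has 11 mandatory tab-separated fields: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ and QUAL. The sketch below formats one such line in Python. It is not the EMBOSS output code; the function name and the defaults (which describe an unmapped read) are assumptions chosen for the example.

```python
def format_sam_record(qname, seq, qual, flag=4, rname="*", pos=0,
                      mapq=0, cigar="*", rnext="*", pnext=0, tlen=0):
    """Return one SAM alignment line; the defaults describe an unmapped read."""
    if qual != "*" and len(qual) != len(seq):
        raise ValueError("QUAL must be '*' or the same length as SEQ")
    fields = [qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual]
    return "\t".join(str(field) for field in fields)


# Unmapped read, then a read aligned at position 7 of a hypothetical reference "ref1".
unmapped = format_sam_record("read1", "ACGT", "IIII")
mapped = format_sam_record("read2", "ACGT", "IIII", flag=0, rname="ref1",
                           pos=7, mapq=30, cigar="4M")
```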
Jon proposed a generator to use the databases list to create query tools for selected data resources. One example would be to return expression data from relevant resources.
Peter will write a parser for the databases list.
Fedora 13 now includes EMBOSS. A test installation could be added on emboss2.
Alan has added EDAM to the Open-Bio CVS.