EMBOSS: Project Meeting (Monday 18th Oct 2010)

Attendees

EBI: Peter Rice, Alan Bleasby, Jon Ison, Michael Schuster
Visitors:
Apologies: Mahmut Uludag

1. Minutes of the last meeting

Minutes of the meeting of 11th October 2010 are available.

2. Maintenance etc.

2.1 Applications

2.2 Libraries

Peter has renamed the "gff" feature definition files to "gff2", because the default GFF format is now "gff3".

2.3 Other

Alan has updated the configuration files to support large files. We need to announce the new system-specific configurations and get feedback from developers.
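The minutes do not say how the large-file support was implemented. As a hedged illustration only: on POSIX systems, large-file support usually means compiling with `_FILE_OFFSET_BITS` set to 64 (which Autoconf's `AC_SYS_LARGEFILE` macro can arrange) so that `off_t` is 64 bits, and using `fseeko`/`ftello` instead of `fseek`/`ftell`, whose `long` offsets overflow at 2 GiB on 32-bit systems. The helper `stream_size` is a hypothetical name, not an EMBOSS function.

```c
/* Feature macros must be defined before any system header is included. */
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64

#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: size of an open stream in bytes, or -1 on error.
   fseeko/ftello use off_t, which is 64 bits when _FILE_OFFSET_BITS is 64,
   so sizes above 2 GiB are not truncated. The stream position is restored. */
static int64_t stream_size(FILE *fp)
{
    off_t here = ftello(fp);
    if (here < 0)
        return -1;
    if (fseeko(fp, 0, SEEK_END) != 0)
        return -1;
    off_t end = ftello(fp);
    if (fseeko(fp, here, SEEK_SET) != 0)
        return -1;
    return (int64_t) end;
}
```

Developers testing the new system-specific configurations could also check that `sizeof(off_t)` is 8 on each target platform.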

3. New developments

3.1 Axis2C

Mahmut is using the AJAX ajdom code for document parsing. It may be possible to enrich some methods, for example to process lists of features in DASGFF.
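The minutes do not specify how `ajdom` would be extended. As a sketch of the kind of helper that could make feature lists easier to process, the code below collects every element with a given name from a DOM tree in document order, similar in spirit to the DOM `getElementsByTagName` method. The `Node` structure and `collect_by_tag` function are hypothetical and do not reflect the real `ajdom` types; the tree shape assumed here follows the DAS 1.x features layout (`DASGFF`, `GFF`, `SEGMENT`, `FEATURE`).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal DOM node (NOT the ajdom structure): an element
   name plus first-child / next-sibling links. */
typedef struct Node {
    const char *name;
    struct Node *first_child;
    struct Node *next_sibling;
} Node;

/* Walk the sibling chain starting at `first` and all of its descendants
   in document order, storing up to `max` elements named `tag` in `out`.
   Returns the total number of matches, which may exceed `max`. */
static size_t collect_by_tag(const Node *first, const char *tag,
                             const Node **out, size_t max)
{
    size_t found = 0;

    for (const Node *n = first; n != NULL; n = n->next_sibling) {
        if (strcmp(n->name, tag) == 0) {
            if (found < max)
                out[found] = n;
            found++;
        }
        /* Children follow their parent in document order. */
        size_t used = found < max ? found : max;
        found += collect_by_tag(n->first_child, tag, out + used, max - used);
    }
    return found;
}
```

Called on the root `DASGFF` element with the tag `"FEATURE"`, it fills `out` with the features in file order; because the return value counts every match, a caller can tell when its buffer was too small.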

An alternative for SOAP results is the AXIOM (AXis Object Model) XML library, which provides an easy way to retrieve values deep within the XML structure. Mahmut reported that Axis2 has a very granular architecture, with many libraries required even for a simple application.

The GSOAP library requires 11 stub functions to create even a simple interface. The Axis2 interface is more natural and avoids stub code.

Alan has received no reply from the GSOAP author on licensing issues. Axis2 raises no licensing problems, as there is no need to import its code. Axis2 is a more recent project from the Apache Software Foundation and is considered the more reliable choice.

Mahmut noted that Jemboss uses the Java version of Axis 1.

3.2 Data access methods

Alan is looking at alternative indexing methods. So far, B+ trees look to be the best choice for identifiers and accessions, although long keys are problematic. There are also compressed free-text indexing methods used in bibliographic searching: the indexes are compressed and fast, and with Huffman encoding they are approximately 50% of the size of the original source text. The MG (Managing Gigabytes) system has been used for image searches.
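To illustrate where Huffman compression comes from, the sketch below computes how many bits a text would occupy under a Huffman code built from its own symbol frequencies. It uses single bytes as symbols for simplicity; the compressed indexes discussed may model text differently (for example, as words), so its result is not a prediction of the 50% index size. `huffman_bits` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: total bits needed to encode `text` with a Huffman
   code built from its byte frequencies (the code table itself is not
   counted). Uses a simple O(n^2) merge over at most 256 leaves. */
static uint64_t huffman_bits(const unsigned char *text, size_t len)
{
    uint64_t weight[511] = {0}; /* 256 leaves + up to 255 internal nodes */
    int parent[511];
    int active[511];
    size_t nodes = 256;
    int distinct = 0;

    for (size_t i = 0; i < len; i++)
        weight[text[i]]++;
    for (size_t s = 0; s < 511; s++) {
        parent[s] = -1;
        active[s] = 0;
    }
    for (size_t s = 0; s < 256; s++)
        if (weight[s] > 0) {
            active[s] = 1;
            distinct++;
        }

    if (distinct == 0)
        return 0;
    if (distinct == 1)
        return (uint64_t) len; /* convention: one symbol costs 1 bit */

    /* Repeatedly merge the two lightest active nodes into a new parent. */
    for (int merges = 0; merges < distinct - 1; merges++) {
        int a = -1, b = -1; /* a = lightest, b = second lightest */
        for (size_t s = 0; s < nodes; s++) {
            if (!active[s])
                continue;
            if (a < 0 || weight[s] < weight[a]) {
                b = a;
                a = (int) s;
            } else if (b < 0 || weight[s] < weight[b]) {
                b = (int) s;
            }
        }
        weight[nodes] = weight[a] + weight[b];
        active[nodes] = 1;
        active[a] = active[b] = 0;
        parent[a] = parent[b] = (int) nodes;
        nodes++;
    }

    /* Code length of a leaf = its depth in the tree. */
    uint64_t bits = 0;
    for (int s = 0; s < 256; s++) {
        if (weight[s] == 0)
            continue;
        uint64_t depth = 0;
        for (int p = parent[s]; p >= 0; p = parent[p])
            depth++;
        bits += weight[s] * depth;
    }
    return bits;
}
```

For the input `aaaabbc` (7 bytes, 56 bits uncompressed) the code lengths are 1 bit for `a` and 2 bits each for `b` and `c`, giving 10 bits in total.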

3.3 EDAM

Jon noted a request from Matus to add BioXSD as an input/output format for sequences and features.

3.4 Data types

Peter has split the ajfeat code into data, read and write source files as for other data types.

For text data, each input line is stored as a string and copied unchanged on output. Parsers can be added to handle HTML or XML markup.
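A minimal sketch of this line-based approach, assuming POSIX `getline`: each line, including its newline, is kept as its own string, and output writes the strings back in order, so the text round-trips unchanged. The `TextRecord` type and the `text_read`, `text_write` and `text_free` functions are hypothetical, not the EMBOSS text data API.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical text record: one heap-allocated string per input line. */
typedef struct {
    char **lines;
    size_t count;
} TextRecord;

/* Store every line of `in` verbatim (newline included).
   Returns 0 on success, -1 if memory runs out (call text_free either way). */
static int text_read(FILE *in, TextRecord *rec)
{
    char *buf = NULL;
    size_t cap = 0;

    rec->lines = NULL;
    rec->count = 0;
    while (getline(&buf, &cap, in) != -1) {
        char **grown = realloc(rec->lines, (rec->count + 1) * sizeof *grown);
        if (grown == NULL) {
            free(buf);
            return -1;
        }
        rec->lines = grown;
        /* Hand the getline buffer to the record; getline allocates a new one. */
        rec->lines[rec->count++] = buf;
        buf = NULL;
        cap = 0;
    }
    free(buf);
    return 0;
}

/* Copy the stored lines to `out` unchanged. Returns 0, or -1 on write error. */
static int text_write(FILE *out, const TextRecord *rec)
{
    for (size_t i = 0; i < rec->count; i++) {
        size_t n = strlen(rec->lines[i]);
        if (fwrite(rec->lines[i], 1, n, out) != n)
            return -1;
    }
    return 0;
}

static void text_free(TextRecord *rec)
{
    for (size_t i = 0; i < rec->count; i++)
        free(rec->lines[i]);
    free(rec->lines);
    rec->lines = NULL;
    rec->count = 0;
}
```

Because the lines are stored untouched, a later HTML or XML parser could work from the same array without changing what is written back.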

4. Administration

The file server is online again. The temporary user home directories have been removed.

5. Documentation and training

5.1 Books

Alan has essentially completed the replies to the copy editor for the Administrators' Guide.

Jon is looking into indexing for the books. We can create an index of terms, but we also need to identify the final page numbers. We need further discussion with the publishers on the best way to send the index data.

6. User queries and answers

All done.

7. AOB

None.

8. Date Of Next Meeting

The next EMBOSS meeting will be on Monday 25th October. Peter will be away.