EMBOSS: Project Meeting (Fri 22nd March 2002)


Attendees

HGMP: Alan Bleasby, Tim Carver, Hugh Morgan, Claude Beazley, Gary Williams, Ranjeeva Ranasinghe, Waqas Awan, Jon Ison
Lion: Peter Rice, Thomas Laurent, Bijay Jassal
Sanger:
EBI:
Apologies:

1. Matters Arising

Alan has been working on the setuid part of Jemboss with Tim.

Tim reported that Jemboss works interactively on Solaris and 4 other operating systems. He is continuing to test his Jemboss server.

Hugh reported that his XML output interface (ajxml.c/h) is now usable. He is working on an output GUI and is exploring the use of abstract windows interface native toolkit, although other options are possible.

Claude is converting his CORBA server so that it can run with orbit2 (there were problems with orbit1). He is tidying his code and working on threading code to allow multiple users of the server.

Gary has embossified (implemented from scratch) Bill Pearson's mrtrans program. The application ('tranalign') takes a set of nucleic acid and protein sequences and aligns the nucleic acid sequences (by gap insertion) to the protein sequences. It has not been committed yet.

Waqas has been debugging funky (application for the generation of a database of functional sites in protein structures from residue contact information).

Jon is continuing with his overhaul of the protein structure applications, including (i) purification of code, (ii) testing of applications and EMBOSS test data, (iii) building databases and ironing out resultant bugs, (iv) documentation of applications and library code, (v) a manual on how to use the applications together.

Thomas is working on parsers and viewers for graphics output in SRS.

Peter has worked through bug reports and the query and access methods:

NBRF sequence reading trims last character only if it is '*' to catch cases where SRS reports the sequence as 'plain'
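
The trimming rule above can be sketched as follows. This is an illustration in Python only, not the EMBOSS library code; the function name is hypothetical.

```python
def trim_nbrf_terminator(sequence: str) -> str:
    """Drop the final character only when it is the NBRF '*' terminator.

    A sequence reported as 'plain' has no terminator, so its last
    residue must be left alone.
    """
    if sequence.endswith("*"):
        return sequence[:-1]
    return sequence
```

With this rule, "MKV*" becomes "MKV", while a plain "MKV" is returned unchanged.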

GCG database text has the spaces in ". ." strings removed.

Database entry text and sequence saved for binary formats (GCG, BLAST) for use by entret and other applications

dbiblast indices with split databases (formatdb -v) fixed for reading all entries (was only reading the first file)

dbiblast and dbigcg indices support exclude and file definitions to create database subsets

Database include and file definitions can use the simple filename. In some cases the full path was used. Database files are checked both with and without the directory path for back-compatibility.
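
A minimal sketch of the back-compatible check, in Python for illustration. It assumes the definitions are shell-style wildcard patterns and POSIX-style paths; the function name and the use of fnmatch are assumptions, not the actual EMBOSS implementation.

```python
import fnmatch
import posixpath


def file_matches(path, patterns):
    """Return True if the file matches any pattern, tried both with
    the directory path and with the simple filename alone."""
    simple_name = posixpath.basename(path)
    for pattern in patterns:
        # Older definitions may hold the full path, newer ones the simple name.
        if fnmatch.fnmatchcase(path, pattern):
            return True
        if fnmatch.fnmatchcase(simple_name, pattern):
            return True
    return False
```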

srswww access method created to query a remote web server. Preferred to using URL access as SRS queries can be built

Sequence objects include the SeqVersion, Keyword list and Taxonomy list.

SeqVersion (EMBL SV line, GenBank VERSION line) is used in preference to accession number where available. Can also be read in FASTA and NCBI formats. Where only the SeqVersion is available, the accession number is generated.
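
Generating the accession number from a SeqVersion can be sketched as below. This Python illustration assumes the SeqVersion has the usual "accession.version" form (for example X12345.1); the function name is hypothetical.

```python
def accession_from_seqversion(seqversion: str) -> str:
    """Return the accession part of an 'accession.version' string.

    If there is no '.', the whole string is treated as the accession.
    """
    accession, _dot, _version = seqversion.partition(".")
    return accession
```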

USA queries implement searches by SV, DES, ORG and KEY. These work with SRS access methods (SRS, SRSFASTA, SRSWWW) by building SRS queries, and with direct access (simple file reading) by testing the sequence object.

Key and Org queries are for full keywords (including spaces) and for each level of the taxonomy.

Des queries, if the access method does not provide its own mechanism (that is, it has no index of its own), are applied to words within the description. Words start with a letter or number, and end with a letter or number. SRS typically does the same, but allows a single quote at the end. This catches words such as 3' and 5' but is a problem with some quoted text.
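
The word rule, and the difference from the SRS behaviour, can be sketched as follows. This is a Python illustration under stated assumptions: tokens are split on whitespace and only ASCII letters and digits count as word boundaries. The function name and the allow_trailing_quote parameter are hypothetical.

```python
import re

_LEADING = re.compile(r"^[^A-Za-z0-9]+")
_TRAILING = re.compile(r"[^A-Za-z0-9]+$")


def description_words(description, allow_trailing_quote=False):
    """Split a description into words that start and end with a
    letter or number. With allow_trailing_quote=True, one single
    quote directly after the word is kept (SRS-like behaviour)."""
    words = []
    for token in description.split():
        head = _LEADING.sub("", token)   # strip leading punctuation
        core = _TRAILING.sub("", head)   # strip trailing punctuation
        if not core:
            continue
        if allow_trailing_quote and head[len(core):].startswith("'"):
            core += "'"
        words.append(core)
    return words
```

For the text "3' end of the 'lacI' gene", the strict rule yields the words 3 and lacI, whereas the SRS-like rule yields 3' but also lacI', which is the quoted-text problem noted above.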

Queries for ID ACC SV DES ORG and KEY are valid for all file access methods, including URL, external, cmd, app, file and by default any new method added. If the internal query data is not flagged by the access method (to show the database has been queried) the sequence object is automatically tested.

Missing description, keyword, organism, or seqversion fields cause queries to fail if they are used on inappropriate data.

Known problem: Queries of dbiflat (etc.) databases for SV DES ORG and KEY will need new index files.

New database definition token 'fields' with a list of indexed fields can be set to 'sv des org key' for SRS databases.

USAs check the query field against the database 'fields' definition. ID and ACC are always allowed. dbname:name still searches ID and ACC (no change from previous version)
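
The field check can be sketched as below. This Python illustration assumes the 'fields' definition is a space-separated list compared case-insensitively; the names are hypothetical.

```python
ALWAYS_ALLOWED = {"id", "acc"}


def field_is_queryable(field, fields_definition):
    """Return True if a USA query field may be used with a database
    whose 'fields' definition is the given space-separated string."""
    field = field.lower()
    if field in ALWAYS_ALLOWED:
        # ID and ACC never need to be listed.
        return True
    return field in fields_definition.lower().split()
```

For a database defined with fields 'sv des org key', a des query is accepted; with an empty definition only id and acc are accepted.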

USAs with a filename can include the new query fields. The syntax is filename:field:query for example empro.dat:id:eclaci (the extended syntax is because empro.dat-id:eclaci looks like a filename ending in -id)
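
Splitting the extended syntax can be sketched as follows. This Python illustration only handles the filename:field:query form; it ignores other USA features (such as format prefixes) and filenames that themselves contain colons. The function name is hypothetical.

```python
QUERY_FIELDS = {"id", "acc", "sv", "des", "org", "key"}


def split_file_usa(usa):
    """Split 'filename:field:query' into its three parts.

    Returns (usa, None, None) when the string does not carry a
    recognised query field, so it is treated as a plain filename.
    """
    parts = usa.rsplit(":", 2)
    if len(parts) == 3 and parts[1].lower() in QUERY_FIELDS:
        filename, field, query = parts
        return filename, field.lower(), query
    return usa, None, None
```

Here empro.dat:id:eclaci splits into the file empro.dat, the field id and the query eclaci, while empro.dat-id is left as a plain filename.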

2. A.O.B.

3. Date Of Next Meeting

Next meeting to be held at 10.00 on Friday 5th April, HGMP