EMBOSS: C2 Output Formats


Any supported format can be used for output. Some formats simply write each sequence as it arrives, so nothing is lost if the output file is only closed at the end of the program. Others, e.g. MSF, store multiple sequences and write them only when the file is closed by a FileClose call.

This means that all applications should call ajSeqWriteClose to make sure that sequence output is flushed.
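The difference between the two kinds of format can be illustrated with a minimal Python sketch. It is not EMBOSS code: the `SeqWriter` class, its method names, the choice of "fasta" as the simple format and the simplified record layouts are all hypothetical, chosen only to show why an explicit close is needed before stored sequences reach the file.

```python
import io


class SeqWriter:
    """Toy sequence writer; all names are hypothetical, not the EMBOSS API."""

    # Formats assumed (for this sketch) to need every sequence before writing.
    BUFFERED_FORMATS = {"msf"}

    def __init__(self, stream, fmt):
        self.stream = stream
        self.fmt = fmt
        self.pending = []  # sequences held back until close()

    def write(self, seq_id, residues):
        if self.fmt in self.BUFFERED_FORMATS:
            # Multiple-sequence format: store now, write at close.
            self.pending.append((seq_id, residues))
        else:
            # Simple format: each sequence is written immediately.
            self.stream.write(f">{seq_id}\n{residues}\n")

    def close(self):
        # Flush anything still held in memory (simplified layout, not real MSF).
        for seq_id, residues in self.pending:
            self.stream.write(f"{seq_id}  {residues}\n")
        self.pending.clear()


simple_out, multi_out = io.StringIO(), io.StringIO()
simple = SeqWriter(simple_out, "fasta")
multi = SeqWriter(multi_out, "msf")
for writer in (simple, multi):
    writer.write("seq1", "ACGT")
    writer.write("seq2", "TTGA")

before_close = multi_out.getvalue()  # captured before close() is called
simple.close()
multi.close()
```

In this sketch, skipping `close()` on the "msf" writer would leave its output empty, which is the failure the close call is meant to prevent.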

Simple output formats:

Multiple output formats:

The aim is to include all formats that EMBOSS accepts for input plus some formats intended for output only.

Issues:

  1. Some formats require data which may be unavailable. For example, TEXT input from standard input has no ID, but EMBL output requires one. The ID could take a default value, or could be provided on the command line as "-sentry" for use on output.
  2. The default "id:" can be set to EMBOSS or EMBOSS_00n for output formats that need one.
  3. The default accession number can be any deleted or never-used accession number, to be agreed with the EBI. Obvious candidates are M12345 and X00000. It is not clear whether any format absolutely requires an accession number.
  4. The "id" can be specified in an output USA. There is currently no way to specify an accession number this way, unless we adopt the NCBI syntax of "id|accnum", but this seems more trouble than it is worth.
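
The ID defaulting discussed in the list above could be sketched as follows in Python. The function name `output_id`, the order of precedence (user-supplied ID first, then the input's own ID, then the default), the counter argument and the three-digit zero padding are assumptions made for illustration; only the `EMBOSS_00n` pattern and the idea of a user-supplied ID come from the text.

```python
def output_id(seq_id, user_id=None, counter=1):
    """Pick the ID to write on output.

    Hypothetical helper, not part of EMBOSS. The precedence and the
    zero-padded width of 3 are assumptions.
    """
    if user_id:   # e.g. a value supplied by the user for use on output
        return user_id
    if seq_id:    # ID already present in the input
        return seq_id
    return f"EMBOSS_{counter:03d}"  # default for formats that need an ID


# Two sequences read without IDs receive numbered defaults.
ids = [output_id(None, counter=n) for n in (1, 2)]
```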

Other points