consambig

 

Wiki

The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

Please help by correcting and extending the Wiki pages.

Function

Create an ambiguous consensus sequence from a multiple alignment

Description

cons calculates a consensus sequence from a multiple sequence alignment. To obtain the consensus, the amino acid residue or nucleotide at each position is compared to the possible ambiguity codes. The consensus sequence uses the minimum ambiguity code match.

Algorithm

The ambiguity codes are defined in local data files. consambig uses these files to determine which codes match all residues or nucleotides in the input sequences at each position.

Usage

Here is a sample session with consambig


% consambig 
Create an ambiguous consensus sequence from a multiple alignment
Input (aligned) sequence set: dna.msf
output sequence [dna.fasta]: aligned.consambig

Go to the input files for this example
Go to the output files for this example

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqset     File containing a sequence alignment.
  [-outseq]            seqout     [.] Sequence filename and
                                  optional format (output USA)

   Additional (Optional) qualifiers:
   -name               string     Name of the consensus sequence (Any string
                                  is accepted)

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outseq" associated qualifiers
   -osformat2          string     Output seq format
   -osextension2       string     File name extension
   -osname2            string     Base file name
   -osdirectory2       string     Output directory
   -osdbname2          string     Database name to add
   -ossingle2          boolean    Separate file for each entry
   -oufo2              string     UFO features
   -offormat2          string     Features format
   -ofname2            string     Features file name
   -ofdirectory2       string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Standard (Mandatory) qualifiers Allowed values Default
[-sequence]
(Parameter 1)
File containing a sequence alignment. Readable set of sequences Required
[-outseq]
(Parameter 2)
Sequence filename and optional format (output USA) Writeable sequence <*>.format
Additional (Optional) qualifiers Allowed values Default
-name Name of the consensus sequence Any string is accepted An empty string is accepted
Advanced (Unprompted) qualifiers Allowed values Default
(none)

Input file format

The USA of a set of aligned sequences.

Input files for usage example

File: dna.msf

!!NA_MULTIPLE_ALIGNMENT

 dna.msf  MSF: 120  Type: N  January 01, 1776  12:00  Check: 3196 ..

 Name: MSFM1          Len:   120  Check:  8587  Weight:  1.00
 Name: MSFM2          Len:   120  Check:  6178  Weight:  1.00
 Name: MSFM3          Len:   120  Check:  8431  Weight:  1.00

//

        MSFM1  ACGTACGTAC GTACGTACGT ACGTACGTAC GTACGTACGT ACGTACGTAC
        MSFM2  ACGTACGTAC GTACGTACGT ....ACGTAC GTACGTACGT ACGTACGTAC
        MSFM3  ACGTACGTAC GTACGTACGT ACGTACGTAC GTACGTACGT CGTACGTACG

        MSFM1  GTACGTACGT ACGTACGTAC GTACGTACGT ACGTACGTAC GTACGTACGT
        MSFM2  GTACGTACGT ACGTACGTAC GTACGTACGT ACGTACGTAC GTACGTACGT
        MSFM3  TACGTACGTA CGTACGTACG TACGTACGTA ACGTACGTAC GTACGTACGT

        MSFM1  ACGTACGTAC GTACGTACGT
        MSFM2  ACGTACGTTG CAACGTACGT
        MSFM3  ACGTACGTAC GTACGTACGT

Output file format

The output consists of a sequence file holding the consensus sequence.

Output files for usage example

File: aligned.consambig

>EMBOSS_001
ACGTACGTACGTACGTACGTacgtACGTACGTACGTACGTMSKWMSKWMSKWMSKWMSKW
MSKWMSKWMSKWMSKWMSKWACGTACGTACGTACGTACGTACGTACGTWSSWACGTACGT

Data files

consambig uses the standard files Ebases.iub and Eresidues.iub in the EMBOSS data directory.

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

To see the available EMBOSS data files, run:

% embossdata -showall

To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:


% embossdata -fetch -file Exxx.dat

Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".

The directories are searched in the following order:

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name Description
cons Create a consensus sequence from a multiple alignment
megamerger Merge two large overlapping DNA sequences
merger Merge two overlapping sequences

Author(s)

Jon Ison (jison © ebi.ac.uk)
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

History

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None