ehmmbuild

 

Wiki

The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

Please help by correcting and extending the Wiki pages.

Function

Build a profile HMM from an alignment

Description

EMBASSY HMMER is a suite of application wrappers to the original hmmer v2.3.2 applications written by Sean Eddy. hmmer v2.3.2 must be installed on the same system as EMBOSS and the location of the hmmer executables must be defined in your path for EMBASSY HMMER to work.

Usage:
ehmmbuild [options] alignfile hmmfile

Important note: the alignfile (input) and hmmfile (output) parameters are specified in the reverse order in the original HMMER.
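
The difference in parameter order can be illustrated with a minimal Python sketch. It is based only on the two command lines shown in the sample session in this document; the helper names `ehmmbuild_argv` and `hmmbuild_argv` are hypothetical, and the other options the wrapper passes to hmmbuild are deliberately omitted.

```python
import shlex


def ehmmbuild_argv(alignfile, hmmfile, name):
    """EMBASSY order: alignment (input) first, HMM (output) second."""
    return ["ehmmbuild", alignfile, hmmfile, "-nhmm", name]


def hmmbuild_argv(alignfile, hmmfile, name):
    """Original HMMER 2.3.2 order: HMM (output) first, alignment (input) second."""
    return ["hmmbuild", "-n", name, hmmfile, alignfile]


if __name__ == "__main__":
    # Print both command lines side by side; nothing is executed.
    print(shlex.join(ehmmbuild_argv("globins50.msf", "globin.hmm", "globins50")))
    print(shlex.join(hmmbuild_argv("globins50.msf", "globin.hmm", "globins50")))
```

The sketch only builds argument lists, so it can be run without either program installed.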

hmmbuild reads a multiple sequence alignment file <alignfile>, builds a new profile HMM, and saves the HMM to file <hmmfile>. By default, the model is configured to find one or more nonoverlapping alignments to the complete model: multiple global alignments with respect to the model, and local with respect to the sequence. This is analogous to the behavior of the hmmls program of HMMER 1. To configure the model for multiple local alignments with respect to the model and local with respect to the sequence, a la the old program hmmfs, use the -f (fragment) option. More rarely, you may want to configure the model for a single global alignment (global with respect to both model and sequence), using the -g option; or to configure the model for a single local/local alignment (a la standard Smith/Waterman, or the old hmmsw program), use the -s option. In ehmmbuild, these hmmbuild options are selected with -strategy F, -strategy G and -strategy S respectively.
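
The correspondence between the ehmmbuild -strategy values and the hmmbuild options named above can be written down as a small lookup. This is a sketch inferred from the option descriptions in this document, not taken from the wrapper's source code; the names `STRATEGY_TO_HMMBUILD_FLAG` and `strategy_flags` are hypothetical.

```python
# Mapping inferred from the descriptions: D is the hmmbuild default
# (no extra flag), F/G/S correspond to the -f, -g and -s options.
STRATEGY_TO_HMMBUILD_FLAG = {
    "D": None,  # multiple domains, global with respect to the HMM (hmmls-like)
    "F": "-f",  # multiple domains, local with respect to the HMM (hmmfs-like)
    "G": "-g",  # single domain, global with respect to the HMM (hmms-like)
    "S": "-s",  # single domain, local with respect to the HMM (hmmsw-like)
}


def strategy_flags(strategy="D"):
    """Return the hmmbuild flags (possibly none) for an ehmmbuild -strategy value."""
    key = strategy.upper()
    if key not in STRATEGY_TO_HMMBUILD_FLAG:
        raise ValueError(f"unknown strategy {strategy!r}; expected one of D, F, G, S")
    flag = STRATEGY_TO_HMMBUILD_FLAG[key]
    return [] if flag is None else [flag]
```

For example, `strategy_flags("F")` returns `["-f"]`, while `strategy_flags("D")` returns an empty list because the default configuration needs no option.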

Algorithm

Please read the Userguide.pdf distributed with the original HMMER and included in the EMBASSY HMMER distribution under the DOCS directory.

Usage

Here is a sample session with ehmmbuild


% ehmmbuild globins50.msf globin.hmm -nhmm globins50 -strategy D 
Build a profile HMM from an alignment.

hmmbuild - build a hidden Markov model from an alignment
HMMER 2.3.2 (Oct 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Alignment file:                    ../../data/hmmnew/globins50.msf
File format:                       MSF
Search algorithm configuration:    Multiple domain (hmmls)
Model construction strategy:       MAP (gapmax hint: 0.50)
Null model used:                   (default)
Prior used:                        (default)
Sequence weighting method:         G/S/C tree weights
New HMM file:                      globin.hmm [appending]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Alignment:           #1
Number of sequences: 50
Number of columns:   308

Determining effective sequence number    ... done. [2]
Weighting sequences heuristically        ... done.
Constructing model architecture          ... done.
Converting counts to probabilities       ... done.
Setting model name, etc.                 ... done. [globins50]

Constructed a profile HMM (length 143)
Average score:      189.04 bits
Minimum score:      -17.62 bits
Maximum score:      234.09 bits
Std. deviation:      53.18 bits

Finalizing model configuration           ... done.
Saving model to file                     ... done.
//


/shared/software/bin/hmmbuild -n globins50  --pbswitch 1000  --archpri 0.850000  --idlevel 0.620000  --swentry 0.500000  --swexit 0.500000  --wgsc  -A -F  globin.hmm ../../data/hmmnew/globins50.msf


Command line arguments

Where possible, the same command-line qualifier names and parameter order are used as in the original hmmer. There are, however, several unavoidable differences, which are documented in the "Notes" section below.

More or less all options documented as "expert" in the original hmmer user guide are given in ACD as "advanced" options (-options must be specified on the command-line in order to be prompted for a value for them).

Build a profile HMM from an alignment.
Version: EMBOSS:6.5.0.0

   Standard (Mandatory) qualifiers:
  [-alignfile]         seqset     (Aligned) protein sequence set filename and
                                  optional format, or reference (input USA)
   -nhmm               string     Name for this HMM. The name can be any
                                  string of non-whitespace characters (e.g.
                                  one 'word'). There is no length limit (at
                                  least not one imposed by HMMER; your shell
                                  will complain about command line lengths
                                  first). (Any word)
   -strategy           menu       [D] All alignments are local with respect to
                                  the sequence and are configured to be local
                                  (fragmentary) or global with respect to the
                                  HMM. The model is also configured to find a
                                  single or multiple domains (matches) to a
                                  sequence. The options for configuring the
                                  model are as follows: (D): The default
                                  setting. Multiple domains per sequence,
                                  global alignments with respect to the HMM.
                                  (F): Multiple domains per sequence, local
                                  alignments with respect to the HMM.
                                  Analogous to the old hmmfs program of HMMER
                                  1. (G) Single domain per sequence, global
                                  alignment with respect to the HMM. Analogous
                                  to the old hmms program of HMMER 1. (S)
                                  Single domain per sequence, local alignments
                                  with respect to the HMM. Analogous to the
                                  old hmmsw program of HMMER 1. (Values: D
                                  (global-multidomain); F (local-multidomain);
                                  G (global-singledomain); S
                                  (local-singledomain))
  [-hmmfile]           outfile    [*.ehmmbuild] HMMER hidden markov model
                                  output file

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers:
   -prior              infile     Read a Dirichlet prior from file, replacing
                                  the default mixture Dirichlet. The format of
                                  prior files is documented in the User's
                                  Guide, and an example is given in the Demos
                                  directory of the HMMER distribution.
   -null               infile     Read a null model from file. The default for
                                  protein is to use average amino acid
                                  frequencies from Swissprot 34 and p1 =
                                  350/351; for nucleic acid, the default is to
                                  use 0.25 for each base and p1 = 1000/1001.
                                  For documentation of the format of the null
                                  model file and further explanation of how
                                  the null model is used, see the User's
                                  Guide.
   -pam                infile     Apply a heuristic PAM- (substitution
                                  matrix-) based prior on match emission
                                  probabilities instead of the default mixture
                                  Dirichlet. The substitution matrix is read
                                  from file. See -pamwgt. The default
                                  Dirichlet state transition prior and insert
                                  emission prior are unaffected. Therefore in
                                  principle you could combine -prior with -pam
                                  but this isn't recommended, as it hasn't
                                  been tested. ( -pam itself hasn't been
                                  tested much!)
   -pamwgt             float      [20.0] Controls the weight <x> on a
                                  PAM-based prior. Only has effect if -pam
                                  option is also in use. <x> is a positive
                                  real number, 20.0 by default. <x> is the
                                  number of 'pseudocounts' contributed by the
                                  heuristic prior. Very high values of <x>
                                  can force a scoring system that is entirely
                                  driven by the substitution matrix, making
                                  HMMER somewhat approximate Gribskov
                                  profiles. (Any numeric value)
   -pbswitch           integer    [1000] For alignments with a very large
                                  number of sequences, the GSC, BLOSUM, and
                                  Voronoi weighting schemes are slow; they're
                                  O(N^2) for N sequences. Henikoff
                                  position-based weights (PB weights) are more
                                  efficient. At or above a certain threshold
                                  sequence number <n> hmmbuild will switch
                                  from GSC, BLOSUM, or Voronoi weights to PB
                                  weights. To disable this switching behavior
                                  (at the cost of compute time), set <n> to be
                                  something larger than the number of
                                  sequences in your alignment. <n> is a
                                  positive integer; the default is 1000. (Any
                                  integer value)
   -archpri            float      [0.85] The value of the 'architecture prior'
                                  used by MAP architecture construction. This
                                  value is a probability between 0 and 1.
                                  This parameter governs a geometric prior
                                  distribution over model lengths. As
                                  'archpri' increases, longer models are
                                  favored a priori. As 'archpri' decreases, it
                                  takes more residue conservation in a column
                                  to make a column a 'consensus' match column
                                  in the model architecture. The 0.85 default
                                  has been chosen empirically as a reasonable
                                  setting. (Any numeric value)
   -binary             boolean    [N] Write the HMM to file in HMMER binary
                                  format instead of readable ASCII text.
   -fast               boolean    [N] Quickly and heuristically determine the
                                  architecture of the model by assigning all
                                  columns with more than a certain fraction of
                                  gap characters to insert states. By default
                                  this fraction is 0.5, and it can be changed
                                  using the -gapmax option. The default
                                  construction algorithm is a maximum a
                                  posteriori (MAP) algorithm, which is slower.
   -gapmax             float      [0.5] Controls the -fast model construction
                                  algorithm, but if -fast is not being used,
                                  has no effect. If a column has more than a
                                  fraction <x> of gap symbols in it, it gets
                                  assigned to an insert column. <x> is a
                                  frequency from 0 to 1, and by default is set
                                  to 0.5. Higher values of <x> mean more
                                  columns get assigned to consensus, and
                                  models get longer; smaller values of <x>
                                  mean fewer columns get assigned to
                                  consensus, and models get smaller. (Any
                                  numeric value)
   -hand               boolean    [N] Specify the architecture of the model by
                                  hand: the alignment file must be in SELEX
                                  or Stockholm format, and the reference
                                  annotation line (RF in SELEX, GC RF in
                                  Stockholm) is used to specify the
                                  architecture. Any column marked with a
                                  non-gap symbol (such as an 'x', for
                                  instance) is assigned as a consensus (match)
                                  column in the model.
   -sidlevel           float      [0.62] Controls both the determination of
                                  effective sequence number and the behavior
                                  of the -wblosum weighting option. The
                                  sequence alignment is clustered by percent
                                  identity, and the number of clusters at a
                                  cutoff threshold of <x> is used to determine
                                  the effective sequence number. Higher
                                  values of <x> give more clusters and higher
                                  effective sequence numbers; lower values of
                                  <x> give fewer clusters and lower effective
                                  sequence numbers. <x> is a fraction from 0
                                  to 1, and by default is set to 0.62
                                  (corresponding to the clustering level used
                                  in constructing the BLOSUM62 substitution
                                  matrix). (Any numeric value)
   -noeff              boolean    [N] Turn off the effective sequence number
                                  calculation, and use the true number of
                                  sequences instead. This will usually reduce
                                  the sensitivity of the final model (so don't
                                  do it without good reason!)
   -swentry            float      [0.5] Controls the total probability that is
                                  distributed to local entries into the
                                  model, versus starting at the beginning of
                                  the model as in a global alignment. <x> is a
                                  probability from 0 to 1, and by default is
                                  set to 0.5. Higher values of <x> mean that
                                  hits that are fragments on their left (N or
                                  5'-terminal) side will be penalized less,
                                  but complete global alignments will be
                                  penalized more. Lower values of <x> mean
                                  that fragments on the left will be penalized
                                  more, and global alignments on this side
                                  will be favored. This option only affects
                                  the configurations that allow local
                                  alignments, e.g. -s and -f; unless one of
                                  these options is also activated, this option
                                  has no effect. You have independent control
                                  over local/global alignment behavior for
                                  the N/C (5'/3') termini of your target
                                  sequences using -swentry and -swexit. (Any
                                  numeric value)
   -swexit             float      [0.5] Controls the total probability that is
                                  distributed to local exits from the model,
                                  versus ending an alignment at the end of the
                                  model as in a global alignment. <x> is a
                                  probability from 0 to 1, and by default is
                                  set to 0.5. Higher values of <x> mean that
                                  hits that are fragments on their right (C or
                                  3'-terminal) side will be penalized less,
                                  but complete global alignments will be
                                  penalized more. Lower values of <x> mean
                                  that fragments on the right will be
                                  penalized more, and global alignments on
                                  this side will be favored. This option only
                                  affects the configurations that allow local
                                  alignments, e.g. -s and -f; unless one of
                                  these options is also activated, this option
                                  has no effect. You have independent control
                                  over local/global alignment behavior for
                                  the N/C (5'/3') termini of your target
                                  sequences using -swentry and -swexit. (Any
                                  numeric value)
   -verbosity          boolean    [N] Print more possibly useful stuff, such
                                  as the individual scores for each sequence
                                  in the alignment.
   -weighting          menu       [G] Values (B)(-wblosum in HMMER) Use the
                                  BLOSUM filtering algorithm to weight the
                                  sequences. Cluster the sequences at a given
                                  percentage identity (see -idlevel); assign
                                  each cluster a total weight of 1.0,
                                  distributed equally amongst the members of
                                  that cluster. (G)(-wgsc in HMMER) Use the
                                  Gerstein/Sonnhammer/Chothia ad hoc sequence
                                  weighting algorithm. This is the default.
                                  (K)(-wme in HMMER) Use the Krogh/Mitchison
                                  maximum entropy algorithm to 'weight' the
                                  sequences. This supersedes the
                                  Eddy/Mitchison/Durbin maximum discrimination
                                  algorithm, which gives almost identical
                                  weights but is less robust. ME weighting
                                  seems to give a marginal increase in
                                  sensitivity over the default GSC weights,
                                  but takes a fair amount of time. (W) (-wpb
                                  in HMMER) Use the Henikoff position-based
                                  weighting scheme. (V) (-wvoronoi in HMMER)
                                  Use the Sibbald/Argos Voronoi sequence
                                  weighting algorithm in place of the default
                                  GSC weighting. (N) (-wnone in HMMER) Turn
                                  off all sequence weighting. (Values: B
                                  (Blosum); G (Gerstein/Sonnhammer/Chothia); K
                                  (Krogh/Mitchison); W (Henikoff); V
                                  (Sibbald/Argos Voronoi); N (None))
   -o                  outfile    [*.ehmmbuild] Re-save the starting alignment
                                  to file, in Stockholm format. The columns
                                  which were assigned to match states will be
                                  marked with x's in an RF annotation line. If
                                  either the -hand or -fast construction
                                  options were chosen, the alignment may have
                                  been slightly altered to be compatible with
                                  Plan 7 transitions, so saving the final
                                  alignment and comparing to the starting
                                  alignment can let you view these
                                  alterations. See the User's Guide for more
                                  information on this arcane side effect.
   -cfile              outfile    [*.ehmmbuild] Save the observed emission and
                                  transition counts to file after the
                                  architecture has been determined (i.e. after
                                  residues/gaps have been assigned to match,
                                  delete, and insert states). This option is
                                  used in HMMER development for generating
                                  data files useful for training new Dirichlet
                                  priors. The format of count files is
                                  documented in the User's Guide.

   Associated qualifiers:

   "-alignfile" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -scircular1         boolean    Sequence is circular
   -sformat1           string     Input sequence format
   -iquery1            string     Input query fields or ID list
   -ioffset1           integer    Input start position offset
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-hmmfile" associated qualifiers
   -odirectory2        string     Output directory

   "-o" associated qualifiers
   -odirectory         string     Output directory

   "-cfile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
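
The column assignment rule described for -fast and -gapmax above can be sketched in a few lines of Python. This is an illustration of the stated rule only (a column with more than a fraction <x> of gaps becomes an insert column); it uses unweighted gap counts and treats '-' and '.' as gap symbols, both of which are assumptions, and the function name `assign_columns` is hypothetical rather than part of HMMER or EMBOSS.

```python
def assign_columns(alignment, gapmax=0.5, gap_chars="-."):
    """Label each alignment column 'match' or 'insert' by its gap fraction.

    alignment: list of equal-length aligned sequence strings.
    gapmax:    columns with a gap fraction strictly greater than this
               value are labelled 'insert' (unweighted counts, an assumption).
    """
    if not alignment:
        raise ValueError("alignment is empty")
    ncols = len(alignment[0])
    if any(len(row) != ncols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    labels = []
    for col in range(ncols):
        gaps = sum(1 for row in alignment if row[col] in gap_chars)
        fraction = gaps / len(alignment)
        labels.append("insert" if fraction > gapmax else "match")
    return labels


if __name__ == "__main__":
    toy = ["AC-T", "A--T", "AG-T", "A-CT"]  # toy data, not from the globin example
    print(assign_columns(toy))               # third column is 3/4 gaps
    print(assign_columns(toy, gapmax=0.25))  # stricter threshold, shorter model
```

With the toy alignment, lowering gapmax from 0.5 to 0.25 turns the second column (half gaps) into an insert column as well, which matches the statement that smaller values give fewer consensus columns.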

Qualifier Type Description Allowed values Default
Standard (Mandatory) qualifiers
[-alignfile]
(Parameter 1)
seqset (Aligned) protein sequence set filename and optional format, or reference (input USA) Readable set of sequences Required
-nhmm string Name for this HMM. The name can be any string of non-whitespace characters (e.g. one 'word'). There is no length limit (at least not one imposed by HMMER; your shell will complain about command line lengths first). Any word  
-strategy list All alignments are local with respect to the sequence and are configured to be local (fragmentary) or global with respect to the HMM. The model is also configured to find a single or multiple domains (matches) to a sequence. The options for configuring the model are as follows: (D): The default setting. Multiple domains per sequence, global alignments with respect to the HMM. (F): Multiple domains per sequence, local alignments with respect to the HMM. Analogous to the old hmmfs program of HMMER 1. (G) Single domain per sequence, global alignment with respect to the HMM. Analogous to the old hmms program of HMMER 1. (S) Single domain per sequence, local alignments with respect to the HMM. Analogous to the old hmmsw program of HMMER 1.
D (global-multidomain)
F (local-multidomain)
G (global-singledomain)
S (local-singledomain)
D
[-hmmfile]
(Parameter 2)
outfile HMMER hidden markov model output file Output file <*>.ehmmbuild
Additional (Optional) qualifiers
(none)
Advanced (Unprompted) qualifiers
-prior infile Read a Dirichlet prior from file, replacing the default mixture Dirichlet. The format of prior files is documented in the User's Guide, and an example is given in the Demos directory of the HMMER distribution. Input file Required
-null infile Read a null model from file. The default for protein is to use average amino acid frequencies from Swissprot 34 and p1 = 350/351; for nucleic acid, the default is to use 0.25 for each base and p1 = 1000/1001. For documentation of the format of the null model file and further explanation of how the null model is used, see the User's Guide. Input file Required
-pam infile Apply a heuristic PAM- (substitution matrix-) based prior on match emission probabilities instead of the default mixture Dirichlet. The substitution matrix is read from file. See -pamwgt. The default Dirichlet state transition prior and insert emission prior are unaffected. Therefore in principle you could combine -prior with -pam but this isn't recommended, as it hasn't been tested. ( -pam itself hasn't been tested much!) Input file Required
-pamwgt float Controls the weight <x> on a PAM-based prior. Only has effect if -pam option is also in use. <x> is a positive real number, 20.0 by default. <x> is the number of 'pseudocounts' contriubuted by the heuristic prior. Very high values of <x> can force a scoring system that is entirely driven by the substitution matrix, making HMMER somewhat approximate Gribskov profiles. Any numeric value 20.0
-pbswitch integer For alignments with a very large number of sequences, the GSC, BLOSUM, and Voronoi weighting schemes are slow; they're O(N^2) for N sequences. Henikoff position-based weights (PB weights) are more effcient. At or above a certain threshold sequence number <n> hmmbuild will switch from GSC, BLOSUM, or Voronoi weights to PB weights. To disable this switching behavior (at the cost of compute time, set <n> to be something larger than the number of sequences in your alignment. <n> is a positive integer; the default is 1000. Any integer value 1000
-archpri float The value of the 'architecture prior' used by MAP architecture construction. This value is a probability between 0 and 1. This parameter governs a geometric prior distribution over model lengths. As 'archpri' increases, longer models are favored a priori. As 'archpri' decreases, it takes more residue conservation in a column to make a column a 'consensus' match column in the model architecture. The 0.85 default has been chosen empirically as a reasonable setting. Any numeric value 0.85
-binary boolean Write the HMM to file in HMMER binary format instead of readable ASCII text. Boolean value Yes/No No
-fast boolean Quickly and heuristically determine the architecture of the model by assigning all columns with more than a certain fraction of gap characters to insert states. By default this fraction is 0.5, and it can be changed using the --gapmax option. The default construction algorithm is a maximum a posteriori (MAP) algorithm, which is slower. Boolean value Yes/No No
-gapmax float Controls the -fast model construction algorithm, but if -fast is not being used, has no effect. If a column has more than a fraction <x> of gap symbols in it, it gets assigned to an insert column. <x> is a frequency from 0 to 1, and by default is set to 0.5. Higher values of <x> mean more columns get assigned to consensus, and models get longer; smaller values of <x> mean fewer columns get assigned to consensus, and models get smaller. Any numeric value 0.5
-hand boolean Specify the architecture of the model by hand: the alignment file must be in SELEX or Stockholm format, and the reference annotation line (RF in SELEX, GC RF in Stockholm) is used to specify the architecture. Any column marked with a non-gap symbol (such as an 'x', for instance) is assigned as a consensus (match) column in the model. Boolean value Yes/No No
-sidlevel float Controls both the determination of effective sequence number and the behavior of the -wblosum weighting option. The sequence alignment is clustered by percent identity, and the number of clusters at a cutoff threshold of <x> is used to determine the effective sequence number. Higher values of <x> give more clusters and higher effective sequence numbers; lower values of <x> give fewer clusters and lower effective sequence numbers. <x> is a fraction from 0 to 1, and by default is set to 0.62 (corresponding to the clustering level used in constructing the BLOSUM62 substitution matrix). Any numeric value 0.62
-noeff boolean Turn off the effective sequence number calculation, and use the true number of sequences instead. This will usually reduce the sensitivity of the final model (so don't do it without good reason!) Boolean value Yes/No No
-swentry float Controls the total probability that is distributed to local entries into the model, versus starting at the beginning of the model as in a global alignment. <x> is a probability from 0 to 1, and by default is set to 0.5. Higher values of <x> mean that hits that are fragments on their left (N or 5'-terminal) side will be penalized less, but complete global alignments will be penalized more. Lower values of <x> mean that fragments on the left will be penalized more, and global alignments on this side will be favored. This option only affects the confgurations that allow local alignments, e.g. -s and -f; unless one of these options is also activated, this option has no effect. You have independent control over local/global alignment behavior for the N/C (5'/3') termini of your target sequences using --swentry and --swexit. Any numeric value 0.5
-swexit float Controls the total probability that is distributed to local exits from the model, versus ending an alignment at the end of the model as in a global alignment. <x> is a probability from 0 to 1, and by default is set to 0.5. Higher values of <x> mean that hits that are fragments on their right (C or 3'-terminal) side will be penalized less, but complete global alignments will be penalized more. Lower values of <x> mean that fragments on the right will be penalized more, and global alignments on this side will be favored. This option only affects the confgurations that allow local alignments, e.g. -s and -f; unless one of these options is also activated, this option has no effect. You have independent control over local/global alignment behavior for the N/C (5'/3') termini of your target sequences using -swentry and -swexit. Any numeric value 0.5
-verbosity boolean Print more possibly useful stuff, such as the individual scores for each sequence in the alignment. Boolean value Yes/No No
-weighting list Values (B) (-wblosum in HMMER) Use the BLOSUM filtering algorithm to weight the sequences. Cluster the sequences at a given percentage identity (see -idlevel); assign each cluster a total weight of 1.0, distributed equally amongst the members of that cluster. (G) (-wgsc in HMMER) Use the Gerstein/Sonnhammer/Chothia ad hoc sequence weighting algorithm. This is the default. (K) (-wme in HMMER) Use the Krogh/Mitchison maximum entropy algorithm to 'weight' the sequences. This supersedes the Eddy/Mitchison/Durbin maximum discrimination algorithm, which gives almost identical weights but is less robust. ME weighting seems to give a marginal increase in sensitivity over the default GSC weights, but takes a fair amount of time. (W) (-wpb in HMMER) Use the Henikoff position-based weighting scheme. (V) (-wvoronoi in HMMER) Use the Sibbald/Argos Voronoi sequence weighting algorithm in place of the default GSC weighting. (N) (-wnone in HMMER) Turn off all sequence weighting.
B (Blosum)
G (Gerstein/Sonnhammer/Chothia)
K (Krogh/Mitchison)
W (Henikoff)
V (Sibbald/Argos Voronoi)
N (None)
G
-o outfile Re-save the starting alignment to file, in Stockholm format. The columns which were assigned to match states will be marked with x's in an RF annotation line. If either the -hand or -fast construction options were chosen, the alignment may have been slightly altered to be compatible with Plan 7 transitions, so saving the final alignment and comparing to the starting alignment can let you view these alterations. See the User's Guide for more information on this arcane side effect. Output file <*>.ehmmbuild
-cfile outfile Save the observed emission and transition counts to file after the architecture has been determined (e.g. after residues/gaps have been assigned to match, delete, and insert states). This option is used in HMMER development for generating data files useful for training new Dirichlet priors. The format of count files is documented in the User's Guide. Output file <*>.ehmmbuild
Associated qualifiers
"-alignfile" associated seqset qualifiers
-sbegin1
-sbegin_alignfile
integer Start of each sequence to be used Any integer value 0
-send1
-send_alignfile
integer End of each sequence to be used Any integer value 0
-sreverse1
-sreverse_alignfile
boolean Reverse (if DNA) Boolean value Yes/No N
-sask1
-sask_alignfile
boolean Ask for begin/end/reverse Boolean value Yes/No N
-snucleotide1
-snucleotide_alignfile
boolean Sequence is nucleotide Boolean value Yes/No N
-sprotein1
-sprotein_alignfile
boolean Sequence is protein Boolean value Yes/No N
-slower1
-slower_alignfile
boolean Make lower case Boolean value Yes/No N
-supper1
-supper_alignfile
boolean Make upper case Boolean value Yes/No N
-scircular1
-scircular_alignfile
boolean Sequence is circular Boolean value Yes/No N
-sformat1
-sformat_alignfile
string Input sequence format Any string  
-iquery1
-iquery_alignfile
string Input query fields or ID list Any string  
-ioffset1
-ioffset_alignfile
integer Input start position offset Any integer value 0
-sdbname1
-sdbname_alignfile
string Database name Any string  
-sid1
-sid_alignfile
string Entryname Any string  
-ufo1
-ufo_alignfile
string UFO features Any string  
-fformat1
-fformat_alignfile
string Features format Any string  
-fopenfile1
-fopenfile_alignfile
string Features file name Any string  
"-hmmfile" associated outfile qualifiers
-odirectory2
-odirectory_hmmfile
string Output directory Any string  
"-o" associated outfile qualifiers
-odirectory string Output directory Any string  
"-cfile" associated outfile qualifiers
-odirectory string Output directory Any string  
General qualifiers
-auto boolean Turn off prompts Boolean value Yes/No N
-stdout boolean Write first file to standard output Boolean value Yes/No N
-filter boolean Read first file from standard input, write first file to standard output Boolean value Yes/No N
-options boolean Prompt for standard and additional values Boolean value Yes/No N
-debug boolean Write debug output to program.dbg Boolean value Yes/No N
-verbose boolean Report some/full command line options Boolean value Yes/No Y
-help boolean Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose Boolean value Yes/No N
-warning boolean Report warnings Boolean value Yes/No Y
-error boolean Report errors Boolean value Yes/No Y
-fatal boolean Report fatal errors Boolean value Yes/No Y
-die boolean Report dying program messages Boolean value Yes/No Y
-version boolean Report version number and exit Boolean value Yes/No N
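
The qualifiers in the table above can be assembled into a command line programmatically. The following is a minimal sketch, not part of EMBOSS: the function name build_ehmmbuild_argv is hypothetical, and it only uses qualifiers documented on this page (-nhmm, -swentry, -swexit, -auto). It checks that the two probabilities lie in the documented range of 0 to 1 before building the argument list.

```python
def build_ehmmbuild_argv(alignfile, hmmfile, nhmm=None, swentry=0.5, swexit=0.5):
    """Return an argument list for ehmmbuild (hypothetical helper, not an EMBOSS API).

    alignfile and hmmfile are given in ehmmbuild order (input first, output second).
    """
    # -swentry and -swexit are documented as probabilities from 0 to 1.
    for label, value in (("swentry", swentry), ("swexit", swexit)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"-{label} must be a probability from 0 to 1, got {value}")

    argv = ["ehmmbuild", alignfile, hmmfile]
    if nhmm is not None:
        argv += ["-nhmm", nhmm]  # name of the HMM (replaces -n in the original HMMER)
    argv += ["-swentry", str(swentry), "-swexit", str(swexit)]
    argv.append("-auto")  # turn off prompts, useful when called from a script
    return argv
```

The resulting list could be passed to subprocess.run. Note that, as described above, -swentry and -swexit only have an effect when a strategy that allows local alignments is selected.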

Input file format

Alignment and sequence formats

Input and output of alignments and sequences is limited to the formats that the original hmmer supports. These include Stockholm, SELEX, MSF, Clustal, Phylip and A2M/aligned FASTA (alignments) and FASTA, GenBank, EMBL, GCG, PIR (sequences). It would be fairly straightforward to adapt the code to support all EMBOSS-supported formats.

Compressed input files

Automatic processing of gzipped files is not supported.
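
Because gzipped files are not read automatically, a compressed alignment has to be decompressed before it is passed to ehmmbuild. The sketch below shows one way to do this with the Python standard library; the function name gunzip_alignment and the example file name are illustrative assumptions, not part of EMBOSS.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_alignment(gz_path, out_dir="."):
    """Decompress a .gz alignment into out_dir and return the plain file's path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path.name}")
    # Strip only the trailing ".gz", e.g. globins50.msf.gz -> globins50.msf
    plain = Path(out_dir) / gz_path.with_suffix("").name
    with gzip.open(gz_path, "rb") as src, open(plain, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return plain
```

The returned path can then be used as the alignfile argument.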

ehmmbuild reads an alignment from a named file. The full filename must be given; a USA is not accepted (see Warnings).

Input files for usage example

File: globins50.msf

!!AA_MULTIPLE_ALIGNMENT 1.0
PileUp of: *.pep

 Symbol comparison table: GenRunData:blosum62.cmp  CompCheck: 6430

                   GapWeight: 12
             GapLengthWeight: 4 

 pileup.msf  MSF: 308  Type: P  August 16, 1999 09:09  Check: 9858 ..

 Name: lgb1_pea         Len:   308  Check: 2200  Weight:  1.00
 Name: lgb1_vicfa       Len:   308  Check:  214  Weight:  1.00
 Name: myg_escgi        Len:   308  Check: 3961  Weight:  1.00
 Name: myg_horse        Len:   308  Check: 5619  Weight:  1.00
 Name: myg_progu        Len:   308  Check: 6401  Weight:  1.00
 Name: myg_saisc        Len:   308  Check: 6606  Weight:  1.00
 Name: myg_lycpi        Len:   308  Check: 6090  Weight:  1.00
 Name: myg_mouse        Len:   308  Check: 6613  Weight:  1.00
 Name: myg_musan        Len:   308  Check: 3942  Weight:  1.00
 Name: hba_ailme        Len:   308  Check: 4558  Weight:  1.00
 Name: hba_prolo        Len:   308  Check: 5054  Weight:  1.00
 Name: hba_pagla        Len:   308  Check: 5383  Weight:  1.00
 Name: hba_macfa        Len:   308  Check: 5135  Weight:  1.00
 Name: hba_macsi        Len:   308  Check: 5198  Weight:  1.00
 Name: hba_ponpy        Len:   308  Check: 5050  Weight:  1.00
 Name: hba2_galcr       Len:   308  Check: 5609  Weight:  1.00
 Name: hba_mesau        Len:   308  Check: 4702  Weight:  1.00
 Name: hba2_bosmu       Len:   308  Check: 4241  Weight:  1.00
 Name: hba_erieu        Len:   308  Check: 4680  Weight:  1.00
 Name: hba_frapo        Len:   308  Check: 3549  Weight:  1.00
 Name: hba_phaco        Len:   308  Check: 4440  Weight:  1.00
 Name: hba_trioc        Len:   308  Check: 5465  Weight:  1.00
 Name: hba_ansse        Len:   308  Check: 3300  Weight:  1.00
 Name: hba_colli        Len:   308  Check: 3816  Weight:  1.00
 Name: hbad_chlme       Len:   308  Check: 4571  Weight:  1.00
 Name: hbad_pasmo       Len:   308  Check: 6777  Weight:  1.00
 Name: hbaz_horse       Len:   308  Check: 7187  Weight:  1.00
 Name: hba4_salir       Len:   308  Check: 7329  Weight:  1.00
 Name: hbb_ornan        Len:   308  Check: 2667  Weight:  1.00
 Name: hbb_tacac        Len:   308  Check: 4356  Weight:  1.00
 Name: hbe_ponpy        Len:   308  Check: 3827  Weight:  1.00
 Name: hbb_speci        Len:   308  Check: 1556  Weight:  1.00
 Name: hbb_speto        Len:   308  Check: 2051  Weight:  1.00
 Name: hbb_equhe        Len:   308  Check: 3414  Weight:  1.00
 Name: hbb_sunmu        Len:   308  Check: 2927  Weight:  1.00
 Name: hbb_calar        Len:   308  Check: 3836  Weight:  1.00
 Name: hbb_mansp        Len:   308  Check: 4322  Weight:  1.00
 Name: hbb_ursma        Len:   308  Check: 4428  Weight:  1.00
 Name: hbb_rabit        Len:   308  Check: 4190  Weight:  1.00
 Name: hbb_tupgl        Len:   308  Check: 4185  Weight:  1.00


  [Part of this file has been deleted for brevity]

  lgb1_pea  ~~~~~~~~
lgb1_vicfa  ~~~~~~~~
 myg_escgi  ~~~~~~~~
 myg_horse  ~~~~~~~~
 myg_progu  ~~~~~~~~
 myg_saisc  ~~~~~~~~
 myg_lycpi  ~~~~~~~~
 myg_mouse  ~~~~~~~~
 myg_musan  ~~~~~~~~
 hba_ailme  ~~~~~~~~
 hba_prolo  ~~~~~~~~
 hba_pagla  ~~~~~~~~
 hba_macfa  ~~~~~~~~
 hba_macsi  ~~~~~~~~
 hba_ponpy  ~~~~~~~~
hba2_galcr  ~~~~~~~~
 hba_mesau  ~~~~~~~~
hba2_bosmu  ~~~~~~~~
 hba_erieu  ~~~~~~~~
 hba_frapo  ~~~~~~~~
 hba_phaco  ~~~~~~~~
 hba_trioc  ~~~~~~~~
 hba_ansse  ~~~~~~~~
 hba_colli  ~~~~~~~~
hbad_chlme  ~~~~~~~~
hbad_pasmo  ~~~~~~~~
hbaz_horse  ~~~~~~~~
hba4_salir  ~~~~~~~~
 hbb_ornan  ~~~~~~~~
 hbb_tacac  ~~~~~~~~
 hbe_ponpy  ~~~~~~~~
 hbb_speci  ~~~~~~~~
 hbb_speto  ~~~~~~~~
 hbb_equhe  ~~~~~~~~
 hbb_sunmu  ~~~~~~~~
 hbb_calar  ~~~~~~~~
 hbb_mansp  ~~~~~~~~
 hbb_ursma  ~~~~~~~~
 hbb_rabit  ~~~~~~~~
 hbb_tupgl  ~~~~~~~~
 hbb_triin  ~~~~~~~~
 hbb_colli  ~~~~~~~~
 hbb_larri  ~~~~~~~~
hbb1_varex  ~~~~~~~~
hbb2_xentr  ~~~~~~~~
hbbl_ranca  ~~~~~~~~
hbb2_tricr  ~~~~~~~~
glb2_mormr  ~~~~~~~~
glbz_chith  FGAVFAKM
hbf1_ureca  VAAMK~~~

Output file format

ehmmbuild writes the profile HMM to the file given as hmmfile. The file is plain text in the HMMER 2 save format, as shown in the example below.

Output files for usage example

File: globin.hmm

HMMER2.0  [2.3.2]
NAME  globins50
LENG  143
ALPH  Amino
RF    no
CS    no
MAP   yes
COM   /homes/user/local/bin/hmmbuild -n globins50 --pbswitch 1000 --archpri 0.850000 --idlevel 0.620000 --swentry 0.500000 --swexit 0.500000 --wgsc -A -F globin.hmm ../../data/hmmnew/globins50.msf
NSEQ  50
DATE  Fri Jul 15 12:00:00 2011
CKSUM 9858
XT      -8455     -4  -1000  -1000  -8455     -4  -8455     -4 
NULT      -4  -8455
NULE     595  -1558     85    338   -294    453  -1158    197    249    902  -1085   -142    -21   -313     45    531    201    384  -1998   -644 
HMM        A      C      D      E      F      G      H      I      K      L      M      N      P      Q      R      S      T      V      W      Y    
         m->m   m->i   m->d   i->m   i->i   d->m   d->d   b->m   m->e
         -450      *  -1900
     1    591  -1587    159   1351  -1874   -201    151  -1600    998  -1591   -693    389  -1272    595     42    -31     27   -693  -1797  -1134    14
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378   -450      * 
     2   -926  -2616   2221   2269  -2845  -1178   -325  -2678   -300  -2596  -1810    220  -1592    939   -974   -671   -939  -2204  -2785  -1925    15
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     3   -638  -1715   -680    497  -2043  -1540     23  -1671   2380  -1641   -840   -222  -1595    437   1040   -564   -523  -1363   2124  -1313    16
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     4    829  -1571    -37    660  -1856   -873    152  -1578    894  -1573   -678    769  -1273   1284     58    224    447  -1175  -1782  -1125    17
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     5    369   -433   -475    286   -974  -1312    -19   -412    664    398    406   1030  -1394    388   -214   -261     85   -166  -1227   -725    18
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     6  -1291   -884  -3696  -3261  -1137  -3425  -2802   2322  -3066    111     19  -3028  -3275  -2855  -3100  -2670  -1269   2738  -2450  -2062    19
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     7    157   -413   -236    316  -1387  -1231     89   -863   1084   -431   -348    910  -1319    635    297     15    704   -483  -1497   -922    20
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     8    770  -1431    -43    459  -1751   -340     78  -1449    440  -1497   -631    866  -1302    825    -51    953    364  -1076  -1750  -1121    21
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
     9    420   -186  -2172  -1577      8  -1818   -694   1477  -1281    760    614  -1299  -1867  -1001  -1262   -189    -12   1401   -722   -364    22
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
    10   -961   -879  -2277  -1821   1366  -2213   -204   -399  -1500   -130    -39  -1427  -2266  -1186  -1511   -159   -913   -367   4721   1177    23
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
    11    -48  -1782    809    844  -2073   1456      8  -1811    315  -1803   -932    180  -1365    921   -218    173   -115  -1399  -2018  -1327    24
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -68  -6528  -4832   -894  -1115   -701  -1378      *      * 


  [Part of this file has been deleted for brevity]

     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   128   -415  -1926   1575   1399  -2219  -1163     17  -1983    527  -1929  -1039    341  -1367   1597   -212    257   -222  -1536  -2109  -1387   144
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   129   -529  -1434   -629   -143  -1926   -626   -171  -1460   2679  -1597   -839   -309  -1599    207    317   -530   -510   -130  -1840  -1369   145
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   130    811   -397  -2389  -1807   1883  -2039   -907    594  -1512   1077    687  -1532  -2065  -1201  -1483  -1125   -465   1067   -843   -472   146
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   131   -241   -102  -2327  -1710    724  -1767   -616    650  -1363   1074   1765   -718  -1809  -1026  -1252   -842   -181   1331   -541    695   147
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   132    723     95    385    823  -1820  -1168    167  -1540    875  -1362   -644    320  -1261    810    246    693    -67  -1141  -1753  -1098   148
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   133    551   -430  -1049   -481   -442    469   -241    465   -313    133    947   -411  -1543    197   -587   -146    202    522   -843   -429   149
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   134  -1086   -777  -3351  -2800    816  -2898  -1861   1501  -2515   1149    586  -2483  -2775  -2108  -2400  -2046  -1030   2380  -1511  -1216   150
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   135   1393   1409   -876   -345   -997   -525   -315   -590   -198   -847   -109   -420  -1441    -97    412    766   -130    139  -1306   -858   151
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   136     98  -1299     36    365  -1495  -1211   1241   -404    523   -952   -426   1174  -1303    511    -18    347    882   -853  -1566   -970   152
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   137   1308   -787    564   -132   -966  -1332   -203   -362    -49   -395    -57   -305  -1481     49   -437   -190   -182   1020  -1282   -802   153
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   138  -1746  -1358  -3897  -3341   -216  -3621  -2478   1774  -3040   2442   1157  -3189  -3229  -2422  -2853  -2824  -1659    392  -1720  -1647   154
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   139   1176  -1289   -179    534  -1606   -607     34  -1278    734  -1372   -534     44  -1325    433    -89    521    826   -941  -1666  -1072   155
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6528  -7571   -894  -1115   -701  -1378      *      * 
   140    602  -1500   -135    850  -1753  -1214   1951  -1452    838  -1484    431    118  -1306    555    347    489   -153  -1085  -1723  -1092   156
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -22  -6602  -7644   -894  -1115   -701  -1378      *      * 
   141    351  -1646   -165    546  -1976   -498     46  -1667   2193  -1662   -798     35  -1405    476    311    -73   -306  -1287  -1859  -1254   157
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -23  -6561  -7603   -894  -1115   -701  -1378      *      * 
   142  -1995  -1606  -3095  -2870   1739  -3015    -98  -1012  -2520   -730    655  -1990  -2962  -1884  -2326  -2167  -1915  -1128    548   4089   158
     -   -149   -500    233     43   -381    399    106   -626    210   -466   -720    275    394     45     96    359    117   -369   -294   -249 
     -    -25  -6455  -7497   -894  -1115   -701  -1378      *      * 
   143   -253  -1373   -267    301   -911   -565   1956   -450   1188  -1330   -497     33  -1352    502   1358   -205   -184   -941  -1604  -1026   159
     -      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      *      * 
     -      *      *      *      *      *      *      *      *      0 
//
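
The header of the file above consists of "TAG value" lines (NAME, LENG, NSEQ and so on) that precede the HMM score table. As a rough illustration, these fields could be read as follows. This is a sketch based only on the layout visible in the example, not a full parser for the format, and the function name read_hmm_header is hypothetical.

```python
def read_hmm_header(lines):
    """Collect the 'TAG value' header lines that precede the 'HMM' table line."""
    header = {}
    for line in lines:
        tag, _, value = line.strip().partition(" ")
        if tag == "HMM":  # start of the per-position score table
            break
        if tag and not tag.startswith("HMMER"):  # skip the version line
            header[tag] = value.strip()
    return header


# Header lines copied from the example output above.
sample = [
    "HMMER2.0  [2.3.2]",
    "NAME  globins50",
    "LENG  143",
    "ALPH  Amino",
    "NSEQ  50",
    "CKSUM 9858",
    "HMM        A      C      D      E",
]
header = read_hmm_header(sample)
model_length = int(header["LENG"])  # number of match states in the model
```

Values are kept as strings; numeric fields such as LENG and NSEQ need converting by the caller.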

Data files

None.

Notes

1. Command-line arguments

The following original HMMER options are not supported:
-h         : Use -help to get help information instead.
-f         : Use -strategy option instead.
-g         : Use -strategy option instead.
-s         : Use -strategy option instead.
-A         : Set append: "N" or append: "Y" for "hmmfile" in the ACD file instead.
-F         : Always set (an existing hmmfile will be overwritten).
-amino     : Sequence alignment type is specified via the ACD file.
-nucleic   : Sequence alignment type is specified via the ACD file.
-informat  : All common alignment formats are supported automatically.  
-wblosum   : Use -weighting option to specify the sequence weighting algorithm. 
-wgsc      : Use -weighting option to specify the sequence weighting algorithm. 
-wme       : Use -weighting option to specify the sequence weighting algorithm. 
-wnone     : Use -weighting option to specify the sequence weighting algorithm. 
-wpb       : Use -weighting option to specify the sequence weighting algorithm. 
-wvoronoi  : Use -weighting option to specify the sequence weighting algorithm. 
-verbose   : Use -verbosity instead.

The following additional options are provided:

-weighting : Sequence weighting algorithm. 
-nhmm      : Replaces the original -n option (-n causes problems for GUI developers).
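
When porting existing hmmbuild command lines, the weighting flags listed above map onto single-letter values of -weighting, as given in the qualifier table. A minimal sketch of that translation follows; the dictionary and function names are hypothetical and only the weighting flags are handled.

```python
# Original HMMER weighting flag -> ehmmbuild -weighting value,
# taken from the -weighting entry in the qualifier table.
WEIGHTING_CODES = {
    "wblosum": "B",
    "wgsc": "G",
    "wme": "K",
    "wpb": "W",
    "wvoronoi": "V",
    "wnone": "N",
}


def translate_weighting(hmmer_flag):
    """Return the ehmmbuild arguments that replace an original weighting flag."""
    key = hmmer_flag.lstrip("-")  # accept the flag with one or two leading dashes
    if key not in WEIGHTING_CODES:
        raise ValueError(f"not a recognised weighting flag: {hmmer_flag}")
    return ["-weighting", WEIGHTING_CODES[key]]
```

For example, translate_weighting("-wme") yields the arguments for Krogh/Mitchison maximum entropy weighting.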

2. Installing EMBASSY HMMER

The EMBASSY HMMER package contains "wrapper" applications providing an EMBOSS-style interface to the applications in the original HMMER package version 2.3.2 developed by Sean Eddy. Please read the file INSTALL in the EMBASSY HMMER package distribution for installation instructions.

3. Installing original HMMER

To use EMBASSY HMMER, you will first need to download and install the original HMMER package. Please read the file 00README in the original HMMER package distribution for installation instructions:
WWW home:       http://hmmer.wustl.edu/
Distribution:   ftp://ftp.genetics.wustl.edu/pub/eddy/hmmer/

4. Setting up HMMER

For the EMBASSY HMMER package to work, the directory containing the original HMMER executables *must* be in your path. For example, if your executables were installed to "/usr/local/hmmer/bin", then type (csh syntax):
set path=(/usr/local/hmmer/bin/ $path)
rehash
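
Whether the original executables are actually visible on the path can be checked before running a wrapper. The sketch below uses shutil.which from the Python standard library; the function name missing_executables is hypothetical, and the default list contains only hmmbuild, the program that ehmmbuild wraps.

```python
import shutil


def missing_executables(names=("hmmbuild",), path=None):
    """Return the names that cannot be found on the search path.

    path=None means the current PATH environment variable is searched.
    """
    return [name for name in names if shutil.which(name, path=path) is None]


missing = missing_executables()
if missing:
    print("Not found on PATH:", ", ".join(missing))
```

An empty list means every named executable was found.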

5. Getting help

Please read the Userguide.pdf distributed with the original HMMER and included in the EMBASSY HMMER distribution under the DOCS directory. The first 3 chapters (Introduction, Installation and Tutorial) are particularly useful.

Please read item 1 of these Notes (Command-line arguments) for a description of the differences between the original and EMBASSY HMMER, particularly which application command line options are supported.

References

None.

Warnings

Types of input data

hmmer v2.3.2, and therefore EMBASSY HMMER, is only recommended for use with protein sequences. If you provide a non-protein sequence you will be reprompted for a protein sequence. To accept nucleic acid sequences you must change the < type: "protein" > setting in the application ACD files.

Environment variables

The original hmmer uses BLAST environment variables (below), if defined, to locate files. The EMBASSY HMMER does not.
BLASTDB   location of sequence databases to be searched
BLASTMAT  location of substitution matrices
HMMERDB   location of HMMs

Alignment input

The user must provide the full filename of an alignment for the "alignfile" ACD option, not an indirect reference to a set of sequences, e.g. a USA is NOT acceptable. This is because hmmbuild (which ehmmbuild wraps) requires an alignment and does not support USAs.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.
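
Since the exit status is always 0, a calling script cannot use it to detect failure. One possible workaround is to inspect the output file instead. The sketch below is an assumption-laden heuristic, not an EMBOSS facility: it relies only on the layout shown in the example output (a first line beginning "HMMER2.0" and a final "//" line), and the function name is hypothetical.

```python
from pathlib import Path


def hmm_file_looks_complete(path):
    """Heuristic check that path holds a finished HMMER 2 text file."""
    p = Path(path)
    if not p.is_file():
        return False
    # Ignore blank lines so trailing newlines do not affect the check.
    lines = [ln.strip() for ln in p.read_text().splitlines() if ln.strip()]
    return bool(lines) and lines[0].startswith("HMMER2.0") and lines[-1] == "//"
```

A script would call this on the hmmfile after ehmmbuild returns and treat a False result as a failed run.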

Known bugs

None.

See also

Program name Description
ehmmalign Align sequences to an HMM profile
ehmmcalibrate Calibrate HMM search statistics
ehmmconvert Convert between profile HMM file formats
ehmmemit Generate sequences from a profile HMM
ehmmfetch Retrieve an HMM from an HMM database
ehmmindex Create a binary SSI index for an HMM database
ehmmpfam Search one or more sequences against an HMM database
ehmmsearch Search a sequence database with a profile HMM
libgen Generate discriminating elements from alignments
ohmmalign Align sequences with an HMM
ohmmbuild Build HMM
ohmmcalibrate Calibrate a hidden Markov model
ohmmconvert Convert between HMM formats
ohmmemit Extract HMM sequences
ohmmfetch Extract HMM from a database
ohmmindex Index an HMM database
ohmmpfam Align single sequence with an HMM
ohmmsearch Search sequence database with an HMM

Author(s)

This program is an EMBASSY wrapper to a program written by Sean Eddy as part of his HMMER package.

Jon Ison
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) in the first instance, not to the original author.

History

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None