fdnaml
The assumptions of the present model are:
The ratio of the two purines in the purine replacement pool is the same as their ratio in the overall pool, and similarly for the pyrimidines.
The ratio of transitions to transversions can be set by the user.
The substitution process can be diagrammed as follows. Suppose that you specified A, C, G, and T base frequencies of 0.24, 0.28, 0.27, and 0.21.
First kind of event: determine whether the existing base is a purine or a pyrimidine, then draw from the proper pool:
Purine pool | Pyrimidine pool |
---|---|
0.4706 A | 0.5714 C |
0.5294 G | 0.4286 T |
(ratio is 0.24 : 0.27) | (ratio is 0.28 : 0.21) |
Second kind of event: draw from the overall pool:
Overall pool |
---|
0.24 A |
0.28 C |
0.27 G |
0.21 T |
Note that if the existing base is, say, an A, the first kind of event has a 0.4706 probability of "replacing" it by another A. The second kind of event has a 0.24 chance of replacing it by another A. This rather disconcerting model is used because it has nice mathematical properties that make likelihood calculations far easier. A closely similar, but not precisely identical, model having different rates of transitions and transversions has been used by Hasegawa et al. (1985b). The transition probability formulas for the current model were given (with my permission) by Kishino and Hasegawa (1989). Another explanation is available in the paper by Felsenstein and Churchill (1996).
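The pool probabilities quoted above follow directly from the base frequencies. The following is a minimal Python sketch of that arithmetic only; the function name `replacement_pools` is illustrative and is not part of fdnaml.

```python
def replacement_pools(freqs):
    """Return (purine_pool, pyrimidine_pool, overall_pool) as dicts.

    freqs maps each base (A, C, G, T) to its overall frequency.
    Within each pool the bases keep the same ratio as in the overall pool.
    """
    purines = ("A", "G")
    pyrimidines = ("C", "T")
    pur_total = sum(freqs[b] for b in purines)
    pyr_total = sum(freqs[b] for b in pyrimidines)
    purine_pool = {b: freqs[b] / pur_total for b in purines}
    pyrimidine_pool = {b: freqs[b] / pyr_total for b in pyrimidines}
    return purine_pool, pyrimidine_pool, dict(freqs)


freqs = {"A": 0.24, "C": 0.28, "G": 0.27, "T": 0.21}
pur, pyr, overall = replacement_pools(freqs)
print({b: round(p, 4) for b, p in pur.items()})   # purine pool
print({b: round(p, 4) for b, p in pyr.items()})   # pyrimidine pool
```

With these frequencies an existing A is "replaced" by another A with probability 0.24 / (0.24 + 0.27) in the first kind of event, and with probability 0.24 in the second.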
Note the assumption that we are looking at all sites, including those that have not changed at all. It is important not to restrict attention to some sites based on whether or not they have changed; doing that would bias branch lengths by making them too long, and that in turn would cause the method to misinterpret the meaning of those sites that had changed.
This program uses a Hidden Markov Model (HMM) method of inferring different rates of evolution at different sites. This was described in a paper by me and Gary Churchill (1996). It allows us to specify to the program that there will be a number of different possible evolutionary rates, what the prior probability of occurrence of each is, and what the average length of a patch of sites all having the same rate is. The rates can also be chosen by the program to approximate a Gamma distribution of rates, or a Gamma distribution plus a class of invariant sites. The program computes the likelihood by summing it over all possible assignments of rates to sites, weighting each by its prior probability of occurrence.
For example, if we have used the C and A options (described below) to specify that there are three possible rates of evolution, 1.0, 2.4, and 0.0, that the prior probabilities of a site having these rates are 0.4, 0.3, and 0.3, and that the average patch length (number of consecutive sites with the same rate) is 2.0, the program will sum the likelihood over all possibilities, but giving less weight to those that (say) assign all sites to rate 2.4, or that fail to have consecutive sites that have the same rate.
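The sum over all assignments of rates to sites does not have to be enumerated explicitly: with a Markov chain of rates along the sequence it can be accumulated one site at a time. The following Python sketch illustrates the idea and is not the program's code. It assumes one simple parameterization in which, between adjacent sites, the rate is kept with probability 1 - 1/L and redrawn from the prior with probability 1/L, where L is the patch-length setting. The per-site likelihoods in `site_like` are made-up numbers standing in for the values a tree would give, and all function names are hypothetical.

```python
from itertools import product


def transition_matrix(priors, patch_length):
    """Assumed rate-to-rate transition probabilities between adjacent sites."""
    redraw = 1.0 / patch_length
    n = len(priors)
    return [[(1.0 - redraw) * (i == j) + redraw * priors[j] for j in range(n)]
            for i in range(n)]


def hmm_likelihood(site_like, priors, patch_length):
    """Forward algorithm: sum over all rate assignments, each weighted by its prior."""
    trans = transition_matrix(priors, patch_length)
    n = len(priors)
    forward = [priors[k] * site_like[0][k] for k in range(n)]
    for like in site_like[1:]:
        forward = [sum(forward[i] * trans[i][k] for i in range(n)) * like[k]
                   for k in range(n)]
    return sum(forward)


def brute_force_likelihood(site_like, priors, patch_length):
    """The same sum written out over every assignment (tiny inputs only)."""
    trans = transition_matrix(priors, patch_length)
    total = 0.0
    for assignment in product(range(len(priors)), repeat=len(site_like)):
        weight = priors[assignment[0]] * site_like[0][assignment[0]]
        for s in range(1, len(site_like)):
            prev, cur = assignment[s - 1], assignment[s]
            weight *= trans[prev][cur] * site_like[s][cur]
        total += weight
    return total


priors = [0.4, 0.3, 0.3]      # prior probabilities of rates 1.0, 2.4 and 0.0
site_like = [                 # hypothetical L(site data | rate) values
    [0.020, 0.015, 0.001],
    [0.030, 0.025, 0.000],
    [0.010, 0.012, 0.050],
]
total = hmm_likelihood(site_like, priors, patch_length=2.0)
print(total)
```

The brute-force version makes the weighting visible: an assignment that switches rate at every site is multiplied by small off-diagonal transition probabilities, so it contributes less than one with runs of equal rates.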
The Hidden Markov Model framework for rate variation among sites was independently developed by Yang (1993, 1994, 1995). We have implemented a general scheme for a Hidden Markov Model of rates; we allow the rates and their prior probabilities to be specified arbitrarily by the user, or by a discrete approximation to a Gamma distribution of rates (Yang, 1995), or by a mixture of a Gamma distribution and a class of invariant sites.
This feature effectively removes the artificial assumption that all sites have the same rate, and also means that we need not know in advance the identities of the sites that have a particular rate of evolution.
Another layer of rate variation also is available. The user can assign categories of rates to each site (for example, we might want first, second, and third codon positions in a protein coding sequence to be three different categories). This is done with the categories input file and the C option. We then specify (using the menu) the relative rates of evolution of sites in the different categories. For example, we might specify that first, second, and third positions evolve at relative rates of 1.0, 0.8, and 2.7.
If both user-assigned rate categories and Hidden Markov Model rates are allowed, the program assumes that the actual rate at a site is the product of the user-assigned category rate and the Hidden Markov Model regional rate. (This may not always make perfect biological sense: it would be more natural to assume some upper bound to the rate, as we have discussed in the Felsenstein and Churchill paper). Nevertheless you may want to use both types of rate variation.
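As a small illustration of the product rule just described, the Python sketch below tabulates, for each site, the rate implied by each HMM state. The category string, category rates and HMM rates are illustrative values, and the function name `effective_rates` is hypothetical.

```python
def effective_rates(site_categories, category_rates, hmm_rates):
    """For each site, list the rate under each HMM state:
    user-assigned category rate multiplied by the HMM regional rate."""
    table = []
    for cat in site_categories:
        cat_rate = category_rates[int(cat) - 1]   # categories are numbered from 1
        table.append([cat_rate * r for r in hmm_rates])
    return table


site_categories = "1111112222222"   # one category digit per site
category_rates = [1.0, 2.0]
hmm_rates = [0.264, 1.413, 3.596, 7.086, 12.641]

rates = effective_rates(site_categories, category_rates, hmm_rates)
print(rates[0])    # first site, category 1
print(rates[-1])   # last site, category 2
```

Because the two rates are simply multiplied, a site in a fast category that also falls in a fast HMM region gets a very high rate, which is the lack of an upper bound mentioned above.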
% fdnaml -printdata -ncategories 2 -categories "1111112222222" -rate "1.0 2.0" -gammatype h -nhmmcategories 5 -hmmrates "0.264 1.413 3.596 7.086 12.641" -hmmprobabilities "0.522 0.399 0.076 0.0036 0.000023" -lambda 1.5 -weight "0111111111110"
Estimate nucleotide phylogeny by maximum likelihood
Input (aligned) nucleotide sequence set(s): dnaml.dat
Phylip tree file (optional):
Phylip dnaml program output file [dnaml.fdnaml]:

mulsets: false
datasets : 1
rctgry : true
gama : false
invar : false
numwts : 1
numseqs : 1

ctgry: true
categs : 2
rcategs : 5
auto_: false
freqsfrom : true
global : false
hypstate : false
improve : false
invar : false
jumble : false
njumble : 1
lngths : false
lambda : 1.000000
cv : 1.000000
freqa : 0.000000
freqc : 0.000000
freqg : 0.000000
freqt : 0.000000
outgrno : 1
outgropt: false
trout : true
ttratio : 2.000000
ttr : false
usertree : false
weights: true
printdata : true
progress : true
treeprint: true
interleaved : false

Adding species:
   1. Alpha
   2. Beta
   3. Gamma
   4. Delta
   5. Epsilon

Output written to file "dnaml.fdnaml"
Tree also written onto file "dnaml.treefile"
Done.
Example 2
% fdnaml -printdata -njumble 3 -seed 3
Estimate nucleotide phylogeny by maximum likelihood
Input (aligned) nucleotide sequence set(s): dnaml.dat
Phylip tree file (optional):
Phylip dnaml program output file [dnaml.fdnaml]:

mulsets: false
datasets : 1
rctgry : false
gama : false
invar : false
numwts : 0
numseqs : 1

ctgry: false
categs : 1
rcategs : 1
auto_: false
freqsfrom : true
global : false
hypstate : false
improve : false
invar : false
jumble : true
njumble : 3
lngths : false
lambda : 1.000000
cv : 1.000000
freqa : 0.000000
freqc : 0.000000
freqg : 0.000000
freqt : 0.000000
outgrno : 1
outgropt: false
trout : true
ttratio : 2.000000
ttr : false
usertree : false
weights: false
printdata : true
progress : true
treeprint: true
interleaved : false

Adding species:
   1. Delta
   2. Epsilon
   3. Alpha
   4. Beta
   5. Gamma

Adding species:
   1. Beta
   2. Epsilon
   3. Delta
   4. Alpha
   5. Gamma

Adding species:
   1. Epsilon
   2. Alpha
   3. Gamma
   4. Delta
   5. Beta

Output written to file "dnaml.fdnaml"
Tree also written onto file "dnaml.treefile"
Done.
Estimate nucleotide phylogeny by maximum likelihood
Version: EMBOSS:6.5.0.0

   Standard (Mandatory) qualifiers:
  [-sequence]  seqsetall  File containing one or more sequence alignments
  [-intreefile]  tree  Phylip tree file (optional)
  [-outfile]  outfile  [*.fdnaml] Phylip dnaml program output file

   Additional (Optional) qualifiers (* if not always prompted):
   -ncategories  integer  [1] Number of substitution rate categories (Integer from 1 to 9)
*  -rate  array  Rate for each category
*  -categories  properties  File of substitution rate categories
   -weights  properties  Weights file
*  -lengths  boolean  [N] Use branch lengths from user trees
   -ttratio  float  [2.0] Transition/transversion ratio (Number 0.001 or more)
   -[no]freqsfrom  toggle  [Y] Use empirical base frequencies from seqeunce input
*  -basefreq  array  [0.25 0.25 0.25 0.25] Base frequencies for A C G T/U (use blanks to separate)
   -gammatype  menu  [Constant rate] Rate variation among sites (Values: g (Gamma distributed rates); i (Gamma+invariant sites); h (User defined HMM of rates); n (Constant rate))
*  -gammacoefficient  float  [1] Coefficient of variation of substitution rate among sites (Number 0.001 or more)
*  -ngammacat  integer  [1] Number of categories (1-9) (Integer from 1 to 9)
*  -invarcoefficient  float  [1] Coefficient of variation of substitution rate among sites (Number 0.001 or more)
*  -ninvarcat  integer  [1] Number of categories (1-9) including one for invariant sites (Integer from 1 to 9)
*  -invarfrac  float  [0.0] Fraction of invariant sites (Number from 0.000 to 1.000)
*  -nhmmcategories  integer  [1] Number of HMM rate categories (Integer from 1 to 9)
*  -hmmrates  array  [1.0] HMM category rates
*  -hmmprobabilities  array  [1.0] Probability for each HMM category
*  -adjsite  boolean  [N] Rates at adjacent sites correlated
*  -lambda  float  [1.0] Mean block length of sites having the same rate (Number 1.000 or more)
*  -njumble  integer  [0] Number of times to randomise (Integer 0 or more)
*  -seed  integer  [1] Random number seed between 1 and 32767 (must be odd) (Integer from 1 to 32767)
*  -global  boolean  [N] Global rearrangements
   -outgrno  integer  [0] Species number to use as outgroup (Integer 0 or more)
   -[no]rough  boolean  [Y] Speedier but rougher analysis
   -[no]trout  toggle  [Y] Write out trees to tree file
*  -outtreefile  outfile  [*.fdnaml] Phylip tree output file (optional)
   -printdata  boolean  [N] Print data at start of run
   -[no]progress  boolean  [Y] Print indications of progress of run
   -[no]treeprint  boolean  [Y] Print out tree
   -hypstate  boolean  [N] Reconstruct hypothetical sequence

   Advanced (Unprompted) qualifiers: (none)

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1  integer  Start of each sequence to be used
   -send1  integer  End of each sequence to be used
   -sreverse1  boolean  Reverse (if DNA)
   -sask1  boolean  Ask for begin/end/reverse
   -snucleotide1  boolean  Sequence is nucleotide
   -sprotein1  boolean  Sequence is protein
   -slower1  boolean  Make lower case
   -supper1  boolean  Make upper case
   -scircular1  boolean  Sequence is circular
   -sformat1  string  Input sequence format
   -iquery1  string  Input query fields or ID list
   -ioffset1  integer  Input start position offset
   -sdbname1  string  Database name
   -sid1  string  Entryname
   -ufo1  string  UFO features
   -fformat1  string  Features format
   -fopenfile1  string  Features file name

   "-outfile" associated qualifiers
   -odirectory3  string  Output directory

   "-outtreefile" associated qualifiers
   -odirectory  string  Output directory

   General qualifiers:
   -auto  boolean  Turn off prompts
   -stdout  boolean  Write first file to standard output
   -filter  boolean  Read first file from standard input, write first file to standard output
   -options  boolean  Prompt for standard and additional values
   -debug  boolean  Write debug output to program.dbg
   -verbose  boolean  Report some/full command line options
   -help  boolean  Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose
   -warning  boolean  Report warnings
   -error  boolean  Report errors
   -fatal  boolean  Report fatal errors
   -die  boolean  Report dying program messages
   -version  boolean  Report version number and exit
Qualifier | Type | Description | Allowed values | Default |
---|---|---|---|---|
Standard (Mandatory) qualifiers | | | | |
[-sequence] (Parameter 1) | seqsetall | File containing one or more sequence alignments | Readable sets of sequences | Required |
[-intreefile] (Parameter 2) | tree | Phylip tree file (optional) | Phylogenetic tree | |
[-outfile] (Parameter 3) | outfile | Phylip dnaml program output file | Output file | <*>.fdnaml |
Additional (Optional) qualifiers | | | | |
-ncategories | integer | Number of substitution rate categories | Integer from 1 to 9 | 1 |
-rate | array | Rate for each category | List of floating point numbers | |
-categories | properties | File of substitution rate categories | Property value(s) | |
-weights | properties | Weights file | Property value(s) | |
-lengths | boolean | Use branch lengths from user trees | Boolean value Yes/No | No |
-ttratio | float | Transition/transversion ratio | Number 0.001 or more | 2.0 |
-[no]freqsfrom | toggle | Use empirical base frequencies from sequence input | Toggle value Yes/No | Yes |
-basefreq | array | Base frequencies for A C G T/U (use blanks to separate) | List of floating point numbers | 0.25 0.25 0.25 0.25 |
-gammatype | list | Rate variation among sites | g (Gamma distributed rates); i (Gamma+invariant sites); h (User defined HMM of rates); n (Constant rate) | Constant rate |
-gammacoefficient | float | Coefficient of variation of substitution rate among sites | Number 0.001 or more | 1 |
-ngammacat | integer | Number of categories (1-9) | Integer from 1 to 9 | 1 |
-invarcoefficient | float | Coefficient of variation of substitution rate among sites | Number 0.001 or more | 1 |
-ninvarcat | integer | Number of categories (1-9) including one for invariant sites | Integer from 1 to 9 | 1 |
-invarfrac | float | Fraction of invariant sites | Number from 0.000 to 1.000 | 0.0 |
-nhmmcategories | integer | Number of HMM rate categories | Integer from 1 to 9 | 1 |
-hmmrates | array | HMM category rates | List of floating point numbers | 1.0 |
-hmmprobabilities | array | Probability for each HMM category | List of floating point numbers | 1.0 |
-adjsite | boolean | Rates at adjacent sites correlated | Boolean value Yes/No | No |
-lambda | float | Mean block length of sites having the same rate | Number 1.000 or more | 1.0 |
-njumble | integer | Number of times to randomise | Integer 0 or more | 0 |
-seed | integer | Random number seed between 1 and 32767 (must be odd) | Integer from 1 to 32767 | 1 |
-global | boolean | Global rearrangements | Boolean value Yes/No | No |
-outgrno | integer | Species number to use as outgroup | Integer 0 or more | 0 |
-[no]rough | boolean | Speedier but rougher analysis | Boolean value Yes/No | Yes |
-[no]trout | toggle | Write out trees to tree file | Toggle value Yes/No | Yes |
-outtreefile | outfile | Phylip tree output file (optional) | Output file | <*>.fdnaml |
-printdata | boolean | Print data at start of run | Boolean value Yes/No | No |
-[no]progress | boolean | Print indications of progress of run | Boolean value Yes/No | Yes |
-[no]treeprint | boolean | Print out tree | Boolean value Yes/No | Yes |
-hypstate | boolean | Reconstruct hypothetical sequence | Boolean value Yes/No | No |
Advanced (Unprompted) qualifiers | | | | |
(none) | | | | |
Associated qualifiers | | | | |
"-sequence" associated seqsetall qualifiers | | | | |
-sbegin1 -sbegin_sequence | integer | Start of each sequence to be used | Any integer value | 0 |
-send1 -send_sequence | integer | End of each sequence to be used | Any integer value | 0 |
-sreverse1 -sreverse_sequence | boolean | Reverse (if DNA) | Boolean value Yes/No | N |
-sask1 -sask_sequence | boolean | Ask for begin/end/reverse | Boolean value Yes/No | N |
-snucleotide1 -snucleotide_sequence | boolean | Sequence is nucleotide | Boolean value Yes/No | N |
-sprotein1 -sprotein_sequence | boolean | Sequence is protein | Boolean value Yes/No | N |
-slower1 -slower_sequence | boolean | Make lower case | Boolean value Yes/No | N |
-supper1 -supper_sequence | boolean | Make upper case | Boolean value Yes/No | N |
-scircular1 -scircular_sequence | boolean | Sequence is circular | Boolean value Yes/No | N |
-sformat1 -sformat_sequence | string | Input sequence format | Any string | |
-iquery1 -iquery_sequence | string | Input query fields or ID list | Any string | |
-ioffset1 -ioffset_sequence | integer | Input start position offset | Any integer value | 0 |
-sdbname1 -sdbname_sequence | string | Database name | Any string | |
-sid1 -sid_sequence | string | Entryname | Any string | |
-ufo1 -ufo_sequence | string | UFO features | Any string | |
-fformat1 -fformat_sequence | string | Features format | Any string | |
-fopenfile1 -fopenfile_sequence | string | Features file name | Any string | |
"-outfile" associated outfile qualifiers | | | | |
-odirectory3 -odirectory_outfile | string | Output directory | Any string | |
"-outtreefile" associated outfile qualifiers | | | | |
-odirectory | string | Output directory | Any string | |
General qualifiers | | | | |
-auto | boolean | Turn off prompts | Boolean value Yes/No | N |
-stdout | boolean | Write first file to standard output | Boolean value Yes/No | N |
-filter | boolean | Read first file from standard input, write first file to standard output | Boolean value Yes/No | N |
-options | boolean | Prompt for standard and additional values | Boolean value Yes/No | N |
-debug | boolean | Write debug output to program.dbg | Boolean value Yes/No | N |
-verbose | boolean | Report some/full command line options | Boolean value Yes/No | Y |
-help | boolean | Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose | Boolean value Yes/No | N |
-warning | boolean | Report warnings | Boolean value Yes/No | Y |
-error | boolean | Report errors | Boolean value Yes/No | Y |
-fatal | boolean | Report fatal errors | Boolean value Yes/No | Y |
-die | boolean | Report dying program messages | Boolean value Yes/No | Y |
-version | boolean | Report version number and exit | Boolean value Yes/No | N |
    5   13
Alpha     AACGTGGCCAAAT
Beta      AAGGTCGCCAAAC
Gamma     CATTTCGTCACAA
Delta     GGTATTTCGGCCT
Epsilon   GGGATCTCGGCCC
If the R (HMM rates) option is used a table of the relative rates of expected substitution at each category of sites is printed, as well as the probabilities of each of those rates.
There then follow the data sequences, if the user has selected the menu option to print them out, with the base sequences printed in groups of ten bases along the lines of the Genbank and EMBL formats. The trees found are printed as an unrooted tree topology (possibly rooted by outgroup if so requested). The internal nodes are numbered arbitrarily for the sake of identification. The number of trees evaluated so far and the log likelihood of the tree are also given. Note that the trees printed out have a trifurcation at the base. The branch lengths in the diagram are roughly proportional to the estimated branch lengths, except that very short branches are printed out at least three characters in length so that the connections can be seen.
A table is printed showing the length of each tree segment (in units of expected nucleotide substitutions per site), as well as (very) rough confidence limits on their lengths. If a confidence limit is negative, this indicates that rearrangement of the tree in that region is not excluded, while if both limits are positive, rearrangement is still not necessarily excluded because the variance calculation on which the confidence limits are based results in an underestimate, which makes the confidence limits too narrow.
In addition to the confidence limits, the program performs a crude Likelihood Ratio Test (LRT) for each branch of the tree. The program computes the ratio of likelihoods with and without this branch length forced to zero length. This is done by comparing the likelihoods while changing only that branch length. A truly correct LRT would force that branch length to zero and also allow the other branch lengths to adjust to that. The result would be a likelihood ratio closer to 1. Therefore the present LRT will err on the side of being too significant. YOU ARE WARNED AGAINST TAKING IT TOO SERIOUSLY. If you want to get a better likelihood curve for a branch length you can do multiple runs with different prespecified lengths for that branch, as discussed above in the discussion of the L option.
One should also realize that if you are looking not at a previously-chosen branch but at all branches, that you are seeing the results of multiple tests. With 20 tests, one is expected to reach significance at the P = .05 level purely by chance. You should therefore use a much more conservative significance level, such as .05 divided by the number of tests. The significance of these tests is shown by printing asterisks next to the confidence interval on each branch length. It is important to keep in mind that both the confidence limits and the tests are very rough and approximate, and probably indicate more significance than they should. Nevertheless, maximum likelihood is one of the few methods that can give you any indication of its own error; most other methods simply fail to warn the user that there is any error! (In fact, whole philosophical schools of taxonomists exist whose main point seems to be that there isn't any error, that the "most parsimonious" tree is the best tree by definition and that's that).
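The arithmetic behind this advice is short enough to write down. The Python sketch below, with hypothetical function names, shows the expected number of chance "significant" results and the more conservative per-test level obtained by dividing by the number of tests.

```python
def expected_false_positives(alpha, n_tests):
    """Expected number of tests significant purely by chance at level alpha."""
    return alpha * n_tests


def conservative_level(alpha, n_tests):
    """Per-test significance level: alpha divided by the number of tests."""
    return alpha / n_tests


# An unrooted bifurcating tree of 5 species has 2*5 - 3 = 7 branches,
# so looking at every branch means 7 tests.
n_branches = 2 * 5 - 3
print(expected_false_positives(0.05, 20))
print(conservative_level(0.05, n_branches))
```

For a five-species tree this puts the per-branch level near 0.007, so a single asterisk (P < 0.05) on one branch out of seven is weak evidence on its own.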
The log likelihood printed out with the final tree can be used to perform various likelihood ratio tests. One can, for example, compare runs with different values of the expected transition/transversion ratio to determine which value is the maximum likelihood estimate, and what is the allowable range of values (using a likelihood ratio test, which you will find described in mathematical statistics books). One could also estimate the base frequencies in the same way. Both of these, particularly the latter, require multiple runs of the program to evaluate different possible values, and this might get expensive.
If the U (User Tree) option is used and more than one tree is supplied, and the program is not told to assume autocorrelation between the rates at different sites, the program also performs a statistical test of each of these trees against the one with highest likelihood. If there are two user trees, the test done is one which is due to Kishino and Hasegawa (1989), a version of a test originally introduced by Templeton (1983). In this implementation it uses the mean and variance of log-likelihood differences between trees, taken across sites. If the two trees' means are more than 1.96 standard deviations different then the trees are declared significantly different. This use of the empirical variance of log-likelihood differences is more robust and nonparametric than the classical likelihood ratio test, and may to some extent compensate for any lack of realism in the model underlying this program.
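The two-tree comparison described above can be sketched in a few lines of Python. This is an illustration of the idea under simplifying assumptions (all sites weighted equally, no rate autocorrelation), not the program's implementation; the function name `kh_test` and the per-site log-likelihoods are hypothetical.

```python
import math


def kh_test(site_lnl_a, site_lnl_b, critical=1.96):
    """Compare two trees from their per-site log-likelihoods.

    Returns (total difference, standard deviation of that total,
    whether the difference exceeds `critical` standard deviations).
    """
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Empirical variance of the per-site differences.
    var_site = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    total = sum(diffs)
    sd_total = math.sqrt(n * var_site)   # SD of the sum over n sites
    if sd_total == 0.0:
        return total, 0.0, False         # identical site patterns: no evidence
    return total, sd_total, abs(total / sd_total) > critical


# Hypothetical per-site log-likelihoods for two user trees.
tree_a = [-1.0, -2.0, -1.5, -3.0]
tree_b = [-1.2, -2.1, -1.9, -3.3]
total_diff, sd, significant = kh_test(tree_a, tree_b)
print(total_diff, sd, significant)
```

Dividing the summed difference by the standard deviation of the sum is equivalent to dividing the mean difference by its standard error, which is the comparison against 1.96 described in the text.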
If there are more than two trees, the test done is an extension of the KHT test, due to Shimodaira and Hasegawa (1999). They pointed out that a correction for the number of trees was necessary, and they introduced a resampling method to make this correction. In the version used here the variances and covariances of the sum of log likelihoods across sites are computed for all pairs of trees. To test whether the difference between each tree and the best one is larger than could have been expected if they all had the same expected log-likelihood, log-likelihoods for all trees are sampled with these covariances and equal means (Shimodaira and Hasegawa's "least favorable hypothesis"), and a P value is computed from the fraction of times the difference between the tree's value and the highest log-likelihood exceeds that actually observed. Note that this sampling needs random numbers, and so the program will prompt the user for a random number seed if one has not already been supplied. With the two-tree KHT test no random numbers are used.
In either the KHT or the SH test the program prints out a table of the log-likelihoods of each tree, the differences of each from the highest one, the variance of that quantity as determined by the log-likelihood differences at individual sites, and a conclusion as to whether that tree is or is not significantly worse than the best one. However the test is not available if we assume that there is autocorrelation of rates at neighboring sites (option A) and is not done in those cases.
The branch lengths printed out are scaled in terms of expected numbers of substitutions, counting both transitions and transversions but not replacements of a base by itself, and scaled so that the average rate of change, averaged over all sites analyzed, is set to 1.0 if there are multiple categories of sites. This means that whether or not there are multiple categories of sites, the expected fraction of change for very small branches is equal to the branch length. Of course, when a branch is twice as long this does not mean that there will be twice as much net change expected along it, since some of the changes occur in the same site and overlie or even reverse each other. The branch length estimates here are in terms of the expected underlying numbers of changes. That means that a branch of length 0.26 is 26 times as long as one which would show a 1% difference between the nucleotide sequences at the beginning and end of the branch. But we would not expect the sequences at the beginning and end of the branch to be 26% different, as there would be some overlaying of changes.
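To see why a branch of length 0.26 does not give sequences that are 26% different, it helps to work through a simpler model. The Python sketch below uses the Jukes-Cantor model, which is an assumption made for illustration only: fdnaml's own model, with unequal base frequencies and a transition/transversion ratio, gives somewhat different numbers. Under Jukes-Cantor the expected fraction of differing sites after d substitutions per site is 3/4 (1 - exp(-4d/3)).

```python
import math


def jc_expected_difference(branch_length):
    """Expected fraction of sites that differ across a branch, Jukes-Cantor model."""
    return 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))


for d in (0.01, 0.26, 0.52):
    print(d, round(jc_expected_difference(d), 4))
```

For very short branches the expected difference is close to the branch length itself, while at 0.26 it falls to roughly 0.22 under this model because some changes overlie earlier ones.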
Confidence limits on the branch lengths are also given. Of course a negative value of the branch length is meaningless, and a confidence limit overlapping zero simply means that the branch length is not necessarily significantly different from zero. Because of limitations of the numerical algorithm, branch length estimates of zero will often print out as small numbers such as 0.00001. If you see a branch length that small, it is really estimated to be of zero length. Note that versions 2.7 and earlier of this program printed out the branch lengths in terms of expected probability of change, so that they were scaled differently.
Another possible source of confusion is the existence of negative values for the log likelihood. This is not really a problem; the log likelihood is not a probability but the logarithm of a probability. When it is negative it simply means that the corresponding probability is less than one (since we are seeing its logarithm). The log likelihood is maximized by being made more positive: -30.23 is worse than -29.14.
At the end of the output, if the R option is in effect with multiple HMM rates, the program will print a list of what site categories contributed the most to the final likelihood. This combination of HMM rate categories need not have contributed a majority of the likelihood, just a plurality. Still, it will be helpful as a view of where the program infers that the higher and lower rates are. Note that the use in this calculation of the prior probabilities of different rates, and the average patch length, gives this inference a "smoothed" appearance: some other combination of rates might make a greater contribution to the likelihood, but be discounted because it conflicts with this prior information. See the example output below to see what this printout of rate categories looks like. A second list will also be printed out, showing for each site which rate accounted for the highest fraction of the likelihood. If the fraction of the likelihood accounted for is less than 95%, a dot is printed instead.
Option 3 in the menu controls whether the tree is printed out into the output file. This is on by default, and usually you will want to leave it this way. However for runs with multiple data sets such as bootstrapping runs, you will primarily be interested in the trees which are written onto the output tree file, rather than the trees printed on the output file. To keep the output file from becoming too large, it may be wisest to use option 3 to prevent trees being printed onto the output file.
Option 4 in the menu controls whether the tree estimated by the program is written onto a tree file. The default name of this output tree file is "outtree". If the U option is in effect, all the user-defined trees are written to the output tree file.
Option 5 in the menu controls whether ancestral states are estimated at each node in the tree. If it is in effect, a table of ancestral sequences is printed out (including the sequences in the tip species which are the input sequences). In that table, if a site has a base which accounts for more than 95% of the likelihood, it is printed in capital letters (A rather than a). If the best nucleotide accounts for less than 50% of the likelihood, the program prints out an ambiguity code (such as M for "A or C") for the set of nucleotides which, taken together, account for more than half of the likelihood. The ambiguity codes are listed in the sequence programs documentation file. One limitation of the current version of the program is that when there are multiple HMM rates (option R) the reconstructed nucleotides are based on only the single assignment of rates to sites which accounts for the largest amount of the likelihood. Thus the assessment of 95% of the likelihood, in tabulating the ancestral states, refers to 95% of the likelihood that is accounted for by that particular combination of rates.
Nucleic acid sequence Maximum Likelihood method, version 3.69.650

 5 species,  13  sites

    Site categories are:

             1111112222 222

    Sites are weighted as follows:

             01111 11111 110

Name            Sequences
----            ---------

Alpha        AACGTGGCCA AAT
Beta         AAGGTCGCCA AAC
Gamma        CATTTCGTCA CAA
Delta        GGTATTTCGG CCT
Epsilon      GGGATCTCGG CCC

Empirical Base Frequencies:

   A       0.23636
   C       0.29091
   G       0.25455
  T(U)     0.21818

Transition/transversion ratio =   2.000000

State in HMM    Rate of change    Probability

        1           0.264            0.522
        2           1.413            0.399
        3           3.596            0.076
        4           7.086            0.0036
        5          12.641            0.000023

Site category   Rate of change

        1           1.000
        2           2.000

                                                              +Epsilon
     +--------------------------------------------------------3
  +--2                                                        +-Delta
  |  |
  |  +Beta
  |
  1------Gamma
  |
  +-Alpha

remember: this is an unrooted tree!

Ln Likelihood =   -57.87892

 Between        And            Length      Approx. Confidence Limits
 -------        ---            ------      ------- ---------- ------

     1          Alpha             0.26766     (     zero,     0.80513) *
     1             2              0.04687     (     zero,     0.48388)
     2             3              7.59821     (     zero,    22.01485) **
     3          Epsilon           0.00006     (     zero,     0.46205)
     3          Delta             0.27319     (     zero,     0.73380) **
     2          Beta              0.00006     (     zero,     0.44052)
     1          Gamma             0.95677     (     zero,     2.46186) **

     *  = significantly positive, P < 0.05
     ** = significantly positive, P < 0.01

Combination of categories that contributes the most to the likelihood:

             1132121111 211

Most probable category at each site if > 0.95 probability ("." otherwise)

             .......... ...
(((Epsilon:0.00006,Delta:0.27319):7.59821,Beta:0.00006):0.04687,Gamma:0.95677,Alpha:0.26766);
Nucleic acid sequence Maximum Likelihood method, version 3.69.650

 5 species,  13  sites

Name            Sequences
----            ---------

Alpha        AACGTGGCCA AAT
Beta         AAGGTCGCCA AAC
Gamma        CATTTCGTCA CAA
Delta        GGTATTTCGG CCT
Epsilon      GGGATCTCGG CCC

Empirical Base Frequencies:

   A       0.24615
   C       0.29231
   G       0.24615
  T(U)     0.21538

Transition/transversion ratio =   2.000000

                                                  +Epsilon
     +--------------------------------------------1
  +--2                                            +--------Delta
  |  |
  |  +Beta
  |
  3------------------------------Gamma
  |
  +-----Alpha

remember: this is an unrooted tree!

Ln Likelihood =   -72.25088

 Between        And            Length      Approx. Confidence Limits
 -------        ---            ------      ------- ---------- ------

     3          Alpha             0.20745     (     zero,     0.56578)
     3             2              0.09408     (     zero,     0.40912)
     2             1              1.51296     (     zero,     3.31130) **
     1          Epsilon           0.00006     (     zero,     0.34299)
     1          Delta             0.28137     (     zero,     0.62654) **
     2          Beta              0.00006     (     zero,     0.32900)
     3          Gamma             1.01651     (     zero,     2.33178) **

     *  = significantly positive, P < 0.05
     ** = significantly positive, P < 0.01
(((Epsilon:0.00006,Delta:0.28137):1.51296,Beta:0.00006):0.09408,Gamma:1.01651,Alpha:0.20745);
Program name | Description |
---|---|
distmat | Create a distance matrix from a multiple sequence alignment |
ednacomp | DNA compatibility algorithm |
ednadist | Nucleic acid sequence distance matrix program |
ednainvar | Nucleic acid sequence invariants method |
ednaml | Phylogenies from nucleic acid maximum likelihood |
ednamlk | Phylogenies from nucleic acid maximum likelihood with clock |
ednapars | DNA parsimony algorithm |
ednapenny | Penny algorithm for DNA |
eprotdist | Protein distance algorithm |
eprotpars | Protein parsimony algorithm |
erestml | Restriction site maximum likelihood method |
eseqboot | Bootstrapped sequences algorithm |
fdiscboot | Bootstrapped discrete sites algorithm |
fdnacomp | DNA compatibility algorithm |
fdnadist | Nucleic acid sequence distance matrix program |
fdnainvar | Nucleic acid sequence invariants method |
fdnamlk | Estimates nucleotide phylogeny by maximum likelihood |
fdnamove | Interactive DNA parsimony |
fdnapars | DNA parsimony algorithm |
fdnapenny | Penny algorithm for DNA |
fdolmove | Interactive Dollo or polymorphism parsimony |
ffreqboot | Bootstrapped genetic frequencies algorithm |
fproml | Protein phylogeny by maximum likelihood |
fpromlk | Protein phylogeny by maximum likelihood |
fprotdist | Protein distance algorithm |
fprotpars | Protein parsimony algorithm |
frestboot | Bootstrapped restriction sites algorithm |
frestdist | Calculate distance matrix from restriction sites or fragments |
frestml | Restriction site maximum likelihood method |
fseqboot | Bootstrapped sequences algorithm |
fseqbootall | Bootstrapped sequences algorithm |
Please report all bugs to the EMBOSS bug team (emboss-bug © emboss.open-bio.org) not to the original author.
Converted (August 2004) to an EMBASSY program by the EMBOSS team.