|   | fseqbootall | 
To carry out a bootstrap (or jackknife, or permutation test) with some method in the package, you may need to use three programs. First, you need to run SEQBOOT to take the original data set and produce a large number of bootstrapped or jackknifed data sets (somewhere between 100 and 1000 is usually adequate). Then you need to find the phylogeny estimate for each of these, using the particular method of interest. For example, if you were using DNAPARS you would first run SEQBOOT and make a file with 100 bootstrapped data sets. Then you would give this file the proper name so that it becomes the input file for DNAPARS. Running DNAPARS with the M (Multiple Data Sets) menu choice and telling it to expect 100 data sets, you would generate a big output file as well as a treefile with the trees from the 100 data sets. This treefile could be renamed so that it would serve as the input for CONSENSE. When CONSENSE is run, the majority rule consensus tree will result, showing the outcome of the analysis.
This may sound tedious, but the run of CONSENSE is fast, and that of SEQBOOT is fairly fast, so that it will not actually take any longer than a run of a single bootstrap program with the same original data and the same number of replicates. This is not very hard and allows bootstrapping or jackknifing on many of the methods in this package. The same steps are necessary with all of them. When things are done this way, some of the intermediate files (the tree file from the DNAPARS run, for example) can be used to summarize the results of the bootstrap in ways other than the majority rule consensus method.
If you are using the Distance Matrix programs, you will have to add one extra step to this, calculating distance matrices from each of the replicate data sets, using DNADIST or GENDIST. So (for example) you would run SEQBOOT, then run DNADIST using the output of SEQBOOT as its input, then run (say) NEIGHBOR using the output of DNADIST as its input, and then run CONSENSE using the tree file from NEIGHBOR as its input.
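The resampling step itself can be sketched in a few lines. The following is a minimal illustration of bootstrapping alignment columns, not the SEQBOOT implementation: the function names are hypothetical, Python's own random number generator is used, and so the seed will not reproduce the replicates that the program writes. Sampled columns are sorted back into their original order, which is what the example output of this program appears to do.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns sampled with replacement."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    # Draw n_sites column indices with replacement, then sort them so that
    # the sampled columns appear in their original left-to-right order.
    cols = sorted(rng.randrange(n_sites) for _ in range(n_sites))
    return {name: "".join(alignment[name][c] for c in cols) for name in names}

def make_replicates(alignment, reps=100, seed=3):
    """Produce `reps` bootstrap replicates from one seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(reps)]

# The five-species, six-site example data set used on this page.
alignment = {
    "Alpha": "AACAAC",
    "Beta": "AACCCC",
    "Gamma": "ACCAAC",
    "Delta": "CCACCA",
    "Epsilon": "CCAAAC",
}

replicates = make_replicates(alignment, reps=100, seed=3)
print(len(replicates), "replicates; first one:")
for name, seq in replicates[0].items():
    print(f"{name:<10} {seq}")
```

Each replicate would then be passed to the tree-building program, and the resulting trees to the consensus program, as described above.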
The resampling methods available are:
| XML alignment format | URL | Description | 
|---|---|---|
| Andrew Rambaut's BEAST XML format | http://evolve.zoo.ox.ac.uk/beast/introXML.html and http://evolve.zoo.ox.ac.uk/beast/referenindex.html | A format for alignments. There is also a format for phylogenies described there. | 
| MSAML M | http://xml.coverpages.org/msaml-desc-dec.html | Defined by Paul Gordon of University of Calgary. See his big list of molecular biology XML projects. | 
| BSML | http://www.bsml.org/resources/default.asp | Bioinformatic Sequence Markup Language includes a multiple sequence alignment XML format | 
| 
% fseqbootall -seed 3
Bootstrapped sequences algorithm
Input (aligned) sequence set: seqboot.dat
Phylip seqboot program output file [seqboot.fseqbootall]:
bootstrap: true
jackknife: false
permute: false
lockhart: false
ild: false
justwts: false
completed replicate number 10
completed replicate number 20
completed replicate number 30
completed replicate number 40
completed replicate number 50
completed replicate number 60
completed replicate number 70
completed replicate number 80
completed replicate number 90
completed replicate number 100
Output written to file "seqboot.fseqbootall"
Done.
 | 
| 
   Standard (Mandatory) qualifiers:
  [-infilesequences]   seqset     (Aligned) sequence set filename and optional
                                  format, or reference (input USA)
  [-outfile]           outfile    [*.fseqbootall] Phylip seqboot program
                                  output file
   Additional (Optional) qualifiers (* if not always prompted):
   -categories         properties File of input categories
   -mixfile            properties File of mixtures
   -ancfile            properties File of ancestors
   -weights            properties Weights file
   -factorfile         properties Factors file
   -datatype           menu       [s] Choose the datatype (Values: s
                                  (Molecular sequences); m (Discrete
                                  Morphology); r (Restriction Sites); g (Gene
                                  Frequencies))
   -test               menu       [b] Choose test (Values: b (Bootstrap); j
                                  (Jackknife); c (Permute species for each
                                  character); o (Permute character order); s
                                  (Permute within species); r (Rewrite data))
*  -regular            toggle     [N] Altered sampling fraction
*  -fracsample         float      [100.0] Samples as percentage of sites
                                  (Number from 0.100 to 100.000)
*  -rewriteformat      menu       [p] Output format (Values: p (PHYLIP); n
                                  (NEXUS); x (XML))
*  -seqtype            menu       [d] Output format (Values: d (dna); p
                                  (protein); r (rna))
*  -morphseqtype       menu       [p] Output format (Values: p (PHYLIP); n
                                  (NEXUS))
*  -blocksize          integer    [1] Block size for bootstraping (Integer 1
                                  or more)
*  -reps               integer    [100] How many replicates (Integer 1 or
                                  more)
*  -justweights        menu       [d] Write out datasets or just weights
                                  (Values: d (Datasets); w (Weights))
*  -enzymes            boolean    [N] Is the number of enzymes present in
                                  input file
*  -all                boolean    [N] All alleles present at each locus
*  -seed               integer    [1] Random number seed between 1 and 32767
                                  (must be odd) (Integer from 1 to 32767)
   -printdata          boolean    [N] Print out the data at start of run
*  -[no]dotdiff        boolean    [Y] Use dot-differencing
   -[no]progress       boolean    [Y] Print indications of progress of run
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:
   "-infilesequences" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name
   "-outfile" associated qualifiers
   -odirectory2        string     Output directory
   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
 | 
| Qualifier | Description | Allowed values | Default |
|---|---|---|---|
| Standard (Mandatory) qualifiers | | | |
| [-infilesequences] (Parameter 1) | (Aligned) sequence set filename and optional format, or reference (input USA) | Readable set of sequences | Required |
| [-outfile] (Parameter 2) | Phylip seqboot program output file | Output file | <*>.fseqbootall |
| Additional (Optional) qualifiers | | | |
| -categories | File of input categories | Property value(s) | |
| -mixfile | File of mixtures | Property value(s) | |
| -ancfile | File of ancestors | Property value(s) | |
| -weights | Weights file | Property value(s) | |
| -factorfile | Factors file | Property value(s) | |
| -datatype | Choose the datatype | s (Molecular sequences); m (Discrete Morphology); r (Restriction Sites); g (Gene Frequencies) | s |
| -test | Choose test | b (Bootstrap); j (Jackknife); c (Permute species for each character); o (Permute character order); s (Permute within species); r (Rewrite data) | b |
| -regular | Altered sampling fraction | Toggle value Yes/No | No |
| -fracsample | Samples as percentage of sites | Number from 0.100 to 100.000 | 100.0 |
| -rewriteformat | Output format | p (PHYLIP); n (NEXUS); x (XML) | p |
| -seqtype | Output format | d (dna); p (protein); r (rna) | d |
| -morphseqtype | Output format | p (PHYLIP); n (NEXUS) | p |
| -blocksize | Block size for bootstrapping | Integer 1 or more | 1 |
| -reps | How many replicates | Integer 1 or more | 100 |
| -justweights | Write out datasets or just weights | d (Datasets); w (Weights) | d |
| -enzymes | Is the number of enzymes present in input file | Boolean value Yes/No | No |
| -all | All alleles present at each locus | Boolean value Yes/No | No |
| -seed | Random number seed between 1 and 32767 (must be odd) | Integer from 1 to 32767 | 1 |
| -printdata | Print out the data at start of run | Boolean value Yes/No | No |
| -[no]dotdiff | Use dot-differencing | Boolean value Yes/No | Yes |
| -[no]progress | Print indications of progress of run | Boolean value Yes/No | Yes |
| Advanced (Unprompted) qualifiers | | | |
| (none) | | | |
| 
    5    6
Alpha     AACAAC
Beta      AACCCC
Gamma     ACCAAC
Delta     CCACCA
Epsilon   CCAAAC
 | 
The Factors option causes characters that belong to the same factor to be resampled together. If (say) three adjacent characters all have the same factors character, so that they are all understood to be recoding one multistate character, they will be resampled together as a group.
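As a rough illustration of resampling by factor, the sketch below groups adjacent sites that share a factors symbol and then samples whole groups with replacement. It rests on several assumptions: the factors string `"111223"` and both function names are hypothetical, and the sketch simply draws as many groups as exist, so replicate lengths can differ from the original when groups are of unequal size. How SEQBOOT itself handles that case is not covered here.

```python
import random
from itertools import groupby

def factor_groups(factors):
    """Split site indices into runs of adjacent sites with the same symbol."""
    groups = []
    start = 0
    for _, run in groupby(factors):
        length = len(list(run))
        groups.append(list(range(start, start + length)))
        start += length
    return groups

def bootstrap_by_factor(alignment, factors, rng):
    """Resample whole factor groups with replacement (one replicate)."""
    groups = factor_groups(factors)
    # One draw per group; sorting keeps sampled groups in original order.
    picks = sorted(rng.randrange(len(groups)) for _ in groups)
    cols = [c for g in picks for c in groups[g]]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

alignment = {
    "Alpha": "AACAAC",
    "Beta": "AACCCC",
    "Gamma": "ACCAAC",
    "Delta": "CCACCA",
    "Epsilon": "CCAAAC",
}

# Hypothetical factors: sites 1-3 form one character, sites 4-5 another,
# and site 6 stands alone.
print(factor_groups("111223"))
replicate = bootstrap_by_factor(alignment, "111223", random.Random(3))
for name, seq in replicate.items():
    print(f"{name:<10} {seq}")
```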
The order of species in the data sets in the output file will vary randomly. This is a precaution to keep any result that is sensitive to the input order of species from showing up repeatedly in the analyses of these data and thus appearing to have evidence in its favor.
The numerical options 1 and 2 in the menu (in this EMBOSS version, the -printdata and -[no]progress qualifiers) also affect the output file. If 1 is chosen (it is off by default) the program will print the original input data set on the output file before the resampled data sets. I cannot actually see why anyone would want to do this. Option 2 toggles the feature (on by default) that prints out up to 20 times during the resampling process a notification that the program has completed a certain number of data sets. Thus if 100 resampled data sets are being produced, every 5 data sets a line is printed saying which data set has just been completed. This option should be turned off if the program is running in background and silence is desirable. At the end of execution the program will always (whatever the setting of option 2) print a couple of lines saying that output has been written to the output file.
| 
    5     6
Alpha      AAACCA
Beta       AAACCC
Gamma      ACCCCA
Delta      CCCAAC
Epsilon    CCCAAA
    5     6
Alpha      AAACAA
Beta       AAACCC
Gamma      ACCCAA
Delta      CCCACC
Epsilon    CCCAAA
    5     6
Alpha      AAAAAC
Beta       AAACCC
Gamma      AACAAC
Delta      CCCCCA
Epsilon    CCCAAC
    5     6
Alpha      CCCCCA
Beta       CCCCCC
Gamma      CCCCCA
Delta      AAAAAC
Epsilon    AAAAAA
    5     6
Alpha      AAAACC
Beta       AAACCC
Gamma      AACACC
Delta      CCCCAA
Epsilon    CCCACC
    5     6
Alpha      AAAACC
Beta       ACCCCC
Gamma      AAAACC
Delta      CCCCAA
Epsilon    CAAACC
    5     6
Alpha      AACCAA
Beta       AACCCC
Gamma      ACCCAA
Delta      CCAACC
Epsilon    CCAAAA
    5     6
Alpha      AAAACC
Beta       ACCCCC
Gamma      AAAACC
Delta      CCCCAA
Epsilon    CAAACC
    5     6
Alpha      AACACC
  [Part of this file has been deleted for brevity]
Gamma      ACAAAA
Delta      CCCCCC
Epsilon    CCAAAA
    5     6
Alpha      AACAAC
Beta       AACCCC
Gamma      AACAAC
Delta      CCACCA
Epsilon    CCAAAC
    5     6
Alpha      AACAAA
Beta       AACCCC
Gamma      CCCAAA
Delta      CCACCC
Epsilon    CCAAAA
    5     6
Alpha      ACAAAA
Beta       ACCCCC
Gamma      CCAAAA
Delta      CACCCC
Epsilon    CAAAAA
    5     6
Alpha      CAAAAA
Beta       CCCCCC
Gamma      CAAAAA
Delta      ACCCCC
Epsilon    AAAAAA
    5     6
Alpha      CAACCC
Beta       CCCCCC
Gamma      CAACCC
Delta      ACCAAA
Epsilon    AAACCC
    5     6
Alpha      ACAACC
Beta       ACCCCC
Gamma      ACAACC
Delta      CACCAA
Epsilon    CAAACC
    5     6
Alpha      AAAAAA
Beta       AAAAAC
Gamma      ACCCCA
Delta      CCCCCC
Epsilon    CCCCCA
    5     6
Alpha      AACAAC
Beta       AACCCC
Gamma      CCCAAC
Delta      CCACCA
Epsilon    CCAAAC
 | 
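For anyone who wants to inspect the replicates directly rather than pass them to another program, the output file above can be split back into individual data sets. This is a small sketch under stated assumptions: the function name `read_replicates` is hypothetical, species names are assumed to contain no whitespace, and each sequence is assumed to fit on one line as in the example shown here.

```python
def read_replicates(text):
    """Split concatenated PHYLIP data sets into a list of {name: sequence}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    datasets = []
    i = 0
    while i < len(lines):
        # Header line: number of species, then number of sites.
        n_species, n_sites = map(int, lines[i].split())
        data = {}
        for ln in lines[i + 1 : i + 1 + n_species]:
            name, seq = ln.split(None, 1)
            seq = seq.replace(" ", "")
            if len(seq) != n_sites:
                raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
            data[name] = seq
        datasets.append(data)
        i += 1 + n_species
    return datasets

# The first two data sets from the example output file.
sample = """\
    5     6
Alpha      AAACCA
Beta       AAACCC
Gamma      ACCCCA
Delta      CCCAAC
Epsilon    CCCAAA
    5     6
Alpha      AAACAA
Beta       AAACCC
Gamma      ACCCAA
Delta      CCCACC
Epsilon    CCCAAA
"""

datasets = read_replicates(sample)
print(len(datasets), "data sets read")
```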
| Program name | Description | 
|---|---|
| distmat | Create a distance matrix from a multiple sequence alignment | 
| ednacomp | DNA compatibility algorithm | 
| ednadist | Nucleic acid sequence Distance Matrix program | 
| ednainvar | Nucleic acid sequence Invariants method | 
| ednaml | Phylogenies from nucleic acid Maximum Likelihood | 
| ednamlk | Phylogenies from nucleic acid Maximum Likelihood with clock | 
| ednapars | DNA parsimony algorithm | 
| ednapenny | Penny algorithm for DNA | 
| eprotdist | Protein distance algorithm | 
| eprotpars | Protein parsimony algorithm | 
| erestml | Restriction site Maximum Likelihood method | 
| eseqboot | Bootstrapped sequences algorithm | 
| fdiscboot | Bootstrapped discrete sites algorithm | 
| fdnacomp | DNA compatibility algorithm | 
| fdnadist | Nucleic acid sequence Distance Matrix program | 
| fdnainvar | Nucleic acid sequence Invariants method | 
| fdnaml | Estimates nucleotide phylogeny by maximum likelihood | 
| fdnamlk | Estimates nucleotide phylogeny by maximum likelihood | 
| fdnamove | Interactive DNA parsimony | 
| fdnapars | DNA parsimony algorithm | 
| fdnapenny | Penny algorithm for DNA | 
| fdolmove | Interactive Dollo or Polymorphism Parsimony | 
| ffreqboot | Bootstrapped genetic frequencies algorithm | 
| fproml | Protein phylogeny by maximum likelihood | 
| fpromlk | Protein phylogeny by maximum likelihood | 
| fprotdist | Protein distance algorithm | 
| fprotpars | Protein parsimony algorithm | 
| frestboot | Bootstrapped restriction sites algorithm | 
| frestdist | Distance matrix from restriction sites or fragments | 
| frestml | Restriction site maximum Likelihood method | 
| fseqboot | Bootstrapped sequences algorithm | 
Although we take every care to ensure that the results of the EMBOSS version are identical to those from the original package, we recommend that you check your inputs give the same results in both versions before publication.
Please report all bugs in the EMBOSS version to the EMBOSS bug team, not to the original author.