Input can be square (all values), lower-triangular (diagonal and below), or upper-triangular (diagonal and above). Counting the values given for the first 2 species identifies the format.
S-format allows a degree of replication (an integer) for each distance. We can check for this (there are twice as many numbers); otherwise we set the replicates to 1.
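The counting rule above can be sketched as follows. This is only an illustration: `DistLayout` and `guess_layout` are hypothetical names, not library functions, and the sketch assumes the caller has already counted the numeric values on the first two rows and that triangular rows include the diagonal, as described above.

```c
#include <stdbool.h>

/* Hypothetical layout tags; these names are not part of the library. */
typedef enum {
    DIST_UNKNOWN,
    DIST_SQUARE,
    DIST_LOWER,
    DIST_UPPER
} DistLayout;

/*
 * Guess the layout from the number of numeric values on the first
 * two rows (count1, count2) of a matrix with `size` species.
 * Assumes triangular rows include the diagonal.
 * *has_replicates is set when the counts are doubled, i.e. each
 * distance is followed by an integer replicate count.
 */
DistLayout guess_layout(int size, int count1, int count2,
                        bool *has_replicates)
{
    int mult;

    for (mult = 1; mult <= 2; mult++) {
        *has_replicates = (mult == 2);
        if (count1 == mult * size && count2 == mult * size)
            return DIST_SQUARE;
        if (count1 == mult * size && count2 == mult * (size - 1))
            return DIST_UPPER;
        if (count1 == mult && count2 == mult * 2)
            return DIST_LOWER;
    }
    *has_replicates = false;
    return DIST_UNKNOWN;
}
```

For 4 species, counts of 4 and 4 indicate a square matrix, 4 and 3 an upper-triangular one, and 2 and 4 a lower-triangular one with replicates.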
Name |
---|
AjSPhyloDist |
AjOPhyloDist |
Name | Type | Description |
---|---|---|
Size | ajint | Size - number of rows and number of columns |
HasReplicates | AjBool | Has (some) replicates data in file |
Names | AjPStr* | Row names, NULL at end |
Data | float* | Distance matrix Size*Size with diagonal 0.0 |
Replicates | ajint* | Replicate count (default=1, missing=0) |
HasMissing | AjBool | Has missing data in file |
Padding | char[4] | Padding to alignment boundary |
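As a rough picture of how the attributes above fit together, the sketch below rebuilds the structure from the table and adds one accessor. The stand-in typedefs, the field order, the row-by-row storage of `Data`, and the helper `dist_at` are all assumptions made for illustration; the library headers are the authority.

```c
#include <stddef.h>

/* Stand-in typedefs so the sketch compiles without the library
 * headers; the real library defines these types itself. */
typedef int ajint;
typedef int AjBool;
typedef struct AjSStr *AjPStr;

/* Reconstructed from the attribute table; field order is assumed. */
typedef struct AjSPhyloDist {
    ajint Size;            /* rows == columns */
    AjBool HasReplicates;  /* some replicate data in file */
    AjPStr *Names;         /* row names, NULL at end */
    float *Data;           /* Size*Size distances, diagonal 0.0 */
    ajint *Replicates;     /* default=1, missing=0 */
    AjBool HasMissing;     /* missing data in file */
    char Padding[4];
} AjOPhyloDist;

/* Hypothetical helper: distance between rows i and j (0-based),
 * assuming Data is stored row by row. */
float dist_at(const AjOPhyloDist *d, ajint i, ajint j)
{
    return d->Data[(size_t)i * (size_t)d->Size + (size_t)j];
}
```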
For continuous data there are always 2 alleles. For gene frequency data there can be more than 2 alleles.
Name |
---|
AjSPhyloFreq |
AjOPhyloFreq |
Name | Type | Description |
---|---|---|
Size | ajint | Number of rows |
Loci | ajint | Number of loci per name |
Len | ajint | Number of values per name; may be more than 1 per locus |
ContChar | AjBool | Continuous character data if true |
Names | AjPStr* | Row names array (size is Size) |
Name | Type | Description |
---|---|---|
Species | ajint* | Species number 1, 2, 3 for each value; array size is Len |
Individuals | ajint* | Number of individuals, 1 or more per species; array size is Loci |
Name | Type | Description |
---|---|---|
Locus | ajint* | Locus number 1, 2, 3 for each value; array size is Len |
Allele | ajint* | Allele count, 2 or more per locus; array size is Loci |
Data | float* | Frequency for each allele for each Name |
Within | AjBool | Individual data within species if true |
Padding | char[4] | Padding to alignment boundary |
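The relationship between the per-locus `Allele` counts and the per-value `Locus` numbers can be sketched as below. `fill_locus` is a hypothetical helper, and the sketch assumes that `Len` equals the sum of the `Allele` counts, which the table suggests but does not state.

```c
/*
 * Expand per-locus value counts into a per-value locus number.
 * allele[k] is the count for locus k+1 (array size is loci);
 * locus[] receives 1-based locus numbers (array size is len).
 * Returns the number of values written, or -1 if len is too small.
 */
int fill_locus(const int *allele, int loci, int *locus, int len)
{
    int k, a, n = 0;

    for (k = 0; k < loci; k++) {
        for (a = 0; a < allele[k]; a++) {
            if (n >= len)
                return -1;
            locus[n++] = k + 1;
        }
    }
    return n;
}
```

With allele counts of 2 and 3 for two loci, the five values are assigned locus numbers 1, 1, 2, 2, 2.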
Basically, all of these are one value per position.
Weights are converted to integers by phylip: 0-9 keep their value, and letters run from A=10 to Z=35. There are programs that can use multiple weights. We can handle this by making all of these multiple, and using ACD to limit them to 1 for non-weight data.
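The weight mapping described above can be sketched as a small function. `weight_value` is a hypothetical name, and rejecting lower-case letters is an assumption of this sketch rather than documented behaviour.

```c
#include <string.h>

/*
 * Map one weight character to its integer value:
 * '0'-'9' -> 0-9, 'A'-'Z' -> 10-35.
 * Returns -1 for any other character (lower case included,
 * which is an assumption of this sketch).
 */
int weight_value(char c)
{
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    if (c >= '0' && c <= '9')
        return c - '0';
    if (c == '\0')
        return -1;          /* strchr would match the terminator */
    p = strchr(letters, c);
    return p ? 10 + (int)(p - letters) : -1;
}
```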
Ancestral states are character data.
Factors are multi-state character data where the factor character changes when moving to a new character. Without this, all factors are assumed to be different. The default would be to make each character distinct by alternating 12121212 or to use 12345678901234567890.
We could convert any input string into this format for factors, but we can probably leave them unchanged.
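To make the factor convention concrete, the sketch below counts the characters implied by a factor string and builds the alternating default. Both function names are hypothetical and nothing here is taken from the library itself.

```c
#include <stddef.h>

/*
 * Count the multi-state characters described by a factor string:
 * a new character starts wherever the factor symbol differs from
 * the one before it.
 */
int count_factor_groups(const char *factors)
{
    int groups = 0;
    size_t i;

    for (i = 0; factors[i] != '\0'; i++) {
        if (i == 0 || factors[i] != factors[i - 1])
            groups++;
    }
    return groups;
}

/*
 * Write the alternating default "1212..." of length len into buf,
 * so that every position is its own character.
 * buf must have room for len + 1 bytes.
 */
void default_factors(char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = (i % 2 == 0) ? '1' : '2';
    buf[len] = '\0';
}
```

A factor string such as `1112233` describes 3 characters, while the alternating default of length 8 describes 8.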
Name |
---|
AjSPhyloProp |
AjOPhyloProp |
Name | Type | Description |
---|---|---|
Len | ajint | string length |
Size | ajint | number of strings |
IsWeight | AjBool | is phylip weight values if true |
IsFactor | AjBool | is phylip factor values if true |
Str | AjPStr* | The original string(s) |
Basically, all of these are one value per position.
States have a limited character set, usually defined through ACD.
Name |
---|
AjSPhyloState |
AjOPhyloState |
Name | Type | Description |
---|---|---|
Len | ajint | string length |
Size | ajint | number of strings |
Characters | AjPStr | The allowed state characters |
Names | AjPStr* | The names |
Str | AjPStr* | The original string(s) |
Count | ajint | number of enzymes for restriction data |
Padding | char[4] | Padding to alignment boundary |
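A check that a state string uses only the allowed characters might look like the sketch below. `states_valid` is a hypothetical helper working on plain C strings; how the library itself validates against the `Characters` attribute is not shown in this document.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true when every character of str appears in allowed.
 * strspn gives the length of the leading run of allowed
 * characters, so the string is valid if that run reaches the end.
 */
bool states_valid(const char *str, const char *allowed)
{
    return str[strspn(str, allowed)] == '\0';
}
```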
For programs that read multiple tree inputs we use an array, and let ACD limit the others to 1 tree.
Name |
---|
AjSPhyloTree |
AjOPhyloTree |
Name | Type | Description |
---|---|---|
Multifurcated | AjBool | Multifurcating (..(a,b,c)..) |
BaseTrifurcated | AjBool | 3-way base (a,b,c) |
BaseBifurcated | AjBool | Rooted 2-way base (a,b) |
BaseQuartet | AjBool | Unrooted quartet ((a,b),(c,d)); |
HasLengths | AjBool | Tree has branch lengths |
Size | ajint | Number of nodes |
Tree | AjPStr | Newick tree string |
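One plausible way to derive some of these flags from the Newick string is sketched below. The function names are hypothetical, quoted labels and bracketed comments are not handled, and this is not presented as the way the library sets its flags.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Count the children of the outermost node by counting commas at
 * parenthesis depth 1. Returns 0 if the string has no '('.
 * 2 would correspond to a 2-way base, 3 to a 3-way base.
 */
int newick_base_children(const char *tree)
{
    int depth = 0;
    int commas = 0;
    bool seen_open = false;

    for (; *tree != '\0'; tree++) {
        if (*tree == '(') {
            depth++;
            seen_open = true;
        } else if (*tree == ')') {
            depth--;
        } else if (*tree == ',' && depth == 1) {
            commas++;
        }
    }
    return seen_open ? commas + 1 : 0;
}

/* Branch lengths follow a ':' in Newick, so look for one. */
bool newick_has_lengths(const char *tree)
{
    return strchr(tree, ':') != NULL;
}
```

Here `(a,b,c);` has 3 base children, while `((a,b),(c,d));` has 2.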