Effective Species Count
General Introduction
This web page calculates the effective species count for a
user-supplied phylogenetic tree and a user-supplied nucleotide
substitution model. The effective species count measures how
efficiently the sequences of the species at the leaves of the
phylogenetic tree can be used to reconstruct the equilibrium
distribution that governs each position of a multiple DNA sequence
alignment.
When several species are very closely related, the effective species
count will be above, but very near, 1.0, because the high correlation
between the species' sequences means that any one of them carries
almost all of the available information. When several species are
distantly related, the effective species count will be near the number
of species, since each sequence is essentially independent of the
others. This software handles the intermediate cases, where the extent
of the correlation between species' sequences is not obvious.
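The two limiting cases above have a familiar analogue in the classical effective sample size of n equally correlated observations. The hypothetical helper `equicorrelated_effective_count` below illustrates only that analogy; it is not the computation this software performs, which accounts for the tree topology, branch lengths, and substitution model.

```python
def equicorrelated_effective_count(n: int, rho: float) -> float:
    """Effective number of n observations that share a common variance and
    a common pairwise correlation rho.

    Classical result: the mean of such observations has variance
    sigma^2 * (1 + (n - 1) * rho) / n, which equals the variance of the
    mean of n / (1 + (n - 1) * rho) independent observations.
    Illustrative analogy only; not the tree-based computation of this software.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if n > 1 and not (-1.0 / (n - 1) < rho <= 1.0):
        raise ValueError("rho is outside the valid range for n observations")
    return n / (1.0 + (n - 1) * rho)


# Five "species": nearly identical, moderately correlated, independent.
for rho in (0.99, 0.5, 0.0):
    print(rho, round(equicorrelated_effective_count(5, rho), 3))
# 0.99 1.008
# 0.5 1.667
# 0.0 5.0
```

As the correlation approaches 1 the count approaches 1, and at zero correlation it equals the number of observations, mirroring the behavior described for closely and distantly related species.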
Greedy, AntiGreedy, and AllOnly
The software allows the user to request a greedy or antiGreedy
analysis instead of the default allOnly analysis. Starting from a set
of start species, greedy finds the additional species that most
increases the effective species count, adds it to the collection, and
repeats, adding species one at a time. AntiGreedy instead adds, one at
a time, the species that least increases the effective species count.
AllOnly reports the effective species count for only two sets: the set
of start species and the set of all species.
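The stepwise search can be sketched as below. This is a minimal sketch under stated assumptions, not this software's implementation: `stepwise_additions` and its `score` argument are hypothetical names, the scorer stands in for the effective species count (which this page computes from the tree and substitution model), and the loop is assumed to continue until every species has been added. The additive `toy_score` exists only to make the example runnable.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical scorer: maps a set of species names to an effective species
# count. The real value depends on the tree and substitution model and is
# not reproduced here.
Scorer = Callable[[FrozenSet[str]], float]


def stepwise_additions(
    start: Iterable[str],
    all_species: Iterable[str],
    score: Scorer,
    greedy: bool = True,
) -> List[Tuple[str, float]]:
    """Add the remaining species one at a time.

    greedy=True picks the candidate whose addition gives the highest score
    (greedy); greedy=False picks the lowest (antiGreedy). Returns each added
    species with the score of the collection after adding it.
    """
    chosen = set(start)
    # Sorted so that ties are broken alphabetically and results are repeatable.
    remaining = sorted(set(all_species) - chosen)
    pick = max if greedy else min
    order: List[Tuple[str, float]] = []
    while remaining:
        best = pick(remaining, key=lambda s: score(frozenset(chosen | {s})))
        chosen.add(best)
        remaining.remove(best)
        order.append((best, score(frozenset(chosen))))
    return order


# Toy additive score standing in for a real effective species count.
WEIGHTS = {"human": 10, "chimp": 1, "mouse": 8, "chicken": 9}


def toy_score(species: FrozenSet[str]) -> float:
    return sum(WEIGHTS[s] for s in species)


print(stepwise_additions(["human"], WEIGHTS, toy_score))
# [('chicken', 19), ('mouse', 27), ('chimp', 28)]
print(stepwise_additions(["human"], WEIGHTS, toy_score, greedy=False))
# [('chimp', 11), ('mouse', 19), ('chicken', 28)]
```

Each step re-scores every remaining candidate, so a full run costs on the order of n^2 score evaluations for n species; that is the usual price of a greedy search and avoids examining every subset.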
Phylogenetic Tree
The phylogenetic tree should be supplied in Newick format (see, e.g.,
http://en.wikipedia.org/wiki/Newick_format), either entered directly as
text or uploaded as a file. Each leaf of the phylogenetic tree should be
labeled with its species name.
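Because every leaf must carry a species name, it can help to check a tree before submitting it. The hypothetical helper `newick_leaf_names` below is a minimal sketch that handles only simple Newick strings (unquoted labels, optional branch lengths, no bracketed comments); it is not the parser used by this software.

```python
from typing import List


def newick_leaf_names(newick: str) -> List[str]:
    """Return the leaf labels of a simple Newick tree, in order.

    Assumes unquoted labels, optional ':branch_length' suffixes, and no
    [comments]; quoted labels and comments are not handled. Raises
    ValueError if the string lacks the final ';' or a leaf is unnamed.
    """
    text = newick.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick tree must end with ';'")
    leaves: List[str] = []
    prev = "("          # treat the start of the string like an opening '('
    label = ""
    in_length = False   # True while skipping a ':branch_length'
    for ch in text:
        if ch == "(":
            prev, label, in_length = ch, "", False
        elif ch in "),;":
            # A node that directly follows '(' or ',' has no children: a leaf.
            if prev in "(,":
                if not label.strip():
                    raise ValueError("unnamed leaf in Newick tree")
                leaves.append(label.strip())
            prev, label, in_length = ch, "", False
        elif ch == ":":
            in_length = True
        elif not in_length:
            label += ch
    return leaves


print(newick_leaf_names("((human:0.1,chimp:0.1):0.2,mouse:0.5);"))
# ['human', 'chimp', 'mouse']
```

Labels that follow a closing parenthesis (such as `root` in `((A,B),C)root;`) are internal-node labels and are not reported as leaves.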
Nucleotide Substitution Model
Several nucleotide substitution models are supported. They are:
- Fel81 -- from Felsenstein (1981) J Mol Evol 17(6):368-376
(PubMed 7288891)
- HKY85 -- from Hasegawa, Kishino, and Yano (1985) J Mol Evol
22(2):160-174 (PubMed 3934395)
- HKY85Slow -- same citation as HKY85, but the model omits the
normalizing rate constant
- HB98 -- Halpern and Bruno (1998) Mol Biol Evol 15(7):910-917
(PubMed 9656490)
- New05 -- Experimental -- Newberg (2005) Technical Report 05-08,
Rensselaer Polytechnic Institute Department of Computer Science, Troy,
NY.
- New06 -- Experimental -- no citation available
Each of these requires a "foreground" nucleotide equilibrium
probability distribution. In addition, the HB98, New05, and New06
models require a "background" nucleotide substitution model, which
describes the nucleotide substitution process in the absence of
selection pressure. The choices are:
- JC69 -- from Jukes and Cantor (1969) Mammalian Protein
Metabolism 3:21-132, Academic Press.
- Kim80 -- from Kimura (1980) J Mol Evol 16(2):111-120
(PubMed 7463489)
- Fel81 -- from Felsenstein (1981) J Mol Evol 17(6):368-376
(PubMed 7288891)
- HKY85 -- from Hasegawa, Kishino, and Yano (1985) J Mol Evol
22(2):160-174 (PubMed 3934395)
- HKY85Slow -- same citation as HKY85, but the model omits the
normalizing rate constant
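Several of the models listed above are special cases of the HKY85 rate matrix, which can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and `kappa` is the HKY85 transition/transversion rate ratio. The page does not say whether the ratio it asks for is `kappa` itself or the expected ratio of transition to transversion substitutions, so `expected_ts_tv_ratio` shows how the two differ. Treating HKY85Slow as `normalize=False` is likewise an assumption based on the description "without normalizing rate constant".

```python
from typing import Dict, List

BASES = "ACGT"
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a: str, b: str) -> bool:
    """A<->G and C<->T substitutions are transitions; the rest are transversions."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def hky85_rate_matrix(pi: Dict[str, float], kappa: float,
                      normalize: bool = True) -> List[List[float]]:
    """Instantaneous rate matrix Q of the HKY85 model, rows/columns in ACGT order.

    Q[i][j] (i != j) is kappa * pi[j] for a transition and pi[j] for a
    transversion; each diagonal entry makes its row sum to zero. With
    normalize=True, Q is rescaled so that the expected substitution rate
    -sum_i pi[i] * Q[i][i] is 1.
    Special cases: kappa = 1 gives Fel81, uniform pi gives Kim80, both give JC69.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                q[i][j] = pi[b] * (kappa if is_transition(a, b) else 1.0)
        q[i][i] = -sum(q[i])  # diagonal is still 0.0 here, so this sums off-diagonals
    if normalize:
        rate = -sum(pi[a] * q[i][i] for i, a in enumerate(BASES))
        q = [[x / rate for x in row] for row in q]
    return q


def expected_ts_tv_ratio(pi: Dict[str, float], q: List[List[float]]) -> float:
    """Ratio of expected transition to transversion substitutions at equilibrium."""
    ts = tv = 0.0
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                flux = pi[a] * q[i][j]
                if is_transition(a, b):
                    ts += flux
                else:
                    tv += flux
    return ts / tv


uniform = {b: 0.25 for b in BASES}
print(expected_ts_tv_ratio(uniform, hky85_rate_matrix(uniform, 2.0)))  # 1.0
print(hky85_rate_matrix(uniform, 1.0, normalize=False)[0][0])  # -0.75 (JC69, unnormalized)
```

With uniform base frequencies, `kappa = 2` yields an expected transition-to-transversion ratio of 1.0, because there are twice as many kinds of transversion as of transition. This is why the two meanings of "ratio" should not be confused.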
If Fel81, HKY85, or HKY85Slow is chosen as the background nucleotide
substitution model, then a "background" nucleotide equilibrium
probability distribution must also be supplied.
If HKY85, HKY85Slow, or Kim80 is specified for either the foreground or
the background nucleotide substitution model, then a
transition-to-transversion ratio must be supplied.
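The rules above can be collected into a small lookup, sketched below. The function `required_inputs` and the input labels it returns are hypothetical; they restate this page's requirements and are not the web form's actual field names.

```python
from typing import List, Optional, Set

FOREGROUND_MODELS = {"Fel81", "HKY85", "HKY85Slow", "HB98", "New05", "New06"}
BACKGROUND_MODELS = {"JC69", "Kim80", "Fel81", "HKY85", "HKY85Slow"}
NEEDS_BACKGROUND_MODEL = {"HB98", "New05", "New06"}
NEEDS_EQUILIBRIUM = {"Fel81", "HKY85", "HKY85Slow"}  # when used as the background
NEEDS_TS_TV_RATIO = {"HKY85", "HKY85Slow", "Kim80"}


def required_inputs(foreground: str, background: Optional[str] = None) -> Set[str]:
    """Return the inputs this page's rules call for, given the model choices.

    The background model is only consulted for HB98, New05, and New06.
    """
    if foreground not in FOREGROUND_MODELS:
        raise ValueError(f"unknown foreground model: {foreground}")
    needed = {"foreground equilibrium distribution"}
    models: List[str] = [foreground]
    if foreground in NEEDS_BACKGROUND_MODEL:
        if background not in BACKGROUND_MODELS:
            raise ValueError(
                f"{foreground} needs a background model, one of {sorted(BACKGROUND_MODELS)}"
            )
        needed.add("background model")
        models.append(background)
        if background in NEEDS_EQUILIBRIUM:
            needed.add("background equilibrium distribution")
    # A ratio is needed if either the foreground or the background model uses one.
    if any(m in NEEDS_TS_TV_RATIO for m in models):
        needed.add("transition-to-transversion ratio")
    return needed


print(sorted(required_inputs("New05", "Kim80")))
# ['background model', 'foreground equilibrium distribution', 'transition-to-transversion ratio']
```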
Output Level
Brief output gives only the effective species counts. Verbose output
shows the program's attempts at finding the best or worst species to
add to the set of start species.
Very verbose output gives additional information, such as pairwise
distances between the species, as well as output concerning two
alternatives for determining a nucleotide equilibrium probability
distribution: (1) optimal sequence weights (see Newberg, McCue, and
Lawrence, 2005, Stat Appl Genet Mol Biol 4:Article13,
PubMed 16646830) and (2) equal sequence weights.
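The role of sequence weights can be illustrated with a weighted frequency estimate at a single alignment column. This sketch assumes that the distribution is estimated as weighted nucleotide frequencies. The helper `weighted_column_distribution` is hypothetical, and the weights `[1.0, 1.0, 1.0, 3.0]` are made up to show the effect of down-weighting closely related sequences. They are not output of the optimal-weights method of Newberg, McCue, and Lawrence, which is not reproduced here.

```python
from typing import Dict, Optional, Sequence


def weighted_column_distribution(column: str,
                                 weights: Optional[Sequence[float]] = None) -> Dict[str, float]:
    """Estimate the nucleotide distribution at one alignment column.

    column[k] is the base of sequence k; weights[k] is that sequence's weight
    (equal weights if omitted). Characters other than A, C, G, T (gaps,
    ambiguity codes) are skipped.
    """
    if weights is None:
        weights = [1.0] * len(column)
    if len(weights) != len(column):
        raise ValueError("need exactly one weight per sequence")
    totals = {b: 0.0 for b in "ACGT"}
    for base, w in zip(column.upper(), weights):
        if base in totals:
            totals[base] += w
    z = sum(totals.values())
    if z <= 0:
        raise ValueError("no weighted A, C, G, or T in this column")
    return {b: t / z for b, t in totals.items()}


# Three closely related sequences read A; one distant sequence reads C.
print(weighted_column_distribution("AAAC"))
# {'A': 0.75, 'C': 0.25, 'G': 0.0, 'T': 0.0}
print(weighted_column_distribution("AAAC", [1.0, 1.0, 1.0, 3.0]))
# {'A': 0.5, 'C': 0.5, 'G': 0.0, 'T': 0.0}
```

With equal weights, the three near-duplicate sequences dominate the estimate. Shifting weight toward the distant sequence counteracts that redundancy, which is the concern the effective species count quantifies.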
This software was developed as part of the Center for Bioinformatics at the Wadsworth Center. It was supported by the Computational Molecular Biology and Statistics Core Facility at the Wadsworth Center and by NIH/NHGRI grant 5K25HG003291 to Lee Newberg.