Because this program owes so much to Excoffier et al.'s
SIMCOAL (ver 1.0), we ask that users of Serial SimCoal cite
the following in any publications.
Excoffier L, Novembre J and Schneider S (2000) SIMCOAL: a
general coalescent program for simulation of molecular data in
interconnected populations with arbitrary demography. J.
Hered., 91, 506-509.
Bayesian Serial SimCoal (BayeSSC) is a modification of SIMCOAL 1.0, a program
written by Laurent Excoffier, John Novembre, and Stefan Schneider.
Their website presents the
most thorough documentation available for this family of programs, and should be
regarded as authoritative.
The original version of the software on this site, Serial SimCoal, was first described in:
Anderson CNK, Ramakrishnan U, Chan YL and Hadly EA (2005) Serial SimCoal: A population genetic
model for data from multiple populations and points in time. Bioinformatics, 21, 1733-1734.
The modifications were made explicitly to be backwards compatible with
SIMCOAL 1.0 and Serial SimCoal (SSC), so any input file from any previous version
can also be used as input to BayeSSC.
After work on this project began, a new
version (SimCoal2) was released.
Details can be found at any of the links above.
SIMCOAL 1.0 is a very flexible package that allows for almost any sampling regime
and population history. Serial SimCoal allows samples to be taken from different points in
time. Using ancient DNA, one could "scenario test" and get approximations of how likely
one scenario was relative to another. BayeSSC is powerful because it allows
flexible coalescent modelling with parameters drawn from a variety of prior distributions. This enables parameter estimation,
likelihood calculations, and Bayesian inference. For example,
SIMCOAL 1.0 could be used to simulate the effect of a historical event on modern
samples; Serial SimCoal incorporates ancient samples from before the event; and BayeSSC
derives the most likely date and severity of the event.
The tutorial below explains the basics of coalescence theory.
Users who are familiar with the coalescent process can skip ahead.
Typically, BayeSSC generates thousands of hypothetical trees using
slightly different population parameters. The simulated genetic data from these
trees can then be compared to the actual genetic data from the user's samples to
investigate which of the many simulated histories is most likely
to have generated the samples.
There are many ways to do this. One particularly useful approach is Mark Beaumont's ABC
(Approximate Bayesian Computation) method, described in more detail below.
In this method, the Euclidean
"distance" between the simulated genetic statistics and the observed genetic statistics is
calculated for each of thousands of simulations, each run with a different parameter combination. This distance approximates the error associated
with each simulation; parameters yielding low errors are more likely to represent the true
population history than parameters with high errors. If errors are very high, the simulation
is rejected entirely. The accepted simulations can then be used to estimate the Bayesian posterior probability
of different parameter combinations.
The sections below give examples of how this
works, and how to create input files specifying the range of histories under
consideration.
The program flow of Serial SimCoal can be summarized in the following
figure:
Input
Only one input file is needed to run the simulations. The format of
the input file is almost identical to the "input file notation" section of the
SimCoal 1.0 Help Manual. To help learn the input format, we'll walk
through an example. Imagine you have 300-year-old samples from a
population with high levels of genetic diversity. But when you take samples
from the modern population, diversity is either low or quite different from that of
the older samples. One reasonable hypothesis is that this change in the genetic
structure is due to a bottleneck in the population between now and 300 years ago.
(We're assuming for simplicity here that 1 generation = 1 year.)
With that hypothesis, BayeSSC could be used to answer the following questions:
When did this bottleneck occur?
How severe was it? That is, how small was the population during the bottleneck?
Here is an example of an input file that could be used to investigate this question.
The fields are described in detail below. If you wish, you can download
this file as a template, or for practice running the program.
Input File: eg_bayes.par
An example of input parameters for BayesSSC
1 population with ancient data
Deme sizes (haploid number of genes)
3000
Sample sizes: # samples, age, deme, stat group
2 sample groups
10 0 0 0
10 300 0 1
Growth rates
{ln([4]/3000)/[2]}
Number of migration matrices
0
Historical event: date from to %mig new_N new_r migmat
1 event
{U:1,299} 0 0 0 {3000/[4]} 0 0
Mutations per generation for the whole sequence
0.0003
Number of loci
300
//Data type : either DNA, RFLP, or MICROSAT
DNA 1
//Mutation rates: gamma parameters, theta and k
0 0
{E:200}
What follows is an explanation of each type of parameter in the input
file. As mentioned above, most of these parameters are identical to the
parameters of SimCoal 1.0, and alternative explanations can be found in the
Help Manual to that program.
Comments
These lines do not actually affect the program in any way. They
serve as a place for users to jot notes to themselves about
what comes next. In most example files, comments begin with a double slash (//).
They don't need to, but it's a good way to distinguish notes from lines
that contain information the program will use. They can be no longer
than one line, or you can leave them blank, but
you must have one comment line before each line of
information.
Number of demes
Put the number of demes you wish to simulate on the line after the first
comment line, followed by whatever text you want. If you are using
samples from more than one time point, then the text following the
number must include the words "with ancient".
Example 1 (a simulation with only modern DNA):
//Beginning of SimCoal file:
3 populations from Fiji, Vanuatu, and Kiribati
Example 2 (with old samples):
//Parameters for a SimCoal File:
3 demes with ancient DNA data
Deme sizes
One number per population representing the effective population
size (Nef) of that population. Keep in mind that
the effective population size is very different from the census size:
although there are nearly 7 billion people on the planet, most races
have an effective population size much smaller than 50,000.
Example (sim with 3 populations):
//Deme sizes:
1000
3200
8743
Sample sizes
Without ancient information: One sample group per
population is assumed. List the number of samples from each
population.
With ancient information: An arbitrary number of sampling
groups can be added to each population, and they can be pooled
together in any combination for statistical analysis. The first line
begins with the total number of sampling groups, and can end with
any text you want. After that, each line describes one sample group with four numbers:
First: Number of individuals in the sample
Second: Age of the sample (in generations)
Third: The number of the deme the sample belongs to (0,1,2,...)
Fourth: Which stat group the sample group should be pooled with
Example 1: No ancient data, three populations
//Sample Sizes:
20
12
31
Example 2: Three populations with ancient data
//Format: Samples, age, deme, stat group
8 sample groups; that's quite a few
10 0 0 0
10 200 0 1
10 0 1 0
2 450 1 2
10 0 2 0
10 200 2 1
7 450 2 2
4 450 2 3
In both examples, there are 20, 12 and 31 samples taken from
three populations respectively. In the second example, 10 of the
samples are recent in each population, the rest are ancient. The 20
samples that are 200 generations old are pooled together for
statistical calculations, and 9 of 13 450-generation-old samples are
pooled. The last four will have their statistics calculated
independently (just to show you that it's possible).
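As a quick check of the arithmetic in Example 2, here is a minimal Python sketch that totals the sample-group lines per deme and per stat group. The helper `tally_sample_groups` is hypothetical and not part of BayeSSC; it simply assumes the four-number line format described above.

```python
from collections import Counter

# The eight sample-group lines of Example 2: samples, age, deme, stat group
EXAMPLE_2 = """\
10 0 0 0
10 200 0 1
10 0 1 0
2 450 1 2
10 0 2 0
10 200 2 1
7 450 2 2
4 450 2 3"""

def tally_sample_groups(block):
    """Return (samples per deme, samples per stat group)."""
    per_deme, per_group = Counter(), Counter()
    for line in block.splitlines():
        n, _age, deme, group = (int(x) for x in line.split())
        per_deme[deme] += n
        per_group[group] += n
    return per_deme, per_group

per_deme, per_group = tally_sample_groups(EXAMPLE_2)
print(dict(per_deme))   # {0: 20, 1: 12, 2: 31}
print(dict(per_group))  # {0: 30, 1: 20, 2: 9, 3: 4}
```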
Growth rates
This one is a bit tricky. It is the NEGATIVE of the intrinsic rate of growth (r) from the standard
equation for exponential growth: $N(t) = N(0)e^{rt}$.
Enter one value per population. Because coalescent simulations run backward through time, a negative
growth rate implies a population larger now than in the past.
Example: Two stable populations, and one that is growing 2% per generation
//Growth rates:
0
0
-.02
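The sign convention is easy to get backwards. The short Python sketch below (the function name and the 1000-gene deme are illustrative assumptions) computes a deme's size t generations in the past from the value entered on the growth-rate line, following the description above.

```python
import math

def deme_size_in_past(n_now, growth_rate, t):
    """Deme size t generations before the present.

    growth_rate is the number entered on the growth-rate line, i.e. the
    NEGATIVE of r in N(t) = N(0)e^(rt). Because the simulation looks
    backwards, the size t generations ago is n_now * exp(growth_rate * t).
    """
    return n_now * math.exp(growth_rate * t)

# Hypothetical deme of 1000 genes today with a growth-rate entry of -.02
for t in (0, 50, 100):
    print(t, round(deme_size_in_past(1000, -0.02, t), 1))
# 0 1000.0
# 50 367.9
# 100 135.3
```

A negative entry therefore shrinks the deme as the simulation moves into the past, which is the same as saying it grew toward the present.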
Migration matrices
Several matrices can be listed in the input file. The first line
begins with the number of matrices (0 is fine). The next lines
define the proportion of migrants from each deme to each deme; each
migration matrix must be preceded by a comment. The first migration
matrix is assumed to represent the migration in the present (at
t=0). If you have more than one population but no migration,
then lineages in different demes will NEVER coalesce and you will get no
information. Note that the diagonal elements of the matrix are meaningless, but
the simulations will run faster if you set them to 0.
Example 1: two migration matrices for a simulation with three populations
//Migration matrices
2
//Matrix 0: Deme0 <-> Deme1 <-> Deme2
0 .01 0
.01 0 .01
0 .01 0
//Matrix 1: Migration stopped
0 0 0
0 0 0
0 0 0
Example 2: Some lineages "today" in deme 0 are descendants of lineages originally in deme 1, but not vice versa. That is, as the tree builds back through time, there is a 1%/gen chance that each lineage in deme 0 will migrate to deme 1.
//Asymmetric migration matrix
1 matrix
//Migration from deme 0 to deme 1 *backwards* through time
0 .01
0 0
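To make the backward-in-time reading of a migration matrix concrete, here is a minimal Python sketch that moves lineages between demes one generation at a time. It assumes each off-diagonal entry is a per-generation probability of a lineage moving from the row deme to the column deme, as in Example 2; `migrate_backwards` is a hypothetical helper, not SimCoal's actual code.

```python
import random

def migrate_backwards(lineage_demes, mig_matrix, rng):
    """Move each lineage at most once, one generation back in time.

    lineage_demes[i] is the deme currently holding lineage i, and
    mig_matrix[a][b] is the per-generation probability that a lineage in
    deme a moves to deme b. Diagonal entries are ignored.
    """
    new_demes = []
    for deme in lineage_demes:
        u = rng.random()
        target, cumulative = deme, 0.0
        for other, p in enumerate(mig_matrix[deme]):
            if other == deme:
                continue
            cumulative += p
            if u < cumulative:
                target = other
                break
        new_demes.append(target)
    return new_demes

# Example 2 above: a 1%/generation chance of moving from deme 0 to deme 1
asymmetric = [[0, 0.01],
              [0, 0]]
rng = random.Random(1)
demes = [0] * 5 + [1] * 5
for _ in range(500):
    demes = migrate_backwards(demes, asymmetric, rng)
print(demes)   # lineages that started in deme 1 never leave it
```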
Historical events
Like migration matrices, the first line gives the number of events (0 is fine).
Each subsequent line then lists one such event (no comment lines
between). An event consists of the following:
The time (in generations) when the event occurred
The source deme (0,1,2...)
The sink deme
The proportion of the source that migrates to the sink. It
also represents the probability for each lineage in the
source deme to migrate to the sink deme. If no migration is involved
in the event, then just specify the same source and sink, and a migration probability of 0.
The new effective population size of the sink deme, relative to its size one
generation more recently. Remember, coalescent simulations run backwards. So a value of 0.5 here implies the event doubled the population
(think, "The population used to be half as big").
The new growth rate of the sink deme. Negative values mean the population is growing.
The id of the new migration matrix to use for all demes.
Example: 2000 generations ago, demes 0 and 2 split from
what used to be a larger deme 1
//Format: time, src, sink, % mig, new Nef, new r, MigMat
2 historical events
2000 0 1 1 2 0 1
2000 2 1 1 1 0 1
You could read the first event like this: when you go back to 2000 generations, take deme 0 and move 100% of its lineages to deme 1, make deme 1 twice as big, then keep deme 1's population constant thereafter (r=0), and use migration matrix 1 from then on.
COMMON MISTAKE: Remember that even though several events may happen during the same generation,
the computer will apply them sequentially. For example,
100 0 1 1 5 .5 1
100 2 1 1 5 0 1
will first make deme 1 5x bigger and give it a growth rate of 0.5, but then make it 5x bigger again (=25x bigger) and reset the growth rate to 0. Similarly,
100 0 1 .5 1 0 0
100 0 2 .5 1 0 0
will send half of deme 0 to deme 1, and then send half of the half that is left to deme 2, leaving 25% of the lineages in deme 0.
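The sequential application of same-generation events can be reproduced with a small Python sketch. `Deme` and `apply_event` are hypothetical names; the sketch tracks only relative deme sizes, growth rates, and the expected share of lineages that started in deme 0, and ignores the migration-matrix switch, which is enough to show the 25x and 25% results above.

```python
from dataclasses import dataclass

@dataclass
class Deme:
    size: float = 1.0           # relative size; only ratios matter here
    growth: float = 0.0         # growth-rate entry (negative of r)
    lineage_share: float = 0.0  # expected share of the lineages that began in deme 0

def apply_event(demes, event):
    """Apply one event line 'time src sink mig new_N new_r migmat' in place."""
    _time, src, sink, mig, new_n, new_r, _migmat = event.split()
    src, sink = int(src), int(sink)
    mig, new_n, new_r = float(mig), float(new_n), float(new_r)
    moved = demes[src].lineage_share * mig   # expected share moving src -> sink
    demes[src].lineage_share -= moved
    demes[sink].lineage_share += moved
    demes[sink].size *= new_n                # multiplies the sink's CURRENT size
    demes[sink].growth = new_r               # replaces, does not add to, the old rate

# First example: both events resize deme 1
case1 = [Deme(lineage_share=1.0), Deme(), Deme()]
for event in ("100 0 1 1 5 .5 1", "100 2 1 1 5 0 1"):
    apply_event(case1, event)
print(case1[1].size, case1[1].growth)    # 25.0 0.0

# Second example: two partial moves out of deme 0
case2 = [Deme(lineage_share=1.0), Deme(), Deme()]
for event in ("100 0 1 .5 1 0 0", "100 0 2 .5 1 0 0"):
    apply_event(case2, event)
print(case2[0].lineage_share)            # 0.25
```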
Mutation rate
This number refers to the mutation rate for all loci taken as a
whole. For DNA sequences, it should correspond to the average
number of mutations per generation per nucleotide, times the number of nucleotides.
Example: 10%/bp/million years for a 300 bp
sequence and a species whose generations are 5 years long
10%/bp/1,000,000 yr = 0.0000001/bp/yr
0.0000001/bp/yr * 300 bp = 0.00003/yr
0.00003/yr * 5 yr/gen = 0.00015/gen
//Mutation rate
.00015
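The same conversion as a tiny Python helper (the function name is illustrative), useful when trying other rates, sequence lengths, or generation times:

```python
def mutation_rate_per_generation(rate_per_site_per_year, n_sites, years_per_generation):
    """Whole-sequence mutation rate per generation, the value the input file expects."""
    return rate_per_site_per_year * n_sites * years_per_generation

# 10% per bp per million years, 300 bp, 5-year generations
rate = mutation_rate_per_generation(0.10 / 1_000_000, 300, 5)
print(f"{rate:.5f}")   # 0.00015
```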
Data type
MICROSAT: Microsatellites are simulated with a pure stepwise model, and can be followed
by a range constraint if you wish (no number implies no limit).
DNA: followed by the transition/transversion bias (the proportion of mutations that are transitions). Mutation probabilities can be heterogeneous (see the gamma parameters below).
RFLP: a two-allele model.
Example: Using DNA where 1/3 of the mutations are A<->G or C<->T (all mutations are equally likely)
//Data type:
DNA 0.33333
Gamma parameters
These parameters control the heterogeneity of DNA mutation rates along
the sequence. The first number is the shape parameter α of a Gamma distribution of mutation rates. If a
value of zero is entered, then an even mutation rate model is
implemented. The second number is the number of rate classes to
simulate. If a value of zero is entered, then a continuous
distribution is used (as many classes as there are loci or
nucleotides). Note that this is the reverse order and nomenclature from that used by statisticians
when talking about gamma distributions. For a statistician, the first number is the scale parameter
θ and the second number is the shape parameter k.
Example 1: Uniform mutation rates (Jukes-Cantor model)
//Gamma distribution for mutation:
0 0
Example 2: Heterogeneous mutation (Kimura 2-Parameter model)
//Gamma distribution for mutation:
0.4 10
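As an illustration of what the shape parameter controls, the Python sketch below draws a relative rate for each site from a continuous gamma distribution with mean 1. It is a sketch under stated assumptions, not SimCoal's implementation: discrete rate classes (a nonzero second number) are not modelled, and spreading the whole-sequence rate over sites in proportion to these draws is our assumption.

```python
import random

def relative_site_rates(alpha, n_sites, rng):
    """Relative mutation rate of each site, with mean 1.

    alpha == 0 gives the even-rate model (every site has rate 1). Otherwise
    each site's rate is drawn from a continuous Gamma(shape=alpha,
    scale=1/alpha), which has mean 1 and variance 1/alpha, so smaller alpha
    means more rate heterogeneity. Discrete rate classes are not modelled.
    """
    if alpha == 0:
        return [1.0] * n_sites
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]

# Assumption: spread the whole-sequence rate (0.0003 in eg_bayes.par)
# over 300 sites in proportion to the drawn relative rates.
rng = random.Random(42)
weights = relative_site_rates(0.4, 300, rng)
total_weight = sum(weights)
per_site = [0.0003 * w / total_weight for w in weights]
```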
Priors
At any point in the input file, rather than typing in an actual number, you can also
specify a range of possible values to investigate. BayeSSC allows five different prior
distributions:
In the example file above there are
two priors: a uniform distribution allowing the date of the bottleneck to vary uniformly from 1 to 299
generations in the past {U:1,299}; and an exponential
distribution for the population size during the bottleneck {E:200}. This will investigate
mostly bottleneck population sizes less than 200, but occasionally try one much larger.
The program will also evaluate expressions, such as {4*6+2^3} (which equals 32)
and {4*(6+2^3)} (equalling 56). All items contained in curly brackets {} receive
an id number, starting at 1 (not 0). To reference this number, use square brackets.
Thus, {6*[2]^3}, means "six times the second prior-distribution-or-equation cubed".
The equation interpreter understands the following symbols:
+
-
*
/
^
(
)
ln
exp
Parentheses can be nested; that is {((4-3)*2)^2} would be valid, and equal 4.
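To show how the bracket references and operator precedence work, here is a minimal recursive-descent evaluator in Python for the symbols listed above. It is a hypothetical sketch, not BayeSSC's parser; in particular it assumes `^` binds tighter than unary minus and groups right to left, that `ln` and `exp` are always followed by parentheses, and that `[n]` is looked up in a dictionary of already-resolved items.

```python
import math
import re

# Numbers, [n] references, the two functions, and single-character operators
TOKEN = re.compile(r"\s*(\d+\.?\d*|\.\d+|\[\d+\]|ln|exp|[-+*/^()])")

def tokenize(text):
    text, pos, tokens = text.strip(), 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"cannot parse {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class Expression:
    """Precedence, lowest first: + -, then * /, then unary minus, then ^."""

    def __init__(self, text, refs):
        self.tokens, self.pos, self.refs = tokenize(text), 0, refs

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expect(self, token):
        if self.take() != token:
            raise ValueError(f"expected {token!r}")

    def additive(self):
        value = self.multiplicative()
        while self.peek() in ("+", "-"):
            op, rhs = self.take(), self.multiplicative()
            value = value + rhs if op == "+" else value - rhs
        return value

    def multiplicative(self):
        value = self.unary()
        while self.peek() in ("*", "/"):
            op, rhs = self.take(), self.unary()
            value = value * rhs if op == "*" else value / rhs
        return value

    def unary(self):
        if self.peek() == "-":
            self.take()
            return -self.unary()
        return self.power()

    def power(self):
        base = self.atom()
        if self.peek() == "^":
            self.take()
            return base ** self.unary()      # right-associative
        return base

    def atom(self):
        token = self.take()
        if token == "(":
            value = self.additive()
            self.expect(")")
            return value
        if token in ("ln", "exp"):
            self.expect("(")
            arg = self.additive()
            self.expect(")")
            return math.log(arg) if token == "ln" else math.exp(arg)
        if token is not None and token.startswith("["):
            return self.refs[int(token[1:-1])]   # value of the n-th {} item
        try:
            return float(token)
        except (TypeError, ValueError):
            raise ValueError(f"unexpected token {token!r}") from None

def evaluate(text, refs=None):
    expr = Expression(text, refs or {})
    value = expr.additive()
    if expr.peek() is not None:
        raise ValueError(f"unexpected trailing {expr.peek()!r}")
    return value

print(evaluate("4*6+2^3"))           # 32.0
print(evaluate("4*(6+2^3)"))         # 56.0
print(evaluate("((4-3)*2)^2"))       # 4.0
print(evaluate("6*[2]^3", {2: 2}))   # 48.0
```

Because `[n]` references can point forward (the growth-rate expression [1] in eg_bayes.par uses items [2] and [4], which appear later in the file), any evaluator has to resolve the priors before the expressions that depend on them.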
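In the example file above, there are two expressions given. The growth rate is given as
{ln([4]/3000)/[2]}, meaning the modern population size of 3000 is the result of exponential
growth following a bottleneck when the population was of
size [4] (prior 4). This bottleneck occurred [2] (prior 2) generations in the past. The equation comes
from rearranging the exponential growth equation, where g is the value entered on the growth-rate line (the negative of r), so that the deme size t generations in the past is $N_0 e^{g t}$:
$$[4] = 3000\,e^{g\,[2]} \quad\Longrightarrow\quad g = \frac{\ln([4]/3000)}{[2]}$$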
The second expression is the scaling factor in the historical event: {3000/[4]}.
Here we are assuming the population crashed from an ancestral size of 3000
individuals to the bottleneck size (prior [4]) in a single generation.
Therefore, the generation before the bottleneck, the population was 3000/[4]
times larger. We could just as easily assume the effective ancestral population
size was 10000: 10000/[4]; or that the ancestral population was always the same
size as the bottleneck population (i.e., there was no bottleneck, just recent growth from a
historically small population): 1.
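Putting the pieces together, the Python sketch below resolves all four curly-bracket items of eg_bayes.par for one simulation, in the order their dependencies require. The exact sampling rules are assumptions (a continuous uniform for {U:1,299} and an exponential with mean 200 for {E:200}); the check at the end confirms that the growth-rate expression carries the modern size of 3000 back to exactly the bottleneck size.

```python
import math
import random

MODERN_N = 3000   # deme size entered in eg_bayes.par

def draw_example_parameters(rng):
    """Resolve the four {} items of eg_bayes.par for one simulation.

    Assumptions (not taken from BayeSSC's source): {U:1,299} is a continuous
    uniform draw and {E:200} is exponential with MEAN 200, so the rate passed
    to random.expovariate is 1/200.
    """
    p = {}
    p[2] = rng.uniform(1, 299)                  # {U:1,299}: bottleneck date
    p[4] = rng.expovariate(1 / 200)             # {E:200}: bottleneck size
    p[1] = math.log(p[4] / MODERN_N) / p[2]     # {ln([4]/3000)/[2]}: growth rate
    p[3] = MODERN_N / p[4]                      # {3000/[4]}: size factor at the event
    return p

p = draw_example_parameters(random.Random(7))
# Going back [2] generations at growth rate [1] must land on the bottleneck size.
print(math.isclose(MODERN_N * math.exp(p[1] * p[2]), p[4]))   # True
```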
Output
Most of the output from this program is the same as from SimCoal 1.0.
If you tell the program to run B simulations you will get:
B Arlequin project files (*.arp) containing sequence
information. This was meant to be used with the Arlequin program (also by
Excoffier and company) to calculate statistics on large samples of
genetic data. Each file contains the sequence/allele information for
each sample in the simulation.
An Arlequin batch file (*.arb) containing B links to the
Arlequin project files above. NOTE: Arlequin files are not output by
default. To get these output files, add
the "-a" flag to the command line: "BayeSSC.exe -a".
B Paup files (*.pau) containing sequence/allele information
from each sample for each iteration, and also the resulting genealogical
tree. This is in NEXUS format.
One huge Paup file (*.pau) containing all the information from the
B smaller Paup files above, and all B trees as well.
NOTE: These files are suppressed by
default. To get these output files, add
the "-p" flag to the command line: "BayeSSC.exe -p".
A NEXUS batch file (*.bat) that contains links to all B of
the paup files.
One unique output file (*.gen) that lists all the input values and
calculates a few statistics on the results. These include the mutation
rate per locus, the mean and standard deviation of coalescence times,
the mean and standard deviation of the number of pairwise differences, and the
mean and standard deviation of the number of mutations per genealogy.
A file listing the number of mutations per locus
(mut_hits_distr.sum) for each of the B simulations.
Two genealogy files (*.trees): one listing the branch lengths in
generations (*_true_trees.trees) and one listing the
branch lengths in number of mutations (*_mut_trees.trees).
Number of private alleles. That is, for each stat_group,
how many of its alleles are unique to individuals from one deme of the
stat_group. For example, consider the following arrangement of alleles
A-E.
              DEME 0     DEME 1
stat_group 0  A, B, C    A, C, D
stat_group 1  A, B       A, D, E
For stat group 0, there are
2 private alleles (B and D). In stat_group 1, there are 3 private
alleles (B, D and E). In the combined stat_group, there is only 1
private haplotype (E). All three results are in the output file.
For STR and RFLP data: Heterozygosity
For STR and RFLP data: Allelic variance (for the first locus only)
For STR: Repeat frequency distribution
For DNA: Segregating sites
For DNA: Pairwise Differences
For DNA: Haplotype Diversity (biased by a factor of (n-1)/n)
For DNA: Nucleotide Diversity
For DNA: Tajima's D
For DNA: Mismatch Distribution. That is, a series of numbers where
the first represents the number of pairs of sequences with no differences,
the second all the pairs with 1 difference, etc. For example, the
haplotype network at right has three haplotypes: a central one carried by 7 samples,
one carried by 2 samples that differs from it by 2 bp, and one carried by 3 samples that differs from it by 1 bp
(on the opposite side, so the two outer haplotypes differ by 3 bp).
There are 7*6/2=21 identical pairs in the central haplotype, 1 in the left one, and 3
in the right one, for a total of 25 pairs with 0 differences. Then there are 7*3=21 pairs
with one difference, 7*2=14 with two differences, and 2*3=6 pairs with three differences.
So the mismatch distribution would be {25 21 14 6}.
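Both worked examples above (private alleles and the mismatch distribution) can be reproduced with the Python sketch below. `private_alleles` and `mismatch_distribution` are hypothetical helpers written for illustration; the mismatch function assumes you already know the number of differences between each pair of haplotypes.

```python
from collections import Counter
from itertools import combinations

def private_alleles(alleles_by_deme):
    """Alleles found in exactly one deme of a stat group."""
    private = set()
    for deme, alleles in alleles_by_deme.items():
        elsewhere = set().union(*(a for d, a in alleles_by_deme.items() if d != deme))
        private |= set(alleles) - elsewhere
    return private

def mismatch_distribution(counts, distance):
    """counts: haplotype -> number of samples; distance(h1, h2) -> differences."""
    dist = Counter()
    for n in counts.values():
        dist[0] += n * (n - 1) // 2        # identical pairs within a haplotype
    for h1, h2 in combinations(counts, 2):
        dist[distance(h1, h2)] += counts[h1] * counts[h2]
    return [dist[d] for d in range(max(dist) + 1)]

# Private alleles, using the DEME 0 / DEME 1 table above
stat_groups = {0: {0: {"A", "B", "C"}, 1: {"A", "C", "D"}},
               1: {0: {"A", "B"}, 1: {"A", "D", "E"}}}
for group, demes in stat_groups.items():
    print(group, sorted(private_alleles(demes)))
# 0 ['B', 'D']
# 1 ['B', 'D', 'E']

# Mismatch distribution for the three-haplotype network described above;
# 'position' places the haplotypes along a line, so distance = |difference|.
counts = {"central": 7, "left": 2, "right": 3}
position = {"central": 0, "left": -2, "right": 1}
print(mismatch_distribution(counts, lambda a, b: abs(position[a] - position[b])))
# [25, 21, 14, 6]
```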
Between-StatGroup Statistics
Number of alleles private to each stat group
For STR and RFLPs: Pearson's G2 (how well sample 1 matches expectations from the pool of sample 1 and sample 2)
For STR and RFLPs: Slatkin's RST
For DNA: Pairwise Differences
For DNA: Mean Diversity ()
For DNA: Pooled Diversity (HT)
For DNA: FST
And the number of generations to the most recent common ancestor (MRCA)
This file can be opened using Microsoft Excel, or any other spreadsheet,
database or statistical program (it's in .csv format). You can download a sample file here. This was produced
by running the example file above through 50 simulations. This is a very small
number of runs for statistical purposes. As a general rule of thumb, you should
run 10^(3+p) simulations, where p is the number of parameters you are approximating;
for eg_bayes.par, which estimates two parameters (the date and the size of the bottleneck), that means 10^5 = 100,000 runs.
Hypothesis testing (with no parameters) can be adequately done in 1000 runs, but
several published studies have done 10,000,000 simulations or more.
Aside from the changes necessary to make SimCoal 1.0 compatible with
ancient genetic data, there were four other alterations made
to the source code.
Statistics Package
As previously discussed, a statistics program was integrated into
SimCoal itself so that relevant population statistics could be calculated
on the fly, rather than in an ex post facto analysis in Arlequin or Paup. The
statistics are output into a *_stat.csv file. For a description of the
contents of this file, see the section on output.
Multiple Coalescences
The original version of SimCoal allowed only one coalescent event per
generation. This assumption is a reasonably good approximation for
situations where the population size (N) is much larger than the
sample size (k). However there are situations where this assumption
is not met, as demonstrated below.
The probability of a coalescent event occurring in a population of size
N with k lineages is:
$$P_1 = \frac{k(k-1)}{2N} \qquad [1]$$
After these two lineages have coalesced, there are (k-1)
lineages remaining, so the probability of one more coalescence is
simply:
$$P_2 = \frac{(k-1)(k-2)}{2N} \qquad [2]$$
Thus the probability of two coalescent events occurring at the same time
is [1]*[2], or
$$P_1 P_2 = \frac{k(k-1)^2(k-2)}{4N^2} \qquad [3]$$
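This equation can be greatly simplified by the following
approximation:
$$P_1 P_2 \approx \frac{(k-1)^4}{4N^2} = \left(\frac{(k-1)^2}{2N}\right)^2 \qquad [4]$$
This approximation is better than it may at first appear (the error is
<5% for k>5 and <1% for k>10). The implication
of formula [4] is that there is a 100% chance that more than one lineage
will coalesce in each generation where:
$$(k-1)^2 \geq 2N, \quad\text{i.e.}\quad k \geq 1 + \sqrt{2N} \qquad [5]$$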
As a result, if condition [5] is ever met during a simulation, the
program will surely produce a long-biased tree. Even if this condition is
not met, the tree will still probably be biased longer than appropriate
due to missing less frequent double or triple coalescences. This may
translate into greater genotypic diversity than predicted.
This was changed in Serial SimCoal to allow as many coalescences per
generation as are appropriate. After the first coalescence, the number of
extant lineages is adjusted (decremented by one), and the probability of
coalescence is recalculated. If a second coalescence occurs, again the
probabilities are adjusted. This process continues until either a
coalescence fails to occur (Unif(0,1)>Pr(coalescence)) or the tree
coalesces completely. Again, it should be noted that this bias has also
been adjusted in Excoffier's new version of the program, SimCoal2.
Hudsonian Process
We chose to modify SimCoal 1.0 because of its unusually good tolerance
of ambiguities: you can have any amount of migration, any size of
populations, any number of samples, and all of these can change
arbitrarily at any time thanks to historical events. However, in almost
every simulation there comes a time when (1) all historical events have
occurred, (2) deme size has ceased to fluctuate and (3) no migration is
occurring. When these three conditions are met, the waiting time to the
next coalescent event can be calculated extremely rapidly using a random
exponential variable. R.R. Hudson was among the first to exploit this
convenient property. In Serial SimCoal, we use a Hudsonian process in such
situations. This turns out to be a profitable modification, since these three
conditions are often met near the root of the genealogy, when
waiting times to the last few coalescent events are very long. Thus, instead of waiting
several thousand generations for one random number to be small enough for a coalescence,
each coalescence time is generated from
just one random number. This process, when implemented with example 1,
sped the simulation up by a factor of 3. In simpler situations, gains would
be even greater.
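A minimal Python sketch of the Hudsonian shortcut, assuming a single constant-size deme of n genes with no migration or pending events and using k(k-1)/(2n) as the coalescence rate. The function name is hypothetical.

```python
import random

def hudson_waiting_times(k, n, rng):
    """Generations between successive coalescences in a constant, isolated
    deme of n genes with no further events.

    With k lineages the per-generation coalescence rate is k(k-1)/(2n), so
    each waiting time is drawn as one exponential variate with that rate.
    """
    times = []
    while k > 1:
        times.append(rng.expovariate(k * (k - 1) / (2 * n)))
        k -= 1
    return times

rng = random.Random(3)
times = hudson_waiting_times(10, 3000, rng)
print(len(times), sum(times))   # 9 coalescences and their total time
```

Under these assumptions the expected total time to the MRCA with k lineages is 2n(1 - 1/k) generations, which is 5400 for the values above, yet only nine random numbers are needed.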
Mersenne Twister
The default pseudorandom number generator in C++ has certain, usually unimportant problems. One of them
is that on some platforms (for example, Windows compilers, where RAND_MAX is 32,767) it produces a random number between 0 and 32,767. This means that if
you run several million simulations with prior distributions, you will be trying the same 32,768 values
multiple times, but not the values in between. BayeSSC uses the Mersenne Twister, which generates random
numbers with a granularity of 2^63, is extremely fast, and only repeats every 43*10^6000 random numbers.
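The granularity problem is easy to see in Python, whose `random` module is itself a Mersenne Twister. The sketch below uses `getrandbits(15)` to mimic a `rand()` with a RAND_MAX of 32,767 and maps both generators onto the prior {U:1,299}; the seed and number of draws are arbitrary.

```python
import random

rng = random.Random(2024)
n_draws = 200_000

# Draws from the prior {U:1,299} built on a 15-bit generator...
coarse = {1 + 298 * rng.getrandbits(15) / 32767 for _ in range(n_draws)}
# ...and built on a 53-bit uniform from the Mersenne Twister.
fine = {1 + 298 * rng.random() for _ in range(n_draws)}

print(len(coarse), len(fine))   # at most 32,768 distinct values vs. almost all unique
```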
Often, researchers are interested not just in the parameters of any one model, but want to compare
many different scenarios of past history. For example, the example file above eg_bayes.par
assumes that there was a bottleneck in the population at some point in the last 300 generations. We might
want to know if this model is any more likely than a model which has no bottleneck, that is, if the population has been
the same size since the most recent common ancestor (MRCA). One way to do this is to evaluate the likelihood of
the two models. This requires a few steps of analysis to be performed on the output of BayeSSC. The method
outlined below is only one of many possible (and published) methods for doing this analysis, and researchers
are encouraged to try their own techniques. The code for this method is written for the R
statistical package, which is free, open source, and quickly becoming the most important mathematical package
in the scientific world.
Approximate Bayesian Computation
Once you have run your simulations, you want to know which combinations of prior values give results that match
your data. For example, it may be the case that your genetic sequences came from a population whose size was 10,000,
but that a population of size 20,000 might have produced the same genetic
sequences as well; in fact it may have been even more likely to have produced them! What ABC tells you is the relative probability
of getting your data at different prior values. This means that the "right answer", the right population size for example,
is not a single number, but a probability distribution. This is called a posterior distribution.
Consider the example file above. In this file we are trying to approximate two parameters: the
date when the bottleneck happened, and the size of the population during the bottleneck (the modern growth rate and ratio
of the bottleneck size to the ancient population size are not "free parameters", since they can be calculated once we know
the other two). Let's say that we suspect a bottleneck, because the haplotype diversity in the modern sample is 0 (all samples have the same
haplotype), but 300 generations ago it was 0.66. We are therefore trying to find the relative likelihood of bottlenecks of ~200 individuals
at about 1-299 generations ago to produce this signature. Here is an example of _stat.csv output you might get by running eg_bayes.par:
...and a bit later on in the file:
When BayeSSC simulated the first history, it randomly chose a bottleneck population size of 264 ("Abstract 0") and bottleneck time
of 164 generations ago ("Event Tim"). The simulation run under those conditions produced a haplotype diversity of 0.54 in the modern group
(Group 0), and a haplotype diversity of 0.46 in the old group (Group 1). Of course it is generally not the case that such a bottleneck
produces higher diversity in the modern group, but the simulation shows that it is possible. This is why we need to run many, many
simulations to get a sense of the relative probability of different parameter values to produce data "like" ours.
One option for determining posterior distributions from this data is to use a rejection method. You will need to download
the R statistical package. The first time you do this analysis in R you will
also need to install the locfit, akima, and lattice packages (from the dropdown menu in R), and copy and paste
this source code into R. Once you've done all that, then at the > prompt,
type reject("[_stat.csv file name]") into R.
You will be given a list of the columns in the file,
and asked which ones you want to use in the analysis. In this case, we want
the haplotype diversity in group 0 (column #6) and the haplotype diversity in group 1 (column #22).
For this example, let's say our data had 0 diversity in group 0
and 0.66 in group 1. In the next step, the program calculates the Euclidean "distance" from each simulation result to the observed data. The
smaller this distance, the more closely the simulated values match the real data. The program then
asks you for a delta value. Simulations that lie within delta units of the observed data are "accepted", while simulations
that produced data further away are rejected. Researchers tend to use delta values that accept from 0.1% to 10% of the simulations.
However, you also want many "acceptances" to draw valid inferences about your posterior (certainly 50 or more), which is why it is better
to do large numbers of simulations. In the example at right, we see that almost 400 of the 5000 simulations were within 0.1 units of the
"right" answer, so we should choose a value of 0.1 or lower. For this example, I somewhat arbitrarily chose 0.05.
That value produces the following
posterior distributions (there are 4 priors in the file, but we are only interested in Posterior 2: the date of the bottleneck,
and Posterior 4: the population size during the bottleneck):
The transparent bars represent the prior probabilities (uniform from 0 to 300, and exponential 200 respectively). The plum-colored
distribution is the posterior. We have little confidence in the date of the event, though 250 generations b.p. is
more likely than 50 generations b.p. We can have more confidence in claiming that the bottleneck population size was less than
200 individuals.
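The rejection step itself is simple. The Python sketch below is a stand-in for illustration, not the reject() function from the R code: it uses a plain, unweighted Euclidean distance on the chosen statistics and keeps every simulation within delta of the observed values. Apart from the first row, which mirrors the example output described above, the simulated numbers are invented.

```python
import math

def euclidean(a, b):
    """Plain (unweighted) Euclidean distance between two statistic vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def abc_reject(sim_stats, sim_params, observed, delta):
    """Return the parameter sets whose statistics lie within delta of observed."""
    return [params for stats, params in zip(sim_stats, sim_params)
            if euclidean(stats, observed) <= delta]

# Observed: haplotype diversity 0 in group 0 and 0.66 in group 1 (as above).
observed = (0.0, 0.66)
# Toy simulation output: the first row mirrors the example above; the
# others are invented purely to illustrate acceptance and rejection.
sim_stats = [(0.54, 0.46), (0.02, 0.64), (0.10, 0.70), (0.00, 0.68)]
sim_params = [{"size": 264, "date": 164}, {"size": 35, "date": 250},
              {"size": 120, "date": 90}, {"size": 18, "date": 210}]

accepted = abc_reject(sim_stats, sim_params, observed, delta=0.05)
print(len(accepted), "of", len(sim_stats), "accepted:", accepted)
```

The accepted parameter values are the raw material for the posterior histograms; with real output you would want many more acceptances than this toy example produces.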
Posteriors
The reject() method in R will return
the accepted simulation values, and the cumulative distribution functions (cdfs)
of the posterior distributions. Armed with this information, you can now rerun the simulations using the "best" (or "maximally credible")
version of the model. It is generally more appropriate to use the posterior distribution at this step, rather than the most likely
estimate (e.g., 250 for prior 2 in the example above). Using the MLE presupposes that the scenario most likely to generate your data
was in fact what happened; but part of the point of doing Bayesian inference is to have some idea of the error associated with that value.
The error information is lost when you use just one number.
To estimate the posterior distribution, save the results of the rejection method (reject("[_stat.csv file name]")->eg). Then analyze them using the distrib.fit() function (for example, distrib.fit(eg$accept.sim$Event.Time.0)). This will attempt to fit four families of probability distribution to your posterior,
and return the -log likelihood value for each one (the lower the negative log likelihood, the better the fit). In the case of the bottleneck time, shown at left, we had so little information about the time that the uniform posterior from 12 to 299 was actually the best fit, scarcely different from the prior distribution: ~U(1,299). The size at the time of the bottleneck also fits the same family of distribution as the prior very well (an exponential distribution), but with an optimal rate parameter of 73 rather than 200. The gamma distribution fits slightly
better, and would also be acceptable.
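As a rough illustration of how such a comparison works, the Python sketch below scores two of the candidate families by their maximum-likelihood negative log likelihoods. It is not the distrib.fit() function: it covers only the exponential and uniform families (a gamma fit needs numerical optimisation), and the data are randomly generated stand-ins for accepted simulation values.

```python
import math
import random

def nll_exponential(xs):
    """-log likelihood of an exponential fitted by maximum likelihood
    (the MLE of the mean is the sample mean)."""
    n, mean = len(xs), sum(xs) / len(xs)
    return n * math.log(mean) + n

def nll_uniform(xs):
    """-log likelihood of a uniform on [min(xs), max(xs)], its MLE."""
    return len(xs) * math.log(max(xs) - min(xs))

def best_fit(xs):
    """Return the family with the lower -log likelihood, plus both scores."""
    scores = {"exponential": nll_exponential(xs), "uniform": nll_uniform(xs)}
    return min(scores, key=scores.get), scores

rng = random.Random(0)
sizes = [rng.expovariate(1 / 73) for _ in range(500)]   # exponential-looking
dates = [rng.uniform(12, 299) for _ in range(500)]      # uniform-looking
print(best_fit(sizes)[0], best_fit(dates)[0])           # exponential uniform
```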
Calculating likelihoods
(This section is still under construction. Check back soon!)
More questions and answers will be added to the list below as they come
in. For now, if you have any problems with BayeSSC or Serial SimCoal, contact the programmer, Christian Anderson at Scripps Institution
of Oceanography. Comments and suggestions for improvements or extensions are also
more than welcome.
On a Mac, which program should I use to open SimCoal? On some versions of MacOS-X you'll get an annoying message asking you which program to use to open the program. While we appreciate the metatextual irony, we have not been able to find a direct way around this problem. You will unfortunately need to work around it by executing directly from Terminal.
Open "Terminal" from the applications folder.
Type "cd" followed by the path to the directory where you have BayeSSC.
Type "./BayeSSC" to run the program. If you get a "Permission denied" error, type "chmod +x BayeSSC" and try again.
Why don't I get Arlequin / PAUP files? BayeSSC typically needs to run many thousands
of simulations, which would produce many thousands of these files. By default, this extra output is turned off.
In order to output this information add a -a or a -p flag (or both) to the command line, like this (for Windows)
BayeSSC.exe -a -p
Why can't SimCoal open my input files? Some operating systems
are unable to open files that have a space in the path name, like
"C:\Documents And Settings\test.par". If you put your input files in the
same directory as SimCoal, then you don't need to enter path information
at all.
Why can't SimCoal find my input file? In MacOS 10.4, UNIX
executables always look for files on the desktop if you don't give them
a path name, no matter where the program looking for the files is
located. In 10.5, they always look in the user's directory.
To bypass this built-in inconvenience, open Terminal, go to the
directory where SimCoal is located, and run it from there.
Alternatively, if you have an input file on your desktop, you can type
Input file name (*.par): Desktop/examplefile.par
(See the first FAQ for detailed instructions.)
Drummond AJ, Pybus OG, Rambaut A, Forsberg R and Rodrigo AG (2003)
Measurably evolving populations. Trends in Ecology and Evolution,
18, 481-488.
Hudson RR (1990) Gene genealogies and the coalescent process. In:
Futuyma DJ and Antonovics J (eds) Oxford Surveys in Evolutionary Biology.
New York: Oxford University Press, pp. 1-44.
Kingman JFC (1982) The coalescent. Stochastic Processes and their
Applications, 13, 235-248.
Funding for development of this program and website was received from
NSF (grant DEB#0108541 to Liz Hadly and Joanna Mountain), as well as from
Stanford's Office of Technology and Licensing (OTL) via a Research
Incentive Fund award to Joanna Mountain.