Software

codonbias   CodonRecSim   DPPDiv   iMCMC   MDIV   MIMAR   MISAT   MrBayes   nSL   PATRI   Structurama   SweepFinder   trueFS

MrBayes (Huelsenbeck and Ronquist)

MrBayes is a program that estimates phylogeny using an alignment as input. The program uses Markov chain Monte Carlo to approximate the posterior probability distribution of trees.

Structurama (Huelsenbeck, Huelsenbeck, and Andolfatto)

Structurama infers population structure using as input genetic data for a set of individuals. It uses a Dirichlet process prior, which allows the number of populations to be a random variable.
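To illustrate how a Dirichlet process prior lets the number of populations be a random variable, here is a minimal sketch that draws partitions of individuals from the Chinese restaurant process, one common representation of that prior. The function name crp_partition and the concentration parameter alpha are illustrative assumptions and are not taken from Structurama.

```python
import random

def crp_partition(n_individuals, alpha, rng):
    """Draw population sizes for n_individuals from a Chinese restaurant process."""
    sizes = []  # sizes[k] = number of individuals currently assigned to population k
    for i in range(n_individuals):
        # Individual i joins existing population k with probability sizes[k] / (i + alpha)
        # and founds a new population with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if u < cumulative:
                sizes[k] += 1
                break
        else:
            # No existing population was chosen (always the case for the first individual).
            sizes.append(1)
    return sizes

rng = random.Random(1)
draws = [crp_partition(50, 1.0, rng) for _ in range(1000)]
counts = [len(sizes) for sizes in draws]
print("mean number of populations:", sum(counts) / len(counts))
print("range:", min(counts), "to", max(counts))
```

The number of populations differs from draw to draw, which is the sense in which it is treated as random rather than fixed in advance.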

MDIV (Nielsen)

MDIV is a program that simultaneously estimates divergence times and migration rates between two populations under the infinite sites model and under a finite sites model (HKY). The program can be used to test whether there is evidence for migration between two populations or for shared recent common ancestry. In addition, it provides maximum likelihood estimates of the demographic parameters. The program assumes that there is no recombination. The output of the program consists of integrated likelihood surfaces for the three parameters: θ (two times the effective population size times the mutation rate), M (two times the migration rate), and T (the divergence time divided by the effective population size). For more information regarding the program, please see:

Nielsen, R. and J. W. Wakeley. 2001. Distinguishing Migration from Isolation: an MCMC Approach. Genetics 158: 885-896.

This version of the program is only applicable to a single locus and assumes equal population sizes in all populations. A program with enhanced features and better documentation is available from Jody Hey's web site.
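As a worked example of the parameter scaling, the following sketch converts scaled estimates back to unscaled quantities. It assumes theta = 2 * N_e * mu and T = t / N_e as stated above, and additionally assumes that M is scaled by the effective population size (M = 2 * N_e * m) and that time is measured in generations. The function name unscale_mdiv and all numerical values are hypothetical and are not MDIV output.

```python
def unscale_mdiv(theta, M, T, mu):
    """Convert scaled estimates to N_e, m and t for an assumed mutation rate mu.

    Assumes theta = 2 * N_e * mu, M = 2 * N_e * m and T = t / N_e.
    """
    n_e = theta / (2.0 * mu)  # effective population size
    m = M / (2.0 * n_e)       # migration rate per generation (assumed scaling)
    t = T * n_e               # divergence time in generations (assumed unit)
    return n_e, m, t

# Hypothetical estimates and an assumed per-locus mutation rate, for illustration only.
n_e, m, t = unscale_mdiv(theta=4.0, M=1.0, T=0.5, mu=1e-4)
print(n_e, m, t)
```

With these made-up inputs the effective population size comes out at roughly 20,000 and the divergence time at roughly 10,000 generations. The conversion is only as good as the assumed mutation rate.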

At the moment I only distribute a Windows executable version of the program. Please send enquiries regarding source code or executables for other platforms to Rasmus Nielsen.

  • Windows Executable (coming soon)
  • Documentation (Readme file) (coming soon)
  • Example infile (coming soon)

MISAT (Nielsen)

MISAT is a program for estimating the likelihood surface for θ (4 times the effective population size times the mutation rate) for microsatellite data. Two models are implemented: a stepwise mutation model and a mutation model allowing multi-step mutations, i.e. mutational jumps larger than one repeat unit in size. There are several other programs available for this type of analysis and, to my knowledge, this is probably the slowest program publicly distributed. It is by now somewhat outdated, although the multi-step mutational model is probably not implemented in any other program. For more information, please see:

Nielsen, R. 1997. A Maximum Likelihood Approach to Population Samples of Microsatellite alleles. Genetics. 146: 711-716.
 
Nielsen, R. 1997. A likelihood approach to populations samples of microsatellite alleles (vol 146, pg 711, 1997). Genetics 147: 349-349.
 
Nielsen, R. and P. J. Palsbøll. 1999. Tests of Microsatellite Evolution: Multi-Step Mutations and Constraints on Allele Size. Mol. Phyl. Evol. 11: 477-484.
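The difference between the two mutation models can be sketched with a small simulation of single mutation events. This is not MISAT's code. The geometric distribution of jump sizes, the symmetric direction, the lower bound of one repeat unit, and the names mutate and multi_step_prob are all assumptions made for illustration.

```python
import random

def mutate(repeat_count, multi_step_prob, rng):
    """Apply one mutation to a microsatellite allele given as a number of repeat units."""
    step = 1
    # Assumption: with probability multi_step_prob the jump grows by one more
    # repeat unit, repeatedly, which gives geometrically distributed jump sizes.
    # With multi_step_prob = 0 every jump is exactly one unit (stepwise model).
    while rng.random() < multi_step_prob:
        step += 1
    direction = rng.choice((-1, 1))  # expansion and contraction equally likely (assumed)
    return max(1, repeat_count + direction * step)  # keep at least one repeat unit

rng = random.Random(7)
stepwise = [mutate(20, 0.0, rng) for _ in range(1000)]
multistep = [mutate(20, 0.3, rng) for _ in range(1000)]
print("stepwise alleles:", sorted(set(stepwise)))
print("multi-step alleles:", sorted(set(multistep)))
```

Under the stepwise setting an allele of 20 repeats can only become 19 or 21, while the multi-step setting also produces larger jumps.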

At the moment I only distribute a Windows executable version of the program. Please send enquiries regarding source code or executables for other platforms to Rasmus Nielsen.

  • Windows Executable (coming soon)
  • Documentation (Readme file in Word format) (coming soon)
  • Example infile (coming soon)

SweepFinder (Nielsen)

SweepFinder is a program implementing the method described in:

Nielsen et al. 2005. Genomic scans for selective sweeps using SNP data. Genome Research 15: 1566-1575.

It can be used to detect the location of a selective sweep based on SNP data. It will also estimate the frequency spectrum of observed SNP data in the presence of missing data.

trueFS (Nielsen)

trueFS is a program for finding the ascertainment-corrected frequency spectrum based on ascertained SNP data. It can perform the corrections under several models, including double-hit ascertainment models and ascertainment with or without overlap between the original ascertainment sample and the final genotyped sample. It uses a bootstrap method to quantify statistical uncertainty in the estimates. For more information regarding the method, please see:

Nielsen, R., M. J. Hubisz and A. G. Clark. 2004. Reconstituting the frequency spectrum of ascertained SNP data. Genetics 168: 2373-2382.
  • Source Code and Instructions (coming soon)
  • Instructions (coming soon)

codonbias (Nielsen)

This program allows the user to estimate selection coefficients relating to optimal codon usage. The basic methodology is similar to the codon-based models implemented in the popular program PAML by Ziheng Yang. However, it explicitly models selection for optimal codon usage on different lineages of a phylogeny. The program may be a bit hard to compile because it requires special libraries (see documentation). For any analysis that can also be done in PAML, we recommend using PAML, as it is much superior to our program in a number of ways, including ease of use, computational speed, and how well it has been tested.

  • Source Code and Instructions (coming soon)
  • Instructions (coming soon)

CodonRecSim (Nielsen)

CodonRecSim is an old program written by R. Nielsen for simulating samples under codon-based models under the coalescent with recombination. This program was used in:

Anisimova, M., R. Nielsen and Z. Yang. 2003. Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics 164: 1229-1236.

It is not very well supported, but if you are used to simulating samples using evolver, distributed in the PAML package by Ziheng Yang, you may be able to figure out how this program works. Most of the interface is modelled on evolver, and in many analyses it should work just like evolver but with an extra parameter: R, the population-scaled recombination rate. However, a number of evolver options are not implemented in the program. Note that some code in this program is copyrighted to Ziheng Yang.

  • Windows Executable and Source Code (coming soon)

PATRI (Nielsen)

PATRI (PaTeRnity Inference) is a program for paternity analysis of genetic data. The program requires genotypic, diploid data from one or more loci from mother-offspring pairs and from potential fathers. Typical data might include microsatellite markers, Restriction Fragment Length Polymorphisms (RFLPs) or Single Nucleotide Polymorphisms (SNPs). Given such genotypic data, PATRI can calculate posterior probabilities of paternity for all sampled offspring. When behavioral or ecological information can be used to divide the sampled males into different groups, PATRI can perform maximum likelihood analyses of hypotheses regarding the relative reproductive success of those groups. The underlying statistical methodology was described in:

Nielsen, R., D. K. Mattila, P. J. Clapham and P. J. Palsbøll. 2001. Statistical Approaches to Paternity Analysis in Natural Populations and Applications to the North Atlantic Humpback Whale. Genetics 157: 1673-1682.

For all genotypes, PATRI can estimate the posterior probability that a particular male has sired a particular offspring, assuming a uniform prior among all males in the population. The male population size (N) can either be specified by the user as a fixed value, or uncertainty regarding N can be modeled using a uniform or Gaussian prior. Using a uniform prior corresponds to assuming no prior information regarding the male population size, except that an upper bound can be specified. PATRI can also produce a maximum likelihood estimate of N based solely on the parent-offspring genotypic data. The estimation of N assumes equal fecundity and unbiased sampling of males.
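The role of the uniform prior and of the male population size N can be sketched with Bayes' rule for a fixed N. This is a simplified illustration, not PATRI's implementation. It takes as given a likelihood of the offspring genotype for each sampled male and a single likelihood for any unsampled male (for example one computed from population allele frequencies). The function name paternity_posteriors, its arguments, and the numbers are hypothetical.

```python
def paternity_posteriors(sampled_likelihoods, unsampled_likelihood, n_males):
    """Posterior probability of paternity for each sampled male, uniform prior over n_males.

    Returns the per-male posteriors and the total posterior probability that the
    father is one of the unsampled males.
    """
    n_sampled = len(sampled_likelihoods)
    if n_males < n_sampled:
        raise ValueError("n_males must be at least the number of sampled males")
    # The uniform prior 1 / n_males is the same for every male and cancels, so each
    # posterior is a likelihood divided by the sum over all sampled and unsampled males.
    unsampled_total = (n_males - n_sampled) * unsampled_likelihood
    total = sum(sampled_likelihoods) + unsampled_total
    if total <= 0.0:
        raise ValueError("all likelihoods are zero")
    posteriors = [lik / total for lik in sampled_likelihoods]
    return posteriors, unsampled_total / total

# Three sampled males out of an assumed N = 10; the second male is genetically excluded.
posteriors, unsampled = paternity_posteriors([0.5, 0.0, 0.25], 0.05, 10)
print(posteriors, unsampled)
```

A larger N puts more weight on unsampled males and lowers the posterior for every sampled male, which is why the treatment of N matters for the results.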

If sampled males can be divided into groups based on behavioral or ecological information, PATRI can be used to evaluate hypotheses regarding the relative reproductive success of these groups. For k groups the user starts with a full model containing k-1 parameters, α_2, α_3, ..., α_k, where α_i is defined as the reproductive success of group i relative to group 1. The user can then enter restrictions on these parameters. For example, the hypothesis that males from groups i and j have equal reproductive success corresponds to the restriction α_i = α_j. Given a set of restrictions, PATRI can 1) maximize the likelihood and 2) plot a profile likelihood surface for any particular α_i. The profile likelihood surface for α_i is constructed by optimizing over all α_j, j ≠ i. The maximum likelihood values are stored in a table, allowing the user to perform likelihood ratio tests of various hypotheses regarding reproductive success. This analysis can be done using a fixed value of N or by assuming that N follows a uniform or Gaussian distribution.
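A likelihood ratio test based on two entries of such a table can be sketched as follows. The sketch covers a single restriction and assumes the usual chi-square approximation with one degree of freedom, which is a standard asymptotic result rather than something stated for PATRI. The function name and the log-likelihood values are hypothetical.

```python
import math

def lrt_one_restriction(loglik_full, loglik_restricted):
    """Likelihood ratio test of one restriction, using a chi-square(1) approximation."""
    # The restricted model is nested in the full model, so the statistic should be
    # non-negative; clamp at zero to guard against optimizer noise.
    stat = max(0.0, 2.0 * (loglik_full - loglik_restricted))
    # Survival function of the chi-square distribution with one degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical maximized log likelihoods for the full and the restricted model.
stat, p_value = lrt_one_restriction(-1250.3, -1252.9)
print(stat, p_value)
```

Testing several restrictions at once would need a chi-square distribution with more degrees of freedom, which this sketch does not cover.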

  • Windows Executable (coming soon)
  • Executable for Linux on Sun processor (coming soon)
  • Executable for Linux on Intel processor (coming soon)
  • Documentation (Readme file) (coming soon)
  • Example infile (coming soon)

MIMAR (Becquet)

MIMAR (MCMC estimation of the Isolation-Migration model Allowing for Recombination) is a Markov chain Monte Carlo method to estimate parameters of an isolation-migration model. It uses summaries of polymorphism data at multiple loci surveyed in a pair of diverging populations or closely related species and, in contrast to previous methods, allows for intralocus recombination. Note that you need to know the ancestral allele at each polymorphic site in order to calculate the summary statistics. The method is described in Becquet and Przeworski (2007) Genome Research.

iMCMC (Huelsenbeck)

This program was inspired by Paul Lewis's fantastic Windows program MCROBOT. iMCMC is a Macintosh application that illustrates Markov chain Monte Carlo (MCMC) for a simple landscape. Have fun!
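For readers without a Macintosh, the idea of MCMC on a simple landscape can be sketched in a few lines. This is a generic Metropolis sampler and is not iMCMC's code. The one-dimensional landscape (a single standard normal hill), the uniform proposal, and all names are assumptions chosen for illustration.

```python
import math
import random

def log_landscape(x):
    """Log height of the landscape up to a constant: one standard normal hill (assumed)."""
    return -0.5 * x * x

def metropolis(n_steps, step_size, rng, start=0.0):
    """Random-walk Metropolis sampler with a symmetric uniform proposal."""
    x = start
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step_size, step_size)
        log_ratio = log_landscape(proposal) - log_landscape(x)
        # Always accept uphill moves; accept downhill moves with probability exp(log_ratio).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current position
    return samples

samples = metropolis(50000, 1.0, random.Random(42))
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, variance)
```

The chain spends time at each position in proportion to the height of the landscape there, so the sample mean and variance should come out close to 0 and 1 for this hill.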

DPPDiv (Heath, Holder, and Huelsenbeck)

DPPDiv is a program for estimating species divergence times and lineage-specific substitution rates on a fixed topology. The prior on branch rates is a Dirichlet process prior which clusters branches into distinct rate classes. Alternative priors including the global molecular clock and the independent rates model are also available. The priors on node ages include the birth-death (and Yule) model and the uniform distribution.

nSL (Ferrer-Admetlla et al.) NEW!

nSL is a program for efficiently computing the nSL statistic described in Ferrer-Admetlla et al. 2014. On Detecting Incomplete Soft or Hard Selective Sweeps Using Haplotype Structure. MBE. Source code, executable, instructions, and example files are included in the .zip.