3. Command line tools

3.1. scans.py

This script contains command-line utilities for calculating EHH-based scans for positive selection in genomes, including EHH, iHS, and XP-EHH.

usage: scans.py subcommand
Sub-commands:
selscan_file_conversion

Process a bgzipped-VCF (such as those included in the Phase 3 1000 Genomes release) into a gzip-compressed tped file of the sort expected by selscan.

usage: scans.py selscan_file_conversion [-h] [--startBp STARTBP]
                                        [--endBp ENDBP] [--ploidy PLOIDY]
                                        [--considerMultiAllelic]
                                        [--rescaleGeneticDistance]
                                        [--includeLowQualAncestral]
                                        [--codingFunctionClassFile CODINGFUNCTIONCLASSFILE]
                                        [--sampleMembershipFile SAMPLEMEMBERSHIPFILE]
                                        [--filterPops FILTERPOPS [FILTERPOPS ...]]
                                        [--filterSuperPops FILTERSUPERPOPS [FILTERSUPERPOPS ...]]
                                        [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                        [--version] [--tmpDir TMPDIR]
                                        [--tmpDirKeep]
                                        inputVCF genMap outPrefix outLocation
                                        chromosomeNum
Positional arguments:
inputVCF Input VCF file
genMap Genetic recombination map tsv file with four columns: (Chromosome, Position(bp), Rate(cM/Mb), Map(cM))
outPrefix Output file prefix
outLocation Output location
chromosomeNum Chromosome number.
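The four-column genMap layout described above can be read with a few lines of Python. This is a minimal sketch rather than code from scans.py: the function name read_genetic_map, the assumption that the file starts with one header line, and the sample rows are all illustrative.

```python
import csv
import io

def read_genetic_map(handle, has_header=True):
    """Parse a 4-column TSV: Chromosome, Position(bp), Rate(cM/Mb), Map(cM)."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # assumption: the first line is a header row
    rows = []
    for fields in reader:
        if not fields:
            continue  # tolerate blank lines
        chrom, pos_bp, rate, map_cm = fields[:4]
        rows.append((chrom, int(pos_bp), float(rate), float(map_cm)))
    return rows

# Made-up example data, for illustration only.
example = io.StringIO(
    "Chromosome\tPosition(bp)\tRate(cM/Mb)\tMap(cM)\n"
    "chr22\t16051347\t8.096992\t0.000000\n"
    "chr22\t16052618\t8.131520\t0.010291\n"
)
genetic_map = read_genetic_map(example)
```

Each returned tuple holds the chromosome as text, the position as an integer, and the rate and cumulative map position as floats.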
Options:
--startBp=0 Coordinate in bp of start position (default: 0).
--endBp Coordinate in bp of end position.
--ploidy=2 Number of chromosomes expected for each genotype (default: 2).
--considerMultiAllelic=False
 Include multi-allelic variants in the output as separate records
--rescaleGeneticDistance=False
 Genetic distance is rescaled to be out of 100.0 cM
--includeLowQualAncestral=False
 Include variants where the ancestral information is low-quality (as indicated by lower-case x for AA=x in the VCF INFO column) (default: False).
--codingFunctionClassFile
 A Python class file containing a function used to code each genotype as ‘1’ or ‘0’. Signature: coding_function(current_value, reference_allele, alternate_allele, ancestral_allele)
--sampleMembershipFile
 The call sample file containing four columns: sample, pop, super_pop, gender
--filterPops Populations to include in the calculation (e.g. “FIN”)
--filterSuperPops
 Super populations to include in the calculation (e.g. “EUR”)
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
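A function with the coding_function signature listed under --codingFunctionClassFile might look like the sketch below. It rests on assumptions that are not stated in this reference: that current_value is the allele base itself, and that alleles matching the ancestral allele are coded ‘1’ with everything else coded ‘0’. How the function must be wrapped in a class file is not described here, so only the function body is shown.

```python
def coding_function(current_value, reference_allele, alternate_allele, ancestral_allele):
    """Return '1' or '0' for one allele call.

    Assumptions for this sketch: current_value is the allele base itself
    (e.g. 'A'), and alleles matching the ancestral allele are coded '1'.
    reference_allele and alternate_allele are accepted to match the
    documented signature but are not needed by this particular rule.
    """
    if current_value.upper() == ancestral_allele.upper():
        return "1"
    return "0"

# Hypothetical calls at one site: REF=A, ALT=G, low-quality ancestral allele 'a'.
coded = [coding_function(allele, "A", "G", "a") for allele in ("A", "G", "A")]
```

Swap the two return values if your analysis expects the opposite convention.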
selscan_ehh

Perform selscan’s calculation of EHH.

usage: scans.py selscan_ehh [-h] [--gapScale GAPSCALE] [--maf MAF]
                            [--threads THREADS] [--window WINDOW]
                            [--cutoff CUTOFF] [--maxExtend MAXEXTEND]
                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                            [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                            inputTped outFile locusID
Positional arguments:
inputTped Input tped file
outFile Output filepath
locusID The locus ID
Options:
--gapScale=20000
 Gap scale parameter in bp. If a gap is encountered between two SNPs > GAP_SCALE and < MAX_GAP, then the genetic distance is scaled by GAP_SCALE/GAP (default: 20000).
--maf=0.05 Minor allele frequency. If a site has a MAF below this value, the program will not use it as a core SNP (default: 0.05).
--threads=1 The number of threads to spawn during the calculation. Partitions loci across threads (default: 1).
--window=100000
 When calculating EHH, this is the length of the window in bp in each direction from the query locus (default: 100000).
--cutoff=0.05 The EHH decay cutoff (default: 0.05).
--maxExtend=1000000
 The maximum distance an EHH decay curve is allowed to extend from the core. Set <= 0 for no restriction (default: 1000000).
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
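The gap-scaling rule described for --gapScale can be illustrated as follows. This is a sketch of the rule as worded above, not selscan’s source: the function name is hypothetical, and the max_gap default of 200000 bp is an assumption because MAX_GAP is not exposed as an option here. Gaps at or above max_gap are outside the scope of the sketch and are returned unchanged.

```python
def scaled_genetic_distance(gap_bp, genetic_dist, gap_scale=20000, max_gap=200000):
    """Scale genetic_dist by gap_scale / gap_bp when gap_scale < gap_bp < max_gap."""
    if gap_scale < gap_bp < max_gap:
        return genetic_dist * gap_scale / gap_bp
    return genetic_dist  # small gaps (and gaps >= max_gap) are left as-is here

# A 40 kb gap with the default 20 kb scale halves the genetic distance.
halved = scaled_genetic_distance(40000, 0.1)
unchanged = scaled_genetic_distance(10000, 0.1)
```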
selscan_ihs

Perform selscan’s calculation of iHS.

usage: scans.py selscan_ihs [-h] [--gapScale GAPSCALE] [--maf MAF]
                            [--threads THREADS] [--skipLowFreq]
                            [--dontWriteLeftRightiHH] [--truncOk]
                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                            [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                            inputTped outFile
Positional arguments:
inputTped Input tped file
outFile Output filepath
Options:
--gapScale=20000
 Gap scale parameter in bp. If a gap is encountered between two SNPs > GAP_SCALE and < MAX_GAP, then the genetic distance is scaled by GAP_SCALE/GAP (default: 20000).
--maf=0.05 Minor allele frequency. If a site has a MAF below this value, the program will not use it as a core SNP (default: 0.05).
--threads=1 The number of threads to spawn during the calculation. Partitions loci across threads (default: 1).
--skipLowFreq=False
 Do not include low frequency variants in the construction of haplotypes (default: False).
--dontWriteLeftRightiHH=False
 When writing out iHS, do not write out the constituent left and right ancestral and derived iHH scores for each locus (default: False).
--truncOk=False
 If an EHH decay reaches the end of a sequence before reaching the cutoff, integrate the curve anyway. Normal function is to disregard the score for that core (default: False).
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
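The effect of --truncOk can be pictured with a small integration routine. The sketch below assumes trapezoidal integration of an EHH decay curve on one side of the core and a cutoff of 0.05 (the default of --cutoff in selscan_ehh); the function name and the example values are hypothetical, and selscan’s actual implementation may differ in detail.

```python
def integrate_ehh(positions, ehh, cutoff=0.05, trunc_ok=False):
    """Trapezoidal area under an EHH decay curve on one side of the core.

    positions: distances from the core in ascending order.
    ehh: EHH value at each position.
    Returns None when the data end before EHH drops to the cutoff and
    trunc_ok is False (the score for that core is disregarded).
    """
    area = 0.0
    for i in range(1, len(positions)):
        width = positions[i] - positions[i - 1]
        area += 0.5 * (ehh[i] + ehh[i - 1]) * width
        if ehh[i] <= cutoff:
            return area  # the curve decayed to the cutoff: the score is usable
    return area if trunc_ok else None

complete = integrate_ehh([0, 1, 2, 3], [1.0, 0.5, 0.2, 0.04])
truncated = integrate_ehh([0, 1, 2], [1.0, 0.5, 0.2])
forced = integrate_ehh([0, 1, 2], [1.0, 0.5, 0.2], trunc_ok=True)
```

Here truncated is None because the curve never reaches the cutoff, while forced keeps the partial area.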
selscan_nsl

Perform selscan’s calculation of nSL.

usage: scans.py selscan_nsl [-h] [--gapScale GAPSCALE] [--maf MAF]
                            [--threads THREADS] [--truncOk]
                            [--maxExtendNsl MAXEXTENDNSL]
                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                            [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                            inputTped outFile
Positional arguments:
inputTped Input tped file
outFile Output filepath
Options:
--gapScale=20000
 Gap scale parameter in bp. If a gap is encountered between two SNPs > GAP_SCALE and < MAX_GAP, then the genetic distance is scaled by GAP_SCALE/GAP (default: 20000).
--maf=0.05 Minor allele frequency. If a site has a MAF below this value, the program will not use it as a core SNP (default: 0.05).
--threads=1 The number of threads to spawn during the calculation. Partitions loci across threads (default: 1).
--truncOk=False
 If an EHH decay reaches the end of a sequence before reaching the cutoff, integrate the curve anyway. Normal function is to disregard the score for that core (default: False).
--maxExtendNsl=100
 The maximum distance an nSL haplotype is allowed to extend from the core. Set <= 0 for no restriction (default: 100).
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
selscan_xpehh

Perform selscan’s calculation of XP-EHH.

usage: scans.py selscan_xpehh [-h] [--gapScale GAPSCALE] [--maf MAF]
                              [--threads THREADS] [--truncOk]
                              [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                              [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                              inputTped outFile inputRefTped
Positional arguments:
inputTped Input tped file
outFile Output filepath
inputRefTped Input tped for the reference population to which the first is compared
Options:
--gapScale=20000
 Gap scale parameter in bp. If a gap is encountered between two SNPs > GAP_SCALE and < MAX_GAP, then the genetic distance is scaled by GAP_SCALE/GAP (default: 20000).
--maf=0.05 Minor allele frequency. If a site has a MAF below this value, the program will not use it as a core SNP (default: 0.05).
--threads=1 The number of threads to spawn during the calculation. Partitions loci across threads (default: 1).
--truncOk=False
 If an EHH decay reaches the end of a sequence before reaching the cutoff, integrate the curve anyway. Normal function is to disregard the score for that core (default: False).
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
selscan_norm_nsl

Normalize Selscan’s nSL output

usage: scans.py selscan_norm_nsl [-h] [--bins BINS]
                                 [--critPercent CRITPERCENT]
                                 [--critValue CRITVALUE] [--minSNPs MINSNPS]
                                 [--qbins QBINS] [--winSize WINSIZE] [--bpWin]
                                 [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                 [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                                 inputFiles [inputFiles ...]
Positional arguments:
inputFiles A list of files delimited by whitespace for joint normalization.
 Expected format for iHS/nSL files (no header): <locus name> <physical pos> <freq> <ihh1/sL1> <ihh2/sL0> <ihs/nsl>
 Expected format for XP-EHH files (one line header): <locus name> <physical pos> <genetic pos> <freq1> <ihh1> <freq2> <ihh2> <xpehh>
Options:
--bins=100 The number of frequency bins in [0,1] for score normalization (default: 100)
--critPercent=-1.0
 Set the critical value such that a SNP with iHS in the most extreme CRIT_PERCENT tails (two-tailed) is marked as an extreme SNP. Not used by default (default: -1.0)
--critValue=2.0
 Set the critical value such that a SNP with |iHS| > CRIT_VAL is marked as an extreme SNP. Default as in Voight et al. (default: 2.0)
--minSNPs=10 Only consider a bp window if it has at least this many SNPs (default: 10)
--qbins=20 Outlying windows are binned by number of sites within each window. This is the number of quantile bins to use (default: 20)
--winSize=100000
 The non-overlapping window size for calculating the percentage of extreme SNPs (default: 100000)
--bpWin=False If set, will use windows of a constant bp size with varying number of SNPs
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
selscan_norm_ihs

Normalize Selscan’s iHS output

usage: scans.py selscan_norm_ihs [-h] [--bins BINS]
                                 [--critPercent CRITPERCENT]
                                 [--critValue CRITVALUE] [--minSNPs MINSNPS]
                                 [--qbins QBINS] [--winSize WINSIZE] [--bpWin]
                                 [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                 [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                                 inputFiles [inputFiles ...]
Positional arguments:
inputFiles A list of files delimited by whitespace for joint normalization.
 Expected format for iHS/nSL files (no header): <locus name> <physical pos> <freq> <ihh1/sL1> <ihh2/sL0> <ihs/nsl>
 Expected format for XP-EHH files (one line header): <locus name> <physical pos> <genetic pos> <freq1> <ihh1> <freq2> <ihh2> <xpehh>
Options:
--bins=100 The number of frequency bins in [0,1] for score normalization (default: 100)
--critPercent=-1.0
 Set the critical value such that a SNP with iHS in the most extreme CRIT_PERCENT tails (two-tailed) is marked as an extreme SNP. Not used by default (default: -1.0)
--critValue=2.0
 Set the critical value such that a SNP with |iHS| > CRIT_VAL is marked as an extreme SNP. Default as in Voight et al. (default: 2.0)
--minSNPs=10 Only consider a bp window if it has at least this many SNPs (default: 10)
--qbins=20 Outlying windows are binned by number of sites within each window. This is the number of quantile bins to use (default: 20)
--winSize=100000
 The non-overlapping window size for calculating the percentage of extreme SNPs (default: 100000)
--bpWin=False If set, will use windows of a constant bp size with varying number of SNPs
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
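The roles of --bins and --critValue can be sketched as follows: scores are grouped into equal-width frequency bins, standardized within each bin, and flagged as extreme when the absolute normalized value exceeds the critical value. This is an illustrative sketch only. The function name, the use of the population standard deviation, and the handling of bins with zero variance are assumptions, and the actual normalization program may differ.

```python
from statistics import mean, pstdev

def normalize_by_frequency_bin(records, bins=100, crit_value=2.0):
    """records: list of (freq, raw_score) pairs.

    Returns a list of (normalized_score, is_extreme) in the same order.
    """
    def bin_of(freq):
        return min(int(freq * bins), bins - 1)  # freq == 1.0 joins the last bin

    grouped = {}
    for freq, score in records:
        grouped.setdefault(bin_of(freq), []).append(score)
    stats = {index: (mean(vals), pstdev(vals)) for index, vals in grouped.items()}

    normalized = []
    for freq, score in records:
        mu, sigma = stats[bin_of(freq)]
        z = (score - mu) / sigma if sigma > 0 else 0.0  # assumption: flat bins map to 0
        normalized.append((z, abs(z) > crit_value))
    return normalized

# Made-up (frequency, unstandardized score) pairs, using 10 bins for brevity.
result = normalize_by_frequency_bin(
    [(0.11, 1.0), (0.12, 3.0), (0.51, -1.0), (0.52, -1.0)], bins=10
)
```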
selscan_norm_xpehh

Normalize Selscan’s XP-EHH output

usage: scans.py selscan_norm_xpehh [-h] [--bins BINS]
                                   [--critPercent CRITPERCENT]
                                   [--critValue CRITVALUE] [--minSNPs MINSNPS]
                                   [--qbins QBINS] [--winSize WINSIZE]
                                   [--bpWin]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmpDir TMPDIR]
                                   [--tmpDirKeep]
                                   inputFiles [inputFiles ...]
Positional arguments:
inputFiles A list of files delimited by whitespace for joint normalization.
 Expected format for iHS/nSL files (no header): <locus name> <physical pos> <freq> <ihh1/sL1> <ihh2/sL0> <ihs/nsl>
 Expected format for XP-EHH files (one line header): <locus name> <physical pos> <genetic pos> <freq1> <ihh1> <freq2> <ihh2> <xpehh>
Options:
--bins=100 The number of frequency bins in [0,1] for score normalization (default: 100)
--critPercent=-1.0
 Set the critical value such that a SNP with iHS in the most extreme CRIT_PERCENT tails (two-tailed) is marked as an extreme SNP. Not used by default (default: -1.0)
--critValue=2.0
 Set the critical value such that a SNP with |iHS| > CRIT_VAL is marked as an extreme SNP. Default as in Voight et al. (default: 2.0)
--minSNPs=10 Only consider a bp window if it has at least this many SNPs (default: 10)
--qbins=20 Outlying windows are binned by number of sites within each window. This is the number of quantile bins to use (default: 20)
--winSize=100000
 The non-overlapping window size for calculating the percentage of extreme SNPs (default: 100000)
--bpWin=False If set, will use windows of a constant bp size with varying number of SNPs
--loglevel=DEBUG
 Verbosity of output (default: DEBUG). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.
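The interaction of --winSize, --minSNPs and --critValue can be sketched as a pass over non-overlapping windows. The function below is hypothetical and assumes that windows start at multiples of the window size; it reports, for each window with enough SNPs, the fraction whose absolute normalized score exceeds the critical value.

```python
def extreme_fraction_per_window(snps, win_size=100000, min_snps=10, crit_value=2.0):
    """snps: list of (position_bp, normalized_score) pairs.

    Returns {window_start_bp: fraction of SNPs with |score| > crit_value}
    for non-overlapping windows holding at least min_snps SNPs.
    """
    windows = {}
    for pos, score in snps:
        start = (pos // win_size) * win_size
        total, extreme = windows.get(start, (0, 0))
        windows[start] = (total + 1, extreme + (abs(score) > crit_value))
    return {
        start: extreme / total
        for start, (total, extreme) in sorted(windows.items())
        if total >= min_snps
    }

# Made-up positions and scores; min_snps is lowered so the tiny example yields output.
fractions = extreme_fraction_per_window(
    [(10, 2.5), (20, 0.1), (150000, 3.0)], min_snps=2
)
```

The window starting at 100000 bp holds a single SNP and is therefore dropped.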
store_selscan_results_in_db

Aggregate results from selscan into a SQLite database via a helper JSON metadata file.

usage: scans.py store_selscan_results_in_db [-h]
                                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                            [--version] [--tmpDir TMPDIR]
                                            [--tmpDirKeep]
                                            inputFile outFile
Positional arguments:
inputFile Input *.metadata.json file
outFile Output SQLite filepath
Options:
--loglevel=INFO
 Verbosity of output (default: INFO). Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
--version, -V show program’s version number and exit
--tmpDir=/tmp Base directory for temp files (default: /tmp).
--tmpDirKeep=False
 Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

3.2. cms_modeller.py

This script contains command-line utilities for exploratory fitting of demographic models to population genetic data.

usage: cms_modeller.py [-h] {target_stats,bootstrap,point,grid,optimize} ...
Sub-commands:
target_stats

perform per-site(/per-site-pair) calculations of population summary statistics for model target values

usage: cms_modeller.py target_stats [-h] [--freqs] [--ld] [--fst]
                                    inputTpeds recomFile regions out
Positional arguments:
inputTpeds comma-delimited list of input tped files (only one file per pop being modelled; must run chroms separately or concatenate)
recomFile recombination map
regions tab-separated file with putative neutral regions
out outfile prefix
Options:
--freqs=False calculate summary statistics from within-population allele frequencies
--ld=False calculate summary statistics from within-population linkage disequilibrium
--fst=False calculate summary statistics from population comparison using allele frequencies
bootstrap

perform bootstrap estimates of population summary statistics in order to finalize model target values

usage: cms_modeller.py bootstrap [-h] [--in_freqs IN_FREQS] [--in_ld IN_LD]
                                 [--in_fst IN_FST]
                                 nBootstrapReps out
Positional arguments:
nBootstrapReps number of bootstraps to perform in order to estimate standard error of the dataset (should converge for reasonably small n)
out outfile prefix
Options:
--in_freqs comma-delimited list of infiles with per-site calculations for population. One file per population – for bootstrap estimates of genome-wide values, should first concatenate per-chrom files
--in_ld comma-delimited list of infiles with per-site-pair calculations for population. One file per population – for bootstrap estimates of genome-wide values, should first concatenate per-chrom files
--in_fst comma-delimited list of infiles with per-site calculations for population pair. One file per population-pair – for bootstrap estimates of genome-wide values, should first concatenate per-chrom files
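The idea behind nBootstrapReps can be illustrated with a generic bootstrap of the mean. This sketch is not taken from cms_modeller.py: the function name, the choice of the mean as the statistic, the fixed seed, and the input values are assumptions made for illustration.

```python
import random
from statistics import mean, stdev

def bootstrap_standard_error(values, n_reps, seed=0):
    """Estimate the standard error of the mean by resampling with replacement.

    n_reps must be at least 2 so that a standard deviation can be computed.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    rep_means = [
        mean(rng.choices(values, k=len(values)))
        for _ in range(n_reps)
    ]
    return stdev(rep_means)

# Made-up per-site statistics.
per_site = [0.1, 0.2, 0.3, 0.4, 0.5]
se = bootstrap_standard_error(per_site, n_reps=200)
```

Repeating the call with a larger n_reps is one way to check that the estimate has converged.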
point

run simulations of a point in parameter-space

usage: cms_modeller.py point [-h] [--cosiBuild COSIBUILD]
                             [--dropSings DROPSINGS] [--genmapRandomRegions]
                             [--stopAfterMinutes STOPAFTERMINUTES]
                             [--calcError CALCERROR]
                             [--targetvalsFile TARGETVALSFILE] [--plotStats]
                             inputParamFile nCoalescentReps outputDir
Positional arguments:
inputParamFile file with model specifications for input
nCoalescentReps
 number of coalescent replicates to run
outputDir location to write cosi output
Options:
--cosiBuild=/Users/vitti/Desktop/COSI_DEBUG_TEST/cosi-2.0/coalescent
 which version of cosi to run? (*automate installation)
--dropSings randomly thin global singletons from output dataset (i.e., to model ascertainment bias)
--genmapRandomRegions=False
 cosi option to sub-sample genetic map randomly from input
--stopAfterMinutes
 cosi option to terminate simulations
--calcError file specifying dimensions of error function to use. if unspecified, defaults to all. first line = stats, second line = pops
--targetvalsFile
 targetvalsfile for model
--plotStats=False
 visualize goodness-of-fit to model targets
grid

run grid search

usage: cms_modeller.py grid [-h] [--cosiBuild COSIBUILD]
                            [--dropSings DROPSINGS] [--genmapRandomRegions]
                            [--stopAfterMinutes STOPAFTERMINUTES]
                            [--calcError CALCERROR]
                            inputParamFile nCoalescentReps outputDir
                            grid_inputdimensionsfile
Positional arguments:
inputParamFile file with model specifications for input
nCoalescentReps
 number of coalescent replicates to run
outputDir location to write cosi output
grid_inputdimensionsfile
 file with specifications of grid search. each parameter to vary is indicated: KEY INDEX [VALUES]
Options:
--cosiBuild=/Users/vitti/Desktop/COSI_DEBUG_TEST/cosi-2.0/coalescent
 which version of cosi to run? (*automate installation)
--dropSings randomly thin global singletons from output dataset (i.e., to model ascertainment bias)
--genmapRandomRegions=False
 cosi option to sub-sample genetic map randomly from input
--stopAfterMinutes
 cosi option to terminate simulations
--calcError file specifying dimensions of error function to use. if unspecified, defaults to all. first line = stats, second line = pops
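A grid search visits every combination of the values listed for each varied parameter. The sketch below shows that enumeration with itertools.product. The dictionary layout keyed by (KEY, INDEX), the parameter names, and the values are hypothetical; the actual format of the grid dimensions file is not reproduced here.

```python
from itertools import product

def grid_points(dimensions):
    """dimensions: {(key, index): [values]}.

    Returns one {(key, index): value} dict per point of the full grid.
    """
    names = list(dimensions)
    value_lists = [dimensions[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

# Hypothetical parameter keys and values, for illustration only.
points = grid_points({
    ("pop_size", 1): [5000, 10000],
    ("migration_rate", 0): [0.0, 1e-4, 1e-3],
})
```

Two values for one parameter and three for the other give six points, so the number of simulations grows multiplicatively with each added dimension.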
optimize

run optimization algorithm to fit model parameters

usage: cms_modeller.py optimize [-h] [--cosiBuild COSIBUILD]
                                [--dropSings DROPSINGS]
                                [--genmapRandomRegions]
                                [--stopAfterMinutes STOPAFTERMINUTES]
                                [--calcError CALCERROR] [--stepSize STEPSIZE]
                                [--method METHOD]
                                inputParamFile nCoalescentReps outputDir
                                optimize_inputdimensionsfile
Positional arguments:
inputParamFile file with model specifications for input
nCoalescentReps
 number of coalescent replicates to run
outputDir location to write cosi output
optimize_inputdimensionsfile
 file with specifications of optimization. each parameter to vary is indicated: KEY INDEX
Options:
--cosiBuild=/Users/vitti/Desktop/COSI_DEBUG_TEST/cosi-2.0/coalescent
 which version of cosi to run? (*automate installation)
--dropSings randomly thin global singletons from output dataset (i.e., to model ascertainment bias)
--genmapRandomRegions=False
 cosi option to sub-sample genetic map randomly from input
--stopAfterMinutes
 cosi option to terminate simulations
--calcError file specifying dimensions of error function to use. if unspecified, defaults to all. first line = stats, second line = pops
--stepSize scaled step size (i.e. whole range = 1)
--method=SLSQP algorithm to pass to scipy.optimize

3.3. likes_from_model.py

This script contains command-line utilities for generating probability distributions for component scores from pre-specified demographic model(s).

usage: likes_from_model.py [-h]
                           {run_neut_sims,get_sel_trajs,run_sel_sims,scores_from_sims,likes_from_scores}
                           ...
Sub-commands:
run_neut_sims

run neutral simulations

usage: likes_from_model.py run_neut_sims [-h] [--cosiBuild COSIBUILD]
                                         [--dropSings DROPSINGS]
                                         [--genmapRandomRegions]
                                         n inputParamFile outputDir
Positional arguments:
n num replicates to run
inputParamFile file with model specifications for input
outputDir location to write cosi output
Options:
--cosiBuild=/Users/vitti/Desktop/COSI_DEBUG_TEST/cosi-2.0/coalescent
 which version of cosi to run? (*automate installation)
--dropSings randomly thin global singletons from output dataset to model ascertainment bias
--genmapRandomRegions=False
 cosi option to sub-sample genetic map randomly from input
get_sel_trajs

run forward simulations of selection trajectories and perform rejection sampling to populate selscenarios by final allele frequency before running coalescent simulations for entire sample

usage: likes_from_model.py get_sel_trajs [-h] [--cosiBuild COSIBUILD]
                                         [--dropSings DROPSINGS]
                                         [--genmapRandomRegions]
                                         [--freqRange FREQRANGE]
                                         [--nBins NBINS]
                                         nSimsPerBin maxSteps inputParamFile
                                         outputDir
Positional arguments:
nSimsPerBin number of selection trajectories to generate per allele frequency bin
maxSteps number of attempts to generate a selection trajectory before re-sampling selection coefficient and start time.
inputParamFile file with model specifications for input
outputDir location to write cosi output
Options:
--cosiBuild=/Users/vitti/Desktop/COSI_DEBUG_TEST/cosi-2.0/coalescent
 which version of cosi to run? (*automate installation)
--dropSings randomly thin global singletons from output dataset to model ascertainment bias
--genmapRandomRegions=False
 cosi option to sub-sample genetic map randomly from input
--freqRange=.05-.95
 range of final selected allele frequencies to simulate, e.g. .05-.95
--nBins=9 number of frequency bins
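The --freqRange and --nBins options together define the frequency bins that trajectories are sorted into. A minimal sketch, assuming equal-width bins and half-open intervals (both assumptions, as are the function names), is:

```python
def frequency_bins(freq_range=".05-.95", n_bins=9):
    """Split a 'low-high' frequency range into n_bins equal-width (lo, hi) bins."""
    low_text, high_text = freq_range.split("-")
    low, high = float(low_text), float(high_text)
    width = (high - low) / n_bins
    return [(low + i * width, low + (i + 1) * width) for i in range(n_bins)]

def bin_index(final_freq, bins):
    """Return the index of the bin containing final_freq, or None if rejected."""
    for i, (lo, hi) in enumerate(bins):
        if lo <= final_freq < hi:
            return i
    return None

bins = frequency_bins()
kept = bin_index(0.5, bins)       # a trajectory ending at 50% frequency
rejected = bin_index(0.01, bins)  # outside the simulated range
```

With the defaults this gives nine bins of width 0.1. A trajectory whose final frequency falls outside every bin is rejected, and one way to realize the rejection sampling is to stop accepting trajectories for a bin once it holds nSimsPerBin of them.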
run_sel_sims

run selection simulations

usage: likes_from_model.py run_sel_sims [-h] [--cosiBuild COSIBUILD]
                                        [--dropSings DROPSINGS]
                                        [--genmapRandomRegions]
                                        [--freqRange FREQRANGE]
                                        [--nBins NBINS]
                                        n trajDir inputParamFile outputDir
Positional arguments:
n num replicates to run per sel scenario
trajDir location of simulated trajectories (i.e. outputDir from get_sel_trajs)
inputParamFile file with model specifications for input
outputDir location to write cosi output
Options:
--cosiBuild=/Users/vitti/Desktop/COSI_DEBUG_TEST/cosi-2.0/coalescent
 which version of cosi to run? (*automate installation)
--dropSings randomly thin global singletons from output dataset to model ascertainment bias
--genmapRandomRegions=False
 cosi option to sub-sample genetic map randomly from input
--freqRange=.05-.95
 range of final selected allele frequencies to simulate, e.g. .05-.95
--nBins=9 number of frequency bins
scores_from_sims

get scores from simulations

usage: likes_from_model.py scores_from_sims [-h] [--inputTped INPUTTPED]
                                            [--inputIhs INPUTIHS]
                                            [--inputdelIhh INPUTDELIHH]
                                            [--inputXpehh INPUTXPEHH] [--ihs]
                                            [--delIhh] [--xpehh XPEHH]
                                            [--fst_deldaf FST_DELDAF]
                                            [--normalizeIhs NORMALIZEIHS]
                                            [--normalizeDelIhh NORMALIZEDELIHH]
                                            [--normalizeXpehh NORMALIZEXPEHH]
                                            outputFilename
Positional arguments:
outputFilename where to write scorefile
Options:
--inputTped tped from which to calculate score
--inputIhs iHS from which to calculate delihh
--inputdelIhh delIhh from which to calculate norm
--inputXpehh Xp-ehh from which to calculate norm
--ihs=False Undocumented
--delIhh=False Undocumented
--xpehh inputTped for altpop
--fst_deldaf inputTped for altpop
--normalizeIhs filename for parameters to normalize to; if not given then will by default normalize file to its own global dist
--normalizeDelIhh
 filename for parameters to normalize to; if not given then will by default normalize file to its own global dist
--normalizeXpehh
 filename for parameters to normalize to; if not given then will by default normalize file to its own global dist
likes_from_scores

get component score probability distributions from scores

usage: likes_from_model.py likes_from_scores [-h] [--thinToSize] [--ihs]
                                             [--delihh] [--xp] [--deldaf]
                                             [--fst] [--freqRange FREQRANGE]
                                             [--nBins NBINS]
                                             neutFile selFile selPos
                                             nLikesBins outPrefix
Positional arguments:
neutFile file with scores for neutral scenarios (normalized if necessary)
selFile file with scores for selected scenarios (normalized if necessary)
selPos position of causal variant
nLikesBins number of bins to use for histogram to approximate probability density function
outPrefix save file as
Options:
--thinToSize=False
 subsample from simulated SNPs (since nSel << nLinked < nNeut)
--ihs=False Undocumented
--delihh=False Undocumented
--xp=False Undocumented
--deldaf=False Undocumented
--fst=False Undocumented
--freqRange=.05-.95
 range of final selected allele frequencies to simulate, e.g. .05-.95
--nBins=9 number of frequency bins
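The nLikesBins argument sets how finely a histogram approximates the probability density of a score. The sketch below builds such a histogram in plain Python. The function name, the explicit score range, and the sample scores are hypothetical, and the real script may bin or smooth differently.

```python
def histogram_density(scores, n_bins, low, high):
    """Approximate a probability density with an equal-width histogram.

    Returns one density value per bin; the densities times the bin width
    sum to 1 when at least one score lies in [low, high].
    """
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for score in scores:
        if low <= score <= high:
            index = min(int((score - low) / width), n_bins - 1)  # high joins last bin
            counts[index] += 1
    total = sum(counts)
    return [count / (total * width) if total else 0.0 for count in counts]

# Made-up scores on [0, 1], split into two bins.
density = histogram_density([0.1, 0.2, 0.6, 0.9], n_bins=2, low=0.0, high=1.0)
```

Building one such histogram from neutFile scores and one from selFile scores gives the two distributions that can then be compared bin by bin.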

3.4. composite.py

This script contains command-line utilities for combining component statistics – i.e., the final step of the CMS 2.0 pipeline.

usage: composite.py [-h]
                    {poppair,outgroups,bayesian_gw,bayesian_region,ml_region}
                    ...
Sub-commands:
poppair

collate all component statistics for a given population pair (as a prerequisite to more sophisticated group comparisons)

usage: composite.py poppair [-h] [--xp_reverse_pops] [--deldaf_reverse_pops]
                            in_ihs_file in_delihh_file in_xp_file
                            in_fst_deldaf_file outfile
Positional arguments:
in_ihs_file file with normalized iHS values for putative selpop
in_delihh_file file with normalized delIhh values for putative selpop
in_xp_file file with normalized XP-EHH values
in_fst_deldaf_file
 file with Fst, delDaf values for poppair
outfile file to write with collated scores
Options:
--xp_reverse_pops=False
 include if the putative selpop for outcome is the altpop in XPEHH (and vice versa)
--deldaf_reverse_pops=False
 include if the putative selpop for outcome is the altpop in delDAF (and vice versa)
outgroups

combine scores from comparisons of a putative selected pop to 2+ outgroups.

usage: composite.py outgroups [-h] infiles likesfile outfile
Positional arguments:
infiles comma-delimited set of pop-pair comparisons
likesfile text file where probability distributions are specified for component scores
outfile file to write with finalized scores
bayesian_gw

default algorithm and weighting, genome-wide

usage: composite.py bayesian_gw [-h] inputparamfile
Positional arguments:
inputparamfile file with specifications for input
bayesian_region

default algorithm and weighting, within-region

usage: composite.py bayesian_region [-h]
                                    chrom startBp endBp selPop altPops
                                    demModel
Positional arguments:
chrom chromosome containing region
startBp start location of region in basepairs
endBp end location of region in basepairs
selPop Undocumented
altPops comma-delimited
demModel Undocumented
ml_region

machine learning algorithm (within-region)

usage: composite.py ml_region [-h] chrom startBp endBp selPop altPops demModel
Positional arguments:
chrom chromosome containing region
startBp start location of region in basepairs
endBp end location of region in basepairs
selPop Undocumented
altPops comma-delimited
demModel Undocumented