3. Command line tools¶

3.1. scans.py¶

This script contains command-line utilities for calculating EHH-based scans for positive selection in genomes, including EHH, iHS, and XP-EHH.

usage: scans.py subcommand

Sub-commands:

selscan_file_conversion

Process a bgzipped-VCF (such as those included in the Phase 3 1000 Genomes release) into a gzip-compressed tped file of the sort expected by selscan.

usage: scans.py selscan_file_conversion [-h] [--startBp STARTBP]
                                        [--endBp ENDBP] [--ploidy PLOIDY]
                                        [--considerMultiAllelic]
                                        [--rescaleGeneticDistance]
                                        [--includeLowQualAncestral]
                                        [--codingFunctionClassFile CODINGFUNCTIONCLASSFILE]
                                        [--sampleMembershipFile SAMPLEMEMBERSHIPFILE]
                                        [--filterPops FILTERPOPS [FILTERPOPS ...]]
                                        [--filterSuperPops FILTERSUPERPOPS [FILTERSUPERPOPS ...]]
                                        [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                        [--version] [--tmpDir TMPDIR]
                                        [--tmpDirKeep]
                                        inputVCF genMap outPrefix outLocation
                                        chromosomeNum

Positional arguments:

`inputVCF`	Input VCF file
`genMap`	Genetic recombination map tsv file with four columns: (Chromosome, Position(bp), Rate(cM/Mb), Map(cM))
`outPrefix`	Output file prefix
`outLocation`	Output location
`chromosomeNum`	Chromosome number.

Options:

`--startBp=0`	Coordinate in bp of start position. (default: %(default)s).
`--endBp`	Coordinate in bp of end position.
`--ploidy=2`	Number of chromosomes expected for each genotype. (default: %(default)s).
`--considerMultiAllelic=False`
	Include multi-allelic variants in the output as separate records
`--rescaleGeneticDistance=False`
	Genetic distance is rescaled to be out of 100.0 cM
`--includeLowQualAncestral=False`
	Include variants where the ancestral information is low-quality (as indicated by lower-case x for AA=x in the VCF info column) (default: %(default)s).
`--codingFunctionClassFile`
	A python class file containing a function used to code each genotype as ‘1’ and ‘0’. coding_function(current_value, reference_allele, alternate_allele, ancestral_allele)
`--sampleMembershipFile`
	The call sample file containing four columns: sample, pop, super_pop, gender
`--filterPops`	Populations to include in the calculation (ex. “FIN”)
`--filterSuperPops`
	Super populations to include in the calculation (ex. “EUR”)
`--loglevel=DEBUG`
	Verboseness of output. [default: %(default)s] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmpDir=/tmp`	Base directory for temp files. [default: %(default)s]
`--tmpDirKeep=False`
	Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

selscan_ehh

Perform selscan’s calculation of EHH.

usage: scans.py selscan_ehh [-h] [--gapScale GAPSCALE] [--maf MAF]
                            [--threads THREADS] [--window WINDOW]
                            [--cutoff CUTOFF] [--maxExtend MAXEXTEND]
                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                            [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                            inputTped outFile locusID

Positional arguments:

`inputTped`	Input tped file
`outFile`	Output filepath
`locusID`	The locus ID

Options:

`--gapScale=20000`
	Gap scale parameter in bp. If a gap is encountered between two snps > GAP_SCALE and < MAX_GAP, then the genetic distance is scaled by GAP_SCALE/GA (default: %(default)s).
`--maf=0.05`	Minor allele frequency. If a site has a MAF below this value, the program will not use it as a core snp. (default: %(default)s).
`--threads=1`	The number of threads to spawn during the calculation. Partitions loci across threads. (default: %(default)s).
`--window=100000`
	When calculating EHH, this is the length of the window in bp in each direction from the query locus (default: %(default)s).
`--cutoff=0.05`	The EHH decay cutoff (default: %(default)s).
`--maxExtend=1000000`
	The maximum distance an EHH decay curve is allowed to extend from the core. Set <= 0 for no restriction. (default: %(default)s).
`--loglevel=DEBUG`
	Verboseness of output. [default: %(default)s] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmpDir=/tmp`	Base directory for temp files. [default: %(default)s]
`--tmpDirKeep=False`
	Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

selscan_ihs

Perform selscan’s calculation of iHS.

usage: scans.py selscan_ihs [-h] [--gapScale GAPSCALE] [--maf MAF]
                            [--threads THREADS] [--skipLowFreq]
                            [--dontWriteLeftRightiHH] [--truncOk]
                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                            [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                            inputTped outFile

Positional arguments:

`inputTped`	Input tped file
`outFile`	Output filepath

Options:

`--gapScale=20000`
	Gap scale parameter in bp. If a gap is encountered between two snps > GAP_SCALE and < MAX_GAP, then the genetic distance is scaled by GAP_SCALE/GA (default: %(default)s).
`--maf=0.05`	Minor allele frequency. If a site has a MAF below this value, the program will not use it as a core snp. (default: %(default)s).
`--threads=1`	The number of threads to spawn during the calculation. Partitions loci across threads. (default: %(default)s).
`--skipLowFreq=False`
	Do not include low frequency variants in the construction of haplotypes (default: %(default)s).
`--dontWriteLeftRightiHH=False`
	When writing out iHS, do not write out the constituent left and right ancestral and derived iHH scores for each locus.(default: %(default)s).
`--truncOk=False`
	If an EHH decay reaches the end of a sequence before reaching the cutoff, integrate the curve anyway. Normal function is to disregard the score for that core. (default: %(default)s).
`--loglevel=DEBUG`
	Verboseness of output. [default: %(default)s] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmpDir=/tmp`	Base directory for temp files. [default: %(default)s]
`--tmpDirKeep=False`
	Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

selscan_nsl

Perform selscan’s calculation of nSL.

usage: scans.py selscan_nsl [-h] [--gapScale GAPSCALE] [--maf MAF]
                            [--threads THREADS] [--truncOk]
                            [--maxExtendNsl MAXEXTENDNSL]
                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                            [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                            inputTped outFile

Positional arguments:

`inputTped`	Input tped file
`outFile`	Output filepath

Options:

`--gapScale=20000`
	Gap scale parameter in bp. If a gap is encountered between two snps > GAP_SCALE and < MAX_GAP, then the genetic distance is scaled by GAP_SCALE/GA (default: %(default)s).
`--maf=0.05`	Minor allele frequency. If a site has a MAF below this value, the program will not use it as a core snp. (default: %(default)s).
`--threads=1`	The number of threads to spawn during the calculation. Partitions loci across threads. (default: %(default)s).
`--truncOk=False`
	If an EHH decay reaches the end of a sequence before reaching the cutoff, integrate the curve anyway. Normal function is to disregard the score for that core. (default: %(default)s).
`--maxExtendNsl=100`
	The maximum distance an nSL haplotype is allowed to extend from the core. Set <= 0 for no restriction. (default: %(default)s).
`--loglevel=DEBUG`
	Verboseness of output. [default: %(default)s] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmpDir=/tmp`	Base directory for temp files. [default: %(default)s]
`--tmpDirKeep=False`
	Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

selscan_xpehh

Perform selscan’s calculation of XPEHH.

usage: scans.py selscan_xpehh [-h] [--gapScale GAPSCALE] [--maf MAF]
                              [--threads THREADS] [--truncOk]
                              [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                              [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                              inputTped outFile inputRefTped

Positional arguments:

`inputTped`	Input tped file
`outFile`	Output filepath
`inputRefTped`	Input tped for the reference population to which the first is compared

Options:

`--gapScale=20000`
	Gap scale parameter in bp. If a gap is encountered between two snps > GAP_SCALE and < MAX_GAP, then the genetic distance is scaled by GAP_SCALE/GA (default: %(default)s).
`--maf=0.05`	Minor allele frequency. If a site has a MAF below this value, the program will not use it as a core snp. (default: %(default)s).
`--threads=1`	The number of threads to spawn during the calculation. Partitions loci across threads. (default: %(default)s).
`--truncOk=False`
	If an EHH decay reaches the end of a sequence before reaching the cutoff, integrate the curve anyway. Normal function is to disregard the score for that core. (default: %(default)s).
`--loglevel=DEBUG`
	Verboseness of output. [default: %(default)s] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmpDir=/tmp`	Base directory for temp files. [default: %(default)s]
`--tmpDirKeep=False`
	Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

selscan_norm_nsl

Undocumented

Normalize Selscan’s nSL output

usage: scans.py selscan_norm_nsl [-h] [--bins BINS]
                                 [--critPercent CRITPERCENT]
                                 [--critValue CRITVALUE] [--minSNPs MINSNPS]
                                 [--qbins QBINS] [--winSize WINSIZE] [--bpWin]
                                 [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                 [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                                 inputFiles [inputFiles ...]

Positional arguments:

inputFiles

A list of files delimited by whitespace for joint normalization. Expected format for iHS/nSL files (no header): <locus name> <physical pos> <freq> <ihh1/sL1> <ihh2/sL0> <ihs/nsl> Expected format for XP-EHH files (one line header): <locus name> <physical pos> <genetic pos> <freq1> <ihh1> <freq2> <ihh2> <xpehh>

Options:

`--bins=100`	The number of frequency bins in [0,1] for score normalization (default: %(default)s)
`--critPercent=-1.0`
	Set the critical value such that a SNP with iHS in the most extreme CRIT_PERCENT tails (two-tailed) is marked as an extreme SNP. Not used by default (default: %(default)s)
`--critValue=2.0`
	Set the critical value such that a SNP with \|iHS\| > CRIT_VAL is marked as an extreme SNP. Default as in Voight et al. (default: %(default)s)
`--minSNPs=10`	Only consider a bp window if it has at least this many SNPs (default: %(default)s)
`--qbins=20`	Outlying windows are binned by number of sites within each window. This is the number of quantile bins to use. (default: %(default)s)
`--winSize=100000`
	GThe non-overlapping window size for calculating the percentage of extreme SNPs (default: %(default)s)
`--bpWin=False`	If set, will use windows of a constant bp size with varying number of SNPs
`--loglevel=DEBUG`
	Verboseness of output. [default: %(default)s] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmpDir=/tmp`	Base directory for temp files. [default: %(default)s]
`--tmpDirKeep=False`
	Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

selscan_norm_ihs

Undocumented

Normalize Selscan’s iHS output

usage: scans.py selscan_norm_ihs [-h] [--bins BINS]
                                 [--critPercent CRITPERCENT]
                                 [--critValue CRITVALUE] [--minSNPs MINSNPS]
                                 [--qbins QBINS] [--winSize WINSIZE] [--bpWin]
                                 [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                 [--version] [--tmpDir TMPDIR] [--tmpDirKeep]
                                 inputFiles [inputFiles ...]

Positional arguments:

inputFiles

A list of files delimited by whitespace for joint normalization. Expected format for iHS/nSL files (no header): <locus name> <physical pos> <freq> <ihh1/sL1> <ihh2/sL0> <ihs/nsl> Expected format for XP-EHH files (one line header): <locus name> <physical pos> <genetic pos> <freq1> <ihh1> <freq2> <ihh2> <xpehh>

Options:

`--bins=100`	The number of frequency bins in [0,1] for score normalization (default: %(default)s)
`--critPercent=-1.0`
	Set the critical value such that a SNP with iHS in the most extreme CRIT_PERCENT tails (two-tailed) is marked as an extreme SNP. Not used by default (default: %(default)s)
`--critValue=2.0`
	Set the critical value such that a SNP with \|iHS\| > CRIT_VAL is marked as an extreme SNP. Default as in Voight et al. (default: %(default)s)
`--minSNPs=10`	Only consider a bp window if it has at least this many SNPs (default: %(default)s)
`--qbins=20`	Outlying windows are binned by number of sites within each window. This is the number of quantile bins to use. (default: %(default)s)
`--winSize=100000`
	GThe non-overlapping window size for calculating the percentage of extreme SNPs (default: %(default)s)
`--bpWin=False`	If set, will use windows of a constant bp size with varying number of SNPs
`--loglevel=DEBUG`
	Verboseness of output. [default: %(default)s] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmpDir=/tmp`	Base directory for temp files. [default: %(default)s]
`--tmpDirKeep=False`
	Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

selscan_norm_xpehh

Undocumented

Normalize Selscan’s XPEHH output

usage: scans.py selscan_norm_xpehh [-h] [--bins BINS]
                                   [--critPercent CRITPERCENT]
                                   [--critValue CRITVALUE] [--minSNPs MINSNPS]
                                   [--qbins QBINS] [--winSize WINSIZE]
                                   [--bpWin]
                                   [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                   [--version] [--tmpDir TMPDIR]
                                   [--tmpDirKeep]
                                   inputFiles [inputFiles ...]

Positional arguments:

inputFiles

A list of files delimited by whitespace for joint normalization. Expected format for iHS/nSL files (no header): <locus name> <physical pos> <freq> <ihh1/sL1> <ihh2/sL0> <ihs/nsl> Expected format for XP-EHH files (one line header): <locus name> <physical pos> <genetic pos> <freq1> <ihh1> <freq2> <ihh2> <xpehh>

Options:

`--bins=100`	The number of frequency bins in [0,1] for score normalization (default: %(default)s)
`--critPercent=-1.0`
	Set the critical value such that a SNP with iHS in the most extreme CRIT_PERCENT tails (two-tailed) is marked as an extreme SNP. Not used by default (default: %(default)s)
`--critValue=2.0`
	Set the critical value such that a SNP with \|iHS\| > CRIT_VAL is marked as an extreme SNP. Default as in Voight et al. (default: %(default)s)
`--minSNPs=10`	Only consider a bp window if it has at least this many SNPs (default: %(default)s)
`--qbins=20`	Outlying windows are binned by number of sites within each window. This is the number of quantile bins to use. (default: %(default)s)
`--winSize=100000`
	GThe non-overlapping window size for calculating the percentage of extreme SNPs (default: %(default)s)
`--bpWin=False`	If set, will use windows of a constant bp size with varying number of SNPs
`--loglevel=DEBUG`
	Verboseness of output. [default: %(default)s] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmpDir=/tmp`	Base directory for temp files. [default: %(default)s]
`--tmpDirKeep=False`
	Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

store_selscan_results_in_db

Aggregate results from selscan in to a SQLite database via helper JSON metadata file.

usage: scans.py store_selscan_results_in_db [-h]
                                            [--loglevel {DEBUG,INFO,WARNING,ERROR,CRITICAL,EXCEPTION}]
                                            [--version] [--tmpDir TMPDIR]
                                            [--tmpDirKeep]
                                            inputFile outFile

Positional arguments:

`inputFile`	Input *.metadata.json file
`outFile`	Output SQLite filepath

Options:

`--loglevel=INFO`
	Verboseness of output. [default: %(default)s] Possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL, EXCEPTION
`--version, -V`	show program’s version number and exit
`--tmpDir=/tmp`	Base directory for temp files. [default: %(default)s]
`--tmpDirKeep=False`
	Keep the tmpDir if an exception occurs while running. Default is to delete all temp files at the end, even if there’s a failure.

3.2. cms_modeller.py¶

This script contains command-line utilities for exploratory fitting of demographic models to population genetic data.

usage: cms_modeller.py [-h] {target_stats,bootstrap,point,grid,optimize} ...

Sub-commands:

target_stats

perform per-site(/per-site-pair) calculations of population summary statistics for model target values

usage: cms_modeller.py target_stats [-h] [--freqs] [--ld] [--fst]
                                    inputTpeds recomFile regions out

Positional arguments:

`inputTpeds`	comma-delimited list of input tped files (only one file per pop being modelled; must run chroms separately or concatenate)
`recomFile`	recombination map
`regions`	tab-separated file with putative neutral regions
`out`	outfile prefix

Options:

`--freqs=False`	calculate summary statistics from within-population allele frequencies
`--ld=False`	calculate summary statistics from within-population linkage disequilibrium
`--fst=False`	calculate summary statistics from population comparison using allele frequencies

bootstrap

perform bootstrap estimates of population summary statistics in order to finalize model target values

usage: cms_modeller.py bootstrap [-h] [--in_freqs IN_FREQS] [--in_ld IN_LD]
                                 [--in_fst IN_FST]
                                 nBootstrapReps out

Positional arguments:

`nBootstrapReps`	number of bootstraps to perform in order to estimate standard error of the dataset (should converge for reasonably small n)
`out`	outfile prefix

Options:

`--in_freqs`	comma-delimited list of infiles with per-site calculations for population. One file per population – for bootstrap estimates of genome-wide values, should first concatenate per-chrom files
`--in_ld`	comma-delimited list of infiles with per-site-pair calculations for population. One file per population – for bootstrap estimates of genome-wide values, should first concatenate per-chrom files
`--in_fst`	comma-delimited list of infiles with per-site calculations for population pair. One file per population-pair – for bootstrap estimates of genome-wide values, should first concatenate per-chrom files

point

run simulates of a point in parameter-space

usage: cms_modeller.py point [-h] [--cosiBuild COSIBUILD]
                             [--dropSings DROPSINGS] [--genmapRandomRegions]
                             [--stopAfterMinutes STOPAFTERMINUTES]
                             [--calcError CALCERROR]
                             [--targetvalsFile TARGETVALSFILE] [--plotStats]
                             inputParamFile nCoalescentReps outputDir

Positional arguments:

`inputParamFile`	file with model specifications for input
`nCoalescentReps`
	num reps
`outputDir`	location to write cosi output

Options:

`--cosiBuild=/Users/vitti/Desktop/COSI_DEBUG_TEST/cosi-2.0/coalescent`
	which version of cosi to run? (*automate installation)
`--dropSings`	randomly thin global singletons from output dataset (i.e., to model ascertainment bias)
`--genmapRandomRegions=False`
	cosi option to sub-sample genetic map randomly from input
`--stopAfterMinutes`
	cosi option to terminate simulations
`--calcError`	file specifying dimensions of error function to use. if unspecified, defaults to all. first line = stats, second line = pops
`--targetvalsFile`
	targetvalsfile for model
`--plotStats=False`
	visualize goodness-of-fit to model targets

grid

run grid search

usage: cms_modeller.py grid [-h] [--cosiBuild COSIBUILD]
                            [--dropSings DROPSINGS] [--genmapRandomRegions]
                            [--stopAfterMinutes STOPAFTERMINUTES]
                            [--calcError CALCERROR]
                            inputParamFile nCoalescentReps outputDir
                            grid_inputdimensionsfile

Positional arguments:

`inputParamFile`	file with model specifications for input
`nCoalescentReps`
	num reps
`outputDir`	location to write cosi output
`grid_inputdimensionsfile`
	file with specifications of grid search. each parameter to vary is indicated: KEY INDEX [VALUES]

Options:

`--cosiBuild=/Users/vitti/Desktop/COSI_DEBUG_TEST/cosi-2.0/coalescent`
	which version of cosi to run? (*automate installation)
`--dropSings`	randomly thin global singletons from output dataset (i.e., to model ascertainment bias)
`--genmapRandomRegions=False`
	cosi option to sub-sample genetic map randomly from input
`--stopAfterMinutes`
	cosi option to terminate simulations
`--calcError`	file specifying dimensions of error function to use. if unspecified, defaults to all. first line = stats, second line = pops

optimize

run optimization algorithm to fit model parameters

usage: cms_modeller.py optimize [-h] [--cosiBuild COSIBUILD]
                                [--dropSings DROPSINGS]
                                [--genmapRandomRegions]
                                [--stopAfterMinutes STOPAFTERMINUTES]
                                [--calcError CALCERROR] [--stepSize STEPSIZE]
                                [--method METHOD]
                                inputParamFile nCoalescentReps outputDir
                                optimize_inputdimensionsfile

Positional arguments:

`inputParamFile`	file with model specifications for input
`nCoalescentReps`
	num reps
`outputDir`	location to write cosi output
`optimize_inputdimensionsfile`
	file with specifications of optimization. each parameter to vary is indicated: KEY INDEX

Options:

`--cosiBuild=/Users/vitti/Desktop/COSI_DEBUG_TEST/cosi-2.0/coalescent`
	which version of cosi to run? (*automate installation)
`--dropSings`	randomly thin global singletons from output dataset (i.e., to model ascertainment bias)
`--genmapRandomRegions=False`
	cosi option to sub-sample genetic map randomly from input
`--stopAfterMinutes`
	cosi option to terminate simulations
`--calcError`	file specifying dimensions of error function to use. if unspecified, defaults to all. first line = stats, second line = pops
`--stepSize`	scaled step size (i.e. whole range = 1)
`--method=SLSQP`	algorithm to pass to scipy.optimize

3.3. likes_from_model.py¶

This script contains command-line utilities for generating probability distributions for component scores from pre-specified demographic model(s).

usage: likes_from_model.py [-h]
                           {run_neut_sims,get_sel_trajs,run_sel_sims,scores_from_sims,likes_from_scores}
                           ...

Sub-commands:

run_neut_sims

run neutral simulations

usage: likes_from_model.py run_neut_sims [-h] [--cosiBuild COSIBUILD]
                                         [--dropSings DROPSINGS]
                                         [--genmapRandomRegions]
                                         n inputParamFile outputDir

Positional arguments:

`n`	num replicates to run
`inputParamFile`	file with model specifications for input
`outputDir`	location to write cosi output

Options:

`--cosiBuild=/Users/vitti/Desktop/COSI_DEBUG_TEST/cosi-2.0/coalescent`
	which version of cosi to run? (*automate installation)
`--dropSings`	randomly thin global singletons from output dataset to model ascertainment bias
`--genmapRandomRegions=False`
	cosi option to sub-sample genetic map randomly from input

get_sel_trajs

run forward simulations of selection trajectories and perform rejection sampling to populate selscenarios by final allele frequency before running coalescent simulations for entire sample

usage: likes_from_model.py get_sel_trajs [-h] [--cosiBuild COSIBUILD]
                                         [--dropSings DROPSINGS]
                                         [--genmapRandomRegions]
                                         [--freqRange FREQRANGE]
                                         [--nBins NBINS]
                                         nSimsPerBin maxSteps inputParamFile
                                         outputDir

Positional arguments:

`nSimsPerBin`	number of selection trajectories to generate per allele frequency bin
`maxSteps`	number of attempts to generate a selection trajectory before re-sampling selection coefficient and start time.
`inputParamFile`	file with model specifications for input
`outputDir`	location to write cosi output

Options:

`--cosiBuild=/Users/vitti/Desktop/COSI_DEBUG_TEST/cosi-2.0/coalescent`
	which version of cosi to run? (*automate installation)
`--dropSings`	randomly thin global singletons from output dataset to model ascertainment bias
`--genmapRandomRegions=False`
	cosi option to sub-sample genetic map randomly from input
`--freqRange=.05-.95`
	range of final selected allele frequencies to simulate, e.g. .05-.95
`--nBins=9`	number of frequency bins

run_sel_sims

run sel. simulations

usage: likes_from_model.py run_sel_sims [-h] [--cosiBuild COSIBUILD]
                                        [--dropSings DROPSINGS]
                                        [--genmapRandomRegions]
                                        [--freqRange FREQRANGE]
                                        [--nBins NBINS]
                                        n trajDir inputParamFile outputDir

Positional arguments:

`n`	num replicates to run per sel scenario
`trajDir`	location of simulated trajectories (i.e. outputDir from get_sel_trajs)
`inputParamFile`	file with model specifications for input
`outputDir`	location to write cosi output

Options:

`--cosiBuild=/Users/vitti/Desktop/COSI_DEBUG_TEST/cosi-2.0/coalescent`
	which version of cosi to run? (*automate installation)
`--dropSings`	randomly thin global singletons from output dataset to model ascertainment bias
`--genmapRandomRegions=False`
	cosi option to sub-sample genetic map randomly from input
`--freqRange=.05-.95`
	range of final selected allele frequencies to simulate, e.g. .05-.95
`--nBins=9`	number of frequency bins

scores_from_sims

get scores from simulations

usage: likes_from_model.py scores_from_sims [-h] [--inputTped INPUTTPED]
                                            [--inputIhs INPUTIHS]
                                            [--inputdelIhh INPUTDELIHH]
                                            [--inputXpehh INPUTXPEHH] [--ihs]
                                            [--delIhh] [--xpehh XPEHH]
                                            [--fst_deldaf FST_DELDAF]
                                            [--normalizeIhs NORMALIZEIHS]
                                            [--normalizeDelIhh NORMALIZEDELIHH]
                                            [--normalizeXpehh NORMALIZEXPEHH]
                                            outputFilename

Positional arguments:

outputFilename

where to write scorefile

Options:

`--inputTped`	tped from which to calculate score
`--inputIhs`	iHS from which to calculate delihh
`--inputdelIhh`	delIhh from which to calculate norm
`--inputXpehh`	Xp-ehh from which to calculate norm
`--ihs=False`	Undocumented
`--delIhh=False`	Undocumented
`--xpehh`	inputTped for altpop
`--fst_deldaf`	inputTped for altpop
`--normalizeIhs`	filename for parameters to normalize to; if not given then will by default normalize file to its own global dist
`--normalizeDelIhh`
	filename for parameters to normalize to; if not given then will by default normalize file to its own global dist
`--normalizeXpehh`
	filename for parameters to normalize to; if not given then will by default normalize file to its own global dist

likes_from_scores

get component score probability distributions from scores

usage: likes_from_model.py likes_from_scores [-h] [--thinToSize] [--ihs]
                                             [--delihh] [--xp] [--deldaf]
                                             [--fst] [--freqRange FREQRANGE]
                                             [--nBins NBINS]
                                             neutFile selFile selPos
                                             nLikesBins outPrefix

Positional arguments:

`neutFile`	file with scores for neutral scenarios (normalized if necessary)
`selFile`	file with scores for selected scenarios (normalized if necessary)
`selPos`	position of causal variant
`nLikesBins`	number of bins to use for histogram to approximate probability density function
`outPrefix`	save file as

Options:

`--thinToSize=False`
	subsample from simulated SNPs (since nSel << nLinked < nNeut)
`--ihs=False`	Undocumented
`--delihh=False`	Undocumented
`--xp=False`	Undocumented
`--deldaf=False`	Undocumented
`--fst=False`	Undocumented
`--freqRange=.05-.95`
	range of final selected allele frequencies to simulate, e.g. .05-.95
`--nBins=9`	number of frequency bins

3.4. composite.py¶

This script contains command-line utilities for combining component statistics – i.e., the final step of the CMS 2.0 pipeline.

usage: composite.py [-h]
                    {poppair,outgroups,bayesian_gw,bayesian_region,ml_region}
                    ...

Sub-commands:

poppair

collate all component statistics for a given population pair (as a prerequisite to more sophisticated group comparisons

usage: composite.py poppair [-h] [--xp_reverse_pops] [--deldaf_reverse_pops]
                            in_ihs_file in_delihh_file in_xp_file
                            in_fst_deldaf_file outfile

Positional arguments:

`in_ihs_file`	file with normalized iHS values for putative selpop
`in_delihh_file`	file with normalized delIhh values for putative selpop
`in_xp_file`	file with normalized XP-EHH values
`in_fst_deldaf_file`
	file with Fst, delDaf values for poppair
`outfile`	file to write with collated scores

Options:

`--xp_reverse_pops=False`
	include if the putative selpop for outcome is the altpop in XPEHH (and vice versa)
`--deldaf_reverse_pops=False`
	finclude if the putative selpop for outcome is the altpop in delDAF (and vice versa)

outgroups

combine scores from comparisons of a putative selected pop to 2+ outgroups.

usage: composite.py outgroups [-h] infiles likesfile outfile

Positional arguments:

`infiles`	comma-delimited set of pop-pair comparisons
`likesfile`	text file where probability distributions are specified for component scores
`outfile`	file to write with finalized scores

bayesian_gw

default algorithm and weighting, genome-wide

usage: composite.py bayesian_gw [-h] inputparamfile

Positional arguments:

inputparamfile

file with specifications for input

bayesian_region

default algorithm and weighting, within-region

usage: composite.py bayesian_region [-h]
                                    chrom startBp endBp selPop altPops
                                    demModel

Positional arguments:

`chrom`	chromosome containing region
`startBp`	start location of region in basepairs
`endBp`	end location of region in basepairs
`selPop`	Undocumented
`altPops`	comma-delimited
`demModel`	Undocumented

ml_region

machine learning algorithm (within-region)

usage: composite.py ml_region [-h] chrom startBp endBp selPop altPops demModel

Positional arguments:

`chrom`	chromosome containing region
`startBp`	start location of region in basepairs
`endBp`	end location of region in basepairs
`selPop`	Undocumented
`altPops`	comma-delimited
`demModel`	Undocumented