Generating caches for DFE inference

Input

After inferring a best-fit demographic model, users can also infer the distribution of fitness effects (DFE) from the data.

To perform DFE inference, users first need to generate a cache of frequency spectra computed under different selection coefficients. Because the split_mig model was used in the demographic inference, the cache must be built with the same demographic model plus selection, which is either the split_mig_sel model or the split_mig_sel_single_gamma model:

The split_mig_sel model is used for inferring the DFE from two populations by assuming the population-scaled selection coefficients (usually denoted as gamma in population genetics) are independent in the two populations.
The split_mig_sel_single_gamma model assumes the population-scaled selection coefficients are the same in the two populations.

Here is an example command to generate a cache with shared selection coefficients:

dadi-cli GenerateCache --cache-type cache1d --model split_mig_sel_single_gamma --demo-popt examples/results/demog/1KG.YRI.CEU.20.split_mig.demog.params.InferDM.bestfits --sample-size 20 20 --grids 60 80 100 --gamma-pts 10 --gamma-bounds 1e-4 200 --output examples/results/caches/1KG.YRI.CEU.20.split_mig_sel_single_gamma.spectra.bpkl --cpus 2

--demo-popt specifies the demographic parameters, which are stored in 1KG.YRI.CEU.20.split_mig.demog.params.InferDM.bestfits.

--sample-size defines the sample size of each population (here, 20 for each of the two populations). Only allele frequency spectra from one or two populations are supported.

By default, GenerateCache builds the cache for the case where the selection coefficients are the same in the two populations (i.e. --cache-type cache1d). To build the cache for the case where the selection coefficients are independent of one another, use the --cache-type cache2d option. For example,

dadi-cli GenerateCache --cache-type cache2d --model split_mig_sel --demo-popt examples/results/demog/1KG.YRI.CEU.20.split_mig.demog.params.InferDM.bestfits --sample-size 20 20 --grids 60 80 100 --gamma-pts 10 --gamma-bounds 1e-4 200 --output examples/results/caches/1KG.YRI.CEU.20.split_mig_sel.spectra.bpkl --cpus 2
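The practical difference between the two cache types can be sketched as follows. This is only an illustration, not dadi-cli's code: it assumes that a cache1d stores one spectrum per selection coefficient shared by both populations, while a cache2d stores one spectrum per pair of coefficients. The helper `gamma_combinations` and the three gamma values are hypothetical.

```python
import itertools


def gamma_combinations(gammas, cache_type):
    """List the (gamma_pop1, gamma_pop2) pairs a cache would need to cover.

    Assumption for illustration: cache1d uses the same gamma in both
    populations, cache2d uses every ordered pair of gammas.
    """
    if cache_type == "cache1d":
        return [(g, g) for g in gammas]
    if cache_type == "cache2d":
        return list(itertools.product(gammas, repeat=2))
    raise ValueError(f"unknown cache type: {cache_type}")


# Hypothetical, deliberately small set of deleterious gammas.
gammas = [-1.0, -10.0, -100.0]
shared = gamma_combinations(gammas, "cache1d")
independent = gamma_combinations(gammas, "cache2d")
print(len(shared), len(independent))  # 3 9
```

Under this assumption the number of spectra grows with the square of --gamma-pts for cache2d, so the second command above would be expected to take noticeably longer than the first.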

The --gamma-bounds option sets the range of selection coefficients covered by the cache, and the --gamma-pts option sets how many selection coefficients are sampled within that range. Note that the larger the upper bound given to --gamma-bounds (that is, the more negative the selection coefficients, because dadi assumes deleterious DFEs by default), the larger the grid sizes need to be. Users can adjust the grid sizes with the --grids option. If n is the maximum of the sample sizes, then the default grid sizes are (int(n*2.2)+2, int(n*2.4)+4, int(n*2.6)+6).
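As a quick check of these options, the sketch below evaluates the default grid formula stated above and lists a candidate set of selection coefficients. The helper names are hypothetical, and the logarithmic spacing of the gamma values is an assumption made for illustration rather than something this page specifies.

```python
import math


def default_grids(sample_sizes):
    """Default grid sizes from the formula above, with n = max sample size."""
    n = max(sample_sizes)
    return (int(n * 2.2) + 2, int(n * 2.4) + 4, int(n * 2.6) + 6)


def log_spaced_gammas(lower, upper, pts):
    """Return `pts` negative gammas with magnitudes spanning [lower, upper].

    Assumption: magnitudes are evenly spaced on a log10 scale.
    """
    if pts < 1:
        raise ValueError("pts must be at least 1")
    if pts == 1:
        return [-lower]
    lo, hi = math.log10(lower), math.log10(upper)
    step = (hi - lo) / (pts - 1)
    return [-(10 ** (lo + i * step)) for i in range(pts)]


print(default_grids([20, 20]))  # (46, 52, 58)

# Same bounds and point count as the example commands above.
for gamma in log_spaced_gammas(1e-4, 200, 10):
    print(f"{gamma:.4g}")
```

For sample sizes of 20 the defaults would be (46, 52, 58), so the example commands, which pass --grids 60 80 100, use finer grids than the default.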

Output

The output files 1KG.YRI.CEU.20.split_mig_sel_single_gamma.spectra.bpkl and 1KG.YRI.CEU.20.split_mig_sel.spectra.bpkl are binary files generated by the Python pickle module. Note: Because unpickling a file can execute arbitrary code, only use cache files from trusted sources.
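Because the caches are ordinary pickle files, they can be opened from Python for inspection. The following is a minimal sketch: `load_cache` is a hypothetical helper, and the demonstration writes a small stand-in dictionary to a temporary file instead of a real cache so that it runs without dadi installed. Loading a real cache file would most likely require dadi to be importable, since pickle has to import the class of the stored object.

```python
import pickle
import tempfile
from pathlib import Path


def load_cache(path):
    """Unpickle a cache file. Only call this on files you generated or trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for illustration only: a plain dict, not a real dadi cache object.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.spectra.bpkl"
    with open(demo_path, "wb") as handle:
        pickle.dump({"gammas": [-1.0, -10.0]}, handle)
    demo_cache = load_cache(demo_path)

print(type(demo_cache).__name__)  # dict
```

With a real cache, the same call would take the path given to --output, for example examples/results/caches/1KG.YRI.CEU.20.split_mig_sel.spectra.bpkl.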

Arguments

Argument	Description
`--additional-gammas`	Additional positive population-scaled selection coefficients to cache for. Default: [].
`--gamma-bounds`	Range of population-scaled selection coefficients to cache. Default: [1e-4, 2000].
`--gamma-pts`	Number of gamma grid points over which to integrate. Default: 50.
`--cpus`	Number of CPUs to use in multiprocessing. Default: All available CPUs.
`--gpus`	Number of GPUs to use in multiprocessing. Default: 0.
`--cache-type`	Type of the generated cache: `cache1d` for SFS from one population or JSFS from two populations but assuming the population-scaled selection coefficients are the same in the two populations; `cache2d` for JSFS from two populations and assuming the population-scaled selection coefficients are independent in the two populations. Default: `cache1d`.
`--sample-sizes`	Sample sizes of populations.
`--output`	Name of the output file.
`--demo-popt`	File containing the bestfit parameters for the demographic model.
`--grids`	Sizes of grids. Default: Based on sample size.
`--model`	Name of the demographic model. To check available demographic models, please use `dadi-cli Model`.
`--model-file`	Name of the Python module file (not including .py) that contains custom models to use. Can be a URL. Default: None.