**magenta**
is an individual-based simulation model of
malaria epidemiology and parasite genetics.
magenta
extends the Imperial malaria model by tracking
the infection history of individuals. With this additional genetic
characteristics of the parasite can be assessed for looking at both
neutral genetic variation as well as loci under selection.
The model is written in C++/Rcpp and interfaced with R.
You can also install rdhs
from github with:
#install.packages("devtools")
devtools::install_github("OJwatson/rdhs")
The magenta
simulation model is run using the
pipeline
function, which handles creating needed parameter
objects, starting the simulation, summarising the simulation model state
at desired intervals and handling how to save the final model output and
state.
For example to run the model for 10 years at an EIR of 1 and no seasonality for a population of 10,000.
out1 <- pipeline(EIR = 1, years = 10, N = 10000)
## Seed set to 34491875
## magenta v1.2.0
By default, magenta will run the simulation for the entire duration
and return a few summaries of the human population at the end of the 10
years, which are stored within the Loggers
list in
out
. For example, the human proportions in each infection
state are the last 6 elements in Loggers
.
as.data.frame(tail(out1$Loggers, 6))
## S D A U T P
## 1 0.8608592 0.00156126 0.09441759 0.03743192 0.001026548 0.004703534
(See Loggers vignette for more info about what is returned in Loggers).
If we want to save information about the model state over time we set
update_save
to TRUE
and then specify at what
intervals we want to stop the simulation using
update_length
. For example, to fetch information about the
simualtion at monthly intervals:
out2 <- pipeline(EIR = 1, years = 10, N = 10000,
update_save = TRUE, update_length = 30,
seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s
When doing update_save
we now get a progress bar for the
simulation and the returned object out
is now a list where
each the first n-1 elements are unnamed lists and are the saved
information at each month. (The last element contains the pointer to the
simulation, which allows us to carry on the simulation if needed.) For
example to plot the size of the susceptible proportion over time:
S <- unlist(lapply(out2[1:(length(out2) - 1)], "[[", "S"))
time <- unlist(lapply(out2[1:(length(out2) - 1)], "[[", "Time"))
plot(time, S)
abline(v = 1000, col = "red")
magenta
starts simulations close to the equilibrium
solution, but it is not quite perfect. For example, in the above plot
there is a burn in period before the red vertical line. The required
burn in time varies depending on the simulation parameters and so it is
best to run the model a few times first to work out a suitable burn in
time. For most simulations though a 40 year burn in period is more than
sufficient.
In the last simulation run, the random seed was provided, which is
used to allow simulations to be able to be run reproducibly. To assist
this, the function call and package version are stored with each
simulation run as an attribute meta
.
meta <- attr(out2, "meta")
print(meta$call)
## pipeline(EIR = 1, N = 10000, years = 10, update_length = 30,
## update_save = TRUE, seed = 123456789L)
print(meta$version)
## [1] '1.2.0'
magenta
primarily characterises parasite populations in
terms of a genetic barcode, in which the barcode is a vector of bits
representing whether the parasite possesses the major or minor allele at
the loci of interest. By default the barcode used is 24 bases long. To
see what this looks like, we can request for magenta to return the human
population rather than the Loggers
as the last list
element.
out3 <- pipeline(EIR = 1, years = 10, N = 10000,
update_save = TRUE, update_length = 30,
human_only_full_save = TRUE,
seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s
We can now look at what is returned in the last list element.
names(out3[[length(out3)]])
## [1] "Infection_States" "Zetas"
## [3] "Ages" "ID"
## [5] "Strain_infection_state_vectors" "Strain_day_of_infection_state_change_vectors"
## [7] "Strain_barcode_vectors"
Again we return the infection states of the humans, but also their
individual biting rates, Zetas
, detection immunity and
their ages, as well as the parasite strain information:
# last element
n <- length(out3)
# first individual who is infected, i.e. not in states S or P (order goes S, D, A, U, T, P)
i <- which.min(out3[[n]]$Infection_States %in% c(0, 5))
out3[[n]]$Strain_barcode_vectors[i]
## [[1]]
## [[1]][[1]]
## [1] FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE FALSE FALSE
## [19] FALSE TRUE FALSE TRUE FALSE FALSE
Above we have found the first individual who is infected and returned the barcodes for each parasite strain the individual has. For example, this individual had one parasite strain, as only one parasite barcode is returned. We can then look at the distribution of parasite strains in the population, for example, against their individual biting rate.
zetas <- out3[[n]]$Zetas
n_strains <- unlist(lapply(out3[[n]]$Strain_infection_state_vectors,length))
plot(zetas, n_strains, log = "x", ylab = "Number of Strains")
With the parasite barcode information we can work out what an individual’s complexity of infection (COI) is, by looking at the number of unique parasite barcodes within an individual. For example, for the individual with the most amoutn of strains:
m <- which.max(n_strains)
message("Individual ", m, " has ", n_strains[m], " strains")
## Individual 782 has 19 strains
un <- length(unique(out3[[n]]$Strain_barcode_vectors[[m]]))
message("Individual ", m, " has ", un, " unique strains")
## Individual 782 has 17 unique strains
The reason we also return the age and the detection immunity,
ID
, is that these are useful for when we want to estimate
the COI in terms of what strains are likely to be observed based on
their parasitaemia, which is inferred by their detection immunity,
ID
. These can be estimated using
convert_barcode_vectors
as follows, which below is used to
show the COI of parsaites that have sufficiently high parasite density
to be detected by microscopy if that was the only parasite strain in the
infection.
cois <- convert_barcode_vectors(out3[[n]],
out3[[n]]$ID,
sub_patents_included = FALSE,
COI_type = "patent")
plot(n_strains, cois$COI, ylab = "Microscopy detectable COI",
xlab = "Number of Strains")
abline(0, 1)
We can also request magenta to return all the information about the human and mosquito population:
out4 <- pipeline(EIR = 1, years = 10, N = 10000,
update_save = TRUE, update_length = 30,
full_save = TRUE,
seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s
names(out4[[n]])
## [1] "population_List" "populations_event_and_strains_List"
## [3] "scourge_List" "parameters_List"
The last list element now contains 4 lists:
population_List
- Human population demographic
informationpopulations_event_and_strains_List
- Human population’s
event triggers (e.g. time of next infection event, time of moving state
etc) and parasite strainsscourge_List
- The mosquito population and their
parasite strains and event triggers. (a scourge is the collective noun
for mosquitoes)parameters_List
- List of simulation parametersThe above 4 lists are sufficient for restarting a simulation and can thus be used to save the model state to file and then be reloaded. Again see the (See Loggers vignette for more info about what information these lists actually contain, and also see the (See Saving & Restarting vignette for more info about how to save these lists and reload them to continue a model run.
magenta
was initially designed for simulating neutral
variation and looking at patterns in COI and parasite diversity. As a
result, there are a number of options for saving summaries of the
within-host parasite genetic diversity at each update length.
Previously, we were only saving the human population’s parasites in
the n-1 element. We can instead save the human population’s parasites at
each update with human_update_save
:
out5 <- pipeline(EIR = 1, years = 10, N = 10000,
update_save = TRUE, update_length = 30,
human_update_save = TRUE,
summary_saves_only = FALSE,
full_save = TRUE,
seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s
From this we could then calculate the population COI at each monthly
interval amongst other genetic traits. However, saving all this
information to file quickly takes up a lot of memory. Instead, we can
request for the information to be summarised using
summary_saves_only
:
out6 <- pipeline(EIR = 1, years = 10, N = 10000,
update_save = TRUE, update_length = 30,
human_update_save = TRUE,
summary_saves_only = TRUE,
full_save = TRUE,
seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s
When saving summaries of the genetic information, the mean COI, % polygenomic infections, % unique barcodes and coefficient of uniqueness (see manuscript for definition) are recorded for seperate age and infection status groups. In addition, the overall clonality is recorded (how many of each barcode were there) and the population level allele frequency are saved. Lastly, some extra summaries of the population are saved, which include treatments given in the preceeding month, EIR and prevalence. For full explanation see Loggers vignette. For example, for COI the following data is stored:
out6[[1]]$summary_coi
## age_bin clinical N mean
## 1 (-0.001,5] Asymptomatic 139 3.820144
## 2 (-0.001,5] Clinical 2 1.000000
## 3 (-0.001,5] P 5 0.000000
## 4 (-0.001,5] S 2084 0.000000
## 5 (5,15] Asymptomatic 413 2.987893
## 6 (5,15] Clinical 11 1.909091
## 7 (5,15] P 12 0.000000
## 8 (5,15] S 2472 0.000000
## 9 (15,100] Asymptomatic 898 2.876392
## 10 (15,100] Clinical 14 1.285714
## 11 (15,100] P 23 0.000000
## 12 (15,100] S 3927 0.000000
We can then plot the population mean COI over time as follows, using the last 100 values after an approximate burnin period:
# We don't want the last element as the last save
pos <- 1:(length(out6) - 1)
COI <- unlist(lapply(out6[pos], function(x) {
inf <- x$summary_coi$clinical %in% c("Asymptomatic","Clinical")
weighted.mean(x$summary_coi$mean[inf], x$summary_coi$N[inf])
}))
prev <- unlist(lapply(out6[pos], function(x) {x$pcr_prev}))
par(mfrow=c(2,1), mar=c(2,4,1,2))
plot(tail(COI,100), xlab = "", ylab = "COI")
plot(tail(prev,100), xlab = "", ylab = "PCR PfPR")
We can increase the length of the barcode or change the assumed population level allele frequencey as well:
out7 <- pipeline(EIR = 1, years = 10, N = 10000,
num_loci = 64,
plaf = c(rep(0.1, 32), rep(0.9, 32)),
update_save = TRUE,
update_length = 30,
human_update_save = TRUE,
summary_saves_only = TRUE,
full_save = TRUE,
seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s
And we can look at these over time:
plafs <- do.call(rbind,lapply(out7[pos], function(x) {x$barcode_freq}))
lattice::levelplot(plafs, xlab = "Time", ylab = "loci")
We can also change how we summarise the simulation at each interval. This can be done in a couple of ways. The first is by altering the age and states:
out8 <- pipeline(EIR = 1, years = 10, N = 10000,
age_breaks = c(-0.1,5,10,15,20,100),
sample_states = c(1:4),
update_save = TRUE,
update_length = 30,
human_update_save = TRUE,
summary_saves_only = TRUE,
full_save = TRUE,
seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s
out8[[1]]$summary_coi
## age_bin clinical N mean
## 1 (-0.001,5] Asymptomatic 139 3.820144
## 2 (-0.001,5] Clinical 2 1.000000
## 3 (5,15] Asymptomatic 413 2.987893
## 4 (5,15] Clinical 11 1.909091
## 5 (15,100] Asymptomatic 898 2.876392
## 6 (15,100] Clinical 14 1.285714
The next is by providing your own update function as
update_save_func
. Have a look at the code for
update_saves
to see the style required.
magenta
also allows you to specify interventions in the
form of the altering the use of insecticide treated nets
(itn_cov
), indoor residual spraying (irs_cov
)
and changing the frequency of treatment (ft
). These should
be provided as vector that dictate the annual coverage.
out9 <- pipeline(EIR = 4, years = 20, N = 10000,
ft = c(rep(0,10), rep(0.2,10)),
irs_cov = c(rep(0,10), rep(0.2,10)),
itn_cov = c(rep(0,10), rep(0.2,10)),
update_save = TRUE,
update_length = 30,
seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
##
## Running Deterministic Model for 20 years
## Starting Stochastic Simulation for 20 years
## Running: [=====================================================================================] 100% eta: 0s
infected_mosquitoes <- unlist(lapply(out9, function(x) {x$Mosquitoes$Mos_I}))
time <- unlist(lapply(out9, function(x) {x$Time}))/365
plot(time, infected_mosquitoes, xlab = "Time (Years)", ylab = "Infected Mosquitoes")
abline(v = 10)
magenta
incorporates vector based interventions by first
running an equivalent deterministic version of the malaria transmission
model with vector based intervetnions in. The output of the
determininistic model is used to estimate the changing mosquito
mortality rate and anthrophagy at each day as a result of the
intervention changes. These are then fed through to the indivdual-based
simulation to adjust the behaviour of the individual mosquitoes.
magenta
can currently be run under 2 different spatial
models. The first is a closed population, with no importation. In this
scenario, there will eventually be one parsaite genetic barcode that
becomes fixed. To demonstrate this, we can simulate a smaller population
of 1000 indiivduals, with only 4 loci in the barcode, for 25 years. The
spatial_type
is set to NULL
here, which
assumes a closed population and is also the default space option.
out10 <- pipeline(EIR = 1,
years = 25, N = 1000,
spatial_type = NULL,
num_loci = 4,
update_save = TRUE,
update_length = 30,
human_update_save = TRUE,
summary_saves_only = TRUE,
full_save = TRUE,
seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 25 years
## Running: [=====================================================================================] 100% eta: 0s
We can see this process again by plotting the barcode frequency over time for the fist n-1 elements (as the last list element will not contain the barcode_frequency summary)
pos <- 1:(length(out10) - 1)
plafs <- do.call(rbind,lapply(out10[pos], function(x) {x$barcode_freq}))
matplot(plafs)
We can change this to have importation of barcodes over time. This
occurs with a genetic island model where the population we are
simulating is assumed to receive imported parasites from a genetically
stable population. There are two forms of importation, the
spatial_incidence_matrix
and
spatial_mosquitoFOI_matrix
, which determine the rate at
which new parasites are imported into the human and mosquito population
respectively. In the following we assume that 20% of new infections in
both the humans and mosquito populations are actually imported, with the
barcodes drawn according to the population level allele frequency (PLAF)
(by default the PLAF is 0.5 for each loci). This ensures that a genetic
equilibrium is reached and barcode fixation does not occur.
out11 <- pipeline(EIR = 1,
years = 25,
N = 1000,
spatial_type = "island",
spatial_incidence_matrix = 0.2,
spatial_mosquitoFOI_matrix = 0.2,
num_loci = 4,
update_save = TRUE,
update_length = 30,
human_update_save = TRUE,
summary_saves_only = TRUE,
full_save = TRUE,
seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 25 years
## Running: [=====================================================================================] 100% eta: 0s
pos <- 1:(length(out11) - 1)
plafs <- do.call(rbind,lapply(out11[pos], function(x) {x$barcode_freq}))
matplot(plafs)
Included in magenta
are the historic interventions and
seasonality profiles for admin level one regions in sub-Saharan African
countries. These can then be used to draw up the required parameter sets
for a given region. E.g. to simulate for Kinsasha from 1980:2015
# set up the years
year_range <- 1980:2015
# grab the interventions for region of interest
country <- "Democratic Republic of the Congo"
admin <- "Kinshasa"
ints <- intervention_grab(country = country,
admin = admin,
year_range = year_range)
## Requested: Kinshasa, Democratic Republic of the Congo
## Returned: Kinshasa, Democratic Republic of the Congo
# grab the importation from the migration model
spl <- spl_grab(country, admin, year_range)
## Requested: Kinshasa, Democratic Republic of the Congo
## Returned: Kinshasa_City, Democratic Republic of the Congo
# get the intervention parmeters
ft <- ints$ft
itn_cov <- ints$itn_cov
irs_cov <- ints$irs_cov
# load the Malaria Atlas Project Prevalence data
MAP_file <- system.file("extdata/MAP_PfPR_population_weighted.csv",
package = "magenta")
MAP <- read.csv(MAP_file)
# work out the EIR for the prevalence in year 2000 (i.e. pre interventions)
pfpr_micro <- MAP$X2000[which(MAP$Name==admin)]
EIR <- magenta:::pfpr_to_eir_heuristic(ft = 0, PfPR_micro = pfpr_micro)
## 0.00152858753907675
# run the simulation
out12 <- pipeline(
EIR = EIR,
N = 10000,
years = length(itn_cov),
country = country,
admin = admin,
itn_cov = itn_cov,
irs_cov = irs_cov,
ft = ft,
spatial_incidence_matrix = spl$incidence,
spatial_mosquitoFOI_matrix = spl$mosquitoFOI,
spatial_type = "island",
human_only_full_save = TRUE,
update_length = 30,
update_save = TRUE,
human_update_save = FALSE,
summary_saves_only = FALSE,
)
## Seed set to 693175740
## magenta v1.2.0
## Requested: Kinshasa, Democratic Republic of the Congo
## Returned: Kinshasa, Democratic Republic of the Congo
##
## Running Deterministic Model for 36 years
## Starting Stochastic Simulation for 36 years
## Running: [=====================================================================================] 100% eta: 0s
# Grab the microscopy prevalence over time in 2-10s
pos <- 1:(length(out12)-1)
time <- 1980 + unlist(lapply(out12[pos], "[[", "Time"))/365
prev_2_10_micro <- unlist(lapply(out12[pos], function(x) {
magenta:::microscopy_detected(ages = x$Ages,
IDs = x$ID,
states = x$InfectionStates)
}))
# get the MAP estimates
map_prev <- MAP[which(MAP$Name==admin),5:20]
# plot the relationship
plot(time, prev_2_10_micro, xlim = c(2000,2015),
ylab = "Micro 2-10", type = "l")
points(2000:2015, map_prev)
The above is similar to how the model fitting simulations were laid
out and it gives the user a chance to check that the parameter sets
being used are the correct parameters. However, you can use
use_historic_interventions
as TRUE
and provide
the year range and pipeline
will conduct the above
parameter searching for you.
As an aside, magenta
has a multitude of little helped
functions like microscopy_detected
that are not exported
currently, due to their use in mainly assisting downstream analysis
rather than being for simulation directly.
magenta
was also designed to simulate patterns in
identity-by-descent (IBD). In these simulations, the barcode is adapted
such that each loci is represented by multiple bits that represent a
unique integer rather than a single bit. The integers are assigned at
the beginning of the simulation, with consecutive integers assigned to
each parasite, which allows the pedigree of parasites to be tracked.
These simulations are slower as the bitset container used to represnt is
substantially larger and simulating recombination and new variants is
slightly more involved. In addition, summarising the parasite population
at monthly intervals takes more time and more memory is required to
simulate patterns of IBD.
To simulate IBD, we allocate the ibd_length
, which is
the bumber of bits used to represent one locus. The number of bits used
gives the maximum number of parasite pedigrees that can be tracked,
which is equal to 2^ibd_length
. As a result you need to
double check that you will not exceed this number during a simulation.
An easy way to check is to use the argument set_up_only
,
which will return the simulation model state after it has been
initialised:
out13 <- pipeline(EIR = 1,
years = 10,
N = 1000,
ibd_length = 20,
num_loci = 24,
set_up_only = TRUE,
update_save = TRUE,
update_length = 30,
human_update_save = TRUE,
summary_saves_only = TRUE,
full_save = TRUE,
seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Set Up Grab
names(out13)
## [1] "population_List" "populations_event_and_strains_List"
## [3] "scourge_List" "parameters_List"
From this we can have a look at the strains to see the IBD representation used:
infected <- which(out13$population_List$Infection_States %in% c(1:4))
strains <- out13$populations_event_and_strains_List$Strain_barcode_vectors[infected]
strains[1]
## [[1]]
## [[1]][[1]]
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [19] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [55] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [91] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [127] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [163] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [199] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [235] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [271] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [307] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [325] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [343] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [361] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [379] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [397] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [415] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [451] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [469] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##
## [[1]][[2]]
## [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [19] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [55] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [91] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [127] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [163] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [199] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [235] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [271] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [307] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## [325] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [343] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [361] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [379] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [397] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [415] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [451] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [469] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
The first infected individual is infected with two parasites as
indicated with the two strain barcode vectors. The first is the binary
representation of rep(0, 24)
and the next is
rep(1,24)
. To convert these to integer vectors we can use
the unexported convert_ibd_barcode
passing in the barcode
and the number of loci (nl
) being simulated. So the first
parasite in the second infected individual should be represented by a
barcode of 2s give that the first individual had two strains (i.e. the
0s and the 1s).
magenta:::convert_ibd_barcode(strains[[2]][[1]], nl = 24)
## [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
We can then work out how many parasites have been allocated in the simulation to begin with:
last_infected <- tail(strains,1)[[1]]
magenta:::convert_ibd_barcode(last_infected[[length(last_infected)]], nl = 24)
## [1] 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607
Clearly 2^20
will be plenty, even if we add in
substantial importation, with importation in the IBD model importing the
next integer barcode, i.e. rep(608, 24)
. Just remember you
need to factor in both the rate of importation, the length of the
simulation and the size of the population. We can do the same check at
the end of the simulation to make sure we allocated sufficient bits.
out14 <- pipeline(EIR = 1,
years = 10,
N = 1000,
ibd_length = 20,
num_loci = 24,
spatial_type = "island",
spatial_incidence_matrix = 0.2,
spatial_mosquitoFOI_matrix = 0.2,
update_save = TRUE,
update_length = 30,
human_update_save = TRUE,
summary_saves_only = TRUE,
full_save = TRUE,
seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s
l <- length(out14)
strains <- out14[[l]]$populations_event_and_strains_List$Strain_barcode_vectors
infected <- which(out14[[l]]$population_List$Infection_States %in% c(1:4))
converted_barcodes <- lapply(strains[infected], function(x) {
lapply(x, magenta:::convert_ibd_barcode, 24)
})
max(unlist(converted_barcodes))
## [1] 9025
Definitely plenty.
As when simulating normal barcodes, a summary of the genetics is made
at each update interval. For simulations of IBD, we save the population
level IBD (pIBD in the manuscript) as summary_ibd
and the
within host IBD (iIBD in the manuscript) as
summary_within_ibd
. The same system is used for how to bin
the population into clinical infection and age groups:
out14[[1]]$summary_ibd
## age_bin clinical N mean
## 1 (-0.001,5] Asymptomatic 14 9.751341e-06
## 2 (-0.001,5] P 0 NaN
## 3 (-0.001,5] S 0 NaN
## 4 (5,15] Asymptomatic 27 1.560287e-04
## 5 (5,15] P 0 NaN
## 6 (5,15] S 0 NaN
## 7 (15,100] Asymptomatic 59 1.875604e-04
## 8 (15,100] Clinical 1 3.154574e-03
## 9 (15,100] P 0 NaN
## 10 (15,100] S 0 NaN
And for iIBD:
out14[[1]]$summary_within_ibd
## age_bin clinical N mean
## 1 (-0.001,5] Asymptomatic 9 0.02314815
## 2 (-0.001,5] P 0 NaN
## 3 (-0.001,5] S 0 NaN
## 4 (5,15] Asymptomatic 19 0.00000000
## 5 (5,15] P 0 NaN
## 6 (5,15] S 0 NaN
## 7 (15,100] Asymptomatic 24 0.00000000
## 8 (15,100] Clinical 0 NaN
## 9 (15,100] P 0 NaN
## 10 (15,100] S 0 NaN
Note how the denominator is different between the two measures. This is because iIBD is the comparison of IBD (same integer at a given locus) of the strains within an individual. Thus if an individual is only infected with one parasite strain then iIBD can not be calculated and these individuals are not included.
The above introduction gives an overview of how to run the model and
covers all components of magenta
that were used to produce
the findings in the BioArxiv
manuscript.
magenta
, however, has been developed to also simulate
drug resistance and consider more nuanced spatial set ups. Both of these
are under development, with the resistance implementation fully in (but
needing more documentation) and the inclusion of space via a
metapopulation not fully implemented yet. Once documented these will be
availabe in the Resistance
Vignette and Spatial
Vignette