Overview

**magenta** is an individual-based simulation model of malaria epidemiology and parasite genetics.

magenta extends the Imperial malaria model by tracking the infection history of individuals. With this additional genetic characteristics of the parasite can be assessed for looking at both neutral genetic variation as well as loci under selection.

The model is written in C++/Rcpp and interfaced with R.


Installation

You can also install rdhs from github with:

#install.packages("devtools")
devtools::install_github("OJwatson/rdhs")
# Load the package
library(magenta)

Demonstration

Base Model

The magenta simulation model is run using the pipeline function, which handles creating needed parameter objects, starting the simulation, summarising the simulation model state at desired intervals and handling how to save the final model output and state.

For example to run the model for 10 years at an EIR of 1 and no seasonality for a population of 10,000.

out1 <- pipeline(EIR = 1, years = 10, N = 10000)
## Seed set to 34491875
## magenta v1.2.0

By default, magenta will run the simulation for the entire duration and return a few summaries of the human population at the end of the 10 years, which are stored within the Loggers list in out. For example, the human proportions in each infection state are the last 6 elements in Loggers.

as.data.frame(tail(out1$Loggers, 6))
##           S          D          A          U           T           P
## 1 0.8608592 0.00156126 0.09441759 0.03743192 0.001026548 0.004703534

(See Loggers vignette for more info about what is returned in Loggers).

If we want to save information about the model state over time we set update_save to TRUE and then specify at what intervals we want to stop the simulation using update_length. For example, to fetch information about the simualtion at monthly intervals:

out2 <- pipeline(EIR = 1, years = 10, N = 10000, 
                 update_save = TRUE, update_length = 30,
                 seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s

When doing update_save we now get a progress bar for the simulation and the returned object out is now a list where each the first n-1 elements are unnamed lists and are the saved information at each month. (The last element contains the pointer to the simulation, which allows us to carry on the simulation if needed.) For example to plot the size of the susceptible proportion over time:

S <- unlist(lapply(out2[1:(length(out2) - 1)], "[[", "S"))
time <- unlist(lapply(out2[1:(length(out2) - 1)], "[[", "Time"))
plot(time, S)
abline(v = 1000, col = "red")

magenta starts simulations close to the equilibrium solution, but it is not quite perfect. For example, in the above plot there is a burn in period before the red vertical line. The required burn in time varies depending on the simulation parameters and so it is best to run the model a few times first to work out a suitable burn in time. For most simulations though a 40 year burn in period is more than sufficient.

In the last simulation run, the random seed was provided, which is used to allow simulations to be able to be run reproducibly. To assist this, the function call and package version are stored with each simulation run as an attribute meta.

meta <- attr(out2, "meta")
print(meta$call)
## pipeline(EIR = 1, N = 10000, years = 10, update_length = 30, 
##     update_save = TRUE, seed = 123456789L)
print(meta$version)
## [1] '1.2.0'

magenta primarily characterises parasite populations in terms of a genetic barcode, in which the barcode is a vector of bits representing whether the parasite possesses the major or minor allele at the loci of interest. By default the barcode used is 24 bases long. To see what this looks like, we can request for magenta to return the human population rather than the Loggers as the last list element.

out3 <- pipeline(EIR = 1, years = 10, N = 10000, 
                 update_save = TRUE, update_length = 30, 
                 human_only_full_save = TRUE, 
                 seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s

We can now look at what is returned in the last list element.

names(out3[[length(out3)]])
## [1] "Infection_States"                             "Zetas"                                       
## [3] "Ages"                                         "ID"                                          
## [5] "Strain_infection_state_vectors"               "Strain_day_of_infection_state_change_vectors"
## [7] "Strain_barcode_vectors"

Again we return the infection states of the humans, but also their individual biting rates, Zetas, detection immunity and their ages, as well as the parasite strain information:

# last element
n <- length(out3)

# first individual who is infected, i.e. not in states S or P (order goes S, D, A, U, T, P) 
i <- which.min(out3[[n]]$Infection_States %in% c(0, 5))
out3[[n]]$Strain_barcode_vectors[i]
## [[1]]
## [[1]][[1]]
##  [1] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE
## [19] FALSE  TRUE FALSE  TRUE FALSE FALSE

Above we have found the first individual who is infected and returned the barcodes for each parasite strain the individual has. For example, this individual had one parasite strain, as only one parasite barcode is returned. We can then look at the distribution of parasite strains in the population, for example, against their individual biting rate.

zetas <- out3[[n]]$Zetas
n_strains <- unlist(lapply(out3[[n]]$Strain_infection_state_vectors,length))
plot(zetas, n_strains, log = "x", ylab = "Number of Strains")

With the parasite barcode information we can work out what an individual’s complexity of infection (COI) is, by looking at the number of unique parasite barcodes within an individual. For example, for the individual with the most amoutn of strains:

m <- which.max(n_strains)
message("Individual ", m, " has ", n_strains[m], " strains")
## Individual 782 has 19 strains
un <- length(unique(out3[[n]]$Strain_barcode_vectors[[m]]))
message("Individual ", m, " has ", un, " unique strains")
## Individual 782 has 17 unique strains

The reason we also return the age and the detection immunity, ID, is that these are useful for when we want to estimate the COI in terms of what strains are likely to be observed based on their parasitaemia, which is inferred by their detection immunity, ID. These can be estimated using convert_barcode_vectors as follows, which below is used to show the COI of parsaites that have sufficiently high parasite density to be detected by microscopy if that was the only parasite strain in the infection.

cois <- convert_barcode_vectors(out3[[n]], 
                                out3[[n]]$ID, 
                                sub_patents_included = FALSE, 
                                COI_type = "patent")

plot(n_strains, cois$COI, ylab = "Microscopy detectable COI", 
     xlab = "Number of Strains")
abline(0, 1)

We can also request magenta to return all the information about the human and mosquito population:

out4 <- pipeline(EIR = 1, years = 10, N = 10000, 
                 update_save = TRUE, update_length = 30, 
                 full_save = TRUE, 
                 seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s
names(out4[[n]])
## [1] "population_List"                    "populations_event_and_strains_List"
## [3] "scourge_List"                       "parameters_List"

The last list element now contains 4 lists:

  1. population_List - Human population demographic information
  2. populations_event_and_strains_List - Human population’s event triggers (e.g. time of next infection event, time of moving state etc) and parasite strains
  3. scourge_List - The mosquito population and their parasite strains and event triggers. (a scourge is the collective noun for mosquitoes)
  4. parameters_List - List of simulation parameters

The above 4 lists are sufficient for restarting a simulation and can thus be used to save the model state to file and then be reloaded. Again see the (See Loggers vignette for more info about what information these lists actually contain, and also see the (See Saving & Restarting vignette for more info about how to save these lists and reload them to continue a model run.

Saving different model summaries

magenta was initially designed for simulating neutral variation and looking at patterns in COI and parasite diversity. As a result, there are a number of options for saving summaries of the within-host parasite genetic diversity at each update length.

Previously, we were only saving the human population’s parasites in the n-1 element. We can instead save the human population’s parasites at each update with human_update_save:

out5 <- pipeline(EIR = 1, years = 10, N = 10000, 
                 update_save = TRUE, update_length = 30, 
                 human_update_save = TRUE, 
                 summary_saves_only = FALSE,
                 full_save = TRUE, 
                 seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s

From this we could then calculate the population COI at each monthly interval amongst other genetic traits. However, saving all this information to file quickly takes up a lot of memory. Instead, we can request for the information to be summarised using summary_saves_only:

out6 <- pipeline(EIR = 1, years = 10, N = 10000, 
                 update_save = TRUE, update_length = 30, 
                 human_update_save = TRUE, 
                 summary_saves_only = TRUE,
                 full_save = TRUE, 
                 seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s

When saving summaries of the genetic information, the mean COI, % polygenomic infections, % unique barcodes and coefficient of uniqueness (see manuscript for definition) are recorded for seperate age and infection status groups. In addition, the overall clonality is recorded (how many of each barcode were there) and the population level allele frequency are saved. Lastly, some extra summaries of the population are saved, which include treatments given in the preceeding month, EIR and prevalence. For full explanation see Loggers vignette. For example, for COI the following data is stored:

out6[[1]]$summary_coi
##       age_bin     clinical    N     mean
## 1  (-0.001,5] Asymptomatic  139 3.820144
## 2  (-0.001,5]     Clinical    2 1.000000
## 3  (-0.001,5]            P    5 0.000000
## 4  (-0.001,5]            S 2084 0.000000
## 5      (5,15] Asymptomatic  413 2.987893
## 6      (5,15]     Clinical   11 1.909091
## 7      (5,15]            P   12 0.000000
## 8      (5,15]            S 2472 0.000000
## 9    (15,100] Asymptomatic  898 2.876392
## 10   (15,100]     Clinical   14 1.285714
## 11   (15,100]            P   23 0.000000
## 12   (15,100]            S 3927 0.000000

We can then plot the population mean COI over time as follows, using the last 100 values after an approximate burnin period:

# We don't want the last element as the last save
pos <- 1:(length(out6) - 1)
COI <- unlist(lapply(out6[pos], function(x) { 
    inf <- x$summary_coi$clinical %in% c("Asymptomatic","Clinical")
    weighted.mean(x$summary_coi$mean[inf], x$summary_coi$N[inf]) 
}))
prev <- unlist(lapply(out6[pos], function(x) {x$pcr_prev}))

par(mfrow=c(2,1), mar=c(2,4,1,2))
plot(tail(COI,100), xlab = "", ylab = "COI")
plot(tail(prev,100), xlab = "", ylab = "PCR PfPR")

We can increase the length of the barcode or change the assumed population level allele frequencey as well:

out7 <- pipeline(EIR = 1, years = 10, N = 10000, 
                 num_loci = 64, 
                 plaf = c(rep(0.1, 32), rep(0.9, 32)),
                 update_save = TRUE, 
                 update_length = 30, 
                 human_update_save = TRUE, 
                 summary_saves_only = TRUE,
                 full_save = TRUE, 
                 seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s

And we can look at these over time:

plafs <- do.call(rbind,lapply(out7[pos], function(x) {x$barcode_freq}))
lattice::levelplot(plafs, xlab = "Time", ylab = "loci")

We can also change how we summarise the simulation at each interval. This can be done in a couple of ways. The first is by altering the age and states:

out8 <- pipeline(EIR = 1, years = 10, N = 10000, 
                 age_breaks = c(-0.1,5,10,15,20,100), 
                 sample_states = c(1:4),
                 update_save = TRUE, 
                 update_length = 30, 
                 human_update_save = TRUE, 
                 summary_saves_only = TRUE,
                 full_save = TRUE, 
                 seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s
out8[[1]]$summary_coi
##      age_bin     clinical   N     mean
## 1 (-0.001,5] Asymptomatic 139 3.820144
## 2 (-0.001,5]     Clinical   2 1.000000
## 3     (5,15] Asymptomatic 413 2.987893
## 4     (5,15]     Clinical  11 1.909091
## 5   (15,100] Asymptomatic 898 2.876392
## 6   (15,100]     Clinical  14 1.285714

The next is by providing your own update function as update_save_func. Have a look at the code for update_saves to see the style required.

Interventions

magenta also allows you to specify interventions in the form of the altering the use of insecticide treated nets (itn_cov), indoor residual spraying (irs_cov) and changing the frequency of treatment (ft). These should be provided as vector that dictate the annual coverage.

out9 <- pipeline(EIR = 4, years = 20, N = 10000,
                 ft = c(rep(0,10), rep(0.2,10)),
                 irs_cov = c(rep(0,10), rep(0.2,10)),
                 itn_cov = c(rep(0,10), rep(0.2,10)),
                 update_save = TRUE, 
                 update_length = 30, 
                 seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## 
## Running Deterministic Model for 20 years
## Starting Stochastic Simulation for 20 years
## Running: [=====================================================================================] 100% eta: 0s
infected_mosquitoes <- unlist(lapply(out9, function(x) {x$Mosquitoes$Mos_I}))
time <- unlist(lapply(out9, function(x) {x$Time}))/365
plot(time, infected_mosquitoes, xlab = "Time (Years)", ylab = "Infected Mosquitoes")
abline(v = 10)

magenta incorporates vector based interventions by first running an equivalent deterministic version of the malaria transmission model with vector based intervetnions in. The output of the determininistic model is used to estimate the changing mosquito mortality rate and anthrophagy at each day as a result of the intervention changes. These are then fed through to the indivdual-based simulation to adjust the behaviour of the individual mosquitoes.

Space

magenta can currently be run under 2 different spatial models. The first is a closed population, with no importation. In this scenario, there will eventually be one parsaite genetic barcode that becomes fixed. To demonstrate this, we can simulate a smaller population of 1000 indiivduals, with only 4 loci in the barcode, for 25 years. The spatial_type is set to NULL here, which assumes a closed population and is also the default space option.

out10 <- pipeline(EIR = 1, 
                 years = 25, N = 1000, 
                 spatial_type = NULL,
                 num_loci = 4, 
                 update_save = TRUE, 
                 update_length = 30, 
                 human_update_save = TRUE, 
                 summary_saves_only = TRUE,
                 full_save = TRUE, 
                 seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 25 years
## Running: [=====================================================================================] 100% eta: 0s

We can see this process again by plotting the barcode frequency over time for the fist n-1 elements (as the last list element will not contain the barcode_frequency summary)

pos <- 1:(length(out10) - 1)
plafs <- do.call(rbind,lapply(out10[pos], function(x) {x$barcode_freq}))
matplot(plafs)

We can change this to have importation of barcodes over time. This occurs with a genetic island model where the population we are simulating is assumed to receive imported parasites from a genetically stable population. There are two forms of importation, the spatial_incidence_matrix and spatial_mosquitoFOI_matrix, which determine the rate at which new parasites are imported into the human and mosquito population respectively. In the following we assume that 20% of new infections in both the humans and mosquito populations are actually imported, with the barcodes drawn according to the population level allele frequency (PLAF) (by default the PLAF is 0.5 for each loci). This ensures that a genetic equilibrium is reached and barcode fixation does not occur.

out11 <- pipeline(EIR = 1, 
                 years = 25, 
                 N = 1000, 
                 spatial_type = "island",
                 spatial_incidence_matrix = 0.2,
                 spatial_mosquitoFOI_matrix = 0.2,
                 num_loci = 4, 
                 update_save = TRUE, 
                 update_length = 30, 
                 human_update_save = TRUE, 
                 summary_saves_only = TRUE,
                 full_save = TRUE, 
                 seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 25 years
## Running: [=====================================================================================] 100% eta: 0s
pos <- 1:(length(out11) - 1)
plafs <- do.call(rbind,lapply(out11[pos], function(x) {x$barcode_freq}))
matplot(plafs)

Seasonality and Specific Locations

Included in magenta are the historic interventions and seasonality profiles for admin level one regions in sub-Saharan African countries. These can then be used to draw up the required parameter sets for a given region. E.g. to simulate for Kinsasha from 1980:2015

# set up the years
year_range <- 1980:2015

# grab the interventions for region of interest
country <- "Democratic Republic of the Congo"
admin <- "Kinshasa"
ints <- intervention_grab(country = country, 
                          admin = admin,
                          year_range = year_range)
## Requested: Kinshasa, Democratic Republic of the Congo
## Returned: Kinshasa, Democratic Republic of the Congo
# grab the importation from the migration model
spl <- spl_grab(country, admin, year_range)
## Requested: Kinshasa, Democratic Republic of the Congo
## Returned: Kinshasa_City, Democratic Republic of the Congo
# get the intervention parmeters
ft <- ints$ft
itn_cov <- ints$itn_cov
irs_cov <- ints$irs_cov

# load the Malaria Atlas Project Prevalence data
MAP_file <- system.file("extdata/MAP_PfPR_population_weighted.csv", 
                        package = "magenta")
MAP <- read.csv(MAP_file)

# work out the EIR for the prevalence in year 2000 (i.e. pre interventions)
pfpr_micro <- MAP$X2000[which(MAP$Name==admin)]
EIR <- magenta:::pfpr_to_eir_heuristic(ft = 0, PfPR_micro = pfpr_micro)
## 0.00152858753907675
# run the simulation
out12 <- pipeline(
  EIR = EIR,
  N = 10000,
  years = length(itn_cov),
  country = country,
  admin = admin,
  itn_cov = itn_cov,
  irs_cov = irs_cov,
  ft = ft,
  spatial_incidence_matrix = spl$incidence,
  spatial_mosquitoFOI_matrix = spl$mosquitoFOI,
  spatial_type = "island",
  human_only_full_save = TRUE,
  update_length = 30,
  update_save = TRUE,
  human_update_save = FALSE,
  summary_saves_only = FALSE,
)
## Seed set to 693175740
## magenta v1.2.0
## Requested: Kinshasa, Democratic Republic of the Congo
## Returned: Kinshasa, Democratic Republic of the Congo
## 
## Running Deterministic Model for 36 years
## Starting Stochastic Simulation for 36 years
## Running: [=====================================================================================] 100% eta: 0s
# Grab the microscopy prevalence over time in 2-10s
pos <- 1:(length(out12)-1)
time <- 1980 + unlist(lapply(out12[pos], "[[", "Time"))/365
prev_2_10_micro <- unlist(lapply(out12[pos], function(x) { 
  magenta:::microscopy_detected(ages = x$Ages, 
                                IDs = x$ID, 
                                states = x$InfectionStates)
  }))

# get the MAP estimates
map_prev <- MAP[which(MAP$Name==admin),5:20]

# plot the relationship
plot(time, prev_2_10_micro, xlim = c(2000,2015), 
     ylab = "Micro 2-10", type = "l")
points(2000:2015, map_prev)

The above is similar to how the model fitting simulations were laid out and it gives the user a chance to check that the parameter sets being used are the correct parameters. However, you can use use_historic_interventions as TRUE and provide the year range and pipeline will conduct the above parameter searching for you.

As an aside, magenta has a multitude of little helped functions like microscopy_detected that are not exported currently, due to their use in mainly assisting downstream analysis rather than being for simulation directly.

IBD Simulations

magenta was also designed to simulate patterns in identity-by-descent (IBD). In these simulations, the barcode is adapted such that each loci is represented by multiple bits that represent a unique integer rather than a single bit. The integers are assigned at the beginning of the simulation, with consecutive integers assigned to each parasite, which allows the pedigree of parasites to be tracked. These simulations are slower as the bitset container used to represnt is substantially larger and simulating recombination and new variants is slightly more involved. In addition, summarising the parasite population at monthly intervals takes more time and more memory is required to simulate patterns of IBD.

To simulate IBD, we allocate the ibd_length, which is the bumber of bits used to represent one locus. The number of bits used gives the maximum number of parasite pedigrees that can be tracked, which is equal to 2^ibd_length. As a result you need to double check that you will not exceed this number during a simulation. An easy way to check is to use the argument set_up_only, which will return the simulation model state after it has been initialised:

out13 <- pipeline(EIR = 1, 
                  years = 10, 
                  N = 1000,  
                  ibd_length = 20, 
                  num_loci = 24,
                  set_up_only = TRUE,
                  update_save = TRUE, 
                  update_length = 30, 
                  human_update_save = TRUE, 
                  summary_saves_only = TRUE,
                  full_save = TRUE, 
                  seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Set Up Grab
names(out13)
## [1] "population_List"                    "populations_event_and_strains_List"
## [3] "scourge_List"                       "parameters_List"

From this we can have a look at the strains to see the IBD representation used:

infected <- which(out13$population_List$Infection_States %in% c(1:4))
strains <- out13$populations_event_and_strains_List$Strain_barcode_vectors[infected]

strains[1]
## [[1]]
## [[1]][[1]]
##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [19] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [55] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [91] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [127] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [163] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [199] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [235] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [271] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [307] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [325] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [343] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [361] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [379] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [397] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [415] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [451] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [469] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 
## [[1]][[2]]
##   [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [19] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [37] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [55] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [91] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [127] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [163] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [199] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [235] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [271] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [307] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
## [325] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [343] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [361]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [379] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [397] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [415] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [451] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [469] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

The first infected individual is infected with two parasites as indicated with the two strain barcode vectors. The first is the binary representation of rep(0, 24) and the next is rep(1,24). To convert these to integer vectors we can use the unexported convert_ibd_barcode passing in the barcode and the number of loci (nl) being simulated. So the first parasite in the second infected individual should be represented by a barcode of 2s give that the first individual had two strains (i.e. the 0s and the 1s).

magenta:::convert_ibd_barcode(strains[[2]][[1]], nl = 24)
##  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

We can then work out how many parasites have been allocated in the simulation to begin with:

last_infected <- tail(strains,1)[[1]]
magenta:::convert_ibd_barcode(last_infected[[length(last_infected)]], nl = 24)
##  [1] 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607 607

Clearly 2^20 will be plenty, even if we add in substantial importation, with importation in the IBD model importing the next integer barcode, i.e. rep(608, 24). Just remember you need to factor in both the rate of importation, the length of the simulation and the size of the population. We can do the same check at the end of the simulation to make sure we allocated sufficient bits.

out14 <- pipeline(EIR = 1, 
                  years = 10, 
                  N = 1000,  
                  ibd_length = 20, 
                  num_loci = 24,
                  spatial_type = "island",
                  spatial_incidence_matrix = 0.2,
                  spatial_mosquitoFOI_matrix = 0.2,
                  update_save = TRUE, 
                  update_length = 30, 
                  human_update_save = TRUE, 
                  summary_saves_only = TRUE,
                  full_save = TRUE, 
                  seed = 123456789L)
## Seed set to 123456789
## magenta v1.2.0
## Starting Stochastic Simulation for 10 years
## Running: [=====================================================================================] 100% eta: 0s
l <- length(out14)
strains <- out14[[l]]$populations_event_and_strains_List$Strain_barcode_vectors
infected <- which(out14[[l]]$population_List$Infection_States %in% c(1:4))

converted_barcodes <- lapply(strains[infected], function(x) {
  lapply(x, magenta:::convert_ibd_barcode, 24)
})

max(unlist(converted_barcodes))
## [1] 9025

Definitely plenty.

As when simulating normal barcodes, a summary of the genetics is made at each update interval. For simulations of IBD, we save the population level IBD (pIBD in the manuscript) as summary_ibd and the within host IBD (iIBD in the manuscript) as summary_within_ibd. The same system is used for how to bin the population into clinical infection and age groups:

out14[[1]]$summary_ibd
##       age_bin     clinical  N         mean
## 1  (-0.001,5] Asymptomatic 14 9.751341e-06
## 2  (-0.001,5]            P  0          NaN
## 3  (-0.001,5]            S  0          NaN
## 4      (5,15] Asymptomatic 27 1.560287e-04
## 5      (5,15]            P  0          NaN
## 6      (5,15]            S  0          NaN
## 7    (15,100] Asymptomatic 59 1.875604e-04
## 8    (15,100]     Clinical  1 3.154574e-03
## 9    (15,100]            P  0          NaN
## 10   (15,100]            S  0          NaN

And for iIBD:

out14[[1]]$summary_within_ibd
##       age_bin     clinical  N       mean
## 1  (-0.001,5] Asymptomatic  9 0.02314815
## 2  (-0.001,5]            P  0        NaN
## 3  (-0.001,5]            S  0        NaN
## 4      (5,15] Asymptomatic 19 0.00000000
## 5      (5,15]            P  0        NaN
## 6      (5,15]            S  0        NaN
## 7    (15,100] Asymptomatic 24 0.00000000
## 8    (15,100]     Clinical  0        NaN
## 9    (15,100]            P  0        NaN
## 10   (15,100]            S  0        NaN

Note how the denominator is different between the two measures. This is because iIBD is the comparison of IBD (same integer at a given locus) of the strains within an individual. Thus if an individual is only infected with one parasite strain then iIBD can not be calculated and these individuals are not included.

Summary

The above introduction gives an overview of how to run the model and covers all components of magenta that were used to produce the findings in the BioArxiv manuscript.

magenta, however, has been developed to also simulate drug resistance and consider more nuanced spatial set ups. Both of these are under development, with the resistance implementation fully in (but needing more documentation) and the inclusion of space via a metapopulation not fully implemented yet. Once documented these will be availabe in the Resistance Vignette and Spatial Vignette