Introduction to attrib

library(attrib)
#> attrib 2021.1.2 https://folkehelseinstituttet.github.io/attrib
library(data.table)

Introduction

attrib provides a way of estimating what mortality would have been if given exposures had been set to reference values. By using simulations from the posterior distribution of all coefficients we can easily aggregate over time and locations while still obtaining valid credible intervals.
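
As a toy base R illustration (not attrib code) of why this works: a valid interval for an aggregate comes from aggregating within each simulation first and taking quantiles afterwards, since the quantile of a sum is not the sum of quantiles.

# Toy example: 2 locations x 1000 coefficient simulations of expected deaths
set.seed(1)
sims <- matrix(rpois(2000, lambda = 10), nrow = 2)
# Valid: aggregate each simulation across locations, then take quantiles
quantile(colSums(sims), probs = c(0.05, 0.95))
# Not valid in general: summing per-location quantiles
sum(apply(sims, 1, quantile, probs = 0.95))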

This vignette will go through:

  • how to use fit_attrib to fit the model to the data
  • how to use est_attrib to estimate the mortality under different scenarios (i.e. when the exposures are at reference values and at observed values)
  • some examples of how the resulting dataset can be used

Data example

We will use the datasets data_fake_county and data_fake_nation.

data_fake_county contains fake weekly mortality data for all counties of Norway from 2010 until 2020. The dataset has the following features:

  • location_code: Location code of the county
  • week: Week number
  • season: Years spanned by the season
  • year: Year
  • yrwk: Year and week
  • x: Number of weeks since the start of the season
  • pop: Population size
  • pr100_ili: Percentage of doctor's consultations diagnosed as influenza-like illness
  • pr100_ili_lag_1: pr100_ili lagged by one week
  • temperature: Temperature
  • temperature_high: Number of heatwaves
  • deaths: Number of deaths

data_fake_nation is a similar dataset at the national level.

data_fake_county <- attrib::data_fake_county
data_fake_nation <- attrib::data_fake_nation
head(data_fake_county, 5)
#> Key: <location_code, week>
#>    location_code  week    season  year    yrwk     x    pop pr100_ili
#>           <char> <num>    <char> <num>  <char> <num>  <num>     <num>
#> 1:      county03     1 2009/2010  2010 2010-01    24 693494 0.5302766
#> 2:      county03     1 2010/2011  2011 2011-01    24 693494 0.7234059
#> 3:      county03     1 2011/2012  2012 2012-01    24 693494 1.7817489
#> 4:      county03     1 2012/2013  2013 2013-01    24 693494 1.6000083
#> 5:      county03     1 2013/2014  2014 2014-01    24 693494 1.6037883
#>    pr100_ili_lag_1 temperature temperature_high deaths
#>              <num>       <num>            <num>  <int>
#> 1:       0.4496182 -0.94325234                0    103
#> 2:       0.5359123  0.01462311                0     96
#> 3:       1.8919261 -2.03893188                0    105
#> 4:       1.3909804 -6.02035548                0     97
#> 5:       1.3942666 -1.72146124                0    114
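
As a quick sanity check (a sketch; output omitted) we can summarise the panel structure with data.table, where uniqueN counts distinct values:

# Number of counties and rows contributing to each season
data_fake_county[, .(
  n_locations = uniqueN(location_code),
  n_rows = .N
), keyby = season]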

In this example we will look at the exposures pr100_ili_lag_1 and temperature_high and calculate the mortality attributable to each of them.

Fitting using fit_attrib

County level

We want to estimate the attributable mortality due to ILI and heatwaves. attrib lets us fit models with fixed effects, random effects, and offsets using generalized linear mixed models (GLMMs).

We use the glmer function from the lme4 package. In practice, this means we must specify the response, the offset, the fixed effects, and the random effects. In our case we will model the response deaths as a function of:

  • the fixed effects:
    • temperature_high
    • pr100_ili_lag_1
    • sin(2 * pi * (week - 1) / 52)
    • cos(2 * pi * (week - 1) / 52)
  • the random effects:
    • (1|location_code)
    • (pr100_ili_lag_1|season)
  • the offset:
    • log(pop)
# response
response <- "deaths"

# fixed effects
fixef_county <- "temperature_high +
  pr100_ili_lag_1 +
  sin(2 * pi * (week - 1) / 52) +
  cos(2 * pi * (week - 1) / 52)"

# random effects
ranef_county <- "(1|location_code) +
  (pr100_ili_lag_1|season)"

# offset
offset_county <- "log(pop)"

Now we fit the model using fit_attrib.


fit_county <- fit_attrib(data_fake_county, 
                          response = response, 
                          fixef = fixef_county, 
                          ranef = ranef_county, 
                          offset = offset_county)

This results in the following fit:

fit_county
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: poisson  ( log )
#> Formula: deaths ~ temperature_high + pr100_ili_lag_1 + sin(2 * pi * (week -  
#>     1)/52) + cos(2 * pi * (week - 1)/52) + offset(log(pop)) +  
#>     (1 | location_code) + (pr100_ili_lag_1 | season)
#>    Data: data
#>       AIC       BIC    logLik  deviance  df.resid 
#>  44645.63  44706.39 -22313.82  44627.63      6305 
#> Random effects:
#>  Groups        Name            Std.Dev. Corr
#>  location_code (Intercept)     0.000000     
#>  season        (Intercept)     0.000000     
#>                pr100_ili_lag_1 0.004719  NaN
#> Number of obs: 6314, groups:  location_code, 11; season, 11
#> Fixed Effects:
#>                 (Intercept)             temperature_high  
#>                    -8.79616                      0.08188  
#>             pr100_ili_lag_1  sin(2 * pi * (week - 1)/52)  
#>                     0.02796                      0.01878  
#> cos(2 * pi * (week - 1)/52)  
#>                     0.07498  
#> optimizer (Nelder_Mead) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings

Note that the fit has the added attributes offset (storing the name of the offset) and fit_fix (the coefficients of the model fitted using only the fixed effects). These are needed by est_attrib to create the dataset containing only the fixed effects.
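
A minimal way to inspect these, assuming they are stored as ordinary R attributes:

# Sketch: peek at the attributes attached by fit_attrib
attr(fit_county, "offset")   # the offset name, here "log(pop)"
attr(fit_county, "fit_fix")  # fixed-effects-only coefficients used by est_attrib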

National level

We estimate the same model as before, but at the national level. This means we remove the random effect (1|location_code), since there is only one location code. This gives the following model terms:

  • the fixed effects:
    • temperature_high
    • pr100_ili_lag_1
    • sin(2 * pi * (week - 1) / 52)
    • cos(2 * pi * (week - 1) / 52)
  • the random effects:
    • (pr100_ili_lag_1|season)
  • the offset:
    • log(pop)
# response
response <- "deaths"

# fixed effects
fixef_nation <- "temperature_high +
  pr100_ili_lag_1 +
  sin(2 * pi * (week - 1) / 52) +
  cos(2 * pi * (week - 1) / 52)"

# random effects
ranef_nation <- "(pr100_ili_lag_1|season)"

# offset
offset_nation <- "log(pop)"

fit_nation <- fit_attrib(data_fake_nation,
                         response = response,
                         fixef = fixef_nation,
                         ranef = ranef_nation,
                         offset = offset_nation)

Using the sim function

The sim function can be used to generate simulations for all the rows in our data.

It first generates n_sim simulations from the posterior distribution of the coefficients of our fit, and then applies these coefficients to our dataset, producing n_sim simulated values of the expected mortality for each row. This is quite generic; hence, if the goal is to compute attributable mortality or incident rate ratios, we use est_attrib as shown later in this vignette.

n_sim <- 20
sim_data <- sim(fit_nation, data_fake_nation, n_sim)
head(sim_data[id_row == 1], 5)
#>     week    season  year    yrwk     x location_code     pop pr100_ili
#>    <num>    <char> <num>  <char> <num>        <char>   <num>     <num>
#> 1:     1 2009/2010  2010 2010-01    24         norge 5367580  1.160403
#> 2:     1 2009/2010  2010 2010-01    24         norge 5367580  1.160403
#> 3:     1 2009/2010  2010 2010-01    24         norge 5367580  1.160403
#> 4:     1 2009/2010  2010 2010-01    24         norge 5367580  1.160403
#> 5:     1 2009/2010  2010 2010-01    24         norge 5367580  1.160403
#>    pr100_ili_lag_1 temperature_high deaths id_row sim_id sim_value
#>              <num>            <num>  <int>  <int>  <num>     <num>
#> 1:        1.100072                0    891      1      1  909.6926
#> 2:        1.100072                0    891      1      2  903.3580
#> 3:        1.100072                0    891      1      3  903.3936
#> 4:        1.100072                0    891      1      4  902.2375
#> 5:        1.100072                0    891      1      5  903.2882

We can see that we now have multiple expected mortalities for the same data row, one per coefficient simulation.
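
Because sim_value varies only through the simulated coefficients, we can summarise that spread per row; for example:

# Mean and spread of the n_sim = 20 simulated expected mortalities per row
sim_data[id_row <= 3, .(
  mean_sim = mean(sim_value),
  sd_sim = sd(sim_value)
), keyby = id_row]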

Estimating attributable mortality using est_attrib

To estimate attributable mortality we simulate:

  • the estimated mortality for observed exposures
  • the estimated mortality for the exposures set to reference values

This is easily done using est_attrib.

We need to give the fit, the dataset, the exposures with their reference values, and the number of simulations. est_attrib then uses the arm::sim function to generate simulations from the underlying posterior distribution, and attrib::sim combines the simulated coefficients with the data to estimate the modeled outcome (i.e. the number of deaths) for each simulation.

exposures <- list( "temperature_high" = 0, "pr100_ili_lag_1" = 0)
n_sim <- 20
est_attrib_sim_county <- attrib::est_attrib(fit_county, 
                                        data_fake_county, 
                                        exposures = exposures, 
                                        n_sim = n_sim)

est_attrib_sim_nation <- attrib::est_attrib(fit_nation, 
                                        data_fake_nation, 
                                        exposures = exposures,
                                        n_sim = n_sim)

head(est_attrib_sim_county, 5)
#>    location_code  week    season  year    yrwk     x    pop pr100_ili
#>           <char> <num>    <char> <num>  <char> <num>  <num>     <num>
#> 1:      county03     1 2009/2010  2010 2010-01    24 693494 0.5302766
#> 2:      county03     1 2010/2011  2011 2011-01    24 693494 0.7234059
#> 3:      county03     1 2011/2012  2012 2012-01    24 693494 1.7817489
#> 4:      county03     1 2012/2013  2013 2013-01    24 693494 1.6000083
#> 5:      county03     1 2013/2014  2014 2014-01    24 693494 1.6037883
#>    pr100_ili_lag_1 temperature temperature_high deaths    id sim_id
#>              <num>       <num>            <num>  <int> <int>  <num>
#> 1:       0.4496182 -0.94325234                0    103     1      1
#> 2:       0.5359123  0.01462311                0     96     2      1
#> 3:       1.8919261 -2.03893188                0    105     3      1
#> 4:       1.3909804 -6.02035548                0     97     4      1
#> 5:       1.3942666 -1.72146124                0    114     5      1
#>    sim_value_exposures=observed sim_value_temperature_high=0
#>                           <num>                        <num>
#> 1:                     114.2108                     114.2108
#> 2:                     114.2906                     114.2906
#> 3:                     117.8818                     117.8818
#> 4:                     117.3912                     117.3912
#> 5:                     117.1597                     117.1597
#>    sim_value_pr100_ili_lag_1=0
#>                          <num>
#> 1:                    112.8706
#> 2:                    112.9797
#> 3:                    113.4313
#> 4:                    113.8555
#> 5:                    114.2707

We can see in the above dataset that the columns id, sim_id, sim_value_exposures=observed, sim_value_temperature_high=0, and sim_value_pr100_ili_lag_1=0 have been added to the previous set of columns. For each row in the original dataset we now have 20 rows, one for each of the simulations done by est_attrib. Each row contains the estimated number of deaths under the observed exposures and under the reference values for temperature_high and pr100_ili_lag_1, respectively.
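
A quick consistency check of the reshaping: every original row should now appear once per simulation.

# Should be TRUE: rows are multiplied by the number of simulations (n_sim = 20)
nrow(est_attrib_sim_county) == nrow(data_fake_county) * n_sim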

To make later data processing easier we convert the dataset from wide to long form, collapsing the estimated-mortality columns.

est_attrib_county_long <- data.table::melt.data.table(
  est_attrib_sim_county,
  id.vars = c("location_code",
              "season",
              "x",
              "week",
              "id",
              "sim_id",
              "deaths",
              "sim_value_exposures=observed"),
  measure.vars = c("sim_value_temperature_high=0",
                   "sim_value_pr100_ili_lag_1=0"))
data.table::setnames(est_attrib_county_long, "variable", "attr")

head(est_attrib_county_long, 5)
#>    location_code    season     x  week    id sim_id deaths
#>           <char>    <char> <num> <num> <int>  <num>  <int>
#> 1:      county03 2009/2010    24     1     1      1    103
#> 2:      county03 2010/2011    24     1     2      1     96
#> 3:      county03 2011/2012    24     1     3      1    105
#> 4:      county03 2012/2013    24     1     4      1     97
#> 5:      county03 2013/2014    24     1     5      1    114
#>    sim_value_exposures=observed                         attr    value
#>                           <num>                       <fctr>    <num>
#> 1:                     114.2108 sim_value_temperature_high=0 114.2108
#> 2:                     114.2906 sim_value_temperature_high=0 114.2906
#> 3:                     117.8818 sim_value_temperature_high=0 117.8818
#> 4:                     117.3912 sim_value_temperature_high=0 117.3912
#> 5:                     117.1597 sim_value_temperature_high=0 117.1597

We can see that the columns sim_value_temperature_high=0 and sim_value_pr100_ili_lag_1=0 are now collapsed into the new columns attr and value, with attr identifying which exposure was set to its reference value and value giving the corresponding estimated mortality.

est_attrib_nation_long <- data.table::melt.data.table(
  est_attrib_sim_nation,
  id.vars = c("location_code",
              "season",
              "x",
              "week",
              "id",
              "sim_id",
              "deaths",
              "sim_value_exposures=observed"),
  measure.vars = c("sim_value_temperature_high=0",
                   "sim_value_pr100_ili_lag_1=0"))
data.table::setnames(est_attrib_nation_long, "variable", "attr")

Comparing the national data to data aggregated from the county to the national level

We will now aggregate our two simulated datasets (one at the county level and one at the national level) to make them easier to compare.

Aggregating from county/weekly to national/seasonal

We proceed by aggregating the county dataset to the national/seasonal level. Afterwards we calculate the expected attributable mortality, exp_attr, by subtracting value (the simulated expected number of deaths given the reference value of the exposure) from sim_value_exposures=observed.

To be able to separate this dataset from the other we add a tag.

aggregated_county_to_nation <-  est_attrib_county_long[,.(
  "sim_value_exposures=observed" = sum(`sim_value_exposures=observed`),
  value = sum(value), 
  deaths = sum(deaths)
), keyby = .(season, attr, sim_id)]

# Add exp_attr and a tag.
aggregated_county_to_nation[, exp_attr:= (`sim_value_exposures=observed` - value)]
aggregated_county_to_nation[, tag := "aggregated_from_county"]

head(aggregated_county_to_nation, 5)
#> Key: <season, attr, sim_id>
#>       season                         attr sim_id sim_value_exposures=observed
#>       <char>                       <fctr>  <num>                        <num>
#> 1: 2009/2010 sim_value_temperature_high=0      1                     43858.31
#> 2: 2009/2010 sim_value_temperature_high=0      2                     44092.49
#> 3: 2009/2010 sim_value_temperature_high=0      3                     44282.11
#> 4: 2009/2010 sim_value_temperature_high=0      4                     44420.66
#> 5: 2009/2010 sim_value_temperature_high=0      5                     44445.76
#>       value deaths exp_attr                    tag
#>       <num>  <int>    <num>                 <char>
#> 1: 43453.09  44421 405.2236 aggregated_from_county
#> 2: 43664.49  44421 427.9983 aggregated_from_county
#> 3: 43862.21  44421 419.8997 aggregated_from_county
#> 4: 43989.30  44421 431.3637 aggregated_from_county
#> 5: 44061.13  44421 384.6273 aggregated_from_county

Aggregating the national model per season

For the national model we aggregate over seasons and create exp_attr in the same way as above.

aggregated_nation <-  est_attrib_nation_long[, .(
  "sim_value_exposures=observed" = sum(`sim_value_exposures=observed`),
  value = sum(value), 
  deaths = sum(deaths)
), keyby = .(season, attr, sim_id)]

aggregated_nation[, exp_attr:= (`sim_value_exposures=observed` - value)]
aggregated_nation[, tag:= "nation"]
head(aggregated_nation, 5)
#> Key: <season, attr, sim_id>
#>       season                         attr sim_id sim_value_exposures=observed
#>       <char>                       <fctr>  <num>                        <num>
#> 1: 2009/2010 sim_value_temperature_high=0      1                     43928.91
#> 2: 2009/2010 sim_value_temperature_high=0      2                     44400.21
#> 3: 2009/2010 sim_value_temperature_high=0      3                     44427.94
#> 4: 2009/2010 sim_value_temperature_high=0      4                     44073.65
#> 5: 2009/2010 sim_value_temperature_high=0      5                     44130.06
#>       value deaths exp_attr    tag
#>       <num>  <int>    <num> <char>
#> 1: 43928.91  44421        0 nation
#> 2: 44400.21  44421        0 nation
#> 3: 44427.94  44421        0 nation
#> 4: 44073.65  44421        0 nation
#> 5: 44130.06  44421        0 nation

For simplicity we combine the two datasets with data.table::rbindlist.

library(ggplot2)
data_national <- data.table::rbindlist(list(aggregated_county_to_nation, aggregated_nation))

Calculating simulation quantiles

The next step is to aggregate away the simulations. The benefit of having the simulations is that we can efficiently compute any desired quantile. For this example we will use the 0.05, 0.5, and 0.95 quantiles.

# Quantile functions
q05 <- function(x){
  return(quantile(x, 0.05))
}
q95 <- function(x){
  return(quantile(x, 0.95))
}

We compute the quantiles for exp_attr in the following way.

col_names <- colnames(data_national)
data.table::setkeyv(data_national, 
                    col_names[!col_names %in% c("exp_attr", 
                                                "sim_id", 
                                                "sim_value_exposures=observed", 
                                                "value", 
                                                "deaths")])

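# For each group defined by the key, apply median, q05 and q95 to every column
# in .SDcols; unlist(recursive = FALSE) flattens the nested result into the
# columns median.exp_attr, q05.exp_attr and q95.exp_attr.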
aggregated_sim_seasonal_data_national <- data_national[,
  unlist(recursive = FALSE,
         lapply(.(median = median, q05 = q05, q95 = q95),
                function(f) lapply(.SD, f))),
  by = eval(data.table::key(data_national)),
  .SDcols = c("exp_attr")]

head(aggregated_sim_seasonal_data_national, 5)
#> Key: <season, attr, tag>
#>       season                         attr                    tag
#>       <char>                       <fctr>                 <char>
#> 1: 2009/2010 sim_value_temperature_high=0 aggregated_from_county
#> 2: 2009/2010 sim_value_temperature_high=0                 nation
#> 3: 2009/2010  sim_value_pr100_ili_lag_1=0 aggregated_from_county
#> 4: 2009/2010  sim_value_pr100_ili_lag_1=0                 nation
#> 5: 2010/2011 sim_value_temperature_high=0 aggregated_from_county
#>    median.exp_attr q05.exp_attr q95.exp_attr
#>              <num>        <num>        <num>
#> 1:        406.2245     384.4178     428.9161
#> 2:          0.0000       0.0000       0.0000
#> 3:        688.4236     572.7230     810.9124
#> 4:       1183.1996    1010.9683    1356.3397
#> 5:        431.6623     404.6100     454.2854

We can now see that we have credible intervals and estimates for attributable deaths for all exposures.

Plotting the national model against the aggregated county model

To compare the two models we make a point-range plot using ggplot2.

q <- ggplot(aggregated_sim_seasonal_data_national[attr == "sim_value_pr100_ili_lag_1=0"], 
                       aes(x = season, y = median.exp_attr, group = tag, color = tag)) 
q <- q + geom_pointrange(aes(x = season, y = median.exp_attr, ymin = q05.exp_attr, ymax = q95.exp_attr), position = position_dodge(width = 0.3))
q <- q + ggtitle("Attributable mortality due to ILI in Norway according to 2 models") 
q <- q +  scale_y_continuous("Estimated attributable mortality") 
q <- q +  theme(axis.text.x = element_text(angle = 90),axis.title.x=element_blank()) 
q <- q +  labs(caption = glue::glue("Aggregated county model: Attributable mortality modeled on a county level before beeing aggregated up to a national level.\n National model: Attributable mortality modeled on a national level."))
q

Comparing cumulative sums over seasons

When operating at the national level, we prefer to aggregate the county model to the national level (instead of using the national model directly). This ensures consistent results at all geographical levels.

aggregated_county_to_nation <-  est_attrib_county_long[, .(
  "sim_value_exposures=observed" = sum(`sim_value_exposures=observed`),
  value = sum(value), 
  deaths = sum(deaths)
), keyby = .(season, x, week, attr, sim_id)]

aggregated_county_to_nation[, exp_attr := `sim_value_exposures=observed` - value]
aggregated_county_to_nation[, exp_irr := `sim_value_exposures=observed` / value]
head(aggregated_county_to_nation, 5)
#> Key: <season, x, week, attr, sim_id>
#>       season     x  week                         attr sim_id
#>       <char> <num> <num>                       <fctr>  <num>
#> 1: 2009/2010     1    30 sim_value_temperature_high=0      1
#> 2: 2009/2010     1    30 sim_value_temperature_high=0      2
#> 3: 2009/2010     1    30 sim_value_temperature_high=0      3
#> 4: 2009/2010     1    30 sim_value_temperature_high=0      4
#> 5: 2009/2010     1    30 sim_value_temperature_high=0      5
#>    sim_value_exposures=observed    value deaths exp_attr  exp_irr
#>                           <num>    <num>  <int>    <num>    <num>
#> 1:                     779.0600 742.4210    816 36.63896 1.049351
#> 2:                     784.2061 745.5027    816 38.70340 1.051916
#> 3:                     787.2436 749.2873    816 37.95627 1.050657
#> 4:                     792.8158 753.7923    816 39.02350 1.051770
#> 5:                     794.3861 759.5668    816 34.81924 1.045841

Again we compute the quantiles.


col_names <- colnames(aggregated_county_to_nation)
data.table::setkeyv(
  aggregated_county_to_nation,
  col_names[!col_names %in% c("exp_attr", "exp_irr", "sim_id", "exposures",
                              "sim_value_exposures=observed", "value")]
)

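# Same median/q05/q95 idiom as above, now applied to both exp_attr and exp_irr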
aggregated_county_to_nation_weekly <- aggregated_county_to_nation[,
  unlist(recursive = FALSE,
         lapply(.(median = median, q05 = q05, q95 = q95),
                function(f) lapply(.SD, f))),
  by = eval(data.table::key(aggregated_county_to_nation)),
  .SDcols = c("exp_attr", "exp_irr")]

We then estimate the cumulative sums of attributable mortality and corresponding credible intervals.

aggregated_county_to_nation_weekly[, cumsum := cumsum(median.exp_attr), by = .(attr, season)]
aggregated_county_to_nation_weekly[, cumsum_q05 := cumsum(q05.exp_attr), by = .(attr, season)]
aggregated_county_to_nation_weekly[, cumsum_q95 := cumsum(q95.exp_attr), by = .(attr, season)]

head(aggregated_county_to_nation_weekly, 5)
#> Key: <season, x, week, attr, deaths>
#>       season     x  week                         attr deaths median.exp_attr
#>       <char> <num> <num>                       <fctr>  <int>           <num>
#> 1: 2009/2010     1    30 sim_value_temperature_high=0    816    36.760202067
#> 2: 2009/2010     1    30  sim_value_pr100_ili_lag_1=0    816     0.000000000
#> 3: 2009/2010     2    31 sim_value_temperature_high=0    854    78.624885724
#> 4: 2009/2010     2    31  sim_value_pr100_ili_lag_1=0    854     0.002206921
#> 5: 2009/2010     3    32 sim_value_temperature_high=0    777    44.157407886
#>    median.exp_irr q05.exp_attr q05.exp_irr q95.exp_attr q95.exp_irr
#>             <num>        <num>       <num>        <num>       <num>
#> 1:       1.049127 34.798470669    1.045839 38.809512025    1.051903
#> 2:       1.000000  0.000000000    1.000000  0.000000000    1.000000
#> 3:       1.104903 74.373542623    1.097824 83.017392031    1.110886
#> 4:       1.000003  0.001822797    1.000002  0.002603291    1.000003
#> 5:       1.058749 41.779707163    1.054815 46.580456354    1.062070
#>          cumsum   cumsum_q05   cumsum_q95
#>           <num>        <num>        <num>
#> 1: 3.676020e+01 3.479847e+01 3.880951e+01
#> 2: 0.000000e+00 0.000000e+00 0.000000e+00
#> 3: 1.153851e+02 1.091720e+02 1.218269e+02
#> 4: 2.206921e-03 1.822797e-03 2.603291e-03
#> 5: 1.595425e+02 1.509517e+02 1.684074e+02

We can then plot the estimated cumulative attributable mortality over influenza seasons in Norway.

library(ggplot2)
q <- ggplot(
  data = aggregated_county_to_nation_weekly[
    season %in% c(
      "2015/2016",
      "2016/2017",
      "2017/2018",
      "2018/2019",
      "2019/2020"
    ) &
    attr == "sim_value_pr100_ili_lag_1=0"
  ],
  aes(
    x = x, 
    y = cumsum, 
    group = season, 
    color = season, 
    fill = season
  )
)
q <- q + geom_line()
q <- q + geom_ribbon(
  data = aggregated_county_to_nation_weekly[
    season %in% c("2019/2020") &
    attr == "sim_value_pr100_ili_lag_1=0"
  ],
  aes(
    ymin = cumsum_q05, 
    ymax = cumsum_q95
  ), 
  alpha = 0.4, 
  colour = NA
)
q <- q + scale_y_continuous("Estimated cumulative attributable mortality")
q <- q + ggtitle("Estimated cumulative attributable mortality over influenza seasons in Norway")
q

We can also plot the estimated weekly attributable mortality in Norway.

q <- ggplot(
  data = aggregated_county_to_nation_weekly[attr == "sim_value_pr100_ili_lag_1=0"], 
  aes(x = x, y = cumsum, group = season)
  ) 
q <- q + geom_line(
  data = aggregated_county_to_nation_weekly[
    season != "2019/2020" &
    attr == "sim_value_pr100_ili_lag_1=0"
  ],
  aes(
    x = x, 
    y = median.exp_attr, 
    group = season
  ), 
  color = "grey"
)
q <- q + geom_line(
  data = aggregated_county_to_nation_weekly[
    season == "2019/2020" &
    attr == "sim_value_pr100_ili_lag_1=0"
  ], 
  aes(
    x = x,
    y = median.exp_attr,
    group = season
  ), 
  color = "blue"
)
q <- q + geom_ribbon(
  data = aggregated_county_to_nation_weekly[
    season == "2019/2020" &
    attr == "sim_value_pr100_ili_lag_1=0"
  ],
  aes(
    x = x,
    ymin = q05.exp_attr,
    ymax = q95.exp_attr
  ),
  fill = "blue",
  alpha=0.4
)
q <- q + scale_y_continuous("Estimated attributable mortality")
q <- q + ggtitle("Estimated mortality due to ILI per week")
q

Incident rate ratio

Until now we have focused on estimating attributable mortality. Now we will compute the incident rate ratio (IRR) for pr100_ili_lag_1. To do this we use the fit made by fit_attrib on the county dataset, but we change all values of pr100_ili_lag_1 to 1 (IRRs are generally expressed as the effect of the exposure changing from 0 to 1).

data_fake_county_irr <- data.table::copy(data_fake_county)
data_fake_county_irr[, pr100_ili_lag_1 := 1]
head(data_fake_county_irr, 5)
#> Key: <location_code, week>
#>    location_code  week    season  year    yrwk     x    pop pr100_ili
#>           <char> <num>    <char> <num>  <char> <num>  <num>     <num>
#> 1:      county03     1 2009/2010  2010 2010-01    24 693494 0.5302766
#> 2:      county03     1 2010/2011  2011 2011-01    24 693494 0.7234059
#> 3:      county03     1 2011/2012  2012 2012-01    24 693494 1.7817489
#> 4:      county03     1 2012/2013  2013 2013-01    24 693494 1.6000083
#> 5:      county03     1 2013/2014  2014 2014-01    24 693494 1.6037883
#>    pr100_ili_lag_1 temperature temperature_high deaths
#>              <num>       <num>            <num>  <int>
#> 1:               1 -0.94325234                0    103
#> 2:               1  0.01462311                0     96
#> 3:               1 -2.03893188                0    105
#> 4:               1 -6.02035548                0     97
#> 5:               1 -1.72146124                0    114

Then we can set the reference value to zero and hence obtain the IRR for the given exposure.

exposures_irr <- c(pr100_ili_lag_1 = 0)

Now we use est_attrib to create the simulations.

est_attrib_sim_county_irr <- attrib::est_attrib(
  fit_county, 
  data_fake_county_irr, 
  exposures = exposures_irr,
  n_sim = 100
)
head(est_attrib_sim_county_irr, 5)
#>    location_code  week    season  year    yrwk     x    pop pr100_ili
#>           <char> <num>    <char> <num>  <char> <num>  <num>     <num>
#> 1:      county03     1 2009/2010  2010 2010-01    24 693494 0.5302766
#> 2:      county03     1 2010/2011  2011 2011-01    24 693494 0.7234059
#> 3:      county03     1 2011/2012  2012 2012-01    24 693494 1.7817489
#> 4:      county03     1 2012/2013  2013 2013-01    24 693494 1.6000083
#> 5:      county03     1 2013/2014  2014 2014-01    24 693494 1.6037883
#>    pr100_ili_lag_1 temperature temperature_high deaths    id sim_id
#>              <num>       <num>            <num>  <int> <int>  <num>
#> 1:               1 -0.94325234                0    103     1      1
#> 2:               1  0.01462311                0     96     2      1
#> 3:               1 -2.03893188                0    105     3      1
#> 4:               1 -6.02035548                0     97     4      1
#> 5:               1 -1.72146124                0    114     5      1
#>    sim_value_exposures=observed sim_value_pr100_ili_lag_1=0
#>                           <num>                       <num>
#> 1:                     116.8013                    113.5957
#> 2:                     115.8566                    113.1569
#> 3:                     115.9629                    113.3716
#> 4:                     116.5766                    113.7831
#> 5:                     115.9900                    113.6162

We see that we have obtained values for the reference value of the exposure in the same way as before. The difference is that we changed the dataset before running est_attrib, so we are now comparing pr100_ili_lag_1=1 (the observed column) with pr100_ili_lag_1=0 (the reference column).
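
Since the observed and reference columns differ only in pr100_ili_lag_1 (1 versus 0), the row-level ratio is exp of that simulation's season-specific pr100_ili_lag_1 coefficient, i.e. the IRR itself; a quick sketch:

# Row-level IRR implied by each simulation
est_attrib_sim_county_irr[1:3, .(
  irr_row = `sim_value_exposures=observed` / `sim_value_pr100_ili_lag_1=0`
)]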

We now aggregate to the national/seasonal level.

aggregated_county_to_nation_sim_irr <-  est_attrib_sim_county_irr[, .(
  "sim_value_exposures=observed" = sum(`sim_value_exposures=observed`),
  "sim_value_pr100_ili_lag_1=0"= sum(`sim_value_pr100_ili_lag_1=0`), 
  deaths = sum(deaths)
), keyby = .(season, sim_id)]

Here we generate the IRR:

aggregated_county_to_nation_sim_irr[, exp_irr := `sim_value_exposures=observed` / `sim_value_pr100_ili_lag_1=0`]
head(aggregated_county_to_nation_sim_irr, 5)
#> Key: <season, sim_id>
#>       season sim_id sim_value_exposures=observed sim_value_pr100_ili_lag_1=0
#>       <char>  <num>                        <num>                       <num>
#> 1: 2009/2010      1                     44824.92                    43594.70
#> 2: 2009/2010      2                     44999.62                    43410.83
#> 3: 2009/2010      3                     44911.74                    43283.49
#> 4: 2009/2010      4                     44611.30                    43371.94
#> 5: 2009/2010      5                     44837.44                    43653.91
#>    deaths  exp_irr
#>     <int>    <num>
#> 1:  44421 1.028219
#> 2:  44421 1.036599
#> 3:  44421 1.037618
#> 4:  44421 1.028575
#> 5:  44421 1.027112

Now we can compute the quantiles:


col_names <- colnames(aggregated_county_to_nation_sim_irr)
data.table::setkeyv(
  aggregated_county_to_nation_sim_irr,
  col_names[!col_names %in% c("exp_irr", "sim_id", "sim_value_exposures=observed", "sim_value_pr100_ili_lag_1=0")]
)

aggregated_county_to_nation_irr <- aggregated_county_to_nation_sim_irr[,
  unlist(recursive = FALSE, lapply(.(median = median, q05 = q05, q95 = q95), function(f) lapply(.SD, f))),
  by = eval(data.table::key(aggregated_county_to_nation_sim_irr)),
  .SDcols = c("exp_irr")
]
aggregated_county_to_nation_irr[, tag := "aggregated"]

aggregated_county_to_nation_irr
#> Key: <season, deaths>
#>        season deaths median.exp_irr q05.exp_irr q95.exp_irr        tag
#>        <char>  <int>          <num>       <num>       <num>     <char>
#>  1: 2009/2010  44421       1.031681    1.026371    1.039722 aggregated
#>  2: 2010/2011  43062       1.027203    1.021396    1.035012 aggregated
#>  3: 2011/2012  43431       1.026417    1.020577    1.034055 aggregated
#>  4: 2012/2013  43228       1.028238    1.022689    1.036722 aggregated
#>  5: 2013/2014  43328       1.024444    1.019271    1.033041 aggregated
#>  6: 2014/2015  42951       1.027011    1.022164    1.035480 aggregated
#>  7: 2015/2016  43736       1.021526    1.016067    1.030191 aggregated
#>  8: 2016/2017  43821       1.032076    1.027001    1.040593 aggregated
#>  9: 2017/2018  43716       1.030563    1.025435    1.038934 aggregated
#> 10: 2018/2019  43326       1.026096    1.021014    1.034313 aggregated
#> 11: 2019/2020  43572       1.029604    1.024646    1.038280 aggregated

Now we compare the resulting IRR values with the ones obtained from coef(fit_county)$season and the 90 percent credible interval computed manually using the standard error of pr100_ili_lag_1 given by summary(fit_county).

coef_fit_county <- data.table::as.data.table(coef(fit_county)$season)
col_names_coef <- c("pr100_ili_lag_1")
coef_irr_data <- coef_fit_county[, ..col_names_coef]
coef_irr_data[, irr := exp(pr100_ili_lag_1)]
coef_irr_data[, q05 := exp(pr100_ili_lag_1 - 1.645 * 0.003761)]  # 0.003761 is the standard error of pr100_ili_lag_1 from summary(fit_county)
coef_irr_data[, q95 := exp(pr100_ili_lag_1 + 1.645 * 0.003761)]
coef_irr_data[, tag := "from_coef"]
coef_irr_data
#>     pr100_ili_lag_1      irr      q05      q95       tag
#>               <num>    <num>    <num>    <num>    <char>
#>  1:      0.03191705 1.032432 1.026064 1.038839 from_coef
#>  2:      0.02728584 1.027662 1.021323 1.034039 from_coef
#>  3:      0.02649384 1.026848 1.020515 1.033221 from_coef
#>  4:      0.02850521 1.028915 1.022569 1.035301 from_coef
#>  5:      0.02478464 1.025094 1.018772 1.031456 from_coef
#>  6:      0.02723381 1.027608 1.021270 1.033985 from_coef
#>  7:      0.02211935 1.022366 1.016060 1.028711 from_coef
#>  8:      0.03244745 1.032980 1.026608 1.039390 from_coef
#>  9:      0.03068955 1.031165 1.024805 1.037565 from_coef
#> 10:      0.02624050 1.026588 1.020256 1.032959 from_coef
#> 11:      0.02986655 1.030317 1.023962 1.036711 from_coef

Add the correct seasons to the data.

coef_irr_data <- cbind(season = aggregated_county_to_nation_irr$season, coef_irr_data)
coef_irr_data
#>        season pr100_ili_lag_1      irr      q05      q95       tag
#>        <char>           <num>    <num>    <num>    <num>    <char>
#>  1: 2009/2010      0.03191705 1.032432 1.026064 1.038839 from_coef
#>  2: 2010/2011      0.02728584 1.027662 1.021323 1.034039 from_coef
#>  3: 2011/2012      0.02649384 1.026848 1.020515 1.033221 from_coef
#>  4: 2012/2013      0.02850521 1.028915 1.022569 1.035301 from_coef
#>  5: 2013/2014      0.02478464 1.025094 1.018772 1.031456 from_coef
#>  6: 2014/2015      0.02723381 1.027608 1.021270 1.033985 from_coef
#>  7: 2015/2016      0.02211935 1.022366 1.016060 1.028711 from_coef
#>  8: 2016/2017      0.03244745 1.032980 1.026608 1.039390 from_coef
#>  9: 2017/2018      0.03068955 1.031165 1.024805 1.037565 from_coef
#> 10: 2018/2019      0.02624050 1.026588 1.020256 1.032959 from_coef
#> 11: 2019/2020      0.02986655 1.030317 1.023962 1.036711 from_coef

We then rbindlist the two datasets together.

total_data_irr <- data.table::rbindlist(list(coef_irr_data, aggregated_county_to_nation_irr), use.names = FALSE)
total_data_irr[, pr100_ili_lag_1 := NULL]
total_data_irr
#>        season      irr      q05      q95        tag
#>        <char>    <num>    <num>    <num>     <char>
#>  1: 2009/2010 1.032432 1.026064 1.038839  from_coef
#>  2: 2010/2011 1.027662 1.021323 1.034039  from_coef
#>  3: 2011/2012 1.026848 1.020515 1.033221  from_coef
#>  4: 2012/2013 1.028915 1.022569 1.035301  from_coef
#>  5: 2013/2014 1.025094 1.018772 1.031456  from_coef
#>  6: 2014/2015 1.027608 1.021270 1.033985  from_coef
#>  7: 2015/2016 1.022366 1.016060 1.028711  from_coef
#>  8: 2016/2017 1.032980 1.026608 1.039390  from_coef
#>  9: 2017/2018 1.031165 1.024805 1.037565  from_coef
#> 10: 2018/2019 1.026588 1.020256 1.032959  from_coef
#> 11: 2019/2020 1.030317 1.023962 1.036711  from_coef
#> 12: 2009/2010 1.031681 1.026371 1.039722 aggregated
#> 13: 2010/2011 1.027203 1.021396 1.035012 aggregated
#> 14: 2011/2012 1.026417 1.020577 1.034055 aggregated
#> 15: 2012/2013 1.028238 1.022689 1.036722 aggregated
#> 16: 2013/2014 1.024444 1.019271 1.033041 aggregated
#> 17: 2014/2015 1.027011 1.022164 1.035480 aggregated
#> 18: 2015/2016 1.021526 1.016067 1.030191 aggregated
#> 19: 2016/2017 1.032076 1.027001 1.040593 aggregated
#> 20: 2017/2018 1.030563 1.025435 1.038934 aggregated
#> 21: 2018/2019 1.026096 1.021014 1.034313 aggregated
#> 22: 2019/2020 1.029604 1.024646 1.038280 aggregated
#>        season      irr      q05      q95        tag
Finally we plot the two sets of IRR estimates side by side.

q <- ggplot(
  data = total_data_irr, 
  aes(
    x = season,
    group = tag, 
    color = tag
  )
) 
q <- q +  geom_pointrange(
  aes(
    y = irr,
    ymin = q05,
    ymax = q95
  ),
  position = position_dodge(width = 0.3)
)
q <- q + theme(axis.text.x = element_text(angle = 90), axis.title.x = element_blank())
q <- q + labs(y = "Incident rate ratio")
q <- q + ggtitle("Incident rate ratio for ILI per season")
q

As we can see, the two sets of intervals are very similar.

The benefit of the simulation approach is that the process is equally easy no matter how complex the quantity we want an IRR for, and we never have to handle the variance-covariance matrix ourselves.
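
For example (a hypothetical derived quantity): under the log-linear model the IRR of a 2-unit increase in pr100_ili_lag_1 is simply exp_irr squared, and its credible interval is once again a plain quantile over the simulations:

# Sketch: IRR and interval for a 2-unit increase, reusing the simulations
aggregated_county_to_nation_sim_irr[, .(
  median_irr2 = median(exp_irr^2),
  q05_irr2 = q05(exp_irr^2),
  q95_irr2 = q95(exp_irr^2)
), keyby = season]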