Effecient way to decompose multiple time series in a data frame and compare the fit of additive and multiplicative models?

datascience.stackexchange https://datascience.stackexchange.com/questions/72655

  •  10-12-2020
  •  | 
  •  

Pregunta

I have a data frame in R that contains time series data of 7 variables that were taken on several hundred different individuals. I want to know if it would be more appropriate to use an additive model or a multiplicative model for each variable.

To give an example, the data is structured something like this:

set.seed(123)
ID = factor(letters[seq(15)])
Time = c(1000,1200,1234,980,1300,1020,1180,1908,1303,
        1045,1373,1111,1097,1167,1423)
df <- data.frame(ID = rep(ID, Time), Time = sequence(Time))
df[paste0('Var', c(1:7))] <- rnorm(sum(Time))

What is an effective way to decompose the data for each variable/ID combination, fit each with an additive model and a multiplicative model, and compare the fits?

¿Fue útil?

Solución

One way to do this would be to fit the decompositions with the same numbers of degrees of freedom and see which fits the best. It is convenient to do this using the tsibble and feasts packages as they allow for modelling many time series at once.

I've modified your example data so that it is possible to do a multiplicative decomposition -- having negative values in the data makes multiplicative decompositions problematic.

The multiplicative decomposition uses STL on the log data, and then exponentiates the trend and seasonal terms to put them back on the original scale.

Your example has no obvious seasonality so I have arbitrarily set the seasonal period to 12 for illustration purposes. Change it to whatever it should be.

I have set the trend window to be 99 and the seasonal component to be periodic. Again, change these to suit your actual data but the two fits should have the same values.

set.seed(123)
ID = factor(letters[seq(15)])
Time = c(1000,1200,1234,980,1300,1020,1180,1908,1303,
         1045,1373,1111,1097,1167,1423)
df <- data.frame(ID = rep(ID, Time), Time = sequence(Time))
df[paste0('Var', c(1:7))] <- abs(rnorm(sum(Time)))

library(tidyverse)
library(tsibble)
library(feasts)

# Create tsibble in long form
df <- df %>%
  pivot_longer(starts_with("Var"), names_to="Series", values_to="value") %>%
  as_tsibble(index=Time, key=c(ID,Series))
# Additive decompositions
additive <- df %>%
  model(add = STL(value ~ trend(window=99) + season("periodic", period=12))) %>%
  components()
# Multiplicative decompositions
multiplicative <- df %>%
  model(mult = STL(log(value) ~ trend(window=99) + season("periodic", period=12))) %>%
  components() %>%
  mutate(remainder = df$value - exp(trend+season_12))
# Find variance of remainders
rva <- additive %>%
  as_tibble() %>%
  group_by(ID, Series) %>%
  summarise(rv = var(remainder, na.rm=TRUE)) %>%
  ungroup()
rvm <- multiplicative %>%
  as_tibble() %>%
  group_by(ID, Series) %>%
  summarise(rv = var(remainder, na.rm=TRUE)) %>%
  ungroup()
# Which remainder has lowest variance?
left_join(rva, rvm, by = c("ID","Series")) %>%
  mutate(best = if_else(rv.x < rv.y, "additive", "multiplicative"))
#> # A tibble: 105 x 5
#>    ID    Series  rv.x  rv.y best    
#>    <fct> <chr>  <dbl> <dbl> <chr>   
#>  1 a     Var1   0.357 0.361 additive
#>  2 a     Var2   0.357 0.361 additive
#>  3 a     Var3   0.357 0.361 additive
#>  4 a     Var4   0.357 0.361 additive
#>  5 a     Var5   0.357 0.361 additive
#>  6 a     Var6   0.357 0.361 additive
#>  7 a     Var7   0.357 0.361 additive
#>  8 b     Var1   0.338 0.341 additive
#>  9 b     Var2   0.338 0.341 additive
#> 10 b     Var3   0.338 0.341 additive
#> # … with 95 more rows

Created on 2020-04-22 by the reprex package (v0.3.0)

Licenciado bajo: CC-BY-SA con atribución
scroll top