--- title: "Rolling calculations in tibbletime" author: "Davis Vaughan" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 vignette: > %\VignetteIndexEntry{Rolling calculations in tibbletime} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introducing rollify() A common task in financial analyses is to perform a rolling calculation. This might be a single value like a rolling mean or standard deviation, or it might be more complicated like a rolling linear regression. To account for this flexibility, `tibbletime` has the `rollify()` function. This function allows you to turn _any_ function into a rolling version of itself. In the `tidyverse`, this type of function is known as an _adverb_ because it _modifies_ an existing function, which are typically given _verb_ names. ## Datasets required ```{r, message=FALSE, warning=FALSE} library(tibbletime) library(dplyr) library(tidyr) # Facebook stock prices. data(FB) # Only a few columns FB <- select(FB, symbol, date, open, close, adjusted) ``` ## A rolling average To calculate a rolling average, picture a column in a data frame where you take the average of the values in rows 1-5, then in rows 2-6, then in 3-7, and so on until you reach the end of the dataset. This type of 5-period moving window is a rolling calculation, and is often used to smooth out noise in a dataset. Let's see how to do this with `rollify()`. ```{r} # The function to use at each step is `mean`. # The window size is 5 rolling_mean <- rollify(mean, window = 5) rolling_mean ``` We now have a rolling version of the function, `mean()`. You use it in a similar way to how you might use `mean()`. ```{r} mutate(FB, mean_5 = rolling_mean(adjusted)) ``` You can create multiple versions of the rolling function if you need to calculate the mean at multiple window lengths. ```{r} rolling_mean_2 <- rollify(mean, window = 2) rolling_mean_3 <- rollify(mean, window = 3) rolling_mean_4 <- rollify(mean, window = 4) FB %>% mutate( rm10 = rolling_mean_2(adjusted), rm20 = rolling_mean_3(adjusted), rm30 = rolling_mean_4(adjusted) ) ``` ## Purrr functional syntax `rollify()` is built using pieces from the `purrr` package. One of those is the ability to accept an anonymous function using the `~` function syntax. The documentation, `?rollify`, gives a thorough walkthrough of the different forms you can pass to `rollify()`, but let's see a few more examples. ```{r} # Rolling mean, but with function syntax rolling_mean <- rollify(.f = ~mean(.x), window = 5) mutate(FB, mean_5 = rolling_mean(adjusted)) ``` You can create anonymous functions (functions without a name) on the fly. ```{r} # 5 period average of 2 columns (open and close) rolling_avg_sum <- rollify(~ mean(.x + .y), window = 5) mutate(FB, avg_sum = rolling_avg_sum(open, close)) ``` ## Optional arguments To pass optional arguments (not `.x` or `.y`) to your rolling function, they must be specified in the non-rolling form in the call to `rollify()`. For instance, say our dataset had `NA` values, but we still wanted to calculate an average. We need to specify `na.rm = TRUE` as an argument to `mean()`. ```{r} FB$adjusted[1] <- NA # Do this rolling_mean_na <- rollify(~mean(.x, na.rm = TRUE), window = 5) FB %>% mutate(mean_na = rolling_mean_na(adjusted)) # Don't try this! # rolling_mean_na <- rollify(~mean(.x), window = 5) # FB %>% mutate(mean_na = rolling_mean_na(adjusted, na.rm = TRUE)) # Reset FB data(FB) FB <- select(FB, symbol, date, adjusted) ``` ## Returning more than 1 value per call Say our rolling function returned a call to a custom `summary_df()` function. This function calculates a 5 number number summary and returns it as a tidy data frame. We won't be able to use the rolling version of this out of the box. `dplyr::mutate()` will complain that an incorrect number of values were returned since `rollify()` attempts to unlist at each call. Essentially, each call would be returning 5 values instead of 1. What we need is to be able to create a list-column. To do this, specify `unlist = FALSE` in the call to `rollify()`. ```{r} # Our data frame summary summary_df <- function(x) { data.frame( rolled_summary_type = c("mean", "sd", "min", "max", "median"), rolled_summary_val = c(mean(x), sd(x), min(x), max(x), median(x)) ) } # A rolling version, with unlist = FALSE rolling_summary <- rollify(~summary_df(.x), window = 5, unlist = FALSE) FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted)) FB_summarised ``` The neat thing is that after removing the `NA` values at the beginning, the list-column can be unnested using `tidyr::unnest()` giving us a nice tidy 5-period rolling summary. ```{r} FB_summarised %>% filter(!is.na(summary_list_col)) %>% unnest(cols = summary_list_col) ``` ## Custom missing values The last example was a little clunky because to unnest we had to remove the first few missing rows manually. If those missing values were empty data frames then `unnest()` would have known how to handle them. Luckily, the `na_value` argument will allow us to specify a value to fill the `NA` spots at the beginning of the roll. ```{r} rolling_summary <- rollify(~summary_df(.x), window = 5, unlist = FALSE, na_value = data.frame()) FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted)) FB_summarised ``` Now unnesting directly: ```{r} FB_summarised %>% unnest(cols = summary_list_col) ``` Finally, if you want to actually keep those first few NA rows in the unnest, you can pass a data frame that is initialized with the same column names as the rest of the values. ```{r} rolling_summary <- rollify(~summary_df(.x), window = 5, unlist = FALSE, na_value = data.frame(rolled_summary_type = NA, rolled_summary_val = NA)) FB_summarised <- mutate(FB, summary_list_col = rolling_summary(adjusted)) FB_summarised %>% unnest(cols = summary_list_col) ``` ## Rolling regressions A final use of this flexible function is to calculate rolling regressions. A very ficticious example is to perform a rolling regression on the `FB` dataset of the form `close ~ high + low + volume`. Notice that we have 4 columns to pass here. This is more complicated than a `.x` and `.y` example, but have no fear. The arguments can be specified in order as `..1`, `..2`, ... for as far as is required, or you can pass a freshly created anonymous function. The latter is what we will do so we can preserve the names of the variables in the regression. Again, since this returns a linear model object, we will specify `unlist = FALSE`. Unfortunately there is no easy default NA value to pass here. ```{r} # Reset FB data(FB) rolling_lm <- rollify(.f = function(close, high, low, volume) { lm(close ~ high + low + volume) }, window = 5, unlist = FALSE) FB_reg <- mutate(FB, roll_lm = rolling_lm(close, high, low, volume)) FB_reg ``` To get some useful information about the regressions, we will use `broom::tidy()` and apply it to each regression using a `mutate() + map()` combination. ```{r} FB_reg %>% filter(!is.na(roll_lm)) %>% mutate(tidied = purrr::map(roll_lm, broom::tidy)) %>% unnest(tidied) %>% select(symbol, date, term, estimate, std.error, statistic, p.value) ```