---
title: "Use with dplyr"
author: "Davis Vaughan"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Use with dplyr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Package motivation

`tibbletime` on its own has useful functions for manipulating time-based 
tibbles, but one of its most useful features is how nicely it plays with 
`dplyr`. Traditionally, performing grouped analysis over a time period with
`dplyr` (like quarterly / monthly summaries) is doable, but it could be easier
and typically requires use of the `lubridate` package along with the 
creation of multiple columns to group on. Below is an example of
a monthly summary from a daily dataset.

```{r, warning=FALSE, message=FALSE}
library(tibbletime)
library(dplyr)
library(lubridate)

series <- create_series('2013' ~ '2017', 'day', class = "Date") %>%
  mutate(var = rnorm(n()))

series

series %>%
  mutate(year = year(date), month = month(date)) %>%
  group_by(year, month) %>%
  summarise(mean_var = mean(var))
```

This gets more difficult the more granular you go. Getting 5-minute summaries
from minute or second data requires grouping on year, month, day, hour and minute 
columns.

## Index manipulation

With `tibbletime`, rather than creating new columns to group on, you 
manipulate your original date column into something that corresponds to 
the period you are summarising at. The `tibbletime` way to do this is with 
`collapse_by()`.

```{r}
series %>%
  collapse_by("month") %>%
  group_by(date) %>%
  summarise(mean_var = mean(var))
```

While `collapse_by()` directly manipulates the index column, the lower level `collapse_index()`
function can be used inside of a call to `mutate()` to modify the index column
and then save it as a new column. This can be useful if you don't want to lose
the original index column.

This works for more granular series too. Below we aggregate 5 second level data
up to hourly. This is working with a faily sizable ~19 million row data set.

```{r}
second_series <- create_series('2013' ~ '2015', '5 second')

second_series %>%
  mutate(var = rnorm(n())) %>%
  collapse_by("hour") %>%
  group_by(date) %>%
  summarise(mean_var = mean(var))
```

## Multiple series

One really powerful benefit of working in the `tidyverse` is being able to 
manipulate multiple series at once. Essentially we can create multiple layers
of groupings, one for the stocks we are working with (like Facebook and Apple), 
and one for the period you want to summarise your data at (daily, yearly, etc).

Below we use `create_series()` to create two dummy hourly price series, combine them,
and calculate the OHLC (Open, High, Low, Close) prices per day by first collapsing
to daily with `collapse_by()` to have something to group on.

```{r}
set.seed(123)

# Create price series of hourly movements for apple and facebook stock.
apple <- create_series('2014' ~ '2016', period = '1 hour') %>%
  mutate(price = 100 + cumsum(rnorm(n(), mean = 0, sd = .5)))

facebook <- create_series('2014' ~ '2016', period = '1 hour') %>%
  mutate(price = 150 + cumsum(rnorm(n(), mean = 0, sd = .5)))

# Bind them together and create a symbol column to group on
price_series <- bind_rows(list(apple = apple, facebook = facebook), .id = "symbol") %>%
  as_tbl_time(date) %>%
  group_by(symbol)

# Collapse to daily and transform to OHLC (Open, High, Low, Close), a 
# common financial transformation
price_series %>%
  collapse_by("day") %>%
  group_by(symbol, date) %>%
  summarise(
    open  = first(price),
    high  = max(price),
    low   = min(price),
    close = last(price)
  ) %>%
  slice(1:5)
```