Bucketing Event Data in R
March 10, 2014

I often find myself dealing with event data over irregular intervals. I generally need to turn these data into a proper univariate timeseries bucketed over a regular interval in order to do anything useful with them. In the interest of me-never-forgetting-how-to-do-this-again, here’s the technique:

# Import our libraries:
library(lubridate)
library(xts)

# Get the data:
events <- read.csv("http://abe.is/full/of/data/events.csv")

# It's not very helpful yet, see? Just raw timestamp strings:
head(events$date, 4)

[1] "2014-01-25 12:40:00 EST" "2014-01-25 12:45:00 EST"
[3] "2014-01-25 13:17:00 EST" "2014-01-25 13:18:00 EST"

# So, let's first convert the dates:
events <- as.POSIXct(events$date)

# Round them down to the start of the hour (floor_date always rounds down):
events <- floor_date(events, "hour")

# Then, create a vector of ones, one per event:
ones <- rep(1, length(events))

# Turn it into an xts timeseries:
ts <- xts(ones, order.by=events)

# And bucket! Sum the ones within each hour to get a count of events:
ts <- period.apply(ts, endpoints(ts, "hours"), sum)

# You get a nice lovely series that looks like this:
head(ts)
                    [,1]
2014-01-25 12:00:00    2
2014-01-25 13:00:00    7
2014-01-25 14:00:00    7
2014-01-25 15:00:00    4
2014-01-25 16:00:00    3
2014-01-25 17:00:00    3

Much awesome! We’ve managed to take a bunch of timestamps, which were our events, and turn them into a timeseries of event counts bucketed by hour. You can bucket by whatever interval you want, though: just change “hour” in floor_date (and the matching “hours” in endpoints) to “second”, “minute”, “day”, etc. Now you can plot it, analyze it, forecast it, and otherwise go on your merry way.
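
One thing to watch for: an hour with no events never shows up in the series, because there is no timestamp in it to count. If you need a truly regular series, here is a minimal sketch of one way to fill the gaps with zeros. The timestamps are made up for illustration (they are not from events.csv), and the names stamps, counts, all_hours, and filled are placeholders.

```r
library(lubridate)
library(xts)

# Hypothetical timestamps standing in for events$date (not the real data).
# Note that nothing happens between 14:00 and 15:00.
stamps <- as.POSIXct(c("2014-01-25 12:40:00", "2014-01-25 12:45:00",
                       "2014-01-25 13:17:00", "2014-01-25 15:02:00"),
                     tz = "UTC")

# Bucket by hour, same steps as above.
hourly <- floor_date(stamps, "hour")
counts <- xts(rep(1, length(hourly)), order.by = hourly)
counts <- period.apply(counts, endpoints(counts, "hours"), sum)

# Build every hour between the first and last bucket, merge it in as an
# empty series, and replace the resulting NAs with zero.
all_hours <- seq(start(counts), end(counts), by = "hour")
filled <- merge(counts, xts(order.by = all_hours))
filled[is.na(filled)] <- 0

print(filled)
```

Here the 14:00 bucket, which had no events, comes back with a count of 0 instead of being skipped.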