Often, when we want to do an analysis, we depend on multiple timeseries to inform our understanding of whatever system we are trying to analyze. We talk about "predictor variables" and "leading indicators". But this is a flattened, two-dimensional view of what's actually going on. Timeseries are just data shadows of real systems and processes, and reasoning about the real world in terms of timeseries alone is often analogous to how the prisoners in Plato's Cave reason about their world.
Everyone loves books, but I think they leave a lot to be desired, generally because I can't be bothered to read most of them.
As a worker in the field offices of a large distributed workforce assembled by venture capitalists, I usually receive equity as part of my compensation package. Of late, equity has been getting a bit of a bum rap, and rightfully so. I want to explore why, and what a truly equitable, sustainable corporate structure might look like for founders, investors, and employees alike.
There's a lot of talk these days about age bias among Silicon Valley VCs. Luckily, there's no need for narratives when we've got Crunchbase at our disposal!
Analyzing Citibike Usage
March 13, 2014
After reading a rogue tweet by dataist-in-chief Chris Wiggins, I figured I'd finally dig into the reams of Citibike usage data I've been collecting for the better part of a year. He claims it's finally Citibike weather, and to that I say, show me the data!
Bucketing Event Data in R
March 10, 2014
I often find myself dealing with event data that arrive at irregular intervals. To do anything useful with them, I generally need to turn them into a proper univariate timeseries bucketed over a regular interval. In the interest of me-never-forgetting-how-to-do-this-again, I'm writing the technique down.
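The post's R code isn't part of this excerpt, so what follows is only a sketch of the general idea in Python, not the author's technique: count irregular event timestamps into fixed-width buckets and emit explicit zeros for empty buckets, so the result is a regular timeseries. The function name `bucket_events` and its parameters are hypothetical, and buckets are aligned to `start` (by default the earliest event) rather than to clock boundaries.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_events(timestamps, interval, start=None, end=None):
    """Hypothetical helper: count events per fixed-width bucket.

    Returns a list of (bucket_start, count) pairs covering every bucket
    from `start` to `end`, including buckets with zero events.
    """
    if not timestamps:
        return []
    if start is None:
        start = min(timestamps)  # assumption: align buckets to the first event
    if end is None:
        end = max(timestamps)
    # timedelta // timedelta gives an int: the bucket index relative to `start`.
    counts = Counter(
        (t - start) // interval for t in timestamps if start <= t <= end
    )
    n_buckets = (end - start) // interval + 1
    # Fill every bucket, so gaps show up as explicit zeros.
    return [(start + i * interval, counts[i]) for i in range(n_buckets)]


events = [
    datetime(2014, 3, 10, 9, 0, 5),
    datetime(2014, 3, 10, 9, 0, 40),
    datetime(2014, 3, 10, 9, 3, 0),
]
series = bucket_events(events, timedelta(minutes=1), start=datetime(2014, 3, 10, 9, 0))
# counts per minute: 9:00 -> 2, 9:01 -> 0, 9:02 -> 0, 9:03 -> 1
```

Emitting the empty buckets matters: most timeseries tooling assumes evenly spaced observations, and silently dropping zero-count intervals would distort any rate or rolling statistic computed afterwards.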
When we want to think about anomaly detection in timeseries (in a web operations context, at least), we want to be sure we're thinking about detecting shifts in probability density functions, as opposed to simple outliers. This is because spiky outliers of one or two datapoints are rarely actionable, so alerting on them is silly. It's useful to know about spikes, because they still represent systemic processes that may be detrimental to our infrastructure, but they usually don't represent _fundamental_ shifts in the underlying processes that need to be dealt with immediately. Instead, they are simply very rare events within an otherwise normal system.
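One way to make "a shift in the density" concrete, and a technique substituted here rather than the method described in the post, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDF of a recent window and that of the window just before it. A lone spike barely moves the empirical CDF, while a sustained level shift moves it a lot. The helper names `ks_statistic` and `density_shifted`, the window size, and the 0.5 threshold are hypothetical choices for illustration.

```python
from bisect import bisect_right


def ks_statistic(reference, recent):
    """Two-sample Kolmogorov-Smirnov statistic (largest gap between empirical CDFs)."""
    a, b = sorted(reference), sorted(recent)
    n, m = len(a), len(b)
    d = 0.0
    # The supremum of |F_a - F_b| is reached at one of the sample points,
    # so evaluating both right-continuous ECDFs there is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / n
        cdf_b = bisect_right(b, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    return d


def density_shifted(series, window, threshold=0.5):
    """Hypothetical alert rule: compare the last `window` points
    against the `window` points immediately before them."""
    reference = series[-2 * window:-window]
    recent = series[-window:]
    return ks_statistic(reference, recent) > threshold


baseline = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11]
spiky = baseline + baseline[:-1] + [100]          # one outlier at the end
shifted = baseline + [x + 5 for x in baseline]    # sustained level shift

print(density_shifted(spiky, window=10))    # False: a single spike barely moves the ECDF
print(density_shifted(shifted, window=10))  # True: the whole distribution moved
```

The threshold here is arbitrary; in practice it would be tuned against historical data, or replaced by a p-value from a library implementation of the KS test.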
The first thing to note is that governments are inefficient. This is a structural, systemic problem, and I don’t believe anyone can engineer a hypothetical government with the proper incentive structures to avoid inefficiencies in the institutions it creates. In general, there seems to be an inverse relationship between the size of an organization and its efficiency, where our spectrum runs the gamut from big government all the way down to personal finance. If individuals can most efficiently use the resources they are allotted, it stands to reason that individuals in aggregate ought to be allotted as much of a nation’s wealth as possible, and a government ought to be allotted as little of it as necessary.