Plato and the Problem with Timeseries

May 2, 2014

Often, when we want to do an analysis, we depend on multiple timeseries to inform our understanding of whatever system we are trying to analyze. We talk about “predictor variables” and “leading indicators”. But this is literally a two-dimensional view of what’s actually going on. Timeseries are just data shadows of real systems and processes, and reasoning about the real world in terms of timeseries alone is often analogous to how the prisoners in Plato’s Cave reason about their world.

There must be better ways to represent complex systems than through bundles of timeseries. I’m thinking specifically of systems like economies, or IT infrastructures, or the human brain. It is obviously possible to develop behavioral models based on timeseries for different parts of all these systems, but these models rest on increasingly complex, heavily parametrized mathematical handwaving rather than on true characterizations of the underlying processes. This makes them extremely susceptible to black swan events: events that could not possibly be predicted using ordinary timeseries analysis techniques.

How should economies, infrastructures, brains, and other systems actually be modeled, monitored, diagnosed, and predicted? I would guess through a network graph: a set of nodes and edges. Thinking “this country trades with that country in this way” is a lot more informative than “historically, the price of widgets in this country has been a leading indicator for the production of gadgets in that country”. For example, if something that widgets depend on, like wompies, suddenly becomes unavailable, then widgets, and the gadgets that depend on them, will also become unavailable. So now we realize that there has been a link between wompies and gadgets the entire time, but there’s no way we could have known that without a proper networked graph of the dependencies.
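
To make the wompies example concrete, here is a minimal sketch in Python. The graph shape (wompies feed widgets, widgets feed gadgets) is my own assumption for illustration, as are the names `DEPENDENTS` and `affected_by`. Real dependency networks would be far larger and messier.

```python
from collections import deque

# Hypothetical dependency graph: each key is needed by the items in its list.
# The item names come from the example above; the edges are assumed.
DEPENDENTS = {
    "wompies": ["widgets"],
    "widgets": ["gadgets"],
    "gadgets": [],
}


def affected_by(outage, dependents):
    """Return every item that becomes unavailable when `outage` does."""
    seen = {outage}
    queue = deque([outage])
    # Breadth-first walk along the "is needed by" edges.
    while queue:
        item = queue.popleft()
        for downstream in dependents.get(item, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen - {outage}


print(sorted(affected_by("wompies", DEPENDENTS)))
```

With the graph written down, the indirect link between wompies and gadgets falls out of a simple traversal. No amount of staring at the widget and gadget price histories alone would reveal it.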

The problem is that developing accurate network models for complex systems does not seem tractable. How would you model the global economy strictly in terms of its networked components? It’s impossible to do it by hand.

We need some way to automatically derive networks or other graphical models from timeseries. If we were able to develop a robust networked model of a system over time, using nothing but readily available timeseries, we could use this model to inform diagnoses, forecasts, and simulations far more reliably. Intuitively, causality inferred from a network structure which directly models the underlying system seems vastly superior to causality inferred from pairwise correlations of relatively uninformative timeseries.