How to ‘Fix’ Standard Deviations

Standard deviations are a popular and often useful measure of dispersion. To be sure, a standard deviation is merely the root-mean-square deviation from the mean, a single number that summarizes typical spread. It also doesn’t take into account the shape of the probability distribution (this is done better using, for example, entropy, which is a more versatile measure of dispersion).

Standard deviations, however, may be ‘fixed’ to take into account an interesting aspect of data, namely complexity. Let’s see an example. Say you have a system of 28 variables, some of which are independent (here taken to mean uncorrelated with the others), while others are not.

One computes the standard deviation of each variable and may then use it to measure volatility, dispersion, or some measure of risk or even performance. However, this simple exercise neglects one fundamental issue: complexity, i.e. the fact that variables are interdependent with other variables. See the complexity map below.

[Figure: complexity map of the 28 variables]

Variables 5 and 7, for example, are correlated with numerous others, while 3, 6 and 25 are uncorrelated. This is reflected in the Complexity Profile (or Complexity Spectrum) which ranks the complexity footprint of each variable in the system. This is illustrated below.

[Figure: Complexity Profile, ranking the complexity footprint of each variable]

Variable 7 has a footprint of just over 17% while 5 is responsible for nearly 15% of the complexity of the system.
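The text does not say how the footprints in the Complexity Profile are computed, so the sketch below is only a stand-in, not the method behind the figures quoted here. It assumes a simple proxy: each variable is scored by the sum of its absolute correlations with the other variables, counting only links at or above a cut-off, and the scores are normalized so that they sum to 1. The names `pearson`, `proxy_footprints` and `threshold`, and the sample data, are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def proxy_footprints(columns, threshold=0.5):
    """Hypothetical proxy for a complexity profile.

    A variable's score is the sum of |correlation| over all other variables
    whose |correlation| with it is at least `threshold`. Scores are then
    normalized to fractions that sum to 1 (all zeros if no link survives).
    """
    k = len(columns)
    scores = [0.0] * k
    for i in range(k):
        for j in range(i + 1, k):
            r = abs(pearson(columns[i], columns[j]))
            if r >= threshold:
                scores[i] += r
                scores[j] += r
    total = sum(scores)
    return [s / total if total else 0.0 for s in scores]


# Made-up data: the first two columns move together, the third does not.
columns = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [3, 1, 4, 1, 5],
]
footprints = proxy_footprints(columns)
```

With this made-up data the two linked variables share the whole footprint and the unlinked one gets none, which mirrors the contrast between variables 5 and 7 on the one hand and 3, 6 and 25 on the other.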

The question now is this: why not use the information in the Complexity Profile to ‘adjust’ standard deviations by adding a correction originating from complexity? A variable that is heavily correlated with others could be more ‘dangerous’ than an uncorrelated one, because its fluctuations do not occur in isolation.

One simple way to accomplish this is the following:

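Adjusted STD = (1 + Complexity contribution) × STD

where the complexity contribution is the variable’s footprint expressed as a fraction (e.g. 0.171 for 17.1%).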

Basically, variables that increase complexity see their standard deviations corrected (increased) by a complexity-based factor. The (ranked) result is illustrated below.

[Table: standard deviations and complexity-adjusted standard deviations, ranked]

The bar chart below shows the complexity-induced corrections of standard deviations.

[Figure: bar chart of the complexity-induced corrections of standard deviations]

For example, the standard deviation of the biggest complexity contributor – variable 7 – which is 3.81, is incremented by 17.1% (its complexity footprint) to yield a value of 4.46.
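The correction and the variable 7 figures can be reproduced with a few lines of Python. This is a minimal sketch, assuming the complexity contribution is supplied as a fraction. The function names are hypothetical, and the second variable (`var3`, with a standard deviation of 4.00 and a zero footprint) is invented purely for contrast.

```python
def adjusted_std(std, contribution):
    """Adjusted STD = (1 + complexity contribution) x STD.

    `contribution` is a fraction, e.g. 0.171 for a 17.1% footprint.
    """
    return (1.0 + contribution) * std


def rank_adjusted(stds, contributions):
    """Return (name, std, adjusted std) tuples, largest adjusted std first."""
    rows = [
        (name, s, adjusted_std(s, contributions.get(name, 0.0)))
        for name, s in stds.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# var7 uses the figures quoted in the text; var3 is a made-up uncorrelated
# variable (zero footprint) included only to show the effect on the ranking.
stds = {"var7": 3.81, "var3": 4.00}
contributions = {"var7": 0.171, "var3": 0.0}

ranking = rank_adjusted(stds, contributions)
for name, raw, adj in ranking:
    print(f"{name}: STD = {raw:.2f}, adjusted STD = {adj:.2f}")
```

In this toy case the uncorrelated variable has the larger raw standard deviation, yet after the correction variable 7 ranks first: the adjustment can reorder variables as well as inflate their dispersion.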

The above correction increases standard deviations, illustrating eloquently the concept of complexity-induced risk. Think of the impact of complexity in very large data sets when building a model. Remember, the most important things in a model are those it doesn’t contain.

Classical statistics may produce overly optimistic results if complexity is neglected. Every system has some degree of complexity, which is invisible to conventional analytics, so there is often more risk than one may think. The above method shows how one can incorporate, albeit superficially, complexity-induced risk into any calculation or procedure without disrupting it.