degrees.of.freedom: calculate degrees of freedom used by a strategy and available from test data

Description

Degrees of freedom are effectively the number of observations in the testing set which are "free to vary". In trading strategies, following Pardo (2008, pp. 130-131), this typically means the total number of observations in the market data to be tested minus the number of observations used by indicators, signals, and rules.

Usage

degrees.of.freedom(strategy, portfolios = NULL, ...,
  paramset.method = c("trial", "max", "sum"), env = .GlobalEnv,
  verbose = TRUE)

Arguments

strategy

an object of type 'strategy' or the name of a stored strategy to examine

portfolios

portfolios to examine for symbols to use for observations, default NULL, see Details.

...

any other pass-through parameters

paramset.method

one of 'trial', 'max', or 'sum', determining how degrees of freedom consumed by parameter sets are counted; default 'trial', see Details.

env

environment to look in for market data, default .GlobalEnv

verbose

logical, default TRUE

Value

an object of type dof containing:

strategy: string name of the strategy
dp: decision points in the strategy specification
idf: degrees of freedom consumed by numeric arguments to indicators
psdf: degrees of freedom consumed by numeric parameter sets
strategy.dfc: total degrees of freedom consumed by strategy specification
portfolios: character vector of portfolios examined for symbols
symbols: character vector of symbols taken from portfolios
symbol.obs: named list by symbol, containing observations for each
mktdata.obs: total number of degrees of freedom collected from market data observations
deg.freedom: total degrees of freedom
pct.deg.freedom: percent degrees of freedom, calculated as deg.freedom/mktdata.obs
call: call used when calling degrees.of.freedom

Details

We start by removing one degree of freedom for each 'decision point' in the strategy, e.g. each indicator, signal process, rule, parameter set, and parameter constraint. This is a conservative approach that recognizes that the more complex a strategy is, the higher the probability that it is overfit. So we treat added complexity as removing degrees of freedom in the analysis.
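The counting rule above can be sketched in base R. This is only an illustration: `toy_strategy` is a hypothetical plain list standing in for a strategy specification, not the actual quantstrat 'strategy' object, and `count_decision_points` is a made-up helper rather than the function's internal code.

```r
# Hypothetical stand-in for a strategy specification (assumption: a plain
# named list, NOT the real quantstrat 'strategy' object).
toy_strategy <- list(
  indicators  = list(fast_ma = list(n = 10), slow_ma = list(n = 50)),
  signals     = list(cross_up = list(), cross_down = list()),
  rules       = list(enter_long = list(), exit_long = list()),
  paramsets   = list(),
  constraints = list()
)

# One degree of freedom per decision point: count the entries held in
# each component of the specification and add them up.
count_decision_points <- function(spec) {
  sum(vapply(spec, length, integer(1)))
}

dp <- count_decision_points(toy_strategy)
dp
```

Adding a third indicator or another rule to the list raises `dp` by one, which is the sense in which added complexity removes degrees of freedom.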

If the strategy does not contain parameter sets, then all numeric variables in the strategy will be considered as removing degrees of freedom. This is again slightly more conservative than a strict reading might suggest. For example, a method argument that can range over 1:6 does not necessarily look at 6 market observations when it is set to 6. On the other hand, a standard deviation parameter may actually look at *all* trailing observations, arguably using all the degrees of freedom, so counting only 2 degrees of freedom for the stddev argument in an indicator is generous. We're trying to strike a reasonable balance, erring towards being moderately conservative.
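A minimal sketch of this numeric-argument count, assuming each numeric argument value is charged at face value as the paragraph describes. The `toy_indicators` list and the `numeric_arg_total` helper are hypothetical and only mimic the shape of an indicator specification.

```r
# Hypothetical indicator specifications; names and layout are illustrative.
toy_indicators <- list(
  bbands = list(name = "BBands", arguments = list(n = 20, sd = 2, maType = "SMA")),
  rsi    = list(name = "RSI",    arguments = list(n = 14))
)

# Sum every numeric argument value, skipping non-numeric ones such as maType.
numeric_arg_total <- function(indicators) {
  per_indicator <- vapply(indicators, function(ind) {
    nums <- Filter(is.numeric, ind$arguments)
    as.numeric(sum(unlist(nums)))
  }, numeric(1))
  sum(per_indicator)
}

idf <- numeric_arg_total(toy_indicators)
idf
```

Here the standard deviation argument contributes only 2 to the total, even though the underlying calculation may touch every trailing observation, which is the generosity the paragraph refers to.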

For strategies containing parameter sets, we will examine the parameter combinations after applying all parameter sets and constraints. If paramset.method=='max', we will take the paramset that has the highest total value, and remove that many degrees of freedom. If paramset.method=='sum', we will take an even more conservative view, and consider the sum of degrees of freedom consumed by all examined parameter set combinations. The default, paramset.method=='trial', falls in between, utilizing the same number of degrees of freedom as the 'max' paramset, and subtracting one additional degree of freedom for each trial recorded.
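The three counting methods can be compared on a small grid. This sketch follows one reading of the paragraph above and is not the function's internal code: `combos` is a hypothetical set of parameter combinations, `paramset_dfc` is a made-up helper, and it is assumed that each remaining combination counts as one recorded trial.

```r
# Hypothetical parameter combinations left after constraints are applied;
# each row is treated as one trial (an assumption of this sketch).
combos <- expand.grid(fast_n = c(5, 10), slow_n = c(20, 50))

paramset_dfc <- function(combos, method = c("trial", "max", "sum")) {
  method <- match.arg(method)
  totals <- rowSums(combos)  # total of the numeric values in each combination
  switch(method,
    max   = max(totals),                 # the single largest combination
    sum   = sum(totals),                 # every combination, added together
    trial = max(totals) + nrow(combos)   # 'max' plus one per trial
  )
}

dfc_max   <- paramset_dfc(combos, "max")
dfc_trial <- paramset_dfc(combos, "trial")
dfc_sum   <- paramset_dfc(combos, "sum")
c(max = dfc_max, trial = dfc_trial, sum = dfc_sum)
```

On this grid the 'trial' count sits between the 'max' and 'sum' counts, matching the ordering described above.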

Collecting observations for market data is easy for lower frequencies, and may not be worth doing for higher frequencies. If the portfolios argument is not NULL, this function will check the symbols list, and then attempt to load market data from env to count the number of observations. With low frequency data, this is likely to work fine. With high frequency data, it is quite possible that not all of the data will be in memory at once, so the number of observations counted will understate the data actually tested. The user should be aware of this and take appropriate action, such as doing those calculations by hand. As a practical matter, it may make very little difference as a percentage of available degrees of freedom, since the higher the frequency of the data, the smaller the percentage of the data likely to be consumed by the strategy specification.
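Counting observations by hand, as suggested above, might look like the following sketch. The symbol names, the `mkt_env` environment, and the `count_obs` helper are all hypothetical, and plain matrices stand in for the time series objects that would normally hold market data.

```r
# Stand-in market data: plain matrices in a dedicated environment
# (assumption: real data would be time series objects in 'env').
mkt_env <- new.env()
assign("AAA", matrix(0, nrow = 2520, ncol = 4), envir = mkt_env)
assign("BBB", matrix(0, nrow = 1260, ncol = 4), envir = mkt_env)

# Count rows per symbol; symbols that are not loaded are reported as NA.
count_obs <- function(symbols, env) {
  obs <- lapply(symbols, function(s) {
    if (exists(s, envir = env, inherits = FALSE)) {
      NROW(get(s, envir = env))
    } else {
      NA_integer_
    }
  })
  names(obs) <- symbols
  obs
}

symbol_obs  <- count_obs(c("AAA", "BBB", "CCC"), mkt_env)
mktdata_obs <- sum(unlist(symbol_obs), na.rm = TRUE)
mktdata_obs
```

A symbol that is missing from the environment, like "CCC" here, contributes nothing to the total, which is how data that is not in memory leads to an undercount.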

The percent degrees of freedom (pct.deg.freedom) may be considered as the percentage of the observations that may be used for other inference. Typically, a number greater than 95% will be desired. The available degrees of freedom may be raised in several ways:

increase the length of market data consumed by the backtest
increase the number of symbols examined
decrease the number of free parameters
decrease the ranges of the parameters examined
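The effect of the first remedy in the list can be checked with simple arithmetic. The sketch below uses the relationship stated under Value (deg.freedom/mktdata.obs); the observation counts, the consumed total of 70, the scaling by 100, and the `pct_deg_freedom` helper are illustrative assumptions.

```r
# Percent of observations left over once the strategy's consumption is removed.
# Assumption: the ratio is scaled by 100 to express it as a percentage.
pct_deg_freedom <- function(mktdata_obs, strategy_dfc) {
  deg_freedom <- mktdata_obs - strategy_dfc
  100 * deg_freedom / mktdata_obs
}

# Same hypothetical strategy (70 degrees of freedom consumed), two data lengths.
short_test <- pct_deg_freedom(mktdata_obs = 500,  strategy_dfc = 70)
long_test  <- pct_deg_freedom(mktdata_obs = 2520, strategy_dfc = 70)

c(short = short_test, long = long_test)
```

With these made-up numbers, the shorter test falls below the 95% guideline while the longer one clears it; reducing the consumed total has the same kind of effect on the ratio.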

References

Pardo, Robert. The Evaluation and Optimization of Trading Strategies. John Wiley & Sons. 2nd Ed., 2008.