Degrees of freedom are effectively the number of observations in the testing set that are "free to vary". In trading strategies, following Pardo (2008, pp. 130-131), this typically means the total number of observations in the market data to be tested minus the number of observations used by indicators, signals, and rules.
degrees.of.freedom(strategy, portfolios = NULL, ...,
paramset.method = c("trial", "max", "sum"), env = .GlobalEnv,
verbose = TRUE)
strategy: an object of type 'strategy' or the name of a stored strategy to examine
portfolios: portfolios to examine for symbols to use for observations, default NULL, see Details.
...: any other passthru parameters
paramset.method: one of 'trial', 'max', or 'sum' to determine how to count, see Details.
env: environment to look in for market data, default .GlobalEnv
verbose: default TRUE
an object of type dof containing:
string name of the strategy
decision points in the strategy specification
degrees of freedom consumed by numeric arguments to indicators
degrees of freedom consumed by numeric parameter sets
total degrees of freedom consumed by strategy specification
character vector of portfolios examined for symbols
character vector of symbols taken from portfolios
named list by symbol, containing observations for each
total number of degrees of freedom collected from market data observations
total degrees of freedom
percent degrees of freedom, calculated as deg.freedom/mktdata.obs
call used when calling degrees.of.freedom
We start by removing one degree of freedom for each 'decision point' in the strategy, e.g. each indicator, signal process, rule, parameter set, parameter constraint. This is a conservative approach that recognizes that the more complex a strategy is, the higher the probability it may be overfit. So we treat added complexity as removing degrees of freedom in the analysis.
If the strategy does not contain parameter sets, then all numeric variables in the strategy will be considered as removing degrees of freedom. This is again slightly more conservative than a strict reading might suggest. For example, a method argument that could range from 1:6, if you choose 6, doesn't necessarily look at 6 market observations. On the other hand, a standard deviation parameter may actually look at *all* trailing observations, arguably using all the degrees of freedom, so only counting 2 degrees of freedom for the stddev arg in an indicator is being generous. We're trying to strike a reasonable balance, erring towards being moderately conservative.
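The counting described above can be sketched in base R. This is a minimal illustration only: the `spec` list is a hypothetical, simplified stand-in for a strategy specification and does not reproduce the layout of a real strategy object, and the names `decision.points`, `indicator.df`, and `consumed` are assumptions made for this example. It assumes that each decision point costs one degree of freedom and that each numeric indicator argument costs its own value.

```r
# Hypothetical, simplified strategy specification (not the real object layout).
spec <- list(
  indicators = list(
    list(name = "SMA",    arguments = list(n = 10)),
    list(name = "BBands", arguments = list(n = 20, sd = 2, maType = "SMA"))
  ),
  signals = list(list(name = "sigCrossover")),
  rules   = list(list(name = "ruleSignal"), list(name = "ruleSignal"))
)

# One degree of freedom per decision point (indicator, signal, rule).
decision.points <- length(spec$indicators) +
  length(spec$signals) +
  length(spec$rules)

# Keep only the numeric indicator arguments and add up their values,
# e.g. sd = 2 counts as 2 degrees of freedom; maType is ignored.
numeric.args <- unlist(lapply(spec$indicators, function(ind) {
  Filter(is.numeric, ind$arguments)
}))
indicator.df <- sum(numeric.args)

# Total consumed by this (parameter-set-free) specification.
consumed <- decision.points + indicator.df
```

With this toy specification there are 5 decision points and the numeric arguments sum to 32, so 37 degrees of freedom are treated as consumed.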
For strategies containing parameter sets, we will examine the parameter combinations after applying all parameter sets and constraints. If paramset.method=='max', we will take the paramset that has the highest total value, and remove that many degrees of freedom. If paramset.method=='sum', we will take an even more conservative view, and consider the sum of degrees of freedom consumed by all examined parameter set combinations. The default, paramset.method=='trial', falls in between, utilizing the same number of degrees of freedom as the 'max' paramset, and subtracting one additional degree of freedom for each trial recorded.
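To make the three counting methods concrete, here is a rough base R sketch. The helper `paramset_dof` and the `param.combos` grid are hypothetical names invented for this illustration, not part of the package. The sketch also assumes that the "total value" of a combination is the sum of its parameter values and that one trial is recorded per combination, which may not match how trials are actually recorded.

```r
# Hypothetical parameter combinations left after applying sets and constraints.
param.combos <- expand.grid(fast = c(5, 10), slow = c(20, 50))

# Assumed "total value" of each combination: the sum of its parameters.
combo.totals <- rowSums(param.combos)

# Assumption: one recorded trial per combination.
n.trials <- nrow(param.combos)

paramset_dof <- function(totals, n.trials,
                         method = c("trial", "max", "sum")) {
  method <- match.arg(method)
  switch(method,
    max   = max(totals),            # largest single combination
    sum   = sum(totals),            # every combination counted
    trial = max(totals) + n.trials  # 'max' plus one per trial
  )
}

dof.max   <- paramset_dof(combo.totals, n.trials, "max")
dof.sum   <- paramset_dof(combo.totals, n.trials, "sum")
dof.trial <- paramset_dof(combo.totals, n.trials)  # default is "trial"
```

For this grid the combination totals are 25, 30, 55, and 60, so 'max' removes 60 degrees of freedom, 'sum' removes 170, and the default 'trial' removes 64, which sits between the other two.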
Collecting observations for market data is easy for lower frequencies, and
may not be worth doing for higher frequencies. If the portfolios
argument is not NULL, this function will check the symbols list, and then
attempt to load market data from env
to count the number of observations.
With low frequency data, this is likely to work fine. With high frequency data,
it is quite possible that all the data will not be in memory at once, so the
number of observations counted will not contain all the data. In practice,
the user should be aware of this, and take appropriate action, such as doing
those calculations by hand. Also, as a practical matter, it may make very
little difference as a percentage of available degrees of freedom, since
the higher the frequency of the data, the smaller the percentage of the data
likely to be consumed by the strategy specification.
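The lookup described above might look roughly like the following base R sketch. It is not the function's actual implementation: `count_obs`, `data.env`, and the symbol names are hypothetical, plain data frames stand in for time series objects, and a symbol that is not loaded is simply counted as zero observations.

```r
# Hypothetical environment holding market data for two of three symbols.
data.env <- new.env()
assign("AAA", data.frame(Close = seq_len(250)), envir = data.env)
assign("BBB", data.frame(Close = seq_len(500)), envir = data.env)
symbols <- c("AAA", "BBB", "CCC")  # "CCC" is deliberately not loaded

# Count rows for each symbol found in env; missing symbols contribute 0.
count_obs <- function(symbols, env) {
  obs <- lapply(symbols, function(s) {
    if (exists(s, envir = env, inherits = FALSE)) {
      nrow(get(s, envir = env))
    } else {
      0L
    }
  })
  names(obs) <- symbols
  obs
}

obs.by.symbol <- count_obs(symbols, data.env)
total.obs <- sum(unlist(obs.by.symbol))
```

Here the total is 750 observations, and the zero recorded for the unloaded symbol shows how data that is not in memory silently drops out of the count.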
The % degrees of freedom may be considered as the percentage of the observations that may be used for other inference. Typically, a number greater than 95% will be desired. Raising the available degrees of freedom may be accomplished by multiple methods:
increase the length of market data consumed by the backtest
increase the number of symbols examined
decrease the number of free parameters
decrease the ranges of the parameters examined
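A small worked example shows how the first of these levers moves the percentage. The helper `pct_dof` is hypothetical; it assumes that the remaining degrees of freedom are the market data observations minus the degrees of freedom consumed by the strategy, and that the ratio is reported on a 0 to 100 scale. The input figures are made up for illustration.

```r
# Hypothetical helper: percent of observations left for other inference.
pct_dof <- function(mktdata.obs, consumed) {
  deg.freedom <- mktdata.obs - consumed  # assumed definition
  100 * deg.freedom / mktdata.obs
}

# Same strategy cost (64), two different lengths of market data.
short.history <- pct_dof(750, 64)   # about 91.5, below the 95% target
long.history  <- pct_dof(2500, 64)  # 97.44, above the 95% target
```

Holding the strategy specification fixed, lengthening the data from 750 to 2500 observations lifts the result from roughly 91.5% to 97.44%. Shrinking the consumed count by dropping or narrowing parameters has the same kind of effect from the other direction.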
Pardo, Robert. The Evaluation and Optimization of Trading Strategies. John Wiley & Sons. 2nd Ed., 2008.