The fitting of a joint polynomial-trigonometric model is limited to ordinary
least squares (OLS), with autocorrelation analysis of OLS residuals up to a
certain lag. Orthogonal polynomials are used to model broad-scale trends,
whereas cosines and sines model the periodic structures at intermediate
scales. See Dutilleul (2011, section 6.5) and Legendre & Legendre (2012,
section 12.4.4) for details. OLS regression could be replaced by an
estimated generalized least squares (EGLS) procedure, as described in
Dutilleul (2011).
In spectral analysis in general and in mfpa in particular, the cosines and
sines are considered jointly in the search for the dominant frequency
components since they are both required to fully account for a frequency
component in a linear model. So, when either the cosine or the sine is
significant, this is sufficient indication that a significant frequency
component has been found. But see the first paragraph of the 'Recommendations
to users' below.
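Why the two terms belong together can be seen from a standard trigonometric identity: a cosine and a sine at the same frequency add up to a single shifted cosine, whose amplitude and phase depend on both coefficients. The sketch below is only an illustration of that identity; the function name and the coefficient values are hypothetical and are not part of mfpa or its output.

```python
import math


def amplitude_phase(a_cos, b_sin):
    """Combine the coefficients of a cos/sin pair into amplitude and phase.

    Uses the identity a*cos(x) + b*sin(x) == A*cos(x - phi),
    with A = sqrt(a^2 + b^2) and phi = atan2(b, a).
    """
    amplitude = math.hypot(a_cos, b_sin)
    phase = math.atan2(b_sin, a_cos)
    return amplitude, phase


# Hypothetical regression coefficients, not taken from any real analysis.
a, b = 3.0, 4.0
A, phi = amplitude_phase(a, b)

# Check the identity at a few arbitrary angles.
for x in (0.0, 0.7, 2.5):
    lhs = a * math.cos(x) + b * math.sin(x)
    rhs = A * math.cos(x - phi)
    assert abs(lhs - rhs) < 1e-12
```

A component with a small cosine coefficient can therefore still have a large amplitude if its sine coefficient is large, and vice versa.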
The periodic phenomenon corresponding to each identified frequency is modelled
by a cosine and a sine. The first pair ('cos 1', 'sin 1') corresponds to the
first frequency, the second pair to the second frequency, and so on. An
intercept is also computed, as well as a polynomial broad-scale trend if
argument ntrend > 0. The coefficients shown for each periodic component ('cos'
and 'sin') are the OLS regression coefficients. The tests of significance
producing the p-values (called 'prob' in the output file) are 2-tailed
parametric t-tests, as in standard OLS regression.
A global R-squared statistic for the periodogram is computed as the variance of
the fitted values divided by the variance of the data series. An R-squared
corresponding to each frequency is also returned.
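The variance-ratio definition can be checked on a toy example. The sketch below fits an intercept and a single cosine by OLS in closed form (a deliberate simplification: the full model would also include the sine and any trend terms) and compares the variance ratio with the more familiar 1 - SSE/SST. All data values and function names are made up for illustration.

```python
import math
from statistics import mean, pvariance


def r_squared(observed, fitted):
    """R-squared as variance of fitted values over variance of the data."""
    return pvariance(fitted) / pvariance(observed)


# Hypothetical series: 8 observations, one cycle, plus fixed "noise" values.
n = 8
x = [math.cos(2.0 * math.pi * t / n) for t in range(n)]
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, -0.2]
y = [2.0 + 3.0 * xi + ei for xi, ei in zip(x, noise)]

# Closed-form OLS for one regressor plus an intercept.
x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

# For an OLS fit with an intercept, both definitions agree.
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_bar) ** 2 for yi in y)
r2_ratio = r_squared(y, fitted)
r2_classic = 1.0 - sse / sst
```

The two values coincide only because the fit is OLS with an intercept; for other estimators the variance ratio and 1 - SSE/SST can differ.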
In the Dutilleul periodogram, the time unit is the length of the data series
(in time units: seconds, hours, days, etc.). Hence, the frequency
identified by a Dutilleul periodogram is the number of cycles of the periodic
signal (how many full or partial cycles) along the time series. That number is
an integer when the series contains an integer number of cycles; it is a
non-integer real number when the number of cycles is fractional. The periodogram can
identify several periodic phenomena with different frequencies. The estimated
frequencies can be divided by the length of the series, expressed in the chosen
unit, to produce numbers of cycles per second or day, or per meter or km,
depending on the study.
To find the period (number of days, hours, etc.) of the process
generating a periodic signal in the data, divide the length of the series (in
days, hours, etc.) by the frequency identified by Dutilleul's periodogram.
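As a small worked example of this conversion, the sketch below turns a frequency expressed in cycles per series into a period and into a rate per time unit. The function names and the numbers (a 60-day series with 2.5 cycles) are hypothetical.

```python
def period_from_frequency(series_length, frequency):
    """Period in the units of series_length, given cycles per series."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return series_length / frequency


def cycles_per_unit(series_length, frequency):
    """Number of cycles per time (or space) unit."""
    if series_length <= 0:
        raise ValueError("series_length must be positive")
    return frequency / series_length


# Hypothetical case: a 60-day series in which 2.5 cycles were identified.
period_days = period_from_frequency(60.0, 2.5)   # 24.0 days per cycle
rate_per_day = cycles_per_unit(60.0, 2.5)        # cycles per day
```

The two results are reciprocals of each other, so either one is enough to report the time scale of the process.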
Recommendations to users
The mfpa code estimates the periodic frequencies to
be included in the model through a combination of a stepwise procedure and
non-linear optimisation. Following that, the contributions of the 'cos' and
'sin' components of all frequencies in the model are estimated by multiple
linear regression in the presence of the intercept and trends (if any).
Because the mfpa method estimates fractional frequencies, the cos-sin
combinations are not orthogonal among the identified frequencies, and
unnecessary frequencies may be selected as 'significant'.
1. It is important that users of this periodogram have hypotheses in mind
about the frequencies of the processes that may be operating on the system
under study and the number of periodic components they are expecting to find.
If one asks for more components than the number of periodic phenomena at work
on the system, the 'real' frequency usually has a strong or fairly strong
R-squared and it is followed by other components with very small R-squared.
Selection of frequencies of interest should thus be based more on
examination of the R-squared values of the components than on the p-values.
For short series in particular, the adjusted R-squared is an unbiased estimate
of the proportion of the variance of the data explained by the model. Even
series of random numbers can produce 'significant' frequencies for periodic
components; the associated (adjusted) R-squared values will, however, be very
small.
2. Function mfpa cannot detect frequencies < 1 (less than one cycle in the
series) or larger than (n-1), where n is the number of observations in the
series, the latter case corresponding to periods smaller
than the interval between successive observations. When a periodic component
with such a period is present in the data, Dutilleul's periodogram can detect
harmonics of that frequency. Recommendation: when a frequency is detected that
does not seem to correspond to a hypothesized process, one could check, using
simulated data, if it could be produced by a process operating at a temporal
scale (period) smaller than the interval between successive observations. An
example is shown in Example 2.
3. When analysing a time series with unknown periodic structure, it is
recommended to try first with more than one frequency, say 2 or 3, and also
with a trend. Eliminate the non-significant components, step by step, in
successive runs, starting with the trend(s), then eliminate the weakly
significant periodic components, until there are only highly significant
components left in the model.
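The simulation check suggested in recommendation 2 can be sketched as follows. This illustrates aliasing in general, not the behaviour of mfpa itself: it assumes observations at times t = 0, 1, ..., n-1 and a frequency counted in cycles per series of length n, which may not match the time scaling used internally by mfpa. The function name and the values of n and the frequencies are hypothetical.

```python
import math


def sample_cosine(frequency, n):
    """Sample cos(2*pi*frequency*t/n) at the n observation times t = 0..n-1."""
    return [math.cos(2.0 * math.pi * frequency * t / n) for t in range(n)]


n = 24  # hypothetical number of observations

# A fast process: 27 cycles in the series, so its period (24/27 of a
# sampling interval) is shorter than the gap between observations.
fast = sample_cosine(n + 3, n)

# A slow process with 3 cycles in the series.
slow = sample_cosine(3, n)

# At the observation times the two signals are indistinguishable.
max_gap = max(abs(u - v) for u, v in zip(fast, slow))
assert max_gap < 1e-9
```

Under these assumptions, a detected frequency of 3 could equally come from a process completing 27 cycles over the series, which is why an unexpected frequency is worth checking against simulated fast processes.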