.addFormants: Add formants per sound

Description

Internal soundgen function called by addFormants

Usage

.addFormants(
  audio,
  formants,
  spectralEnvelope = NULL,
  zFun = NULL,
  action = c("add", "remove")[1],
  vocalTract = NA,
  formantDep = 1,
  formantDepStoch = 1,
  formantWidth = 1,
  formantCeiling = 2,
  lipRad = 6,
  noseRad = 4,
  mouthOpenThres = 0,
  mouth = NA,
  temperature = 0.025,
  formDrift = 0.3,
  formDisp = 0.2,
  smoothing = list(),
  windowLength_points = 800,
  overlap = 75,
  normalize = TRUE,
  play = FALSE,
  ...
)

Arguments

audio

a list returned by readAudio

formants

either a character string like "aaui" referring to default presets for speaker "M1" or a list of formant times, frequencies, amplitudes, and bandwidths (see ex. below). formants = NA defaults to schwa. Time stamps for formants and mouth can be specified in ms or on any other arbitrary scale. See getSpectralEnvelope for more details.
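
A list specification of the kind described above might look like the sketch below. The element names (time, freq, amp, width) are an assumption inferred from "times, frequencies, amplitudes, and bandwidths" and should be checked against getSpectralEnvelope; the numeric values are made up for illustration.

```r
# Hypothetical formant specification: two formants that move over time.
# Time stamps are on an arbitrary 0-to-1 scale rather than in ms.
formants <- list(
  f1 = list(time = c(0, 1), freq = c(860, 530), amp = 30, width = 120),
  f2 = list(time = c(0, 1), freq = c(1280, 2400), amp = 30, width = 120)
)

# Inspect the nested structure
str(formants)
```

Scalar amp and width values here are meant to stay constant while freq changes between the two time anchors.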

spectralEnvelope

(optional): as an alternative to specifying formant frequencies, we can provide the exact filter - a vector of non-negative numbers specifying the power in each frequency bin on a linear scale (interpolated to length equal to windowLength_points/2). A matrix specifying the filter for each STFT step is also accepted. The easiest way to create this matrix is to call soundgen:::getSpectralEnvelope or to use the spectrum of a recorded sound
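
The interpolation to windowLength_points/2 bins can be pictured in base R. This is a minimal sketch that assumes plain linear interpolation with approx(); the resampling actually used internally may differ, and coarse_filter is a made-up example.

```r
# Hypothetical coarse filter: power in 5 frequency bands, linear scale
coarse_filter <- c(1, 4, 2, 0.5, 0.1)
windowLength_points <- 800
n_bins <- windowLength_points / 2   # one value per frequency bin

# Assumption: linear interpolation to n_bins equally spaced points
spectralEnvelope <- approx(
  x = seq_along(coarse_filter),
  y = coarse_filter,
  n = n_bins
)$y

length(spectralEnvelope)      # number of bins after interpolation
all(spectralEnvelope >= 0)    # the filter must stay non-negative
```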

zFun

(optional) an arbitrary function to apply to the spectrogram prior to iSTFT, where "z" is the spectrogram - a matrix of complex values (see examples)
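
To illustrate what such a function could look like, the sketch below applies a hypothetical zFun to a toy complex matrix. It assumes one row per frequency bin and one column per STFT frame; that orientation, and the function name lowpass_half, are assumptions rather than documented behavior.

```r
# Toy "spectrogram": 4 frequency bins x 3 frames of complex values
set.seed(1)
z <- matrix(complex(real = rnorm(12), imaginary = rnorm(12)), nrow = 4)

# Hypothetical zFun: silence the upper half of the frequency bins
lowpass_half <- function(z, ...) {
  n <- nrow(z)
  z[(n %/% 2 + 1):n, ] <- 0
  z
}

z_mod <- lowpass_half(z)
Mod(z_mod)   # magnitudes: the last two rows are now zero
```

A function of this shape returns a matrix with the same dimensions and complex type as its input, which is what a later inverse STFT step would need.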

action

'add' = add formants to the sound, 'remove' = remove formants (inverse filtering)
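
The relation between the two actions can be sketched on a single toy frame. This assumes that filtering amounts to multiplying the spectrum by the filter and that inverse filtering divides by it; the values are invented, and the real implementation works on a full STFT.

```r
# Toy magnitude spectrum of one STFT frame (4 frequency bins)
frame <- c(1, 2, 3, 4)
# Hypothetical filter with a peak (a "formant") in the second bin
envelope <- c(1, 5, 1, 0.5)

added   <- frame * envelope   # in the spirit of action = "add"
removed <- added / envelope   # in the spirit of action = "remove"

all.equal(removed, frame)     # inverse filtering undoes the filter
```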

vocalTract

the length of vocal tract, cm. Used for calculating formant dispersion (for adding extra formants) and formant transitions as the mouth opens and closes. If NULL or NA, the length is estimated based on specified formant frequencies, if any (anchor format)
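
The link between vocal tract length and formant spacing can be sketched with the standard model of a uniform tube closed at one end. The speed of sound used here (35400 cm/s) and the 17.5 cm tract are assumed values for illustration, not settings confirmed by this page.

```r
speedSound <- 35400   # cm/s, assumed value for warm, humid air
vocalTract <- 17.5    # cm, hypothetical vocal tract length

# Resonances of a uniform tube closed at one end:
# F_n = (2n - 1) * c / (4 * L)
n <- 1:4
formant_freqs <- (2 * n - 1) * speedSound / (4 * vocalTract)

# Formant dispersion: spacing between neighbouring formants, Hz
dispersion <- speedSound / (2 * vocalTract)

round(formant_freqs)
round(dispersion)
```

Under this model a longer tract lowers all formants and packs them more closely, which is why more formants fit below a given frequency ceiling.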

formantDep

scale factor of formant amplitude (1 = no change relative to amplitudes in formants)

formantDepStoch

the amplitude of additional stochastic formants added above the highest specified formant, dB (only if temperature > 0)

formantWidth

scale factor of formant bandwidth (1 = no change)

formantCeiling

frequency to which stochastic formants are calculated, in multiples of the Nyquist frequency; increase up to ~10 for long vocal tracts to avoid losing energy in the upper part of the spectrum

lipRad

the effect of lip radiation on source spectrum, dB/oct (the default of +6 dB/oct produces a high-frequency boost when the mouth is open)

noseRad

the effect of radiation through the nose on source spectrum, dB/oct (the alternative to lipRad when the mouth is closed)

mouthOpenThres

threshold of mouth opening, 0 to 1: the lips open (switching from nose radiation to lip radiation) when the mouth opening exceeds mouthOpenThres
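
The interplay of lipRad, noseRad and mouthOpenThres can be sketched as a per-frame choice of spectral slope. The mouth contour, the reference frequency and the way the slope is converted to a gain are assumptions made for illustration only.

```r
lipRad <- 6           # dB/oct, used when the mouth is open
noseRad <- 4          # dB/oct, used when the mouth is closed
mouthOpenThres <- 0

# Hypothetical mouth-opening contour, one value per STFT frame
mouth <- c(0, 0, 0.2, 0.5, 0)
slope <- ifelse(mouth > mouthOpenThres, lipRad, noseRad)

# Assumption: the slope acts as a gain relative to a reference frequency
freq <- c(250, 500, 1000, 2000)   # Hz
ref_freq <- 250                   # hypothetical 0 dB point
gain_dB <- outer(log2(freq / ref_freq), slope)   # rows = freq, cols = frames

slope
gain_dB
```

Each octave above the reference adds lipRad dB in frames where the mouth is open and noseRad dB in frames where it is closed.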

mouth

mouth opening (0 to 1, 0.5 = neutral, i.e. no modification) (anchor format)

temperature

hyperparameter for regulating the amount of stochasticity in sound generation

formDrift

scaling factor for the effect of temperature on formant drift

formDisp

scaling factor for the effect of temperature on formant dispersal

smoothing

a list of parameters passed to getSmoothContour to control the interpolation and smoothing of contours: interpol (approx / spline / loess), loessSpan, discontThres, jumpThres

windowLength_points

length of FFT window, points

overlap

FFT window overlap, %. For allowed values, see istft
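
With the defaults, the window and overlap translate into step size and frequency resolution as follows. The sampling rate is a hypothetical value (it comes from the audio input and is not an argument listed here).

```r
windowLength_points <- 800
overlap <- 75            # %
samplingRate <- 16000    # Hz, hypothetical

# Distance between successive STFT frames
step_points <- windowLength_points * (1 - overlap / 100)

# The same quantities expressed in milliseconds
window_ms <- windowLength_points / samplingRate * 1000
step_ms <- step_points / samplingRate * 1000

# Number of frequency bins that a filter vector is interpolated to
n_bins <- windowLength_points / 2

c(step_points = step_points, window_ms = window_ms,
  step_ms = step_ms, n_bins = n_bins)
```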

normalize

if TRUE, normalizes the output to range from -1 to +1
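
One common way to obtain a -1 to +1 range is peak normalization, sketched below. Whether the function uses exactly this scaling is an assumption; the waveform is a toy example.

```r
# Toy waveform
x <- c(0.1, -0.4, 0.25, 0.05)

# Assumption: divide by the largest absolute sample value
x_norm <- x / max(abs(x))

range(x_norm)
```

With this scheme the largest peak reaches exactly -1 or +1, while the opposite extreme may stay inside the range.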

play

if TRUE, plays the synthesized sound using the default player on your system. If character, passed to play as the name of the player to use, e.g. "aplay", "play", "vlc", etc. In case of errors, try setting another default player for play

...

other plotting parameters passed to spectrogram