
soundgen (version 2.3.0)

getSurprisal: Get surprisal

Description

Tracks the (un)predictability of spectral changes in a sound over time, returning a continuous contour of "surprisal". This is an attempt to track auditory salience over time, that is, to identify parts of a sound that are likely to involuntarily attract the listener's attention. The function returns surprisal proper (`$surprisal`) and its product with increases in loudness (`$surprisalLoudness`). Because getSurprisal() is slow and experimental, it is not called by analyze().

Usage

getSurprisal(
  x,
  samplingRate = NULL,
  scale = NULL,
  from = NULL,
  to = NULL,
  step = 20,
  winSurp = 2000,
  yScale = c("bark", "mel", "log")[1],
  nFilters = 64,
  dynamicRange = 80,
  minFreq = 20,
  maxFreq = samplingRate/2,
  summaryFun = "mean",
  reportEvery = NULL,
  plot = TRUE,
  savePlots = NULL,
  osc = c("none", "linear", "dB")[2],
  heights = c(3, 1),
  ylim = NULL,
  contrast = 0.2,
  brightness = 0,
  maxPoints = c(1e+05, 5e+05),
  padWithSilence = TRUE,
  colorTheme = c("bw", "seewave", "heat.colors", "...")[1],
  extraContour = NULL,
  xlab = NULL,
  ylab = NULL,
  xaxp = NULL,
  mar = c(5.1, 4.1, 4.1, 2),
  main = NULL,
  grid = NULL,
  width = 900,
  height = 500,
  units = "px",
  res = NA,
  ...
)

Arguments

x

path to a folder, one or more wav or mp3 files (e.g. c('file1.wav', 'file2.mp3')), a Wave object, a numeric vector, or a list of Wave objects or numeric vectors

samplingRate

sampling rate of x (only needed if x is a numeric vector)

scale

maximum possible amplitude of input used for normalization of input vector (only needed if x is a numeric vector)

from

if NULL (default), analyzes the whole sound, otherwise from...to (s)

to

if NULL (default), analyzes the whole sound, otherwise from...to (s)

step

step, ms (determines time resolution). step = NULL means no downsampling at all (ncol of output = length of input audio)

winSurp

surprisal analysis window, ms

yScale

scale of the frequency axis: 'linear' = linear, 'log' = logarithmic (musical), 'bark' = bark with hz2bark, 'mel' = mel with hz2mel

nFilters

the number of filters (determines frequency resolution)

dynamicRange

dynamic range, dB. All values more than dynamicRange dB below the maximum are treated as zero

minFreq

the range of frequencies to analyze

maxFreq

the range of frequencies to analyze

summaryFun

functions used to summarize each acoustic characteristic, e.g. c('mean', 'sd'); user-defined functions are fine (see examples); NAs are omitted automatically for mean/median/sd/min/max/range/sum, otherwise take care of NAs yourself

reportEvery

when processing multiple inputs, report estimated time left every ... iterations (NULL = default, NA = don't report)

plot

if TRUE, plots the auditory spectrogram and the surprisalLoudness contour

savePlots

full path to the folder in which to save the plots (NULL = don't save, '' = same folder as audio)

osc

"none" = no oscillogram; "linear" = on the original scale; "dB" = in decibels

heights

a vector of length two specifying the relative height of the spectrogram and the oscillogram (including time axes labels)

ylim

frequency range to plot, kHz (defaults to 0 to Nyquist frequency). NB: still in kHz, even if yScale = bark or mel

contrast

spectrum is exponentiated by contrast (any real number, recommended -1 to +1). Contrast >0 increases sharpness, <0 decreases sharpness

brightness

how much to "lighten" the image (>0 = lighter, <0 = darker)

maxPoints

the maximum number of "pixels" in the oscillogram (if any) and spectrogram; good for quickly plotting long audio files; defaults to c(1e5, 5e5)

padWithSilence

if TRUE, pads the sound with just enough silence to resolve the edges properly (only the original region is plotted, so the apparent duration doesn't change)

colorTheme

black and white ('bw'), as in the seewave package ('seewave'), or any palette from palette, such as 'heat.colors', 'cm.colors', etc.

extraContour

a vector of arbitrary length scaled in Hz (regardless of yScale!) that will be plotted over the spectrogram (eg pitch contour); can also be a list with extra graphical parameters such as lwd, col, etc. (see examples)

xlab

graphical parameters for plotting

ylab

graphical parameters for plotting

xaxp

graphical parameters for plotting

mar

graphical parameters for plotting

main

graphical parameters for plotting

grid

if numeric, adds n = grid dotted lines per kHz

width

graphical parameters for saving plots passed to png

height

graphical parameters for saving plots passed to png

units

graphical parameters for saving plots passed to png

res

graphical parameters for saving plots passed to png

...

other graphical parameters
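Two of the arguments above come down to simple arithmetic. The sketch below shows (1) how many spectrogram frames one surprisal window of winSurp ms spans at a given step, assuming the window is simply winSurp / step frames long, and (2) what treating values more than dynamicRange dB below the maximum as zero looks like, assuming linear amplitudes converted to dB with 20 * log10. framesPerWindow and applyDynamicRange are hypothetical helpers, not soundgen functions, and the package may round or scale differently.

```r
# Hypothetical helpers (not soundgen functions) illustrating two arguments

# 1. step and winSurp: frames per surprisal window,
#    assuming the window is simply winSurp / step frames long
framesPerWindow = function(winSurp = 2000, step = 20) {
  round(winSurp / step)
}

# 2. dynamicRange: values more than dynamicRange dB below the maximum
#    become zero (assuming linear amplitudes, hence 20 * log10)
applyDynamicRange = function(amp, dynamicRange = 80) {
  ampDB = 20 * log10(amp / max(amp))  # 0 dB = the loudest value
  amp[ampDB < -dynamicRange] = 0
  amp
}

framesPerWindow()                          # 100 (default settings)
framesPerWindow(winSurp = 500, step = 10)  # 50
amp = c(1, 0.1, 1e-3, 1e-5)                # 0, -20, -60 and -100 dB
applyDynamicRange(amp)                     # only 1e-05 is zeroed
applyDynamicRange(amp, dynamicRange = 40)  # 1e-03 and 1e-05 are zeroed
```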

Value

Returns a list with $detailed per-frame and $summary per-file results (see analyze for more information). Four measures are reported: loudness (in sone, as per getLoudness), the first derivative of loudness with respect to time (dLoudness), surprisal (non-negative), and surprisalLoudness (the geometric mean of surprisal and dLoudness, treating negative values of dLoudness as zero).
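The definition of surprisalLoudness can be illustrated with a few made-up frames. This is a minimal sketch, not soundgen's code: surprisalLoudnessSketch is a hypothetical name, and it assumes dLoudness is a simple frame-to-frame difference, whereas the package reports a derivative with respect to time.

```r
# Hypothetical sketch (not soundgen's code): geometric mean of surprisal and
# dLoudness, with negative dLoudness treated as zero
surprisalLoudnessSketch = function(surprisal, loudness) {
  # change in loudness from the previous frame (the first frame gets 0);
  # a per-frame difference is assumed instead of a derivative in sone/s
  dLoudness = c(0, diff(loudness))
  dLoudness[dLoudness < 0] = 0   # decreases in loudness count as zero
  sqrt(surprisal * dLoudness)    # geometric mean of the two measures
}

surprisal = c(0.5, 0.25, 0.9, 1)
loudness = c(1, 2, 1.5, 5.5)     # sone
surprisalLoudnessSketch(surprisal, loudness)
# 0.0 0.5 0.0 2.0
```

Frames where loudness falls (the third one here) get zero regardless of how surprising they are, which is what makes this measure emphasize sudden onsets.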

Details

Algorithm: we start with an auditory spectrogram produced by applying a bank of bandpass filters to the signal, by default with central frequencies equally spaced on the bark scale (see audSpectrogram). For each frequency channel, a sliding window is analyzed to compare the actually observed final value with its expected value.

There are many ways to extrapolate / predict time series and thus perform this comparison. Here, we calculate the autocorrelation function of the window without the final point, find its peak (i.e., the delay that produces the highest autocorrelation), calculate the autocorrelation of the window with the final point at this "optimal" delay, and compare these two correlations. In effect, we estimate how far the final point in our window deviates from the dominant oscillation frequency, or "fundamental frequency", of the time series, which in this case represents the changes in amplitude in the same frequency channel over time.

The resulting per-channel surprisal contours are aggregated by taking their mean weighted by the average amplitude of each frequency channel across the analysis window. Because increases in loudness are known to be important predictors of auditory salience, loudness per frame is also returned, as well as the square root of the product of its derivative and surprisal.
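The per-channel comparison and the weighted aggregation described above can be sketched in base R with stats::acf. This is a minimal illustration under stated assumptions, not soundgen's implementation: surprisalWindow and surprisalFrame are hypothetical names, "comparing the two correlations" is assumed to mean taking the drop in autocorrelation at the optimal delay and flooring it at zero (consistent with surprisal being non-negative), and the search for the peak is arbitrarily limited to delays up to half the window.

```r
# Hypothetical sketch (not soundgen's implementation) of per-channel surprisal,
# built on base R's stats::acf

# w: amplitudes of one frequency channel over one analysis window (frames)
surprisalWindow = function(w) {
  n = length(w)
  if (n < 5) return(NA)               # too short to estimate a periodicity
  past = w[-n]                        # the window without its final point
  if (var(past) == 0) return(0)       # flat history: autocorrelation undefined
  maxLag = floor(length(past) / 2)    # assumed limit on the delays searched
  acPast = acf(past, lag.max = maxLag, plot = FALSE)$acf[-1]  # drop lag 0
  bestLag = which.max(acPast)         # delay with the highest autocorrelation
  # autocorrelation of the whole window (final point included) at that delay
  acFull = acf(w, lag.max = bestLag, plot = FALSE)$acf[bestLag + 1]
  # assumed comparison: drop in autocorrelation, floored at zero
  max(0, acPast[bestLag] - acFull)
}

# spec: channels x frames matrix for one window; each channel's surprisal is
# weighted by its average amplitude across the window
surprisalFrame = function(spec) {
  perChannel = apply(spec, 1, surprisalWindow)
  weighted.mean(perChannel, w = rowMeans(spec))
}

# Usage: a periodic channel vs. the same channel with an unexpected final frame
i = 1:60
periodic = 1 + sin(2 * pi * i / 10)   # period of 10 frames
broken = periodic
broken[60] = 3                        # the pattern predicts about 1 here
surprisalWindow(periodic)             # close to zero
surprisalWindow(broken)               # clearly larger
surprisalFrame(rbind(periodic, broken))
```

In a full analysis this would be repeated for every frame, sliding the window forward by one step each time, which is one reason the function is slow.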

Examples

# NOT RUN {
# A quick example
s = soundgen(nSyl = 2, sylLen = 50, pauseLen = 25, addSilence = 15)
surp = getSurprisal(s, samplingRate = 16000)
surp

# }
# NOT RUN {
# A more meaningful example
sound = soundgen(nSyl = 5, sylLen = 150,
  pauseLen = c(50, 50, 50, 130), pitch = c(200, 150),
  noise = list(time = c(-300, 200), value = -20), plot = TRUE)
# playme(sound)
surp = getSurprisal(sound, samplingRate = 16000, yScale = 'bark')

# NB: surprisalLoudness contour is also log-transformed if yScale = 'log',
# so zeros become NAs
surp = getSurprisal(sound, samplingRate = 16000, yScale = 'log')

# add bells and whistles
surp = getSurprisal(sound, samplingRate = 16000,
  yScale = 'mel',
  osc = 'dB',  # plot oscillogram in dB
  heights = c(2, 1),  # spectro/osc height ratio
  brightness = -.1,  # reduce brightness
  colorTheme = 'heat.colors',  # pick color theme
  cex.lab = .75, cex.axis = .75,  # text size and other base graphics pars
  ylim = c(0, 5),  # always in kHz
  main = 'Audiogram with surprisal contour' # title
  # + axis labels, etc
)

surp = getSurprisal('~/Downloads/temp/', savePlots = '~/Downloads/temp/surp')
surp$summary
# }
