spectrogram: Spectrogram

Description

Produces the spectrogram of a sound using the short-time Fourier transform (STFT). Inspired by spectro from the seewave package, this function adds routines for noise reduction, smoothing in the time and frequency domains, manual control of contrast and brightness, plotting the oscillogram on a dB scale, adding a grid, etc.

Usage

spectrogram(
  x,
  samplingRate = NULL,
  dynamicRange = 80,
  windowLength = 50,
  step = NULL,
  overlap = 70,
  wn = "gaussian",
  zp = 0,
  normalize = TRUE,
  scale = NULL,
  smoothFreq = 0,
  smoothTime = 0,
  qTime = 0,
  percentNoise = 10,
  noiseReduction = 0,
  contrast = 0.2,
  brightness = 0,
  method = c("spectrum", "spectralDerivative")[1],
  output = c("original", "processed", "complex")[1],
  ylim = NULL,
  yScale = c("linear", "log")[1],
  plot = TRUE,
  osc = FALSE,
  osc_dB = FALSE,
  heights = c(3, 1),
  padWithSilence = TRUE,
  colorTheme = c("bw", "seewave", "heat.colors", "...")[1],
  units = c("ms", "kHz"),
  xlab = paste("Time,", units[1]),
  ylab = paste("Frequency,", units[2]),
  mar = c(5.1, 4.1, 4.1, 2),
  main = "",
  grid = NULL,
  frameBank = NULL,
  duration = NULL,
  pitch = NULL,
  ...
)

Arguments

x

path to a .wav or .mp3 file, or a numeric vector of amplitudes with specified samplingRate

samplingRate

sampling rate of x (only needed if x is a numeric vector, rather than an audio file)

dynamicRange

dynamic range, dB. All values more than dynamicRange dB below the maximum are treated as zero

windowLength

length of FFT window, ms

step

FFT step, ms. If specified, overrides overlap

overlap

overlap between successive FFT frames, %

wn

window type: gaussian, hanning, hamming, bartlett, rectangular, blackman, flattop

zp

window length after zero padding, points

normalize

if TRUE, scales input prior to FFT

scale

maximum possible amplitude of input used for normalization of input vector (not needed if input is an audio file)

smoothFreq, smoothTime

length of the window, in data points (0 to +inf), for calculating a rolling median. Applies median smoothing to spectrogram in frequency and time domains, respectively

qTime

the quantile to be subtracted for each frequency bin. For example, if qTime = 0.5, the median of each frequency bin (over the entire sound duration) will be calculated and subtracted from each frame (see examples)

percentNoise

percentage of frames (0 to 100%) used for calculating noise spectrum

noiseReduction

how much noise to remove (0 to +inf, recommended 0 to 2). 0 = no noise reduction, 2 = strong noise reduction. The result is spectrum - (noiseReduction * noiseSpectrum), where noiseSpectrum is the average spectrum of frames with entropy exceeding the quantile set by percentNoise

contrast

spectrum is exponentiated by contrast (-inf to +inf, recommended -1 to +1). Contrast >0 increases sharpness, <0 decreases sharpness

brightness

how much to "lighten" the image (>0 = lighter, <0 = darker)

method

plot spectrum ('spectrum') or spectral derivative ('spectralDerivative')

output

specifies what to return: nothing ('none'), unmodified spectrogram ('original'), denoised and/or smoothed spectrogram ('processed'), or unmodified spectrogram with the imaginary part giving phase ('complex')

ylim

frequency range to plot, kHz (defaults to 0 to Nyquist frequency)

yScale

scale of the frequency axis: 'linear' = linear, 'log' = logarithmic

plot

should a spectrogram be plotted? TRUE / FALSE

osc, osc_dB

should an oscillogram be shown under the spectrogram? TRUE / FALSE. If osc_dB = TRUE, the oscillogram is displayed on a dB scale. See osc_dB for details

heights

a vector of length two specifying the relative height of the spectrogram and the oscillogram (including time axes labels)

padWithSilence

if TRUE, pads the sound with just enough silence to resolve the edges properly (only the original region is plotted, so apparent duration doesn't change)

colorTheme

black and white ('bw'), as in the seewave package ('seewave'), or any base R palette function such as 'heat.colors', 'cm.colors', etc

units

c('ms', 'kHz') is the default, and anything else is interpreted as s (for time) and Hz (for frequency)

xlab, ylab, main, mar

graphical parameters

grid

if numeric, adds n = grid dotted lines per kHz

frameBank, duration, pitch

ignore (only used internally)

...

other graphical parameters
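
As a rough illustration of how windowLength, step, overlap, and zp interact, the sketch below works through the frame arithmetic in base R. The helper frame_layout is hypothetical and not part of soundgen. It assumes the usual relation step = windowLength * (1 - overlap / 100), assumes that zp only matters when it exceeds the window length in points, and ignores padWithSilence, so the frame count may differ from what spectrogram actually produces.

```r
# Hypothetical helper (not a soundgen function): STFT frame arithmetic
frame_layout <- function(nSamples, samplingRate, windowLength = 50,
                         step = NULL, overlap = 70, zp = 0) {
  # Assumption: step and overlap are two views of the same quantity
  if (is.null(step)) {
    step <- windowLength * (1 - overlap / 100)      # ms
  } else {
    overlap <- 100 * (1 - step / windowLength)      # %
  }
  windowPoints <- round(windowLength / 1000 * samplingRate)
  stepPoints <- round(step / 1000 * samplingRate)
  # Assumption: zero padding applies only if zp exceeds the window in points
  fftPoints <- max(windowPoints, zp)
  # Number of full windows that fit without any padding at the edges
  nFrames <- floor((nSamples - windowPoints) / stepPoints) + 1
  list(step_ms = step, overlap_pct = overlap,
       windowPoints = windowPoints, stepPoints = stepPoints,
       binWidth_Hz = samplingRate / fftPoints, nFrames = nFrames)
}

# Defaults for 1 s of audio sampled at 16 kHz
lay <- frame_layout(nSamples = 16000, samplingRate = 16000)
str(lay)

# Specifying step overrides overlap
lay5 <- frame_layout(nSamples = 16000, samplingRate = 16000, step = 5)
lay5$overlap_pct
```

With the defaults, a 50 ms window at 16 kHz spans 800 points, so frequency bins are 20 Hz apart unless zp is larger than 800. Shortening windowLength trades frequency resolution for time resolution.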

Value

Returns nothing (if output = 'none'), the absolute (magnitude, not power) spectrum (if output = 'original'), the denoised and/or smoothed spectrum (if output = 'processed'), or spectral derivatives (if method = 'spectralDerivative') as a matrix of real numbers.
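
Because the returned values are magnitudes rather than power, a conversion step is needed before comparing them with power spectra or dB values. The following base R sketch uses a made-up matrix named amp in place of real output. The clipping step is only one plausible reading of dynamicRange, not necessarily how spectrogram applies it internally.

```r
# Toy magnitude spectrogram (made-up values): rows = bins, columns = frames
amp <- matrix(c(1,   0.1,  0.001,
                0.5, 0.01, 0.00001), nrow = 3)

# Power is the squared magnitude
power <- amp ^ 2

# dB relative to the maximum: 20 * log10 for magnitude, 10 * log10 for power
dB_from_amp <- 20 * log10(amp / max(amp))
dB_from_power <- 10 * log10(power / max(power))

# Assumption: values more than dynamicRange dB below the maximum become zero
dynamicRange <- 80
amp_clipped <- amp
amp_clipped[dB_from_amp < -dynamicRange] <- 0

round(dB_from_amp)
amp_clipped
```

In this toy matrix only the smallest value (100 dB below the maximum) is zeroed. Lowering dynamicRange to 40 would also remove the value at -60 dB.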

Details

Many soundgen functions call spectrogram, and you can pass along most of its graphical parameters from functions like soundgen, analyze, etc. However, in some cases this will not work (e.g., for "units") or may produce unexpected results. If in doubt, omit extra graphical parameters.

Examples

# NOT RUN {
# synthesize a sound 1 s long, with gradually increasing hissing noise
sound = soundgen(sylLen = 500, temperature = 0.001, noise = list(
  time = c(0, 650), value = c(-40, 0)), formantsNoise = list(
  f1 = list(freq = 5000, width = 10000)))
# playme(sound, samplingRate = 16000)

# basic spectrogram
spectrogram(sound, samplingRate = 16000)

# }
# NOT RUN {
# add bells and whistles
spectrogram(sound, samplingRate = 16000,
  osc = TRUE,  # plot oscillogram under the spectrogram
  noiseReduction = 1.1,  # subtract the spectrum of noisy parts
  brightness = -1,  # reduce brightness
  colorTheme = 'heat.colors',  # pick color theme
  cex.lab = .75, cex.axis = .75,  # text size and other base graphics pars
  grid = 5,  # lines per kHz; to customize, add manually with graphics::grid()
  units = c('s', 'Hz'),  # plot in s or ms, Hz or kHz
  ylim = c(0, 5000),  # in specified units (Hz)
  main = 'My spectrogram' # title
  # + axis labels, etc
)

# change dynamic range
spectrogram(sound, samplingRate = 16000, dynamicRange = 40)
spectrogram(sound, samplingRate = 16000, dynamicRange = 120)

# add an oscillogram
spectrogram(sound, samplingRate = 16000, osc = TRUE)

# oscillogram on a dB scale, same height as spectrogram
spectrogram(sound, samplingRate = 16000,
            osc_dB = TRUE, heights = c(1, 1))

# frequencies on a logarithmic scale
spectrogram(sound, samplingRate = 16000,
            yScale = 'log', ylim = c(.05, 8))

# broad-band instead of narrow-band
spectrogram(sound, samplingRate = 16000, windowLength = 5)

# focus only on values in the upper 5% for each frequency bin
spectrogram(sound, samplingRate = 16000, qTime = 0.95)

# detect the 10% noisiest frames based on entropy and remove the pattern
# found in those frames (in this case, breathing)
spectrogram(sound, samplingRate = 16000,  noiseReduction = 1.1,
  brightness = -2)  # white noise attenuated

# apply median smoothing in both time and frequency domains
spectrogram(sound, samplingRate = 16000, smoothFreq = 5,
  smoothTime = 5)

# increase contrast, reduce brightness
spectrogram(sound, samplingRate = 16000, contrast = 1, brightness = -1)

# specify location of tick marks etc - see ?par() for base graphics
spectrogram(sound, samplingRate = 16000,
            ylim = c(0, 3), yaxp = c(0, 3, 5), xaxp = c(0, 1400, 4))
# }
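
The qTime and noiseReduction arguments used in the examples above can be sketched in base R on a toy matrix, without soundgen. This illustrates the idea described under Arguments and is not the package's actual code: the matrix s is simulated, Shannon entropy of the normalized frame stands in for whatever entropy measure spectrogram uses, and flooring negative values at zero is an assumption.

```r
set.seed(1)
# Toy magnitude spectrogram: 4 frequency bins (rows) x 20 frames (columns)
s <- matrix(runif(80, 0, 0.1), nrow = 4)
s[2, ] <- s[2, ] + 1            # constant hum in bin 2
s[3, 5:10] <- s[3, 5:10] + 2    # brief signal in bin 3

# qTime: subtract a per-bin quantile taken over time
qTime <- 0.5
q <- apply(s, 1, quantile, probs = qTime)  # one value per frequency bin
s_q <- pmax(s - q, 0)  # q recycles down columns, so row i loses q[i]

# noiseReduction: subtract the mean spectrum of the highest-entropy frames
entropy <- apply(s, 2, function(frame) {
  p <- frame / sum(frame)
  p <- p[p > 0]                 # guard against log(0)
  -sum(p * log(p))
})
percentNoise <- 10
noiseReduction <- 1
threshold <- quantile(entropy, probs = 1 - percentNoise / 100)
noisy <- entropy >= threshold
noiseSpectrum <- rowMeans(s[, noisy, drop = FALSE])
s_denoised <- pmax(s - noiseReduction * noiseSpectrum, 0)

round(s_q[2:3, 1:10], 2)  # hum mostly removed, signal in bin 3 preserved
which(noisy)              # frames treated as noise
```

Subtracting the median over time removes the steady hum in bin 2 but keeps the short event in bin 3, because that event occupies fewer than half of the frames.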
