spectrogram: Spectrogram

Description

Produces the spectrogram of a sound using the short-time Fourier transform (STFT). Inspired by spectro from the seewave package, this function adds routines for noise reduction, smoothing in the time and frequency domains, manual control of contrast and brightness, plotting the oscillogram on a dB scale, a grid, etc.

Usage

spectrogram(
  x,
  samplingRate = NULL,
  dynamicRange = 80,
  windowLength = 50,
  step = NULL,
  overlap = 70,
  wn = "gaussian",
  zp = 0,
  normalize = TRUE,
  scale = NULL,
  smoothFreq = 0,
  smoothTime = 0,
  qTime = 0,
  percentNoise = 10,
  noiseReduction = 0,
  method = c("spectrum", "spectralDerivative")[1],
  output = c("original", "processed", "complex")[1],
  plot = TRUE,
  osc = c("none", "linear", "dB")[2],
  osc_dB = NULL,
  heights = c(3, 1),
  ylim = NULL,
  yScale = c("linear", "log")[1],
  contrast = 0.2,
  brightness = 0,
  maxPoints = c(1e+05, 5e+05),
  padWithSilence = TRUE,
  colorTheme = c("bw", "seewave", "heat.colors", "...")[1],
  units = "deprecated",
  xlab = NULL,
  ylab = NULL,
  xaxp = NULL,
  mar = c(5.1, 4.1, 4.1, 2),
  main = "",
  grid = NULL,
  internal = NULL,
  ...
)

Arguments

x

path to a .wav or .mp3 file or a vector of amplitudes with specified samplingRate

samplingRate

sampling rate of x (only needed if x is a numeric vector, rather than an audio file)

dynamicRange

dynamic range, dB. All values more than dynamicRange dB below the maximum are treated as zero
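A minimal base-R sketch of this thresholding, assuming the spectrum is an amplitude (not power) matrix, so that dB values relative to the maximum are computed as 20 * log10. The helper name applyDynamicRange is hypothetical and not part of soundgen.

```r
# Hypothetical helper: zero out everything more than dynamicRange dB below the max
applyDynamicRange = function(spec, dynamicRange = 80) {
  # amplitude -> dB relative to the loudest cell (20 * log10 for amplitudes)
  spec_dB = 20 * log10(spec / max(spec))
  spec[spec_dB < -dynamicRange] = 0
  spec
}

# made-up amplitudes at 0, -20, -60 and -100 dB re. maximum
spec = matrix(c(1, 0.1, 1e-3, 1e-5), nrow = 2)
applyDynamicRange(spec, dynamicRange = 40)  # only 0 and -20 dB survive
applyDynamicRange(spec, dynamicRange = 80)  # only the -100 dB cell is zeroed
```

A narrower dynamic range therefore hides more of the quiet background, which is what the dynamicRange = 40 vs 120 comparison in the examples illustrates.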

windowLength

length of FFT window, ms

step

FFT step, ms. If specified, it overrides overlap

overlap

overlap between successive FFT frames, %
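step and overlap describe the same quantity: with a window of windowLength ms, successive frames start windowLength * (1 - overlap / 100) ms apart. The sketch below shows this conversion and the approximate number of complete frames it implies. The helper names are hypothetical, and soundgen's actual frame count may differ, for example because of padWithSilence.

```r
# Hypothetical helpers: convert overlap (%) to FFT step (ms) and back
overlapToStep = function(windowLength, overlap) windowLength * (1 - overlap / 100)
stepToOverlap = function(windowLength, step) 100 * (1 - step / windowLength)

step = overlapToStep(windowLength = 50, overlap = 70)  # 15 ms with the defaults
stepToOverlap(windowLength = 50, step = step)          # back to 70 %

# approximate number of complete 50-ms frames in a 1000-ms sound (no padding)
duration = 1000
nFrames = floor((duration - 50) / step) + 1
nFrames
```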

wn

window type: gaussian, hanning, hamming, bartlett, rectangular, blackman, flattop
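For reference, the textbook definitions of two of these windows are easy to compute in base R. This sketch assumes symmetric windows of N points; soundgen's implementation (and its Gaussian window in particular) may use different conventions, so treat it as an illustration of tapering only.

```r
# Textbook symmetric windows of N points (n = 0, ..., N - 1)
hannWindow = function(N) {
  n = 0:(N - 1)
  0.5 - 0.5 * cos(2 * pi * n / (N - 1))
}
hammingWindow = function(N) {
  n = 0:(N - 1)
  0.54 - 0.46 * cos(2 * pi * n / (N - 1))
}

w = hannWindow(5)
round(w, 3)  # 0.0 0.5 1.0 0.5 0.0

# tapering: each frame is multiplied by the window before the FFT
frame = c(1, 2, 3, 2, 1)
Mod(fft(frame * w))  # magnitude spectrum of the tapered frame
```

Unlike Hann, the Hamming window does not reach zero at its edges (its endpoints are 0.08).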

zp

window length after zero padding, points
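To see what zero padding changes, the arithmetic below converts windowLength (ms) to samples and computes the spacing between frequency bins with and without padding. It assumes that zp is the total FFT length in points and that no padding happens when zp is smaller than the window; both are assumptions about soundgen's behavior. Zero padding only interpolates the spectrum onto a finer grid. The true frequency resolution is still set by windowLength.

```r
samplingRate = 16000   # Hz
windowLength = 50      # ms
windowPoints = round(windowLength / 1000 * samplingRate)  # 800 samples

# assumption: zp is the FFT length after padding; ignored if shorter than the window
binWidth = function(zp) {
  nFFT = max(windowPoints, zp)
  samplingRate / nFFT  # Hz between adjacent frequency bins
}
binWidth(0)     # 20 Hz: no padding
binWidth(2048)  # 7.8125 Hz: finer grid thanks to zero padding
```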

normalize

if TRUE, scales input prior to FFT

scale

maximum possible amplitude of input used for normalization of input vector (not needed if input is an audio file)

smoothFreq, smoothTime

length of the window, in data points (0 to +inf), for calculating a rolling median. Applies median smoothing to the spectrogram in the frequency and time domains, respectively
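A rolling median of this kind can be sketched with stats::runmed, applied down each column (frequency) and then along each row (time). The sketch assumes rows are frequency bins and columns are time frames; runmed() requires an odd window, so even values are rounded up here, which may not match how soundgen handles them. The helper name medianSmooth is hypothetical.

```r
# Hypothetical helper; rows = frequency bins, columns = time frames
medianSmooth = function(spec, smoothFreq = 0, smoothTime = 0) {
  oddK = function(k) if (k %% 2 == 0) k + 1 else k  # runmed() needs an odd k
  if (smoothFreq > 1) {
    # smooth each time frame across neighboring frequency bins
    spec = apply(spec, 2, runmed, k = oddK(smoothFreq), endrule = 'median')
  }
  if (smoothTime > 1) {
    # smooth each frequency bin across neighboring frames; apply() over rows
    # returns the result transposed, hence t()
    spec = t(apply(spec, 1, runmed, k = oddK(smoothTime), endrule = 'median'))
  }
  spec
}

spec = matrix(1, nrow = 5, ncol = 7)
spec[3, 4] = 100  # an isolated spike
smoothed = medianSmooth(spec, smoothFreq = 3, smoothTime = 3)
smoothed[3, 4]    # the spike is removed
```

Median smoothing removes isolated outliers like this spike while preserving sharp edges better than a moving average would.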

qTime

the quantile to be subtracted for each frequency bin. For example, if qTime = 0.5, the median of each frequency bin (over the entire sound duration) will be calculated and subtracted from each frame (see examples)
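A base-R sketch of the per-bin quantile subtraction, assuming rows are frequency bins and columns are time frames. Clipping negative values to zero is an assumption, suggested by the example that uses qTime = 0.95 to keep only the upper 5% of values per bin. The helper name subtractQuantile is hypothetical.

```r
# Hypothetical helper; rows = frequency bins, columns = time frames
subtractQuantile = function(spec, qTime = 0.5) {
  # one quantile per frequency bin, computed over the whole duration
  q = apply(spec, 1, quantile, probs = qTime, names = FALSE)
  out = spec - q    # q is recycled down each column, i.e. subtracted row-wise
  out[out < 0] = 0  # assumption: negative values are clipped to zero
  out
}

spec = rbind(c(1, 2, 3, 10),  # bin 1: median 2.5
             c(5, 5, 5, 5))   # bin 2: constant, so it vanishes entirely
res = subtractQuantile(spec, qTime = 0.5)
res
```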

percentNoise

percentage of frames (0 to 100%) used for calculating noise spectrum

noiseReduction

how much noise to remove (0 to +inf, recommended 0 to 2). 0 = no noise reduction, 2 = strong noise reduction: \(\text{spectrum} - \text{noiseReduction} \times \text{noiseSpectrum}\), where noiseSpectrum is the average spectrum of the percentNoise % of frames with the highest entropy (that is, frames whose entropy exceeds the quantile set by percentNoise)
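A rough base-R sketch of this subtraction, with several assumptions: entropy is measured as Wiener entropy (spectral flatness, the ratio of the geometric to the arithmetic mean of each frame), the noisiest frames are those at or above the (1 - percentNoise / 100) quantile of entropy, and negative results are clipped to zero. The helper name reduceNoise is hypothetical, and soundgen's internals may differ.

```r
# Hypothetical helper; rows = frequency bins, columns = time frames
reduceNoise = function(spec, percentNoise = 10, noiseReduction = 1) {
  # assumption: entropy = spectral flatness (geometric / arithmetic mean);
  # a tiny constant avoids log(0)
  flatness = apply(spec, 2, function(f) {
    f = f + 1e-12
    exp(mean(log(f))) / mean(f)
  })
  threshold = quantile(flatness, probs = 1 - percentNoise / 100, names = FALSE)
  noisy = flatness >= threshold
  # average spectrum of the noisiest (flattest) frames
  noiseSpectrum = rowMeans(spec[, noisy, drop = FALSE])
  out = spec - noiseReduction * noiseSpectrum  # recycled row-wise
  out[out < 0] = 0  # assumption: negative values are clipped to zero
  out
}

# 3 bins x 10 frames: a tone in bin 1 during frames 1-9, flat noise everywhere
spec = matrix(0.1, nrow = 3, ncol = 10)
spec[1, 1:9] = 1
den = reduceNoise(spec, percentNoise = 10, noiseReduction = 1)
den  # the flat noise floor is gone, the tone remains at 0.9
```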

method

plot spectrum ('spectrum') or spectral derivative ('spectralDerivative')

output

specifies what to return: nothing ('none'), unmodified spectrogram ('original'), denoised and/or smoothed spectrogram ('processed'), or unmodified spectrogram with the imaginary part giving phase ('complex')

plot

should a spectrogram be plotted? TRUE / FALSE

osc

should an oscillogram be shown under the spectrogram? 'none' = no oscillogram; 'linear' = on the original scale; 'dB' = in decibels

osc_dB

deprecated

heights

a vector of length two specifying the relative heights of the spectrogram and the oscillogram (including the time axis labels)

ylim

frequency range to plot, kHz (defaults to 0 to Nyquist frequency)

yScale

scale of the frequency axis: 'linear' = linear, 'log' = logarithmic

contrast

the spectrum is exponentiated by contrast (-inf to +inf, recommended -1 to +1). Contrast > 0 increases sharpness; contrast < 0 decreases it
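The effect of exponentiation is easiest to see on values normalized to [0, 1]. Raising them to a power above 1 pushes weak values toward zero faster than strong ones, which sharpens the image, while a power below 1 lifts weak values and softens it. The mapping from contrast to the exponent used here, exp(contrast), is an assumption for illustration and not necessarily soundgen's formula; the helper name applyContrast is hypothetical.

```r
# Spectrogram values normalized to [0, 1]
spec = c(0.1, 0.5, 0.9, 1)

# assumption: exponent = exp(contrast), so contrast = 0 leaves values unchanged
applyContrast = function(spec, contrast) spec ^ exp(contrast)

round(applyContrast(spec, 0), 3)   # unchanged
round(applyContrast(spec, 1), 3)   # weak values suppressed: sharper
round(applyContrast(spec, -1), 3)  # weak values lifted: softer
```

The maximum (1) is unaffected in all three cases, so only the relative prominence of weaker components changes.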

brightness

how much to "lighten" the image (>0 = lighter, <0 = darker)

maxPoints

the maximum number of "pixels" in the oscillogram (if any) and the spectrogram; useful for plotting long audio files; defaults to c(1e5, 5e5)
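The reduction that maxPoints implies can be estimated with simple arithmetic. The recording parameters below are hypothetical, and two further assumptions are made: a window of N samples yields N / 2 frequency bins, and the thinning is split equally between the two axes (soundgen may distribute it differently).

```r
# Hypothetical recording: 3 min at 48 kHz, 50-ms windows, overlap = 0
samplingRate = 48000                                      # Hz
duration = 180 * 1000                                     # ms
windowLength = 50                                         # ms
step = windowLength                                       # overlap = 0
nFrames = floor(duration / step)                          # time columns
nBins = round(windowLength / 1000 * samplingRate) / 2     # frequency rows
cells = nFrames * nBins

maxSpecPoints = 5e5              # second element of maxPoints
factor = cells / maxSpecPoints   # overall reduction needed
perAxis = sqrt(factor)           # if both axes are thinned equally (assumption)
c(cells = cells, factor = factor, perAxis = perAxis)
```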

padWithSilence

if TRUE, pads the sound with just enough silence to resolve the edges properly (only the original region is plotted, so apparent duration doesn't change)

colorTheme

black and white ('bw'), as in the seewave package ('seewave'), or any color palette (see ?palette) such as 'heat.colors', 'cm.colors', etc.

units

deprecated

xlab, ylab, main, mar, xaxp

graphical parameters

grid

if numeric, adds n = grid dotted lines per kHz

internal

ignore (only used internally)

...

other graphical parameters

Value

Returns nothing (if output = 'none'), the absolute (not power!) spectrum (if output = 'original'), the denoised and/or smoothed spectrum (if output = 'processed'), or spectral derivatives (if method = 'spectralDerivative'), as a matrix of real numbers. With output = 'complex', the unmodified spectrogram is returned as a complex matrix.
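Because the returned matrix holds amplitudes rather than power, it must be squared to obtain power, and conversion to dB uses 20 * log10 rather than 10 * log10. A base-R sketch of both conversions on a small made-up matrix; in practice sp would be the matrix returned by spectrogram(..., output = 'original').

```r
sp = matrix(c(1, 0.5, 0.1, 0.01), nrow = 2)  # made-up amplitude spectrum

power = sp ^ 2                            # power spectrum
dB_amp = 20 * log10(sp / max(sp))         # dB re. maximum, from amplitudes
dB_pow = 10 * log10(power / max(power))   # the same values, from power
all.equal(dB_amp, dB_pow)                 # TRUE
```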

Details

Many soundgen functions call spectrogram, and you can pass along most of its graphical parameters from functions like soundgen, analyze, etc. However, in some cases this will not work (e.g. for "units") or may produce unexpected results. If in doubt, omit extra graphical parameters.

Examples


# NOT RUN {
# synthesize a 500-ms syllable plus hissing noise that grows from -40 to 0 dB over 650 ms
sound = soundgen(sylLen = 500, temperature = 0.001, noise = list(
  time = c(0, 650), value = c(-40, 0)), formantsNoise = list(
  f1 = list(freq = 5000, width = 10000)))
# playme(sound, samplingRate = 16000)

# basic spectrogram
spectrogram(sound, samplingRate = 16000)

# add bells and whistles
spectrogram(sound, samplingRate = 16000,
  osc = 'dB',  # plot oscillogram in dB
  heights = c(2, 1),  # spectro/osc height ratio
  noiseReduction = 1.1,  # subtract the spectrum of noisy parts
  brightness = -1,  # reduce brightness
  colorTheme = 'heat.colors',  # pick color theme
  cex.lab = .75, cex.axis = .75,  # text size and other base graphics pars
  grid = 5,  # lines per kHz; to customize, add manually with graphics::grid()
  ylim = c(0, 5),  # always in kHz
  main = 'My spectrogram' # title
  # + axis labels, etc
)

# change dynamic range
spectrogram(sound, samplingRate = 16000, dynamicRange = 40)
spectrogram(sound, samplingRate = 16000, dynamicRange = 120)

# remove the oscillogram
spectrogram(sound, samplingRate = 16000, osc = 'none')  # or NULL etc

# frequencies on a logarithmic scale
spectrogram(sound, samplingRate = 16000,
            yScale = 'log', ylim = c(.05, 8))

# broad-band instead of narrow-band
spectrogram(sound, samplingRate = 16000, windowLength = 5)

# focus only on values in the upper 5% for each frequency bin
spectrogram(sound, samplingRate = 16000, qTime = 0.95)

# detect the 10% noisiest frames based on entropy and remove the pattern
# found in those frames (in this case, breathing)
spectrogram(sound, samplingRate = 16000, noiseReduction = 1.1,
  brightness = -2)  # white noise attenuated

# apply median smoothing in both time and frequency domains
spectrogram(sound, samplingRate = 16000, smoothFreq = 5,
  smoothTime = 5)

# increase contrast, reduce brightness
spectrogram(sound, samplingRate = 16000, contrast = 1, brightness = -1)

# specify location of tick marks etc. (see ?par for base graphics)
spectrogram(sound, samplingRate = 16000,
            ylim = c(0, 3), yaxp = c(0, 3, 5), xaxp = c(0, .8, 10))

# Plot long audio files with reduced resolution
# (~4 s to process + 10 s to plot a 3-min song)
sp = spectrogram('~/Downloads/temp.wav', overlap = 0,
  maxPoints = c(1e5, 5e5),  # limit the number of pixels in osc/spec
  output = 'original', ylim = c(0, 6))
nrow(sp) * ncol(sp) / 5e5  # spec downsampled by a factor of ~9
# }
