Lab 4
In lab 4, you will synthesize speech using linear predictive coding (LPC) with an autocorrelation-based pitch detection algorithm. The very simple excitation model (each frame is either 100% voiced or 100% unvoiced) will result in an artificial buzzy sound, but it should be intelligible; an example is here (from this original).
- Code, test code, data, and solutions.
- Images of the solutions, created using make_cool_plots.py.
- Autograder submission site.
The ZIP Archive and the Recommended Code-Writing Process
In the ZIP archive, you will find the following directory hierarchy:
- submitted.py -- this is the ONLY file that you will submit, and the only file on which you will be graded. It defines a Python class called Dataset that loads one waveform from the data directory, analyzes it into LPC, pitch, and RMS amplitude information, and then resynthesizes it from those three pieces of information, using the following methods (illustrative sketches of several of these steps appear after this file listing):
- set_frames - chop the input signal into frames
- set_autocor - compute the autocorrelation function of each frame
- set_lpc - compute the linear prediction coefficients in each frame
- set_stable - make the LPC filter stable by finding the zeros of the inverse filter, clipping the magnitude of any zero that exceeds 0.999 down to 0.999, and then recomputing the inverse filter polynomial from the clipped zeros
- set_pitch - find the pitch period of each frame, as the period with maximum autocorrelation coefficient, within a specified candidate range. If the normalized autocorrelation is less than 0.25, set the pitch period to 0, which is a code meaning "unvoiced".
- set_logrms - compute the log RMS amplitude of each frame
- set_logsigma - linearly interpolate logrms between neighboring frame boundaries, in order to estimate the log standard deviation of each sample
- set_samplepitch - linearly interpolate the pitch period between neighboring frame boundaries, if both frames are voiced. If the next frame is unvoiced, give the current frame a constant pitch period; if the current frame is unvoiced, assign every sample in the current frame to be unvoiced.
- set_excitation - set the excitation to be unit-variance Gaussian white noise if unvoiced, or an impulse train if voiced. The impulse train needs to be scaled and phase-shifted so that its RMS amplitude is 1, and its instantaneous pitch matches samplepitch.
- set_synthesis - synthesize the output speech signal by filtering the excitation through the stable LPC synthesis filter.
- data/file[0-4].wav -- these are the input waveform files, sampled at 11025 samples/second.
- requirements.txt -- lists the Python packages used by the autograder. You can install them by typing pip3 install -r requirements.txt.
- make_cool_plots.py -- generates plots that might be useful for debugging your code, and also creates the output synthesized speech file. After you have written some functions in submitted.py, you can generate the corresponding plots for file0 by typing, e.g., python3 make_cool_plots.py. Outputs will be placed in the directory make_cool_plots_outputs; they should look like the images provided in the solution image set.
- run_tests.py -- once your code is producing reasonable output plots with make_cool_plots.py, try running python3 run_tests.py. This will call the test routines in tests/test_sequence.py, using the scoring utility functions in score.py, and compare the results to the hashed solution criteria in solutions/*.json.
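The sketches below illustrate, very roughly, how several of the steps listed above could be implemented. They are not the interface of submitted.py: the function names, argument names, frame/hop conventions, and array layouts are assumptions made purely for illustration. First, framing and per-frame autocorrelation (set_frames and set_autocor):

```python
import numpy as np

def make_frames(signal, frame_length, hop_length):
    """Chop a 1-D signal into (possibly overlapping) frames, one frame per row."""
    num_frames = 1 + (len(signal) - frame_length) // hop_length
    frames = np.zeros((num_frames, frame_length))
    for t in range(num_frames):
        frames[t, :] = signal[t * hop_length : t * hop_length + frame_length]
    return frames

def frame_autocor(frames):
    """Full (two-sided) autocorrelation of each frame; lag 0 is the center column."""
    num_frames, frame_length = frames.shape
    autocor = np.zeros((num_frames, 2 * frame_length - 1))
    for t in range(num_frames):
        autocor[t, :] = np.correlate(frames[t, :], frames[t, :], mode='full')
    return autocor
```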
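The LPC coefficients (set_lpc) can be computed with the autocorrelation method, i.e., by solving the normal equations built from the autocorrelation lags, and set_stable can clip the zeros of the inverse filter as described above. The helper names and numerical details here are assumptions, not the required implementation:

```python
import numpy as np

def lpc_from_autocor(autocor_frame, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r, where
    R is a Toeplitz matrix of lags 0..order-1 and r holds lags 1..order."""
    center = len(autocor_frame) // 2                      # index of lag 0
    r = autocor_frame[center : center + order + 1]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])           # predictor coefficients a_1..a_p

def stabilize(a, max_radius=0.999):
    """Clip any zero of the inverse filter A(z) = 1 - sum_k a_k z^{-k} whose
    magnitude exceeds max_radius, then rebuild the coefficients from the
    clipped zeros."""
    inverse_filter = np.concatenate(([1.0], -np.asarray(a)))
    zeros = np.roots(inverse_filter)
    clipped = np.where(np.abs(zeros) > max_radius,
                       max_radius * np.exp(1j * np.angle(zeros)), zeros)
    stable_filter = np.real(np.poly(clipped))             # conjugate pairs make coefficients real
    return -stable_filter[1:]
```

Note that np.linalg.solve will fail on a perfectly silent frame (the normal-equation matrix is singular), so real code may need a guard for that case.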
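For set_pitch, the candidate lag range of the autocorrelation can be searched as in the sketch below; the 0.25 threshold comes from the description above, while the function and argument names are assumptions:

```python
import numpy as np

def detect_pitch(autocor_frame, min_period, max_period, threshold=0.25):
    """Return the lag (in samples) with the largest autocorrelation in
    [min_period, max_period], or 0 ("unvoiced") if that peak, normalized by
    the lag-0 autocorrelation, falls below the threshold."""
    center = len(autocor_frame) // 2                      # index of lag 0
    energy = autocor_frame[center]
    if energy <= 0:                                       # silent frame: call it unvoiced
        return 0
    candidates = autocor_frame[center + min_period : center + max_period + 1]
    best = int(np.argmax(candidates))
    if candidates[best] / energy < threshold:
        return 0
    return min_period + best
```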
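set_logrms, set_logsigma, and set_samplepitch all amount to computing one number per frame and then interpolating it to one number per sample. A minimal sketch, assuming frame values are anchored at frame boundaries spaced hop_length samples apart (the exact boundary convention in submitted.py may differ):

```python
import numpy as np

def frame_logrms(frames, epsilon=1e-8):
    """Log RMS amplitude of each frame (epsilon avoids taking the log of zero)."""
    return np.log(np.sqrt(np.mean(frames ** 2, axis=1)) + epsilon)

def interpolate_per_sample(frame_values, hop_length, num_samples):
    """Linearly interpolate one value per frame into one value per sample."""
    frame_times = np.arange(len(frame_values)) * hop_length
    return np.interp(np.arange(num_samples), frame_times, frame_values)
```

The pitch needs extra logic on top of this: interpolate only between two voiced frames, hold the pitch constant when the next frame is unvoiced, and mark every sample of an unvoiced frame as unvoiced (period 0), exactly as described above.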
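Finally, set_excitation and set_synthesis might be structured like the sketch below. The phase-accumulation scheme for the impulse train, and the choice to apply the interpolated standard deviation to the excitation before filtering, are assumptions; check the docstrings in submitted.py for the required conventions.

```python
import numpy as np
from scipy.signal import lfilter

def make_excitation(samplepitch, seed=0):
    """Unit-RMS excitation: Gaussian white noise where samplepitch == 0
    (unvoiced), otherwise an impulse train whose spacing tracks the
    per-sample pitch period."""
    rng = np.random.default_rng(seed)
    excitation = rng.standard_normal(len(samplepitch))    # unit-variance noise everywhere
    phase = 0.0
    for n, period in enumerate(samplepitch):
        if period > 0:                                     # voiced sample
            phase += 1.0 / period                          # one full cycle per pitch period
            if phase >= 1.0:
                phase -= 1.0
                excitation[n] = np.sqrt(period)            # impulse scaled so RMS is 1
            else:
                excitation[n] = 0.0
    return excitation

def synthesize(excitation, lpc_frames, logsigma, hop_length):
    """Scale the excitation by the interpolated standard deviation, then filter
    each frame through the all-pole synthesis filter 1/A(z), carrying the
    filter state across frame boundaries."""
    scaled = excitation * np.exp(logsigma)
    output = np.zeros(len(scaled))
    state = np.zeros(lpc_frames.shape[1])                  # lfilter state, length = LPC order
    for t in range(lpc_frames.shape[0]):
        start = t * hop_length
        stop = min(start + hop_length, len(scaled))
        inverse_filter = np.concatenate(([1.0], -lpc_frames[t]))
        output[start:stop], state = lfilter([1.0], inverse_filter,
                                            scaled[start:stop], zi=state)
    return output
```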
How to Submit
When you're ready to submit, go to Gradescope.
- Submit only the one file, submitted.py. Any other files you submit will be ignored.
- You may submit as many times as you like, until the deadline. Only your last submission will count toward your course grade.