A Prony speech processing technique
Scanio, Thomas Joseph
Master of Science
A method for speech processing is presented that requires neither voiced/unvoiced decisions nor pitch determination. The sampled speech wave is modeled as a concatenation of initial segments of the unit pulse responses of linear, time-invariant, recursive discrete-time systems. The poles of each system are calculated by applying Prony's method to a block of speech samples, and the zeroes are chosen so that the error between the speech wave and the first output samples of the system is zero. The analysis phase proceeds as follows. After the initial block of unit pulse response samples, the system output samples are compared with the speech samples, and the system continues to operate until the error between the two grows too large. At that point the next block of samples is used to calculate a new system, and the process repeats. The parameters describing the speech are thus the system parameters (for example, poles and zeroes) and the number of output samples taken from each system. This information is quantized, giving the process a bit rate of 20 kilobits/second. The approximate speech is synthesized by implementing each system in turn, applying a pulse to its input, and appending the required number of output samples to the samples from the previous systems. The resulting speech is very noisy, but it is intelligible and speakers can be recognized. A demonstration tape is available from Dr. T. W. Parks of the Electrical Engineering Department. For 8 kHz sampling, the entire analysis and synthesis procedure, written in ALGOL, takes 145 times real time on a Burroughs B-5500 computer. It is estimated that a special-purpose processor would be fast enough to perform it in real time.
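The analysis and synthesis loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the original ALGOL program. The model order `p`, the block length `block_len`, the stopping rule (stop at the first output sample whose absolute error exceeds `threshold`), and the small ridge term added to the normal equations are all hypothetical choices, and quantization of the parameters is omitted. The poles come from a least-squares Prony fit of a denominator A(z) to each block. The zeroes, i.e. a numerator B(z) with as many coefficients as there are poles, are then chosen so that the first unit pulse response samples reproduce the block exactly. Only the standard library is used.

```python
import math


def solve(m, v):
    """Solve the square system m @ x = v (Gaussian elimination, partial pivoting)."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def prony_poles(block, p):
    """Least-squares Prony fit: a_1..a_p minimizing the sum over n >= p of
    (block[n] + sum_k a_k * block[n-k])**2. The poles are the roots of A(z)."""
    rows = range(p, len(block))
    r = [[sum(block[n - i] * block[n - j] for n in rows) for j in range(1, p + 1)]
         for i in range(1, p + 1)]
    rhs = [-sum(block[n] * block[n - i] for n in rows) for i in range(1, p + 1)]
    for i in range(p):
        r[i][i] += 1e-9  # hypothetical ridge term so silent (all-zero) blocks stay solvable
    return solve(r, rhs)


def prony_zeros(block, a):
    """Numerator b chosen so the pulse response equals block[0..len(b)-1]."""
    q = max(len(a), 1)
    return [block[n] + sum(a[k - 1] * block[n - k] for k in range(1, min(n, len(a)) + 1))
            for n in range(q)]


def next_sample(b, a, h):
    """Next output of h[n] = b[n] - sum_k a_k * h[n-k], with b[n] = 0 past its end."""
    n = len(h)
    acc = b[n] if n < len(b) else 0.0
    return acc - sum(a[k - 1] * h[n - k] for k in range(1, min(n, len(a)) + 1))


def pulse_response(b, a, count):
    """First `count` samples of the unit pulse response of B(z)/A(z)."""
    h = []
    for _ in range(count):
        h.append(next_sample(b, a, h))
    return h


def analyze(x, p=4, block_len=32, threshold=0.05):
    """Return (b, a, count) segments; p, block_len and threshold are hypothetical."""
    segments, start = [], 0
    while start < len(x):
        block = x[start:start + block_len]
        order = min(p, len(block) // 2)  # at least as many equations as unknowns
        a = prony_poles(block, order)
        b = prony_zeros(block, a)
        h = pulse_response(b, a, len(b))  # equals the first len(b) samples (up to rounding)
        # Keep the system running until one output sample misses the speech by too much.
        while start + len(h) < len(x):
            nxt = next_sample(b, a, h)
            if abs(nxt - x[start + len(h)]) > threshold:
                break
            h.append(nxt)
        segments.append((b, a, len(h)))
        start += len(h)
    return segments


def synthesize(segments):
    """Pulse each system in turn and concatenate the requested number of outputs."""
    out = []
    for b, a, count in segments:
        out.extend(pulse_response(b, a, count))
    return out


if __name__ == "__main__":
    # Synthetic stand-in for speech: two damped-sinusoid "sounds" back to back.
    def sound(n, r1, w1, r2, w2):
        return r1 ** n * math.cos(w1 * n) + 0.5 * r2 ** n * math.cos(w2 * n)

    x = [sound(n, 0.9, 0.3, 0.8, 1.1) for n in range(100)]
    x += [sound(n, 0.95, 0.7, 0.85, 2.0) for n in range(100)]
    segs = analyze(x)
    y = synthesize(segs)
    print(len(segs), "systems, max error", max(abs(u - v) for u, v in zip(x, y)))
```

Because synthesis reruns exactly the recursions that analysis accepted, every reconstructed sample stays within `threshold` of the input in this unquantized sketch. The noise reported for the real system would come largely from quantizing `b`, `a` and `count` to reach the stated bit rate, which this sketch does not attempt.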