Speech can be characterized as voiced or unvoiced. Voiced speech, such as vowel sounds, occurs when the vocal cords vibrate. In unvoiced speech, such as most consonant sounds, the vocal cords do not vibrate. You can use zero crossings to classify the voiced and unvoiced regions in an audio signal.
Load an audio signal into the MATLAB® workspace. The voice says, "Oak is strong, and also gives shade".
The signal is sampled at 44.1 kHz. Calculate the zero-crossing rate for 10 ms windows using the comparison method.
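To make the windowed computation concrete, here is a minimal, language-neutral sketch in Python. It is not the MATLAB function used in this example. It assumes non-overlapping windows, counts a crossing whenever two adjacent samples have different signs, and normalizes by the window length; MATLAB's exact handling of zeros, window edges, and normalization may differ. The helper name zero_cross_rate and the synthetic two-tone signal are made up for illustration and stand in for the recording.

```python
import math

def zero_cross_rate(x, window_length):
    """Fraction of adjacent-sample sign changes in each non-overlapping window."""
    rates = []
    for start in range(0, len(x) - window_length + 1, window_length):
        frame = x[start:start + window_length]
        # Compare the sign of each sample with the sign of the sample before it.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        rates.append(crossings / window_length)
    return rates

fs = 44100                  # sampling rate stated in the text
win = round(0.010 * fs)     # 10 ms window -> 441 samples

# Synthetic stand-in for the recording (assumption): five windows of a
# 100 Hz tone followed by five windows of a 5 kHz tone.
low = [math.sin(2 * math.pi * 100 * n / fs) for n in range(win * 5)]
high = [math.sin(2 * math.pi * 5000 * n / fs) for n in range(win * 5)]
rate = zero_cross_rate(low + high, win)
```

The low-frequency windows yield a rate close to zero and the high-frequency windows a much larger one, which is the contrast the classification relies on.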
Plot rate to visualize the zero-crossing rate for each window. Voiced speech is expected to have a low crossing rate, while unvoiced speech is expected to have a high crossing rate.
Use a threshold of 0.1 to differentiate between voiced and unvoiced segments. Create a signalMask object that has two categories ("Unvoiced" and "Voiced") and plot the regions of interest (ROIs). Compare the regions of voiced and unvoiced speech to the location of each spoken word.
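The thresholding step can be sketched independently of any toolbox. The Python snippet below is an illustration only, not the signalMask API: it labels each window by comparing its rate with 0.1, then merges consecutive windows with the same label into sample-index regions. The function names label_windows and regions and the per-window rates are hypothetical. Sample indices are 0-based here, whereas MATLAB indexing is 1-based.

```python
def label_windows(rate, threshold=0.1):
    """Label a window "Unvoiced" if its rate exceeds the threshold, else "Voiced"."""
    return ["Unvoiced" if r > threshold else "Voiced" for r in rate]

def regions(labels, window_length):
    """Merge runs of equal labels into (first_sample, last_sample, label) tuples."""
    rois = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            rois.append((start * window_length, i * window_length - 1, labels[start]))
            start = i
    return rois

rate = [0.02, 0.03, 0.25, 0.30, 0.04]   # made-up per-window rates
win = 441                               # 10 ms at 44.1 kHz
rois = regions(label_windows(rate), win)
```

Merging adjacent windows keeps the region list short, so each voiced or unvoiced stretch is one interval that can be compared directly with the word boundaries.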
IBM® Watson Speech to Text API and Audio Toolbox™ software can be used to extract words from an audio file. Load Transcription.mat into the workspace. The labeled signal set contains the audio signal, ROI limits, and labels for each spoken word. For details, see Label Spoken Words in Audio Signals Using External API. Display the spoken words on the plot.