Getting Started With Speech Recognition And Python
Solution 1:
UPDATE: This no longer works because Google closed its platform.
--
You can use pygsr (https://pypi.python.org/pypi/pygsr):
$> pip install pygsr
Example usage:
from pygsr import Pygsr
speech = Pygsr()
# duration in seconds
speech.record(3)
# select the language
phrase, complete_response = speech.speech_to_text('en_US')
print(phrase)
Solution 2:
If you really want to understand speech recognition from the ground up, look for a good signal processing package for Python and then read up on speech recognition independently of the software.
But speech recognition is an extremely complex problem, basically because sounds interact in all sorts of ways when we talk. Even if you start with the best speech recognition library you can get your hands on, you will still have plenty of work left to do.
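To give a feel for what starting from the ground up looks like, here is a minimal sketch of two classic low-level features, short-time energy and zero-crossing rate, computed over fixed-size frames of a signal. It uses only the standard library. The synthetic signal, the frame sizes, the 0.1 threshold, and all function names are illustrative assumptions, not part of any particular package.

```python
import math


def frame_signal(samples, frame_len, hop_len):
    """Split samples into overlapping frames (a trailing partial frame is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop_len)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Synthetic input: 0.1 s of silence followed by 0.1 s of a 440 Hz tone at 8 kHz.
SAMPLE_RATE = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(800)]
signal = silence + tone

frames = frame_signal(signal, frame_len=200, hop_len=100)  # 25 ms frames, 12.5 ms hop
energies = [short_time_energy(f) for f in frames]
rates = [zero_crossing_rate(f) for f in frames]

# Crude activity detection: 0.1 is an arbitrary threshold chosen for this toy signal.
active = [e > 0.1 for e in energies]
```

On this toy signal the energy threshold separates the silent frames from the tone. Real speech is much messier than a pure tone, which is exactly why the rest of the pipeline is so hard.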
Solution 3:
Pocketsphinx is also a good alternative. There are Python bindings provided through SWIG that make it easy to integrate in a script.
For example:
from os import path
from itertools import islice
from pocketsphinx import *
from sphinxbase import *

MODELDIR = "../../../model"
DATADIR = "../../../test/data"

# Create a decoder with a certain model.
config = Decoder.default_config()
config.set_string('-hmm', path.join(MODELDIR, 'hmm/en_US/hub4wsj_sc_8k'))
config.set_string('-lm', path.join(MODELDIR, 'lm/en_US/hub4.5000.DMP'))
config.set_string('-dict', path.join(MODELDIR, 'lm/en_US/hub4.5000.dic'))
decoder = Decoder(config)

# Decode a static file.
decoder.decode_raw(open(path.join(DATADIR, 'goforward.raw'), 'rb'))

# Retrieve the hypothesis.
hypothesis = decoder.hyp()
print('Best hypothesis: ', hypothesis.best_score, hypothesis.hypstr)
print('Best hypothesis segments: ', [seg.word for seg in decoder.seg()])

# Access the N best decodings.
print('Best 10 hypotheses: ')
for best in islice(decoder.nbest(), 10):
    print(best.hyp().best_score, best.hyp().hypstr)

# Decode streaming data.
decoder = Decoder(config)
decoder.start_utt('goforward')
stream = open(path.join(DATADIR, 'goforward.raw'), 'rb')
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
    else:
        break
decoder.end_utt()
print('Stream decoding result:', decoder.hyp().hypstr)
Solution 4:
I know the question is old, but for future readers:
I use the speech_recognition module and I love it. The only catch is that it requires an Internet connection because it uses Google to recognize the speech, but that shouldn't be a problem in most cases. The recognition works almost perfectly.
EDIT:
The speech_recognition package can use more than just Google to transcribe speech, including CMU Sphinx (which allows offline recognition), among others. The only difference is a subtle change in the recognize command (for example, recognize_sphinx instead of recognize_google):
https://pypi.python.org/pypi/SpeechRecognition/
Here is a small code example:
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:  # use the default microphone as the audio source
    audio = r.listen(source)  # listen for the first phrase and extract it into audio data

try:
    # recognize speech using Google Speech Recognition - ONLINE
    print("You said " + r.recognize_google(audio))
    # recognize speech using CMU Sphinx - OFFLINE
    print("You said " + r.recognize_sphinx(audio))
except sr.UnknownValueError:  # speech is unintelligible
    print("Could not understand audio")
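Since the engines differ only in which recognize method gets called, one possible pattern is to try the online engine first and fall back to the offline one. The following is a rough sketch under that assumption. The names transcribe_with_fallback and listen_once are hypothetical helpers, not part of the library, and the broad `except Exception` is a simplification: in real code you would probably narrow it to sr.UnknownValueError and sr.RequestError. Note that recognize_sphinx also needs the pocketsphinx package installed.

```python
def transcribe_with_fallback(recognizer, audio,
                             methods=("recognize_google", "recognize_sphinx")):
    """Try each recognizer method in order and return (method_name, text).

    `recognizer` is expected to be an sr.Recognizer and `audio` an sr.AudioData,
    but any object exposing the named methods will work.
    """
    errors = {}
    for name in methods:
        try:
            return name, getattr(recognizer, name)(audio)
        except Exception as exc:  # deliberately broad for this sketch
            errors[name] = exc    # remember why this engine failed
    raise RuntimeError("all recognizers failed: %r" % errors)


def listen_once():
    """Record one phrase from the default microphone and transcribe it."""
    # Imported lazily so the helper above has no hard dependency.
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)
    return transcribe_with_fallback(r, audio)
```

Calling listen_once() returns a pair such as ("recognize_google", "...") so you can see which engine actually produced the text.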
There is just one thing that doesn't work well for me: listening in an infinite loop. After a few minutes it hangs. (It doesn't crash; it just stops responding.)
EDIT: If you want to use the microphone without the infinite loop, you should specify the recording length. Example code:
import speech_recognition as sr

time_to_record = 5  # maximum recording length in seconds (example value)

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Speak:")
    audio = r.listen(source, timeout=None, phrase_time_limit=time_to_record)  # recording
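If you do need a loop, one option is to bound every call to listen so that no single iteration can block indefinitely. This is only a sketch: listen_in_loop and handle_audio are hypothetical names, the default limits are arbitrary, and whether this avoids the hang described above has not been confirmed. It relies on listen raising sr.WaitTimeoutError when nobody starts speaking within `timeout` seconds.

```python
def listen_in_loop(handle_audio, max_phrases, timeout=5, phrase_time_limit=10):
    """Capture up to `max_phrases` phrases, passing each one to `handle_audio`.

    Returns the number of phrases captured.
    """
    # Imported lazily; needs the SpeechRecognition package and a working microphone.
    import speech_recognition as sr

    r = sr.Recognizer()
    captured = 0
    with sr.Microphone() as source:
        while captured < max_phrases:
            try:
                # Stop waiting after `timeout` seconds without speech, and cut
                # any single phrase off after `phrase_time_limit` seconds.
                audio = r.listen(source, timeout=timeout,
                                 phrase_time_limit=phrase_time_limit)
            except sr.WaitTimeoutError:
                continue  # nobody spoke in time; go around again
            handle_audio(audio)
            captured += 1
    return captured
```

For example, listen_in_loop(lambda a: print(r.recognize_google(a)), max_phrases=10) would transcribe ten phrases and then return, given a Recognizer named r in scope.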
Solution 5:
For those who want to get deeper into the subject of speech recognition in Python, here are some links:
- http://www.slideshare.net/mchua/sigproc-selfstudy-17323823 - signal processing in Python, including audio signals, which are the most interesting to play with.