hi there, for those who may want speech recognition capability on OpenWrt, here is how I got it working
opkg install git
opkg install alsa-utils
opkg install alsa-lib
opkg install portaudio
opkg install ar
opkg install python3-dev
# Make a working directory
mkdir /opt/tmp/
cd /opt/tmp
# Fetch PyAudio source
git clone https://people.csail.mit.edu/hubert/git/pyaudio.git
# Fetch portaudio.h (required for the build)
cd /opt/tmp/pyaudio/src
wget https://raw.githubusercontent.com/EddieRingle/portaudio/master/include/portaudio.h
# build away!
cd /opt/tmp/pyaudio
python3 setup.py build
python3 setup.py install
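To confirm the build actually produced an importable module, here is a quick stdlib-only sketch (the `module_available` helper is just an illustration, not part of PyAudio):

```python
import importlib.util

def module_available(name):
    # True if the named module can be imported by this interpreter,
    # e.g. module_available("pyaudio") after the build above.
    return importlib.util.find_spec(name) is not None
```

Run it with something like `python3 -c` on the router; if it returns False for "pyaudio", the setup.py install step did not land where your interpreter looks.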
# done
then you can install the SpeechRecognition package
pip install SpeechRecognition
you will also need to compile flac (SpeechRecognition shells out to the flac binary to encode audio)
opkg install gcc make
wget https://github.com/xiph/flac/releases/download/1.3.4/flac-1.3.4.tar.xz -O flac.tar.xz
tar -xvf flac.tar.xz
cd flac-1.3.4
chmod +x configure
./configure
make
make install # installs to /usr/local/bin; just copy flac to /usr/bin afterwards, or run it manually from the build dir
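Since SpeechRecognition invokes the external flac binary, you can sanity-check from Python that it ended up on PATH; a small stdlib-only sketch (the `have_flac` name is just for illustration):

```python
import shutil

def have_flac():
    # SpeechRecognition calls out to the `flac` encoder; make sure the
    # binary you built is visible on PATH (e.g. copied into /usr/bin).
    return shutil.which("flac") is not None
```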
after that it will transcribe speech for you
to test, record an audio file
arecord -f S16_LE -d 3 -r 16000 recording.wav
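Before feeding the file to a recognizer, it can help to verify that arecord really wrote 16-bit 16 kHz audio; a minimal stdlib sketch (`wav_params` is an illustrative helper, not part of any library):

```python
import wave

def wav_params(path):
    # Return (channels, sample_width_bytes, frame_rate) so you can check
    # the file matches what arecord was asked for (-f S16_LE -r 16000).
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate()
```

For the arecord command above you would expect something like (1, 2, 16000).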
python code to transcribe it
#!/usr/bin/env python3
import speech_recognition as sr

r = sr.Recognizer()
# load the recording and transcribe it with the Google Web Speech API
with sr.AudioFile('recording.wav') as source:
    audio = r.record(source, duration=60)
command = r.recognize_google(audio)
# save the transcript
with open("Output.txt", "w") as text_file:
    text_file.write(command)
for text to speech you can install espeak
opkg install espeak
or use python code
# pip install gTTS
# opkg install mpg123
from gtts import gTTS
import os
import sys
# define variables
s = sys.argv[1]
file = "audio.mp3"
# initialize tts, create mp3 and play
tts = gTTS(s, lang='en')
tts.save(file)
os.system("mpg123 " + file)
for offline decoding
you can install Kaldi; a Docker container (e.g. the Vosk Kaldi server image) would be the easiest, but it uses about 5 GB of memory, so you need a fairly big OpenWrt device
then you would interface with it using this code
#!/usr/bin/env python3
import asyncio
import sys
import wave

import websockets

async def run_test(uri):
    async with websockets.connect(uri) as websocket:
        wf = wave.open(sys.argv[1], "rb")
        # tell the server the sample rate, then stream the audio in chunks
        await websocket.send('{ "config" : { "sample_rate" : %d } }' % (wf.getframerate()))
        buffer_size = int(wf.getframerate() * 0.2)  # 0.2 seconds of audio
        while True:
            data = wf.readframes(buffer_size)
            if len(data) == 0:
                break
            await websocket.send(data)
            print(await websocket.recv())
        await websocket.send('{"eof" : 1}')
        print(await websocket.recv())

asyncio.run(run_test('ws://localhost:2700'))
example script using the above python saved as kaldi.py
arecord -f S16_LE -d 3 -r 8000 /tmp/test-mic.wav > /dev/null 2>&1
python kaldi.py /tmp/test-mic.wav | jq '.text' | sed 's/\"\"//g' | sed 's/\<null\>//g' | sed -r '/^\s*$/d'
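If you'd rather not depend on jq and sed, the same cleanup (keep only the non-empty "text" fields from the server's JSON replies, dropping partial results) can be sketched in Python; `extract_text` is just an illustrative helper:

```python
import json

def extract_text(lines):
    # Keep only final-result "text" fields, skipping partial results,
    # empty strings, and anything that isn't a JSON object.
    out = []
    for line in lines:
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(msg, dict):
            continue
        text = msg.get("text")
        if text:
            out.append(text)
    return out
```

You could feed it the stdout lines of kaldi.py and join the result into one transcript.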
this script will record 3 seconds of audio, transcribe it via Kaldi, and print the text
I was hoping to do Julius, since it has the smallest footprint, but that is still a work in progress
or perhaps, if I can somehow figure out how to install PyTorch on OpenWrt,
I could install openai-whisper