Howto: Speech recognition

Hi there. For those who may want speech recognition capability on OpenWrt, here is how I got it working.

opkg install git
opkg install alsa-utils
opkg install alsa-lib
opkg install portaudio
opkg install ar
opkg install python3-dev

# Make a working directory

mkdir /opt/tmp/
cd /opt/tmp


# Fetch PyAudio source

git clone https://people.csail.mit.edu/hubert/git/pyaudio.git


# Fetch portaudio.h (required for the build)

cd /opt/tmp/pyaudio/src
wget https://raw.githubusercontent.com/EddieRingle/portaudio/master/include/portaudio.h


# build away!

cd /opt/tmp/pyaudio
python3 setup.py build
python3 setup.py install

# done
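To sanity check that PyAudio built and installed correctly, a quick sketch (assuming a python3 interpreter on the device; the device count will vary with your hardware):

```python
import importlib.util

def pyaudio_available():
    """Return True if the pyaudio module can be found by the interpreter."""
    return importlib.util.find_spec("pyaudio") is not None

if __name__ == "__main__":
    if pyaudio_available():
        import pyaudio
        p = pyaudio.PyAudio()
        print("PyAudio OK, %d audio devices found" % p.get_device_count())
        p.terminate()
    else:
        print("PyAudio is not importable - check the build output")
```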

Then you can install the SpeechRecognition package:

pip install SpeechRecognition

You will also need to compile flac (SpeechRecognition uses it to encode audio for the Google API):

opkg install gcc make 

wget https://github.com/xiph/flac/releases/download/1.3.4/flac-1.3.4.tar.xz -O flac.tar.xz
tar -xvf flac.tar.xz
cd flac-1.3.4
chmod +x configure
./configure
make
make install  # installs to /usr/local/bin; copy flac to /usr/bin afterwards (or straight from the build dir)
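To confirm the flac binary ended up somewhere on PATH after copying it, a small sketch using only the standard library:

```python
import shutil

def flac_path():
    """Return the full path of the flac binary found on PATH, or None if missing."""
    return shutil.which("flac")

if __name__ == "__main__":
    path = flac_path()
    if path:
        print("flac found at", path)
    else:
        print("flac not on PATH - copy it to /usr/bin")
```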

After that it will be able to transcribe speech for you.

To test, record a short audio file:

arecord -f S16_LE -d 3 -r 16000  recording.wav

Python code to transcribe it:

#!/usr/bin/env python3

import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile('recording.wav') as source:
    audio = r.record(source, duration=60)

try:
    command = r.recognize_google(audio)
except sr.UnknownValueError:
    command = ""  # nothing intelligible in the recording

with open("Output.txt", "w") as text_file:
    text_file.write(command)

For text to speech you can install espeak:

opkg install espeak
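If you want to drive espeak from Python rather than the shell, a small subprocess wrapper works; a sketch, assuming the stock espeak CLI with its -v voice flag:

```python
import shutil
import subprocess

def speak(text, voice="en"):
    """Build the espeak command line and run it if espeak is installed.
    Returns the argv list so callers can inspect or log it."""
    argv = ["espeak", "-v", voice, text]
    if shutil.which("espeak"):
        subprocess.run(argv)
    return argv

if __name__ == "__main__":
    speak("hello from openwrt")
```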

or use this Python code:

# pip install gTTS
# opkg install mpg123

from gtts import gTTS
import os
import sys

# the text to speak comes from the first command line argument
s = sys.argv[1]
file = "audio.mp3"

# initialize tts, create the mp3 and play it
tts = gTTS(s, tld='com')
tts.save(file)
os.system("mpg123 " + file)

For offline decoding you can install kaldi. The docker container would be the easiest, but it uses 5 GB of memory, so you need a fairly big OpenWrt device.
Then you would interface with it using this code:

#!/usr/bin/env python3

import asyncio
import websockets
import sys
import wave

async def run_test(uri):
    async with websockets.connect(uri) as websocket:

        wf = wave.open(sys.argv[1], "rb")
        await websocket.send('{ "config" : { "sample_rate" : %d } }' % (wf.getframerate()))
        buffer_size = int(wf.getframerate() * 0.2) # 0.2 seconds of audio
        while True:
            data = wf.readframes(buffer_size)

            if len(data) == 0:
                break

            await websocket.send(data)
            print(await websocket.recv())

        await websocket.send('{"eof" : 1}')
        print(await websocket.recv())

asyncio.run(run_test('ws://localhost:2700'))

Example shell script using the above Python saved as kaldi.py:

arecord -f S16_LE -d 3 -r 8000  /tmp/test-mic.wav > /dev/null 2>&1
python kaldi.py /tmp/test-mic.wav | jq '.text'  | sed 's/\"\"//g' | sed 's/\<null\>//g' | sed -r '/^\s*$/d'
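If you would rather not depend on jq and sed, the same filtering can be done in Python: the server replies with JSON lines, where partial results carry a "partial" key and final results a "text" key, so you just keep the non-empty "text" fields. A sketch under that assumption:

```python
import json

def extract_text(lines):
    """Pull non-empty "text" fields out of the JSON lines the server returns,
    skipping partial results and empty/null transcriptions."""
    out = []
    for line in lines:
        try:
            msg = json.loads(line)
        except ValueError:
            continue  # not JSON, ignore
        text = msg.get("text") or ""
        if text:
            out.append(text)
    return out

if __name__ == "__main__":
    sample = ['{"partial": "turn on"}', '{"text": "turn on the light"}', '{"text": ""}']
    print(extract_text(sample))  # ['turn on the light']
```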

This script will record 3 seconds of audio, then transcribe it via kaldi and output the text.

I was hoping to do julius, as it has the smallest footprint, but that is still a work in progress.
Or perhaps, if I can somehow figure out how to install pytorch on OpenWrt,
I could install openai-whisper.