
Speech-to-Text streaming demo in React




Challenge:
allow the user to speak into the microphone and convert speech to text using ML
Solution:
prepare a React component using @aws-sdk/client-transcribe-streaming

Transcribing live-streamed audio to text has become more and more popular. It’s useful for preparing subtitles or archiving conversations as text. ASR (automatic speech recognition) uses advanced machine learning to analyze the context of speech and return text data.

ASR Demo

In this example, we’re going to create a React component that can be reused in your application. It uses the AWS SDK Client Transcribe Streaming package to connect to the Amazon Transcribe service over a WebSocket. The animated GIF ASR-streaming-demo.gif presents what we are going to build.

AWS Transcribe Streaming DEMO application window

Process an audio file or a live stream

There are two modes we can use: uploading an audio file, which is added as a transcription job that we then poll for results, or live streaming over a WebSocket, where the response is nearly instant. This demo focuses on streaming audio, so we can see recognized text returned live from the API.
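For comparison, the batch mode (not covered further in this article) would submit a job through the regular Transcribe API rather than the streaming one. A minimal sketch of what the job parameters could look like; the bucket and file names below are placeholders of our own, not part of the demo:

```typescript
// Hypothetical parameters for a batch transcription job. An object of this
// shape would be passed to StartTranscriptionJobCommand from the
// @aws-sdk/client-transcribe package (the non-streaming client).
const batchJobParams = {
  TranscriptionJobName: 'demo-job-1',   // must be unique per account/region
  LanguageCode: 'en-US',                // same language codes as streaming
  MediaFormat: 'wav',                   // format of the uploaded file
  Media: {
    // Placeholder S3 URI for the uploaded recording
    MediaFileUri: 's3://example-bucket/audio/demo.wav',
  },
};

// With streaming (this demo) results arrive live; with a batch job you would
// poll the job status until it reports COMPLETED and then fetch the transcript.
console.log(batchJobParams.TranscriptionJobName);
```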

ASR in your language

In the config file we can specify the language code for our audio conversation. The most popular language, English, uses the language code ‘en-US’. AWS Transcribe currently supports over 30 languages; see the full list at: https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html

Audio data format

To achieve good speech-to-text results, we need to send audio to the AWS Transcribe API in the proper format. It expects audio encoded as PCM data. The sample rate is also important: better voice quality means better recognition results. Currently, ‘en-US’ supports sample rates up to 48,000 Hz, and this value proved optimal during our tests.
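As a quick sanity check on bandwidth: 16-bit PCM means 2 bytes per sample, so the raw byte rate is easy to estimate. A small illustration (this helper is ours, not part of the demo code):

```typescript
// Bytes per second of raw 16-bit PCM audio: sampleRate * channels * 2 bytes.
function pcmByteRate(sampleRate: number, channels: number): number {
  const bytesPerSample = 2; // 16-bit samples
  return sampleRate * channels * bytesPerSample;
}

// Mono audio at the 48,000 Hz used in this demo:
console.log(pcmByteRate(48000, 1)); // 96000 bytes/s, i.e. roughly 94 KiB per second
```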

Recording audio as base64

As an additional feature, we’ve implemented saving audio as a base64-encoded audio file. The RecordRTC library uses the browser MediaRecorder API to record voice from the microphone. The received Blob is converted to base64, which can easily be saved as an archived conversation or, optionally, sent to S3 storage.
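The component shown later obtains the base64 data URI via FileReader.readAsDataURL. For illustration, the same URI format can be assembled by hand from a MIME type and a base64 payload (the helper name is ours, for illustration only):

```typescript
// Build a data URI of the same shape FileReader.readAsDataURL produces:
// "data:<mime>;base64,<payload>". Hypothetical helper, not part of the demo.
function makeDataUri(mimeType: string, base64Payload: string): string {
  return `data:${mimeType};base64,${base64Payload}`;
}

// A tiny webm-flavoured example; the payload is just base64 of the word "demo".
const uri = makeDataUri('audio/webm', 'ZGVtbw==');
console.log(uri); // data:audio/webm;base64,ZGVtbw==
```

A string like this can be used directly as the `src` of an `<audio>` element, which is exactly how the demo plays back the recorded message.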

Audio saved in Chrome is missing duration

The Chrome browser has a bug identified back in 2016: a file saved using MediaRecorder has malformed metadata, which causes the played file to report an incorrect length (duration). As a result, a file recorded in Chrome is not seekable: webm and weba files can be played from the beginning, but seeking through them is difficult or impossible.

The issue was reported at https://bugs.chromium.org/p/chromium/issues/detail?id=642012 but has not been fixed yet. There are some workarounds, for example using the ts-ebml Reader to repair the metadata section of the file. To fix the missing duration in Chrome, we’re using the injectMetadata method.

Transcription confidence scores

While doing live speech-to-text recognition, AWS returns a confidence score between 0 and 1. It’s not an accuracy measurement, but rather the service’s self-evaluation of how well it may have transcribed a word. With this value we can set a confidence threshold and decide which text data should be saved.
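As a sketch of how such a threshold could be applied: each alternative in a Transcribe result carries Items with per-word Confidence values, so low-confidence words can be dropped. The simplified types below mirror only the fields we use, not the full SDK response:

```typescript
// Simplified shape of a transcript item; the real SDK type has more fields.
interface TranscriptItem {
  Content: string;
  Confidence?: number; // 0..1; punctuation items carry no confidence
}

// Keep only words whose confidence meets the threshold; items without a
// confidence value (e.g. punctuation) are kept as-is.
function filterByConfidence(items: TranscriptItem[], threshold: number): string {
  return items
    .filter((item) => item.Confidence === undefined || item.Confidence >= threshold)
    .map((item) => item.Content)
    .join(' ');
}

const sample: TranscriptItem[] = [
  { Content: 'hello', Confidence: 0.98 },
  { Content: 'wrold', Confidence: 0.41 }, // low-confidence misrecognition
  { Content: 'world', Confidence: 0.95 },
];
console.log(filterByConfidence(sample, 0.8)); // "hello world"
```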

In the presented demo, the input text background turns green only when the received data is not partial (AWS has finished analyzing the text and is confident in the result). The attached screenshot shows “partial results”; only when the sentence is finished will the transcription match the audio.

Transcription window preview

Configure AWS Transcribe

To start using streaming we need to obtain an accessKey and secretAccessKey, and choose the AWS region. The configuration is set up in src/SpeechToText/transcribe.constants.ts

The main application is just a text area and a microphone icon. After clicking the icon, React connects to the Transcribe WebSocket and starts analyzing the voice. After clicking Pause, an audio element appears with autoplay enabled; its source (src) is the base64 URI of the just-recorded voice message.

// src/App.js
import React, { useState, useEffect } from 'react';
import './App.css';
import TextField from '@material-ui/core/TextField';
import StreamingView from './SpeechToText/StreamingView';
import {
  BrowserRouter as Router,
  Switch,
  Route,
} from 'react-router-dom';
function App() {
  // eslint-disable-next-line
  const [inputMessageText, setInputMessageText] = useState("");
  // eslint-disable-next-line
  const [recordedAudio, setRecordedAudio] = useState(null);
  useEffect(() => {
     if(recordedAudio){
       console.log("recorded!");
       console.log(recordedAudio);
     }
  }, [recordedAudio]);
  return (
    <div className="App">
      <Router>
        <Switch>
          <Route path="/">
            <h1>AWS Transcribe Streaming DEMO</h1>
            <TextField
                variant="outlined"
                placeholder="Transcribe results"
                minRows={10}
                value={inputMessageText}
                InputProps={{ readOnly: true }}
                multiline
                maxRows={Infinity}
                id="input1"
            />
            <StreamingView setInputMessageText={setInputMessageText} setRecordedAudio={setRecordedAudio} />
            { recordedAudio && <p>Recorded audio (base64 URI):</p> }
            { recordedAudio && <audio src={recordedAudio.data.audioRecorded} autoPlay controls /> }
          </Route>
        </Switch>
      </Router>
    </div>
  );
}
export default App;

And some additional CSS styles:

/* src/App.css */
.App {
  text-align: center;
}
.is-final-recognized .MuiTextField-root{
  animation: ctcompleted 1s 1;
}
.is-recognizing .MuiTextField-root{
  background:rgba(0,0,0,.05);
}
@keyframes ctcompleted
{
  0%      {background:#dcedc8;}
  25%     {background:#dcedc8;}
  75%     {background:#dcedc8;}
  100%    {background:inherit;}
}

Transcribe API keys

The previously generated AWS API keys are hardcoded in the config object. Note that this is acceptable only for a local demo; in a real application, keep credentials out of the client-side bundle.

// src/SpeechToText/transcribe.constants.ts
const transcribe = {
  accessKey: 'AAA',
  secretAccessKey: 'BBB',
  // default config
  language: 'en-US',
  region: 'eu-west-1',
  sampleRate: 48000,
  vocabularyName: '',
};
export default transcribe;

StreamingView React component

The reusable component for audio streaming receives text from the API and passes the recorded audio to its parent. It’s written in TypeScript; the icons are imported from material-ui.

// src/SpeechToText/StreamingView.tsx
import React, { useEffect, useMemo, useState } from 'react';
import IconButton from '@material-ui/core/IconButton';
import KeyboardVoiceIcon from '@material-ui/icons/KeyboardVoice';
import PauseIcon from '@material-ui/icons/Pause';
import TranscribeController from './transcribe.controller';
import { setBodyClassName } from './helpers';
import transcribe from "./transcribe.constants";
const StreamingView: React.FC<{
    componentName: 'StreamingView';
    setInputMessageText: (arg1: string) => void;
    setRecordedAudio: (arg1: any) => void;
}> = ({setInputMessageText, setRecordedAudio}) => {
    const [transcribeConfig] = useState(transcribe);
    const [recognizedTextArray, setRecognizedTextArray] = useState<string[]>([]);
    const [recognizingText, setRecognizingText] = useState<string>('');
    const [started, setStarted] = useState(false);
    const transcribeController = useMemo(() => new TranscribeController(), []);
    useEffect(() => {
        transcribeController.setConfig(transcribeConfig);
        setStarted(false);
    }, [transcribeConfig, transcribeController]);
    useEffect(() => {
        const display = ({ text, final }: { text: string; final: boolean }) => {
            // debug
            console.log(text);
            if (final) {
                setRecognizingText('');
                setRecognizedTextArray((prevTextArray) => [...prevTextArray, text]);
                setBodyClassName("is-recognizing","is-final-recognized");
            } else {
                setBodyClassName("is-final-recognized","is-recognizing");
                setRecognizingText(text);
            }
        };
        // @ts-ignore
        const getAudio = ({aaa}: { aaa: Blob}) => {
            let customObj = {};
            if(aaa.type){
                // @ts-ignore
                customObj.audioType = aaa.type;
            }
            // convert Blob to base64 uri
            let reader = new FileReader();
            reader.readAsDataURL(aaa);
            reader.onloadend = () => {
                if(reader.result){
                    // @ts-ignore
                    customObj.audioRecorded = reader.result.toString();
                    setRecordedAudio({name: "audioRecorded", data: customObj});
                }
            }
        }
        transcribeController.on('recognized', display);
        transcribeController.on('newAudioRecorded', getAudio);
        return () => {
            transcribeController.removeListener('recognized', display);
            transcribeController.removeListener('newAudioRecorded', getAudio);
        };
    }, [transcribeController, setRecordedAudio]);
    useEffect(() => {
        (async () => {
            if (started) {
                setRecognizedTextArray([]);
                setRecognizingText('');
                setRecordedAudio(null);
                await transcribeController.init().catch((error: Error) => {
                    console.log(error);
                    setStarted(false);
                });
            } else {
                await transcribeController.stop();
            }
        })();
    }, [started, transcribeController, setRecordedAudio]);
    useEffect(() => {
        const currentRecognizedText = [...recognizedTextArray, recognizingText].join(' ');
        setInputMessageText(currentRecognizedText);
    }, [recognizedTextArray, recognizingText, setInputMessageText]);
    return (<>
            <IconButton onClick={() => {
                setStarted(!started);
            }}>
                {! started ? <KeyboardVoiceIcon/> : <PauseIcon />}
            </IconButton>
        </>
    );
};
export default StreamingView;

Transcribe Streaming Client

The main part of the application is the controller, where communication between AWS Transcribe and the client is established. The stream sends PCM-encoded audio and receives partial results over the WebSocket. RecordRTC records audio in the background using the native MediaRecorder API, which is supported by all modern browsers.

// src/SpeechToText/transcribe.controller.ts
import {
    TranscribeStreamingClient,
    StartStreamTranscriptionCommand,
    StartStreamTranscriptionCommandOutput,
} from '@aws-sdk/client-transcribe-streaming';
import MicrophoneStream from 'microphone-stream';
import { PassThrough } from 'stream';
import { EventEmitter } from 'events';
import transcribeConstants from './transcribe.constants';
import { streamAsyncIterator, EncodePcmStream } from './helpers';
import { Decoder, tools, Reader } from 'ts-ebml';
import RecordRTC  from 'recordrtc';
class TranscribeController extends EventEmitter {
    private audioStream: MicrophoneStream | null;
    private rawMediaStream: MediaStream | null | any;
    private audioPayloadStream: PassThrough | null;
    private transcribeConfig?: typeof transcribeConstants;
    private client?: TranscribeStreamingClient;
    private started: boolean;
    private mediaRecorder: RecordRTC | null;
    private audioBlob: Blob | null;
    constructor() {
        super();
        this.audioStream = null;
        this.rawMediaStream = null;
        this.audioPayloadStream = null;
        this.started = false;
        this.mediaRecorder = null;
        this.audioBlob = null;
    }
    setAudioBlob(Blob: Blob | null){
        this.audioBlob = Blob;
        const aaa = this.audioBlob;
        this.emit('newAudioRecorded', {aaa});
    }
    hasConfig() {
        return !!this.transcribeConfig;
    }
    setConfig(transcribeConfig: typeof transcribeConstants) {
        this.transcribeConfig = transcribeConfig;
    }
    validateConfig() {
        if (
            !this.transcribeConfig?.accessKey ||
            !this.transcribeConfig.secretAccessKey
        ) {
            throw new Error(
                'missing required config: access key and secret access key are required',
            );
        }
    }
    recordAudioData = async (stream: MediaStream) =>{
        this.mediaRecorder = new RecordRTC(stream,
            {
                type: "audio",
                disableLogs: true,
            });
        this.mediaRecorder.startRecording();
        // @ts-ignore
        this.mediaRecorder.stream = stream;
        return stream;
    }
    stopRecordingCallback = () => {
        // @ts-ignore
        this.injectMetadata(this.mediaRecorder.getBlob())
            // @ts-ignore
            .then(seekableBlob => {
                this.setAudioBlob(seekableBlob);
                // @ts-ignore - stop all tracks; MediaStream.stop() itself was removed from browsers
                this.mediaRecorder.stream.getTracks().forEach((track: MediaStreamTrack) => track.stop());
                // @ts-ignore
                this.mediaRecorder.destroy();
                this.mediaRecorder = null;
            })
    }
    readAsArrayBuffer = (blob: Blob) => {
        return new Promise((resolve, reject) => {
            const reader = new FileReader();
            reader.readAsArrayBuffer(blob);
            reader.onloadend = () => { resolve(reader.result); };
            reader.onerror = (ev) => {
                // @ts-ignore
                reject(ev.error);
            };
        });
    }
    injectMetadata = async (blob: Blob) => {
        const decoder = new Decoder();
        const reader = new Reader();
        reader.logging = false;
        reader.drop_default_duration = false;
        return this.readAsArrayBuffer(blob)
            .then(buffer => {
                // fix for Firefox
                if(! blob.type.includes('webm')){
                    return blob;
                }
                // @ts-ignore
                const elms = decoder.decode(buffer);
                elms.forEach((elm) => { reader.read(elm); });
                reader.stop();
                const refinedMetadataBuf =
                    tools.makeMetadataSeekable(reader.metadatas, reader.duration, reader.cues);
                // @ts-ignore
                const body = buffer.slice(reader.metadataSize);
                return new Blob([ refinedMetadataBuf, body ], { type: blob.type });
            });
    }
    async init() {
        this.started = true;
        if (!this.transcribeConfig) {
            throw new Error('transcribe config is not set');
        }
        this.validateConfig();
        this.audioStream = new MicrophoneStream();
        this.rawMediaStream = await window.navigator.mediaDevices.getUserMedia({
            video: false,
            audio: {
                sampleRate: this.transcribeConfig.sampleRate,
            },
        }).then(this.recordAudioData, this.microphoneAccessError)
            .catch(function(err) {
                console.log(err);
            });
        await this.audioStream.setStream(this.rawMediaStream);
        this.audioPayloadStream = this.audioStream
            .pipe(new EncodePcmStream())
            .pipe(new PassThrough({ highWaterMark: 1 * 1024 }));
        // creating and setting up transcribe client
        const config = {
            region: this.transcribeConfig.region,
            credentials: {
                accessKeyId: this.transcribeConfig.accessKey,
                secretAccessKey: this.transcribeConfig.secretAccessKey,
            },
        };
        this.client = new TranscribeStreamingClient(config);
        const command = new StartStreamTranscriptionCommand({
            LanguageCode: this.transcribeConfig.language,
            MediaEncoding: 'pcm',
            MediaSampleRateHertz: this.transcribeConfig.sampleRate,
            AudioStream: this.audioGenerator.bind(this)(),
        });
        try {
            const response = await this.client.send(command);
            this.onStart(response);
        } catch (error) {
            if (error instanceof Error) {
                console.log(error.message);
            }
        }
    }
    microphoneAccessError = (error:any) => {
        console.log(error);
    }
    async onStart(response: StartStreamTranscriptionCommandOutput) {
        try {
            if (response.TranscriptResultStream) {
                for await (const event of response.TranscriptResultStream) {
                    const results = event.TranscriptEvent?.Transcript?.Results;
                    if (results && results.length > 0) {
                        const [result] = results;
                        const final = !result.IsPartial;
                        const alternatives = result.Alternatives;
                        if (alternatives && alternatives.length > 0) {
                            const [alternative] = alternatives;
                            const text = alternative.Transcript;
                            this.emit('recognized', { text, final });
                        }
                    }
                }
            }
        } catch (error) {
            console.log(error);
        }
    }
    async stop() {
        this.started = false;
        // request to stop recognition
        this.audioStream?.stop();
        this.audioStream = null;
        this.rawMediaStream = null;
        this.audioPayloadStream?.removeAllListeners();
        this.audioPayloadStream?.destroy();
        this.audioPayloadStream = null;
        this.client?.destroy();
        this.client = undefined;
        // @ts-ignore
        if(this.mediaRecorder){
            this.mediaRecorder.stopRecording(this.stopRecordingCallback);
        }
    }
    async *audioGenerator() {
        if (!this.audioPayloadStream) {
            throw new Error('payload stream not created');
        }
        for await (const chunk of streamAsyncIterator(this.audioPayloadStream)) {
            if (this.started) {
                yield { AudioEvent: { AudioChunk: chunk } };
            } else {
                break;
            }
        }
    }
}
export default TranscribeController;

Helpers for audio encoding

Additional methods for manipulating audio are defined in helpers.ts. It also includes a function for changing the DOM body className.

// src/SpeechToText/helpers.ts
/* eslint-disable no-await-in-loop */
/* eslint-disable @typescript-eslint/no-explicit-any */
import { PassThrough } from 'stream';
import { Transform, TransformCallback } from 'stream';
import MicrophoneStream from 'microphone-stream';
export function mapRoute(text: string) {
    return `${text}-section`;
}
export async function* fromReadable(stream: PassThrough) {
    let exhausted = false;
    const onData = () =>
        new Promise((resolve) => {
            stream.once('data', (chunk: any) => {
                resolve(chunk);
            });
        });
    try {
        while (true) {
            const chunk = (await onData()) as any;
            if (chunk === null) {
                exhausted = true;
                break;
            }
            yield chunk;
        }
    } finally {
        if (!exhausted) {
            stream.destroy();
        }
    }
}
export function streamAsyncIterator(stream: PassThrough) {
    // Get a lock on the stream:
    //   const reader = stream.getReader();
    return {
        [Symbol.asyncIterator]() {
            return fromReadable(stream);
        },
    };
}
/**
 * encodePcm
 */
export function encodePcm(chunk: any) {
    const input = MicrophoneStream.toRaw(chunk);
    let offset = 0;
    const buffer = new ArrayBuffer(input.length * 2);
    const view = new DataView(buffer);
    for (let i = 0; i < input.length; i++, offset += 2) {
        const s = Math.max(-1, Math.min(1, input[i]));
        view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
    }
    return Buffer.from(buffer);
}
export class EncodePcmStream extends Transform {
    _transform(chunk: any, encoding: string, callback: TransformCallback) {
        const buffer = encodePcm(chunk);
        this.push(buffer);
        callback();
    }
}
export const setBodyClassName = (removeClass:string, addClass:string) => {
    let body = document.getElementsByTagName('body')[0];
    if(removeClass){
        body.classList.remove(removeClass);
    }
    if(addClass){
        body.classList.add(addClass);
    }
}

Used NPM libraries

The app uses a number of dependencies. The most important ones are:

"@aws-sdk/client-transcribe-streaming": "^3.3.0",
"microphone-stream": "^6.0.1",
"react": "^17.0.2",
"recordrtc": "^5.6.2",
"ts-ebml": "^2.0.2",
"typescript": "^4.5.2",

Below is the full list of npm libraries used (package.json):

"dependencies": {
  "@aws-sdk/client-transcribe-streaming": "^3.3.0",
  "@babel/core": "7.9.0",
  "@material-ui/core": "^4.9.8",
  "@material-ui/icons": "^4.9.1",
  "@types/node": "^12.20.37",
  "@types/react": "^17.0.37",
  "@types/recordrtc": "^5.6.8",
  "@typescript-eslint/eslint-plugin": "^2.10.0",
  "@typescript-eslint/parser": "^2.10.0",
  "babel-eslint": "10.1.0",
  "babel-jest": "^24.9.0",
  "babel-loader": "8.1.0",
  "babel-plugin-named-asset-import": "^0.3.6",
  "babel-preset-react-app": "^9.1.2",
  "camelcase": "^5.3.1",
  "case-sensitive-paths-webpack-plugin": "2.3.0",
  "css-loader": "3.4.2",
  "dotenv": "8.2.0",
  "dotenv-expand": "5.1.0",
  "eslint": "^6.6.0",
  "eslint-config-react-app": "^5.2.1",
  "eslint-loader": "3.0.3",
  "eslint-plugin-flowtype": "4.6.0",
  "eslint-plugin-import": "2.20.1",
  "eslint-plugin-jsx-a11y": "6.2.3",
  "eslint-plugin-react": "7.19.0",
  "eslint-plugin-react-hooks": "^1.6.1",
  "file-loader": "4.3.0",
  "fs-extra": "^8.1.0",
  "html-webpack-plugin": "4.0.0-beta.11",
  "jest": "24.9.0",
  "jest-environment-jsdom-fourteen": "1.0.1",
  "jest-watch-typeahead": "0.4.2",
  "microphone-stream": "^6.0.1",
  "mini-css-extract-plugin": "0.9.0",
  "optimize-css-assets-webpack-plugin": "5.0.3",
  "pnp-webpack-plugin": "1.6.4",
  "postcss-flexbugs-fixes": "4.1.0",
  "postcss-loader": "3.0.0",
  "postcss-normalize": "8.0.1",
  "postcss-preset-env": "6.7.0",
  "postcss-safe-parser": "4.0.1",
  "react": "^17.0.2",
  "react-app-polyfill": "^1.0.6",
  "react-dev-utils": "^10.2.1",
  "react-dom": "^17.0.2",
  "react-router-dom": "^5.1.2",
  "recordrtc": "^5.6.2",
  "resolve": "1.15.0",
  "resolve-url-loader": "3.1.1",
  "sass-loader": "8.0.2",
  "style-loader": "0.23.1",
  "terser-webpack-plugin": "2.3.5",
  "ts-ebml": "^2.0.2",
  "ts-pnp": "1.1.6",
  "typescript": "^4.5.2",
  "url-loader": "2.3.0",
  "web-vitals": "^1.1.2",
  "webpack": "4.42.0",
  "webpack-dev-server": "3.10.3",
  "webpack-manifest-plugin": "2.2.0",
  "workbox-webpack-plugin": "4.3.1"
},

TypeScript

The application is written in TypeScript. For proper compilation we need a tsconfig.json placed in the main directory.

// tsconfig.json
{
  "compilerOptions": {
    "target": "es5",
    "lib": ["dom", "dom.iterable", "esnext"],
    "allowJs": true,
    "skipLibCheck": true,
    "esModuleInterop": true,
    "allowSyntheticDefaultImports": true,
    "strict": true,
    "forceConsistentCasingInFileNames": true,
    "noFallthroughCasesInSwitch": true,
    "module": "esnext",
    "moduleResolution": "node",
    "resolveJsonModule": true,
    "isolatedModules": true,
    "noEmit": true,
    "jsx": "react-jsx",
    "typeRoots": ["./node_modules/@types", "./@types"]
  },
  "include": ["src/**/*"],
  "exclude": ["./node_modules", "./node_modules/*"]
}

AWS Transcribe Streaming DEMO

We use Node v16.13.0 and React 17.0.2. To run the application, two commands are needed:

npm install
npm run start

Here is a screenshot of the example as visible in the browser. You will be able to test the speech-to-text functionality and implement it in your application.

Thanks to Muhammad Qasim, whose demo inspired this article. More info at: https://github.com/qasim9872/react-amazon-transcribe-streaming-demo

AWS Transcribe Streaming DEMO preview window

Troubleshooting

Problem:

Failed to compile.
src/SpeechToText/transcribe.controller.ts
TypeScript error in src/SpeechToText/transcribe.controller.ts(173,14):
Property 'pipe' does not exist on type 'MicrophoneStream'.  TS2339

Solution:

add the proper TypeScript type definitions to the main directory ( @types/microphone-stream/index.d.ts ):

// @types/microphone-stream/index.d.ts
declare module 'microphone-stream' {
  import { Readable } from 'stream';
  export declare class MicrophoneStream extends Readable {
    static toRaw(chunk: any): Float32Array;
    constructor(opts?: {
      stream?: MediaStream;
      objectMode?: boolean;
      bufferSize?: null | 256 | 512 | 1024 | 2048 | 4096 | 8192 | 16384;
      context?: AudioContext;
    });
    public context: AudioContext;
    setStream(mediaStream: MediaStream): Promise<void>;
    stop(): void;
    pauseRecording(): void;
    playRecording(): void;
  }
  export default MicrophoneStream;
}

That’s it for today’s tutorial. Make sure to follow us for other useful tips and guidelines.
