Speech-to-Text streaming demo in React - createIT
Get a free advice now!

    Pick the topic
    Developer OutsourcingWeb developingApp developingDigital MarketingeCommerce systemseEntertainment systems

    Thank you for your message. It has been sent.
    Tags

    Speech-to-Text streaming demo in React

    Speech-to-Text streaming demo in React

    Transcribing live streamed audio to text has become more and more popular. It’s useful in preparing subtitles or archiving conversation in text mode. ASR – automatic speech recognition – uses advanced machine learning solutions to analyze the context of speech and return text data.

    ASR Demo

    In this example, we’re going to create a React Component that can be reused in your application. It uses  the AWS SDK – Client Transcribe Streaming package to connect to the Amazon Transcribe service using web socket. Animated GIF ASR-streaming-demo.gif presents what we are going to build.

    AWS transcribe streaming demo

    Process an audio file or a live stream

    There are two modes we can use: uploading an audio file which will be added as a transcription job and wait for results or live streaming using websocket where the response is instant. This demo will focus on streaming audio where we can see live text recognized returned from API.

    ASR in your language

    In the config file we can specify the language code for our audio conversation. The most popular language – English – uses lang code: ‘en-US’. AWS Transcribe currently supports over 30 languages, more info at: https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html

    Audio data format

    To achieve good results of Speech to Text recognition, we need to provide a proper audio format that is sent to AWS Transcribe API. It expects audio to be encoded as PCM data. The sample rate is also important, having better quality of voice means we will receive better results. Currently, ‘en-US’ supports sample rates up to 48,000 Hz, and this value was optimal during our tests.

    Recording audio as base64

    As an additional feature, we’ve implemented saving audio as a base64 audio file. RecordRTC library uses the MediaRecorder Browser API to record voice from microphone. Received BLOB format is converted to base64 which can be easily saved as an archived conversation, or optionally sent to S3 storage.

    Audio saved in Chrome is missing duration

    The Chrome browser features a bug that was identified in 2016: a file saved using MediaRecorder has malformed metadata, which causes the played file to have incorrect length (duration). As a result, the file recorded in Chrome is not seekable: webm and weba files can be played from the beginning, but searching through them is difficult / impossible.

    The issue was reported to https://bugs.chromium.org/p/chromium/issues/detail?id=642012, but has not been fixed yet. There are some existing workarounds, for example: using the ts-ebml Reader and fixing the metadata part of the file. To fix missing duration in Chrome, we’re using the injectMetadata method.

    Transcription confidence scores

    While doing live speech-to-text recognition, AWS returns a confidence score between 0 and 1. It’s not an accuracy measurement, but rather the service’s self-evaluation on how well it may have transcribed a word. Having this value we can specify the confidence threshold and decide which text data should be saved.

    In the presented demo, we will make input text background green only when the receiving data is not partial (AWS has already analyzed the text and is confident with the result). The attached screenshot shows “partial results”. Only when the sentence is finished, the transcription will be matching audio.

    aws transcribe screenshot

    Configure AWS Transcribe

    To start using streaming we need to obtain: accessKey, secretAccessKey, and choose the AWS region. The configuration can be set up in: src/SpeechToText/transcribe.constants.ts

    The main application is just text area and a microphone icon. After clicking the icon, React will connect with transcribe websocket and will start voice analyzing. After clicking Pause, the audio element will appear with autoplay enabled. The source of audio (src) is: base64 URI content of just recorded voice message.

     // src/App.js
    import React, { useState, useEffect } from 'react';
    import './App.css';
    import TextField from '@material-ui/core/TextField';
    import StreamingView from './SpeechToText/StreamingView';
    import {
      BrowserRouter as Router,
      Switch,
      Route,
    } from 'react-router-dom';
    function App() {
      // eslint-disable-next-line
      const [inputMessageText, setInputMessageText] = useState("");
      // eslint-disable-next-line
      const [recordedAudio, setRecordedAudio] = useState(null);
      useEffect(() => {
         if(recordedAudio){
           console.log("recorded!");
           console.log(recordedAudio);
         }
      }, [recordedAudio]);
      return (
        <div className="App">
          <Router>
            <Switch>
              <Route path="/">
                <h1>AWS Transcribe Streaming DEMO</h1>
                <TextField
                    variant="outlined"
                    placeholder="Transcribe results"
                    minRows={10}
                    value={inputMessageText}
                    readOnly={true}
                    multiline
                    maxRows={Infinity}
                    id="input1"
                />
                <StreamingView setInputMessageText={setInputMessageText} setRecordedAudio={setRecordedAudio} />
                { recordedAudio && <p>Recorded audio (base64 URI):</p> }
                { recordedAudio && <audio src={recordedAudio.data.audioRecorded} autoPlay controls /> }
              </Route>
            </Switch>
          </Router>
        </div>
      );
    }
    export default App;
    
    And some additional CSS styles:
    
    /* src/App.js */
    .App {
      text-align: center;
    }
    .is-final-recognized .MuiTextField-root{
      animation: ctcompleted 1s 1;
    }
    .is-recognizing .MuiTextField-root{
      background:rgba(0,0,0,.05);
    }
    @keyframes ctcompleted
    {
      0%      {background:#dcedc8;}
      25%     {background:#dcedc8;}
      75%     {background:#dcedc8;}
      100%    {background:inherit;}
    }
    
    Transcribe API keys
    
    The previously generated AWS API keys should be hardcoded in the config object.
    
    // src/SpeechToText/transcribe.constants.ts
    const transcribe = {
      accessKey: 'AAA',
      secretAccessKey: 'BBB',
      // default config
      language: 'en-US',
      region: 'eu-west-1',
      sampleRate: 48000,
      vocabularyName: '',
    };
    export default transcribe;

    StreamingView React component

    The reusable component for audio streaming receives text from API and passes the recorded audio to the parent. It’s written using TypeScript, the icons are imported from material-ui.

    // src/SpeechToText/StreamingView.tsx
    import React, { useEffect, useMemo, useState } from 'react';
    import IconButton from '@material-ui/core/IconButton';
    import KeyboardVoiceIcon from '@material-ui/icons/KeyboardVoice';
    import PauseIcon from '@material-ui/icons/Pause';
    import TranscribeController from './transcribe.controller';
    import { setBodyClassName } from './helpers';
    import transcribe from "./transcribe.constants";
    const StreamingView: React.FC<{
        componentName: 'StreamingView';
        setInputMessageText: (arg1: string) => void;
        setRecordedAudio: (arg1: any) => void;
    }> = ({setInputMessageText, setRecordedAudio}) => {
        const [transcribeConfig] = useState(transcribe);
        const [recognizedTextArray, setRecognizedTextArray] = useState<string[]>([]);
        const [recognizingText, setRecognizingText] = useState<string>('');
        const [started, setStarted] = useState(false);
        const transcribeController = useMemo(() => new TranscribeController(), []);
        useEffect(() => {
            transcribeController.setConfig(transcribeConfig);
            setStarted(false);
        }, [transcribeConfig, transcribeController]);
        useEffect(() => {
            const display = ({ text, final }: { text: string; final: boolean }) => {
                // debug
                console.log(text);
                if (final) {
                    setRecognizingText('');
                    setRecognizedTextArray((prevTextArray) => [...prevTextArray, text]);
                    setBodyClassName("is-recognizing","is-final-recognized");
                } else {
                    setBodyClassName("is-final-recognized","is-recognizing");
                    setRecognizingText(text);
                }
            };
            // @ts-ignore
            const getAudio = ({aaa}: { aaa: Blob}) => {
                let customObj = {};
                if(aaa.type){
                    // @ts-ignore
                    customObj.audioType = aaa.type;
                }
                // convert Blob to base64 uri
                let reader = new FileReader();
                reader.readAsDataURL(aaa);
                reader.onloadend = () => {
                    if(reader.result){
                        // @ts-ignore
                        customObj.audioRecorded = reader.result.toString();
                        setRecordedAudio({name: "audioRecorded", data: customObj});
                    }
                }
            }
            transcribeController.on('recognized', display);
            transcribeController.on('newAudioRecorded', getAudio);
            return () => {
                transcribeController.removeListener('recognized', display);
                transcribeController.removeListener('newAudioRecorded', getAudio);
            };
        }, [transcribeController, setRecordedAudio]);
        useEffect(() => {
            (async () => {
                if (started) {
                    setRecognizedTextArray([]);
                    setRecognizingText('');
                    setRecordedAudio(null);
                    await transcribeController.init().catch((error: Error) => {
                        console.log(error);
                        setStarted(false);
                    });
                } else {
                    await transcribeController.stop();
                }
            })();
        }, [started, transcribeController, setRecordedAudio]);
        useEffect(() => {
            const currentRecognizedText = [...recognizedTextArray, recognizingText].join(' ');
            setInputMessageText(currentRecognizedText);
        }, [recognizedTextArray, recognizingText, setInputMessageText]);
        return (<>
                <IconButton onClick={() => {
                    setStarted(!started);
                }}>
                    {! started ? <KeyboardVoiceIcon/> : <PauseIcon />}
                </IconButton>
            </>
        );
    };
    export default StreamingView;
    

    Transcribe Streaming Client

    The main part of the application is the controller, where communication between AWS Transcribe and Client is established. The stream sends PCM encoded audio and receives partial results through websocket. RecordRTC records audio in the background using native MediaRecorder API, which is supported by all modern browsers.

    // src/SpeechToText/transcribe.controller.ts
    import {
        TranscribeStreamingClient,
        StartStreamTranscriptionCommand,
        StartStreamTranscriptionCommandOutput,
    } from '@aws-sdk/client-transcribe-streaming';
    import MicrophoneStream from 'microphone-stream';
    import { PassThrough } from 'stream';
    import { EventEmitter } from 'events';
    import transcribeConstants from './transcribe.constants';
    import { streamAsyncIterator, EncodePcmStream } from './helpers';
    import { Decoder, tools, Reader } from 'ts-ebml';
    import RecordRTC  from 'recordrtc';
    class TranscribeController extends EventEmitter {
        private audioStream: MicrophoneStream | null;
        private rawMediaStream: MediaStream | null | any;
        private audioPayloadStream: PassThrough | null;
        private transcribeConfig?: typeof transcribeConstants;
        private client?: TranscribeStreamingClient;
        private started: boolean;
        private mediaRecorder: RecordRTC | null;
        private audioBlob: Blob | null;
        constructor() {
            super();
            this.audioStream = null;
            this.rawMediaStream = null;
            this.audioPayloadStream = null;
            this.started = false;
            this.mediaRecorder = null;
            this.audioBlob = null;
        }
        setAudioBlob(Blob: Blob | null){
            this.audioBlob = Blob;
            const aaa = this.audioBlob;
            this.emit('newAudioRecorded', {aaa});
        }
        hasConfig() {
            return !!this.transcribeConfig;
        }
        setConfig(transcribeConfig: typeof transcribeConstants) {
            this.transcribeConfig = transcribeConfig;
        }
        validateConfig() {
            if (
                !this.transcribeConfig?.accessKey ||
                !this.transcribeConfig.secretAccessKey
            ) {
                throw new Error(
                    'missing required config: access key and secret access key are required',
                );
            }
        }
        recordAudioData = async (stream: MediaStream) =>{
            this.mediaRecorder = new RecordRTC(stream,
                {
                    type: "audio",
                    disableLogs: true,
                });
            this.mediaRecorder.startRecording();
            // @ts-ignore
            this.mediaRecorder.stream = stream;
            return stream;
        }
        stopRecordingCallback = () => {
            // @ts-ignore
            this.injectMetadata(this.mediaRecorder.getBlob())
                // @ts-ignore
                .then(seekableBlob => {
                    this.setAudioBlob(seekableBlob);
                    // @ts-ignore
                    this.mediaRecorder.stream.stop();
                    // @ts-ignore
                    this.mediaRecorder.destroy();
                    this.mediaRecorder = null;
                })
        }
        readAsArrayBuffer = (blob: Blob) => {
            return new Promise((resolve, reject) => {
                const reader = new FileReader();
                reader.readAsArrayBuffer(blob);
                reader.onloadend = () => { resolve(reader.result); };
                reader.onerror = (ev) => {
                    // @ts-ignore
                    reject(ev.error);
                };
            });
        }
        injectMetadata = async (blob: Blob) => {
            const decoder = new Decoder();
            const reader = new Reader();
            reader.logging = false;
            reader.drop_default_duration = false;
            return this.readAsArrayBuffer(blob)
                .then(buffer => {
                    // fix for Firefox
                    if(! blob.type.includes('webm')){
                        return blob;
                    }
                    // @ts-ignore
                    const elms = decoder.decode(buffer);
                    elms.forEach((elm) => { reader.read(elm); });
                    reader.stop();
                    const refinedMetadataBuf =
                        tools.makeMetadataSeekable(reader.metadatas, reader.duration, reader.cues);
                    // @ts-ignore
                    const body = buffer.slice(reader.metadataSize);
                    return new Blob([ refinedMetadataBuf, body ], { type: blob.type });
                });
        }
        async init() {
            this.started = true;
            if (!this.transcribeConfig) {
                throw new Error('transcribe config is not set');
            }
            this.validateConfig();
            this.audioStream = new MicrophoneStream();
            this.rawMediaStream = await window.navigator.mediaDevices.getUserMedia({
                video: false,
                audio: {
                    sampleRate: this.transcribeConfig.sampleRate,
                },
            }).then(this.recordAudioData, this.microphoneAccessError)
                .catch(function(err) {
                    console.log(err);
                });
            await this.audioStream.setStream(this.rawMediaStream);
            this.audioPayloadStream = this.audioStream
                .pipe(new EncodePcmStream())
                .pipe(new PassThrough({ highWaterMark: 1 * 1024 }));
            // creating and setting up transcribe client
            const config = {
                region: this.transcribeConfig.region,
                credentials: {
                    accessKeyId: this.transcribeConfig.accessKey,
                    secretAccessKey: this.transcribeConfig.secretAccessKey,
                },
            };
            this.client = new TranscribeStreamingClient(config);
            const command = new StartStreamTranscriptionCommand({
                LanguageCode: this.transcribeConfig.language,
                MediaEncoding: 'pcm',
                MediaSampleRateHertz: this.transcribeConfig.sampleRate,
                AudioStream: this.audioGenerator.bind(this)(),
            });
            try {
                const response = await this.client.send(command);
                this.onStart(response);
            } catch (error) {
                if (error instanceof Error) {
                }
            } finally {
                // finally.
            }
        }
        microphoneAccessError = (error:any) => {
            console.log(error);
        }
        async onStart(response: StartStreamTranscriptionCommandOutput) {
            try {
                if (response.TranscriptResultStream) {
                    for await (const event of response.TranscriptResultStream) {
                        const results = event.TranscriptEvent?.Transcript?.Results;
                        if (results && results.length > 0) {
                            const [result] = results;
                            const final = !result.IsPartial;
                            const alternatives = result.Alternatives;
                            if (alternatives && alternatives.length > 0) {
                                const [alternative] = alternatives;
                                const text = alternative.Transcript;
                                this.emit('recognized', { text, final });
                            }
                        }
                    }
                }
            } catch (error) {
                console.log(error);
            }
        }
        async stop() {
            this.started = false;
            // request to stop recognition
            this.audioStream?.stop();
            this.audioStream = null;
            this.rawMediaStream = null;
            this.audioPayloadStream?.removeAllListeners();
            this.audioPayloadStream?.destroy();
            this.audioPayloadStream = null;
            this.client?.destroy();
            this.client = undefined;
            // @ts-ignore
            if(this.mediaRecorder){
                this.mediaRecorder.stopRecording(this.stopRecordingCallback);
            }
        }
        async *audioGenerator() {
            if (!this.audioPayloadStream) {
                throw new Error('payload stream not created');
            }
            for await (const chunk of streamAsyncIterator(this.audioPayloadStream)) {
                if (this.started) {
                    yield { AudioEvent: { AudioChunk: chunk } };
                } else {
                    break;
                }
            }
        }
    }
    export default TranscribeController;
    

    Helpers for audio encoding

    Additional methods for manipulating audio are defined in helpers.ts. It also includes a function for changing DOM body className.

    // src/SpeechToText/helpers.ts
    /* eslint-disable no-await-in-loop */
    /* eslint-disable @typescript-eslint/no-explicit-any */
    import { PassThrough } from 'stream';
    import { Transform, TransformCallback } from 'stream';
    import MicrophoneStream from 'microphone-stream';
    export function mapRoute(text: string) {
        return `${text}-section`;
    }
    export async function* fromReadable(stream: PassThrough) {
        let exhausted = false;
        const onData = () =>
            new Promise((resolve) => {
                stream.once('data', (chunk: any) => {
                    resolve(chunk);
                });
            });
        try {
            while (true) {
                const chunk = (await onData()) as any;
                if (chunk === null) {
                    exhausted = true;
                    break;
                }
                yield chunk;
            }
        } finally {
            if (!exhausted) {
                stream.destroy();
            }
        }
    }
    export function streamAsyncIterator(stream: PassThrough) {
        // Get a lock on the stream:
        //   const reader = stream.getReader();
        return {
            [Symbol.asyncIterator]() {
                return fromReadable(stream);
            },
        };
    }
    /**
     * encodePcm
     */
    export function encodePcm(chunk: any) {
        const input = MicrophoneStream.toRaw(chunk);
        let offset = 0;
        const buffer = new ArrayBuffer(input.length * 2);
        const view = new DataView(buffer);
        for (let i = 0; i < input.length; i++, offset += 2) {
            const s = Math.max(-1, Math.min(1, input[i]));
            view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
        }
        return Buffer.from(buffer);
    }
    export class EncodePcmStream extends Transform {
        _transform(chunk: any, encoding: string, callback: TransformCallback) {
            const buffer = encodePcm(chunk);
            this.push(buffer);
            callback();
        }
    }
    export const setBodyClassName = (removeClass:string, addClass:string) => {
        let body = document.getElementsByTagName('body')[0];
        if(removeClass){
            body.classList.remove(removeClass);
        }
        if(addClass){
            body.classList.add(addClass);
        }
    }
    

    Used NPM libraries

    The app uses different dependencies. The most important ones are:

    “@aws-sdk/client-transcribe-streaming”: “^3.3.0”,

    “microphone-stream”: “^6.0.1”,

    “react”: “^17.0.2”,

    “recordrtc”: “^5.6.2”,

    “ts-ebml”: “^2.0.2”,

    “typescript”: “^4.5.2”,

    Below is a full list of the used npm libraries ( package.json ):

    "dependencies": {
      "@aws-sdk/client-transcribe-streaming": "^3.3.0",
      "@babel/core": "7.9.0",
      "@material-ui/core": "^4.9.8",
      "@material-ui/icons": "^4.9.1",
      "@types/node": "^12.20.37",
      "@types/react": "^17.0.37",
      "@types/recordrtc": "^5.6.8",
      "@typescript-eslint/eslint-plugin": "^2.10.0",
      "@typescript-eslint/parser": "^2.10.0",
      "babel-eslint": "10.1.0",
      "babel-jest": "^24.9.0",
      "babel-loader": "8.1.0",
      "babel-plugin-named-asset-import": "^0.3.6",
      "babel-preset-react-app": "^9.1.2",
      "camelcase": "^5.3.1",
      "case-sensitive-paths-webpack-plugin": "2.3.0",
      "css-loader": "3.4.2",
      "dotenv": "8.2.0",
      "dotenv-expand": "5.1.0",
      "eslint": "^6.6.0",
      "eslint-config-react-app": "^5.2.1",
      "eslint-loader": "3.0.3",
      "eslint-plugin-flowtype": "4.6.0",
      "eslint-plugin-import": "2.20.1",
      "eslint-plugin-jsx-a11y": "6.2.3",
      "eslint-plugin-react": "7.19.0",
      "eslint-plugin-react-hooks": "^1.6.1",
      "file-loader": "4.3.0",
      "fs-extra": "^8.1.0",
      "html-webpack-plugin": "4.0.0-beta.11",
      "jest": "24.9.0",
      "jest-environment-jsdom-fourteen": "1.0.1",
      "jest-watch-typeahead": "0.4.2",
      "microphone-stream": "^6.0.1",
      "mini-css-extract-plugin": "0.9.0",
      "optimize-css-assets-webpack-plugin": "5.0.3",
      "pnp-webpack-plugin": "1.6.4",
      "postcss-flexbugs-fixes": "4.1.0",
      "postcss-loader": "3.0.0",
      "postcss-normalize": "8.0.1",
      "postcss-preset-env": "6.7.0",
      "postcss-safe-parser": "4.0.1",
      "react": "^17.0.2",
      "react-app-polyfill": "^1.0.6",
      "react-dev-utils": "^10.2.1",
      "react-dom": "^17.0.2",
      "react-router-dom": "^5.1.2",
      "recordrtc": "^5.6.2",
      "resolve": "1.15.0",
      "resolve-url-loader": "3.1.1",
      "sass-loader": "8.0.2",
      "style-loader": "0.23.1",
      "terser-webpack-plugin": "2.3.5",
      "ts-ebml": "^2.0.2",
      "ts-pnp": "1.1.6",
      "typescript": "^4.5.2",
      "url-loader": "2.3.0",
      "web-vitals": "^1.1.2",
      "webpack": "4.42.0",
      "webpack-dev-server": "3.10.3",
      "webpack-manifest-plugin": "2.2.0",
      "workbox-webpack-plugin": "4.3.1"
    },
    

    Typescript

    The application is written using Typescript. For proper compilation we need to have tsconfig.json placed in the main directory.

    // tsconfig.json
    {
      "compilerOptions": {
        "target": "es5",
        "lib": ["dom", "dom.iterable", "esnext"],
        "allowJs": true,
        "skipLibCheck": true,
        "esModuleInterop": true,
        "allowSyntheticDefaultImports": true,
        "strict": true,
        "forceConsistentCasingInFileNames": true,
        "noFallthroughCasesInSwitch": true,
        "module": "esnext",
        "moduleResolution": "node",
        "resolveJsonModule": true,
        "isolatedModules": true,
        "noEmit": true,
        "jsx": "react-jsx",
        "typeRoots": ["./node_modules/@types", "./@types"]
      },
      "include": ["src/**/*"],
      "exclude": ["./node_modules", "./node_modules/*"]
    }
    

    AWS Transcribe Streaming DEMO

    We use NODE v16.13.0 and React 17.0.2. To run the application, 2 commands should be performed:

    npm install
    npm run start
    

    Here is a screenshot of an example visible in the browser. You will be able to test Speech-to-text functionality and implement it in your application.

    Thanks to Muhammad Qasim whose demo inspired this article. More info at: https://github.com/qasim9872/react-amazon-transcribe-streaming-demo

    AWS transcribe streaming demo

    Troubleshooting

    Problem:

    Failed to compile.
    src/SpeechToText/transcribe.controller.ts
    TypeScript error in src/SpeechToText/transcribe.controller.ts(173,14):
    Property 'pipe' does not exist on type 'MicrophoneStream'.  TS2339
    

    Solution:

    add proper Typescript types definition to the main directory ( @types/microphone-stream/index.d.ts ):

    // @types/microphone-stream/index.d.ts
    declare module 'microphone-stream' {
      import { Readable } from 'stream';
      export declare class MicrophoneStream extends Readable {
        static toRaw(chunk: any): Float32Array;
        constructor(opts?: {
          stream?: MediaStream;
          objectMode?: boolean;
          bufferSize?: null | 256 | 512 | 1024 | 2048 | 4096 | 8192 | 16384;
          context?: AudioContext;
        });
        public context: AudioContext;
        setStream(mediaStream: MediaStream): Promise<void>;
        stop(): void;
        pauseRecording(): void;
        playRecording(): void;
      }
      export default MicrophoneStream;
    }
    

    That’s it for today’s tutorial. Make sure to follow us for other useful tips and guidelines.

    Do you need someone to implement this solution for you? Check out our specialists for hire in the outsourcing section. Are you considering a global project and are uncertain how to proceed? Or you need a custom web development services? Reach us now!

    Comments
    6 response
    1. hi,
      this is nice article but it cant run in my system even I follow ur instructions.
      I got below error
      eslintrc » eslint-config-react-app/jest#overrides[0]:
      Environment key “jest/globals” is unknown
      But u given reference github link, that is working fine.
      Now I want to add speaker labels so is this possible?

      • The error you are encountering seems to be related to ESLint configuration in your package.json file. Specifically, it’s having an issue with the jest/globals environment key. You might solve this problem by updating your package.json and removing “react-app/jest” from the extends array under eslintConfig or you could remove the whole eslintConfig if you’re not using it. After that, run npm install again and check if the issue is resolved. If you had any devDependencies related to ESLint, you might also need to remove them temporarily, run npm install, and then add them back.

        Regarding your question about adding speaker labels, Amazon Transcribe provides a feature called speaker diarization which can accurately label speakers in an audio stream. This is useful in scenarios where you need to distinguish between different speakers in a conversation.

        In your case, to enable speaker labeling, you would need to set: ShowSpeakerLabel: true, Once this is done, Amazon Transcribe streaming will return a result object as part of the transcription response that can be used to label the speakers in the transcript. The result object contains several parameters including a Speaker parameter, which represents the speaker label​.

        Here is example of a Java application, but the principle is the same if you’re using a different language with the AWS SDK.
        https://aws.amazon.com/blogs/machine-learning/using-speaker-diarization-for-streaming-transcription-with-amazon-transcribe-and-amazon-transcribe-medical/

    2. Hi,
      Your article is very nice and very helpful, thank.
      can you tell me is this possible that add 2 speakers label in this ?

      Thanks

      • Hi, thanks for your comment. The StartStreamTranscriptionCommand is used for real-time transcription with AWS Transcribe Streaming.
        Below is an example code snippet that demonstrates how you can use the Transcribe service with speaker identification:

        
        const { TranscribeStreamingClient, StartStreamTranscriptionCommand } = require("@aws-sdk/client-transcribe-streaming");
        const { PassThrough } = require('stream');
        async function transcribeStreaming() {
            const client = new TranscribeStreamingClient({ region: 'your_region' });    
            const audioStream = new PassThrough();  
            const command = new StartStreamTranscriptionCommand({
                LanguageCode: 'en-US',
                MediaEncoding: 'pcm',
                MediaSampleRateHertz: 8000, // Sample rate of your audio in Hz
                AudioStream: audioStream,
                ShowSpeakerLabel: true,
                EnableChannelIdentification: true,
                NumberOfChannels: 2 // assuming stereo audio
            });
            
            const response = client.send(command);
            
            response.on('data', event => {
                for (const result of event.Transcript.Results) {
                    if (result.IsPartial === false) {
                        const noOfSpeakers = result.Alternatives[0].Items.length;
                        for(let i = 0; i < noOfSpeakers; i++){
                            console.log(`Speaker ${result.Alternatives[0].Items[i].SpeakerLabel}: ${result.Alternatives[0].Items[i].Content}`);
                        }
                    }
                }
            });
        
            // Pipe your audio data into the stream
            // Depends on your implementation, it could be from microphone, file read stream, etc.
            // For example, you might use the fs module to create a read stream from an audio file
            const fs = require('fs');
            fs.createReadStream('path_to_your_audio_file').pipe(audioStream);
        
            response.on('error', console.error);
        }
        
        transcribeStreaming();
        
      • The TypeScript error TS2339 occurs when you’re trying to access a property or method that does not exist on a particular type. In your case, the error message suggests that the pipe method doesn’t exist on the MicrophoneStream type.

        The solution provided in the article involves adding a proper TypeScript types definition to the main directory. Specifically, you’re asked to create a file named index.d.ts inside the @types/microphone-stream directory and include the following TypeScript types definition: https://www.createit.com/blog/speech-to-text-streaming-demo-in-react/#stoc-troubleshooting . This TypeScript definition is extending the existing MicrophoneStream type to include the methods and properties you need for your application.

        If you’re using a module bundler like webpack, ensure it’s configured to include .d.ts files when resolving modules.

    Add comment

    Your email address will not be published. Required fields are marked *

    Popular news

    Fetching Time records from ActiveCollab API
    • Dev Tips and Tricks

    Fetching Time records from ActiveCollab API

    September 9, 2024 by createIT
    Docker Compose for PrestaShop
    • Dev Tips and Tricks

    Docker Compose for PrestaShop

    September 2, 2024 by createIT
    WordPress wizard in admin – step by step
    • Dev Tips and Tricks

    WordPress wizard in admin – step by step

    August 29, 2024 by createIT
    Order Status Sync between PrestaShop and External APIs
    • Dev Tips and Tricks

    Order Status Sync between PrestaShop and External APIs

    August 26, 2024 by createIT
    What is PHP used for in web development 
    • Dev Tips and Tricks

    What is PHP used for in web development 

    August 22, 2024 by createIT
    Automating WooCommerce product availability date
    • Dev Tips and Tricks

    Automating WooCommerce product availability date

    August 15, 2024 by createIT
    WP Quiz Adventure – FAQ
    • Dev Tips and Tricks

    WP Quiz Adventure – FAQ

    August 12, 2024 by createIT
    Retrieval Augmented Generation tutorial and OpenAI example
    • Dev Tips and Tricks

    Retrieval Augmented Generation tutorial and OpenAI example

    August 8, 2024 by createIT
    10 useful SEO tools for the iGaming industry
    • Services
    • Technology

    10 useful SEO tools for the iGaming industry

    August 5, 2024 by createIT

    Support – Tips and Tricks
    All tips in one place, and the database keeps growing. Stay up to date and optimize your work!

    Contact us