Speech to text options

There are four speech-to-text (STT) options that you can use with this system.

See each option's section for any extra setup instructions.

  1. Mozilla DeepSpeech (open source; requires a 1.8 GB download for the English language model)

  2. Pocketsphinx (open source, integrated into the app, no extra setup needed)

Overview

AssemblyAI

  • Free tier: 5 hours free per month

  • Easy to set up credentials

  • Generally very accurate (my opinion, judge for yourself)

  • Currently supports English only; more languages are coming soon

Speechmatics

  • 1 hour free credit with new account

  • Easy to set up credentials

  • Generally pretty accurate (my opinion, judge for yourself)

  • Includes support for "accent-agnostic Global English"

Mozilla DeepSpeech

Pros:

  • Free as in free speech as well as in free beer.

  • Runs locally on your machine, so no internet connection is needed. This makes it a good fit for sensitive material.

  • Open source

Cons:

  • Requires downloading a 1.8 GB English language model.

  • Not as accurate as commercial STT but still pretty good (in my opinion, but decide for yourself).

  • Only supports US English.

Mozilla DeepSpeech needs extra setup before use; see its setup instructions.
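If working offline is the deciding factor (for example, for sensitive material), the options on this page can be written down as data and filtered. The following is only a minimal sketch: the `SttOption` class, its field names, and `offline_options` are hypothetical and not part of the app, and marking AssemblyAI and Speechmatics as not running locally is an assumption based on their being credential-based hosted services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SttOption:
    """One STT option, described with illustrative (hypothetical) fields."""
    name: str
    runs_locally: bool  # False = hosted service reached over the internet (assumed)
    extra_setup: bool   # True = credentials or a model download needed first


# Values read off this page; they are a summary, not an official spec.
OPTIONS = [
    SttOption("AssemblyAI", runs_locally=False, extra_setup=True),
    SttOption("Speechmatics", runs_locally=False, extra_setup=True),
    SttOption("Mozilla DeepSpeech", runs_locally=True, extra_setup=True),
    SttOption("Pocketsphinx", runs_locally=True, extra_setup=False),
]


def offline_options(options):
    """Return the names of options that work without an internet connection."""
    return [option.name for option in options if option.runs_locally]


if __name__ == "__main__":
    print(offline_options(OPTIONS))
```

Filtering on `extra_setup` instead would single out the option that works out of the box.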

Pocketsphinx

Pros:

  • Open source

  • Integrated into the app, no extra setup needed

Cons:

  • Not as accurate as IBM or Gentle.

  • Only supports US English.

  • Transcription takes a little longer than the length of the media (e.g. a 27-minute file takes about 30 minutes to transcribe).

Pocketsphinx does not require extra setup to use.
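To get a feel for how long a Pocketsphinx transcription might take, the 27-minute example above can be scaled to other lengths. This is a rough sketch that assumes transcription time grows linearly with media length; the function name and constants are illustrative, and real timings will depend on your machine and the audio.

```python
# Rough estimate of Pocketsphinx transcription time.
# Assumption: time scales linearly with media length, using the single
# data point from this page (27 min of media -> about 30 min to transcribe).

REFERENCE_MEDIA_MIN = 27.0
REFERENCE_TRANSCRIBE_MIN = 30.0


def estimate_transcription_minutes(media_minutes: float) -> float:
    """Scale the reference ratio (30 / 27, roughly 1.11x) to another length."""
    if media_minutes < 0:
        raise ValueError("media_minutes must be non-negative")
    return media_minutes * REFERENCE_TRANSCRIBE_MIN / REFERENCE_MEDIA_MIN


if __name__ == "__main__":
    for minutes in (10, 27, 60):
        estimate = estimate_transcription_minutes(minutes)
        print(f"{minutes} min of media -> about {estimate:.0f} min to transcribe")
```

Under this assumption, an hour of media would take a little over an hour to transcribe.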
