This module is responsible for generating the transcription, generally by using a third-party speech-to-text (STT) service or API, such as IBM Watson.

It is composed of two main components:

  • Audio converter: converts audio or video files into audio that meets the STT API's input specifications.

  • STT SDK: sends the audio to the STT API/service and receives a time-coded transcription.
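The two components above can be sketched as follows. This is a minimal Python illustration, not the module's implementation: it assumes ffmpeg as the converter, a 16 kHz mono target chosen only for illustration (the real API's accepted formats should be checked), and a response shaped like IBM Watson Speech to Text output requested with `timestamps=true`, where each result's first alternative carries `[word, start, end]` triples. The names `ffmpeg_args` and `watson_words` are hypothetical.

```python
from typing import Any


def ffmpeg_args(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that extracts mono audio at a fixed sample rate.

    The 16 kHz default is an illustrative assumption, not an API requirement.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input audio or video file
        "-vn",                    # drop any video stream
        "-ac", "1",               # downmix to a single (mono) channel
        "-ar", str(sample_rate),  # resample to the target rate
        dst,                      # container/codec inferred from the extension, e.g. .wav
    ]


def watson_words(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a Watson-style recognize response into time-coded words.

    Assumes timestamps=true, so each alternative carries
    "timestamps": [[word, start_seconds, end_seconds], ...].
    """
    words = []
    for result in response.get("results", []):
        if not result.get("final", True):
            continue  # skip interim (non-final) hypotheses
        best = result["alternatives"][0]  # first alternative is the top hypothesis
        for text, start, end in best.get("timestamps", []):
            words.append({"text": text, "start": start, "end": end})
    return words
```

The argument list could be executed with `subprocess.run(ffmpeg_args("talk.mp4", "talk.wav"), check=True)`; keeping command construction separate from execution makes the converter easy to test without ffmpeg installed.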

In addition:

  • Speaker diarization: this can happen either at the STT API level or in a separate module whose output is then interpolated with the transcription.
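Interpolating diarization output with the transcription can be as simple as giving each word the speaker whose segment contains the word's midpoint. This is a minimal sketch, assuming words and speaker segments (from the STT API or a separate module) have already been normalised into the hypothetical shapes shown in the docstring; `assign_speakers` is an illustrative name, not an existing function in this project.

```python
from typing import Optional


def assign_speakers(words: list[dict], segments: list[dict]) -> list[dict]:
    """Label each word with the speaker whose segment contains the word's midpoint.

    words:    [{"text": str, "start": float, "end": float}, ...]
    segments: [{"speaker": str, "start": float, "end": float}, ...]  (hypothetical shape)
    """
    labelled = []
    for word in words:
        mid = (word["start"] + word["end"]) / 2
        speaker: Optional[str] = None
        for seg in segments:
            # Half-open interval so a boundary time belongs to exactly one segment.
            if seg["start"] <= mid < seg["end"]:
                speaker = seg["speaker"]
                break
        labelled.append({**word, "speaker": speaker})  # None if no segment matches
    return labelled
```

Using the midpoint avoids assigning a word that straddles a segment boundary to two speakers; words outside every segment get `None` rather than a guessed speaker.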

And optionally:

  • SRT parsing: allow an SRT file as input, in case the transcription comes from elsewhere. The srtParserComposer module could be refactored for this purpose.

  • Plain text as input: if you already have the transcription, use a forced aligner such as Gentle to re-align it against the audio and generate the transcription JSON.
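Both optional inputs would ideally end up in the same time-coded shape as the STT output. The sketch below assumes standard SRT cues (an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then text) and Gentle's JSON alignment output, where successfully aligned entries in `words` have `"case": "success"` plus `start`/`end` in seconds. It does not reuse srtParserComposer; `parse_srt` and `gentle_words` are hypothetical helpers.

```python
import re

TIME = r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
CUE_TIMING = re.compile(TIME + r"\s*-->\s*" + TIME)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert SRT time components to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text: str) -> list[dict]:
    """Parse SRT cues into [{"text", "start", "end"}] with times in seconds."""
    cues = []
    # Normalise Windows line endings; cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = CUE_TIMING.search(line)
            if match:  # the index line, if present, does not match and is skipped
                g = match.groups()
                cues.append({
                    "text": " ".join(ln.strip() for ln in lines[i + 1:]),
                    "start": _seconds(*g[:4]),
                    "end": _seconds(*g[4:]),
                })
                break  # one timing line per cue
    return cues


def gentle_words(alignment: dict) -> list[dict]:
    """Keep only words Gentle aligned; unaligned entries have no start/end."""
    return [
        {"text": w["word"], "start": w["start"], "end": w["end"]}
        for w in alignment.get("words", [])
        if w.get("case") == "success"
    ]
```

SRT only provides line-level timings, so each cue becomes one entry; word-level timings for that text would still need an aligner such as Gentle.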

It was initially prototyped as a standalone app to test the quality of speech-to-text; see Transcriber.

Consider looking into the compositor pattern to bring together the components of this module.
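As one lightweight reading of that idea (an assumption, not necessarily what the compositor-pattern note intends), the components can be composed by injecting each stage as a callable, so a different STT service or an optional diarization step can be swapped in without touching the rest. `TranscriptionPipeline` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Words = list[dict]  # [{"text": str, "start": float, "end": float, ...}, ...]


@dataclass
class TranscriptionPipeline:
    """Hypothetical composition of the module's components; each stage is injected."""

    convert: Callable[[str], str]          # media path -> converted audio path
    transcribe: Callable[[str], Words]     # audio path -> time-coded words
    diarize: Optional[Callable[[str, Words], Words]] = None  # optional speaker step

    def run(self, media_path: str) -> Words:
        audio_path = self.convert(media_path)
        words = self.transcribe(audio_path)
        if self.diarize is not None:
            words = self.diarize(audio_path, words)
        return words
```

Because each stage is a plain callable, tests can pass stubs in place of ffmpeg or the STT service.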

