Transcription json

Component/part description

In this section we are looking at the Transcription JSON schema used in autoEdit.

Connected to this is defining a schema for transcription, to make sure all components that work with this have a defined interface/specification. Eg BBC Transcript model

Gentle - json

See appendix for json example.

IBM - Json

See appendix for json example. As well as their stt api reference and documentation.

Pocketsphinx - plain text

See appendix for example pocketsphinx plain text.

Implementations Options considered

Other

An array of words object, to represent lines this could also be a nested array of word objects.

Where the word object at a minimum as a start, end time and text attribute.

Current implementation

Transcription domain

autoEdit JSON Transcription schema at a high level it models the objects present in a transcription.

In this representation:

  • Transcription

    • Paragraphs ← speaker

      • Lines

        • Words

Speakers are associated to paragraphs. Paragraphs are treated as sections of lines.

A list of speakers can also be kept separate, similarly to how IBM Watson stt API returns the results of speaker diarization.

See Appendix for autoEdit json schema example.

What needs refactoring

Name of paragraph and line attribute are ambiguos. It should be lines and words instead(?).

replace array with hash

This is a bigger refactoring but instead of array data structure, could use hash/dictionary.

This way id is the key. and can make use of key value methods available in js. Lookup speed would improve(?) and could easily get array of values using js method.

Last updated