How does this work?
The app reads the metadata of the original rushes (no matter how big the files are) and converts them into an audio file and a video preview.
The audio is sent to the speech-to-text engine to produce a transcription.
The video is a lower-resolution proxy that the app uses to generate previews.
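As a rough sketch of the steps above, assuming the app shells out to ffprobe/ffmpeg (the exact tooling and output filenames here are illustrative, not confirmed by the source): one command reads the source metadata without decoding the whole file, one extracts a mono 16 kHz WAV for the speech-to-text engine, and one builds a low-resolution proxy for previews.

```python
import json
import subprocess

def probe_metadata(src):
    """Read container/stream metadata (duration, timecode, fps) via ffprobe
    without decoding the media, so even very large rushes are cheap to scan."""
    cmd = ["ffprobe", "-v", "quiet", "-print_format", "json",
           "-show_format", "-show_streams", src]
    return json.loads(subprocess.check_output(cmd))

def preview_commands(src):
    """Build two ffmpeg commands (hypothetical output names):
    - audio.wav: mono, 16 kHz, a common input format for STT engines
    - proxy.mp4: low-resolution H.264 proxy for in-app previews"""
    audio = ["ffmpeg", "-i", src, "-vn", "-ac", "1", "-ar", "16000",
             "audio.wav"]
    proxy = ["ffmpeg", "-i", src, "-vf", "scale=-2:360", "-c:v", "libx264",
             "-crf", "28", "-preset", "veryfast", "proxy.mp4"]
    return audio, proxy
```

The heavy lifting stays in ffmpeg; the app only needs the parsed metadata later, to point the export back at the original files.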
When exporting a sequence (e.g. a paper-edit/programme script as an EDL, XML, etc. for video editing software), the app uses the original source metadata to reconnect to the high-resolution rushes.
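To illustrate the reconnect step, here is a minimal sketch of a CMX3600-style EDL export (function names and the sample reel/timecodes are hypothetical). The key point is that each event's source in/out timecodes come from the original rushes' metadata, which is what lets the editing software relink the cut to the high-resolution files.

```python
def edl_event(num, reel, src_in, src_out, rec_in, rec_out):
    """One CMX3600-style EDL event line: V = video track, C = cut.
    src_in/src_out are timecodes taken from the ORIGINAL source metadata."""
    return (f"{num:03d}  {reel:<8} V     C        "
            f"{src_in} {src_out} {rec_in} {rec_out}")

def export_edl(title, events):
    """Render a minimal EDL: a title, a frame-count mode line, then one
    numbered event per clip in the paper-edit sequence."""
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    lines += [edl_event(i + 1, *e) for i, e in enumerate(events)]
    return "\n".join(lines)

# Example: a single 5-second cut from reel TAPE01 (sample values)
edl = export_edl("paper-edit", [
    ("TAPE01", "00:01:10:00", "00:01:15:00", "00:00:00:00", "00:00:05:00"),
])
```

Because the EDL carries only reel names and timecodes, not media, it stays tiny while still pointing the editing software at the full-quality rushes.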