Transcribing archived videos automatically #196

Open
justSteve opened this issue Feb 4, 2021 · 13 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@justSteve
Contributor

<< We really need a better way to index the videos. It's on the list, along with getting them hosted in the web app via Vimeo>>

It's an idea I've been mulling for a bit. There are a number of speech-to-text (post-event processing) services and options, and if we could script and automate the transcription process, we could unlock the wealth of knowledge Ardalis has wrapped up in DevBetter's archive. Turning that video content into text would seem like a natural first step toward more robust annotation.

I know that Office's Word app has a speech-to-text processor that's a) free, as it's included in most O365 subscriptions, and b) good enough to produce text where different speakers are differentiated and noted. Azure has something too, though (as with Word's) I've only read about it.

Speaking as a newcomer to the $$ group, I'd really value being able to get caught up on various threads of discussion, but video is simply too slow. Anything that turns video into text is going to be a huge step in that direction.

I'd suggest we ask anyone listening who has prior experience with speech-to-text (in whatever form) to share their lessons learned here.

Thanks!

@ShadyNagy ShadyNagy added this to the Videos Features milestone Jan 29, 2022
@ardalis
Collaborator

ardalis commented Sep 8, 2022

Does anyone have any thoughts on how to add transcriptions to the videos in an automated fashion? Is that something we can leverage our video hosting provider (Vimeo) to do, or would it need to be a separate custom process of ours?

This seems to suggest we can get Vimeo to do this automatically:
https://vimeo.com/blog/post/how-to-transcribe-a-video/

@ardalis ardalis added enhancement New feature or request help wanted Extra attention is needed labels Sep 8, 2022
@ardalis ardalis changed the title from "Indexing archived videos" to "Transcribing archived videos automatically" Sep 8, 2022
@snowfrogdev
Contributor

snowfrogdev commented Sep 8, 2022

The link you posted says that if you are on a paid plan, video transcription is done automatically by default. See: https://vimeo.com/features/auto-caption

From what I can tell, transcription/auto-captioning is already working on devBetter's videos. But we'd also want to retrieve the transcript file so we can add the full transcribed text to the video page on devBetter. I'll look into it.

Are we on a paid plan? Which one?

@snowfrogdev
Contributor

If the transcription/caption is automatic on upload, which it seems to be, it looks like we should be able to download it using Vimeo's API. See: https://developer.vimeo.com/api/reference/videos#get_text_tracks
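A minimal Python sketch of that lookup, assuming the `GET /videos/{video_id}/texttracks` endpoint from the reference above, a personal access token, and a JSON response whose `data` items carry `language` and `link` fields. The helper names and the `en` prefix match are illustrative assumptions, not devBetter code:

```python
import json
import urllib.request
from typing import Optional

VIMEO_API = "https://api.vimeo.com"


def texttracks_request(video_id: str, access_token: str) -> urllib.request.Request:
    """Build (but don't send) a GET request for a video's text tracks."""
    return urllib.request.Request(
        f"{VIMEO_API}/videos/{video_id}/texttracks",
        headers={
            "Authorization": f"bearer {access_token}",
            # Pins the Vimeo API version; the response shape below is assumed from it
            "Accept": "application/vnd.vimeo.*+json;version=3.4",
        },
    )


def pick_vtt_link(response_body: str, language: str = "en") -> Optional[str]:
    """Return the link of the first track whose language code starts with `language`.

    Assumes the JSON has a "data" list whose items have "language" and "link" keys.
    """
    for track in json.loads(response_body).get("data", []):
        code = track.get("language") or ""
        if code.startswith(language) and track.get("link"):
            return track["link"]
    return None
```

Sending the request with `urllib.request.urlopen(req)` and passing the decoded body to `pick_vtt_link` should yield a URL for the caption file. Matching on a prefix is a hedge in case auto-generated tracks use a longer language code than plain `en`.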

@snowfrogdev snowfrogdev self-assigned this Sep 8, 2022
@ardalis
Collaborator

ardalis commented Sep 8, 2022

Yes, we're on a paid plan. @ShadyNagy has the most experience with our Vimeo integration and their APIs if you have questions.

@snowfrogdev
Copy link
Contributor

I noticed that CCs are only available for videos from May 2022 onward. I'm thinking of first implementing this feature so that newly uploaded videos have their transcripts added at the bottom of the video page.

[screenshot]

I suspect getting transcripts for videos prior to May 2022 will involve a different, more complicated process, so I'll leave that as a separate exercise.

@ardalis
Collaborator

ardalis commented Sep 9, 2022

We may not have been on a paid plan prior to that; not sure. Or maybe it's a newish feature of theirs. Your plan sounds good.

@ShadyNagy
Contributor

Transcripts are working on our plan, but so far we've only uploaded a few SRT files via the uploader.

@snowfrogdev
Contributor

I've got the transcript showing up on the Video/Details page now. It's the raw VTT text, so each entry has an index and a timestamp. @ardalis, is that good enough for our purposes, or would you prefer I parse the string to remove the indexes and timestamps, leaving only the text?

[screenshot]
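Parsing the raw VTT down could look roughly like this Python sketch. It assumes standard WebVTT layout (a `WEBVTT` header, then blank-line-separated cues, each with an optional index line, a `start --> end` timing line, and one or more text lines); `Cue`, `parse_vtt`, and `transcript_text` are hypothetical names. Each cue keeps its start time so the timestamps aren't thrown away:

```python
import re
from dataclasses import dataclass
from typing import List

# "hh:mm:ss.ttt" or "mm:ss.ttt" (WebVTT allows the hours field to be omitted)
TIMESTAMP = r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
TIMING_LINE = re.compile(TIMESTAMP + r"\s+-->\s+" + TIMESTAMP)


@dataclass
class Cue:
    start_seconds: float
    text: str


def _to_seconds(hours, minutes, seconds, millis) -> float:
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(vtt: str) -> List[Cue]:
    """Drop indexes and timing lines, keeping each cue's text and start time."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING_LINE.search(line)
            if match:
                # Only the first four groups belong to the start timestamp
                start = _to_seconds(*match.groups()[:4])
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                if text:
                    cues.append(Cue(start, text))
                break  # blocks without a timing line (header, NOTE) are skipped
    return cues


def transcript_text(vtt: str) -> str:
    """Plain transcript: cue texts joined with single spaces."""
    return " ".join(cue.text for cue in parse_vtt(vtt))
```

Multi-line cues are joined into one phrase, so the output reads as running text rather than one-second fragments.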

@ardalis
Collaborator

ardalis commented Sep 12, 2022

I think it'll be hard to use that transcription broken up into 1-second intervals, so yes, I would want to see it parsed down. The timestamp data is useful, though: we have the ability to make links into the video at a given timestamp, so I'd like the transcript to link into the video. One way to do that would be to add links at arbitrary intervals (e.g. every 15 or 30 seconds). Another would be to make literally every few words a link. I think that would be easier and more useful.

So in your screenshot above, there'd just be the text, but each of those phrases would be in its own anchor tag linking to the video at offset 0, 1, 2, 3, etc.

If it gets annoying to read because everything is styled like a link, we could simply adjust the styling or offer an option to view the transcript without links.
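A rough Python sketch of the "every phrase is a link" idea, assuming the cues have already been parsed into `(start_seconds, text)` pairs. The function name and the `#t=<n>s` fragment are placeholder assumptions; devBetter's actual timestamp-link format for the video page would go there instead:

```python
import html
import math
from typing import Iterable, Tuple


def transcript_to_html(cues: Iterable[Tuple[float, str]], video_url: str) -> str:
    """Wrap each cue's text in its own anchor pointing at the cue's start offset."""
    anchors = []
    for start_seconds, text in cues:
        offset = math.floor(start_seconds)  # whole seconds, rounded down
        href = f"{video_url}#t={offset}s"  # hypothetical timestamp-link format
        # Escape both attribute and text so transcript content can't inject markup
        anchors.append(f'<a href="{html.escape(href)}">{html.escape(text)}</a>')
    return '<p class="transcript">' + " ".join(anchors) + "</p>"
```

Hanging the styling off the `transcript` class would make it easy to tone down the link appearance, and the same cues could be rendered as plain text for a no-links view.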

Thoughts?

@snowfrogdev
Contributor

Now that #926 has been merged, if I've implemented things correctly, transcripts should automatically appear on the Video/Details page for videos recorded from May 2022 onward, and this will keep working for future videos as long as we remain on a paid plan.

I will wait until we deploy this to make sure it works properly before analyzing how we might get the transcripts to show up for videos recorded prior to May 2022.

@snowfrogdev
Contributor

@ardalis Alright, here's the deal. On our current Vimeo plan, we benefit from auto-transcription ON UPLOAD. Retroactive auto-transcription is only available on an Enterprise plan.
[screenshot]

So, the way I see it, here are our options.

  1. Switch to an Enterprise plan, even if it's just for one month, so we can use Vimeo's own retroactive auto-generation of transcripts.

  2. Download, delete, and re-upload every old video so that it triggers auto-transcription. This could be done all in one go with a script, but you'd probably exceed your monthly plan quotas. Alternatively, you could do it manually, a few at a time, over the next several months until the whole back catalog is done.

  3. Use a third-party service (paid; I couldn't find a free one) to transcribe the back catalog. This would involve finding one that offers an API, then writing a script to download each video from Vimeo, upload it to the service, retrieve the transcript file, and upload it to Vimeo. This could also be done manually.

Thoughts?

@snowfrogdev
Copy link
Contributor

@ardalis, have you given any thought to how you'd like to proceed? One additional option I didn't list is to simply forget about getting transcripts for older videos and be happy with the ones we have, knowing that from now on, transcripts will be automatically generated and displayed for new videos.

@ardalis
Collaborator

ardalis commented Sep 23, 2022

Either 1 or 0 (let it stay as is) is probably the way to go. I'll see what the Enterprise plan involves cost-wise. If we switch to that plan, do we need to do anything to get the transcripts? If so, what?

4 participants