This document describes the components of the video workflow for OCW.
SECTIONS
- Overview
- Google Drive Sync and AWS Transcoding
- YouTube Submission
- Captioning and 3Play Transcript Request
- Completing the Workflow
- Management Commands
Overview
This document assumes that Google Drive sync, YouTube integration, AWS MediaConvert, and 3Play submission are all enabled; all four are required for the video workflow.
The high-level description of the process is below, and each subsequent section contains additional details, including links to the relevant code.
- Browse to a course site in the Studio UI, go to the Resources page, and click the icon to the right of the "Sync w/ Google Drive" button to open the site's Google Drive folder in the Google Drive UI.
- Upload a video named <video_name>.<video_extension> to the videos_final folder on Google Drive, where <video_extension> is a valid video extension, such as mp4. If there are pre-existing captions that should be uploaded with the video (instead of requesting captions and a transcript from 3Play), name them exactly <video_name>_captions.vtt and <video_name>_transcript.pdf and upload them to the files_final folder on Google Drive.
- Sync using the Studio UI. This uploads the video to S3.
- As soon as the upload to S3 is complete, Studio starts a Celery task that submits the video to the AWS MediaConvert service.
- Once transcoding is complete, the video is uploaded to YouTube (set as unlisted until the course is published).
- After the video has been successfully uploaded to YouTube, and if there are no pre-existing captions, Studio sends a transcript request to 3Play.
- Once 3Play completes the transcript job, the captions (.vtt format) and transcript (.pdf format) are fetched and associated with the video.
- On any publish action, the video metadata and YouTube metadata are updated, provided the information has been received from the external services.
- The YouTube video is set to public once the course has been published to live/production.
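The naming convention for pre-existing captions is strict, so it can help to see it as code. The sketch below is hypothetical (expected_caption_files and missing_caption_files are illustrative names, not functions from the codebase): it derives the two expected file names from a video file name and reports which of them are absent from a listing of the files_final folder.

```python
import os


def expected_caption_files(video_filename: str) -> tuple[str, str]:
    """Return the caption and transcript names expected for a video file.

    Follows the <video_name>_captions.vtt / <video_name>_transcript.pdf
    convention; this helper is illustrative, not part of Studio.
    """
    # Strip only the final extension, so "intro.v2.mp4" keeps "intro.v2".
    video_name, _extension = os.path.splitext(video_filename)
    return f"{video_name}_captions.vtt", f"{video_name}_transcript.pdf"


def missing_caption_files(video_filename: str, files_final: list[str]) -> list[str]:
    """List the expected caption/transcript files not present in files_final."""
    present = set(files_final)
    return [name for name in expected_caption_files(video_filename) if name not in present]
```

For example, missing_caption_files("lecture1.mp4", ["lecture1_captions.vtt"]) returns ["lecture1_transcript.pdf"], flagging a transcript that was not uploaded under the exact expected name.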
Google Drive Sync and AWS Transcoding
Users upload videos in a valid video format to the videos_final folder. A file's location in this folder determines its is_video property. Each file is processed by the process_drive_file function, which triggers the stream_to_s3 and transcode_gdrive_video functions; together these upload the file to S3 and submit the AWS MediaConvert transcoding job.
The parameters of the AWS transcoding request are defined through the AWS interface, and the role is defined here. Some example JSON files used to trigger a MediaConvert job are in this folder.
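As a rough illustration of how a saved job JSON might become a MediaConvert request, the sketch below fills in the input file, role, and metadata. It assumes the example JSON has a top-level "Settings" object containing an "Inputs" list, matching the layout of the MediaConvert CreateJob request; build_create_job_kwargs and submit_job are hypothetical names, and the ARN, S3 URL, and metadata keys are placeholders.

```python
import json
from typing import Any


def build_create_job_kwargs(
    template_json: str,
    input_s3_url: str,
    role_arn: str,
    user_metadata: dict[str, str],
) -> dict[str, Any]:
    """Fill a saved job JSON with the uploaded file, the role, and tracking metadata."""
    template = json.loads(template_json)  # parses to fresh objects each call
    settings = template["Settings"]
    # Point the first input at the file that was just uploaded to S3.
    settings["Inputs"][0]["FileInput"] = input_s3_url
    return {"Role": role_arn, "Settings": settings, "UserMetadata": user_metadata}


def submit_job(kwargs: dict[str, Any], endpoint_url: str) -> str:
    """Submit the job with boto3 and return the MediaConvert job ID."""
    import boto3  # imported here so building the request needs no AWS dependency

    client = boto3.client("mediaconvert", endpoint_url=endpoint_url)
    response = client.create_job(**kwargs)
    return response["Job"]["Id"]
```

MediaConvert includes UserMetadata in its job state-change events, which is one way a completion handler could tie a finished job back to a video record.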
The TranscodeJobView endpoint listens for the webhook that is sent when the transcoding job is complete.
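The payload handling behind such a webhook can be sketched without a web framework. This sketch assumes the notification arrives through Amazon SNS, with an EventBridge "MediaConvert Job State Change" event serialized in the Message field; that delivery path and the parse_transcode_notification name are assumptions, not a description of the actual TranscodeJobView.

```python
import json
from typing import Any


def parse_transcode_notification(body: str) -> dict[str, Any]:
    """Classify a webhook body as an SNS confirmation or a job status update."""
    envelope = json.loads(body)
    if envelope.get("Type") == "SubscriptionConfirmation":
        # SNS expects the endpoint to visit SubscribeURL once to confirm.
        return {"action": "confirm", "url": envelope["SubscribeURL"]}
    # SNS notifications carry the event as a JSON string; accept a bare event too.
    event = json.loads(envelope["Message"]) if "Message" in envelope else envelope
    detail = event.get("detail", {})
    return {
        "action": "update",
        "job_id": detail.get("jobId"),
        "status": detail.get("status"),  # e.g. "COMPLETE" or "ERROR"
        "user_metadata": detail.get("userMetadata", {}),
    }
```

A production endpoint should also verify the SNS message signature before trusting the payload.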
YouTube Submission
Videos are uploaded to YouTube via the resumable_upload function. The YouTube upload success notification is sent by email when the update_youtube_statuses task completes; exceptions in this task trigger the YouTube upload failure notification. When the course is published to draft/staging, the video is set to unlisted. When it is published to live/production, the update_youtube_metadata function makes the video public on YouTube. Making a video public notifies all subscribers to the channel, and the OCW YouTube channel has nearly 5 million subscribers, so be careful with this setting.
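The unlisted/public rule reduces to a small mapping. The sketch below is hypothetical (youtube_privacy_status and status_update_body are illustrative names, not update_youtube_metadata itself); the dictionary it builds has the shape of a YouTube Data API videos.update request body for the status part.

```python
from typing import Any


def youtube_privacy_status(published_to_live: bool) -> str:
    """Draft/staging publishes keep the video unlisted; live/production makes it public."""
    return "public" if published_to_live else "unlisted"


def status_update_body(youtube_id: str, published_to_live: bool) -> dict[str, Any]:
    """Build a videos.update body that sets only the privacy status."""
    # With google-api-python-client this would be sent roughly as:
    #   youtube.videos().update(part="status", body=body).execute()
    return {
        "id": youtube_id,
        "status": {"privacyStatus": youtube_privacy_status(published_to_live)},
    }
```

videos.update overwrites every mutable property in the parts it sends, so a real call should include the video's full existing status object rather than privacyStatus alone.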
Captioning and 3Play Transcript Request
If there are no pre-existing captions, the threeplay_transcript_api_request function sends a transcript request to 3Play.
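The condition for sending a 3Play request, as described in the overview, is that the YouTube upload has succeeded and no captions were uploaded with the video. A minimal sketch of that gate, using a hypothetical VideoState record (not Studio's Video model):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoState:
    """Illustrative subset of a video's state; field names are hypothetical."""

    youtube_id: Optional[str]  # set once the YouTube upload succeeds
    captions_file: Optional[str]  # a pre-existing <video_name>_captions.vtt, if any


def needs_threeplay_request(video: VideoState) -> bool:
    """Request a transcript only after upload, and only when no captions exist."""
    uploaded = bool(video.youtube_id)
    has_captions = bool(video.captions_file)
    return uploaded and not has_captions
```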
Completing the Workflow
Once the steps above have finished, the updates to the Video and WebsiteContent objects are nearly complete. The remaining steps run on course publish: update_transcripts_for_website updates the video metadata, and update_youtube_metadata updates the YouTube metadata.
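One way to picture the publish-time step is that, for each video, only the updates whose data has already arrived from the external services are scheduled. This is a hypothetical planning function (plan_publish_updates and PublishableVideo are illustrative names); in the real code update_transcripts_for_website and update_youtube_metadata do the work, and their per-video logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PublishableVideo:
    """Illustrative view of a video at publish time; not a Studio model."""

    name: str
    youtube_id: Optional[str]  # None until the YouTube upload has succeeded
    captions_ready: bool  # True once captions/transcript are attached


def plan_publish_updates(
    videos: list[PublishableVideo], published_to_live: bool
) -> list[tuple[str, ...]]:
    """List the metadata updates a publish would trigger, skipping missing data."""
    steps: list[tuple[str, ...]] = []
    for video in videos:
        if video.captions_ready:
            steps.append(("update_transcripts", video.name))
        if video.youtube_id:
            status = "public" if published_to_live else "unlisted"
            steps.append(("update_youtube_metadata", video.name, status))
    return steps
```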
Management Commands
When something has gone wrong with the data, often because of legacy data issues, management commands can be run to fix it. The commands are defined here. These commands are:
- backpopulate_video_downloads: In the existing video workflow, the MediaConvert job creates a downloadable version in addition to the YouTube version. Initially, these downloadable versions were not stored in the same S3 path as the course site's other resource content; running this command moves them to the appropriate location.
- clear_webvtt_files: Some captions were initially saved without a file extension. This command deletes them from S3 and clears the resource metadata, allowing them to be re-created.
- sync_missing_captions: This command syncs captions and transcripts from 3Play to course videos that are missing them.
- sync_transcripts: This command syncs captions and transcripts from one course (from_course) to another (to_course) for any videos that are missing them.
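The core idea of copying only missing captions and transcripts between courses can be sketched as below. This is not the sync_transcripts command itself: CourseVideo and copy_missing_transcripts are hypothetical, and matching videos across courses by YouTube ID is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CourseVideo:
    """Illustrative video record; matching on youtube_id is an assumption."""

    youtube_id: str
    captions: Optional[str] = None
    transcript: Optional[str] = None


def copy_missing_transcripts(from_course: list[CourseVideo], to_course: list[CourseVideo]) -> int:
    """Fill empty captions/transcripts in to_course from matching from_course videos.

    Existing values are never overwritten. Returns the number of fields filled.
    """
    source = {video.youtube_id: video for video in from_course}
    filled = 0
    for video in to_course:
        match = source.get(video.youtube_id)
        if match is None:
            continue
        if not video.captions and match.captions:
            video.captions = match.captions
            filled += 1
        if not video.transcript and match.transcript:
            video.transcript = match.transcript
            filled += 1
    return filled
```

Leaving existing values untouched makes the operation safe to re-run, which matters for a repair command.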