Shot2Story: a new multi-shot video understanding benchmark with comprehensive video summaries and detailed shot-level captions.
benchmark research video-summarization dataset video-captioning video-story vision-language video-question-answering video-language large-language-models video-language-pretraining video-story-generation
Updated Sep 25, 2024 - Python