Identifying and visualising the main themes emerging from a collection of videos.

RQ: What are the main themes (based on number of scenes) in YouTube videos related to the Amazon fires?

VISUALIZATION

[Image: the final visualisation]
STEPS

1. SCRAPING
WHAT’S IT FOR: Get a list of videos for each chosen query and selected time frame.
TOOLS: YouTube Data Tools [Video List]

2. DATA EXPLORATION
WHAT’S IT FOR: Open the downloaded YouTube Data Tools (YDT) CSV and explore the data.
TOOLS: Excel [Import Data]

3. DATA PREPARATION
WHAT’S IT FOR: Sort the list by views and take the top 50 videos as the sample.
TOOLS: Excel [Filter, Descending]

4. URLS CREATION
WHAT’S IT FOR: The YDT CSV contains only the video IDs, but you need the URLs to download the videos.
TOOLS: Excel [=CONCATENATE(E2;F2)]

5. RENAMING THE NEW COLUMN
WHAT’S IT FOR: To keep track of the new column containing all the video URLs.
TOOLS: Excel

6. DOWNLOAD VIDEOS
WHAT’S IT FOR: Download the video sample quickly and automatically.
TOOLS: Python 3 [PyTube3]

7. COLLECTING VIDEOS IN A NEW FOLDER
WHAT’S IT FOR: It is important for the next script that the folder contains only the downloaded videos.
TOOLS: No tool needed

8. FRAME EXTRACTION BY CHANGE OF SCENE
WHAT’S IT FOR: The script extracts three frames at every scene change.
TOOLS: Python 3 [PySceneDetect]

9. COLLECTING ALL FRAMES IN THE FOLDER OF THE VIDEO THEY BELONG TO
WHAT’S IT FOR: The detect.py script creates a subfolder for each video, in which it places all the detected frames.
TOOLS: No tool needed

10. RENAMING ALL THE FRAMES BY THE VIEW RANK OF THE VIDEO THEY BELONG TO (OPTIONAL)
WHAT’S IT FOR: Use this step if you want to keep track of the order of videos by views in the visualisation.
TOOLS: No tool needed

11. REORDERING ALL THE FRAMES IN A NEW FOLDER
WHAT’S IT FOR: The next step requires the frames of all the videos to be in one folder.
TOOLS: No tool needed

12. CREATION OF A VECTOR SPACE WITH ALL THE FRAMES
WHAT’S IT FOR: Analysing the visual and thematic similarity of the frames.
TOOLS: Anaconda + Python 3 + Pixplot

13. CHOOSE THE VIEW ON PIXPLOT
WHAT’S IT FOR: For this method, choose the view: CLUSTER IMAGES by UMAP dimensionality reduction in a frame network.
TOOLS: Pixplot

14. EXPORT THE VISUALISATION
WHAT’S IT FOR: Take a screenshot or use the "Save as" command to obtain a static image on which to make annotations.
TOOLS: Pixplot

15. ANNOTATE THE VISUALISATION
WHAT’S IT FOR: Highlighting the thematic clusters.
TOOLS: Figma

DETAILS AND MATERIALS
“Amazon Fires” - “Pray for Amazonia”
videoId, videoTitle, publishedAt, viewCount, position

E2 ⟶ http://www.youtube.com/watch?v=

F2 ⟶ videoId

videoUrl
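The spreadsheet steps above (sorting by views, taking the top 50, building the URL column) can also be sketched in Python. This is only an illustration, not part of the original recipe: it assumes the YDT export is a comma-separated file with the `videoId` and `viewCount` columns listed above, and the function name `top_video_urls` is hypothetical.

```python
import csv

# Same prefix as cell E2 in the spreadsheet step.
URL_PREFIX = "http://www.youtube.com/watch?v="


def top_video_urls(csv_path, sample_size=50, delimiter=","):
    """Return the most-viewed rows, each with an added 'videoUrl' field."""
    # Assumption: the export is comma-separated; pass delimiter="\t" otherwise.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle, delimiter=delimiter))
    # Sort by view count, highest first (the Excel descending filter).
    # Assumption: rows with an empty viewCount are treated as 0 views.
    rows.sort(key=lambda row: int(row["viewCount"] or 0), reverse=True)
    sample = rows[:sample_size]
    for row in sample:
        # Equivalent of CONCATENATE(E2;F2): URL prefix + videoId.
        row["videoUrl"] = URL_PREFIX + row["videoId"]
    return sample
```

Calling `top_video_urls("videolist.csv")` (the file name is a placeholder) would return up to 50 rows, most viewed first, each carrying a `videoUrl` value.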

LINK TO PYTHON3 DOCUMENTATION

LINK TO PYTUBE3 DOCUMENTATION

LINK TO REPOSITORY AND STEP-BY-STEP GUIDE

Rename the videos inside the folder like this:

vid1, vid2, vid3, vid4
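A minimal sketch of the download and renaming steps, assuming the `pytube` package is installed and that its `YouTube(...).streams.get_highest_resolution()` and `Stream.download(output_path=...)` calls behave as documented. The function names `fetch_with_pytube` and `download_sample` are hypothetical, and this is not the script from the linked repository.

```python
from pathlib import Path


def fetch_with_pytube(url, out_dir):
    """Download one video with pytube and return the path of the saved file."""
    # Imported lazily so the rest of the sketch loads without pytube installed.
    from pytube import YouTube

    stream = YouTube(url).streams.get_highest_resolution()
    return stream.download(output_path=str(out_dir))


def download_sample(urls, out_dir, fetch=fetch_with_pytube):
    """Download each URL into out_dir and rename the files vid1, vid2, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    # urls is expected to be ordered by views, so vid1 is the most viewed.
    for rank, url in enumerate(urls, start=1):
        downloaded = Path(fetch(url, out_dir))
        # Keep the original file extension (for example .mp4).
        target = out_dir / f"vid{rank}{downloaded.suffix}"
        downloaded.replace(target)
        saved.append(target)
    return saved
```

Because the folder is created by the function and receives only the renamed downloads, it satisfies the requirement that it contains nothing but the videos, provided it did not already exist with other files in it.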

LINK TO PYSCENEDETECT DOCUMENTATION

LINK TO REPOSITORY AND STEP-BY-STEP GUIDE
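For orientation, scene-based frame extraction could look roughly like the sketch below. It is not the detect.py script from the repository. It assumes PySceneDetect 0.6 or later, where `detect`, `ContentDetector`, `open_video` and `scenedetect.scene_manager.save_images` are available; the helper names `list_videos` and `extract_scene_frames` and the list of video extensions are hypothetical.

```python
from pathlib import Path

# Assumption: the downloaded videos use one of these extensions.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".webm", ".mov"}


def list_videos(videos_dir):
    """Return the video files in videos_dir, sorted by file name."""
    return sorted(
        path for path in Path(videos_dir).iterdir()
        if path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS
    )


def extract_scene_frames(video_path, frames_root, frames_per_scene=3):
    """Detect scene changes in one video and save frames for each scene."""
    # Imported lazily: requires the scenedetect package (PySceneDetect 0.6+).
    from scenedetect import ContentDetector, detect, open_video
    from scenedetect.scene_manager import save_images

    video_path = Path(video_path)
    # One subfolder per video, named after the video file.
    output_dir = Path(frames_root) / video_path.stem
    output_dir.mkdir(parents=True, exist_ok=True)

    scene_list = detect(str(video_path), ContentDetector())
    save_images(
        scene_list,
        open_video(str(video_path)),
        num_images=frames_per_scene,  # three frames per scene, as in the recipe
        output_dir=str(output_dir),
    )
    return output_dir
```

Looping `extract_scene_frames` over `list_videos("videos")` would give one subfolder of frames per video, which is the layout the following steps start from.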

Rename the frames inside the folder like this:

frame1, frame2, frame3, frame4, frame5

[SEARCH: VIDEO-SCENE / REPLACE WITH: VIDEO1, VIDEO2, ...]

Rename the frames inside the folder like this:

video1-001-01, video1-001-02, video1-001-03
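The manual search-and-replace renaming and the move into a single folder can be sketched with the standard library alone. The sketch assumes frame files are named like `<video>-Scene-001-01.jpg` inside one subfolder per video; the function name `collect_frames` and its parameters are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Matches names such as "vid1-Scene-001-01.jpg" (assumed naming pattern).
FRAME_NAME = re.compile(r"-Scene-(\d+)-(\d+)(\.\w+)$")


def collect_frames(frames_root, ranked_folders, flat_dir):
    """Copy frames from per-video folders into flat_dir as videoN-SSS-II.ext."""
    flat_dir = Path(flat_dir)
    flat_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # ranked_folders lists the subfolder names from most to least viewed.
    for rank, folder in enumerate(ranked_folders, start=1):
        for frame in sorted((Path(frames_root) / folder).iterdir()):
            match = FRAME_NAME.search(frame.name)
            if match is None:
                continue  # skip files that do not follow the pattern
            scene, image, ext = match.groups()
            target = flat_dir / f"video{rank}-{scene}-{image}{ext}"
            shutil.copy2(frame, target)
            copied.append(target.name)
    return copied
```

Copying rather than moving keeps the per-video subfolders intact, so the step can be repeated if the ranking changes.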

LINK TO INSTALL ANACONDA

LINK TO PIXPLOT DOCUMENTATION

LINK TO REPOSITORY AND STEP-BY-STEP GUIDE
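Pixplot is normally started from the command line with an `--images` argument that points at a glob pattern. The sketch below only wraps that call in Python for consistency with the other steps; it assumes the `pixplot` command is installed in the active Anaconda environment, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def pixplot_command(frames_dir, extension="jpg"):
    """Build the Pixplot command line for every image in frames_dir."""
    pattern = str(Path(frames_dir) / f"*.{extension}")
    # The glob pattern is passed as a single argument and left unexpanded.
    return ["pixplot", "--images", pattern]


def run_pixplot(frames_dir):
    """Run Pixplot; requires the pixplot command to be installed and on PATH."""
    subprocess.run(pixplot_command(frames_dir), check=True)
```

Once processing finishes, the Pixplot documentation describes serving the generated output folder with a local web server to open the visualisation in a browser.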

LINK TO DOWNLOAD FIGMA

METHODOLOGY

aim

This method aims to identify the main themes emerging within a collection of videos. Frames are extracted at detected scene changes, so that each scene is sampled only once and longer scenes do not produce duplicate images. The frames are then arranged by visual similarity using the layout offered by Pixplot, which is based on UMAP, a dimensionality reduction algorithm designed for visualising complex data in low dimensions (2D or 3D).

output

The final visualisation is a clustering of frames arranged by visual similarity, which makes it possible to identify the predominant thematic clusters within the analysed video collection. The thematic annotations were drawn following the cluster boundaries identified in the original Pixplot visualisation.