Innovations in Video and Image Content Understanding for Social Media Platforms

In 2019, we built advanced content moderation technology, using YOLOv3 on Darknet for real-time object detection and tackling video action detection, speech transcription, and automatic text summarization to improve user experience and ad targeting on social platforms.

During a transformative journey in late summer 2019, a chance encounter led to a groundbreaking project aimed at redefining content understanding on social media platforms akin to Instagram and TikTok. Our mission was clear: to filter content for sensitive elements like nudity and violence effectively and to refine ad targeting through cutting-edge machine learning techniques.

Advanced Object Detection

Leveraging pjreddie’s Darknet framework and the YOLOv3 model, we crafted a high-speed image and video classification pipeline. The system identified the predominant objects in each image or video frame and emitted the results as structured JSON, giving us a better speed/accuracy trade-off than traditional detection approaches.
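A minimal sketch of that step, assuming the darknet.py bindings shipped in pjreddie’s darknet are importable and the standard YOLOv3 config, weights, and COCO metadata sit at their usual paths (all paths and thresholds here are illustrative, not the project’s exact setup):

```python
import json
import darknet as dn  # python/darknet.py from pjreddie/darknet

# Standard YOLOv3 files; adjust paths to your checkout (illustrative).
net = dn.load_net(b"cfg/yolov3.cfg", b"yolov3.weights", 0)
meta = dn.load_meta(b"cfg/coco.data")

def detect_objects(image_path, thresh=0.5):
    """Run YOLOv3 on one image and return detections as JSON-ready dicts."""
    results = dn.detect(net, meta, image_path.encode("utf-8"), thresh=thresh)
    detections = []
    for label, confidence, (x, y, w, h) in results:
        # Class names come back as bytes under Python 3.
        name = label.decode("utf-8") if isinstance(label, bytes) else label
        detections.append({
            "label": name,
            "confidence": round(float(confidence), 4),
            "box": {"cx": x, "cy": y, "w": w, "h": h},  # center-based box
        })
    return detections

if __name__ == "__main__":
    print(json.dumps({"image": "frame_0001.jpg",
                      "objects": detect_objects("frame_0001.jpg")}, indent=2))
```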

Pioneering Video-Action Detection

[Figure: Video-action detection visualization]

Action detection in videos posed a unique set of challenges, from vast parameter spaces to scarce training data. Our solution integrated NVIDIA’s STEP (spatio-temporal progressive learning), a model designed to capture the complex temporal dynamics of video content and, at the time, among the strongest published approaches to action detection.
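The STEP inference itself lives in NVIDIA’s released code; the sketch below only illustrates the clip-preparation side, slicing a video into fixed-length, overlapping frame clips with OpenCV before handing them to a spatio-temporal detector. The `action_model.predict` mentioned in the closing comment is a hypothetical stand-in, not an actual STEP API.

```python
import cv2  # OpenCV for frame extraction

def video_to_clips(video_path, clip_len=16, stride=8, size=(224, 224)):
    """Slice a video into overlapping fixed-length clips of resized RGB frames."""
    cap = cv2.VideoCapture(video_path)
    frames, clips = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(cv2.resize(frame, size), cv2.COLOR_BGR2RGB)
        frames.append(frame)
    cap.release()
    # Overlapping windows give the temporal model context across clip boundaries.
    for start in range(0, max(len(frames) - clip_len + 1, 1), stride):
        clip = frames[start:start + clip_len]
        if len(clip) == clip_len:
            clips.append(clip)
    return clips

# Each clip would then be fed to the spatio-temporal action detector,
# e.g. action_model.predict(clip) -- a hypothetical call standing in for
# the actual STEP inference code.
```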

Revolutionizing Speech Transcription

[Figure: Speech transcription process]

We used Google’s Speech-to-Text API to turn the audio tracks extracted from videos into coherent transcriptions. The transcripts not only enriched content understanding but also paved the way for more contextual ad placements and user interactions.
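A rough sketch of that flow, assuming ffmpeg is available for audio extraction and the google-cloud-speech client library (2.x style) is installed; file names and language settings are illustrative. Synchronous `recognize` suits short clips, while longer audio would go through `long_running_recognize`.

```python
import subprocess
from google.cloud import speech

def extract_audio(video_path, wav_path="audio.wav"):
    """Pull a 16 kHz mono WAV track out of the video with ffmpeg."""
    subprocess.run(["ffmpeg", "-y", "-i", video_path,
                    "-ac", "1", "-ar", "16000", wav_path], check=True)
    return wav_path

def transcribe(wav_path):
    """Send the audio to Google Cloud Speech-to-Text and join the results."""
    client = speech.SpeechClient()
    with open(wav_path, "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )
    response = client.recognize(config=config, audio=audio)
    return " ".join(r.alternatives[0].transcript for r in response.results)

transcript = transcribe(extract_audio("upload.mp4"))  # illustrative file name
```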

Innovating with Automatic Text Summarization

Exploring both abstractive and extractive summarization techniques allowed us to condense the transcribed text into digestible summaries: the extractive pass pulled out the most salient sentences verbatim, while the abstractive pass generated new, concise sentences, mirroring a human-like grasp of the content.
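As a rough illustration of the two styles (the specific libraries here are stand-ins, not necessarily what the project used): a simple frequency-based extractive scorer alongside a pretrained Hugging Face summarization pipeline for the abstractive pass.

```python
import re
from collections import Counter
from transformers import pipeline  # stand-in for the abstractive model

def extractive_summary(text, n_sentences=3):
    """Score sentences by word frequency and keep the top few verbatim."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    freq = Counter(re.findall(r"\w+", text.lower()))
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
        reverse=True,
    )
    return " ".join(scored[:n_sentences])

def abstractive_summary(text):
    """Generate new, shorter sentences with a pretrained seq2seq summarizer."""
    summarizer = pipeline("summarization")
    return summarizer(text, max_length=60, min_length=20,
                      do_sample=False)[0]["summary_text"]
```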

Overcoming Challenges

The venture into video-action detection was the project’s most formidable challenge, forcing us to digest very recent research and to satisfy heavy parallel-processing demands. By employing multiprocessing and careful pipeline design, we ran the audio and visual analyses concurrently and merged their results, keeping the end-to-end content-understanding pipeline both fast and accurate.
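A stripped-down sketch of that orchestration, with the per-modality pipelines reduced to stubs that would wrap the earlier steps (the path and return shapes are illustrative):

```python
from multiprocessing import Pool

def visual_pipeline(video_path):
    # Would run the object- and action-detection sketches above over sampled frames.
    return {"objects": [], "actions": []}

def audio_pipeline(video_path):
    # Would extract the audio track and run the Speech-to-Text sketch above.
    return {"transcript": ""}

if __name__ == "__main__":
    video = "upload.mp4"  # illustrative path
    with Pool(processes=2) as pool:
        visual = pool.apply_async(visual_pipeline, (video,))
        audio = pool.apply_async(audio_pipeline, (video,))
        # Block until both modalities finish, then merge into one report.
        report = {**visual.get(), **audio.get()}
        print(report)
```

Running the two modalities in separate processes lets frame decoding and the network-bound transcription call overlap instead of executing serially.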

[Figure: Project overview]

For an in-depth look at our journey and technological breakthroughs, visit our project page.