Are you ready for a Video Classification Challenge?

Category: IT Technology · Published: 5 years ago


Preparation Guide for Video Classification

Neha Goel

Apr 25 · 6 min read

Image source: Stall Catchers

To help you learn new skills and win some prize money while working from home, we at MathWorks are launching an online data science competition.

Teaser: The Datathon will go live in May. Sign up for a DrivenData account to receive the launch announcement. Request complimentary MATLAB licenses here: Advance Alzheimer’s Research with Stall Catchers

The dataset will consist of image stacks (3D images) taken from a live mouse brain, showing blood vessels and blood flow. Each stack will have an outline drawn around a target vessel segment and will be converted to an .mp4 video file. The problem is to classify the target vessel segment as either flowing or stalled. The challenge will be online, globally accessible and free to participate in. You can use any approach to solve the problem.

In this story, I will talk about the concepts and methods I learned while setting up this problem. I will also point you to documents you can refer to as you start preparing for the challenge.

Working with Data

Video Data

Working with videos is an extension of working with images; in addition to the static content of an image, we must consider the dynamic nature of a video. A video can be defined as a stack of images, also referred to as frames, arranged in a specific order. Each frame is meaningful on its own, but the order is also very important. Hence both the spatial and the temporal content of the frames need to be captured.

So, the first step is extracting frames from the video. Make sure the extracted frames keep their original order, so that both sequence modeling and temporal reasoning remain possible.
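To make the spatial versus temporal distinction concrete, here is a minimal sketch in plain Python (not MATLAB). The frames, their size and the pixel values are all made up for illustration. Every frame has the same spatial content (one bright pixel), so only the differences between consecutive frames reveal whether anything is moving.

```python
def make_frame(col, size=4):
    """Toy grayscale frame (made-up data): one bright pixel in row 1 at column `col`."""
    frame = [[0] * size for _ in range(size)]
    frame[1][col] = 255
    return frame

def frame_difference(prev, curr):
    """Per-pixel absolute difference between two consecutive frames."""
    return [[abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)]

def motion_energy(frames):
    """Total absolute difference for each pair of consecutive frames."""
    return [sum(map(sum, frame_difference(a, b)))
            for a, b in zip(frames, frames[1:])]

# The bright pixel moves one column to the right in every frame.
moving = [make_frame(t) for t in range(4)]
# The same single frame repeated: nothing changes over time.
static = [make_frame(0)] * 4

print(motion_energy(moving))  # non-zero for every frame pair
print(motion_energy(static))  # zero for every frame pair
```

Looking at any single frame, the two toy videos are indistinguishable. The difference only shows up once the frames are compared in order, which is why the frame order has to be preserved during extraction.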

Process Data

Another challenge in working with videos is the large size of the dataset. In MATLAB, you can use a datastore to create a repository for collections of data that are too large to fit in memory. A datastore allows you to read and process data stored in multiple files on a disk, a remote location, or a database as a single entity.
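The idea behind a datastore can be sketched in a few lines of plain Python. This is only a conceptual illustration, not the MATLAB datastore API: the class name LazyFileStore, its methods and the placeholder files are all hypothetical. The store keeps a list of file paths and loads one file at a time, so the whole collection never has to fit in memory.

```python
import os
import tempfile

class LazyFileStore:
    """Hypothetical stand-in for a datastore: holds file paths, not file contents."""

    def __init__(self, paths):
        self.paths = sorted(paths)
        self._next = 0

    def has_data(self):
        """True while there are files that have not been read yet."""
        return self._next < len(self.paths)

    def read(self):
        """Load only the next file from disk and return its bytes."""
        path = self.paths[self._next]
        self._next += 1
        with open(path, "rb") as f:
            return f.read()

    def reset(self):
        """Start again from the first file."""
        self._next = 0

# Demo: three small placeholder files stand in for large video files.
with tempfile.TemporaryDirectory() as folder:
    paths = []
    for i in range(3):
        path = os.path.join(folder, f"clip_{i}.bin")
        with open(path, "wb") as f:
            f.write(bytes([i]) * 10)
        paths.append(path)

    store = LazyFileStore(paths)
    sizes = []
    while store.has_data():
        sizes.append(len(store.read()))  # only one file is held in memory at a time

print(sizes)
```

The calling code treats the three files as a single entity and never needs to know how many files there are or where they live.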

Documents to refer to:

Understand the concept of a datastore: Getting Started with Datastore
Create datastores for images, text, audio, files, etc.: Datastore for different File Format or Application
Use built-in datastores directly as input for a deep learning network: Datastores for Deep Learning
Implement a custom datastore for file-based data: Develop Custom Datastore
The data for the challenge will be stored in AWS, so learn how to access data from an S3 bucket

Video Classification Methods

Once the data is ready, you can use any of the five methods below to proceed with classification. I will talk about the most commonly used video classification methods, from a basic non-deep-learning approach to more advanced ones. But I would encourage you to use the deep learning approaches, because of the size of the data and the need to extract features from each frame in a timely manner.

Classical Computer Vision Methods

Method 1: Optical Flow, Object Tracking & Cascade Classifier

Optical flow, activity recognition, motion estimation and tracking are the key techniques you can use to determine the classes of objects and their movement across adjacent frames of the video.
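As a rough illustration of motion estimation between two adjacent frames, here is a minimal plain-Python sketch. Note that it uses an exhaustive shift search (the idea behind block matching), which is a much simpler technique than a real optical flow algorithm, and it estimates only one displacement for the whole frame. The frames, the blob and the search range are made up for illustration.

```python
def shift_cost(prev, curr, dy, dx):
    """Mean squared difference between curr and prev moved by (dy, dx), over the overlap."""
    h, w = len(prev), len(prev[0])
    total, count = 0, 0
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx  # where this pixel would have been in prev
            if 0 <= sy < h and 0 <= sx < w:
                total += (curr[y][x] - prev[sy][sx]) ** 2
                count += 1
    return total / count

def estimate_motion(prev, curr, max_shift=2):
    """Try every shift up to max_shift and return the (dy, dx) with the lowest cost."""
    candidates = [(dy, dx)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates, key=lambda s: shift_cost(prev, curr, *s))

def blob_frame(top, left, size=6):
    """Toy frame (made-up data): a bright 2x2 blob on a dark background."""
    frame = [[0] * size for _ in range(size)]
    for y in range(top, top + 2):
        for x in range(left, left + 2):
            frame[y][x] = 255
    return frame

prev_frame = blob_frame(1, 1)
curr_frame = blob_frame(2, 3)  # blob moved 1 row down and 2 columns right

print(estimate_motion(prev_frame, curr_frame))  # (1, 2)
```

Real optical flow methods estimate a motion vector per pixel or per region rather than one global shift, but the underlying question is the same: where did the content of the previous frame go?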

Resources to refer to:

To learn how to implement optical flow using the Horn-Schunck, Farneback and Lucas-Kanade methods, check out this tutorial video: Computer Vision Training, Motion Estimation
More examples and documentation for Tracking & Motion Estimation
To learn object tracking using histogram-based tracking, and tracking of occluded or hidden objects using a Kalman filter, check out this tutorial video: Computer Vision Training, Object Tracking
An example that shows how to perform automatic detection and motion-based tracking of moving objects in a video: Motion-Based Multiple Object Tracking

Another approach is to use local features of an image, such as blobs, corners and edge pixels. The cascade classifier supports local features like Haar, local binary patterns (LBP) and histograms of oriented gradients (HOG).

Resources to refer to:

Deep Learning Methods

Method 2: Convolutional Neural Network (CNN) + Long short-term memory network (LSTM)

In this method, you convert the videos to a sequence of feature vectors using a pre-trained convolutional neural network to extract features from each frame. Then train a Long short-term memory (LSTM) network on the sequences to predict the video labels. As a final step, combine layers from both networks to assemble a final network that classifies videos directly.
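The structure of this method can be sketched in plain Python. This is only a shape-level illustration under loud assumptions: frame_features is a trivial stand-in for a pre-trained CNN, the LSTM weights are random and untrained, and the toy video is made up, so the resulting score means nothing. It only shows how per-frame feature vectors flow through an LSTM, in order, to give one prediction per video.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def frame_features(frame):
    """Stand-in for a pre-trained CNN: returns [mean, max] of pixel values scaled to [0, 1]."""
    pixels = [p / 255.0 for row in frame for p in row]
    return [sum(pixels) / len(pixels), max(pixels)]

def lstm_last_hidden(sequence, weights, hidden_size):
    """Run one LSTM layer over the sequence and return the final hidden state."""
    h = [0.0] * hidden_size  # hidden state
    c = [0.0] * hidden_size  # cell state
    for x in sequence:
        z = x + h + [1.0]  # current input, previous hidden state, bias term
        gates = {name: [sum(w * v for w, v in zip(row, z)) for row in weights[name]]
                 for name in ("i", "f", "o", "g")}
        for k in range(hidden_size):
            i = sigmoid(gates["i"][k])    # input gate
            f = sigmoid(gates["f"][k])    # forget gate
            o = sigmoid(gates["o"][k])    # output gate
            g = math.tanh(gates["g"][k])  # candidate cell value
            c[k] = f * c[k] + i * g
            h[k] = o * math.tanh(c[k])
    return h

# Toy video: four 2x2 frames that get brighter over time (made-up values).
video = [[[10 * t, 20 * t], [30 * t, 40 * t]] for t in range(1, 5)]
features = [frame_features(frame) for frame in video]

# Random, untrained weights: each gate sees 2 features + 3 hidden units + 1 bias.
rng = random.Random(0)
hidden_size = 3
weights = {name: [[rng.uniform(-0.5, 0.5) for _ in range(2 + hidden_size + 1)]
                  for _ in range(hidden_size)]
           for name in ("i", "f", "o", "g")}
readout = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]

h = lstm_last_hidden(features, weights, hidden_size)
score = sigmoid(sum(w * v for w, v in zip(readout, h)))  # pseudo-probability of one class
print(h, score)
```

Because the hidden state is carried from frame to frame, feeding the same features in a different order gives a different result, which is exactly the temporal sensitivity a per-frame classifier lacks.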

To learn the steps of this complete workflow, see this document: Classify Videos Using Deep Learning

Image source: MATLAB documentation

Method 3: Large-scale video classification with CNN

If video classification is like image classification, why not just use a convolutional neural network?

To answer this, remember what I said about the temporal component of a video. To capture both the temporal and the spatial aspects, you can use a CNN, but you need to structure the network differently.

This paper from Stanford, Large-scale Video Classification with Convolutional Neural Networks, discusses the challenges of applying a basic CNN to videos. It also describes the different CNN architectures you can use to fuse features from multiple frames.

Method 4: Two-stream CNN

Another approach, explained by the researchers in this paper: Two-Stream Convolutional Networks for Action Recognition in Videos, uses two separate ConvNets, one for the spatial aspect and one for the temporal aspect.
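As I understand the paper, the two streams are trained separately and their class scores are combined at the end (late fusion), with averaging as one of the options. A minimal plain-Python sketch of that last step follows. The logits are made-up numbers standing in for the outputs of two trained networks, and the class names are borrowed from the challenge description.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_by_averaging(spatial_logits, temporal_logits):
    """Late fusion: average the class probabilities of the two streams."""
    ps = softmax(spatial_logits)
    pt = softmax(temporal_logits)
    return [(a + b) / 2 for a, b in zip(ps, pt)]

classes = ["flowing", "stalled"]
spatial_logits = [1.2, 0.8]    # hypothetical output of the spatial (single-frame) stream
temporal_logits = [-0.5, 1.5]  # hypothetical output of the temporal (motion) stream

fused = fuse_by_averaging(spatial_logits, temporal_logits)
prediction = classes[fused.index(max(fused))]
print(fused, prediction)
```

In this made-up case the spatial stream slightly prefers "flowing", but the temporal stream is much more confident about "stalled", so the fused prediction follows the temporal stream.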

Documents to refer to when developing a CNN architecture in MATLAB:

Method 5: Using a 3D convolution network

3D ConvNets are a natural choice for video classification, since they inherently apply convolutions and max pooling in 3D space. In this paper: Learning Spatiotemporal Features with 3D Convolutional Networks, researchers propose C3D (Convolutional 3D), whose features are compact and efficient to compute.
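To show what a convolution in 3D space means for a video, here is a minimal plain-Python sketch with no padding and a stride of 1. The toy video and the filter weights are hand-picked for illustration; in a trained network the weights would be learned. The filter spans two frames in time and a 2x2 window in space, so each output value mixes spatial and temporal information.

```python
def conv3d_valid(volume, kernel):
    """Slide a 3D kernel over a 3D volume (frames x rows x columns), no padding, stride 1."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for d in range(depth - kd + 1):
        plane = []
        for y in range(height - kh + 1):
            row = []
            for x in range(width - kw + 1):
                row.append(sum(
                    volume[d + a][y + b][x + c] * kernel[a][b][c]
                    for a in range(kd) for b in range(kh) for c in range(kw)))
            plane.append(row)
        out.append(plane)
    return out

# Toy video (made-up data): 4 frames of 3x3 pixels, each frame 10 units brighter than the last.
video = [[[10 * t] * 3 for _ in range(3)] for t in range(4)]

# Hand-picked 2x2x2 filter: subtracts the earlier frame from the later one.
kernel = [[[-1, -1], [-1, -1]],
          [[1, 1], [1, 1]]]

response = conv3d_valid(video, kernel)
shape = (len(response), len(response[0]), len(response[0][0]))
print(shape)           # (3, 2, 2)
print(response[0][0])  # [40, 40]
```

A 2D convolution applied frame by frame would miss this brightness change entirely, because it never sees two frames at once.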

Documents to refer to:

Design a 3D ConvNet using functions like image3dInputLayer, convolution3dLayer and maxPooling3dLayer in MATLAB
Design the network using Deep Network Designer
Check the complete list of deep learning layers in MATLAB here: List of Deep Learning Layers
An example of working with 3-D medical images: 3-D Brain Tumor Segmentation Using Deep Learning

Next Steps

If you do not have a MATLAB license, start your preparation by requesting a complimentary MATLAB license here: Advance Alzheimer’s Research with Stall Catchers.

Stay tuned for further updates in my next blog post, which will be published in May on the competition launch day. It will include the benchmark code for the problem along with all other details.

Feel free to share your feedback or any questions you have in the comments below.
