Wanted: Brave New Ideas for Video Understanding

CVPR'18 Workshop:
Brave New Ideas for Video Understanding

Date: June 18th, 2018, 09.00-17.15

Held in conjunction with the Conference on Computer Vision and Pattern Recognition (CVPR) 2018.

Description of the workshop and its relevance

In recent years, deep learning has been a major force of change in most computer vision tasks. In video analysis problems such as action recognition and detection, motion analysis and tracking, however, shallow architectures remain surprisingly competitive. What is the reason for this conundrum? Larger datasets are part of the solution: the recently proposed Sports-1M and Kinetics datasets have made it possible to train large motion networks on realistic data. Still, the breakthrough has not yet arrived.

Assuming that the recently proposed video datasets are large enough for training deep networks for video, another likely culprit for the standstill in video analysis is the capacity of the existing deep models. More specifically, existing deep networks for video analysis might not be sophisticated enough to address the complexity of motion information. This makes sense, as videos introduce exponentially greater complexity than static images. Unfortunately, state-of-the-art motion representation models are extensions of existing image representations rather than models dedicated to motion. Brave, new, motion-specific representations are likely to be needed for a breakthrough in video analysis.

The goal of this workshop is to bring together researchers from the broad area of video analysis to discuss problem statements, evaluation metrics, and benchmarks that will spur disruptive progress in the field of video understanding. The workshop will include a series of invited talks by leading researchers in this area as well as oral and poster presentations of accepted papers.

Call for papers: brave new ideas

Submissions will be in the form of short, non-anonymous papers of at most 4 pages (excluding references). Submissions must represent new work, i.e., work that has not been previously published or accepted for publication. However, papers that expand previous related work by the authors, papers that have appeared on non-peer-reviewed websites (such as arXiv), and papers that have been presented at workshops (i.e., venues without published proceedings) are acceptable. Accepted papers will be presented as posters or contributed talks. Authors of accepted papers will be asked to post their submissions on arXiv, and the workshop website will provide links to the accepted papers there. Accepted papers will be considered non-archival and may be submitted elsewhere (modified or not).

Expert speakers

To kick-start the discussion, several influential researchers will give invited talks.

Topics

The workshop focuses on video representations related to, but not limited to, the following topics:

- Influence of motion on object recognition, object affordance and scene understanding
- Object and optical flow
- Motion prediction, causal reasoning and forecasting
- Event and action recognition
- Spatio-temporal action localization
- Modeling human motion in videos and video streams
- Motion segmentation and saliency
- Tracking of objects in space and time
- Unsupervised discovery of actions and actoms using ego-motion
- Applications of motion understanding and video dynamics in sports, healthcare, autonomous driving, driver assistance and robotics

Program & Accepted Papers

The workshop will take place on June 18, 2018.
Please see below for the titles and abstracts of the invited talks.

| Time | Event | Description |
|---|---|---|
| 09.00 - 09.10 | Welcome to the workshop | Information |
| 09.10 - 09.40 | Iasonas Kokkinos (Facebook, UCL) | Invited speaker |
| 09.40 - 10.10 | Honglak Lee (U. Michigan, Google Brain) | Invited speaker |
| 10.10 - 10.50 | Break | Coffee |
| 10.50 - 11.10 | Temporal Reasoning in Videos using Convolutional Gated Recurrent Units, by Debidatta Dwibedi, Pierre Sermanet, Jonathan Tompson | Oral presentation |
| 11.10 - 11.30 | Temporal 3D ConvNets using Temporal Transition Layer, by Ali Diba, Mohsen Fayyaz, Vivek Sharma, A. Hossein Karami, M. Mahdi Arzani, Rahman Yousefzadeh, Luc Van Gool | Oral presentation |
| 11.30 - 12.00 | Christoph Feichtenhofer (Facebook) | Invited speaker |
| 12.00 - 14.30 | Break | Lunch (on your own) |
| 14.30 - 15.00 | Cees Snoek (University of Amsterdam) | Invited speaker |
| 15.00 - 15.20 | ContextVP: Fully Context-Aware Video Prediction, by Wonmin Byeon, Qin Wang, Rupesh Kumar Srivastava, Petros Koumoutsakos | Oral presentation |
| 15.20 - 16.30 | Break | Afternoon break & poster session |
| 16.30 - 17.00 | Ivan Laptev (INRIA) | Invited speaker |
| 17.00 - 17.15 | Organizers | Closing remarks |

Invited Talks

Invited Speaker: Iasonas Kokkinos (Facebook, UCL)
Title: To be announced
Abstract: To be announced

Invited Speaker: Honglak Lee (U. Michigan, Google Brain)
Title: To be announced
Abstract: To be announced

Invited Speaker: Cees Snoek (University of Amsterdam)
Title: To be announced
Abstract: To be announced

Invited Speaker: Christoph Feichtenhofer (Facebook)
Title: What have we learned from deep representations for action recognition?
Abstract: In this talk, I will shed light on deep spatiotemporal representations by visualizing what two-stream models have learned in order to recognize actions in video. I show that local detectors for appearance and motion objects arise to form distributed representations for recognizing human actions. Key observations include the following. First, cross-stream fusion enables the learning of true spatiotemporal features rather than simply separate appearance and motion features. Second, the networks can learn local representations that are highly class-specific, but also generic representations that can serve a range of classes. Third, throughout the hierarchy of the network, features become more abstract and show increasing invariance to aspects of the data that are unimportant to desired distinctions (e.g., motion patterns at various speeds). Fourth, visualizations can be used not only to shed light on learned representations, but also to reveal idiosyncrasies of training data and to explain failure cases of the system.

Invited Speaker: Ivan Laptev (INRIA)
Title: To be announced
Abstract: To be announced

Accepted papers

- Temporal 3D ConvNets using Temporal Transition Layer
Luc Van Gool, Vivek Sharma, Ali Diba, Rahman Yousefzadeh, Mohammad Mahdi Arzani, Mohsen Fayyaz, Ami Karami
- Towards an Unequivocal Representation of Actions
Dima Damen, Michael Wray, Davide Moltisanti
- Unsupervised Deep Representations for Learning Audience Facial Behaviors
Suman Saha, Rajitha Navarathna, Romann M. Weber, Leonhard Helminger
- Dealing with sequences in the RGBDT space
Gabriel Moya-Alcover, Antoni Jaume-i-Capó, Javier Varona
- Temporal Reasoning in Videos using Convolutional Gated Recurrent Units
Debidatta Dwibedi, Jonathan Tompson, Pierre Sermanet
- ContextVP: Fully Context-Aware Video Prediction
Wonmin Byeon, Petros Koumoutsakos, Qin Wang, Rupesh Kumar Srivastava
- I Have Seen Enough: A Teacher Student Network for Video Classification Using Fewer Frames
Shweta Bhardwaj, Mitesh M. Khapra

Important Dates

Held in conjunction with the Conference on Computer Vision and Pattern Recognition (CVPR) 2018.
Date of the workshop: June 18th, 2018, 09.00-17.15

Submission Deadline: March 16, 2018
Notifications to authors by: April 13, 2018
Finalized workshop program by: April 20, 2018
Papers posted on arXiv (non-archival) by: May 4, 2018

Submission

Constructive discussion

The workshop's goal is a constructive, creative and open conversation. In principle, we accept all papers with interesting ideas.

Instructions

Authors can submit 4-page papers, which will be peer-reviewed. However, they will not be included in the proceedings. Please follow the CVPR 2018 camera-ready format as per the official CVPR 2018 instructions, but limit your paper to 4 pages, excluding references.

Program Committee

Jakub Tomczak, Hakan Bilen, Noureldien Hussein, Jan van Gemert, Silvia-Laura Pintea, Osman Kayhan, Jack Valmadre, Amir Ghodrati, Efstratios Gavves, Lorenzo Torresani, Tom Runia, Christoph Feichtenhofer, Ross Goroshin, Chen Huang, Makarand Tapaswi, Joseph Tighe, Du Tran, Carl Vondrick, Heng Wang, Limin Wang

Registration & venue

The workshop is held in conjunction with the Conference on Computer Vision and Pattern Recognition (CVPR) 2018.

Accepted papers must have at least one registered author (this can be a student).

Venue TBD.