Tutorials will be held on Sunday 25 September 2016 at the Phoenix Convention Center.

Morning Tutorials: Sunday 25 September 2016, 09:00 to 12:30

T1: Deep Learning for Image and Video Processing (Invited)

Jonathon Shlens, Google, USA
George Toderici, Google, USA


Deep learning has profoundly changed the field of computer vision in the last few years. Many computer vision problems have been recast with techniques from deep learning and, in turn, have achieved state-of-the-art results and become industry standards. In this tutorial we will provide an overview of the central ideas of deep learning as applied to computer vision, and survey its many applications to image and video problems. The goal is to teach the core ideas and give a high-level overview of how deep learning has influenced computer vision.


-Motivations for deep learning in computer vision.
-Recent progress in applying deep learning for vision.
-Architectures for image classification and image regression.
-Survey of image recognition and localization techniques.
-Tools for performing deep learning.
-Advances in image synthesis and image compression.
-Architectures for video classification and summarization.


Jonathon Shlens received his Ph.D. in computational neuroscience from UC San Diego in 2007, where his research focused on applying machine learning towards understanding visual processing in real biological systems. He was previously a research fellow at the Howard Hughes Medical Institute, a research engineer at Pixar Animation Studios and a Miller Fellow at UC Berkeley. He has been at Google Research since 2010 and is currently a research scientist focused on building scalable vision systems. During his time at Google, he has been a core contributor to deep learning systems including the recently open-sourced TensorFlow. His research interests have spanned the development of state-of-the-art image recognition systems and training algorithms for deep networks.

George Toderici received his Ph.D. in Computer Science from the University of Houston in 2007 where his research focused on 2D-to-3D face recognition, and joined Google in 2008. His current work at Google Research is focused on lossy multimedia compression using neural networks. His past projects include the design of neural-network architectures and various classical approaches for video classification, YouTube channel recommendations, and video enhancement.

T2: High Dynamic Range Video

Erik Reinhard, Technicolor, UK
Giuseppe Valenzise, ParisTech, France
Frederic Dufaux, ParisTech, France


High dynamic range (HDR) imaging technologies enable the capture, processing and display of images containing a much wider range of illumination compared to traditional imaging solutions. To achieve this, all aspects of the imaging pipeline need to be rethought and redesigned, which has led to an active area of research. In this course, the focus is on the capture, processing and display of HDR video, presenting the state-of-the-art in hardware and software technologies and discussing the main challenges pertinent to this exciting field.

Summary - This course discusses exciting new developments in high dynamic range imaging, with a strong focus on emerging solutions for the capture, transmission and display of video.

Prerequisites - The course has no specific prerequisites but participants would benefit most if they have a basic understanding of photography and image processing.

Intended Audience - This course is appropriate for students, researchers and practitioners with an interest in HDR imaging/video and specifically for those interested in understanding the practical aspects of the technology. The course will offer a balanced academic/industrial perspective on the state of this field.

Level of Difficulty - Beginner


Part 1 Introduction
Part 2 Capture & Display
    -Camera technologies
    -Multi-exposure techniques
    -Ghost removal
    -Display hardware
Part 3 Tone Reproduction
    -How to display HDR video on a standard display
    -Curve-based solution
    -Spatial processing
    -Video processing
Part 4 Inverse Tone Reproduction
    -How to display standard dynamic range video on an HDR display
Part 5 Color and Gamut Management
    -Gamut boundary management techniques
    -Luminance - chroma interactions
Part 6 Challenges in Compression
    -Pre- and post-processing techniques
Part 7 Wrap-up and Questions
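Part 3 of the outline mentions curve-based tone reproduction. As a minimal sketch, assuming linear-light RGB input and Rec. 709 luminance weights, the following applies the global photographic operator of Reinhard et al. (2002): luminance is scaled so that its log-average maps to a chosen "key" value, then compressed with L/(1+L). It is one well-known curve among many and is not presented as the course material; the function and parameter names are illustrative.

```python
import numpy as np

def tone_map_global(rgb: np.ndarray, key: float = 0.18, eps: float = 1e-6) -> np.ndarray:
    """Map linear HDR RGB (H x W x 3) to the display range [0, 1] with a global curve."""
    # Luminance from linear RGB using Rec. 709 weights (an assumption about the input).
    lum = 0.2126 * rgb[..., 0] + 0.7152 * rgb[..., 1] + 0.0722 * rgb[..., 2]
    # Log-average luminance serves as an estimate of the scene's overall brightness.
    log_avg = np.exp(np.mean(np.log(eps + lum)))
    # Scale so that the log-average luminance maps to the chosen key value.
    scaled = (key / log_avg) * lum
    # Compressive curve: small values stay roughly linear, large values approach 1.
    display = scaled / (1.0 + scaled)
    # Rescale every channel by the same luminance ratio to keep colour ratios.
    ratio = display / np.maximum(lum, eps)
    # Saturated colours can exceed 1 after rescaling, so clip to the display range.
    return np.clip(rgb * ratio[..., None], 0.0, 1.0)
```

Because the curve is applied per pixel with a single global scale, it is cheap enough for video, but it cannot adapt to local contrast; the spatial and video processing items in Part 3 address those limitations.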


Erik Reinhard has been a Distinguished Scientist at Technicolor R&I since July 2013; before that he held various academic positions at universities and research institutes in Europe and North America. He was founder and editor-in-chief of ACM Transactions on Applied Perception, and has authored books on high dynamic range imaging, color imaging, computer graphics and natural image statistics. He enjoys research that spans different disciplines, including color science, high dynamic range imaging and human visual perception. He has published more than 100 papers in these areas and has served on more than 50 program committees. He was program co-chair of 6 conferences and workshops, including the Eurographics Symposium on Rendering 2011. He delivered keynotes at Eurographics 2010, the Computational Color Imaging Workshop 2011, and the IS&T European Conference on Colour in Graphics, Imaging and Vision 2012. He has been a speaker in more than 15 courses and tutorials, 10 of which were delivered at SIGGRAPH.

Giuseppe Valenzise has been a CNRS researcher at Telecom ParisTech, Paris, since October 2012. Previously, he worked as a post-doctoral researcher in the same lab, starting in July 2011. He received a master's degree and a Ph.D. in Information Technology from the Politecnico di Milano in 2007 and 2011, respectively. From January 2009 to July 2009 he was a visiting scholar at the Signal and Image Processing Institute (SIPI) at the University of Southern California. His research interests span different fields of image and video processing, including single- and multi-view video coding, high dynamic range imaging, video quality assessment, video surveillance, image and video forensics, and image and video analysis. He is co-author of more than 40 research publications.

Frederic Dufaux is a CNRS Research Director at Telecom ParisTech. He is also Editor-in-Chief of Signal Processing: Image Communication. Frederic received his M.Sc. in physics and Ph.D. in electrical engineering from EPFL in 1990 and 1994, respectively. He has over 20 years of experience in research, previously holding positions at EPFL, Emitall Surveillance, Genimedia, Compaq, Digital Equipment, MIT, and Bell Labs. He has been involved in the standardization of digital video and imaging technologies, participating in both the MPEG and JPEG committees. He is the recipient of two ISO awards for his contributions. Frederic was Vice General Chair of ICIP 2014. He is an elected member of the IEEE Image, Video, and Multidimensional Signal Processing (IVMSP) and Multimedia Signal Processing (MMSP) Technical Committees. He is also the Chair of the EURASIP Special Area Team on Visual Information Processing. His research interests include image and video coding, high dynamic range imaging, distributed video coding, 3D video, visual quality assessment, video surveillance, privacy protection, image and video analysis, multimedia content search and retrieval, and video transmission over wireless networks. He is the author or co-author of more than 120 research publications and holds 17 patents issued or pending.

T3: Distributed Visual Processing

Andrea Cavallaro, Queen Mary University of London, UK


This tutorial will cover fundamental aspects, challenges and current solutions in distributed visual processing using networks of self-organising wired and wireless smart cameras, with applications in robotics, security and the Internet of Things. The tutorial sets forth the state of the art in state estimation and coalition formation for distributed smart cameras, and will discuss and demonstrate the latest algorithms with unified and comprehensive coverage. Using practical examples and illustrations, the tutorial will engage participants in a discussion of the advantages and limitations of traditional and modern approaches to synchronisation, distributed estimation and distributed processing for decision making and actuation in camera networks. Recent methods will be presented that allow cameras to move and to interact locally, adaptively forming coalitions in order to provide coordinated decisions under resource and physical constraints. The tutorial will also discuss how cameras may learn to improve their performance. It will conclude by introducing a collection of software resources to help attendees develop and test distributed signal processing algorithms for wireless smart cameras.


Part A - Introduction
Problem formulation
Data sharing strategies
Resource and physical constraints
Application examples

Part B - Background
State estimation
Consensus approaches
Aggregation approaches
Ideal vs realistic network conditions
Costs and utility

Part C - Algorithms
Neighbour consensus
Task-based coalition formation
Camera network self-localisation (audio-visual)
Distributed processing for self-organisation
Distributed processing for self-positioning
Audio-visual synchronisation

Part D - Conclusions
Open problems
Research outlook
Resources: videos and code
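Part B lists consensus approaches for state estimation. As an illustration only, assuming a fixed undirected communication graph and synchronous updates (idealised network conditions), the classic average-consensus iteration x_i <- x_i + step * sum over neighbours j of (x_j - x_i) lets each camera reach the network-wide mean of all local estimates while exchanging data only with its neighbours. The function name and example graph are hypothetical, and this is a generic textbook scheme rather than a specific algorithm from the tutorial.

```python
from typing import Dict, List

def average_consensus(values: Dict[int, float],
                      neighbours: Dict[int, List[int]],
                      step: float,
                      iterations: int) -> Dict[int, float]:
    """Synchronous average consensus on an undirected graph.

    Convergence to the global mean requires 0 < step < 1 / max_degree.
    """
    x = dict(values)
    for _ in range(iterations):
        # The comprehension reads only the previous round's values, so every
        # node updates simultaneously, as in a synchronous network.
        x = {i: x[i] + step * sum(x[j] - x[i] for j in neighbours[i]) for i in x}
    return x

# Four cameras on a line graph 0-1-2-3 with different local estimates.
line_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
estimates = average_consensus({0: 1.0, 1: 2.0, 2: 3.0, 3: 10.0}, line_graph,
                              step=0.3, iterations=500)
```

With a symmetric graph each update moves value between neighbours without creating or destroying any, so the sum of the estimates is preserved and all nodes converge to the mean (4.0 here). Packet loss, asynchrony and changing topologies, i.e. realistic network conditions, break these guarantees.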


Andrea Cavallaro is Professor of Multimedia Signal Processing and Director of the Centre for Intelligent Sensing at Queen Mary University of London, UK. He received his Ph.D. in Electrical Engineering from the Swiss Federal Institute of Technology (EPFL), Lausanne, in 2002. He was a Research Fellow with British Telecommunications (BT) in 2004/2005. He was awarded the Royal Academy of Engineering Teaching Prize in 2007; three student paper awards on target tracking and perceptually sensitive coding at IEEE ICASSP in 2005, 2007 and 2009; and the best paper award at IEEE AVSS 2009.

Prof. Cavallaro is Senior Area Editor for the IEEE Transactions on Image Processing and Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology. He is an elected member of the IEEE Signal Processing Society's Image, Video, and Multidimensional Signal Processing Technical Committee, and chair of its Awards committee. He served as an elected member of the IEEE Signal Processing Society's Multimedia Signal Processing Technical Committee, as Area Editor for the IEEE Signal Processing Magazine, as Associate Editor for the IEEE Transactions on Multimedia and the IEEE Transactions on Signal Processing, and as Guest Editor for seven international journals. He was General Chair for IEEE/ACM ICDSC 2009, BMVC 2009, M2SFA2 2008, SSPE 2007, and IEEE AVSS 2007.

Prof. Cavallaro was Technical Program Chair of IEEE AVSS 2011, the European Signal Processing Conference (EUSIPCO 2008) and WIAMIS 2010. He has published more than 150 journal and conference papers, one monograph on video tracking (2011, Wiley) and three edited books: Multi-camera networks (2009, Elsevier); Analysis, retrieval and delivery of multimedia content (2012, Springer); and Intelligent multimedia surveillance (2013, Springer).

T4: Embedded Computer Vision and Image Processing with OpenCL and OpenVX

Kari Pulli, Intel
Thomas Gardos, Intel
Dukhwan Kim, Intel


We will give a hands-on tutorial on how to get started with real-time computer vision and image processing tasks on embedded devices using standard APIs from the Khronos Group: OpenCL and OpenVX. We will discuss different implementation choices, where the tasks can run on a CPU, a GPU, or an imaging DSP. Participants will learn how to get started programming these APIs. We will demonstrate the same tasks running on the three classes of processors, and discuss trade-offs such as the power-performance ratios of the different choices. The audience will learn which functionalities are supported by the standard, and which functionalities vendors implement in addition to the minimum required set.


Overview of OpenVX and OpenCL for computer vision
Heterogeneous computing on CPU, GPU, and IPU
Hello OpenVX and OpenCL with predefined kernels (hands-on)
An extended computer vision example (hands-on)
Power-performance tradeoffs of different architectures


Kari Pulli
Kari is a Senior Principal Engineer at Intel, working in the Imaging and Camera Technologies Group. He has a long history in Computational Photography, Computer Vision, and Computer Graphics (earlier jobs include VP of Computational Imaging at Light, Sr. Director at NVIDIA Research, and Nokia Fellow), with numerous publications (h-index = 29). Kari has a PhD from the University of Washington, Seattle, and has also been a researcher/lecturer at Stanford, MIT, and the University of Oulu. He has contributed to many multimedia standards at the Khronos Group, including OpenVX, and has given courses at SIGGRAPH, CVPR, Eurographics, and many other conferences.

Thomas Gardos
Thomas is a Principal Engineer and 22-year veteran at Intel, and is the lead imaging software architect in the Imaging and Camera Technologies Group. Tom was Intel's researcher-in-residence at the MIT Media Lab and Intel's representative to the MPEG-4 and H.263/H.264 video coding standards. Thomas has taught numerous courses and seminars on image and video processing, has been an adjunct professor at Oregon and Portland State Universities, and is a past associate editor of the IEEE Transactions on Multimedia. Prior to joining Intel, Tom was in the Electronic Imaging Labs of Eastman Kodak. He holds a Master's and a Ph.D. in digital image and signal processing from the Georgia Institute of Technology.

Dukhwan Kim
Dukhwan Kim is a software engineering manager in the Photography Vision and Application team in the Imaging and Camera Technologies Group at Intel. He joined Intel in 2012 when Olaworks, a company he co-founded that specialized in developing computer vision technologies for mobile devices, was acquired by Intel. Throughout his career he has worked on enabling computer vision technologies across many platforms; his recent focus is on heterogeneous computing frameworks for computer vision. He received his B.S. and M.S. degrees in EECS from Seoul National University.

Afternoon Tutorials: Sunday 25 September 2016, 14:00 to 17:30

T5: Computational Photography

Mohit Gupta, Columbia University, USA
Jean-François Lalonde, Université Laval, Canada


In the last decade, computational photography has emerged as a vibrant field of research. A computational camera uses a combination of unconventional optics and novel algorithms to produce images that cannot otherwise be captured with traditional cameras. The design of such cameras involves the following two main aspects:
-Optical coding – modifying the design of a traditional camera by introducing programmable optical elements and light sources to capture the maximum amount of scene information in images;
-Algorithm design – developing algorithms that take information captured by conventional or modified cameras, and create a visual experience that goes beyond the capabilities of traditional systems.

Examples of computational cameras that are already making an impact in the consumer market include wide field-of-view cameras (Omnicam), light-field cameras (Lytro), high dynamic range cameras (mobile cameras), multispectral cameras, motion sensing cameras (Leap Motion) and depth cameras (Kinect).

This course serves as an introduction to the basic concepts in programmable optics and computational image processing needed for designing a wide variety of computational cameras, as well as an overview of the recent work in the field.


A brief history of photography − Camera Obscura − Film, Digital and Computational photography;
Coded photography − Novel camera designs and functionalities, including:

- Optical coding approaches: Aperture, Image plane, and Illumination coding; Camera arrays,
- Novel functionalities: Light field cameras − Extended DOF cameras, Hyperspectral cameras − Ultra high-resolution cameras (Gigapixel) − HDR cameras − Post-capture refocusing and Post-capture resolution trade-offs,
- Depth cameras: Structured light − Time-of-flight,
- Compressive sensing: Single pixel and High speed cameras;

Augmented photography: algorithmic tools for novel visual experiences:

- Multiple viewpoints: Image stitching, panoramas − Gigapixel imaging − Large-scale structure from motion,
- Data-driven approaches: Texture transfer − Object transfer − Color/attribute/style transfer,
- 2D image plane vs 3D scene: Scene geometry estimation − Light, geometry, and object editing,
- Smarter tools: Content-aware inpainting − Edit propagation in image collections − Matte cutouts,
- Smartphone photography: Cheap optics / powerful computing − Virtual tripod, Burst-mode HDR and denoising − Video stabilization,
- Motion magnification and visual microphone;

Future and impact of photography:

- "Social/collaborative photography" or the Internet of Cameras,
- Wearable and flexible cameras,
- Seeing the invisible: seeing around corners, through walls, laser speckle photography,
- Image forensics,
- Next generation applications (personalized health monitoring, robotic surgery, self-driving cars, astronomy).
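The smartphone photography items above include burst-mode denoising. A minimal sketch, assuming the frames are already perfectly aligned and corrupted by independent zero-mean noise: averaging N frames reduces the noise standard deviation by roughly a factor of sqrt(N). Real burst pipelines also align frames and reject moving content, which this sketch omits; the function name and the flat test scene are hypothetical.

```python
import numpy as np

def merge_burst(frames: np.ndarray) -> np.ndarray:
    """Average an aligned burst of N frames (N x H x W) into one frame."""
    # Independent zero-mean noise partially cancels when frames are averaged.
    return frames.mean(axis=0)

rng = np.random.default_rng(1)
clean = np.full((64, 64), 0.5)                          # hypothetical flat test scene
burst = clean + rng.normal(0.0, 0.1, size=(16, 64, 64))  # 16 noisy captures
merged = merge_burst(burst)
single_noise = np.std(burst[0] - clean)   # close to 0.1
merged_noise = np.std(merged - clean)     # close to 0.1 / sqrt(16) = 0.025
```

The sqrt(N) gain is why short-exposure bursts can replace a single long exposure on small-sensor cameras without motion blur, provided alignment succeeds.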


Mohit Gupta is an assistant professor in the CS department at the University of Wisconsin-Madison. Previously, he was a research scientist in the CAVE lab at Columbia University. He received a B.Tech. in computer science from Indian Institute of Technology Delhi in 2003, an M.S. from Stony Brook University in 2005 and a Ph.D. from the Robotics Institute, Carnegie Mellon University in 2011. His research interests are in computer vision and computational imaging. His focus is on designing computational cameras that enable computer vision systems to perform robustly in demanding real-world scenarios, as well as capture novel kinds of information about the physical world. Details can be found at http://pages.cs.wisc.edu/~mohitg/.

Jean-François Lalonde is an assistant professor in Electrical and Computer Engineering at Laval University, Quebec City. Previously, he was a Post-Doctoral Associate at Disney Research, Pittsburgh. He received a B.Eng. degree in Computer Engineering with honors from Laval University, Canada, in 2004. He earned his M.S. at the Robotics Institute at Carnegie Mellon University in 2006 and received his Ph.D., also from Carnegie Mellon, in 2011. His Ph.D. thesis won the 2010-11 CMU School of Computer Science Distinguished Dissertation Award, and was partly supported by a Microsoft Research Graduate Fellowship. After graduation, he became a Computer Vision Scientist at Tandent, where he helped develop LightBrush™, the first commercial intrinsic imaging application, and introduced the technology of intrinsic videos at SIGGRAPH 2012. His work focuses on lighting-aware image understanding and synthesis by leveraging large amounts of data.

T6: Image Understanding and Information Mining for Very High Resolution Earth Observation

Mihai Datcu, German Aerospace Center (DLR), Germany


Very High Resolution (VHR) satellite or airborne images enable detailed observation of Earth structures, objects, and phenomena at global scale. The challenge lies in a global understanding involving observations of large, extended areas and long periods of time, acquired with a broad variety of Earth Observation (EO) imaging sensors. Typical EO multispectral sensors acquire images in several spectral channels covering the visible and infrared spectra, while Synthetic Aperture Radar (SAR) images are complex-valued, representing modulations in amplitude, frequency, phase or polarization of the collected radar echoes. An important particularity of EO images is their instrument nature: in addition to spatial information, they sense physical parameters, and they mainly sense outside the visible spectrum.

EO Image Understanding and EO Information Mining are therefore new fields of study that have arisen to automate the extraction of information, mainly from very high resolution EO images, in ways that can lead to actionable intelligence. The tutorial proposes very high resolution Earth Observation image content extraction as a challenge for Image Processing. It introduces specific EO image information processing methods and provides an interdisciplinary view of methods in signal processing, machine learning and communication theory.


  1. Data modeling and description, EO image types and characterization, meaningful and quantitative descriptors for VHR multispectral and SAR complex-valued images.
  2. Visualization of multispectral and SAR images, adaptive methods for optimal band selection and transformation, signatures and data transformation.
  3. EO Data intelligence, hierarchical and generative models, data fusion and learning with multisensor data, semantic learning and annotation, learning modeling as a communication channel.
  4. Satellite Image Time Series, change detection and analysis, recognition and classification of evolution patterns, spatio-temporal reasoning.
  5. Parameter- and model-free multisensor image analysis, elements of algorithmic information theory for compression-based pattern recognition, dictionary-based similarity measures, lossy vs. lossless compression methods for information extraction.
  6. Applications and perspectives, objects and structure recognition in very high resolution EO images, indexing and semantic annotation, search engines and Image Information Mining.
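Item 5 refers to compression-based pattern recognition. One widely used parameter-free measure of this kind is the Normalized Compression Distance (NCD) of Cilibrasi and Vitányi, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. The sketch below uses Python's zlib as the compressor; that choice, the function names and the toy byte strings are assumptions for illustration, and the tutorial may rely on other compressors or on dictionary-based variants.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy "patch descriptors" serialized as bytes (hypothetical data).
a = b"forest,field," * 100
b = b"forest,field," * 90 + b"river,urban," * 10
rnd = random.Random(0)
c = bytes(rnd.randrange(256) for _ in range(1300))
similar = ncd(a, b)    # small: b reuses most of a's structure
unrelated = ncd(a, c)  # close to 1: random bytes share nothing with a
```

The appeal for EO data is that no features or model parameters have to be chosen in advance: the compressor implicitly discovers shared structure. The price is that results depend on the compressor and on how image content is serialized to bytes.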


Mihai Datcu received the M.S. and Ph.D. degrees in Electronics and Telecommunications from the University Politehnica of Bucharest (UPB), Romania, in 1978 and 1986, respectively. In 1999 he received the title Habilitation à diriger des recherches in Computer Science from University Louis Pasteur, Strasbourg, France. Since 1981 he has been Professor with the Department of Applied Electronics and Information Engineering, Faculty of Electronics, Telecommunications and Information Technology (ETTI), UPB, working in image processing and Electronic Speckle Interferometry. Since 1993, he has been a scientist with the German Aerospace Center (DLR), Oberpfaffenhofen. He develops algorithms for model-based information retrieval from high-complexity signals and methods for scene understanding from Very High Resolution Synthetic Aperture Radar (SAR) and Interferometric SAR data. He is engaged in research related to information-theoretical aspects and semantic representations in advanced communication systems. Currently he is Senior Scientist and Image Analysis research group leader with the Remote Sensing Technology Institute (IMF) of DLR, Oberpfaffenhofen. Since 2011 he has also led the Immersive Visual Information Mining research lab at the Munich Aerospace Faculty, and he is director of the Research Center for Spatial Information at UPB. His interests are in Bayesian inference, information and complexity theory, stochastic processes, model-based scene understanding, and image information mining, with applications in information retrieval and understanding of high resolution SAR and optical observations.
He has held Visiting Professor appointments with the University of Oviedo, Spain; the University Louis Pasteur and the International Space University, both in Strasbourg, France; the University of Siegen, Germany; the University of Innsbruck, Austria; the University of Alcala, Spain; University Tor Vergata, Rome, Italy; Universidad Pontificia de Salamanca, campus de Madrid, Spain; the University of Camerino, Italy; and the Swiss Center for Scientific Computing (CSCS), Manno, Switzerland. From 1992 to 2002 he held a longer Invited Professor assignment with the Swiss Federal Institute of Technology, ETH Zurich. Since 2001 he has led the Competence Centre on Information Extraction and Image Understanding for Earth Observation, which he initiated at ParisTech, Paris Institute of Technology, Telecom Paris, as a collaboration of DLR with the French Space Agency (CNES). He has been holder of the DLR-CNES Chair at ParisTech, Paris Institute of Technology, Telecom Paris. He initiated the European frame of projects for Image Information Mining (IIM) and is involved in research programs for information extraction, data mining, knowledge discovery and data understanding with the European Space Agency (ESA), NASA, and in a variety of national and European projects. He is a member of the European Image Information Mining Coordination Group (IIMCG). He and his team have developed, and are currently developing, the operational IIM processor in the Payload Ground Segment systems for the German missions TerraSAR-X and TanDEM-X, and the ESA Sentinel 1 and 2 missions. He is the author of more than 350 scientific publications, among them about 60 journal papers, and a book on number theory. He has served as a co-organizer of international conferences and workshops, and as guest editor of special issues on IIM of IEEE and other journals.
He received the Best Paper Award, IEEE Geoscience and Remote Sensing Society Prize, in 2006; the National Order of Merit with the rank of Knight, awarded by the President of Romania for outstanding international research results, in 2008; and the Romanian Academy Prize Traian Vuia, for the development of the SAADI image analysis system and activity in image processing, in 1987. He is an IEEE Fellow of the Signal Processing, Computer, and Geoscience and Remote Sensing Societies.

T7: The Open Set Recognition Problem and Its Implications and Opportunities in Visual Computing, Forensics and Security

Anderson Rocha, University of Campinas, Campinas, SP, Brazil
Walter Scheirer, University of Notre Dame, South Bend, IN, USA


Coinciding with the rise of large-scale statistical learning within the visual computing, forensics and security areas, there has been a dramatic improvement in methods for automated image recognition in a myriad of applications, ranging from categorization and object detection to forensics and human biometrics, among many others. Despite this progress, a tremendous gap exists between the performance of automated methods in the laboratory and the performance of those same methods in the field. A major contributing factor is the way in which machine learning algorithms are typically evaluated: without the expectation that a class unknown to the algorithm at training time will be encountered at test time during operational deployment.
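To make the closed-set assumption concrete: a conventional classifier must assign every test sample to one of its training classes, even a sample from a class it has never seen. A minimal illustration, which is not one of the algorithms listed in the outline, is a nearest-class-mean classifier that rejects a sample as "unknown" when it lies farther than a distance threshold from every known class mean. The threshold value, function names and toy data are hypothetical.

```python
import numpy as np

UNKNOWN = -1  # label reserved for samples from classes never seen in training

def fit_class_means(X: np.ndarray, y: np.ndarray) -> dict:
    """Mean feature vector of each training class."""
    return {int(c): X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_open_set(means: dict, X: np.ndarray, threshold: float) -> np.ndarray:
    """Nearest-class-mean prediction with distance-based rejection."""
    labels = np.array(list(means.keys()))
    centers = np.stack(list(means.values()))
    # Euclidean distance from every sample to every class mean (n_samples x n_classes).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    pred = labels[nearest]
    # Open-set step: reject samples far from all known classes.
    pred[d[np.arange(len(X)), nearest] > threshold] = UNKNOWN
    return pred

X_train = np.array([[0.0, 0.0], [0.2, -0.1], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array([0, 0, 1, 1])
means = fit_class_means(X_train, y_train)
X_test = np.array([[0.1, 0.0], [5.0, 5.2], [20.0, -20.0]])
open_pred = predict_open_set(means, X_test, threshold=2.0)
closed_pred = predict_open_set(means, X_test, threshold=np.inf)  # behaves as closed set
```

With an infinite threshold the far-away sample is forced into a known class; with a finite threshold it is labelled unknown. Choosing that threshold in a principled way, rather than by hand, is the kind of question the risk-of-the-unknown and extreme value theory material addresses.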

The purpose of this tutorial is to introduce the ICIP audience to this difficult problem in statistical learning, specifically in the context of visual computing, information forensics and security applications. Examples from other areas will also be given for completeness. A number of different topics will be explored, including supervised machine learning, probabilistic models, kernel machines, statistical extreme value theory, and case studies for applications related to image analysis. The tutorial is composed of four parts, the first three lasting approximately 45 minutes and the last one about 30 minutes, totaling three hours (with a 15-minute break after the first half). This material is broad enough to appeal to a majority of ICIP attendees, including students, researchers, and practitioners. A complete outline follows.


Part 1: An introduction to the open set recognition problem
- General introduction: where do we find open set problems in visual computing, information forensics and security?
- Decision models in machine learning
- Theoretical background: the risk of the unknown
- The compact abating probability model (Scheirer et al. T-PAMI 2014)
- The Open-set Optimum-Path Forest classifier (under review)

Part 2: Algorithms that minimize the risk of the unknown
- Kernel Density Estimation
- 1-Class Support Vector Machines (SVMs)
- Support Vector Data Description
- 1-vs-Set Machine (Scheirer et al. T-PAMI 2013)
- PI-SVM (Jain et al. ECCV 2014)
- W-SVM (Scheirer et al. T-PAMI 2014)
- Decision Boundary Carving (Costa et al. 2014)

15-minute break

Part 3: Case studies related to visual computing and other areas
- Image Classification/Recognition
- Visual Information Retrieval
- Detection problems (e.g., pedestrian, objects)
- Face Recognition
- Scene Analysis for Surveillance
- Source Camera Attribution
- Authorship Attribution

Part 4: Research opportunities and trends
- The open set recognition problem and new feature characterization methods (e.g., deep learning)
- Integrating open set solutions with the image characterization process directly (strongly generalizable image characterization)
- Opportunities for novelty detection and automatic addition of classes (online adaptation)
- Bringing the user into the loop (relevance feedback)
- Final considerations


Anderson de Rezende Rocha is an associate professor at the Institute of Computing, University of Campinas (UNICAMP). He received his B.Sc. (Computer Science) degree from the Federal University of Lavras (UFLA), Brazil, in 2003, and his M.S. and Ph.D. (Computer Science) from the University of Campinas (Unicamp), Brazil, in 2006 and 2009, respectively. His main interests include Reasoning for Complex Data, Digital Forensics and Machine Intelligence. He has served on the program committees of several important events and is an associate editor of leading international journals such as the IEEE Transactions on Information Forensics and Security (T.IFS), Elsevier Journal of Visual Communication and Image Representation (JVCI), the EURASIP/Springer Journal on Image and Video Processing (JIVP) and the IEEE Security & Privacy Magazine. He is an elected affiliate member of the Brazilian Academy of Sciences (ABC) and of the IEEE Information Forensics and Security Technical Committee (IFS-TC). He is a Microsoft Research Faculty Fellow and a member of the Brazilian Academy of Forensic Sciences (ABCF).

Walter J. Scheirer, Ph.D., is an Assistant Professor in the Department of Computer Science and Engineering at the University of Notre Dame. Previously, he was a postdoctoral fellow at Harvard University, with affiliations in the School of Engineering and Applied Sciences, Dept. of Molecular and Cellular Biology and Center for Brain Science, and the director of research & development at Securics, Inc., an early stage company producing innovative computer vision-based solutions. He received his Ph.D. from the University of Colorado and his M.S. and B.A. degrees from Lehigh University. Dr. Scheirer has extensive experience in the areas of computer vision and human biometrics, with an emphasis on advanced learning techniques. His overarching research interest is the fundamental problem of recognition, including the representations and algorithms supporting solutions to it.

T8: Transport and other Lagrangian transforms for image modeling, estimation, and classification

Soheil Kolouri, HRL Laboratories, USA
Gustavo K. Rohde, University of Virginia, USA


This tutorial presents a set of recently developed image analysis techniques, based on the mathematics of optimal transport, that address important problems in sensor data modeling, estimation, and pattern recognition (e.g., classification). These techniques can be interpreted as nonlinear image transforms with well-defined forward (analysis) and inverse (synthesis) operations, with demonstrable advantages over standard linear transforms (Fourier, Wavelet, Radon, Ridgelet, etc.). They have been shown to enable more photo-realistic PCA, LDA and related image models, increased accuracy in image-based cancer detection tasks, and super-resolution image reconstruction. The material presented will be drawn from publications by the presenters, as well as other related publications. The tutorial will include demonstrations using existing, freely available software.


Crash-course in optimal transport - (45 minutes)
    - Monge's and Kantorovich's formulations
    - Brenier's theorem
    - Riemannian geometry and optimal transport
    - Optimal transport in 1D and ND, and optimization methods
Lagrangian transforms for signals and images (90 minutes)
    - Cumulative distribution transform
    - Linear optimal transport
    - Radon-CDT and the Sliced Wasserstein distance
Applications and demos (45 minutes)
    - Image-based modeling
    - Sensor data classification
    - Image reconstruction and super-resolution
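Optimal transport in 1D, covered in the crash course, has a closed form: for two empirical distributions with the same number of equally weighted samples, the optimal (monotone) plan pairs the sorted samples, so W_p = ((1/n) * sum_i |x_(i) - y_(i)|^p)^(1/p). This monotone rearrangement is also what the cumulative distribution transform builds on, and averaging 1D distances over random projections gives a Monte Carlo estimate of the sliced Wasserstein distance. The sketch below assumes equal sample counts; the function names are illustrative and it is not the presenters' software.

```python
import numpy as np

def wasserstein_1d(x: np.ndarray, y: np.ndarray, p: float = 2.0) -> float:
    """p-Wasserstein distance between two equal-size 1D empirical samples."""
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample counts")
    # In 1D the optimal transport plan matches order statistics.
    xs, ys = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))

def sliced_wasserstein(X: np.ndarray, Y: np.ndarray, n_proj: int = 100,
                       p: float = 2.0, seed: int = 0) -> float:
    """Monte Carlo sliced p-Wasserstein distance between equal-size point clouds (n x d)."""
    rng = np.random.default_rng(seed)
    # Random unit directions on the sphere.
    dirs = rng.normal(size=(n_proj, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project both clouds onto each direction and average the 1D costs W_p^p.
    costs = [wasserstein_1d(X @ t, Y @ t, p) ** p for t in dirs]
    return float(np.mean(costs) ** (1.0 / p))

samples = np.array([0.5, -1.0, 2.0, 0.0])
shift_distance = wasserstein_1d(samples, samples + 3.0)  # a pure translation by 3
```

Sorting makes each 1D distance O(n log n), which is why slicing is attractive compared with solving a full transport problem in d dimensions.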


Soheil Kolouri received his B.S. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 2010, and his M.S. degree, also in electrical engineering, in 2012 from Colorado State University, Fort Collins, Colorado. He received his doctorate degree in biomedical engineering from Carnegie Mellon University in 2015, where his research focused on applications of optimal transport in signal and image modeling, computer vision, and pattern recognition. His thesis, titled "Transport-based pattern recognition and image modeling", won the best thesis award from the Biomedical Engineering Department at Carnegie Mellon University. He is currently a postdoctoral research associate in the Biomedical Engineering Department at Carnegie Mellon University.

Gustavo K. Rohde is an associate professor of Biomedical Engineering, and Electrical and Computer Engineering at the University of Virginia, Charlottesville, VA, USA. He has authored over 60 peer reviewed publications and is currently serving as an associate editor for IEEE Transactions on Image Processing, BMC Bioinformatics, and IEEE Journal of Biomedical and Health Informatics, in addition to being on the editorial board of Cytometry A. His research and teaching interests include predictive modeling in medicine and biology, cytometry, signal and image processing, computer vision, machine learning, and mobile and remote sensing.