Research Mentor(s)
Scott Wehrwein
Description
Long-term video streams, such as those available from live-streaming webcams, provide a unique view into scene changes that unfold over multiple timescales. While timelapses and other non-learned techniques can show some of these changes, they struggle to represent changes occurring at different timescales simultaneously. In our approach, we represent each frame of a video as a latent vector produced by a learned model. We focus on two major directions: (1) learning models that encode frames to and decode frames from a latent vector, and (2) learning a model that generates frames directly from latent vectors conditioned on user-specified properties. In both cases, we aim to enforce that the latent vector represents timescale-related content in a separable fashion, either directly, as in case (2), or indirectly via a novel loss function, as in case (1). The ability to analyze and manipulate these latent representations has the potential to provide insights into long-term video beyond what non-learned approaches can reveal. We refer to this process of separating out timescale-related content through learned latent-space representations as “timescale disentanglement.”
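To make direction (1) more concrete, below is a minimal sketch, assuming a PyTorch autoencoder whose latent vector is split into hypothetical "fast" and "slow" partitions; the partition sizes, network architecture, and the placeholder disentanglement penalty are illustrative assumptions only, not the project's actual model or its novel loss function.

# Hypothetical sketch of an autoencoder with a latent vector split into
# per-timescale partitions (e.g., diurnal vs. seasonal content). The
# disentanglement penalty below is a stand-in for illustration.
import torch
import torch.nn as nn

class TimescaleAutoencoder(nn.Module):
    def __init__(self, latent_dims=(32, 32)):  # (fast, slow) partition sizes -- assumed
        super().__init__()
        self.latent_dims = list(latent_dims)
        total = sum(latent_dims)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64x64 -> 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, total),
        )
        self.decoder = nn.Sequential(
            nn.Linear(total, 64 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        z_fast, z_slow = torch.split(z, self.latent_dims, dim=1)
        recon = self.decoder(z)
        return recon, z_fast, z_slow

def disentanglement_penalty(z_slow_a, z_slow_b):
    # Illustrative assumption: frames sampled close in time should share a
    # similar "slow" code, so penalize differences between their slow partitions.
    return (z_slow_a - z_slow_b).pow(2).mean()

# Usage with random stand-in frames (64x64 RGB); real inputs would be webcam frames.
model = TimescaleAutoencoder()
frame_a = torch.rand(8, 3, 64, 64)  # batch of frames
frame_b = torch.rand(8, 3, 64, 64)  # frames nearby in time
recon_a, z_fast_a, z_slow_a = model(frame_a)
recon_b, z_fast_b, z_slow_b = model(frame_b)
loss = nn.functional.mse_loss(recon_a, frame_a) + disentanglement_penalty(z_slow_a, z_slow_b)
loss.backward()

Direction (2) would instead condition a generator on the timescale-related properties directly, so no auxiliary penalty of this kind would be needed.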
Document Type
Event
Start Date
May 2022
End Date
May 2022
Location
Carver Gym (Bellingham, Wash.)
Department
CSE - Computer Science
Genre/Form
student projects; posters
Type
Image
Rights
Copying of this document in whole or in part is allowable only for scholarly purposes. It is understood, however, that any copying or publication of this document for commercial purposes, or for financial gain, shall not be allowed without the author’s written permission.
Language
English
Format
application/pdf
Timescale Disentanglement