Story-Driven Summarization for Egocentric Video
Zheng Lu and Kristen Grauman
Abstract
As cameras and media storage continue to grow, the authors note, video recording will become more and more ubiquitous. With the volume and length of recorded video ever increasing, it becomes impossible for a person to watch every detail from start to end, and techniques for analyzing and summarizing these videos therefore matter more than ever.
This paper targets the summarization of egocentric videos captured by wearable devices or, as the authors note, robots: given a long video, it produces a shortened clip without losing much information or context. Previous work on this problem often focuses on selecting high-quality subshots while paying little attention to inter-shot relationships. As a result, it sometimes loses the context of how one shot transitions to the next and often includes too many redundant subshots.
This paper therefore focuses on "telling the story" of the video by selecting the chain of subshots that maximizes a three-part objective function consisting of story, importance, and diversity. Of the three parts, story is the essence of the paper: it uses the relationships between objects to measure how closely subshots are related. Briefly, objects are first detected in the subshots, and a directed bipartite graph is constructed connecting subshots and objects, with edge weights denoting the probability of an object given a subshot or vice versa. A random walk on this graph then gives the closeness of a pair of subshots, based on the intuitive assumption that if two subshots A and B are highly correlated, walks starting from A are highly likely to end at B.
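The random-walk idea above can be illustrated with a small sketch. It is a toy under stated assumptions, not the authors' implementation: the `counts` matrix of object detections per subshot, the restart probability, and all function names are hypothetical, and the walk with restart is only one common way to turn such a graph into closeness scores.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def subshot_transition(counts):
    """Two-step transition subshot -> object -> subshot.

    counts[s][o] is a hypothetical detection count of object o in subshot s.
    """
    p_obj_given_shot = row_normalize(counts)             # P(object | subshot)
    p_shot_given_obj = row_normalize(transpose(counts))  # P(subshot | object)
    n, m = len(counts), len(counts[0])
    return [
        [sum(p_obj_given_shot[a][o] * p_shot_given_obj[o][b] for o in range(m))
         for b in range(n)]
        for a in range(n)
    ]


def walk_with_restart(transition, start, restart=0.15, steps=50):
    """Approximate where walks that start (and restart) at `start` end up."""
    n = len(transition)
    dist = [1.0 if i == start else 0.0 for i in range(n)]
    for _ in range(steps):
        moved = [sum(dist[a] * transition[a][b] for a in range(n))
                 for b in range(n)]
        dist = [(1 - restart) * v + (restart if i == start else 0.0)
                for i, v in enumerate(moved)]
    return dist


# Toy data: subshots 0 and 1 share objects, subshot 2 shares none with them.
counts = [
    [2, 1, 0],
    [1, 2, 0],
    [0, 0, 3],
]
closeness_from_0 = walk_with_restart(subshot_transition(counts), start=0)
```

In this toy example, walks starting at subshot 0 put more mass on subshot 1 than on subshot 2, which matches the intuition that subshots sharing objects are closely related.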
Combining the story term with the other two, the best chain of subshots can be retrieved. In experiments, the authors show that subjects preferred their summaries over those of three baselines in a blind test in which each subject was shown summaries produced by the different methods.
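To make the chain selection concrete, here is a minimal sketch of picking the best chain under a three-part score. Everything specific in it is an assumption for illustration: the way each term is computed (story as the weakest link between adjacent subshots, importance as a mean, diversity as a mean dissimilarity), the equal weights, the toy numbers, and the brute-force search are not taken from the paper.

```python
from itertools import combinations


def chain_score(chain, closeness, importance, dissimilarity,
                weights=(1.0, 1.0, 1.0)):
    """Score a chain of subshot indices (at least two, in temporal order)."""
    pairs = list(zip(chain, chain[1:]))
    # Assumed story term: the chain is only as coherent as its weakest link.
    story = min(closeness[a][b] for a, b in pairs)
    # Assumed importance term: mean per-subshot importance.
    imp = sum(importance[s] for s in chain) / len(chain)
    # Assumed diversity term: mean dissimilarity of adjacent subshots.
    div = sum(dissimilarity[a][b] for a, b in pairs) / len(pairs)
    w_story, w_imp, w_div = weights
    return w_story * story + w_imp * imp + w_div * div


def best_chain(k, closeness, importance, dissimilarity):
    """Exhaustively search all k-subshot chains that keep temporal order."""
    n = len(importance)
    return max(
        combinations(range(n), k),
        key=lambda c: chain_score(c, closeness, importance, dissimilarity),
    )


# Hypothetical toy inputs for four subshots.
closeness = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.2, 0.8],
    [0.1, 0.2, 0.0, 0.3],
    [0.1, 0.8, 0.3, 0.0],
]
importance = [0.5, 0.9, 0.1, 0.8]
dissimilarity = [[0.0 if a == b else 0.5 for b in range(4)] for a in range(4)]

summary = best_chain(3, closeness, importance, dissimilarity)
```

With these toy numbers the search returns the chain `(0, 1, 3)`, skipping the weakly connected, low-importance subshot 2. Exhaustive search grows combinatorially with video length, so it is only practical for small examples like this one.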
Contributions
- Inspired by previous work on summarizing news from text, the authors successfully adapt the concept to egocentric video summarization.
- Proposed an objective function that accounts for how well a generated summary clip preserves the context of the video.
- Proposed a segmentation method tailored to egocentric videos, which deals with the lack of the sharp shot boundaries that are often used to obtain subshots.