Can robots perceive time like humans? A new study uses reinforcement learning to give robots time awareness

Big Data Digest Column

Author: Christopher Dossman

Compiled by: VICKY, Joey, Yunzhou

Hello everyone, this week's AI Scholar Weekly is here to meet you again!

AI Scholar Weekly is an academic column in the field of AI, dedicated to bringing you the latest, most comprehensive, and most in-depth overviews of AI research, covering cutting-edge academic AI news every week.

Updated every week: this one article is enough to keep up with AI research!

Keywords of the Week: Video Understanding, Time Perception, BERT

Best Academic Studies of the Week

KnowIT VQA: Answering Knowledge-Based Questions about Videos

In this article, the researchers propose a new video understanding task that combines knowledge with video question answering.

First, they introduce a video dataset called KnowIT (Knowledge Informed Temporal) VQA. The dataset is derived from the TV series The Big Bang Theory and contains numerous quiz-style questions. KnowIT has more than 24,000 human-generated question-answer pairs and combines visual, textual, and temporal reasoning with knowledge-based questions. Second, they propose a video understanding model that combines the visual and textual content of a video with show-specific knowledge.

They found that:

incorporating knowledge brings significant improvements to video QA; and performance on KnowIT VQA still lags behind human accuracy, suggesting that studying the limitations of current video modeling techniques would be useful.


Their work demonstrates the great potential of knowledge-based models for video understanding problems; such models will play a significant role in combining advances in natural language processing (NLP) and image understanding.

The framework shows that both video comprehension and knowledge-based reasoning are necessary to answer the questions. It retrieves and fuses information from the language and spatio-temporal video domains in order to reason about each question and predict the correct answer.

However, there is still a large gap compared to human performance. The researchers hope the dataset will help develop more robust models in this field.

Original paper:

https://arxiv.org/abs/1910.10706v3

Using reinforcement learning to teach robots to perceive time

It is well known that the brains of humans and animals have dedicated areas responsible for time cognition, whereas robots perform tasks based on algorithms that treat time as an external entity (such as a clock). Is it possible to take biologically inspired time-perception mechanisms and reproduce them in robots?

In this work, the researchers examine the timing mechanisms the brain uses for the perception of time. They use Bayesian inference to estimate the passage of time from data, and use temporal-difference (TD) learning over feature representations to train agents to successfully complete time-related tasks. By choosing features that represent time, they show that they can give agents a perception of elapsed time similar to that experienced by humans and animals.

The main contributions of this paper:

a modeling method for collecting environmental data from robot sensors; a demonstration that, under certain assumptions, correct time estimates can be obtained from that data; a successful application of time-cognition mechanisms to reinforcement learning problems; and a replication of animal behavior in time-related tasks.
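The TD-learning idea described above can be illustrated with a minimal sketch (our simplification, not the paper's exact method): states are elapsed time steps in an interval-timing task, reward arrives only when the interval ends, and the learned values rise as the rewarded moment approaches, giving the agent an implicit sense of how much time has passed.

```python
def td_time_values(n_steps=10, episodes=500, alpha=0.1, gamma=0.9):
    """Learn a value for each elapsed time step with TD(0) updates.

    Reward of 1.0 arrives only at the last step of the interval, so the
    value function encodes "how close am I to the interval elapsing".
    """
    values = [0.0] * (n_steps + 1)  # one value per elapsed step
    for _ in range(episodes):
        for t in range(n_steps):
            reward = 1.0 if t == n_steps - 1 else 0.0
            td_target = reward + gamma * values[t + 1]
            values[t] += alpha * (td_target - values[t])
    return values

vals = td_time_values()
# Values increase monotonically with elapsed time, so the agent can
# read off a rough estimate of time passed from its value function.
```

The hyperparameters (`alpha`, `gamma`, interval length) are placeholders for illustration only.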

This work presents a procedure for providing temporal cognition to agents. For robots, the perception of time allows them to learn to adapt dialogue to different environments and characters, as humans do. The framework is proposed for implementation on real robots in the future.

Original paper:

https://arxiv.org/abs/1912.10113

ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

In this article, researchers at Google AI designed A Lite BERT (ALBERT) architecture with far fewer parameters than traditional BERT. An ALBERT configuration comparable to BERT-large has 18 times fewer parameters and trains about 1.7 times faster.

ALBERT integrates two parameter-reduction techniques: the first is factorized embedding parameterization; the second is cross-layer parameter sharing, which prevents the parameter count from growing with the depth of the network. Together these techniques greatly reduce the number of parameters in BERT without seriously hurting performance, thereby improving parameter efficiency.
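The effect of factorized embedding parameterization is easy to see with a quick parameter count (our arithmetic, using illustrative sizes): instead of one vocab_size x hidden_size embedding table, tokens are first mapped to a small embedding of size E and then projected from E up to the hidden size H.

```python
def embedding_params(vocab_size, hidden_size, embed_size=None):
    """Parameter count of the token-embedding block.

    BERT ties embedding size to hidden size: V x H parameters.
    ALBERT factorizes it into two smaller matrices: V x E + E x H.
    """
    if embed_size is None:  # BERT-style: one big V x H table
        return vocab_size * hidden_size
    return vocab_size * embed_size + embed_size * hidden_size

V, H, E = 30000, 1024, 128  # illustrative sizes, not official configs
bert_style = embedding_params(V, H)       # 30,720,000 parameters
albert_style = embedding_params(V, H, E)  # 3,971,072 parameters
```

With these (assumed) sizes the factorization cuts the embedding block by roughly 7.7x, and the saving grows as the hidden size increases.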


Parameter-reduction techniques can also act as a form of regularization that stabilizes training and aids generalization.

To further improve ALBERT's performance, the researchers also introduced a self-supervised loss for sentence-order prediction. As a result, they were able to scale to larger ALBERT configurations that still have fewer parameters than BERT-large but deliver significantly better performance, establishing new state-of-the-art results for natural language understanding on the GLUE, SQuAD, and RACE benchmarks.
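The sentence-order prediction (SOP) objective can be sketched as follows (our illustration, not Google's implementation): a positive example is two consecutive text segments in their original order, and a negative example is the same two segments with their order swapped, so the model must learn inter-sentence coherence rather than mere topic similarity.

```python
import random

def make_sop_example(seg_a, seg_b, rng):
    """Build one SOP training pair from two consecutive segments.

    Returns ((first, second), label) where label 1 means the original
    order was kept and label 0 means the segments were swapped.
    """
    if rng.random() < 0.5:
        return (seg_a, seg_b), 1  # consecutive segments, in order
    return (seg_b, seg_a), 0      # same two segments, order swapped

rng = random.Random(0)
pair, label = make_sop_example("He sat down.", "Then he began to read.", rng)
```

Because both classes use the same pair of segments, topic cues alone cannot solve the task; the classifier head that consumes these pairs is left out of this sketch.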

ALBERT demonstrates the importance of identifying which aspects of a model give rise to strong contextual representations.

By focusing on improving these aspects of the model architecture, this study shows that model efficiency and performance can be greatly improved across a wide range of NLP tasks. To facilitate further progress in NLP, the researchers have open-sourced ALBERT to the research community.

Code and pre-trained models:

https://github.com/google-research/google-research/tree/master/albert

Original post:

https://ai.googleblog.com/2019/12/albert-lite-bert-for-self-supervised.html

Ordered or Orderless? Revisiting Video-Based Person Re-Identification

Video-based person re-identification has been a hot research direction in computer vision in recent years, because it can achieve better recognition results by making full use of both spatial and temporal information.

In this paper, the researchers propose a simple but surprisingly effective approach to video-based person re-identification (VPRe-id) that treats VPRe-id as, in effect, an orderless set of image-based person re-identification problems.


Specifically, the researchers split a video into individual images, identify and rank the people appearing in those images, and then aggregate the results. Working under the i.i.d. assumption, they provide an error bound that clarifies ways to improve VPRe-id.

This work also presents a promising way to bridge the gap between video-based and image-based person re-identification results. The researchers' evaluation demonstrates that their approach achieves state-of-the-art performance across multiple datasets, including iLIDS-VID, PRID 2011, and MARS.

Video-based person re-identification is very important because of its wide range of applications in visual surveillance and forensics. This work proposes a simple and powerful solution by treating VPRe-id as an orderless ensemble-ranking task, in which each base ranking corresponds to a single person identity.

The solution learns an orderless representation by pooling the feature representations produced by an RNN at different time steps, which the researchers argue matters more for VPRe-id than the frame order itself. The results also show that VPRe-id can be approached from quite different angles.
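The orderless view can be illustrated with a minimal sketch (our illustration, not the paper's code): per-frame feature vectors are average-pooled into a single descriptor, so shuffling the frames leaves the final person representation unchanged.

```python
def pooled_descriptor(frame_features):
    """Average-pool per-frame feature vectors into one order-invariant
    descriptor; the result is identical for any permutation of frames."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]

# Three frames with toy 2-D features (real features would come from a CNN)
clip = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
shuffled = [clip[2], clip[0], clip[1]]
# pooled_descriptor(clip) equals pooled_descriptor(shuffled)
```

Average pooling is only one choice of orderless aggregator; the key design point is that the aggregation is a symmetric function of the frames.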