Malicious Action Monitoring Technology By Coupling Deep Learnings Of Image Recognition And Natural Language Processing

Year

2021

Author(s)

Kazuyuki Demachi - University of Tokyo

Abstract

In recent years, deep learning has been very successful in various fields. Because image data contains a great deal of information, deep learning is expected to be applied to nuclear security technology for detecting malicious actions. However, most deep-learning-based human action recognition only assigns a predefined OK/NO label to each combination of object identification results, which makes it difficult to apply to complicated nuclear security situations. In most nuclear facilities, on the other hand, the nuclear security rules are specified in precise documents, so it is desirable to detect malicious actions in captured images according to these rule documents. If it were possible to determine flexibly whether a captured scene violates or complies with the nuclear security rules, the practicality of such monitoring would improve greatly. This requires an interface between two different kinds of deep learning: image recognition and natural language processing. Most current deep learning research is confined to a single field, with no mutual access, that is, no interface, between fields. The human brain, by contrast, routinely links and processes all kinds of cognitive information, including language, sight, speech, and touch. As in the brain, new capabilities should emerge when different types of deep learning are interconnected. One reason this is difficult is that different types of deep learning share no common data format. In this study, the graph structure was determined to be the most suitable common data format. A graph structure expresses the relationships between persons and objects by connecting nodes with edges. In this research, an interface method between deep learning for image recognition and deep learning for natural language processing was developed via a graph structure. This allows malicious actions to be judged flexibly according to the rule document.
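To make the idea of a graph as a common data format concrete, the following is a minimal sketch, not the method developed in this study. It assumes that both the image-recognition side and the rule-document side can be reduced to (subject, relation, object) edges; the `Edge` type, the `violations` function, and all node and relation names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One graph edge: two nodes (subject, obj) joined by a labeled relation."""
    subject: str
    relation: str
    obj: str


def violations(scene_edges, prohibited_edges):
    """Return the scene edges that exactly match a prohibited rule edge."""
    prohibited = set(prohibited_edges)
    return [edge for edge in scene_edges if edge in prohibited]


# Assumed output of an image-recognition model for one captured frame.
scene = [
    Edge("person", "holds", "camera"),
    Edge("person", "stands_near", "control_panel"),
]

# Assumed output of a language model that parsed a security rule document.
rules = [
    Edge("person", "holds", "camera"),
    Edge("person", "opens", "storage_cask"),
]

flagged = violations(scene, rules)
print(flagged)
```

Because both sides share one representation, changing the rule document changes only the `rules` list; the image-recognition side does not need to be retrained with new OK/NO labels. A practical system would presumably need softer matching than exact equality, which this sketch does not attempt.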