Cross-modal retrieval based on deep regularized hashing constraints

Abstract

Cross-modal retrieval has attracted considerable attention in recent years due to the growing demand to handle tremendous amounts of multimodal data. Such retrieval can be either text-to-image or image-to-text. To address the problem of irrelevant information between images and texts, we propose two cross-modal retrieval techniques built on a dual-branch neural network defined over a common subspace and on hash learning. First, we present a cross-modal retrieval technique based on a multilabel information deep ranking model (MIDRM). In this method, we introduce a triplet-loss function into the dual-branch neural network. This function exploits the semantic information of both modalities, accounting not only for the similarity between matching image and text features but also for the distance between non-matching images and texts. Second, we propose a new cross-modal hashing technique called the deep regularized hashing constraint (DRHC). In this method, a regularization function replaces the binary constraint, and the discrete values are confined to a fixed numerical range so that the network can be trained end to end. As a result, the time complexity is greatly reduced, and the required storage space is also much smaller. Experiments on two widely used data sets demonstrate that the proposed MIDRM and DRHC models outperform state-of-the-art methods. The results also show that our approach improves the mean average precision of cross-modal retrieval.
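
The sketch below is a minimal, illustrative example (not the authors' released code) of the two ideas the abstract describes: a triplet ranking loss over image/text embeddings produced by a dual-branch network, and a regularization term that relaxes the binary hash constraint so the network can be trained end to end. All layer sizes, the margin, and the regularization weight are assumed values chosen for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBranchHashNet(nn.Module):
    """Maps image and text features into a shared continuous code space."""
    def __init__(self, img_dim=4096, txt_dim=300, code_len=64):
        super().__init__()
        self.img_branch = nn.Sequential(nn.Linear(img_dim, 1024), nn.ReLU(),
                                        nn.Linear(1024, code_len))
        self.txt_branch = nn.Sequential(nn.Linear(txt_dim, 1024), nn.ReLU(),
                                        nn.Linear(1024, code_len))

    def forward(self, img_feat, txt_feat):
        # tanh keeps the continuous codes inside (-1, 1), standing in for
        # the discrete {-1, +1} binary constraint during training.
        return (torch.tanh(self.img_branch(img_feat)),
                torch.tanh(self.txt_branch(txt_feat)))

def triplet_ranking_loss(anchor, positive, negative, margin=0.5):
    """Pull matching image/text codes together, push non-matching ones apart."""
    pos_dist = F.pairwise_distance(anchor, positive)
    neg_dist = F.pairwise_distance(anchor, negative)
    return F.relu(pos_dist - neg_dist + margin).mean()

def quantization_regularizer(codes):
    """Penalize continuous codes that stray far from the binary values +/-1."""
    return (codes.abs() - 1.0).pow(2).mean()

# Toy usage with random features standing in for real image/text descriptors.
net = DualBranchHashNet()
img = torch.randn(8, 4096)       # anchor images
txt_pos = torch.randn(8, 300)    # matching texts
txt_neg = torch.randn(8, 300)    # non-matching texts
img_code, pos_code = net(img, txt_pos)
_, neg_code = net(img, txt_neg)
loss = (triplet_ranking_loss(img_code, pos_code, neg_code)
        + 0.1 * (quantization_regularizer(img_code)
                 + quantization_regularizer(pos_code)))
# At retrieval time, the continuous codes would be binarized once,
# e.g. torch.sign(img_code), and compared by Hamming distance.
```

In this reading, the quantization regularizer plays the role the abstract assigns to the regularization function: it keeps the relaxed codes near binary values without imposing a hard discrete constraint, which is what allows ordinary gradient-based, end-to-end training.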

Publication
International Journal of Intelligent Systems