International Conference on Computer Vision and Pattern Recognition (CVPR), 2019
Abstract: Metric learning networks are used to compute image embeddings, which are widely used in many applications such as image retrieval and face recognition. For many real-world applications, the networks used to compute embeddings must be highly efficient, and therefore these applications cannot take advantage of the latest state-of-the-art deep networks. In this paper we study network distillation to efficiently compute image embeddings with small networks. Network distillation has been successfully used to improve image classification, but has hardly been explored for metric learning. To do so, we propose two new loss functions that model the communication of a deep teacher network to a small student network. We evaluate our system on several datasets, including CUB-200-2011 and Cars-196, and show that embeddings computed using small student networks perform significantly better than those computed using standard networks of similar size. Results on a very compact network (MobileNet-0.25), which can be used on mobile devices, show that the proposed method can greatly improve Recall@1 results from 27.5% to 44.6%. Furthermore, we investigate various aspects of distillation for embeddings, including hint and attention layers, semi-supervised learning and cross-quality distillation.
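The abstract does not define the two loss functions, so the following is only a minimal sketch of two plausible ways a teacher can supervise a student embedding: matching each embedding directly, and matching the distances between pairs of embeddings. The function names and the choice of Euclidean distance are assumptions made for illustration, not the paper's definitions.

```python
import numpy as np

def pairwise_distances(emb):
    """Euclidean distance between every pair of rows in emb (shape n x d)."""
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def absolute_distillation_loss(student_emb, teacher_emb):
    """Hypothetical 'absolute' loss: mean distance between the student and
    teacher embeddings of the same image (requires equal dimensions)."""
    return np.linalg.norm(student_emb - teacher_emb, axis=1).mean()

def relative_distillation_loss(student_emb, teacher_emb):
    """Hypothetical 'relative' loss: mean absolute difference between the
    student's and the teacher's pairwise distances within a batch."""
    d_student = pairwise_distances(student_emb)
    d_teacher = pairwise_distances(teacher_emb)
    upper = np.triu_indices(len(student_emb), k=1)  # each unordered pair once
    return np.abs(d_student[upper] - d_teacher[upper]).mean()
```

In this sketch, a student whose embeddings are a shifted copy of the teacher's incurs a non-zero absolute loss but a relative loss of approximately zero, since the distances between pairs are preserved.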
Thirty-second Conference on Neural Information Processing Systems (NIPS), 2018
Abstract: Previous works on sequential learning address the problem of forgetting in discriminative models. In this paper we consider the case of generative models. In particular, we investigate generative adversarial networks (GANs) in the task of learning new categories in a sequential fashion. We first show that sequential fine-tuning renders the network unable to properly generate images from previous categories (i.e., forgetting). Addressing this problem, we propose Memory Replay GANs (MeRGANs), a conditional GAN framework that integrates a memory replay generator. We study two methods to prevent forgetting by leveraging these replays, namely joint training with replay and replay alignment. Qualitative and quantitative experimental results on the MNIST, SVHN and LSUN datasets show that our memory replay approach can generate competitive images while significantly mitigating the forgetting of previous categories.
International Conference on Pattern Recognition (ICPR), 2018
Abstract: In this paper we propose an approach to avoiding catastrophic forgetting in sequential task learning scenarios. Our technique is based on a network reparameterization that approximately diagonalizes the Fisher Information Matrix of the network parameters. This reparameterization takes the form of a factorized rotation of parameter space which, when used in conjunction with Elastic Weight Consolidation (which assumes a diagonal Fisher Information Matrix), leads to significantly better performance on lifelong learning of sequential tasks. Experimental results on the MNIST, CIFAR-100, CUB-200 and Stanford-40 datasets demonstrate that we significantly improve the results of standard elastic weight consolidation, and that we obtain competitive results when compared to other state-of-the-art methods for lifelong learning without forgetting.
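A toy illustration of why a rotation can help: Elastic Weight Consolidation penalizes parameter changes using only the diagonal of the Fisher Information Matrix, which is exact only when the off-diagonal terms vanish. The sketch below uses a hand-picked 2x2 Fisher matrix and a full eigendecomposition as the rotation. Both are assumptions for illustration; the paper's factorized rotation is not reproduced here.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """EWC-style penalty that uses only the diagonal of the Fisher matrix."""
    return 0.5 * lam * np.sum(fisher_diag * (theta - theta_star) ** 2)

def full_quadratic_penalty(theta, theta_star, fisher, lam=1.0):
    """Reference penalty that uses the full Fisher matrix."""
    delta = theta - theta_star
    return 0.5 * lam * delta @ fisher @ delta

# Hypothetical Fisher matrix with strong off-diagonal terms.
fisher = np.array([[2.0, 1.5],
                   [1.5, 2.0]])
theta_star = np.array([0.0, 0.0])   # parameters after the previous task
theta = np.array([1.0, -1.0])       # candidate parameters for the new task

# Rotate parameter space into the eigenbasis, where the Fisher is diagonal.
eigvals, eigvecs = np.linalg.eigh(fisher)
rotate = eigvecs.T

exact = full_quadratic_penalty(theta, theta_star, fisher)
diag_original = ewc_penalty(theta, theta_star, np.diag(fisher))
diag_rotated = ewc_penalty(rotate @ theta, rotate @ theta_star, eigvals)
```

Here `diag_rotated` matches `exact`, while `diag_original` overestimates the penalty because it ignores the off-diagonal terms.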
International Conference on Computer Vision and Pattern Recognition (CVPR), 2018
Abstract: We propose a novel crowd counting approach that leverages abundantly available unlabeled crowd imagery in a learning-to-rank framework. To induce a ranking of cropped images, we use the observation that any sub-image of a crowded scene image is guaranteed to contain the same number or fewer persons than the super-image. This allows us to address the problem of limited size of existing datasets for crowd counting. We collect two crowd scene datasets from Google using keyword searches and query-by-example image retrieval, respectively. We demonstrate how to efficiently learn from these unlabeled datasets by incorporating learning-to-rank in a multi-task network which simultaneously ranks images and estimates crowd density maps. Experiments on two of the most challenging crowd counting datasets show that our approach obtains state-of-the-art results.
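The sub-image observation can be sketched as follows: build a sequence of nested crops, then penalize a model whenever it predicts a higher count for a crop than for the image that contains it. The center-crop scheme, the scale factor and the function names are assumptions for illustration, not the cropping procedure used in the paper.

```python
import numpy as np

def nested_crops(image, num_crops=3, scale=0.75):
    """Return center crops of decreasing size; each lies inside the previous one."""
    crops = [image]
    for _ in range(num_crops - 1):
        h, w = crops[-1].shape[:2]
        new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
        top, left = (h - new_h) // 2, (w - new_w) // 2
        crops.append(crops[-1][top:top + new_h, left:left + new_w])
    return crops

def ranking_hinge_loss(count_larger, count_smaller, margin=0.0):
    """Zero when the containing image's predicted count is at least the
    sub-image's predicted count plus the margin; positive otherwise."""
    return max(0.0, count_smaller - count_larger + margin)
```

No count labels are needed to apply this loss: only the containment order of the crops is used, which is what lets unlabeled images contribute to training.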
International Conference on Computer Vision (ICCV), 2017
Abstract: We propose a no-reference image quality assessment (NR-IQA) approach that learns from rankings (RankIQA). To address the problem of limited IQA dataset size, we train a Siamese Network to rank images in terms of image quality by using synthetically generated distortions for which relative image quality is known. These ranked image sets can be automatically generated without laborious human labeling. We then use fine-tuning to transfer the knowledge represented in the trained Siamese Network to a traditional CNN that estimates absolute image quality from single images. We demonstrate how our approach can be made significantly more efficient than traditional Siamese Networks by forward propagating a batch of images through a single network and backpropagating gradients derived from all pairs of images in the batch. Experiments on the TID2013 benchmark show that we improve the state-of-the-art by over 5%. Furthermore, on the LIVE benchmark we show that our approach is superior to existing NR-IQA techniques and that we even outperform the state-of-the-art in full-reference IQA (FR-IQA) methods without having to resort to high-quality reference images to infer IQA.
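The efficiency idea, one forward pass per image with a loss over all pairs in the batch, can be sketched in NumPy as below. The hinge form, the margin, and the convention that a higher score means higher quality are assumptions for illustration; the gradient is written out by hand to show that every pair contributes to it without a second network branch.

```python
import numpy as np

def all_pairs_ranking_loss(scores, quality_rank, margin=1.0):
    """Mean hinge loss over all pairs (i, j) where image i is known to have
    higher quality than image j, plus its gradient w.r.t. the scores.

    scores:       (n,) outputs from one forward pass over the batch
    quality_rank: (n,) known relative quality, higher = better (assumed)
    """
    diff = scores[:, None] - scores[None, :]            # diff[i, j] = s_i - s_j
    better = quality_rank[:, None] > quality_rank[None, :]
    n_pairs = better.sum()
    if n_pairs == 0:
        return 0.0, np.zeros_like(scores)
    hinge = np.maximum(0.0, margin - diff) * better
    loss = hinge.sum() / n_pairs
    active = (hinge > 0).astype(float)                  # pairs violating the margin
    # Each active pair pushes s_i up (gradient -1) and s_j down (gradient +1).
    grad = (active.sum(axis=0) - active.sum(axis=1)) / n_pairs
    return loss, grad
```

With n images in a batch, this evaluates the network n times while still using all n(n-1)/2 pairs, whereas a two-branch Siamese setup would forward each pair separately.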
Transactions on Pattern Analysis and Machine Intelligence, 2019
Abstract: For many applications the collection of labeled data is expensive and laborious. Exploitation of unlabeled data during training is thus a long-pursued objective of machine learning. Self-supervised learning addresses this by positing an auxiliary task (different, but related to the supervised task) for which data is abundantly available. In this paper, we show how ranking can be used as a proxy task for some regression problems. As another contribution, we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. We apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results for both IQA and crowd counting. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning and we show that this reduces labeling effort by up to 50%.
Advisors: Joost van de Weijer and Andrew D. Bagdanov
Abstract: In this thesis we present a no-reference image quality assessment (NR-IQA) approach based on deep Siamese networks. One of the major challenges in applying deep learning techniques to the problem of image quality assessment is the absence of large datasets. To address this problem, we train our Siamese Network to rank images in terms of image quality by using ranked image sets for which relative image quality is known. These ranked image sets can be automatically generated without the use of laborious human labelling. We then use fine-tuning to transfer the knowledge represented by the trained Siamese Network to a traditional CNN that is able to estimate absolute image quality from single images. To solve the difficulty of pair selection for Siamese network training, we demonstrate how our approach can be made significantly more efficient than traditional Siamese Networks by forward propagating a batch of images through a single network and backpropagating gradients derived from all pairs of images in the batch. We evaluate our approach on the LIVE dataset, where it is shown to be superior to existing NR-IQA techniques. Furthermore, ours is the first NR-IQA method to surpass the state-of-the-art full-reference IQA (FR-IQA) methods. Experiments on the TID2008 and Places2 datasets show the generalization ability of our approach.
Collaborate within the Learning and Machine Perception (LAMP) group on different research projects.
IEEE Transactions on Image Processing (T-IP), IEEE Transactions on Multimedia (T-MM).
I received my B.Sc. and M.Sc. degrees in Information Engineering and Control Engineering from the Northwestern Polytechnical University (NWPU), China, in 2013 and 2016, respectively. I received my second M.Sc. degree in Computer Vision from the Universitat Autònoma de Barcelona (UAB), Barcelona, in 2016. Since 2016, I have been pursuing a Ph.D. degree under the supervision of Dr. Joost van de Weijer and Dr. Andrew D. Bagdanov. My main research interests include Deep Neural Networks, Object Detection, Image Quality Assessment, Crowd Counting, GANs and Lifelong Learning.