Content-based image retrieval and speech enhancement system using deep learning structure

Date

2017-05

Abstract

Deep learning has recently attracted considerable attention in image processing and signal processing. It shows great potential for reducing the dimensionality of high-dimensional data while abstracting the key information they contain. This characteristic makes deep learning very powerful for content-based image retrieval (CBIR) and speech enhancement (SE), because both need high-quality, low-dimensional semantic features. By using the code layer of a deep autoencoder (DAE), a fully connected deep learning model, CBIR and SE systems can achieve decent results. For CBIR, our newly designed multiple-input multiple-task DAE (MIMT-DAE), which takes wavelet coefficients as input, achieves better performance than the single-input single-task DAE even while using fewer trainable parameters. For image processing, however, the fully connected structure shows limitations, so this dissertation proposes a locally connected structure, the convolutional neural network (CNN), and a hybrid structure. Working as a preprocessing stage for the autoencoder, the CNN provides better input features than the raw images because of its locally connected weights. The hybrid structure boosts retrieval performance substantially in both grayscale and color image retrieval. For the SE system, the fully connected DAE trained only on the mask approximation (MA) function does not deliver the desired performance. We design a multiple-task structure that adds a signal approximation (SA) function during training to reduce false positives. Training on both cost functions simultaneously gives much better performance than training only on the MA function, or even than the latest method, which is fine-tuned on the SA function. We also explore the long short-term memory (LSTM) structure and propose it as future work.
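
The joint MA and SA training described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name, the mean-squared-error form of both terms, and the weighting parameter alpha are assumptions made here for illustration, since the abstract does not state how the two cost functions are combined.

```python
import numpy as np

def joint_ma_sa_loss(pred_mask, ideal_mask, noisy_mag, clean_mag, alpha=0.5):
    """Weighted sum of a mask-approximation and a signal-approximation error.

    alpha is a hypothetical weighting between the two terms (assumption).
    Returns (total, ma, sa).
    """
    # MA term: how far the predicted mask is from the target mask.
    ma = np.mean((pred_mask - ideal_mask) ** 2)
    # SA term: how far the masked noisy spectrum is from the clean spectrum.
    sa = np.mean((pred_mask * noisy_mag - clean_mag) ** 2)
    return alpha * ma + (1.0 - alpha) * sa, ma, sa

# Toy magnitude spectrogram (2 frames x 2 frequency bins), made-up values.
noisy_mag = np.array([[2.0, 4.0], [1.0, 3.0]])
ideal_mask = np.array([[0.5, 1.0], [0.0, 1.0]])
clean_mag = ideal_mask * noisy_mag

# A mask that passes everything through lets noise leak into the output.
pass_all = np.ones_like(ideal_mask)
total, ma, sa = joint_ma_sa_loss(pass_all, ideal_mask, noisy_mag, clean_mag)
```

In this toy case the SA term weights each mask error by the energy of the noisy spectrum in that bin, which is one way to see why adding it could penalize noise that the mask wrongly lets through.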

Keywords

Deep learning, CBIR, Speech enhancement, Autoencoder, Convolutional neural network
