Content-based image retrieval and speech enhancement system using deep learning structure
Deep learning has recently attracted considerable attention in image processing and signal processing. It shows great potential for reducing high-dimensional data to a compact representation while abstracting the key information in that data. This characteristic makes deep learning very powerful in content-based image retrieval (CBIR) and speech enhancement (SE), because both need high-quality, low-dimensional semantic features. By using the code layer of a deep autoencoder (DAE), a fully connected deep learning model, CBIR and SE systems can achieve decent results. For CBIR, our newly designed multiple-input multiple-task DAE (MIMT-DAE), which uses wavelet coefficients, can even outperform the single-input single-task DAE while using fewer trainable parameters. However, for image processing the fully connected structure shows limitations, so this dissertation proposes a locally connected structure, the convolutional neural network (CNN), and a hybrid structure. The CNN, working as a preprocessing stage for the autoencoder, can provide better input features than the raw images because of its locally connected weights. The hybrid structure boosts retrieval performance substantially in both grayscale and color image retrieval. For the SE system, the fully connected DAE trained only on the mask approximation (MA) function does not deliver the desired performance. We design a multiple-task structure that adds a signal approximation (SA) function during training to reduce false positives. Training on both cost functions simultaneously gives much better performance than training only on the MA function, or even than the latest method, which is fine-tuned on the SA function. We also explored the long short-term memory (LSTM) structure and propose it as future work.
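To make the idea of training on both cost functions concrete, the following is a minimal sketch and not the dissertation's implementation. It assumes the formulations commonly used in the SE literature: MA as the mean squared error between the predicted and target masks, and SA as the mean squared error between the masked noisy magnitude spectrum and the clean magnitude spectrum. The function names, the weighting parameter `alpha`, and the toy spectra are all hypothetical and chosen only for illustration.

```python
import numpy as np


def ma_loss(mask_pred, mask_target):
    """Mask approximation: MSE between predicted and target masks (assumed form)."""
    return float(np.mean((mask_pred - mask_target) ** 2))


def sa_loss(mask_pred, noisy_mag, clean_mag):
    """Signal approximation: MSE between the masked noisy magnitude
    and the clean magnitude (assumed form)."""
    return float(np.mean((mask_pred * noisy_mag - clean_mag) ** 2))


def joint_loss(mask_pred, mask_target, noisy_mag, clean_mag, alpha=0.5):
    """Weighted sum of both costs; alpha is a hypothetical trade-off weight."""
    return (alpha * ma_loss(mask_pred, mask_target)
            + (1.0 - alpha) * sa_loss(mask_pred, noisy_mag, clean_mag))


# Toy magnitude spectra: one frame, two frequency bins (illustrative values).
noisy_mag = np.array([[2.0, 4.0]])
clean_mag = np.array([[1.0, 2.0]])
mask_target = clean_mag / noisy_mag      # ratio-style target mask
mask_pred = np.array([[1.0, 1.0]])       # a poor prediction that passes all noise

loss = joint_loss(mask_pred, mask_target, noisy_mag, clean_mag)
perfect = joint_loss(mask_target, mask_target, noisy_mag, clean_mag)
```

In this sketch the SA term works on the reconstructed spectrum, so a mask error in a high-energy bin costs more than the same error in a quiet bin. That is one plausible reason a joint objective could behave differently from MA alone. The weighting actually used in the dissertation is not stated in this abstract.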