Attacks and Defenses in Privacy-Preserving Representation Learning

dc.contributor.committeeChairSheng, Victor
dc.contributor.committeeMemberLiu, Ying
dc.contributor.committeeMemberDang, Tommy
dc.contributor.committeeMemberSerwadda, Abdul
dc.contributor.committeeMemberZhuang, Yu
dc.creatorZhan, Huixin
dc.date.accessioned2023-09-27T17:35:53Z
dc.date.available2023-09-27T17:35:53Z
dc.date.issued2023-08
dc.description.abstractUsers’ privacy concerns mandate that data publishers protect privacy by anonymizing data before sharing it with data consumers. Thus, the ultimate goal of privacy-preserving representation learning is to protect user privacy while ensuring the utility, e.g., the accuracy, of the published data for future tasks and usages. Privacy-preserving embeddings are typically low-dimensional vectors that encode important semantic information about an input text while aiming to protect privacy. We demonstrate that these embeddings still leak private information, even though they encode only generic semantics. In this dissertation, we first develop two classes of attacks, i.e., the adversarial classification (AC) attack and the adversarial generation (AG) attack, to study new threats to these embeddings. In particular, the threats are that (1) the embeddings may reveal sensitive attributes regardless of whether those attributes explicitly appear in the input text, and (2) the embedding vectors can be partially recovered via generative models. We further propose a semi-supervised generative adversarial network that inverts the given embeddings back to the sensitive raw text inputs by querying the model. This approach produces higher-performing adversary models than other AC and AG baselines. We also argue that the privacy protection of privacy-preserving representation learning breaks down during inference with model partitioning: hidden representations can easily be eavesdropped while data is uploaded from local devices to the cloud. Based on the two attack models, i.e., AC and AG, we correspondingly propose two defenses: defending against the adversarial classification (DAC) and defending against the adversarial generation (DAG). Both methods optimally modify a subpopulation of the neural representations so as to maximally decrease the adversary’s ability. 
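The adversarial-classification (AC) threat can be illustrated concretely. Below is a minimal sketch, not the dissertation’s implementation: synthetic embeddings in which a binary sensitive attribute leaks into one dimension, and an adversary that trains a logistic-regression probe on queried (embedding, attribute) pairs. The leakage construction and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "privacy-preserving" embeddings: 512 users, 16 dimensions.
# A binary sensitive attribute leaks into dimension 0 via a mean shift.
n, d = 512, 16
sensitive = rng.integers(0, 2, size=n)
embeddings = rng.normal(size=(n, d))
embeddings[:, 0] += 1.5 * sensitive  # the leakage the AC attack exploits

def train_logreg(X, y, lr=0.1, steps=500):
    """Adversary's probe: logistic regression fit by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        g = p - y                               # gradient of log-loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

# The adversary trains on half the queried pairs, attacks the rest.
split = n // 2
w, b = train_logreg(embeddings[:split], sensitive[:split])
pred = (embeddings[split:] @ w + b) > 0
acc = (pred == sensitive[split:]).mean()
# Accuracy well above chance (0.5) demonstrates attribute leakage.
```

A defense in the DAC spirit would modify the representation coordinates that drive this probe's accuracy back toward chance level.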
The representations trained with this bilevel optimization achieve stronger protection of sensitive information than the current state-of-the-art method~\citep{coavoux2018privacy}, while maintaining their utility for downstream tasks. Moreover, some of the hidden private information correlates with the output attributes and can therefore be learned by a neural network; in such cases, there is a trade-off between the utility of the representation and its privacy. We explicitly cast this problem as multi-objective optimization (MOO) and propose a multiple-gradient descent algorithm that enables the efficient application of the Frank-Wolfe algorithm to search for the optimal utility-privacy configuration of the text classification task. Graph neural networks (GNNs) combine the representational power of neural networks with the graph structure. In essence, GNNs compute a sequence of node representations by aggregating, at each node, information from its neighbors and itself. However, not all data is adequately expressed in terms of pairwise relationships. Interactions in a social network, for instance, do not occur solely in pairwise relations but also among larger groups of people. This warrants the use of simplicial complexes to represent rich and complex datasets. Simplicial complexes describe relational structures that are closed under restriction. Simplicial neural networks (SNNs) have already proven useful in applications such as coauthorship complexes~\citep{ebli2020simplicial} and social networks~\citep{chen2022bscnets}. The machinery of SNNs allows us to consider richer data, including vector fields and $n$-fold collaboration networks. It is challenging to encode the node embeddings of GNNs with higher-order simplicial structures using novel topology-aware graph convolution operations, because the input node-level embeddings can be sparse, high-dimensional, and encode complex connections. 
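For the two-objective (utility vs. privacy) case, the inner step of multiple-gradient descent has a closed form: the min-norm convex combination of the two task gradients, as in the standard MGDA formulation that Frank-Wolfe generalizes to many tasks. A sketch with illustrative gradients, assuming that standard formulation rather than the dissertation’s exact algorithm:

```python
import numpy as np

def min_norm_combination(g1, g2):
    """Return gamma in [0, 1] minimizing ||gamma*g1 + (1-gamma)*g2||^2.

    Two-task special case of multiple-gradient descent: the resulting
    combination is a common descent direction when one exists."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:                     # identical gradients: any weight works
        return 0.5
    gamma = ((g2 - g1) @ g2) / denom     # unconstrained minimizer
    return float(np.clip(gamma, 0.0, 1.0))

# Illustrative gradients of a utility loss and a privacy loss.
g_util = np.array([1.0, 0.0])
g_priv = np.array([0.0, 1.0])
gamma = min_norm_combination(g_util, g_priv)
combined = gamma * g_util + (1.0 - gamma) * g_priv
# combined has a positive inner product with both gradients, so a step
# along -combined decreases both losses simultaneously.
```

When the clipped weight lands at 0 or 1, one objective’s gradient already dominates, and the step reduces to single-task descent on the harder objective.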
We first propose two methods for SNN construction. The first method uses the higher-order combinatorial Laplacian to model the higher-order interactions between nodes. The second method iteratively updates the feature representation of node $v_i$ using side information collected from nodes within its $K$-hop local neighborhood in the clique complex filtration. Moreover, we propose three graph reconstruction attacks (GRAs) that recover a graph’s adjacency matrix from three types of representation outputs, i.e., outputs from graph convolutional networks, graph attention networks, and SNNs. We find that SNN outputs exhibit the lowest privacy-preserving ability against GRAs, which calls for future research on building more private, higher-order representations to defend against such threats. Next, we study GRAs on text-attributed networks, in which each node is associated with rich text information. Recent network representation learning attempts to integrate rich text information (from node attributes) with network structures to enhance the quality of network representations~\citep{huang2017accelerated}. However, whether this brings new threats to the network structure (i.e., edge privacy attacks) has not been studied. We find that network representations learned from both network structures and rich text information (NR\_SRT) are more vulnerable to GRAs than network representations learned from pure network structures (NR\_S). Therefore, we propose a privacy-preserving deterministic differentially private alternating direction method of multipliers (D$^2$-ADMM) to learn network representations from network structures and rich text information. 
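The higher-order combinatorial Laplacian underlying the first SNN construction is, for $k$-simplices, $L_k = B_k^\top B_k + B_{k+1} B_{k+1}^\top$, where $B_k$ is the $k$-th boundary matrix. A toy computation on the filled triangle (vertices $\{0,1,2\}$, three edges, one 2-simplex), with hand-written boundary matrices; this illustrates the standard definition, not the dissertation’s code:

```python
import numpy as np

# Boundary matrices of the filled triangle on vertices {0, 1, 2}.
# Edges (oriented low-to-high): e01, e02, e12; one 2-simplex t012.
B1 = np.array([[-1, -1,  0],   # rows: vertices, cols: edges
               [ 1,  0, -1],
               [ 0,  1,  1]])
B2 = np.array([[ 1],           # rows: edges, col: triangle
               [-1],           # boundary(t012) = e01 - e02 + e12
               [ 1]])

# Fundamental identity: the boundary of a boundary vanishes.
assert np.all(B1 @ B2 == 0)

# Hodge 1-Laplacian, acting on edge signals (k = 1 simplices).
L1 = B1.T @ B1 + B2 @ B2.T

# The same formula at k = 0 reduces to the familiar graph Laplacian.
L0 = B1 @ B1.T  # degree matrix minus adjacency matrix
```

For the filled triangle, $L_1$ works out to $3I$, since every edge signal is the boundary of a triangle signal or the coboundary of a vertex signal; an SNN layer convolves simplex features with polynomials of such Laplacians.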
Our experimental results show that D$^2$-ADMM achieves the strongest privacy-preserving ability among all privacy-preserving baselines on three real-world scientific publication datasets, in terms of standard metrics (e.g., AUCs of GRAs) and two proposed metrics (Canberra distances between every node pair, and utility scores).en_US
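One of the proposed metrics, the Canberra distance between node pairs, is elementwise: $d(x, y) = \sum_i |x_i - y_i| / (|x_i| + |y_i|)$, with zero-denominator terms conventionally contributing zero (as in SciPy's `canberra`). A small sketch of that standard definition:

```python
import numpy as np

def canberra(x, y):
    """Canberra distance between two node representations.

    Terms with |x_i| + |y_i| == 0 contribute zero by convention."""
    num = np.abs(x - y)
    den = np.abs(x) + np.abs(y)
    mask = den > 0
    return float(np.sum(num[mask] / den[mask]))

d = canberra(np.array([1.0, 2.0, 0.0]), np.array([3.0, 2.0, 0.0]))
# first term |1-3|/(1+3) = 0.5, second term 0, third term skipped
```

Because each term is normalized by the coordinate magnitudes, the metric is sensitive to differences near zero, which makes it a discriminative measure between perturbed and original representations.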
dc.format.mimetypeapplication/pdf
dc.identifier.urihttps://hdl.handle.net/2346/96291
dc.language.isoengen_US
dc.rights.availabilityUnrestricted.
dc.subjectGraphen_US
dc.subjectPrivacy-preserving representation learningen_US
dc.subjectPrivacyen_US
dc.subjectSimplicial complexen_US
dc.subjectGraph neural networksen_US
dc.titleAttacks and Defenses in Privacy-Preserving Representation Learningen_US
dc.typeDissertation
dc.type.materialtext
thesis.degree.departmentComputer Science
thesis.degree.disciplineComputer Science
thesis.degree.grantorTexas Tech University
thesis.degree.levelDoctoral
thesis.degree.nameDoctor of Philosophy
Files
Original bundle
Name:
ZHAN-DISSERTATION-2023.pdf
Size:
7.85 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.57 KB
Format:
Description:
Item-specific license agreed upon to submission