Protecting data privacy with anonymity: Quantifying instinctive measures and intelligent effective search for optimal anonymized data

dc.contributor.committeeChair: Hewett, Rattikorn
dc.contributor.committeeMember: Serwadda, Abdul
dc.contributor.committeeMember: Dang, Tommy
dc.contributor.committeeMember: Salman, Tara
dc.contributor.committeeMember: Chen, Lin
dc.contributor.committeeMember: Yamazaki, Kazuo
dc.creator: Arca, Sevgi
dc.date.accessioned: 2022-06-02T14:08:01Z
dc.date.available: 2022-06-02T14:08:01Z
dc.date.created: 2022-05
dc.date.issued: 2022-05
dc.date.submitted: May 2022
dc.date.updated: 2022-06-02T14:08:02Z
dc.description.abstract: Data privacy entails the ability of individuals to control their personal data. With advanced technology in this digital era, users can lose control of their personal data without knowing it, as their data can be tracked, stored, and shared across multiple parties. Protecting online data privacy is a daunting task: current consent-based privacy policies tend to be too elaborate and too difficult to apply effectively, and new approaches are needed that go beyond the user-consent model. As more data are shared and made publicly available, attackers can infer confidential data from multiple query sources to commit malicious acts. This dissertation addresses how to better protect the privacy of such published structured data, particularly the fundamental issue of anonymity quantification and the practical issues of efficiency and optimality in automated anonymization.

Anonymity is among the most widely used properties for data privacy protection. It represents a state of indistinguishability: increasing users' anonymity increases their indistinguishability, making them harder to re-identify. Anonymization ensures that each set of "critical" data values belongs to more than one individual, so that an individual's identity is protected. Many privacy-preserving approaches to anonymizing structured data transform the original data into a more anonymous form (via generalization and suppression) while preserving data integrity. Although anonymization techniques have been studied extensively, surprisingly few of them measure anonymity directly; most use a measure that indirectly indicates the quality of anonymity (e.g., the anonymity degree in k-anonymity; see the sketch below). Most existing anonymity measures are indirect because they are based on entropy, which estimates information loss, only a partial consequence of anonymity. An anonymity measure is at the heart of anonymization, yet there is little research on quantifying anonymity directly. This dissertation models two direct anonymity measures, information-based and inference-based, targeting disclosure breaches and re-identification attacks, respectively. The unique aspect of the formulation is its instinctive articulation of the opposing perspectives of victims (concealing their identity) and attackers (uncovering the disclosure or identity). Furthermore, unlike most other work, this study distinguishes the measure of an individual's anonymity from that of a group. For large-scale data, the dissertation uses data distributions to propose measures of uniformity, variety, and diversity as anonymity indicators that quickly assess degrees of data privacy.

On the practical side, most general-purpose anonymization techniques aim to find an "optimal" k-anonymization (anonymized data satisfying k-anonymity requirements), e.g., by minimizing data distortion or the number of generalization steps. However, finding a k-anonymization that maximizes preserved information is NP-hard, which has led to greedy anonymization and special-purpose techniques. A common issue in anonymization remains the trade-off between data privacy and informativeness: generalization gains anonymity but can result in data that are no longer useful.
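To make the k-anonymity requirement concrete, here is a minimal Python sketch (not taken from the dissertation; the toy table and column names are hypothetical) that computes a table's anonymity degree as the size of its smallest equivalence class, i.e., the smallest group of records sharing identical quasi-identifier values:

    from collections import Counter

    def anonymity_degree(records, quasi_identifiers):
        # Group records by their quasi-identifier values; the table is
        # k-anonymous iff the smallest group has at least k members.
        groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
        return min(groups.values())

    # Hypothetical toy table with generalized ZIP code and age range.
    table = [
        {"zip": "794**", "age": "30-39", "disease": "flu"},
        {"zip": "794**", "age": "30-39", "disease": "cold"},
        {"zip": "793**", "age": "40-49", "disease": "flu"},
    ]
    print(anonymity_degree(table, ["zip", "age"]))  # -> 1: the last record is unique

This toy table fails 2-anonymity because one record's (zip, age) combination is unique; further generalization (e.g., coarsening both ZIP prefixes to "79***") would be needed to raise the degree.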
Anonymization approaches are mostly designed for specific goals (e.g., accurate classification or efficient algorithms), but none provides an integrated solution for efficiency, privacy, and preserved informativeness. This dissertation presents a general-purpose anonymization technique that applies generalization to secure privacy, satisfying user-specified anonymity requirements while optimizing information preservation. The proposed approach exploits the monotonicity property of generalization, together with a heuristic search, to efficiently find optimal generalized data that comply with the anonymity requirements. The approach is theoretically grounded, as the search can be mapped to a well-known efficient optimal search in Artificial Intelligence. In addition, the approach preserves data quality for classification relatively well, even though its intent is to keep the generalized data as close as possible to the original. Finally, the dissertation assembles these results into a practical methodology for anonymity analytics and retention.
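The mapped search can be illustrated concretely. Below is a hedged Python sketch, not the dissertation's algorithm: a uniform-cost (best-first) search over the lattice of per-attribute generalization levels. The unit-step cost and the goal test check_k_anonymous are hypothetical placeholders; because generalizing further never decreases anonymity (the monotonicity property), the first requirement-satisfying state popped from the priority queue uses the fewest generalization steps.

    import heapq

    def optimal_generalization(check_k_anonymous, max_levels, k):
        # A state is a tuple of per-attribute generalization levels; level 0
        # is the original attribute and higher levels are coarser. The cost
        # of a state is its total number of generalization steps, a simple
        # monotone proxy for data distortion.
        start = tuple(0 for _ in max_levels)
        frontier = [(0, start)]                    # priority queue ordered by cost
        seen = {start}
        while frontier:
            cost, levels = heapq.heappop(frontier)
            if check_k_anonymous(levels, k):       # goal: generalized table meets the requirement
                return levels                      # first goal popped is step-minimal
            for i, top in enumerate(max_levels):
                if levels[i] < top:                # generalize attribute i one more step
                    nxt = levels[:i] + (levels[i] + 1,) + levels[i + 1:]
                    if nxt not in seen:
                        seen.add(nxt)
                        heapq.heappush(frontier, (cost + 1, nxt))
        return None                                # no generalization satisfies the requirement

Adding an admissible heuristic to the priority (an A*-style search) would keep the same optimality guarantee while expanding fewer states, which is the kind of mapping to a classical optimal search that the abstract refers to.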
dc.description.abstract: Restricted to TTU community only. To view, log in with your eRaider. Others may request that the author grant an access exception via the PDF link.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/2346/89366
dc.language.iso: eng
dc.rights.availability: Restricted to TTU community only.
dc.subject: Privacy
dc.subject: Anonymity
dc.subject: Intelligent Anonymization
dc.subject: Privacy Measures
dc.title: Protecting data privacy with anonymity: Quantifying instinctive measures and intelligent effective search for optimal anonymized data
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Texas Tech University
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: ARCA-DISSERTATION-2022.pdf
Size: 2.05 MB
Format: Adobe Portable Document Format

License bundle

Name: LICENSE.txt
Size: 1.84 KB
Format: Plain Text