Browsing by Author "Liu, Ying"

Attacks and Defenses in Privacy-Preserving Representation Learning
(2023-08) Zhan, Huixin; Sheng, Victor; Liu, Ying; Dang, Tommy; Serwadda, Abdul; Zhuang, Yu
Users’ privacy concerns now require data publishers to protect privacy by anonymizing data before sharing it with data consumers. Thus, the ultimate goal of privacy-preserving representation learning is to protect user privacy while ensuring the utility, e.g., the accuracy, of the published data for future tasks and uses. Privacy-preserving embeddings are usually functions that encode an input text into low-dimensional vectors to protect privacy while preserving important semantic information about the text. We demonstrate that these embeddings still leak private information, even though the low-dimensional embeddings encode generic semantics. In this dissertation, we first develop two classes of attacks, i.e., the adversarial classification (AC) attack and the adversarial generation (AG) attack, to study the new threats to these embeddings. In particular, the threats are that (1) these embeddings may reveal sensitive attributes regardless of whether they explicitly appear in the input text, and (2) the embedding vectors can be partially recovered via generation models. We further propose a semi-supervised generative adversarial network that inverts the given embeddings back to the sensitive raw text inputs by querying the model. This approach can produce higher-performing adversary models than other AC and AG baselines. In addition, we argue that the privacy protection of privacy-preserving representation learning breaks during inference with model partitioning. Specifically, the hidden representations are easily eavesdropped while the data is uploaded from local devices to the cloud. Based on the two aforementioned attack models, i.e., AC and AG, we correspondingly propose two defenses: defending the adversarial classification (DAC) and defending the adversarial generation (DAG). Both methods optimally modify a subpopulation of the neural representations so as to maximally decrease the adversary’s ability.
The representations trained with this bilevel optimization achieve a higher level of sensitive information protection than the current state-of-the-art method~\citep{coavoux2018privacy}, while maintaining their utility for downstream tasks. Moreover, some of the hidden private information correlates with the output attributes and can therefore be learned by a neural network. In such a case, there is a trade-off between the utility of the representation and its privacy. We explicitly cast this problem as multi-objective optimization (MOO) and propose a multiple-gradient descent algorithm that enables the efficient application of the Frank-Wolfe algorithm to search for the optimal utility-privacy configuration of the text classification task. Graph neural networks (GNNs) combine the representational power of neural networks with the graph structure. In essence, GNNs compute a sequence of node representations by aggregating information at each node from its neighbors and itself. However, not all data is adequately expressed in terms of pairwise relationships. Interactions in a social network, for instance, do not occur solely in pairwise relations but also among larger groups of people. This warrants a simplicial complex in order to represent rich and complex datasets. Simplicial complexes describe relational structures that are closed under restriction. Simplicial neural networks (SNNs) have already proven useful in some applications, e.g., coauthorship complexes~\citep{ebli2020simplicial} and social networks~\citep{chen2022bscnets}. The machinery of SNNs allows us to consider richer data, including vector fields and $n$-fold collaboration networks. It is challenging to investigate how to encode the node embeddings of GNNs with higher-order simplicial structures using novel topology-aware graph convolution operations, because the input node-level embeddings can be sparse and high-dimensional and can preserve complex connections.
We first propose two methods for SNN construction. The first method uses the higher-order combinatorial Laplacian to model the higher-order interactions between nodes. The second method iteratively updates the feature representation of node $v_i$ using side information collected from nodes within its $K$-hop local neighborhood from the clique complex filtration. Moreover, we propose three graph reconstruction attacks (GRAs) that recover a graph’s adjacency matrix from three types of representation outputs, i.e., representation outputs from graph convolutional networks, graph attention networks, and simplicial neural networks (SNNs). We find that SNN outputs offer the weakest protection against the GRAs. This calls for future research towards building more private higher-order representations to defend against such threats. Next, we study the GRA on text-attributed networks, in which each node is associated with rich text information. Recent network representation learning attempts to integrate rich text information (from the node attributes) with the network structures to enhance the quality of network representation~\citep{huang2017accelerated}. However, it has not been studied whether this brings new threats to the network structure (i.e., edge privacy attacks). We find that network representations learned from both network structures and rich text information (NR\_SRT) are more vulnerable to GRAs than network representations learned from pure network structures (NR\_S). Therefore, we propose a privacy-preserving deterministic differentially private alternating direction method of multipliers (D$^2$-ADMM) to learn network representations with network structures and rich text information.
Our experimental results show that D$^2$-ADMM achieves the best privacy-preserving ability among all privacy-preserving baselines on three real-world scientific publication datasets in terms of standard metrics (e.g., AUCs of GRAs) and two proposed metrics (Canberra distances between every node pair and utility scores).
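The multiple-gradient descent step mentioned in the abstract above can be illustrated for the simplest case of two objectives (utility and privacy). The sketch below is not the dissertation's implementation: it only shows the standard closed-form solution of the two-gradient min-norm problem, which a Frank-Wolfe iteration generalizes to more objectives. The function name and the toy gradients are hypothetical.

```python
from typing import List, Sequence, Tuple


def min_norm_two(g_utility: Sequence[float],
                 g_privacy: Sequence[float]) -> Tuple[float, List[float]]:
    """Find gamma in [0, 1] minimizing ||gamma*g_utility + (1-gamma)*g_privacy||.

    Returns (gamma, d), where d is the minimum-norm convex combination of the
    two gradients. Stepping along -d does not increase either loss to first
    order; this is the two-objective case only.
    """
    diff = [u - p for u, p in zip(g_utility, g_privacy)]
    denom = sum(x * x for x in diff)
    if denom == 0.0:
        # Identical gradients: every gamma gives the same combination.
        gamma = 0.5
    else:
        # Unconstrained minimizer of ||g_privacy + gamma*diff||^2, then clipped.
        gamma = -sum(p * x for p, x in zip(g_privacy, diff)) / denom
        gamma = min(1.0, max(0.0, gamma))
    d = [gamma * u + (1.0 - gamma) * p for u, p in zip(g_utility, g_privacy)]
    return gamma, d


# Toy gradients (hypothetical values, not from the dissertation).
gamma, direction = min_norm_two([1.0, 0.0], [0.0, 1.0])
```

When the two gradients conflict, the clipped weight balances them; when one gradient is a scaled copy of the other, the shorter one is returned unchanged.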
Does farmland rental contribute to reduction of agrochemical use? A case of grain production in Gansu province, China
(2019) Liu, Ying; Wang, Chenggang (TTU); Tang, Zeng; Nan, Zhibiao
As a consequence of the new strategy to boost productive capacity and ensure food security, China's farmland rental market is developing rapidly, and its impacts on agricultural productivity have been extensively studied. However, the impacts of farmland rental on food safety have not been considered. The aim of this study was to determine the causal effects of farmland rental on fertilizer and pesticide use in wheat and maize production and to evaluate the potential effects of this activity on food safety. Survey data obtained from 900 households in eight counties in Gansu province were used in this study, and the propensity score matching (PSM) method was employed to address selection bias in the data. The results showed that farmland rental significantly reduced fertilizer and pesticide use in wheat and maize production, implying a potential reduction in heavy metal contamination of food and drinking water as well as fewer pesticide residues remaining in food and less contamination of the environment. Also, households renting land were more likely to adopt new agricultural technologies and management methods and to acquire more agricultural knowledge and information than those not renting land or renting out land. Thus, farmland rental benefits the application of new agricultural technologies and management methods, the rational use of agrochemicals, and ultimately food safety and environmental conservation. Policies such as encouraging farmland rental, enhancing education of farmers, improving technological innovation, and providing better information transfer should help ensure not only "enough food" but also "safe food".
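As a rough illustration of the matching step in PSM, the sketch below pairs each treated household (one renting land) with the control household whose propensity score is closest, then averages the outcome differences. It assumes the propensity scores have already been estimated (for example, with a logistic regression) and uses one-to-one nearest-neighbor matching with replacement; the study's actual matching algorithm is not stated in the abstract, and all numbers below are made up.

```python
from typing import List, Tuple

# Each household is (propensity_score, outcome), e.g. outcome = fertilizer use.
Household = Tuple[float, float]


def att_nearest_neighbor(treated: List[Household],
                         control: List[Household]) -> float:
    """Average treatment effect on the treated via nearest-neighbor matching.

    Each treated unit is matched (with replacement) to the control unit with
    the closest propensity score; the estimate is the mean outcome difference.
    """
    if not treated or not control:
        raise ValueError("both groups must be non-empty")
    differences = []
    for score_t, outcome_t in treated:
        _, outcome_c = min(control, key=lambda unit: abs(unit[0] - score_t))
        differences.append(outcome_t - outcome_c)
    return sum(differences) / len(differences)


# Hypothetical data: renters (treated) versus non-renters (control).
renters = [(0.8, 10.0), (0.6, 12.0)]
non_renters = [(0.75, 13.0), (0.55, 14.0), (0.2, 20.0)]
effect = att_nearest_neighbor(renters, non_renters)
```

A negative estimate in this toy setup would correspond to lower input use among matched renters; a real analysis would also check covariate balance and common support.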
Farmland Rental and Productivity of Wheat and Maize: An Empirical Study in Gansu, China
(2017) Liu, Ying; Wang, Chenggang (TTU); Tang, Zeng; Nan, Zhibiao
The rapid growth of farmland rental markets in China raises questions about the association between farmland rental and agricultural productivity. Although this issue has been extensively studied, the majority of studies have focused on yields and technical efficiency, with input use and cost efficiency receiving little attention. This study aimed to determine the statistical association between wheat and maize farmers’ farmland rental behaviors (renting land, not renting land, and renting out land) and input use, and the consequent association between farmers’ farmland rental behaviors and cost efficiency. For this purpose, a linear regression model and a stochastic frontier model were employed, based on survey data from 419 wheat and maize farmers in 25 villages in five counties of Gansu Province, China. The study found that farmland rental enhanced the productivity and sustainability of agriculture by transferring farmland from less productive households to more productive ones, and it also helped reduce the consumption of fertilizers and chemicals in agricultural production. The results suggest that replacing labor with machines is an important way to reduce production costs, and that households specializing in agricultural production use more rational amounts of fertilizers and chemicals than those with low productivity. Thus, the machinery purchase policy in China should continue to give great benefit to farmers. In addition, the machinery purchase subsidization policy has achieved satisfactory results in China, and it could be a good reference for other developing countries. However, some efficiency loss was found in households that rented out their land, and policy makers need to pay attention to these households.
High-resolution mapping of ground-level fine particulate matter and the associated human health risks
(2018-08) Liu, Ying; Cao, Guofeng; Vanos, Jennifer; Lee, Jeffrey A.; Mulligan, Kevin
Fine particulate matter with aerodynamic diameters equal to or less than 2.5 micrometers (PM2.5) is a major component of air pollution that widely threatens public health. To control and mitigate its adverse effects on human health, it is essential to explore the potential factors influencing ground-level PM2.5 concentrations and the associations between long-term PM2.5 exposure and health outcomes. In my dissertation, I incorporate spatial synoptic classification weather type data to investigate the impacts of meteorological factors on ground-level PM2.5 concentrations holistically rather than through individual meteorological variables separately. It was found that tropical (polar) weather types have positive (negative) effects on ground-level PM2.5 concentrations and that these effects vary seasonally and geographically. Accurate mapping of ground-level PM2.5 concentrations is the prerequisite for investigating the adverse effect of PM2.5 exposure on human health. However, the current PM2.5 monitoring networks leave many people unmonitored. Satellite-derived gridded PM2.5 images from chemical transport models (CTM) have demonstrated unique attractiveness in terms of their geographic and temporal coverage but often yield results with a coarse spatial resolution and tend to ignore or simplify the impact of geographic and socioeconomic factors on PM2.5 concentrations. In the second part of my dissertation, a random forests-based regression kriging (RFRK) approach was developed to improve the spatial resolution of a CTM-derived PM2.5 dataset from 0.1° to 0.01° through the combined use of in situ PM2.5 observations, brightness of nighttime lights, vegetation index, and elevation. The accuracy and advantages of the proposed approach are demonstrated by comparing the results with an existing PM2.5 dataset with the same spatial resolution.
The effectiveness of the geographical variables in long-term PM2.5 mapping was highlighted, and the contribution of each variable to the spatial distribution of PM2.5 concentrations was discussed. The third part of my dissertation maps the distribution of PM2.5-attributable mortality to detect the potential benefits of PM2.5 control. To highlight the impact of geographic scales and variations of geospatial datasets on the estimation of PM2.5-attributable mortalities, I compared the estimations derived from PM2.5 concentration datasets at different spatial resolutions (i.e., 0.01° and 0.1°) and mortality statistics at different geographic scales (i.e., sub-regional and county-level). Using ischemic heart disease (IHD) in the contiguous United States (U.S.) as a case study, it was found that the estimated PM2.5-IHD mortalities from the 0.1° PM2.5 dataset tend to be smaller than the estimations from the 0.01° PM2.5 dataset, while the estimated PM2.5-IHD mortalities from the sub-regional-level mortality rates tend to be larger than the estimations from the county-level ones. In addition, the spatiotemporal change in PM2.5-attributable IHD mortality between 2000 and 2015 was extracted, and it showed that PM2.5-IHD deaths decreased by approximately 50%. A scenario analysis indicated that up to 90% of deaths could be avoided if the PM2.5 concentration decreased by 4 μg/m3 throughout the country. The influence of long-term PM2.5 exposure on public health has been investigated by many previous studies. However, the reliability of those studies may be affected by limited measurements or inaccurate PM2.5 estimations. The last part of my dissertation linked the RFRK-refined PM2.5 dataset, with its fine spatial resolution and high accuracy, to the hospital admission databases for Arizona. Relative risks (RRs) of PM2.5-attributable morbidity were calculated for all-cause, skin cancer, asthma, cerebrovascular, chronic respiratory, and heart diseases.
Logarithmic risk functions for all-cause, skin cancer, asthma, and heart diseases and polynomial risk functions for respiratory and cerebrovascular diseases were developed for the total population. I also examined whether the health effects of long-term PM2.5 exposure varied in intensity among different subpopulations. Females had a significantly higher risk of PM2.5-attributable morbidity than males for all-cause and heart diseases. African Americans were more vulnerable than Whites and Hispanics for all-cause, heart, and respiratory diseases. Hispanics were more susceptible than Whites to skin cancer associated with PM2.5 exposure, while Whites’ RRs for cerebrovascular diseases were apparently higher than those of Hispanics.
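To make the link between relative risk and attributable deaths concrete, the sketch below applies the widely used attributable-fraction formula AF = (RR - 1) / RR to a baseline death count. It assumes a log-linear concentration-response function RR = exp(beta * delta), which is a common textbook form and not necessarily the risk function used in the dissertation; the coefficient, counterfactual concentration, and death counts are hypothetical.

```python
import math


def attributable_deaths(baseline_deaths: float,
                        beta: float,
                        pm25: float,
                        counterfactual: float = 0.0) -> float:
    """Estimate deaths attributable to PM2.5 above a counterfactual level.

    Assumes RR = exp(beta * delta), where delta is the concentration above
    the counterfactual (in ug/m3), and AF = (RR - 1) / RR.
    """
    delta = max(pm25 - counterfactual, 0.0)
    relative_risk = math.exp(beta * delta)
    attributable_fraction = (relative_risk - 1.0) / relative_risk
    return baseline_deaths * attributable_fraction


# Hypothetical grid cell: 1000 baseline IHD deaths, 12 ug/m3 exposure,
# 5 ug/m3 counterfactual, and an illustrative coefficient beta.
estimate = attributable_deaths(1000.0, beta=0.02, pm25=12.0, counterfactual=5.0)
```

Applying the same function to concentration grids at two resolutions, or to baseline rates at two geographic scales, is one way to reproduce the kind of comparison the abstract describes.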
Will farmland transfer reduce grain acreage? Evidence from Gansu province, China
(2018) Liu, Ying; Wang, Chenggang (TTU); Tang, Zeng; Nan, Zhibiao
Purpose: The purpose of this paper is to examine the impacts of farmland renting-in on planted grain acreage. Design/methodology/approach: Survey data from five counties were analyzed with the two-stage ordinary least squares model. Findings: Households renting in land tended to plant more maize, and the more land a household rented, the more maize it planted, while wheat acreage showed no response to farmland renting-in. Practical implications: Overall, the analysis suggests that policy makers should be prepared for different trends in grain crop acreage across the nation as farmland transfer continues. Future research should pay attention to the effect of farmland transfer on agricultural productivity and rural household income growth. Originality/value: As the Chinese Government is promoting larger-scale and more mechanized farms as a way of protecting grain security, it is important to understand whether farmland renting-in will reduce planted grain acreage. This study provides empirical evidence showing that the answer to that question may differ across regions and depend on the particular grain crop in question.
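The two-stage estimation named in the abstract can be sketched for the simplest case of one endogenous regressor and one instrument. The abstract does not state which instrument or controls the paper used, so the variable roles below (an instrument z, rented-in area x, and grain acreage y) and the data are purely illustrative.

```python
from statistics import mean
from typing import Sequence, Tuple


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Simple one-regressor OLS; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def two_stage_least_squares(z: Sequence[float],
                            x: Sequence[float],
                            y: Sequence[float]) -> Tuple[float, float]:
    """Stage 1: regress x on the instrument z and compute fitted values.
    Stage 2: regress y on those fitted values; returns (slope, intercept)."""
    slope1, intercept1 = ols(z, x)
    x_hat = [intercept1 + slope1 * zi for zi in z]
    return ols(x_hat, y)


# Illustrative data only (not from the survey).
z = [0.0, 1.0, 2.0, 3.0]    # hypothetical instrument
x = [1.0, 2.0, 4.0, 5.0]    # hypothetical rented-in area
y = [3.0, 5.0, 9.0, 11.0]   # hypothetical maize acreage (here y = 2x + 1)
slope, intercept = two_stage_least_squares(z, x, y)
```

The second-stage standard errors from this naive two-step procedure are not valid as-is; statistical packages correct them, which is one reason to use a dedicated routine in practice.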