Bringing Energy into Data Analytics with Deep Learning: Modeling Energy Consumption and Measuring Uncertainty for Data Reliability
Energy matters. It drives economies, impacts global warming, and sustains societies. In today's digital world, computing and smart technologies produce Big Data and advanced analytics with deep learning (DL) that yield numerous benefits, yet the role of energy in data analytics is still relatively new and rare in practice. This dissertation addresses three main issues concerning energy in current data analytics. First, energy data are underused. Research in energy data analytics lags behind other domains even though energy data are plentiful from connected smart sensors, meters, and home devices. Such analytics face at least two challenges: (1) integrating data across multiple systems, including natural, human, and engineered systems, and (2) data reliability, which requires differentiating unreliable data from faulty data. There is a need for a systematic approach to analyzing energy data and for a measure of data reliability. Existing techniques focus on improving data quality by fusing data from multiple sources and detecting abnormality or inconsistency. However, these techniques do not quantify or even characterize reliability, as abnormal data can still be reliable. Reliability concerns trust and confidence in the data. This dissertation proposes a theoretical approach to measuring data reliability through degrees of belief in the certainty of the data. Second, data analytics does not integrate energy usage into its computational cost. As a result, models are typically selected by performance criteria (e.g., accuracy) alone rather than in combination with energy criteria (e.g., energy saving). This is acceptable when energy cost is negligible but not when energy resources are critical. Before one can account for the energy usage of computation in data analytics, one needs to be able to measure it. This dissertation focuses on estimating the energy consumption of DL analytics. Unlike existing approaches that use simulation models, we propose an analytical approach to energy modeling to estimate the energy usage of DL.
Finally, most applied machine learning (ML)/DL systems treat energy issues as an afterthought, considered only after the system has been implemented or deployed. As with security issues, some problems can be patched afterward, but those embedded in complex analytics processes cannot. This can lead to costly system rebuilding. This dissertation proposes integrating energy cost estimates into the design of the applied ML/DL system before the system is built and deployed. To summarize, this dissertation brings energy into data analytics in multiple aspects, from energy data analytics, to quantifying energy consumption in analytics computation, to designing various applied ML/DL systems that take energy consumption into consideration. The dissertation's contributions are: (1) a study of energy data analytics for a utility company, which analyzes the complex tradeoffs between the consumer preference system and the utility pricing and scheduling system; (2) a fundamental approach to a data reliability measure that is also applicable to energy data, which uses Dempster-Shafer evidence theory to formulate mass functions that estimate a degree of belief (or trust) and thus the uncertainty or reliability of the data; (3) a fundamental approach to estimating energy consumption in DL analytics computation in both training and testing, which, unlike prior work, explicitly derives the specific numbers of basic units (i.e., the number of multiply-and-accumulate operations and data accesses) used in the estimation; and (4) a systematic approach to utilizing energy estimates in the design of various applied ML/DL systems under multiple constraints (e.g., energy cost, quality of service) and in varying infrastructures (e.g., Internet of Things, Smart Systems, Distributed Edge Computing). One notable distinction is the ability to account for the energy consumption of analytics computation prior to system implementation.
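To make the idea of estimating energy from basic units concrete, the following is a minimal sketch and not the dissertation's actual model. It assumes a network of fully connected layers, counts multiply-and-accumulate (MAC) operations and data accesses for a forward pass only, and multiplies each count by a per-unit energy cost. The names `DenseLayer`, `count_macs`, `count_data_accesses`, and `estimate_energy` are hypothetical, the access count naively assumes each weight, input, and output value is touched exactly once, and the per-unit costs `e_mac` and `e_access` are placeholders that would have to be measured for a given hardware platform.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DenseLayer:
    """Hypothetical description of a fully connected layer (bias omitted)."""
    in_features: int
    out_features: int


def count_macs(layer: DenseLayer, batch_size: int = 1) -> int:
    # One MAC per (input feature, output feature) pair, per sample.
    return batch_size * layer.in_features * layer.out_features


def count_data_accesses(layer: DenseLayer, batch_size: int = 1) -> int:
    # Simplifying assumption: every value is accessed exactly once.
    weight_reads = layer.in_features * layer.out_features
    input_reads = batch_size * layer.in_features
    output_writes = batch_size * layer.out_features
    return weight_reads + input_reads + output_writes


def estimate_energy(
    layers: Iterable[DenseLayer],
    e_mac: float,
    e_access: float,
    batch_size: int = 1,
) -> float:
    """Forward-pass energy estimate: sum of (unit count * per-unit cost).

    e_mac and e_access are placeholder per-unit energy costs in the same
    (arbitrary) unit; they are assumptions, not values from the source.
    """
    total = 0.0
    for layer in layers:
        total += count_macs(layer, batch_size) * e_mac
        total += count_data_accesses(layer, batch_size) * e_access
    return total


if __name__ == "__main__":
    network = [DenseLayer(784, 128), DenseLayer(128, 10)]
    # Placeholder costs chosen only to exercise the function.
    print(estimate_energy(network, e_mac=1.0, e_access=2.0))
```

Because the estimate depends only on layer shapes and per-unit costs, it can be evaluated before any system is built, which is the property the abstract emphasizes. A training-time estimate would additionally need to count the operations of the backward pass and weight updates, which this sketch does not attempt.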
Embargo status: Restricted until 01/2028.