Data-driven modeling and transportation data analytics

Date

2014-05

Abstract

Data has become increasingly important in transportation research. Unfortunately, existing traffic models, though developed and practiced for decades, are not data-driven and are therefore inherently incapable of analyzing modern traffic data from multiple sources with different temporal resolutions and spatial coverage. A new paradigm centered on data-driven theories needs to be established, one that can fully exploit traffic data to yield new insights into traffic dynamics, accurate traffic forecasting, and effective traffic control.

This dissertation investigates advanced data-driven methods and develops mathematical models and solution algorithms for analyzing multimodal transportation data from a single source, multiple sources, and large networks. As studies on big transportation data are still in their very early stages, a key objective of this research is to explore suitable modeling and computing methods to address the fundamental problems of transportation data. To this end, various types of traffic data were studied, including microscopic vehicle trajectory data, macroscopic velocity data from multiple sensors, and network-wide Floating Car Data. Specifically, this dissertation concentrates on the following three topics:

• Single source data: analysis of asymmetric driving behavior using high-resolution trajectory data. Access to high-resolution vehicle trajectory data has opened up new avenues for understanding and modeling both microscopic and macroscopic traffic phenomena. A novel data-driven algorithm based on the kernel machine was developed to extract driving patterns from trajectory data, avoiding the subjective biases of traditional physical models. Particular attention was paid to asymmetry in driving behavior. The study demonstrated the existence of significant asymmetry between deceleration and acceleration and revealed its impact on macroscopic traffic flow characteristics. The findings reveal a strong connection between asymmetric driving behavior and prominent macroscopic traffic phenomena, including congestion propagation and recovery.

• Multiple source data: traffic projection by fusing stationary and mobile data. Owing to advances in sensing technologies, traffic data are emerging from multiple sources with different temporal and spatial characteristics and varying types of errors. A challenging task in the analysis and modeling of multi-source transportation data is combining floating data from mobile sensors with data from stationary sensors such as loop detectors. To assimilate both stationary and mobile data for highway traffic estimation, a Gaussian process model was developed with a novel covariance function that integrates fundamental features of congestion propagation within a Bayesian framework. Field experiments with data from U.S. 880 demonstrated the model’s ability to provide reliable and accurate traffic estimation and prediction using a variety of information.

• Large network data: impact of service refusal on urban taxicab systems using Floating Car Data. The true challenge of big transportation data lies in the techniques and modeling approaches for analyzing multimodal data from large transportation networks. With the knowledge obtained from traffic data analysis, in-depth investigation of network-level and multimodal transportation data becomes possible. In traffic data analysis, the focus has been on the demand side only, and the goal has been to develop methodologies and modeling approaches that accurately reproduce traffic flow profiles and estimate traffic dynamics. A unique problem in network-level, multimodal transportation analysis, however, is that both the demand and supply sides need to be considered, and the results usually relate to policy making. The data analyzed in this section are operational data from the taxicab market, which go well beyond the engineering arena. The technical models are therefore extended to include social and economic components in addition to engineering analysis. A partial differential equation system with a sigmoid function was developed to address the impact of service refusal on the demand-supply equilibrium of a taxicab system. Combining data-driven and network analysis yielded new insights, which lead to promising policy recommendations against this undesirable phenomenon.
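As a rough illustration of the second topic above (fusing stationary and mobile data with a Gaussian process), the sketch below runs standard Gaussian process regression over position and time, giving each observation its own noise variance so that loop-detector and floating-car readings can be weighted differently. It is not the dissertation's model: the squared-exponential kernel stands in for the novel covariance function, and the function names, length scales, prior mean, noise variances, and data values are all assumptions made up for this example.

```python
import math

def sq_exp_kernel(p, q, length_x=1.0, length_t=5.0, variance=100.0):
    """Squared-exponential covariance between two (position_km, time_min) points.

    Assumed stand-in kernel; the length scales and variance are illustrative.
    """
    dx = (p[0] - q[0]) / length_x
    dt = (p[1] - q[1]) / length_t
    return variance * math.exp(-0.5 * (dx * dx + dt * dt))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def gp_predict(points, speeds, noise_vars, query, prior_mean=60.0):
    """Posterior mean and variance of speed at `query` given noisy observations."""
    n = len(points)
    # Covariance of the observations, with per-observation noise on the diagonal.
    k = [[sq_exp_kernel(points[i], points[j]) + (noise_vars[i] if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [sq_exp_kernel(query, p) for p in points]
    residuals = [s - prior_mean for s in speeds]
    alpha = solve(k, residuals)
    mean = prior_mean + sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve(k, k_star)
    var = sq_exp_kernel(query, query) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var

# Hypothetical observations: ((position km, time min), speed km/h, noise variance).
loop_obs = [((0.0, 0.0), 62.0, 1.0), ((2.0, 0.0), 30.0, 1.0)]   # stationary detectors
probe_obs = [((0.8, 1.0), 55.0, 9.0), ((1.5, 2.0), 35.0, 9.0)]  # floating cars
obs = loop_obs + probe_obs
points = [o[0] for o in obs]
speeds = [o[1] for o in obs]
noise = [o[2] for o in obs]

mean, var = gp_predict(points, speeds, noise, query=(1.0, 1.0))
print(f"estimated speed {mean:.1f} km/h, posterior variance {var:.1f}")
```

The per-observation noise term is the only place where the two sensor types differ in this sketch; a covariance function that encodes congestion propagation, as described above, would replace `sq_exp_kernel`.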

This dissertation centers on the technical issues and challenges of the data-driven modeling approach (versus traditional physical modeling methods) and on the development and application of advanced analytic models to real-world traffic and transportation problems. It spans a wide range of topics pertinent to selected typical problems in traffic flow theory and transportation network analysis. Topics include 1) a self-learning car-following model that uses a purely data-driven approach and produces better results than traditional models, 2) accurate reproduction of traffic flow profiles and estimation of flow dynamics using data from both stationary and floating sensors, and 3) analysis of both engineering and socioeconomic data to address the service refusal problem in a taxicab market.

As purely data-driven methods are still at an early stage, a systematic investigation of the technical issues and methodological approaches pertinent to comprehensive engineering and socioeconomic analysis of transportation data is timely and meaningful. Although it cannot cover every aspect of this promising area, this dissertation aims to lay a stone in the foundation of the purely data-driven, self-learning approach. It contributes to the state of knowledge by answering the following questions:

• Is it possible to use purely self-learning, data-driven methods to develop traffic flow models with a level of accuracy equal to or better than that of classic theoretical models? What methods are suitable for this approach, and what are the technical issues in applying them? [Chapter 3]

• What are the flaws in existing methods for integrating and analyzing data from multiple sources, and how can these methods be improved? [Chapter 4]

• To make the data-driven approach a complete technical system, it must also be applicable to problems in the socioeconomic area that are fundamental to transportation policy making. How can analysis be extended beyond the engineering arena, and what are suitable methodological approaches for analyzing combined engineering and socioeconomic data? [Chapter 5]

Keywords

Traffic data, Asymmetric driving behavior, Support vector regression, Gaussian process regression, Traffic estimation, Taxicab market, Service refusal
