Crisis social media data labeled for storm-related information and toponym usage
Social media provides citizens and officials with important sources of information during times of crisis. This data article makes available labeled, storm-related social media data collected over a six-hour period during a severe storm and F1 tornado that struck Central Pennsylvania on May 1, 2017. Three datasets were collected from Twitter using location, keyword, and network filtering techniques, respectively. Only 2% of the 22,706 total tweets overlap among the datasets, providing researchers with a broader scope of information than is normally available when collecting tweets during a crisis using location (i.e., geotag-based) and keyword filtering, alone or in combination. Each data collection technique is described in detail, including network filtering, which collects data from networks of social media users associated with a geographic area. The datasets are manually labeled for information content and toponym usage. The 22,706 tweet IDs, dehydrated for privacy, are labeled for relevance (storm-related or off-topic) and for 19 types of storm-related information organized into six categories: infrastructure damage, service disruption, personal experience, weather updates, weather forecasts, and weather warnings. Data are also labeled for toponym usage (with or without toponyms), location (local, remote, and generic toponyms), and granularity (hyperlocal, municipal, and regional toponyms). The comprehensively labeled datasets give researchers opportunities to analyze crisis-related information behaviors and volunteered location information behaviors during a hyperlocal crisis event, and to develop and evaluate automated filtering, geolocation, and event detection techniques that can aid citizens and crisis responders.
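As a rough illustration of how overlap among the three collections might be measured, the sketch below computes the share of unique tweet IDs that appear in more than one collection. The function name `overlap_share`, the toy IDs, and the choice of denominator (unique IDs across all collections) are assumptions made for illustration; the data article may define overlap differently.

```python
from itertools import combinations


def overlap_share(*collections):
    """Return the fraction of unique IDs found in more than one collection.

    Assumption: overlap is measured against the set of unique IDs across
    all collections, which may differ from the article's own definition.
    """
    sets = [set(c) for c in collections]
    all_ids = set().union(*sets)
    shared = set()
    # An ID counts as overlapping if any pair of collections contains it.
    for a, b in combinations(sets, 2):
        shared |= a & b
    return len(shared) / len(all_ids) if all_ids else 0.0


# Toy tweet IDs (hypothetical, not taken from the datasets).
location_ids = ["1", "2", "3"]
keyword_ids = ["3", "4", "5"]
network_ids = ["6", "7", "8", "9", "10"]

share = overlap_share(location_ids, keyword_ids, network_ids)
print(f"{share:.0%} of unique IDs appear in more than one collection")
```

A low share, like the 2% reported for these datasets, indicates that each filtering technique captures mostly distinct tweets.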