Investigating US breast cancer mortality rates using big data





Over the last few decades, advances have been made in the diagnosis and treatment of breast cancer (BC). With access to secondary data, cancer mortality rates can now be examined outside of clinical and laboratory settings. Despite this progress, no study has used a diverse set of secondary data to assess the current state of BC etiology across the US. This study investigates age-adjusted BC mortality rates using environmental, behavioral, demographic, and other determinants. The importance of these variables is assessed against the current literature. More than 20 explanatory variables are examined using principal component analysis followed by regression analysis, and each variable's relationship to US BC mortality rates is determined and explained. The major finding is the first evidence from secondary data sources that environmental contaminants and one’s surrounding living conditions are statistically significant determinants of female BC mortality. The other significant finding is that factors associated with poverty (e.g., low rates of literacy, income, and female Medicaid eligibility) and access to mammogram facilities appear to have a positive relationship with BC mortality rates.



Breast cancer, Big data, Principal component analysis
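
The abstract describes principal component analysis followed by regression on more than 20 explanatory variables. The sketch below illustrates that general two-step workflow only; it is not the authors' code. The data are synthetic placeholders generated at random, and the function name `pca_regression`, the choice of 200 regions, 20 variables, and 3 retained components are all assumptions made for illustration.

```python
import numpy as np


def pca_regression(X, y, n_components):
    """Regress y on the leading principal components of X (hypothetical helper)."""
    # Standardize each explanatory variable so that scale does not drive the PCA.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # PCA via singular value decomposition of the standardized data.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)      # share of variance per component
    loadings = Vt[:n_components].T       # shape (n_variables, n_components)
    scores = Z @ loadings                # component scores per observation

    # Ordinary least squares of the outcome on the scores, with an intercept.
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    fitted = design @ coef
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return {"explained": explained, "loadings": loadings, "coef": coef, "r2": r2}


# Synthetic placeholder data: 200 hypothetical regions, 20 hypothetical variables
# driven by 3 latent factors. None of this is real mortality data.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.5 * rng.normal(size=(200, 20))
y = latent @ np.array([1.0, -0.5, 0.25]) + 0.3 * rng.normal(size=200)

result = pca_regression(X, y, n_components=3)
print("variance share of first 3 components:", result["explained"][:3])
print("regression coefficients (intercept first):", result["coef"])
print("R^2:", result["r2"])
```

Regressing on component scores rather than the raw variables sidesteps collinearity among correlated determinants; interpreting a component then requires inspecting its loadings to see which original variables it combines.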