## Applications of spatial autocorrelation

##### Abstract

Just as time series analysis studies data with respect to their time of occurrence, spatial statistics studies data with respect to their location in space. Although spatial probabilistic analysis dates back to the 18th century (Buffon's needle problem), the first serious attempt to study spatial statistics was made at the beginning of the 20th century (Student, 1907).
This study examines measures of overall spatial autocorrelation or association. The word "autocorrelation" means the correlation of a variable with itself (over time, over space, or both). According to Griffith (1987), the quality and quantity of information contained in spatial data are reflected in spatial autocorrelation.
For example, for a numerical variable of interest, if most pairs of neighbouring localities have values that are both above the average or both below the average, then the spatial autocorrelation measure tends to be "large" (above a certain number). If, on the other hand, most pairs of neighbouring localities have one value above the average and the other below it, then the measure tends to be "small" (below a certain number).
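The contrast described above can be illustrated with a minimal sketch. It assumes Moran's $I$ with binary weights (1 for neighbouring localities, 0 otherwise) on a hypothetical chain of six localities; the function name `morans_i` and the toy values are illustrative and are not taken from the thesis.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights.

    `neighbours` is a list of unordered index pairs (i, j) of
    neighbouring localities (an assumed, illustrative input format).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Each unordered pair contributes twice because w_ij = w_ji = 1.
    cross = 2 * sum(dev[i] * dev[j] for i, j in neighbours)
    s0 = 2 * len(neighbours)  # sum of all weights
    return (n / s0) * cross / sum(d * d for d in dev)


# Six localities in a row; each one neighbours the next.
chain = [(i, i + 1) for i in range(5)]

# Neighbours mostly on the same side of the average.
clustered = morans_i([1, 1, 1, 5, 5, 5], chain)
# Neighbours always on opposite sides of the average.
alternating = morans_i([1, 5, 1, 5, 1, 5], chain)

print(clustered, alternating)
```

On this toy example the clustered arrangement gives a value of about 0.6 and the alternating arrangement about -1.0. Under random arrangement of the values the expected value of $I$ is $-1/(n-1)$, here $-0.2$, which is one natural candidate for the reference number separating "large" from "small".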
The form that the study of spatial statistics takes depends on the kind of data used. When the data are nominal (categorical), join counts can serve as measures of spatial association: we count the pairs of neighbouring localities that are of the same "type" (category of the nominal variable) and the pairs of neighbouring localities that are of different "types." Moran was one of the first authors to study join count statistics, and he calculated the moments of join counts in 1948. Similar studies were carried out by P. V. Krishna Iyer in 1949 and 1950 and by Florence Nightingale David in 1971. Both arrived at similar results, but in different experimental settings.
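As a rough illustration of join counting, the sketch below tallies same-type and different-type joins on a hypothetical 2 x 3 grid with two categories, "B" and "W". It assumes rook adjacency (cells sharing an edge are neighbours); the helper names `rook_neighbours` and `join_counts` are invented for this example.

```python
from collections import Counter


def rook_neighbours(rows, cols):
    """Unordered pairs of cells sharing an edge, with cells numbered row by row."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            k = r * cols + c
            if c + 1 < cols:
                pairs.append((k, k + 1))     # neighbour to the right
            if r + 1 < rows:
                pairs.append((k, k + cols))  # neighbour below
    return pairs


def join_counts(labels, neighbours):
    """Count joins by the (sorted) pair of categories they connect."""
    counts = Counter()
    for i, j in neighbours:
        counts[tuple(sorted((labels[i], labels[j])))] += 1
    return counts


# Grid layout (row by row):  B B W
#                            B W W
labels = ["B", "B", "W", "B", "W", "W"]
counts = join_counts(labels, rook_neighbours(2, 3))

print(counts[("B", "B")], counts[("W", "W")], counts[("B", "W")])
```

Here the seven joins split into two BB joins, two WW joins, and three BW joins. Other neighbourhood definitions (for example, also counting diagonal contacts) would change the list of pairs but not the counting step.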
In the second chapter of the thesis, we review some of the work of Moran on join counts and their moments.
Spatial autocorrelation of numerical data is usually measured using Moran's $I$ coefficient and Geary's ratio $c$ (introduced in 1950 and 1954, respectively). In the third chapter of this thesis, we review some of the probabilistic properties of these coefficients, which show how a variable is correlated with itself over space.
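For reference, the two coefficients are commonly written as follows. The notation is an assumption made here and may differ from that used in the thesis: $x_1, \dots, x_n$ are the observed values with mean $\bar{x}$, $w_{ij}$ is the weight attached to the pair of localities $(i, j)$ (for instance 1 if they are neighbours and 0 otherwise, with $w_{ii} = 0$), and $S_0$ is the sum of all weights.

$$
I = \frac{n}{S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
c = \frac{n-1}{2 S_0}\,
    \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - x_j)^2}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}.
$$

In this form, values of $I$ above $-1/(n-1)$ and values of $c$ below 1 point towards positive spatial autocorrelation, since those are the expected values of the two statistics when the observed values are arranged at random over the localities.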
We use the statistical packages R and SAS to calculate these statistics and apply them to examples with spatial data. In addition, we show the connection between the join count statistics and Moran's $I$ coefficient and Geary's ratio $c$, which is probably one of the new contributions of this thesis.
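One elementary way to see that such a connection exists, which is not necessarily the route taken in the thesis, is to code a two-category variable as 0/1. The sketch below assumes a hypothetical 2 x 3 grid with rook adjacency and checks that the uncentred cross-product sum over neighbouring pairs equals the number of 1-1 joins, while the squared-difference sum (the quantity in the numerator of Geary's $c$, counted once per pair) equals the number of 0-1 joins.

```python
# Grid layout (row by row), with 1 = "black" and 0 = "white":
#   1 1 0
#   1 0 0
labels = [1, 1, 0, 1, 0, 0]

# Rook-adjacent pairs of the 2 x 3 grid, cells numbered row by row
# (an assumed neighbourhood structure for illustration).
pairs = [(0, 1), (0, 3), (1, 2), (1, 4), (2, 5), (3, 4), (4, 5)]

# Join counts obtained by direct counting.
bb = sum(1 for i, j in pairs if labels[i] == 1 and labels[j] == 1)
bw = sum(1 for i, j in pairs if labels[i] != labels[j])

# The same quantities expressed as sums of products and squared differences.
cross_product_sum = sum(labels[i] * labels[j] for i, j in pairs)
squared_diff_sum = sum((labels[i] - labels[j]) ** 2 for i, j in pairs)

print(bb, cross_product_sum)  # both count the 1-1 joins
print(bw, squared_diff_sum)   # both count the 0-1 joins
```

Moran's $I$ uses cross products of deviations from the mean rather than of the raw 0/1 values, so its numerator is a linear combination of join counts rather than a single count.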
Throughout most of the thesis, we show (using modern notation) the randomization properties of some of the above spatial statistics. That is, we review Moran's and Geary's calculations of the probabilistic behaviour of these statistics, conditioning on the observed data (the values of the variable of interest in the different localities) but not on the order in which they appear. In other words, we show how the distribution of some autocorrelation statistics is derived in spatial statistics under the non-free sampling scenario.
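The randomization idea can be sketched by brute force on a small example: keep the observed values fixed, reassign them to the localities in every possible order, and recompute the statistic each time. The code below is an illustrative sketch only. It assumes Moran's $I$ with binary weights on a hypothetical chain of six localities, and the data values are made up.

```python
from itertools import permutations


def morans_i(values, neighbours):
    """Moran's I with binary weights; `neighbours` holds unordered pairs (i, j)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 2 * sum(dev[i] * dev[j] for i, j in neighbours)  # w_ij = w_ji = 1
    s0 = 2 * len(neighbours)
    return (n / s0) * cross / sum(d * d for d in dev)


chain = [(i, i + 1) for i in range(5)]  # six localities in a row
observed_values = [1, 2, 3, 7, 8, 9]    # hypothetical data

observed_i = morans_i(observed_values, chain)

# Randomization distribution: the same six values in all 6! = 720 orders.
distribution = [morans_i(p, chain) for p in permutations(observed_values)]

mean_i = sum(distribution) / len(distribution)
# Share of arrangements with a statistic at least as large as the observed one.
p_upper = sum(1 for v in distribution if v >= observed_i) / len(distribution)

print(observed_i, mean_i, p_upper)
```

The average of the 720 values agrees, up to rounding error, with the theoretical randomization mean $-1/(n-1) = -0.2$. Full enumeration is only feasible for very small $n$; for larger data sets one would either use the analytical moments or sample permutations at random.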