Eye Spy A PSU! Automating Sampling Frame Construction from Aerial Images Using Machine Learning
Abstract
The availability of sampling frames is critical for the use of probability-based sampling methods in social science research. An extensive literature addresses how sampling frames can be constructed when the target population consists of people. Less understood is how sampling frames should be constructed when the population being studied consists of places, objects, or locations (POLs). In this paper, we propose an approach that employs machine learning to automate the construction of sampling frames of POLs from aerial (or satellite) images when the POLs of interest have distinctive visual characteristics (e.g., windmills, playgrounds, religious centers). Automating this process with machine learning reduces the time and monetary costs of having researchers manually review potentially several thousand aerial images to identify sampling units. We evaluate our approach using a case study that constructs sampling frames of windmills as POLs within the state of Iowa. We train convolutional neural networks to identify windmills within aerial images from the U.S. Department of Agriculture’s National Agriculture Imagery Program and find that our approach correctly identified about 80% of the windmills in the area of interest (1,521 out of 1,913 windmills across ten counties in Iowa) and ruled out 99% of the more than 300,000 locations lacking the POL of interest. Thus, we achieved good coverage in the resulting sampling frames and suggest that any over-coverage could be removed with manual review of only a small number of images, rather than all of them, representing an approximately 98% reduction in manual effort relative to review without machine learning.
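The reported figures can be related to one another with a short back-of-the-envelope calculation. The sketch below is illustrative only and is not the authors' computation: it assumes exactly 300,000 locations without a windmill (the abstract says only "over 300,000"), treats the 99% figure as an exact rate, and assumes that manual review is needed only for images the model flags. The function name `coverage_metrics` and its parameters are hypothetical.

```python
def coverage_metrics(true_pos, total_pos, total_neg, specificity):
    """Relate detection counts to frame coverage and manual review effort.

    All inputs are illustrative; total_neg and specificity are assumed
    round values, not exact figures from the study.
    """
    # Share of actual POLs that the model found (under-coverage is 1 - recall).
    recall = true_pos / total_pos
    # Locations without a POL that the model wrongly flagged (over-coverage).
    false_pos = round(total_neg * (1 - specificity))
    # Images a reviewer would still inspect: everything the model flagged.
    flagged = true_pos + false_pos
    # Images a reviewer would inspect without any model: all of them.
    total = total_pos + total_neg
    # Fraction of manual review avoided by inspecting only flagged images.
    reduction = 1 - flagged / total
    return recall, false_pos, flagged, reduction


recall, false_pos, flagged, reduction = coverage_metrics(
    true_pos=1521, total_pos=1913, total_neg=300_000, specificity=0.99
)
print(f"recall: {recall:.1%}")
print(f"false positives to review: {false_pos}")
print(f"images flagged for review: {flagged}")
print(f"reduction in manual review: {reduction:.1%}")
```

Under these assumptions the reduction in review effort comes out between 98% and 99%, which is in line with the approximate figure stated above.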

This work is licensed under a Creative Commons Attribution 4.0 International License.