11 Introduction to Satellite Imagery in Social Science Research
Remote sensing satellites have become an important source of data for social scientists and public policy analysts. In the past, studies of human societies relied heavily on ground surveys, government statistics, and qualitative observations. Today, a wealth of satellite imagery provides new perspectives on socio-economic and environmental phenomena from local to global scales. For example, satellites can detect nighttime lights as a proxy for economic activity, revealing patterns of urbanization, electrification, and even population distribution in regions where traditional data are sparse. Earth observation data have been used to examine urban growth, agricultural change, deforestation, conflict impacts, and other issues central to social science and policy domains. The steadily improving quality, frequency, and accessibility of satellite data since the 1970s have opened up new opportunities to integrate remote sensing with social research. Importantly, studies have found that remote sensing insights are most powerful when combined with on-the-ground data such as surveys, censuses, interviews, and administrative records. This chapter provides a comprehensive overview of how to structure and analyze satellite imagery and derived products (including time-series data and even emerging satellite video) for applications in the social sciences and public policy.
We begin with an introduction to different types of satellite data – optical multispectral and hyperspectral images, thermal infrared sensors, and synthetic aperture radar – explaining what information each provides and how they differ. We then cover key image preprocessing steps necessary to make satellite data analysis-ready: georeferencing to align with geographic coordinates, radiometric normalization and calibration to ensure consistency, noise reduction (such as filtering sensor or speckle noise), and cloud masking to deal with obstructions in optical imagery. Next, we discuss practical aspects of data access via modern platforms and libraries. Powerful cloud-based services like Google Earth Engine (GEE) and Sentinel Hub offer application programming interfaces (APIs) and packages in both R and Python to acquire and process imagery (e.g. through R packages like rgee or Python libraries like geemap for GEE, and the sentinelhub Python API). For working with images locally, we introduce popular tools in R (the classic raster package and its faster successor terra) and in Python (e.g. rasterio), which enable reading, manipulating, and analyzing raster data.
With data in hand, we explore image structuring techniques. These include pixel-level classification methods to derive land cover or land use maps from raw pixels, object-based approaches to identify and classify features (such as buildings or roads) in imagery, and advanced computer vision methods for object detection. We also address time-series analysis and change detection, which leverages sequences of images over time to monitor trends (like urban expansion or forest loss) and sudden changes (like disaster impacts or conflict damage). Throughout, we provide illustrative case studies linking remote sensing analysis to social science questions: for example, using satellite data to quantify deforestation rates on indigenous lands (informing land rights discussions), mapping urban growth and infrastructure in rapidly developing cities, detecting population displacement and destruction in conflict zones, and estimating poverty and wealth indices from space. These cases demonstrate how satellite-derived indicators can be integrated with other spatial or socio-economic datasets – such as combining a land cover map or vegetation index with household survey data or using satellite-derived nighttime light intensity alongside census records – to yield richer insights for policy. Finally, we discuss the growing role of deep learning and computer vision in remote sensing. Convolutional neural networks (CNNs) have revolutionized image analysis by automatically learning complex features, enabling high-accuracy classification and detection in satellite and aerial imagery. We examine how researchers are applying CNNs (using frameworks like TensorFlow/Keras or PyTorch, including via R’s keras or torch packages and the reticulate interface) to tasks such as mapping poverty or detecting buildings, and how one can integrate such models into a research workflow.
Throughout the chapter, code snippets in R and Python are provided to illustrate key processes. These examples demonstrate common tasks like reading imagery, computing indices (e.g., NDVI for vegetation), performing simple classifications, and querying data catalogs. The goal is to ensure readers – particularly those from social science backgrounds with basic statistical training – can follow the technical steps and appreciate how satellite image analysis is carried out in practice. With a blend of conceptual explanation, practical instruction, and real-world examples, this chapter equips social scientists to confidently incorporate satellite imagery into their research toolkit, bridging the gap between remotely sensed data and pressing social and policy questions.
11.1 Types of Satellite Data
Satellite sensors capture information in various parts of the electromagnetic spectrum, and understanding the types of imagery available is a crucial first step. Different sensors have different spectral, spatial, temporal, and radiometric resolutions, which determine the kind of information we get from them. Here we focus on four broad categories of satellite data commonly used in socio-environmental applications: multispectral imagery, hyperspectral imagery, thermal infrared imagery, and synthetic aperture radar (SAR) imagery. Each provides unique insights into ground conditions.
Multispectral Imagery (Optical)
Most traditional Earth observation satellites (like the USGS/NASA Landsat series or ESA’s Sentinel-2) are multispectral, meaning they capture reflected sunlight in a handful of broad wavelength bands. These typically include the visible light bands – red, green, and blue – that can be combined into natural-color images, as well as near-infrared and shortwave infrared bands beyond human vision. Multispectral sensors generally record on the order of 3 to 15 separate bands. For example, Landsat 8 collects data in 11 bands across its Operational Land Imager and Thermal Infrared Sensor (including visible, near-infrared, two shortwave infrared, a panchromatic band, a cirrus band, and two thermal bands). Each band represents a range of wavelengths (e.g. Landsat’s Band 4 covers red light ~0.64–0.67 µm, Band 5 covers near-infrared ~0.85–0.88 µm, etc.), and the imagery typically has a moderate spatial resolution (10–30 meters for many multispectral satellites).
Multispectral imagery is the workhorse of remote sensing and is widely used for land cover classification, vegetation monitoring, water and urban mapping, and more. The combination of visible and infrared bands allows us to calculate indices like the Normalized Difference Vegetation Index (NDVI) to quantify vegetation health, or false-color composites that highlight features like healthy vegetation (which strongly reflects near-IR) or water (which absorbs IR). The spectral resolution (number and width of bands) of multispectral sensors is coarser than hyperspectral (described next), but their spatial resolution is often higher, and data volumes are smaller, making them very practical for large-area and time-series analysis. For instance, Sentinel-2 provides 10 m resolution in visible and near-IR bands, suitable for mapping fields or neighborhoods, whereas hyperspectral satellites might sacrifice spatial detail for spectral richness.
In multispectral imagery, each band can be thought of as a layer of a raster image. To interpret the data, we often combine three bands into an RGB display. A natural color image would assign the satellite’s red band to red, green to green, and blue to blue in the display. By contrast, an infrared false-color image might display near-infrared as red, red as green, and green as blue, which makes vegetation (high NIR reflectance) appear red. This can be useful for quickly assessing plant biomass or differentiating crop types. Social scientists have leveraged multispectral images for tasks such as distinguishing urban versus rural land, mapping water bodies, or identifying agricultural areas. Because multispectral data is so common, throughout this chapter many techniques (cloud masking, classification, etc.) will be illustrated with multispectral examples.
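To make the band arithmetic concrete, here is a minimal Python sketch of building an infrared false-color composite for display. The file name and the band ordering (blue, green, red, NIR) are assumptions; adjust them to your own data.

import numpy as np
import rasterio
import matplotlib.pyplot as plt

with rasterio.open("scene.tif") as src:      # hypothetical 4-band image (B, G, R, NIR)
    green = src.read(2).astype("float32")
    red = src.read(3).astype("float32")
    nir = src.read(4).astype("float32")

def stretch(band):
    # simple 2nd-98th percentile contrast stretch to the 0-1 range for display
    lo, hi = np.nanpercentile(band, (2, 98))
    return np.clip((band - lo) / (hi - lo), 0, 1)

# NIR shown as red, red as green, green as blue: healthy vegetation appears bright red
false_color = np.dstack([stretch(nir), stretch(red), stretch(green)])
plt.imshow(false_color)
plt.axis("off")
plt.show()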
Hyperspectral Imagery
Hyperspectral satellites capture a much more detailed spectral signature of the Earth by measuring reflectance in hundreds of narrow, contiguous bands (often 10–20 nm wide or narrower). Instead of lumping all green light into one broad band, for instance, a hyperspectral sensor might have dozens of bands within the green wavelength range. This ultra-fine spectral resolution enables the detection of subtle differences in material properties. For example, different minerals or plant species that look the same in a few broad bands can often be distinguished by their hyperspectral reflectance curves. Early hyperspectral satellite missions like NASA’s Hyperion (launched in 2000 on EO-1) had ~220 bands at 30 m resolution. Newer instruments (e.g. Italy’s PRISMA or Germany’s EnMAP) continue to advance hyperspectral imaging.
The main difference between multispectral and hyperspectral is thus the number and width of bands. Multispectral = fewer (e.g. 3–10) broad bands; hyperspectral = tens to hundreds of narrow bands. Hyperspectral data provides a quasi-continuous spectrum for each pixel, which can be thought of as having a detailed reflectance curve – almost like having a reflectance spectrum from a lab for each point on the ground. This rich data opens up possibilities for spectral unmixing (teasing apart sub-pixel components), identifying specific materials (like distinguishing tree species or detecting pollutants), and other advanced analyses not possible with coarser spectral data.
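To illustrate the idea behind spectral unmixing, here is a toy Python sketch that solves for sub-pixel mixing fractions with ordinary least squares. The endmember spectra and pixel values are invented for illustration; operational unmixing would use library or image-derived spectra and constrained solvers.

import numpy as np

# Columns: illustrative reflectance spectra of vegetation, soil, water across 5 bands
endmembers = np.array([
    [0.03, 0.10, 0.02],
    [0.05, 0.15, 0.02],
    [0.04, 0.20, 0.01],
    [0.45, 0.25, 0.01],
    [0.25, 0.30, 0.01],
])
pixel = np.array([0.20, 0.22, 0.20, 0.28, 0.24])   # observed mixed spectrum

# Unconstrained least squares for the mixing fractions (real unmixing usually
# enforces fractions >= 0 and summing to 1)
fractions, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)
print(dict(zip(["vegetation", "soil", "water"], fractions.round(2))))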
However, hyperspectral images produce big data: hundreds of bands mean large file sizes and more complex processing. Also, hyperspectral sensors often have coarser spatial resolution and narrower swath widths (covering smaller areas per image) due to technical tradeoffs. For these reasons, hyperspectral satellite use in social applications has been somewhat limited to date, but it is growing as more data become available and computing power increases. Hyperspectral data has shown promise for tasks like detailed urban material mapping (e.g., distinguishing roofing materials), agricultural monitoring (identifying crop stress or crop types), and environmental hazard detection (like locating chemical spills or differentiating soil types). As hyperspectral imagery becomes more readily accessible (e.g., through government missions or commercial providers), social scientists may increasingly integrate it, for instance, to refine land use classifications or to detect environmental conditions that impact human populations (such as water quality or soil contamination) that broad bands might miss.
To summarize, multispectral imagery offers a general-purpose, high-level view with manageable data sizes and is excellent for broad land cover mapping and tracking change over time. Hyperspectral imagery provides depth and detail, capturing the “spectral fingerprints” of surfaces, which can greatly enhance analysis where subtle spectral differences matter – albeit with greater processing complexity. In practice, many projects may start with multispectral data, and use hyperspectral for specific areas or questions requiring finer spectral discrimination.
Thermal Infrared Imagery
Not all satellite imaging is about reflected sunlight. Thermal infrared (TIR) sensors detect the heat emitted by objects on Earth in the form of longwave infrared radiation (approximately 8–14 µm wavelength). In effect, thermal sensors measure surface temperature or thermal properties of surfaces. Thermal imagery is different from visible/NIR imagery because it captures emitted radiation (from the Earth’s warmth) rather than reflected solar radiation. This makes thermal data extremely useful for studying phenomena related to heat: urban heat islands, wildfires, industrial activity, drought stress, and more.
Many multispectral satellites include one or two thermal bands (for instance, Landsat 8 has thermal bands 10 and 11 at 100 m resolution). These allow computation of land surface temperature after appropriate calibration and atmospheric correction. There are also dedicated thermal missions (like NASA’s ECOSTRESS on the space station, or the planned Landsat Next adding more thermal capabilities). Thermal data typically has coarser spatial resolution than visible bands because longer-wavelength sensors are harder to design at high resolution, but they provide critical environmental indicators.
In urban and social science contexts, thermal imagery is often used to map Urban Heat Islands (UHI) – areas in cities that become significantly hotter than their rural surroundings due to built environment and lack of vegetation. For example, NASA’s MODIS (1 km resolution thermal) and Landsat (30–100 m thermal) have been used globally to monitor urban heat patterns. Such data, combined with socio-demographic data, help identify vulnerable neighborhoods and inform heat mitigation strategies. Thermal anomalies from satellite data can also indicate fires (agricultural burning, forest fires, even warfare-related fires), power plant and factory activity (hot effluent water or thermal plumes), or geothermal activity. In agriculture, thermal infrared is used to estimate evapotranspiration and plant water stress, which relate to drought impacts on communities.
One advantage of thermal imagery is that it can be acquired day or night (the Earth emits heat all the time). By day, solar heating dominates; at night, you see how quickly areas cool, which relates to materials and can highlight urban areas retaining heat. For example, satellite thermal mapping has shown the extent and persistence of heat islands in major cities, which correlates with lack of green space and certain land uses. Such analysis often enters public policy by informing urban planning (planting trees, cool roof initiatives) and public health interventions during heat waves.
Example – Urban Heat: Using Landsat thermal data, one can create a land surface temperature map of a city. This might show, for instance, that dense downtown areas are 5°C hotter than a suburban park at noon. Researchers then overlay population data to find that lower-income districts also have higher temperatures, possibly due to fewer trees. This integration (discussed more later) illustrates how thermal remote sensing feeds into social vulnerability assessments. As NASA’s Applied Sciences training notes, “Remote sensing provides global, timely, objective observations to monitor the effects of urban heat islands over time. Thermal mapping from satellites can be used to monitor land surface temperature (LST), while optical data can inform where land cover has changed. Once UHIs have been mapped, incorporating socioeconomic data (population, demographics, health) into heat vulnerability indices can guide interventions to manage heat risks.”.
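As a rough illustration of the calibration step involved, the following Python sketch converts Landsat 8 Band 10 digital numbers to at-sensor brightness temperature. The rescaling and K1/K2 constants must be taken from the scene’s MTL metadata file (the values below are typical examples), and true land surface temperature additionally requires emissivity and atmospheric correction.

import numpy as np
import rasterio

ML, AL = 3.342e-4, 0.1          # radiance rescaling gain/offset (from the MTL file)
K1, K2 = 774.8853, 1321.0789    # Band 10 thermal conversion constants (from the MTL file)

with rasterio.open("LC08_B10.tif") as src:    # hypothetical Band 10 file
    dn = src.read(1).astype("float64")

radiance = ML * dn + AL                        # top-of-atmosphere spectral radiance
bt_kelvin = K2 / np.log(K1 / radiance + 1.0)   # brightness temperature in Kelvin
bt_celsius = bt_kelvin - 273.15
# Many platforms (e.g., Google Earth Engine) provide ready-made LST products that
# handle emissivity and atmospheric effects for you.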
In summary, thermal infrared imagery adds a crucial dimension to satellite data – the energetic or temperature dimension – which is highly relevant for environmental aspects of social science (climate adaptation, energy use, disaster response). It often complements the reflective optical data; for instance, a city may look similar to a forest in a true-color image (both appear green if trees are present), but in thermal the city will usually appear much warmer than the forest. Social scientists using thermal data should be aware of calibration (converting sensor data to actual temperature) and the influence of factors like surface emissivity and atmospheric humidity on the readings. Many platforms (like Google Earth Engine) now provide ready-to-use land surface temperature products derived from thermal bands, simplifying the use of these data for non-specialists.
Synthetic Aperture Radar (SAR)
A very different type of satellite imagery comes not from passive observation of sunlight or heat, but from active microwave radar signals. Synthetic Aperture Radar (SAR) satellites emit pulses of radio waves toward the Earth and measure the reflections (backscatter) that return. By processing the returning signal, they produce images of the Earth’s surface. SAR operates in the microwave portion of the spectrum (e.g., wavelengths of a few centimeters), and importantly, it does not rely on sunlight or clear skies. SAR can see through clouds, smoke, and at night just as well as in daylight. This all-weather, 24-hour capability makes SAR extremely valuable for consistent monitoring – for example, in tropical regions with frequent cloud cover, or for detecting flooding during a storm when optical satellites see only clouds.
In a SAR image, pixel brightness corresponds to the strength of radar backscatter. Roughly speaking, metal objects, rough surfaces, and urban areas tend to appear bright (strong reflections), whereas smooth surfaces like calm water appear dark (the radar reflects away from the sensor). One often-cited example: a smooth lake will be black in a SAR image, while in an optical image it might reflect the sky and appear blue. Buildings and infrastructure cause bright spots and speckled textures in SAR. SAR signals are also sensitive to structure – for instance, double-bounce reflections (from right-angle structures like building walls and the ground) enhance urban signatures. Because SAR uses a side-looking geometry and long wavelength, the images have a characteristic look with geometric distortions (layover, foreshortening) that differ from optical perspective.
Modern SAR satellites include the European Space Agency’s Sentinel-1 (C-band SAR, ~5.6 cm wavelength, 10 m resolution) and various commercial systems at even higher resolutions. SAR data is typically polarized (sent and received either horizontally or vertically polarized waves), yielding channels like HH, HV, VV, which can provide additional information about surface properties (e.g., vegetation vs. bare soil). For example, Sentinel-1 collects dual polarization (VV and VH) which users can analyze for land cover classification (VV might respond strongly to surfaces like water or smooth ground when calm, whereas VH picks up volume scattering from vegetation).
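A small Python sketch of working with dual-polarization backscatter is shown below. The file name and band order are assumptions, and the input is assumed to be calibrated sigma-naught values in linear units.

import numpy as np
import rasterio

with rasterio.open("s1_vv_vh.tif") as src:   # hypothetical two-band file: band 1 = VV, band 2 = VH
    vv = src.read(1).astype("float32")
    vh = src.read(2).astype("float32")

eps = 1e-6                               # avoid taking the log of zero
vv_db = 10 * np.log10(vv + eps)          # backscatter in decibels
vh_db = 10 * np.log10(vh + eps)
ratio_db = vh_db - vv_db                 # VH/VV ratio in dB; higher values often indicate
                                         # volume scattering from vegetation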
From a social science perspective, SAR has been used in applications such as flood mapping (radar can see flooded areas under cloud cover – crucial during hurricanes or monsoons for disaster response), deforestation monitoring (radar penetrates the canopy to some extent and can detect forest clearing even when clouds persist; also changes in backscatter indicate forest to bare ground conversion), urban change detection (detecting new building construction or expansion by the change in reflective patterns), and security/conflict monitoring (SAR can detect disturbances, vehicle movement tracks, or even ground deformation from explosions). Because of its ability to detect changes in surface texture and moisture, SAR has proven very complementary to optical data. A common approach is to combine SAR and optical imagery to improve classification accuracy – e.g., using Sentinel-1 (SAR) plus Sentinel-2 (optical) to map flooded vs. non-flooded urban areas, or combining SAR’s sensitivity to structure with optical spectral information for detailed land use mapping.
It’s worth noting that SAR images look unfamiliar to the untrained eye. They are typically displayed in grayscale or sometimes as false-color composites of different polarizations or dates. A beginner might notice a speckled “salt and pepper” noise in SAR images – this is inherent speckle noise from the coherent nature of radar (constructive and destructive interference of the signal). Speckle can be reduced by filtering techniques (e.g., Lee, Frost, Gamma-MAP filters) at the cost of some resolution. Despite the noise, SAR carries rich info: bright spots could indicate metal roofs or vehicles, linear bright features might be railways or fences, random bright/dark scatter indicates certain crop types, etc. Interpreting SAR often requires some learning, but quantitative analysis using backscatter values or indices is effective.
In technical terms, synthetic aperture radar refers to the method of achieving high resolution by moving the antenna (on the satellite) and synthesizing a long antenna by signal processing. As the SkyWatch platform succinctly explains: “A SAR image is created by sending successive radio waves to illuminate a target scene and receiving and recording the echoes of each pulse. By moving the sensor (on a satellite) and combining the returned signals, SAR simulates having a very large antenna, thus achieving high resolution in the along-track direction.” The result: detailed radar images of Earth’s surface structure.
SAR’s independence from sunlight means it is often used for continuous monitoring. For example, Brazil’s Amazon deforestation enforcement programs use frequent radar acquisitions to catch illegal clearing even in cloudy seasons. During the 2022 conflict in Ukraine, SAR images from Sentinel-1 were used to identify flooding (from deliberate dam breaches) and to monitor activity when optical imaging was hampered by weather or darkness. In a development context, SAR has been used to measure rice paddy extent and even growth stages (radar backscatter changes with plant structure and water inundation). These examples show why SAR is a powerful addition to the social science data arsenal, especially when reliable, frequent observation is needed.
In summary, SAR imagery provides an active sensing perspective, excels at structural and moisture-related detection, penetrates clouds, and offers day/night coverage. It complements optical imagery (multispectral/hyperspectral) which provides rich spectral detail but only under clear daylight conditions. Together, these different data types can give a comprehensive view: optical for what things are (material, color), thermal for how hot they are, and SAR for shape/structure and moisture. The next sections will discuss how we prepare these data for analysis and then extract meaningful information for social science questions.
11.2 Image Preprocessing: Making Satellite Data Analysis-Ready
Raw satellite images are not immediately ready for analysis – they require a series of preprocessing steps to correct for sensor distortions, align with geographic coordinates, filter out noise, and focus on valid data (e.g., clear pixels vs. clouds). Preprocessing is a crucial part of any remote sensing workflow, as it ensures that downstream analyses (like computing indices, detecting changes, or training models) are using consistent and reliable input. In this section, we cover four key preprocessing steps:
- Georeferencing (Geocoding and Orthorectification) – Ensuring each image pixel is correctly located on the Earth’s surface, with map coordinates.
- Radiometric Calibration and Normalization – Converting raw sensor data (DN values) to meaningful physical units (like reflectance or temperature) and adjusting images so they can be compared consistently (for example, across time or between sensors).
- Noise Reduction – Removing or mitigating sensor-induced noise and artifacts, including speckle in SAR or striping in optical, to improve image quality.
- Cloud and Shadow Masking – Identifying and masking out pixels obscured by clouds (or their shadows) in optical imagery, since they hide the surface of interest.
Performing these steps creates what is often called an analysis-ready dataset (ARD). Many modern satellite products (like USGS Landsat Level-2 or Sentinel-2 Level-2A) are provided in pre-corrected form (orthorectified, surface reflectance calibrated, with quality masks for clouds), which simplifies preprocessing for users. However, it’s important to understand these processes, especially if working with imagery that is not already corrected or if integrating data from multiple sources.
Georeferencing and Orthorectification
Georeferencing means tying an image to real-world map coordinates. A georeferenced image has been positioned and scaled such that each pixel corresponds to a specific latitude-longitude (or other coordinate system) location on Earth. In practical terms, this involves assigning map projection information and possibly warping the image so that it aligns with known ground control points or a digital elevation model (for terrain correction). Without georeferencing, an image is just a picture with no indication of where on Earth it belongs; with georeferencing, it can be layered in a Geographic Information System (GIS) with other spatial data.
Most satellite imagery from major providers comes already georeferenced. For example, Landsat Level-1 products have been precision-corrected using ground control and a terrain model, so the images are typically accurate to within tens of meters of their true location. The metadata will specify a map projection (like UTM zone, datum WGS84) and each pixel’s coordinates can be calculated. Orthorectification is a related process that corrects for sensor viewing angle and terrain variation, ensuring that the final image has a uniform scale and aligns with maps. Orthorectification removes effects like parallax (where a mountain might appear displaced in an un-rectified image). It uses a digital elevation model (DEM) to adjust the image so that it represents a view as if taken from directly overhead every point.
From a user perspective, if you obtain data from platforms like Google Earth Engine or Sentinel Hub, the imagery is typically pre-georeferenced and orthorectified. You can load it into R or Python and it will already have coordinate reference system (CRS) info. If working with raw imagery (say, from a new sensor or historical archive), you might need to georeference it. This can involve identifying a few known locations on the image (ground control points) and applying a transformation.
The U.S. Geological Survey (USGS) defines georeferencing as: “the internal coordinate system of a digital map or aerial photo is related to a ground system of geographic coordinates. A georeferenced image has been tied to a known Earth coordinate system, so you can determine where every point on the image is located on Earth.”. Essentially, georeferencing answers the question “where is this pixel on Earth?” by assigning real coordinates.
In code, georeferencing is usually handled under the hood. For instance, using the rasterio library in Python to open a GeoTIFF image:
import rasterio
dataset = rasterio.open("sat_image.tif")
print(dataset.crs)
print(dataset.transform)
This might output a CRS like EPSG:32633 (UTM zone 33N) and an affine transform giving the top-left corner coords and pixel size, etc. Those define how pixel (row, col) translates to (x, y) coordinates. In R’s terra package, a SpatRaster object similarly carries its georeference:
library(terra)
img <- rast("sat_image.tif")
crs(img) # coordinate reference system
ext(img) # spatial extent (min/max coords)
If an image isn’t georeferenced (e.g., a generic JPG from a website), you could use tools like QGIS or programmatic georeferencing functions to assign coordinates, but for our purposes we assume most scientific data is already referenced.
Orthorectification specifics: This matters especially in mountainous areas or if using high-resolution images. Without orthorectification, features at different elevations might not line up correctly with maps (for instance, buildings might lean or shift). Orthorectified products incorporate terrain correction. For example, ESA provides Sentinel-2 Level-2A which is orthorectified using a DEM. If you ever use SAR images, you’ll often apply an orthorectification (terrain correction) step after processing, to remove foreshortening and layover as much as possible when creating a map-ready image.
In sum, when you load analysis-ready imagery in R/Python, check the CRS and resolution – that tells you if the data is properly georeferenced. Misalignment issues (e.g., roads not overlaying correctly on a basemap) indicate something is off with georeferencing (maybe a wrong projection or no orthorectification). With good practice, you ensure all imagery is georeferenced to the same projection before performing multi-source analysis.
Radiometric Calibration and Normalization
Satellites detect radiation and record it as digital numbers (DN) – essentially counts from the sensor. Radiometric calibration is the process of converting those raw counts into physical units, like radiance (W/m²/sr/µm) or reflectance (unitless fraction of sunlight reflected). Calibration accounts for sensor sensitivity, calibration coefficients, solar irradiance, and atmospheric effects. Most users don’t need to manually calibrate if they use higher-level products: for example, Landsat Level-2 gives surface reflectance images where each pixel’s value (after scaling) represents reflectance (0 to 1, scaled by some factor) in that band. Similarly, MODIS provides temperature in Kelvin for its thermal bands after processing.
Radiometric normalization refers to making imagery from different times or sensors comparable by adjusting for differences in lighting, sensor gains, or atmospheric conditions. For instance, two images of the same location on different dates might have slight brightness differences due to sun angle or haze; normalization techniques can adjust one image to match the other’s histogram or reflectance scale. This is particularly important in change detection, so that changes in pixel values reflect real surface change, not differences in imaging conditions. A simple example is adjusting an image so that known invariant features (like deep water or bare sand) have the same reflectance in both images.
When dealing with multi-temporal data, one often uses an atmospheric correction to get surface reflectance, then perhaps a relative normalization. There are automated tools (e.g., the Sen2Cor processor for Sentinel-2, or USGS’s L8SR for Landsat) which output images where each band’s values are essentially reflectance (%). If starting with top-of-atmosphere (TOA) reflectance or radiance, one might have to either apply a correction or at least be cautious in analysis (e.g., NDVI using TOA is usually fine for many purposes, but subtle indices might require surface reflectance).
In R, the raster/terra packages don’t automatically calibrate; you rely on data that is already calibrated (GeoTIFFs often have scaling factors). In Python, rasterio will give you the values stored – if the metadata says scale=0.0001 and offset=0, you apply that to get reflectance. Some libraries like satpy or snappy (for Sentinel) can perform calibration.
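As a small illustration, the sketch below reads the per-band scale and offset that rasterio exposes and applies them to recover reflectance. The file name is hypothetical, and some products document their scaling only in the user guide rather than in the file tags, so always check the product documentation.

import rasterio

with rasterio.open("sr_image.tif") as src:   # hypothetical surface reflectance file
    band1 = src.read(1).astype("float64")
    scale = src.scales[0]                    # defaults to 1.0 if no scaling is recorded
    offset = src.offsets[0]                  # defaults to 0.0
    reflectance = band1 * scale + offset

print(reflectance.min(), reflectance.max())  # should fall roughly within 0-1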
Normalization examples: If you’re doing a mosaic of several images, you might see seams due to slight radiometric differences. You can normalize by adjusting brightness and contrast of one image to blend with another. There are more sophisticated methods like histogram matching or using regression based on overlapping areas to match one image to another. In change detection, relative radiometric normalization picks one image as reference and adjusts others accordingly.
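A minimal sketch of relative normalization by regression on pseudo-invariant pixels is shown below, using synthetic arrays in place of real co-registered images.

import numpy as np

# Synthetic stand-ins for two co-registered images and a mask of
# pseudo-invariant pixels (in practice: deep water, bare rock, stable rooftops)
rng = np.random.default_rng(0)
reference = rng.uniform(0, 0.5, size=(100, 100))
target = 0.9 * reference + 0.02 + rng.normal(0, 0.01, size=(100, 100))
invariant_mask = rng.random((100, 100)) < 0.05   # ~5% of pixels assumed invariant

# Fit a linear gain/bias on the invariant pixels, then apply it to the whole target image
gain, bias = np.polyfit(target[invariant_mask], reference[invariant_mask], deg=1)
target_normalized = gain * target + bias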
If you have images with different sensors, normalization helps too. For example, Landsat and Sentinel-2 both provide reflectance, but their spectral band definitions differ slightly. You might normalize one to the other through linear transformations or band math so that a given land cover has similar values in both.
In deep learning or big data contexts, normalization also means scaling inputs for model training (e.g., mean-centering and variance scaling). A remote sensing specific challenge is that pixel distributions are often skewed (reflectance is bounded 0–1 and typically a lot of low values with some high). But this is a different aspect – still, one must often normalize band values for algorithms to converge (common to make each band mean 0, std 1 for the training dataset, or use min-max scaling).
For practical purposes, as a social scientist you should ensure:
- Use Surface Reflectance when possible: This removes atmospheric distortions (haze, etc.) and makes multi-date comparisons more robust. If using GEE or NASA products, look for SR (surface reflectance) in product names.
- Consistent Units: If combining data from multiple sensors (say MODIS NDVI with Landsat NDVI), be aware of differences. You might normalize one’s range to another if needed.
- Clouds and shadows (discussed next) also affect radiometry – obviously you can’t calibrate a cloud to surface reflectance; you must mask it out.
In summary, radiometric preprocessing converts raw imagery to physically meaningful measurements and ensures consistency across data sets. As one remote sensing guide puts it: “Radiometric normalization makes it possible to work on images where thresholds used for separating features are relevant and robust” – meaning your classification or change detection won’t be thrown off by one image being slightly darker or lighter due to external factors.
Noise Reduction and Image Enhancement
Satellite images often contain various kinds of noise or artifacts. Removing or reducing these is important to avoid false signals in analysis. The types of noise depend on the sensor:
- For optical images, common issues include striping or banding (from detector calibration differences, seen in older Landsat), random sensor noise (individual pixel anomalies), or blurriness (due to platform motion or atmosphere). Techniques like destriping algorithms or simple filters (mean, median) can help. Modern sensors are well-calibrated, so striping is rare (Landsat 7’s scan line corrector error is an exception, but that’s data gaps rather than noise).
- For radar (SAR) images, speckle noise is a major issue. SAR images have a grainy appearance because each pixel’s backscatter is an aggregate of many tiny scatterers, producing interference. Speckle does not reflect actual texture of the surface, so it can hinder object recognition or classification.
Speckle filtering is often applied to SAR data. There are several filter algorithms: Lee filter, Frost filter, Gamma-MAP filter, median filter, etc., each balancing noise reduction with edge preservation. For instance, the Lee filter uses a moving window and a statistical model to dampen speckle in relatively uniform areas while preserving edges. The Gamma MAP (Maximum A Posteriori) filter uses a Bayesian approach considering the distribution of radar backscatter. A median filter (taking the median of a window of pixels) is a simpler approach that can eliminate pepper-and-salt noise while keeping edges sharper than a mean filter would. One must choose the window size and filter type based on the application: a larger window removes more speckle but also blurs fine details.
In R, packages like terra or raster don’t have built-in SAR speckle filters, but one can apply focal filters. There are also specialized packages (e.g., RStoolbox has some filters; or one can call Orfeo Toolbox algorithms). In Python, one could use scipy or opencv for filtering arrays, or the snappy (ESA Sentinel Application Platform API) for SAR-specific filters.
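For illustration, here is a simplified Lee-style filter written with SciPy local statistics. It is a sketch of the idea, not a replacement for the implementations in ESA SNAP or Orfeo Toolbox.

import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img, size=7, noise_var=None):
    # Estimate local mean and variance in a moving window, then shrink each pixel
    # toward the local mean in proportion to how noisy the neighborhood looks.
    local_mean = uniform_filter(img, size)
    local_sq_mean = uniform_filter(img ** 2, size)
    local_var = local_sq_mean - local_mean ** 2
    if noise_var is None:
        noise_var = np.mean(local_var)        # crude estimate of the speckle variance
    weight = local_var / (local_var + noise_var)
    return local_mean + weight * (img - local_mean)

# Example on synthetic multiplicative speckle
rng = np.random.default_rng(1)
scene = np.full((200, 200), 0.1)
scene[80:120, 80:120] = 0.5                   # a brighter "field"
speckled = scene * rng.gamma(shape=4, scale=1 / 4, size=scene.shape)
filtered = lee_filter(speckled, size=7)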
Beyond speckle, thermal noise in SAR (a background signal bias) is often removed using calibration data. For Sentinel-1, applying the orbit file and radiometric calibration (provided in ESA’s toolbox or GEE) will remove much of that noise.
Optical image noise can often be handled by standard image processing: smoothing filters (to remove high-frequency noise), denoising algorithms like bilateral filters (smooth noise while preserving edges) or more advanced wavelet-based denoising. However, aggressive smoothing can wipe out small features like narrow roads or small buildings, which might be precisely what social science research cares about (e.g., mapping informal settlements). So a balance is needed.
Example – Denoising: Suppose we have a series of night-time light images (from the VIIRS sensor) that show city lights. These often have some sensor noise causing faint speckles in rural dark areas. We might apply a simple threshold or median filter to remove isolated lit pixels that are likely noise. Or if working with a land cover classification from a classifier, we might do a mode filter on the classified map to clean up isolated misclassified pixels (e.g., a single pixel of “water” in the middle of a large “forest” region could be recoded to the majority class around it).
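A small sketch of such a majority (mode) filter, applied to a synthetic classified map with SciPy, is shown below; class codes are assumed to be non-negative integers.

import numpy as np
from scipy.ndimage import generic_filter

def majority(window):
    # return the most frequent class code in the moving window
    return np.bincount(window.astype(int)).argmax()

rng = np.random.default_rng(2)
classified = np.full((100, 100), 1)                # class 1 = forest
classified[rng.random((100, 100)) < 0.02] = 3      # scattered "water" noise pixels
cleaned = generic_filter(classified, majority, size=3)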
Image enhancement goes hand in hand with noise reduction. This could mean contrast stretching (improving visibility of certain ranges), edge enhancement (sharpening boundaries), or principal component analysis (to separate signal from noise in multi-band data). While these steps may not directly extract information, they can make visual interpretation easier or improve the performance of classification algorithms.
One specific technique for normalization and enhancement in multi-temporal sets is to use invariant pixels to adjust imagery. Also, if creating a composite (mosaic) of images, one might adjust the radiometry so that the seams aren’t noticeable (e.g., when creating a large map from multiple scenes).
In summary, noise reduction ensures we’re analyzing real ground phenomena rather than sensor quirks. A SAR specialist once put it: “SAR images have inherent salt-and-pepper texturing called speckles which degrade the quality and make interpretation difficult. Speckles are caused by random constructive and destructive interference of return waves in each resolution cell. Speckle noise reduction can be applied by spatial filtering or multilook processing.”. The same principle applies across remote sensing data: identify the noise and apply appropriate filters, while being cautious not to remove actual data. For the social scientist, using imagery that has passed through these preprocessing steps (like official analysis-ready products) is often easiest. If you do it manually, document the steps (e.g., “a 3x3 median filter applied to reduce noise”).
Cloud and Shadow Masking
Clouds are the bane of optical satellite imagery. A cloud can completely obscure the land below, rendering those pixel values useless for ground analysis. Even thin clouds or haze affect the spectral values. Therefore, for any optical dataset, a critical preprocessing step is cloud masking – identifying which pixels are contaminated by clouds (or cloud shadows) and excluding or correcting them.
Most satellite products come with cloud flags. For example, Landsat Level-2 data includes a Quality Assessment (QA) band where each pixel has bits indicating cloud, cloud shadow, snow, etc. Sentinel-2 Level-2A includes a Scene Classification Layer (SCL) that marks clouds and shadows. Cloud masking algorithms like Fmask (Function of Mask, used for Landsat/Sentinel) employ spectral tests to detect clouds and their shadows (often using the cirrus band, thermal band, and visible bands with thresholds). In Google Earth Engine, functions like ee.Algorithms.Landsat.simpleCloudScore or the Sentinel-2 QA60 band are used to filter clouds.
Cloud detection can be challenging – distinguishing bright roofs from clouds, or snow from clouds (both white), is tricky. Modern approaches include machine learning classifiers and even deep learning to identify clouds. Once detected, clouds are typically masked out (set as NA or no-data). One may also attempt cloud removal by filling those gaps with data from other dates (this leads into compositing, discussed below).
As one tutorial notes: “Cloud masking involves identifying and masking out the cloud-affected pixels in the image. Several methods can be used to detect clouds, including thresholding, supervised classification, and machine learning techniques. The masked areas can then be filled in or excluded from analysis.”. Simple thresholding might use the fact that clouds are very bright in visible and very cold in thermal (if thermal band exists). But dynamic thresholds and contextual algorithms work better.
Cloud shadows appear as dark areas displaced from the cloud (depending on sun angle). They can be mistaken for water or simply as false change. Thus, many cloud mask algorithms also predict where the shadow will be (using solar geometry) and mark those pixels too. These should also be masked or treated carefully, as they darken the surface reflectance significantly.
Handling clouds: If you have a single image with some clouds, you might mask them and just accept missing data in those areas. If you need a complete image, you can use a composite from multiple dates – for example, a median composite over a month can eliminate clouds by taking the median value per pixel from all clear observations (clouds being outliers). The Maximum Value Composite (MVC) often used for NDVI chooses the max NDVI per pixel over a period, under the assumption that clouds (and cloud shadows) lower NDVI, so the max is likely cloud-free vegetation value. Google Earth Engine’s default Sentinel-2 imagery often uses such approaches to provide nearly cloud-free mosaics by combining images.
However, be cautious: combining dates means you lose the specific date information. For change detection, you may need a specific time, so you can’t just composite away all clouds without possibly mixing changes. For long-term monitoring, gap-filling with nearby dates or interpolation is common.
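The logic of a median composite can be sketched in a few lines of NumPy, using a synthetic stack of cloud-masked images in place of real co-registered imagery.

import numpy as np

rng = np.random.default_rng(3)
stack = rng.uniform(0.1, 0.3, size=(6, 50, 50))   # 6 dates of synthetic "reflectance"
cloud_mask = rng.random(stack.shape) < 0.2        # pretend 20% of pixels are cloudy
stack[cloud_mask] = np.nan                        # masked pixels become NaN

composite = np.nanmedian(stack, axis=0)           # per-pixel median of the clear observations
never_clear = np.isnan(composite)                 # pixels that were cloudy on every date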
Cloud and shadow handling depends on your goals:
- For mapping land cover (which doesn’t require the exact date), you can compile multi-date imagery to get a cloud-free view.
- For time-series (e.g., crop phenology or urban change), you might implement cloud masking and then use only cloud-free observations or fill gaps by interpolation of time-series (e.g., using linear interpolation or more advanced temporal smoothing algorithms; a small interpolation sketch follows this list).
- For policy applications like disaster response, often you’ll use radar or wait for a clear optical view. In deforestation alert systems (like Global Forest Watch’s GLAD alerts), they use an algorithm that can flag likely clearing even with partial data but will confirm once a clear observation comes through.
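As referenced in the list above, here is a small sketch of gap-filling a cloudy NDVI time series by linear interpolation in time; the dates and values are synthetic.

import numpy as np
import pandas as pd

dates = pd.date_range("2021-01-01", periods=10, freq="16D")   # Landsat-like revisit interval
ndvi = pd.Series([0.2, np.nan, 0.35, 0.5, np.nan, np.nan, 0.6, 0.55, np.nan, 0.3],
                 index=dates)                                  # NaN marks a cloudy observation
filled = ndvi.interpolate(method="time")                       # linear interpolation in time
print(filled)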
Many platforms and libraries help with cloud masking. In R, the sen2r package can apply cloud masks for Sentinel-2. In Python, Google Earth Engine (accessible via geemap or the ee API) can apply its built-in cloud masks:
# Example: filter cloudy pixels in an Earth Engine image
s2 = ee.ImageCollection("COPERNICUS/S2_SR")\
    .filterDate("2021-07-01", "2021-07-31")\
    .filterBounds(area_of_interest)\
    .map(lambda img: img.updateMask(img.select("QA60").eq(0)))
# QA60 band has cloud mask bits, eq(0) keeps clear pixels
Or using a separate cloud probability product (Sentinel-2 has S2 Cloud Probability dataset that can be thresholded).
To reiterate the importance, here’s a straightforward statement: “Clouds can be a significant problem in multispectral data as they obscure land surface features. Common approaches for handling clouds are: (1) cloud masking – identify and mask out cloud pixels via thresholding or classification, (2) cloud removal – interpolate over cloud pixels from surrounding values or other images (with caution), (3) cloud compositing – merge multiple images to produce a cloud-free composite.”. The choice depends on research needs.
Lastly, note that haze (thin clouds or pollution) might not be masked by simple cloud masks but still affects data. Some processing includes haze correction (dark object subtraction etc.), which falls under radiometric correction. But if you see slight hazy areas, it may reduce contrast and NDVI. Advanced users might do a haze removal algorithm (e.g., ERDAS’s ATCOR or ArcGIS has some haze removal).
For completeness, shadow masking can also be applied not just for clouds but any long shadow (e.g., from mountains or tall buildings) if analyzing reflectance. The approach is similar: find very low value regions adjacent to bright objects. But mountain shadows are static and can be dealt with by topographic correction if needed.
In summary, cloud and shadow masking is an indispensable preprocessing step for optical imagery analysis. Ignoring it can lead to wildly incorrect interpretations (e.g., a cloud could be classified as a snowy field or a water body if not masked). In social science contexts, where you might be correlating satellite-derived variables to socioeconomic data, failing to remove cloudy pixels could insert noise or bias. Fortunately, many data sources supply good masks – using them diligently will save a lot of headaches.
11.3 Accessing Satellite Data: APIs and Packages in R and Python
Acquiring satellite images and handling large image datasets used to be a major barrier for non-specialists. Today, there are many convenient tools that allow users to search, download, and even directly process satellite data through APIs (Application Programming Interfaces) and high-level libraries. In this section, we introduce some of the main platforms and software for accessing imagery, with emphasis on those that integrate well with R and Python. These include cloud-based repositories (like Google Earth Engine and Sentinel Hub), as well as libraries to work with image files (like the raster/terra packages in R and rasterio in Python). We also discuss how to obtain specific data (like Landsat or Sentinel) and mention common data formats (GeoTIFF, NetCDF) and their handling.
Google Earth Engine (GEE)
Google Earth Engine is a powerful cloud-based platform that hosts a multi-petabyte catalog of satellite imagery and geospatial datasets (Landsat, Sentinel, MODIS, climate data, etc.) and allows users to run analyses on Google’s servers using a high-level JavaScript or Python API. GEE has been revolutionary for accessibility: instead of downloading hundreds of GB of data to your computer, you can analyze it on the cloud, pulling out just the results or small subsets you need.
For social scientists, Earth Engine is very appealing because it provides ready access to time series of images. For example, you can get the complete 30+ year Landsat archive for an area to study land cover change without manually searching and downloading each scene. Earth Engine also includes many precomputed products (like the Global Forest Change maps, nighttime lights composites, land cover maps, etc.).
While Earth Engine’s native scripting uses JavaScript in the Code Editor, one can use Python via the earthengine-api package. Additionally, in R, the community has developed the rgee package which allows you to interact with GEE using R syntax. There’s also geemap (for Python) and mapview/leaflet in R to visualize results interactively.
Example workflow (Python with GEE):
import ee
ee.Initialize()  # authenticate first time

# Define an area of interest and time range
roi = ee.Geometry.Point([35.2, 0.5]).buffer(5000)  # 5km buffer around a point
image = (ee.ImageCollection('COPERNICUS/S2_SR')    # Sentinel-2 surface reflectance
         .filterBounds(roi)
         .filterDate('2021-01-01', '2021-12-31')
         .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
         .median())                                # take median to get composite
ndvi = image.normalizedDifference(['B8', 'B4'])    # compute NDVI from NIR (B8) and Red (B4)
url = ndvi.getDownloadURL({'scale': 10, 'region': roi})
print(url)  # This gives a link to download the NDVI image as GeoTIFF
This snippet finds Sentinel-2 images for 2021 with <20% cloud over a region, computes a median composite, then NDVI, and finally provides a download link.
In R with rgee the logic is similar:
library(rgee)
ee_Initialize()
roi <- ee$Geometry$Point(c(35.2, 0.5))$buffer(5000)
col <- ee$ImageCollection("COPERNICUS/S2_SR")$
  filterBounds(roi)$
  filterDate("2021-01-01", "2021-12-31")$
  filter(ee$Filter$lt("CLOUDY_PIXEL_PERCENTAGE", 20))
img <- col$median()
ndvi <- img$normalizedDifference(c("B8", "B4"))
ee_as_raster(ndvi, region = roi, scale = 10, dsn = "ndvi.tif")
This would save the NDVI GeoTIFF locally via Earth Engine’s processing.
Using Earth Engine, one can easily integrate other data (population grids, roads, air pollution datasets etc.) which it has cataloged – useful for interdisciplinary work. One limitation is that Earth Engine is free for research/non-commercial use, but requires sign-up, and some sensitive data (like very high resolution commercial imagery) is not openly available. Also, you need internet access; it’s not an offline tool.
Nonetheless, Earth Engine has democratized remote sensing. Many social science studies (e.g., on deforestation, water resources, urbanization) now cite using GEE to obtain and process imagery because it dramatically cuts down the time and expertise needed to work with large remote sensing data. If you have programming experience, GEE is a top choice.
Sentinel Hub and Other Data APIs
Sentinel Hub is another popular service that provides API access to satellite data, notably the European Copernicus data (Sentinel-1 SAR, Sentinel-2 multispectral, Sentinel-3, etc.) and some commercial imagery. It allows users to download processed images via web requests, and it has a Python package (sentinelhub) to interface with its API. Sentinel Hub can perform on-the-fly processing (like band selection, indices, mosaicking) and is known for its OGC APIs (WMS/WMTS for map tiles, and a REST API for raw data).
For example, using the sentinelhub Python package, one can specify a time range, area, and request a cloud-free mosaic or specific band of Sentinel-2. Some features require an account and possibly a subscription for high volumes or commercial data. However, for research and moderate use of free data, it’s very useful. The Copernicus Data Space Ecosystem (recently introduced by ESA) also provides API access to Sentinel data; there’s an R package CDSE to interact with it.
USGS APIs: The USGS EarthExplorer provides a web interface for Landsat, etc., but there are also APIs like the USGS Earth Resources Observation Science (EROS) API or the NASA/USGS Harmonized Landsat Sentinel-2 (HLS) data via AWS. These are a bit more involved. Alternatively, one can use Python libraries like pystac or landsatxplore to search and download imagery.
NASA Earthdata: For MODIS, VIIRS and others, NASA’s APIs (or the R MODIStsp package, or Python’s NASA Harmony API) can be used to download data programmatically. Many NASA datasets are also on AWS or Google Cloud public buckets, accessible via command-line or simple HTTP.
The advantage of these APIs and cloud access is that you don’t manually visit websites and click dozens of times. Instead, you can script the data retrieval, which is reproducible and can be integrated into analysis notebooks. For instance, an R script could query “give me all Landsat scenes for X country in 2020 with <10% cloud” and download them, rather than doing this manually.
One should note, though: dealing directly with downloads means you need storage and possibly have to handle many files. That’s where using cloud processing (like GEE or Sentinel Hub) might be preferred if possible, as you can let them handle the heavy lifting and either just retrieve results or only the subset you need.
Working with Raster Data in R: raster and terra
Once you have image files (GeoTIFFs, etc.), or if you download them within R, the primary tools are the spatial raster packages. The raster package (by Robert Hijmans) has been a staple for years; the newer terra package is its successor, with similar functionality but better performance and handling of large data.
These packages allow you to read rasters into R objects, manipulate them (crop, resample, project), and do computations across layers or rasters (map algebra).
Reading an image (e.g., a GeoTIFF from Landsat) with terra:
library(terra)
img <- rast("LC08_L2SP_..._SR.tif") # Landsat 8 surface reflectance product
print(img)
The rast function reads the file and creates a SpatRaster object. Printing it shows the number of layers (bands), resolution, extent, and CRS. For a multi-band file, img might have (for Landsat) 6 bands (coastal, blue, green, red, NIR, SWIR, etc., depending on what’s included). You can assign names or use indices:
names(img)
# e.g., "SR_B2", "SR_B3", ..., "SR_B7"
nir <- img[["SR_B5"]] # Landsat-8: Band 5 is NIR
red <- img[["SR_B4"]] # Band 4 is Red
ndvi <- (nir - red) / (nir + red)
plot(ndvi)
Here we computed NDVI at 30m resolution. The code leverages terra’s ability to do arithmetic directly on rasters (it handles alignment of grids, etc., if they match).
Terra (and raster) have a rich set of functions: project() to reproject to another CRS, crop() to spatially subset, resample() to change resolution or align grids, focal() to apply moving window filters, predict() to apply a model (like a random forest classifier) to each pixel for classification, and many more. The terra package description highlights: “Methods for spatial data analysis with raster (grid) data include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate using regression or machine learning models for spatial prediction, including with satellite remote sensing data.”. It is designed to handle large rasters by processing chunk by chunk if they don’t fit in memory (though one must be mindful of extremely large data).
The older raster package has similar functions (with stack and brick for multi-layer rasters). Terra is faster and better for multi-core processing.
Example in R: Suppose we want to get the mean reflectance per land cover class in an image. We have a classified raster (say, from an external source or previous step) and the image raster:
lulc <- rast("landcover_map.tif") # land cover classification (e.g., codes for forest, urban, etc.)
sat <- rast(c("band1.tif", "band2.tif", ...)) # multi-band image
# Compute zonal mean of band values by land cover class:
zonal_stats <- zonal(sat, lulc, fun = "mean", na.rm = TRUE)
print(zonal_stats)
This will output a table of mean band values for each class in lulc. Such zonal operations are common when integrating remote sensing with other spatial data (for instance, computing average NDVI by administrative region or land tenure type).
The sf package (for vector data) often complements raster/terra – for example, you might have household survey locations as an sf points dataframe and want the raster values at those points:
library(sf)
pts <- st_read("households.shp")
vals <- terra::extract(ndvi, vect(pts))
pts$ndvi <- vals$layer
Now each household point has an NDVI value from the raster, which you could then use in statistical analysis (e.g., correlate NDVI with income or something).
A noteworthy mention: R has specialized packages like RSaga, RStoolbox (which wraps many classification and image processing routines), spatialEco, etc., but terra and raster are the go-to for core raster handling.
Working with Raster Data in Python: rasterio and Others
In Python, a primary library for raster data is rasterio (built on GDAL). It allows similar read/write and coordinate handling as terra in R. Python also has high-level frameworks like xarray (especially with rioxarray for geospatial extension) which can treat multi-band rasters or image time series in a convenient way (like multi-dimensional arrays with labeled coordinates).
Reading with rasterio:
import rasterio
dataset = rasterio.open("sentinel2.tif")
print(dataset.count, dataset.width, dataset.height, dataset.crs)
red = dataset.read(3)  # read band 3 (assuming 1=blue, 2=green, 3=red, ... for example)
nir = dataset.read(4)  # read band 4
red = red.astype('float32')
nir = nir.astype('float32')
ndvi = (nir - red) / (nir + red)
Here, dataset.read(band_index) returns a numpy array for that band. We converted to float for the NDVI calculation to avoid integer division issues. If the image has a scale factor (common in reflectance data stored as integers), one must multiply by that factor (e.g., Landsat surface reflectance might be stored as int where value 10000 = reflectance of 1.0). Metadata often contains dataset.meta or dataset.scales for such info.
To write the NDVI array back to a file with geo-info, we can take the profile from the dataset:
profile = dataset.profile
profile.update(dtype=rasterio.float32, count=1)
with rasterio.open('ndvi.tif', 'w', **profile) as dst:
    dst.write(ndvi.astype(rasterio.float32), 1)
That creates a GeoTIFF of NDVI with the same georeference as the original.
rasterio can also reproject data using rasterio.warp (or one can use pyproj/shapely for vector data). For zonal statistics in Python, one can use libraries like rasterstats (e.g., from rasterstats import zonal_stats), which can take a vector layer (GeoJSON or shapefile) and a raster and compute statistics for each polygon.
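As a minimal sketch (the file names districts.shp and ndvi.tif are hypothetical, and the polygons are assumed to share the raster's CRS), per-polygon statistics take only a couple of lines:
from rasterstats import zonal_stats
# Hypothetical inputs; polygons and raster are assumed to be in the same CRS
stats = zonal_stats("districts.shp", "ndvi.tif", stats=["mean", "min", "max"])
print(stats[0])   # a dict of statistics for the first polygon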
An increasingly popular approach in Python for multi-band/time is using xarray which handles N-dimensional data with coordinate labels. The library rioxarray integrates rasterio with xarray. For example:
import rioxarray as rxr
s2 = rxr.open_rasterio("sentinel2.tif", masked=True)  # returns an xarray.DataArray
ndvi = (s2.sel(band=8) - s2.sel(band=4)) / (s2.sel(band=8) + s2.sel(band=4))
ndvi.rio.to_raster("ndvi.tif")
This uses band indices or names directly. If the data form a time series (multiple files concatenated as a DataArray with a time dimension), xarray can manage them easily and lets you do things like data.mean(dim='time') or more complex reductions over time.
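For illustration, a small sketch of building such a time series from monthly NDVI GeoTIFFs (hypothetical file names) and reducing it over time might look like:
import pandas as pd
import rioxarray as rxr
import xarray as xr
# Hypothetical monthly NDVI GeoTIFFs stacked along a new 'time' dimension
files = ["ndvi_2020_01.tif", "ndvi_2020_02.tif", "ndvi_2020_03.tif"]
dates = pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"])
stack = xr.concat([rxr.open_rasterio(f, masked=True).squeeze() for f in files], dim="time")
stack = stack.assign_coords(time=dates)
mean_ndvi = stack.mean(dim="time")   # average NDVI over the period
anomalies = stack - mean_ndvi        # per-date deviation from the mean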
GDAL: At a lower level, one can use GDAL in Python (e.g., via osgeo.gdal), but rasterio is generally more Pythonic and easier to use. GDAL/PROJ are the underlying engines for many geospatial tasks.
Visualization: For viewing images, Python has matplotlib (you can call plt.imshow(ndvi, cmap='viridis') or composite bands into an RGB image). In a Jupyter environment, you might use ipyleaflet or folium to quickly display georeferenced raster overlays interactively, or export to web maps.
Data Catalogs and Examples
It’s useful to know some typical sources and how to access them:
- Landsat: Available via USGS EarthExplorer (bulk download via API keys, or via the AWS S3 bucket landsat-pds). In GEE as LANDSAT/LC08/.... In Python, one might use landsatxplore to search and retrieve download links. There is also HLS (Harmonized Landsat and Sentinel-2), an analysis-ready product combining Landsat and Sentinel-2 reflectance on a common grid.
- Sentinel-2: Available via the Copernicus Open Access Hub (requires an account, but can be queried via its API or through Sentinel Hub). Also on AWS (sentinel-s2-l2a bucket). In GEE as COPERNICUS/S2_SR. Many use cases: e.g., to get a true-color image, one could use Sentinel Hub's WMS, or directly load bands with rasterio if downloaded.
- Sentinel-1 (SAR): On GEE as COPERNICUS/S1_GRD. The Python library sentinelsat can search and download from ESA. SAR data often need further processing (GEE performs some GRD backscatter calibration for you).
- MODIS: Terra and Aqua MODIS daily data are accessible via NASA or GEE (MODIS/006/MOD13Q1 for NDVI, etc.). R's MODIStsp can batch download and mosaic MODIS.
- Nighttime lights: DMSP-OLS (1992–2013) and VIIRS (2012–present) are in GEE (NOAA/DMSP-OLS/NIGHTTIME_LIGHTS and NOAA/VIIRS/DNB/MONTHLY_V1). One can also obtain annual composites from NOAA's site. In R/Python, one might directly download GeoTIFFs per year.
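To make the GEE catalog IDs above concrete, a short sketch with the Python Earth Engine API and geemap (assuming you have already authenticated with GEE; the coordinates are hypothetical) might look like:
import ee, geemap
ee.Initialize()
# Hypothetical area of interest; a cloud-filtered Sentinel-2 median composite for 2022
aoi = ee.Geometry.Point([36.82, -1.29]).buffer(10000)
s2 = (ee.ImageCollection("COPERNICUS/S2_SR")
      .filterBounds(aoi)
      .filterDate("2022-01-01", "2022-12-31")
      .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20))
      .median())
m = geemap.Map()
m.addLayer(s2.clip(aoi), {"bands": ["B4", "B3", "B2"], "min": 0, "max": 3000}, "S2 true color")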
These tools abstract away a lot of the complexity. As a social scientist, you don’t necessarily need to become an expert in remote sensing software – using these high-level packages, you can retrieve and analyze data within the familiar environment of R or Python. For instance, the terra package’s documentation and tutorials show examples like predicting vegetation biomass from satellite bands using a regression model, which could be analogous to predicting poverty from satellite features – integrating directly with R’s modeling tools.
To highlight the power of integration: in R, you could use caret or randomForest to train a model predicting some field variable (say crop yield) from satellite bands at sample locations, then use predict(model, sat_raster)
to generate a prediction map. Terra handles chunk-wise processing if needed. Similarly in Python, you could use scikit-learn or TensorFlow on raster data by flattening arrays or using patches.
In summary, modern APIs and libraries have made satellite data far more accessible. Researchers can query, download, and preprocess data in a few lines of code, and immediately start analyzing patterns relevant to social science – whether it’s linking greenness indices to economic indicators or measuring how land cover has changed in areas of policy interventions. The next sections will discuss how to turn these preprocessed images into structured information (like maps of land cover or change over time) and demonstrate that with case studies.
11.4 Structuring Image Data: Classification, Detection, and Extraction of Information
Raw images, once preprocessed, are essentially arrays of pixel values. For many social science and policy questions, we need to convert these pixel data into more meaningful indicators or categorical maps – a process often referred to as image structuring or information extraction. Two fundamental approaches are:
- Pixel-level classification: Assigning each pixel to a category (e.g., land cover type such as water, forest, urban, agriculture). The output is a thematic map.
- Object-level analysis and detection: Identifying and delineating objects or features in the imagery, such as buildings, roads, fields, or vehicles, rather than treating every pixel in isolation. This can involve first grouping pixels into segments (object-based image analysis) and then classifying those, or using algorithms (especially deep learning) to detect specific objects.
These approaches yield structured outputs like maps and counts that can be integrated with other data. For example, a classification might produce a map of urban extent, from which you can calculate what percentage of each district is urban. An object detection might count how many tents are in a refugee camp or how many ships are in a port, yielding metrics for analysis.
We will discuss common techniques for image classification (both traditional and modern), land cover mapping, and object detection. We will also illustrate how indices like NDVI or other spectral transforms can serve as intermediate derived variables that feed into classification or directly serve as indicators (e.g., mean NDVI as a proxy for greenspace in a neighborhood).
Pixel-Level Classification and Land Cover Analysis
Image classification is a core task in remote sensing – categorizing all pixels in an image into a finite number of classes. In land cover mapping, these classes might be things like Forest, Water, Built-up, Agriculture, etc., depending on the scheme. Traditionally, classification could be unsupervised (clustering pixel spectra into groups, e.g., k-means, and then labeling those groups) or supervised (using training data of known classes to train a classifier like maximum likelihood, decision tree, random forest, or nowadays CNN).
A classic example is mapping deforestation: you classify an image into forest vs non-forest. Or mapping urban sprawl: classify into urban, vegetation, bare soil, water, etc., for multiple dates and compare.
Supervised classification requires labeled data – either collected from field observations, existing maps, or by manually interpreting some sample pixels. For instance, you might digitize some polygons of “urban” and “agriculture” on a high-res image to use as training for classifying a Landsat image.
In R, packages like RStoolbox provide convenient classification functions. In Python, one might use scikit-learn or specialized libraries (or even GEE’s built-in classifiers if using that).
Example (R using terra + randomForest):
# Assume 'img' is a SpatRaster with bands, and we have training points 'train_pts' with class labels
df <- extract(img, train_pts)      # get pixel values under training points
df$landcover <- train_pts$class    # attach known class label
df <- na.omit(df)
# Train a random forest model
library(randomForest)
rf_model <- randomForest(factor(landcover) ~ B2 + B3 + B4 + B5 + B6 + B7, data=df)
# Predict on the whole raster
pred_class <- predict(img, rf_model)  # each pixel gets a class number
plot(pred_class)
This would yield a raster of classes. Terra’s predict()
understands how to chunk through the raster so we don’t have to manually loop over pixels.
In Python (sklearn) it might be:
from sklearn.ensemble import RandomForestClassifier
import numpy as np
# X_train = pixel values, y_train = labels from training sample
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, y_train)
# To classify the entire image:
image_array = dataset.read(list(range(1, dataset.count+1))).reshape((dataset.count, -1)).T  # shape (n_pixels, n_bands)
y_pred = rf.predict(image_array)
classified = y_pred.reshape((dataset.height, dataset.width))
We would then save classified with rasterio.
Increasingly, machine learning is the go-to for classification. Random forests and support vector machines were popular in the 2000s-2010s and are still very effective for moderate size problems. They can handle numerical multispectral data well. With the rise of deep learning, if one has abundant training data or high resolution imagery, convolutional neural networks (CNN) can be used for semantic segmentation (per-pixel classification with spatial context). We’ll discuss CNNs further later, but one might use frameworks like Keras (with TensorFlow) in R via the keras package, or directly in Python, to train on image patches. For example, an urban mapping CNN might take a 16x16 pixel patch and predict the central pixel’s class (or the whole patch’s mask via U-Net architecture).
One should choose classes that are meaningful and distinguishable. If two land cover types have very similar spectral signatures (e.g., sparse shrubland vs grassland), classification might confuse them, or you might aggregate them into one category unless you have additional info (like seasonality differences).
Accuracy assessment: After classification, it’s important to evaluate accuracy using some validation data (not used in training). Metrics include overall accuracy, class-specific precision/recall or producer’s/user’s accuracy, and kappa statistic. This is common in remote sensing papers to ensure the map quality is known.
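If the predictions and validation labels are already in arrays (continuing the scikit-learn example above; y_valid and y_valid_pred are hypothetical names for the held-out labels and predictions), these metrics are straightforward to compute:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, cohen_kappa_score
# y_valid = true classes at held-out validation pixels; y_valid_pred = the classifier's predictions there
print(accuracy_score(y_valid, y_valid_pred))          # overall accuracy
print(confusion_matrix(y_valid, y_valid_pred))        # error (confusion) matrix
print(classification_report(y_valid, y_valid_pred))   # per-class precision/recall (user's/producer's accuracy)
print(cohen_kappa_score(y_valid, y_valid_pred))       # kappa statistic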
The USGS emphasizes that automated classification is common but can be error-prone if classes aren’t spectrally distinct. They highlight the challenges in West Africa mapping where automated methods struggled, requiring visual interpretation aid. In practice, one often incorporates expert knowledge or manual editing especially if high accuracy is needed. For policy, having a human analyst review the automated map and fix obvious errors can improve the product (a semi-automated approach).
Unsupervised methods like k-means clustering or ISOData can group pixels into spectral clusters without prior labels. An analyst then must label each cluster (e.g., cluster 1 = water, cluster 2 = forest, etc.) or merge clusters if needed. This was more common when getting training data was hard. Now, with tools like GEE and widespread reference data, supervised methods are more prevalent.
Object Detection and Feature Extraction
Instead of classifying every pixel, sometimes the goal is to detect specific objects or count them. For example:
- Count how many buildings or houses are in an area (to estimate population or damage from a disaster).
- Identify all roads and map the road network (for accessibility studies).
- Find features like irrigation ponds, solar panels, ships at sea, airplanes on tarmacs, etc.
Traditional object-based image analysis (OBIA) involves first segmenting the image into meaningful objects – for instance, grouping adjacent similar pixels (e.g., all pixels that likely belong to one building or one field). This can be done with algorithms like multi-resolution segmentation or simple connected components on thresholded images. Software like eCognition (commercial) popularized OBIA in the 2000s. In open source, one can use R’s EBImage or Python’s scikit-image for segmentation algorithms. Once you have segments, you compute features for each segment (like its average color, shape metrics, texture) and then classify the segments (maybe with a machine learning model). OBIA can reduce salt-and-pepper effect and incorporate shape/context, which pixel classification lacks (human image interpreters inherently do an object-based approach – looking at shapes, context – as USGS noted for mapping West Africa).
Example OBIA: We have a high-res image in which individual tree crowns or buildings are multiple pixels. We run a segmentation to delineate each building. Then we use a classifier to mark segments as “building” vs “not building” based on their spectral and shape properties. The result is a vector layer of building footprints.
Increasingly, deep learning object detection is used for such tasks:
- CNN classifiers can slide across an image to find objects (like using YOLO or Faster R-CNN frameworks) for things like vehicles or small discrete objects.
- Semantic segmentation CNNs (like U-Net, DeepLab) can outline objects of certain classes (e.g., identify every building pixel, which after processing yields building polygons).
For example, detecting houses in a rural area: A U-Net model trained on labeled building footprints can take an input image (maybe 256x256 pixel tile) and output a mask highlighting building locations. R’s keras interface can be used to implement U-Net (there are Keras examples online for segmentation). In Python, libraries like torchvision or segmentation_models_pytorch have ready architectures.
Another example is using SAR data to detect flooded areas: one could threshold the change between pre- and during-flood SAR images (water has very low backscatter, so new dark areas likely flood) – that’s a simpler form of object extraction (detect contiguous flooded region extents). Indeed, object detection might be as simple as identifying connected pixel regions that meet criteria (like NDVI drop for deforestation patches, etc.).
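A rough sketch of that idea in Python (pre_db and post_db are hypothetical NumPy arrays of backscatter in dB for co-registered dates, and the thresholds are assumptions to be calibrated per scene):
import numpy as np
from scipy import ndimage
# New dark areas in the post image suggest standing water
change = post_db - pre_db
flood_mask = change < -3                       # dB threshold is an assumption; calibrate per scene/sensor
labeled, n_patches = ndimage.label(flood_mask) # group contiguous flooded pixels into patches
sizes = ndimage.sum(flood_mask, labeled, index=np.arange(1, n_patches + 1))
keep = np.where(sizes >= 50)[0] + 1            # discard tiny patches likely due to speckle
flood_clean = np.isin(labeled, keep)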
The results of object detection are often represented as vector data (points, lines, polygons). Those can then be counted or overlayed with other data. For instance, if you detect 100 shelters in a refugee camp from a satellite image, you could multiply by an average household size to estimate population. Or if you detect all roads, you can compute road density per district and correlate with economic indicators.
Case in point: After a conflict event, analysts used high-res satellite images to count destroyed buildings in a town and compare to pre-conflict counts to estimate damage percentage (this was done for Syrian cities, for example). Another case: using high-res imagery and deep learning to detect brick kilns in South Asia for environmental health research.
For many social applications:
- Informal settlement mapping: detecting small densely packed roofs, often using object-based or CNN approaches.
- Agricultural field delineation: OBIA can segment fields and classify crop types. There are open datasets (e.g., Landcover.ai) that use deep learning to delineate buildings, woodlands, and water from aerial images.
- Vehicle detection: from satellite video or frequent images to monitor traffic (an emerging area with very high-res sensors or aerial).
The choice between pixel classification vs. object detection depends on scale and question. If the phenomena are large area classes (forest vs. non-forest), pixel-level might suffice with moderate resolution images. If fine details matter (exact count of structures), object methods with high-res input are needed.
One challenge is getting training data for detection. Initiatives like SpaceNet have released datasets for buildings and roads from satellite images, which spur development of models that others can use. Also, manual digitizing by experts (or even crowdsourcing as done in OpenStreetMap) can provide training labels.
To illustrate in code conceptually: using Python’s opencv for simple template matching (if you had a template of an object):
import cv2
import numpy as np
template = cv2.imread('house_template.png', 0)
img = cv2.imread('village.png', 0)
w, h = template.shape[::-1]        # template width and height, used to draw boxes
res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
locations = np.where(res >= 0.8)   # threshold matches
for pt in zip(*locations[::-1]):
    cv2.rectangle(img, pt, (pt[0]+w, pt[1]+h), (0,0,255), 1)
This would draw boxes around places matching the template of a house. Template matching is simplistic and not robust to variation, but shows the idea of slide-and-match.
A more robust deep learning approach in Python:
import cv2
import torch
from torchvision import models, transforms
# Suppose we use a pretrained Faster R-CNN for COCO and fine-tune it for our object
model = models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
# (Would need to replace the head and train on custom data of e.g. building = class)
model.eval()
# During inference:
image = transforms.ToTensor()(cv2.imread('test_image.png'))
pred = model([image])[0]
boxes, scores, labels = pred['boxes'], pred['scores'], pred['labels']
# Filter by score and label (e.g., building label)
This would output bounding boxes for detected objects. With segmentation models, one would get masks instead.
What’s notable is that computer vision methods have greatly improved the ability to extract information from high-resolution imagery, enabling analyses that were impractical with manual photo-interpretation at large scales. For example, counting every house in a country like Nigeria by manual digitizing would be enormous work, but a CNN can approximate that if trained well, providing new data for population estimates or development indices.
However, high-res imagery is not always freely available for all areas/time (it’s often commercial). Projects like Microsoft’s building footprints use AI on commercial imagery and then open the footprints. Social scientists can leverage such outputs directly instead of doing it from scratch.
Indices and Continuous Variables
Not all structured outputs are categorical classes or object counts. Often we derive continuous variables from images that become part of our analysis. A prime example is NDVI (Normalized Difference Vegetation Index) which is widely used as an indicator of vegetation greenness or biomass. NDVI is simply a combination of red and NIR bands: NDVI = (NIR - Red) / (NIR + Red). This index ranges from -1 to +1 (where higher values ~0.6+ indicate dense green vegetation, values near 0 or negative indicate bare ground, water, or built areas). NDVI can serve as a proxy for crop yield, drought, ecosystem productivity, or even indirectly for economic conditions (there’s research relating growing season NDVI to GDP in agricultural economies, for example).
Social applications of NDVI include using it as a measure of neighborhood greenness (linked to health outcomes or property values), tracking drought impacts on pastoral communities, or relating it to food security metrics. It’s common to integrate NDVI time-series with survey data on livelihoods.
Other indices:
- Normalized Difference Water Index (NDWI) for water bodies (using green & NIR or SWIR).
- Normalized Burn Ratio (NBR) for burned area (using NIR & SWIR).
- Soil Adjusted Vegetation Index (SAVI) which adjusts NDVI for areas with sparse vegetation and soil visible.
- Nighttime lights brightness (not a spectral index, but a data product often used as a proxy for economic activity and electrification).
- Surface Water Occurrence (from algorithms mapping water presence, like JRC’s yearly water maps, used to see if new lakes form or rivers change course – relevant for displacement or land rights if water sources shift).
The structured output might be a map of a continuous variable. For instance, a poverty prediction model might output a raster of predicted consumption expenditure per capita. That’s not a discrete class map but a continuous estimate map. This can then be aggregated by zone (taking mean poverty per district, etc.) or compared to thresholds.
We mention these here because producing such layers often involves the same classification/detection tools but with regression instead of classification. For example, a random forest can be used to predict a continuous variable (like house prices or crop yield) from imagery (this is regression). Or a CNN can be trained to output a continuous value per patch (e.g., a village wealth index) rather than a class.
As an example, researchers famously predicted village-level wealth in Africa by using a CNN on daytime imagery, transferring knowledge from nightlights, and the output was a continuous wealth index that could be mapped. This blurred the line between classification and regression.
The “structuring” in that case was extracting features via CNN and then regressing to a survey-measured number. The final map is structured in the sense it’s now poverty values per location, which can be aggregated or analyzed spatially, rather than just raw pixels.
Summary of Techniques:
- Classification (supervised/unsupervised) yields thematic maps.
- Regression on pixels or segments yields continuous output maps.
- Object detection yields point/box/polygon features with possibly counts or sizes.
- Change detection (which we’ll cover next) often combines classification/regression at multi-dates or does direct image differencing to structure “change” as its own data layer.
Before moving on, it’s good to reinforce that context is key. A pixel-based classification might label a pixel as “urban” just because it’s a certain color (e.g., a concrete lot) whereas an object-based approach might avoid classifying a single pixel of concrete in the middle of a forest as urban because it knows that object is maybe a small rock, not an actual urban region. Likewise, a deep learning model can recognize the shape of a roof and label it as building, something reflectance alone might not confirm if the roof material is similar to surrounding soil.
The field of remote sensing has moved toward more context-aware approaches (hence, popularity of CNNs). As social scientists, adopting these approaches can greatly improve the quality of information extracted, but they do require more advanced methods and training data. Fortunately, many pre-trained models and open datasets exist now, lowering the barrier. The next section on time-series will delve deeper into how we detect changes – which is essentially structuring data across the temporal dimension.
11.5 Time-Series Analysis and Change Detection
One of the greatest strengths of satellite data is the repeated observations over time. This enables time-series analysis – tracking how the Earth (and thus human-relevant conditions) change seasonally, annually, or over decades. In social science contexts, time-series of images can be used to detect events (e.g., an outbreak of conflict leading to village burnings, or construction of new infrastructure), to monitor trends (urban growth, deforestation, crop cycles), or to assess the impact of policies (e.g., did an intervention to protect forests result in a detectable change in forest cover?).
Change detection in remote sensing refers to comparing images from at least two times to identify areas that have changed significantly. There are many techniques for change detection, and choosing one depends on the nature of change and data availability. Broadly, approaches include:
- Image differencing or ratioing: Subtract or ratio one date’s image or band with another’s. For example, computing the difference in NDVI between two dates to see where vegetation increased or decreased. Thresholding that difference image can yield change masks (e.g., areas where NDVI drop > some value might be deforestation or crop harvest). This is a simple pixel-wise method.
- Post-classification comparison: Classify images from each date into thematic maps (e.g., land cover in 2000 and land cover in 2020) and then compare the maps to see transitions (forest -> non-forest, etc.). This is straightforward and has advantage that each map can be independently validated. The disadvantage is errors in each classification can lead to spurious changes (error propagation).
- Time-series algorithms: When many time points are available (like annual data for decades, or frequent Sentinel-2 imagery), algorithms like CCDC (Continuous Change Detection and Classification) or LandTrendr (Landsat-based Detection of Trends in Disturbance and Recovery) fit models to the time-series and attempt to detect breakpoints when a significant change occurred. These can capture gradual changes vs. abrupt ones, and are useful for long-term monitoring (e.g., pinpointing what year a forest was cut, or tracking urban expansion year by year).
- Direct multi-date classification: Stack images from two dates and classify the stack to directly get “change vs no-change” or change categories. For instance, a supervised classifier could be trained to recognize the spectral signature of unchanged forest vs. new deforestation vs. unchanged other classes, etc. Or use principal components on a multi-date composite to highlight change information (often one of the later principal components will emphasize differences between dates).
A fundamental concept is that change detection ideally looks at the same location on different dates. That requires images to be co-registered (georeferenced together) – which we covered in preprocessing. It also helps if they are at similar phenological times or one accounts for seasonal differences.
Example 1: Deforestation Alerts – Systems like Global Forest Watch’s GLAD or Brazil’s DETER use frequent (weekly to monthly) satellite data to spot new forest clearings. One method is to compare a recent image to a reference baseline (like median of previous clear observations). If in the new image the pixel reflectance changed from forest-like to bare-ground-like (e.g., drop in NIR, increase in SWIR, decrease in NDVI), then flag it as loss. This is essentially an image differencing with threshold method, automated and running continuously. These alerts are delivered to authorities within days to weeks of clearing.
Example 2: Urban Growth – Using Landsat archives, one can do post-classification: classify each decade’s image into urban vs non-urban, then overlay to see where urban class appeared in later image that wasn’t urban in earlier image (new urban areas). Alternatively, use change metrics like NDBI (Normalized Difference Built-up Index) or simply track the expansion of high nighttime light areas over time as a proxy for urban growth. Time-series of nighttime lights are commonly used to study economic growth and urbanization patterns – one can see cities brightening and expanding spatially in the light data.
Example 3: Disaster Impact – For a flood, using before and during-event images: do a classification or threshold on during-event image to find water, also maybe threshold the before image for water, then the difference gives flood extent. Similarly for war destruction: a structure counting or classification approach on pre- and post-conflict images can locate where buildings were demolished (like comparing a building footprint map before vs after, or detecting rubble spectral signature). In 2007 Kenya post-election violence, satellites quickly mapped burned areas by comparing imagery from just before and after the riots (active fires and burn scars can be detected by spectral indices or thermal anomalies).
From a methodological viewpoint, many change detection studies apply a change mask approach: compute some difference image (like band differences, or an index difference). If the absolute difference exceeds a threshold, mark as change. The threshold can be statistical (e.g., mean ± 3σ of no-change areas) or determined via training data of known changes. Another approach is image regression or pseudo-change detection: regressing one date’s pixel values on the other’s and looking at residuals (differences). Principal Components Analysis (PCA) on a multi-date stack often yields a component highlighting change (where positive/negative outliers are changes).
LandTrendr and CCDC are worth a note: they model each pixel’s trajectory over years as a series of straight-line segments in spectral index space. When a segment breaks (like NDVI suddenly drops then maybe slowly recovers), that is recorded as a disturbance event at that time. These have been effectively used for forest disturbance histories (which year was logged, how it recovered, etc.) – now available in some GEE implementations, and give rich info beyond a binary “changed or not”.
Challenges:
- Differences in phenology or seasonal timing can appear as change even if land cover is same (e.g., comparing a leafy season image to a leaf-off season image will show NDVI decline everywhere – which is seasonal, not land cover change). So use same season comparisons or multi-year averages.
- Sensor differences: if using two sensors (like comparing Landsat 5 in 1990 to Landsat 8 in 2020), differences in band specs and calibration need to be normalized or else one might falsely attribute sensor differences to land changes. We addressed normalization earlier.
- Small registration errors can cause change signals along edges of features (like a building’s position off by a pixel looks like change). This is mitigated by good orthorectification and sometimes by using coarser analysis units or a slight buffer when interpreting results.
It’s often wise to produce a change map and then verify with samples or visual interpretation to ensure the changes make sense. In sensitive applications (e.g., mapping village destruction presented as evidence), analysts do manual confirmation.
In code, if using GEE, one could do:
var before = ee.Image('LANDSAT/LC08/C01/T1_SR/LC08_..._2019');
var after = ee.Image('LANDSAT/LC08/C01/T1_SR/LC08_..._2020');
var ndvi_before = before.normalizedDifference(['B5','B4']);
var ndvi_after = after.normalizedDifference(['B5','B4']);
var ndvi_diff = ndvi_after.subtract(ndvi_before);
var changeMask = ndvi_diff.lt(-0.2); // areas with NDVI drop > 0.2 (potential loss)
This simplistic approach would mark large vegetation loss.
In R or Python, one can stack the two images and compute diff <- img2 - img1, then threshold or cluster the difference image.
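As a minimal Python sketch of differencing and thresholding (the NDVI files for the two co-registered dates are hypothetical; the -0.2 threshold echoes the GEE example above):
import rasterio
# Hypothetical co-registered NDVI rasters for two dates
with rasterio.open("ndvi_2019.tif") as d1, rasterio.open("ndvi_2020.tif") as d2:
    ndvi_before = d1.read(1).astype("float32")
    ndvi_after = d2.read(1).astype("float32")
    profile = d1.profile
diff = ndvi_after - ndvi_before
loss_mask = (diff < -0.2).astype("uint8")   # large NDVI drop = candidate vegetation loss
profile.update(dtype="uint8", count=1, nodata=0)
with rasterio.open("ndvi_loss_mask.tif", "w", **profile) as dst:
    dst.write(loss_mask, 1)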
Categorical change detection (post-classification) yields a change matrix: e.g., how many pixels went from each class to each other class. This is useful for e.g. “forest to agriculture” vs “forest to urban” distinctions, not just forest vs not forest. Social scientists might use that to see what type of land cover replaced forest – was it smallholder farms, plantations, or settlements? Each has different implications. However, doing two separate classifications can double the error: even if each map is say 85% accurate, the combined change accuracy might be lower. Still, it’s widely used because it gives semantic meaning to the change (not just “changed”).
Continuous change detection yields a change magnitude or rate. E.g., LandTrendr can output the percent tree cover lost and regained. That could be integrated with, say, a regression of how forest loss (in hectares) relates to changes in local conflict incidents or policy shifts.
The ArcGIS workflow document nicely summarizes: change detection is fundamentally about “comparison of multiple raster datasets of the same area at different times, to determine the type, magnitude, and location of change”. They break it into categorical change, pixel-value change, and time-series change with algorithms like LandTrendr.
One must decide if pre-classification or post-classification approach is better for the question:
- If exact amounts of each change type are needed, post-classification with a change matrix is useful (though can accumulate error).
- If just locating change, direct differencing is simpler and often more sensitive to subtle changes.
We can also incorporate multi-sensor for robust detection: e.g., use optical and SAR in tandem – if both indicate change, confidence is higher. Or if optical is cloudy, use SAR to not miss changes in that time.
Time-Series in Socioeconomic Analysis: Time series of satellite-derived metrics can be merged with time series of social data. For instance, monthly NDVI over several years can be compared to food price time series to see if vegetation anomalies predict price spikes. Or deforestation rates per year can be plotted alongside policy changes or commodity prices to test relationships.
When integrating time-series, sometimes researchers reduce them to features like “maximum annual NDVI” or “timing of season onset” etc., which are easier to relate to outcomes than an entire series.
A modern technique is to use time-series clustering or phenology classification – e.g., cluster pixels by similarity in their multi-year NDVI trajectory to identify different land use practices or climate responses.
To conclude this section: mastering change detection allows analysts to quantify dynamics that are critical in social contexts – e.g., how fast are cities growing? where and when did conflict cause land abandonment? did a conservation program bend the curve of forest loss? – and provide evidence for policy evaluation. Many global datasets are now available (like annual Global Forest Change maps, or annual land cover maps by ESA or ESRI) which can be directly used, but understanding how they’re produced helps in assessing their reliability and tailoring them to specific local studies.
11.7 Integrating Satellite Outputs with Socioeconomic and Survey Data
Structured satellite outputs (land cover maps, indices like NDVI or nightlights, counts of features, etc.) often need to be combined with other data to inform social science analyses. This integration can happen in several ways:
- Spatial joins: Attaching satellite-derived values to units of analysis like administrative areas, villages, or survey clusters.
- Overlay and aggregation: Calculating statistics of a satellite layer within regions (e.g., average NDVI in each district, or total forest loss in each province) for use as variables in regression or comparative analysis.
- Panel data construction: Using time-series satellite data to create a panel (e.g., yearly forest loss per region, paired with yearly socio-economic data) to analyze trends or impacts over time.
Let’s walk through how one would integrate and the considerations:
Example Integration 1: Program Evaluation – Suppose a microfinance program was rolled out in some villages but not others. We want to see if it had an effect on agricultural productivity. We have a dataset of villages with program or control, and maybe some survey info. We can use satellite-derived yield proxy (say average growing-season NDVI or an index of crop vigor) for each village’s farmlands across years. By extracting that from satellite time-series (perhaps by overlaying village boundaries on NDVI maps), we get a panel: NDVI_2018, NDVI_2019,… for each village. Then we can run a differences-in-differences regression: does NDVI (productivity) increase more in program villages after program start than in control? This is blending satellite outcome data with the program design data. It might reveal, for instance, a significant NDVI rise in treated villages relative to control, implying a program success in improving ag output.
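A toy sketch of that difference-in-differences setup with statsmodels (all numbers are made up purely to show the mechanics):
import pandas as pd
import statsmodels.formula.api as smf
# A toy village-by-year panel with made-up NDVI values (treated = program village, post = after rollout)
panel = pd.DataFrame({
    "village": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "treated": [1, 1, 1, 1, 0, 0, 0, 0],
    "post":    [0, 1, 0, 1, 0, 1, 0, 1],
    "ndvi":    [0.52, 0.60, 0.48, 0.57, 0.50, 0.51, 0.47, 0.49],
})
did = smf.ols("ndvi ~ treated * post", data=panel).fit()
print(did.params["treated:post"])   # the difference-in-differences estimate on NDVI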
Example Integration 2: Cross-sectional Regression – A researcher might examine determinants of child malnutrition. They have a DHS survey with malnutrition rates by cluster. They hypothesize environmental factors and food availability (greenness, crop yields) matter. So they use remote sensing to get average NDVI and maybe number of harvests (from time-series) in the cluster’s vicinity, plus distance to nearest city (which you can get from maps). They then include those as independent variables in a regression predicting malnutrition, controlling for other socio-economic survey variables. Perhaps they find that higher local NDVI (indicating better ag conditions) is associated with lower stunting, controlling for wealth – providing evidence linking environment to nutrition. The satellite data thus enriched the analysis with info not captured in the survey directly.
Spatial resolution alignment: Often socio-data are at coarser units than pixels. One must decide the appropriate scale. For example, if using district-level poverty from a census, you might aggregate the satellite features to district mean or proportion. Or if you have point data like clinics, you might buffer them (e.g., 5 km radius) and extract satellite variables in that zone (e.g., population density from satellite for clinic catchment).
GIS software or code: Integration is straightforward with the tools mentioned earlier: e.g., in R use terra’s extract or zonal to get values per polygon. In Python, use rasterstats’ zonal_stats or manual overlays with shapely/rasterio.
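As a sketch of the buffering approach mentioned above (the file names are hypothetical, the projected CRS is an assumption, and the raster is assumed to share it):
import geopandas as gpd
from rasterstats import zonal_stats
# Buffer survey cluster points by 5 km and extract mean NDVI per buffer
clusters = gpd.read_file("survey_clusters.shp").to_crs(epsg=32636)   # assumed metric CRS (UTM zone 36N)
buffers = clusters.copy()
buffers["geometry"] = clusters.buffer(5000)
stats = zonal_stats(buffers, "ndvi.tif", stats=["mean"])             # raster assumed to be in the same CRS
clusters["ndvi_5km"] = [s["mean"] for s in stats]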
Data formats: Satellite outputs can be raster or vector. If vector (like polygons of water bodies), you might do spatial joins with other vector data (like linking water body presence to villages by distance). If raster, you often aggregate or sample to vector units. For large country-scale stuff, a database approach like PostGIS might be used to handle heavy joins (some teams load satellite features into a database for easy querying).
Uncertainty propagation: It’s wise to remember that satellite outputs have errors. When using them in analysis, consider their accuracy. For example, if an NDVI-based yield estimate has 80% correlation with true yield, there’s noise which might attenuate regression coefficients. Methods like errors-in-variables or using multiple indicators can mitigate that, but in practice many proceed assuming satellites are reasonably accurate proxies.
Policy use case: Consider Heat Vulnerability Indices for a city. They integrate remote sensing data (land surface temperature from thermal imagery, vegetation cover from NDVI) with social data (population density, age distribution). They might compute each factor by city neighborhood (satellite gives LST and NDVI per neighborhood, census gives population and age per neighborhood). Then they combine them (maybe via weighted sum or principal component) into a composite index of heat vulnerability. That index map is then used to prioritize planting trees or opening cooling centers in the most vulnerable areas. This is a concrete example of integration for decision-making, merging physical and social variables.
Another integration: Land cover maps from satellites can be used as covariates in household income models – e.g., proportion of land in cash crops vs subsistence as seen in land cover classification might predict household income. Or fraction of area deforested around a community could be used in a difference-in-differences if one group got a conservation incentive and another didn’t.
The Ola Hall 2010 paper concluded that remote sensing is most valuable when combined with traditional methods. Indeed, rarely does a satellite image alone answer a social science question – it’s the combination with traditional data (surveys, administrative, ethnographic knowledge) that yields insight. Remote sensing provides additional variables or independent validation.
Challenges in integration: Social data often have confidentiality or bias issues (e.g., surveys don’t sample dangerous areas, while satellite sees all). If exact alignment is off (GPS errors in survey cluster locations), one must be cautious linking pixel-level data; aggregating to a slightly larger area can hedge that risk.
Increasingly, automated pipelines: e.g., World Bank’s POVMAP tool tries to automate going from survey + satellite to poverty map with minimal manual steps. Another example is using ML to predict census data (like education levels) from mix of satellite data – effectively doing data fusion through algorithms.
Ethical and privacy considerations: When linking satellite with, say, individual household data at fine scale, one must ensure not to inadvertently reveal identities or sensitive info. Usually, aggregated results are fine (household locations often anonymized by jittering, etc., but high-res imagery might still show actual houses).
Software aside: In R, after using extract() you get a data frame of values by unit, which you can merge with your survey data frame by unit ID. Then just use lm() or any model. In Python/pandas, a similar merge follows after using rasterstats.
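A minimal pandas/statsmodels sketch of that merge-then-model step (the district values are made up purely to show the mechanics):
import pandas as pd
import statsmodels.formula.api as smf
# Made-up district-level example: merge satellite-derived means with survey outcomes, then fit a model
sat_df = pd.DataFrame({"district_id": [1, 2, 3, 4],
                       "mean_ndvi":   [0.62, 0.41, 0.55, 0.30]})
survey_df = pd.DataFrame({"district_id":   [1, 2, 3, 4],
                          "stunting_rate": [0.18, 0.31, 0.22, 0.35]})
merged = survey_df.merge(sat_df, on="district_id", how="left")
model = smf.ols("stunting_rate ~ mean_ndvi", data=merged).fit()
print(model.params)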
Final note: Because remote sensing data are often global and consistent, integrating them can allow cross-country studies that previously were hindered by incompatible national data. For instance, using nightlights and landcover, one can attempt a global model of something (with country fixed effects maybe). Social scientists can thus step beyond the confines of official stats.
Integrating structured remote sensing outputs with socio-economic data ultimately enables multidisciplinary insights: environmental change affecting society, or societal actions affecting environment, can be quantitatively assessed. This aligns with the increasingly popular frameworks of coupled human-natural systems or sustainable development research, where remote sensing provides the environmental measures and surveys/records provide the human measures.
11.8 Embracing Computer Vision and Deep Learning in Satellite Image Analysis
Over the last decade, deep learning – particularly convolutional neural networks (CNNs) – has transformed the analysis of images, including satellite and aerial imagery. We’ve touched on deep learning in earlier sections (for poverty mapping, object detection, etc.), but here we focus on the broader integration of these techniques into remote sensing for social science, and practical ways to utilize them in R and Python environments.
Deep learning offers two main advantages for our context:
- Improved accuracy and automation in classifying complex patterns (like identifying a slum or differentiating crop types) by learning directly from large datasets rather than relying on manually crafted features.
- Ability to fuse multiple data types (images, textural info, ancillary data) in one model, which is useful for capturing the multi-faceted nature of social-environmental phenomena.
Common Deep Learning Tasks in Remote Sensing:
- Image Classification: CNNs can classify an entire image or tile (e.g., determine if a given 256x256 pixel image patch is “urban” or “rural”). This is like high-level labeling, sometimes used in scanning large areas (e.g., find all image patches that contain solar farms by training on examples).
- Semantic Segmentation: Using architectures like U-Net, SegNet, DeepLab, a model labels every pixel in an image with a class, akin to our earlier pixel-based classification but using the spatial context learned by CNN. This has been applied to land cover mapping and building footprint extraction with great success. For instance, the geodl R package provides a UNet implementation using the torch backend in R, allowing pixel-level classification with state-of-the-art accuracy.
- Object Detection: Models like Faster R-CNN, YOLO, RetinaNet detect and draw bounding boxes around objects (like vehicles, houses). These are especially useful when we care about counts or presence of discrete items rather than labeling every pixel.
- Feature Extraction: Even if one doesn’t want to do a full CNN mapping, pre-trained models can be used to extract high-level features from images (like the “embedding” of an image from a network’s penultimate layer) which can then feed into regression models for outcomes (similar to how the poverty study did with nightlights + daytime CNN features). This is a way to leverage deep learning’s pattern recognition without building an entire end-to-end model oneself.
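As a sketch of the feature-extraction idea just described (using torchvision’s pretrained ResNet-50; the image file name is hypothetical):
import torch
from torchvision import models, transforms
from PIL import Image
# Use a pretrained ResNet-50 as a fixed feature extractor (penultimate-layer embedding)
backbone = models.resnet50(pretrained=True)
backbone.fc = torch.nn.Identity()    # drop the classification head; output is the 2048-d embedding
backbone.eval()
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
img = Image.open("image_patch.png").convert("RGB")       # hypothetical 3-band image tile
with torch.no_grad():
    embedding = backbone(preprocess(img).unsqueeze(0))   # shape (1, 2048); usable as regression features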
Using Deep Learning in R: R has the keras package (and tensorflow backend) which allows defining and training CNNs in R with relatively straightforward code. There is also torch for R (as used in geodl) which provides an R-native interface to PyTorch, enabling deep learning without Python at all. The geodl package authors highlight that it “supports pixel-level classification… geodl is built on the torch package, allowing implementation of DL in R without installing Python/PyTorch, simplifying environment setup”. This means R users can do advanced segmentation modeling completely in R now, which is a big development (published 2024).
For example, using keras in R:
library(keras)
# Define a simple CNN for binary classification (just for concept)
model <- keras_model_sequential() %>%
  layer_conv_2d(filters=32, kernel_size=3, activation='relu', input_shape=c(64,64,3)) %>%
  layer_max_pooling_2d() %>%
  layer_conv_2d(filters=64, kernel_size=3, activation='relu') %>%
  layer_max_pooling_2d() %>%
  layer_flatten() %>%
  layer_dense(units=128, activation='relu') %>%
  layer_dense(units=1, activation='sigmoid')
model %>% compile(optimizer='adam', loss='binary_crossentropy', metrics='accuracy')
# Assume X_train, y_train prepared as arrays of image patches and labels
model %>% fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
After training, predict() can be used on new image patches. For segmentation, one would use layer_conv_2d_transpose for upsampling, or the functional API to build the U-shaped architecture.
Using Deep Learning in Python: Python is the native zone of deep learning libraries like TensorFlow/Keras, PyTorch, MXNet, etc. For remote sensing tasks:
- Raster data handling: Usually use rasterio or tf.data pipelines to load images. Keras can train on image generators or tfrecords. PyTorch often uses custom Dataset classes (like geodl defines a torch dataset for geospatial data).
- Pre-trained weights: There’s transfer learning possibilities. For example, one might take VGG16 or ResNet50 pre-trained on ImageNet (natural images) and fine-tune it on satellite image data. Though satellite imagery differs (different band spectra), using 3-band composites and a pre-trained model can still expedite training if features like edges, textures learned from natural images are somewhat relevant.
- For segmentation specifically, libraries like segmentation_models.pytorch provide ready-made UNet, FPN, etc., with options for pretrained encoders on ImageNet which often improve performance even on satellite imagery segmentation.
Computational power: Training large CNNs can be heavy; typically one would use a GPU. In R, one can install tensorflow with GPU support or use cloud services. For moderate tasks (like classifying a few hundred images of size 64x64), CPU might suffice, but for full-scene segmentation (e.g., classifying a whole city at 0.5 m resolution), a tiling approach and a GPU is likely needed.
Examples of social science using deep learning on RS:
- Mapping poverty (CNN as in example).
- Detecting schools or health facilities by looking for their distinct shapes (some work uses R-CNN to find large buildings likely to be schools in rural Africa, to help update facility maps).
- Cropland mapping: UN Food and Agriculture Org. uses deep learning on satellite time series to identify fields and crop types, aiding food security assessments.
- Slum mapping: A study in Cape Town used a U-Net on 50 cm imagery to segment informal vs formal housing with high accuracy, allowing analysis of slum growth.
Reticulate and hybrid approaches: The R reticulate package allows calling Python code from R. This means if an R user finds a perfect model implemented in PyTorch, they can invoke it via reticulate. Conversely, Python users can use R packages via rpy2 if needed, though that’s less common. The reticulate approach is sometimes used to access specialized deep learning tools that might not have an R counterpart easily. However, with R’s torch and keras, most deep learning can be done directly in R now.
Model deployment and use: Once a model is trained (in R or Python), it can be used to process large volumes of imagery. For example, a trained building footprint UNet can be applied to an entire country’s imagery in patches and then mosaicked. Google Earth Engine is even starting to allow custom models to run on their platform (TensorFlow models can be exported and used in GEE for inference). This hints at a future where a researcher trains a model on their data, then deploys it on cloud platforms to map at scale.
Interpreting models: Deep learning often is a black box. For social science, knowing why the model predicts something can be as important as the prediction. Techniques like saliency maps or class activation maps can be used – e.g., in the poverty CNN case, they visualized which parts of an image patch contributed to high wealth prediction (often roads or large roofs lit up in activation maps), giving qualitative assurance that the model picks up meaningful features (roads, urban areas) rather than spurious patterns.
Continuous variables with deep learning: Not all outputs are classes. One can train a CNN to output a continuous value (a regression CNN). For example, train a CNN to predict the percentage of people in an image patch who are below the poverty line (a continuous quantity); the loss would then be MSE. Some studies have used this to directly predict village poverty rates (rather than classifying into poor/non-poor categories). Multi-modal networks are also possible – e.g., concatenating satellite image features with other data (such as climate variables or location coordinates) inside a neural network to predict something like crop yield.
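A brief sketch in Python’s Keras of such a regression CNN (structurally analogous to the R classification model shown earlier, but with a linear output and MSE loss):
from tensorflow import keras
from tensorflow.keras import layers
# A regression CNN: convolutional backbone, linear output, MSE loss
model = keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1),   # linear output for a continuous target (e.g., a wealth index)
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])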
Summary of deep learning integration:
- It’s powerful and increasingly accessible in R/Python with high-level packages.
- Suitable for complex tasks that were hard with traditional methods.
- Requires data (labeled examples), which in social contexts can be a bottleneck. But more labeled datasets are being published (like xView, SpaceNet, etc. for buildings, roads; DeepGlobe land cover; etc., which researchers can utilize or fine-tune on).
- Good practice: use pre-trained models to save time, and apply techniques like cross-validation especially when sample labels are few.
Given how new deep learning methods are rapidly being adopted, it is likely that future social science analyses will routinely incorporate CNN-based extraction of features (like automatically measuring settlement growth, infrastructure presence, even signs of economic activities from overhead images) as part of the data pipeline. The barrier to entry is lowering: one doesn’t need to write a CNN from scratch – with a few lines in keras or torch you can use architectures proven in computer vision literature.
For example, Maxwell et al. 2024 (geodl) emphasize that they have provided ready UNet models and assessment metrics in R, so users can train a model and readily obtain accuracy measures such as F1-scores. This democratizes deep learning for analysts who might not have a deep CS background.
In closing, the synergy of remote sensing and deep learning unlocks new possibilities to analyze social phenomena at scale and detail previously impossible. But it should always be guided by domain knowledge – the models learn patterns, but humans must ensure those patterns make sense in context and the results are interpreted correctly within socio-economic frameworks.
Throughout this chapter, we referenced numerous sources and examples demonstrating the power of satellite data in social science research. As satellite technology and analytical methods (like AI) continue to advance, these tools will become even more integral to understanding and addressing social and policy challenges. Importantly, the combination of satellites with ground data and domain expertise is key – remote sensing doesn’t replace traditional data but complements and enhances it. With the skills and examples outlined – from preprocessing to analysis with R/Python code snippets – researchers and practitioners should be equipped to incorporate satellite imagery into their work, yielding richer insights and more robust evidence for policy decisions.
References
Aybar, C., Wu, Q., Bautista, M., Yali, R., Barja, A., & Lizarazo, J. (2020). rgee: An R package for interacting with Google Earth Engine. Journal of Open Source Software, 5(51), 2272.
Blackman, A., Corral, L., Santos Lima, E., & Asner, G. P. (2021). Does community monitoring reduce deforestation? Evidence from a randomized controlled trial in the Peruvian Amazon. Proceedings of the National Academy of Sciences, 118(29), e2015171118.
Corbane, C., Florczyk, A. J., Freire, S., Schiavina, M., Pesaresi, M., & Kemper, T. (2022). GHSL data package 2022: Global Human Settlement Layer multi‑temporal products from 1975 to 2020. Publications Office of the European Union.
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary‑scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27.
Hansen, M. C., Potapov, P. V., Moore, R., Hancher, M., Turubanova, S. A., Tyukavina, A., … Townshend, J. R. G. (2013). High‑resolution global maps of 21st‑century forest cover change. Science, 342(6160), 850–853.
Henderson, J. V., Storeygard, A., & Weil, D. N. (2012). Measuring economic growth from outer space. American Economic Review, 102(2), 994–1028.
Jean, N., Burke, M., Xie, M., Davis, W. M., Lobell, D. B., & Ermon, S. (2016). Combining satellite imagery and machine learning to predict poverty. Science, 353(6301), 790–794.
Kennedy, R. E., Yang, Z., & Cohen, W. B. (2010). Detecting trends in forest disturbance and recovery using yearly Landsat time series: LandTrendr—Temporal segmentation algorithms. Remote Sensing of Environment, 114(12), 2897–2910.
Li, X., Zhou, Y., Zhao, M., & Zhao, X. (2019). Monitoring conflict from space: Destruction in the Syrian civil war. Remote Sensing, 11(19), 2333.
Maxwell, A. E., Warner, T. A., & Fang, F. (2024). geodl: Deep learning tools for geospatial data in R. SoftwareX, 17, 101299.
Wang, L., You, N., Zhang, F., & Chen, L. (2022). Estimating global poverty with multi‑source remote sensing and machine learning. Nature Communications, 13, 1045.
Zhu, Z., & Woodcock, C. E. (2014). Continuous change detection and classification of land cover using all available Landsat data. Remote Sensing of Environment, 144, 152–171.