3  Advanced Spatial Data Manipulation Using QGIS and R

3.1 Introduction

This chapter introduces advanced techniques for manipulating geospatial data using both QGIS and R. While R is a powerful tool for scripting and automation, QGIS provides a user-friendly graphical interface, making it accessible for those who prefer not to code. We will explore the sf and stars packages in R, essential for handling vector and raster data, and demonstrate how to perform geometric operations, spatial analyses, and visualize data on interactive maps using mapview and leaflet. We will also provide equivalent operations in QGIS, showcasing its built-in tools and plugins for similar tasks, making this guide comprehensive for users who wish to master geospatial data analysis in both environments.

Objectives

  • Introduce raster data handling alongside the vector data explored previously.
  • Demonstrate manipulation of raster and vector data in R and QGIS, including visualization on interactive maps.
  • Cover the extraction of raster values from vector points.
  • Compare tools for creating interactive maps, focusing on mapview and leaflet in R, and their equivalents in QGIS.

3.2 Geometric Operations on Vector Data

QGIS Implementation

In QGIS, geometric operations can be performed using a range of tools available in the interface:

  1. Load Data: Use the Data Source Manager to load vector data such as shapefiles (mrc.shp) and CSV files (plots.csv). Ensure that the coordinate system is correctly assigned when importing data.
  2. Explore Attributes: The Attribute Table allows inspection and manipulation of data fields.
  3. Geoprocessing Tools: Perform operations such as calculating areas, spatial joins, buffers, and intersections using tools like Field Calculator, Geoprocessing Tools, and Measure Tool.

Example: Calculating Area in QGIS

  • Open the Field Calculator and create a new field to store the area calculation using the $area expression. This tool provides a straightforward method to calculate and store geometric attributes directly within the attribute table.

R Implementation

In R, similar operations are performed using the sf package. For example, you can calculate areas, perform spatial intersections, and join attributes using a few lines of code:

library(sf)
library(dplyr)

# Load the shapefile and perform operations
mrc <- st_read("data/mrc.shp")
areas <- st_area(mrc)
head(areas)

# Convert area units
units(areas) <- "km^2"
head(areas)

Spatial Intersections

QGIS Implementation

  • Use the Join Attributes by Location tool in QGIS to find intersections between points and polygons, adding relevant attributes from one layer to another based on spatial relationships.

R Implementation

In R, spatial intersections can be easily managed with sf:

plots <- st_read("data/plots.shp")
inters <- st_intersects(plots, mrc)
head(inters)

Spatial Joins

QGIS Implementation

  • The Join Attributes by Location tool in QGIS can also perform spatial joins, allowing for the seamless integration of attributes from one layer to another.

R Implementation

plots_mrc <- st_join(plots, mrc)
head(plots_mrc)

Review of Geometric Operations

  • QGIS provides a graphical interface for geometric operations, making it accessible to non-coders.
  • R with the sf package offers script-based flexibility for performing similar tasks, such as measuring areas, performing spatial joins, and calculating distances.

3.3 Raster Data Handling

Introduction to Raster Data

QGIS Implementation

In QGIS, raster data is managed through the Raster menu. You can load raster files such as GeoTIFFs, explore their properties, and apply various operations, such as clipping and value extraction.

  • Load Raster Data: Use the Raster menu to import GeoTIFF files.
  • Explore Properties: The Layer Properties panel in QGIS provides information on the raster’s extent, coordinate reference system (CRS), and other metadata.

R Implementation

In R, raster data is handled using the stars package:

library(stars)

# Load raster data
cdem <- read_stars("data/cdem_022BC_3s.tif")
print(cdem)

Visualizing Raster Data

QGIS Implementation

  • Visualize raster data using the Layer Styling panel. Options like Singleband pseudocolor and Hillshade provide powerful ways to render and analyze raster layers in QGIS.

R Implementation

plot(cdem)

Working with Large Raster Files

QGIS Implementation

QGIS efficiently handles large raster datasets using Virtual Raster (VRT) functionality, which references multiple rasters without physically merging them. This allows for the processing of large datasets without overwhelming system memory.

R Implementation

R handles large rasters using the stars_proxy object, enabling manipulation without fully loading data into memory.

Performing Raster Operations

Clipping Raster Data

QGIS Implementation

  • Use the Clip Raster by Mask Layer tool in QGIS to clip a raster based on the extent of another vector layer.

R Implementation

cdem_part <- cdem[cdem$x > -67, , ]
plot(cdem_part)

Extracting Values from Raster

QGIS Implementation

  • Use the Point Sampling Tool plugin to extract raster values at specific point locations, a common task in spatial analysis.

R Implementation

library(raster)

# Convert to Raster object and extract values
cdem_r <- as(cdem, "Raster")
plots_elev <- raster::extract(cdem_r, plots)
plots$elev <- plots_elev
head(plots)

Review of Raster Data Operations

  • QGIS offers an intuitive graphical interface for raster operations, including clipping, sampling, and styling.
  • R provides robust script-based tools with the stars and raster packages, offering flexibility for large-scale raster data manipulation.

3.4 Creating Interactive Maps

Interactive Maps with QGIS

QGIS enables the creation of interactive maps through plugins like QGIS2Web and Lizmap:

  • QGIS2Web: Converts QGIS projects into web maps, similar to R’s leaflet package.
  • Lizmap: Publishes QGIS projects as interactive web maps, providing a user-friendly interface for web GIS applications.

Interactive Maps in R

Using mapview

library(mapview)

# Create interactive map
mapview(cdem_r) + mapview(plots, zcol = "cover_type")

Using leaflet

library(leaflet)

# Create interactive leaflet map
m <- leaflet() %>%
  addTiles() %>%
  addRasterImage(cdem_r, opacity = 0.8) %>%
  addMarkers(lng = 2.4278, lat = 6.3562, popup = "Sample Location")
m

Review of Interactive Mapping

  • QGIS: QGIS2Web and Lizmap provide powerful tools for creating and sharing interactive maps without coding.
  • R: mapview and leaflet offer dynamic and customizable map creation through scripting, suitable for web deployment.

3.5 Case Study: The SUNGEO Package

The SUNGEO R Package is a powerful tool designed for users working with original (unpublished, private, proprietary) data. This statistical software package offers a range of embedded methodological tools specifically for sub-national data processing and integration. The SUNGEO Geoprocessing Toolkit provides researchers with a set of routines, comprehensive documentation, and source code to implement various (dis-)aggregation techniques with their own datasets. These tools include capabilities for interpolating and integrating spatially-misaligned GIS datasets, batch geocoding of addresses, and selecting, downloading, and loading spatial data directly into R from SUNGEO’s servers.

Key Features and Resources:

Data Download and Integration: - The package allows seamless downloading of spatial data from SUNGEO’s API, which can be tailored to specific countries, topics, or boundary sets. Users can retrieve datasets for single or multiple countries, with options for selecting time frames, spatial units, and other parameters.

Geocoding

Geocoding:

The SUNGEO R package includes robust geocoding capabilities that allow users to convert addresses into geographic coordinates. This feature is essential for researchers who need to map data points based on addresses, making it easier to analyze spatial patterns.

Single Address Geocoding:

  • Users can quickly obtain the geographic coordinates for a single address with a simple command. For example, to get the coordinates for “Michigan Stadium,” the following function is used:
geocode_osm("Michigan Stadium")

This returns the top match for the address, providing the latitude and longitude coordinates.

Detailed Geocoding Results:

  • For more comprehensive information, users can request detailed results for the top match, including additional data such as bounding box information and address components:
geocode_osm("Michigan Stadium", details=TRUE)

Batch Geocoding:

  • The package also supports batch geocoding, which is particularly useful when working with large datasets that contain multiple addresses. For instance, to geocode a list of city names like “Ann Arbor,” “East Lansing,” and “Columbus,” the following command can be used:
geocode_osm_batch(c("Ann Arbor","East Lansing","Columbus"))

This function returns the top matches for each address. For a more detailed output, including all possible matches, users can adjust the parameters:

geocode_osm_batch(c("Ann Arbor","East Lansing","Columbus"), details = TRUE, return_all = TRUE)

Additionally, the batch geocoding process can include progress reports to keep users informed about the status of their data processing:

geocode_osm_batch(c("Ann Arbor","East Lansing","Columbus"), verbose = TRUE)

Changes in Geographic Support:

SUNGEO also offers tools for handling changes in geographic support, such as area-weighted and population-weighted polygon-to-polygon interpolation. These methods are critical when working with spatial datasets that need to be adjusted for different geographic units or when data from different sources must be aligned.

Area-Weighted Interpolation:

  • Users can perform area-weighted interpolation to transfer data from one set of polygons to another. For example, to interpolate voter turnout data from legislative constituencies to a hexagonal grid:
out_1 <- poly2poly_ap(poly_from = clea_deu2009,
                      poly_to = hex_05_deu,
                      poly_to_id = "HEX_ID",
                      varz = "to1")

The results can be visualized at the grid cell level to understand spatial distribution patterns:

plot(out_1["to1_aw"])

Population-Weighted Interpolation:

  • For cases where population density is a critical factor, the population-weighted interpolation method can be used. This method adjusts the data based on the population distribution within the polygons, providing a more accurate representation of data like voter turnout:
out_2 <- poly2poly_ap(poly_from = clea_deu2009,
                      poly_to = hex_05_deu,
                      poly_to_id = "HEX_ID",
                      varz = "to1",
                      methodz = "pw",
                      pop_raster = gpw4_deu2010)

The resulting interpolated data can be visualized similarly to the area-weighted results:

plot(out_2["to1_pw"])

Point-to-Polygon Interpolation:

  • SUNGEO includes tools for point-to-polygon interpolation, which is useful when data is available at specific points, such as election results or environmental measurements, and needs to be aggregated into polygons. For example, using tessellation methods or ordinary kriging:
out_4 <- point2poly_tess(pointz = clea_deu2009_pt,
                         polyz = hex_05_deu,
                         poly_id = "HEX_ID",
                         varz = "to1")
plot(out_4["to1_aw"])

Advanced Geostatistical Methods:

  • The package also supports advanced geostatistical techniques, such as ordinary kriging and universal kriging, which allow for more sophisticated spatial interpolations that account for spatial autocorrelation and other factors:
out_5 <- point2poly_krige(pointz = clea_deu2009_pt,
                          polyz = clea_deu2009,
                          yvarz = "to1")

This can be particularly valuable for predicting values in unsampled locations based on known data points.

Line-in-Polygon Analysis:

SUNGEO includes functionality for analyzing the relationship between linear features (like roads or rivers) and polygons (like administrative boundaries or grid cells). This is important for tasks like calculating road densities within regions or determining the distance of each polygon to the nearest highway.

Basic Map Overlay:

  • Users can perform basic overlays of linear features on polygons to visualize relationships between the two:
plot(hex_05_deu["geometry"])
plot(highways_deu1992$geometry, add=TRUE, col = "blue", lwd=2)

Calculating Road Lengths and Densities:

  • The package provides tools for calculating the length and density of roads within each polygon and the distance from each polygon to the nearest road:
out_8 <- line2poly(linez = highways_deu1992,
                   polyz = hex_05_deu,
                   poly_id = "HEX_ID")
plot(out_8["line_length"])

These metrics can be crucial for understanding infrastructure accessibility and its impact on spatial phenomena, such as voter turnout or economic activity.

Projections and Coordinate Reference Systems (CRS):

Working with spatial data often requires re-projecting datasets into different coordinate systems to ensure accuracy and consistency. SUNGEO simplifies this process by automatically selecting appropriate planar CRS for unprojected datasets and re-projecting geometries as needed.

Automatic CRS Selection:

  • The package can automatically identify a suitable CRS based on the geographic extent of the data:
out_10 <- utm_select(clea_deu2009)

Visualization of Re-Projected Data:

  • Once re-projected, the data can be visualized to verify the transformation and ensure that spatial relationships are preserved:
plot(out_10["geometry"], axes=TRUE)

These tools help maintain the integrity of spatial analyses when working with diverse datasets from different sources.

Rasterization of Polygons:

The package includes functions to convert polygon data into raster formats, which can be useful for a variety of spatial analyses that require grid-based data. Rasterization enables users to create grid layers that can be used for overlay analysis, spatial interpolation, and other tasks.

Transforming Polygons into Rasters:

  • Users can transform polygon layers into raster layers with specified resolutions, which is essential for analyses that need consistent grid data:
out_11 <- sf2raster(polyz_from = utm_select(clea_deu2009),
                    input_variable = "to1")
raster::plot(out_11)

Creating Cartograms:

  • SUNGEO also supports the creation of cartograms, where the size of geographic areas is scaled according to a specific variable, such as voter turnout:
out_13 <- sf2raster(polyz_from = utm_select(clea_deu2009),
                    input_variable = "to1",
                    cartogram = TRUE,
                    carto_var = "vv1")
raster::plot(out_13)

This feature provides a powerful way to visualize spatial data in a way that emphasizes the importance of certain areas based on their attributes.

Reverse Rasterization:

  • For users who need to convert raster data back into polygon format, the package offers reverse rasterization, allowing for the polygonization of cartogram rasters:
out_14 <- sf2raster(reverse = TRUE,
                    poly_to = out_14a$poly_to,
                    return_output = out_14a$return_output,
                    return_field = out_14a$return_field)
plot(out_14["to1"])

To expand on the provided section on Advanced Spatial Data Manipulation Using QGIS and R, and integrate a focus on time series data, you can introduce a dedicated section for handling time series geospatial data, leveraging both QGIS and R. Here’s how you can approach this:


3.6 Time Series Geospatial Data Analysis

Introduction

Time series geospatial data analysis involves studying spatial phenomena over time, allowing researchers to observe trends, patterns, and changes across both spatial and temporal dimensions. This type of analysis is crucial for various applications, including climate monitoring, urban development, and environmental change studies. Both QGIS and R offer robust tools for handling and visualizing time series geospatial data, enabling comprehensive analysis.

Objectives

  • Introduce methods for handling and analyzing time series geospatial data in QGIS and R.
  • Demonstrate how to visualize time series data using mapview and leaflet in R and equivalent tools in QGIS.
  • Explore temporal animations to visualize changes over time.

Working with Time Series Data in QGIS

Loading and Visualizing Time Series Data

In QGIS, time series data can be loaded and visualized using the following steps:

  1. Load Data: Use the Data Source Manager to import time series data. For raster data, ensure that each time step is represented as a separate raster layer. For vector data, use a single layer with a time attribute.

  2. Visualize Temporal Data:

    • Install the Time Manager plugin in QGIS.
    • Configure the plugin by selecting the layer, specifying the time attribute, and setting the time step interval.
    • Use the Time Manager to animate the data, observing how the spatial patterns change over time.

Temporal Analysis in QGIS

  • Temporal Heatmaps: Use the Heatmap plugin combined with the Time Manager to create animated heatmaps that show intensity variations over time.
  • Raster Calculator: Perform calculations between different time steps using the Raster Calculator, such as subtracting one time step from another to identify changes.

Time Series Geospatial Data in R

Introduction to Time Series Handling in R

In R, time series data can be handled using the sf, stars, and tidyverse packages. These tools allow for flexible and powerful manipulation of both spatial and temporal dimensions.

library(sf)
library(stars)
library(dplyr)
library(lubridate)

# Load a time series dataset (e.g., a stack of rasters or a time-stamped point dataset)
time_series_data <- st_read("data/time_series_data.shp")

# Ensure the time attribute is in a suitable format
time_series_data$time <- ymd_hms(time_series_data$time)

Visualizing Time Series Data with mapview and leaflet

R offers several packages for creating interactive time series visualizations:

library(mapview)
library(leaflet)

# Example: Create an interactive map with a time slider
map <- leaflet(time_series_data) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~longitude, lat = ~latitude, radius = 5,
                   color = ~time_series_data$color,
                   popup = ~paste("Time:", time_series_data$time)) %>%
  addTimeSlider(time_series_data$time)
map

Analyzing Temporal Patterns

  • Temporal Aggregation: Aggregate data over specific time intervals (e.g., monthly, yearly) using group_by() and summarize() functions in R.
# Example: Aggregate by month
monthly_data <- time_series_data %>%
  group_by(month = floor_date(time, "month")) %>%
  summarize(value = mean(value, na.rm = TRUE))
  • Trend Analysis: Perform trend analysis by fitting time series models to the data, such as linear regression or more complex time series models.
# Example: Fit a linear trend model
trend_model <- lm(value ~ time, data = time_series_data)
summary(trend_model)

3.7 Creating Temporal Animations

QGIS Temporal Animations

  • Use the Time Manager plugin to create temporal animations within QGIS. This allows you to visualize changes over time directly in the QGIS interface.

QGIS provides robust tools for visualizing and analyzing time series geospatial data. The most common way to work with time series data in QGIS is by using the Temporal Controller and Time Manager (now integrated into QGIS as the Temporal Controller). Below is a step-by-step guide on how to visualize time series data in QGIS.

Step 1: Preparing Your Data

Ensure that your data contains a time attribute, which is crucial for time series analysis. This could be in the form of a timestamp, date, or other time-related fields.

  • Vector Data: Should have a date or time field.
  • Raster Data: Each time step should be stored as a separate raster file or band.

Step 2: Loading Your Data

  1. Open QGIS and load your vector or raster time series data.
    • For vector data: Use the Add Vector Layer option to load your shapefiles or other vector formats.
    • For raster data: Use the Add Raster Layer option to load your time series raster files.

Step 3: Activating the Temporal Controller

  1. Enable the Temporal Controller:
    • Go to the View menu and select Panels > Temporal Controller to open the temporal control panel.
    • This panel allows you to control time-based data and animations.

Step 4: Configuring Temporal Properties

  1. Set Temporal Properties for Layers:
    • Right-click on your vector or raster layer and select Properties.
    • Go to the Temporal tab.
    • Check Enable temporal properties for this layer.
    • For vector data, choose the Field with date/time that corresponds to the time data.
    • For raster data, set the Start and End time for the layer if each raster represents a time step.

Step 5: Running the Temporal Animation

  1. Configure the Temporal Controller:
    • In the Temporal Controller panel, set the time range that you want to animate.
    • Adjust the step duration to control the speed of the animation.
    • Choose whether you want the animation to play continuously or in steps.
  2. Run the Animation:
    • Click the Play button in the Temporal Controller panel to start the animation.
    • The map will automatically update to show how the data changes over time.

Step 6: Exporting the Animation

  1. Export to Video:
    • QGIS allows you to export the temporal animation as a video.
    • Click on the Export as Video button in the Temporal Controller panel.
    • Choose the file format and location to save the video.

Additional Tips

  • Temporal Aggregation: If your data has multiple points or polygons that overlap in time, consider aggregating the data (e.g., summing values) for clearer visualization.
  • Time Formats: Ensure your time data is in a format that QGIS can recognize, such as ISO 8601 (YYYY-MM-DD).

Example: Visualizing Monthly Temperature Data

Assume you have a dataset with monthly temperature readings:

  1. Load the temperature data into QGIS.
  2. Set the Temporal Properties for your temperature layer by selecting the month field as the time attribute.
  3. Run the Temporal Controller to visualize temperature changes over the year.

By following these steps, you can create a dynamic time series visualization that shows how your data evolves over time directly within QGIS.

R Temporal Animations

  • In R, use the gganimate package in conjunction with ggplot2 for creating temporal animations.
library(ggplot2)
library(gganimate)

# Example: Animate a time series
p <- ggplot(time_series_data, aes(x = longitude, y = latitude, color = value)) +
  geom_point() +
  transition_time(time) +
  labs(title = "Time: {frame_time}")
animate(p)

Review of Time Series Geospatial Analysis

  • QGIS provides tools like the Time Manager plugin for non-coding users to visualize and animate time series data easily.
  • R offers extensive capabilities for time series analysis, allowing for complex statistical modeling and the creation of interactive visualizations and animations.

3.8 Conclusion

This chapter has provided a comprehensive overview of advanced geospatial data manipulation techniques using both QGIS and R. By integrating vector and raster data handling, performing spatial operations, and creating interactive maps, we’ve highlighted the strengths of both environments. Whether you prefer the graphical interface of QGIS or the scripting power of R, mastering these tools will enhance your geospatial analysis capabilities.