Chapter 3: Advanced Techniques for Geospatial Data Manipulation

3.1 Introduction

Geospatial data analysis is central to many scientific and practical fields, including urban planning, environmental management, public health, agriculture, and disaster response. The growing availability of spatial data from GPS receivers, remote sensing, and geotagged social media posts has greatly expanded the scope for detailed spatial analysis. As a result, the ability to handle and analyze spatial data has become a valuable skill.

This chapter presents essential techniques for manipulating geospatial data in two programming languages, R and Python. Scripted analyses can be automated, scaled to large datasets, and reproduced exactly, which gives them an advantage over workflows in traditional Geographic Information Systems (GIS) driven solely through a graphical user interface. Scripts also make it straightforward to combine spatial and non-spatial data in the same analysis, which supports complex, multifaceted studies.

3.2 Objectives

The primary objectives of this chapter include:

  • Introducing essential geospatial packages in R (sf, stars) and Python (geopandas, rasterio).
  • Demonstrating basic data loading, exploration, transformation, and visualization techniques.
  • Illustrating the creation of static and interactive maps to communicate spatial insights effectively.

3.3 Working with Vector Data

Reading Vector Data

Vector data represent spatial features as points, lines, or polygons, each linked to a row of attribute data. They are commonly stored in formats such as the ESRI shapefile, GeoPackage, or GeoJSON, and reading them correctly is the first step of any analysis.

R Example:

library(sf)
mrc <- st_read("data/mrc.shp")

Python Example:

import geopandas as gpd
mrc = gpd.read_file("data/mrc.shp")

These commands load the dataset into an sf data frame in R and a GeoDataFrame in Python, ready for subsequent spatial analysis.
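To make the feature-plus-attributes structure concrete without a shapefile on hand, the following minimal sketch reads a tiny GeoJSON FeatureCollection with Python's standard json module. GeoJSON is swapped in here only because it is plain text; the features, names, and values are hypothetical, and describe_features is an illustrative helper, not part of geopandas.

```python
import json

# A hypothetical FeatureCollection; every name and value here is illustrative.
GEOJSON_TEXT = """
{"type": "FeatureCollection",
 "features": [
  {"type": "Feature",
   "properties": {"name": "plot_1", "cover_type": "conifer"},
   "geometry": {"type": "Point", "coordinates": [-71.2, 46.8]}},
  {"type": "Feature",
   "properties": {"name": "road_1", "lanes": 2},
   "geometry": {"type": "LineString",
                "coordinates": [[-71.3, 46.7], [-71.1, 46.9]]}},
  {"type": "Feature",
   "properties": {"name": "zone_1", "pop2016": 250000},
   "geometry": {"type": "Polygon",
                "coordinates": [[[-72.0, 46.0], [-71.0, 46.0],
                                 [-71.0, 47.0], [-72.0, 46.0]]]}}
 ]}
"""


def describe_features(text):
    """Return (geometry type, attribute dict) pairs, one per feature."""
    collection = json.loads(text)
    return [(feature["geometry"]["type"], feature["properties"])
            for feature in collection["features"]]


for geom_type, attributes in describe_features(GEOJSON_TEXT):
    # Each feature pairs a geometry (point, line, or polygon) with its attributes
    print(geom_type, attributes)
```

Each printed line pairs one geometry with its own attribute dictionary, which is the same row-wise structure that st_read and gpd.read_file expose as a data frame with a geometry column.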

Examining Data

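Before analyzing a dataset, it is worth checking its coordinate reference system (CRS) and bounding box. The CRS describes the mathematical framework that links coordinates to locations on the Earth's surface and is essential for spatial accuracy; the bounding box gives the minimum and maximum x and y coordinates of all features, that is, the dataset's spatial extent.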

R Example:

st_bbox(mrc)
st_crs(mrc)

Python Example:

print(mrc.crs)
print(mrc.total_bounds)

These explorations confirm the spatial extent and the coordinate framework used in the dataset, ensuring appropriate data handling in subsequent steps.
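The bounding box is simply the minimum and maximum x and y over every vertex in the dataset. The sketch below, using only the standard library, computes it for GeoJSON-style geometries and returns the values in the order (minx, miny, maxx, maxy), the same order used by total_bounds and st_bbox. The helpers iter_positions and bounding_box, and the sample coordinates, are hypothetical and for illustration only.

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position such as [x, y]; any extra values (e.g. z) are ignored
        yield coords[0], coords[1]
    else:
        # A nested array (line, ring, polygon, ...): recurse into each part
        for part in coords:
            yield from iter_positions(part)


def bounding_box(geometries):
    """Return (minx, miny, maxx, maxy) over all vertices of all geometries."""
    xs, ys = [], []
    for geom in geometries:
        for x, y in iter_positions(geom["coordinates"]):
            xs.append(x)
            ys.append(y)
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical geometries in longitude/latitude
geometries = [
    {"type": "Point", "coordinates": [-70.5, 46.5]},
    {"type": "Polygon",
     "coordinates": [[[-72.0, 46.0], [-71.0, 46.0], [-71.0, 47.0], [-72.0, 46.0]]]},
]
print(bounding_box(geometries))  # (-72.0, 46.0, -70.5, 47.0)
```

Because the box depends only on the extreme vertices, it is a cheap first check that the data lie where expected, for example that longitudes are negative for a study area in North America.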

Basic Visualization

Visualization offers immediate insights into spatial patterns, distributions, and potential anomalies, facilitating exploratory spatial data analysis.

R Example:

plot(mrc["geometry"], axes = TRUE)

Python Example:

import matplotlib.pyplot as plt

mrc.plot(edgecolor='black')
plt.show()

These plots provide rapid visual overviews, aiding quick identification of spatial relationships and potential issues.

Data Selection and Filtering

Spatial data, similar to tabular data, can be filtered and selected based on specific criteria, allowing targeted analyses and focused visualizations.

R Example:

mrc_subset <- mrc[mrc$pop2016 > 200000, c("mrc_name", "pop2016")]

Python Example:

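# Include the geometry column so the result remains a GeoDataFrame
# (in R, sf keeps the geometry column automatically)
mrc_subset = mrc.loc[mrc['pop2016'] > 200000, ['mrc_name', 'pop2016', 'geometry']]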

This selection isolates the features relevant to a specific research question, for example the most populous areas when planning infrastructure.
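Conceptually, an attribute filter on spatial data is an ordinary row filter in which each row's geometry travels with its attributes. The standard-library sketch below mimics the pop2016 > 200000 selection on GeoJSON-style features; filter_features and the sample MRC records are hypothetical.

```python
def filter_features(features, column, threshold, keep):
    """Return features whose `column` exceeds `threshold`, keeping only the
    `keep` attributes while carrying each geometry along unchanged."""
    selected = []
    for feature in features:
        props = feature["properties"]
        if props[column] > threshold:
            selected.append({
                "type": "Feature",
                "geometry": feature["geometry"],  # geometry stays with its row
                "properties": {name: props[name] for name in keep},
            })
    return selected


# Hypothetical MRC records (names, regions, and populations are invented)
square = {"type": "Polygon",
          "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}
mrcs = [
    {"type": "Feature", "geometry": square,
     "properties": {"mrc_name": "A", "reg_name": "R1", "pop2016": 250000}},
    {"type": "Feature", "geometry": square,
     "properties": {"mrc_name": "B", "reg_name": "R1", "pop2016": 80000}},
]
subset = filter_features(mrcs, "pop2016", 200000, ["mrc_name", "pop2016"])
print(subset)
```

Only record A passes the threshold, and it keeps its polygon; dropping the geometry at this step would turn the result back into a plain table.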

3.4 Integration with Data Manipulation Tools

Because sf objects are data frames and GeoDataFrames extend pandas DataFrames, spatial data can be processed with each language's standard data manipulation tools, dplyr in R and pandas in Python, which makes complex transformations and aggregations straightforward.

R Example:

library(dplyr)
regions <- mrc %>% 
  group_by(reg_name) %>% 
  summarize(pop2016 = sum(pop2016))

Python Example:

regions = mrc.dissolve(by='reg_name', aggfunc={'pop2016': 'sum'})

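Grouping and summarizing operations are particularly useful for regional analyses, such as evaluating demographic patterns or resource allocation strategies across administrative boundaries. In both examples, the geometries of all MRCs in a region are merged into a single regional geometry while their pop2016 values are summed.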
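The attribute side of a group-and-summarize operation can be sketched with a dictionary of running totals. The hypothetical sum_by_group helper below reproduces only the pop2016 sums; it does not merge geometries, which is the part that summarize() on an sf object and dissolve() in geopandas add on top. The sample rows are invented.

```python
from collections import defaultdict


def sum_by_group(records, by, column):
    """Sum `column` within each distinct value of `by` (attributes only)."""
    totals = defaultdict(int)
    for record in records:
        totals[record[by]] += record[column]
    return dict(totals)


# Hypothetical attribute rows; the values are invented for illustration
rows = [
    {"mrc_name": "A", "reg_name": "R1", "pop2016": 250000},
    {"mrc_name": "B", "reg_name": "R1", "pop2016": 80000},
    {"mrc_name": "C", "reg_name": "R2", "pop2016": 50000},
]
print(sum_by_group(rows, "reg_name", "pop2016"))  # {'R1': 330000, 'R2': 50000}
```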

3.5 Creating Spatial Data from DataFrames

Non-spatial datasets containing coordinate information can be transformed into spatial objects, enabling them to be visualized and analyzed spatially. This step broadens the applicability of spatial analysis to traditional tabular datasets.

R Example:

plots <- read.csv("data/plots.csv")
plots_sf <- st_as_sf(plots, coords = c("long", "lat"), crs = st_crs(mrc))

Python Example:

import pandas as pd
plots = pd.read_csv("data/plots.csv")
plots_gdf = gpd.GeoDataFrame(plots, geometry=gpd.points_from_xy(plots.long, plots.lat), crs=mrc.crs)

Such transformations are essential for integrating and analyzing observational data, such as environmental monitoring locations or survey points.
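Under the hood, converting a table to points pairs each row's attributes with a point built from its coordinate columns, with longitude as x and latitude as y. The sketch below does this with the standard csv module and emits GeoJSON-style features. The plot_id column, the sample values, and the csv_to_points helper are hypothetical, while long, lat, and cover_type follow the column names used in this chapter.

```python
import csv
import io

# Hypothetical CSV content standing in for data/plots.csv
CSV_TEXT = """plot_id,long,lat,cover_type
1,-71.21,46.81,conifer
2,-70.95,47.02,deciduous
"""


def csv_to_points(text, x_col="long", y_col="lat"):
    """Build a GeoJSON-style FeatureCollection of points from CSV text."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        # x (longitude) comes first, then y (latitude)
        x = float(row.pop(x_col))
        y = float(row.pop(y_col))
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": row,  # remaining columns become attributes (as strings)
        })
    return {"type": "FeatureCollection", "features": features}


collection = csv_to_points(CSV_TEXT)
print(collection["features"][0])
```

The [x, y] order matches coords = c("long", "lat") and points_from_xy(plots.long, plots.lat); swapping the two columns is a common source of misplaced points.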

3.6 Coordinate Reference Systems (CRS)

Understanding and correctly applying CRS is vital for accurate spatial analysis. It ensures consistency in spatial measurements and comparability across different spatial datasets.

Reprojecting Data

Reprojection transforms spatial data between different CRS, aligning datasets collected or stored in various spatial frameworks.

R Example:

mrc_proj <- st_transform(mrc, crs = 6622)

Python Example:

mrc_proj = mrc.to_crs(epsg=6622)

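Reprojection matters whenever datasets are combined or measured. Layers should share a CRS before spatial operations such as intersections, and distances and areas should be computed in a projected CRS with linear units such as meters, because the ground distance covered by one degree of longitude shrinks toward the poles.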
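Why degrees are a poor unit for distances can be shown with a short calculation. The sketch below uses the haversine formula on a sphere (a simplification; projected CRSs are defined on ellipsoids) to measure the ground length of one degree of longitude at the equator and at the latitude of Quebec City (46.8139° N, the map centre used for the interactive map in this chapter). The sphere radius is an assumed mean Earth radius, and haversine_m is an illustrative helper.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius (spherical model)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Ground length of one degree of longitude at two latitudes
at_equator = haversine_m(0.0, 0.0, 1.0, 0.0)
at_quebec = haversine_m(-71.2080, 46.8139, -70.2080, 46.8139)
print(round(at_equator), round(at_quebec))
```

The first value is about 111,000 m and the second about 76,000 m, so a buffer or distance threshold expressed in degrees would mean very different ground distances across a study area; projecting first avoids this.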

3.7 Customizing Maps with Visualization Libraries

Advanced visualizations, such as thematic maps, enhance the interpretability of spatial data, effectively communicating insights to diverse stakeholders.

Using ggplot2 in R

library(ggplot2)
ggplot() +
  geom_sf(data = mrc_proj) +
  geom_sf(data = plots_sf, aes(color = cover_type), size = 1) +
  theme_bw()

Using geopandas and matplotlib in Python

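import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 8))
mrc_proj.plot(ax=ax, color='lightgrey', edgecolor='black')
# Reproject the plots to the CRS of mrc_proj so both layers line up
plots_gdf.to_crs(mrc_proj.crs).plot(ax=ax, column='cover_type', legend=True, markersize=10)
plt.title('Forest Inventory Plots')
plt.show()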

Detailed visualizations facilitate targeted policy decisions, resource allocation, and comprehensive environmental assessments.

3.8 Interactive Maps

Interactive maps offer dynamic visual exploration, enhancing user engagement and understanding of complex spatial relationships.

Using Leaflet in R

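library(leaflet)
# Leaflet works in longitude/latitude (WGS84, EPSG:4326), so reproject first
leaflet(st_transform(mrc_proj, crs = 4326)) %>%
  addTiles() %>%
  addPolygons()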

Using Folium in Python

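import folium
m = folium.Map(location=[46.8139, -71.2080], zoom_start=6)  # [latitude, longitude]
# Leaflet maps use WGS84 longitude/latitude; reproject explicitly before adding
folium.GeoJson(mrc_proj.to_crs(epsg=4326)).add_to(m)
m  # renders in a Jupyter notebook; in a script, use m.save('map.html')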

Interactive tools are particularly valuable for stakeholder presentations and participatory planning processes, allowing real-time spatial exploration.
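Leaflet web maps, which both the R leaflet package and folium build on, work in longitude/latitude coordinates on WGS84 (EPSG:4326), so projected layers generally need converting before they are mapped. The definitive check is the layer's CRS itself (for example, mrc_proj.crs.to_epsg() in geopandas), but the hypothetical looks_like_lonlat helper below sketches a quick sanity check on a bounding box. It is only a heuristic: projected coordinates that happen to lie near the origin would also pass.

```python
def looks_like_lonlat(bounds):
    """Heuristic: True if (minx, miny, maxx, maxy) fit longitude/latitude ranges."""
    minx, miny, maxx, maxy = bounds
    return -180 <= minx <= maxx <= 180 and -90 <= miny <= maxy <= 90


# Hypothetical extents: one in degrees, one in projected meters
lonlat_bounds = (-79.8, 45.0, -57.1, 62.6)
projected_bounds = (-830000.0, 117000.0, 780000.0, 2100000.0)
print(looks_like_lonlat(lonlat_bounds), looks_like_lonlat(projected_bounds))
```

With geopandas, the same function can be applied to a layer's extent, for example looks_like_lonlat(mrc_proj.total_bounds), as a last check before building a web map.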

3.9 Conclusion

This chapter detailed essential methods for geospatial data manipulation using R and Python, establishing a foundation for robust spatial analysis. Mastery of these techniques enables analysts to address complex spatial challenges and communicate insights effectively, underpinning informed decision-making across various scientific and practical applications.