11  Geolocation Processing

11.1 Overview

All preliminary geolocation processing is done on the virtual machine. Raw geolocation data are always saved on the restricted drive (R drive), in the folder for the project the data belong to. Processed geolocation data are saved on the standard drive (S drive).

To log into the virtual machine, see the virtual machines page.

Talk with Susan about getting a userid on the VM if you do not already have one.

11.2 Initial System Configuration

Several steps are needed to prepare a virtual machine for use with the GPS2 Docker environment. They are described here.

11.2.1 Installing Docker & Docker Compose

GPS2 uses containerized PostGIS and Nominatim services to ensure consistent database environments. These services run in containers managed by the Docker engine. Containers not only give us quick and easy access to this software, but also keep that access consistent across operating systems and platforms.

It is recommended to add any user that is regularly interacting with Docker to the docker group, as this will allow them to run the engine without sudo. This is mainly done for convenience, but also to prevent permission issues with R scripts that call Docker commands.

Finally, systemctl enable ensures that Docker will start automatically on boot, which is useful for long-term analysis jobs / setups.

sudo apt update
sudo apt install -y docker.io docker-compose
sudo systemctl enable --now docker
sudo usermod -aG docker $USER # logout/login required

To verify adequate installation, one can run these commands:

docker --version
docker-compose --version
docker ps # notably, this should work without sudo after logout/login
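If `docker ps` still asks for elevated permissions, it can help to confirm that the current session has actually picked up the docker group. The following is a minimal sketch; the helper name `in_current_groups` is hypothetical and not part of GPS2.

```shell
# in_current_groups GROUP: succeed if the current login session belongs to GROUP.
# `id -nG` with no user argument reports the groups of the running session,
# which only change after logging out and back in.
in_current_groups() {
  id -nG | tr ' ' '\n' | grep -qx "$1"
}

user="$(id -un)"
if in_current_groups docker; then
  echo "$user: this session is in the docker group"
else
  echo "$user: this session is NOT in the docker group"
  echo "run 'sudo usermod -aG docker $user', then log out and back in"
fi
```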

11.2.2 Verify and Expand /var Partition for Docker Storage

The following change has already been made, but it’s documented here for posterity: Docker stores volumes in /var/lib/docker by default, and Nominatim requires ~5-8GB of disk space during the initial Wisconsin OSM import (plus growth over time). If the /var partition is too small, the Nominatim import will fail.

Check available space:

df -h /var  # Should show at least 10GB available for Nominatim

If /var has less than 10GB available, check for free space in the LVM volume group:

sudo vgs  # Look at VFree column

If free space exists (e.g., 20GB+ available), extend /var:

# Add 10GB to /var (adjust size based on availability)
sudo lvextend -L +10G /dev/Volume00/var
sudo resize2fs /dev/mapper/Volume00-var

# Verify new size
df -h /var  # Should now show ~15GB total
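The space check above can also be scripted so that it reports a clear pass or fail. This is only a sketch: the helper name `avail_gb` is hypothetical, and the 10GB threshold is the figure used in this section.

```shell
# avail_gb PATH: print the whole gigabytes available on the filesystem holding PATH.
# Uses POSIX `df -Pk`, where column 4 is the available space in 1K blocks.
avail_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

required=10   # GB, the threshold used in this section
free="$(avail_gb /var)"
if [ "$free" -ge "$required" ]; then
  echo "/var has ${free}GB available: OK"
else
  echo "/var has only ${free}GB available: consider extending it"
fi
```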

11.2.3 Fix AppArmor Compatibility Issue

Ubuntu 24.04’s default AppArmor runc profile blocks Docker container execution. It is necessary to disable it:

sudo mv /etc/apparmor.d/runc /etc/apparmor.d/runc.disabled
sudo systemctl restart docker

Then, verify Docker works:

docker run --rm hello-world

11.2.4 Installing PostgreSQL Client Tools

Throughout GPS2, psql commands are used sporadically for direct database testing and verification. Most of the database connections can be handled by R’s DBI, but having psql allows for certain debugging capabilities without the additional complication of R.

sudo apt install -y postgresql-client
psql --version # to verify

11.2.5 Installing R

This is likely already done, but for posterity:

To check for an existing R installation:

R --version

If R is missing or below 4.0:

sudo apt install -y r-base r-base-dev
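The version comparison can be automated rather than read off by eye. The sketch below assumes GNU `sort -V` is available (it is on Ubuntu) and that the first line of `R --version` has the usual "R version X.Y.Z ..." form; `version_ge` is a hypothetical helper.

```shell
# version_ge A B: succeed if version A is greater than or equal to version B.
# Sorts the two versions with `sort -V` and checks that B comes first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v R >/dev/null 2>&1; then
  # The third field of the first line is the version number, e.g. 4.3.1
  r_version="$(R --version | head -n 1 | awk '{ print $3 }')"
  if version_ge "$r_version" "4.0"; then
    echo "R $r_version is recent enough"
  else
    echo "R $r_version is older than 4.0: consider upgrading"
  fi
else
  echo "R is not installed"
fi
```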

11.2.6 Installing Quarto

Quarto was chosen for its support of multi-language notebooks, research documentation, and reproducibility, and for its native support for bash in code chunks.

# download latest stable Quarto (v1.8.25 as of 10/16/2025)
wget https://github.com/quarto-dev/quarto-cli/releases/download/v1.8.25/quarto-1.8.25-linux-amd64.deb

sudo dpkg -i quarto-1.8.25-linux-amd64.deb # install

To verify:

quarto --version
quarto check

11.2.7 System Libraries for R Spatial Packages

System libraries are precompiled C/C++ libraries installed at the OS level; they provide the core functionality for a variety of geospatial operations within GPS2. R packages like sf, geosphere, and httr don’t re-implement these capabilities, but rather act as “wrappers” that compile against and call these system libraries. Install these libraries now so that later package installation in R (e.g., install.packages("sf")) can compile without error.

sudo apt install -y \
  libgdal-dev \
  libproj-dev \
  libgeos-dev \
  libudunits2-dev \
  libcurl4-openssl-dev \
  libssl-dev \
  libxml2-dev \
  libabsl-dev

11.2.7.0.1 GDAL (Geospatial Data Abstraction Library)

Powers the sf package for reading and writing spatial data formats (shapefiles, GeoJSON). Examples of implementation can be found in 05-spatial-zoning-analysis.qmd.

11.2.7.0.2 PROJ (Cartographic Projections)

Handles coordinate system transformations (e.g., WGS84 to State Plane Wisconsin). This is important for maintaining accurate distance calculations in clustering. GPS2 uses EPSG:4326 (WGS84) consistently, but PROJ ensures future flexibility and is an sf dependency.

11.2.7.0.3 GEOS (Geometry Engine Open Source)

Provides spatial operations (intersections, buffers, unions). Example in 05-spatial-zoning-analysis.qmd with “point-in-polygon” tests (which clusters fall within which zoning districts). Core dependency for PostGIS spatial functions.

11.2.7.0.4 udunits2

Handles unit conversions (miles <-> meters, hours <-> minutes), primarily used in GPS point processing and filtering. Dependency of the units package in R.

11.2.7.0.5 libcurl

Enables HTTP requests for httr package. Primarily used in 04-reverse-geocoding.qmd for local Nominatim API calls, and is also needed to download the Wisconsin OSM data if refreshing Nominatim.

11.2.7.0.6 libssl

Provides SSL/TLS encryption for secure HTTPS connections, and is required by libcurl to make HTTPS requests. Powers devtools::source_url() for format_path(), and installing CRAN packages through install.packages().

11.2.7.0.7 libxml2

Parses XML and HTML documents into structured data. It is a dependency of httr and xml2 (itself a dependency of many tidyverse packages).

11.2.7.0.8 libabsl

Required by s2 package, which itself is a dependency of the sf package.
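To confirm that each of the libraries described above is actually installed, the package list can be checked against dpkg. This is a minimal sketch; `is_installed` is a hypothetical helper, and the package names are the ones from the apt command in this section.

```shell
# is_installed PKG: succeed if dpkg reports PKG as fully installed.
is_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"
}

for pkg in libgdal-dev libproj-dev libgeos-dev libudunits2-dev \
           libcurl4-openssl-dev libssl-dev libxml2-dev libabsl-dev; do
  if is_installed "$pkg"; then
    echo "ok       $pkg"
  else
    echo "MISSING  $pkg"
  fi
done
```

Any package reported as MISSING can be installed individually with sudo apt install -y before moving on to the R packages.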

11.2.8 Install CMake

CMake is a cross-platform build system for compiling C/C++ code. It will be needed if certain R packages (notably sf) fail to compile against our system libraries.

sudo apt install -y cmake

To verify:

cmake --version

11.3 R Package Installation

11.3.1 Temporary R Directory

First, we are going to configure our own R temporary directory. We do this to work around the common university practice of mounting /tmp with noexec for security reasons. Notably, that option prevents R from executing configure scripts during package compilation.

To check if this will be an issue, run:

mount | grep /tmp

If you see ‘noexec’, complete the steps below:

# create a temporary directory in home folder
mkdir -p ~/R_temp

# configure R to use it permanently (replace "user" with your username in the TMPDIR line)
cat > ~/.Renviron << 'EOF'
TMPDIR=/home/user/R_temp
EOF

To verify:

Sys.getenv("TMPDIR") # should show /home/user/R_temp

If it doesn’t show the correct and updated directory, remember to fully restart and reload R/RStudio. If this still doesn’t work, brute-force it within the R console by manually setting the path.
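Beyond reading the mount options, it is possible to test directly whether a directory allows execution, which is what R needs for configure scripts. The sketch below creates a tiny probe script in the directory and tries to run it; `can_exec_in` is a hypothetical helper, and ~/R_temp is the directory created earlier in this section.

```shell
# can_exec_in DIR: succeed if a script created in DIR can be executed.
# Returns 2 if the probe file cannot be created, and the probe's own
# exit status otherwise (non-zero on a noexec mount).
can_exec_in() {
  local probe
  probe="$(mktemp "$1/exec_probe.XXXXXX" 2>/dev/null)" || return 2
  printf '#!/bin/sh\nexit 0\n' > "$probe"
  chmod +x "$probe"
  "$probe" 2>/dev/null
  local rc=$?
  rm -f "$probe"
  return "$rc"
}

for dir in /tmp "$HOME/R_temp"; do
  if [ ! -d "$dir" ]; then
    echo "$dir does not exist"
  elif can_exec_in "$dir"; then
    echo "$dir allows execution"
  else
    echo "$dir does NOT allow execution (possibly mounted noexec)"
  fi
done
```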

11.4 Package List and Installation

This is the current R package list for GPS2 (updated: 10/16/2025):

install.packages(c(
    "tidyverse",  
    "here",      
    "DBI",       
    "RPostgres",   
    "sf",        
    "httr",       
    "jsonlite",    
    "geosphere",   
    "leaflet",    
    "lubridate"    
  ))

Because devtools is a large and complex package, and we only need format_path() from lab_support, we source that function directly using base R.

source("https://raw.githubusercontent.com/jjcurtin/lab_support/main/format_path.R")

11.5 Mounting Standard and Restricted Research Drives

For ease of setup, a bash script that automatically mounts both drives is available to all users on the Linux VM. Following lab policy, it is stored in /usr/local/bin.

Simply run mount-uw-drives and follow the prompts to enter your NetID and passwords. A companion script for unmounting the research drives, unmount-uw-drives, can be found at the same location.

11.6 GPS2 Setup and Configuration

Now that all of the preliminary setup has been done, we can move to setting up the actual system.

11.6.1 Cloning into GPS2

An SSH key pair was generated on the research VM and configured as a shared authentication method. The private key (/etc/ssh/lab_github_key) is stored locally with read permissions granted to all VM users, while the corresponding public key was added as a deploy key to the GPS2 repository.

This allows all git operations (clone, push, pull) to be performed without individual GitHub accounts or collaborator permissions being required. The shared key is automatically used by SSH when connections to GitHub are established for this repository.

To clone the repo:

git clone git@github.com:christopher-janssen/GPS2.git

11.6.2 Specifying mounting points for GPS2

The GPS2 repository includes example config/env files for handling OS-specific file paths, because functions like format_path() do not work within Docker Compose YAML files.

Instead, run the following command in the /GPS2/docker-compose/ directory to configure the Linux file-path mounting.

cp .env.linux .env
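A plain cp overwrites any .env that already exists, including one that was edited by hand. A slightly more defensive variant is sketched below; `init_env` is a hypothetical helper, and the example path assumes GPS2 was cloned into the home directory.

```shell
# init_env DIR: copy DIR/.env.linux to DIR/.env unless .env already exists.
init_env() {
  local dir="$1"
  if [ ! -f "$dir/.env.linux" ]; then
    echo "no .env.linux found in $dir" >&2
    return 1
  fi
  if [ -f "$dir/.env" ]; then
    echo "$dir/.env already exists: leaving it untouched"
    return 0
  fi
  cp "$dir/.env.linux" "$dir/.env"
  echo "created $dir/.env"
}

# Example call (adjust the path to wherever GPS2 was cloned):
# init_env "$HOME/GPS2/docker-compose"
```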

11.6.3 Initializing GPS2 Docker Containers

Navigate to the /docker-compose directory of GPS2 and run:

docker-compose up -d

You can then monitor the Docker containers’ health and status with the command:

docker ps

Given the reduced CPU allotments, allow a sizable amount of time for the Nominatim image to fully initialize, as stopping the process during setup can cause errors.

Assuming all containers fully initialize, you are now ready to work through the standard GPS2 workflow!
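Rather than re-running docker ps by hand, the wait can be scripted as a polling loop. This is a sketch under assumptions: it presumes the container defines a Docker health check, the container name `nominatim` is a placeholder, and `wait_until`, `is_healthy`, `WAIT_TRIES`, and `WAIT_SLEEP` are hypothetical names introduced for illustration.

```shell
# wait_until CMD...: retry CMD until it succeeds or the attempts run out.
# Defaults to 60 tries, 30 seconds apart (about 30 minutes in total).
wait_until() {
  local tries="${WAIT_TRIES:-60}" pause="${WAIT_SLEEP:-30}" i=1
  while [ "$i" -le "$tries" ]; do
    if "$@"; then return 0; fi
    sleep "$pause"
    i=$((i + 1))
  done
  return 1
}

# is_healthy NAME: succeed if Docker reports container NAME as healthy.
# Only meaningful for containers that define a health check.
is_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}

# Example (check `docker ps` for the real container name):
# wait_until is_healthy nominatim && echo "Nominatim is ready"
```

Polling only reads the container state, so it does not interrupt the import while waiting.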