Work with Shapefiles in Data Hub
Last Updated: July 05, 2020

## Introduction

Shapefiles are a proprietary but common geospatial file format developed by ESRI, and are frequently used by governments to store geospatial data.

Many shapefiles can be uploaded directly into a Data Hub space, but some require extra processing steps first.

In this tutorial, we’ll cover what you need to do to successfully import shapefiles, including the extra steps and open source tools required for the trickier ones.

This tutorial assumes:

You should also install:

## Standard shapefile upload via the Data Hub CLI

Duration is 5 min

Unlike a GeoJSON file, a shapefile is made up of a number of separate files. Shapefiles on the internet are usually zipped, but once uncompressed you will see a number of files with the same name but different extensions. Some of the more important ones are:

• .shp - contains the geometries of the features (points, lines, polygons)
• .dbf - contains the attributes for the features
• .prj - contains information about the projection and coordinate reference system (CRS)

If the shapefile uses WGS84 lat/lon coordinates (EPSG:4326) and is under 200MB, you should be able to upload it using the Data Hub CLI.

In the terminal, cd to the shapefile directory and run

here xyz upload space_id -f my_shapefile.shp

The CLI will look for my_shapefile.dbf and the other component files in the specified directory. (If the .dbf file is missing, no attributes of the geometries will be imported.)

Note that you can use -a to select attributes of features to convert into tags, which will let you filter features server-side when you access the Data Hub API.
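For example, you could convert an attribute into tags during upload like this (`state` is a hypothetical attribute name — substitute one that exists in your shapefile's .dbf):

```shell
# Upload the shapefile and convert each feature's "state" attribute into a tag
here xyz upload space_id -f my_shapefile.shp -a state
```

You could then request only features carrying a particular tag value when querying the Data Hub API.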

## Other open source tools

Duration is 5 min

Shapefiles are a highly variable format, and there will be cases where you need to modify the data before you can import it into your Data Hub space. You can do this with other open source geospatial tools, specifically mapshaper and QGIS.

### mapshaper

mapshaper is a command-line tool for editing and manipulating geospatial data in a variety of common formats.

https://github.com/mbloch/mapshaper
https://github.com/mbloch/mapshaper/wiki/Command-Reference

You can install it using npm:

npm install -g mapshaper

Note that mapshaper can modify shapefiles directly, or convert them into GeoJSON. Converting to GeoJSON will give you more options and faster uploads when bringing the data into Data Hub. The mapshaper documentation provides a wide variety of options, but a simple conversion command is:

mapshaper my_geodata.shp -o my_geodata.geojson

(Note that you can also specify -o format=geojson; otherwise, mapshaper will attempt to use the extension of the output filename to determine the format.)
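For instance, both of the following commands produce GeoJSON output (the filenames are placeholders):

```shell
# Format inferred from the .geojson output extension
mapshaper my_geodata.shp -o my_geodata.geojson

# Format stated explicitly, useful when the output extension is ambiguous
mapshaper my_geodata.shp -o format=geojson my_geodata.json
```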

### Data Hub QGIS plugin provided by HERE

Duration is 10 min

QGIS is an open source desktop GIS tool that lets you edit, visualize, manage, analyze and convert geospatial data. You can upload and download data from your Data Hub spaces using the Data Hub QGIS plugin. (The plugin is also available on GitHub.)

You can install the Data Hub QGIS plugin from within the QGIS plugin search tool if you have the “show experimental plugins” option checked in the plugin manager settings.

You can easily open almost any shapefile in QGIS, at which point you can save it to your Data Hub spaces using the Data Hub QGIS plugin, or export it as GeoJSON to the desktop to use the Data Hub CLI streaming upload options.
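For example, if you exported the layer from QGIS as GeoJSON (the filename here is a placeholder), you could stream it to your space with the CLI:

```shell
# Stream the QGIS-exported GeoJSON into an existing Data Hub space
here xyz upload space_id -f exported_from_qgis.geojson -s
```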

## Large individual features

Duration is 10 min

Some shapefiles may contain very large and extremely detailed individual lines or polygons. If a single feature is greater than 10-20MB, you may see HTTP 400 or 413 errors when you try to upload the shapefile. In many cases, this level of detail is unnecessary for web mapping. If so, you can try to simplify the feature using mapshaper or QGIS. You may also want to adjust Data Hub CLI upload parameters so less data is sent in each API request.

In order to optimize upload speed, the CLI “chunks” features together and then sends each chunk to the API. There are typically 200-400 features per chunk. While a large feature may be small enough to upload on its own, when combined with other features in a chunk, it may be too large for the API.

You can adjust the chunk size using -c – in this example, the CLI will upload 100 features per API request:

here xyz upload spaceID -f large_features.shp -c 100

Depending on the size of the features, you may want to try -c 10 (ten per request) or -c 1 (one at a time).
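For example, to send one feature per API request:

```shell
# Upload one feature at a time to avoid 413 (payload too large) errors
here xyz upload spaceID -f large_features.shp -c 1
```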

### mapshaper

You can simplify lines and polygons in shapefiles using -simplify.

mapshaper very_large_features.shp -simplify dp 20% -o simplified_features.geojson

Depending on the zoom level and extent of your web map (think the border of France at zoom 10 vs zoom 3), you can also try 10%, 5%, and 1%.

Note that for smaller shapefiles you can pipe output from mapshaper directly to the Data Hub CLI.

mapshaper big_shapefile.shp -o format=geojson - | here xyz upload spaceID -p property_name -t specific_tag -s

In this case, you must specify the output format as format=geojson since there is no output filename extension for mapshaper to reference. The - directs the output to stdout.

### QGIS

• open the shapefile in QGIS
• choose Vector -> Geometry Tools -> Simplify
• save the simplified data to a new Data Hub space using the Data Hub plugin

Note that the Simplify tool works in decimal degrees, and the default is 1 degree, which is probably not what you want. Useful values depend on the extent and zoom levels of your map, but 0.01, 0.001 and 0.0001 are good starting values.

## Very large shapefiles (> 200MB)

Duration is 10 min

The Data Hub CLI will attempt to load the entire shapefile into memory before uploading it to the API. This will generally work for shapefiles up to 200-300MB, but you will start to see Node.js memory errors for shapefiles larger than that.

While GeoJSON and CSV files can be streamed via the upload -s option, this option is not yet available for shapefiles. You will have the most success converting the shapefile to GeoJSON and then uploading it to your Data Hub space.

mapshaper big_data.shp -o format=geojson big_data.geojson
here xyz upload spaceID -f big_data.geojson -s

Note that -a is not available when -s is used, but you can still specify properties to convert into tags using -p.
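For example, to stream the GeoJSON while converting a hypothetical `county` property into tags (substitute a property name from your own data):

```shell
# Stream upload; -p converts the "county" property values into tags
here xyz upload spaceID -f big_data.geojson -s -p county
```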

You can also open the very large shapefile in QGIS and save directly to a Data Hub space using the Data Hub QGIS plugin, though this will be slower than using the CLI streaming feature.

## Projections and CRS (Coordinate Reference Systems)

Duration is 10 min

Just like standards, the beauty of projections is that there are so many to choose from. GeoJSON expects coordinates in unprojected WGS84 latitude/longitude (EPSG:4326). Many shapefiles are in different projections, or use local projections without lat/lon coordinates (e.g. state plane). Fortunately, it is easy to get mapshaper to convert the data into GeoJSON-friendly coordinates.

mapshaper different_projection.shp -proj wgs84 -o format=geojson - | here xyz upload spaceID -p property_name -t specific_tag -s

If you see any Node.js memory errors, you can break this into two steps:

mapshaper different_projection.shp -proj wgs84 -o format=geojson different_projection.geojson
here xyz upload spaceID -f different_projection.geojson