Motivation: How to Obtain the Roof Heights of a Building?
I recently heard about an architect who wanted to quickly obtain roof heights for some buildings in Switzerland. As mentioned in an earlier article, we are very fortunate in Switzerland to have detailed open government data, including swissBUILDINGS3D, which provides building geometry for 2.4 million buildings, and the Federal Register of Buildings and Dwellings for Swiss addresses. In version 3.0 of swissBUILDINGS3D, roof heights are part of the data model - sadly, these data fields are not accessible in the excellent 3D map viewer, and the GeoAdmin API does not include the buildings dataset; it is only available via download.
The Solution: Cloud-native data for the win
First, I thought this task would be a good opportunity to try Streamlit, a Python library and hosting service to create custom web applications. However, my result was cumbersome to develop with GDAL, slow in downloading and processing the data, and didn't match what the architect really wanted (though it does have a nice 3D viewer).
The next approach I took was to explore cloud-native design patterns with PMTiles and Geoparquet. I will explain the technical details below, but here is the final result: the Swiss Rooftop Explorer.
I recommend exploring the app: you can zoom in and out as you wish or type in your favorite Swiss zip code. You can also hit the "Random" button at the bottom right, which will teleport you to a random postal code (zip code) area.1
While this geo app serves the original intent of allowing the architect to obtain the roof heights of any Swiss building, it is also a technology demo with some notable features:
- Fast web map for approximately 3.2 million roof polygons in Switzerland, using PMTiles
- End-to-end data processing pipeline from a FileGeodatabase to PMTiles via Geoparquet
- Swiss addresses stored in a compact parquet file for quick retrieval
- Client-side querying of addresses using DuckDB-WASM
- No server component needed other than a web or object server, making it easy to run locally
- Bookmarking is possible, for example in Bern: https://ping13.net/swiss-rooftop-explorer/?zip=3011 (though it may take a couple of seconds until the app zooms to the area of interest)
The code, including the required data processing, is available on GitHub.
The rise of cloud-optimized geospatial data formats
In recent years, one of the most exciting developments in the geospatial data community has been the rise of cloud-optimized geospatial formats. As the need for and size of geospatial data continuously increase, traditional file downloads or feature services may become impractical for applications meant to operate on a planetary scale. Cloud-optimized geospatial formats like COGs (cloud-optimized GeoTIFFs), GeoParquet, and PMTiles have gained popularity. What they have in common is that during data creation, one can chunk, slice, and dice the data for easier consumption. Picking three examples:
- COGs have overviews baked into the data as well as internal tiles, which allows downloading only a subset of the raster data.
- Geoparquet files represent tabular data with spatial columns. Recently, the Parquet specification officially adopted GEOMETRY as a native data type. Parquet files have three main advantages: first, they are column-oriented, which means that the actual bytes for a column are stored in one contiguous block. Second, you can define row groups so that the data is chunked to match specific filtering needs (a minimal sketch of this follows right after this list). Third, the format allows for very efficient compression of the data.
- Lastly, PMTiles are a visualization format and take the tiled-data idea to the next level: every modern webmap service uses tiles at various zoom levels to visualize data, and each tile is usually a file or a blob in object storage. PMTiles consolidates all tiles into a single file while maintaining fast lookup efficiency.
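To make the row-group idea concrete, here is a minimal sketch (not from the actual pipeline; file and column names are made up) that writes a small Parquet file with tiny row groups and inspects the per-group statistics that predicate pushdown relies on:

import pyarrow as pa
import pyarrow.parquet as pq

# Toy table: a real GeoParquet file would additionally carry a WKB geometry
# column plus "geo" file metadata; here we only illustrate row groups.
table = pa.table({
    "dplz4": [3011, 3011, 8001, 8001],
    "height": [12.3, 9.8, 22.1, 17.5],
})

# A small row_group_size chunks the file; each row group stores its own
# min/max column statistics in the footer.
pq.write_table(table, "toy.parquet", row_group_size=2)

# A reader can inspect the footer and skip row groups whose statistics
# cannot match a filter such as dplz4 = 3011.
meta = pq.read_metadata("toy.parquet")
print(meta.num_row_groups)                     # 2
print(meta.row_group(0).column(0).statistics)  # min/max of dplz4 in group 0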
Using cloud-optimized geospatial formats for the rooftop explorer
I haven't used any COGs for the demo, but Geoparquet and PMTiles have proven to be very useful for developing this web geo application:
For buildings:
- Download and convert: I download the complete dataset of buildings from swissBUILDINGS3D and convert it into parquet files using swissbuildings3D_gdb2pq.py. The essential part of the work here is interpreting the specific 3D geometry of swissBUILDINGS3D into a geometry that can be saved in a geom column in the Geoparquet file.
- Transform the data to 2D polygons: In swissbuildings3D_process.py, I prepare another parquet file with a 2D polygon for the webmap.
- Create the PMTiles: With the data in a parquet format, converting it to PMTiles becomes extremely easy using DuckDB (see below). The actual script pq2pmtiles.sh is just a few lines of code and consists of a carefully designed SQL statement, which generates GeoJSON objects that are piped directly to Tippecanoe2 (a rough Python sketch of this step follows below).
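The actual script uses the DuckDB CLI and a shell pipe; a rough Python equivalent of the same idea might look as follows. Note that the parquet file name and the geom/roof_height column names are assumptions here, and the Tippecanoe options are just reasonable defaults, not the ones used in pq2pmtiles.sh:

import json
import subprocess
import duckdb

con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")

# Assumed file/column names: buildings_2d.parquet with columns geom and roof_height.
rows = con.execute("""
    SELECT ST_AsGeoJSON(geom) AS geometry, roof_height
    FROM 'buildings_2d.parquet'
""").fetchall()

# Write one GeoJSON Feature per line; Tippecanoe reads such a feature stream.
# (For millions of rows you would stream instead of fetchall().)
with open("buildings.geojsonl", "w") as f:
    for geometry, roof_height in rows:
        feature = {
            "type": "Feature",
            "geometry": json.loads(geometry),
            "properties": {"roof_height": roof_height},
        }
        f.write(json.dumps(feature) + "\n")

# Tippecanoe consolidates all zoom levels into a single PMTiles archive.
subprocess.run(
    ["tippecanoe", "-o", "rooftops.pmtiles", "-zg", "--force", "buildings.geojsonl"],
    check=True,
)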
For addresses, there is essentially only one step: download the SQLite database from the official site and convert it to parquet. The key here is that the data is grouped by zip code so that it can later be queried efficiently; see addresses_sqlite2pq.py (a sketch of the idea follows below).
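As a hedged sketch of that grouping idea (the SQLite file, table name, and row-group size are assumptions; the real logic lives in addresses_sqlite2pq.py), one can let DuckDB do the conversion and sort by zip code before writing, so that each zip code ends up in as few row groups as possible:

import duckdb

con = duckdb.connect()
con.execute("INSTALL sqlite; LOAD sqlite;")

# Assumed names: an 'addresses.sqlite' file with a table 'addresses' that has a
# DPLZ4 zip code column. Sorting by dplz4 clusters each zip code together, and
# ROW_GROUP_SIZE controls how fine-grained the later predicate pushdown can be.
con.execute("""
    COPY (
        SELECT *
        FROM sqlite_scan('addresses.sqlite', 'addresses')
        ORDER BY dplz4
    ) TO 'addresses.parquet' (FORMAT PARQUET, ROW_GROUP_SIZE 20000)
""")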
I copy the PMTiles file and the address parquet file to my own webhoster (alternatively, you may use your favorite object storage service). When using webhosting, you need to make sure that range requests are allowed - for Apache, the following .htaccess will do:
<IfModule mod_headers.c>
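# Let cross-origin clients send a Range header and read the range-related
# response headers (Content-Range etc.) needed for partial downloads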
Header set Access-Control-Allow-Methods "GET, POST, OPTIONS"
Header set Access-Control-Allow-Headers "Range, Content-Type, Authorization"
Header set Access-Control-Expose-Headers "Content-Range, Accept-Ranges, Content-Length"
</IfModule>
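To quickly verify that range requests actually work against your host, a small check like the following can help (a sketch; the URL is just an example):

import requests

# A server that honors Range requests answers 206 Partial Content and returns
# only the requested bytes, which is what PMTiles and Parquet readers rely on.
url = "https://ping13.net/data/addresses.parquet"
resp = requests.get(url, headers={"Range": "bytes=0-1023"})
print(resp.status_code)                   # expect 206, not 200
print(resp.headers.get("Content-Range"))  # e.g. bytes 0-1023/<file size>
print(len(resp.content))                  # expect 1024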
The end-to-end processing is more or less automated by the Makefile in the repository3.
Parquet files and DuckDB
Now, all that processing is paying off: I checked the resulting PMTiles file (350MB) using https://pmtiles.io by dragging the local file into the browser, and it looked good.
To consume and explore Parquet files, DuckDB is an excellent choice. I can directly use the files as tables in an SQL statement, for example:
SELECT COUNT(*) FROM 'https://ping13.net/data/addresses.parquet' WHERE dplz4 = 3011;
In the SQL statement above, we don't download the whole parquet file; we simply reference it as a table. When it comes to querying the data, the chunking technique from earlier becomes useful, because DuckDB supports "predicate pushdown":
[...] DuckDB can read Parquet row groups in parallel even within the same file and uses predicate pushdown to only scan the row groups whose metadata ranges match the WHERE clause of the query. However there is some overhead associated with reading the metadata in each group. [...]
Running the SQL statement above, it answers in approximately 300ms, although the full file is about 50MB. This is not because I have a fast internet connection (you would need a bit more than 1Gbps to achieve this transfer rate). In fact, profiling the SQL statement shows that it "only" used about 1.5MB of ingress for the query (basically the data itself plus the overhead for reading the metadata). Here is the profile information for the statement:
duckdb> SELECT COUNT(*) FROM 'https://ping13.net/data/addresses.parquet' WHERE dplz4 = 3011;
┌──────────────┐
│ count_star() │
├──────────────┤
│         1613 │
└──────────────┘
duckdb> EXPLAIN ANALYZE SELECT COUNT(*) FROM 'https://ping13.net/data/addresses.parquet' WHERE dplz4 = 3011;
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││    Query Profiling Information    ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
EXPLAIN ANALYZE SELECT COUNT(*) FROM 'https://ping13.net/data/addresses.parquet' WHERE dplz4 = 3011;
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││         HTTPFS HTTP Stats         ││
││                                   ││
││            in: 1.5 MiB            ││
││           out: 0 bytes            ││
││             #HEAD: 1              ││
││              #GET: 3              ││
││              #PUT: 0              ││
││             #POST: 0              ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌──────────────────────────────────────────────────┐
│┌────────────────────────────────────────────────┐│
││               Total Time: 0.272s               ││
│└────────────────────────────────────────────────┘│
└──────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│    UNGROUPED_AGGREGATE    │
│    ────────────────────   │
│        Aggregates:        │
│       count_star()        │
│                           │
│           1 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│        TABLE_SCAN         │
│    ────────────────────   │
│         Function:         │
│       PARQUET_SCAN        │
│                           │
│    Filters: DPLZ4=3011    │
│                           │
│         1613 Rows         │
│          (0.30s)          │
└───────────────────────────┘
The best part of using DuckDB: it is extremely portable and can even run in your browser. It has been compiled to WebAssembly and runs (to some extent) even on your mobile phone. You can try it yourself by opening a web shell at shell.duckdb.org and typing:
EXPLAIN ANALYZE SELECT COUNT(*) FROM 'https://ping13.net/data/addresses.parquet' WHERE dplz4 = 3011;
With DuckDB running in a browser, there was nothing stopping me from creating a moderately fast Swiss geocoding service without any server components (though I admit it's a somewhat unconventional two-step geocoding service where you first need to specify the zip code and then the street and house number).
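In the app this happens client-side with DuckDB-WASM and JavaScript; the same two-step lookup, expressed in Python against the hosted parquet file, could look roughly like the sketch below. The column names strname, deinr, gkode and gkodn are assumptions based on the federal register's field names and may differ from the actual file:

import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")

URL = "https://ping13.net/data/addresses.parquet"

# Step 1: the zip code narrows the scan down to a handful of row groups.
streets = con.execute(
    f"SELECT DISTINCT strname FROM '{URL}' WHERE dplz4 = ?", [3011]
).fetchall()

# Step 2: within that zip code, resolve street + house number to coordinates.
hit = con.execute(
    f"SELECT strname, deinr, gkode, gkodn FROM '{URL}' "
    "WHERE dplz4 = ? AND strname = ? AND deinr = ?",
    [3011, "Kramgasse", "49"],
).fetchone()
print(hit)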
An alternative to Geoparquet would be flatgeobuf, which would allow for efficient bounding box filtering4 (example).
Cloud-Native Design: Simpler, cheaper, more efficient
Obviously, this is a toy example, but I truly believe that there is something big behind all of this: the current (classic) web map architecture includes a publication database and a heavy and costly API layer. Just imagine being able to get rid of both of them entirely while the user experience stays the same! Let's look at the following table:
| Layer | Example for a classic geo app architecture | Example for a cloud-native geo app architecture | Benefits of cloud-native approach |
|---|---|---|---|
| Client Layer | Web browser | Web browser | Same user experience with reduced backend complexity |
| API Layer | ArcGIS REST API, or GeoServer | - | • Zero maintenance costs • No version management needed • Eliminates server costs |
| Publication Data Layer | Oracle or PostGIS, plus a tile server | Geoparquet or Flatgeobuf, plus PMTiles | • Lower storage costs • Simpler updates • Easy backup and versioning • CDN-friendly static files |
| Extract Transform Load (ETL) Layer | Python scripts, or FME | Python scripts, or FME | • More flexible in creating data • Format and tools still changing |
| Production Data Layer | FileGeodatabase, Shapefile | FileGeodatabase, Shapefile | Source data remains unchanged while gaining all cloud benefits |
Summarizing the table above, the three advantages are:
- low-cost: object/file storage space is really cheap these days
- no maintenance: no server means no licensing costs and maintenance costs
- very flexible: ETL pipelines are easier to develop when there are only files involved
Will it work for all your use cases? Of course not! But if you have the opportunity to move the logic of your web map architecture to either the ETL process or to the client, you should definitely consider it. Your finance team will thank you, and you won't get woken up in the middle of the night due to a server incident.
What's next?
The Swiss Rooftop Explorer is primarily a technology demo, but it shows how cloud-native geospatial formats can replace traditional GIS server architecture with simple, efficient file-based storage. As geospatial datasets grow, this approach has the chance to become a new standard for low-maintenance, high-performance applications. There is much more to explore and discuss on this topic, but I will save that for another time.
Do you have any questions? Do you think you might have a use case for a cloud-native geo app? Get in touch with me β I'd love to talk.
1. When entering zip codes in the app, you should use a computer rather than a mobile phone, as this feature can be memory-heavy for the browser. The reason will be discussed later (DuckDB).
2. Tippecanoe is a command-line tool originally developed by Mapbox and now maintained by Felt. It can convert large GeoJSON, CSV, or other geospatial datasets into optimized vector tiles (MBTiles and PMTiles formats).
3. Maybe at some point I'll use tools like Task, but the decades-old Makefile is battle-proven for specifying dependencies, and I got used to its syntax. Plus, there is so much documentation that your favorite LLM can spot mistakes and make suggestions.
4. There is an interesting discussion on OvertureMaps' GitHub repo about efficient bounding-box filtering with Geoparquet.