Want to query GIS data in Google Cloud? You need to know Spatial SQL
In one of our most recent Location Matters podcast episodes, BigQuery Product Manager at Google, Chad Jennings and Director of Spatial Data Science, Matt Forrest, both acknowledged that for the up and coming GIS experts, it was a must to learn Spatial Structured Query Language (SQL). But why?
As a society, we are ingesting more data than ever before and to process and analyse this data requires powerful Cloud technology such as Google Cloud. While Spatial SQL has been around for a while, the ability to process large amounts of geospatial data within seconds has not been possible until the emergence of Cloud technology.
To understand the importance of Spatial SQL when it comes to querying GIS data in Google Cloud, we must first understand more about Spatial SQL.
What is spatial SQL and where does it come from?
SQL is a language designed specifically for managing data that is held in a relational database management system. One such database is PostgreSQL, which is an open-source relational database system. PostGIS is a traditional spatial database extension of PostgreSQL that supports spatial data and allows users to process spatial queries using standard SQL.
Spatial data identifies geographic points, features and boundaries on Earth. To store this data, we use latitude and longitude coordinate systems and geometric objects. We can then store these objects and points in a SQL server for applications to call when they need.
SQL Server supports two types of spatial data: geometry data and geography data. Geometry data is the X and Y coordinates of the Euclidean coordinate system that shows points, lines and polygons in 2D spaces. Geography data captures the points, lines and polygons on a round-earth coordinate system.
While traditional relational databases have comprehensive support, the sheer number of spatial data points, which could sit in the billions of rows in a database combined with the need to process this data is real time, means you may need to invest in a Cloud-native relational database.
Cue Google Cloud’s BigQuery: Querying spatial data in the cloud
BigQuery is Google Cloud’s managed, serverless data warehouse that can process petabytes of data in near real time. BigQuery GIS ingests, processes and analyses geospatial data, with the same syntax as in PostGIS, but in a Cloud environment.
BigQuery has a lot of the same PostGIS functions such as calculating areas, distance and spatial joins which means if you’re familiar with the PostGIS interface, the transition to a Cloud environment will be natural. See the full list of functions available in BigQuery here.
BigQuery supports a “GEOGRAPHY” data type in its standard SQL, representing a pointset on Earth. These geospatial data, in the form of points, lines and polygons on a WGS84 reference spheroid with geodesic edge, are supported by three formats:
- GeoJSON – a JSON-based format used for geometries and spatial features.
- Well-known Text (WKT) – a text format that describes [a collection of] points, lines and polygons as individual geometry shapes.
- Well-known Binary (WKB) – The binary version of a WKT format.
These formats are automatically identified by BigQuery and can support analysis of the petabytes of data within seconds. To learn about loading BigQuery GIS data, check out this resource.
Google also has an extensive repository of public datasets that can be leveraged by users. As of November 2020, there are 205 accessible datasets. See the full list of public datasets here.
Lastly, when using BigQuery GIS, you’re charged based on the amount of data that is stored in tables that contain BigQuery GIS data. This means that users are charged based on what they use when processing data rather than what they think their maximum number of requests will be.
Google Cloud’s BigQuery offers a sandbox environment to access the power of BigQuery for free. Try it here.
I’ve run my query in BigQuery, where can I visualise the data?
Within BigQuery, users can visualise their GIS data using the GeoViz tool. This sandbox tool provides a simple way to visualise location data within Google Cloud.
At the start of 2020, location intelligence platform, CARTO and Google Cloud announced a BigQuery CARTO connector. The connector, currently in Beta, allows users to run basic SQL queries by extracting data directly from a BigQuery table and push it into a CARTO database where users can then style and share insights quickly and securely. Read more about the connector here or get started with Spatial SQL in CARTO here.
Where can I learn more about spatial SQL?
To find out more about why Chad and Matt think Spatial SQL is a must learn, listen to the full podcast episode here.
NGIS has a wealth of knowledge when it comes to performing spatial analyses both on a desktop environment and in a Cloud environment. If your team is interested in learning more about how they can perform spatial analysis in a Cloud environment, reach out to the NGIS team here.
Feature image source: https://carto.com/help/tutorials/spatial-sql-part-1/
Related Articles
Here are more related articles you may be interested in.