The merits of using the Python API in Google Earth Engine
My own GIS weapon of choice, GRASS GIS, grew out of a development by the U.S. Army Corps of Engineers’ Construction Engineering Research Laboratory over 35 years ago, and in my opinion it is still one of the best desktop GIS tools (but that’s for another Geotech Friday).
However, two other weapons have recently come to the fore and changed how we process and store data: gaming and Google. Before these came along, you could feel the pressure building up in the remote sensing world.
Contributing to that pressure was the growing volume of accessible data, driven by drones, more high-resolution satellite imagery options, and advances in camera and battery technology. I still remember the day I heard of the US Geological Survey’s extraordinary decision to make Landsat data freely available.
Thanks to tech giants like Google, however, we now have access to more powerful graphics processing units (GPUs) and far greater computing power. This was born of the fierce race between data availability on one side and processing capability and storage on the other.
Alongside this, cloud computing services and fibre optic network links have joined the party, allowing us to store massive amounts of data and tap into flexible processing power provided by third parties (mainly Google and Amazon).
One of the major players that we’re going to explore today is Google Earth Engine.
How Google Earth Engine operates by default
Google Earth Engine (GEE) is a tool that has an immediate impact on the way you approach certain tasks, because it works equally well as a global quick-look window, a fast prototyping tool and a fully fledged application builder.
Not only does GEE provide a platform to easily access open source data on a global scale, it also provides the computing power to be able to do something with it.
While some of this computing power is available directly in GEE’s native Javascript Integrated Development Environment (IDE), there is also a ready pipeline for feeding preprocessed imagery from Earth Engine, exported in TFRecord format, into Google’s TensorFlow, an open source machine learning platform that includes the Keras deep learning Python API.
But TensorFlow and machine learning are not the only reasons to choose the Python API as a serious alternative to the default Javascript IDE.
The Google Earth Engine Javascript IDE
The Python API as an alternative to the Javascript IDE
The Python API allows you to use the same Earth Engine objects you would use in the Javascript IDE, with near-identical syntax, in a Python environment.
The primary difference is not about the Python language or how it differs from Javascript. It is that you have total control over the client-side environment, which may be local or remote. This opens up a range of possibilities for what may be done with the preprocessed data.
Earth Engine data can be passed to Python in array form and picked up almost seamlessly as a NumPy array. There is little need to explain the analysis that can follow: the full gamut of Python (and R, and Matlab…) functionality is available from this point.
I have two observations about this method of operation:
- There is an up-front overhead. The decision to “bring the data home” requires a forced transfer of the contents of an Earth Engine object to the client-side environment. This is essentially a divorce from Earth Engine, forcing it to resolve all latent computations (which are always carried out “just in time”) and convert the object to GeoJSON or TFRecord form.
- The transfer to a Python object is relatively easy because the band values arrive as nested lists inside the GeoJSON, which the NumPy array constructor accepts directly. In the same way, the data may be transferred to any other programming language that can read GeoJSON.
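As a minimal sketch of that hand-off, assume the data was fetched with `ee.Image.sampleRectangle()` followed by `.getInfo()`, which is one common route for small regions. The response is a GeoJSON-style Feature dict whose `properties` hold each band as a nested list of pixel values. The helper `feature_to_array` and the band names `B4`/`B8` are illustrative assumptions, and the hand-made `sample` dict stands in for a real response so the code runs without Earth Engine credentials.

```python
import numpy as np


def feature_to_array(feature, bands):
    """Stack band values from a GeoJSON-style Feature dict into a (rows, cols, bands) array.

    Hypothetical helper. Assumes each band is stored under
    feature["properties"][band] as a nested list of pixel values, the layout
    returned by calling .getInfo() on the result of ee.Image.sampleRectangle().
    """
    props = feature["properties"]
    # np.array accepts the nested lists directly, giving one 2-D array per band.
    layers = [np.array(props[band], dtype=float) for band in bands]
    # Stack along a new last axis so pixel (r, c) holds all of its band values.
    return np.stack(layers, axis=-1)


# Hand-made stand-in for a tiny 2 x 3 pixel response (illustrative values, not real data).
sample = {
    "type": "Feature",
    "geometry": None,
    "properties": {
        "B4": [[0.10, 0.12, 0.11], [0.09, 0.13, 0.10]],
        "B8": [[0.40, 0.42, 0.38], [0.35, 0.45, 0.41]],
    },
}

arr = feature_to_array(sample, ["B4", "B8"])
red, nir = arr[..., 0], arr[..., 1]
# From here on it is ordinary NumPy arithmetic, e.g. a normalised difference index.
ndvi = (nir - red) / (nir + red)
print(arr.shape)  # (2, 3, 2)
```

Once the bands are in a NumPy array, the index calculation no longer depends on Earth Engine at all, which is the freedom the Python route offers.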
Apart from access to every Python package, there is another advantage to using the Python API, and it has to do with workflow and collaboration.
IPython provides a Python shell designed for interaction and exploratory coding. In the web-based Jupyter notebook, this means writing and running individual code cells, interspersed with explanatory notes in Markdown.
A Python computation kernel is maintained, allowing the user to experiment by running and rerunning cells of targeted scope without having to rerun upstream code. This makes for a great prototyping and debugging environment. When writing scripts in the Javascript IDE, testing changes involves rerunning the script in its entirety, unless the user has taken pains to build in interactivity in the form of panels, boxes and buttons.
Collaboration in the IPython notebook environment is done by sharing notebook files on GitHub, or by using Google’s Colaboratory (Colab). Colab provides an IPython notebook in a virtual machine with free access to GPUs, and the user’s Google Drive can be mounted as a folder within that machine. This approach makes sharing notebooks very easy.
The Jupyter Notebook format (jupyter.org)
Are there downsides to the Python approach?
The Javascript IDE is the quickest to get going with: it’s simply a matter of loading the web page and beginning to type the script. Using Jupyter locally involves some set-up and a short wait for the kernel server to start.
Getting started with Colab is almost as quick as with the Javascript IDE. However, with all Python API-based coding, it is necessary to acquire an authentication code from Earth Engine once per session.
The Javascript IDE has several significant built-in features which are not immediately available with the Python API, such as the scripts library, the useful Docs tab and the quick search facility for available datasets and locations. Of particular benefit is the interactive map feature using the familiar Google topographic or satellite background, onto which layers may be added, and points, lines and regions defined which can then be passed back to the script.
However, with the addition of some Python packages, this functionality can be added to the IPython environment. A shining example of such tools assembled in one package is Qiusheng Wu’s geemap, which mimics most of the Javascript IDE’s tools and adds some more.
ipyleaflet map control (taken from the ipyleaflet documentation)
The decider: Javascript or Python?
If I had to pick either the Javascript IDE or Python API to work with, I would choose the Python route. This is because there are no limitations on where the data may be taken. But at the end of the day, it’s really all about Google Earth Engine. Google has provided the open source platform, access to great spatial datasets and the means to query them.
The IDE will only take you so far, though it throws in some analytical functionality. The ability to export TFRecords means that you can take the data out of Earth Engine and into a realm of analytical capability limited only by your ideas.
In terms of the immediate potential of what it provides, the functionality of Google Earth Engine can be considered complete. New datasets are added as they become available, and when needs arise that it can’t meet, Python packages invariably appear quickly to fill the gap. I believe that a major turning point in remote sensing occurred when open source data came together with (near) open source access to computing power in Google Earth Engine. This will prove to be a game-changer, particularly in the scale of uptake by the global community. The consequences can only be imagined at this stage, but it’s great to be a small part of it.