geoplot.aggplot

geoplot.aggplot(df, projection=None, hue=None, by=None, geometry=None, nmax=None, nmin=None, nsig=0, agg=<function mean>, cmap='viridis', vmin=None, vmax=None, legend=True, legend_kwargs=None, extent=None, figsize=(8, 6), ax=None, **kwargs)

Self-aggregating quadtree plot.

Parameters:
  • df (GeoDataFrame) – The data being plotted.
  • projection (geoplot.crs object instance, optional) – A geographic projection. For more information refer to the tutorial page on projections.
  • hue (None, Series, GeoSeries, iterable, or str) – Applies a colormap to the output shapes. Required.
  • cmap (matplotlib color, optional) – The matplotlib colormap to be used.
  • by (iterable or str, optional) – If specified, this data grouping will be used to aggregate points into convex hulls or, if geometry is also specified, into polygons. If left unspecified the data will be aggregated using a quadtree.
  • geometry (GeoDataFrame or GeoSeries, optional) – The polygons to be used for spatial aggregation. See by.
  • nmax (int or None, optional) – Ignored if not plotting a quadtree. Otherwise, controls the maximum number of observations in a quadrangle. If left unspecified, there is no maximum size.
  • nmin (int, optional) – Ignored if not plotting a quadtree. Otherwise, controls the minimum number of observations in a quadrangle. If left unspecified, there is no minimum size.
  • nsig (int, optional) – Ignored if not plotting a quadtree. Otherwise, controls the minimum number of observations in a quadrangle deemed significant. Insignificant quadrangles are removed from the plot. Defaults to 0 (empty patches).
  • agg (function, optional) – The aggregation function used for the colormap. Defaults to np.mean.
  • vmin (float, optional) – Values below this level will be colored the same as the threshold value. Defaults to the dataset minimum.
  • vmax (float, optional) – Values above this level will be colored the same as the threshold value. Defaults to the dataset maximum.
  • legend (boolean, optional) – Whether or not to include a legend.
  • legend_values (list, optional) – The values to use in the legend. Defaults to equal intervals. For more information see the Gallery demo.
  • legend_labels (list, optional) – The names to use in the legend. Defaults to the variable values. For more information see the Gallery demo.
  • legend_kwargs (dict, optional) – Keyword arguments to be passed to the underlying legend.
  • extent (None or (minx, maxx, miny, maxy), optional) – Used to control plot x-axis and y-axis limits manually.
  • figsize (tuple, optional) – An (x, y) tuple passed to matplotlib.figure which sets the size, in inches, of the resultant plot.
  • ax (AxesSubplot or GeoAxesSubplot instance, optional) – A matplotlib.axes.AxesSubplot or cartopy.mpl.geoaxes.GeoAxesSubplot instance. Defaults to a new axis.
  • kwargs (dict, optional) – Keyword arguments to be passed to the underlying matplotlib Polygon patches.
Returns: The plot axis

Return type: AxesSubplot or GeoAxesSubplot

Examples

This plot type accepts any geometry, including mixtures of polygons and points, averages the value of a certain data parameter at their centroids, and plots the result, using a colormap as the visual variable.

For the purposes of comparison, this library’s choropleth function takes some sort of data as input, polygons as geospatial context, and combines them into a colorful map. This is useful if, for example, you have data on the number of crimes committed per neighborhood, and you want to plot that.

But suppose your original dataset came in terms of individual observations: instead of “n collisions happened in this neighborhood”, you have “one collision occurred at this specific coordinate at this specific date”. This is obviously more useful data, since it can be made to do more things, but in order to generate the same map you will first have to do all of the work of geolocating your points to neighborhoods (not trivial), then aggregating them (by, in this case, taking a count).
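To make that manual work concrete, here is a minimal sketch of it in plain Python. It is not how geoplot does it: the function names, the ray-casting test, and the toy square "neighborhoods" are all assumptions chosen for illustration.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`, a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_neighborhood(points, neighborhoods):
    """Assign each point to the first polygon containing it, then count per polygon."""
    counts = Counter()
    for x, y in points:
        for name, polygon in neighborhoods.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return counts


# Hypothetical data: two unit squares side by side and four observations,
# one of which falls outside both squares and is therefore not counted.
neighborhoods = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
observations = [(0.2, 0.5), (0.7, 0.1), (1.5, 0.5), (3.0, 3.0)]
counts = count_by_neighborhood(observations, neighborhoods)
```

Even this toy version needs a geometric test per point and per polygon, which is the step that becomes expensive and error-prone on real data.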

aggplot handles this work for you. It takes input in the form of observations and outputs the most useful visualization it can of their “regional” statistics. What a “region” corresponds to depends on how much geospatial information you can provide.

If you can’t provide any geospatial context, aggplot will output what’s known as a quadtree: it will break your data down into recursive squares, and use them to aggregate the data. This format is very experimental, fiddly to make, and not yet optimized for speed. It nevertheless provides a useful baseline which requires no additional work and can be used to expose interesting geospatial correlations right away. And, if you have enough observations, it can be a pretty good approximation (collisions in New York City pictured).

Our first few examples are of just such figures. A simple aggplot quadtree can be generated with just a dataset, a data column of interest, and, optionally, a projection.

import geoplot as gplt
import geoplot.crs as gcrs
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='LATDEP')
_images/aggplot-initial.png
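The recursive subdivision behind a quadtree can be sketched in a few lines of plain Python. This is an illustration only, not geoplot's implementation: the function name `quadtree_partition`, the `max_depth` guard, and the grid of sample points are assumptions. The `nmax` argument plays the role described above, and the bounds use the same (minx, maxx, miny, maxy) order as `extent`.

```python
def quadtree_partition(points, bounds, nmax, max_depth=8):
    """Split `bounds` into four quadrants until each holds at most `nmax` points.

    points: list of (x, y) tuples; bounds: (minx, maxx, miny, maxy).
    Returns a list of (bounds, points_in_bounds) leaves.
    """
    minx, maxx, miny, maxy = bounds
    # Stop when the quadrangle is small enough (or the depth guard is hit).
    if len(points) <= nmax or max_depth == 0:
        return [(bounds, points)]
    midx, midy = (minx + maxx) / 2, (miny + maxy) / 2
    # Keys are (is_east, is_north) flags for each of the four quadrants.
    quadrants = {
        (False, False): (minx, midx, miny, midy),
        (True, False): (midx, maxx, miny, midy),
        (False, True): (minx, midx, midy, maxy),
        (True, True): (midx, maxx, midy, maxy),
    }
    buckets = {key: [] for key in quadrants}
    for x, y in points:
        buckets[(x >= midx, y >= midy)].append((x, y))
    leaves = []
    for key, sub_bounds in quadrants.items():
        leaves.extend(quadtree_partition(buckets[key], sub_bounds, nmax, max_depth - 1))
    return leaves


# Hypothetical data: a regular 10 x 10 grid of points in the unit square.
points = [(i / 10, j / 10) for i in range(10) for j in range(10)]
coarse = quadtree_partition(points, (0, 1, 0, 1), nmax=30)
fine = quadtree_partition(points, (0, 1, 0, 1), nmax=10)
```

Lowering `nmax` forces more splits and therefore smaller squares. Each leaf's points would then be passed to the aggregation function to obtain the value that is colored.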

To get the best output, you often need to tweak the nmin and nmax parameters yourself. These control the minimum and maximum number of observations per box, respectively. In this case we’ll also choose a different matplotlib colormap, using the cmap parameter.

aggplot will satisfy the nmax parameter before trying to satisfy nmin, so you may end up with spaces that contain no observations, or too few to be statistically significant. This is necessary in order to break up “spaces” that the algorithm would otherwise end on. Use the nsig parameter to set the minimum number of observations a quadrangle needs in order to be drawn; quadrangles below that threshold are left blank.

gplt.aggplot(collisions, nmin=20, nmax=500, nsig=5, projection=gcrs.PlateCarree(), hue='LATDEP', cmap='Reds')
_images/aggplot-quadtree-tuned.png

You’ll have to play around with these parameters to get the clearest picture.
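As a rough illustration of what nsig does, the sketch below drops quadrangles that hold too few observations. The helper name `drop_insignificant`, the leaf layout, and the exact comparison (`>=`) are assumptions, not taken from geoplot's source.

```python
def drop_insignificant(leaves, nsig=0):
    """Keep only quadrangles with at least `nsig` observations (assumed rule)."""
    return [(bounds, values) for bounds, values in leaves if len(values) >= nsig]


# Hypothetical leaves: (minx, maxx, miny, maxy) bounds paired with observed values.
leaves = [
    ((0.0, 0.5, 0.0, 0.5), [3, 1, 4, 1, 5, 9]),
    ((0.5, 1.0, 0.0, 0.5), [2]),
    ((0.0, 0.5, 0.5, 1.0), []),
    ((0.5, 1.0, 0.5, 1.0), [6, 5, 3, 5, 8]),
]
kept_default = drop_insignificant(leaves)      # nsig=0 keeps every quadrangle
kept_strict = drop_insignificant(leaves, nsig=5)  # sparse quadrangles are removed
```

Raising nsig trades coverage for reliability: fewer squares are drawn, but each drawn square summarizes more observations.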

Usually, however, observations with a geospatial component will be provided with some form of spatial categorization. In the case of our collisions example, this comes in the form of a postal zip code or, as in the example below, a borough name. With the simple addition of this data column via the by parameter, our output changes radically: the observations are now sorted and aggregated into grouped convex hulls, which are (hopefully) geospatially meaningful, if still crude.

gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
             by='BOROUGH')
_images/aggplot-hulls.png
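To show what "grouped convex hulls" means, here is a small self-contained sketch that groups points by a category and computes one hull per group using Andrew's monotone chain algorithm. The function names and coordinates are made up for illustration, and geoplot may well compute its hulls differently.

```python
from collections import defaultdict


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hulls_by_group(records):
    """records: iterable of (group, x, y). Returns {group: hull vertices}."""
    groups = defaultdict(list)
    for group, x, y in records:
        groups[group].append((x, y))
    return {group: convex_hull(pts) for group, pts in groups.items()}


# Hypothetical observations tagged with a grouping value.
records = [
    ("BROOKLYN", 0, 0), ("BROOKLYN", 2, 0), ("BROOKLYN", 2, 2),
    ("BROOKLYN", 0, 2), ("BROOKLYN", 1, 1),
    ("QUEENS", 3, 0), ("QUEENS", 5, 0), ("QUEENS", 4, 2), ("QUEENS", 4, 1),
]
hulls = hulls_by_group(records)
```

Interior points such as (1, 1) do not appear in the hull, which is why hulls only roughly approximate the true shape of each region.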

Finally, suppose you actually know exactly the geometries that you would like to aggregate by. Provide these in the form of a geopandas GeoSeries, one whose index matches the values in your by column (so BROOKLYN matches BROOKLYN for example), to the geometry parameter. Your output will now be an ordinary choropleth.

gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
             by='BOROUGH', geometry=boroughs)
_images/aggplot-by.png
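Because the geometry index has to match the values in the by column exactly, it can help to check for mismatches before plotting. The sketch below uses plain Python lists with made-up values and a hypothetical helper name; with real data the two inputs would presumably be the BOROUGH column of collisions and the index of boroughs.

```python
def unmatched_groups(by_values, geometry_index):
    """Return the grouping values that have no geometry with the same label."""
    return sorted(set(by_values) - set(geometry_index))


# Hypothetical values: one label differs only in capitalization.
by_values = ["BROOKLYN", "QUEENS", "Bronx", "BROOKLYN"]
geometry_index = ["BROOKLYN", "QUEENS", "BRONX", "MANHATTAN", "STATEN ISLAND"]
missing = unmatched_groups(by_values, geometry_index)
```

An empty result means every group has a polygon to be drawn into.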

By default, observations are aggregated by taking their average. In our example case, our plot shows that accidents in Manhattan tend to result in significantly fewer injuries than accidents occurring in other boroughs. Specify an alternative aggregation using the agg parameter.

gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
             geometry=boroughs_2, by='BOROUGH', agg=len)
_images/aggplot-agg.png
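The effect of swapping the aggregation function can be seen on a toy example. The sketch below is a hypothetical stand-in for the grouping step, written in plain Python with invented values; it only illustrates how a mean and `len` summarize the same groups differently.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records, agg=mean):
    """Group (group, value) pairs and reduce each group's values with `agg`."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    return {group: agg(values) for group, values in groups.items()}


# Invented toy data: (borough, number of persons injured) per collision.
records = [
    ("MANHATTAN", 0), ("MANHATTAN", 1),
    ("BROOKLYN", 2), ("BROOKLYN", 0), ("BROOKLYN", 1),
]
averages = aggregate(records)         # average injuries per collision
counts = aggregate(records, agg=len)  # number of collisions per borough
```

With `agg=len` the hue values themselves are ignored and only the number of observations per group matters, so the map shows collision counts rather than average injuries.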

legend toggles the legend. Additional keyword arguments for styling the colorbar legend are passed using legend_kwargs. Other additional keyword arguments are passed to the underlying matplotlib Polygon instances.

gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
             geometry=boroughs_2, by='BOROUGH', agg=len, linewidth=0,
             legend_kwargs={'orientation': 'horizontal'})
_images/aggplot-legend-kwargs.png