geoplot.sankey

geoplot.sankey(*args, projection=None, start=None, end=None, path=None, hue=None, categorical=False, scheme=None, k=5, cmap='viridis', vmin=None, vmax=None, legend=False, legend_kwargs=None, legend_labels=None, legend_values=None, legend_var=None, extent=None, figsize=(8, 6), ax=None, scale=None, limits=(1, 5), scale_func=None, **kwargs)

Spatial Sankey or flow map.

Parameters:
  • df (GeoDataFrame, optional) – The data being plotted. It is not needed if start and end (and hue, if provided) are iterables.
  • projection (geoplot.crs object instance, optional) – A geographic projection. For more information refer to the tutorial page on projections.
  • start (str or iterable) – A list of starting points. This parameter is required.
  • end (str or iterable) – A list of ending points. This parameter is required.
  • path (cartopy.crs object instance or iterable, optional) – Pass an iterable of paths to draw custom paths (see the Examples section), or a projection to draw the shortest paths in that given projection. The default is Geodetic(), which connects points using great circle distance: the true shortest path on the surface of the Earth.
  • hue (None, Series, GeoSeries, iterable, or str, optional) – Applies a colormap to the output lines.
  • categorical (boolean, optional) – Set to True if hue references a categorical variable, and False (the default) otherwise. Ignored if hue is left unspecified.
  • scheme (None or {"quantiles"|"equal_interval"|"fisher_jenks"}, optional) – Controls how the colormap bin edges are determined. Ignored if hue is left unspecified.
  • k (int or None, optional) – Ignored if hue is left unspecified. Otherwise, if categorical is False, controls how many colors to use (5 is the default). If set to None, a continuous colormap will be used.
  • cmap (matplotlib colormap, optional) – The matplotlib colormap to be used. Ignored if hue is left unspecified.
  • vmin (float, optional) – Values below this level are given the same color as the threshold value itself. Defaults to the dataset minimum. Ignored if hue is left unspecified.
  • vmax (float, optional) – Values above this level are given the same color as the threshold value itself. Defaults to the dataset maximum. Ignored if hue is left unspecified.
  • scale (str or iterable, optional) – Applies scaling to the widths of the output lines. Defaults to None (no scaling).
  • limits ((min, max) tuple, optional) – The minimum and maximum scale limits. Ignored if scale is left unspecified.
  • scale_func (ufunc, optional) – The function used to scale line widths. Defaults to a linear scale. For more information see the Gallery demo.
  • legend (boolean, optional) – Whether or not to include a legend. Ignored if neither a hue nor a scale is specified.
  • legend_values (list, optional) – The values to use in the legend. Defaults to equal intervals. For more information see the Gallery demo.
  • legend_labels (list, optional) – The names to use in the legend. Defaults to the variable values. For more information see the Gallery demo.
  • legend_var ("hue" or "scale", optional) – If both hue and scale are specified, which variable to use in the legend.
  • legend_kwargs (dict, optional) – Keyword arguments to be passed to the underlying legend.
  • extent (None or (minx, maxx, miny, maxy), optional) – Used to control plot x-axis and y-axis limits manually.
  • figsize (tuple, optional) – An (x, y) tuple passed to matplotlib.figure which sets the size, in inches, of the resultant plot.
  • ax (AxesSubplot or GeoAxesSubplot instance, optional) – A matplotlib.axes.AxesSubplot or cartopy.mpl.geoaxes.GeoAxesSubplot instance. Defaults to a new axis.
  • kwargs (dict, optional) – Keyword arguments to be passed to the underlying matplotlib Line2D instances.
Returns:

The plot axis

Return type:

AxesSubplot or GeoAxesSubplot
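
The clipping described for vmin and vmax can be pictured as a normalization step that runs before colors are looked up. The following is a minimal sketch, not geoplot's own code: the helper name normalize is hypothetical, and it assumes a continuous colormap (k=None) in which each value is mapped to a position between 0 and 1 on the colormap.

```python
def normalize(values, vmin=None, vmax=None):
    """Map values onto [0, 1], clipping anything outside [vmin, vmax].

    Hypothetical helper: vmin and vmax default to the data minimum and
    maximum, mirroring the documented defaults.
    """
    lo = min(values) if vmin is None else vmin
    hi = max(values) if vmax is None else vmax
    span = hi - lo
    result = []
    for v in values:
        # Values beyond a threshold are pinned to that threshold's position,
        # so they end up with the same color as the threshold itself.
        clipped = min(max(v, lo), hi)
        result.append(0.0 if span == 0 else (clipped - lo) / span)
    return result


print(normalize([0, 5, 10, 20], vmin=0, vmax=10))  # [0.0, 0.5, 1.0, 1.0]
```

Here 20 lies above vmax=10, so it lands at the same colormap position as 10.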

Examples

A Sankey diagram is a simple visualization demonstrating flow through a network. It is useful when you wish to show the volume of things moving between points or spaces: traffic load on a road network, for example, or inter-airport travel volumes. The geoplot sankey adds spatial context to this plot type by laying out the points in meaningful locations: airport locations, say, or road intersections.

A basic sankey specifies data, start points, end points, and, optionally, a projection. The df argument is optional; it is not needed if start and end are provided as independent iterables. Coastlines are overlaid to aid interpretability.

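import geoplot as gplt
import geoplot.crs as gcrs

# la_flights: a GeoDataFrame whose 'start' and 'end' columns hold the flight endpoints
ax = gplt.sankey(la_flights, start='start', end='end', projection=gcrs.PlateCarree())
ax.set_global()
ax.coastlines()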
[Figure: _images/sankey-geospatial-context.png]

The lines appear curved because they are great circle paths, which are the shortest routes between points on a sphere.

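ax = gplt.sankey(la_flights, start='start', end='end', projection=gcrs.Orthographic())
ax.set_global()
ax.coastlines()
# outline_patch draws the globe's outline; newer cartopy releases deprecate it in favor of ax.spines['geo']
ax.outline_patch.set_visible(True)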
[Figure: _images/sankey-greatest-circle-distance.png]
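
To see why a great circle path bows away from a straight line on a flat map, the sketch below interpolates points along the great circle between two airports using spherical linear interpolation on a unit sphere. It illustrates the geometry only and is not what geoplot or cartopy run internally; the function name great_circle_points and the approximate LAX and JFK coordinates are assumptions, and antipodal endpoints (where the great circle is not unique) are not handled.

```python
import math


def great_circle_points(lon1, lat1, lon2, lat2, n=5):
    """Return n + 1 (lon, lat) points along the great circle from point 1 to point 2."""

    def to_xyz(lon, lat):
        # Convert degrees of longitude/latitude to a unit vector on the sphere
        lon, lat = math.radians(lon), math.radians(lat)
        return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))

    a, b = to_xyz(lon1, lat1), to_xyz(lon2, lat2)
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)  # central angle between the endpoints
    points = []
    for i in range(n + 1):
        t = i / n
        if omega == 0:
            x, y, z = a  # identical endpoints: the "path" is a single point
        else:
            s = math.sin(omega)
            wa, wb = math.sin((1 - t) * omega) / s, math.sin(t * omega) / s
            x, y, z = (wa * p + wb * q for p, q in zip(a, b))
        lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
        points.append((math.degrees(math.atan2(y, x)), lat))
    return points


# Approximate coordinates (assumed): LAX and JFK
path = great_circle_points(-118.41, 33.94, -73.78, 40.64, n=2)
print(path[1])  # the midpoint sits near 39.5 N, north of the 37.29 average latitude
```

A straight line in longitude/latitude space would pass through the average latitude at its midpoint; the great circle bulges poleward instead, which is the curve visible in the plots above.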

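To plot using a different distance metric, pass a cartopy crs object (not a geoplot one) to the path parameter. With ccrs.PlateCarree(), for example, each path is drawn as a straight line in longitude/latitude space.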

import cartopy.crs as ccrs
ax = gplt.sankey(la_flights, start='start', end='end', projection=gcrs.PlateCarree(), path=ccrs.PlateCarree())
ax.set_global()
ax.coastlines()
[Figure: _images/sankey-path-projection.png]
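
The straight PlateCarree path is not the shortest route on the globe. The sketch below checks this by measuring such a path as a sum of many short haversine hops and comparing it with the direct great circle distance. It assumes a spherical Earth with a mean radius of 6371 km, and the helper names are hypothetical; it is an illustration of the distance comparison, not geoplot code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def haversine_km(lon1, lat1, lon2, lat2):
    """Great circle distance between two (lon, lat) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def plate_carree_path_km(lon1, lat1, lon2, lat2, steps=1000):
    """Length of the straight lon/lat line, approximated by many short great circle hops."""
    total = 0.0
    prev = (lon1, lat1)
    for i in range(1, steps + 1):
        t = i / steps
        # Linear interpolation in longitude/latitude: a straight line on a PlateCarree map
        cur = (lon1 + t * (lon2 - lon1), lat1 + t * (lat2 - lat1))
        total += haversine_km(*prev, *cur)
        prev = cur
    return total


# Approximate LAX and JFK coordinates (assumed)
direct = haversine_km(-118.41, 33.94, -73.78, 40.64)
straight = plate_carree_path_km(-118.41, 33.94, -73.78, 40.64)
print(round(direct), round(straight))  # the straight-line path comes out longer
```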

If your data has custom paths, you can use those instead, via the path parameter.

gplt.sankey(dc, path=dc.geometry, projection=gcrs.AlbersEqualArea(), scale='aadt')
[Figure: _images/sankey-path.png]

hue parameterizes the color, and cmap controls the colormap. legend adds a legend. Keyword arguments can be passed to the legend using the legend_kwargs argument; these are forwarded to the underlying matplotlib Legend. The loc and bbox_to_anchor parameters are particularly useful for positioning the legend.

ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
                 start='from', end='to',
                 hue='mock_variable', cmap='RdYlBu',
                 legend=True, legend_kwargs={'bbox_to_anchor': (1.4, 1.0)})
ax.set_global()
ax.coastlines()
[Figure: _images/sankey-legend-kwargs.png]

Change the number of bins by specifying an alternative k value. To use a continuous colormap, explicitly specify k=None. You can change the binning scheme with scheme. The default is quantiles, which bins observations into classes of different sizes but the same number of observations. equal_interval creates bins that are the same size but may contain different numbers of observations. The more complicated fisher_jenks scheme is an intermediate between the two.

ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
                 start='from', end='to',
                 hue='mock_variable', cmap='RdYlBu',
                 legend=True, legend_kwargs={'bbox_to_anchor': (1.25, 1.0)},
                 k=3, scheme='equal_interval')
ax.set_global()
ax.coastlines()
[Figure: _images/sankey-scheme.png]
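
The difference between the two simpler schemes can be sketched in plain Python. This is an illustration of the idea, not geoplot's implementation, and fisher_jenks is omitted. The function names are hypothetical, and the quantile edges come from the standard library's statistics.quantiles (Python 3.8+) with method="inclusive", so exact edge values may differ from the ones geoplot computes.

```python
import statistics


def equal_interval_edges(values, k):
    """Upper bin edges for k bins of equal width spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Set the last edge to the exact maximum to avoid floating-point shortfall
    return [lo + width * i for i in range(1, k)] + [hi]


def quantile_edges(values, k):
    """Upper bin edges so that each bin holds roughly the same number of values."""
    cuts = statistics.quantiles(values, n=k, method="inclusive")  # k - 1 cut points
    return cuts + [max(values)]


def assign_bins(values, edges):
    """Index of the first bin whose upper edge is >= the value."""
    return [next(i for i, edge in enumerate(edges) if v <= edge) for v in values]


skewed = [1, 2, 3, 4, 5, 6, 10, 20, 50]
for name, edges in [("equal_interval", equal_interval_edges(skewed, 3)),
                    ("quantiles", quantile_edges(skewed, 3))]:
    bins = assign_bins(skewed, edges)
    print(name, [bins.count(i) for i in range(3)])
# equal_interval [7, 1, 1]
# quantiles [3, 3, 3]
```

On skewed data, equal_interval puts most observations in the lowest bin, while quantiles spreads them evenly across bins of very different widths.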

If your variable of interest is already categorical, specify categorical=True to use the labels in your dataset directly.

ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
                 start='from', end='to',
                 hue='above_meridian', cmap='RdYlBu',
                 legend=True, legend_kwargs={'bbox_to_anchor': (1.2, 1.0)},
                 categorical=True)
ax.set_global()
ax.coastlines()
[Figure: _images/sankey-categorical.png]

scale makes line width a visual variable. Adjust the lower and upper bounds of the width with the limits parameter.

ax = gplt.sankey(la_flights, projection=gcrs.PlateCarree(),
                 extent=(-125.0011, -66.9326, 24.9493, 49.5904),
                 start='start', end='end',
                 scale='Passengers',
                 limits=(0.1, 5),
                 legend=True, legend_kwargs={'bbox_to_anchor': (1.1, 1.0)})
ax.coastlines()
[Figure: _images/sankey-scale.png]
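
Under the default linear scale, one way to read the limits parameter is that the smallest value in the scale column gets the lower limit, the largest gets the upper limit, and everything else is interpolated in between. The sketch below shows that mapping; linear_widths is a hypothetical helper rather than part of geoplot, and it assumes the data contains at least two distinct values.

```python
def linear_widths(values, limits=(1, 5)):
    """Linearly map values onto line widths between limits[0] and limits[1]."""
    dmin, dmax = min(values), max(values)
    low, high = limits
    slope = (high - low) / (dmax - dmin)  # assumes dmax > dmin
    return [low + slope * (v - dmin) for v in values]


# Hypothetical passenger counts, scaled into the (0.1, 5) limits used above
print(linear_widths([100, 550, 1000], limits=(0.1, 5)))  # widths of about 0.1, 2.55 and 5.0
```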

The default scaling function is linear: an observation at the midpoint of two others will be exactly midway between them in size. To specify an alternative scaling function, use the scale_func parameter. This should be a factory function of two variables which, when given the minimum and maximum of the dataset, returns a scaling function that will be applied to the rest of the data. A demo is available in the example gallery.

def trivial_scale(minval, maxval):
    # Ignore the data range: every line gets the same width of 1
    return lambda v: 1
ax = gplt.sankey(la_flights, projection=gcrs.PlateCarree(),
                 extent=(-125.0011, -66.9326, 24.9493, 49.5904),
                 start='start', end='end',
                 scale='Passengers', scale_func=trivial_scale,
                 legend=True, legend_kwargs={'bbox_to_anchor': (1.1, 1.0)})
ax.coastlines()
[Figure: _images/sankey-scale-func.png]
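
A less trivial factory with the same shape might use a square-root curve, which spreads small values apart and compresses large ones. The sketch below is hypothetical: the name sqrt_scale and its 0.5 to 5 output range are assumptions, and, like trivial_scale above, it assumes the returned values are used as line widths directly. It could then be passed as scale_func=sqrt_scale in place of trivial_scale.

```python
import math


def sqrt_scale(minval, maxval, low=0.5, high=5.0):
    """Factory: given the data range, return a function mapping a value to a width.

    Hypothetical example; low and high are assumed output widths.
    """
    span = math.sqrt(maxval - minval) or 1.0  # avoid dividing by zero on constant data

    def scale(value):
        # value is assumed to lie in [minval, maxval]; values below minval would fail in sqrt
        return low + (high - low) * math.sqrt(value - minval) / span

    return scale


f = sqrt_scale(0, 100)
print(f(0), f(25), f(100))  # 0.5 2.75 5.0
```

A quarter of the way up the data range, the square-root curve already yields half of the width range, whereas the default linear scale would yield only a quarter.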

hue and scale can co-exist. In case more than one visual variable is used, control which one appears in the legend using legend_var.

ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
                 start='from', end='to',
                 scale='mock_data',
                 legend=True, legend_kwargs={'bbox_to_anchor': (1.1, 1.0)},
                 hue='mock_data', legend_var="hue")
ax.set_global()
ax.coastlines()
[Figure: _images/sankey-legend-var.png]