This module defines the geoplot coordinate reference system classes, wrappers on cartopy.crs objects meant to be passed as the projection parameter of all front-end geoplot outputs. For the list of Cartopy CRS objects this module derives from, refer to http://scitools.org.uk/cartopy/docs/latest/crs/projections.html.
This module defines the majority of geoplot functions, including all plot types.
geoplot.geoplot.aggplot(df, projection=None, hue=None, by=None, geometry=None, nmax=None, nmin=None, nsig=0, agg=<function mean>, cmap='viridis', vmin=None, vmax=None, legend=True, legend_kwargs=None, extent=None, figsize=(8, 6), ax=None, **kwargs)
Self-aggregating quadtree plot.
Parameters: |
|
---|---|
Returns: | The plot axis |
Return type: |
|
Examples
This plot type accepts any geometry, including mixtures of polygons and points, averages the value of a certain data parameter at their centroids, and plots the result, using a colormap as the visual variable.
For the purposes of comparison, this library’s choropleth function takes some sort of data as input and polygons as geospatial context, and combines them into a colorful map. This is useful if, for example, you have data on the number of crimes committed per neighborhood, and you want to plot that.
But suppose your original dataset came in terms of individual observations: instead of “n collisions happened in this neighborhood”, you have “one collision occurred at this specific coordinate at this specific date”. This is more useful data, since it can be made to do more things, but in order to generate the same map you will first have to do all of the work of geolocating your points to neighborhoods (not trivial), then aggregating them (by, in this case, taking a count).
aggplot handles this work for you. It takes input in the form of observations, and outputs as useful a visualization as possible of their “regional” statistics. What a “region” corresponds to depends on how much geospatial information you can provide.
If you can’t provide any geospatial context, aggplot will output what’s known as a quadtree: it will break your data down into recursive squares, and use them to aggregate the data. This is a very experimental format, is very fiddly to make, and has not yet been optimized for speed; but it provides a useful baseline which requires no additional work and can be used to expose interesting geospatial correlations right away. And, if you have enough observations, it can be a pretty good approximation (collisions in New York City pictured).
Our first few examples are of just such figures. A simple aggplot quadtree can be generated with just a dataset, a data column of interest, and, optionally, a projection.
import geoplot as gplt
import geoplot.crs as gcrs
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='LATDEP')
To get the best output, you often need to tweak the nmin and nmax parameters yourself; these control the minimum and maximum number of observations per box, respectively. In this case we’ll also choose a different matplotlib colormap, using the cmap parameter.
aggplot will satisfy the nmax parameter before trying to satisfy nmin, so you may end up with spaces without observations, or ones lacking a statistically significant number of observations. This is necessary in order to break up “spaces” that the algorithm would otherwise end on. You can control the maximum number of observations in the blank spaces using the nsig parameter.
gplt.aggplot(collisions, nmin=20, nmax=500, nsig=5, projection=gcrs.PlateCarree(), hue='LATDEP', cmap='Reds')
You’ll have to play around with these parameters to get the clearest picture.
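To see how nmin and nmax interact, the following is a minimal pure-Python sketch of a recursive four-way split. It is not geoplot's implementation: the function name partition, the (minx, maxx, miny, maxy) bounds tuple, the max_depth guard, and the exact stopping rule are all assumptions chosen to mirror the description above, where nmax is satisfied before nmin.

```python
def partition(points, bounds, nmin, nmax, depth=0, max_depth=12):
    """Split (x, y) points into buckets; returns a list of (bounds, points).

    Assumed rule: a box holding more than nmax points is always split;
    a smaller box is split only while every child keeps at least nmin points.
    """
    minx, maxx, miny, maxy = bounds
    midx, midy = (minx + maxx) / 2, (miny + maxy) / 2
    quads = [(minx, midx, miny, midy), (minx, midx, midy, maxy),
             (midx, maxx, miny, midy), (midx, maxx, midy, maxy)]
    # Assign every point to exactly one quadrant by comparing with the midpoint.
    children = [[], [], [], []]
    for x, y in points:
        children[2 * (x >= midx) + (y >= midy)].append((x, y))
    too_many = len(points) > nmax
    too_few = any(len(child) < nmin for child in children)
    # max_depth is a hypothetical guard against runaway recursion on
    # tightly clustered points.
    if depth >= max_depth or (not too_many and too_few):
        return [(bounds, points)]
    buckets = []
    for quad, child in zip(quads, children):
        buckets.extend(partition(child, quad, nmin, nmax, depth + 1, max_depth))
    return buckets


# Toy data: an 8 x 8 grid of evenly spaced points.
grid = [(i + 0.5, j + 0.5) for i in range(8) for j in range(8)]
buckets = partition(grid, (0, 8, 0, 8), nmin=2, nmax=16)
print(len(buckets), "buckets")
```

Because boxes above nmax are split unconditionally in this sketch, a sparse corner of a crowded box can come back with fewer than nmin points, or none at all, which is the blank-space effect described above.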
Usually, however, observations with a geospatial component will be provided with some form of spatial categorization. In the case of our collisions example, this comes in the form of a postal zip code. With the simple addition of this data column via the by parameter, our output changes radically, taking advantage of the additional context we now have to sort and aggregate our observations by (hopefully) geospatially meaningful, if still crude, grouped convex hulls.
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
by='BOROUGH')
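The grouped convex hulls mentioned above can be sketched in pure Python. This is an illustration rather than geoplot's code: the helper names convex_hull and hulls_by_group and the (group, point, value) record layout are hypothetical, and the hull is computed with Andrew's monotone chain algorithm, which may differ from whatever geoplot uses internally.

```python
from collections import defaultdict


def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hulls_by_group(records):
    """records: iterable of (group, (x, y), value).

    Returns {group: (hull_vertices, mean_value)}.
    """
    grouped = defaultdict(list)
    for group, point, value in records:
        grouped[group].append((point, value))
    result = {}
    for group, rows in grouped.items():
        values = [value for _, value in rows]
        hull = convex_hull([point for point, _ in rows])
        result[group] = (hull, sum(values) / len(values))
    return result


# Made-up observations for a single group.
records = [("A", (0, 0), 1), ("A", (2, 0), 2), ("A", (2, 2), 3),
           ("A", (0, 2), 4), ("A", (1, 1), 5)]
print(hulls_by_group(records))
```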
Finally, suppose you actually know exactly the geometries that you would like to aggregate by. Provide these in the form of a geopandas GeoSeries, one whose index matches the values in your by column (so BROOKLYN matches BROOKLYN, for example), to the geometry parameter. Your output will now be an ordinary choropleth.
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
by='BOROUGH', geometry=boroughs)
Observations will be aggregated by average, by default. In our example case, our plot shows that accidents in Manhattan tend to result in significantly fewer injuries than accidents occurring in other boroughs. Specify an alternative aggregation using the agg parameter.
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
geometry=boroughs_2, by='BOROUGH', agg=len)
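The difference between the default average and agg=len can be illustrated without any geometry at all. The sketch below is an assumption about the general shape of the aggregation step, not geoplot's code: the aggregate helper is hypothetical and the three rows are made-up toy data.

```python
from collections import defaultdict
from statistics import mean


def aggregate(rows, by, hue, agg=mean):
    """Group rows on the `by` key and reduce the `hue` values with `agg`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[hue])
    return {key: agg(values) for key, values in groups.items()}


# Made-up toy rows, not real collision data.
rows = [
    {"BOROUGH": "BROOKLYN", "NUMBER OF PERSONS INJURED": 0},
    {"BOROUGH": "BROOKLYN", "NUMBER OF PERSONS INJURED": 2},
    {"BOROUGH": "MANHATTAN", "NUMBER OF PERSONS INJURED": 1},
]

# Default: average injuries per collision in each borough.
print(aggregate(rows, "BOROUGH", "NUMBER OF PERSONS INJURED"))
# agg=len: number of collisions in each borough.
print(aggregate(rows, "BOROUGH", "NUMBER OF PERSONS INJURED", agg=len))
```

With agg=len the values in the hue column no longer matter; only the number of rows per group does.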
legend toggles the legend. Additional keyword arguments for styling the colorbar legend are passed using legend_kwargs. Other additional keyword arguments are passed to the underlying matplotlib Polygon instances.
gplt.aggplot(collisions, projection=gcrs.PlateCarree(), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
geometry=boroughs_2, by='BOROUGH', agg=len, linewidth=0,
legend_kwargs={'orientation': 'horizontal'})
geoplot.geoplot.cartogram(df, projection=None, scale=None, limits=(0.2, 1), scale_func=None, trace=True, trace_kwargs=None, hue=None, categorical=False, scheme=None, k=5, cmap='viridis', vmin=None, vmax=None, legend=False, legend_values=None, legend_labels=None, legend_kwargs=None, legend_var='scale', extent=None, figsize=(8, 6), ax=None, **kwargs)
Self-scaling area plot.
Parameters: |
|
---|---|
Returns: | The plot axis |
Return type: |
|
Examples
A cartogram is a plot type which ingests a series of enclosed Polygon or MultiPolygon entities and spits out a view of these shapes in which area is distorted according to the size of some parameter of interest.
A basic cartogram specifies data, a projection, and a scale parameter.
import geoplot as gplt
import geoplot.crs as gcrs
gplt.cartogram(boroughs, scale='Population Density', projection=gcrs.AlbersEqualArea())
The gray outline can be turned off by specifying trace=False, and a legend can be added by specifying legend=True.
gplt.cartogram(boroughs, scale='Population Density', projection=gcrs.AlbersEqualArea(),
trace=False, legend=True)
Keyword arguments can be passed to the legend using the legend_kwargs argument. These arguments will be passed to the underlying matplotlib.legend.Legend instance. The loc and bbox_to_anchor parameters are particularly useful for positioning the legend. Other additional arguments will be passed to the underlying matplotlib Polygon patches.
gplt.cartogram(boroughs, scale='Population Density', projection=gcrs.AlbersEqualArea(),
trace=False, legend=True, legend_kwargs={'loc': 'upper left'})
Additional arguments to cartogram will be interpreted as keyword arguments for the scaled polygons, using matplotlib Polygon patch rules.
gplt.cartogram(boroughs, scale='Population Density', projection=gcrs.AlbersEqualArea(),
edgecolor='darkgreen')
Manipulate the outlines using the trace_kwargs argument, which accepts the same matplotlib Polygon patch parameters.
gplt.cartogram(boroughs, scale='Population Density', projection=gcrs.AlbersEqualArea(),
trace_kwargs={'edgecolor': 'lightgreen'})
Adjust the level of scaling to apply using the limits
parameter.
gplt.cartogram(boroughs, scale='Population Density', projection=gcrs.AlbersEqualArea(),
limits=(0.5, 1))
The default scaling function is linear: an observation at the midpoint of two others will be exactly midway between them in size. To specify an alternative scaling function, use the scale_func parameter. This should be a factory function of two variables which, when given the minimum and maximum of the dataset, returns a scaling function which will be applied to the rest of the data. A demo is available in the example gallery.
def trivial_scale(minval, maxval): return lambda v: 2
gplt.cartogram(boroughs, scale='Population Density', projection=gcrs.AlbersEqualArea(),
limits=(0.5, 1), scale_func=trivial_scale)
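A less trivial factory may make the pattern clearer. The sketch below builds a logarithmic scaling function; the name log_scale is hypothetical, it assumes strictly positive data, and the 0-to-1 output range is an assumption, since this page does not say what range geoplot expects the returned function to produce.

```python
import math


def log_scale(minval, maxval):
    """Factory: given the dataset minimum and maximum, return a scaler.

    The returned function maps minval to 0 and maxval to 1 on a log axis.
    Assumes 0 < minval < maxval.
    """
    lo, hi = math.log(minval), math.log(maxval)

    def scalar(val):
        return (math.log(val) - lo) / (hi - lo)

    return scalar


scaler = log_scale(1, 100)
print(scaler(1), scaler(10), scaler(100))
```

The factory shape matters because the minimum and maximum are only known once the data has been read; the inner function then only needs a single value at a time.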
cartogram also provides the same hue visual variable parameters provided by e.g. pointplot. For more information on hue-related arguments, see the related sections in the pointplot documentation.
gplt.cartogram(boroughs, scale='Population Density', projection=gcrs.AlbersEqualArea(),
hue='Population Density', k=None, cmap='Blues')
geoplot.geoplot.choropleth(df, projection=None, hue=None, scheme=None, k=5, cmap='Set1', categorical=False, vmin=None, vmax=None, legend=False, legend_kwargs=None, legend_labels=None, extent=None, figsize=(8, 6), ax=None, **kwargs)
Area aggregation plot.
Parameters: |
|
---|---|
Returns: | The plot axis |
Return type: |
|
Examples
A choropleth takes observations that have been aggregated on some meaningful polygonal level (e.g. census tract, state, country, or continent) and displays the data to the reader using color. It is likely the most general-purpose and well-known of the specifically spatial plot types. It is especially powerful when combined with meaningful or actionable aggregation areas; if no such aggregations exist, or the aggregations you have access to are mostly incidental, its value is more limited.
The choropleth requires a series of enclosed areas consisting of shapely Polygon or MultiPolygon entities, and a set of data about them that you would like to express in color. A basic choropleth requires geometry, a hue variable, and, optionally, a projection.
import geoplot as gplt
import geoplot.crs as gcrs
gplt.choropleth(polydata, hue='latdep', projection=gcrs.PlateCarree())
Change the colormap with the cmap
parameter.
gplt.choropleth(polydata, hue='latdep', projection=gcrs.PlateCarree(), cmap='Blues')
If your variable of interest is already categorical, you can specify categorical=True to use the labels in your dataset directly. To add a legend, specify legend=True.
gplt.choropleth(boroughs, projection=gcrs.AlbersEqualArea(), hue='BoroName',
categorical=True, legend=True)
Keyword arguments can be passed to the legend using the legend_kwargs argument. These arguments will be passed to the underlying matplotlib.legend.Legend instance. The loc and bbox_to_anchor parameters are particularly useful for positioning the legend. Other additional arguments will be passed to the underlying matplotlib Polygon patches.
gplt.choropleth(boroughs, projection=gcrs.AlbersEqualArea(), hue='BoroName',
categorical=True, legend=True, legend_kwargs={'loc': 'upper left'})
Additional arguments not in the method signature will be passed as keyword parameters to the underlying matplotlib Polygon patches.
gplt.choropleth(boroughs, projection=gcrs.AlbersEqualArea(), hue='BoroName', categorical=True,
linewidth=0)
Choropleths default to splitting the data into five buckets with approximately equal numbers of observations in them. Change the number of buckets by specifying k. Or, to use a continuous colormap, specify k=None. In this case a colorbar legend will be used.
gplt.choropleth(polydata, hue='latdep', cmap='Blues', k=None, legend=True,
projection=gcrs.PlateCarree())
The choropleth binning methodology is controlled using the scheme parameter. The default is quantile, which bins observations into classes of different sizes but the same numbers of observations. equal_interval creates bins that are the same size, but potentially containing different numbers of observations. The more complicated fisher_jenks scheme is an intermediate between the two.
gplt.choropleth(census_tracts, hue='mock_data', projection=gcrs.AlbersEqualArea(),
legend=True, edgecolor='white', linewidth=0.5, legend_kwargs={'loc': 'upper left'},
scheme='equal_interval')
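The contrast between the quantile and equal_interval schemes is easy to see on a small skewed dataset. The following is a simplified pure-Python sketch, not the classifier geoplot delegates to: the function names are hypothetical, the quantile edges use a plain nearest-rank rule, and fisher_jenks is not covered.

```python
def equal_interval_edges(values, k):
    """Upper edges of k bins of identical width spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_edges(values, k):
    """Upper edges of k bins holding roughly equal numbers of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank position of the i/k quantile, using integer arithmetic.
    return [ordered[(n * i + k - 1) // k - 1] for i in range(1, k + 1)]


def classify(value, edges):
    """Index of the first bin whose upper edge is >= value."""
    for index, edge in enumerate(edges):
        if value <= edge:
            return index
    return len(edges) - 1  # guard against floating-point round-off


def bin_counts(values, edges):
    counts = [0] * len(edges)
    for value in values:
        counts[classify(value, edges)] += 1
    return counts


# Made-up values with one large outlier.
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
print(bin_counts(values, equal_interval_edges(values, 5)))
print(bin_counts(values, quantile_edges(values, 5)))
```

On this data the equal-interval bins put nine of the ten values into the first class, while the quantile bins hold two values each, which is why the two schemes can produce very different maps from the same column.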
geoplot.geoplot.kdeplot(df, projection=None, extent=None, figsize=(8, 6), ax=None, clip=None, **kwargs)
Spatial kernel density estimate plot.
Parameters: |
|
---|---|
Returns: | The plot axis |
Return type: |
|
Examples
Kernel density estimation (KDE) is a flexible unsupervised machine learning technique for non-parametrically estimating the distribution underlying input data. The KDE is a great way of smoothing out random noise and estimating the true shape of point data distributed in your space, but it needs a moderately large number of observations to be reliable.
The geoplot kdeplot, actually a thin wrapper on top of the seaborn kdeplot, is an application of this visualization technique to the geospatial setting.
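For readers new to the technique, a two-dimensional Gaussian KDE can be written in a few lines. This is a bare sketch of the general idea only: the function name gaussian_kde_2d is hypothetical, the bandwidth is fixed by hand, and seaborn's kdeplot selects its bandwidth and evaluation grid in its own way.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Build a density function from (x, y) points.

    Each point contributes an isotropic Gaussian bump of the given
    bandwidth; the bumps are averaged so the density integrates to 1.
    """
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            squared_distance = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Made-up points: two close together and one far away.
density = gaussian_kde_2d([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)], bandwidth=0.5)
print(density(0.0, 0.0), density(5.0, 5.0))
```

The estimate is higher near the pair of neighboring points than near the isolated one, which is the smoothing behavior the contour plot visualizes.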
A basic kdeplot specifies (pointwise) data and, optionally, a projection. To make the result more interpretable, we also overlay the underlying borough geometry.
ax = gplt.kdeplot(collisions, projection=gcrs.AlbersEqualArea())
gplt.polyplot(boroughs, projection=gcrs.AlbersEqualArea(), ax=ax)
Most of the rest of the parameters to kdeplot are parameters inherited from the seaborn method by the same name, on which this plot type is based. For example, specifying shade=True provides a filled KDE instead of a contour one:
ax = gplt.kdeplot(collisions, projection=gcrs.AlbersEqualArea(),
shade=True)
gplt.polyplot(boroughs, projection=gcrs.AlbersEqualArea(), ax=ax)
Use n_levels
to specify the number of contour levels.
ax = gplt.kdeplot(collisions, projection=gcrs.AlbersEqualArea(),
n_levels=30)
gplt.polyplot(boroughs, projection=gcrs.AlbersEqualArea(), ax=ax)
Or specify cmap
to change the colormap.
ax = gplt.kdeplot(collisions, projection=gcrs.AlbersEqualArea(),
cmap='Purples')
gplt.polyplot(boroughs, projection=gcrs.AlbersEqualArea(), ax=ax)
Oftentimes, given the geometry of the location, a “regular” continuous kdeplot doesn’t make sense. We can specify a clip of iterable geometries, which will be used to trim the kdeplot. Note that if you have set shade=True as a parameter you may need to additionally specify shade_lowest=False to avoid inversion at the edges of the plot.
gplt.kdeplot(collisions, projection=gcrs.AlbersEqualArea(),
shade=True, clip=boroughs)
geoplot.geoplot.pointplot(df, projection=None, hue=None, categorical=False, scheme=None, k=5, cmap='Set1', vmin=None, vmax=None, scale=None, limits=(0.5, 2), scale_func=None, legend=False, legend_values=None, legend_labels=None, legend_kwargs=None, legend_var=None, figsize=(8, 6), extent=None, ax=None, **kwargs)
Geospatial scatter plot.
Parameters: |
|
---|---|
Returns: | The plot axis |
Return type: |
|
Examples
The pointplot is a geospatial scatter plot representing each observation in your dataset with a single point. It is a simple and easily interpretable plot that is nearly universally understood, making it an ideal choice for showing simple pointwise relationships between observations.
The expected input is a GeoDataFrame containing geometries of the shapely.geometry.Point type. A bare-bones pointplot goes thusly:
import geoplot as gplt
import geoplot.crs as gcrs
gplt.pointplot(points)
The hue parameter accepts a data column and applies a colormap to the output. The legend parameter toggles a legend.
gplt.pointplot(cities, projection=gcrs.AlbersEqualArea(), hue='ELEV_IN_FT', legend=True)
The pointplot binning methodology is controlled using the scheme parameter. The default is quantile, which bins observations into classes of different sizes but the same numbers of observations. equal_interval creates bins that are the same size, but potentially containing different numbers of observations. The more complicated fisher_jenks scheme is an intermediate between the two.
gplt.pointplot(cities, projection=gcrs.AlbersEqualArea(), hue='ELEV_IN_FT',
legend=True, scheme='equal_interval')
Alternatively, your data may already be categorical. In that case specify categorical=True
instead.
gplt.pointplot(collisions, projection=gcrs.AlbersEqualArea(), hue='BOROUGH',
legend=True, categorical=True)
Keyword arguments can be passed to the legend using the legend_kwargs argument. These arguments will be passed to the underlying matplotlib.legend.Legend instance. The loc and bbox_to_anchor parameters are particularly useful for positioning the legend. Other additional arguments will be passed to the underlying matplotlib scatter plot.
gplt.pointplot(collisions[collisions['BOROUGH'].notnull()], projection=gcrs.AlbersEqualArea(),
hue='BOROUGH', categorical=True,
legend=True, legend_kwargs={'loc': 'upper left'},
edgecolor='white', linewidth=0.5)
Change the number of bins by specifying an alternative k value. Adjust the colormap using the cmap parameter. To use a continuous colormap, explicitly specify k=None. Note that if legend=True, a matplotlib colorbar legend will be used.
gplt.pointplot(data, projection=gcrs.AlbersEqualArea(),
hue='var', k=8,
edgecolor='white', linewidth=0.5,
legend=True, legend_kwargs={'bbox_to_anchor': (1.25, 1.0)})
scale
provides an alternative or additional visual variable.
gplt.pointplot(collisions, projection=gcrs.AlbersEqualArea(),
scale='NUMBER OF PERSONS INJURED',
legend=True, legend_kwargs={'loc': 'upper left'})
The limits can be adjusted to fit your data using the limits
parameter.
gplt.pointplot(collisions, projection=gcrs.AlbersEqualArea(),
scale='NUMBER OF PERSONS INJURED', limits=(0, 10),
legend=True, legend_kwargs={'loc': 'upper left'})
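One way to picture what limits does is as the output range of a linear map from data values to scale factors. The sketch below is an assumption about that behavior rather than a copy of geoplot's code: linear_scale is a hypothetical name, and the default of (0.5, 2) is taken from the pointplot signature above.

```python
def linear_scale(minval, maxval, limits=(0.5, 2)):
    """Factory for a linear map from [minval, maxval] onto `limits`."""
    lo, hi = limits
    span = maxval - minval

    def scalar(val):
        if span == 0:
            # All observations are equal; fall back to the upper limit.
            return hi
        return lo + (hi - lo) * (val - minval) / span

    return scalar


scaler = linear_scale(0, 10)
print(scaler(0), scaler(5), scaler(10))
```

Under this reading, limits=(0, 10) would shrink the smallest observations to nothing and draw the largest at ten times the base size.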
The default scaling function is linear: an observation at the midpoint of two others will be exactly midway between them in size. To specify an alternative scaling function, use the scale_func parameter. This should be a factory function of two variables which, when given the minimum and maximum of the dataset, returns a scaling function which will be applied to the rest of the data. A demo is available in the example gallery.
def trivial_scale(minval, maxval):
    def scalar(val):
        return 2
    return scalar
gplt.pointplot(collisions, projection=gcrs.AlbersEqualArea(),
scale='NUMBER OF PERSONS INJURED', scale_func=trivial_scale,
legend=True, legend_kwargs={'loc': 'upper left'})
hue and scale can co-exist. In case more than one visual variable is used, control which one appears in the legend using legend_var.
gplt.pointplot(collisions[collisions['BOROUGH'].notnull()],
projection=gcrs.AlbersEqualArea(),
hue='BOROUGH', categorical=True,
scale='NUMBER OF PERSONS INJURED', limits=(0, 10),
legend=True, legend_kwargs={'loc': 'upper left'},
legend_var='scale')
geoplot.geoplot.polyplot(df, projection=None, extent=None, figsize=(8, 6), ax=None, edgecolor='black', facecolor='None', **kwargs)
Trivial polygonal plot.
Parameters: |
|
---|---|
Returns: | The plot axis |
Return type: |
|
Examples
The polyplot can be used to draw simple, unembellished polygons. A trivial example can be created with just a geometry and, optionally, a projection.
import geoplot as gplt
import geoplot.crs as gcrs
gplt.polyplot(boroughs, projection=gcrs.AlbersEqualArea())
However, note that polyplot
is mainly intended to be used in concert with other plot types.
ax = gplt.polyplot(boroughs, projection=gcrs.AlbersEqualArea())
gplt.pointplot(collisions[collisions['BOROUGH'].notnull()], projection=gcrs.AlbersEqualArea(),
hue='BOROUGH', categorical=True,
legend=True, edgecolor='white', linewidth=0.5, legend_kwargs={'loc': 'upper left'},
ax=ax)
Additional keyword arguments are passed to the underlying matplotlib
Polygon patches.
ax = gplt.polyplot(boroughs, projection=gcrs.AlbersEqualArea(),
linewidth=0, facecolor='lightgray')
geoplot.geoplot.sankey(*args, projection=None, start=None, end=None, path=None, hue=None, categorical=False, scheme=None, k=5, cmap='viridis', vmin=None, vmax=None, legend=False, legend_kwargs=None, legend_labels=None, legend_values=None, legend_var=None, extent=None, figsize=(8, 6), ax=None, scale=None, limits=(1, 5), scale_func=None, **kwargs)
Spatial Sankey or flow map.
Parameters: |
|
---|---|
Returns: | The plot axis |
Return type: |
|
Examples
A Sankey diagram is a simple visualization demonstrating flow through a network. A Sankey diagram is useful when you wish to show the volume of things moving between points or spaces: traffic load on a road network, for example, or inter-airport travel volumes. The geoplot sankey adds spatial context to this plot type by laying out the points in meaningful locations: airport locations, say, or road intersections.
A basic sankey specifies data, start points, end points, and, optionally, a projection. The df argument is optional; if geometries are provided as independent iterables it is ignored. We overlay world geometry to aid interpretability.
ax = gplt.sankey(la_flights, start='start', end='end', projection=gcrs.PlateCarree())
ax.set_global(); ax.coastlines()
The lines appear curved because they are great circle paths, which are the shortest routes between points on a sphere.
ax = gplt.sankey(la_flights, start='start', end='end', projection=gcrs.Orthographic())
ax.set_global(); ax.coastlines(); ax.outline_patch.set_visible(True)
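To see why great circle paths look curved on a flat map, it helps to compute a few intermediate points by hand. The sketch below uses the standard spherical interpolation formula with the haversine distance; great_circle_points is a hypothetical helper, not part of geoplot or cartopy, and it does not handle antipodal endpoints.

```python
import math


def great_circle_points(start, end, n=5):
    """Return n + 1 (lon, lat) points in degrees along the great circle.

    start and end are (lon, lat) pairs in degrees. Antipodal endpoints
    are not supported because the path between them is not unique.
    """
    lon1, lat1 = map(math.radians, start)
    lon2, lat2 = map(math.radians, end)
    # Angular distance between the endpoints (haversine formula).
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    d = 2 * math.asin(math.sqrt(a))
    if d == 0:
        return [tuple(start)] * (n + 1)
    points = []
    for i in range(n + 1):
        f = i / n
        w1 = math.sin((1 - f) * d) / math.sin(d)
        w2 = math.sin(f * d) / math.sin(d)
        # Weighted sum of the endpoint unit vectors, then back to lon/lat.
        x = w1 * math.cos(lat1) * math.cos(lon1) + w2 * math.cos(lat2) * math.cos(lon2)
        y = w1 * math.cos(lat1) * math.sin(lon1) + w2 * math.cos(lat2) * math.sin(lon2)
        z = w1 * math.sin(lat1) + w2 * math.sin(lat2)
        lat = math.atan2(z, math.hypot(x, y))
        lon = math.atan2(y, x)
        points.append((math.degrees(lon), math.degrees(lat)))
    return points


# Two points on the 45th parallel, on opposite sides of the globe.
for lon, lat in great_circle_points((-90.0, 45.0), (90.0, 45.0), n=4):
    print(round(lon, 3), round(lat, 3))
```

In this example the shortest route climbs from 45 degrees north up over the pole instead of following the parallel, so it bends sharply when drawn on a PlateCarree map.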
To plot using a different distance metric, pass a cartopy crs object (not a geoplot one) to the path parameter.
import cartopy.crs as ccrs
ax = gplt.sankey(la_flights, start='start', end='end', projection=gcrs.PlateCarree(), path=ccrs.PlateCarree())
ax.set_global(); ax.coastlines()
If your data has custom paths, you can use those instead, via the path
parameter.
gplt.sankey(dc, path=dc.geometry, projection=gcrs.AlbersEqualArea(), scale='aadt')
hue parameterizes the color, and cmap controls the colormap. legend adds a legend. Keyword arguments can be passed to the legend using the legend_kwargs argument. These arguments will be passed to the underlying matplotlib Legend. The loc and bbox_to_anchor parameters are particularly useful for positioning the legend.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
hue='mock_variable', cmap='RdYlBu',
legend=True, legend_kwargs={'bbox_to_anchor': (1.4, 1.0)})
ax.set_global()
ax.coastlines()
Change the number of bins by specifying an alternative k value. To use a continuous colormap, explicitly specify k=None. You can change the binning scheme with scheme. The default is quantile, which bins observations into classes of different sizes but the same numbers of observations. equal_interval creates bins that are the same size, but potentially containing different numbers of observations. The more complicated fisher_jenks scheme is an intermediate between the two.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
hue='mock_variable', cmap='RdYlBu',
legend=True, legend_kwargs={'bbox_to_anchor': (1.25, 1.0)},
k=3, scheme='equal_interval')
ax.set_global()
ax.coastlines()
If your variable of interest is already categorical, specify categorical=True
to
use the labels in your dataset directly.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
hue='above_meridian', cmap='RdYlBu',
legend=True, legend_kwargs={'bbox_to_anchor': (1.2, 1.0)},
categorical=True)
ax.set_global()
ax.coastlines()
scale can be used to enable linewidth as a visual variable. Adjust the upper and lower bound with the limits parameter.
ax = gplt.sankey(la_flights, projection=gcrs.PlateCarree(),
extent=(-125.0011, -66.9326, 24.9493, 49.5904),
start='start', end='end',
scale='Passengers',
limits=(0.1, 5),
legend=True, legend_kwargs={'bbox_to_anchor': (1.1, 1.0)})
ax.coastlines()
The default scaling function is linear: an observation at the midpoint of two others will be exactly midway between them in size. To specify an alternative scaling function, use the scale_func parameter. This should be a factory function of two variables which, when given the minimum and maximum of the dataset, returns a scaling function which will be applied to the rest of the data. A demo is available in the example gallery.
def trivial_scale(minval, maxval): return lambda v: 1
ax = gplt.sankey(la_flights, projection=gcrs.PlateCarree(),
extent=(-125.0011, -66.9326, 24.9493, 49.5904),
start='start', end='end',
scale='Passengers', scale_func=trivial_scale,
legend=True, legend_kwargs={'bbox_to_anchor': (1.1, 1.0)})
ax.coastlines()
hue and scale can co-exist. In case more than one visual variable is used, control which one appears in the legend using legend_var.
ax = gplt.sankey(network, projection=gcrs.PlateCarree(),
start='from', end='to',
scale='mock_data',
legend=True, legend_kwargs={'bbox_to_anchor': (1.1, 1.0)},
hue='mock_data', legend_var="hue")
ax.set_global()
ax.coastlines()
geoplot.geoplot.voronoi(df, projection=None, edgecolor='black', clip=None, hue=None, scheme=None, k=5, cmap='viridis', categorical=False, vmin=None, vmax=None, legend=False, legend_kwargs=None, legend_labels=None, extent=None, figsize=(8, 6), ax=None, **kwargs)
Geospatial Voronoi diagram.
Parameters: |
|
---|---|
Returns: | The axis object with the plot on it. |
Return type: | AxesSubplot or GeoAxesSubplot instance |
Examples
The neighborhood closest to a point in space is known as its Voronoi region. Every point in a dataset has a Voronoi region, which may be either a closed polygon (for inliers) or an open infinite region (for points on the edge of the distribution). A Voronoi diagram works by dividing a space filled with points into such regions and plotting the result. Voronoi plots allow efficient assessment of the density of points in different spaces, and when combined with a colormap can be quite informative of overall trends in the dataset.
The geoplot voronoi is a spatially aware application of this technique. It compares well with the more well-known choropleth, which has the advantage of using meaningful regions, but the disadvantage of requiring those regions to be defined beforehand. voronoi has fewer requirements and may perform better when the number of observations is small. Compare also with the quadtree technique available in aggplot.
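The defining property of a Voronoi region, that it holds every location closer to its own point than to any other, can be checked with a brute-force sketch. This is not how geoplot constructs the diagram: nearest_site and region_sizes are hypothetical helpers that approximate region areas by sampling a grid.

```python
def nearest_site(sites, x, y):
    """Index of the site whose Voronoi region contains (x, y)."""
    return min(
        range(len(sites)),
        key=lambda i: (sites[i][0] - x) ** 2 + (sites[i][1] - y) ** 2,
    )


def region_sizes(sites, bounds, steps=50):
    """Count sampled grid cells falling in each site's region.

    bounds is (minx, maxx, miny, maxy); each cell is tested at its center.
    """
    minx, maxx, miny, maxy = bounds
    counts = [0] * len(sites)
    for i in range(steps):
        for j in range(steps):
            x = minx + (i + 0.5) * (maxx - minx) / steps
            y = miny + (j + 0.5) * (maxy - miny) / steps
            counts[nearest_site(sites, x, y)] += 1
    return counts


# Made-up sites: two close neighbors and one isolated point.
sites = [(0, 0), (1, 0), (10, 0)]
print(region_sizes(sites, (-1, 11, -1, 1), steps=48))
```

The isolated site ends up with the largest share of the sampled cells, which is the sense in which region size reflects local point density.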
A basic voronoi specifies data and, optionally, a projection. We overlay geometry to aid interpretability.
ax = gplt.voronoi(injurious_collisions.head(1000))
gplt.polyplot(boroughs, ax=ax)
hue
parameterizes the color, and cmap
controls the colormap.
ax = gplt.voronoi(injurious_collisions.head(1000), hue='NUMBER OF PERSONS INJURED', cmap='Reds')
gplt.polyplot(boroughs, ax=ax)
Add a clip
of iterable geometries to trim the voronoi
against local geography.
ax = gplt.voronoi(injurious_collisions.head(1000), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
clip=boroughs.geometry)
gplt.polyplot(boroughs, ax=ax)
legend adds a matplotlib Legend. This can be tuned even further using the legend_kwargs argument. Other keyword parameters are passed to the underlying matplotlib Polygon patches.
ax = gplt.voronoi(injurious_collisions.head(1000), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
clip=boroughs.geometry,
legend=True, legend_kwargs={'loc': 'upper left'},
linewidth=0.5, edgecolor='white',
)
gplt.polyplot(boroughs, ax=ax)
Change the number of bins by specifying an alternative k value. To use a continuous colormap, explicitly specify k=None. You can change the binning scheme with scheme. The default is quantile, which bins observations into classes of different sizes but the same numbers of observations. equal_interval creates bins that are the same size, but potentially containing different numbers of observations. The more complicated fisher_jenks scheme is an intermediate between the two.
ax = gplt.voronoi(injurious_collisions.head(1000),
hue='NUMBER OF PERSONS INJURED', cmap='Reds', k=5, scheme='fisher_jenks',
clip=boroughs.geometry,
legend=True, legend_kwargs={'loc': 'upper left'},
linewidth=0.5, edgecolor='white',
)
gplt.polyplot(boroughs, ax=ax)
If your variable of interest is already categorical, specify categorical=True
to
use the labels in your dataset directly.
ax = gplt.voronoi(injurious_collisions.head(1000), hue='NUMBER OF PERSONS INJURED', cmap='Reds',
edgecolor='white', clip=boroughs.geometry,
linewidth=0.5, categorical=True
)
gplt.polyplot(boroughs, linewidth=1, ax=ax)
This module implements a naive equal-split four-way quadtree algorithm (https://en.wikipedia.org/wiki/Quadtree). It has been written in a way meant to make it convenient to use for splitting and aggregating rectangular geometries up to a certain guaranteed minimum instance threshold.
The routines here are used by the geoplot.aggplot plot type, and only when no user geometry input is provided.
geoplot.quad.QuadTree(gdf, bounds=None)
Bases: object
This module’s core class. For more on quadtrees, see https://en.wikipedia.org/wiki/Quadtree.
gdf
data initialization input. This is retained for
downstream aggregation purposes.
bounds
or left to the QuadTree
instance to compute for itself.
bounds
and whose values
consist of the indices of rows in the data
property corresponding with those points. This additional
bookkeeping is necessary because a single coordinate may contain many individual data points.
partition(nmin, nmax)
This method call decomposes a QuadTree instance into a list of sub-QuadTree instances which are the smallest possible geospatial “buckets”, given the current splitting rules, containing at least thresh points.
Parameters: | thresh (int) – The minimum number of points per partition. Care should be taken not to set this parameter to be too low, as in large datasets a small cluster of highly adjacent points may result in a number of sub-recursive splits possibly in excess of Python’s global recursion limit. |
---|---|
Returns: | partitions – A list of sub- QuadTree instances which are the smallest possible geospatial “buckets”, given the current
splitting rules, containing at least thresh points. |
Return type: | list of QuadTree object instances |
split()
Splits the current QuadTree instance four ways through the midpoint.
Returns: |
|
---|
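The geometry of a four-way midpoint split is simple enough to write down directly. The sketch below works on a bare bounds tuple rather than a QuadTree instance; the function name split_bounds and the (minx, maxx, miny, maxy) ordering are assumptions made for illustration.

```python
def split_bounds(bounds):
    """Split a bounding box into four equal boxes through its midpoint.

    bounds is assumed to be (minx, maxx, miny, maxy). The children are
    returned in the order: lower-left, upper-left, lower-right, upper-right.
    """
    minx, maxx, miny, maxy = bounds
    midx = (minx + maxx) / 2
    midy = (miny + maxy) / 2
    return [
        (minx, midx, miny, midy),
        (minx, midx, midy, maxy),
        (midx, maxx, miny, midy),
        (midx, maxx, midy, maxy),
    ]


for child in split_bounds((0, 8, 0, 4)):
    print(child)
```

Each child covers exactly one quarter of the parent's area, and together they tile it without gaps or overlap.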
geoplot.quad.flatten(items)
Yield items from any nested iterable. Used by QuadTree.flatten to one-dimensionalize a list of sublists.
cf. http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
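A generator of this kind is commonly written as follows. This is a typical formulation of the idiom from the linked discussion and is offered as a sketch; it may differ in detail from the function shipped in geoplot.quad, in particular in how strings are treated.

```python
from collections.abc import Iterable


def flatten(items):
    """Yield the leaves of an arbitrarily nested iterable, depth-first."""
    for item in items:
        # Strings and bytes are iterable but are treated as leaves here.
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)
        else:
            yield item


print(list(flatten([1, [2, [3, 4]], "ab", [5]])))
```

Because the function is a generator, wrap the call in list() when a concrete list of partitions is needed.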
geoplot.quad.subpartition(quadtree, nmin, nmax)
Recursive core of the QuadTree.partition method. Just five lines of code, amazingly.
Parameters: |
|
---|---|
Returns: |
|