Three Methods to Plot a Choropleth Map Using Python

Introduction

A Choropleth map is an effective way to present geospatial information: it uses color to express the intensity of a value across regions. In this notebook, we show three Python libraries that can help you make attractive Choropleth maps.
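
To make the idea of "color expresses intensity" concrete, here is a minimal sketch of the mapping a Choropleth map performs: each region's value is scaled to the range 0 to 1 and then assigned to one step of a color ramp. The three population figures are taken from the cleaned table later in this notebook; the five-step palette and the helper name `value_to_color` are illustrative assumptions, not what any of the three libraries does internally.

```python
# Illustrative five-step ramp from light to dark (hypothetical palette)
PALETTE = ["#ffffb2", "#fecc5c", "#fd8d3c", "#f03b20", "#bd0026"]

def value_to_color(value, vmin, vmax, palette=PALETTE):
    """Map a value to a palette step using linear scaling."""
    if vmax == vmin:
        return palette[0]
    share = (value - vmin) / (vmax - vmin)  # position between 0 and 1
    step = min(int(share * len(palette)), len(palette) - 1)
    return palette[step]

# Three figures from the cleaned population table
population = {"連江縣": 13279, "台北市": 2602418, "新北市": 4030954}
vmin, vmax = min(population.values()), max(population.values())
colors = {name: value_to_color(v, vmin, vmax) for name, v in population.items()}
print(colors)
```

The libraries below offer finer control (continuous color scales, custom bins), but the principle is the same: a number per region becomes a color per region.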

In [1]:
#We first import some basic libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import re

About Dataset

In this project, we use Taiwan as the geographical target, along with population data acquired from the Taiwanese government. The 2020 data shows the population by age group and the total population of each administrative division in Taiwan. We will first clean and transform this dataset so that we can use it in later steps. You can find the dataset on this website.

In [38]:
def clean_county(text):
    """
    Remove spaces and replace the variant character 臺 with 台 so that names are consistent
    
    Input: Original text
    Output: Transformed text
    """
    text = text.replace(' ', '')
    text = text.replace('臺', '台')
    return text

def clean_population(text):
    """
    Clean a population string and convert it to an integer
    
    Input: Text containing population information, possibly with commas and surrounding spaces
    Output: Population as an integer
    """
    return int(text.strip().replace(',', ''))

#First read the original dataset
Dataset = pd.read_csv("POPULATION.csv",encoding='Big5')
Dataset = Dataset.iloc[6:,0:3]
Dataset.columns = ['City/County','Group',"Population"]
Dataset = Dataset[Dataset['Group'] == ' 計 ']

#Rearrange dataframe: after filtering, each name sits one row below its figures, so shift names up by one (the last row is 連江縣)
for i in range(Dataset.shape[0]):
    if i != Dataset.shape[0]-1:
        Dataset.iloc[i, 0] = Dataset.iloc[i+1, 0]
    else: 
        Dataset.iloc[i, 0] = '連江縣'

#Apply cleaning function on columns
Dataset['City/County'] = Dataset['City/County'].map(clean_county)
Dataset["Population"] = Dataset["Population"].map(clean_population)        

#Drop the 台灣省 total row and the Group column
Dataset = Dataset[Dataset['City/County'] != '台灣省']
Dataset = Dataset.drop(columns = ['Group'])
Dataset
Out[38]:
City/County Population
6 新北市 4030954
9 台北市 2602418
12 桃園市 2268807
15 台中市 2820787
18 台南市 1874917
21 高雄市 2765932
27 宜蘭縣 453087
30 新竹縣 570775
33 苗栗縣 542590
36 彰化縣 1266670
39 南投縣 490832
42 雲林縣 676873
45 嘉義縣 499481
48 屏東縣 812658
51 台東縣 215261
54 花蓮縣 324372
57 澎湖縣 105952
60 基隆市 367577
63 新竹市 451412
66 嘉義市 266005
69 福建省 153876
72 金門縣 140597
75 連江縣 13279
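
One thing worth noticing in the table above: 福建省 is still present, and its figure equals the sum of 金門縣 and 連江縣, which suggests it is a province-level subtotal like the 台灣省 row that was dropped. It does no harm to the maps in this notebook because the GeoJSON has no feature with that name, but it would distort any total computed from the table. A small check, sketched here on a plain dictionary with figures from the table (the set `subtotal_rows` is my assumption about which rows are subtotals):

```python
# Figures copied from the cleaned table above
population = {
    "福建省": 153876,
    "金門縣": 140597,
    "連江縣": 13279,
    "澎湖縣": 105952,
}

# Assumption: these names are province-level subtotals, not cities or counties
subtotal_rows = {"台灣省", "福建省"}

# Check whether 福建省 equals the sum of its two counties
is_subtotal = population["福建省"] == population["金門縣"] + population["連江縣"]

# Keep only real cities and counties
counties_only = {k: v for k, v in population.items() if k not in subtotal_rows}
print(is_subtotal, sorted(counties_only))
```

With the pandas dataframe, the equivalent filter would be `Dataset[~Dataset['City/County'].isin(['台灣省', '福建省'])]`.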

Get GeoJSON

GeoJSON is essential for plotting a Choropleth map because it records the actual outline of each area, so you can color the map with the information you wish to convey. This step shows how I retrieve GeoJSON data from GitHub. You can search the internet for a GeoJSON file covering your location of interest.
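
For readers who have not seen the format before, the sketch below builds a tiny GeoJSON `FeatureCollection` by hand. The square outline, its coordinates, and the name "Demo County" are made up for illustration; only the property names `COUNTYNAME` and `name` mirror the ones used by the file loaded in the next cell.

```python
import json

# A hand-written FeatureCollection with one feature (coordinates are invented)
tiny_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"COUNTYNAME": "Demo County", "name": "Demo County"},
            "geometry": {
                "type": "Polygon",
                # One ring of [longitude, latitude] pairs; first and last points match
                "coordinates": [[[121.0, 24.0], [121.5, 24.0], [121.5, 24.5],
                                 [121.0, 24.5], [121.0, 24.0]]],
            },
        }
    ],
}

# Round-trip through JSON text, as if the file had been downloaded
loaded = json.loads(json.dumps(tiny_geojson, ensure_ascii=False))
names = [f["properties"]["name"] for f in loaded["features"]]
ring = loaded["features"][0]["geometry"]["coordinates"][0]
print(names, len(ring))
```

Each feature pairs a geometry (the outline) with properties (the attributes); the plotting libraries use one of those properties as the key for joining your data to the shapes.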

In [21]:
#Import modules for acquiring geojson
from urllib.request import urlopen
import json

#Import geojson from Github
with urlopen('https://raw.githubusercontent.com/g0v/twgeojson/master/json/twCounty2010.geo.json') as response:
    geo_taiwan = json.load(response)  #Collect geojson data from Github

#Update the 2010 names to the current administrative divisions (桃園縣 became 桃園市)
for i in range(len(geo_taiwan['features'])):
    if geo_taiwan['features'][i]['properties']['COUNTYNAME'] == "桃園縣":
        geo_taiwan['features'][i]['properties']['COUNTYNAME'] = "桃園市"
        geo_taiwan['features'][i]['properties']['name'] = "桃園市"
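
Because the map is joined to the data purely by name, a name that differs between the two sources typically ends up without a value or is left off the map, with no error raised. A quick set comparison catches this before plotting. The sketch uses small stand-ins (`features` and `dataset_names` are hypothetical samples) so it runs on its own; with the real objects you would build the sets from `geo_taiwan['features']` and `Dataset['City/County']`.

```python
# Stand-in for geo_taiwan['features'] as it looks before the rename above
features = [
    {"properties": {"COUNTYNAME": "桃園縣", "name": "桃園縣"}},
    {"properties": {"COUNTYNAME": "台北市", "name": "台北市"}},
]

# Stand-in for the values of Dataset['City/County']
dataset_names = {"桃園市", "台北市", "福建省"}

geo_names = {f["properties"]["COUNTYNAME"] for f in features}

missing_in_data = geo_names - dataset_names  # shapes with no population row
missing_in_geo = dataset_names - geo_names   # rows with no shape to color
print(missing_in_data, missing_in_geo)
```

Here the outdated name 桃園縣 shows up in `missing_in_data`, which is exactly the mismatch the rename loop above repairs.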

Folium

Folium is a Python library that specializes in visualizing geographical data, including Choropleth maps. It is our first method for plotting a Choropleth map. For more information and documentation, please check here.

In [39]:
#import folium
import folium

#Create a map object for the choropleth map
#Set location to your location of interest (latitude and longitude)
map0 = folium.Map(location=[23.9,121.52], zoom_start=7)

#Create choropleth map object keyed on the 'name' property of the geojson
folium.Choropleth(geo_data = geo_taiwan,#Assign geo_data to your geojson file
    name = "choropleth",
    data = Dataset,#Assign dataset of interest
    columns = ["City/County","Population"],#Assign columns in the dataset for plotting
    key_on = 'feature.properties.name',#Assign the key that geojson uses to connect with dataset
    fill_color = 'YlOrRd',
    fill_opacity = 0.7,
    line_opacity = 0.5,
    legend_name = 'Taiwan').add_to(map0)

#Create style_function
style_function = lambda x: {'fillColor': '#ffffff', 
                            'color':'#000000', 
                            'fillOpacity': 0.1, 
                            'weight': 0.1}

#Create highlight_function
highlight_function = lambda x: {'fillColor': '#000000', 
                                'color':'#000000', 
                                'fillOpacity': 0.50, 
                                'weight': 0.1}

#Create popup tooltip object
NIL = folium.features.GeoJson(
    geo_taiwan,
    style_function=style_function, 
    control=False,
    highlight_function=highlight_function, 
    tooltip=folium.features.GeoJsonTooltip(
        fields=['COUNTYNAME'],
        aliases=['City/County'],
        style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;")))

#Add tooltip object to the map
map0.add_child(NIL)
map0.keep_in_front(NIL)
folium.LayerControl().add_to(map0)

map0
Out[39]:
[Interactive Folium map: Taiwan population by city/county. Rendered only in the live notebook.]

Plotly

Plotly is a Canadian software company, but in this project, Plotly refers to the company's Python plotting library, which is known for its interactive plots. You can check here for more information.

In [43]:
#Import libraries
from plotly import graph_objects as go

#Create figure object
fig = go.Figure(
    go.Choroplethmapbox(
        geojson = geo_taiwan, #Assign geojson file
        featureidkey = "properties.COUNTYNAME", #Assign feature key
        locations = Dataset["City/County"], #Assign location data
        z = Dataset["Population"], #Assign information data
        zauto = True,
        colorscale = 'viridis',
        showscale = True,
    )
)

#Update layout
fig.update_layout(
    mapbox_style = "carto-positron", #Decide a style for the map
    mapbox_zoom = 6, #Zoom in scale
    mapbox_center = {"lat": 23.9, "lon": 121.52}, #Center location of the map
)
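
Note that Folium and Plotly spell the join key slightly differently: the Folium cell uses `key_on = 'feature.properties.name'`, while the Plotly cell uses `featureidkey = "properties.COUNTYNAME"` without the leading `feature`. Both are dotted paths into each GeoJSON feature. The helper below, `resolve_path`, is a hypothetical illustration of how such a path picks out a value; it is not code from either library.

```python
def resolve_path(feature, dotted_path):
    """Follow a dotted path such as 'properties.name' into a feature dict."""
    parts = dotted_path.split(".")
    # Folium-style paths start with the word 'feature', which names the object itself
    if parts[0] == "feature":
        parts = parts[1:]
    value = feature
    for part in parts:
        value = value[part]
    return value

# A single feature shaped like those in the Taiwan file (geometry omitted)
feature = {
    "type": "Feature",
    "properties": {"COUNTYNAME": "桃園市", "name": "桃園市"},
    "geometry": None,
}

folium_key = resolve_path(feature, "feature.properties.name")
plotly_key = resolve_path(feature, "properties.COUNTYNAME")
print(folium_key, plotly_key)
```

In this file `name` and `COUNTYNAME` hold the same text, which is why the two cells can use different properties and still match the same dataset column.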

Geopandas

Geopandas is an open-source Python project that makes it easier to work with geospatial data. For more information and documentation, please check here.

In [41]:
#Import libraries
import geopandas
import descartes

#Read geojson into a geopandas GeoDataFrame
taiwan_geopandas = geopandas.read_file('https://raw.githubusercontent.com/g0v/twgeojson/master/json/twCounty2010.geo.json') #County-level boundaries of Taiwan (2010)

#Update Taiwan administrative division changes
taiwan_geopandas.iloc[4,1] = '桃園市'
taiwan_geopandas.iloc[4,2] = '桃園市'

#Rename column for later merge
taiwan_geopandas.rename(columns = {'COUNTYNAME':'City/County'}, inplace=True)

#Merge the geopandas dataframe with our dataset
taiwan_geopandas = taiwan_geopandas.merge(Dataset, on = 'City/County', how = 'left')
taiwan_geopandas
Out[41]:
COUNTYSN City/County name geometry Population
0 10014001 台東縣 台東縣 MULTIPOLYGON (((121.60104 22.01792, 121.60112 ... 215261
1 10002001 宜蘭縣 宜蘭縣 MULTIPOLYGON (((121.87445 24.53563, 121.87448 ... 453087
2 63000001 台北市 台北市 POLYGON ((121.55916 25.21013, 121.55952 25.210... 2602418
3 10009001 雲林縣 雲林縣 MULTIPOLYGON (((120.13010 23.56890, 120.12985 ... 676873
4 10003001 桃園市 桃園市 POLYGON ((121.18980 25.09362, 121.18978 25.093... 2268807
5 10013001 屏東縣 屏東縣 MULTIPOLYGON (((120.70254 22.05566, 120.70261 ... 812658
6 10006001 台中市 台中市 MULTIPOLYGON (((120.52978 24.31569, 120.52978 ... 2820787
7 10011001 台南市 台南市 POLYGON ((120.57824 23.21478, 120.57833 23.214... 1874917
8 10017001 基隆市 基隆市 MULTIPOLYGON (((121.78196 25.18914, 121.78190 ... 367577
9 09007001 連江縣 連江縣 MULTIPOLYGON (((119.97044 26.16379, 119.97043 ... 13279
10 10008001 南投縣 南投縣 POLYGON ((121.23767 24.22020, 121.23777 24.220... 490832
11 10016001 澎湖縣 澎湖縣 MULTIPOLYGON (((119.41607 23.19322, 119.41602 ... 105952
12 10005001 苗栗縣 苗栗縣 MULTIPOLYGON (((120.66305 24.48134, 120.66314 ... 542590
13 10020001 嘉義市 嘉義市 POLYGON ((120.42862 23.50917, 120.43156 23.508... 266005
14 10004001 新竹縣 新竹縣 POLYGON ((120.94913 24.88166, 120.94952 24.882... 570775
15 10001001 新北市 新北市 MULTIPOLYGON (((121.53890 25.30113, 121.53893 ... 4030954
16 10015001 花蓮縣 花蓮縣 MULTIPOLYGON (((121.50648 23.50753, 121.50656 ... 324372
17 10012001 高雄市 高雄市 POLYGON ((120.45431 22.77177, 120.45410 22.771... 2765932
18 10007001 彰化縣 彰化縣 MULTIPOLYGON (((120.26482 23.87243, 120.26486 ... 1266670
19 10010001 嘉義縣 嘉義縣 MULTIPOLYGON (((120.13537 23.43690, 120.13549 ... 499481
20 09020001 金門縣 金門縣 MULTIPOLYGON (((118.36483 24.42842, 118.36476 ... 140597
21 10018001 新竹市 新竹市 POLYGON ((120.91090 24.82720, 120.91106 24.827... 451412

If you wish to add labels to your Geopandas Choropleth map, please check this website. I don't do it here because of problems with displaying Chinese characters.

In [42]:
#Plot the Choropleth map
taiwan_geopandas.plot(column = 'Population', #Assign numerical data column
                      legend = True, #Decide to show legend or not
                      figsize = [20,10],
                      legend_kwds = {'label': "Population by County or City"}) #Name the legend
Out[42]:
<matplotlib.axes._subplots.AxesSubplot at 0x23268040f08>

Conclusion

In this project, we created Choropleth maps using three different Python libraries.

Folium: Professional and interactive, but relatively hard and complicated to build

Plotly: Professional and interactive, with labels shown by default and many pre-built styles

Geopandas: Non-interactive but relatively fast to build, and it works with the pandas dataframes that most Python users are familiar with

All of them can successfully convey the information we would like to share through a Choropleth map; it is up to you to decide which is best for you. Also keep in mind that interactive maps generally consume more memory than static maps.

There are still many parameters you could try in order to beautify your map. Besides reading the technical documentation, also try searching online to see whether someone has already created a map like the one you want.
