GIS处理

近邻匹配

下面的案例展示如何用TransBigData包进行点与点、点与线的近邻匹配。该方法使用的是KDTree算法,可查看wiki:https://en.wikipedia.org/wiki/K-d_tree,算法复杂度为o(log(n))

点与点匹配(DataFrame与DataFrame)

导入TransBigData包
import transbigdata as tbd

生成两个GeoDataFrame表,但它们只有经纬度列

import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString
dfA = gpd.GeoDataFrame([[1,2],[2,4],[2,6],
                        [2,10],[24,6],[21,6],
                        [22,6]],columns = ['lon1','lat1'])
dfB = gpd.GeoDataFrame([[1,3],[2,5],[2,2]],columns = ['lon','lat'])

使用tbd.ckdnearest进行点与点匹配,如果是DataFrame与DataFrame匹配(不含有地理信息),则需要指定前后两个表的经纬度列

transbigdata.ckdnearest(dfA_origin, dfB_origin, Aname=['lon', 'lat'], Bname=['lon', 'lat'])

输入两个DataFrame,分别指定经纬度列名,为表A匹配表B中最近点,并计算距离

输入

dfA_originDataFrame

表A

dfB_originDataFrame

表B

AnameList

表A中经纬度列字段

BnameList

表B中经纬度列字段

输出

gdfDataFrame

为A匹配到B上最近点的表

tbd.ckdnearest(dfA,dfB,Aname=['lon1','lat1'],Bname=['lon','lat'])
#此时计算出的距离为经纬度换算实际距离
lon1 lat1 index lon lat dist
0 1 2 0 1 3 1.111949e+05
1 2 4 1 2 5 1.111949e+05
2 2 6 1 2 5 1.111949e+05
3 2 10 1 2 5 5.559746e+05
4 24 6 1 2 5 2.437393e+06
5 21 6 1 2 5 2.105798e+06
6 22 6 1 2 5 2.216318e+06

点与点匹配(GeoDataFrame与GeoDataFrame)

将A表B表变为含有点信息的GeoDataFrame

dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])
dfB['geometry'] = gpd.points_from_xy(dfB['lon'],dfB['lat'])

使用tbd.ckdnearest_point进行点与点匹配

transbigdata.ckdnearest_point(gdA, gdB)

输入两个GeoDataFrame,gdfA、gdfB均为点,该方法会为gdfA表连接上gdfB中最近的点,并添加距离字段dist

输入

gdAGeoDataFrame

表A,点要素

gdBGeoDataFrame

表B,点要素

输出

gdfGeoDataFrame

为A匹配到B上最近点的表

tbd.ckdnearest_point(dfA,dfB)
#此时计算出的距离为经纬度距离
lon1 lat1 geometry_x dist index lon lat geometry_y
0 1 2 POINT (1.00000 2.00000) 1.000000 0 1 3 POINT (1.00000 3.00000)
1 2 4 POINT (2.00000 4.00000) 1.000000 1 2 5 POINT (2.00000 5.00000)
2 2 6 POINT (2.00000 6.00000) 1.000000 1 2 5 POINT (2.00000 5.00000)
3 2 10 POINT (2.00000 10.00000) 5.000000 1 2 5 POINT (2.00000 5.00000)
4 24 6 POINT (24.00000 6.00000) 22.022716 1 2 5 POINT (2.00000 5.00000)
5 21 6 POINT (21.00000 6.00000) 19.026298 1 2 5 POINT (2.00000 5.00000)
6 22 6 POINT (22.00000 6.00000) 20.024984 1 2 5 POINT (2.00000 5.00000)

点与线匹配(GeoDataFrame与GeoDataFrame)

将A表变为地理点,B表为线

dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])
dfB['geometry'] = [LineString([[1,1],[1.5,2.5],[3.2,4]]),
                  LineString([[1,0],[1.5,0],[4,0]]),
                   LineString([[1,-1],[1.5,-2],[4,-4]])]
dfB.plot()
_images/output_15_1.png
transbigdata.ckdnearest_line(gdfA, gdfB)

输入两个GeoDataFrame,其中gdfA为点,gdfB为线,该方法会为gdfA表连接上gdfB中最近的线,并添加距离字段dist

输入

gdAGeoDataFrame

表A,点要素

gdBGeoDataFrame

表B,线要素

输出

gdfGeoDataFrame

为A匹配到B中最近的线

用tbd.ckdnearest_line可以实现点匹配线,其原理是将线中的折点提取,然后使用点匹配点。

tbd.ckdnearest_line(dfA,dfB)
#此时计算出的距离为经纬度距离
lon1 lat1 geometry_x dist index lon lat geometry_y
0 1 2 POINT (1.00000 2.00000) 0.707107 0 1 3 LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
1 2 4 POINT (2.00000 4.00000) 1.200000 0 1 3 LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
2 2 6 POINT (2.00000 6.00000) 2.332381 0 1 3 LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
3 2 10 POINT (2.00000 10.00000) 6.118823 0 1 3 LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
4 21 6 POINT (21.00000 6.00000) 17.912007 0 1 3 LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
5 22 6 POINT (22.00000 6.00000) 18.906084 0 1 3 LINESTRING (1.00000 1.00000, 1.50000 2.50000, ...
6 24 6 POINT (24.00000 6.00000) 20.880613 1 2 5 LINESTRING (1.00000 0.00000, 1.50000 0.00000, ...

打断线

在实际应用中,我们可能会需要把很长的线打断为很多子线段,每一条线段不要超过一定的最大长度,此时则可以使用TransBigData包中的splitline_with_length方法。

transbigdata.splitline_with_length(Centerline, maxlength=100)

输入线GeoDataFrame要素,打断为最大长度maxlength的小线段

输入

CenterlineGeoDataFrame

线要素

maxlengthnumber

打断的线段最大长度

输出

splitedlineGeoDataFrame

打断后的线

下面演示如何将线打断为100米一段的线段

#读取线要素
import geopandas as gpd
Centerline = gpd.read_file(r'test_lines.json')
Centerline.plot()
_images/output_2_1.png
#转换线为投影坐标系
Centerline.crs = {'init':'epsg:4326'}
Centerline = Centerline.to_crs(epsg = '4517')
#计算线的长度
Centerline['length'] = Centerline.length
Centerline
Id geometry length
0 0 LINESTRING (29554925.232 4882800.694, 29554987... 285.503444
1 0 LINESTRING (29554682.635 4882450.554, 29554773... 185.482276
2 0 LINESTRING (29554987.079 4882521.969, 29555040... 291.399180
3 0 LINESTRING (29554987.079 4882521.969, 29555073... 248.881529
4 0 LINESTRING (29554987.079 4882521.969, 29554969... 207.571197
5 0 LINESTRING (29554773.177 4882288.671, 29554828... 406.251357
6 0 LINESTRING (29554773.177 4882288.671, 29554926... 158.114403
7 0 LINESTRING (29555060.286 4882205.456, 29555082... 107.426629
8 0 LINESTRING (29555040.278 4882235.468, 29555060... 36.069941
9 0 LINESTRING (29555060.286 4882205.456, 29555095... 176.695446
#将线打断为最长100米的线段
import transbigdata as tbd
splitedline = tbd.splitline_with_length(Centerline,maxlength = 100)
#打断后线型不变
splitedline.plot()
_images/output_5_1.png
#但内容已经变成一段一段了
splitedline
geometry id length
0 LINESTRING (29554925.232 4882800.694, 29554927... 0 100.000000
1 LINESTRING (29554946.894 4882703.068, 29554949... 0 100.000000
2 LINESTRING (29554968.557 4882605.443, 29554970... 0 85.503444
0 LINESTRING (29554682.635 4882450.554, 29554688... 1 100.000000
1 LINESTRING (29554731.449 4882363.277, 29554736... 1 85.482276
0 LINESTRING (29554987.079 4882521.969, 29554989... 2 100.000000
1 LINESTRING (29555005.335 4882423.650, 29555007... 2 100.000000
2 LINESTRING (29555023.592 4882325.331, 29555025... 2 91.399180
0 LINESTRING (29554987.079 4882521.969, 29554993... 3 100.000000
1 LINESTRING (29555042.051 4882438.435, 29555048... 3 99.855617
2 LINESTRING (29555111.265 4882370.450, 29555116... 3 48.881529
0 LINESTRING (29554987.079 4882521.969, 29554985... 4 100.000000
1 LINESTRING (29554973.413 4882422.908, 29554971... 4 99.756943
2 LINESTRING (29554930.341 4882335.023, 29554929... 4 7.571197
0 LINESTRING (29554773.177 4882288.671, 29554777... 5 100.000000
1 LINESTRING (29554816.361 4882198.476, 29554821... 5 99.782969
2 LINESTRING (29554882.199 4882125.314, 29554891... 5 99.745378
3 LINESTRING (29554976.612 4882096.588, 29554987... 5 100.000000
4 LINESTRING (29555076.548 4882100.189, 29555077... 5 6.251357
0 LINESTRING (29554773.177 4882288.671, 29554783... 6 100.000000
1 LINESTRING (29554869.914 4882314.006, 29554876... 6 58.114403
0 LINESTRING (29555060.286 4882205.456, 29555062... 7 100.000000
1 LINESTRING (29555081.239 4882107.675, 29555081... 7 7.426629
0 LINESTRING (29555040.278 4882235.468, 29555042... 8 36.069941
0 LINESTRING (29555060.286 4882205.456, 29555064... 9 100.000000
1 LINESTRING (29555094.981 4882299.244, 29555100... 9 76.419694

面要素处理

面合并

transbigdata.merge_polygon(data, col)

输入多边形GeoDataFrame数据,以及分组列名col,对不同组别进行分组的多边形进行合并

输入

dataGeoDataFrame

多边形数据

colstr

分组列名

输出

data1GeoDataFrame

合并后的面

对面取外边界构成新多边形

transbigdata.polyon_exterior(data, minarea=0)

输入多边形GeoDataFrame数据,对多边形取外边界构成新多边形

输入

dataGeoDataFrame

多边形数据

minareanumber

最小面积,小于这个面积的面全部剔除

输出

data1GeoDataFrame

处理后的面

置信椭圆

置信椭圆参数估计

transbigdata.ellipse_params(data, col=['lon', 'lat'], confidence=95, epsg=None)

输入点数据,获取置信椭圆的参数

输入

dataDataFrame

公交GPS数据,单一公交线路,且需要含有车辆ID、GPS时间、经纬度(wgs84)

confidencenumber

置信度,可选99,95,90

epsgnumber

如果给了,则将原始坐标从wgs84,转换至给定epsg坐标系下进行置信椭圆参数估计

col: List

以[经度,纬度]形式存储的列名

输出

params: List

质心椭圆参数,分别为[pos,width,height,theta,area,alpha] 对应[中心点坐标,短轴,长轴,角度,面积,方向性]

置信椭圆绘制

transbigdata.ellipse_plot(ellip_params, ax, **kwargs)

输入置信椭圆的参数,绘制置信椭圆

输入

ellip_params : List

axmatplotlib.axes._subplots.AxesSubplot

画板

用法

import pandas as pd
import transbigdata as tbd
import numpy as np
#生成测试用数据
data = np.random.uniform(1,10,(100,2))
data[:,1:] = 0.5*data[:,0:1]+np.random.uniform(-2,2,(100,1))
data = pd.DataFrame(data,columns = ['x','y'])

#绘制数据分布
import matplotlib.pyplot as plt
plt.figure(1,(5,5))
#绘制数据点
plt.scatter(data['x'],data['y'],s = 0.5)
#绘制坐标轴
plt.plot([-10,10],[0,0],c = 'k')
plt.plot([0,0],[-10,10],c = 'k')
plt.xlim(-15,15)
plt.ylim(-15,15)
plt.show()
_images/output_1_0.png

输入数据与xy坐标所在列名,置信度,估计椭圆参数 分别代表[中心点坐标,短轴,长轴,角度,面积,扁率

ellip_params = tbd.ellipse_params(data,confidence=95,col = ['x','y'])
ellip_params
[array([5.78928146, 2.88466235]),
 4.6981983145616875,
 14.04315715927693,
 -58.15524535916836,
 51.8186366184246,
 0.6654457212665993]

再用tbd.ellipse_plot绘制置信椭圆

#绘制数据分布
import matplotlib.pyplot as plt
plt.figure(1,(5,5))
ax = plt.subplot(111)
#绘制数据点
plt.scatter(data['x'],data['y'],s = 0.5)
#获取置信椭圆参数并绘制椭圆
#99%置信椭圆
ellip_params = tbd.ellipse_params(data,confidence=99,col = ['x','y'])
tbd.ellipse_plot(ellip_params,ax,fill = False,edgecolor = 'r',linewidth = 1)
#95%置信椭圆
ellip_params = tbd.ellipse_params(data,confidence=95,col = ['x','y'])
tbd.ellipse_plot(ellip_params,ax,fill = False,edgecolor = 'b',linewidth = 1)
#90%置信椭圆
ellip_params = tbd.ellipse_params(data,confidence=90,col = ['x','y'])
tbd.ellipse_plot(ellip_params,ax,fill = False,edgecolor = 'k',linewidth = 1)
#绘制坐标轴
plt.plot([-10,10],[0,0],c = 'k')
plt.plot([0,0],[-10,10],c = 'k')
plt.xlim(-15,15)
plt.ylim(-15,15)
plt.show()
_images/output_3_0.png