GIS处理¶
近邻匹配¶
点与点匹配(DataFrame与DataFrame)¶
import transbigdata as tbd
生成两个GeoDataFrame表,但它们只有经纬度列
import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString
dfA = gpd.GeoDataFrame([[1,2],[2,4],[2,6],
[2,10],[24,6],[21,6],
[22,6]],columns = ['lon1','lat1'])
dfB = gpd.GeoDataFrame([[1,3],[2,5],[2,2]],columns = ['lon','lat'])
使用tbd.ckdnearest进行点与点匹配,如果是DataFrame与DataFrame匹配(不含有地理信息),则需要指定前后两个表的经纬度列
-
transbigdata.
ckdnearest
(dfA_origin, dfB_origin, Aname=['lon', 'lat'], Bname=['lon', 'lat'])¶
输入两个DataFrame,分别指定经纬度列名,为表A匹配表B中最近点,并计算距离
输入
- dfA_originDataFrame
表A
- dfB_originDataFrame
表B
- AnameList
表A中经纬度列字段
- BnameList
表B中经纬度列字段
输出
- gdfDataFrame
为A匹配到B上最近点的表
tbd.ckdnearest(dfA,dfB,Aname=['lon1','lat1'],Bname=['lon','lat'])
#此时计算出的距离为经纬度换算实际距离
lon1 | lat1 | index | lon | lat | dist | |
---|---|---|---|---|---|---|
0 | 1 | 2 | 0 | 1 | 3 | 1.111949e+05 |
1 | 2 | 4 | 1 | 2 | 5 | 1.111949e+05 |
2 | 2 | 6 | 1 | 2 | 5 | 1.111949e+05 |
3 | 2 | 10 | 1 | 2 | 5 | 5.559746e+05 |
4 | 24 | 6 | 1 | 2 | 5 | 2.437393e+06 |
5 | 21 | 6 | 1 | 2 | 5 | 2.105798e+06 |
6 | 22 | 6 | 1 | 2 | 5 | 2.216318e+06 |
点与点匹配(GeoDataFrame与GeoDataFrame)¶
将A表B表变为含有点信息的GeoDataFrame
dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])
dfB['geometry'] = gpd.points_from_xy(dfB['lon'],dfB['lat'])
使用tbd.ckdnearest_point进行点与点匹配
-
transbigdata.
ckdnearest_point
(gdA, gdB)¶
输入两个GeoDataFrame,gdfA、gdfB均为点,该方法会为gdfA表连接上gdfB中最近的点,并添加距离字段dist
输入
- gdAGeoDataFrame
表A,点要素
- gdBGeoDataFrame
表B,点要素
输出
- gdfGeoDataFrame
为A匹配到B上最近点的表
tbd.ckdnearest_point(dfA,dfB)
#此时计算出的距离为经纬度距离
lon1 | lat1 | geometry_x | dist | index | lon | lat | geometry_y | |
---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | POINT (1.00000 2.00000) | 1.000000 | 0 | 1 | 3 | POINT (1.00000 3.00000) |
1 | 2 | 4 | POINT (2.00000 4.00000) | 1.000000 | 1 | 2 | 5 | POINT (2.00000 5.00000) |
2 | 2 | 6 | POINT (2.00000 6.00000) | 1.000000 | 1 | 2 | 5 | POINT (2.00000 5.00000) |
3 | 2 | 10 | POINT (2.00000 10.00000) | 5.000000 | 1 | 2 | 5 | POINT (2.00000 5.00000) |
4 | 24 | 6 | POINT (24.00000 6.00000) | 22.022716 | 1 | 2 | 5 | POINT (2.00000 5.00000) |
5 | 21 | 6 | POINT (21.00000 6.00000) | 19.026298 | 1 | 2 | 5 | POINT (2.00000 5.00000) |
6 | 22 | 6 | POINT (22.00000 6.00000) | 20.024984 | 1 | 2 | 5 | POINT (2.00000 5.00000) |
点与线匹配(GeoDataFrame与GeoDataFrame)¶
将A表变为地理点,B表为线
dfA['geometry'] = gpd.points_from_xy(dfA['lon1'],dfA['lat1'])
dfB['geometry'] = [LineString([[1,1],[1.5,2.5],[3.2,4]]),
LineString([[1,0],[1.5,0],[4,0]]),
LineString([[1,-1],[1.5,-2],[4,-4]])]
dfB.plot()

-
transbigdata.
ckdnearest_line
(gdfA, gdfB)¶
输入两个GeoDataFrame,其中gdfA为点,gdfB为线,该方法会为gdfA表连接上gdfB中最近的线,并添加距离字段dist
输入
- gdAGeoDataFrame
表A,点要素
- gdBGeoDataFrame
表B,线要素
输出
- gdfGeoDataFrame
为A匹配到B中最近的线
用tbd.ckdnearest_line可以实现点匹配线,其原理是将线中的折点提取,然后使用点匹配点。
tbd.ckdnearest_line(dfA,dfB)
#此时计算出的距离为经纬度距离
lon1 | lat1 | geometry_x | dist | index | lon | lat | geometry_y | |
---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | POINT (1.00000 2.00000) | 0.707107 | 0 | 1 | 3 | LINESTRING (1.00000 1.00000, 1.50000 2.50000, ... |
1 | 2 | 4 | POINT (2.00000 4.00000) | 1.200000 | 0 | 1 | 3 | LINESTRING (1.00000 1.00000, 1.50000 2.50000, ... |
2 | 2 | 6 | POINT (2.00000 6.00000) | 2.332381 | 0 | 1 | 3 | LINESTRING (1.00000 1.00000, 1.50000 2.50000, ... |
3 | 2 | 10 | POINT (2.00000 10.00000) | 6.118823 | 0 | 1 | 3 | LINESTRING (1.00000 1.00000, 1.50000 2.50000, ... |
4 | 21 | 6 | POINT (21.00000 6.00000) | 17.912007 | 0 | 1 | 3 | LINESTRING (1.00000 1.00000, 1.50000 2.50000, ... |
5 | 22 | 6 | POINT (22.00000 6.00000) | 18.906084 | 0 | 1 | 3 | LINESTRING (1.00000 1.00000, 1.50000 2.50000, ... |
6 | 24 | 6 | POINT (24.00000 6.00000) | 20.880613 | 1 | 2 | 5 | LINESTRING (1.00000 0.00000, 1.50000 0.00000, ... |
打断线¶
在实际应用中,我们可能会需要把很长的线打断为很多子线段,每一条线段不要超过一定的最大长度,此时则可以使用TransBigData包中的splitline_with_length方法。
-
transbigdata.
splitline_with_length
(Centerline, maxlength=100)¶
输入线GeoDataFrame要素,打断为最大长度maxlength的小线段
输入
- CenterlineGeoDataFrame
线要素
- maxlengthnumber
打断的线段最大长度
输出
- splitedlineGeoDataFrame
打断后的线
下面演示如何将线打断为100米一段的线段
#读取线要素
import geopandas as gpd
Centerline = gpd.read_file(r'test_lines.json')
Centerline.plot()

#转换线为投影坐标系
Centerline.crs = {'init':'epsg:4326'}
Centerline = Centerline.to_crs(epsg = '4517')
#计算线的长度
Centerline['length'] = Centerline.length
Centerline
Id | geometry | length | |
---|---|---|---|
0 | 0 | LINESTRING (29554925.232 4882800.694, 29554987... | 285.503444 |
1 | 0 | LINESTRING (29554682.635 4882450.554, 29554773... | 185.482276 |
2 | 0 | LINESTRING (29554987.079 4882521.969, 29555040... | 291.399180 |
3 | 0 | LINESTRING (29554987.079 4882521.969, 29555073... | 248.881529 |
4 | 0 | LINESTRING (29554987.079 4882521.969, 29554969... | 207.571197 |
5 | 0 | LINESTRING (29554773.177 4882288.671, 29554828... | 406.251357 |
6 | 0 | LINESTRING (29554773.177 4882288.671, 29554926... | 158.114403 |
7 | 0 | LINESTRING (29555060.286 4882205.456, 29555082... | 107.426629 |
8 | 0 | LINESTRING (29555040.278 4882235.468, 29555060... | 36.069941 |
9 | 0 | LINESTRING (29555060.286 4882205.456, 29555095... | 176.695446 |
#将线打断为最长100米的线段
import transbigdata as tbd
splitedline = tbd.splitline_with_length(Centerline,maxlength = 100)
#打断后线型不变
splitedline.plot()

#但内容已经变成一段一段了
splitedline
geometry | id | length | |
---|---|---|---|
0 | LINESTRING (29554925.232 4882800.694, 29554927... | 0 | 100.000000 |
1 | LINESTRING (29554946.894 4882703.068, 29554949... | 0 | 100.000000 |
2 | LINESTRING (29554968.557 4882605.443, 29554970... | 0 | 85.503444 |
0 | LINESTRING (29554682.635 4882450.554, 29554688... | 1 | 100.000000 |
1 | LINESTRING (29554731.449 4882363.277, 29554736... | 1 | 85.482276 |
0 | LINESTRING (29554987.079 4882521.969, 29554989... | 2 | 100.000000 |
1 | LINESTRING (29555005.335 4882423.650, 29555007... | 2 | 100.000000 |
2 | LINESTRING (29555023.592 4882325.331, 29555025... | 2 | 91.399180 |
0 | LINESTRING (29554987.079 4882521.969, 29554993... | 3 | 100.000000 |
1 | LINESTRING (29555042.051 4882438.435, 29555048... | 3 | 99.855617 |
2 | LINESTRING (29555111.265 4882370.450, 29555116... | 3 | 48.881529 |
0 | LINESTRING (29554987.079 4882521.969, 29554985... | 4 | 100.000000 |
1 | LINESTRING (29554973.413 4882422.908, 29554971... | 4 | 99.756943 |
2 | LINESTRING (29554930.341 4882335.023, 29554929... | 4 | 7.571197 |
0 | LINESTRING (29554773.177 4882288.671, 29554777... | 5 | 100.000000 |
1 | LINESTRING (29554816.361 4882198.476, 29554821... | 5 | 99.782969 |
2 | LINESTRING (29554882.199 4882125.314, 29554891... | 5 | 99.745378 |
3 | LINESTRING (29554976.612 4882096.588, 29554987... | 5 | 100.000000 |
4 | LINESTRING (29555076.548 4882100.189, 29555077... | 5 | 6.251357 |
0 | LINESTRING (29554773.177 4882288.671, 29554783... | 6 | 100.000000 |
1 | LINESTRING (29554869.914 4882314.006, 29554876... | 6 | 58.114403 |
0 | LINESTRING (29555060.286 4882205.456, 29555062... | 7 | 100.000000 |
1 | LINESTRING (29555081.239 4882107.675, 29555081... | 7 | 7.426629 |
0 | LINESTRING (29555040.278 4882235.468, 29555042... | 8 | 36.069941 |
0 | LINESTRING (29555060.286 4882205.456, 29555064... | 9 | 100.000000 |
1 | LINESTRING (29555094.981 4882299.244, 29555100... | 9 | 76.419694 |
面要素处理¶
面合并¶
-
transbigdata.
merge_polygon
(data, col)¶
输入多边形GeoDataFrame数据,以及分组列名col,对不同组别进行分组的多边形进行合并
输入
- dataGeoDataFrame
多边形数据
- colstr
分组列名
输出
- data1GeoDataFrame
合并后的面
对面取外边界构成新多边形¶
-
transbigdata.
polyon_exterior
(data, minarea=0)¶
输入多边形GeoDataFrame数据,对多边形取外边界构成新多边形
输入
- dataGeoDataFrame
多边形数据
- minareanumber
最小面积,小于这个面积的面全部剔除
输出
- data1GeoDataFrame
处理后的面
置信椭圆¶
置信椭圆参数估计¶
-
transbigdata.
ellipse_params
(data, col=['lon', 'lat'], confidence=95, epsg=None)¶
输入点数据,获取置信椭圆的参数
输入
- dataDataFrame
公交GPS数据,单一公交线路,且需要含有车辆ID、GPS时间、经纬度(wgs84)
- confidencenumber
置信度,可选99,95,90
- epsgnumber
如果给了,则将原始坐标从wgs84,转换至给定epsg坐标系下进行置信椭圆参数估计
- col: List
以[经度,纬度]形式存储的列名
输出
- params: List
质心椭圆参数,分别为[pos,width,height,theta,area,alpha] 对应[中心点坐标,短轴,长轴,角度,面积,方向性]
置信椭圆绘制¶
-
transbigdata.
ellipse_plot
(ellip_params, ax, **kwargs)¶
输入置信椭圆的参数,绘制置信椭圆
输入
ellip_params : List
- axmatplotlib.axes._subplots.AxesSubplot
画板
用法¶
import pandas as pd
import transbigdata as tbd
import numpy as np
#生成测试用数据
data = np.random.uniform(1,10,(100,2))
data[:,1:] = 0.5*data[:,0:1]+np.random.uniform(-2,2,(100,1))
data = pd.DataFrame(data,columns = ['x','y'])
#绘制数据分布
import matplotlib.pyplot as plt
plt.figure(1,(5,5))
#绘制数据点
plt.scatter(data['x'],data['y'],s = 0.5)
#绘制坐标轴
plt.plot([-10,10],[0,0],c = 'k')
plt.plot([0,0],[-10,10],c = 'k')
plt.xlim(-15,15)
plt.ylim(-15,15)
plt.show()

输入数据与xy坐标所在列名,置信度,估计椭圆参数 分别代表[中心点坐标,短轴,长轴,角度,面积,扁率
ellip_params = tbd.ellipse_params(data,confidence=95,col = ['x','y'])
ellip_params
[array([5.78928146, 2.88466235]),
4.6981983145616875,
14.04315715927693,
-58.15524535916836,
51.8186366184246,
0.6654457212665993]
再用tbd.ellipse_plot绘制置信椭圆
#绘制数据分布
import matplotlib.pyplot as plt
plt.figure(1,(5,5))
ax = plt.subplot(111)
#绘制数据点
plt.scatter(data['x'],data['y'],s = 0.5)
#获取置信椭圆参数并绘制椭圆
#99%置信椭圆
ellip_params = tbd.ellipse_params(data,confidence=99,col = ['x','y'])
tbd.ellipse_plot(ellip_params,ax,fill = False,edgecolor = 'r',linewidth = 1)
#95%置信椭圆
ellip_params = tbd.ellipse_params(data,confidence=95,col = ['x','y'])
tbd.ellipse_plot(ellip_params,ax,fill = False,edgecolor = 'b',linewidth = 1)
#90%置信椭圆
ellip_params = tbd.ellipse_params(data,confidence=90,col = ['x','y'])
tbd.ellipse_plot(ellip_params,ax,fill = False,edgecolor = 'k',linewidth = 1)
#绘制坐标轴
plt.plot([-10,10],[0,0],c = 'k')
plt.plot([0,0],[-10,10],c = 'k')
plt.xlim(-15,15)
plt.ylim(-15,15)
plt.show()
