Reading four CSV files in Python and printing rows based on column IDs

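Hi guys, I'm a Python noob and currently learning, and I was wondering if someone could help me with a problem I'm facing. I have four files, routes.txt, trips.txt, stop_times.txt, and stops.txt, which look like this (each file has thousands of lines):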

routes.txt 
"route_id","agency_id","route_short_name","route_long_name","route_desc","route_type","route_url","route_color","route_text_color"
"01","1","1",,,3,,"FFFF7C","000000"
"04","1","4",,,3,,"FFFF7C","000000"
"05","1","5",,,3,,"FFFF7C","000000"
"07","1","7",,,3,,"FFFF7C","000000"

trips.txt
"route_id","service_id","trip_id","trip_headsign","direction_id","block_id","shape_id"
"108","BUSN13-hbf13011-Weekday-02","19417636","Malden Station via Salem St.",1,"F411-75","1080037"
"94","BUSN13-hbf13011-Weekday-02","19417637","Medford Square via West Medford",0,"F94-5","940014"

stop_times.txt
"trip_id","arrival_time","departure_time","stop_id","stop_sequence","stop_headsign","pickup_type","drop_off_type"
"19417636","14:40:00","14:40:00","7412",1,,0,0
"19417636","14:41:00","14:41:00","6283",2,,0,0
"19417636","14:41:00","14:41:00","6284",3,,0,0

stops.txt
"stop_id","stop_code","stop_name","stop_desc","stop_lat","stop_lon","zone_id","stop_url","location_type","parent_station"
"place-alfcl","","Alewife Station","","42.395428","-71.142483","","",1,""
"place-alsgr","","Allston St. Station","","42.348701","-71.137955","","",1,""
"place-andrw","","Andrew Station","","42.330154","-71.057655","","",1,""

What I am trying to do is print rows based on the ID columns, for example:

Check the route_id in routes.txt and see whether it equals the route_id in trips.txt.

If they match, take the trip_id from trips.txt and compare it with the trip_id in stop_times.txt.

If that matches too, check whether the stop_id equals the stop_id in stops.txt, and if so print the row. Note that stop_id can be a number or a string.

What I want to print is something like this, for example:

route_id, trip_id, arrival_time, departure_time, stop_name
01,19417636, 14:40:00,14:40:00, Alewife Station

Thanks a lot

2 answers

Answer 1 ·

Edited 2023-03-06 23:06:44

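I think the easiest approach in this case is to import the data into a database and use SQL joins. You can just use sqlite3, which is very simple. Even an in-memory database may work, depending on how much data there is and how often the script runs.

Make sure to create indexes on the foreign key fields, otherwise lookups may be slow.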
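As a rough illustration of that idea, here is a minimal sketch using Python's standard sqlite3 and csv modules with an in-memory database. The inline CSV strings are trimmed stand-ins for the four files; the "108" row in routes and the two stop names are made up so that the sample rows actually join, and the `load_table` helper is hypothetical. Real code would open the .txt files instead of using `io.StringIO`.

```python
import csv
import io
import sqlite3

# Trimmed stand-ins for the four files. The "108" route row and the stop
# names are hypothetical, added only so the sample data joins end to end.
ROUTES = '"route_id","route_short_name"\n"01","1"\n"108","108"\n'
TRIPS = '"route_id","trip_id"\n"108","19417636"\n"94","19417637"\n'
STOP_TIMES = (
    '"trip_id","arrival_time","departure_time","stop_id"\n'
    '"19417636","14:40:00","14:40:00","7412"\n'
    '"19417636","14:41:00","14:41:00","6283"\n'
)
STOPS = '"stop_id","stop_name"\n"7412","Hypothetical Stop A"\n"6283","Hypothetical Stop B"\n'


def load_table(conn, name, text):
    """Create a table with TEXT columns from CSV text and insert every row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = ", ".join(f'"{col}" TEXT' for col in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{name}" ({columns})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', reader)


conn = sqlite3.connect(":memory:")
for name, text in [
    ("routes", ROUTES),
    ("trips", TRIPS),
    ("stop_times", STOP_TIMES),
    ("stops", STOPS),
]:
    load_table(conn, name, text)

# Index the columns that appear in the join conditions.
conn.execute("CREATE INDEX idx_trips_route ON trips (route_id)")
conn.execute("CREATE INDEX idx_stop_times_trip ON stop_times (trip_id)")
conn.execute("CREATE INDEX idx_stops_id ON stops (stop_id)")

QUERY = """
SELECT r.route_id, t.trip_id, st.arrival_time, st.departure_time, s.stop_name
FROM routes AS r
JOIN trips AS t ON t.route_id = r.route_id
JOIN stop_times AS st ON st.trip_id = t.trip_id
JOIN stops AS s ON s.stop_id = st.stop_id
ORDER BY st.arrival_time
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(", ".join(row))
```

Every column is stored as TEXT here so that IDs such as "01" keep their leading zeros and a stop_id can be either numeric or a string like "place-alfcl".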

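Also, the sqlite3 command-line shell can import data directly from CSV files. Just create the tables and then use the ".import" command (run sqlite3 and type .help, or see the documentation).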

Sean

Answer 2 ·

Edited 2023-03-06 23:06:44

What you are trying to do is called a join operation, and it is easy to do with the pandas library:

import pandas as pd

routes = pd.read_csv('routes.txt')
trips = pd.read_csv('trips.txt')
stop_times = pd.read_csv('stop_times.txt')
stops = pd.read_csv('stops.txt')

You may need to change the options for read_csv so that it interprets your data correctly (especially the leading zeros in route_id).
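One option that may help is `dtype=str`, which tells read_csv to keep every column as text. The sketch below uses two rows from the routes.txt sample in the question, trimmed to three columns and read from an in-memory string rather than the real file, to show the difference from the default type inference.

```python
import io

import pandas as pd

# Two rows from the routes.txt sample in the question, trimmed to three columns.
sample = '"route_id","agency_id","route_short_name"\n"01","1","1"\n"04","1","4"\n'

# Default parsing infers integers, so the leading zero in "01" is lost.
inferred = pd.read_csv(io.StringIO(sample))

# dtype=str keeps every column as text, so "01" survives unchanged.
as_text = pd.read_csv(io.StringIO(sample), dtype=str)

print(inferred["route_id"].tolist())
print(as_text["route_id"].tolist())
```

Reading all four files this way also keeps the key columns the same type everywhere, which matters because stop_id can be a number in one row and a string such as "place-alfcl" in another.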

#   Please excuse the Dr. Seuss variable names
routes_trips = pd.merge(routes, trips, on=['route_id'])
routes_trips_stop_times = pd.merge(routes_trips, stop_times, on=['trip_id'])
routes_trips_stop_times_names = pd.merge(routes_trips_stop_times, stops, on=['stop_id'])

By default, pandas performs an inner join, so you will only get the rows that have matching route_ids, trip_ids, and stop_ids.
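To get from the merged frame to the output format asked for in the question, one possibility is to select just the five columns and write them out as CSV. This is only a sketch: the inline data is a trimmed stand-in for the real files, the "108" row in routes and the stop names are made up so that the rows join, and the `read` helper is hypothetical.

```python
import io

import pandas as pd


def read(text):
    """Parse inline CSV text, keeping every column as a string."""
    return pd.read_csv(io.StringIO(text), dtype=str)


# Hypothetical sample data: route "108" and the stop names are invented.
routes = read("route_id,route_short_name\n01,1\n108,108\n")
trips = read("route_id,trip_id\n108,19417636\n94,19417637\n")
stop_times = read(
    "trip_id,arrival_time,departure_time,stop_id\n"
    "19417636,14:40:00,14:40:00,7412\n"
    "19417636,14:41:00,14:41:00,6283\n"
)
stops = read("stop_id,stop_name\n7412,Hypothetical Stop A\n6283,Hypothetical Stop B\n")

# Chain the three inner joins, then keep only the columns to print.
merged = (
    routes.merge(trips, on="route_id")
    .merge(stop_times, on="trip_id")
    .merge(stops, on="stop_id")
)
wanted = ["route_id", "trip_id", "arrival_time", "departure_time", "stop_name"]
result = merged[wanted]
print(result.to_csv(index=False), end="")
```

In this sample, route 01 has no trips and trip 19417637 belongs to a route that is not in routes, so the inner joins drop both and only the two stops of trip 19417636 remain.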


Posted by King Wang
