Reading through four CSV files in Python and printing rows matched on ID columns

King Wang

March 7, 2023

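Hi guys, I'm a Python noob, currently learning, and I was wondering whether anyone could help me with a problem I'm facing. I have four files, routes.txt, trips.txt, stop_times.txt and stops.txt, which look like this (the files have thousands of lines):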

routes.txt 
"route_id","agency_id","route_short_name","route_long_name","route_desc","route_type","route_url","route_color","route_text_color"
"01","1","1",,,3,,"FFFF7C","000000"
"04","1","4",,,3,,"FFFF7C","000000"
"05","1","5",,,3,,"FFFF7C","000000"
"07","1","7",,,3,,"FFFF7C","000000"

trips.txt
"route_id","service_id","trip_id","trip_headsign","direction_id","block_id","shape_id"
"108","BUSN13-hbf13011-Weekday-02","19417636","Malden Station via Salem St.",1,"F411-75","1080037"
"94","BUSN13-hbf13011-Weekday-02","19417637","Medford Square via West Medford",0,"F94-5","940014"

stop_times.txt
"trip_id","arrival_time","departure_time","stop_id","stop_sequence","stop_headsign","pickup_type","drop_off_type"
"19417636","14:40:00","14:40:00","7412",1,,0,0
"19417636","14:41:00","14:41:00","6283",2,,0,0
"19417636","14:41:00","14:41:00","6284",3,,0,0

stops.txt
"stop_id","stop_code","stop_name","stop_desc","stop_lat","stop_lon","zone_id","stop_url","location_type","parent_station"
"place-alfcl","","Alewife Station","","42.395428","-71.142483","","",1,""
"place-alsgr","","Allston St. Station","","42.348701","-71.137955","","",1,""
"place-andrw","","Andrew Station","","42.330154","-71.057655","","",1,""

What I'm trying to do is print rows based on the ID columns. For example:

Check the route_id in routes.txt and see whether it is equal to the route_id in trips.txt.

If they match,

take the trip_id from trips.txt and compare it with the trip_id in stop_times.txt.

If that matches too, check whether the

stop_id is equal to the stop_id in stops.txt, and if so, print. Note that the stop_id can be a number or a string.

What I want to print is something like this, for example:

route_id, trip_id, arrival_time, departure_time, stop_name
01,19417636, 14:40:00,14:40:00, Alewife Station 

Thanks a lot.
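The matching steps described in the question can be sketched with only the standard library. This is a minimal illustration, not a tuned solution: the sample rows are made up so that the IDs line up across files (the excerpts above do not share IDs), and the helper name `rows` is hypothetical. With real data, each string would be replaced by a file opened with `open(path, newline="")`.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in the same quoted-CSV layout as the question's files,
# trimmed to the columns the matching needs.
ROUTES = '"route_id","route_short_name"\n"01","1"\n"04","4"\n'
TRIPS = '"route_id","trip_id"\n"01","19417636"\n'
STOP_TIMES = ('"trip_id","arrival_time","departure_time","stop_id"\n'
              '"19417636","14:40:00","14:40:00","place-alfcl"\n')
STOPS = '"stop_id","stop_name"\n"place-alfcl","Alewife Station"\n'


def rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Index each lookup file by its key so every check is a dictionary lookup
# instead of a scan over thousands of lines.
trips_by_route = defaultdict(list)
for trip in rows(TRIPS):
    trips_by_route[trip["route_id"]].append(trip)

times_by_trip = defaultdict(list)
for stop_time in rows(STOP_TIMES):
    times_by_trip[stop_time["trip_id"]].append(stop_time)

stop_names = {stop["stop_id"]: stop["stop_name"] for stop in rows(STOPS)}

matches = []
for route in rows(ROUTES):
    for trip in trips_by_route.get(route["route_id"], []):
        for stop_time in times_by_trip.get(trip["trip_id"], []):
            name = stop_names.get(stop_time["stop_id"])
            if name is not None:  # keep only rows matched in all four files
                matches.append((route["route_id"], trip["trip_id"],
                                stop_time["arrival_time"],
                                stop_time["departure_time"], name))

for match in matches:
    print(", ".join(match))
```

Because the csv module reads every field as a string, a stop_id such as 7412 and one such as place-alfcl are compared the same way.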

2 answers

Answer 1 ·

Edited on 2023-03-06 23:06:44

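I think the simplest approach in this case is to import the data into a database and use SQL joins. You can just use sqlite3, which is very simple. Even an in-memory database may work, depending on how much data there is and how often the script runs.

Make sure you create indexes on the foreign key fields, otherwise lookups may be slow.

In addition, the sqlite3 command-line shell can import data directly from CSV files. Just create the tables and then use the ".import" command (run sqlite3 and type .help, or see the documentation).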
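As a rough sketch of this approach from within Python, the standard `sqlite3` and `csv` modules can load the rows into an in-memory database and join them. The sample rows are made up so that the IDs match across tables, the helper `load_table` and the index names are hypothetical, and every column is stored as TEXT, which is an assumption made here to preserve leading zeros.

```python
import csv
import io
import sqlite3

# Made-up sample rows in the question's quoted-CSV layout (join columns only).
ROUTES = '"route_id","route_short_name"\n"01","1"\n"04","4"\n'
TRIPS = '"route_id","trip_id"\n"01","19417636"\n'
STOP_TIMES = ('"trip_id","arrival_time","departure_time","stop_id"\n'
              '"19417636","14:40:00","14:40:00","place-alfcl"\n')
STOPS = '"stop_id","stop_name"\n"place-alfcl","Alewife Station"\n'


def load_table(conn, name, text):
    """Create a table of TEXT columns from CSV text and insert every row."""
    parsed = list(csv.reader(io.StringIO(text)))
    header, data = parsed[0], parsed[1:]
    columns = ", ".join(f'"{col}" TEXT' for col in header)
    conn.execute(f'CREATE TABLE "{name}" ({columns})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', data)


conn = sqlite3.connect(":memory:")
for name, text in [("routes", ROUTES), ("trips", TRIPS),
                   ("stop_times", STOP_TIMES), ("stops", STOPS)]:
    load_table(conn, name, text)

# Index the columns used for lookups so the joins stay fast on large files.
conn.execute("CREATE INDEX idx_trips_route ON trips(route_id)")
conn.execute("CREATE INDEX idx_stop_times_trip ON stop_times(trip_id)")
conn.execute("CREATE INDEX idx_stops_id ON stops(stop_id)")

QUERY = """
SELECT r.route_id, t.trip_id, st.arrival_time, st.departure_time, s.stop_name
FROM routes AS r
JOIN trips AS t ON t.route_id = r.route_id
JOIN stop_times AS st ON st.trip_id = t.trip_id
JOIN stops AS s ON s.stop_id = st.stop_id
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(", ".join(row))
```

For the real files, `io.StringIO(text)` would be replaced by a file opened with `open(path, newline="")`, and `sqlite3.connect` could take a file path instead of `":memory:"` if the data should persist between runs.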

Sean

Answer 2 ·

Edited on 2023-03-06 23:06:44

What you are trying to do is called a join operation, and it can be done easily with the pandas library:

import pandas as pd

routes = pd.read_csv('routes.txt')
trips = pd.read_csv('trips.txt')
stop_times = pd.read_csv('stop_times.txt')
stops = pd.read_csv('stops.txt')

You may need to change the options for read_csv so that it interprets your data correctly (especially the leading zeros in route_id).
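One option that addresses the leading zeros is `dtype=str`, which tells `read_csv` to keep every column as text. The sketch below inlines two rows in the layout of routes.txt so it runs on its own; whether all columns should stay as text in the real data is an assumption.

```python
import io

import pandas as pd

# Two rows in the layout of the question's routes.txt, inlined for the demo.
ROUTES = ('"route_id","agency_id","route_short_name"\n'
          '"01","1","1"\n'
          '"04","1","4"\n')

# Default type inference parses the quoted "01" as a number, losing the zero.
inferred = pd.read_csv(io.StringIO(ROUTES))

# dtype=str keeps every column as text, so "01" survives intact.
as_text = pd.read_csv(io.StringIO(ROUTES), dtype=str)

print(inferred["route_id"].tolist())
print(as_text["route_id"].tolist())
```

Reading all four files as text also gives the join keys the same type on both sides, which matters for stop_id because it can be either a number or a string.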

#   Please excuse the Dr. Seuss variable names
routes_trips = pd.merge(routes, trips, on=['route_id'])
routes_trips_stop_times = pd.merge(routes_trips, stop_times, on=['trip_id'])
routes_trips_stop_times_names = pd.merge(routes_trips_stop_times, stops, on=['stop_id'])

By default, pandas performs an inner join, so you will only get rows that have matching route_ids, trip_ids and stop_ids.
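A small sketch of what the inner join means in practice, using made-up rows in which route 04 has no trip and stop 6283 has no entry in stops; both are dropped from the result. The helper `read` and the final column selection are illustrative additions, chosen to match the output format requested in the question.

```python
import io

import pandas as pd

# Made-up sample rows, trimmed to the join columns.
ROUTES = '"route_id"\n"01"\n"04"\n'
TRIPS = '"route_id","trip_id"\n"01","19417636"\n'
STOP_TIMES = ('"trip_id","arrival_time","departure_time","stop_id"\n'
              '"19417636","14:40:00","14:40:00","place-alfcl"\n'
              '"19417636","14:41:00","14:41:00","6283"\n')
STOPS = '"stop_id","stop_name"\n"place-alfcl","Alewife Station"\n'


def read(text):
    """Read CSV text with every column kept as a string."""
    return pd.read_csv(io.StringIO(text), dtype=str)


routes, trips, stop_times, stops = map(read, [ROUTES, TRIPS, STOP_TIMES, STOPS])

# Each merge defaults to how="inner": rows without a partner are discarded.
merged = (routes.merge(trips, on="route_id")
                .merge(stop_times, on="trip_id")
                .merge(stops, on="stop_id"))

# Keep only the columns the question asks to print.
wanted = merged[["route_id", "trip_id", "arrival_time",
                 "departure_time", "stop_name"]]
print(wanted.to_csv(index=False))
```

Passing `how="left"` to a merge would instead keep the unmatched rows from the left table, with missing values in the columns that come from the right.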
