对于你给出的JSON数据,你可以通过解析JSON结构来实现,只需返回所有叶子节点的列表。
这假定你的结构在整个过程中是一致的,如果每个条目可以有不同的字段,见第二种方法。
import json
import csv
def get_leaves(item, key=None):
if isinstance(item, dict):
leaves = []
for i in item.keys():
leaves.extend(get_leaves(item[i], i))
return leaves
elif isinstance(item, list):
leaves = []
for i in item:
leaves.extend(get_leaves(i, key))
return leaves
else:
return [(key, item)]
with open('json.txt') as f_input, open('output.csv', 'w', newline='') as f_output:
csv_output = csv.writer(f_output)
write_header = True
for entry in json.load(f_input):
leaf_entries = sorted(get_leaves(entry))
if write_header:
csv_output.writerow([k for k, v in leaf_entries])
write_header = False
csv_output.writerow([v for k, v in leaf_entries])
如果你的JSON数据是你所给的格式的条目列表,那么你应该得到如下的输出。
address_line_1,company_number,country_of_residence,etag,forename,kind,locality,middle_name,month,name,nationality,natures_of_control,notified_on,postal_code,premises,region,self,surname,title,year
Address 1,12345678,England,26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00,John,individual-person-with-significant-control,Henley-On-Thames,M,2,John M Smith,Vietnamese,ownership-of-shares-50-to-75-percent,2016-04-06,RG9 1DP,161,Oxfordshire,/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl,Smith,Mrs,1977
Address 1,12345679,England,26281dhge33b22df2359sd6afsff2cb8cf62bb4a7f00,John,individual-person-with-significant-control,Henley-On-Thames,M,2,John M Smith,Vietnamese,ownership-of-shares-50-to-75-percent,2016-04-06,RG9 1DP,161,Oxfordshire,/company/12345678/persons-with-significant-control/individual/bIhuKnFctSnjrDjUG8n3NgOrl,Smith,Mrs,1977
如果每个条目可能包含不同的(或可能缺失的)字段,那么一个更好的方法是使用DictWriter
。在这种情况下,需要对所有的条目进行处理,以确定可能的fieldnames
的完整列表,从而可以写出正确的标题。
import json
import csv
def get_leaves(item, key=None):
if isinstance(item, dict):
leaves = {}
for i in item.keys():
leaves.update(get_leaves(item[i], i))
return leaves
elif isinstance(item, list):
leaves = {}
for i in item:
leaves.update(get_leaves(i, key))
return leaves
else:
return {key : item}
with open('json.txt') as f_input:
json_data = json.load(f_input)
# First parse all entries to get the complete fieldname list
fieldnames = set()
for entry in json_data:
fieldnames.update(get_leaves(entry).keys())
with open('output.csv', 'w', newline='') as f_output:
csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
csv_output.writeheader()
csv_output.writerows(get_leaves(entry) for entry in json_data)