我需要读取一个csv文件,并根据关系类型和LastName在 "Phone,Email,address,city,state,Zip "列中填充空值/空值,然后写入一个新的csv文件。 例子。如果一个人的关系类型是 "雇主",他的家属有相同的姓氏,如果家属没有 "电话,电子邮件,地址,城市,州,邮编",我应该用雇主的 "电话,电子邮件,地址,城市,州,邮编 "填写空值。我面临两个主要问题:1)不同的人/家庭的姓氏可能是相同的(即第5-10行),所以只要 "关系 "变为 "雇主",循环就应该中断并继续进行;2)在某些情况下,家属的姓氏不会与雇主的姓氏相同,但如果它介于相同姓氏之间,我们仍然应该填写空值(即第8行)。
我将无法使用python pandas,因为其余的代码/程序纯粹是基于python 2.7的。
输入格式/表格看起来像下面的空单元格(csv文件)。
[FirstName LastName DoB Relationship Phone Email Address City State Zip
Hannah Kahnw 9/12/1972 Employer 1457871452 hann@gmail.com Han Ave hannas UT 563425
Michel Kahnw 2/9/1993 Dependent
Jonaas Kahnw 2/22/1997 Dependent
Mikkel Nielsen 1/25/1976 Employer 4509213887 mik@yah.com 887 Street neil NY 72356
Magnus Nielsen 9/20/1990 Dependent
Ulrich Nielsen 9/12/1983 Employer 7901234516 ulric@mail.com Ulric Build mavric KS 421256
kathari Nielsen 10/2/2003 Dependent
kathy storm 12/12/1999 Dependent
kiiten Nielsen 6/21/1999 Dependent
Elisab Doppler 2/22/1987 Employer 5439001211 elop@amaz.com Elisa apart Elis AR 758475
Peterp Doppler 1/25/1977 Employer 6847523758 per@dail.com park Ave Pete PT 415253
bartos Doppler 9/21/1990 Dependent][1]
输出格式应如下。
FirstName LastName DoB Relationship Phone Email Address City State Zip
Hannah Kahnw 9/12/1972 Employer 1457871452 hann@gmail.com Han Ave hannas UT 563425
Michel Kahnw 2/9/1993 Dependent 1457871453 hann@gmail.com Han Ave hannas UT 563426
Jonaas Kahnw 2/22/1997 Dependent 1457871454 hann@gmail.com Han Ave hannas UT 563427
Mikkel Nielsen 1/25/1976 Employer 4509213887 mik@yah.com 887 Street neil NY 72356
Magnus Nielsen 9/20/1990 Dependent 4509213888 mik@yah.com 888 Street neil NY 72357
Ulrich Nielsen 9/12/1983 Employer 7901234516 ulric@mail.com Ulric Build mavric KS 421256
kathari Nielsen 10/2/2003 Dependent 7901234517 ulric@mail.com Ulric Build mavric KS 421257
kathy storm 12/12/1999 Dependent 7901234518 ulric@mail.com Ulric Build mavric KS 421258
kiiten Nielsen 6/21/1999 Dependent 7901234519 ulric@mail.com Ulric Build mavric KS 421259
Elisab Doppler 2/22/1987 Employer 5439001211 elop@amaz.com Elisa apart Elis AR 758475
Peterp Doppler 1/25/1977 Employer 6847523758 per@dail.com park Ave Pete PT 415253
bartos Doppler 9/21/1990 Dependent 6847523759 per@dail.com park Ave Pete PT 415254
# Read data from file and convert to list of namedtuples
# dictionary to use to fill in missing information from others.
with open(file_path, 'rb') as fin:
csv_reader = csv.reader(fin, skipinitialspace=True)
header = next(csv_reader)
Record = namedtuple('Record', header)
addr_dict = {}
data = [header]
for rec in (Record._make(row) for row in csv_reader):
if rec.Email or rec.Phone or rec.Address or rec.City or rec.State or rec.Zip:
addr_dict.setdefault(rec.LastName, []).append(rec) # Remember it.
# Try to fill in missing data from any other records with same Address.
for i, row in enumerate(data[1:], 1):
if not (row.Phone and row.Email and rec.Address and rec.City and rec.State and rec.Zip): # Info missing?
# Try to copy it from others at same address.
updated = False
for other in addr_dict.get(row.LastName, []):
if not row.Phone and other.Phone:
row = row._replace(Phone=other.Phone)
updated = True
if not row.Email and other.Email:
row = row._replace(Email=other.Email)
updated = True
if not row.Address and other.Address:
row = row._replace(Address=other.Address)
updated = True
if not row.City and other.City:
row = row._replace(City=other.City)
updated = True
if not row.Zip and other.Zip:
row = row._replace(Zip=other.Zip)
updated = True
if row.Phone and row.Email and rec.Address and rec.City and rec.State and rec.Zip: # Info now filled in?
break
if updated:
data[i] = row
return data
INPUT_FILE = 'null_cols.csv'
OUTPUT_FILE = 'fill_cols.csv'
data = get_info(INPUT_FILE)
with open(OUTPUT_FILE, 'wb') as fout:
writer = csv.DictWriter(fout, data[0]) # First elem has column names.
writer.writeheader()
for row in data[1:]:
writer.writerow(row._asdict())
#(i got this code from earlier question which i asked in S.O This script doesn't include relationshiptype logic and also it doesn't consider the Duplicate LastName issue)
谢谢你的帮助!!