I have the following DataFrame:
import pandas as pd
import numpy as np

df = pd.DataFrame({'Name': ['Steve Smith', 'Joe Nadal',
                            'Roger Federer'],
                   'birthdat/company': ['1995-01-26Sharp, Reed and Crane',
                                        '1955-08-14Price and Sons',
                                        '2000-06-28Pruitt, Bush and Mcguir']})

df[['data_time','full_company_name']] = df['birthdat/company'].str.split('[0-9]{4}-[0-9]{2}-[0-9]{2}', expand=True)
With my code I get the following:
____|____Name______|__birthdat/company_______________|_birthdate_|____company___________
0 |Steve Smith |1995-01-26Sharp, Reed and Crane | |Sharp, Reed and Crane
1 |Joe Nadal |1955-08-14Price and Sons | |Price and Sons
2 |Roger Federer |2000-06-28Pruitt, Bush and Mcguir| |Pruitt, Bush and Mcguir
What I want is to capture the part that matches the pattern ('[0-9]{4}-[0-9]{2}-[0-9]{2}') as the birthdate, with the rest going into the "full_company_name" column:
____|____Name______|_birthdate_|____company_name_______
0 |Steve Smith |1995-01-26 |Sharp, Reed and Crane
1 |Joe Nadal |1955-08-14 |Price and Sons
2 |Roger Federer |2000-06-28 |Pruitt, Bush and Mcguir
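
For reference, one possible way to get this output is to capture both parts with str.extract instead of splitting on the date, since str.split discards whatever the pattern matches. This is only a minimal sketch: it assumes every value starts with a date in YYYY-MM-DD form, and the column names birthdate and company_name are taken from the desired table above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Steve Smith', 'Joe Nadal', 'Roger Federer'],
    'birthdat/company': ['1995-01-26Sharp, Reed and Crane',
                         '1955-08-14Price and Sons',
                         '2000-06-28Pruitt, Bush and Mcguir'],
})

# Two capture groups: the leading date, then everything that follows it.
pattern = r'^(\d{4}-\d{2}-\d{2})(.*)$'

# str.extract returns one column per capture group (labelled 0 and 1).
parts = df['birthdat/company'].str.extract(pattern)
df['birthdate'] = parts[0]
df['company_name'] = parts[1]

print(df[['Name', 'birthdate', 'company_name']])
```

Rows that do not start with a date would come back as NaN in both new columns, because the pattern as a whole fails to match.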
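Updated question:
How can I handle missing values in the birthdate or the company name?
For example: birthdat/company = "NaApple" or birthdat/company = "2003-01-15Na". The missing values are not limited to Na.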
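
A rough sketch of one way this could be handled, under some loud assumptions: the placeholder strings are known in advance (MISSING_TOKENS below is a hypothetical list, not something given in the question), and every value begins with either a date or one of those placeholders. A company name that itself starts with a placeholder and has no date in front of it would be split incorrectly, so the list needs to be chosen with care.

```python
import re

import pandas as pd

# Hypothetical placeholder strings; adjust to whatever the real data contains.
MISSING_TOKENS = ['None', 'null', 'Na']

df = pd.DataFrame({
    'birthdat/company': ['1995-01-26Sharp, Reed and Crane',
                         'NaApple',
                         '2003-01-15Na'],
})

# First group: a date OR one of the placeholders; second group: the rest.
token_alt = '|'.join(re.escape(t) for t in MISSING_TOKENS)
pattern = rf'^(\d{{4}}-\d{{2}}-\d{{2}}|{token_alt})(.*)$'
parts = df['birthdat/company'].str.extract(pattern)

# Turn placeholders and empty strings into real missing values (NaN).
parts = parts.where(~parts.isin(MISSING_TOKENS + ['']))

df['birthdate'] = parts[0]
df['company_name'] = parts[1]

print(df)
```

If real dates are needed rather than strings, pd.to_datetime(df['birthdate'], errors='coerce') could be applied afterwards.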