Python pandas: split a DataFrame column into two new columns and delete the original column

I have the following DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Name': ['Steve Smith', 'Joe Nadal',
                            'Roger Federer'],
                  'birthdat/company': ['1995-01-26Sharp, Reed and Crane',
                                      '1955-08-14Price and Sons',
                                      '2000-06-28Pruitt, Bush and Mcguir']})
df[['data_time','full_company_name']] = df['birthdat/company'].str.split('[0-9]{4}-[0-9]{2}-[0-9]{2}', expand=True)

With my code I get the following (the date is lost, because split discards it):

____|____Name______|__birthdat/company_______________|_birthdate_|____company___________
0   |Steve Smith   |1995-01-26Sharp, Reed and Crane  |           |Sharp, Reed and Crane
1   |Joe Nadal     |1955-08-14Price and Sons         |           |Price and Sons
2   |Roger Federer |2000-06-28Pruitt, Bush and Mcguir|           |Pruitt, Bush and Mcguir

What I want is to capture the date (the part matching '[0-9]{4}-[0-9]{2}-[0-9]{2}') and put the rest into the "full_company_name" column, like this:

____|____Name______|_birthdate_|____company_name_______
0   |Steve Smith   |1995-01-26 |Sharp, Reed and Crane
1   |Joe Nadal     |1955-08-14 |Price and Sons
2   |Roger Federer |2000-06-28 |Pruitt, Bush and Mcguir

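Updated question: how can I handle missing values in the birth date or the company name? For example, birthdat/company = "NaApple" or birthdat/company = "2003-01-15Na". The missing-value markers are not limited to "Na".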

python
python-3.x
regex
pandas
dataframe
TheDev
Posted on 2020-11-07
2 answers
Wiktor Stribiżew
Posted on 2020-11-07
Accepted
0 upvotes

You can use

df[['data_time','full_company_name']] = df['birthdat/company'].str.extract(r'^([0-9]{4}-[0-9]{2}-[0-9]{2})(.*)', expand=False)
            Name  Age  ...   data_time        full_company_name
0    Steve Smith   32  ...  1995-01-26    Sharp, Reed and Crane
1      Joe Nadal   34  ...  1955-08-14           Price and Sons
2  Roger Federer   36  ...  2000-06-28  Pruitt, Bush and Mcguir
[3 rows x 5 columns]

Series.str.extract is used here because you need to get both parts of the string without losing the date.
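To make the difference concrete, here is a minimal sketch that runs both methods on two of the sample values. The variable names `s`, `date_re`, `split_parts` and `extract_parts` are made up for this illustration.

```python
import pandas as pd

s = pd.Series(['1995-01-26Sharp, Reed and Crane', '1955-08-14Price and Sons'])
date_re = r'[0-9]{4}-[0-9]{2}-[0-9]{2}'

# split treats the date as a delimiter and throws it away:
# the piece before the delimiter is an empty string.
split_parts = s.str.split(date_re, expand=True)

# extract returns whatever each capture group matched, so the date is kept.
extract_parts = s.str.extract(rf'^({date_re})(.*)')

print(split_parts)
print(extract_parts)
```

With split, column 0 holds empty strings and only the company name survives. With extract, column 0 holds the date and column 1 the company name.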

The regex is

  • ^ - start of string
  • ([0-9]{4}-[0-9]{2}-[0-9]{2}) - your date pattern captured into Group 1
  • (.*) - the rest of the string captured into Group 2.
  • See the regex demo.
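The title also asks for the original column to be deleted, which the snippet above does not do. One possible way, sketched here as a variation rather than as part of the answer, is to use named capture groups so that str.extract names the new columns itself, then drop the combined column and join the result.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Steve Smith', 'Joe Nadal', 'Roger Federer'],
    'birthdat/company': ['1995-01-26Sharp, Reed and Crane',
                         '1955-08-14Price and Sons',
                         '2000-06-28Pruitt, Bush and Mcguir'],
})

# Named groups become the column names of the DataFrame returned by extract.
pattern = (r'^(?P<data_time>[0-9]{4}-[0-9]{2}-[0-9]{2})'
           r'(?P<full_company_name>.*)')
parts = df['birthdat/company'].str.extract(pattern)

# Remove the combined column and attach the two new ones (aligned on the index).
df = df.drop(columns='birthdat/company').join(parts)
print(df)
```

If the dates should be real timestamps rather than strings, pd.to_datetime(df['data_time']) can be applied afterwards.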

    Quang Hoang
    Posted on 2020-11-07
    0 upvotes

    split breaks the string at the delimiters and discards the delimiters themselves. I think what you want is extract, with two capture groups.

    df[['data_time','full_company_name']] = \
       df['birthdat/company'].str.extract('^([0-9]{4}-[0-9]{2}-[0-9]{2})(.*)')
    

    Output:

        Name           birthdat/company                   data_time    full_company_name
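Neither answer covers the updated question about missing values. The following is only a sketch under a strong assumption: every value starts with either a date or one of a known set of placeholder tokens. `MISSING_TOKENS` is a hypothetical list and must be replaced with the markers that actually occur in the data. Without such a list, a string like "NaApple" cannot be told apart from a company whose name really starts with "Na".

```python
import re

import pandas as pd

# Hypothetical placeholder tokens (an assumption, not taken from the question).
MISSING_TOKENS = ['Na', 'None', 'null']

df = pd.DataFrame({
    'Name': ['Steve Smith', 'Joe Nadal', 'Roger Federer'],
    'birthdat/company': ['1995-01-26Sharp, Reed and Crane',
                         'NaApple',
                         '2003-01-15Na'],
})

# Group 1 matches a date OR a placeholder token; group 2 takes the rest.
tokens = '|'.join(re.escape(t) for t in MISSING_TOKENS)
pattern = rf'^([0-9]{{4}}-[0-9]{{2}}-[0-9]{{2}}|{tokens})(.*)$'

parts = df['birthdat/company'].str.extract(pattern)
parts.columns = ['data_time', 'full_company_name']

# Turn placeholder tokens and empty strings into real missing values.
parts = parts.mask(parts.isin(MISSING_TOKENS + ['']))

df = df.drop(columns='birthdat/company').join(parts)
print(df)
```

Rows that match neither a date nor a token at the start come back as missing in both columns, which makes them easy to find and inspect by hand.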