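pd.concat([dataFrame1, dataFrame2, …], ignore_index=False)

Parameter: ignore_index=False (the default) keeps each input's original index labels in the result; ignore_index=True discards them and renumbers the rows 0 to n-1.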

import pandas as pd
import numpy as np

Create a two-dimensional dataset with five rows and two columns

df = pd.DataFrame(np.random.randint(0, 10, (5, 2)), columns=['A', 'B'])

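Split the data into two parts (leaving out row 2) and store them in a list

data_list = [df[0:2], df[3:]]

Keep the original index labels

df1 = pd.concat(data_list, ignore_index=False)

Renumber the index consecutively

df2 = pd.concat(data_list, ignore_index=True)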

* Result

```python
----------------df--------------------------
0  7  8
1  7  3
2  5  9
3  4  0
4  1  8
----------------df1--------------------------
0  7  8
1  7  3
3  4  0   # <-- label 2 is missing here: the index is not consecutive
4  1  8
----------------df2--------------------------
0  7  8
1  7  3
2  4  0
3  1  8
```
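Because the example above draws random integers, its output changes on every run. The following is a minimal sketch with fixed values (copied from the sample output; the names `parts`, `kept`, and `renumbered` are illustrative only) that makes the effect of `ignore_index` easy to verify.

```python
import pandas as pd

# Fixed values instead of random integers, so the result is reproducible.
df = pd.DataFrame({"A": [7, 7, 5, 4, 1], "B": [8, 3, 9, 0, 8]})

# Two slices that deliberately skip the row labelled 2.
parts = [df[0:2], df[3:]]

kept = pd.concat(parts, ignore_index=False)       # original labels survive
renumbered = pd.concat(parts, ignore_index=True)  # fresh labels 0..n-1

print(list(kept.index))        # [0, 1, 3, 4]
print(list(renumbered.index))  # [0, 1, 2, 3]
```

The row values are identical in both results; only the index labels differ.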

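1.1.2 The append function

df.append(df1, ignore_index=True)

Parameter: ignore_index=False keeps the original index labels of both frames; ignore_index=True renumbers the rows 0 to n-1. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, pd.concat produces the same result.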

import pandas as pd
import numpy as np

Create a two-dimensional array with five rows and two columns

df = pd.DataFrame(np.random.randint(0, 10, (5, 2)), columns=['A', 'B'])

Create the data to append

narry = np.random.randint(0, 10, (3, 2))
data_list = pd.DataFrame(narry, columns=['A', 'B'])

df1 = df.append(data_list, ignore_index=True)

* Result

```python
----------------df--------------------------
0  5  6
1  1  2
2  5  3
3  1  8
4  1  2
----------------df1--------------------------
0  5  6
1  1  2
2  5  3
3  1  8
4  1  2
5  8  1
6  3  5
7  1  1
```
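On pandas 2.0 and later, `DataFrame.append` no longer exists, so the call above raises an `AttributeError`. A minimal sketch of the same operation written with `pd.concat` follows; the values are copied from the sample output, and the names `extra` and `combined` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 1, 5, 1, 1], "B": [6, 2, 3, 8, 2]})
extra = pd.DataFrame({"A": [8, 3, 1], "B": [1, 5, 1]})

# Same effect as df.append(extra, ignore_index=True) on older pandas:
# stack the rows of `extra` under `df` and renumber the index.
combined = pd.concat([df, extra], ignore_index=True)

print(combined.shape)        # (8, 2)
print(list(combined.index))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Neither input frame is modified; like `append`, `pd.concat` returns a new DataFrame.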

1.2 Merging fields

Merge different columns that describe the same data (rows sharing a key) into a single table

pd.merge( left, right, how="inner", on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=("_x", "_y"), copy=True, indicator=False, validate=None, )
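Two parameters in the signature above, `how` and `indicator`, are easiest to understand when the keys of the two frames only partly overlap. The sketch below uses small made-up frames (the key `'d'` and the `data2` values are assumptions for illustration, not data from this article).

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "data1": [0, 1, 2]})
right = pd.DataFrame({"key": ["b", "c", "d"], "data2": [10, 20, 30]})

# how="inner" (the default) keeps only keys present in both frames.
inner = pd.merge(left, right, on="key", how="inner")
print(sorted(inner["key"]))  # ['b', 'c']

# how="outer" keeps the union of keys and fills the gaps with NaN;
# indicator=True adds a "_merge" column recording where each row came from.
outer = pd.merge(left, right, on="key", how="outer", indicator=True)
origin = dict(zip(outer["key"], outer["_merge"].astype(str)))
print(origin["a"], origin["b"], origin["d"])  # left_only both right_only
```

`how="left"` and `how="right"` behave analogously, keeping every key of one side only.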
import pandas as pd

df1 = pd.DataFrame({'key':['a','b','c'], 'data1':range(3)})
df2 = pd.DataFrame({'key':['a','b','c'], 'data2':range(3)})
df = pd.merge(df1, df2)  # by default, merge joins on the columns the two frames have in common

* Result

```python
----------------df1--------------------------
  key  data1
0   a      0
1   b      1
2   c      2
----------------df2--------------------------
  key  data2
0   a      0
1   b      1
2   c      2
----------------df---------------------------
  key  data1  data2
0   a      0      0
1   b      1      1
2   c      2      2
```

For a multi-key join, pass the join keys as a list:

right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'],
                      'key2': ['one', 'one', 'one', 'two'],
                      'lval': [4, 5, 6, 7]})

left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'],
                     'key2': ['one', 'two', 'one'],
                     'lval': [1, 2, 3]})

df = pd.merge(left, right, on=['key1', 'key2'], how='outer')

* Result

```python
----------------right-------------------------
  key1 key2  lval
0  foo  one     4
1  foo  one     5
2  bar  one     6
3  bar  two     7
----------------left--------------------------
  key1 key2  lval
0  foo  one     1
1  foo  two     2
2  bar  one     3
----------------df---------------------------
  key1 key2  lval_x  lval_y
0  foo  one     1.0     4.0
1  foo  one     1.0     5.0
2  foo  two     2.0     NaN
3  bar  one     3.0     6.0
4  bar  two     NaN     7.0
```

drop_duplicates removes duplicate rows. General form:

data.drop_duplicates(subset=['A','B'],keep='first',inplace=True)

df = pd.DataFrame({ 'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'], 'style': ['cup', 'cup', 'cup', 'pack', 'pack'], 'rating': [4, 4, 3.5, 15, 5] })

df.drop_duplicates()

```python
---------------df before deduplication---------------------------
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
---------------df after deduplication---------------------------
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
```

Use subset to remove rows that are duplicated in particular columns

df.drop_duplicates(subset=['brand'])
brand style rating
0 Yum Yum cup 4.0
2 Indomie cup 3.5

Use keep to remove duplicates while retaining the last occurrence

df.drop_duplicates(subset=['brand', 'style'], keep='last') 
brand style rating
1 Yum Yum cup 4.0
2 Indomie cup 3.5
4 Indomie pack 5.0
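The `keep` parameter also accepts `False`, and the related method `DataFrame.duplicated` exposes the same logic as a boolean mask. The sketch below reuses the `brand`/`style`/`rating` frame from above; the names `mask`, `unique_only`, and `tidy` are illustrative, and the `ignore_index` argument of `drop_duplicates` assumes pandas 1.0 or later.

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# True for every row that repeats an earlier (brand, style) pair.
mask = df.duplicated(subset=["brand", "style"], keep="first")
print(mask.tolist())  # [False, True, False, False, True]

# keep=False drops every member of a duplicated group, not just the repeats.
unique_only = df.drop_duplicates(subset=["brand", "style"], keep=False)
print(list(unique_only.index))  # [2]

# ignore_index=True renumbers the surviving rows 0..n-1.
tidy = df.drop_duplicates(subset=["brand", "style"], keep="last", ignore_index=True)
print(list(tidy.index))  # [0, 1, 2]
```

Inspecting the mask before dropping rows is a cheap way to check which records would be removed.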