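concat([dataFrame1, dataFrame2, …], ignore_index=False)
Parameters: with ignore_index=False (the default) each piece keeps its original index labels, so the combined index can contain gaps or repeats; with ignore_index=True the result is renumbered 0, 1, 2, and so on.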
import pandas as pd
import numpy as np
Create a two-dimensional data set with five rows and two columns
df = pd.DataFrame(np.random.randint(0, 10, (5, 2)), columns=['A', 'B'])
Split the data into two parts and store them in a list
data_list = [df[0:2], df[3:]]
Keep the original index labels
df1 = pd.concat(data_list, ignore_index=False)
Renumber the index
df2 = pd.concat(data_list, ignore_index=True)
* Result
```python
----------------df--------------------------
0 7 8
1 7 3
2 5 9
3 4 0
4 1 8
----------------df1--------------------------
0 7 8
1 7 3
3 4 0  # <- label 2 does not appear here, so the index is not continuous
4 1 8
----------------df2--------------------------
0 7 8
1 7 3
2 4 0
3 1 8
```
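The index behaviour is easier to verify with fixed values than with random ones. The sketch below uses a small hand-written frame (the values are illustrative only) and also checks that calling `reset_index(drop=True)` afterwards gives the same frame as `ignore_index=True`.

```python
import pandas as pd

# Fixed, illustrative values so the result is reproducible.
df = pd.DataFrame({'A': [7, 7, 5, 4, 1], 'B': [8, 3, 9, 0, 8]})

# Rows 0-1 and rows 3-4; row 2 is deliberately left out.
parts = [df[0:2], df[3:]]

kept = pd.concat(parts, ignore_index=False)       # original labels survive
renumbered = pd.concat(parts, ignore_index=True)  # labels rebuilt from 0

print(list(kept.index))        # [0, 1, 3, 4]
print(list(renumbered.index))  # [0, 1, 2, 3]

# Resetting the index afterwards yields the same frame as ignore_index=True.
same = kept.reset_index(drop=True).equals(renumbered)
print(same)  # True
```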
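1.1.2 The append function
df.append(df1, ignore_index=True)
Parameters: ignore_index=False keeps the original index labels; ignore_index=True renumbers the result from 0. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this call only works on older versions.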
import pandas as pd
import numpy as np
Create a two-dimensional array with five rows and two columns
df = pd.DataFrame(np.random.randint(0, 10, (5, 2)), columns=['A', 'B'])
Create the data to append
narry = np.random.randint(0, 10, (3, 2))
data_list = pd.DataFrame(narry, columns=['A', 'B'])
df1 = df.append(data_list, ignore_index=True)
* Result
```python
----------------df--------------------------
0 5 6
1 1 2
2 5 3
3 1 8
4 1 2
----------------df1--------------------------
0 5 6
1 1 2
2 5 3
3 1 8
4 1 2
5 8 1
6 3 5
7 1 1
```
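The same row-wise append can be written with `pd.concat`, which works on current pandas releases as well as older ones. A minimal sketch with fixed illustrative values in place of the random ones; the name `extra` is chosen here for the example:

```python
import pandas as pd

# Illustrative fixed data (not the random values shown above).
df = pd.DataFrame({'A': [5, 1, 5], 'B': [6, 2, 3]})
extra = pd.DataFrame({'A': [8, 3], 'B': [1, 5]})

# Plays the role of df.append(extra, ignore_index=True).
combined = pd.concat([df, extra], ignore_index=True)

print(combined.shape)        # (5, 2)
print(list(combined.index))  # [0, 1, 2, 3, 4]
```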
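1.2 Merging on fields
Combine the columns of two DataFrames that describe the same records, matching rows on one or more key columns
pd.merge(left, right, how="inner", on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=("_x", "_y"), copy=True, indicator=False, validate=None)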
import pandas as pd
df1 = pd.DataFrame({'key':['a','b','c'], 'data1':range(3)})
df2 = pd.DataFrame({'key':['a','b','c'], 'data2':range(3)})
df = pd.merge(df1, df2)  # by default, columns with the same name in both frames are used as the join key
* Result
```python
----------------df1--------------------------
key data1
0 a 0
1 b 1
2 c 2
----------------df2--------------------------
key data2
0 a 0
1 b 1
2 c 2
----------------df---------------------------
key data1 data2
0 a 0 0
1 b 1 1
2 c 2 2
```
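When the key columns are named differently in the two frames, the `left_on` and `right_on` parameters name them separately. A minimal sketch; the column names `key_l` and `key_r` are made up for illustration:

```python
import pandas as pd

left = pd.DataFrame({'key_l': ['a', 'b', 'c'], 'data1': range(3)})
right = pd.DataFrame({'key_r': ['a', 'b', 'd'], 'data2': range(3)})

# Inner join (the default): only keys present in both frames are kept.
merged = pd.merge(left, right, left_on='key_l', right_on='key_r')

print(list(merged.columns))   # ['key_l', 'data1', 'key_r', 'data2']
print(list(merged['key_l']))  # ['a', 'b']
```

Both key columns are retained in the result, so one of them can be dropped afterwards if it is redundant.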
For a multi-key join, pass the join keys as a list
right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'],
  'key2': ['one', 'one', 'one', 'two'],
  'lval': [4, 5, 6, 7]})
left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'],
  'key2': ['one', 'two', 'one'],
  'lval': [1, 2, 3]})
df = pd.merge(left, right, on=['key1', 'key2'], how='outer')
* Result
```python
----------------right-------------------------
key1 key2 lval
0 foo one 4
1 foo one 5
2 bar one 6
3 bar two 7
----------------left--------------------------
key1 key2 lval
0 foo one 1
1 foo two 2
2 bar one 3
----------------df---------------------------
key1 key2 lval_x lval_y
0 foo one 1.0 4.0
1 foo one 1.0 5.0
2 foo two 2.0 NaN
3 bar one 3.0 6.0
4 bar two NaN 7.0
```
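In an outer join it is often useful to know which side each row came from. The sketch below reuses the two frames from this example and adds `indicator=True`, which appends a `_merge` column, plus custom `suffixes` in place of the default `_x` and `_y`. The suffix strings are arbitrary choices for illustration.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'],
                     'key2': ['one', 'two', 'one'],
                     'lval': [1, 2, 3]})
right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'],
                      'key2': ['one', 'one', 'one', 'two'],
                      'lval': [4, 5, 6, 7]})

merged = pd.merge(left, right, on=['key1', 'key2'], how='outer',
                  suffixes=('_left', '_right'), indicator=True)

# Count rows by origin; the row order of the result is not relied on here.
counts = merged['_merge'].value_counts().to_dict()
print(counts)
```

Here three rows match on both sides, one exists only in `left` (foo, two) and one only in `right` (bar, two).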
data.drop_duplicates(subset=['A','B'],keep='first',inplace=True)
df = pd.DataFrame({
  'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
  'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
  'rating': [4, 4, 3.5, 15, 5]
})
df.drop_duplicates()
```python
---------------df before deduplication---------------------------
brand style rating
0 Yum Yum cup 4.0
1 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0
---------------df after deduplication---------------------------
brand style rating
0 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0
```
Use subset to drop rows that are duplicated in specific columns only
df.drop_duplicates(subset=['brand'])
brand style rating
0 Yum Yum cup 4.0
2 Indomie cup 3.5
Use keep='last' to drop duplicates while keeping the last occurrence
df.drop_duplicates(subset=['brand', 'style'], keep='last')
brand style rating
1 Yum Yum cup 4.0
2 Indomie cup 3.5
4 Indomie pack 5.0
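Two related calls are worth knowing alongside `keep='first'` and `keep='last'`: `duplicated()` returns the boolean mask that `drop_duplicates` acts on, and `keep=False` removes every member of a duplicated group. A minimal sketch on the same data; the variable names `mask` and `unique_only` are chosen here for the example:

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

# True for every row that repeats an earlier (brand, style) pair.
mask = df.duplicated(subset=['brand', 'style'])
print(list(mask))  # [False, True, False, False, True]

# keep=False drops all rows of a duplicated group, not only the repeats.
unique_only = df.drop_duplicates(subset=['brand', 'style'], keep=False)
print(list(unique_only.index))  # [2]
```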