[Python] Merging DataFrames: "first argument must be an iterable of pandas objects"

This post introduces two functions, pandas.merge and pandas.concat, and also covers DataFrame.join.

1. merge

merge combines two tables by matching rows on key columns; pay attention to how its parameters are set.
The function's parameters:
merge(
    left,
    right,
    how="inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes=("_x", "_y"),
    copy=True,
    indicator=False,
    validate=None,
)
Parameter details:
An explanation of inner, left, right, and outer:
Reference: https://blog.csdn.net/trayvontang/article/details/103787648
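As a rough illustration of the four how values, the sketch below merges two small frames and records which keys survive each join type. The frames left and right and their columns (key, l_val, r_val) are made up for this example.

```python
import pandas as pd

# Hypothetical sample frames: keys 2 and 3 appear in both, 1 only on the
# left, 4 only on the right.
left = pd.DataFrame({"key": [1, 2, 3], "l_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "r_val": ["x", "y", "z"]})

# Collect the key column produced by each join type.
keys_by_how = {
    how: pd.merge(left, right, on="key", how=how)["key"].tolist()
    for how in ("inner", "left", "right", "outer")
}

for how, keys in keys_by_how.items():
    print(how, keys)
```

inner keeps only the keys present in both frames, left and right keep every key of the corresponding frame, and outer keeps the union of the keys.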
A common pitfall: the merged result comes back empty.
import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4]})
b = pd.DataFrame({'a': [11, 22, 33], 'c': [22, 33, 44]})
c = pd.merge(a, b)  # defaults: how='inner', joined on the shared column 'a'
print(c)
Output:
Empty DataFrame
Columns: [a, b, c]
Index: []
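The two frames were merged on their shared column a, yet no rows came back. This tells us two things: the default join type is an inner join, and by default merge uses the columns that have the same name in both frames as the join key. When no key values match, the result is empty.
To keep the unmatched rows, set the join type explicitly (here an outer join on a):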
c = pd.merge(a, b, how='outer', on='a')
print(c)
Output:
    a    b     c
0   1  2.0   NaN
1   2  3.0   NaN
2   3  4.0   NaN
3  11  NaN  22.0
4  22  NaN  33.0
5  33  NaN  44.0
Reference: https://blog.csdn.net/youyoujbd/article/details/88930961
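When a merge returns fewer rows than expected, the indicator parameter from the signature above can help show why. The following is a minimal sketch that reuses the frames a and b from the example; the variable names checked, counts, and n_matched are illustrative only.

```python
import pandas as pd

a = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})
b = pd.DataFrame({"a": [11, 22, 33], "c": [22, 33, 44]})

# indicator=True adds a "_merge" column that records where each row came
# from: "left_only", "right_only", or "both".
checked = pd.merge(a, b, how="outer", on="a", indicator=True)

counts = checked["_merge"].value_counts()
print(counts)

# Rows marked "both" are the ones an inner join would have kept.
n_matched = int((checked["_merge"] == "both").sum())
print("rows an inner join would keep:", n_matched)
```

Here no row is marked "both", which is why the default inner join produced an empty DataFrame.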
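2. concat
concat means "concatenate": the two tables are joined directly, end to end.
Unlike merge, concat is a true concatenation: it glues tables a and b together without matching on key values. By default it keeps the union of the columns (join='outer'). Parameters let you change the join mode and the direction of concatenation (axis), and the index can be reset afterwards.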
a=pd.DataFrame({'a':[1,2,3],'b':[2,3,4]})
b=pd.DataFrame({'a':[11,22,33],'c':[22,33,44]})
pd.concat([a,b],axis=1)
   a  b   a   c
0  1  2  11  22
1  2  3  22  33
2  3  4  33  44
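Concatenating along axis=1 leaves two columns that are both named a, so selecting by label returns both of them. One possible way to keep them apart, sketched here, is the keys parameter of pd.concat; the labels "left" and "right" are made up for illustration.

```python
import pandas as pd

a = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})
b = pd.DataFrame({"a": [11, 22, 33], "c": [22, 33, 44]})

# Without keys, the label "a" matches two columns at once.
plain = pd.concat([a, b], axis=1)
print(plain["a"].shape)

# With keys, each source frame gets its own top-level column label.
side_by_side = pd.concat([a, b], axis=1, keys=["left", "right"])
print(side_by_side.columns.tolist())

# A (source, column) tuple now selects exactly one column.
right_a = side_by_side[("right", "a")]
print(right_a.tolist())
```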
a=pd.DataFrame({'a':[1,2,3],'b':[2,3,4]})
b=pd.DataFrame({'a':[11,22,33],'c':[22,33,44]})
pd.concat([a,b],join='inner')
    a
0   1
1   2
2   3
0  11
1  22
2  33
a=pd.DataFrame({'a':[1,2,3],'b':[2,3,4]})
b=pd.DataFrame({'a':[1,2,3],'b':[22,33,44]})
pd.concat([a,b])
   a   b
0  1   2
1  2   3
2  3   4
0  1  22
1  2  33
2  3  44
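NB: existing data is not overwritten; the rows of the second table are simply appended below the first, and the original index labels are kept. The index can be renumbered afterwards: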
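a = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4]})
b = pd.DataFrame({'a': [11, 22, 33], 'c': [22, 33, 44]})
d = pd.concat([a, b])
d.index = list(range(0, 6))  # replace the repeated labels 0,1,2 with 0..5
print(d)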
   a    b     c
0   1  2.0   NaN
1   2  3.0   NaN
2   3  4.0   NaN
3  11  NaN  22.0
4  22  NaN  33.0
5  33  NaN  44.0
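Instead of assigning a new index by hand, pd.concat can renumber the rows itself through its ignore_index parameter. A minimal sketch, using the same frames a and b (the names d_manual and d_auto are illustrative):

```python
import pandas as pd

a = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4]})
b = pd.DataFrame({"a": [11, 22, 33], "c": [22, 33, 44]})

# Manual approach from the text: concatenate, then overwrite the index.
d_manual = pd.concat([a, b])
d_manual.index = list(range(0, 6))

# ignore_index=True discards the original labels and numbers rows 0..n-1.
d_auto = pd.concat([a, b], ignore_index=True)

print(d_auto)
```

This avoids hard-coding the row count (6 here), which would otherwise have to change whenever the inputs do.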
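A common error message:
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
The cause is that pandas.concat(a,b) passes the frames as separate arguments, whereas concat expects them wrapped in a list (or another iterable) as its first argument. Changing the call to pandas.concat([a,b]) makes the concatenation succeed.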
a = pd.DataFrame()
b = pd.DataFrame()
c = pd.concat(a, b)    # errors out:
# TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
c = pd.concat([a, b])  # works
Reference: https://stackoverflow.com/questions/39534676/typeerror-first-argument-must-be-an-iterable-of-pandas-objects-you-passed-an-o
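The failure and the fix can be reproduced side by side. This is only a sketch with two made-up one-column frames; the exact wording of the TypeError may differ between pandas versions, so the code catches the exception type rather than matching the message.

```python
import pandas as pd

a = pd.DataFrame({"a": [1, 2]})
b = pd.DataFrame({"a": [3, 4]})

# Wrong: the frames are passed as separate positional arguments.
try:
    pd.concat(a, b)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Right: the frames are wrapped in a single iterable (a list here;
# a tuple works as well).
c = pd.concat([a, b])
c_from_tuple = pd.concat((a, b))
print(c)
```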
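3. The join method
A DataFrame also has its own join method, which covers some joining tasks; by default it aligns the two tables on their index.
 Parameters:
 df.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)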
df3 = pd.DataFrame({'Red': [1, 3, 5], 'Green': [5, 0, 3]}, index=list('abd'))
print(df3)
df4 = pd.DataFrame({'Blue': [1, 9], 'Yellow': [6, 6]}, index=list('ce'))
print(df4)
df3.join(df4)
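Result: the default is a left join, so only the index labels of df3 (a, b, d) are kept. None of them appear in df4, so Blue and Yellow are all NaN.
Example 2: using the parameter how='outer'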
df3.join(df4,how='outer') 
Result: the index is the union of both indexes (a, b, c, d, e).
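The difference between the default left join and how='outer' can be checked without reading the printed tables. The sketch below rebuilds df3 and df4 and inspects the resulting index and missing values; the names left_joined and outer_joined are illustrative.

```python
import pandas as pd

df3 = pd.DataFrame({"Red": [1, 3, 5], "Green": [5, 0, 3]}, index=list("abd"))
df4 = pd.DataFrame({"Blue": [1, 9], "Yellow": [6, 6]}, index=list("ce"))

# Default how='left': keep df3's index; df4 has no matching labels,
# so its columns come back entirely NaN.
left_joined = df3.join(df4)
print(left_joined.index.tolist())
print(left_joined["Blue"].isna().all())

# how='outer': keep every label that occurs in either index.
outer_joined = df3.join(df4, how="outer")
print(sorted(outer_joined.index))
```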
Example 3: joining several objects at once
df3=pd.DataFrame({'Red':[1,3,5],'Green':[5,0,3]},index=list('abd'))
print(df3)
df4=pd.DataFrame({'Blue':[1,9],'Yellow':[6,6]},index=list('ce'))
print(df4)
df5=pd.DataFrame({'Brown':[3,4,5],'White':[1,1,2]},index=list('aed'))
print(df3.join([df4,df5])) 
Result: the default left join keeps df3's index (a, b, d) and adds the columns of df4 and df5.
df3=pd.DataFrame({'Red':[1,3,5],'Green':[5,0,3]},index=list('abd'))
print(df3)
df4=pd.DataFrame({'Blue':[1,9],'Yellow':[6,6]},index=list('ce'))
print(df4)
df5=pd.DataFrame({'Brown':[3,4,5],'White':[1,1,2]},index=list('aed'))
print(df5)
print(df3.join([df4,df5],how='outer')) 
Result: with how='outer' the index is the union of all three indexes (a, b, c, d, e).
Reference: https://blog.csdn.net/weixin_38168620/article/details/80659154
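For a single other DataFrame, join can be thought of as an index-based merge. The sketch below compares df3.join(df4, how='outer') with the corresponding pd.merge call that uses left_index and right_index from the merge signature shown earlier; it is an illustration of the relationship, not a statement about every parameter combination.

```python
import pandas as pd

df3 = pd.DataFrame({"Red": [1, 3, 5], "Green": [5, 0, 3]}, index=list("abd"))
df4 = pd.DataFrame({"Blue": [1, 9], "Yellow": [6, 6]}, index=list("ce"))

# join aligns on the index of both frames.
joined = df3.join(df4, how="outer")

# The same alignment expressed with merge: use the index on both sides.
merged = pd.merge(df3, df4, left_index=True, right_index=True, how="outer")

same = joined.equals(merged)
print(same)
```

join is the shorter spelling when the keys live in the index, while merge is the more general tool when the keys are ordinary columns.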