python - Intersection of two or more DataFrame columns


I am trying to find the intersection of the columns of three DataFrames, but np.intersect1d does not work with three of them.

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('BCDE'))
df3 = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('CDEF'))
inclusive_list = np.intersect1d(df1.columns, df2.columns, df3.columns)
Error: 
ValueError: The truth value of a Index is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The inclusive_list should only include column names C & D. Any help would be appreciated. Thank you. 
                Not going to close this as a duplicate because the error makes it a bit different, but the question is very similar: stackoverflow.com/questions/48539195/…
– user3483203
                Jan 9, 2019 at 16:25
Why your current approach doesn't work:
np.intersect1d does not take N arrays; it compares exactly two.
  numpy.intersect1d(ar1, ar2, assume_unique=False, return_indices=False)
From the signature you can see that your third array is passed as the assume_unique parameter. The function evaluates that parameter as a single boolean, and an array has no single truth value, so you get a ValueError.
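A small sketch of the two cases, assuming deterministic stand-in DataFrames (zeros instead of the question's random data, since only the column labels matter). It shows the intended two-array call and reproduces the error when a third array lands in assume_unique.

```python
import numpy as np
import pandas as pd

# Stand-ins for the question's DataFrames; only the column labels matter,
# so the cell values are just zeros.
df1 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("ABCD"))
df2 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("BCDE"))
df3 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("CDEF"))

# Two arrays: the call np.intersect1d is designed for.
# The result is a sorted array of the unique shared labels.
common_12 = np.intersect1d(df1.columns, df2.columns).tolist()
print(common_12)  # ['B', 'C', 'D']

# With a third positional argument, df3.columns is bound to assume_unique.
# The function tests that parameter for truthiness, and a multi-element
# Index has no single truth value, hence the ValueError from the question.
raised = False
try:
    np.intersect1d(df1.columns, df2.columns, df3.columns)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```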
You can extend the functionality of intersect1d to work on N arrays using functools.reduce:
from functools import reduce
reduce(np.intersect1d, (df1.columns, df2.columns, df3.columns))
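The reduce call can be wrapped in a small helper that accepts any number of DataFrames. This is a sketch; common_columns is a hypothetical name, not a pandas or NumPy function. Each np.intersect1d call still receives exactly two arrays: the running result and the next Index.

```python
from functools import reduce

import numpy as np
import pandas as pd


def common_columns(*frames):
    """Hypothetical helper: sorted column labels shared by all frames.

    Folds np.intersect1d pairwise over the column Indexes. Intended for two
    or more DataFrames; with a single frame, reduce returns its columns
    unchanged (and unsorted) without calling np.intersect1d.
    """
    return reduce(np.intersect1d, (f.columns for f in frames)).tolist()


# Stand-ins with the question's column layout (values are irrelevant).
df1 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("ABCD"))
df2 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("BCDE"))
df3 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("CDEF"))

print(common_columns(df1, df2, df3))  # ['C', 'D']
```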
A better approach
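However, the easiest approach is to use the intersection method of the Index objects directly:
df1.columns.intersection(df2.columns).intersection(df3.columns)
(Older answers write this as df1.columns & df2.columns & df3.columns. Using & as a set operation on an Index was deprecated in pandas 1.x, and since pandas 2.0 & on an Index is a logical operation rather than a set intersection, so prefer the named method.)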
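A minimal sketch of the Index-based approach using the named method Index.intersection, assuming the same column layout as in the question. The second part folds the method over an arbitrary list of DataFrames with functools.reduce.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Stand-ins with the question's column layout (values are irrelevant).
df1 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("ABCD"))
df2 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("BCDE"))
df3 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("CDEF"))

# Method form of the set intersection; each step returns a new Index.
inclusive = df1.columns.intersection(df2.columns).intersection(df3.columns)
print(inclusive.tolist())  # ['C', 'D']

# For any number of DataFrames, fold Index.intersection with reduce.
frames = [df1, df2, df3]
inclusive_any = reduce(pd.Index.intersection, (f.columns for f in frames))
print(inclusive_any.tolist())  # ['C', 'D']
```

Unlike np.intersect1d, the result stays a pandas Index, so it can be used directly to select columns, e.g. df1[inclusive].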
You can use concat:
pd.concat([df1.head(1),df2.head(1),df3.head(1)],join='inner').columns
Out[81]: Index(['C', 'D'], dtype='object')
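The concat trick works because join controls how the column labels of the inputs are combined. A sketch contrasting the two settings, again assuming stand-in DataFrames with the question's columns; head(1) only keeps the concatenation cheap, since just the labels are needed.

```python
import numpy as np
import pandas as pd

# Stand-ins with the question's column layout (values are irrelevant).
df1 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("ABCD"))
df2 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("BCDE"))
df3 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("CDEF"))
frames = [df1, df2, df3]

# join="inner" keeps only the column labels present in every frame.
inner_cols = pd.concat([f.head(1) for f in frames], join="inner").columns
print(inner_cols.tolist())  # ['C', 'D']

# The default join="outer" keeps the union of the labels instead
# (missing cells are filled with NaN).
outer_cols = pd.concat([f.head(1) for f in frames], join="outer").columns
print(sorted(outer_cols))  # ['A', 'B', 'C', 'D', 'E', 'F']
```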
Note that the arguments passed to np.intersect1d (https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.intersect1d.html) are expected to be two arrays (ar1 and ar2).
Passing three arrays means that the assume_unique parameter receives an array, although it expects a bool.
You can also use Python's built-in set methods if you don't want to use NumPy:
inclusive_list = set(df1.columns).intersection(set(df2.columns)).intersection(set(df3.columns))
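A sketch of a more compact set-based variant, assuming a list of DataFrames with the question's columns. set.intersection accepts any number of iterables, so each Index can be passed as-is. Because a set has no defined order, the last step restores the column order of the first DataFrame.

```python
import numpy as np
import pandas as pd

# Stand-ins with the question's column layout (values are irrelevant).
df1 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("ABCD"))
df2 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("BCDE"))
df3 = pd.DataFrame(np.zeros((2, 4), dtype=int), columns=list("CDEF"))
frames = [df1, df2, df3]

# set.intersection takes several iterables at once; no set() wrapping needed.
common = set(frames[0].columns).intersection(*(f.columns for f in frames[1:]))

# Recover a deterministic order: the column order of the first frame.
inclusive_list = [c for c in frames[0].columns if c in common]
print(inclusive_list)  # ['C', 'D']
```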
                You don't need the conversion to set.  An Index object is already implemented as an "ordered, sliceable set", and you can use those operations on it already.
– user3483203
                Jan 9, 2019 at 16:27