How do I add multiple columns to a pandas DataFrame in one assignment?

240 followers

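I'm new to pandas and am trying to figure out how to add multiple columns to a DataFrame at the same time. Any help would be appreciated. Ideally I would like to do this in one step rather than in several repeated steps...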

import numpy as np
import pandas as pd

df = {'col_1': [0, 1, 2, 3],
      'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(df)

df[['column_new_1', 'column_new_2', 'column_new_3']] = [np.nan, 'dogs', 3]  # thought this would work here...


1 comment

You need to say what error you got. When I try this on pandas 1.0, I get KeyError: "None of [Index(['column_new_1', 'column_new_2', 'column_new_3'], dtype='object')] are in the [columns]".


python, pandas, dataframe

runningbirds
Posted on 2016-08-20


12 answers

Matthias Fripp
Posted on 2022-05-17

Accepted

0 upvotes


          
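I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...), pandas requires that the right-hand side be a DataFrame (note that it doesn't actually matter whether the columns of that DataFrame have the same names as the columns you are creating).

Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...). So the solution is either to convert this into several single-column assignments, or to create a suitable DataFrame for the right-hand side.

Here are several approaches that will work: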
          
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

Then one of the following:
1) Three assignments in one, using list unpacking:

df['column_new_1'], df['column_new_2'], df['column_new_3'] = [np.nan, 'dogs', 3]

2) DataFrame conveniently expands a single row to match the index, so you can do this:

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)

3) Make a temporary data frame with new columns, then combine with the original data frame later:

df = pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3]],
            index=df.index,
            columns=['column_new_1', 'column_new_2', 'column_new_3']
        )
    ], axis=1
)

4) Similar to the previous, but using join instead of concat (may be less efficient):

df = df.join(pd.DataFrame(
    [[np.nan, 'dogs', 3]],
    index=df.index,
    columns=['column_new_1', 'column_new_2', 'column_new_3']
))

5) Using a dict is a more "natural" way to create the new data frame than the previous two, but the new columns will be sorted alphabetically (at least before Python 3.6 or 3.7):

df = df.join(pd.DataFrame(
    {
        'column_new_1': np.nan,
        'column_new_2': 'dogs',
        'column_new_3': 3
    }, index=df.index
))

6) Use .assign() with multiple column arguments. I like this variant on @zero's answer a lot, but like the previous one, the new columns will always be sorted alphabetically, at least with early versions of Python:

df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)

7) This is interesting (based on https://stackoverflow.com/a/44951376/3830997), but I don't know when it would be worth the trouble:

new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]
df = df.reindex(columns=df.columns.tolist() + new_cols)   # add empty cols
df[new_cols] = new_vals  # multi-column assignment works for existing cols

8) In the end it's hard to beat three separate assignments:

df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3
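Note: many of these options have already been covered in other answers: Add multiple columns to DataFrame and set them equal to an existing column, Is it possible to add several columns at once to a pandas DataFrame?, Add multiple empty columns to pandas DataFrame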
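As a quick sanity check, the sketch below rebuilds the example frame and compares what options 5, 6 and 8 produce. The helper name make_df is made up for this illustration, and the comparison assumes a Python version in which dicts and keyword arguments keep their insertion order (3.7 or later).

```python
import numpy as np
import pandas as pd


def make_df():
    # Hypothetical helper: rebuilds the two-column frame from the question.
    return pd.DataFrame({"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7]})


# Option 8: three separate single-column assignments.
separate = make_df()
separate["column_new_1"] = np.nan
separate["column_new_2"] = "dogs"
separate["column_new_3"] = 3

# Option 6: one .assign() call with keyword arguments.
assigned = make_df().assign(column_new_1=np.nan, column_new_2="dogs", column_new_3=3)

# Option 5: join a frame built from a dict of scalars on the same index.
base = make_df()
joined = base.join(
    pd.DataFrame(
        {"column_new_1": np.nan, "column_new_2": "dogs", "column_new_3": 3},
        index=base.index,
    )
)

# Report whether the three results match (NaN in the same place counts as equal).
print(separate.equals(assigned), separate.equals(joined))
```

If the three routes agree, the choice between them comes down to readability and to whether you want to modify df in place or build a new frame.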


           
            
             
              
               
                
                 
                  
                   
                    
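Doesn't approach #7 (.reindex) change the index of the DataFrame? Why would anyone want to change the index unnecessarily when adding columns, unless that is an explicit goal...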


           
            
             
              
               
                
                 
                  
                   
                    
Matthias Fripp: .reindex() is used with the columns argument, so it only changes the column "index" (the names). It does not change the row index.
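A small sketch of that point, assuming a frame with made-up row labels ("w" to "z") so that any change to the row index would be easy to spot:

```python
import pandas as pd

df = pd.DataFrame(
    {"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7]},
    index=["w", "x", "y", "z"],  # hypothetical row labels, for illustration only
)
new_cols = ["column_new_1", "column_new_2", "column_new_3"]

# Only the column labels are reindexed; the new columns are filled with NaN.
wider = df.reindex(columns=df.columns.tolist() + new_cols)

print(wider.index.equals(df.index))        # row labels are the same as before
print(wider.columns.tolist())              # old columns followed by the new ones
print(wider[new_cols].isna().all().all())  # every new cell is missing
```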


           
            
             
              
               
                
                 
                  
                   
                    
If you use the join option, make sure there are no duplicates in your index (or use reset_index first). It might save you a few hours of debugging.
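The following sketch shows the kind of surprise that comment warns about. The frame and its duplicated row label are invented for the illustration; the point is only that join matches rows by label, so duplicated labels pair up with each other.

```python
import numpy as np
import pandas as pd

# Row label 0 appears twice on purpose.
df = pd.DataFrame({"col_1": [10, 11, 12]}, index=[0, 0, 1])
extra = pd.DataFrame({"column_new_1": np.nan, "column_new_2": "dogs"}, index=df.index)

# Each row labelled 0 on the left matches both rows labelled 0 on the right.
joined = df.join(extra)
print(len(df), len(joined))

# Starting from a fresh, unique index avoids the extra rows.
clean = df.reset_index(drop=True)
extra_clean = pd.DataFrame(
    {"column_new_1": np.nan, "column_new_2": "dogs"}, index=clean.index
)
safe = clean.join(extra_clean)
print(len(safe))
```

Note that reset_index(drop=True) discards the original labels; keep them in a column first if they carry meaning.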


           
            
             
              
               
                
                 
                  
                   
                    
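questionto42standswithUkraine: @smci .assign() is certainly more flexible, but if you only have a few columns to add, they simply have to be put in the right order into a nested list of arrays, or into a DataFrame as in #2, and then assigned. I just don't see why you would split up every column assignment with .assign(). #2 is what I do in practice every time I assign columns: [code snippet elided], and the pd.DataFrame() of #2 is not even needed. There is no need to repeat df[] (#1), and no need for separate column assignments (#6).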


           
            
             
              
               
                
                 
                  
                   
                    
Some performance metrics would really make this post golden.
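For anyone who wants to measure this themselves, here is a minimal timing harness. It is only a sketch: the frame size, the repeat count and the three function names are arbitrary choices made for this illustration, and the numbers will differ between machines and pandas versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark frame; the size is an arbitrary choice.
base = pd.DataFrame({"col_1": np.arange(10_000), "col_2": np.arange(10_000)})


def separate_assignments():
    df = base.copy()
    df["column_new_1"] = np.nan
    df["column_new_2"] = "dogs"
    df["column_new_3"] = 3
    return df


def assign_kwargs():
    return base.assign(column_new_1=np.nan, column_new_2="dogs", column_new_3=3)


def concat_frame():
    extra = pd.DataFrame(
        {"column_new_1": np.nan, "column_new_2": "dogs", "column_new_3": 3},
        index=base.index,
    )
    return pd.concat([base, extra], axis=1)


candidates = [separate_assignments, assign_kwargs, concat_frame]
repeats = 100
timings = {fn.__name__: timeit.timeit(fn, number=repeats) / repeats for fn in candidates}

# Fastest first; the values are average seconds per call converted to microseconds.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:<22} {seconds * 1e6:10.1f} microseconds per call")
```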


          
           
            
             
              
               
                
                 
                  
                   
You can use assign with a dict of column names and values.

In [1069]: df.assign(**{'col_new_1': np.nan, 'col2_new_2': 'dogs', 'col3_new_3': 3})
Out[1069]:
   col_1  col_2 col2_new_2  col3_new_3  col_new_1
0      0      4       dogs           3        NaN
1      1      5       dogs           3        NaN
2      2      6       dogs           3        NaN
3      3      7       dogs           3        NaN


           
            
             
              
               
                
                 
                  
                   
                    
                     
Is there a way to keep a specific order of the columns?


           
            
             
              
               
                
                 
                  
                   
                    
                     
In earlier versions of Python, you can keep a specific order by calling assign multiple times: df.assign(**{'col_new_1': np.nan}).assign(**{'col2_new_2': 'dogs'}).assign(**{'col3_new_3': 3})


           
            
             
              
               
                
                 
                  
                   
                    
                     
Tobias Bergkvist: If the column names are strings that are valid variable names: df.assign(col_new_1=np.nan, col2_new_2='dogs', col3_new_3=3). This preserves the order.
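A short sketch of the ordering behaviour, assuming Python 3.6 or later together with pandas 0.23 or later, where keyword order is kept. The column names zebra, mango and apple are invented so that alphabetical sorting would be easy to notice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7]})

# Deliberately non-alphabetical names, passed as keyword arguments.
by_kwargs = df.assign(zebra=np.nan, mango="dogs", apple=3)

# The same columns passed through dict unpacking.
by_dict = df.assign(**{"zebra": np.nan, "mango": "dogs", "apple": 3})

print(by_kwargs.columns.tolist())
print(by_dict.columns.tolist())
```

On such versions both lists should end in zebra, mango, apple, that is, in the order written rather than sorted.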


           
            
             
              
               
                
                 
                  
                   
                    
                     
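paradocslover: What if the values np.nan, dogs and 3 are obtained from a single operation? With this approach that would require three operations. Is there a way to use assign and do it in just one operation? @Zero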


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                     
Matt Harrison
Posted on 2022-05-17


          
           
            
             
              
               
                
                 
                  
                   
                    
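My goal when writing pandas is to write efficient, readable code that can be chained. I won't go into why I like chaining so much here; I expound on that in my book, Effective Pandas.

I often want to add new columns in a succinct manner that also allows me to chain. My general rule is that I update or create columns using the .assign method.

To answer your question, I would use the following code:

(df
 .assign(column_new_1=np.nan,
         column_new_2='dogs',
         column_new_3=3)
)

To go a little further: I often have a DataFrame with new columns that I want to add to my DataFrame. Let's assume it looks like... a DataFrame with the three columns you want:

df2 = pd.DataFrame({'column_new_1': np.nan,
                    'column_new_2': 'dogs',
                    'column_new_3': 3},
                   index=df.index)

In this case, I would write the following code:

(df
 .assign(**df2)
)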
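A runnable sketch of the same idea, under the assumption that the new columns come out of a single computation. The derived columns col_sum and col_1_squared are invented for this example, and df2 is reversed on purpose to show that the values are matched to df by row label, not by position.

```python
import pandas as pd

df = pd.DataFrame({"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7]})

# Hypothetical derived columns, computed in one step and stored in reverse row order.
df2 = pd.DataFrame(
    {"col_sum": df["col_1"] + df["col_2"], "col_1_squared": df["col_1"] ** 2}
).iloc[::-1]

result = (
    df
    .assign(**df2)         # one keyword argument per column of df2
    .query("col_sum > 4")  # the chain can simply continue
)
print(result)
```

Unpacking with ** needs string column names in df2, since each name becomes a keyword argument.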


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                      
Nehal J Wani
Posted on 2022-05-17


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
With the use of concat:

In [128]: df
Out[128]: 
   col_1  col_2
0      0      4
1      1      5
2      2      6
3      3      7
In [129]: pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
Out[129]: 
   col_1  col_2 column_new_1 column_new_2 column_new_3
0    0.0    4.0          NaN          NaN          NaN
1    1.0    5.0          NaN          NaN          NaN
2    2.0    6.0          NaN          NaN          NaN
3    3.0    7.0          NaN          NaN          NaN
Not very clear about what you want to do with [np.nan, 'dogs', 3]. Maybe set them as default values now?
In [142]: df1 = pd.concat([df, pd.DataFrame(columns = [ 'column_new_1', 'column_new_2','column_new_3'])])
In [143]: df1[[ 'column_new_1', 'column_new_2','column_new_3']] = [np.nan, 'dogs', 3]
In [144]: df1
Out[144]: 
   col_1  col_2  column_new_1 column_new_2  column_new_3
0    0.0    4.0           NaN         dogs             3
1    1.0    5.0           NaN         dogs             3
2    2.0    6.0           NaN         dogs             3
3    3.0    7.0           NaN         dogs             3


           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
runningbirds: If only there were a way to do your second part in one step. Yes, constant values in the columns, as an example.


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
Use a list comprehension, pd.DataFrame and pd.concat.

pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3] for _ in range(df.shape[0])],
            df.index, ['column_new_1', 'column_new_2', 'column_new_3']
        )
    ], axis=1)


           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
                         
Note that concat will produce a new DataFrame rather than adding columns to the existing one.
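A minimal sketch of what that means in practice, using the example frame from the question. The names extra and combined are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7]})
extra = pd.DataFrame(
    {"column_new_1": np.nan, "column_new_2": "dogs", "column_new_3": 3},
    index=df.index,
)

combined = pd.concat([df, extra], axis=1)

# The original frame is untouched; only the returned object has the new columns.
original_columns = df.columns.tolist()
print(original_columns)
print(combined.columns.tolist())
print(combined is df)
```

To keep working under the old name, rebind it explicitly with df = combined.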


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
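If you are adding many missing columns (a, b, c, ...) with the same value, here 0, this is what I did:

new_cols = ["a", "b", "c"]
df[new_cols] = pd.DataFrame([[0] * len(new_cols)], index=df.index)

This is based on the second variant of the accepted answer.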


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
                         
                          
                          
halfmoonhalf
Posted on 2022-05-17


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
                         
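Just want to point out that option 2 in @Matthias Fripp's answer

(2) I wouldn't necessarily expect DataFrame to work this way, but it does

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)

is already documented in pandas' own documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics

You can pass a list of columns to [] to select columns in that order. If a column is not contained in the DataFrame, an exception will be raised. Multiple columns can also be set in this manner. You may find this useful for applying a transform (in-place) to a subset of the columns.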
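The last sentence of that quote can be illustrated with a short sketch. The extra label column and the factor of 10 are invented for the example; the assignment targets columns that already exist, which is the case the documentation describes.

```python
import pandas as pd

df = pd.DataFrame(
    {"col_1": [0, 1, 2, 3], "col_2": [4, 5, 6, 7], "label": list("abcd")}
)

subset = ["col_1", "col_2"]

# Both listed columns already exist, so the list-style assignment is allowed.
df[subset] = df[subset] * 10

print(df)
```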


           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
                         
                          
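Matthias Fripp: I think that's pretty standard for multi-column assignment. What surprised me was that pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index) replicates the single row it is given to create a whole DataFrame of the same length as the index.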


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
                         
Dictionary mapping with .assign().

This is the most readable and dynamic way to assign values to new columns when working with many of them.

import pandas as pd
import numpy as np
new_cols = ["column_new_1", "column_new_2", "column_new_3"]
new_vals = [np.nan, "dogs", 3]
# Map new columns as keys and new values as values
col_val_mapping = dict(zip(new_cols, new_vals))
# Unpack new column/new value pairs and assign them to the data frame
df = df.assign(**col_val_mapping)
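If you just want to initialize the new columns as empty, either because you don't know what the values will be or because you have many new columns: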
import pandas as pd
import numpy as np
new_cols = ["column_new_1", "column_new_2", "column_new_3"]
new_vals = [None for item in new_cols]
# Map new columns as keys and new values as values
col_val_mapping = dict(zip(new_cols, new_vals))
# Unpack new column/new value pairs and assign them to the data frame
df = df.assign(**col_val_mapping)


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
                         
                          
                           
                            
                            
Markus Dutschke
Posted on 2022-05-17


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
                         
                          
                           
If you just want to add empty new columns, reindex will do the job:

   col_1  col_2
0      0      4
1      1      5
2      2      6
3      3      7
df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
   col_1  col_2  column_new_1  column_new_2  column_new_3
0      0      4           NaN           NaN           NaN
1      1      5           NaN           NaN           NaN
2      2      6           NaN           NaN           NaN
3      3      7           NaN           NaN           NaN
                           
Full code example:

import numpy as np
import pandas as pd
df = {'col_1': [0, 1, 2, 3],
        'col_2': [4, 5, 6, 7]}
df = pd.DataFrame(df)
print('df',df, sep='\n')
print()
df=df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)
print('''df.reindex(list(df)+['column_new_1', 'column_new_2','column_new_3'], axis=1)''',df, sep='\n')
Otherwise, go for Zero's answer with assign.


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
                         
                          
                           
                            
I'm not comfortable using "Index" and so on... it could be done as below:

df.columns
Index(['A123', 'B123'], dtype='object')
df=pd.concat([df,pd.DataFrame(columns=list('CDE'))])
df.rename(columns={
    'C':'C123',
    'D':'D123',
    'E':'E123'
},inplace=True)
df.columns
Index(['A123', 'B123', 'C123', 'D123', 'E123'], dtype='object')


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
                         
                          
                           
                            
                             
                              You could instantiate the values from a dictionary if you wanted different values for each column & you don't mind making a dictionary on the line before.
                             
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({
...   'col_1': [0, 1, 2, 3],
...   'col_2': [4, 5, 6, 7]
... })
>>> df
   col_1  col_2
0      0      4
1      1      5
2      2      6
3      3      7
>>> cols = {
...   'column_new_1': np.nan,
...   'column_new_2': 'dogs',
...   'column_new_3': 3
... }
>>> df[list(cols)] = pd.DataFrame(data={k: [v] * len(df) for k, v in cols.items()})
>>> df
   col_1  col_2  column_new_1 column_new_2  column_new_3
0      0      4           NaN         dogs             3
1      1      5           NaN         dogs             3
2      2      6           NaN         dogs             3
3      3      7           NaN         dogs             3
Not necessarily better than the accepted answer, but it's another approach that hasn't been listed yet.


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
                         
                          
                           
                            
                             
                              
                               
                               
miriam mazzeo
Posted on 2022-05-17


          
           
            
             
              
               
                
                 
                  
                   
                    
                     
                      
                       
                        
                         
                          
                           
                            
                             
import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})
df['col_3'], df['col_4'] = [df.col_1] * 2
col_1   col_2   col_3   col_4
