Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.

Learn more about Collectives

Teams

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Learn more about Teams

With the nice indexing methods in Pandas I have no problems extracting data in various ways. On the other hand I am still confused about how to change data in an existing DataFrame.

In the following code I have two DataFrames and my goal is to update values in a specific row in the first df from values of the second df. How can I achieve this?

import pandas as pd
df = pd.DataFrame({'filename' :  ['test0.dat', 'test2.dat'], 
                                  'm': [12, 13], 'n' : [None, None]})
df2 = pd.DataFrame({'filename' :  'test2.dat', 'n':16}, index=[0])
# this overwrites the first row but we want to update the second
# df.update(df2)
# this does not update anything
df.loc[df.filename == 'test2.dat'].update(df2)
print(df)

gives

   filename   m     n
0  test0.dat  12  None
1  test2.dat  13  None
[2 rows x 3 columns]

but how can I achieve this:

    filename   m     n
0  test0.dat  12  None
1  test2.dat  13  16
[2 rows x 3 columns]

So first of all, pandas updates using the index. When an update command does not update anything, check both left-hand side and right-hand side. If you don't update the indices to follow your identification logic, you can do something along the lines of

>>> df.loc[df.filename == 'test2.dat', 'n'] = df2[df2.filename == 'test2.dat'].loc[0]['n']
Out[331]: 
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16

If you want to do this for the whole table, I suggest a method I believe is superior to the previously mentioned ones: since your identifier is filename, set filename as your index, and then use update() as you wanted to. Both merge and the apply() approach contain unnecessary overhead:

>>> df.set_index('filename', inplace=True)
>>> df2.set_index('filename', inplace=True)
>>> df.update(df2)
Out[292]: 
            m     n
filename           
test0.dat  12  None
test2.dat  13    16
                Currently, update() has some bugs. It will not preserve dtypes, and it may lose some data.
– Molin.L
                Mar 10, 2021 at 9:58

In SQL, I would have do it in one shot as

update table1 set col1 = new_value where col1 = old_value

but in Python Pandas, we could just do this:

data = [['ram', 10], ['sam', 15], ['tam', 15]] 
kids = pd.DataFrame(data, columns = ['Name', 'Age']) 

which will generate the following output :

    Name    Age
0   ram     10
1   sam     15
2   tam     15

now we can run:

kids.loc[kids.Age == 15,'Age'] = 17

which will show the following output

Name    Age
0   ram     10
1   sam     17
2   tam     17

which should be equivalent to the following SQL

update kids set age = 17 where age = 15

If you have one large dataframe and only a few update values I would use apply like this:

import pandas as pd
df = pd.DataFrame({'filename' :  ['test0.dat', 'test2.dat'], 
                                  'm': [12, 13], 'n' : [None, None]})
data = {'filename' :  'test2.dat', 'n':16}
def update_vals(row, data=data):
    if row.filename == data['filename']:
        row.n = data['n']
    return row
df.apply(update_vals, axis=1)
                This case is invalid as the row in the apply function will not be related to the dataframe and as such it will not be updated stackoverflow.com/questions/54432583/…
– user495732 Why Me
                Feb 8, 2019 at 22:19

Update null elements with value in the same location in other. Combines a DataFrame with other DataFrame using func to element-wise combine columns. The row and column indexes of the resulting DataFrame will be the union of the two.

df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0

more information in this link

You seem to have simply copied the example from combine_first. How should OP adapt this to make sure it works for their purpose? Shouldn't OP take care of the index to make sure the correct one gets updated? – Teepeemm Nov 9, 2020 at 16:57 In this case df2 prevails over df1. I have just copied because is a function created to solve situations similar to this question. – Pablo Vilas Nov 10, 2020 at 9:26

There are probably a few ways to do this, but one approach would be to merge the two dataframes together on the filename/m column, then populate the column 'n' from the right dataframe if a match was found. The n_x, n_y in the code refer to the left/right dataframes in the merge.

In[100] : df = pd.merge(df1, df2, how='left', on=['filename','m'])
In[101] : df
Out[101]: 
    filename   m   n_x  n_y
0  test0.dat  12  None  NaN
1  test2.dat  13  None   16
In[102] : df['n'] = df['n_y'].fillna(df['n_x'])
In[103] : df = df.drop(['n_x','n_y'], axis=1)
In[104] : df
Out[104]: 
    filename   m     n
0  test0.dat  12  None
1  test2.dat  13    16

I needed to update and add suffix to few rows of the dataframe on conditional basis based on the another column's value of the same dataframe -

df with column Feature and Entity and need to update Entity based on specific feature type

df.loc[df.Feature == 'dnb', 'Entity'] = 'duns_' + df.loc[df.Feature == 'dnb','Entity']
        

Thanks for contributing an answer to Stack Overflow!

  • Please be sure to answer the question. Provide details and share your research!

But avoid

  • Asking for help, clarification, or responding to other answers.
  • Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.