The objects passed to the function are Series objects whose index is either the DataFrame's index (``axis=0``) or the DataFrame's columns (``axis=1``).
By default (``result_type=None``), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the ``result_type`` argument.
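To make the effect of ``result_type`` concrete, here is a minimal sketch. The sample DataFrame and the row function (which returns a two-element list) are assumptions made for illustration only; the behaviour shown is for a list-returning function applied with ``axis=1``.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]},
                  index=['a', 'b', 'c'])

# Default (result_type=None): each returned list becomes one element of a Series
as_series = df.apply(lambda row: [row.min(), row.max()], axis=1)

# result_type='expand': each returned list is spread across columns of a DataFrame
expanded = df.apply(lambda row: [row.min(), row.max()], axis=1,
                    result_type='expand')

print(type(as_series).__name__)   # Series
print(type(expanded).__name__)    # DataFrame
print(list(expanded.columns))     # [0, 1]
```

With the default, the lists stay packed inside a Series; with ``'expand'`` they are unpacked into columns labelled 0 and 1.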
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]},
                  index=['a', 'b', 'c'])
   A  B  C
a  1  4  7
b  2  5  8
c  3  6  9
# Apply the function to each column (axis=0, the default)
df.apply(lambda x: np.sum(x))
A     6
B    15
C    24
dtype: int64
# Apply the function to each row (axis=1)
df.apply(lambda x: np.sum(x), axis=1)
a    12
b    15
c    18
dtype: int64
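The statement that each object passed to the function is a Series indexed by the DataFrame's index (``axis=0``) or by its columns (``axis=1``) can be checked directly. The sketch below rebuilds the same sample DataFrame and simply joins the labels of whatever Series the function receives; the variable names are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]},
                  index=['a', 'b', 'c'])

# axis=0: the function receives one column at a time, indexed by the row labels
col_labels = df.apply(lambda x: ', '.join(x.index), axis=0)

# axis=1: the function receives one row at a time, indexed by the column labels
row_labels = df.apply(lambda x: ', '.join(x.index), axis=1)

print(col_labels['A'])  # a, b, c
print(row_labels['a'])  # A, B, C
```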
3.2 Using apply on a Series
s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])
London      20
New York    21
Helsinki    12
dtype: int64
# Define a function and pass it to apply to square each value.
def square(x):
    return x ** 2
s.apply(square)
London      400
New York    441
Helsinki    144
dtype: int64
# Pass an anonymous (lambda) function to apply
s.apply(lambda x: x ** 2)
London      400
New York    441
Helsinki    144
dtype: int64
# Define a custom function that needs an additional positional argument
# and pass that argument with the args keyword.
def subtract_custom_value(x, custom_value):
    return x - custom_value
s.apply(subtract_custom_value, args=(5,))
London      15
New York    16
Helsinki     7
dtype: int64
# Define a custom function that accepts keyword arguments
# and pass those arguments to apply.
def add_custom_values(x, **kwargs):
    for month in kwargs:
        x += kwargs[month]
    return x
s.apply(add_custom_values, june=30, july=20, august=25)
London      95
New York    96
Helsinki    87
dtype: int64
# Use a function from the NumPy library
s.apply(np.log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
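For simple element-wise arithmetic such as the squaring and logarithm examples above, the same values can usually be obtained by operating on the whole Series at once. The sketch below is only a side-by-side comparison under that assumption, not a claim about which form should be preferred in every case; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])

# Element-wise via apply
squared_apply = s.apply(lambda x: x ** 2)
log_apply = s.apply(np.log)

# Whole-Series (vectorized) equivalents
squared_vec = s ** 2
log_vec = np.log(s)

print(squared_apply.equals(squared_vec))  # True
print(np.allclose(log_apply, log_vec))    # True
```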
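3.3 Other examples
import pandas as pd
# Show all columns
pd.set_option('display.max_columns', None)
# Show all rows
pd.set_option('display.max_rows', None)
# Set the displayed width of a value to 100 characters (the default is 50)
pd.set_option('display.max_colwidth', 100)
# Module used to compute the difference between dates
import datetime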
def dataInterval(data1, data2):
    """
    Return the number of days from data2 to data1 (data1 - data2).

    Args:
        :param data1: date string in '%Y-%m-%d' format
        :param data2: date string in '%Y-%m-%d' format
        :return: delta days
    """
    d1 = datetime.datetime.strptime(data1, '%Y-%m-%d')
    d2 = datetime.datetime.strptime(data2, '%Y-%m-%d')
    delta = d1 - d2
    return delta.days
def getInterval(arrLike):
    """
    Args:
        :param arrLike: one row of the DataFrame (as passed by apply with axis=1)
        :return: delta days (PublishedTime - ReceivedTime)
    """
    PublishedTime = arrLike['PublishedTime']
    ReceivedTime = arrLike['ReceivedTime']
    days = dataInterval(PublishedTime.strip(), ReceivedTime.strip())
    return days
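def getInterval_new(arrLike, before, after):
    """
    Args:
        :param arrLike: one row of the DataFrame (as passed by apply with axis=1)
        :param before: name of the column holding the earlier date
        :param after: name of the column holding the later date
        :return: delta days (after - before)
    """
    before = arrLike[before]
    after = arrLike[after]
    days = dataInterval(after.strip(), before.strip())
    return days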
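A sketch of how getInterval_new might be called through apply: the functions are repeated in compact form so the block runs on its own, and the two-row DataFrame is made-up sample data (the column names ReceivedTime and PublishedTime are taken from getInterval above). The column names can be supplied either as keyword arguments or through args.

```python
import datetime
import pandas as pd

def dataInterval(data1, data2):
    # Days from data2 to data1; both are '%Y-%m-%d' strings
    d1 = datetime.datetime.strptime(data1, '%Y-%m-%d')
    d2 = datetime.datetime.strptime(data2, '%Y-%m-%d')
    return (d1 - d2).days

def getInterval_new(arrLike, before, after):
    # arrLike is one row; before/after are column names
    return dataInterval(arrLike[after].strip(), arrLike[before].strip())

# Hypothetical sample data (note the stray spaces handled by strip())
df = pd.DataFrame({
    'ReceivedTime': ['2019-01-01', ' 2019-02-10 '],
    'PublishedTime': ['2019-01-31', '2019-03-01'],
})

# Pass the extra parameters as keyword arguments ...
df['interval'] = df.apply(getInterval_new, axis=1,
                          before='ReceivedTime', after='PublishedTime')

# ... or as positional arguments via args
df['interval_args'] = df.apply(getInterval_new, axis=1,
                               args=('ReceivedTime', 'PublishedTime'))

print(df['interval'].tolist())  # [30, 19]
```

axis=1 is what makes apply hand each row to the function, so that arrLike[before] and arrLike[after] are single strings rather than whole columns.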