我正在分析时间序列数据,想提取5个主要的频率成分,作为训练机器学习模型的特征。我的数据集是
921 x 10080
。每一行都是一个时间序列,总共有921个。
在探索可能的方法时,我遇到了各种函数,包括
numpy.fft.fft
、
numpy.fft.fftfreq
和
DFT
。我的问题是,这些函数对数据集做了什么,这些函数之间有什么区别?
对于
Numpy.fft.fft
,Numpy文档指出。
Compute the one-dimensional discrete Fourier Transform.
This function computes the one-dimensional n-point discrete Fourier Transform (DFT) with the efficient Fast Fourier Transform (FFT) algorithm [CT].
While for numpy.fft.fftfreq
:
numpy.fft.fftfreq(n, d=1.0)
Return the Discrete Fourier Transform sample frequencies.
The returned float array f contains the frequency bin centers in cycles per unit of the sample spacing (with zero at the start). For instance, if the sample spacing is in seconds, then the frequency unit is cycles/second.
但这对我来说并不重要,可能是因为我没有信号处理的背景知识。对于我的情况,即提取数据集每一行的前5个主要频率和振幅成分,我应该使用哪个函数?谢谢
使用fft
返回结果如下。我的意图是要获得每个时间序列的前5个频率和振幅值,但它们是频率成分吗?
Here's the code:
def get_fft_values(y_values, T, N, f_s):
f_values = np.linspace(0.0, 1.0/(2.0*T), N//2)
fft_values_ = rfft(y_values)
fft_values = 2.0/N * np.abs(fft_values_[0:N//2])
return f_values[0:5], fft_values[0:5] #f_values - frequency(length = 5040) ; fft_values - amplitude (length = 5040)
t_n = 1
N = 10080
T = t_n / N
f_s = 1/T
result = pd.DataFrame(df.apply(lambda x: get_fft_values(x, T, N, f_s), axis =1))
result
0 ([0.0, 1.000198452073824, 2.000396904147648, 3.0005953562214724, 4.000793808295296], [52.91299603174603, 1.2744877093061115, 2.47064631896607, 1.4657299825335832, 1.9362280837538701])
1 ([0.0, 1.000198452073824, 2.000396904147648, 3.0005953562214724, 4.000793808295296], [57.50430555555556, 4.126212552498241, 2.045294347349226, 0.7878668631936439, 2.6093502232989976])
2 ([0.0, 1.000198452073824, 2.000396904147648, 3.0005953562214724, 4.000793808295296], [52.05765873015873, 0.7214089616631307, 1.8547819994826562, 1.3859749465142301, 1.1848485830307878])
3 ([0.0, 1.000198452073824, 2.000396904147648, 3.0005953562214724, 4.000793808295296], [53.68928571428572, 0.44281647644149114, 0.3880646059685434, 2.3932194091895043, 0.22048418335196407])
4 ([0.0, 1.000198452073824, 2.000396904147648, 3.0005953562214724, 4.000793808295296], [52.049007936507934, 0.08026717757664162, 1.122163085234073, 1.2300320578011028, 0.01109727616896663])
... ...
916 ([0.0, 1.000198452073824, 2.000396904147648, 3.0005953562214724, 4.000793808295296], [74.39303571428572, 2.7956204803382096, 1.788360577194303, 0.8660509272194551, 0.530400826933975])
917 ([0.0, 1.000198452073824, 2.000396904147648, 3.0005953562214724, 4.000793808295296], [51.88751984126984, 1.5768804453161231, 0.9932384706239461, 0.7803585797514547, 1.6151532436755451])
918 ([0.0, 1.000198452073824, 2.000396904147648, 3.0005953562214724, 4.000793808295296], [52.16263888888889, 1.8672674706267687, 0.9955183554654834, 1.0993971449470716, 1.6476405255363171])
919 ([0.0, 1.000198452073824, 2.000396904147648, 3.0005953562214724, 4.000793808295296], [59.22579365079365, 2.1082518972190183, 3.686245044113031, 1.6247500816133893, 1.9790245755039324])
920 ([0.0, 1.000198452073824, 2.000396904147648, 3.0005953562214724, 4.000793808295296], [59.32333333333333, 4.374568790482763, 1.3313693716184536, 0.21391538068483704, 1.414774377287436])