Finding the last element in range() when creating sequences in Python


I am trying to create fixed-size sequences from variable-length CSVs.

The approach I am using is this function:

def create_sequences(csv, window_size, stride):
    sequences = []
    for i in range(0, len(csv)-window_size, stride):
        sequences.append(csv[i:i+window_size])
    return sequences

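It returns the sequences successfully, but some data is lost. I built a visualization and worked through the windows by hand, and part of the data is missing.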

Total length = 115
Size = 30
Stride = 20
Ending size = 115 -30 = 85
First window: 0 -> 30 
Second Window: 20 -> 50
Third Window: 40 -> 70
Fourth Window: 60 -> 90
Fifth Window: 80 -> 110

The last five frames are lost. How can I set up a window from 100 -> 115 and pad the last row?

python
time-series
lstm
sequence
Muhammad Anas Raza
Posted on 2021-09-05
1 answer
Hai Vu
Posted on 2021-09-05
Accepted
0 upvotes

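Your range() stops too early: it ends at len(csv) - window_size, so no window can start after that point. You can fix the for loop as follows: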

def create_sequences(csv, window_size, stride):
    sequences = []
    for start in range(0, len(csv), stride):
        window = csv[start:start+window_size]
        if window == []:
            break
        sequences.append(window)
    return sequences

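Note that inside the for loop we test whether the window is empty and, if so, break out of the loop. For a list input every start is smaller than len(csv), so the slice always holds at least one element and this test acts only as a defensive guard.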
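As a quick check against the numbers in the question (length 115, window 30, stride 20), the sketch below compares the window bounds produced by the original range() with those from the fixed one. The helper name window_bounds is hypothetical and exists only for this illustration; it is not part of the answer's function.

```python
def window_bounds(length, window_size, stride, stop):
    """Return (start, end) pairs for windows starting in range(0, stop, stride).

    Hypothetical helper for illustration only. The end index is clipped
    to `length`, mirroring how list slicing behaves past the end.
    """
    return [(start, min(start + window_size, length))
            for start in range(0, stop, stride)]


length, window_size, stride = 115, 30, 20

# Original loop: the range stops at len(csv) - window_size
original = window_bounds(length, window_size, stride, length - window_size)
# Fixed loop: the range runs over the whole length
fixed = window_bounds(length, window_size, stride, length)

print(original)
print(fixed)
```

The fixed range yields one extra, shorter window covering 100 -> 115, which is exactly the data the original loop dropped.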

Update

Here is an update that adds a padding value so that all windows have the same size.

def create_sequences(csv, window_size, stride, padding=None):
    sequences = []
    for start in range(0, len(csv), stride):
        window = csv[start:start+window_size]
        if window == []:
            break
        if len(window) < window_size:
            window.extend((window_size - len(window)) * [padding])
        sequences.append(window)
    return sequences

Example usage:

for window in create_sequences(list(range(115)), 30, 20, padding=-1):
    print(window)

Output:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
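To look specifically at the final window, where the padding applies, here is a small sketch that repeats the updated function (so the snippet runs on its own) and inspects only the last entry. It assumes a plain list as input, as in the usage example; other container types may need a different emptiness test and padding step.

```python
def create_sequences(csv, window_size, stride, padding=None):
    # Same logic as the updated function above, repeated so this runs standalone
    sequences = []
    for start in range(0, len(csv), stride):
        window = csv[start:start + window_size]
        if window == []:
            break
        if len(window) < window_size:
            # Pad the short final window up to window_size
            window.extend((window_size - len(window)) * [padding])
        sequences.append(window)
    return sequences


windows = create_sequences(list(range(115)), 30, 20, padding=-1)
last = windows[-1]

print(len(windows))  # number of windows produced
print(last[:15])     # real data taken from indices 100..114
print(last[15:])     # padding values appended to reach window_size
```

With these arguments the last window holds the 15 remaining values followed by 15 padding values, so every window has length 30.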