How to sort a pandas time series containing 12-hour (AM/PM) format values


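I am using pandas to process some data from a csv file.

I need to sort the data in my dataframe df by the column MEETING START TIME, sorting on the time only. The date is handled by another field.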

But the result I get is:

MEETING START TIME
10:30 AM
12:30 PM
2:00 PM
4:00 PM
9:15 AM
9:15 AM
Any morning meeting time with a single-digit hour ends up at the end. Do I need to change something about the date format or the sort command?
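For reference, the ordering above can be reproduced with a small frame of strings. This is a minimal sketch, assuming the column is read from the CSV as plain text and sorted with `sort_values`; the sample values are taken from the output shown in the question.

```python
import pandas as pd

# Times stored as plain strings, as they would arrive from a CSV file.
df = pd.DataFrame({
    "MEETING START TIME": [
        "9:15 AM", "2:00 PM", "10:30 AM", "4:00 PM", "12:30 PM", "9:15 AM",
    ]
})

# A string column is compared character by character, so "10:30 AM"
# sorts before "2:00 PM" and both "9:15 AM" rows land at the end.
as_text = df.sort_values("MEETING START TIME")["MEETING START TIME"].tolist()
print(as_text)
```

The column is ordered as text, not as times, which is why the single-digit morning hours come last.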


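2 comments

The answer below points out that you cannot sort date/time values while they are strings; they have to be converted to datetime objects first.

Possible duplicate of: Sort pandas dataframe by date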


         python


         pandas


        
         
         
mattrweaver
Posted on 2015-08-05


2 answers

Accepted

0 upvotes


          
You can use pd.datetools.parse to try to convert your date column (currently a string) into a datetime object; then you should be able to sort.
          
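import pandas as pd

# pd.datetools and DataFrame.sort were available in pandas when this was written (2015);
# later releases removed both.
df = pd.DataFrame({'MEETING START TIME': ['10:30 AM','12:30 PM', '2:00 PM', '4:00 PM', '9:15 AM', '9:15 AM']})
# Parse each string into a datetime; the missing date part is filled in with the current date.
df['MEETING START TIME'] = df['MEETING START TIME'].map(lambda x: pd.datetools.parse(x))
df.sort('MEETING START TIME')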
Out[33]: 
   MEETING START TIME
5 2015-08-05 09:15:00
4 2015-08-05 09:15:00
0 2015-08-05 10:30:00
1 2015-08-05 12:30:00
2 2015-08-05 14:00:00
3 2015-08-05 16:00:00
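To my knowledge, recent pandas releases no longer ship `pd.datetools` or `DataFrame.sort`, so the snippet above may not run as written. A minimal sketch of the same idea with current calls follows, assuming `pd.to_datetime` with an explicit 12-hour format and `sort_values`; it is a substitute for the answer's method, not the answer's own code. The names `parsed` and `sorted_df` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "MEETING START TIME": [
        "10:30 AM", "12:30 PM", "2:00 PM", "4:00 PM", "9:15 AM", "9:15 AM",
    ]
})

# %I = hour on a 12-hour clock, %M = minute, %p = AM/PM marker.
# Only the time is given, so the date part is a placeholder.
parsed = pd.to_datetime(df["MEETING START TIME"], format="%I:%M %p")

# Sort by the parsed values but keep the original AM/PM strings in the frame.
# mergesort is stable, so equal times keep their original relative order.
order = parsed.sort_values(kind="mergesort").index
sorted_df = df.loc[order]
print(sorted_df)
```

Sorting through a separate parsed Series leaves the display strings untouched, which avoids converting back to AM/PM afterwards.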


           
            
Would pd.to_datetime be quicker?


           
            
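mattrweaver:

OK, that works, thanks. I need to strip out the date, which I am doing like this: df['MEETING START TIME'] = pd.DatetimeIndex(df['MEETING START TIME']).time. Then I will convert it back to AM/PM, because this is for signage and that will be easier for people to read.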
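The two steps described in this comment (dropping the date, then formatting back to AM/PM) can be sketched as follows. This assumes the values were parsed with `pd.to_datetime` and uses the `.dt` accessor in place of `pd.DatetimeIndex(...)`; the sample values and the names `times` and `labels` are illustrative, and the AM/PM text produced by `%p` depends on the locale.

```python
import pandas as pd

parsed = pd.to_datetime(
    pd.Series(["9:15 AM", "2:00 PM", "12:30 PM"]), format="%I:%M %p"
)

# Keep only the time of day; each element becomes a datetime.time object.
times = parsed.dt.time

# Format back to a 12-hour label and drop the zero padding on the hour,
# so 09:15 AM is shown as 9:15 AM.
labels = parsed.dt.strftime("%I:%M %p").str.lstrip("0")

print(times.tolist())
print(labels.tolist())
```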


           
            
@EdChum
%timeit pd.datetools.parse('10 AM')
10000 loops, best of 3: 39.7 µs per loop
%timeit pd.to_datetime('10 AM')
1000 loops, best of 3: 159 µs per loop


          
           
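It is interesting to see whether the approach of using map with datetools.parse scales as well as the standard methods here, here and here.

Let's build a very large Series of dates represented as strings to find out.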
           
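In [9]: import numpy as np
In [10]: import pandas as pd
In [11]: import datetime as dt
In [12]: format = '%d/%m/%Y %H:%M:%S'
In [13]: def random_date():
   ....:    # Random Unix timestamp between 0 and 2e9 seconds, formatted as a string
   ....:    rand_num = np.random.uniform(0, 2e9)
   ....:    return dt.datetime.fromtimestamp(rand_num).strftime(format)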
In [14]: dates = pd.Series([random_date() for i in range(100000)])
In [15]: dates.head() # Some random dates (as strings)
Out[15]: 
0    30/11/1988 15:11:08
1    08/05/2025 10:29:02
2    05/09/2017 02:24:46
3    18/03/2016 14:55:20
4    22/04/1984 04:58:06
dtype: object
Now let's time the two methods.
In [33]: %timeit dates.map(lambda x: pd.datetools.parse(x))
1 loops, best of 3: 6.98 s per loop
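A comparison of this kind can be re-run outside IPython with the standard `timeit` module. The sketch below is an assumption-laden stand-in, not the original benchmark: it uses `datetime.strptime` per element in place of `pd.datetools.parse` (which may not exist in current pandas), compares it with a single `pd.to_datetime` call given an explicit format, and uses 10,000 rows instead of 100,000. The helper names `parse_with_map` and `parse_vectorized` are hypothetical, and no timings are claimed here.

```python
import datetime as dt
import timeit

import numpy as np
import pandas as pd

fmt = "%d/%m/%Y %H:%M:%S"
rng = np.random.default_rng(0)

# Build random date strings from second offsets after 1970-01-01.
base = dt.datetime(1970, 1, 1)
seconds = rng.integers(0, 2_000_000_000, size=10_000)
dates = pd.Series(
    [(base + dt.timedelta(seconds=int(s))).strftime(fmt) for s in seconds]
)

def parse_with_map(strings):
    # One Python-level parser call per element (stand-in for datetools.parse).
    return strings.map(lambda x: dt.datetime.strptime(x, fmt))

def parse_vectorized(strings):
    # A single call over the whole Series with a known format.
    return pd.to_datetime(strings, format=fmt)

mapped = parse_with_map(dates)
vectorized = parse_vectorized(dates)

t_map = timeit.timeit(lambda: parse_with_map(dates), number=3)
t_vec = timeit.timeit(lambda: parse_vectorized(dates), number=3)
print(f"map + strptime: {t_map:.3f} s, pd.to_datetime: {t_vec:.3f} s (3 runs each)")
```

Both functions should produce the same timestamps, so any difference in the printed times reflects the parsing approach on the machine where it is run.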