Using a regular expression to find the last colon after a timestamp

[0:00:02] name1: Okay, this is the continued string...
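I want a Python regular expression that extracts all of the text starting from "Okay...".
I have already figured out how to extract the timestamp and the speaker's name:
 time_frame = re.search(r'\[(.*?)\]', temp).group(1)
 speaker_id = re.search(r'\] (.*?)\:', temp).group(1)
However, I'm stuck on the last part. Note that the text on the right may itself contain a colon, but I want to capture everything in that text.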
    1 comment
mkrieger1:
What specifically is the problem with appending \s*(.*) to the speaker_id pattern?
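A minimal sketch of what this comment suggests, assuming `temp` holds the sample line from the question: with `\s*(.*)` appended, the `speaker_id` pattern returns both the speaker and the remaining text from a single search.

```python
import re

# Sample line from the question.
temp = "[0:00:02] name1: Okay, this is the continued string..."

# The speaker_id pattern with \s*(.*) appended, as the comment suggests.
match = re.search(r"\] (.*?):\s*(.*)", temp)
if match:
    speaker_id, text = match.groups()
    print(speaker_id)
    print(text)
```

Because `(.*?)` is lazy, it stops at the first colon after the closing bracket, so any later colon ends up in the second group.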
python
regex
user1357015 posted on 2020-12-04
4 answers
Ryszard Czech posted on 2020-12-04
Accepted
0 upvotes

Following your logic:
re.search(r'\[.*?\]\s*\w+:\s*(.+)', temp).group(1)
See proof
--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  \]                       ']'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .+                       any character except \n (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
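
As a rough check of this pattern, the sketch below wraps it in a helper and tries it on two made-up lines. The helper name `extract_text` and both test lines are assumptions for illustration, not part of the question.

```python
import re

# Pattern from the answer above; \w+ means the speaker may contain word characters only.
PATTERN = re.compile(r"\[.*?\]\s*\w+:\s*(.+)")

def extract_text(line):
    """Return the text after the speaker's colon, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.group(1) if match else None

# Hypothetical test lines, not taken from the question.
print(extract_text("[0:00:02] name1: Note: colons in the text are kept"))
print(extract_text("[0:00:02] Dr Smith: hello"))  # speaker contains a space
```

The first call prints the whole text including its colon. The second prints `None`, because `\w+` cannot span the space in the speaker name; if such names can occur, a wider class such as `[^:]+` may be a better fit.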
    
taras posted on 2020-12-04
0 upvotes

You can match the timestamp literally with \[\d+:\d+:\d+\] and match up to the first colon with .*?:
'\[\d+:\d+:\d+\].*?:(.*)'
In fact, you can match all 3 groups with a single pattern:
'\[(\d+:\d+:\d+)\] (.*?):(.*)'
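
A minimal sketch of the three-group idea, assuming the sample line from the question and a pattern whose closing bracket is escaped as `\]`. The variable names are chosen for illustration only.

```python
import re

line = "[0:00:02] name1: Okay, this is the continued string..."

# Three groups: timestamp, speaker, text. The closing bracket is escaped as \].
pattern = r"\[(\d+:\d+:\d+)\] (.*?):(.*)"
match = re.match(pattern, line)
if match:
    time_frame, speaker_id, text = match.groups()
    # There is no \s* after the colon, so the captured text keeps its
    # leading space; strip() removes surrounding whitespace.
    text = text.strip()
    print(time_frame, speaker_id, text, sep="\n")
```

Adding `\s*` after the colon in the pattern would be an alternative to calling `strip()`.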
    
Nour-Allah Hussein posted on 2020-12-04
0 upvotes

Let's put it all together in a simple way with g = re.findall(r'\[(.*?)\]\s*(.*?):\s*(.*)', text). The speaker group is lazy, (.*?), so it stops at the first colon even when the text contains another one.
import re
text = '[0:00:02] name1: Okay, this is the continued string...'
g = re.findall(r'\[(.*?)\]\s*(.*?):\s*(.*)', text)
time_frame = g[0][0]
speaker_id = g[0][1]
speech = g[0][2]
print(time_frame)
print(speaker_id)
print(speech)
Output
0:00:02
name1
Okay, this is the continued string...
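
Since the question says the text may itself contain a colon, the quantifier used for the speaker group matters. The sketch below compares a greedy `(.*)` with a lazy `(.*?)` on a made-up line; the line is an assumption, not from the question.

```python
import re

# Hypothetical line whose text contains an extra colon.
text = "[0:00:02] name1: Okay: this has a colon"

greedy = re.findall(r"\[(.*?)\]\s*(.*):\s*(.*)", text)
lazy = re.findall(r"\[(.*?)\]\s*(.*?):\s*(.*)", text)

print(greedy[0])  # speaker group runs up to the last colon
print(lazy[0])    # speaker group stops at the first colon
```

With the greedy group the speaker comes out as `name1: Okay`, while the lazy group gives `name1` and keeps `Okay: this has a colon` as the text.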
    
The fourth bird posted on 2020-12-04
0 upvotes

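You can use a negated character class to match everything except :, then match the colon itself followed by optional whitespace characters, and capture everything after that in a capture group.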
^\[[^][]*][^:]*:\s*(.+)
Regex demo
import re
regex = r"^\[[^][]*][^:]*:\s*(.+)"
temp = "[0:00:02] name1: Okay, this is the continued string..."
matches = re.search(regex, temp)
if matches:
    print(matches.group(1))
Output
Okay, this is the continued string...
Matching all 3 parts in a single pattern:
^\[([^][]*)]\s*([^:]*):\s*(.+)
^\[ Match opening [ at the start of the string
([^][]*) Capture group 1, match any char except [ and ]
]\s* Match closing ] and 0+ whitespace chars
([^:]*) Capture group 2 Match any char except :
:\s* Match : and 0+ whitespace chars
(.+) Capture group 3, Match the rest of the string
regex demo
import re
regex = r"^\[([^][]*)]\s*([^:]*):\s*(.+)"
temp = "[0:00:02] name1: Okay, this is the continued string..."
matches = re.search(regex, temp)
if matches:
    print(matches.group(1))
    print(matches.group(2))
    print(matches.group(3))
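
If the input is a whole transcript rather than a single line, the same three-part pattern can probably be reused with `re.MULTILINE`. The following is only a sketch: the named groups, the helper `parse_transcript`, and the second transcript line are assumptions added for illustration.

```python
import re

# Same three-part pattern with named groups; re.MULTILINE lets ^ match
# at the start of every line instead of only at the start of the string.
LINE_RE = re.compile(
    r"^\[(?P<time>[^][]*)]\s*(?P<speaker>[^:]*):\s*(?P<text>.+)",
    re.MULTILINE,
)

def parse_transcript(transcript):
    """Return one dict per matching line with keys time, speaker and text."""
    return [m.groupdict() for m in LINE_RE.finditer(transcript)]

# Hypothetical two-line transcript; only the first line comes from the question.
transcript = (
    "[0:00:02] name1: Okay, this is the continued string...\n"
    "[0:00:05] name2: Note: a colon in the text is fine"
)
for entry in parse_transcript(transcript):
    print(entry["time"], entry["speaker"], entry["text"], sep=" | ")
```

Note that `[^:]*` and `\s*` can also match newlines, so this sketch assumes every line is well formed; malformed lines may need a stricter class such as `[^:\n]*`.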