How to extract a substring between two strings in Python [duplicate]

I have this line:

<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>
I want to extract the price, 45.66, which sits between data-asin-price=" and " data-asin-shipping.
I found this code, but it doesn't work very well.
def extractSubstring(text, sub1, sub2):
  pos1 = text.lower().find(sub1) + len(sub1)
  pos2 = text.lower().find(sub2)
  if pos1 > pos2 and pos2 > 0:
    return text[pos1:pos2]
  elif pos2 > pos1 and pos1 > 0:
    return text[pos2:pos1]
  elif pos1 > 0:
    return text[pos1:]
  elif pos2 > 0:
    return text[pos2:]
result = soup.find_all(attrs={"data-asin-currency-code": "USD"})
priceLine='<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>'
sub1 = 'data-asin-price="'
sub2 = '" data-asin-shipping'
substring = extractSubstring(str(priceLine), sub1, sub2)
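For the input above, the posted function takes the branch that returns text[pos2:pos1] with pos2 greater than pos1, which is an empty slice. It also lowercases the text but not the delimiters, and adding len(sub1) to the result of find() hides a failed search. A minimal corrected sketch follows; the name extract_between and the shortened div are illustrative assumptions, not part of the original post.

```python
def extract_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    start_pos = text.find(start)
    if start_pos == -1:
        return None  # opening delimiter not present
    start_pos += len(start)  # move past the opening delimiter
    end_pos = text.find(end, start_pos)  # search only after the opening delimiter
    if end_pos == -1:
        return None  # closing delimiter not present
    return text[start_pos:end_pos]


# Shortened version of the div from the question (assumption for brevity).
price_line = (
    '<div data-asin="B0000BYDR1" data-asin-currency-code="USD" '
    'data-asin-price="45.66" data-asin-shipping="0"></div>'
)
price = extract_between(price_line, 'data-asin-price="', '" data-asin-shipping')
print(price)  # 45.66
```

Returning None when either delimiter is missing keeps "not found" distinct from a legitimately empty match.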
5 comments
Richard Dunn:
Use regex
AlexDotis:
You can use price = re.findall(r"\d+\.\d+", priceLine)
Mark:
It's not clear why you aren't just using Beautiful Soup for this as well. It makes extracting attributes very easy, like: soup.div['data-asin-price']
Martin Ocando Corleone:
I tried bs4 without success. I just found a way to extract it: result = re.search(sub1+'(.*)'+sub2, text). So, if anybody wants to answer this...
men6288:
This might help: stackoverflow.com/questions/3368969/...
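The re.search idea from the comments can be sketched as below. Two changes are assumptions on my part rather than something the commenters wrote: re.escape is applied to the delimiters so that any regex metacharacters in them are matched literally, and the group is non-greedy so it stops at the first closing delimiter. The div is shortened for brevity.

```python
import re

# Shortened version of the div from the question (assumption for brevity).
price_line = (
    '<div data-asin="B0000BYDR1" data-asin-currency-code="USD" '
    'data-asin-price="45.66" data-asin-shipping="0"></div>'
)
sub1 = 'data-asin-price="'
sub2 = '" data-asin-shipping'

# Escape the delimiters and capture the shortest text between them.
pattern = re.escape(sub1) + r"(.*?)" + re.escape(sub2)
match = re.search(pattern, price_line)

# re.search returns None when there is no match, so guard before group().
price = match.group(1) if match else None
print(price)  # 45.66
```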
python
regex
string
Posted by Martin Ocando Corleone on 2019-11-28
1 answer
Posted by Ke Zhu on 2019-11-28
Accepted
0 upvotes

Beautiful Soup is a good way to do this:
import bs4
html = bs4.BeautifulSoup('<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>', 'html.parser')
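The answer as captured here stops after building the soup. One way to read the attribute, following the soup.div['data-asin-price'] hint from the comments, might look like this. It is a sketch that assumes the third-party beautifulsoup4 package is installed; it uses the built-in "html.parser" backend, and the variable names and the shortened div are illustrative.

```python
from bs4 import BeautifulSoup

# Shortened version of the div from the question (assumption for brevity).
markup = (
    '<div data-asin="B0000BYDR1" data-asin-currency-code="USD" '
    'data-asin-price="45.66" data-asin-shipping="0" '
    'id="cerberus-data-metrics" style="display: none;"></div>'
)

soup = BeautifulSoup(markup, "html.parser")

# Locate the div by the same attribute the question filtered on.
div = soup.find("div", attrs={"data-asin-currency-code": "USD"})

# Tag attributes are read with dictionary-style access.
price = div["data-asin-price"] if div is not None else None
print(price)  # 45.66
```

Parsing the markup avoids depending on attribute order, which the two string-delimiter approaches implicitly rely on.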
