I have this line:

<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>

I want to extract the price, 45.66, which sits between data-asin-price=" and " data-asin-shipping.

I found this code, but it doesn't work very well.

def extractSubstring(text, sub1, sub2):
    # Case-insensitive search for the start marker
    lowered = text.lower()
    start = lowered.find(sub1.lower())
    if start == -1:
        return None  # start marker not found
    start += len(sub1)
    # Look for the end marker only after the start marker
    end = lowered.find(sub2.lower(), start)
    if end == -1:
        return text[start:]  # no end marker: return the rest of the string
    return text[start:end]

# result = soup.find_all(attrs={"data-asin-currency-code": "USD"})  # needs a BeautifulSoup object named soup
priceLine = '<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>'
sub1 = 'data-asin-price="'
sub2 = '" data-asin-shipping'
substring = extractSubstring(priceLine, sub1, sub2)
print(substring)  # 45.66
    
5 comments
You can use price = re.findall(r"\d+\.\d+", priceLine)
Mark
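Not clear why you don't just use BeautifulSoup for this as well. It makes extracting attributes very easy. Like: soup.div['data-asin-price']
I tried bs4 without success; I just found a way to extract it with result = re.search(sub1+'(.*)'+sub2, text). So, if anyone wants to answer this...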
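A minimal sketch of the regex route, assuming the attribute value never contains a double quote: anchor the pattern on the attribute name and capture up to the next `"`. The `[^"]*` class cannot run past the closing quote, whereas a greedy `(.*)` can overshoot if the end marker appears more than once. The `float` conversion is an illustrative extra step, not something the thread asks for.

```python
import re

priceLine = '<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>'

# Capture everything between data-asin-price=" and the next double quote
match = re.search(r'data-asin-price="([^"]*)"', priceLine)
price = float(match.group(1)) if match else None
print(price)  # 45.66
```

Unlike `re.findall(r"\d+\.\d+", ...)`, this only matches the price attribute, so it keeps working if another decimal number appears elsewhere in the tag.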
Tags: python, regex, string
Martin Ocando Corleone
Posted 2019-11-28
1 answer
Ke Zhu
Posted 2019-11-28
Accepted
0 votes

BeautifulSoup is a good way to do this:

import bs4
html = bs4.BeautifulSoup('<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>', 'html.parser')
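If installing bs4 is not an option, the same attribute lookup can be sketched with the standard library's `html.parser.HTMLParser`. This is a swapped-in technique, not what the answer uses. `PriceAttrParser` is a hypothetical helper name that records the value of one attribute from the first start tag that carries it.

```python
from html.parser import HTMLParser


class PriceAttrParser(HTMLParser):
    """Hypothetical helper: keeps the value of one attribute from the first tag that has it."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lowercases names
        if self.value is not None:
            return
        for name, value in attrs:
            if name == self.attr_name:
                self.value = value
                break


priceLine = '<div data-asin="B0000BYDR1" data-asin-currency-code="USD" data-asin-price="45.66" data-asin-shipping="0" data-device-type="WEB" data-display-code="Asin is not eligible because it is price competitive" data-substitute-count="-1" id="cerberus-data-metrics" style="display: none;"></div>'

parser = PriceAttrParser("data-asin-price")
parser.feed(priceLine)
parser.close()
price = float(parser.value)
print(price)  # 45.66
```

Because a real HTML parser handles quoting and attribute order, this does not depend on `data-asin-shipping` immediately following the price the way the substring approach does.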