I'm trying to scrape Wikipedia. I want to keep only the data I need and throw away everything unnecessary, such as See also, References, etc.
<span class="mw-headline" id="See_also">See also</span>
<li><a href="/wiki/List_of_adaptations_of_works_by_Stephen_King" title="List of adaptations of works by Stephen King">List of adaptations of works by Stephen King</a></li>
<li><a href="/wiki/Castle_Rock_(Stephen_King)" title="Castle Rock (Stephen King)">Castle Rock (Stephen King)</a></li>
<li><a href="/wiki/Charles_Scribner%27s_Sons" title="Charles Scribner's Sons">Charles Scribner's Sons</a> (aka Scribner)</li>
<li><a href="/wiki/Derry_(Stephen_King)" title="Derry (Stephen King)">Derry (Stephen King)</a></li>
<li><a href="/wiki/Dollar_Baby" title="Dollar Baby">Dollar Baby</a></li>
<li><a href="/wiki/Jerusalem%27s_Lot_(Stephen_King)" title="Jerusalem's Lot (Stephen King)">Jerusalem's Lot (Stephen King)</a></li>
<li><i><a href="/wiki/Haven_(TV_series)" title="Haven (TV series)">Haven</a></i></li>

As the HTML above shows, when I find "See also" in an h2 tag, I want to remove everything that follows it. In this case, that is the unordered list.
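A minimal sketch of one way to do this, using only Python's standard-library `html.parser`. The question doesn't name a language or parser, so Python is an assumption here (BeautifulSoup would be the more usual choice if a third-party package is acceptable). It assumes the markup shown above, where `id="See_also"` sits on a `<span>` inside the `<h2>`, and it simply cuts the raw HTML string at the start of that `<h2>`, dropping the heading and everything after it. The names `SeeAlsoFinder` and `strip_from_heading` are hypothetical.

```python
from html.parser import HTMLParser


class SeeAlsoFinder(HTMLParser):
    """Find where the <h2> holding a given headline id starts.

    Assumes (as in the question) MediaWiki-style markup such as:
    <h2><span class="mw-headline" id="See_also">See also</span></h2>
    """

    def __init__(self, target_id="See_also"):
        super().__init__()
        self.target_id = target_id
        self.open_h2 = None  # (line, column) of the <h2> we are currently inside
        self.cut_pos = None  # (line, column) of the matching <h2>, once found

    def handle_starttag(self, tag, attrs):
        if self.cut_pos is not None:
            return  # already found; ignore the rest of the document
        if tag == "h2":
            # Inside handle_starttag, getpos() points at the "<" of this tag
            self.open_h2 = self.getpos()
        elif tag == "span" and self.open_h2 is not None:
            if dict(attrs).get("id") == self.target_id:
                self.cut_pos = self.open_h2

    def handle_endtag(self, tag):
        if tag == "h2":
            self.open_h2 = None


def strip_from_heading(html, target_id="See_also"):
    """Return html cut just before the <h2> whose span has id=target_id.

    The input is returned unchanged if no such heading is found.
    """
    finder = SeeAlsoFinder(target_id)
    finder.feed(html)
    finder.close()
    if finder.cut_pos is None:
        return html
    line, col = finder.cut_pos
    # HTMLParser counts lines by "\n" (1-based) and columns from 0,
    # so rebuild the absolute string index the same way.
    chunks = html.split("\n")
    index = sum(len(chunk) + 1 for chunk in chunks[: line - 1]) + col
    return html[:index]


if __name__ == "__main__":
    page = (
        "<p>Intro text.</p>\n"
        '<h2><span class="mw-headline" id="See_also">See also</span></h2>\n'
        '<ul><li><a href="/wiki/Dollar_Baby" title="Dollar Baby">Dollar Baby</a></li></ul>\n'
    )
    print(repr(strip_from_heading(page)))  # '<p>Intro text.</p>\n'
```

Because the cut is made on the raw string, everything from the heading to the end of the input is discarded, which matches "remove everything after it". Passing `target_id="References"` would cut at that heading instead. If a page puts the id somewhere else (for example directly on the `<h2>`), the check in `handle_starttag` would need adjusting.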