
I was attempting to generate a choropleth map by modifying an SVG map depicting all counties in the US. The basic approach is described by Flowing Data. Since SVG is basically just XML, the approach leverages the BeautifulSoup parser.
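The heart of that approach is turning each county's value into one of a handful of fill colors, which are then written into the paths' `style` attributes. Below is a minimal sketch of building such a lookup, which is the role `col_map` plays in the code below. It assumes a hypothetical `values` dict keyed by the same IDs as the SVG paths. The palette and cut points are illustrative assumptions, not values from the original data.

```python
import bisect

# Illustrative six-class palette, light to dark (an assumption, not from the question).
PALETTE = ["#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043"]
# Illustrative upper bounds for the first five classes; larger values get the last color.
THRESHOLDS = [50_000, 100_000, 150_000, 200_000, 300_000]


def color_for(value):
    """Return the color of the first class whose upper bound is >= value."""
    return PALETTE[bisect.bisect_left(THRESHOLDS, value)]


def build_col_map(values):
    """Map each county ID (assumed to match the SVG path ids) to a fill color."""
    return {county_id: color_for(v) for county_id, v in values.items()}


# Hypothetical input: county ID -> value.
print(build_col_map({"06037": 120_000, "01001": 400_000}))
```

`bisect_left` treats each threshold as an inclusive upper bound, so a value exactly on a cut point lands in the lower class.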

The thing is, the parser does not capture all path elements in the SVG file. The following captured only 149 paths (out of over 3000):

#Open SVG file
svg=open(shp_dir+'USA_Counties_with_FIPS_and_names.svg','r').read()
#Parse SVG
soup = BeautifulSoup(svg, selfClosingTags=['defs','sodipodi:namedview'])
#Identify counties
paths = soup.findAll('path')
len(paths)

I know, however, that many more exist, both from visual inspection and from the fact that ElementTree captures 3,143 paths with the following routine:

import xml.etree.ElementTree as ET
#Parse SVG
tree = ET.parse(shp_dir+'USA_Counties_with_FIPS_and_names.svg')
#Capture element
root = tree.getroot()
#Compile list of IDs from file
ids=[]
for child in root:
    if 'path' in child.tag:
        ids.append(child.attrib['id'])
len(ids)
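The loop above only inspects the root's direct children and matches `'path'` as a substring of the tag. That works when every county path sits at the top level. If a file nests paths inside `<g>` groups, though, they would be missed. The sketch below uses a namespace-qualified tag with `iter()`, which walks the whole tree. It assumes the standard SVG namespace, and the inline SVG is a made-up stand-in for the county file.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Made-up SVG: one top-level path and one path nested inside a group.
SAMPLE = f"""<svg xmlns="{SVG_NS}">
  <path id="01001" d="M0 0 L1 1"/>
  <g id="group">
    <path id="01003" d="M1 1 L2 2"/>
  </g>
</svg>"""

root = ET.fromstring(SAMPLE)

# Direct children only (as in the loop above): the nested path is missed.
top_level = [c.get("id") for c in root if "path" in c.tag]

# Namespace-qualified tag over the whole tree: every <path>, however deep.
all_paths = [el.get("id") for el in root.iter(f"{{{SVG_NS}}}path")]

print(top_level)
print(all_paths)
```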

However, I have not yet figured out how to write the modified ElementTree object back to disk without the result getting all messed up.

#Define style template string
style='font-size:12px;fill-rule:nonzero;stroke:#FFFFFF;stroke-opacity:1;'+\
        'stroke-width:0.1;stroke-miterlimit:4;stroke-dasharray:none;'+\
        'stroke-linecap:butt;marker-start:none;stroke-linejoin:bevel;fill:'
#For each path...
for child in root:
    #...if it is a path....
    if 'path' in child.tag:
        try:
            #...update the style to the new string with a county-specific color...
            child.attrib['style']=style+col_map[child.attrib['id']]
        except KeyError:
            #...if it's not a county we have in the ACS, color it gray
            child.attrib['style']=style+'#d0d0d0'+'\n'
#Write modified SVG to disk
tree.write(shp_dir+'mhv_by_cty.svg')

The modification/write routine above yields this monstrosity:

My primary question is this: why did BeautifulSoup fail to capture all of the path tags? Second, why would the image modified with the ElementTree objects have all of that extracurricular activity going on? Any advice would be greatly appreciated.

Using BeautifulSoup 4.3.2, I ran svg_soup = BeautifulSoup(svg); paths = svg_soup.find_all('path'); len(paths), which output 3143. Perhaps you need to upgrade bs4? – MattDMo Jan 19, 2015 at 2:14
>>> from bs4 import BeautifulSoup
>>> svg = open('USA_Counties_with_FIPS_and_names.svg','r').read()
>>> soup = BeautifulSoup(svg, 'lxml')
>>> paths = soup.findAll('path')
>>> len(paths)

alexce's answer is correct for your first question. As far as your second question is concerned:

"why would the image modified with the ElementTree objects have all of that extracurricular activity going on?"

The answer is pretty simple: not every <path> element draws a county. Specifically, there are two elements, one with id="State_Lines" and one with id="separator", that should not be recolored. You didn't supply your dataset of colors, so I just used a random hex color generator (adapted from here) for each county, then used lxml to parse the .svg's XML and iterate through each <path> element, skipping the ones I mentioned above:

from lxml import etree as ET
import random
def random_color():
    r = lambda: random.randint(0,255)
    return '#%02X%02X%02X' % (r(),r(),r())
new_style = 'font-size:12px;fill-rule:nonzero;stroke:#FFFFFF;stroke-opacity:1;stroke-width:0.1;stroke-miterlimit:4;stroke-dasharray:none;stroke-linecap:butt;marker-start:none;stroke-linejoin:bevel;fill:'
tree = ET.parse('USA_Counties_with_FIPS_and_names.svg')
root = tree.getroot()
for child in root:
    if 'path' in child.tag and child.attrib['id'] not in ["separator", "State_Lines"]:
        child.attrib['style'] = new_style + random_color()
tree.write('counties_new.svg')

resulting in this nice image:

Worked like a charm. Definitely an oversight, I was thrown by the pattern not corresponding with any specific boundaries. – Marvin Ward Jr Jan 19, 2015 at 17:48
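Here is a minimal sketch of the same fix for readers using the standard-library `xml.etree.ElementTree` instead of lxml. It assumes the default SVG namespace and a `col_map` dict from path IDs to hex colors, as in the question. The sketch combines the answer's skip list with the question's gray fallback, using `dict.get` in place of try/except. It also registers the SVG namespace as the default, because otherwise the standard-library serializer writes every tag with an `ns0:` prefix. The style string is shortened and the sample SVG is made up.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize the SVG namespace without a prefix instead of as ns0:.
# The registry is module-wide and must be set before writing.
ET.register_namespace("", SVG_NS)

# <path> elements that draw boundaries rather than counties (from the answer).
NON_COUNTY_IDS = {"separator", "State_Lines"}
FALLBACK = "#d0d0d0"  # gray for counties missing from col_map, as in the question
# Shortened stand-in for the full style template used above.
STYLE = "stroke:#FFFFFF;stroke-width:0.1;fill:"


def recolor(root, col_map):
    """Set the fill of every county <path>; leave boundary paths untouched."""
    for el in root.iter(f"{{{SVG_NS}}}path"):
        path_id = el.get("id")
        if path_id in NON_COUNTY_IDS:
            continue
        el.set("style", STYLE + col_map.get(path_id, FALLBACK))
    return root


# Made-up three-path SVG standing in for the county file.
SAMPLE = f"""<svg xmlns="{SVG_NS}">
  <path id="01001" d="M0 0 L1 1"/>
  <path id="01003" d="M1 1 L2 2"/>
  <path id="State_Lines" style="fill:none;stroke:#000" d="M0 0 L2 2"/>
</svg>"""

root = recolor(ET.fromstring(SAMPLE), {"01001": "#980043"})
out = ET.tostring(root, encoding="unicode")
print(out)
```

With a real file, `ET.parse(...)` and `tree.write(...)` play the same roles as `fromstring` and `tostring` here.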
