I was attempting to generate a choropleth map by modifying an SVG map depicting all counties in the US. The basic approach is captured by Flowing Data. Since SVG is basically just XML, the approach leverages the BeautifulSoup parser.
The thing is, the parser does not capture all path elements in the SVG file. The following captured only 149 paths (out of over 3000):
#Open SVG file
svg=open(shp_dir+'USA_Counties_with_FIPS_and_names.svg','r').read()
#Parse SVG
soup = BeautifulSoup(svg, selfClosingTags=['defs','sodipodi:namedview'])
#Identify counties
paths = soup.findAll('path')
len(paths)
I know, however, that many more exist, both from visual inspection of the file and from the fact that ElementTree captures 3,143 paths with the following routine:
import xml.etree.ElementTree as ET

#Parse SVG
tree = ET.parse(shp_dir+'USA_Counties_with_FIPS_and_names.svg')
#Capture element
root = tree.getroot()
#Compile list of IDs from file
ids=[]
for child in root:
    if 'path' in child.tag:
        ids.append(child.attrib['id'])
len(ids)
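A minimal sketch of a namespace-aware version of that count, using only the standard library. The inline SVG and its ids are made up for illustration (they are not taken from the real county file), and the nested group is there only to show that `root.iter()` also reaches paths that a loop over direct children would miss:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Tiny stand-in for the county map; every id here is hypothetical.
SAMPLE_SVG = f"""<svg xmlns="{SVG_NS}">
  <path id="01001" d="M0 0L1 1"/>
  <path id="01003" d="M1 1L2 2"/>
  <g>
    <path id="nested_example" d="M0 0L2 2"/>
  </g>
</svg>"""


def path_ids(root):
    """Return the id of every SVG <path> element, at any depth."""
    # ElementTree stores tags as "{namespace}localname".
    tag = f"{{{SVG_NS}}}path"
    return [el.get("id") for el in root.iter(tag)]


root = ET.fromstring(SAMPLE_SVG)
ids = path_ids(root)
print(len(ids), ids)

# For comparison: only the paths that are direct children of <svg>.
direct = [child.get("id") for child in root if child.tag == f"{{{SVG_NS}}}path"]
print(len(direct), direct)
```

Matching on the full qualified tag avoids accidentally picking up any other element whose tag merely contains the substring "path", such as `clipPath`.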
I have not yet figured out how to write from the ElementTree object in a way that is not all messed up.
#Define style template string
style='font-size:12px;fill-rule:nonzero;stroke:#FFFFFF;stroke-opacity:1;'+\
    'stroke-width:0.1;stroke-miterlimit:4;stroke-dasharray:none;'+\
    'stroke-linecap:butt;marker-start:none;stroke-linejoin:bevel;fill:'
#For each path...
for child in root:
    #...if it is a path....
    if 'path' in child.tag:
        try:
            #...update the style to the new string with a county-specific color...
            child.attrib['style']=style+col_map[child.attrib['id']]
        except KeyError:
            #...if it's not a county we have in the ACS, fall back to a neutral gray
            child.attrib['style']=style+'#d0d0d0'+'\n'
#Write modified SVG to disk
tree.write(shp_dir+'mhv_by_cty.svg')
The modification/write routine above yields this monstrosity:
My primary question is this: why did BeautifulSoup fail to capture all of the path tags? Second, why would the image modified with the ElementTree objects have all of that extracurricular activity going on? Any advice would be greatly appreciated.
>>> from bs4 import BeautifulSoup
>>> svg = open('USA_Counties_with_FIPS_and_names.svg','r').read()
>>> soup = BeautifulSoup(svg, 'lxml')
>>> paths = soup.findAll('path')
>>> len(paths)
alexce's answer is correct for your first question. As far as your second question is concerned:
"why would the image modified with the ElementTree objects have all of that extracurricular activity going on?"
The answer is pretty simple: not every <path> element draws a county. Specifically, there are two elements, one with id="State_Lines" and one with id="separator", that should be skipped when restyling. You didn't supply your dataset of colors, so I just used a random hex color generator (adapted from here) for each county, then used lxml to parse the .svg's XML and iterate through each <path> element, skipping the ones I mentioned above:
from lxml import etree as ET
import random

def random_color():
    r = lambda: random.randint(0,255)
    return '#%02X%02X%02X' % (r(),r(),r())

new_style = 'font-size:12px;fill-rule:nonzero;stroke:#FFFFFF;stroke-opacity:1;stroke-width:0.1;stroke-miterlimit:4;stroke-dasharray:none;stroke-linecap:butt;marker-start:none;stroke-linejoin:bevel;fill:'

tree = ET.parse('USA_Counties_with_FIPS_and_names.svg')
root = tree.getroot()
for child in root:
    if 'path' in child.tag and child.attrib['id'] not in ["separator", "State_Lines"]:
        child.attrib['style'] = new_style + random_color()

tree.write('counties_new.svg')
resulting in this nice image:
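If you do have a color dataset, the same skip logic can be combined with a lookup and a fallback color. The following is only a sketch under stated assumptions: it uses the standard library's ElementTree rather than lxml, the inline SVG, its ids, the existing styles on the two non-county paths, and the contents of `col_map` are all made up for illustration, and the style string is shortened. Registering the SVG namespace as the default is an extra step that keeps ElementTree from writing `ns0:` prefixes on output.

```python
import io
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Keep SVG as the default namespace when serializing (no "ns0:" prefixes).
ET.register_namespace("", SVG_NS)

SKIP_IDS = {"State_Lines", "separator"}  # the non-county paths named above
BASE_STYLE = "stroke:#FFFFFF;stroke-width:0.1;fill:"  # shortened for the sketch
DEFAULT_FILL = "#d0d0d0"  # gray for counties missing from the data

# Hypothetical miniature map; the real file has thousands of paths.
SAMPLE_SVG = f"""<svg xmlns="{SVG_NS}">
  <path id="01001" d="M0 0L1 1"/>
  <path id="01003" d="M1 1L2 2"/>
  <path id="State_Lines" style="fill:none;stroke:#221e1f" d="M0 0L2 2"/>
  <path id="separator" style="fill:none" d="M0 2L2 0"/>
</svg>"""

# Hypothetical color data: county id -> hex fill color.
col_map = {"01001": "#FF0000"}


def recolor(root, col_map):
    """Restyle county paths in place, leaving non-county paths untouched."""
    for el in root.iter(f"{{{SVG_NS}}}path"):
        county_id = el.get("id")
        if county_id is None or county_id in SKIP_IDS:
            continue
        el.set("style", BASE_STYLE + col_map.get(county_id, DEFAULT_FILL))


tree = ET.ElementTree(ET.fromstring(SAMPLE_SVG))
recolor(tree.getroot(), col_map)

# Write to an in-memory buffer here; pass a filename to write to disk instead.
buf = io.BytesIO()
tree.write(buf)
output = buf.getvalue().decode("utf-8")
print(output)
```

Using `col_map.get(...)` with a default replaces the try/except in the question, and because the two non-county paths are skipped before any lookup, they keep whatever style they already had.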