I am trying to follow the example in
"Use Python & Pandas to Create a D3 Force Directed Network Diagram",
but the line below raises KeyError: ('count', 'occurred at index 0'):
temp_links_list = list(grouped_src_dst.apply(lambda row: {"source": row['source'], "target": row['target'], "value": row['count']}, axis=1))
I am new to Python. What is the issue here?
Edited code
import pandas as pd
import json
import re
pcap_data = pd.read_csv(r'C:\packet_metadata.csv', index_col='No.')
dataframe = pcap_data
src_dst = dataframe[["Source","Destination"]]
src_dst.rename(columns={"Source":"source","Destination":"target"}, inplace=True)
grouped_src_dst = src_dst.groupby(["source","target"]).size().reset_index()
grouped_src_dst.rename(columns={'count':'value'}).to_dict(orient='records')
unique_ips = pd.Index(grouped_src_dst['source']
                      .append(grouped_src_dst['target'])
                      .reset_index(drop=True).unique())
print(grouped_src_dst.columns.tolist())
['source', 'target', 0]
Final code
import pandas as pd
import json
import re
pcap_data = pd.read_csv(r'C:\packet_metadata.csv', index_col='No.')
dataframe = pcap_data
src_dst = dataframe[["Source","Destination"]]
src_dst.sample(10)
grouped_src_dst = src_dst.groupby(["Source","Destination"]).size().reset_index()
d={0:'value',"Source":"source","Destination":"target"}
L = grouped_src_dst.rename(columns=d)
unique_ips = pd.Index(L['source']
                      .append(L['target'])
                      .reset_index(drop=True).unique())
group_dict = {}
counter = 0
for ip in unique_ips:
    breakout_ip = re.match(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$", ip)
    if breakout_ip:
        # group hosts by their first three octets
        net_id = '.'.join(breakout_ip.group(1,2,3))
        if net_id not in group_dict:
            counter += 1
            group_dict[net_id] = counter
temp_links_list = list(L.apply(lambda row: {"source": row['source'], "target": row['target'], "value": row['value']}, axis=1))
–
I think the problem is with the column name count: it is either missing or contains whitespace, like ' count'.
# check column names
print (grouped_src_dst.columns.tolist())
['count', 'source', 'target']
Sample:
grouped_src_dst = pd.DataFrame({'source':['a','s','f'],
                                'target':['b','n','m'],
                                'count':[0,8,4]})
print (grouped_src_dst)
   count source target
0      0      a      b
1      8      s      n
2      4      f      m
f = lambda row: {"source": row['source'], "target": row['target'], "value": row['count']}
temp_links_list = list(grouped_src_dst.apply(f, axis=1))
print (temp_links_list)
[{'value': 0, 'source': 'a', 'target': 'b'},
{'value': 8, 'source': 's', 'target': 'n'},
{'value': 4, 'source': 'f', 'target': 'm'}]
A simpler solution is to rename the column count and use DataFrame.to_dict:
print (grouped_src_dst.rename(columns={'count':'value'}).to_dict(orient='records'))
[{'value': 0, 'source': 'a', 'target': 'b'},
{'value': 8, 'source': 's', 'target': 'n'},
{'value': 4, 'source': 'f', 'target': 'm'}]
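If the column check does reveal stray whitespace in a name, one way to handle it is to normalize the column labels before looking anything up. This is only a minimal sketch: the sample frame and its ' count' column are made up for illustration and are not taken from the question's CSV.

```python
import pandas as pd

# Hypothetical frame whose count column has a leading space in its name.
df = pd.DataFrame({'source': ['a', 's'],
                   'target': ['b', 'n'],
                   ' count': [0, 8]})

# tolist() shows the labels with quotes, so the stray space is visible.
print(df.columns.tolist())

# Strip whitespace from string labels only; non-string labels
# (such as the integer 0) are passed through unchanged.
df = df.rename(columns=lambda c: c.strip() if isinstance(c, str) else c)

records = df.rename(columns={'count': 'value'}).to_dict(orient='records')
print(records)
```

Passing a function to rename rather than calling df.columns.str.strip() keeps this working when some labels are not strings.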
EDIT1:
pcap_data = pd.read_csv(r'C:\packet_metadata.csv', index_col='No.')
grouped_src_dst = pcap_data.groupby(["Source","Destination"]).size().reset_index()
d = {0:'value', "Source":"source","Destination":"target"}
L = grouped_src_dst.rename(columns=d).to_dict(orient='records')
Sample:
pcap_data = pd.DataFrame({'Source':list('aabbccdd'),
                          'Destination':list('eertffff')})
print (pcap_data)
  Destination Source
0           e      a
1           e      a
2           r      b
3           t      b
4           f      c
5           f      c
6           f      d
7           f      d
grouped_src_dst = pcap_data.groupby(["Source","Destination"]).size().reset_index()
print (grouped_src_dst)
  Source Destination  0
0      a           e  2
1      b           r  1
2      b           t  1
3      c           f  2
4      d           f  2
d = {0:'value', "Source":"source","Destination":"target"}
L = grouped_src_dst.rename(columns=d).to_dict(orient='records')
print (L)
[{'value': 2, 'source': 'a', 'target': 'e'},
{'value': 1, 'source': 'b', 'target': 'r'},
{'value': 1, 'source': 'b', 'target': 't'},
{'value': 2, 'source': 'c', 'target': 'f'},
{'value': 2, 'source': 'd', 'target': 'f'}]
# note: Series.append was removed in pandas 2.0; use pd.concat there
unique_ips = pd.Index(grouped_src_dst['Source']
                      .append(grouped_src_dst['Destination'])
                      .reset_index(drop=True).unique())
print (unique_ips)
Index(['a', 'b', 'c', 'd', 'e', 'r', 't', 'f'], dtype='object')
# alternatively, get the sorted unique values with NumPy
import numpy as np
unique_ips = np.unique(grouped_src_dst[['Source','Destination']].values.ravel()).tolist()
print (unique_ips)
['a', 'b', 'c', 'd', 'e', 'f', 'r', 't']
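The column ends up named 0 because size() returns an unnamed Series, and reset_index() then falls back to the integer label 0. A possible variation, sketched here on the same sample data, is to name the count column directly with reset_index(name=...), so there is no 0 key to rename. The variable name links is just illustrative.

```python
import pandas as pd

pcap_data = pd.DataFrame({'Source': list('aabbccdd'),
                          'Destination': list('eertffff')})

links = (pcap_data.groupby(['Source', 'Destination'])
                  .size()                      # Series of group sizes
                  .reset_index(name='value')   # name the count column up front
                  .rename(columns={'Source': 'source',
                                   'Destination': 'target'})
                  .to_dict(orient='records'))
print(links)
```

This should produce the same list of records as the rename-based version above.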