I wrote a simple script that uses threads to retrieve data from a service.
__author__ = 'Igor'

import requests
import time
from multiprocessing.dummy import Pool as ThreadPool

ip_list = []
good_ip_list = []
bad_ip_list = []
progress = 0

with open('/tmp/ip.txt') as f:
    ip_list = f.read().split()

def process_request(ip):
    global progress
    progress += 1
    if progress % 10000 == 0:
        print 'Processed ip:', progress, '...'
    r = requests.get('http://*****/?ip='+ip, timeout=None)
    if r.status_code == 200:
        good_ip_list.append(ip)
    elif r.status_code == 400:
        bad_ip_list.append(ip)
    else:
        print 'Unknown http code received, aborting'
        exit(1)

pool = ThreadPool(16)
try:
    pool.map(process_request, ip_list)
except:
    # dump whatever has been collected so far
    for name, ip_list in (('/tmp/out_good.txt', good_ip_list), ('/tmp/out_bad.txt', bad_ip_list)):
        with open(name, 'w') as f:
            for ip in ip_list:
                print >> f, ip
But after some requests have been processed (40k-50k) I receive:
Exception in thread Thread-7 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
Process finished with exit code 0
I tried changing the service settings:
<timeout>999</timeout>
<connectionlimit>600</connectionlimit>
<httpthreads>32</httpthreads>
<workerthreads>128</workerthreads>
but I still get the same error. Can anybody help me figure out what's wrong?
Thanks to everybody who helped me solve this problem. I rewrote the whole code and now it works perfectly:
__author__ = 'kulakov'

import requests
import time
from multiprocessing.dummy import Pool as ThreadPool

ip_list = []
good_ip_list = []
bad_ip_list = []

with open('/tmp/ip.txt') as f:
    ip_list = f.read().split()

s = requests.Session()

def process_request(ip):
    r = s.get('http://*****/?ip='+ip, timeout=None)
    if r.status_code == 200:
        # good_ip_list.append(ip)
        return (ip, True)
    elif r.status_code == 400:
        # bad_ip_list.append(ip)
        return (ip, False)
    else:
        print 'Unknown http code received, aborting'
        exit(1)

pool = ThreadPool(16)
for ip, isOk in pool.imap(process_request, ip_list):
    if isOk:
        good_ip_list.append(ip)
    else:
        bad_ip_list.append(ip)
pool.close()
pool.join()

for name, ip_list in (('/tmp/out_good.txt', good_ip_list), ('/tmp/out_bad.txt', bad_ip_list)):
    with open(name, 'w') as f:
        for ip in ip_list:
            print >> f, ip
Some new useful information:
1) It was a really bad idea to write data from different threads inside the process_request function; now it returns a status (True/False) together with the ip.
2) Keep-alive is fully supported by requests by default, but to use it you must create an instance of a Session object and call its get method:

s = requests.Session()
r = s.get('http://*****/?ip='+ip, timeout=None)
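A side note that goes beyond the original post: when one Session is shared by the 16-thread pool, requests' underlying urllib3 connection pool keeps at most 10 connections per host by default, so some connections may still be dropped instead of reused. A rough sketch of sizing the pool to match the thread count (the numbers here just mirror the ThreadPool(16) above):

import requests
from requests.adapters import HTTPAdapter

s = requests.Session()
# Size the per-host connection pool to the number of worker threads so each
# thread can hold on to its own keep-alive connection.
adapter = HTTPAdapter(pool_connections=16, pool_maxsize=16)
s.mount('http://', adapter)
s.mount('https://', adapter)

# s.get(...) calls made from the ThreadPool workers can now reuse up to 16
# persistent connections to the service instead of the default 10.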
Appending to good_ip_list and bad_ip_list from the worker function is not safe to mix with Python multiprocessing. The correct approach is to return a tuple (or something) from each call to process_request and then concatenate them all at the end. It's also not safe to modify progress concurrently from multiple processes. I'm not positive what your error is, but I bet it's some synchronization problem that is killing Python as a whole.
Remove the shared state and try again.
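If a shared progress counter like the one in the first version is still wanted, the usual alternative to removing it is to guard it with a lock. A minimal sketch (count_progress is a hypothetical helper, not from the original post), keeping the tuple-return style for the actual results:

import threading
import requests

s = requests.Session()

progress = 0
progress_lock = threading.Lock()

def count_progress():
    # Serialize updates to the shared counter so increments from the worker
    # threads cannot interleave and lose counts.
    global progress
    with progress_lock:
        progress += 1
        if progress % 10000 == 0:
            print 'Processed ip:', progress, '...'

def process_request(ip):
    count_progress()  # the only shared state, now protected by the lock
    r = s.get('http://*****/?ip='+ip, timeout=None)
    return (ip, r.status_code == 200)  # results still flow back through imap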