I wrote a simple script that uses threads to retrieve data from a service.
    __author__ = 'Igor'
    import requests
    import time
    from multiprocessing.dummy import Pool as ThreadPool

    ip_list = []
    good_ip_list = []
    bad_ip_list = []
    progress = 0

    with open('/tmp/ip.txt') as f:
        ip_list = f.read().split()

    def process_request(ip):
        global progress
        progress += 1
        if progress % 10000 == 0:
            print 'Processed ip:', progress, '...'
        r = requests.get('http://*****/?ip='+ip, timeout=None)
        if r.status_code == 200:
            good_ip_list.append(ip)
        elif r.status_code == 400:
            bad_ip_list.append(ip)
        else:
            print 'Unknown http code received, aborting'
            exit(1)

    pool = ThreadPool(16)
    try:
        pool.map(process_request, ip_list)
    except:
        for name, ip_list in (('/tmp/out_good.txt', good_ip_list), ('/tmp/out_bad.txt', bad_ip_list)):
            with open(name, 'w') as f:
                for ip in ip_list:
                    print>>f, ip
But after some requests have been processed (40k-50k), I receive:
Exception in thread Thread-7 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
Process finished with exit code 0
I tried changing the service settings:
<timeout>999</timeout>
<connectionlimit>600</connectionlimit>
<httpthreads>32</httpthreads>
<workerthreads>128</workerthreads>
but I still get the same error. Can anybody help me figure out what's wrong?
Thanks to everybody who helped me solve this problem. I rewrote the whole code and now it works perfectly:
    __author__ = 'kulakov'
    import requests
    import time
    from multiprocessing.dummy import Pool as ThreadPool

    ip_list = []
    good_ip_list = []
    bad_ip_list = []

    with open('/tmp/ip.txt') as f:
        ip_list = f.read().split()

    s = requests.Session()

    def process_request(ip):
        r = s.get('http://*****/?ip='+ip, timeout=None)
        if r.status_code == 200:
            # good_ip_list.append(ip)
            return (ip, True)
        elif r.status_code == 400:
            # bad_ip_list.append(ip)
            return (ip, False)
        else:
            print 'Unknown http code received, aborting'
            exit(1)

    pool = ThreadPool(16)
    for ip, isOk in pool.imap(process_request, ip_list):
        if isOk:
            good_ip_list.append(ip)
        else:
            bad_ip_list.append(ip)
    pool.close()
    pool.join()

    for name, ip_list in (('/tmp/out_good.txt', good_ip_list), ('/tmp/out_bad.txt', bad_ip_list)):
        with open(name, 'w') as f:
            for ip in ip_list:
                print>>f, ip
Some new useful information:
1) Writing to shared data from different threads inside process_request was a really bad idea; the function now returns the ip together with a true/false status.
2) requests fully supports keep-alive, but to actually use it you must create a Session object and call the get method on that instance only:
    s = requests.Session()
    r = s.get('http://*****/?ip='+ip, timeout=None)
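The rewritten script drops the progress counter from the first version. A minimal sketch of one way to get progress reports back without any shared state, assuming Python 3: count results in the loop that consumes pool.imap, so the counter only ever lives in the calling thread. The check_ip function is a hypothetical stand-in for the real HTTP call, and classify, workers and report_every are illustrative names, not part of the original script.

```python
from multiprocessing.dummy import Pool as ThreadPool


def check_ip(ip):
    # Hypothetical stand-in for the HTTP request: treats addresses whose
    # last octet is even as "good". Replace with the real s.get(...) call.
    return ip, int(ip.rsplit('.', 1)[1]) % 2 == 0


def classify(ips, workers=4, report_every=10000):
    good, bad = [], []
    pool = ThreadPool(workers)
    try:
        # imap yields results in input order, so the counter and both
        # lists are touched only by the thread running this loop.
        for done, (ip, is_ok) in enumerate(pool.imap(check_ip, ips), 1):
            (good if is_ok else bad).append(ip)
            if done % report_every == 0:
                print('Processed ip:', done, '...')
    finally:
        pool.close()
        pool.join()
    return good, bad


good, bad = classify(['10.0.0.%d' % i for i in range(1, 7)], report_every=2)
```

Because the workers only return values, nothing in check_ip needs a lock or a global.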
Appending to the shared lists (good_ip_list, bad_ip_list) from inside process_request is not safe to mix with Python multiprocessing. The correct approach is to return a tuple (or something similar) from each call to process_request and then concatenate them all at the end. It's also not safe to modify progress concurrently from multiple processes. I'm not positive what your error is, but I bet it's some synchronization problem that is killing Python as a whole.
Remove the shared state and try again.
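A rough sketch of what that could look like, assuming Python 3: each call returns a tuple, and the results are split into two lists once pool.map has finished. FAKE_STATUS and fetch_status are hypothetical stand-ins for the real service, and run is an illustrative name. Raising an exception for an unknown status (instead of calling exit(1) inside a worker) is a substitution made here, chosen because pool.map re-raises a worker's exception in the calling thread.

```python
from multiprocessing.dummy import Pool as ThreadPool

# Hypothetical canned responses standing in for the real service.
FAKE_STATUS = {'10.0.0.1': 200, '10.0.0.2': 400, '10.0.0.3': 200}


def fetch_status(ip):
    # Assumption: replace this lookup with the real HTTP request.
    return FAKE_STATUS.get(ip, 500)


def process_request(ip):
    status = fetch_status(ip)
    if status == 200:
        return ip, True
    if status == 400:
        return ip, False
    # An ordinary exception raised here surfaces in the caller of pool.map.
    raise ValueError('unexpected HTTP status %d for %s' % (status, ip))


def run(ips, workers=4):
    pool = ThreadPool(workers)
    try:
        results = pool.map(process_request, ips)
    finally:
        pool.close()
        pool.join()
    # All bookkeeping happens here, in a single thread.
    good = [ip for ip, ok in results if ok]
    bad = [ip for ip, ok in results if not ok]
    return good, bad
```

Calling run(['10.0.0.1', '10.0.0.2', '10.0.0.3']) returns the good and bad lists as a pair, and an address with any other status makes run raise ValueError in the main thread.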