After calling lsof, I'm looking for a generic way to split each row so that every cell of the table ends up in its own string. The problem is that the width of each column can change every time the command is run.
COMMAND     PID USER   FD      TYPE             DEVICE SIZE/OFF   NODE NAME
init          1 root  cwd       DIR                8,1     4096      2 /
kthreadd      2 root  txt   unknown                                    /proc/2/exe
kjournald    42 root  txt   unknown                                    /proc/42/exe
udevd        77 root  cwd       DIR                8,1     4096      2 /
udevd        77 root  txt       REG                8,1   133176 139359 /sbin/udevd
flush-8:1 26221 root  cwd       DIR                8,1     4096      2 /
flush-8:1 26221 root  rtd       DIR                8,1     4096      2 /
flush-8:1 26221 root  txt   unknown                                    /proc/26221/exe
sudo      26228 root    5u     unix 0xfff999002579d3c0      0t0 515611 socket
python    30077 root    2u      CHR                1,3      0t0    700 /dev/null
Instead of parsing the output of the lsof command, install the psutil module, which also has the advantage of being cross-platform.
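import psutil

def get_all_files():
    """Collect the open files of every process we are allowed to inspect."""
    files = set()
    for proc in psutil.process_iter():
        try:
            # open_files() was called get_open_files() before psutil 2.0
            files.update(proc.open_files())
        except psutil.Error:  # probably don't have permission to get the files
            pass
    return files

print(get_all_files())
# Sample output from the original run (the exact repr depends on the psutil version):
# set([openfile(path='/opt/google/chrome/locales/en-GB.pak', fd=28), openfile(path='/home/jon/.config/google-chrome/Default/Session Storage/000789.log', fd=95), openfile(path='/proc/2414/mounts', fd=8) ... ])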
You can then adapt this to include the owning process and other information, e.g.:
import psutil

for proc in psutil.process_iter():
    try:
        # name() and username() are methods since psutil 2.0 and can also be denied
        name, user = proc.name(), proc.username()
        fids = proc.open_files()
    except psutil.Error:
        continue
    for fid in fids:
        # print(dir(proc))  # uncomment to see what else is available
        print(name, proc.pid, user, fid.path)

# gnome-settings-daemon 2147 jon /proc/2147/mounts
# pulseaudio 2155 jon /home/jon/.config/pulse/2f6a9045c2bc8db6bf32b2d7517969bf-device-volumes.tdb
# pulseaudio 2155 jon /home/jon/.config/pulse/2f6a9045c2bc8db6bf32b2d7517969bf-stream-volumes.tdb
You know that column labels are right aligned except for the first and last. Hence you can extract the column borders from the ending of the column labels (equivalent to: from the beginning of whitespace between adjacent column labels).
import re

# assuming input_file to be a file-like object
header = next(input_file)
# Each whitespace run in the header starts where a column label ends;
# the last match is the newline after NAME.
borders = [match.start() for match in re.finditer(r'\s+', header)]
second_to_third_border = borders[1]
borders = borders[1:-1]  # delete the first and last because not right-aligned

for line in input_file:
    # COMMAND may contain spaces, so its end is searched for on every line
    first_to_second_border = line[:second_to_third_border].rfind(' ')
    actual_borders = [0, first_to_second_border] + borders + [len(line)]
    dset = []
    for (s, e) in zip(actual_borders[:-1], actual_borders[1:]):
        dset.append(line[s:e].strip())
    print(dset)
Concerning the first column:
You can search for the border between the first and second columns on each line: search backwards for whitespace, starting from the border between columns two and three.
The search has to run backwards because, as mentioned in the comments, the command might contain spaces, whereas the PID certainly does not.
Concerning the last column:
The column stretches from the border between the second-to-last and last columns to the end of the given line.
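The per-line search for the first border can be tried in isolation. The following is a minimal Python 3 sketch, assuming a made-up three-column sample in which one command contains a space. `split_first_columns`, the header and the rows are hypothetical, not real lsof output, and everything after PID is returned as one remainder to keep it short:

```python
import re

def split_first_columns(header, row):
    """Split a row into (COMMAND, PID, rest) using the header's label ends."""
    # Starts of the whitespace runs in the header = ends of the labels.
    borders = [m.start() for m in re.finditer(r"\s+", header)]
    pid_end = borders[1]  # fixed for all rows: PID is right-aligned under its label
    # COMMAND may contain spaces, so look backwards from the end of PID.
    command_end = row[:pid_end].rfind(" ")
    return (row[:command_end].strip(),
            row[command_end:pid_end].strip(),
            row[pid_end:].strip())

# Hypothetical sample, not real lsof output.
header = "COMMAND      PID USER"
rows = [
    "init           1 root",
    "Web Content 4242 jon",
]
for row in rows:
    print(split_first_columns(header, row))
```

A forward search for the first space would cut "Web Content" after "Web"; searching backwards from the fixed end of the PID column avoids that.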
Example:
from io import StringIO

input_file = StringIO('''\
COMMAND     PID USER   FD      TYPE             DEVICE SIZE/OFF   NODE NAME
init          1 root  cwd       DIR                8,1     4096      2 /
kthreadd      2 root  txt   unknown                                    /proc/2/exe
kjournald    42 root  txt   unknown                                    /proc/42/exe
''')
prints
['init', '1', 'root', 'cwd', 'DIR', '8,1', '4096', '2', '/']
['kthreadd', '2', 'root', 'txt', 'unknown', '', '', '', '/proc/2/exe']
['kjournald', '42', 'root', 'txt', 'unknown', '', '', '', '/proc/42/exe']
Addressing the 'spaces in NAME problem'
To address the issue of possible spaces in the NAME column, mentioned in the comments, I can propose the following solution. It is based on my desire to keep things simple and on the assumption that only the last column can contain spaces.
The algorithm is simple:
1. Find the position where the last column starts. I use the starting position of the NAME label in the header.
2. Cut the line at that position. What you just cut off is the value of the NAME column.
3. split() the rest of the line.
Here is the code:
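import fileinput

header_limits = dict()  # column name -> offset of its label in the header
records = list()
header_line = None

for line in fileinput.input():  # files named on the command line, or stdin
    if not header_line:
        # First line: remember the column names and where each label starts.
        header_line = line
        col_names = header_line.split()
        for col_name in col_names:
            header_limits[col_name] = header_line.find(col_name)
        continue
    record = dict()
    # Everything from the NAME label onwards is the NAME value, spaces included.
    record['NAME'] = line[header_limits['NAME']:].strip()
    # The rest is assumed to contain no embedded spaces, so split() is enough.
    line = line[:header_limits['NAME'] - 1]
    record.update(zip(col_names, line.split()))
    records.append(record)

for record in records:
    print("%s\n" % repr(record))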
The result is a list of dictionaries. Every dictionary corresponds to one line of the lsof output.
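To see the effect on a NAME that contains a space, the same three steps can be wrapped in a function and run on a small hard-coded sample. This is only a Python 3 sketch: `parse_lsof` and `SAMPLE` are hypothetical names, and the sample rows are invented rather than captured from lsof.

```python
def parse_lsof(lines):
    """Return one dict per data row, cutting NAME at the header's NAME offset."""
    lines = iter(lines)
    header = next(lines)
    col_names = header.split()
    name_start = header.index("NAME")  # step 1: where the last column starts
    records = []
    for line in lines:
        # step 3: the part before NAME is split on whitespace
        record = dict(zip(col_names, line[:name_start].split()))
        # step 2: the part from NAME onwards is kept whole
        record["NAME"] = line[name_start:].strip()
        records.append(record)
    return records

# Invented sample rows; the second NAME contains a space.
SAMPLE = """\
COMMAND     PID USER   FD      TYPE DEVICE SIZE/OFF NODE NAME
init          1 root  cwd       DIR    8,1     4096    2 /
chrome     2414 jon    95u      REG    8,1     1024  789 /home/jon/Session Storage/000789.log
""".splitlines()

for record in parse_lsof(SAMPLE):
    print(record["COMMAND"], record["PID"], repr(record["NAME"]))
```

Because split() collapses runs of whitespace, a row with an empty cell in the middle would shift the later values one column to the left. Trailing empty cells are simply absent from that row's dictionary.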
This is an interesting task that shows the power of Python for everyday work.
Anyway, if possible, I would prefer to use a Python library such as the proposed psutil.