How do you take data from Python sort and perform some math on the tuple without messing up the sort order?
04-06-2021
Question
I am writing a script to list the 20 largest files in a target directory. Once I have the files, I perform some math on the size to apply the correct human readable sizing information, i.e., Kb, Mb, Gb.
However, this throws the output out of sorted order. How can I do this and keep the sort order intact?
#! /usr/bin/env python
import operator, os, sys
args = sys.argv
if len(args) != 2:
    print "You must enter one directory as an argument."
    sys.exit(1)
else:
    target = args[1]
data = {}
for root, dirs, files in os.walk(target):
    for name in files:
        filename = os.path.join(root, name)
        if os.path.exists(filename):
            size = float(os.path.getsize(filename))
            data[filename] = size
sorted_data = sorted(data.iteritems(), key=operator.itemgetter(1), reverse=True)
total = str(len(sorted_data))
while len(sorted_data) > 20:
    sorted_data.pop()
final_data = {}
for name in sorted_data:
    size = str(name[1])
    if size >= 1024:
        size = round(float(size) / 1024, 2)
        if size >= 1024:
            size = round(size / 1024, 2)
            if size >= 1024:
                size = round(size / 1024, 2)
                size = str(size) + "Gb"
            else:
                size = str(size) + "Mb"
        else:
            size = str(size) + "Kb"
    final_data[name] = size
print "The 20 largest files are:\n"
for name in final_data:
    print str(final_data[name]) + " " + str(name)
print "\nThere are a total of " + total + " files located in " + target
Solution
Your problem is that you create a brand new dictionary to store the modified file size data. Because that dictionary holds only the formatted strings rather than the numeric sizes, and because dictionaries don't store their entries in any fixed order (at least in Python 2, which this script targets), you lose your sort order. But it's simple to recover: iterate over sorted_data instead of over final_data, using final_data only to look up the human-readable file sizes. So something like this:
for filename, size in sorted_data:
    # assumes final_data is keyed by filename, e.g. final_data[name[0]] = size
    print filename, final_data[filename]
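To see the lookup pattern in isolation, here is a minimal sketch. The file names and sizes are made up for illustration, and final_data is assumed to be keyed by filename. The sorted list decides the output order; the dictionary is only consulted for the formatted string.

```python
# Hypothetical (filename, size-in-bytes) pairs, already sorted largest first.
sorted_data = [
    ("big.iso", 3221225472.0),
    ("video.mp4", 5242880.0),
    ("notes.txt", 2048.0),
]

# Lookup table keyed by filename; its own iteration order is never relied on.
final_data = {"notes.txt": "2.0Kb", "big.iso": "3.0Gb", "video.mp4": "5.0Mb"}

# Walk the sorted list so the output follows the numeric sizes.
lines = [final_data[filename] + " " + filename for filename, size in sorted_data]
for line in lines:
    print(line)
```

The single-argument print(...) form is used so the sketch runs unchanged on Python 2 and Python 3.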
But an even better solution would be to put your human-readable string generating code into a function!
def human_readable_size(size):
    # logic to convert size
    return hr_size
Now you don't even have to create a dictionary:
for filename, size in sorted_data:
    print filename, human_readable_size(size)
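One possible way to fill in that function is sketched below. It is not the only way to do the conversion: it assumes the size is given in bytes, reuses the b/Kb/Mb/Gb labels from the question, and the sample sorted_data entries are hypothetical.

```python
def human_readable_size(size):
    """Return size (in bytes) as a string with a b/Kb/Mb/Gb suffix."""
    size = float(size)
    for unit in ("b", "Kb", "Mb"):
        if size < 1024:
            return "%.2f%s" % (size, unit)
        size /= 1024
    # Anything still >= 1024 after three divisions is reported in gigabytes.
    return "%.2f%s" % (size, "Gb")


# Hypothetical sample data, already sorted largest first.
sorted_data = [
    ("big.iso", 3221225472),
    ("video.mp4", 5242880),
    ("notes.txt", 2048),
    ("tiny.cfg", 512),
]

report = [human_readable_size(size) + " " + filename for filename, size in sorted_data]
for line in report:
    print(line)
```

Because the numeric size stays in sorted_data and the string is built only at print time, the sort order can't be disturbed by the formatting step.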
OTHER TIPS
I know you already have a solution; I was just bored and wanted to see if I could clean up your logic a bit more. Here's a simplified version of your code.
Really, I'd just skip the dicts; they don't offer any benefit here.
import operator, os, sys
if len(sys.argv) != 2:
    sys.exit(1)
target = sys.argv[1]
vals = []
for root, dirs, files in os.walk(target):
    names = (os.path.join(root, name) for name in files)
    vals.extend([(name, float(os.path.getsize(name)))
                 for name in names if os.path.exists(name)])
vals = sorted(vals, key=operator.itemgetter(1), reverse=True)
converted = []
for name, size in vals[0:20]:
    if size >= 1024*1024*1024:
        unit = "Gb"
        size /= 1024*1024*1024
    elif size >= 1024*1024:
        unit = "Mb"
        size /= 1024*1024
    elif size >= 1024:
        unit = "Kb"
        size /= 1024
    else:
        unit = "b"
    converted.append((name, "%.2f"%size + unit))
for name, size in converted:
    print size + " " + name
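If only the top 20 are needed, a variation on the same idea is to let the standard library's heapq.nlargest pick them instead of sorting the whole list. This is a swapped-in technique, not what the code above does, and the helper name largest_files and its count parameter are made up for this sketch.

```python
import heapq
import os


def largest_files(target, count=20):
    """Return up to `count` (path, size) pairs under target, largest first."""
    def sizes():
        for root, dirs, files in os.walk(target):
            for name in files:
                path = os.path.join(root, name)
                try:
                    yield path, os.path.getsize(path)
                except OSError:
                    # File vanished or is unreadable (e.g. a broken symlink); skip it.
                    continue
    # nlargest returns its results ordered from largest to smallest.
    return heapq.nlargest(count, sizes(), key=lambda pair: pair[1])
```

The returned list is already in descending size order, so it can be printed directly, formatting each size at print time.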