How do you take data from Python sort and perform some math on the tuple without messing up the sort order?

https://stackoverflow.com/questions/10367726

04-06-2021

Question

I am writing a script to list the 20 largest files in a target directory. Once I have the files, I perform some math on the size to apply the correct human readable sizing information, i.e., Kb, Mb, Gb.

However, this puts the results out of sort order. How can I do the conversion and keep the sort order intact?

#! /usr/bin/env python

import operator, os, sys

args = sys.argv
if len(args) != 2:
    print "You must enter one directory as an argument."
    sys.exit(1)
else:
    target = args[1]

data = {}
for root, dirs, files in os.walk(target):
    for name in files:
        filename = os.path.join(root, name)
        if os.path.exists(filename):
            size = float(os.path.getsize(filename))
            data[filename] = size

sorted_data = sorted(data.iteritems(), key=operator.itemgetter(1), reverse=True)
total = str(len(sorted_data))

while len(sorted_data) > 20:
    sorted_data.pop()

final_data = {}
for name in sorted_data:
    size = str(name[1])
    if size >= 1024:
        size = round(float(size) / 1024, 2)
        if size >= 1024:
            size = round(size / 1024, 2)
            if size >= 1024:
                size = round(size / 1024, 2)
                size = str(size) + "Gb"
            else:
                size = str(size) + "Mb"
    else:
        size = str(size) + "Kb"
    final_data[name] = size

print "The 20 largest files are:\n"
for name in final_data:
    print str(final_data[name]) + " " + str(name)
print "\nThere are a total of " + total + " files located in " + target

Solution

Your problem is that you build a brand-new dictionary to hold the human-readable sizes. That dictionary knows nothing about the numeric sizes you sorted on, and dictionaries in Python 2 (which this script targets) don't keep their items in any fixed order, so iterating over it loses your sort order. It's simple to recover, though: iterate over sorted_data instead of final_data, and use final_data only to look up the human-readable file sizes. So something like this:

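# Assumes final_data is keyed by the filename string; the question's loop keys it
# by the whole (filename, size) tuple, so store final_data[name[0]] = size there.
for filename, size in sorted_data:
    print filename, final_data[filename]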
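
To make the idea concrete, here is a minimal sketch with made-up data. The file names and sizes are hypothetical, and `labels` stands in for `final_data`, assumed here to be keyed by the file name string. It uses `print()` calls so that it also runs on Python 3.

```python
# Hypothetical (filename, size-in-bytes) pairs, already sorted largest first.
sorted_data = [
    ("big.iso", 3221225472.0),
    ("video.mp4", 5242880.0),
    ("notes.txt", 2048.0),
]

# Stand-in for final_data: human-readable labels keyed by file name.
labels = {"notes.txt": "2.0Kb", "big.iso": "3.0Gb", "video.mp4": "5.0Mb"}

# The list fixes the order; the dict is only used for lookups.
lines = ["%s %s" % (labels[filename], filename) for filename, size in sorted_data]
for line in lines:
    print(line)
```

The order of the output follows `sorted_data`, regardless of how the dictionary happens to arrange its keys.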

But an even better solution would be to put your human-readable string generating code into a function!

def human_readable_size(size):
    # logic to convert size
    return hr_size

Now you don't even have to create a dictionary:

for filename, size in sorted_data:
    print filename, human_readable_size(size)
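
One possible way to fill in the `# logic to convert size` placeholder is sketched below. The unit labels and two-decimal rounding follow the question's script; the loop structure and the extra "b" unit for sizes under 1024 bytes are assumptions of mine, not something the answer specifies. The sample data is hypothetical, and `print()` is used so the sketch also runs on Python 3.

```python
def human_readable_size(size):
    """Return size (in bytes) as a string such as '1.50Mb'."""
    size = float(size)
    # Step up one unit at a time while the value is still 1024 or more.
    for unit in ("b", "Kb", "Mb"):
        if size < 1024:
            return "%.2f%s" % (size, unit)
        size /= 1024
    # Anything left over is reported in gigabytes, the largest unit used here.
    return "%.2f%s" % (size, "Gb")


# Hypothetical sample data, sorted by the numeric size before any formatting.
sorted_data = [("big.iso", 3221225472), ("video.mp4", 1572864), ("notes.txt", 512)]
for filename, size in sorted_data:
    print("%s %s" % (filename, human_readable_size(size)))
```

Because the conversion happens only at print time, the sort always sees the raw numbers and never the formatted strings.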

OTHER TIPS

I know you already got a solution; I was just bored and wanted to see if I could clean up your logic a bit more. Here's a simplified version of your code.

Really, I'd say just don't bother with the dicts; they don't offer any benefit here.

import operator, os, sys

if len(sys.argv) != 2:
  sys.exit(1)

target = sys.argv[1]

vals = []
for root, dirs, files in os.walk(target):
  names = (os.path.join(root, name) for name in files)
  vals.extend([ (name, float(os.path.getsize(name)))
                for name in names if os.path.exists(name)])

vals = sorted(vals, key=operator.itemgetter(1), reverse=True)

converted = []
for name, size in vals[0:20]:
  if size >= 1024*1024*1024:
    unit = "Gb"
    size /= 1024*1024*1024
  elif size >= 1024*1024:
    unit = "Mb"
    size /= 1024*1024
  elif size >= 1024:
    unit = "Kb"
    size /= 1024
  else:
    unit = "b"
  converted.append((name, "%.2f"%size + unit))

for name, size in converted:
  print size + " " + name
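
The same list-of-tuples approach can also be wrapped in a function, which makes it easier to try out on a small directory. The sketch below is written to run on Python 3 as well; `largest_files` and its `count` parameter are hypothetical names chosen for illustration, and the size formatting is deliberately left out.

```python
import operator
import os
import shutil
import tempfile


def largest_files(target, count=20):
    """Return up to `count` (path, size-in-bytes) tuples, largest first."""
    vals = []
    for root, dirs, files in os.walk(target):
        for name in files:
            path = os.path.join(root, name)
            # Skip paths that no longer exist (for example, broken symlinks).
            if os.path.exists(path):
                vals.append((path, os.path.getsize(path)))
    # Sort on the numeric size (index 1), then keep only the first `count`.
    vals.sort(key=operator.itemgetter(1), reverse=True)
    return vals[:count]


# Usage on a throwaway directory containing three small files.
demo_dir = tempfile.mkdtemp()
for name, nbytes in (("a.bin", 10), ("b.bin", 3000), ("c.bin", 200)):
    with open(os.path.join(demo_dir, name), "wb") as handle:
        handle.write(b"x" * nbytes)

top_two = largest_files(demo_dir, count=2)
for path, size in top_two:
    print("%d %s" % (size, os.path.basename(path)))

shutil.rmtree(demo_dir)
```

Keeping the sizes numeric until the final loop means the slice always holds the largest files in the right order.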

Licensed under: CC-BY-SA with attribution
