Question

I need to open a csv file, select 1000 random rows and save those rows to a new file. I'm stuck and can't see how to do it. Can anyone help?


Solution

There are two parts to this problem: first, getting every row of your CSV; second, taking a random sample. I would suggest building your list of rows with a list comprehension, something along the lines of:

# Text mode, so each line is a str; strip the trailing newline from each one
with open("your_file.csv", "r") as source:
    lines = [line.rstrip("\n") for line in source]

Once you've got that, you want to take a random sample of those lines. Luckily, Python has a function that does just that:

import random
# Pick 1000 distinct lines; raises ValueError if the file has fewer than 1000
random_choice = random.sample(lines, 1000)

Once you've got those lines, you want to write them to a new file. You probably already know how, but I will include an example for completeness's sake:

# Join the sampled lines with newlines and end the file with a final newline
with open("new_file.csv", "w") as sink:
    sink.write("\n".join(random_choice) + "\n")

This just outputs your sample as a newline-delimited string to the file of your choice. It's also worth noting that in this case it doesn't really matter that you're dealing with a CSV; it's just another file with some lines. That holds as long as no quoted field contains a line break. Note too that a header row, if there is one, is sampled like any other line.

If you're working with a very large file, or are concerned about taking up too much memory, you can replace the list comprehension above with a generator and sample from that instead. That process isn't nearly as straightforward, because random.sample needs a sequence and will not accept a generator. For advice on doing this efficiently, see this question: Python random sample with a generator iterable iterator
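One common way to sample from a generator is reservoir sampling. This is not taken from the linked question; it is a minimal sketch of the standard one-pass technique, and the function name reservoir_sample and its parameters are made up for illustration.

```python
import random


def reservoir_sample(iterable, k, rng=random):
    """Return up to k items chosen uniformly at random in a single pass.

    Only k items are held in memory at any time. If the iterable yields
    fewer than k items, all of them are returned.
    """
    reservoir = []
    for index, item in enumerate(iterable):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1),
            # overwriting a uniformly chosen slot in the reservoir.
            j = rng.randint(0, index)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Since an open file is itself an iterator over lines, you could call reservoir_sample(source, 1000) inside the with block instead of building the full list first. The result is not in file order and not shuffled either, so shuffle it afterwards if the order matters.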

OTHER TIPS

The basic procedure is this:

1. Open the input file

This can be done with the built-in open function.

2. Open the output file

You'll probably use the same method that you chose in step #1, but you'll need to open the file in write mode.

3. Read the input file to a variable

It's often preferable to read the file one line at a time, and operate on that one line before reading the next, but if memory is not a concern, you can also read the entire thing into a variable all at once.

4. Choose selected lines

There will be any number of ways to do this, depending on how you did step #3, and your requirements. You could use filter, or a list comprehension, or a for loop with an if statement, etc. The best way depends on the particular constraints of your goal.

5. Write the selected lines

Take the lines you chose in step #4 and write them to the output file.

6. Close the files

It's generally good practice to close the files you've opened to prevent resource leaks. If you open them in a with statement, they are closed for you automatically when the block ends.
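The six steps above can be sketched as a single function. This is just one possible arrangement, assuming the file fits in memory and that the first row is a header you want to keep; the name sample_csv and its parameters are hypothetical. It uses the standard csv module so that quoted fields containing line breaks stay intact.

```python
import csv
import random


def sample_csv(src_path, dst_path, k, has_header=True):
    """Copy up to k randomly chosen rows from src_path to dst_path.

    Returns the number of data rows written (the header is not counted).
    """
    # Steps 1 and 2: open input and output. The with block also takes
    # care of step 6 by closing both files when it ends.
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        if has_header:
            # Copy the header row through untouched, if the file has one.
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)

        # Step 3: read the remaining rows into memory.
        rows = list(reader)

        # Step 4: choose the rows; cap k so short files do not raise.
        chosen = random.sample(rows, min(k, len(rows)))

        # Step 5: write the chosen rows.
        writer.writerows(chosen)
    return len(chosen)
```

For the question as asked, the call would look like sample_csv("your_file.csv", "new_file.csv", 1000).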

Licensed under: CC-BY-SA with attribution