Question

I am trying to read in a .csv file containing some data. I only need to read specific chunks of rows from the file, such as lines 15-20, lines 45-50, and so on. However, the file contains text and copyright information, such as ©1990-2016 AAR,All rights reserved, in several places. Such lines seem to produce the error ValueError: No columns to parse from file, because when I copy only the lines without such information and read them with pd.read_csv(), it works fine. My goal is to automate downloading these files from the web and reading them into pandas to grab chunks of rows and process them, so I can't manually specify the windows of text lacking such characters.

Here is what I tried: pd.read_csv("filename.csv", encoding="utf-8", skiprows=14) and pd.read_csv("filename.csv", encoding="utf-16", skiprows=15), after looking at similar answers on Stack Exchange, but this didn't work. Can anyone give me some guidance on this?

Solution

pandas has a df.drop method that can remove specific rows (in this case, the rows at positions 15 and 16). It returns a new DataFrame rather than modifying df, so assign the result:

df = df.drop(df.index[[15, 16]])

If the rows you don't need are always at the same positions (e.g. you never need row 15), then this is a quick and dirty solution.
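As a minimal sketch of the positional drop above, using a made-up 20-row DataFrame (the column name value is purely illustrative), this shows that df.index[[15, 16]] selects the labels at positions 15 and 16 and that drop leaves the original frame untouched:

```python
import pandas as pd

# Hypothetical data: 20 rows with a default 0..19 integer index.
df = pd.DataFrame({"value": range(20)})

# df.index[[15, 16]] picks the index labels at positions 15 and 16;
# drop() returns a new DataFrame without those rows.
trimmed = df.drop(df.index[[15, 16]])

print(len(df), len(trimmed))
```

Positions here are zero-based and count rows of the DataFrame, not lines of the original file, so they shift if a header or skipped lines precede the data.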

If you instead want to drop every row that contains a particular value, wherever it appears, filter with a boolean mask (column_name stands for whichever column holds the text):

df = df[df.column_name != "©1990-2016 AAR"]
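Since the question is about reading only certain line ranges, here is a hedged sketch that combines two steps: passing a callable to the skiprows parameter of pd.read_csv so that only the wanted lines are parsed, then masking out any copyright rows that fall inside those ranges. The file contents, the column names id and score, and the line ranges are all invented for illustration; a real file would be passed by path instead of io.StringIO.

```python
import io
import pandas as pd

# Hypothetical file contents standing in for the downloaded .csv.
raw = io.StringIO(
    "Quarterly report\n"
    "©1990-2016 AAR,All rights reserved\n"
    "1,10\n"
    "2,20\n"
    "©1990-2016 AAR,All rights reserved\n"
    "3,30\n"
    "notes follow\n"
    "more notes\n"
    "4,40\n"
    "5,50\n"
)

# 1-based, inclusive line ranges to keep (assumed for this example).
wanted = [(3, 6), (9, 10)]

def skip(line_index):
    # pandas passes a 0-based line index; return True to skip the line.
    line_number = line_index + 1
    return not any(start <= line_number <= end for start, end in wanted)

df = pd.read_csv(raw, header=None, names=["id", "score"], skiprows=skip)

# Line 5 lies inside the first range, so one copyright row is still present.
mask = df["id"].astype(str).str.contains("AAR")
clean = df[~mask].astype(int)

print(clean)
```

Matching on a substring with str.contains is a design choice made here so the filter does not depend on the exact copyright years; an exact comparison works just as well when the text never varies.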

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange