Question

I am new to Python, which is also my first programming language. I have a set of txt files (academic papers), and I need to extract the paper ID (e.g. ID: a1111111) and the abstract (e.g. ABSTRACT: .....) from each one. I have no idea how to extract this data from multiple files in multiple folders. Thanks A LOT!


Solution

So your question has two parts: reading files and accessing folders.

  • Reading files

The functions and objects Python uses for reading files are covered in chapter 7 of the Python tutorial: http://docs.python.org/2/tutorial/inputoutput.html

The basic gist is that you use the built-in open function to access a file. A bare file name is looked up in the current working directory:

f = open('stuff.txt', 'r')

Here stuff.txt is the name of a file in the directory you run Python from (usually the same directory your Python file is in). Calling print(f.read()) will display the text of the file, which f.read() returns as a string. Feel free to assign f.read() to a variable to capture the data.

>>> x = f.read()
>>> x
'This is the entire file.\n'
>>> print(x)
This is the entire file.


It is best to read the documentation for all these methods, because there are subtleties. For example, calling f.read() once returns the entire file contents, but calling f.read() again returns an empty string, because the end of the file has already been reached.
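To make the end-of-file behaviour concrete, here is a minimal sketch. It assumes Python 3 syntax, creates its own throwaway stuff.txt in a temporary folder (so the path is purely illustrative), and uses a with block and f.seek(0), neither of which appears in the answer above.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'stuff.txt')
with open(path, 'w') as f:
    f.write('This is the entire file.\n')

# The with block closes the file automatically when it ends.
with open(path, 'r') as f:
    first = f.read()    # the whole file as one string
    second = f.read()   # empty string: the position is already at the end
    f.seek(0)           # move the position back to the start of the file
    third = f.read()    # the whole file again

print(repr(first))
print(repr(second))
print(repr(third))
```

Using with is optional, but it saves you from having to remember f.close().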

  • Accessing Folders

Can you explain exactly how you'd like to access the folders? The easiest option is to put all your files in the same directory you run your Python file from. Otherwise, the basic way to move around in Python is os.chdir(path), which changes the current working directory, much like cd in a shell. You must import os before you use it.
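If the files have to stay spread across several folders, one alternative to calling os.chdir repeatedly is os.walk, which visits a folder and all of its subfolders. The sketch below is only a starting point under stated assumptions: it uses Python 3, the helper names extract_fields and collect_papers are made up for illustration, and it assumes each paper has a line starting with "ID:" and an abstract that starts with "ABSTRACT:" and runs until the next blank line or the end of the file. Your real files may need a different pattern.

```python
import os
import re

# Assumed layout (adjust to your real files): "ID: a1111111" on its own line,
# and "ABSTRACT: ..." running up to the next blank line or the end of the file.
ID_RE = re.compile(r'^ID:\s*(\S+)', re.MULTILINE)
ABSTRACT_RE = re.compile(r'^ABSTRACT:\s*(.*?)(?=\n\s*\n|\Z)',
                         re.MULTILINE | re.DOTALL)


def extract_fields(text):
    """Return (paper_id, abstract); either one is None if it was not found."""
    id_match = ID_RE.search(text)
    abstract_match = ABSTRACT_RE.search(text)
    paper_id = id_match.group(1) if id_match else None
    abstract = abstract_match.group(1).strip() if abstract_match else None
    return paper_id, abstract


def collect_papers(root):
    """Walk root and every subfolder, reading each .txt file found."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith('.txt'):
                with open(os.path.join(dirpath, name), 'r') as f:
                    results.append(extract_fields(f.read()))
    return results
```

You would call it with the top-level folder that holds your papers, for example collect_papers('papers'), where 'papers' is a placeholder name, and then loop over the returned (ID, abstract) pairs.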

Leave a comment if you'd like more information.

Licensed under: CC-BY-SA with attribution