Question

I have a text file whose lines I want to put into lists.

The text file looks like this:

New  Distribution  Votes  Rank  Title
     0000000125  1196672  9.2  The Shawshank Redemption (1994)
     0000000125  829707   9.2  The Godfather (1972)
     0000000124  547511   9.0  The Godfather: Part II (1974)
     0000000124  1160800  8.9   The Dark Knight (2008)

I have tried splitting each line with this code:

x = open("ratings.list.txt","r")
movread = x.readlines()
x.close()


s = raw_input('Search: ')
for ns in movread:
    if s in ns:
        print(ns.split()[0:100])

Output:

      Search: #1 Single
     ['1000000103', '56', '6.3', '"#1', 'Single"', '(2006)']

But it does not give me the output I want: it also splits on the spaces inside the title.

How can I split it into a list without breaking up the title?

Expected output:

 Search: #1 Single

  Distribution  Votes  Rank           Title
 ['1000000103', '56', '6.3', '"#1 Single" (2006)']

Solution

str.split() takes an optional maxsplit argument that limits how many splits are made; everything after the last split stays in one piece:

In Python 3:

>>> s = "     0000000125  1196672  9.2  The Shawshank Redemption (1994)"
>>> s.split()
['0000000125', '1196672', '9.2', 'The', 'Shawshank', 'Redemption', '(1994)']
>>> s.split(maxsplit=3)
['0000000125', '1196672', '9.2', 'The Shawshank Redemption (1994)']

In Python 2, str.split() does not accept keyword arguments, so pass None as the separator and maxsplit as a positional argument (this form also works in Python 3):

>>> s = "     0000000125  1196672  9.2  The Shawshank Redemption (1994)"
>>> s.split(None, 3)
['0000000125', '1196672', '9.2', 'The Shawshank Redemption (1994)']
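Applied to the whole file, the same idea might look like the sketch below. It assumes the first line is the column header and that every other line has exactly three columns before the title; parse_ratings and the inline SAMPLE text (copied from the question) are illustrative stand-ins for reading ratings.list.txt. Each line is stripped of its indentation and newline before splitting.

```python
# Hypothetical sample copied from the question; in practice read ratings.list.txt
SAMPLE = """New  Distribution  Votes  Rank  Title
     0000000125  1196672  9.2  The Shawshank Redemption (1994)
     0000000125  829707   9.2  The Godfather (1972)
     0000000124  547511   9.0  The Godfather: Part II (1974)
     0000000124  1160800  8.9   The Dark Knight (2008)
"""


def parse_ratings(lines):
    """Split each data line into [distribution, votes, rank, title]."""
    records = []
    for i, line in enumerate(lines):
        if i == 0:
            continue  # skip the column header line
        line = line.strip()  # drop indentation and the trailing newline
        if not line:
            continue  # ignore blank lines
        # None = split on runs of whitespace; 3 = at most three splits,
        # so everything after the rank stays together as the title.
        # Positional arguments work in both Python 2 and Python 3.
        records.append(line.split(None, 3))
    return records


for record in parse_ratings(SAMPLE.splitlines(True)):
    print(record)
```

To use it on the real file, pass the open file object instead, e.g. with open("ratings.list.txt") as f: records = parse_ratings(f)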

OTHER TIPS

Maybe you can try re.split(pattern, string), which should give you the proper list based on your regex:

import re
# strip() first: leading spaces would otherwise yield an empty first field
d = re.split(r'\s+', s.strip(), maxsplit=3)
print(d)
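Another regex approach (not from the original answer) is to describe the whole line with one pattern instead of splitting it. This sketch assumes the four-column layout shown in the question; LINE_RE and parse_line are illustrative names. Because re.match returns None for lines that don't fit, such as the header, those lines are easy to skip.

```python
import re

# Hypothetical pattern: distribution, votes, rank, then the rest as the title
LINE_RE = re.compile(
    r"\s*(\S+)"        # distribution column (kept as a string)
    r"\s+(\d+)"        # number of votes
    r"\s+(\d+\.\d+)"   # rank, e.g. 9.2
    r"\s+(.+?)\s*$"    # title: everything else, minus trailing whitespace
)


def parse_line(line):
    """Return [distribution, votes, rank, title], or None if the line doesn't match."""
    m = LINE_RE.match(line)
    return list(m.groups()) if m else None


print(parse_line("     0000000125  1196672  9.2  The Shawshank Redemption (1994)\n"))
print(parse_line("New  Distribution  Votes  Rank  Title\n"))  # header -> None
```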

Read the docs:

s = "     0000000125  1196672  9.2  The Shawshank Redemption (1994)"
print(s.split(None, 3))

# output: ['0000000125', '1196672', '9.2', 'The Shawshank Redemption (1994)']
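
s = input('Search: ').lower()
with open("ratings.list.txt", "rt") as f:
    for ns in f:
        # case-insensitive match against the whole line
        if s in ns.lower():
            # drop the trailing newline, then split into at most 4 fields
            print(ns.rstrip('\n').split(maxsplit=3))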
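When lines come straight from a file, note that once maxsplit is reached, split() leaves any trailing whitespace, including the newline, on the last field. This sketch uses a made-up line to show the difference; search is a hypothetical helper that strips the newline before splitting.

```python
line = "     0000000125  829707   9.2  The Godfather (1972)\n"

# The remainder after the third split keeps the trailing newline
print(line.split(None, 3))
# ['0000000125', '829707', '9.2', 'The Godfather (1972)\n']

# Stripping first gives a clean title
print(line.rstrip("\n").split(None, 3))
# ['0000000125', '829707', '9.2', 'The Godfather (1972)']


def search(lines, term):
    """Hypothetical helper: case-insensitive search returning split records."""
    term = term.lower()
    return [ln.rstrip("\n").split(None, 3) for ln in lines if term in ln.lower()]


print(search([line, "     0000000124  547511   9.0  The Godfather: Part II (1974)\n"], "part ii"))
```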

The syntax is str.split([sep[, maxsplit]]):

'sep' is the separator used to split the string (by default, any run of consecutive whitespace counts as a single separator)
'maxsplit' limits the number of splits, as mentioned by Tim
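The difference between the default separator and an explicit single space matters for this file, because its columns are padded with several spaces. A small comparison on a made-up string:

```python
s = "9.2  The  Godfather"

# sep=None: runs of whitespace count as one separator, no empty strings
print(s.split())         # ['9.2', 'The', 'Godfather']

# sep=' ': every single space is a separator, so double spaces give ''
print(s.split(" "))      # ['9.2', '', 'The', '', 'Godfather']

# default separator with maxsplit=1: the remainder is kept as-is
print(s.split(None, 1))  # ['9.2', 'The  Godfather']
```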

If your columns are separated by tab characters ('\t'), you can simply use '\t' as the separator.

Using '\t' between columns is common practice because it doesn't clash with the spaces inside fields such as titles. Moreover, split('\t') behaves the same in every Python version.
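If the file really is tab-separated (the sample in the question looks space-padded, so this is an assumption), either str.split('\t') or the standard csv module can do the splitting. This Python 3 sketch uses a hypothetical in-memory tab-separated sample. With csv, quoting=csv.QUOTE_NONE matters here: otherwise the leading double quote in "#1 Single" would be treated as CSV quoting and removed from the title.

```python
import csv
import io

# Hypothetical tab-separated version of the data (the real file may use spaces)
TSV = (
    "1000000103\t56\t6.3\t\"#1 Single\" (2006)\n"
    "0000000125\t829707\t9.2\tThe Godfather (1972)\n"
)

# Plain str.split: assuming titles never contain a tab, no maxsplit is needed
for line in TSV.splitlines():
    print(line.split("\t"))

# csv.reader with QUOTE_NONE keeps the double quotes in '"#1 Single"' as text
reader = csv.reader(io.StringIO(TSV), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)
print(rows)
```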

Hope this helps : )

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow