Question

I have a text document. I want to compile a dictionary (DICT) from this document. The dictionary must contain only the words that begin with an uppercase letter (it does not matter if the word is at the beginning of a sentence).

This is what I have so far. Note that I must use a for loop and the split function for this problem:

DICT = {}

for line in lines: # lines is the text without line breaks 
    words = line.split(" ")
    for word in words:
        if word in DICT:
            DICT[word] += 1
        else:
            DICT[word] = 1

But I suppose this builds the dictionary from all the words in my text.

  1. How do I only choose the words that begin with a capital letter?
  2. How do I verify if I have made the dictionary correctly?

Solution

Use the s.isupper() method to test if a string is uppercase. You can use indexing to select just the first character.

Thus, to test if the first character is uppercase, use:

if word[0].isupper():

If you want a fast and pythonic approach, use a collections.Counter() object to do the counting, and split on all whitespace to remove newlines:

from collections import Counter

counts = Counter()

for line in lines: # lines is the text without line breaks 
    counts.update(word for word in line.split() if word[0].isupper())

Here, line.split() without arguments splits on all whitespace, removing any whitespace at the start and end of the line (including the newline).
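
As a small illustration of the difference, here is a sketch comparing the two ways of splitting. The sample string is made up for this example.

```python
line = "The  quick Brown fox\n"  # made-up sample: double space and trailing newline

# Splitting on a single space keeps empty strings and the trailing newline.
by_space = line.split(" ")

# Splitting with no argument collapses runs of whitespace and strips both ends.
by_whitespace = line.split()

print(by_space)       # ['The', '', 'quick', 'Brown', 'fox\n']
print(by_whitespace)  # ['The', 'quick', 'Brown', 'fox']
```

The empty string in the first result matters: indexing it with word[0] raises an IndexError, which is one reason to prefer split() without arguments here.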

OTHER TIPS

from itertools import groupby
s = "QWE asd ZXc vvQ QWE"
# extract all the words with capital first letter
caps = [word for word in s.split(" ") if word[0].isupper()]  
# group and count them
caps_counts = {word: len(list(group)) for word, group in groupby(sorted(caps))}

print(caps_counts)

groupby might be less efficient than manual looping because it requires a sorted iterable, and sorting is O(N log N) compared with O(N) for a manual loop. But this variant is a bit more "pythonic".

You can check whether the word begins with a capital letter using the isupper method mentioned above, and put this check before your if/else statement.

if word[0].isupper():
    if word in DICT:
        DICT[word] += 1
    else:
        DICT[word] = 1
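
Put together with the loop from the question, this might look like the sketch below. The sample lines list is invented for illustration, and split() is used instead of split(" ") so that empty strings never reach word[0].

```python
lines = ["The cat saw Alice", "Alice saw the Cat"]  # made-up sample input

DICT = {}
for line in lines:
    for word in line.split():
        # Only count words whose first character is uppercase.
        if word[0].isupper():
            if word in DICT:
                DICT[word] += 1
            else:
                DICT[word] = 1

print(DICT)  # {'The': 1, 'Alice': 2, 'Cat': 1}
```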

To then verify this you can use the built-in any function:

any(word[0].islower() for word in DICT.keys())

This should return False. You can assert this if you choose.
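
One way to wrap such a check in an assertion is sketched below. The helper name only_capitalized and the sample dictionaries are hypothetical, chosen just for this example. It uses all with isupper, which is slightly stricter than the any/islower test: a key starting with a digit or punctuation also fails.

```python
def only_capitalized(counts):
    """Return True if every key starts with an uppercase letter."""
    # word[:1] is used instead of word[0] so an empty key does not raise.
    return all(word[:1].isupper() for word in counts)


DICT = {"Alice": 2, "The": 1, "Cat": 1}  # hypothetical result of the counting loop

assert only_capitalized(DICT)
assert not only_capitalized({"Alice": 2, "cat": 1})
```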

To make everything a bit nicer, you can use a defaultdict:

from collections import defaultdict

DICT = defaultdict(int)  # missing keys start at 0
for line in lines:
    words = line.split()  # split on all whitespace, so no empty strings
    for word in words:
        if word[0].isupper():
            DICT[word] += 1
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow