Counting three letter acronyms in a line with Regex Python [closed]

https://stackoverflow.com/questions/18288035

24-06-2022

Question

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist

Closed 8 years ago.


I need to write a Python program that reads a given file, say acronyms.txt, and returns the percentage of lines that contain at least one three-letter acronym. For example:

NSW is a very large state.
It's bigger than TAS.
but WA is the biggest!

After reading this, it should return 66.7%, since two of the three lines contain a three-letter acronym. As you can see, the result is rounded to one decimal place. I am not very familiar with regex, but I think regex would be the simplest approach.

EDIT:

I have finished the code, but I need it to recognize acronyms with dots between the letters, e.g. N.S.W should be recognized as an acronym. How do I do this?

Any help would be appreciated!

Solution 2

You can do something like:

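import re

total_lines = 0
matched_lines = 0
with open("filename") as f:
    for line in f:
        total_lines += 1
        # bool is a subclass of int, so a match adds 1 and a miss adds 0
        matched_lines += bool(re.search(r"\b[A-Z]{3}\b", line))
# Print the percentage rounded to one decimal place, as the question asks
print("%.1f%%" % (matched_lines / total_lines * 100))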

Note the '\b' in the search pattern: it matches the empty string at the beginning or end of a word. This prevents unwanted matches on acronyms longer than three letters ('asdf ASDF asdf') or on runs of capitals inside a word ('asdfASDasdf').
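To see the effect of the word boundaries, here is a small sketch that runs the pattern with and without '\b' on the example strings from this answer. The variable names are only illustrative.

```python
import re

# Same character class, with and without word boundaries
with_boundaries = re.compile(r"\b[A-Z]{3}\b")
without_boundaries = re.compile(r"[A-Z]{3}")

samples = ["NSW is a very large state.", "asdf ASDF asdf", "asdfASDasdf"]

for text in samples:
    bounded = bool(with_boundaries.search(text))
    unbounded = bool(without_boundaries.search(text))
    print("%-28s bounded=%s unbounded=%s" % (text, bounded, unbounded))
```

Only the first string matches the bounded pattern, while the unbounded pattern matches all three, because it accepts any three capitals in a row regardless of what surrounds them.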

OTHER TIPS

You can do:

import re
cnt = 0
with open('acronyms.txt') as myfile:
    lines = myfile.readlines()
    length = len(lines)
    for line in lines:
        if re.search(r'\b[A-Z]{3}\b', line) is not None:
            cnt += 1

print("{:.1f}%".format(cnt/length*100))

r'\b[A-Z]{3}\b' matches exactly three capital letters in a row, with a word boundary on each side. If the search finds a match, we increment the count.

Then we divide the count by the number of lines, multiply by 100, and print the result rounded to one decimal place, as in your example.
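For the dotted form mentioned in the question's edit (N.S.W), one possible approach is to add a second alternative to the pattern. The sketch below is not from either answer: the function name `percent_lines_with_acronym`, the lookarounds that reject longer dotted runs such as A.B.C.D, and the choice to return 0.0 for empty input are all assumptions made for illustration.

```python
import re

# Either three capitals in a row (NSW) or three capitals separated by dots (N.S.W).
# The lookbehind and lookahead are an assumption: they reject longer dotted
# sequences such as A.B.C.D, which would otherwise match on a three-letter slice.
ACRONYM = re.compile(
    r"(?<![A-Z]\.)\b(?:[A-Z]{3}|[A-Z]\.[A-Z]\.[A-Z])\b(?!\.[A-Z])"
)

def percent_lines_with_acronym(lines):
    """Return the percentage of lines containing a three-letter acronym,
    rounded to one decimal place (hypothetical helper)."""
    lines = list(lines)
    if not lines:
        return 0.0  # assumption: treat empty input as 0% instead of dividing by zero
    matched = sum(1 for line in lines if ACRONYM.search(line))
    return round(matched / len(lines) * 100, 1)

sample = [
    "NSW is a very large state.",
    "It's bigger than TAS.",
    "but WA is the biggest!",
]
print("{:.1f}%".format(percent_lines_with_acronym(sample)))
```

To use it on a file, pass the open file object, for example `percent_lines_with_acronym(open('acronyms.txt'))`, since iterating over a file yields its lines.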

Licensed under: CC-BY-SA with attribution
