Question

I am trying to find words that start and end with a consonant. My attempt is below, but it does not produce what I am looking for. I am really stuck and would appreciate any help or suggestions.

import re

a = "Still, the conflicting reports only further served to worsen tensions in the Ukraine crisis, which has grown drastically \
in the past few weeks to a new confrontation between Russia and the West reminiscent of low points in the Cold War." 

b = re.findall(" ([b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z, ',', '.'].+?[b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z, ',', '.']) ", a.lower())
print(b)

Output is:

['the conflicting', 'further', 'to worsen', 'the ukraine crisis,', 'has', 'drastically', 'the past', 'weeks', 'new', 'between', 'the west', 'low', 'the cold']

But this output is not correct. I have to use regular expressions; without them it would be too tough, I guess.

Many thanks!

No correct solution

OTHER TIPS

Here is a very clear solution using startswith() and endswith(). In order to achieve your goal, you have to strip special chars on your own and convert your string into a list of words (named s in the code):

vowels = ('a', 'e', 'i', 'o', 'u')
[w for w in s if not w.lower().startswith(vowels) and not w.lower().endswith(vowels)]
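A minimal end-to-end sketch of this approach, assuming that stripping leading and trailing punctuation with string.punctuation is an acceptable way to "strip special chars". The helper name consonant_words and the sample sentence are hypothetical and not part of the answer above.

```python
import string

VOWELS = ('a', 'e', 'i', 'o', 'u')

def consonant_words(text):
    """Return words that neither start nor end with a vowel (hypothetical helper)."""
    # Split on whitespace, then strip surrounding punctuation from each token.
    words = [token.strip(string.punctuation) for token in text.split()]
    # Skip tokens that were pure punctuation; startswith/endswith accept a tuple.
    return [w for w in words
            if w
            and not w.lower().startswith(VOWELS)
            and not w.lower().endswith(VOWELS)]

sample = "Still, the reports only served to worsen tensions in the Ukraine crisis."
print(consonant_words(sample))
# ['Still', 'reports', 'served', 'worsen', 'tensions', 'crisis']
```

Note that "not a vowel" is not quite the same as "a consonant": a token that starts or ends with a digit would also pass this filter.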

Try this:

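vowels = ['a', 'e', 'i', 'o', 'u']
# Lowercase each character so that capitalised words such as "Ukraine" are checked correctly
words = [w for w in a.split() if w[0].lower() not in vowels and w[-1].lower() not in vowels]

However, this does not handle words followed by . or , because the punctuation mark is then treated as the word's last character.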

EDIT: If you have to find patterns using regex:

ending_in_vowel = r'\b\w*[AaEeIiOoUu]\b'  # matches every word ending in a vowel
begin_in_vowel = r'\b[AaEeIiOoUu]\w*\b'  # matches every word beginning with a vowel

We then collect every word that either begins or ends with a vowel, so that it can be excluded (\w* rather than \w+ lets one-letter words such as "a" match too):

ignore = re.findall(begin_in_vowel, a)
ignore.extend(re.findall(ending_in_vowel, a))

And your result is then:

result = [word for word in a.split() if word not in ignore]

First, split() a so that you get each word. Then check whether the first and last letters are in the list consonants. If they are, append the word to matches, and at the end print the contents of matches.

consonants = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']

a = "Still, the conflicting reports only further served to worsen tensions in the Ukraine crisis, which has grown drastically \
in the past few weeks to a new confrontation between Russia and the West reminiscent of low points in the Cold War."

matches = []

for word in a.split():
    # Lowercase and drop surrounding commas/periods so "Still," and "War." are judged by their letters
    letters = word.lower().strip(',.')
    if letters and letters[0] in consonants and letters[-1] in consonants:
        matches.append(word)

print(matches)

If you were looking to remove punctuation, this regular expression will work:

>>> re.findall(r'\b[bcdfghj-np-tv-z][a-z]*[bcdfghj-np-tv-z]\b', a.lower())
['still', 'conflicting', 'reports', 'further', 'served', 'worsen', 'tensions', 'crisis', 'which', 'has', 'grown', 'drastically', 'past', 'few', 'weeks', 'new', 'confrontation', 'between', 'west', 'reminiscent', 'low', 'points', 'cold', 'war']

However, your original attempt looked like it was trying to preserve commas and periods, so if that was your goal you could use this instead:

>>> re.findall(r'\b[bcdfghj-np-tv-z][a-z]*[bcdfghj-np-tv-z][,.]?(?![a-z])', a.lower())
['still,', 'conflicting', 'reports', 'further', 'served', 'worsen', 'tensions', 'crisis,', 'which', 'has', 'grown', 'drastically', 'past', 'few', 'weeks', 'new', 'confrontation', 'between', 'west', 'reminiscent', 'low', 'points', 'cold', 'war.']

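Note that \b is a zero-width assertion: it matches the position between a word character and a non-word character without consuming either, which is why the first pattern leaves the trailing punctuation out of the match and the second has to include [,.]? explicitly.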
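A short demonstration of how \b behaves, as a sketch on a made-up two-word string (the variable names are illustrative only). \b asserts a position and never consumes a character, so punctuation appears in the result only when the pattern matches it explicitly.

```python
import re

text = "crisis, war."

# \b only asserts a word boundary; the comma and period are never part of the match.
without_punct = re.findall(r'\b\w+\b', text)
print(without_punct)  # ['crisis', 'war']

# To keep the punctuation, the pattern has to consume it explicitly.
with_punct = re.findall(r'\b\w+\b[,.]?', text)
print(with_punct)  # ['crisis,', 'war.']
```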

If you want to account for contractions, the expression would simply be this instead:

r"\b[bcdfghj-np-tv-z][a-z']*[bcdfghj-np-tv-z][,.]?(?![a-z])"
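As a quick usage check, the contraction-aware pattern can be run on a short sentence. The sample sentence and variable names below are made up for illustration; the pattern is the one given above.

```python
import re

# Pattern from the answer above, compiled once for reuse.
pattern = re.compile(r"\b[bcdfghj-np-tv-z][a-z']*[bcdfghj-np-tv-z][,.]?(?![a-z])")

sample = "They don't stop, and we can't wait."
matches = pattern.findall(sample.lower())
print(matches)  # ['they', "don't", 'stop,', "can't", 'wait.']
```

"and" is skipped because it starts with a vowel, and "we" because it ends with one. One limitation: the apostrophe itself creates a word boundary, so a word such as "o'clock" would yield the partial match "clock".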
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow