Question

The code I have:

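def overlap_ratio(text_A, text_B):
    # Count the lines of text_A that occur inside at least one line of text_B.
    result = 0
    for line_A in text_A:
        for line_B in text_B:
            if line_A in line_B:
                result += 1
                break  # one match is enough; move on to the next line_A
    return result / len(text_A)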

It's pretty straightforward: if line_A from text_A occurs in some line of text_B, count it and move on to the next line. Am I missing a utility that already does this, or is this approach correct? Thanks in advance.

Solution

You can convert both texts to sets and take the intersection, like this:

len(set(text_A) & set(text_B)) / len(text_A)

The problem here is that duplicate lines are counted only once. So you might want to use this instead, which checks every line of text_A for an exact match in text_B:

sum(line_A in text_B for line_A in text_A) / len(text_A)

But if line_A can appear anywhere inside line_B (a substring match), then what you have is correct, and it can be written succinctly like this:

sum(any(line_A in line_B for line_B in text_B) for line_A in text_A) / len(text_A)
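
To see how the three expressions differ, here is a small sketch that runs them on the same input. The function names and the sample lists are made up for illustration and are not from the question. The exact-match version looks lines up in a set rather than in the list, which should give the same result for strings.

```python
def set_overlap(text_a, text_b):
    # Distinct lines common to both texts; duplicates in text_a count once.
    return len(set(text_a) & set(text_b)) / len(text_a)


def exact_overlap(text_a, text_b):
    # Every line of text_a that equals some line of text_b, duplicates included.
    lines_b = set(text_b)  # assumption: set lookup instead of list lookup
    return sum(line_a in lines_b for line_a in text_a) / len(text_a)


def substring_overlap(text_a, text_b):
    # Every line of text_a that occurs anywhere inside some line of text_b.
    return sum(any(line_a in line_b for line_b in text_b) for line_a in text_a) / len(text_a)


# Hypothetical sample data.
text_a = ["foo", "foo", "bar", "baz"]
text_b = ["foo", "a bar b", "qux"]

set_result = set_overlap(text_a, text_b)              # 0.25
exact_result = exact_overlap(text_a, text_b)          # 0.5
substring_result = substring_overlap(text_a, text_b)  # 0.75
```

All three raise ZeroDivisionError when text_a is empty, so you may want to guard against that case.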

OTHER TIPS

If I understood your question correctly, this might be helpful:

>>> from collections import Counter
>>> text_a = 'some text'
>>> a = Counter(text_a.split())
>>> text_b = 'other text'
>>> b = Counter(text_b.split())
>>> a & b
Counter({'text': 1})
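
The same Counter intersection might also be applied to whole lines instead of words, which gives a ratio like the one in the question. This is only a sketch under the assumption that exact line matches are wanted and that each line of text_b may be matched at most once; the function name and sample data are hypothetical.

```python
from collections import Counter


def multiset_overlap(text_a, text_b):
    # Fraction of lines in text_a that have an identical line in text_b,
    # where each line of text_b can be used for at most one match.
    if not text_a:
        return 0.0  # assumption: treat an empty text_a as zero overlap
    common = Counter(text_a) & Counter(text_b)  # per-line minimum of the counts
    return sum(common.values()) / len(text_a)


# Hypothetical sample data.
text_a = ["foo", "foo", "bar"]
text_b = ["foo", "bar", "baz"]

ratio = multiset_overlap(text_a, text_b)  # 2 of the 3 lines are matched
```

Unlike the set version, a line repeated in text_a is counted as often as it is also repeated in text_b.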
Licensed under: CC-BY-SA with attribution