Description
Use lookaheads, as in this regex, which captures complete sentences that contain both Cello and Lillian.
(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\b[Cc]ello(?=\s|\.|$))(?=(?:(?!\.(?:\s|$)).)*?\b[Ll]illian(?=\s|\.|$)).*?\.(?=\s|$))
The expression breaks down into these functional components:
(?:(?<=\.)\s+|^)
start matching this sentence after a . followed by one or more whitespace characters, or at the start of the string
(
start capture group 1, which will capture the entire sentence
(?=
start the look ahead
(?:(?!\.(?:\s|$)).)*?
advance one character at a time, refusing to step past a . followed by either whitespace or the end of the string, so the regex engine can't leave this sentence
\b
match the word boundary
[Cc]ello
match the desired text either all lower case or with a capital initial
(?=\s|\.|$)
look ahead to ensure the word is followed by whitespace, a ., or the end of the string
)
end of the look ahead
(?=(?:(?!\.(?:\s|$)).)*?\b[Ll]illian(?=\s|\.|$))
this essentially does the same but for Lillian
.*?\.(?=\s|$)
capture the rest of the sentence up to and including the period, and make sure the period is followed by either whitespace or the end of the string
)
end of the sentence capture group 1
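To make the sentence-boundary guard concrete, here is a small Python sketch of just that one component (Python being the language the question asked about). It assumes the standard `re` module. The guard is made greedy here purely for demonstration, and the names `TEMPERED_DOT`, `first` and `second` are illustrative, not part of the original answer.

```python
import re

# Only the boundary guard from the breakdown, made greedy (*) instead of lazy (*?)
# so that it runs as far as it is allowed to go.
TEMPERED_DOT = re.compile(r"(?:(?!\.(?:\s|$)).)*", re.DOTALL)

# The scan halts just before the period that ends the first sentence.
first = TEMPERED_DOT.match("Cello has no friends. And Lillian also hasn't met anyone.").group(0)
print(first)   # Cello has no friends

# A period that is not followed by whitespace (as in "1.5") does not end the sentence.
second = TEMPERED_DOT.match("Cello v1.5 sings. Lillian listens.").group(0)
print(second)  # Cello v1.5 sings
```

Because the guard can never consume a sentence-ending period, a lookahead built on it can only find Cello or Lillian inside the current sentence.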
Code example
I don't know Python well enough, so I offer a PHP example. Note that in the match statement I'm using the s option, which allows the . expression to match newline characters.
Input text
Cello is a yellow parakeet who sings with Lillian. Toby is a clown who doesn't sing. Willy is a Wonka. Cello is a yellow Lillian.
Cello likes Lillian and kittens.
Lillian likes Cello and dogs. Cello has no friends. And Lillian also hasn't met anyone.
Code
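<?php
// Replace the placeholder with the input text shown above.
$sourcestring="your source string";
// The s modifier lets . match newline characters.
preg_match_all('/(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\b[Cc]ello(?=\s|\.|$))(?=(?:(?!\.(?:\s|$)).)*?\b[Ll]illian(?=\s|\.|$)).*?\.(?=\s|$))/s',$sourcestring,$matches);
// $matches[0] holds the full matches; $matches[1] holds capture group 1.
echo "<pre>".print_r($matches,true)."</pre>";
?>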
Matches
$matches Array:
(
    [0] => Array
        (
            [0] => Cello is a yellow parakeet who sings with Lillian.
            [1] => Cello is a yellow Lillian.
            [2] =>
Cello likes Lillian and kittens.
            [3] =>
Lillian likes Cello and dogs.
        )
    [1] => Array
        (
            [0] => Cello is a yellow parakeet who sings with Lillian.
            [1] => Cello is a yellow Lillian.
            [2] => Cello likes Lillian and kittens.
            [3] => Lillian likes Cello and dogs.
        )
)
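For readers working in Python, the PHP call above might translate roughly as follows. This is a sketch assuming the standard `re` module, with `re.DOTALL` standing in for PHP's s option. The names `PATTERN`, `source` and `sentences` are illustrative only.

```python
import re

# Same expression as in the PHP example, split across two raw strings for readability.
PATTERN = re.compile(
    r"(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\b[Cc]ello(?=\s|\.|$))"
    r"(?=(?:(?!\.(?:\s|$)).)*?\b[Ll]illian(?=\s|\.|$)).*?\.(?=\s|$))",
    re.DOTALL,  # counterpart of PHP's s option: lets . match newline characters
)

source = (
    "Cello is a yellow parakeet who sings with Lillian. Toby is a clown who doesn't sing. "
    "Willy is a Wonka. Cello is a yellow Lillian.\n"
    "Cello likes Lillian and kittens.\n"
    "Lillian likes Cello and dogs. Cello has no friends. And Lillian also hasn't met anyone."
)

# With exactly one capture group, findall returns the group 1 text of every match.
sentences = PATTERN.findall(source)
for sentence in sentences:
    print(sentence)
```

Since `findall` returns only capture group 1 here, the leading whitespace that shows up in the full matches (index 0 of the PHP output) is not part of the returned sentences.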
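If you absolutely need to match only sentences where the string Cello appears before Lillian, then you can use an expression like this one. Here I've simply moved a single closing parenthesis, so the Lillian lookahead now sits inside the Cello lookahead and starts searching only after Cello has been found.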
(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\b[Cc]ello(?=\s|\.|$)(?=(?:(?!\.(?:\s|$)).)*?\b[Ll]illian(?=\s|\.|$))).*?\.(?=\s|$))
Input text
Cello is a yellow parakeet who sings with Lillian. Toby is a clown who doesn't sing. Willy is a Wonka. Cello is a yellow Lillian.
Cello likes Lillian and kittens.
Lillian likes Cello and dogs. Cello has no friends. And Lillian also hasn't met anyone.
Output for capture group 1
[1] => Array
    (
        [0] => Cello is a yellow parakeet who sings with Lillian.
        [1] => Cello is a yellow Lillian.
        [2] => Cello likes Lillian and kittens.
    )
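The difference between the two expressions is easiest to see side by side. The following Python sketch (again assuming the standard `re` module, with the hypothetical names `EITHER_ORDER`, `CELLO_FIRST` and `text`) runs both on two sentences that mention the names in opposite orders.

```python
import re

# Original expression: two sibling lookaheads, so the order of the names is irrelevant.
EITHER_ORDER = re.compile(
    r"(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\b[Cc]ello(?=\s|\.|$))"
    r"(?=(?:(?!\.(?:\s|$)).)*?\b[Ll]illian(?=\s|\.|$)).*?\.(?=\s|$))",
    re.DOTALL,
)

# Variant: the Lillian lookahead is nested inside the Cello lookahead,
# so Lillian is only searched for after a Cello has been found.
CELLO_FIRST = re.compile(
    r"(?:(?<=\.)\s+|^)((?=(?:(?!\.(?:\s|$)).)*?\b[Cc]ello(?=\s|\.|$)"
    r"(?=(?:(?!\.(?:\s|$)).)*?\b[Ll]illian(?=\s|\.|$))).*?\.(?=\s|$))",
    re.DOTALL,
)

text = "Lillian likes Cello and dogs. Cello likes Lillian and kittens."

print(EITHER_ORDER.findall(text))
# ['Lillian likes Cello and dogs.', 'Cello likes Lillian and kittens.']
print(CELLO_FIRST.findall(text))
# ['Cello likes Lillian and kittens.']
```

Note that the nested form accepts a sentence as soon as some occurrence of Cello is followed by some occurrence of Lillian, so a sentence that mentions Lillian both before and after Cello would still match.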