Regex loop parsing with negative word lookahead

Question 1

Try the pattern below. I used ?: to make some of the groups in your regex non-capturing. I also added a positive lookahead, (?=\d{4}-|$), which checks whether the text that follows starts with four digits and a hyphen (\d\d\d\d-) or whether the end of the line has been reached. You can make the lookahead stricter if you want (for example, require the full yyyy-mm-dd format).

string.scan(/((?:\d{4}-\d{2}-\d{2}\s+\d{2}\:\d{2}\:\d{2}\s+[AP][M])\:\s(?:.*?)\:.*?)(?=\d{4}-|$)/) {|match| puts match}

Output:

2014-03-29 10:29:24 AM: John Doe: Hey dude how are you feeling 
2014-03-29 10:30:39 AM: Billy: Hey Doe, Im feeling better now. 
2014-03-29 10:30:58 AM: Billy: Yup
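
For reference, here is a minimal self-contained sketch of the call above. The input string is an assumption, reconstructed from the output shown (all three messages on one line), and the names log, pattern and messages are only illustrative.

```ruby
# Assumed input: the three messages on a single line (reconstructed from the output above).
log = "2014-03-29 10:29:24 AM: John Doe: Hey dude how are you feeling " \
      "2014-03-29 10:30:39 AM: Billy: Hey Doe, Im feeling better now. " \
      "2014-03-29 10:30:58 AM: Billy: Yup"

# Same pattern as above, stored in a variable.
pattern = /((?:\d{4}-\d{2}-\d{2}\s+\d{2}\:\d{2}\:\d{2}\s+[AP][M])\:\s(?:.*?)\:.*?)(?=\d{4}-|$)/

# With one capture group, scan returns an array of one-element arrays, so flatten it.
messages = log.scan(pattern).flatten
messages.each { |message| puts message }
```

Every message except the last keeps its trailing space, because the lazy .*? runs right up to the next date.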

Question 2

You can match everything up to a specific "word" like this (example with the word "2014"):

(?>[^2]+|2(?!014))*

The same with an unknown year (four digits followed by a hyphen):

(?>[^0-9]+|[0-9](?![0-9]{3}-))*
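
A small sketch of how these two patterns behave in Ruby. The sample text and the \A anchor (which pins the match to the start of the string) are my additions for illustration, not part of the patterns above.

```ruby
# Hypothetical sample: a lone "2" appears before the date.
text = "I have 2 cats 2014-03-29 10:30:39 AM"

# Stops before the literal "2014", but walks past the lone "2".
until_2014 = text[/\A(?>[^2]+|2(?!014))*/]

# Stops before any four digits that are followed by a hyphen.
until_year = text[/\A(?>[^0-9]+|[0-9](?![0-9]{3}-))*/]

p until_2014
p until_year
```

Both calls return "I have 2 cats ": a digit is consumed only when it does not begin the forbidden word.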

Another way is to split the string with a lookahead:

string.split(/(?=\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+[AP][M]:)/)
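
A sketch of the split approach on an assumed single-line input (the sample string and the names log, boundary and messages are illustrative). Because the lookahead consumes no characters, each piece keeps its own timestamp. The strip and reject calls are an optional tidy-up I added.

```ruby
# Assumed input: three messages on one line.
log = "2014-03-29 10:29:24 AM: John Doe: Hey dude how are you feeling " \
      "2014-03-29 10:30:39 AM: Billy: Hey Doe, Im feeling better now. " \
      "2014-03-29 10:30:58 AM: Billy: Yup"

# Zero-width boundary: a position where a full timestamp begins.
boundary = /(?=\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+[AP][M]:)/

# Trim trailing whitespace and drop any empty piece, just in case.
messages = log.split(boundary).map(&:strip).reject(&:empty?)
messages.each { |message| puts message }
```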

Note: for all three patterns, you can choose how specific the subpattern inside the lookahead assertion should be, both in length and in precision.

Question 3

Character classes don't work like this.

[^\d{4}]*

Meaning:

^    -  Negated class, so it matches one character that is:
\d   -  Not digit 0-9
{    -  Not '{'
4    -  Not '4'
}    -  Not '}'

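The * after the class then matches zero or more characters from this set.
The match therefore stops at the first digit, '{', or '}' it meets, so it can never get past a single digit, let alone run up to a four-digit number.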
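
A quick check in Ruby illustrates this. The sample strings are made up for the demonstration.

```ruby
# The class excludes every digit and both braces; {4} is not a quantifier here.
not_in_set = /[^\d{4}]*/

stops_at_digit = "I have 3 cats 2014-03-29"[not_in_set]
stops_at_brace = "a{b}c"[not_in_set]

p stops_at_digit  # stops before the lone "3", long before "2014"
p stops_at_brace  # stops before "{"
```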

Matching until a four-digit word could also be done like this (written in free-spacing mode):

 (                             # (1 start)
      (                             # (2 start)
           \d{4} - \d{2} - \d{2} 
           \s+ 
           \d{2} \: \d{2} \: \d{2} 
           \s+ 
           [AP] [M] 
      )                             # (2 end)
      \: \s 
      ( .*? )                       # (3)
      \: \s 
      (                             # (4 start)
           (?:
                (?! \d{4} )              # Not 4 digits ahead of this character
                .                        # Ok, match the character
           )*
      )                             # (4 end)
 )                             # (1 end)
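
A sketch of how the expanded pattern could be used from Ruby with the /x (extended) flag. The input string is an assumption (three messages on one line), and the group labels timestamp, sender and text are my reading of the groups, not names from the pattern itself.

```ruby
# Assumed input: three messages on one line.
log = "2014-03-29 10:29:24 AM: John Doe: Hey dude how are you feeling " \
      "2014-03-29 10:30:39 AM: Billy: Hey Doe, Im feeling better now. " \
      "2014-03-29 10:30:58 AM: Billy: Yup"

pattern = /
  (                                   # (1) whole message
    (                                 # (2) timestamp
      \d{4}-\d{2}-\d{2} \s+ \d{2}:\d{2}:\d{2} \s+ [AP]M
    )
    :\s
    (.*?)                             # (3) sender
    :\s
    (                                 # (4) text
      (?: (?!\d{4}) . )*              # a character only if 4 digits do not start here
    )
  )
/x

# With capture groups, scan returns one array of captures per match.
rows = log.scan(pattern)
rows.each do |_whole, stamp, sender, text|
  puts "#{stamp} | #{sender} | #{text.strip}"
end
```

Note that this variant stops the text at any run of four digits, so a message that itself contains a four-digit number would be cut short. A stricter lookahead such as (?!\d{4}-) narrows that.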