Bad Regex performance while searching for times (xx:xx:xx)

Question 1

You could combine both regular expressions from the question. First run the cheap leading-colon expression to quickly accept or reject each line; only if it matches, run the full replace expression.

MatchCollection mc = Regex.Matches(Datain, ":[012345][0123456789]:[012345][0123456789].*");

if ( mc.Count > 0 )
{
    Dataout = Regex.Replace(Datain, "[012][0123456789]:[012345][0123456789]:[012345][0123456789].*", string.Empty, RegexOptions.Compiled);
}
else
{
    Dataout = Datain;
}

A variation might be

Regex finder = new Regex(":[012345][0123456789]:[012345][0123456789].*");
Regex changer = new Regex("[012][0123456789]:[012345][0123456789]:[012345][0123456789].*");

if ( finder.Match(Datain).Success)
{
    Dataout = changer.Replace(Datain, string.Empty);
}
else
{
    Dataout = Datain;
}

Another variation would be to use the finder as above. If the string is found then just check whether the previous two characters are digits.

Regex finder = new Regex(":[012345][0123456789]:[012345][0123456789].*");

Match m = finder.Match(Datain);
if ( m.Success && m.Index > 1)
{
    if ( char.IsDigit(Datain[m.Index-1]) && char.IsDigit(Datain[m.Index-2]) )
    {
        Dataout = m.Index-2 == 0 ? string.Empty : Datain.Substring(0, m.Index-2);
    }
    else
    {
        Dataout = Datain;
    }
}
else
{
    Dataout = Datain;
}

In the second and third ideas the finder and changer should be declared and given values before reading any lines. There is no need to execute the new Regex(...) inside the line reading loop.
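
A minimal sketch of that hoisting, assuming the input arrives as a sequence of lines. The names TimeStripper and StripLines are hypothetical and not from the question, and the patterns are written with character ranges that are equivalent to the ones above:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TimeStripper
{
    // Built once, before any line is read, and reused for every line.
    private static readonly Regex Finder =
        new Regex(":[0-5][0-9]:[0-5][0-9]", RegexOptions.Compiled);
    private static readonly Regex Changer =
        new Regex("[012][0-9]:[0-5][0-9]:[0-5][0-9].*", RegexOptions.Compiled);

    public static IEnumerable<string> StripLines(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            // Cheap pre-check first; run the replace only on candidate lines.
            yield return Finder.IsMatch(line)
                ? Changer.Replace(line, string.Empty)
                : line;
        }
    }
}
```

It could be fed directly from File.ReadLines(path), so that no Regex is constructed inside the reading loop.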

Question 2

You could use DateTime.TryParseExact to check whether a word is a time and keep only the words before it. Here's a LINQ query that cleans every line of the file at path; it may be more efficient:

string format = "HH:mm:ss";
DateTime time;
var cleanedLines = File.ReadLines(path)
    .Select(l => string.Join(" ", l.Split().TakeWhile(w => w.Length != format.Length
       ||  !DateTime.TryParseExact(w, format, CultureInfo.InvariantCulture, DateTimeStyles.None, out time))));

If performance is very critical you could also create a specialized method that is optimized for this task. Here is one approach that should be much more efficient:

public static string SubstringBeforeTime(string input, string timeFormat = "HH:mm:ss")
{
    if (string.IsNullOrWhiteSpace(input))
        return input;
    DateTime time;

    if (input.Length == timeFormat.Length && DateTime.TryParseExact(input, timeFormat, CultureInfo.InvariantCulture, DateTimeStyles.None, out time))
    {
        return ""; // full text is time
    }
    char[] wordSeparator = {' ', '\t'};
    int lastIndex = 0;
    int spaceIndex = input.IndexOfAny(wordSeparator);
    if(spaceIndex == -1)
        return input;
    char[] chars = input.ToCharArray();
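    while(spaceIndex >= 0)
    {
        // Test the word that ends at this separator before looking any further.
        string nextWord = input.Substring(lastIndex, spaceIndex - lastIndex);
        if( nextWord.Length == timeFormat.Length
         && DateTime.TryParseExact(nextWord, timeFormat, CultureInfo.InvariantCulture, DateTimeStyles.None, out time))
        {
            return input.Substring(0, lastIndex);
        }
        // Skip the run of separators (Contains is the LINQ extension, so System.Linq is needed).
        int nonSpaceIndex = Array.FindIndex<char>(chars, spaceIndex + 1, x => !wordSeparator.Contains(x));
        if(nonSpaceIndex == -1)
            return input; // only separators remain
        lastIndex = nonSpaceIndex;
        spaceIndex = input.IndexOfAny(wordSeparator, nonSpaceIndex + 1);
    }
    // The last word has no separator after it, so it is checked here.
    if( input.Length - lastIndex == timeFormat.Length
     && DateTime.TryParseExact(input.Substring(lastIndex), timeFormat, CultureInfo.InvariantCulture, DateTimeStyles.None, out time))
    {
        return input.Substring(0, lastIndex);
    }
    return input;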
}

Sample data and test:

string[] lines = { "blablabla  12:10:40 I want to remove this", "blablabla some more", "even more bla  ", "14:22:11" };
foreach(string line in lines)
{
    string newLine = SubstringBeforeTime(line, "HH:mm:ss");
    Console.WriteLine(string.IsNullOrEmpty(newLine) ? "<empty>" : newLine);
}

Output:

blablabla  
blablabla some more
even more bla  
<empty>

Question 3

In the end I went with this:

        bool MeerCCOl = true;
        int startpositie = 0;
        int CCOLfound = 0; // number of times terminal output was found

        // Build both expressions once, outside the loop.
        Regex rgx = new Regex(":[0-5][0-9]:[0-5][0-9]", RegexOptions.Multiline | RegexOptions.Compiled);
        Regex rgx2 = new Regex("[012][0-9]:[0-5][0-9]:[0-5][0-9].*", RegexOptions.Multiline);

        while(MeerCCOl)
        {
            Match GevondenColon = rgx.Match(VlogDataGefilterd,startpositie);

            MeerCCOl = GevondenColon.Success; // CCOL terminal data found, there may be more..

            if (MeerCCOl)
            {
                // Unless a time is removed below, resume just past this colon,
                // otherwise the same candidate would be matched forever.
                startpositie = GevondenColon.Index + 1;

                if (GevondenColon.Index >= 2)
                {
                    int GevondenUur = 10 * (VlogDataGefilterd[GevondenColon.Index - 2] - '0') +
                                            VlogDataGefilterd[GevondenColon.Index - 1] - '0';
                    if (VlogDataGefilterd[GevondenColon.Index - 2] >= '0' && VlogDataGefilterd[GevondenColon.Index - 2] <= '2' &&
                        VlogDataGefilterd[GevondenColon.Index - 1] >= '0' && VlogDataGefilterd[GevondenColon.Index - 1] <= '9' &&
                        GevondenUur>=0 && GevondenUur<=23)
                    {
                        CCOLfound++; // count only candidates that turned out to be a real time
                        VlogDataGefilterd = rgx2.Replace(VlogDataGefilterd, string.Empty, 1, (GevondenColon.Index - 2));
                        startpositie = GevondenColon.Index - 2; // start the next match from the place where we ...
                    }
                }
            }
        }

It first searches for a match on :xx:xx and then checks the two characters before it. If they form a valid hour, it removes everything from the time to the end of the line. A bonus is that by checking the hours separately, I can make sure the hours read 00-23 instead of 00-29. The number of matches is also counted this way.
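
The separate hour check can be isolated in a small helper. This is only a sketch; HourCheck and IsValidHour are hypothetical names that do not appear in the code above:

```csharp
public static class HourCheck
{
    // True only for "00".."23"; a character class like [012][0-9]
    // would also accept 24..29.
    public static bool IsValidHour(char tens, char ones)
    {
        if (tens < '0' || tens > '2' || ones < '0' || ones > '9')
            return false;
        int hour = 10 * (tens - '0') + (ones - '0');
        return hour <= 23;
    }
}
```

Called with the two characters preceding the matched colon, it accepts '2','3' and rejects '2','4'.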

The original simple regex took about 550 ms. This code (while messier) takes about 12 ms for the same data file. That's a whopping speedup of more than 40x :-)
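
For anyone who wants to compare the variants on their own data, a rough timing harness might look like the sketch below. Timing and MeasureMilliseconds are hypothetical names, and a single run like this is only indicative, since it includes JIT and regex construction costs:

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static long MeasureMilliseconds(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

Each approach can be wrapped in a lambda and passed to MeasureMilliseconds with the same input string.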

Thanks all!