Question

I'll admit that I'm very weak in my knowledge of regular expressions. I'm trying to match TV series file names in Java, like the following:

xyz title name S01E02 bla bla
bla bla title name.S03E04
the season title name s05e03

My solution works, but it only matches name S01E02 or name.S03E04, not the complete name of the TV series.

My current regular expression is:

(\\w+)((\\.|\\s)[sS]([0-9]{2})[eE]([0-9]{2}))
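To see the problem concretely, here is a small sketch that runs the regex above with Matcher.find() over the three examples and reports what group 1 captures. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: runs the question's regex with find() and reports group 1.
class OriginalPatternDemo {
    static final Pattern ORIGINAL =
            Pattern.compile("(\\w+)((\\.|\\s)[sS]([0-9]{2})[eE]([0-9]{2}))");

    // Returns the text captured by the leading (\w+), or null if nothing matches.
    static String capturedName(String fileName) {
        Matcher m = ORIGINAL.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] tests = { "xyz title name S01E02 bla bla",
                           "bla bla title name.S03E04",
                           "the season title name s05e03" };
        for (String s : tests) {
            // \w+ cannot cross spaces or dots, so only "name" is captured each time.
            System.out.println(capturedName(s));
        }
    }
}
```

Each input yields just name: find() keeps moving its starting point to the right until (\w+) lands on the single word that sits directly before the season/episode tag.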


Solution

Here is a suggestion:

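import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TvSeriesNames {
    public static void main(String[] args) {
        // Group 1: name (lazy, so it stops at the first [.\s]SxxEyy),
        // group 2: season, group 3: episode; the trailing .* allows extra text.
        Pattern p = Pattern.compile("(.*?)[.\\s][sS](\\d{2})[eE](\\d{2}).*");

        String[] tests = { "xyz title name S01E02 bla bla",
                           "bla bla title name.S03E04",
                           "the season title name s05e03" };

        for (String s : tests) {
            Matcher m = p.matcher(s);
            if (m.matches())
                System.out.printf("Name: %-23s Season: %s Episode: %s%n",
                        m.group(1), m.group(2), m.group(3));
        }
    }
}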

Prints:

Name: xyz title name          Season: 01 Episode: 02
Name: bla bla title name      Season: 03 Episode: 04
Name: the season title name   Season: 05 Episode: 03

OTHER TIPS

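This happens because the (\\w+) at the beginning matches only a single word.

To make it match a group of words separated by spaces, replace it with:

((?:\\w+\\s+)*\\w+)

The last word has no trailing \\s+ of its own, because the space or dot in front of SxxEyy is already matched by the (\\.|\\s) group that follows.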
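A minimal sketch of this idea, assuming the leading group of words is written as ((?:\\w+\\s+)*\\w+) so that the last word needs no whitespace after it (the dot or space before SxxEyy is left to the existing (\\.|\\s) group). The rest of the question's regex is unchanged; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: the question's regex with a multi-word first group.
class WordGroupDemo {
    // Group 1: zero or more "word + whitespace" pairs followed by a final word.
    // Group 2 is the tag; groups 3-5 are the separator, season and episode.
    static final Pattern WORDS = Pattern.compile(
            "((?:\\w+\\s+)*\\w+)((\\.|\\s)[sS]([0-9]{2})[eE]([0-9]{2}))");

    // Returns the captured series name, or null if nothing matches.
    static String seriesName(String fileName) {
        Matcher m = WORDS.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] tests = { "xyz title name S01E02 bla bla",
                           "bla bla title name.S03E04",
                           "the season title name s05e03" };
        for (String s : tests) {
            System.out.println(seriesName(s));
        }
    }
}
```

Because the match now succeeds at the very first word, find() returns the whole title (for example xyz title name) instead of only the last word.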

This pattern might work better:

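(?xi) ^ (?: \b \w+ \s*? ) + [\s.] S \d{2} E \d{2} \b

It ends in \b rather than $ so that trailing text, such as the "bla bla" in the first example, can follow the episode number; use find() rather than matches() with it. You'll have to double the backslashes if this pattern is written as a Java string literal rather than read in from elsewhere.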
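Here is a sketch of the free-spacing pattern as a Java string literal, with every backslash doubled. It assumes you also want to extract the parts, so it adds capture groups around the words and the two numbers, and it ends in \b so trailing text after the tag is allowed. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the free-spacing, case-insensitive pattern.
class FreeSpacingDemo {
    // (?x): whitespace in the pattern is ignored; (?i): S and E also match s and e.
    // Group 1 = words before the tag, group 2 = season, group 3 = episode.
    static final Pattern EPISODE = Pattern.compile(
            "(?xi) ^ ( (?: \\b \\w+ \\s*? )+ ) [\\s.] S (\\d{2}) E (\\d{2}) \\b");

    // Returns {name, season, episode}, or null when the file name does not fit.
    static String[] parse(String fileName) {
        Matcher m = EPISODE.matcher(fileName);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] tests = { "xyz title name S01E02 bla bla",
                           "bla bla title name.S03E04",
                           "the season title name s05e03" };
        for (String s : tests) {
            String[] parts = parse(s);
            if (parts != null) {
                System.out.printf("Name: %s, Season: %s, Episode: %s%n",
                        parts[0], parts[1], parts[2]);
            }
        }
    }
}
```

The lazy \s*? lets each word give up its trailing space when the next thing is the separator, so group 1 ends at the last word of the title rather than including the space before the tag.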

Also, this only works on ASCII data, not full Unicode, because by default Java's \w and \s shortcuts match only ASCII characters. You would have to use Unicode properties such as \p{L} instead. It's rather unpleasant, but if that might be the case, please tell me and I'll update the pattern to work for Unicode.
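On Java 7 and later, an alternative to spelling out Unicode properties by hand is the Pattern.UNICODE_CHARACTER_CLASS flag, also available inline as (?U), which makes predefined classes such as \w and \s follow Unicode rules. The sketch below (class name made up) compiles the same free-spacing pattern with and without that flag and tries it on a title containing U+00E9, a non-ASCII letter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: ASCII-only \w versus Unicode-aware \w via (?U).
class UnicodeNameDemo {
    static final String REGEX =
            "^ ( (?: \\b \\w+ \\s*? )+ ) [\\s.] S (\\d{2}) E (\\d{2}) \\b";
    // Default flags: \w only covers [a-zA-Z_0-9].
    static final Pattern ASCII_ONLY = Pattern.compile("(?xi)" + REGEX);
    // U = UNICODE_CHARACTER_CLASS (Java 7+): \w also covers accented letters.
    static final Pattern UNICODE = Pattern.compile("(?xiU)" + REGEX);

    // Returns the captured series name, or null if nothing matches.
    static String seriesName(Pattern p, String fileName) {
        Matcher m = p.matcher(fileName);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The title is "Amelie" spelled with an e-acute (U+00E9).
        String file = "Am\u00e9lie s01e02";
        System.out.println(seriesName(ASCII_ONLY, file)); // no match: null
        System.out.println(seriesName(UNICODE, file));    // the full title
    }
}
```

Without the flag the accented letter breaks the run of \w characters, so the anchored pattern fails; with (?U) the whole word is accepted.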

Licensed under: CC-BY-SA with attribution