Capturing more than one letter

Question 1

Description

This regex validates the string by requiring /r/ followed by the name of a subreddit, then captures the id, provided it appears directly after the subreddit name or after comments/. By using the m option and anchoring with ^ (start of a line) and $ (end of a line), this regex can be run against a long string containing any number of newline-delimited reddit links, as demonstrated in the PHP example. The i option makes the match case-insensitive, so Comments\/ in the pattern also matches the lowercase comments/ in the links.

^\/r\/([a-z0-9]*)\/(?:Comments\/)?([a-z0-9]*)(?:\/?.*?)?$
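To see what the m option contributes, here is a minimal PHP sketch (the two sample paths and the variable names are just for illustration) that counts matches on a two-line string with and without it:

```php
<?php
// Two newline-delimited paths, tested with the pattern from this answer.
$lines = "/r/AskReddit/comments/1234\n/r/cats/comments/i2sz9/we_rescued_a_kitten_last_month/";

// Same pattern twice: once without m, once with m.
$patternNoM   = '/^\/r\/([a-z0-9]*)\/(?:Comments\/)?([a-z0-9]*)(?:\/?.*?)?$/i';
$patternWithM = '/^\/r\/([a-z0-9]*)\/(?:Comments\/)?([a-z0-9]*)(?:\/?.*?)?$/im';

// Without m, ^ and $ anchor to the whole string and . cannot cross "\n",
// so the first line cannot reach a valid $ and nothing matches.
$withoutM = preg_match_all($patternNoM, $lines, $m1);

// With m, ^ and $ also match at every line boundary, so each line is tested.
$withM = preg_match_all($patternWithM, $lines, $m2);

echo "without m: $withoutM, with m: $withM\n"; // without m: 0, with m: 2
```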


Groups

0 matches the entire line
1 captures the subreddit name
2 captures the id
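For a single link, preg_match fills the same groups. A minimal sketch (the variable names are illustrative):

```php
<?php
// Apply the answer's pattern to one path and read groups 0, 1 and 2.
$url = '/r/IAmA/comments/18pik4/astronaut_chris_hadfield_comments/c8gud3h';
$pattern = '/^\/r\/([a-z0-9]*)\/(?:Comments\/)?([a-z0-9]*)(?:\/?.*?)?$/im';

if (preg_match($pattern, $url, $m)) {
    // $m[0] is the whole match, $m[1] the subreddit, $m[2] the id
    echo "subreddit: {$m[1]}, id: {$m[2]}\n"; // subreddit: IAmA, id: 18pik4
}
```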

PHP Code Example:

You didn't specify a language, so I picked PHP to show how this regex would work.

<?php
// Newline-delimited list of reddit paths to test against
$sourcestring="/r/AskReddit/comments/1234
/r/AskReddit/2345/
/r/AskReddit/comments/3456/dsada/
/r/IHeartKittens/comments/4567/dsada/
/r/cats/comments/i2sz9/we_rescued_a_kitten_last_month/
/r/IAmA/comments/18pik4/astronaut_chris_hadfield_comments/c8gud3h";
// i = case-insensitive (Comments also matches comments), m = ^ and $ match at every line
preg_match_all('/^\/r\/([a-z0-9]*)\/(?:Comments\/)?([a-z0-9]*)(?:\/?.*?)?$/im',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>
 

$matches Array:
(
    [0] => Array
        (
            [0] => /r/AskReddit/comments/1234
            [1] => /r/AskReddit/2345/
            [2] => /r/AskReddit/comments/3456/dsada/
            [3] => /r/IHeartKittens/comments/4567/dsada/
            [4] => /r/cats/comments/i2sz9/we_rescued_a_kitten_last_month/
            [5] => /r/IAmA/comments/18pik4/astronaut_chris_hadfield_comments/c8gud3h
        )

    [1] => Array
        (
            [0] => AskReddit
            [1] => AskReddit
            [2] => AskReddit
            [3] => IHeartKittens
            [4] => cats
            [5] => IAmA
        )

    [2] => Array
        (
            [0] => 1234
            [1] => 2345
            [2] => 3456
            [3] => 4567
            [4] => i2sz9
            [5] => 18pik4
        )

)
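If you'd rather get one entry per link instead of one array per group, preg_match_all also accepts the PREG_SET_ORDER flag. A sketch using three of the sample lines (the $pairs formatting is just for display):

```php
<?php
// Same pattern as above; PREG_SET_ORDER groups results per match instead of per group.
$sourcestring = "/r/AskReddit/comments/1234\n/r/AskReddit/2345/\n/r/cats/comments/i2sz9/we_rescued_a_kitten_last_month/";
$pattern = '/^\/r\/([a-z0-9]*)\/(?:Comments\/)?([a-z0-9]*)(?:\/?.*?)?$/im';

preg_match_all($pattern, $sourcestring, $sets, PREG_SET_ORDER);

$pairs = [];
foreach ($sets as $set) {
    // $set[0] = full line, $set[1] = subreddit name, $set[2] = id
    $pairs[] = $set[1] . ' => ' . $set[2];
}
echo implode("\n", $pairs), "\n";
```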

Question 2

First, in your regex .* matches everything up to the end of the string and then backtracks until the rest of the pattern can succeed.

Second, [...] is a character class: it matches exactly one of the characters inside it, and the ? after it makes that single character optional.

So in your test case /r/sdifsas/sd, .*/ matches up to the last forward slash, the following s is matched by [...] (s is one of its characters), and the final d is the one character matched by the range a-z.

Your test /r/sdifsas/sdfad/aasdasd/a is similar: .*/ matches up to the last forward slash, the letter a is not inside [...], so that optional part is skipped and a is matched by the range a-z. The same happens with /r/sdifsas/comments/a/d.
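The question's exact pattern isn't reproduced here, so this PHP sketch assumes it was roughly /r/.*/[comments/]?([a-z0-9])/? (a reconstruction from the explanations in these answers, not a quote). Running it on the three test strings shows group 1 only ever holding a single character:

```php
<?php
// ASSUMPTION: reconstructed pattern, roughly /r/.*/[comments/]?([a-z0-9])/?
// The # delimiter avoids escaping the slashes.
$pattern = '#/r/.*/[comments/]?([a-z0-9])/?#';

$captured = [];
foreach (['/r/sdifsas/sd', '/r/sdifsas/sdfad/aasdasd/a', '/r/sdifsas/comments/a/d'] as $test) {
    if (preg_match($pattern, $test, $m)) {
        $captured[$test] = $m[1]; // ([a-z0-9]) without a quantifier: one character
    }
}
print_r($captured); // d, a, d
```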

I don't know what flavour of regex you are using, but a shot in the dark would be something like:

/r/.*?/(?:comments/)?([a-z0-9]*)/?

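It uses a lazy .*? so the subreddit part stops at the first / after /r/, a non-capturing group (?:...) for the optional comments/ part of the path, and * to match zero or more letters and/or digits, so the whole id is captured rather than a single character.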
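A quick PHP check of that pattern (the # delimiter and the sample paths are my choice, not part of the original answer):

```php
<?php
// The suggested pattern; # as delimiter so "/" needs no escaping.
$pattern = '#/r/.*?/(?:comments/)?([a-z0-9]*)/?#';

$ids = [];
foreach (['/r/AskReddit/comments/1234', '/r/AskReddit/2345/', '/r/sdifsas/sd'] as $path) {
    if (preg_match($pattern, $path, $m)) {
        $ids[] = $m[1]; // the whole run of letters/digits after the optional comments/
    }
}
print_r($ids); // 1234, 2345, sd
```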

Question 3

Try

/r/AskReddit/(?:comments/)?([a-z0-9]+)/?

instead.

Your solution suffers from two flaws:

1. Your .* portion matches everything, in particular the / characters that structure the location part of your URLs.
2. You're matching greedily, which is the default in most regex engines AFAIK. Greedy means that the subpattern gobbles up as many characters as possible.

Together, 1 and 2 make the pattern match larger portions of the URLs than you intend.
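A small PHP sketch of the greedy vs. lazy difference on a hypothetical path:

```php
<?php
// Greedy .* runs to the end and backtracks only as far as needed, so it
// keeps every "/" it can; lazy .*? stops at the first "/" that lets the match succeed.
$path = '/r/AskReddit/comments/1234/';

preg_match('#/r/(.*)/#', $path, $greedy);
preg_match('#/r/(.*?)/#', $path, $lazy);

echo $greedy[1], "\n"; // AskReddit/comments/1234
echo $lazy[1], "\n";   // AskReddit
```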