Question

I am using C# and Visual Studio 2010. I'm just trying to match a string (which is a path in this case) and create a pattern that will help me figure out if it's a valid pattern or not. The below examples are made-up arbitrary ones, but they do contain

So I am trying to create a pattern that will match a UNC path which is coming in as a string. For example:

"\\\\Apple-butter27\\AliceFakePlace\\SomeDay\\Grand100\\Some File Name Stuff\\Yes these are fake words\\One more for fun2000343\\myText.txt"

Above is an example of a file path I'm trying to pattern match. I'm attempting to match it with this pattern:

@"\\\\[a-zA-Z0-9-]+\\\w+\\\w+\\\w+\\((\w+)*(\s+)*)*\\((\w+)*(\s+)*)*\\((\w+)*(\s+)*)*\\w+\.txt";

The one thing I am guaranteed is that there will be 7 folders before I reach my file(s). I'll have to look for a combination of spaces, letters, and numbers in pretty much all of the segments.

I did try starting by matching small bits such as my first iteration of testing I tried this as my pattern:

@"\\\\";

And this works since it'll match the first few characters, but if I add this to it:

@"\\\\[a-zA-Z0-9-]+";

It fails. So I thought maybe it was since the strings are causing it to double up so I may have to double my "\" so I tried it again with 8 "\" alone, but that failed.

My goal with the previous pattern is to match "\\\\Apple-butter27"

I've been looking on google and all over this site, but none of the pattern matching UNC stuff I found is quite my issue.

I'd really appreciate it if someone could tell me what I'm doing wrong with this pattern. At least a starting point since I know it's a long and probably is going to be a really complicated one...but if someone could point out general things that are wrong with it.

As a raw path (that is, not written as a C# string literal), it looks like this:

\\Apple-butter27\AliceFakePlace\SomeDay\Grand100\Some File Name Stuff\Yes these are fake words\One more for fun2000343\myText.txt

I'm new to attempting pattern matching with UNC paths so it's starting to really confuse me so if someone can light the way, I'd appreciate it a lot.

I'm using the .Success function of Regex to see if the patterns match and I'm simply printing a message if the match was success or failure. My main focus is the pattern unless there's some good insight on working with the path as something other than a string for a solution.


Solution

No regex required

Use the built-in parsing of the System.Uri class:

foreach (var path in new [] { @"C:\foo\bar\", @"\\server\bar" })
{
    var uri = new Uri(path);

    if (uri.IsUnc)
    {
        Console.WriteLine("Connects to host '{0}'", uri.Host);
    }
    else
    {
        Console.WriteLine("Local path");
    }
}

Prints:

Local path
Connects to host 'server'
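
Note that new Uri(path) throws a UriFormatException when the string cannot be parsed, which can be awkward when validating input. A minimal sketch of a non-throwing variant follows; the helper name TryGetUncHost is made up for illustration and is not part of the answer above, while Uri.TryCreate, Uri.IsUnc and Uri.Host are standard framework members.

```csharp
using System;

static class UncCheck
{
    // Hypothetical helper: returns true and the host name when the
    // input parses as an absolute URI that is also a UNC path.
    public static bool TryGetUncHost(string path, out string host)
    {
        host = null;
        Uri uri;

        // TryCreate returns false instead of throwing on malformed input
        if (!Uri.TryCreate(path, UriKind.Absolute, out uri) || !uri.IsUnc)
        {
            return false;
        }

        host = uri.Host;
        return true;
    }
}
```

A caller would check the return value first and only then read host, in the same way as int.TryParse.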

And if you are trying to match against the extension, don't re-invent the wheel, use Path.GetExtension:

var path = @"\\some\really long and complicated path\foo.txt";
var extensionOfPath = Path.GetExtension(path);

if (string.Equals(".txt", extensionOfPath, StringComparison.CurrentCultureIgnoreCase))
{
    Console.WriteLine("It's a txt");
}
else
{
    Console.WriteLine("It's a '{0}', which is not a txt", extensionOfPath);
}

In general, I recommend you avoid jumping to regex when solving a problem. Ask yourself first whether someone else has already solved the problem for you (example for HTML). There is good discussion of why regex has a bad rep on CodingHorror and (less seriously) on xkcd.
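
For the specific check in the question (a UNC path with a fixed number of folders and a .txt file at the end), plain string operations may already be enough. The following is only a sketch under assumptions: the helper name HasExpectedShape is hypothetical, and it assumes the server name counts as the first of the 7 folders, which is how the sample path in the question works out.

```csharp
using System;
using System.IO;

static class UncPathCheck
{
    // Hypothetical helper: true when path looks like
    // \\segment1\segment2\...\segmentN\file<extension>
    public static bool HasExpectedShape(string path, int folderCount, string extension)
    {
        if (string.IsNullOrEmpty(path) || !path.StartsWith(@"\\", StringComparison.Ordinal))
        {
            return false;
        }

        // Drop the leading \\ and split the rest on single backslashes
        string[] segments = path.Substring(2).Split('\\');

        // folderCount folder segments plus one trailing file name
        if (segments.Length != folderCount + 1)
        {
            return false;
        }

        // Reject empty or whitespace-only segments such as "\\server\\\file.txt"
        foreach (string segment in segments)
        {
            if (segment.Trim().Length == 0)
            {
                return false;
            }
        }

        string fileName = segments[segments.Length - 1];
        return string.Equals(Path.GetExtension(fileName), extension,
            StringComparison.OrdinalIgnoreCase);
    }
}
```

With the sample path from the question, HasExpectedShape(path, 7, ".txt") would be the call to make; adjust the count if the server should not be counted as a folder.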

Regex Version

If you are bent on using Regex, which I maintain is not the best tool for the job, it can be done. Use spacing and comments to ensure your code is readable.

string input = @"\\Apple-butter27\AliceFakePlace\SomeDay\Grand100\Some File Name Stuff\Yes these are fake words\One more for fun2000343\myText.txt";
Regex regex = new Regex(@"
    ^
    (?:
        # if server is present, capture to a named group
        # use a noncapturing group to remove the surrounding slashes
        # * is a greedy match, so it will butt up against the following directory search
        # this group may or may not occur, so we allow either this or the drive to match (|)
        (?:\\\\(?<server>[^\\]*)\\)
        # if there is no server, then we best have a drive letter
        |(?:(?<drive>[A-Z]):\\)
    )
    # then we have a repeating group (+) to capture all the directory components
    (?:
        # each directory is composed of a name (which does not contain \\)
        # followed by \\
        (?<directory>[^\\]*)\\
    )+
    # then we have a file name, which is identifiable as we already ate the rest of
    # the string.  So, it is just all non-\\ characters at the end.
    (?<file>[^\\]*)
    $", RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

var matches = regex.Match(input).Groups;

foreach (var group in regex.GetGroupNames())
{
    Console.WriteLine("Matched {0}:", group);
    foreach (var value in matches[group].Captures.Cast<Capture>())
    {
        Console.WriteLine("\t{0}", value.Value);
    }
}

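Prints (after a leading "Matched 0:" entry, which holds the entire matched string, because GetGroupNames also returns group 0):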

Matched server:
        Apple-butter27
Matched drive:
Matched directory:
        AliceFakePlace
        SomeDay
        Grand100
        Some File Name Stuff
        Yes these are fake words
        One more for fun2000343
Matched file:
        myText.txt
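
Because every repetition of the directory group is kept in its Captures collection, the number of folders can be checked after the match. A small sketch, assuming a hypothetical helper named CountDirectories; the pattern is the same one as above, written on a single line without the comments.

```csharp
using System;
using System.Text.RegularExpressions;

static class PathDepth
{
    private static readonly Regex PathRegex = new Regex(
        @"^(?:\\\\(?<server>[^\\]*)\\|(?<drive>[A-Z]):\\)(?:(?<directory>[^\\]*)\\)+(?<file>[^\\]*)$",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: number of directory components after the
    // server or drive, or -1 when the path does not match at all.
    public static int CountDirectories(string path)
    {
        Match match = PathRegex.Match(path);
        return match.Success ? match.Groups["directory"].Captures.Count : -1;
    }
}
```

For the sample input this counts the six folders between the server name and the file, so a depth check would compare against 6 if the server is treated as the seventh segment.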

I'm just guessing now...

It sounds like you have some sort of application which calls a directory its home and builds a multi-layer structure underneath. Something like the following:

C:\
  root directory for the application\
    site name\
      date of work\
        project name\
          bar\
            actual.txt
            files.txt

And you are looking for the actual files, or not, I can't tell. Either way, we know about C:\root directory\ and think it may have actual files. We can then take the directory tree and enumerate to find the actual files:

var diRoot = new DirectoryInfo(@"C:\drop");

var projectDirectories = FindProjects(diRoot);

// get all of the files in all of the project directories of type .txt
var projectFiles = projectDirectories.SelectMany(di => di.GetFiles("*.txt"));

// projectFiles now contains:
//  actual.txt
//  files.txt

private static IEnumerable<DirectoryInfo> FindProjects(DirectoryInfo cDir, int depth = 0)
{
    foreach (var di in cDir.GetDirectories())
    {
        // assume project directories sit four levels below the root (depth is zero-based)
        if (depth == 3)
        {
            // it's a project, so we can return it
            yield return di;
        }
        else
        {
            // pass it through, return the results
            foreach (var d in FindProjects(di, depth + 1))
                yield return d;
        }
    }
}

And since we are not doing string manipulation of paths, we can handle local and UNC paths transparently.

OTHER TIPS

If you're trying to check if a path exists, you can do something like this:

FileInfo fi = new FileInfo(@"\\Apple-butter27\AliceFakePlace\SomeDay\Grand100\Some File Name Stuff\Yes these are fake words\One more for fun2000343\myText.txt");
bool exists = fi.Exists;

But if you don't have access to these paths at the point where you run validation, you can use this pattern to find \\Apple-butter27:

const string rootPattern = @"(\\\\[a-zA-Z0-9_-]+)";

const RegexOptions regexOptions = RegexOptions.Compiled;

var regex = new Regex(rootPattern, regexOptions);

// fileName is the path being validated; shareRoot receives the matched root
string shareRoot = null;

foreach (Match match in regex.Matches(fileName))
{
    // Groups[0] is always the entire match
    if (match.Success)
    {
        shareRoot = match.Groups[0].Value;
    }
}

I tried this pattern and group 0 gives me exactly \\Apple-butter27. You will have to add any other characters you might encounter, such as '.', inside the [brackets].
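
The doubled backslashes are easy to miscount because two layers of escaping are involved: the C# string literal and the regex syntax. The sketch below (the class and method names are made up for illustration) anchors the same kind of pattern at the start of the string and shows that a regular literal with four backslashes and a verbatim literal with two describe the same text.

```csharp
using System;
using System.Text.RegularExpressions;

static class ShareRoot
{
    // Hypothetical helper: returns the leading \\server part, or null.
    public static string Find(string path)
    {
        // Verbatim pattern: each \\ is one escaped backslash for the regex
        // engine, so \\\\ matches the two backslashes that open a UNC path.
        Match root = Regex.Match(path, @"^\\\\[a-zA-Z0-9_-]+");
        return root.Success ? root.Value : null;
    }

    public static void Main()
    {
        // Regular literal: each \\ is a single backslash in the actual string
        string regular = "\\\\Apple-butter27\\AliceFakePlace";
        // Verbatim literal: backslashes are taken exactly as written
        string verbatim = @"\\Apple-butter27\AliceFakePlace";

        Console.WriteLine(regular == verbatim); // True
        Console.WriteLine(Find(regular));       // \\Apple-butter27
    }
}
```

Doubling the backslashes again (eight in a verbatim pattern) would ask the engine for four literal backslashes, which the path does not contain.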

While I can't disagree with the usage of System.Uri (which might be the tool you need), I'll assume that we strictly need to adhere to a pattern-matching Regex:

        const string tString = "\\\\Apple-butter27\\AliceFakePlace\\SomeDay\\Grand100\\Some File Name Stuff\\Yes these are fake words\\One more for fun2000343\\myText.txt";
        const string tRegexPattern = @"(\\\\)?((?<Folder>[a-zA-Z0-9- ]+)(\\))";
        const RegexOptions tRegexOptions = RegexOptions.Compiled;

        Regex tRegex = new Regex(tRegexPattern, tRegexOptions);

        Console.WriteLine(tString);

        if (tRegex.Matches(tString).Count == 7)
        {
            foreach (Match iMatch in tRegex.Matches(tString))
            {
                if (iMatch.Success && iMatch.Groups["Folder"].Length > 0)
                {
                    Console.WriteLine(iMatch.Groups["Folder"].Value);
                }
            }
        }
        else
            throw new Exception("String did not have a path of depth 7");

While you can force the regex to only match 7 groups, Regex is really designed for pattern matching, not 'loopy-logic'.

The ?<Folder> group will match only when followed by the delimiter (trailing '\'), hence it will only match on the folder pattern and not the file or file extension.
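
For comparison, the depth requirement can also be written into the pattern with a counted quantifier rather than by counting matches afterwards. This is only a sketch under assumptions: the names are hypothetical, the server is counted as the first of 7 folder segments, and each segment is matched with the looser class [^\\]+ (anything except a backslash) instead of the character list used above.

```csharp
using System;
using System.Text.RegularExpressions;

static class FixedDepthPattern
{
    // \\server, then exactly six more \folder segments, then \name.txt
    private static readonly Regex SevenFolders = new Regex(
        @"^\\\\[^\\]+(?:\\[^\\]+){6}\\[^\\]+\.txt$",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: true when the path has the fixed depth and a .txt file
    public static bool IsMatch(string path)
    {
        return SevenFolders.IsMatch(path);
    }
}
```

The trade-off is that the depth is now buried in the {6}, so a failed match no longer tells you whether the depth or the file name was the problem.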

Licensed under: CC-BY-SA with attribution