Question

Is there a built-in mechanism in .NET to match patterns other than Regular Expressions? I'd like to match using UNIX style (glob) wildcards (* = any number of any character).

I'd like to use this for an end-user facing control. I fear that permitting all RegEx capabilities will be very confusing.

Solution

I found the actual code for you:

Regex.Escape( wildcardExpression ).Replace( @"\*", ".*" ).Replace( @"\?", "." );

OTHER TIPS

I like my code a little more semantic, so I wrote this extension method:

using System.Text.RegularExpressions;

namespace Whatever
{
    public static class StringExtensions
    {
        /// <summary>
        /// Compares the string against a given pattern.
        /// </summary>
        /// <param name="str">The string.</param>
        /// <param name="pattern">The pattern to match, where "*" means any sequence of characters, and "?" means any single character.</param>
        /// <returns><c>true</c> if the string matches the given pattern; otherwise <c>false</c>.</returns>
        public static bool Like(this string str, string pattern)
        {
            return new Regex(
                "^" + Regex.Escape(pattern).Replace(@"\*", ".*").Replace(@"\?", ".") + "$",
                RegexOptions.IgnoreCase | RegexOptions.Singleline
            ).IsMatch(str);
        }
    }
}

(change the namespace and/or copy the extension method to your own string extensions class)

Using this extension, you can write statements like this:

if (File.Name.Like("*.jpg"))
{
   ....
}

Just sugar to make your code a little more legible :-)

For the sake of completeness: since 2016, .NET Core has had a NuGet package called Microsoft.Extensions.FileSystemGlobbing that supports advanced globbing paths.

Some examples are searching for wildcard nested folder structures and files, which is very common in web development scenarios:

  • wwwroot/app/**/*.module.js
  • wwwroot/app/**/*.js

This works somewhat similarly to how .gitignore files determine which files to exclude from source control.
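A minimal sketch of how such patterns might be passed to the package's Matcher class, assuming the Microsoft.Extensions.FileSystemGlobbing NuGet package is referenced. The names GlobbingDemo and FindScripts, and the decision to exclude the *.module.js files, are illustrative assumptions rather than anything prescribed by the package:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.Extensions.FileSystemGlobbing;

static class GlobbingDemo
{
    // Hypothetical helper: returns full paths of .js files under
    // <root>/wwwroot/app at any depth, skipping *.module.js files.
    public static IEnumerable<string> FindScripts(string root)
    {
        var matcher = new Matcher();
        matcher.AddInclude("wwwroot/app/**/*.js");
        matcher.AddExclude("wwwroot/app/**/*.module.js");

        // Walks the directory tree below root and applies the patterns.
        return matcher.GetResultsInFullPath(root);
    }

    static void Main(string[] args)
    {
        // Assumption: the root folder is the first argument, else the current directory.
        string root = args.Length > 0 ? args[0] : Directory.GetCurrentDirectory();
        foreach (string path in FindScripts(root))
        {
            Console.WriteLine(path);
        }
    }
}
```

Include and exclude patterns are combined on one Matcher, so the exclude line can simply be dropped if every .js file is wanted.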

The 2- and 3-argument variants of the listing methods like GetFiles() and EnumerateDirectories() take a search string as their second argument that supports filename globbing, with both * and ?.

class GlobTestMain
{
    static void Main(string[] args)
    {
        string[] exes = Directory.GetFiles(Environment.CurrentDirectory, "*.exe");
        foreach (string file in exes)
        {
            Console.WriteLine(Path.GetFileName(file));
        }
    }
}

would yield

GlobTest.exe
GlobTest.vshost.exe

The docs state that there are some caveats with matching extensions. They also state that 8.3 file names are matched (which may be generated automatically behind the scenes), which can result in "duplicate" matches for some patterns.

The methods that support this are GetFiles(), GetDirectories(), and GetFileSystemEntries(). The Enumerate variants also support this.
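As a rough sketch of the Enumerate variants mentioned above, the following wraps Directory.EnumerateFiles and uses the three-argument overload to choose between the top directory only and all subdirectories. The class EnumerateDemo, the helper FindNames, and the sample file names are made up for illustration:

```csharp
using System;
using System.IO;
using System.Linq;

static class EnumerateDemo
{
    // Hypothetical helper: returns the sorted file names (without directories)
    // under root that match the search pattern.
    public static string[] FindNames(string root, string searchPattern, bool recurse)
    {
        SearchOption option = recurse
            ? SearchOption.AllDirectories
            : SearchOption.TopDirectoryOnly;

        return Directory.EnumerateFiles(root, searchPattern, option)
                        .Select(p => Path.GetFileName(p))
                        .OrderBy(n => n, StringComparer.Ordinal)
                        .ToArray();
    }

    static void Main()
    {
        // Build a small throwaway tree so the example is self-contained.
        string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(Path.Combine(root, "sub"));
        File.WriteAllText(Path.Combine(root, "a.txt"), "");
        File.WriteAllText(Path.Combine(root, "b.log"), "");
        File.WriteAllText(Path.Combine(root, "sub", "c.txt"), "");

        Console.WriteLine(string.Join(", ", FindNames(root, "*.txt", false))); // a.txt
        Console.WriteLine(string.Join(", ", FindNames(root, "*.txt", true)));  // a.txt, c.txt

        Directory.Delete(root, true);
    }
}
```

EnumerateFiles yields results lazily, which may be preferable to GetFiles for large trees since the caller can stop early.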

If you use VB.NET, you can use the Like operator, which has glob-like syntax.

http://www.getdotnetcode.com/gdncstore/free/Articles/Intoduction%20to%20the%20VB%20NET%20Like%20Operator.htm

I wrote a FileSelector class that does selection of files based on filenames. It also selects files based on time, size, and attributes. If you just want filename globbing then you express the name in forms like "*.txt" and similar. If you want the other parameters then you specify a boolean logic statement like "name = *.xls and ctime < 2009-01-01" - implying an .xls file created before January 1st 2009. You can also select based on the negative: "name != *.xls" means all files that are not xls.

Check it out. Open source. Liberal license. Free to use elsewhere.

If you want to avoid regular expressions this is a basic glob implementation:

public static class Globber
{
    public static bool Glob(this string value, string pattern)
    {
        int pos = 0;

        while (pattern.Length != pos)
        {
            switch (pattern[pos])
            {
                case '?':
                    break;

                case '*':
                    for (int i = value.Length; i >= pos; i--)
                    {
                        if (Glob(value.Substring(i), pattern.Substring(pos + 1)))
                        {
                            return true;
                        }
                    }
                    return false;

                default:
                    if (value.Length == pos || char.ToUpper(pattern[pos]) != char.ToUpper(value[pos]))
                    {
                        return false;
                    }
                    break;
            }

            pos++;
        }

        return value.Length == pos;
    }
}

Use it like this:

Assert.IsTrue("text.txt".Glob("*.txt"));
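For comparison, here is a sketch of an iterative variant that avoids recursion and Substring allocations by remembering the position of the last "*" and backtracking to it on a mismatch. This is a different technique from the recursive code above, not a rewrite of it, and the names GlobMatcher and IsMatch are assumptions chosen for the example:

```csharp
using System;

static class GlobMatcher
{
    // Case-insensitive glob match: '*' matches any run of characters
    // (including none), '?' matches exactly one character.
    public static bool IsMatch(string text, string pattern)
    {
        int t = 0, p = 0;   // current positions in text and pattern
        int starP = -1;     // position of the most recent '*' in pattern
        int starT = 0;      // text position that '*' is currently matched up to

        while (t < text.Length)
        {
            if (p < pattern.Length && pattern[p] == '*')
            {
                // Record the star and first try matching it against nothing.
                starP = p++;
                starT = t;
            }
            else if (p < pattern.Length &&
                     (pattern[p] == '?' ||
                      char.ToUpperInvariant(pattern[p]) == char.ToUpperInvariant(text[t])))
            {
                p++;
                t++;
            }
            else if (starP >= 0)
            {
                // Mismatch: let the last '*' swallow one more character and retry.
                p = starP + 1;
                t = ++starT;
            }
            else
            {
                return false;
            }
        }

        // Any trailing '*' in the pattern can match the empty remainder.
        while (p < pattern.Length && pattern[p] == '*') p++;
        return p == pattern.Length;
    }

    static void Main()
    {
        Console.WriteLine(IsMatch("text.txt", "*.txt")); // True
        Console.WriteLine(IsMatch("text.txt", "*.jpg")); // False
        Console.WriteLine(IsMatch("a", "a?"));           // False
    }
}
```

Only the most recent "*" needs to be remembered, because an earlier star gains nothing from backtracking once a later one has been reached.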

I have written a globbing library for .NETStandard, with tests and benchmarks. My goal was to produce a library for .NET, with minimal dependencies, that doesn't use Regex, and outperforms Regex.

You can find it here:

https://www.nuget.org/packages/Glob.cs

https://github.com/mganss/Glob.cs

A GNU Glob for .NET.

You can get rid of the package reference after installing and just compile the single Glob.cs source file.

And since it's an implementation of GNU Glob, it's cross-platform, and cross-language too once you find a similar implementation elsewhere. Enjoy!

I don't know if the .NET framework has glob matching, but couldn't you replace the * with .*? and use regexes?

Based on previous posts, I threw together a C# class:

using System;
using System.Text.RegularExpressions;

public class FileWildcard
{
    Regex mRegex;

    public FileWildcard(string wildcard)
    {
        string pattern = string.Format("^{0}$", Regex.Escape(wildcard)
            .Replace(@"\*", ".*").Replace(@"\?", "."));
        mRegex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
    public bool IsMatch(string filenameToCompare)
    {
        return mRegex.IsMatch(filenameToCompare);
    }
}

Using it would go something like this:

FileWildcard w = new FileWildcard("*.txt");
if (w.IsMatch("Doug.Txt"))
   Console.WriteLine("We have a match");

The matching is NOT the same as the System.IO.Directory.GetFiles() method, so don't use them together.

From C# you can use .NET's LikeOperator.LikeString method. That's the backing implementation for VB's LIKE operator. It supports patterns using *, ?, #, [charlist], and [!charlist].

You can use the LikeString method from C# by adding a reference to the Microsoft.VisualBasic.dll assembly, which is included with every version of the .NET Framework. Then you invoke the LikeString method just like any other static .NET method:

using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;
...
bool isMatch = LikeOperator.LikeString("I love .NET!", "I love *", CompareMethod.Text);
// isMatch should be true.
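The other pattern elements listed above can be tried the same way. The sketch below assumes the same reference to the Microsoft.VisualBasic assembly; the wrapper class LikeDemo and the sample strings are invented for illustration:

```csharp
using System;
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;

static class LikeDemo
{
    // Thin hypothetical wrapper so call sites read like the VB operator.
    public static bool Like(string input, string pattern)
    {
        return LikeOperator.LikeString(input, pattern, CompareMethod.Text);
    }

    static void Main()
    {
        Console.WriteLine(Like("report7.xls", "report#.xls")); // '#' matches a single digit
        Console.WriteLine(Like("b.txt", "[a-c].txt"));         // '[charlist]' matches one listed character
        Console.WriteLine(Like("d.txt", "[!a-c].txt"));        // '[!charlist]' matches one character not listed
        Console.WriteLine(Like("d.txt", "[a-c].txt"));         // 'd' is outside the range, so no match
    }
}
```

Note that "#" and "[" are special here, so a user who types them expecting plain glob behaviour would need them escaped by enclosing them in brackets, for example "[#]".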

Out of curiosity I glanced at Microsoft.Extensions.FileSystemGlobbing, and it dragged in quite large dependencies on quite a few libraries, so I wondered why I couldn't try to write something similar myself.

Easier said than done: I quickly noticed that it is not such a trivial function after all. For example, "*.txt" should match files only in the current directory, while "**.txt" should also harvest subfolders.

Microsoft also tests some odd pattern sequences like "./*.txt". I'm not sure who actually needs the "./" prefix, since it is removed during processing anyway. (https://github.com/aspnet/FileSystem/blob/dev/test/Microsoft.Extensions.FileSystemGlobbing.Tests/PatternMatchingTests.cs)

Anyway, I've coded my own function. There are two copies of it: one in SVN (which I might bugfix later on) and a sample copied here for demo purposes. I recommend copying it from the SVN link.

SVN Link:

https://sourceforge.net/p/syncproj/code/HEAD/tree/SolutionProjectBuilder.cs#l800 (Search for matchFiles function if not jumped correctly).

And here is the local copy of the function:

/// <summary>
/// Matches files from folder _dir using a glob file pattern.
/// In glob file pattern matching, * matches any file or folder name, and ** matches any path (including sub-folders).
/// ? matches any single character.
/// 
/// There is also a third-party library for performing similar matching, 'Microsoft.Extensions.FileSystemGlobbing',
/// but it dragged in a lot of dependencies, so I decided to do without it.
/// </summary>
/// <returns>List of files matching the pattern, relative to _dir</returns>
static public String[] matchFiles( String _dir, String filePattern )
{
    if (filePattern.IndexOfAny(new char[] { '*', '?' }) == -1)      // Speed up matching: with no asterisk / wildcard, it can simply be a file path.
    {
        String path = Path.Combine(_dir, filePattern);
        if (File.Exists(path))
            return new String[] { filePattern };
        return new String[] { };
    }

    String dir = Path.GetFullPath(_dir);        // Make it absolute, just so we can extract relative paths later on.
    String[] pattParts = filePattern.Replace("/", "\\").Split('\\');
    List<String> scanDirs = new List<string>();
    scanDirs.Add(dir);

    //
    //  By default, "*" in a glob pattern matches any file / folder name,
    //  which corresponds to any character except the folder separator - in regex that's "[^\\]*".
    //  Glob matching also allows a double asterisk "**", which also recurses into subfolders.
    //  We split the match pattern into parts here and match each part separately.
    //
    for (int iPatt = 0; iPatt < pattParts.Length; iPatt++)
    {
        bool bIsLast = iPatt == (pattParts.Length - 1);
        bool bRecurse = false;

        String regex1 = Regex.Escape(pattParts[iPatt]);         // Escape special regex control characters ("*" => "\*", "." => "\.")
        String pattern = Regex.Replace(regex1, @"\\\*(\\\*)?", delegate (Match m)
            {
                if (m.ToString().Length == 4)   // "**" => "\*\*" (escaped) - we need to recurse into sub-folders.
                {
                    bRecurse = true;
                    return ".*";
                }
                else
                    return @"[^\\]*";
            }).Replace(@"\?", ".");

        if (pattParts[iPatt] == "..")                           // Special kind of control, just to scan upper folder.
        {
            for (int i = 0; i < scanDirs.Count; i++)
                scanDirs[i] = scanDirs[i] + "\\..";

            continue;
        }

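        // Anchor the pattern so the whole name must match; unanchored, "*.txt" would also match "a.txt.bak".
        Regex re = new Regex("^" + pattern + "$", RegexOptions.Compiled | RegexOptions.IgnoreCase);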
        int nScanItems = scanDirs.Count;
        for (int i = 0; i < nScanItems; i++)
        {
            String[] items;
            if (!bIsLast)
                items = Directory.GetDirectories(scanDirs[i], "*", (bRecurse) ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
            else
                items = Directory.GetFiles(scanDirs[i], "*", (bRecurse) ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);

            foreach (String path in items)
            {
                String matchSubPath = path.Substring(scanDirs[i].Length + 1);
                if (re.Match(matchSubPath).Success)
                    scanDirs.Add(path);
            }
        }
        scanDirs.RemoveRange(0, nScanItems);    // Remove the items that we have just scanned.
    } //for

    //  Make relative and return.
    return scanDirs.Select( x => x.Substring(dir.Length + 1) ).ToArray();
} //matchFiles

If you find any bugs, I'll be glad to fix them.

I wrote a solution that does it. It does not depend on any library and it does not support "!" or "[]" operators. It supports the following search patterns:

C:\Logs\*.txt

C:\Logs\**\*P1?\**\asd*.pdf

    /// <summary>
    /// Finds files for the given glob path. It supports the **, * and ? operators. It does not support the !, [] or ![] operators.
    /// </summary>
    /// <param name="path">the path</param>
    /// <returns>The files that match the glob</returns>
    private ICollection<FileInfo> FindFiles(string path)
    {
        List<FileInfo> result = new List<FileInfo>();
        //The name of the file can be any but the following chars '<','>',':','/','\','|','?','*','"'
        const string folderNameCharRegExp = @"[^\<\>:/\\\|\?\*" + "\"]";
        const string folderNameRegExp = folderNameCharRegExp + "+";
        //We obtain the file pattern
        string filePattern = Path.GetFileName(path);
        List<string> pathTokens = new List<string>(Path.GetDirectoryName(path).Split('\\', '/'));
        //We obtain the root path from which the rest of the files will be obtained
        string rootPath = null;
        bool containsWildcardsInDirectories = false;
        for (int i = 0; i < pathTokens.Count; i++)
        {
            if (!pathTokens[i].Contains("*")
                && !pathTokens[i].Contains("?"))
            {
                if (rootPath != null)
                    rootPath += "\\" + pathTokens[i];
                else
                    rootPath = pathTokens[i];
                pathTokens.RemoveAt(0);
                i--;
            }
            else
            {
                containsWildcardsInDirectories = true;
                break;
            }
        }
        if (Directory.Exists(rootPath))
        {
            //We build the regular expression that the folders should match
            //Regex.Escape also handles characters such as '.', '(' and ')' that can occur in folder names
            string regularExpression = Regex.Escape(rootPath);
            foreach (string pathToken in pathTokens)
            {
                if (pathToken == "**")
                {
                    regularExpression += string.Format(CultureInfo.InvariantCulture, @"(\\{0})*", folderNameRegExp);
                }
                else
                {
                    regularExpression += @"\\" + pathToken.Replace("*", folderNameCharRegExp + "*").Replace(" ", "\\s").Replace("?", folderNameCharRegExp);
                }
            }
            Regex globRegEx = new Regex(regularExpression, RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
            string[] directories = Directory.GetDirectories(rootPath, "*", containsWildcardsInDirectories ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
            foreach (string directory in directories)
            {
                if (globRegEx.Matches(directory).Count > 0)
                {
                    DirectoryInfo directoryInfo = new DirectoryInfo(directory);
                    result.AddRange(directoryInfo.GetFiles(filePattern));
                }
            }

        }
        return result;
    }
Licensed under: CC-BY-SA with attribution