Question

I am managing a directory of files. Each file will be named similarly to Image_000000.png, with the numeric portion being incremented for each file that is stored.

Files can also be deleted, leaving gaps in the number sequence. The reason I am asking is because I recognize that at some point in the future, the user could use up the number sequence unless I takes steps to reuse numbers when they become available. I realize that it is a million, and that's a lot, but we have 20-plus year users, so "someday" is not out of the question.

So, I am specifically asking whether or not there exists a way to easily determine the gaps in the sequence without simply looping. I realize that because it's a fixed range, I could simply loop over the expected range.

And I will unless there is a better/cleaner/easier/faster alternative. If so, I'd like to know about it.

This method is called to obtain the next available file name:

public static String GetNextImageFileName()
{
    String retFile = null;
    DirectoryInfo di = new DirectoryInfo(userVars.ImageDirectory);
    FileInfo[] fia = di.GetFiles("*.*", SearchOption.TopDirectoryOnly);
    String lastFile = fia.Where(i => i.Name.StartsWith("Image_") && i.Name.Substring(6, 6).ContainsOnlyDigits()).OrderBy(i => i.Name).Last().Name;
    if (!String.IsNullOrEmpty(lastFile))
    {
        Int32 num;
        String strNum = lastFile.Substring(6, 6);
        String strExt = lastFile.Substring(13);
        if (!String.IsNullOrEmpty(strNum) && 
            !String.IsNullOrEmpty(strExt) && 
            strNum.ContainsOnlyDigits() &&
            Int32.TryParse(strNum, out num))
        {
            num++;
            retFile = String.Format("Image_{0:D6}.{1}", num, strExt);
            while (num <= 999999 && File.Exists(retFile))
            {
                num++;
                retFile = String.Format("Image_{0:D6}.{1}", num, strExt);
            }
        }
    }

    return retFile;
}

EDIT: in case it helps anyone, here is the final method, incorporating Daniel Hilgarth's answer:

public static String GetNextImageFileName()
{
    DirectoryInfo di = new DirectoryInfo(userVars.ImageDirectory);
    FileInfo[] fia = di.GetFiles("Image_*.*", SearchOption.TopDirectoryOnly);
    List<Int32> fileNums = new List<Int32>();
    foreach (FileInfo fi in fia)
    {
        Int32 i;
        if (Int32.TryParse(fi.Name.Substring(6, 6), out i))
            fileNums.Add(i);
    }
    var result = fileNums.Select((x, i) => new { Index = i, Value = x })
                .Where(x => x.Index != x.Value)
                .Select(x => (Int32?)x.Index)
                .FirstOrDefault();

    Int32 index;
    if (result == null)
        index = fileNums.Count - 1;
    else
        index = result.Value - 1;

    var nextNumber = fileNums[index] + 1;

    if (nextNumber >= 0 && nextNumber <= 999999)
        return String.Format("Image_{0:D6}", result.Value);

    return null;
}
Was it helpful?

Solution

A very simple approach to find the first number of the first gap would be the following:

int[] existingNumbers = /* extract all numbers from all filenames and order them */
var allNumbers = Enumerable.Range(0, 1000000);
var result = allNumbers.Where(x => !existingNumbers.Contains(x)).First();

This will return 1,000,000 if all numbers have been used and no gaps exist.

This approach has the drawback that it performs rather badly, as it iterates existingNumbers multiple times.

A somewhat better approach would be to use Zip:

allNumbers.Zip(existingNumbers, (a, e) => new { Number = a, ExistingNumber = e })
          .Where(x => x.Number != x.ExistingNumber)
          .Select(x => x.Number)
          .First();

An improved version of DuckMaestro's answer that actually returns the first value of the first gap - and not the first value after the first gap - would look like this:

var tmp = existingNumbers.Select((x, i) => new { Index = i, Value = x })
                         .Where(x => x.Index != x.Value)
                         .Select(x => (int?)x.Index)
                         .FirstOrDefault();

int index;
if(tmp == null)
    index = existingNumbers.Length - 1;
else
    index = tmp.Value - 1;

var nextNumber = existingNumbers[index] + 1;

OTHER TIPS

Improving over the other answer, use the alternate version of Where.

int[] existingNumbers = ...
var result = existingNumbers.Where( (x,i) => x != i ).FirstOrDefault();

The value i is a counter starting at 0.

This version of where is supported in .NET 3.5 (http://msdn.microsoft.com/en-us/library/bb549418(v=vs.90).aspx).

var firstnonexistingfile = Enumerable.Range(0,999999).Select(x => String.Format("Image_{0:D6}.{1}", x, strExt)).FirstOrDefault(x => !File.Exists(x));

This will iterate from 0 to 999999, then output the result of the String.Format() as an IEnumerable<string> and then find the first string out of that sequence that returns false for File.Exists().

It's an old question, but it has been suggested (in the comments) that you could use .Except() instead. I tend to like this solution a little better since it will give you the first missing number (the gap) or the next smallest number in the sequence. Here's an example:

var allNumbers = Enumerable.Range(0, 999999); //999999 is arbitrary. You could use int.MaxValue, but it would degrade performance
var existingNumbers = new int[] { 0, 1, 2, 4, 5, 6 };

int result;
var missingNumbers = allNumbers.Except(existingNumbers);
if (missingNumbers.Any())
  result = missingNumbers.First();
else //no missing numbers -- you've reached the max
  result = -1;

Running the above code would set result to:

3

Additionally, if you changed existingNumbers to:

var existingNumbers = new int[] { 0, 1, 3, 2, 4, 5, 6 };

So there isn't a gap, you would get 7 back.

Anyway, that's why I prefer Except over the Zip solution -- just my two cents. Thanks!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top