Not exactly sure what you are asking, but I assume that you have a starting list of schemas that you are working off of, and that list is unique (no subsets, etc).
Define a simple IsSubsetOf()
function, and then call that as part of a Linq 'any' call, or you can do it in a for-loop:
var startingSchemas = new [] { "100**101", "110*101*", "1111*000" };
startingSchemas.Any(x => IsSubsetOf(x, "11****11")); // false
startingSchemas.Any(x => IsSubsetOf(x, "11011010")); // true
public bool IsSubsetOf(string main, string sub)
{
for (var i = 0; i < main.Length; i++)
{
if (main[i] == '*') continue; // main is '*', so anything is ok
if (main[i] == sub[i]) continue; // equal is ok, both 1/0/*
return false; // if not equal, sub[i] could be *, or the opposite of main[i]
}
return true;
}
One issue that I think you might need to clarify is what you want to do when you find something that is NOT a subset, but when combined with another schema then it is.
1*1 => 101 or 111
0*1 => 001 or 011 (not a subset of 1*)
But these two combined = the **1
schema or {001, 011, 101, 111}
Do you want to take a string list and then reduce it to the minimal set of schemas that will still match the same inputs? IsSubset(x,y) = false
, but IsSubset(y,x) = true
Addition: Making the starting data unique is pretty easy, if it isn't already:
var uniqueSchemas = startingSchemas.Distinct().ToList();
uniqueSchemas.RemoveAll(x => uniqueSchemas.Any(y => y != x && IsSubsetOf(y, x)));
Compiled (release, no pdb, optimize):
for (int index = 0; index < main.Length; ++index)
{
if ((int) main[index] != 42 && (int) main[index] != (int) sub[index])
return false;
}
Performance
Very crude performance check. Run in parallels VM on an i7/4gb 1 core allocated to vm, other processes running, etc.
- 200 schemas (randomly generated, 800 length)
- 1 test string (800 length, randomly generated the same way)
- 1 millions runs each
Output: (all runs were +/- 500ms at the outside, usually in unison)
// unsafe = new 'public static void unsafe IsSubsetOf' function using pointers
// safe = standard method
Any() warmup : elapsed = 11965 (.0120 ms)
Any() safe : elapsed = 11300 (.0113 ms)
Any() unsafe : elapsed = 10754 (.0108 ms)
for() safe : elapsed = 11480 (.0115 ms)
for() unsafe : elapsed = 7964 (.008 ms)
So, that what I get from this. If there is a clever data structure for this, i've no clue.
Unsafe Version
This isn't guaranteed to be 100% correct. I don't normally do this, and I don't know if the discrepancy I saw was because of the test harness or the code. Also, disclaimer, it's been a good 6 years since I wrote a tiny bit of unsafe code. But I don't pump .net for performance this way, there is usually a bigger bottleneck... If you do use unsafe
code, my only advice would be to NOT modify anything. If you just read
you should be pretty safe. Check all your bounds!
private unsafe static bool IsSubsetOfUnsafe(String main, String sub)
{
if (main == null && sub == null) return true; // is this what you want? decide
if (main == null || sub == null) return false; // this too? maybe if null then say "true" and discard?
if (main.Length != sub.Length) return false;
fixed (char* m = main)
fixed (char* s = sub)
{
var m1 = m;
var s1 = s;
int len = main.Length;
for (int i = 0; i < len; ++i)
{
if ((int)m1 != 42 && m1 != s1) return false;
m1++;
s1++;
}
return true;
}
}