Question

I am creating a data quality framework for a database that looks at single cells of each data type and sees whether or not their values are acceptable.

For the string data type, I just use a regular expression to define what is valid.

For other data types (Integer, Timestamp, Boolean, TimeDelta, Float, ...), I don't have any standard way of recording what is valid.

Is there an equivalent to regular expressions for other data types, something like an "IntegerRegEx"? For example, let's say I have a field that must contain numbers between 0 and 65535, or a field that can only contain odd numbers...

It would be nice if this IntegerRegEx were also a string (just like a normal RegEx), so I could store IntegerRegExes and StringRegExes in the same table.

Thanks in advance!


Solution

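Depending on what language you're programming in (say, PHP), you could store a mathematical expression as a string, for example "$x >= 0 && $x <= 65535" for the range check, or "$x % 2 != 0" for odd numbers (written with != 0 rather than == 1, because PHP's % returns -1 for negative odd values).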

With regex, you would write something like this, right?

if (!preg_match($regexFromDb, $fieldValueFromDb)) {
    // validation fails
}

So with mathematical expressions, you'd do the same thing, e.g.

$x = $fieldValueFromDb;
// eval() expects a complete statement, so the string must end with a semicolon
if (!eval("return $mathExprFromDb;")) {
    // validation fails
}

This is just example code. Of course, you should safeguard your code against the dangers of running arbitrary stored executable code, and also against gibberish expressions crashing your script.
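As a rough sketch of what such safeguarding could look like, assuming PHP 7 or later (where a malformed expression inside eval() throws a ParseError instead of killing the script): the helper name checkExpr and its allow-list are hypothetical, and the allow-list is a crude filter for illustration, not a proven sandbox.

```php
<?php
// Hypothetical helper: returns true/false for a valid expression,
// or null when the expression or the value is rejected.
function checkExpr(string $mathExprFromDb, $x): ?bool
{
    // Only accept real numbers, so "$x(...)" can never call a function by name.
    if (!is_int($x) && !is_float($x)) {
        return null;
    }
    // Crude allow-list: only the variable $x, digits, whitespace,
    // parentheses, and arithmetic/comparison/logical operator characters.
    if (!preg_match('/^(?:\$x|[0-9\s()+\-*\/%<>=!&|.])+$/', $mathExprFromDb)) {
        return null;
    }
    try {
        // $x inside the evaluated code refers to this function's parameter.
        return (bool) eval("return ($mathExprFromDb);");
    } catch (Throwable $e) {
        // Gibberish such as "$x >=" ends up here instead of crashing the script.
        return null;
    }
}
```

Called as checkExpr('$x >= 0 && $x <= 65535', $fieldValueFromDb), this treats a null result as "the stored rule itself is broken", which is worth reporting separately from an ordinary validation failure.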

I think this is as close as you're gonna get, because the "IntegerRegEx" you seek already has a name... Math. ;)

OTHER TIPS

If you want something that's a string and regex-like, you could just use regexes. Just have a standard way of converting each type to a string, and write regexes against the string form. It might be awkward for some and error-prone for others, but it's simple and doesn't involve creating your own expression language or loading code straight from the db and evaling it.
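To make that concrete, here is a minimal sketch for the two examples from the question, assuming integers are converted to plain decimal strings with no padding or separators. The $rules array, its column names, and the isValid helper are made up for illustration.

```php
<?php
// Hypothetical rules table: column name => regex applied to the value's string form.
$rules = [
    // 0..65535, spelled out digit range by digit range
    'port'   => '/^(?:0|[1-9]\d{0,3}|[1-5]\d{4}|6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5])$/',
    // odd numbers: optional sign, any digits, last digit odd
    'odd_id' => '/^-?\d*[13579]$/',
];

function isValid(array $rules, string $column, int $value): bool
{
    // Standard conversion for integers: the plain decimal string.
    return preg_match($rules[$column], (string) $value) === 1;
}
```

The odd-number rule stays short, while the range rule shows the awkward side: every change to the upper bound means rewriting the whole pattern.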

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow