Question

I would like to do regular expression matching over a custom alphabet, using custom commands. The purpose is to investigate equations and expressions that appear in meteorology.

So, for example, my alphabet can be [p, rho, u, v, w, x, y, z, g, f, phi, t, T, +, -, /]. Note that rho and phi are multiple characters that should each be treated as a single character.

I would also like to use custom commands, such as \v for a variable, i.e. any symbol that is not an arithmetic operator.

I would also like to use other commands, such as (\v). Note that the dot should match dx/dt, where x is a variable. Similarly, given p = p(x,y,z), p' would match dp/dx, dp/dy, and dp/dz, but not dp/df (the declaration p = p(x,y,z) would be given somewhere).

I would also like to be able to backtrack.

Now, I have investigated PCRE and Ragel with D, and I see that the first two problems are solvable, with multi-character objects defined as fixed objects and not as a character class.

However, how do I address the third?

I don't see either PCRE or Ragel admitting a way to use custom commands. Moreover, since I would like to backtrack, I am not sure whether Ragel is the correct option, as this would need a stack, which means I would be using a CFG.

Is there perhaps a domain-specific language for building such regex/CFG machines (for 64-bit Linux, if that matters)?


Solution

Nothing here is impossible. Just write a new class in your programming language that uses regex internally, and define a new syntax on top of it. It will be your personal regular expression syntax. For example:

result = latex_string.match("p'(x,y,z)", "full"); // match dp/dx, dp/dy, dp/dz
result = latex_string_array.match("p'(x,y,z)", "partial"); // match ∂p/∂x, ∂p/∂y, ∂p/∂z
. . .

The match method will interpret the new pseudo-regular expression inside your class and return the result in the desired form. You can simply accept the input as a string and/or an array. In fact, if a function has to be matched by all of its derivatives, you can simplify the search notation to .match("p'").
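As a rough illustration of such a wrapper, here is a minimal sketch. It assumes derivatives are written in plain text as dp/dx and that the dependencies (p = p(x,y,z)) are declared up front. The names DerivativeMatcher, declare, compile, and match are hypothetical, not part of any library.

```javascript
// Hypothetical wrapper: translates the pseudo-pattern "p'" into a native regex.
// Assumption: derivatives appear as plain text such as "dp/dx".
class DerivativeMatcher {
  constructor() {
    this.dependencies = new Map(); // function name -> list of variable names
  }

  // Record a declaration such as p = p(x, y, z).
  declare(name, variables) {
    this.dependencies.set(name, variables);
    return this;
  }

  // Escape regex metacharacters so symbol names are matched literally.
  static escape(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }

  // Turn "p'" into a regex equivalent to /\bdp\/d(?:x|y|z)\b/g.
  compile(pattern) {
    const parsed = /^(\w+)'$/.exec(pattern);
    if (!parsed) throw new Error(`unsupported pattern: ${pattern}`);
    const name = parsed[1];
    const variables = this.dependencies.get(name);
    if (!variables) throw new Error(`no declaration for ${name}`);
    // Longest names first, so a multi-character symbol is tried before a prefix of it.
    const alternatives = [...variables]
      .sort((a, b) => b.length - a.length)
      .map(DerivativeMatcher.escape)
      .join("|");
    return new RegExp(`\\bd${DerivativeMatcher.escape(name)}/d(?:${alternatives})\\b`, "g");
  }

  // Return every substring matched by the pseudo-pattern.
  match(text, pattern) {
    return text.match(this.compile(pattern)) || [];
  }
}

const matcher = new DerivativeMatcher().declare("p", ["x", "y", "z"]);
const found = matcher.match("dp/dx + dp/dy - dp/df + dp/dz", "p'");
console.log(found); // [ 'dp/dx', 'dp/dy', 'dp/dz' ]
```

Because the declaration lists only x, y, and z, dp/df is skipped. Supporting LaTeX input would mean compiling to a different regex, not changing the user-facing notation.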

One simple remark. An equation can have the source:

\mathrm{d}y=\frac{\mathrm{d}y}{\mathrm{d}t}\mathrm{d}t

or:

dy=\frac{dy}{dt}dt

or finally:

dy=(dy/dt)dt

The problem with generalizing the meaning of LaTeX equations through regular expressions is the human input factor. It is just notation, and the author can choose among various ways of writing it.
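One way to reduce this variation before matching is to normalize the source first. The sketch below is only an assumption about how that could look: normalizeDerivatives is a hypothetical helper, and it only handles \mathrm{d} and \frac with brace-free arguments.

```javascript
// Hypothetical normalizer: reduces a few common LaTeX spellings to one plain form.
// Assumption: \frac arguments contain no nested braces once \mathrm{d} is rewritten.
function normalizeDerivatives(latex) {
  return latex
    .replace(/\\mathrm\{d\}/g, "d")                         // \mathrm{d}y -> dy
    .replace(/\\frac\{([^{}]*)\}\{([^{}]*)\}/g, "($1/$2)")  // \frac{a}{b} -> (a/b)
    .replace(/\s+/g, "");                                   // ignore spacing
}

const variants = [
  String.raw`\mathrm{d}y=\frac{\mathrm{d}y}{\mathrm{d}t}\mathrm{d}t`,
  String.raw`dy=\frac{dy}{dt}dt`,
  "dy=(dy/dt)dt",
];
const normalized = variants.map(normalizeDerivatives);
console.log(normalized); // three copies of 'dy=(dy/dt)dt'
```

After this step a single pattern can cover all three spellings, although every new writing habit still needs its own rewrite rule.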

The best and most precise way is to analyze the formula content and build a computation tree. In this case, you search not just for the notation of differentials or derivatives, but for the instructions to calculate them. Either way, it requires detailed analysis of the formula string, covering the many ways it can be written.
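To make the tree idea concrete, here is a minimal sketch under strong assumptions: the input is plain text (not LaTeX), the alphabet is the one from the question, and a leading d marks a differential. The names tokenize, parse, and findDerivatives are hypothetical.

```javascript
// Alphabet from the question; "rho" and "phi" are treated as single symbols.
const SYMBOL = "rho|phi|[puvwxyzgftT]";
// One token: a differential d<symbol>, a symbol, or an operator/parenthesis.
const TOKEN = new RegExp(`\\s*(?:d(${SYMBOL})|(${SYMBOL})|([-+/()]))`, "y");

function tokenize(text) {
  const source = text.trim();
  const tokens = [];
  let position = 0;
  while (position < source.length) {
    TOKEN.lastIndex = position;
    const m = TOKEN.exec(source);
    if (!m) throw new Error(`unexpected input at position ${position}`);
    if (m[1]) tokens.push({ type: "diff", of: m[1] });
    else if (m[2]) tokens.push({ type: "sym", name: m[2] });
    else tokens.push({ type: "op", value: m[3] });
    position = TOKEN.lastIndex;
  }
  return tokens;
}

// Recursive descent: expr := term (('+'|'-') term)*, term := atom ('/' atom)*.
function parse(tokens) {
  let i = 0;
  const isOp = (...values) =>
    i < tokens.length && tokens[i].type === "op" && values.includes(tokens[i].value);

  function atom() {
    const token = tokens[i++];
    if (!token) throw new Error("unexpected end of input");
    if (token.type !== "op") return token;
    if (token.value !== "(") throw new Error(`unexpected ${token.value}`);
    const inner = expr();
    if (!isOp(")")) throw new Error("missing )");
    i++;
    return inner;
  }
  function binary(next, ...ops) {
    let left = next();
    while (isOp(...ops)) {
      const op = tokens[i++].value;
      left = { type: "binary", op, left, right: next() };
    }
    return left;
  }
  const term = () => binary(atom, "/");
  const expr = () => binary(term, "+", "-");

  const tree = expr();
  if (i < tokens.length) throw new Error("trailing input");
  return tree;
}

// Collect every division node whose operands are both differentials.
function findDerivatives(node, found = []) {
  if (node.type === "binary") {
    if (node.op === "/" && node.left.type === "diff" && node.right.type === "diff") {
      found.push(`d${node.left.of}/d${node.right.of}`);
    }
    findDerivatives(node.left, found);
    findDerivatives(node.right, found);
  }
  return found;
}

const derivatives = findDerivatives(parse(tokenize("drho/dt + u/(dp/dx)")));
console.log(derivatives); // [ 'drho/dt', 'dp/dx' ]
```

The search runs on the tree, so the derivative nested inside the parentheses is found in the same way as the top-level one, which a flat regex would have to handle case by case.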

One more thing, and good news for you: it is not necessary to define a magic regex-LaTeX alphabet of multibyte Greek letters. Unicode (for example as UTF-8) has ρ (GREEK SMALL LETTER RHO), which you can use in the UI; in the search method, treat it as \rho and simply use the regex notation /\\frac{d\\rho}{dx}/.
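A minimal sketch of that translation step, assuming the user types Greek glyphs in the UI and the stored equations use LaTeX commands. GREEK_TO_LATEX, toLatex, escapeRegExp, and searchLatex are hypothetical names, and the table only covers the two letters from the question.

```javascript
// Hypothetical mapping from UI glyphs to LaTeX commands; extend as needed.
const GREEK_TO_LATEX = { "ρ": String.raw`\rho`, "φ": String.raw`\phi` };

// Replace each known Greek glyph with its LaTeX command.
function toLatex(userInput) {
  return userInput.replace(/[ρφ]/g, (glyph) => GREEK_TO_LATEX[glyph]);
}

// Escape regex metacharacters so the LaTeX source is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Search a LaTeX equation for what the user typed with Greek glyphs.
function searchLatex(equation, userInput) {
  const pattern = new RegExp(escapeRegExp(toLatex(userInput)), "g");
  return equation.match(pattern) || [];
}

const source = String.raw`\frac{d\rho}{dt} + u\frac{d\rho}{dx} = 0`;
const hits = searchLatex(source, String.raw`\frac{dρ}{dx}`);
console.log(hits.length); // 1
```

The glyph stays a single character on the user's side, and the multi-character \rho only exists inside the search method.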

One more example:


// search string (String.raw keeps the LaTeX backslashes intact)
equation = String.raw`dU= \left(\frac{\partial U}{\partial S}\right)_{V,\{N_i\}}dS+ \left(\frac{\partial U}{\partial V}\right)_{S,\{N_i\}}dV+ \sum_i\left(\frac{\partial U}{\partial N_i}\right)_{S,V,\{N_{j \ne i}\}}dN_i`;
. . .
// user input by UI
. . .
// call method (equation_match is a custom method you define yourself)
equation.equation_match("U'"); // example notation for all types of derivatives for all variables
. . .
// inside the 'equation_match' method you will use native regex methods
matches1 = equation.match(/dU/); // dU
matches2 = equation.match(/\\partial U/); // ∂U
   etc.
return [matches1, matches2]; // combination of matches
Licensed under: CC-BY-SA with attribution