I'm wondering whether anyone has any insight on converting an array of character codes to Unicode characters, and searching them with a regex.

If you have

var a = [0,1,2,3]

you can use a loop to convert them into a string containing the first four Unicode control characters.

However, if you then want to create a regex

"(X)+"

where X is character code 3 converted to its Unicode equivalent, the searches never seem to work. If I check the length of the string, it's correct, and .* returns all the characters in the string. But I'm having difficulty constructing a regex to search the string when all I have to begin with is the character codes. Any advice?

Edit:

var a = [0,1,2,3,0x111]; str = "";

for(var i = 0; i < a.length; i++) {
    str += String.fromCharCode(a[i]);
}

var r = [0x111]
var reg = ""

reg += "(";
for(var i = 0; i < r.length; i++) {
    var hex = r[i].toString(16);
    reg += "\\x" + hex;
}
reg += ")";

var res = str.match(RegExp(reg))[0];

Edit:

//Working code:
var a = [0,1,2,3,0x111];
str = "";

for(var i = 0; i < a.length; i++) {
    str += String.fromCharCode(a[i]);
}

var r = [3,0x111]
var reg = ""

reg += "(";
for(var i = 0; i < r.length; i++) {
    var hex = r[i].toString(16);
    reg += ((hex.length > 2) ? "\\u" : "\\x") + ("0000" + hex).slice((hex.length > 2) ? -4 : -2);
}
reg += ")";

var res = str.match(RegExp(reg))[0];

Solution

With changes to a few details, the example can be made to work.

Assuming that you are interested in printable Unicode characters in general, and not specifically the first four control characters, the test vector a for the string "hello" would be:

var a = [104, 101, 108, 108, 111]; // hello

If you want to match both 'l' characters:

var r = [108, 108]

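When you construct your regular expression, the character code must be written in hexadecimal and zero-padded to exactly two digits, because the \x escape takes exactly two hex digits (so code 3 must be written as \x03, not \x3):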

reg += "\\x" + ("0" + r[i].toString(16)).slice(-2);

After that, you should see the results you expect.
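
Putting the pieces together, here is a minimal sketch of the whole approach. The helper name codesToPattern is hypothetical (it is not from the question), and the sketch assumes every code is a single UTF-16 code unit, i.e. at most 0xFFFF. It uses \x for codes that fit in two hex digits and \u for larger ones, as in the asker's working code, and the last two lines show why an unpadded \x3 fails while \x03 works.

```javascript
// Hypothetical helper: builds a regex source string from an array of
// character codes. Assumes each code is <= 0xFFFF.
function codesToPattern(codes) {
  var parts = [];
  for (var i = 0; i < codes.length; i++) {
    var hex = codes[i].toString(16);
    if (hex.length <= 2) {
      // \x takes exactly two hex digits, so pad with a leading zero.
      parts.push("\\x" + ("0" + hex).slice(-2));
    } else {
      // \u takes exactly four hex digits.
      parts.push("\\u" + ("000" + hex).slice(-4));
    }
  }
  return parts.join("");
}

var a = [104, 101, 108, 108, 111]; // hello
var str = String.fromCharCode.apply(null, a);

// Match both 'l' characters.
var reg = new RegExp("(" + codesToPattern([108, 108]) + ")");
var res = str.match(reg);
console.log(res[0]); // "ll"

// Without the u flag, "\x3" is not a hex escape; it matches the literal text "x3".
var ctrl = String.fromCharCode(0, 1, 2, 3);
console.log(new RegExp("\\x3").test(ctrl));  // false
console.log(new RegExp("\\x03").test(ctrl)); // true
```

Characters outside the Basic Multilingual Plane (above 0xFFFF) would need different handling, which this sketch does not attempt.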

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow