More than 9 backreferences in gsub()

https://stackoverflow.com/questions/1400937

05-07-2019

Question

How can I use gsub with more than 9 backreferences? I would expect the output of the example below to be "e, g, i, j, o".

> test <- "abcdefghijklmnop"
> gsub("(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)(\\w)", "\\5, \\7, \\9, \\10, \\15", test, perl = TRUE)
[1] "e, g, i, a0, a5"

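Solution

See Regular Expressions with The R Language:

You can use the backreferences \1 through \9 in the replacement text to reinsert text matched by a capturing group. There is no replacement text token for the overall match. Place the entire regex in a capturing group and then use \1.

But with PCRE, you should be able to use named groups. So try (?P<name>regex) for naming a group and (?P=name) as the backreference.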

Other tips

Use strsplit instead:

test <- "abcdefghijklmnop"
strsplit(test, "")[[1]][c(5, 7, 9, 10, 15)]

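My understanding is that \10 is interpreted as backreference 1 followed by the literal digit 0, which is why the output above contains "a0". I think 9 is the maximum.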
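A minimal sketch of that behaviour, using a shortened two-group pattern (the pattern and input here are illustrative, not from the question): the replacement "\\10" is read as group 1 followed by a literal 0, not as group 10.

```r
# "\\10" in the replacement is parsed as "\\1" followed by the character "0"
out <- sub("(a)(b)", "\\10", "ab", perl = TRUE)
out
## [1] "a0"
```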

The stri_replace_*_regex functions in the stringi package have no such limitation:

library("stringi")
stri_replace_all_regex("abcdefghijkl", "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)", "$10$1$11$12")
## [1] "jakl"

If you want the first capture group to be followed by a literal 1, use e.g.

stri_replace_all_regex("abcdefghijkl", "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)", "$10$1$1\\1$12")
## [1] "jaa1l"

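According to this site, backreferences \10 through \99 work in some languages, but not in most.

The ones reported to work are

The limit of 9 backreferences is specific to the sub() and gsub() functions; it does not apply to functions such as grep(). Supporting more than 9 backreferences in R means using PCRE regular expressions (i.e. the perl = TRUE argument); however, even with this option, sub() and gsub() cannot refer to them in the replacement.

The R documentation is explicit on this point: see ?regexp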

There can be more than 9 backreferences (but the replacement in sub can
only refer to the first 9).
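One way to stay within this limit, which none of the answers here proposes, is to capture only the characters you need so that no group number exceeds 9. The sketch below rewrites the question's pattern under that assumption; the variable names are illustrative.

```r
test <- "abcdefghijklmnop"

# Capture only positions 5, 7, 9, 10 and 15; everything else is matched
# without parentheses, so the groups are numbered 1 to 5.
pattern <- "\\w{4}(\\w)\\w(\\w)\\w(\\w)(\\w)\\w{4}(\\w)\\w"
result <- gsub(pattern, "\\1, \\2, \\3, \\4, \\5", test, perl = TRUE)
result
## [1] "e, g, i, j, o"
```

This only helps when at most nine groups are needed in the replacement; with more, a different approach is required.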

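Furthermore, the idea of using named capture groups to get around this limitation is bound to fail, because sub() does not support named backreferences.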

regexpr and gregexpr support ‘named capture’. If groups are named,
e.g., "(?<first>[A-Z][a-z]+)" then the positions of the matches are also
returned by name. (Named backreferences are not supported by sub.)
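Because regexpr() with perl = TRUE returns the position of every capture group, the groups can be extracted directly instead of going through a replacement string. The following is a sketch of that idea applied to the question's input, not a method given in the answers; it relies on the "capture.start" and "capture.length" attributes of the match object.

```r
test <- "abcdefghijklmnop"

# Build the 16-group pattern from the question
pattern <- paste(rep("(\\w)", 16), collapse = "")
m <- regexpr(pattern, test, perl = TRUE)

# One row per input string, one column per capture group
starts <- attr(m, "capture.start")[1, ]
lens   <- attr(m, "capture.length")[1, ]

# Extract every group, then keep the ones we want (no limit of 9 here)
groups <- substring(test, starts, starts + lens - 1)
result <- paste(groups[c(5, 7, 9, 10, 15)], collapse = ", ")
result
## [1] "e, g, i, j, o"
```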

Licensed under: CC-BY-SA with attribution