Question

Once again I have hit the wall.

How do I replace escape characters using regular expressions? If the tab character (\t) occurs two or more times in a row, I want to replace the whole run with a single \t. For example, if \t\t\t appears, I want to replace it with just \t. How can I do this?

I am facing one more problem, with reading a text file and applying regular expressions to it.

I am using C# both to read the text file and for the regular expressions. When I open the text file (a file with a .txt extension), I see it normally. But when I read the same file using "textReader" and store it in a string, I get text like this:

O K\t\t\t\t\t\tEmail:
k.o@gmail.com \rPhone: + 91
992\t\r\rExperience Summary
\rBusiness Intelligence and data
warehouse designer with more than 6
years of work experience in OLAP
Project.\r\r\rTechnology\rBelow is a
list of important software products
and tools that I have worked
with.\r\rSoftware
Products\r\a\r\aOperating
Systems:\rWINDOWS NT, WINDOWS 2000,
UNIX\rDatabase Management
Systems:\rOracle 8i, Oracle 9i, Oracle
10g, SQL-Server 7.0, DB2\rSoftware
Packages:\rVSS, ER Win, M1\rFourth
Generation Language:\rPL/SQL,
SQL*PLUS\rTools &
Technologies:\rOracle Warehouse
Builder 10.1.0.4.0, ORACLE 9i AS,
ORACLE Discoverer Reports Data Stage
8.0, Fast Track 8.5, DB@ Cube, JavaScript, JSP, JDEV, BI BEANS, ASP,
ASP.NET, Ab
Initio\r\r\a\r\a\v\r\r\fAssignments\rThe
details of the various assignments
that I have handled are listed here,
in chronological
order.\r\rName\r\aAvery Dennison Data
Warehousing\r\a\r\aClient\r\aAvery
Dennison, he challenge in the project
is to feed EDW from existing
warehouses which has data at an
aggregated
level.\r\a\r\a\r\rName\r\aAOL BI
(Omniture)\rite team. Designing,
coding and testing along with
coordination with Onsite team.
\r\a\r\aTools & Technologies\r\aUnix
Platform, Oracle 10g , Py. Not only
delivering the correct requirement but
also the performance has to be in
acceptable
range.\r\a\r\a\r\r\r\r\r\r\r\r\r\r\r\rName\r\aAIW
Events (ABSA)\r\a\r\aClient\r\aABSA,
South Africa\r\a\r\aP

That is, all the escape characters such as \t, \r, and \f are visible. Because of this, a regular expression that works on normal text doesn't work when I read the same text into a string variable.

Does anyone know how to solve this problem?

Thanks

I have one more query. I want to match text at the end of a line. I tried to use $ for this. For example, to match text ending with "assignment", I used the regex assignment$. It worked with normal text, but when I run this regex on text returned by StreamReader, it doesn't work. StreamReader gives strings like Assignments\r\r\f. How do I match the end or the start of a line in this kind of text?

Solution

You're trying to match the string "\r", right? You'll have to escape the escape character to do it:

"(\\r)*"

This expression matches "\r" any number of times, including zero. It works because, inside a C# string literal, "\\" produces a single literal "\", so the regex engine receives the pattern (\r)*. You can apply the same idea to match "\t", too.
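
To make the escaping levels concrete, here is a minimal C# sketch using top-level statements (which assumes C# 9 or later). It compares a string holding a real carriage return with one holding the two visible characters backslash and r. The variable names and sample strings are made up for illustration. If the escapes only show up in the debugger's view of the string (an assumption about the asker's setup), the data is probably the first kind.

```csharp
using System;
using System.Text.RegularExpressions;

// One real carriage-return character between the words.
string real = "Phone\rExperience";
// Verbatim string: the two visible characters backslash and 'r' (hypothetical input).
string literal = @"Phone\rExperience";

// "\\r" and @"\r" are the same two-character pattern; the regex engine
// interprets it as "match a carriage return".
bool matchesReal = Regex.IsMatch(real, "\\r");
bool matchesLiteral = Regex.IsMatch(literal, "\\r");

// To match a visible backslash followed by 'r', the backslash must be
// escaped for the regex engine as well.
bool matchesVisible = Regex.IsMatch(literal, @"\\r");

Console.WriteLine($"{matchesReal} {matchesLiteral} {matchesVisible}"); // True False True
```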

OTHER TIPS

/\t{2,}/\t/

replaces a run of two or more tabs with a single tab.
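
In C#, the same substitution can be sketched with Regex.Replace. The helper name CollapseTabs is hypothetical, and the sample string is taken from the first line of the text in the question. This assumes the string contains real tab characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: collapse each run of two or more tabs into one tab.
static string CollapseTabs(string input)
{
    // \t{2,} matches two or more consecutive tabs; the replacement is a single tab.
    return Regex.Replace(input, @"\t{2,}", "\t");
}

string sample = "O K\t\t\t\t\t\tEmail:";
Console.WriteLine(CollapseTabs(sample) == "O K\tEmail:"); // True
```

A single tab is left untouched because the quantifier requires at least two.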

For the tab character, use something like this:

/(\t)+/\1/g
  1. Make a group containing one character (the tab) and match it one or more times, as many as possible.
  2. Replace the full match with that single captured character.
  3. (Global) Apply the pattern across the full text.

Then you could use the same expression for the other escaped chars you want to replace.
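
One way to extend this idea to several control characters at once is a backreference, so that a run collapses only when the same character repeats. The sketch below is an illustration, not the answerer's code: the helper name CollapseRepeats and the chosen character set (\t, \r, \a, \f, \v, the ones visible in the question's sample) are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: collapse runs of the same control character.
static string CollapseRepeats(string input)
{
    // ([\t\r\a\f\v]) captures one control character;
    // \1+ requires that same character to repeat at least once more.
    // "$1" keeps a single copy of it.
    return Regex.Replace(input, @"([\t\r\a\f\v])\1+", "$1");
}

string sample = "range.\r\a\r\a\r\r\r\rName";
Console.WriteLine(CollapseRepeats(sample) == "range.\r\a\r\a\rName"); // True
```

Alternating sequences such as \r\a\r\a are left alone, because no single character repeats back to back.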

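Instead of writing the pattern \\t\\t\\t, you could use a quantifier: \\t{3}. Note that {3} matches exactly three tabs; for "two or more", use {2,}.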
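
The question also asks why assignment$ stops matching on text like Assignments\r\r\f. In .NET, $ only recognizes \n as a line ending (or the end of the string), so a word followed by \r is never "at the end of a line" as far as $ is concerned. A possible workaround, sketched here with a hypothetical helper named EndsLine and a made-up sample string, is a lookahead that treats \r, \n, and \f as line endings. Whether \f should count as a line break is an assumption based on the sample text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true if `word` is immediately followed by a
// line-ending character (\r, \n, or \f) or by the very end of the text.
static bool EndsLine(string text, string word)
{
    string pattern = Regex.Escape(word) + @"(?=[\r\n\f]|\z)";
    return Regex.IsMatch(text, pattern);
}

string sample = "Initio\r\r\fAssignments\r\r\fThe details";

Console.WriteLine(Regex.IsMatch(sample, "Assignments$")); // False
Console.WriteLine(EndsLine(sample, "Assignments"));       // True
```

For the start of a line, a lookbehind such as (?<=\A|[\r\n\f]) should work the same way. Another option is to normalize the line endings to \n first and then use $ with RegexOptions.Multiline.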

Licensed under: CC-BY-SA with attribution