Question

How can I have a text file (or XML file) represented as a whole string, and search for (or match) a particular string in it?

I have created a BufferedReader object:

BufferedReader input =  new BufferedReader(new FileReader(aFile));

and then I have tried to use the Scanner class with its option to specify different delimiters, like this:

//Scanner scantext = new Scanner(input);
//Scanner scantext = new Scanner(input).useDelimiter("");
Scanner scantext = new Scanner(input).useDelimiter("\n");
while (scantext.hasNext()) {  ... }

Using the Scanner class like this, I can read the text either line by line or word by word, but that doesn't help me, because the text I want to process sometimes contains

</review><review>

and I would like to say: if you find "<review>" anywhere in the text, do something with the following lines (or piece of text) until you find "</review>". The problem is that <review> and </review> appear in different places in the text, and are sometimes glued to other text (so using whitespace as the delimiter doesn't help me).

I have thought that I might use the regular expression API in Java (the Pattern and Matcher classes), but they seem to match a particular string or line, and I want to have the text as one continuous string (at least this was my impression from what I have read about them). Could you tell me what structures/methods/classes I should use in this case? Thank you.

Solution

Don't try to parse XML with regular expressions; it leads only to pain. There are a lot of very nice existing XML APIs in Java already; why try to reinvent them?

Anyway, to search for a string in a text file, you should:

  1. Load the file as a string
  2. Create a Pattern to search for
  3. Use a Matcher to iterate through any matches
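
The three steps above could look roughly like the following. This is only a sketch: the class name ReviewSearch, the helper names, and the UTF-8 encoding are assumptions made for illustration, and the earlier caveat still applies (a regular expression will not cope with nested tags, attributes, comments or CDATA sections the way a real XML parser does).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReviewSearch {
    // Step 2: DOTALL lets '.' match line breaks as well; the reluctant '*?'
    // makes each match stop at the nearest </review>.
    private static final Pattern REVIEW =
            Pattern.compile("<review>(.*?)</review>", Pattern.DOTALL);

    // Step 1: load the whole file into one String (UTF-8 is an assumption).
    static String load(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Step 3: iterate over the matches and collect the text between the tags.
    static List<String> findReviews(String text) {
        List<String> bodies = new ArrayList<>();
        Matcher m = REVIEW.matcher(text);
        while (m.find()) {
            bodies.add(m.group(1));
        }
        return bodies;
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to a small inline sample so the sketch runs as-is.
        String text = args.length > 0
                ? load(Paths.get(args[0]))
                : "intro</review><review>first\nreview</review><review>second</review>";
        for (String body : findReviews(text)) {
            System.out.println("[" + body + "]");
        }
    }
}
```

Because the whole file is one String, it no longer matters whether the tags sit on different lines or are glued to other text.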

OTHER TIPS

It looks to me as though you are trying to work with a structured XML file. I would suggest that you look into javax.xml.parsers.DocumentBuilder or the other built-in APIs to parse the document.
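
A minimal sketch of the DocumentBuilder route, assuming the input is well-formed XML with a single root element and that the elements of interest are named "review" as in the question. The class name ReviewDom and the method reviewTexts are made up for this example; the XML is parsed from a String here, but DocumentBuilder.parse also accepts a File.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class ReviewDom {
    // Returns the text content of every <review> element, in document order.
    static List<String> reviewTexts(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("review");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Checked parser exceptions are wrapped to keep the sketch short.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<reviews><review>good</review><review>bad</review></reviews>";
        System.out.println(reviewTexts(xml));
    }
}
```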

Use an XML parser.

Or use XPath.
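
A hedged sketch of what an XPath query might look like with the javax.xml.xpath API that ships with the JDK. The expression "//review" and the names ReviewXPath and reviewTexts are assumptions based on the tag in the question, and the input is assumed to be well-formed XML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ReviewXPath {
    // Selects every <review> element at any depth and returns its text content.
    static List<String> reviewTexts(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//review",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            // Thrown for a bad expression or for input that cannot be parsed.
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<reviews><review>good</review><review>bad</review></reviews>";
        System.out.println(reviewTexts(xml));
    }
}
```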

I have thought that I might use the regular expression API in Java (the Pattern and Matcher classes), but they seem to match a particular string or line, and I want to have the text as one continuous string

Um, does something prevent you from reading the XML file into a String, and then operating on that, using the regular expression API?

You can easily read a file into a String using e.g. FileUtils from Apache Commons IO: see readFileToString(File file, String encoding).

I would also recommend using an XML parsing API. But as you only want to do something when you reach a "review" tag, SAX may suit you better than DOM.

I think we could copy each individual line of the text file into a string and then try to match the search string as a substring of that line.

But errors occur when the search string contains metacharacters such as / or #.
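
A rough illustration of the SAX approach: the handler below reacts only to "review" elements and ignores everything else. The names ReviewSax and reviewTexts are hypothetical, and the sketch assumes well-formed input in which review elements are not nested inside each other.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class ReviewSax {
    static List<String> reviewTexts(String xml) {
        final List<String> texts = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside <review>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("review".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks, so append.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("review".equals(qName) && current != null) {
                    texts.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        String xml = "<reviews><review>good</review><other>x</other></reviews>";
        System.out.println(reviewTexts(xml));
    }
}
```

Unlike DOM, SAX does not build the whole document tree in memory, which can matter for large files.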
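
If the search string is meant literally, one possible way around such errors is to escape it with Pattern.quote before compiling, so that characters with a special meaning in regular expressions (for instance ( or +) are treated as plain text. The class LiteralSearch and the method countLiteral below are illustrative names only; for a simple yes/no check, String.contains avoids regular expressions altogether.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralSearch {
    // Counts non-overlapping occurrences of 'literal' in 'line',
    // treating every character of 'literal' as plain text.
    static int countLiteral(String line, String literal) {
        Matcher m = Pattern.compile(Pattern.quote(literal)).matcher(line);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Without quoting, "(USD" would not even compile as a pattern.
        System.out.println(countLiteral("price (USD), total (USD)", "(USD"));
    }
}
```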

Licensed under: CC-BY-SA with attribution