Java reading long text file is very slow

Question 1

@PaulGrime is right. You are copying the string each time the loop reads a line. Once the string gets big (say 10,000 lines big), it is doing a lot of work to do that copying.

Try this:

StringBuilder sb = new StringBuilder();
while (...reading lines..){ 
   ....
   sb.append(tempString);  //should add newline
   ...
}

s = sb.toString();

Note: read Paul's answer below on why stripping newlines makes this a bad way to read in a file. Also, as mentioned in the question comments, XStream provides a way to read the file and even if it had not, IOUtils.toString(reader) would be a safer way to read a file.

Question 2

Some immediate improvements you can do:

Use a StringBuilder instead of concat and +. Using + and concat can really affect the performance, specially when used in loops.
Reduce the access to the disk. You can do it by using a large buffer:

BufferedReader br = new BufferedReader(new FileReader("someFile.txt"), SIZE);

Question 3

You should use a StringBuilder as String concatenation is extremely slow for even small strings.

Further, try using NIO rather than a BufferedReader.

public static void main(String[] args) throws IOException {
    final File file = //some file
    try (final FileChannel fileChannel = new RandomAccessFile(file, "r").getChannel()) {
        final StringBuilder stringBuilder = new StringBuilder();
        final ByteBuffer byteBuffer = ByteBuffer.allocate(1024);
        final CharsetDecoder charsetDecoder = Charset.forName("UTF-8").newDecoder();
        while (fileChannel.read(byteBuffer) > 0) {
            byteBuffer.flip();
            stringBuilder.append(charsetDecoder.decode(byteBuffer));
            byteBuffer.clear();
        }
    }
}

You can tune the buffer size if it's still too slow - it's heavily system dependent what buffer size works better. For me it makes very little difference if the buffer is 1K or 4K but on other systems I have know that change to increase speed by an order of magnitude.

Question 4

In addition to what has already been said, depending on your use of the XML, your code is potentially incorrect as it discards line endings. For example, this code:

package temp.stackoverflow.q15849706;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

import com.thoughtworks.xstream.XStream;

public class ReadXmlLines {
    public String read1(BufferedReader br) throws IOException {
        try {
            String s = "";
            String tempString;
            int i = 0;
            while ((tempString = br.readLine()) != null) {
                s = s.concat(tempString);
                // s=s+tempString;
                i = i + 1;
                if (i % 1000 == 0) {
                    System.out.println(Integer.toString(i));
                }
            }
            return s;
        } finally {
            br.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ReadXmlLines r = new ReadXmlLines();

        URL url = ReadXmlLines.class.getResource("xml.xml");
        String xmlStr = r.read1(new BufferedReader(new InputStreamReader(url
                .openStream())));

        Object ob = null;

        XStream xs = new XStream();
        xs.alias("root", Root.class);

        // This is incorrectly read/parsed, as the line endings are not
        // preserved.
        System.out.println("----------1");
        System.out.println(xmlStr);
        ob = xs.fromXML(xmlStr);
        System.out.println(ob);

        // This is correctly read/parsed, when passing in the URL directly
        ob = xs.fromXML(url);
        System.out.println("----------2");
        System.out.println(ob);

        // This is correctly read/parsed, when passing in the InputStream
        // directly
        ob = xs.fromXML(url.openStream());
        System.out.println("----------3");
        System.out.println(ob);
    }

    public static class Root {
        public String script;

        public String toString() {
            return script;
        }
    }
}

and this xml.xml file on the classpath (in the same package as the class):

<root>
    <script>
<![CDATA[
// taken from http://www.w3schools.com/xml/xml_cdata.asp
function matchwo(a,b)
{
if (a < b && a < 0) then
  {
  return 1;
  }
else
  {
  return 0;
  }
}
]]>
    </script>
</root>

produces the following output. The first two lines shows the line endings have been removed, and thus made the Javascript in the CDATA section invalid (as the first JS comment now comments out the whole JS, because the JS lines have been merged).

----------1
<root>    <script><![CDATA[// taken from http://www.w3schools.com/xml/xml_cdata.aspfunction matchwo(a,b){if (a < b && a < 0) then  {  return 1;  }else  {  return 0;  }}]]>    </script></root>
// taken from http://www.w3schools.com/xml/xml_cdata.aspfunction matchwo(a,b){if (a < b && a < 0) then  {  return 1;  }else  {  return 0;  }}    
----------2


// taken from http://www.w3schools.com/xml/xml_cdata.asp
function matchwo(a,b)
{
if (a < b && a < 0) then
  {
  return 1;
  }
else
  {
  return 0;
  }
}
...