Why doesn't JTextComponent.setText(String) normalize line endings?

https://stackoverflow.com/questions/17882554

04-06-2022

Question

It has recently come to my attention that Java text components use line feed characters (LF, \n, 0x0A) to represent and interpret line breaks internally. This came as quite a surprise to me and calls into question my assumption that using System.getProperty("line.separator") everywhere is good practice.

It would appear that you should be very careful with that property whenever you are dealing with a text component, since if you use JTextComponent.setText(String) you might end up with a component that contains invisible newline characters (CRs, for example). This might not seem important, unless the content of the text component can be saved to a file. If you save the text to a file and then re-open it using the methods that all text components provide, the hidden newlines suddenly materialize in the component once the file is re-opened. The reason seems to be that the JTextComponent.read(...) method does the normalization.
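To make the observation concrete, the minimal sketch below mimics what setText leaves in a plain-text document without opening a window. It assumes (as the JDK sources suggest) that a JTextArea is backed by a PlainDocument and that JTextComponent.setText simply replaces the document content through AbstractDocument.replace. SetTextDemo and its helpers are illustrative names only.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.PlainDocument;

final class SetTextDemo {

    // Mirrors setText(text) on a plain-text component: replace everything, no filtering.
    static PlainDocument setText(String text) {
        PlainDocument doc = new PlainDocument();
        try {
            doc.replace(0, doc.getLength(), text, null);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e); // cannot happen: offsets are in range
        }
        return doc;
    }

    // Returns the characters actually stored in the document.
    static String contents(PlainDocument doc) {
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    // PlainDocument starts a new line element only after '\n'.
    static int lineCount(PlainDocument doc) {
        return doc.getDefaultRootElement().getElementCount();
    }

    public static void main(String[] args) {
        PlainDocument doc = setText("first\r\nsecond");
        System.out.println("CR kept: " + contents(doc).contains("\r"));   // CR kept: true
        System.out.println("lines: " + lineCount(doc));                     // lines: 2
        System.out.println("lone CR lines: " + lineCount(setText("a\rb"))); // lone CR lines: 1
    }
}
```

If the assumptions hold, the CR is stored as an ordinary, invisible character: "first\r\nsecond" has two lines only because of its LF, and "a\rb" counts as a single line.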

So why doesn't JTextComponent.setText(String) normalize line endings? Or, for that matter, any other method that allows text to be modified within a text component? Is using System.getProperty("line.separator") a good practice when dealing with text components? Is it a good practice at all?

Some code to put this question into perspective:

import java.awt.GridBagConstraints;
import java.awt.GridBagLayout;
import java.awt.Insets;
import java.awt.event.ActionEvent;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import javax.swing.AbstractAction;
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.JOptionPane;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

public class TextAreaTest extends JFrame {

    private JTextArea jtaInput;
    private JScrollPane jscpInput;
    private JButton jbSaveAndReopen;

    public TextAreaTest() {
        super();
        setDefaultCloseOperation(EXIT_ON_CLOSE);
        setTitle("Text Area Test");
        GridBagLayout layout = new GridBagLayout();
        setLayout(layout);        

        jtaInput = new JTextArea();
        jtaInput.setText("Some text followed by a windows newline\r\n"
                + "and some more text.");
        jscpInput = new JScrollPane(jtaInput);
        GridBagConstraints constraints = new GridBagConstraints();
        constraints.gridx = 0; constraints.gridy = 0;
        constraints.gridwidth = 2;
        constraints.weightx = 1.0; constraints.weighty = 1.0;
        constraints.fill = GridBagConstraints.BOTH;
        add(jscpInput, constraints);

        jbSaveAndReopen = new JButton(new SaveAndReopenAction());
        constraints = new GridBagConstraints();
        constraints.gridx = 1; constraints.gridy = 1;
        constraints.anchor = GridBagConstraints.EAST;
        constraints.insets = new Insets(5, 0, 2, 2);
        add(jbSaveAndReopen, constraints);

        pack();
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(new Runnable() {

            public void run() {
                TextAreaTest tat = new TextAreaTest();
                tat.setVisible(true);
            }
        });
    }

    private class SaveAndReopenAction extends AbstractAction {

        private File file = new File("text-area-test.txt");

        public SaveAndReopenAction() {
            super("Save and Re-open");
        }

        private void saveToFile() 
                throws UnsupportedEncodingException, FileNotFoundException,
                IOException {

            Writer writer = null;
            try {
                writer = new OutputStreamWriter(
                        new FileOutputStream(file), "UTF-8");
                TextAreaTest.this.jtaInput.write(writer);
            } finally {
                if (writer != null) {
                    try {
                        writer.close();
                    } catch (IOException ex) {
                    }
                }
            }
        }

        private void openFile() 
                throws UnsupportedEncodingException, IOException {
            Reader reader = null;
            try {
                reader = new InputStreamReader(
                        new FileInputStream(file), "UTF-8");
                TextAreaTest.this.jtaInput.read(reader, file);
            } finally {
                if (reader != null) {
                    try {
                        reader.close();
                    } catch (IOException ex) {
                    }
                }
            }
        }

        public void actionPerformed(ActionEvent e) {
            try {
                saveToFile();
                openFile();
            } catch (IOException ex) {
                // UnsupportedEncodingException and FileNotFoundException are
                // both IOExceptions, so one handler covers all three cases
                JOptionPane.showMessageDialog(
                        TextAreaTest.this, ex.getMessage(), "An error occurred",
                        JOptionPane.ERROR_MESSAGE);
            }
        }
    }
}

An example of what this program saves on my Windows machine after I add a new line of text (why the single CR? o_O):


Edit01

I ran/debugged this from within the NetBeans IDE, which uses JDK 1.7u15 64-bit (C:\Program Files\Java\jdk1.7.0_15) on Windows 7.

Solution

First of all, the real answer is that this is how the designers thought the design should work. You'd really need to ask them to get the real reason(s).

Having said that:

So why doesn't JTextComponent.setText(String) normalize line endings?

I think that the most likely reasons are:

It would be unexpected behaviour. Most programmers would expect¹ a 'get' on a text field to return the same string value that was 'set' ... or that the user entered.
If text fields did normalize, the programmer would have great difficulty preserving the original text's line endings in cases where this was desirable.
The designers might have wanted to change their minds at some point (cf. the reported behaviour of the read and write methods) but were unable to for reasons of compatibility.

Anyway, if you need normalization, there's nothing stopping your code from normalizing the string before passing it to the setter (or after retrieving it with the getter).
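That normalization can be a one-liner applied before the text reaches the setter. A minimal sketch; LineEndings and setNormalizedText are hypothetical names, not part of Swing:

```java
import javax.swing.text.JTextComponent;

final class LineEndings {

    // Converts CRLF (Windows) and lone CR (classic Mac OS) to LF,
    // the only line break Swing's plain-text documents recognise.
    static String toLf(String text) {
        // Order matters: replace CRLF first so it becomes one LF, not two
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Hypothetical convenience wrapper around setText.
    static void setNormalizedText(JTextComponent component, String text) {
        component.setText(toLf(text));
    }
}
```

In the question's program, LineEndings.setNormalizedText(jtaInput, "Some text\r\nand some more text.") would leave only an LF in the text area.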

Or any other method that allows text to be modified within a text component for that matter?

It is reported (see comments) that read and/or write do normalization.
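The reported behaviour can be reproduced without touching the file system. The sketch assumes that a plain JTextArea delegates read and write to DefaultEditorKit, and it uses StringReader/StringWriter, which pass characters through unchanged. Under that assumption, read maps CRLF and lone CR to LF, while write expands every LF into the document's DefaultEditorKit.EndOfLineStringProperty, falling back to line.separator when the property is unset (as it presumably is for a document that was only filled through setText). That would also explain the stray CR in the question: the "\r\n" passed to setText is written as "\r\r\n" on Windows, and reading that back yields an extra empty line. EditorKitRoundTrip is an illustrative name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultEditorKit;
import javax.swing.text.PlainDocument;

// In-memory streams do no line-ending translation of their own,
// so any change seen here comes from the editor kit.
final class EditorKitRoundTrip {

    private static final DefaultEditorKit KIT = new DefaultEditorKit();

    // What write() emits for text that was put in via setText. The end-of-line
    // string is set explicitly to simulate Windows ("\r\n") on any OS.
    static String write(String setTextContent, String endOfLine) {
        try {
            PlainDocument doc = new PlainDocument();
            doc.insertString(0, setTextContent, null);
            doc.putProperty(DefaultEditorKit.EndOfLineStringProperty, endOfLine);
            StringWriter out = new StringWriter();
            KIT.write(out, doc, 0, doc.getLength());
            return out.toString();
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    // What read() puts into a fresh document: CRLF and lone CR become LF.
    static String read(String fileContent) {
        try {
            PlainDocument doc = new PlainDocument();
            KIT.read(new StringReader(fileContent), doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String saved = write("windows newline\r\nmore text", "\r\n");
        System.out.println(saved.replace("\r", "\\r").replace("\n", "\\n"));
        // windows newline\r\r\nmore text
        String reopened = read(saved);
        System.out.println(reopened.replace("\n", "\\n"));
        // windows newline\n\nmore text
    }
}
```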

Is using System.getProperty("line.separator") a good practice when dealing with text components? Is it a good practice at all?

It depends on the context. If you know you are reading and writing files to be processed on "this" platform, it's probably a good idea. If the file is intended to be read on a different platform (with a different line separator), then normalizing to match the current machine's convention may be a bad idea.
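With a text component, one way to act on "it depends on the context" is to choose the separator explicitly instead of letting it default. The sketch assumes that JTextComponent.write goes through DefaultEditorKit, which (per the JDK sources) emits the document's DefaultEditorKit.EndOfLineStringProperty for every LF and only falls back to line.separator when that property is unset. LineSeparatorChoice and its methods are hypothetical helpers.

```java
import java.io.IOException;
import java.io.StringWriter;
import javax.swing.text.DefaultEditorKit;
import javax.swing.text.JTextComponent;

final class LineSeparatorChoice {

    // Pins the separator that write() will emit for this component's
    // current document, e.g. "\n" for files shared across platforms
    // or System.lineSeparator() for files meant for this machine.
    static void useLineSeparator(JTextComponent component, String lineSeparator) {
        component.getDocument().putProperty(
                DefaultEditorKit.EndOfLineStringProperty, lineSeparator);
    }

    // Captures what write(Writer) would send to a file.
    static String writtenText(JTextComponent component) {
        StringWriter out = new StringWriter();
        try {
            component.write(out);
        } catch (IOException e) {
            throw new IllegalStateException(e); // StringWriter does not fail
        }
        return out.toString();
    }
}
```

For the question's program, calling LineSeparatorChoice.useLineSeparator(jtaInput, "\n") before jtaInput.write(writer) would save Unix line endings on any platform. Since JTextComponent.read appears to install a new document, the choice would have to be reapplied after each re-open.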

¹ The fact that other methods, such as read and write, may behave differently doesn't affect this. They are not "getters" and "setters". I'm talking about how people expect "getters" and "setters" to behave ... not anything else. Besides, people shouldn't expect everything to behave the same way unless it is specified that they do. But obviously, part of the problem here is that the spec ... the javadoc ... is silent on these issues.

The other possibility is that the normalization behaviour that @predi reports is actually happening in the Reader / Writer objects ...

OTHER TIPS

Using the system line separator is questionable. I would only use it to write text files in a platform-specific format.

When reading, I always simply throw away any '\r' (CR), effectively converting Windows/Mac/Unix line endings to Unix-style line feeds. Internally I never use anything other than plain '\n' (LF) to indicate line breaks; anything else wastes memory and only makes processing text more painful.
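Dropping every CR works for Windows (CRLF) text; to also cover files that use a lone CR, as classic Mac OS did, a reader can map a CR to LF unless an LF follows it. A minimal sketch of that variant, which normalizes while reading so only '\n' is ever stored; LfReader, readAll and normalize are hypothetical names:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class LfReader {

    // Reads all text, turning CRLF and lone CR into LF.
    // For files, pass in a BufferedReader for efficiency.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        boolean pendingCr = false; // a CR was seen but not yet emitted
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\r') {
                if (pendingCr) {
                    sb.append('\n'); // the previous CR stood alone
                }
                pendingCr = true;
            } else {
                if (pendingCr && c != '\n') {
                    sb.append('\n'); // lone CR followed by ordinary text
                }
                pendingCr = false;
                sb.append((char) c); // also emits the LF of a CRLF pair
            }
        }
        if (pendingCr) {
            sb.append('\n'); // text ended with a lone CR
        }
        return sb.toString();
    }

    // Convenience for in-memory text; StringReader does not fail here.
    static String normalize(String text) {
        try {
            return readAll(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```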

Licensed under: CC-BY-SA with attribution
