Question

Every programming language I know (Perl, JavaScript, PHP, Python, ASP, ActionScript, Commodore Basic) uses single and double quotes to delimit strings.

This creates the ongoing situation of having to go to great lengths to treat quotes correctly, since the quote is extremely common in the contents of strings.

Why do programming languages not use some other character to delimit strings, one that is not used in normal conversation (\, | or { } for example), so we can just get on with our lives?

Is this true, or am I overlooking something? Is there an easy way to stop using quotes for strings in a modern programming language?

print <<<END
I know about here document syntax, but for minor string manipulation it's overly complicated and it complicates formatting.
END;

[UPDATE] Many of you made a good point about the importance of using only ASCII characters. I have updated the examples to reflect that (the backslash, the pipe and braces).


Solution

Python has an additional string literal syntax, using triple double-quotes,

"""like this"""

In addition to this, Perl allows you to use any delimiter you want,

q^ like this ^

I think that, for the most part, the regular string delimiters are used because they make sense: a string is wrapped in quotes. In addition, most developers are so used to this convention that drastically altering the way strings are written could mean a difficult learning curve.

OTHER TIPS

Perl lets you use whatever characters you like:

 "foo $bar" eq
 qq(foo $bar) eq
 qq[foo $bar] eq
 qq!foo $bar! eq
 qq#foo $bar# etc

Meanwhile
 'foo $bar' eq
 q(foo $bar) eq
 q[foo $bar] eq
 q!foo $bar! eq
 q#foo $bar# etc

The syntax extends to other features, including regular expressions, which is handy if you are dealing with URIs.

 "http://www.example.com/foo/bar/baz/" =~ /\/foo\/[^\/]+\/baz\//;
 "http://www.example.com/foo/bar/baz/" =~ m!/foo/[^/]+/baz/!;

Current: "Typewriter" 'quotation' marks

There are many good reasons for using the quotation marks we are currently using:

  • Quotes are easily found on keyboards - so they are easy to type, and they have to be easy, because strings are needed so often.

  • Quotes are in ASCII - most programming tools handle only ASCII well. You can use ASCII in almost any environment imaginable, and that's important when you are fixing your program over a telnet connection on some far-far-away server.

  • Quotes come in many versions - single quotes, double quotes, back quotes. So a language can assign different meanings for differently quoted strings. These different quotes can also solve the 'quotes "inside" quotes' problem.

  • Quotes are natural - English used quotes for marking up text passages long before programming languages followed. In linguistics quotes are used in much the same way as in programming languages. Quotes are natural in the same way + and - are natural for addition and subtraction.

Alternative: “typographically” ‘correct’ quotes

Technically they are superior. One great advantage is that you can easily differentiate between opening and closing quotes. But they are hard to type and they are not in ASCII. (I had to put them into a headline to make them visible in this StackOverflow font at all.)

Hopefully one day, when ASCII is something that only historians care about and keyboards have changed into something totally different (if we are even going to have keyboards at all), there will come a programming language that uses better quotes...

Python does have an alternative string delimiter with the triple-double quote """Some String""".

Single quotes and double quotes are used in the majority of languages since they are the standard delimiters in most written languages.

Languages (should) try to be as simple to understand as possible, and using something different from quotes to deal with strings introduces unnecessary complexity.

Using quotation marks to define a set of characters as separate from the enclosing text is more natural to us, and thus easier to read. Also, " and ' are on the keyboard, while those other characters you mentioned are not, so it's easier to type. It may be possible to use a character that is widely available on keyboards, but I can't think of one that won't have the same kind of problem.

E: I missed the pipe character, which may actually be a viable alternative. Except that it's currently widely used as the OR operator, and the readability issue still stands.

Because those other characters you listed aren't ASCII. I'm not sure that we are ready for, or need, a programming language in Unicode...

EDIT: As to why not use {}, | or \, well those symbols all already have meanings in most languages. Imagine C or Perl with two different meanings for '{' and '}'!

| already means "or", and in some languages it concatenates strings. And how would you get \n if \ was the delimiter?

Fundamentally, I really don't see why this is a problem. Is \" really THAT hard? I mean, in C, you often have to use %%, and \\ and several other two-character sequences so... Meh.

Because no one has created a language using some other character that has gotten popular.

I think that is largely because the demand for changing the character is just not there. Most programmers are used to the standard quote and see no compelling reason to change the status quo.

Compare the following.

print "This is a simple string."
print "This \"is not\" a simple string."

print ¤This is a simple string.¤
print ¤This "is not" a simple string.¤

I for one don't really feel like the second is any easier or more readable.

Ah, so you want old-fashioned FORTRAN, where you'd quote by counting the number of characters in the string and embedding it in a H format, such as: 13HHello, World!. As somebody who did a few things with FORTRAN back in the days when the language name was all caps, quotation marks and escaping them are a Good Thing. (For example, you aren't totally screwed if you are off by one in your manual character count.)
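To make the counting scheme concrete, here is a toy model in Python. It is only a sketch of the length-prefix idea, not real FORTRAN parsing, and the helper names to_hollerith and from_hollerith are made up for this illustration.

```python
def to_hollerith(text):
    """Encode text as a Hollerith-style constant: <length>H<characters>."""
    return f"{len(text)}H{text}"


def from_hollerith(constant):
    """Decode a Hollerith-style constant by trusting its declared length."""
    count, _, rest = constant.partition("H")
    return rest[: int(count)]


encoded = to_hollerith("Hello, World!")           # "13HHello, World!"
decoded = from_hollerith(encoded)                 # "Hello, World!"

# A manual count that is off by one silently drops the last character.
miscounted = from_hollerith("12HHello, World!")   # "Hello, World"
```

No character ever needs escaping in this scheme, because the length says where the string ends, but the price is that a wrong count corrupts the string without any obvious error.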

Seriously, there is no ideal solution. It will always be necessary, at some point, to have a string containing whatever quote character you like. For practical purposes, the quote delimiters need to be on the keyboard and easily accessible, since they're heavily used. Perl's q@...@ syntax will fail if a string contains an example of each possible character. FORTRAN's Hollerith constants are even worse.
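A rough Python sketch of why free choice of delimiter only postpones the problem: the hypothetical helper pick_delimiter below (the name and the candidate list are assumptions for this example) looks for a delimiter that does not occur in the text, and gives up once the text uses every candidate.

```python
def pick_delimiter(text, candidates="\"'|!#@^~"):
    """Return the first candidate not present in text, or None if all are used."""
    for ch in candidates:
        if ch not in text:
            return ch
    return None


plain = pick_delimiter("This is a simple string.")          # '"'
quoted = pick_delimiter('This "is not" a simple string.')   # "'"

# A string containing every candidate leaves nothing to delimit it with,
# so some escaping mechanism is still required.
exhausted = pick_delimiter("\"'|!#@^~")                      # None
```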

You say "having to go to great lengths to treat quotes correctly", but that applies only to the text representation in the source code. Modern languages store strings as blocks of data, so they really don't care about the content. Remember that the text representation is only a simple way for the programmer to tell the system what to do. Once the literal has been parsed, the string doesn't have any trouble holding quotes.
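A small Python check of this point, as a sketch: the same value is spelled three different ways in the source, and the resulting strings are indistinguishable. The variable names are only for illustration.

```python
# Three source-level spellings of one and the same string value.
escaped = "She said \"hi\" and left."
single = 'She said "hi" and left.'
triple = """She said "hi" and left."""

# After parsing, the spelling is gone: all three compare equal.
all_equal = all(s == escaped for s in (single, triple))

# The stored string holds the two quote characters and no backslashes.
quote_count = escaped.count('"')
has_backslash = "\\" in escaped
```

Here all_equal is True, quote_count is 2 and has_backslash is False, so the escaping exists only in the source text.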

One good reason would probably be that if this is the only thing you want to improve on an existing language, you're not really creating a new language.

And if you're creating a new language, picking the right character for the string quotes is probably way way WAY down on the todo list of things to actually implement.

You would probably be best off picking a delimiter that exists on all common keyboards and terminal representation sets, so most of the ones you suggest are right out...

And in any case, a quoting mechanism will still be necessary. You gain a reduction in the number of times you use quoting at the cost of making the language harder for non-specialists to read.

So it is not entirely clear that this is a win, and then there is force of habit.

Ada doesn't use single quotes for strings. Those are only for chars, and don't have to be escaped inside strings.

I find it very rare that the double-quote character comes up in a normal text string that I enter into a computer program. When it does, it is almost always because I am passing that string to a command interpreter, and need to embed another string in it.

I would imagine the main reason none of those other characters are used for string delimiters is that they aren't in the original 7-bit ASCII code table. Perhaps that's not a good excuse these days, but in a world where most language designers are afraid to buck the insanely crappy C syntax, you aren't going to get a lot of takers for an unusual string delimiter choice.

Python allows you to mix single and double quotes to put quotation marks in strings.

>>> print("Please welcome Mr Jim 'Beaner' Wilson.")
Please welcome Mr Jim 'Beaner' Wilson.

>>> print('Please welcome Mr Jim "Beaner" Wilson.')
Please welcome Mr Jim "Beaner" Wilson.

You can also use the previously mentioned triple quotes. Triple-quoted strings can also span multiple lines, so you do not have to write explicit newlines.

>>> print("""Please welcome Mr Jim "Beaner" Wilson.""")
Please welcome Mr Jim "Beaner" Wilson.

Finally, you can escape the quotes with backslashes, the same way as everyone else.

>>> print("Please welcome Mr Jim \"Beaner\" Wilson.")
Please welcome Mr Jim "Beaner" Wilson.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow