Question

How would I indicate a language other than one listed in the IANA Language Subtag Registry, for example a fictional language?

Referring to BCP 47 (RFCs 5646 & 4647), I’d guess that the und tag or the -x private-use indicator would be needed; but is the preferred form (e.g.) “und-x-dothraki”, “x-dothraki”, “qgm-dothraki” (using q+gm for George Martin), or something else?

Consider this example:

The Ficlang words <i lang="???">foo bar</i> mean “Hello, sir” in English.

What would be the correct value in the lang="???" attribute above?

(Ideally this should include a way to distinguish between multiple non-standard languages.)

Solution

According to the international standard ISO 639-2, the language code mis denotes uncoded languages.

Yet, BCP 47 – which is an IETF document describing “Internet Best Current Practice”, not a standard – says that mis should not be used. The argument is rather weak: “Because the addition of other codes in the future can render its application invalid, it is inherently unstable and hence incompatible with the stability goals of BCP 47. It is always preferable to use other subtags: either 'und' or (with prior agreement) private use subtags.”

HTML5 CR – a Candidate Recommendation by the W3C – says that if the lang attribute value is the empty string, i.e. lang="", then “it must be interpreted as meaning that the language of the node is explicitly unknown”. Current HTML recommendations do not contain such a principle, and they are rather vague as regards special values of the lang attribute.

So in principle, this depends on what documents you regard as authoritative. On the other hand, it most probably has no practical impact on anything: as soon as the lang attribute value is not in the limited (and browser-dependent) set of language codes recognized by a browser, it will most probably be treated as suppressing any language-specific processing (for the element) that a browser might have.

Other tips

(I would comment on @Jukka K Korpela's answer, but comments are too short)

I was searching for a way to use the lang attribute to denote the computer language used in a <code> tag, and I ended up on the MDN lang attribute page, which says that:

The attribute contains a single “language tag” in the format defined in Tags for Identifying Languages (BCP47).

So the standard that must (should) be followed is BCP47. Reading the BCP47 spec, I found:

[ISO639-2] has defined several codes included in the subtag registry that require additional care when choosing language tags.

In most of these cases, where omitting the language tag is permitted, such omission is preferable to using these codes.

Language tags SHOULD NOT incorporate these subtags as a prefix, unless the additional information conveys some value to the application.

That applies here: you provide additional information (the content of the element is written in your fictional language), so it's fine to set the lang attribute rather than omit it.

The 'mul' (Multiple) primary language subtag identifies content in multiple languages. [...]

Not our case

The 'und' (Undetermined) primary language subtag identifies linguistic content whose language is not determined.

This subtag SHOULD NOT be used unless a language tag is required and language information is not available or cannot be determined.

Omitting the language tag (where permitted) is preferred. The 'und' subtag might be useful for protocols that require a language tag to be provided or where a primary language subtag is required (such as in "und-Latn"). The 'und' subtag MAY also be useful when matching language tags in certain situations.

Not our case either: language is determined, it's just not in the BCP standard. Hence not using the 'und' either.

The 'zxx' (Non-Linguistic, Not Applicable) primary language subtag identifies content for which a language classification is inappropriate or does not apply. Some examples might include instrumental or electronic music; sound recordings consisting of nonverbal sounds; audiovisual materials with no narration, dialog, printed titles, or subtitles; machine- readable data files consisting of machine languages or character codes; or programming source code.

Not the case for a fictional language (assuming your fictional characters are not instruments)

The 'mis' (Uncoded) primary language subtag identifies content whose language is known but that does not currently have a corresponding subtag.

Seems to be your case: let's use this!

This subtag SHOULD NOT be used.

Huh...

Because the addition of other codes in the future can render its application invalid, it is inherently unstable and hence incompatible with the stability goals of BCP 47.

Oh, right, "subtag should not be used for languages that might end up in BCP47 someday"! I doubt yours would...

It is always preferable to use other subtags: either 'und' or (with prior agreement) private use subtags.

That's another way to do it: you might use the x- language tag and treat this as a private language. An empty lang value would not fit, as the language is known. Not setting it would be a mistake too, as it's not the language of the surrounding content.

As far as I understand the spec, you can use either mis-... or x-... language tags. Please correct me if I'm mistaken (that's the way I will go soon in the same case [fictional language], so if I learn it's the wrong way before I start coding, it will be easier for me to correct!)
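
For what it's worth, here is a minimal sketch of how the candidate tags from the question could be inspected in Python. It relies on two syntax rules from RFC 5646: every subtag is 1 to 8 ASCII alphanumerics, and the subtags following the `x` singleton are private use. It also checks the `qaa`..`qtz` range that RFC 5646 reserves for private-use primary language subtags, which is where `qgm` falls. The function names are my own (hypothetical), and this is not a full BCP 47 parser: it does not consult the registry or check the ordering of the other subtags.

```python
import re

SUBTAG = re.compile(r"[a-z0-9]{1,8}")          # every subtag: 1-8 ASCII alphanumerics
PRIVATE_LANGUAGE = re.compile(r"q[a-t][a-z]")  # reserved private-use range qaa..qtz


def private_use_part(tag):
    """Return the subtags after the 'x' singleton, or None if there are none."""
    subtags = tag.lower().split("-")
    if not all(SUBTAG.fullmatch(s) for s in subtags):
        return None  # empty or over-long subtag, or a non-alphanumeric character
    if "x" not in subtags:
        return None
    rest = subtags[subtags.index("x") + 1:]
    return rest or None  # a bare trailing 'x' carries no private-use subtags


def is_private_language(subtag):
    """True if the primary language subtag lies in the qaa..qtz range."""
    return PRIVATE_LANGUAGE.fullmatch(subtag.lower()) is not None


for candidate in ("x-dothraki", "und-x-dothraki", "mis-x-dothraki", "qgm-dothraki"):
    print(candidate, private_use_part(candidate))
```

Note that in "qgm-dothraki" there is no `x` singleton, so "dothraki" is not a private-use subtag there; to keep the name private use it would have to be written as "qgm-x-dothraki". Distinguishing several fictional languages then comes down to using a different subtag after `x-` for each.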

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow