How does an XML parser know where to find a schema file?

https://stackoverflow.com/questions/23282706

09-07-2023

Question

I'm working with XML for the first time for a work project. I feel like I've got the basics down, but one thing still has me scratching my head. If you're using a schema to designate a namespace, how does an XML parser know where to find the schema file so it can validate what's being fed into it? I get that on one level the only thing that matters is that elements with globally non-unique names be associated with a namespace in which they are unique, but doesn't the parser have to know whether or not the element tag is actually a namespace member? How exactly does that happen, given that the naming convention for namespaces is typically a URL that (probably) doesn't have anything to do with the schema in question other than as a unique string of characters? In other words, how does a parser that needs to validate an XML file find the schema(s) associated with that file?

Solution

There are many possible mechanisms, and which one applies depends on the schema processor you are using. Schema processing is sometimes integrated with XML parsing, but conceptually it's a separate operation and can be done independently.

One way that many people use, but which I don't like much, is the xsi:schemaLocation attribute, where the XML instance document itself defines a mapping from namespace URIs to schema locations. I don't like it because if you're validating a document, you shouldn't trust it enough to tell you what schema to use for validation.
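To make the xsi:schemaLocation mapping concrete, here is a minimal sketch that reads the hint with Python's standard library. The instance document, the namespace URI http://example.com/orders, the file name orders.xsd, and the helper schema_location_hints are all invented for illustration, and the sketch only inspects the root element. It extracts the hint and does not validate anything.

```python
import xml.etree.ElementTree as ET

# The XML Schema instance namespace, conventionally bound to the "xsi" prefix.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance document: the namespace URI and schema file name are made up.
INSTANCE = """\
<order xmlns="http://example.com/orders"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://example.com/orders orders.xsd">
  <item>widget</item>
</order>
"""


def schema_location_hints(xml_text):
    """Return {namespace URI: schema location} taken from the root element's hint."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace}localname".
    tokens = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    # The attribute value is a whitespace-separated list of namespace/location pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


hints = schema_location_hints(INSTANCE)
print(hints)
```

The namespace URI is only a key here: the location comes from whatever the document's author chose to write next to it, which is exactly why the hint deserves little trust when the document itself is what is being validated.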

Most schema processors are likely to have some kind of API or command-line interface that allows you to provide schema locations. For example, if you're using Saxon, it's

...Validate -s:source.xml -xsd:schema.xsd

where schema.xsd is the top-level schema document that includes/imports any other schema documents needed. There's no explicit binding to namespaces here: Saxon will read the schema documents provided and work out which definitions apply to which namespaces.
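The reason no explicit binding is needed is that each schema document declares its own targetNamespace and names the namespaces it imports. The sketch below shows that idea with Python's standard library; it is not how Saxon is implemented, and the schema text, the URIs, customers.xsd, and the helper describe_schema are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema vocabulary itself (xs:schema, xs:element, ...).
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical top-level schema document: names and URIs are invented for illustration.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/orders"
           elementFormDefault="qualified">
  <xs:import namespace="http://example.com/customers"
             schemaLocation="customers.xsd"/>
  <xs:element name="order" type="xs:string"/>
</xs:schema>
"""


def describe_schema(xsd_text):
    """Return (target namespace, global element names, imported namespaces)."""
    root = ET.fromstring(xsd_text)
    # A missing targetNamespace means the definitions belong to no namespace.
    target = root.get("targetNamespace")
    # Only direct children of xs:schema are global element declarations.
    elements = [e.get("name") for e in root.findall(f"{{{XSD_NS}}}element")]
    # Each xs:import names another namespace and, optionally, where to find its schema.
    imports = {
        i.get("namespace"): i.get("schemaLocation")
        for i in root.findall(f"{{{XSD_NS}}}import")
    }
    return target, elements, imports


target, elements, imports = describe_schema(SCHEMA)
print(target, elements, imports)
```

Given only the top-level document, a processor can follow the import entries to the remaining schema documents and file every definition under the namespace its own document declares.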

Licensed under: CC-BY-SA with attribution