Microdata vs RDFa

https://stackoverflow.com/questions/9066702

20-04-2021

Question

I have a quick question about RDFa and Microdata.

My current understanding is that RDFa is RDF embedded in HTML, but it is complicated for new developers like myself, whereas Microdata seems really easy and quick to implement.

What are the other advantages and disadvantages of these two semantic formats?

Solution

Differences between Microdata and RDFa

While there are many (technical, smaller) differences, here’s a selection of those I consider important (based on my answer on Webmasters).

Specifications

As W3C’s HTML WG found no volunteer to edit the Microdata specification, it is now merely a W3C Group Note (see history), which means that there are no plans for any further work on it.

So the Microdata section in WHATWG’s "HTML Living Standard" is the only place where Microdata may evolve. Depending on what gets changed, their Microdata may become incompatible with W3C’s HTML5.

Update: In 2017, work started again, with the aim of publishing Microdata as a W3C Recommendation.
RDFa is published as a W3C Recommendation.

Applicability

Microdata can only be used in (X)HTML5 (or HTML as defined by the WHATWG).
RDFa can be used in various host languages, such as several (X)HTML variants and XML (thus also SVG, MathML, Atom, etc.).

And new host languages can be supported, as RDFa Core "is a specification for attributes to express structured data in any markup language".

Use of multiple vocabularies

In Microdata, it’s harder, and sometimes impossible, to use several vocabularies for the same content.
Thanks to its use of prefixes, RDFa allows you to mix vocabularies.
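A minimal sketch of what prefixes make possible: the element below is typed and labelled with Schema.org and FOAF terms at the same time. The `prefix` declarations are written out explicitly here, although RDFa 1.1's initial context already predefines common prefixes such as `schema` and `foaf`.

```html
<!-- Two vocabularies on the same content: Schema.org and FOAF -->
<p prefix="schema: http://schema.org/ foaf: http://xmlns.com/foaf/0.1/"
   typeof="schema:Person foaf:Person">
  <!-- one property attribute carrying two space-separated properties -->
  <span property="schema:name foaf:name">John Doe</span> is his name.
</p>
```

By contrast, Microdata's `itemtype` may list several types, but they must all come from the same vocabulary, which is part of why mixing is harder there.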

Use of reverse properties

Microdata doesn’t provide a way to use reverse properties. You need this for vocabularies that don’t define inverse properties (e.g., they only define parent instead of parent & child). The popular Schema.org is such a vocabulary (with only a few older exceptions).

While the W3C Note Microdata to RDF defines the experimental itemprop-reverse attribute, it is part of neither W3C’s nor WHATWG’s Microdata.
RDFa supports the use of reverse properties (with the rev attribute).
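As an illustration, take a property that is defined in only one direction; Schema.org's `author` (pointing from a creative work to its author) is used here as the example. With `rev`, the book can be nested inside the Person's markup while the resulting statement still reads "this book has this person as its author". The book title is a placeholder, and note that `rev` belongs to RDFa Core, not to RDFa Lite.

```html
<!-- RDFa Core: rev reverses the direction of the relation -->
<div prefix="schema: http://schema.org/" typeof="schema:Person">
  <span property="schema:name">John Doe</span> wrote
  <!-- yields: [this Book] schema:author [the Person above] -->
  <div rev="schema:author" typeof="schema:Book">
    <span property="schema:name">Example Book Title</span>
  </div>
</div>
```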

Semantic Web

By using Microdata, you are not directly taking part in the Semantic Web (and AFAIK Microdata doesn’t intend to), mostly because it’s not defined as an RDF serialization (although there are ways to extract RDF from Microdata).
RDFa is an RDF serialization, and RDF is the foundation of W3C’s Semantic Web.

The specifications RDFa Core and HTML+RDFa may be more complex than HTML Microdata, but it’s not a "fair" comparison because they offer more features.

The closer counterpart to Microdata is RDFa Lite (which "does work for most day-to-day needs"), and this spec, at least in my opinion, is much less complex than Microdata.
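To give a feel for its size: RDFa Lite 1.1 consists of just five attributes, `vocab`, `typeof`, `property`, `resource`, and `prefix`. The sketch below uses the first four; `vocab` sets a default vocabulary so terms need no prefix. The `#john` identifier and the organization name are placeholders.

```html
<!-- RDFa Lite: vocab + typeof + property + resource -->
<p vocab="http://schema.org/" typeof="Person" resource="#john">
  <span property="name">John Doe</span> works for
  <!-- property + typeof on one element: a nested, typed item -->
  <span property="worksFor" typeof="Organization">
    <span property="name">Example Corp</span>
  </span>.
</p>
```

Roughly, `itemscope`/`itemtype` map to `vocab`/`typeof`, `itemprop` to `property`, and `itemid` to `resource`, so moving between the two syntaxes is mostly a matter of renaming attributes.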

What to do?

If you want to support specific consumers (for example, a search engine and a browser add-on), you should check their documentation about supported syntaxes.

If you want to learn only one syntax and have no specific consumers in mind, (attention, subjective opinion!) go with RDFa. Why?

RDFa has matured over the years and is a W3C Recommendation, while Microdata is a relatively new invention and not standardized by the W3C.
RDFa can be used in many languages, not only HTML5.
RDFa allows mixed use of vocabularies for the same content, and it natively supports the use of reverse properties.

Can’t decide? Use both.

Note that you can also use several syntaxes for the same content, so you could have Microdata and RDFa (and Microformats, and JSON-LD, and …) for maximum compatibility.

Here’s a simple Microdata snippet:

<p itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">John Doe</span> is his name.
</p>

Here’s the same snippet using RDFa (Lite):

<p typeof="schema:Person">
  <span property="schema:name">John Doe</span> is his name.
</p>

And here both syntaxes are used together:

<p itemscope itemtype="http://schema.org/Person" typeof="schema:Person">
  <span itemprop="name" property="schema:name">John Doe</span> is his name.
</p>

But it’s typically neither necessary nor recommended to go down this route.

Other tips

The main advantage you get from any semantic format is the ability for consumers to reuse your data.

For example, search engines like Google are consumers that reuse your data to display Rich Snippets, such as this one:

[Image: Recipe Rich Snippet]
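For a sense of what such a consumer reads, here is a minimal Microdata sketch of a recipe using Schema.org's `Recipe` type. The recipe content is made up, and which properties a particular search engine requires before it shows a Rich Snippet is not covered here; that is something to check in the consumer's own documentation.

```html
<!-- Schema.org Recipe in Microdata; all content is placeholder -->
<div itemscope itemtype="http://schema.org/Recipe">
  <h2 itemprop="name">Simple Pancakes</h2>
  By <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">John Doe</span></span>
  <!-- durations use ISO 8601 format: PT20M = 20 minutes -->
  Cook time: <time itemprop="cookTime" datetime="PT20M">20 minutes</time>
  <ul>
    <li itemprop="recipeIngredient">200 g flour</li>
    <li itemprop="recipeIngredient">2 eggs</li>
  </ul>
</div>
```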

In order to decide which format is best, you need to know which consumers you want to target. For example, Google says in their FAQ that they will only process microdata (though the testing tool does now work with RDFa, so it is possible that they accept RDFa).

Unless you know that your target consumer only accepts RDFa, you are probably best going with microdata. While many RDFa-consuming services (such as the semantic search engine Sindice) also accept microdata, microdata-consuming services are less likely to accept RDFa.

It's long, but one of the most comprehensive answers you'll get to this question is this blog post by Jeni Tennison: Microdata and RDFa Living Together in Harmony

I'm not certain that unor's suggestion to use both Microdata and RDFa is a good idea. If you run his example through Google's Structured Data Testing Tool (or similar tools), it shows duplicate data, which seems to imply that Googlebot would pick up two people named John Doe on the page instead of one, as originally intended.

I'm assuming therefore that using one syntax for a given item is a better idea (you should still be able to mix syntaxes as long as they describe separate entities).
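Following that assumption, a sketch of mixing syntaxes without duplicating an item: each entity is described in exactly one syntax. The organization is a placeholder, and whether a given consumer handles both syntaxes on one page is worth confirming in its testing tool.

```html
<!-- Person described only with Microdata -->
<p itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">John Doe</span> is his name.
</p>
<!-- A separate Organization described only with RDFa Lite -->
<p vocab="http://schema.org/" typeof="Organization">
  <span property="name">Example Corp</span> is a different entity.
</p>
```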

Though I would be happy to be proven wrong on this.

I would say it largely depends on the use case: for scientific use cases, RDF is common and used in many different ways.

For enriching websites, JSON-LD is now recommended, at least by Google.

JSON-LD is a JavaScript notation embedded in a <script> tag in the page head or body. The markup is not interleaved with the user-visible text, which makes nested data items easier to express, such as the Country of a PostalAddress of a MusicVenue of an Event. Also, Google can read JSON-LD data when it is dynamically injected into the page's contents, such as by JavaScript code or embedded widgets in your content management system.
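A minimal sketch of that nested case in JSON-LD, using Schema.org types: an Event whose location is a MusicVenue, whose address is a PostalAddress, whose country is a Country. All names and the date are placeholders.

```html
<!-- JSON-LD lives in its own script element, separate from the visible text -->
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Event",
  "name": "Example Concert",
  "startDate": "2021-06-01T20:00",
  "location": {
    "@type": "MusicVenue",
    "name": "Example Hall",
    "address": {
      "@type": "PostalAddress",
      "addressCountry": {
        "@type": "Country",
        "name": "Example Country"
      }
    }
  }
}
</script>
```

Expressing the same four-level nesting in Microdata or RDFa would require four nested HTML elements wrapped around the visible text, which is the difference the paragraph above points to.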

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow