MarkLogic 7: Semantic Search

Question 1

As Michael says, there are many ways you could go with this. That's because MarkLogic 7 is so flexible: you can express information as triples or as XML (or as JSON or ...), and mix and match data models and query languages.

The first thing to figure out is: what are you trying to achieve? If you just want to get your feet wet with MarkLogic's mix of XML and triples, here's what I'd suggest:

Ingest your XML documents as above. If you have something text-heavy, such as a description of the account or a free-text annotation, so much the better.
Using XQuery or XSLT, add a triple to each document that represents the city. For example, for the sample document you posted, add:

--this document URI-- unique/uri/Location New York
Import triples from the web that map city names to states and zip codes (e.g. from GeoNames).
Now, with a mixture of SPARQL and XQuery, you can search for, e.g., the current balance of every account in some zip code (even though your documents don't contain zip codes).
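
To make the "add a triple to each document" step concrete, here is a rough sketch in Python rather than XQuery, just to show the shape of the result. It assumes the triple is embedded using MarkLogic's sem:triple XML serialization; the sample document, the document URI, and the predicate IRI are hypothetical stand-ins, not taken from your data.

```python
import xml.etree.ElementTree as ET

# Namespace MarkLogic uses for its XML serialization of triples.
SEM_NS = "http://marklogic.com/semantics"
ET.register_namespace("sem", SEM_NS)


def add_city_triple(doc_xml, doc_uri, predicate_iri, city):
    """Return doc_xml with one embedded triple: (doc_uri, predicate_iri, city)."""
    root = ET.fromstring(doc_xml)
    triples = ET.SubElement(root, f"{{{SEM_NS}}}triples")
    triple = ET.SubElement(triples, f"{{{SEM_NS}}}triple")
    ET.SubElement(triple, f"{{{SEM_NS}}}subject").text = doc_uri
    ET.SubElement(triple, f"{{{SEM_NS}}}predicate").text = predicate_iri
    obj = ET.SubElement(triple, f"{{{SEM_NS}}}object")
    # The city is a plain string literal, so give it an explicit datatype.
    obj.set("datatype", "http://www.w3.org/2001/XMLSchema#string")
    obj.text = city
    return ET.tostring(root, encoding="unicode")


# Hypothetical account document; the real one will look different.
sample = "<account><city>New York</city><balance>100.00</balance></account>"
enriched = add_city_triple(
    sample,
    "/accounts/1.xml",              # assumed document URI
    "http://example.org/Location",  # assumed predicate IRI
    "New York",
)
print(enriched)
```

The original XML stays untouched and the triple rides along in the same document, so a SPARQL query can reach the city fact while an XQuery search still sees the balance and any free text.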

The documentation gives a good description of loading triples from external sources using mlcp.

See http://docs.marklogic.com/guide/semantics/setup

and for more detail on loading triples see http://docs.marklogic.com/guide/semantics/loading

Note too that you can now run XQuery, SPARQL, or SQL queries directly from Query Console at http://your-host:8000/qconsole/

Question 2

It's up to you. If you want to use XML for some facts and triples for others, you can transform selected facts from XML to triples, and combine those in the same documents. For the XML you presented, that's how I'd start. As you insert or update each document in the original XML format, pass it through XQuery that adds new triples. I'd keep those new triples in the same document with the original XML.

You could do this using CPF: http://docs.marklogic.com/guide/cpf - or with a tool like http://marklogic.github.io/recordloader/ and its XccModuleContentFactory class.

But if you want to get away from the original XML format entirely, you can do that too: translate your XML into triples and ingest those triples instead of the original XML. You can also keep pure XML documents and pure triple documents in the same database.
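
If you go the triples-only route, the translation could look roughly like the Python sketch below, which turns each child element of a document into one N-Triples line. The subject IRI, the predicate base, and the sample document are all hypothetical, and a real mapping would need to handle nested elements, attributes, and typed values.

```python
import xml.etree.ElementTree as ET


def xml_to_ntriples(doc_xml, subject_iri, predicate_base):
    """Turn each non-empty child element of the root into one N-Triples line."""
    root = ET.fromstring(doc_xml)
    lines = []
    for child in root:
        value = (child.text or "").strip()
        if not value:
            continue  # this simple sketch skips empty or container elements
        # Escape the characters that N-Triples string literals require.
        literal = (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        lines.append(f'<{subject_iri}> <{predicate_base}{child.tag}> "{literal}" .')
    return lines


# Hypothetical account document and IRIs, for illustration only.
sample = "<account><city>New York</city><balance>100.00</balance></account>"
ntriples = xml_to_ntriples(
    sample,
    "http://example.org/accounts/1",
    "http://example.org/",
)
print("\n".join(ntriples))
```

A file of lines like these is the kind of external triple source that the loading guide linked above covers.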