Best practices for data management in Prolog

Question 1

Consider using a more descriptive name for your predicate, for example:

id_fname_lname_age(_, _, _, _).

This explicitly denotes what the arguments are without needing any additional structures.

In my opinion, a good rule of thumb for naming predicates is to describe the arguments in the order they appear in, using declarative names, separated by underscores.
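To illustrate the naming rule, here is a minimal sketch. The sample facts and the derived predicate fname_age/2 are made up for the example; only the name id_fname_lname_age/4 comes from the answer above.

```prolog
% Hypothetical sample facts; the predicate name spells out the argument order.
id_fname_lname_age(1, dave, hardy, 27).
id_fname_lname_age(2, anna, smith, 34).

% A derived relation named in the same style: first name, then age.
fname_age(FName, Age) :-
    id_fname_lname_age(_, FName, _, Age).
```

With these facts, the query ?- fname_age(Name, 34). gives Name = anna, and the name alone tells you which argument is which.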

EDIT: As to your additional questions: assertz/1 is slow (and has many other disadvantages) in comparison to a nicely declarative programming style that simply passes arguments between predicates that do not intrinsically require any modifications of the clause database. When you really need to assert additional facts because you are using Prolog like a relational database system, then assertz/1 is one way to do it (other options are mentioned in other answers here), and will likely be comparable in efficiency to any other relational database system for many usage scenarios. As already mentioned, several modern Prolog systems perform just-in-time indexing on all arguments, and you therefore need not explicitly declare any "keys".
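A small sketch of the two styles contrasted above, under the assumption of a tiny name/age data set. All predicate names here (db_add_person/2, add_person/4 and so on) are invented for illustration.

```prolog
:- use_module(library(lists)).   % sum_list/2
:- use_module(library(pairs)).   % pairs_values/2

:- dynamic person_age/2.

% Database style: the state lives in the clause database and is
% changed by side effect.
db_add_person(Name, Age) :-
    assertz(person_age(Name, Age)).

db_total_age(Total) :-
    findall(Age, person_age(_, Age), Ages),
    sum_list(Ages, Total).

% Declarative style: the collection is an ordinary term (a list of
% Name-Age pairs) that is passed between predicates.
add_person(Name, Age, People, [Name-Age|People]).

total_age(People, Total) :-
    pairs_values(People, Ages),
    sum_list(Ages, Total).
```

The second version needs no dynamic declaration and can be tested and backtracked over like any other pure relation, which is the main point of preferring it when you do not really need a database.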

Question 2

No one has yet addressed your question regarding efficiency when using assert/retract.

For SWI-Prolog, in a nutshell: facts are indexed just-in-time (that is, the index is built when the facts are first queried), and lookup is very efficient because it is based on hash tables. By default indexing is only on the first argument, but there are built-ins for working around this (I guess it would be a pain to keep everything in a normalized form).
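The answer does not say which built-ins it has in mind. One candidate that exists in SWI-Prolog is term_hash/2, so the following is only a sketch under that assumption: a hash of several fields is stored in the first argument, so first-argument indexing is enough. The predicate names are hypothetical.

```prolog
:- dynamic key_fname_lname_age/4.

% Store a hash of the (FName, LName) pair in the first argument.
add_person(FName, LName, Age) :-
    term_hash(FName-LName, Key),
    assertz(key_fname_lname_age(Key, FName, LName, Age)).

% With both names known, Key is an integer and selects the candidate
% clauses; the names are still unified to rule out hash collisions.
% If a name is unbound, term_hash/2 leaves Key unbound and the call
% degrades to an ordinary scan.
person_age(FName, LName, Age) :-
    term_hash(FName-LName, Key),
    key_fname_lname_age(Key, FName, LName, Age).
```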

The rule of thumb seems to be: as long as all your data fits in memory and you don't assert/retract too often, keeping the facts in the clause database is the best choice. You can use library(persistency) to make a predicate persistent.
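A minimal sketch of library(persistency), modeled on the pattern in the SWI-Prolog documentation. The predicate person_age/2, the wrappers and the file name passed in are assumptions made up for this example.

```prolog
:- use_module(library(persistency)).

% Declares person_age/2 as a persistent dynamic predicate. The library
% generates assert_person_age/2, retract_person_age/2 and
% retractall_person_age/2 for updating it.
:- persistent
       person_age(name:atom, age:integer).

% Attach a journal file; facts recorded in it earlier are loaded again.
attach_people_db(File) :-
    db_attach(File, []).

% Adding a fact through the generated predicate also journals it to disk.
add_person(Name, Age) :-
    assert_person_age(Name, Age).
```

After ?- attach_people_db('people.db'), add_person(dave, 27). the fact should still be there when the file is attached again in a later session.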

As for things like constraints and triggers, I guess you would have to write your own predicates, but with Prolog's syntax this should not be more verbose than defining them in SQL (my experience with relational databases is quite limited though, so I might be talking out of my ass).
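Here is roughly what such hand-written constraints and triggers could look like. This is a sketch only: the table id_name_age/3, the audit_log/1 facts and all predicate names are invented for the example.

```prolog
:- dynamic id_name_age/3, audit_log/1.

% "Constraints": Age must be a non-negative integer and Id must be unused.
valid_person(Id, _Name, Age) :-
    integer(Age),
    Age >= 0,
    \+ id_name_age(Id, _, _).

% "Trigger": runs after every successful insert.
after_insert(Id) :-
    assertz(audit_log(inserted(Id))).

% Checked insert: fails without touching the database if a
% constraint is violated.
insert_person(Id, Name, Age) :-
    valid_person(Id, Name, Age),
    assertz(id_name_age(Id, Name, Age)),
    after_insert(Id).
```

All updates have to go through insert_person/3 for this to hold; nothing stops other code from calling assertz/1 directly, which is one difference from a real RDBMS.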

Question 3

Prolog is based on a relational data model.

So a relational data model is, trivially, adequate for Prolog, although personally I miss the metadata facilities you get with SQL DDL. Documentation, when available, can easily go out of sync, and it's a pain to handle relations with many columns, partly because Prolog is untyped and partly because you cannot (easily) refer to columns by name: Prolog lacks the 'projection operator' available in relational algebra (and SQL, of course). SWI-Prolog has library(record) to overcome the problem, but I don't like it too much.
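For reference, a minimal sketch of what library(record) gives you. The person record and person_summary/2 are hypothetical; the generated predicate names follow the library's documented naming scheme.

```prolog
:- use_module(library(record)).

% Generates, among others, make_person/2 and one accessor per field:
% person_first_name/2, person_last_name/2 and person_age/2.
:- record person(first_name, last_name, age).

% Fields are accessed by name instead of by argument position.
person_summary(Person, First-Age) :-
    person_first_name(Person, First),
    person_age(Person, Age).
```

A record is built with make_person([first_name(dave), last_name(hardy), age(27)], P), so adding a column later does not break code that only uses the accessors.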

Generally, when it comes to 'real world' data modelling, such as deeply nested (XML/HTML/SVG/whatever) representations, dimensionally indexed entities like spatial and geographical DBs, or the large graphs required by today's ontologies, relational-only data modelling can be inadequate.

You must supply the missing details yourself, and this can be technically very complex. If you need some indexing your Prolog engine doesn't provide, you will get buried in writing difficult interfaces in low-level languages (usually C). Then why not use some easier language, with ready-to-use (and debugged) libraries modeled on that complex data? There are plenty of them.

As a consequence, SWI-Prolog, whose development is driven by practicality rather than by the abstract language research (both natural and synthetic) that was the initial focus of Prolog applications, has specialized interfaces, for instance for the Web and for ontologies. See the packages page: most of them are well-crafted interfaces to complex data.

From a software engineering perspective, the availability of such interfaces makes a difference in language choice. Just to underline how high SWI-Prolog's reputation is rising, it was recently nominated (like Python) for the Dutch ICT innovation award.

Ongoing development, like quasi-quotation for embedding JavaScript in DCG-based HTML generation, and great support from the SWI-Prolog mailing list add great value!

Personally, I'm dedicating my efforts to learning RDF modeling by applying it to practical tasks.

Question 4

Boris - I recently made this assertion, or nearly, on the swipl list: "The best way to save it is to use qsave_program and not just a text file with all facts." Jan made a convincing argument that using library(persistency) was a better option. I think the days of save_state as a persistence mechanism are gone.

Question 5

If you're interested in using your first format, I'd highly recommend using a list inside the predicate, like so:

person([first_name(_), last_name(_), age(_)]).

This way you can add or remove things as you want. It also makes it easier to grab info out of a particular piece:

?- person(P), member(first_name(Name), P).
P = [first_name(dave), last_name(hardy), middle_name(robert), age(27)],
Name = dave .

This method also makes it really easy to maintain lists of the data, in case you don't want to have the data permanently asserted.
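A sketch of two helpers for working with such attribute lists. The names person_attr/2 and set_person_attr/3 are made up, and the helpers assume every attribute is a one-argument term such as age(27).

```prolog
:- use_module(library(lists)).   % selectchk/3

% Look up one attribute, e.g. person_attr(P, age(A)).
person_attr(Person, Attr) :-
    memberchk(Attr, Person).

% Replace (or add) a one-argument attribute, giving a new list.
set_person_attr(Person0, New, [New|Rest]) :-
    functor(New, Key, 1),          % e.g. Key = age
    functor(Old, Key, 1),          % Old = age(_)
    (   selectchk(Old, Person0, Rest)
    ->  true                       % old value removed
    ;   Rest = Person0             % attribute was not present
    ).
```

Because the updated person is a new list, nothing needs to be asserted or retracted unless you want the change to persist.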

Question 6

Download the Prolog version of WordNet and take a look at what's going on in there:

- What would be a relational database table is a separate file.
- If you must, generate an integer ID and put it in the first position. WordNet chose only to give the word senses their own IDs.
- Document what goes in each position in the documentation.

The other proposals here seem unnecessarily burdensome to me. If you are content with only Prolog accessing this data, then store it in Prolog's format and make life easy for yourself while you use Prolog. If Prolog is going to be just one of several languages accessing the data, stick it in a relational database. The burden of getting to it from Prolog will be offset by everything else being easier.

Migrations are not terribly hard to fake with Prolog. Take advantage of listing/1:

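% save_database(+PredicateIndicator, +Filename)
%
% Writes all clauses of PredicateIndicator (e.g. foo/1) to Filename.
save_database(PredicateIndicator, Filename) :-
  telling(OldStream),            % remember the current output
  tell(Filename),                % redirect output to Filename
  listing(PredicateIndicator),
  told,                          % close Filename
  tell(OldStream).               % restore the previous output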

e.g., save_database(foo/1, 'foo.pl'). You can easily write data migrations on top of this. I really don't see a use case that justifies the greater complexities suggested in the other answers.
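A data migration in this style could look like the following sketch. The foo/1 to foo/2 change and the default value unknown are hypothetical, chosen only to show the pattern.

```prolog
:- dynamic foo/1, foo/2.

% Hypothetical migration: every foo(X) becomes foo(X, unknown),
% i.e. a new "column" is added with a default value.
migrate_foo :-
    forall(retract(foo(X)),
           assertz(foo(X, unknown))).
```

The idea would be to consult the old 'foo.pl', run migrate_foo, and then call save_database(foo/2, 'foo.pl') to write the new layout back.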

Question 7

I haven't personally tried this approach out yet, but I found a simple example/introduction. Turtle is used for a concise RDF data definition, and then the rdf/3 predicate is used to query the RDF data from pure Prolog, without any external query language, relying only on Prolog's backtracking:

@prefix ex: <http://example.org/ns#> .

ex:user1
  ex:name "Annie" ;
  ex:email "annie@example.com" .

ex:drawing1
  ex:title "Railroad Car" ;
  ex:author ex:user1 .

Finding the author name:

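% Assumes rdf/3 from SWI-Prolog's semweb library, with the ex prefix registered
% (e.g. via rdf_register_prefix/2) before this clause is loaded.
drawing_author_name(Drawing, Name) :-
  rdf(Drawing, ex:author, Author),
  rdf(Author, ex:name, Name).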

This could also be applied in pure Prolog, without RDF:

:- discontiguous entity/1.

entity(user1). % optional
% user_name/2 rather than name/2, which is a built-in predicate in most Prolog systems
user_name(user1, "John").
email(user1, "john@gmail.com").

entity(drawing1). % optional
% ref() wrapper for distinguishing from plain atoms
author(drawing1, ref(user1)).
title(drawing1, "Railroad Bus").

drawing_author_name(Drawing, Name) :-
  author(Drawing, ref(Author)),
  user_name(Author, Name).

Question 8

Update:

My first response instructed the use of terminus_store_prolog to access Terminus’ data store directly. However, further research reveals that TerminusDB can also be accessed at a higher level from Prolog by using its core modules (not the client API, but not the raw store either). See the following forum post for a detailed explanation:

https://discuss.terminusdb.com/t/direct-access-to-terminus-store/291/2

----------

There is the collaborative graph database TerminusDB, which is written in Prolog. It seems to mainly advertise its JS and Python client libraries, but a closer look also reveals a Prolog bindings library for its data store (terminus_store_prolog):

Create a new directory (testdir in this example), then do the following:

open_directory_store("testdir", Store),
open_write(Store, Builder),
create_named_graph(Store, "sometestdb", DB),
nb_add_triple(Builder, "Subject", "Predicate", value("Object")),
nb_commit(Builder, Layer),
nb_set_head(DB, Layer).

Add a triple to an existing named graph

open_directory_store("testdir", Store),
open_named_graph(Store, "sometestdb", DB),
open_write(DB, Builder),
nb_add_triple(Builder, "Subject2", "Predicate2", value("Object2")),
nb_commit(Builder, Layer),
nb_set_head(DB, Layer).

Query triples

open_directory_store("testdir", Store),
open_named_graph(Store, "sometestdb", DB),
head(DB, Layer),
triple(Layer, Subject, Predicate, Object).

Convert strings to ids and query by id

open_directory_store("testdir", Store),
open_named_graph(Store, "sometestdb", DB),
head(DB, Layer),
subject_id(Layer, "Subject", S_Id),
id_triple(Layer, S_Id, P_Id, O_Id),
predicate_id(Layer, Predicate, P_Id),
object_id(Layer, Object, O_Id).