Suggest a persistent strategy for a workflow system

https://stackoverflow.com/questions/4801987

22-10-2019
|

Question

I am in the process of creating a UI configuration tool for my pet project. One aspect of this tool lets the end user DEFINE his orchestration. I then need to save this orchestration definition into a database. There will be a executable version of this definition in a running system. The executable version is created dynamically on-demand.

Idea is to separate the DEFINITION from EXECUTABLE version so that I have the flexibility to choose the runtime version among BPMN or JPDL or a POJO based workflow solution (BeanFlow).

Limitation: I can't use the BPMN editors that come with frameworks like jBPM, Activiti etc as I wan't to use my own UI that is specific to my domain.

I need suggestions on HOW to PERSIST the definition.

Should I use rdbms tables? If so, is there a db schema I can borrow that is close to orchestration concepts?
Should I serialize my definition to BPMN/JPDL XML instance document?
Are there any other simple formats that I can use?

Solution

By "orchestration" I'm assuming you mean a finite state machine. Where the current state dictates what transitions can be followed to other states. The representation of states and transitions as edges and vertices often produces a directed acyclic graph, however there are times when the graph will cycle (e.g. draft -- submit for approval --> pending approval -- reject --> draft).

In practice, separating the definition from execution calls for a persistence format that can easily accommodate customization. As your system evolves you will find a number of unanticipated edge cases whose solution should not require altering a persistence schema, only code. This implies XML or a NoSQL solution - something whose schema is easily changed or non existent.

Now, having written my own XML definition for this purpose (for uninteresting reasons I'll exclude), my suggestion is using JPDL (or BPMN). Reason is their definitions likely incorporate whatever you're considering now, will in the future, and enable customization - such as hanging arbitrary data or behavior off them at a given point. You also get the advantage of tools already built - not just UI - for dealing with cycle detection and ensuring there is a path to completion for example.

Some of the interesting features I know JPDL possesses are an ability to help merge forked processes, timed tasks (including those that repeat periodically), and facilities for sending notification. This last item - notification - bears some further exposition. One of the things I've found with my own system is the need for sending out configurable email whose content is based on the data flowing through. These existing engines make that relatively easy by providing a way to plugin variables for instance into text that's then dynamically evaluated at run time before transmission. Also they provide bridges between the engine and whatever user store for the purpose of sending notifications to groups of people, tasking them and enforcing security policy.

Finally, depending on the scope of your system, you will probably still be using a database as well. What I suggest is storing off the XML and data being orchestrated into the database in a serialized format. Then, if the data is being altered as it travels through the execution, write out serializations of the data - and perhaps workflow if it is also changed - into a history/audit log table as well.

OTHER TIPS

I would NOT use rdbms tables, or if you do, store the definitions as text blobs. Trying to make records for the definition is a bad idea because it's much more inflexible and difficult to change your definition over time. Many people would use different approaches, but I'd use JSON or YAML, and avoid XML. The motivation for that is to make it as simple as possible. Trying to use XML, especially a formalized specific format of XML is going to make you spend much more time meeting an exact specification that doesn't actually do anything to help what you're trying to accomplish. JSON and YAML are both very easy to work with from a code perspective. YAML is more easily readable by humans and easier to edit, and isn't as tricky for punctuation and escaping as JSON. JSON is more widely used, and is smaller than YAML. JSON also has a binary counterpart, BSON, if document size is a concern.

Once you have an importer/exporter that goes to/from your internal objects to your data format, then persisting using RDBMS, or other mechanisms, will be straightforward. You could even use CouchDB, which could offer other benefits to your application and may be a great fit.

Very good question! Here is my two cents:

RDBMS: if you do this you will be able to query the workflow instances, for example which tokens are at 'node X'?
Storing XML as clob: the simplicity is the truth of this solution, but you can't really query these just get them by id
NOSQL: there are a lot of different solutions for different problems. MongoDB is a popular solution, it provides document oriented persistence.

How about a simple serialisation of the composed UI using for example XStream and then store the serialised bits into the database as a binary column. Then when user logs in, get the associated data, deserialise, initialise if required and display.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow