Reducing a sequence of array modification operations (insert, sort, replace, remove)

https://softwareengineering.stackexchange.com/questions/289387

09-10-2020
|

Question

Given a plain array of values, e.g.

["Apple", "Orange", "Banana", "Strawberry"]

and a list of operations from the set of [insert, delete, sort and replace], e.g.

[
  {cmd: "insert", index: 2, entries: ["Cherry", "Kiwi"]},
  {cmd: "delete", indices: [3]},
  {cmd: "sort", newOrder: [3,1,2,0,4]},
  {cmd: "insert", index: 3, entries: ["Peach"]},
  {cmd: "replace", index: 3, newEntry: "Raspberry"}
]

which should result in the array

["Banana", "Orange", "Cherry", "Raspberry", "Apple", "Strawberry"]

How can I, with relatively little effort, transform my sequence of operations into something less redundant? E.g.

[
  {cmd: "sort", newOrder: [2,1,0,3]},
  {cmd: "insert", index: 2, entries: ["Cherry", "Raspberry"]}
]

Note that deleting an entry and then inserting an identical one is considered equivalent to adding a completely new entry, whereas inserting a new entry and later deleting cancels out. Edits/replacements/renames of entries need to be kept track of, but they don't combine with any of the other operations.

Question:

This seems like something that would come up in the context of database query optimization, but I have no idea what to google for. I'm ultimately looking for an algorithm (preferably a JS implementation) that automates this kind of thing for me, if only in a naïve/greedy fashion (it doesn't have to exhaustively search for the best reduction, if it can't be done in polynomial time).

Background:

This came up while working on a web app in which the user can edit a list of entries, accept and submit those edits, and they then trigger a reshuffling of a set of corresponding arrays on the server. That means that I have to capture how the elements move around in the array, not just what the array looks like after all the changes are done. (I need to keep track of the operations for undo/redo functionality anyway.) Since the amount of data involved can be quite large, combining those operations before sending them would translate to shorter uploading times and fewer database writes server-side. All of this is purely theoretical at this point and may well amount to premature optimization, but I found it an interesting problem and would love some input.

Solution

I'd try something like symbolic computing where we run the computation over an abstraction of all possible starting states, the goal of this symbolic execution is to produce an abstraction of the ending state, which is then reflective of the computations performed by the set of operations to be optimized.

(Because we use an abstraction of all possible starting states, we can perform the transformation (by symbolic execution) of input steps to their effect (as abstract output state) without involving the real database.)

The abstract state that you can compute over will store nodes of several kinds. First constant or known nodes, having fixed index and known value, like 2:Cherry, which is like what the real list can store, and second, unknown values over known ranges (e.g.I'll write ?0-1 to mean unknown value from positions 0 thru 1 in the original input), and unknown values over only partially known range (e.g. I'll write ?0-N where N is the length of the input list).

So, the initial abstract state is a list of length N of unknown values, which I'll represent as [ ?0-N ], an abstract list of one node that says it has the original values (unknown) from the list positions 0 thru N.

After the insertion of the first step, our abstract list will have four nodes. One to represent the range 0-1, one each for positions 2 and 3 (inserted), and one for the remaining positions. The abstract list now looks like [ ?0-1, 2:Cherry, 3:Kiwi, ?2-N ] where ?2-N means the original value of 2 thru the original values to the length of the list.

After the deletion, we remove element at position 3 in the (current) abstract list, leaving us with [ ?0-1, 2:Cherry, ?2-N ]. And so on...

The sort operates on the immediately above and leaves us with: [ ?2, ?1, 2:Cherry, ?0, ?3, ?4-N ].

The next insert changes the abstract state to: [ ?2, ?1, 2:Cherry, 3:Peach, ?0, ?3, ?4-N ]. And the replace results in: [ ?2, ?1, 2:Cherry, 3:Raspberry, ?0, ?3, ?4-N ].

That end result for this symbolic computation over the abstract state will tell us the resulting command to execute to get that effect on an arbitrary input list, and if you like can be teased apart into sorts, deletes, and inserts; However we might be better off if we could interpret the abstract result state as a single command directly by adding a new bulk change command that can do several kinds of operations in one go (e.g. cmd:change-list substitute:[ ?2, ?1, Cherry, Raspberry, ?0, ?3, ?4-N ])

OTHER TIPS

The most straightforward way would be to track some metadata along with the array, so you end up with a data structure something like this:

[{name: "Banana", originalOrder: 2},
 {name: "Orange", originalOrder: 1, deleted: true}, 
 {name: "Cherry", new: true}, 
 {name: "Raspberry", new: true}, 
 {name: "Apple", originalOrder: 0}, 
 {name: "Strawberry", originalOrder: 3}]

Then it's all relatively simple linear operations to find the new order, create a list of deletions, and create a list of insertions.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange