Question

My program's object structure is organized so that a Doc object contains a list of Mention objects, and each Mention object contains a list of Word objects. Words are identified by their position in the Doc's text and also store some other information (text, WordNet sense, ...).

While the program runs (through user interaction, etc.), Word objects inside a Mention can be accessed and modified (for example, to update a Word's sense). User interaction with each Mention is a requirement.

The problem is that several Mentions belonging to the same Doc might share some of the same Words (after all, all Words are in the Doc). So when such a Word is updated, how should I update the corresponding Word contained in the other Mentions? In other words, these Words are at exactly the same location in the text and should be updated together, but they are stored separately in each Mention. So how should an update to one change the others?

One approach I have used: when a Word inside a Mention is modified, I retrieve all Mentions (from a stored Doc reference) and then update the corresponding Word in any Mention that contains it. This requires a for loop with Equals checks on each update, which is quite a lot of processing.

The second approach I can think of is not to store separate Word lists in Mentions. Only a single list of Words is stored in the Doc, and each Mention stores a list of the indices of the Words that belong to it. So when updating a Word, I call an update function through the Doc reference to update the Doc's list. However, the problem lies in the function that returns the whole list of Words for a Mention. I have to return a new list of Words, using the stored indices to pick the actual Words out of the Doc's list. This is needed because any Word inside that Mention may have been modified through some other Mention shortly before. Alternatively, I could check whether a Word has been updated and copy the update. But that still requires a for loop through all Words in the Mention, so it still seems awkward (every retrieval of the list is a long operation).
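A minimal sketch of this second approach, to make the cost concrete. All class and member names here (UpdateSense, GetWords, and so on) are hypothetical and only illustrate the index-based layout; they are not the actual program's code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical names throughout; this only illustrates the index-based layout.
public class Word
{
    public string Text { get; set; }
    public int Sense { get; set; }
}

public class Doc
{
    // The single list of Words for the whole document.
    public List<Word> Words { get; } = new List<Word>();

    // Every update goes through the Doc, so there is only one copy to change.
    public void UpdateSense(int index, int sense)
    {
        Words[index].Sense = sense;
    }
}

public class Mention
{
    private readonly Doc _doc;
    private readonly List<int> _wordIndices;

    public Mention(Doc doc, IEnumerable<int> wordIndices)
    {
        _doc = doc;
        _wordIndices = wordIndices.ToList();
    }

    // Builds a fresh list on every call by looking up each index in the Doc.
    public List<Word> GetWords()
    {
        return _wordIndices.Select(i => _doc.Words[i]).ToList();
    }
}
```

Each call to GetWords makes one pass over the Mention's indices, which is the loop described above.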

What I want to ask is whether there is a better solution to this update problem. Any help is much appreciated :) If necessary, I will add part of my code here.


Solution

As I already said in my comments, don't create multiple Word instances for the same word in the document. So, regarding your comment: there would never be a w1 and a w2 for a single physical word in the document. There would only be w.

Example:

var w = new Word(2, 3, "age", 1);

var mention1 = new Mention(w);
var mention2 = new Mention(w);

mention1.UpdateWord(); // sets the fourth property of w to 3

mention2.PrintWord(); // prints (2, 3, "age", 3)

This works because both Mention instances hold a reference to the same Word instance.
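A fuller sketch of the same idea, assuming the Doc owns the single list of Word objects and each Mention stores references to a subset of them. The class shapes, the Sense property, and the AddMention helper are hypothetical, since the original code was not posted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes; the real classes were not shown in the question.
public class Word
{
    public int Position { get; }
    public string Text { get; }
    public int Sense { get; set; }

    public Word(int position, string text, int sense)
    {
        Position = position;
        Text = text;
        Sense = sense;
    }
}

public class Mention
{
    // Holds references to the Doc's Word objects, not copies.
    public IReadOnlyList<Word> Words { get; }

    public Mention(IEnumerable<Word> words)
    {
        Words = words.ToList();
    }
}

public class Doc
{
    private readonly List<Word> _words;
    public List<Mention> Mentions { get; } = new List<Mention>();

    public Doc(IEnumerable<Word> words)
    {
        _words = words.ToList();
    }

    // Builds a Mention from positions in the Doc's word list.
    public Mention AddMention(params int[] indices)
    {
        var mention = new Mention(indices.Select(i => _words[i]));
        Mentions.Add(mention);
        return mention;
    }
}

public static class Demo
{
    public static void Main()
    {
        var doc = new Doc(new[]
        {
            new Word(0, "old", 1),
            new Word(1, "age", 1),
            new Word(2, "pension", 1),
        });

        var m1 = doc.AddMention(0, 1);   // "old age"
        var m2 = doc.AddMention(1, 2);   // "age pension"

        m1.Words[1].Sense = 3;           // update "age" through the first mention

        Console.WriteLine(m2.Words[0].Sense);                          // 3
        Console.WriteLine(ReferenceEquals(m1.Words[1], m2.Words[0]));  // True
    }
}
```

This relies on Word being a class (a reference type). If Word were a struct, each Mention would get its own copy and the update would not be shared. Returning the Words of a Mention is then just returning the stored list, with no per-call loop and no Equals checks.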

Licensed under: CC-BY-SA with attribution