Question

After loading our initial facts into the cube, we then load a second file that adds measures to the existing facts (so no new facts are created by the second file). We use a Handler to do this.

When the second file is removed from the filesystem, we would like to remove just the relevant measures from the facts.

Is there a way for us to plug into the Directory/File Watcher mechanism to accomplish this?


Solution

You could override

CSVSource.onFileAction(IFileWatcher watcher, Collection<String> added, Collection<String> modified, Collection<String> deleted)

in a subclass of CSVSource: call super.onFileAction(...), which will process the added and modified files, then add your own logic to handle the deleted files.
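A minimal sketch of that override in Java. The CSVSource and IFileWatcher types at the top are hypothetical stand-ins that only mimic the onFileAction(...) signature quoted above, so the example compiles on its own. With the real library you would drop them and extend ActivePivot's own CSVSource, whose visibility modifiers and other members may differ. MeasureRemovingCSVSource and removeMeasuresContributedBy are also illustrative names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for the watcher type named in the answer.
interface IFileWatcher {}

// Hypothetical stand-in for ActivePivot's CSVSource: only the
// onFileAction(...) shape quoted in the answer is mimicked here.
class CSVSource {
    final List<String> processed = new ArrayList<>();

    public void onFileAction(IFileWatcher watcher, Collection<String> added,
                             Collection<String> modified, Collection<String> deleted) {
        // Stand-in for the real parsing and loading of added/modified files.
        processed.addAll(added);
        processed.addAll(modified);
    }
}

// Keeps the default behaviour and adds handling for deleted files.
class MeasureRemovingCSVSource extends CSVSource {
    final List<String> cleanedUp = new ArrayList<>();

    @Override
    public void onFileAction(IFileWatcher watcher, Collection<String> added,
                             Collection<String> modified, Collection<String> deleted) {
        // 1. Let the base class process added and modified files as usual.
        super.onFileAction(watcher, added, modified, deleted);
        // 2. Extra logic: remove the measures contributed by each deleted file.
        for (String path : deleted) {
            removeMeasuresContributedBy(path);
        }
    }

    // Hypothetical hook: a real project would update here the facts
    // whose FILEPATH field equals 'path'.
    void removeMeasuresContributedBy(String path) {
        cleanedUp.add(path);
    }
}
```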

Handling the deleted files can be done by updating the facts to which a deleted file contributed, i.e. the facts whose file-path field holds the path of a deleted file. Such a field can be filled automatically by adding the FILEPATH metadata to your LoadInstructions.csv file:

Format,FilePattern,FilePath,MetaData
FormatName,formatRegex.csv,someFolder,FILEPATH=N/A

and having a field like:

<field name="FILEPATH" type="string" indexation="dictionary" nullable="true" defaultValue="N/A" />
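The clean-up step itself can be sketched over a plain in-memory list. This assumes that FILEPATH on each fact holds the path of the file that contributed the extra measure (here a single measureB), and that "removing" that measure means setting it to null and resetting FILEPATH to the "N/A" default. Fact, FactCleaner and the field names are hypothetical; a real implementation would apply the same updates to the records held by ActivePivot rather than to a List.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical in-memory fact; the real record lives in the cube's store.
class Fact {
    final String key;
    final double measureA;   // loaded from the initial file
    Double measureB;         // added by the second file; null when absent
    String filePath;         // FILEPATH field, "N/A" by default

    Fact(String key, double measureA, Double measureB, String filePath) {
        this.key = key;
        this.measureA = measureA;
        this.measureB = measureB;
        this.filePath = filePath;
    }
}

class FactCleaner {
    // Matches the defaultValue in the FILEPATH field declaration.
    static final String NO_FILE = "N/A";

    // Clears the measure contributed by any deleted file and returns
    // how many facts were updated. Facts themselves are never removed.
    static int removeMeasuresFrom(List<Fact> facts, Collection<String> deleted) {
        int updated = 0;
        for (Fact f : facts) {
            if (f.filePath != null && deleted.contains(f.filePath)) {
                f.measureB = null;
                f.filePath = NO_FILE;
                updated++;
            }
        }
        return updated;
    }
}
```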

OTHER TIPS

If we understand correctly, and to simplify the use case, your dataset has two measures, A and B: for the same records, one file brings measure 'A' and another file brings measure 'B'. You want to update or delete the data for measure A or B independently.

There are several ways you can achieve this.

First, you could decouple the measures: instead of records that bear both the A and B fields, you would have two records, each with a generic "value" field and a "measure type" field to distinguish between the two measure types. This design is flexible because you can later introduce a new measure 'C', itself fed from another file.
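A small Java sketch of the decoupled layout, using hypothetical MeasureRecord and DecoupledStore classes in place of the real cube store. Each record carries one value tagged with its measure type, so deleting every 'B' record removes measure B without touching A.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One generic record per (key, measure type); names are illustrative.
final class MeasureRecord {
    final String key;
    final String measureType;
    final double value;

    MeasureRecord(String key, String measureType, double value) {
        this.key = key;
        this.measureType = measureType;
        this.value = value;
    }
}

class DecoupledStore {
    final List<MeasureRecord> records = new ArrayList<>();

    void add(String key, String measureType, double value) {
        records.add(new MeasureRecord(key, measureType, value));
    }

    // Dropping one measure type leaves the other types untouched.
    void deleteMeasureType(String measureType) {
        records.removeIf(r -> r.measureType.equals(measureType));
    }

    // Total per measure type, i.e. what a "measure type" level would show.
    Map<String, Double> totalsByType() {
        Map<String, Double> totals = new TreeMap<>();
        for (MeasureRecord r : records) {
            totals.merge(r.measureType, r.value, Double::sum);
        }
        return totals;
    }
}
```

Adding a measure 'C' later is just more calls to add(key, "C", value); the record layout does not change.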

The most elegant option is probably to use the ActivePivot Distributed Architecture with Polymorphic Distribution. You would set up two independent cubes, one holding only the 'A' measure and another holding only the 'B' measure. Then join the cubes together with polymorphic distribution: ActivePivot will merge them on the fly and present both measures as if they belonged to the same (virtual) cube.
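The effect of merging the cubes on the fly can be illustrated with a toy model. This is not the ActivePivot distribution API; ToyCube and ToyVirtualCube are hypothetical classes that only show the idea. Each cube keeps its own measure, and the combined view is assembled at query time, so clearing cube B leaves cube A's data intact.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy stand-in for one cube: location key -> value of its single measure.
class ToyCube {
    final String measureName;
    final Map<String, Double> values = new TreeMap<>();

    ToyCube(String measureName) {
        this.measureName = measureName;
    }
}

class ToyVirtualCube {
    // Merges the cubes at query time: for each location, one entry per
    // measure that has a value there. Nothing is copied into a shared
    // store, so each cube can be reloaded or cleared independently.
    static Map<String, Map<String, Double>> query(ToyCube... cubes) {
        Set<String> locations = new TreeSet<>();
        for (ToyCube c : cubes) {
            locations.addAll(c.values.keySet());
        }
        Map<String, Map<String, Double>> view = new TreeMap<>();
        for (String loc : locations) {
            Map<String, Double> row = new TreeMap<>();
            for (ToyCube c : cubes) {
                Double v = c.values.get(loc);
                if (v != null) {
                    row.put(c.measureName, v);
                }
            }
            view.put(loc, row);
        }
        return view;
    }
}
```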

Finally, the quick and dirty solution: configure your measures as 'nullable' fields in ActivePivot. This way, when you want to erase measure 'A', you actually write 'null' to the 'A' fields of your records.
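A sketch of the nullable-field approach, with hypothetical NullableFact and NullableMeasures classes. It stores the measures as boxed Double values so that null is allowed, and assumes, for the sake of the sketch, that aggregation skips null values; the records themselves are kept, only their 'A' values are erased.

```java
import java.util.List;

// Hypothetical record with nullable measure fields (boxed Double allows null).
class NullableFact {
    Double a;
    Double b;

    NullableFact(Double a, Double b) {
        this.a = a;
        this.b = b;
    }
}

class NullableMeasures {
    // "Erase" measure A by writing null into every record's A field.
    static void eraseA(List<NullableFact> facts) {
        for (NullableFact f : facts) {
            f.a = null;
        }
    }

    // Sum that ignores nulls; returns null when no record carries a value,
    // so an erased measure shows as empty rather than as 0.
    static Double sumA(List<NullableFact> facts) {
        Double total = null;
        for (NullableFact f : facts) {
            if (f.a != null) {
                total = (total == null) ? f.a : total + f.a;
            }
        }
        return total;
    }
}
```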

Licensed under: CC-BY-SA with attribution