Question

Dynamically apply methods/"rules" to documents

I hope someone here can help me or guide me in the right direction. I am currently creating a web application where users can import a text file and then programmatically apply different methods to the text.

Example

Imagine that a user has imported a text document that looks like the one below.

[Screenshot: raw string]

Now, as explained, I want to allow my users to apply a range of different methods to the above text. They should be able to apply all rules, in any given order. Consider the example below, where I have applied 4 rules to the original text document:

[Screenshot: rules]

As you can see, the text is transformed by each rule: each method is applied to the current text and outputs the new text string.

Users should be able to save these rules, so that the next time they upload a document to this specific stream, the rules are automatically applied to that document.

My question is: what would be the best approach to let my users do this dynamically?

I will define the methods/rules that a user can apply to the text document, but what is the best approach to:

  1. Save the rules to the database
  2. Programmatically apply the rules to each incoming document (parse each document, based on the rules)

My progress so far

So I am a bit lost on where to begin here, but I was thinking something like below.

Streams: A stream is kind of like a "stack" for all documents. I can upload multiple different documents to a stream. I can create multiple streams, each of which holds multiple rules.

streams table

id | name
  1. Name: the name of the stream. For example "Documents from Acme Inc"

documents table:

id | stream_id | path | content
  1. Stream Id: A stream can have many documents. Each document uploaded to a specific stream will be parsed by the rules defined on that stream.
  2. Path: the server path to the document
  3. Content: The text content of the document

parsing_rules table

id | stream_id | method | arguments
  1. Stream Id: Parsing rules belong to a stream. All documents imported into the stream will be parsed by the rules associated with that stream.
  2. Method: the name of the rule applied by the user. This also refers to the method name in my PHP code.
  3. Arguments: Optional. The arguments that will be passed to the rule/method.

The rules from the 2nd screenshot above would then look like this in the parsing_rules table:

1 | 5 | remove_empty_lines | null
2 | 5 | text_replace | "a:2:{s:6:"Search";s:9:"Laracasts";s:7:"Replace";s:6:"Google";}"
3 | 5 | regex_text_replace | "a:2:{s:7:"Pattern";s:9:"/Google/i";s:11:"Replacement";s:6:"Amazon";}"
4 | 5 | start_position_no_lines | "a:1:{s:4:"Line";s:1:"2";}"

So here, method holds the name of the actual method that should be called, and arguments holds the arguments that the specific method accepts/requires, in serialized form.
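
As a small illustration of the arguments column, the round trip through PHP's serialize() and unserialize() could look like the sketch below. The array keys mirror the text_replace row above; passing allowed_classes => false is an extra precaution that is not part of the original post.

```php
<?php
// Arguments for the text_replace rule, as they might be collected from the user.
$arguments = ['Search' => 'Laracasts', 'Replace' => 'Google'];

// The string that would be stored in the `arguments` column.
$stored = serialize($arguments);
// $stored is: a:2:{s:6:"Search";s:9:"Laracasts";s:7:"Replace";s:6:"Google";}

// Reading it back. allowed_classes => false stops unserialize() from
// instantiating objects if the stored string was ever tampered with.
$restored = unserialize($stored, ['allowed_classes' => false]);
```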

How to apply these rules?

I was thinking that each time a new document is uploaded/imported into a stream, I will apply the rules associated with the stream. Something like:

$content = $document->content;
$parsing_rules = $stream->parsingRules()->get();
foreach($parsing_rules as $rule)
{
    $arguments = unserialize($rule->arguments);
    
    return $this->{$rule->method}($arguments, $content);
    
}

Now, the above is nowhere near perfect: it will already return the $content after the first iteration.

Any feedback is highly appreciated. The above is only my thinking on how to do this project, and I am not sure whether there is a better approach to solve this.


Solution

From my standpoint, we can improve your solution a little bit.

Add the order column to the parsing_rules table

Because the list of rules is an ordered list, we need to save the index of each rule to the database to make sure that we get the rules back in the correct order every time we retrieve them from the database.
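
A minimal sketch of what that ordering provides, using plain arrays in place of database rows. The order key and the sample rows are assumptions for illustration. If the query in the question is Eloquent, sorting in the query itself (for example with ->orderBy('order')) would presumably replace the usort call.

```php
<?php
// Hypothetical rows from parsing_rules, returned in arbitrary order.
$rows = [
    ['id' => 3, 'method' => 'regex_text_replace', 'order' => 3],
    ['id' => 1, 'method' => 'remove_empty_lines', 'order' => 1],
    ['id' => 2, 'method' => 'text_replace', 'order' => 2],
];

// Sort ascending by the order column so the rules run in the sequence the user chose.
usort($rows, function (array $a, array $b): int {
    return $a['order'] <=> $b['order'];
});

$methods = array_column($rows, 'method');
// $methods is ['remove_empty_lines', 'text_replace', 'regex_text_replace']
```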

Move rules implementations to dedicated classes instead of methods

As you add more rules and the rules become more complex, putting all of them in one class can make that class hard to maintain.

First, design a common interface that every rule class implements:

interface RuleInterface {
    public function apply(string $text): string;
}

Now each rule is represented by a concrete class that implements RuleInterface. If a rule requires parameters, pass them to the constructor of the concrete class. For example, your last rule can be converted to a class:

class StartPositionRule implements RuleInterface {
    private $startLine;

    public function __construct(array $options) {
         $this->startLine = $options['startLine'];
    }

    public function apply(string $text): string {
         // apply this rule to the text and return the results
    }
}
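
For a rule whose behavior is unambiguous, a filled-in class might look like the sketch below. TextReplaceRule and its option keys (search, replace) are hypothetical names chosen for this illustration; they are not taken from the question's code.

```php
<?php
interface RuleInterface {
    public function apply(string $text): string;
}

// Hypothetical rule: replaces every occurrence of one string with another.
class TextReplaceRule implements RuleInterface {
    private $search;
    private $replace;

    public function __construct(array $options) {
        // Option keys are assumptions made for this sketch.
        $this->search = $options['search'];
        $this->replace = $options['replace'];
    }

    public function apply(string $text): string {
        return str_replace($this->search, $this->replace, $text);
    }
}

$rule = new TextReplaceRule(['search' => 'Laracasts', 'replace' => 'Google']);
echo $rule->apply('Hello Laracasts'), PHP_EOL; // prints "Hello Google"
```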

To save the rule to the database, save the class name to the method column and the $options array to the arguments column:

$row->method = get_class($rule);
$row->arguments = serialize($options);

To retrieve the rule from the database, create a new instance of the rule class from the class name and the arguments:

$class = $row->method;
$arguments = unserialize($row->arguments);
$rule = new $class($arguments);
// add the unserialized rule to the list of rules before parsing the text
$rules[] = $rule;
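
Since the class name and arguments come from the database, it may be worth validating them before instantiating anything. The sketch below is one possible way to do that, not part of the original answer: makeRule and RemoveEmptyLinesRule are hypothetical names, and treating a null arguments column as an empty array is an assumption.

```php
<?php
interface RuleInterface {
    public function apply(string $text): string;
}

// Hypothetical rule without options: drops lines that are empty or whitespace-only.
class RemoveEmptyLinesRule implements RuleInterface {
    public function __construct(array $options = []) {}

    public function apply(string $text): string {
        $lines = preg_split('/\R/', $text);
        $kept = array_filter($lines, function (string $line): bool {
            return trim($line) !== '';
        });
        return implode("\n", $kept);
    }
}

// Builds a rule from stored values, refusing anything that is not a RuleInterface.
function makeRule(string $class, ?string $serializedArguments): RuleInterface {
    if (!class_exists($class) || !is_subclass_of($class, RuleInterface::class)) {
        throw new InvalidArgumentException("Unknown rule class: $class");
    }

    // A null column is treated as "no arguments" (an assumption of this sketch).
    $arguments = $serializedArguments === null
        ? []
        : unserialize($serializedArguments, ['allowed_classes' => false]);

    if (!is_array($arguments)) {
        throw new InvalidArgumentException("Invalid arguments for rule: $class");
    }

    return new $class($arguments);
}

$rule = makeRule('RemoveEmptyLinesRule', null);
```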

In your client class, the rules are represented as a list of RuleInterface instances. The text can be transformed as follows:

$text = $originalText;
foreach($rules as $rule) {
    $text = $rule->apply($text);
}
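
To see why the order of the list matters, here is a small worked sketch that reuses the search/replace values from the question. The applyRules helper and the two anonymous classes are throwaway constructs for illustration only.

```php
<?php
interface RuleInterface {
    public function apply(string $text): string;
}

// Hypothetical helper wrapping the loop above.
function applyRules(array $rules, string $text): string {
    foreach ($rules as $rule) {
        $text = $rule->apply($text);
    }
    return $text;
}

// Literal replacement: Laracasts -> Google.
$textReplace = new class implements RuleInterface {
    public function apply(string $text): string {
        return str_replace('Laracasts', 'Google', $text);
    }
};

// Case-insensitive regex replacement: Google -> Amazon.
$regexReplace = new class implements RuleInterface {
    public function apply(string $text): string {
        return preg_replace('/Google/i', 'Amazon', $text);
    }
};

$forward = applyRules([$textReplace, $regexReplace], 'Learn at Laracasts');
// $forward is 'Learn at Amazon'

$reversed = applyRules([$regexReplace, $textReplace], 'Learn at Laracasts');
// $reversed is 'Learn at Google', because the regex rule ran before "Google" existed
```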
Licensed under: CC-BY-SA with attribution