Question

Consider the following job definitions:

Group A

Job A {
    Depends on Job B of Group A
    Run User -> User1
}

Job B {
    Depends on Job C and Job D of Group A
    Run User -> User2
}

Job C {
    Depends on Job D of Group A and Job A of Group B
    Run User -> User1
}

Job D {
    Depends on Job E of Group A
    Run User -> User3
}

Job E { 
    Run User -> User3
}

Group B

Job A {
    Depends on Job C of Group B
    Run User -> User4 
}

Job F {
    Depends on Job C and Job D of Group B
    Run User -> User2
}

Job C {
    Depends on Job D of Group A
    Run User -> User1
}

Job D {
    Run User -> User5
}

Group C

Job C {
    Depends on Job A of Group A
    Run User -> User6 
}

Job G {
    Depends on Job H of Group C
    Run User -> User5
}

Job H {
    Run User -> User7
}

Group D

Job I {
    Run User -> User8
}

and so on...

For simplicity, assume that I have about 50-60 such groups, each with around 1000 Jobs. Run users are Unix users: the user account under which a Job runs.

If you look closely, you will notice that this is a cross-group directed acyclic graph (DAG) of Jobs. I am therefore thinking of building an event-driven system for triggering these Jobs, and of using Kafka for it.
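
As a sanity check on the "acyclic" claim, the example above can be encoded and topologically sorted. This is only a sketch: the `(group, job)` tuples and the names `DEPENDS_ON`, `run_order` and `is_acyclic` are made up for illustration, and it assumes Python 3.9+ for the standard `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter

# Key: (group, job). Value: the set of (group, job) pairs it depends on.
# Only the jobs listed in the example above are included.
DEPENDS_ON = {
    ("A", "A"): {("A", "B")},
    ("A", "B"): {("A", "C"), ("A", "D")},
    ("A", "C"): {("A", "D"), ("B", "A")},
    ("A", "D"): {("A", "E")},
    ("A", "E"): set(),
    ("B", "A"): {("B", "C")},
    ("B", "F"): {("B", "C"), ("B", "D")},
    ("B", "C"): {("A", "D")},
    ("B", "D"): set(),
    ("C", "C"): {("A", "A")},
    ("C", "G"): {("C", "H")},
    ("C", "H"): set(),
    ("D", "I"): set(),
}


def run_order(depends_on):
    """Return one valid execution order; raises CycleError on a cycle."""
    return list(TopologicalSorter(depends_on).static_order())


def is_acyclic(depends_on):
    """True if the dependency graph has no cycle."""
    try:
        run_order(depends_on)
        return True
    except CycleError:
        return False
```

Every job appears in the order after all of the jobs it depends on, regardless of group, so a check like this could run whenever a job definition changes.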

  1. Producers: Each invocation of a Job is a separate, short-lived process. These are my producers.
  2. Consumers: Assume we have one consumer per run user. I am not sure how to trigger a Job that depends on more than one event (i.e. on the completion of more than one Job).
  3. Topics: I am not sure how to lay out the Kafka topics. Should I have
    • One topic per group?
    • One topic per user?
    • One topic per group per user? Or,
    • One topic per user per group per job?

Basically, I want to solve the following use cases:

Use case 1: Secretive Job A, run by user X, depends on secretive Job B, run by user Y. Neither A nor B wants to reveal its existence to anyone else. A and B need to trust each other, which means each may know that the other exists.

Use case 2: Public Job A, run by user X, depends on secretive Job B, run by user Y.

Use case 3: Secretive Job A, run by user X, depends on public Job B, run by user Y.

Use case 4: Public Job A, run by user X, depends on public Job B, run by user Y.

Any ideas on how I should go about designing

  1. The Kafka topics, from a security perspective, so that they cover the use cases above?
  2. The consumption of events and the launching of Jobs (for Jobs that depend on multiple other Jobs)?
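
To make the second point concrete, the multi-dependency case amounts to a fan-in join: a Job becomes runnable only after a completion event has been seen for each Job it depends on. Below is a minimal in-memory sketch of that bookkeeping, with no Kafka client involved. `DependencyTracker` and `on_completed` are hypothetical names, and a real consumer would presumably need to persist this state across restarts.

```python
from collections import defaultdict


class DependencyTracker:
    """Report a job as runnable once all of its upstream jobs have completed."""

    def __init__(self, depends_on):
        # job -> upstream jobs that have not completed yet
        self.waiting = {job: set(deps) for job, deps in depends_on.items() if deps}
        # upstream job -> jobs that wait on it
        self.dependents = defaultdict(set)
        for job, deps in depends_on.items():
            for dep in deps:
                self.dependents[dep].add(job)

    def on_completed(self, finished_job):
        """Handle one completion event; return the jobs that just became runnable."""
        ready = []
        for job in sorted(self.dependents.get(finished_job, ())):
            outstanding = self.waiting.get(job)
            if outstanding is None:
                continue  # already reported as runnable earlier
            outstanding.discard(finished_job)  # duplicate events are harmless
            if not outstanding:
                del self.waiting[job]
                ready.append(job)
        return ready


tracker = DependencyTracker({
    ("A", "B"): {("A", "C"), ("A", "D")},  # Job B of Group A waits for C and D
    ("A", "A"): {("A", "B")},
})
tracker.on_completed(("A", "D"))          # nothing runnable yet
runnable = tracker.on_completed(("A", "C"))  # now contains ("A", "B")
```

A consumer for a given run user would call `on_completed` for each event it reads and launch whatever comes back.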

Solution

This appears to be possible with Kafka, but I think it will be challenging to manage all the authorizations manually. Kafka might also be overkill here, but I don't know enough about your volume to say for sure. If it were me, I'd look for a way to visualize the graph together with the access each dependency requires, because there's a big risk of misconfiguration here.
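
As a rough idea of what such a visualization could start from, here is a sketch that renders a few of the jobs from the question as Graphviz DOT text, labelling each edge with the read access it implies. The `to_dot` helper, the `group.job` naming and the `JOBS` layout are all assumptions made for this example.

```python
def to_dot(jobs):
    """Render jobs as Graphviz DOT text.

    `jobs` maps a job name to (run_user, [jobs it depends on]).
    An edge B -> A means "A waits for B"; the label names the user who
    must be able to read events produced by the other user.
    """
    lines = ["digraph jobs {"]
    for name, (user, _) in sorted(jobs.items()):
        lines.append(f'  "{name}" [label="{name}\\n{user}"];')
    for name, (user, deps) in sorted(jobs.items()):
        for dep in sorted(deps):
            dep_user = jobs[dep][0]
            lines.append(f'  "{dep}" -> "{name}" [label="{user} reads {dep_user}"];')
    lines.append("}")
    return "\n".join(lines)


# A subset of the question's jobs, named "<group>.<job>".
JOBS = {
    "A.A": ("User1", ["A.B"]),
    "A.B": ("User2", ["A.C", "A.D"]),
    "A.C": ("User1", ["A.D", "B.A"]),
    "A.D": ("User3", ["A.E"]),
    "A.E": ("User3", []),
    "B.A": ("User4", ["B.C"]),
    "B.C": ("User1", ["A.D"]),
}

dot_text = to_dot(JOBS)
```

The resulting text can be fed to any DOT renderer; every edge whose label names two different users is a place where a grant has to exist.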

Here's an article on securing Kafka. It would seem that you can allow access to topics by user. Given that, it probably makes sense for each user to have their own output topics, to which you can then grant access as needed. You can do this by structuring the topic names hierarchically, like user.job, adding more levels as you see fit. Take some time to structure these properly so that the names run from most general on the left to most specific on the right. This will allow effective use of wildcards in your ACLs.
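
To illustrate why the left-to-right ordering matters, here is a small simulation of prefix-based read grants over hierarchical names. It does not call Kafka at all: `topic_name`, `ReadGrant` and `can_read` are hypothetical helpers, and the `user.group.job` layout is just one possible extension of `user.job`. As far as I know, Kafka's own ACLs offer literal and prefixed resource patterns that behave along these lines, but check the documentation for your version.

```python
from dataclasses import dataclass


def topic_name(user, group, job):
    """Build a hierarchical topic name, most general part first."""
    return f"{user}.{group}.{job}"


@dataclass(frozen=True)
class ReadGrant:
    principal: str     # the user allowed to read
    topic_prefix: str  # any topic starting with this prefix is readable


def can_read(grants, principal, topic):
    """Prefix-style check: one matching grant is enough to allow the read."""
    return any(
        g.principal == principal and topic.startswith(g.topic_prefix)
        for g in grants
    )


grants = [
    # User2 may read everything User3 publishes for Group A. The trailing
    # dot stops the prefix from also matching a user such as "User30".
    ReadGrant("User2", "User3.A."),
    # User1 may read only the topic for Job D of Group A.
    ReadGrant("User1", topic_name("User3", "A", "D")),
]

allowed = can_read(grants, "User2", "User3.A.E")   # True
denied = can_read(grants, "User1", "User3.A.E")    # False
```

A secretive job would get narrow grants like the second one, naming only the users that depend on it, while a public job could be covered by a broader prefix.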

Licensed under: CC-BY-SA with attribution