Question

I am working on a power management problem where I control the power state of a computing board based on the occurrence of events. I am using reinforcement learning (traditional Q-learning) for power management, where the computing board works as a Service Provider (SP) that processes requests (images). The SP is connected to a smart camera, and the Power Manager (PM) algorithm runs on the camera, issuing power commands (sleep, wake-up) to the SP. The smart camera captures images (requests) when an event occurs and maintains a Service Queue (SQ) for the requests (images). I also have an ANN-based workload estimator that classifies the current workload as low or high. The state space for the Q-learning algorithm is therefore a composite state for Q(s,a), where s = (SR, SQ, SP): SR is the state of the workload, SQ is the state of the service queue, and SP is the state of the service provider. Based on the current workload, the state of the queue, and the state of the service provider, the PM issues commands to the SP (sleep, wake-up). The decision is taken at the following stages (see the sketch after the list):

  1. SP is idle
  2. SP just entered the sleep state and SQ>=1
  3. SP is in the sleep state and SQ transits from 0 to 1.
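To make the setup concrete, here is a minimal sketch of how the composite state and the three decision points could be represented. The `State` class, field names, and 0/1 encodings are illustrative assumptions, not the actual implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    sr: int  # workload estimate from the ANN: 0 = low, 1 = high
    sq: int  # number of requests waiting in the service queue
    sp: int  # service provider power state: 0 = sleep, 1 = awake (idle/busy)

def is_decision_point(state: State, just_entered_sleep: bool, sq_went_0_to_1: bool) -> bool:
    """True at the three stages where the PM issues a power command."""
    if state.sp == 1 and state.sq == 0:        # 1. SP is idle
        return True
    if just_entered_sleep and state.sq >= 1:   # 2. SP just entered sleep and SQ >= 1
        return True
    if state.sp == 0 and sq_went_0_to_1:       # 3. SP sleeping and SQ goes 0 -> 1
        return True
    return False
```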

For each action, a cost is assigned that consists of a weighted sum of the average power consumption and the average latency per request caused by the action. In both the sleep state and the idle state, the action consists of selecting a time-out value from a list of pre-defined time-out values. My problem is as follows:

When the SP enters the sleep state and selects a time-out value, some requests may arrive during the time-out, and hence the state of SQ changes. This also changes the composite state (e.g., S(0,0,0) to S(0,N,0)). At the end of the time-out, the PM decides to wake up the SP (as SQ > 0). After waking up, the SP processes the requests, and when SQ = 0 it is in state (0,0,1) or (1,0,1). It then assigns a cost to the previous state and updates the Q-matrix accordingly. My problem is: should the cost be assigned to state (0,0,0) or to (0,N,0)? In principle, the previous state is (0,N,0), but this state is reached automatically on the arrival of requests in the queue, so no action is taken there and there is no action to which a cost can be assigned.


Solution

Q-learning applies to Markov Decision Processes (MDPs), where performing an action in a given state causes a transition (possibly stochastic) to a new state that depends only on that state and action.

It is not clear whether the problem you describe is a Partially Observable Markov Decision Process (POMDP) or an MDP. If you have a POMDP (you are making a decision to sleep or wake without information about the state of the queue), then the problem is harder to solve.

If you are only making sleep-timeout decisions when you can observe the state of the system, then you have an MDP. In this case, you should only update the Q matrix when you reach the next state from which you are selecting an action.
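As a minimal sketch of this idea (the tabular `Q`, the learning parameters, and the time-out list are assumptions for illustration; costs are minimized rather than rewards maximized, matching the cost formulation in the question):

```python
import random
from collections import defaultdict

TIMEOUTS = [0.1, 0.5, 1.0, 2.0]        # pre-defined time-out values (illustrative)
Q = defaultdict(float)                 # Q[(state, timeout)] -> expected cost
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def choose_timeout(state):
    """Epsilon-greedy selection over the predefined time-outs; cost is minimized."""
    if random.random() < epsilon:
        return random.choice(TIMEOUTS)
    return min(TIMEOUTS, key=lambda a: Q[(state, a)])

def q_update(prev_state, action, cost, next_state):
    """One Q-learning step, performed only when the next decision point is reached."""
    best_next = min(Q[(next_state, a)] for a in TIMEOUTS)
    Q[(prev_state, action)] += alpha * (cost + gamma * best_next - Q[(prev_state, action)])
```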

Here is how I understand the sequence of events in your example:

  1. The system is sleeping in state (0,0,0)
  2. Requests arrive in the queue, the system is still sleeping - (0,N,0).
  3. The system wakes up - (0,N,1) or (1,N,1)
  4. The system processes the requests - (0|1,0,1)

After step 4, the system needs to make another timeout decision and update the Q matrix. The current state is (0|1,0,1), and this is the state that should appear as the next state in the Q-learning update (and the state from which the next action is chosen).
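Tying this to the sketches above (all names are still illustrative assumptions), the update for this trace pairs the action chosen at (0,0,0) with (0|1,0,1) as the next state; the intermediate state (0,N,0) never appears as an argument, because no action was chosen there.

```python
# Reusing the illustrative State / choose_timeout / q_update from the sketches above.
prev_state = State(sr=0, sq=0, sp=0)   # decision point: SP asleep, queue empty
action = choose_timeout(prev_state)    # time-out chosen at (0,0,0)
# ... requests arrive during the time-out, the SP wakes up and drains the queue ...
next_state = State(sr=0, sq=0, sp=1)   # next decision point: (0,0,1)
observed_cost = 1.7                    # placeholder: weighted power + latency over the interval
q_update(prev_state, action, observed_cost, next_state)
```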

You are worried, though, that updating the Q matrix at (0|1,0,1) will not account for the time the system took to process the N requests that arrived while it was sleeping. There are a number of ways to deal with this, most of which involve restructuring the state space of your problem. One option is to account for the N requests in the reward function: if the system finds a large number of requests in the queue when it wakes up, it should immediately penalize the previous action.
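For example, a cost of roughly this shape (the weights and the backlog term are assumptions for illustration, not part of your setup) would immediately charge the previous time-out choice for a large queue found on waking:

```python
def decision_cost(avg_power, avg_latency, backlog_on_wake,
                  w_power=1.0, w_latency=1.0, w_backlog=0.5):
    """Weighted cost charged to the time-out chosen at the previous decision point.

    The backlog term penalizes the previous action when the SP wakes up to a
    large queue; the weights are illustrative, not tuned values.
    """
    return (w_power * avg_power
            + w_latency * avg_latency
            + w_backlog * backlog_on_wake)
```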

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow