Optimal Control Problems In Communication Networks With Information Delays And Quality Of Service Constraints
We consider simple models that capture this situation. Consider a network node through which two connections pass, one each of Types 1 and 2. One would like to maximize the throughput of the Type 2 connection while ensuring that the Type 1 connection's performance objectives are met. This can be set up as a constrained optimization problem that, however, is very hard to solve. We introduce a parameter b that represents the "cost" of buffer occupancy by Type 2 traffic. Since buffer space is limited and shared, a queued Type 2 packet means that a buffer position is not available for storing a Type 1 packet; to discourage the Type 2 connection from hogging the buffer, the cost parameter b is introduced, while a reward for each Type 2 packet coming into the buffer encourages the Type 2 connection to transmit at a high rate.
Using standard on-off models for the Type 1 sources, we show how values can be assigned to the parameter b; the value depends on the characteristics of the Type 1 connection passing through the node, i.e., whether it is a Variable Bit Rate (VBR) video connection or a Continuous Bit Rate (CBR) connection etc. Our approach gives concrete networking significance to the parameter b, which has long been considered as an abstract parameter in reward-penalty formulations of flow control problems (for example, [Stidham '85]).
Having seen how to assign values to b, we focus on the Type 2 connection next. Since Type 2 connections do not have strict performance requirements, it is possible to defer transmitting a Type 2 packet, if the conditions downstream so warrant. This leads to the question: what is the "best" transmission policy for Type 2 packets? Decisions to transmit or not must be based on congestion conditions downstream; however, the network state that is available at any instant gives information that is old, since feedback latency is an inherent feature of high speed networks. Thus the problem is to identify the best transmission policy under delayed feedback information.
We study this problem in the framework of Markov Decision Theory. With appropriate assumptions on the arrivals, service times and scheduling discipline at a network node, we formulate our problem as a Partially Observable Controlled Markov Chain (PO-CMC). We then give an equivalent formulation of the problem in terms of a Completely Observable Controlled Markov Chain (CO-CMC) that is easier to deal with., Using Dynamic Programming and Value Iteration, we identify structural properties of an optimal transmission policy when the delay in obtaining feedback information is one time slot. For both discounted and average cost criteria, we show that the optimal policy has a two-threshold structure, with the threshold on the observed queue length depending, on whether a Type 2 packet was transmitted in the last slot or not.
For an observation delay k > 2, the Value Iteration technique does not yield results. We use the structure of the problem to provide computable upper and lower bounds to the optimal value function. A study of these bounds yields information about the structure of the optimal policy for this problem. We show that for appropriate values of the parameters of the problem, depending on the number of transmissions in the last k steps, there is an "upper cut off" number which is a value such that if the observed queue length is greater than or equal to this number, the optimal action is to not transmit. Since the number of transmissions in the last k steps is between 0 and A: both inclusive, we have a stack of (k+1) upper cut off values. We conjecture that these (k + l) values axe thresholds and the optimal policy for this problem has a (k + l)-threshold structure.
So far it has been assumed that the parameters of the problem are known at the transmission control point. In reality, this is usually not known and changes over time. Thus, one needs an adaptive transmission policy that keeps track of and adjusts to changing network conditions. We show that the information structure in our problem admits a simple adaptive policy that performs reasonably well in a quasi-static traffic environment.
Up to this point, the models we have studied correspond to a single hop in a virtual connection. We consider the multiple hop problem next. A basic matter of interest here is whether one should have end to end or hop by hop controls. We develop a sample path approach to answer this question. It turns out that depending on the relative values of the b parameter in the transmitting node and its downstream neighbour, sometimes end to end controls are preferable while at other times hop by hop controls are preferable.
Finally, we consider a routing problem in a high speed network where feedback information is delayed, as usual. As before, we formulate the problem in the framework of Markov Decision Theory and apply Value Iteration to deduce structural properties of an optimal control policy. We show that for both discounted and average cost criteria, the optimal policy for an observation delay of one slot is Join the Shortest Expected Queue (JSEQ) - a natural and intuitively satisfactory extension of the well-known Join the Shortest Queue (JSQ) policy that is optimal when there is no feedback delay (see, for example, [Weber 78]). However, for an observation delay of more than one slot, we show that the JSEQ policy is not optimal. Determining the structure of the optimal policy for a delay k>2 appears to be very difficult using the Value Iteration approach; we explore some likely policies by simulation.
School:Indian Institute of Science
Source Type:Master's Thesis
Keywords:electrical communication quality of service qos computer networks optimal control flow routing multi hop model policy partially observable controlled markov chain po cmc completely co
Date of Publication:02/01/1995