Mapping: Applications → Platforms (Sessions 7-9). Peter Marwedel, TU Dortmund, Fakultät für Informatik, Informatik 12.

TU Dortmund, Informatik 12, Germany. Slides use Microsoft cliparts. All Microsoft restrictions apply.

technische universität dortmund, fakultät für informatik, p. marwedel, informatik 12, 2008
Schedule of the course
Time | Monday | Tuesday | Wednesday | Thursday | Friday
09:30-11:00 | 1: Orientation, introduction; 2: Models of computation + specs | 5: Models of computation + specs | 9: Mapping of applications to platforms | 13: Memory aware compilation | 17: Memory aware compilation
11:00 | Brief break
11:15-12:30 | | 6: Lab*: Ptolemy | 10: Lab*: Scheduling | 14: Lab*: Mem. opt. | 18: Lab*: Mem. opt.
12:30 | Lunch
14:00-15:20 | 3: Models of computation + specs | 7: Mapping of applications to platforms | 11: High-level optimizations* | 15: Memory aware compilation | 19: WCET & compilers*
15:20 | Break
15:40-17:00 | 4: Lab*: Kahn process networks | 8: Mapping of applications to platforms | 12: High-level optimizations* | 16: Memory aware compilation | 20: Wrap-up
* Dr. Heiko Falk

Hypothetical design flow
[Figure: design flow with the blocks Application Knowledge, Specifications, Embedded System HW, Standard Software (Real-Time Operating Systems), Mapping of applications to execution platforms, Evaluation, Testing, and Optimization of Embedded Systems.]

Scope of mapping algorithms
Useful terms from hardware synthesis:
- Resource allocation: decision concerning the type and number of available resources.
- Resource assignment: mapping Task → (hardware) resource.
- xx-to-yy binding: describes a mapping from the behavioral to the structural domain, e.g. task-to-processor binding or variable-to-memory binding.
- Scheduling: mapping Tasks → task start times. Sometimes, resource assignment is considered to be part of scheduling.

Real-time scheduling
Assume that we are given a task graph G=(V,E).
Def.: A schedule τ of G is a mapping τ: V → T of a set of tasks V to start times from domain T.
[Figure: tasks V1 to V4 of G=(V,E) mapped onto the time axis t (domain T).]
Typically, schedules have to respect a number of constraints, including resource constraints, dependency constraints, and deadlines. Scheduling = finding such a mapping.

Hard and soft deadlines
Def.: A time constraint (deadline) is called hard if not meeting that constraint could result in a catastrophe [Kopetz, 1997]. All other time constraints are called soft. We will focus on hard deadlines.

Periodic and aperiodic tasks
Def.: Tasks which must be executed once every p units of time are called periodic tasks; p is called their period. Each execution of a periodic task is called a job. All other tasks are called aperiodic.
Def.: Tasks requesting the processor at unpredictable times are called sporadic if there is a minimum separation between the times at which they request the processor.

Preemptive and non-preemptive scheduling
- Non-preemptive schedulers: tasks are executed until they are done. The response time for external events may be quite long.
- Preemptive schedulers: to be used if some tasks have long execution times or if the response time for external events is required to be short.

Dynamic/online scheduling
Dynamic/online scheduling: processor allocation decisions (scheduling) are made at run-time, based on the information about the tasks that have arrived so far.

Static/offline scheduling
Static/offline scheduling: scheduling that takes a priori knowledge about arrival times, execution times, and deadlines into account. The dispatcher allocates the processor when interrupted by a timer. The timer is controlled by a table generated at design time. In a time-triggered system, the temporal control structure of all tasks is established a priori by off-line support tools.

Cost functions
Cost function: different algorithms aim at minimizing different functions.
Def.: Maximum lateness = max over all tasks (completion time - deadline). It is negative if all tasks complete before their deadlines.
[Figure: tasks T1 and T2 on the time axis t, with the maximum lateness marked.]

Classes of mapping algorithms considered in this course
- Classical scheduling algorithms: mostly for independent tasks and ignoring communication, mostly for mono- and homogeneous multiprocessors.
- Resource access protocols.
- Dependent tasks as considered in architectural synthesis: initially designed in a different context, but applicable.
- Hardware/software partitioning: dependent tasks, heterogeneous systems, focus on resource assignment.
- Design space exploration using genetic algorithms: heterogeneous systems, including communication modeling.

Aperiodic scheduling - Scheduling with no precedence constraints
Let $\{T_i\}$ be a set of tasks. Let:
- $c_i$ be the execution time of $T_i$,
- $d_i$ be the deadline interval, that is, the time between $T_i$ becoming available and the time until which $T_i$ has to finish execution,
- $\ell_i$ be the laxity or slack, defined as $\ell_i = d_i - c_i$,
- $f_i$ be the finishing time.
[Figure: task $T_i$ on the time axis t, showing $c_i$, $d_i$ and $\ell_i$.]

Uniprocessor with equal arrival times
Preemption is useless. Earliest Due Date (EDD): execute the task with the earliest due date (deadline) first. EDD requires all tasks to be sorted by their (absolute) deadlines. Hence, its complexity is O(n log(n)).
[Figure: EDD schedule with the finishing times $f_i$ marked.]
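The EDD rule and the maximum-lateness cost function can be illustrated with a minimal Python sketch. The task set, the `Task` class and the function name `edd_schedule` are hypothetical and only serve as an example; all tasks are assumed to arrive at time 0 on a single processor.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    c: int  # execution time c_i
    d: int  # absolute deadline d_i (all tasks arrive at time 0)

def edd_schedule(tasks):
    """Return (execution order, maximum lateness) under Earliest Due Date."""
    order = sorted(tasks, key=lambda t: t.d)   # O(n log n): sort by deadline
    time = 0
    max_lateness = float("-inf")
    for t in order:
        time += t.c                            # finishing time f_i of this task
        max_lateness = max(max_lateness, time - t.d)
    return [t.name for t in order], max_lateness

# Hypothetical example task set
tasks = [Task("T1", 1, 3), Task("T2", 1, 10), Task("T3", 1, 7),
         Task("T4", 3, 8), Task("T5", 2, 5)]
order, lmax = edd_schedule(tasks)
print(order, lmax)
```

A negative maximum lateness means that every task of the example finishes before its deadline.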

Optimality of EDD
EDD is optimal, since it follows Jackson's rule: given a set of n independent tasks, any algorithm that executes the tasks in order of non-decreasing (absolute) deadlines is optimal with respect to minimizing the maximum lateness.
Proof (see Buttazzo, 2002):
- Let $\sigma$ be a schedule produced by any algorithm A.
- If A ≠ EDD, then there exist $T_a$, $T_b$ with $d_a \le d_b$ such that $T_b$ immediately precedes $T_a$ in $\sigma$.
- Let $\sigma'$ be the schedule obtained by exchanging $T_a$ and $T_b$.

Exchanging $T_a$ and $T_b$ cannot increase lateness
Max. lateness for $T_a$ and $T_b$ in $\sigma$ is $L_{max}(a,b) = f_a - d_a$.
Max. lateness for $T_a$ and $T_b$ in $\sigma'$ is $L'_{max}(a,b) = \max(L'_a, L'_b)$.
Two possible cases:
1. $L'_a \ge L'_b$: $L'_{max}(a,b) = f'_a - d_a < f_a - d_a = L_{max}(a,b)$, since $T_a$ starts earlier in schedule $\sigma'$.
2. $L'_a \le L'_b$: $L'_{max}(a,b) = f'_b - d_b = f_a - d_b \le f_a - d_a = L_{max}(a,b)$, since $f_a = f'_b$ and $d_a \le d_b$.
In both cases: $L'_{max}(a,b) \le L_{max}(a,b)$.
[Figure: schedules $\sigma$ ($T_b$ before $T_a$) and $\sigma'$ ($T_a$ before $T_b$), with $f_a = f'_b$.]

EDD is optimal
- Any schedule $\sigma$ with lateness L can be transformed into an EDD schedule $\sigma_n$ with lateness $L_n \le L$, which is the minimum lateness.
- Hence, EDD is optimal (q.e.d.).

Earliest Deadline First (EDF) - Horn's Theorem
Different arrival times: preemption potentially reduces lateness.
Theorem [Horn74]: Given a set of n independent tasks with arbitrary arrival times, any algorithm that at any instant executes the task with the earliest absolute deadline among all the ready tasks is optimal with respect to minimizing the maximum lateness.

Earliest Deadline First (EDF) - Algorithm
Earliest deadline first (EDF) algorithm:
- Each time a new ready task arrives, it is inserted into a queue of ready tasks, sorted by their absolute deadlines. The task at the head of the queue is executed.
- If a newly arrived task is inserted at the head of the queue, the currently executing task is preempted.
A straightforward approach with sorted lists (full comparison with existing tasks for each arriving task) requires run-time O(n^2); less with binary search or bucket arrays.
[Figure: sorted queue of ready tasks and the executing task.]
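The queue-based EDF algorithm can be sketched as a small discrete-time simulation. This is only an illustration under stated assumptions: integer arrival and execution times (at least 1), a single processor, and a binary heap (Python's `heapq`) in place of the sorted list mentioned on the slide. The task set and the name `edf_schedule` are hypothetical.

```python
import heapq

def edf_schedule(tasks):
    """tasks: list of (name, arrival, exec_time, absolute_deadline), integers.
    Returns (timeline, max_lateness); timeline[t] is the task run in [t, t+1)."""
    pending = sorted(tasks, key=lambda t: t[1])        # ordered by arrival time
    deadline = {name: d for name, _, _, d in tasks}
    remaining = {name: c for name, _, c, _ in tasks}
    ready = []                                         # heap of (deadline, name)
    timeline = []
    max_lateness = float("-inf")
    time, i = 0, 0
    while i < len(pending) or ready:
        # Insert newly arrived tasks into the ready queue.
        while i < len(pending) and pending[i][1] <= time:
            name = pending[i][0]
            heapq.heappush(ready, (deadline[name], name))
            i += 1
        if not ready:
            timeline.append(None)                      # processor idles
            time += 1
            continue
        d, name = ready[0]                             # earliest deadline runs;
        timeline.append(name)                          # a new head means preemption
        time += 1
        remaining[name] -= 1
        if remaining[name] == 0:
            heapq.heappop(ready)
            max_lateness = max(max_lateness, time - d)
    return timeline, max_lateness

# Hypothetical example: B arrives at t=1 with an earlier deadline and preempts A.
timeline, lmax = edf_schedule([("A", 0, 4, 10), ("B", 1, 2, 5), ("C", 3, 2, 12)])
print(timeline, lmax)
```

In this example, task C arrives with a later deadline than A and therefore does not preempt it.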

Earliest Deadline First (EDF) - Example
[Figure: EDF example schedule. A task arriving with a later deadline causes no preemption; a task arriving with an earlier deadline causes preemption.]

Least laxity (LL), Least Slack Time First (LST)
Priorities = decreasing function of the laxity (the less laxity, the higher the priority); dynamically changing priority; preemptive.
[Figure: example schedule with the laxities $\ell$ of the tasks marked.]

Scheduling with precedence constraints
[Figure: task graph and a possible schedule.]

Simultaneous Arrival Times: The Latest Deadline First (LDF) Algorithm
LDF [Lawler, 1973]: reads the task graph and, among the tasks with no successors, inserts the one with the latest deadline into a queue. It then repeats this process, putting tasks whose successors have all been selected into the queue. At run-time, the tasks are executed in the generated total order, i.e. the task selected first is executed last. LDF is non-preemptive and is optimal for mono-processors. If no local deadlines exist, LDF performs just a topological sort.
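A minimal sketch of LDF, assuming an acyclic task graph given as a successor dictionary. The graph, the deadlines and the name `ldf_order` are hypothetical illustration values, not taken from the slides.

```python
def ldf_order(successors, deadline):
    """successors: dict task -> list of successor tasks (assumed acyclic).
    deadline: dict task -> deadline. Returns the run-time execution order."""
    selected = []                      # built from the last task to the first
    done = set()
    tasks = list(deadline)
    while len(selected) < len(tasks):
        # Candidates: unselected tasks whose successors have all been selected.
        candidates = [t for t in tasks if t not in done
                      and all(s in done for s in successors.get(t, []))]
        latest = max(candidates, key=lambda t: deadline[t])
        selected.append(latest)
        done.add(latest)
    return selected[::-1]              # the task selected first runs last

# Hypothetical task graph: J1 -> J2, J3; J2 -> J4, J5; J3 -> J6
successors = {"J1": ["J2", "J3"], "J2": ["J4", "J5"], "J3": ["J6"]}
deadline = {"J1": 2, "J2": 5, "J3": 4, "J4": 3, "J5": 5, "J6": 6}
order = ldf_order(successors, deadline)
print(order)
```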

Asynchronous Arrival Times: Modified EDF Algorithm
This case can be handled with a modified EDF algorithm. The key idea is to transform the problem from a given set of dependent tasks into a set of independent tasks with different timing parameters [Chetto90]. This algorithm is optimal for mono-processor systems. If preemption is not allowed, the heuristic algorithm developed by Stankovic and Ramamritham can be used.
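The transformation of timing parameters can be sketched as follows. The slides do not spell out the formulas; the sketch assumes the commonly cited form of the transformation, in which a task's release time is pushed behind the completion of its predecessors, $r^*_j = \max(r_j, \max_i (r^*_i + c_i))$, and its deadline is pulled in front of the latest start of its successors, $d^*_i = \min(d_i, \min_j (d^*_j - c_j))$. The function name `chetto_transform` and the example tasks are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def chetto_transform(tasks, edges):
    """tasks: dict name -> (release, exec_time, deadline); edges: (pred, succ) pairs.
    Returns dict name -> (modified release, modified deadline)."""
    preds = {t: set() for t in tasks}
    succs = {t: set() for t in tasks}
    for a, b in edges:
        preds[b].add(a)
        succs[a].add(b)
    order = list(TopologicalSorter(preds).static_order())  # predecessors first
    release = {t: tasks[t][0] for t in tasks}
    deadline = {t: tasks[t][2] for t in tasks}
    for t in order:                       # a task cannot start before its predecessors end
        for p in preds[t]:
            release[t] = max(release[t], release[p] + tasks[p][1])
    for t in reversed(order):             # a task must end before its successors must start
        for s in succs[t]:
            deadline[t] = min(deadline[t], deadline[s] - tasks[s][1])
    return {t: (release[t], deadline[t]) for t in tasks}

# Hypothetical example: A precedes B and C
tasks = {"A": (0, 2, 10), "B": (1, 3, 8), "C": (0, 1, 20)}
modified = chetto_transform(tasks, [("A", "B"), ("A", "C")])
print(modified)
```

Scheduling the tasks with plain EDF on the modified parameters then respects the precedence constraints under the stated assumptions.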

Overview
Scheduling of aperiodic tasks with real-time constraints: table with some known algorithms (© L. Thiele, ETH Zürich, 2006):
 | Equal arrival times, non-preemptive | Arbitrary arrival times, preemptive
Independent tasks | EDD (Jackson) | EDF (Horn)
Dependent tasks | LDF (Lawler) | EDF* (Chetto)

Periodic scheduling
For periodic scheduling, the best that we can do is to design an algorithm which will always find a schedule if one exists. A scheduler is defined to be optimal iff it will find a schedule if one exists.
[Figure: periodic tasks T1 and T2.]

Periodic scheduling
Let:
- $p_i$ be the period of task $T_i$,
- $c_i$ be the execution time of $T_i$,
- $d_i$ be the deadline interval, that is, the time between a job of $T_i$ becoming available and the time until which the same job of $T_i$ has to finish execution,
- $\ell_i$ be the laxity or slack, defined as $\ell_i = d_i - c_i$.
[Figure: one period of $T_i$, showing $p_i$, $d_i$, $c_i$ and $\ell_i$.]

Average utilization
Average utilization: $\mu = \sum_{i=1}^{n} \frac{c_i}{p_i}$
Necessary condition for schedulability (with m = number of processors): $\mu \le m$

Independent tasks: Rate monotonic (RM) scheduling
Most well-known technique for scheduling independent periodic tasks [Liu, 1973].
Assumptions:
- All tasks that have hard deadlines are periodic.
- All tasks are independent.
- $d_i = p_i$ for all tasks.
- $c_i$ is constant and is known for all tasks.
- The time required for context switching is negligible.
- For a single processor and for n tasks, the following condition holds for the average utilization µ: $\mu = \sum_{i=1}^{n} \frac{c_i}{p_i} \le n(2^{1/n} - 1)$

Rate monotonic (RM) scheduling - The policy
RM policy: the priority of a task is a monotonically decreasing function of its period. At any time, a highest-priority task among all those that are ready for execution is allocated the processor.
Theorem: If all RM assumptions are met, schedulability is guaranteed.

Maximum utilization for guaranteed schedulability
[Figure: maximum utilization $n(2^{1/n} - 1)$ as a function of the number of tasks n.]
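The utilization bound can be evaluated numerically with a short sketch; for growing n it approaches ln 2 ≈ 0.693. The function names are hypothetical, and each task is given as a (period, execution time) pair.

```python
import math

def utilization(tasks):
    """Average utilization mu = sum of c_i / p_i; tasks are (period, exec_time) pairs."""
    return sum(c / p for p, c in tasks)

def rm_bound(n):
    """Utilization bound n * (2^(1/n) - 1) for n tasks on a single processor."""
    return n * (2 ** (1 / n) - 1)

def rm_guaranteed(tasks):
    """Sufficient (not necessary) test: True if the utilization bound is met."""
    return utilization(tasks) <= rm_bound(len(tasks))

for n in (1, 2, 3, 10, 100):
    print(n, round(rm_bound(n), 3))
print("limit:", round(math.log(2), 3))
```

A task set that fails this test is not necessarily unschedulable under RM; the bound only guarantees schedulability when it is met.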

Example of RM-generated schedule
[Figure: RM-generated schedule for tasks T1, T2 and T3.] T1 preempts T2 and T3. T2 and T3 do not preempt each other.

Case of failing RM scheduling
Task 1: period 5, execution time 2
Task 2: period 7, execution time 4
$\mu = 2/5 + 4/7 = 34/35 \approx 0.97$, whereas $2(2^{1/2} - 1) \approx 0.828$
[Figure: RM schedule with a missed deadline; the missing computations are scheduled in the next period.]
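The missed deadline can be reproduced with a small discrete-time simulation of RM. This is a sketch under stated assumptions: integer time steps, deadlines equal to periods, all tasks released at time 0, and unfinished work carried over into the next period. The function name `rm_simulate` is hypothetical.

```python
def rm_simulate(tasks, horizon):
    """tasks: list of (period, exec_time); shorter period = higher priority.
    Returns (timeline, misses); misses holds (task_index, time of missed deadline)."""
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][0])
    remaining = [0] * len(tasks)
    timeline, misses = [], []
    for t in range(horizon + 1):
        for i, (p, c) in enumerate(tasks):
            if t % p == 0:                      # deadline of old job = release of new job
                if remaining[i] > 0:
                    misses.append((i, t))
                if t < horizon:
                    remaining[i] += c           # leftover work is carried over
        if t == horizon:
            break
        # Run the ready task with the highest static priority, if any.
        running = next((i for i in order if remaining[i] > 0), None)
        timeline.append(running)
        if running is not None:
            remaining[running] -= 1
    return timeline, misses

timeline, misses = rm_simulate([(5, 2), (7, 4)], horizon=14)
print(timeline, misses)
```

Index 0 denotes Task 1 and index 1 denotes Task 2; the reported miss is the first deadline of Task 2 at time 7.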

Intuitively: Why does RM fail?
No problem if $p_2 = m \cdot p_1$, $m \in \mathbb{N}$: [Figure: $T_1$ and $T_2$ over time t; $T_2$ fits.]
Otherwise: [Figure: $T_1$ and $T_2$ over time t; $T_2$ should be completed.] RM switches to $T_1$ too early, despite the early deadline for $T_2$.
(leviRTS animation)

Critical instants
Definition: A critical instant of a task is the time at which the release of a task will produce the largest response time.
Lemma: For any task, the critical instant occurs if that task is simultaneously released with all higher-priority tasks.
Proof: Let $T = \{T_1, \ldots, T_n\}$ be periodic tasks with $\forall i: p_i \le p_{i+1}$.
Source: G. Buttazzo, Hard Real-time Computing Systems, Kluwer, 2002

Critical instants (1)
The response time of $T_n$ is delayed by tasks $T_i$ of higher priority.
[Figure: $T_n$ and $T_i$ over time t; response time $c_n + 2c_i$.] The delay may increase if $T_i$ starts earlier.
[Figure: $T_n$ and $T_i$ over time t; response time $c_n + 3c_i$.] The maximum delay is achieved if $T_n$ and $T_i$ start simultaneously.

Critical instants (2)
Repeating the argument for all i = 1, …, n-1:
- The worst-case response time of a task occurs when it is released simultaneously with all higher-priority tasks. q.e.d.
- Schedulability is checked at the critical instants.
- If all tasks of a task set are schedulable at their critical instants, they are schedulable at all release times.
- This observation helps in designing examples.

Properties of RM scheduling
- RM scheduling is based on static priorities. This allows RM scheduling to be used in a standard OS, such as Windows NT.
- No idle capacity is needed if $\forall i: p_{i+1} = F \cdot p_i$, i.e. if the period of each task is a multiple of the period of the next higher-priority task; schedulability is then also guaranteed if $\mu \le 1$.
- A huge number of variations of RM scheduling exists.
- In the context of RM scheduling, many formal proofs exist.

Summary
Mapping of applications to platforms:
- Scheduling algorithms for aperiodic task sets: Earliest Due Date (EDD), Earliest Deadline First (EDF), Least Laxity (LL), Latest Deadline First (LDF).
- Scheduling algorithms for periodic task sets: rate monotonic scheduling (RMS).

Coffee/tea break (if on schedule). Q&A?

EDF
EDF can also be applied to periodic scheduling. EDF is optimal for every period, hence it is optimal for periodic scheduling. Consequently, EDF must be able to schedule the example in which RMS failed.
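That EDF handles the task set on which RMS failed (periods 5 and 7, execution times 2 and 4) can be checked with a small simulation over one hyperperiod of 35 time units. This is a sketch assuming integer time steps, deadlines equal to periods and simultaneous release at time 0; the name `edf_simulate` is hypothetical.

```python
def edf_simulate(tasks, horizon):
    """tasks: list of (period, exec_time); the deadline of each job is its next release.
    Returns (timeline, misses); misses holds (task_index, time of missed deadline)."""
    n = len(tasks)
    remaining = [0] * n
    deadline = [0] * n
    timeline, misses = [], []
    for t in range(horizon + 1):
        for i, (p, c) in enumerate(tasks):
            if t % p == 0:                      # deadline of old job = release of new job
                if remaining[i] > 0:
                    misses.append((i, t))
                if t < horizon:
                    remaining[i] += c
                    deadline[i] = t + p
        if t == horizon:
            break
        ready = [i for i in range(n) if remaining[i] > 0]
        if ready:
            # Dynamic priority: the job with the earliest absolute deadline runs.
            running = min(ready, key=lambda i: (deadline[i], i))
            remaining[running] -= 1
            timeline.append(running)
        else:
            timeline.append(None)               # processor idles
    return timeline, misses

timeline, misses = edf_simulate([(5, 2), (7, 4)], horizon=35)
print(misses, timeline.count(None))
```

With a utilization of 34/35, exactly one of the 35 time units remains idle and no deadline is missed.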

Comparison EDF/RMS
[Figure: RMS and EDF schedules for the same task set.] Under EDF, T2 is not preempted, due to its earlier deadline. (EDF animation)

EDF: Properties
EDF requires dynamic priorities. Hence, EDF cannot be used with a standard operating system just providing static priorities.

Comparison RMS/EDF
 | RMS | EDF
Priorities | Static | Dynamic
Works with std. OS with fixed priorities | Yes | No
Uses full computational power of processor | No, just up to $\mu = n(2^{1/n} - 1)$ | Yes
Possible to exploit full computational power of processor without provisioning for slack | No | Yes

Sporadic tasks
If sporadic tasks were connected to interrupts, the execution time of other tasks would become very unpredictable.
- Introduction of a sporadic task server, periodically checking for ready sporadic tasks.
- Sporadic tasks are essentially turned into periodic tasks.

Resource access protocols
Critical sections: sections of code at which exclusive access to some resource must be guaranteed. This can be guaranteed with semaphores S or "mutexes".
P(S) checks the semaphore to see if the resource is available and, if yes, sets S to "used". Uninterruptible operations! If no, the calling task has to wait.
V(S) sets S to "unused" and starts a sleeping task (if any).
[Figure: Task 1 and Task 2 each enclose their critical section in P(S) ... V(S); mutually exclusive access to the resource guarded by S.]
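As an illustration, P(S) and V(S) correspond to `acquire()` and `release()` on a binary semaphore from Python's standard `threading` module. The shared counter and the function name `task` are hypothetical; the sketch only demonstrates mutual exclusion, not real-time behavior.

```python
import threading

S = threading.Semaphore(1)        # binary semaphore guarding the shared resource
shared = {"counter": 0}           # hypothetical shared resource

def task(iterations):
    for _ in range(iterations):
        S.acquire()               # P(S): wait until the resource is available
        try:
            shared["counter"] += 1   # critical section
        finally:
            S.release()           # V(S): mark the resource as unused

threads = [threading.Thread(target=task, args=(10000,)) for _ in range(2)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(shared["counter"])
```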

Priority inversion
The priority of $T_1$ is assumed to be higher than the priority of $T_2$. If $T_2$ requests exclusive access first (at $t_0$), $T_1$ has to wait until $T_2$ releases the resource (at time $t_3$), thus inverting the priority. In this example, the duration of the inversion is bounded by the length of the critical section of $T_2$.

Duration of priority inversion with >2 tasks can exceed the length of any critical section
Priority of T1 > priority of T2 > priority of T3. T2 preempts T3, so T2 can prevent T3 from releasing the resource.
[Figure: timeline of T1, T2 and T3; legend: critical section, normal execution.]

Solutions
- Disallow preemption during the execution of all critical sections.
- Simple, but creates unnecessary blocking, as unrelated tasks may be blocked.
[Figure: timeline of $T_1$, $T_2$ and $T_3$ with $T_1$ blocked; legend: critical section, normal execution. © L. Thiele, ETH Zürich, 2006]

The Mars Pathfinder problem (1)
"But a few days into the mission, not long after Pathfinder started gathering meteorological data, the spacecraft began experiencing total system resets, each resulting in losses of data. The press reported these failures in terms such as "software glitches" and "the computer was trying to do too many things at once"." …

The Mars Pathfinder problem (2)
- "Pathfinder contained an "information bus", … a shared memory area used for passing information between different components of the spacecraft."
- "A bus management task ran frequently with high priority to move certain kinds of data in and out of the information bus. Access to the bus was synchronized with mutual exclusion locks (mutexes)."
- "The meteorological data gathering task ran as an infrequent, low priority thread, … When publishing its data, it would acquire a mutex, do writes to the bus, and release the mutex..."
- "The spacecraft also contained a communications task that ran with medium priority."
Summary: high priority: retrieval of data from shared memory; medium priority: communications task; low priority: thread collecting meteorological data.

Coping with priority inversion: the priority inheritance protocol
- Tasks are scheduled according to their active priorities. Tasks with the same priorities are scheduled FCFS.
- If task T1 executes P(S) and exclusive access has already been granted to T2, T1 becomes blocked. If priority(T2) < priority(T1), T2 inherits the priority of T1 and resumes. Rule: a task inherits the highest priority of the tasks blocked by it.
- When T2 executes V(S), its priority is decreased to the highest priority of the tasks still blocked by it. If no other task is blocked by T2, priority(T2) := original value. The highest-priority task so far blocked on S is resumed.
- Transitive: if T2 blocks T1 and T1 blocks T0, then T2 inherits the priority of T0.
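The inheritance rules can be sketched as a small bookkeeping model. It only tracks active priorities and mutex ownership, without an actual scheduler; the classes `Task` and `Mutex`, the functions `P` and `V`, and the convention that a larger number means a higher priority are all assumptions made for illustration.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base = priority      # original priority (larger = higher)
        self.active = priority    # active priority, possibly inherited
        self.blocked_on = None    # mutex this task is waiting for
        self.held = []            # mutexes currently held

class Mutex:
    def __init__(self, name):
        self.name = name
        self.owner = None
        self.waiting = []

def P(task, m):
    """Request mutex m. Returns True if granted, False if the task blocks."""
    if m.owner is None:
        m.owner = task
        task.held.append(m)
        return True
    m.waiting.append(task)
    task.blocked_on = m
    holder = m.owner
    while holder is not None and holder.active < task.active:
        holder.active = task.active               # inheritance, applied transitively
        holder = holder.blocked_on.owner if holder.blocked_on else None
    return False

def V(task, m):
    """Release mutex m and hand it to the highest-priority waiting task."""
    task.held.remove(m)
    # Fall back to the highest priority among tasks still blocked by this task.
    task.active = max([task.base] + [w.active for h in task.held for w in h.waiting])
    if m.waiting:
        nxt = max(m.waiting, key=lambda t: t.active)
        m.waiting.remove(nxt)
        nxt.blocked_on = None
        m.owner = nxt
        nxt.held.append(m)
    else:
        m.owner = None

# Transitive case: T1 is blocked by T2, and T2 is blocked by T3.
T1, T2, T3 = Task("T1", 3), Task("T2", 2), Task("T3", 1)
a, b = Mutex("a"), Mutex("b")
P(T3, a); P(T2, b)
P(T2, a)                    # T2 blocks on a; T3 inherits priority 2
P(T1, b)                    # T1 blocks on b; T2 and, via T2, T3 inherit priority 3
print(T2.active, T3.active)
```

Releasing the mutexes with `V(T3, a)` and later `V(T2, b)` restores the original priorities in this model.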

Example
How would priority inheritance affect our example with 3 tasks?
[Figure: timeline of the 3-task example up to V(S).] T3 inherits the priority of T1 and T3 resumes.

Priority Inheritance Protocol (PIP)
Example with nested critical sections:
[Figure: timeline of $T_1$, $T_2$ and $T_3$ with nested critical sections a and b and the operations P(a), P(b), V(b), V(a); the active priority $p_3$ of $T_3$ takes the values $P_3$, $P_2$, $P_1$ at times $t_1$, $t_2$, $t_3$. Legend: critical section, normal execution. © L. Thiele, ETH Zürich, 2006]

Priority Inheritance Protocol (PIP)
Example of transitive priority inheritance:
[Figure: timeline of $T_1$, $T_2$ and $T_3$ with critical sections a and b and the active priority $p_3$ of $T_3$ over the times $t_1$, $t_2$, $t_3$. Legend: critical section, normal execution. © L. Thiele, ETH Zürich, 2006; source: G. Buttazzo]
$T_1$ is blocked by $T_2$, and $T_2$ is blocked by $T_3$. $T_3$ inherits the priority from $T_1$ via $T_2$.

Priority Inheritance Protocol (PIP)
Problem: Deadlock
[Figure: $T_1$ and $T_2$ request the critical sections a and b in opposite order: one task executes P(a) followed by P(b), the other P(b) followed by P(a). Legend: critical section, normal execution. © L. Thiele, ETH Zürich, 2006; source: G. Buttazzo]

Priority inversion on Mars
Priority inheritance also solved the Mars Pathfinder problem: the VxWorks operating system used in the Pathfinder implements a flag for the calls to mutex primitives. This flag allows priority inheritance to be set to "on". When the software was shipped, it was set to "off". The problem on Mars was corrected by using the debugging facilities of VxWorks to change the flag to "on" while the Pathfinder was already on Mars [Jones, 1997].

Remarks on priority inheritance protocol
- Possibly a large number of tasks with high priority.
- Possible deadlocks.
- Ongoing debate about problems with the protocol: Victor Yodaiken: Against Priority Inheritance, http://www.fsmlabs.com/articles/inherit/inherit.html
- Finds application in Ada: during a rendezvous, the task priority is set to the maximum.
- More sophisticated protocol: priority ceiling protocol.

Impact on access methods for remote objects
Software packages for access to remote objects; example: CORBA (Common Object Request Broker Architecture). Information is sent to the Object Request Broker (ORB) via a local stub. The ORB determines the location to be accessed and sends the information via IIOP (Internet Inter-ORB Protocol). Access times are not predictable.

Real-time (RT-) CORBA
- An essential feature of RT-CORBA is to provide end-to-end predictability of timeliness in a fixed-priority system.
- This involves respecting thread priorities between client and server for resolving resource contention, and bounding the latencies of operation invocations.
- Thread priorities might not be respected when threads obtain mutually exclusive access to resources.
- RT-CORBA includes provisions for bounding the time during which such priority inversion can happen.
- Priority management for primitives for mutually exclusive access to resources: the priority inheritance protocol must be available in implementations of RT-CORBA.


Classification of Scheduling Problems
Scheduling:
- Independent tasks: RMS, EDF, LLF
- Dependent tasks:
  - 1 processor: LDF
  - Resource constrained: LS
  - Time constrained: FDS
  - Unconstrained: ASAP, ALAP

Dependent tasks
The problem of deciding whether or not a schedule exists for a set of dependent tasks and a given deadline is NP-complete in general [Garey/Johnson].
Strategies:
1. Add resources, so that scheduling becomes easier.
2. Split the problem into a static and a dynamic part, so that only a minimum of decisions needs to be taken at run-time.
3. Use scheduling algorithms from high-level synthesis.

Task graph
Assumption: execution time = 1 for all tasks.
[Figure: task graph with the nodes a to n and z.]

As soon as possible (ASAP) scheduling
ASAP: all tasks are scheduled as early as possible.
Loop over (integer) time steps:
- Compute the set of unscheduled tasks for which all predecessors have finished their computation.
- Schedule a selected subset of these tasks to start at the current time step.
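A minimal sketch of the ASAP loop, assuming unit execution times, unlimited resources (so every ready task is selected) and an acyclic graph given as a predecessor dictionary. The graph and the name `asap` are hypothetical and do not reproduce the task graph of the slides.

```python
def asap(preds):
    """preds: dict task -> list of predecessor tasks; unit execution times.
    Returns dict task -> start time step."""
    start = {}
    step = 0
    while len(start) < len(preds):
        # Tasks scheduled in earlier steps have finished, since execution time is 1.
        ready = [v for v in preds if v not in start
                 and all(p in start for p in preds[v])]
        if not ready:
            raise ValueError("task graph contains a cycle")
        for v in ready:
            start[v] = step
        step += 1
    return start

# Hypothetical task graph
preds = {"a": [], "b": [], "c": ["a"], "d": ["a", "b"], "e": ["c", "d"], "f": ["b"]}
asap_start = asap(preds)
print(asap_start)
```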

As soon as possible (ASAP) scheduling: Example
[Figure: ASAP schedule of the task graph (nodes a to n and z) over the time steps τ=0 to τ=5.]

As-late-as-possible (ALAP) scheduling
ALAP: all tasks are scheduled as late as possible.
Start at the last time step*: schedule tasks with no successors and tasks for which all successors have already been scheduled.
* Generate a list, starting at its end.
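The ALAP procedure mirrors ASAP and can be sketched in the same way, again assuming unit execution times and an acyclic graph. The successor dictionary, the parameter `last_step` and the name `alap` are hypothetical.

```python
def alap(succs, last_step):
    """succs: dict task -> list of successor tasks; unit execution times.
    Returns dict task -> start time step, working backwards from last_step."""
    start = {}
    step = last_step
    while len(start) < len(succs):
        if step < 0:
            raise ValueError("last_step too small or task graph contains a cycle")
        # Tasks without successors, or whose successors are all scheduled later.
        ready = [v for v in succs if v not in start
                 and all(s in start for s in succs[v])]
        for v in ready:
            start[v] = step
        step -= 1
    return start

# Hypothetical task graph
succs = {"a": ["c", "d"], "b": ["d", "f"], "c": ["e"], "d": ["e"], "e": [], "f": []}
alap_start = alap(succs, last_step=2)
print(alap_start)
```

In this example, task f has no successors and is therefore placed in the last time step.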

As-late-as-possible (ALAP) scheduling: Example
[Figure: ALAP schedule of the task graph (nodes a to n and z) over the time steps τ=0 to τ=5, starting at the last time step.]

(Resource constrained) List Scheduling
List scheduling: extension of the ALAP/ASAP method.
Preparation:
- Topological sort of the task graph G=(V,E).
- Computation of the priority of each task. Possible priorities u: number of successors; longest path; mobility = τ(ALAP schedule) - τ(ASAP schedule).
Source: Teich: Dig. HW/SW Systeme

Mobility as a priority function
Mobility is not very precise.
[Figure: ASAP and ALAP schedules of the task graph (nodes a to n and z) over the time steps τ=0 to τ=5, with urgent and less urgent tasks marked.]

Algorithm
List(G(V,E), B, u) {
  i := 0;
  repeat {
    Compute set of candidate tasks A_i;
    Compute set of not terminated tasks G_i;
    Select S_i ⊆ A_i of maximum priority r such that |S_i| + |G_i| ≤ B;  (* resource constraint *)
    foreach (v_j ∈ S_i): τ(v_j) := i;  (* set start time *)
    i := i + 1;
  } until (all nodes are scheduled);
  return (τ);
}
Complexity: O(|V|). May be repeated for different task/processor classes.
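The list scheduling algorithm can be sketched in Python for the special case of unit execution times, in which no task is still running at the start of a time step (G_i is empty). The task graph, the priority values (longest path to a sink, counted in nodes) and the name `list_schedule` are hypothetical.

```python
def list_schedule(preds, priority, B):
    """preds: dict task -> list of predecessors; priority: dict task -> u(task);
    B: number of available resources. Unit execution times are assumed.
    Returns dict task -> start time step."""
    start = {}
    i = 0
    while len(start) < len(preds):
        # Candidate set A_i: unscheduled tasks whose predecessors have finished.
        candidates = [v for v in preds if v not in start
                      and all(p in start for p in preds[v])]
        if not candidates:
            raise ValueError("task graph contains a cycle")
        candidates.sort(key=lambda v: -priority[v])   # highest priority first
        for v in candidates[:B]:                      # resource constraint |S_i| <= B
            start[v] = i                              # set start time
        i += 1
    return start

# Hypothetical task graph with path-length priorities
preds = {"a": [], "b": [], "c": ["a"], "d": ["a", "b"], "e": ["c", "d"], "f": ["b"]}
u = {"a": 3, "b": 3, "c": 2, "d": 2, "e": 1, "f": 1}
print(list_schedule(preds, u, B=2))
print(list_schedule(preds, u, B=1))
```

With B=1 the result degenerates into a priority-driven topological order of the tasks.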

Example
Assuming B=2, unit execution time and u: path length:
u(a)=u(b)=4; u(c)=u(f)=3; u(d)=u(g)=u(h)=u(j)=2; u(e)=u(i)=u(k)=1; $\forall i: |G_i| = 0$
[Figure: task graph with the nodes a to k and the resulting list schedule over the time steps τ=0 to τ=5. Modified example based on J. Teich.]

(Time constrained) Force-directed scheduling
Goal: balanced utilization of resources. Based on a spring model; originally proposed for high-level synthesis*.
* [Pierre G. Paulin, J.P. Knight, Force-directed scheduling in automatic data path synthesis, Design Automation Conference (DAC), 1987, pp. 195-202] © ACM

Phase 1: Generation of ASAP and ALAP Schedule
[Figure: ASAP and ALAP schedules of the task graph (nodes a to n and z) over the time steps τ=0 to τ=5.]

Phase 2: Computation of Distribution Graphs
R(j) = {τ(ASAP(j)) .. τ(ALAP(j))}
[Figure: ASAP and ALAP schedules of the task graph (nodes a to n and z) over the time steps τ=0 to τ=5, and the resulting distribution graph D(i) over the time steps i.]
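The distribution graph can be computed from the ASAP and ALAP start times. The slides do not give the formula for D(i); the sketch assumes the usual definition from force-directed scheduling, in which each task j is equally likely to start at any step of its range R(j) and D(i) sums these probabilities 1/|R(j)| over all tasks. The start times and the name `distribution_graph` are hypothetical.

```python
def distribution_graph(asap_start, alap_start):
    """asap_start, alap_start: dict task -> earliest / latest start step.
    Returns dict time step i -> D(i)."""
    first = min(asap_start.values())
    last = max(alap_start.values())
    D = {i: 0.0 for i in range(first, last + 1)}
    for j in asap_start:
        lo, hi = asap_start[j], alap_start[j]     # range R(j) = {lo .. hi}
        width = hi - lo + 1
        for i in range(lo, hi + 1):
            D[i] += 1 / width                     # assumed uniform probability
    return D

# Hypothetical ASAP and ALAP start times; only task f has mobility
asap_start = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 1}
alap_start = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}
D = distribution_graph(asap_start, alap_start)
print(D)
```

Task f contributes 0.5 to each of the two steps of its range, so placing it in the less loaded step balances resource usage.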

Next: computation of "forces"
- Direct forces push each task into the direction of lower values of D(i).
- The impact of direct forces on dependent tasks is taken into account by indirect forces.
- Balanced resource usage ≈ smallest forces.
- For our simple example and time constraint = 6: result = ALAP schedule.
[Figure: distribution graph D(i) over the time steps i and the resulting schedule of the task graph (nodes a to n and z) over the time steps τ=0 to τ=5.]

Overall approach
procedure forceDirectedScheduling;
begin
  AsapScheduling;
  AlapScheduling;
  while not all tasks scheduled do
  begin
    select task T with smallest total force;
    schedule task T at time step minimizing forces;
    recompute forces;
  end;
end
May be repeated for different task/processor classes. Not sufficient for today's complex, heterogeneous hardware platforms.

- 79 - Trend: multiprocessor systems-on-a-chip (MPSoCs)
[Figure from http://www.mpsoc-forum.org/2007/slides/Hattori.pdf]

- 80 - Multiprocessor systems-on-a-chip (MPSoCs) (2)
[Figure from http://www.mpsoc-forum.org/2007/slides/Hattori.pdf]

- 84 - Summary
Mapping of applications to platforms
- Scheduling algorithms for periodic task sets: Earliest Deadline First (EDF)
- Preemptive scheduling + mutexes
- Priority inversion: the priority inheritance protocol (PIP) reduces problems. However, PIP adds to the complexity. Better to avoid this effect altogether (e.g. by using a DF MoC).
- Scheduling for dependent task sets: ASAP, ALAP, list scheduling, force-directed scheduling
- Architectures of Multiprocessor Systems on a Chip (MPSoCs)

- 85 - Questions (if on schedule)? Q&A?

- 87 - Hardware/software partitioning
- Functionality to be implemented in software or in hardware?
- Need to consider special purpose hardware in the long run? "No" for fixed functionality, but "yes" in general, since "By the time MPEG-n can be implemented in software, MPEG-n+1 has been invented" [de Man]

- 88 - Hardware/software partitioning: approach
[Niemann: Hardware/Software Co-Design for Data Flow Dominated Embedded Systems, Kluwer Academic Publishers, 1998 (comprehensive mathematical model)]
[Figure: from specification to mapping]
Inputs to COOL:
1. Target technology
2. Design constraints
3. Required behavior

- 89 - Steps of the COOL partitioning algorithm (1)
1. Translation of the behavior into an internal graph model
2. Translation of the behavior of each node from VHDL into C
3. Compilation: all C programs are compiled for the target processor; the resulting program size is computed and the resulting execution time is estimated (simulation input data might be required)
4. Synthesis of hardware components: for all leaf nodes, application-specific hardware is synthesized. High-level synthesis is sufficiently fast.

- 90 - Steps of the COOL partitioning algorithm (2)
5. Flattening of the hierarchy: the granularity used by the designer is maintained. Cost and performance information is added to the nodes. Precise information required for partitioning is pre-computed.
6. Generating and solving a mathematical model of the optimization problem: integer programming (IP) model for optimization. Optimal with respect to the cost function (which approximates communication time).

- 91 - Steps of the COOL partitioning algorithm (3)
7. Iterative improvements: adjacent nodes mapped to the same hardware component are now merged.
8. Interface synthesis: after partitioning, the glue logic required for interfacing processors, application-specific hardware and memories is created.

- 92 - An integer linear programming model for HW/SW partitioning
Notation:
- Index set I denotes task graph nodes.
- Index set L denotes task graph node types, e.g. square root, DCT or FFT.
- Index set KH denotes hardware component types, e.g. hardware components for the DCT or the FFT.
- Index set J denotes hardware component instances.
- Index set KP denotes processors. All processors are assumed to be of the same type.

- 93 - An ILP model for HW/SW partitioning
- $X_{i,k} = 1$ if node $v_i$ is mapped to hardware component type $k \in KH$, and 0 otherwise.
- $Y_{i,k} = 1$ if node $v_i$ is mapped to processor $k \in KP$, and 0 otherwise.
- $NY_{\ell,k} = 1$ if at least one node of type $\ell$ is mapped to processor $k \in KP$, and 0 otherwise.
- $T$ is a mapping from task graph nodes to their types: $T: I \to L$
- The cost function accumulates the cost of hardware units: C = cost(processors) + cost(memories) + cost(application specific hardware)

- 94 - Constraints
Operation assignment constraints
All task graph nodes have to be mapped either in software or in hardware:
$\forall i \in I: \sum_{k \in KH} X_{i,k} + \sum_{k \in KP} Y_{i,k} = 1$
Variables are assumed to be integers. Additional constraints to guarantee they are either 0 or 1:
$\forall i \in I, \forall k \in KH: X_{i,k} \le 1$
$\forall i \in I, \forall k \in KP: Y_{i,k} \le 1$

- 95 - Operation assignment constraints (2)
$\forall \ell \in L, \forall i: T(v_i) = c_\ell, \forall k \in KP: NY_{\ell,k} \ge Y_{i,k}$
For all types $\ell$ of operations and for all nodes i of this type: if i is mapped to some processor k, then that processor must implement the functionality of $\ell$.
Decision variables must also be 0/1 variables:
$\forall \ell \in L, \forall k \in KP: NY_{\ell,k} \le 1$

- 96 - Scheduling
[Figure: task graph with nodes v1 to v11 (including FIR 1 and FIR 2) mapped to processor p1, ASIC h1 and communication channel c1. On each resource two orders are possible: v7 before v8 or v8 before v7 on p1; e3 before e4 or e4 before e3 on c1; v3 before v4 or v4 before v3 for FIR 2 on h1.]

- 97 - Scheduling / precedence constraints
- For all nodes $v_{i1}$ and $v_{i2}$ that are potentially mapped to the same processor or hardware component instance, introduce a binary decision variable $b_{i1,i2}$ with $b_{i1,i2} = 1$ if $v_{i1}$ is executed before $v_{i2}$, and 0 otherwise. Define constraints of the type
  (end time of $v_{i1}$) $\le$ (start time of $v_{i2}$) if $b_{i1,i2} = 1$, and
  (end time of $v_{i2}$) $\le$ (start time of $v_{i1}$) if $b_{i1,i2} = 0$
- Ensure that the schedule for executing operations is consistent with the precedence constraints in the task graph.
- The approach just fixes the order of execution and avoids the complexity of computing start times during optimization.

- 98 - Example
HW types H1, H2 and H3 with costs of 20, 25, and 30. Processors of type P. Tasks T1 to T5.
Execution times:
T   H1   H2   H3   P
1   20   -    -    100
2   -    20   -    100
3   -    -    12   10
4   -    -    12   10
5   20   -    -    100

- 99 - Operation assignment constraints (1)
(Execution times as on the Example slide.)
$X_{1,1} + Y_{1,1} = 1$ (task 1 mapped to H1 or to P)
$X_{2,2} + Y_{2,1} = 1$
$X_{3,3} + Y_{3,1} = 1$
$X_{4,3} + Y_{4,1} = 1$
$X_{5,1} + Y_{5,1} = 1$

- 100 - Operation assignment constraints (2)
Assume types of tasks are $\ell$ = 1, 2, 3, 3, and 1.
$\forall \ell \in L, \forall i: T(v_i) = c_\ell, \forall k \in KP: NY_{\ell,k} \ge Y_{i,k}$
Functionality 3 to be implemented on processor if node 4 is mapped to it.
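
As a small illustration of this constraint, the sketch below derives the smallest values of NY that satisfy NY ≥ Y for the five example tasks and a single processor. The helper name `required_functionality` and the dictionary layout are assumptions made for this example; only the task types come from the slide.

```python
# Task types l = 1, 2, 3, 3, 1 for tasks 1..5, as stated on the slide.
TASK_TYPE = {1: 1, 2: 2, 3: 3, 4: 3, 5: 1}


def required_functionality(on_processor):
    """Smallest NY_l with NY_l >= Y_i for every task i of type l (single processor).

    on_processor[i] plays the role of Y_{i,1}: 1 if task i runs on the processor.
    """
    ny = {l: 0 for l in sorted(set(TASK_TYPE.values()))}
    for i, y in on_processor.items():
        ny[TASK_TYPE[i]] = max(ny[TASK_TYPE[i]], y)
    return ny


# Only node 4 is mapped to the processor, so only functionality 3 is needed there.
ny = required_functionality({1: 0, 2: 0, 3: 0, 4: 1, 5: 0})
print(ny)  # prints {1: 0, 2: 0, 3: 1}
```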

- 101 - Other equations
(Execution times as on the Example slide.)
Time constraints leading to: application specific hardware required for time constraints under 100 time units.
Cost function: C = 20 #(H1) + 25 #(H2) + 30 #(H3) + cost(processor) + cost(memory)

- 102 - Result
For a time constraint of 100 time units and cost(P) < cost(H3):
Solution (educated guessing):
T1 → H1
T2 → H2
T3 → P
T4 → P
T5 → H1
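
The educated guess can be cross-checked by enumerating all 2^5 assignments. This is only a sketch under assumptions that the slides do not state: tasks execute one after another (so execution times add up), at most one instance of each hardware type is needed, memory cost is ignored, and cost(P) is set to a hypothetical value of 25 (the slides only require cost(P) < cost(H3) = 30). It is a brute-force check, not the IP formulation solved by COOL.

```python
from itertools import product

HW_COST = {"H1": 20, "H2": 25, "H3": 30}  # from the example slide
COST_P = 25        # hypothetical; the slides only state cost(P) < cost(H3)
TIME_LIMIT = 100   # time constraint from the slide

# task -> (hardware type, execution time in hardware, execution time on P)
TASKS = {1: ("H1", 20, 100), 2: ("H2", 20, 100), 3: ("H3", 12, 10),
         4: ("H3", 12, 10), 5: ("H1", 20, 100)}


def evaluate(choice):
    """choice[n] is True if the n-th task runs in hardware. Returns (cost, time)."""
    used, time = set(), 0
    for (hw, t_hw, t_sw), in_hw in zip(TASKS.values(), choice):
        used.add(hw if in_hw else "P")
        time += t_hw if in_hw else t_sw  # assumption: strictly sequential execution
    cost = sum(COST_P if r == "P" else HW_COST[r] for r in used)
    return cost, time


feasible = [c for c in product([True, False], repeat=len(TASKS))
            if evaluate(c)[1] <= TIME_LIMIT]
best = min(feasible, key=lambda c: evaluate(c)[0])
mapping = {f"T{i}": (TASKS[i][0] if in_hw else "P") for i, in_hw in zip(TASKS, best)}
print(mapping, evaluate(best))
```

Under these assumptions the cheapest feasible assignment is the one given on the slide (T1 and T5 on H1, T2 on H2, T3 and T4 on P), with a cost of 70 and a total time of 80.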

- 103 - Application example
- Audio lab (mixer, fader, echo, equalizer, balance units)
- Slow SPARC processor
- 1 µ ASIC library
- Allowable delay of 22.675 µs (~ 44.1 kHz)
[Figure: SPARC processor, ASIC (Compass, 1 µ), external memory]
Outdated technology; just a proof of concept.

- 104 - Design space for audio lab
- Everything in software: 72.9 µs, 0 λ²
- Everything in hardware: 3.06 µs, 457.9x10^6 λ²
- Lowest cost for given sample rate: 18.6 µs, 78.4x10^6 λ²
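
The selection among these three design points can be reproduced with a few lines. This sketch assumes that the allowable delay is the sample period 1/44.1 kHz and that cost is the chip area listed on the slide; the variable names are chosen for illustration.

```python
SAMPLE_RATE_HZ = 44_100
deadline_us = 1e6 / SAMPLE_RATE_HZ  # sample period in microseconds, about 22.68

# name -> (delay in microseconds, area as listed on the slide)
designs = {
    "everything in software": (72.9, 0.0),
    "everything in hardware": (3.06, 457.9e6),
    "lowest cost for given sample rate": (18.6, 78.4e6),
}

# Keep only the designs that meet the deadline, then pick the one with the smallest area.
feasible = {name: area for name, (delay, area) in designs.items() if delay <= deadline_us}
best = min(feasible, key=feasible.get)
print(best)
```

The all-software design misses the deadline, and of the two remaining designs the partitioned one needs far less area than the all-hardware one.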

- 105 - HW/SW partitioning in the context of mapping applications to processors
- Handling of heterogeneous systems
- Handling of task dependencies
- Considers communication (at least in COOL)
- Considers memory sizes etc. (at least in COOL)
- For COOL: just homogeneous processors
- No link to scheduling theory
- Still handles just a single processor type

- 106 - Survey of Mapping Techniques
1st Workshop on Mapping Applications To MPSoCs, Rheinfels castle, June 2008
Information: http://www.artist-embedded.org/artist/-Mapping-of-Applications-to-MPSoCs-.html
- Automatic parallelization of C-code (see talk by Heiko Falk)
- Work at ETH Zürich (SPEA2, Lothar Thiele)
- Work at RWTH Aachen (MAPS, Leupers)
- Work at Leiden University (Daedalus, Ed Deprettere)
- Mapping to the CELL processor (U. Bologna and others)
- Work at IMEC (D. Verkest)

- 107 - Daedalus Design-flow
[Figure: Daedalus design flow. System-level specification: a sequential application is parallelized automatically by KPNgen into a parallel application specification (Kahn Process Network); together with the platform specification and the mapping specification it is passed through a common XML interface. Sesame performs system-level design space exploration (explore, modify, select instances) using a library of IP cores (high-level models). ESPAM performs system-level synthesis using a library of IP cores (RTL-level models). RTL-level specification: synthesizable VHDL and C/C++ code for processors, processed by Xilinx Platform Studio (XPS) into the multi-processor system on chip (MP-SoC).]
© E. Deprettere, U. Leiden

- 108 - JPEG/JPEG2000 case study
Example architecture instances for a single-tile JPEG encoder:
- 2 MicroBlaze processors (50KB)
- 1 MicroBlaze, 1 HW DCT (36KB)
- 6 MicroBlaze processors (120KB)
- 4 MicroBlaze, 2 HW DCT (68KB)
[Figure: block diagrams of the four instances showing the mapping of Vin, DCT, Q, VLE and Vout to processors and the sizes of the memories]
© E. Deprettere, U. Leiden

- 109 - Sesame DSE results: Single JPEG encoder DSE © E. Deprettere, U. Leiden

- 110 - Work at Zürich. Mapping scenario: overview
Given
1. specification of the task structure (task model) = for each flow the corresponding tasks to be executed
2. different usage scenarios (flow model)
Sought
processor implementation (resource model) = architecture* + task mapping + scheduling
Objectives:
1. maximize performance
2. minimize cost
Subject to:
1. memory constraints
2. delay constraints
*: 2 cases: 1. fixed architecture, 2. architecture to be designed
(performance model)
Based on Thiele's slides

- 111 - Design Space
[Figure: computation templates (DSP, RISC, FPGA, Cipher, LookUp, SDRAM, EE), communication templates and scheduling/arbitration policies (proportional share WFQ, static, dynamic fixed priority, EDF, TDMA, FCFS) combined into two candidate architectures, Architecture #1 and Architecture #2]
Which architecture is better suited for our application?
© L. Thiele, ETHZ

- 112 - Application Model
Example of a simple stream processing task structure: © L. Thiele, ETHZ

- 113 - Practical problem in automotive design
Which processor should run the software?

- 120 - EXPO: tool architecture (1)
[Figure: exploration cycle between MOSES, EXPO and SPEA 2. Inputs: task graph, scenario graph, flows & resources. Exchanged data: system architecture, performance values, selection of "good" architectures.]
© L. Thiele, ETHZ
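
With several objectives, a "good" architecture is usually understood as one that no other candidate beats in every objective. The sketch below shows only this Pareto-dominance test, on which evolutionary multi-objective optimizers such as SPEA2 build; it is not SPEA2 itself, and the candidate values are hypothetical (delay, cost) pairs, both to be minimized.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate architectures as (delay, cost); both objectives are minimized.
candidates = [(10, 5), (8, 7), (12, 4), (11, 6)]
front = pareto_front(candidates)
print(front)  # prints [(10, 5), (8, 7), (12, 4)]
```

The candidate (11, 6) is dropped because (10, 5) is both faster and cheaper; the remaining three represent different trade-offs between which a designer still has to choose.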

- 121 - SYMTA/S: System optimization using evolutionary algorithms
[R. Ernst et al.: A framework for modular analysis and exploration of heterogeneous embedded systems, Real-time Systems, 2006, p. 124]

- 128 - Results
[Figures: performance for encryption/decryption; performance for RT voice processing]
© L. Thiele, ETHZ

- 129 - Summary
Mapping applications for complex heterogeneous multiprocessor platforms needs
- allocation (if hardware is not fixed)
- binding of tasks to resources
- scheduling
Approaches presented
- HW/SW codesign tool COOL
- Daedalus (briefly)
- Symta/S (briefly)
- SPEA2: evolutionary algorithms in use at ETH Zürich

- 130 - Brief break (if on schedule) Q&A?
