
1 Lehrstuhl Technische Informatik - Computer Engineering, Brandenburgische Technische Universität Cottbus
Transient and Permanent Faults in Nanoelectronic ICs: Compensation and Repair. Problems, Solutions, Limitations
H. T. Vierhaus, BTU Cottbus, Computer Engineering

2 Outline
1. Introduction: Nanostructure Problems
2. Transient Faults
3. Repair of Permanent Faults
4. Bus Structures and NoCs
5. Diagnostic Test
6. A Lot of Things to Do

3 1. Introduction: A bunch of new problems from nanostructures...

4 Nanoelectronic Problems
- Lithography: The wavelength used to map structural information from masks to wafers is larger (roughly 2 to 4 times or more) than the minimum structural features (193 nm versus 90 / 65 / 45 nm). This requires adaptation of layouts to correct mapping faults.
- Parameter variations: The number of atoms in MOS transistor channels becomes so small that statistical variations of doping densities have an impact on device parameters such as threshold voltages.

5 Doping Fluctuations in MOS Transistors
[Figure: two MOS transistor cross-sections (p-substrate, n regions, poly-Si gate) with different placements of the doping atoms]
The density and distribution of doping atoms cause shifts in transistor threshold voltages!

6 Nanostructure Problems
- Individual device characteristics such as V_th are more dependent on statistical variations of underlying physical features such as doping profiles.
- A significant share of basic devices will be out of spec and will need replacement by backup elements for yield improvement after production.
- As smaller features mean higher stress (field strength, current density), early failures in the field are also more likely and must be compensated.
- Recognizing and compensating transient errors in time is becoming a must, e.g. because charged particles can discharge circuit nodes.

7 Key Technologies
Fault tolerant computing
- An old technology that is already heavily used in everyday computing (e.g. memory interfaces with ECC check and correction).
- Is required to handle intermittent and transient fault effects, e.g. those induced by radiation.
- Can handle only a limited number of permanent faults!
Built-in self test (BIST) and self-repair (BISR)
- Is required to handle permanent faults by self-repair using redundant elements.
- State-of-the-art for memories, not for logic.
- Can handle multiple faults (sequentially) until the redundancy resources are exhausted.
Algorithms that are fully or partially fault hard
- Most DSP algorithms show an inherent stability and work even under fault conditions, with reduced precision.
- The effect can be HW-enhanced.

8 System-on-a-Chip (SoC)
SoCs are heterogeneous systems that require test & repair strategies for:
- logic (also in processors)
- memory blocks
- interconnects
- analog and D/A components

9 Fault Tolerant Computing
[Figure: a fault event handled at three levels, with scope labels "specific", "very specific" and "universal"]
- Software-based fault detection & compensation: works only for transient faults!
- HW logic & RT-level detection & compensation: typically works for transient and permanent faults!
- Transistor and switch-level compensation: typically works for specific types of transient faults only!

10 2. Transient Fault Effects

11 Storage Nodes and Particles
[Figure: storage node charge Q (fC) versus technology node (0.35, 0.25, 0.18, 0.09 µm), compared with the charge generated by an alpha particle]
A 1 MeV alpha particle generates 42 fC of charge!

12 Contribution to Soft-Error Rates
- Static combinational logic: 11 %
- Sequential elements (FFs, latches): 49 %
- Unprotected SRAM: 40 %
Source: S. Mitra, N. Seifert, M. Zhang, Q. Shi, K. S. Kim, "Robust System Design with Built-In Soft Error Resilience", IEEE Computer, Vol. 38, No. 2, Feb. 2005, pp.

13 Spikes and Clock Rates in Logic
[Figure: a pulse of 100 ps shown against the clock over time t, for one case in which charge / status restoration is possible and one in which it is impossible]
Fault probability in digital logic is roughly proportional to clock frequency!
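The proportionality claim can be illustrated with a deliberately simple latching-window model. This is a sketch under assumptions that are not on the slide: the transient pulse arrives at a uniformly random time, exactly one capturing clock edge occurs per period, and setup / hold windows as well as electrical and logical masking are ignored. The function name `capture_probability` is made up for illustration.

```python
def capture_probability(pulse_width_s: float, clock_hz: float) -> float:
    """Chance that a transient pulse overlaps a capturing clock edge.

    Assumed model: the pulse starts at a uniformly random time and there is
    one capturing edge per clock period, so the chance is the pulse width
    divided by the clock period, i.e. pulse_width * f, capped at 1.
    """
    return min(1.0, pulse_width_s * clock_hz)


if __name__ == "__main__":
    pulse = 100e-12  # the 100 ps pulse from the slide
    for f in (100e6, 1e9, 2e9, 4e9):
        print(f"{f / 1e9:4.1f} GHz -> {capture_probability(pulse, f):.2f}")
```

In this model, doubling the clock frequency doubles the chance that the same 100 ps pulse is captured, until the pulse is as long as the clock period.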

14 Logic Structures and Fault Events
[Figure: combinational logic between input FFs and output FFs, hit by particle radiation]
Flip-flops need fault tolerance / fault hardening in the first place; logic close to the outputs comes next.

15 Muller-C-Element

16 Fault-Tolerant Latch Design
[Figure: input in feeds Latch 1 and Latch 2; their outputs outl1 and outl2 drive a Muller C-element with output out and load CL. Timing diagram v(t) against the clock: phases with outl1 = in and outl2 = in alternate with phases in which outl1 and outl2 are latched]
If clock is high: out = in

17 Fault Handling
Muller C-element:
- If both inputs are equal: out = outl1 = outl2
- If the inputs are not equal: out = previous value of (outl1, outl2)
Under local fault conditions on the latch outputs (one of the 2 latches false), the C-element preserves the output condition from the charge phase of the latch.
Essentially 3 latches!
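The C-element rule stated above can be sketched as a small behavioral model. This is a minimal sketch rather than the circuit from the slides: the class name `MullerCElement` is made up, the output is modeled as non-inverting, and timing and charge effects are ignored.

```python
class MullerCElement:
    """Behavioral model: follow the inputs when they agree, otherwise hold."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, outl1: int, outl2: int) -> int:
        # Equal inputs pass through; unequal inputs leave the stored value.
        if outl1 == outl2:
            self.out = outl1
        return self.out


if __name__ == "__main__":
    c = MullerCElement()
    print(c.update(1, 1))  # both latches agree, output follows them
    print(c.update(1, 0))  # latch 2 upset by a transient, output is held
    print(c.update(0, 0))  # both latches agree again on the new value
```

The held state is what makes the C-element act as the third storage element mentioned on the slide.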

18 Intel's Scan Path Element

19 Intel's Scan Path Element plus Fault Compensation

20 TMR-Latch / Flip-Flop
[Figure: FF1, FF2 and FF3 with input in and a common clock, an XOR gate producing the control signal cout, and a MUX. Out = L1out with cout = 1; Out = L2out with cout = 0]
- Can compensate static or dynamic faults in latches / FFs!
- Works with latches or flip-flops.
- FF1 is untestable (active redundancy).
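The XOR / MUX selection can be compared with a plain majority vote in a few lines. The wiring is an assumption, since the transcript does not preserve it: the XOR is taken to compare FF2 and FF3, and cout = 1 selects FF1, cout = 0 selects FF2. Under that assumption the selection behaves exactly like a 2-out-of-3 majority voter.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Classic 2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def xor_mux_vote(ff1: int, ff2: int, ff3: int) -> int:
    """Assumed wiring: cout = FF2 xor FF3; the MUX picks FF1 if cout else FF2."""
    cout = ff2 ^ ff3
    return ff1 if cout else ff2


if __name__ == "__main__":
    for bits in product((0, 1), repeat=3):
        print(bits, xor_mux_vote(*bits), majority(*bits))
```

FF1 only reaches the output when FF2 and FF3 disagree, which matches the slide's remark that FF1 acts as active redundancy and cannot be observed in fault-free operation.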

21 TMR-Scan-Element

22 TMR Scan-Element
- Fault tolerant in functional mode
- Fault tolerant in scan mode
- Optional support of test strategies that require a specific sequence of 2 input bits!

23 Fault Tolerant Latches and FFs

24 Fault Compensation in Combinational Logic
[Figure: input FFs feeding logic, with blocks labeled D and an element labeled MC]

25 Fault Compensation in Combinational Logic
[Figure: V(t) traces over time t for the fault-free signal, the signal with a glitch, and the signal with a delayed glitch; MC phases: capture, no capture / hold, capture; when the latch closes there is time left to capture]

26 3. Repair of Permanent Faults
Compensation of transient faults is not enough. Some technologies for transient compensation can handle permanent faults, too, but not in the long run and not in combination with additional transient faults!

27 Memory Test & Repair
[Figure: memory array with lines and columns, line address, read / write lines, and a spare column]

28 Memory Test & Repair (2)
[Figure: memory array with lines and columns, line address, read / write lines, a spare column, and a memory BIST controller]
... is already state-of-the-art!

29 Logic Self Repair

30 Granularity of Replacement

31 Levels of Repair

32 Replacement in Regular Structures (e.g. for DSP)

33 Parallel Backup Transistors
[Figure: a basic gate (inputs in1, in2, output out, between VDD and GND) next to the same gate with redundant transistors added in parallel]

34 Redundancy by Active Parallel Transistors
- Active redundancy is not testable. Therefore there is no way to monitor the status of available redundancy in a logic circuit.
- Parallel transistors cannot compensate a fault of the stuck-on type (transistor always conducting).
- Faulty backup transistors may produce additional faults that cannot be corrected!
Adding redundancy is not enough; fault isolation is a real problem!

35 Configuration and Fault Isolation
[Figure: gate between VDD and GND with inputs in1, in2 and output out, showing a stuck-on fault]

36 The Gate-Short Problem
[Figure: a driver feeding Load 1 and Load 2, one of them with a gate short]
GND shorts at gate inputs affect the whole fan-in network and make redundancy obsolete!

37 Gate Turn-off

38 Schematic Layout with VDD / GND Switches
[Figure: a gate with parallel redundancy next to a gate with parallel redundancy and fault isolation]

39 Transistor-Level Overhead
[Table: estimates for three schemes (parallel transistors, VDD / GND switches, separate gate poly lines). Rows: overhead (cells only), stuck-off coverage, stuck-on coverage, gate-short coverage, control lines. Entries that survive in the transcript: overhead 30-40 % and 60-80 %; control lines none, one wire, multiple wires]

40 Duplicate Standard Cells
[Figure: Gate1 (supply VDD1) and Gate2 (supply VDD2) with common inputs in1, in2, a common output out and GND, selected by a VDD switch with a control input]

41 Again: Fault Isolation
[Figure: the duplicate cells Gate1 (VDD1) and Gate2 (VDD2) with VDD switch and control, marked with two fault cases: output VDD / GND short and gate input short]

42 Administered Duplicate Cells
[Figure: Gate 1 and Gate 2 with separate supplies (VDD1, VDD2, GND1, GND2) behind power switches and GND switches driven by activation signals (Act); labels: gate in, gate out, gate short; table fragment: X 0 X, X 1 X]

43 Features
- Uses normal cell designs.
- Four states of operation:
  - Config. 1: Gate 1 active, Gate 2 isolated
  - Config. 2: Gate 2 active, Gate 1 isolated
  - Config. 3: Both gates active, operating in parallel
  - Config. 4: Both gates isolated from VDD / GND
- Operating modes such as high / low power are possible.
- Cells can be put to temporary sleep for stress relief.
- Permanent repair functions.
- The active cell output is connected only to floating outputs of the other cell.
- If twin tubs are used and cell-internal tubs are also disconnected, gate input / GND shorts are prevented.
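The four states of operation can be written down as a small selection routine. This is only a sketch of the bookkeeping: the names `CellConfig` and `select_config`, the representation as two enable flags, and the policy of running a single gate unless high drive is requested are assumptions for illustration, not the control logic from the slides.

```python
from enum import Enum


class CellConfig(Enum):
    """(gate 1 powered, gate 2 powered) for the four states on the slide."""
    GATE1_ACTIVE = (True, False)    # Config. 1
    GATE2_ACTIVE = (False, True)    # Config. 2
    BOTH_ACTIVE = (True, True)      # Config. 3
    BOTH_ISOLATED = (False, False)  # Config. 4


def select_config(gate1_ok: bool, gate2_ok: bool, high_drive: bool = False) -> CellConfig:
    """Pick a configuration from per-gate test results (hypothetical policy)."""
    if gate1_ok and gate2_ok:
        return CellConfig.BOTH_ACTIVE if high_drive else CellConfig.GATE1_ACTIVE
    if gate1_ok:
        return CellConfig.GATE1_ACTIVE
    if gate2_ok:
        return CellConfig.GATE2_ACTIVE
    return CellConfig.BOTH_ISOLATED


if __name__ == "__main__":
    print(select_config(gate1_ok=False, gate2_ok=True))
```

Returning `BOTH_ISOLATED` when neither gate passes corresponds to the case where the local redundancy is exhausted and a further level of repair would be needed.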

44 Bistable Switching Cell
[Figure: Gate 1 and Gate 2 between VDD and GND with an activation signal (Act) and output separation]

45 Cell Duplication and Power Switch
- Possible for all types of cells (also flip-flops).
- The granularity of partitioning for replacements (single gates, blocks) can be selected on demand.
- Can be combined favorably with dynamic circuit optimization.
- Good coverage potential for transistor faults.
- Significant overhead (above 100 %), but most likely below Triple Modular Redundancy (TMR).
- Redundancy may become exhausted and then requires a further level of redundancy!

46 Gate Replacement
[Figure: a row of standard cells (gates) with a gate fault and a backup cell; insertion of the replacement cell]

47 Regular Logic Wiring
[Figure: a block of logic gates with feed and drive connections, a config element, a backup cell, and links to the next cells]

48 Faults on Irregular Interconnects
[Figure: routing tree from signal source S to nodes labeled C, with a single fault (line break)]

49 Redundant Wiring
[Figure: routing tree with loops from signal source S to nodes labeled C; a single fault (line break) is bypassed by an extra wire, plus double vias]
Problem: classic delay calculation works well on trees only!

50 4. Bus Structures and Networks on Chip (NoCs)
Technology forecasts predict that nano-wires may become the most vulnerable and unreliable circuit elements...

51 Buses versus NoCs
[Figure: an irregular bus structure (SoC) connecting several bus masters, next to a regular network structure (NoC) built from NoC nodes]

52 Faults on Bus Structures
[Figure: bus masters BM 1 to BM 6 on a shared bus; a local defect affects the total network]

53 Bus Fault Conditions
- A single permanent fault on a bus may affect the bus as a whole.
- Fault detection and compensation by methods developed for transient faults (Hamming code, ECC checks) can handle static faults, but are relatively expensive.
- Capabilities of handling transient faults on top of permanent faults are limited.
- Technology forecasts predict a reliability problem with interconnects (nano-wires) in nano-technologies.
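The limit named in the third point can be made concrete with a textbook Hamming(7,4) code. This is an illustrative sketch, not the coding used in the talk: a 7-wire bus is assumed to carry one codeword per transfer, and the helper `transmit` with its parameters is hypothetical. A single stuck-at wire is corrected, but one extra transient flip on top of it exceeds the single-error correction capability.

```python
def encode(d):
    """Hamming(7,4): data bits d1..d4 -> codeword positions 1..7."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(c):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4  # 1-based position of the error, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def transmit(code, stuck_wire=None, stuck_value=0, flipped_wire=None):
    """Hypothetical bus model: one permanent stuck-at wire, one transient flip."""
    out = list(code)
    if stuck_wire is not None:
        out[stuck_wire] = stuck_value
    if flipped_wire is not None:
        out[flipped_wire] ^= 1
    return out


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    word = encode(data)
    print(decode(transmit(word, stuck_wire=2)) == data)                  # permanent fault only
    print(decode(transmit(word, stuck_wire=2, flipped_wire=5)) == data)  # plus a transient
```

This is the motivation for repairing the permanent fault by reconfiguration, so that the code's correction capability stays available for transients.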

54 Bus Segmentation
[Figure: bus masters BM 1 to BM 6 on a bus divided by segment couplers (SC)]
Structure the bus into segments that can be repaired individually!

55 The Switching Problem
[Figure: n lines mapped onto n + k wires through p columns of switches; labels: switches, control states, backup]

56 Faults and Repair Actions
1. Line break: a section of a line is interrupted. Use a spare wire!
2. Line short to GND: a section of a line is connected to GND. Use a spare wire!
3. Dynamic coupling between adjacent lines:
   a. Re-allocate lines in the bundle
   b. Insert a grounded line for decoupling
4. Bridge between lines:
   a. Feed both lines with the same signal
   b. Make one line floating

57 Single Line Replacement
[Figure: signal lines (s0, s1, s2, s3, s4, ..., k-1) switched onto bus wires (b0, b1, b2, ...) so that a backup wire takes over around a fault]
Overhead:
- 2k switches and (k+1) logic states for 1 backup line
- 2pk switches and p(k+1) logic states for p backup lines
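The overhead formulas above are easy to tabulate. The following sketch just evaluates them; the function name `replacement_overhead` is made up, and k is taken to be the number of signal lines in the bundle, as the figure labels suggest.

```python
def replacement_overhead(k: int, p: int = 1) -> tuple[int, int]:
    """Return (switches, logic states) for k signal lines and p backup lines.

    Uses the figures from the slide: 2pk switches and p(k+1) logic states.
    """
    if k < 1 or p < 1:
        raise ValueError("k and p must be positive")
    return 2 * p * k, p * (k + 1)


if __name__ == "__main__":
    for k in (6, 8, 16):
        for p in (1, 2):
            switches, states = replacement_overhead(k, p)
            print(f"k={k:2d} p={p}: {switches:3d} switches, {states:2d} states")
```

Both quantities grow linearly with the bundle width and with the number of backup lines, which is the cost the permutation-based scheme on the later slides tries to cut.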

58 Inserting Lines for Decoupling
[Figure: signal lines (s0, s1, s2, s3, s4, ..., k-1) shifted across bus wires (b0, b1, b2, ...) so that a backup wire separates two lines with a coupling fault]
Multiple line insertion for de-coupling requires multiple shifts of lines, multiple switches and states!

59 Repair Mechanisms
- Buses with extra backup lines that need specific configuration for repair generate high cost in terms of switches and administration, due to the many logic states of the bus section.
- Such repair schemes are not suited to re-organizing neighborhood relations on buses for de-coupling of lines.
Try to cover all relevant fault conditions by a small set of states using permutation of lines!

60 Reconfiguration for De-Coupling
… can help to minimize dynamic coupling faults!
[Figure: lines i and k exchanged by switches]
2-way switches may be used!

61 Characteristics of 6 / 8 Wire Bundles
Given a bundle of 6 or 8 bus lines: Are there any permutations that create all-new neighbors for every single line in order to eliminate coupling faults?
[Table: new-neighborhood permutations for 6 lines (NNP6) and for 8 lines (NNP81, NNP82, ...)]
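The question posed on this slide can be checked by brute force. The sketch below assumes a linear bundle in which only physically adjacent wires count as neighbors (no wrap-around) and enumerates every ordering in which no two originally adjacent lines end up next to each other. The function name `new_neighbor_permutations` is made up, and the orderings it finds are not necessarily the NNP permutations chosen in the talk.

```python
from itertools import permutations


def new_neighbor_permutations(n: int) -> list[tuple[int, ...]]:
    """All orderings of lines 0..n-1 in which every line gets all-new neighbors.

    Two lines are original neighbors if their indices differ by 1, so an
    ordering qualifies if no adjacent pair in it differs by exactly 1.
    """
    return [
        perm
        for perm in permutations(range(n))
        if all(abs(a - b) != 1 for a, b in zip(perm, perm[1:]))
    ]


if __name__ == "__main__":
    for n in (6, 8):
        found = new_neighbor_permutations(n)
        print(n, "lines:", len(found), "orderings, e.g.", found[0])
```

Such orderings exist for both bundle sizes (90 for 6 lines and 5242 for 8 lines under these assumptions), so the practical question is which few of them can be realized with a small number of switch columns.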

62 6 Wires: Permutations and Replacement
[Table: input wire mapping through the 1st, 2nd and 3rd switching columns (permutations), the lines by which each line can be replaced (2 switching columns), and the lines selected for backup]
Administration:
- 4 logic states for 2 switching columns (2 extra wires)
- 6 logic states for 3 switching columns (1 extra wire)

63 Selection of Permutations
- All single faults must be repairable by selecting a minimum set of permutations.
- Those lines that can act as replacement for most of the others are selected as backup lines.
- No permutation used for repair may map a functional line to a faulty line.
- By permutation, non-faulty functional lines are re-arranged as well.

64 Permutations for 8-Wire-Bundles
[Table: pair-wise symmetrical permutations (PW1, PW2, ...) and new-neighborhood permutations (NNP1, ..., NNP3)]

65 8 Wires: Permutations and Replacement
[Table: permutations and the selected backup wires]
2 lines selected for backup!

66 8 Wires: Permutations and Replacement
[Table: permutations and the selected backup wires]
4 lines selected for backup!

67 Overhead / Coverage for 6-Line-Bundle
[Table: coverage of single line faults, dynamic coupling faults and double line faults versus spare lines / switches. Surviving row labels: 0 / 12, 1 / 36, 2 / ...; the coverage percentages are not preserved in the transcript]

68 Overhead / Coverage for 8-Line-Bundle
[Table: coverage of single line faults, dynamic coupling faults and double line faults versus spare lines (out of 8) / switches. Surviving row labels: 0 / 16, 1 / 48, 2 / ..., 3 / ..., 4 / 32; the coverage percentages are not preserved in the transcript]
Note: The number of switches is reduced by a factor of 2 if full 2-way switches with 2 inputs / 2 outputs are used!

69 Results
- Bus segments can favorably be organized into bundles of 8 lines for reconfiguration. Wider bundles require even more columns of switches.
- In a bundle of 8 lines, all single faults can be repaired either by one backup line and 3 columns of switches (6 logic states) or by two backup lines and 2 columns (4 logic states).
- Two columns with 4 states also allow for two alternative modes of changing neighborhood relations for de-coupling. This setup also covers a fraction of double-line faults.
- Full coverage of double-line faults requires 4 backup lines and 2 columns of switches, or 2 backup lines and 4 columns.

70 Administration Scheme
[Figure: segment coupler (SC) with in / out lines, two switch blocks (labels A, B, C1, C2) with matching config logic, and decoded config bits]

71 Processor-Based Bus Test
[Figure: a test processor and several bus masters on a bus with clock, reflector select, invert control and data lines, ending in a bus reflector]

72 Test and Fault Diagnosis
[Figure: test processor, bus master (BM) and a network of segment couplers (SC), with a segment status list kept by the test processor]

73 Upcoming: Test Procedure & Fault Management
- The test processor can reset the control of bus sections.
- The test processor runs a diagnostic test to identify faulty lines.
- In case of faults, a trial-and-error test identifies the faulty line segment(s).
- The test processor keeps a fault list for redundancy management & supervision.
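One way the trial-and-error step could look is sketched below. It rests on assumptions the slides do not spell out: the test processor can limit a line test to the first n segments (for example through the segment couplers and the bus reflector), and a segment can be switched to a spare wire individually. The names `diagnose_line` and `passes_up_to` are hypothetical.

```python
from typing import Callable, List, Set


def diagnose_line(num_segments: int,
                  passes_up_to: Callable[[int, Set[int]], bool]) -> List[int]:
    """Extend the tested reach segment by segment and repair on failure.

    passes_up_to(n, repaired) is an assumed test hook: it reports whether the
    line works across the first n segments when the segments in `repaired`
    are switched to their spare wire. The returned list is the fault list.
    """
    repaired: Set[int] = set()
    for n in range(1, num_segments + 1):
        if not passes_up_to(n, repaired):
            repaired.add(n - 1)  # the newly included segment must be the culprit
            if not passes_up_to(n, repaired):
                raise RuntimeError(f"segment {n - 1} cannot be repaired")
    return sorted(repaired)


if __name__ == "__main__":
    faulty = {2, 5}  # simulated defects, unknown to the test routine

    def simulated_test(n: int, repaired: Set[int]) -> bool:
        return all(s in repaired or s not in faulty for s in range(n))

    print(diagnose_line(8, simulated_test))
```

The returned list plays the role of the fault list that the test processor keeps for redundancy management.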

74 Summary
- A simple scheme of re-arranging bus sections for repair of permanent faults.
- Simple control scheme based on few logic states.
- The number and the electrical effect of switches in complex bus systems may still cause problems.
- The modular approach based on bundles of lines is scalable to cover wider buses. It should work well with NoCs.
- Compatible with regular schemes for bus test based on a dedicated test processor device.

75 5. Diagnostic Tests
Fault diagnosis by diagnostic (self-) test is possibly the real bottleneck in logic BISR!

76 Fault Diagnosis
- Memory cells are easy to diagnose in the case of faults affecting single cells. BIST is possible.
- Diagnostic tests of buses that have to discover a single faulty line are straightforward. They can easily find which wires are affected, but not where the fault is.
- Detecting a faulty gate or even a faulty transistor in a logic block is a much more challenging problem.
- Diagnosis must be compatible with the methods of test response compaction used in scan testing.
Intelligent encoding for test responses!... such as done by U. Potsdam and Infineon!

77 Combinational Logic Fault Diagnosis
[Figure: combinational logic between input FFs and output FFs]
Faults can occur within specific gates, on interconnects, or in a distributed manner. Identifying a specific faulty gate or line by logic testing is difficult at best and sometimes close to impossible.

78 Logic Test
[Figure: combinational logic with (pseudo-) inputs and (pseudo-) outputs, an input vector and an output vector]

79 Scan Path Technology
[Figure: combinational logic with (pseudo-) inputs and (pseudo-) outputs, input and output vectors, and flip-flops (ff) chained between scan-in and scan-out]

80 Scan-based Logic Test
[Figure: compacted / encoded test information enters a de-compactor, passes through scan chains around blocks of combinational logic (CL), and ends in a test response compactor; diagnosis relies on coding]

81 Fault Diagnosis on Compacted Output Data
[Figure: scan input generator (de-compactor) feeding scan chains; the outputs pass AND gates controlled by stored d-values (d0 to d6) into a MISR, which is compared with a reference MISR; scan clock, with MISR clock = k * scan clock]
* patented, U. Potsdam and Infineon Technologies AG
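For readers unfamiliar with the compaction step, a plain multiple-input signature register (MISR) can be sketched as follows. This is a generic textbook MISR and not the patented diagnosis scheme on the slide: the d-value masking and the separate MISR clock are left out, and the register width, the feedback polynomial (0x1D, i.e. x^8 + x^4 + x^3 + x^2 + 1) and the response words are arbitrary choices for illustration.

```python
def misr_signature(responses, width: int = 8, poly: int = 0x1D, seed: int = 0) -> int:
    """Compact a sequence of parallel test responses into one signature.

    Each clock the register shifts left, applies the feedback polynomial if
    the top bit was set, and XORs in the next response word.
    """
    mask = (1 << width) - 1
    state = seed & mask
    for word in responses:
        msb = (state >> (width - 1)) & 1
        state = (state << 1) & mask
        if msb:
            state ^= poly
        state ^= word & mask
    return state


if __name__ == "__main__":
    good = [0x3A, 0x7F, 0x00, 0xC4, 0x15]
    bad = list(good)
    bad[2] ^= 0x08  # one faulty response bit
    reference = misr_signature(good)
    print("fault detected:", misr_signature(bad) != reference)
```

A mismatch with the reference signature shows that some response was wrong, but not which one, which is why diagnosis on compacted data needs additional encoding such as the scheme referenced on the slide.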

82 6. A Lot of Work to Do
- Logic fault diagnosis
- Efficient logic self repair
- Redundancy supervision and management
- Resource management under fault conditions
- Repair functions for interconnects
- Overall system-level fault management

