Presentation on the topic: "Problems, Solutions, Limitations" (presentation transcript)
1 Problems, Solutions, Limitations
Transient and Permanent Faults in Nanoelectronic ICs: Compensation and Repair
H. T. Vierhaus, BTU Cottbus, Computer Engineering
2 Outline
1. Introduction: Nanostructure Problems
2. Transient Faults
3. Repair of Permanent Faults
4. Bus Structures and NoCs
5. Diagnostic Test
6. A Lot of Things to Do ...
3 1. Introduction
A bunch of new problems from nanostructures ...
4 Nanoelectronic Problems
Lithography: The wavelength used to "map" structural information from masks to wafers is larger (4 times or more) than the minimum structural features (193 nm versus 90 / 65 / 45 nm). Layouts must be adapted to correct mapping faults.
Parameter variations: The number of atoms in MOS transistor channels becomes so small that statistical variations of doping densities have an impact on device parameters such as threshold voltages.
5 Doping Fluctuations in MOS Transistors
[Figure: MOS transistor cross-section; labels: substrate, n, poly-Si, doping atom]
Density and distribution of doping atoms cause shifts in transistor threshold voltages!
6 Nanostructure Problems
- Individual device characteristics such as Vth depend more strongly on statistical variations of underlying physical features such as doping profiles.
- A significant share of basic devices will be "out of specs" and must be replaced by backup elements to improve yield after production.
- As smaller features mean higher stress (field strength, current density), early failures "in the field" are also more likely and must be compensated.
- Transient error recognition and compensation "in time" is becoming a must, e.g. because charged particles can discharge circuit nodes.
7 Key Technologies
Fault-tolerant computing:
- Is required to handle intermittent and transient fault effects, e.g. induced by radiation.
- An old technology that is already heavily used in everyday computing (e.g. memory interfaces with ECC check and correction).
- Can handle only a limited number of permanent faults!
Built-in self-test (BIST) and self-repair (BISR):
- Is required to handle permanent faults by self-repair using redundant elements.
- State of the art for memories, not for logic.
- Can handle multiple faults (sequentially) until the resource of redundancy is exhausted.
Algorithms that are fully or partially "fault hard":
- Most DSP algorithms show an inherent "stability" and work even under fault conditions with reduced precision. The effect can be "HW-enhanced".
8 System-on-a-Chip (SoC)
SoCs are heterogeneous systems that require test & repair strategies for:
- logic (also in processors)
- memory blocks
- interconnects
- analog and D/A components
12 Contribution to Soft-Error Rates
- Static combinational logic: (value missing) %
- Sequential elements (FFs, latches): 49 %
- Unprotected SRAM: (value missing) %
Source: S. Mitra, N. Seifert, M. Zhang, Q. Shi, K. S. Kim, "Robust System Design with Built-In Soft Error Resilience", IEEE Computer, Vol. 38, No. 2, Feb. 2005, pp. (page range missing)
13 Spikes and Clock Rates in Logic
[Figure: a pulse of 100 ps shown against two clock signals; in one case charge / status restoration is possible, in the other it is impossible]
Fault probability in digital logic is roughly proportional to clock frequency!
14 Logic Structures and Fault Events
[Figure: particle radiation hitting the logic between input FFs and output FFs]
Flip-flops need fault tolerance / fault hardening in the first place; logic close to the outputs comes next.
16 Fault-Tolerant Latch Design
[Figure labels: in, Latch1, Latch2, outl1, outl2, Muller C-element, out, CL; timing diagram of clock and v(t) with phases "outl1 = in, outl2 = in" and "outl1, outl2 latched"]
If clock is high: out = in
17 Fault Handling
Muller C-element:
- If both inputs are equal: out = outl1 = outl2.
- If the inputs are not equal: out = previous (outl1, outl2).
Under local fault conditions on the latch outputs (one of the 2 latches false), the C-element preserves the output condition from the "charge" phase of the latch.
Essentially 3 latches!
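The hold behaviour described on this slide can be sketched in a few lines. This is a minimal behavioural model, not a circuit-level description; the class name `MullerC` and its `step` method are hypothetical names chosen for illustration.

```python
class MullerC:
    """Behavioural model of a two-input Muller C-element.

    The output follows the inputs when they agree and keeps its
    previous value when they disagree.
    """

    def __init__(self, initial=0):
        self.out = initial

    def step(self, outl1, outl2):
        if outl1 == outl2:
            self.out = outl1  # both latch outputs agree: pass the value on
        # otherwise: one latch is disturbed, hold the previous output
        return self.out


c = MullerC()
c.step(1, 1)  # "charge" phase: both latches hold 1, output becomes 1
c.step(1, 0)  # latch 2 is upset by a transient: output is still 1
```

The model only masks a fault on one of the two latches at a time; if both latch outputs flip to the same wrong value, the wrong value is passed on.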
19 Intel's Scan Path Element plus Fault Compensation
20 TMR-Latch / Flip-Flop
[Figure labels: in, FF1, FF2, FF3, XOR, cout, MUX, clock; Out = L1out with cout = 1, Out = L2out with cout = 0]
- Works with latches or flip-flops.
- Can compensate static or dynamic faults in latches / FFs!
- FF1 is untestable (active redundancy).
24 Fault Compensation in Combinational Logic
[Figure labels: particle radiation, input FFs, three MCD elements]
25 Fault Compensation in Combinational Logic
[Figure: three V(t) waveforms: fault-free signal; signal with glitch (at latch close); signal with delayed glitch (time left to capture!). Labels: MC capture, MC no capture / hold, MC capture]
26 3. Repair of Permanent Faults
Compensation of transient faults is not enough. Some technologies for transient compensation can handle permanent faults, too, but not in the long run and not together with additional transient faults!
32 Replacement in Regular Structures (e.g. for DSP)
33 Parallel Backup Transistors
[Figure: two gate schematics between VDD and GND with inputs in1, in2 and output out: a basic gate, and a gate with redundant transistors]
34 Redundancy by "Active" Parallel Transistors
- Active redundancy is not testable. Therefore there is no way to monitor the status of "available" redundancy in a logic circuit.
- Parallel transistors cannot compensate a fault of the "stuck-on" type (transistor always conducting).
- Faulty "backup" transistors may produce additional faults that cannot be corrected!
Adding redundancy is not enough; fault isolation is a real problem!
35 Configuration and Fault Isolation
[Figure: gate between VDD and GND with inputs in1, in2 and output out, showing a stuck-on fault]
36 The Gate-Short Problem
[Figure: a driver feeding Load1 and Load2, with a gate short]
GND shorts of input gates affect the whole fan-in network and make redundancy obsolete!!
43 Features
Use "normal" cell designs.
Four states of operation:
- Config. 1: Gate 1 active, Gate 2 isolated
- Config. 2: Gate 2 active, Gate 1 isolated
- Config. 3: Both gates active, operating in parallel
- Config. 4: Both gates isolated from VDD / GND
Operations like "high / low power" are possible.
Cells can be put to temporary "sleep" for stress relief.
Permanent repair functions.
The active cell output is connected only to "floating" outputs of the other cell.
If twin tubs are used and cell-internal tubs are also disconnected, gate input / GND shorts are prohibited.
45 Cell Duplication and Power Switch
- Possible for all types of cells (also flip-flops).
- Granularity of partitioning for replacements (single gates, blocks) can be selected upon demand.
- Combines favorably with dynamic circuit optimization.
- Good coverage potential for transistor faults.
- Significant overhead (above 100 %), but most likely below Triple Modular Redundancy (TMR).
- Redundancy may become exhausted and then requires a further level of redundancy!
47 Regular Logic Wiring
[Figure labels: logic gates, config block, drive, feed, link, next cell, backup cell]
48 Faults on Irregular Interconnects
[Figure: routing tree from signal source S to sinks C, with a single fault (line break)]
49 Redundant Wiring
[Figure: routing tree with loops from signal source S to sinks C, with an extra wire and a single fault (line break)]
... plus double vias!
Problem: classic delay calculation works well on trees only!
50 4. Bus Structures and "Networks on Chip" (NoCs)
Technology forecasts predict that nano-wires may become the most vulnerable and unreliable circuit elements ...
51 Buses versus NoCs
[Figure: an irregular bus structure (SoC) connecting bus masters, next to a regular network structure (NoC) of NoC nodes]
52 Faults on Bus Structures
[Figure: bus connecting bus masters BM1 to BM6]
Local defect affecting the total network
53 Bus Fault Conditions
- Technology forecasts predict a reliability problem with interconnects (nano-wires) in nano-technologies.
- A single permanent fault on a bus may affect the bus as a whole.
- Fault detection and compensation by methods developed for transient faults (Hamming code, ECC checks) can handle static faults, but are relatively expensive.
- Capabilities of handling transient faults on top of permanent faults are limited.
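The last point can be illustrated with a single-error-correcting code. The sketch below assumes a 7-line bus protected by a textbook Hamming(7,4) code, which is an illustrative choice and not a scheme taken from the slides; the function names `encode`, `decode` and `transmit` are hypothetical.

```python
def encode(data):
    """Encode 4 data bits into a Hamming(7,4) codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(word):
    """Correct at most one wrong bit and return the 4 data bits."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    position = s1 + 2 * s2 + 4 * s4  # 0 means "no error seen"
    if position:
        w[position - 1] ^= 1
    return [w[2], w[4], w[5], w[6]]


def transmit(word, stuck_at0=None, flip=None):
    """Model a bus transfer with an optional transient flip and stuck-at-0 line."""
    w = list(word)
    if flip is not None:
        w[flip] ^= 1        # transient fault on one line
    if stuck_at0 is not None:
        w[stuck_at0] = 0    # permanent fault on one line
    return w


data = [1, 0, 1, 1]
codeword = encode(data)
# Permanent fault alone: the code corrects it.
print(decode(transmit(codeword, stuck_at0=2)) == data)          # True
# Permanent fault plus one transient: the correction budget is exceeded.
print(decode(transmit(codeword, stuck_at0=2, flip=5)) == data)  # False
```

In this model the permanent fault uses up the single-error correction capability whenever the stuck line carries the wrong value, so an additional transient error on another line is miscorrected.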
54 Bus Segmentation
[Figure: bus masters BM1 to BM6 attached to a bus that is divided by segment couplers (SC)]
Structure the bus into segments that can be repaired individually!
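55 The Switching Problem
[Figure labels: n, n+k, n, k, p]
[Table; the column layout was lost in the transcript. Headers as extracted: switches, contr. states, n, backup, 1, p. Values as extracted: 81116916113233322212865]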
56 Faults and Repair Actions
1. Line break: a section of a line is interrupted. Use a spare wire!
2. Line short to GND: a section of a line is connected to GND. Use a spare wire!
3. Dynamic coupling between adjacent lines:
   a. Re-allocate lines in the bundle
   b. Insert a grounded line for decoupling
4. Bridge between lines:
   a. Feed both lines with the same signal
   b. Make one line "floating"
57 Single Line Replacement
[Figure: signal lines s0, s1, s2, s3, s4, ..., s(k-1) with a fault, and backup lines b0, b1, b2]
Overhead:
- 2k switches, (k+1) logic states for 1 backup line
- 2pk switches, p(k+1) logic states for p backup lines
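The overhead formulas on this slide can be evaluated directly. The following is a small sketch; the function name `backup_overhead` is hypothetical, and k is taken to be the number of signal lines in the section (s0 to s(k-1)), as the slide's labels suggest.

```python
def backup_overhead(k, p=1):
    """Overhead of p backup lines for a section of k signal lines.

    Uses the formulas stated on the slide:
    2*p*k switches and p*(k+1) logic states.
    """
    return {"switches": 2 * p * k, "states": p * (k + 1)}


print(backup_overhead(8))       # {'switches': 16, 'states': 9}
print(backup_overhead(8, p=2))  # {'switches': 32, 'states': 18}
```

Both quantities grow linearly with the number of backup lines.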
58 Inserting Lines for Decoupling
[Figure: signal lines s0, s1, s2, s3, s4, ..., s(k-1) with a fault, and backup lines b0, b1, b2]
Multiple line insertion for decoupling requires multiple shifts of lines, multiple switches and states!
59 Repair Mechanisms
- Buses with "extra" backup lines that need specific configuration for repair generate high cost in terms of switches and administration, due to the many "logic states" of the bus section.
- Such repair schemes are not suited to re-organizing neighborhood relations on buses for decoupling of lines.
- Try to cover all relevant fault conditions by a small set of states using permutation of lines!
60 Reconfiguration for De-Coupling
[Figure: 2-way switches exchanging lines i and k]
2-way switches may be used!
... can help to minimize dynamic coupling faults!
61 Characteristics of 6 / 8 Wire Bundles
Given a bundle of 6 or 8 bus lines: are there any permutations that create all-new neighbors for every single line in order to eliminate coupling faults?
6 lines   8 lines
NNP6      NNP81     NNP82     NNP83
0 - 2     0 - 2     0 - 3     0 - 5
1 - 4     1 - 6     1 - 5     1 - 7
2 - 0     2 - 0     2 - 7     2 - 4
3 - 5     3 - 5     3 - 0     3 - 6
4 - 1     4 - 7     4 - 6     4 - 2
5 - 3     5 - 3     5 - 1     5 - 0
          6 - 1     6 - 4     6 - 3
          7 - 4     7 - 2     7 - 1
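The all-new-neighbors property of these permutations can be checked mechanically. The sketch below is a minimal check under two assumptions: an entry "i - j" means that line i is routed to position j, and the first 8-line column is named NNP81 (the transcript shows only "NNP" for it). The helper names are hypothetical.

```python
NNP6 = {0: 2, 1: 4, 2: 0, 3: 5, 4: 1, 5: 3}
NNP81 = {0: 2, 1: 6, 2: 0, 3: 5, 4: 7, 5: 3, 6: 1, 7: 4}
NNP82 = {0: 3, 1: 5, 2: 7, 3: 0, 4: 6, 5: 1, 6: 4, 7: 2}
NNP83 = {0: 5, 1: 7, 2: 4, 3: 6, 4: 2, 5: 0, 6: 3, 7: 1}


def neighbours(order):
    """Unordered pairs of lines that sit next to each other in `order`."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def all_new_neighbours(perm):
    """True if no two originally adjacent lines are adjacent after `perm`."""
    n = len(perm)
    permuted = [None] * n
    for line, position in perm.items():
        permuted[position] = line  # which line ends up at each position
    return neighbours(list(range(n))).isdisjoint(neighbours(permuted))


def is_pairwise_swap(perm):
    """True if the permutation only exchanges pairs of lines."""
    return all(perm[perm[i]] == i and perm[i] != i for i in perm)


for name, perm in [("NNP6", NNP6), ("NNP81", NNP81),
                   ("NNP82", NNP82), ("NNP83", NNP83)]:
    print(name, all_new_neighbours(perm), is_pairwise_swap(perm))
```

All four permutations pass both checks. Each one only exchanges pairs of lines, so the two possible readings of "i - j" (line to position, or position to line) give the same result.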
63 Selection of Permutations
- All single faults must be repairable by selecting a minimum set of permutations.
- Those lines that can act as replacement for most of the others are selected as "backup lines".
- By permutation, non-faulty functional lines are also re-arranged.
- No permutation used for repair must map a functional line to a faulty line.
68 Overhead / Coverage for 8-Line Bundle
Cell values that were lost in the transcript are shown as "?".
Spare lines (out of 8) / switches    0 / ?    ? / ?    ? / 32    3 / 32    4 / 32
Single line fault                    -        +        +         +         +
Dyn. coupling fault                  +++ +++++ (column split lost)
Double line faults                   -        -        20 %      30 %      100 %
Note: The number of switches is reduced by a factor of 2 if full 2-way switches with 2 inputs / 2 outputs are used!
69 Results
- Bus segments can favorably be organized into bundles of 8 lines for reconfiguration. Wider bundles require even more columns of switches.
- In a bundle of 8 lines, all single faults can be repaired either by one backup line and 3 columns of switches or by two backup lines and 2 columns, with 6 / 4 logic states.
- Two columns with 4 states also allow for two alternative modes of changing neighborhood relations for decoupling. This also covers a fraction of double-line faults.
- Full coverage of double-line faults requires 4 backup lines and 2 columns of switches, or 2 backup lines and 4 columns.
71 Processor-Based Bus Test
[Figure labels: test processor, bus masters, bus, data lines, reflector, reflector select, invert, control, clock]
72 Test and Fault Diagnosis
[Figure: bus masters (BM) connected through segment couplers (SC); a test processor keeps a segment status list]
73 Test Procedure & Fault Management
- The test processor can "reset" control of bus sections.
- The test processor runs a diagnostic test to identify faulty lines.
- In case of faults, a "trial and error" test identifies the faulty line segment(s).
- The test processor keeps a "fault list" for redundancy management & supervision.
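The diagnostic step and the fault list can be sketched as follows. This is a toy model under stated assumptions: each bus segment is represented by a round-trip function (test processor to reflector and back), only stuck-at faults are modelled, and all names (`make_segment`, `find_faulty_lines`, `diagnose`) are hypothetical and do not come from the slides.

```python
def make_segment(stuck=None):
    """Model one bus segment as a round trip via a reflector.

    `stuck` maps a line index to the value (0 or 1) it is stuck at.
    """
    stuck = stuck or {}

    def round_trip(pattern):
        return [stuck.get(i, bit) for i, bit in enumerate(pattern)]

    return round_trip


def find_faulty_lines(round_trip, width):
    """Drive all-zeros and all-ones; lines that do not follow are faulty."""
    faulty = set()
    for value in (0, 1):
        sent = [value] * width
        echoed = round_trip(sent)
        faulty |= {i for i in range(width) if echoed[i] != sent[i]}
    return sorted(faulty)


def diagnose(segments, width):
    """Test segment by segment and return the fault list."""
    fault_list = {}
    for name, round_trip in segments.items():
        lines = find_faulty_lines(round_trip, width)
        if lines:
            fault_list[name] = lines
    return fault_list


segments = {
    "seg0": make_segment(),
    "seg1": make_segment(stuck={3: 0, 6: 1}),
}
print(diagnose(segments, width=8))  # {'seg1': [3, 6]}
```

Two patterns are enough to find stuck-at lines in this model; bridges and coupling faults would need additional patterns, such as a walking one, which are not shown here.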
74 Summary
- A simple scheme of re-arranging bus sections for repair of permanent faults.
- Simple control scheme based on few logic states.
- Modular approach based on bundles of lines is scalable to cover wider buses. Should work well with NoCs.
- Compatibility with regular schemes for bus test based on a dedicated test processor device.
- The number and the electrical effect of switches in complex bus systems may still cause problems.
75 5. Diagnostic Tests
Fault diagnosis by diagnostic (self-) test is possibly the real bottleneck in logic BISR!
76 Fault Diagnosis
- Memory cells are easy to diagnose in case of faults affecting single cells. BIST is possible.
- Diagnostic tests of buses that have to discover a single faulty line are straightforward. They can easily find which wires are affected, but not where the fault is.
- Detecting a faulty gate or even transistor in a logic block is a much more challenging problem. Diagnosis must be compatible with methods of test response compaction used in scan testing.
Intelligent encoding for test responses ... such as done by U. Potsdam and Infineon!
77 Combinational Logic Fault Diagnosis
[Figure: combinational logic between input FFs and output FFs]
Faults can occur within specific gates, on interconnects, or in a "distributed" manner. Identifying a specific faulty gate or line is difficult at best and sometimes close to impossible by logic testing.
80 Test response compactor
[Figure: scan-based logic test. Labels: compacted / encoded test information, de-compactor, CL, CL, coding, test response compactor, diagnosis]
81 Fault Diagnosis on Compacted Output Data*
[Figure labels: scan input, generator (de-compactor), scan clock, d-values d (indexed 1 to 6), storage, AND gates (&), MISR, reference MISR, compare; MISR clock: k * scan clock]
* Patented, U. Potsdam and Infineon Technologies AG
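For readers unfamiliar with the MISR (multiple-input signature register) that appears in the figure, the sketch below shows generic signature compaction and comparison against a reference. It is a textbook-style MISR and not the patented U. Potsdam / Infineon scheme; the register width, the feedback taps and the function name `misr_signature` are arbitrary illustrative assumptions.

```python
def misr_signature(responses, width=8, taps=(0, 2, 3, 4), state=0):
    """Compact a sequence of `width`-bit response words into one signature."""
    mask = (1 << width) - 1
    for word in responses:
        feedback = (state >> (width - 1)) & 1  # bit that is shifted out
        state = (state << 1) & mask            # shift the register by one
        if feedback:
            for t in taps:
                state ^= 1 << t                # apply the feedback taps
        state ^= word & mask                   # inject the parallel inputs
    return state


reference = [0x3A, 0x7F, 0x00, 0xC5]  # fault-free responses (made-up values)
observed = list(reference)
observed[-1] ^= 0x04                  # one wrong bit in the last response

# A mismatch between the two signatures flags the fault.
print(misr_signature(observed) != misr_signature(reference))  # True
```

A plain signature comparison like this only says that some response was wrong. Locating the failing pattern or scan cell needs extra information, which is the motivation for the encoding of test responses mentioned on the previous slides.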
82 6. A Lot of Work to Do
- Logic fault diagnosis
- Efficient logic self-repair
- Redundancy supervision and management
- Resource management under fault conditions
- Repair functions for interconnects
- Overall system-level fault management