<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="9788770227902.xsl"?>
<book id="home" xmlns:xlink="http://www.w3.org/1999/xlink">
<bookinfo>
<title>Industrial Artificial Intelligence Technologies and Applications</title>
<affiliation><emphasis role="strong">Editors</emphasis></affiliation>
<authorgroup>
<author>
<firstname>Ovidiu</firstname>
<surname>Vermesan</surname>
</author>
<author>
<firstname>Franz</firstname>
<surname>Wotawa</surname>
</author>
<author>
<firstname>Mario</firstname>
<surname>Diaz Nava</surname>
</author>
<author>
<firstname>Bj&#xf6;rn</firstname>
<surname>Debaillie</surname>
</author>
</authorgroup>
<affiliation>SINTEF, Norway</affiliation>
<affiliation>TU Graz, Austria</affiliation>
<affiliation>STMicroelectronics, France</affiliation>
<affiliation>imec, Belgium</affiliation>
<publisher>
<publishername>River Publishers</publishername>
</publisher>
<isbn>9788770227902</isbn>
</bookinfo>
<preface class="preface" id="preface01">
<title>River Publishers Series in Communications and Networking</title>
<para><emphasis>Series Editors</emphasis></para>
<para><emphasis role="strong">ABBAS JAMALIPOUR</emphasis><?lb?><emphasis>The University of Sydney, Australia</emphasis></para>
<para><emphasis role="strong">MARINA RUGGIERI</emphasis><?lb?><emphasis>University of Rome Tor Vergata, Italy</emphasis></para>
<para>The &#x201c;River Publishers Series in Communications and Networking&#x201d; is a series of comprehensive academic and professional books that focus on communication and network systems. Topics range from the theory and use of systems involving all terminals, computers, and information processors to wired and wireless networks and network layouts, protocols, architectures, and implementations. Also covered are developments stemming from new market demands in systems, products, and technologies such as personal communications services, multimedia systems, enterprise networks, and optical communications.</para>
<para>The series includes research monographs, edited volumes, handbooks, and textbooks, providing professionals, researchers, educators, and advanced students in the field with invaluable insight into the latest research and developments.</para>
<para>Topics covered in this series include:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>Communication theory</para></listitem>
<listitem><para>Multimedia systems</para></listitem>
<listitem><para>Network architecture</para></listitem>
<listitem><para>Optical communications</para></listitem>
<listitem><para>Personal communication services</para></listitem>
<listitem><para>Telecoms networks</para></listitem>
<listitem><para>Wi-Fi network protocols</para></listitem>
</itemizedlist>
<para>For a list of other books in this series, visit <ulink url="https://www.riverpublishers.com">www.riverpublishers.com</ulink></para>
</preface>
<preface class="preface" id="preface02">
<title>Dedication</title>
<para>&#x201c;Without change there is no innovation, creativity, or incentive for improvement. Those who initiate change will have a better opportunity to manage the change that is inevitable.&#x201d;</para>
<para>&#x2013;William Pollard</para>
<para>&#x201c;The brain is like a muscle. When it is in use we feel very good. Understanding is joyous.&#x201d;</para>
<para>&#x2013;Carl Sagan</para>
<para>&#x201c;By far, the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.&#x201d;</para>
<para>&#x2013;Eliezer Yudkowsky</para>
</preface>
<preface class="preface" id="preface03">
<title>Acknowledgement</title>
<para>The editors would like to thank all the contributors for their support in the planning and preparation of this book. The recommendations and opinions expressed in the book are those of the editors, authors, and contributors and do not necessarily represent those of any organizations, employers, or companies.</para>
<para>Ovidiu Vermesan</para>
<para>Franz Wotawa</para>
<para>Mario Diaz Nava</para>
<para>Bj&#xf6;rn Debaillie</para>
</preface>
<preface class="preface" id="preface04">
<title>Contents</title>
<table cellspacing="5" cellpadding="5" frame="none" rules="none">
<tbody>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="preface05">Preface</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="preface06">List of Figures</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="preface07">List of Tables</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="preface08">List of Contributors</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1">1 Benchmarking Neuromorphic Computing for Inference</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-1">1.1 Introduction</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-2">1.2 State of the art in Benchmarking</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-2-1">1.2.1 Machine Learning</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-2-2">1.2.2 Hardware</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-3">1.3 Guidelines</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-3-1">1.3.1 Fair and Unfair Benchmarking</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-3-2">1.3.2 Combined KPIs and Approaches for Benchmarking</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-3-3">1.3.3 Outlook: Use-case Based Benchmarking</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-4">1.4 Conclusion</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-Ref">References</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2">2 Benchmarking the Epiphany Processor as a Reference Neuromorphic Architecture</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-1">2.1 Introduction and Background</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-2">2.2 Comparison with a Few Well-Known Digital Neuromorphic Platforms</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-3">2.3 Major Challenges in Neuromorphic Architectures</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-3-1">2.3.1 Memory Allocation</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-3-2">2.3.2 Efficient Communication</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-3-3">2.3.3 Mapping SNN onto Hardware</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-3-4">2.3.4 On-chip Learning</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-3-5">2.3.5 Idle Power Consumption</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-4">2.4 Measurements from Epiphany</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-5">2.5 Conclusion</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-Ref">References</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3">3 Temporal Delta Layer: Exploiting Temporal Sparsity in Deep Neural Networks for Time-Series Data</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-1">3.1 Introduction</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-2">3.2 Related Works</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-3">3.3 Methodology</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-3-1">3.3.1 Delta Inference</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-3-2">3.3.2 Sparsity Induction Using Activation Quantization</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-3-2-1">3.3.2.1 Fixed Point Quantization</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-3-2-2">3.3.2.2 Learned Step-Size Quantization</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-3-3">3.3.3 Sparsity Penalty</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-4">3.4 Experiments and Results</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-4-1">3.4.1 Baseline</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-4-2">3.4.2 Experiments</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-4-3">3.4.3 Result Analysis</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-5">3.5 Conclusion</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-Ref">References</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4">4 An End-to-End AI-based Automated Process for Semiconductor Device Parameter Extraction</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-1">4.1 Introduction</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-2">4.2 Semantic Segmentation</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-2-1">4.2.1 Proof of Concept and Architecture Overview</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-2-2">4.2.2 Implementation Details and Result Overview</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-3">4.3 Parameter Extraction</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-4">4.4 Conclusion</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-5">4.5 Future Work</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-Ref">References</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch5">5 AI Machine Vision System for Wafer Defect Detection</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch5-1">5.1 Introduction and Background</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch5-2">5.2 Machine Vision-based System Description</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch5-3">5.3 Conclusion</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch5-Ref">References</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6">6 Failure Detection in Silicon Package</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-1">6.1 Introduction and Background</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-2">6.2 Dataset Description</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-2-1">6.2.1 Data Collection &amp; Labelling</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-3">6.3 Development and Deployment</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-4">6.4 Transfer Learning and Scalability</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-5">6.5 Result and Discussion</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-6">6.6 Conclusion and Outlooks</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-Ref">References</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7">7 S2ORC-SemiCause: Annotating and Analysing Causality in the Semiconductor Domain</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-1">7.1 Introduction</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-2">7.2 Dataset Creation</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-2-1">7.2.1 Corpus</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-2-2">7.2.2 Annotation Guideline</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-2-3">7.2.3 Annotation Methodology</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-2-4">7.2.4 Dataset Statistics</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-2-5">7.2.5 Causal Cue Phrases</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-3">7.3 Baseline Performance</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-3-1">7.3.1 Train-Test Split</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-3-2">7.3.2 Causal Argument Extraction</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-3-3">7.3.3 Error Analysis</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-4">7.4 Conclusions</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-Ref">References</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8">8 Feasibility of Wafer Exchange for European Edge AI Pilot Lines</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-1">8.1 Introduction</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-2">8.2 Technical Details and Comparison</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-2-1">8.2.1 Comparison TXRF and VPD-ICPMS Equipment for Surface Analysis</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-2-2">8.2.2 VPD-ICPMS Analyses on Bevel</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-3">8.3 Cross-Contamination Check-Investigation</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-3-1">8.3.1 Example for the Comparison of the Institutes</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-4">8.4 Conclusion</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-Ref">References</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9">9 A Framework for Integrating Automated Diagnosis into Simulation</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-1">9.1 Introduction</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-2">9.2 Model-based Diagnosis</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-3">9.3 Simulation and Diagnosis Framework</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-3-1">9.3.1 FMU Simulation Tool</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-3-2">9.3.2 ASP Diagnose Tool</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-4">9.4 Experiment</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-5">9.5 Conclusion</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-Ref">References</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10">10 Deploying a Convolutional Neural Network on Edge MCU and Neuromorphic Hardware Platforms</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-1">10.1 Introduction</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-2">10.2 Related Work</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-3">10.3 Methods</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-3-1">10.3.1 Neural Network Deployment</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-3-1-1">10.3.1.1 Task and Model</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-3-1-2">10.3.1.2 Experimental Setup</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-3-1-3">10.3.1.3 Deployment</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-3-2">10.3.2 Measuring the Ease of Deployment</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-4">10.4 Results</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-4-1">10.4.1 Inference Results</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-4-2">10.4.2 Perceived Effort</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-5">10.5 Conclusion</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-Ref">References</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11">11 Efficient Edge Deployment Demonstrated on YOLOv5 and Coral Edge TPU</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-1">11.1 Introduction</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-2">11.2 Related Work</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-3">11.3 Experimental Setup</link></emphasis></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-3-1">11.3.1 Google Coral Edge TPU</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-3-2">11.3.2 YOLOv5</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-4">11.4 Performance Considerations</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-4-1">11.4.1 Graph Optimization</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-4-1-1">11.4.1.1 Incompatible Operations</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-4-1-2">11.4.1.2 Tensor Transformations</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-4-2">11.4.2 Performance Evaluation</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-4-2-1">11.4.2.1 Speed-Accuracy Comparison</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-4-2-2">11.4.2.2 USB Speed Comparison</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-4-3">11.4.3 Deployment Pipeline</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-5">11.5 Conclusion and Future Work</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-Ref">References</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12">12 Embedded Edge Intelligent Processing for End-To-End Predictive Maintenance in Industrial Applications</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-1">12.1 Introduction and Background</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-2">12.2 Machine and Deep Learning for Embedded Edge Predictive Maintenance</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-3">12.3 Approaches for Predictive Maintenance</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-3-1">12.3.1 Hardware and Software Platforms</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-3-2">12.3.2 Motor Classification Use Case</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-4">12.4 Experimental Setup</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-4-1">12.4.1 Signal Data Acquisition and Pre-processing</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-4-2">12.4.2 Feature Extraction, ML/DL Model Selection and Training</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-4-3">12.4.3 Optimisation and Tuning Performance</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-4-4">12.4.4 Testing</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-4-5">12.4.5 Deployment</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-4-6">12.4.6 Inference</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-5">12.5 Discussion and Future Work</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-Ref">References</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13">13 AI-Driven Strategies to Implement a Grapevine Downy Mildew Warning System</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-1">13.1 Introduction</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-2">13.2 Research Material and Methodology</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-2-1">13.2.1 Datasets</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-2-2">13.2.2 Labelling Methodology</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-3">13.3 Machine Learning Models</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-4">13.4 Results</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-4-1">13.4.1 Primary Mildew Infection Alerts</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-4-2">13.4.2 Secondary Mildew Infection Alerts</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-5">13.5 Discussion</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-6">13.6 Conclusion</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-Ref">References</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14">14 On the Verification of Diagnosis Models</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-1">14.1 Introduction</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-2">14.2 The Model Testing Challenge</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-3">14.3 Use Case</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-4">14.4 Open Issues and Challenges</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-5">14.5 Conclusion</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-Ref">References</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="Index">Index</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="preface09">About the Editors</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
</tbody>
</table>
</preface>
<preface class="preface" id="preface05">
<title>Preface</title>
<subtitle>Industrial Artificial Intelligence Technologies and Applications</subtitle>
<para>Digitalisation and Industry 5.0 are changing how manufacturing facilities operate by deploying large numbers of sensors/actuators, edge computing resources, and IIoT devices, forming intelligent networks of collaborative machines that can collect, aggregate, and intelligently process data at the network&#x2019;s edge.</para>
<para>Given the vast amount of data produced by IIoT devices, computing at the edge is required. In this context, edge computing plays an important role &#x2013; the edge should provide computing resources for edge intelligence with dependability, data management, and aggregation provision in mind. Edge intelligence &#x2013; for example, AI technologies with edge computing for training/learning, testing, or inference &#x2013; is essential for IIoT applications to build models that can learn from a large amount of aggregated data.</para>
<para>Edge computing is a distributed computing paradigm that brings computation and data storage closer to a device&#x2019;s location. AI algorithms process the data created on a device with or without an internet connection. These new AI-based algorithms allow data to be processed within a few milliseconds, providing real-time feedback.</para>
<para>The AI models operate on the devices themselves without the need for a cloud connection and without the problems associated with data latency, which results in much faster data processing and support for use cases that require real-time inferencing.</para>
<para>Major challenges remain in achieving this potential due to the inherent complexity of designing and deploying energy-efficient edge AI algorithms and architectures, the complexity of variations in neural network architectures, and the limited processing capabilities of the underlying edge AI accelerators.</para>
<para>Industrial-edge AI can run on various hardware platforms, from ordinary microcontrollers (MCUs) to advanced neural processing devices. IIoT edge AI-connected devices use embedded algorithms to monitor device behaviour and collect and process device data. Devices make decisions, automatically correct problems, and predict future performance.</para>
<para>AI-based technologies are used across industries by introducing intelligent techniques, including machine and deep learning, cognitive computing, and computer vision. A treatment of the techniques and methods of AI in the industrial sector is therefore a crucial reference source, providing vital research on implementing advanced technological techniques in this sector.</para>
<para>This book offers comprehensive coverage of the topics presented at the &#x201c;International Workshop on Edge Artificial Intelligence for Industrial Applications (EAI4IA)&#x201d; in Vienna, 25&#x2013;26 July 2022. EAI4IA is co-located with the 31<sup>st</sup> International Joint Conference on Artificial Intelligence and the 23<sup>rd</sup> European Conference on Artificial Intelligence (IJCAI-ECAI 2022). It combines the ideas and concepts developed by researchers and practitioners working on providing edge AI methods, techniques, and tools for use in industrial applications.</para>
<para>By highlighting important topics &#x2013; such as embedded AI for semiconductor manufacturing; trustworthy, dependable, and explainable AI for the digitising industry; verification, validation, and benchmarking of AI systems and technologies; and AI model development workflows and deployment on hardware target platforms &#x2013; the book explores the challenges faced by AI technologies deployed in various industrial application domains.</para>
<para>The book is ideally structured and designed for researchers, developers, managers, academics, analysts, post-graduate students, and practitioners seeking current research on the involvement of industrial-edge AI. It combines the latest methodologies, tools, and techniques related to AI and IIoT in a joint volume to build insight into their sustainable deployment in various industrial sectors.</para>
<para>The book is structured around four different topics:</para>
<itemizedlist mark="none" spacing="normal">
<listitem><para>1. <emphasis role="strong">Verification, Validation and Benchmarking of AI Systems and Technologies.</emphasis></para></listitem>
<listitem><para>2. <emphasis role="strong">Trustworthy, Dependable AI for Digitising Industry.</emphasis></para></listitem>
<listitem><para>3. <emphasis role="strong">Embedded AI for Semiconductor Manufacturing.</emphasis></para></listitem>
<listitem><para>4. <emphasis role="strong">AI Model Development Workflow and HW Target Platforms Deployment.</emphasis></para></listitem>
</itemizedlist>
<para>In the following, the papers published in this book are briefly discussed.</para>
<para>S. Narduzzi, L. Mateu, P. Jokic, E. Azarkhish, and A. Dunbar: &#x201c;Benchmarking Neuromorphic Computing for Inference&#x201d; tackle the challenge of benchmarking neuromorphic hardware, aiming to provide a fair and user-friendly method. The authors introduce the challenge and propose possible key performance indicators.</para>
<para>M. Molendijk, K. Vadivel, F. Corradi, G-J. van Schaik, A. Yousefzadeh, and H. Corporaal: &#x201c;Benchmarking the Epiphany Processor as a Reference Neuromorphic Architecture&#x201d; compare different implementations of neuromorphic processors and present suggestions for improvements.</para>
<para>P. Vijayan, A. Yousefzadeh, M. Sifalakis, and R. van Leuken: &#x201c;Temporal Delta Layer: Exploiting Temporal Sparsity in Deep Neural Networks for Time-Series Data&#x201d; address learning from time-series data with deep neural networks. In particular, the authors exploit temporal sparsity and experimentally demonstrate overall improvements.</para>
<para>D. Purice, M. Ludwig, and C. Lenz: &#x201c;An End-to-End AI-based Automated Process for Semiconductor Device Parameter Extraction&#x201d; present a validation pipeline that aims to build trust in semiconductor devices through authenticity checking. The authors further evaluate their approach using several artificial neural network architectures.</para>
<para>D. Morits, M. Rizzo Piton, and T. Laakko: &#x201c;AI machine vision system for wafer defect detection&#x201d; discuss the use of machine learning for fault detection based on images in the context of semiconductor manufacturing.</para>
<para>S. Al-Baddai and J. Papadoudis: &#x201c;Failure detection in silicon package&#x201d; discuss the use of machine learning techniques for wire-bonding inspection occurring during the packaging of semiconductors. The authors report on the accuracy of failure detection using machine learning in the complex industrial environment.</para>
<para>X. L. Liu, E. Salhofer, A. Safont Andreu, and R. Kern: &#x201c;S2ORC-SemiCause: Annotating and analysing causality in the semiconductor domain&#x201d; introduce a benchmark dataset for extracting causal relations in the context of cause-effect reasoning.</para>
<para>A. Wandesleben, D. Truffier-Boutry, V. Brackmann, B. Lilienthal-Uhlig, M. Jaysnkar, S. Beckx, I. Madarevic, A. Demarest, B. Hintze, F. Hochschulz, Y. Le Tiec, A. Spessot, and F. Nemouchi: &#x201c;Feasibility of wafer exchange for European Edge AI pilot lines&#x201d; focus on contamination monitoring to allow wafers to be exchanged among different facilities. In particular, the authors present an analysis of whether such an exchange would be feasible for three European research institutes.</para>
<para>D. Kaufmann and F. Wotawa: &#x201c;A framework for integrating automated diagnosis into simulation&#x201d; discuss a framework that allows the integration of model-based diagnosis algorithms into physical simulation. The framework can be used for verifying and validating diagnosis implementations for cyber-physical systems.</para>
<para>S. Narduzzi, D. Favre, N. Pazos Escudero, and A. Dunbar: &#x201c;Deploying a Convolutional Neural Network on Edge MCU and Neuromorphic Hardware Platforms&#x201d; discuss the deployment of neural networks for edge computing considering different platforms. The authors also report on the perceived effort of deployment for each of the platforms.</para>
<para>R. Prokscha, M. Schneider, and A. H&#xf6;&#xdf;: &#x201c;Efficient Edge Deployment Demonstrated on YOLOv5 and Coral Edge TPU&#x201d; consider the question of deploying machine learning on the edge.</para>
<para>O. Vermesan and M. Coppola: &#x201c;Embedded Edge Intelligent Processing for End-To-End Predictive Maintenance in Industrial Applications&#x201d; present the use of machine learning for edge computing supporting predictive maintenance using different technologies, workflows, and datasets.</para>
<para>L. A. Steffenel, A. Langlet, L. Hollard, L. Mohimont, N. Gaveau, M. Coppola, C. Pierlot, and M. Rondeau: &#x201c;AI-Driven Strategies to Implement a Grapevine Downy Mildew Warning System&#x201d; outline the use of machine learning for identifying infections occurring in vineyards and present an experimental evaluation comparing different machine learning algorithms.</para>
<para>F. Wotawa and O. Tazl: &#x201c;On the Verification of Diagnosis Models&#x201d; focus on the challenges of verification, and in particular testing, applied to logic-based diagnosis. The authors consider testing system models, use a running example to demonstrate how such models can be tested, and identify open research questions.</para>
</preface>
<preface class="preface" id="preface06">
<title>List of Figures</title>
<table cellspacing="5" cellpadding="5" frame="none" rules="none">
<tbody>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-F1">Figure 1.1 Benchmarking fairness. (a) Unfair benchmarking: the KPIs are comparable, but the benchmarked hardware platforms are not exploited to their full potential. (b) Fair benchmarking: the hardware platforms are exploited to their full potential, but the resulting combined KPIs (KPI<sub><emphasis>CB</emphasis></sub>) are not comparable</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-F2">Figure 1.2 Combined KPIs for fair benchmarking</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-F3">Figure 1.3 Benchmarking pipeline based on use-cases. An automated search finds the best possible model exploiting the performance offered by each target hardware platform. The resulting combined KPIs are comparable</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-F1">Figure 2.1 Overall scalable architecture of Epiphany-III</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-F2">Figure 2.2 Adapteva launched a $99 Epiphany-III-based single board computer as their first product</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-F3">Figure 2.3 Flow chart of processing a LIF neuron with processing time measured in Epiphany</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-F1">Figure 3.1 (a) Standard DNN, and (b) DNN with proposed temporal delta layer</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-F2">Figure 3.2 Sparsity in activation (&#x0394;x) drastically reduces the memory fetches and multiplications between &#x0394;x and the columns of the weight matrix, W, that correspond to zero</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-F3">Figure 3.3 Demonstration of two temporally consecutive activation maps leading to near-zero values (rather than absolute zeroes) after the delta operation</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-F4">Figure 3.4 Importance of step size in quantization: on the right side, in all three cases, the data is quantized to five bins with different uniform step sizes. However, without an optimum step size value, the quantization can detrimentally alter the range and resolution of the original data</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-F5">Figure 3.5 Evolution of quantization step size from initialization to convergence in LSQ. As the step size is a learnable parameter, it is re-adjusted during training to cause minimum information loss in each layer</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F1">Figure 4.1 Overview of the architecture</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F2">Figure 4.2 Examples showcasing different semiconductor technologies</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F3">Figure 4.3 Examples of labelled data showcasing the different ROIs: green &#x2013; VIA; yellow &#x2013; metal; teal &#x2013; lateral isolation; red &#x2013; poly; blue &#x2013; deep trench isolation</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F4">Figure 4.4 Histograms of the investigated data grouped by label of interest</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F5">Figure 4.5 Overview of the U-net architecture</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F6">Figure 4.6 Overview of the FPN architecture</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F7">Figure 4.7 Overview of the GSCNN architecture</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F8">Figure 4.8 Overview of the PSPNet architecture</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F9">Figure 4.9 Overview of the Siamese network architecture</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F10">Figure 4.10 Average Dice Scores (blue) and spread (green) per investigated network architecture, along with the final chosen architecture (red)</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F11">Figure 4.11 An overview of the U-net cascade architecture, consisting of a 2D U-net (top) and a 3D U-net (bottom) which takes as input the high resolution input image stacked with the output segmentation of the first stage</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F12">Figure 4.12 Utilised cluster evaluation techniques</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F13">Figure 4.13 Example cross-section image with annotated metal and contact/VIA features</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-F14">Figure 4.14 Example cross-section image (upper left). The polygonised VIA objects are shown (lower left). A dendrogram is shown for the relative distances of the y-coordinates of the single objects (upper right). Finally, the results of the utilised cluster evaluation techniques are presented (lower right)</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch5-F1">Figure 5.1 Examples of microscopic images of various superconductor and semiconductor devices with surface defects</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch5-F2">Figure 5.2 General architecture of the developed machine vision system</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch5-F3">Figure 5.3 A scheme of the image dataset preparation, including labelling, cropping and data augmentation</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch5-F4">Figure 5.4 Example of binary classification of wafer defects: defect vs background</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-F1">Figure 6.1 <emphasis>Left</emphasis>: Curve with an abnormal minimum position (red) compared to normal ones (white) in sensor data recorded during the wire-bonding process. <emphasis>Right</emphasis>: An example of an abnormal OOI image with a crack visible on the surface</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-F2">Figure 6.2 Flow chart of the development and deployment life cycle for an AI solution at IFX. In the development phase, data scientists can use different programming languages, as the final model can be converted to ONNX. In the deployment phase, the vision framework simply accesses the ONNX model and runs it at inference time</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-F3">Figure 6.3 Process flow integration of the developed AD solution into an existing IFX infrastructure</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-F4">Figure 6.4 Flow of processes during silicon packaging; the blue backward arrow shows the position of transfer learning from OOI back to images taken after the molding process</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-F5">Figure 6.5 An example of an OOI image (left; taken before shipping and after the electrical test) and an example image after the molding process (right)</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-F1">Figure 7.1 Causal cue phrases ranked by frequency for all sentences in S2ORC-SemiCause dataset</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-F1">Figure 8.1 Comparison of TXRF LLDs of CEA LETI/IMEC</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-F2">Figure 8.2 Comparison of VPD-ICPMS LLDs of CEA LETI/IMEC/FhG</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-F3">Figure 8.3 Schematic of the VPD bevel collection at (a) IMEC, (b) CEA-LETI and (c) FhG IPMS</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-F4">Figure 8.4 Comparison of LLDs of CEA LETI/IMEC for VPD-ICPMS bevel</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-F5">Figure 8.5 Comparison TXRF results of CEA LETI/IMEC for IMEC inspection tool</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-F6">Figure 8.6 Comparison VPD-ICPMS results of CEA LETI/IMEC/FhG for IMEC inspection tool</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-F7">Figure 8.7 Comparison VPD-ICPMS bevel results of CEA LETI/IMEC for IMEC inspection tool</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-F1">Figure 9.1 A simple electric circuit comprising bulbs, a switch and a battery</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-F2">Figure 9.2 Illustration of the simulation and diagnosis environment as well as the overall operating principles. The framework of the FMU Simulation Tool provides an interface to enable the integration of a diagnosis tool and/or other methods. The models can be substituted by any others in the provided framework</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-F3">Figure 9.3 Simulation showing the measured signal output of the two bulbs, the switch and the battery. For this example, a fault injection (<emphasis>broken</emphasis>) in bulb 1 after 0.2 seconds (red indicator) and a fault injection (<emphasis>broken</emphasis>) in the switch after 0.3 seconds (orange indicator) are initiated</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-F4">Figure 9.4 Simulation and diagnosis output results based on the electrical two-lamp circuit with a broken bulb after 0.2 seconds and a broken switch after 0.3 seconds. The upper tables illustrate the simulation input/output signals, which are used as observations for the diagnosis part (lower tables). Based on the given observations for the three selected time steps, different diagnosis results are obtained</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-F1">Figure 10.1 Illustration of LeNet-5 architecture</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-F2">Figure 10.2 Deployment pipelines for all platforms. From left to right: STM32L4R9, Kendryte K210 and DynapCNN. For the DynapCNN, the pipeline is contained in a single Python script, while the others rely on external languages and tools</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-F1">Figure 11.1 Raspberry Pi 4 with Google Coral edge TPU USB accelerator</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-F2">Figure 11.2 Quantized edge TPU Models</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-F3">Figure 11.3 USB3 speed-accuracy comparison of different model types and configurations for edge TPU deployment</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-F4">Figure 11.4 YOLOv5s inference speed comparison between USB2 and USB3</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-F5">Figure 11.5 Micro software stack for fast and lightweight edge deployment</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-F1">Figure 12.1 Industrial motor components</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-F2">Figure 12.2 Micro-edge AI processing flow</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-F3">Figure 12.3 Visualisation of the signals of two selected classes in both the temporal and frequency domains with NEAI</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-F4">Figure 12.4 Benchmarking with NEAI</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-F5">Figure 12.5 Snapshots of Feature Explorer in EI based on the pre-processing block early in the process</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-F6">Figure 12.6 Confusion Matrix and Data Explorer based on full training set: Correctly Classified (Green) and Misclassified (Red)</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-F7">Figure 12.7 A comparison between int8 quantized and unoptimized versions of the same model, showing the difference in performance and results</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-F8">Figure 12.8 Evaluation of trained model using NEAI Emulator with live streaming</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-F9">Figure 12.9 EI model testing with test datasets</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-F10">Figure 12.10 Live classification streaming with detected state and confidence (with Tera Term)</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-F1">Figure 13.1 Algorithm for primary infection alarms</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-F2">Figure 13.2 Algorithm for secondary infection alarms</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-F1">Figure 14.1 A simple electric circuit comprising bulbs, a switch and a battery</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-F2">Figure 14.2 The model-based diagnosis principle and information needed for testing</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-F3">Figure 14.3 A model for diagnosis of the two lamp example from Figure 14.1 comprising the behavior of the components (lines 1-7) and connections (lines 8-10), and the structure of the circuit (lines 11-18)</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-F4">Figure 14.4 Another simple electric circuit comprising bulbs, switches and a battery. This circuit is an extended version of the circuit from Figure 14.1. On the right, we have the structural model of this circuit in Prolog notation</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
</tbody>
</table>
</preface>
<preface class="preface" id="preface07">
<title>List of Tables</title>
<table cellspacing="5" cellpadding="5" frame="none" rules="none">
<tbody>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-T1">Table 1.1 Relevant KPIs for tasks, models and hardware domains. We also mention some combined KPIs to illustrate the inter-dependency of the domains</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-T2">Table 1.2 Accuracy (<emphasis>Acc</emphasis>) for different object detection settings on COCO test-dev</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-T3">Table 1.3 Representation of resource-constrained KPIs</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-T4">Table 1.4 Typical display of performance comparison of neuromorphic hardware platforms</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch1-T5">Table 1.5 Recent display of performance comparison of neuromorphic hardware platforms</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-T1">Table 2.1 Memory fragmentations in some digital large-scale neuromorphic chips</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch2-T2">Table 2.2 Mapping LeNet-5 neural network (with binary weights) in different neuromorphic architectures</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-T1">Table 3.1 Spatial stream - comparison of accuracy and activation sparsity obtained through the proposed scenarios against the baseline. In the case of fixed point quantization, the reported results are for a bitwidth of 6 bits</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-T2">Table 3.2 Temporal stream - comparison of accuracy and activation sparsity obtained through the proposed scenarios against the benchmark. In the case of fixed point quantization, the reported results are for a bitwidth of 7 bits</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-T3">Table 3.3 Result of decreasing activation bitwidth in fixed point quantization method. For spatial stream, decreasing below 6 bits caused the accuracy to drop considerably. For temporal stream, the same happened below 7 bits</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch3-T4">Table 3.4 Final results from two-stream network after average fusing the spatial and temporal stream weights. With 5% accuracy loss, the proposed method almost doubles the activation sparsity available in comparison to the baseline</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-T1">Table 4.1 Obtained Dice Scores for each showcased network architecture</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-T2">Table 4.2 Averaged Dice Scores for each label of interest</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch4-T3">Table 4.3 Utilised cluster evaluation techniques. Notation: <emphasis>n</emphasis>: number of objects in data-set; <emphasis>c</emphasis>: centre of data-set; <emphasis>NC</emphasis>: number of clusters; <emphasis>C<sub>i</sub></emphasis>: the i-th cluster; <emphasis>n<sub>i</sub></emphasis>: number of objects in <emphasis>C<sub>i</sub></emphasis>; <emphasis>c<sub>i</sub></emphasis>: centre of <emphasis>C<sub>i</sub></emphasis>; <emphasis>W<sub>k</sub></emphasis>: the within-cluster sum of squared distances from the cluster mean; <emphasis>W<sub>&#x2217;k</sub></emphasis>: the appropriate null reference; <emphasis>B</emphasis>: reference data-sets</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-T1">Table 6.1 Confusion matrix and metrics of the CNN model on productive data for BOT and TOP of OOI images</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch6-T2">Table 6.2 Confusion matrix and metrics of the CNN model on productive data for BOT and TOP of the new process</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-T1">Table 7.1 Inter-annotator agreement for the first two iterations. <emphasis>Arg1</emphasis> (cause) refers to the span of the arguments that lead to <emphasis>Arg2</emphasis> (effect) for the respective relation type</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-T2">Table 7.2 Comparison of labels generated by both annotators for Iteration 2. Examples and total counts (in number of arguments) for each type are also given</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-T3">Table 7.3 Descriptive statistics of benchmark datasets. Overview of CoNLL-2003 (training split) and BC5CDR (training split) for named entity recognition, as well as causality dataset BioCause (full dataset), and S2ORC-SemiCause (training split)</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-T4">Table 7.4 Descriptive statistics of <emphasis>S2ORC-SemiCause</emphasis> dataset. <emphasis>#-sent</emphasis>: total number of annotated sentences, <emphasis>#-sent no relations</emphasis>: number of sentences without causality, <emphasis>Argument</emphasis>: total amount and mean length (token span) of all annotated argument, <emphasis>Consequence/Purpose</emphasis>: amount and mean length of cause and effect arguments for the respective relation types</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-T5">Table 7.5 Baseline performance using BERT with a token classification head. Both the F1 scores and the standard deviation over 7 different runs are shown. Despite the small sample size, the standard deviation remains low, similar to previous work</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch7-T6">Table 7.6 Comparison of predicted and annotated argument spans for the test split. Examples and total counts (in number of arguments) for correct prediction and for each error source are also given</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-T1">Table 8.1 Contamination monitoring techniques LETI/IMEC/FhG</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch8-T2">Table 8.2 Overview VPD-ICPMS LLD determination and technical details for LETI/IMEC/FhG</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch9-T1">Table 9.1 CPS Model component state description for the light bulb, switch and battery. All used states, including fault states of the components are shown</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-T1">Table 10.1 Relevant technical specifications of the devices (from manufacturer websites)</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-T2">Table 10.2 Results on MNIST dataset for all platforms. For the DynapCNN, we report the accuracy and latency for the first spike prediction and over the entire simulation</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch10-T3">Table 10.3 Perceived effort for each stage of the inference. 1: small, 5: large</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-T1">Table 11.1 Comparison of YOLOv5s model before and after optimizations</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch11-T2">Table 11.2 Model comparison in terms of input size, file size, and operations</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch12-T1">Table 12.1 Frameworks and inference engines for integrating AI mechanisms within MCUs</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-T1">Table 13.1 Accuracy of 2019 Primary Infection Models</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-T2">Table 13.2 Accuracy of 2020 Primary Infection Models</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-T3">Table 13.3 Accuracy of 2021 Primary Infection Models</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch13-T4">Table 13.4 Accuracy of 2021 Primary Infection Models</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-T1">Table 14.1 All eight test cases used to verify the 2-bulb example comprising the used observations and the expected diagnoses. The <emphasis role="strong">P/F</emphasis> column indicates whether the original model passes (&#x221A;) or fails (&#x00D7;) the test</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-T2">Table 14.2 Running 7 model mutations M<emphasis>i</emphasis>, where we removed line <emphasis>i</emphasis> in the original model of Figure 14.3, using the 8 test cases from Table 14.1</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
<tr><td valign="top" align="left"><emphasis role="strong"><link linkend="ch14-T3">Table 14.3 Test cases for the extended two-bulb example from Figure 14.4 and their test execution results. In gray we indicate tests that check the expected (fault-free) behavior of the circuit</link></emphasis></td><td valign="top" align="left"><ulink url="https://www.riverpublishers.com/pdf/ebook/chapter/">Download As PDF</ulink></td></tr>
</tbody>
</table>
</preface>
<preface class="preface" id="preface08">
<title>List of Contributors</title>
<para><emphasis role="strong">Al-Baddai, Saad</emphasis>, <emphasis>Infineon Technologies AG, Germany</emphasis></para>
<para><emphasis role="strong">Andreu, Anna Safont</emphasis>, <emphasis>University of Klagenfurt, Austria, Infineon Technologies Austria</emphasis></para>
<para><emphasis role="strong">Azarkhish, Erfan</emphasis>, <emphasis>CSEM, Switzerland</emphasis></para>
<para><emphasis role="strong">Beckx, Stephan</emphasis>, <emphasis>imec, Belgium</emphasis></para>
<para><emphasis role="strong">Brackmann, Varvara</emphasis>, <emphasis>Fraunhofer IPMS CNT, Germany</emphasis></para>
<para><emphasis role="strong">Coppola, Marcello</emphasis>, <emphasis>STMicroelectronics, France</emphasis></para>
<para><emphasis role="strong">Corporaal, Henk</emphasis>, <emphasis>Technical University of Eindhoven, Netherlands</emphasis></para>
<para><emphasis role="strong">Corradi, Federico</emphasis>, <emphasis>imec, Netherlands</emphasis></para>
<para><emphasis role="strong">Demarest, Audde</emphasis>, <emphasis>Universit&#xe9; Grenoble Alpes, CEA-Leti, France</emphasis></para>
<para><emphasis role="strong">Dunbar, Andrea</emphasis>, <emphasis>CSEM, Switzerland</emphasis></para>
<para><emphasis role="strong">Escudero, Nuria Pazos</emphasis>, <emphasis>HE-Arc, Switzerland</emphasis></para>
<para><emphasis role="strong">Favre, Dorvan</emphasis>, <emphasis>CSEM, Switzerland, HE-Arc, Switzerland</emphasis></para>
<para><emphasis role="strong">Gaveau, Nathalie</emphasis>, <emphasis>Universit&#xe9; de Reims Champagne Ardenne, France</emphasis></para>
<para><emphasis role="strong">H&#xf6;&#xdf;, Alfred</emphasis>, <emphasis>Ostbayerische Technische Hochschule Amberg-Weiden, Germany</emphasis></para>
<para><emphasis role="strong">Hintze, Bernd</emphasis>, <emphasis>FMD, Germany </emphasis></para>
<para><emphasis role="strong">Hochschulz, Franck</emphasis>, <emphasis>Fraunhofer IMS, Germany </emphasis></para>
<para><emphasis role="strong">Hollard, Lilian</emphasis>, <emphasis>Universit&#xe9; de Reims Champagne Ardenne, France</emphasis></para>
<para><emphasis role="strong">Jaysnkar, Manoj</emphasis>, <emphasis>imec, Belgium </emphasis></para>
<para><emphasis role="strong">Jokic, Petar</emphasis>, <emphasis>CSEM, Switzerland</emphasis></para>
<para><emphasis role="strong">Kaufmann, David</emphasis>, <emphasis>Graz University of Technology, Austria</emphasis></para>
<para><emphasis role="strong">Kern, Roman</emphasis>, <emphasis>Graz University of Technology, Austria</emphasis></para>
<para><emphasis role="strong">Laakko, Timo</emphasis>, <emphasis>VTT Technical Research Centre of Finland Ltd, Finland</emphasis></para>
<para><emphasis role="strong">Langlet, Axel</emphasis>, <emphasis>Universit&#xe9; de Reims Champagne Ardenne, France</emphasis></para>
<para><emphasis role="strong">Le Tiec, Yannick</emphasis>, <emphasis>Universit&#xe9; Grenoble Alpes, CEA, LETI, France</emphasis></para>
<para><emphasis role="strong">Lenz, Claus</emphasis>, <emphasis>Cognition Factory GmbH, Germany</emphasis></para>
<para><emphasis role="strong">Leuken, Rene van</emphasis>, <emphasis>TU Delft, Netherlands</emphasis></para>
<para><emphasis role="strong">Lilienthal-Uhlig, Benjamin</emphasis>, <emphasis>Fraunhofer IPMS CNT, Germany</emphasis></para>
<para><emphasis role="strong">Liu, Xing Lan</emphasis>, <emphasis>Know-Center GmbH, Austria</emphasis></para>
<para><emphasis role="strong">Ludwig, Matthias</emphasis>, <emphasis>Infineon Technologies AG, Germany</emphasis></para>
<para><emphasis role="strong">Madarevic, Ivan</emphasis>, <emphasis>imec, Belgium </emphasis></para>
<para><emphasis role="strong">Mateu, Loreto</emphasis>, <emphasis>Fraunhofer IIS, Germany</emphasis></para>
<para><emphasis role="strong">Mohimont, Lucas</emphasis>, <emphasis>Universit&#xe9; de Reims Champagne Ardenne, France</emphasis></para>
<para><emphasis role="strong">Molendijk, Maarten</emphasis>, <emphasis>imec, Netherlands, Technical University of Eindhoven, Netherlands</emphasis></para>
<para><emphasis role="strong">Morits, Dmitry</emphasis>, <emphasis>VTT Technical Research Centre of Finland Ltd, Finland</emphasis></para>
<para><emphasis role="strong">Narduzzi, Simon</emphasis>, <emphasis>CSEM, Switzerland</emphasis></para>
<para><emphasis role="strong">Nemouchi, Fabrice</emphasis>, <emphasis>Universit&#xe9; Grenoble Alpes, CEA, LETI, France</emphasis></para>
<para><emphasis role="strong">Papadoudis, Jan</emphasis>, <emphasis>Infineon Technologies AG, Germany</emphasis></para>
<para><emphasis role="strong">Pierlot, Cl&#xe9;ment</emphasis>, <emphasis>Vranken-Pommery Monopole, France</emphasis></para>
<para><emphasis role="strong">Piton, Marcelo Rizzo</emphasis>, <emphasis>VTT Technical Research Centre of Finland Ltd, Finland</emphasis></para>
<para><emphasis role="strong">Prokscha, Ruben</emphasis>, <emphasis>Ostbayerische Technische Hochschule Amberg-Weiden, Germany</emphasis></para>
<para><emphasis role="strong">Purice, Dinu</emphasis>, <emphasis>Cognition Factory GmbH, Germany</emphasis></para>
<para><emphasis role="strong">Rondeau, Marine</emphasis>, <emphasis>Vranken-Pommery Monopole, Reims, France </emphasis></para>
<para><emphasis role="strong">Salhofer, Eileen</emphasis>, <emphasis>Know-Center GmbH, Austria, Graz University of Technology, Austria</emphasis></para>
<para><emphasis role="strong">Schneider, Mathias</emphasis>, <emphasis>Ostbayerische Technische Hochschule Amberg-Weiden, Germany</emphasis></para>
<para><emphasis role="strong">Sifalakis, Manolis</emphasis>, <emphasis>imec, Netherlands</emphasis></para>
<para><emphasis role="strong">Spessot, Alessio</emphasis>, <emphasis>imec, Belgium</emphasis></para>
<para><emphasis role="strong">Steffenel, Luiz Angelo</emphasis>, <emphasis>Universit&#xe9; de Reims Champagne Ardenne, France</emphasis></para>
<para><emphasis role="strong">Tazl, Oliver</emphasis>, <emphasis>Graz University of Technology, Austria</emphasis></para>
<para><emphasis role="strong">Truffier-Boutry, Delphine</emphasis>, <emphasis>Universit&#xe9; Grenoble Alpes, CEA, LETI, France </emphasis></para>
<para><emphasis role="strong">Vadivel, Kanishkan</emphasis>, <emphasis>Technical University of Eindhoven, Netherlands</emphasis></para>
<para><emphasis role="strong">van Schaik, Gert-Jan</emphasis>, <emphasis>imec, Netherlands</emphasis></para>
<para><emphasis role="strong">Vermesan, Ovidiu</emphasis>, <emphasis>SINTEF AS, Norway</emphasis></para>
<para><emphasis role="strong">Vijayan, Preetha</emphasis>, <emphasis>TU Delft, Netherlands, imec, Netherlands</emphasis></para>
<para><emphasis role="strong">Wandesleben, Annika Franziska</emphasis>, <emphasis>Fraunhofer IPMS CNT, Germany</emphasis></para>
<para><emphasis role="strong">Wotawa, Franz</emphasis>, <emphasis>Graz University of Technology, Austria</emphasis></para>
<para><emphasis role="strong">Yousefzadeh, Amirreza</emphasis>, <emphasis>imec, Netherlands</emphasis></para>
</preface>
<preface class="preface" id="preface09">
<title>About the Editors</title>
<para><emphasis role="strong">Ovidiu Vermesan</emphasis> holds a PhD degree in microelectronics and a Master of International Business (MIB) degree. He is Chief Scientist at SINTEF Digital, Oslo, Norway. His research interests are in smart systems integration, mixed-signal embedded electronics, analogue neural networks, edge artificial intelligence and cognitive communication systems. Dr. Vermesan received SINTEF&#x2019;s 2003 award for research excellence for his work on the implementation of a biometric sensor system. He is currently working on projects addressing nanoelectronics, integrated sensor/actuator systems, communication, cyber&#x2013;physical systems (CPSs) and Industrial Internet of Things (IIoT), with applications in green mobility, energy, autonomous systems, and smart cities. He has authored or co-authored over 100 technical articles, conference/workshop papers and holds several patents. He is actively involved in the activities of European partnership for Key Digital Technologies (KDT). He has coordinated and managed various national, EU and other international projects related to smart sensor systems, integrated electronics, electromobility and intelligent autonomous systems such as E<sup>3</sup> Car, POLLUX, CASTOR, IoE, MIRANDELA, IoF2020, AUTOPILOT, AutoDrive, ArchitectECA2030, AI4DI, AI4CSM. Dr. Vermesan actively participates in national, Horizon Europe and other international initiatives by coordinating and managing various projects. He is the coordinator of the IoT European Research Cluster (IERC) and a member of the board of the Alliance for Internet of Things Innovation (AIOTI). He is currently the technical co-coordinator of the Artificial Intelligence for Digitising Industry (AI4DI) project.</para>
<para><emphasis role="strong">Franz Wotawa</emphasis> received an M.Sc. in Computer Science (1994) and a PhD in 1996, both from the Vienna University of Technology. He is currently a professor of software engineering at the Graz University of Technology. He headed the Institute for Software Technology from its founding in 2003 until 2009, and again from 2020 onwards. His research interests include model-based and qualitative reasoning, theorem proving, mobile robots, verification and validation, and software testing and debugging. Besides theoretical foundations, he has always been interested in closing the gap between research and practice. Since October 2017, Franz Wotawa has been the head of the Christian Doppler Laboratory for Quality Assurance Methodologies for Autonomous Cyber-Physical Systems. During his career, Franz Wotawa has written more than 430 peer-reviewed papers for journals, books, conferences, and workshops, and has supervised 100 master&#x2019;s and 38 Ph.D. students. For his work on diagnosis, he received the Lifetime Achievement Award of the Intl. Diagnosis Community in 2016. Franz Wotawa has been a member of numerous program committees and has organized several workshops and special issues of journals. He is a member of the Academia Europaea, the IEEE Computer Society, ACM, the Austrian Computer Society (OCG), and the Austrian Society for Artificial Intelligence, and a Senior Member of the AAAI.</para>
<para><emphasis role="strong">Mario Diaz Nava</emphasis> holds a PhD and an M.Sc., both in computer science, from the Institut National Polytechnique de Grenoble, France, and a B.S. in communications and electronics engineering from the Instituto Politecnico Nacional, Mexico. He has worked at STMicroelectronics since 1990, occupying different positions (Designer, Architect, Design Manager, Project Leader, Program Manager) in various STMicroelectronics research and development organisations. His selected project experience relates to the specification and design of communication circuits (ATM, VDSL, ultra-wideband), digital and analogue design methodologies, system architecture, and program management. He currently holds the position of ST Grenoble R&amp;D Cooperative Programs Manager, and he has actively participated, for the last five years, in several H2020 IoT projects (ACTIVAGE, IoF2020, Brain-IoT), working in key areas such as security and privacy, smart farming, IoT system modelling, and edge computing. He is currently leading the ANDANTE project, devoted to developing neuromorphic ASICs for efficient AI/ML solutions at the edge. He has published more than 35 articles in these areas. He is currently a member of the Technical Expert Group of the PENTA/Xecs European Eureka cluster and a chapter chair member of the ECSEL/KDT Strategic Research and Innovation Agenda. He is an IEEE member. He has participated in the standardisation of several communication technologies in the ATM Forum, ETSI, ANSI, and ITU-T standardisation bodies.</para>
<para><emphasis role="strong">Bj&#xf6;rn Debaillie</emphasis> leads imec&#x2019;s collaborative R&amp;D activities on cutting-edge IoT technologies in imec. As program manager, he is responsible for the operational management across programs and projects, and focusses on strategic collaborations and partnerships, innovation management, and public funding policies. As chief of staff, he is responsible for executive finance and operations management and transformations. Bj&#xf6;rn coordinates semiconductor-oriented public funded projects and seeds new initiatives on high-speed communications and neuromorphic sensing. He currently leads the 35M&#x20ac; TEMPO project on neuromorphic hardware technologies, enabling low-power chips for computation-intensive AI applications (<ulink url="https://www.tempo-ecsel.eu">www.tempo-ecsel.eu</ulink>). Bj&#xf6;rn holds patents and authored international papers published in various journals and conference proceedings. He also received several awards, was elected as IEEE Senior Member and is acting in a wide range of expert boards, technical program committees, and scientific/strategic think tanks.</para>
</preface>
<chapter class="chapter" id="ch1" label="1" xreflabel="1">
<title>Benchmarking Neuromorphic Computing for Inference</title>
<subtitle>Simon Narduzzi<sup>1</sup>, Loreto Mateu<sup>2</sup>, Petar Jokic<sup>1</sup>, Erfan Azarkhish<sup>1</sup>, and Andrea Dunbar<sup>1</sup></subtitle>
<affiliation><sup>1</sup>CSEM, Switzerland<?lb?><sup>2</sup>Fraunhofer IIS, Germany</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>In the last decade, there has been significant progress in the IoT domain due to the advances in the accuracy of neural networks and the industrialization of efficient neural network accelerator ASICs. However, intelligent devices will need to be omnipresent to create a seamless consumer experience. To make this a reality, further progress is still needed in the low-power embedded machine learning domain. Neuromorphic computing is a technology suited to such low-power intelligent sensing. However, neuromorphic computing is hampered today by the fragmentation of the hardware providers and the difficulty of embedding and comparing the algorithms&#x2019; performance. The lack of standard key performance indicators spanning across the hardware-software domains makes it difficult to benchmark different solutions for a given application on a fair basis. In this paper, we summarize the current benchmarking solutions used in both hardware and software for neuromorphic systems, which are in general applicable to low-power systems. We then discuss the challenges in creating a fair and user-friendly method to benchmark such systems, before suggesting a clear methodology that includes possible key performance indicators.</para>
<para><emphasis role="strong">Keywords:</emphasis> neuromorphic, inference, accelerators, benchmarking, low power, IoT, ASIC, key performance indicators</para>
</section>
<section class="lev1" id="ch1-1">
<title>1.1 Introduction</title>
<para>The performance necessary for consumer uptake of IoT devices has not been achieved yet. Intelligent always-on edge devices and sensors powered by AI and running on ultra-low-power hardware require outstanding energy efficiency, low latency (real-time operation), high throughput, and uncompromised accuracy. Neuromorphic computing rises to the challenge; however, the neuromorphic computing landscape is fragmented with no universal Key Performance Indicators (KPI), and comparison on a fair basis remains elusive [<link linkend="ch1-bib1">1</link>]. The landscape is complex: comparisons should consider various aspects such as industrial maturity, CMOS technology implications, arithmetic precision, silicon area, power consumption, and the accuracy obtained from neural networks running on the devices. Comparing target use-cases has the advantage of looking at the system-wide requirements but adds additional complexity. For example, taking the inference frequency into account affects the leakage and active power, significantly impacting the mean power consumption of the system.</para>
<para>The most commonly accepted quantitative metrics for benchmarking neuromorphic hardware are TOPS (Tera Operations Per Second) for throughput, TOPS/W for energy efficiency, and TOPS/mm<sup>2</sup> for area efficiency. Hardware metrics rarely take the algorithmic structure into account. For software, the performance of Machine Learning (ML) algorithms is usually defined for a given task. Their KPIs generally target the prediction performance in terms of the objective reached (often accuracy). Until recently, these KPIs rarely accounted for algorithm complexity, computational cost, or structure, all of which impact performance on a given hardware platform.</para>
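<para>The relationship between these three hardware metrics can be made concrete with a back-of-the-envelope calculation. The sketch below uses invented accelerator parameters (the MAC-unit count, clock frequency, power draw, and die area are all assumptions for illustration) and counts one MAC as two operations:</para>

```python
# Hypothetical accelerator: 1024 parallel MAC units clocked at 800 MHz.
mac_units = 1024
clock_hz = 800e6
ops_per_mac = 2          # one multiply plus one accumulate

peak_tops = mac_units * ops_per_mac * clock_hz / 1e12   # throughput in TOPS
power_w = 0.5            # assumed power draw: 500 mW
area_mm2 = 4.0           # assumed die area

print(f"peak throughput: {peak_tops:.4f} TOPS")
print(f"energy efficiency: {peak_tops / power_w:.4f} TOPS/W")
print(f"area efficiency: {peak_tops / area_mm2:.4f} TOPS/mm^2")
```

<para>Peak figures like these assume every MAC unit is busy on every cycle, which real network topologies rarely sustain.</para>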
<para>Moreover, these metrics are only applicable to traditional neural networks such as Deep Neural Networks (DNNs); for Spiking Neural Networks (SNNs), other metrics, such as energy per synaptic operation for energy efficiency, are used. Indeed, the very nature of DNNs and SNNs prohibits a comparison based on standard NN parameters.</para>
<para>The main questions asked by end-users, system integrators, and sensor manufacturers are: what is the best solution for the application, and does a given neuromorphic processor provide an advantage over state-of-the-art microcontrollers? The inability to answer these questions thwarts industrial interest. This white paper provides a brief guide to relevant metrics for fair benchmarking of neuromorphic inference accelerator ASICs, aiming to help compare different hardware approaches for various use-cases.</para>
<para>The paper is organized as follows: Section 1.2 provides an overview of the state-of-the-art benchmarking of inference accelerators at algorithm and hardware levels. Then we look specifically at the KPIs which are applicable to neuromorphic or power-sensitive applications, explaining what influences the metrics. Section 1.3 explains why combining KPIs for both hardware and algorithms is essential for fair benchmarking of neuromorphic computing. Finally, Section 1.4 summarizes and concludes the paper.</para>
</section>
<section class="lev1" id="ch1-2">
<title>1.2 State-of-the-art in Benchmarking</title>
<para>Benchmarking of NN inference performance for a task occurs at both the algorithm and hardware levels. The use-case provides the constraints and the optimizations to be achieved through the combination of the ML model and the hardware. Currently, ML algorithms and hardware are usually benchmarked independently, each with their own metrics.</para>
<para>For ML algorithms, task-related metrics are the standard. These metrics are usually independent of the nature of the ML model used, allowing comparison between the algorithmic techniques used to perform the task: while the algorithm may change, the way performance on a certain task (e.g., image classification) is assessed remains the same. This methodology enables rapid development of deep learning techniques by comparing the performance of algorithms on a given task. To target resource-limited IoT applications, metrics measuring the complexity of the model, such as the number of parameters, sparsity, depth, and (floating-point) operation counts, are taken into account. These KPIs are measurable via simulation of the model, and most current deep learning libraries provide functions that report them.</para>
<para>On the other hand, hardware KPIs are extracted from the deployment platform while running a certain algorithmic model. They can be either simulated or computed by running the target application on the device. These KPIs usually include power consumption (estimation), latency, and memory metrics. In other words, they provide performance results of an ML algorithm for a certain use case on a specific hardware platform. This gives a good representation of how a single device works for a given use-case but makes benchmarking difficult. In the following sections, we present the current state-of-the-art solutions to benchmarking software and hardware with a focus on low-power devices. A summary of the standard KPIs is given in <link linkend="ch1-T1">Table 1.1</link>.</para>
<fig id="ch1-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 1.1:</emphasis> Relevant KPIs for tasks, models and hardware domains. We also mention some combined KPIs to illustrate the inter-dependency of the domains.</para></caption>
<graphic xlink:href="graphics/ch1-tab01.jpg"/>
</fig>
<section class="lev2" id="ch1-2-1">
<title>1.2.1 Machine Learning</title>
<para>Machine learning techniques, and especially deep learning algorithms, are engineered iteratively for a given task&#x2019;s performance. ML algorithms are typically compared in terms of accuracy for a given task, such as segmentation or classification on a specified dataset. The task performance comparison is nowadays well established in the ML community. For classification tasks, accuracy, precision, recall, receiver operating characteristics (ROC), and area under the curve (AUC) are some of the most frequently used metrics. A typical example of such a comparison table is shown in <link linkend="ch1-T2">Table 1.2</link>. We refer the reader to [<link linkend="ch1-bib2">2</link>, <link linkend="ch1-bib3">3</link>, <link linkend="ch1-bib4">4</link>] for a more detailed overview of relevant metrics in ML tasks.</para>
<fig id="ch1-T2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 1.2:</emphasis> Accuracy (<math id="Ch1.T2.m2" display="inline"><mrow><mi>A</mi><mo>&#x2062;</mo><mi>c</mi><mo>&#x2062;</mo><mi>c</mi></mrow></math>) for different object detection settings on COCO test-dev. Adapted from [<link linkend="ch1-bib9">9</link>].</para></caption>
<graphic xlink:href="graphics/ch1-tab02.jpg"/>
</fig>
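<para>As an illustration of these task metrics, accuracy, precision, and recall can all be derived from the confusion counts of a classifier. A minimal sketch in plain Python, with invented counts for a binary task:</para>

```python
# Hypothetical confusion counts for a binary classifier.
tp, fp, tn, fn = 80, 10, 95, 15   # true/false positives and negatives

accuracy = (tp + tn) / (tp + fp + tn + fn)   # fraction of correct predictions
precision = tp / (tp + fp)                   # flagged positives that are real
recall = tp / (tp + fn)                      # real positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"acc={accuracy:.3f} prec={precision:.3f} rec={recall:.3f} f1={f1:.3f}")
```

<para>ROC and AUC extend this picture by sweeping the decision threshold and tracing recall against the false-positive rate.</para>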
<para>In order to give fair comparison for different domains of deep learning, training and test datasets have been established. According to PapersWithCode [<link linkend="ch1-bib6">6</link>], computer vision-related tasks have the largest number of datasets, with long-established quasi-standards such as CIFAR [<link linkend="ch1-bib7">7</link>], ImageNet [<link linkend="ch1-bib8">8</link>], and COCO [<link linkend="ch1-bib5">5</link>]. Specific computer vision tasks have their own standard datasets, such as KITTI [<link linkend="ch1-bib10">10</link>] for autonomous driving and FDDB [<link linkend="ch1-bib11">11</link>] and WIDER Face [<link linkend="ch1-bib12">12</link>] for face detection applications. Natural Language Processing (NLP) tasks are the second most popular tasks for machine learning, with nearly 2000 datasets, including the GLUE [<link linkend="ch1-bib13">13</link>] and SQuAD [<link linkend="ch1-bib14">14</link>] benchmarks. Audio, biomedical, and physics-related tasks equally have their own datasets. Other ML techniques have equivalent benchmarks as well: reinforcement learning (RL) tasks, for example, use standard suites such as OpenAI Gym [<link linkend="ch1-bib15">15</link>], which contains a set of tasks to test reinforcement learning algorithms. Here the tasks take place in a virtual environment, and all the physics and interactions are handled by the environment.</para>
<para><emphasis role="strong">The importance of the data set</emphasis></para>
<para>The importance of the datasets can clearly be seen when looking at SNNs. Currently, the performance of SNNs does not reach DNN performance. Research in SNNs has focused on the structure of the network and learning algorithms rather than on task performance. Thus, early work took well-known DNN datasets and transformed them into event-based versions, such as MNIST-DVS, N-MNIST, and N-Caltech101 [<link linkend="ch1-bib16">16</link>]. Only recently, with the advent of event-based cameras, have SNNs been applied to datasets adapted to various use-cases (e.g., DVS128 [<link linkend="ch1-bib17">17</link>] and TIDIGITS [<link linkend="ch1-bib18">18</link>]). These new datasets will now allow us to see whether SNNs can truly rival their DNN counterparts.</para>
<para>The standard ML benchmarking discussed above usually focuses on accuracy. This means that the resources required by the underlying algorithm, and thus its power consumption, are ignored. In resource-constrained use cases such as those in edge ML, the models are designed to provide a computational advantage. For resource-constrained systems, algorithms can therefore also be compared in terms of complexity, which determines the runtime constraints. In classical machine learning, there are well-established metrics for comparing the complexity of algorithms. For example, decision trees are characterized by the number of nodes and the depth of the tree [<link linkend="ch1-bib19">19</link>]. NNs, on the other hand, are usually compared in terms of the number of parameters or the number of MAC operations [<link linkend="ch1-bib20">20</link>, <link linkend="ch1-bib21">21</link>, <link linkend="ch1-bib22">22</link>]. We refer the reader to the survey by Hu et al. [<link linkend="ch1-bib23">23</link>] for further discussion of model complexity. <link linkend="ch1-T3">Table 1.3</link> shows a classic representation of results for an edge ML algorithm, taking into account the resources used:</para>
<fig id="ch1-T3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 1.3:</emphasis> Representation of resource-constrained KPIs, adapted from [<link linkend="ch1-bib20">20</link>].</para></caption>
<graphic xlink:href="graphics/ch1-tab03.jpg"/>
</fig>
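<para>Complexity KPIs such as parameter and MAC counts follow directly from the layer shapes of a model. A sketch for a small fully connected network (the layer widths are invented for illustration):</para>

```python
# Hypothetical fully connected network: 784 inputs, one hidden
# layer of 128 units, 10 outputs.
layers = [784, 128, 10]

params = 0
macs = 0
for n_in, n_out in zip(layers, layers[1:]):
    params += n_in * n_out + n_out   # weight matrix plus bias vector
    macs += n_in * n_out             # one MAC per weight per inference

print(f"{params} parameters, {macs} MACs per inference")
```

<para>Deep learning libraries report the same quantities automatically; the point here is only that they are pure functions of the architecture, measurable without any hardware.</para>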
<para>In low-power systems, the number of operations, multiply-accumulate (MAC) or multiply-add (MAD), is also used as an NN optimization parameter. The computation latency of an arithmetic block is also highly dependent on the precision used to represent the weights and activations of the NN (i.e., 8-bit computations usually run at higher frequencies than 32-bit ones). For tiny devices, the type and number of layers of a neural network may be a metric of interest, as some hardware may be optimized for certain architectures: some platforms support separable convolutions, for example, while others do not. The maximum supported activation size for a network layer can also be a limiting factor, since some models might exceed this constraint on some embedded platforms.</para>
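<para>How strongly the layer type affects the operation count can be seen by comparing a standard convolution with a depthwise-separable one; the feature-map and channel dimensions below are invented for illustration:</para>

```python
def conv_macs(h, w, c_in, c_out, k):
    # Standard convolution: each output pixel of each output channel
    # needs a k*k*c_in dot product.
    return h * w * c_out * k * k * c_in

def separable_macs(h, w, c_in, c_out, k):
    # Depthwise (one k*k filter per input channel) followed by a
    # pointwise (1x1) convolution mixing channels.
    return h * w * c_in * k * k + h * w * c_in * c_out

# Hypothetical layer: 32x32 feature map, 64 -> 128 channels, 3x3 kernel.
std = conv_macs(32, 32, 64, 128, 3)
sep = separable_macs(32, 32, 64, 128, 3)
print(f"standard: {std} MACs, separable: {sep} MACs, ratio {std / sep:.1f}x")
```

<para>A platform without native support for the depthwise stage would forgo this saving, which is why the layer inventory of a model matters when matching it to hardware.</para>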
<para>Standard SNN topologies have also been compared using frameworks [<link linkend="ch1-bib24">24</link>]. Among the metrics that can be used to compare SNN models, the type of neurons and synapses, the number of emitted spikes and synaptic operations, and the rate of the SNNs are the most often used.</para>
<para>It remains difficult, however, to compare algorithms across paradigms, especially when comparing deep learning with emerging paradigms like SNNs. While some efforts have been made to compare ANNs and SNNs [<link linkend="ch1-bib25">25</link>], a standard set of metrics has yet to be defined.</para>
</section>
<section class="lev2" id="ch1-2-2">
<title>1.2.2 Hardware</title>
<para>An increasing number of hardware evaluation tools aim at benchmarking ML applications directly on the hardware. For example, QuTiBench [<link linkend="ch1-bib37">37</link>] presents a benchmarking tool that takes algorithmic optimization and co-design into account. The MLMark [<link linkend="ch1-bib27">27</link>] benchmark targets ML applications running on MCUs at the edge. However, both QuTiBench and MLMark models are too large for tiny applications and require large memories, which are not available on tiny edge devices. TinyMLPerf [<link linkend="ch1-bib28">28</link>] provides benchmarks for tiny systems based on imposed models and tasks, yielding latency and speed-related KPIs; submission of results using other network architectures is allowed in its open division. Further tools, like SMAUG [<link linkend="ch1-bib29">29</link>], MAESTRO [<link linkend="ch1-bib30">30</link>] and Aladdin [<link linkend="ch1-bib31">31</link>], provide software solutions to emulate workloads on deep-learning accelerators using varying topologies.</para>
<para>The power consumption of edge ML processing hardware is of utmost interest as it directly impacts the battery lifetime of a system. Dynamic power dominates in most high-throughput applications, while leakage power is only significant in low duty cycle modes [<link linkend="ch1-bib32">32</link>], where power gating, body biasing, and voltage scaling techniques are employed to reduce leakage. Peak power consumption corresponds to the maximum power consumption measured, which becomes relevant for battery- or energy harvesting-supplied applications.</para>
<para>The throughput metric indicates the number of operations the hardware can perform per second, while latency is the time needed to perform an entire inference. Note that the peak throughput usually cannot be reached for all network topologies, and latency does not scale directly with parallelization, as the peak throughput does [<link linkend="ch1-bib33">33</link>]. Thus, latency is a combined HW/SW metric. It can be measured by running multiple inferences and averaging the execution time. All parameters needed to run the inference should be loaded before the inference time is measured.</para>
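<para>The averaging procedure just described can be sketched as follows. The inference call here is a stand-in stub; on a real device the model parameters would be loaded before the timed loop starts, and warm-up runs let caches and power states settle:</para>

```python
import time

def run_inference(x):
    # Stand-in for an on-device inference call (a dummy workload here).
    return sum(v * v for v in x)

def mean_latency_ms(infer, sample, n_runs=100, n_warmup=10):
    for _ in range(n_warmup):          # warm-up passes, not timed
        infer(sample)
    t0 = time.perf_counter()
    for _ in range(n_runs):
        infer(sample)
    return (time.perf_counter() - t0) / n_runs * 1e3

sample = list(range(1000))
print(f"mean latency: {mean_latency_ms(run_inference, sample):.4f} ms")
```

<para>Reporting the averaged figure together with the number of runs and the input size makes the measurement reproducible across platforms.</para>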
<para>The CMOS technology employed for the hardware design impacts the die size and the area efficiency, and thus also directly determines the cost. Area efficiency provides a figure of merit for the throughput, limited by hardware resources and frequency, that can be achieved per unit area. On-chip memory size provides a raw estimate of the number of NN parameters that can be stored on the chip. In a multi-core architecture, both the number of neurons and the number of synapses per core are usually given.</para>
<para>Energy efficiency refers to the throughput that can be achieved per watt, which is equivalent to the number of operations per joule. To obtain this KPI, an NN is deployed on an inference accelerator, and execution time and power consumption are measured while performing inference. In the case of NNs, the multiply-accumulate (MAC) operation corresponds to two operations. Note that the bit precision of each operation directly impacts both the accuracy and the energy efficiency (e.g., 32-bit float versus 8-bit integer) and must therefore be carefully traded off. Energy per operation and energy per neuron are fair metrics if the bit resolution is provided, since they are independent of the NN algorithm employed and therefore only hardware-related.</para>
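<para>Deriving this KPI from measurements can be sketched as follows; the operation count, latency, and power figures are invented for illustration, with each MAC counted as two operations as noted above:</para>

```python
# Hypothetical measurements for one inference on an accelerator.
macs_per_inference = 101_632                 # from the model's layer shapes
ops_per_inference = 2 * macs_per_inference   # each MAC = multiply + add
latency_s = 0.004                            # measured inference time: 4 ms
power_w = 0.050                              # measured average power: 50 mW

energy_j = power_w * latency_s               # energy per inference
ops_per_joule = ops_per_inference / energy_j
tops_per_watt = ops_per_joule / 1e12         # same quantity, different unit

print(f"{energy_j * 1e6:.0f} uJ per inference, "
      f"{ops_per_joule:.3e} OPS/J = {tops_per_watt:.2e} TOPS/W")
```

<para>Quoting the bit precision alongside such figures is essential, since the same hardware yields very different numbers at 8-bit and 32-bit resolution.</para>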
<para>Some hardware only supports a limited number of layers and layer types with restricted dimensions. Others provide optimizations and specialized units. These optimizations, while not being directly comparable, have a strong impact on the hardware KPIs. Furthermore, power consumption is influenced by the core voltage supply, which depends on the CMOS technology used for the hardware design. Thus, the energy efficiency metric (TOPS/W) can be misleading unless all hardware restrictions are known. The same applies to other representations such as GOPS/W. A typical display of performance in terms of OPS and associated power is presented in <link linkend="ch1-T4">Table 1.4</link>, from which the TOPS/W metric can be extrapolated. However, recent publications provide combined metrics, as shown in <link linkend="ch1-T5">Table 1.5</link>.</para>
<fig id="ch1-T4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 1.4:</emphasis> Typical display of performance comparison of neuromorphic hardware platforms, adapted from [<link linkend="ch1-bib34">34</link>].</para></caption>
<graphic xlink:href="graphics/ch1-tab04.jpg"/>
</fig>
<fig id="ch1-T5" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 1.5:</emphasis> Recent display of performance comparison of neuromorphic hardware platforms, adapted from [<link linkend="ch1-bib35">35</link>].</para></caption>
<graphic xlink:href="graphics/ch1-tab05.jpg"/>
</fig>
<para>Processing hardware is limited by the supported arithmetic precisions for parameters and activations, with the previously mentioned effects on accuracy. Some hardware implementations support several bit resolutions, allowing throughput, memory needs, and accuracy to be traded off dynamically. Generally, lower precision leads to lower algorithmic accuracy.</para>
</section>
</section>
<section class="lev1" id="ch1-3">
<title>1.3 Guidelines</title>
<para>Benchmarking of ML applications cannot be tackled as a standalone problem at the level of hardware alone or algorithms alone. A holistic view requires a wide range of expertise and domains: a multidisciplinary and multidimensional approach considering, among other things, the hardware platform, the NN (model), and the use-case under evaluation. In order to make the right choices for building blocks, the system integrator needs to know the KPIs that different NNs will be able to deliver on different hardware platforms for a given use-case.</para>
<para>This section explains why a multidisciplinary approach combining both algorithms and hardware is needed to avoid drawing unfair and misleading conclusions and comparisons. In the following, we first describe fair and unfair benchmarking in Section 1.3.1, and then present a combined KPI approach and guidelines for benchmarking in Sections 1.3.2 and 1.3.3.</para>
<section class="lev2" id="ch1-3-1">
<title>1.3.1 Fair and Unfair Benchmarking</title>
<para>With the new generations of hardware accelerators, many hardware optimizations aim to co-optimize energy and performance, such as zero-skipping components, in-memory computing, and multi-core convolution units. However, it is sometimes unclear whether these optimization features are correctly exploited when embedding complex deep learning models. This lack of transparency in the optimization and embedding processes results in sub-optimal deployments on the hardware. Furthermore, the SDK documentation for a large number of accelerators is unclear or lacks the critical content that high-level developers and data scientists need to perform inference-time optimizations. This makes the embedding process and the subsequent measurement of the KPIs difficult.</para>
<para>Today, most models deployed on hardware are trained on GPU machines and deployed on target hardware platforms using their respective optimizations. The wide variety of optimizations employed in different hardware implementations [<link linkend="ch1-bib36">36</link>, <link linkend="ch1-bib37">37</link>] target specific use-cases, which might favor one benchmarking algorithm (and its underlying layer types) over another, further complicating fair benchmarking. Thus, some hardware solutions outperform others by orders of magnitude on specific tasks while performing poorly on others. This type of benchmarking is unfair, as the models are not optimized and thus do not take full advantage of each platform. Their KPIs are comparable, but the benchmarking is unfair with respect to the hardware, as a model specially designed for a particular platform could outperform another model deployed on another platform; see <link linkend="ch1-F1">Figure 1.1a</link>. This shows that use-case-agnostic benchmarking can be misleading: a platform might receive a low score on general benchmarks while performing excellently on a hardware-tailored task.</para>
<fig id="ch1-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 1.1:</emphasis> Benchmarking fairness. (a) Unfair benchmarking: the KPIs are comparable, but the benchmarked hardware platforms are not exploited to their full potential. (b) Fair benchmarking: the hardware platforms are exploited to their full potential, but the resulting combined KPIs (KPI<math id="Ch1.F1.m2" display="inline"><msub><mi></mi><mrow><mi>C</mi><mo>&#x2062;</mo><mi>B</mi></mrow></msub></math>) are not comparable.</para></caption>
<graphic xlink:href="graphics/ch1-fig01.jpg"/>
</fig>
<para>In contrast, fair benchmarking based on a defined use-case (independently of the model used) would exploit all the tools and optimizations provided by the vendor to use the hardware to its full potential. However, the results of such benchmarks can be challenging to compare, as the base model and optimizations differ between the compared hardware; see <link linkend="ch1-F1">Figure 1.1b</link>. By comparison, conventional processor benchmarks do not account for the underlying optimizations: a superscalar processor is benchmarked against a non-superscalar processor using the same tests.</para>
<para>One particular aspect to take into account in the design of an inference accelerator is the selection of the CMOS technology and embedded non-volatile memory (eNVM). If eNVM is used to exploit the fact that no power is consumed to retain stored values after writing, qualification of the memory by the foundry in the selected CMOS process is necessary for industrialization and is therefore a crucial criterion. The selection of the CMOS process impacts the cost and size of the inference accelerator IP, which needs to be considered. Moreover, the CMOS process also impacts the active power and leakage power of the inference ASIC and must be part of the information provided for a fair comparison between inference accelerators fabricated in different CMOS processes.</para>
<para>Challenges remain in the method of comparison. Benchmarking approaches for von Neumann architectures are relatively widespread and standardized [<link linkend="ch1-bib38">38</link>, <link linkend="ch1-bib39">39</link>]. By contrast, clear benchmarking methodologies for non-von Neumann architectures do not yet exist, making them difficult to compare. In particular, neuromorphic circuit design is an emerging multidisciplinary challenge that is still in an exploratory phase, making comparison of the underlying hardware difficult due to its variety. Although many existing designs report significantly reduced energy consumption figures, they still compare themselves to standard low-power microcontrollers.</para>
<para>Benchmarking should be done at different stages and abstraction levels, considering aspects such as algorithm performance, technical characteristics, architectural parameters, and the flexibility and features the hardware provides for a specific use-case. As of today, different KPI values can be obtained with the same algorithm and the same hardware simply by changing the use-case from always-on to event-based.</para>
</section>
<section class="lev2" id="ch1-3-2">
<title>1.3.2 Combined KPIs and Approaches for Benchmarking</title>
<para>The application deployment KPIs lie at the intersection of the performance indicators required by a given use-case, the model solving the task, and the hardware system on which the application is deployed; see <link linkend="ch1-F2">Figure 1.2</link>. Because of the large number of KPIs that can be reported, it is difficult to compare different platforms objectively, as a platform can perform well on certain KPIs and poorly on others (e.g., simulating an SNN on a CNN accelerator). Furthermore, not all platforms report the same set of metrics, and the metrics are not usually convertible into one another (e.g., energy consumption does not always depend only on MAC operations).</para>
<fig id="ch1-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 1.2:</emphasis> Combined KPIs for fair benchmarking.</para></caption>
<graphic xlink:href="graphics/ch1-fig02.jpg"/>
</fig>
<para>Some task-related metrics depend heavily on the use-case and application scenario and should be used only in those specific cases. For example, the performance of a keyword-spotting algorithm should not be compared with that of an object-classification algorithm, even though both aim at high accuracy. For these reasons, a (small) set of KPIs is desirable with the following properties:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>Orthogonality</para></listitem>
<listitem><para>Reproducibility</para></listitem>
<listitem><para>Objectiveness</para></listitem>
<listitem><para>Use-case independence</para></listitem>
</itemizedlist>
<para>To assess the performance of NN models running on hardware for a certain use-case, the KPIs should be combined, as shown in Table 1, to express the performance of the application on the hardware platform. In this regard, Fra et al. [<link linkend="ch1-bib40">40</link>] have proposed a multi-metric approach taking into account: 1) accuracy, 2) the number of parameters of the NN, and 3) the memory footprint in MB. These three metrics provide an overview of the NNs: which one performs better on the classification task and which one has a smaller memory footprint. Further metrics that should now be taken into consideration are: 4) the energy consumption per inference, and 5) the number of operations per second.</para>
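<para>The five metrics listed above could be collected into a single record per (model, hardware) pair; the sketch below uses a Python dataclass, with field names of our own choosing rather than any standardized schema.</para>

```python
# Sketch of a combined-KPI record for one (model, hardware) deployment.
# Field names are our own illustration, not a standardized schema.

from dataclasses import dataclass

@dataclass
class CombinedKPI:
    accuracy: float                # 1) task metric, e.g. top-1 accuracy
    num_parameters: int            # 2) size of the NN
    memory_mb: float               # 3) memory footprint in MB
    energy_per_inference_j: float  # 4) energy consumption per inference
    ops_per_second: float          # 5) number of operations per second

    def meets(self, min_accuracy, max_energy_j):
        """Check this deployment against use-case target KPIs."""
        return (self.accuracy >= min_accuracy
                and self.energy_per_inference_j <= max_energy_j)

# Invented example values for one deployment:
kpi = CombinedKPI(accuracy=0.91, num_parameters=1_200_000, memory_mb=1.2,
                  energy_per_inference_j=2e-3, ops_per_second=4e9)
```

<para>Keeping all five fields together per deployment makes it explicit when two platforms are being compared on incomplete or mismatched metric sets.</para>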
<para>The resulting KPIs of the deployment could also contain an indicator of the flexibility of the hardware accelerator. For a comparison in terms of flexibility, it is necessary to indicate the supported layer types, the supported bit resolutions for inputs, parameters, and activation functions, and the sizes of the kernel filters. By combining metrics that depend on the NN algorithm and the hardware, a fair comparison for a use-case can be achieved if the number of parameters of the NN is optimized and the dataset employed is the same.</para>
</section>
<section class="lev2" id="ch1-3-3">
<title>1.3.3 Outlook: Use-case Based Benchmarking</title>
<para>A solution to the aforementioned challenge would be a use-case-dependent benchmarking that does not rely on any particular model architecture. In an industrial setting, it is desirable to obtain high performance independently of the techniques used; what matters is that the application performs within the given constraints of the use-case.</para>
<para>A solution is illustrated in <link linkend="ch1-F3">Figure 1.3</link>. In this paradigm, a use-case would be defined by target KPIs to reach, such as a minimum accuracy and a maximum energy. To benchmark the hardware, an automated search technique, such as neural architecture search (NAS), would try to find the model that fits the target hardware and then optimize the model further to improve latency or memory use. This type of benchmarking would be use-case dependent and model agnostic, apart from the meta-model driving the automated search. Such a benchmarking method would output comparable (combined) KPIs, making it possible to compare hardware platforms and select the best one. Of course, an extensive benchmarking suite covering several use-cases (audio-based, image-based, classification, regression, etc.) is necessary to ensure fairness across domains.</para>
<fig id="ch1-F3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 1.3:</emphasis> Benchmarking pipeline based on use-cases. An automated search finds the best possible model exploiting the performance offered by each target hardware platform. The resulting combined KPIs are comparable.</para></caption>
<graphic xlink:href="graphics/ch1-fig03.jpg"/>
</fig>
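<para>The pipeline above can be sketched as a simple search loop. Everything below is a schematic illustration, with <code>sample_architecture</code>, <code>train</code>, and <code>deploy_and_measure</code> standing in for a real NAS engine, a training pipeline, and a vendor deployment toolchain.</para>

```python
# Schematic of use-case-based, model-agnostic benchmarking: search for the
# best model on a given platform subject to the use-case target KPIs.
# All callables are hypothetical placeholders, not a real API.

def benchmark_platform(hardware, targets, budget,
                       sample_architecture, train, deploy_and_measure):
    """Return the best measured KPI record meeting `targets`, or None.

    `targets` is a dict with 'min_accuracy' and 'max_energy_j'.
    `budget` is the number of candidate models to try.
    """
    best = None
    for _ in range(budget):
        model = train(sample_architecture())
        kpi = deploy_and_measure(model, hardware)  # e.g. {'accuracy': ..., 'energy_j': ...}
        if kpi["accuracy"] < targets["min_accuracy"]:
            continue                               # fails the use-case target
        if kpi["energy_j"] > targets["max_energy_j"]:
            continue
        # among feasible models, prefer the lowest energy per inference
        if best is None or kpi["energy_j"] < best["energy_j"]:
            best = kpi
    return best
```

<para>Running the same loop with the same use-case definition against each candidate platform yields combined KPIs that can be compared directly, independent of the model each platform ends up with.</para>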
<para>Following the methodology presented, some guidelines should be followed to ensure that the extracted KPIs respect the properties presented in Section 1.3.2. In addition to measuring the combined KPIs, it is necessary to provide information on the entire deployment pipeline: the KPIs related to the task being solved, the (final) model deployed on the hardware, and the characteristics of the hardware. The combined KPIs can then be calculated from this information.</para>
<para>The use-cases should be clearly defined and cover several machine learning tasks. Although the methodology can be applied to a single use-case to compare a few hardware platforms, the industrial application cases are generally broad. It is, therefore, preferable to select a neuromorphic platform that offers the best performance for a wide range of tasks. This can only be achieved with a benchmarking tool that is diversified in terms of the tasks to be solved.</para>
<para>The methodology also requires a complete software toolchain to enable rapid and reproducible deployments of the NNs on the hardware. Quantization-aware training tools, or better still hardware-aware training tools, compatible with the target hardware platforms are beneficial. The efficient execution of algorithms depends not only on the hardware architecture, such as the processing resources, but equally on an efficient mapping strategy that schedules the hardware resources for high throughput and low power consumption. Depending on the architecture, algorithm-to-hardware compilers or on-board schedulers ensure this optimization.</para>
<para>Finally, adequate documentation about the hardware technology, the search algorithm used for benchmarking, the use-case realized by the benchmark, and the interpretation of the results provided by the benchmark is necessary to empower users in their selection of the most suitable hardware platform.</para>
</section>
</section>
<section class="lev1" id="ch1-4">
<title>1.4 Conclusion</title>
<para>In this paper, we have summarized the standard techniques for benchmarking NN accelerator hardware and ML software, and we have specified the KPIs that are most relevant for resource-aware inference. We have shown by example that, in ultra-low-power or neuromorphic systems, separating hardware, ML algorithms, and use-case parameters leads to ineffective comparisons; only when these three are considered holistically can a system be meaningfully benchmarked. Integrating KPIs that allow benchmarking at the system level in this way is complex, but it is important: the current inability to benchmark IoT systems is reducing uptake by industry. We have therefore proposed a benchmarking methodology based on use-cases, in which the ML algorithm is adapted to the hardware to allow fair comparison. Finally, we have provided guidelines on which aspects to take into account while developing such a benchmarking tool to ensure that the resulting KPIs are comparable.</para>
</section>
<section class="lev1">
<title>Acknowledgements</title>
<para>This work is supported through the project ANDANTE. ANDANTE has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 876925. The JU receives support from the European Union&#x2019;s Horizon 2020 research and innovation programme and from France, Belgium, Germany, the Netherlands, Portugal, Spain, and Switzerland. ANDANTE has also received funding from the German Federal Ministry of Education and Research (BMBF) under Grant No. 16MEE0116. The authors are responsible for the content of this publication.</para>
</section>
<section class="lev1" id="ch1-Ref">
<title>References</title>
<para id="ch1-bib1">[1] M. Davies. Benchmarks for progress in neuromorphic computing. <emphasis>Nature Machine Intelligence</emphasis>, 1(9):386&#x2013;388, 2019.</para>
<para id="ch1-bib2">[2] B. J. Erickson and F. Kitamura. Magician&#x2019;s corner: 9. performance metrics for machine learning models. <emphasis>Radiology: Artificial Intelligence</emphasis>, 3(3), 2021.</para>
<para id="ch1-bib3">[3] A. R&#xe1;cz, D. Bajusz, and K. H&#xe9;berger. Multi-level comparison of machine learning classifiers and their performance metrics. <emphasis>Molecules</emphasis>, 24(15), 2019.</para>
<para id="ch1-bib4">[4] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in python. <emphasis>the Journal of machine Learning research</emphasis>, 12:2825&#x2013;2830, 2011.</para>
<para id="ch1-bib5">[5] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll&#xe1;r, and C. L. Zitnick. Microsoft coco: Common objects in context. In <emphasis>European conference on computer vision</emphasis>, pages 740&#x2013;755. Springer, 2014.</para>
<para id="ch1-bib6">[6] <ulink url="https://paperswithcode.com">https://paperswithcode.com</ulink>. Website, 2021.</para>
<para id="ch1-bib7">[7] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. 2009.</para>
<para id="ch1-bib8">[8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In <emphasis>2009 IEEE conference on computer vision and pattern recognition</emphasis>, pages 248&#x2013;255. IEEE, 2009.</para>
<para id="ch1-bib9">[9] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Doll&#xe1;r. Focal loss for dense object detection. In <emphasis>Proceedings of the IEEE international conference on computer vision</emphasis>, pages 2980&#x2013;2988, 2017.</para>
<para id="ch1-bib10">[10] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? the kitti vision benchmark suite. In <emphasis>2012 IEEE conference on computer vision and pattern recognition</emphasis>, pages 3354&#x2013;3361. IEEE, 2012.</para>
<para id="ch1-bib11">[11] V. Jain and E. Learned-Miller. Fddb: A benchmark for face detection in unconstrained settings. Technical report, UMass Amherst technical report, 2010.</para>
<para id="ch1-bib12">[12] S. Yang, P. Luo, C.-C. Loy, and X. Tang. Wider face: A face detection benchmark. In <emphasis>Proceedings of the IEEE conference on computer vision and pattern recognition</emphasis>, pages 5525&#x2013;5533, 2016.</para>
<para id="ch1-bib13">[13] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman. Glue: A multi-task benchmark and analysis platform for natural language understanding. <emphasis>arXiv preprint arXiv:1804.07461</emphasis>, 2018.</para>
<para id="ch1-bib14">[14] P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang. Squad: 100,000+ questions for machine comprehension of text. <emphasis>arXiv preprint arXiv:1606.05250</emphasis>, 2016.</para>
<para id="ch1-bib15">[15] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. Openai gym. <emphasis>arXiv preprint arXiv:1606.01540</emphasis>, 2016.</para>
<para id="ch1-bib16">[16] G. Orchard, A. Jayawant, G. K. Cohen, and N. Thakor. Converting static image datasets to spiking neuromorphic datasets using saccades. <emphasis>Frontiers in neuroscience</emphasis>, 9:437, 2015.</para>
<para id="ch1-bib17">[17] A. Amir, B. Taba, D. Berg, T. Melano, J. McKinstry, C. Di Nolfo, T. Nayak, A. Andreopoulos, G. Garreau, M. Mendoza, et al. A low power, fully event-based gesture recognition system. In <emphasis>Proceedings of the IEEE conference on computer vision and pattern recognition</emphasis>, pages 7243&#x2013;7252, 2017.</para>
<para id="ch1-bib18">[18] J. Anumula, D. Neil, T. Delbruck, and S.-C. Liu. Feature representations for neuromorphic audio spike streams. <emphasis>Frontiers in neuroscience</emphasis>, 12:23, 2018.</para>
<para id="ch1-bib19">[19] H. Buhrman and R. De Wolf. Complexity measures and decision tree complexity: a survey. <emphasis>Theoretical Computer Science</emphasis>, 288(1):21&#x2013;43, 2002.</para>
<para id="ch1-bib20">[20] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In <emphasis>Proceedings of the IEEE conference on computer vision and pattern recognition</emphasis>, pages 4510&#x2013;4520, 2018.</para>
<para id="ch1-bib21">[21] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In <emphasis>Proceedings of the European conference on computer vision (ECCV)</emphasis>, pages 116&#x2013;131, 2018.</para>
<para id="ch1-bib22">[22] M. Tan and Q. Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In <emphasis>International conference on machine learning</emphasis>, pages 6105&#x2013;6114. PMLR, 2019.</para>
<para id="ch1-bib23">[23] X. Hu, L. Chu, J. Pei, W. Liu, and J. Bian. Model complexity of deep learning: A survey. <emphasis>Knowledge and Information Systems</emphasis>, 63(10):2585&#x2013;2619, 2021.</para>
<para id="ch1-bib24">[24] S. R. Kulkarni, M. Parsa, J. P. Mitchell, and C. D. Schuman. Benchmarking the performance of neuromorphic and spiking neural network simulators. <emphasis>Neurocomputing</emphasis>, 447:145&#x2013;160, 2021.</para>
<para id="ch1-bib25">[25] S. Narduzzi, S. A. Bigdeli, S.-C. Liu, and L. A. Dunbar. Optimizing the consumption of spiking neural networks with activity regularization. In <emphasis>ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</emphasis>, pages 61&#x2013;65. IEEE, 2022.</para>
<para id="ch1-bib26">[26] M. Blott. <emphasis>Benchmarking Neural Networks on Heterogeneous Hardware</emphasis>. PhD thesis, Trinity College, 2021.</para>
<para id="ch1-bib27">[27] P. Torelli and M. Bangale. Measuring inference performance of machine-learning frameworks on edge-class devices with the mlmark benchmark. <emphasis>Technical Report. Available online: <ulink url="https://www.eembc.org/techlit/articles/MLMARK-WHITEPAPERFINAL-1.pdf">https://www.eembc.org/techlit/articles/MLMARK-WHITEPAPERFINAL-1.pdf</ulink> (accessed on 5 April 2021)</emphasis>, 2021.</para>
<para id="ch1-bib28">[28] C. R. Banbury, V. J. Reddi, M. Lam, W. Fu, A. Fazel, J. Holleman, X. Huang, R. Hurtado, D. Kanter, A. Lokhmotov, et al. Benchmarking tinyml systems: Challenges and direction. <emphasis>arXiv preprint arXiv:2003.04821</emphasis>, 2020.</para>
<para id="ch1-bib29">[29] S. Xi, Y. Yao, K. Bhardwaj, P. Whatmough, G.-Y. Wei, and D. Brooks. Smaug: End-to-end full-stack simulation infrastructure for deep learning workloads. <emphasis>ACM Transactions on Architecture and Code Optimization (TACO)</emphasis>, 17(4):1&#x2013;26, 2020.</para>
<para id="ch1-bib30">[30] H. Kwon, P. Chatarasi, M. Pellauer, A. Parashar, V. Sarkar, and T. Krishna. Understanding reuse, performance, and hardware cost of dnn dataflows: A data-centric approach using maestro. 2020.</para>
<para id="ch1-bib31">[31] Y. S. Shao, B. Reagen, G.-Y. Wei, and D. Brooks. Aladdin: A pre-rtl, power-performance accelerator simulator enabling large design space exploration of customized architectures. In <emphasis>2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)</emphasis>, pages 97&#x2013;108. IEEE, 2014.</para>
<para id="ch1-bib32">[32] F. Fallah and M. Pedram. Standby and active leakage current control and minimization in cmos vlsi circuits. <emphasis>IEICE transactions on electronics</emphasis>, 88(4):509&#x2013;519, 2005.</para>
<para id="ch1-bib33">[33] J. Hanhirova, T. K&#xe4;m&#xe4;r&#xe4;inen, S. Sepp&#xe4;l&#xe4;, M. Siekkinen, V. Hirvisalo, and A. Yl&#xe4;-J&#xe4;&#xe4;ski. Latency and throughput characterization of convolutional neural networks for mobile computer vision. In <emphasis>Proceedings of the 9th ACM Multimedia Systems Conference</emphasis>, pages 204&#x2013;215, 2018.</para>
<para id="ch1-bib34">[34] M. Breiling, R. Struharik, and L. Mateu. Machine learning: Elektronenhirn 4.0. 2019.</para>
<para id="ch1-bib35">[35] Y.-H. Chen, T.-J. Yang, J. Emer, and V. Sze. Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices. <emphasis>IEEE Journal on Emerging and Selected Topics in Circuits and Systems</emphasis>, 9(2):292&#x2013;308, 2019.</para>
<para id="ch1-bib36">[36] P. Jokic, E. Azarkhish, A. Bonetti, M. Pons, S. Emery, and L. Benini. A construction kit for efficient low power neural network accelerator designs. <emphasis>arXiv preprint arXiv:2106.12810</emphasis>, 2021.</para>
<para id="ch1-bib37">[37] M. Blott, L. Halder, M. Leeser, and L. Doyle. Qutibench: Benchmarking neural networks on heterogeneous hardware. <emphasis>ACM Journal on Emerging Technologies in Computing Systems (JETC)</emphasis>, 15(4):1&#x2013;38, 2019.</para>
<para id="ch1-bib38">[38] EEMBC ULPMark: <ulink url="https://www.eembc.org/ulpmark/">https://www.eembc.org/ulpmark/</ulink>. Website, 2021.</para>
<para id="ch1-bib39">[39] EEMBC CoreMark: <ulink url="https://www.eembc.org/coremark/">https://www.eembc.org/coremark/</ulink>. Website, 2021.</para>
<para id="ch1-bib40">[40] V. Fra, E. Forno, R. Pignari, T. Stewart, E. Macii, and G. Urgese. Human activity recognition: suitability of a neuromorphic approach for on-edge aiot applications. <emphasis>Neuromorphic Computing and Engineering</emphasis>, 2022.</para>
</section>
</chapter>
<chapter class="chapter" id="ch2" label="2" xreflabel="2">
<title>Benchmarking the Epiphany Processor as a Reference Neuromorphic Architecture</title>
<subtitle>Maarten Molendijk<sup>1,2</sup>, Kanishkan Vadivel<sup>2</sup>, Federico Corradi<sup>2,1</sup>, Gert-Jan van Schaik<sup>1</sup>, Amirreza Yousefzadeh<sup>1</sup>, and Henk Corporaal<sup>2</sup></subtitle>
<affiliation><sup>1</sup>imec, Netherlands<?lb?><sup>2</sup>Technical University of Eindhoven, Netherlands</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>This short article explains why the Epiphany architecture is a proper reference for digital large-scale neuromorphic design. We compare the Epiphany architecture with several modern digital neuromorphic processors. We show the result of mapping the binary LeNet-5 neural network onto a few modern neuromorphic architectures and demonstrate the efficient use of memory in Epiphany. Finally, we show the results of our benchmarking experiments with Epiphany and propose a few suggestions for improving the architecture for neuromorphic applications. Epiphany can update a neuron in 120 ns on average, which is fast enough for many real-time neuromorphic applications.</para>
<para><emphasis role="strong">Keywords:</emphasis> neuromorphic processor, spiking neural network, bio-inspired processing, artificial intelligence, edge AI</para>
</section>
<section class="lev1" id="ch2-1">
<title>2.1 Introduction and Background</title>
<para>Neuromorphic sensing and computing systems mimic the functions and computational primitives of nervous systems. State-of-the-art deep neural networks (DNNs) have exceeded the accuracy of biological brains (including the human brain) in specific tasks such as video/audio processing, decision-making, planning, and game playing. However, all of these tasks are accomplished without one of the main constraints of biological evolution: energy consumption. Biological constraints pushed evolution toward power-efficient algorithms and architectures. The human brain is an extreme example: it consumes a considerable portion (around 20%) of the body&#x2019;s energy while accounting for less than 3% of its total weight.</para>
<para>Even though the elements of the biological fabric in the brain are not as fast and arguably not as power-efficient as modern silicon technologies, no computing platform comes close to the compute efficiency of the biological brain for processing natural signals. The brain is a perfect example of algorithm&#x2013;hardware co-optimization. As mentioned, the ultimate goal of bio-inspired processing is to process raw sensory data with the minimum amount of power consumption.</para>
<para>The Epiphany architecture was first introduced in 2009 [<link linkend="ch2-bib1">1</link>] as a high-performance, energy-efficient many-core architecture suitable for real-time embedded systems. It contains many RISC processor cores connected by a packet-based mesh Network-on-Chip (NoC). <link linkend="ch2-F1">Figure 2.1</link> shows an overview of the Epiphany architecture. This architecture differs from mainstream von-Neumann-type multi-core processors in that the Epiphany cores communicate directly via the NoC rather than through a single shared memory. The mesh packet-switched network results in highly efficient local data movement between neighbouring processors. However, it introduces possible non-deterministic behaviour, as the order of the packets in the mesh network is not guaranteed. Although a synchronization mechanism is implemented, the RISC processors work individually, and the architecture is not designed for strictly synchronous execution (since that would harm its scalability). Hence, programming Epiphany with a conventional programming model is challenging, and Epiphany never gained enough attention in the mainstream general-purpose processor market.</para>
<fig id="ch2-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 2.1:</emphasis> Overall scalable architecture of Epiphany-III [<link linkend="ch2-bib1">1</link>].</para></caption>
<graphic xlink:href="graphics/ch2-fig01.jpg"/>
</fig>
<para>In 2011, Adapteva, a startup company, introduced the first processor based on the Epiphany architecture (<link linkend="ch2-F2">Figure 2.2</link>). It contains a 16-core RISC Epiphany chip, expandable into a 256-chip platform (4096 cores in total). The chip is implemented in a 65 nm technology node and consumes less than 2 W. A few months later, Adapteva introduced a larger version of the processor with 64 cores. The latest version of the processor [<link linkend="ch2-bib2">2</link>] was announced in 2016 and contains 1024 cores.</para>
<fig id="ch2-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 2.2:</emphasis> Adapteva launched a $99 Epiphany-III-based single-board computer as its first product.</para></caption>
<graphic xlink:href="graphics/ch2-fig02.jpg"/>
</fig>
<para>Despite the failure of Epiphany in the general-purpose compute domain, its architecture is very similar to the neuromorphic processors introduced a few years later (e.g., SpiNNaker in 2013 [<link linkend="ch2-bib3">3</link>], IBM TrueNorth in 2015 [<link linkend="ch2-bib4">4</link>], Intel Loihi in 2018 [<link linkend="ch2-bib5">5</link>], BrainChip AKIDA in 2019 [<link linkend="ch2-bib6">6</link>], and GML NeuronFlow in 2020 [<link linkend="ch2-bib7">7</link>]). The main goal of neuromorphic engineering is to build a brain-inspired processor that executes variations of spiking neural network (SNN) algorithms for real-time sensory signal processing. Implementing neural networks with conventional programming models and compilers is difficult (and inefficient), which has driven a paradigm shift in programming models. A neural network usually contains neurons (as the processing units) and weighted synapses/axons connecting the neurons in a graph-like architecture. Therefore, several graph-based programming models (such as TensorFlow from Google and PyTorch from Facebook) have been introduced to execute such applications efficiently.</para>
<para>The architecture is made up of eNode processing cores and eMesh routers that build the connectivity network. Each eNode contains a RISC processor (1 GHz, with an integer ALU, a floating-point ALU, and a 64-word register file); 4 memory banks (each 64<emphasis>b</emphasis> &#xd7; 1024<emphasis>w</emphasis>) to locally store data (such as synaptic weights and neuron states) and instructions (such as the neuron model); a Network Interface (NI); a Direct Memory Access (DMA) engine to handle incoming/outgoing packets; a few general-purpose timers (for example, to implement periodic leakage); and a memory bus interconnect that allows each memory bank to be accessed simultaneously. The eMesh routers handle three separate networks: a high-performance network for sending data packets (spikes) to other cores at a maximum rate of one packet per clock cycle, and two lower-performance networks (one for reading from another core&#x2019;s memory and one for off-chip communication), introduced to make programming easier. The graph-based programming models mentioned earlier allow the computational load to be split easily over several processing units and synaptic connectivity to be mapped onto the NoC; they are therefore a good fit for architectures like Epiphany.</para>
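<para>From the figures above, the local storage per eNode can be checked with a quick calculation; how that budget is split between instructions, weights, and neuron states is our own illustration, not a statement about any specific mapping.</para>

```python
# Quick arithmetic from the eNode description above: 4 local memory banks,
# each 1024 words of 64 bits, give 32 KB of local storage per core.

BANKS = 4
WORDS_PER_BANK = 1024
BITS_PER_WORD = 64

local_mem_bytes = BANKS * WORDS_PER_BANK * BITS_PER_WORD // 8  # 32768 B = 32 KB

# For example, with 8-bit synaptic weights a core could hold at most
# local_mem_bytes synapses, before accounting for code and neuron states.
max_int8_synapses = local_mem_bytes
```

<para>This per-core budget is what makes the memory-efficient mapping of networks such as binary LeNet-5, discussed in the abstract, a meaningful exercise.</para>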
<para>Like the other neuromorphic architectures, Epiphany is extremely scalable, performs near-memory processing, and is optimized for local data movement (local connectivity) and asynchronous processing. The eMesh network is flexible enough to time-multiplex any arbitrary synaptic connections, and the eCores are flexible enough to implement different neuron models. Most importantly, the architecture is straightforward, which allows easy design-space exploration and benchmarking. Finally, unlike all the other neuromorphic platforms, it is accessible and affordable, which makes it a suitable platform for benchmarking new neuromorphic designs and innovative ideas.</para>
</section>
<section class="lev1" id="ch2-2">
<title>2.2 Comparison with a Few Well-Known Digital Neuromorphic Platforms</title>
<para>The SpiNNaker architecture [<link linkend="ch2-bib3">3</link>] (introduced in 2013) is probably the neuromorphic platform most similar to Epiphany. SpiNNaker contains several ARM cores as processing units, connected through an advanced asynchronous packet-switched network.</para>
<para>Like Epiphany, therefore, the processing core is very flexible and can implement different neuron models with various mapping schemes. Unlike Epiphany, each SpiNNaker chip contains only one router, which has a higher complexity than Epiphany&#x2019;s eMesh router. SpiNNaker&#x2019;s NoC allows for multi-casting (using source-based addressing with a programmable routing table), an optimization on top of the plain mesh NoC.</para>
<para>Contrary to SpiNNaker, IBM TrueNorth [<link linkend="ch2-bib4">4</link>] (introduced in 2015) uses a plain mesh packet-switched network but with optimized processing cores. The NoC in IBM TrueNorth is therefore very similar to Epiphany&#x2019;s. Each core in the TrueNorth architecture is fixed to emulate 256 neurons, each neuron with 256 input synapses (a crossbar architecture) and a single output axon (connectable to 256 neurons in any other core). The cores update all the neurons every 1<emphasis>ms</emphasis>, and the synaptic weights are restricted to binary values. This optimized processing core results in an ultra-low-power neuron update (about 26pJ). However, such constrained cores make the deployment of many neuromorphic applications either impossible or inefficient.</para>
<para>In Intel Loihi [<link linkend="ch2-bib5">5</link>] (introduced in 2018), the processing cores are more flexible than TrueNorth&#x2019;s, and the interconnect is a simple packet-switched mesh. Each core in Loihi emulates 1024 neurons with a fixed neuron model, but the number of input synapses to each neuron and their resolution are flexible (1kb of synaptic memory per neuron). The number of output axons is also flexible, and one axon can be shared among many neurons. Loihi cores additionally accelerate a bio-inspired learning algorithm. The cost of this flexibility is a higher neuron-update energy (about 80pJ) in comparison with TrueNorth (despite using a better technology node).</para>
<para>In addition to the three previous research platforms, many companies have started to build neuromorphic processors for commercial purposes. For example, BrainChip AKIDA (introduced in 2019) and GML NeuronFlow (introduced in 2020) have architectures similar to Loihi&#x2019;s.</para>
<para>One recurring feature of research neuromorphic chips is asynchronous processing and communication. In Loihi, asynchrony reaches inside the core&#x2019;s logic blocks. In SpiNNaker and TrueNorth, the cores work asynchronously with each other in a Globally Asynchronous Locally Synchronous (GALS) structure. In Epiphany, NeuronFlow, and AKIDA, the asynchrony boundary is pushed toward the edges of the chip (asynchronous chip-to-chip connectivity). Regardless of where the boundary of asynchrony lies, it is essential for scalability.</para>
<para>Nevertheless, in all the mentioned architectures, the cores still operate independently of each other. Therefore, the implementation of a globally synchronous algorithm is not optimal on neuromorphic architectures.</para>
</section>
<section class="lev1" id="ch2-3">
<title>2.3 Major Challenges in Neuromorphic Architectures</title>
<para>Since neuromorphic architecture design aims to follow the principles of bio-inspired processing mechanisms within the available nano-electronic technologies, it naturally faces several challenges that result from platform constraints. Many innovative schemes have been introduced to overcome the difficulties of developing neuromorphic technology and designing spiking neural network algorithms. These challenges are discussed below.</para>
<section class="lev2" id="ch2-3-1">
<title>2.3.1 Memory Allocation</title>
<para>One of the main challenges in neuromorphic design is the amount of local memory available near or inside the processing element where the data is consumed. In the brain, there is no separation between memory and computation. This feature eliminates a) the memory bandwidth bottleneck and b) the high cost of moving data between the processing unit and a far-away memory block. To mimic this feature, neuromorphic chips use distributed memories near or inside the processing elements (to keep the synaptic weights and neuron states close to the processing unit). However, on-chip memory built with conventional SRAM technology is not area-efficient (compared to DRAM and Flash) and is therefore expensive. Besides adopting a denser memory technology [<link linkend="ch2-bib8">8</link>], one solution to this problem is proper memory management and maximal reuse of the memory bits.</para>
<para>The important elements to be stored in each processing core are the spike queue(s), synaptic weights, neuron states, and axons (destination addresses). The depth and width of these memories depend heavily on the executable neuron model and the supported connectivity. <link linkend="ch2-T1">Table 2.1</link> shows the memory allocations in different neuromorphic chips.</para>
<fig id="ch2-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 2.1:</emphasis> Memory fragmentations in some digital large-scale neuromorphic chips</para></caption>
<graphic xlink:href="graphics/ch2-tab01.jpg"/>
</fig>
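<para>To make the budgeting concrete, the following back-of-envelope Python sketch splits one Epiphany core&#x2019;s 256kb of local memory over the elements listed above (spike queue, weights, states, axons). The 256kb figure comes from the text; all per-item bit widths below are illustrative assumptions, not published figures.</para>

```python
# Back-of-envelope memory budget for one core with 256 kb of local memory,
# covering the elements listed in Table 2.1. Per-item sizes are assumptions.

CORE_BITS = 256 * 1024            # 256 kb of local memory per core

def core_budget(n_neurons, syn_per_neuron, w_bits=1, state_bits=16,
                axon_bits=12, queue_entries=64, queue_entry_bits=32):
    weights = n_neurons * syn_per_neuron * w_bits   # synaptic weights
    states  = n_neurons * state_bits                # neuron states
    axons   = n_neurons * axon_bits                 # destination addresses
    queue   = queue_entries * queue_entry_bits      # spike queue
    total   = weights + states + axons + queue
    return total, CORE_BITS >= total                # does it fit in the core?

# e.g. 400 neurons with 400 binary synapses each (a LeNet-5-like layer):
total, fits = core_budget(400, 400)
print(total, fits)
```

<para>With binary weights, the weight array dominates the budget, which is why the weight-resolution and weight-sharing trade-offs discussed next matter so much.</para>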
<para>Flexibility in memory allocation allows for optimized mapping of a neural network onto the processor. Different neurons in a neural network have different numbers of inputs/outputs and different amounts of activity. Some neuromorphic chips allow flexible parameter resolution to trade off accuracy against SNN size [<link linkend="ch2-bib5">5</link>]. Since the range of the parameters is sometimes more important than their resolution, using a smaller floating-point representation (like BrainFloat16 [<link linkend="ch2-bib9">9</link>]) may result in better accuracy and power/area performance than using a larger integer format (like int32). Therefore, it is possible to trade off the memory footprint against the complexity of the operations.</para>
<para>Another method to use the memory space efficiently is to store a compressed form of the parameters when there is a high degree of sparsity in the synaptic weight tensor [<link linkend="ch2-bib10">10</link>]. Weight sharing is a further way to use the memory efficiently for spiking Convolutional Neural Networks (sCNN) [<link linkend="ch2-bib5">5</link>] [<link linkend="ch2-bib7">7</link>].</para>
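<para>A minimal sketch of such compressed storage is shown below: a sparse weight row is stored as (index, value) pairs, and the multiply-accumulate loop then skips zero weights entirely. The encoding is a deliberately simple assumption; real chips use a variety of compression formats.</para>

```python
# Sketch of compressed storage for a sparse synaptic weight row.

def compress(weights):
    """Keep only nonzero weights as (synapse index, value) pairs."""
    return [(i, w) for i, w in enumerate(weights) if w != 0]

def sparse_dot(compressed, activations):
    """Multiply-accumulate, skipping zero weights entirely."""
    return sum(w * activations[i] for i, w in compressed)

row = [0, 0, 3, 0, 0, 0, -2, 0]        # 75% sparse weight row
c = compress(row)
print(len(c))                          # stores 2 pairs instead of 8 weights
print(sparse_dot(c, [1] * 8))          # 3 + (-2) = 1
```

<para>The memory saving depends on the sparsity level and on the index width: each stored pair costs an index in addition to the value, so compression only pays off above a certain sparsity.</para>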
<para>Epiphany contains 256kb of memory per core and is the most flexible architecture in <link linkend="ch2-T1">Table 2.1</link>. In the table, N/A means we could not find the data publicly; axons are the destination core addresses used to route spikes from a neuron; and all numbers are for a single processing core inside the mentioned neuromorphic chip. All the above-mentioned schemes can be implemented in Epiphany to use the memory space optimally. To demonstrate the value of flexibility for efficient use of memory, <link linkend="ch2-T2">Table 2.2</link> shows the result of mapping the binary LeNet-5 [<link linkend="ch2-bib11">11</link>] onto the above-mentioned neuromorphic architectures. The average pooling layers are optimized out in the mapping (as average pooling is a linear operation and does not consume stateful neurons). The mappings are hand-optimized with memory as the only constraint. In TrueNorth, several neurons need to be combined to make a single neuron with enough synapses and axons; also, since weight sharing is not used, the weight for each synapse needs to be stored individually. In the flexible architectures, the neuron states are assumed to be 16b, without a refractory mechanism and with a single threshold per channel. Mapping onto SpiNNaker is done with the &#x201c;Convnet Optimized Implementation&#x201d; described in [<link linkend="ch2-bib12">12</link>]. Total memory used is (<emphasis>number of cores &#xd7; memory per core</emphasis>).</para>
<fig id="ch2-T2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 2.2:</emphasis> Mapping LeNet-5 neural network (with binary weights) in different neuromorphic architectures</para></caption>
<graphic xlink:href="graphics/ch2-tab02.jpg"/>
</fig>
</section>
<section class="lev2" id="ch2-3-2">
<title>2.3.2 Efficient Communication</title>
<para>Using a packet to communicate spikes between cores can be very inefficient: a spike packet that carries a single bit of data (the spike) contains several bits for the address. For example, a spike packet in SpiNNaker contains 44b of data to communicate a single binary spike in the AER format [<link linkend="ch2-bib13">13</link>]. There are several possible ways to reduce the number of bits used to communicate spikes. One is to use a more complex neuron model (for example, [<link linkend="ch2-bib14">14</link>] and [<link linkend="ch2-bib15">15</link>]) with a lower firing rate (trading off operation complexity against the number of packets). Another is to compress several spikes into one event; for instance, when several packets share the same destination core, we can easily compress them into one hyper-packet. Epiphany&#x2019;s packets are fixed in size (a 104b packet with a 64b payload), but the format of the payload is programmable.</para>
<para>TrueNorth [<link linkend="ch2-bib4">4</link>] and NeuronFlow [<link linkend="ch2-bib7">7</link>] use a relative addressing scheme, which reduces the number of bits needed for the destination address when a limited communication range is acceptable. For example, in a platform with 4096 cores, if the destination address contains only 4b, a core can only communicate with 16 neighbouring cores, which might be sufficient for many applications. This saves 8b per packet (from a 12b address in a 4096-core system down to a 4b address). Another method to reduce the number of packets is the multi-casting feature used in SpiNNaker [<link linkend="ch2-bib3">3</link>]: a core sends out only one spike, which is multicast within the NoC near the destination cores. Epiphany uses the basic mesh NoC interconnect, which is a shortcoming but contributes to its simplicity.</para>
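<para>The two savings described above are easy to quantify. The short Python sketch below reproduces the 4096-core addressing arithmetic from the text and adds a hyper-packet comparison; the packet layout (one address field plus one bit per spike) is a simplifying assumption for illustration.</para>

```python
# Absolute vs. relative addressing, and hyper-packet compression, using the
# 4096-core figures from the text. Packet layout is an assumed simplification.
import math

N_CORES = 4096
abs_addr_bits = math.ceil(math.log2(N_CORES))   # 12b absolute address
rel_addr_bits = 4                                # reaches 16 neighbouring cores
print(abs_addr_bits - rel_addr_bits)             # 8 bits saved per packet

def packet_bits(n_spikes, addr_bits, spike_bits=1):
    """Bits for n separate spike packets vs. one hyper-packet to one core."""
    separate = n_spikes * (addr_bits + spike_bits)
    hyper    = addr_bits + n_spikes * spike_bits   # one shared address field
    return separate, hyper

sep, hyp = packet_bits(8, abs_addr_bits)
print(sep, hyp)   # 104 vs 20 bits for 8 spikes sharing one destination
```
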
</section>
<section class="lev2" id="ch2-3-3">
<title>2.3.3 Mapping SNN onto Hardware</title>
<para>An optimized mapping algorithm can reduce the memory footprint (by maximally sharing parameters), balance the load across cores (as not all the neurons in an SNN are equally active) and reduce core-to-core communication (which is expensive in terms of power consumption and latency). A flexible number of neurons per core and synapses per neuron allows the mapping optimizer to find a better solution. The Epiphany platform can be used to benchmark different mapping algorithms in the neuromorphic domain because of its flexible and unified memory architecture.</para>
</section>
<section class="lev2" id="ch2-3-4">
<title>2.3.4 On-chip Learning</title>
<para>On-chip learning is supported as a forward-looking feature in some neuromorphic chips (like Loihi [<link linkend="ch2-bib5">5</link>] and AKIDA [<link linkend="ch2-bib16">16</link>]). However, implementing hardware acceleration for on-chip learning is challenging. First, because the algorithm domain is very dynamic (experimental), it is difficult to find a suitable algorithm for a wide range of applications. Second, many applications can be pre-trained and only require fine-tuning after deployment; the learning accelerator might therefore be used only for the last few layers of the neural network (after the general feature-extraction layers). Epiphany does not have a hardware-accelerated learning engine, but it allows software implementations of such algorithms and therefore the benchmarking of new learning algorithms.</para>
</section>
<section class="lev2" id="ch2-3-5">
<title>2.3.5 Idle Power Consumption</title>
<para>One of the challenges in event-based neuromorphic processors is the power consumed while the cores are idle (no event to be processed). Around 30% of the power consumption of TrueNorth [<link linkend="ch2-bib4">4</link>] and Loihi [<link linkend="ch2-bib17">17</link>] is reported to be idle power, and this fraction can be even higher when the application is sparser. Idle power consumption can be reduced by using asynchronous design or clock gating when no input spike is being processed. Using a non-volatile memory technology also helps reduce leakage in the memory cells (since neuromorphic chips are mostly memory-dominated). Epiphany supports dynamic clock gating for the processing cores; in this case, a core wakes up only on an interrupt (for example, on receiving a new input packet).</para>
</section>
</section>
<section class="lev1" id="ch2-4">
<title>2.4 Measurements from Epiphany</title>
<para>In this section, we present some of our measurements using the Epiphany processor to provide a sense of its performance for possible neuromorphic applications.</para>
<para>We implemented a neural network with a Leaky Integrate-and-Fire (LIF) neuron model with different parameters on Epiphany and measured the processing time of the different processes, which can serve as a reference when assessing Epiphany as a neuromorphic processor. Our measurements consider processing time only (no power measurement) and are performed using the hardware timers inside the cores.</para>
<para>The compiled instructions (not hand-optimized) for our experiments took around 52kb of each used core&#x2019;s memory. Since the instruction code is almost identical for all the cores, it is copied into each core&#x2019;s memory. It is therefore advantageous to use bigger cores (more memory), so that the instruction memory takes only a small fraction of the total and the rest can be used for a higher number of neurons.</para>
<para><link linkend="ch2-F3">Figure 2.3</link> shows a flowchart of our neuron model with the processing cycle time attached to each block, where N is the number of neurons, F is the number of firings, X is the neuron state, W is the synaptic weight, Thr is the firing threshold, Time is the current time (read from Timer), LFT is the last firing time (stored per neuron), Ref is the refractory time and LR is the leak rate.</para>
<fig id="ch2-F3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 2.3:</emphasis> Flow chart of processing a LIF neuron with processing time measured in Epiphany.</para></caption>
<graphic xlink:href="graphics/ch2-fig03.jpg"/>
</fig>
<para>An input spike enters the eCore through the DMA and interrupts the RISC core. A handler then places the spike in a FIFO (implemented in software), after which the target neurons are updated. After the updates, the threshold of each neuron is checked, and a refractory check is executed for each firing. If both checks pass, the firing process starts, and the RISC core commands the DMA to transmit a spike packet. Membrane leakage is an independent process started by a timer interrupt.</para>
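<para>This processing flow can be sketched in software as follows. The Python class below mirrors the steps of Figure 2.3 (spike intake, neuron update, threshold and refractory checks, firing, and a periodic leak process); all parameter values are illustrative, and the actual Epiphany implementation is C code running on the eCores.</para>

```python
# Software sketch of the LIF processing flow of Figure 2.3.
from collections import deque

class LIFCore:
    def __init__(self, n, thr=100, ref=2, leak=1):
        self.x = [0] * n          # neuron states (membrane potentials)
        self.lft = [-10**9] * n   # last firing time per neuron
        self.thr, self.ref, self.leak = thr, ref, leak
        self.fifo = deque()       # software spike queue

    def receive(self, targets_and_weights):
        """Stands in for the DMA interrupt handler queueing an input spike."""
        self.fifo.append(targets_and_weights)

    def process(self, now):
        fired = []
        while self.fifo:
            for i, w in self.fifo.popleft():
                self.x[i] += w                         # update neuron state
                if self.x[i] >= self.thr:              # threshold check
                    if now - self.lft[i] >= self.ref:  # refractory check
                        self.x[i] = 0
                        self.lft[i] = now
                        fired.append(i)   # would command the DMA to send a spike
        return fired

    def leak_all(self):
        """Timer-interrupt leak process over all neurons."""
        self.x = [max(0, v - self.leak) for v in self.x]

core = LIFCore(4)
core.receive([(0, 60), (1, 120)])
print(core.process(now=10))       # neuron 1 crosses threshold and fires
```
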
<para>Each cycle takes 1ns at a 1GHz clock frequency. For example, processing a single spike from the first convolutional layer of LeNet-5 to the second convolutional layer requires updating 16<math id="Ch2.S4.p6.m1" display="inline"><mo>&#xd7;</mo></math>5<math id="Ch2.S4.p6.m2" display="inline"><mo>&#xd7;</mo></math>5 neurons. When the second layer is implemented in one core and 1% of the updated neurons fire, the processing takes around 46us. The leak process on all these 400 neurons takes around 12us. Our measurements are averaged over many experiments, so the numbers in this figure are reasonable estimates. Since the neuron model is programmable, one may remove some of the components (like the refractory mechanism) or make the model more complex (for example, by introducing an individual threshold for every neuron).</para>
<para>In <link linkend="ch2-F3">Figure 2.3</link> we showed that updating a neuron with a single spike takes around 120ns on average. TrueNorth updates all the neurons in a core every 1ms, which makes it suitable for real-time neuromorphic applications. If we assume a reasonable sparsity in the input spikes in each time-step (32 input spikes per neuron, with 256 input synapses), then with a 120ns update time, Epiphany can also process 256 neurons in less than 1ms.</para>
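<para>The real-time claim above reduces to a one-line calculation, reproduced here with the figures from the text (256 neurons, 32 input spikes per neuron per 1ms time-step, ~120ns per single-spike update):</para>

```python
# Worst-case busy time for one core in a 1 ms time-step, using the measured
# ~120 ns per single-spike neuron update from Figure 2.3.
update_ns = 120
n_neurons = 256
spikes_per_neuron = 32

busy_ns = n_neurons * spikes_per_neuron * update_ns
print(busy_ns / 1e6, "ms")    # ~0.98 ms, just under the 1 ms budget
```
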
</section>
<section class="lev1" id="ch2-5">
<title>2.5 Conclusion</title>
<para>This article demonstrates that the Epiphany processor is well suited to neuromorphic computing. Overall, it has an architecture similar to the well-known neuromorphic processors and is flexible enough for implementing new ideas. Unlike Epiphany, all the mentioned neuromorphic processors contain optimized elements that add complexity to the architecture and make them less suitable as a reference benchmarking architecture (the flexibility vs. efficiency trade-off). For example, a fixed number of neurons per core (in TrueNorth, Loihi, and NeuronFlow) does not allow for optimized resource management during mapping, and an accelerated learning mechanism (in Loihi) may be unnecessary for many applications. Additionally, suppose one wants to quantify the performance improvement the SpiNNaker processor gains from its optimized NoC; in that case, Epiphany is an excellent platform to compare against, thanks to its simplicity and flexibility.</para>
<para>As mentioned, not having any accelerator makes Epiphany less efficient than the accelerated architectures (like Loihi), but it increases its value for benchmarking the performance improvement of any accelerator.</para>
<para>We have implemented a neural network system and measured the processing time of the different components of the LIF neuron model. It is already visible that some small improvements (like a hardware FIFO) can improve the performance of the system. Increasing the size of the core results in better memory saving, but the designer should scale the performance of the cores as well (by implementing schemes like multi-threading [<link linkend="ch2-bib5">5</link>] and SIMD, as in the forthcoming SpiNNaker2.0 platform [<link linkend="ch2-bib18">18</link>]). Other improvements (like adding a more suitable interconnect) can be examined and are a topic for our future research. All source code used to benchmark the system and perform hands-on experiments is freely available upon request ({amirreza.yousefzadeh, gert-jan.vanschaik}@imec.nl).</para>
</section>
<section class="lev1">
<title>Acknowledgements</title>
<para>This technology is partially funded and initiated by the Netherlands and European Union&#x2019;s Horizon 2020 research and innovation projects <emphasis role="strong">TEMPO</emphasis> (ECSEL Joint Undertaking under grant agreement No 826655) and <emphasis role="strong">ANDANTE</emphasis> (ECSEL Joint Undertaking under grant agreement No 876925).</para>
</section>
<section class="lev1" id="ch2-Ref">
<title>References</title>
<para id="ch2-bib1">[1] A. Olofsson, et al., Kickstarting high-performance energy-efficient manycore architectures with epiphany, in 2014 48th Asilomar Conference on Signals, Systems and Computers, IEEE, 2014, pp. 1719&#x2013;1726.</para>
<para id="ch2-bib2">[2] A. Olofsson, Epiphany-v: A 1024 processor 64-bit risc system-on-chip, arXiv preprint arXiv:1610.01832.</para>
<para id="ch2-bib3">[3] E. Painkras, et al., Spinnaker: A 1-w 18-core system-on-chip for massively-parallel neural network simulation, IEEE Journal of Solid-State Circuits 48 (8) (2013) 1943&#x2013;1953.</para>
<para id="ch2-bib4">[4] F. Akopyan, et al., Truenorth: Design and tool flow of a 65 mw 1 million neuron programmable neurosynaptic chip, IEEE transactions on computer-aided design of integrated circuits and systems 34 (10) (2015) 1537&#x2013;1557.</para>
<para id="ch2-bib5">[5] M. Davies, et al., Loihi: A neuromorphic manycore processor with on-chip learning, IEEE Micro 38 (1) (2018) 82&#x2013;99.</para>
<para id="ch2-bib6">[6] M. Demler, Brainchip akida is a fast learner, spiking-neural-network processor identifies patterns in unlabeled data, Microprocessor Report (2019).</para>
<para id="ch2-bib7">[7] O. Moreira, et al., Neuronflow: a neuromorphic processor architecture for live ai applications, in 2020 Design, Automation &amp; Test in Europe Conference &amp; Exhibition (DATE), IEEE, 2020, pp. 840&#x2013;845.</para>
<para id="ch2-bib8">[8] E. Miranda, J. Su&#xf1;&#xe9;, Memristors for neuromorphic circuits and artificial intelligence applications (2020).</para>
<para id="ch2-bib9">[9] N. P. Jouppi, et al., In-datacenter performance analysis of a tensor processing unit, in: Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017, pp. 1&#x2013;12.</para>
<para id="ch2-bib10">[10] V. Sze, Y.-H. Chen, T.-J. Yang, J. S. Emer, Efficient processing of deep neural networks, Synthesis Lectures on Computer Architecture 15 (2) (2020) 1&#x2013;341.</para>
<para id="ch2-bib11">[11] Y. LeCun, et al., Lenet-5, convolutional neural networks, URL: <ulink url="http://yann.lecun.com/exdb/lenet">http://yann.lecun.com/exdb/lenet</ulink> 20 (5) (2015) 14.</para>
<para id="ch2-bib12">[12] A. Yousefzadeh, et al., Performance comparison of time-step-driven versus event-driven neural state update approaches in spinnaker, in 2018 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2018, pp. 1&#x2013;4.</para>
<para id="ch2-bib13">[13] A. Yousefzadeh, et al., Fast predictive handshaking in synchronous FPGAs for fully asynchronous multisymbol chip links: Application to spinnaker 2-of-7 links, IEEE Transactions on Circuits and Systems II: Express Briefs 63 (8) (2016) 763&#x2013;767.</para>
<para id="ch2-bib14">[14] A. Yousefzadeh, et al., Asynchronous spiking neurons, the natural key to exploit temporal sparsity, IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9 (4) (2019) 668&#x2013;678. doi:10.1109/JETCAS.2019.2951121.</para>
<para id="ch2-bib15">[15] B. Yin, et al., Effective and efficient computation with multiple-timescale spiking recurrent neural networks, in International Conference on Neuromorphic Systems 2020, ICONS 2020, Association for Computing Machinery, New York, NY, USA, 2020. doi:10.1145/3407197.3407225.</para>
<para id="ch2-bib16">[16] S. Thorpe, et al., Method, digital electronic circuit, and system for unsupervised detection of repeating patterns in a series of events, US Patent App. 16/349,248 (Sep. 19, 2019).</para>
<para id="ch2-bib17">[17] P. Blouw, et al., Benchmarking keyword spotting efficiency on neuromorphic hardware, in: Proceedings of the 7th Annual Neuro-inspired Computational Elements Workshop, 2019, pp. 1&#x2013;8.</para>
<para id="ch2-bib18">[18] C. Mayr, S. H&#xf6;ppner, and S. Furber (2019). SpiNNaker 2: a 10 million core processor system for brain simulation and machine learning-keynote presentation. In Communicating Process Architectures 2017 &amp; 2018 277-280, IOS Press, 2019.</para>
</section>
</chapter>
<chapter class="chapter" id="ch3" label="3" xreflabel="3">
<title>Temporal Delta Layer: Exploiting Temporal Sparsity in Deep Neural Networks for Time-Series Data</title>
<subtitle>Preetha Vijayan<sup>1,2</sup>, Amirreza Yousefzadeh<sup>2</sup>, Manolis Sifalakis<sup>2</sup>, and Rene van Leuken<sup>1</sup></subtitle>
<affiliation><sup>1</sup>TU Delft, Netherlands<?lb?><sup>2</sup>imec, Netherlands</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>Real-time video processing using state-of-the-art deep neural networks (DNNs) has recently achieved human-like accuracy, but at the cost of considerable energy consumption, rendering such networks infeasible for deployment on edge devices. The energy consumed by running DNNs on hardware accelerators is dominated by the number of memory reads/writes and multiply-accumulate (MAC) operations required. This work explores the role of activation sparsity in efficient DNN inference as a potential solution. As matrix-vector multiplication of weights with activations is the predominant operation in DNNs, skipping operations and memory fetches where (at least) one of the operands is zero can make inference more energy efficient. Although spatial sparsification of activations has been researched extensively, introducing and exploiting temporal sparsity has received far less attention in the DNN literature. This work introduces a new DNN layer (called the temporal delta layer) whose primary objective is to induce temporal activation sparsity during training. The temporal delta layer promotes activation sparsity by performing a delta operation, aided by activation quantization and an l<sub>1</sub>-norm-based penalty on the cost function. As a result, the final model behaves like a conventional quantized DNN with high temporal activation sparsity during inference. The new layer was incorporated into the standard ResNet50 architecture and trained and tested on the popular human action recognition dataset UCF101. The method resulted in a 2x improvement in activation sparsity, with a 5% reduction in accuracy.</para>
</section>
<section class="lev1" id="ch3-1">
<title>3.1 Introduction</title>
<para>DNNs have lately managed to analyze video data successfully, performing action recognition [<link linkend="ch3-bib1">1</link>], object tracking [<link linkend="ch3-bib2">2</link>], object detection [<link linkend="ch3-bib3">3</link>], etc., with human-like accuracy and robustness. Unfortunately, DNNs&#x2019; high accuracy comes at considerable cost in computation and memory, resulting in high energy consumption. This makes them unsuitable for always-on edge devices.</para>
<para>Techniques such as network pruning, quantization, regularization, and knowledge distillation [<link linkend="ch3-bib4">4</link>] [<link linkend="ch3-bib5">5</link>] have helped reduce model size over time, resulting in less compute and memory consumption overall. Sparsity is a prominent aspect of all the aforementioned methods. This is significant because sparse tensors allow computations involving multiplication by zero to be skipped; they are also easier to store and retrieve in memory. In the DNN literature, structural sparsity (of weights) and spatial sparsity (of activations) are well-studied topics [<link linkend="ch3-bib6">6</link>]. However, while being a popular concept in neuromorphic computing, temporal activation sparsity has received less attention in the context of DNNs.</para>
<para>This work applies the concept of change-based (delta-based) processing to the training and inference phases of deep neural networks, drawing inspiration from the human retina [<link linkend="ch3-bib7">7</link>]. DNN inference that processes each frame independently, with no regard for temporal correlation, is dense and highly wasteful. In contrast, propagating only the changes through the network enables zero-skipping in sparse tensor operations, minimizing redundant operations and memory accesses.</para>
<para>Therefore, the proposed methodology induces temporal sparsity in (in principle) any DNN by incorporating a new layer (named the temporal delta layer), which can be introduced into a DNN at any phase (training, refinement, or inference only). This new layer can be integrated into an existing architecture by positioning it after all or some of the ReLU activation layers, as deemed beneficial (see <link linkend="ch3-F1">Figure 3.1</link>). The inclusion of this layer does not necessitate any changes to the preceding or following layers. Furthermore, during training the new layer adds a novel sparsity penalty to the overall cost function of the DNN: this l<sub>1</sub>-norm-based penalty minimizes the activation density of the delta maps (i.e., the temporal difference between two consecutive feature maps). In addition, the new layer is evaluated in conjunction with two activation quantization methods, namely fixed-point quantization (FXP) and learned step-size quantization (LSQ).</para>
<fig id="ch3-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 3.1:</emphasis> (a) Standard DNN, and (b) DNN with proposed temporal delta layer</para></caption>
<graphic xlink:href="graphics/ch3-fig01.jpg"/>
</fig>
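<para>The core mechanism of the layer can be sketched in a few lines: quantize the incoming activations, emit only the change (delta) relative to the previous frame, and measure the temporal sparsity this induces. The sketch below is a pure-Python illustration under assumed shapes and a fixed quantization step; it is not the authors&#x2019; training implementation, which also involves the l<sub>1</sub> sparsity penalty and learned step sizes.</para>

```python
# Minimal sketch of the temporal delta layer: quantize activations, emit
# only the delta from the previous frame, and measure temporal sparsity.

def quantize(x, step=0.25):
    """Fixed-point-style quantization; coarser steps give sparser deltas."""
    return [round(v / step) * step for v in x]

class TemporalDeltaLayer:
    def __init__(self, size, step=0.25):
        self.prev = [0.0] * size   # previous quantized feature map
        self.step = step

    def forward(self, x):
        q = quantize(x, self.step)
        delta = [a - b for a, b in zip(q, self.prev)]   # temporal difference
        self.prev = q
        return delta                # zeros here can be skipped downstream

    @staticmethod
    def sparsity(delta):
        return sum(1 for d in delta if d == 0.0) / len(delta)

layer = TemporalDeltaLayer(4)
d1 = layer.forward([0.51, 0.10, 0.90, 0.24])
d2 = layer.forward([0.52, 0.11, 0.95, 0.60])  # consecutive frames barely differ
print(TemporalDeltaLayer.sparsity(d2))        # 3 of 4 deltas are exactly zero
```

<para>Note how quantization is what turns small temporal differences into exact zeros: without it, floating-point activations would almost never repeat between frames.</para>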
</section>
<section class="lev1" id="ch3-2">
<title>3.2 Related Works</title>
<para>Although DNNs are in essence bio-inspired, they have not yet found the balance between power consumption and accuracy, especially when dealing with computationally heavy streaming signals. The brain&#x2019;s neocortex, on the other hand, handles complex tasks like sensory perception, planning, attention, and motor control while consuming less than 20 W [<link linkend="ch3-bib8">8</link>]. Scalable architecture, in-memory computation, parallel processing, communication using spikes, low-precision computation, sparse distributed representation, asynchronous execution, and fault tolerance are some of the characteristics of biological neural networks that can be leveraged to bridge the energy consumption gap between the brain and DNNs [<link linkend="ch3-bib9">9</link>]. Among these, the proposed methodology focuses on the viability of using sparsity within DNNs to achieve energy efficiency. During a matrix-vector multiplication between a weight matrix and an activation vector, zero elements in the tensor can be skipped, reducing both computation and memory accesses (see <link linkend="ch3-F2">Figure 3.2</link>).</para>
<fig id="ch3-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 3.2:</emphasis> Sparsity in the activations (&#x394;x) drastically reduces the memory fetches and multiplications between &#x394;x and the columns of the weight matrix W that correspond to zeros [<link linkend="ch3-bib10">10</link>].</para></caption>
<graphic xlink:href="graphics/ch3-fig02.jpg"/>
</fig>
<para>There are broadly two types of sparsity in DNNs: weight sparsity (related to the interconnects between neurons) and activation sparsity (related to the neurons themselves). Activation sparsity can be further categorised into spatial and temporal sparsity, which exploit the spatial and temporal correlation within the activations, respectively [<link linkend="ch3-bib11">11</link>]. Unlike weight and spatial sparsity [<link linkend="ch3-bib12">12</link>, <link linkend="ch3-bib13">13</link>, <link linkend="ch3-bib14">14</link>, <link linkend="ch3-bib15">15</link>], exploiting the temporal redundancy of DNNs processing streaming data as a means to reduce energy consumption is a relatively unexplored idea. Exploiting temporal sparsity translates to skipping the re-calculation of a function when its input has not changed since the last update.</para>
<para>One of the methods to exploit temporal sparsity is to use a compressed representation (like H.264, MPEG-4, etc.) of the video at the input stage itself. These compression techniques retain only a few key-frames completely and reconstruct the others using motion vectors and residual errors, thus exploiting temporal redundancy [<link linkend="ch3-bib16">16</link>, <link linkend="ch3-bib17">17</link>]. Another path is to find a neuron model that lies somewhere between the &#x201c;frame-based DNN&#x201d; and the &#x201c;event-based spiking neural network&#x201d;. This work is an attempt in the latter direction. A similar work, CBInfer [<link linkend="ch3-bib18">18</link>], proposes replacing all spatial convolution layers in a network with change-based temporal convolution layers (or CBconv layers), in which a signal change is propagated forward only when a certain threshold is exceeded. Likewise, [<link linkend="ch3-bib19">19</link>] tapped into temporal sparsity by introducing Sigma-Delta Networks, where neurons in one layer communicate with neurons in the next layer through discretized delta activations. One issue with CBInfer is potential error accumulation over time, as the method is threshold-based: if the neuron states are not reset periodically, the threshold can cause drift in the approximation of the activation signal and degrade the accuracy. The sigma-delta scheme, meanwhile, was only evaluated on smaller datasets like temporal MNIST, which may not reliably confirm the method&#x2019;s effectiveness.</para>
</section>
<section class="lev1" id="ch3-3">
<title>3.3 Methodology</title>
<para>In video-based applications, traditional deep neural networks rely on frame-based processing. That is, each frame is processed entirely through all the layers of the model. However, very little changes from one frame to the next over time, a property called temporal locality. It is therefore wasteful to perform computations to re-extract the features of the non-changing parts of each frame. Taking that concept deeper into the network, if the feature maps of two consecutive frames are inspected after every activation layer throughout the model, the same temporal overlap can be observed. Therefore, this work postulates that temporal sparsity can be significantly increased by focusing the inference of the model only on the changing pixels of the feature maps (or deltas).</para>
<section class="lev2" id="ch3-3-1">
<title>3.3.1 Delta Inference</title>
<para>This work introduces a new layer that calculates the delta (or difference) between two temporally consecutive feature maps and quantifies the degree of these changes at only the relevant locations in the frame. Since zero changes are not propagated through the layer, its role may be perceived as &#x201c;analog event propagation&#x201d;. It is considered an &#x201c;analog event&#x201d; because it is not the presence of change, but the magnitude of change, that is propagated.</para>
<para>To express this mathematically: in a standard DNN layer, the output activation is related to the weights and the input vector through Eqs. 3.1 and 3.2.</para>
<table id="Ch3.E1">
<tr>
<td><math id="Ch3.E1.m1" display="block"><mrow><msub><mi>Y</mi><mi>t</mi></msub><mo>=</mo><mrow><mrow><mi>W</mi><mo>&#x2062;</mo><msub><mi>X</mi><mi>t</mi></msub></mrow><mo>+</mo><mi>B</mi></mrow></mrow></math></td>
<td>(3.1)</td>
</tr>
<tr>
<td><math id="Ch3.E2.m1" display="block"><mrow><msub><mi>Z</mi><mi>t</mi></msub><mo>=</mo><mrow><mi>&#x3c3;</mi><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><msub><mi>Y</mi><mi>t</mi></msub><mo stretchy="false">)</mo></mrow></mrow></mrow></math></td>
<td>(3.2)</td>
</tr>
</table>
<para>where W and B represent the weight and bias parameters, <math display="inline"><msub><mi>X</mi><mi>t</mi></msub></math> represents the input vector, and <math display="inline"><msub><mi>Y</mi><mi>t</mi></msub></math> represents the transitional state. Then, <math display="inline"><msub><mi>Z</mi><mi>t</mi></msub></math> is the output vector, which is the result of <math id="Ch3.S3.SS1.p2.m1" display="inline"><mrow><mi>&#x3c3;</mi><mrow><mo stretchy="false">(</mo><mo>.</mo><mo stretchy="false">)</mo></mrow></mrow></math> - a non-linear activation function. The subscript <math id="Ch3.S3.SS1.p2.m2" display="inline"><mi>t</mi></math> indicates that the tensor has a temporal dimension. However, in the temporal delta layer, the weight-input multiplication transforms into,</para>
<table id="Ch3.E3">
<tr>
<td><math id="Ch3.E3.m1" display="block"><mrow><mrow><mi mathvariant="normal">&#x394;</mi><mo>&#x2062;</mo><msub><mi>Y</mi><mi>t</mi></msub></mrow><mo>=</mo><mrow><mi>W</mi><mo>&#x2062;</mo><mi mathvariant="normal">&#x394;</mi><mo>&#x2062;</mo><msub><mi>X</mi><mi>t</mi></msub></mrow><mo>=</mo><mrow><mi>W</mi><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><mrow><msub><mi>X</mi><mi>t</mi></msub><mo>-</mo><msub><mi>X</mi><mrow><mi>t</mi><mo>-</mo><mn>1</mn></mrow></msub></mrow><mo stretchy="false">)</mo></mrow></mrow></mrow></math></td>
<td>(3.3)</td>
</tr>
<tr>
<td><math id="Ch3.E4.m1" display="block"><mtable columnspacing="0pt" displaystyle="true" rowspacing="0pt"><mtr><mtd columnalign="right"><msub><mi>Y</mi><mi>t</mi></msub></mtd><mtd columnalign="left"><mrow><mi></mi><mo>=</mo><mrow><mrow><mi mathvariant="normal">&#x394;</mi><mo>&#x2062;</mo><msub><mi>Y</mi><mi>t</mi></msub></mrow><mo>+</mo><msub><mi>Y</mi><mrow><mi>t</mi><mo>-</mo><mn>1</mn></mrow></msub></mrow></mrow></mtd></mtr><mtr><mtd></mtd><mtd columnalign="left"><mrow><mrow><mi></mi><mo>=</mo><mrow><mrow><mi>W</mi><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><mrow><msub><mi>X</mi><mi>t</mi></msub><mo>-</mo><msub><mi>X</mi><mrow><mi>t</mi><mo>-</mo><mn>1</mn></mrow></msub></mrow><mo stretchy="false">)</mo></mrow></mrow><mo>+</mo><mrow><mi>W</mi><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><mrow><msub><mi>X</mi><mrow><mi>t</mi><mo>-</mo><mn>1</mn></mrow></msub><mo>-</mo><msub><mi>X</mi><mrow><mi>t</mi><mo>-</mo><mn>2</mn></mrow></msub></mrow><mo stretchy="false">)</mo></mrow></mrow><mo>+</mo><mi mathvariant="normal">&#x2026;</mi><mo>+</mo><msub><mi>Y</mi><mn>0</mn></msub></mrow></mrow><mo rspace="12.5pt">,</mo><mrow><mrow><mi>w</mi><mo>&#x2062;</mo><mi>h</mi><mo>&#x2062;</mo><mi>e</mi><mo>&#x2062;</mo><mi>r</mi><mo>&#x2062;</mo><mpadded width="+5pt"><mi>e</mi></mpadded><mo>&#x2062;</mo><msub><mi>Y</mi><mn>0</mn></msub></mrow><mo>=</mo><mi>B</mi></mrow></mrow></mtd></mtr><mtr><mtd></mtd><mtd columnalign="left"><mrow><mrow><mi></mi><mo>=</mo><mrow><mrow><mi>W</mi><mo>&#x2062;</mo><msub><mi>X</mi><mi>t</mi></msub></mrow><mo>+</mo><mi>B</mi></mrow></mrow><mo>,</mo></mrow></mtd></mtr></mtable></math></td>
<td>(3.4)</td>
</tr>
<tr>
<td><math id="Ch3.E5.m1" display="block"><mrow><mrow><mrow><mi mathvariant="normal">&#x394;</mi><mo>&#x2062;</mo><msub><mi>Z</mi><mi>t</mi></msub></mrow><mo>=</mo><mrow><msub><mi>Z</mi><mi>t</mi></msub><mo>-</mo><msub><mi>Z</mi><mrow><mi>t</mi><mo>-</mo><mn>1</mn></mrow></msub></mrow><mo>=</mo><mrow><mrow><mi>&#x3c3;</mi><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><msub><mi>Y</mi><mi>t</mi></msub><mo stretchy="false">)</mo></mrow></mrow><mo>-</mo><mrow><mi>&#x3c3;</mi><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><msub><mi>Y</mi><mrow><mi>t</mi><mo>-</mo><mn>1</mn></mrow></msub><mo stretchy="false">)</mo></mrow></mrow></mrow></mrow><mo rspace="7.5pt">,</mo><mrow><mrow><mi>w</mi><mo>&#x2062;</mo><mi>h</mi><mo>&#x2062;</mo><mi>e</mi><mo>&#x2062;</mo><mi>r</mi><mo>&#x2062;</mo><mpadded width="+5pt"><mi>e</mi></mpadded><mo>&#x2062;</mo><mi>&#x3c3;</mi><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><msub><mi>Y</mi><mn>0</mn></msub><mo stretchy="false">)</mo></mrow></mrow><mo>=</mo><mn>0</mn></mrow></mrow></math></td>
<td>(3.5)</td>
</tr>
</table>
<para>In Eq. 3.3, instead of using <math id="Ch3.S3.SS1.p3.m1" display="inline"><msub><mi>X</mi><mi>t</mi></msub></math> directly, only the changes, <math id="Ch3.S3.SS1.p3.m2" display="inline"><mrow><mi mathvariant="normal">&#x394;</mi><mo>&#x2062;</mo><msub><mi>X</mi><mi>t</mi></msub></mrow></math>, are multiplied with W. Using the resulting <math id="Ch3.S3.SS1.p3.m3" display="inline"><mrow><mi mathvariant="normal">&#x394;</mi><mo>&#x2062;</mo><msub><mi>Y</mi><mi>t</mi></msub></mrow></math>, the corresponding <math id="Ch3.S3.SS1.p3.m4" display="inline"><msub><mi>Y</mi><mi>t</mi></msub></math> can be recursively calculated with Eq. 3.4, where <math id="Ch3.S3.SS1.p3.m5" display="inline"><msub><mi>Y</mi><mrow><mi>t</mi><mo>-</mo><mn>1</mn></mrow></msub></math> is the transitional state obtained from the previous calculation. Eq. 3.5 gives the final delta activation output that is passed on to the next layer.</para>
<para>Another notable difference between the standard DNN layer and the proposed layer is the role of the bias. In delta-based inference, the bias is used only as the initialization of the transitional state, <math id="Ch3.S3.SS1.p4.m1" display="inline"><msub><mi>Y</mi><mn>0</mn></msub></math>, in Eq. 3.4. Since the bias tensor does not change over time, its temporal difference is zero and it drops out of Eq. 3.3.</para>
<para>Now, since the input video is temporally correlated, the expectation is that <math id="Ch3.S3.SS1.p5.m1" display="inline"><mrow><mi mathvariant="normal">&#x394;</mi><mo>&#x2062;</mo><msub><mi>X</mi><mi>t</mi></msub></mrow></math>, and by association <math id="Ch3.S3.SS1.p5.m2" display="inline"><mrow><mi mathvariant="normal">&#x394;</mi><mo>&#x2062;</mo><msub><mi>Z</mi><mi>t</mi></msub></mrow></math>, are also temporally sparse. In essence, the temporal sparsity between consecutive feature maps is cast onto the spatial sparsity of the propagated delta map. Additionally, <math id="Ch3.S3.SS1.p5.m3" display="inline"><msub><mi>Y</mi><mi>t</mi></msub></math> in Eqs. 3.1 and 3.4 is always the same. This means that, for the same input, a standard DNN and a temporal delta layer based DNN produce the same result at any time step.</para>
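The recursion of Eqs. 3.3-3.5 can be sketched for a single dense layer as follows (a minimal NumPy illustration under our own naming, not the authors' implementation; the state initialisations follow Eq. 3.4 and Eq. 3.5):

```python
import numpy as np

class TemporalDeltaLayer:
    """Minimal sketch of delta inference (Eqs. 3.3-3.5) for one dense layer.

    y_prev accumulates W times Delta-X over time (initialised with the bias,
    Y0 = B); z_prev stores the previous activation output so that only
    Delta-Z is propagated to the next layer.
    """

    def __init__(self, W, B, act=lambda y: np.maximum(y, 0.0)):
        self.W, self.act = W, act
        self.y_prev = B.astype(float).copy()   # Y0 = B (Eq. 3.4)
        self.x_prev = np.zeros(W.shape[1])
        self.z_prev = np.zeros(W.shape[0])     # sigma(Y0) := 0 (Eq. 3.5)

    def forward(self, x):
        dx = x - self.x_prev                   # Delta-X_t (Eq. 3.3)
        y = self.W @ dx + self.y_prev          # Y_t = Delta-Y_t + Y_{t-1}
        z = self.act(y)
        dz = z - self.z_prev                   # Delta-Z_t (Eq. 3.5)
        self.x_prev, self.y_prev, self.z_prev = x.copy(), y, z
        return dz

# Accumulating the propagated deltas recovers the standard layer output:
W = np.array([[1.0, -1.0], [0.5, 2.0]])
B = np.array([0.1, -0.2])
layer = TemporalDeltaLayer(W, B)
acc = np.zeros(2)
for x in [np.array([1.0, 2.0]), np.array([1.0, 2.1])]:
    acc = acc + layer.forward(x)
assert np.allclose(acc, np.maximum(W @ np.array([1.0, 2.1]) + B, 0.0))
```

Summing the propagated deltas over time reproduces the standard output, matching the equivalence claim above; the saving appears when dx is sparse and the product W @ dx skips the zero entries.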
</section>
<section class="lev2" id="ch3-3-2">
<title>3.3.2 Sparsity Induction Using Activation Quantization</title>
<para>As shown in <link linkend="ch3-F3">Figure 3.3</link>, temporal redundancy is evident in the feature maps of two consecutive frames. On closer inspection, however, these feature maps are similar but not identical, as shown in <link linkend="ch3-F3">Figure 3.3a</link> and <link linkend="ch3-F3">3.3b</link>. Therefore, if two such consecutive feature maps are subtracted, the resulting delta map has many near-zero values, which restricts the potential increase in temporal sparsity (<link linkend="ch3-F3">Figure 3.3c</link>). This is mainly due to the high precision of the floating point representation (FP32) of the activations. In IEEE 754 representation, a single-precision 32-bit floating point number has 1 bit for the sign, 8 bits for the exponent, and 23 bits for the significand. This not only yields a very high dynamic range, but also increases the resolution for numbers close to 0: the smallest representable non-zero magnitude is about <math id="Ch3.S3.SS2.p1.m1" display="inline"><mrow><mo>&#xb1;</mo><mn>1.4</mn></mrow></math> x <math id="Ch3.S3.SS2.p1.m2" display="inline"><msup><mn>10</mn><mrow><mo>-</mo><mn>45</mn></mrow></msup></math>. Due to this high resolution, two similar floating point values rarely cancel to absolute zero when subtracted. A plausible solution is to decrease the precision of the activations using quantization.</para>
<fig id="ch3-F3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 3.3:</emphasis> Demonstration of two temporally consecutive activation maps leading to near zero values (rather than absolute zeroes) after delta operation.</para></caption>
<graphic xlink:href="graphics/ch3-fig03.jpg"/>
</fig>
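The effect is easy to reproduce. The snippet below (illustrative values and a hypothetical quantization step size, not taken from the chapter) shows two nearly identical float32 activations whose difference is tiny but non-zero, and how mapping both onto a coarse grid turns that difference into an exact zero:

```python
import numpy as np

# Two float32 activations from consecutive frames that are similar
# but not identical (values are illustrative)
a = np.float32(0.4123457)
b = np.float32(0.4123456)
assert float(a - b) != 0.0     # the delta is near zero, not absolute zero

# Quantizing both onto a coarse grid (hypothetical step size) makes the
# delta an exact zero, so it can be skipped during delta inference
step = np.float32(1e-3)
qa = np.round(a / step) * step
qb = np.round(b / step) * step
assert float(qa - qb) == 0.0
```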
<para>In this work, a post-training quantization method (fixed point quantization [<link linkend="ch3-bib20">20</link>]) and a quantization-aware training method (learned step size quantization [<link linkend="ch3-bib21">21</link>]) are compared as temporal sparsity facilitators for the new layer.</para>
<section class="lev3" id="ch3-3-2-1">
<title>3.3.2.1 Fixed Point Quantization</title>
<para>In this method, floating point numbers are quantized to an integer or fixed point representation [<link linkend="ch3-bib20">20</link>]. Unlike floating point, in fixed point representation the integer and fractional parts have fixed lengths. This limits both range and precision: if more bits are used to represent the integer part, the precision decreases, and vice versa.</para>
<para><emphasis role="strong">Method:</emphasis></para>
<para>First, a bitwidth, BW, is defined to which the 32-bit floating point parameter is to be quantized. Then, the number of bits required to represent the unsigned integer part of the parameter (<math id="Ch3.S3.SS2.SSSx1.Px1.p1.m1" display="inline"><mi>x</mi></math>) is calculated as shown in Eq. 3.6.</para>
<table id="Ch3.E6">
<tr>
<td><math id="Ch3.E6.m1" display="block"><mrow><mi>I</mi><mo>=</mo><mrow><mn>1</mn><mo>+</mo><mrow><mo stretchy="false">&#x230a;</mo><mrow><mi>l</mi><mo>&#x2062;</mo><mi>o</mi><mo>&#x2062;</mo><mpadded width="+5pt"><msub><mi>g</mi><mn>2</mn></msub></mpadded><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><mrow><mpadded width="+5pt"><munder accentunder="true"><mrow><mi>m</mi><mo>&#x2062;</mo><mi>a</mi><mo>&#x2062;</mo><mi>x</mi></mrow><mrow><mn>1</mn><mo>&lt;</mo><mi>i</mi><mo>&lt;</mo><mi>N</mi></mrow></munder></mpadded><mo>&#x2062;</mo><mrow><mo fence="true" stretchy="false">|</mo><mi>x</mi><mo fence="true" rspace="7.5pt" stretchy="false">|</mo></mrow></mrow><mo stretchy="false">)</mo></mrow></mrow><mo stretchy="false">&#x230b;</mo></mrow></mrow></mrow></math></td>
<td>(3.6)</td>
</tr>
</table>
<para>A positive value of <math id="Ch3.S3.SS2.SSSx1.Px1.p3.m1" display="inline"><mi>I</mi></math> means that <math id="Ch3.S3.SS2.SSSx1.Px1.p3.m2" display="inline"><mi>I</mi></math> bits are required to represent the absolute value of the integer part, while a negative value of <math id="Ch3.S3.SS2.SSSx1.Px1.p3.m3" display="inline"><mi>I</mi></math> means that the fractional part has <math id="Ch3.S3.SS2.SSSx1.Px1.p3.m4" display="inline"><mrow><mo stretchy="false">|</mo><mi>I</mi><mo stretchy="false">|</mo></mrow></math> leading unused bits. Since 1 bit is reserved for the sign, the number of fractional bits, <math id="Ch3.S3.SS2.SSSx1.Px1.p3.m5" display="inline"><mi>F</mi></math>, is given by Eq. 3.7.</para>
<table id="Ch3.E7">
<tr>
<td><math id="Ch3.E7.m1" display="block"><mrow><mi>F</mi><mo>=</mo><mrow><mrow><mi>B</mi><mo>&#x2062;</mo><mi>W</mi></mrow><mo>-</mo><mi>I</mi><mo>-</mo><mn>1</mn></mrow></mrow></math></td>
<td>(3.7)</td>
</tr>
</table>
<para>Considering the parameters, BW - bitwidth, F - fractional bits, I - integer bits, and S - sign bit, Eq. 3.8 maps the floating point parameter <math id="Ch3.S3.SS2.SSSx1.Px1.p5.m1" display="inline"><mi>x</mi></math> to the fixed point by,</para>
<table id="Ch3.E8">
<tr>
<td><math id="Ch3.E8.m1" display="block"><mrow><mrow><mi>Q</mi><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow></mrow><mo>=</mo><mfrac><mrow><mi>C</mi><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><mrow><mi>R</mi><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><mrow><mi>x</mi><mo>&#x2062;</mo><msup><mn>.2</mn><mi>F</mi></msup></mrow><mo stretchy="false">)</mo></mrow></mrow><mo>,</mo><mrow><mo>-</mo><mi>t</mi></mrow><mo>,</mo><mi>t</mi><mo stretchy="false">)</mo></mrow></mrow><msup><mn>2</mn><mi>F</mi></msup></mfrac></mrow></math></td>
<td>(3.8)</td>
</tr>
</table>
<para>where <math id="Ch3.S3.SS2.SSSx1.Px1.p6.m1" display="inline"><mrow><mi>R</mi><mrow><mo stretchy="false">(</mo><mo>.</mo><mo stretchy="false">)</mo></mrow></mrow></math> is the round function, <math id="Ch3.S3.SS2.SSSx1.Px1.p6.m2" display="inline"><mrow><mi>C</mi><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><mi>x</mi><mo>,</mo><mi>a</mi><mo>,</mo><mi>b</mi><mo stretchy="false">)</mo></mrow></mrow></math> is the clipping function, and t is defined as,</para>
<table id="Ch3.S5.EGx1">
<tr>
<td class="td align_right eqn_cell"><math id="Ch3.Ex1.m1" display="inline"><mrow><mi>t</mi><mo>=</mo><mrow><mo>{</mo><mtable columnspacing="5pt" rowspacing="0pt"><mtr><mtd columnalign="left"><mrow><mrow><msup><mn>2</mn><mrow><mrow><mi>B</mi><mo>&#x2062;</mo><mi>W</mi></mrow><mo>-</mo><mi>S</mi></mrow></msup><mo rspace="12.5pt">,</mo><mrow><mi>B</mi><mo>&#x2062;</mo><mi>W</mi></mrow></mrow><mo>&gt;</mo><mn>1</mn></mrow></mtd><mtd></mtd></mtr><mtr><mtd columnalign="left"><mrow><mrow><mn>0</mn><mo separator="true">&#x2003;&#x2003;&#x2003;&#x2003;</mo><mrow><mi>B</mi><mo>&#x2062;</mo><mi>W</mi></mrow></mrow><mo>&#x2264;</mo><mn>1</mn></mrow></mtd><mtd></mtd></mtr></mtable></mrow></mrow></math></td>
<td></td>
</tr>
</table>
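Eqs. 3.6-3.8 can be sketched as a small NumPy routine (an illustrative implementation under our assumptions; the function name is ours, and a non-zero maximum magnitude is assumed so that the logarithm in Eq. 3.6 is defined):

```python
import numpy as np

def fixed_point_quantize(x, bw):
    """Illustrative fixed point quantizer following Eqs. 3.6-3.8.

    bw: target bitwidth (1 sign bit, I integer bits, F fractional bits).
    Assumes max(|x|) > 0 so that Eq. 3.6 is well defined.
    """
    I = 1 + int(np.floor(np.log2(np.max(np.abs(x)))))         # Eq. 3.6
    F = bw - I - 1                                            # Eq. 3.7 (S = 1)
    t = 2.0 ** (bw - 1) if bw > 1 else 0.0                    # clipping bound
    return np.clip(np.round(x * 2.0 ** F), -t, t) / 2.0 ** F  # Eq. 3.8

x = np.array([0.5, -0.26, 3.2])
q = fixed_point_quantize(x, bw=8)   # here I = 2, F = 5: grid step of 1/32
assert np.allclose(q, [0.5, -0.25, 3.1875])
```

Note how every value is rounded onto a uniform grid of step 2^-F chosen from the layer's maximum magnitude, which is exactly why channels with a smaller range lose more information, as discussed below.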
<para><emphasis role="strong">Possible Drawback of Fixed Point Quantization:</emphasis></para>
<para>Fixed point quantization, as shown above, is a fairly straightforward mapping scheme and is easy to include in the model training process during the forward pass, before the actual delta calculation. However, it limits the extent of quantization possible without sacrificing accuracy. Typically, 8-bit quantization can sustain floating point accuracy with this method, but below 8 bits the accuracy starts to deteriorate significantly. This is because, unlike weights, activations are dynamic: activation patterns change from input to input, making them more sensitive to harsh quantization [<link linkend="ch3-bib22">22</link>]. Also, quantizing all the layers of a network to the same bitwidth can mean that the inter-channel behaviour of the feature maps is not captured properly. Since the number of fractional bits is usually selected based on the maximum activation value in a layer, this type of quantization tends to cause excessive information loss in channels with a smaller range.</para>
</section>
<section class="lev3" id="ch3-3-2-2">
<title>3.3.2.2 Learned Step-Size Quantization</title>
<para>Quantization aware training is the most logical solution to the aforementioned drawback, as it can potentially recover the accuracy of low-bit tasks given enough time to train. Therefore, a symmetric uniform quantization scheme called Learned Step Size Quantization (LSQ) is considered. This method treats the quantizer itself as a trainable parameter that minimizes the task loss using backpropagation and stochastic gradient descent. This serves two purposes: (a) the step size, which is the width of the quantization bins, adapts during training to the activation distribution. Finding an optimum step size is vital because, as shown in <link linkend="ch3-F4">Figure 3.4</link>, a step size that is too small or too large can make the quantized data a poor representation of the raw data; and (b) as the step size is a model parameter, it also directly seeks to improve the metric of interest, i.e., accuracy.</para>
<fig id="ch3-F4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 3.4:</emphasis> Importance of step size in quantization: on the right side, in all three cases, the data is quantized to five bins with different uniform step sizes. However, without optimum step size value, the quantization can detrimentally alter the range and resolution of the original data.</para></caption>
<graphic xlink:href="graphics/ch3-fig04.jpg"/>
</fig>
<para><emphasis role="strong">Method:</emphasis></para>
<para>Given: <math id="Ch3.S3.SS2.SSSx2.Px1.p1.m1" display="inline"><mi>x</mi></math> - the parameter to be quantized, <math id="Ch3.S3.SS2.SSSx2.Px1.p1.m2" display="inline"><mi>s</mi></math> - step size, <math id="Ch3.S3.SS2.SSSx2.Px1.p1.m3" display="inline"><msub><mi>Q</mi><mi>N</mi></msub></math> and <math id="Ch3.S3.SS2.SSSx2.Px1.p1.m4" display="inline"><msub><mi>Q</mi><mi>P</mi></msub></math> - number of negative and positive quantization levels respectively, and q(x;s) is the quantized representation with the same scale as x,</para>
<table id="Ch3.E9">
<tr>
<td><math id="Ch3.E9.m1" display="block"><mrow><mrow><mi>q</mi><mo>&#x2062;</mo><mrow><mo stretchy="false">(</mo><mi>x</mi><mo>;</mo><mi>s</mi><mo stretchy="false">)</mo></mrow></mrow><mo>=</mo><mrow><mo>{</mo><mtable columnspacing="5pt" displaystyle="true" rowspacing="0pt"><mtr><mtd columnalign="left"><mrow><mrow><mo stretchy="false">&#x230a;</mo><mstyle displaystyle="false"><mfrac><mi>x</mi><mi>s</mi></mfrac></mstyle><mo stretchy="false">&#x2309;</mo></mrow><mo>.</mo><mi>s</mi><mo>,</mo></mrow></mtd><mtd columnalign="left"><mrow><mrow><mtext>if</mtext><mo>-</mo><msub><mi>Q</mi><mi>N</mi></msub></mrow><mo>&#x2264;</mo><mstyle displaystyle="false"><mfrac><mi>x</mi><mi>s</mi></mfrac></mstyle><mo>&#x2264;</mo><msub><mi>Q</mi><mi>P</mi></msub></mrow></mtd></mtr><mtr><mtd columnalign="left"><mrow><mrow><mrow><mo>-</mo><msub><mi>Q</mi><mi>N</mi></msub></mrow><mo>.</mo><mi>s</mi></mrow><mo>,</mo></mrow></mtd><mtd columnalign="left"><mrow><mrow><mtext>if</mtext><mo>&#x2062;</mo><mstyle displaystyle="false"><mfrac><mi>x</mi><mi>s</mi></mfrac></mstyle></mrow><mo>&#x2264;</mo><mrow><mo>-</mo><msub><mi>Q</mi><mi>N</mi></msub></mrow></mrow></mtd></mtr><mtr><mtd columnalign="left"><mrow><mrow><msub><mi>Q</mi><mi>P</mi></msub><mo>.</mo><mi>s</mi></mrow><mo>,</mo></mrow></mtd><mtd columnalign="left"><mrow><mrow><mtext>if</mtext><mo>&#x2062;</mo><mstyle displaystyle="false"><mfrac><mi>x</mi><mi>s</mi></mfrac></mstyle></mrow><mo>&#x2265;</mo><msub><mi>Q</mi><mi>P</mi></msub></mrow></mtd></mtr></mtable></mrow></mrow></math></td>
<td>(3.9)</td>
</tr>
</table>
<para>where <math id="Ch3.S3.SS2.SSSx2.Px1.p1.m5" display="inline"><mrow><mo stretchy="false">&#x230a;</mo><mi>a</mi><mo stretchy="false">&#x2309;</mo></mrow></math> rounds the value to the nearest integer. Considering the number of bits, <math id="Ch3.S3.SS2.SSSx2.Px1.p1.m6" display="inline"><mi>b</mi></math>, to which the data is to be quantized, <math id="Ch3.S3.SS2.SSSx2.Px1.p1.m7" display="inline"><msub><mi>Q</mi><mi>N</mi></msub></math> = 0 for unsigned and <math id="Ch3.S3.SS2.SSSx2.Px1.p1.m8" display="inline"><msub><mi>Q</mi><mi>N</mi></msub></math> = <math id="Ch3.S3.SS2.SSSx2.Px1.p1.m9" display="inline"><msup><mn>2</mn><mrow><mi>b</mi><mo>-</mo><mn>1</mn></mrow></msup></math> for signed data. Similarly, <math id="Ch3.S3.SS2.SSSx2.Px1.p1.m10" display="inline"><msub><mi>Q</mi><mi>P</mi></msub></math> = <math id="Ch3.S3.SS2.SSSx2.Px1.p1.m11" display="inline"><msup><mn>2</mn><mrow><mi>b</mi><mo>-</mo><mn>1</mn></mrow></msup></math> for unsigned and <math id="Ch3.S3.SS2.SSSx2.Px1.p1.m12" display="inline"><mrow><msup><mn>2</mn><mrow><mi>b</mi><mo>-</mo><mn>1</mn></mrow></msup><mo>-</mo><mn>1</mn></mrow></math> for signed data.</para>
<para><emphasis role="strong">Modified LSQ:</emphasis></para>
<para>In this work, the original LSQ method is slightly modified to remove the clipping function from the equations because: (a) the bitwidth, <math id="Ch3.S3.SS2.SSSx2.Px2.p1.m1" display="inline"><mi>b</mi></math>, required to calculate <math id="Ch3.S3.SS2.SSSx2.Px2.p1.m2" display="inline"><msub><mi>Q</mi><mi>N</mi></msub></math> and <math id="Ch3.S3.SS2.SSSx2.Px2.p1.m3" display="inline"><msub><mi>Q</mi><mi>P</mi></msub></math> is not known in advance; it is instead determined from the activation statistics of each layer during training, which leads to a mixed-precision model and is more advantageous; and (b) clipping leads to an accuracy drop, as it alters the range of the activations. That is, if activations are clipped during training, there can be a significant difference between the real-valued activation and the quantized activation, which in turn affects the gradient calculations and, therefore, the SGD optimization.</para>
<para>Thus, in the temporal delta layer, the forward pass of the quantization involves only scaling, rounding, and re-scaling, and can be mathematically expressed as,</para>
<table id="Ch3.E10">
<tr>
<td><math id="Ch3.E10.m1" display="block"><mrow><mi>q</mi><mrow><mo stretchy="false">(</mo><mi>x</mi><mo>;</mo><mi>s</mi><mo stretchy="false">)</mo></mrow><mo>=</mo><mrow><mo stretchy="false">&#x230a;</mo><mfrac><mi>x</mi><mi>s</mi></mfrac><mo stretchy="false">&#x2309;</mo></mrow><mo>.</mo><mi>s</mi></mrow></math></td>
<td>(3.10)</td>
</tr>
</table>
<para>The gradient of the Eq. 3.10 for backpropagation is given by Eq. 3.11.</para>
<table id="Ch3.E11">
<tr>
<td><math id="Ch3.E11.m1" display="block"><mrow><msub><mo>&#x2207;</mo><mi>s</mi></msub><mi>q</mi><mrow><mo stretchy="false">(</mo><mi>x</mi><mo>;</mo><mi>s</mi><mo stretchy="false">)</mo></mrow><mo>=</mo><mrow><mo stretchy="false">&#x230a;</mo><mfrac><mi>x</mi><mi>s</mi></mfrac><mo stretchy="false">&#x2309;</mo></mrow><mo>-</mo><mfrac><mi>x</mi><mi>s</mi></mfrac></mrow></math></td>
<td>(3.11)</td>
</tr>
</table>
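The modified forward pass (Eq. 3.10) and its step-size gradient (Eq. 3.11) can be sketched as follows (illustrative code with our own names, assuming the straight-through estimator for the non-differentiable round, as in the original LSQ formulation):

```python
import numpy as np

def lsq_forward(x, s):
    """Modified LSQ forward pass (Eq. 3.10): scale, round, re-scale,
    with no clipping since the bitwidth is not fixed in advance."""
    return np.round(x / s) * s

def lsq_grad_s(x, s):
    """Gradient of q(x; s) w.r.t. the step size s (Eq. 3.11), using the
    straight-through estimator for the round function."""
    return np.round(x / s) - x / s

x = np.array([0.23, -1.07, 0.5])
q = lsq_forward(x, s=0.25)
assert np.allclose(q, [0.25, -1.0, 0.5])
```

During training, `s` would be updated by SGD using `lsq_grad_s` scaled by the upstream gradient, letting each channel settle on its own effective precision.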
</section>
</section>
<section class="lev2" id="ch3-3-3">
<title>3.3.3 Sparsity Penalty</title>
<para>The quantized delta map created using the above-mentioned methods already has a fair number of absolute zeroes (or sparsity) in itself. However, as in the biological brain, learning can increase this sparsity further. The inspiration comes from an elegant set of experiments performed by Y. Yu et al. [<link linkend="ch3-bib23">23</link>], in which a particular 30-second video was shown repeatedly to rodents while their activation density was tracked during each presentation. The activation density was found to decrease as the number of trials increased, i.e., as learning progressed, fewer active neurons were required for inference.</para>
<para>Adapting this concept to the present work, an <math id="Ch3.S3.SS3.p2.m1" display="inline"><msub><mi>l</mi><mn>1</mn></msub></math> norm based constraint is added to the loss function. This is termed the <emphasis>sparsity penalty</emphasis>. The new cost function can be mathematically expressed as <emphasis>cost function = task loss + sparsity penalty</emphasis>, i.e.,</para>
<table id="Ch3.E12">
<tr>
<td><math id="Ch3.E12.m1" display="block"><mrow><mtext>Cost function</mtext><mo>=</mo><mtext>Task loss</mtext><mo>+</mo><mrow><mi>&#x3bb;</mi><mo>&#x2062;</mo><mrow><mo stretchy="true">(</mo><mfrac><mrow><msub><mi>l</mi><mn>1</mn></msub><mo>&#x2062;</mo><mtext>&#x2009;norm of active neurons in delta map</mtext></mrow><mtext>total number of neurons in delta map</mtext></mfrac><mo stretchy="true">)</mo></mrow></mrow></mrow></math></td>
<td>(3.12)</td>
</tr>
</table>
<para>where the task loss minimizes the error between the true value and the predicted value, and the sparsity penalty minimizes the overall temporal activation density. The <math id="Ch3.S3.SS3.p4.m1" display="inline"><mi>&#x3bb;</mi></math> in Eq. 3.12 is the penalty coefficient of the cost function. If <math id="Ch3.S3.SS3.p4.m2" display="inline"><mi>&#x3bb;</mi></math> is too small, the sparsity penalty has little effect and model accuracy is prioritized; if <math id="Ch3.S3.SS3.p4.m3" display="inline"><mi>&#x3bb;</mi></math> is too large, sparsity becomes the priority, yielding very sparse models with unacceptable accuracy. The key is to find the balance between task loss and sparsity penalty.</para>
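The cost function of Eq. 3.12 can be sketched in a few lines of NumPy (an illustration with hypothetical function names, not the chapter's training code):

```python
import numpy as np

def sparsity_penalty(delta_map, lam):
    """Penalty term of Eq. 3.12: lambda times the l1 norm of the delta
    map divided by the total number of neurons in the delta map."""
    return lam * np.abs(delta_map).sum() / delta_map.size

def cost(task_loss, delta_map, lam):
    # cost function = task loss + sparsity penalty (Eq. 3.12)
    return task_loss + sparsity_penalty(delta_map, lam)

delta = np.array([[0.0, 0.5], [0.0, -0.5]])   # 2 of 4 neurons active
assert np.isclose(sparsity_penalty(delta, lam=0.1), 0.025)
```

Because the penalty is normalized by the map size, lambda trades off directly against the average per-neuron delta magnitude, which is what makes tuning it a balance between accuracy and sparsity.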
</section>
</section>
<section class="lev1" id="ch3-4">
<title>3.4 Experiments and Results</title>
<para>In this section, the proposed methodology explained in Section 3.3 is analyzed to study how it helps achieve the desired temporal sparsity and accuracy.</para>
<section class="lev2" id="ch3-4-1">
<title>3.4.1 Baseline</title>
<para>For the baseline, the two-stream architecture [<link linkend="ch3-bib24">24</link>] was used, with ResNet50 as the feature extractor on both the spatial and the temporal stream. The dataset used was UCF101, a widely used human action recognition dataset of &#x2018;in-the-wild&#x2019; action videos obtained from YouTube, with 101 action categories [<link linkend="ch3-bib25">25</link>]. The spatial stream used single-frame RGB images of size (224, 224, 3) as the input, while the temporal stream used stacks of 10 RGB difference frames of size (224, 224, 10 <math id="Ch3.S4.SS1.p1.m1" display="inline"><mo>&#xd7;</mo></math> 3) as the input. Both inputs were time-distributed, applying the same layer to multiple frames simultaneously and producing output with time as the fourth dimension. Both streams were initialized with pre-trained ImageNet weights and fine-tuned with an SGD optimizer.</para>
<para>Under the above-mentioned setup, spatial and temporal streams achieved an accuracy of 75% and 70%, respectively. Then, both streams were <emphasis>average fused</emphasis> to achieve a final classification accuracy of 82%. Also, in this scenario, both streams were found to have an activation sparsity of <math id="Ch3.S4.SS1.p2.m1" display="inline"><mo>&#x223c;</mo></math> 47%.</para>
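<para>The average fusion step reduces to averaging the per-class scores of the two streams before taking the argmax; a minimal sketch (function name assumed):</para>

```python
import numpy as np

def average_fuse(spatial_probs, temporal_probs):
    """Late fusion: average the per-class scores of the two streams,
    then pick the highest-scoring class."""
    fused = (spatial_probs + temporal_probs) / 2.0
    return fused.argmax(axis=-1)
```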
</section>
<section class="lev2" id="ch3-4-2">
<title>3.4.2 Experiments</title>
<para><emphasis role="strong">Scenario 1:</emphasis> This setup consecutively places a fixed-point quantization layer and a temporal delta layer after every activation layer in the network. The temporal delta layer here also includes an l<sub>1</sub>-norm-based penalty. The baseline weights were used as a starting point, and all layers, including the temporal delta layer, were fine-tuned until acceptable convergence. The hyperparameters specific to this setup were the bitwidth (to which the activations were quantized) and the penalty coefficient balancing the trade-off between task loss and sparsity penalty.</para>
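<para>A minimal sketch of such a fixed-point quantizer, assuming unsigned post-ReLU activations; the range-driven integer/fractional bit split is one plausible reading of the text, not the exact implementation:</para>

```python
import numpy as np

def fixed_point_quantize(x, bitwidth):
    """Quantize activations to a fixed bitwidth: integer bits are
    allocated to cover the layer's maximum activation; the remaining
    bits are fractional."""
    max_val = float(np.max(np.abs(x)))
    # Integer bits needed to cover the range [0, max_val].
    int_bits = max(0, int(np.ceil(np.log2(max_val + 1e-12))))
    frac_bits = max(0, bitwidth - int_bits)
    scale = 2.0 ** frac_bits
    # Snap each value to the nearest representable fixed-point value.
    return np.round(x * scale) / scale
```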
<para><emphasis role="strong">Scenario 2:</emphasis> This setup is similar to the previous scenario except for the activation quantization method. The previous experiment used fixed-precision quantization, where all activation layers in the network were quantized to the same bitwidth. This experiment instead uses learnable step-size quantization (LSQ), which performs channel-wise quantization depending on the activation distribution, resulting in mixed-precision quantization of the activation maps.</para>
<para>The layer also introduces a hyperparameter during training (apart from the penalty coefficient mentioned earlier) for the step-size initialization. During training, the step size then increases or decreases depending on the activation distribution in each channel.</para>
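<para>The LSQ forward pass can be sketched as below; the gradient through the rounding (via a straight-through estimator) and the per-channel handling are omitted, so this is an illustration rather than the reference implementation of [<link linkend="ch3-bib21">21</link>]:</para>

```python
import numpy as np

def lsq_quantize(x, step, bitwidth):
    """Learnable step-size quantization, forward pass only: scale by
    the (learnable) step, clip to the unsigned range, round, rescale."""
    q_max = 2 ** bitwidth - 1
    q = np.clip(x / step, 0.0, q_max)
    return np.round(q) * step
```

<para>During training, each channel&#x2019;s step size is updated by gradient descent, so channels with wide activation distributions learn large steps (coarse quantization) while narrow ones learn small steps.</para>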
</section>
<section class="lev2" id="ch3-4-3">
<title>3.4.3 Result Analysis</title>
<para><link linkend="ch3-T1">Table 3.1</link> and <link linkend="ch3-T2">Table 3.2</link> show the baseline accuracy and activation sparsity compared against the two scenarios mentioned.</para>
<fig id="ch3-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 3.1:</emphasis> Spatial stream - comparison of accuracy and activation sparsity obtained through the proposed scenarios against the baseline. In the case of fixed point quantization, the reported results are for a bitwidth of 6 bits.</para></caption>
<graphic xlink:href="graphics/ch3-tab01.jpg"/>
</fig>
<fig id="ch3-T2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 3.2:</emphasis> Temporal stream - comparison of accuracy and activation sparsity obtained through the proposed scenarios against the benchmark. In the case of fixed point quantization, the reported results are for a bitwidth of 7 bits.</para></caption>
<graphic xlink:href="graphics/ch3-tab02.jpg"/>
</fig>
<para>Firstly, when the temporal delta layers with fixed-point quantized activations are included in the baseline model, activation sparsity increases considerably with only a slight loss in accuracy in both streams. This is because lowering the precision from 32 bits to 8 bits (or fewer) causes many temporal differences of activations to become exactly zero.</para>
<fig id="ch3-T3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 3.3:</emphasis> Result of decreasing activation bitwidth in fixed point quantization method. For spatial stream, decreasing below 6 bits caused the accuracy to drop considerably. For temporal stream, the same happened below 7 bits.</para></caption>
<graphic xlink:href="graphics/ch3-tab03.jpg"/>
</fig>
<para>Additionally, the close-to-baseline accuracy of the fixed-point quantization method can be attributed to the flexibility of fractional bit allocation. That is, since the bitwidth is fixed, the number of integer bits required is decided by the activation distribution within the layer, and the remaining bits are assigned as fractional bits. This ensures that precision is sacrificed only as far as needed to cover the range. Another factor sustaining accuracy is that the first and last layers of the model are not quantized, similar to works like [<link linkend="ch3-bib26">26</link>] [<link linkend="ch3-bib27">27</link>]. These layers have high information density: the first turns input pixels into features and the last turns features into output probabilities, which makes them more sensitive to quantization.</para>
<fig id="ch3-F5" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 3.5:</emphasis> Evolution of quantization step size from initialization to convergence in LSQ. As step-size is a learnable parameter, it gets re-adjusted during training to cause minimum information loss in each layer.</para></caption>
<graphic xlink:href="graphics/ch3-fig05.jpg"/>
</fig>
<para>Although the activation sparsity gained with the temporal delta layer and fixed-point quantization is better than the baseline, it is still not as high as required. To push it further, the bitwidth of the activations is decreased in the expectation of increasing sparsity. However, once the bitwidth goes below a certain value (6 bits for the spatial and 7 bits for the temporal stream), sparsity increases but accuracy starts to deteriorate beyond recovery, as shown in <link linkend="ch3-T3">Table 3.3</link>. This is because quantizing all layers of a network to the same bitwidth means that the inter-channel variations of the feature maps are not fully accounted for. Since the number of fractional bits is usually selected to cover the maximum activation value in a layer, fixed-bitwidth quantization tends to cause excessive information loss in channels with a smaller dynamic range. Therefore, mixed-precision quantization of activations is a better approach to obtain good sparsity without compromising accuracy.</para>
<fig id="ch3-T4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 3.4:</emphasis> Final results from two-stream network after average fusing the spatial and temporal stream weights. With 5% accuracy loss, the proposed method almost doubles the activation sparsity available in comparison to the baseline.</para></caption>
<graphic xlink:href="graphics/ch3-tab04.jpg"/>
</fig>
<para>Finally, using the temporal delta layer where incoming activations are quantized using learnable step-size quantization (LSQ) gives the best results for both the spatial and temporal streams. As the step size is a learnable parameter, the model has enough flexibility to become a mixed-precision model in which each channel in a layer has a bitwidth suited to its activation distribution. This kind of channel-wise quantization minimizes the impact of low-precision rounding. It is also evident in <link linkend="ch3-F5">Figure 3.5</link> that, as training nears convergence, the step sizes differ according to the activation distribution and the bitwidth required to represent each layer. Moreover, consistent with expectation, the first and last layers opt for smaller step sizes during training, implying that they need more bits for their representation.</para>
<para>The weights generated using this method were then average fused to obtain the final two-stream network accuracy and activation sparsity (<link linkend="ch3-T4">Table 3.4</link>). Overall, the proposed method achieves 88% activation sparsity with a 5% accuracy loss.</para>
</section>
</section>
<section class="lev1" id="ch3-5">
<title>3.5 Conclusion</title>
<para>Intuitively, the proposed temporal delta layer projects the temporal activation sparsity between two consecutive feature maps onto the spatial activation sparsity of their delta map. When executing sparse tensor multiplications in hardware, this spatial sparsity can be used to decrease computations and memory accesses. As shown in <link linkend="ch3-T4">Table 3.4</link>, the proposed method resulted in 88% overall activation sparsity with a trade-off of a 5% accuracy drop on the UCF-101 dataset.</para>
<para>A collateral benefit of the obtained temporal sparsity is that the computations do not increase linearly with frame rate. In typical DNNs, doubling the frame rate automatically doubles the computations. In a temporal-delta-layer-based model, however, increasing the frame rate not only improves the temporal precision of the network but also increases its temporal sparsity, limiting the additional computations required [<link linkend="ch3-bib28">28</link>].</para>
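<para>This effect can be seen on a toy signal: doubling the sampling rate shrinks each frame-to-frame delta, so more deltas quantize to exactly zero. The signal, step size, and frame counts below are invented purely for illustration:</para>

```python
import numpy as np

def delta_sparsity(n_frames, step=0.1):
    """Fraction of quantized frame-to-frame deltas that are exactly
    zero for a smooth toy signal sampled with n_frames frames."""
    t = np.linspace(0.0, 2.0 * np.pi, n_frames)
    x = np.sin(t)                          # stand-in for a slowly varying activation
    deltas = np.round(np.diff(x) / step)   # quantized temporal deltas
    return float(np.mean(deltas == 0))

# Doubling the frame rate increases temporal sparsity, so the
# computations grow sub-linearly with frame rate.
low_rate, high_rate = delta_sparsity(100), delta_sparsity(200)
```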
<para>The downside of using the temporal delta layer is that it requires keeping track of previous activations in order to perform delta operations. As a result, the overall memory footprint grows, putting more reliance on off-chip memory. However, the rising popularity of novel memory technologies (like resistive RAM [<link linkend="ch3-bib29">29</link>], embedded Flash memory [<link linkend="ch3-bib30">30</link>], etc.) may improve the cost calculations in the near future.</para>
<para><emphasis role="strong">Disclaimer:</emphasis> This paper is a distillation of the research done by one of the authors as a part of her master thesis and is partially published in chapter 3 of [<link linkend="ch3-bib32">32</link>]. The complete thesis, along with the results and analysis, is available online [<link linkend="ch3-bib31">31</link>].</para>
</section>
<section class="lev1">
<title>Acknowledgment</title>
<para>This work is partially funded by research and innovation projects TEMPO (ECSEL JU under grant agreement No 826655), ANDANTE (ECSEL JU under grant agreement No 876925) and DAIS (KDT JU under grant agreement No 101007273), SunRISE (EUREKA cluster PENTA2018e-17004-SunRISE) and Comp4Drones (ECSEL JU grant agreement No. 826610). The JU receives support from the European Union&#x2019;s Horizon 2020 research and innovation programme and Sweden, Spain, Portugal, Belgium, Germany, Slovenia, Czech Republic, Netherlands, Denmark, Norway and Turkey.</para>
</section>
<section class="lev1" id="ch3-Ref">
<title>References</title>
<para id="ch3-bib1">[1] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool, &#x201c;Temporal segment networks: Towards good practices for deep action recognition,&#x201d; in <emphasis>European conference on computer vision</emphasis>, pp. 20&#x2013;36, Springer, 2016.</para>
<para id="ch3-bib2">[2] K. Chen and W. Tao, &#x201c;Once for all: a two-flow convolutional neural network for visual tracking,&#x201d; <emphasis>IEEE Transactions on Circuits and Systems for Video Technology</emphasis>, vol. 28, no. 12, pp. 3377&#x2013;3386, 2017.</para>
<para id="ch3-bib3">[3] K. Kang, H. Li, J. Yan, X. Zeng, B. Yang, T. Xiao, C. Zhang, Z. Wang, R. Wang, X. Wang, <emphasis>et al.</emphasis>, &#x201c;T-cnn: Tubelets with convolutional neural networks for object detection from videos,&#x201d; <emphasis>IEEE Transactions on Circuits and Systems for Video Technology</emphasis>, vol. 28, no. 10, pp. 2896&#x2013;2907, 2017.</para>
<para id="ch3-bib4">[4] S. Han, H. Mao, and W. J. Dally, &#x201c;Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding,&#x201d; <emphasis>arXiv preprint arXiv:1510.00149</emphasis>, 2015.</para>
<para id="ch3-bib5">[5] G. Hinton, O. Vinyals, and J. Dean, &#x201c;Distilling the knowledge in a neural network,&#x201d; <emphasis>arXiv preprint arXiv:1503.02531</emphasis>, 2015.</para>
<para id="ch3-bib6">[6] W. Wen, C. Wu, Y. Wang, Y. Chen, and H. Li, &#x201c;Learning structured sparsity in deep neural networks,&#x201d; <emphasis>arXiv preprint arXiv:1608.03665</emphasis>, 2016.</para>
<para id="ch3-bib7">[7] M. Mahowald, &#x201c;The silicon retina,&#x201d; in <emphasis>An Analog VLSI System for Stereoscopic Vision</emphasis>, pp. 4&#x2013;65, Springer, 1994.</para>
<para id="ch3-bib8">[8] J. W. Mink, R. J. Blumenschine, and D. B. Adams, &#x201c;Ratio of central nervous system to body metabolism in vertebrates: its constancy and functional basis,&#x201d; <emphasis>American Journal of Physiology-Regulatory, Integrative and Comparative Physiology</emphasis>, vol. 241, no. 3, pp. R203&#x2013;R212, 1981.</para>
<para id="ch3-bib9">[9] A. Yousefzadeh, M. A. Khoei, S. Hosseini, P. Holanda, S. Leroux, O. Moreira, J. Tapson, B. Dhoedt, P. Simoens, T. Serrano-Gotarredona, <emphasis>et al.</emphasis>, &#x201c;Asynchronous spiking neurons, the natural key to exploit temporal sparsity,&#x201d; <emphasis>IEEE Journal on Emerging and Selected Topics in Circuits and Systems</emphasis>, vol. 9, no. 4, pp. 668&#x2013;678, 2019.</para>
<para id="ch3-bib10">[10] C. Gao, D. Neil, E. Ceolini, S.-C. Liu, and T. Delbruck, &#x201c;Deltarnn: A power-efficient recurrent neural network accelerator,&#x201d; in <emphasis>Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays</emphasis>, pp. 21&#x2013;30, 2018.</para>
<para id="ch3-bib11">[11] O. Moreira, A. Yousefzadeh, F. Chersi, G. Cinserin, R.-J. Zwartenkot, A. Kapoor, P. Qiao, P. Kievits, M. Khoei, L. Rouillard, <emphasis>et al.</emphasis>, &#x201c;Neuronflow: a neuromorphic processor architecture for live ai applications,&#x201d; in <emphasis>2020 Design, Automation &amp; Test in Europe Conference &amp; Exhibition (DATE)</emphasis>, pp. 840&#x2013;845, IEEE, 2020.</para>
<para id="ch3-bib12">[12] J. Frankle and M. Carbin, &#x201c;The lottery ticket hypothesis: Finding sparse, trainable neural networks,&#x201d; <emphasis>arXiv preprint arXiv:1803.03635</emphasis>, 2018.</para>
<para id="ch3-bib13">[13] H. Yang, W. Wen, and H. Li, &#x201c;Deephoyer: Learning sparser neural network with differentiable scale-invariant sparsity measures,&#x201d; <emphasis>arXiv preprint arXiv:1908.09979</emphasis>, 2019.</para>
<para id="ch3-bib14">[14] S. Seto, M. T. Wells, and W. Zhang, &#x201c;Halo: Learning to prune neural networks with shrinkage,&#x201d; in <emphasis>Proceedings of the 2021 SIAM International Conference on Data Mining (SDM)</emphasis>, pp. 558&#x2013;566, SIAM, 2021.</para>
<para id="ch3-bib15">[15] M. Mahmoud, K. Siu, and A. Moshovos, &#x201c;Diffy: A d&#xe9;j&#xe0; vu-free differential deep neural network accelerator,&#x201d; in <emphasis>2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)</emphasis>, pp. 134&#x2013;147, IEEE, 2018.</para>
<para id="ch3-bib16">[16] C.-Y. Wu, M. Zaheer, H. Hu, R. Manmatha, A. J. Smola, and P. Kr&#xe4;henb&#xfc;hl, &#x201c;Compressed video action recognition,&#x201d; in <emphasis>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</emphasis>, pp. 6026&#x2013;6035, 2018.</para>
<para id="ch3-bib17">[17] M. Buckler, P. Bedoukian, S. Jayasuriya, and A. Sampson, &#x201c;Eva<math id="bib.bib17.m1" display="inline"><msup><mi></mi><mn>2</mn></msup></math>: Exploiting temporal redundancy in live computer vision,&#x201d; in <emphasis>2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA)</emphasis>, pp. 533&#x2013;546, IEEE, 2018.</para>
<para id="ch3-bib18">[18] L. Cavigelli, P. Degen, and L. Benini, &#x201c;Cbinfer: Change-based inference for convolutional neural networks on video data,&#x201d; in <emphasis>Proceedings of the 11th International Conference on Distributed Smart Cameras</emphasis>, pp. 1&#x2013;8, 2017.</para>
<para id="ch3-bib19">[19] P. O&#x2019;Connor and M. Welling, &#x201c;Sigma delta quantized networks,&#x201d; <emphasis>arXiv preprint arXiv:1611.02024</emphasis>, 2016.</para>
<para id="ch3-bib20">[20] P.-E. Novac, G. B. Hacene, A. Pegatoquet, B. Miramond, and V. Gripon, &#x201c;Quantization and deployment of deep neural networks on microcontrollers,&#x201d; <emphasis>Sensors</emphasis>, vol. 21, no. 9, p. 2984, 2021.</para>
<para id="ch3-bib21">[21] S. K. Esser, J. L. McKinstry, D. Bablani, R. Appuswamy, and D. S. Modha, &#x201c;Learned step size quantization,&#x201d; <emphasis>arXiv preprint arXiv:1902.08153</emphasis>, 2019.</para>
<para id="ch3-bib22">[22] R. Krishnamoorthi, &#x201c;Quantizing deep convolutional networks for efficient inference: A whitepaper,&#x201d; <emphasis>arXiv preprint arXiv:1806.08342</emphasis>, 2018.</para>
<para id="ch3-bib23">[23] Y. Yu, R. Hira, J. N. Stirman, W. Yu, I. T. Smith, and S. L. Smith, &#x201c;Mice use robust and common strategies to discriminate natural scenes,&#x201d; <emphasis>Scientific reports</emphasis>, vol. 8, no. 1, pp. 1&#x2013;13, 2018.</para>
<para id="ch3-bib24">[24] K. Simonyan and A. Zisserman, &#x201c;Two-stream convolutional networks for action recognition in videos,&#x201d; <emphasis>arXiv preprint arXiv:1406.2199</emphasis>, 2014.</para>
<para id="ch3-bib25">[25] K. Soomro, A. R. Zamir, and M. Shah, &#x201c;Ucf101: A dataset of 101 human actions classes from videos in the wild,&#x201d; <emphasis>arXiv preprint arXiv:1212.0402</emphasis>, 2012.</para>
<para id="ch3-bib26">[26] J. Choi, Z. Wang, S. Venkataramani, P. I.-J. Chuang, V. Srinivasan, and K. Gopalakrishnan, &#x201c;Pact: Parameterized clipping activation for quantized neural networks,&#x201d; <emphasis>arXiv preprint arXiv:1805.06085</emphasis>, 2018.</para>
<para id="ch3-bib27">[27] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, and Y. Zou, &#x201c;Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients,&#x201d; <emphasis>arXiv preprint arXiv:1606.06160</emphasis>, 2016.</para>
<para id="ch3-bib28">[28] M. A. Khoei, A. Yousefzadeh, A. Pourtaherian, O. Moreira, and J. Tapson, &#x201c;Sparnet: Sparse asynchronous neural network execution for energy efficient inference,&#x201d; in <emphasis>2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)</emphasis>, pp. 256&#x2013;260, IEEE, 2020.</para>
<para id="ch3-bib29">[29] S. Huang, A. Ankit, P. Silveira, R. Antunes, S. R. Chalamalasetti, I. El Hajj, D. E. Kim, G. Aguiar, P. Bruel, S. Serebryakov, <emphasis>et al.</emphasis>, &#x201c;Mixed precision quantization for reram-based dnn inference accelerators,&#x201d; in <emphasis>2021 26th Asia and South Pacific Design Automation Conference (ASP-DAC)</emphasis>, pp. 372&#x2013;377, IEEE, 2021.</para>
<para id="ch3-bib30">[30] M. Kang, H. Kim, H. Shin, J. Sim, K. Kim, and L.-S. Kim, &#x201c;S-flash: A nand flash-based deep neural network accelerator exploiting bit-level sparsity,&#x201d; <emphasis>IEEE Transactions on Computers</emphasis>, 2021.</para>
<para id="ch3-bib31">[31] P. Vijayan, &#x201c;Temporal Delta Layer.&#x201d; <ulink url="http://resolver.tudelft.nl/uuid:0806241d-9037-4094-a197-6e65d6482f2b">http://resolver.tudelft.nl/uuid:0806241d-9037-4094-a197-6e65d6482f2b</ulink>.</para>
<para id="ch3-bib32">[32] O. Vermesan and M. Diaz Nava (Eds.), <emphasis>Intelligent Edge-Embedded Technologies for Digitising Industry</emphasis>, ISBN: 9788770226103, River Publishers, Gistrup, Denmark, 2022.</para>
</section>
</chapter>
<chapter class="chapter" id="ch4" label="4" xreflabel="4">
<title>An End-to-End AI-based Automated Process for Semiconductor Device Parameter Extraction</title>
<subtitle>Dinu Purice<sup>1</sup>, Matthias Ludwig<sup>2</sup>, and Claus Lenz<sup>1</sup></subtitle>
<affiliation><sup>1</sup>Cognition Factory GmbH, Germany<?lb?><sup>2</sup>Infineon Technologies AG, Germany</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>In this work, we present an automated AI-supported end-to-end technology validation pipeline aiming to increase trust in semiconductor devices by enabling a check of their authenticity. The high revenue associated with the semiconductor industry makes it vulnerable to counterfeiting activities potentially endangering safety, reliability and trust of critical systems such as highly automated cars, cloud, Internet of Things, connectivity, space, defence and supercomputers [<link linkend="ch4-bib7">7</link>]. The proposed approach combines semiconductor device-intrinsic features extracted by artificial neural networks with domain expert knowledge in a pipeline of two stages: (i) a semantic segmentation stage based on a modular cascaded U-Net architecture to extract spatial and geometric information, and (ii) a parameter extraction stage to identify the technology fingerprint using a clustering approach. An in-depth evaluation and comparison of several artificial neural network architectures has been performed to find the most suitable solution for this task. The final results validate the taken approach, with deviations close to acceptable levels as defined by existing standards within the industry.</para>
<para><emphasis role="strong">Keywords:</emphasis> Semantic segmentation, image processing, hardware trust, physical inspection of electronics, AI, ML, deep learning, supervised learning, convolutional neural networks, computer vision</para>
</section>
<section class="lev1" id="ch4-1">
<title>4.1 Introduction</title>
<para>Automation is one of the key levers industries can use to strengthen quality and lower overall costs. The improved availability of data and the mainstream application of approaches relying on artificial intelligence (AI) push industries towards the adoption of AI methods. Nonetheless, practical implementations often seem to fail due to inflated expectations. Via a use case from the semiconductor industry, we show various practical ways to overcome these potential pitfalls.</para>
<para>The recently introduced European Chips Act recognises the paramount importance of the semiconductor industry within the global economy. The market for integrated electronics stood at $452.25B in 2021 and is expected to grow to $803.15B by 2028 [<link linkend="ch4-bib8">8</link>]. The high revenue potential causes extreme cost pressure and a highly competitive market. Consequently, the semiconductor industry has for decades been driven towards automation along the complete value chain. One way to differentiate from competitors is the utilisation of AI-powered manufacturing enhancements, which have the potential to gain $35B&#x2013;$40B annually over the entire industry [<link linkend="ch4-bib10">10</link>]. Yet not only manufacturing stands to benefit from the industry&#x2019;s push towards AI. These methods also offer the chance to be used for trust generation. In the aforementioned staggering market, rogues aim to catch their share through counterfeiting, i.e. cloning, remarking, overproducing, or simply reselling used parts [<link linkend="ch4-bib9">9</link>]. This leads to the use case discussed throughout this work: via physical inspection and a fully integrated AI flow, we present a fully automated assessment of the technological properties of a device. The idea for such a pipeline has already been introduced in [<link linkend="ch4-bib15">15</link>], where it is argued that through a subsequent analysis of the cross-sections, the authenticity of the manufacturing technology can be validated. Relevant features in this case include geometric shapes and dimensions of the constituent structures, as well as material-related properties. Each technology can be interpreted as an individual fingerprint, such that deviations from specifications can be reported as suspicious. This work focuses on the end-to-end application aspects of the use case and includes the following contributions:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>We will introduce an end-to-end, fully automated flow for semiconductor device technological parameter extraction by image segmentation and pattern recognition as an exemplary industrial use-case.</para></listitem>
<listitem><para>We introduce our methodology that is tailored to the requirements of the use case. This includes an image segmentation approach which is constituted of a set of specialised U-net cascades, class-specific loss functions, and an evolution-based training approach.</para></listitem>
<listitem><para>The advantages of our design-decisions are quantitatively compared to similar state-of-the-art approaches and important lessons learned &#x2013; transferable to other use-cases &#x2013; are summarised.</para></listitem>
</itemizedlist>
<para><emphasis role="strong">Related work:</emphasis> The demand for measuring structures and critical dimensions within semiconductor devices is ever-increasing. While manufacturing relies mostly on in-line metrology, a further possibility is post-production measurement. The databases are oftentimes large, and automating these flows is vital. A first template-based approach has been shown in [<link linkend="ch4-bib30">30</link>], relying on template matching and pattern recognition for the extraction of profile parameters. Furthermore, in a previous work [<link linkend="ch4-bib15">15</link>], we have proposed how the flow can be utilised for the detection of counterfeit electronics [<link linkend="ch4-bib9">9</link>] by comparing the extracted parameters against a database of known parameters.</para>
<para>The prospect of (semi)-automation of industrial processes through the use of machine learning-based (ML) methods is further gaining traction due to recent advancements in the field of ML and the uncovering of its unprecedented feature extraction and generalisation capabilities. Further accelerated due to the abundance of data, the &#x201c;smartisation&#x201d; of industrial processes through ML techniques has been conceived as the fourth industrial revolution [<link linkend="ch4-bib6">6</link>].</para>
<para>The data set involved in this application bears two important characteristics: it consists of grey-valued images and, more importantly, has very limited availability of annotated data. The same characteristics are typically observed in medical applications dealing with images produced by computed tomography (CT), cone beam computed tomography (CBCT), magnetic resonance imaging (MRI), ultrasound, and X-ray, all of which are scarcely available to the public due to the confidential nature of medical data. Nevertheless, segmentation tasks have been successfully tackled by ML-based methods, in particular deep learning approaches, which have proven to satisfy the high accuracy requirements typical of applications in the medical field. Of particular note in this context is the work of Ronneberger <emphasis>et al.</emphasis> [<link linkend="ch4-bib22">22</link>], which introduced the U-net, a symmetric network consisting of an encoding and a decoding arm shown to possess high generalisation capabilities even on relatively small data sets. Progress was further accentuated after the debut of Dice-based loss functions, first introduced by Milletari <emphasis>et al.</emphasis> [<link linkend="ch4-bib17">17</link>], which have been proven to outperform existing alternatives in the analysis of highly skewed data. Based on the above-mentioned innovations, both supervised and unsupervised deep learning-based approaches have been constantly expanding within different use cases in the medical field, as shown by the works of Kawula <emphasis>et al.</emphasis> [<link linkend="ch4-bib11">11</link>], Wang <emphasis>et al.</emphasis> [<link linkend="ch4-bib3">3</link>] or Altaf <emphasis>et al.</emphasis> [<link linkend="ch4-bib2">2</link>].</para>
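<para>For reference, a Dice-based loss in the spirit of [<link linkend="ch4-bib17">17</link>] can be sketched in a few lines. The smoothing term is an assumption added for numerical stability, and this is the common binary variant, not necessarily the exact formulation used in this work:</para>

```python
import numpy as np

def dice_loss(pred, target, smooth=1.0):
    """1 - Dice coefficient over soft predictions. Normalising by the
    foreground size makes the loss robust to highly skewed masks."""
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (union + smooth)
```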
<para>The following sections describe the two paramount steps of this application, namely the Image Segmentation and the Parameter Extraction stages, respectively. Both stages are currently being fine-tuned and validated to ensure compliance with industry-defined standards of operation.</para>
<fig id="ch4-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.1:</emphasis> Overview of the architecture.</para></caption>
<graphic xlink:href="graphics/ch4-fig01.jpg"/>
</fig>
</section>
<section class="lev1" id="ch4-2">
<title>4.2 Semantic Segmentation</title>
<section class="lev2" id="ch4-2-1">
<title>4.2.1 Proof of Concept and Architecture Overview</title>
<para>As a first step of development, a benchmark stage was conducted with the goal of determining the viability of an AI-based approach to scanning electron microscope (SEM) image segmentation and identifying the most suitable architecture for the task. Since both the industrial and the academic sector lack openly available annotated semiconductor cross-section SEM data, a custom data set was assembled and labelled. The data set consists of 1024 by 685 grey-valued images, obtained at Infineon Technologies AG&#x2019;s failure analysis laboratories, and represents technology nodes from 500 nm down to approximately 40 nm, with copper and Al-Tu technologies included. Devices with less than one metal layer (e.g. discrete transistors) were excluded. The image sources are state-of-the-art SEMs available in semiconductor failure analysis laboratories. For this stage, 202 images were manually sampled and labelled.</para>
<fig id="ch4-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.2:</emphasis> Examples showcasing different semiconductor technologies</para></caption>
<graphic xlink:href="graphics/ch4-fig02.jpg"/>
</fig>
<para>The images were annotated with 5 relevant labels of interest, namely &#x201c;metal&#x201d;, &#x201c;VIA&#x201d;, &#x201c;lateral isolation&#x201d;, &#x201c;poly&#x201d;, and &#x201c;deep trench isolation&#x201d; [<link linkend="ch4-bib25">25</link>], each bearing features important in the process of technology identification. The selected features imbue the following purposes within a semiconductor device:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para><emphasis role="strong">Metal:</emphasis> Low-resistance metallic connections between devices. Several metallisation layers can be stacked over each other to route inter-device connections.</para></listitem>
<listitem><para><emphasis role="strong">Vertical interconnect access (VIA) / contact:</emphasis> Low-ohmic interconnections between different metallisation layers (VIA) or between devices and the lowest metallisation layer (contact).</para></listitem>
<listitem><para><emphasis role="strong">Lateral isolation (shallow trench isolation):</emphasis> Electrical lateral isolation between devices through a <emphasis>shallow</emphasis> trench filled with deposited dioxide.</para></listitem>
<listitem><para><emphasis role="strong">Deep trench isolation:</emphasis> Trenches for lateral isolation with a high depth-width ratio. Mostly found in analogue integrated circuits.</para></listitem>
<listitem><para><emphasis role="strong">Poly:</emphasis> Poly-crystalline silicon which is used as gate electrode.</para></listitem>
</itemizedlist>
<fig id="ch4-F3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.3:</emphasis> Examples of labelled data showcasing the different ROIs: green &#x2013; VIA; yellow &#x2013; metal; teal &#x2013; lateral isolation; red &#x2013; poly; blue &#x2013; deep trench isolation</para></caption>
<graphic xlink:href="graphics/ch4-fig03.jpg"/>
</fig>
<para>For the benchmark stage, however, only two regions of interest (ROIs) were selected, namely &#x201c;VIA&#x201d; and &#x201c;metal&#x201d;. The two ROIs differ strongly in size and quantity: pixel-wise, &#x201c;metal&#x201d; objects account for 13.61% of the data, while &#x201c;VIA&#x201d; objects are more numerous but smaller, taking up 2.5%. They therefore reflect the two important properties of the expected data: high variability and high skewness.</para>
<fig id="ch4-F4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.4:</emphasis> Histograms of the investigated data grouped by label of interest</para></caption>
<graphic xlink:href="graphics/ch4-fig04.jpg"/>
</fig>
<para>As can be seen in <link linkend="ch4-F4">Fig. 4.4</link>, there is a strong overlap in intensity between the various regions of interest, rendering classical segmentation methods such as thresholding [<link linkend="ch4-bib21">21</link>], region-growing [<link linkend="ch4-bib20">20</link>], watershed [<link linkend="ch4-bib18">18</link>] and k-means clustering [<link linkend="ch4-bib19">19</link>] ineffective. Instead, an effective segmentation process requires domain-expert knowledge &#x2013; thus encouraging the use of deep learning-based methods capable of extracting spatial and semantic features. Several network architectures were selected as candidates based on their respective performance in similar segmentation tasks. An overview of each candidate network architecture is presented below:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para><emphasis role="strong">U-net</emphasis> [<link linkend="ch4-bib22">22</link>]</para></listitem>
</itemizedlist>
<fig id="ch4-F5" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.5:</emphasis> Overview of the U-net architecture [<link linkend="ch4-bib24">24</link>]</para></caption>
<graphic xlink:href="graphics/ch4-fig05.jpg"/>
</fig>
<para>Introduced by Ronneberger <emphasis>et al.</emphasis> [<link linkend="ch4-bib22">22</link>] as a solution for biomedical image segmentation, this architecture has been shown to perform reasonably well even when trained with small amounts of data. It consists of a down-sampling encoder arm and an up-sampling decoder arm, enabling efficient capture of spatial context. The arms are connected with skip connections, which accelerate convergence during training and combat vanishing gradients. The U-net achieved an averaged Dice score of 0.76 on the test subset.</para>
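<para>Each candidate below is compared via its averaged Dice score on the test subset. As a reference, a minimal Dice computation over binary masks might look as follows (the chapter does not give the evaluation code, so details such as the smoothing term are assumptions):</para>

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks of identical shape.

    eps avoids division by zero for empty masks (assumed convention).
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def mean_dice(preds, targets):
    """Average Dice over several labels, as used for the reported scores."""
    return float(np.mean([dice_score(p, t) for p, t in zip(preds, targets)]))
```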
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para><emphasis role="strong">Feature Pyramid Network (FPN)</emphasis> [<link linkend="ch4-bib13">13</link>]</para></listitem>
</itemizedlist>
<fig id="ch4-F6" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.6:</emphasis> Overview of the FPN architecture [<link linkend="ch4-bib24">24</link>]</para></caption>
<graphic xlink:href="graphics/ch4-fig06.jpg"/>
</fig>
<para>The FPN follows a top-down approach with skip connections, similar to the previously mentioned U-net. However, instead of using the final output as the prediction, the FPN makes predictions for each stage (see <link linkend="ch4-F6">Fig. 4.6</link>), thus combining semantically strong low-resolution features with semantically weaker high-resolution features. An additional segmentation branch then merges the information from all levels into a single output. The FPN obtained an averaged Dice score of 0.71 on the test subset.</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para><emphasis role="strong">Gated-Shape Convolutional Neural Network (GSCNN)</emphasis> [<link linkend="ch4-bib27">27</link>]</para></listitem>
</itemizedlist>
<fig id="ch4-F7" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.7:</emphasis> Overview of the GSCNN architecture [<link linkend="ch4-bib24">24</link>]</para></caption>
<graphic xlink:href="graphics/ch4-fig07.jpg"/>
</fig>
<para>The GSCNN employs a two-stream architecture, with the shape-related features processed in a dedicated stream that works in parallel to the standard encoder. A key characteristic of this architecture is the use of gated convolutional layers, which connect intermediate layers of both streams, facilitating the transfer of information from the encoder to the shape stream while filtering out irrelevant information. The information of both streams is then combined within the fusion stage using an Atrous Spatial Pyramid Pooling (ASPP) module. The GSCNN obtained an averaged Dice score of 0.74 on the test subset.</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para><emphasis role="strong">Pyramid Scene Parsing Network (PSPNet)</emphasis> [<link linkend="ch4-bib31">31</link>]</para></listitem>
</itemizedlist>
<fig id="ch4-F8" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.8:</emphasis> Overview of the PSPNet architecture [<link linkend="ch4-bib24">24</link>]</para></caption>
<graphic xlink:href="graphics/ch4-fig08.jpg"/>
</fig>
<para>The PSPNet architecture makes use of a Pyramid Pooling Module (PPM) to capture rich context information from the output of the encoder arm. The capture is done through fusion of the network&#x2019;s four pyramid scales, as seen in <link linkend="ch4-F8">Fig. 4.8</link>. An averaged Dice score of 0.69 on the test subset was obtained using the PSPNet architecture.</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para><emphasis role="strong">Siamese network</emphasis> [<link linkend="ch4-bib16">16</link>]</para></listitem>
</itemizedlist>
<fig id="ch4-F9" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.9:</emphasis> Overview of the Siamese network architecture [<link linkend="ch4-bib12">12</link>]</para></caption>
<graphic xlink:href="graphics/ch4-fig09.jpg"/>
</fig>
<para>The Siamese network presents another approach to combining the features extracted at low-resolution and high-resolution levels, namely through a two-step approach. The first step operates on the whole, down-sampled image and outputs a coarse segmentation map. In the second step, the segmentation map is fed into a Siamese network containing two encoders (as shown in <link linkend="ch4-F9">Fig. 4.9</link>), with the original high-resolution image going through the other encoder in patches. Finally, the decoder stitches the patches together, obtaining a segmentation map at the same resolution as the input image. The Siamese network reached an averaged Dice score of 0.78 on the test subset.</para>
</section>
<section class="lev2" id="ch4-2-2">
<title>4.2.2 Implementation Details and Result Overview</title>
<para>To complete the benchmark stage, each network architecture was trained 5 times on random pre-sampled splits of the data set (60% training, 20% validation, 20% test). The resulting Dice scores (averaged over the 5 runs and the 2 labels of interest) and their respective spread are presented in <link linkend="ch4-F10">Fig. 4.10</link> below. All experiments were run on a server equipped with an Intel Core i9-9940X (14 cores, 3.30 GHz), 4 RTX 5000 GPUs, and 128 GB RAM.</para>
<fig id="ch4-F10" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.10:</emphasis> Average Dice Scores (blue) and spread (green) per investigated network architecture, along with the final chosen architecture (red)</para></caption>
<graphic xlink:href="graphics/ch4-fig10.jpg"/>
</fig>
<fig id="ch4-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 4.1:</emphasis> Obtained Dice Scores for each showcased network architecture</para></caption>
<graphic xlink:href="graphics/ch4-tab01.jpg"/>
</fig>
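<para>The benchmark protocol (five runs on random pre-sampled 60/20/20 splits, scores averaged over the runs) can be sketched as follows; the training callback is a hypothetical placeholder standing in for the actual training and evaluation pipeline:</para>

```python
import random

def make_split(n_images, seed, fracs=(0.6, 0.2, 0.2)):
    """Pre-sample a random train/validation/test split of image indices."""
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)
    n_train = int(fracs[0] * n_images)
    n_val = int(fracs[1] * n_images)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def benchmark(train_and_eval, n_images=202, runs=5):
    """Average Dice score and spread over independent runs.

    train_and_eval(train, val, test) -> score is assumed to train a
    network on the given index sets and return its test Dice score.
    """
    scores = [train_and_eval(*make_split(n_images, seed)) for seed in range(runs)]
    return sum(scores) / len(scores), max(scores) - min(scores)
```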
<para>The two best performing approaches are the Siamese network (with a mean Dice score of 0.78) and the U-net (with a mean Dice score of 0.76). The performance of the Siamese approach can be explained by its two-step analysis, which first segments at low resolution and therefore with a larger receptive field, followed by a second step at higher resolution, at the cost of a smaller receptive field. The U-net architecture, on the other hand, obtained similar results with much lower resource consumption during training and inference.</para>
<para>Based on this performance, a branched U-net cascade was chosen as the preferred architecture, combining the two-step analysis at different resolution levels with the generalisation power associated with the U-net. The chosen architecture consists of independent branches targeting each ROI. For each branch, a 2D U-net takes the down-sampled image as input and produces an intermediate, rough segmentation, which is then up-sampled to the dimensions of the original input image. The intermediate segmentation is then aggregated with the original high-resolution input image and fed into a 3D U-net (as introduced by Milletari <emphasis>et al.</emphasis> [<link linkend="ch4-bib17">17</link>]), which outputs a high-resolution segmentation map. Practical advantages of such a modular architecture are the possibility to update each branch individually when additional data becomes available, as well as to scale up with additional branches targeting new labels without having to retrain the existing branches. An overview of the described architecture for a given branch is presented in <link linkend="ch4-F11">Fig. 4.11</link>. Repeating the experiment in benchmark conditions yielded an averaged Dice score of 0.84 (shown in red in <link linkend="ch4-F10">Fig. 4.10</link>), outperforming all the other candidate architectures.</para>
<fig id="ch4-F11" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.11:</emphasis> An overview of the U-net cascade architecture, consisting of a 2D U-net (top) and a 3D U-net (bottom) which takes as input the high resolution input image stacked with the output segmentation of the first stage</para></caption>
<graphic xlink:href="graphics/ch4-fig11.jpg"/>
</fig>
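<para>The data flow of one cascade branch can be sketched as follows, with the trained 2D and 3D U-nets replaced by placeholder callables and nearest-neighbour resampling assumed for the down- and up-sampling steps:</para>

```python
import numpy as np

def downsample(img, factor=4):
    """Nearest-neighbour down-sampling via strided indexing."""
    return img[::factor, ::factor]

def upsample(seg, factor=4):
    """Nearest-neighbour up-sampling back to the original resolution."""
    return np.repeat(np.repeat(seg, factor, axis=0), factor, axis=1)

def cascade_branch(img, unet2d, unet3d, factor=4):
    """One branch: coarse low-resolution stage, then high-resolution refinement.

    unet2d and unet3d stand in for the trained networks (assumed callables
    returning a map of the same spatial size as their input).
    """
    coarse = unet2d(downsample(img, factor))      # rough low-res segmentation
    prior = upsample(coarse, factor)              # up-sample to input size
    stacked = np.stack([img, prior], axis=0)      # image + coarse map as channels
    return unet3d(stacked)                        # final high-res segmentation
```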
<para>As is typical of deep learning applications with limited data sets, and despite the use of data augmentation techniques, overfitting proved to be an issue. This could be clearly seen in the discrepancy between the Dice scores on the train and test subsets. Having chosen an architecture for the fine-tuning stage, additional effort was invested in expanding the data set from 202 to 2192 images. For this stage of the application, the networks were trained on all five previously mentioned labels of interest.</para>
<para>Due to the relatively large number of hyper-parameters to be tuned, a population-based training method was used, consisting of two evolution phases: exploration and exploitation. During the exploration phase, the networks are trained with randomly sampled hyper-parameter sets. During the subsequent exploitation phase, the best performing sets of hyper-parameters are identified, and new sets are sampled in their close proximity within the hyper-parameter space.</para>
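<para>The two-phase search can be illustrated with a generic sketch; the sampling, perturbation, and evaluation callables are problem-specific placeholders, and the population size and fraction of retained sets are assumed values rather than those used in this work:</para>

```python
import random

def population_based_search(evaluate, sample, perturb,
                            pop_size=16, generations=10, keep_frac=0.25, seed=0):
    """Exploration (random sampling) followed by exploitation (local resampling).

    evaluate(hp) -> score, sample(rng) -> random hyper-parameter set, and
    perturb(hp, rng) -> nearby set are supplied by the caller.
    """
    rng = random.Random(seed)
    # Exploration phase: train with randomly sampled hyper-parameter sets.
    population = [sample(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Exploitation phase: keep the best sets and resample near them.
        ranked = sorted(population, key=evaluate, reverse=True)
        elites = ranked[:max(1, int(keep_frac * pop_size))]
        population = elites + [perturb(rng.choice(elites), rng)
                               for _ in range(pop_size - len(elites))]
    return max(population, key=evaluate)
```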
<para>Although the Dice loss has proven effective in segmentation tasks, the high skewness and variability as well as the low availability of data require additional compensatory mechanisms. For this purpose, several alternative loss functions were investigated as hyper-parameters, including the Focal Tversky loss [<link linkend="ch4-bib1">1</link>], the Combo loss [<link linkend="ch4-bib26">26</link>], and the Unified Focal loss (LogCoshDSC) [<link linkend="ch4-bib29">29</link>]. Training experiments indicated that the loss function is the most influential hyper-parameter, having the greatest impact on the resulting accuracy of the network. Furthermore, different labels have been shown to benefit differently from each loss function. For example, the network trained on the &#x201c;metal&#x201d; label, which has the highest pixel-wise distribution of all classes and typically large structures on each image, performed best when trained using the LogCoshDSC loss. At the same time, the &#x201c;VIA&#x201d; and &#x201c;poly&#x201d; labels, both with a very low pixel-wise distribution (<math id="Ch4.S2.SS2.p6.m1" display="inline"><mrow><mi></mi><mo>&lt;</mo><mrow><mn>2.5</mn><mo>%</mo></mrow></mrow></math>), were segmented best by networks trained with the Focal Tversky loss. The Combo loss, on the other hand, was most effective for the networks targeting the &#x201c;lateral iso&#x201d; and &#x201c;deep trench&#x201d; labels, which have an average pixel-wise distribution but are the most difficult to identify visually.</para>
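<para>Of the investigated losses, the Focal Tversky loss is designed precisely for small, sparse classes such as &#x201c;VIA&#x201d; and &#x201c;poly&#x201d;: it weights false negatives more heavily than false positives and focuses the gradient on hard examples. A sketch for a single soft prediction map follows; the parameter values are common defaults from the literature, not the tuned settings of this work:</para>

```python
import numpy as np

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss for a soft prediction in [0, 1].

    alpha weights false negatives, beta false positives; gamma < 1
    amplifies the loss of poorly segmented examples.
    """
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    tp = np.sum(pred * target)
    fn = np.sum((1.0 - pred) * target)
    fp = np.sum(pred * (1.0 - target))
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1.0 - tversky) ** gamma
```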
<para>The average Dice scores obtained on the test set for each label of interest are presented in the table below.</para>
<fig id="ch4-T2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 4.2:</emphasis> Averaged Dice Scores for each label of interest</para></caption>
<graphic xlink:href="graphics/ch4-tab02.jpg"/>
</fig>
<para>The &#x201c;metal&#x201d; and &#x201c;VIA&#x201d; labels obtained the highest Dice scores, with a substantial increase in accuracy of about 10% compared to the benchmark stage. The &#x201c;deep trench&#x201d; case is also of particular note: despite being the class with the lowest pixel-wise distribution, appearing in only 58 of the images, the proposed network architecture was able to segment it with sufficient accuracy to make use of the extracted information.</para>
</section>
</section>
<section class="lev1" id="ch4-3">
<title>4.3 Parameter Extraction</title>
<para>The process following the semantic image segmentation is the extraction of the technological device parameters. An overview of the algorithmic approach is shown in Algorithm 1. The inputs are the image meta-data &#x2013; with the sole relevant information being the pixel size per image &#x2013; and the segmented image. In a first step, the segmented regions are converted to polygons while retaining their class labels. Subsequently, the polygons of every class (C) are retrieved. From this set, polygons below a statistically evaluated threshold (the area of a polygon instance lower than five times the mean area of the polygon instances within this class) are removed from the list. From these <emphasis>cleaned</emphasis> polygons, the centroids of the single objects are computed, which are utilised for clustering. The customised clustering method is shown in <link linkend="ch4-T3">Table 4.3</link>.</para>
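<para>The polygon cleaning and centroid computation steps can be sketched in numpy; polygons are represented as vertex arrays, and the threshold factor relative to the class mean area is exposed as an assumed, configurable parameter rather than hard-coded:</para>

```python
import numpy as np

def polygon_area_centroid(vertices):
    """Shoelace area and centroid of a simple polygon given as an (N, 2) array."""
    x, y = vertices[:, 0], vertices[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)
    cross = x * yn - xn * y
    area = 0.5 * np.sum(cross)
    cx = np.sum((x + xn) * cross) / (6.0 * area)
    cy = np.sum((y + yn) * cross) / (6.0 * area)
    return abs(area), np.array([cx, cy])

def clean_and_centroids(polygons, min_frac=0.2):
    """Drop polygons far smaller than the class mean area; return centroids.

    min_frac is an assumed threshold factor relative to the mean polygon
    area of the class.
    """
    stats = [polygon_area_centroid(p) for p in polygons]
    mean_area = np.mean([a for a, _ in stats])
    return np.array([c for a, c in stats if a >= min_frac * mean_area])
```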
<para><graphic xlink:href="graphics/ch4-alo01.jpg"/></para>
<fig id="ch4-T3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 4.3:</emphasis> Utilised cluster evaluation techniques [<link linkend="ch4-bib14">14</link>]. Notation: <emphasis>n</emphasis>: number of objects in data-set; <emphasis>c</emphasis>: centre of data-set; <emphasis>NC</emphasis>: number of clusters; <emphasis><math id="Ch4.T3.m9" display="inline"><msub><mi>C</mi><mi>i</mi></msub></math></emphasis>: the i-th cluster; <emphasis><math id="Ch4.T3.m10" display="inline"><msub><mi>n</mi><mi>i</mi></msub></math></emphasis>: number of objects in <math id="Ch4.T3.m11" display="inline"><msub><mi>C</mi><mi>i</mi></msub></math>; <emphasis><math id="Ch4.T3.m12" display="inline"><msub><mi>c</mi><mi>i</mi></msub></math></emphasis>: centre of <math id="Ch4.T3.m13" display="inline"><msub><mi>C</mi><mi>i</mi></msub></math>; <math id="Ch4.T3.m14" display="inline"><msub><mi>W</mi><mi>k</mi></msub></math>: the within-cluster sum of squared distances from cluster mean; <math id="Ch4.T3.m15" display="inline"><mrow><mi>W</mi><msub><mo>*</mo><mi>k</mi></msub></mrow></math> appropriate null reference; <math id="Ch4.T3.m16" display="inline"><mi>B</mi></math> reference data-sets</para></caption>
<graphic xlink:href="graphics/ch4-tab03.jpg"/>
</fig>
<fig id="ch4-F12" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.12:</emphasis> Utilised cluster evaluation techniques.</para></caption>
<graphic xlink:href="graphics/ch4-fig12.jpg"/>
</fig>
<para>Different cluster evaluation techniques &#x2013; namely Calinski-Harabasz (CH) [<link linkend="ch4-bib4">4</link>], gap [<link linkend="ch4-bib28">28</link>], Davies-Bouldin (DB) [<link linkend="ch4-bib5">5</link>], a custom squared Davies-Bouldin (DB<math id="Ch4.S3.p3.m1" display="inline"><msup><mi></mi><mn>2</mn></msup></math>) [<link linkend="ch4-bib5">5</link>], and silhouette (Sil.) [<link linkend="ch4-bib23">23</link>] &#x2013; are applied to one-dimensional feature vectors constituted of the y-component of the centroid coordinates. The preceding clustering is done via k-means clustering, with <emphasis>k</emphasis> kept &#x2013; adapted to the use case &#x2013; between <emphasis>2</emphasis> and <emphasis>10</emphasis>. The different evaluation techniques report the optimal number of clusters (<emphasis>k</emphasis>) through different metrics (minimum, maximum, elbow). This computationally costly approach is suitable for the use case since the vectors are one-dimensional and the total number of polygon objects to be evaluated is relatively small (<math id="Ch4.S3.p3.m2" display="inline"><mo rspace="4.2pt">&lt;</mo></math>100). In the final step of the vertical clustering, the optimal number of clusters is inferred through majority voting among the individual evaluation techniques.</para>
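<para>A reduced sketch of this vertical clustering step is given below, using a deterministic 1D k-means together with two of the listed indices (Calinski-Harabasz and Davies-Bouldin) and a majority vote; the full method additionally uses the gap, squared DB, and silhouette criteria:</para>

```python
import numpy as np

def kmeans_1d(y, k, iters=50):
    """1D k-means with quantile initialisation (deterministic)."""
    centres = np.quantile(y, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(y[:, None] - centres[None, :]), axis=1)
        centres = np.array([y[labels == i].mean() if np.any(labels == i)
                            else centres[i] for i in range(k)])
    return labels, centres

def ch_index(y, labels, centres):
    """Calinski-Harabasz: between/within variance ratio (maximum is best)."""
    n, k, grand = len(y), len(centres), y.mean()
    b = sum((labels == i).sum() * (c - grand) ** 2 for i, c in enumerate(centres))
    w = sum(((y[labels == i] - c) ** 2).sum() for i, c in enumerate(centres))
    return (b / (k - 1)) / (w / (n - k) + 1e-12)

def db_index(y, labels, centres):
    """Davies-Bouldin: worst-case cluster overlap (minimum is best)."""
    k = len(centres)
    s = [np.abs(y[labels == i] - centres[i]).mean() for i in range(k)]
    return np.mean([max((s[i] + s[j]) / (abs(centres[i] - centres[j]) + 1e-12)
                        for j in range(k) if j != i) for i in range(k)])

def vote_k(y, k_range=range(2, 11)):
    """Majority vote over the optimal k reported by each index."""
    fits = {k: kmeans_1d(y, k) for k in k_range
            if len(np.unique(kmeans_1d(y, k)[0])) == k}  # skip degenerate fits
    votes = [max(fits, key=lambda k: ch_index(y, *fits[k])),
             min(fits, key=lambda k: db_index(y, *fits[k]))]
    return max(set(votes), key=votes.count)
```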
<fig id="ch4-F13" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.13:</emphasis> Example cross-section image with annotated metal and contact/VIA features</para></caption>
<graphic xlink:href="graphics/ch4-fig13.jpg"/>
</fig>
<para>Since the polygon objects are now vertically assigned, clustering in the horizontal dimension is the next step. The procedure is the same as previously discussed for the vertical clustering. For the vertically and horizontally clustered elements, the technological, geometrical parameters can be inferred. These are illustrated in <link linkend="ch4-F13">Figure 4.13</link> for the metal and VIA classes. The vertical height is determined for the metallisation layers, while height, width, and pitch are determined for the interconnecting contact and VIA layers. After the polygon objects are assigned to classes, these attributes can be calculated through trivial mathematical operations: the height is the difference between the bounding box maximum and minimum in the vertical dimension, the width is the extent of the bounding box in the horizontal dimension, and the pitch is the difference of the x-coordinates of the centroids of two adjacent polygon objects. The values are averaged within each class.</para>
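<para>These attribute computations reduce to bounding-box arithmetic; a sketch for one horizontally clustered layer follows (pixel coordinates, scaled by the pixel size from the image meta-data; for simplicity the pitch is approximated here from bounding-box centres rather than the polygon centroids used in the text):</para>

```python
import numpy as np

def layer_parameters(polygons, pixel_size=1.0):
    """Mean height, width, and pitch of the polygon objects in one layer.

    polygons: list of (N, 2) vertex arrays of the objects in the layer.
    """
    boxes = [(p[:, 0].min(), p[:, 0].max(), p[:, 1].min(), p[:, 1].max())
             for p in polygons]
    heights = [ymax - ymin for _, _, ymin, ymax in boxes]
    widths = [xmax - xmin for xmin, xmax, _, _ in boxes]
    centres_x = sorted((xmin + xmax) / 2.0 for xmin, xmax, _, _ in boxes)
    pitches = np.diff(centres_x)
    mean_pitch = float(np.mean(pitches)) if len(pitches) else float("nan")
    return (pixel_size * float(np.mean(heights)),
            pixel_size * float(np.mean(widths)),
            pixel_size * mean_pitch)
```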
<fig id="ch4-F14" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 4.14:</emphasis> Example cross-section image (upper left). The polygonised VIA objects are shown (lower left). A dendrogram is shown for the relative distances of the y-coordinates of the single objects (upper right). Finally, the results of the utilised cluster evaluation techniques are presented (lower right).</para></caption>
<graphic xlink:href="graphics/ch4-fig14.jpg"/>
</fig>
<para>An example for the VIA class is shown in <link linkend="ch4-F14">Figure 4.14</link>. After segmentation of the grey-scale image, the individual segmented classes are converted into polygon objects; here, the VIA class is exemplified. The vertical clustering process is shown in the two right-hand images. The dendrogram visualises the linkage of the different clusters, which are subsequently optimised via the discussed approach. The optimal numbers of clusters are shown in the bottom-right figure, where the evaluation techniques report an optimum of four clusters (the individual techniques yield differing values). These four clusters are subsequently clustered in the horizontal dimension, and their geometric parameters are inferred. The following results were obtained for this example (besides the absolute values, the relative deviation from a manual measurement is given):</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>Contact (<emphasis role="strong">h</emphasis>eight, <emphasis role="strong">w</emphasis>idth, <emphasis role="strong">p</emphasis>itch): 942 nm (+9.66%), 319 nm (+5.98%), 525 nm (+12.42%)</para></listitem>
<listitem><para>VIA1 (h, w, p): 870 nm (+26.82%), 319 nm (n.a.), 545 nm (n.a.)</para></listitem>
<listitem><para>VIA2 (h, w, p): 898 nm (+14.83%), 319 nm (n.a.), 542 nm (n.a.)</para></listitem>
<listitem><para>VIA3 (h, w, p): 1086 nm (+16.27%), 434 nm (n.a.), 750 nm (n.a.)</para></listitem>
</itemizedlist>
<para>Within the technology validation use case, the inferred technological features are tested against the designed and manufactured technological properties. This is computed via multi-dimensional distance matching (e.g. Euclidean or rectilinear distance) of both vectors. The validation accuracy depends on several factors: the segmentation quality, the parameter extraction accuracy, and the image acquisition completeness. Experiments have shown that the current automated end-to-end flow reaches 75% accuracy for previously known Al-Tu technologies. Improvement is necessary for copper (Cu) technologies, which are more complex to segment. According to existing procedures within the industry, deviations of less than 5% for pitches and less than 25% for all other geometrical measurements compared to a ground truth, i.e. the designed technology parameters, are acceptable. The same requirements have been used as a benchmark for the validation of this application. The high deviations are attributed to process variances during device manufacturing and de-processing. The presented image shows a single frame acquired at a sub-optimal zoom level for measuring the discussed features; nevertheless, almost all requirements were met. In summary, the proof-of-concept presented in this work displays strong potential to satisfy existing industrial requirements, especially when adequate zoom levels are chosen for the particular technological parameters.</para>
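<para>The distance matching itself is straightforward once both parameter vectors are assembled; a sketch follows, with illustrative (not real) technology names and vectors, supporting both the Euclidean and the rectilinear metric:</para>

```python
import numpy as np

def match_technology(measured, candidates, metric="euclidean"):
    """Return the known technology whose designed parameter vector is closest.

    measured: 1D array of extracted geometrical parameters;
    candidates: dict mapping technology name -> designed parameter vector.
    """
    measured = np.asarray(measured, dtype=float)

    def dist(v):
        d = measured - np.asarray(v, dtype=float)
        return np.abs(d).sum() if metric == "rectilinear" else np.sqrt(np.sum(d * d))

    return min(candidates, key=lambda name: dist(candidates[name]))
```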
</section>
<section class="lev1" id="ch4-4">
<title>4.4 Conclusion</title>
<para>The settings for AI implementation in an industrial setting are often completely different from consumer applications. With data being scarce, the design of a productive AI application is necessarily <emphasis>data-driven</emphasis>, or more specifically <emphasis>data-adapted</emphasis>. Industrial parameters are manifold, and the requirements typically impose the need to automate, improve, or even enable new processes. To make an AI-based solution <emphasis>viable</emphasis>, these requirements must be met. In this work, we have presented an end-to-end technology demonstrator &#x2013; incorporating deep learning and cluster evaluation &#x2013; showcasing the automation of semiconductor technology identification based on SEM cross-section analysis. A comparison of different convolutional neural network architectures was presented, and a candidate best suited for the SEM segmentation task was drafted. The proposed candidate architecture represents a cascade of 2D and 3D U-nets, arranged in branches each dedicated to a single label of interest. Following a pragmatic perspective, a modular design is proposed, ensuring scalability and ease of maintenance. Trained on a custom-created data set of 2192 images, the proposed architecture obtained Dice scores in the range of 0.76&#x2013;0.93 for labels of different complexity, arguing in favour of supervised deep learning-based methods even in applications with strongly limited amounts of labelled data. Based on the obtained results, a parameter extraction algorithm is proposed, aimed at exploiting the obtained segmentation maps to identify and validate the technology of the investigated semiconductor devices. The obtained results were in the range of the ground truth measurements, with deviations within an acceptable measuring range. The potential for narrowing down these uncertainty ranges was outlined.</para>
</section>
<section class="lev1" id="ch4-5">
<title>4.5 Future Work</title>
<para>Following the development and validation steps described above, a production test stage will determine the potential of the segmentation component of the process to be used in other applications of semiconductor analysis. Aside from a high degree of automation and the mandatory fulfilment of functional requirements, industry has established high thresholds for non-functional requirements. Maintainability, system uptime, extensibility, usability, and updateability are just some of the potential requirements across different industries. Such requirements are addressed by specialised frameworks such as Ray and TorchServe. Combined with the advantages of a modular architecture, they enable the possibility to update each network with virtually no down-time. Additional investigations are being conducted into expanding the data augmentation pipeline, with the goal of increasing the exploitation of the available data set despite its relatively small size.</para>
</section>
<section class="lev1">
<title>Acknowledgment</title>
<para>This work is conducted under the framework of the ECSEL AI4DI &#x201c;Artificial Intelligence for Digitising Industry&#x201d; project. The project has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 826060. The JU receives support from the European Union&#x2019;s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Italy, Latvia, Belgium, Lithuania, France, Greece, Finland, Norway.</para>
</section>
<section class="lev1" id="ch4-Ref">
<title>References</title>
<para id="ch4-bib1">[1] N. Abraham and N. Mefraz Khan. A novel focal tversky loss function with improved attention u-net for lesion segmentation. <emphasis>2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019)</emphasis>, pages 683&#x2013;687, 2019.</para>
<para id="ch4-bib2">[2] F. Altaf, S. M. S. Islam, N. Akhtar, and N. Khalid Janjua. Going deep in medical image analysis: Concepts, methods, challenges, and future directions. <emphasis>IEEE Access</emphasis>, 7:99540&#x2013;99572, 2019.</para>
<para id="ch4-bib3">[3] S. Budd, E. C. Robinson, and B. Kainz. A survey on active learning and human-in-the-loop deep learning for medical image analysis. <emphasis>Medical Image Analysis</emphasis>, 71:102062, 2021.</para>
<para id="ch4-bib4">[4] T. Cali&#x144;ski and J. Harabasz. A dendrite method for cluster analysis. <emphasis>Communications in Statistics - Theory and Methods</emphasis>, 3(1):1&#x2013;27, 01 1974.</para>
<para id="ch4-bib5">[5] D. L. Davies and D. W. Bouldin. A cluster separation measure. <emphasis>IEEE Transactions on Pattern Analysis and Machine Intelligence</emphasis>, PAMI-1(2):224&#x2013;227, 1979.</para>
<para id="ch4-bib6">[6] A. Diez-Olivan, J. Del Ser, D. Galar, and B. Sierra. Data fusion and machine learning for industrial prognosis: Trends and perspectives towards industry 4.0. <emphasis>Information Fusion</emphasis>, 50:92&#x2013;111, 2019.</para>
<para id="ch4-bib7">[7] European Commission. A Chips Act for Europe.</para>
<para id="ch4-bib8">[8] Fortune Business Insights. Semiconductor market size, share &amp; covid-19 impact analysis, 2021-2028. FBI102365.</para>
<para id="ch4-bib9">[9] U. Guin, K. Huang, D. DiMase, J. M. Carulli, M. Tehranipoor, and Y. Makris. Counterfeit integrated circuits: A rising threat in the global semiconductor supply chain. <emphasis>Proceedings of the IEEE</emphasis>, 102(8):1207&#x2013;1228, 2014.</para>
<para id="ch4-bib10">[10] S. G&#xf6;ke, K. Staight, and R. Vrijen. Scaling ai in the sector that enables it: Lessons for semiconductor-device makers.</para>
<para id="ch4-bib11">[11] M. Kawula, D. Purice, M. Li, G. Vivar, S.-A. Ahmadi, K. Parodi, C. Belka, G. Landry, and C. Kurz. Dosimetric impact of deep learning-based ct auto-segmentation on radiation therapy treatment planning for prostate cancer. <emphasis>Radiation Oncology</emphasis>, 17, 01 2022.</para>
<para id="ch4-bib12">[12] G.-M. Konnerth. Exploring application-oriented methods to improve cnn-based segmentation of sem microchip images. Master&#x2019;s thesis, Technical University of Munich, 2020.</para>
<para id="ch4-bib13">[13] X. Li, T. Lai, S. Wang, Q. Chen, C. Yang, R. Chen, J. Lin, and F. Zheng. Weighted feature pyramid networks for object detection. In <emphasis>2019 IEEE Intl Conf on Parallel Distributed Processing with Applications, Big Data Cloud Computing, Sustainable Computing Communications, Social Computing Networking (ISPA/BDCloud/SocialCom/SustainCom)</emphasis>, pages 1500&#x2013;1504, 2019.</para>
<para id="ch4-bib14">[14] Y. Liu, Z. Li, H. Xiong, X. Gao, and J. Wu. Understanding of internal clustering validation measures. In <emphasis>2010 IEEE International Conference on Data Mining</emphasis>. IEEE.</para>
<para id="ch4-bib15">[15] M. Ludwig, B. Lippmann, A.-C. Bette, and C. Lenz. Demo: A fully automated process for semiconductor technology analysis through SEM cross-sections. In <emphasis>25th International Conference on Pattern Recognition (ICPR)</emphasis>.</para>
<para id="ch4-bib16">[16] K. Martin, N. Windunga, S. Sani, S. Massie, and J. Clos. A convolutional siamese network for developing similarity knowledge in the selfback dataset, 2017.</para>
<para id="ch4-bib17">[17] F. Milletari, N. Navab, and S.-A. Ahmadi. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. In <emphasis>2016 Fourth International Conference on 3D Vision (3DV)</emphasis>, pages 565&#x2013;571. IEEE, 10 2016.</para>
<para id="ch4-bib18">[18] L. Najman and M. Schmitt. Watershed of a continuous function. <emphasis>Signal Processing</emphasis>, 38(1):99&#x2013;112, 1994. Mathematical Morphology and its Applications to Signal Processing.</para>
<para id="ch4-bib19">[19] D. Nameirakpam, K. Singh, and Y. Chanu. Image segmentation using k -means clustering algorithm and subtractive clustering algorithm. <emphasis>Procedia Computer Science</emphasis>, 54:764&#x2013;771, 12 2015.</para>
<para id="ch4-bib20">[20] R. Nock and F. Nielsen. Statistical region merging. <emphasis>IEEE Transactions on Pattern Analysis and Machine Intelligence</emphasis>, 26(11):1452&#x2013;1458, 2004.</para>
<para id="ch4-bib21">[21] N. Otsu. A threshold selection method from gray level histograms. <emphasis>IEEE Transactions on Systems, Man, and Cybernetics</emphasis>, 9:62&#x2013;66, 1979.</para>
<para id="ch4-bib22">[22] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi, editors, <emphasis>Medical Image Computing and Computer-Assisted Intervention &#x2013; MICCAI 2015</emphasis>, pages 234&#x2013;241, Cham, 2015. Springer International Publishing.</para>
<para id="ch4-bib23">[23] P. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. <emphasis>J. Comput. Appl. Math.</emphasis>, 20(1):53&#x2013;65, November 1987.</para>
<para id="ch4-bib24">[24] F. Schiegg. Boundary detection and semantic segmentation of sem-images. Master&#x2019;s thesis, Technical University of Munich, 2020.</para>
<para id="ch4-bib25">[25] S. M. Sze and K. K. Ng. <emphasis>Physics of Semiconductor Devices</emphasis>. Wiley, 2006.</para>
<para id="ch4-bib26">[26] S. A. Taghanaki, Y. Zheng, S. K. Zhou, B. Georgescu, P. Sharma, D. Xu, D. Comaniciu, and G. Hamarneh. Combo loss: Handling input and output imbalance in multi-organ segmentation. <emphasis>Computerized Medical Imaging and Graphics</emphasis>, 75:24&#x2013;33, 2019.</para>
<para id="ch4-bib27">[27] T. Takikawa, D. Acuna, V. Jampani, and S. Fidler. Gated-scnn: Gated shape cnns for semantic segmentation. In <emphasis>2019 IEEE/CVF International Conference on Computer Vision (ICCV)</emphasis>, pages 5228&#x2013;5237, 2019.</para>
<para id="ch4-bib28">[28] R. Tibshirani, G. Walther, and T. Hastie. Estimating the number of clusters in a data set via the gap statistic. <emphasis>Journal of the Royal Statistical Society: Series B (Statistical Methodology)</emphasis>, 63(2):411&#x2013;423, 2001.</para>
<para id="ch4-bib29">[29] M. Yeung, E. Sala, C.-B. Sch&#xf6;nlieb, and L. Rundo. A mixed focal loss function for handling class imbalanced medical image segmentation. <emphasis>ArXiv</emphasis>, abs/2102.04525, 2021.</para>
<para id="ch4-bib30">[30] X. Zhang, Z. Fu, Y. Huang, A. Lin, Y. Shi, and Y. Xu. Effective method to automatically measure the profile parameters of integrated circuit from SEM/TEM/STEM images. In <emphasis>2017 China Semiconductor Technology International Conference (CSTIC)</emphasis>. IEEE, March 2017.</para>
<para id="ch4-bib31">[31] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid scene parsing network. In <emphasis>2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</emphasis>, pages 6230&#x2013;6239, 2017.</para>
</section>
</chapter>
<chapter class="chapter" id="ch5" label="5" xreflabel="5">
<title>AI Machine Vision System for Wafer Defect Detection</title>
<subtitle>Dmitry Morits, Marcelo Rizzo Piton, and Timo Laakko</subtitle>
<affiliation>VTT Technical Research Centre of Finland Ltd, Finland</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>Surface defects generated during semiconductor wafer processing are among the main challenges in micro- and nanofabrication. The wafers are typically scanned using optical microscopy, and the images are then inspected by human experts, which tends to be a slow and tiring process. The development of a reliable machine vision-based system that correctly identifies and classifies wafer defect types and can replace manual inspection is a challenging task due to the variety of possible defects. In this work, we developed a machine vision system for the inspection of semiconductor wafers and the detection of surface defects. The system integrates an optical scanning microscopy system and an AI algorithm based on the Mask R-CNN architecture. The system was trained on a dataset of microscopic images of wafers with Micro-Electro-Mechanical Systems (MEMS), silicon photonics and superconductor devices at different fabrication stages, including surface defects. The achieved accuracy and detection speed make the system promising for cleanroom applications.</para>
<para><emphasis role="strong">Keywords:</emphasis> AI, machine vision, semiconductor wafer, defect detection, convolutional neural network, Mask R-CNN.</para>
</section>
<section class="lev1" id="ch5-1">
<title>5.1 Introduction and Background</title>
<para>One of the main challenges in micro- and nanofabrication is the identification and classification of surface defects. Defects are unavoidably generated during processes such as chemical-mechanical polishing, photolithography, etching, diffusion and ion implantation, oxidation, and metallization [<link linkend="ch5-bib1">1</link>] [<link linkend="ch5-bib2">2</link>]. The increasing complexity and density of semiconductor devices leads to a growing number of surface defects and dictates stricter requirements for defect detection. For example, contamination particles that are harmless under some design rules can become critical as device dimensions shrink. The defect criteria also vary across different locations on a device: for example, defects in a movable part or in the hermetic bond-sealing frame of a MEMS device are usually more severe than defects in secondary structures. <link linkend="ch5-F1">Figure 5.1</link> illustrates microscopic images with surface defects generated during the microfabrication of different superconductor and semiconductor devices. Typical types of defects include particles, photoresist spots, edge defects, scratches, etc. Defect detection is thus an extremely important procedure, especially in the critical areas of the devices.</para>
<fig id="ch5-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 5.1:</emphasis> Examples of microscopic images of various superconductor and semiconductor devices with surface defects</para></caption>
<graphic xlink:href="graphics/ch5-fig01.jpg"/>
</fig>
<para>The VTT Micronova semiconductor fab is a Finnish national research infrastructure for micro-, nano- and quantum technology. Its research areas include MEMS, photonic, quantum and other specialty components that can be used to create a wide range of sensors and devices. At VTT, visual inspection of the wafer surface is currently performed manually by human experts: the wafers are scanned using optical microscopy, and the images are then inspected by the experts. Since the inspection task requires extreme concentration, the time an expert can spend on it is quite limited. The process is slow, tiring and susceptible to human error; identification of defects by experts alone can result in false identifications due to fatigue and lack of objectivity. The goal of this work is the development of a reliable machine vision-based system for the correct identification of wafer defects, with the aim of replacing manual inspection. Moreover, this system would be directly integrated into the wafer inspection production line. Such a system would speed up defect inspection, simplify the analysis and eventually help to improve the fabrication yield.</para>
</section>
<section class="lev1" id="ch5-2">
<title>5.2 Machine Vision-based System Description</title>
<para>The general architecture of the developed machine vision system is shown in <link linkend="ch5-F2">Figure 5.2</link>. The wafers are inspected by a semi-automatic microscopy scanning system. In this work we tested both the IJ 13 IR-inspector and the Muetec CD3000 optical scanning system. The system produces a set of microscopic images covering the full area of the wafer.</para>
<fig id="ch5-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 5.2:</emphasis> General architecture of the developed machine vision system</para></caption>
<graphic xlink:href="graphics/ch5-fig02.jpg"/>
</fig>
<para>For the training of neural networks, we prepared an image dataset from microscopic images of wafers with MEMS, silicon photonics and superconductor devices at different fabrication stages, including surface defects. The initial set included images of different resolutions and magnifications. First, we manually labelled the defects on each image and then cropped the areas containing defects. The cropping increased the dataset size and provided faster and more consistent training. Next, data augmentation was used to increase the amount of data by adding slightly modified copies of existing data or newly created synthetic data derived from it. This procedure acts as a regularizer and helps to reduce overfitting when training a machine learning model [<link linkend="ch5-bib3">3</link>]. In this case, the augmentation included mirror and rotation transformations as well as a shift of the RGB spectrum of the images. The full dataset preparation procedure is shown schematically in <link linkend="ch5-F3">Figure 5.3</link>. The dataset was split into training and validation sets containing 935 and 165 images, respectively.</para>
<fig id="ch5-F3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 5.3:</emphasis> A scheme of the image dataset preparation, including labelling, cropping and data augmentation</para></caption>
<graphic xlink:href="graphics/ch5-fig03.jpg"/>
</fig>
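The augmentation step described above (mirroring, rotation, and RGB shifts) can be sketched in plain NumPy. This is an illustrative approximation only; the exact transformations and parameters used in this work are not specified beyond the text, and a dedicated library such as Albumentations [3] would normally be used:

```python
import numpy as np

def augment(image, rng):
    """Generate simple augmented variants of an RGB image of shape (H, W, 3)."""
    variants = [np.fliplr(image), np.flipud(image)]   # mirror transforms
    for k in (1, 2, 3):                               # 90/180/270 degree rotations
        variants.append(np.rot90(image, k))
    # random per-channel gain: a crude stand-in for an RGB spectrum shift
    gain = rng.uniform(0.8, 1.2, size=3)
    variants.append(np.clip(image * gain, 0.0, 255.0))
    return variants

rng = np.random.default_rng(0)
crop = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float32)
augmented = augment(crop, rng)
print(len(augmented))  # 6 augmented copies per source crop
```

Each labelled crop thus yields several additional training samples at negligible cost.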
<para>Here we used a Convolutional Neural Network (CNN): a type of deep learning algorithm used primarily for image recognition and processing. CNNs are inspired by the organization of the animal visual cortex [<link linkend="ch5-bib4">4</link>] [<link linkend="ch5-bib5">5</link>] and are designed to learn spatial hierarchies of features, from low- to high-level patterns. We developed an algorithm based on the Mask R-CNN architecture [<link linkend="ch5-bib6">6</link>], a state-of-the-art algorithm for object detection, the computer vision technique that identifies and locates objects in an image or video. Mask R-CNN is the latest stage in the evolution of the R-CNN family and provides high detection accuracy, although it requires more computational resources than faster algorithms such as YOLO [<link linkend="ch5-bib7">7</link>]. Mask R-CNN consists of two stages. The first stage, a Region Proposal Network, proposes candidate object bounding boxes. The second stage extracts features from each candidate box using Region of Interest pooling, performs classification and bounding-box regression, and outputs a binary mask for each region. The ResNet-101 [<link linkend="ch5-bib8">8</link>] convolutional backbone was used for feature extraction over the entire image. The algorithm was optimized for binary classification, which reports results in a &#x201c;defect vs background&#x201d; format without classifying defect types, as shown in <link linkend="ch5-F4">Figure 5.4</link>. A general comparison of the algorithm&#x2019;s performance to other object detection algorithms can be found in Refs [<link linkend="ch5-bib6">6</link>] and [<link linkend="ch5-bib9">9</link>].</para>
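The binary &#x201c;defect vs background&#x201d; output can be illustrated with a small post-processing sketch. It assumes the detector returns per-region confidence scores and binary masks (the usual Mask R-CNN output format); regions above a confidence threshold are merged into a single defect map. The function name and threshold below are illustrative, not part of the described system:

```python
import numpy as np

def binary_defect_map(masks, scores, threshold=0.5):
    """masks: (N, H, W) boolean region masks; scores: (N,) confidences.
    Returns one (H, W) boolean map: True = defect, False = background."""
    keep = scores >= threshold
    if not keep.any():
        return np.zeros(masks.shape[1:], dtype=bool)
    return masks[keep].any(axis=0)

masks = np.zeros((3, 8, 8), dtype=bool)
masks[0, 1:3, 1:3] = True   # high-confidence defect region
masks[1, 5:7, 5:7] = True   # low-confidence proposal, filtered out
masks[2, 0, 0] = True       # mid-confidence defect region
scores = np.array([0.9, 0.2, 0.6])
defect_map = binary_defect_map(masks, scores)
print(defect_map.sum())  # 5 pixels flagged as defect
```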
<para>The main requirements for the system are functional suitability for defect detection, integration of the scanning optical microscope with the server running the AI software, usability for cleanroom users who are not familiar with the implementation details, and readability and visualization of the detection results. The main KPIs for the system were detection accuracy, the time to process a single image, and evaluation by the cleanroom users in terms of usability and result readability. The AI algorithm based on the Mask R-CNN architecture went through several rounds of optimization and testing using microscopic images of various microelectronic devices.</para>
<para>There has been significant progress in the application of deep learning techniques to wafer defect detection and classification [<link linkend="ch5-bib10">10</link>]. The main innovations of this work compared to the state of the art are the integration of the algorithm with the scanning microscopy system and the training of the system on a dataset containing images of various devices at different stages of processing, instead of standard image databases available online. This allows the system to better distinguish between wafer defects and device features and provides reliable detection of wafer defects for a wide range of semiconductor components.</para>
<para>To improve usability for the end-users, we implemented a graphical user interface adapted for cleanroom personnel not familiar with AI systems. The software was installed on a PC/server with an NVIDIA Quadro RTX 5000 16GB GPU at the VTT Micronova cleanroom. The algorithm was then integrated with the Muetec CD3000 optical scanning microscopy system via a connection through the internal network. To improve the readability of the results, the system provides binary defect-vs-background classification, with results available in both graphical and text formats. Feedback from the cleanroom experts helped improve the system&#x2019;s usability over several iterations of optimization. Testing on the latest dataset, with 192 images of 1600x1200 px resolution and 5x optical magnification, demonstrated 86% accuracy with a detection time of 1&#x2013;2 seconds per image. The accuracy of the system is approximately on the level of a human operator, although that level also depends strongly on the operators&#x2019; experience and fatigue. The experts judged 86% accuracy sufficient for applications at the VTT cleanroom but noted that only about 15% of the detected defects were critical for wafer processing. Unfortunately, the criterion for a defect being critical or non-critical is very device-specific and cannot be easily generalized. After the system provides the detection results, the final decision on the importance of the defects for processing has to be made by the cleanroom experts.</para>
<fig id="ch5-F4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 5.4:</emphasis> Example of binary classification of wafer defects: defect vs background</para></caption>
<graphic xlink:href="graphics/ch5-fig04.jpg"/>
</fig>
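The reported 86% accuracy implies a per-defect matching criterion, though the chapter does not specify one. A common choice, sketched here as an assumption, is to count a ground-truth defect as detected when some predicted box overlaps it with an intersection-over-union (IoU) of at least 0.5:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def detection_accuracy(preds, truths, thr=0.5):
    """Fraction of ground-truth defects matched by a prediction with IoU >= thr."""
    matched, used = 0, set()
    for t in truths:
        for i, p in enumerate(preds):
            if i not in used and iou(p, t) >= thr:
                matched += 1
                used.add(i)
                break
    return matched / len(truths) if truths else 1.0

truths = [(10, 10, 20, 20), (40, 40, 60, 60)]
preds = [(11, 11, 21, 21), (100, 100, 110, 110)]
print(detection_accuracy(preds, truths))  # 0.5: one of two defects found
```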
<para>Regarding scalability, in the current work we did not aim at smaller technology nodes, although such scaling might require faster neural networks, such as one-stage YOLO detectors. In general, the main expected impact of the system is a reduction of the overall working time required for wafer defect inspection. We believe the system will help save valuable working time of cleanroom experts, improve fabrication yield and reduce fabrication cost.</para>
</section>
<section class="lev1" id="ch5-3">
<title>5.3 Conclusion</title>
<para>We developed a system for the detection of wafer surface defects. The system integrates an optical scanning microscopy system and an AI algorithm based on the Mask R-CNN architecture. The image dataset used for training and testing the system included microscopic images of wafers with MEMS, silicon photonics and superconductor devices at different fabrication stages including surface defects. The system demonstrated functional suitability for defect detection, high accuracy, and reasonable detection speed, making it suitable for potential cleanroom applications.</para>
</section>
<section class="lev1">
<title>Acknowledgements</title>
<para>This work is conducted under the framework of the ECSEL AI4DI &#x201c;Artificial Intelligence for Digitising Industry&#x201d; project. The project has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 826060. The JU receives support from the European Union&#x2019;s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Italy, Latvia, Belgium, Lithuania, France, Greece, Finland, Norway.</para>
</section>
<section class="lev1" id="ch5-Ref">
<title>References</title>
<para id="ch5-bib1">[1] H. J. Queisser, E. E. Haller, &#x201c;Defects in Semiconductors: Some Fatal, Some Vital&#x201d;, <emphasis>Science</emphasis>, 281, 945&#x2013;950, 1998.</para>
<para id="ch5-bib2">[2] T. Yuan, W. Kuo, and S. J. Bae, &#x201c;Detection of spatial defect patterns generated in semiconductor fabrication processes&#x201d;, <emphasis>IEEE Trans. Semicond. Manuf.</emphasis>, vol. 24, no. 3, pp. 392&#x2013;403, Aug. 2011.</para>
<para id="ch5-bib3">[3] A. Buslaev, V. I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, A. A. Kalinin, &#x201c;Albumentations: Fast and Flexible Image Augmentations&#x201d;, <emphasis>Information</emphasis> 11, 2, 2020. <ulink url="https://www.mdpi.com/2078-2489/11/2/125">https://www.mdpi.com/2078-2489/11/2/125</ulink>.</para>
<para id="ch5-bib4">[4] S. Albawi, T. A. Mohammed, S. Al-Zawi, &#x201c;Understanding of a convolutional neural network&#x201d;, <emphasis>International Conference on Engineering and Technology (ICET), IEEE</emphasis>, pp. 1-6, 2017.</para>
<para id="ch5-bib5">[5] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, F.E. Alsaadi, &#x201c;A survey of deep neural network architectures and their applications&#x201d;, <emphasis>Neurocomputing</emphasis>, 234, pp. 11-26, 2017.</para>
<para id="ch5-bib6">[6] K. He, G. Gkioxari, P. Doll&#xe1;r, R. Girshick, &#x201c;Mask R-CNN&#x201d;, arXiv:1703.06870, 2018.</para>
<para id="ch5-bib7">[7] M. Carranza-Garc&#xed;a, J. Torres-Mateo, P. Lara-Ben&#xed;tez, J. Garc&#xed;a-Guti&#xe9;rrez, &#x201c;On the Performance of One-Stage and Two-Stage Object Detectors in Autonomous Vehicles Using Camera Data&#x201d;, <emphasis>Remote Sens.</emphasis> 13, 89, 2021.</para>
<para id="ch5-bib8">[8] K. He, X. Zhang, S. Ren, J. Sun, &#x201c;Deep Residual Learning for Image Recognition&#x201d;, <emphasis>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</emphasis>, 770-778, 2016.</para>
<para id="ch5-bib9">[9] Z. Zhao, P. Zheng, S. Xu, X. Wu, &#x201c;Object Detection With Deep Learning: A Review&#x201d;, <emphasis>IEEE Transactions on Neural Networks and Learning Systems</emphasis>, 30, 11, 2019.</para>
<para id="ch5-bib10">[10] U. Batool, M. I. Shapiai, M. Tahir, Z. H. Ismail, N. J. Zakaria, A. Elfakharany, &#x201c;A Systematic Review of Deep Learning for Silicon Wafer Defect Recognition&#x201d;, <emphasis>IEEE Access</emphasis>, 9, 116573, 2021.</para>
</section>
</chapter>
<chapter class="chapter" id="ch6" label="6" xreflabel="6">
<title>Failure Detection in Silicon Package</title>
<subtitle>Saad Al-Baddai and Jan Papadoudis</subtitle>
<affiliation>Infineon Technologies AG, Germany</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>In an ever more connected world, semiconductor devices are at the core of every technically sophisticated system. Achieving the desired quality and effectiveness of such systems through assembly and packaging processes is highly demanding. To reach the expected quality, the output of each process must be inspected either manually or with rule-based systems. The latter lead to high over-reject rates, which require considerable additional manual effort. Moreover, such rule-based inspection relies on features handcrafted by engineers, which capture only shallow characteristics. As a result, either yield losses grow due to an increased rejection rate, or more low-quality products are shipped. The demand for advanced image inspection techniques is therefore constantly increasing. Recently, machine learning and deep learning algorithms have begun to play a critical role in meeting this demand and have been introduced in multiple applications. In this paper, the potential of advanced machine learning techniques is explored through showcases of image and wire-bonding inspection in semiconductor manufacturing. The results are very promising and show that AI models can find failures accurately in a complex environment.</para>
<para><emphasis role="strong">Keywords:</emphasis> anomaly detection, labelling, manufacturing AI solutions, AI integration, transfer learning, scalability.</para>
</section>
<section class="lev1" id="ch6-1">
<title>6.1 Introduction and Background</title>
<para>Semiconductor manufacturing produces the most advanced microchips in the world. The manufacturing process consists of multiple sequential and interacting sub-processes operating under extremely quality-demanding conditions. Its complexity and quality requirements keep increasing as electronics become an ever more important part of modern society. In principle, semiconductor manufacturing is equipped with many sensors to monitor the processes, but it lacks a suitable way to make use of this data: due to the complexity of the processes and the unknown correlations among the collected data, traditional techniques are quite limited. This is where AI offers a promising solution for feature extraction, condition monitoring and fault modelling for anomaly/defect detection using sophisticated algorithms [<link linkend="ch6-bib5">5</link>]. One of the success factors in optimizing industrial processes is therefore automatic anomaly detection, supervised learning, or both, which prevents production flaws and thereby improves quality, increases yield and generates value. A popular way of performing anomaly detection in industrial applications is to collect images or time series data with digital cameras or sensors and then search for patterns that differ from normal data [<link linkend="ch6-bib4">4</link>]. A human can easily manage such a task by recognizing normal patterns, but this is comparatively hard for machines. Unlike classical approaches, image anomaly detection faces several difficult challenges: class imbalance, data quality, and unknown anomalies [<link linkend="ch6-bib4">4</link>]. Abnormal events are generally the exception, whereas normal events account for the vast majority of the data. 
Some techniques treat anomaly detection as a &#x201c;one-class&#x201d; problem: models learn from normal data as ground truth and then evaluate whether new data belongs to this ground truth by its degree of similarity. In early applications of surface defect detection, the background was often modelled by designing handmade features on defect-free data. For example, Bennatnoun et al. used a blob technique [<link linkend="ch6-bib3">3</link>] to characterize the correct texture and to detect deviations through changes in the characteristics of the generated blobs. Amet et al. [<link linkend="ch6-bib2">2</link>] used wavelet filters to decompose defect-free images into different scales and then extracted informative features from the different frequency scales. However, most of these methods only work with homogeneous data of good quality and require prior knowledge. In general, challenges remain that strongly depend on the field of application: there is no universal pattern or system, so techniques developed for one application cannot be transferred directly to another. Machine and deep learning offer promising solutions in such complex environments, as the resulting models can be adapted or scaled to other applications or use cases. Due to the above-mentioned challenges, unsupervised anomaly detection on multi-dimensional data is in very high demand in machine learning and business applications [<link linkend="ch6-bib6">6</link>]. Please note that this paper extends the work published in [<link linkend="ch6-bib1">1</link>], which focused on data preparation, labelling techniques and preliminary results. Here, new contributions related to quantitative results, the framework, transfer learning and scalability are presented. In the following, a short description of the data is introduced, and the labelling approach is briefly discussed. 
Afterwards, the framework is depicted and the effectiveness of transfer learning is discussed. Finally, the results are shown and conclusions are drawn.</para>
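The &#x201c;one-class&#x201d; idea mentioned above can be made concrete with a minimal sketch: fit a model on normal (defect-free) data only, then score new samples by their distance from it. Mahalanobis distance is used here purely as a stand-in for the learned similarity measure; it is not the method used in this chapter, and all data below is synthetic:

```python
import numpy as np

class OneClassScorer:
    """Fit on normal data only; score new samples by Mahalanobis distance."""
    def fit(self, normal):
        self.mean = normal.mean(axis=0)
        cov = np.cov(normal, rowvar=False)
        # small ridge term keeps the covariance invertible
        self.inv_cov = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
        return self

    def score(self, x):
        d = x - self.mean
        return float(np.sqrt(d @ self.inv_cov @ d))

rng = np.random.default_rng(1)
normal = rng.normal(0.0, 1.0, size=(500, 4))   # defect-free training data
scorer = OneClassScorer().fit(normal)
typical = scorer.score(np.zeros(4))            # near the normal cluster
outlier = scorer.score(np.full(4, 8.0))        # far from it
print(typical < outlier)  # True: the outlier gets a higher anomaly score
```

A threshold on this score then separates &#x201c;belongs to the ground truth&#x201d; from &#x201c;anomalous&#x201d;.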
</section>
<section class="lev1" id="ch6-2">
<title>6.2 Dataset Description</title>
<para>This manuscript showcases the handling of time series data as well as images from different processes during packaging. The data for the first use case is collected in an early phase, at the wire-bonding process, from three different sensors: a current sensor located at the transducer, a displacement sensor measuring the deformation of the wire and the path of the bonding tool, and a frequency sensor, also located at the transducer of the wire bonder. Each of these sensors collects roughly 432 features over 143 timestamps. The collected data are highly redundant (see <link linkend="ch6-F1">Figure 6.1</link>), because there are multiple bond connections on one device which share the same process parameters and behave very similarly. Sometimes, however, contamination of the device or a misadjusted machine causes misaligned or deformed bonds, see <link linkend="ch6-F1">Figure 6.1</link>; an ML solution is needed to detect such deviations. Similarly, the biggest challenge of the outgoing optical inspection (OOI), in the second use case, is defect detection on the heatsink, see <link linkend="ch6-F1">Figure 6.1</link>, which consists of a rough copper surface. It needs to be inspected for scratches, metal or mold particles, and mechanical damage such as imprints. This surface shows a very high variety in appearance, as it oxidizes during preceding high-temperature testing steps. Hence, the inspection cannot be carried out using rule-based algorithms, because the oxidized areas cannot be clearly distinguished from true defects by such algorithms. Trained personnel therefore took care of the heatsink inspection and were used to label the image data, roughly 300 images, for supervised learning.</para>
<fig id="ch6-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 6.1:</emphasis> <emphasis>Left</emphasis>: Curve with abnormal minimum position (red) in comparison to normal ones (white) of recorded sensor data during wirebonding process. <emphasis>Right</emphasis>: shows an example of abnormal OOI image with shown crack on the surface.</para></caption>
<graphic xlink:href="graphics/ch6-fig01.jpg"/>
</fig>
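The redundancy among bonds on one device suggests a simple baseline for spotting deviating curves like the red one in Figure 6.1: compare each bond&#x2019;s sensor curve to the per-device median curve and flag curves whose mean deviation is an outlier. The sketch below uses synthetic data; only the number of timestamps (143) follows the text, and all values and thresholds are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
timestamps = 143
curves = rng.normal(1.0, 0.02, size=(30, timestamps))   # 30 similar bonds
curves[7] += np.linspace(0.0, 0.5, timestamps)          # one drifting bond

median_curve = np.median(curves, axis=0)                # robust reference curve
deviation = np.abs(curves - median_curve).mean(axis=1)  # mean distance per bond
threshold = deviation.mean() + 3 * deviation.std()      # simple 3-sigma rule
flagged = np.where(deviation > threshold)[0]
print(flagged)  # the drifting bond stands out
```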
<section class="lev2" id="ch6-2-1">
<title>6.2.1 Data Collection and Labelling</title>
<para>Data labelling is an essential step in machine learning. The common phrase &#x201c;garbage in &#x2013; garbage out&#x201d; is widely used in the ML community: the quality of the model strongly depends on the quality of the (labelled) training data. In this work, two approaches are considered:</para>
<para><math id="Ch6.S2.SS1.p2.m1" display="inline"><mo>&#x2219;</mo></math> X <math id="Ch6.S2.SS1.p2.m2" display="inline"><mo>&#x2192;</mo></math> Y</para>
<para>Indeed, data labelling is a task that requires a lot of manual work. In this approach, labelling the data (images) is done based on human experience. Fortunately, only a few percent of the data had to be reviewed after applying the tool introduced in [<link linkend="ch6-bib1">1</link>] to reduce the effort. The process consists of reviewing the sorted historical images and recognizing defects by looking closely at the heatsink surface; no prior knowledge about the machine status (Y) is needed to sort the data (X). The data can then simply be categorized as either healthy (good) or unhealthy (fail) and used for training the AI model. This approach is used for labelling the OOI use case.</para>
<para><math id="Ch6.S2.SS1.p4.m1" display="inline"><mo>&#x2219;</mo></math> Y <math id="Ch6.S2.SS1.p4.m2" display="inline"><mo>&#x2192;</mo></math> X</para>
<para>Contrary to the first approach, human experience alone is not sufficient here for labelling the data, as the data is very complex. Hence, a design of experiments (DOE) is set up in which the machine status is controlled while collecting the data: a predefined misadjustment of the wire bonder (Y) must be known in order to obtain the corresponding deviation in the data (X).</para>
</section>
</section>
<section class="lev1" id="ch6-3">
<title>6.3 Development and Deployment</title>
<para>In order to satisfy the robustness requirements of the AI model, we propose adapting the AI framework to best practices with the following characteristics:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>Short adaption cycles.</para></listitem>
<listitem><para>Testing at every stage, with automatic integration and deployment.</para></listitem>
<listitem><para>Reproducible processes and reliable software releases.</para></listitem>
</itemizedlist>
<para><link linkend="ch6-F2">Figure 6.2</link> shows a typical DevOps process which is the basis for continuous integration and delivery. Thus, the following feedback loops are added to the process in order to integrate central ML lifecycle steps:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>Define and build a suitable model and improve it based on demo feedback through experiments using any suitable programming language.</para></listitem>
<listitem><para>Converting the best model, based on observed model performance, into ONNX (or another suitable format) and integrating it into the target AI platform.</para></listitem>
<listitem><para>Retrain, when it is needed, an operational model based on new real-life data and report the performance.</para></listitem>
<listitem><para>Adapt result of the whole process based on the performance of the models on productive data.</para></listitem>
</itemizedlist>
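The &#x201c;retrain when needed&#x201d; feedback loop above can be sketched as a simple monitor over the operational model&#x2019;s rolling accuracy on newly verified production results. The class name, window size and threshold are illustrative assumptions, not details of the IFX deployment:

```python
from collections import deque

class RetrainMonitor:
    """Flag a retrain once rolling accuracy drops below a threshold."""
    def __init__(self, window=100, min_accuracy=0.9):
        self.results = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, prediction_correct: bool) -> bool:
        """Record one verified prediction; return True if a retrain is due."""
        self.results.append(prediction_correct)
        if len(self.results) < self.results.maxlen:
            return False  # not enough evidence yet
        accuracy = sum(self.results) / len(self.results)
        return accuracy < self.min_accuracy

monitor = RetrainMonitor(window=10, min_accuracy=0.9)
flags = [monitor.record(ok) for ok in [True] * 10 + [False, False]]
print(flags[-1])  # True once two of the last ten checks failed
```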
<fig id="ch6-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 6.2:</emphasis> Flow chart of development and deployment life cycle for AI solution at IFX. In development phase data scientists could use different programming language as the final model can be converted to ONNX. In deployment phase, the vision frame can simply access to ONNX and run during inference time.</para></caption>
<graphic xlink:href="graphics/ch6-fig02.jpg"/>
</fig>
<para>Deployment, however, is more complex, because additional parts of the IFX infrastructure must be considered. <link linkend="ch6-F3">Figure 6.3</link> shows the process extended by the new development within the existing IFX infrastructure. From the perspective of a classic ML lifecycle, the combination of business analysts with data scientists and data engineers is sufficient to build a working ML solution that delivers all the required benefits.</para>
<fig id="ch6-F3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 6.3:</emphasis> Process flow integration of the developed AD solution into an existing IFX infrastructure.</para></caption>
<graphic xlink:href="graphics/ch6-fig03.jpg"/>
</fig>
</section>
<section class="lev1" id="ch6-4">
<title>6.4 Transfer Learning and Scalability</title>
<para>Transfer learning is, simply put, the fine-tuning of previously trained neural networks. In this context we transfer the model trained on OOI data to other packaging processes, see <link linkend="ch6-F4">Figure 6.4</link>. Thus, instead of creating an AI model from scratch, only a few images of the new process are needed to fine-tune the model pre-trained on OOI images. Interestingly, not only are the images collected from the new process similar to the OOI images, but the defect types are as well. As a result, the model reports a high accuracy, as shown in Table <span class="ref missing_label ref_self">LABEL:T5.2</span>. The anomaly detection for the wire-bonding process has a wide range of applications, as there are multiple Infineon sites and multiple machines of the same type. The training of an anomaly detection model can benefit from unlabelled data under the assumption that the majority of the data is good; given the generally high yield, this assumption is valid. Given multiple similar machines, there are two approaches to scale one model to multiple machines.</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>Using data from multiple machines for the training. Thus, the model implicitly learns differences between the machines and the same model can be used for multiple machines.</para></listitem>
<listitem><para>Using an anomaly detection model, which was trained on a prior defined machine and setting up all other machines to behave most similar to the selected machine. Thus, all other machines generate raw data of the same input space as the selected machine.</para></listitem>
</itemizedlist>
<para>With this procedure, it was possible to scale one model to a complete production line with more than 30 machines.</para>
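<para>The fine-tuning described above can be sketched as freezing a pre-trained feature extractor and re-training only the classification head on a handful of new-process images. The following is a minimal, illustrative PyTorch sketch; the tiny backbone, layer sizes, and batch are hypothetical stand-ins, not the actual IFX model.</para>

```python
import torch
import torch.nn as nn

# Hypothetical backbone standing in for the CNN pre-trained on OOI images.
backbone = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
)
head = nn.Linear(8 * 16 * 16, 2)  # new classifier head: good / anomalous

# Freeze the pre-trained feature extractor; only the head is fine-tuned.
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)

# One fine-tuning step on a small batch of (hypothetical) new-process images.
images = torch.rand(4, 1, 32, 32)      # a few images suffice for fine-tuning
labels = torch.tensor([0, 0, 1, 0])    # 0 = good, 1 = anomalous
logits = head(backbone(images))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
optimizer.step()
```

In practice one may also unfreeze the last few backbone blocks with a reduced learning rate instead of freezing the whole feature extractor.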
<fig id="ch6-F4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 6.4:</emphasis> Process flow during silicon packaging; the blue backward arrow indicates the transfer learning from OOI back to images taken after the molding process, see <link linkend="ch6-F5">Figure 6.5</link>.</para></caption>
<graphic xlink:href="graphics/ch6-fig04.jpg"/>
</fig>
<fig id="ch6-F5" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 6.5:</emphasis> An example OOI image (left), taken after electrical test and before shipping, and an example image taken after the molding process (right).</para></caption>
<graphic xlink:href="graphics/ch6-fig05.jpg"/>
</fig>
</section>
<section class="lev1" id="ch6-5">
<title>6.5 Result and Discussion</title>
<para>For the wire bonding use case, two different approaches were taken to validate the system. The first was to calculate the percentage of devices which showed an anomaly in the dataset and compare it to the process yield; if these percentages align, it is a good indicator that the anomaly detection represents the product quality. Additionally, a statistically significant correlation between high anomaly values and bad electrical test results is considered. For the second approach, we gathered multiple devices which showed a high anomaly value and examined them thoroughly. In all cases, some deviating influence could be found on the device, such as contamination, a reduced shear value, or input material that was out of specification. Not every finding, however, even though deviating from the normal, will lead to a malfunctioning device. An important aspect of the anomaly detection used is that the result is an anomaly score indicating how much the raw data deviates from normal, not a Boolean anomaly / no anomaly indication. Thus, it is necessary to find an optimal threshold at which the difference in the raw data influences the quality of the product. An important impact of the work was also the adaptation of the approach to a performant data management infrastructure, i.e., the development of automatable methods for the detection of conspicuous parameter behaviour and its marking and storage. The evaluation was based on sample data and statistical analysis of standard deviations considering Nelson&#x2019;s rules. The work carried out covers the familiarization with the various technologies and their variants, the adaptation of the methods to the subject area, and the prototypical implementation and testing of the algorithms by embedding them in automated analysis pipelines. Currently, the anomaly detection for wire bonding is running on over 40 machines at 3 different IFX sites. During a runtime of 4 months, several misadjusted bonders, random errors, and contaminated devices were detected. However, a major focus is currently set on fully integrating the model not only into the infrastructure but also into the day-to-day workflow of the operators; this also includes a clear definition of action plans for found deviations and operator training.</para>
<para>For the OOI use case, after collecting images, the labelled images are first pre-processed by cropping the region of interest and normalizing the intensity values between 0 and 1. These images are fed to a CNN for training. The CNN consists of 100 layers, organized in blocks; each block contains a convolutional, a pooling, and a ReLU layer. In addition, before the last (fully connected) layer, a dropout layer with rate 0.6 is added as a strict regularization to avoid over-fitting. The data was split into 80<math id="Ch6.S5.p1.m1" display="inline"><mo>%</mo></math> training and 20<math id="Ch6.S5.p1.m2" display="inline"><mo>%</mo></math> validation data. The model reached an accuracy higher than 99<math id="Ch6.S5.p1.m3" display="inline"><mo>%</mo></math>. Afterwards, the model was tested on productive data with roughly 25k images. <link linkend="ch6-T1">Table 6.1</link> shows the confusion matrix with the important measures: sensitivity, specificity, and accuracy. As one can see, the model follows the zero-defect philosophy, as the sensitivity is 100<math id="Ch6.S5.p1.m4" display="inline"><mo>%</mo></math>, and the rate of falsely flagged devices is less than 1<math id="Ch6.S5.p1.m5" display="inline"><mo>%</mo></math>; hence, only these flagged devices have to be reviewed by an expert. Moreover, the performance of the model after scaling to a new process is still very robust, as one can see in <link linkend="ch6-T2">Table 6.2</link>, which shows the results reported by the model when run on productive data of the new process. Although there is one escapee on the bottom surface (BOT), the accuracy is still higher than 99<math id="Ch6.S5.p1.m6" display="inline"><mo>%</mo></math>.</para>
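<para>The pre-processing described above, cropping the region of interest, scaling intensities to [0, 1], and splitting the data 80/20, can be sketched as follows; the image representation and ROI coordinates are simplified placeholders.</para>

```python
import random

def preprocess(image, roi):
    """Crop the region of interest and scale 8-bit grey values to [0, 1].

    `image` is a 2-D list of pixel intensities; `roi` = (top, bottom, left, right).
    """
    top, bottom, left, right = roi
    crop = [row[left:right] for row in image[top:bottom]]
    return [[px / 255.0 for px in row] for row in crop]

def train_val_split(samples, train_frac=0.8, seed=0):
    """Shuffle and split the labelled images into 80% training / 20% validation."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]
```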
<fig id="ch6-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 6.1:</emphasis> Confusion matrix and metrics of the CNN model on productive data for BOT and TOP of OOI images.</para></caption>
<graphic xlink:href="graphics/ch6-tab01.jpg"/>
</fig>
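<para>The measures reported in the tables can be derived from the four binary confusion-matrix counts; a minimal sketch (the counts below are illustrative, not the productive figures):</para>

```python
def metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from binary confusion-matrix counts.

    tp: defects flagged, fn: defects missed (escapees),
    tn: good devices passed, fp: good devices falsely flagged.
    """
    sensitivity = tp / (tp + fn)                 # zero-defect goal: 100%
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Illustrative counts only:
sens, spec, acc = metrics(tp=10, fn=0, tn=980, fp=10)
```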
<fig id="ch6-T2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 6.2:</emphasis> Confusion matrix and metrics of the CNN model on productive data for BOT and TOP of the new process.</para></caption>
<graphic xlink:href="graphics/ch6-tab02.jpg"/>
</fig>
</section>
<section class="lev1" id="ch6-6">
<title>6.6 Conclusion and Outlooks</title>
<para>In this paper, two use cases show the potential benefits of using AI models for detecting abnormalities in industrial packages. Moreover, the methodology shows the possibility of scaling such solutions to new, similar use cases or machines with minimal effort. As a result, not only would the manual effort be significantly reduced, but costs would be lowered and the quality of the products improved. Additionally, the long-term goal is not only to find a deviation but to detect exactly the root cause behind it. There is still a lot of work left and unrealized potential benefits of AI solutions, but IFX has already taken a step in the right direction. Thus, the semiconductor community is investing more in AI to harvest its benefits in the short and, most importantly, the long term. Generally, the results are promising and constitute a good alternative to classical approaches. The next steps are monitoring, optimization, and further validation of both solutions in a productive environment.</para>
</section>
<section class="lev1">
<title>Acknowledgements</title>
<para>AI4DI receives funding within the Electronic Components and Systems for European Leadership Joint Undertaking (ECSEL JU) in collaboration with the European Union&#x2019;s Horizon 2020 Framework Programme and National Authorities, under grant agreement No 826060.</para>
</section>
<section class="lev1" id="ch6-Ref">
<title>References</title>
<para id="ch6-bib1">[1] S. Al-Baddai, M. Juhrisch, J. Papadoudis, A. Renner, L. Bernhard, C. Luca, F. Haas, and W. Schober. Automated Anomaly Detection through Assembly and Packaging Process, pages 161&#x2013;176. 09 2021.</para>
<para id="ch6-bib2">[2] A. Amet, A. Ertuzun, and A. Ercil. Texture defect detection using subband domain co-occurrence matrices. pages 205 &#x2013; 210, 05 1998.</para>
<para id="ch6-bib3">[3] A. Bodnarova, M. Bennamoun, and K. Kubik. Automatic visual inspection and flaw detection in textile materials: A review. pages 194&#x2013;197, 01 2001.</para>
<para id="ch6-bib4">[4] T. Ehret, A. Davy, J. M. Morel, and M. Delbracio. Image anomalies: a review and synthesis of detection methods. 08 2018.</para>
<para id="ch6-bib5">[5] G. A. Susto, M. Terzi, and A. Beghi. Anomaly detection approaches for semiconductor manufacturing. Procedia Manufacturing, 11:2018&#x2013;2024, 12 2017.</para>
<para id="ch6-bib6">[6] B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen. Deep autoencoding gaussian mixture model for unsupervised anomaly detection, 2018.</para>
</section>
</chapter>
<chapter class="chapter" id="ch7" label="7" xreflabel="7">
<title>S2ORC-SemiCause: Annotating and Analysing Causality in the Semiconductor Domain</title>
<subtitle>Xing Lan Liu<sup>1</sup>, Eileen Salhofer<sup>1,2</sup>, Anna Safont Andreu<sup>3,4</sup>, and Roman Kern<sup>2</sup></subtitle>
<affiliation><sup>1</sup>Know-Center GmbH, Austria<?lb?><sup>2</sup>Graz University of Technology, Austria<?lb?><sup>3</sup>University of Klagenfurt, Austria<?lb?><sup>4</sup>Infineon Technologies Austria</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>For semiconductor manufacturing, easy access to causal knowledge documented in free text facilitates timely Failure Modes and Effects Analysis (FMEA), which plays an important role in reducing failures and decreasing production cost. Causal relation extraction is the task of identifying causal knowledge in natural text and providing a higher level of structure. However, the lack of publicly available benchmark causality datasets remains a bottleneck in the semiconductor domain. This work addresses this issue and presents the S2ORC-SemiCause benchmark dataset. It is based on the S2ORC corpus, which has been filtered for literature on semiconductor research and subsequently annotated by humans for causal relations. The resulting dataset differs from existing causality datasets of other domains in its long spans of causes and effects, as well as causal cue phrases exclusive to semiconductor research. As a consequence, this novel dataset poses challenges even for state-of-the-art token classification models such as S2ORC-SciBERT. Thus this dataset serves as a benchmark for causal relation extraction in the semiconductor domain.</para>
<para><emphasis role="strong">Keywords:</emphasis> causality, relation extraction, information extraction, bertology, annotation</para>
</section>
<section class="lev1" id="ch7-1">
<title>7.1 Introduction</title>
<para>Although causality represents a simple logical idea, it becomes a complex phenomenon when appearing in textual form. Natural language provides a wide variety of structures to represent causal relationships that can obfuscate the causal relations expressed via cause and effect. The task of causal relation extraction aims at extracting sentences containing causal language and identifying causal constituents and their relations [<link linkend="ch7-bib17">17</link>].</para>
<para>In recent years, significant progress has been made in automating the identification of causal cues and the extraction of causal relations in natural language: defining it as a multi-way classification problem of semantic relationships [<link linkend="ch7-bib6">6</link>], designing a lexicon of causal constructions [<link linkend="ch7-bib2">2</link>, <link linkend="ch7-bib3">3</link>], and gaining insights into how to achieve high inter-rater agreement [<link linkend="ch7-bib13">13</link>]. Approaches have been developed in scientific domains traditionally dominated by textual information, such as the biomedical sciences. Here, models to process causal relations are facilitated and accelerated by the development of benchmark datasets such as BioCause [<link linkend="ch7-bib10">10</link>]. Such datasets not only allow for comparison and automatic evaluation of custom causal extractors, but also for training high-performing supervised models.</para>
<para>For semiconductor manufacturing, much of the existing knowledge can be considered causal, highlighted by approaches like Ishikawa causal diagrams as well as the Failure Modes and Effects Analysis (FMEA) tool, which captures root causes of potential failures. Even though such FMEA documents provide more structure than natural language text, dedicated pre-processing is required before further processing [<link linkend="ch7-bib12">12</link>]. A significant amount of such causal knowledge is captured in textual documents, such as reports and knowledge bases. However, there is no publicly available annotated dataset for causal relation extraction yet. As a consequence, in this work we propose such a dataset, named <emphasis>S2ORC-SemiCause</emphasis>. The source for the documents of this novel dataset is the S2ORC academic corpus, which has been filtered for documents of relevance to the semiconductor domain. Human annotators identified causal cues and causal relations in the documents of the corpus. To achieve consistent and reproducible results, an annotation guideline was created and the annotation process was conducted in multiple phases. To provide a baseline performance, the pre-trained language model BERT [<link linkend="ch7-bib1">1</link>], which is currently considered state of the art for many natural language processing (NLP) tasks, was adapted for the task. An error analysis gives insights into the challenges for future causal relation extraction methods.</para>
<para>In summary, our main contributions are:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para><emphasis>S2ORC-SemiCause</emphasis>, a causality dataset for the semiconductor domain that aims to provide a benchmark for causal relation extraction performances and facilitate research on dedicated methods;</para></listitem>
<listitem><para>Practical annotation guidelines designed to yield high inter-annotator agreement for semiconductor literature, to enable the creation of further, similar datasets;</para></listitem>
<listitem><para>Identification of the key differences of <emphasis>S2ORC-SemiCause</emphasis> compared to datasets from other domains, highlighting the resulting challenges for state-of-the-art NLP models.</para></listitem>
</itemizedlist>
</section>
<section class="lev1" id="ch7-2">
<title>7.2 Dataset Creation</title>
<section class="lev2" id="ch7-2-1">
<title>7.2.1 Corpus</title>
<para>Our semiconductor corpus is selected from the 24 million papers in engineering and related domains within the S2ORC corpus [<link linkend="ch7-bib8">8</link>] (81.8 million papers in total). This subset is further filtered using a series of keywords specific to the semiconductor domain, such as device locations, electrical and physical faults, technologies (e.g. SFET), Focused Ion Beam, etc. For a paper to be selected, it needs to include at least four of these keywords.</para>
<para>From the resulting subset of 21 thousand papers, 400 abstracts and 400 paragraphs are randomly sampled, from which 600 sentences are randomly selected for annotation.</para>
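<para>The keyword-based selection can be sketched as follows; the keyword list shown is illustrative, not the actual curated filter used to build the corpus.</para>

```python
# Illustrative domain keywords; the real filter uses a larger, curated list.
SEMI_KEYWORDS = {"wafer", "dopant", "etch", "mosfet", "focused ion beam"}

def is_semiconductor_paper(text, keywords=SEMI_KEYWORDS, min_hits=4):
    """Select a paper if at least `min_hits` distinct domain keywords occur."""
    body = text.lower()
    hits = sum(1 for kw in keywords if kw in body)
    return hits >= min_hits
```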
</section>
<section class="lev2" id="ch7-2-2">
<title>7.2.2 Annotation Guideline</title>
<para>We adapted the annotation guidelines<sup>1</sup> from the creation of the BECauSE Corpus 2.0 [<link linkend="ch7-bib3">3</link>]. The main differences are (1) the relation types "Motivation" and "Purpose" are merged into one type (named "Purpose"), since previous work [<link linkend="ch7-bib5">5</link>] found that annotators have difficulty distinguishing these two types; (2) the <emphasis>"max-span" rule</emphasis>, namely, that the span should include the full phrase or clause. The <emphasis>"max-span"</emphasis> rule not only retains important context information for the causal relations, but also enables higher inter-annotator agreement. It was also motivated by the assumption that it is easier to automatically reduce a phrase to its head than to expand a short, existing annotation.</para>
</section>
<section class="lev2" id="ch7-2-3">
<title>7.2.3 Annotation Methodology</title>
<para>Since the annotations should contain as little ambiguity as possible, we designed a methodology to achieve consistent annotations. To this end, the dataset was annotated in a total of 3 iterations. For the first two iterations, with 50 sentences each, both annotators labelled the same set, so that inter-annotator agreement (IAA) could be evaluated. Between the two iterations, the two annotators discussed the results and updated the guideline.</para>
<fig id="ch7-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 7.1:</emphasis> Inter-annotator agreement for the first two iterations. <emphasis>Arg1</emphasis> (cause) refers to the span of the arguments that lead to <emphasis>Arg2</emphasis> (effect) for the respective relation type.</para></caption>
<graphic xlink:href="graphics/ch7-tab01.jpg"/>
</fig>
<para><link linkend="ch7-T1">Table 7.1</link> shows that there is a significant improvement in inter-annotator agreement (IAA) from iteration 1 to iteration 2, both in terms of Cohen&#x2019;s <math display="inline"><mi>&#x3ba;</mi></math> and F<sub>1</sub>. The main improvement comes from (1) the direction of the <emphasis>Purpose</emphasis> relation (namely, <emphasis>arg2</emphasis> should be the purpose); (2) the <emphasis>"max-span" rule</emphasis>, namely, that the span should include the full phrase or clause.</para>
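<para>For reference, the agreement statistic itself can be sketched as Cohen's kappa over two annotators' per-item labels (in this chapter it is computed over argument spans; the label sequences below are hypothetical).</para>

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)
```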
<fig id="ch7-T2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 7.2:</emphasis> Comparison of labels generated by both annotators for Iteration 2. Examples and total counts (in number of arguments) for each type are also given. <emphasis role="strong">Arg1</emphasis> and <emphasis role="strong">Arg2</emphasis> are highlighted with blue and yellow background, respectively. <emphasis role="strong">Partially overlapped texts</emphasis> are highlighted with green background.</para></caption>
<graphic xlink:href="graphics/ch7-tab02.jpg"/>
</fig>
<para>With Iteration 2, the two annotators reached substantial agreement, with both Cohen&#x2019;s <math display="inline"><mi>&#x3ba;</mi></math> and F<sub>1</sub> for argument spans around 0.8. For reference, Dunietz et al. [<link linkend="ch7-bib3">3</link>] report a Cohen&#x2019;s <math id="Ch7.S2.SS3.p3.m3" display="inline"><mi>&#x3ba;</mi></math> of 0.70 for the relation type. Results of a detailed inspection are summarized in <link linkend="ch7-T2">Table 7.2</link>. For 54 arguments, both annotators agree in both span and argument type. The remaining disagreements stem from (1) one annotator missing a relation (14 occurrences); (2) only partial overlap of the spans annotated by both annotators (8 occurrences).</para>
<para>Based on the insights from the updated guideline, the first set of documents was revisited, and both sets of annotations from the first two iterations were then merged manually. In addition, for the 3rd iteration, two extra sets of 250 sentences were annotated, one by each annotator. As a result, our dataset consists of 600 sentences annotated with Consequence and Purpose relations.</para>
</section>
<section class="lev2" id="ch7-2-4">
<title>7.2.4 Dataset Statistics</title>
<para>We notice that, compared to other benchmark NER datasets such as CoNLL2003 [<link linkend="ch7-bib4">4</link>], BC5CDR [<link linkend="ch7-bib7">7</link>], and BioCause [<link linkend="ch7-bib10">10</link>] (see <link linkend="ch7-T3">Table 7.3</link>), the S2ORC-SemiCause dataset differs in terms of (1) smaller size; (2) longer sentence length; (3) longer argument length. While the data size is found to be generally sufficient for entity recognition tasks [<link linkend="ch7-bib14">14</link>], and a longer sentence length is found to be preferable [<link linkend="ch7-bib14">14</link>], the effect of longer argument length remains to be evaluated.</para>
<fig id="ch7-T3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 7.3:</emphasis> <emphasis role="strong">Descriptive statistics of benchmark datasets</emphasis>. Overview of CoNLL-2003 (training split) and BC5CDR (training split) for named entity recognition, as well as causality dataset BioCause (full dataset), and S2ORC-SemiCause (training split).</para></caption>
<graphic xlink:href="graphics/ch7-tab03.jpg"/>
</fig>
</section>
<section class="lev2" id="ch7-2-5">
<title>7.2.5 Causal Cue Phrases</title>
<para>When present, the causal cue phrases are also annotated. <link linkend="ch7-F1">Figure 7.1</link> depicts the most common cue phrases for both relation types. <emphasis>"To"</emphasis> is the most frequently occurring cue because it is by far the dominant cue phrase for the relation type <emphasis>purpose</emphasis>. The cue phrases for <emphasis>consequence</emphasis> are much more diverse. Compared to corpora of the general domain [<link linkend="ch7-bib9">9</link>, <link linkend="ch7-bib11">11</link>], in the S2ORC-SemiCause dataset cue words such as <emphasis>increase, decrease, improve, reduce</emphasis> also rank very high.</para>
<fig id="ch7-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 7.1:</emphasis> Causal cue phrases ranked by frequency for all sentences in S2ORC-SemiCause dataset.</para></caption>
<graphic xlink:href="graphics/ch7-fig01.jpg"/>
</fig>
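<para>The ranking shown in Figure 7.1 amounts to counting annotated cue phrases, optionally per relation type; a minimal sketch over hypothetical annotations:</para>

```python
from collections import Counter

# Hypothetical (cue phrase, relation type) pairs from annotated sentences.
annotations = [
    ("to", "purpose"), ("to", "purpose"), ("to", "purpose"),
    ("due to", "consequence"), ("increase", "consequence"), ("reduce", "consequence"),
]

def cue_frequencies(annotated, relation=None):
    """Rank causal cue phrases by frequency, optionally for one relation type."""
    cues = (cue for cue, rel in annotated if relation is None or rel == relation)
    return Counter(cues).most_common()
```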
</section>
</section>
<section class="lev1" id="ch7-3">
<title>7.3 Baseline Performance</title>
<para>To establish a point of reference for the community, we provide an initial baseline performance. For the baseline approach, we cast the causal relation extraction task as a token-level classification task and, as a technical realisation, fine-tuned BERT on this down-stream task [<link linkend="ch7-bib1">1</link>]. An error analysis is then performed to identify the main challenges in extracting causal relations from scientific publications in semiconductor research.</para>
<section class="lev2" id="ch7-3-1">
<title>7.3.1 Train-Test Split</title>
<para>The total of 600 sentences is split into training, validation, and test sets with the ratio <math id="Ch7.S3.SS1.p1.m1" display="inline"><mrow><mn>60</mn><mo>:</mo><mn>20</mn><mo>:</mo><mn>20</mn></mrow></math>, stratified on relation type<sup>2</sup>. In addition, the iterations were also stratified evenly to avoid unwanted biases. The descriptive statistics for each split are listed in <link linkend="ch7-T4">Table 7.4</link>.</para>
<fig id="ch7-T4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 7.4:</emphasis> Descriptive statistics of <emphasis>S2ORC-SemiCause</emphasis> dataset. <emphasis>#-sent</emphasis>: total number of annotated sentences, <emphasis>#-sent no relations</emphasis>: number of sentences without causality, <emphasis>Argument</emphasis>: total amount and mean length (token span) of all annotated argument, <emphasis>Consequence/Purpose</emphasis>: amount and mean length of cause and effect arguments for the respective relation types.</para></caption>
<graphic xlink:href="graphics/ch7-tab04.jpg"/>
</fig>
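<para>The stratified 60:20:20 split can be sketched as follows; this is a simplified version, as the actual split is additionally stratified over annotation iterations.</para>

```python
import random
from collections import defaultdict

def stratified_split(items, key, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split items into train/validation/test, stratified on key(item)."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for item in items:
        buckets[key(item)].append(item)     # group by e.g. relation type
    train, dev, test = [], [], []
    for group in buckets.values():
        rng.shuffle(group)
        cut1 = int(len(group) * ratios[0])
        cut2 = cut1 + int(len(group) * ratios[1])
        train += group[:cut1]
        dev += group[cut1:cut2]
        test += group[cut2:]
    return train, dev, test
```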
</section>
<section class="lev2" id="ch7-3-2">
<title>7.3.2 Causal Argument Extraction</title>
<para>As recommended in [<link linkend="ch7-bib1">1</link>], which describes a similar scenario, we treat the task as token-level classification: a pretrained BERT model is stacked with a linear layer on top of the hidden-states output and then fine-tuned on the training examples. The pretrained S2ORC-SciBERT model [<link linkend="ch7-bib8">8</link>] is selected for fine-tuning using the transformers library from Hugging Face [<link linkend="ch7-bib16">16</link>]. The resulting F<sub>1</sub> scores<sup>3</sup> are shown in <link linkend="ch7-T5">Table 7.5</link> and are remarkably lower than for other benchmark NER datasets when down-sampled to a similar size [<link linkend="ch7-bib14">14</link>].</para>
<fig id="ch7-T5" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 7.5:</emphasis> Baseline performance using BERT with a token classification head. Both the F<sub>1</sub> scores and the standard deviation over 7 different runs are shown. Despite the small sample size, the standard deviation remains low, similar to previous work [<link linkend="ch7-bib14">14</link>].</para></caption>
<graphic xlink:href="graphics/ch7-tab05.jpg"/>
</fig>
</section>
<section class="lev2" id="ch7-3-3">
<title>7.3.3 Error Analysis</title>
<para>In order to understand the causes of the low F<sub>1</sub> score of the baseline model, an error analysis was performed.</para>
<para><emphasis role="strong">Length of Argument Span</emphasis></para>
<para>Firstly, a manual inspection revealed that for 30 <math id="Ch7.S3.SS3.SSSx1.p1.m1" display="inline"><mo>&#xb1;</mo></math> 4 (out of the total 120) sentences, the fine-tuned model predicts sequences similar to [O I I <math id="Ch7.S3.SS3.SSSx1.p1.m2" display="inline"><mi mathvariant="normal">&#x22ef;</mi></math>], i.e., the model did not learn that an argument must always start with a "B" tag in the IOB (Inside&#x2013;Outside&#x2013;Beginning) notation.</para>
<para>We hypothesize that this might be because our argument spans are much longer than other datasets (see <link linkend="ch7-T4">Table 7.4</link> and <link linkend="ch7-T3">Table 7.3</link>). As a result, either the self-attention might no longer efficiently keep track of the [B I <math id="Ch7.S3.SS3.SSSx1.p2.m1" display="inline"><mi mathvariant="normal">&#x22ef;</mi></math>] pattern, or the over-abundant "I" class might bias the model loss.</para>
<para>Following this hypothesis, we expect better performance for shorter arguments than for longer ones. Indeed, we observe that correct predictions are shorter by 2.7 tokens on average (<math id="Ch7.S3.SS3.SSSx1.p3.m1" display="inline"><mrow><mrow><mi>p</mi><mtext>-value</mtext></mrow><mo>=</mo><mn>0.008</mn></mrow></math>).</para>
<para>To quantify the effect of such incorrect [O I I <math display="inline"><mi mathvariant="normal">&#x22ef;</mi></math>] predictions, we computed the F<sub>1</sub> score after filtering them out. The results are shown in <link linkend="ch7-T5">Table 7.5</link> as "F<sub>1</sub>-filter"; an improvement of 6 points is observed compared to the F<sub>1</sub> score before filtering.</para>
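<para>The filtering step can be sketched as an IOB decoder that discards spans not opened by a "B" tag (the [O I I ...] error pattern); a minimal sketch with a simplified single-type tag set:</para>

```python
def decode_spans(tags, strict=True):
    """Decode an IOB tag sequence into (start, end) token spans.

    With strict=True, spans that start with an "I" tag (the [O I I ...]
    error pattern) are filtered out instead of being counted as predictions.
    """
    spans, start, valid = [], None, False
    for i, tag in enumerate(tags + ["O"]):   # sentinel flushes the last span
        if tag in ("B", "O") and start is not None:
            if valid or not strict:
                spans.append((start, i))
            start, valid = None, False
        if tag == "B":
            start, valid = i, True
        elif tag == "I" and start is None:
            start, valid = i, False          # span illegally opened by "I"
    return spans
```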
<para><emphasis role="strong">Predictions with Partial Overlap</emphasis></para>
<para>Out of the predicted arguments, 41 were counted as incorrect but partially overlapped with an annotated span (see the example in <link linkend="ch7-T6">Table 7.6</link>); manual inspection suggests that they often contain valid causal information.</para>
<fig id="ch7-T6" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 7.6:</emphasis> Comparison of predicted and annotated argument spans for the test split. Examples and total counts (in number of arguments) for correct prediction and for each error source are also given. <emphasis role="strong">Arg 1</emphasis> and <emphasis role="strong">Arg 2</emphasis> are highlighted with blue and yellow background, respectively. <emphasis role="strong">Partial overlapped texts</emphasis> are highlighted with green background.</para></caption>
<graphic xlink:href="graphics/ch7-tab06.jpg"/>
</fig>
<para>Following [<link linkend="ch7-bib15">15</link>], the model performance can be evaluated taking partial overlaps into account. The results are listed in <link linkend="ch7-T5">Table 7.5</link> as "F<sub>1</sub>-filter partial": the average F<sub>1</sub> score becomes 0.59, which is about 80% of human performance (inter-annotator F<sub>1</sub> of 0.78) and is in line with the sample-size scaling reported previously [<link linkend="ch7-bib14">14</link>].</para>
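<para>The relaxed evaluation can be sketched as counting a predicted span as a hit when it shares at least one token with a gold span; a minimal sketch over half-open (start, end) token spans:</para>

```python
def overlaps(a, b):
    """True if the half-open token spans (start, end) share at least one token."""
    return max(a[0], b[0]) < min(a[1], b[1])

def relaxed_f1(pred_spans, gold_spans):
    """F1 where a prediction counts as correct if it partially overlaps a gold span."""
    if not pred_spans or not gold_spans:
        return 0.0
    precision = sum(any(overlaps(p, g) for g in gold_spans) for p in pred_spans) / len(pred_spans)
    recall = sum(any(overlaps(g, p) for p in pred_spans) for g in gold_spans) / len(gold_spans)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```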
<para><emphasis role="strong">Spurious and Missed Predictions</emphasis></para>
<para>Spurious examples (false positives) are cases where the model predicts a relation that the annotators did not label. After manual inspection, we find it arguable that some <emphasis>spurious</emphasis> predictions made by the model might actually be valid causal relations. For example, the spurious example shown in <link linkend="ch7-T6">Table 7.6</link> is arguably causal as well, following the (<emphasis>The role of &#x2026; in &#x2026;</emphasis>) construct.</para>
<para>Missed examples (false negatives) are cases where the annotators labelled a relation that the model fails to predict. For example, the missed example shown in <link linkend="ch7-T6">Table 7.6</link> uses the rare causal trigger <emphasis>derived from</emphasis>, which might be the reason why the model failed to recognize it.</para>
</section>
</section>
<section class="lev1" id="ch7-4">
<title>7.4 Conclusions</title>
<para>Causality is critical knowledge in semiconductor manufacturing. In order to enable automatic causality recognition, we created the <emphasis>S2ORC-SemiCause</emphasis> dataset by annotating 600 sentences with 670 arguments for causal relation extraction from a subset of semiconductor literature taken from the S2ORC dataset. This unique dataset challenges established state-of-the-art techniques because of its long argument spans. This benchmark dataset is intended to spur further research, fuel the development of machine learning models, and benefit NLP research in the semiconductor domain.</para>
</section>
<section class="lev1">
<title>Acknowledgements</title>
<para>The research was conducted under the framework of the ECSEL AI4DI "Artificial Intelligence for Digitising Industry" project. The project has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 826060. The Know-Center is funded within the Austrian COMET Program&#x2013;Competence Centers for Excellent Technologies under the auspices of the Austrian Federal Ministry of Transport, Innovation and Technology, the Austrian Federal Ministry of Economy, Family and Youth and by the State of Styria. COMET is managed by the Austrian Research Promotion Agency FFG. We acknowledge useful comments and assistance from our colleagues at Know-Center and at Infineon.</para>
</section>
<section class="lev1" id="ch7-Ref">
<title>References</title>
<para id="ch7-bib1">[1] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. <emphasis>NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference</emphasis>, 1(Mlm):4171&#x2013;4186, 2019.</para>
<para id="ch7-bib2">[2] J. Dunietz, L. Levin, and J. Carbonell. Annotating causal language using corpus lexicography of constructions. <emphasis>The 9th Linguistic Annotation Workshop held in conjunction with NAACL 2015</emphasis>, (2014):188&#x2013;196, 2015.</para>
<para id="ch7-bib3">[3] J. Dunietz, L. Levin, and J. G Carbonell. The because corpus 2.0: Annotating causality and overlapping relations. In <emphasis>Proceedings of the 11th Linguistic Annotation Workshop</emphasis>, pages 95&#x2013;104, 2017.</para>
<para id="ch7-bib4">[4] E. F. Tjong Kim Sang, and F. De Meulder. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In <emphasis>Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003</emphasis>, pages 142&#x2013;147, 2003.</para>
<para id="ch7-bib5">[5] D. Gaerber. Causal information extraction from historical German texts, 2022.</para>
<para id="ch7-bib6">[6] I. Hendrickx, S. N. Kim, Z. Kozareva, P. Nakov, D. &#xd3; S&#xe9;aghdha, S. Pad&#xf3;, M. Pennacchiotti, L. Romano, and S. Szpakowicz. SemEval-2010 task 8: Multi-way classification of semantic relations between pairs of nominals. In <emphasis>Proc. of the 5th Int. Workshop on Semantic Evaluation</emphasis>, pages 33&#x2013;38, Uppsala, Sweden, 2010. Association for Computational Linguistics.</para>
<para id="ch7-bib7">[7] J. Li, Y. Sun, R. J. Johnson, D. Sciaky, C.-H. Wei, R. Leaman, A. P. Davis, C. J. Mattingly, T. C. Wiegers, and Z. Lu. Biocreative V CDR task corpus: a resource for chemical disease relation extraction. <emphasis>Database</emphasis>, 2016.</para>
<para id="ch7-bib8">[8] K. Lo, L. L. Wang, M. Neumann, R. Kinney, and D. Weld. S2ORC: The semantic scholar open research corpus. In <emphasis>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</emphasis>, pages 4969&#x2013;4983, Online, July 2020. Association for Computational Linguistics.</para>
<para id="ch7-bib9">[9] Z. Luo, Y. Sha, K. Q. Zhu, S. W. Hwang, and Z. Wang. Commonsense causal reasoning between short texts. <emphasis>Proc. Int. Workshop Tempor. Represent. Reason.</emphasis>, pages 421&#x2013;430, 2016.</para>
<para id="ch7-bib10">[10] C. Mih&#x103;il&#x103;, T. Ohta, S. Pyysalo, and S. Ananiadou. BioCause: Annotating and analysing causality in the biomedical domain. <emphasis>BMC Bioinformatics</emphasis>, 14, 2013.</para>
<para id="ch7-bib11">[11] S. Pawar, R. More, G. K. Palshikar, P. Bhattacharyya, and V. Varma. Knowledge-based Extraction of Cause-Effect Relations from Biomedical Text. 2021.</para>
<para id="ch7-bib12">[12] H. Razouk and R. Kern. Improving the consistency of the failure mode effect analysis (FMEA) documents in semiconductor manufacturing. <emphasis>Applied Sciences</emphasis>, 12(4), 2022.</para>
<para id="ch7-bib13">[13] I. Rehbein and J. Ruppenhofer. A new resource for German causal language. In <emphasis>Proceedings of the 12th Language Resources and Evaluation Conference</emphasis>, pages 5968&#x2013;5977, Marseille, France, May 2020. European Language Resources Association.</para>
<para id="ch7-bib14">[14] E. Salhofer, X. L. Liu, and R. Kern. Impact of training instance selection on domain-specific entity extraction using BERT. In <emphasis>NAACL SRW</emphasis>, 2022.</para>
<para id="ch7-bib15">[15] I. Segura-Bedmar, P. Mart&#xed;nez, and M. Herrero-Zazo. SemEval-2013 task 9 : Extraction of drug-drug interactions from biomedical texts (DDIExtraction 2013). In <emphasis>Proc. of the 7th Int. Workshop on Semantic Evaluation (SemEval 2013)</emphasis>, 2013.</para>
<para id="ch7-bib16">[16] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. v. Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. L. Scao, S. Gugger, M. Drame, Q. Lhoest, and A. Rush. Transformers: State-of-the-art natural language processing. In <emphasis>Proc. of the 2020 Conf. on Empirical Methods in NLP: System Demonstrations</emphasis>, 2020.</para>
<para id="ch7-bib17">[17] J. Yang, S. C. Han, and J. Poon. A survey on extraction of causal relations from natural language text. <emphasis>Knowledge and Information Systems</emphasis>, pages 1&#x2013;26, 2022.</para>
</section>
<para><sup>1</sup>The annotation guideline will be made public at <ulink url="https://github.com/tugraz-isds/kd">https://github.com/tugraz-isds/kd</ulink>.</para>
<para><sup>2</sup>We release all data for future studies at <ulink url="https://github.com/tugraz-isds/kd">https://github.com/tugraz-isds/kd</ulink>.</para>
<para><sup>3</sup>The best performance is found using learning rate <math id="Ch7.footnote3.m1" display="inline"><mrow><mrow><mn>1.5</mn><mo>&#x2062;</mo><mi>e</mi></mrow><mo>-</mo><mn>4</mn></mrow></math>, batch size 8, warm up steps 10, and 10 epochs.</para>
</chapter>
<chapter class="chapter" id="ch8" label="8" xreflabel="8">
<title>Feasibility of Wafer Exchange for European Edge AI Pilot Lines</title>
<subtitle>Annika Franziska Wandesleben<sup>1*</sup>, Delphine Truffier-Boutry<sup>2*</sup>, Varvara Brackmann<sup>1</sup>, Benjamin Lilienthal-Uhlig<sup>1</sup>, Manoj Jaysnkar<sup>3</sup>, Stephan Beckx<sup>3</sup>, Ivan Madarevic<sup>3</sup>, Audde Demarest<sup>2</sup>, Bernd Hintze<sup>4</sup>, Franck Hochschulz<sup>5</sup>, Yannick Le Tiec<sup>2</sup>, Alessio Spessot<sup>3</sup>, and Fabrice Nemouchi<sup>2</sup></subtitle>
<affiliation><sup>1</sup>Fraunhofer IPMS CNT, Germany<?lb?><sup>2</sup>Universit&#xe9; Grenoble Alpes, CEA-Leti, France<?lb?><sup>3</sup>imec, Belgium<?lb?><sup>4</sup>FMD, Germany<?lb?><sup>5</sup>Fraunhofer IMS, Germany<?lb?><sup>*</sup>Equal contribution</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>This paper compares the contamination monitoring of the three largest microelectronics research organizations in Europe: CEA-Leti, imec and Fraunhofer. The aim is to align the semiconductor infrastructure of the three research institutes in order to accelerate the supply to European industry for disruptive chip processing. To offer advanced edge AI systems with novel non-volatile memory components, integration into a state-of-the-art semiconductor fabrication flow must be validated, and contamination monitoring is an essential aspect of this. Metallic impurities can have a major impact on expensive and complex microelectronic process flows, so contamination of the process lines must be avoided. In order to benefit from the combined infrastructure, expertise and individual competences, the feasibility of wafer loops needs to be investigated.</para>
<para>Through a technical comparison and a practical analysis of potential cross-contamination, the correlation of the contamination measurement results of the research institutes is investigated. The results demonstrate that the three institutes are able to analyse metallic contamination with comparable lower limits of detection (LLDs). This result lays the foundations for a smooth and fast wafer exchange for current and future needs, potentially not only between research institutes but also with industrial and foundry partners. The present work pays attention to both surface and bevel contamination; the latter requires a very specific contamination collection, which was also compared. Nevertheless, some challenges need to be addressed in the future to advance accurate contamination monitoring.</para>
<para><emphasis role="strong">Keywords:</emphasis> contamination, contamination monitoring and management, TXRF, VPD-ICPMS, surface, bevel, wafer loops.</para>
</section>
<section class="lev1" id="ch8-1">
<title>8.1 Introduction</title>
<para>The aim is to align the semiconductor infrastructure of the three largest microelectronics research institutions in Europe, CEA-Leti, imec and Fraunhofer, in order to accelerate supply to European industry for disruptive chip processing. Contamination monitoring is an essential aspect of this alignment. Metallic impurities can have a major impact on expensive and complex microelectronic process flows, whereby the different chemical elements have various effects. Therefore, contamination of the process lines must be avoided (Bigot, Danel, &amp; Thevenin, 2005; Borde, Danel, Roche, Grouillet, &amp; Veillerot, 2007). To benefit from the semiconductor infrastructure, expertise and individual skills, the feasibility of wafer loops needs to be investigated. Additionally, to offer advanced edge AI systems with novel non-volatile memory components, integration into a state-of-the-art semiconductor fabrication flow must be validated. To simplify the future exchange of wafers between research institutes and between institutes and semiconductor fabs, it is necessary to learn more about contamination monitoring and possible cross-contamination. For this purpose, a technical comparison and a practical analysis of possible cross-contamination is carried out in order to investigate the correlation of the contamination measurement results of the three institutes.</para>
</section>
<section class="lev1" id="ch8-2">
<title>8.2 Technical Details and Comparison</title>
<para>The common techniques for contamination monitoring are TXRF and VPD-ICPMS. The three largest microelectronics research organizations in Europe, CEA-Leti, imec and Fraunhofer, are equipped with VPD-ICPMS, while imec and CEA-Leti additionally use TXRF tools. The type of tool, its set-up and its qualification depend on the contamination management strategy developed in each clean room.</para>
<para>The capabilities of the individual institutes are summarised in the following <link linkend="ch8-T1">Table 8.1</link>.</para>
<fig id="ch8-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 8.1:</emphasis> Contamination monitoring techniques LETI / IMEC / FhG</para></caption>
<graphic xlink:href="graphics/ch8-tab01.jpg"/>
</fig>
<section class="lev2" id="ch8-2-1">
<title>8.2.1 Comparison TXRF and VPD-ICPMS Equipment for Surface Analysis</title>
<para>TXRF is well suited for high-throughput applications, as the measurements are based on the interaction of an X-ray beam with the silicon surface, without any chemical manipulation. This technique allows fast analysis of both standard and noble elements in automatic mode, with the possibility of localizing the contamination on the wafer using the mapping option. However, the lower limits of detection (LLDs) are quite high, from 1E<math id="Ch8.S2.SS1.p1.m1" display="inline"><mo>+</mo></math>9 to 1E<math id="Ch8.S2.SS1.p1.m2" display="inline"><mo>+</mo></math>11 at/cm<math id="Ch8.S2.SS1.p1.m3" display="inline"><msup><mi></mi><mn>2</mn></msup></math>.</para>
<para>The VPD-ICPMS technique, in contrast, requires different chemical solutions for the collection of standard and noble elements, so campaigns need to be planned, and there is no local resolution of the contaminants. However, collecting all metallic contaminants in a small droplet of chemistry yields significantly improved LLD values for all elements.</para>
<para>To compare the metallic contamination results obtained by the different institutes, the first goal was to compare the LLD of each element at each institute and how it is determined experimentally. The LLD is the lowest concentration at which an element can be reliably detected, and it is therefore a key figure for controlling metallic contamination at very low levels. Depending on the equipment, there are several ways to determine the LLD, hence the need to compare the capabilities of each institute.</para>
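<para>As an illustration only, one widespread convention (not necessarily the exact procedure used by each institute; see Table 8.2) defines the LLD from the scatter of repeated blank measurements divided by the calibration sensitivity. All numbers below are hypothetical:</para>

```python
# Sketch of a common LLD convention: LLD = 3 * sigma(blank) / sensitivity.
# The blank readings and the sensitivity value are hypothetical examples;
# each institute determines its LLDs with its own procedure.
from statistics import stdev


def lld(blank_signals, sensitivity):
    """Lower limit of detection in at/cm^2.

    blank_signals: repeated instrument readings (counts) on blank samples.
    sensitivity:   calibration slope in counts per at/cm^2.
    """
    return 3 * stdev(blank_signals) / sensitivity


# Hypothetical blank readings and sensitivity:
blanks = [102.0, 98.5, 101.2, 99.8, 100.4]
print(f"LLD = {lld(blanks, sensitivity=1e-9):.2e} at/cm^2")
```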
<fig id="ch8-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 8.1:</emphasis> Comparison of TXRF LLDs of CEA LETI / IMEC</para></caption>
<graphic xlink:href="graphics/ch8-fig01.jpg"/>
</fig>
<para>For TXRF, the LLD values are nearly identical for each element, as shown in <link linkend="ch8-F1">Figure 8.1</link>. As this technique is based on physical principles and both institutes have the same equipment (Rigaku TXRF), the capabilities of both institutes are the same. All LLDs are between 5E+9 and 5E+10 at/cm<math id="Ch8.S2.SS1.p4.m1" display="inline"><msup><mi></mi><mn>2</mn></msup></math>. Only Ca and Ag are slightly higher: Ca comes from manual wafer handling, and Ag suffers from high background noise in the TXRF spectrum near 3 keV (L<math id="Ch8.S2.SS1.p4.m2" display="inline"><mi>&#x3b1;</mi></math>1 line of Ag at 2.983 keV).</para>
<para>In the case of the VPD-ICPMS technique, the LLD results are not the same across the three institutes. This can be explained by the fact that the technique is based on chemical collection, and each institute has its own specific system with different approaches to the analysis and the calculation of the LLDs, as shown in <link linkend="ch8-T2">Table 8.2</link>.</para>
<fig id="ch8-T2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 8.2:</emphasis> Overview VPD-ICPMS LLD determination and technical details for LETI / IMEC / FhG</para></caption>
<graphic xlink:href="graphics/ch8-tab02.jpg"/>
</fig>
<fig id="ch8-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 8.2:</emphasis> Comparison of VPD-ICPMS LLDs of CEA LETI / IMEC / FhG</para></caption>
<graphic xlink:href="graphics/ch8-fig02.jpg"/>
</fig>
<para><link linkend="ch8-F2">Figure 8.2</link> shows that the VPD-ICPMS LLDs of each institute are between 1E+6 and 5E+9 at/cm<math id="Ch8.S2.SS1.p6.m1" display="inline"><msup><mi></mi><mn>2</mn></msup></math>, roughly three decades lower than the TXRF ones.</para>
<para>The differences observed between the LLDs of the institutes are due to the different techniques used and the different environments. The collection system at CEA-Leti is not fully automatic: technicians have to transfer a small container holding the chemical droplet from the VPD to the ICPMS. This container has to be cleaned manually between collections, and all these manual steps contribute to the increased Na, Mg and Ca contamination levels. However, these specific LLDs are still lower than 1E+10 at/cm<math id="Ch8.S2.SS1.p7.m1" display="inline"><msup><mi></mi><mn>2</mn></msup></math>, and these elements are usually not critical for microelectronic device performance. For imec, the high values for Ti and V seem to be due to specific detector settings that favour minimal peak interference for these elements. For the other elements, all imec LLDs are lower, as imec uses a fully automatic tool without manual steps. Fraunhofer has a system comparable to CEA-Leti's, but it is still in the method development phase, and the current analyses are done externally on an automated system.</para>
<para>Overall, the VPD-ICPMS LLDs of each institute are very low, comparable to industry standards, and thus sufficient for metallic contamination control in a microelectronic environment. Another important parameter is the recovery rate, which has to be more than 95 % for each element. As each institute uses the same chemical solution for the collection step, the recovery rates are nearly the same and are very good (<math id="Ch8.S2.SS1.p8.m1" display="inline"><mo>&gt;</mo></math>95 %).</para>
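<para>The recovery-rate check can be sketched as a simple ratio of the amount recovered by the VPD droplet to the amount intentionally spiked on a reference wafer; the concrete numbers below are hypothetical:</para>

```python
# Illustrative sketch of a recovery-rate check for a spiked reference wafer.
# The spiked and measured concentrations below are hypothetical examples.
def recovery_rate(measured, spiked):
    """Recovery in percent: recovered amount relative to the spiked amount."""
    return 100.0 * measured / spiked


spiked = 1.0e10    # at/cm^2 intentionally deposited (hypothetical)
measured = 9.7e9   # at/cm^2 collected and quantified by VPD-ICPMS (hypothetical)
rate = recovery_rate(measured, spiked)
print(f"recovery = {rate:.1f} % -> {'OK' if rate > 95.0 else 'too low'}")
```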
</section>
<section class="lev2" id="ch8-2-2">
<title>8.2.2 VPD-ICPMS Analyses on Bevel</title>
<para>In recent years, wafer bevel contamination has become a challenge in the industry, and it is therefore an increasing issue for R&amp;D institutes. In order to increase device density on a wafer, individual chips need to be placed closer to the wafer edge, limiting the waste of surface. In addition, wafers are increasingly processed by physical contact at the bevel, so this particular part of the wafer will need to be precisely controlled in the future. The full bevel area can only be analysed by VPD-ICPMS on bare Si wafers: TXRF analysis of the full bevel is impossible because the technique is too sensitive to topography and cannot quantify metallic contamination localized on the slope of the bevel. The collection of contaminants at the bevel is a key point, and each institute had to develop a specific system for this analysis. Thus, there are major technical variations between the collection systems used by the three institutes for the analysis of the bevel.</para>
<fig id="ch8-F3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 8.3:</emphasis> Schematic of the VPD bevel collection at (a) IMEC, (b) CEA-LETI and (c) FhG IPMS</para></caption>
<graphic xlink:href="graphics/ch8-fig03.jpg"/>
</fig>
<para><link linkend="ch8-F3">Figure 8.3</link> shows the different techniques used by each institute for VPD collection on the bevel and the resulting differences in the analysed surface; differences are therefore also expected in the results of the VPD bevel analysis. Imec analyses 1 mm on the front side, 1 mm on the back side and the bevel, while CEA-Leti analyses 5 mm on the front side, the bevel and 1 mm on the back side. At Fraunhofer, the area is not yet defined, as the method is still under development. Monitoring the bevel is another promising analytical technique and will be mandatory for the safe exchange of wafers, as this control further reduces the probability of cross-contamination.</para>
<fig id="ch8-F4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 8.4:</emphasis> Comparison LLDs CEA LETI / IMEC for VPD-ICPMS Bevel</para></caption>
<graphic xlink:href="graphics/ch8-fig04.jpg"/>
</fig>
<para>The comparison of the LLDs for VPD-ICPMS bevel analysis is shown in <link linkend="ch8-F4">Figure 8.4</link>. These LLDs are higher than those of the VPD-ICPMS surface analysis, lying in the range of 1E+8 to 1E+11 at/cm<math id="Ch8.S2.SS2.p3.m1" display="inline"><msup><mi></mi><mn>2</mn></msup></math>. However, the values are quite similar across institutes, and only Ti and V stand out again for imec due to its specific ICPMS detector settings.</para>
</section>
</section>
<section class="lev1" id="ch8-3">
<title>8.3 Cross-Contamination Check-Investigation</title>
<para>Within the present study, one tool from each institute was selected for the control of metallic contamination. Each institute chose the tool that is regularly involved in the memory production flow and most critical in terms of contamination.</para>
<para>So-called &#x201c;witness wafers&#x201d; were generated at each institute by handling bare Si wafers through the selected tool. In this way, the wafers are subjected to the tool-specific contamination process. The analysis of the back side delivers information about contamination by the handling system (chuck and robot), while the analysis of the front side provides information about possible contamination of the chamber. Afterwards, each institute characterises the metallic contamination of the wafers with its own techniques, and finally the analysis results are comprehensively evaluated.</para>
<section class="lev2" id="ch8-3-1">
<title>8.3.1 Example for the Comparison of the Institutes</title>
<para>For the practical comparison of the measurements, the results of the three research institutes for a tool from imec are presented as an example. The tool is a multi-module macro inspection, metrology and review tool for the front side of 200 mm and 300 mm wafers and, additionally, for the back side and edge of 300 mm wafers. The tool supports the inspection of patterned and unpatterned wafers.</para>
<para><link linkend="ch8-F5">Figure 8.5</link> shows the comparison of the TXRF measurements obtained by CEA-Leti and imec for the inspection tool. There is good agreement between the values, demonstrating the comparability of the measurement results. The Ti measured by imec is assumed to be handling contamination introduced during the measurement; nevertheless, its concentration is low.</para>
<fig id="ch8-F5" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 8.5:</emphasis> Comparison TXRF results of CEA LETI / IMEC for IMEC inspection tool</para></caption>
<graphic xlink:href="graphics/ch8-fig05.jpg"/>
</fig>
<para><link linkend="ch8-F6">Figure 8.6</link> shows the comparison of the VPD-ICPMS data for the back side of the wafers. Here, the results show noticeable differences. In <link linkend="ch8-F6">Figure 8.6</link>, only elements detected at concentrations higher than the LLD are reported; i.e., if an element is not detected at one of the institutes, it does not appear in the graph. The first conclusion is that more elements are detected by VPD-ICPMS due to the lower LLDs. All concentrations are lower than 1E+11 at/cm<math id="Ch8.S3.SS1.p3.m1" display="inline"><msup><mi></mi><mn>2</mn></msup></math> and are in accordance with the TXRF results. The second conclusion is that the three analysed wafers do not carry the same contamination. While CEA-Leti and imec found Ga, Ge and Sb, Fraunhofer did not detect these elements; imec and Fraunhofer quantified Al, Fe, Ti and W, whereas CEA-Leti did not find them. The analysed wafers are not twins, because the cross-contamination process does not contaminate each wafer at the same concentration. Moreover, some wafers were handled and shipped more than others, and these differences affect the metallic contamination.</para>
<fig id="ch8-F6" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 8.6:</emphasis> Comparison VPD-ICPMS results of CEA LETI / IMEC /FhG for IMEC inspection tool</para></caption>
<graphic xlink:href="graphics/ch8-fig06.jpg"/>
</fig>
<para><link linkend="ch8-F7">Figure 8.7</link> shows the results obtained on the bevel. Contamination levels on the bevel are higher than those measured on the surface. In this example, the results obtained by CEA-Leti and imec agree whenever an element is detected by both institutes. The concentrations measured by imec are almost always higher than those of CEA-Leti, probably due to several influencing factors: first, the collection techniques are different and the droplet-scanned areas are not the same; moreover, the bevel of each wafer was probably contaminated during handling and shipping, which is why the concentrations obtained on the bevel are always higher than those obtained on the surface. The study of the bevel is very challenging, and these results reflect not only the metallic contamination caused by the process in the selected equipment but also that introduced by handling and shipping.</para>
<fig id="ch8-F7" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 8.7:</emphasis> Comparison VPD-ICPMS bevel results of CEA LETI / IMEC for IMEC inspection tool</para></caption>
<graphic xlink:href="graphics/ch8-fig07.jpg"/>
</fig>
</section>
</section>
<section class="lev1" id="ch8-4">
<title>8.4 Conclusion</title>
<para>This study confirms that the three institutes are able to analyse metallic contamination, either by TXRF or by VPD-ICPMS, with comparable LLDs. This result is very promising for the exchange of wafers in the future. TXRF, with its higher LLDs, did not show metallic contamination above 1E+11 at/cm<math id="Ch8.S4.p1.m1" display="inline"><msup><mi></mi><mn>2</mn></msup></math>. VPD-ICPMS, on the other hand, thanks to its very low limits of detection, reveals the different concentrations obtained by the different institutes; nevertheless, these concentrations are very low. The cross-contamination in a tool does not contaminate all wafers at the same level. Hence, in order to compare the capabilities of the different institutes more reliably in the future, an inter-laboratory test with intentionally and uniformly contaminated standard wafers would be necessary. Moreover, all the measurements were done on &#x201c;witness wafers&#x201d; and not on product wafers. In the future, it will be necessary to develop techniques able to analyse the metallic contamination on real wafers during their process flow. To this end, CEA-Leti has developed a new system allowing metallic contamination control of the bevel of product wafers (Boulard, et al., 2022; U.S. Patent No. 20200203190 A1, 2020).</para>
<para>Although some additional improvement is required to create a smooth loop between the research institutes, this work makes the wafer exchange flow much easier thanks to these first experiences and contributes to strengthening the collaboration in current and future projects. Moreover, the conclusions of this study broaden the capabilities in terms of tool, process and expertise access for potential industrial partners. Thus, an important milestone has been reached in aligning the three research institutes to offer advanced AI systems with novel non-volatile memory components.</para>
</section>
<section class="lev1">
<title>Acknowledgements</title>
<para>This study was fully financed by the TEMPO project. The TEMPO project has received funding from the Electronic Components and Systems for European Leadership Joint Undertaking under grant agreement No 826655. This Joint Undertaking receives support from the European Union&#x2019;s Horizon 2020 research and innovation programme and Belgium, France, Germany, Switzerland, and The Netherlands.</para>
</section>
<section class="lev1" id="ch8-Ref">
<title>References</title>
<para id="ch8-bib1">[1] C. Bigot, A. Danel, and S. Thevenin (2005). Influence of Metal Contamination in the Measurement of p-Type Cz Silicon Wafer Lifetime and Impact on the Oxide Growth. Solid State Phenomena, vols. 108&#x2013;109, pp. 297&#x2013;302. <ulink url="http://www.scientific.net/SSP.108-109.297">doi:10.4028/www.scientific.net/SSP.108-109.297</ulink></para>
<para id="ch8-bib2">[2] Y. Borde, A. Danel, A. Roche, A. Grouillet, and M. Veillerot (2007). Estimation of Detrimental Impact of New Metal Candidates in Advanced Microelectronics. Solid State Phenomena, vol. 134, pp. 247&#x2013;250. <ulink url="http://www.scientific.net/SSP.134.247">doi:10.4028/www.scientific.net/SSP.134.247</ulink></para>
<para id="ch8-bib3">[3] F. Boulard, V. Gros, C. Porzier, L. Brunet, V. Lapras, F. Fournel, and N. Posseme (21 May 2022). Bevel contamination management in 3D integration by localized SiO2 deposition. SSRN Electronic Journal.</para>
<para id="ch8-bib4">[4] D. Autillo, et al. (June 2020). U.S. Patent No. 20200203190 A1.</para>
</section>
</chapter>
<chapter class="chapter" id="ch9" label="9" xreflabel="9">
<title>A Framework for Integrating Automated Diagnosis into Simulation</title>
<subtitle>David Kaufmann and Franz Wotawa</subtitle>
<affiliation>Graz University of Technology, Austria</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>Automatically detecting and locating faults in systems is of particular interest for mitigating undesired effects during operation. Many diagnosis approaches have been proposed, including model-based diagnosis, which allows diagnoses to be derived directly from system models. In this paper, we present a framework that brings together simulation models with diagnosis, allowing diagnosis models to be evaluated and tested close to their real-world application. The framework makes use of functional mock-up units to combine simulation models and enables their integration with ordinary programs written in either Python or Java. We present the integration of simulation and diagnosis using a two-lamp example model.</para>
<para><emphasis role="strong">Keywords:</emphasis> model-based diagnosis, fault detection, fault localization, physical simulation</para>
</section>
<section class="lev1" id="ch9-1">
<title>9.1 Introduction</title>
<para>To keep systems operational, we need to carry out diagnoses regularly. Diagnosis includes the detection of failures, the localization of corresponding root causes, and repair. We carry out regular maintenance activities that include diagnosis and predictions regarding the remaining lifetime of components to prevent systems from breaking during use. However, there is no guarantee that system components do not break during operation, even when maintenance is carried out as requested. In some cases, it is sufficient to indicate such a failure, i.e., to present a warning or error message and pass mitigation measures to someone else. Unfortunately, there are systems, such as autonomous systems, where we can hardly achieve such a mitigation process. For example, in fully autonomous driving there is no longer a driver to whom control can be passed. Therefore, there is a need for advanced diagnosis solutions that cover detection, localization, and repair. A practical real-world demonstration of an on-board control agent was validated in 1999 within the scope of Deep Space One, a space exploration mission carried out by NASA. The authors of [<link linkend="ch9-bib4">4</link>] describe the developed methods related to model-based programming principles, including model-based diagnosis. The methods were applied to autonomous systems designed for high reliability, operating as part of a spacecraft system.</para>
<para>When we want to integrate advanced diagnosis into systems, we need means that allow us to easily couple monitoring with diagnosis. As stated by the authors in [<link linkend="ch9-bib3">3</link>], this coupling enables the diagnosis method to detect and localize faults based on observations obtained by monitoring a cyber-physical system (CPS). Furthermore, we require close integration with today&#x2019;s development processes, which rely on system simulation. The latter aspect is of utmost importance for showing early that diagnosis based on monitoring can improve the overall behaviour of a system even when it is not working as expected. We contribute to this challenge and present a framework for integrating different simulation models and diagnoses. The framework combines functional mock-up units (FMUs), which may originate from modeling environments like Open Modelica<sup>1</sup>, with ordinary programming languages like Java or Python. We use these language capabilities to integrate the diagnosis functionality. The architecture of our framework is based on the client-server pattern and implemented using Docker containers.</para>
<para>Using our framework, we can easily add diagnosis capabilities to systems. In addition, we can use the framework to carry out verification and validation of system functionality enhanced with diagnosis capabilities. In this manuscript, we present the framework and show the integration of diagnosis using a simple example. We will make the framework and the underlying diagnosis engine freely available as open source. The framework contributes to the research area of Edge Artificial Intelligence because it enables the direct use of diagnosis functionality based on Artificial Intelligence methodology in systems, without the necessity of communicating with other systems.</para>
<para>We structure the paper as follows. First, we discuss the foundations of the diagnosis method used, i.e., model-based diagnosis. Afterwards, we describe the simulation framework, which is based on functional mock-up units, using a small example. We further show how diagnosis can be integrated into this framework, and finally we conclude the paper.</para>
</section>
<section class="lev1" id="ch9-2">
<title>9.2 Model-based Diagnosis</title>
<para>Diagnosis, i.e., the detection of failures and the identification of faults, has been of interest for several decades. In the early eighties of the last century, Davis and colleagues [<link linkend="ch9-bib1">1</link>] [<link linkend="ch9-bib2">2</link>] introduced the basic concepts behind model-based diagnosis. The idea is to utilize a model of the system directly for detecting and locating faults. Reiter [<link linkend="ch9-bib5">5</link>] formalized the idea using first-order logic. For a more recent account, we refer to Wotawa and Kaufmann [<link linkend="ch9-bib8">8</link>], where the authors show how advanced reasoning systems can be used for computing diagnoses. For recent applications of diagnosis in the context of CPSs, have a look at [<link linkend="ch9-bib3">3</link>] [<link linkend="ch9-bib9">9</link>] [<link linkend="ch9-bib7">7</link>] [<link linkend="ch9-bib6">6</link>].</para>
<fig id="ch9-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 9.1:</emphasis> A simple electric circuit comprising bulbs, a switch and a battery.</para></caption>
<graphic xlink:href="graphics/ch9-fig01.jpg"/>
</fig>
<para>In the following, we illustrate the basic concepts using a small example circuit comprising a battery <math id="Ch9.S2.p2.m1" display="inline"><mi>B</mi></math>, a switch <math id="Ch9.S2.p2.m2" display="inline"><mi>S</mi></math>, and two bulbs <math id="Ch9.S2.p2.m3" display="inline"><msub><mi>L</mi><mn>1</mn></msub></math>, <math id="Ch9.S2.p2.m4" display="inline"><msub><mi>L</mi><mn>2</mn></msub></math>. The bulbs are connected in parallel, and both should provide light when the switch is turned on and the battery is not empty. Otherwise, both bulbs do not deliver any light. We depict the circuit in <link linkend="ch9-F1">Figure 9.1</link>. If we know that the switch <math id="Ch9.S2.p2.m5" display="inline"><mi>S</mi></math> is on and the battery is working as expected, then we would also expect both bulbs to be illuminated. In case one bulb is emitting light but the other is not, we would immediately derive that the bulb emitting no light is broken.</para>
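<para>The reasoning in this example can be reproduced with a small brute-force consistency check. The following Python sketch is a simplified stand-in (not the logic model used in this paper), assuming a strong fault model in which a faulty component delivers no power and a faulty lamp stays dark; the component names are ours:</para>

```python
# Brute-force consistency-based diagnosis for the two-bulb circuit,
# under the (assumed) strong fault model: a faulty battery or switch
# supplies no power, a faulty lamp emits no light; the switch is on.
from itertools import combinations

COMPONENTS = ["battery", "switch", "L1", "L2"]
OBSERVED = {"L1": True, "L2": False}  # L1 emits light, L2 does not


def predicted_light(lamp, faulty):
    """Light prediction for a lamp given a set of faulty components."""
    supply_ok = "battery" not in faulty and "switch" not in faulty
    return supply_ok and lamp not in faulty


def consistent(faulty):
    """A fault assumption is consistent if predictions match observations."""
    return all(predicted_light(l, faulty) == obs for l, obs in OBSERVED.items())


def minimal_diagnoses():
    """Enumerate subset-minimal fault sets consistent with the observations."""
    found = []
    for size in range(len(COMPONENTS) + 1):
        for cand in combinations(COMPONENTS, size):
            s = set(cand)
            if consistent(s) and not any(d <= s for d in found):
                found.append(s)
    return found


print(minimal_diagnoses())  # -> [{'L2'}]
```

Under this strong fault model, the only minimal diagnosis is the dark bulb, matching the intuition above; a weak fault model, which leaves faulty behaviour unconstrained, would admit further diagnoses.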
<para>To compute diagnoses from system models, we first need to come up with a model of the system that we want to diagnose. Such models comprise components and their connections via ports. Hence, in the following, we discuss the component models and a model of the connections separately. For the electric circuit, we simplify modelling by only considering that some components, like batteries, provide electrical power, some transfer power, like switches, and others consume power. Furthermore, we utilize first-order logic for formalization, following Prolog syntax<sup>2</sup>. For each component model, we describe how values are computed assuming that the component is of a particular type and that it is working as expected. For the type we use a predicate <emphasis role="strong">type/2</emphasis>, and for stating that a component is correct a predicate <emphasis role="strong">nab/1</emphasis>.</para>
<para><emphasis role="strong">Battery</emphasis> A component X that is a battery provides nominal power at its output when working correctly.</para>
<para>val(pow(X),nominal) :- type(X,bat), nab(X).</para>
<para><emphasis role="strong">Switch</emphasis> A component X that is a switch works as follows. If it is on and working as expected, then the output port must have the same value as the input port, and vice versa. If it is off, the switch does not transfer any power.</para>
<para>val(out_pow(X),V) :- type(X,sw), on(X), val(in_pow(X),V), nab(X).</para>
<para>val(in_pow(X),V) :- type(X,sw), on(X), val(out_pow(X),V), nab(X).</para>
<para>val(out_pow(X),zero) :- type(X,sw), off(X), nab(X).</para>
<para><emphasis role="strong">Lamp</emphasis> A lamp X is on whenever there is power at its input. If it emits light, then there must be power at its input. If there is no power at the input of X, then the light must be off.</para>
<para>val(light(X),on) :- type(X,lamp), val(in_pow(X), nominal), nab(X).</para>
<para>val(in_pow(X), nominal) :- type(X,lamp), val(light(X),on).</para>
<para>val(light(X),off) :- type(X,lamp), val(in_pow(X),zero), nab(X).</para>
<para>To complete the model, we introduce connections using a predicate <emphasis role="strong">conn/2</emphasis> that allows us to state that two ports are connected. The behaviour of a connection comprises the transfer of values in both directions, together with the constraint that different values at the same connection are impossible. The following rules cover this behaviour:</para>
<para>val(X,V) :- conn(X,Y), val(Y,V).</para>
<para>val(Y,V) :- conn(X,Y), val(X,V).</para>
<para>:- val(X,V), val(X,W), not V=W.</para>
<para>To use a model for diagnosis, we only need to define the structure of the system making use of the component models. For the two-bulb example, we define a battery, a switch, and two bulbs that are connected according to <link linkend="ch9-F1">Figure 9.1</link>.</para>
<para>type(b, bat).</para>
<para>type(s, sw).</para>
<para>type(l1, lamp).</para>
<para>type(l2, lamp).</para>
<para></para>
<para>conn(in_pow(s), pow(b)).</para>
<para>conn(out_pow(s), in_pow(l1)).</para>
<para>conn(out_pow(s), in_pow(l2)).</para>
<para>To use this model for diagnosis, we further need observations. We might state that the switch s is on and that bulb l1 is not on but l2 is. Again, we can make use of Prolog to represent this knowledge as facts:</para>
<para>on(s).</para>
<para>val(light(l1),off).</para>
<para>val(light(l2),on).</para>
<para>When using a diagnosis engine like the one described in [<link linkend="ch9-bib8">8</link>], we obtain a single-fault diagnosis <math id="Ch9.S2.p11.m1" display="inline"><mrow><mo stretchy="false">{</mo><mtext mathvariant="monospace">l1</mtext><mo stretchy="false">}</mo></mrow></math>. But how does this work? The diagnosis engine makes use of a simple mechanism. It searches for a truth assignment to the nab<math id="Ch9.S2.p11.m2" display="inline"><mrow><mi></mi><mo mathvariant="normal">/</mo><mn mathvariant="normal">1</mn></mrow></math> predicates such that the model together with these assumptions does not lead to a contradiction. When assuming l1 to be not working, the fact that lamp l2 is on can still be derived, but nothing else can be derived that would lead to a contradiction.</para>
<para>Note that this simple model also works in other, more interesting cases. Let us assume that the switch is on but no light is on. For this case, the diagnosis engine delivers three diagnoses: <math id="Ch9.S2.p12.m1" display="inline"><mrow><mo stretchy="false">{</mo><mtext mathvariant="monospace">b</mtext><mo stretchy="false">}</mo></mrow></math>, <math id="Ch9.S2.p12.m2" display="inline"><mrow><mo stretchy="false">{</mo><mtext mathvariant="monospace">s</mtext><mo stretchy="false">}</mo></mrow></math>, and <math id="Ch9.S2.p12.m3" display="inline"><mrow><mo stretchy="false">{</mo><mtext mathvariant="monospace">l1</mtext><mo>,</mo><mtext mathvariant="monospace">l2</mtext><mo stretchy="false">}</mo></mrow></math>, stating that either the battery is empty, the switch is broken, or both lamps are not working at the same time. Another interesting case that might occur is setting the switch to off while one lamp, i.e., l1, is still on. In this case we only obtain a double-fault diagnosis <math id="Ch9.S2.p12.m4" display="inline"><mrow><mo stretchy="false">{</mo><mtext mathvariant="monospace">s</mtext><mo>,</mo><mtext mathvariant="monospace">l2</mtext><mo stretchy="false">}</mo></mrow></math>, stating that neither the switch nor lamp l2 is working as expected.</para>
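<para>To make this mechanism concrete, the search for consistent assumptions can be reproduced in a few lines of Python. The following is an illustrative brute-force sketch, not the diagnosis engine of [<link linkend="ch9-bib8">8</link>]: the behaviour of the circuit (with the switch commanded on) is condensed into a simplified consistency check that suffices for the scenarios discussed above, and subsets of components assumed abnormal are enumerated, skipping supersets of already-found diagnoses.</para>

```python
from itertools import combinations

COMPONENTS = ["b", "s", "l1", "l2"]

def consistent(abnormal, obs):
    """True iff the two-bulb model (switch commanded on) is consistent
    with the observed lamp states when the components in `abnormal`
    are assumed faulty, i.e. their nab/1 assumption is dropped."""
    powered = "b" not in abnormal and "s" not in abnormal
    for lamp in ("l1", "l2"):
        if powered and lamp not in abnormal and obs.get(lamp) == "off":
            return False  # model predicts light, but lamp observed off
        if not powered and obs.get(lamp) == "on":
            return False  # no power can reach a lamp observed on
    return True

def diagnoses(obs):
    """Enumerate subset-minimal diagnoses in ascending fault size."""
    found = []
    for size in range(len(COMPONENTS) + 1):
        for cand in combinations(COMPONENTS, size):
            if any(set(d) <= set(cand) for d in found):
                continue  # skip supersets of known diagnoses
            if consistent(set(cand), obs):
                found.append(cand)
    return found
```

<para>For the observations val(light(l1),off) and val(light(l2),on), the sketch returns the single-fault diagnosis {l1}; when both lamps are off, it returns {b}, {s}, and {l1, l2}, matching the results discussed above.</para>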
</section>
<section class="lev1" id="ch9-3">
<title>9.3 Simulation and Diagnosis Framework</title>
<para>In the following section, we introduce a framework making use of two collaborating tools: a simulation environment for functional mock-up unit (FMU)<sup>3</sup> models and a diagnosis application based on the theorem solver Clingo<sup>4</sup>. <link linkend="ch9-F2">Figure 9.2</link> gives a brief overview of the framework and its operating principles. The FMU simulation tool server is utilized to run a CPS model within the given simulation environment, whereas the client enables the user to control the simulation. This separation makes it possible to execute other applications, tools, and methods after each simulation time step update, such as the ASP Diagnose Tool (see Section 9.3.2). This tool receives the observations provided by the simulation framework and a settings configuration to compute the diagnosis of a system, based on an abstract model developed in the declarative programming language ASP (Answer Set Programming). Further, the diagnosis may be used to control the inputs and parameters to restore a safely operating system or to bring the system into a state that prevents harm to the system or its environment.</para>
<fig id="ch9-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 9.2:</emphasis> Illustration of the simulation and diagnosis environment as well as the overall operating principles. The framework of the FMU Simulation Tool provides an interface to enable the integration of a diagnosis tool and/or other methods. The models can be substituted by any others in the provided framework.</para></caption>
<graphic xlink:href="graphics/ch9-fig02.jpg"/>
</fig>
<section class="lev2" id="ch9-3-1">
<title>9.3.1 FMU Simulation Tool</title>
<para>The developed application provides an entire environment to load, configure, run, observe, and control simulations of CPS models. In general, the application is set up as a client-server system to split the structure between the provider of a service, the server, and the service requester, the client. The service executed on the server is the simulator environment, which provides options to observe and control the simulation via client requests during run-time. The reason for using a client-server system is to detach the simulation environment from the observation/control process. The separation enables the user to utilize an individual programming environment/language as the client, whereas the server works independently of the selected client environment, receiving and sending the control commands and simulation observations via a REST (Representational State Transfer) application programming interface. In order to run a simulation of a CPS model with the described application, a fundamental requirement is to generate a standardized FMU from the given model. Common modelling software such as OpenModelica or Matlab<sup>5</sup> already has an FMU generation tool implemented, but there are also other applications, e.g. UniFMU<sup>6</sup>, which are capable of generating an FMU from source code in different languages (Python, Java, or C/C++). An FMU makes it possible to use one general simulation environment for all kinds of models, although they are built from different sources. The simulation environment is developed to execute a step-by-step simulation (for a given time step). To enable that feature, it is essential that the FMU is generated as a co-simulation model. Within a co-simulation setup, the numerical solver is embedded in and supplied by the generated FMU. Via the provided interface methods, the FMU can be controlled by setting the inputs and parameters, computing the next simulation time step, and reading the resulting observations. The given setup enables the execution of tools and methods while the simulation is paused after a simulated time step.</para>
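<para>This operating principle can be condensed into a short sketch. The following Python fragment is illustrative only: the FMU interface is abstracted behind a small protocol whose method names (set_inputs, do_step, get_outputs) are stand-ins for the corresponding FMI calls, not the actual API of the tool.</para>

```python
from typing import Callable, Dict, Protocol

class CoSimFMU(Protocol):
    """Minimal stand-in for a co-simulation FMU (illustrative only;
    a real FMU exposes the FMI functions fmi2SetReal, fmi2DoStep,
    fmi2GetReal, etc. through a binding library)."""
    def set_inputs(self, inputs: Dict) -> None: ...
    def do_step(self, t: float, dt: float) -> None: ...
    def get_outputs(self) -> Dict: ...

def run(fmu: CoSimFMU, on_step: Callable[[float, Dict], Dict],
        t_end: float, dt: float) -> None:
    """Step-by-step simulation loop: after every computed time step the
    simulation is effectively paused while `on_step` (e.g. the diagnosis
    tool) processes the observations and may return new inputs."""
    t: float = 0.0
    inputs: Dict = {}
    while t < t_end:
        fmu.set_inputs(inputs)   # apply inputs/parameters for this step
        fmu.do_step(t, dt)       # FMU-internal solver advances by dt
        t += dt
        obs = fmu.get_outputs()  # read the resulting observations
        inputs = on_step(t, obs) # hook for diagnosis and control
```

<para>Because the loop only resumes once the hook returns, any time spent inside on_step, for example on diagnosis, does not affect the simulated behaviour.</para>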
</section>
<section class="lev2" id="ch9-3-2">
<title>9.3.2 ASP Diagnose Tool</title>
<para>To enable diagnoses based on observations of a given CPS model, we developed a diagnosis tool. This tool is built on the theorem solver Clingo and makes use of the provided methods within a Python environment. In addition, the tool provides extended functionality, e.g., including observations such as simulation outputs, inputs, states, modes, or time, and applying optional settings such as limiting the number of required answer sets, setting the maximum fault size of the search space for abnormal component behaviour, considering additional fault modes, and adding further constraints to be considered.</para>
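<para>A run of the tool is thus driven by a small settings configuration. The fragment below merely illustrates the kind of options just listed; the key names are hypothetical and not the tool's actual option names.</para>

```python
# Illustrative settings for one diagnosis run (all key names hypothetical).
settings = {
    "model": "two_lamps.lp",             # abstract model of the CPS
    "max_answer_sets": 0,                # 0 = enumerate all answer sets
    "max_fault_size": 2,                 # search up to double faults
    "fault_modes": ["broken", "short"],  # additional fault modes
    "extra_constraints": [],             # further integrity constraints
    "output": "json",                    # json | csv | stdout
}
```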
<para>The tool is designed to iterate through each fault size in ascending order, where fault size zero indicates a normally operating system without any abnormal behaviour detected in the diagnosed components. The procedure is repeated for each fault size, except when the model is already satisfied for fault size zero, which is interpreted as no abnormal component being present for the given observations. The detailed theorem solver implementation is shown in Algorithm 1, which was initially introduced and applied by Wotawa, Nica and Kaufmann [<link linkend="ch9-bib3">3</link>]. In the following, we briefly describe the setup of the stated algorithm. First, the input model is initialized, defined as an abstract model (<math id="Ch9.S3.SS2.p2.m1" display="inline"><mi>M</mi></math>), comprising the system description (<math id="Ch9.S3.SS2.p2.m2" display="inline"><mrow><mi>S</mi><mo>&#x2062;</mo><mi>D</mi></mrow></math>), observations (<math id="Ch9.S3.SS2.p2.m3" display="inline"><mrow><mi>O</mi><mo>&#x2062;</mo><mi>b</mi><mo>&#x2062;</mo><mi>s</mi></mrow></math>), and additional fault modes (<math id="Ch9.S3.SS2.p2.m4" display="inline"><mrow><mi>F</mi><mo>&#x2062;</mo><mi>M</mi></mrow></math>) to guide the diagnosis search. We start with an empty diagnosis set (<math id="Ch9.S3.SS2.p2.m5" display="inline"><mrow><mi>D</mi><mo>&#x2062;</mo><mi>S</mi></mrow></math>) and compute diagnoses of a certain size, iterating from <math id="Ch9.S3.SS2.p2.m6" display="inline"><mn>0</mn></math> to <math id="Ch9.S3.SS2.p2.m7" display="inline"><mi>n</mi></math>. Line 4 shows how the limitation of the number of abnormal predicates is applied to the model (<math id="Ch9.S3.SS2.p2.m8" display="inline"><msub><mi>M</mi><mi>f</mi></msub></math>) before the solver is called (line 5). A specified answer set is returned and filtered for abnormal predicates (<math id="Ch9.S3.SS2.p2.m9" display="inline"><mi>S</mi></math>) only. To prevent the multiple occurrence of abnormal elements (<math id="Ch9.S3.SS2.p2.m10" display="inline"><mi>C</mi></math>) across the iterations, the corresponding integrity constraints are added to the model (<math id="Ch9.S3.SS2.p2.m11" display="inline"><msub><mi>M</mi><mi>f</mi></msub></math>), as stated in line 12. In relation to the example given in Section 9.2, an integrity constraint at fault size 1 could be stated as :- ab(l1). for a detected abnormal behaviour of the component lamp 1.</para>
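<para>Abstracting the Clingo call behind a generic consistency check, the fault-size-ascending loop of Algorithm 1 can be sketched as follows. This is an illustrative simplification: `check` stands in for grounding and solving the ASP model with the cardinality limit applied, and the integrity constraints of line 12 are mimicked by a `blocked` list.</para>

```python
from itertools import combinations

def diagnose(check, components, max_fault_size):
    """Fault-size-ascending diagnosis loop in the spirit of Algorithm 1.
    `check(assumed_abnormal)` abstracts the ASP solver call and returns
    True iff SD + Obs is satisfiable under the given ab/1 assumptions."""
    DS, blocked = [], []
    for size in range(max_fault_size + 1):
        for cand in combinations(components, size):
            # mimic integrity constraints such as ":- ab(l1)." that
            # block already-found abnormal elements in later rounds
            if any(set(b) <= set(cand) for b in blocked):
                continue
            if check(set(cand)):
                DS.append(set(cand))
                blocked.append(cand)
        if size == 0 and DS:
            break  # satisfiable at fault size 0: no abnormal component
    return DS
```

<para>With a consistency check for the two-bulb circuit plugged in as `check`, this loop reproduces the diagnoses discussed in Section 9.2.</para>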
<para>Besides the main diagnosis algorithm, the tool offers different output options to simplify the evaluation of the received diagnoses. The received data can be exported to a JSON file or a CSV file, or printed directly in the terminal during run-time. The output results are the detailed computed diagnoses, the total number of diagnoses found for each fault size, an indicator for strong faults, and the diagnosis time, both per fault size and in total. As input, the tool requires the Prolog model representing the CPS as an abstract model (see Section 9.2) and the related observation/constraint file with all input information necessary to execute the diagnosis process.</para>
<para>In reference to <link linkend="ch9-F2">Figure 9.2</link>, we show the simulation tool update loop, where an update is triggered and the observations are received. The observations are then passed via the method interface as input to the implemented diagnosis tool. Before calling the diagnosis, some configuration options are specified, such as the abstract model, the maximum number of answer sets to compute, the maximum fault size of interest, and the observations, which are generated based on the simulation output information. In addition, the diagnosis output format, e.g., JSON or CSV, can be selected. Finally, the ASP theorem solver is executed with the given model, configuration, and simulation observations. After the diagnosis result of the current time frame is received, it is stored in the defined format and the simulation continues with the next time step in the loop.</para>
</section>
</section>
<section class="lev1" id="ch9-4">
<title>9.4 Experiment</title>
<para>To show the applicability of the framework, we make use of the two-lamps-model concept shown in <link linkend="ch9-F1">Figure 9.1</link>. For the simulation, a model of the two-lamps circuit (see Listing 1) is generated in OpenModelica comprising a battery (<math id="Ch9.S4.p1.m1" display="inline"><mrow><mn>5.0</mn><mo>&#x2062;</mo><mi>V</mi></mrow></math>), a closing switch, and two light bulbs (<math id="Ch9.S4.p1.m2" display="inline"><mrow><mn>100</mn><mo>&#x2062;</mo><mi mathvariant="normal">&#x3a9;</mi></mrow></math>). Besides the connection of each component, the model also describes inputs, which can be set during the simulation. These inputs cover the fault type of each component and the operational switch logic. To give an example of the component programming, the switch model is shown in more detail in Listing 2. Besides the component mode, the equations also represent the behaviour under different fault states, e.g. a broken switch, resulting in an infinitely high internal resistance, equivalent to an open electrical circuit. An equivalent fault state is implemented for each component, as shown in <link linkend="ch9-T1">Table 9.1</link>.</para>
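<para>To illustrate the fault-state logic of Listing 2 without reproducing the Modelica code, the following Python sketch captures the same idea; the resistance values are illustrative placeholders, not the values used in the actual model.</para>

```python
import math

R_ON = 1e-3       # near-zero resistance of a healthy closed switch (placeholder)
R_OFF = math.inf  # open circuit

def switch_resistance(mode: str, fault: str) -> float:
    """Effective internal resistance of the switch component.
    mode: 'open' | 'close'; fault state: 'ok' | 'broken' | 'short'."""
    if fault == "broken":
        return R_OFF  # broken: open circuit regardless of the mode
    if fault == "short":
        return R_ON   # shorted: conducts even when commanded open
    return R_ON if mode == "close" else R_OFF
```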
<para><graphic xlink:href="graphics/ch9-alo01.jpg"/></para>
<fig id="ch9-L1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Listing 1:</emphasis> OpenModelica model of a two-lamp electrical circuit with fault injection capability for each component used. The component connections are specified to describe the same electrical circuit as given in <link linkend="ch9-F1">Figure 9.1</link>.</para></caption>
<graphic xlink:href="graphics/ch9-lis01.jpg"/>
</fig>
<fig id="ch9-L2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Listing 2:</emphasis> OpenModelica model of a switch component including a mode {open, close} and fault state {ok, broken, short} implementation logic.</para></caption>
<graphic xlink:href="graphics/ch9-lis02.jpg"/>
</fig>
<fig id="ch9-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 9.1:</emphasis> CPS model component state description for the light bulb, switch, and battery. All used states of the components, including fault states, are shown.</para></caption>
<graphic xlink:href="graphics/ch9-tab01.jpg"/>
</fig>
<para>Moreover, the OpenModelica model is converted into a co-simulation FMU, which enables the use of the model in the described FMU simulation tool. In order to simulate the model behaviour in detail, the update time step is set to 0.01 seconds. In addition, the fault injection during run-time is configured to trigger a single light bulb fault at 0.2 seconds and a switch fault after 0.3 seconds, which is described in detail in the simulation part of <link linkend="ch9-F4">Figure 9.4</link>.</para>
<para>For the diagnosis part, we make use of the described abstract model of the electrical two-lamps circuit (see Section 9.2). The overall framework is built up in such a way that a diagnosis is computed after each simulated time step, based on the current observations (simulation outputs, parameters, and inputs). The use of a co-simulation FMU allows a step-by-step simulation, which makes it possible to pause the simulation during the diagnosis process and continue afterwards. Therefore, the time spent on diagnosis has no impact on the overall simulation results.</para>
<fig id="ch9-F3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 9.3:</emphasis> Simulation showing the measured signal output of the two bulbs, the switch, and the battery. For this example, a fault injection (broken) in bulb 1 after 0.2 seconds (red indicator) and a fault injection (broken) in the switch after 0.3 seconds (orange indicator) are initiated.</para></caption>
<graphic xlink:href="graphics/ch9-fig03.jpg"/>
</fig>
<para><link linkend="ch9-F3">Figure 9.3</link> shows the observed signals for the current flow in the battery and light bulbs 1 and 2, as well as the actual switch mode. Further, the injected faults are highlighted at the corresponding time points. In <link linkend="ch9-F4">Figure 9.4</link>, a table represents the observations for the three interesting time sections: normal behaviour, a broken light bulb, and a broken switch. After reaching simulation time 0.05 seconds, the switch mode is changed from open to closed and the model shows the expected ordinary behaviour without any abnormal components. Both light bulbs operate at an expected current consumption of 0.05 A. These observations are translated into a readable input format for the diagnosis tool, which is shown in the corresponding status row "Observation" (see <link linkend="ch9-F4">Figure 9.4</link>). Given the abstract model and the observation input, the diagnosis tool computes a satisfied model at fault size zero, from which it concludes expected ordinary behaviour of all considered components.</para>
<para>The time section at 0.2 seconds shows the behaviour with a broken light bulb. The current consumption of bulb 1 immediately drops to 0.0 A, and the diagnosis observation changes from mode on to off. Since the main power switch is still closed and bulb 2 is still in active mode on, the diagnosis model concludes that component bulb 1 is abnormal, ab(l1). The next investigated fault (broken) is injected into the closed switch. Since the power supply for both light bulbs is cut, their current consumption drops to 0.0 A. Based on the given observations, the diagnosis model concludes, as expected, an abnormal switch (ab(s)) or battery (ab(b)) as single faults. When double faults are considered, the computed diagnosis additionally contains the combination of abnormal behaviour of light bulb 1 and bulb 2 ({ab(l1), ab(l2)}), which is also a possible solution for the given observations.</para>
<fig id="ch9-F4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 9.4:</emphasis> Simulation and diagnosis output results based on the electrical two-lamps circuit with a broken bulb after 0.2 seconds and a broken switch at 0.3 seconds. The upper tables illustrate the simulation input/output signals, which are used as observations for the diagnosis part (lower tables). Based on the given observations for the three selected time steps, different diagnosis results are obtained.</para></caption>
<graphic xlink:href="graphics/ch9-fig04.jpg"/>
</fig>
</section>
<section class="lev1" id="ch9-5">
<title>9.5 Conclusion</title>
<para>In this paper, we have shown how to use an automated diagnosis method within a simulation framework for a CPS (cyber-physical system). For this purpose, we introduced the foundations of the model-based diagnosis method based on a simple electric circuit model comprising two light bulbs, a switch, and a battery. Next, we described a framework for simulating the developed CPS model with the ability to inject faults during run-time. In order to run the model in the given framework, it is essential to generate a functional mock-up unit (FMU) from the developed electrical two-lamp circuit model. By providing the FMU in co-simulation configuration, the simulation can run in a step-by-step mode (time steps), which makes it possible to call other functions, for example the diagnosis method, while the simulation is paused, and to continue with the next time step afterwards.</para>
<para>Besides the physical electrical circuit model, an abstract model for diagnosis is developed in the declarative programming language Prolog. For computing the diagnosis based on observations of the model simulation, we introduced a tool which uses the theorem solver Clingo and offers additional convenience options. The tool automates the process of searching for abnormal components at each fault size (in ascending order). To prevent the multiple occurrence of abnormal components at higher fault sizes, the derived results are continuously added as constraints to the model.</para>
<para>In order to demonstrate the concept of the simulation framework with the automated diagnosis tool, we executed an experiment based on the described electrical two-lamps circuit model with the capability of injecting faults into the light bulbs and the switch. After each simulation time step, the received observations are forwarded as input to the diagnosis tool. The diagnosis tool detects the injected faults quickly and accurately, both for a single bulb fault and for the more interesting case in which the switch erroneously indicates a closed position although both light bulbs are powered off. In this case, we obtain a single fault for an abnormal switch or battery behaviour, and a double fault stating abnormal behaviour of both light bulbs in combination.</para>
<para>For the purpose of deploying the diagnosis tool on a system operating under real environmental conditions, validation and verification are a fundamental process. Thus, we make use of a simulated environment framework, enabling a high test-case coverage of scenarios with abnormal component behaviour of the system under test. In addition, the time required to conclude a diagnosis may also lead to issues and needs to be considered in the evaluation. Future research includes investigating more complex CPSs by making use of the discussed simulation framework in combination with the diagnosis tool, as well as further development of both tools.</para>
</section>
<section class="lev1">
<title>Acknowledgments</title>
<para>The research was supported by ECSEL JU under the project H2020 826060 AI4DI - Artificial Intelligence for Digitising Industry. AI4DI is funded by the Austrian Federal Ministry of Transport, Innovation and Technology (BMVIT) under the program "ICT of the Future" between May 2019 and April 2022. More information can be retrieved from <ulink url="https://iktderzukunft.at/en/">https://iktderzukunft.at/en/</ulink>.</para>
<para><graphic xlink:href="graphics/ch9-alo02.jpg"/></para>
</section>
<section class="lev1" id="ch9-Ref">
<title>References</title>
<para id="ch9-bib1">[1] R. Davis, H. Shrobe, W. Hamscher, K. Wieckert, M. Shirley, and S. Polit. Diagnosis based on structure and function. In <emphasis>Proceedings AAAI</emphasis>, pages 137&#x2013;142, Pittsburgh, August 1982. AAAI Press.</para>
<para id="ch9-bib2">[2] R. Davis. Diagnostic reasoning based on structure and behavior. <emphasis>Artificial Intelligence</emphasis>, 24:347&#x2013;410, 1984.</para>
<para id="ch9-bib3">[3] D. Kaufmann, I. Nica, and F. Wotawa. Intelligent agents diagnostics - enhancing cyber-physical systems with self-diagnostic capabilities. <emphasis>Adv. Intell. Syst.</emphasis>, 3(5):2000218, 2021.</para>
<para id="ch9-bib4">[4] N. Muscettola, P. Pandurang Nayak, B. Pell, and B. C. Williams. Remote agent: to boldly go where no ai system has gone before. <emphasis>Artificial Intelligence</emphasis>, 103(1):5&#x2013;47, 1998. Artificial Intelligence 40 years later.</para>
<para id="ch9-bib5">[5] R. Reiter. A theory of diagnosis from first principles. <emphasis>Artificial Intelligence</emphasis>, 32(1):57&#x2013;95, 1987.</para>
<para id="ch9-bib6">[6] F. Wotawa. Reasoning from first principles for self-adaptive and autonomous systems. In E. Lughofer and M. Sayed-Mouchaweh, editors, <emphasis>Predictive Maintenance in Dynamic Systems &#x2013; Advanced Methods, Decision Support Tools and Real-World Applications</emphasis>. Springer, 2019.</para>
<para id="ch9-bib7">[7] F. Wotawa. Using model-based reasoning for self-adaptive control of smart battery systems. In Moamar Sayed-Mouchaweh, editor, <emphasis>Artificial Intelligence Techniques for a Scalable Energy Transition &#x2013; Advanced Methods, Digital Technologies, Decision Support Tools, and Applications</emphasis>. Springer, 2020.</para>
<para id="ch9-bib8">[8] F. Wotawa and D. Kaufmann. Model-based reasoning using answer set programming. <emphasis>Applied Intelligence</emphasis>, 2022.</para>
<para id="ch9-bib9">[9] F. Wotawa, O. A. Tazl, and D. Kaufmann. Automated diagnosis of cyber-physical systems. In <emphasis>IEA/AIE (2)</emphasis>, volume 12799 of <emphasis>Lecture Notes in Computer Science</emphasis>, pages 441&#x2013;452. Springer, 2021.</para>
</section>
<para><sup>1</sup>see <ulink url="https://openmodelica.org">https://openmodelica.org</ulink></para>
<para><sup>2</sup>We are using Prolog syntax because recent solvers like Clingo (see <ulink url="https://potassco.org/clingo/">https://potassco.org/clingo/</ulink>) are relying on it.</para>
<para><sup>3</sup>see <ulink url="https://fmi-standard.org">https://fmi-standard.org</ulink></para>
<para><sup>4</sup>see <ulink url="https://potassco.org/clingo/">https://potassco.org/clingo/</ulink></para>
<para><sup>5</sup>see <ulink url="https://de.mathworks.com/products/matlab.html">https://de.mathworks.com/products/matlab.html</ulink></para>
<para><sup>6</sup>see <ulink url="https://github.com/INTO-CPS-Association/unifmu">https://github.com/INTO-CPS-Association/unifmu</ulink></para>
</chapter>
<chapter class="chapter" id="ch10" label="10" xreflabel="10">
<title>Deploying a Convolutional Neural Network on Edge MCU and Neuromorphic Hardware Platforms</title>
<subtitle>Simon Narduzzi<sup>1</sup>, Dorvan Favre<sup>1,2</sup>, Nuria Pazos Escudero<sup>2</sup> and L. Andrea Dunbar<sup>1</sup></subtitle>
<affiliation><sup>1</sup>CSEM, Switzerland<?lb?><sup>2</sup>HE-Arc, Switzerland</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>The rapid development of embedded technologies in recent decades has led to the advent of dedicated inference platforms for deep learning. However, unlike the development libraries for the algorithms, hardware deployment is highly fragmented in technology, tools, and usability. Moreover, emerging paradigms such as spiking neural networks do not use the same prediction process, making comparisons between platforms difficult. In this paper, we deploy a convolutional neural network model on different platforms, comprising microcontrollers with and without deep learning accelerators as well as an event-based accelerator, and compare their performance. We also report the perceived effort of deployment for each platform.</para>
<para><emphasis role="strong">Keywords:</emphasis> neuromorphic computing, IoT, kendryte, DynapCNN, STM32, performance, comparison, benchmark</para>
</section>
<section class="lev1" id="ch10-1">
<title>10.1 Introduction</title>
<para>Edge computing is a key tool in harnessing the possibilities of artificial intelligence. Some advantages of edge over cloud processing are low latency, allowing real-time applications, and connectivity independence, i.e., no need for infrastructure and no transmission of sensitive data, allowing improved security and privacy-preserving applications. However, perhaps the most important and as yet untapped potential of edge computing lies in its low-power possibilities. Low power allows always-on IoT devices for seamlessly integrated intelligent systems. Creating edge-based IoT devices often means coping with limited hardware resources, both in terms of power and on-device memory. Today&#x2019;s intelligence is mainly based on Deep Learning (DL) networks, which are power and memory hungry. This conflict has resulted in several emerging technologies and platforms for performing efficient inference at the edge.</para>
<para>Established companies have targeted the IoT device space by creating ultra-low-power processors (Intel Loihi, STM32 Cortex-M4), but there are also several other innovative platforms, such as DynapCNN [<link linkend="ch10-bib1">1</link>] and Kendryte K210 [<link linkend="ch10-bib2">2</link>], specialized for deep neural network inference within a very small power budget. The specialized nature and variety of products and platforms require platform-specific software tools, making the deployment of one model on several platforms cumbersome and creating a barrier to technology adoption. Moreover, the lack of hardware standardization, coupled with the necessary customization of the software, makes it difficult to compare, and thus choose, the best technology.</para>
<para>To remove this barrier, it is essential to make the platforms accessible to non-hardware experts. Indeed, the success of DL is essentially linked to the acceleration provided by graphical processing units (GPUs), yet only a very small proportion of users have mastered the CUDA programming language used by the majority of GPUs. In most DL libraries, the necessary resources can be mobilized with a single command, without the user having to understand the technology behind it. A similar single instruction would empower data scientists when porting models to edge devices.</para>
<para>In this short paper, we give a brief summary of works that address the challenges of implementing DL on different hardware platforms. First, we present our results on the deployment of a basic neural network on edge devices, and then we compare the performance of the 3 selected devices. Finally, we describe the lessons learned and present solutions to facilitate the deployment of such models in the future.</para>
</section>
<section class="lev1" id="ch10-2">
<title>10.2 Related Work</title>
<para>Benchmarking low-resource platforms is a necessary process for selecting the best platform on which to embed algorithms. It is a tricky procedure, as the performance of a platform depends on several aspects: the available memory and processing units, the technology of the hardware, and the frameworks and tools used during the deployment of the models to be benchmarked. To harmonize performance assessment, benchmarking suites such as TinyMLPerf [<link linkend="ch10-bib3">3</link>] have been created. Recently, a benchmarking suite has also been developed for event-based neuromorphic hardware [<link linkend="ch10-bib4">4</link>]. However, both of these solutions still need manual adaptation of the code to run on new platforms. While benchmarking gives good insights into which platform to select and why, the question remains of how to use the benchmarking tools themselves. Each platform comes with its own SDK, conversion tools, and usage constraints, which in turn limits the possibility of comparing the platforms with each other.</para>
<para>Today, many benchmarks are therefore performed on just a few hardware platforms and compare only a single use case, as alternatives are more cumbersome. Furthermore, it is easier to benchmark and compare platforms from the same manufacturer, as the deployment pipelines are usually similar between devices. In this regard, the standard architectures LeNet-5 and ResNet-20 have been benchmarked on a few STM32 boards [<link linkend="ch10-bib5">5</link>]. Machine learning algorithms have also been compared on Cortex-M processors [<link linkend="ch10-bib6">6</link>] [<link linkend="ch10-bib7">7</link>]. Some efforts at cross-manufacturer benchmarking have also been made. For example, a recent work deployed a gesture recognition and wake-up-word application on an Arduino Nano BLE and an STM32 NUCLEO-F401RE [<link linkend="ch10-bib8">8</link>] using a convolutional neural network.</para>
<para>While the above research focuses on the established STM32 Cortex-M-based MCUs, some emerging processors have also been explored [<link linkend="ch10-bib9">9</link>], but research in this domain remains scarce. Furthermore, the deployment pipelines are not documented, which limits the reproducibility of the results. In our research, we deploy a single neural network on three different platforms and observe their performance. We also highlight the differences between the deployment pipelines of each manufacturer, and we perform a qualitative study of the ease of deployment on each system.</para>
</section>
<section class="lev1" id="ch10-3">
<title>10.3 Methods</title>
<para>In this section, we present the selected task and associated experimental setup, and a method to evaluate the effort of the deployment.</para>
<section class="lev2" id="ch10-3-1">
<title>10.3.1 Neural Network Deployment</title>
<para>In our experiment, we use three different boards. We select boards from different manufacturers to illustrate the large variety of tools and processors available in edge devices today. These sample devices are a very small subset of that variety, but they show that even with only three different board manufacturers, extensive adaptation of the deployment pipeline is necessary. The three devices selected for our experiments are the following: a Kendryte K210 from Canaan, a dual-core RISC-V processor with floating-point units; an STM32L4R9 from STMicroelectronics (ST), with an ARM Cortex-M4 core that also includes a floating-point unit; and the SynSense DynapCNN, an event-based processor. <link linkend="ch10-T1">Table 10.1</link> summarizes the major differences between these platforms.</para>
<section class="lev3" id="ch10-3-1-1">
<title>10.3.1.1 Task and Model</title>
<para>We tested the selected platforms on a simple LeNet-5 [<link linkend="ch10-bib10">10</link>] network trained on MNIST, whose architecture is displayed in <link linkend="ch10-F1">Figure 10.1</link>. This architecture, composed of convolutional layers, average pooling, and dense layers, is compatible with all selected platforms. The architecture was trained for <math id="Ch10.S3.SS1.SSSx1.p1.m1" display="inline"><mn>30</mn></math> epochs with a learning rate of <math id="Ch10.S3.SS1.SSSx1.p1.m2" display="inline"><mrow><mrow><mn>1</mn><mo>&#x2062;</mo><mi>e</mi></mrow><mo>-</mo><mn>4</mn></mrow></math>. Tensorflow <math id="Ch10.S3.SS1.SSSx1.p1.m3" display="inline"><mn>2.9.1</mn></math> was used to define the H5 model running on the Sipeed and ST boards, while PyTorch <math id="Ch10.S3.SS1.SSSx1.p1.m4" display="inline"><mn>1.11.0</mn></math> was used for the DynapCNN. Unfortunately, our efforts to transfer the weights from the Tensorflow model to the PyTorch model failed, and we had to train the models separately. The Keras and PyTorch models reached an accuracy of <math id="Ch10.S3.SS1.SSSx1.p1.m5" display="inline"><mrow><mn>99.44</mn><mo>%</mo></mrow></math> and <math id="Ch10.S3.SS1.SSSx1.p1.m6" display="inline"><mrow><mn>99.38</mn><mo>%</mo></mrow></math> on the training set, respectively. We perform inference on the first <math id="Ch10.S3.SS1.SSSx1.p1.m7" display="inline"><mn>1000</mn></math> images of the test dataset.</para>
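<para>As a rough indication of the memory footprint relevant for edge deployment, the parameter count of LeNet-5 can be computed by hand. The sketch below assumes the textbook LeNet-5 dimensions (32&#xd7;32 input, two 5&#xd7;5 convolutions with 6 and 16 filters, dense layers of sizes 400, 120, 84, and 10); the exact variant deployed here may differ slightly.</para>

```python
def conv_params(in_ch, out_ch, k):
    # weights: out_ch * in_ch * k * k, plus one bias per output channel
    return out_ch * (in_ch * k * k + 1)

def dense_params(n_in, n_out):
    # weights: n_in * n_out, plus one bias per output unit
    return n_out * (n_in + 1)

# Classic LeNet-5 dimensions (assumed; the deployed variant may differ):
lenet5 = (
    conv_params(1, 6, 5)             # C1: 1x32x32 -> 6x28x28
    + conv_params(6, 16, 5)          # C3: 6x14x14 -> 16x10x10 (after pooling)
    + dense_params(16 * 5 * 5, 120)  # flattened 400 -> 120
    + dense_params(120, 84)
    + dense_params(84, 10)
)
print(lenet5)  # 61706 parameters, i.e. roughly 241 KB in float32
```

<para>Even this small network approaches the on-chip memory budget of some microcontrollers when kept in 32-bit floating point, which is one motivation for the quantization discussed later.</para>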
<fig id="ch10-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 10.1:</emphasis> Illustration of LeNet-5 architecture.</para></caption>
<graphic xlink:href="graphics/ch10-fig01.jpg"/>
</fig>
</section>
<section class="lev3" id="ch10-3-1-2">
<title>10.3.1.2 Experimental Setup</title>
<para>For each platform, we used the latest tools available at the time of writing.</para>
<para><emphasis role="strong">Kendryte K210</emphasis></para>
<para>The Kendryte K210 is used with the Sipeed MaixDock M1. The neural networks embedded in this device were converted from the Keras H5 file format, using Tensorflow <math id="Ch10.S3.SS1.SSSx2.Px1.p1.m1" display="inline"><mn>2.9.1</mn></math> and the associated TFLite. The firmware version of the Kendryte is <math id="Ch10.S3.SS1.SSSx2.Px1.p1.m2" display="inline"><mn>0.6.2</mn></math>, and the version of the NNCase package used for conversion is <math id="Ch10.S3.SS1.SSSx2.Px1.p1.m3" display="inline"><mn>0.2</mn></math>.</para>
<para><emphasis role="strong">STM32L4R9</emphasis></para>
<para>The STM32L4R9 board with an Arm Cortex-M4 core processor from ST is programmed in C. Due to the complexity of hardware initialization, ST provides a tool, STM32CubeMX <math id="Ch10.S3.SS1.SSSx2.Px2.p1.m1" display="inline"><mn>6.5.0</mn></math>, which automatically generates an initial C project for a specific board. The tool X-CUBE-AI <math id="Ch10.S3.SS1.SSSx2.Px2.p1.m2" display="inline"><mn>7.1.0</mn></math> converts TFLite models into C files which are, alongside the X-CUBE-AI inference library, added to the project. The Keras H5 file network is converted to TFLite format using Tensorflow <math id="Ch10.S3.SS1.SSSx2.Px2.p1.m3" display="inline"><mn>2.8.2</mn></math> and Python <math id="Ch10.S3.SS1.SSSx2.Px2.p1.m4" display="inline"><mn>3.6</mn></math>. Gcc-arm-none-eabi <math id="Ch10.S3.SS1.SSSx2.Px2.p1.m5" display="inline"><mrow><mn>15</mn><mo>&#x2062;</mo><mtext>:</mtext><mo>&#x2062;</mo><mn>10.3</mn><mo>&#x2062;</mo><mtext>-</mtext><mo>&#x2062;</mo><mn>2021.07</mn><mo>&#x2062;</mo><mtext>-</mtext><mo>&#x2062;</mo><mn>4</mn></mrow></math> and Make <math id="Ch10.S3.SS1.SSSx2.Px2.p1.m6" display="inline"><mn>4.2.1</mn></math> are used to compile the whole project, and STM32CubeProgrammer <math id="Ch10.S3.SS1.SSSx2.Px2.p1.m7" display="inline"><mn>2.10.0</mn></math> is used to upload the binaries on the device.</para>
<para><emphasis role="strong">DynapCNN</emphasis></para>
<para>The SynSense DynapCNN processor was programmed using Python <math id="Ch10.S3.SS1.SSSx2.Px3.p1.m1" display="inline"><mn>3.7.13</mn></math> with PyTorch <math id="Ch10.S3.SS1.SSSx2.Px3.p1.m2" display="inline"><mn>1.11.0</mn></math>, Sinabs <math id="Ch10.S3.SS1.SSSx2.Px3.p1.m3" display="inline"><mn>0.3.3</mn></math> (and underlying Sinabs-DynapCNN <math id="Ch10.S3.SS1.SSSx2.Px3.p1.m4" display="inline"><mrow><mn>0.3.1</mn><mo>.</mo><mrow><mtext>dev</mtext><mo>&#x2062;</mo><mn>3</mn></mrow></mrow></math>), and Samna <math id="Ch10.S3.SS1.SSSx2.Px3.p1.m5" display="inline"><mn>0.14.33.0</mn></math> libraries. The neural network is written in PyTorch and converted to a spiking version using Sinabs, while Samna is used to map the network to the hardware. The inputs are presented to the network using a preprocessing function that generates spikes<sup>1</sup> from random sampling of the image, using the following function, where <emphasis role="strong">tWindow</emphasis> is the duration of the spiking frame and <emphasis role="strong">img</emphasis> has shape <emphasis role="strong">[channels, width, height]</emphasis>:</para>
<para><graphic xlink:href="graphics/ch10-alo01.jpg"/></para>
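<para>The graphic above reproduces the actual preprocessing routine as an image. For readability, the following is a hypothetical pure-Python sketch of the described idea: at every timestep of the spiking frame, each pixel of the normalized image fires a spike with probability equal to its intensity. The names (<emphasis>img_to_spikes</emphasis>, <emphasis>t_window</emphasis>) and the event layout are illustrative, not the Sinabs/Samna API.</para>

```python
import random

def img_to_spikes(img, t_window, seed=0):
    """Sketch: Bernoulli-sample each pixel at every timestep.
    img is a nested list [channels][width][height] with values in [0, 1];
    returns a list of (t, channel, x, y) spike events."""
    rng = random.Random(seed)
    events = []
    for t in range(t_window):
        for c, channel in enumerate(img):
            for x, column in enumerate(channel):
                for y, intensity in enumerate(column):
                    if rng.random() < intensity:  # brighter pixels fire more often
                        events.append((t, c, x, y))
    return events

# A fully-on pixel fires at every timestep; a black pixel never fires.
events = img_to_spikes([[[1.0, 0.0]]], t_window=100)
print(len(events))  # 100
```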
<para>During our simulations, we found <math id="Ch10.S3.SS1.SSSx2.Px3.p3.m1" display="inline"><mn>100</mn></math> timesteps to be sufficient for the spiking version to reach an accuracy equivalent to the non-spiking version on MNIST.</para>
<fig id="ch10-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 10.1:</emphasis> Relevant technical specifications of the devices (from manufacturer websites).</para></caption>
<graphic xlink:href="graphics/ch10-tab01.jpg"/>
</fig>
</section>
<section class="lev3" id="ch10-3-1-3">
<title>10.3.1.3 Deployment</title>
<para>For the standalone platforms, the network was converted and uploaded to the platform. For the Kendryte, the inference script was written such that the model is loaded at the beginning of the script and images are processed one by one: the images are transmitted via serial communication and classified by the inference script. In X-CUBE-AI, this is handled automatically, while the Kendryte requires a script that sends batches of images and collects the predictions. For the DynapCNN, the images are predicted by sending the corresponding events to the device and reading the output events from the buffer of the board.</para>
<fig id="ch10-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 10.2:</emphasis> Deployment pipelines for all platforms. From left to right: STM32L4R9, Kendryte K210, and DynapCNN. For the DynapCNN, the pipeline is contained in a single Python script, while the others rely on external languages and tools.</para></caption>
<graphic xlink:href="graphics/ch10-fig02.jpg"/>
</fig>
<para>The prediction time is provided automatically by the X-CUBE-AI platform, while the Kendryte requires timing the prediction manually: in the MicroPython script used for inference on the Kendryte, we wrapped the line performing the inference with a timer. For the DynapCNN, the reported times correspond to the timestamps of the first and the final output event, respectively. Both times are averaged over the test samples. The computation of the key performance indicators (accuracy, mean time) is performed offline. <link linkend="ch10-F2">Figure 10.2</link> illustrates the pipelines for all platforms.</para>
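<para>The manual timing used on the Kendryte can be sketched as follows. This is a generic Python illustration; the actual MicroPython script uses the board's tick counters, and <emphasis>predict</emphasis> stands in for the platform's inference call.</para>

```python
import time

def timed_inference(predict, sample):
    """Wrap a single inference call with a high-resolution timer
    and return the prediction together with the elapsed time in ms."""
    start = time.perf_counter()
    output = predict(sample)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return output, elapsed_ms

# Example with a dummy stand-in for the model:
out, ms = timed_inference(lambda x: x * 2, 21)
print(out, ms >= 0)  # 42 True
```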
</section>
</section>
<section class="lev2" id="ch10-3-2">
<title>10.3.2 Measuring the Ease of Deployment</title>
<para>One of the major criteria for the adoption of a product is its ease of use, meaning how autonomously a user can operate the device. This depends heavily on the user's skills, but also on the quality of the documentation. For embedded machine learning, the documentation should explicitly describe the procedure to deploy a model once the user receives the new platform. We have identified <math id="Ch10.S3.SS2.p1.m1" display="inline"><mn>5</mn></math> different phases that are required when using a microcontroller product for AI acceleration.</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para><emphasis role="strong">Acquisition (A)</emphasis>: this phase comprises the effort needed to place an order for the device and the time necessary to ship it. A small effort would correspond to ordering the platform from a website and receiving it within a week. A large effort requires contacting the company by phone or email and waiting two months to receive the device.</para></listitem>
<listitem><para><emphasis role="strong">Setup (S)</emphasis>: this phase comprises the effort needed to install the required environment. A small effort would require installing a Python package from pip or an executable available from the manufacturer's website. A large effort requires installing multiple packages whose versions depend on the firmware of the device or on the versions of the Python packages used to train the model, as well as on external tools.</para></listitem>
<listitem><para><emphasis role="strong">Getting started (G)</emphasis>: this phase comprises the effort needed to replicate the examples given in the documentation. A small effort would correspond to completing a full deployment example within one hour. A large effort would require support from the manufacturer.</para></listitem>
<listitem><para><emphasis role="strong">Model preparation (M)</emphasis>: this phase comprises the effort needed to convert a PyTorch/Tensorflow model to the proprietary format of the device. A small effort would correspond to a single command line with arguments. A large effort corresponds to manually writing the neural network in the proprietary format and transferring the weights, with limited help from the conversion tool, or requiring intervention from the manufacturer.</para></listitem>
<listitem><para><emphasis role="strong">Inference (I)</emphasis>: this phase comprises the effort needed to perform inference once the model is embedded on the device. A small effort would correspond to a single command line or instruction to perform inference; a medium effort requires writing an inference script and deploying it manually on the hardware platform. A large effort would require intervention from the manufacturer.</para></listitem>
</itemizedlist>
<para>Each phase is assigned a number between <math id="Ch10.S3.SS2.p3.m1" display="inline"><mn>1</mn></math> and <math id="Ch10.S3.SS2.p3.m2" display="inline"><mn>5</mn></math>. The total score represents the complexity of deployment: a low value (<math id="Ch10.S3.SS2.p3.m3" display="inline"><mn>5</mn></math>) corresponds to a small effort necessary to deploy a model on a never-used platform, while <math id="Ch10.S3.SS2.p3.m4" display="inline"><mn>25</mn></math> corresponds to a large effort.</para>
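<para>The scoring scheme above can be written down as a small helper. This is purely illustrative; the per-phase ratings themselves remain subjective judgments made by the team.</para>

```python
PHASES = ("A", "S", "G", "M", "I")  # Acquisition, Setup, Getting started,
                                    # Model preparation, Inference

def deployment_effort(ratings):
    """Sum the per-phase ratings (1 = small effort, 5 = large effort).
    The total ranges from 5 (easy deployment) to 25 (large effort)."""
    assert set(ratings) == set(PHASES), "one rating per phase"
    assert all(1 <= r <= 5 for r in ratings.values())
    return sum(ratings.values())

# Hypothetical example: easy to buy and set up, hard model preparation.
print(deployment_effort({"A": 1, "S": 2, "G": 2, "M": 5, "I": 3}))  # 13
```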
</section>
</section>
<section class="lev1" id="ch10-4">
<title>10.4 Results</title>
<para>In this section, we present the results and metrics recorded for each platform, and the effort perceived by the team to perform the experiments.</para>
<section class="lev2" id="ch10-4-1">
<title>10.4.1 Inference Results</title>
<para>The models were successfully deployed on all platforms. <link linkend="ch10-T2">Table 10.2</link> summarizes the results on the first <math id="Ch10.S4.SS1.p1.m1" display="inline"><mn>1000</mn></math> samples of the MNIST test dataset. It can be observed that the balanced accuracy is not homogeneous across the platforms. This difference is most likely caused by the different transformations affecting the models during deployment (conversion). While we initially intended to deploy both full-precision models and a quantized version of them, we only had time to deploy the latter on the ST platform. The evaluation of quantization-aware trained models, and of the DynapCNN and Kendryte K210 using integer weights, is left for future work. The models run faster when using 8-bit integer precision on the STM32 (even though the platform is designed to compute 32-bit floats). The Kendryte K210 is the fastest to compute synchronous frames, while the DynapCNN is the fastest to provide a result in 32-bit precision, with <math id="Ch10.S4.SS1.p1.m2" display="inline"><mrow><mn>98.79</mn><mo>%</mo></mrow></math> accuracy using only the first spike<sup>2</sup>. Unfortunately, only the DynapCNN provides an estimation of energy consumption, obtained with Sinabs by computing the average number of synaptic operations over the course of the simulations. All metrics are averaged over the test partition.</para>
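<para>The balanced accuracy reported here is the per-class recall averaged over the classes. A minimal sketch, which also illustrates how samples without any output spike are excluded, could look as follows; using <emphasis>None</emphasis> to mark a sample with no prediction is our illustrative convention, not a library API.</para>

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Average of per-class recalls; samples whose prediction is None
    (i.e. no output spike was produced) are dropped beforehand."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if p is None:          # no spike emitted: drop the sample
            continue
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

print(balanced_accuracy([0, 0, 1, 1], [0, 1, 1, None]))  # (0.5 + 1.0) / 2 = 0.75
```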
<fig id="ch10-T2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 10.2:</emphasis> Results on MNIST dataset for all platforms. For the DynapCNN, we report the accuracy and latency for the first spike prediction and over the entire simulation.</para></caption>
<graphic xlink:href="graphics/ch10-tab02.jpg"/>
</fig>
</section>
<section class="lev2" id="ch10-4-2">
<title>10.4.2 Perceived Effort</title>
<fig id="ch10-T3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 10.3:</emphasis> Perceived effort for each stage of the inference. 1: small, 5: large.</para></caption>
<graphic xlink:href="graphics/ch10-tab03.jpg"/>
</fig>
<para><link linkend="ch10-T3">Table 10.3</link> summarizes the team's perceived effort for each of these phases in a qualitative manner. We observe a high variation in the effort perceived for each platform. The model preparation phase appears to be critical: on all platforms, it is perceived as requiring a great effort. The Kendryte K210 and STM32L4R9 require the most human intervention to build a complete deployment pipeline, while the deployment pipeline of the DynapCNN is automated.</para>
</section>
</section>
<section class="lev1" id="ch10-5">
<title>10.5 Conclusion</title>
<para>Although the development of embedded machine learning holds great promise, the lack of consistency and standardization across devices makes development extremely platform-dependent. Deploying a model on these devices requires the use of low-level tools, such as the C language, whereas most models are developed using high-level Python-based tools. The deployment process therefore requires adapting the model from Python to C, which is time-consuming and prone to errors and artifacts in the final implementation. Platform providers are aware of this problem and have started putting effort into facilitating deployment by providing automated tools and interfaces with DL frameworks. Specifically, for the platforms used in these experiments, Sipeed has ported MicroPython to the Maix Dock, allowing developers to write code close to that used to train the model; SynSense provides a library that allows interaction with the DynapCNN directly from a Python script and simulation of the model before deployment, to get a quick idea of performance. Finally, the well-established STMicroelectronics provides the X-CUBE-AI tool, which, in addition to analyzing the model before deployment, offers the possibility of validating the model on the target and retrieving relevant metrics without writing a single line of code.</para>
<para>However, these tools are recent and standards are not yet established. To promote and accelerate the development of machine learning on embedded devices, it is necessary to provide standardized tools accessible to model developers, requiring minimal knowledge of the platform. This will increase the adoption of these technologies. Some points seem essential to facilitate the adoption of low-power technologies, in particular:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>Up-to-date documentation: documents specifying platform schematics, APIs and dependencies on external tools must be carefully maintained.</para></listitem>
<listitem><para>The documentation should contain examples for each API call.</para></listitem>
<listitem><para>Model conversion tools should be compatible with most deep learning libraries (Tensorflow and PyTorch) and should detail which version and which operations (layers) are supported by each version of the tool. Ideally, conversion tools should be based on community standards, such as the ONNX format.</para></listitem>
<listitem><para>Model conversion tools should be automated and provide understandable warnings and error messages.</para></listitem>
</itemizedlist>
<para>To reduce the entry barrier of these low-power platforms for developers of deep learning models, the following interfaces would be beneficial:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>A hardware simulation interface, in order to obtain a quick feedback on the feasibility of deploying the model on the platform, and to provide an interpretable error in case of memory exhaustion or unsupported layer.</para></listitem>
<listitem><para>An evaluation of the key performance indicators relevant for edge computing, such as memory consumption, model speed (number of cycles per inference) and energy used during inference.</para></listitem>
</itemizedlist>
<para>These interfaces will enable rapid prototyping and comparison of models for the Edge, while providing a solid foundation for iterating and developing new inference techniques.</para>
</section>
<section class="lev1">
<title>Acknowledgements</title>
<para>This work is supported through the project ANDANTE. ANDANTE has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 876925. The JU receives support from the European Union&#x2019;s Horizon 2020 research and innovation programme and France, Belgium, Germany, Netherlands, Portugal, Spain, Switzerland. The authors are responsible for the content of this publication.</para>
</section>
<section class="lev1" id="ch10-Ref">
<title>References</title>
<para id="ch10-bib1">[1] Q. Liu, O. Richter, C. Nielsen, S. Sheik, G. Indiveri, and N. Qiao. Live demonstration: face recognition on an ultra-low power event-driven convolutional neural network asic. In <emphasis>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops</emphasis>, pages 0&#x2013;0, 2019.</para>
<para id="ch10-bib2">[2] Canaan website. Kendryte K210 description page, 2022.</para>
<para id="ch10-bib3">[3] C. R. Banbury, V. J. Reddi, M. Lam, W. Fu, A. Fazel, J. Holleman, X. Huang, R. Hurtado, D. Kanter, A. Lokhmotov, et al. Benchmarking tinyml systems: Challenges and direction. <emphasis>arXiv preprint arXiv:2003.04821</emphasis>, 2020.</para>
<para id="ch10-bib4">[4] C. Ostrau, C. Klarhorst, M. Thies, and U. R&#xfc;ckert. Benchmarking of neuromorphic hardware systems. In <emphasis>Proceedings of the Neuro-inspired Computational Elements Workshop</emphasis>, pages 1&#x2013;4, 2020.</para>
<para id="ch10-bib5">[5] L. Heim, A. Biri, Z. Qu, and L. Thiele. Measuring what really matters: Optimizing neural networks for tinyml. <emphasis>arXiv preprint arXiv:2104.10645</emphasis>, 2021.</para>
<para id="ch10-bib6">[6] V. Falbo, T. Apicella, D. Aurioso, L. Danese, F. Bellotti, R. Berta, and A. D. Gloria. Analyzing machine learning on mainstream microcontrollers. In <emphasis>International Conference on Applications in Electronics Pervading Industry, Environment and Society</emphasis>, pages 103&#x2013;108. Springer, 2019.</para>
<para id="ch10-bib7">[7] R. Sanchez-Iborra and A. F. Skarmeta. Tinyml-enabled frugal smart objects: Challenges and opportunities. <emphasis>IEEE Circuits and Systems Magazine</emphasis>, 20(3):4&#x2013;18, 2020.</para>
<para id="ch10-bib8">[8] A. Osman, U. Abid, L. Gemma, M. Perotto, and D. Brunelli. Tinyml platforms benchmarking. In <emphasis>International Conference on Applications in Electronics Pervading Industry, Environment and Society</emphasis>, pages 139&#x2013;148. Springer, 2022.</para>
<para id="ch10-bib9">[9] M. de Prado, M. Rusci, A. Capotondi, R. Donze, L. Benini, and N. Pazos. Robustifying the deployment of tinyml models for autonomous mini-vehicles. <emphasis>Sensors</emphasis>, 21(4):1339, 2021.</para>
<para id="ch10-bib10">[10] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. <emphasis>Proceedings of the IEEE</emphasis>, 86(11):2278&#x2013;2324, 1998.</para>
</section>
<para><sup>1</sup>Spikes are binary events (on or off) distributed in input space and time.</para>
<para><sup>2</sup>Some samples (with indices [18, 247, 493, 495, 717, 894, 904, 947] in the test set) did not produce any spikes, for an unknown reason. In that case, we removed the associated labels and computed the balanced accuracy on the 992 remaining samples.</para>
</chapter>
<chapter class="chapter" id="ch11" label="11" xreflabel="1">
<title>Efficient Edge Deployment Demonstrated on YOLOv5 and Coral Edge TPU</title>
<subtitle>Ruben Prokscha, Mathias Schneider, and Alfred H&#x00F6;&#x00DF;</subtitle>
<affiliation>Ostbayerische Technische Hochschule Amberg-Weiden, Germany</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>The recent advancements towards Artificial Intelligence (AI) at the edge resonate with an impression of a dichotomy between resource-intensive, highly abstracted Machine Learning (ML) research and strongly optimized, low-level embedded design. Overcoming such opposing mindsets is imperative for enabling desirable future scenarios such as autonomous driving and smart cities. Edge AI must combine straightforward, streamlined deployment with resource-efficient execution to achieve general acceptance. This research aims to exemplify how such an endeavour could be realized, utilizing a novel low-power AI accelerator together with a state-of-the-art object detection algorithm. Different considerations regarding model structure and efficient hardware acceleration are presented for deploying Deep Learning (DL) applications in resource-restricted environments while maintaining the comfort of operating at a high degree of abstraction. The goal is to demonstrate what is possible in the field of edge AI once software and hardware are optimally matched.</para>
<para><emphasis role="strong">Keywords:</emphasis> edge AI, object detection, deep learning, YOLO, embedded systems, tensor processing unit.</para>
</section>
<section class="lev1" id="ch11-1">
<title>11.1 Introduction</title>
<para>With AI shifting from a pure research subject towards end-user applications, the issue of efficient deployment moves into focus. ML workloads are decidedly different from average computing tasks; hence, GPUs have been the common solution for such undertakings. Realizing mobile intelligent appliances requires even more specialized, low-power accelerators which can be integrated into embedded environments. Such edge solutions have attracted increasing interest in recent years. The European Strategic Research and Innovation Agenda (SRIA) [<link linkend="ch11-bib1">1</link>] concretizes the term even further by introducing the notions of Micro-, Deep- and Meta-edge. Several different solutions are available which target this new frontier. Most prominent are the NVIDIA Jetson family, which utilizes optimized embedded GPUs; the Intel Neural Compute Stick 2, which comprises a specialized Vision Processing Unit (VPU); and the Google Coral edge Tensor Processing Unit (TPU), which is the focus of this work. As such, its impact on related research is presented in the following section. The task of object detection was chosen as part of the experimental test setup for evaluating the accelerator. You Only Look Once (YOLO) version 5 [<link linkend="ch11-bib2">2</link>] serves as a delegate for this class of networks in the upcoming section. We evaluate how models can be modified to exploit edge TPU characteristics, and we show how this optimized solution compares to models provided by Google. With a focus on deployment, a lightweight software stack is introduced which enables efficient AI solutions without sacrificing high-level development. Finally, a conclusion is provided, giving a synopsis of the key findings and offering points of interest for future work.</para>
</section>
<section class="lev1" id="ch11-2">
<title>11.2 Related Work</title>
<para>In recent years, the usage of decentralized AI at the edge has become a progressively relevant research topic. Besides GPU acceleration, the energy-efficient edge TPU has been of special interest to researchers. For applications with strict power or battery limitations, such as in the area of UAVs, the usage of the edge TPU has been evaluated in recent work. Applications comprise indoor person-following systems [<link linkend="ch11-bib3">3</link>], vision-based trash and litter detection [<link linkend="ch11-bib4">4</link>], and lightweight odometry estimation [<link linkend="ch11-bib5">5</link>]. Using a U-Net network architecture, Roesler et al. leverage their edge AI setup combining the edge accelerator with a STM32MP157C-DK2 board for yield estimation of grapes in an agricultural use case [<link linkend="ch11-bib6">6</link>]. Other application domains are explored as well, e.g., in [<link linkend="ch11-bib7">7</link>], which utilizes the edge TPU to process time-series data to determine remaining useful life. Since Recurrent Neural Networks (RNNs) were not yet supported by the accelerator at that time, their model architecture employs a deep Convolutional Neural Network (CNN). It is worth mentioning that their experiments included measurements for models using quantization-aware training as well as post-training quantization, which outperformed reference CPU and GPU deployments in terms of latency and accuracy. The authors in [<link linkend="ch11-bib8">8</link>] examine the potential of the edge TPU for detecting network intrusions to ensure security at the edge, using feed-forward and CNN architectures. They elaborate their classification scores on a public benchmark dataset and further investigate the energy efficiency of their DL algorithms in comparison to traditional CPU processing. Their studies on the effects of larger model sizes reveal a bimodal behaviour of the edge accelerator, indicating a decline of the energy-efficiency ratio as soon as a certain model size is exceeded. This finding is the focus of their subsequent work and is confirmed by more refined experiments [<link linkend="ch11-bib9">9</link>].</para>
<para>Besides this applied research utilizing the edge accelerator for dedicated applications, more theoretical research has been conducted to explore and demarcate TPU capabilities. Several benchmarks have been performed to determine its performance empirically, using setups that differ in the models under test, the obtained metrics, or the compared edge devices [<link linkend="ch11-bib10">10</link>, <link linkend="ch11-bib11">11</link>, <link linkend="ch11-bib12">12</link>]. Providing micro-architectural insights, the Google researchers Yazdanbakhsh et al. elaborate an extensive evaluation covering different structures in CNNs and their effects on latency and energy consumption [<link linkend="ch11-bib13">13</link>]. With a similar level of hardware detail, the authors in [<link linkend="ch11-bib14">14</link>] analysed the inference of 24 Google edge models, revealing major shortcomings of the edge TPU architecture which must be taken into account for efficient deployment. Furthermore, they incorporate the results into Mensa, their framework for heterogeneous edge ML accelerators, improving edge TPU performance significantly.</para>
</section>
<section class="lev1" id="ch11-3">
<title>11.3 Experimental Setup</title>
<para><link linkend="ch11-F1">Figure 11.1</link> depicts the setup used for this research. A Raspberry Pi 4 Model B with 4 GB of memory served as the base platform. The Google Coral edge TPU accelerator was connected either to a USB 2.0 or a USB 3.0 port for the performance and accuracy evaluation.</para>
<fig id="ch11-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 11.1:</emphasis> Raspberry Pi 4 with Google Coral edge TPU USB accelerator.</para></caption>
<graphic xlink:href="graphics/ch11-fig01.jpg"/>
</fig>
<section class="lev2" id="ch11-3-1">
<title>11.3.1 Google Coral Edge TPU</title>
<para>Google developed a custom Application-Specific Integrated Circuit (ASIC) for edge inference. This specialized TPU can be connected to existing systems via a USB, (m)PCIe, or M.2 interface. <link linkend="ch11-F1">Figure 11.1</link> depicts the USB dongle variant of the accelerator, which is advertised to perform up to four trillion operations per second. Approximately 8 MB of &#x2018;scratchpad&#x2019; memory is available per unit, and the peak power consumption is rated at 2 W [<link linkend="ch11-bib15">15</link>]. Additionally, multiple of these coprocessors can be chained together to handle bigger workloads. The TPU hardware operates on 8-bit integer variables; both performance and power consumption benefit from the reduced complexity of the hardware design. However, this introduces weight quantization as an additional step before deployment. The reduction in precision from floating point to 8-bit integers subsequently leads to a deterioration of accuracy. Further overhead is introduced by the addition of quantization operations to the execution graph.</para>
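<para>The effect of 8-bit quantization can be illustrated with the standard affine scheme used by typical converters. This is a generic sketch, not the exact edge TPU converter internals; in practice, <emphasis>scale</emphasis> and <emphasis>zero_point</emphasis> would come from the converter's calibration step.</para>

```python
def quantize(x, scale, zero_point):
    """Map a float to the signed 8-bit grid:
    q = round(x / scale) + zero_point, clamped to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float."""
    return (q - zero_point) * scale

# Values outside the representable range saturate, which is one source
# of the accuracy deterioration mentioned above.
print(quantize(0.5, 0.01, 0))   # 50
print(quantize(10.0, 0.01, 0))  # 127 (saturated)
```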
<para>Deploying a model for this device entails several pitfalls due to a rather convoluted development pipeline. Google mandates its own TensorFlow (TF) framework as the starting point; hence, models from other frameworks must first be converted, e.g., by means of the Open Neural Network Exchange (ONNX). There, a quantization step is performed alongside a conversion to the TFLite format. The final step involves a proprietary edge TPU compiler, which translates the TFLite instructions for the edge TPU. Inference, on the other hand, is straightforward. The TFLite runtime provides the interfaces for loading and executing the model file, while the libedgetpu library handles the low-level communication with the accelerator. This allows for a very lightweight deployment of 10 MB to 20 MB (without the model), compared to conventional GPU solutions, which can require over a gigabyte of disk storage for the libraries alone.</para>
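A minimal inference sketch with the TFLite runtime might look as follows. The delegate filename `libedgetpu.so.1` is the Linux convention, the helper names are hypothetical, and passing an empty delegate list runs the same model on the CPU instead of the accelerator.

```python
# Hedged sketch: running a compiled *_edgetpu.tflite model via the TFLite
# runtime. Requires tflite_runtime and libedgetpu on the target device.

def make_delegates(use_tpu):
    """Delegate list for the interpreter; an empty list means CPU execution."""
    if not use_tpu:
        return []
    from tflite_runtime.interpreter import load_delegate
    return [load_delegate("libedgetpu.so.1")]

def run_inference(model_path, input_tensor, use_tpu=True):
    from tflite_runtime.interpreter import Interpreter
    interpreter = Interpreter(model_path=model_path,
                              experimental_delegates=make_delegates(use_tpu))
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], input_tensor)
    interpreter.invoke()
    out = interpreter.get_output_details()[0]
    return interpreter.get_tensor(out["index"])
```

The lazy imports keep the sketch loadable on machines without the Coral libraries; only the actual inference call needs them.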
</section>
<section class="lev2" id="ch11-3-2">
<title>11.3.2 YOLOv5</title>
<para>The original You Only Look Once (YOLO) architecture was proposed by Joseph Redmon in 2016 [<link linkend="ch11-bib16">16</link>]. It performs both object detection and classification in a single model, which resulted in a significant performance increase compared to classical two-stage designs (e.g., Region-Based Convolutional Neural Networks (R-CNNs) [<link linkend="ch11-bib17">17</link>]). Since the original design, many improvements have been made. YOLOv5 [<link linkend="ch11-bib2">2</link>] is based on the YOLOv3 [<link linkend="ch11-bib18">18</link>] architecture. It is under constant open-source development by Ultralytics, who shifted the focus from academic research to accessible deployment. They provide an end-to-end solution which allows for training, testing and exporting models to a variety of deployment frameworks. This includes the integration of the previously described pipeline for generating edge TPU models from version 6.1 onward.</para>
</section>
</section>
<section class="lev1" id="ch11-4">
<title>11.4 Performance Considerations</title>
<para>The Coral accelerator achieves its low energy footprint and high performance by sacrificing flexibility, which manifests itself in a significantly reduced instruction set [<link linkend="ch11-bib19">19</link>]. The edge TPU compiler is a black box which aims to aggregate as many operations as possible and convert them into a binary that can be executed by the coprocessor. Every operation that is not mapped accordingly must therefore run on the CPU. This section aims to provide guidance for optimizing a model for edge TPU execution, using YOLOv5 (release 6.1) as an example.</para>
<section class="lev2" id="ch11-4-1">
<title>11.4.1 Graph Optimization</title>
<para><link linkend="ch11-F2">Figure 11.2</link> depicts the graphs of two edge TPU models. <link linkend="ch11-F2">Figure 11.2</link>a shows the small variant of the YOLOv5 model with additional optimizations. The EfficientDet Lite0 [<link linkend="ch11-bib20">20</link>] model in <link linkend="ch11-F2">Figure 11.2</link>b was taken from the Coral model zoo [<link linkend="ch11-bib21">21</link>]. Most of the graph is mapped to the edgetpu-custom-op, while some operations are still executed by the main processor. In the following, possible issues when compiling a model are shown, and ways to improve the mapping are elaborated.</para>
<fig id="ch11-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 11.2:</emphasis> Quantized edge TPU Models.</para></caption>
<graphic xlink:href="graphics/ch11-fig02.jpg"/>
</fig>
<section class="lev3" id="ch11-4-1-1">
<title>11.4.1.1 Incompatible Operations</title>
<para>The compiler only maps operations until it encounters an incompatibility. Everything after that point is executed on the CPU. This is especially critical for activation functions (e.g., LeakyReLU, Hardswish), as they are distributed throughout the graph. While it is possible to create multiple TPU subgraphs, the overhead of transferring intermediate tensors several times between CPU and TPU usually eliminates any benefits. It is therefore advisable to use compatible activation functions (e.g., ReLU, Logistic). Furthermore, binary operations (e.g., AND, OR) are also not supported.</para>
</section>
<section class="lev3" id="ch11-4-1-2">
<title>11.4.1.2 Tensor Transformations</title>
<para>The reshape and transpose operations are not mapped once their input tensor exceeds a certain soft threshold. There is no documentation on how this limit is calculated, and it seems to depend on the general model structure. However, it could be observed that this threshold is significantly smaller for the transpose operation. A possible explanation for this behaviour could be an inability of the accelerator to address memory in a different order. A transpose operation on a CPU merely implies changing the direction (column- vs. row-wise) in which the same memory locations are read. If this is not supported by the TPU, memory reallocation is required.</para>
<para>There are several ways of addressing this issue. One approach is to reduce the size of the input tensor. In CNNs, the input size propagates proportionally through the network; hence, reducing it results in smaller intermediate tensors. A further reduction can be achieved by limiting the number of output classes. If graph modifications are viable, a divide-and-conquer strategy can be used to split tensors before the operation and merge them afterwards. Moving these operations to the bottom of the graph can also be an option, as these instructions are fast on the CPU. A last option is using mathematical transformations to change the graph beneficially.</para>
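The divide-and-conquer idea can be sketched in plain Python on a nested-list "tensor". This illustrates the principle only; it is not the actual graph rewrite applied to YOLOv5.

```python
# Illustration of the divide-and-conquer strategy: instead of transposing one
# large tensor (which may exceed the compiler's soft threshold), split it,
# transpose the halves, and merge the results. Plain lists stand in for tensors.

def transpose(t):
    return [list(row) for row in zip(*t)]

def split_transpose_merge(t):
    half = len(t) // 2
    top, bottom = t[:half], t[half:]               # split along the first axis
    t_top, t_bottom = transpose(top), transpose(bottom)
    # After transposing, the split axis becomes the second axis -> merge columns.
    return [a + b for a, b in zip(t_top, t_bottom)]

t = [[1, 2], [3, 4], [5, 6], [7, 8]]
result = split_transpose_merge(t)
```

In a graph, each half would stay below the threshold and remain mapped to the TPU, with only the cheap concatenation left over.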
<para>Some of these strategies were used to optimize the YOLOv5 models evaluated further in this research. All changes were contributed to the open-source project in a pull request [<link linkend="ch11-bib22">22</link>] and are part of the next major release (6.2). <link linkend="ch11-T1">Table 11.1</link> shows the performance impact for the demonstrator setup. Both model variants experienced a significant speedup in inference time, with the larger input size benefiting the most.</para>
<fig id="ch11-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 11.1:</emphasis> Comparison of YOLOv5s model before and after optimizations.</para></caption>
<graphic xlink:href="graphics/ch11-tab01.jpg"/>
</fig>
</section>
</section>
<section class="lev2" id="ch11-4-2">
<title>11.4.2 Performance Evaluation</title>
<para>In the following, different variants of the optimized YOLOv5 models are compared to other object detectors supplied by Google. All numerical values can be found in <link linkend="ch11-T2">Table 11.2</link>. The inference speed was evaluated utilizing the Google benchmark model tool [<link linkend="ch11-bib23">23</link>]. Version 16 of libedgetpu-max was used, and each inference was repeated 100 times, preceded by a warm-up phase. Accuracy was determined with pycocotools and the Common Objects in Context (COCO) evaluation dataset [<link linkend="ch11-bib24">24</link>]. The input images were proportionally scaled to the input size with bilinear interpolation. The Google models have a postprocessing operation integrated in the model graph (cf. <link linkend="ch11-F2">Figure 11.2</link>b). It was evaluated separately for inference speed, and fast Non-Maximum Suppression (NMS) [<link linkend="ch11-bib25">25</link>] was used for all models, as it is the default setting of this custom operation. Furthermore, the confidence threshold was set to 0.001 and the overlap threshold to 0.65.</para>
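The proportional scaling used during evaluation can be sketched as follows. The helper is a hypothetical illustration of aspect-ratio-preserving resizing; the actual interpolation (bilinear, as stated above) would be performed by the image library.

```python
# Sketch of proportional (aspect-ratio preserving) scaling to a square model
# input. Returns the resized dimensions and the padding needed to fill the
# remainder of the input; interpolation itself is left to the image library.

def proportional_scale(w, h, target):
    r = target / max(w, h)                 # scale factor from the longer side
    new_w, new_h = round(w * r), round(h * r)
    pad_w, pad_h = target - new_w, target - new_h
    return (new_w, new_h), (pad_w, pad_h)

# A 640x480 image prepared for a 320 px model input:
dims, pad = proportional_scale(640, 480, 320)
```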
<fig id="ch11-T2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 11.2:</emphasis> Model comparison with regard to input size, file size, and operations.</para></caption>
<graphic xlink:href="graphics/ch11-tab02.jpg"/>
</fig>
<section class="lev3" id="ch11-4-2-1">
<title>11.4.2.1 Speed-Accuracy Comparison</title>
<para><link linkend="ch11-F3">Figure 11.3</link> shows the mean average precision (mAP<math id="Ch11.S4.SS2.SSSx1.p1.m1" display="inline"><msub><mi></mi><mrow><mn>50</mn><mo>:</mo><mn>95</mn></mrow></msub></math>) of each tested model in relation to the inference speed. It can be observed that the edge TPU works best at lower input sizes, while larger inputs cause a disproportionate slowdown compared to the gain in accuracy. The nano and small models with 320 px input are particularly interesting. They have an almost identical inference time, while the accuracy of the s-model is significantly better. They share the same vertical graph structure, but the larger one is scaled horizontally by a factor of two; hence, the small variant has twice as many weights for each convolutional layer. This aligns with the insight from [<link linkend="ch11-bib12">12</link>] that horizontal scaling is preferable. The model should be very close to a sweet spot at which all weights are cached within the 8 MB device memory. Sacrificing some model depth for more width could theoretically improve the accuracy even further.</para>
<fig id="ch11-F3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 11.3:</emphasis> USB3 speed-accuracy comparison of different model types and configurations for edge TPU deployment.</para></caption>
<graphic xlink:href="graphics/ch11-fig03.jpg"/>
</fig>
<para>In general, YOLOv5 performs better than the other models. Only the nano model struggles, which is probably caused by its particularly small file size. If speed is the deciding factor, SSDLite MobileDet [<link linkend="ch11-bib26">26</link>, <link linkend="ch11-bib27">27</link>] is still the preferable solution. The classical SSD MobileNetV2 [<link linkend="ch11-bib26">26</link>, <link linkend="ch11-bib28">28</link>] no longer seems competitive. The EfficientDet models perform reasonably; however, considering the additional overhead of a particularly slow postprocessing operation, YOLOv5 should be considered the better solution. All models share a low accuracy for small objects, which could be an issue inflicted by quantization.</para>
</section>
<section class="lev3" id="ch11-4-2-2">
<title>11.4.2.2 USB Speed Comparison</title>
<para>Considering that the purpose of edge accelerators is to enable AI deployment on low-power devices, USB3 might not always be an option. Hence, it should be evaluated whether deployment over USB2 is viable. The maximum speed of such a connection is rated at 60 MB/s, while USB3 is specified at almost ten times this value.</para>
<fig id="ch11-F4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 11.4:</emphasis> YOLOv5s inference speed comparison between USB2 and USB3</para></caption>
<graphic xlink:href="graphics/ch11-fig04.jpg"/>
</fig>
<para><link linkend="ch11-F4">Figure 11.4</link> depicts the speed comparison of the small model with varying input size. A considerable difference in inference speed can be observed: the USB2 interface causes a slowdown by a factor of three. The model parameters should fit entirely into the device memory; therefore, only the data transfer should impact the speed. Equation (11.1) shows how the data flowing to and from the device is calculated. <emphasis>data<math id="Ch11.S4.SS2.SSSx2.p2.m1" display="inline"><msub><mi></mi><mi mathvariant="normal">in</mi></msub></math></emphasis> only depends on the input size, while <emphasis>data<math id="Ch11.S4.SS2.SSSx2.p2.m2" display="inline"><msub><mi></mi><mi mathvariant="normal">out</mi></msub></math></emphasis> also considers the number of anchor boxes (3), strides for multi-scale outputs (8, 16, 32) and class count. For the 320 px model, this results in 842.7 KB of data flow per inference, while the 640 px input increases this value to 3.37 MB. Additional data flow could arise from intermediate tensors that are too large to be buffered on the device. Whether this is an issue here must be determined in future research.</para>
<table id="Ch11.Sx2.EGx1">
<tr>
<td><math id="Ch11.Ex1.m1" display="inline"><mrow><msub><mi>data</mi><mi>in</mi></msub><mo>=</mo><mrow><mn>3</mn><mo>&#x2062;</mo><msub><mi mathvariant="normal">x</mi><mi>in</mi></msub><mo>&#x2062;</mo><msub><mi mathvariant="normal">y</mi><mi>in</mi></msub></mrow></mrow></math></td>
<td></td>
</tr>
<tr>
<td><math id="Ch11.E1.m1" display="inline"><mrow><msub><mi>data</mi><mi>out</mi></msub><mo>=</mo><mrow><mrow><mn>3</mn><mo>&#x2062;</mo><mrow><mo>(</mo><mrow><mstyle displaystyle="true"><mfrac><mn>1</mn><msup><mn>8</mn><mn>2</mn></msup></mfrac></mstyle><mo>+</mo><mstyle displaystyle="true"><mfrac><mn>1</mn><msup><mn>16</mn><mn>2</mn></msup></mfrac></mstyle><mo>+</mo><mstyle displaystyle="true"><mfrac><mn>1</mn><msup><mn>32</mn><mn>2</mn></msup></mfrac></mstyle></mrow><mo>)</mo></mrow><mo>&#x2062;</mo><msub><mi mathvariant="normal">x</mi><mi>in</mi></msub><mo>&#x2062;</mo><msub><mi mathvariant="normal">y</mi><mi>in</mi></msub></mrow><mo>*</mo><mrow><mo stretchy="false">(</mo><mrow><mn>5</mn><mo>+</mo><msub><mi mathvariant="normal">n</mi><mi>cls</mi></msub></mrow><mo stretchy="false">)</mo></mrow></mrow></mrow></math></td>
<td>(11.1)</td>
</tr>
</table>
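Equation (11.1) can be checked numerically. The sketch below reproduces the 842.7 KB and 3.37 MB figures stated in the text (COCO's 80 classes, 3 anchors, strides 8/16/32); the transfer-time estimate simply divides by the nominal USB2 rate, assuming the rated bandwidth is achievable.

```python
# Numerical check of the data-flow model: data_in depends only on the RGB
# input, data_out additionally on anchors (3), strides (8, 16, 32) and classes.

def data_flow(x, y, n_cls=80):
    data_in = 3 * x * y                                       # bytes, 3 channels
    data_out = 3 * (1/8**2 + 1/16**2 + 1/32**2) * x * y * (5 + n_cls)
    return data_in + data_out

kb_320 = data_flow(320, 320) / 1000     # per-inference data flow, 320 px model
mb_640 = data_flow(640, 640) / 1e6      # per-inference data flow, 640 px model

# Rough per-inference transfer overhead at the nominal USB2 rate of 60 MB/s
# (an upper-bound assumption; real bus efficiency is lower).
ms_usb2 = data_flow(640, 640) / 60e6 * 1000
```

For the 640 px model this already amounts to roughly 56 ms of transfer time per inference over USB2, which is consistent with the observed slowdown being transfer-dominated.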
</section>
</section>
<section class="lev2" id="ch11-4-3">
<title>11.4.3 Deployment Pipeline</title>
<para>An AI application can be considered a data pipeline consisting of several steps. First, data must be loaded and pre-processed to comply with the model; in the context of object detection, this implies loading and scaling a JPEG image. The following steps are inference and postprocessing. The latter takes the raw model output and transforms it into a usable form, which could involve thresholding, NMS and coordinate transforms. The pipeline is executed for each inference; hence, all steps should be highly optimized. Most efforts are usually focused on optimizing the model while neglecting everything else. This section introduces a small deployment stack for object detection, which is both optimized and allows for the usage of well-established high-level frameworks.</para>
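Postprocessing can dominate the pipeline if written naively. The following is a minimal greedy NMS sketch in plain Python, with boxes as hypothetical (x1, y1, x2, y2, score) tuples; it illustrates the principle, not the optimized fast-NMS variant referenced above.

```python
# Minimal greedy non-maximum suppression: keep the highest-scoring box,
# drop all boxes overlapping it beyond the IoU threshold, then repeat.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, iou_thr=0.65):
    boxes = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    for b in boxes:
        if all(iou(b, k) <= iou_thr for k in kept):
            kept.append(b)
    return kept

dets = [(0, 0, 10, 10, 0.9), (1, 1, 10, 10, 0.8), (20, 20, 30, 30, 0.7)]
result = nms(dets)
```

In a production pipeline this inner loop is typically vectorized with NumPy, which is precisely where the optimized BLAS backend discussed below pays off.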
<fig id="ch11-F5" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 11.5:</emphasis> Micro software stack for fast and lightweight edge deployment.</para></caption>
<graphic xlink:href="graphics/ch11-fig05.jpg"/>
</fig>
<para>The software stack depicted in <link linkend="ch11-F5">Figure 11.5</link> shows a simple layer model for a lightweight deep vision deployment. The part concerning the TPU was elaborated previously. Loading and transforming images is often handled by OpenCV [<link linkend="ch11-bib29">29</link>], which uses shared low-level libraries to perform these operations. Providing an optimized image loader such as libjpeg-turbo [<link linkend="ch11-bib30">30</link>] can therefore accelerate the whole pipeline. The same is true for NumPy [<link linkend="ch11-bib31">31</link>], which is responsible for performing mathematical tensor operations on the CPU. A dedicated math library such as OpenBLAS [<link linkend="ch11-bib32">32</link>] makes use of Single Instruction Multiple Data (SIMD) instructions, which perform vector operations faster and more efficiently. Such a software stack approaches the speed of a solution written in a compiled language while being far more flexible. It could also be viable to package such an application into a lightweight container for easy deployment using virtualization technologies.</para>
</section>
</section>
<section class="lev1" id="ch11-5">
<title>11.5 Conclusion and Future Work</title>
<para>This research demonstrated how efficient edge AI applications can be implemented in a feasible manner. It was shown that a high degree of optimization is required to make the best use of limited computing resources. Additionally, a lightweight software stack was presented, which can be used as a basis for building high-level ML applications. A paradigm shift towards more deployment-driven AI development, as exemplified by YOLOv5, is mandatory for making ubiquitous AI possible. The Google Coral edge TPU offers high potential for enabling real-time object detection at common video stream rates on embedded systems; however, there are several pitfalls associated with the device. The limited opset requires models to be designed accordingly, which must be in the interest of the developers. Another issue is the USB2 performance. Future research must evaluate what exactly causes this drastic slowdown. If the TPU is to be used in ultra-low-power segments (e.g., with microcontroller units), USB3 will not be viable. Changing the model to reduce the amount of data flowing to and from the device could alleviate this shortcoming.</para>
</section>
<section class="lev1">
<title>Acknowledgements</title>
<para>This work has been financially supported by the AI4DI project. AI4DI receives funding within the Electronic Components and Systems for European Leadership Joint Undertaking (ECSEL JU) in collaboration with the European Union&#x2019;s Horizon 2020 Framework Programme and National Authorities, under grant agreement n<sup>&#x2218;</sup> 826060.</para>
</section>
<section class="lev1" id="ch11-Ref">
<title>References</title>
<para id="ch11-bib1">[1] AENEAS, Inside Industry Association, and EPOSS. ECS &#x2013; Strategic Research and Innovation Agenda 2022. en. Jan. 2022. URL: <ulink url="https://ecscollaborationtool.eu/publication/download/slides-ovidiu-vermesan.pdf">https://ecscollaborationtool.eu/publication/download/slides-ovidiu-vermesan.pdf</ulink> (visited on 03/31/2022).</para>
<para id="ch11-bib2">[2] G. Jocher et al. ultralytics/yolov5: v6.1 - TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference. Feb. 2022. URL: <ulink url="https://zenodo.org/record/6222936">https://zenodo.org/record/6222936</ulink> (visited on 03/30/2022).</para>
<para id="ch11-bib3">[3] A. Boschi et al. &#x201c;A Cost-Effective Person-Following System for Assistive Unmanned Vehicles with Deep Learning at the Edge&#x201d;. en. In: Machines 8.3 (Aug. 2020), p. 49.</para>
<para id="ch11-bib4">[4] M. Kraft et al. &#x201c;Autonomous, Onboard Vision-Based Trash and Litter Detection in Low Altitude Aerial Images Collected by an Unmanned Aerial Vehicle&#x201d;. en. In: Remote Sensing 13.5 (Mar. 2021), p. 965.</para>
<para id="ch11-bib5">[5] N. J. Sanket et al. &#x201c;PRGFlow: Benchmarking SWAP-Aware Unified Deep Visual Inertial Odometry&#x201d;. en. In: arXiv:2006.06753 [cs] (June 2020).</para>
<para id="ch11-bib6">[6] M. Roesler et al. &#x201c;Deploying Deep Neural Networks on Edge Devices for Grape Segmentation&#x201d;. en. In: Smart and Sustainable Agriculture. Ed. by Selma Boumerdassi, Mounir Ghogho, and &#xc9;ric Renault. Vol. 1470. Cham: Springer International Publishing, 2021, pp. 30&#x2013;43.</para>
<para id="ch11-bib7">[7] C. Resende et al. &#x201c;TIP4.0: Industrial Internet of Things Platform for Predictive Maintenance&#x201d;. en. In: Sensors 21.14 (July 2021), p. 4676.</para>
<para id="ch11-bib8">[8] S. Hosseininoorbin et al. &#x201c;Exploring Edge TPU for Network Intrusion Detection in IoT&#x201d;. en. In: arXiv:2103.16295 [cs] (Mar. 2021).</para>
<para id="ch11-bib9">[9] S. Hosseininoorbin et al. &#x201c;Exploring Deep Neural Networks on Edge TPU&#x201d;. en. In: arXiv:2110.08826 [cs] (Oct. 2021).</para>
<para id="ch11-bib10">[10] M. Alnemari and N. Bagherzadeh. &#x201c;Efficient Deep Neural Networks for Edge Computing&#x201d;. en. In: 2019 IEEE International Conference on Edge Computing (EDGE). Milan, Italy: IEEE, July 2019, pp. 1&#x2013;7.</para>
<para id="ch11-bib11">[11] M. Antonini et al. &#x201c;Resource Characterisation of Personal-Scale Sensing Models on Edge Accelerators&#x201d;. en. In: Proceedings of the First International Workshop on Challenges in Artificial Intelligence and Machine Learning for Internet of Things. New York NY USA: ACM, Nov. 2019, pp. 49&#x2013;55.</para>
<para id="ch11-bib12">[12] A. A. Asyraaf Jainuddin et al. &#x201c;Performance Analysis of Deep Neural Networks for Object Classification with Edge TPU&#x201d;. In: 2020 8th International Conference on Information Technology and Multimedia (ICIMU). Aug. 2020, pp. 323&#x2013;328.</para>
<para id="ch11-bib13">[13] A. Yazdanbakhsh et al. An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks. Feb. 2021.</para>
<para id="ch11-bib14">[14] A. Boroumand et al. &#x201c;Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks&#x201d;. en. In: arXiv:2109.14320 [cs] (Sept. 2021).</para>
<para id="ch11-bib15">[15] USB Accelerator datasheet. en-us. URL: <ulink url="https://coral.ai/docs/accelerator/datasheet/">https://coral.ai/docs/accelerator/datasheet/</ulink> (visited on 03/31/2022).</para>
<para id="ch11-bib16">[16] J. Redmon et al. &#x201c;You Only Look Once: Unified, Real-Time Object Detection&#x201d;. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2016, pp. 779&#x2013;788.</para>
<para id="ch11-bib17">[17] R. Girshick et al. &#x201c;Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation&#x201d;. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. June 2014, pp. 580&#x2013;587.</para>
<para id="ch11-bib18">[18] J. Redmon and A. Farhadi. &#x201c;YOLOv3: An Incremental Improvement&#x201d;. In: (Apr. 2018).</para>
<para id="ch11-bib19">[19] TensorFlow models on the Edge TPU. en-us. URL: <ulink url="https://coral.ai/docs/edgetpu/models-intro/#supported-operations">https://coral.ai/docs/edgetpu/models-intro/#supported-operations</ulink> (visited on 03/30/2022).</para>
<para id="ch11-bib20">[20] M. Tan, R. Pang, and Q. V. Le. &#x201c;EfficientDet: Scalable and Efficient Object Detection&#x201d;. In: arXiv:1911.09070 [cs, eess] (July 2020). arXiv: 1911.09070.</para>
<para id="ch11-bib21">[21] Models - Object Detection. en-us. URL: <ulink url="https://coral.ai/models/object-detection/">https://coral.ai/models/object-detection/</ulink>.</para>
<para id="ch11-bib22">[22] EdgeTPU optimizations by paradigmn Pull Request #6808 ultralytics/yolov5. en. URL: <ulink url="https://github.com/ultralytics/yolov5/pull/6808">https://github.com/ultralytics/yolov5/pull/6808</ulink> (visited on 03/31/2022).</para>
<para id="ch11-bib23">[23] Performance measurement &#x2014; TensorFlow Lite. en. URL: <ulink url="https://www.tensorflow.org/lite/performance/measurement">https://www.tensorflow.org/lite/performance/measurement</ulink> (visited on 03/30/2022).</para>
<para id="ch11-bib24">[24] T.-Y. Lin et al. &#x201c;Microsoft COCO: Common Objects in Context&#x201d;. en. In: Computer Vision &#x2013; ECCV 2014. Ed. by David Fleet et al. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2014, pp. 740&#x2013;755.</para>
<para id="ch11-bib25">[25] J. Hosang, R. Benenson, and B. Schiele. &#x201c;Learning Non-maximum Suppression&#x201d;. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). ISSN: 1063-6919. July 2017, pp. 6469&#x2013;6477.</para>
<para id="ch11-bib26">[26] W. Liu et al. &#x201c;SSD: Single Shot MultiBox Detector&#x201d;. en. In: Computer Vision &#x2013; ECCV 2016. Ed. by Bastian Leibe et al. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2016, pp. 21&#x2013;37.</para>
<para id="ch11-bib27">[27] Y. Xiong et al. &#x201c;MobileDets: Searching for Object Detection Architectures for Mobile Accelerators&#x201d;. In: arXiv:2004.14525 [cs] (July 2020). arXiv: 2004.14525.</para>
<para id="ch11-bib28">[28] M. Sandler et al. &#x201c;MobileNetV2: Inverted Residuals and Linear Bottlenecks&#x201d;. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. June 2018, pp. 4510&#x2013;4520.</para>
<para id="ch11-bib29">[29] G. Bradski. &#x201c;The OpenCV Library&#x201d;. In: Dr. Dobb&#x2019;s Journal of Software Tools (2000).</para>
<para id="ch11-bib30">[30] libjpeg-turbo. original-date: 2015-07-27T07:11:54Z. Mar. 2022. URL: <ulink url="https://github.com/libjpeg-turbo/libjpeg-turbo">https://github.com/libjpeg-turbo/libjpeg-turbo</ulink> (visited on 03/31/2022).</para>
<para id="ch11-bib31">[31] C. R. Harris et al. &#x201c;Array programming with NumPy&#x201d;. en. In: Nature 585.7825 (Sept. 2020), pp. 357&#x2013;362.</para>
<para id="ch11-bib32">[32] Q. Wang et al. &#x201c;AUGEM: automatically generate high performance dense linear algebra kernels on x86 CPUs&#x201d;. en. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. Denver Colorado: ACM, Nov. 2013, pp. 1&#x2013;12.</para>
</section>
</chapter>
<chapter class="chapter" id="ch12" label="12" xreflabel="12">
<title>Embedded Edge Intelligent Processing for End-To-End Predictive Maintenance in Industrial Applications</title>
<subtitle>Ovidiu Vermesan<sup>1</sup> and Marcello Coppola<sup>2</sup></subtitle>
<affiliation><sup>1</sup>SINTEF AS, Norway<?lb?><sup>2</sup>STMicroelectronics, France</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>This article advances innovative approaches to the design and implementation of an embedded intelligent system for predictive maintenance (PdM) in industrial applications. It is based on the integration of advanced artificial intelligence (AI) techniques into micro-edge Industrial Internet of Things (IIoT) devices running on Arm<sup>&#xae;</sup> Cortex<sup>&#xae;</sup> microcontrollers (MCUs) and addresses the impact of a) adapting to the constraints of MCUs, b) analysing sensor patterns in the time and frequency domain and c) optimising the AI model architecture and hyperparameter tuning, stressing that hardware&#x2013;software co-exploration is the key ingredient to converting micro-edge IIoT devices into intelligent PdM systems. Moreover, this article highlights the importance of end-to-end AI development solutions by employing existing frameworks and inference engines that permit the integration of complex AI mechanisms within MCUs, such as NanoEdge<sup>TM</sup> AI Studio, Edge Impulse and STM32 Cube.AI. Both quantitative and qualitative insights are presented in complementary workflows with different design and learning components, as well as in the backend flow for deployment onto IIoT devices with a common inference platform based on Arm<sup>&#xae;</sup> Cortex<sup>&#xae;</sup>-M-based MCUs. The use case is an n-class classification based on the vibration of generic motor rotating equipment. The results have been used to lay the foundation of the PdM strategy, which will be extended in future work with insights derived from anomaly detection, regression and forecasting applications.</para>
<para><emphasis role="strong">Keywords:</emphasis> predictive maintenance, smart sensors systems, industrial internet of things, industrial internet of intelligent things, vibration analysis, machine learning, deep learning architecture, edge-embedded devices.</para>
</section>
<section class="lev1" id="ch12-1">
<title>12.1 Introduction and Background</title>
<para>Leveraging AI methods and techniques at the edge is vital for increasing the performance and capabilities of the intelligent sensor systems and IIoT devices used in industrial manufacturing. For many intelligent applications, the edge AI processing concept is reflected in the emergence of different edge layers (micro-, deep-, meta-edge). The edge processing continuum includes the sensing, processing and communication devices (micro-edge) close to the physical industrial assets under monitoring, the gateways and intelligent controllers processing devices (deep-edge), and the on-premise multi-use computing devices (meta-edge). This continuum creates a multi-level structure that moves up in processing, intelligence, and connectivity capability.</para>
<para>Micro-edge devices are typically small sensors and actuators equipped with microcontrollers (MCUs) based on Arm<sup>&#xae;</sup> Cortex<sup>&#xae;</sup>-M cores (e.g., M0, M0+, M3, M4, M7) or open-source RISC-V instruction set architecture, circuits with memory, serial ports, peripherals, and wireless capabilities and designed to perform and extend the specific tasks of embedded systems.</para>
<para>Developing AI functionalities for micro-edge devices is a complex process with increasing potential in various industrial applications, including manufacturing. In industrial manufacturing, the implementation of machine learning (ML) and deep learning (DL) models on micro-edge-embedded devices offers a clear advantage for condition monitoring and PdM/prescriptive maintenance (PsM) operations for industrial motors/equipment. Using AI-enabled micro-edge devices for motor/equipment monitoring in industrial processes can prevent downtime by alerting users to perform preventative maintenance based on the equipment&#x2019;s real-time condition.</para>
<para>There are several works that provide a comprehensive review of frameworks available in the market that currently permit the integration of complex ML and DL mechanisms within MCUs [<link linkend="ch12-bib1">1</link>] [<link linkend="ch12-bib4">4</link>].</para>
<para>This article researches and investigates different approaches to using ML and DL technologies to bring AI capabilities to micro-edge devices and applies these capabilities for classification for PdM industrial applications. The goal is to implement ML and DL techniques in low-energy systems, including sensors, to perform intelligent automated tasks, such as PdM and PsM.</para>
<para>The approaches used in this article illustrate how to optimise ML and DL models for resource-constrained micro-edge-embedded devices. The article gives an overview of the data acquisition and prediction aspects of ML and DL, discusses how to build ML and DL models targeting micro-edge devices and presents the experimental results using different tools and approaches.</para>
<para>The article is organised into five sections. The introduction on intelligent edge processing real-time maintenance systems and description of data-, model- and knowledge-driven methods for time series is included in Section 12.1. Section 12.2 describes the architecture and design of motor classification for PdM, including methods and possible end-to-end flows and presents the use case, i.e., motor classification. Section 12.3 introduces the implementation of the classification use case using three existing platforms. Section 12.4 highlights specific experiments performed and the results that were achieved through the lens of employing different tools. Section 12.5 addresses future research challenges and discusses the key open issues related to AI techniques and methods in implementing intelligent edge processing real-time maintenance systems for the purposes of industrial applications.</para>
</section>
<section class="lev1" id="ch12-2">
<title>12.2 Machine and Deep Learning for Embedded Edge Predictive Maintenance</title>
<para>For industrial manufacturing facilities using motors in the process line, the maturity of maintenance practices is a crucial determinant of the ability to operate reliably and profitably without interruption. Condition-based monitoring maintenance (CBM) addresses uptime and maintenance costs by monitoring one or several critical measurements for the motors, such as temperature, vibration, oil analysis and current, which are used as indicators of an out-of-specification condition. Maintenance tasks are performed when needed. PdM applies a more extensive set of input data and more analysis to provide a more reliable indicator of the overall health and condition of the motor as well as a more accurate prediction of a possible failure and what action should be considered to prevent it.</para>
<para>With PdM, the motors are serviced considering the actual wear and tear and service needs, reducing unexpected outages, making fewer scheduled maintenance repairs or replacements, and using fewer maintenance resources (including spare parts and supplies) while simultaneously decreasing failures. PdM provides the prerequisite foundation for PsM and autonomous maintenance (by executing actions automatically, without human intervention). PsM builds on the infrastructure and data collected for PdM, following the various corrective actions taken by maintenance personnel and the resulting outcomes.</para>
<para><link linkend="ch12-F1">Figure 12.1</link> illustrates a typical industrial motor with a rotor, stator, bearings, and shaft as components essential to the motor&#x2019;s normal operation.</para>
<fig id="ch12-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 12.1:</emphasis> Industrial motor components [<link linkend="ch12-bib5">5</link>] [<link linkend="ch12-bib6">6</link>]</para></caption>
<graphic xlink:href="graphics/ch12-fig01.jpg"/>
</fig>
<para>The conditions and operation of these various components are possible causes of anomalous behaviour, thus defining various abnormal states (classes). A large amount of historical and real-time information is required to identify, classify, and predict the motor&#x2019;s possible failures. AI-based ML and DL algorithms are well suited to these types of tasks.</para>
<para>This paper focuses on AI-based PdM approaches, which learn from historical and real-time data and recommend the best timing and course of action for a given set of conditions and sub-conditions, employing ML and DL models implemented on micro-edge-embedded devices. The implementation of an ML solution in a PdM application includes several steps: data preparation, feature engineering, algorithm selection and parameter tuning.</para>
<para>The interaction between edge IIoT devices, ML and DL has opened opportunities for new data-driven approaches to PdM solutions in industrial processes. In this paper, different techniques and tools were successfully tested, using various ML- and DL-based methods to predict the state of industrial motors and to detect and classify motor conditions based on trained data. The PdM monitoring was tested on measurements performed on bench motors using computation at the micro-edge, allowing real-time acquisition, processing, and wireless communication.</para>
</section>
<section class="lev1" id="ch12-3">
<title>12.3 Approaches for Predictive Maintenance</title>
<para>AI-based PdM approaches [<link linkend="ch12-bib2">2</link>] [<link linkend="ch12-bib3">3</link>] [<link linkend="ch12-bib7">7</link>], employing ML and DL models implemented using micro-edge-embedded devices, are designed on different hardware platforms and software suites, generating embedded code, and performing learning and inference engine optimisations. Depending on the application and the frameworks and inference engines for integrating AI mechanisms within MCUs, several variants of the workflows are used.</para>
<para>This paper focuses on NanoEdge<sup>TM</sup> AI (NEAI) Studio [<link linkend="ch12-bib14">14</link>], Edge Impulse (EI) [<link linkend="ch12-bib8">8</link>] [<link linkend="ch12-bib10">10</link>] and STM32Cube.AI [<link linkend="ch12-bib10">10</link>] [<link linkend="ch12-bib13">13</link>].</para>
<fig id="ch12-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 12.1:</emphasis> Frameworks and inference engines for integrating AI mechanisms within MCUs</para></caption>
<graphic xlink:href="graphics/ch12-tab01.jpg"/>
</fig>
<para><link linkend="ch12-T1">Table 12.1</link> gives an overview of the features of these frameworks, which support the workflows of ML and DL model development and deployment on microcontroller-class devices. AI/ML models are built with frameworks such as Keras, ONNX, Lasagne, Caffe, ConvNetJS, etc.</para>
<section class="lev2" id="ch12-3-1">
<title>12.3.1 Hardware and Software Platforms</title>
<para>The experiments in this paper process various types of input data, including three-axis vibration, temperature, and device logs. The data for the experiments was collected from bench motors using an STWIN (SensorTile Wireless Industrial Node) IIoT device.</para>
<para>This micro-edge IIoT device comprises a three-axis ultrawide-bandwidth (DC to 6 kHz) acceleration sensor (ISM330DHCX), a 12-bit analog-to-digital converter, a user-configurable digital filter chain, a temperature sensor, and a serial peripheral interface. The micro-electro-mechanical systems (MEMS) vibration sensor has a selectable sensitivity (<math id="Ch12.S3.SS1.p2.m1" display="inline"><mo>&#xb1;</mo></math>2, <math id="Ch12.S3.SS1.p2.m2" display="inline"><mo>&#xb1;</mo></math>4, <math id="Ch12.S3.SS1.p2.m3" display="inline"><mo>&#xb1;</mo></math>8, or <math id="Ch12.S3.SS1.p2.m4" display="inline"><mo>&#xb1;</mo></math>16 g), and processing capabilities are provided by an Arm<sup>&#xae;</sup> Cortex<sup>&#xae;</sup>-M4 processor (120 MHz, 640 KB RAM, 2 MB Flash). The micro-edge device can be powered externally or by an internal lithium-ion battery and has BLE and Wi-Fi connectivity.</para>
<para>The design flow allows collecting or uploading training data from micro-edge devices, labelling the data, training an ML model, and launching and monitoring ML models in a production environment.</para>
<para>The PdM AI-based design flow uses the sensors and hardware platforms, software development kits (SDKs), frameworks and inference engines for integrating AI mechanisms within MCUs to generate code to be deployed on MCUs that allow running AI models in embedded systems by performing predictions at the edge. The ML and DL models deployed on the micro-edge devices become part of the firmware flashed into the MCUs.</para>
<para>A micro-edge AI processing flow is illustrated in <link linkend="ch12-F2">Figure 12.2</link>.</para>
<fig id="ch12-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 12.2:</emphasis> Micro-edge AI processing flow</para></caption>
<graphic xlink:href="graphics/ch12-fig02.jpg"/>
</fig>
<para>The AI-based flow uses an embedded compiler that converts models to C/C++, increasing the efficiency of models trained on the edge platform and reducing RAM usage, storage and code size by tens of percent.</para>
</section>
<section class="lev2" id="ch12-3-2">
<title>12.3.2 Motor Classification Use Case</title>
<para>The use case analysed in this article is the classification of the state of a motor based on the vibration measurements using an accelerometer sensor from an IIoT device. The signals covering all states to be classified were collected using a built-in three-axis accelerometer (ISM330DHCX) to measure the accelerations of three orthogonal directions.</para>
<para>In general, the n-class classification of n different states uses static models with pretrained libraries.</para>
<para>The classes were defined based on conditions (motor speeds) and sub-conditions (malfunctions). The motor was operating at fixed speeds, which were divided into three classes based on various percentages of the maximum speed (50%, 75% and 100%). A malfunction of the motor (motor fan trepidations) was added to the second class to obtain a new class. The classes defined are:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>MOTOR_OFF: the motor is off; signals are recorded while nothing is happening</para></listitem>
<listitem><para>MOTOR_ON_NORMAL_50: the motor is running at 50% of the maximum speed</para></listitem>
<listitem><para>MOTOR_ON_NORMAL_75: the motor is running at 75% of the maximum speed</para></listitem>
<listitem><para>MOTOR_ON_NORMAL_75_B: the motor fan produces additional trepidations to the motor, while the motor is running at 75% of the maximum speed</para></listitem>
<listitem><para>MOTOR_ON_NORMAL_MAX: the motor is running at maximum speed.</para></listitem>
</itemizedlist>
</section>
</section>
<section class="lev1" id="ch12-4">
<title>12.4 Experimental Setup</title>
<para>The design and implementation steps and the experimental setup of the end-to-end (E2E) classification application use two primary flows, NEAI Studio and EI. The former creates ML static libraries based on unsupervised algorithms, while the latter employs deep neural networks (NNs) for the classification task. A third flow was branched out from EI into Python using TensorFlow&#x2019;s Keras API, and the resulting model was fed into STM32Cube.AI.</para>
<para>The experimental process started by collecting the vibration signals in real time from the micro-edge IIoT device mounted on the motor, through a simple datalogger application. The recorded signals were then analysed in both the time and frequency domains and filtered, and datasets were prepared for each flow. The classification AI models were then built in each flow, using the accelerometer spectral features (e.g., root mean square (RMS), frequency and amplitude of spectral power peaks, etc.), and their performance was optimised. The three models were then deployed and integrated with the firmware using STM32CubeIDE. Finally, inference classifications were run to assess the performance of the implementations and deployments.</para>
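<para>As a minimal, hypothetical sketch of the kind of spectral features mentioned above (RMS and the frequency and amplitude of spectral power peaks), the following Python fragment extracts them from a single-axis vibration buffer. The function name and the 120 Hz test tone are illustrative only, not the actual motor data.</para>

```python
import numpy as np

FS = 1667  # sampling frequency used in this work (Hz)

def spectral_features(signal):
    """Extract simple spectral features from a 1-D vibration buffer:
    RMS, dominant frequency, and the amplitude of that spectral peak."""
    rms = np.sqrt(np.mean(signal ** 2))
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / FS)
    peak_idx = np.argmax(spectrum[1:]) + 1  # skip the DC bin
    return rms, freqs[peak_idx], spectrum[peak_idx]

# Synthetic check: a pure 120 Hz tone should yield a ~120 Hz dominant peak
t = np.arange(512) / FS
rms, peak_freq, peak_amp = spectral_features(np.sin(2 * np.pi * 120 * t))
```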
<section class="lev2" id="ch12-4-1">
<title>12.4.1 Signal Data Acquisition and Pre-processing</title>
<para>Prior to acquiring the signals, a thorough analysis of the vibration patterns of the motor was conducted, leading to the conclusion that the most suitable sampling frequency to capture the vibration patterns is 1667 Hz.</para>
<para>Both NEAI and EI offer several ways to take the measurements from the sensor IIoT device directly from within their GUIs. Acquiring signals with datalogger functionality in NEAI seemed to be the most straightforward data acquisition approach as it only requires the SD card. In the experimental use case, a simple logger application was used that reads and logs the raw accelerometer sensor data directly on the serial port, so that logs can be retrieved from a computer using serial tools such as Tera Term or from the console of the integrated development environment (IDE).</para>
<para>For the three-axis accelerometer sensor, a collection of signals (split in 60% training, 20% validation and 20% test) was acquired for each of the classes, with a buffer size of 512 samples on each axis, in total 1536 values per signal. Thus, with a sampling frequency of 1667 Hz, each buffer represents a snapshot of approximately 300 milliseconds of the accelerometer temporal vibration data, which is sufficient to capture the essence of the motor vibration patterns. The vibration signals collected are visualised as shown in <link linkend="ch12-F3">Figure 12.3</link>, in both temporal and frequency plots for the accelerometer sensor Z-axis for each of the two classes.</para>
<fig id="ch12-F3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 12.3:</emphasis> Visualisation of two selected classes signals in both temporal and frequency domain with NEAI</para></caption>
<graphic xlink:href="graphics/ch12-fig03.jpg"/>
</fig>
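<para>The buffer arithmetic quoted above can be sanity-checked with a few lines of Python; the total of 1000 signals is taken from the confusion-matrix discussion in Section 12.4.2 and is otherwise an assumption.</para>

```python
FS = 1667        # sampling frequency (Hz)
BUFFER = 512     # samples per axis
AXES = 3
SIGNALS = 1000   # assumed total number of collected signals

values_per_signal = BUFFER * AXES        # three axes per signal
window_ms = BUFFER / FS * 1000           # duration covered by one buffer
n_train, n_val, n_test = (int(SIGNALS * f) for f in (0.6, 0.2, 0.2))
```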
<para>To better differentiate the individual classes and thus ensure a high accuracy score, the recorded signals were processed in the frequency domain. Filter settings were activated in the signal pre-processing steps. Through filtering, only the frequencies that represent the characteristics of the motor vibration are kept, and the rest are attenuated. Filtering also helps to eliminate high-frequency noise that interferes with the vibration signal and to remove frequencies from transitions between states, which would normally yield an unknown class.</para>
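<para>A filtering step of this kind can be sketched with an ideal FFT band filter in Python. The 20&#x2013;400 Hz band and the synthetic 50 Hz/700 Hz test signal below are assumptions chosen for illustration, not the actual settings used in the experiments.</para>

```python
import numpy as np

FS = 1667  # sampling frequency (Hz)

def band_filter(signal, low_hz, high_hz):
    """Ideal FFT 'brick-wall' band filter: keep only the band of
    frequencies characteristic of the motor vibration, zero the rest."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / FS)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# Synthetic check: a 50 Hz "motor" tone plus 700 Hz "noise"; keep 20-400 Hz
t = np.arange(512) / FS
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 700 * t)
y = band_filter(x, 20, 400)
```

In practice a windowed FIR or IIR filter would be preferred over a brick-wall FFT mask to avoid ringing, but the effect on the class-relevant band is the same.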
<para>The recorded signals for each class were downloaded and then converted into a format accepted by EI, to ensure the same signals are being used for the signal processing, thus yielding similar results.</para>
<para>Several iterations were performed until labelled datasets of acceptable quality were obtained; this included recording new signals without background noise, collecting/recording longer signals and even changing the categorisation of the classes.</para>
</section>
<section class="lev2" id="ch12-4-2">
<title>12.4.2 Feature Extraction, ML/DL Model Selection and Training</title>
<para>Both NEAI and EI offer an automated mechanism for generating the AI model architecture and training, although the mechanisms differ since NEAI employs unsupervised algorithms, whereas EI employs DL NNs.</para>
<para>The benchmarking process for n-class classification with NEAI involves searching through a pool of ML algorithms, testing combinations of three elements: pre-processing, ML algorithms (e.g., random forest (RF), support vector machines (SVM), etc.) and hyper-parameters for each model. Each combination results in a library that is evaluated for accuracy, confidence and memory usage, and the results provide a ranking of these libraries. Accuracy reflects the library&#x2019;s ability to attribute each signal to the correct class, whereas confidence reflects the library&#x2019;s ability to separate the n classes.</para>
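<para>The NEAI search itself is proprietary, but the underlying principle (train several candidate model/hyper-parameter combinations on the same features and rank them by validation accuracy) can be illustrated with scikit-learn. The candidate list and the synthetic spectral-feature dataset below are purely illustrative assumptions.</para>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
y = np.repeat(np.arange(5), 100)            # 5 classes, 100 signals each
X = rng.normal(size=(500, 3)) + y[:, None]  # 3 stand-in spectral features
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Candidate model/hyper-parameter combinations, ranked by validation accuracy
candidates = {
    "RF, 50 trees": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVM, RBF kernel": SVC(kernel="rbf"),
}
ranking = sorted(
    ((name, m.fit(X_tr, y_tr).score(X_val, y_val)) for name, m in candidates.items()),
    key=lambda kv: kv[1], reverse=True)
```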
<para><link linkend="ch12-F4">Figure 12.4</link> shows that the top library for the PdM classification case has an accuracy of 100% and a confidence of 99.94%, uses the RF algorithm, and takes 6.2 kB of RAM and 8.3 kB of Flash. A confidence of 100% would mean that all classes are completely separated, with no overlap.</para>
<fig id="ch12-F4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 12.4:</emphasis> Benchmarking with NEAI</para></caption>
<graphic xlink:href="graphics/ch12-fig04.jpg"/>
</fig>
<para>In the &#x201c;Confusion Matrix&#x201d;, the value 200 means that the performance for each class is 100%, i.e., all 200 signals extracted from the initial data (20% of 1000 signals) have been classified correctly.</para>
<para>In the EI platform, a <emphasis>Spectral Analysis</emphasis> signal processing block was used to apply a filter, perform spectral analysis, and extract frequency and spectral power data. A useful aspect of the platform is the possibility to visualise and explore the features (<link linkend="ch12-F5">Figure 12.5</link>). The fact that the features are visually clustered is a good indication that the model can be trained to perform the classification. During the first iterations, the features overlapped to a significant degree and were intertwined, and the trained model had difficulty differentiating between the classes. This problem was addressed by collecting more signals and increasing the size of the sampled signal to better capture the signal patterns.</para>
<fig id="ch12-F5" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 12.5:</emphasis> Snapshots of Feature Explorer in EI based on the pre-processing block early in the process.</para></caption>
<graphic xlink:href="graphics/ch12-fig05.jpg"/>
</fig>
<para>It is also possible to calculate and visualise feature importance when generating the features, indicating how important each feature is for a class compared with all the other classes. The RMS and peak values of vibration along the three axes proved to be the most important features for determining the class in this case. Based on this information, dimension-reduction algorithms can be used to simplify the model by deleting less important or redundant information from the dataset, making it manageable while maintaining relevance and performance.</para>
<para>To implement the solution in EI, a classification learning block was used, which employs TensorFlow with Keras. It takes the features from the <emphasis>Spectral Analysis</emphasis> signal processing block and learns to distinguish between the five classes. The strategy adopted was to start with a small deep NN model, i.e., two dense layers, and experiment with it using the EI graphical user interface (GUI). Most of the experimentation was performed around an architecture consisting of multiple dense layers and dropout layers. Convolutional layers were also included.</para>
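<para>A starting-point model of the kind described (small dense layers with dropout) can be sketched with TensorFlow&#x2019;s Keras API, as in the third flow. The feature-vector size of 33 and the layer widths below are assumptions for illustration, not the exact EI configuration.</para>

```python
import numpy as np
import tensorflow as tf

N_FEATURES = 33  # assumed size of the spectral-feature vector (three axes)
N_CLASSES = 5    # the five motor states defined in Section 12.3.2

# Small dense network with dropout, as a starting point for experimentation
model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Calling the model builds it and yields one softmax distribution per sample
probs = model(np.zeros((1, N_FEATURES), dtype="float32")).numpy()
```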
<fig id="ch12-F6" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 12.6:</emphasis> Confusion Matrix and Data Explorer based on full training set: Correctly Classified (Green) and Misclassified (Red).</para></caption>
<graphic xlink:href="graphics/ch12-fig06.jpg"/>
</fig>
<para>At the end of the training, the model&#x2019;s performance and the confusion matrix on the validation data can be evaluated. <link linkend="ch12-F6">Figure 12.6</link> shows the accuracy and loss on the training and validation datasets, comparable with the results obtained with NEAI using a different model architecture. To avoid overfitting, the learning rate was reduced, more data was collected, and the model was re-trained.</para>
</section>
<section class="lev2" id="ch12-4-3">
<title>12.4.3 Optimisation and Tuning Performance</title>
<para>Developing the most efficient ML/DL flows for the classification PdM application was challenging. It required many iterative experiments and insights into the workings of motor vibration patterns, digital signal processing, AI algorithms, architectures, and microcontrollers. Nevertheless, both NEAI and EI provided automation and transparency for these processes, though to varying degrees.</para>
<para>For the NEAI classification, the learning is fixed at library generation based on the data provided for each class. The benchmarking implementation includes patented elements; thus, the internal working of the engine is not transparent. Nevertheless, multiple benchmarks can be created, and a high degree of automation allows for the best results to be obtained from signal capturing and formatting. The benchmarking process takes around 60 minutes when running on a processing unit with 6 CPU cores.</para>
<para>EI offers a higher degree of transparency and control over the model architecture and hyperparameters. The strategy adopted for the case of EI was to start from a simple model, experiment with it and improve it into a deeper and wider model. For this improvement step and for validation purposes, a parallel sub-flow was branched out from the flow with EI to conduct experiments in a Python framework. The training was launched in both EI and Python and compared throughout. The updated architecture and hyperparameters were exchanged back and forth between the EI and Python frameworks.</para>
<para>The improvements consisted of making the model deeper by adding more layers and wider by increasing the number of hidden units, changing the activation and optimisation functions and the learning rate, and fitting more data.</para>
<para>While the improvement process was run manually in Python, EI&#x2019;s Edge Optimized Neural (EON<sup>TM</sup>) Compiler [<link linkend="ch12-bib9">9</link>] can be used to find the best solution for Arm<sup>&#xae;</sup> Cortex<sup>&#xae;</sup>-M-based MCUs, i.e., the optimal combination of processing block and ML model for a given set of constraints, including latency, RAM usage, and accuracy. Currently, only a limited number of MCUs are supported, and these do not include the MCU of the STWIN IIoT device (the Arm<sup>&#xae;</sup> Cortex<sup>&#xae;</sup>-M4 MCU STM32L4R9), which operates at frequencies of up to 120 MHz. Nevertheless, the estimated on-device performance could be evaluated for a Cortex-M4F at 80 MHz, to determine the impact of optimisations such as quantisation across different slices of the datasets (<link linkend="ch12-F7">Figure 12.7</link>).</para>
<fig id="ch12-F7" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 12.7:</emphasis> A comparison between int8 quantized and unoptimized versions of the same model, showing the difference in performance and results.</para></caption>
<graphic xlink:href="graphics/ch12-fig07.jpg"/>
</fig>
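<para>The int8-versus-float comparison of Figure 12.7 can be reproduced in spirit with TensorFlow Lite&#x2019;s post-training quantisation. The stand-in model and the random representative dataset below are illustrative assumptions, not the actual classifier or calibration data.</para>

```python
import numpy as np
import tensorflow as tf

# Stand-in model; the real classifier comes from the EI/Keras flow
model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model(np.zeros((1, 33), dtype="float32"))  # build the model

# Unoptimised float32 conversion
float_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()

# int8 post-training quantisation, calibrated with a representative dataset
def representative_data():
    for _ in range(50):
        yield [np.random.rand(1, 33).astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
int8_model = converter.convert()
```

The quantised flatbuffer stores the weights as int8 instead of float32, which is what produces the memory savings reported on-device.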
</section>
<section class="lev2" id="ch12-4-4">
<title>12.4.4 Testing</title>
<para>ML/DL model testing usually refers to the evaluation of the trained model on the testing dataset to analyse how well the model performs on unseen data. However, model testing in NEAI and EI provides more than that. Both platforms provide a microcontroller emulator to test and debug the generated model prior to its deployment on the device.</para>
<para>As part of the NEAI toolkit, a microcontroller emulator is provided for each library to test and debug the generated model without the need to download, link or compile. Test signals can be imported from a file; here, however, the signals were imported live from the same datalogger application through the serial port, ensuring completely new signals that had not been seen before. The classification is run automatically on the live signals, while changing motor speeds and triggering shaft disturbances, to switch between classes and cover all five states and classes.</para>
<para>The results are presented in <link linkend="ch12-F8">Figure 12.8</link>, showing that the classifier managed to properly reproduce and detect all classes with reasonable certainty percentages.</para>
<fig id="ch12-F8" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 12.8:</emphasis> Evaluation of trained model using NEAI Emulator with live streaming.</para></caption>
<graphic xlink:href="graphics/ch12-fig08.jpg"/>
</fig>
<para>In EI, the trained model was evaluated by assessing the accuracy using the test dataset. To ensure unbiased evaluation of model effectiveness, the test samples were not used directly or indirectly during training. The EI emulator took care of extracting the features from the test set, running the trained model, and reporting the performance in the confusion matrix. The results are shown in <link linkend="ch12-F9">Figure 12.9</link>.</para>
<fig id="ch12-F9" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 12.9:</emphasis> EI model testing with test datasets.</para></caption>
<graphic xlink:href="graphics/ch12-fig09.jpg"/>
</fig>
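<para>The confusion matrix reported by the emulator is conceptually simple to recompute. A minimal sketch, using made-up labels for six hypothetical test signals (one misclassified), is:</para>

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=5):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Made-up labels for six hypothetical test signals, one misclassified
cm = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
accuracy = np.trace(cm) / cm.sum()  # correctly classified / total
```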
</section>
<section class="lev2" id="ch12-4-5">
<title>12.4.5 Deployment</title>
<para>In the context of micro-edge embedded systems, model deployment depends on the hardware/software platform and is more or less automated; in essence, it comprises three steps. The first is a format conversion of the fully trained model. The second is a weight/model compression to reduce the amount of memory needed to store the weights on the target hardware platform and to simplify the computation so that it can run efficiently on the target processors. The third entails compiling the model and generating the code to be integrated with the MCU&#x2019;s firmware.</para>
<para>The back-end flow consists of wrapping an STM32CubeIDE project around the files generated by the three deployed models and adding functionality on top, such as retrieving the accelerometer values to be fed to the classification function and displaying the result; the project is then compiled, built, and flashed onto the MCU target.</para>
<para>The flow exhibits some particularities in the case of the three model deployments.</para>
<para>In the case of NEAI, the selected model is deployed in the form of a static library (libneai.a), an AI header file (NanoEdgeAI.h) containing function and variable definitions, and a knowledge header file (knowledge.h) containing the model&#x2019;s knowledge. In this case, the knowledge was first initialised, then the NanoEdge AI classifier was run, and the output was printed to the serial port.</para>
<para>For the EI deployment, the CMSIS-PACK [<link linkend="ch12-bib11">11</link>] [<link linkend="ch12-bib12">12</link>] for STM32 packaged all the signal processing, configuration and learning blocks into a single library (.pack file), which was then added to the STM32 project using the CubeMX package manager. This is currently only supported for C++ applications using CubeIDE.</para>
<para>The third flow was branched out from EI and further developed in a Python framework using TensorFlow&#x2019;s Keras API. The resulting model was converted into optimised C code with STM32Cube.AI, an extension of the CubeMX tool, which offers simple and efficient interoperability with other ML frameworks.</para>
</section>
<section class="lev2" id="ch12-4-6">
<title>12.4.6 Inference</title>
<para>Inference classifications have been conducted with all applications running directly on the target hardware platform of the micro-edge IIoT devices, producing classifications in real time.</para>
<para>The state machine consists mainly of two states with two functions, &#x201c;init&#x201d; and &#x201c;inferencing&#x201d;, respectively, with the former initialising the deep NN model and the latter being a continuously running function that collects raw data from the sensors on the micro-edge IIoT device and makes classifications in real time. A snapshot of the classification based on the NEAI model is shown in <link linkend="ch12-F10">Figure 12.10</link>.</para>
<fig id="ch12-F10" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 12.10:</emphasis> Live classification streaming with detected state and confidence (with Tera Term)</para></caption>
<graphic xlink:href="graphics/ch12-fig10.jpg"/>
</fig>
<para>The &#x201c;?&#x201d; indicates state switching, which happens after several consecutive confirmations of the inference result are encountered; this number of confirmations is programmable.</para>
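<para>The confirmation-based state switching described above can be sketched in Python. The class names, the confirmation count of 3 and the Debouncer helper are illustrative assumptions, not the actual firmware implementation.</para>

```python
CONFIRMATIONS = 3  # programmable number of consecutive confirmations (assumed)

class Debouncer:
    """Report a new state only after CONFIRMATIONS identical inferences;
    until a state is confirmed, None stands in for the question mark."""

    def __init__(self, n=CONFIRMATIONS):
        self.n, self.candidate, self.count, self.state = n, None, 0, None

    def update(self, predicted):
        if predicted == self.state:          # current state re-confirmed
            self.candidate, self.count = None, 0
        elif predicted == self.candidate:    # candidate state re-confirmed
            self.count += 1
            if self.count >= self.n:
                self.state, self.candidate, self.count = predicted, None, 0
        else:                                # a new candidate state appears
            self.candidate, self.count = predicted, 1
        return self.state

d = Debouncer()
states = [d.update(c) for c in ["OFF", "ON_50", "ON_50", "ON_50", "ON_50"]]
```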
</section>
</section>
<section class="lev1" id="ch12-5">
<title>12.5 Discussion and Future Work</title>
<para>Embedding trained models into the firmware code enables the AI/ML capabilities of intelligent edge devices. Employing different frameworks that permit the integration of complex AI mechanisms within MCUs (such as NEAI Studio, EI and STM32Cube.AI) for deploying AI-based PdM solutions on micro-edge embedded devices gives designers the flexibility to optimise the implementation by experimenting with deployment on the same hardware platform target using several frameworks and inference engines. The different workflows can be matched to the PdM application requirements for generating embedded code and performing learning and inference engine optimisations.</para>
<para>ML and NNs can now be efficiently deployed on resource-constrained devices, which enable cost-efficient deployment, widespread availability, and the preservation of sensitive data in PdM applications. However, the trade-offs associated with optimisation methods, software frameworks and hardware architecture on performance metrics, such as inference latency and energy consumption, are yet to be studied and researched in depth.</para>
<para>This preliminary work allowed for the exploration of different scenarios to evaluate trade-offs between computational cost and performance on actual classification tasks, laying the foundation for further investigations of more complex PdM systems using various AI-based techniques. Future work will aim to enlarge comparison and benchmarking by considering more edge ML and DL technologies, workflows, and datasets. A more generic and complete PdM strategy must include insights from other applications, such as anomaly detection, regression, and forecasting.</para>
</section>
<section class="lev1">
<title>Acknowledgements</title>
<para>This work is conducted under the framework of the ECSEL AI4DI &#x201c;Artificial Intelligence for Digitising Industry&#x201d; project. The project has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 826060. The JU receives support from the European Union&#x2019;s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Italy, Latvia, Belgium, Lithuania, France, Greece, Finland, Norway.</para>
</section>
<section class="lev1" id="ch12-Ref">
<title>References</title>
<para id="ch12-bib1">[1] R. Sanchez-Iborra and A.F. Skarmeta, &#x201c;TinyML-Enabled Frugal Smart Objects: Challenges and Opportunities,&#x201d; in <emphasis>IEEE Circuits and Systems Magazine</emphasis>, vol. 20, no. 3, pp. 4-18, third quarter 2020. <ulink url="https://doi.org/10.1109/MCAS.2020.3005467">https://doi.org/10.1109/MCAS.2020.3005467</ulink></para>
<para id="ch12-bib2">[2] T. Hafeez, L. Xu and G. Mcardle, &#x201c;Edge Intelligence for Data Handling and Predictive Maintenance in IIoT,&#x201c; in <emphasis>IEEE Access</emphasis>, Vol. 9, pp. 49355-49371, 2021. <ulink url="https://doi.org/10.1109/ACCESS.2021.3069137">https://doi.org/10.1109/ACCESS.2021.3069137</ulink></para>
<para id="ch12-bib3">[3] Y. Liu, W. Yu, T. Dillon, W. Rahayu and M. Li, &#x201c;Empowering IoT Predictive Maintenance Solutions With AI: A Distributed System for Manufacturing Plant-Wide Monitoring,&#x201c; in <emphasis>IEEE Transactions on Industrial Informatics</emphasis>, vol. 18, no. 2, pp. 1345-1354, Feb. 2022. <ulink url="https://doi.org/10.1109/TII.2021.3091774">https://doi.org/10.1109/TII.2021.3091774</ulink></para>
<para id="ch12-bib4">[4] H. Wang, H. Sayadi, S.M. Pudukotai Dinakarrao, A. Sasan, S. Rafatirad and H. Homayoun, &#x201c;Enabling Micro AI for Securing Edge Devices at Hardware Level,&#x201c; in <emphasis>IEEE Journal on Emerging and Selected Topics in Circuits and Systems</emphasis>, vol. 11, no. 4, pp. 803-815, Dec. 2021. <ulink url="https://doi.org/10.1109/JETCAS.2021.3126816">https://doi.org/10.1109/JETCAS.2021.3126816</ulink></para>
<para id="ch12-bib5">[5] F. Cipollini, L. Oneto, A. Coraddu, et al. &#x201c;Unsupervised Deep Learning for Induction Motor Bearings Monitoring&#x201d;. Data-Enabled Discov. Appl. 3, 1, 2019. <ulink url="https://doi.org/10.1007/s41688-018-0025-2">https://doi.org/10.1007/s41688-018-0025-2</ulink></para>
<para id="ch12-bib6">[6] M. Guenther. 6 Ways to Improve Electric Motor Lubrication for Better Bearing Reliability. Available online at: <ulink url="https://blog.chesterton.com/lubrication-maintenance/improving-electric-motor-lubricaiton/">https://blog.chesterton.com/lubrication-maintenance/improving-electric-motor-lubricaiton/</ulink></para>
<para id="ch12-bib7">[7] C. Kammerer, M. Gaust, M. K<math id="bib.bib7.m1" display="inline"><mover accent="true"><mi mathvariant="normal">u</mi><mo>&#xa8;</mo></mover></math>stner, P. Starke, R. Radtke, and A. Jesser, &#x201c;Motor Classification with Machine Learning Methods for Predictive Maintenance,&#x201c; <emphasis>IFAC-PapersOnLine</emphasis>, vol. 54, no. 1, pp. 1059&#x2013;1064, 2021. <ulink url="https://doi.org/10.1016/j.ifacol.2021.08.126">https://doi.org/10.1016/j.ifacol.2021.08.126</ulink></para>
<para id="ch12-bib8">[8] Edge Impulse. Available online at: <ulink url="https://www.edgeimpulse.com">https://www.edgeimpulse.com</ulink></para>
<para id="ch12-bib9">[9] EON Tuner. Available online at: <ulink url="https://docs.edgeimpulse.com/docs/eon-tuner">https://docs.edgeimpulse.com/docs/eon-tuner</ulink></para>
<para id="ch12-bib10">[10] J. Jongboom, 2020. &#x201c;Learning for all STM32 developers with STM32Cube.AI and Edge Impulse&#x201d;. Available online at: <ulink url="https://www.edgeimpulse.com/blog/machine-learning-for-all-stm32-developers-with-stm32cube-ai-and-edge-impulse">https://www.edgeimpulse.com/blog/machine-learning-for-all-stm32-developers-with-stm32cube-ai-and-edge-impulse</ulink></para>
<para id="ch12-bib11">[11] ARM-NN. 2020. Available online at: <ulink url="https://github.com/ARM-software/armnn">https://github.com/ARM-software/armnn</ulink></para>
<para id="ch12-bib12">[12] CMSIS-NN. 2020. Available online at: <ulink url="https://arm-software.github.io/CMSIS_5/NN/html/">https://arm-software.github.io/CMSIS_5/NN/html/</ulink></para>
<para id="ch12-bib13">[13] STM32Cube.AI 2020. Available online at: <ulink url="https://www.st.com/en/embedded-software/x-cube-ai.html">https://www.st.com/en/embedded-software/x-cube-ai.html</ulink></para>
<para id="ch12-bib14">[14] NanoEdge<sup>TM</sup> AI Studio. Automated Machine Learning (ML) tool for STM32 developers. Available online at: <ulink url="https://www.st.com/en/development-tools/nanoedgeaistudio.html">https://www.st.com/en/development-tools/nanoedgeaistudio.html</ulink></para>
</section>
</chapter>
<chapter class="chapter" id="ch13" label="13" xreflabel="13">
<title>AI-Driven Strategies to Implement a Grapevine Downy Mildew Warning System</title>
<subtitle>Luiz Angelo Steffenel<sup>1</sup>, Axel Langlet<sup>1</sup>, Lilian Hollard<sup>1</sup>, Lucas Mohimont<sup>1</sup>, Nathalie Gaveau<sup>1</sup>, Marcello Copola<sup>2</sup>, Cl&#x00E9;ment Pierlot<sup>3</sup>, and Marine Rondeau<sup>3</sup></subtitle>
<affiliation><sup>1</sup>Universit&#xe9; de Reims Champagne Ardenne, France<?lb?><sup>2</sup>STMicroelectronics, France<?lb?><sup>3</sup>Vranken-Pommery Monopole, France</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>In this paper, we assess the usage of machine learning techniques to predict infection events of Downy Mildew. Every year, Champagne vineyards are exposed to grapevine diseases that affect the plants and fruits, most of them caused by fungi. Using data from an agro-meteorological station, we compare the performance of machine learning against traditional prediction methods for Downy Mildew (<emphasis>Plasmopara viticola</emphasis>) infections. Depending on the year, we obtain 82 to 97% accuracy for primary infections and 98% for secondary infections. These results may guide the development of Edge AI applications integrated into meteorological stations and agricultural sensors, helping winegrowers to rationalize the vine&#x2019;s treatment and limiting both the damage and the usage of fungicides or chemical products.</para>
<para><emphasis role="strong">Keywords:</emphasis> artificial intelligence, Downy Mildew, CNN, random forest, SVM.</para>
</section>
<section class="lev1" id="ch13-1">
<title>13.1 Introduction</title>
<para>Every year, Champagne vineyards are exposed to grapevine fungal diseases that affect the plants and fruits. Black rot (<emphasis>Guignardia bidwellii</emphasis>), Downy mildew (<emphasis>Plasmopara viticola</emphasis>), Powdery mildew (<emphasis>Erysiphe necator</emphasis>), and Gray mold (<emphasis>Botrytis cinerea</emphasis>) are examples of diseases that can affect grape quality and hinder productivity. Each fungus develops under certain environmental conditions, and detecting conditions favourable to the spread of these diseases may enable proactive actions to prevent their dissemination.</para>
<para>In the specific case of Downy Mildew, caused by <emphasis>Plasmopara viticola</emphasis>, two cycles of infestation affect the grapevine. The first one is caused by sexual spores (called <emphasis>primary infections</emphasis>) and the second one by the dissemination of asexual spores (<emphasis>secondary infections</emphasis>) [<link linkend="ch13-bib4">4</link>].</para>
<para>The mechanistic identification of the fungus development cycles and their forecast has already been the subject of several works, including [<link linkend="ch13-bib8">8</link>], [<link linkend="ch13-bib5">5</link>], and [<link linkend="ch13-bib7">7</link>]. Several of these works define algorithms to identify primary or secondary infection events using a combination of weather and ground-observed variables, which led to the creation of decision-support systems for vine growers. However, these algorithms are limited to strict input parameters, which are not always available, and do not explore the potential of hidden correlations with other data variables such as dew point, cloud coverage, or vapor pressure deficit.</para>
<para>Artificial intelligence, on the other hand, relies on the data itself rather than on explicit models. It uses computing power to expand the search for patterns and correlations across a broader and richer dataset, often reaching similar or better results than existing models.</para>
<para>Despite its potential, artificial intelligence has rarely been used to identify Downy Mildew infections. Among the precursor works, we can cite Chen et al. [<link linkend="ch13-bib3">3</link>], who applied several regression models as well as random forests and gradient boosting to predict severe infection events in the Bordeaux vineyard. Volpi et al. [<link linkend="ch13-bib9">9</link>] also use decision trees and random forests to identify different diseases in Tuscany, Italy, but rely on meteorological data from ERA5-Land instead of on-site sensors.</para>
<para>Interestingly, artificial intelligence is more commonly used to monitor crops through imaging systems than through weather sensors. For instance, [<link linkend="ch13-bib1">1</link>] [<link linkend="ch13-bib2">2</link>] use image recognition techniques to identify the intensity of infections on watermelon or squash crops using hyperspectral images from aerial views. Another work [<link linkend="ch13-bib6">6</link>] uses Convolutional Neural Networks to detect <emphasis>Plasmopara viticola</emphasis> spores in microscopic images.</para>
<para>In this paper, we explore the use of machine learning techniques to identify Downy Mildew infections using datasets obtained from standard agro-meteorological sensors. Our aim is both to identify the most efficient and robust methods and to prepare the path for their implementation on Edge AI devices deployed directly in the vineyards.</para>
<para>The remainder of this paper is organized as follows: Section 13.2 presents the datasets and research methodology used in this work. Section 13.3 introduces the different machine learning techniques used, as well as their implementation details. In Section 13.4, we present a comparative study of the machine learning strategies, evaluating both their accuracy and their robustness over the years. Section 13.5 goes beyond the raw results by discussing the impact of AI-based algorithms on crop monitoring. Finally, Section 13.6 concludes this work.</para>
</section>
<section class="lev1" id="ch13-2">
<title>13.2 Research Material and Methodology</title>
<section class="lev2" id="ch13-2-1">
<title>13.2.1 Datasets</title>
<para>The data used in this paper was obtained from a Prom&#xe9;t&#xe9; AGRI-300 weather station installed at the &#x201c;Moulin de la Housse&#x201d; vineyard of the Vranken-Pommery group in Reims<sup>1</sup>. This station provides hourly readings of several features of interest:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>Wind speed [Km/h] (max, average)</para></listitem>
<listitem><para>Wind gust [Km/h] (max)</para></listitem>
<listitem><para>Relative humidity [%] (max, min, average)</para></listitem>
<listitem><para>Pluviometry [l/m<sup>2</sup>]</para></listitem>
<listitem><para>Leaf wetting duration [min]</para></listitem>
<listitem><para>Dew point [C] (min, average)</para></listitem>
<listitem><para>Solar radiation [W/m<sup>2</sup>] (average)</para></listitem>
<listitem><para>Air temperature [C] (max, min, average)</para></listitem>
<listitem><para>Vapor pressure deficit [kPa] (min, average)</para></listitem>
</itemizedlist>
<para>More than 20,000 entries were recorded for each feature from 2019 to 2021, except for the leaf wetting duration, which could only be recorded in 2019/2020 as the sensor stopped working in February 2021.</para>
<para>The machine learning approaches presented here were implemented, optimized, and evaluated on an Nvidia DGX-1 server that includes eight Tesla V100 GPUs connected through an NVLink network supporting up to 40 GB/s bidirectional bandwidth. Regarding programming tools, we implemented our approaches in Python with the scikit-learn, TensorFlow, and Keras libraries.</para>
</section>
<section class="lev2" id="ch13-2-2">
<title>13.2.2 Labelling Methodology</title>
<para>To train machine learning models to identify Mildew-favourable situations, we adopted a supervised learning approach. To label the training dataset, we applied the algorithms proposed by [<link linkend="ch13-bib7">7</link>]. Two different Mildew infection alert situations are identified in that work, each one with strict requirements. Primary infections are related to the conditions for winter spores&#x2019; germination, which may occur when the average daily temperature exceeds 10 <math id="Ch13.S2.SS2.p1.m1" display="inline"><msup><mi></mi><mo>&#x2218;</mo></msup></math>C and the precipitation within the last 48h reaches 10 mm (the &#x201c;3-10&#x201d; flag). If rainfall or a gentle breeze (i.e., wind speed greater than 3.4 m/s) occurs at night within the following 48h, a primary infection has presumably occurred, starting the incubation period of <emphasis>Plasmopara viticola</emphasis>. <link linkend="ch13-F1">Figure 13.1</link> schematizes this algorithm.</para>
<para>Secondary mildew infections may happen once the incubation period from the first infection has been completed. They depend on favourable night conditions (FNCs), where the weather is humid (relative humidity (RH) <math id="Ch13.S2.SS2.p2.m1" display="inline"><mo>&gt;</mo></math> 80%) and the temperature is higher than 12<math id="Ch13.S2.SS2.p2.m2" display="inline"><msup><mi></mi><mo>&#x2218;</mo></msup></math>C for at least 2h. In such a case, the secondary infection warning is raised if we also observe more than 2h of uninterrupted leaf wetness (LW) and an average temperature (T) above 10<math id="Ch13.S2.SS2.p2.m3" display="inline"><msup><mi></mi><mo>&#x2218;</mo></msup></math>C, with precipitation or strong wind that can increase spore spread. <link linkend="ch13-F2">Figure 13.2</link> schematizes this algorithm.</para>
<para>Thanks to these two algorithms, we create two binary labels, one for primary alert and the other for secondary alert, used in independent classification models. These labels are only used during the training phase, as our objective is to obtain accurate predictions based on the raw input data from the weather station sensors.</para>
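<para>For illustration, the core threshold tests of these two labelling rules can be sketched in plain Python. This is our own minimal sketch (function names are ours); it covers only the temperature/rain and humidity/temperature conditions and omits the night-time and incubation bookkeeping of the full algorithms in Figures 13.1 and 13.2:</para>

```python
def primary_flag(avg_daily_temp_c, rain_last_48h_mm):
    """'3-10' flag: winter spores may germinate when the average daily
    temperature exceeds 10 C and the 48h rainfall reaches 10 mm."""
    return avg_daily_temp_c > 10.0 and rain_last_48h_mm >= 10.0

def favourable_night_conditions(hourly_rh, hourly_temp_c):
    """FNC test: relative humidity > 80% and temperature > 12 C
    for at least two consecutive hours."""
    run = 0
    for rh, temp in zip(hourly_rh, hourly_temp_c):
        run = run + 1 if (rh > 80.0 and temp > 12.0) else 0
        if run >= 2:
            return True
    return False
```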
<fig id="ch13-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 13.1:</emphasis> Algorithm for primary infection alarms [<link linkend="ch13-bib7">7</link>]</para></caption>
<graphic xlink:href="graphics/ch13-fig01.jpg"/>
</fig>
<fig id="ch13-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 13.2:</emphasis> Algorithm for secondary infection alarms [<link linkend="ch13-bib7">7</link>]</para></caption>
<graphic xlink:href="graphics/ch13-fig02.jpg"/>
</fig>
</section>
</section>
<section class="lev1" id="ch13-3">
<title>13.3 Machine Learning Models</title>
<para>This section presents different strategies to model the Downy Mildew warning system using machine learning techniques. As presented in Section 13.2, our dataset covers three years (2019-2021) and includes several features directly related to the algorithms from [<link linkend="ch13-bib7">7</link>], such as temperature, relative humidity, pluviometry, wind speed, and leaf wetness. Other algorithm variables were adapted from the available data; for instance, the absence of solar radiation (a measurement provided by the weather station) was used as an indicator of night time instead of a calculation based solely on the date.</para>
<para>We deliberately kept other variables not cited in the original algorithms, such as the dew point and the vapor pressure deficit. As stated before, our aim is to explore potential correlations with additional variables. Similarly, we do not compare the accuracy against the real risks in the vineyard but only against the expected labels. Performing such a comparison requires an on-site evaluation and separate tagging by a human operator, which is part of our future work.</para>
<para>Another point to consider is how to feed the dataset to the models, as alerts depend on historical events from at least the last 48h. Instead of using more complex time-series models such as LSTM or GRU, we chose to feed the algorithms with a concatenation of the features recorded over the last 48h. This approach allows us to express the problem in a simpler way that can be approached using a wider range of machine learning techniques, including some better adapted to constrained environments such as those in an Edge AI scenario.</para>
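<para>As a sketch, assuming the hourly records are already aligned lists of per-hour feature vectors, this 48h concatenation can be written as follows (helper name is ours):</para>

```python
def make_windows(hourly_rows, hours=48):
    """Flatten the last `hours` hourly feature vectors into a single
    input vector for each prediction time step."""
    windows = []
    for end in range(hours, len(hourly_rows) + 1):
        flat = []
        for row in hourly_rows[end - hours:end]:
            flat.extend(row)  # concatenate consecutive hourly readings
        windows.append(flat)
    return windows
```

<para>Each flattened window then serves as one training or inference example for the classifiers below.</para>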
<para>As a result, we model the problem as a binary classification problem, i.e., for each level of infection alert (primary or secondary), we create separate &#x201c;alert&#x201d;/&#x201c;not alert&#x201d; labels. We decided to split it into two binary classification problems instead of a single multi-class classification problem to favour each alert type&#x2019;s accuracy. We therefore compare five well-known binary classification techniques:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>Decision trees</para></listitem>
<listitem><para>Random forest</para></listitem>
<listitem><para>Support Vector Machines (SVM)</para></listitem>
<listitem><para>Dense Neural Networks (DNN)</para></listitem>
<listitem><para>Convolutional Neural Networks (CNN)</para></listitem>
</itemizedlist>
<para>Decision Tree and Support Vector Machine predictors use the basic <emphasis>scikit-learn</emphasis> implementations (<emphasis>DecisionTreeClassifier</emphasis> and <emphasis>SVC</emphasis>, respectively) with no additional optimisation. Random Forests (<emphasis>RandomForestClassifier</emphasis>) were trained with the parameter <emphasis>n_estimators = 1000</emphasis>.</para>
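<para>A minimal sketch of instantiating these three classifiers with scikit-learn. The number of trees is made a parameter here purely for illustration; the experiments use 1000:</para>

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

def build_models(n_estimators=1000):
    """The three non-neural classifiers compared in this work,
    with default settings except the forest's number of trees."""
    return {
        "decision_tree": DecisionTreeClassifier(),
        "svm": SVC(),
        "random_forest": RandomForestClassifier(n_estimators=n_estimators),
    }
```

<para>Each model exposes the same <emphasis>fit</emphasis>/<emphasis>predict</emphasis> interface, so the same training and evaluation loop can be reused across techniques.</para>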
<para>The Dense Neural Network was implemented in Keras using seven Dense layers with 200, 100, 100, 50, 50, 10, and 2 outputs, respectively. The <emphasis>ReLU</emphasis> activation was used in all but the last layer (<emphasis>None</emphasis>); the model was compiled with the <emphasis>Binary Crossentropy (from_logits=True)</emphasis> loss, the Adam optimizer, and the <emphasis>Binary Accuracy</emphasis> metric.</para>
<para>Finally, the Convolutional Neural Network was implemented in Keras, using at the input two Conv2D layers (32 and 64 outputs, respectively) with 3x3 kernels, 0.2 dropout, and ReLU activation. Once flattened, a Dense layer with 100 outputs, 0.5 dropout, and ReLU activation sits just before a final Dense layer with 2 outputs (Sigmoid activation).</para>
<para>As the dataset only covers three years, we adopted a cross-validation approach where, for each technique, we generated a different model for each year (2019, 2020, or 2021). Each model was trained only with the data from its own year, split into 90<math id="Ch13.S3.p9.m1" display="inline"><mo>%</mo></math> training and 10<math id="Ch13.S3.p9.m2" display="inline"><mo>%</mo></math> testing parts (randomly shuffled), and later submitted to cross-validation against the other years. Not only does cross-validation help identify the most robust model, but it also allows us to investigate the impact of the 2021 weather profile, which differed from the two previous years due to several climatic events (early crop freeze, rainy weather) that favoured the spread of diseases and led to a massive reduction in crop production and quality.</para>
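<para>The per-year shuffled 90%/10% split can be sketched as follows (plain Python, helper names are ours); each year&#x2019;s trained model is then evaluated on the other years&#x2019; full datasets:</para>

```python
import random

def year_split(records, labels, test_frac=0.10, seed=42):
    """Shuffle one year's records and split them into 90% training
    and 10% testing parts, keeping features and labels aligned."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_frac))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([records[i] for i in train_idx], [labels[i] for i in train_idx],
            [records[i] for i in test_idx], [labels[i] for i in test_idx])
```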
</section>
<section class="lev1" id="ch13-4">
<title>13.4 Results</title>
<section class="lev2" id="ch13-4-1">
<title>13.4.1 Primary Mildew Infection Alerts</title>
<para>As stated above, we created three different training-validation datasets, one for each year. <link linkend="ch13-T1">Table 13.1</link> compares the accuracy scores of the 2019 model when applied to 2020 and 2021. The best scores are presented in bold, showing that two techniques stand out from the others: CNN and SVM. CNN shows slightly better scores on the 2021 dataset but is closely followed by SVM.</para>
<para>In the case of the 2020 model, Random Forest and SVM perform well on the 2019 data, and almost all techniques (except the simple Decision Tree) present similar results for 2021 (see <link linkend="ch13-T2">Table 13.2</link>). Finally, for the 2021 model, Random Forest seems to be the best technique on the 2019 dataset, while SVM is better on the 2020 dataset (<link linkend="ch13-T3">Table 13.3</link>). We can, however, point out that Random Forest also achieves good results in this latter case, even if not as good as the SVM scores. While the &#x201c;best&#x201d; technique varies from year to year, both SVM and CNN show robust results, closely followed by Random Forest. The choice therefore depends on the computing capabilities available to the devices.</para>
<para>We can also see that 2021 was different from the previous years. While models from 2019 or 2020 achieve lower scores when predicting 2021 alerts, models trained with 2021 data are among the best when predicting alerts for the previous years. This was somewhat expected, as 2021 was rich in events favourable to the spread of diseases in the vineyard.</para>
<fig id="ch13-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 13.1:</emphasis> Accuracy of 2019 Primary Infection Models</para></caption>
<graphic xlink:href="graphics/ch13-tab01.jpg"/>
</fig>
<fig id="ch13-T2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 13.2:</emphasis> Accuracy of 2020 Primary Infection Models</para></caption>
<graphic xlink:href="graphics/ch13-tab02.jpg"/>
</fig>
<fig id="ch13-T3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 13.3:</emphasis> Accuracy of 2021 Primary Infection Models</para></caption>
<graphic xlink:href="graphics/ch13-tab03.jpg"/>
</fig>
</section>
<section class="lev2" id="ch13-4-2">
<title>13.4.2 Secondary Mildew Infection Alerts</title>
<para>As the meteorological station stopped recording leaf wetness in February 2021, we could not tag secondary Mildew infections on the 2021 dataset. Nonetheless, we compare the 2019 and 2020 models in cross-validation, as we previously did for the primary Mildew infections.</para>
<para>Hence, <link linkend="ch13-T4">Table 13.4</link> condenses the results from all machine learning techniques when cross-validating each year&#x2019;s models. Secondary infection alerts seem much easier to identify, with higher accuracy scores. Unfortunately, the absence of a 2021 dataset does not allow a broader comparison under different weather conditions (2021 presented the lowest accuracy in the primary alert experiments).</para>
<fig id="ch13-T4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 13.4:</emphasis> Accuracy of Secondary Infection Models</para></caption>
<graphic xlink:href="graphics/ch13-tab04.jpg"/>
</fig>
<para>Once again, CNN presents the highest accuracy scores, closely followed by SVM and Random Forest. Indeed, we should point out that SVM and Random Forest are good candidates when considering implementation in environments with performance restrictions, such as IoT/Edge AI devices.</para>
</section>
</section>
<section class="lev1" id="ch13-5">
<title>13.5 Discussion</title>
<para>The results obtained here are encouraging but must be considered in the context of the reduced span of the dataset, gathered from a single agro-meteorological station installed in 2019. A deeper analysis would require several years of data, as performed by [<link linkend="ch13-bib3">3</link>] or [<link linkend="ch13-bib9">9</link>].</para>
<para>However, our main objective was to build a proof of concept within the efforts of the European project AI4DI to develop and disseminate an environmental monitoring system based on different industrial sensors (e.g., TEROS, Bosch BME68x, STMicroelectronics) connected to an STM32WL microcontroller enhanced by a machine learning core. These sensors are expected to enable continuous monitoring of the environment, the soil, meteorological conditions, and/or plant performance.</para>
<para>Besides implementing AI models on the STM32WL, some sensors can also be enriched with a machine learning core. This is the case of the LSM6DSOX sensor from STMicroelectronics, which comprises a set of configurable parameters and decision trees able to run AI algorithms in the sensor itself. Hence, this environment would benefit from simpler models such as Random Forest and SVM, rather than CNN.</para>
<para>Today, while many agricultural meteorological stations are available on the market, innovation comes from implementing Edge AI directly on the sensors or, in some cases, in the gateways. The current work therefore represents a first effort to identify good and robust models that could be deployed in an Edge AI environment.</para>
</section>
<section class="lev1" id="ch13-6">
<title>13.6 Conclusion</title>
<para>Every year, Champagne vineyards are exposed to grapevine diseases that affect the plants and fruits, and Downy Mildew, caused by <emphasis>Plasmopara viticola</emphasis>, is among the most common. Forecasting Downy Mildew infection events may help vine growers rationalize the treatment of the vine, limiting both the damage and the usage of fungicides or chemical products.</para>
<para>In this paper, we compare the accuracy of several machine learning techniques when applied to datasets from the Champagne region. By creating multiple models and using cross-validation across different years, we were able to identify three candidate techniques with close results, namely Convolutional Neural Networks, Support Vector Machines and Random Forest.</para>
<para>While CNN seems to be more robust across different years, the accuracy difference is minimal, and the other techniques remain of interest for deployment over an Edge AI infrastructure. Indeed, we aim to prepare the path to the implementation of Downy Mildew forecast models on Edge AI sensing devices deployed directly in the vineyards to closely monitor the crops.</para>
</section>
<section class="lev1">
<title>Acknowledgements</title>
<para>This work has been performed in the project AI4DI: Artificial Intelligence for Digitizing Industry, under grant agreement No 826060. The project is co-funded by grants from Germany, Austria, Finland, France, Norway, Latvia, Belgium, Italy, Switzerland, and the Czech Republic, and by the Electronic Component Systems for European Leadership Joint Undertaking (ECSEL JU).</para>
<para>We want to thank Vranken-Pommery Monopole for providing the datasets for the training. We also thank the ROMEO Computing Center<sup>2</sup> of Universit&#xe9; de Reims Champagne Ardenne, whose Nvidia DGX-1 server allowed us to accelerate the training steps and compare several model approaches.</para>
</section>
<section class="lev1" id="ch13-Ref">
<title>References</title>
<para id="ch13-bib1">[1] J. Abdulridha, Y. Ampatzidis, J. Qureshi, and P. Roberts. Identification and classification of downy mildew severity stages in watermelon utilizing aerial and ground remote sensing and machine learning. <emphasis>Frontiers in Plant Science</emphasis>, 13, 2022.</para>
<para id="ch13-bib2">[2] J. Abdulridha, Y. Ampatzidis, P. Roberts, S. C. Kakarla. Detecting powdery mildew disease in squash at different stages using UAV-based hyperspectral imaging and artificial intelligence. <emphasis>Biosystems Engineering</emphasis>, 197:135&#x2013;148, 2020.</para>
<para id="ch13-bib3">[3] M. Chen, F. Brun, M. Raynal, and D. Makowski. Forecasting severe grape downy mildew attacks using machine learning. <emphasis>PLOS ONE</emphasis>, 15:1&#x2013;20, 03 2020.</para>
<para id="ch13-bib4">[4] C. Gessler, I. Pertot, and M. Perazzolli. Plasmopara viticola: A review of knowledge on downy mildew of grapevine and effective disease management. <emphasis>Phytopathologia Mediterranea</emphasis>, 50:3&#x2013;44, 04 2011.</para>
<para id="ch13-bib5">[5] E. Gonzalez-Dom&#xed;nguez, T. Caffi, N. Ciliberti, and V. Rossi. A mechanistic model of botrytis cinerea on grapevines that includes weather, vine growth stage, and the main infection pathways. <emphasis>PLOS ONE</emphasis>, 10(10):1&#x2013;23, 10 2015.</para>
<para id="ch13-bib6">[6] I. Hern&#xe1;ndez, S. Guti&#xe9;rrez, S. Ceballos, R. I&#xf1;iguez, I. Barrio, and J. Tardaguila. Artificial intelligence and novel sensing technologies for assessing downy mildew in grapevine. <emphasis>Horticulturae</emphasis>, 7(5), 2021.</para>
<para id="ch13-bib7">[7] I. Mezei, M. Lukic, L. Berbakov, B. Pavkovic, and B. Radovanovic. Grapevine downy mildew warning system based on nb-iot and energy harvesting technology. <emphasis>Electronics</emphasis>, 11(3), 2022.</para>
<para id="ch13-bib8">[8] V. Rossi, T. Caffi, S. Giosue, and R. Bugiani. A mechanistic model simulating primary infections of downy mildew in grapevine. <emphasis>Ecological Modelling</emphasis>, 212(3):480&#x2013;491, 2008.</para>
<para id="ch13-bib9">[9] I. Volpi, D. Guidotti, M. Mammini, and S. Marchi. Predicting symptoms of downy mildew, powdery mildew, and graymold diseases of grapevine through machine learning. <emphasis>Italian Journal of Agrometeorology</emphasis>, (2):57&#x2013;69, Dec. 2021.</para>
</section>
<para><sup>1</sup>Data could be provided upon request</para>
<para><sup>2</sup><ulink url="https://romeo.univ-reims.fr">https://romeo.univ-reims.fr</ulink></para>
</chapter>
<chapter class="chapter" id="ch14" label="14" xreflabel="14">
<title>On the Verification of Diagnosis Models</title>
<subtitle>Franz Wotawa and Oliver Tazl</subtitle>
<affiliation>Graz University of Technology, Austria</affiliation>
<section class="lev1">
<title>Abstract</title>
<para>Enhancing systems with advanced diagnostic capabilities for detecting, locating, and compensating faults during operation increases autonomy and reliability. To assure that the diagnosis-enhanced system really has improved reliability, we need &#x2013; besides other means &#x2013; to check the correctness of the diagnosis functionality. In this paper, we contribute to this challenge and discuss the application of testing to the case of model-based diagnosis, where we focus on testing the system models used for fault detection and localization. We present a simple use case and provide a step-by-step discussion on introducing testing, its capabilities, and arising issues. We come up with several challenges that we should tackle in future research.</para>
<para><emphasis role="strong">Keywords:</emphasis> model-based diagnosis, testing, verification and validation</para>
</section>
<section class="lev1" id="ch14-1">
<title>14.1 Introduction</title>
<para>Every system comprising hardware faces the problem of degradation under operation, which impacts its behavior over time. To prevent unwanted behavior that may lead to harm, we have to carry out regular maintenance tasks. Maintenance includes preventive activities like changing the tires of cars when their surfaces do not meet regulations anymore and looking at errors occurring during operation. The latter requires root cause identification, i.e., searching for components we have to repair for failure recovery. There is no doubt that the maintenance and diagnosis of engineered systems are of practical importance and, therefore, worth being considered in research.</para>
<para>If we aim to support maintenance personnel carrying out diagnoses, we need to automate the fault detection and localization activities. Since the beginning of artificial intelligence, diagnosis has been an active research field, leading to expert systems and later to model-based diagnosis. The idea behind model-based diagnosis is to use system models for localizing the root causes of detected failures. Early work includes the papers of Davis and colleagues [<link linkend="ch14-bib3">3</link>] discussing the basic ideas and concepts behind model-based reasoning. Later, Reiter [<link linkend="ch14-bib15">15</link>] formalized the idea utilizing first-order logic. Based on these foundations, several authors have discussed applications of model-based reasoning for solving real-world problems. Applications range from power supply networks [<link linkend="ch14-bib1">1</link>], the automotive domain [<link linkend="ch14-bib13">13</link>], space probes [<link linkend="ch14-bib14">14</link>], robotics [<link linkend="ch14-bib7">7</link>], and self-adaptive systems [<link linkend="ch14-bib16">16</link>], to debugging [<link linkend="ch14-bib6">6</link>]. For a more recent paper, we refer to Wotawa and Kaufmann [<link linkend="ch14-bib22">22</link>], where the authors show how advanced reasoning systems can be used for computing diagnoses. For recent applications of diagnosis in the context of cyber-physical systems, see [<link linkend="ch14-bib9">9</link>, <link linkend="ch14-bib23">23</link>, <link linkend="ch14-bib21">21</link>, <link linkend="ch14-bib20">20</link>].</para>
<para>In the following, we illustrate the basic ideas and concepts of model-based reasoning using a small example circuit comprising a battery <math id="Ch14.S1.p3.m1" display="inline"><mi>B</mi></math>, a switch <math id="Ch14.S1.p3.m2" display="inline"><mi>S</mi></math>, and two bulbs <math id="Ch14.S1.p3.m3" display="inline"><msub><mi>L</mi><mn>1</mn></msub></math>, <math id="Ch14.S1.p3.m4" display="inline"><msub><mi>L</mi><mn>2</mn></msub></math>. We depict the circuit in <link linkend="ch14-F1">Figure 14.1</link>. If we switch on <math id="Ch14.S1.p3.m5" display="inline"><mi>S</mi></math>, we expect both bulbs to transmit light when we assume the correctness of every component. It is important to consider such correctness assumptions. For example, if we switch on <math id="Ch14.S1.p3.m6" display="inline"><mi>S</mi></math>, and only one bulb (e.g., <math id="Ch14.S1.p3.m7" display="inline"><msub><mi>L</mi><mn>1</mn></msub></math>) is on, and the other (e.g., <math id="Ch14.S1.p3.m8" display="inline"><msub><mi>L</mi><mn>2</mn></msub></math>) is not, we conclude a broken bulb. But how can we do this? We may consider a model for each component, e.g., a correct battery provides electricity, a switch in the on state takes the electricity from the battery and transmits it to the bulbs, and a correct bulb produces light if there is electricity available. When we assume that all components are working, we receive a contradiction from this model. This is due to bulb <math id="Ch14.S1.p3.m9" display="inline"><msub><mi>L</mi><mn>2</mn></msub></math> that should produce light but we do not observe it. If we assume all components except <math id="Ch14.S1.p3.m10" display="inline"><msub><mi>L</mi><mn>2</mn></msub></math> to be correct, there is no contradiction anymore, and we have identified the root cause, i.e., <math id="Ch14.S1.p3.m11" display="inline"><msub><mi>L</mi><mn>2</mn></msub></math>.</para>
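<para>The reasoning in this example can be sketched as a small consistency check in Python. This is a toy encoding of the circuit under our own naming, not the first-order formalization of [<link linkend="ch14-bib15">15</link>]: we enumerate correctness assumptions and keep the minimal sets of assumed-faulty components whose predicted behavior matches the observation:</para>

```python
from itertools import combinations

COMPONENTS = ["B", "S", "L1", "L2"]

def predict(ok):
    """Forward simulation under the correctness assumptions `ok`:
    a correct battery provides power, a correct (closed) switch
    passes it on, and a correct bulb lights up when powered."""
    through = ok["B"] and ok["S"]
    return {"L1": through and ok["L1"], "L2": through and ok["L2"]}

def diagnoses(observed, max_faults=1):
    """Return the minimal sets of components whose assumed failure
    makes the model consistent with the observed bulb states."""
    for size in range(max_faults + 1):
        found = [set(bad) for bad in combinations(COMPONENTS, size)
                 if predict({c: c not in bad for c in COMPONENTS}) == observed]
        if found:
            return found
    return []
```

<para>With the switch on and only the first bulb lighting, the single-fault diagnosis is the second bulb, matching the argument above; observing both bulbs dark instead yields the battery or the switch as candidate root causes.</para>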
<fig id="ch14-F1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 14.1:</emphasis> A simple electric circuit comprising bulbs, a switch and a battery.</para></caption>
<graphic xlink:href="graphics/ch14-fig01.jpg"/>
</fig>
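<para>The informal reasoning above can be made concrete in a few lines of code. The following is a simplified Python sketch (not the Clingo model used later in this chapter); the function names are our own, and it assumes a strong fault model in which a faulty battery or switch transmits no power.</para>

```python
from itertools import combinations

COMPONENTS = ["B", "S", "L1", "L2"]

def consistent(healthy, obs):
    """Check whether assuming the components in `healthy` to be correct
    is consistent with the observations (strong fault model: a faulty
    battery or switch is assumed to transmit no power)."""
    power = "B" in healthy                       # battery output nominal
    through = power and obs["switch_on"] and "S" in healthy
    for bulb in ("L1", "L2"):
        if bulb in healthy and through and not obs["light_" + bulb]:
            return False   # model predicts light, but none is observed
        if obs["light_" + bulb] and not through:
            return False   # observed light implies available electricity
    return True

def diagnoses(obs):
    """Return all minimal-cardinality sets of faulty components that
    restore consistency with the observations."""
    for size in range(len(COMPONENTS) + 1):
        found = [faulty for faulty in combinations(COMPONENTS, size)
                 if consistent(set(COMPONENTS) - set(faulty), obs)]
        if found:
            return found   # stop at the smallest consistent fault sets
    return []

# Switch on, L1 lit, L2 dark: the single minimal diagnosis is {L2}.
print(diagnoses({"switch_on": True, "light_L1": True, "light_L2": False}))
```

<para>For the observations of the motivating example, the only minimal diagnosis is the dark bulb, matching the argument in the text.</para>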
<para>A prerequisite of model-based diagnosis is the availability of a system model (or model, in short). Modeling is not a trivial task. For model-based diagnosis, we need models formulated in a language that a reasoning system can use for deriving logical conclusions. Models are abstract representations of the system structure and behavior. Only parts of the system classified as components in the model can be part of a derived root cause; wires or connectors, for example, need to be stated as components if we want them included in a diagnosis. It is also worth noting that we can use uncertainty in model-based diagnosis. De Kleer and Williams [<link linkend="ch14-bib4">4</link>] formalized the use of fault probabilities of components for searching for the most probable diagnosis. In addition, they introduced an algorithm for selecting optimal probing locations, minimizing the number of probing steps needed to identify a single diagnosis.</para>
<para>In this paper, we do not focus on the diagnosis methods and processes themselves. Instead, we discuss how to verify diagnosis models. The challenge of model verification is of utmost importance for assuring that systems equipped with diagnosis functionality work correctly. Although we may use some of the presented results for verifying diagnosis models generated by machine learning, we consider only models for model-based reasoning in the context of this paper. For testing machine learning, we refer the interested reader to a recent survey [<link linkend="ch14-bib24">24</link>].</para>
<para>The challenge of testing model-based diagnosis and other logic-based reasoning systems is not entirely new. Wotawa [<link linkend="ch14-bib17">17</link>] introduced the use of combinatorial testing and fault injection for testing self-adaptive systems based on models. The same author also discussed the use of combinatorial testing and metamorphic testing for theorem provers in [<link linkend="ch14-bib18">18</link>] and the general challenge of automating such tests in [<link linkend="ch14-bib19">19</link>]. In all of these papers, the focus is on testing the implementation and not the underlying models. Koroglu and Wotawa [<link linkend="ch14-bib10">10</link>] also contributed to the challenge of verifying the reasoning system but focused on the underlying compiler that allows reading in logic theories, i.e., system models. Hence, testing the system models used for diagnosis is still an open challenge worth tackling for quality assurance.</para>
<para>We organize this paper as follows: In Section 14.2, we introduce the testing challenge in detail, including a first solution. Afterward, we present the results of using the provided solution in a small case study. Finally, we discuss open issues and further challenges, and conclude the paper.</para>
</section>
<section class="lev1" id="ch14-2">
<title>14.2 The Model Testing Challenge</title>
<para>Before discussing the model testing challenge in detail, we briefly summarize model-based diagnosis and the required information. In <link linkend="ch14-F2">Figure 14.2</link> we depict the basic architecture behind every model-based diagnosis system. On the right side, we have a (physical) system from which we extract observations. On the upper left side, we have a model of the system. This model shall represent the system in a way such that expected observations can be derived. The model and the observations are passed to a diagnosis engine, which tries to find an assignment of health states to components such that no contradiction can be derived. In the simplest case, we only know the correct behavior of components. We use a logic predicate <emphasis role="strong">nab<math id="Ch14.S2.p1.m1" display="inline"><mrow><mi></mi><mo mathvariant="normal">/</mo><mn mathvariant="normal">1</mn></mrow></math></emphasis> to represent the corresponding health state. The diagnosis engine itself is assumed to be based on either a theorem prover or a constraint solver. It delivers a set of diagnoses, where each diagnosis is a set of faulty components. If the set of diagnoses comprises the empty set, we know that all components are working as expected.</para>
<fig id="ch14-F2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 14.2:</emphasis> The model-based diagnosis principle and information needed for testing.</para></caption>
<graphic xlink:href="graphics/ch14-fig02.jpg"/>
</fig>
<para>It is worth noting that in the context of this paper, we are not interested in outlining the details of model-based diagnosis, its modeling principles, and algorithms. We solely focus on testing, and specifically on testing the system model. What we take away from <link linkend="ch14-F2">Figure 14.2</link> are the inputs and outputs of the diagnosis engine: the model, the observations, and the computed diagnoses. If we want to verify the implementation of the diagnosis engine, we can use models and observations together with the corresponding expected diagnoses to define a test case. However, when we want to test the models, which are usually divided into two parts, the component models and the structure of the system, we have to think further about underlying assumptions and prerequisites.</para>
<para>First, we have to assume that the diagnosis engine itself is correct. This means that the diagnosis engine is delivering the right diagnoses for a given model and observations. Testing the implementation of the diagnosis engine might also comprise testing the underlying theorem prover or constraint solver, the implementation of the diagnosis algorithm, and the compiler that is used to load a model and the observations into the diagnosis engine.</para>
<para>Second, the observations themselves describe the data that have been observed from the system. Usually, we do not use the raw data obtained from the system directly. The data is usually mapped to logical representations. Because we are only focusing on the verification of models used for diagnosis, there might also be faults occurring that originate from the mapping of data to their logical representations. For verifying the model, we do not need to deal with this topic. We can stay with the abstract representation of real observations for testing.</para>
<para>Finally, we assume that models can be divided into component models and structural models. We further assume that the component models are generally valid and can be used in several systems. This assumption is of particular importance because one argument in favor of model-based diagnosis is its flexibility in adapting to different systems and its model re-use capabilities.</para>
<para>Let us now come up with a definition of the challenge of testing diagnosis models where we have the following information given:</para>
<itemizedlist mark="none" spacing="normal">
<listitem><para>1. A model <math id="Ch14.S2.I1.i1.p1.m1" display="inline"><mi>M</mi></math> for components of given types and their connections.</para></listitem>
</itemizedlist>
<para>For testing we want to have the following:</para>
<itemizedlist mark="none" spacing="normal">
<listitem><para>1. A set of systems <math id="Ch14.S2.I2.i1.p1.m1" display="inline"><mi mathvariant="normal">&#x3a3;</mi></math> and for each system <math id="Ch14.S2.I2.i1.p1.m2" display="inline"><mrow><mi>S</mi><mo>&#x2208;</mo><mi mathvariant="normal">&#x3a3;</mi></mrow></math> a model <math id="Ch14.S2.I2.i1.p1.m3" display="inline"><msub><mi>M</mi><mi>S</mi></msub></math> representing the structure, i.e., its components and connections.</para></listitem>
<listitem><para>2. For each system <math id="Ch14.S2.I2.i2.p1.m1" display="inline"><mi>S</mi></math>, we want to have a set of inputs, i.e., possible observations, and a set of expected diagnoses. Note that observations include inputs and outputs of a system, and control commands (like opening or closing a switch).</para></listitem>
</itemizedlist>
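<para>The required test information can be captured in a small data structure. The following Python sketch is illustrative only; the field and function names are our own.</para>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DiagnosisTestCase:
    """One test case: logical observations in, expected diagnoses out."""
    observations: frozenset   # observation atoms, e.g. "on(s)"
    expected: frozenset       # each expected diagnosis is a set of components

# First test case of the two-bulb example: switch on, both bulbs lit;
# the expected result is the single empty diagnosis (no fault).
tc1 = DiagnosisTestCase(
    observations=frozenset({"on(s)", "val(light(l1),on)", "val(light(l2),on)"}),
    expected=frozenset({frozenset()}),
)

def verdict(tc, computed):
    """A test passes iff the computed diagnoses equal the expected ones."""
    return "PASS" if frozenset(computed) == tc.expected else "FAIL"

print(verdict(tc1, [frozenset()]))  # PASS
```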
<para>Note that the systems, as well as their inputs, must be chosen such that they may lead the diagnosis engine to compute different results. This principle is well known in testing, where testers focus on revealing faults and try to bring an implementation into a state of failure. For stating the problem, we do not rely on automation. Test cases for diagnosis models, and in particular for the behavioral part, may be developed manually or using any method for automated test case generation (if available).</para>
<para>In practice, we might be interested in testing a particular model comprising a structural and behavioral part of a given system. For this variant of the general model testing challenge, we only need to come up with observations and expected diagnoses. In the next section, we discuss generating test cases using the two-bulb example as a use case.</para>
</section>
<section class="lev1" id="ch14-3">
<title>14.3 Use Case</title>
<para>In this section, we use the two-bulb example from <link linkend="ch14-F1">Figure 14.1</link> as a use case for diagnosis model testing. We developed the diagnosis model using the input format of the Clingo<sup>1</sup> answer set solver, whose input language follows the syntax of the logic programming language Prolog. In <link linkend="ch14-F3">Figure 14.3</link> we see the source code of the model. In Line 1, the ordinary behavior of a battery is given. In case the battery is working correctly (and the predicate <emphasis role="strong">nab<math id="Ch14.S3.p1.m1" display="inline"><mrow><mi></mi><mo mathvariant="normal">/</mo><mn mathvariant="normal">1</mn></mrow></math></emphasis> is true), the battery provides a nominal output at the <emphasis role="strong">pow</emphasis> port. In lines 2-4, we formalize the model of a switch. A switch transfers the power from the <emphasis role="strong">in_pow</emphasis> to the <emphasis role="strong">out_pow</emphasis> port and vice versa if it is working correctly and <emphasis role="strong">on</emphasis>. If the switch is <emphasis role="strong">off</emphasis>, there is no power at the output. Similarly, in lines 5-7, we see the behavior model of bulbs. If there is nominal power on the input, and the bulb is working fine, then the bulb is shining. If there is no power, there is also no light. If there is light, we know that there must be electricity provided.</para>
<para>In lines 8-10, we have the connection model, stating that there is a transfer from one port of a component to another, and that their values must be the same. The latter is stated in Line 10. Afterward, we have the structural model of the circuit. First, we define the components of the circuit, <emphasis role="strong">b</emphasis>, <emphasis role="strong">s</emphasis>, <emphasis role="strong">l1</emphasis>, and <emphasis role="strong">l2</emphasis>, for the battery, the switch, lamp 1, and lamp 2, respectively. Second, we state the connections between the ports of the components.</para>
<fig id="ch14-F3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 14.3:</emphasis> A model for diagnosis of the two lamp example from <link linkend="ch14-F1">Figure 14.1</link> comprising the behavior of the components (lines 1-7) and connections (lines 8-10), and the structure of the circuit (lines 11-18).</para></caption>
<graphic xlink:href="graphics/ch14-fig03.jpg"/>
</fig>
<para>For testing the model of the particular two-bulb system, we have to provide test cases comprising observations (which serve as the inputs to the model) and the expected diagnoses (which are the expected outputs). For the two-bulb example, the position of the switch (<emphasis role="strong">on</emphasis>, <emphasis role="strong">off</emphasis>) and the state of the two bulbs regarding light emission (<emphasis role="strong">on</emphasis>, <emphasis role="strong">off</emphasis>) serve as the inputs. It is worth noting that the power supply of the battery might also be observed. However, for the initial testing, we only consider observations that do not require additional measurement equipment in practice. Nevertheless, for testing, we may also consider further observations.</para>
<para>With three observations, each over a domain comprising two values, we obtain eight test cases covering all combinations. We depict these test cases in <link linkend="ch14-T1">Table 14.1</link>. Note that the first two test cases (highlighted in gray) cover the correct behavior of the system, where the switch is used to turn the lamps on and off. Therefore, we see the empty set as the expected diagnosis in the corresponding column. The other test cases formalize incorrect behavior of the two-bulb circuit.</para>
<fig id="ch14-T1" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 14.1:</emphasis> All eight test cases used to verify the 2-bulb example comprising the used observations and the expected diagnoses. The <emphasis role="strong">P/F</emphasis> column indicates whether the original model passes (<math id="Ch14.T1.m3" display="inline"><mo>&#x221a;</mo></math>) or fails (<math id="Ch14.T1.m4" display="inline"><mo>&#xd7;</mo></math>) the test.</para></caption>
<graphic xlink:href="graphics/ch14-tab01.jpg"/>
</fig>
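<para>Enumerating all combinations is mechanical. The following Python sketch generates the eight observation vectors; the expected-diagnosis column of the table still has to be filled in manually.</para>

```python
from itertools import product

# Three binary observations: switch position and the light state of each bulb.
variables = ["switch", "light_l1", "light_l2"]

test_inputs = [dict(zip(variables, values))
               for values in product(["on", "off"], repeat=3)]

print(len(test_inputs))  # 8
print(test_inputs[0])    # {'switch': 'on', 'light_l1': 'on', 'light_l2': 'on'}
```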
<para>For testing the model, we run our diagnosis engine <emphasis role="strong">model_diagnose</emphasis> using the observations of a test case. In Clingo, adding observations to a model can simply be done by including the model in a file where we state the observations. For the first test case, the file <emphasis role="strong">tle_obs1.pl</emphasis> comprises the following statements:</para>
<para>#include "two_lamps_example.pl".</para>
<para>on(s).</para>
<para>val(light(l1),on).</para>
<para>val(light(l2),on).</para>
<para>The first line includes the model we show in <link linkend="ch14-F3">Figure 14.3</link>, which we store in the file <emphasis role="strong">two_lamps_example.pl</emphasis>. For executing a test case, we run the diagnosis engine in a shell using the following command: <emphasis role="strong">./model_diagnose -f tle_obs1.pl -fault 2</emphasis>. In this call, we ask for diagnoses comprising up to two components, which we do by setting the parameter <emphasis role="strong">-fault</emphasis> to <emphasis role="strong">2</emphasis>. Finally, we used a shell script to carry out all test cases. We see the outcome of testing in column <emphasis role="strong">P/F</emphasis> of <link linkend="ch14-T1">Table 14.1</link>. The model passes all tests successfully.</para>
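<para>The shell-script-based test execution can equally be phrased as a small harness that feeds each test case to the engine and compares diagnosis sets. The following Python sketch uses a stub in place of the real engine; in practice, the engine function would invoke the command line shown above and parse its output.</para>

```python
def run_suite(engine, suite):
    """Run every test case through `engine` and collect P(ass)/F(ail)
    verdicts, mirroring the P/F column of Table 14.1."""
    verdicts = []
    for observations, expected in suite:
        computed = engine(observations)
        verdicts.append("P" if computed == expected else "F")
    return verdicts

# Stub standing in for the real engine; in practice this function would
# run, e.g., "./model_diagnose -f tle_obs1.pl -fault 2" and parse the output.
def stub_engine(observations):
    if "val(light(l2),on)" in observations:
        return {frozenset()}            # everything works as expected
    return {frozenset({"l2"})}          # blame bulb l2

suite = [
    ({"on(s)", "val(light(l1),on)", "val(light(l2),on)"}, {frozenset()}),
    ({"on(s)", "val(light(l1),on)", "val(light(l2),off)"}, {frozenset({"l2"})}),
]
print(run_suite(stub_engine, suite))  # ['P', 'P']
```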
<fig id="ch14-T2" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 14.2:</emphasis> Running 7 model mutations M<math id="Ch14.T2.m3" display="inline"><mi>i</mi></math>, where we removed line <math id="Ch14.T2.m4" display="inline"><mi>i</mi></math> in the original model of <link linkend="ch14-F3">Figure 14.3</link>, using the 8 test cases from <link linkend="ch14-T1">Table 14.1</link>.</para></caption>
<graphic xlink:href="graphics/ch14-tab02.jpg"/>
</fig>
<para>After checking the correctness of the diagnosis results obtained when using the model, we wanted to evaluate the quality of the test suite. In software engineering, measures like code coverage or the mutation score are used for this purpose. Estimating code coverage, i.e., the number of rules used to derive a contradiction for diagnosis, is difficult because theorem provers usually do not provide this information. Therefore, we focused on mutation testing [<link linkend="ch14-bib2">2</link>, <link linkend="ch14-bib12">12</link>]. The underlying idea is to modify a program and to check whether this modification can be detected by the test suite. The mutation score is defined as the fraction of detected mutations among all mutations. There are some issues when computing the mutation score, for example, equivalent mutants, i.e., changes to the program that do not change its behavior.</para>
<para>For languages like Java, there are mutation testing tools, e.g., the Major framework [<link linkend="ch14-bib8">8</link>]. In our case, because of a lack of tools, we used only rule removal as a mutation operator. In particular, we were interested in the consequences for the diagnosis results when removing a rule from a component model. We define a mutant M<math id="Ch14.S3.p14.m1" display="inline"><mi>i</mi></math> as the original program (from <link linkend="ch14-F3">Figure 14.3</link>) where we removed the rule in Line <math id="Ch14.S3.p14.m2" display="inline"><mi>i</mi></math>. In <link linkend="ch14-T2">Table 14.2</link> we find the results obtained for each mutant. We see that two mutants, M3 and M6, cannot be detected by any test case. Hence, the mutation score for our test suite is <math id="Ch14.S3.p14.m3" display="inline"><mrow><mfrac><mn>5</mn><mn>7</mn></mfrac><mo>&#x2248;</mo><mn>0.7143</mn></mrow></math>. To clarify why the mutation score is not <math id="Ch14.S3.p14.m4" display="inline"><mn>1.0</mn></math>, we analyzed the corresponding rules of mutants M3 and M6. The rule removed in M3 allows transferring electricity from the output to the input as well, which might be appropriate when dealing with other circuits. The rule removed in M6 covers the case where there is zero power on the input. Because no other rule allows deriving zero power, this rule contributes nothing to the reasoning process in this use case and can be removed. Note that the rule might be needed again when considering a different use case where we have to deal with zero power at the input.</para>
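<para>The mutation score can be reproduced directly from the kill information reported above (mutants M3 and M6 survive all eight tests). The following Python sketch is illustrative.</para>

```python
from fractions import Fraction

mutants = ["M" + str(i) for i in range(1, 8)]   # M_i removes the rule in line i
survived = {"M3", "M6"}                          # undetected by every test case
killed = [m for m in mutants if m not in survived]

score = Fraction(len(killed), len(mutants))
print(score, float(score))  # 5/7 0.7142857142857143
```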
<fig id="ch14-F4" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Figure 14.4:</emphasis> Another simple electric circuit comprising bulbs, switches and a battery. This circuit is an extended version of the circuit from <link linkend="ch14-F1">Figure 14.1</link>. On the right, we have the structural model of this circuit in Prolog notation.</para></caption>
<graphic xlink:href="graphics/ch14-fig04.jpg"/>
</fig>
<para>The question that remains is whether the component models can be used for other systems as well. To verify the corresponding property, i.e., that the component models are generally applicable, we have to come up with new systems and apply test case generation again. In this use case, we slightly modified the original two-bulb example. We added another switch in parallel such that both switches provide the functionality of an or-gate. The lamps have to be off only if both switches are open, i.e., in their off state. See <link linkend="ch14-F4">Figure 14.4</link> for the schematics of the extended two-bulb circuit.</para>
<para>For testing the extended two-bulb circuit, we have to introduce test cases. Similar to the original circuit, we used all combinations of input values and manually computed the expected diagnoses. We depict the whole test suite in <link linkend="ch14-T3">Table 14.3</link>. There we also see the results obtained after automating the test execution using shell scripts. For many test cases, the computed diagnoses are not equivalent to the expected ones. We conclude that the provided model is not generally applicable.</para>
<fig id="ch14-T3" position="float" xmlns:xlink="http://www.w3.org/1999/xlink">
<caption><para><emphasis role="strong">Table 14.3:</emphasis> Test cases for the extended two-bulb example from <link linkend="ch14-F4">Figure 14.4</link> and their test execution results. In gray we indicate tests that check the expected (fault-free) behavior of the circuit.</para></caption>
<graphic xlink:href="graphics/ch14-tab03.jpg"/>
</fig>
<para>After carefully analyzing the root cause behind this divergence, we identified the rule in Line 4 of the component model (from <link linkend="ch14-F3">Figure 14.3</link>) as problematic. This rule states that an open switch assures that there is no power on the output of the switch. Unfortunately, there might be electricity available because of another power-supplying component, as in the extended two-bulb example. We are also not able to simply remove this rule because otherwise the behavior of the original two-bulb example would change (see <link linkend="ch14-T2">Table 14.2</link>). A solution would be to introduce a specific or-component that takes the outputs of the two switches as inputs and provides power whenever at least one of them has a nominal value.</para>
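<para>The effect of the proposed or-component can be checked on a toy encoding. The following Python sketch (our own names, healthy components only) shows that with the or-component in place, an open switch no longer forces the bulbs to be dark.</para>

```python
def switch_out(battery_ok, switch_closed, switch_ok):
    """Nominal power at a switch output (healthy components only)."""
    return battery_ok and switch_closed and switch_ok

def or_component(in1, in2, or_ok):
    """The or-component provides power iff at least one input is nominal."""
    return or_ok and (in1 or in2)

# Switch s1 open, s2 closed: the bulbs still receive power, so the rule
# "open switch implies no power at the output" no longer holds globally.
s1 = switch_out(True, switch_closed=False, switch_ok=True)
s2 = switch_out(True, switch_closed=True, switch_ok=True)
print(or_component(s1, s2, or_ok=True))  # True: the bulbs can shine
```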
</section>
<section class="lev1" id="ch14-4">
<title>14.4 Open Issues and Challenges</title>
<para>We can identify the following results and issues from the use case discussed in the previous section.</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para><emphasis>Testing a model</emphasis> for a particular system that is based on component models and a structural part <emphasis>is possible</emphasis> but requires identifying (i) the input, i.e., observations given to the system, and (ii) the expected diagnoses. From this result the following issues arise:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>We have to identify the observations given to the system. This is not necessarily an obvious task and requires analyzing the functionality of the system. We may start with observations of the input and the output of the system. However, the resulting test suite might not be complete with respect to the mutation score.</para></listitem>
<listitem><para>Furthermore, we have to consider different observations. We may make use of all combinations as we did in the case study. However, for a larger system, this is infeasible, and other approaches are required. Combinatorial testing [<link linkend="ch14-bib11">11</link>] might be a good starting point for future research.</para></listitem>
<listitem><para>The expected diagnoses have to be computed manually. This is a time-consuming task. Hence, any means for automating this step would be highly valuable.</para></listitem>
</itemizedlist></listitem>
<listitem><para>The <emphasis>generated test suite may not</emphasis> be one that allows for <emphasis>detecting all faults</emphasis>. Fault detection capabilities are usually measured using the mutation score. From the use case discussed in the previous section, we see that the mutation score, even when considering only one mutation operator, might be less than <math id="Ch14.S4.I1.i2.p1.m1" display="inline"><mn>1.0</mn></math>. Related issues and future research activities include:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>We need to come up with a well-founded theory of mutation testing for logic rules. This also includes considering more mutation operators.</para></listitem>
<listitem><para>There is a need for generating test cases for diagnosis models automatically such that the mutation score can be maximized.</para></listitem>
</itemizedlist></listitem>
<listitem><para><emphasis>Testing should be extended</emphasis> to check whether the component <emphasis>models can be used in other systems</emphasis> as well. What is missing in this context is:</para>
<itemizedlist mark="bulleted" spacing="normal">
<listitem><para>The automated generation of different but still relevant systems for practical applications is an open research question. For each of the generated systems, we need to compute test suites and check the correctness of the computed diagnosis. Note that in principle, we have an infinite number of such systems. We have to think about when to stop testing.</para></listitem>
<listitem><para>In case of deviations between the expected diagnoses and the computed ones, one is interested in identifying the reasons behind them. Hence, we need debugging functionality that may be similar to previous work on debugging knowledge bases [<link linkend="ch14-bib5">5</link>].</para></listitem>
</itemizedlist></listitem>
</itemizedlist>
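<para>As a starting point for the combinatorial testing direction mentioned above, pairwise (2-way) test selection can already reduce the number of test inputs compared to exhaustive enumeration. The following Python sketch implements a simple greedy pairwise selection; it is illustrative and not tuned for large systems.</para>

```python
from itertools import combinations, product

def pairwise_suite(domains):
    """Greedily select full-combination tests until every pair of
    parameter values is covered at least once (2-way coverage)."""
    params = sorted(domains)

    def pairs_of(test):
        return {((a, test[a]), (b, test[b]))
                for a, b in combinations(params, 2)}

    # All value pairs that must be covered.
    uncovered = set()
    for a, b in combinations(params, 2):
        for va, vb in product(domains[a], domains[b]):
            uncovered.add(((a, va), (b, vb)))

    candidates = [dict(zip(params, values))
                  for values in product(*(domains[p] for p in params))]

    suite = []
    while uncovered:
        best = max(candidates,
                   key=lambda t: len(pairs_of(t).intersection(uncovered)))
        suite.append(best)
        uncovered.difference_update(pairs_of(best))
    return suite

# Three binary observations as in the two-bulb circuit:
domains = {"l1": ["on", "off"], "l2": ["on", "off"], "switch": ["on", "off"]}
print(len(pairwise_suite(domains)))  # fewer than the 8 exhaustive tests
```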
<para>In summary, the main challenge lies in the automation of test case generation. Currently, test cases, or at least the expected diagnoses, have to be generated manually. Moreover, we need to adapt existing testing methods and techniques to logic representations. There is some related work one can start with, but compared to corresponding work for ordinary programming languages, the available knowledge is limited.</para>
</section>
<section class="lev1" id="ch14-5">
<title>14.5 Conclusion</title>
<para>In this paper, we discussed the use of testing for model-based diagnosis. We focused on assuring the quality of system models used for fault detection and localization. We discussed how to test models and identified shortcomings and future research directions. Testing a system model comes in two flavors: (i) testing a model of a particular system and (ii) testing component models used in different system models. For both, we need to define test cases comprising observations and expected diagnoses. For testing component models, we additionally need to come up with different systems. Issues and challenges include providing means for answering the question of when to stop testing, giving quality guarantees, and automating test case generation.</para>
</section>
<section class="lev1">
<title>Acknowledgments</title>
<para>The research was supported by ECSEL JU under the project H2020 826060 AI4DI - Artificial Intelligence for Digitising Industry. AI4DI is funded by the Austrian Federal Ministry of Transport, Innovation, and Technology (BMVIT) under the program "ICT of the Future" between May 2019 and April 2022. More information can be retrieved from <ulink url="https://iktderzukunft.at/en/">https://iktderzukunft.at/en/</ulink>.</para>
<para><graphic xlink:href="graphics/ch14-alo01.jpg"/></para>
</section>
<section class="lev1" id="ch14-Ref">
<title>References</title>
<para id="ch14-bib1">[1] A. Beschta, O. Dressler, H. Freitag, M. Montag, and P. Struss. A model-based approach to fault localization in power transmission networks. <emphasis>Intelligent Systems Engineering</emphasis>, 1992.</para>
<para id="ch14-bib2">[2] T. Budd, R. DeMillo, R. Lipton, and F. Sayward. Theoretical and empirical studies on using program mutation to test the functional correctness of programs. In <emphasis>Proc. Seventh ACM Symp. on Princ. of Prog. Lang. (POPL)</emphasis>. ACM, January 1980.</para>
<para id="ch14-bib3">[3] R. Davis, H. Shrobe, W. Hamscher, K. Wieckert, M. Shirley, and S. Polit. Diagnosis based on structure and function. In <emphasis>Proceedings AAAI</emphasis>, pages 137&#x2013;142, Pittsburgh, August 1982. AAAI Press.</para>
<para id="ch14-bib4">[4] J. de Kleer and B. C. Williams. Diagnosing multiple faults. <emphasis>Artificial Intelligence</emphasis>, 32(1):97&#x2013;130, 1987.</para>
<para id="ch14-bib5">[5] A. Felfernig, G. Friedrich, D. Jannach, and M. Stumptner. Consistency based diagnosis of configuration knowledge bases. In <emphasis>Proceedings of the European Conference on Artificial Intelligence (ECAI)</emphasis>, Berlin, August 2000.</para>
<para id="ch14-bib6">[6] G. Friedrich, M. Stumptner, and F. Wotawa. Model-based diagnosis of hardware designs. <emphasis>Artificial Intelligence</emphasis>, 111(2):3&#x2013;39, July 1999.</para>
<para id="ch14-bib7">[7] M. W. Hofbaur, J. K&#xf6;b, G. Steinbauer, and F. Wotawa. Improving robustness of mobile robots using model-based reasoning. <emphasis>J. Intell. Robotic Syst.</emphasis>, 48(1):37&#x2013;54, 2007.</para>
<para id="ch14-bib8">[8] R. Just. The Major mutation framework: Efficient and scalable mutation analysis for Java. In <emphasis>Proceedings of the International Symposium on Software Testing and Analysis (ISSTA)</emphasis>, pages 433&#x2013;436, San Jose, CA, USA, 2014.</para>
<para id="ch14-bib9">[9] D. Kaufmann, I. Nica, and F. Wotawa. Intelligent agents diagnostics - enhancing cyber-physical systems with self-diagnostic capabilities. <emphasis>Adv. Intell. Syst.</emphasis>, 3(5):2000218, 2021.</para>
<para id="ch14-bib10">[10] Y. Koroglu and F. Wotawa. Fully automated compiler testing of a reasoning engine via mutated grammar fuzzing. In <emphasis>Proc. of the 14th IEEE/ACM International Workshop on Automation of Software Test (AST)</emphasis>, Montreal, Canada, May 2019.</para>
<para id="ch14-bib11">[11] D. R. Kuhn, R. N. Kacker, and Y. Lei. <emphasis>Introduction to Combinatorial Testing</emphasis>. Chapman &amp; Hall/CRC Innovations in Software Engineering and Software Development Series. Taylor &amp; Francis, 2013.</para>
<para id="ch14-bib12">[12] J. A. Offutt and S. D. Lee. An empirical evaluation of weak mutation. <emphasis>IEEE Transactions on Software Engineering</emphasis>, 20(5):337&#x2013;344, 1994.</para>
<para id="ch14-bib13">[13] C. Picardi, R. Bray, F. Cascio, L. Console, P. Dague, O. Dressler, D. Millet, B. Rehfus, P. Struss, and C. Vall&#xe9;e. Idd: Integrating diagnosis in the design of automotive systems. In <emphasis>Proceedings of the European Conference on Artificial Intelligence (ECAI)</emphasis>, pages 628&#x2013;632, Lyon, France, 2002. IOS Press.</para>
<para id="ch14-bib14">[14] K. Rajan, D. Bernard, G. Dorais, E. Gamble, B. Kanefsky, J. Kurien, W. Millar, N. Muscettola, P. Nayak, N. Rouquette, B. Smith, W. Taylor, and Y.-w. Tung. Remote Agent: An Autonomous Control System for the New Millennium. In <emphasis>Proceedings of the 14th European Conference on Artificial Intelligence (ECAI)</emphasis>, Berlin, Germany, August 2000.</para>
<para id="ch14-bib15">[15] R. Reiter. A theory of diagnosis from first principles. <emphasis>Artificial Intelligence</emphasis>, 32(1):57&#x2013;95, 1987.</para>
<para id="ch14-bib16">[16] G. Steinbauer and F. Wotawa. Model-based reasoning for self-adaptive systems - theory and practice. In <emphasis>Assurances for Self-Adaptive Systems</emphasis>, volume 7740 of <emphasis>Lecture Notes in Computer Science</emphasis>, pages 187&#x2013;213. Springer, Switzerland, 2013.</para>
<para id="ch14-bib17">[17] F. Wotawa. Testing self-adaptive systems using fault injection and combinatorial testing. In <emphasis>Proceedings of the Intl. Workshop on Verification and Validation of Adaptive Systems (VVASS 2016)</emphasis>, pages 305&#x2013;310, Vienna, Austria, 2016. IEEE.</para>
<para id="ch14-bib18">[18] F. Wotawa. Combining combinatorial testing and metamorphic testing for testing a logic-based non-monotonic reasoning system. In <emphasis>Proceedings of the 7th International Workshop on Combinatorial Testing (IWCT) / ICST 2018</emphasis>, April 2018.</para>
<para id="ch14-bib19">[19] F. Wotawa. On the automation of testing a logic-based diagnosis system. In <emphasis>Proceedings of the 13th International Workshop on Testing: Academia-Industry Collaboration, Practice and Research Techniques (TAIC PART) / ICST 2018</emphasis>, April 2018.</para>
<para id="ch14-bib20">[20] F. Wotawa. Reasoning from first principles for self-adaptive and autonomous systems. In E. Lughofer and M. Sayed-Mouchaweh, editors, <emphasis>Predictive Maintenance in Dynamic Systems &#x2013; Advanced Methods, Decision Support Tools and Real-World Applications</emphasis>. Springer, 2019.</para>
<para id="ch14-bib21">[21] F. Wotawa. Using model-based reasoning for self-adaptive control of smart battery systems. In Moamar Sayed-Mouchaweh, editor, <emphasis>Artificial Intelligence Techniques for a Scalable Energy Transition &#x2013; Advanced Methods, Digital Technologies, Decision Support Tools, and Applications</emphasis>. Springer, 2020.</para>
<para id="ch14-bib22">[22] F. Wotawa and D. Kaufmann. Model-based reasoning using answer set programming. <emphasis>Applied Intelligence</emphasis>, 2022.</para>
<para id="ch14-bib23">[23] F. Wotawa, O. A. Tazl, and D. Kaufmann. Automated diagnosis of cyber-physical systems. In <emphasis>IEA/AIE (2)</emphasis>, volume 12799 of <emphasis>Lecture Notes in Computer Science</emphasis>, pages 441&#x2013;452. Springer, 2021.</para>
<para id="ch14-bib24">[24] J. M. Zhang, M. Harman, L. Ma, and Y. Liu. Machine learning testing: Survey, landscapes and horizons. <emphasis>IEEE Transactions on Software Engineering</emphasis>, 48(1):1&#x2013;36, 2022.</para>
</section>
<para><sup>1</sup>see <ulink url="https://potassco.org">https://potassco.org</ulink></para>
</chapter>
<chapter class="index" id="index">
<title>Index</title>
<section class="lev1" id="index1">
<title>A</title>
<para>accelerators 3, 7, 32, 129, 154</para>
<para>AI 2, 54, 68, 73, 85, 152, 181</para>
<para>annotation 92, 93, 94, 95</para>
<para>anomaly detection 82, 86, 88, 173</para>
<para>artificial intelligence 54, 69, 99, 126, 141, 157, 201</para>
<para>ASIC 1, 11, 143</para>
</section>
<section class="lev1" id="index2">
<title>B</title>
<para>benchmark 7, 29, 46, 91, 95, 131, 147</para>
<para>benchmarking 1, 3, 5, 10, 14, 32, 173</para>
<para>bertology 91</para>
<para>bevel 104, 105, 108, 111, 112</para>
<para>bio-inspired processing 22</para>
</section>
<section class="lev1" id="index3">
<title>C</title>
<para>causality 91, 95, 97, 100</para>
<para>CNN 12, 76, 88, 89, 143, 182, 186</para>
<para>comparison 2, 9, 24, 46, 84, 105, 185</para>
<para>computer vision 5, 76, 154</para>
<para>contamination 74, 83, 103, 112</para>
<para>contamination monitoring and management 104</para>
<para>convolutional neural networks 27, 145, 182, 186</para>
</section>
<section class="lev1" id="index4">
<title>D</title>
<para>deep learning 3, 55, 76, 129, 158, 159, 174</para>
<para>deep learning architecture 158</para>
<para>defect detection 73, 77, 82, 83</para>
<para>downy mildew 177, 180, 186, 187</para>
<para>DynapCNN 130, 132, 134, 137</para>
</section>
<section class="lev1" id="index5">
<title>E</title>
<para>edge AI 103, 141, 161, 179, 186</para>
<para>edge-embedded devices 158, 159, 161</para>
<para>embedded systems 22, 152, 162</para>
</section>
<section class="lev1" id="index6">
<title>F</title>
<para>fault detection 189, 190, 201</para>
<para>fault localization 113, 201</para>
</section>
<section class="lev1" id="index7">
<title>H</title>
<para>hardware trust 53</para>
</section>
<section class="lev1" id="index8">
<title>I</title>
<para>image processing 53</para>
<para>industrial internet of intelligent things 158</para>
<para>industrial internet of things 157, 158</para>
<para>inference 1, 4, 85, 130, 136, 157, 161, 173</para>
<para>information extraction 91, 100</para>
<para>IoT 1, 130, 157, 185</para>
</section>
<section class="lev1" id="index9">
<title>K</title>
<para>Kendryte 130, 132, 134, 139</para>
<para>key performance indicators 2, 5, 6, 7, 138</para>
</section>
<section class="lev1" id="index10">
<title>L</title>
<para>labelling 76, 84, 180</para>
<para>low power 3, 130, 141</para>
</section>
<section class="lev1" id="index11">
<title>M</title>
<para>machine learning 5, 75, 81, 158, 177, 180, 191</para>
<para>machine vision 73, 76</para>
<para>manufacturing AI solutions 81</para>
<para>Mask R-CNN 76, 77, 78</para>
<para>ML 2, 55, 141, 161, 183</para>
<para>model-based diagnosis 113, 114, 125, 189, 192</para>
</section>
<section class="lev1" id="index12">
<title>N</title>
<para>neuromorphic 1, 9, 21, 24, 129</para>
<para>neuromorphic computing 1, 32, 36</para>
<para>neuromorphic processor 2, 23, 25, 32</para>
</section>
<section class="lev1" id="index13">
<title>O</title>
<para>object detection 5, 36, 77, 141, 151</para>
</section>
<section class="lev1" id="index14">
<title>P</title>
<para>performance 2, 6, 9, 130, 145, 168, 185</para>
<para>physical inspection of electronics 53</para>
<para>physical simulation 113</para>
<para>predictive maintenance 157, 159, 161</para>
</section>
<section class="lev1" id="index15">
<title>R</title>
<para>random forest 165, 178, 182, 183, 186</para>
<para>relation extraction 91</para>
</section>
<section class="lev1" id="index16">
<title>S</title>
<para>semantic segmentation 53, 56</para>
<para>semiconductor wafer 73</para>
<para>smart sensor systems 158</para>
<para>spiking neural network 26, 38, 129</para>
<para>STM32 129, 130, 157, 171</para>
<para>supervised learning 82, 84, 161, 180</para>
<para>surface 73, 74, 105, 189</para>
<para>SVM 161, 177, 182, 185</para>
</section>
<section class="lev1" id="index17">
<title>T</title>
<para>tensor processing unit 142</para>
<para>testing 77, 113, 169, 171, 189, 201</para>
<para>transfer learning and scalability 83, 85, 86</para>
<para>TXRF 105, 110, 111</para>
</section>
<section class="lev1" id="index18">
<title>V</title>
<para>verification and validation 114, 189, 203</para>
<para>vibration analysis 158</para>
<para>VPD-ICPMS 105, 106, 107, 111</para>
</section>
<section class="lev1" id="index19">
<title>W</title>
<para>wafer loops 103, 104</para>
</section>
<section class="lev1" id="index20">
<title>Y</title>
<para>YOLO 76, 78, 141, 145</para>
</section>
</chapter>
</book>