# Accurate program/verify schemes of resistive switching memory (RRAM) for in-memory neural network circuits

Valerio Milo, *Member, IEEE*, Artem Glukhov, Eduardo Pérez, Cristian Zambelli, *Member, IEEE*, Nicola Lepri, Mamathamba K. Mahadevaiah, Emilio Perez-Bosch Quesada, Piero Olivo, Christian Wenger, and Daniele Ielmini, *Fellow, IEEE* 

Abstract—Resistive switching memory (RRAM) is a promising technology for embedded memory and its application in computing. In particular, RRAM arrays can provide a convenient primitive for matrix-vector multiplication (MVM), with a strong impact on the acceleration of neural networks for artificial intelligence (AI). At the same time, RRAM is affected by intrinsic conductance variations, which might degrade the accuracy of AI inference hardware. This work provides a detailed study of the multilevel-cell (MLC) programming of RRAM for neural network applications. We compare three MLC programming schemes and discuss their variations in terms of the different slopes of the programming characteristics. We test the accuracy of a 2-layer fully-connected neural network (FC-NN) as a function of the MLC scheme, the number of weight levels, and the weight mapping configuration. We find a trade-off between the FC-NN accuracy, size, and current consumption. This work highlights the importance of a holistic approach to AI accelerators encompassing the device properties, the overall circuit performance, and the AI application specifications.

Index Terms— Resistive switching memory (RRAM); multilevel cell (MLC) operation; artificial neural network (ANN); in-memory computing (IMC).

## I. INTRODUCTION

RESISTIVE switching memory (RRAM) has recently gained increased interest for its application in novel computing concepts called in-memory computing (IMC) [1], [2]. A major advantage of IMC is the capability to execute matrix-vector multiplication (MVM) in parallel on multiple rows and columns of a memory array, which allows for a strong acceleration of neural networks [3]–[7]. The recent demonstration of embedded RRAM devices at Mbit capacity [8] enables the design and integration of IMC circuits [9]–[11], thus paving the way for energy-efficient RRAM-based accelerators of artificial intelligence (AI).

This article has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 648635), from the Deutsche Forschungsgemeinschaft (German Research Foundation) with Project-ID 434434223-SFB1461, and from the Federal Ministry of Education and Research of Germany under grant number 16ES1002.

V. Milo was with the Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano and IU.NET, 20133 Milan, Italy (e-mail: valerio.milo@polimi.it). He is now with Applied Materials Italia Srl, 42124 Reggio Emilia, Italy.

A. Glukhov, N. Lepri, and D. Ielmini are with the Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano and IU.NET, 20133 Milan, Italy (e-mail: daniele.ielmini@polimi.it).

C. Zambelli and P. Olivo are with the Dipartimento di Ingegneria, Università degli Studi di Ferrara, 44122 Ferrara, Italy.

E. Pérez, M. K. Mahadevaiah, E. Perez-Bosch Quesada, and Ch. Wenger are with IHP-Leibniz-Institut für innovative Mikroelektronik, 15236 Frankfurt (Oder), Germany.

Ch. Wenger is also with BTU Cottbus-Senftenberg, 01968 Cottbus, Germany.

Fig. 1. (a) Schematic of a 1T1R RRAM device of the 4-kbit array used in this work. The RRAM has a stack consisting of a Ti-based oxygen reservoir and an amorphous $HfO_2$ switching layer sandwiched between a TiN top electrode (TE) and a TiN bottom electrode (BE). (b) Multilevel I-V characteristics of the 1T1R RRAM device measured for increasing $V_G$.

A potential issue for RRAM-based IMC is the limited precision of conductance, which is affected by programming variations [12], [13], random telegraph noise (RTN) [14], drift [15], and other types of random fluctuations [16], [17]. Multilevel-cell (MLC) program/verify techniques have been proposed to overcome the variability effects and improve the precision of conductance in RRAM [18], [19]. Still, the optimization of MLC precision and its impact on the overall performance of the IMC accelerators in terms of precision and energy efficiency is not fully understood.

This work compares three different MLC program/verify schemes for a 4-kbit RRAM array used to accelerate MVM in a fully-connected neural network (FC-NN). We show that gate-based program/verify techniques, where the compliance current is increased at each programming step, display the best accuracy thanks to the relatively shallow characteristics of conductance vs. number of pulses. Thanks to this optimized control of MLC conductance, we program the RRAM array with synaptic weights obtained from an offline training and quantization technique for the recognition of handwritten characters. The results are discussed in terms of the trade-off between inference testing accuracy and current consumption in the array. These results show that a multiscale approach, ranging from weight precision at device level to overall circuit performance, is essential in the design of IMC accelerators of AI.

Fig. 2. Measured single-cell (gray line) and median (symbol) conductance of 4 LRS levels programmed in the 4-kbit array by (a) ISPVA, (b) IGVVA-100, and (c) IGVVA-10. ISPVA shows abrupt conductance transitions at the set voltage, while IGVVA-100 and IGVVA-10 provide a more gradual conductance increase as a result of the better current modulation achieved by $V_G$ control than by $V_{TE}$ control. The smaller voltage step $\Delta V_G$ makes IGVVA-10 programming more accurate than IGVVA-100.

Preliminary results of this work were reported in [20]. Here, we extend the number of MLC states up to 9 resistive levels, resulting in 19 synaptic weights. We also include the impact of fluctuations on the FC-NN accuracy by comparing the conductance immediately after verify to that at the end of the algorithm. Finally, we include a comprehensive study of accuracy as a function of the number and choice of conductance levels and the number of hidden neurons, also including the impact of IR drop due to parasitic wire resistance in the RRAM array.

## II. MULTILEVEL 1T1R RRAM DEVICE

The 4-kbit array used as test vehicle in this work includes 64 × 64 RRAM cells based on the one-transistor/one-resistor (1T1R) structure shown in Fig. 1(a). This structure consists of the serial connection of a TiN/Ti/HfO<sub>2</sub>/TiN RRAM with an n-channel MOS transistor in 0.25 $\mu$m CMOS technology, which is introduced to select the cell and limit the current to the compliance current I<sub>C</sub>. Fig. 1(b) shows the measured I-V curves for increasing I<sub>C</sub>, which was controlled by the gate voltage V<sub>G</sub>. These characteristics present abrupt set transitions from the high resistance state (HRS) to the low resistance state (LRS) at positive voltages and gradual reset transitions from the LRS to the HRS at negative voltages. These curves clearly support the ability of our 1T1R RRAM devices to achieve MLC operation by tuning V<sub>G</sub>.

## III. MLC ALGORITHM CHARACTERIZATION

Fig. 3. Measured conductance of a single RRAM device evidencing the after-switching conductance $G_{AS}$ and the end-algorithm conductance $G_{EA}$ for (a) ISPVA and (b) IGVVA-100. The inset shows the pulse amplitude of $V_{TE}$ and $V_G$ for AS and EA conditions.

Fig. 4. Conductance CDFs of HRS and 4 LRS levels measured (a) after the switching event (AS) and (b) at the end of the algorithm (EA) by application of ISPVA, IGVVA-100, and IGVVA-10. IGVVA-10 CDFs display the lowest device-to-device (D2D) variability for both AS and EA conditions, followed by ISPVA and IGVVA-100.

To achieve an accurate MLC programming of the 4-kbit RRAM array, we compared two program/verify approaches based on the modulation of the top electrode voltage $V_{TE}$ and the gate voltage $V_G$, respectively. The first algorithm, referred to as incremental step pulse program and verify algorithm (ISPVA), was proposed in [21] and allows multilevel programming via step-by-step application of set pulses (pulse width $t_{pulse} = 1~\mu$s) with increasing $V_{TE}$, while $V_G$ is set to achieve the $I_C$ corresponding to the desired target level and the source of the transistor is grounded. Fig. 2(a) shows the conductance of 4 LRS levels measured by application of ISPVA on a quarter of the 4-kbit 1T1R RRAM devices initially prepared in HRS. $V_{TE}$ was increased from 0.5 V to 2 V with a voltage step $\Delta V_{TE} = 100$ mV while keeping the amplitude of the $V_G$ pulses fixed at 1 V, 1.2 V, 1.4 V, and 1.6 V to achieve the target level currents of 10 $\mu$A, 20 $\mu$A, 30 $\mu$A, and 40 $\mu$A, respectively. Note that, after every programming operation, a read operation was performed by application of $V_G = 1.7$ V and $V_{TE} = 0.2$ V. Considering both the single-cell conductance G and its median value <G> for each LRS level, abrupt changes take place as soon as $V_{TE}$ becomes larger than $V_{set} \approx 0.9$ V, which shows that ISPVA is not suitable for finely modulating the device conductance. In particular, the median characteristics show faster transitions for increasing $V_G$.
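The ISPVA sequence described above can be sketched as a simple program/verify loop. This is a minimal illustration and not the authors' test code: `toy_cell_conductance` is a hypothetical stand-in for the measured 1T1R cell (an abrupt set above $V_{set}$ up to a compliance-limited conductance assumed equal to $I_C$ divided by the read voltage), and the verify threshold in the example is chosen arbitrarily. Only the voltage ranges, the step, and the 0.2 V read voltage come from the text.

```python
import numpy as np

V_READ = 0.2  # V, read voltage on the top electrode (from the text)

def ispva_schedule(v_start=0.5, v_stop=2.0, dv=0.1):
    """Top-electrode pulse amplitudes for ISPVA (0.5 V to 2 V, 100 mV step)."""
    n = int(round((v_stop - v_start) / dv)) + 1
    return np.round(v_start + dv * np.arange(n), 3)

def toy_cell_conductance(v_te, i_c, v_set=0.9):
    """Hypothetical cell model: HRS below V_set, compliance-limited LRS above.

    The LRS conductance is assumed to be I_C / V_READ (illustrative only).
    """
    g_hrs = 1e-6  # S, assumed HRS conductance
    return i_c / V_READ if v_te > v_set else g_hrs

def ispva(i_c, g_verify):
    """Apply increasing V_TE pulses until the read conductance exceeds g_verify."""
    for n_pulse, v_te in enumerate(ispva_schedule(), start=1):
        g = toy_cell_conductance(v_te, i_c)  # program pulse followed by a read
        if g >= g_verify:                    # verify step
            return n_pulse, float(v_te), g
    return None  # verify threshold not reached within the schedule

schedule = ispva_schedule()
result = ispva(i_c=20e-6, g_verify=90e-6)  # arbitrary example threshold
print(len(schedule), result)
```

In this toy model the cell jumps to its final conductance on the first pulse above $V_{set}$, which mirrors the abrupt transitions of Fig. 2(a): the $V_{TE}$ step has little influence on the reached level, which is set by $I_C$ instead.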

Fig. 5. (a) Comparison among ISPVA, IGVVA-100, and IGVVA-10 median I-V characteristics of the LRS with $\langle G \rangle = 200~\mu$S in terms of maximum slope $g = dI/dV$. (b) CDFs of $dI/dV$ for ISPVA, IGVVA-100, and IGVVA-10. Consistent with Fig. 2, ISPVA provides $dI/dV$ CDFs with median and standard deviation that increase with the level. On the other hand, IGVVA-100 and IGVVA-10 show $dI/dV$ with small median and standard deviation variability. In particular, the IGVVA-10 $dI/dV$ CDFs show the lowest median variability, confirming its finer conductance tuning capability.

Fig. 6. Measured AS and EA CDFs of 9 IGVVA-10 conductance levels used to implement the synaptic weights into the 4-kbit RRAM array.

To overcome the ISPVA limitation, we designed and investigated a V<sub>G</sub>-based programming algorithm called incremental gate voltage and verify algorithm (IGVVA) [20]. Unlike $V_{TE}$-controlled ISPVA, IGVVA consists of the application of programming pulses (pulse width $t_{pulse} = 1~\mu$s) with increasing amplitude $V_G$ from 0.5 V to 1.7 V, while the amplitude of the $V_{TE}$ programming pulses is kept equal to 1.2 V, which is larger than $V_{set}$ to allow for the set transition. IGVVA was tested using two voltage steps, namely $\Delta V_G = 100$ mV (IGVVA-100) and $\Delta V_G = 10$ mV (IGVVA-10). Fig. 2(b) and (c) show the measured G and <G> as a function of $V_G$ for IGVVA-100 and IGVVA-10, respectively, which exhibit a more gradual increase compared to ISPVA. This is due to the higher accuracy in the device current control arising from the tight relation between $V_G$ and $I_C$ [22]. Also, the level programming precision of IGVVA-10 is higher than that of IGVVA-100 as a result of the smaller $\Delta V_G$.

To compare the programming accuracy of these MLC algorithms, we programmed 5 levels (the HRS and 4 LRS levels) into the 4-kbit RRAM array and measured the after-switching (AS) and the end-algorithm (EA) conductance, namely the conductance value measured as soon as the verify threshold is exceeded and the value measured at the end of the whole algorithm, respectively. For example, Fig. 3 shows the AS conductance $G_{AS}$ and the EA conductance $G_{EA}$ for (a) ISPVA and (b) IGVVA-100 program/verify pulses in the case of the conductance level with $G_{target} = 150~\mu$S. It should be noted that in ISPVA, no program pulses are applied to the device after $G_{AS}$ is measured, while the read pulse is applied until the number of pulses reaches 16, which would be needed to increase the theoretical $V_{TE}$ to the maximum value of 2 V. Similar to ISPVA, no additional programming pulses are applied to the device between the AS state and the EA state in IGVVA-100. This makes it possible to reveal post-programming fluctuations of the conductance. Note also that in the case of IGVVA-10 (not shown), we adopted the same scheme used for IGVVA-100, while applying a total sequence of 121 pulses as a result of the smaller $\Delta V_G$.
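The IGVVA sequence and the AS/EA read-out described above can be sketched as follows. This is only an illustration under stated assumptions: `toy_g_of_vg` (a conductance that grows linearly with $V_G$) and the 2% read-to-read fluctuation are hypothetical and are not fitted to the measured devices. The $V_G$ range and the two step sizes are taken from the text.

```python
import numpy as np

def igvva_schedule(dv_g, v_start=0.5, v_stop=1.7):
    """Gate pulse amplitudes from 0.5 V to 1.7 V with step dv_g (in volts)."""
    n = int(round((v_stop - v_start) / dv_g)) + 1
    return np.round(v_start + dv_g * np.arange(n), 3)

def toy_g_of_vg(v_g):
    """Hypothetical compliance-limited conductance, linear in V_G (illustrative)."""
    return max(0.0, v_g - 0.55) * 240e-6  # S

def igvva(g_target, dv_g, rng):
    """Return (G_AS, G_EA, number of program pulses actually applied)."""
    g_as, g, n_prog = None, 0.0, 0
    for v_g in igvva_schedule(dv_g):
        if g_as is None:
            g = toy_g_of_vg(v_g)   # program pulse at fixed V_TE, then read
            n_prog += 1
            if g >= g_target:      # verify: first read above the target
                g_as = g
        else:
            # read-only slot: no programming, assumed small random fluctuation
            g = g_as * (1.0 + rng.normal(0.0, 0.02))
    return g_as, g, n_prog

rng = np.random.default_rng(0)
n_100 = len(igvva_schedule(0.1))    # number of slots for IGVVA-100
n_10 = len(igvva_schedule(0.01))    # number of slots for IGVVA-10
g_as_100, g_ea_100, _ = igvva(150e-6, 0.1, rng)
g_as_10, g_ea_10, _ = igvva(150e-6, 0.01, rng)
print(n_100, n_10, g_as_100 - 150e-6, g_as_10 - 150e-6)
```

With a gradual conductance increase, the overshoot above the target is bounded by the conductance change per step, so the smaller $\Delta V_G$ of IGVVA-10 gives a tighter $G_{AS}$ at the cost of a longer pulse sequence.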

Fig. 4 shows the (a) AS and (b) EA cumulative distribution functions (CDFs) of the HRS and the 4 programmed LRS levels. Compared to AS, the EA distributions show a conductance relaxation for every level, which causes the conductance of some programmed cells to decrease below the conductance target. Both panels indicate that IGVVA-10 shows the lowest device-to-device (D2D) variability, followed by ISPVA and IGVVA-100, and is therefore the most accurate approach for programming the synaptic weights of a neural network in our RRAM array.
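The quantities compared in these distributions can be computed as in the sketch below. The data are synthetic and purely illustrative (1024 cells, with an assumed overshoot above the target after switching and an assumed downward relaxation at the end of the algorithm); they are not the measured CDFs.

```python
import numpy as np

def empirical_cdf(g):
    """Return sorted conductances and their cumulative probabilities."""
    g_sorted = np.sort(np.asarray(g, dtype=float))
    p = np.arange(1, g_sorted.size + 1) / g_sorted.size
    return g_sorted, p

def d2d_summary(g, g_target):
    """Median, standard deviation, and fraction of cells below the target."""
    g = np.asarray(g, dtype=float)
    return {
        "median": float(np.median(g)),
        "sigma": float(np.std(g, ddof=1)),
        "below_target": float(np.mean(g < g_target)),
    }

# Synthetic example (not measured data): 1024 cells programmed to 150 uS.
rng = np.random.default_rng(1)
g_as = 150e-6 + np.abs(rng.normal(0.0, 3e-6, size=1024))  # AS: at or above target
g_ea = g_as - np.abs(rng.normal(0.0, 4e-6, size=1024))    # EA: assumed relaxation

g_sorted, p = empirical_cdf(g_ea)
stats_as = d2d_summary(g_as, 150e-6)
stats_ea = d2d_summary(g_ea, 150e-6)
print(stats_as, stats_ea)
```

Here `sigma` plays the role of the D2D variability of one level, and `below_target` quantifies the low-conductance tail caused by relaxation.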

To gain more insight into the control of CDFs by the algorithms, we also studied the maximum slope $g = dI/dV$ of the experimental I-V characteristics, which is illustrated in Fig. 5(a) for the case of the highest programmed level. Fig. 5(b) shows the CDFs of g for each of the 4 LRS levels programmed by ISPVA, IGVVA-100, and IGVVA-10. These data show that the slope of ISPVA increases with increasing level, whereas IGVVA-100 and IGVVA-10 show only a small increase of g with negligible dependence on $\Delta V_G$. These results support the better control of CDFs with IGVVA-10 compared with IGVVA-100 and ISPVA.
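A maximum-slope figure of this kind can be extracted from a sampled characteristic with a finite-difference derivative. The sketch below assumes the curve is given as arrays of swept voltage and measured current; the two example curves are synthetic (a step-like one and a ramp-like one) and only mimic the qualitative shapes discussed above.

```python
import numpy as np

def max_slope(v, i):
    """Maximum slope g = dI/dV of a sampled characteristic (finite differences)."""
    v = np.asarray(v, dtype=float)
    i = np.asarray(i, dtype=float)
    return float(np.max(np.gradient(i, v)))

v = np.linspace(0.5, 2.0, 16)                          # swept voltage, 100 mV step
i_abrupt = 40e-6 / (1.0 + np.exp(-(v - 0.95) / 0.02))  # step-like (assumed shape)
i_gradual = np.clip((v - 0.6) * 40e-6, 0.0, 40e-6)     # ramp-like (assumed shape)

g_abrupt = max_slope(v, i_abrupt)
g_gradual = max_slope(v, i_gradual)
print(g_abrupt, g_gradual)
```

A larger `g` means that one voltage step changes the current by a larger amount, so the verify threshold is crossed with a larger overshoot.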

## IV. SYNAPTIC WEIGHT MAPPING BY IGVVA-10

To test the accuracy of IGVVA-10 for encoding synaptic weights in a neural network, we programmed 8 LRS levels corresponding to target conductances from 50 $\mu$S to 225 $\mu$S by IGVVA-10. Fig. 6 shows the experimental CDFs of the HRS and the 8 LRS levels. Both the AS and the EA distributions are reported, showing a small D2D variability and relatively small EA relaxation tails affecting all the levels.

Fig. 7. (a) Schematic representation of a synaptic weight W implemented using the differential configuration of two 1T1R RRAM devices. (b) Color plot of the standard deviation $\sigma_W$ of 19 differential weights calculated via all the IGVVA-10 CDF differences. Based on $\sigma_W$, we selected 10 combinations of 19 weights (C1-C10) for mapping the synaptic weights of the neural network, where C10 features the lowest variability.

Fig. 8. (a) Calculated inference accuracy using differential weights based on AS and EA IGVVA-10 CDFs and (b) corresponding current consumption as a function of the weight mapping combination. EA relaxation has a small impact on both figures of merit of our network implementation. (c) Schematic of a crossbar array of resistive devices including the parasitic wire resistances r responsible for the IR drop. (d) Impact of the IR drop on the calculated inference accuracy for increasing r as a function of the weight combination.

The 9 conductance CDFs in Fig. 6 were used to implement the 4-kbit synaptic weights of the 2-layer fully-connected neural network (FC-NN) investigated in [20]. This neural network was trained off-line by the backpropagation rule for recognizing a simplified 14x14 version of the handwritten digit images of the Modified National Institute of Standards and Technology (MNIST) dataset [23]. The network consists of an input layer including 197 neurons, a hidden layer with 20 neurons, and an output layer with 10 neurons. The after-training weight quantization scheme proposed in [24] was implemented to take advantage of the quantized levels in the RRAM array.

To maximize the inference accuracy of the neural network in the 4-kbit array, here we implemented the FC-NN synaptic weights using the 9 IGVVA-10 CDFs in Fig. 6 combined with the differential scheme illustrated in Fig. 7(a), namely by encoding the weight as the difference of two 1T1R conductances G<sup>+</sup> and G<sup>-</sup> [3]. Note that, to obtain a certain weight W, there are various possible combinations of G<sup>+</sup> and G<sup>-</sup> from the distributions of Fig. 6. Fig. 7(b) shows the 100 combinations of G<sup>+</sup> and G<sup>-</sup> for mapping 19 weights in the network. For instance, a weight of 100 $\mu$S can be obtained as the difference between $G^+ = 100~\mu$S and $G^- = 0$, or as the difference between $G^+ = 200~\mu$S and $G^- = 100~\mu$S. Note that L2, which corresponds to $\langle G \rangle = 25~\mu$S, was not experimentally measured, but calculated in simulation by differences of CDFs. The figure also shows the standard deviation $\sigma_W$ of the differential levels obtained by all the possible differences of the 9 IGVVA-10 CDFs. The combination of 19 differential weights with the lowest $\sigma_W$, called C10, is found at the top row (G<sup>+</sup> = L10) and the rightmost column ($G^- = L10$) as a result of the lowest $\sigma_G$ of L10 shown in Fig. 6. Fig. 7(b) also shows other examples of weight mapping combinations (C1, C3, C6, and C8), which, despite the higher $\sigma_W$, allow the 19 differential weights to be implemented with smaller conductance levels. This poses a significant trade-off for the design of our network: while the weight precision is maximized at the highest conductance levels, the relatively large current results in a larger area and energy of the periphery circuits as well as a higher IR drop, causing additional errors. The impact of the IR drop might be minimized provided that the interconnect resistances are much smaller than the device resistances [25].
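The counting behind the differential mapping can be checked with a few lines of code. The sketch assumes ten nominal levels L1 to L10 spaced by 25 $\mu$S, with the HRS level approximated as 0 $\mu$S. The helper `sigma_w` additionally assumes independent G<sup>+</sup> and G<sup>-</sup> fluctuations, which is a simplification: in the text, $\sigma_W$ is obtained from differences of the measured CDFs.

```python
import itertools
import numpy as np

STEP_US = 25                                   # uS, assumed level spacing
levels_us = [STEP_US * k for k in range(10)]   # L1 (HRS, ~0 uS) ... L10 (225 uS)

def pairs_for_weight(w_us):
    """All (G+, G-) level pairs whose nominal difference equals w_us."""
    return [(gp, gm) for gp, gm in itertools.product(levels_us, repeat=2)
            if gp - gm == w_us]

def sigma_w(sigma_plus, sigma_minus):
    """Std. dev. of G+ - G- for independent G+ and G- (an assumption)."""
    return float(np.hypot(sigma_plus, sigma_minus))

weights_us = sorted({gp - gm for gp, gm in itertools.product(levels_us, repeat=2)})
n_pairs = sum(len(pairs_for_weight(w)) for w in weights_us)
print(len(weights_us), n_pairs, pairs_for_weight(100))
```

The enumeration gives 19 distinct weights from 100 level pairs, and several pairs for most weights, which is the freedom used to trade weight variability against conductance and therefore current.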

## V. IMPACT OF SYNAPTIC WEIGHT MAPPING ON NEURAL NETWORK DESIGN

Fig. 9. Calculated inference accuracy of the 2-layer FC-NN with $N_H = 100$ as a function of the number of synaptic weight levels programmed by IGVVA-100, ISPVA, and IGVVA-10 CDFs. The higher number of levels combined with IGVVA-10 programming leads $\eta$ close to FP-64 accuracy.

Fig. 10. Calculated inference accuracy of the 2-layer FC-NN as a function of the number of synaptic weight levels for (a) various steps $\Delta W$ in weight mapping and (b) increasing size of hidden layer $N_H$.

To better understand the impact of the various weight mapping combinations, Fig. 8(a) shows the calculated inference accuracy $\eta$ of the 2-layer FC-NN with 100 hidden neurons ($N_H = 100$) and 19 differential weight levels based on IGVVA-10 on the MNIST test dataset as a function of the weight combination Ci, where the index i ranges from 1 to 10. In agreement with the color plot in Fig. 7(b), the testing accuracy of the network increases with increasing i, supporting C10 as the best combination to bring $\eta$ (96.58%) closer to the software accuracy calculated using real-valued weights with 64-bit floating point (FP-64) precision (96.77%). Note that the improvement in inference accuracy from C1 to C10 amounts to 0.15% for both AS and EA. Fig. 8(b) shows the current consumption during the inference phase as a function of Ci. This was calculated as the sum of all the column currents of the network during the testing of the MNIST images. The consumed current increases with i as a result of the increasing conductances used to implement the differential weights, thus leading to a maximum value at C10 which is about five times the C1-based dissipation. These results clearly illustrate the trade-off between the inference accuracy of the FC-NN and the current consumption.

In addition to the current consumption, the impact of the IR drop, namely the voltage drop due to the parasitic wire resistance (Fig. 8(c)), was evaluated. Fig. 8(d) shows the inference accuracy of our network as a function of the combinations indicated in Fig. 7(b). The network was broken into 6 individual tiles consisting of 32x32 crossbar arrays of RRAM devices where both terminals of each device are connected to row and column wires with a finite non-zero resistance r between two contacts. The simulation results show that the inference accuracy decreases with increasing r from 0 to 3 $\Omega$, with a larger drop at higher i because of the larger currents. These results further highlight the importance of adopting relatively low conductance levels, despite their slightly larger variation.
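The dependence of the IR drop on the conductance level can be illustrated with a much simpler model than the full crossbar simulation described above. The sketch below considers a single column of 32 devices with a wire resistance r between adjacent cells, ideal row wires, all rows driven at an assumed 0.2 V, and a grounded sensing end. These choices and the two conductance values are assumptions for illustration; the model does not reproduce the simulations of Fig. 8(d).

```python
import numpy as np

def column_current(g, v_in, r):
    """Current collected at the grounded end of one column with wire resistance r.

    g: device conductances (S), v_in: row voltages (V), r: wire resistance
    between adjacent cells (ohm). Row wires are assumed ideal.
    """
    g = np.asarray(g, dtype=float)
    v_in = np.asarray(v_in, dtype=float)
    if r == 0:
        return float(np.dot(g, v_in))   # ideal MVM column sum
    n, gw = g.size, 1.0 / r
    a = np.zeros((n, n))
    for k in range(n):
        a[k, k] = g[k] + gw             # device plus wire segment toward ground
        if k > 0:
            a[k, k] += gw               # wire segment toward the previous node
            a[k, k - 1] = -gw
        if k < n - 1:
            a[k, k + 1] = -gw
    u = np.linalg.solve(a, g * v_in)    # node voltages along the column (KCL)
    return float(u[-1] * gw)            # current through the last wire segment

n = 32
v = np.full(n, 0.2)                     # assumed input voltage on every row

def relative_loss(g_level, r):
    """Fraction of the ideal column current lost to the IR drop."""
    g = np.full(n, g_level)
    return 1.0 - column_current(g, v, r) / column_current(g, v, 0.0)

loss_low = relative_loss(50e-6, 3.0)    # all cells at a low conductance level
loss_high = relative_loss(225e-6, 3.0)  # all cells at the highest level
print(loss_low, loss_high)
```

Even in this reduced model, the same wire resistance causes a larger relative current loss when the cells sit at higher conductance, in line with the trend discussed in the text.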

While accuracy is only barely improved by increasing the conductance levels (Fig. 8(a)), it can be more heavily impacted by the conductance precision in terms of the number of levels of the synaptic weight. This is shown in Fig. 9, where we report the calculated $\eta$ of the 2-layer FC-NN with N<sub>H</sub> = 100 as a function of the number of discrete weight levels programmed by ISPVA, IGVVA-100, and IGVVA-10. First, increasing the number of weight levels from 9 to 19 leads to increasing inference accuracy for all the algorithms, supporting the need for memory devices with accurate MLC operation. Also, in accordance with Fig. 4(a), IGVVA-10 provides the highest improvement, followed by ISPVA and IGVVA-100, thanks to its lower D2D variability.
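The effect of the number of levels on weight precision can be illustrated with a plain uniform quantizer. This is a simplification and not the quantization scheme of [24]: weights are rounded to the nearest multiple of a fixed step and clipped to the available range. The Gaussian weight distribution and its spread are assumptions chosen only for illustration.

```python
import numpy as np

def quantize(w, n_levels, step):
    """Round weights to n_levels (odd) uniform levels spaced by step, with clipping."""
    half = (n_levels - 1) // 2
    idx = np.clip(np.rint(np.asarray(w, dtype=float) / step), -half, half)
    return idx * step

rng = np.random.default_rng(2)
w = rng.normal(0.0, 80.0, size=10_000)   # synthetic weights in uS (assumed)

q_9 = quantize(w, 9, 25.0)               # 9 levels: -100 uS ... +100 uS
q_19 = quantize(w, 19, 25.0)             # 19 levels: -225 uS ... +225 uS
err_9 = float(np.sqrt(np.mean((w - q_9) ** 2)))
err_19 = float(np.sqrt(np.mean((w - q_19) ** 2)))
print(err_9, err_19)
```

With the same step, the 19-level grid covers a wider range, so fewer weights are clipped and the root-mean-square quantization error is lower.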

In addition to the number of levels, mapping a wider range of real-valued weights calculated in software is also essential to achieve higher inference accuracies. This is shown in Fig. 10(a), where we report that a higher $\eta$ can be achieved by mapping the weights using levels with a step $\Delta W = 50~\mu$S rather than a smaller step of 25 $\mu$S in the case of 9 discrete levels. This choice has the drawback of requiring a larger current consumption and a larger number of discrete MLC states in the memory device. Fig. 10(b) shows the increase in inference accuracy of the 2-layer FC-NN as a function of the number of weight levels programmed by IGVVA-10 AS CDFs for increasing $N_H$. As expected, a larger number of weights enables a significant improvement in test accuracy. In particular, if 19 levels are adopted for weight mapping, $\eta$ increases from 92% with $N_H = 20$ to 96.2% with $N_H = 100$. However, increasing the number of hidden neurons, namely the size of the FC-NN, also results in a larger area of the memory array, thus evidencing a trade-off between accuracy and area consumption.
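The two trade-offs above reduce to simple arithmetic, sketched below. The helper names are illustrative, and `synapse_count` assumes plain fully-connected layers with the layer sizes quoted in the text and no additional bias units; under the differential scheme each synapse would take two 1T1R devices.

```python
def weight_range_us(n_levels, step_us):
    """Largest |W| (in uS) representable with n_levels (odd) uniform levels."""
    return (n_levels - 1) // 2 * step_us

def synapse_count(n_in, n_hidden, n_out):
    """Synapses of a 2-layer FC-NN (extra bias units are not modeled)."""
    return n_in * n_hidden + n_hidden * n_out

range_25 = weight_range_us(9, 25)   # 9 levels with a 25 uS step
range_50 = weight_range_us(9, 50)   # 9 levels with a 50 uS step
synapses = {n_h: synapse_count(197, n_h, 10) for n_h in (20, 50, 100)}
print(range_25, range_50, synapses)
```

Doubling the step doubles the mapped weight range at a fixed number of levels, while the synapse count, and hence the array area, grows linearly with the number of hidden neurons.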

## VI. CONCLUSIONS

We investigated 3 MLC algorithms to optimize the synaptic weight implementation for RRAM-based FC-NNs. IGVVA-10 allows programming 9 conductance levels exhibiting the lowest D2D variability thanks to the highly accurate slope control of the I-V characteristics. Combining the differential encoding scheme and IGVVA-10, we mapped the weights of a 2-layer FC-NN, demonstrating high inference accuracy for increasing number of levels, weight mapping step, and hidden layer size. This study also reveals key trade-offs between the improvement of inference accuracy and current/area consumption, with a focus on the impact of the IR drop. The results discussed in this work support the need for a co-optimization at device and system level to bring the array-level neural network implementations close to the accuracy achieved by neural networks operated in software.

## REFERENCES

- [1] M. A. Zidan, J. P. Strachan, and W. D. Lu, "The future of electronics based on memristive systems," *Nat. Electron.*, vol. 1, pp. 22–29, 2018, DOI: 10.1038/s41928-017-0006-8.
- [2] D. Ielmini and H.-S. P. Wong, "In-memory computing with resistive switching devices," *Nat. Electron.*, vol. 1, pp. 333–343, 2018, DOI: 10.1038/s41928-018-0092-2.
- [3] G. W. Burr et al., "Experimental demonstration and tolerancing of a large-scale neural network (165 000 synapses) using phase-change memory as the synaptic weight element," *IEEE Trans. Electron Devices*, vol. 62, no. 11, pp. 3498–3507, 2015, DOI: 10.1109/TED.2015.2439635.
- [4] T. Gokmen and Y. Vlasov, "Acceleration of deep neural network training with resistive cross-point devices: Design considerations," *Front. Neurosci.*, vol. 10, p. 333, 2016, DOI: 10.3389/fnins.2016.00333.
- [5] S. Yu, "Neuro-inspired computing with emerging nonvolatile memory," *Proc. IEEE*, vol. 106, no. 2, pp. 260–285, 2018, DOI: 10.1109/JPROC.2018.2790840.
- [6] C. Li et al., "Analogue signal and image processing with large memristor crossbars," *Nat. Electron.*, vol. 1, pp. 52–59, 2018, DOI: 10.1038/s41928-017-0002-z.
- [7] M. Hu, C. E. Graves, C. Li, Y. Li, N. Ge, E. Montgomery, N. Davila, H. Jiang, R. S. Williams, J. J. Yang, Q. Xia, and J. P. Strachan, "Memristor-based analog computation and neural network classification with a dot product engine," *Adv. Mater.*, vol. 30, no. 1705914, 2018, DOI: 10.1002/adma.201705914.
- [8] C.-C. Chou, Z.-J. Lin, P.-L. Tseng, C.-F. Li, C.-Y. Chang, W.-C. Chen, Y.-D. Chih, and T.-Y. J. Chang, "An N40 256K×44 embedded RRAM macro with SL-precharge SA and low-voltage current limiter to improve read and write performance," *IEEE Int. Solid-State Circ. Conf. (ISSCC)*, pp. 478–480, 2018, DOI: 10.1109/ISSCC.2018.8310392.
- [9] W.-H. Chen et al., "A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors," *IEEE Int. Solid-State Circ. Conf. (ISSCC)*, pp. 494–496, 2018, DOI: 10.1109/ISSCC.2018.8310400.
- [10] C.-X. Xue et al., "A 1Mb multibit ReRAM computing-in-memory macro with 14.6ns parallel MAC computing time for CNN based AI edge processors," *IEEE Int. Solid-State Circ. Conf. (ISSCC)*, pp. 388–390, 2019, DOI: 10.1109/ISSCC.2019.8662395.
- [11] C.-X. Xue et al., "A CMOS-integrated compute-in-memory macro based on resistive random-access memory for AI edge devices," *Nat. Electron.*, vol. 4, pp. 81–90, 2021, DOI: 10.1038/s41928-020-00505-5.
- [12] A. Fantini, L. Goux, R. Degraeve, D. Wouters, N. Raghavan, G. Kar, A. Belmonte, Y.-Y. Chen, B. Govoreanu, and M. Jurczak, "Intrinsic switching variability in HfO<sub>2</sub> RRAM," *IEEE Int. Mem. Workshop (IMW)*, pp. 1–4, 2013, DOI: 10.1109/IMW.2013.6582090.
- [13] S. Ambrogio, S. Balatti, A. Cubeta, A. Calderoni, N. Ramaswamy, and D. Ielmini, "Statistical fluctuations in HfO<sub>x</sub> resistive-switching memory: Part I – Set/reset variability," *IEEE Trans. Electron Devices*, vol. 61, no. 8, pp. 2912–2919, 2014, DOI: 10.1109/TED.2014.2330200.
- [14] S. Ambrogio, S. Balatti, A. Cubeta, A. Calderoni, N. Ramaswamy, and D. Ielmini, "Statistical fluctuations in HfO<sub>x</sub> resistive-switching memory: Part II – Random telegraph noise," *IEEE Trans. Electron Devices*, vol. 61, no. 8, pp. 2920–2927, 2014, DOI: 10.1109/TED.2014.2330202.
- [15] Y. Lin, C. Wang, M. Lee, D. Lee, Y. Lin, F. Lee, H. Lung, K. Wang, T. Tseng, and C. Lu, "Performance impacts of analog ReRAM nonideality on neuromorphic computing," *IEEE Trans. Electron Devices*, vol. 66, no. 3, pp. 1289–1295, 2019, DOI: 10.1109/TED.2019.2894273.
- [16] S. Ambrogio, S. Balatti, V. McCaffrey, D. C. Wang, and D. Ielmini, "Noise-induced resistance broadening in resistive switching memory – Part I: Intrinsic cell behavior," *IEEE Trans. Electron Devices*, vol. 62, no. 11, pp. 3805–3811, 2015, DOI: 10.1109/TED.2015.2475598.
- [17] S. Ambrogio, S. Balatti, V. McCaffrey, D. C. Wang, and D. Ielmini, "Noise-induced resistance broadening in resistive switching memory – Part II: Array statistics," *IEEE Trans. Electron Devices*, vol. 62, no. 11, pp. 3812–3819, 2015, DOI: 10.1109/TED.2015.2477135.
- [18] E. Pérez, C. Zambelli, M. K. Mahadevaiah, P. Olivo, and Ch. Wenger, "Toward reliable multi-level operation in RRAM arrays: Improving post-algorithm stability and assessing endurance/data retention," *J. Electron. Dev. Soc.*, vol. 7, pp. 740–747, 2019, DOI: 10.1109/JEDS.2019.2931769.
- [19] V. Milo, C. Zambelli, P. Olivo, E. Pérez, M. K. Mahadevaiah, O. G. Ossorio, Ch. Wenger, and D. Ielmini, "Multilevel HfO<sub>2</sub>-based RRAM devices for low-power neuromorphic networks," *APL Mater.*, vol. 7, p. 081120, 2019, DOI: 10.1063/1.5108650.
- [20] V. Milo, F. Anzalone, C. Zambelli, E. Perez, M. K. Mahadevaiah, O. G. Ossorio, P. Olivo, Ch. Wenger, and D. Ielmini, "Optimized programming algorithms for multilevel RRAM in hardware neural networks," *IEEE Int. Reliab. Phys. Symp. (IRPS)*, 2021, in press.
- [21] E. Pérez, A. Grossi, C. Zambelli, P. Olivo, R. Roelofs, and Ch. Wenger, "Reduction of the cell-to-cell variability in Hf<sub>1-x</sub>Al<sub>x</sub>O<sub>y</sub> based RRAM arrays by using program algorithms," *IEEE Electron Device Lett.*, vol. 38, no. 2, pp. 175–178, 2017, DOI: 10.1109/LED.2016.2646758.
- [22] D. Ielmini, "Modeling the universal set/reset characteristics of bipolar RRAM by field- and temperature-driven filament growth," *IEEE Trans. Electron Devices*, vol. 58, no. 12, pp. 4309–4317, 2011, DOI: 10.1109/TED.2011.2167513.
- [23] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," *Proc. IEEE*, vol. 86, no. 11, pp. 2278–2324, 1998, DOI: 10.1109/5.726791.
- [24] A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen, "Incremental network quantization: towards lossless CNNs with low-precision weights," arXiv:1702.03044, 2017.
- [25] D. Ielmini and G. Pedretti, "Device and circuit architectures for in-memory computing," *Adv. Intell. Syst.*, vol. 2, no. 7, p. 2000040, 2020, DOI: 10.1002/aisy.202000040.