Abstract
A novel organic neuromorphic device performing pattern classification is presented and demonstrated. It features an artificial soma capable of dendritic integration from three pre-synaptic neurons. The time-response of the interface between electrolytic solutions and organic mixed ionic-electronic conductors is proposed as the sole computational feature for pattern recognition, and it is easily tuned in the organic dendritic integrator by simply controlling electrolyte ionic strength. The classifier is benchmarked in speech-recognition experiments, with a sample of 14 words, encoded either from audio tracks or from kinematic data, showing excellent discrimination performances in a planar, miniaturizable, fully passive device, designed to be promptly integrated in more complex architectures where on-board pattern classification is required.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
The capability of recognizing and reacting to stimulation patterns coming from multiple sources, integrating them in a coordinated, low-dimensionality response, is one of the main features of neural computing, which enables extremely low-power parallel computation, concurring in making the human brain a more robust, plastic and fault-tolerant computing system with respect to digital architectures [1]. In each neuron, the soma constantly integrates inputs coming from thousands of dendritic synapses, which are located at the connection knots between axons of pre-synaptic neurons and its (post-synaptic) dendrites. This process is known as 'dendritic integration' [2] and it is critical in neural signal transmission and computing. Indeed, it has been demonstrated that, since action potentials are initiated near the soma in the axon initial segment [3, 4], synaptic inputs can influence action potentials of post-synaptic neurons by effectively modulating the membrane potential at this location [5]. In addition, this modulation is influenced by geometrical and electrochemical properties of dendrites, thus enabling a wide variety of non-linear operations in neurons [6].
In recent years, there has been a growing endeavor to implement brain computing features at the hardware level [7]. This ambitious goal is pursued either by building massive networks of artificial neurons and synapses with conventional silicon-based electronics, aiming at neuromorphic supercomputers (e.g. the recently announced Deep South, aiming at >10^14 synaptic operations per second [8]), or by emulating synaptic functions in neuromorphic device units, exploiting the inherent signal response properties of unconventional materials, such as organic mixed ionic-electronic conductors (OMIECs) [9]. The latter approach resulted in the newly developed field of Organic Neuromorphic Electronics, which succeeded in demonstrating a wide variety of neural processing functions, concerning plasticity in response to input signals [10, 11] and spike generation [12]. Importantly, such architectures can be promptly interfaced to the biological environment, and their neuromorphic behavior is directly influenced by chemo-physical features of the environment itself, making them ideal candidates for biointerfacing and sensing applications [13–17]. Lately, both reversible [18–20] and irreversible [21] means for achieving tunability of the device response have been explored, especially in three-terminal architectures [22], and this enabled the demonstration of higher functions, such as pattern recognition [23] and reservoir computing for image recognition [24] in OMIEC-based neuromorphic devices.
The proneness of OMIECs towards such applications directly stems from their capability of establishing large electroactive surface area interfaces with electrolytic solutions [25] and from their strong interaction with ionic species in solution, which ultimately results in a kinetic unbalance between the processes of ion adsorption and desorption upon an external driving force [26] (i.e. typically, a pulsed bias in the electrolyte). The consequent, relatively 'slow', relaxation time of OMIEC/electrolyte interfaces can serve as a design tool for the fabrication of neuromorphic units exhibiting an inherent representation of time [27] as well as real-time filtering capabilities [28]. Such features make them ideal building blocks for in situ spatio-temporal pattern recognition, which is one of the main desiderata when imagining compact, low-power circuitry for on board data treatment and classification.
In this work, a model OMIEC-based neuromorphic architecture for spatio-temporal pattern classification is proposed, namely a dendritic integrator made by artificial post-synaptic soma with three dendrites interfaced to three pre-synaptic neurons, and its classification capabilities are assessed both on model patterns and in the context of speech recognition.
2. Results and discussion
Figure 1 shows the proposed approach and the device fabrication. In particular, figures 1(a) and (b) provide a direct comparison between the biological concept we propose to mimic and the corresponding artificial dendritic integrator. The three presynaptic neurons in figure 1(a) correspond, in figure 1(b), to three input terminals carrying the voltage time series Vin,1, Vin,2 and Vin,3, respectively. The three synapses are identified by their three weight coefficients k1, k2 and k3. The post-synaptic neuron receives the three inputs and outputs a current, Iout, which corresponds to a weighted combination of the input signals, thus reducing the dimensionality of the input by a factor of 3 (i.e. three bi-dimensional V vs t inputs are reduced to one bi-dimensional I vs t output). Figure 1(c) shows the steps of the device fabrication, which are detailed in the experimental section. At first, three linear electrodes and a three-branched one are patterned via direct laser ablation on a metalized polyimide layer; then a second layer of polyimide allows spatial confinement of the three synaptic clefts, each constituted by two terminals, namely one axon terminal from a pre-synaptic neuron and one dendrite from the post-synaptic soma. Synaptic terminals are then coated with PEDOT/PSS via potentiostatic electrodeposition, and the device is completed with three working electrolytes, one for each synaptic cleft. Figures 1(d) and (e) show the schematic and the actual device characterization layout.
Due to the chosen device layout, the administration of a voltage pulse elicits, in each synaptic cleft, a current, IPRE, which can be collected at the presynaptic terminal and has the usual form:
where RE and C are resistance of the electrolyte and equivalent series capacitance of both electrode/electrolyte interfaces, respectively. This current is mirrored on the other terminal of each synapse, resulting in an excitatory post synaptic current, EPSC, in the soma. The overall somatic current Iout will simply be the linear combination of the EPSCs, as follows:
where indexes 1, 2, 3 indicate the synaptic clefts.
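The two expressions referred to above as the 'usual form' and the linear combination of EPSCs can be sketched as follows. This is only a sketch under an assumption that the text does not state explicitly: each cleft is treated as an ideal series circuit of R_E and C, driven by a voltage step of amplitude V_in applied at t = 0, and the mirrored EPSC is taken equal to the presynaptic current up to the sign convention of the measurement. The index i and the per-cleft symbols R_{E,i} and C_i are notation introduced here for illustration.

```latex
% Assumed single-exponential response of one cleft to a voltage step V_in at t = 0
I_{\mathrm{PRE}}(t) = \frac{V_{\mathrm{in}}}{R_E}\, e^{-t/(R_E C)} \tag{1}

% Somatic current as the sum of the three mirrored EPSCs (clefts i = 1, 2, 3)
I_{\mathrm{out}}(t) = \sum_{i=1}^{3} \mathrm{EPSC}_i(t)
  = \sum_{i=1}^{3} \frac{V_{\mathrm{in},i}}{R_{E,i}}\, e^{-t/(R_{E,i} C_i)} \tag{2}
```

In this picture, every ON or OFF transition of an input restarts an exponential transient with time constant R_{E,i} C_i in the corresponding cleft, which is why the transitions dominate the shape of Iout.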
Since both RE and C, in the absence of any other difference concerning electrode area or thickness of the PEDOT/PSS layers, are solely influenced by the molar concentration of the electrolyte, it is possible to change the weight of each synaptic connection by changing the molar concentration in the corresponding electrolyte compartment. For this reason, in the present work, weight coefficients k1, k2 and k3 are expressed in mol l−1.
At first, we assessed the classification performance of the device on the discrimination of two model patterns; the results are summarized in figure 2. Figure 2(a) shows the selected 3 × 3 binary patterns, termed 'A' and 'B', and the corresponding input voltammograms. Each of the three input voltages encodes a row of the pattern, and each column is encoded in a Δt time interval; Vin = 1 V is assigned to a white pixel, while Vin = 0 V is assigned to a black one. A 'black pixel', namely a Δt time interval with Vin,1 = Vin,2 = Vin,3 = 0 V, is added before and after every row to ensure equal boundary conditions. In this experiment, k1 = k2 = k3 = 1 M.
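The encoding just described can be illustrated with a minimal Python sketch. The function names and the two diagonal patterns are hypothetical: they are not patterns 'A' and 'B' of figure 2, but they are chosen to be spatially different while sharing the same sequence of ON and OFF transitions.

```python
def encode_pattern(pattern, v_on=1.0, v_off=0.0):
    """Map a binary pattern (rows of 1 = white, 0 = black) to one voltage
    sequence per row, padded with one OFF slot before and after each row."""
    return [[v_off] + [v_on if px else v_off for px in row] + [v_off]
            for row in pattern]


def transitions(sequences):
    """For each slot boundary, count how many inputs switch ON and OFF."""
    n_slots = len(sequences[0])
    counts = []
    for j in range(1, n_slots):
        turned_on = sum(1 for s in sequences if s[j] > s[j - 1])
        turned_off = sum(1 for s in sequences if s[j] < s[j - 1])
        counts.append((turned_on, turned_off))
    return counts


# Hypothetical example patterns (illustration only, not those of figure 2)
diagonal = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
anti_diagonal = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]

print(transitions(encode_pattern(diagonal)))
print(transitions(encode_pattern(anti_diagonal)))
```

Each slot of the returned sequences lasts Δt; both example patterns give one ON transition at Δt, one ON and one OFF at 2Δt and 3Δt, and one OFF at 4Δt.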
These patterns have been chosen since, although spatially different, they are temporally equal in terms of transitions between ON and OFF pixels. Such transitions are of critical importance since they cause spikes and current decays in Iout, as from equation (2). In particular, in both cases there is one voltage turning ON at t = Δt, one voltage turning ON and one turning OFF at t = 2Δt and at t = 3Δt, and one turning OFF at t = 4Δt. Comparison between figures 2(b) and (c), showing the discrimination between Pattern A and Pattern B built on a time basis of 20 ms and 3 ms, respectively, unveils the relevance of Δt in determining the classification efficiency of the artificial soma. The characteristic RC relaxation time of our synapses with k = 1 M in response to a single pulse is τRC = 3.84 ± 0.62 ms (see figure S1 in the supporting information). When Δt = 20 ms (i.e. longer than τRC, figure 2(b)), all the presynaptic currents return to steady state before any further perturbation of the system equilibrium occurs. The resulting somatic currents, top right panel in figure 2(b), are hence poorly distinguishable. As a consequence, the absolute exchanged somatic charges, |Q| (bottom right panel, figure 2(b)), calculated as the integrals of current traces in time, also show comparable profiles between Pattern A and Pattern B.
It is worth noticing that, due to minimal differences between the individual synapses, the proposed architecture still manages to discriminate between the patterns, meaning, as expected, that spatial information can be encoded in the difference of synaptic weights, whether intentionally varied or adventitiously different. This effect is magnified when Δt is lower than τRC (figure 2(c)), since incomplete relaxation of each synapse (the phenomenon at the origin of STP, STDP and paired-pulse plasticity in organic neuromorphic devices) brings in additional charge contributions.
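The role of Δt can be made concrete with a short worked estimate. It assumes, as a simplification, a single-exponential relaxation with the mean value τRC = 3.84 ms quoted above; the residual fraction r is a quantity introduced here for illustration.

```latex
% Fraction of a transient still present one time slot after a transition
r(\Delta t) = e^{-\Delta t/\tau_{RC}}

r(20\ \mathrm{ms}) = e^{-20/3.84} \approx 0.005
\qquad
r(3\ \mathrm{ms}) = e^{-3/3.84} \approx 0.46
```

Under this assumption, less than 1% of a transient survives into the next slot at Δt = 20 ms, whereas almost half of it does at Δt = 3 ms, so that consecutive transitions overlap and the history of each input is carried into the somatic current.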
We choose to regard the absolute value of the integral of current in time, |Q|, and in particular its value at the end of the stimulation protocol, |Q|end, as the final quantitative output of the proposed classifier, for two reasons. On the one hand, it magnifies even transient differences between currents. On the other hand, in view of future technological development, it could be directly measured with a simple integrator circuit acting as a pseudo-additional classification layer, yielding a further dimensionality reduction (as commonly done in artificial neural networks) by condensing the information of a pattern of three bidimensional time series in a single charge value. Upon these premises, a performance coefficient of our binary classifier, χ, can be expressed as the difference between the total somatic charges exchanged upon Pattern A and Pattern B stimulations, normalized over the sum of the two charges. In this example, when Δt goes from 20 ms to 3 ms, χ increases from 0.047 (data from figure 2(b)) to 0.081 (data from figure 2(c)). As noted above, another way to improve classification performance is to diversify the weights of the synaptic connections. Even keeping Δt = 20 ms, if k1 = 0.8 M, k2 = 1 M and k3 = 0.6 M, χ is increased to a value of 0.059.
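A minimal Python sketch of the two quantities defined above may help. It assumes a uniformly sampled current trace and trapezoidal integration, and it takes the absolute value of the charge difference in χ; the function names are hypothetical and the sampling and integration schemes are not specified in the text.

```python
def charge_profile(current, dt):
    """Return |Q|(t): absolute value of the running integral of the current.

    current: list of current samples (A), uniformly spaced by dt (s).
    Trapezoidal integration is an assumption made for this sketch.
    """
    q = 0.0
    profile = [0.0]
    for i0, i1 in zip(current, current[1:]):
        q += 0.5 * (i0 + i1) * dt
        profile.append(abs(q))
    return profile


def chi(q_end_a, q_end_b):
    """Binary performance coefficient: charge difference over charge sum."""
    return abs(q_end_a - q_end_b) / (q_end_a + q_end_b)


# Usage with made-up numbers (not data from figure 2)
q_a = charge_profile([0.0, 2.0e-3, 1.0e-3, 0.0], dt=1.0e-3)[-1]
q_b = charge_profile([0.0, 1.0e-3, 1.0e-3, 0.0], dt=1.0e-3)[-1]
print(chi(q_a, q_b))
```

Since χ is normalized by the total charge, it is dimensionless and can be compared across time bases, while the absolute difference between the two end charges is what a physical integrator would have to resolve.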
In absolute terms, ||Q|end,A − |Q|end,B| = 1.67 µC when Δt = 20 ms and ||Q|end,A − |Q|end,B| = 1.36 µC when Δt = 3 ms, and both charge differences largely exceed the sensitivity of state of the art charge integrators (≈pC).
Aiming at the translation of the proposed architecture and of its classification capability to more complex and significant scenarios, we devised a proof-of-concept application of the optimized classifier architecture (Δt = 3 ms, k1 = 0.8 M, k2 = 1 M and k3 = 0.6 M) to the problem of speech recognition. We used a set of 14 audio recordings, each containing an individual Italian word. In particular, seven of them (i.e. AMORE—CASA—CIAO—FAME—FELICE—GRAZIE—TRISTE) were collected in the framework of the present study, while the remaining seven (i.e. GIORNATA—INFANZIA—INVENTATO—MASCHERA—ONDOSO—PRIVILEGIO—TIMONE) were extracted from the Multi-SPeaKing-style Articulatory corpus (MSPKA) [29], to rule out influences from the sampling conditions and to compare classification efficiency starting either from audio traces or from kinematic data.
From each audio recording, we compute the mel-frequency cepstral coefficients (MFCCs) [30] to derive three-dimensional features to serve as input signals. MFCC extraction, a standard pre-processing step in speech recognition tasks, facilitates a compact parameterization of speech signals capable of capturing phonetically relevant aspects [31]. The algorithm to generate input sequences for the artificial soma starting from audio tracks is schematically shown in figure 3, using the word 'AMORE' as an example. As commonly known, it is possible to apply a moving time window to an audio track (figure 3(a)) and to compute the discrete Fourier transform for each segment, resulting in a spectrogram (figure 3(b)). Since the proposed artificial soma features three input synapses, it is necessary to reduce one of the dimensions of the spectrogram to 3 bins. Precisely, we discretized the spectrogram in three time windows (figure 3(c)), as detailed in the experimental section, and applied the discrete cosine transform to the Mel spectrogram to extract 13 cepstral coefficients (MFCCs, figure 3(d)):
where Sk is the power of the kth frequency band. We then excluded the first cepstral coefficient c0, which accounts for the overall energy of the signal, and binarized the result using a median-split approach:
Binarization produces a black and white pattern representing the audio file (figure 3(e)). All the patterns are reported in figure S2, in the supporting information.
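The cepstral and binarization steps can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: the cepstral coefficients are taken as the commonly used type-II discrete cosine transform of the log band powers, and the median is computed over all coefficients of a word after dropping c0. Function names are hypothetical.

```python
import math
from statistics import median


def cepstral_coefficients(log_band_powers, n_coeffs=13):
    """Type-II DCT of the log band powers of one time window.

    log_band_powers: log (e.g. dB) power of each Mel band, k = 0..K-1.
    The coefficient with index 0 tracks the overall energy of the window.
    """
    n_bands = len(log_band_powers)
    return [sum(s * math.cos(math.pi * n * (k + 0.5) / n_bands)
                for k, s in enumerate(log_band_powers))
            for n in range(n_coeffs)]


def binarize(windows):
    """Drop c0 in every window, then median-split the remaining values.

    windows: one list of cepstral coefficients per time window.
    Returns 1 (white) above the median and 0 (black) otherwise; taking the
    median over the whole word is an assumption of this sketch.
    """
    trimmed = [w[1:] for w in windows]
    threshold = median(v for w in trimmed for v in w)
    return [[1 if v > threshold else 0 for v in w] for w in trimmed]


# Usage with made-up coefficients (three windows, c0 first)
print(binarize([[9, 1, 4], [9, 3, 2], [9, 5, 0]]))
```

A median split guarantees a balanced number of white and black rectangles in each pattern, so that different words are compared on the arrangement of ON and OFF inputs rather than on their count.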
A 90° counterclockwise rotation of the pattern (figure 3(f)) and its conversion in a square voltage input on a time basis Δt = 3 ms, with Vin = 0.5 V for white rectangles and Vin = 0 V for black ones, yields the input sequences for the artificial soma (figure 3(g)). As for the model patterns of figure 2, for each stimulation pattern it is possible to collect the individual pre-synaptic currents I1, I2 and I3 (figure 3(h)) and the resulting somatic current, Iout (figure 3(i)).
The performance of the classifier is summarized in figure 4. Figure 4(a) shows the profiles of Iout in response to the 14 voltage input patterns derived from audio recordings, as described in figure 3. Unlike the model patterns in figure 2, here the current outputs are clearly distinguishable even from a qualitative examination of the traces. To provide a quantitative estimate of the pattern recognition efficiency, it is possible to refer to the cross-correlation table in figure 4(b), which reports Pearson's correlation coefficients computed between current traces.
For this dataset, the average cross-correlation coefficient is as low as 0.39 ± 0.23, with a maximum value of 0.84 (between MASCHERA and INFANZIA), hinting at easy discrimination amongst all the selected words. As in the case of model patterns, it is possible to further reduce the dimensionality of the problem by integrating Iout in time and acquiring charge vs time profiles, reported in figure 4(c). As can be anticipated from the current traces in figure 4(a), every word yields an unambiguously attributable charge profile and an individual |Q|end value (figure 4(d)).
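A cross-correlation table of this kind can be computed with a few lines of Python. The sketch below assumes equally long, equally sampled current traces and averages the signed coefficients over all distinct word pairs; whether signed or absolute values were averaged in figure 4(b) is not stated, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equally long traces."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def summarize_correlations(traces):
    """Mean and maximum coefficient over all distinct pairs of traces.

    traces: dict mapping a word to its Iout samples.
    """
    words = list(traces)
    values = [pearson(traces[a], traces[b])
              for i, a in enumerate(words) for b in words[i + 1:]]
    return sum(values) / len(values), max(values)


# Usage with made-up traces (not data from figure 4)
demo = {"w1": [1.0, 2.0, 3.0], "w2": [2.0, 4.0, 6.0], "w3": [3.0, 2.0, 1.0]}
print(summarize_correlations(demo))
```

The lower the mean and the maximum of the off-diagonal coefficients, the more distinct the current traces, and the easier the discrimination between words.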
To quantitatively express the classification efficiency of a given classifier architecture (i.e. a given time basis for pattern encoding and a given kernel of k values), it is not useful to refer to the parameter χ, which was devised for one-to-one comparisons; instead, it is possible to refer to the average difference between two subsequent |Q|end values, S.
Data in figure 4 yield an S as high as 1.38 µC while, in a control experiment with k1 = k2 = k3 = 1 M (figure S3 in the supporting information), S = 0.43 µC. This means that, by properly tuning the set of synaptic weights, it is possible to increase the classification performance by a factor of about 3.2 (i.e. to roughly 320% of the control value).
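The figure of merit S can be sketched in Python as follows, assuming that 'subsequent' refers to neighbouring values once the |Q|end values of all words are sorted; the function name is hypothetical.

```python
def mean_adjacent_gap(q_end_values):
    """Average difference between neighbouring |Q|end values after sorting."""
    ordered = sorted(q_end_values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


# Usage with made-up charges in microcoulombs (not data from figure 4)
print(mean_adjacent_gap([5.0, 1.0, 2.0]))

# Ratio between the two S values quoted in the text
print(1.38 / 0.43)
```

With this definition the sum of the gaps telescopes, so S equals the full span of |Q|end values divided by the number of words minus one. It therefore measures how widely the words spread along the charge axis, and it does not by itself guarantee that every individual gap exceeds the resolution of the integrator.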
As a final test for our classifier, as introduced above, we chose a less pre-treated but more clinically relevant input dataset, still in the field of speech recognition: namely, kinematic data of the speaker's articulatory tract, collected by electromagnetic articulography (EMA), on the subset of words extracted from the MSPKA corpus. Results are shown in figure 5. The MSPKA dataset provides, along with the audio, the corresponding speaker's articulatory kinematics data (specifically lips, jaws, and tongue movements).
From the high-dimensional EMA data (comprising 7 sensors × 3 dimensions = 21 time-series), only the three most relevant recordings [32] were chosen: namely tongue movement towards and away from the lips (TB.x) and the palate (TB.z), and finally opening and closing of the mouth (lower lip moving towards and away from the upper lip: LL.z) (figure 5(a)). Figure S4 in the supporting information shows the entire dataset. To derive voltage input sequences, we binned the time-series with a moving average approach on windows of 10 ms and cut all the selected words at the duration of the shortest one (figure 5(b)). The time basis is then converted to 3 ms in accordance with previous experiments, resulting in the three input sequences (figure 5(c)). Iout is collected as already described. Figure 5(e) shows the profiles of Iout in response to the seven voltage input patterns derived from EMA recordings. Figures 5(f) and (e) show more efficient classification with respect to the corresponding data in figures 4(c) and (d). In particular, restricting the analysis to the seven-word subset, S increases from 2.06 µC to 3.43 µC, a 66% increase, and here a charge resolution of 1 µC is sufficient for unambiguous classification. This is further evidenced by the comparison between the cross-correlation tables in figure 5(h) and in figure 5(i), built on Iout profiles resulting from MFCC-derived and EMA-derived input sequences, respectively, which show much higher correlation (i.e. poorer distinction) between Iout profiles resulting from audio classification than between those coming from classification of kinematic data, well in accordance with previous literature on speech recognition [32].
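The binning and truncation steps applied to the EMA channels can be sketched in Python as below. The sketch interprets the moving average as the mean over consecutive, non-overlapping 10 ms windows, which is an assumption; the sampling rate, the function names and the example values are hypothetical, and the mapping from binned positions to input voltages is not covered.

```python
def bin_series(samples, sample_rate_hz, window_ms=10.0):
    """Average the samples over consecutive windows of window_ms."""
    n = max(1, round(sample_rate_hz * window_ms / 1000.0))
    return [sum(samples[i:i + n]) / n
            for i in range(0, len(samples) - n + 1, n)]


def truncate_to_shortest(words):
    """Cut every channel of every word to the length of the shortest word.

    words: dict mapping a word to its list of binned channels
    (e.g. TB.x, TB.z and LL.z), each channel being a list of values.
    """
    shortest = min(len(ch) for channels in words.values() for ch in channels)
    return {w: [ch[:shortest] for ch in channels]
            for w, channels in words.items()}


# Usage with made-up values and a hypothetical 200 Hz sampling rate
print(bin_series([1.0, 3.0, 5.0, 7.0, 9.0], sample_rate_hz=200))
print(truncate_to_shortest({"a": [[1, 2, 3]], "b": [[4, 5]]}))
```

After truncation every word has the same number of time slots, so that the subsequent replay on a 3 ms time basis produces stimulation protocols of equal duration and directly comparable |Q|end values.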
3. Experimental
3.1. Device fabrication
The microfabricated electrodes are obtained starting from a thin flexible foil of DuPont Kapton EN30 metalized with gold (Creavac, Dresden, DE). The polymeric foil exhibits thickness values ranging between 7 and 10 µm whereas the metallization layer, deposited in vacuum, includes a thin film of chromium (3 nm) to promote adhesion and a 70 nm thick gold film. Patterning of the electrodes was performed by means of laser scan ablation according to a drawing in CAD file containing the geometry. The in-house assembled infrared laser system (Istituto Italiano di Tecnologia) enables selection of the optimal ablation power (≈2 mW) not to melt or deform the polymeric film. Although the process is intrinsically serial, the speed of the galvanometric scanner is high, allowing a total processing time lower than 4 min. After ablation, devices are cleaned by sonication in ethanol for 5 min.
Terminal parts of the electrodes are coated by potentiostatic electrodeposition of PEDOT/PSS (5 s at 0.2 V, then 0.8 V in charge limit control, up to 300 mC cm−2 charge density), starting from an aqueous electrolyte containing 10 mM EDOT and 5 mg ml−1 NaPSS, reaching a final thickness of 1.22 ± 0.05 μm. PEDOT/PSS rms roughness was found to be 12 ± 1 nm. This protocol ensures high reproducibility, with impedance variability at the operational frequency lower than 2.5%. The surface topography of the electrodeposited PEDOT:PSS formulation on ablated gold-polyimide substrates was investigated by atomic force microscopy (AFM), using a Park XE7 AFM System (Park Systems, Suwon, Korea) operating in tapping mode, in air and at room temperature. Premounted silicon cantilevers (OMCL-AC160TS, Olympus Micro Cantilevers, Tokyo, Japan) with an Al backside reflective coating, a tip curvature radius ca. 7 nm, an elastic constant ca. 26 N m−1, and a resonance frequency ca. 300 kHz were used. The root-mean-square roughness (rms) and the thickness of the electrodeposited PEDOT:PSS film were analyzed by Park Systems XEI software (Park Systems, Suwon, Korea). In particular, the rms values were extracted and averaged from 5.0 μm × 5.0 μm topography images (figure S6) collected in three different regions. Before extracting the rms values, the topography images were flattened using a fourth order regression in the X direction.
3.2. Electrical characterization
Electrical characterization is performed using three Keysight B2912A Source/Measure Units (SMUs), connected in parallel on a common ground and controlled by ad hoc designed software. Channel 1 of each SMU is used to source the input voltages Vin,1, Vin,2 and Vin,3 and to collect IPRE in each individual synapse, while Channel 2 of one of the SMUs is used to collect the somatic current, Iout. Measurements are performed using phosphate buffered saline solution at pH = 7.4 (P3619-1GA, Merck) as a transmission medium, adjusting the concentration via dilution with MilliQ water.
3.3. Audio spectrogram computation
A window (with no overlap) of the length of one-third of the total audio duration and an N-points fast Fourier transform with N equal to the length of the window are employed. The Mel spectrogram is then computed by applying a filterbank of 64 triangular filters linearly spaced in the Bark/Mel scale. Slaney's formula is used in the conversion from Hz to Mel [33], as implemented in the librosa package (https://zenodo.org/badge/latestdoi/6309729), mimicking the pre-processing of speech signals by the human inner ear. The resulting spectrogram is finally converted to dB (log transformation multiplied by 20).
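Three of the steps described above can be sketched in plain Python: the split into three non-overlapping windows, the conversion from Hz to Mel, and the conversion to dB. The constants in the Hz to Mel function are those of the commonly used Slaney-style mapping (linear below 1 kHz, logarithmic above) and should be checked against the library actually employed; the function names and the small floor used to avoid the logarithm of zero are assumptions of this sketch.

```python
import math


def three_windows(samples):
    """Split a recording into three non-overlapping windows of equal length
    (trailing samples that do not fill a window are discarded)."""
    n = len(samples) // 3
    return [samples[i * n:(i + 1) * n] for i in range(3)]


def hz_to_mel_slaney(f_hz):
    """Slaney-style Hz to Mel mapping: linear below 1 kHz, log above."""
    f_sp = 200.0 / 3.0                 # Hz per Mel in the linear region
    min_log_hz = 1000.0                # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp    # 15 Mel at 1 kHz
    logstep = math.log(6.4) / 27.0     # step size in the log region
    if f_hz < min_log_hz:
        return f_hz / f_sp
    return min_log_mel + math.log(f_hz / min_log_hz) / logstep


def to_db(value, factor=20.0, floor=1e-10):
    """Log transformation multiplied by `factor` (20 in the text)."""
    return factor * math.log10(max(value, floor))


# Usage
print(three_windows(list(range(10))))
print(hz_to_mel_slaney(500.0), hz_to_mel_slaney(6400.0))
print(to_db(100.0))
```

Using one window per third of the recording is what produces exactly three time bins per word, matching the three input synapses of the artificial soma.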
3.4. Analysis, graphing and presentation
Collected data are analyzed and graphed by means of MATLAB R2022b and OriginPro 2016; figure panels are assembled in Adobe Photoshop CS6, and 3D device schematics are sketched in SketchUp Make 2017.
4. Conclusions
In this work we develop and demonstrate an organic neuromorphic spatiotemporal pattern classifier based on the concept of dendritic integration between three two-terminal artificial synapses whose weight factors can be arbitrarily set by simply changing the working electrolyte. The device is planar and fabricated through laser prototyping, making it prone to miniaturization and integration with planar, flexible and eventually bio-compatible/degradable electrode arrays for physiological applications. Furthermore, it is virtually passive, and performs classification without the necessity of any driving voltage, with the exception of the input ones. The proposed architecture can efficiently discriminate between fourteen 13 × 3 spatiotemporal patterns derived from audio files (figures 3 and 4) and shows a first proof of principle classification of quasi-continuous physiologically relevant time series (EMA data in figure 5), with even better performances with respect to strongly pre-treated model patterns. The results herein presented, coupled with the high integration possibilities offered by the technological platform of organic electronic techniques—which allows the envisioning of networks of multiple artificial somas—have the potential to foster novel online classification strategies of signaling patterns based on supervised learning strategies with organic neuromorphic electronics.
Acknowledgments
Research work leading to this publication was funded by IIT—Istituto Italiano di Tecnologia, University of Ferrara and University of Modena and Reggio Emilia (FAR 2018 Project e-MAP). This work has received funding from the European Union's Horizon Europe research and Innovation program under Grant Agreement No. 10109859, Project Piezo4Spine.
Luciano Fadiga and Fabio Biscarini equally contributed to this work.
Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).