Author: Amiriparian, Shahin; Hubner, Tobias; Gerczuk, Maurice; Ottl, Sandra; Schuller, Bjorn W. (EIHW -- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany; GLAM -- Group on Language, Audio, and Music, Imperial College London, UK)
                    Title: DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing from Decentralised Data  Cord-id: t1n1an4i  Document date: 2021_4_23
                    ID: t1n1an4i
                    
                                Document: Deep neural speech and audio processing systems have a large number of trainable parameters and a relatively complex architecture, and they require vast amounts of training data and computational power. These constraints make it more challenging to integrate such systems into embedded devices and utilise them for real-time, real-world applications. We tackle these limitations by introducing DeepSpectrumLite, an open-source, lightweight transfer learning framework for on-device speech and audio recognition using pre-trained image convolutional neural networks (CNNs). The framework creates and augments Mel-spectrogram plots on the fly from raw audio signals; these plots are then used to fine-tune specific pre-trained CNNs for the target classification task. The whole pipeline can then be run in real time, with a mean inference lag of 242.0 ms when a DenseNet121 model is used on a consumer-grade Motorola moto e7 plus smartphone. DeepSpectrumLite operates in a decentralised manner, eliminating the need to upload data for further processing. By obtaining state-of-the-art results on a set of paralinguistics tasks, we demonstrate the suitability of the proposed transfer learning approach for embedded audio signal processing, even when data is scarce. We provide an extensive command-line interface for users and developers, which is comprehensively documented and publicly available at https://github.com/DeepSpectrum/DeepSpectrumLite.
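
As an illustration of the first stage described in the abstract (turning a raw audio signal into a Mel-spectrogram), the following is a minimal sketch in plain Python. It is not DeepSpectrumLite's implementation: the function names, the frame length, hop size, and number of Mel bands are all assumptions chosen for readability, and the naive DFT would be replaced by an FFT routine from a signal-processing library in practice. The later stages (rendering the spectrogram as a plot, augmenting it, and fine-tuning a pre-trained image CNN) are not covered here.

```python
import math

def hz_to_mel(hz):
    # Common HTK-style Mel formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters: n_mels rows, n_fft // 2 + 1 columns."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points, equally spaced on the Mel axis.
    hz_points = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        row = []
        for f in bin_freqs:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def power_spectrum(frame):
    """Naive DFT power spectrum of one real-valued frame."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels log-energies in dB."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    # Periodic Hann window to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n_fft) for t in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(n_fft)]
        power = power_spectrum(frame)
        mel = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([10.0 * math.log10(e + 1e-10) for e in mel])
    return frames

# Hypothetical input: a 1 kHz sine tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = mel_spectrogram(tone, sample_rate)
```

Each inner list in `spec` corresponds to one time frame. A framework of the kind described would map such a matrix to a colour image before passing it to an image CNN.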
 