Toolkits

auDeep

A Python toolkit for unsupervised feature learning with deep neural networks (DNNs)

 

Developers:  Shahin Amiriparian, Michael Freitag, Sergey Pugachevskiy, Björn W. Schuller

 

GitHub: https://github.com/auDeep/auDeep

 

auDeep is a Python toolkit for unsupervised feature learning with deep neural networks (DNNs). Currently, the main focus of this project is feature extraction from audio data with deep recurrent autoencoders. However, the core feature learning algorithms are not limited to audio data. Furthermore, we plan on implementing additional DNN-based feature learning approaches.
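
To give a flavour of the approach, the following sketch (plain PyTorch, not auDeep's own API) trains a toy recurrent sequence-to-sequence autoencoder on spectrogram frame sequences and uses the encoder's final hidden state as the learned feature vector; the network sizes, dummy data, and training loop are illustrative assumptions only.

    # Illustrative sketch only -- NOT auDeep's API. It shows the general idea of
    # unsupervised feature learning with a recurrent sequence-to-sequence
    # autoencoder: spectrogram frame sequences are encoded by a GRU, decoded
    # back, and the encoder's final hidden state serves as the feature vector.
    import torch
    import torch.nn as nn

    class RecurrentAutoencoder(nn.Module):
        def __init__(self, n_mels=128, hidden=256):
            super().__init__()
            self.encoder = nn.GRU(n_mels, hidden, batch_first=True)
            self.decoder = nn.GRU(n_mels, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_mels)

        def forward(self, x):
            # x: (batch, time, n_mels) spectrogram frames
            _, h = self.encoder(x)                     # h: (1, batch, hidden)
            dec_in = torch.zeros_like(x)               # decoder driven only by the encoder state
            y, _ = self.decoder(dec_in, h)
            return self.out(y), h.squeeze(0)           # reconstruction, features

    model = RecurrentAutoencoder()
    spectrograms = torch.randn(8, 100, 128)            # dummy batch of spectrogram sequences
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(10):                                # unsupervised training loop
        recon, features = model(spectrograms)
        loss = nn.functional.mse_loss(recon, spectrograms)
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
    # `features` (batch, hidden) would then be used as the learned representations.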

 

(c) 2017 Michael Freitag, Shahin Amiriparian, Sergey Pugachevskiy, Nicholas Cummins, Björn Schuller: Universität Passau. Published under GPLv3; see the LICENSE.md file for details.

 

Please direct any questions or requests to Shahin Amiriparian (shahin.amiriparian at tum.de) or Michael Freitag (freitagm at fim.uni-passau.de).

 

Citing

If you use auDeep or any code from auDeep in your research work, you are kindly asked to acknowledge the use of auDeep in your publications.

 

M. Freitag, S. Amiriparian, S. Pugachevskiy, N. Cummins, and B. Schuller. auDeep: Unsupervised Learning of Representations from Audio with Deep Recurrent Neural Networks, Journal of Machine Learning Research, 2017, submitted, 5 pages.

 

S. Amiriparian, M. Freitag, N. Cummins, and B. Schuller. Sequence to sequence autoencoders for unsupervised representation learning from audio, Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop, pp. 17-21, 2017.

DeepSpectrum

A Python toolkit for feature extraction from audio data with pre-trained Image Convolutional Neural Networks (CNNs)

 

Developers:  Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Björn W. Schuller

 

GitHub: https://github.com/DeepSpectrum/DeepSpectrum 


DeepSpectrum is a Python toolkit for feature extraction from audio data with pre-trained Image Convolutional Neural Networks (CNNs). It features an extraction pipeline that first creates visual representations of the audio data (plots of spectrograms or chromagrams) and then feeds them to a pre-trained Image CNN. The activations of a specific layer then form the final feature vectors.
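
As an illustration of this pipeline (not the DeepSpectrum tool itself), the sketch below renders a mel spectrogram as a colour-mapped image and extracts activations of a fully connected layer from an ImageNet-pre-trained VGG16; the file name, colour map, input scaling, and layer choice are assumptions made for the example.

    # Illustrative sketch of a DeepSpectrum-style pipeline -- not DeepSpectrum itself:
    # render a mel spectrogram as a colour image, then use activations of a fully
    # connected layer of an ImageNet-pre-trained CNN as the feature vector.
    import numpy as np
    import librosa
    import torch
    import torch.nn.functional as F
    from matplotlib import cm
    from torchvision.models import vgg16, VGG16_Weights

    # load audio and compute a mel spectrogram in dB ("example.wav" is a placeholder)
    y, sr = librosa.load("example.wav", sr=16000)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    S_db = librosa.power_to_db(S, ref=np.max)

    # map the spectrogram to an RGB image with a colour map (viridis here)
    S_norm = (S_db - S_db.min()) / (S_db.max() - S_db.min() + 1e-9)
    img = cm.viridis(S_norm)[..., :3]                          # (freq, time, 3) in [0, 1]
    img = torch.tensor(img, dtype=torch.float32).permute(2, 0, 1).unsqueeze(0)
    img = F.interpolate(img, size=(224, 224))                  # CNN input size
    # (ImageNet mean/std normalisation is skipped here for brevity)

    # forward pass through the pre-trained Image CNN; keep a late fully connected layer
    model = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).eval()
    with torch.no_grad():
        h = model.avgpool(model.features(img)).flatten(1)
        features = model.classifier[:4](h)                     # activations up to the second FC layer
    print(features.shape)                                      # e.g. torch.Size([1, 4096])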

 

(c) 2017-2018 Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Björn Schuller: Universität Augsburg. Published under GPLv3; see the LICENSE.md file for details.

 

Please direct any questions or requests to Shahin Amiriparian (shahin.amiriparian at tum.de) or Maurice Gerczuk (gerczuk at fim.uni-passau.de).


Citing
If you use DeepSpectrum or any code from DeepSpectrum in your research work, you are kindly asked to acknowledge the use of DeepSpectrum in your publications.

 

S. Amiriparian, M. Gerczuk, S. Ottl, N. Cummins, M. Freitag, S. Pugachevskiy, A. Baird, and B. Schuller. Snore Sound Classification using Image-Based Deep Spectrum Features. In Proceedings of INTERSPEECH 2017.

iHEARu-PLAY

An intelligent gamified crowdsourcing platform for a wide range of data collection and annotation

 

Authors: Simone Hantke, Björn Schuller

 

Project Site: www.ihearu-play.eu 

 

GitHub: www.github.com/iHEARu-PLAY/iHEARu-PLAY 

 

iHEARu-PLAY is a modular, intelligent, gamified crowdsourcing platform for large-scale, in-the-wild audio, image, and audio-visual data collection and annotation. The platform runs on any standard PC or smartphone and offers quality- and cost-effective audio, video, and image labelling for a diverse range of annotation tasks, using novel annotator-trustability-based machine learning algorithms to reduce the manual annotation workload.

 

Regarding copyright, the Active Learning code is published under the GNU General Public License. For iHEARu-PLAY itself, please contact the authors.

 

Please direct any questions or requests to Simone Hantke (simone.hantke at tum.de) or Björn Schuller (schuller at informatik.uni-augsburg.de).

 

If you use iHEARu-PLAY or any code from iHEARu-PLAY in your research work, you are kindly asked to acknowledge the use of iHEARu-PLAY in your publications.

 

Citing
S. Hantke, F. Eyben, T. Appel, and B. Schuller, “iHEARu-PLAY: Introducing a game for crowdsourced data collection for affective computing,” in Proc. 1st International Workshop on Automatic Sentiment Analysis in the Wild (WASA 2015) held in conjunction with the 6th biannual Conference on Affective Computing and Intelligent Interaction (ACII 2015), (Xi’an, P. R. China), pp. 891–897, AAAC, IEEE, September 2015.

S. Hantke, A. Abstreiter, N. Cummins, and B. Schuller, “Trustability-based Dynamic Active Learning for Crowdsourced Labelling of Emotional Audio Data,” IEEE Access, vol. 6, pp. 42142–42155, December 2018.

N-HANS

Neuro-Holistic Audio-eNhancement System

 

Authors: Shuo Liu, Gil Keren, Björn W. Schuller

 

Github: https://github.com/N-HANS/N-HANS

 

N-HANS is a Python toolkit for in-the-wild speech enhancement, covering speech, music, and general audio denoising, separation, and selective noise or source suppression. These functionalities are realised by two neural network models that share the same architecture but are trained separately. Each model consists of stacks of residual blocks, conditioned on additional speech or environmental noise recordings in order to adapt to different unseen speakers or environments in real life.

 

    pip3 install N-HANS
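
The following is a minimal PyTorch sketch of the conditioning idea described above, not the N-HANS implementation: a residual block whose activations are shifted by an embedding of an additional reference recording (for instance a noise context), so that a single network can adapt to unseen conditions. All layer sizes and the toy inputs are assumptions.

    # Minimal sketch of a conditioned residual block -- NOT N-HANS code.
    import torch
    import torch.nn as nn

    class ConditionedResBlock(nn.Module):
        def __init__(self, channels=64, cond_dim=128):
            super().__init__()
            self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
            self.cond = nn.Linear(cond_dim, channels)       # projects the reference embedding

        def forward(self, x, condition):
            # x: (batch, channels, frames); condition: (batch, cond_dim)
            shift = self.cond(condition).unsqueeze(-1)      # (batch, channels, 1)
            h = torch.relu(self.conv1(x) + shift)           # inject the reference information
            h = self.conv2(h)
            return x + h                                    # residual connection

    # toy usage: noisy features plus an embedding of a noise reference recording
    noisy = torch.randn(2, 64, 200)
    noise_embedding = torch.randn(2, 128)
    block = ConditionedResBlock()
    print(block(noisy, noise_embedding).shape)              # torch.Size([2, 64, 200])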

 

(c) 2020-2021 Shuo Liu, Gil Keren, Björn Schuller: University of Augsburg. Published under GPL v3.

 

Please direct any questions or requests to Shuo Liu (shuo.liu@informatik.uni-augsburg.de).

 

Citing

S. Liu, G. Keren, E. Parada-Cabaleiro, and B. Schuller, "N-HANS: A neural network-based toolkit for in-the-wild audio enhancement," Multimedia Tools and Applications, 2021, accepted, 27 pages.

openXBOW

The Passau Open-Source Crossmodal Bag-of-Words Toolkit

 

Authors: Maximilian Schmitt, Björn W. Schuller

 

GitHub: https://github.com/openXBOW/openXBOW

 

openXBOW generates a bag-of-words representation from a sequence of numeric and/or textual features, e.g., acoustic LLDs, visual features, and transcriptions of natural speech. The tool provides a multitude of options, e.g., different modes of vector quantisation, codebook generation, term-frequency weighting, and methods known from natural language processing. The GitHub repository includes a tutorial that helps you start working with openXBOW.
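
openXBOW itself is a Java tool; the Python sketch below only illustrates the underlying bag-of-words idea: frame-level LLDs are clustered into a codebook, each instance's frames are quantised against it, and the instance is summarised as a normalised term-frequency histogram. The dummy data, codebook size, and normalisation are assumptions made for the example.

    # Illustrative Python sketch of the crossmodal bag-of-words idea -- not openXBOW.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # dummy data: 3 instances, each a sequence of 100 frames of 10-dimensional acoustic LLDs
    instances = [rng.normal(size=(100, 10)) for _ in range(3)]

    # 1. codebook generation: cluster all frames into, say, 50 codewords
    codebook = KMeans(n_clusters=50, n_init=10, random_state=0)
    codebook.fit(np.vstack(instances))

    # 2. vector quantisation and term-frequency histogram per instance
    def bag_of_words(frames, codebook):
        assignments = codebook.predict(frames)
        counts = np.bincount(assignments, minlength=codebook.n_clusters)
        return counts / counts.sum()                 # normalised term frequencies

    bow_features = np.stack([bag_of_words(x, codebook) for x in instances])
    print(bow_features.shape)                        # (3, 50): fixed-length vectors per instance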

 

The development of this toolkit has been supported by the European Union's Horizon 2020 Programme under grant agreement No. 645094 (IA SEWA) and the European Community's Seventh Framework Programme through the ERC Starting Grant No. 338164 (iHEARu).

 

For more information, please visit the official websites: http://sewaproject.eu and http://ihearu.eu. (C) 2016-2017 Maximilian Schmitt, Björn Schuller: University of Passau. Published under GPL v3; please check the file LICENSE.txt for details. Contact: maximilian.schmitt@uni-passau.de

 

Citing

If you use openXBOW or any code from openXBOW in your research work, you are kindly asked to acknowledge the use of openXBOW in your publications.

 

http://www.jmlr.org/papers/v18/17-113.html

 

Maximilian Schmitt, Björn W. Schuller: openXBOW - Introducing the Passau Open-Source Crossmodal Bag-of-Words Toolkit, Journal of Machine Learning Research, vol. 18, pp. 1-5, 2017.
