A Python toolkit for unsupervised feature learning with deep neural networks (DNNs)
Developers: Shahin Amiriparian, Michael Freitag, Sergey Pugachevskiy, Björn W. Schuller
auDeep is a Python toolkit for unsupervised feature learning with deep neural networks (DNNs). Currently, the main focus of this project is feature extraction from audio data with deep recurrent autoencoders. However, the core feature learning algorithms are not limited to audio data. Furthermore, we plan on implementing additional DNN-based feature learning approaches.
(c) 2017 Michael Freitag, Shahin Amiriparian, Sergey Pugachevskiy, Nicholas Cummins, Björn Schuller: Universität Passau. Published under GPLv3, see the LICENSE.md file for details.
Please direct any questions or requests to Shahin Amiriparian (shahin.amiriparian at tum.de) or Michael Freitag (freitagm at fim.uni-passau.de).
If you use auDeep or any code from auDeep in your research work, you are kindly asked to acknowledge the use of auDeep in your publications.
M. Freitag, S. Amiriparian, S. Pugachevskiy, N. Cummins, and B. Schuller. auDeep: Unsupervised Learning of Representations from Audio with Deep Recurrent Neural Networks, Journal of Machine Learning Research, 2017, submitted, 5 pages.
S. Amiriparian, M. Freitag, N. Cummins, and B. Schuller. Sequence to sequence autoencoders for unsupervised representation learning from audio, Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop, pp. 17-21, 2017.
A Python toolkit for feature extraction from audio data with pre-trained Image Convolutional Neural Networks (CNNs)
Developers: Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Björn W. Schuller
DeepSpectrum is a Python toolkit for feature extraction from audio data with pre-trained Image Convolutional Neural Networks (CNNs). It features an extraction pipeline which first creates visual representations for audio data - plots of spectrograms or chromagrams - and then feeds them to a pre-trained Image CNN. Activations of a specific layer then form the final feature vectors.
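The extraction pipeline described above can be illustrated with a minimal, standard-library-only sketch. It is not DeepSpectrum's actual API: the function names, the naive DFT, the grayscale mapping (the toolkit itself produces spectrogram or chromagram plots), and the `cnn_layer` callable standing in for the forward pass of a pre-trained image CNN are all assumptions made for illustration.

```python
import cmath
import math

def magnitude_spectrogram(samples, frame_len=64, hop=32):
    """Naive STFT: Hann-windowed frames, DFT magnitudes for non-negative frequencies."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    spec = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spec.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ])
    return spec  # indexed as spec[time][frequency]

def to_grayscale_image(spec, floor_db=-80.0):
    """Map magnitudes to 8-bit pixels on a dB scale relative to the global peak."""
    peak = max(max(row) for row in spec) or 1.0  # avoid division by zero on silence
    image = []
    for row in spec:
        pixels = []
        for mag in row:
            db = max(20 * math.log10(max(mag / peak, 1e-10)), floor_db)
            pixels.append(round(255 * (1 - db / floor_db)))
        image.append(pixels)
    return image

def extract_features(samples, cnn_layer):
    """Audio -> spectrogram image -> activations of one (hypothetical) CNN layer."""
    return cnn_layer(to_grayscale_image(magnitude_spectrogram(samples)))

# Example input: a 1 kHz tone sampled at 8 kHz (falls exactly on DFT bin 8).
samples = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(256)]
image = to_grayscale_image(magnitude_spectrogram(samples))

# Stand-in for a pre-trained CNN layer: it only flattens and rescales the pixels.
features = extract_features(samples, lambda img: [p / 255 for row in img for p in row])
```

In a real setup, the stand-in lambda would be replaced by a pre-trained network truncated at the layer whose activations should serve as the feature vector.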
(c) 2017-2018 Shahin Amiriparian, Maurice Gerczuk, Sandra Ottl, Björn Schuller: Universität Augsburg. Published under GPLv3, see the LICENSE.md file for details.
Please direct any questions or requests to Shahin Amiriparian (shahin.amiriparian at tum.de) or Maurice Gerczuk (gerczuk at fim.uni-passau.de).
If you use DeepSpectrum or any code from DeepSpectrum in your research work, you are kindly asked to acknowledge the use of DeepSpectrum in your publications.
S. Amiriparian, M. Gerczuk, S. Ottl, N. Cummins, M. Freitag, S. Pugachevskiy, A. Baird and B. Schuller. Snore Sound Classification using Image-Based Deep Spectrum Features. In Proceedings of INTERSPEECH (Vol. 17, pp. 2017-434).
An intelligent gamified crowdsourcing platform for a wide range of data collection and annotation
Authors: Simone Hantke, Björn Schuller
Project Site: www.ihearu-play.eu
iHEARu-PLAY is a modular intelligent gamified crowdsourcing platform for large-scale, in-the-wild audio, image, and audio-visual data collection and annotation. The platform runs on any standard PC or smartphone application and offers quality-effective and cost-effective audio, video and image labelling for a diverse range of annotation tasks, taking into account novel annotator trustability-based machine learning algorithms to reduce the manual annotation workload.
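As a rough illustration of how annotator trustability could reduce manual annotation effort, the sketch below combines labels by trust-weighted voting and requests further annotations only while the weighted agreement stays below a threshold. This is a generic example, not the algorithm used by iHEARu-PLAY: the function names, the neutral default score of 0.5, and the threshold of 0.75 are assumptions.

```python
from collections import defaultdict

def aggregate_labels(annotations, trust):
    """Trust-weighted vote.

    annotations: list of (annotator_id, label) pairs for one item.
    trust: dict mapping annotator_id to a trust score in [0, 1] (hypothetical).
    Returns (winning_label, share_of_total_trust_weight).
    """
    if not annotations:
        return None, 0.0
    weights = defaultdict(float)
    for annotator, label in annotations:
        # Unknown annotators get a neutral score (an assumption of this sketch).
        weights[label] += trust.get(annotator, 0.5)
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    confidence = weights[best] / total if total else 0.0
    return best, confidence

def needs_more_annotations(annotations, trust, threshold=0.75):
    """Ask for another annotation only while the weighted agreement is low."""
    _, confidence = aggregate_labels(annotations, trust)
    return confidence < threshold

trust = {"a": 0.9, "b": 0.4, "c": 0.8}
annotations = [("a", "happy"), ("b", "sad"), ("c", "happy")]
label, confidence = aggregate_labels(annotations, trust)
ask_again = needs_more_annotations(annotations, trust)
```

Here the two trusted annotators outweigh the dissenting one, so the item would not be sent out for further labelling.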
Regarding copyright, the Active Learning Code is under GNU GENERAL PUBLIC LICENSE. For iHEARu-PLAY please contact the authors.
Please direct any questions or requests to Simone Hantke (simone.hantke at tum.de) or Björn Schuller (schuller at informatik.uni-augsburg.de).
If you use iHEARu-PLAY or any code from iHEARu-PLAY in your research work, you are kindly asked to acknowledge the use of iHEARu-PLAY in your publications.
S. Hantke, F. Eyben, T. Appel, and B. Schuller, “iHEARu-PLAY: Introducing a game for crowdsourced data collection for affective computing,” in Proc. 1st International Workshop on Automatic Sentiment Analysis in the Wild (WASA 2015) held in conjunction with the 6th biannual Conference on Affective Computing and Intelligent Interaction (ACII 2015), (Xi’an, P. R. China), pp. 891–897, AAAC, IEEE, September 2015.
S. Hantke, A. Abstreiter, N. Cummins, and B. Schuller, “Trustability-based Dynamic Active Learning for Crowdsourced Labelling of Emotional Audio Data,” IEEE Access, vol. 6, pp. 42142–42155, December 2018.
The Passau Open-Source Crossmodal Bag-of-Words Toolkit
Authors: Maximilian Schmitt, Björn W. Schuller
openXBOW generates a bag-of-words representation from a sequence of numeric and/or textual features, e.g., acoustic LLDs, visual features, and transcriptions of natural speech. The tool provides a multitude of options, e.g., different modes of vector quantisation, codebook generation, term frequency weighting and methods known from natural language processing. In the GitHub repository, you will find a tutorial that helps you get started with openXBOW.
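The core idea for numeric features can be sketched in a few lines of Python: each feature frame is assigned to its nearest codeword (vector quantisation), the assignments are counted into a histogram, and the counts are optionally log-weighted. This is a minimal sketch and not openXBOW's interface; the function names, the fixed toy codebook, and the log(1 + count) weighting are assumptions chosen for illustration.

```python
import math
from collections import Counter

def nearest_codeword(frame, codebook):
    """Index of the codeword closest to `frame` (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))

def bag_of_words(frames, codebook, log_weighting=True):
    """Histogram of codeword assignments, optionally with log term-frequency weighting."""
    counts = Counter(nearest_codeword(frame, codebook) for frame in frames)
    histogram = [counts.get(i, 0) for i in range(len(codebook))]
    if log_weighting:
        histogram = [math.log(1 + c) for c in histogram]
    return histogram

# Toy data: three 2-dimensional codewords and four feature frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [1.1, 0.8], [4.7, 5.3]]

raw_counts = bag_of_words(frames, codebook, log_weighting=False)
weighted = bag_of_words(frames, codebook)
```

In practice, the codebook would be learned from training data (for example by random sampling or clustering) rather than fixed by hand.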
The development of this toolkit has been supported by the European Union's Horizon 2020 Programme under grant agreement No. 645094 (IA SEWA) and the European Community's Seventh Framework Programme through the ERC Starting Grant No. 338164 (iHEARu).
For more information, please visit the official websites: http://sewaproject.eu and http://ihearu.eu
(c) 2016-2017 Maximilian Schmitt, Björn Schuller: University of Passau. Published under GPL v3, see the LICENSE.txt file for details.
Contact: email@example.com
If you use openXBOW or any code from openXBOW in your research work, you are kindly asked to acknowledge the use of openXBOW in your publications.
Maximilian Schmitt, Björn W. Schuller: openXBOW - Introducing the Passau Open-Source Crossmodal Bag-of-Words Toolkit, Journal of Machine Learning Research, vol. 18, pp. 1-5, 2017.
Neuro-Holistic Audio-eNhancement System
Authors: Shuo Liu, Gil Keren, Björn W. Schuller
N-HANS is a Python toolkit for in-the-wild speech enhancement, including speech, music, and general audio denoising, separation, and selective noise or source suppression. These functionalities are realised by two neural network models that share the same architecture but are trained separately. The models consist of stacks of residual blocks, each conditioned on additional speech or environmental noise recordings so that the models can adapt to unseen speakers or environments in real life.
pip3 install N-HANS
(c) 2020-2021 Shuo Liu, Gil Keren, Björn Schuller: University of Augsburg. Published under GPL v3.
Please direct any questions or requests to Shuo Liu (firstname.lastname@example.org).
S. Liu, G. Keren, E. Parada-Cabaleiro, B. Schuller, "N-HANS: A neural network-based toolkit for in-the-wild audio enhancement," Multimedia Tools and Applications, 2021, accepted, 27 pages.