Records of academic references


Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition

Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Julien Epps, and Björn W. Schuller, Fellow, IEEE
Abstract—Despite the emerging importance of Speech Emotion Recognition (SER), the state-of-the-art accuracy is quite low and
needs improvement to make commercial applications of SER viable. A key underlying reason for the low accuracy is the scarcity of
emotion datasets, which is a challenge for developing any robust machine learning model in general. In this paper, we propose a
solution to this problem: a multi-task learning framework that uses auxiliary tasks for which data is abundantly available. We show that
utilisation of this additional data can improve the primary task of SER for which only limited labelled data is available. In particular, we
use gender identification and speaker recognition as auxiliary tasks, which allow the use of very large datasets, e.g., speaker
classification datasets. To maximise the benefit of multi-task learning, we further use an adversarial autoencoder (AAE) within our
framework, which has a strong capability to learn powerful and discriminative features. Furthermore, the unsupervised AAE in
combination with the supervised classification networks enables semi-supervised learning which incorporates a discriminative
component in the AAE unsupervised training pipeline. This semi-supervised learning essentially helps to improve generalisation of our
framework and thus leads to improvements in SER performance. The proposed model is rigorously evaluated for categorical and
dimensional emotion, and cross-corpus scenarios. Experimental results demonstrate that the proposed model achieves state-of-the-art
performance on two publicly available datasets.
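The multi-task idea described in this abstract can be sketched as a shared encoder feeding one classification head per task. The following is a minimal illustrative forward pass, assuming placeholder layer sizes, a single-linear-layer encoder, and made-up task dimensions; it is not the paper's actual AAE architecture.

```python
import numpy as np

# Illustrative multi-task forward pass: one shared encoder feeds three
# task heads (emotion is the primary task; gender and speaker are the
# auxiliary tasks named in the abstract). All layer sizes and the
# single-layer architecture are assumptions for illustration only.

rng = np.random.default_rng(0)

FEAT_DIM, LATENT_DIM = 40, 16            # e.g. 40 acoustic features per input
N_EMOTIONS, N_GENDERS, N_SPEAKERS = 4, 2, 10

W_shared = rng.standard_normal((FEAT_DIM, LATENT_DIM)) * 0.1
W_emotion = rng.standard_normal((LATENT_DIM, N_EMOTIONS)) * 0.1
W_gender = rng.standard_normal((LATENT_DIM, N_GENDERS)) * 0.1
W_speaker = rng.standard_normal((LATENT_DIM, N_SPEAKERS)) * 0.1

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """Shared latent representation, then one softmax head per task."""
    h = np.tanh(x @ W_shared)            # shared code used by all tasks
    return (softmax(h @ W_emotion),
            softmax(h @ W_gender),
            softmax(h @ W_speaker))

batch = rng.standard_normal((8, FEAT_DIM))
p_emo, p_gen, p_spk = forward(batch)
print(p_emo.shape, p_gen.shape, p_spk.shape)  # (8, 4) (8, 2) (8, 10)
```

Because the auxiliary heads share the encoder, gradients from abundantly labelled gender and speaker data can shape the representation that the data-scarce emotion head consumes, which is the core of the multi-task argument above.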



AVA: A Large-Scale Database for Aesthetic Visual Analysis
Naila Murray
Computer Vision Center
Universitat Autònoma de Barcelona, Spain
nmurray@cvc.uab.es
Luca Marchesotti, Florent Perronnin
Xerox Research Centre Europe
Meylan, France
firstname.lastname@xrce.xerox.com
Abstract
With the ever-expanding volume of visual content available,
the ability to organize and navigate such content by
aesthetic preference is becoming increasingly important.
While still in its nascent stage, research into computational
models of aesthetic preference already shows great potential.
However, to advance research, realistic, diverse and
challenging databases are needed. To this end, we introduce
a new large-scale database for conducting Aesthetic
Visual Analysis: AVA. It contains over 250,000 images
along with a rich variety of meta-data including a
large number of aesthetic scores for each image, semantic
labels for over 60 categories as well as labels related to
photographic style. We show the advantages of AVA with respect
to existing databases in terms of scale, diversity, and
heterogeneity of annotations. We then describe several key
insights into aesthetic preference afforded by AVA. Finally,
we demonstrate, through three applications, how the large
scale of AVA can be leveraged to improve performance on
existing preference tasks.


Content-Based Photo Quality Assessment
Xiaoou Tang, Fellow, IEEE, Wei Luo, and Xiaogang Wang, Member, IEEE
Abstract—Automatically assessing photo quality from the perspective
of visual aesthetics is of great interest in high-level vision
research and has drawn much attention in recent years. In this
paper, we propose content-based photo quality assessment using
both regional and global features. Under this framework, subject
areas, which draw the most attention of human eyes, are first
extracted. Then regional features extracted from both subject
areas and background regions are combined with global features
to assess photo quality. Since professional photographers adopt
different photographic techniques and have different aesthetic criteria
in mind when taking different types of photos (e.g., landscape
versus portrait), we propose to segment subject areas and extract
visual features in different ways according to the variety of photo
content. We divide the photos into seven categories based on their
visual content and develop a set of new subject area extraction
methods and new visual features specially designed for different
categories. The effectiveness of this framework is supported by
extensive experimental comparisons of existing photo quality
assessment approaches as well as our new features on different
categories of photos. In addition, we propose an approach for online
training of an adaptive classifier to combine the proposed features
according to the visual content of a test photo without knowing
its category. Another contribution of this work is to construct
a large and diversified benchmark dataset for the research of
photo quality assessment. It includes 17,673 photos with manually
labeled ground truth.
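The regional-plus-global pipeline described in this abstract can be illustrated with a toy feature extractor: global statistics from the whole photo concatenated with statistics from an extracted subject area. The specific features (mean/std intensity) and the fixed subject box below are placeholder assumptions, not the paper's actual extraction methods.

```python
import numpy as np

# Toy sketch of combining global features with regional features from a
# subject area and its background, as the abstract describes. Real
# systems would use learned subject extraction and richer descriptors.

def global_features(img):
    """Whole-image statistics (illustrative)."""
    return np.array([img.mean(), img.std()])

def regional_features(img, box):
    """Statistics from the subject area and the background around it."""
    r0, r1, c0, c1 = box
    subject = img[r0:r1, c0:c1]
    mask = np.ones(img.shape, dtype=bool)
    mask[r0:r1, c0:c1] = False
    background = img[mask]
    # Subject/background contrast is one cue professional photos exploit
    # (e.g. a sharp subject against a plain or blurred background).
    return np.array([subject.mean(), subject.std(),
                     background.mean(), background.std()])

rng = np.random.default_rng(1)
photo = rng.random((64, 64))              # stand-in grayscale photo
subject_box = (16, 48, 16, 48)            # hypothetical subject area

feats = np.concatenate([global_features(photo),
                        regional_features(photo, subject_box)])
print(feats.shape)  # (6,)
```

Per the abstract, both the subject-area extraction and the feature set would then be specialised per content category (e.g. landscape vs. portrait) before training a quality classifier on the combined vector.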


Discovering Beautiful Attributes for Aesthetic Image Analysis
Luca Marchesotti · Naila Murray · Florent Perronnin
Received: 6 May 2014 / Accepted: 17 November 2014 / Published online: 9 December 2014
© Springer Science+Business Media New York 2014
Abstract Aesthetic image analysis is the study and assessment
of the aesthetic properties of images. Current computational
approaches to aesthetic image analysis provide either
accurate or interpretable results, but not both. To obtain both accuracy
and interpretability by humans, we advocate the use
of learned and nameable visual attributes as mid-level features.
For this purpose, we propose to discover and learn the
visual appearance of attributes automatically, using a recently
introduced database, called AVA, which contains more than
250,000 images together with their aesthetic scores and textual
comments given by photography enthusiasts. We provide
a detailed analysis of these annotations as well as the context
in which they were given. We then describe how these three
key components of AVA—images, scores, and comments—
can be effectively leveraged to learn visual attributes. Lastly,
we show that these learned attributes can be successfully used
in three applications: aesthetic quality prediction, image tagging
and retrieval.
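The attribute-discovery idea in this abstract can be caricatured as mining words that co-occur with high aesthetic scores in user comments. The tiny comment set, the score threshold, and the raw-frequency contrast below are all illustrative assumptions; the paper's actual discovery procedure is more involved.

```python
import re
from collections import Counter

# Toy sketch: contrast word frequencies in comments attached to
# high-scored vs. low-scored images to surface candidate nameable
# aesthetic attributes. Data and threshold are invented for illustration.

comments = [
    ("lovely composition and great lighting", 8.2),
    ("beautiful colors, very sharp", 7.9),
    ("blurry and poorly composed", 3.1),
    ("dull lighting, noisy image", 2.8),
]

def words(text):
    return re.findall(r"[a-z]+", text.lower())

high = Counter(w for text, s in comments if s >= 5 for w in words(text))
low = Counter(w for text, s in comments if s < 5 for w in words(text))

# Candidate attributes: words more frequent in high-scored comments.
candidates = [w for w, n in high.items() if n > low.get(w, 0)]
print(sorted(candidates))
```

Learned attribute names of this kind are what give the mid-level features their interpretability: a prediction can be explained in terms of "sharp" or "dull lighting" rather than opaque feature dimensions.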


Aesthetic Visual Quality Assessment of Paintings
Congcong Li, Student Member, IEEE, and Tsuhan Chen, Fellow, IEEE
Abstract—This paper aims to evaluate the aesthetic visual
quality of a special type of visual media: digital images of paintings.
Assessing the aesthetic visual quality of paintings can be
considered a highly subjective task. However, to some extent,
certain paintings are believed, by consensus, to have higher aesthetic
quality than others. In this paper, we treat this challenge
as a machine learning problem, in order to evaluate the aesthetic
quality of paintings based on their visual content. We design a
group of methods to extract features to represent both the global
characteristics and local characteristics of a painting. Inspiration
for these features comes from our prior knowledge in art and a
questionnaire survey we conducted to study factors that affect
human judgments. We collect painting images and ask human
subjects to score them. These paintings are then used for both
training and testing in our experiments. Experimental results show
that the proposed work can classify high-quality and low-quality
paintings with performance comparable to humans. This work
provides a machine learning scheme for exploring
the relationship between human aesthetic perceptions and the
computational visual features extracted from paintings.