Preferences in Database Systems - Challenges and Solutions
Habilitation, University of Augsburg.
Preferences are a fundamental natural concept and the handling of user wishes is becoming an increasingly important issue in real-life information systems. Prefer- ences are used for information filtering and extraction to reduce the very large volume of data presented to a user to a small set of highly interesting results. Preferences are also used to involve user profiles into the process of decision making in order to im- prove the query answers. In particular, the advent of the Big Data era has created an explosion in the available information and data. As the range of potential choices ex- pand, the time and effort required to search through them also increases. Preferences can be used to focus search queries, find appealing answers, and to order the search results. Hence, the importance of preference handling is evident for many areas of computer science, e.g., database systems or artificial intelligence.
The research literature on preferences is enormous. It covers preference queries, preference reasoning, preference mining, social choice theory, decision theory, as well as applications, e.g., recommender systems, and e-commerce (the list is by no means exhaustive). Preferences are of broad and current interest in database sys- tems and artificial intelligence, for which we have numerous international events. Although preferences have been investigated in different disciplines, their varying us- ages share a common purpose, namely, to identify interesting objects w.r.t. the user profile [Kac11].
The research on preferences in the context of database queries goes back a long time to Lacroix and Lavency [LL87]. The integration of preference queries in database systems enables satisfying search results by delivering best matches, even when no ob- ject in a dataset fulfills all preferences perfectly. Many approaches were proposed to express preferences in database systems. The approach of Kießling [Kie02] is based on modeling preferences as strict partial orders, formulating preferences in database queries by means of preference constructors. Chomicki [Cho03] represents prefer- ences by a logical formula, where the associated preference over the objects forms a partial order. Brafman and Domshlak [BD04] use a CP-net to represent the par- tial order and Ciaccia [Cia07] extend this approach to incomplete CP-nets. Given such preference representations, the result of a database query is the set of maximal objects, i.e., those objects which are not dominated w.r.t. the partial order and the associated dimensions. This is similar to the Skyline operator introduced by Börzsöny et al. [BKS01] and the Pareto dominance (named after the Italian economist Vilfredo Pareto), where objects are compared w.r.t. a preference on different dimensions. All approaches of preference handling in database systems are based on an operator, which selects the maximal objects given a preference query: Skyline (Börzsöny et al., [BKS01]), Winnow (Chomicki, [Cho03]), Best Matches Only (Kießling, [Kie02]), etc. However, the difference is only a matter of terminology. They are all based on the same principle: Given a preference order over objects, the set of maximal objects contains those which are not dominated by any other object.
Since the pioneering works of Lacroix, Lavency, Börzsöny, Kießling, and Chomicki, the quest for an effective representation and an efficient evaluation of preferences bothers many researchers all over the world. Preferences can be considered as a well researched area and to date already a variety of sophisticated techniques for preference handling are known. Nevertheless, there are still open issues concerning preference queries in database systems and in this habilitation project I consider some of these challenges and present solutions for them.
- Semi-Skylines and Skyline Snippets - Theory and Applications
Dissertation, University of Augsburg.
1st Examiner: Professor Dr. W. Kießling,
2nd Examiner: Professor Dr. B. Möller
Skyline evaluation techniques (also known as Pareto preference queries) follow a common paradigm that eliminates data elements by finding other elements in the data set that dominate them. To date already a variety of sophisticated Skyline evaluation techniques are known, hence Skylines are considered a well researched area.
This book presents Semi-Skylines as a novel concept for several challenging Skyline evaluation and optimization techniques. Among other things Semi-Skylines can be used effectively for algebraic optimization of constrained Skyline queries. Moreover, the problem of Skyline queries with very large result sets is considered. Using the concept of Skyline Snippets a subset of the complete Skyline can be computed without any pre-computed index structure.