Proceedings Eighth International Conference on Database Systems for Advanced Applications

Summary form only given, as follows. As database system research evolves, there are several enduring themes. One, of course, is how we deal with the largest possible amounts of data. A less obvious theme is optimization: it is an essential ingredient of all modern forms of database system. Because we deal with large volumes of data, we are often forced to process that data in regular ways. But wh...

The efficient processing of similarity joins is important for a large class of applications. The dimensionality of the data for these applications ranges from low to high. Most existing methods have focussed on the execution of high-dimensional joins over large amounts of disk-based data. The increasing sizes of main memory available on current computers, and the need for efficient processing of s...
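
For reference, the result an in-memory similarity join computes can be shown with a minimal ε-join sketch: all pairs whose Euclidean distance is at most ε, found with a uniform grid of cell side ε. This is a generic textbook technique, not the method of the paper, and the function name `epsilon_join` is hypothetical. Because each point probes 3^d neighbouring cells, the sketch only suits low-dimensional data, which is one reason high-dimensional joins need different techniques.

```python
import math
from collections import defaultdict
from itertools import product


def epsilon_join(r, s, eps):
    """Return sorted index pairs (i, j) with dist(r[i], s[j]) <= eps.

    Points of s are bucketed into grid cells of side eps. Two points within
    distance eps differ by at most eps in every coordinate, so their cell
    indices differ by at most 1 per dimension: probing the 3**d neighbouring
    cells of each r-point finds every match.
    """
    grid = defaultdict(list)
    for j, q in enumerate(s):
        grid[tuple(math.floor(c / eps) for c in q)].append(j)

    result = []
    for i, p in enumerate(r):
        cell = tuple(math.floor(c / eps) for c in p)
        for offset in product((-1, 0, 1), repeat=len(p)):
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            # .get() avoids inserting empty cells into the defaultdict.
            for j in grid.get(neighbour, ()):
                if math.dist(p, s[j]) <= eps:  # exact check on candidates
                    result.append((i, j))
    return sorted(result)
```

For example, `epsilon_join([(0, 0), (5, 5)], [(0.5, 0), (5, 5.9), (10, 10)], 1.0)` pairs each point of the first list with its close neighbour in the second.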

Modern database applications, including computer-aided design (CAD), medical imaging and molecular biology, impose new requirements on spatial query processing. Particular problems arise from the need for high resolutions for very large spatial objects, including cars, space stations, planes and industrial plants, and from the design goal of using general-purpose database management systems in order to...

Similarity search in database systems is becoming an increasingly important task in modern application domains such as multimedia, molecular biology, medical imaging and many others. Especially for CAD applications, suitable similarity models and a clear representation of the results can help to reduce the cost of developing and producing new parts by maximizing the reuse of existing parts. In thi...

Automating schema matching is challenging. Previous approaches to automating schema matching focus on computing direct element matches between two schemas. Schemas, however, rarely match directly. Thus, to complete the task of schema matching, we must also compute indirect element matches. In this paper we present a framework for generating direct as well as many indirect element matches between a ...

Integration of multiple heterogeneous data sources continues to be a critical problem for many application domains and a challenge for researchers world-wide. One aspect of integration is the translation of schema and data across data model boundaries. Researchers in the past have looked at both customized algorithmic approaches as well as generic meta-modeling approaches as viable solutions. We n...

Peer-to-peer (P2P) technology can be naturally integrated with mobile agent technology in Internet applications, taking advantage of the autonomy, mobility, and efficiency of mobile agents in accessing and processing data. We address the problem of protecting critical information in agent-based P2P Internet applications under two different scenarios. First, we assume the route of a mobile agent in...

Mining frequent patterns is a fundamental and important problem in many data mining applications. Many algorithms adopt the pattern growth approach, which has been shown to significantly outperform the candidate generate-and-test approach. We identify the key factors that influence the performance of the pattern growth approach, and optimize them to further improve the performance. Our algori...
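
The core of the pattern growth approach can be illustrated with a minimal sketch: frequent itemsets are grown by recursively projecting the database on a frequent item, so no candidate itemsets are generated and tested. This sketch assumes comparable item values and uses plain projected lists instead of an FP-tree; it does not include any of the optimizations the paper proposes, and `pattern_growth` is a hypothetical name.

```python
from collections import Counter


def pattern_growth(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Each recursive call works on a projected database: the transactions that
    contain the current prefix, restricted to items ordered after the last
    item added, so every itemset is enumerated exactly once.
    """
    result = {}

    def grow(prefix, db):
        # Count item occurrences in the (projected) database.
        counts = Counter(item for t in db for item in t)
        frequent = sorted(item for item, c in counts.items() if c >= min_support)
        for item in frequent:
            new_prefix = prefix | {item}
            result[frozenset(new_prefix)] = counts[item]
            # Project on `item`: keep transactions containing it, and only
            # the items that sort after it.
            projected = [[x for x in t if x > item] for t in db if item in t]
            grow(new_prefix, projected)

    # Deduplicate and sort each transaction once up front.
    grow(frozenset(), [sorted(set(t)) for t in transactions])
    return result
```

With `min_support=3` on the transactions `abc, ab, ac, bc, abc`, every single item and every pair is frequent, but `abc` (support 2) is not.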

Recently, a growing number of applications monitor the physical world by tracking sensor data and detecting values, trends or patterns of interest. We focus on the problem of detecting sequential patterns with complex predicates over sensor data, and present an algorithm that efficiently pre-computes, at query compile-time, which pattern predicates' checks can be skipped, so that the processing windo...

Transaction clustering has received attention in recent data mining research. Traditional clustering methods are poorly suited to this problem, because transaction data sets differ from traditional data sets in their high dimensionality, sparsity and numerous outliers. We introduce a new efficient algorithm for transaction clustering. The proposed algorithm is based on a caucus, which...

We propose a novel Web search scheme, TAX-PQ. TAX-PQ enables taxonomy-based, topic-focused Web search on ordinary Boolean Web search interfaces. For this purpose, TAX-PQ uses a taxonomy and the data set maintained in an existing taxonomy-based search facility. The search is initiated by designating an initial query and a context category in the taxonomy. The data set in the taxonomy-based search ...

We propose an edge capacity based on hub and authority scores, and examine the effects of using this edge capacity in the method proposed by G. Flake et al. (2000) for extracting Web communities with a maximum flow algorithm. A Web community is a collection of Web pages that address a common (or related) topic. In recent years, various methods for finding Web communities have been proposed. ...
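
Hub and authority scores are usually computed with the HITS power iteration: a page's authority is the sum of the hub scores of the pages linking to it, and a page's hub score is the sum of the authority scores of the pages it links to, with both vectors normalized after each step. The sketch below shows only that computation; the abstract does not give the capacity formula built from these scores, so none is assumed here. The function name `hits` and the fixed iteration count are illustrative choices.

```python
import math


def hits(edges, iterations=50):
    """HITS hub and authority scores for a directed link graph.

    edges: iterable of (source, target) hyperlinks.
    Returns (hub, authority) dicts, each normalized to unit L2 norm.
    """
    edges = list(edges)
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of v: sum of hub scores of pages that link to v.
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub of u: sum of authority scores of pages that u links to.
        hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth
```

On the links `a→c`, `b→c`, `a→d`, page `c` ends up with the highest authority score and page `a` with the highest hub score.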

To integrate many data sources, we use a peer mediator framework in which views defined in the peers are logically composed in terms of each other. A common approach to executing queries over mediators is to treat views in data sources as 'black boxes'. The mediators locally decompose queries into query fragments and submit them to the data sources for processing. Another approach, used in distributed DB...

We introduce a new type of KDD pattern called emerging substrings. In a sequence database, an emerging substring (ES) of a data class is a substring that occurs more frequently in that class than in other classes. ESs are important for sequence classification because they capture significant contrasts between data classes and provide insights for the construction of sequence classifiers. We pro...
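
The definition above can be made concrete with a brute-force sketch. It assumes that a substring's support in a class is the fraction of that class's sequences containing it, and that its growth rate is the target-class support divided by the support in the other classes. The thresholds, the `max_len` cap, and the function names are hypothetical. Enumerating all substrings is only practical for short sequences, and the sketch does not reflect how the paper mines ESs.

```python
def substrings(seq, max_len):
    """All distinct substrings of seq with length 1..max_len."""
    return {seq[i:j]
            for i in range(len(seq))
            for j in range(i + 1, min(i + max_len, len(seq)) + 1)}


def emerging_substrings(target, others, min_support, min_growth, max_len=5):
    """Return {substring: growth_rate} for emerging substrings of `target`.

    Support (assumed): fraction of sequences in a class containing the
    substring. Growth rate: support in target / support in others, or
    infinity when the substring never occurs in the other classes.
    """
    def support_table(seqs):
        counts = {}
        for s in seqs:
            for sub in substrings(s, max_len):  # each sequence counted once
                counts[sub] = counts.get(sub, 0) + 1
        return {sub: c / len(seqs) for sub, c in counts.items()}

    t_sup = support_table(target)
    o_sup = support_table(others)
    result = {}
    for sub, sup in t_sup.items():
        if sup < min_support:
            continue
        other = o_sup.get(sub, 0.0)
        growth = float("inf") if other == 0 else sup / other
        if growth >= min_growth:
            result[sub] = growth
    return result
```

For the target class `abc, abd, abe` against `xbc, xyz, bca`, the substring `ab` occurs only in the target class (infinite growth rate), while `b` is too common elsewhere to qualify at `min_growth=2`.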

With the rapid growth of available on-line information, text classification is becoming increasingly important. kNN is a widely used, high-performance text classification method. However, it is inefficient because it requires a large amount of computation to evaluate the similarity between a test document and each training document. In this paper, we propose a fast kNN text classifi...
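
The cost described above is easiest to see in the plain baseline, which computes one similarity per training document for every test document. The sketch below is that baseline, not the fast classifier the paper proposes. It assumes raw term-frequency vectors from whitespace tokenization, cosine similarity, and a similarity-weighted vote among the k nearest neighbours. All function names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector from whitespace tokenization (assumed)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(test_text, training, k=3):
    """Baseline kNN over (text, label) pairs.

    The test document is compared with *every* training document, which
    is the per-query cost the abstract identifies.
    """
    q = tf_vector(test_text)
    nearest = sorted(((cosine(q, tf_vector(text)), label)
                      for text, label in training), reverse=True)[:k]
    # Similarity-weighted vote among the k nearest neighbours.
    votes = Counter()
    for sim, label in nearest:
        votes[label] += sim
    return votes.most_common(1)[0][0]
```

A real system would precompute the training vectors. The per-query comparison with every training document would remain, and that is exactly the cost a faster classifier has to avoid.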

This paper describes an efficient approach to record linkage. Given two lists of records, the record-linkage problem consists of determining all pairs that are similar to each other, where the overall similarity between two records is defined in terms of domain-specific similarities over the individual attributes that constitute the record. The record-linkage problem arises naturally in the context of data c...
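
The problem statement above can be written down directly as a naive baseline that scores every pair of records. This sketch assumes that the overall similarity is a weighted average of per-attribute similarities. It uses `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in for a domain-specific attribute measure. The weights, threshold, and function names are hypothetical, and the quadratic all-pairs loop is precisely what an efficient approach would avoid.

```python
from difflib import SequenceMatcher


def attr_sim(a, b):
    """String similarity in [0, 1]; stand-in for a domain-specific measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1, r2, weights):
    """Overall similarity as a weighted average of attribute similarities
    (the combination rule is an assumption)."""
    total = sum(weights.values())
    return sum(w * attr_sim(r1[f], r2[f]) for f, w in weights.items()) / total


def link(list_a, list_b, weights, threshold):
    """Naive record linkage: score every pair, keep those above threshold."""
    return [(i, j)
            for i, ra in enumerate(list_a)
            for j, rb in enumerate(list_b)
            if record_similarity(ra, rb, weights) >= threshold]
```

With `weights={"name": 2, "city": 1}`, the records ("John Smith", "Boston") and ("Jon Smith", "Boston") score about 0.96 and are linked at a threshold of 0.8.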

This paper introduces an efficient method for the maintenance of wavelet-based histograms built on partial sums. Wavelet-based histograms can be constructed from either raw data distributions or partial sums. The two construction methods each have their own merits. Previous work has focused only on the maintenance of raw-data-based histograms. However, it is highly inefficient to directly apply their ...
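
To show what a histogram "built on partial sums" stores, here is a minimal construction sketch. It covers only construction and answering a range count, not the maintenance method the paper contributes. The sketch takes the prefix sums of a frequency vector, applies an unnormalized Haar decomposition, and keeps the `budget` largest-magnitude coefficients. A range count then becomes a difference of two reconstructed prefix sums, P[hi] − P[lo−1]. Choosing coefficients by unnormalized magnitude and requiring a power-of-two domain are simplifying assumptions, and all names are hypothetical.

```python
from itertools import accumulate


def haar(values):
    """Unnormalized Haar decomposition; len(values) must be a power of two.
    Returns [overall average, coarsest detail, ..., finest details]."""
    coeffs, level = [], list(values)
    while len(level) > 1:
        avgs = [(level[i] + level[i + 1]) / 2 for i in range(0, len(level), 2)]
        dets = [(level[i] - level[i + 1]) / 2 for i in range(0, len(level), 2)]
        coeffs = dets + coeffs  # finer details go after coarser ones
        level = avgs
    return level + coeffs


def inverse_haar(coeffs):
    level, pos = coeffs[:1], 1
    while pos < len(coeffs):
        dets = coeffs[pos:pos + len(level)]
        level = [x for a, d in zip(level, dets) for x in (a + d, a - d)]
        pos += len(dets)
    return level


def partial_sum_histogram(freqs, budget):
    """Keep the `budget` largest-magnitude Haar coefficients of the prefix sums."""
    coeffs = haar(list(accumulate(freqs)))
    keep = set(sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]),
                      reverse=True)[:budget])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def range_count(hist, lo, hi):
    """Estimated number of tuples in buckets lo..hi (inclusive, 0-based)."""
    p = inverse_haar(hist)  # approximate prefix sums
    return p[hi] - (p[lo - 1] if lo > 0 else 0.0)
```

With a budget equal to the domain size, the histogram is exact: for frequencies `[2, 0, 3, 1, 4, 0, 0, 2]`, `range_count(h, 2, 4)` returns 8.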

Selectivity estimation is an integral part of query optimization. In this paper, we propose a novel approach to approximate data density functions of relations and use them to estimate selectivities. A data density function here is approximated by a partial sum of an orthogonal series. Such approximate density functions can be derived easily, stored efficiently, and maintained dynamically. Experim...
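
As one concrete instance of approximating a density by a partial sum of an orthogonal series, the sketch below uses a cosine basis on [0, 1]. The abstract does not say which series the paper uses, so this basis is an assumption. With φ_k(x) = √2 cos(kπx), the density is approximated as f(x) ≈ 1 + Σ_{k=1}^{K} a_k φ_k(x), where a_k is the sample mean of φ_k. The selectivity of lo ≤ x ≤ hi is the integral of that sum: (hi − lo) + Σ_k a_k √2 (sin(kπ·hi) − sin(kπ·lo)) / (kπ). Because each a_k is a running mean, an insert is an O(K) update. Values are assumed to be pre-scaled to [0, 1], and the class name is hypothetical.

```python
import math


class CosineSeriesDensity:
    """Density on [0, 1] approximated by a truncated cosine series.

    f(x) ~ 1 + sum_k a_k * phi_k(x), phi_k(x) = sqrt(2) * cos(k*pi*x),
    where a_k is the sample mean of phi_k over the inserted values.
    """

    def __init__(self, num_terms):
        self.K = num_terms
        self.n = 0
        self.sums = [0.0] * (num_terms + 1)  # sums[k] = sum of phi_k(x_i)

    def insert(self, x):
        # Coefficients are running means, so maintenance is O(K) per insert.
        self.n += 1
        for k in range(1, self.K + 1):
            self.sums[k] += math.sqrt(2) * math.cos(k * math.pi * x)

    def selectivity(self, lo, hi):
        """Estimated fraction of values in [lo, hi]: integral of f over it."""
        if self.n == 0:
            return 0.0
        est = hi - lo  # integral of the constant term
        for k in range(1, self.K + 1):
            a_k = self.sums[k] / self.n
            est += a_k * math.sqrt(2) * (math.sin(k * math.pi * hi)
                                         - math.sin(k * math.pi * lo)) / (k * math.pi)
        return est
```

For data concentrated in [0, 0.2], ten terms already put most of the estimated mass (about 0.95) in that range, compared with 0.2 under a uniformity assumption.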

Histogram techniques are widely used in commercial database management systems to estimate query results. Recently, they have also been used for approximate processing of database queries, especially aggregation queries. Existing research in this area has mainly focused on constructing a histogram that approximately represents, as accurately as possible on an intuitive basis, the o...
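
The following sketch shows how a histogram can answer an aggregation query approximately. It is a generic equi-width histogram that stores a count and a sum per bucket and assumes values are spread uniformly within each bucket, so a partially covered bucket contributes in proportion to its overlap with the query range. This is a textbook baseline rather than the construction the paper proposes, and the class name and parameters are hypothetical. Values are treated as continuous, so query bounds are interpreted as a half-open interval.

```python
class EquiWidthHistogram:
    """Equi-width histogram keeping COUNT and SUM per bucket."""

    def __init__(self, values, lo, hi, num_buckets):
        self.lo = lo
        self.width = (hi - lo) / num_buckets
        self.counts = [0] * num_buckets
        self.sums = [0.0] * num_buckets
        for v in values:
            b = min(int((v - lo) / self.width), num_buckets - 1)
            self.counts[b] += 1
            self.sums[b] += v

    def estimate(self, q_lo, q_hi):
        """Approximate (COUNT, SUM) over [q_lo, q_hi), assuming values are
        spread uniformly within each bucket."""
        count = total = 0.0
        for i, (c, s) in enumerate(zip(self.counts, self.sums)):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            overlap = max(0.0, min(q_hi, b_hi) - max(q_lo, b_lo))
            frac = overlap / self.width  # share of the bucket covered
            count += frac * c
            total += frac * s
        return count, total
```

For the integers 0 to 99 in ten buckets, the range [5, 15) yields an estimated COUNT of 10 and SUM of 95.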

Moving object environments contain large numbers of queries and continuously moving objects. Traditional spatial index structures do not work well in this environment because the index must be updated frequently, which results in very poor performance. In this paper, we present a novel indexing structure, the Q+Rtree, based on the observation that i) most moving objects are in quasi-s...

Recently, more research has been conducted on moving object databases (MOD). Typically, there are three kinds of data for dynamic attributes in MOD: historical, current and future. Although many index structures have been developed for the first two types of data, little work has dealt with future data. In particular, the problem of index update has not been addressed with effi...

Modern computer applications, from business decision support to scientific data analysis, use data visualization tools to support exploratory activities. Visual exploration tools typically do not scale well when applied to huge data sets, partly because interactivity requires real-time responses. However, we observe that interactive visual explorations exhibit several properties tha...

With the wide availability of content delivery networks, many e-commerce Web applications use edge cache servers to cache and deliver dynamic content at locations much closer to users, avoiding network latency. By caching a large number of dynamic content pages in the edge cache servers, response time can be reduced, benefiting from higher cache hit rates. However, this is achieved at the expe...

In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on samples. However, uniformly extracted samples often do not guarantee acceptable accuracy in grouping interval estimations. This is crucial in most less-aggregated analyses, which are mostly based on recent data (e.g. forecasting, performance analysis). We prop...
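
For contrast with the problem described above, here is a sketch of the uniform-sampling baseline for grouped SUM queries. It is not the technique the paper proposes. Each row is kept with probability p, and each group's sampled sum is scaled by 1/p (the Horvitz-Thompson estimator). The variance of that estimate is estimated as ((1 − p)/p²) Σ v² over the sampled rows, which gives a normal-approximation confidence half-width. Small groups receive few or no sample rows, so their intervals are wide or missing, which illustrates the accuracy problem for less-aggregated queries. Function and parameter names are hypothetical.

```python
import math
import random
from collections import defaultdict


def approx_group_sums(rows, fraction, seed=0, z=1.96):
    """Estimate SUM(value) GROUP BY key from a uniform Bernoulli sample.

    rows: list of (key, value). Each row is sampled with probability
    `fraction`. Returns {key: (estimate, half_width, sample_size)}; groups
    with no sampled rows are missing from the result entirely.
    """
    rng = random.Random(seed)
    sample = defaultdict(list)
    for key, value in rows:
        if rng.random() < fraction:
            sample[key].append(value)

    result = {}
    for key, vals in sample.items():
        est = sum(vals) / fraction  # Horvitz-Thompson scale-up
        # Unbiased variance estimate under Bernoulli sampling.
        var = (1 - fraction) / fraction ** 2 * sum(v * v for v in vals)
        result[key] = (est, z * math.sqrt(var), len(vals))
    return result
```

With a 10% sample, a group of 10,000 rows gets roughly 1,000 sampled rows and a tight relative interval. A group of 20 rows gets about two sampled rows, and in roughly one run in eight it gets none at all.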