Academic Commons Search Results
http://academiccommons.columbia.edu/catalog.rss?f%5Bauthor_facet%5D%5B%5D=Chaudhuri%2C+Surajit&f%5Bdepartment_facet%5D%5B%5D=Computer+Science&q=&rows=500&sort=record_creation_date+desc

Performance of Multiattribute Top-K Queries on Relational Systems
http://academiccommons.columbia.edu/catalog/ac:110381
Bruno, Nicolas; Chaudhuri, Surajit; Gravano, Luis
http://hdl.handle.net/10022/AC:P:29406
Fri, 22 Apr 2011 00:00:00 +0000
In many applications, users specify target values for the attributes of a relation, and expect in return the k tuples that best match these values. Traditional RDBMSs do not process these "top-k queries" efficiently. In our previous work, we outlined a family of strategies to map a top-k query into a traditional selection query that a RDBMS can process efficiently. The goal of such mapping strategies is to get all needed tuples (but minimize the number of retrieved tuples) and thus avoid "restarts" to get additional tuples. Unfortunately, no single mapping strategy performed consistently the best under all data distributions. In this paper, we develop a novel mapping technique that leverages information about the data distribution and adapts itself to the local characteristics of the data and the histograms available to do the mapping. We also report the first experimental evaluation of the new and old mapping strategies over a real RDBMS, namely over Microsoft's SQL Server 7.0. The experiments show that our new techniques are robust and significantly more efficient than previously known strategies requiring at least one sequential scan of the data sets.
Computer science; lg233; Computer Science; Technical reports

Optimizing Top-K Selection Queries over Multimedia Repositories
http://academiccommons.columbia.edu/catalog/ac:110001
Chaudhuri, Surajit; Gravano, Luis; Marian, Amelie
http://hdl.handle.net/10022/AC:P:29289
Thu, 21 Apr 2011 00:00:00 +0000
Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically request not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, which indicates how well the object matches the selection condition (ranking). Furthermore, unlike in the relational model, users may just want the k top-ranked objects for their selection queries, for a relatively small k. In addition to the differences in the query model, another peculiarity of multimedia repositories is that they may allow access to the attributes of each object only through indexes. In this paper, we investigate how to optimize the processing of top-k selection queries over multimedia repositories. The access characteristics of the repositories and the above query model lead to novel issues in query optimization. In particular, the choice of the indexes used to search the repository strongly influences the cost of processing the filtering condition. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we present an efficient algorithm that solves the problem optimally when the predicates in the query are independent. We also show that the problem of optimizing top-k selection queries can be viewed, in many cases, as that of evaluating more traditional selection conditions. Thus, both problems can be viewed together as an extended filtering problem to which techniques of query processing and optimization may be adapted.
Computer science; lg233; Computer Science; Technical reports
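
Both abstracts above rest on the idea that a top-k query can be answered through an ordinary selection (filter) condition, with a "restart" when the filter returns too few results. The Python sketch below illustrates that idea only; it is not the algorithm of either report. The function name top_k_via_filter, the parameters threshold and relax, and the assumption that grades lie in [0, 1] are all hypothetical choices made for this example.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def top_k_via_filter(
    objects: Iterable[T],
    grade: Callable[[T], float],
    k: int,
    threshold: float = 0.9,  # hypothetical initial cutoff, grades assumed in [0, 1]
    relax: float = 0.5,      # hypothetical factor used to loosen the cutoff
) -> List[Tuple[float, T]]:
    """Answer a top-k query using only 'grade >= threshold' filters."""
    items = list(objects)
    if k <= 0:
        return []
    while True:
        # Traditional selection condition: keep objects whose grade passes the cutoff.
        matches = [(grade(o), o) for o in items if grade(o) >= threshold]
        if len(matches) >= k or threshold <= 0.0:
            # Enough results (or nothing left to relax): rank and cut to k.
            matches.sort(key=lambda pair: pair[0], reverse=True)
            return matches[:k]
        # Restart: too few objects qualified, so loosen the filter and retry.
        threshold = threshold * relax if threshold > 1e-6 else 0.0


# Usage with made-up grades of match for four objects.
grades = {"a": 0.95, "b": 0.6, "c": 0.3, "d": 0.1}
print(top_k_via_filter(grades, grades.get, k=2))
```

A cutoff that is too strict causes restarts, while one that is too loose retrieves more objects than needed; choosing it well is the trade-off the first abstract describes.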