1. What is data mining? In your answer, address the following:
Data mining refers to the process or method that extracts or "mines" interesting knowledge or patterns from large amounts of data.
(a) Is it another hype?
Data mining is not another hype. Instead, the need for data mining has arisen from the wide availability of huge amounts of data and the imminent need to turn such data into useful information and knowledge. Thus, data mining can be viewed as the result of the natural evolution of information technology.
(b) Is it a simple transformation or application of technology developed from databases, statistics, machine learning, and pattern recognition?
No. Data mining is more than a simple transformation of technology developed from databases, statistics, machine learning, and pattern recognition. It involves an integration of techniques from multiple disciplines, such as database technology, statistics, machine learning, high-performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis.
(c) We have presented a view that data mining is the result of the evolution of database technology. Do you think that data mining is also the result of the evolution of machine learning research? Can you present such views based on the historical progress of this discipline? Do the same for the fields of statistics and pattern recognition.

(d) Describe the steps involved in data mining when viewed as a process of knowledge discovery.
The steps involved in data mining when viewed as a process of knowledge discovery are as follows:
* Data cleaning, a process that removes or transforms noise and inconsistent data
* Data integration, where multiple data sources may be combined
* Data selection, where data relevant to the analysis task are retrieved from the database
* Data transformation, where data are transformed or consolidated into forms appropriate for mining
* Data mining, an essential process where intelligent and efficient methods are applied in order to extract patterns
* Pattern evaluation, a process that identifies the truly interesting patterns representing knowledge based on some interestingness measures
* Knowledge presentation, where visualization and knowledge representation techniques are used to present the mined knowledge to the user
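The knowledge discovery steps above can be sketched as a small pipeline. This is only a minimal illustration under stated assumptions: the two toy sources, the function names, the pair-counting "mining" step, and the support threshold of 0.5 are all hypothetical and are not taken from the exercise or its textbook.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy sources: each record is (customer_id, item); None marks noisy data.
SOURCE_A = [(1, "bread"), (1, "milk"), (2, "bread"), (2, None), (3, "milk")]
SOURCE_B = [(2, "milk"), (3, "bread"), (3, "eggs"), (4, "bread"), (4, "milk")]


def clean(records):
    """Data cleaning: drop records whose item is missing."""
    return [(cust, item) for cust, item in records if item is not None]


def integrate(*sources):
    """Data integration: combine several sources and remove duplicates."""
    return sorted(set().union(*sources))


def select(records, items_of_interest):
    """Data selection: keep only records relevant to the analysis task."""
    return [(cust, item) for cust, item in records if item in items_of_interest]


def transform(records):
    """Data transformation: consolidate records into one item set per customer."""
    baskets = {}
    for cust, item in records:
        baskets.setdefault(cust, set()).add(item)
    return baskets


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return counts


def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {
        pair: n / n_baskets
        for pair, n in counts.items()
        if n / n_baskets >= min_support
    }


def present(patterns):
    """Knowledge presentation: render the patterns as readable lines."""
    return [f"{a} & {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]


def run_pipeline():
    records = integrate(clean(SOURCE_A), clean(SOURCE_B))
    records = select(records, {"bread", "milk", "eggs"})
    baskets = transform(records)
    patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
    return present(patterns)


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

Support is used here as the interestingness measure only because it is simple to compute; any other measure could be substituted in `evaluate` without changing the other steps.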

2. How is a data warehouse different from a database? How are they similar?
* Differences between a data warehouse and a database: A data warehouse is a repository of information collected from multiple sources, over a history of time, stored under a unified schema, and used for data analysis and decision support, whereas a database is a collection of interrelated data that represents the current status of the stored data. There could be multiple heterogeneous databases where the schema of one database may not agree with the schema of another. A database system supports ad hoc query and on-line transaction processing.
* Similarities between a data warehouse and a database: Both are repositories of information, storing huge amounts of persistent data.

3. Define each of the following data mining functionalities: characterization, discrimination, association and correlation analysis, classification, regression, clustering, and outlier analysis. Give examples of each data mining functionality, using a real-life database that you are familiar with.
* Characterization is a summarization of the general characteristics or features of a target class of data. For example, the characteristics of students can be produced, generating a profile of all the university's first-year computing science students, which may include such information as a high GPA and a large number of courses taken.
* Discrimination is a comparison of the general features of target class data objects with the general features...
