Abstract

With the ever-increasing and ever-changing flow of information available on the Web, information analysis has never been more important. Web text mining, which includes text categorization, text clustering, association analysis, and trend prediction, can help us discover useful information effectively and efficiently. In this chapter, we propose a Web mining system that combines online efficiency with off-line effectiveness to provide the “right” information based on users’ preferences. A Bi-Objective Fuzzy c-Means algorithm and an information retrieval technique were employed for text categorization, clustering, and integration. The proposed system is illustrated via a case involving the Web site marketing of mobile phones.

A variety of Web sites exist on the Internet, and a common type involves the trading of goods. For this type of Web site, the question to ask is: if we want to establish a Web site that provides information about products, how can we respond quickly and accurately to queries? This is equivalent to asking: how can we design a flexible search engine according to users’ preferences? In this study, we apply data mining techniques to such problems, using as an example a Web site that provides information on mobile phones in Taiwan.

In order to provide useful information efficiently, two tasks were considered during the Web design phase. The first related to off-line analysis: a survey of frequent Web users (students between 15 and 40 years of age) was carried out regarding their preferences, so that Web customers’ behavior could be characterized. The survey data, as well as the products offered, were then classified into different demand and preference groups. The second related to online query: an information retrieval technique was applied to respond to users’ queries.

The remainder of the chapter is organized as follows. First, we present a literature review that introduces concepts and existing methods relevant to our study. Next, the proposed Web mining system is presented and illustrated with a case study of a mobile-phone marketing Web site. Finally, a summary and conclusions are offered.

Literature Review

Over 150 million people, worldwide, have become Internet users since 1994. The rapid development of information technology and the Internet has changed the traditional business environment. The Internet has enabled the development of Electronic Commerce (e-commerce), which can be defined as selling, buying, conducting logistics, or other organization-management activities via the Web (Schneider, 2004). Companies are finding that using the Web makes it easier to communicate effectively with customers. For example, Amazon.com, an online bookstore that started up in 1998, reached an annual sales volume of over $1 billion in 2003 (Schneider, 2004). Much research has focused on the impact and mechanisms of e-commerce (Angelides, 1997; Hanson, 2000; Janal, 1995; Mohammed, Fisher, Jaworski, & Paddison, 2004; Rayport & Jaworski, 2002; Schneider, 2004). Although many people question the future of e-commerce, Web site managers must take advantage of the Internet's distinctive capabilities, which can enable their companies to make higher profits and their customers to make better decisions.

Given that the amount of information available on the Web is large and rapidly increasing, finding an effective way to help users locate useful information has become critical. Existing document retrieval systems are mostly based on the Boolean logic model. The applications of such systems can be rather limited because they cannot handle ambiguous requests. Chen and Wang (1995) proposed a knowledge-based fuzzy information retrieval method that uses fuzzy sets to represent the categories or features of documents. Fuzzy Set Theory was introduced by Zadeh (1965). It differs from traditional Set Theory in that it uses membership functions to deal with questions that cannot be solved by two-valued logic. Fuzzy Set Theory concepts have been applied to dynamic processes, particularly those whose observations are expressed as linguistic values.

Because fuzzy concepts have been shown to cope well with linguistic and vague queries, Chen and Wang’s method is discussed below. Their method represents knowledge with a concept matrix, defined as the following symmetric relation matrix:

$$
A = [a_{ij}]_{n \times n} =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\tag{1}
$$

where $n$ is the number of concepts, and $a_{ij}$ represents the relevance value between concepts $A_i$ and $A_j$, with $a_{ii} = 1$ for all $i$. This concept matrix reveals the relationships between the properties used to describe objects, which benefits product identification, query solving, and online sales development. For effective analysis, these properties, which serve as the attributes of an object, should be independent of each other; however, this may not always be so. Therefore a transitive closure matrix $A^*$ must be obtained from the following definition.

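Definition 1: Let $A$ be a concept matrix as shown in Equation (1), and define:

$$
A^2 = A \otimes A = \left[ \bigvee_{k=1}^{n} \left( a_{ik} \wedge a_{kj} \right) \right]_{n \times n}
\tag{2}
$$

where $\otimes$ is the max-min composite operation, with “$\vee$” being the maximum operation and “$\wedge$” being the minimum operation. If there exists an integer $p \le n - 1$ such that $A^p = A^{p+1} = A^{p+2} = \cdots$, then $A^* = A^p$ is called the Transitive Closure of the concept matrix $A$.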

Matrix $A^*$ is an equivalence matrix: it satisfies the reflexive, symmetric, and transitive properties.
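Definition 1 can be illustrated with a minimal Python sketch. The function names `max_min_compose` and `transitive_closure` and the 3×3 example matrix are illustrative assumptions, not taken from Chen and Wang (1995); relevance values are assumed to lie between 0 and 1, and higher powers are assumed to be built as $A^{k+1} = A^k \otimes A$.

```python
def max_min_compose(x, y):
    """Max-min composition: result[i][j] = max over k of min(x[i][k], y[k][j])."""
    inner = len(y)
    if any(len(row) != inner for row in x):
        raise ValueError("inner dimensions must match")
    cols = len(y[0])
    return [
        [max(min(row[k], y[k][j]) for k in range(inner)) for j in range(cols)]
        for row in x
    ]


def transitive_closure(a):
    """Compose with A repeatedly until the power stops changing (A^p == A^(p+1))."""
    current = a
    # For a reflexive matrix, the powers stabilize within n - 1 compositions.
    for _ in range(max(len(a) - 1, 1)):
        nxt = max_min_compose(current, a)
        if nxt == current:
            break
        current = nxt
    return current


# Hypothetical concept matrix: concepts 0 and 2 are not directly related,
# but both are related to concept 1.
A = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.4],
    [0.0, 0.4, 1.0],
]

A_star = transitive_closure(A)
for row in A_star:
    print(row)
```

In this example the closure assigns concepts 0 and 2 a relevance of 0.4, the strength of the weakest link on the path through concept 1. Because max and min only select existing entries, comparing successive powers for exact equality is safe here.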

To identify each object by its properties, a document descriptor matrix D is constructed in the following form:

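$$
D = [d_{ij}]_{m \times n} =
\begin{bmatrix}
d_{11} & d_{12} & \cdots & d_{1n} \\
d_{21} & d_{22} & \cdots & d_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
d_{m1} & d_{m2} & \cdots & d_{mn}
\end{bmatrix}
\tag{3}
$$

where $d_{ij}$ represents the degree of relevance of document $D_i$ with respect to concept $A_j$, and $m$ is the number of documents. By applying the max-min composite operation $\otimes$ to $D$ and $A^*$, we obtain the matrix $B = D \otimes A^* = [b_{ij}]_{m \times n}$, where $b_{ij}$ represents the relevance of document $D_i$ with respect to concept $A_j$ once the relationships among concepts captured by $A^*$ are taken into account.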
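The computation of $B = D \otimes A^*$ can be sketched in Python as follows. The helper name `max_min_compose`, the two-document descriptor matrix, and the closure matrix are hypothetical values chosen for illustration, not data from the chapter.

```python
def max_min_compose(x, y):
    """Max-min composition: result[i][j] = max over k of min(x[i][k], y[k][j])."""
    inner = len(y)
    if any(len(row) != inner for row in x):
        raise ValueError("inner dimensions must match")
    return [
        [max(min(row[k], y[k][j]) for k in range(inner)) for j in range(len(y[0]))]
        for row in x
    ]


# Hypothetical transitive closure A* over three concepts (assumed already computed).
A_star = [
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.4],
    [0.4, 0.4, 1.0],
]

# Hypothetical document descriptor matrix D: two documents, three concepts.
# Document 0 is described only by concept 0, document 1 only by concept 2.
D = [
    [0.9, 0.0, 0.0],
    [0.0, 0.0, 0.7],
]

B = max_min_compose(D, A_star)
for row in B:
    print(row)
```

Under these assumed values, document 0 has no direct relevance to concept 1 in D, yet it receives a relevance of 0.8 in B because concepts 0 and 1 are strongly related in A*. This is how a query on one concept can still retrieve documents described by related concepts.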