Business Intelligence for Small and Middle-Sized Enterprises


Oksana Grabova, University of Lyon (ERIC Lyon2), 5 av P. Mendes-France, Bron Cedex, France
Jerome Darmont, University of Lyon (ERIC Lyon2), 5 av P. Mendes-France, Bron Cedex, France
Iryna Zolotaryova, Kharkiv National University of Economics, 9-a, pr. Lenina, Kharkov, Ukraine
Jean-Hugues Chauchat, University of Lyon (ERIC Lyon2), 5 av P. Mendes-France, Bron Cedex, France

(Author footnote: This author also works at the Kharkiv National University of Economics, 9-a, pr. Lenina, Kharkov, Ukraine.)

SIGMOD Record, June 2010 (Vol. 39, No. 2)

ABSTRACT

Data warehouses are the core of decision support systems, which nowadays are used by all kinds of enterprises throughout the world. Although many studies have been conducted on the need for decision support systems (DSSs) in small businesses, most of them adopt existing solutions and approaches that are appropriate for large-scale enterprises but inadequate for small and middle-sized enterprises. Small enterprises require cheap, lightweight architectures and tools (hardware and software) providing online data analysis. In order to ensure these features, we review web-based business intelligence approaches. For real-time analysis, the traditional OLAP architecture is cumbersome and storage-costly; therefore, we also review in-memory processing. Consequently, this paper discusses the existing approaches and tools working in main memory and/or with web interfaces (including freeware tools) that are relevant for decision making in small and middle-sized enterprises.

1. INTRODUCTION

During the last decade, data warehouses (DWs) have become an essential component of modern decision support systems in most companies of the world. In order to be competitive, even small and middle-sized enterprises (SMEs) now collect large volumes of information and are interested in business intelligence (BI) systems [26]. SMEs are regarded as significantly important on a local, national or even global basis, and they play an important part in any national economy [34].

Despite their many advantages, existing DSSs frequently remain inaccessible or insufficient for SMEs because of the following factors: high price; high requirements for a hardware infrastructure; complexity for most users; irrelevant functionality; low flexibility to deal with a fast-changing, dynamic business environment [56]; little attention to the differences in data access needs between SMEs and large-scale enterprises. In addition, many projects fail due to the complexity of the development process. Moreover, as the work philosophies of small and large-scale enterprises are considerably different, it is not advisable to use tools designed for large-scale enterprises. In short, one size does not fit all [51]. Furthermore, identifying the information needs of potential users raises many problems when building a data warehouse [7].

Therefore, SMEs require lightweight, cheap, flexible, simple and efficient solutions. To aim at these features, we can take advantage of light clients with web interfaces. Web technologies are already used for data warehousing by large corporations, but there is an even greater demand for such systems among small and middle-sized enterprises. Using web technologies makes software cheap, because it eliminates the need for numerous dispersed applications and for the deployment and maintenance of a corporate network, and it reduces training time. Web-based solutions are simple for end-users to use. In addition, a web-based architecture requires only lightweight software clients (i.e., web browsers).

Besides, there is a need for real-time data analysis, which induces memory and storage issues. Traditional OLAP (On-Line Analytical Processing) tools are often based on a cumbersome hardware and software architecture, so they require significant resources to provide high performance. Their flexibility is limited by data aggregation. In contrast, in-memory databases provide significant performance improvements. The absence of disk I/O operations permits fast query response times. In-memory databases do not require indexes, recalculation or pre-aggregation; the system thus becomes more flexible, because analysis is possible down to a detailed level without predefining it. Moreover, according to analyst firms, by 2012, 70% of Global 1000 organizations will load detailed data into memory as the primary method to optimize BI application performance [47].

Thus, our objective is to propose original BI solutions that are adapted to SMEs. To this aim, in a first step, we review in this paper the existing research related to this issue. The remainder of this paper is organized as follows. In Section 2, we first present and discuss web-based BI approaches, namely web data warehouses and web-based open source software for data warehousing.
In Section 3, we review in-memory BI solutions (MOLAP, vector database-based BI software) and the technologies that can support them (in-memory and vector databases). We finally conclude this paper in Section 4 and provide our view on how the research and technologies surveyed in this paper can be enhanced to fit SMEs' BI needs.

2. WEB-POWERED BI

The Web has become the platform of choice for the delivery of business applications, for large-scale enterprises as well as for SMEs. Web warehousing is a recent approach that merges data warehousing and business intelligence systems with web technologies [52]. In this section, we present and discuss web data warehousing approaches, their features, advantages and possibilities, as well as their necessity and potential for SMEs.

2.1 Web warehousing

General information

There are two basic definitions of web warehousing. The first one simply states that web warehouses use data from the Web. The second concentrates on the use of web technologies in data warehousing. We focus on the second definition in this paper.

Web data warehouses inherit many characteristics from traditional data warehouses, including: data are organized around major subjects in the enterprise; information is aggregated and validated; data are represented by time series, not by current status.

Web-based data warehouses nonetheless differ from traditional DWs. Web warehouses organize and manage the stored items, but do not collect them [52]. Web-based DW technology changes the way users access the DW: instead of accessing it through a LAN (Local Area Network), users access it via the Internet or an intranet [30]. Specific issues raised by web-based DWs include unrealistic user expectations, especially in terms of how much information users want to be able to access from the Web; security issues; and technical implementation problems related to peak demand and load [42].
Ultimately, web technologies make data warehouses and decision support systems friendlier to users. They are often used in data warehouses only to visualize information [18]. At the same time, web technology opens up multiple information formats, such as structured, semi-structured and unstructured data, to end-users. This gives users many possibilities, but also creates a problem known as data heterogeneity management [19].

Another important issue is the necessity to view the Web as an enormous source of business data, without which enterprises lose many opportunities. Owing to the Web, business analysts can access large amounts of information external to the enterprise. They can then study competitors' movements by analyzing their web site content, and analyze customer preferences or emerging trends [11]. E-business technologies are thus expected to allow SMEs to gain capabilities that were once the preserve of their larger competitors [34]. However, most of the information on the Web is unstructured, heterogeneous and hence difficult to analyze [26].

Among the web technologies used in data warehousing, we can single out web browsers, web services and XML. Using a web browser offers several advantages over traditional warehouse interface tools [19, 33]: cheapness and simplicity of web browser installation and use; reduction of system training time; elimination of problems posed by operating systems; low cost of deployment and maintenance; elimination of the need for numerous dispersed applications; possibility to open the data warehouse to business partners over an extranet.

Web warehouses can be divided into two classes: XML document warehouses and XML data warehouses. We present them in turn in the next two subsections. We then introduce OLAP on XML data (XOLAP), and we finish with the web-based paradigm known as cloud computing (Section 2.1.5). Finally, Section 2.2 presents web-based open source software for data warehousing analysis.

XML document warehouses

An XML document warehouse is a software framework for analyzing, sharing and reusing unstructured data (texts, multimedia documents, etc.). Unstructured data processing takes an important place in enterprise life because unstructured data are larger in volume than structured data, are more difficult to analyze, and are an enormous source of raw information.

Representing unstructured or semi-structured data with traditional data models is very difficult. For example, relational models such as star and snowflake schemas are semantically poor for unstructured data. Thus, Nassis et al. use object-oriented concepts to develop a conceptual model for XML document warehouses [35]. They use UML diagrams to build hierarchical conceptual views. By combining object-oriented concepts and XML Schema, they build the xFACT repository.

XML data warehouses

In contrast to XML document warehouses, XML data warehouses focus on structured data. XML data warehouse design is possible from XML sources [3].
In this case, it is necessary to translate XML data into a relational schema by means of XML Schema [3, 8].

Xyleme is one of the first projects aimed at XML data warehouse design [57]. It collects and archives web XML documents into a dynamic XML warehouse. Some more recent approaches are based on classical warehouse schemas. Pokorny adapts the traditional star schema with explicit dimension hierarchies to XML environments by using Document Type Definitions (DTDs) [41]. Boussaïd et al. define data warehouse schemas via XML Schema in a methodology named X-Warehousing [8]. Golfarelli proposes a semi-automatic approach for building the conceptual schema of a data mart directly from XML sources [15]. This work elaborates on the concept of the Dimensional Fact Model. Baril and Bellahsene propose a View Model for XML documents implemented in the DAWAX (Data Warehouse for XML) system [4]. A view specification mechanism allows filtering the data to be stored. Nørvåg introduces temporal XML data warehouses to query historical document versions and to query changes between document versions [36]. Nørvåg et al. also propose TeXOR, a temporal XML database system built on top of an object-relational database system [37]. Finally, Zhang et al. propose an approach, named X-Warehouse, to materialize data warehouses based on frequent query patterns represented by Frequent Pattern Trees [58].

XOLAP

Some recent research attempts to perform OLAP analysis over XML data. In order to support OLAP queries and to be able to construct complex analytic queries, some researchers extend the XQuery language with aggregation features [5]. Wiwatwattana et al. introduce an XQuery cube operator, X^3 [55]. Hachicha et al. propose a similar operator, but based on TAX (Tree Algebra for XML) [17].

Cloud computing

Another increasingly popular web-based solution is cloud computing. Cloud computing provides access to large amounts of data and computational resources through a variety of interfaces [38].
It is provided as services via the cloud (the Internet). These services, delivered through data centers, are accessible anywhere. Besides, they allow the rise of cloud analytics [2]. The main consumers of cloud computing are small enterprises and startups that do not have a legacy of IT investments to manage [50]. Cloud computing-based BI tools are rather cheap for small and middle-sized enterprises, because they require no hardware or software maintenance [1] and their prices scale with the required data storage. On the other hand, cloud computing does not allow users to physically possess their data storage. This makes users dependent on the cloud computing provider and leads to a loss of data control and data security. In conclusion, most cloud computing-based BI tools do not fit enterprise requirements yet.

Discussion

Data storage and analysis interface solutions should be easily deployed in a small organization at low cost, and thus be based on web technologies such as XML and web services. Web warehousing is a rather recent but popular direction that provides many advantages, especially in data integration. Web-based tools provide light interfaces. Their usage by small and middle-sized enterprises nonetheless remains limited. Existing cloud-based BI tools are appropriate for small and middle-sized enterprises with respect to price and flexibility. However, they are not yet enterprise-friendly and need data security enhancements.

2.2 Web-based open source software

In this section, we focus on ETL (Extraction, Transformation, Loading) tools, OLAP servers and OLAP clients. Their characteristics are summarized in Table 1.

ETL

Web-based free ETL tools are in most cases ROLAP (Relational OLAP, discussed in Section 3.1)-oriented. ROLAP-oriented ETL tools allow the user to define and create data transformations in Java (JasperETL) or in TL (Clover.ETL). The only MOLAP (Multidimensional OLAP, discussed in Section 3.1)-oriented ETL tool, Palo, defines the ETL process either via web interfaces or, for experts, via XML structures.

All studied ETL tools handle heterogeneous data sources and complex file formats. They interact with different DBMSs (DataBase Management Systems). Some of the tools can also extract data from ERP (Enterprise Resource Planning) and CRM (Customer Relationship Management) systems [53].

OLAP

In this section we review OLAP servers as well as OLAP clients.
All studied OLAP servers use the MDX (MultiDimensional eXpressions) language for aggregating tables. They parse MDX into SQL to retrieve answers to dimensional queries. All reviewed OLAP servers exist for Java, but Palo also exists for .NET, PHP, and C. Moreover, Palo is an in-memory Multidimensional OLAP database server. Mondrian schemas are represented in XML files. The Mondrian Pentaho Server is used by different OLAP clients, e.g., FreeAnalysis.

All studied OLAP clients are Java applications. They usually run on the client, but tools that run on web servers also exist [53]. So far, only PocOLAP is a lightweight, open source OLAP solution.

Discussion

The industrial use of open source business intelligence tools is becoming increasingly common, but it is still not as widespread as for other types of software [53]. Moreover, freeware OLAP systems often offer simple web-based interfaces. In addition, some web-based open source BI tools work in memory.

Nowadays, there are three complete solutions that include both ETL and OLAP: Talend OpenStudio, Mondrian Pentaho and Palo. Among ETL tools, only Palo is MOLAP-oriented. Not all of these tools provide free graphical user interfaces. All three presented ETL tools support Java and can be deployed on different platforms.

Free web-based OLAP servers are used by different OLAP clients. The most widely used is the Mondrian Pentaho Server, due to its functionality. All studied OLAP clients are Java applications. Most of them can be used with XMLA (XML for Analysis)-enabled sources, but they lack documentation. Generally, the studied web-based tools provide sufficient functionality, but they remain cumbersome because they rely on traditional OLAP.

3. IN-MEMORY BI SOLUTIONS

In the late eighties, main memory databases were researched by numerous authors.
Thereafter, they were rarely discussed because of the technological limits of the time, but nowadays they are regaining an important place in database technologies.

Table 1: Web-based open source software

Category       Tool             Platform  License  Particular features
ETL (ROLAP)    Clover.ETL       Java      LGPL     does not have an open source GUI; uses its own TL language for data transformations
ETL (ROLAP)    JasperETL        Java      GPL      generated code: Java or Perl; can use CRM systems as data sources
ETL (MOLAP)    Palo ETL Server  Java      GPL      does not have a GUI yet; parallel jobs are not supported
OLAP servers   Mondrian         Java      CPL      ROLAP-based; data cubes via XML
OLAP servers   Palo             Java      GPL      MOLAP-based; works in memory; data cubes via Excel add-in
OLAP clients   FreeAnalysis     Java      MPL      works with servers that use XMLA, e.g., Mondrian
OLAP clients   JPalo            Java      GPL      works with the Palo server
OLAP clients   PocOLAP          Java      LGPL

3.1 MOLAP

OLAP and MOLAP

Before studying existing MOLAP approaches, we review general OLAP principles and definitions. The OLAP concept was introduced in 1993 by Codd. OLAP is an approach to quickly answer multidimensional analytical queries [13]. In OLAP, a dimension is a sequence of analyzed parameter values. An important goal of multidimensional modeling is to use dimensions to provide as much context as possible for facts [21]. A combination of dimension values defines a cell of a cube. A cube stores the results of different calculations and aggregations.

There are three variants of OLAP: MOLAP, ROLAP and Hybrid OLAP (HOLAP). We compare these approaches in Table 2. With respect to ROLAP and HOLAP, MOLAP provides faster computation and querying [48], because all required data are stored in the OLAP server. Moreover, it provides more space-efficient storage [40].

Since the purpose of MOLAP is to support decision making and management, data cubes must contain sufficient information to support decision making and to meet every user expectation. In this context, researchers try to improve three main aspects: response time (through new aggregation algorithms [28] and new operators [46]), query personalization, and data analysis visualization [26].

Storage methods

Researchers interested in MOLAP focus a lot on storage techniques. In addition, most researchers choose MOLAP as the most suitable OLAP technique for storage [31], although MOLAP requires significant storage capacity. According to Kudryavcev, there are three basic types of storage methods: semantic, syntactical and approximate [23].
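To make the storage concern concrete, the following is a minimal sketch, not taken from any of the surveyed systems, of how cells defined by combinations of dimension values can be stored sparsely, and how many cells a fully materialized (dense) array over the same dimensions would need. The fact rows, the three dimensions (year, city, product) and the function names are hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical fact rows: (coordinates along year, city, product), measure
FACTS = [
    (("2009", "Lyon", "book"), 10.0),
    (("2009", "Lyon", "pen"), 4.0),
    (("2009", "Kharkiv", "book"), 6.0),
    (("2010", "Lyon", "book"), 5.0),
    (("2010", "Lyon", "book"), 2.5),
]


def sparse_cells(facts):
    """Sum the measure per combination of dimension values (one cell each)."""
    cells = {}
    for coords, measure in facts:
        cells[coords] = cells.get(coords, 0.0) + measure
    return cells


def dense_cell_count(facts):
    """Number of cells a dense array over all dimension values would hold."""
    n_dims = len(facts[0][0])
    cardinalities = [
        len({coords[i] for coords, _ in facts}) for i in range(n_dims)
    ]
    return prod(cardinalities)


cells = sparse_cells(FACTS)
dense = dense_cell_count(FACTS)
density = len(cells) / dense  # share of dense cells that actually hold data

print(len(cells), "non-empty cells out of", dense, "dense cells")
```

In this toy data set only half of the dense cells are populated; with more dimensions and higher cardinalities the share of empty cells typically grows, which is one motivation for the compression techniques discussed in this subsection.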
Syntactical approaches transform only data storage schemas. Semantic storage techniques transform cube structures. Approximate storage techniques compress initial data. One semantic storage technique is the Quotient Cube. It consists of a semantic compression that partitions the set of cells of a cube into equivalence classes, while keeping the cube's roll-up and drill-down semantics and lattice structure [25]. The main objective of approximate storage techniques such as wavelets is range-sum query optimization [29]. In the syntactical approach DWARF, a cube is compressed by deleting redundant information [49]. Data are represented as graphs, with keys and pointers in the graph nodes. Data redundancy is reduced by improving addressing and data storage.

Schema evolution

Many works address the problem of schema evolution, because working only with the latest version hides the existence of information that may be critical for data analysis. These studies can be classified into two groups: updating models (mapping data to the latest version) and tracking history models (saving schema evolution). Other types of approaches let users choose which presentation they want for query responses. For instance, Body et al. proposed a novel temporal multidimensional model that supports evolutions of multidimensional structures by introducing a set of temporal modes of presentation for dimensions in a star schema [6].

Table 2: Comparison of OLAP technologies

Data storage
  MOLAP: Multidimensional database
  ROLAP: Relational database
  HOLAP: Uses MOLAP technology to store higher-level summary data, and a ROLAP system to store detailed data

Result sets
  MOLAP: Stores result sets in a MOLAP cube
  ROLAP: Stores no result sets
  HOLAP: Stores result sets, but not all

Capacity
  MOLAP: Requires significant capacity
  ROLAP: Requires the least storage capacity

Performance
  MOLAP: The fastest performance
  ROLAP: The slowest performance

Dimensions
  MOLAP: Minimum number
  ROLAP: Maximum number

Capacity, performance and dimensions (HOLAP)
  HOLAP: Compromise between performance, capacity, and permutations of dimensions available to a user

Vulnerability
  MOLAP: Provides poor storage utilization, especially when the data set is sparse
  ROLAP: Database designs recommended by ER diagrams are inappropriate for decision support systems

Advantages
  MOLAP: Fast query performance; automated computation of higher-level aggregates of the data; array model provides natural indexing
  ROLAP: No limitation on data volume; leverages functionalities inherent in relational databases
  HOLAP: Fast access at all levels of aggregation; compact aggregate storage; dynamically updated dimensions; easy aggregate maintenance

Disadvantages
  MOLAP: Data redundancy; querying models with dimensions of high cardinality is difficult
  ROLAP: Slow performance
  HOLAP: Complexity: a HOLAP server must support MOLAP and ROLAP engines, as well as tools to combine storage engines and operations. Functionality overlap between storage and optimization techniques in ROLAP and MOLAP engines

Discussion

Multidimensional OLAP is appropriate for decision making. It offers a number of advantages, including automatic aggregation, visual querying, and good query performance due to the use of pre-aggregation [39]. Besides, MOLAP may be a good solution for situations in which small to medium-sized DBs are the norm and application software speed is critical [45], because loading all data into the multidimensional format requires neither significant time nor significant disk space.

Nevertheless, MOLAP systems raise several problems: cube rebuilding is complex, time-consuming and requires an expert. If the user wants to change dimensions, the whole deployment process needs to be redone (datamart schema, ETL process, etc.) [56]. Furthermore, the cost of MOLAP tools does not fit the needs of small and middle-sized enterprises.
In addition, MOLAP-based systems may encounter significant scalability problems. Moreover, MOLAP requires a cumbersome architecture, i.e., substantial software and hardware, significant changes in work processes to generate substantial benefits [32], and a considerable deployment time.

3.2 Main Memory Databases

General information

Main Memory Databases (MMDBs) reside entirely in main memory [14] and only use a disk subsystem for backup [16]. The concept of managing an entire database in main memory has been researched for over twenty years, and the benefits of such approaches have been well understood in certain domains, such as telecommunications, securities trading, applications handling high data traffic (e.g., routers) and real-time applications. However, it is only recently, with decreasing memory prices and the availability of 64-bit operating systems, that the size restrictions on in-memory databases have been removed and in-memory data management has become available for many applications [27, 54].

When the assumption of disk residency is removed, complexity is dramatically reduced. The number of machine instructions drops, buffer pool management disappears, extra data copies are not needed, index pages shrink, and their structure is simplified. Design becomes simpler and more compact, and queries are executed faster [54]. Consequently, using main memory databases becomes advantageous in many cases: for hot data (frequent access, low data volume), for cold data (scarce access, in the case of voluminous data), and in applications requiring a short access/response time.

A second wave of applications using MMDBs is currently appearing, e.g., FastDB, Dali from AT&T Bell Labs, and TimesTen from Oracle. These systems are widely used in many applications such as HP intellect web flat already, Cisco VoIP call Proxy, the telecom systems of Alcatel and Ericsson, and so on [12]. The high demand for MMDBs is driven by the need for high reliability, high real-time capacity and high information throughput [20].

MMDBs have several advantages, including short response times and good transaction throughput. MMDBs also leverage the decreasing cost of main memory. On the other hand, MMDB size is limited by the size of RAM (Random Access Memory). Moreover, since data in main memory can be directly accessed by the processor, MMDBs suffer from data vulnerability, i.e., a risk of data loss through accidents due to software errors [14], hardware failure or other hazards.

MMDB issues

Although in-memory technologies provide high performance, scalability and flexibility to BI tools, there are still some open issues. Since MMDBs work in memory, the main problems and challenges are recovery, commit processing, access methods and storage.

There is no doubt that backups of memory-resident databases must be maintained on storage other than main memory in order to ensure data integrity. In order to protect against failures, it is necessary to have a backup copy and to keep a log of transaction activity [14]. In addition, recovery processing is usually the only MMDB component that deals with disk I/O, so it must be designed carefully [20]. Existing research works do not share a common view of this problem. Some authors propose to use a part of stable main memory to hold the log. This provides short response times, but it causes a problem when logs are large. It is therefore used for precommit transactions.
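The backup-plus-log principle described above can be sketched as follows. This is a toy illustration under simplifying assumptions, not the recovery scheme of any surveyed MMDB: the class name `TinyMMDB`, the file names and the JSON record format are hypothetical, each update is treated as its own committed transaction, and the checkpoint is not made crash-atomic.

```python
import json
import os
import tempfile


class TinyMMDB:
    """Toy key-value store held in a dict, with a disk backup and a redo log."""

    def __init__(self, directory):
        self.backup_path = os.path.join(directory, "backup.json")
        self.log_path = os.path.join(directory, "redo.log")
        self.data = {}
        self.recover()

    def recover(self):
        # Reload the last backup, then replay the logged updates on top of it.
        self.data = {}
        if os.path.exists(self.backup_path):
            with open(self.backup_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def commit(self, key, value):
        # Force the log record to disk before applying the update in memory.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def checkpoint(self):
        # Write a full backup, then empty the log that it makes redundant.
        with open(self.backup_path, "w") as f:
            json.dump(self.data, f)
        open(self.log_path, "w").close()


workdir = tempfile.mkdtemp()
db = TinyMMDB(workdir)
db.commit("stock:pen", 40)
db.checkpoint()
db.commit("stock:book", 12)

# Simulate a crash: the in-memory state is lost, only the files survive.
recovered = TinyMMDB(workdir)
print(recovered.data)
```

Note that disk I/O appears only in `commit`, `checkpoint` and `recover`, which mirrors the observation that recovery processing is usually the only MMDB component that deals with the disk.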
Group commits (e.g., a causal commit protocol [27]) allow accumulating several transactions in memory before flushing them to the log disk. Nowadays, commit processing is especially important in distributed database systems, where it is slow because disk logging takes place at several sites [27].

Several different data storage approaches exist for MMDBs. Initially, there were many attempts to use database partitioning techniques developed earlier for other types of databases. Gruenwald and Eich divide existing techniques as follows: horizontal partitioning, group partitioning, single vertical partitioning, group vertical partitioning, and mixed partitioning [16]. Only horizontal and single vertical partitioning are suitable for MMDBs and, as a result of this study, single vertical partitioning was chosen as the most efficient [10].

B-trees and hashing are also identified as appropriate storage techniques for MMDBs. Hashing is not as space-efficient as a tree, so it is rarely used [43]. Finally, most researchers agree on T-trees (a balanced index tree data structure optimized for cases where both the index and the actual data are fully kept in memory) as the main storage technique [12, 14, 44]. T-trees indeed require less memory space and fewer CPU cycles than B-trees, so indexes are more economical.

The above-mentioned issues are important in a BI environment: data coherence is strategic, and performance is fundamental for online operations such as OLAP. Choosing the right storage and recovery techniques is crucial, since a poor choice can damage data security and data integrity.

MMDB Systems

In this section we give an overview of MMDB systems. We particularly focus our discussion on the most recent systems, such as Dali, FastDB, Kdb, IBM Cognos TM1 and TimesTen. Among the studied systems, we can distinguish a storage manager (the Dali system [20]) and complete main memory database systems (FastDB, Kdb, TM1, TimesTen).
Interfaces can be based on a zero-footprint Web client (IBM Cognos TM1; see www-01.ibm.com/software/data/cognos/products/tm1), standard SQL (TimesTen) [54] or C++ (FastDB) [22]. Most MMDBs feature SQL or an SQL-like query language (FastDB, TimesTen). The Kdb system uses its own language, q, for programming and querying [24]. IBM Cognos 8 BI and TimesTen are aimed at decision making in large corporations. The main MMDB disadvantages are the absence of interprocess communication and high storage requirements (Dali system) [9], the limitation to server memory (TimesTen), and the lack of support for a client-server architecture (FastDB).

Discussion

The main benefits of using MMDBs are short access/response times and good transaction throughput. But MMDBs are hampered by data vulnerability and security problems. Memory is not persistent, which means data loss in case of failure on the server. Security problems come from unauthorized access to data aimed at data corruption or theft. So far, MMDBs are mainly used in real-time applications and telecommunications, but not commonly for decision making. In spite of considerable research on MMDBs, there are some unresolved issues, such as data security and safety, and data processing efficiency.

3.3 Vector Databases

General information

A vector table is built by transforming a file in the following way: every record represents all column values in a vector. Vector databases (VDBs) require neither indexes nor any complex database structure. Differences between vector and relational databases are summarized in Table 3.

In order to access data, relational DBMSs provide only sequential scans by columns and by rows. VDBs provide fast data access. Besides, relational DBs store large volumes of repeated data due to the nature of the data. For example, in a table of students, the nationality "French" can be repeated a great number of times. In contrast, in VDBs, this value is present only once, which provides significant data compression.

The main principles of vector databases are data associations and data access by pointers. Vector database implementations eliminate data redundancy, because any possible piece of data is written once and is never repeated. Metadata such as keys in the relational data model lose their relevance in VDBs, because data associations are provided by pointers. Hence, VDBs do not consume as much space as relational DBs.

VDB-based BI

The main principle of VDB-based BI is that, instead of dimension associations with an OLAP cube, there are associations between data. These associations are defined during the data load process by matching up table columns that have the same name. The usage of vector databases differs from classical warehousing: there is no predefinition of what a dimension is. Any piece of data is available as a dimension and any piece of data is available as a measure.
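The idea that each distinct value is written once and that rows only hold pointers to it can be illustrated with a dictionary-encoded column. This is a sketch under the assumption that such an encoding is a reasonable stand-in for the principle; it is not the actual storage format of any vector database product, and the function names and sample data are hypothetical.

```python
def encode_column(values):
    """Store each distinct value once; rows keep integer pointers to it."""
    distinct = []   # every distinct value appears exactly once
    position = {}   # value -> index of that value in `distinct`
    pointers = []   # one pointer per row
    for value in values:
        if value not in position:
            position[value] = len(distinct)
            distinct.append(value)
        pointers.append(position[value])
    return distinct, pointers


def decode_column(distinct, pointers):
    """Rebuild the original column by following the pointers."""
    return [distinct[p] for p in pointers]


def rows_matching(distinct, pointers, wanted):
    """Select row numbers by comparing pointers instead of full values."""
    if wanted not in distinct:
        return []
    target = distinct.index(wanted)
    return [row for row, p in enumerate(pointers) if p == target]


# Hypothetical "nationality" column of a student table
nationality = ["French", "French", "Ukrainian", "French", "Ukrainian"]
distinct, pointers = encode_column(nationality)

print(distinct)   # each nationality stored once
print(pointers)   # rows refer to it by position
print(rows_matching(distinct, pointers, "French"))
```

The more often a value repeats, the larger the saving, since a repeated string is replaced by a small integer pointer.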
Since no dimension is predefined, it is not necessary to reconstruct the data schema when dimensions change. As vector databases work in memory, VDB-based BI tools are endowed with instant data access. However, enterprises frequently hesitate to use VDB-based BI because of its non-interoperability with SQL tools.

One BI tool that uses a vector database is QlikView. QlikView provides integrated ETL. It removes the need to pre-aggregate data. It is possible to change analysis axes at any moment and at any level of query detail. Despite these capacities, QlikView has some limitations and disadvantages, such as the lack of a unified metadata view and of predictive models (QlikView's statistical analysis features are less developed than those of other BI tools). It does not specialize in visualization: QlikView provides a clean interface to analysts, but it lacks advanced visualization features to help them graphically wade through complicated data.

One of QlikView's features is the ability to automatically connect tables, but this can create problems. When fields that represent the same thing in different tables do not have the same name, they must be renamed in order to be connected. When fields in different tables have the same name but not the same content and meaning, a meaningless connection is created. It is then necessary to delete this connection and to reanalyze all the fields with the same names in order to single out the ones with a different meaning. QlikView also lets end-users use the integrated ETL and construct their data schema themselves, which often leads to unsatisfactory results.

Discussion

Vector databases hold the same advantages as other in-memory databases and are only limited by memory size. VDB-based BI is a relatively new direction, but it is rather popular due to its fast performance, great analysis capacity, unlimited number of dimensions, tables and measures, and ease of implementation.
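The pitfall of connecting tables purely by matching column names, as discussed for QlikView above, can be illustrated with a short sketch. It is a hypothetical name-matching routine written for this illustration and does not reproduce QlikView's actual association logic; the table and column names are invented.

```python
def auto_associations(tables):
    """Link every pair of tables through the column names they share."""
    links = []
    names = sorted(tables)
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            shared = sorted(set(tables[left]) & set(tables[right]))
            for column in shared:
                links.append((left, right, column))
    return links


# Hypothetical schemas: table name -> column names
TABLES = {
    "orders": ["order_id", "customer_id", "date"],      # "date" = order date
    "customers": ["customer_id", "name"],
    "shipments": ["order_no", "date"],                  # "date" = shipping date
}

links = auto_associations(TABLES)
for left, right, column in links:
    print(f"{left} <-> {right} via {column}")
```

Here `orders` and `shipments` are linked through `date` although the two columns mean different things, while the link that was actually intended is missed because `order_id` and `order_no` carry different names. Both cases require manual renaming, which is exactly the kind of schema work that end-users may not handle well.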
However, among the features proposed by QlikView, some are disputable: automatic table connection and the possibility for end-users to create a data schema. These characteristics do not cover all situations, because data aggregation is complex when data come from different sources. Such data have different refinement levels, different field names, etc. Consequently, giving end-users the possibility to create data schemas can lead to inadequate data schemas and table connections, data loss, as well as the presence of false data in the database. Moreover, VDB-based BI tools are often black boxes, meaning that we do not know what happens inside. Such models also lack flexibility.

Table 3: Relational and vector database characteristics

Characteristic             Relational DB       Vector DB
Access to data             Sequential          Parallel
Data integrity             Foreign keys        Multi-dimensional
Data relations stored in   Keys                Vectors
Data reuse                 Not available       Built-in
Metadata                   System tables       None
Speed (high volume)        Slow                Fast
Uniqueness                 User constraints    Built-in

4. CONCLUSION

Nowadays, BI is becoming an essential part of any enterprise, even an SME. This necessity is caused by the increasing data volume indispensable for decision making. Existing solutions and tools are mostly aimed at large-scaled enterprises; thereby they are inaccessible or insufficient for SMEs because of high price, redundant functionality, complexity, and high hardware and software requirements. SMEs require solutions with light architectures that, moreover, are cheap and do not require additional hardware and software.

This survey discusses the importance of data warehousing for SMEs, and presents the main characteristics and examples of web-based data warehousing, MOLAP systems and MMDBs. All these approaches have important disadvantages that prevent them from being chosen as a unique decision support system: cumbersome architecture and complexity in MOLAP, data vulnerability in MMDBs, non-transparency and excessive power given to users in VDB-based systems, and security issues in cloud computing systems.

In this context, our research objective is to design BI solutions that are suitable for SMEs and avoid the aforementioned disadvantages. Our idea is to work toward a ROLAP system that operates in-memory, i.e., to add OLAP operators on top of an SQL-based MMDB. This should considerably simplify the in-memory OLAP architecture with respect to MOLAP. Choosing an open source MMDB system (such as FastDB) and using well-known ETL, modeling and analysis processes should also help avoid the black box issue of VDBs. Finally, storing business data as close to the user as possible mitigates security issues with respect to cloud BI. Problems will still remain, though (e.g., data vulnerability and need for backup, the design of adapted, in-memory indexes for OLAP), but we are confident we can address them in our future research.

5. ACKNOWLEDGMENTS

The authors would like to thank the French Embassy in Ukraine for supporting this joint research work of the Kharkiv National University of Economics (Ukraine) and the University of Lyon 2 (France).
