Top Ten Trends in Data Warehousing
By Dorinne Hoss
Although data warehousing has matured greatly as a technology discipline over the past ten years, enterprises that undertake data warehousing initiatives continue to face fresh challenges that evolve with the changing business and technology environment. The data warehouse is being called on to support new initiatives, such as customer relationship management and supply chain management, and has also been directly affected by the rise of e-business. Data warehousing vendors have developed new and more sophisticated technologies and have acquired and merged with other vendors. The number of homegrown and packaged software implementations throughout the average enterprise has grown rapidly, creating more data sources and information delivery options. With so much activity surrounding data warehousing, it is hard to sort out which issues and trends are most pressing for enterprises. To that end, this article presents insights into the ten biggest data warehousing challenges facing organizations.
Trend #10: Data Warehouse Do-Overs
Data warehousing has matured as a technology discipline, and most large enterprises have completed some form of data warehousing initiative, whether an enterprise-wide data warehouse or just one or two departmental data marts. These initiatives have achieved varying degrees of success, and many organizations are now reengineering or even totally rebuilding their data infrastructures. According to META Group, almost one-third of data warehousing efforts through 2001 will be do-overs. What problems and challenges have made these do-overs necessary? There are some common pitfalls that many enterprise data warehousing initiatives have fallen into:
The good news behind past data warehousing "failures" is that enterprises have learned from their mistakes and are developing a set of best practices as they correct the problems. This means more successful implementations in the future, as newcomers to data warehousing learn from those who have been there before.
Trend #9: Proliferation of Data Sources
The number of enterprise data sources is growing rapidly, with new types of sources emerging every year. The most exciting new source is, of course, enterprise e-business operations. Enterprises want to integrate clickstream data from their Web sites with other internal data in order to get a complete picture of their customers and integrate internal processes. Other sources of valuable data include ERP applications, operational data stores, packaged and homegrown analytic applications and existing data marts. Integrating these sources into one data warehouse can be complicated, and it becomes even more difficult when an enterprise merges with or acquires another enterprise.
Enterprises also look to a growing number of external sources to supplement their internal data. These might include prospect lists, demographic and psychographic data, and business profiles purchased from third-party providers. Enterprises might also use an external provider for address verification, in which internal company records are compared with a master list to ensure data accuracy. Additionally, some industries have their own specific sources of external data. For example, the retail industry uses data from store scanners, and the pharmaceutical industry uses prescription data aggregated by third-party vendors.
Trend #8: Outsourcing
Although enterprises have not yet begun to outsource their actual data warehouses, they are outsourcing other applications and, by extension, the data used and generated by those applications. The use of outsourcing is growing rapidly. Gartner, Inc. estimates that by 2003, 45 percent of large enterprises will host or rent some form of business application with an application service provider (ASP). ASPs offer fast application deployment and application expertise that an enterprise might not possess.
While the benefits can be great, enterprises that use ASPs must manage the risks inherent in outsourcing data. First, enterprises should make sure that their ASP is taking adequate security measures to keep their data private and separate from that of the ASP's other customers. Second, the enterprise should ensure that the ASP has experience moving large volumes of data, so that migration of data to and from the ASP will go smoothly. Third, the ASP should have proven experience in backup and recovery for the database(s) being used. Finally, enterprises should ensure that the flow of data between their internal systems and the ASP can be kept intact.
Trend #7: Hub Versus Relational Databases
In an effort to control costs and improve performance, enterprises are increasingly implementing data hubs in their data warehouses instead of relational databases. Keeping data in a relational database can be quite expensive, costing three to five times more than keeping it in a nonrelational repository. Additionally, queries on nonrelational data stores can outperform queries on relational databases. In hopes of achieving these benefits, enterprises are turning to compressed flat files to replace some of their RDBMSs. Despite their performance and cost advantages, however, these data hubs lack SQL support and are not appropriate for all situations.
Trend #6: Active Data Warehouses
As enterprises face competitive pressure to speed up decision making, the data warehouse must evolve to support real-time analysis and action. "Active" data warehouses are one way to meet this need.
In contrast to traditional data warehouses, active data warehouses are tied closely to operational systems, are designed to hold very detailed and current data, and feature shortened batch windows. And unlike most operational data stores (ODSs), active data warehouses hold integrated data and are open to user queries. These characteristics make active data warehouses well suited to real-time analysis and decision making, as well as to automated event triggering. With an active data warehouse, an enterprise can respond to customer interactions and changing business conditions in real time. An active data warehouse enables a credit card company to detect and stop fraud as it happens, a transportation company to reroute its vehicles quickly and efficiently or an online retailer to communicate special offers based on a customer's Web surfing behavior. The active data warehouse's greatest benefit lies in its ability to support tactical as well as strategic decisions.
Trend #5: Fusion with CRM
Customer relationship management (CRM) is one of the most popular business initiatives in enterprises today. CRM helps enterprises attract new customers and build loyalty among existing ones, with the end result of increasing sales and improving profitability. A data warehouse contains the information an enterprise needs to truly understand its customers and is, therefore, increasingly regarded as a prerequisite for a successful CRM initiative. One of the most important requirements of CRM is the integration of sales, marketing and customer care: all of these customer-facing functions must share information and work together. In the past, enterprises seldom integrated these areas, but CRM initiatives are pushing them to do so in order to better understand and serve their customers.
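To make the integration requirement concrete, here is a minimal Python sketch of merging sales, marketing and customer-care records into one profile per customer. The record layouts, the use of a normalized e-mail address as the match key, and the names `match_key` and `unified_view` are hypothetical illustrations, not part of any particular CRM product; real customer matching usually needs far more robust data quality rules than this.

```python
from collections import defaultdict

# Hypothetical extracts from three customer-facing systems. The same
# customer appears in each, but the e-mail address is formatted differently.
SALES = [{"email": "Ann@Example.com", "last_order": "2001-03-02"}]
MARKETING = [{"email": " ann@example.com", "campaign": "spring-promo"}]
CARE = [{"email": "ANN@EXAMPLE.COM", "open_tickets": 1}]


def match_key(record):
    """Normalize the e-mail address so records for one customer line up."""
    return record["email"].strip().lower()


def unified_view(*sources):
    """Merge records from every source into one profile per customer.

    If two sources share an attribute name, the later source wins.
    """
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            # Keep every attribute except the raw, unnormalized e-mail.
            attributes = {k: v for k, v in record.items() if k != "email"}
            profiles[match_key(record)].update(attributes)
    return dict(profiles)


if __name__ == "__main__":
    for customer, profile in unified_view(SALES, MARKETING, CARE).items():
        print(customer, profile)
```

Keying on a single normalized attribute is the simplest possible match rule; it shows in miniature why inconsistent source data makes a shared customer view hard to build.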
The data warehouse is essential to this integration of customer-facing functions: it collects data from all channels and customer touchpoints and presents a unified view of the customer to sales, marketing and customer-care employees. Software packages increasingly reflect the need to integrate these functional areas, as the trend toward merging customer-care and campaign-management software demonstrates.
Trend #4: Growing Number of End Users
As vendors make data warehousing and business intelligence tools more accessible to the masses, the number of data warehousing end users is growing rapidly. Survey.com predicts that the number of data warehouse users will more than quadruple by 2002, with an average of 2,718 individual users and 609 concurrent users per warehouse.
In addition to coping with the growth in the number of end users, data warehousing teams will need to cater to different types of end users. In a recent study, Gartner found that the use of business intelligence tools is growing most rapidly among administration and operations personnel, followed closely by executive-level personnel. These findings demonstrate that business intelligence tools have become both easier to use and more strategic. End users have different needs depending on their position in the company: while the business analyst needs ad hoc querying capabilities, the CEO and COO may want only static reporting.
Enterprises can handle the growing number of end users through several techniques, including parallelism and scalability, optimized data partitioning, aggregates, cached result sets and single-mission data marts. These techniques allow a large number of employees to access the data warehouse concurrently without compromising performance. Accommodating the different needs of various end-user groups, however, will require as much of an organizational solution as a technical one.
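As a rough illustration of two of the techniques listed above, aggregates and cached result sets, the following Python sketch pre-summarizes hypothetical detail rows into a region-level aggregate once, then serves repeated identical queries from an in-memory cache using the standard library's `functools.lru_cache`. The data and the names `build_aggregate` and `region_sales` are invented for illustration; a production warehouse would implement these techniques in the database or BI layer rather than in application code.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical detail-level fact rows: (region, month, sales amount).
FACTS = (
    ("East", "2001-01", 120.0),
    ("East", "2001-02", 80.0),
    ("West", "2001-01", 200.0),
    ("West", "2001-02", 50.0),
)


def build_aggregate(facts):
    """Pre-summarize detail rows into a region-level aggregate table."""
    totals = defaultdict(float)
    for region, _month, amount in facts:
        totals[region] += amount
    return dict(totals)


# Built once (for example, during the load window) and shared by all users,
# so queries never rescan the detail rows.
REGION_TOTALS = build_aggregate(FACTS)


@lru_cache(maxsize=1024)
def region_sales(region):
    """Answer from the aggregate; repeated identical queries hit the cache."""
    return REGION_TOTALS.get(region, 0.0)


if __name__ == "__main__":
    print(region_sales("East"))
    print(region_sales("East"))  # second call is served from the cache
    print(region_sales.cache_info())
```

The aggregate cuts the work per query, while the cache removes repeated work entirely when many users ask the same question, which is the common case for standard reports.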
Data warehousing teams should involve end users from the beginning in order to determine the types of data and applications necessary to meet their decision-making needs.
Trend #3: More Complex Queries
In addition to becoming more numerous, queries against the data warehouse will also become more complex. Users increasingly expect to get exactly the information they need, when they need it. Simple data aggregation is no longer enough for users who want to drill down on multiple dimensions. For example, it may not be enough to deliver a regional sales report every week. Users may want to look at the data by customized dimensions, perhaps by a certain customer characteristic, a specific sales location or the time of purchase.
Users are also demanding more sophisticated business intelligence tools. According to Gartner, data mining is the most rapidly growing business intelligence technology. Other sophisticated technologies are also becoming more popular. Vendors are developing software that can monitor data repositories and trigger reactions to events in real time. For example, if a telecom customer calls to cancel his call-waiting feature, real-time analytic software can detect this and trigger a special offer of a lower price in order to retain the customer. Vendors are also developing a new generation of data mining algorithms that combine predictive power with explanatory components, robustness and self-learning features. These new algorithms automate data mining and make it more accessible to mainstream users by providing explanations with results, indicating when results are not reliable and automatically adapting to changes in underlying predictive models and/or data structures.
Enterprises can handle complex queries and the demands of advanced analytic technologies by implementing some of the same techniques used to handle the increasing number of users, including parallelism.
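The real-time event triggering in the telecom example above can be sketched as a set of rules, each pairing a condition with a reaction, that are evaluated as events arrive. This minimal Python sketch assumes a hypothetical `Event` record and `Rule` structure; commercial real-time analytic products expose their own, different interfaces, and a real deployment would evaluate rules continuously against a stream of events rather than one call at a time.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    """A hypothetical customer event captured from an operational system."""
    customer_id: str
    action: str   # e.g. "cancel_feature"
    feature: str  # e.g. "call_waiting"


@dataclass(frozen=True)
class Rule:
    """A condition to watch for and the reaction to trigger when it holds."""
    name: str
    condition: Callable[[Event], bool]
    reaction: Callable[[Event], str]


def process(event: Event, rules: List[Rule]) -> List[str]:
    """Evaluate every rule against one event; return the triggered reactions."""
    return [rule.reaction(event) for rule in rules if rule.condition(event)]


# Retention rule modeled on the call-waiting example in the text.
RETENTION_RULE = Rule(
    name="retain-call-waiting",
    condition=lambda e: e.action == "cancel_feature" and e.feature == "call_waiting",
    reaction=lambda e: f"offer discounted call waiting to {e.customer_id}",
)

if __name__ == "__main__":
    cancel = Event("C1001", "cancel_feature", "call_waiting")
    print(process(cancel, [RETENTION_RULE]))
```

Keeping conditions and reactions as data makes it easy to add rules (a fraud check on card transactions, say) without changing the evaluation loop.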
Parallelism and the other scaling techniques ensure that complex queries will not compromise data warehouse performance.
In trying to meet end-user demands, enterprises will also need to address data warehouse availability. In global organizations, users require 24x7 uptime to get the information they need. In enterprises with moderate data volumes, high availability can be achieved relatively easily through high levels of redundancy. In enterprises with large data volumes, however, systems must be carefully engineered for robustness through well-designed parallel frameworks.
Trend #2: Integrated Customer View
Obtaining a 360-degree view of the customer is rapidly becoming the single most popular rationale for large-scale data warehousing efforts. Enterprises want a complete picture of each customer across all channels and all lines of business. While this sounds like a simple concept, it can be very difficult to implement. Many enterprises have historically been organized around products, geographies or other business-related dimensions, and their IT systems reflect this. Moving to a customer-centric view requires a big change in the way they collect, store and disseminate information. Enterprises have to integrate the proliferating data sources previously mentioned and must handle data quality issues so that customers are represented accurately across all systems.
Trend #1: Exploding Data Volumes
One of the biggest technology issues facing enterprises today is the explosion in data volumes expected over the next several years. According to Gartner, in 2004 enterprises will be managing 30 times more data than in 1999. And Survey.com projects that the amount of usable data in the average data warehouse will increase 290 percent, to more than 1.2 terabytes, in 2002. E-business is one of the primary culprits in the data explosion, as clickstream data is expected to quickly add terabytes to the data warehouse.
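The growth figures above can be put on a common footing with a little arithmetic. Assuming Gartner's 30-fold increase is spread evenly over the five years from 1999 to 2004, and reading Survey.com's "increase 290 percent" as growth to 3.9 times the starting size, the implied rates are:

```latex
\begin{aligned}
g &= 30^{1/5} \approx 1.97
  &&\text{(annual growth factor: data roughly doubles every year)}\\
V_{\text{start}} &= \frac{1.2\ \text{TB}}{1 + 2.90} \approx 0.31\ \text{TB}
  &&\text{(implied starting size of the average warehouse)}
\end{aligned}
```

If the Survey.com figure instead meant growth to 2.9 times the starting size, the implied starting point would be about 1.2 / 2.9 ≈ 0.41 TB. Either way, the projections describe volumes growing several-fold within a few years.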
Beyond the Web, other customer contact channels add still more data as their number grows. Escalating end-user demands also play a part, as organizations collect more information and store it for longer periods.
The data explosion creates extreme scalability challenges for enterprises. A truly scalable data warehouse allows an enterprise to accommodate increasing data volumes simply by adding more hardware. Scalable data warehouses typically rely on parallel technology frameworks. Fortunately, lower hardware costs are making parallel technology more accessible. Distributed memory parallel processor (DMPP) hardware is becoming less expensive, and alternatives to DMPP are also improving: clustering of symmetric multiprocessing (SMP) servers is evolving as a substitute. Additionally, storage costs continue to decline every year, making it possible for enterprises to keep terabytes of detailed historical data.
This article has looked at some of the major challenges in data warehousing today. Hopefully this list provides food for thought for those involved in data warehousing initiatives and encourages readers to examine how these trends and issues affect their own organizations. While this article has presented only brief suggestions for dealing with data warehousing challenges, I hope readers will use these suggestions as a springboard for further exploration of available solutions.
Dorinne Hoss is a market research analyst at Knightsbridge Solutions, a systems integrator specializing in scalable data warehousing and e-business infrastructures that solve terabyte-class data problems. Hoss specializes in "big data" issues and their strategic impact on enterprise success. She can be reached at dhoss@knightsbridge.com.