At the SDForum’s Business Intelligence SIG meeting at SAP on May 19, 2010, Donovan Schneider, Principal Architect at Salesforce.com, told an overflow crowd how Salesforce.com implemented a dramatically new approach to analytics.
Ever since Bill Inmon and Ralph Kimball popularized data warehouses in the 1990s, organizations have extracted data from line-of-business applications, transformed it to fit a uniform schema, and loaded it into a central data warehouse. Advocates of the data warehouse argue that this separation of functions between line-of-business applications and the data warehouse improves performance by freeing applications from the burden of processing large, complex queries and running those queries through a platform optimized for the purpose. By imposing a single schema on this disparate data, data warehouse practitioners intend to make the data comprehensible to business users.
Schneider, who earned his doctorate in Computer Science at the University of Wisconsin-Madison and worked on data warehouses at Siebel, Yahoo!, and HP Labs, argued that in practice data warehouses are complex and rigid. ETL (extract, transform, and load) routines break constantly as schemas adapt to changing business needs. Security permissions between line-of-business applications and the data warehouse are often inconsistent. And the data in the data warehouse is almost always out of date, whether by hours or days.
“The dirty secret of analytics,” Schneider told the audience, “is that these projects fail horribly, they fail horribly often, and they fail horribly in many different ways.” Citing Gartner statistics that fifty percent of analytics projects fail to meet expectations, Schneider argued that traditional data warehouses are too complex, limiting their use to a small technical elite and failing to provide value to business decision makers.
To avoid these limitations, Schneider said, Salesforce.com rejected the data warehouse approach, providing customers instead with real-time analytics by retrieving data from the same databases that process transactions. If Michael Dell runs a report to see how Dell servers are selling in the Western region, Schneider explained, he will see every single item saved to the system prior to his running the report. This zero latency enables business users to make decisions confident in the timeliness of the data.
Beyond zero latency, Schneider explained, the approach lets business users build their own reports and dashboards in terms they understand, because they themselves defined the underlying entities and attributes to fit their business.
Responding to questions about performance from the analytics experts in the audience, Schneider acknowledged the technical complexities of scaling real-time analytics in the cloud as Salesforce.com serves over 72,000 customers, handles over 200 million API calls a day, and hosts hundreds of thousands of unique applications.
Schneider told the audience that Salesforce.com stores customer data in Oracle databases, which provide a strong foundation of well-proven technology and the transactional support required by business applications. The data for each customer is assigned to one of ten Oracle instances, where it is combined with data from thousands of other customers and marked with a key that uniquely identifies each customer. To overcome the complexities posed by multi-tenancy and customer customizations, Salesforce.com designed a set of tables that combine data from many customers and store values as varchars, a generic data type that can hold data of almost any kind.
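A minimal sketch of the generic multi-tenant storage Schneider described might look like the following. The table and column names here are hypothetical illustrations, not Salesforce.com's actual (non-public) schema; SQLite stands in for Oracle to keep the example self-contained.

```python
import sqlite3

# Hypothetical sketch: every row in a shared data table is tagged with
# the tenant's org id, and all field values are stored as text (the
# varchar-style generic type Schneider described).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE data (
        org_id   TEXT NOT NULL,   -- key uniquely identifying the customer
        obj_type TEXT NOT NULL,   -- the customer-defined entity
        row_id   TEXT NOT NULL,
        field1   TEXT,            -- generic slots; meaning varies per tenant
        field2   TEXT,
        PRIMARY KEY (org_id, obj_type, row_id)
    )
""")

# Two tenants store entirely different kinds of data in the same
# physical table.
conn.execute("INSERT INTO data VALUES ('org1', 'Invoice',  'r1', '2010-05-19', '1500.00')")
conn.execute("INSERT INTO data VALUES ('org2', 'Shipment', 'r1', 'SFO', '42')")

# Every query is scoped by the tenant key, so one customer never sees
# another customer's rows.
rows = conn.execute(
    "SELECT obj_type, field1 FROM data WHERE org_id = ?", ("org1",)
).fetchall()
print(rows)  # [('Invoice', '2010-05-19')]
```

The tradeoff of storing everything as text is that the database can no longer use type information to plan queries, which is exactly the problem the next paragraph addresses.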
Because Oracle’s query optimization engine does not work effectively with varchars and because it does not know about the complex security permissions that determine who can see what data, Salesforce.com built a system of database indexes and its own optimizer that operates in the application tier. The indexes, which are populated within transactions to assure consistency, use more specific data types to improve the performance of joins. Salesforce.com regularly monitors performance and creates additional indexes as required. The optimizer performs pre-queries, compiles statistics, creates execution plans specifying indexes and joins, and submits these plans to the Oracle database for execution.
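The typed index tables populated within transactions might be sketched as follows. This is an illustrative reconstruction under stated assumptions (table names, a numeric amount field, and the `insert_row` helper are all hypothetical), again using SQLite in place of Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (org_id TEXT, row_id TEXT, amount_raw TEXT)")
# Side index table with a properly typed column, so range predicates
# and joins need not compare varchars.
conn.execute("CREATE TABLE num_index (org_id TEXT, row_id TEXT, amount REAL)")

def insert_row(org_id, row_id, amount_raw):
    # Populate the typed index inside the same transaction as the base
    # write, so the two can never be observed out of sync.
    with conn:  # sqlite3 connection as context manager = one transaction
        conn.execute("INSERT INTO data VALUES (?, ?, ?)",
                     (org_id, row_id, amount_raw))
        conn.execute("INSERT INTO num_index VALUES (?, ?, ?)",
                     (org_id, row_id, float(amount_raw)))

insert_row("org1", "r1", "1500.00")
insert_row("org1", "r2", "99.50")

# An application-tier planner would route numeric predicates through
# the typed index rather than the generic varchar column.
big = conn.execute(
    "SELECT row_id FROM num_index WHERE org_id = ? AND amount > ?",
    ("org1", 1000.0),
).fetchall()
print(big)  # [('r1',)]
```

The key design choice, as Schneider described it, is that the decision of which index to use lives in the application tier, where the planner also knows the tenant's security permissions, rather than in the database's own optimizer.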
To further enhance performance, Schneider said, the Salesforce.com user interface and API encourage best practices and discourage users from running excessively large, complex queries that would not only perform poorly for the user but would slow performance for other Salesforce.com customers. While some customers might think of this as a restriction, Schneider said, the tradeoff benefits all customers.
Even with these optimization techniques in place, Schneider acknowledged, the Salesforce.com approach cannot scale to service massive data repositories. “I would not call the friends I worked with at Yahoo to ask them to move their data to Salesforce.com,” Schneider said. Salesforce.com does turn down customers with data sets too large for it to handle, Schneider told the audience. “We are not trying to replace Teradata.” Instead, Salesforce.com seeks to meet the requirements that are common to CRM and the types of line-of-business applications that customers build on its platform.
Schneider said that Salesforce.com is not trying to mimic all of the functionality offered by specialized analytics applications, which are too complex for most users anyway. Instead, Schneider explained, Salesforce.com wants to provide the functionality that creates the most value to its customers in a way that is simple enough for them to actually use. Schneider defined business intelligence as dashboards, reports, list views, and search. While acknowledging the gap between this definition and the definitions offered by other experts in the field, Schneider said that this practical approach to analytics best meets the needs of its customers.
Salesforce.com offers real-time analytics, Schneider concluded, but it is designed as a practical extension of its platform rather than a generic data warehouse in the cloud.
Schneider encouraged everyone to try out the Salesforce.com platform for free at http://www.developerforce.com/events/regular/registration.php.
As James says:
“Salesforce.com is not trying to mimic all of the functionality offered by specialized analytics applications, which are too complex for most users anyway…”
What if a business needs a specialized analytics app and its requirements are not met by generic, simple lists, dashboards, and so on, which is often the case?
I still believe that the traditional data warehouse approach is a better, cleaner, and faster approach to analytics. The fact that many BI/DWH projects fail doesn't mean that the whole DWH approach is bad. The complexity and rigidity of DWH can be reduced, and people should think of innovative ways of achieving that. In the long run, an enterprise DWH pays off and is worth investing in.
Thanks for the comment. I agree the traditional data warehouse approach is the best way to go in many situations, especially when there are multiple data sources, massive quantities of data, and complex reporting requirements. However, platforms like Salesforce (and Microsoft Dynamics CRM with which I’m familiar) provide the ability to build a variety of line-of-business applications, even applications that have nothing to do with CRM. Indeed, these platforms provide a great middle way between buying and building from scratch. And these tools provide basic user-friendly reporting tools, which are becoming more powerful over time. In quite a few cases, these tools just do the trick. Users can get the data they need without even calling IT, which is great. Sure, it won’t work in all cases. But when it does, it saves time and costs.