Do you use data to support your decisions? Most people use data to support even the simplest decisions on a daily basis. What route should I take to beat the traffic? Is the interest rate low enough to refinance my house? Should I go to this meeting? When should I wake up to make my 8:00 meeting?
Most business users evaluate data to support their decisions. In a typical scenario, a business user has a hypothesis and explores data to evaluate it. For example, a business user might hypothesize that the company should receive a good return on investment (ROI) from direct marketing, because last year’s data showed a 50% ROI from direct marketing.
Data Mining Benefits
In contrast, data mining helps to establish new hypotheses. The goal of data mining is to extract knowledge from large quantities of data and to expose previously unknown, interesting patterns such as groups of data records, anomalies, and dependencies. Ben Averch from eBay had a great quote on this in his TDWI presentation earlier this year: “The metrics you know are cheap. The metrics you don’t know are expensive – but high in potential ROI.”
If you shop online, you are most likely familiar with the recommendations that online sellers display on their sites: “Customers Who Bought This Item Also Bought…” and “Frequently Bought Together…” These recommendations are the result of data mining and predictive analytics.
According to International Data Corporation’s (IDC) 2011 research, the median return on investment for predictive analytics projects is 250%, which represents an increase from the 145% average ROI from IDC’s 2003 study.
Here are just a few examples of how businesses use data mining solutions:
- Gain new customers and reduce customer attrition
- Minimize risk and detect fraud
- Anticipate resource demands and future sales
- Increase marketing campaign responses
- Analyze customer profiles and suggest products to purchase
- Identify and resolve transportation bottlenecks
- Pinpoint the most effective law enforcement methods
… and many more
Generally, data mining algorithms look for patterns and trends in data based on the relationships between the input columns (age, gender, location, profession, time, etc.) and the outcome column (e.g., the decision to purchase a particular product).
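To make the idea of input columns and an outcome column concrete, here is a minimal Python sketch. It is not how SQL Server implements any of its algorithms; it only shows the simplest possible "relationship" between one input column and the outcome, namely the purchase rate for each value of that input. The records, the column names (`age_group`, `location`, `purchased`), and the helper `purchase_rate_by` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical customer records: two input columns plus one outcome column.
records = [
    {"age_group": "18-34", "location": "urban", "purchased": True},
    {"age_group": "18-34", "location": "rural", "purchased": True},
    {"age_group": "18-34", "location": "urban", "purchased": False},
    {"age_group": "35-54", "location": "urban", "purchased": False},
    {"age_group": "35-54", "location": "rural", "purchased": False},
    {"age_group": "35-54", "location": "urban", "purchased": True},
    {"age_group": "55+", "location": "rural", "purchased": False},
    {"age_group": "55+", "location": "urban", "purchased": False},
]


def purchase_rate_by(records, input_column, outcome_column="purchased"):
    """Return {input value: share of records with a positive outcome}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        value = row[input_column]
        totals[value] += 1
        if row[outcome_column]:
            positives[value] += 1
    return {value: positives[value] / totals[value] for value in totals}


rates_by_age = purchase_rate_by(records, "age_group")
rates_by_location = purchase_rate_by(records, "location")

for column, rates in (("age_group", rates_by_age), ("location", rates_by_location)):
    for value, rate in sorted(rates.items()):
        print(f"{column}={value}: {rate:.0%} purchased")
```

In this toy data the purchase rate varies more across age groups than across locations, which is the kind of signal a real algorithm would pick up and weigh across many columns at once.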
Data Mining with SQL Server 2012
If your business already owns SQL Server 2005, 2008, or 2012 in Standard Edition or above, you have a number of data mining algorithms at your disposal as part of SQL Server Analysis Services.
Simply install the Data Mining Add-ins from http://www.microsoft.com/download/en/details.aspx?id=7294 and use their cool reporting capabilities in Excel and Visio.
Microsoft provides helpful guidance on “choosing the right data mining algorithm”; here is a summary:
- A discrete result has a limited number of states, such as a customer’s yes-or-no decision about purchasing a product, or a law firm’s decision to litigate a matter based on the probability of winning or losing.
- A continuous result has a range of values, such as seasonal trends or a company’s future quarterly expenses.
- A sequence predicts a customer’s navigation on a website: what is the probability of a customer clicking on a particular link? This can help a company direct a customer along the fastest path to a sale.
- Groups of common items are items that typically appear together. This can help to suggest additional products for purchase or to design a packaged solution.
- Groups of similar items help to find clusters of customers that exhibit similar behavior, or to identify personas for software development.
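As an illustration of the "groups of common items" case, here is a small Python sketch that counts how often pairs of items show up in the same basket and turns those counts into "frequently bought together" suggestions. It is plain co-occurrence counting, not the association algorithm that ships with SQL Server Analysis Services, and the baskets, item names, and function names are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: each basket is the set of items in one order.
baskets = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"laptop", "laptop bag"},
    {"mouse", "mouse pad"},
    {"laptop", "mouse", "mouse pad"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (a, b) ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def bought_together(baskets, item, top_n=2):
    """Return up to top_n (other_item, confidence) suggestions for `item`.

    Confidence is the share of baskets containing `item` that also
    contain the other item.
    """
    item_total = sum(1 for basket in baskets if item in basket)
    if item_total == 0:
        return []
    together = Counter()
    for (first, second), count in pair_counts(baskets).items():
        if first == item:
            together[second] += count
        elif second == item:
            together[first] += count
    # Rank by count (descending), then by name so ties are deterministic.
    ranked = sorted(together.items(), key=lambda pair: (-pair[1], pair[0]))
    return [(other, count / item_total) for other, count in ranked[:top_n]]


suggestions = bought_together(baskets, "laptop")
for other, confidence in suggestions:
    print(f"Customers who bought a laptop also bought: {other} ({confidence:.0%})")
```

With this sample data, three of the four laptop baskets also contain a mouse and two contain a laptop bag, so those two items come back as the top suggestions.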
SQL Server makes it easy to switch from one data mining algorithm to another, to experiment with models and to discover the model that provides the most useful insights. You can leverage insights derived by one model to tune your inputs for another model. You can also combine results of data mining with On-Line Analytical Processing (OLAP).
In addition to the algorithms described earlier, SQL Server includes a Text Mining algorithm that analyzes unstructured text data, such as the “comments” section of a customer satisfaction survey. The Text Mining algorithm is available in SQL Server Integration Services. While it is straightforward to create a package that processes text input with the Text Mining algorithm, it is somewhat unfortunate that this algorithm does not follow the same implementation pattern as the other data mining algorithms in SQL Server Analysis Services.
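For a rough feel of what analyzing a “comments” field involves, here is a minimal Python sketch that counts the most frequent terms across a few survey comments. It is only a term-frequency count under simple assumptions (lowercasing, a tiny hand-picked stop-word list, made-up comments) and does not reproduce what SQL Server Integration Services does internally.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "was", "and", "is", "to", "but", "very", "it", "i"}

# Hypothetical free-text answers from a customer satisfaction survey.
comments = [
    "Delivery was slow, but the support team was helpful.",
    "Very slow delivery and the package was damaged.",
    "Support was helpful and friendly.",
]


def term_frequencies(comments, stop_words=STOP_WORDS):
    """Count how often each non-stop-word term appears across all comments."""
    counts = Counter()
    for comment in comments:
        words = re.findall(r"[a-z']+", comment.lower())
        counts.update(word for word in words if word not in stop_words)
    return counts


frequencies = term_frequencies(comments)
for term, count in frequencies.most_common(4):
    print(f"{term}: {count}")
```

Even this crude count surfaces the recurring themes in the sample comments (delivery, slow, support, helpful), which is the starting point for grouping or scoring comments further.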
Discovering insight with data mining can be very rewarding. Knowledge is the new gold – happy mining!