Jim Kashner Interview by Dan Conway

Thought Leader Interview

Jim Kashner: Bottom Line on Data Mining

Chief Data Mining Technology Officer, Teradata, a division of NCR

Preface

Dan Conway interviewed Jim Kashner, the chief technology officer for data mining at NCR's Data Mining Lab. The San Diego-based Lab gives companies the framework, mentoring, and toolkit to test out data mining and conduct pilot projects before implementing long-term solutions on their own. Based on his experience with these early data miners at the Lab, Kashner has some helpful advice and insights for companies exploring data mining.

Q1: What kind of business cases have companies working with the Data Mining Lab been targeting?
Kashner's Response: The early adopters we have worked with focused on one or two specific questions about customer behavior. Typically they want to quantify aspects of customers' behavior and then group customers according to the categories that emerge. For example, we worked with a large international bank that wanted to control the costs its customers incurred when using ATMs at other banks. The company wanted answers to four key questions:
What constitutes excessive customer use of competitors' ATMs?
Which customers are incurring excessive competitor ATM costs?
What is their value to our bank?
What should we be aware of when we use these results?

Our first discovery was that 10 percent of the bank's customers were incurring 90 percent of the ATM costs. This answer could have been acquired through traditional decision-support analyses. However, data mining also uncovered the fact that of the 10 percent who were incurring the costs, 80 percent were low-value customers. We then used several multivariate data mining techniques to understand the future customer value for each of the current low-value customers and to quantify the concept of "excessive use."
Data mining also paid off in answering the next logical question asked by the bank: Should we rethink how we service the low-value 80 percent? To answer this question, we looked at more behavioral patterns. Interestingly, about 30 percent of the low-value customers turned out to be high-potential customers: They were students. Not necessarily a startling discovery, but one that would have been difficult to make as precisely without massive amounts of detailed data and multivariate analytical techniques. The most useful discovery that the bank made in pursuing the ATM cost problem was that a competitor was targeting college campuses for new ATM installations. No other banks were building brand equity on campuses, and college students were incurring ATM costs because another bank had a convenient and almost exclusive presence there. Through the data mining experience, the bank was able to answer its initial questions. But more importantly, it uncovered a competitor's strategy before its high-potential customers had defected.
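The concentration finding in this answer (a small share of customers incurring most of the ATM cost) is the kind of result a simple ranking can produce before any multivariate work begins. The following is a minimal Python sketch of that calculation, not the Lab's actual analysis; the function name, the customer IDs, and the fee amounts are all hypothetical.

```python
def cost_share_of_top_customers(costs_by_customer, top_fraction=0.10):
    """Return the share of total cost incurred by the costliest customers.

    `costs_by_customer` maps a customer ID to that customer's total cost.
    `top_fraction` is the fraction of customers treated as the "top" group.
    """
    if not costs_by_customer:
        raise ValueError("no customers supplied")
    ranked = sorted(costs_by_customer.values(), reverse=True)
    # Always include at least one customer in the top group.
    top_n = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0


# Hypothetical yearly competitor-ATM fees per customer (illustrative only).
atm_costs = {
    "c01": 180.0, "c02": 4.0, "c03": 2.0, "c04": 3.0, "c05": 1.0,
    "c06": 2.0, "c07": 3.0, "c08": 2.0, "c09": 1.0, "c10": 2.0,
}
share = cost_share_of_top_customers(atm_costs, top_fraction=0.10)
print(f"Top 10% of customers incur {share:.0%} of the cost")
```

A ranking like this only answers the "who incurs the cost" question; estimating the future value of those customers, as described above, needs the multivariate techniques Kashner mentions.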

Q2: What other industries seem to be turning to data mining?
Kashner's Response: Interest has been mixed across many industries. We've worked with banks, retailers, manufacturers of packaged goods, telecommunications companies, and transportation companies, and we are starting to see insurance companies come through the door. Most interest in data mining seems to be coming from marketing and finance. Marketing staffs need to achieve a higher return on their marketing investment and are looking for ways to focus campaigns on the customers most likely to buy a particular product or service. Finance is interested in aligning cost-to-serve with the revenue, or potential revenue, derived from a customer or customer segment. High on the list of both functional groups is understanding and mitigating customer attrition.

Q3: Could you give me an example of the kind of marketing related studies you've seen in the Lab?
Kashner's Response: One of the studies we did is a classic example of marketing response modeling. A national telecommunications company was getting a response rate of only 0.5 percent on its marketing campaigns. It wanted to find out which customers were likely to buy bundles of services. The company was considering building data mining capabilities but first wanted to assess how well data mining works. Working at the Lab, we jointly created a propensity-to-buy model that helps the company predict which services, when bundled, are likely to be purchased by which customers. Deploying the model within one sales territory increased sales of bundled services by a factor of 10. The results proved the accuracy of the model's predictions and convinced the company that data mining pays off. The limited test yielded such good results that the company has since made the model regional. The company also learned enough about the data mining process we used, and developed sufficient skill with analytical tools, that it has since been able to build out the model independently.
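One common way to judge a response model like the one described in this answer is lift: the response rate among the customers the model scores highest, divided by the overall response rate. The sketch below illustrates that calculation in Python. It assumes propensity scores already exist from some model; the function names and the sample data are hypothetical and do not come from the telecommunications study.

```python
def response_rate(responses):
    """Fraction of contacted customers who bought (responses are 0/1 flags)."""
    return sum(responses) / len(responses) if responses else 0.0


def lift_in_top_segment(scored, top_fraction=0.10):
    """Response rate in the highest-scored segment divided by the overall rate.

    `scored` is a list of (propensity_score, responded) pairs, where
    `responded` is 1 if the customer bought and 0 otherwise.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))
    overall = response_rate([r for _, r in ranked])
    top = response_rate([r for _, r in ranked[:top_n]])
    return top / overall if overall else float("nan")


# Hypothetical campaign: 200 customers, one buyer (a 0.5 percent response rate),
# and the model happens to give that buyer the highest score.
scored_customers = [(0.9, 1)] + [(0.8 - i * 0.001, 0) for i in range(199)]
lift = lift_in_top_segment(scored_customers, top_fraction=0.10)
print(f"Lift in the top 10% of scores: {lift:.1f}x")
```

A lift near 1 would mean the scores are no better than contacting customers at random; the larger the lift, the more a campaign gains by targeting only the top-scored segment.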

Q4: In general, how do you approach answering these business questions? Is it simply a matter of picking the right statistical technique or is there more to it?
Kashner's Response: There's a lot more to it. We've developed a formal data mining framework that spans the two overarching streams of activity that run through the entire engagement: project management and knowledge transfer. Data mining activities are organized into five consecutive stages: Business Issues, Architecture Preparation, Data Preparation, Analytics, and Knowledge Delivery.

During the Business Issues stage we identify, clarify, and qualify the business questions for data mining. Data mining requires scarcer resources than most other business analysis techniques, so if a question can be best answered using another approach, we identify and recommend the better alternative, and end the process right there.

We ask the following kinds of questions to assess suitability for data mining:
Does answering the question require understanding complex relationships among many variables?
Are there prerequisite questions to be answered? What are they?
Does answering the question require an inference about a phenomenon or the future?
Will the answer describe a complex phenomenon?
What level of technology is required to manage the complexity of answering the question?
How often does the question need to be reexamined and the answer refreshed?

During the Architecture Preparation stage we explore the technical implications of the approach and plan for them. For example, we consider and plan for the technical implications of adding analytical modeling and facilities to the customer's current hardware and software architecture, identify analytical software appropriate for answering the defined business questions, define the dependencies among analytical modeling and other application software, identify other required enabling technologies, cite infrastructure constraints and limitations, outline the characteristics of the production environment required to use and maintain analytical models, and define any data transport issues between the analytical modeling and production environments.

During the Data Preparation stage, we identify, extract, and validate large data samples, move the samples into the analytical environment, and test them for applicability to the business questions. We also construct and refine preliminary models to ensure the strongest results.

Data preparation is no trivial task; it typically takes up 40 to 60 percent of the schedule. Although customers who have already cleansed and transformed their data in a fully populated data warehouse are a few steps ahead, they will still have work to do. Data quality thresholds for data mining are much higher than those for data warehousing. For example, we comb the data looking for null values, which aren't acceptable for certain analytical techniques. Once we know how many null values occur in each data element, we must decide whether to replace the nulls with a reasonable and useful value or to exclude the element altogether. We also assess the variability of each element to confirm that the data is varied enough to allow reliable and valid inferences. Finally, as we understand more about what information may exist in the data, we try to be open to including other powerful data elements that emerge. It's common to develop, validate, and test many large data samples before finding the best of the relevant data elements that meet the quality criteria.
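The null-value and variability checks described above can be sketched in a few lines of Python. This is an illustration only: the function names, the 20 percent null threshold, the choice of the median as the replacement value, and the sample data are all assumptions, not the Lab's criteria.

```python
from statistics import median, pvariance


def profile_element(values):
    """Summarize one data element: share of nulls, distinct values, variance."""
    present = [v for v in values if v is not None]
    return {
        "null_rate": 1 - len(present) / len(values) if values else 1.0,
        "distinct": len(set(present)),
        "variance": pvariance(present) if present else 0.0,
    }


def prepare_element(values, max_null_rate=0.2):
    """Return the element with nulls replaced by the median, or None to exclude it.

    The threshold is an illustrative assumption, not a recommended value.
    """
    stats = profile_element(values)
    # Exclude elements that are mostly missing or that never vary.
    if stats["null_rate"] > max_null_rate or stats["distinct"] < 2:
        return None
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


# Hypothetical data elements from a customer sample.
balance = [120.0, None, 80.0, 100.0, 95.0]   # one null, varied values
region_code = [1, 1, 1, 1, 1]                # no variability
fax_count = [None, None, None, 5, None]      # mostly null

for name, column in [("balance", balance), ("region_code", region_code),
                     ("fax_count", fax_count)]:
    prepared = prepare_element(column)
    print(name, "excluded" if prepared is None else prepared)
```

In practice the replacement value and the exclusion thresholds depend on the analytical technique and the business question, which is part of why this stage tends to take several iterations.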

Then we're ready for the Analytics stage, which involves building, testing, and validating the analytical models. Analytical modeling is when the fun really begins. By the end of the Analytics stage, we have interpreted and documented the answers to the business questions.

In the Knowledge Delivery stage, once we know that the models yield valid and useful results, we build out the applications for users so that the models can be integrated into routine business processes. We also frequently conduct several formal training sessions during this stage for those who will use, maintain, and refresh the models.

Q6: What kind of timeline is involved in getting answers from these kinds of data mining implementations?
Kashner's Response: The banking example I mentioned took about 13 weeks from start to finish. Although I'm tempted to say that this is typical for an initial data mining project, we determine the scope of every project independently. Our data mining projects with customers have ranged from four weeks to six months in duration. Complexity of the business question, architecture and technology requirements, amount of data preparation, intricacies of analytics, and the magnitude of knowledge transfer and delivery all influence the duration of a data mining project.

Q7: If you could list the critical success factors for data mining, what would they be?
Kashner's Response: Most importantly, be very clear about the question you want to answer and that data mining is the best way to answer it. Questions best suited for data mining require describing or explaining a relatively complex business phenomenon or making an inference about future events or behavior. Questions that require summarization or simple description across a few variables are usually answered more efficiently and effectively using traditional decision-support techniques and simple descriptive statistics. Staff the effort with people who are curious and eager to learn about data mining, who demonstrate an ability to think analytically, and who have good information technology skills. An advanced degree in applied statistics or applied mathematics is desirable, but not essential.

Active participation from business experts and users of data mining results is central to the success of every data mining project. People who have served as data warehouse analysts or liaisons between business and information technology professionals tend to have many of the problem-solving skills and the experience necessary for the success of data mining efforts. The senior professionals who developed and are sophisticated users of the data warehouse are excellent candidates because they're already familiar with the business meaning of the warehouse data and how and why the elements were developed. Learning the analytical and statistical techniques required for a particular data mining project takes much less time than developing a comprehensive and deep understanding of database content and rationale.

Do as much qualification, examination, and analysis of data elements within the data warehouse as possible. As I mentioned earlier, identifying relevant, high-quality data elements takes the largest portion of the schedule. It's time-consuming for two reasons: First, as elements fail to meet quality standards, you need to return to the database to find other elements to evaluate. This process tends to require several iterations. Second, as you learn more and more about the data, new insights and options arise that may require a refinement of the analytical approach.

Q8: What happens if a company has a valid business case for data mining but doesn't have the right data?
Kashner's Response: In some cases, companies can find ways to acquire the data they need. For example, one data mining project we ran was done as a three-way partnership: a manufacturer of packaged goods, a retailer, and the Lab. The manufacturer wanted to learn about customer preferences so that it could better align its products with what people want: flavors and quantities, for example. Unfortunately, it didn't have access to information about the people who bought its products, so it approached the retailers who sell its products, looking for one that was tracking customers. Very few were, but the manufacturer did find a retailer that was collecting information on what individual customers were buying as part of a loyalty card program.

Fortunately, all of the parties involved were committed to getting reliable and valid results. For this investigation, we wanted to understand the effective components of customer loyalty campaigns. We established control and experimental marketing programs, and measured success by individual product purchases. When analyzing the data, one of the things we looked for was which products were consistently purchased together.
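Looking for products that are consistently purchased together is often done by counting how frequently each pair of products appears in the same basket. The Python sketch below shows that pair-counting idea in its simplest form. It is not the method used in the project described here, and the function name and basket contents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Map each product pair to the fraction of baskets containing both products."""
    counts = Counter()
    for basket in baskets:
        # De-duplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


# Hypothetical loyalty-card baskets (one list of products per shopping trip).
baskets = [
    ["chips", "salsa", "soda"],
    ["chips", "salsa"],
    ["soda", "gum"],
    ["chips", "salsa", "gum"],
]
support = pair_support(baskets)
# List the pairs that appear together most often first.
for pair, share in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, f"{share:.0%}")
```

Counting pairs this way grows quickly with the number of products, so a real analysis over loyalty card data would likely restrict the product set or use a dedicated association-rule tool.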

The manufacturer and retailer now understand the effectiveness of their customer loyalty campaigns in much greater depth, breadth, and precision than was possible using traditional market analysis methods such as focus groups. Their new ability to track, analyze, and understand individual customer purchasing behavior over time is a very large step away from mass marketing, toward mass customization.

Q9: How do you select the right data mining tool? What criteria are important when evaluating alternatives?
Kashner's Response: In the beginning, select an analytical tool whose breadth and depth matches the business need for data mining. If there's a choice, choose the tool that your data miners already know. This may sound a bit surprising, but most folks who have done post-graduate work have had to use analytical tools such as those from SAS Institute and SPSS Inc. So it's very likely that the people qualified to do data mining are familiar with at least one of the more sophisticated and proven tools on the market.

Every organization will, and should, have different selection criteria, but the following four criteria should be part of every tool evaluation: depth of analytical technique, breadth of analytical technique, data handling features, and ease of use. (Depth refers to the quality and reliability of each analytical technique; breadth refers to the number of analytical techniques supported by the tool.)

There are many pretenders in the market, but only a dozen or so tools actually do well what they claim to do. Evaluating the quality of algorithms underlying the marketing claims requires substantial statistical expertise. If you don't have this advanced expertise and experience internally, I recommend hiring an independent consultant who specializes in data mining analytics to help evaluate and narrow your choice of tools. Linking back to business need, a tool may be perfectly acceptable if it handles one or two analytical techniques with great depth and sophistication and two or three others superficially, as long as the depth of each technique matches the depth of the analysis you need to answer the business question.

Analytical tools are expensive, and it takes a while to learn to use one effectively. A company might know that it needs to build a suite of tools over time, but it would be wise to choose one initially that delivers most of the required functionality, learn it well, and then supplement it with special function tools as data mining requirements evolve.

The two most obvious concerns related to data are volume and manipulation. Every tool has a ceiling on volume; it is important to identify whether or not that ceiling is artificially low. If it is, the tool was probably designed as an analytical engine for small data sets rather than data mining. The ability to handle large volumes is a recent requirement and emerging trend. Many of the tools on the market were developed before the volume requirement was clear and have not updated their data volume capabilities. In addition to volume, the ease with which data is manipulated and transformed is important to consider if the Data Preparation step is to be efficient and well managed. I can't emphasize enough how crucial it is to discover and test the characteristics of each data element until you're very sure that you've got the right elements and know all of their characteristics. In general, the more a tool supports an iterative process, the better the results.

Ease of use has to do with trade-offs and familiarity. As in many things, the more complex a tool, the longer it will take to learn. The tool with the simplest, most intuitive user interface may have your data miners up and running very quickly, but as their sophistication with data mining grows, they might find the tool limiting. There are a couple of ways to offset ease of use in favor of greater depth. As I mentioned before, choosing the most sophisticated tool that is already familiar to your data miners has several advantages. The learning curve is shorter and chances are that tool offers much greater breadth and depth than a tool with a much slicker user interface. A second strategy is to partner with someone who knows the most sophisticated tool and knows how to do data mining until your internal team has developed basic competence and confidence in that tool. This kind of knowledge transfer capability is what attracts a lot of companies to the Lab.

As premier tool manufacturers continue to put more data mining functions and sophisticated techniques into the data warehouse, data mining becomes a more powerful and efficient process. As user interfaces and help engines continue to improve for tools that operate in the data warehouse, data mining will attract a wider and more diverse pool of practitioners. And with increasing numbers of practitioners with diverse ideas, data mining is coalescing as a formal discipline that will contribute to elevating decision support.

Q10: What is the future for data mining and for NCR's Data Mining Lab?
Kashner's Response: Teradata pioneered in-database technology to streamline data mining and to scale it for businesses with large numbers of customers or transactions (telecommunications, retail, financial services). Currently, businesses can run simple predictive analysis in real time; Teradata will bring complex analysis to real time as well.

1. Teradata continues to pioneer data mining advancements. Teradata was first to optimize the data mining environment by integrating in-database technology, which reduced the data mining cycle by 50 percent and allowed businesses to integrate predictive analysis into their business processes quickly and efficiently.

2. Teradata was also able to bring predictive analytics to enterprise scale implementations by running models against millions if not billions of customer records, not just samples.

3. Future advancements include bringing complex analyses that require both historical and current data into real-time environments, where predictions can be made in under a second. For example, credit card fraud is typically identified based on known fraudulent patterns, but fraudulent behavior varies from customer to customer. By understanding detailed customer behavior, this analysis can be refined to predict fraudulent behavior at the customer level.
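To illustrate what "at a customer level" could mean in the fraud example, the Python sketch below compares a new transaction with that customer's own spending history rather than with one global pattern. It uses a simple standard-deviation rule chosen for illustration; it is not Teradata's method, and the function name, threshold, and transaction amounts are hypothetical.

```python
from statistics import mean, stdev


def is_unusual_for_customer(history, amount, threshold=3.0):
    """Flag `amount` if it lies far from this customer's own typical spending.

    `history` holds the customer's past transaction amounts. The amount is
    flagged when it is more than `threshold` standard deviations from the mean.
    """
    if len(history) < 2:
        return False  # too little history to judge
    spread = stdev(history)
    if spread == 0:
        # Every past amount was identical, so any different amount stands out.
        return amount != history[0]
    return abs(amount - mean(history)) / spread > threshold


# Hypothetical histories for two customers with very different habits.
frugal_customer = [20.0, 25.0, 22.0, 18.0, 24.0]
big_spender = [900.0, 1200.0, 1100.0, 950.0, 1050.0]

# The same 1,000 purchase is unusual for one customer and routine for the other.
print(is_unusual_for_customer(frugal_customer, 1000.0))
print(is_unusual_for_customer(big_spender, 1000.0))
```

The point of the sketch is only that the same transaction can be suspicious for one customer and ordinary for another; a production fraud model would draw on far more behavioral detail than a single amount.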

About Jim Kashner
Jim is Chief Technology Officer of Data Mining for Teradata, and has been with Teradata since 1993. He has been working in the areas of analytical modeling and information technology for over twenty years.
As co-founder of Teradata's Data Mining Lab in 1996, Kashner has played a large role in the development and delivery of Teradata's Professional Services and product offerings in the areas of data mining, knowledge discovery, and CRM.
Kashner has undergraduate and graduate degrees in Psychology and Quantitative Methods from Arizona State University. He taught psychology, statistics, and advanced quantitative methods at Arizona State, and conducted research there. Kashner is a member of several professional organizations, including the American Statistical Association, the American Association for Artificial Intelligence, the American Mathematical Association, IEEE, and the American Federation of Musicians.

Citation
Conway, D., "Jim Kashner Interview: Bottom Line on Data Mining", DSSResources.COM, 11/26/2004.

Dan Conway, Director, Public Relations, Teradata, a division of NCR Corporation, provided permission to post this interview at DSSResources.COM on September 1, 2004. The interview was posted at DSSResources.COM on Friday, November 26, 2004.