Category: Business

Classifying Machine Learning Problems

9/4/2014

When approaching CRM analysis using machine learning, it helps to first understand and categorize the problem.

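The diagram below splits problems along two dimensions: classification versus regression, and batch versus real-time. Each of the resulting quadrants has its own unique set of data collection and processing requirements.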

Classification problems result in labeling data.

Regression problems result in making a numeric prediction, such as a probability.

Batch problems are done "offline" and often use data sets spanning several days or years. The value of batch problems increases with the number of unique data points that can be added to the training set.

Real-Time problems operate on smaller data samples, but are often trained on a very large corpus of data, such as log file history or social media posts.
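To make the classification versus regression distinction concrete, here is a minimal sketch in plain Python. It uses a tiny nearest-neighbour lookup over made-up history; the data, the feature values, and the function names (`nearest`, `classify`, `regress`) are all assumptions for illustration, not something taken from a real CRM. The same neighbours produce a label in one case and a number in the other.

```python
from collections import Counter
from math import dist

# Hypothetical history: (feature vector, label, numeric value)
history = [
    ((1.0, 1.0), "Win", 0.9),
    ((1.5, 2.0), "Win", 0.8),
    ((2.0, 1.0), "Win", 0.7),
    ((8.0, 9.0), "Loss", 0.2),
    ((9.0, 8.0), "Loss", 0.1),
    ((8.5, 9.5), "Loss", 0.3),
]

def nearest(features, k=3):
    """Return the k historical rows closest to the given feature vector."""
    return sorted(history, key=lambda row: dist(row[0], features))[:k]

def classify(features, k=3):
    """Classification: the result is a label (majority vote of the neighbours)."""
    votes = Counter(label for _, label, _ in nearest(features, k))
    return votes.most_common(1)[0][0]

def regress(features, k=3):
    """Regression: the result is a number (mean of the neighbours' values)."""
    values = [value for _, _, value in nearest(features, k)]
    return sum(values) / len(values)

new_record = (1.2, 1.1)          # hypothetical unseen record
label = classify(new_record)     # a label such as "Win"
score = regress(new_record)      # a number, here roughly 0.8
```

A real project would use a proper library (Weka, for instance) and scaled features; the point here is only the shape of the output.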

Some examples of machine learning problems at the intersections of these dimensions:

Opportunity Win/Loss (Batch Classification): This could be a weekly exercise of reviewing pipeline opportunities and letting a classification algorithm predict "Win" or "Loss" based on past Opportunities.

Loyalty Programs (Batch Regression): An airline may run monthly batch predictions to determine how many reward points might incentivize frequent flyers to accept an offer.

Twitter Sentiment (Real-Time Classification): Analyzing the "firehose" of tweets to determine if customers are happy or upset with a particular brand.

Churn Prediction (Real-Time Regression): A call center may make predictions on whether a customer is likely to attrit based on real-time information. Higher-probability churn predictions may result in authorizing Customer Service Representatives to offer retention incentives.
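As a rough sketch of the churn example, the snippet below turns a few customer attributes into a probability with a logistic function, then compares it against a cut-off to decide whether a representative may offer an incentive. The feature names, weights, bias, and the 0.7 threshold are all invented for illustration; a real model would learn its weights from historical data.

```python
import math

# Hypothetical weights; in practice a trained model would supply these.
WEIGHTS = {
    "support_calls_30d": 0.6,
    "months_as_customer": -0.05,
    "late_payments": 0.8,
}
BIAS = -2.0
RETENTION_THRESHOLD = 0.7  # assumed business policy, not a standard value

def churn_probability(customer):
    """Regression output: a churn probability between 0 and 1."""
    score = BIAS + sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))

def may_offer_incentive(customer):
    """Authorize a retention offer only for high-probability churn."""
    return churn_probability(customer) >= RETENTION_THRESHOLD

# Two hypothetical callers
at_risk = {"support_calls_30d": 6, "months_as_customer": 4, "late_payments": 2}
loyal = {"support_calls_30d": 0, "months_as_customer": 60, "late_payments": 0}

at_risk_offer = may_offer_incentive(at_risk)
loyal_offer = may_offer_incentive(loyal)
```

Keeping the threshold separate from the model means the business can tune how generous it is with incentives without retraining anything.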

Note: Real-Time machine learning can be further sub-categorized by when the actual training takes place. Some applications evaluate data in real time using a previously trained, static model that goes relatively unchanged; weather prediction is one example.

Adaptive learning algorithms simultaneously evaluate real-time data streams AND re-train the evaluation model, typically based on a pre-defined trailing sample window, or when sufficient human feedback signal has been collected to warrant a retraining.
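The trailing-window idea can be sketched in a few lines. In this deliberately simple stand-in, the "model" is just the mean of the most recent observations, and it is refit after every few new data points. The class name, the window size, and the retraining interval are assumptions chosen for the example; a production system would swap in a real learning algorithm.

```python
from collections import deque

class TrailingWindowModel:
    """Predicts the next value as the mean of a trailing window of observations."""

    def __init__(self, window_size=5, retrain_every=2):
        self.window = deque(maxlen=window_size)  # old samples fall off automatically
        self.retrain_every = retrain_every
        self.seen_since_retrain = 0
        self.estimate = 0.0      # the "trained model": a single fitted number
        self.retrain_count = 0

    def predict(self):
        return self.estimate

    def observe(self, value):
        """Evaluate with the current model, then fold the new sample in."""
        prediction = self.predict()
        self.window.append(value)
        self.seen_since_retrain += 1
        if self.seen_since_retrain >= self.retrain_every:
            self.retrain()
        return prediction

    def retrain(self):
        """Refit the model using only the samples still inside the window."""
        self.estimate = sum(self.window) / len(self.window)
        self.seen_since_retrain = 0
        self.retrain_count += 1

model = TrailingWindowModel(window_size=3, retrain_every=2)
stream = [10, 12, 50, 52, 54, 56]   # hypothetical stream whose level shifts upward
predictions = [model.observe(value) for value in stream]
```

Because the window only holds the last three samples, the estimate drifts toward the new level of the stream instead of being anchored by old data. Retraining on human feedback would follow the same pattern, with the trigger being a count of collected feedback items rather than a count of raw samples.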


Welcome to Data Mining CRM!

8/30/2014


"Begin With The End in Mind" - Stephen Covey

Welcome to Data Mining CRM! This blog documents lessons learned applying various data science and machine learning techniques to Customer Relationship Management (CRM) data.

Salesforce.com CRM and Weka are my primary tools, both of which have free Developer tools available. Click on the "Resources" page for links to the tools discussed in this blog.

Audience
My interests span both business and technology, so this blog is written for three audiences:

Financial Decision Makers
CFOs, Finance Executives, or Board Members who have a fiscal or fiduciary responsibility to an organization. Blog posts categorized as "Financial" will explore the ROI of data mining and how to set up data mining initiatives for success.

Business Decision Makers
Line-of-business leaders: VPs of Sales, Analysts, and other business users. Blog entries tagged "Business" will "begin with the end in mind": first identify the business objectives to be achieved, then work backwards to apply data mining techniques.

Technical Decision Makers
Developers, Analysts, Architects, Data Scientists, Statisticians: anyone who gets hands-on with implementing data mining and machine learning technology. Blog entries tagged "Technical" will explore the full lifecycle of data mining, from building training data sets, classifying, and making predictions to making data mining a repeatable operational process.

Personal Journey to Data Mining

My personal journey to data mining began with attempts at applying Edward Tufte's information architecture print techniques to web dashboard designs and working backwards to understand how data must be structured to support rich analytics visualizations. This evolved into developing tools for analyzing site uptime logs, dabbling in predicting system behaviors, and developing a log analytics service (Logalytics.io).

Several years were spent learning how to prepare and filter data so that it can be analyzed (NYTimes says about 50%-80% of data mining is "janitor work"... and yes, that's true).

Tufte's multi-variate visualizations help humans identify patterns and correlations that are not evident by looking at the raw data. Can computers be trained to identify these patterns? If so, what is the future impact on CRM dashboard design and information architecture?

Identifying potential customer opportunities involves creating reports and dashboards that apply some commonly understood correlations: "Show me all customers who have spent in excess of X dollars over the past Y months" or "Show me all customers who have opened a newsletter email or clicked on a particular link for a particular campaign".
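For reference, the first of those classic report rules looks something like this when written as code. The purchase records, customer names, and the `big_spenders` function are made up for the example, and a month is approximated as 30 days to keep the date arithmetic short.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical purchase records: (customer, purchase date, amount in dollars)
purchases = [
    ("Acme", date(2014, 8, 15), 700.0),
    ("Acme", date(2014, 7, 1), 500.0),
    ("Globex", date(2014, 8, 20), 300.0),
    ("Initech", date(2014, 1, 10), 5000.0),
]

def big_spenders(records, min_total, months, today):
    """Customers who spent more than min_total over the past `months` months."""
    cutoff = today - timedelta(days=30 * months)  # a month approximated as 30 days
    totals = defaultdict(float)
    for customer, purchased_on, amount in records:
        if purchased_on >= cutoff:
            totals[customer] += amount
    return sorted(name for name, total in totals.items() if total > min_total)

# "Spent in excess of X dollars over the past Y months" with X=1000, Y=3
report = big_spenders(purchases, min_total=1000.0, months=3, today=date(2014, 9, 1))
```

Note that the rule itself (X, Y, and which fields matter) is chosen by a person up front. The report can only confirm a correlation someone already suspected.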

But what correlations are we missing? There's just too much data today for the classic analytics model to scale. Big data gets bigger every day. Can we just dump all available customer data into a magic machine and have it reveal undiscovered correlations?

In my pursuit to answer these questions, I attended the Stanford online learning course for machine learning, which provides deep exposure to the statistical foundations of machine learning and artificial intelligence. However, my end goal of developing interactive, CRM-oriented dashboards required a more practical approach to data mining, which I ultimately discovered through University of Waikato's online Weka courses. Weka's use of Java, coupled with some Marketing-related learning recipes, provides a pragmatic approach to data mining CRM.

Next Steps

Amazon.com "People who bought X also bought Y"
Netflix.com "Recommended movies for you"
Google search results
YouTube recommended videos
Facebook activity feed and targeted ads

Machine learning (ML) recommendation engines were built into the foundation of the above-mentioned brands, which gave them staggering competitive advantages. The travel and financial industries are experiencing churn as ML-focused services make exceptionally relevant predictions of customer demand and disrupt previously established business models.

We live in an extremely dynamic society where a 360° view of the Customer involves data from CRM, ERP, social, mobile, Internet of Things sensor streams, and a variety of other systems of engagement. Data mining is our only hope to make sense of it all and evolve the craft of customer relationship management. I hope you'll actively comment on these blog entries and share in this journey.

(P.S. Converting this blog into a book is an eventual goal. Therefore, I will occasionally revisit some posts to edit for brevity or enhance with diagrams. Apologies in advance if this iterative approach to blogging results in some comments or inbound references appearing slightly out of context. I'll do my best to mention article changes within the comments.)
