Data Mining: Practical Machine Learning Techniques for CRM

Getting Started: ARFF Files

8/31/2014


 
Data mining platforms convert data from many sources into one common data structure, so that an ecosystem of plug-in components can emerge and "speak a common language".

In the Weka machine learning workbench, this common file format is called the Attribute-Relation File Format, or ARFF for short.
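For orientation, a minimal ARFF file can be sketched as below. The relation name matches the Opportunity example used later in this post, but the attribute names and data rows are invented for illustration and are not actual SObj2ARFF output.

```shell
# Write a tiny, hand-made ARFF file to a temporary location.
# Attribute names and values are hypothetical, for illustration only.
sample="${TMPDIR:-/tmp}/sample.arff"

cat > "$sample" <<'EOF'
% Lines starting with % are comments.
@relation Opportunity

@attribute Amount numeric
@attribute StageName {Prospecting,Negotiation,Closed}
@attribute IsWon {true,false}

@data
15000,Prospecting,false
42000,Closed,true
EOF

# Count the attribute declarations in the header.
grep -c '^@attribute' "$sample"
```

The header names the relation and declares each attribute with its type; everything after @data is one comma-separated row per instance.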
Converting Salesforce Objects to ARFF: SObj2ARFF

A data loader for converting Salesforce SObjects to ARFF is available on GitHub at https://github.com/dataminingcrm/weka. Many of the articles in this blog assume the use of ARFF files generated from a Salesforce.com CRM data source.

The tools and processes mentioned in this blog default to a "command line first" approach to enable automation long-term. All command line steps are depicted in block quotes.

Step 1
Download and Build SObj2ARFF
~/workspace$ mkdir weka
~/workspace$ git clone https://github.com/dataminingcrm/weka.git weka/
~/workspace$ cd weka
~/workspace/weka$ ./build.sh
This will clone the Salesforce converter project into a local directory named workspace/weka.
The build.sh script will build a single Java JAR with all dependencies at the location:
~/workspace/weka/bin/dataminingcrm.jar

Step 2
Configuration
Copy the provided configuration template to a file named config.properties and edit the file with Salesforce credentials and object source.
~/workspace/weka$ cp config.properties.template config.properties
~/workspace/weka$ vim config.properties

# Config file name/value pairs.
url=https://login.salesforce.com
username=username@domain.org
password=org_password
token=security_token
relation=Opportunity
query=SELECT * FROM Opportunity LIMIT 500
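Before running the converter, it can help to confirm that no setting was dropped while editing. The sketch below is a hypothetical helper, not part of the SObj2ARFF project; the key list is copied from the template above, and the idea that every key is required is an assumption.

```shell
# check_config FILE: report any expected key that is missing from FILE.
# Hypothetical helper; the key list mirrors the template shown above.
# Note: variables are global because plain sh has no "local".
check_config() {
  file=$1
  missing=0
  for key in url username password token relation query; do
    # A key counts as present if some line starts with "key=".
    if ! grep -q "^${key}=" "$file"; then
      echo "missing key: ${key}"
      missing=1
    fi
  done
  # Exit status 0 means all keys were found, 1 means at least one is missing.
  return "$missing"
}
```

After defining the function, running check_config config.properties from ~/workspace/weka prints one line per missing key and returns a non-zero status if any key is absent. It only checks that the keys exist, not that the values are valid.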
Step 3
Execution
Copy the config.properties to the bin directory and run the sobj2arff.sh script.
~/workspace/weka$ cp config.properties bin/
~/workspace/weka$ cd bin
~/workspace/weka/bin$ ./sobj2arff.sh
Step 3 writes the ARFF file to the console (standard output). Alternatively, redirect this output to a *.arff file.
~/workspace/weka/bin$ ./sobj2arff.sh > opportunities.arff
The ARFF file will contain the relation name, attribute declarations, and data necessary to proceed with analyzing the data set in Weka Explorer.
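A quick sanity check on a generated file such as opportunities.arff can be done from the command line before opening Weka Explorer. The following is a minimal sketch, assuming a dense (non-sparse) ARFF file with an unquoted relation name; arff_summary is a hypothetical helper, not part of SObj2ARFF.

```shell
# arff_summary FILE: print the relation name, attribute count, and data row count.
# Hypothetical helper; assumes one instance per line after @data.
arff_summary() {
  awk '
    # After @data, every non-blank, non-comment line is one data row.
    in_data && NF > 0 && $0 !~ /^%/ { rows++; next }
    # ARFF keywords are case-insensitive, so compare in lower case.
    tolower($1) == "@relation"  { relation = $2 }
    tolower($1) == "@attribute" { attrs++ }
    tolower($1) == "@data"      { in_data = 1 }
    END { printf "relation=%s attributes=%d rows=%d\n", relation, attrs, rows }
  ' "$1"
}
```

Calling arff_summary opportunities.arff prints a single line of the form relation=... attributes=... rows=..., which makes it easy to spot an empty export or a row count that does not match the LIMIT in the query.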
Future articles will describe optimizing the Salesforce Object Query Language (SOQL) query for specific types of training or validation data sets.

    Author

    Michael Leach
    San Francisco, CA


Photo used under Creative Commons from Damian Gadal