Data mining platforms convert several data sources into a common data structure that allows an ecosystem of plug-in components to emerge and "speak a common language".
In the Weka machine learning toolkit, this common file format is called the Attribute-Relation File Format, or ARFF for short.
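To make the format concrete, here is a minimal Python sketch that renders a tiny ARFF file. The helper name to_arff and the sample field names and values are hypothetical, chosen only for illustration; the sketch does not quote values that contain spaces or commas, which real ARFF data would need.

```python
def to_arff(relation, attributes, rows):
    """Render a relation as ARFF text.

    attributes: list of (name, type) pairs, where type is an ARFF type
    such as "numeric" or a nominal specification like "{a,b}".
    rows: list of value lists; None is written as '?', the ARFF marker
    for a missing value.
    """
    lines = [f"@relation {relation}", ""]
    for name, arff_type in attributes:
        lines.append(f"@attribute {name} {arff_type}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample data, not taken from a real Salesforce org.
sample = to_arff(
    "Opportunity",
    [
        ("Amount", "numeric"),
        ("StageName", "{Prospecting,Closed}"),
        ("IsWon", "{true,false}"),
    ],
    [[12000, "Prospecting", "false"], [None, "Closed", "true"]],
)
print(sample)
```

The header section (@relation and @attribute lines) describes the columns, and everything after @data is one comma-separated row per instance.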
Converting Salesforce Objects to ARFF: SObj2ARFF
A data loader for converting Salesforce SObjects to ARFF is available in the dataminingcrm/weka repository on GitHub (https://github.com/dataminingcrm/weka). Many of the articles in this blog assume use of ARFF files generated from a Salesforce.com CRM data source.
The tools and processes mentioned in this blog default to a "command line first" approach to enable automation long-term. All command line steps are depicted in block quotes.
Step 1
Download and Build SObj2ARFF
~/workspace mkdir weka
~/workspace git clone https://github.com/dataminingcrm/weka.git weka/
~/workspace cd weka
~/workspace/weka ./build.sh
This clones the Salesforce converter project into the local directory ~/workspace/weka.
The build.sh script will build a single Java JAR with all dependencies at the location:
~/workspace/weka/bin/dataminingcrm.jar
Step 2
Configuration
Copy the provided configuration template to a file named config.properties, then edit it to supply your Salesforce credentials, the source object (relation), and the query.
~/workspace/weka cp config.properties.template config.properties
~/workspace/weka vim config.properties
# Config file name/value pairs.
url=https://login.salesforce.com
username=username@domain.org
password=org_password
token=security_token
relation=Opportunity
query=SELECT * FROM Opportunity LIMIT 500
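Before running the converter, it can help to confirm that the file has every expected entry. The following Python sketch parses name=value lines and reports missing keys. The function names are hypothetical, and treating all six keys as required is an assumption based on the template above, not on the converter's source.

```python
# Keys taken from the template above; whether the converter requires
# all of them is an assumption.
REQUIRED_KEYS = ("url", "username", "password", "token", "relation", "query")


def parse_properties(text):
    """Parse simple name=value lines, skipping blanks and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name=value pair: {line!r}")
        config[key.strip()] = value.strip()
    return config


def missing_keys(config):
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not config.get(k)]


sample = """\
# Config file name/value pairs.
url=https://login.salesforce.com
username=username@domain.org
password=org_password
token=security_token
relation=Opportunity
query=SELECT * FROM Opportunity LIMIT 500
"""
config = parse_properties(sample)
print(missing_keys(config))
```

To check a real file, pass the contents of config.properties to parse_properties instead of the inline sample.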
Step 3
Execution
Copy config.properties to the bin directory and run the sobj2arff.sh script.
~/workspace/weka cp config.properties bin/
~/workspace/weka cd bin
~/workspace/weka/bin ./sobj2arff.sh
Step 3 writes the ARFF file to the console (standard output). To save it instead, redirect the output to a .arff file.
~/workspace/weka/bin ./sobj2arff.sh > opportunities.arff
The ARFF file will contain the relation name, the attribute declarations, and the data rows necessary to proceed with analyzing the data in Weka Explorer.
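A quick sanity check on the generated file is to count what it declares before opening it in Weka Explorer. This Python sketch is a simplified reader, not Weka's own parser: the function name summarize_arff is hypothetical, the sample text is invented, and attribute names containing quoted spaces are not handled.

```python
def summarize_arff(text):
    """Return (relation name, attribute names, data row count)."""
    relation = None
    attributes = []
    rows = 0
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '%' comment lines.
        if not line or line.startswith("%"):
            continue
        if in_data:
            rows += 1
            continue
        lower = line.lower()  # ARFF keywords are case-insensitive
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            attributes.append(line.split(None, 2)[1])
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Hypothetical converter output, for illustration only.
sample = """\
% Hypothetical output
@relation Opportunity

@attribute Amount numeric
@attribute StageName {Prospecting,Closed}

@data
12000,Prospecting
?,Closed
"""
relation, attributes, rows = summarize_arff(sample)
print(relation, len(attributes), rows)
```

To run it against the real output, read opportunities.arff into a string and pass that to summarize_arff in place of the sample.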
Future articles will describe optimizing the Salesforce Object Query Language (SOQL) query for specific types of training or validation data sets.