Introduction
Decision Tree is a supervised machine learning algorithm that can be used to solve both classification and regression problems. The goal of this algorithm is to create a model that can predict the value or class of the target variable by learning decision rules inferred from training data.
When using Decision Trees to predict the class label for a record, we begin from the root of the tree. The values of the root attribute are then compared with the record’s attribute. Based on the comparison, we follow the branch that corresponds to that value and proceed to the next node.
The Decision Tree algorithm classifies an example by passing it down the tree from the root node to a leaf (terminal) node. The leaf it reaches gives the predicted class.
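The traversal described above can be sketched in a few lines of plain Java. This is only an illustration: the class name TinyTree, the helper exampleTree(), and the thresholds (age 40, weight 60) are hypothetical and are not learned from gym.csv.

```java
import java.util.Map;

/** A hand-built decision tree; thresholds are illustrative, not learned from data. */
final class TinyTree {
    /** A node is either a leaf (holds a label) or a test "attribute <= threshold". */
    static final class Node {
        final String attribute; // null for a leaf
        final int threshold;
        final Node left;        // followed when value <= threshold
        final Node right;       // followed when value > threshold
        final int label;        // only meaningful for a leaf

        Node(String attribute, int threshold, Node left, Node right) {
            this.attribute = attribute;
            this.threshold = threshold;
            this.left = left;
            this.right = right;
            this.label = -1;
        }

        Node(int label) {
            this.attribute = null;
            this.threshold = 0;
            this.left = null;
            this.right = null;
            this.label = label;
        }
    }

    /** Walks from the root to a leaf and returns the leaf's class label. */
    static int classify(Node root, Map<String, Integer> record) {
        Node node = root;
        while (node.attribute != null) {
            int value = record.get(node.attribute);
            node = (value <= node.threshold) ? node.left : node.right;
        }
        return node.label;
    }

    /** Hypothetical tree: age <= 40 ? (weight <= 60 ? 0 : 1) : 0 */
    static Node exampleTree() {
        Node weightTest = new Node("weight", 60, new Node(0), new Node(1));
        return new Node("age", 40, weightTest, new Node(0));
    }
}
```

Each loop iteration is one comparison at a node followed by a move to the matching branch, so the number of comparisons equals the depth of the leaf that is reached.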
In this article, we will be discussing how to implement the Decision Tree algorithm in Java. We will be predicting whether an individual will go to the gym or not based on their age and weight.
Store the Data in GridDB
The data has been stored in a CSV file named gym.csv, but we need to move it to GridDB. Although we could still read the data from the CSV file, GridDB offers a number of benefits, especially improved query performance.
The dataset has two independent variables, age and weight, and one dependent variable, gym. A value of 1 for the dependent variable shows that a person will go to the gym, while a value of 0 shows that the person won’t go to the gym.
Let’s first import the set of libraries to be used for this:
import java.io.File;
import java.io.IOException;
import java.util.Properties;
import java.util.Scanner;
// java.util.Collection is not imported: its simple name clashes with GridDB's Collection
import com.toshiba.mwcloud.gs.Collection;
import com.toshiba.mwcloud.gs.GSException;
import com.toshiba.mwcloud.gs.GridStore;
import com.toshiba.mwcloud.gs.GridStoreFactory;
import com.toshiba.mwcloud.gs.Query;
import com.toshiba.mwcloud.gs.RowKey;
import com.toshiba.mwcloud.gs.RowSet;
We can now create a static Java class to represent the GridDB container to be used:
public static class GymData {
    @RowKey int age; // row key: each age value identifies at most one row
    int weight;
    int gym;
}
The above Java class is equivalent to a SQL table with 3 columns. The 3 fields represent the columns of the GridDB container.
Let’s now establish a connection between Java and the GridDB container. We will use the credentials of our GridDB installation:
Properties props = new Properties();
props.setProperty("notificationAddress", "239.0.0.1");
props.setProperty("notificationPort", "31999");
props.setProperty("clusterName", "defaultCluster");
props.setProperty("user", "admin");
props.setProperty("password", "admin");
GridStore store = GridStoreFactory.getInstance().getGridStore(props);
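We will be storing rows of type GymData in a GridDB container named col01. The putCollection() call creates the container if it does not exist yet and returns a handle to it. Because the row key (age) is an int, the key type is Integer:
Collection<Integer, GymData> coll = store.putCollection("col01", GymData.class);
We will be using the name coll to refer to this container.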
Let’s now move data from the gym.csv file into GridDB:
File file1 = new File("gym.csv");
Scanner sc = new Scanner(file1);
// Skip the header row
sc.nextLine();
while (sc.hasNextLine()) {
    String line = sc.nextLine().trim();
    if (line.isEmpty()) continue; // ignore blank lines
    String[] dataList = line.split(",");
    GymData gd = new GymData();
    gd.age = Integer.parseInt(dataList[0].trim());
    gd.weight = Integer.parseInt(dataList[1].trim());
    gd.gym = Integer.parseInt(dataList[2].trim());
    // put() inserts the row, or overwrites an existing row with the same row key (age)
    coll.put(gd);
}
sc.close();
The above code will add the data into the GridDB container.
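The parsing step in the loop can be checked on its own, before any row reaches GridDB. The following is a minimal sketch under the assumption that every data line has exactly three integer columns in the order age, weight, gym. The class name GymCsv and the method parseRow() are hypothetical helpers, not part of GridDB.

```java
/** Parses one line of gym.csv ("age,weight,gym") into three ints. */
final class GymCsv {
    static int[] parseRow(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "Expected 3 columns but got " + parts.length + ": " + line);
        }
        int[] values = new int[3];
        for (int i = 0; i < 3; i++) {
            // Integer.parseInt throws NumberFormatException on non-numeric input
            values[i] = Integer.parseInt(parts[i].trim());
        }
        if (values[2] != 0 && values[2] != 1) {
            throw new IllegalArgumentException("gym must be 0 or 1: " + line);
        }
        return values;
    }
}
```

Rejecting malformed lines early keeps bad values out of the container, where they would otherwise surface later as confusing training errors.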
Retrieve the Data
We now want to retrieve the data from the GridDB container and use it to implement a model using the Decision Tree algorithm. The following code can help us to retrieve the data:
Query<GymData> query = coll.query("select *");
// Fetch the result set once; false means the rows are not locked for update
RowSet<GymData> res = query.fetch(false);
The select * statement means "select all", and it retrieves all the data stored in the container.
Implement the Decision Tree Model
The goal is to use the data to train a machine learning model that can predict whether an individual will go to the gym or not. We will use the Decision Tree algorithm to train the model. Let’s import the necessary libraries from the Weka library:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader;
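We can now create a buffered reader and instances for the dataset. Weka's Instances constructor reads ARFF text, so the fetched rows are first written out in that format. The gym column is declared as a nominal attribute with the values 0 and 1 because J48 cannot handle a numeric class:
// Serialize the fetched rows as ARFF text so that Weka can parse them
StringBuilder arff = new StringBuilder("@relation gym\n@attribute age numeric\n"
        + "@attribute weight numeric\n@attribute gym {0,1}\n@data\n");
while (res.hasNext()) {
    GymData row = (GymData) res.next();
    arff.append(row.age).append(',').append(row.weight).append(',').append(row.gym).append('\n');
}
BufferedReader bufferedReader = new BufferedReader(new java.io.StringReader(arff.toString()));
// Create dataset instances
Instances datasetInstances = new Instances(bufferedReader);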
Let us now call the buildClassifier() function of the Weka library to build the Decision Tree classifier:
datasetInstances.setClassIndex(datasetInstances.numAttributes()-1);
Classifier myclassifier = new J48();
myclassifier.buildClassifier(datasetInstances);
System.out.println(myclassifier);
Note that buildClassifier() trains on every instance in datasetInstances, including the last one, so a prediction on that instance is made on data the model has already seen.
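J48 is Weka's implementation of the C4.5 algorithm, which chooses each split with an entropy-based criterion. To give a feel for that idea, here is a simplified sketch that computes plain information gain for a threshold split on one numeric attribute with 0/1 labels. It is not Weka's code, and C4.5 itself uses a normalized variant (gain ratio). The class name SplitScore and its methods are hypothetical.

```java
/** Entropy and information gain for a binary-labelled threshold split. */
final class SplitScore {
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Shannon entropy (in bits) of a set with pos positive and neg negative labels. */
    static double entropy(int pos, int neg) {
        int total = pos + neg;
        if (pos == 0 || neg == 0) {
            return 0.0; // a pure (or empty) set has no uncertainty
        }
        double p = (double) pos / total;
        double q = (double) neg / total;
        return -(p * log2(p) + q * log2(q));
    }

    /** Information gain of splitting the rows on "value <= threshold". */
    static double informationGain(int[] values, int[] labels, int threshold) {
        int n = values.length;
        if (n == 0) {
            return 0.0;
        }
        int leftPos = 0, leftNeg = 0, rightPos = 0, rightNeg = 0;
        for (int i = 0; i < n; i++) {
            boolean left = values[i] <= threshold;
            if (labels[i] == 1) {
                if (left) leftPos++; else rightPos++;
            } else {
                if (left) leftNeg++; else rightNeg++;
            }
        }
        double parent = entropy(leftPos + rightPos, leftNeg + rightNeg);
        // Weight each child's entropy by the fraction of rows it receives
        double children = ((leftPos + leftNeg) / (double) n) * entropy(leftPos, leftNeg)
                + ((rightPos + rightNeg) / (double) n) * entropy(rightPos, rightNeg);
        return parent - children;
    }
}
```

A split that separates the two classes perfectly removes all uncertainty and scores the full parent entropy, while a split that sends every row to the same side scores zero.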
Make a Prediction
Let us use the above model and the last instance of the dataset to make a prediction. We will use the classifyInstance() function of the Weka library as shown below:
Instance pred = datasetInstances.lastInstance();
double answer = myclassifier.classifyInstance(pred);
System.out.println(answer);
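For a nominal class, classifyInstance() returns the index of the predicted class value as a double, not the label itself. In Weka the label can be looked up with datasetInstances.classAttribute().value((int) answer). The sketch below shows the same mapping in plain Java, assuming the class attribute declares its values in the order 0, 1. The class name PredictionLabel and the method describe() are hypothetical.

```java
/** Turns the numeric result of a nominal prediction into a readable message. */
final class PredictionLabel {
    /**
     * prediction is the class index returned by the classifier;
     * classValues is assumed to list the labels in declaration order.
     */
    static String describe(double prediction, String[] classValues) {
        int index = (int) prediction;
        if (index < 0 || index >= classValues.length) {
            throw new IllegalArgumentException("No class value at index " + index);
        }
        return "1".equals(classValues[index])
                ? "will go to the gym"
                : "will not go to the gym";
    }
}
```

With the values declared as 0 and 1, a result of 0.0 maps to the label 0 and a result of 1.0 maps to the label 1.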
Compile and Run the Model
To compile and run the model, you will need the Weka API. Download it from the following URL:
http://www.java2s.com/Code/Jar/w/weka.htm
Next, log in as the gsadm user. Move your .java file to the bin folder of your GridDB installation, located at the following path:
/griddb_4.6.0-1_amd64/usr/griddb-4.6.0/bin
Run the following command on your Linux terminal to set the path for the gridstore.jar file:
export CLASSPATH=$CLASSPATH:/home/osboxes/Downloads/griddb_4.6.0-1_amd64/usr/griddb-4.6.0/bin/gridstore.jar
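Next, use the following command to compile your .java file. Because the -cp option overrides the CLASSPATH environment variable, $CLASSPATH is passed explicitly so that gridstore.jar stays on the class path:
javac -cp $CLASSPATH:weka-3-7-0/weka.jar DecisionTreeAlgorithm.java
Run the generated .class file with the following command:
java -cp .:$CLASSPATH:weka-3-7-0/weka.jar DecisionTreeAlgorithm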
The model returned 0 for the prediction, which means that the person will not go to the gym.
If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.