Going to the Gym: Using A Decision Tree | GridDB: Open Source Time Series Database for IoT

Introduction

A Decision Tree is a supervised machine learning algorithm that can be used to solve both classification and regression problems. The goal of the algorithm is to create a model that predicts the value or class of a target variable by learning decision rules inferred from training data.

When using a Decision Tree to predict the class label for a record, we begin at the root of the tree. The record’s value for the root attribute is compared with the test stored at the root. Based on the comparison, we follow the branch that corresponds to that value and proceed to the next node.

The process repeats at each node until a leaf (terminal) node is reached; the label stored in that leaf is the class assigned to the record.
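The root-to-leaf walk described above can be sketched in plain Java. This is only an illustration: the class name, the tree shape, and the thresholds (age 40, weight 60) are hypothetical and are not learned from gym.csv.

```java
// A hand-built tree for illustration only; the thresholds are hypothetical.
final class TreeWalkSketch {

    /** A node is either a leaf (holds a label) or a test "attribute <= threshold". */
    static final class Node {
        final String attribute; // "age" or "weight"; null for a leaf
        final int threshold;
        final Node left;        // branch taken when value <= threshold
        final Node right;       // branch taken when value > threshold
        final int label;        // at a leaf: 1 = goes to the gym, 0 = does not

        Node(String attribute, int threshold, Node left, Node right) {
            this.attribute = attribute;
            this.threshold = threshold;
            this.left = left;
            this.right = right;
            this.label = -1;
        }

        Node(int label) {
            this.attribute = null;
            this.threshold = 0;
            this.left = null;
            this.right = null;
            this.label = label;
        }

        boolean isLeaf() {
            return attribute == null;
        }
    }

    /** Walks from the root to a leaf, following the branch that matches the record. */
    static int classify(Node root, int age, int weight) {
        Node node = root;
        while (!node.isLeaf()) {
            int value = node.attribute.equals("age") ? age : weight;
            node = (value <= node.threshold) ? node.left : node.right;
        }
        return node.label;
    }

    /** Hypothetical tree: if age <= 40, test weight <= 60 (0 or 1); otherwise 0. */
    static Node exampleTree() {
        Node weightTest = new Node("weight", 60, new Node(0), new Node(1));
        return new Node("age", 40, weightTest, new Node(0));
    }

    public static void main(String[] args) {
        Node root = exampleTree();
        System.out.println(classify(root, 30, 75));
        System.out.println(classify(root, 55, 75));
    }
}
```

A learned tree has the same structure; the difference is that the training algorithm chooses the attributes and thresholds from the data.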

In this article, we will be discussing how to implement the Decision Tree algorithm in Java. We will be predicting whether an individual will go to the gym or not based on their age and weight.

Store the Data in GridDB

The data has been stored in a CSV file named gym.csv, but we need to move it to GridDB. Although we could use the data directly from the CSV file, GridDB offers a number of benefits, especially improved query performance.

The dataset has two independent variables, age and weight, and one dependent variable, gym. A value of 1 for the dependent variable shows that a person will go to the gym, while a value of 0 shows that the person won’t go to the gym.

Let’s first import the set of libraries to be used for this:

import java.io.File;
import java.io.IOException;
import java.util.Properties;
import java.util.Scanner;

import com.toshiba.mwcloud.gs.Collection;
import com.toshiba.mwcloud.gs.GSException;
import com.toshiba.mwcloud.gs.GridStore;
import com.toshiba.mwcloud.gs.GridStoreFactory;
import com.toshiba.mwcloud.gs.Query;
import com.toshiba.mwcloud.gs.RowKey;
import com.toshiba.mwcloud.gs.RowSet;

We can now create a static Java class to represent the GridDB container to be used:

public static class GymData {
    @RowKey int age;
    int weight;
    int gym;
}

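The above Java class is equivalent to a SQL table with three columns; its three fields represent the columns of the GridDB container. Because age is annotated with @RowKey, it acts as the key of the container, so storing a row whose age already exists replaces the stored row.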

Let’s now establish a connection between Java and the GridDB container. We will use the credentials of our GridDB installation:

Properties props = new Properties();
props.setProperty("notificationAddress", "239.0.0.1");
props.setProperty("notificationPort", "31999");
props.setProperty("clusterName", "defaultCluster");
props.setProperty("user", "admin");
props.setProperty("password", "admin");
GridStore store = GridStoreFactory.getInstance().getGridStore(props);

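We will be using a GridDB container named col01 whose rows are GymData objects. The putCollection() call returns the container, creating it if it does not already exist. The key type is Integer because the row key, age, is an int:

Collection<Integer, GymData> coll = store.putCollection("col01", GymData.class);

We will be using the name coll to refer to this container.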

Let’s now move data from the gym.csv file into GridDB:

File file1 = new File("gym.csv");
try (Scanner sc = new Scanner(file1)) {
    // Skip the first token, assumed to be the header row
    sc.next();

    while (sc.hasNext()) {
        // Each whitespace-separated token is one CSV row: age,weight,gym
        String[] dataList = sc.next().split(",");

        GymData gd = new GymData();
        gd.age = Integer.parseInt(dataList[0]);
        gd.weight = Integer.parseInt(dataList[1]);
        gd.gym = Integer.parseInt(dataList[2]);

        // Store the row; a row with the same key is replaced
        coll.put(gd);
    }
}

The above code will add the data into the GridDB container.
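Scanner.next() splits on whitespace, so the loading loop only works when each row contains no spaces and every field is a valid integer. As a hedged sketch, a small helper could validate each line before it is stored. The class GymCsvSketch and the method parseLine are hypothetical names, and the sample lines in main are made up for illustration.

```java
// Hypothetical helper for validating one line of gym.csv before storing it.
final class GymCsvSketch {

    /**
     * Parses one "age,weight,gym" line into {age, weight, gym}.
     * Returns null for blank lines, the header, rows that do not hold
     * exactly three integers, or rows whose gym value is not 0 or 1.
     */
    static int[] parseLine(String line) {
        if (line == null || line.trim().isEmpty()) {
            return null;
        }
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            return null;
        }
        int[] values = new int[3];
        try {
            for (int i = 0; i < 3; i++) {
                values[i] = Integer.parseInt(parts[i].trim());
            }
        } catch (NumberFormatException e) {
            return null; // header row or non-numeric field
        }
        if (values[2] != 0 && values[2] != 1) {
            return null; // the gym column must be a 0/1 label
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample lines, including a header and a malformed row
        String[] lines = {"age,weight,gym", "25, 70, 1", "40,80", "33,65,0"};
        for (String line : lines) {
            int[] row = parseLine(line);
            if (row == null) {
                System.out.println("skipped: " + line);
            } else {
                System.out.println(row[0] + " " + row[1] + " " + row[2]);
            }
        }
    }
}
```

Reading the file with Scanner.nextLine() and passing each line to such a helper would let the loader skip bad rows instead of failing on the first one.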

Retrieve the Data

We now want to retrieve the data from the GridDB container and use it to implement a model using the Decision Tree algorithm. The following code can help us to retrieve the data:

Query<GymData> query = coll.query("select *");
RowSet<GymData> res = query.fetch(false);

The TQL statement select * means select all, so the query retrieves every row stored in the container.

Implement the Decision Tree Model

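The goal is to use the data to train a machine learning model that can predict whether an individual will go to the gym or not. We will use the Decision Tree algorithm to train the model. Let’s import the necessary classes from the Weka library, along with the Java collection classes used to build the dataset:

import java.util.ArrayList;
import java.util.Arrays;

import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;

The rows retrieved from GridDB are not an ARFF file, so rather than reading them through a file reader we build the dataset instances in memory. The snippet assumes a Weka release that provides DenseInstance and the ArrayList-based Instances constructor (Weka 3.8 does; older releases such as 3.7.0 may differ):

// Describe the columns; gym is nominal because J48 needs a nominal class attribute
ArrayList<Attribute> attributes = new ArrayList<>();
attributes.add(new Attribute("age"));
attributes.add(new Attribute("weight"));
attributes.add(new Attribute("gym", Arrays.asList("0", "1")));

// Create dataset instances from the rows returned by the query
Instances datasetInstances = new Instances("gym", attributes, 0);
while (res.hasNext()) {
    GymData row = (GymData) res.next();
    datasetInstances.add(new DenseInstance(1.0, new double[] {row.age, row.weight, row.gym}));
}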

Let us now call the buildClassifier() method of the Weka library to build the Decision Tree classifier:

// The last attribute (gym) is the class to predict
datasetInstances.setClassIndex(datasetInstances.numAttributes() - 1);

Classifier myclassifier = new J48();
myclassifier.buildClassifier(datasetInstances);
System.out.println(myclassifier);

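Note that buildClassifier() trains on every instance in datasetInstances, including the last one, so predicting that instance afterwards is a sanity check rather than a test on unseen data.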
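For background on how a tree learner picks its decision rules: J48 is Weka’s implementation of the C4.5 algorithm, which selects splits using an entropy-based criterion (gain ratio, a normalized form of information gain). The following is a minimal sketch of plain information gain for a numeric threshold split, not Weka’s actual code. The class name, method names, ages, and labels are all hypothetical.

```java
// Illustrative only: plain information gain for a split "value <= threshold".
final class InfoGainSketch {

    /** Shannon entropy, in bits, of a two-class set with the given counts. */
    static double entropy(int positives, int negatives) {
        int total = positives + negatives;
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int count : new int[] {positives, negatives}) {
            if (count > 0) {
                double p = (double) count / total;
                h -= p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    /** Entropy of the whole set minus the weighted entropy of the two halves. */
    static double informationGain(int[] values, int[] labels, int threshold) {
        int n = values.length;
        if (n == 0) {
            return 0.0;
        }
        int leftPos = 0, leftNeg = 0, rightPos = 0, rightNeg = 0;
        for (int i = 0; i < n; i++) {
            boolean goesLeft = values[i] <= threshold;
            if (labels[i] == 1) {
                if (goesLeft) leftPos++; else rightPos++;
            } else {
                if (goesLeft) leftNeg++; else rightNeg++;
            }
        }
        double parent = entropy(leftPos + rightPos, leftNeg + rightNeg);
        double children =
            ((leftPos + leftNeg) * entropy(leftPos, leftNeg)
                + (rightPos + rightNeg) * entropy(rightPos, rightNeg)) / n;
        return parent - children;
    }

    public static void main(String[] args) {
        // Hypothetical ages and gym labels, not taken from gym.csv
        int[] ages = {20, 25, 30, 50, 55, 60};
        int[] gym = {1, 1, 1, 0, 0, 0};
        System.out.println(informationGain(ages, gym, 40));
        System.out.println(informationGain(ages, gym, 22));
    }
}
```

In this made-up data a threshold of 40 separates the two classes perfectly, so it yields a higher gain than a threshold of 22; a tree learner would prefer the former as its root test.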

Make a Prediction

Let us use the above model and the last instance of the dataset to make a prediction. We will use the classifyInstance() method of the Weka library, which returns the index of the predicted class label as a double:

Instance pred = datasetInstances.lastInstance();
double answer = myclassifier.classifyInstance(pred);
System.out.println(answer);

Compile and Run the Model

To compile and run the model, you will need the Weka API. Download it from the following URL:

http://www.java2s.com/Code/Jar/w/weka.htm

Next, log in as the gsadm user. Move your .java file to the bin folder of your GridDB installation, located at the following path:

/griddb_4.6.0-1_amd64/usr/griddb-4.6.0/bin

Run the following command on your Linux terminal to set the path for the gridstore.jar file:

export CLASSPATH=$CLASSPATH:/home/osboxes/Downloads/griddb_4.6.0-1_amd64/usr/griddb-4.6.0/bin/gridstore.jar

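Next, use the following command to compile your .java file. Because the -cp option overrides the CLASSPATH environment variable, the variable is included explicitly so that gridstore.jar is still found:

javac -cp "$CLASSPATH:weka-3-7-0/weka.jar" DecisionTreeAlgorithm.java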

Run the generated .class file with the following command, again passing the CLASSPATH variable so that gridstore.jar is on the class path:

java -cp ".:$CLASSPATH:weka-3-7-0/weka.jar" DecisionTreeAlgorithm

The model returned 0 (printed as 0.0) for the prediction, which means that the person will not go to the gym.

If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.