Logistic Regression Algorithm in Java

Introduction

Regression analysis is a technique used to determine the relationship between a dependent variable and one or more independent variables for prediction purposes. It is a good tool for data modelling and analysis.

There are different regression techniques. Our focus will be on Logistic Regression.

Logistic Regression is suitable when the dependent variable is categorical, typically binary (here, play or do not play), and it can handle one or more independent variables. The technique requires large sample sizes because maximum likelihood estimates are less powerful than ordinary least squares at low sample sizes.
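
For reference, the standard binary logistic model and the log-likelihood that maximum likelihood estimation maximizes can be written as follows. The symbols are introduced here only for illustration: x_i is the feature vector of example i, y_i (0 or 1) is its label, w is the weight vector, and b is the intercept.

```latex
$$P(y_i = 1 \mid x_i) = \sigma(w^\top x_i + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

$$\ell(w, b) = \sum_{i=1}^{n} \Big[ y_i \log \sigma(w^\top x_i + b) + (1 - y_i) \log\big(1 - \sigma(w^\top x_i + b)\big) \Big]$$
```

Fitting the model means choosing w and b so that this log-likelihood is as large as possible on the training data.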

In this article, we will be demonstrating how to implement Logistic Regression using Java and GridDB. The goal is to predict whether to play or not based on weather conditions.

Store the Data in GridDB

The data has been stored in a CSV file named “play.csv”. The dataset has 4 independent variables (outlook, temperature, humidity, windy) and 1 dependent variable (play). We will write the dataset into GridDB to take advantage of benefits such as improved query performance.

Let’s first import the libraries to be used for this:

import java.io.File;
import java.io.IOException;
import java.util.Properties;
import java.util.Scanner;

// Do not also import java.util.Collection: it clashes with GridDB's Collection
import com.toshiba.mwcloud.gs.Collection;
import com.toshiba.mwcloud.gs.GSException;
import com.toshiba.mwcloud.gs.GridStore;
import com.toshiba.mwcloud.gs.GridStoreFactory;
import com.toshiba.mwcloud.gs.Query;
import com.toshiba.mwcloud.gs.RowKey;
import com.toshiba.mwcloud.gs.RowSet;
Now, let’s create a static Java class to represent the GridDB container to be used:

public static class Weather {
     @RowKey String outlook;
     String temperature; 
     String humidity;
     String windy;
     String play;
}

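Think of the above class as a SQL table with 5 columns: the 5 fields represent the columns of the GridDB container, and the @RowKey annotation marks outlook as the row key. Because row keys are unique, rows that share the same outlook value overwrite one another when written, so use a field with unique values as the key if every record must be kept.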

We can now connect to our GridDB container from Java. The following code demonstrates this:

Properties props = new Properties();
props.setProperty("notificationAddress", "239.0.0.1");
props.setProperty("notificationPort", "31999");
props.setProperty("clusterName", "defaultCluster");
props.setProperty("user", "admin");
props.setProperty("password", "admin");
GridStore store = GridStoreFactory.getInstance().getGridStore(props);
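
The connection settings above are hard-coded. As a minimal sketch, assuming you would rather supply them from the environment, the helper below builds the same Properties object with overridable defaults. The class name GridDbConfig and the GRIDDB_* variable names are hypothetical and chosen only for this illustration; they are not part of GridDB.

```java
import java.util.Map;
import java.util.Properties;

final class GridDbConfig {
    // Builds connection properties, letting entries in 'env' override the
    // defaults used in this article. The GRIDDB_* names are hypothetical.
    static Properties fromEnv(Map<String, String> env) {
        Properties props = new Properties();
        props.setProperty("notificationAddress",
                env.getOrDefault("GRIDDB_NOTIFICATION_ADDRESS", "239.0.0.1"));
        props.setProperty("notificationPort",
                env.getOrDefault("GRIDDB_NOTIFICATION_PORT", "31999"));
        props.setProperty("clusterName",
                env.getOrDefault("GRIDDB_CLUSTER", "defaultCluster"));
        props.setProperty("user", env.getOrDefault("GRIDDB_USER", "admin"));
        props.setProperty("password", env.getOrDefault("GRIDDB_PASSWORD", "admin"));
        return props;
    }

    public static void main(String[] args) {
        // System.getenv() returns the current environment as a map
        Properties props = fromEnv(System.getenv());
        System.out.println(props.getProperty("clusterName"));
    }
}
```

The resulting Properties object can be passed to getGridStore() exactly as in the listing above.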

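The GridDB container to be used is named col01, and its rows map to the Weather class. The putCollection() call creates the container if it does not exist yet and returns a handle to it: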

Collection<String, Weather> coll = store.putCollection("col01", Weather.class);

Every time we need to use the container, we will refer to it through the coll instance.

We can now read the data from the play.csv file and write it into the GridDB container:

// Scanner splits on whitespace, so each token is one CSV line
// as long as the values themselves contain no spaces.
File file1 = new File("play.csv");
Scanner sc = new Scanner(file1);
String data = sc.next(); // skip the header row

while (sc.hasNext()) {
    String scData = sc.next();
    String[] dataList = scData.split(",");

    // Map the five CSV fields onto a Weather row
    Weather wt = new Weather();
    wt.outlook = dataList[0];
    wt.temperature = dataList[1];
    wt.humidity = dataList[2];
    wt.windy = dataList[3];
    wt.play = dataList[4];

    // Collection containers are written with put(); append() belongs to TimeSeries
    coll.put(wt);
}
sc.close();
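
Reading the file token by token works only while no value contains a space. As a hedged alternative, the sketch below parses CSV lines one at a time and rejects rows with the wrong number of fields. The class name CsvRows and the sample rows are assumptions made for this illustration; in practice the lines could come from Files.readAllLines().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CsvRows {
    // Skips the header line and blank lines, trims each field, and rejects
    // any row that does not have exactly 'columns' fields.
    static List<String[]> parse(List<String> lines, int columns) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // index 0 is the header
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
            if (fields.length != columns) {
                throw new IllegalArgumentException(
                        "Line " + (i + 1) + ": expected " + columns
                        + " fields but found " + fields.length);
            }
            for (int j = 0; j < fields.length; j++) {
                fields[j] = fields[j].trim();
            }
            rows.add(fields);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample rows written for this sketch, in the same layout as play.csv
        List<String> lines = Arrays.asList(
                "outlook,temperature,humidity,windy,play",
                "sunny,hot,high,FALSE,no",
                "overcast,hot,high,FALSE,yes");
        for (String[] row : parse(lines, 5)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Each returned array can then be copied into a Weather object in the same way as in the loop above.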

Retrieve the Data

To use the data to build a Logistic Regression model, we need to pull it from the GridDB container. The following code can help you to accomplish this:

Query<Weather> query = coll.query("select *");
RowSet<Weather> rs = query.fetch(false);
RowSet res = query.fetch();

The select * query retrieves all the data stored in the GridDB container.

Implement a Logistic Regression Model

Now that the data is ready, we can use it to implement a Logistic Regression model. The goal of the model is to help us determine whether we can play or not depending on the weather.

Let’s first import the libraries that will help us to implement the model:

import java.io.IOException;

import weka.classifiers.Evaluation;
import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader; 

import java.io.BufferedReader;
import java.io.FileReader;

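Let us create a buffered reader and load the dataset instances. Weka's Instances(Reader) constructor expects the data in ARFF format:

// FileReader needs a file path, not a RowSet. "play.arff" is a placeholder
// name that assumes the fetched rows were first exported to an ARFF file.
BufferedReader bufferedReader
    = new BufferedReader(
        new FileReader("play.arff"));

// Create dataset instances
Instances datasetInstances
    = new Instances(bufferedReader);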
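
Since Weka's Instances(Reader) constructor parses ARFF text, rows fetched from GridDB need to be rendered in that format before they can be loaded. The following is a minimal sketch of such a conversion for nominal data. The class name ArffWriter is hypothetical, and the code assumes that attribute names and values contain no spaces, commas, or quotes.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ArffWriter {
    // Renders rows of nominal values as ARFF text. Each attribute's value
    // set is collected from the data in first-seen order.
    static String toArff(String relation, String[] attributes, List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (int col = 0; col < attributes.length; col++) {
            Set<String> values = new LinkedHashSet<>();
            for (String[] row : rows) {
                values.add(row[col]);
            }
            sb.append("@attribute ").append(attributes[col])
              .append(" {").append(String.join(",", values)).append("}\n");
        }
        sb.append("\n@data\n");
        for (String[] row : rows) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two sample rows written for this sketch
        List<String[]> rows = Arrays.asList(
                new String[] {"sunny", "hot", "high", "FALSE", "no"},
                new String[] {"overcast", "hot", "high", "FALSE", "yes"});
        String[] names = {"outlook", "temperature", "humidity", "windy", "play"};
        System.out.print(toArff("play", names, rows));
    }
}
```

The resulting string can be wrapped in a java.io.StringReader and handed to new Instances(...). Note that the order in which values are listed for the class attribute determines the class index that Weka later reports for a prediction.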

We can now use the Logistic class of the Weka API to build a Logistic Regression model from the dataset:

// Use the last attribute (play) as the class to predict
datasetInstances.setClassIndex(datasetInstances.numAttributes() - 1);

// Train the Logistic Regression model and print a summary of it
Classifier classifier = new weka.classifiers.functions.Logistic();
classifier.buildClassifier(datasetInstances);
System.out.println(classifier);
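
To give a feel for what building the model involves, here is a small self-contained sketch that fits a binary logistic model with plain batch gradient descent. This is a simplified stand-in and not Weka's algorithm: the Logistic class has its own fitting procedure. The class name TinyLogisticTrainer, the toy data, the learning rate, and the number of epochs are all assumptions chosen for illustration.

```java
final class TinyLogisticTrainer {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Predicted probability of class 1; the last entry of w is the intercept.
    static double predict(double[] w, double[] features) {
        double z = w[w.length - 1];
        for (int j = 0; j < features.length; j++) {
            z += w[j] * features[j];
        }
        return sigmoid(z);
    }

    // Minimizes the average log-loss by batch gradient descent.
    static double[] fit(double[][] x, int[] y, double learningRate, int epochs) {
        int n = x.length;
        int d = x[0].length;
        double[] w = new double[d + 1];
        for (int epoch = 0; epoch < epochs; epoch++) {
            double[] grad = new double[d + 1];
            for (int i = 0; i < n; i++) {
                double error = predict(w, x[i]) - y[i];
                for (int j = 0; j < d; j++) {
                    grad[j] += error * x[i][j];
                }
                grad[d] += error; // gradient for the intercept
            }
            for (int j = 0; j <= d; j++) {
                w[j] -= learningRate * grad[j] / n;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Toy data: small inputs belong to class 0, large inputs to class 1
        double[][] x = {{0}, {1}, {2}, {3}};
        int[] y = {0, 0, 1, 1};
        double[] w = fit(x, y, 0.5, 2000);
        System.out.println("P(class 1 | x = 3) = " + predict(w, new double[] {3}));
    }
}
```

Nominal attributes such as outlook would first have to be encoded as numbers (for example, one indicator column per value) before a trainer like this could use them; Weka takes care of that step internally.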

Make a Prediction

We can use the above model to make a prediction. We will be using the last instance of the dataset to make a prediction as shown below:

Instance pred = datasetInstances.lastInstance();
double answer = classifier.classifyInstance(pred);
System.out.println(answer);
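
The value returned by classifyInstance() is the index of the predicted class value, stored as a double, rather than the label itself. In Weka the label can be looked up with datasetInstances.classAttribute().value((int) answer). The sketch below shows the same lookup in plain Java; the class name PredictionLabel and the order of the play values are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.List;

final class PredictionLabel {
    // Converts a class index (as returned by a classifier) into its label.
    static String labelFor(double classIndex, List<String> classValues) {
        int index = (int) classIndex;
        if (index < 0 || index >= classValues.size()) {
            throw new IllegalArgumentException("No class value at index " + index);
        }
        return classValues.get(index);
    }

    public static void main(String[] args) {
        // Assumed declaration order of the play attribute's values
        List<String> playValues = Arrays.asList("no", "yes");
        System.out.println(labelFor(1.0, playValues)); // prints "yes"
    }
}
```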

Compile and Run the Model

To compile and run the model, you will need the Weka API. Download it from the following URL:

http://www.java2s.com/Code/Jar/w/weka.htm

Next, log in as the gsadm user. Move your .java file to the bin folder of your GridDB installation, located in the following path:

/griddb_4.6.0-1_amd64/usr/griddb-4.6.0/bin

Run the following command on your Linux terminal to set the path for the gridstore.jar file:

export CLASSPATH=$CLASSPATH:/home/osboxes/Downloads/griddb_4.6.0-1_amd64/usr/griddb-4.6.0/bin/gridstore.jar

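Next, compile your .java file. Because the -cp option overrides the CLASSPATH environment variable, include $CLASSPATH explicitly so that gridstore.jar stays on the class path:

javac -cp "$CLASSPATH:weka-3-7-0/weka.jar" WeatherPlay.java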

Run the generated .class file with the following command, again including $CLASSPATH so that gridstore.jar is found at run time:

java -cp ".:$CLASSPATH:weka-3-7-0/weka.jar" WeatherPlay

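The model returned 1.0 for the prediction. Weka reports a prediction as the index of the predicted class value, and here index 1 corresponds to yes, which means that we can play.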

If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.