Introduction
Regression analysis is a technique used to determine the relationship between a dependent variable and one or more independent variables for prediction purposes. It is a good tool for data modelling and analysis.
There are different regression techniques. Our focus will be on Logistic Regression.
Logistic Regression is suitable when the dependent variable is categorical, such as a yes/no outcome, and it can use one or more independent variables. This technique requires large sample sizes because maximum likelihood estimates are less powerful than ordinary least squares at low sample sizes.
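To make the idea concrete, logistic regression passes a weighted sum of the inputs through the logistic (sigmoid) function to obtain a probability between 0 and 1, and predicts the positive class when that probability is at least 0.5. The following is a minimal sketch of that calculation. The class name, the weights and the encoded inputs are hypothetical values chosen for illustration; they are not coefficients fitted to play.csv.

```java
public class LogisticSketch {
    // Logistic (sigmoid) function: maps any real number into the range (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Probability of the positive class for one input vector:
    // sigmoid(intercept + sum of weights[i] * x[i]).
    static double probability(double intercept, double[] weights, double[] x) {
        double z = intercept;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * x[i];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        double intercept = -0.3;          // hypothetical value
        double[] weights = {0.8, -1.2};   // hypothetical values
        double[] x = {1.0, 0.0};          // hypothetical encoded inputs

        double p = probability(intercept, weights, x);
        System.out.println("P(positive class) = " + p);
        System.out.println("Predicted class = " + (p >= 0.5 ? 1 : 0));
    }
}
```

In a real model the intercept and weights are estimated from the data by maximum likelihood, which is the work that a library such as Weka performs for us.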
In this article, we will be demonstrating how to implement Logistic Regression using Java and GridDB. The goal is to predict whether to play or not based on weather conditions.
Store the Data in GridDB
The data is stored in a CSV file named “play.csv”. The dataset has 4 independent variables (outlook, temperature, humidity, windy) and 1 dependent variable (play). We will write the dataset into GridDB to take advantage of benefits such as improved query performance.
Let’s first import the libraries to be used for this:
import java.io.File;
import java.io.IOException;
import java.util.Properties;
import java.util.Scanner;
import com.toshiba.mwcloud.gs.Collection;
import com.toshiba.mwcloud.gs.GSException;
import com.toshiba.mwcloud.gs.GridStore;
import com.toshiba.mwcloud.gs.GridStoreFactory;
import com.toshiba.mwcloud.gs.Query;
import com.toshiba.mwcloud.gs.RowKey;
import com.toshiba.mwcloud.gs.RowSet;
Now, let’s create a static Java class to represent the GridDB container to be used:
public static class Weather {
@RowKey String outlook;
String temperature;
String humidity;
String windy;
String play;
}
Think of the above class as a SQL table with 5 columns. The 5 fields simply represent the columns of the GridDB container.
We can now connect to GridDB from Java. The following code demonstrates this:
Properties props = new Properties();
props.setProperty("notificationAddress", "239.0.0.1");
props.setProperty("notificationPort", "31999");
props.setProperty("clusterName", "defaultCluster");
props.setProperty("user", "admin");
props.setProperty("password", "admin");
GridStore store = GridStoreFactory.getInstance().getGridStore(props);
The GridDB container to be used is named col01 and stores rows of type Weather. Let’s create it, or get it if it already exists:
Collection<String, Weather> coll = store.putCollection("col01", Weather.class);
Every time we need to use the Weather container, we will use its instance name coll.
We can now read the data from the play.csv file and write it into the GridDB container:
File file1 = new File("play.csv");
Scanner sc = new Scanner(file1);
// Skip the header line. next() reads one whitespace-delimited token,
// which is one CSV line as long as the fields contain no spaces.
String data = sc.next();
while (sc.hasNext()) {
    String scData = sc.next();
    String[] dataList = scData.split(",");
    // Copy the five CSV fields into a new row object
    Weather wt = new Weather();
    wt.outlook = dataList[0];
    wt.temperature = dataList[1];
    wt.humidity = dataList[2];
    wt.windy = dataList[3];
    wt.play = dataList[4];
    // put() inserts the row, or replaces an existing row that has the same row key
    coll.put(wt);
}
sc.close();
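The loading loop assumes that every line holds exactly five comma-separated fields. The following is a minimal sketch of a stricter parser that uses only the standard library, skips the header and blank lines, and reports malformed rows. The class and method names (CsvRows, parse) are hypothetical, and the sample rows in main are made up for illustration rather than taken from play.csv.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvRows {
    // Splits CSV text into rows, skipping the header line and blank lines.
    // Throws IllegalArgumentException if a row does not have the expected
    // number of fields. Quoted fields are not supported in this sketch.
    static List<String[]> parse(String csvText, int expectedFields) {
        List<String[]> rows = new ArrayList<>();
        String[] lines = csvText.split("\\R"); // \R matches any line break
        for (int i = 1; i < lines.length; i++) { // index 0 is the header
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
            if (fields.length != expectedFields) {
                throw new IllegalArgumentException("Line " + (i + 1)
                        + ": expected " + expectedFields
                        + " fields but found " + fields.length);
            }
            for (int j = 0; j < fields.length; j++) {
                fields[j] = fields[j].trim();
            }
            rows.add(fields);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample content, for illustration only
        String csv = "outlook,temperature,humidity,windy,play\n"
                + "sunny,hot,high,false,no\n"
                + "overcast,mild,normal,true,yes\n";
        for (String[] row : parse(csv, 5)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Each String[] returned by parse could then be copied into a Weather object exactly as in the loop above.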
Retrieve the Data
To use the data to build a Logistic Regression model, we need to pull it from the GridDB container. The following code can help you to accomplish this:
Query<Weather> query = coll.query("select *");
RowSet<Weather> res = query.fetch(false);
The select * query helped us to retrieve all the data stored in the GridDB container.
Implement a Logistic Regression Model
Now that the data is ready, we can use it to implement a Logistic Regression model. The goal of the model is to help us determine whether we can play or not depending on the weather.
Let’s first import the libraries that will help us to implement the model:
import java.io.IOException;
import weka.classifiers.Evaluation;
import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader;
import java.io.BufferedReader;
import java.io.FileReader;
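Weka's Instances constructor reads data in ARFF format, so the rows fetched from GridDB must first be exported to an ARFF file. The code below assumes they have been written to a file named play.arff (a hypothetical name). Let us create a buffered reader and instances for the dataset.
BufferedReader bufferedReader
= new BufferedReader(
new FileReader("play.arff")); // assumed ARFF export of the fetched rows
// Create dataset instances
Instances datasetInstances
= new Instances(bufferedReader);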
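Weka's Instances(Reader) constructor parses ARFF text, so the rows pulled from GridDB have to be serialized as ARFF before they reach it. The following is a minimal sketch of that conversion using only the standard library. It assumes every column is nominal and that no value contains spaces, commas or quotes. The class and method names (ArffWriterSketch, toArff) are hypothetical, and the sample rows in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ArffWriterSketch {
    // Builds ARFF text in which every column is a nominal attribute whose
    // allowed values are the distinct values seen in that column.
    static String toArff(String relation, String[] names, List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (int c = 0; c < names.length; c++) {
            Set<String> values = new LinkedHashSet<>(); // keeps first-seen order
            for (String[] row : rows) {
                values.add(row[c]);
            }
            sb.append("@attribute ").append(names[c])
              .append(" {").append(String.join(",", values)).append("}\n");
        }
        sb.append("\n@data\n");
        for (String[] row : rows) {
            sb.append(String.join(",", row)).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = {"outlook", "temperature", "humidity", "windy", "play"};
        // Made-up sample rows, for illustration only
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[]{"sunny", "hot", "high", "false", "no"});
        rows.add(new String[]{"overcast", "mild", "normal", "true", "yes"});
        System.out.print(toArff("weather", names, rows));
    }
}
```

The resulting string can be written to a file, or wrapped in a java.io.StringReader and passed straight to the Instances constructor. Note that the order of the values inside each pair of braces determines the class index that Weka later reports for a prediction.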
We can now instantiate the Logistic class of the Weka API to build a Logistic Regression model using the dataset:
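// Use the last attribute (play) as the class to predict
datasetInstances.setClassIndex(datasetInstances.numAttributes()-1);
Classifier classifier = new weka.classifiers.functions.Logistic();
// Train the model on the dataset
classifier.buildClassifier(datasetInstances);
// Print a summary of the fitted model
System.out.println(classifier);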
Make a Prediction
We can use the above model to make a prediction. We will be using the last instance of the dataset to make a prediction as shown below:
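Instance pred = datasetInstances.lastInstance();
// classifyInstance() returns the index of the predicted class value as a double
double answer = classifier.classifyInstance(pred);
System.out.println(answer);
// Look up the label that the index refers to
System.out.println(datasetInstances.classAttribute().value((int) answer));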
Compile and Run the Model
To compile and run the model, you will need the Weka API. Download it from the following URL:
http://www.java2s.com/Code/Jar/w/weka.htm
Next, log in as the gsadm user. Move your .java file to the bin folder of your GridDB located in the following path:
/griddb_4.6.0-1_amd64/usr/griddb-4.6.0/bin
Run the following command on your Linux terminal to set the path for the gridstore.jar file:
export CLASSPATH=$CLASSPATH:/home/osboxes/Downloads/griddb_4.6.0-1_amd64/usr/griddb-4.6.0/bin/gridstore.jar
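Next, use the following command to compile your .java file. The -cp option overrides the CLASSPATH environment variable, so the variable is included explicitly:
javac -cp "$CLASSPATH:weka-3-7-0/weka.jar" WeatherPlay.java
Run the .class file that is generated by running the following command:
java -cp ".:$CLASSPATH:weka-3-7-0/weka.jar" WeatherPlay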
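The model returned 1.0 for the prediction. Weka reports the predicted class as an index into the values of the class attribute, so 1.0 refers to the second value declared for play, which in this case means that we can play.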
If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.