Introduction
The Naive Bayes algorithm is a classification technique based on Bayes’ theorem. It assumes that the predictors are independent of each other given the class: the presence of a particular feature in a class is treated as unrelated to the presence of any other feature.
For example, an apple may be characterized as red, round, and about 3 inches in diameter. Even if these features depend on each other, the classifier treats each one as contributing independently to the probability that the fruit is an apple. That’s why it’s called “Naive”.
It is an easy model to build and scales well to very large datasets. Despite its simplicity, Naive Bayes is often competitive with more sophisticated classification algorithms, particularly when training data is limited.
In this article, we will be discussing how to implement a Naive Bayes classifier using Java and GridDB. The goal will be to predict whether a customer will purchase a product based on day, discount, and free delivery.
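To make the independence assumption concrete, here is a minimal from-scratch sketch of a categorical Naive Bayes classifier in plain Java. It is not the Weka implementation used later in this article; the class name NaiveBayesSketch, its methods, and the add-one (Laplace) smoothing are assumptions made for illustration. Each class is scored as log P(class) plus the sum of log P(feature value | class), which is where the “naive” independence assumption enters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only: categorical Naive Bayes
// with add-one smoothing, trained by counting.
final class NaiveBayesSketch {
    // Number of training rows per class label.
    private final Map<String, Integer> classCounts = new HashMap<>();
    // Key "featureIndex|value|label" -> number of matching training rows.
    private final Map<String, Integer> featureCounts = new HashMap<>();
    // Distinct values seen for each feature (needed for smoothing).
    private final List<Set<String>> featureValues = new ArrayList<>();
    private int total = 0;

    void train(String[] features, String label) {
        classCounts.merge(label, 1, Integer::sum);
        total++;
        for (int i = 0; i < features.length; i++) {
            if (featureValues.size() <= i) {
                featureValues.add(new HashSet<>());
            }
            featureValues.get(i).add(features[i]);
            featureCounts.merge(i + "|" + features[i] + "|" + label, 1, Integer::sum);
        }
    }

    // Unnormalized log-posterior: log P(label) + sum_i log P(x_i | label).
    double logScore(String[] features, String label) {
        int classCount = classCounts.getOrDefault(label, 0);
        double score = Math.log((double) classCount / total);
        for (int i = 0; i < features.length; i++) {
            int count = featureCounts.getOrDefault(i + "|" + features[i] + "|" + label, 0);
            int distinct = featureValues.get(i).size();
            // Add-one smoothing keeps unseen combinations from zeroing the score.
            score += Math.log((count + 1.0) / (classCount + distinct));
        }
        return score;
    }

    // Returns the label seen in training with the highest score.
    String predict(String[] features) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : classCounts.keySet()) {
            double score = logScore(features, label);
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Calling train() once per row with the three feature values (day, discount, free delivery) and the purchase label, and then predict() on a new row, returns the label with the highest score.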
Store the Data in GridDB
The data has been stored in a CSV file named “shopping.csv”. We want to move the data into GridDB and enjoy some of its benefits including improved query performance.
Let’s import the libraries to be used for this:
import java.io.File;
import java.io.IOException;
import java.util.Properties;
import java.util.Scanner;
import com.toshiba.mwcloud.gs.Collection;
import com.toshiba.mwcloud.gs.GSException;
import com.toshiba.mwcloud.gs.GridStore;
import com.toshiba.mwcloud.gs.GridStoreFactory;
import com.toshiba.mwcloud.gs.Query;
import com.toshiba.mwcloud.gs.RowKey;
import com.toshiba.mwcloud.gs.RowSet;
Next, we will create a static Java class to represent the GridDB container where the data is to be stored:
public static class ShoppingData {
    // Row key of the container; a put() with an existing key updates that row
    @RowKey String day;
    String discount;
    String free_delivery;
    String purchase;
}
Think of the above Java class as a SQL table with 4 columns. The 4 fields represent the columns of the GridDB container.
Let’s now connect to the GridDB container from Java. We will use the credentials of our GridDB installation:
Properties props = new Properties();
props.setProperty("notificationAddress", "239.0.0.1");
props.setProperty("notificationPort", "31999");
props.setProperty("clusterName", "defaultCluster");
props.setProperty("user", "admin");
props.setProperty("password", "admin");
GridStore store = GridStoreFactory.getInstance().getGridStore(props);
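We will store the data in a container named “col01” whose rows are ShoppingData objects. The putCollection() call creates the container if it does not already exist: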
Collection<String, ShoppingData> coll = store.putCollection("col01", ShoppingData.class);
We will use the variable coll to refer to this container.
Let’s now write the shopping.csv data into GridDB:
// Read shopping.csv line by line and store each row in GridDB
File file1 = new File("shopping.csv");
Scanner sc = new Scanner(file1);
sc.nextLine(); // skip the header row
while (sc.hasNextLine()) {
    String line = sc.nextLine();
    if (line.isEmpty()) {
        continue; // ignore blank lines
    }
    String[] dataList = line.split(",");
    ShoppingData sd = new ShoppingData();
    sd.day = dataList[0];
    sd.discount = dataList[1];
    sd.free_delivery = dataList[2];
    sd.purchase = dataList[3];
    // put() inserts the row, or updates an existing row with the same key
    coll.put(sd);
}
sc.close();
The above code will add the data into the GridDB container.
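Reading the file token by token with a Scanner splits on whitespace, so a field that contains a space would be cut in two. As a hedged alternative, the sketch below parses whole lines and validates the field count before anything is written to GridDB. The class name ShoppingCsvSketch and the parse() method are hypothetical helpers, not part of GridDB or of the original code, and the sketch assumes a simple CSV without quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class ShoppingCsvSketch {
    static final int EXPECTED_FIELDS = 4; // day, discount, free delivery, purchase

    // Parses CSV lines (header first) into trimmed 4-field rows.
    // Blank lines are skipped; lines with the wrong field count are rejected.
    static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // index 0 is the header
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
            if (fields.length != EXPECTED_FIELDS) {
                throw new IllegalArgumentException(
                    "Line " + (i + 1) + ": expected " + EXPECTED_FIELDS
                    + " fields, got " + fields.length);
            }
            for (int j = 0; j < fields.length; j++) {
                fields[j] = fields[j].trim();
            }
            rows.add(fields);
        }
        return rows;
    }
}
```

The input lines could come from Files.readAllLines(Paths.get("shopping.csv")), and each returned array maps to one ShoppingData object in the same field order as above.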
Retrieve the Data
We can now retrieve the data from GridDB and use it to implement a Naive Bayes Classifier. The following code can help us to retrieve the data:
Query<ShoppingData> query = coll.query("select *");
RowSet<ShoppingData> rs = query.fetch(false);
RowSet res = query.fetch();
The select * statement retrieves all the data stored in the container.
Implement the Naive Bayes Classifier
Now that we have the data, we can use it to train a machine learning model using the Naive Bayes algorithm. We will use the Weka library. Let’s first import all the libraries to be used to train the model:
import weka.core.Instance;
import weka.core.Instances;
import weka.filters.Filter;
import java.io.FileReader;
import java.io.BufferedReader;
import weka.classifiers.Evaluation;
import weka.classifiers.Classifier;
import weka.core.converters.ArffLoader;
import weka.classifiers.bayes.NaiveBayesMultinomial;
import weka.filters.unsupervised.attribute.StringToWordVector;
Let’s create a buffered reader and instances for the dataset:
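// Weka's Instances(Reader) constructor expects ARFF text, and FileReader
// takes a file path rather than a RowSet. This assumes the rows fetched
// from GridDB have been exported to an ARFF file; the file name
// "shopping.arff" is an assumption.
BufferedReader bufferedReader
    = new BufferedReader(
        new FileReader("shopping.arff"));
// Create dataset instances
Instances datasetInstances
    = new Instances(bufferedReader);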
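Weka’s Instances(Reader) constructor reads ARFF text, so the rows coming out of GridDB have to be rendered in that format first. The following is a minimal sketch of one way to do that for purely nominal columns. The class name ArffSketch and the toArff() method are hypothetical, and the sketch assumes that no value contains spaces, commas, or quotes (such values would need quoting in ARFF).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class ArffSketch {
    // Renders rows of nominal values as ARFF text. The set of allowed
    // values for each attribute is collected from the data itself.
    static String toArff(String relation, String[] attributeNames, List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (int col = 0; col < attributeNames.length; col++) {
            // LinkedHashSet keeps first-seen order and removes duplicates.
            Set<String> values = new LinkedHashSet<>();
            for (String[] row : rows) {
                values.add(row[col]);
            }
            sb.append("@attribute ").append(attributeNames[col])
              .append(" {").append(String.join(",", values)).append("}\n");
        }
        sb.append("\n@data\n");
        for (String[] row : rows) {
            sb.append(String.join(",", row)).append("\n");
        }
        return sb.toString();
    }
}
```

The resulting string can be written to a file, or wrapped in a java.io.StringReader and passed straight to the Instances constructor. Listing the class column last keeps it compatible with a class index of numAttributes() - 1.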
Let’s now use the multinomial Weka classifier for Naive Bayes to build and evaluate the model:
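// Use the last attribute (purchase) as the class to predict
datasetInstances.setClassIndex(datasetInstances.numAttributes() - 1);
// NaiveBayesMultinomial is designed for numeric count attributes; if the
// attributes are nominal, weka.classifiers.bayes.NaiveBayes is the usual alternative
Classifier classifier = new NaiveBayesMultinomial();
classifier.buildClassifier(datasetInstances);
// Evaluating on the training data gives an optimistic estimate of accuracy
Evaluation eval = new Evaluation(datasetInstances);
eval.evaluateModel(classifier, datasetInstances);
System.out.println("Naive Bayes Classifier Evaluation Summary");
System.out.println(eval.toSummaryString());
System.out.print("The input data expression as per the algorithm is ");
System.out.println(classifier);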
Make a Prediction
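For illustration, we will classify the last instance of the dataset. Note that the code above trains on the full dataset, so this instance was also seen during training; hold it out before calling buildClassifier() if you want a genuinely unseen example. We will use the classifyInstance() function of the Weka library as shown below: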
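Instance pred = datasetInstances.lastInstance();
// classifyInstance() returns the index of the predicted class value
double answer = classifier.classifyInstance(pred);
// Translate the index into the class label
System.out.println(datasetInstances.classAttribute().value((int) answer));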
Compile and Run the Model
To compile and run the above Naive Bayes classifier, you will need the Weka API. Download it from the following URL:
http://www.java2s.com/Code/Jar/w/weka.htm
Next, log in as the gsadm user. Move your .java file to the bin folder of your GridDB installation, located at the following path:
/griddb_4.6.0-1_amd64/usr/griddb-4.6.0/bin
Run the following command on your Linux terminal to set the path for the gridstore.jar file:
export CLASSPATH=$CLASSPATH:/home/osboxes/Downloads/griddb_4.6.0-1_amd64/usr/griddb-4.6.0/bin/gridstore.jar
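Next, use the following command to compile your .java file. The -cp option overrides the CLASSPATH environment variable, so $CLASSPATH is appended explicitly to keep gridstore.jar visible to the compiler:
javac -cp "weka-3-7-0/weka.jar:$CLASSPATH" NaiveBayesClassifierExample.java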
Run the generated .class file with the following command, again including $CLASSPATH so that gridstore.jar is found at run time:
java -cp ".:weka-3-7-0/weka.jar:$CLASSPATH" NaiveBayesClassifierExample
The prediction result shows that the customer will make a purchase.
If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.