Building a Recommendation System in Java

Introduction

Recommender systems suggest items to users based on factors such as each user's past ratings or purchases and the behavior of similar users, putting first the items a user is most likely to be interested in. Netflix uses a recommender system to suggest movies for its users to watch, and Amazon uses one to suggest items for purchase.

Recommender systems are essential in the digital world: users are overwhelmed by choices and need help finding the items they are looking for. Good recommendations make customers happier, which leads to more sales. In this article, we will build a recommender system using Java, Apache Mahout, and GridDB.

How to Implement a Recommender System in Java

In this section, we will build a recommender system in Java using Apache Mahout, an open source library that provides implementations of machine learning algorithms for tasks such as classification, clustering, and recommendation.

The recommender system will recommend items to a user. The dataset is organized in three columns: UserID, Item Number, and Rating. Thus, a row with the values “4, 12, 3.0” means that the user with an ID of 4 has rated item 12 as 3 stars. The dataset is stored in a CSV file, but we will move it to GridDB and pull it from there to build the recommender system.
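To make the row format concrete, here is a small sketch that parses one line such as “4, 12, 3.0” into typed values. It assumes that user IDs and item IDs are whole numbers and ratings are decimals, which is what Mahout's recommenders work with (`long` IDs and `float` preference values). The `RatingLine` class and its `parse` method are hypothetical helpers, not part of GridDB or Mahout.

```java
import java.util.Locale;

/** One parsed line of data.csv (hypothetical helper, not part of GridDB or Mahout). */
final class RatingLine {
    final long userId;
    final long itemId;
    final float rating;

    RatingLine(long userId, long itemId, float rating) {
        this.userId = userId;
        this.itemId = itemId;
        this.rating = rating;
    }

    /** Parses "userID,itemID,rating", tolerating spaces around the commas. */
    static RatingLine parse(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 fields but got " + parts.length + ": " + line);
        }
        return new RatingLine(
                Long.parseLong(parts[0].trim()),
                Long.parseLong(parts[1].trim()),
                Float.parseFloat(parts[2].trim()));
    }

    @Override
    public String toString() {
        return String.format(Locale.ROOT, "user %d rated item %d as %.1f stars",
                userId, itemId, rating);
    }

    public static void main(String[] args) {
        // Prints: user 4 rated item 12 as 3.0 stars
        System.out.println(RatingLine.parse("4, 12, 3.0"));
    }
}
```

Failing loudly on a line with the wrong number of fields makes a malformed CSV easy to spot before any data reaches GridDB.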

Create a Maven Project

Launch your Java IDE and create a new Maven project. Open the pom.xml file and modify it to the following:


<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>Add your Group Id</groupId>
  <artifactId>Add your Artifact Id</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>Add the name of your Project</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-mr</artifactId>
      <version>0.10.0</version>
    </dependency>

    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

Replace the placeholder values in the above file (group ID, artifact ID, and project name) with the details of your own project.

Write the Data into GridDB

We now need to write our data from the CSV file into GridDB. First, let’s import the libraries to be used:

import java.io.File;
import java.io.IOException;
import java.util.Properties;
import java.util.Scanner;

// java.util.Collection is deliberately not imported:
// it would clash with GridDB's Collection below
import com.toshiba.mwcloud.gs.Collection;
import com.toshiba.mwcloud.gs.GSException;
import com.toshiba.mwcloud.gs.GridStore;
import com.toshiba.mwcloud.gs.GridStoreFactory;
import com.toshiba.mwcloud.gs.Query;
import com.toshiba.mwcloud.gs.RowKey;
import com.toshiba.mwcloud.gs.RowSet;

The data has been stored in a CSV file named data.csv. Our goal is to store the data in a GridDB container. Let’s first define the container schema as a static class:

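public static class Ratings {
    // No @RowKey here: each user rates many items, so userid is not unique.
    // With @RowKey on userid, every new rating would overwrite that user's previous one.
    String userid;
    String item;
    String rating;
}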

The above class defines the schema of a GridDB container; the container itself is comparable to a SQL table with three columns.
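In GridDB, a `@RowKey` column behaves like a primary key: putting a row whose key already exists updates that row instead of adding a new one. Because each user rates several items, keying the container on the user ID alone would keep only one rating per user. The sketch below does not touch GridDB; it imitates the two behaviors with a `LinkedHashMap` and an `ArrayList` on a few made-up rows, and the `RowKeyDemo` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java imitation (no GridDB) of putting rows into containers with and without a row key. */
final class RowKeyDemo {

    /** Like a container whose row key is userid: putting an existing key replaces that row. */
    static Map<String, String[]> putKeyedByUser(String[][] rows) {
        Map<String, String[]> container = new LinkedHashMap<>();
        for (String[] row : rows) {
            container.put(row[0], row); // row[0] is the user ID
        }
        return container;
    }

    /** Like a collection without a row key: every put adds a new row. */
    static List<String[]> putWithoutKey(String[][] rows) {
        List<String[]> container = new ArrayList<>();
        for (String[] row : rows) {
            container.add(row);
        }
        return container;
    }

    public static void main(String[] args) {
        // Made-up ratings: user 4 has rated two different items
        String[][] rows = {
            {"4", "12", "3.0"},
            {"4", "14", "5.0"},
            {"2", "12", "4.0"},
        };
        Map<String, String[]> keyed = putKeyedByUser(rows);
        System.out.println("keyed by userid: " + keyed.size()
                + " rows; user 4 keeps only item " + keyed.get("4")[1]);
        System.out.println("no row key:      " + putWithoutKey(rows).size() + " rows");
    }
}
```

GridDB collections may be declared without a row key, which is the simplest way to keep every rating; a unique key would need a column (or combination of values) that is different for every row.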

Let us now establish a connection to GridDB. We create a Properties instance with the specifics of our GridDB installation: the multicast notification address and port, the name of the cluster to connect to, the name of the user who connects, and that user's password. Use the following code:

Properties props = new Properties();
props.setProperty("notificationAddress", "239.0.0.1");
props.setProperty("notificationPort", "31999");
props.setProperty("clusterName", "defaultCluster");
props.setProperty("user", "admin");
props.setProperty("password", "admin");
GridStore store = GridStoreFactory.getInstance().getGridStore(props);

Change the above values to match the settings of your GridDB installation.
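Hard-coding the password in source code is convenient for a tutorial but easy to leak. One option, sketched below under the assumption that you control the environment the program runs in, is to read each setting from an environment variable and fall back to the values above. The `GRIDDB_*` variable names and the `GridDbConfig` class are hypothetical; only the property keys come from the snippet above.

```java
import java.util.Map;
import java.util.Properties;

/** Builds GridDB connection properties, letting environment variables override the defaults. */
final class GridDbConfig {

    // {property key, hypothetical environment variable, default value from the example}
    private static final String[][] SETTINGS = {
        {"notificationAddress", "GRIDDB_NOTIFICATION_ADDRESS", "239.0.0.1"},
        {"notificationPort",    "GRIDDB_NOTIFICATION_PORT",    "31999"},
        {"clusterName",         "GRIDDB_CLUSTER_NAME",         "defaultCluster"},
        {"user",                "GRIDDB_USER",                 "admin"},
        {"password",            "GRIDDB_PASSWORD",             "admin"},
    };

    static Properties fromEnvironment(Map<String, String> env) {
        Properties props = new Properties();
        for (String[] s : SETTINGS) {
            // Use the environment value if present, otherwise the default
            props.setProperty(s[0], env.getOrDefault(s[1], s[2]));
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = fromEnvironment(System.getenv());
        // props can then be passed to GridStoreFactory.getInstance().getGridStore(props)
        props.stringPropertyNames().stream()
                .sorted()
                .filter(k -> !k.equals("password")) // never print the password
                .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
    }
}
```

Taking the environment as a `Map` parameter, rather than calling `System.getenv()` inside the method, keeps the helper easy to test with a hand-made map.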

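We want to write the data to a collection that uses the Ratings schema. putCollection() creates a collection named col01 with that schema, or returns the existing collection if it is already there: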

Collection<String, Ratings> coll = store.putCollection("col01", Ratings.class);

The coll object is a handle to the col01 container, so we will use it to refer to the container from now on.

Let us now read the data from the data.csv file and write it into the GridDB container:

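File file1 = new File("data.csv");
try (Scanner sc = new Scanner(file1)) {
    // Skip the first line, assumed to be a header row
    // (remove this step if your data.csv has no header)
    if (sc.hasNextLine()) {
        sc.nextLine();
    }

    while (sc.hasNextLine()) {
        String line = sc.nextLine().trim();
        if (line.isEmpty()) {
            continue; // ignore blank lines
        }
        // Each line has the form "userID,itemID,rating"
        String[] dataList = line.split(",");

        Ratings ratings = new Ratings();
        ratings.userid = dataList[0].trim();
        ratings.item = dataList[1].trim();
        ratings.rating = dataList[2].trim();

        // put() writes the row to the collection
        // (append() exists only for time series containers)
        coll.put(ratings);
    }
}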

The above code reads each record from the data.csv file, creates a Ratings object from it, and writes that object to the GridDB container. Since the data columns are separated by commas, each record is split on the comma (,) delimiter.

Pull the Data from GridDB

We should now pull the data from GridDB and use it to build a recommender system. The following code pulls the data from GridDB:

Query<Ratings> query = coll.query("select *");
// false: the rows are only read, not updated, so no update locks are needed
RowSet<Ratings> rs = query.fetch(false);

The TQL query select * returns every row in the container, and the rs row set lets us iterate over those rows.

Build a Recommender System

It is now time to build a recommender system from the data using Apache Mahout. Let’s first import the necessary libraries from Apache Mahout:

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.impl.similarity.CityBlockSimilarity;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

The goal is to recommend n items to a user. We will build a user-based recommender: it measures how similar users are to one another with a city block (Manhattan) distance, forms a neighborhood of the users whose similarity to the target user is at least 0.1, and recommends items that those neighbors rated but the target user has not. The code is surrounded by a try and catch block to catch any exceptions that may occur. Here is the code for building the recommender system:

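try {
    // Mahout works on a DataModel, not on GridDB rows, so first copy the rows
    // fetched from GridDB (rs) into a temporary "userID,itemID,rating" CSV file
    File tmp = File.createTempFile("ratings", ".csv");
    tmp.deleteOnExit();
    try (java.io.PrintWriter out = new java.io.PrintWriter(tmp)) {
        while (rs.hasNext()) {
            Ratings row = rs.next();
            out.println(row.userid + "," + row.item + "," + row.rating);
        }
    }
    DataModel model = new FileDataModel(tmp);

    CityBlockSimilarity similarity = new CityBlockSimilarity(model);
    UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1, similarity, model);
    UserBasedRecommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

    // UserID and number of items to be recommended
    List<RecommendedItem> recommendedItems = recommender.recommend(2, 2);

    for (RecommendedItem r : recommendedItems) {
        System.out.println(r);
    }
} catch (Exception ex) {
    System.out.println("An exception occurred: " + ex.getMessage());
}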

The most important parameter in the above code is the similarity threshold passed to ThresholdUserNeighborhood (0.1 here). The neighborhood contains every user whose similarity to the target user meets or exceeds this threshold, and only those users' ratings are used to produce recommendations. Raising the threshold shrinks the neighborhood to users who are more alike: a threshold of 0.5 keeps only users who are at least that similar, and a threshold of 1.0 keeps only users that the similarity measure considers identical to the target user. Lowering it admits more, less similar users.
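To see what the threshold does, the sketch below recomputes a city block similarity and a threshold neighborhood in plain Java for a few made-up users. It follows our understanding of Mahout's CityBlockSimilarity, which looks only at which items two users have rated (not at the rating values) and maps the Manhattan distance d to 1 / (1 + d); treat it as an illustration rather than Mahout's actual code. The `NeighborhoodDemo` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified, plain-Java illustration of a city block similarity and a threshold neighborhood. */
final class NeighborhoodDemo {

    /** Similarity of two users from the sets of item IDs they have rated (values are ignored). */
    static double cityBlockSimilarity(Set<Long> itemsA, Set<Long> itemsB) {
        Set<Long> common = new HashSet<>(itemsA);
        common.retainAll(itemsB);
        // Manhattan distance between the two 0/1 "has rated" vectors:
        // the number of items rated by exactly one of the two users
        int distance = itemsA.size() + itemsB.size() - 2 * common.size();
        // Map the unbounded distance into (0, 1]; identical item sets give 1.0
        return 1.0 / (1.0 + distance);
    }

    /** All other users whose similarity to the target meets or exceeds the threshold. */
    static List<Long> neighborhood(long target, Map<Long, Set<Long>> itemsByUser, double threshold) {
        Set<Long> targetItems = itemsByUser.get(target);
        List<Long> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Set<Long>> e : itemsByUser.entrySet()) {
            if (e.getKey() == target) {
                continue; // a user is not their own neighbor
            }
            if (cityBlockSimilarity(targetItems, e.getValue()) >= threshold) {
                neighbors.add(e.getKey());
            }
        }
        return neighbors;
    }

    /** Made-up data: the item IDs each user has rated. */
    static Map<Long, Set<Long>> sampleData() {
        Map<Long, Set<Long>> items = new LinkedHashMap<>();
        items.put(2L, new HashSet<>(Arrays.asList(10L, 11L)));
        items.put(4L, new HashSet<>(Arrays.asList(10L, 11L, 12L))); // distance 1 -> 0.5
        items.put(5L, new HashSet<>(Arrays.asList(10L, 13L)));      // distance 2 -> 0.333...
        items.put(7L, new HashSet<>(Arrays.asList(14L, 15L, 16L, 17L,
                18L, 19L, 20L, 21L)));                              // distance 10 -> 0.0909...
        return items;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> data = sampleData();
        for (double threshold : new double[] {0.1, 0.5, 1.0}) {
            System.out.println("threshold " + threshold + " -> neighbors of user 2: "
                    + neighborhood(2L, data, threshold));
        }
    }
}
```

With this data, a threshold of 0.1 keeps users 4 and 5, 0.5 keeps only user 4, and 1.0 keeps nobody, because no other user rated exactly the same items as user 2.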

In the above code, we are calculating 2 recommendations for the user with an ID of 2. Let us execute the program to see what it returns.

Compile and Run the Code

To run the code, log in as the gsadm user and move your project files to the bin folder of your GridDB installation, located in the following path:

/griddb_4.6.0-1_amd64/usr/griddb-4.6.0/bin

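Run the following command on your Linux terminal to add the gridstore.jar file to the class path. The Mahout classes declared in pom.xml must be on the class path as well; if you compile with javac instead of through Maven, add the Mahout JAR files and their dependencies to CLASSPATH in the same way: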

export CLASSPATH=$CLASSPATH:/home/osboxes/Downloads/griddb_4.6.0-1_amd64/usr/griddb-4.6.0/bin/gridstore.jar

Next, compile your .java file by running the following command:

javac RecommenderSystemClass.java

Execute the .class file that is generated by running the following command:

java RecommenderSystemClass

The program prints the 2 recommended items for the user in descending order of their estimated rating (value), as shown below:

Recommended Item[item:12, value:4.857143]
Recommended Item[item:14, value:3.357143]
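Each value is the recommender's estimated rating of that item for user 2. The sketch below illustrates, in simplified form and on made-up ratings, how a user-based recommender can arrive at such values: for every item the target user has not rated, it averages the neighbors' ratings weighted by each neighbor's similarity, then returns the highest-scoring items first. It approximates the idea behind GenericUserBasedRecommender but is not Mahout's code; the similarities are simply given as inputs, and the `TopNDemo` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Simplified user-based scoring: similarity-weighted average of neighbors' ratings, then top N. */
final class TopNDemo {

    /** Estimated rating of one item, or NaN if no neighbor rated it. */
    static double estimate(long itemId, Map<Long, Double> neighborSims,
                           Map<Long, Map<Long, Double>> ratingsByUser) {
        double weightedSum = 0.0;
        double totalSim = 0.0;
        for (Map.Entry<Long, Double> n : neighborSims.entrySet()) {
            Double rating = ratingsByUser.get(n.getKey()).get(itemId);
            if (rating != null) {
                weightedSum += n.getValue() * rating;
                totalSim += n.getValue();
            }
        }
        return totalSim == 0.0 ? Double.NaN : weightedSum / totalSim;
    }

    /** Up to howMany items the target has not rated, highest estimate first. */
    static List<Long> recommend(long target, int howMany, Map<Long, Double> neighborSims,
                                Map<Long, Map<Long, Double>> ratingsByUser) {
        Map<Long, Double> rated = ratingsByUser.get(target);
        Map<Long, Double> scores = new HashMap<>();
        for (long neighbor : neighborSims.keySet()) {
            for (long item : ratingsByUser.get(neighbor).keySet()) {
                if (!rated.containsKey(item)) {
                    scores.put(item, estimate(item, neighborSims, ratingsByUser));
                }
            }
        }
        List<Long> items = new ArrayList<>(scores.keySet());
        items.sort((a, b) -> Double.compare(scores.get(b), scores.get(a))); // descending
        return new ArrayList<>(items.subList(0, Math.min(howMany, items.size())));
    }

    private static void rate(Map<Long, Map<Long, Double>> m, long user, long item, double value) {
        m.computeIfAbsent(user, k -> new HashMap<>()).put(item, value);
    }

    /** Made-up ratings for target user 2 and two neighbors. */
    static Map<Long, Map<Long, Double>> sampleRatings() {
        Map<Long, Map<Long, Double>> m = new HashMap<>();
        rate(m, 2, 10, 4.0);
        rate(m, 2, 11, 3.0);
        rate(m, 4, 10, 5.0);
        rate(m, 4, 12, 5.0);
        rate(m, 4, 14, 2.0);
        rate(m, 5, 12, 4.0);
        rate(m, 5, 13, 2.0);
        rate(m, 5, 14, 5.0);
        return m;
    }

    /** Made-up similarities of users 4 and 5 to user 2 (as a neighborhood step might produce). */
    static Map<Long, Double> sampleNeighbors() {
        Map<Long, Double> sims = new HashMap<>();
        sims.put(4L, 0.5);
        sims.put(5L, 0.25);
        return sims;
    }

    public static void main(String[] args) {
        Map<Long, Double> sims = sampleNeighbors();
        Map<Long, Map<Long, Double>> ratings = sampleRatings();
        for (long item : recommend(2, 2, sims, ratings)) {
            System.out.printf(Locale.ROOT, "item %d, estimate %.3f%n",
                    item, estimate(item, sims, ratings));
        }
    }
}
```

Item 12 ranks first here because both neighbors rated it highly, and the more similar neighbor (user 4) carries twice the weight of user 5 in the average.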

Congratulations!

That’s how to build a recommender system using Java and GridDB.

If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.