Blog

Introducing the GridDB Cloud CLI Tool

We have already written a Quick Start Guide on how to use GridDB Cloud. And though we believe it's simple enough to get started with GridDB Cloud's Web API, we wanted to make the most common operations usable from the CLI without having to craft curl requests that include your authentication headers in every command. Enter the GridDB Cloud CLI Tool: GitHub

The GridDB Cloud CLI Tool aims to make managing your GridDB Cloud database a little easier from the comfort of your own terminal! Tasks like querying, pushing data, and creating containers can all be accomplished in your CLI with the help of this tool. In this article, we will walk through how to install and use the tool and show some examples of what you can accomplish with it.

Getting Started (Download & Configuration)

The CLI Tool is distributed via GitHub as a single binary file. In the release section, you can download the appropriate version for your machine. Once downloaded, you can place it in a directory on your PATH and run it from anywhere, or simply run the binary from wherever it is located (i.e. ./griddb-cloud-cli help). The tool is written in Go, so you could also clone the repo and build your own binary:

$ go get
$ go build

Configuration

This tool expects a .yaml file to exist at $HOME/.griddb.yaml with the following fields:

cloud_url: "url"
cloud_username: "example"
cloud_pass: "pass"

Alternatively, you can save the file elsewhere and include the --config flag when running the tool (i.e. griddb-cloud-cli --config /opt/configs/griddb.yaml checkConnection).

You will also still need to whitelist your IP address in the GridDB Cloud Portal; unfortunately, this is not something that can be done through the CLI Tool at this time.

Features & Commands

This tool was written with the help of the ever-popular Cobra library. Because of this, we are able to use the --help flag with every command in case you forget the functionality of some of the commands and their flags.

$ griddb-cloud-cli help
A series of commands to help you manage your cloud-based DB. Standouts include creating a container and graphing one using 'read graph' and 'create' respectfully

Usage:
  griddb-cloud-cli [command]

Available Commands:
  checkConnection Test your Connection with GridDB Cloud
  completion      Generate the autocompletion script for the specified shell
  create          Interactive walkthrough to create a container
  delete          Test your Connection with GridDB Cloud
  help            Help about any command
  ingest          Ingest a `csv` file to a new or existing container
  list            Get a list of all of the containers
  put             Interactive walkthrough to push a row
  read            Query container with TQL
  show            get container info
  sql             Run a sql command

Flags:
      --config string   config file (default is $HOME/.griddb.yaml)
  -h, --help            help for griddb-cloud-cli

So with that out of the way, let's begin with the commands.

All GridDB CLI Tool Commands

On your first run, you should use the checkConnection command as a sanity check to ensure that you can connect to your instance. The tool will tell you if you have improper auth or if you're blocked by the firewall:

Check Connection

$ griddb-cloud-cli checkConnection
[10005:TXN_AUTH_FAILED] (address=172.25.23.68:10001, partitionId=0)
2025/04/30 08:32:33 Authentication Error. Please check your username and password in your config file

$ griddb-cloud-cli checkConnection
2025/04/30 08:33:48 (403) IP Connection Error. Is this IP Address Whitelisted?
Please consider whitelisting Ip Address: X.X.X.116

$ griddb-cloud-cli checkConnection
2025/04/30 08:35:20 Please set a config file with the --config flag or set one in the default location $HOME/.griddb.yaml

And if everything is set up correctly:

$ griddb-cloud-cli checkConnection
200 OK

List Containers

You can list all containers inside of your Cloud DB instance:

$ griddb-cloud-cli list
0: actual_reading_1
1: actual_reading_10
2: boiler_control_10
3: device1
4: device2
5: device3
6: device4
7: device6

Show Container

You can display the schema and other info about an individual container:

$ griddb-cloud-cli show device2
{
    "container_name": "device2",
    "container_type": "TIME_SERIES",
    "rowkey": true,
    "columns": [
        { "name": "ts", "type": "TIMESTAMP", "timePrecision": "MILLISECOND", "index": [] },
        { "name": "device", "type": "STRING", "index": [] },
        { "name": "co", "type": "DOUBLE", "index": [] },
        { "name": "humidity", "type": "FLOAT", "index": [] },
        { "name": "light", "type": "BOOL", "index": [] },
        { "name": "lpg", "type": "DOUBLE", "index": [] },
        { "name": "motion", "type": "BOOL", "index": [] },
        { "name": "smoke", "type": "DOUBLE", "index": [] },
        { "name": "temperature", "type": "DOUBLE", "index": [] }
    ]
}

Querying/Reading a Container

You can run TQL or SQL queries on your containers. TQL is the simpler option:

$ griddb-cloud-cli read device2 --limit 1 --pretty
[ { "name": "device2", "stmt": "select * limit 1", "columns": null, "hasPartialExecution": true }]
[
    [
        { "Name": "ts", "Type": "TIMESTAMP", "Value": "2006-01-02T07:04:05.700Z" },
        { "Name": "device", "Type": "STRING", "Value": "b8:27:eb:bf:9d:51" },
        { "Name": "co", "Type": "DOUBLE", "Value": 0.004955938648391245 },
        { "Name": "humidity", "Type": "FLOAT", "Value": 51 },
        { "Name": "light", "Type": "BOOL", "Value": false },
        { "Name": "lpg", "Type": "DOUBLE", "Value": 0.00765082227055719 },
        { "Name": "motion", "Type": "BOOL", "Value": false },
        { "Name": "smoke", "Type": "DOUBLE", "Value": 0.02041127012241292 },
        { "Name": "temperature", "Type": "DOUBLE", "Value": 22.7 }
    ]
]

The read command runs a simple TQL query against your container, for which you can specify the following: an offset (--offset), a limit (-l, --limit), pretty printing (-p, --pretty), just rows (--rows), which columns you want to see (--columns), or the raw object returned by GridDB Cloud (--raw). Normally when you query a container with GridDB Cloud, it sends your results as two arrays: one with your column metadata, and another with arrays of row data. You can see that with --raw, but the default is to combine the two into a single JSON structure and print it unformatted. If you use --pretty, as above, it will indent and space it out for you.

Just printing rows is better when you're querying lots of rows:

$ griddb-cloud-cli read device1 --limit 25 --rows
[ { "name": "device1", "stmt": "select * limit 25", "columns": null, "hasPartialExecution": true }]
ts,co,humidity,light,lpg,motion,smoke,temp,
[2020-07-12T01:00:25.984Z 0.0041795988 77.5999984741 true 0.006763671 false 0.0178934842 26.8999996185]
[2020-07-12T01:00:53.485Z 0.0048128545 53.5 false 0.0074903843 false 0.0199543908 21.7]
[2020-07-12T01:01:35.020Z 0.0030488793 74.9000015259 true 0.0053836916 false 0.014022829 19.5]
[2020-07-12T01:01:52.751Z 0.0049817187 51.3 false 0.0076795919 false 0.020493267 22.4]
[2020-07-12T01:01:59.191Z 0.003937408 72.9000015259 true 0.006477819 false 0.0170868731 24.7999992371]
[2020-07-12T01:02:01.157Z 0.0050077601 51.1 false 0.0077086115 false 0.0205759974 22.4]
[2020-07-12T01:02:01.445Z 0.0030841269 74.8000030518 true 0.0054286446 false 0.0141479363 19.6000003815]
[2020-07-12T01:02:04.938Z 0.0048169262 53.5 false 0.0074949679 false 0.0199674343 21.7]
[2020-07-12T01:02:05.182Z 0.0025840714 75.5999984741 false 0.0047765452 false 0.0123403139 19.6000003815]
[2020-07-12T01:02:12.428Z 0.0030488793 74.9000015259 true 0.0053836916 false 0.014022829 19.6000003815]
[2020-07-12T01:02:16.506Z 0.0048277855 53.5 false 0.0075071874 false 0.0200022097 21.7]
[2020-07-12T01:02:19.376Z 0.0030401715 74.9000015259 true 0.005372564 false 0.0139918711 19.6000003815]
[2020-07-12T01:02:21.754Z 0.0041428371 77.5999984741 true 0.0067205832 false 0.0177717486 26.8999996185]
[2020-07-12T01:02:29.017Z 0.0048400659 53.5 false 0.0075209965 false 0.0200415141 21.7]
[2020-07-12T01:02:33.443Z 0.0042300404 77.5999984741 true 0.0068226226 false 0.0180601254 26.7999992371]
[2020-07-12T01:02:35.686Z 0.00255591 75.5999984741 false 0.0047388314 false 0.0122362642 19.6000003815]
[2020-07-12T01:02:41.697Z 0.0030488793 75 true 0.0053836916 false 0.014022829 19.6000003815]
[2020-07-12T01:03:03.206Z 0.0042019019 77.5999984741 true 0.006789761 false 0.0179672218 26.7999992371]
[2020-07-12T01:03:04.701Z 0.0049946711 51.3 false 0.0076940309 false 0.0205344276 22.5]
[2020-07-12T01:03:04.768Z 0.0040601528 72.6999969482 true 0.0066232815 false 0.0174970393 24.7999992371]
[2020-07-12T01:03:05.999Z 0.0040886168 77.5 true 0.0066568388 false 0.0175917499 26.7999992371]
[2020-07-12T01:03:08.403Z 0.0048101357 53.7 false 0.0074873232 false 0.0199456799 21.8]
[2020-07-12T01:03:08.942Z 0.0049860142 51.1 false 0.0076843815 false 0.02050692 22.4]
[2020-07-12T01:03:10.023Z 0.0048141805 53.5 false 0.0074918772 false 0.0199586389 21.7]
[2020-07-12T01:03:12.863Z 0.0050019251 51.1 false 0.0077021129 false 0.020557469 22.3]

Querying Number Data into an ASCII Line Chart

Using a subcommand of read, you can also run a TQL query and read the results into a graph.
For example:

$ griddb-cloud-cli read graph device1 -l 10 --columns temp,humidity
[ { "name": "device1", "stmt": "select * limit 10", "columns": ["temp","humidity"], "hasPartialExecution": true }]

[ASCII line chart output: temp and humidity plotted together, y-axis running from roughly 19.50 to 77.60]

Col names from container device1
■ temp ■ humidity

The results are color-coded so that you can accurately see which columns are mapped to which values. It also automatically omits non-number types if you just want to read the entire container as a line chart:

$ griddb-cloud-cli read graph device1 -l 5
Column ts (of type TIMESTAMP ) is not a `number` type. Omitting
Column light (of type BOOL ) is not a `number` type. Omitting
Column motion (of type BOOL ) is not a `number` type. Omitting

[ASCII line chart output: the remaining numeric columns plotted together, y-axis running from 0.00 to 77.60]

Col names from container device1
■ co ■ humidity ■ lpg ■ smoke ■ temp

Creating Containers

You can create containers using an interactive question prompt in the CLI. It will ask for the container name, container type, rowkey, and column names and types. For example, let's create a new time series container with two columns:

$ griddb-cloud-cli create
✔ Container Name: … sample1
✔ Choose: … TIME_SERIES
✔ How Many Columns for this Container? … 2
✔ Col name For col #1 … ts
✔ Col #1(TIMESTAMP CONTAINERS ARE LOCKED TO TIMESTAMP FOR THEIR ROWKEY) … TIMESTAMP
✔ Col name For col #2 … temp
✔ Column Type for col #2 … DOUBLE
✔ Make Container?
{
    "container_name": "sample1",
    "container_type": "TIME_SERIES",
    "rowkey": true,
    "columns": [
        { "name": "ts", "type": "TIMESTAMP", "index": null },
        { "name": "temp", "type": "DOUBLE", "index": null }
    ]
} … YES
{"container_name":"sample1","container_type":"TIME_SERIES","rowkey":true,"columns":[{"name":"ts","type":"TIMESTAMP","index":null},{"name":"temp","type":"DOUBLE","index":null}]}
201 Created

If you can't easily follow along with the prompt here, please just download the tool and try it for yourself! And note, as explained in the prompts, if you create a TIME_SERIES container, the rowkey is automatically set to true and the first column must have a type of TIMESTAMP. Collection containers have different rules.

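For a sense of what the create command is doing under the hood, here is a rough Python sketch of the equivalent raw WebAPI request. This is an illustration only: the base URL and database path are placeholders (use the WebAPI URL shown in your GridDB Cloud portal), and the credentials come from the same values you put in .griddb.yaml.

# Sketch only: the kind of raw WebAPI call that `griddb-cloud-cli create` wraps.
# BASE_URL, USER, and PASS are placeholders -- substitute the WebAPI URL and
# credentials from your GridDB Cloud portal / ~/.griddb.yaml.
import requests

BASE_URL = "https://example.griddb.cloud/griddb/v2/myCluster/dbs/mydb"
USER, PASS = "example", "pass"

payload = {
    "container_name": "sample1",
    "container_type": "TIME_SERIES",
    "rowkey": True,
    "columns": [
        {"name": "ts", "type": "TIMESTAMP"},
        {"name": "temp", "type": "DOUBLE"},
    ],
}

# Create the container; a 201 status mirrors the "201 Created" shown above
resp = requests.post(f"{BASE_URL}/containers", json=payload, auth=(USER, PASS))
print(resp.status_code)
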
Putting Rows to Containers

Similarly, you can follow along with the prompt to push data into your container, one row at a time. Here we will push to our new container sample1 and use NOW() as the current timestamp:

$ griddb-cloud-cli put sample1
Container Name: sample1
✔ Column 1 of 2 Column Name: ts Column Type: TIMESTAMP … NOW()
✔ Column 2 of 2 Column Name: temp Column Type: DOUBLE … 20.2
[["2025-04-30T07:43:03.700Z", 20.2]]
✔ Add the Following to container sample1? … YES
200 OK

Ingesting CSV Data

You can also ingest full CSV files with this tool. It too uses an interactive prompt, as there is information that needs to be set for each column, such as its index position in the CSV and its data type. Once you set those, it will ingest the data in chunks of 1000 rows.

$ griddb-cloud-cli ingest iot_telemetry_data.csv
✔ Does this container already exist? … NO
Use CSV Header names as your GridDB Container Col names? ts,device,co,humidity,light,lpg,motion,smoke,temp
✔ Y/n … YES
✔ Container Name: … device6
✔ Choose: … TIME_SERIES
✔ Col ts(TIMESTAMP CONTAINERS ARE LOCKED TO TIMESTAMP FOR THEIR ROWKEY) … TIMESTAMP
✔ (device) Column Type … STRING
✔ (co) Column Type … DOUBLE
✔ (humidity) Column Type … DOUBLE
✔ (light) Column Type … BOOL
✔ (lpg) Column Type … DOUBLE
✔ (motion) Column Type … BOOL
✔ (smoke) Column Type … DOUBLE
✔ (temp) Column Type … DOUBLE
{
    "container_name": "device6",
    "container_type": "TIME_SERIES",
    "rowkey": true,
    "columns": [
        { "name": "ts", "type": "TIMESTAMP", "index": null },
        { "name": "device", "type": "STRING", "index": null },
        { "name": "co", "type": "DOUBLE", "index": null },
        { "name": "humidity", "type": "DOUBLE", "index": null },
        { "name": "light", "type": "BOOL", "index": null },
        { "name": "lpg", "type": "DOUBLE", "index": null },
        { "name": "motion", "type": "BOOL", "index": null },
        { "name": "smoke", "type": "DOUBLE", "index": null },
        { "name": "temp", "type": "DOUBLE", "index": null }
    ]
} … YES
{"container_name":"device6","container_type":"TIME_SERIES","rowkey":true,"columns":[{"name":"ts","type":"TIMESTAMP","index":null},{"name":"device","type":"STRING","index":null},{"name":"co","type":"DOUBLE","index":null},{"name":"humidity","type":"DOUBLE","index":null},{"name":"light","type":"BOOL","index":null},{"name":"lpg","type":"DOUBLE","index":null},{"name":"motion","type":"BOOL","index":null},{"name":"smoke","type":"DOUBLE","index":null},{"name":"temp","type":"DOUBLE","index":null}]}
201 Created
Container Created. Starting Ingest
0 ts ts
1 device device
2 co co
3 humidity humidity
4 light light
5 lpg lpg
6 motion motion
7 smoke smoke
8 temp temp
✔ Is the above mapping correct? … YES
Ingesting. Please wait…
Inserting 1000 rows
200 OK
Inserting 1000 rows
200 OK
Inserting 1000 rows

Notice that in this example it asks whether the container already exists in your DB. If you select NO, it will create the container for you as shown above. If you select YES, it will let you pick the container from your list, map the proper indices, and then ingest that way -- handy!

SQL Commands

Sometimes you will need to use SQL, both for its flexibility and for its ability to use and manipulate partitioned tables. There are three subcommands that follow the sql command: create, update, and query. Let's walk through each one (and yes, they are exactly what they sound like). As a note, you will need to include the -s flag (it stands for "string") with every command; it takes the raw SQL string to run.

First, let's create a new partitioned table:

griddb-cloud-cli sql create -s

$ griddb-cloud-cli sql create -s "CREATE TABLE IF NOT EXISTS pyIntPart1 (date TIMESTAMP NOT NULL PRIMARY KEY, value STRING) WITH (expiration_type='PARTITION',expiration_time=10,expiration_time_unit='DAY') PARTITION BY RANGE (date) EVERY (5, DAY);"
[{"stmt": "CREATE TABLE IF NOT EXISTS pyIntPart1 (date TIMESTAMP NOT NULL PRIMARY KEY, value STRING) WITH (expiration_type='PARTITION',expiration_time=10,expiration_time_unit='DAY') PARTITION BY RANGE (date) EVERY (5, DAY);" }]

Now we have our table. Next, let's push some data into a partitioned table:

griddb-cloud-cli sql update -s

$ griddb-cloud-cli sql update -s "INSERT INTO pyIntPart2(date, value) VALUES (NOW(), 'fourth')"
[{"stmt": "INSERT INTO pyIntPart2(date, value) VALUES (NOW(), 'fourth')" }]
[{"updatedRows":1,"status":1,"message":null,"stmt":"INSERT INTO pyIntPart2(date, value) VALUES (NOW(), 'fourth')"}]

And then read from it:

griddb-cloud-cli sql query -s

$ griddb-cloud-cli sql query -s "select * from pyIntPart2 limit 1" --pretty
[{"stmt": "select * from pyIntPart2 limit 1" }]
[
    [
        { "Name": "date", "Type": "TIMESTAMP", "Value": "2025-04-30T14:58:00.255Z" },
        { "Name": "value", "Type": "STRING", "Value": "fourth" }
    ]
]

And as explained above, the read command uses TQL under the hood, which does not have access to partitioned tables, so using read will fail on this particular table:

$ griddb-cloud-cli read pyIntPart2
2025/04/30 09:09:41 400 Error: [151001:TQ_SYNTAX_ERROR_EXECUTION] Partial/Distribute TQL does not support order by and selection expression except for '*' (address=172.25.23.69:10001, partitionId=27) (containerName=pyIntPart2)

Conclusion

We hope that the GridDB Cloud CLI Tool will be helpful, and we hope this article showcased its strengths adequately! And of course, because this tool is completely open source, we encourage users to tinker with and expand the current suite of available features. Some of the things that may be coming from us: JSON-based table creation, pushing rows without interactive mode, & many

More
Building a One-Time Token Login System with Spring Security

Imagine trying to keep track of all your passwords. It's a daunting task, isn't it? With passwords required for social media, online shopping, e-wallet apps, and various computer tools, it's easy to lose count. Forgetting a password can be frustrating, and having to create a new one every time can be a hassle. Fortunately, there's a better and safer way to log in to websites: One-Time Tokens.

Unlike traditional passwords, which are used repeatedly, One-Time Tokens provide a unique code that can only be used once. This token is designed to be used within a short timeframe, and once it's used or expires, it becomes invalid. It's like a secure, self-destructing message.

So, why are One-Time Tokens a better option than traditional passwords? Here are a few key benefits:

Enhanced Security: By reducing our reliance on passwords, we minimize the risk of password-related vulnerabilities. This means no more weak passwords, no more password reuse across different sites, and even if someone intercepts a token, it's likely to be expired and useless.

Improved User Experience: Let's face it, remembering passwords can be a pain. One-Time Tokens simplify the process, allowing users to click a link or enter a short code, making login a smoother experience.

Fewer Password Resets: With One-Time Tokens, the need for password resets decreases significantly. Since users don't have to constantly remember and re-enter passwords, there's less to forget in the first place.

Let's dive into the world of One-Time Token login with Spring Security! We will explore what it is and how we can actually implement it.

One-Time Token Login: The Flow

1. You navigate to the login page and enter your email address to start the process.
2. The application generates a unique, one-time token. This token is a long, random string that's impossible to guess.
3. The system sends this token to you via email, SMS, or WhatsApp.
4. You receive the token and enter it on the login page. Often, you'll just click a magic link in the message that contains the token.
5. The web application receives the token and checks: Is the token valid? (Does it match one we generated?) Has it expired? (One-time tokens have a short lifespan for security.) Has it already been used? (Remember, it's one-time use only!)
6. If everything checks out, you're logged in. The web application establishes a session for you.

💻 Let's Build This Thing!

Maven

To enable the One-Time Token feature, we need to include the spring-boot-starter-security and spring-boot-starter-web dependencies.

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-security</artifactId>
    </dependency>
</dependencies>

Spring Security Configuration

Next, we configure Spring Security and enable both form login and One-Time Token Login:

@Configuration
@EnableWebSecurity
public class SecurityConfig {

  @Bean
  public SecurityFilterChain securityFilterChain(
      HttpSecurity http,
      SendLinkOneTimeTokenGenerationSuccessHandler successHandler,
      CustomOneTimeTokenService customOneTimeTokenService) throws Exception {

    AuthenticationSuccessHandler ottLoginsuccessHandler =
        (request, response, authentication) -> response.sendRedirect("/");

    http.csrf(Customizer.withDefaults())
        .authorizeHttpRequests(
            authorize ->
                authorize
                    .requestMatchers("/error", "/", "/images/**", "/js/*.js", "/css/*.css")
                    .permitAll()
                    .requestMatchers(new AntPathRequestMatcher("/authentication/login"))
                    .permitAll()
                    .requestMatchers(new AntPathRequestMatcher("/logout"))
                    .permitAll()
                    .requestMatchers(new AntPathRequestMatcher("/webjars/**"))
                    .permitAll()
                    .requestMatchers(new AntPathRequestMatcher("/ott/sent"))
                    .permitAll()
                    .requestMatchers(new AntPathRequestMatcher("/ott/submit"))
                    .permitAll()
                    .anyRequest()
                    .authenticated())
        .formLogin(
            form ->
                form.loginPage("/authentication/login")
                    .loginProcessingUrl("/login")
                    .failureUrl("/authentication/login?failed")
                    .defaultSuccessUrl("/")
                    .permitAll())
        .headers(
            httpSecurityHeaders -> httpSecurityHeaders.frameOptions(FrameOptionsConfig::disable))
        .logout(Customizer.withDefaults())
        .oneTimeTokenLogin(
            configurer ->
                configurer
                    .tokenGenerationSuccessHandler(successHandler)
                    .tokenService(customOneTimeTokenService)
                    .showDefaultSubmitPage(false)
                    .authenticationSuccessHandler(ottLoginsuccessHandler));

    return http.build();
  }
}

The @EnableWebSecurity annotation enables Spring Security's web security support and provides the Spring MVC integration, and the SecurityFilterChain bean adds our custom filter chain to the Spring Security context. In it we:

Configure authorizeHttpRequests to define which URL paths should be secured and which should not.
Configure formLogin() to customize form-based authentication: loginPage() redirects to /authentication/login when authentication is required, loginProcessingUrl() validates the submitted credentials, and failureUrl() specifies the URL to send users to if authentication fails.
Configure headers(). We keep all the default headers except X-Frame-Options.
Add .logout(Customizer.withDefaults()) to provide logout support with default settings. By default, accessing the URL /logout logs the user out by invalidating the HTTP session, cleaning up any rememberMe authentication that was configured, clearing the SecurityContextHolder, and then redirecting to /login?success.
Enable One-Time Token Login support and customize it with the oneTimeTokenLogin() method:
- tokenGenerationSuccessHandler: specifies the strategy used to handle generated one-time tokens. We will create a custom handler that implements OneTimeTokenGenerationSuccessHandler.
- tokenService: configures the OneTimeTokenService used to generate and consume the OneTimeToken.
- showDefaultSubmitPage(false): disables the default One-Time Token submit page.
- authenticationSuccessHandler: specifies the AuthenticationSuccessHandler strategy used to handle a successful user authentication. For the demo, we redirect the user to the home page.

Custom OneTimeTokenGenerationSuccessHandler

Next, we need to implement a custom OneTimeTokenGenerationSuccessHandler to deliver the token to the end user.

@Component
public class SendLinkOneTimeTokenGenerationSuccessHandler implements OneTimeTokenGenerationSuccessHandler {

  private final OttEmailService emailService;
  private final FlashMapManager flashMapManager = new SessionFlashMapManager();

  public SendLinkOneTimeTokenGenerationSuccessHandler(OttEmailService emailService) {
    this.emailService = emailService;
  }

  @Override
  @SneakyThrows
  public void handle(
      HttpServletRequest request, HttpServletResponse response, OneTimeToken oneTimeToken)
      throws IOException, ServletException {

    UriComponentsBuilder builder =
        UriComponentsBuilder.fromUriString(UrlUtils.buildFullRequestUrl(request))
            .replacePath(request.getContextPath())
            .replaceQuery(null)
            .fragment(null)
            .path("/ott/submit")
            .queryParam("token", oneTimeToken.getTokenValue());
    String link = builder.toUriString();

    CompletableFuture.runAsync(() -> emailService.sendEmail(oneTimeToken.getUsername(), link));

    RedirectView redirectView = new RedirectView("/ott/sent");
    redirectView.setExposeModelAttributes(false);

    FlashMap flashMap = new FlashMap();
    flashMap.put("token", oneTimeToken.getTokenValue());
    flashMap.put("ottSubmitUrl", link);
    flashMapManager.saveOutputFlashMap(flashMap, request, response);
    redirectView.render(flashMap, request, response);
  }
}

This component does a few things: it generates the magic link containing the one-time token, calls emailService.sendEmail() to send the user an email with the magic link, and, for the demo, redirects the user to the /ott/sent page.

Custom Success Page

Create a controller and an HTML template to handle this page. For demo purposes, we forward the token to the custom submit page.

@Controller
@RequestMapping("/ott")
public class OttController {

  @GetMapping("/sent")
  public String sent(Model model) {
    return "ott/sent";
  }

  @GetMapping("/submit")
  public String submit(Model model, @RequestParam("token") String token) {
    model.addAttribute("token", token);
    return "ott/submit";
  }
}

sent.html

<html xmlns:th="http://www.thymeleaf.org"
      xmlns:layout="http://www.ultraq.net.nz/thymeleaf/layout"
      layout:decorate="~{layout}">
<head>
  <title>OTT Sent</title>
</head>
<body>
  <div layout:fragment="content">
    <p>We just sent you an email. Please follow the provided link to log in.</p>
    <p>For testing here is the <a th:href="${ottSubmitUrl}">submit link</a></p>
  </div>
</body>
</html>

submit.html

<body>
  <div layout:fragment="content">
    <div class="d-flex flex-wrap mb-4">
      <h1 class="flex-grow-1">Login OTT</h1>
    </div>
    <form th:action="@{/login/ott}" method="post">
      <div class="row mb-3">
        <label for="token" class="form-check-label">Token</label>
        <input type="text" id="token" name="token" th:value="${token}" placeholder="Token"
               required="true" autofocus="autofocus" class="form-control"/>
      </div>
      <button class="btn btn-primary" type="submit">Sign in</button>
    </form>
  </div>
</body>

Custom OneTimeTokenService

Next, create a custom implementation of the OneTimeTokenService interface. By customizing it, we can set a custom expiry time, add more information to the token, and implement a custom token value.

@Service
public class CustomOneTimeTokenService implements OneTimeTokenService {

  private final Map<String, OneTimeToken> oneTimeTokens = new ConcurrentHashMap<>();
  private Clock clock = Clock.systemUTC();

  @Override
  @NonNull
  public OneTimeToken generate(GenerateOneTimeTokenRequest request) {
    String token = UUID.randomUUID().toString();
    Instant expiresAt = this.clock.instant().plus(5, ChronoUnit.MINUTES);
    OneTimeToken oneTimeToken = new DefaultOneTimeToken(token, request.getUsername(), expiresAt);
    oneTimeTokens.put(token, oneTimeToken);
    return oneTimeToken;
  }

  @Override
  @Nullable
  public OneTimeToken consume(OneTimeTokenAuthenticationToken authenticationToken) {
    log.info("Consume token: {}", authenticationToken.getTokenValue());
    OneTimeToken oneTimeToken = oneTimeTokens.remove(authenticationToken.getTokenValue());
    if (oneTimeToken == null || isExpired(oneTimeToken)) {
      return null;
    }
    return oneTimeToken;
  }

  private boolean isExpired(OneTimeToken oneTimeToken) {
    return this.clock.instant().isAfter(oneTimeToken.getExpiresAt());
  }
}

The CustomOneTimeTokenService class is responsible for generating the token and storing it (for the demo, we use an in-memory store), and for consuming the token, where we validate the token's expiration and delete it from the store.

Retrieving Users

Create a custom UserDetailsService implementation to load user-specific data.

@Service
@AllArgsConstructor
public class CustomUserDetailService implements UserDetailsService {

  private final UsersContainerClient usersContainerClient;

  @Override
  public UserDetails loadUserByUsername(String email) {
    UserRecord user = usersContainerClient.getUserByEmail(email);
    if (user == null) {
      throw new UsernameNotFoundException("User not found");
    }
    List<SimpleGrantedAuthority> authorities = new java.util.ArrayList<>();
    authorities.add(new SimpleGrantedAuthority("ROLE_USER"));
    if ("admin@example.com".equals(user.email())) {
      authorities.add(new SimpleGrantedAuthority("ROLE_ADMIN"));
    }
    return new org.springframework.security.core.userdetails.User(
        user.email(), user.password(), authorities);
  }
}

The user object is stored and fetched using the UsersContainerClient class, which handles the interaction with GridDB Cloud.

@Service
public class UsersContainerClient {

  private static final String CONTAINER_NAME = "Users";
  private final RestClient restClient;
  private final String baseUrl;

  public UsersContainerClient(
      @Value("${griddb.base-url}") String baseUrl,
      @Value("${griddb.auth-token}") String authToken) {
    this.baseUrl = baseUrl;
    this.restClient =
        RestClient.builder()
            .baseUrl(this.baseUrl)
            .defaultHeader("Authorization", "Basic " + authToken)
            .defaultHeader("Content-Type", MediaType.APPLICATION_JSON_VALUE)
            .build();
  }

  private <T> T post(String uri, Object body, Class<T> responseType) {
    try {
      return restClient.post().uri(uri).body(body).retrieve().body(responseType);
    } catch (GridDbException e) {
      throw e;
    } catch (Exception e) {
      throw new GridDbException(
          "Failed to execute POST request", HttpStatusCode.valueOf(500), e.getMessage(), e);
    }
  }

  public UserRecord getUserByEmail(String email) {
    String statement =
        String.format("SELECT id, email, name, \"password\" FROM Users where email == '%s'", email);
    return getOneUser(statement);
  }

  private UserRecord getOneUser(String statement) {
    String type = "sql-select";
    GridDbCloudSQLSelectInput input = new GridDbCloudSQLSelectInput(type, statement);
    var response = post("/sql", List.of(input), GridDbCloudSQLOutPut[].class);
    log.info("Output: {}", response[0]);
    if (response[0].results().size() == 0) {
      return null;
    }
    UserRecord foundUser = null;
    for (List<String> row : response[0].results()) {
      if (row.size() < 4) {
        break;
      }
      foundUser = new UserRecord(row.get(0), row.get(1), row.get(2), row.get(3));
    }
    log.info("Found user: {}", foundUser);
    return foundUser;
  }
}

record GridDbCloudSQLSelectInput(String type, @JsonProperty("stmt") String statement) {}

record GridDbCloudSQLOutPut(
    @JsonProperty("columns") List<GDCColumnInfo> columns,
    @JsonProperty("results") List<List<String>> results,
    @JsonProperty("responseSizeByte") long responseSizeByte) {}

To communicate with GridDB Cloud via HTTP requests, we create a Spring RestClient instance with HTTP basic authentication. We POST the sql-select query and convert the response into a UserRecord.

Demo

For the demo, we add demo users (admin@example.com, user@example.com) on application startup. The full code can be accessed on GitHub.

Conclusion

One-time tokens are a great leap forward in enhancing online security while keeping things user-friendly. Using frameworks like Spring Security can make it easier to implement these advanced security measures. When using one-time tokens in production, keep these key factors in mind:

- Token Validity: Decide how long each token should stay active.
- Delivery Reliability: Ensure your token delivery method is dependable.
- Security: Make sure the token generation process is cryptographically secure.
- Storage Safety: Store tokens securely to prevent unauthorized access.

By addressing these aspects, you can create a robust and user-friendly security


More
Building a Scheduling Assistant with SpringAI

In this blog post, we'll walk through how to build a personal AI assistant that simplifies managing your calendar. By the end, you'll know how to create an assistant capable of handling event submissions and retrieving schedules through simple conversations. We'll use Spring Boot, Spring AI, and OpenAI to build a system that's both practical and enjoyable to interact with.

Why Build a Personal AI Calendar Assistant?

Managing tasks through natural language might seem like something straight out of a sci-fi movie, but it's more useful than you might expect. This AI assistant can save you time, eliminate the hassle of manual input, and make managing your schedule a breeze. Additionally, building this project is a fantastic way to sharpen your skills as a developer. If you're a computer science student or an aspiring developer, you'll gain valuable hands-on experience with AI integration, backend development, and database management, all while creating a tool you can use in your daily life.

System Overview

Before diving into the details of coding, let's take a moment to understand how the entire system is structured. This application features a single page to display the event list and includes a chatbot interface for user interaction.

User Interaction via Chat

The chatbot interface allows users to interact with the AI assistant using natural language commands. For example:

Submit Events: Add events by chatting with the assistant. For example, you could say, "I want to go to the Shopping Center tomorrow at 2 PM."
List Events: Check your schedule by asking, "Show my events for tomorrow."

The AI assistant processes these commands by understanding the user's queries, extracting critical details such as intent, time, and location, and then performing the appropriate action, like saving the event or retrieving a list of upcoming events.

Backend System (Spring Boot)

The backend serves as the engine of the system, handling several key tasks:

API Handling: Receives user input from the chatbot interface.
Event Management: Manages the storage and retrieval of events from the database.
Spring AI: Manages the AI logic and communicates with the OpenAI API.

AI Module (Spring AI + OpenAI API)

This module functions as the brain of the assistant. Here's how it operates:

Input Parsing: The AI module processes user queries and leverages the OpenAI API to extract key details such as the event title, time, and location.
Intent Recognition: Determines the user's intention, whether it's adding an event or listing upcoming events.
Response Generation: Produces a user-friendly response based on the action performed.

Spring AI acts as a wrapper around the OpenAI API, streamlining the integration process and allowing you to focus on core application logic instead of implementation complexities.

Data Storage (Database Layer)

The database layer ensures that all events are securely stored and can be retrieved when needed. Here's what happens at this level:

Event Storage: Stores each event submitted through the chatbot.
Query: Fetches relevant events from the database when the user requests their schedule.

For this project, we'll use GridDB as our database solution. Now that we've covered the system architecture, let's get started with building the application!

Step-by-Step Guide to Building the Project

The following items should be installed on your system:

Java 17 or later: OpenJDK
Maven
Your preferred IDE: VS Code, IntelliJ IDEA
Docker Compose

OpenAI API

We need to create an API key with OpenAI to access the ChatGPT models. Create an account and generate the token on the API Keys page.

Initialize a Spring Boot Project

You can use this pre-initialized project and click Generate to download a Zip file. You can also fork the project from GitHub and open it in your IDE or other editor.

Spring AI Dependency

Add the Milestone and Snapshot repositories:

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring Milestones</name>
        <url>https://repo.spring.io/milestone</url>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>
    <repository>
        <id>spring-snapshots</id>
        <name>Spring Snapshots</name>
        <url>https://repo.spring.io/snapshot</url>
        <releases>
            <enabled>false</enabled>
        </releases>
    </repository>
</repositories>

Add the Spring AI Bill of Materials (BOM):

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>1.0.0-SNAPSHOT</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

Add the Spring AI OpenAI Spring Boot starter:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

Add the GridDB dependency:

<dependency>
    <groupId>com.github.griddb</groupId>
    <artifactId>gridstore</artifactId>
    <version>5.6.0</version>
</dependency>

Storing and Managing Events

In this project we have a simple calendar system with two main entities: User and Event. Each event is associated with a specific user. Based on the schema above, we create the entity classes as follows:

@Data
public class User {
  @RowKey String id;
  String email;
  String fullName;
}

@Data
public class Event {
  @RowKey private String id;
  private String title;
  private String location;
  private Date startTime;
  private Date endTime;
  private String userId;
}

Next, we create the GridDBConfig class as a central configuration for database operations. The class does the following:

Reads environment variables for connecting to the GridDB database.
Creates a GridStore bean for managing the database connection to the GridDB instance.
Creates the GridDB Collection containers (tables) used to manage sets of rows. A container is a rough equivalent of a table in a relational database. On creating/updating a Collection we specify the name and the object corresponding to the column layout of the collection. For each collection, we also add an index on a column that is frequently searched and used in the WHERE condition of TQL.
Makes the containers available in the Spring context.

@Configuration
public class GridDBConfig {

  @Value("${GRIDDB_NOTIFICATION_MEMBER}")
  private String notificationMember;

  @Value("${GRIDDB_CLUSTER_NAME}")
  private String clusterName;

  @Value("${GRIDDB_USER}")
  private String user;

  @Value("${GRIDDB_PASSWORD}")
  private String password;

  @Bean
  public GridStore gridStore() throws GSException {
    Properties properties = new Properties();
    properties.setProperty("notificationMember", notificationMember);
    properties.setProperty("clusterName", clusterName);
    properties.setProperty("user", user);
    properties.setProperty("password", password);
    GridStore store = GridStoreFactory.getInstance().getGridStore(properties);
    return store;
  }

  @Bean
  public Collection<String, User> userCollection(GridStore gridStore) throws GSException {
    Collection<String, User> collection =
        gridStore.putCollection(AppConstant.USERS_CONTAINER, User.class);
    collection.createIndex("email");
    return collection;
  }

  @Bean
  public Collection<String, Event> eventCollection(GridStore gridStore) throws GSException {
    Collection<String, Event> movieCollection =
        gridStore.putCollection(AppConstant.EVENT_CONTAINER, Event.class);
    movieCollection.createIndex("userId");
    return movieCollection;
  }
}

Business Logic

EventService class

This service class handles event creation and listing.

@Slf4j
@Service
public class EventService {

  private final Collection<String, Event> eventCollection;
  private final Collection<String, User> userCollection;

  public EventService(Collection<String, Event> eventCollection, Collection<String, User> userCollection) {
    this.eventCollection = eventCollection;
    this.userCollection = userCollection;
  }

  public List<EventDTO> findAll(String userId) {
    if (userId != null && !userId.isBlank()) {
      return fetchAll(userId).stream().map(event -> mapToDTO(event, new EventDTO())).toList();
    }
    final List<Event> events = fetchAll();
    return events.stream().map(event -> mapToDTO(event, new EventDTO())).toList();
  }

  public String create(final EventDTO eventDTO, String userId) {
    final Event event = new Event();
    mapToEntity(eventDTO, event);
    event.setUserId(userId);
    event.setId(IdGenerator.next("ev_"));
    try {
      eventCollection.put(event);
      return event.getId();
    } catch (GSException e) {
      throw new AppErrorException("Failed to create event");
    }
  }
}

UserService class

This class handles user creation and lookup.

@Slf4j
@Service
public class UserService {

  private final Collection<String, User> userCollection;

  public UserService(Collection<String, User> userCollection) {
    this.userCollection = userCollection;
  }

  public Optional<User> findByEmail(final String emailString) {
    try (Query<User> query =
        userCollection.query("SELECT * WHERE email='" + emailString + "'", User.class)) {
      RowSet<User> rowSet = query.fetch();
      if (rowSet.hasNext()) {
        User user = rowSet.next();
        return Optional.of(user);
      } else {
        throw new NotFoundException("User not found");
      }
    } catch (GSException e) {
      throw new AppErrorException("Failed to find user");
    }
  }
}

Connecting to OpenAI

To connect to OpenAI's API, we need to configure the API key and specify the name of the OpenAI model used to access the LLM. This configuration is done in the application.yml file:

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o-mini

Here, ${OPENAI_API_KEY} retrieves the API key from an environment variable. For this project, we are using the gpt-4o-mini model.

Initialize the Spring AI ChatClient

Below is the implementation of the PersonalAssistant class, which initializes the ChatClient, processes user queries, and sends them to the OpenAI API.

@Service
public class PersonalAssistant {

  private final ChatClient chatClient;

  public PersonalAssistant(ChatClient.Builder modelBuilder, ChatMemory chatMemory) {
    // @formatter:off
    this.chatClient = modelBuilder.defaultSystem("""
            You are a personal assistant and travel planner. Your job is to answer questions about
            and to perform actions on the user's behalf, mainly around calendar events, and time-management.
            You are required to answer in a professional manner. If you don't know the answer,
            politely tell the user you don't know the answer, then ask the user a followup question
            to try and clarify the question they are asking. If you do know the answer, provide the
            answer but do not provide any additional followup questions.
            Use the provided functions to fetch user's events by email, and create new event.
            Before creating new event, you MUST always get the following information from the user:
            1. Email
            2. Location
            3. Start time
            4. End time: If not provided, assume it ended in one hour.
            5. Title: Get title from user's intent and interest.
            Today is {current_date}.
            """)
        .defaultAdvisors(
            new MessageChatMemoryAdvisor(chatMemory, DEFAULT_CHAT_MEMORY_CONVERSATION_ID, 10),
            new SimpleLoggerAdvisor()
        )
        .defaultFunctions("getUserEvents", "createEvent")
        .build();
    // @formatter:on
  }

  public String chat(String chatId, String userMessageContent) {
    return this.chatClient.prompt()
        .system(s -> s.param("current_date", LocalDate.now().toString()))
        .user(userMessageContent)
        .call().content();
  }
}

A few notes on this class:

We obtain an auto-configured ChatClient.Builder and use it to create the ChatClient. The ChatClient is a Spring bean provided by Spring AI that manages sending user input to the LLM.
To keep our chatbot focused on functioning as a personal assistant and to avoid providing irrelevant information, we use a system message to guide the model's behavior and specify the desired output. This system message is defined within the defaultSystem() method.
We add chat memory to maintain context for up to 10 previous messages, ensuring more cohesive interactions.
We include a SimpleLoggerAdvisor to log request and response data from the ChatClient, which is helpful for debugging and monitoring AI interactions.
We register the getUserEvents() and createEvent() functions to enable the LLM to interact with existing business logic.
The chat() method accepts a user message, passes it to the Spring AI ChatClient bean as input, and returns the result from content().

Function Calling

Here's how function calling works in this project:

1. The user types something like, "Give me my schedule for tomorrow."
2. Spring AI connects to the OpenAI API, processes the text, and extracts the required information.
3. Using function calling, the AI model dynamically determines which function to trigger.
4. Spring AI executes the relevant function with the extracted parameters (e.g., getUserEvents()).
5. Spring AI calls the OpenAI API again, including the function's response, to generate the final reply.

Now, let's map our functions so we can use them with Spring AI.

@Configuration
public class EventTools {

  private static final Logger logger = LoggerFactory.getLogger(EventTools.class);

  @Autowired private EventService eventService;
  @Autowired private UserService userService;

  public record EventListRequest(String email) {}

  public record EventViewDTO(String id, String title, String location, LocalDateTime startTime,
      LocalDateTime endTime, UserViewDTO user) {}

  public record UserViewDTO(String name) {}

  @Bean
  @Description("Get event list for given users email")
  public Function<EventListRequest, List<EventViewDTO>> getUserEvents() {
    return request -> {
      Optional<User> user = userService.findByEmail(request.email());
      return eventService.findAll(user.get().getEmail()).stream().map(this::mapToViewDTO).toList();
    };
  }

  private EventViewDTO mapToViewDTO(EventDTO eventDTO) {
    return new EventViewDTO(eventDTO.getId(), eventDTO.getTitle(), eventDTO.getLocation(),
        eventDTO.getStartTime(), eventDTO.getEndTime(), new UserViewDTO(eventDTO.getUser().name()));
  }

  public record CreateEventRequest(String email, String title, String location,
      LocalDateTime startTime, LocalDateTime endTime) {}

  @Bean
  @Description("Create new event with specified email, title, location, start-time, and end-time.")
  public Function<CreateEventRequest, String> createEvent() {
    return request -> {
      logger.debug("call function create event {}", request);
      Optional<User> user = userService.findByEmail(request.email());
      EventDTO eventDTO = new EventDTO();
      eventDTO.setTitle(request.title());
      eventDTO.setLocation(request.location());
      eventDTO.setStartTime(request.startTime());
      eventDTO.setEndTime(request.endTime());
      return eventService.create(eventDTO, user.get().getId());
    };
  }
}

For each tool, we define a @Bean method that returns a java.util.function.Function and add the @Description annotation to provide a clear explanation of what the function does. Spring AI can leverage the service classes we've already developed without requiring a complete rewrite.

Chat Interface

The chatbox UI is developed using Thymeleaf, JavaScript, and CSS. The chatbox is designed to resemble message bubbles, similar to iMessage, and supports using the Enter key to send messages. We use AJAX to handle HTTP requests and responses seamlessly.

Running the Project with Docker Compose

To spin up the project we will use Docker Compose. The entire code for the web application is available on GitHub. Before starting the application, make sure you have the API key from OpenAI.

Create a .env file with the following content:

OPENAI_API_KEY='YOUR_OPENAI_API_KEY'

Build the services:

docker compose build

Start the services:

docker compose up

After starting, the application is accessible at localhost:8080.

Conclusion

Spring AI makes it easier to add AI features to Spring-based applications. It allows AI code to work alongside existing business logic in the same codebase. What can be improved?

Add logs for chatbox messages (input and output).
Make it easy for users to give feedback on chatbox responses.
Implement safety measures like


More
Pushing Time Series Data to GridDB Cloud’s Time Series Containers with Kafka HTTP Sink

In a previous article, we showcased how one could pair GridDB Cloud's free infrastructure with Kafka using a custom Single Message Transform and some SSL certs/rules; you can read that article here: Pushing Data to GridDB Cloud with Kafka HTTP Sink Connector. In this article, we will expand on those efforts and add timestamp data types into the mix. By the time you finish this article, you should understand how you can stream data from some source over to GridDB Cloud, with the added benefit of being able to push to time series containers, which take timestamps as their rowkey (a must!).

As stated above, the big addition for this article is the handling of time series data and pushing it out to GridDB Cloud. There were two things that had to be learned in order to get this project to work: chaining together Single Message Transforms, and learning the exact time format the GridDB WebAPI accepts for time series data; there was also a minuscule change made to the SMT we used in the previous article.

Prereqs

This article is part II, and therefore a continuation of a previous effort; in part I, we go over the fundamentals of what this project is and how it works. This means that understanding part I of this series is a pseudo-prerequisite for this article, but it is not strictly required. In any case, the prereqs for both of these articles are the same:

A Kafka system/cluster running (Docker is the easiest way)
The source code for the custom Single Message Transform (or the pre-compiled .jar)
The GridDB Web API Sink connector (just connection details to make it yourself)

The source code (and all of the required configs/yamls) can be found on the GridDB.net GitHub page:

$ git clone https://github.com/griddbnet/Blogs.git --branch kafka_http_timeseries

Implementation

Most of the code implementation for this project was done in the previous effort, but there are still some changes we need to make to the existing code base. Mostly, though, we will be using an existing Single Message Transform to send time series data to GridDB Cloud.

The way it works is this: an SMT lets you transform each Kafka record before it gets sent over to your Kafka sink, and you can chain multiple SMTs (executed in order) before the data gets sent out. For our purposes, we are just using the right side of the diagram: the topic flows through to the sink, gets transformed (twice in this case!), and then out to our GridDB Cloud installation. The diagram is credited to Confluent.

Chaining Single Message Transforms

In part I of this series, we used our custom SMT to decouple the values from the field names in our Kafka record and form them into a nested array, which is the only data structure that a PUT to GridDB Cloud accepts. Using just this alone, we were able to successfully push data to a GridDB Collection container. However, when dealing with time series containers, an issue arises because the WebAPI expects a very specific format for the time series data column. If your data is in milliseconds since epoch, for example, the GridDB WebAPI will not accept that as a valid time column value and will reject the HTTP request. According to the docs, the format expected by the GridDB WebAPI is this: YYYY-MM-DDThh:mm:ss.SSSZ (i.e. "2016-01-16T10:25:00.253Z").

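As a quick sanity check of that format, the small standalone snippet below (shown in Python for brevity; it is not part of the connector) converts an epoch-milliseconds value into the string the WebAPI expects:

# Convert epoch milliseconds into the GridDB WebAPI timestamp format:
# YYYY-MM-DDThh:mm:ss.SSSZ, e.g. "2016-01-16T10:25:00.253Z"
from datetime import datetime, timezone

epoch_ms = 1452939900253
ts = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
print(ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"))
# -> 2016-01-16T10:25:00.253Z

In the connector itself, this same conversion is handled declaratively by the TimestampConverter transform shown below.
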
So, before we transform our data to extract the values and create our nested array, we can run a Single Message Transform on just the ts column, transform whatever the value is into the format the WebAPI likes, and then run the process of building our nested array. Using this flow allows us to push data successfully while also transforming the timestamp column into the exact format expected. And please remember, the order of your transforms matters!

"transforms.timestamp.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
"transforms.timestamp.target.type": "string",
"transforms.timestamp.field": "ts",
"transforms.timestamp.format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
"transforms.nestedList.type": "net.griddb.GridDBWebAPITransform$Value",
"transforms.nestedList.fields": "ts",

Here you see we target the ts column and explicitly state the format we expect. One small gotcha is that you must wrap the T and Z characters in single quotes, otherwise Kafka will reject the format as illegal. And of course, if you deviate from this format at all, you will be rejected by the GridDB Cloud -- ouch!

Handling Strings Sent to GridDB WebAPI

Now that we've got our SMTs in place, there's one more 'gotcha' to investigate. The GridDB Web API expects the timestamp to be wrapped in double quotes, so we need to make a small change to our SMT from part I of this series:

Object val = fPath.valueFrom(value);
String valType = val.getClass().getName();
if (valType.contains("String")) {
    val = "\"" + val + "\"";
    row.add(val);
} else {
    row.add(val);
}

Luckily for us, the WebAPI expects all strings to be wrapped in double quotes, so we don't need to explicitly check whether the value is a timestamp; we just need to check whether the value is a string. Once we have this settled, fire up the connector (you can run the script inside of the scripts/ dir, just please make the necessary changes before you do so) and then create some topics.

Creating Topics with Schemas Using the Control Center

Now that we've got our infrastructure in place, let's run it! First, please make sure the GridDB Cloud URL you're using points to a real container already in place in your DB. In my case, I made a time series container called kafka_ts and gave it a schema of: ts (timestamp), data (float), temp (float). This container is already being pointed to in the URL of my sink connector. With that out of the way, let's make our topic and schema. If you used the script to create the connector, your topic may be named topic_griddb_cloud, so head into your Kafka Control Center (located at http://localhost:9021) and create a new topic.

From the Schema tab, you can copy and paste the following schema:

{
  "connect.name": "net.griddb.webapi.griddb",
  "connect.parameters": {
    "io.confluent.connect.avro.field.doc.data": "The string is a unicode character sequence.",
    "io.confluent.connect.avro.field.doc.temp": "The double type is a double precision (64-bit) IEEE 754 floating-point number.",
    "io.confluent.connect.avro.field.doc.ts": "The int type is a 32-bit signed integer.",
    "io.confluent.connect.avro.record.doc": "Sample schema to help you get started."
  },
  "doc": "Sample schema to help you get started.",
  "fields": [
    { "doc": "The int type is a 32-bit signed integer.", "name": "ts", "type": "string" },
    { "doc": "The double type is a double precision (64-bit) IEEE 754 floating-point number.", "name": "temp", "type": "double" },
    { "doc": "The string is a unicode character sequence.", "name": "data", "type": "double" }
  ],
  "name": "griddb",
  "namespace": "net.griddb.webapi",
  "type": "record"
}

Once created, from the Messages tab, produce a new message like so:

{
  "ts": "2025-03-13T18:00:00.032Z",
  "data": 23.2,
  "temp": 43.23
}

If all goes well, your sink should still be running and you should have a new row of data inside of your container -- cool!!

Troubleshooting

While preparing for this article, I had lots of issues getting everything to run properly, despite the results showing how relatively simple it is. There are two reasons for that: first, debugging was pretty obtuse, as the logs are extremely difficult to follow; and second, the schema rules are extremely finicky and must be precise for Kafka to follow through on streaming data (which is a good thing!).

So, if you are encountering issues, I recommend first changing the log level of your Kafka cluster's connect container from the default "WARN" to either DEBUG or TRACE. I'd start with DEBUG and move up if necessary, as the TRACE logs move extremely quickly and are difficult to read. You can change the log level with Docker by adding some environment variables and doing a hard reset of your containers. Add this to the bottom of your Kafka connect docker-compose section:

#docker-compose.yml
  connect:
    image: cnfldemos/cp-server-connect-datagen:0.6.4-7.6.0
    hostname: connect
    container_name: connect
    depends_on:
      - broker
      - schema-registry
    ports:
      - "8083:8083"
      - "2929:2929"
      - "443:443"
    environment:
      CONNECT_LOG4J_ROOT_LOGLEVEL: DEBUG # Or TRACE for even more detail
      CONNECT_LOG4J_LOGGERS: org.apache.kafka.connect=DEBUG,io.confluent.connect=DEBUG

$ docker compose up -d --force-recreate

And once it's printing out the enormous amounts of logs, you can narrow down what you're searching for using grep:

$ docker logs -f connect | grep Caused
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic topic_griddb_cloud to Avro:
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!

I've found grepping for "Caused" to be the best way to debug issues with the connectors, but you can also try searching for the topic name, the connector name, or maybe your URL endpoint. Another thing you can do is modify the SMT code and print messages from there to observe how the SMT is handling your records.

Conclusion

And now we can successfully push our Kafka data directly into GridDB Time Series Containers on the


More
Build a Time Series Forecasting Model Using TensorFlow Keras and GridDB

This article explains how to build a time series forecasting model using TensorFlow Keras and GridDB. We will retrieve historical stock market data from Yahoo Finance, store it in a GridDB time series container, and use it to train a TensorFlow Keras transformer model for time series forecasting. GridDB is a robust NoSQL database designed for handling large volumes of real-time data with exceptional efficiency. Its advanced in-memory processing and time series data management features make it an ideal choice for big data and IoT applications, including financial forecasting and real-time analytics. Prerequisites To run the code in this article, you will need the following libraries: GridDB C Client GridDB Python client Follow the instructions on the GridDB Python Package Index (PyPI) page to install these clients. You will also need to install the TensorFlow, yfinance, NumPy, Pandas, Matplotlib, and scikit-learn libraries. The scripts below will help you install and import the necessary libraries for running the code in this article. pip install yfinance python3 -m pip install tensorflow[and-cuda] import os import absl.logging os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0' absl.logging.set_verbosity(absl.logging.ERROR) os.environ['CUDA_VISIBLE_DEVICES'] = '-1' import yfinance as yf import pandas as pd import griddb_python as griddb import numpy as np import tensorflow as tf from tensorflow import keras from tensorflow.keras import layers from tensorflow.keras.utils import disable_interactive_logging import matplotlib.pyplot as plt from sklearn.preprocessing import MinMaxScaler Inserting Stock Market Data into GridDB We will use stock market data from Yahoo Finance to train our time series forecasting model. In this section, you will see how to fetch stock market data from Yahoo Finance, create a connection with GridDB, and insert the Yahoo Finance data into a GridDB container. Fetch Data from Yahoo Finance The yfinance.download() method allows you to retrieve data from Yahoo Finance into a Pandas dataframe. In the script below we retrieve Apple's stock prices for the full year of 2023: ticker = "AAPL" start_date = "2023-01-01" end_date = "2023-12-31" data = yf.download(ticker, start=start_date, end=end_date) print(f"Fetched {len(data)} rows of data for {ticker}") data.head() Output: Connect to GridDB To connect to GridDB you need to call the griddb.StoreFactory.get_instance() method to get a GridDB factory instance object. Next, you need to create a GridDB factory store object using the get_store() method. You will need to pass your GridDB host name, cluster name, username, and password to the get_store() method. Finally, you can test your connection by retrieving an arbitrary GridDB container using the get_container() method.
The following script shows how to connect to GridDB and test your GridB connection: # GridDB connection details DB_HOST = “127.0.0.1:10001” DB_CLUSTER = “myCluster” DB_USER = “admin” DB_PASS = “admin” # creating a connection factory = griddb.StoreFactory.get_instance() try: gridstore = factory.get_store( notification_member = DB_HOST, cluster_name = DB_CLUSTER, username = DB_USER, password = DB_PASS ) container1 = gridstore.get_container(“container1”) if container1 == None: print(“Container does not exist”) print(“Successfully connected to GridDB”) except griddb.GSException as e: for i in range(e.get_error_stack_size()): print(“[“, i, “]”) print(e.get_error_code(i)) print(e.get_location(i)) print(e.get_message(i)) Output: Container does not exist Successfully connected to GridDB Create Container for Stock Data in GridDB A GridDB container is a fundamental data structure in used for storing and managing data in GridDB. We will store the Yahoo Finance data we retrieved in a time series type container. To create a container you first need to call the ContainerInfo() method and pass it the container name, a list of lists containing data columns and types, and the container type which in our case will be griddb.ContainerType.TIME_SERIES. Next, call the put_container() method and pass it as a parameter the container info object you previously created. The script below shows how to create the AAPL_stock_data container in GridDB. container_name = f”{ticker}_stock_data” column_info = [ [“Timestamp”, griddb.Type.TIMESTAMP], [“Open”, griddb.Type.DOUBLE], [“High”, griddb.Type.DOUBLE], [“Low”, griddb.Type.DOUBLE], [“Close”, griddb.Type.DOUBLE], [“Volume”, griddb.Type.LONG] ] container_info = griddb.ContainerInfo(container_name, column_info, griddb.ContainerType.TIME_SERIES) try: gridstore.put_container(container_info) container = gridstore.get_container(container_name) if container is None: print(f”Failed to create or retrieve container: {container_name}”) else: print(f”Successfully created and retrieved container: {container_name}”) except griddb.GSException as e: print(f”Error creating or retrieving container {container_name}:”) for i in range(e.get_error_stack_size()): print(f”[{i}]”) print(f”Error code: {e.get_error_code(i)}”) print(f”Location: {e.get_location(i)}”) print(f”Message: {e.get_message(i)}”) Output: Successfully created and retrieved container: AAPL_stock_data Insert Data into GridDB Container The last step is to insert the Yahoo Finance data from the Pandas DataFrame into the GridDB container you created in the previous script. To do so, you can iterate through all the rows of a Pandas DataFrame, call the container’s put() method and pass it the data you want to store in the container. The script below shows how to store Yahoo Finance Data in a GridDB container. 
try: for index, row in data.iterrows(): container.put([index.to_pydatetime(), row['Open'], row['High'], row['Low'], row['Close'], int(row['Volume'])]) print(f"Successfully inserted {len(data)} rows of data into {container_name}") except griddb.GSException as e: print(f"Error inserting data into container {container_name}:") for i in range(e.get_error_stack_size()): print(f"[{i}]") print(f"Error code: {e.get_error_code(i)}") print(f"Location: {e.get_location(i)}") print(f"Message: {e.get_message(i)}") Output: Successfully inserted 250 rows of data into AAPL_stock_data Creating a Stock Market Forecasting Model Using TensorFlow Keras We have now successfully stored our time series stock market data in GridDB; next we will train a TensorFlow Keras model for time series forecasting. Retrieving Data from GridDB First we will retrieve data from our GridDB container and store it in a Pandas DataFrame. To do so, call the get_container() method and pass to it the name of the container you want to retrieve. Next, run a SELECT * query on the container using the query() method. Call the fetch() method to run the query and finally the fetch_rows() function to store the returned records in a Pandas DataFrame. def retrieve_data_from_griddb(container_name): try: stock_data_container = gridstore.get_container(container_name) # Query all data from the container query = stock_data_container.query("select *") rs = query.fetch() # Adjust the number based on your data size data = rs.fetch_rows() data.set_index("Timestamp", inplace=True) return data except griddb.GSException as e: print(f"Error retrieving data from GridDB: {e.get_message()}") return None stock_data = retrieve_data_from_griddb("AAPL_stock_data") stock_data.head() Output: Data Preprocessing for TensorFlow Keras Transformer Model We will use a Transformer model from TensorFlow Keras for time series forecasting in this article. You could also use a long short-term memory (LSTM) network or a one-dimensional convolutional neural network (1D-CNN). However, transformers, being the state of the art, are likely to outperform the other models. We will use the Open and Volume stock prices for the last seven days to predict the Open stock price for the next day. To do so, we will divide our data into feature (X) and label (y) sets, and then into training (80%) and test (20%) sets. We will also normalize our data since deep learning models are known to work better with normalized data. The following script preprocesses and normalizes the dataset. features = ['Open', 'Volume'] data = stock_data[features].values # Initialize the scaler scaler = MinMaxScaler(feature_range=(0, 1)) # Fit the scaler to the data and transform data_normalized = scaler.fit_transform(data) # Create sequences def create_sequences(data, seq_length): X, y = [], [] for i in range(len(data) - seq_length): X.append(data[i:(i + seq_length), :]) y.append(data[i + seq_length, 0]) # Predicting next day's Open price return np.array(X), np.array(y) seq_length = 7 # stock prices of last 7 days X, y = create_sequences(data_normalized, seq_length) # Split the data into training and testing sets split = int(0.8 * len(X)) X_train, X_test = X[:split], X[split:] y_train, y_test = y[:split], y[split:] Creating a TensorFlow Keras Transformer Model Next, we will define our transformer model architecture. Our model will consist of a multi-head attention layer, followed by two 1-D convolutional neural network layers. We will also add dropout and layer normalization to avoid overfitting.
You can modify the model architecture if you want. # Define the Transformer block def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0): # Attention and Normalization x = layers.MultiHeadAttention( key_dim=head_size, num_heads=num_heads, dropout=dropout )(inputs, inputs) x = layers.Dropout(dropout)(x) x = layers.LayerNormalization(epsilon=1e-6)(x) res = x + inputs # Feed Forward Part x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(res) x = layers.Dropout(dropout)(x) x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x) x = layers.LayerNormalization(epsilon=1e-6)(x) return x + res Subsequently, we will define the build_model() method that builds our model. The model takes our data features as inputs and passes them through the transformer encoder we just defined. The output of the transformer encoder is passed through a global average pooling layer, followed by three dense layers to get the final model output. # Build the model def build_model( input_shape, head_size, num_heads, ff_dim, num_transformer_blocks, mlp_units, dropout=0, mlp_dropout=0, ): inputs = keras.Input(shape=input_shape) x = inputs for _ in range(num_transformer_blocks): x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout) x = layers.GlobalAveragePooling1D(data_format="channels_first")(x) for dim in mlp_units: x = layers.Dense(dim, activation="relu")(x) x = layers.Dropout(mlp_dropout)(x) outputs = layers.Dense(1)(x) return keras.Model(inputs, outputs) Next, we pass the model configurations to the build_model() function and get the model object back from the function. We call the compile() method to compile the model. # Create the model input_shape = X_train.shape[1:] model = build_model( input_shape, head_size=256, num_heads=4, ff_dim=4, num_transformer_blocks=4, mlp_units=[128], mlp_dropout=0.4, dropout=0.25, ) # Compile the model model.compile( optimizer=keras.optimizers.Adam(learning_rate=1e-4), loss="mse", metrics=["mae"] ) Next, we define callbacks for early stopping, storing the best model weights, and reducing the learning rate. Finally, we call the fit() method and pass it our training data to start model training. # Define callbacks callbacks = [ keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True), keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5, min_lr=1e-6), ] # Train the model history = model.fit( X_train, y_train, validation_split=0.2, epochs=100, batch_size=32, callbacks=callbacks, ) Output: The script below shows the training and validation losses for our model. The curves show that our model is not overfitting. plt.figure(figsize=(12, 6)) plt.plot(history.history['loss'], label='Training Loss') plt.plot(history.history['val_loss'], label='Validation Loss') plt.title('Model Training History') plt.ylabel('Loss') plt.xlabel('Epoch') plt.legend() plt.show() Output: Evaluating the Model Performance Let's evaluate our model's performance on the training and test sets. The output of the following script shows that we receive a mean absolute error score of 0.1596 on the test set, which is greater than the 0.0788 we see on the training set. This shows that our model is overfitting on the training set. Next, we will plot the actual and predicted stock prices side-by-side on a line plot. It is important to note that we have to invert the effect of the data scaling that we applied during the data preprocessing step. The following script also does that.
test_loss, test_mae = model.evaluate(X_test, y_test) print(f”Test MAE: {test_mae:.4f}”) # When making predictions and inverse transforming: train_predictions = model.predict(X_train) test_predictions = model.predict(X_test) # Inverse transform predictions (for Open price only) train_predictions = scaler.inverse_transform(np.column_stack((train_predictions, np.zeros_like(train_predictions))))[:, 0] test_predictions = scaler.inverse_transform(np.column_stack((test_predictions, np.zeros_like(test_predictions))))[:, 0] # Inverse transform actual values y_train_actual = scaler.inverse_transform(np.column_stack((y_train.reshape(-1, 1), np.zeros_like(y_train.reshape(-1, 1)))))[:, 0] y_test_actual = scaler.inverse_transform(np.column_stack((y_test.reshape(-1, 1), np.zeros_like(y_test.reshape(-1, 1)))))[:, 0] Output: Test MAE: 0.1596 Finally, the script below plots the actual stock prices for the training set, and the actual and predicted stock prices for the test set. # Get the actual prices for the entire dataset y_actual = np.concatenate([y_train_actual, y_test_actual]) # Create a date range for the entire dataset full_date_range = pd.date_range(start=stock_data.index[seq_length], periods=len(y_actual), freq=’B’) # Get the date range for the test set test_date_range = full_date_range[-len(y_test_actual):] # Plot results plt.figure(figsize=(20, 10)) # Plot the entire actual price series plt.plot(full_date_range, y_actual, label=’Actual’, color=’blue’) # Plot only the test predictions plt.plot(test_date_range, test_predictions, label=’Predicted (Test Set)’, color=’red’, linestyle=’–‘) # Add a vertical line to indicate the start of the test set split_date = full_date_range[-len(y_test_actual)] plt.axvline(x=split_date, color=’green’, linestyle=’–‘, label=’Test Set Start’) plt.title(‘Stock Open Price – Actual vs Predicted (Test Set)’, fontsize=20) plt.xlabel(‘Date’, fontsize=16) plt.ylabel(‘Open Price’, fontsize=16) plt.legend(fontsize=14) plt.grid(True, which=’both’, linestyle=’–‘, linewidth=0.5) # Rotate and align the tick labels so they look better plt.gcf().autofmt_xdate() # Show the plot plt.show() Output: From the above output, you can see that model predictions are quite close to the actual stock prices. The model also captures the bullish and bearish trend. Conclusion In this article, you learned how to create a TensorFlow Keras model for time series forecasting using data from GridDB. We explored how to connect to GridDB, insert financial data into a time series container, and retrieve it for further processing. We also demonstrated how to build a Transformer-based neural network model for predicting stock prices. You can use the code in this article for developing any time series forecasting model using GridDB time series data. GridDB is a highly efficient NoSQL database, optimized for handling large-scale time series data, which makes it ideal for applications like financial forecasting and real-time analytics. Using TensorFlow’s advanced deep learning and AI capabilities and GridDB’s powerful data management system, you can build scalable and performant forecasting models. You can find the complete code for this blog on my GridDB Blogs GitHub repository. For any questions or issues related to GridDB, feel free to reach out on Stack Overflow using the griddb tag to get prompt response from our engineers. Please note: This article is for educational purposes only and does not serve as financial or stock trading

More
Clothes Recommendation System Using OpenAI & RAG

Table of Contents Clothes Recommendation System Using OpenAI \& RAG Introduction System Architecture Running The Project Understanding Retrieval-Augmented Generation (RAG) How Does RAG Work? Advantages of OpenAI \& RAG in Fashion Prerequisites OpenAI Docker Node.js Project Development Node.js Backend Data Management with GridDB Building User Interface Further Enhancements Introduction Clothes recommendation is an important feature in any e-commerce solution. It gives personalized shopping experiences in fashion and using AI-driven solutions will enhance those experiences. In this article, we will use the GPT-4o mini model to analyze images of clothing and extract its colors and styles. With this information, we can accurately identify the characteristics of the input clothing item and complement the identified features with our knowledge base using the RAG technique. Running The Project This app is tested on ARM Machines such as Apple MacBook M1 or M2 and to run the project you need Docker installed. 1. .env Setup Create an empty directory, for example, clothes-rag, and change to that directory: mkdir clothes-rag cd clothes-rag Create a .env file with these keys: OPENAI_API_KEY= GRIDDB_CLUSTER_NAME=myCluster GRIDDB_USERNAME=admin GRIDDB_PASSWORD=admin IP_NOTIFICATION_MEMBER=griddb-server:10001 To get the OPENAI_API_KEY please read this tutorial section. 2. Run with Docker Compose Create the docker-compose.yml file in the directory and use this setup configuration: networks: griddb-net: driver: bridge services: griddb-server: image: griddbnet/griddb:arm-5.5.0 container_name: griddb-server environment: – GRIDDB_CLUSTER_NAME=${GRIDDB_CLUSTER_NAME} – GRIDDB_PASSWORD=${GRIDDB_PASSWORD} – GRIDDB_USERNAME=${GRIDDB_USERNAME} – NOTIFICATION_MEMBER=1 – IP_NOTIFICATION_MEMBER=${IP_NOTIFICATION_MEMBER} networks: – griddb-net ports: – “10001:10001” # Expose GridDB port if needed for external access clothes-rag: image: junwatu/clothes-rag:latest container_name: clothes-rag-griddb env_file: .env # Load environment variables from the single .env file networks: – griddb-net ports: – “3000:3000″ # Expose application port for local access 3. Run When steps 1 and 2 are finished, run the app with this command: docker-compose up -d If everything running, you will get a similar response to this: [+] Running 3/3 ✔ Network clotes-rag-griddb_griddb-net Created 0.0s ✔ Container griddb-server Started 0.2s ✔ Container clothes-rag-griddb Started 0.2s 4. Test the App Open the browser and go to http://localhost:3000. By default the app will automatically make a request for the default selected product. If you want to run the project locally from the app source code, please read this section. System Architecture This system architecture leverages RAG to ensure that the recommendations are informed by both user-specific input and stored data, making them more relevant and customized. Here’s a breakdown of the components and their interactions: User Interaction: The user inputs a prompt (e.g., their preferences or requirements for clothing) through a React.js based User Interface. This UI serves as the point where the user communicates with the system, sending prompts and receiving recommendations. Node.js Backend: The Node.js server acts as the core processing unit, handling communication between the user interface, database, and OpenAI services. It receives the user’s prompt from the React.js front end and processes it to determine the data and insights required for a recommendation. 
Data Source (GridDB): GridDB is used to store clothing recommendation data such as selected clothes and recommendations. RAG Integration with OpenAI (Embeddings): In this system, RAG uses data from a CSV file that contains embeddings. The Node.js server uses RAG to provide enhanced context by combining information fetched from the text embedding model, data from the CSV file, and the user's prompt before passing it to OpenAI. OpenAI (Text Embedding + GPT-4o mini): The text embedding model is used to generate vector representations of the prompt and any retrieved context, making it easier to match user queries with relevant data. GPT-4o mini (a smaller variant of GPT-4o) processes the prompt, query, and enhanced context together to generate tailored recommendations. This step enables the system to provide more personalized and context-aware recommendations based on both user input and the data fetched from the CSV file. Response Flow: After generating the recommendation, the response is sent back through the Node.js backend to the React.js user interface, where the user can view the clothing suggestions. Understanding Retrieval-Augmented Generation (RAG) Retrieval-augmented generation (RAG) enhances large language models (LLMs) by using external knowledge bases for more accurate responses. LLMs, trained on vast data with billions of parameters, perform tasks like answering questions or translations. RAG improves this by enabling the model to access specific domains or internal data without retraining. How Does RAG Work? Without RAG, the LLM takes the user input and creates a response based on the information it was trained on—or what it already knows. With RAG, an information retrieval component is introduced that utilizes the user input to first pull information from a new knowledge source. The user query and the relevant information are both given to the LLM. The LLM uses the new knowledge and its training data to generate a better text response. Advantages of OpenAI & RAG in Fashion Combining GPT-4o mini with Retrieval-Augmented Generation (RAG) offers several practical benefits for the fashion industry: Contextual Understanding: GPT-4o mini analyzes clothing inputs and comprehends their context, leading to more accurate responses. Access to Information: RAG integrates the generative abilities of GPT-4o mini with a retrieval system that draws from a large database of fashion-related knowledge, ensuring relevant information is readily available. Personalization: The system can provide tailored recommendations based on user preferences and historical data, enhancing the shopping experience. In this post, only points 1 and 2 are utilized for the project. Prerequisites OpenAI There are a few steps needed to set up OpenAI. Go to your project dashboard and do these steps: You need to enable two models from OpenAI: gpt-4o-mini text-embedding-3-large You also need to create a key. It will be used by the app so it can use those models: Use the key for the value of `OPENAI_API_KEY` in the `.env` file; this file should be excluded from the repository. Docker For easy development and distribution, this project uses a Docker container to "package" the application. For easy Docker installation, use the Docker Desktop tool. GridDB Docker This app needs a GridDB server, and it should be running before the app starts. In this project, we will use the GridDB Docker image for ARM machines.
To test the GridDB on your local machine, you can run these docker commands: docker network create griddb-net docker pull griddbnet/griddb:arm-5.5.0 docker run --name griddb-server \ --network griddb-net \ -e GRIDDB_CLUSTER_NAME=myCluster \ -e GRIDDB_PASSWORD=admin \ -e NOTIFICATION_MEMBER=1 \ -d -t griddbnet/griddb:arm-5.5.0 By using Docker Desktop, you can easily check whether the GridDB container is running. For more about GridDB Docker for ARM, please check out this blog. Node.js This is needed for project development. However, if you just want to run the project, you don't have to install it. Install Node.js from here. For this project, we will use the nvm package manager and the Node.js v20.18.0 LTS version. # installs nvm (Node Version Manager) curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash # download and install Node.js nvm install 20 # verifies the right Node.js version is in the environment node -v # should print `v20.18.0` # verifies the right NPM version is in the environment npm -v To connect Node.js to the GridDB database, you need the griddb-node-api npm package, which is a Node.js binding developed using the GridDB C Client and the Node addon API. Project Development If you want to develop the project, you need to do these few steps: 1. Check the GridDB To make this app work as expected, make sure the GridDB Docker container is running. To check it, you can use this docker command: # Check container status docker ps | grep griddb-server If GridDB is running, you will get a response similar to this: fcace9e13b5f griddbnet/griddb:arm-5.5.0 "/bin/bash /start-gr…" 3 weeks ago Up 20 hours 0.0.0.0:10001->10001/tcp griddb-server and if it's not running, please check this previous section. 2. Clone the App Source Code Before cloning the source code, please ensure that Git LFS is installed, as this app utilizes relatively large static files. Clone the app source code from this repository: git clone https://github.com/junwatu/clothes-recommendation.git The app folder is the source code for this app. 3. Build the App Docker Image Change the directory into the app folder and build the Docker image for the app: cd app docker build -t clothes-rag . 4. Run the App Docker Container Before running the dockerized app, you need to set up a few environment variables. You can copy these keys from the .env.example file or create an .env file and fill it with these keys: OPENAI_API_KEY= GRIDDB_CLUSTER_NAME=myCluster GRIDDB_USERNAME=admin GRIDDB_PASSWORD=admin IP_NOTIFICATION_MEMBER=griddb-server:10001 Make sure you have the key to access the OpenAI service. For details on how to do this, read the previous section. The last step is to run the app's Docker container using this command: docker run --name clothes-rag-griddb \ --network griddb-net \ --env-file .env \ -p 3000:3000 clothes-rag Also, by using Docker Desktop you can easily check whether GridDB and the dockerized app are running. From the screenshot above, GridDB is running on port 10001 and the app is running on port 3000. Now, you can test the app using the browser. Node.js Backend This app uses Node.js as the backend server. It serves the user interface files and processes the AI recommendation for the selected product. API Documentation Route Method Description / GET Serves the main user interface file. /recommendation POST Generates clothing recommendations. /query GET Retrieves stored data. /query/:id GET Retrieves data by a specific ID. The core functionality for this app is in the /recommendation route.
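As a quick sanity check on these routes, you can hit the read-only endpoints with curl once the app is up on port 3000 (the ID below is just a placeholder; use an ID returned by your own instance): $ curl http://localhost:3000/query $ curl http://localhost:3000/query/123 The /recommendation route is a POST that expects the selected product in the request body, so the easiest way to exercise it is through the UI, as shown later in the Building User Interface section.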
The getClothRecommendations function will take a selected product, which is essentially a product image, and it will return an array of product recommendations. const recommendationResults = await getClothRecommendations(realImagePath); RAG API Documentation The RAG source code is in the lib\rag.js file. This file is responsible for getting the clothes recommendations. Function Name Description analyzeCloth Analyzes an image of clothing to suggest matching items, category, and gender. getEmbeddings Generates embeddings for text descriptions, creating vector representations for similarity calculations. findSimilarItems Finds items similar to an input item based on cosine similarity, filtering by threshold and top matches. getClothRecommendations Generates recommendations for clothing items to pair with an input image, with retry for better matches. The core functionality is handled by the findSimilarItems function, which uses the cosine similarity function to compare two vectors (the numeric representations of the clothes). function cosineSimilarityManual(vec1, vec2) { vec1 = vec1.map(Number); vec2 = vec2.map(Number); const dotProduct = vec1.reduce((sum, v1, i) => sum + v1 * vec2[i], 0); const mag1 = Math.sqrt(vec1.reduce((sum, v) => sum + v * v, 0)); const mag2 = Math.sqrt(vec2.reduce((sum, v) => sum + v * v, 0)); return dotProduct / (mag1 * mag2); } If the similarity value is close to 1, then the clothes are similar and can be recommended. You can set the minimum similarity score for clothes to be included. In this code, the minimum threshold at which clothes are considered a recommendation is 0.5; you can change this to a higher value for stricter recommendations: function findSimilarItems(inputEmbedding, embeddings, threshold = 0.5, topK = 2) { const similarities = embeddings.map((vec, index) => [index, cosineSimilarityManual(inputEmbedding, vec)] ); const filteredSimilarities = similarities.filter(([, sim]) => sim >= threshold); const sortedIndices = filteredSimilarities .sort((a, b) => b[1] - a[1]) .slice(0, topK); return sortedIndices; } Data Source The RAG data source is a clothes style CSV file that contains embedding values, and from it the app gets all of the clothes recommendations. You can look at the full clothes style database in the data\clothes_styles_with_embeddings.csv file. Data Management with GridDB API Documentation The main code responsible for handling data input and output to GridDB is the db/griddbOperarions.js file. Here's a summary of its functions: Function Description getOrCreateContainer Creates a new container or retrieves an existing one based on the specified container name and column info. insertData Inserts data into the specified container and logs the operation. queryData Executes a query on the specified container and fetches the results, logging the number of rows retrieved. queryDataById Queries a container for a specific row identified by a unique ID, returning the corresponding row data. The GridDB database can be used to save data as a collection or simply behave like a column-based database. This function will use the existing container or create a new collection container: export async function getOrCreateContainer(containerName, columnInfoList, rowKey = true) { try { const conInfo = new griddb.ContainerInfo({ 'name': containerName, 'columnInfoList': columnInfoList, 'type': griddb.ContainerType.COLLECTION, 'rowKey': rowKey }); await store.dropContainer(containerName).catch(() => console.log("Container doesn't exist.
Creating new one…") ); let container = await store.putContainer(conInfo, false); return container; } catch (err) { console.error("Error creating container:", err.message); throw err; } } In the getOrCreateContainer function, the type key of the container info should be set to griddb.ContainerType.COLLECTION so that GridDB saves the data as a collection. Save Data The data model for this app contains only three fields: id, image, and recommendations: const columnInfoList = [ ['id', griddb.Type.INTEGER], ['image', griddb.Type.STRING], ['recommendations', griddb.Type.STRING] ]; The recommendation data is saved after a successful response from OpenAI, and this is handled in the /recommendation route: const container = await getOrCreateContainer(containerName, columnInfoList); await insertData(container, [generateRandomID(), product.image, JSON.stringify(cleanRecommendations)]); Read Data To read data in the GridDB database, you can directly use the /query route. Building User Interface The user interface is built using the React library. The main user interface is built with only two React components: ProductSelector.tsx This component shows all the clothes products. For simplicity, the product list comes from static data: const products: Product[] = [ { id: 1, name: "Striped Sports Jersey", description: "Red and black striped sports jersey with short sleeves", price: 39.99, color: "Red/Black", size: ["S", "M", "L", "XL"], category: "Sports Wear", image: "/data/preview/1.png", thumbnail: "data/preview/1-small.jpeg", }, //… ]; When the user selects an item from the product list, its thumbnail is sent to the server for processing, and the server finds recommendations. RecommendationCard.tsx This component displays the recommendations for the selected product. Further Enhancements To improve this product recommendation app, consider these five enhancements: Personalize Recommendations with User Profiles. Dynamic Product Catalog with Real-Time Database Integration. Optimize Data Retrieval with Incremental Caching. Improve Recommendation Algorithm. UI and UX

More
Build a Movie Reservation Website with Spring Boot: A Beginner’s Guide

Hey there, fellow developers! 👋 Ever wondered how websites like MovieTickets work behind the scenes? Today, we’re going to build our own movie reservation website using Spring Boot. Don’t worry if you’re just starting out, I’ll break everything down into simple, easy-to-follow steps. What’s Spring Boot and Why Are We Using It? Think of Spring Boot as your trusty assistant that helps you build websites faster and easier. It’s like having a super-helpful friend who has already set up most of the boring stuff for you so that you can focus on the fun parts of coding! Imagine you’re building a house. Instead of creating every tool from scratch, Spring Boot gives you a ready-to-use toolbox. It handles a lot of the complex setup that usually gives developers headaches, like: Setting up a new project (it’s as easy as clicking a few buttons!) Managing different parts of your website Connecting to your database Making your website secure The best part? Spring Boot is popular in the real world, so learning it now will help you in your future career. Plus, it’s perfect for building websites that need to handle lots of users at once – exactly what we need for our movie reservation system! What We’ll Build Together: A website where users can view movies and show times. A way to select seats and make reservations for showtimes. Allows admins to manage movies. The system should prioritize availability for viewing movies and shows but should prioritize consistency for reservations. Ready to dive in? Let’s start building something awesome! 🚀 Getting Started 🎬 To follow along, you’ll need: Basic Java knowledge A code editor (like IntelliJ or VS Code) Java Development Kit (JDK) installed Create a Spring Boot Project Navigate to start.spring.io. This service pulls in all the dependencies you need for an application and does most of the setup. Click generate, it will generate the Spring Boot project and download it as a zip. Now unzip this project and import it into any IDE.<br> To interact with GridDB, we need to add a GridDB Java Client to this project. Add the following dependency into maven pom.xml. <dependency> <groupId>com.github.griddb</groupId> <artifactId>gridstore</artifactId> <version>5.5.0</version> </dependency> Defining the Entities Let’s start by looking at the big picture – kind of like sketching out the blueprint before building a house. Think of this like making a list of the main “things” our app needs to keep track of. In coding, we call these “entities” (but really, they’re just the important pieces of data we need to store). First, let’s list out the key items we need for our app to work properly. We’ll need the following entities: Movie: This entity stores essential information about a movie, for example, title and genre. Show: contains information related to the schedule or actual time at which a movie begins. Seat: represents the physical seat location. User: represents the individual interacting with the system. Reservation: records the details of a user’s reservation. It typically includes the user ID, show ID, total price, reservation status, and seat number. Next, let’s create Java POJO classes. 
@Data public class User { @RowKey String id; String email; String fullName; Date createdAt; } @Data public class Movie { @RowKey private String id; private String title; private String genre; } @Data public class Show { @RowKey private String id; private String movieId; private Date startTime; private Date endTime; private Double price; private Integer totalSeats; } @Data public class Seat { @RowKey private String id; private String status; private String showId; private String seatNumber; } @Data public class Reservation { @RowKey private String id; private String userId; private String showId; private Double totalPrice; private Integer numberOfSeats; private String[] seats; Date createdAt; } Next, we create the GridDBConfig class as a central configuration for database operation. The class will do the following: * Read environment variables for connecting to the GridDB database * Create a GridStore class for managing database connection to the GridDB instance * Create GridDB Collection’s container (Table) to manage a set of rows. The container is a rough equivalent of the table in a relational database. * On creating/updating the Collection we specify the name and object corresponding to the column layout of the collection. Also for each collection, we add an index for a column that is frequently searched and used in the condition of the WHERE section of TQL. * Make the container available in the Spring container @Configuration public class GridDBConfig { @Value(“${GRIDDB_NOTIFICATION_MEMBER}”) private String notificationMember; @Value(“${GRIDDB_CLUSTER_NAME}”) private String clusterName; @Value(“${GRIDDB_USER}”) private String user; @Value(“${GRIDDB_PASSWORD}”) private String password; @Bean public GridStore gridStore() throws GSException { Properties properties = new Properties(); properties.setProperty(“notificationMember”, notificationMember); properties.setProperty(“clusterName”, clusterName); properties.setProperty(“user”, user); properties.setProperty(“password”, password); GridStore store = GridStoreFactory.getInstance().getGridStore(properties); return store; } @Bean public Collection<String, User> userCollection(GridStore gridStore) throws GSException { Collection<String, User> collection = gridStore.putCollection(AppConstant.USERS_CONTAINER, User.class); collection.createIndex(“email”); return collection; } @Bean public Collection<String, Movie> movieCollection(GridStore gridStore) throws GSException { Collection<String, Movie> movieCollection = gridStore.putCollection(AppConstant.MOVIE_CONTAINER, Movie.class); movieCollection.createIndex(“title”); return movieCollection; } @Bean public Collection<String, Show> showCollection(GridStore gridStore) throws GSException { Collection<String, Show> showCollection = gridStore.putCollection(AppConstant.SHOW_CONTAINER, Show.class); showCollection.createIndex(“movieId”); return showCollection; } @Bean public Collection<String, Seat> seatCollection(GridStore gridStore) throws GSException { Collection<String, Seat> seatCollection = gridStore.putCollection(AppConstant.SEAT_CONTAINER, Seat.class); seatCollection.createIndex(“showId”); return seatCollection; } @Bean public Collection<String, Reservation> reservationCollection(GridStore gridStore) throws GSException { Collection<String, Reservation> reservationCollection = gridStore.putCollection(AppConstant.RESERVATION_CONTAINER, Reservation.class); reservationCollection.createIndex(“userId”); reservationCollection.createIndex(“showId”); return reservationCollection; } } Listing and creating 
movies Now, we create the service class MovieService.java in the service package and implement all the business logic in this class. This service class will interact with the database and return the result after converting it to the DTO class. private List<Movie> fetchAll() { List<Movie> movies = new ArrayList<>(0); try { String tql = “SELECT * FROM ” + AppConstant.MOVIE_CONTAINER; Query<Movie> query = movieCollection.query(tql); RowSet<Movie> rowSet = query.fetch(); while (rowSet.hasNext()) { Movie row = rowSet.next(); movies.add(row); } } catch (GSException e) { log.error(“Error fetch all movies”, e); } return movies; } public List<MovieDTO> findAll() { final List<Movie> movies = fetchAll(); return movies.stream().map(movie -> mapToDTO(movie, new MovieDTO())).toList(); } public MovieDTO get(final String id) { try (Query<Movie> query = movieCollection.query(“SELECT * WHERE id='” + id + “‘”, Movie.class)) { RowSet<Movie> rowSet = query.fetch(); if (rowSet.hasNext()) { return mapToDTO(rowSet.next(), new MovieDTO()); } else { throw new NotFoundException(); } } catch (GSException e) { throw new AppErrorException(); } } public String create(final MovieDTO movieDTO) { if (titleExists(movieDTO.getTitle())) { return “”; } final Movie movie = new Movie(); mapToEntity(movieDTO, movie); movie.setId(KeyGenerator.next(“mv_”)); try { movieCollection.put(movie); } catch (GSException e) { log.error(“Failed put into Movie collection”, e); throw new AppErrorException(); } return movie.getId(); } After creating the service class, we will create the controllers to handle the HTTP request based on the URL. MovieController.java handles all the HTTP requests to /movies. This class will provide attributes to the HTML page. @Controller @RequestMapping(“/movies”) public class MovieController { private final MovieService movieService; public MovieController(final MovieService movieService) { this.movieService = movieService; } @GetMapping public String list(final Model model) { model.addAttribute(“movies”, movieService.findAll()); return “movie/list”; } @GetMapping(“/add”) public String add(@ModelAttribute(“movie”) final MovieDTO movieDTO) { return “movie/add”; } @PostMapping(“/add”) public String add(@ModelAttribute(“movie”) @Valid final MovieDTO movieDTO, final BindingResult bindingResult, final RedirectAttributes redirectAttributes) { if (bindingResult.hasErrors()) { return “movie/add”; } movieService.create(movieDTO); redirectAttributes.addFlashAttribute(WebUtils.MSG_SUCCESS, WebUtils.getMessage(“movie.create.success”)); return “redirect:/movies”; } } Next, we need the html pages for listing movies. We will use HTML elements to render tabular data comprised of rows and columns of cells. Inside the table body, we use Thymeleaf th:each to iterate over collections of movies. <table class=”table table-striped table-hover align-middle”> <thead> <tr> <th scope=”col”>[[#{movie.title.label}]]</th> <th scope=”col”>[[#{movie.genre.label}]]</th> <th><!– –></th> </tr> </thead> <tbody> <tr th:each=”movie : ${movies}”> <td>[[${movie.title}]]</td> <td>[[${movie.genre}]]</td> <td> <div class=”float-end text-nowrap”> <a th:href=”@{/shows/movie/{movieId}(movieId=${movie.id})}” class=”btn btn-sm btn-primary”>[[#{movie.list.show}]]</a> </div> </td> </tr> </tbody> </table> Next, we create a page to create a new movie. We use the &lt;form&gt; tag for submitting user input. 
<form th:action="${requestUri}" method="post"> <div th:replace="~{fragments/forms::inputRow(object='movie', field='title', required=true)}" /> <div th:replace="~{fragments/forms::inputRow(object='movie', field='genre')}" /> <input type="submit" th:value="#{movie.add.headline}" class="btn btn-primary mt-4" /> </form> Showtimes After completing the movies listing, we continue by creating the showtimes. We will repeat the same process to create a listing page which will show the movie name, start time, end time, price, and total seats. The final result will look like this: To make a reservation, the operator clicks the Reserve button in the shows listing. Reservation We create the service class ReservationService.java to handle the reservation process. This class will interact with the reservation and seat tables. @Service public class ReservationService { private final Collection<String, Reservation> reservationCollection; private final Collection<String, Seat> seatCollection; public ReservationService(Collection<String, Reservation> reservationCollection, Collection<String, Seat> seatCollection) { this.reservationCollection = reservationCollection; this.seatCollection = seatCollection; } } Reservation Flow Here's a detailed breakdown of the seat selection functional requirements: Display the showtime and base price Display seats with color indicators: not available (red), selected seats (blue checkbox) Allow users to select multiple seats After users submit the new reservation, the create method will handle the technical implementation: Re-calculate the total price Generate the reservation ID Update the seat status from available to reserved If there are multiple seats, then we should make sure all the selected seats can be updated. public String create(final ReservationDTO reservationDTO) { reservationDTO.setTotalPrice(reservationDTO.getShow().getPrice() .multiply(new BigDecimal(reservationDTO.getSeats().size()))); final Reservation reservation = new Reservation(); mapToEntity(reservationDTO, reservation); reservation.setId(KeyGenerator.next("rsv")); try { seatCollection.setAutoCommit(false); for (String seatId : reservationDTO.getSeats()) { String tql = "SELECT * WHERE id='" + seatId + "'"; log.info(tql); Query<Seat> query = seatCollection.query(tql, Seat.class); RowSet<Seat> rs = query.fetch(true); if (rs.hasNext()) { Seat seat = rs.next(); seat.setStatus("RESERVED"); rs.update(seat); } } seatCollection.commit(); reservationCollection.put(reservation); } catch (GSException e) { log.error("Failed to create reservations", e); throw new AppErrorException("Failed to save reservation"); } return reservation.getId(); } The reservations list will look like this: Running the Project with Docker Compose To spin up the project we will utilize Docker Compose. The entire code for the web application is available on GitHub. To run the app: docker compose up --build The website is ready at http://localhost:8080 Conclusion We've just built a foundational movie reservation system using Spring Boot. This project lays the groundwork for exploring more complex web applications. We can enhance this system by adding features like user authentication, payment integration, and real-time

More
Pushing Data to GridDB Cloud with Kafka HTTP Sink Connector

As we have discussed before, Kafka is an invaluable tool when dealing with certain IoT workloads. Kafka can guarantee a robust pipeline for streaming your sensor data to almost anywhere due to its high flexibility and various connectors. And indeed, we have previously written articles about using GridDB's official Kafka Source & Sink connectors to stream your data from place A to GridDB and vice versa. On the heels of GridDB Cloud now being free for most users worldwide, we thought we could again revisit using Kafka with GridDB, but now instead we would like to push our sensor data into the cloud using the Web API. To accomplish this, we needed to find an HTTP Sink Kafka connector and ensure that it could meet our requirements (namely data transformations and being able to change the HTTP method). Eventually we landed on using Confluent's own HTTP Sink connector, as it was the only one we could find which allowed us to use the PUT method when making our HTTP Requests. As for transforming the data, Kafka already provides a method of doing this with something they call an SMT (Single Message Transform). And then finally, the last challenge we needed to overcome was being able to securely push our data through HTTPS, as GridDB Cloud's endpoint is protected by SSL. Following Along All source code for this project is available on our GitHub page. $ git clone https://github.com/griddbnet/Blogs.git --branch kafka_http Within that repo you will find the source code, the docker compose file, and the SSL certificates. As this entire project is dockerized, to run the project yourself, you will simply need Docker installed. From there, you can run the project: docker compose up -d. We have already included the .jar file in the library dir so you won't need to build the custom SMT code to push data to GridDB Cloud. Implementation To push data to GridDB Cloud via the Web API, you must make an HTTP Request with a data structure that the Web API expects. If you look at the docs, you will see that to push data into a container we need to ensure a couple of things: first, we need to make a PUT HTTP Request. Second, we need to ensure the data is set up as an array of arrays in the order of the schema. For example: [ ["2025-01-16T10:25:00.253Z", 100.5, "normal"], ["2025-01-16T10:35:00.691Z", 173.9, "normal"], ["2025-01-16T10:45:00.032Z", 173.9, null] ] In order to get our Kafka messages to output messages like this, we will need to write a custom SMT. Here's an excellent article on how flexible and useful these can be: Single Message Transformations – The Swiss Army Knife of Kafka Connect. Once we have the SMT finished, we can set up our SSL rules and certs and then make our connectors and topics via Confluent's UI or through JSON files. Single Message Transformations The code to get this working is not very complicated: essentially, we want to take the object structure coming in from a typical Kafka message and transform it into an array of arrays with all of the values parsed out. We will ensure that the index positions match our schema outside of the context of the SMT. As mentioned earlier, the .jar file is included within this project so you don't need to do anything else, but if you would like to build it yourself or make changes, you can use mvn to build it. Here is the full Java code (it's also available in this repo in the smt directory).
@Override public R apply(R record) { final Schema schema = operatingSchema(record); if (schema == null) { final Map<String, Object> value = requireMapOrNull(operatingValue(record), PURPOSE); return newRecord(record, null, value == null ? null : fieldPath.valueFrom(value)); } else { final Struct value = requireStructOrNull(operatingValue(record), PURPOSE); fieldNames = schema.fields(); List<List<Object>> nestedArray = new ArrayList<>(); List<Object> row = new ArrayList<>(); for (Field f : fieldNames) { String fName = f.name(); SingleFieldPath fPath = new SingleFieldPath(fName, FieldSyntaxVersion.V2); row.add(fPath.valueFrom(value)); } nestedArray.add(row); return newRecord(record, schema, value == null ? null : nestedArray); } } The main method we will be using is this apply function. We extract all of the values from the incoming messages, remove the field names, and make a new array of arrays and return that new array. That’s it! Of course there’s more to it, but this is the important bit. Now that we’ve got the structure we need, let’s set up our connectors and SSL information. Docker SSL Parameters Because GridDB Cloud’s endpoint is SSL protected, we need to ensure that our Kafka broker and HTTP Sink have the proper SSL Certs in place to securely communicate with the endpoint. If we miss any part of the process, the connection will fail with various errors, including the dreaded Handshake failed. Based on the docker-compose file I used as the base for this project, to get SSL working, we will need to add a ton SSL environment values for our broker and kafka-connect. Here are some of the values I added to the broker in order for it to get SSL working KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: ‘CONTROLLER:PLAINTEXT, PLAINTEXT:PLAINTEXT, PLAINTEXT_HOST:PLAINTEXT, SSL:SSL’ KAFKA_ADVERTISED_LISTENERS: ‘PLAINTEXT://broker:29092, PLAINTEXT_HOST://localhost:9092, SSL://broker:9093’ KAFKA_SSL_KEYSTORE_FILENAME: kafka.kafka-1.keystore.pkcs12 KAFKA_SSL_KEYSTORE_CREDENTIALS: kafka-1_keystore_creds KAFKA_SSL_KEY_CREDENTIALS: kafka-1_sslkey_creds KAFKA_SSL_TRUSTSTORE_FILENAME: kafka.client.truststore.jks KAFKA_SSL_TRUSTSTORE_CREDENTIALS: kafka-1_trustore_creds KAFKA_SECURITY_PROTOCOL: ‘SSL’ KAFKA_SASL_MECHANISM: ‘plain’ KAFKA_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM: KAFKA_LISTENERS: ‘PLAINTEXT://broker:29092, CONTROLLER://broker:29093, PLAINTEXT_HOST://0.0.0.0:9092, SSL://broker:9093′ On top of adding these values, we also needed to generate these certificate files and copy them to the docker containers using a mounted volume. Generating SSL Certificates First, let’s take a look at the .pkcs12 file, which is the SSL_KEYSTORE_FILE. 
This is a file you can generate on your local working machine. To do so, I followed a guide which gave me the following instructions: $ openssl req -new -nodes \ -x509 \ -days 365 \ -newkey rsa:2048 \ -keyout ca.key \ -out ca.crt $ openssl req -new \ -newkey rsa:2048 \ -keyout kafka-1.key \ -out kafka-1.csr \ -nodes $ openssl x509 -req \ -days 3650 \ -in kafka-1.csr \ -CA ca.crt \ -CAkey ca.key \ -CAcreateserial \ -out kafka-1.crt \ -extensions v3_req $ openssl pkcs12 -export \ -in kafka-1.crt \ -inkey kafka-1.key \ -chain \ -CAfile ca.pem \ -name kafka-1 \ -out kafka-1.p12 \ -password pass:confluent $ keytool -importkeystore \ -deststorepass confluent \ -destkeystore kafka.kafka-1.keystore.pkcs12 \ -srckeystore kafka-1.p12 \ -deststoretype PKCS12 \ -srcstoretype PKCS12 \ -noprompt \ -srcstorepass confluent With that out of the way, we will also need to tell our server that GridDB Cloud is safe by grabbing its certs, generating a truststore from them, and including it in our broker and connect containers. From the GridDB Cloud web dashboard, if you click on the lock icon in the browser, you can view/manage the SSL Certificates. From that menu, you can download the .pem files. Alternatively, you can use the CLI: openssl s_client -showcerts -connect cloud5197.griddb.com:443. With the output, you can save the portions from BEGIN CERTIFICATE to END CERTIFICATE into a separate file. Armed with this file, you can generate a truststore file to let your server know it's a trusted location. $ keytool -import -trustcacerts -alias griddb-cloud-cert -file ca.pem -keystore kafka.client.truststore.jks -storepass confluent -v Now we have the two key files (kafka.kafka-1.keystore.pkcs12 && kafka.client.truststore.jks) needed for secure communication with GridDB Cloud — cool! Connector Clients This next step is where we actually tell our Kafka cluster which data we want streamed, and to where. So in this case, we will make a test topic with a simple schema of just three values: { "connect.name": "net.griddb.webapi.griddb", "connect.parameters": { "io.confluent.connect.avro.field.doc.data": "The string is a unicode character sequence.", "io.confluent.connect.avro.field.doc.temp": "The double type is a double precision (64-bit) IEEE 754 floating-point number.", "io.confluent.connect.avro.field.doc.ts": "The int type is a 32-bit signed integer.", "io.confluent.connect.avro.record.doc": "Sample schema to help you get started." }, "doc": "Sample schema to help you get started.", "fields": [ { "doc": "The int type is a 32-bit signed integer.", "name": "ts", "type": "int" }, { "doc": "The double type is a double precision (64-bit) IEEE 754 floating-point number.", "name": "temp", "type": "double" }, { "doc": "The string is a unicode character sequence.", "name": "data", "type": "double" } ], "name": "griddb", "namespace": "net.griddb.webapi", "type": "record" } Before we try pushing our data to GridDB Cloud, we will need to create our container inside of our DB. You can use the Dashboard or simply send a CURL request using Postman or the CLI to create the container to match that schema. For me, I'm calling it kafka. In this case, I'm not going to make a Time Series container and will settle for a Collection container for educational purposes. We will then make a source connector provided by Confluent to generate mock data in the style of that schema.
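As a sketch of that container-creation step, a single request against the Web API's containers endpoint will do it. This assumes the same base URL used by the sink connector config below, placeholder credentials, and a Collection schema matching the three fields of the Avro schema above; check the Web API docs for the exact endpoint of your own instance: $ curl -X POST https://cloud5197.griddb.com/griddb/v2/gs_clustermfcloud97/dbs/ZUlQ8/containers -u "user:password" -H "Content-Type: application/json" -d '{"container_name": "kafka", "container_type": "COLLECTION", "rowkey": false, "columns": [{"name": "ts", "type": "INTEGER"}, {"name": "temp", "type": "DOUBLE"}, {"name": "data", "type": "DOUBLE"}]}' With the container created, we can move on to the source connector.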
Once you have it set up, it looks like this in the dashboard: Next, we make a connector for the HTTP Sink which takes that source connector’s mock data and streams it out to the HTTP we set it to (hint: it’s GridDB Cloud!). But as the data moves through from the source to the sink, we will of course apply our SMT to change the data into an array of arrays to push to GridDB Cloud. And if we configured our SSL correctly, we should see our data inside of our GridDB Cloud container. Connector Client Values and Rules To send the connectors to your Kafka cluster, you can either manually enter in the values using the Kafka Control Center, which provides a nice UI for editing connectors, or simply take the .json files included with this repo and pushing them using CURL. Here are the values for the datagen which creates mock data for our GridDB Cloud to ingest: { “name”: “web_api_datagen”, “config”: { “connector.class”: “io.confluent.kafka.connect.datagen.DatagenConnector”, “kafka.topic”: “griddb_test”, “schema.string”: “{ \”connect.name\”: \”net.griddb.webapi.griddb\”, \”connect.parameters\”: { \”io.confluent.connect.avro.field.doc.data\”: \”The string is a unicode character sequence.\”, \”io.confluent.connect.avro.field.doc.temp\”: \”The double type is a double precision (64-bit) IEEE 754 floating-point number.\”, \”io.confluent.connect.avro.field.doc.ts\”: \”The int type is a 32-bit signed integer.\”, \”io.confluent.connect.avro.record.doc\”: \”Sample schema to help you get started.\” }, \”doc\”: \”Sample schema to help you get started.\”, \”fields\”: [ { \”doc\”: \”The int type is a 32-bit signed integer.\”, \”name\”: \”ts\”, \”type\”: \”int\” }, { \”doc\”: \”The double type is a double precision (64-bit) IEEE 754 floating-point number.\”, \”name\”: \”temp\”, \”type\”: \”double\” }, { \”doc\”: \”The string is a unicode character sequence.\”, \”name\”: \”data\”, \”type\”: \”double\” } ], \”name\”: \”griddb\”, \”namespace\”: \”net.griddb.webapi\”, \”type\”: \”record\” }” } } It is messy, but that’s because the schema string includes the raw string of the schema I shared earlier (up above). And here are the values of the HTTP Sink itself: { “name”: “griddb_web_api_sink”, “config”: { “connector.class”: “io.confluent.connect.http.HttpSinkConnector”, “transforms”: “nestedList”, “topics”: “griddb”, “transforms.nestedList.type”: “net.griddb.GridDBWebAPITransform$Value”, “transforms.nestedList.fields”: “ts”, “http.api.url”: “https://cloud5197.griddb.com/griddb/v2/gs_clustermfcloud97/dbs/ZUlQ8/containers/kafka/rows”, “request.method”: “put”, “headers”: “Content-Type: application/json”, “auth.type”: “basic”, “connection.user”: “user”, “connection.password”: “password”, “https.ssl.key.password”: “confluent”, “https.ssl.keystore.key”: “”, “https.ssl.keystore.location”: “/etc/kafka/secrets/kafka.kafka-1.keystore.pkcs12”, “https.ssl.keystore.password”: “confluent”, “https.ssl.truststore.location”: “/etc/kafka/secrets/kafka.client.truststore.jks”, “https.ssl.truststore.password”: “confluent”, “https.ssl.enabled.protocols”: “”, “https.ssl.keystore.type”: “PKCS12”, “https.ssl.protocol”: “TLSv1.2”, “https.ssl.truststore.type”: “JKS”, “reporter.result.topic.replication.factor”: “1”, “reporter.error.topic.replication.factor”: “1”, “reporter.bootstrap.servers”: “broker:29092” } } Some important values here: of course the SSL values and certs, as well as the URL as this contains the container name (kafka in our case). We also have our BASIC AUTHENICATION values in here as well as our SMT. 
All of this information is crucial to ensure that our Kafka cluster streams our mock data to the proper place with zero errors. You can push these connectors using HTTP Requests: $ #!/bin/sh curl -s \ -X “POST” “http://localhost:8083/connectors/” \ -H “Content-Type: application/json” \ -d ‘{ “name”: “griddb_web_api_sink”, “config”: { “connector.class”: “io.confluent.connect.http.HttpSinkConnector”, “transforms”: “nestedList”, “topics”: “griddb_test”, “transforms.nestedList.type”: “net.griddb.GridDBWebAPITransform$Value”, “transforms.nestedList.fields”: “ts”, “http.api.url”: “https://cloud5197.griddb.com/griddb/v2/gs_clustermfcloud97/dbs/ZUlQ8/containers/kafka/rows”, “request.method”: “put”, “headers”: “Content-Type: application/json”, “auth.type”: “basic”, “connection.user”: “user”, “connection.password”: “password”, “https.ssl.key.password”: “confluent”, “https.ssl.keystore.key”: “”, “https.ssl.keystore.location”: “/etc/kafka/secrets/kafka.kafka-1.keystore.pkcs12”, “https.ssl.keystore.password”: “confluent”, “https.ssl.truststore.location”: “/etc/kafka/secrets/kafka.client.truststore.jks”, “https.ssl.truststore.password”: “confluent”, “https.ssl.enabled.protocols”: “”, “https.ssl.keystore.type”: “PKCS12”, “https.ssl.protocol”: “TLSv1.2”, “https.ssl.truststore.type”: “JKS”, “reporter.result.topic.replication.factor”: “1”, “reporter.error.topic.replication.factor”: “1”, “reporter.bootstrap.servers”: “broker:29092” } }’ And then the same thing for the source connector. The main thing to take away from this section is the values you need to enter to successfully push your data from Kafka to GridDB Cloud. For example, you can see in the transforms section that we are using the SMT we wrote and built earlier. Results First, let’s take a look at our logs to see if our data is going through $ docker logs -f connect Here you should see some sort of output. You can also check your Control Center and ensure that the GridDB Web API Sink doesn’t have any errors. For me, this is what it looks like: And then of course, let’s check our GridDB dashboard to ensure our data is being routed to the correct container: Conclusion And with that, we have successfully pushed data from Kafka over to GridDB Cloud. For some next steps, you could try chaining SMTs to convert the mock data TS into timestamps that GridDB can understand and push to a time series

More