Blog

Building a Stock Market Chatbot Application Using LangGraph and GridDB

With the advent of Large Language Models (LLMs), chatbot applications have become increasingly common, enabling more natural and intelligent interactions with data. In this article, you will see how to build a stock market chatbot using LangGraph, the OpenAI API, and GridDB Cloud. We will retrieve historical Apple stock price data from Yahoo Finance using the yfinance library, insert it into a GridDB container, and then query it using a chatbot built with LangGraph that uses the OpenAI GPT-4o model. GridDB is a high-performance time-series database designed for massive real-time workloads. Its support for structured containers, built-in compression, and lightning-fast reads and writes makes it ideal for storing and querying time-series data such as stock market prices.

Installing and Importing Required Libraries

!pip install -q yfinance
!pip install langchain
!pip install langchain-core
!pip install langchain-community
!pip install langgraph
!pip install langchain_huggingface
!pip install tabulate
!pip uninstall -y pydantic
!pip install --no-cache-dir "pydantic>=2.11,<3"

import yfinance as yf
import pandas as pd
import json
import datetime as dt
import base64
import requests
import numpy as np

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langgraph.graph import START, END, StateGraph
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import MemorySaver
from langchain_experimental.agents import create_pandas_dataframe_agent
from langchain_openai import OpenAI
from langchain.agents.agent_types import AgentType
from typing_extensions import List, TypedDict
from pydantic import BaseModel, Field
from IPython.display import Image, display

Inserting and Retrieving Stock Market Data From GridDB

We will first import data from Yahoo Finance into a Python application. Next, we will insert this data into a GridDB container and retrieve it.

Importing Data from Yahoo Finance

The yfinance Python library allows you to import data from Yahoo Finance. You need to pass the ticker name, as well as the start and end dates, for the data you want to download. The following script downloads the Apple stock price data for the year 2024.

import yfinance as yf
import pandas as pd

ticker = "AAPL"
start_date = "2024-01-01"
end_date = "2024-12-31"

dataset = yf.download(ticker, start=start_date, end=end_date, auto_adjust=False)

# 1. FLATTEN: keep the level that holds 'Close', 'High', ...
if isinstance(dataset.columns, pd.MultiIndex):
    # find the level index where 'Close' lives
    for lvl in range(dataset.columns.nlevels):
        level_vals = dataset.columns.get_level_values(lvl)
        if 'Close' in level_vals:
            dataset.columns = level_vals  # keep that level
            break
else:
    # already flat - nothing to do
    pass

# 2. Select OHLCV, move index to 'Date'
dataset = dataset[['Close', 'High', 'Low', 'Open', 'Volume']]
dataset = dataset.reset_index().rename(columns={'index': 'Date'})
dataset['Date'] = pd.to_datetime(dataset['Date'])

# optional: reorder columns
dataset = dataset[['Date', 'Close', 'High', 'Low', 'Open', 'Volume']]
dataset.columns.name = None

dataset.head()

Output:

The above output indicates that the dataset comprises the daily closing, opening, high, and low prices, along with the trading volume, for Apple stock. In the next section, you will see how to insert this data into a GridDB Cloud container.

Establishing a Connection with GridDB Cloud

After you create your GridDB Cloud account and complete the configuration settings, you can run the following script to see if you can access your database within a Python application.

username = "your_user_name"
password = "your_password"
base_url = "your_griddb_host_url"

url = f"{base_url}/checkConnection"

credentials = f"{username}:{password}"
encoded_credentials = base64.b64encode(credentials.encode()).decode()

headers = {
    'Content-Type': 'application/json',  # Added this header to specify JSON content
    'Authorization': f'Basic {encoded_credentials}',
    'User-Agent': 'PostmanRuntime/7.29.0'
}

response = requests.get(url, headers=headers)
print(response.status_code)
print(response.text)

Output:

200

The above output indicates that you have successfully connected with your GridDB Cloud host.

Creating a Container for Inserting Stock Market Data in GridDB Cloud

Next, we will insert the Yahoo Finance data into GridDB. To do so, we will add another column, SerialNo, which contains a unique key for each data row, as GridDB expects a unique key column in the dataset. Next, we will map the Pandas dataframe column types to GridDB data types.

dataset.insert(0, "SerialNo", dataset.index + 1)
dataset['Date'] = pd.to_datetime(dataset['Date']).dt.strftime('%Y-%m-%d')  # "2024-01-02"
dataset.columns.name = None

# Mapping pandas dtypes to GridDB types
type_mapping = {
    "int64": "LONG",
    "float64": "DOUBLE",
    "bool": "BOOL",
    "datetime64": "TIMESTAMP",
    "object": "STRING",
    "category": "STRING",
}

# Generate the columns part of the payload dynamically
columns = []
for col, dtype in dataset.dtypes.items():
    griddb_type = type_mapping.get(str(dtype), "STRING")  # Default to STRING if unknown
    columns.append({
        "name": col,
        "type": griddb_type
    })

columns

Output:

[{'name': 'SerialNo', 'type': 'LONG'},
 {'name': 'Date', 'type': 'STRING'},
 {'name': 'Close', 'type': 'DOUBLE'},
 {'name': 'High', 'type': 'DOUBLE'},
 {'name': 'Low', 'type': 'DOUBLE'},
 {'name': 'Open', 'type': 'DOUBLE'},
 {'name': 'Volume', 'type': 'LONG'}]

The above output displays the dataset column names and their corresponding GridDB-compliant data types.
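Note that because the script above converts the Date column to a plain string, it is mapped to STRING rather than TIMESTAMP. If you would rather keep real timestamps in GridDB, a hedged alternative (not part of the original walkthrough) is to send ISO-8601 strings and declare that column as TIMESTAMP; the exact accepted timestamp format may vary by Web API version, so verify it against your instance.

# Hedged alternative: keep Date as a TIMESTAMP column instead of a STRING.
# Assumes the Web API accepts ISO-8601 values such as "2024-01-02T00:00:00.000Z".
dataset_ts = dataset.copy()
dataset_ts['Date'] = pd.to_datetime(dataset_ts['Date']).dt.strftime('%Y-%m-%dT%H:%M:%S.000Z')

columns_ts = []
for col, dtype in dataset_ts.dtypes.items():
    if col == 'Date':
        columns_ts.append({"name": col, "type": "TIMESTAMP"})
    else:
        columns_ts.append({"name": col, "type": type_mapping.get(str(dtype), "STRING")})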
The next step is to create a GridDB container. To do so, you need to pass the container name, container type, and a list of column names and their data types.

url = f"{base_url}/containers"
container_name = "stock_db"

# Create the payload for the POST request
payload = json.dumps({
    "container_name": container_name,
    "container_type": "COLLECTION",
    "rowkey": True,  # Assuming the first column as rowkey
    "columns": columns
})

# Make the POST request to create the container
response = requests.post(url, headers=headers, data=payload)

# Print the response
print(f"Status Code: {response.status_code}")

Adding Stock Data to the GridDB Cloud Container

Once you have created a container, you must convert the data from your Pandas dataframe into JSON format and make a PUT request to insert the data into GridDB.

url = f"{base_url}/containers/{container_name}/rows"

# Convert dataset to list of lists (row-wise) with proper formatting
def format_row(row):
    formatted = []
    for item in row:
        if pd.isna(item):
            formatted.append(None)  # Convert NaN to None
        elif isinstance(item, bool):
            formatted.append(str(item).lower())  # Convert True/False to true/false
        elif isinstance(item, (int, float)):
            formatted.append(item)  # Keep integers and floats as they are
        else:
            formatted.append(str(item))  # Convert other types to string
    return formatted

# Prepare rows with correct formatting
rows = [format_row(row) for row in dataset.values.tolist()]

# Create payload as a JSON string
payload = json.dumps(rows)

# Make the PUT request to add the rows to the container
response = requests.put(url, headers=headers, data=payload)

# Print the response
print(f"Status Code: {response.status_code}")
print(f"Response Text: {response.text}")

Output:

Status Code: 200
Response Text: {"count":251}

If you see the above response, you have successfully inserted the data.

Retrieving Data from GridDB

After inserting the data, you can perform a variety of operations on the dataset. Let's see how to import data from a GridDB container into a Pandas dataframe.

container_name = "stock_db"
url = f"{base_url}/containers/{container_name}/rows"

# Define the payload for the query
payload = json.dumps({
    "offset": 0,       # Start from the first row
    "limit": 10000,    # Limit the number of rows returned
    "condition": "",   # No filtering condition (you can customize it)
    "sort": ""         # No sorting (you can customize it)
})

# Make the POST request to read data from the container
response = requests.post(url, headers=headers, data=payload)

# Check response status and print output
print(f"Status Code: {response.status_code}")

if response.status_code == 200:
    try:
        data = response.json()
        print("Data retrieved successfully!")

        # Convert the response to a DataFrame
        rows = data.get("rows", [])
        stock_dataset = pd.DataFrame(rows, columns=[col for col in dataset.columns])
    except json.JSONDecodeError:
        print("Error: Failed to decode JSON response.")
else:
    print(f"Error: Failed to query data from the container. Response: {response.text}")

print(stock_dataset.shape)
stock_dataset.head()

Output:

The above output shows the data retrieved from the GridDB container. We store the data in a Pandas dataframe, but you can store it in any other format if you want. Once you have the data, you can create a variety of AI and data science applications.
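The condition and sort fields above are left empty, but they can also push filtering and ordering down to GridDB instead of doing it in Pandas afterwards. The snippet below is a minimal sketch of such a filtered read; it assumes the Web API accepts TQL-style comparison expressions (for example, Close > 230) and a "column asc|desc" sort string, so adjust it to your GridDB Cloud version if needed.

# Hedged example: fetch only rows where the closing price is above 230, newest dates first.
# The condition/sort syntax is an assumption and may need adjusting for your Web API version.
filtered_payload = json.dumps({
    "offset": 0,
    "limit": 100,
    "condition": "Close > 230",
    "sort": "Date desc"
})

response = requests.post(url, headers=headers, data=filtered_payload)
if response.status_code == 200:
    filtered_rows = response.json().get("rows", [])
    filtered_df = pd.DataFrame(filtered_rows, columns=dataset.columns)
    print(filtered_df.head())
else:
    print(f"Query failed: {response.status_code} {response.text}")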
Creating a Stock Market Chatbot Using GridDB Data

In this next section, you will see how to create a simple chatbot in the LangGraph framework that calls the OpenAI API to answer your questions about the Apple stock price data you just retrieved from GridDB.

Creating a Graph in LangGraph

To create a graph in LangGraph, you need to define its state. A graph's state contains attributes that are shared between multiple graph nodes. Since we only need to store questions and answers, we create the following graph state.

class State(TypedDict):
    question: str
    answer: str

We will use the create_pandas_dataframe_agent from LangChain to answer our questions, since we retrieved the data from GridDB into a Pandas dataframe. We will create the agent object and call it inside the run_llm() function we define. We will use this function in our LangGraph node.

api_key = "YOUR_OPENAI_API_KEY"

llm = ChatOpenAI(model='gpt-4o', api_key=api_key)

agent = create_pandas_dataframe_agent(llm,
                                      stock_dataset,
                                      verbose=True,
                                      agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
                                      allow_dangerous_code=True)

def run_llm(state: State):
    question = state['question']
    response = agent.invoke(question)
    return {'answer': response['output']}

Finally, we define the graph for our chatbot. The graph consists of only one node, ask_question, which calls the run_llm() function. Inside the function, we call the create_pandas_dataframe_agent(), which answers questions about the dataset.

graph_builder = StateGraph(State)

graph_builder.add_node("ask_question", run_llm)
graph_builder.add_edge(START, "ask_question")
graph_builder.add_edge("ask_question", END)

graph = graph_builder.compile()
display(Image(graph.get_graph().draw_mermaid_png()))

Output:

The above output shows the flow of our graph.

Asking Questions

Let's test our chatbot by asking some questions. We will first ask our chatbot which month had the highest average opening price, and in which month people made the most profit in day trading.

question = [HumanMessage(content="Which month had the highest average opening stock price? And what is the month where people made most profit in day trading?")]
result = graph.invoke({"question": question})
print(result['answer'])

Output:

The output above shows the chatbot's response, which is correct; I verified it manually using a Python script. Let's ask it to be more creative and see if it finds any interesting patterns in the dataset.

question = [HumanMessage(content="Do you find any interesting patterns in the dataset?")]
result = graph.invoke({"question": question})
print(result['answer'])

Output:

The above output shows the first part of the reply. You can see that the chatbot is intelligent enough to draw a plot of the closing prices to identify interesting patterns. The following output shows some interesting observations from the dataset.

Output:

Conclusion

This article demonstrates how to create an OpenAI API-based chatbot that answers questions related to data retrieved from GridDB. If you have any questions or need help with GridDB Cloud, you can leave your query on Stack Overflow using the griddb tag. Our team will be happy to answer it. For the complete code of this article, visit my GridDB Blogs GitHub.
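As a possible extension that is not covered in the post above, the MemorySaver imported at the start could be wired in as a LangGraph checkpointer so the chatbot keeps state per conversation thread. The sketch below shows the general idea; the thread_id value is an arbitrary placeholder, and with the current single-field state the checkpointer only persists the last question and answer, so carrying full chat history to the agent would require extending the State definition.

# Hedged sketch: compile the same graph with a checkpointer so state is kept per thread.
memory = MemorySaver()
chat_graph = graph_builder.compile(checkpointer=memory)

config = {"configurable": {"thread_id": "stock-chat-1"}}  # placeholder thread id

turn = chat_graph.invoke(
    {"question": [HumanMessage(content="Which month had the highest average closing price?")]},
    config,
)
print(turn['answer'])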

More
New Features for the GridDB Cloud CLI Tool

The GridDB Cloud CLI Tool aims to make routine maintenance of checking of your GridDB Cloud instance a much easier endeavor. When we first introduced it, it was able to do your basic CRUD commands, but it lacked the ability to read from the filesystem to create containers/tables and to push data to those containers. Although maybe a subtle difference, essentially before, whenever the tool was run to CREATE, you needed manual intervention (known as interactive mode) to use the tool. I’m sure you know where I am going with this — in this latest release, we have added some functionality revolved around reading JSON files from the filesystem which can help certain workflow, such as to help automate some testing on your GridDB Cloud, for instance. Being able to read from JSON files to create tables also means we are able to migrate from GridDB CE to GridDB Cloud as discussed in this blog: https://griddb.net/en/blog/griddb-cloud-on-azure-marketplace-how-to-migrate-3-ways/. So for this article, since there’s not much in the way of showcasing features, I will walk through the technical updates made to the tool, as well as a simple demo of how this tool can be useful. Creating Container from JSON File First off, I changed the way GridDB Cloud CLI Tool handles the create command as explained above. From now on, when you use Create, if you want to create a new container by filling out CLI prompts, you can use the -i flag to indicate –interactive mode. So if you run: griddb-cloud-cli create -i, it will begin the process of asking a series of questions until you get the container you want. And now if you run griddb-cloud-cli create the tool will expect exactly one argument, a json file. The JSON file format is modeled to be the same as exported by the GridDB CE tool, here’s an example: { “version”:”5.6.0″, “database”:”public”, “container”:”device”, “containerType”:”TIME_SERIES”, “containerFileType”:”csv”, “containerFile”:[ “public.device_2020-07-11_2020-07-12.csv”, “public.device_2020-07-12_2020-07-13.csv”, “public.device_2020-07-13_2020-07-14.csv”, “public.device_2020-07-14_2020-07-15.csv”, “public.device_2020-07-15_2020-07-16.csv”, “public.device_2020-07-16_2020-07-17.csv”, “public.device_2020-07-17_2020-07-18.csv”, “public.device_2020-07-18_2020-07-19.csv”, “public.device_2020-07-19_2020-07-20.csv” ], “partitionNo”:14, “columnSet”:[ { “columnName”:”ts”, “type”:”timestamp”, “notNull”:true }, { “columnName”:”co”, “type”:”double”, “notNull”:false }, { “columnName”:”humidity”, “type”:”double”, “notNull”:false }, { “columnName”:”light”, “type”:”boolean”, “notNull”:false }, { “columnName”:”lpg”, “type”:”double”, “notNull”:false }, { “columnName”:”motion”, “type”:”boolean”, “notNull”:false }, { “columnName”:”smoke”, “type”:”double”, “notNull”:false }, { “columnName”:”temp”, “type”:”double”, “notNull”:false } ], “rowKeySet”:[ “ts” ], “timeSeriesProperties”:{ “compressionMethod”:”NO”, “compressionWindowSize”:-1, “compressionWindowSizeUnit”:”null”, “expirationDivisionCount”:-1, “rowExpirationElapsedTime”:-1, “rowExpirationTimeUnit”:”null” }, “compressionInfoSet”:[ ], “timeIntervalInfo”:[ { “containerFile”:”public.device_2020-07-11_2020-07-12.csv”, “boundaryValue”:”2020-07-11T17:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-12_2020-07-13.csv”, “boundaryValue”:”2020-07-12T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-13_2020-07-14.csv”, “boundaryValue”:”2020-07-13T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-14_2020-07-15.csv”, 
“boundaryValue”:”2020-07-14T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-15_2020-07-16.csv”, “boundaryValue”:”2020-07-15T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-16_2020-07-17.csv”, “boundaryValue”:”2020-07-16T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-17_2020-07-18.csv”, “boundaryValue”:”2020-07-17T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-18_2020-07-19.csv”, “boundaryValue”:”2020-07-18T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-19_2020-07-20.csv”, “boundaryValue”:”2020-07-19T00:00:00.000-0700″ } ] } Now, obviously, if you were just writing your own json files, you wouldn’t need a lot of the information from the migration tool, so instead you can create something with just the bare minimum. { “database”:”public”, “container”:”device10″, “containerType”:”TIME_SERIES”, “columnSet”:[ { “columnName”:”ts”, “type”:”timestamp”, “notNull”:true }, { “columnName”:”co”, “type”:”double”, “notNull”:false }, { “columnName”:”humidity”, “type”:”double”, “notNull”:false }, { “columnName”:”light”, “type”:”boolean”, “notNull”:false }, { “columnName”:”lpg”, “type”:”double”, “notNull”:false }, { “columnName”:”motion”, “type”:”boolean”, “notNull”:false }, { “columnName”:”smoke”, “type”:”double”, “notNull”:false }, { “columnName”:”temp”, “type”:”double”, “notNull”:false } ], “rowKeySet”:[ “ts” ] } Small note, you can also use the -f flag to avoid the confirmation of the container you are about to create, again avoiding manual user input when committing an action. Technical Details of Implementing Let’s take a quick look at the underlying code of writing a program which can take the JSON file from the filesystem and push it to GridDB Cloud to be made into a new container. First, let’s look at interpreting the file after being read from the filesystem: func ParseJson(jsonName string) (cmd.ContainerInfo, []string) { filename := jsonName properties, err := os.ReadFile(filename) if err != nil { log.Fatal(err) } var exportProperties ExportProperties err = json.Unmarshal(properties, &exportProperties) if err != nil { log.Fatal(err) } //fmt.Println(exportProperties) var conInfo cmd.ContainerInfo conInfo.ContainerName = exportProperties.Container conInfo.ContainerType = exportProperties.ContainerType conInfo.RowKey = len(exportProperties.RowKeySet) > 0 cols := transformToConInfoCols(exportProperties.ColumnSet) conInfo.Columns = cols return conInfo, exportProperties.ContainerFile } Here we read the file contents, create a data structure of type ExportProperties which was declared earlier (and shown below). We unmarshal (think of mapping) the json file to match our struct so that our program knows which values correspond to which keys. From there, we do some other processing to create the data object which will be sent to GriddB Cloud via an HTTP Web Request. // data struct to unmarshal user json file type ExportProperties struct { Version string `json:”version,omitempty”` Database string `json:”database,omitempty”` Container string `json:”container”` ContainerType string `json:”containerType,omitempty”` ContainerFileType string `json:”containerFileType,omitempty”` ContainerFile ContainerFile `json:”containerFile”` ColumnSet []ColumnSet `json:”columnSet”` RowKeySet []string `json:”rowKeySet”` } Here we define the entire struct which may be read from either the user-made JSON, or the GridDB CE migration tool-generated JSON (notice the keys which are omitted in the user JSONs have a special designation of omitempty). 
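The ColumnSet and ContainerFile types referenced in ParseJson and in the struct above are not shown in this excerpt. As a rough sketch of what they might look like, inferred from the JSON keys in the examples (the field names here are assumptions, not necessarily the tool's actual source):

// Assumed shape of a single "columnSet" entry, mirroring the JSON examples above.
type ColumnSet struct {
	ColumnName string `json:"columnName"`
	Type       string `json:"type"`
	NotNull    bool   `json:"notNull"`
}

// containerFile is sometimes a single string and sometimes an array in the exported
// JSON, which is why it gets the custom unmarshaller shown below.
type ContainerFile []string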
One last thing I’ll point out is that we needed to use a custom unmarshaller for this specific struct (ContainerFile) because when the GridDB export tool outputs the meta JSON file, the containerFile value is sometimes a single string, and sometimes it’s a slice of strings. Here’s the custom unmarshaller: // custom JSON unmarshaller for the case where sometimes the value is a slice // and sometimes it’s just a singular string func (c *ContainerFile) UnmarshalJSON(data []byte) error { var nums any err := json.Unmarshal(data, &nums) if err != nil { return err } items := reflect.ValueOf(nums) switch items.Kind() { case reflect.String: *c = append(*c, items.String()) case reflect.Slice: *c = make(ContainerFile, 0, items.Len()) for i := 0; i < items.Len(); i++ { item := items.Index(i) switch item.Kind() { case reflect.String: *c = append(*c, item.String()) case reflect.Interface: *c = append(*c, item.Interface().(string)) } } } return nil } Examples Of Using New Features As explained above, you can now use the tool as part of your workflow and bash scripting. So you can, for example, run a script with cron to track your Downloads folder list of files and general file size. Let's take a look at a working example. Bash Script to Track Downloads Directory As a silly example of how you can use the tool in your bash scripts, let's create a TIME_SERIES container which will keep track of the status of your downloads directory. We will create a JSON file with the container's schema, then we will use a simple bash script to push the current datetime and a couple of data points to our table. We then schedule a cron to run the script every 1 hour so that we have up to date data. First, the JSON file: { "database": "public", "container": "download_data", "containerType": "TIME_SERIES", "columnSet": [ { "columnName": "ts", "type": "timestamp", "notNull": true }, { "columnName": "file_count", "type": "integer", "notNull": false }, { "columnName": "total_size", "type": "string", "notNull": false } ] } And then our script, which will create the table (I realize it's not the best showcase as we are using just one table, but you could also create a situation where you push logs to a table and create new daily tables, kind of like fluentd), and then push up to date data of your directory: #!/bin/bash DOWNLOADS_DIR="/Users/israelimru/Downloads" FILE_COUNT=$(ls -1A "$DOWNLOADS_DIR" | wc -l | xargs) TOTAL_SIZE=$(du -sh "$DOWNLOADS_DIR" | awk '{print $1}') LOG_DATA="NOW(),$FILE_COUNT,$TOTAL_SIZE" griddb-cloud-cli create /Users/israelimru/download_table.json -f griddb-cloud-cli put -n download_data -v $LOG_DATA echo "Log complete." NOTE: It's crucial to use the -f flag when calling the create command because it won't ask for a prompt, it will simply create the table for you (and simply fail if the table already exists, which is fine!) Here you can see our LOG_DATA uses the NOW() variable; this will be auto translated by the tool to be a current timestamp in the proper format. And then we just grab the relevant data points using simple unix commands and push that data to the Cloud. Then set up the cronjob: 0 * * * * /your/script/testing.sh Conclusion Being able to use the GridDB Cloud CLI Tool without user intervention opens up some new possibilities and we hope these changes can be useful for you and your team. To grab a copy of the cloud tool, you can check here: https://github.com/Imisrael/griddb-cloud-cli. 
The source code for this blog is simply the bash script in the section above, so feel free to copy it and use it as you see fit.

More
Generate Podcast from PDF using AI

Why This Project?

Turn your complex PDFs into clear, natural-sounding AI-powered podcasts instantly. Ideal for busy professionals, auditory learners, and passive learning workflows. Core benefits: hands-free learning, ideal during commutes or chores; easily transforms technical PDFs into engaging audio; drastically reduces content-creation time.

Problem & Why This Matters

Converting PDFs manually is tedious: difficult text extraction from PDFs, complex manual summarization and audio recording overhead, and messy PDF management/storage. Your AI-powered podcast solution automates every step, solving these problems cleanly.

Introducing the AI-powered PDF-to-Podcast Generation System

The diagram above illustrates the simplified workflow of the AI-powered PDF-to-podcast system: User Uploads PDF: The user submits a PDF document to the platform. OCR Text Extraction (Mistral AI OCR): Mistral AI OCR accurately extracts the text content from the uploaded PDF. Summarize Content (OpenAI): The extracted text is summarized by OpenAI to simplify complex content for easier listening. Convert Text to Podcast (OpenAI TTS): OpenAI's Text-to-Speech converts the summarized text into natural, engaging audio. Store Data (GridDB Cloud): The summarized text and associated data are efficiently stored in GridDB Cloud for future retrieval. Podcast Playback: Users access the simplified, engaging podcast directly for convenient listening.

Run the Project

To run this project, you should clone the apps directory from this repository.

git clone https://github.com/junwatu/griddb-podcast-creator.git
cd griddb-podcast-creator
cd apps
npm install

Create a .env file for the project credentials with these keys:

MISTRAL_API_KEY=
OPENAI_API_KEY=
GRIDDB_WEBAPI_URL=
GRIDDB_PASSWORD=
GRIDDB_USERNAME=

You need the Mistral API key for OCR functionality, the OpenAI key for the text-to-speech (TTS) conversion, and the GridDB keys for data storage. Please see the Prerequisites section below for how to get those keys. Run the project using this command:

npm run dev

Go to the default app URL http://localhost:3000, browse for a PDF file, and click the Convert to Podcast button to generate a podcast.

Prerequisites

Node.js: You need Node.js installed because this project uses Next.js. Install the Node LTS version from here.

Mistral OCR API Setup: A Mistral API key is needed to use the OCR functionality. Create the API key here.

OpenAI API Setup: Create the OpenAI API key here. You may need to create a project and enable a few models. In this project, we will use two AI models from OpenAI: gpt-4o to create the audio script, and gpt-4o-mini-tts for generating audio from the script.

GridDB Cloud Setup: GridDB Cloud offers a free plan tier and is officially available worldwide. You need these GridDB environment variables in the .env file:

GRIDDB_WEBAPI_URL=
GRIDDB_USERNAME=
GRIDDB_PASSWORD=

Sign Up for GridDB Cloud Free Plan

If you would like to sign up for a GridDB Cloud Free instance, you can do so at the following link: https://form.ict-toshiba.jp/download_form_griddb_cloud_freeplan_e. After successfully signing up, you will receive a free instance along with the necessary details to access the GridDB Cloud Management GUI, including the GridDB Cloud Portal URL, Contract ID, Login, and Password.

GridDB WebAPI URL

Go to the GridDB Cloud Portal and copy the WebAPI URL from the Clusters section. It should look like this:

GridDB Username and Password

Go to the GridDB Users section of the GridDB Cloud portal and create or copy the username for GRIDDB_USERNAME.
The password is set when the user is created for the first time, use this as the GRIDDB_PASSWORD. For more details, to get started with GridDB Cloud, please follow this quick start guide. IP Whitelist When running this project, please ensure that the IP address where the project is running is whitelisted. Failure to do so will result in a 403 status code or forbidden access. You can use a website like What Is My IP Address to find your public IP address. To whitelist the IP, go to the GridDB Cloud Admin and navigate to the Network Access menu. Building the Podcast Generator Developing the Next.js Web Interface & API The main important code is the API route that handles PDF upload and then processes it. This is the snippet code from the route.ts file in the apps/app/api/upload directory: export async function POST(request: NextRequest) { try { const formData = await request.formData(); const file = formData.get(‘file’) as File; if (!file) { return NextResponse.json( { error: ‘No file uploaded’ }, { status: 400 } ); } if (file.type !== ‘application/pdf’) { return NextResponse.json( { error: ‘Invalid file type. Please upload a PDF file’ }, { status: 400 } ); } const maxSize = 10 * 1024 * 1024; // 10MB in bytes if (file.size > maxSize) { return NextResponse.json( { error: ‘File size too large. Maximum size is 10MB’ }, { status: 400 } ); } const bytes = await file.arrayBuffer(); const buffer = Buffer.from(bytes); const timestamp = Date.now(); const randomString = Math.random().toString(36).substring(7); const tempFilename = `upload_${timestamp}_${randomString}.pdf`; const tempFilePath = join(os.tmpdir(), tempFilename); await writeFile(tempFilePath, buffer); // Extract data from PDF const { content: pdfContent, response: ocrResponse } = await ocrService.processFile(tempFilePath, file.name); // Generate script for the audio from the extracted data const audioScript = await openaiService.generatePodcastScript(pdfContent); // Generate audio from the script const audioFiles = await generatePodcastAudio(audioScript, process.env.OPENAI_API_KEY || ”, { voice: ‘verse’, outputDir: audioDir, instructions: instructions, outputFormat: ‘mp3’, }); const cleanedAudioFiles = cleanAudioPaths(audioFiles); // Save the data into GridDB database const podcastData: GridDBData = { id: generateRandomID(), audioFiles: JSON.stringify(cleanedAudioFiles), audioScript: JSON.stringify(audioScript), // @ts-ignore ocrResponse: JSON.stringify(ocrResponse), }; const result = await dbClient.insertData({ data: podcastData }); return NextResponse.json({ message: ‘File uploaded successfully’, fileName: file.name, fileSize: file.size, tempFilePath: tempFilePath, ocrResponse: ocrResponse, audioFiles: audioFiles, audioScript: audioScript, }); } catch (error) { console.error(‘Error uploading file:’, error); return NextResponse.json( { error: ‘Failed to upload file’ }, { status: 500 } ); } } Let’s look at the code in detail: 1. OCR Extraction of PDF Content /** Extracted data from PDF */ const { content: pdfContent, response: ocrResponse } = await ocrService.processFile(tempFilePath, file.name); Function: The uploaded PDF file is sent to an OCR (Optical Character Recognition) service for processing. Outcome: Mistral OCR extracts textual content from the PDF and provides a full response (ocrResponse) and the PDF content (pdfContent). 2. 
Generating the Podcast Script /** Generate script for the audio from the extracted data */ const audioScript = await openaiService.generatePodcastScript(pdfContent); Function: Pass the extracted textual content (pdfContent) to an OpenAI-powered service. Outcome: A structured podcast script (audioScript) suitable for text-to-speech conversion. 3. Creating Audio from the Generated Script /** Generate audio from the script */ const audioFiles = await generatePodcastAudio(audioScript, process.env.OPENAI_API_KEY || ”, { voice: ‘verse’, outputDir: audioDir, instructions: instructions, outputFormat: ‘mp3′, }); Function: Convert the generated podcast script into audio form using an OpenAI text-to-speech API. Parameters: voice: Determines vocal style (e.g., ‘verse’). outputDir: The destination directory for audio files. In this project, it is set to the public/audio directory. instructions: Extra refinement or instructional parameters for audio quality. outputFormat: Audio format set as ‘mp3’. Outcome: The data location of audio segments (audioFiles). 4. Saving Data to GridDB Database /** Save the data into GridDB database */ const podcastData: GridDBData = { id: generateRandomID(), audioFiles: JSON.stringify(cleanedAudioFiles), audioScript: JSON.stringify(audioScript), // ts ignore // @ts-ignore ocrResponse: JSON.stringify(ocrResponse) } const result = await dbClient.insertData({ data: podcastData }); Function: Collect and organize processed data (ID, audio files, podcast script, OCR response) into a structured object (podcastData) with appropriate type conversions (JSON serialization) to match the GridDB data schema. Outcome: Persistently stores generated audio metadata, associated scripts, and OCR extraction data into the GridDB database, providing future retrieval and management capabilities. In summary, the main functionality follows a clear sequence from our system diagram before. Integrating OCR using Mistral AI Mistral OCR is an Optical Character Recognition API that sets a new standard in document understanding. Unlike other models, Mistral OCR comprehends each element of documents—media, text, tables, equations—with unprecedented accuracy and cognition. It takes images and PDFs as input and extracts content in an ordered interleaved text and images. In this project, we will use Mistral OCR to extract text from PDFs. The process involves: 1. Uploading the PDF file to Mistral. const uploaded_pdf = await this.client.files.upload({ file: { fileName: fileName, content: file, }, purpose: “ocr”, }); 2. Retrieving the signed URL for the uploaded PDF. const signedUrl = await this.client.files.getSignedUrl({ fileId: uploaded_pdf.id, }); 3. Sending the signed URL to Mistral OCR for text extraction. const ocrResponse = await this.client.ocr.process({ model: “mistral-ocr-latest”, document: { type: “document_url”, documentUrl: signedUrl.url, } }); This final step will yield a complete OCR data response from Mistral. Get PDF key points and summarization using OpenAI We won’t convert all the content of the PDF extraction text because it will be to long. The best way is to summarize and get the key points of the extraction data. For this task, we will use the gpt-4o model. This is the system prompt to extract meaningful content from PDF’s extracted data: Create a 5-minute podcast episode script in a conversational style, using the content provided.\n\nInclude the following elements:\n\n- **Introduction**: Engage your audience with an intriguing opening statement related to the topic. 
Capture their attention immediately.\n\n- **Main Talking Points**: Develop 3-4 main sections discussing the central ideas or arguments. Use relatable examples and personal stories for better understanding. Maintain a conversational tone, as if you are speaking directly to the listener. Ensure natural transitions between sections to keep the flow.\n\n- **Conclusion**: Summarize the key takeaways in a concise manner, making sure to leave a lasting impression.\n\n- **Call to Action**: End with a clear and compelling call to action encouraging listeners to engage further or reflect on the topic.\n\n# Output Format\n\nWrite the script in a conversational and engaging narrative suitable for a podcast. Each section should integrate seamlessly with transitions, emulate a direct speaking style to engage the listener, and reinforce the message.\n\n# Examples\n\n**Introduction**: \”Welcome to [Podcast Name]. Today, we’re diving into [Topic]. Have you ever wondered…?\”\n\n**Main Talking Points**:\n\n1. \”Let’s start with [Main Idea]. It’s like when…\”\n2. \”Moving on to [Next Idea], consider how…\”\n3. \”Finally, when we talk about [Final Idea], there’s a story about…\”\n\n**Conclusion**: \”So, as we’ve learned today, [Key Takeaway 1], [Key Takeaway 2]…\”\n\n**Call to Action**: \”Think about how you can [Action]. Join us next time when we explore…\”\n\n# Notes\n\n- The script should be written to cater both to novices and those with some prior knowledge.\n- Ensure it resonates intellectually and stimulates curiosity among listeners.\n- Use transition words to guide listeners smoothly from one idea to the next. To keep the response consistent, we can use the schema feature. So, basically, we can force the AI model response to match a predefined data structure or schema: { “introduction”: “Welcome to our podcast! Today, we’re exploring how AI can revolutionize the way we consume content by transforming PDFs into engaging audio podcasts. Have you ever wished you could listen to your documents instead of reading them? Let’s dive in!”, “main_talking_points”: [ { “title”: “The Challenges of Manual PDF-to-Podcast Conversion”, “content”: “Manually converting PDFs into podcasts is a tedious process. It involves extracting text, summarizing complex content, and recording audio—all of which take significant time and effort. AI simplifies this by automating these steps, saving you hours of work.” }, { “title”: “How AI Simplifies the Process”, “content”: “AI tools like Mistral OCR and OpenAI TTS streamline the workflow. Mistral OCR extracts text from PDFs with high accuracy, while OpenAI’s models summarize and convert the text into natural-sounding audio. This ensures a seamless and efficient process.” }, { “title”: “The Role of GridDB in Managing Data”, “content”: “GridDB Cloud acts as a robust storage solution for parsed text and audio files. It ensures that your data is organized, easily retrievable, and ready for future use, making the entire system scalable and efficient.” } ], “conclusion”: “In summary, AI-powered tools are transforming the way we interact with content. By automating the conversion of PDFs into podcasts, we save time, enhance accessibility, and create a more engaging learning experience.”, “call_to_action”: “Think about how you can leverage this technology in your own projects. 
Visit our GitHub repository to get started, and don’t forget to share your feedback!” } The gpt-4o will response data with these keys: introduction main_talking_points conclusion call_to_action With this format then it will be easier to convert the text to audio for our podcast application. For more information about the data schema code, please look into the openai.ts file in apps\app\lib folder. Generating Podcast using OpenAI TTS We will use the gpt-4o-mini-tts model from OpenAI to generate speech from text. This model is capable of controlling the voice of your generated audio with additional instructions. We will process the audio in two processes based on the schema response from the OpenAI model. 1. Process introduction, conclusion, and call_to_action. This code will process the introduction, conclusion, and call to action into audio. const simpleKeys = [‘introduction’, ‘conclusion’, ‘call_to_action’] as const; for (const key of simpleKeys) { try { const text = podcastData[key]; const fileName = `${key}.${outputFormat}`; const speechFile = path.join(outputDir, fileName); const response = await openai.audio.speech.create({ model, voice, instructions, input: text }); const buffer = Buffer.from(await response.arrayBuffer()); fs.writeFileSync(speechFile, buffer); audioFiles[key] = speechFile; } catch (error) { console.error(`Error processing ${key}:`, error); throw error; } } 2. Process main_talking_points This code will process the main content or talking points into audio. // Process main talking points separately if (Array.isArray(podcastData.main_talking_points)) { for (let i = 0; i < podcastData.main_talking_points.length; i++) { try { const point = podcastData.main_talking_points[i]; const text = point.content; const fileName = `talking_point_${i}.${outputFormat}`; const speechFile = path.join(outputDir, fileName); const response = await openai.audio.speech.create({ model, voice, instructions, input: text }); const buffer = Buffer.from(await response.arrayBuffer()); fs.writeFileSync(speechFile, buffer); audioFiles[`talking_point_${i}`] = speechFile; } catch (error) { console.error(`Error processing talking point ${i}:`, error); throw error; } } Storing Data to GridDB Cloud The column data or schema for the GridDB database is simple: export interface GridDBData { id: string | number; ocrResponse: Blob; audioScript: string; audioFiles: string; } Then to save data to the GridDB, in this project the code is in the insertData function: async function insertData({ data, containerName = 'podcasts', }: { data: GridDBData; containerName?: string; }): Promise { console.log(data); try { const row = [ parseInt(data.id.toString(), 10), data.ocrResponse, data.audioScript, data.audioFiles, ]; const path = `/containers/${containerName}/rows`; return await makeRequest(path, [row], 'PUT'); } catch (error) { if (error instanceof GridDBError) { throw error; } const errorMessage = error instanceof Error ? error.message : 'Unknown error'; throw new GridDBError(`Failed to insert data: ${errorMessage}`, undefined, undefined, error); } } The core code actually just PUT operation on REST route path /containers/podcasts/rows. It's so easy to use GridDB on the cloud. The full source code for saving data into GridDB is in the apps/app/lib/griddb.ts file. This file later will be used in the route.ts file as an API. User Interface The code for the user interface resides in a single file, page.tsx. It is developed using React and Shadcn components, featuring two primary tabs: 1. 
Upload & Convert PDF The user will browse and upload a PDF file. Once the file is selected, the user can click the Convert to Podcast button. This action will make the application process the PDF using OCR to extract text and then generate an audio podcast using AI-generated voices. You can test the application using the sample PDF files available in the pdfs directory. These files are sourced from arXiv and contain highly technical content, making them ideal for evaluating the application's capabilities. 2. Podcast Result After successful conversion, in the Your Podcasts tab the user can listen to the podcast, using the audio player and section list to navigate through the content easily. Possible enhancements Custom Voice Options: Provide users the option for different voices or accents. Podcast hosting integration: Connect your podcasts directly to platforms like Spotify, Apple Podcasts, RSS feeds, etc. Improved UI/UX: Provide users better controls over file management & audio playback. Resources Project source code. GridDB Cloud Web API, Mistral OCR, OpenAI TTS,
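For reference, the schema feature mentioned in the summarization step (forcing the model's reply to match a predefined structure) can be implemented with OpenAI's JSON-schema response format. The sketch below is a hedged illustration only: the schema simply mirrors the keys of the example response above, and none of the names here are taken from the project's actual openai.ts.

import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Assumed schema mirroring the example response keys shown earlier.
const podcastSchema = {
  type: "object",
  properties: {
    introduction: { type: "string" },
    main_talking_points: {
      type: "array",
      items: {
        type: "object",
        properties: { title: { type: "string" }, content: { type: "string" } },
        required: ["title", "content"],
        additionalProperties: false,
      },
    },
    conclusion: { type: "string" },
    call_to_action: { type: "string" },
  },
  required: ["introduction", "main_talking_points", "conclusion", "call_to_action"],
  additionalProperties: false,
};

export async function generatePodcastScriptSketch(pdfContent: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "Create a 5-minute podcast episode script..." }, // the full system prompt shown above
      { role: "user", content: pdfContent },
    ],
    response_format: {
      type: "json_schema",
      json_schema: { name: "podcast_script", schema: podcastSchema, strict: true },
    },
  });
  return JSON.parse(completion.choices[0].message.content ?? "{}");
}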

More
GridDB Cloud on Azure Marketplace: How To Migrate (3 Ways!)

If you are thinking about switching to the GridDB Cloud Azure Marketplace instance, first, you can read about how to do that here: GridDB Cloud on Microsoft Azure Marketplace. Second, you may be worried about how to transfer your existing data from your GridDB Free Plan, from your local GridDB CE instance, or even from PostgreSQL. Here are the distinct sections:

Migrating from GridDB Free Plan
Migrating From PostgreSQL
Migrating From GridDB CE

In this blog, we will walk through the migration process of moving your data from a GridDB Free Plan, a third-party database (PostgreSQL in this case), and a local GridDB CE instance. The process is different for each one, so let's go through them one by one.

Migrating from GridDB Free Plan

First of all, if you are unsure what the GridDB Free Plan is, you can look here: GridDB Cloud Quick Start Guide. This is by far the easiest method of conducting a full-scale migration. The high-level overview is that the TDSL (Toshiba Digital Solution) support team will handle everything for you.

TDSL Support

When you sign up for the GridDB Pay As You Go plan, as part of the onboarding process, you will receive an email with the template you will need to use when contacting support for various functions, including data migration! So, grab your pertinent information (contract ID, GridDB ID, etc.) and the template, and let's send an email. Compose an email to tdsl-ms-support AT toshiba-sol.co.jp with the following template:

Contract ID: [your id]
GridDB ID: [your id]
Name: Israel Imru
E-mail: imru@fixstars.com
Inquiry Details: I would like to migrate from my GridDB Free Plan Instance to my GridDB pay as you go plan
Occurrence Date: —
Collected Information: —

The team will usually respond within one business day to confirm your operation and provide further instructions. For me, they sent the following: Please perform the following operations in the management GUI of the source system. After completing the operations, inform us of the date and time when the operations were performed. 1. Log in to the management GUI. 2. From the menu on the left side of the screen, click [Query]. 3. Enter the following query in the [QUERY EDITOR]: SELECT 2012 4. Click the [Execute] button Best regards, Toshiba Managed Services Support Desk

Once I ran the query they asked for, I clicked on query history, copied the timestamp, and sent that over to them. That was all they needed; armed with this information, they told me to wait 1-2 business days and they would seamlessly migrate my instance, along with an estimated time slot for when the migration would be completed. Once it was done, all of my data, including the IP Whitelist and my Portal Users, was copied over to my pay as you go plan. Cool!

Migrating from PostgreSQL

There is no official way of conducting this sort of migration, so for now, we can simply export our tables into CSV files and then import those files individually into our Cloud instance. Luckily, with the GridDB Cloud CLI tool this process is much easier than ever before. So let's first export our data and go from there.

Exporting PostgreSQL Data

First, the dataset I'm working with here is simply dummy data I ingested using a Python script.
Here’s the script: import psycopg2 import psycopg2.extras from faker import Faker import random import time # — YOUR DATABASE CONNECTION DETAILS — # Replace with your actual database credentials DB_NAME = “template1” DB_USER = “postgres” DB_PASSWORD = “yourpassword” DB_HOST = “localhost” # Or your DB host DB_PORT = “5432” # Default PostgreSQL port # — DATA GENERATION SETTINGS — NUM_RECORDS = 50000 # Initialize Faker fake = Faker() # Generate a list of fake records print(f”Generating {NUM_RECORDS} fake records…”) records_to_insert = [] for _ in range(NUM_RECORDS): name = fake.catch_phrase() # Using a more specific Faker provider quantity = random.randint(1, 1000) price = round(random.uniform(0.50, 500.00), 2) records_to_insert.append((name, quantity, price)) print(“Finished generating records.”) # SQL statements create_table_query = “”” CREATE TABLE IF NOT EXISTS sample_data ( id SERIAL PRIMARY KEY, name VARCHAR(255) NOT NULL, quantity INTEGER, price REAL ); “”” # Using execute_batch is much more efficient for large inserts insert_query = “INSERT INTO sample_data (name, quantity, price) VALUES %s;” conn = None try: # Establish a connection to the database conn = psycopg2.connect( dbname=DB_NAME, user=DB_USER, password=DB_PASSWORD, host=DB_HOST, port=DB_PORT ) # Create a cursor cur = conn.cursor() # Create the table if it doesn’t exist print(“Ensuring ‘sample_data’ table exists…”) cur.execute(create_table_query) # Optional: Clean the table before inserting new data print(“Clearing existing data from the table…”) cur.execute(“TRUNCATE TABLE sample_data RESTART IDENTITY;”) # Start the timer start_time = time.time() print(f”Executing bulk insert of {len(records_to_insert)} records…”) psycopg2.extras.execute_values( cur, insert_query, records_to_insert, template=None, page_size=1000 # The number of rows to send in each batch ) print(“Bulk insert complete.”) # Commit the changes to the database conn.commit() # Stop the timer end_time = time.time() duration = end_time – start_time print(f”Successfully inserted {cur.rowcount} rows in {duration:.2f} seconds.”) # Close the cursor cur.close() except (Exception, psycopg2.DatabaseError) as error: print(f”Error while connecting to or working with PostgreSQL: {error}”) if conn: conn.rollback() # Roll back the transaction on error finally: # Close the connection if it was established if conn is not None: conn.close() print(“Database connection closed.”) Once you run this script, you will have 50k rows in your PSQL instance. Now let’s export this to CSV: $ psql –host 127.0.0.1 –username postgres –password –dbname template1 psql (14.18 (Ubuntu 14.18-0ubuntu0.22.04.1)) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off) Type “help” for help. template1=# select COUNT(*) from sample_data; count ——- 50000 (1 row) template1=# COPY sample_data TO ‘/tmp/sample.csv’ WITH (FORMAT CSV, HEADER); COPY 50000 template1=# \q And now that we have our CSV data, let’s install the CLI Tool and ingest it. Ingesting CSV Data into GridDB Cloud You can download the latest CLI Tool from the Github releases page: https://github.com/Imisrael/griddb-cloud-cli/releases. 
For me, I installed the .deb file $ wget https://github.com/Imisrael/griddb-cloud-cli/releases/download/v0.1.4/griddb-cloud-cli_0.1.4_linux_amd64.deb $ sudo dpkg -i griddb-cloud-cli_0.1.4_linux_amd64.deb $ vim ~/.griddb.yaml And enter your credentials: cloud_url: “https://cloud97.griddb.com:443/griddb/v2/gs_clustermfclo7/dbs/ZQ8” cloud_username: “kG-israel” cloud_pass: “password” And ingest: $ griddb-cloud-cli ingest /tmp/sample.csv ✔ Does this container already exist? … NO Use CSV Header names as your GridDB Container Col names? id,name,quantity,price ✔ Y/n … YES ✔ Container Name: … migrated_data ✔ Choose: … COLLECTION ✔ Row Key? … true ✔ (id) Column Type … INTEGER ✔ Column Index Type1 … TREE ✔ (name) Column Type … STRING ✔ (quantity) Column Type … INTEGER ✔ (price) Column Type … FLOAT ✔ Make Container? { “container_name”: “migrated_data”, “container_type”: “COLLECTION”, “rowkey”: true, “columns”: [ { “name”: “id”, “type”: “INTEGER”, “index”: [ “TREE” ] }, { “name”: “name”, “type”: “STRING”, “index”: null }, { “name”: “quantity”, “type”: “INTEGER”, “index”: null }, { “name”: “price”, “type”: “FLOAT”, “index”: null } ] } … YES {“container_name”:”migrated_data”,”container_type”:”COLLECTION”,”rowkey”:true,”columns”:[{“name”:”id”,”type”:”INTEGER”,”index”:[“TREE”]},{“name”:”name”,”type”:”STRING”,”index”:null},{“name”:”quantity”,”type”:”INTEGER”,”index”:null},{“name”:”price”,”type”:”FLOAT”,”index”:null}]} 201 Created Container Created. Starting Ingest 0 id id 1 name name 2 quantity quantity 3 price price ✔ Is the above mapping correct? … YES Ingesting. Please wait… Inserting 1000 rows 200 OK Inserting 1000 rows 200 OK And after some time, your data should be ready in your GridDB Cloud instance! $ griddb-cloud-cli sql query -s “SELECT COUNT(*) from migrated_data” [{“stmt”: “SELECT COUNT(*) from migrated_data” }] [[{“Name”:””,”Type”:”LONG”,”Value”:50000}]] And another confirmation $ griddb-cloud-cli read migrated_data -p -l 1 [ { “name”: “migrated_data”, “stmt”: “select * limit 1”, “columns”: null, “hasPartialExecution”: true }] [ [ { “Name”: “id”, “Type”: “INTEGER”, “Value”: 1 }, { “Name”: “name”, “Type”: “STRING”, “Value”: “Enterprise-wide multi-state installation” }, { “Name”: “quantity”, “Type”: “INTEGER”, “Value”: 479 }, { “Name”: “price”, “Type”: “FLOAT”, “Value”: 194.8 } ] ] Migrating from GridDB CE If you want to move all of your local data from GridDB Community Edition over to your GridDB Pay As You Go cloud database, you can now use the GridDB Cloud CLI Tool for the job! You will also of course need to export your CE containers that you wish to migrate. Prereqs As explained above, you will need: the GridDB Cloud CLI Tool from GitHub and the GridDB CE Import/Export Tool installed onto your machine. Step by Step Process of Migrating Let’s run through an example of exporting out entier GridDB CE Database and then running the migration from the CLI tool. This is going to assume you have the Import tool already set up, you can read more about that here: https://griddb.net/en/blog/using-the-griddb-import-export-tools-to-migrate-from-postgresql-to-griddb/ First, you’d run the export tool like so: $ cd expimp/bin $ ./gs_export -u admin/admin -d all –all This command will export all of your containers into a directory called ‘all’. $ ./gs_export -u admin/admin -d all –all Export Start. 
Directory : /home/israel/development/expimp/bin/all Number of target containers : 7 public.p01 : 2 public.p02 : 0 public.p03 : 0 public.device3 : 1015 public.device2 : 1092 public.device1 : 1944 public.col02 : 10000 Number of target containers:7 ( Success:7 Failure:0 ) Export Completed. Next, ensure your GridDB Cloud CLI Tool is set up, and once it is, you can run the migrate command. Let’s look at how it works: $ griddb-cloud-cli migrate -h Use the export tool on your GridDB CE Instance to create the dir output of csv files and a properties file and then migrate those tables to GridDB Cloud Usage: griddb-cloud-cli migrate [flags] Examples: griddb-cloud-cli migrate <directory> Flags: -f, –force Force create (no prompt) -h, –help help for migrate So in our case, we want to use migrate with the -f flag to not show us prompts because we have 7 containers to create and migrate! $ griddb-cloud-cli migrate -f all And here is an example of some of the output: {“container_name”:”device2″,”container_type”:”TIME_SERIES”,”rowkey”:true,”columns”:[{“name”:”ts”,”type”:”TIMESTAMP”,”index”:null},{“name”:”co”,”type”:”DOUBLE”,”index”:null},{“name”:”humidity”,”type”:”DOUBLE”,”index”:null},{“name”:”light”,”type”:”BOOL”,”index”:null},{“name”:”lpg”,”type”:”DOUBLE”,”index”:null},{“name”:”motion”,”type”:”BOOL”,”index”:null},{“name”:”smoke”,”type”:”DOUBLE”,”index”:null},{“name”:”temp”,”type”:”DOUBLE”,”index”:null}]} 201 Created inserting into (device2). csv: all/public.device2_2020-07-12_2020-07-13.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-13_2020-07-14.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-14_2020-07-15.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-15_2020-07-16.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-16_2020-07-17.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-17_2020-07-18.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-18_2020-07-19.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-19_2020-07-20.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-20_2020-07-21.csv 200 OK {“container_name”:”device3″,”container_type”:”TIME_SERIES”,”rowkey”:true,”columns”:[{“name”:”ts”,”type”:”TIMESTAMP”,”index”:null},{“name”:”co”,”type”:”DOUBLE”,”index”:null},{“name”:”humidity”,”type”:”DOUBLE”,”index”:null},{“name”:”light”,”type”:”BOOL”,”index”:null},{“name”:”lpg”,”type”:”DOUBLE”,”index”:null},{“name”:”motion”,”type”:”BOOL”,”index”:null},{“name”:”smoke”,”type”:”DOUBLE”,”index”:null},{“name”:”temp”,”type”:”DOUBLE”,”index”:null}]} 201 Created inserting into (device3). csv: all/public.device3_2020-07-12_2020-07-13.csv 200 OK inserting into (device3). csv: all/public.device3_2020-07-13_2020-07-14.csv 200 OK inserting into (device3). csv: all/public.device3_2020-07-14_2020-07-15.csv 200 OK inserting into (device3). csv: all/public.device3_2020-07-15_2020-07-16.csv 200 OK inserting into (device3). csv: all/public.device3_2020-07-16_2020-07-17.csv 200 OK inserting into (device3). csv: all/public.device3_2020-07-17_2020-07-18.csv 200 OK inserting into (device3). csv: all/public.device3_2020-07-18_2020-07-19.csv 200 OK inserting into (device3). 
csv: all/public.device3_2020-07-19_2020-07-20.csv 200 OK {“container_name”:”p01″,”container_type”:”COLLECTION”,”rowkey”:true,”columns”:[{“name”:”name”,”type”:”STRING”,”index”:null},{“name”:”names”,”type”:”STRING_ARRAY”,”index”:null},{“name”:”barr”,”type”:”BOOL_ARRAY”,”index”:null},{“name”:”tsarr”,”type”:”TIMESTAMP_ARRAY”,”index”:null}]} 201 Created {“container_name”:”p02″,”container_type”:”COLLECTION”,”rowkey”:true,”columns”:[{“name”:”id”,”type”:”STRING”,”index”:null},{“name”:”date”,”type”:”STRING”,”index”:null}]} 201 Created {“container_name”:”p03″,”container_type”:”COLLECTION”,”rowkey”:true,”columns”:[{“name”:”id”,”type”:”LONG”,”index”:null},{“name”:”c1″,”type”:”STRING”,”index”:null},{“name”:”c2″,”type”:”BOOL”,”index”:null}]} 201 Created inserting into (p01). csv: all/public.p01.csv 200 OK Lastly we can verify that our containers are in there: $ griddb-cloud-cli show device3 { “container_name”: “device3”, “container_type”: “TIME_SERIES”, “rowkey”: true, “columns”: [ { “name”: “ts”, “type”: “TIMESTAMP”, “timePrecision”: “MILLISECOND”, “index”: [] }, { “name”: “co”, “type”: “DOUBLE”, “index”: [] }, { “name”: “humidity”, “type”: “DOUBLE”, “index”: [] }, { “name”: “light”, “type”: “BOOL”, “index”: [] }, { “name”: “lpg”, “type”: “DOUBLE”, “index”: [] }, { “name”: “motion”, “type”: “BOOL”, “index”: [] }, { “name”: “smoke”, “type”: “DOUBLE”, “index”: [] }, { “name”: “temp”, “type”: “DOUBLE”, “index”: [] } ] } $ griddb-cloud-cli sql query -s “SELECT COUNT(*) FROM device2” [{“stmt”: “SELECT COUNT(*) FROM device2” }] [[{“Name”:””,”Type”:”LONG”,”Value”:1092}]] Looks good to me! Conclusion And with that, we have learned three different methods of migrating from a variation of GridDB to the new Azure Marketplace GridDB
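If you want an extra sanity check after any of these migrations, the GridDB Cloud Web API used in the stock chatbot article above can be reused to spot-check a migrated container from Python. This is a hedged sketch, assuming the same base_url and Basic-auth header pattern shown there; swap in your own credentials and the container you actually migrated.

import base64
import json
import requests

base_url = "your_griddb_host_url"  # same Web API base URL as in the earlier article
credentials = base64.b64encode(b"your_user_name:your_password").decode()
headers = {"Content-Type": "application/json", "Authorization": f"Basic {credentials}"}

# Read a handful of rows from the migrated container to confirm the schema and data arrived.
payload = json.dumps({"offset": 0, "limit": 5, "condition": "", "sort": ""})
resp = requests.post(f"{base_url}/containers/migrated_data/rows", headers=headers, data=payload)

print(resp.status_code)
print(resp.json().get("rows", []))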

More
Building Resume Creator with Multi-agent AI

In this blog, we will build an AI-powered resume-creation system that automates the tedious and time-consuming tasks involved in manual resume creation. By leveraging multi-agent AI systems, we will streamline the process of information gathering, and content writing to produce resumes with minimal human intervention. Limitations of Manual Resume Processing Inefficient Information Gathering The manual process of collecting and organizing information is time-consuming and requires significant effort. Inconsistent Formatting Manual resume creation often leads to formatting inconsistencies. The process requires manual adjustments to maintain professional formatting standards, which can be error-prone and time-consuming. Content Writing and Rewriting Challenges The manual process requires significant effort in crafting and editing content. Writing compelling and well-structured content by hand is labor-intensive, requiring multiple revisions and edits. Automating Resume Creation using AI Creating a resume manually involves several steps: Information Gathering: Collecting and organizing your personal details, job history, skills, and education. Formatting: Ensuring the resume looks attractive and professional, often without clear guidelines. Content Writing: Crafting and refining content to make it concise, compelling, and relevant. Proofreading and Editing: Checking for errors and polishing the resume to a professional standard. With the AI system, we can automate these steps using multi-agent systems. Each agent performs a specific task, such as extracting information, generating content, or formatting the resume. By coordinating these agents, we can create a fully automated resume creation system. Running the Resume Creator Before we dive into the technical details, you can run the resume creator system by following these steps: 1) Clone the repository: git clone https://github.com/junwatu/resume-creator-multi-agent-ai.git 2) Install the dependencies: cd resume-creator-multi-agent-ai cd apps npm install 3) Create a .env file in the apps directory and add the following environment variables: OPENAI_API_KEY=api-key-here GRIDDB_WEBAPI_URL= GRIDDB_USERNAME= GRIDDB_PASSWORD= VITE_APP_BASE_URL=http://localhost VITE_PORT=3000 Please refer to the Prerequisites section for more details on obtaining the OpenAI API key and GridDB credentials. 4) Start the server: npm run start 5) Open your browser and go to http://localhost:3000 to access the resume creator system. How it Works? In this blog, we automate the information gathering and content writing for the resume, tasks that are usually manual and time-consuming. This system diagram illustrates the resume creation process discussed in this blog, showcasing the collaboration between two main AI agents: Here’s a brief description: The system starts with User Input and requires an environment setup that includes Team Initialization and OpenAI API Key. Two AI agents work together: Profile Analyst (Agent AI 1): Handles data extraction from user input, breaking down information into categories like Name, Experience, Skills, Education, and Job History. Resume Writer (Agent AI 2): Takes structured information and handles the writing aspect. The workflow follows these key steps: Data Extraction: Organizes raw user input into structured categories. This is the information-gathering step. Structured Information: Stores the organized data into the GridDB Cloud database. Resume Crafting: Combines the structured data with writing capabilities. 
This is the content writing step. Create Resume: Generates the content. Final Resume: Produces the completed document. Prerequisites KaibanJS KaibanJS is the JavaScript framework for building multi-agent AI systems. We will use it to build our resume creation system. OpenAI We will use the o1-mini model from OpenAI. It is a smaller version of the o1 model, suitable for tasks that require complex reasoning and understanding. Create a project, an API key, and enable the o1-mini model in the OpenAI platform. Make sure to save the API key in the .env file. OPENAI_API_KEY=api-key-here GridDB Cloud The GridDB Cloud offers a free plan tier and is officially available worldwide. This database will store the structured information extracted by the Profile Analyst agent and also the final resume generated by the Resume Writer agent. You need these GridDB environment variables in the .env file: GRIDDB_WEBAPI_URL= GRIDDB_USERNAME= GRIDDB_PASSWORD= Check the below section on how to get these values. Sign Up for GridDB Cloud Free Plan If you would like to sign up for a GridDB Cloud Free instance, you can do so in the following link: https://form.ict-toshiba.jp/download_form_griddb_cloud_freeplan_e. After successfully signing up, you will receive a free instance along with the necessary details to access the GridDB Cloud Management GUI, including the GridDB Cloud Portal URL, Contract ID, Login, and Password. GridDB WebAPI URL Go to the GridDB Cloud Portal and copy the WebAPI URL from the Clusters section. It should look like this: GridDB Username and Password Go to the GridDB Users section of the GridDB Cloud portal and create or copy the username for GRIDDB_USERNAME. The password is set when the user is created for the first time, use this as the GRIDDB_PASSWORD. For more details, to get started with GridDB Cloud, please follow this quick start guide. IP Whitelist When running this project, please ensure that the IP address where the project is running is whitelisted. Failure to do so will result in a 403 status code or forbidden access. You can use a website like What Is My IP Address to find your public IP address. To whitelist the IP, go to the GridDB Cloud Admin and navigate to the Network Access menu. Node.js We will use Node.js LTS v22.12.0 to build a server that handles the communication between the user interface, AI agents, and OpenAI API and store data in the GridDB Cloud database. React We will use React to build the user interface for the resume creation system. Where the user can input their details and generate a resume with a click of a button. Building the Resume Creation System Node.js Server We will use Node.js to build the server that handles the communication between the user interface, AI agents, and OpenAI API. The server will also store the structured information in the GridDB Cloud database. This table provides an overview of the API routes defined in the server.js code, including HTTP methods, endpoints, descriptions, and any parameters. HTTP Method Endpoint Description Parameters POST /api/resumes Creates a new resume. Calls the generateResume function to generate content, saves to the database, and returns the response. Body: { content: string } GET /api/resumes Fetches all resumes stored in the database. None GET /api/resumes/:id Fetches a specific resume by its ID. Path: id (Resume ID) DELETE /api/resumes/:id Deletes a specific resume by its ID. 
Path: id (Resume ID) The main route code for the resume creation is as follows: app.post(‘/api/resumes’, async (req, res) => { try { const resumeData = req.body || {}; const result = await generateResume(resumeData.content || undefined); console.log(result); const resume = { id: generateRandomID(), rawContent: resumeData.content, formattedContent: result.result, status: result.status, createdAt: new Date().toISOString(), information: JSON.stringify(result.stats), } // Save resume to database const dbResponse = await dbClient.insertData({ data: resume }); if (result.status === ‘success’) { const all = { message: ‘Resume created successfully’, data: result.result, stats: result.stats, dbStatus: dbResponse } res.status(201).json(all); } else { res.status(400).json({ message: ‘Failed to generate resume’, error: result.error }); } } catch (error) { res.status(500).json({ error: ‘Server error while creating resume’, details: error.message }); } }); When the user submits their data, the server calls the generateResume function to generate the resume content. The result is then saved to the GridDB Cloud database, and the resume content is returned as a response. Multi-agent AI We will use KaibanJS to build the multi-agent AI system for the resume creation process. You can find the agent code in the team.kban.js file and this system consists of two main agents: Profile Analyst (Agent AI 1) The Profile Analyst agent is responsible for extracting structured information from the user input. It categorizes the input into fields such as Name, Experience, Skills, Education, and Job History. The effectiveness of these fields depends on the quality and diversity of the submitted data. const profileAnalyst = new Agent({ name: ‘Carla Smith’, role: ‘Profile Analyst’, goal: ‘Extract structured information from conversational user input.’, background: ‘Data Processor’, tools: [] // Tools are omitted for now }); This profile agent will use this task code to extract user data: const processingTask = new Task({ description: `Extract relevant details such as name, experience, skills, and job history from the user’s ‘aboutMe’ input. aboutMe: {aboutMe}`, expectedOutput: ‘Structured data ready to be used for a resume creation.’, agent: profileAnalyst }); The expectedOutput is the structured data that will be used by the Resume Writer agent to generate the resume content. The description and expectedOutput mimic the prompts if were interact with ChatGPT. However, in this case, this is done by the Profile Analyst agent. Resume Writer (Agent AI 2) The Resume Writer agent is responsible for crafting the resume content based on the structured information provided by the Profile Analyst agent. It generates well-structured, compelling content that effectively showcases the user’s qualifications and achievements. const resumeWriter = new Agent({ name: ‘Alex Morra’, role: ‘Resume Writer’, goal: `Craft compelling, well-structured resumes that effectively showcase job seekers qualifications and achievements.`, background: `Extensive experience in recruiting, copywriting, and human resources, enabling effective resume design that stands out to employers.`, tools: [] }); This resume agent will use this task code to generate the resume content: const resumeCreationTask = new Task({ description: `Utilize the structured data to create a detailed and attractive resume. Enrich the resume content by inferring additional details from the provided information. 
Include sections such as a personal summary, detailed work experience, skills, and educational background.`, expectedOutput: `A professionally formatted resume in raw markdown format, ready for submission to potential employers`, agent: resumeWriter }); The result of this task is markdown-formatted resume content that can easily be converted into a PDF or other formats and is easy for the user interface to process. Save Data to GridDB Cloud Database The GridDB Cloud database stores the structured information extracted by the Profile Analyst agent and the final resume generated by the Resume Writer agent. This is the schema used to store the resume information in the GridDB Cloud database: { "id": "string", "rawContent": "string", "formattedContent": "string", "status": "string", "createdAt": "string", "information": "string" } Field Type Description id string A unique identifier for each resume. rawContent string The original user input for the resume. formattedContent string The final formatted resume content. status string Indicates the success or failure of the resume generation process. createdAt string The timestamp of when the resume was created. information string The OpenAI token usage information. GridDB Cloud provides a RESTful API that allows us to interact with the database. We will use this API to store and retrieve the resume information. The griddb-client.js file contains the code to interact with the GridDB Cloud database. It includes functions to insert, retrieve, and delete resume data. To insert new data, you can use the endpoint /containers/${containerName}/rows. This endpoint allows you to add a new row of data to the database: async function insertData({ data, containerName = 'resumes' }) { console.log(data); try { const timestamp = data.createdAt instanceof Date ? data.createdAt.toISOString() : data.createdAt; const row = [ parseInt(data.id), // INTEGER data.rawContent, // STRING data.formattedContent, // STRING data.status, // STRING timestamp, // TIMESTAMP (ISO format) data.information // STRING ]; const path = `/containers/${containerName}/rows`; return await makeRequest(path, [row], 'PUT'); } catch (error) { throw new Error(`Failed to insert data: ${error.message}`); } } GridDB also supports SQL-like queries to interact with the database. Here's an example of an SQL query to retrieve all resumes from the database: SELECT * FROM resumes; and to retrieve a specific resume by its ID: SELECT * FROM resumes WHERE id = 'resume-id'; Let's look at an example of how to insert data into the GridDB Cloud database with SQL: const sql = "insert into resumes (id, rawContent, formattedContent, status, createdAt, information) values(3, 'raw content here', 'formatted content here', 'success', TIMESTAMP('2025-01-02'), '{tokens: 300}')"; const payload = [{ stmt: sql }]; // the SQL endpoints expect an array of statement objects const response = await fetch(`${process.env.GRIDDB_WEBAPI_URL}/sql/dml/update`, { method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': `Basic ${authToken}`, }, body: JSON.stringify(payload), }); const responseText = await response.text(); The code above inserts the resume data into the GridDB Cloud database using the /sql/dml/update endpoint and the SQL query. All these data operations will be handled by the Node.js server and exposed as API endpoints for the user interface to interact with. User Interface The ResumeCreator component is built using React and allows users to input their details in a text area and generate a resume with the click of a button. The user interface is designed to be simple.
import { useState } from ‘react’; import { Card, CardContent } from ‘@/components/ui/card’; import { Button } from ‘@/components/ui/button’; import { Textarea } from ‘@/components/ui/textarea’; import { Alert, AlertDescription } from ‘@/components/ui/alert’; import { ResumeMarkdownRenderer } from ‘./ResumeMarkdownRenderer.tsx’; const ResumeCreator = () => { const [isSubmitting, setIsSubmitting] = useState(false); const [submitStatus, setSubmitStatus] = useState(null); const [resumeText, setResumeText] = useState(“default resume text”); const [markdownContent, setMarkdownContent] = useState(null); const BASE_URL = import.meta.env.VITE_APP_BASE_URL + ‘:’ + import.meta.env.VITE_PORT; const handleSubmit = async () => { setIsSubmitting(true); setSubmitStatus(null); try { const response = await fetch(`${BASE_URL}/api/resumes`, { method: ‘POST’, headers: { ‘Content-Type’: ‘application/json’, }, body: JSON.stringify({ content: resumeText }), }); if (!response.ok) { throw new Error(‘Failed to create resume’); } const aiResume = await response.json(); setMarkdownContent(aiResume.data); setSubmitStatus(‘success’); } catch (error) { console.error(‘Error creating resume:’, error); setSubmitStatus(‘error’); } finally { setIsSubmitting(false); setTimeout(() => setSubmitStatus(null), 5000); } }; return ( <div className=”max-w-4xl mx-auto p-8 space-y-6″> <h1 className=”text-3xl font-bold text-center”> Resume Creator <div className=”w-40 h-1 bg-green-500 mx-auto mt-1″></div> </h1> {submitStatus && ( <Alert className={submitStatus === ‘success’ ? ‘bg-green-50’ : ‘bg-red-50’}> <AlertDescription> {submitStatus === ‘success’ ? ‘Resume created successfully!’ : ‘Failed to create resume. Please try again.’} </AlertDescription> </Alert> )} {markdownContent ? ( <ResumeMarkdownRenderer markdown={markdownContent} /> ) : ( <div className=”space-y-6″> <h2 className=”text-2xl font-semibold”>About Me</h2> <Card className=”border-2″> <CardContent className=”p-6″> <p className=”text-sm text-gray-600 mb-4″> Enter your professional experience, skills, and education. Our AI will help format this into a polished resume. </p> <Textarea value={resumeText} onChange={(e: React.ChangeEvent<HTMLTextAreaElement>) => setResumeText(e.target.value)} className=”min-h-[400px] font-mono” placeholder=”Enter your resume content here…” /> </CardContent> </Card> <div className=”flex justify-center”> <Button onClick={handleSubmit} disabled={isSubmitting} className=”bg-green-500 hover:bg-green-600 text-white px-8 py-2 rounded-md” > {isSubmitting ? ‘Creating…’ : ‘Create Resume’} </Button> </div> </div> )} </div> ); }; export default ResumeCreator; The core functionality of the ResumeCreator component is to create a user resume using AI and render the result. It uses useState to manage input (resumeText), generated markdown (markdownContent), submission status (submitStatus), and submission progress (isSubmitting). The handleSubmit function sends a POST request to the /api/resumes route at the backend (${BASE_URL}/api/resumes), passing the user’s input, and updates the state based on the API’s response. Read here for the Node.js API routes. The UI includes a text area for input, a submit button to trigger the API call, and a markdown renderer ResumeMarkdownRenderer component to display the AI-generated resume. Alerts notify the user of the submission status while loading states to ensure a smooth experience. 
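The ResumeMarkdownRenderer component imported above is not listed in this article. As a reference, here is a minimal sketch of what it could look like, assuming the react-markdown package is used for rendering; only the component name and the markdown prop are taken from the code above, while the wrapper markup and styling are illustrative.

// ResumeMarkdownRenderer.tsx - a minimal sketch, assuming the react-markdown package
import ReactMarkdown from 'react-markdown';
import { Card, CardContent } from '@/components/ui/card';

interface ResumeMarkdownRendererProps {
  markdown: string;
}

// Renders the AI-generated markdown resume inside a simple card
export const ResumeMarkdownRenderer = ({ markdown }: ResumeMarkdownRendererProps) => (
  <Card className="border-2">
    <CardContent className="p-6">
      <ReactMarkdown>{markdown}</ReactMarkdown>
    </CardContent>
  </Card>
);

If the project does not already include it, react-markdown can be added with npm install react-markdown.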
Further Improvements Enhanced Data Extraction: Improve the Profile Analyst agent's ability to extract and categorize information more accurately and efficiently. Advanced Content Generation: Enhance the Resume Writer agent's content generation capabilities to produce more compelling and personalized resumes. User Interface Enhancements: Add more features to the user interface, such as resume templates, customization options, and real-time editing. Conclusion In this blog, we have built an AI-powered resume-creation system that automates the tedious and time-consuming tasks involved in manual resume creation. By leveraging multi-agent AI systems, we have streamlined the process of information gathering and content writing to produce resumes with minimal human intervention.
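For completeness, the two agents and their tasks still have to be assembled into a team before generateResume can run them; that wiring lives in team.kban.js in the repository. Below is a hedged sketch of what such a file might look like with KaibanJS; the team name, the sample aboutMe placeholder, and the result-handling comment are assumptions rather than the repository's exact code.

// team.kban.js - a hedged sketch of the team wiring, not the repository's exact code
import { Agent, Task, Team } from 'kaibanjs';

// profileAnalyst, resumeWriter, processingTask, and resumeCreationTask are the
// agent and task definitions shown earlier in this article.
const team = new Team({
  name: 'Resume Creation Team',                 // assumed name
  agents: [profileAnalyst, resumeWriter],
  tasks: [processingTask, resumeCreationTask],
  inputs: { aboutMe: 'Describe your experience, skills, and education here.' }, // placeholder input
  env: { OPENAI_API_KEY: process.env.OPENAI_API_KEY },
});

export default team;

// A generateResume wrapper, as called by the Express route, could then start the
// workflow with await team.start() and map its output to the { status, result, stats }
// shape that the route handler expects.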

Sports Analytics with GridDB

Introduction In modern sports, data-driven decision-making has become essential for gaining a competitive edge. Every step, shot, or lap generates a stream of events that require high-speed ingestion, efficient storage, and rapid querying—challenges that traditional relational databases struggle to handle at scale. To address this, organizations are increasingly looking into alternatives. GridDB, a highly scalable and efficient time-series database, is designed to manage large volumes of continuously generated data such as above. By leveraging GridDB, teams can analyze critical performance metrics such as player speed, fatigue levels, and tactical positioning over time. These insights enable coaches and analysts to make informed decisions on game tactics, substitutions, and training regimens based on real-time and historical data. In this article, we explore how GridDB, integrated within a Spring Boot application, can be used for a soccer analytics use case—optimizing player substitutions and refining game strategies with data-driven precision. Understanding the Use Case A single soccer match generates hundreds of timestamped events—such as a midfielder’s pass at a given time e.g. 20:05:32 or a striker’s shot at time 20:10:15—each enriched with outcomes and metadata. The sequential nature of this data reveals crucial patterns, like player fatigue or shifts in attacking momentum, that static analyses often miss. For engineers, the challenge lies in efficiently managing this high-speed, high-volume data stream. To simulate this type of data, we will use events/15946.json dataset from StatsBomb, which logs an entire match’s events—including passes, shots, and tackles—with millisecond precision. Our Spring Boot application, powered by GridDB, will focus on: Performance Tracking: Monitoring pass accuracy to detect signs of fatigue. Strategy Optimization: Analyzing shot frequency to uncover attacking opportunities. Setting Up GridDB Cluster and Spring Boot Integration Project Structure Here’s a suggested project structure for this application: ├───my-griddb-app │ │ pom.xml │ │ │ ├───src │ │ ├───main │ │ │ ├───java │ │ │ │ └───mycode │ │ │ │ │ MySpringBootApplication.java │ │ │ │ │ │ │ │ │ ├───config │ │ │ │ │ GridDBConfig.txt │ │ │ │ │ │ │ │ │ ├───controller │ │ │ │ │ MatchEventsController.java │ │ │ │ │ │ │ │ │ └───service │ │ │ │ MatchEventsService.java │ │ │ │ MetricsCollectionService.java │ │ │ │ RestTemplateConfig.java │ │ │ │ │ │ │ └───resources │ │ │ │ application.properties │ │ │ │ │ │ │ └───templates │ │ │ pass-accuracy-graph.html This structure separates controllers, models, repositories, services, and the application entry point into distinct layers, enhancing modularity and maintainability. It can be further customized based on individual requirements. Set Up GridDB Cloud For this exercise, we will be using GridDB Cloud version. Start by visiting the GridDB Cloud portal and [signing up](GridDB Cloud Free Plan | TOSHIBA DIGITAL SOLUTIONS CORPORATION) for an account. Based on requirements, either the free plan or a paid plan can be selected for broader access. After registration ,an email will be sent containing essential details, including the Web API URL and login credentials. Once the login details are received, log in to the Management GUI to access the cloud instance. Create Database Credentials Before interacting with the database, we must create a database user: Navigate to Security Settings: In the Management GUI, go to the “GridDB Users” tab. 
Create a Database User: Click “Create Database User,” enter a username and password, and save the credentials. For example, set the username as soccer_admin and a strong password. Store Credentials Securely: These will be used in your application to authenticate with GridDB Cloud. Set Allowed IP Addresses To restrict access to authorized sources, configure the allowed IP settings: Navigate to Security Settings: In the Management GUI, go to the “Network Access” tab and locate the “Allowed IP” section and add the . Add IP Addresses: For development, you can temporarily add your local machine’s IP. Add POM Dependency Here’s an example of how to configure the dependency in thepom.xml file: <project xmlns=”http://maven.apache.org/POM/4.0.0″ xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xsi:schemaLocation=”http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd”> <modelVersion>4.0.0</modelVersion> <groupId>com.example</groupId> <artifactId>my-griddb-app</artifactId> <version>1.0-SNAPSHOT</version> <name>my-griddb-app</name> <url>http://maven.apache.org</url> <parent> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-parent</artifactId> <version>3.2.4</version> <relativePath /> <!– lookup parent from repository –> </parent> <properties> <maven.compiler.source>17</maven.compiler.source> <maven.compiler.target>17</maven.compiler.target> </properties> <dependencies> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-actuator</artifactId> </dependency> <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>3.8.1</version> <scope>test</scope> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> <exclusions> <exclusion> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-logging</artifactId> </exclusion> </exclusions> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-test</artifactId> <scope>test</scope> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-thymeleaf</artifactId> </dependency> <!– JSON processing –> <dependency> <groupId>org.glassfish.jersey.core</groupId> <artifactId>jersey-client</artifactId> <version>2.35</version> </dependency> <dependency> <groupId>org.json</groupId> <artifactId>json</artifactId> <version>20210307</version> </dependency> <dependency> <groupId>com.fasterxml.jackson.core</groupId> <artifactId>jackson-databind</artifactId> <version>2.15.0</version> <!– or the latest version –> </dependency> <!– Lombok –> <dependency> <groupId>org.projectlombok</groupId> <artifactId>lombok</artifactId> <optional>true</optional> </dependency> </dependencies> </project> Technical Implementation Implementing a soccer analytics solution with GridDB and Spring Boot involves three key steps: Ingesting the StatsBomb events/15946.json dataset into GridDB. Querying the data to extract time-series metrics. Visualizing the results to generate actionable insights. Below, we explore each phase in detail, showcasing GridDB’s time-series capabilities and its seamless integration within a Spring Boot architecture. 
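Before walking through these steps, note that MetricsCollectionService below reads the GridDB Cloud Web API endpoint and the Basic authentication header from application.properties through @Value annotations. A minimal sketch of that file is shown here; the property names match the @Value annotations used in the service, and every value is a placeholder to be replaced with your own GridDB Cloud details.

# application.properties - a minimal sketch; all values are placeholders
# Web API rows endpoint of the match_events container in your GridDB Cloud instance
griddb.rest.url=https://<your-griddb-cloud-host>/griddb/v2/<clusterName>/dbs/<databaseName>/containers/match_events/rows
# Value sent in the Authorization header ("Basic " followed by base64(username:password))
griddb.api.key=Basic <base64-encoded-credentials>
# Port used by the dashboard URLs later in this article
server.port=9090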
Step 1: Data Ingestion The events/15946.json file logs a sequence of match events—passes, shots, tackles—each record containing essential fields such as: Timestamp (e.g., “2021-06-11T20:05:32.456”) Player Name (player.name) Event Type (type.name) Outcome (e.g., pass.outcome.name as “Complete”, shot.outcome.name as “Goal”) To efficiently store and query this data in GridDB, we first need define a time-series container in GridDb cloud as below. Container Setup We define a container name match_events in GridDB Cloud using the time-series type with timestamp as the row key. Next, we will create schema which will includes the following columns: timestamp (TIMESTAMP, NOT NULL, Row Key) player_name (STRING) event_type (STRING) event_outcome (STRING) minute (INTEGER) second (INTEGER) team_name (STRING) Afterwards we implement MetricsCollectionServicewhich fetches data from JSON file and pushing the data in database. Here the implentation of MetricCollectionService.java : package mycode.service; import org.json.JSONArray; import org.json.JSONObject; import org.springframework.stereotype.Service; import org.springframework.beans.factory.annotation.Value; import java.io.OutputStream; import java.net.HttpURLConnection; import java.net.URL; import java.time.LocalDate; import java.time.format.DateTimeFormatter; import java.util.Scanner; @Service public class MetricsCollectionService { private static String gridDBRestUrl; private static String gridDBApiKey; @Value(“${griddb.rest.url}”) public void setgridDBRestUrl(String in) { gridDBRestUrl = in; } @Value(“${griddb.api.key}”) public void setgridDBApiKey(String in) { gridDBApiKey = in; } public void collect() { try { // Fetch JSON Data from GitHub String jsonResponse = fetchJSONFromGitHub( “https://raw.githubusercontent.com/statsbomb/open-data/master/data/events/15946.json”); JSONArray events = new JSONArray(jsonResponse); // Process and Send Data to GridDB Cloud sendBatchToGridDB(events); } catch (Exception e) { e.printStackTrace(); } } private static String fetchJSONFromGitHub(String urlString) throws Exception { URL url = new URL(urlString); HttpURLConnection conn = (HttpURLConnection) url.openConnection(); conn.setRequestMethod(“GET”); conn.setRequestProperty(“Accept”, “application/json”); if (conn.getResponseCode() != 200) { throw new RuntimeException(“Failed to fetch data: HTTP error code : ” + conn.getResponseCode()); } Scanner scanner = new Scanner(url.openStream()); StringBuilder response = new StringBuilder(); while (scanner.hasNext()) { response.append(scanner.nextLine()); } scanner.close(); return response.toString(); } private static void sendBatchToGridDB(JSONArray events) { JSONArray batchData = new JSONArray(); boolean startProcessing = false; for (int i = 0; i < events.length(); i++) { JSONObject event = events.getJSONObject(i); JSONArray row = new JSONArray(); if (event.has(“index”) && event.getInt(“index”) == 10) { startProcessing = true; } if (!startProcessing) { continue; // Skip records until we reach index == 7 } // Extract and format fields String formattedTimestamp = formatTimestamp(event.optString(“timestamp”, null)); row.put(formattedTimestamp); row.put(event.optJSONObject(“player”) != null ? event.getJSONObject(“player”).optString(“name”, null) : null); row.put(event.optJSONObject(“type”) != null ? 
event.getJSONObject(“type”).optString(“name”, null) : null); JSONObject passOutcome = event.optJSONObject(“pass”); JSONObject shotOutcome = event.optJSONObject(“shot”); if (passOutcome == null && shotOutcome == null) { continue; } if (passOutcome != null) { if (passOutcome.has(“outcome”)) { row.put(passOutcome.getJSONObject(“outcome”).optString(“name”, null)); } else { row.put(JSONObject.NULL); } } else if (shotOutcome != null) { if (shotOutcome.has(“outcome”)) { row.put(shotOutcome.getJSONObject(“outcome”).optString(“name”, null)); } else { row.put(JSONObject.NULL); } } else { row.put(JSONObject.NULL); } row.put(event.optInt(“minute”, -1)); row.put(event.optInt(“second”, -1)); row.put(event.optJSONObject(“team”) != null ? event.getJSONObject(“team”).optString(“name”, null) : null); batchData.put(row); } sendPutRequest(batchData); } private static String formatTimestamp(String inputTimestamp) { try { String todayDate = LocalDate.now().format(DateTimeFormatter.ISO_DATE); return todayDate + “T” + inputTimestamp + “Z”; } catch (Exception e) { return “null”; // Default if parsing fails } } private static void sendPutRequest(JSONArray batchData) { try { URL url = new URL(gridDBRestUrl); HttpURLConnection conn = (HttpURLConnection) url.openConnection(); conn.setDoOutput(true); conn.setRequestMethod(“PUT”); conn.setRequestProperty(“Content-Type”, “application/json”); conn.setRequestProperty(“Authorization”, gridDBApiKey); // Encode username and password for Basic Auth // Send JSON Data OutputStream os = conn.getOutputStream(); os.write(batchData.toString().getBytes()); os.flush(); int responseCode = conn.getResponseCode(); if (responseCode == HttpURLConnection.HTTP_OK || responseCode == HttpURLConnection.HTTP_CREATED) { System.out.println(“Batch inserted successfully.”); } else { System.out.println(“Failed to insert batch. Response: ” + responseCode); } conn.disconnect(); } catch (Exception e) { e.printStackTrace(); } } } Ingestion Logic This steps involves fetching data from GridDB using the REST API and grouping it into 5-minute intervals. 
package mycode.service; import com.fasterxml.jackson.databind.JsonNode; import com.fasterxml.jackson.databind.ObjectMapper; import org.springframework.stereotype.Service; import java.net.URI; import java.net.http.HttpClient; import java.net.http.HttpRequest; import java.net.http.HttpResponse; import java.util.HashMap; import java.util.Map; @Service public class MatchEventsService { private static final String GRIDDB_URL = "https://cloud5114.griddb.com:443/griddb/v2/gs_clustermfcloud5114/dbs/9UkMCtv4/containers/match_events/rows"; private static final String AUTH_HEADER = "Basic TTAyY…lhbEAx"; private final HttpClient httpClient = HttpClient.newHttpClient(); private final ObjectMapper objectMapper = new ObjectMapper(); public Map<Integer, Integer> getPassCountByFiveMin(String playerName) { try { // Build the HTTP request to the GridDB Web API rows endpoint HttpRequest request = HttpRequest.newBuilder() .uri(URI.create(GRIDDB_URL)) .header("Content-Type", "application/json") .header("Authorization", AUTH_HEADER) .POST(HttpRequest.BodyPublishers.ofString("{\"offset\": 0, \"limit\": 55555}")) .build(); // Fetch the response HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString()); JsonNode rootNode = objectMapper.readTree(response.body()); JsonNode rows = rootNode.get("rows"); // Process data: count passes every 5 minutes Map<Integer, Integer> passCountByFiveMin = new HashMap<>(); for (JsonNode row : rows) { String currentPlayer = row.get(1).asText(); String eventType = row.get(2).asText(); int minute = row.get(4).asInt(); if (playerName.equals(currentPlayer) && "Pass".equals(eventType)) { // Group by 5-minute intervals (0-4, 5-9, 10-14, etc.) int fiveMinInterval = (minute / 5) * 5; passCountByFiveMin.merge(fiveMinInterval, 1, Integer::sum); } } return passCountByFiveMin; } catch (Exception e) { e.printStackTrace(); return new HashMap<>(); } } } Step 3: Visualization To deliver insights, the application exposes the metrics through a Spring MVC controller. Endpoint: /pass-accuracy/{playerName} renders the pass counts per 5-minute interval for the given player; a similar endpoint can be added for team shot frequency. Implementation: The controller calls the query service, adds the sorted intervals and their counts to the model, and returns a Thymeleaf template that renders the chart. package mycode.controller; import mycode.service.MatchEventsService; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.stereotype.Controller; import org.springframework.ui.Model; import org.springframework.web.bind.annotation.*; import java.util.Map; @Controller public class MatchEventsController { @Autowired private MatchEventsService matchEventsService; @GetMapping("/pass-accuracy/{playerName}") public String getPassCountEveryFiveMin(@PathVariable String playerName, Model model) { Map<Integer, Integer> passCountByFiveMin = matchEventsService.getPassCountByFiveMin(playerName); // Prepare data for the chart, keeping the counts aligned with the sorted intervals model.addAttribute("playerName", playerName); model.addAttribute("timeIntervals", passCountByFiveMin.keySet().stream().sorted().toList()); model.addAttribute("passCounts", passCountByFiveMin.keySet().stream().sorted().map(passCountByFiveMin::get).toList()); return "pass-accuracy-graph"; // Thymeleaf template name } } Running the Project To run the project, execute the following command to build and run our application: mvn clean install && mvn spring-boot:run Accessing the Dashboard After launching the application, open a web browser and navigate to: http://localhost:9090/pass-accuracy/{player name}.
For example, http://localhost:9090/pass-accuracy/Lionel%20Andr%C3%A9s%20Messi%20Cuccittini This visualization displays a chart of the player's pass activity over time. It provides insights into the player's fatigue levels over time and their overall activity on the field. Similarly, various insights can be generated from this saved data, providing valuable analytics for team performance and decision-making. For example: Player Pass Accuracy Over Time Data: Count of "Pass" events with outcome.name = "Complete" vs. "Incomplete" per player, bucketed by 5-minute intervals. Visualization: Line graph with time (x-axis) and pass accuracy percentage (y-axis) for a key player (e.g., a midfielder). Insight: If pass accuracy drops below 70% late in the game (e.g., after minute 70), the player may be fatigued—time for a substitution. Goal Proximity Over Time Data: Count of "Shot" events with shot.outcome.name = "Goal" or near-miss outcomes (e.g., "Off Target"), bucketed by 10-minute intervals. Visualization: Stacked bar graph with time (x-axis) and shot outcomes (y-axis). Insight: Periods with frequent near-misses (e.g., minutes 30-40) suggest missed opportunities—adjust tactics to capitalize on pressure. Conclusion: As demonstrated, GridDB efficiently processes the timestamped complexity of soccer events, delivering structured insights with precision. Integrated within a Spring Boot application, its high-performance ingestion, optimized time-series indexing, and rapid querying capabilities enable the extraction of actionable metrics from StatsBomb data—identifying player fatigue and strategic opportunities with millisecond accuracy. As sports technology continues to evolve, such implementations highlight the critical role of specialized databases in unlocking the full potential of temporal data.
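For reference, the pass-accuracy-graph.html template returned by MatchEventsController is not listed in this article. A minimal sketch of what it could look like, using Thymeleaf JavaScript inlining and Chart.js loaded from a CDN, is shown below; the model attributes (playerName, timeIntervals, passCounts) match the controller above, while the markup and chart configuration are assumptions.

<!-- pass-accuracy-graph.html - a minimal sketch; markup and chart options are illustrative -->
<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head>
    <title>Pass counts</title>
    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
    <h2 th:text="${playerName}">Player</h2>
    <canvas id="passChart" width="800" height="400"></canvas>
    <script th:inline="javascript">
        /* Thymeleaf inlines the model attributes as JavaScript literals */
        const intervals = /*[[${timeIntervals}]]*/ [];
        const counts = /*[[${passCounts}]]*/ [];
        new Chart(document.getElementById('passChart'), {
            type: 'line',
            data: {
                labels: intervals.map(m => m + '-' + (m + 4) + ' min'),
                datasets: [{ label: 'Passes per 5-minute interval', data: counts, fill: false }]
            }
        });
    </script>
</body>
</html>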

GridDB for Environmental Monitoring in Smart Cities

Introduction Smart cities are transforming urban landscapes by leveraging technology to improve efficiency and sustainability. A key component of smart cities is environmental monitoring, which involves the collection, aggregation, and analysis of real-time data to address challenges like air pollution, traffic congestion, and resource management. By leveraging innovative database technologies, smart cities can unlock actionable insights that drive sustainability and improve urban living standards. This article delves into how GridDB, a high-performance database tailored for time-series and IoT data, supports environmental monitoring in smart cities. Using real-time data streams like pollen levels, illness risk metrics, and air quality indices, we showcase how GridDB effectively manages large data volumes with speed and precision, empowering informed decision-making for a smarter, more sustainable urban future. Understanding the Use Case To demonstrate a practical application, we integrate environmental data from Ambee, a trusted provider of real-time environmental information. Using GridDB’s robust capabilities, we aggregate and analyze this data, uncovering patterns and insights that can guide policymakers and stakeholders in enhancing urban sustainability. In this article, we focus on three specific datasets crucial for smart city management: Pollen Data: This dataset tracks allergen levels to predict periods of heightened allergy risks, enabling authorities to issue timely health advisories. Illness Risk Data: By analyzing environmental conditions, this dataset assesses the probability of disease outbreaks, aiding public health planning. Air Quality Data: Monitoring pollutants such as PM2.5, PM10, and NO2 ensures compliance with health standards and helps mitigate pollution’s impact on urban life. Together, these datasets, sourced from Ambee, serve as the basis for our study, highlighting the potential of GridDB for real-time environmental monitoring and data-driven decision-making. Setting Up GridDB Cluster and Spring Boot Integration: For Environmental Monitoring The first step is to set up a GridDB cluster and integrate it with our Spring Boot application as follows. Setting up GridDB Cluster GridDB provides flexible options to meet different requirements. For development, a single-node cluster on our local machine may be sufficient. However, in production, distributed clusters across multiple machines are typically preferred for improved fault tolerance and scalability. For detailed guidance on setting up clusters based on our deployment strategy, refer to the GridDB documentation. To set up a GridDB cluster, follow the steps mentioned here. Setting up Spring Boot Application Once our GridDB cluster is operational, the next step is connecting it to ourSpring Boot application. The GridDB Java Client API provides the necessary tools to establish this connection. To simplify the process, you can include the griddb-spring-boot-starter library as a dependency in our project, which offers pre-configured beans for a streamlined connection setup.    Setting Up API Access To begin, visit www.getambee.com and create an account. After registering, you’ll be provided with an API key, which is required for authentication when making requests to their endpoints. This key grants access to various environmental data services offered by the platform.   
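To see what an authenticated request looks like before everything is wired into the Spring services, here is a small, hedged sketch that calls an Ambee endpoint with Spring's RestTemplate. The endpoint path, the x-api-key header name, and the coordinates are assumptions based on Ambee's public documentation and should be verified against your own account; in the actual application the key is injected from the api.token property rather than hardcoded.

// A hypothetical standalone sketch of an authenticated Ambee request; verify the
// endpoint path and header name against your Ambee dashboard before using it.
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.ResponseEntity;
import org.springframework.web.client.RestTemplate;

public class AmbeeRequestSketch {
    public static void main(String[] args) {
        String apiToken = "<your-ambee-api-key>"; // injected via @Value("${api.token}") in the real service

        HttpHeaders headers = new HttpHeaders();
        headers.set("x-api-key", apiToken); // Ambee authenticates requests with an API-key header

        // Assumed pollen endpoint with placeholder coordinates
        String url = "https://api.ambeedata.com/latest/pollen/by-lat-lng?lat=12.97&lng=77.59";

        RestTemplate restTemplate = new RestTemplate();
        ResponseEntity<String> response =
                restTemplate.exchange(url, HttpMethod.GET, new HttpEntity<>(headers), String.class);

        System.out.println(response.getBody()); // raw JSON, later mapped onto the DTO classes
    }
}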
Pricing Plans Ambee offers flexible pricing plans to suit different needs: Free Tier: Ideal for developers or small-scale projects, with limited API calls per month. Paid Plans: Designed for larger-scale applications, these plans provide higher API limits and additional features. Project Structure Here’s a suggested project structure for this application: └───my-griddb-app │ pom.xml │ ├───src │ ├───main │ │ ├───java │ │ │ └───mycode │ │ │ │ MySpringBootApplication.java │ │ │ │ │ │ │ ├───config │ │ │ │ GridDBConfig.java │ │ │ │ │ │ │ ├───controller │ │ │ │ ChartController.java │ │ │ │ │ │ │ ├───dto │ │ │ │ AirQualityDTO.java │ │ │ │ IllnessRiskDTO.java │ │ │ │ PollenDataDTO.java │ │ │ │ │ │ │ └───service │ │ │ ChartService.java │ │ │ MetricsCollectionService.java │ │ │ RestTemplateConfig.java │ │ │ │ │ └───resources │ │ │ application.properties │ │ │ │ │ └───templates │ │ charts.html │ │ │ └───test │ └───java │ └───com │ └───example │ AppTest.java This structure separates controllers, models, repositories, services, and the application entry point into distinct layers, enhancing modularity and maintainability. Add GridDB Dependency To enable interaction with GridDB in our Spring Boot project, we must include the GridDB Java Client API dependency. This can be accomplished by adding the appropriate configuration to the project build file, such as pom.xml for Maven or the equivalent file for Gradle. Here’s an example of how to configure the dependency in thepom.xml file: <project xmlns=”http://maven.apache.org/POM/4.0.0″ xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xsi:schemaLocation=”http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd”> <modelVersion>4.0.0</modelVersion> <groupId>com.example</groupId> <artifactId>my-griddb-app</artifactId> <version>1.0-SNAPSHOT</version> <name>my-griddb-app</name> <url>http://maven.apache.org</url> <parent> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-parent</artifactId> <version>3.2.4</version> <relativePath /> <!– lookup parent from repository –> </parent> <properties> <maven.compiler.source>17</maven.compiler.source> <maven.compiler.target>17</maven.compiler.target> </properties> <dependencies> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-actuator</artifactId> </dependency> <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>3.8.1</version> <scope>test</scope> </dependency> <!– GridDB dependencies –> <dependency> <groupId>com.github.griddb</groupId> <artifactId>gridstore</artifactId> <version>5.6.0</version> </dependency> <!– Spring Boot dependencies –> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> <exclusions> <exclusion> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-logging</artifactId> </exclusion> </exclusions> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-test</artifactId> <scope>test</scope> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-thymeleaf</artifactId> </dependency> <!– JSON processing –> <dependency> <groupId>com.fasterxml.jackson.core</groupId> <artifactId>jackson-databind</artifactId> <version>2.15.0</version> <!– or the latest version –> </dependency> <!– Lombok –> <dependency> <groupId>org.projectlombok</groupId> <artifactId>lombok</artifactId> <optional>true</optional> </dependency> </dependencies> 
</project> Configure GridDB Connection After adding the GridDB dependency, the next step is configuring the connection details for our GridDB cluster in our Spring Boot application. This is usually configured in the application.properties file, where you can specify various settings for the application. Here’s a quick example of how to set up those connection details: GRIDDB_NOTIFICATION_MEMBER=127.0.0.1:10001 GRIDDB_CLUSTER_NAME=myCluster GRIDDB_USER=admin GRIDDB_PASSWORD=admin management.endpoints.web.exposure.include=* server.port=9090 #API token api.token=<enter your API Key></enter> griddb.cluster.port: The port number on which the GridDB cluster is listening. griddb.cluster.user: The username for accessing the GridDB cluster. griddb.cluster.password: The password for the specified GridDB user (replace with ouractual password). server.port=9090: Sets the port on which ourSpring Boot application will run. api.token : API token for authentication purposes. Create GridDB Client Bean To interact effectively with GridDB in our Spring Boot application,we need to create a dedicated Spring Bean to manage the GridDB connection. This bean will establish the connection using the parameters defined in the application.properties file and will act as the central interface for interacting with the GridDB cluster across the application. Here’s an example of how to define this bean in a Java class named GridDbConfig.java: package mycode.config; import java.util.Properties; import org.springframework.beans.factory.annotation.Value; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.context.annotation.PropertySource; import com.toshiba.mwcloud.gs.GSException; import com.toshiba.mwcloud.gs.GridStore; import com.toshiba.mwcloud.gs.GridStoreFactory; @Configuration @PropertySource(“classpath:application.properties”) public class GridDBConfig { @Value(“${GRIDDB_NOTIFICATION_MEMBER}”)   private String notificationMember; @Value(“${GRIDDB_CLUSTER_NAME}”)   private String clusterName; @Value(“${GRIDDB_USER}”)   private String user; @Value(“${GRIDDB_PASSWORD}”)   private String password; @Bean   public GridStore gridStore() throws GSException {     // Acquiring a GridStore instance     Properties properties = new Properties();     properties.setProperty(“notificationMember”, notificationMember);     properties.setProperty(“clusterName”, clusterName);     properties.setProperty(“user”, user);     properties.setProperty(“password”, password);     return GridStoreFactory.getInstance().getGridStore(properties); } } Metric Collection Our primary focus is on the MetricCollection.java class, where API calls to external services are made, and the collected data is processed and stored. This class serves as a bridge between the external APIs and the backend system, ensuring seamless integration with GridDB for real-time analytics and decision-making. Ambee provides a rich set of endpoints to fetch real-time environmental data. For this project, we’ll focus on the following key endpoints: Pollen Data API: This endpoint provides allergen levels for specific locations, helping to monitor pollen concentrations and predict allergy outbreaks. It is essential for public health, as it allows authorities to issue timely advisories for people with respiratory conditions or allergies. Air Quality API: This API provides data on various pollutants such as PM2.5, PM10, CO, and NO2. 
Monitoring air quality is crucial for ensuring compliance with health standards and mitigating pollution’s impact on public health. It helps cities take proactive steps in reducing pollution and protecting residents’ well-being. Illness Risk API: This endpoint returns calculated risk scores based on environmental conditions such as pollution levels, temperature, and humidity. It plays a critical role in public health, enabling early detection of potential health risks and informing decisions on preventive measures to reduce illness outbreaks in urban areas. All above APIs use RESTful architecture and return responses in JSON format. Key Components of the MetricCollection.java Class API Key Management: The API key is injected into the service via the @Value annotation. This allows the project to securely access the external Ambee API services without hardcoding sensitive credentials directly into the codebase. RestTemplate for API Calls: The RestTemplate object is used for making HTTP requests. It simplifies the process of invoking REST APIs and handling the response data. We are using restTemplate.getForObject() to fetch JSON data from the APIs and return it as a string. This data can then be processed or converted into a more structured format, like objects or entities, for further analysis. Data Storage in GridDB: The environmental data fetched from the APIs is stored in GridDB’s time-series containers for efficient management and querying. Here’s how the data is persisted: Pollen Data: A time-series container named “PollenData” is created using the PollenDataDTO class. Each data entry is appended to this container. Illness Risk Data: A similar container, “IllnessRiskData”, is created for storing data related to contamination risks using the IllnessRiskDTO class. Air Quality Data: Another time-series container, “AirQualityData”, is set up for storing air quality metrics using the AirQualityDTO class. Fetching and Processing the Data Each method in the MetricCollection.java class is designed to collect specific environmental data: Air Quality: The getAirQualityData method is responsible for fetching real-time air quality data for a given city. It contacts the Ambee air quality API and retrieves the data in JSON format. Pollen Data: Similarly, the getPollenData method makes a request to the Ambee pollen API to gather pollen data, which is vital for assessing allergens in the air, particularly for individuals with respiratory conditions. Illness Risk: The getIllnessRiskData method provides critical insights into potential health risks caused by environmental factors such as pollution levels or seasonal changes, allowing for proactive health management. Data Transformation After retrieving the raw JSON data, it can be parsed and converted into Java objects for streamlined processing. By mapping the JSON response to custom Java classes, such as AirQualityDTO, IllnessRiskDTO, and PollenDataDTO, the data becomes easier to manage and transform within the system for further analysis and visualization. Below is the implementation of these DTO classes. 
@Data @NoArgsConstructor @AllArgsConstructor public class AirQualityDTO { @RowKey public Date updatedAt; private double lat; private double lng; private String city; private String state; private double pm10; private double pm25; private double no2; private double so2; private double co; private double ozone; private int aqi; private String pollutant; private double concentration; private String category; } @Data @NoArgsConstructor @AllArgsConstructor public class IllnessRiskDTO { @RowKey public Date createdAt; private double lat; private double lng; private String iliRisk; } @Data @NoArgsConstructor @AllArgsConstructor public class PollenDataDTO { @RowKey public Date updatedAt; private double lat; private double lng; private String grassPollenRisk; private String treePollenRisk; private String weedPollenRisk; private int grassPollenCount; private int treePollenCount; private int weedPollenCount; } Storing Data in GridDB After transforming the data into Java objects, it is stored in GridDB for real-time querying and analysis; the insertion code is shown as part of the scheduled collection method below. Scheduling Data Collection To ensure the data is regularly collected, you can use Spring Boot's @Scheduled annotation to trigger API calls at fixed intervals. This makes sure the data is updated regularly to support real-time monitoring and analytics. @Scheduled(fixedRate = 4000) public void collectMetrics() throws GSException, JsonMappingException, JsonProcessingException, ParseException { List<PollenDataDTO> pollenData = fetchPollenData(); List<IllnessRiskDTO> illnessRiskData = fetchIllnessRiskData(); List<AirQualityDTO> airQualityData = fetchAirQualityData(); // Store Pollen Data TimeSeries<PollenDataDTO> pollenSeries = store.putTimeSeries("PollenData", PollenDataDTO.class); for (PollenDataDTO data : pollenData) { pollenSeries.append(data); } // Store Illness Risk Data TimeSeries<IllnessRiskDTO> illnessRiskSeries = store.putTimeSeries("IllnessRiskData", IllnessRiskDTO.class); for (IllnessRiskDTO data : illnessRiskData) { illnessRiskSeries.append(data); } // Store Air Quality Data TimeSeries<AirQualityDTO> airQualitySeries = store.putTimeSeries("AirQualityData", AirQualityDTO.class); for (AirQualityDTO data : airQualityData) { airQualitySeries.append(data); } } By following the above steps, we can effectively extract data from Ambee and load it into the GridDB database. Data Querying in GridDB and Visualization with Thymeleaf Once the data is stored and available in GridDB, the next step is to visualize this data in a way that provides actionable insights. In this section, we'll explore how to build a dashboard using Spring Boot, Thymeleaf, and Chart.js to render charts that display real-time environmental data. Here are the steps to achieve this: Building the Chart Controller The ChartController acts as the intermediary between backend data in GridDB and the frontend visualizations displayed on the dashboard. Its responsibilities include handling HTTP requests, interacting with the service layer to fetch data, and passing that data to Thymeleaf templates for rendering.
Here's how the ChartController is implemented: package mycode.controller; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.stereotype.Controller; import org.springframework.ui.Model; import org.springframework.web.bind.annotation.GetMapping; import mycode.service.ChartService; import mycode.dto.AirQualityDTO; import mycode.dto.IllnessRiskDTO; import mycode.dto.PollenDataDTO; import java.util.List; @Controller public class ChartController { @Autowired private ChartService chartService; @GetMapping("/charts") public String showCharts(Model model) { try { // Fetch data for charts List<PollenDataDTO> pollenData = chartService.getPollenData(); List<IllnessRiskDTO> illnessRiskData = chartService.getIllnessRiskData(); List<AirQualityDTO> airQualityData = chartService.getAirQualityData(); // Add data to the model for Thymeleaf model.addAttribute("pollenData", pollenData); model.addAttribute("illnessRiskData", illnessRiskData); model.addAttribute("airQualityData", airQualityData); } catch (Exception e) { model.addAttribute("error", "Unable to fetch data: " + e.getMessage()); } return "charts"; } } Implementing the Chart Service The ChartService acts as the business logic layer, encapsulating the operations needed to query GridDB and process the results. In this context, the ChartService class accesses a GridStore database container to retrieve environmental monitoring data. The service then compiles these objects into a list representing the environmental metrics, ready for use in analysis or visualization. Here's how the ChartService is implemented: package mycode.service; import java.text.SimpleDateFormat; import java.util.ArrayList; import java.util.Date; import java.util.List; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.stereotype.Service; import com.toshiba.mwcloud.gs.Container; import com.toshiba.mwcloud.gs.GridStore; import com.toshiba.mwcloud.gs.Query; import com.toshiba.mwcloud.gs.Row; import com.toshiba.mwcloud.gs.RowSet; import com.toshiba.mwcloud.gs.TimeSeries; import mycode.dto.AirQualityDTO; import mycode.dto.IllnessRiskDTO; import mycode.dto.PollenDataDTO; @Service public class ChartService { @Autowired GridStore store; public List<PollenDataDTO> getPollenData() throws Exception {

Python Client v5.8 Changes & Usage (Client is now based on Java!)

The GridDB Python client has been updated to now use the native GridDB Java interface with JPype and Apache Arrow. Prior to this release, the python client relied on the c_client, and translated some of those commands with swig and other tools. The main benefit of committing to this change in underlying technology is being able to query GridDB and get back an Apache Arrow recordbatch object in return. We will go over how this change can directly affect your python workflows with a concrete example later on in this article. Another benefit is how SQL is now handled. With the c_client as the base, you could query the database only using TQL, but not SQL, meaning that certain partitioned tables were simply not accessible to your python client. There were workarounds, for example: Pandas with Python GridDB SQL Queries, but now with this new client, this sort of thing will work out of the box. So with that out of the way, let’s see how we can install the new GridDB Python Client and explore some of the changes in this new version Installation And Prereqs To install, as explained briefly above, you will need to have Java installed and set to your environment variable JAVA_HOME. You will also need maven and python3.12 Here is how I installed these packages and set the Java home on Ubuntu 22.04: $ sudo apt install maven python3.12 default-jdk $ export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64 Installation To grab the source code of the python client, navigate to its github page and clone the repo. Once cloned, we can run a maven install to build our apache library and then install the python client. git clone https://github.com/griddb/python_client.git cd python_client/java mvn install cd .. cd python python3.12 -m pip install . cd .. #this puts you back in the root directory of python_client Jar Files and Your Environment Variables On top of having the JAVA HOME set and having Java installed, there are a couple of .jar files you will need to be in your CLASSPATH as well. Specifically we need pyarrow, gridstore and in some cases, py-memory-netty. Two of these three we can just download generic versions from the maven repository, but Apache Arrow will rely on a modified version which we built in the previous step. So let’s download and set these jar files. $ mkdir lib && cd lib $ curl -L -o gridstore.jar https://repo1.maven.org/maven2/com/github/griddb/gridstore/5.8.0/gridstore-5.8.0.jar $ curl -L -o arrow-memory-netty.jar https://repo1.maven.org/maven2/org/apache/arrow/arrow-memory-netty/18.3.0/arrow-memory-netty-18.3.0.jar $ cp ../java/target/gridstore-arrow-5.8.0.jar gridstore-arrow.jar $ export CLASSPATH=$CLASSPATH:./gridstore.jar:./gridstore-arrow.jar:./arrow-memory-netty.jar If you are unsure if your CLASSPATH is set, you can always run an echo: $ echo $CLASSPATH :./gridstore.jar:./gridstore-arrow.jar:./arrow-memory-netty.jar With this set in your CLASSPATH, you can start your python3 griddb scripts without explicity setting the classpath options when starting the JVM at the top of the file. 
If you don’t set the CLASSPATH, you can use the option like the sample code does: import jpype # If no CLASSPATH Set in the environment, you can force the JVM to start with these jars explicitly jpype.startJVM(classpath=[“./gridstore.jar”, “./gridstore-arrow.jar”, “./arrow-memory-netty.jar”]) import griddb_python as griddb import sys or if you set the CLASSPATH and make this permanent (for example, editing your .bashrc file), you can get away with not importing jpype at all as the modified pyarrow will do it for you. # No jpype set here; still works import griddb_python as griddb import sys Running Samples To ensure everything is working properly, we should try running the sample code. Navigate to the sample dir, make some changes, and then run! # from the root of this python client repo $ cd sample We will need to change the connection details as GridDB CE runs mostly in FIXED_LIST mode now, meaning we need a notification member, not a host/port combo: try: #Get GridStore object # Changed here to notification_member vs port & address gridstore = factory.get_store(notification_member=argv[1], cluster_name=argv[2], username=argv[3], password=argv[4]) And depending on you have your CLASSPATH variables set, you can either add arrow-memory-netty to your jypype start jvm method, or set your classpath as explained. The code should now run just fine: python_client/sample$ python3.12 sample1.py 127.0.0.1:10001 myCluster admin admin Person: name=name02 status=False count=2 lob=[65, 66, 67, 68, 69, 70, 71, 72, 73, 74] API Differences & Usage The README for the Python client page explains what features are still currently missing (when compared to the previous v.0.8.5): – Array type for GridDB – Timeseries-specific function – Implicit data type conversion But there is also some functionality that is gained with this move: – Compsite RowKey, Composite Index GEOMETRY type and TIMESTAMP(micro/nano-second) type – Put/Get/Fetch with Apache Arrow – Operations for Partitioning table Using SQL These new features basically come from the ability to use Java and then by extension JDBC and SQL. To use SQL, you can simply add the GridDB JDBC jar to your CLASSPATH (or jvm start options). From there, you can use SQL and use them on partition tables. Taken from the samples, here’s what SQL can look like: import jpype import jpype.dbapi2 jpype.startJVM(classpath=[“./gridstore.jar”, “./gridstore-arrow.jar”, “./gridstore-jdbc.jar”]) import griddb_python as griddb import sys ### SQL create table/insert url = “jdbc:gs://127.0.0.1:20001/myCluster/public” conn = jpype.dbapi2.connect(url, driver=”com.toshiba.mwcloud.gs.sql.Driver”, driver_args={“user”:”admin”, “password”:”admin”}) curs = conn.cursor() curs.execute(“DROP TABLE IF EXISTS Sample”) curs.execute(“CREATE TABLE IF NOT EXISTS Sample ( id integer PRIMARY KEY, value string )”) print(‘SQL Create Table name=Sample’) curs.execute(“INSERT INTO Sample values (0, ‘test0’),(1, ‘test1’),(2, ‘test2’),(3, ‘test3’),(4, ‘test4’)”) print(‘SQL Insert’) For the most part, this stuff is the same as before as we could always start up the jpype jvm with python and run JDBC. What is truly new is using Apache Arrow. Using Apache Arrow with GridDB/Python/Nodejs Part of what makes Arrow so useful in the modern era is its ability to “[allow] for zero-copy reads and fast data access and interchange without serialization overhead between these languages and systems.”(https://en.wikipedia.org/wiki/Apache_Arrow). 
To showcase this, we will create a Python script that generates 10,000 rows of 'random' data, queries the result directly into an Arrow RecordBatch object, and, once that object exists, streams it over TCP (without serializing/deserializing) directly over to Node.js (so-called zero-copying).

First, let's generate 10,000 rows of data. We will create a time series container with 'random' data in all of the columns.

from datetime import datetime, timezone, timedelta
import griddb_python as griddb
import sys
import pandas as pd
import pyarrow as pa
import uuid
import random
import socket
import warnings
warnings.filterwarnings('ignore')

def generate_random_timestamps(start_date_str, num_timestamps, min_interval_minutes=5, max_interval_minutes=30):
    date_format = "%Y-%m-%dT%H:%M:%S"
    current_time = datetime.fromisoformat(start_date_str.replace("Z", "")).replace(tzinfo=timezone.utc)
    timestamp_list = []
    for _ in range(num_timestamps):
        timestamp_str = current_time.strftime(date_format) + ".000Z"
        timestamp_list.append(timestamp_str)
        random_minutes = random.randint(min_interval_minutes, max_interval_minutes)
        current_time += timedelta(minutes=random_minutes)
    return timestamp_list

start_point = "2024-12-01T10:00:00.000Z"
number_of_stamps = 10000
min_interval = 5
max_interval = 20

generated_datelist = generate_random_timestamps(
    start_point,
    number_of_stamps,
    min_interval,
    max_interval
)

factory = griddb.StoreFactory.get_instance()
gridstore = factory.get_store(
    notification_member="127.0.0.1:10001",
    cluster_name="myCluster",
    username="admin",
    password="admin"
)

col = gridstore.get_container("col01")
ra = griddb.RootAllocator(sys.maxsize)

blob = bytearray([65, 66, 67, 68, 69, 70, 71, 72, 73, 74])
conInfo = griddb.ContainerInfo("col01",
    [["ts", griddb.Type.TIMESTAMP],
     ["name", griddb.Type.STRING],
     ["status", griddb.Type.BOOL],
     ["count", griddb.Type.LONG],
     ["lob", griddb.Type.BLOB]],
    griddb.ContainerType.TIME_SERIES, True)

i = 0
rows = []
while i < 10000:
    rows.append([datetime.strptime(generated_datelist[i], "%Y-%m-%dT%H:%M:%S.%f%z"), str(uuid.uuid1()), False, random.randint(0, 1048576), blob])
    i = i + 1

Next, let's insert with multi_put. First we'll format the list of rows into a dataframe, then convert the dataframe into a RecordBatch, and finally use multi_put to insert the batch into GridDB:

df = pd.DataFrame(rows, columns=["ts", "name", "status", "count", "lob"])
col = gridstore.put_container(conInfo)
rb = pa.record_batch(df)
col.multi_put(rb, ra)

Now that our data is inside GridDB, let's query it (this is for the sake of education; obviously this makes no sense in the 'real world').

col = gridstore.get_container("col01")
q = col.query("select *")
q.set_fetch_options(root_allocator=ra)
rs = q.fetch()
result = []
rb = rs.next_record_batch()  # gets all of the rows as a RecordBatch obj

And now, finally, let's stream our RecordBatch over to some other programming environment to showcase Apache Arrow's supreme flexibility. We will use Node.js as the consumer here.
# stream our rows through a socket
HOST = '127.0.0.1'
PORT = 2828

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_socket:
    server_socket.bind((HOST, PORT))
    server_socket.listen(1)
    print(f"Python producer listening on {HOST}:{PORT}")
    conn, addr = server_socket.accept()
    print(f"Connected by {addr}")
    with conn:
        with conn.makefile(mode='wb') as f:
            # Use the file-like object as the sink for the stream writer
            with pa.ipc.new_stream(f, rb.schema) as writer:
                writer.write_batch(rb)

Run the Python script and it will create the container, add the rows of data, query those rows, and then start a server that listens for a consumer to connect, at which point it sends all of the rows to the consumer.

Node.js will now connect to our producer and print out the records:

const net = require('net');
const { RecordBatchReader } = require('apache-arrow');

const HOST = '127.0.0.1';
// Ensure this port matches your Python producer's port
const PORT = 2828;

const client = new net.Socket();

client.connect(PORT, HOST, async () => {
    console.log(`Connected to Python producer at ${HOST}:${PORT}`);

    try {
        const reader = await RecordBatchReader.from(client);

        let schemaPrinted = false;
        for await (const recordBatch of reader) {
            if (!schemaPrinted) {
                console.log("Successfully parsed schema from stream.");
                console.log(`Schema:`, reader.schema.fields.map(f => `${f.name}: ${f.type}`).join(', '));
                console.log("--- Processing data batches ---");
                schemaPrinted = true;
            }

            // Convert the record batch to a more familiar JavaScript object format
            const data = recordBatch.toArray().map(row => row.toJSON());
            console.log("Received data batch:", data);
        }

        console.log("-------------------------------");
        console.log("Stream finished.");
    } catch (error) {
        console.error("Error processing Arrow stream:", error);
    }
});

client.on('close', () => {
    console.log('Connection closed');
});

client.on('error', (err) => {
    console.error('Connection error:', err.message);
});

Run this Node.js script like so:

$ npm install
$ node consumer.js
Connected to Python producer at 127.0.0.1:2828
Successfully parsed schema from stream.
Schema: ts: Timestamp, name: Utf8, status: Bool, count: Int64, lob: Binary
--- Processing data batches ---
Received data batch: [
  {
    ts: 1733047200000,
    name: '65d16ce6-55d3-11f0-8070-8bbd0177d9e6',
    status: false,
    count: 820633n,
    lob: Uint8Array(10) [ 65, 66, 67, 68, 69, 70, 71, 72, 73, 74 ]
  },
  {
    ts: 1733047500000,
    name: '65d16ce7-55d3-11f0-8070-8bbd0177d9e6',
    status: false,
    count: 931837n,
    lob: Uint8Array(10) [ 65, 66, 67, 68, 69, 70, 71, 72, 73, 74 ]
  },
….cutoff

Conclusion

And with that, we have successfully showcased the new GridDB Python client.
