Blog

Sentiment Analysis on Product Reviews using OpenAI

Overview This article explores a practical implementation of sentiment analysis on product reviews using OpenAI’s large language models (LLMs) and GridDB. The project involves extracting review data, analyzing sentiment using AI, and efficiently storing and querying the results. It’s an example of how modern AI tools and database systems can be combined to process and understand massive amounts of customer feedback. Why Use AI for Sentiment Analysis? Manual analysis of customer sentiment is impossible at scale due to the sheer volume of user-generated content. Artificial intelligence, especially LLMs like OpenAI’s GPT, enables automated analysis with human-level language understanding. Key advantages include: Scalability: AI can process millions of reviews quickly and consistently. Context Awareness: LLMs are capable of identifying sarcasm, subtle opinions, and emotional tone. Language Versatility: These models handle slang, informal language, and multilingual text more effectively than rule-based systems. AI-driven sentiment analysis helps businesses track public opinion, improve customer service, and guide product development based on real user feedback. Running the Project Clone the Repository git clone https://github.com/junwatu/sentiment-analysis-griddb.git cd app npm install Setup Environment Variables Create a .env file in the app directory of the project and add the following environment variables: OPENAI_API_KEY= GRIDDB_WEBAPI_URL= GRIDDB_USERNAME= GRIDDB_PASSWORD= VITE_BASE_URL=http://localhost:3000 You can copy the .env.example file to .env and fill in the values. To get the OpenAI API key, read this section and for GridDB Cloud, read this section. By default, the VITE_BASE_URL is set to http://localhost:3000 but you can change it to your own domain. Run the Project npm run start Access the Project If you don’t change the VITE_BASE_URL, open your browser and navigate to http://localhost:3000 to access the project. Technologies Used Node.js You need Node.js installed because this project uses Vite based project. Install the Node LTS version from here. OpenAI AI models are used for natural language processing and sentiment analysis. Create the OpenAI API key here. You may need to create a project and enable a few models. We will use the gpt-4o model. GridDB Cloud Setup The GridDB Cloud offers a free plan tier and is officially available worldwide. You need these GridDB environment variables in the .env file: GRIDDB_WEBAPI_URL= GRIDDB_USERNAME= GRIDDB_PASSWORD= Sign Up for GridDB Cloud Free Plan If you would like to sign up for a GridDB Cloud Free instance, you can do so in the following link: https://form.ict-toshiba.jp/download_form_griddb_cloud_freeplan_e. After successfully signing up, you will receive a free instance along with the necessary details to access the GridDB Cloud Management GUI, including the GridDB Cloud Portal URL, Contract ID, Login, and Password. GridDB WebAPI URL Go to the GridDB Cloud Portal and copy the WebAPI URL from the Clusters section. It should look like this: GridDB Username and Password Go to the GridDB Users section of the GridDB Cloud portal and create or copy the username for GRIDDB_USERNAME. The password is set when the user is created for the first time, use this as the GRIDDB_PASSWORD. For more details, to get started with GridDB Cloud, please follow this quick start guide. IP Whitelist When running this project, please ensure that the IP address where the project is running is whitelisted. 
Failure to do so will result in a 403 status code or forbidden access. You can use a website like What Is My IP Address to find your public IP address. To whitelist the IP, go to the GridDB Cloud Admin and navigate to the Network Access menu. Project Architecture The diagram illustrates the flow of data in the sentiment analysis system: Product Review Data is fed into the system. The OpenAI API analyzes this data for sentiment. The sentiment results and associated metadata are then stored in GridDB. Finally, the sentiment analysis results are presented through a user interface. This represents a pipeline where raw review data is processed, analyzed for sentiment, stored in GridDB, and then made available for viewing. Dataset Preparation You can use any data review for this project. For example, you can use the Amazon Review Dataset or just copy-paste a review from one of the product reviews on Amazon. Server The server is built using Node.js and Express.js. It is a simple server that provides a few API routes. You can see the server code in the server.ts file. API Routes Method Endpoint Description POST /api/sentiment Sentiment analysis for a review and persist the result in GridDB GET /api/sentiments Retrieve all stored review records GET * (any non‑API path) Serve main UI Running Sentiment Analysis with OpenAI We use the OpenAI API (e.g., GPT-4o) to evaluate the sentiment of each review. The input is a text prompt that asks the model to categorize the sentiment. System Prompt We will use a system prompt to guide the model to understand the task. The system prompt is as follows: You are a sentiment‑analysis classifier for Amazon user‑review records. You will receive one JSON object that contains the fields “title” and “text” (and sometimes “rating” which you must ignore). Your task:\n1. Read the natural‑language content (title + text).\n2. Predict the sentiment label and an estimated star rating without looking at any numeric “rating” field.\n3. Respond ONLY with a JSON object in this schema:\n{\n “label”: “positive | neutral | negative”,\n “predicted_rating”: 1 | 2 | 3 | 4 | 5,\n “confidence”: 0-1\n}\nMapping rule (aligned to the Amazon Reviews dataset):\n• 1–2 stars ⇒ negative\n• 3 stars ⇒ neutral\n• 4–5 stars ⇒ positive\nIf the review text is empty, off‑topic, or nonsense, return:\n{“label”:”neutral”,”predicted_rating”:3,”confidence”:0.0}\nNever add commentary or extra keys. Basically, it tells the model to behave like a sentiment analysis classifier for Amazon user-review records. Few Shots A few shots are used to guide the model in understanding the task. The few shots used in this project are as follows: const FEW_SHOTS = [ { role: ‘user’, content: JSON.stringify({ title: ‘Rock‑solid mount’, text: “I’ve tried dozens of phone mounts—this one finally stays put on bumpy roads. Five minutes to install and rock‑solid!”, }), }, { role: ‘assistant’, content: ‘{“label”:”positive”,”predicted_rating”:5,”confidence”:0.96}’, }, { role: ‘user’, content: JSON.stringify({ title: ‘Broke in a week’, text: “Looks nice, but the zipper broke after one week and Amazon wouldn’t replace it.”, }), }, { role: ‘assistant’, content: ‘{“label”:”negative”,”predicted_rating”:1,”confidence”:0.93}’, }, { role: ‘user’, content: JSON.stringify({ title: ‘Meh’, text: ‘These were lightweight and soft but much too small for my liking. 
I would have preferred two of these together to make one loc.’, }), }, { role: ‘assistant’, content: ‘{“label”:”neutral”,”predicted_rating”:3,”confidence”:0.55}’, }, ] as const; Few-shot examples are used to help a model quickly learn a specific task or format by demonstrating correct input-output pairs. In this context, these few shots serve as demonstrations to guide the sentiment analysis model, illustrating: Expected input format: Reviews consisting of a title and text. Expected output structure: A JSON response containing: label: sentiment classification (“positive”, “negative”, “neutral”). predicted_rating: numerical rating estimation (1–5). confidence: confidence score for the prediction. The model leverages these examples to understand precisely how to classify and score new reviews in a similar structured manner. Call OpenAI API The sentiment analysis was done by OpenAI using the gpt-4o model. The code is available in the server.ts file. Here is the important code snippet: const completion = await openai.chat.completions.create({ model: ‘gpt-4o’, temperature: 0, messages: [ { role: ‘system’, content: SYSTEM_PROMPT }, …FEW_SHOTS, { role: ‘user’, content: JSON.stringify({ title, text }), }, ], }); The text is the review text and the title is the review title. Storing Results in GridDB GridDB is used to persist the processed review data. We use a collection container to store the data. Container Schema The container schema is as follows: id: INTEGER title: STRING review: STRING sentiment: STRING Saving Data to GridDB This is the example data format that is saved to GridDB: { id: 4495, title: ‘US Version’, review: “US version of the game so DLC can’t be downloaded. Returned for a refund”, sentiment: ‘{“label”:”negative”,”predicted_rating”:2,”confidence”:0.88}’ } The reviewData is inserted into GridDB using the insertData function. const reviewData: GridDBData = { id: generateRandomID(), title, review: text, sentiment: JSON.stringify(parsed), }; await dbClient.insertData({ data: reviewData }); The code to insert data is available in the griddb.ts file. Basically, this function takes the data and inserts it into GridDB using the HTTP method. async function insertData({ data, containerName = ‘sentiments’, }: { data: GridDBData; containerName?: string; }): Promise { try { const row = [ parseInt(data.id.toString()), data.title, data.review, data.sentiment, ]; const path = `/containers/${containerName}/rows`; return await makeRequest(path, [row], ‘PUT’); } catch (error) { if (error instanceof GridDBError) { throw error; } const errorMessage = error instanceof Error ? error.message : ‘Unknown error’; throw new GridDBError(`Failed to insert data: ${errorMessage}`, undefined, undefined, error); } } The API route used to insert data in GridDB Cloud is: `${GRIDDB_WEB_API_URL}/containers/sentiments/rows` and by using the HTTP PUT method, the data can be easily inserted into the database. 
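The makeRequest helper itself is not shown in this article, so the sketch below is illustrative only and is not the repository's actual implementation. Under the assumption that it simply wraps an HTTP call with Basic authentication (the GridDB Cloud Web API authenticates with the GridDB username and password, as seen elsewhere on this blog), a minimal version could look like this:

async function makeRequest(path: string, body: unknown, method: 'GET' | 'POST' | 'PUT') {
  // Basic auth credentials come from the same .env values used throughout this project
  const auth = Buffer.from(
    `${process.env.GRIDDB_USERNAME}:${process.env.GRIDDB_PASSWORD}`
  ).toString('base64');

  const response = await fetch(`${process.env.GRIDDB_WEBAPI_URL}${path}`, {
    method,
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Basic ${auth}`,
    },
    body: method === 'GET' ? undefined : JSON.stringify(body),
  });

  if (!response.ok) {
    throw new Error(`GridDB Web API request failed with status ${response.status}`);
  }

  // Row-insert responses return a small JSON body such as {"count": 1}
  const text = await response.text();
  return text ? JSON.parse(text) : null;
}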
Query All Data To query all data from the sentiments container, you can use the following API route: app.get('/api/sentiments', async (req: express.Request, res: express.Response) => { try { // Search all data const results = await dbClient.searchData([{ type: 'sql', stmt: 'SELECT * FROM sentiments' }]); res.json({ data: results }); } catch (error) { res.status(500).json({ error: error.message }); } }); The code above uses an SQL SELECT statement to query all data from the sentiments container. If you want to query the data directly from the browser, you can open the URL http://localhost:3000/api/sentiments. User Interface The UI is built using React and Vite. The <SentimentAnalysis /> component is the main component of the UI. It is a simple form that allows users to input a review title and text, and then submit the form to the /api/sentiment endpoint. After submitting the form and processing the sentiment analysis, the UI receives the response data in JSON format, for example: { "label": "positive", "predicted_rating": 4, "confidence": 0.85 } The response data is then displayed in the UI. If the review is positive, the UI displays a green color; if it is negative, red; and if it is neutral, yellow. The UI also displays the predicted rating and confidence level. Conclusion This project demonstrates a real-world use case of combining AI and databases for sentiment analysis. Using OpenAI for intelligent sentiment analysis and GridDB for data storage enables fast, efficient processing of customer reviews. The same framework can be extended to other datasets and domains, including social media, customer support logs, or live feedback systems. References OpenAI API

More
Business Intelligence Case Study with GridDB and Python

How do you know your business is going in the right direction? With so much data we have in hand, it is very easy for the problem to get lost while cleaning and analysing it. To be honest, I was in the same place. As a Data Analyst/Data Scientist, I would try to look at the data first and make sense of it. But it not only consumed more time but also made it difficult to explain why I took certain decisions. A recent project on Business Intelligence taught me the power of problem-solving. For this tutorial, we will be exploring a dataset of a global store. To keep it general, I am going to assume our problem statment here is How to grow our business. Then, we will work through the problem statement in a tree-like framework. This will help in structuring our analysis and also get specific KPIs. Framework The framework breaks this down into two major strategic approaches: 1. Increase Our Profit There are a few ways to boost profits. This framework highlights three core strategies: ➤ Increase the Quantity Sold More units sold usually means more revenue. To do this effectively: – Ensure demand is met – Make sure your supply chain and stock levels can support customer demand. We need more data to do this. – Focus on region-specific performance – Some regions may outperform others, and identifying where sales are strongest helps target efforts better. ➤ Sell More Products with High Profit Margins Not all products are created equal. Shifting focus toward high-margin items can improve profitability. – Use product-level data to identify top performers. – Understand regional trends to see where high-margin products sell best. ➤ Reduce the Shipping Cost Shipping and logistics can quietly erode margins, so optimizing this is crucial. – Analyze which products incur the highest shipping costs. – Check which regions are driving those costs and explore smarter routing or fulfillment solutions. 2. Decrease Our Discount While discounts can drive volume, excessive discounting eats into profits. Two key tactics can help manage this: ➤ Identify Best-Selling Products Best-sellers may not need heavy discounting to move. – Reassess discount policies at the product level. – Consider performance variations across regions. ➤ Uncover Bottlenecks Inefficiencies or slow-moving inventory often trigger unnecessary discounts. – Identifying and addressing bottlenecks can reduce dependency on markdowns. Now that we have a basic framework in place, let us look at each aspect and decide on the metrics. What metrics are we tracking? Key Performance Indicators (KPIs): total sales total profit average shipping cost average discount Profitability: products with most profit regions with most profit Discount: best selling products (do not need discounts) bottlenecks due to discounts (low or negative profit margin) Market segmentation: Identifying customer patterns quantity sold by region profit by region shipping cost by region discount by region Pre-requisites The following libraries are required for the below code to execute: Pandas GridDB Python Client Seaborn Matplotlib GridDB’s GitHub Page covers the installation in detail. Please go through it to be able to interact with the GridDB Server in Python. Great! 
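If you don't already have them, the three analysis libraries are available on PyPI and can be installed in one step (the GridDB Python client itself is built by following the GitHub instructions mentioned above):

pip install pandas seaborn matplotlib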
Let’s go ahead and import the libraries Importing Libraries #this is optional – in case of errors during execution even though the python client is installed, it is likely that this could be a path issue and can be resolved as follows import sys sys.path.append(“/home/shripriya/python_client”) #sys.path.append(<path_to_python_client>)</path_to_python_client> import griddb_python as griddb import pandas as pd import seaborn as sns import matplotlib.pyplot as plt Dataset We will be using an open-source SuperStore dataset that can be downloaded from here. Once we download it and put it in the same folder as the code, we can then use pandas to read it and clean it. Reading the Dataframe superstore_df = pd.read_excel(“global_superstore_2016.xlsx”, sheet_name=”Orders”) superstore_df.head() Row ID Order ID Order Date Ship Date Ship Mode Customer ID Customer Name Segment Postal Code City Product ID Category Sub-Category Product Name Sales Quantity Discount Profit Shipping Cost Order Priority 0 40098 CA-2014-AB10015140-41954 2014-11-11 2014-11-13 First Class AB-100151402 Aaron Bergman Consumer 73120.0 Oklahoma City TEC-PH-5816 Technology Phones Samsung Convoy 3 221.980 2 0.0 62.1544 40.77 High 1 26341 IN-2014-JR162107-41675 2014-02-05 2014-02-07 Second Class JR-162107 Justin Ritter Corporate NaN Wollongong FUR-CH-5379 Furniture Chairs Novimex Executive Leather Armchair, Black 3709.395 9 0.1 -288.7650 923.63 Critical 2 25330 IN-2014-CR127307-41929 2014-10-17 2014-10-18 First Class CR-127307 Craig Reiter Consumer NaN Brisbane TEC-PH-5356 Technology Phones Nokia Smart Phone, with Caller ID 5175.171 9 0.1 919.9710 915.49 Medium 3 13524 ES-2014-KM1637548-41667 2014-01-28 2014-01-30 First Class KM-1637548 Katherine Murray Home Office NaN Berlin TEC-PH-5267 Technology Phones Motorola Smart Phone, Cordless 2892.510 5 0.1 -96.5400 910.16 Medium 4 47221 SG-2014-RH9495111-41948 2014-11-05 2014-11-06 Same Day RH-9495111 Rick Hansen Consumer NaN Dakar TEC-CO-6011 Technology Copiers Sharp Wireless Fax, High-Speed 2832.960 8 0.0 311.5200 903.04 Critical 5 rows × 23 columns We can see that our dataset has a lot of information on the orders being placed. Not all information is useful here. Remember our framework comes in handy here. So, we will go ahead and drop some columns that are not needed. Exploratory Data Analysis (EDA) #do not need Order ID, Postal Code, Order Priority cols = [“Order ID”, “Postal Code”, “Order Priority”] superstore_df.drop(columns=cols, inplace=True) superstore_df.info() <class ‘pandas.core.frame.DataFrame’> RangeIndex: 51290 entries, 0 to 51289 Data columns (total 21 columns): # Column Non-Null Count Dtype — —— ————– —– 0 Row ID 51290 non-null int64 1 Order Date 51290 non-null datetime64[ns] 2 Ship Date 51290 non-null datetime64[ns] 3 Ship Mode 51290 non-null object 4 Customer ID 51290 non-null object 5 Customer Name 51290 non-null object 6 Segment 51290 non-null object 7 City 51290 non-null object 8 State 51290 non-null object 9 Country 51290 non-null object 10 Region 51290 non-null object 11 Market 51290 non-null object 12 Product ID 51290 non-null object 13 Category 51290 non-null object 14 Sub-Category 51290 non-null object 15 Product Name 51290 non-null object 16 Sales 51290 non-null float64 17 Quantity 51290 non-null int64 18 Discount 51290 non-null float64 19 Profit 51290 non-null float64 20 Shipping Cost 51290 non-null float64 dtypes: datetime64[ns](2), float64(4), int64(2), object(13) memory usage: 8.2+ MB Since there are no null-values, we do not need to worry about that. 
However, the date columns here are not of the correct datatype. So we will quickly change that before inserting the data into GridDB. # Prepare the DataFrame superstore_df[“Order Date”] = pd.to_datetime(superstore_df[“Order Date”]).dt.to_pydatetime() superstore_df[“Ship Date”] = pd.to_datetime(superstore_df[“Ship Date”]).dt.to_pydatetime() # Rename columns to match GridDB naming (no spaces or dashes) superstore_df.columns = [col.replace(” “, “_”).replace(“-“, “_”) for col in superstore_df.columns] /tmp/ipykernel_1024/114528052.py:2: FutureWarning: The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result superstore_df[“Order Date”] = pd.to_datetime(superstore_df[“Order Date”]).dt.to_pydatetime() /tmp/ipykernel_1024/114528052.py:3: FutureWarning: The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result superstore_df[“Ship Date”] = pd.to_datetime(superstore_df[“Ship Date”]).dt.to_pydatetime() Let us verify that the datatypes are indeed changed. superstore_df.info() <class ‘pandas.core.frame.DataFrame’> RangeIndex: 51290 entries, 0 to 51289 Data columns (total 21 columns): # Column Non-Null Count Dtype — —— ————– —– 0 Row_ID 51290 non-null int64 1 Order_Date 51290 non-null datetime64[ns] 2 Ship_Date 51290 non-null datetime64[ns] 3 Ship_Mode 51290 non-null object 4 Customer_ID 51290 non-null object 5 Customer_Name 51290 non-null object 6 Segment 51290 non-null object 7 City 51290 non-null object 8 State 51290 non-null object 9 Country 51290 non-null object 10 Region 51290 non-null object 11 Market 51290 non-null object 12 Product_ID 51290 non-null object 13 Category 51290 non-null object 14 Sub_Category 51290 non-null object 15 Product_Name 51290 non-null object 16 Sales 51290 non-null float64 17 Quantity 51290 non-null int64 18 Discount 51290 non-null float64 19 Profit 51290 non-null float64 20 Shipping_Cost 51290 non-null float64 dtypes: datetime64[ns](2), float64(4), int64(2), object(13) memory usage: 8.2+ MB Okay, everything looks great! We can now put our clean data into GridDB. Putting the data into GridDB Before putting the data into GridDB, we need to create something called containers. Containers are like tables in any database. We define our schema that includes the name and data type of the column. Once we create the containers, it is very easy to put the data using the GridDB python client. 
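As an aside, one way to future-proof the two conversion lines above, as the FutureWarning message itself suggests, is to wrap the result in np.array so the code keeps producing an ndarray even after pandas changes the return type of to_pydatetime. This is a drop-in replacement for those lines as they appeared before the columns were renamed:

import numpy as np

# Drop-in replacement for the conversion lines above (run before the columns are renamed)
superstore_df["Order Date"] = np.array(pd.to_datetime(superstore_df["Order Date"]).dt.to_pydatetime())
superstore_df["Ship Date"] = np.array(pd.to_datetime(superstore_df["Ship Date"]).dt.to_pydatetime())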
Container Creation factory = griddb.StoreFactory.get_instance() host = “127.0.0.1:10001” cluster = “myCluster” db_user = “admin” db_password = “admin” try: gridstore = factory.get_store(notification_member=host, cluster_name=cluster, username=db_user, password=db_password) print(“Connection successful”) conInfo = griddb.ContainerInfo( name=”sales_data”, column_info_list=[ [“Row_ID”, griddb.Type.LONG], [“Order_Date”, griddb.Type.TIMESTAMP], [“Ship_Date”, griddb.Type.TIMESTAMP], [“Ship_Mode”, griddb.Type.STRING], [“Customer_ID”, griddb.Type.STRING], [“Customer_Name”, griddb.Type.STRING], [“Segment”, griddb.Type.STRING], [“City”, griddb.Type.STRING], [“State”, griddb.Type.STRING], [“Country”, griddb.Type.STRING], [“Region”, griddb.Type.STRING], [“Market”, griddb.Type.STRING], [“Product_ID”, griddb.Type.STRING], [“Category”, griddb.Type.STRING], [“Sub_Category”, griddb.Type.STRING], [“Product_Name”, griddb.Type.STRING], [“Sales”, griddb.Type.DOUBLE], [“Quantity”, griddb.Type.INTEGER], [“Discount”, griddb.Type.DOUBLE], [“Profit”, griddb.Type.DOUBLE], [“Shipping_Cost”, griddb.Type.DOUBLE] ], type=griddb.ContainerType.COLLECTION ) container = gridstore.put_container(conInfo) print(“Container creation successful”) except griddb.GSException as e: for i in range(e.get_error_stack_size()): print(“[“, i, “]”) print(“Error Code:”, e.get_error_code(i)) print(“Location:”, e.get_location(i)) print(“Message:”, e.get_message(i)) Connection successful Container creation successful If executed correctly, the above code should print out Connection successful followed by Container creation successful. If not, the exception message will help you troubleshoot. More information on the error codes are available here. Once our container is in place, let us put the data into it using the following code. Inserting the data container.put_rows(superstore_df) print(“Data insertion successful”) Data insertion successful It is very important to verify if our data has been inserted correctly before moving onto the data analysis. Let’s go ahead and do that. 
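One lightweight check, before pulling every row back, is to run an aggregation query against the container and compare the count with the dataframe. This short sketch reuses the container handle from the creation step and the same query/fetch pattern we use later in this tutorial:

# Sanity check: count the rows that were just inserted
query = container.query("SELECT COUNT(*)")
row_set = query.fetch()
if row_set.has_next():
    row_count = row_set.next().get(type=griddb.Type.LONG)
    print(f"Rows in sales_data: {row_count}")  # expected to match len(superstore_df), i.e. 51290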
Reading the data try: gridstore = factory.get_store(notification_member=host, cluster_name=cluster, username=db_user, password=db_password) cont = gridstore.get_container(“sales_data”) if cont is None: print(“Does not exist”) print(“Connection successful”) query_str=”SELECT *” query = cont.query(query_str) rs = query.fetch() df = rs.fetch_rows() print(df.info()) except griddb.GSException as e: for i in range(e.get_error_stack_size()): print(“[“,i,”]”) print(e.get_error_code(i)) print(e.get_location(i)) print(e.get_message(i)) Connection successful <class ‘pandas.core.frame.DataFrame’> RangeIndex: 51290 entries, 0 to 51289 Data columns (total 21 columns): # Column Non-Null Count Dtype — —— ————– —– 0 Row_ID 51290 non-null int64 1 Order_Date 51290 non-null datetime64[ns] 2 Ship_Date 51290 non-null datetime64[ns] 3 Ship_Mode 51290 non-null object 4 Customer_ID 51290 non-null object 5 Customer_Name 51290 non-null object 6 Segment 51290 non-null object 7 City 51290 non-null object 8 State 51290 non-null object 9 Country 51290 non-null object 10 Region 51290 non-null object 11 Market 51290 non-null object 12 Product_ID 51290 non-null object 13 Category 51290 non-null object 14 Sub_Category 51290 non-null object 15 Product_Name 51290 non-null object 16 Sales 51290 non-null float64 17 Quantity 51290 non-null int64 18 Discount 51290 non-null float64 19 Profit 51290 non-null float64 20 Shipping_Cost 51290 non-null float64 dtypes: datetime64[ns](2), float64(4), int64(2), object(13) memory usage: 8.2+ MB None Great! Everything looks great. We can now go ahead with our analysis. Analysis I will define a function that takes the container, query, and output data type as an input and returns the result of the query. This helps me avoid writing the same code over and over again. def fetch_data(container, query_str, data_type): rs = container.query(query_str) row_set = rs.fetch() if row_set.has_next(): result = row_set.next() result = result.get(type=data_type) return result 1. Key Performance Indicators (KPI) Referencing the framework, we are tracking 4 KPIs – Total Sales Total Profit Avg Shipping Cost Avg Discount total_sales = fetch_data(cont, ‘SELECT SUM(Sales) FROM sales_data’, griddb.Type.DOUBLE) total_profit = fetch_data(cont, ‘SELECT SUM(Profit) FROM sales_data’, griddb.Type.DOUBLE) avg_shipping = fetch_data(cont, ‘SELECT AVG(Shipping_Cost) FROM sales_data’, griddb.Type.DOUBLE) avg_discount = fetch_data(cont, ‘SELECT AVG(Discount) FROM sales_data’, griddb.Type.DOUBLE) print(f”🔢 KPIs”) print(f”Total Sales: ${total_sales:,.2f}”) print(f”Total Profit: ${total_profit:,.2f}”) print(f”Avg Shipping Cost: ${avg_shipping:.2f}”) print(f”Avg Discount: {avg_discount:.2%}”) 🔢 KPIs Total Sales: $12,642,501.91 Total Profit: $1,467,457.29 Avg Shipping Cost: $26.48 Avg Discount: 14.29% Let’s go ahead and plot these. 
fig, axes = plt.subplots(2, 2, figsize=(10, 6)) fig.suptitle(“Key Performance Indicators (KPIs)”, fontsize=20, fontweight=’bold’) axes = axes.flatten() for ax in axes: ax.axis(‘off’) # KPIs kpi_labels = [“Total Sales”, “Total Profit”, “Avg Shipping Cost”, “Avg Discount”] kpi_values = [ f”${total_sales:,.2f}”, f”${total_profit:,.2f}”, f”${avg_shipping:.2f}”, f”{avg_discount:.2%}” ] kpi_colors = [“#4CAF50”, “#2196F3”, “#FF9800”, “#9C27B0”] # Fill KPI cards for i, ax in enumerate(axes): ax.text(0.5, 0.6, kpi_labels[i], fontsize=16, ha=’center’, va=’center’, weight=’bold’, color=’gray’) ax.text(0.5, 0.4, kpi_values[i], fontsize=24, ha=’center’, va=’center’, weight=’bold’, color=kpi_colors[i]) ax.set_facecolor(“#f7f7f7”) ax.set_frame_on(True) ax.patch.set_linewidth(1) ax.patch.set_edgecolor(“#ddd”) # Add vertical and horizontal dividers fig.subplots_adjust(hspace=0.3, wspace=0.3) fig_width, fig_height = fig.get_size_inches() # Vertical line in the center fig.lines.append(plt.Line2D([0.5, 0.5], [0.05, 0.95], color=”lightgray”, linewidth=1, linestyle=”–“, transform=fig.transFigure)) # Horizontal line in the center fig.lines.append(plt.Line2D([0.05, 0.95], [0.5, 0.5], color=”lightgray”, linewidth=1, linestyle=”–“, transform=fig.transFigure)) plt.tight_layout(rect=[0, 0, 1, 0.95]) plt.show() For the next set of metrics, we will be using multiple columns of the database. Let’s go ahead and retrieve it as a dataframe so that we can easily leverage pandas groupby and aggregation. query_str=”SELECT *” query = cont.query(query_str) rs = query.fetch() df = rs.fetch_rows() 2. Profitability We are tracking 2 key things here: products with most profit regions with most profit top_products_profit = df.groupby(‘Product_Name’)[‘Profit’].sum().sort_values(ascending=False).head(10) region_profit = df.groupby(‘Region’)[‘Profit’].sum().sort_values(ascending=False) fig, axes = plt.subplots(1, 2, figsize=(16, 6)) sns.barplot(x=top_products_profit.values, y=top_products_profit.index, hue=top_products_profit.index, palette=”viridis”, legend=False, ax=axes[0]) axes[0].set_title(“Top 10 Products by Profit”) axes[0].set_xlabel(“Profit”) axes[0].set_ylabel(“Product”) sns.barplot(x=region_profit.values, y=region_profit.index, hue=region_profit.index, palette=”magma”, legend=False, ax=axes[1]) axes[1].set_title(“Profit by Region”) axes[1].set_xlabel(“Profit”) axes[1].set_ylabel(“Region”) plt.tight_layout() plt.show() 3. 
Discount Similarly, for the second branch in our tree framework – Discount, we will be tracking 2 metrics: Best selling products (products that do not need discounts) Bottlenecks due to discounts (products with low or negative profit margin) top_sales_no_discount = df[df[‘Discount’] == 0].groupby(‘Product_Name’)[‘Sales’].sum().sort_values(ascending=False).head(10) df[‘Profit_Margin’] = df[‘Profit’] / df[‘Sales’] low_margin = df[(df[‘Discount’] > 0.3) & (df[‘Profit_Margin’] < 0)].groupby(‘Product_Name’)[‘Profit’].sum().sort_values().head(10) fig, axes = plt.subplots(1, 2, figsize=(16, 6)) sns.barplot(x=top_sales_no_discount.values, y=top_sales_no_discount.index, hue=top_sales_no_discount.index, palette=”cubehelix”, legend=False, ax=axes[0]) axes[0].set_title(“Top Selling Products Without Discount”) axes[0].set_xlabel(“Sales”) axes[0].set_ylabel(“Product”) sns.barplot(x=low_margin.values, y=low_margin.index, hue=low_margin.index, palette=”coolwarm”, legend=False, ax=axes[1]) axes[1].set_title(“High Discount, Low/Negative Profit”) axes[1].set_xlabel(“Profit”) axes[1].set_ylabel(“Product”) plt.tight_layout() plt.show() 4. Market Segmentation Lastly, we also want to look at the market segmentation and we can segment it by 4 parts: Quantity sold by region Profit by region Shipping cost by region Discount by region quantity_by_region = df.groupby(‘Region’)[‘Quantity’].sum().sort_values(ascending=False) shipping_by_region = df.groupby(‘Region’)[‘Shipping_Cost’].mean().sort_values(ascending=False) discount_by_region = df.groupby(‘Region’)[‘Discount’].mean().sort_values(ascending=False) fig, axes = plt.subplots(2, 2, figsize=(18, 12)) # Plot 1: Quantity Sold by Region sns.barplot(x=quantity_by_region.values, y=quantity_by_region.index, hue=quantity_by_region.index, palette=”crest”, legend=False, ax=axes[0, 0]) axes[0, 0].set_title(“Quantity Sold by Region”) axes[0, 0].set_xlabel(“Quantity”) axes[0, 0].set_ylabel(“Region”) # Plot 2: Profit by Region sns.barplot(x=region_profit.values, y=region_profit.index, hue=region_profit.index, palette=”flare”, legend=False, ax=axes[0, 1]) axes[0, 1].set_title(“Profit by Region”) axes[0, 1].set_xlabel(“Profit”) axes[0, 1].set_ylabel(“Region”) # Plot 3: Shipping Cost by Region sns.barplot(x=shipping_by_region.values, y=shipping_by_region.index, hue=shipping_by_region.index, palette=”rocket”, legend=False, ax=axes[1, 0]) axes[1, 0].set_title(“Shipping Cost by Region”) axes[1, 0].set_xlabel(“Shipping Cost”) axes[1, 0].set_ylabel(“Region”) # Plot 4: Discount by Region sns.barplot(x=discount_by_region.values, y=discount_by_region.index, hue=discount_by_region.index, palette=”ch:start=.2,rot=-.3″, legend=False, ax=axes[1, 1]) axes[1, 1].set_title(“Average Discount by Region”) axes[1, 1].set_xlabel(“Discount”) axes[1, 1].set_ylabel(“Region”) plt.tight_layout() plt.show() Insights The problem statement was how can we grow our business. We focused on increasing our profits and reducing our discounts. So what insights can we draw from the visualizations above? Some specific smartphones are one of the most profitable products for our business, so I would focus on if the demand is met and how can we improve the customer experience for that domain. Africa and Central Asia stores are facing overall losses. Therefore, we need to consider reducing the scale of the store. More information is needed in this regard. The second point is validated when we also look at the average discount in Africa and Asia – the discount is one of the highest while profit margins are negative. 
These regions, therefore, are contributing negatively to our overall business. Western Europe seems to be our biggest market, with higher demand and higher profit margins. So, we should be doubling down here to grow our business. Conclusion In this tutorial, we developed an end-to-end BI solution for a superstore. The tutorial dived into problem-solving and critical thinking, followed by detailed analysis and actionable insights. This is close to a real-world application of how a Data/Business Analyst would approach a business problem. We also saw that in order to have a centralized data repository, we can make use of GridDB. With GridDB's Python client, the integration of pandas for analysis seemed

More
Building a Stock Market Chatbot Application Using LangGraph and GridDB

With the advent of Large Language Models (LLMs), chatbot applications have become increasingly common, enabling more natural and intelligent interactions with data. In this article, you will see how to build a stock market chatbot using LangGraph, OpenAI API, and GridDB cloud. We will retrieve historical Apple stock price data from Yahoo Finance using the yfinance library, insert it into a GridDB container, and then query it using a chatbot built with LangGraph that utilizes the OpenAI GPT -4 model. GridDB is a high-performance time-series database designed for massive real-time workloads. Its support for structured containers, built-in compression, and lightning-fast reads and writes makes it ideal for storing and querying time series data such as stock market prices. Installing and Importing Required Libraries !pip install -q yfinance !pip install langchain !pip install langchain-core !pip install langchain-community !pip install langgraph !pip install langchain_huggingface !pip install tabulate !pip uninstall -y pydantic !pip install –no-cache-dir “pydantic>=2.11,<3” import yfinance as yf import pandas as pd import json import datetime as dt import base64 import requests import numpy as np from langchain_core.prompts import ChatPromptTemplate from langchain_openai import ChatOpenAI from langchain_core.output_parsers import StrOutputParser from langgraph.graph import START, END, StateGraph from langchain_core.messages import HumanMessage from langgraph.checkpoint.memory import MemorySaver from langchain_experimental.agents import create_pandas_dataframe_agent from langchain_openai import OpenAI from langchain.agents.agent_types import AgentType from typing_extensions import List, TypedDict from pydantic import BaseModel, Field from IPython.display import Image, display Inserting and Retrieving Stock Market Data From GridDB We will first import data from Yahoo Finance into a Python application. Next, we will insert this data into a GridDB container and will retrieve it. Importing Data from Yahoo Finance The yfinance Python library allows you to import data from Yahoo Finance. You need to pass the ticker name, as well as the start and end dates, for the data you want to download. The following script downloads the Apple stock price data for the year 2024. import yfinance as yf import pandas as pd ticker = “AAPL” start_date = “2024-01-01” end_date = “2024-12-31” dataset = yf.download(ticker, start=start_date, end=end_date, auto_adjust=False) # ─────────────────────────────────────────────────────────────── # 1. FLATTEN: keep the level that holds ‘Close’, ‘High’, … # ─────────────────────────────────────────────────────────────── if isinstance(dataset.columns, pd.MultiIndex): # find the level index where ‘Close’ lives for lvl in range(dataset.columns.nlevels): level_vals = dataset.columns.get_level_values(lvl) if ‘Close’ in level_vals: dataset.columns = level_vals # keep that level break else: # already flat – nothing to do pass # ─────────────────────────────────────────────────────────────── # 2. 
Select OHLCV, move index to ‘Date’ # ─────────────────────────────────────────────────────────────── dataset = dataset[[‘Close’, ‘High’, ‘Low’, ‘Open’, ‘Volume’]] dataset = dataset.reset_index().rename(columns={‘index’: ‘Date’}) dataset[‘Date’] = pd.to_datetime(dataset[‘Date’]) # optional: reorder columns dataset = dataset[[‘Date’, ‘Close’, ‘High’, ‘Low’, ‘Open’, ‘Volume’]] dataset.columns.name = None dataset.head() Output: The above output indicates that the dataset comprises the daily closing, opening, high, low, and volume prices for Apple stock. In the section, you will see how to insert this data into a GridDB cloud container. Establishing a Connection with GridDB Cloud After you create your GridDB cloud account and complete configuration settings, you can run the following script to see if you can access your database within a Python application. username = “your_user_name” password = “your_password” base_url = “your_griddb_host_url” url = f”{base_url}/checkConnection” credentials = f”{username}:{password}” encoded_credentials = base64.b64encode(credentials.encode()).decode() headers = { ‘Content-Type’: ‘application/json’, # Added this header to specify JSON content ‘Authorization’: f’Basic {encoded_credentials}’, ‘User-Agent’: ‘PostmanRuntime/7.29.0’ } response = requests.get(url, headers=headers) print(response.status_code) print(response.text) Output: 200 The above output indicates that you have successfully connected with your GridDB cloud host. Creating a Container for Inserting Stock Market Data in GridDB Cloud Next, we will insert the Yahoo Finance into GridDB. To do so, we will add another column, SerialNo which contains unique keys for each data row, as GridDB expects a unique key column in the dataset. Next, we will map Pandas dataframe column types to Gridb data types. dataset.insert(0, “SerialNo”, dataset.index + 1) dataset[‘Date’] = pd.to_datetime(dataset[‘Date’]).dt.strftime(‘%Y-%m-%d’) # “2024-01-02” dataset.columns.name = None # Mapping pandas dtypes to GridDB types type_mapping = { “int64”: “LONG”, “float64”: “DOUBLE”, “bool”: “BOOL”, ‘datetime64’: “TIMESTAMP”, “object”: “STRING”, “category”: “STRING”, } # Generate the columns part of the payload dynamically columns = [] for col, dtype in dataset.dtypes.items(): griddb_type = type_mapping.get(str(dtype), “STRING”) # Default to STRING if unknown columns.append({ “name”: col, “type”: griddb_type }) columns Output: [{‘name’: ‘SerialNo’, ‘type’: ‘LONG’}, {‘name’: ‘Date’, ‘type’: ‘STRING’}, {‘name’: ‘Close’, ‘type’: ‘DOUBLE’}, {‘name’: ‘High’, ‘type’: ‘DOUBLE’}, {‘name’: ‘Low’, ‘type’: ‘DOUBLE’}, {‘name’: ‘Open’, ‘type’: ‘DOUBLE’}, {‘name’: ‘Volume’, ‘type’: ‘LONG’}] The above output displays the dataset column names and their corresponding GridDB-compliant data types. The next step is to create a GridDB container. To do so, you need to pass the container name, container type, and a list of column names and their data types. 
url = f”{base_url}/containers” container_name = “stock_db” # Create the payload for the POST request payload = json.dumps({ “container_name”: container_name, “container_type”: “COLLECTION”, “rowkey”: True, # Assuming the first column as rowkey “columns”: columns }) # Make the POST request to create the container response = requests.post(url, headers=headers, data=payload) # Print the response print(f”Status Code: {response.status_code}”) Adding Stock Data to GridbDB Cloud Container Once you have created a container, you must convert the data from your Pandas dataframe into the JSON format and call a put request to insert data into GridDB. url = f”{base_url}/containers/{container_name}/rows” # Convert dataset to list of lists (row-wise) with proper formatting def format_row(row): formatted = [] for item in row: if pd.isna(item): formatted.append(None) # Convert NaN to None elif isinstance(item, bool): formatted.append(str(item).lower()) # Convert True/False to true/false elif isinstance(item, (int, float)): formatted.append(item) # Keep integers and floats as they are else: formatted.append(str(item)) # Convert other types to string return formatted # Prepare rows with correct formatting rows = [format_row(row) for row in dataset.values.tolist()] # Create payload as a JSON string payload = json.dumps(rows) # Make the PUT request to add the rows to the container response = requests.put(url, headers=headers, data=payload) # Print the response print(f”Status Code: {response.status_code}”) print(f”Response Text: {response.text}”) Output: Status Code: 200 Response Text: {“count”:251} If you see the above response, you have successfully inserted the data. Retrieving Data from GridDB After inserting the data, you can perform a variety of operations on the dataset. Let’s see how to import data from a GridDB container into a Pandas dataframe. container_name = “stock_db” url = f”{base_url}/containers/{container_name}/rows” # Define the payload for the query payload = json.dumps({ “offset”: 0, # Start from the first row “limit”: 10000, # Limit the number of rows returned “condition”: “”, # No filtering condition (you can customize it) “sort”: “” # No sorting (you can customize it) }) # Make the POST request to read data from the container response = requests.post(url, headers=headers, data=payload) # Check response status and print output print(f”Status Code: {response.status_code}”) if response.status_code == 200: try: data = response.json() print(“Data retrieved successfully!”) # Convert the response to a DataFrame rows = data.get(“rows”, []) stock_dataset = pd.DataFrame(rows, columns=[col for col in dataset.columns]) except json.JSONDecodeError: print(“Error: Failed to decode JSON response.”) else: print(f”Error: Failed to query data from the container. Response: {response.text}”) print(stock_dataset.shape) stock_dataset.head() Output: The above output shows the data retrieved from the GridDB container. We store the data in a Pandas dataframe. You can store the data in any other format if you want. Once you have the data, you can create a variety of AI and data science applications. Creating a Stock Market Chatbot Using GridDB Data In this next section, you will see how to create a simple chatbot in LangGraph framework, which calls the OpenAI API to answer your questions about the Apple stock price you just retrieved from the GridDB. Creating a Graph in LangGraph To create a Graph in LangGraph, you need to define its state. 
A graph's state contains attributes that are shared between multiple graph nodes. Since we only need to store questions and answers, we create the following graph state. class State(TypedDict): question: str answer: str We will use the create_pandas_dataframe_agent from LangChain to answer our questions since we retrieved data from GridDB into a Pandas dataframe. We will create the agent object and call it inside the run_llm() function we define. We will use this function in our LangGraph node. api_key = "YOUR_OPENAI_API_KEY" llm = ChatOpenAI(model = 'gpt-4o', api_key = api_key) agent = create_pandas_dataframe_agent(llm, stock_dataset, verbose=True, agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION, allow_dangerous_code=True) def run_llm(state: State): question = state['question'] response = agent.invoke(question) return {'answer': response['output']} Finally, we define the graph for our chatbot. The graph consists of only one node, ask_question, which calls the run_llm() function. Inside that function, we invoke the agent created with create_pandas_dataframe_agent(), which answers questions about the dataset. graph_builder = StateGraph(State) graph_builder.add_node("ask_question", run_llm) graph_builder.add_edge(START, "ask_question") graph_builder.add_edge("ask_question", END) graph = graph_builder.compile() display(Image(graph.get_graph().draw_mermaid_png())) Output: The above output shows the flow of our graph. Asking Questions Let's test our chatbot by asking some questions. We will first ask our chatbot about the month that had the highest average opening price, and also the month where people made the most profit in day trading. question = [HumanMessage(content="Which month had the highest average opening stock price? And what is the month where people made most profit in day trading?")] result = graph.invoke({"question": question}) print(result['answer']) Output: The output above shows the chatbot's response. That is correct; I verified it manually using a Python script. Let's ask it to be more creative and see if it finds any interesting patterns in the dataset. question = [HumanMessage(content="Do you find any interesting patterns in the dataset?")] result = graph.invoke({"question": question}) print(result['answer']) Output: The above output shows the first part of the reply. You can see that the chatbot is intelligent enough to draw a plot of the closing prices to identify interesting patterns, and its response highlights some interesting observations from the dataset. Conclusion The article demonstrates how to create an OpenAI API-based chatbot that answers questions related to data retrieved from GridDB. If you have any questions or need help with GridDB Cloud, you can leave your query on Stack Overflow using the griddb tag. Our team will be happy to answer it. For the complete code of this article, visit my GridDB Blogs GitHub

More
New Features for the GridDB Cloud CLI Tool

The GridDB Cloud CLI Tool aims to make routine maintenance of checking of your GridDB Cloud instance a much easier endeavor. When we first introduced it, it was able to do your basic CRUD commands, but it lacked the ability to read from the filesystem to create containers/tables and to push data to those containers. Although maybe a subtle difference, essentially before, whenever the tool was run to CREATE, you needed manual intervention (known as interactive mode) to use the tool. I’m sure you know where I am going with this — in this latest release, we have added some functionality revolved around reading JSON files from the filesystem which can help certain workflow, such as to help automate some testing on your GridDB Cloud, for instance. Being able to read from JSON files to create tables also means we are able to migrate from GridDB CE to GridDB Cloud as discussed in this blog: https://griddb.net/en/blog/griddb-cloud-on-azure-marketplace-how-to-migrate-3-ways/. So for this article, since there’s not much in the way of showcasing features, I will walk through the technical updates made to the tool, as well as a simple demo of how this tool can be useful. Creating Container from JSON File First off, I changed the way GridDB Cloud CLI Tool handles the create command as explained above. From now on, when you use Create, if you want to create a new container by filling out CLI prompts, you can use the -i flag to indicate –interactive mode. So if you run: griddb-cloud-cli create -i, it will begin the process of asking a series of questions until you get the container you want. And now if you run griddb-cloud-cli create the tool will expect exactly one argument, a json file. The JSON file format is modeled to be the same as exported by the GridDB CE tool, here’s an example: { “version”:”5.6.0″, “database”:”public”, “container”:”device”, “containerType”:”TIME_SERIES”, “containerFileType”:”csv”, “containerFile”:[ “public.device_2020-07-11_2020-07-12.csv”, “public.device_2020-07-12_2020-07-13.csv”, “public.device_2020-07-13_2020-07-14.csv”, “public.device_2020-07-14_2020-07-15.csv”, “public.device_2020-07-15_2020-07-16.csv”, “public.device_2020-07-16_2020-07-17.csv”, “public.device_2020-07-17_2020-07-18.csv”, “public.device_2020-07-18_2020-07-19.csv”, “public.device_2020-07-19_2020-07-20.csv” ], “partitionNo”:14, “columnSet”:[ { “columnName”:”ts”, “type”:”timestamp”, “notNull”:true }, { “columnName”:”co”, “type”:”double”, “notNull”:false }, { “columnName”:”humidity”, “type”:”double”, “notNull”:false }, { “columnName”:”light”, “type”:”boolean”, “notNull”:false }, { “columnName”:”lpg”, “type”:”double”, “notNull”:false }, { “columnName”:”motion”, “type”:”boolean”, “notNull”:false }, { “columnName”:”smoke”, “type”:”double”, “notNull”:false }, { “columnName”:”temp”, “type”:”double”, “notNull”:false } ], “rowKeySet”:[ “ts” ], “timeSeriesProperties”:{ “compressionMethod”:”NO”, “compressionWindowSize”:-1, “compressionWindowSizeUnit”:”null”, “expirationDivisionCount”:-1, “rowExpirationElapsedTime”:-1, “rowExpirationTimeUnit”:”null” }, “compressionInfoSet”:[ ], “timeIntervalInfo”:[ { “containerFile”:”public.device_2020-07-11_2020-07-12.csv”, “boundaryValue”:”2020-07-11T17:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-12_2020-07-13.csv”, “boundaryValue”:”2020-07-12T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-13_2020-07-14.csv”, “boundaryValue”:”2020-07-13T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-14_2020-07-15.csv”, 
“boundaryValue”:”2020-07-14T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-15_2020-07-16.csv”, “boundaryValue”:”2020-07-15T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-16_2020-07-17.csv”, “boundaryValue”:”2020-07-16T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-17_2020-07-18.csv”, “boundaryValue”:”2020-07-17T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-18_2020-07-19.csv”, “boundaryValue”:”2020-07-18T00:00:00.000-0700″ }, { “containerFile”:”public.device_2020-07-19_2020-07-20.csv”, “boundaryValue”:”2020-07-19T00:00:00.000-0700″ } ] } Now, obviously, if you were just writing your own json files, you wouldn’t need a lot of the information from the migration tool, so instead you can create something with just the bare minimum. { “database”:”public”, “container”:”device10″, “containerType”:”TIME_SERIES”, “columnSet”:[ { “columnName”:”ts”, “type”:”timestamp”, “notNull”:true }, { “columnName”:”co”, “type”:”double”, “notNull”:false }, { “columnName”:”humidity”, “type”:”double”, “notNull”:false }, { “columnName”:”light”, “type”:”boolean”, “notNull”:false }, { “columnName”:”lpg”, “type”:”double”, “notNull”:false }, { “columnName”:”motion”, “type”:”boolean”, “notNull”:false }, { “columnName”:”smoke”, “type”:”double”, “notNull”:false }, { “columnName”:”temp”, “type”:”double”, “notNull”:false } ], “rowKeySet”:[ “ts” ] } Small note, you can also use the -f flag to avoid the confirmation of the container you are about to create, again avoiding manual user input when committing an action. Technical Details of Implementing Let’s take a quick look at the underlying code of writing a program which can take the JSON file from the filesystem and push it to GridDB Cloud to be made into a new container. First, let’s look at interpreting the file after being read from the filesystem: func ParseJson(jsonName string) (cmd.ContainerInfo, []string) { filename := jsonName properties, err := os.ReadFile(filename) if err != nil { log.Fatal(err) } var exportProperties ExportProperties err = json.Unmarshal(properties, &exportProperties) if err != nil { log.Fatal(err) } //fmt.Println(exportProperties) var conInfo cmd.ContainerInfo conInfo.ContainerName = exportProperties.Container conInfo.ContainerType = exportProperties.ContainerType conInfo.RowKey = len(exportProperties.RowKeySet) > 0 cols := transformToConInfoCols(exportProperties.ColumnSet) conInfo.Columns = cols return conInfo, exportProperties.ContainerFile } Here we read the file contents, create a data structure of type ExportProperties which was declared earlier (and shown below). We unmarshal (think of mapping) the json file to match our struct so that our program knows which values correspond to which keys. From there, we do some other processing to create the data object which will be sent to GriddB Cloud via an HTTP Web Request. // data struct to unmarshal user json file type ExportProperties struct { Version string `json:”version,omitempty”` Database string `json:”database,omitempty”` Container string `json:”container”` ContainerType string `json:”containerType,omitempty”` ContainerFileType string `json:”containerFileType,omitempty”` ContainerFile ContainerFile `json:”containerFile”` ColumnSet []ColumnSet `json:”columnSet”` RowKeySet []string `json:”rowKeySet”` } Here we define the entire struct which may be read from either the user-made JSON, or the GridDB CE migration tool-generated JSON (notice the keys which are omitted in the user JSONs have a special designation of omitempty). 
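To make the omitempty behaviour concrete, here is a small self-contained sketch that uses a trimmed-down copy of the struct (not the tool's full definition) to show that the minimal user-written JSON above decodes cleanly even though most of the migration-tool fields are absent:

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Trimmed-down illustration only; the real ExportProperties in the tool
// has more fields (version, containerFile, rowKeySet, ...).
type columnSet struct {
	ColumnName string `json:"columnName"`
	Type       string `json:"type"`
	NotNull    bool   `json:"notNull"`
}

type exportProps struct {
	Database      string      `json:"database,omitempty"`
	Container     string      `json:"container"`
	ContainerType string      `json:"containerType,omitempty"`
	ColumnSet     []columnSet `json:"columnSet"`
}

func main() {
	minimal := []byte(`{
		"container": "device10",
		"containerType": "TIME_SERIES",
		"columnSet": [
			{"columnName": "ts", "type": "timestamp", "notNull": true},
			{"columnName": "co", "type": "double", "notNull": false}
		]
	}`)

	var props exportProps
	if err := json.Unmarshal(minimal, &props); err != nil {
		log.Fatal(err)
	}
	// Keys missing from the JSON (e.g. "database") are simply left at their zero value.
	fmt.Printf("%s (%s) with %d columns\n", props.Container, props.ContainerType, len(props.ColumnSet))
}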
One last thing I’ll point out is that we needed to use a custom unmarshaller for this specific struct (ContainerFile) because when the GridDB export tool outputs the meta JSON file, the containerFile value is sometimes a single string, and sometimes it’s a slice of strings. Here’s the custom unmarshaller: // custom JSON unmarshaller for the case where sometimes the value is a slice // and sometimes it’s just a singular string func (c *ContainerFile) UnmarshalJSON(data []byte) error { var nums any err := json.Unmarshal(data, &nums) if err != nil { return err } items := reflect.ValueOf(nums) switch items.Kind() { case reflect.String: *c = append(*c, items.String()) case reflect.Slice: *c = make(ContainerFile, 0, items.Len()) for i := 0; i < items.Len(); i++ { item := items.Index(i) switch item.Kind() { case reflect.String: *c = append(*c, item.String()) case reflect.Interface: *c = append(*c, item.Interface().(string)) } } } return nil } Examples Of Using New Features As explained above, you can now use the tool as part of your workflow and bash scripting. So you can, for example, run a script with cron to track your Downloads folder list of files and general file size. Let's take a look at a working example. Bash Script to Track Downloads Directory As a silly example of how you can use the tool in your bash scripts, let's create a TIME_SERIES container which will keep track of the status of your downloads directory. We will create a JSON file with the container's schema, then we will use a simple bash script to push the current datetime and a couple of data points to our table. We then schedule a cron to run the script every 1 hour so that we have up to date data. First, the JSON file: { "database": "public", "container": "download_data", "containerType": "TIME_SERIES", "columnSet": [ { "columnName": "ts", "type": "timestamp", "notNull": true }, { "columnName": "file_count", "type": "integer", "notNull": false }, { "columnName": "total_size", "type": "string", "notNull": false } ] } And then our script, which will create the table (I realize it's not the best showcase as we are using just one table, but you could also create a situation where you push logs to a table and create new daily tables, kind of like fluentd), and then push up to date data of your directory: #!/bin/bash DOWNLOADS_DIR="/Users/israelimru/Downloads" FILE_COUNT=$(ls -1A "$DOWNLOADS_DIR" | wc -l | xargs) TOTAL_SIZE=$(du -sh "$DOWNLOADS_DIR" | awk '{print $1}') LOG_DATA="NOW(),$FILE_COUNT,$TOTAL_SIZE" griddb-cloud-cli create /Users/israelimru/download_table.json -f griddb-cloud-cli put -n download_data -v $LOG_DATA echo "Log complete." NOTE: It's crucial to use the -f flag when calling the create command because it won't ask for a prompt, it will simply create the table for you (and simply fail if the table already exists, which is fine!) Here you can see our LOG_DATA uses the NOW() variable; this will be auto translated by the tool to be a current timestamp in the proper format. And then we just grab the relevant data points using simple unix commands and push that data to the Cloud. Then set up the cronjob: 0 * * * * /your/script/testing.sh Conclusion Being able to use the GridDB Cloud CLI Tool without user intervention opens up some new possibilities and we hope these changes can be useful for you and your team. To grab a copy of the cloud tool, you can check here: https://github.com/Imisrael/griddb-cloud-cli. 
The source code for this blog is simply the bash script in the section above, so feel free to copy and use it as you

More
Generate Podcast from PDF using AI

Why This Project? Turn your complex PDFs into clear, natural-sounding AI-powered podcasts instantly. Ideal for busy professionals, auditory learners, and passive learning workflows. Core Benefits: Hands-free learning, ideal during commutes or chores. Easily transforms technical PDFs into engaging audio. Drastically reduces content-creation time. Problem & Why This Matters Converting PDFs manually is tedious: Difficult text extraction from PDFs. Complex manual summarization and audio recording overhead. Messy PDF management/storage. Your AI-powered podcast solution automates every step—solving these clearly. Introducing the AI-powered PDF-to-Podcast Generation System The diagram above illustrates the simplified workflow of the AI-powered PDF-to-podcast system: User Uploads PDF: The user submits a PDF document to the platform. OCR Text Extraction (Mistral AI OCR): Mistral AI OCR accurately extracts the text content from the uploaded PDF. Summarize Content (OpenAI): The extracted text is summarized by OpenAI to simplify complex content for easier listening. Convert Text to Podcast (OpenAI TTS): OpenAI’s Text-to-Speech converts the summarized text into natural, engaging audio. Store Data (GridDB Cloud): The summarized text and associated data are efficiently stored in the GridDB Cloud for future retrieval. Podcast Playback: Users access the simplified, engaging podcast directly for convenient listening. Run the Project To run this project, you should clone the apps directory from this repository. git clone https://github.com/junwatu/griddb-podcast-creator.git cd griddb-podcast-creator cd apps npm install Create a .env file for the project credentials with these keys: MISTRAL_API_KEY= OPENAI_API_KEY= GRIDDB_WEBAPI_URL= GRIDDB_PASSWORD= GRIDDB_USERNAME= You need the Mistral API key for OCR functionality, the OpenAI key for the text-to-speech (TTS) conversion, and GridDB keys for the data storage. Please, look in this section on how to get those keys. Run the project using this command: npm run dev Go to the default app URL http://localhost:3000 Browse for the PDF file and click the Convert to Podcast button to generate podcast. Prerequisites Node.js You need Node.js installed because this project uses Next.js. Install the Node LTS version from here. Mistral OCR API Setup A Mistral API key is needed to use the OCR functionality. Create the API key here. OpenAI API Setup Create the OpenAI API key here. You may need create a project and enable few models. In this project, we will use two AI models from OpenAI: gpt-4o to create audio script. gpt-4o-mini-tts for generating audio from the script. GridDB Cloud Setup The GridDB Cloud offers a free plan tier and is officially available worldwide. You need these GridDB environment variables in the .env file: GRIDDB_WEBAPI_URL= GRIDDB_USERNAME= GRIDDB_PASSWORD= Sign Up for GridDB Cloud Free Plan If you would like to sign up for a GridDB Cloud Free instance, you can do so in the following link: https://form.ict-toshiba.jp/download_form_griddb_cloud_freeplan_e. After successfully signing up, you will receive a free instance along with the necessary details to access the GridDB Cloud Management GUI, including the GridDB Cloud Portal URL, Contract ID, Login, and Password. GridDB WebAPI URL Go to the GridDB Cloud Portal and copy the WebAPI URL from the Clusters section. It should look like this: GridDB Username and Password Go to the GridDB Users section of the GridDB Cloud portal and create or copy the username for GRIDDB_USERNAME. 
The password is set when the user is created for the first time, use this as the GRIDDB_PASSWORD. For more details, to get started with GridDB Cloud, please follow this quick start guide. IP Whitelist When running this project, please ensure that the IP address where the project is running is whitelisted. Failure to do so will result in a 403 status code or forbidden access. You can use a website like What Is My IP Address to find your public IP address. To whitelist the IP, go to the GridDB Cloud Admin and navigate to the Network Access menu. Building the Podcast Generator Developing the Next.js Web Interface & API The main important code is the API route that handles PDF upload and then processes it. This is the snippet code from the route.ts file in the apps/app/api/upload directory: export async function POST(request: NextRequest) { try { const formData = await request.formData(); const file = formData.get(‘file’) as File; if (!file) { return NextResponse.json( { error: ‘No file uploaded’ }, { status: 400 } ); } if (file.type !== ‘application/pdf’) { return NextResponse.json( { error: ‘Invalid file type. Please upload a PDF file’ }, { status: 400 } ); } const maxSize = 10 * 1024 * 1024; // 10MB in bytes if (file.size > maxSize) { return NextResponse.json( { error: ‘File size too large. Maximum size is 10MB’ }, { status: 400 } ); } const bytes = await file.arrayBuffer(); const buffer = Buffer.from(bytes); const timestamp = Date.now(); const randomString = Math.random().toString(36).substring(7); const tempFilename = `upload_${timestamp}_${randomString}.pdf`; const tempFilePath = join(os.tmpdir(), tempFilename); await writeFile(tempFilePath, buffer); // Extract data from PDF const { content: pdfContent, response: ocrResponse } = await ocrService.processFile(tempFilePath, file.name); // Generate script for the audio from the extracted data const audioScript = await openaiService.generatePodcastScript(pdfContent); // Generate audio from the script const audioFiles = await generatePodcastAudio(audioScript, process.env.OPENAI_API_KEY || ”, { voice: ‘verse’, outputDir: audioDir, instructions: instructions, outputFormat: ‘mp3’, }); const cleanedAudioFiles = cleanAudioPaths(audioFiles); // Save the data into GridDB database const podcastData: GridDBData = { id: generateRandomID(), audioFiles: JSON.stringify(cleanedAudioFiles), audioScript: JSON.stringify(audioScript), // @ts-ignore ocrResponse: JSON.stringify(ocrResponse), }; const result = await dbClient.insertData({ data: podcastData }); return NextResponse.json({ message: ‘File uploaded successfully’, fileName: file.name, fileSize: file.size, tempFilePath: tempFilePath, ocrResponse: ocrResponse, audioFiles: audioFiles, audioScript: audioScript, }); } catch (error) { console.error(‘Error uploading file:’, error); return NextResponse.json( { error: ‘Failed to upload file’ }, { status: 500 } ); } } Let’s look at the code in detail: 1. OCR Extraction of PDF Content /** Extracted data from PDF */ const { content: pdfContent, response: ocrResponse } = await ocrService.processFile(tempFilePath, file.name); Function: The uploaded PDF file is sent to an OCR (Optical Character Recognition) service for processing. Outcome: Mistral OCR extracts textual content from the PDF and provides a full response (ocrResponse) and the PDF content (pdfContent). 2. 
Generating the Podcast Script /** Generate script for the audio from the extracted data */ const audioScript = await openaiService.generatePodcastScript(pdfContent); Function: Pass the extracted textual content (pdfContent) to an OpenAI-powered service. Outcome: A structured podcast script (audioScript) suitable for text-to-speech conversion. 3. Creating Audio from the Generated Script /** Generate audio from the script */ const audioFiles = await generatePodcastAudio(audioScript, process.env.OPENAI_API_KEY || ”, { voice: ‘verse’, outputDir: audioDir, instructions: instructions, outputFormat: ‘mp3′, }); Function: Convert the generated podcast script into audio form using an OpenAI text-to-speech API. Parameters: voice: Determines vocal style (e.g., ‘verse’). outputDir: The destination directory for audio files. In this project, it is set to the public/audio directory. instructions: Extra refinement or instructional parameters for audio quality. outputFormat: Audio format set as ‘mp3’. Outcome: The data location of audio segments (audioFiles). 4. Saving Data to GridDB Database /** Save the data into GridDB database */ const podcastData: GridDBData = { id: generateRandomID(), audioFiles: JSON.stringify(cleanedAudioFiles), audioScript: JSON.stringify(audioScript), // ts ignore // @ts-ignore ocrResponse: JSON.stringify(ocrResponse) } const result = await dbClient.insertData({ data: podcastData }); Function: Collect and organize processed data (ID, audio files, podcast script, OCR response) into a structured object (podcastData) with appropriate type conversions (JSON serialization) to match the GridDB data schema. Outcome: Persistently stores generated audio metadata, associated scripts, and OCR extraction data into the GridDB database, providing future retrieval and management capabilities. In summary, the main functionality follows a clear sequence from our system diagram before. Integrating OCR using Mistral AI Mistral OCR is an Optical Character Recognition API that sets a new standard in document understanding. Unlike other models, Mistral OCR comprehends each element of documents—media, text, tables, equations—with unprecedented accuracy and cognition. It takes images and PDFs as input and extracts content in an ordered interleaved text and images. In this project, we will use Mistral OCR to extract text from PDFs. The process involves: 1. Uploading the PDF file to Mistral. const uploaded_pdf = await this.client.files.upload({ file: { fileName: fileName, content: file, }, purpose: “ocr”, }); 2. Retrieving the signed URL for the uploaded PDF. const signedUrl = await this.client.files.getSignedUrl({ fileId: uploaded_pdf.id, }); 3. Sending the signed URL to Mistral OCR for text extraction. const ocrResponse = await this.client.ocr.process({ model: “mistral-ocr-latest”, document: { type: “document_url”, documentUrl: signedUrl.url, } }); This final step will yield a complete OCR data response from Mistral. Get PDF key points and summarization using OpenAI We won’t convert all the content of the PDF extraction text because it will be to long. The best way is to summarize and get the key points of the extraction data. For this task, we will use the gpt-4o model. This is the system prompt to extract meaningful content from PDF’s extracted data: Create a 5-minute podcast episode script in a conversational style, using the content provided.\n\nInclude the following elements:\n\n- **Introduction**: Engage your audience with an intriguing opening statement related to the topic. 
Capture their attention immediately.\n\n- **Main Talking Points**: Develop 3-4 main sections discussing the central ideas or arguments. Use relatable examples and personal stories for better understanding. Maintain a conversational tone, as if you are speaking directly to the listener. Ensure natural transitions between sections to keep the flow.\n\n- **Conclusion**: Summarize the key takeaways in a concise manner, making sure to leave a lasting impression.\n\n- **Call to Action**: End with a clear and compelling call to action encouraging listeners to engage further or reflect on the topic.\n\n# Output Format\n\nWrite the script in a conversational and engaging narrative suitable for a podcast. Each section should integrate seamlessly with transitions, emulate a direct speaking style to engage the listener, and reinforce the message.\n\n# Examples\n\n**Introduction**: \”Welcome to [Podcast Name]. Today, we’re diving into [Topic]. Have you ever wondered…?\”\n\n**Main Talking Points**:\n\n1. \”Let’s start with [Main Idea]. It’s like when…\”\n2. \”Moving on to [Next Idea], consider how…\”\n3. \”Finally, when we talk about [Final Idea], there’s a story about…\”\n\n**Conclusion**: \”So, as we’ve learned today, [Key Takeaway 1], [Key Takeaway 2]…\”\n\n**Call to Action**: \”Think about how you can [Action]. Join us next time when we explore…\”\n\n# Notes\n\n- The script should be written to cater both to novices and those with some prior knowledge.\n- Ensure it resonates intellectually and stimulates curiosity among listeners.\n- Use transition words to guide listeners smoothly from one idea to the next. To keep the response consistent, we can use the schema feature. So, basically, we can force the AI model response to match a predefined data structure or schema: { “introduction”: “Welcome to our podcast! Today, we’re exploring how AI can revolutionize the way we consume content by transforming PDFs into engaging audio podcasts. Have you ever wished you could listen to your documents instead of reading them? Let’s dive in!”, “main_talking_points”: [ { “title”: “The Challenges of Manual PDF-to-Podcast Conversion”, “content”: “Manually converting PDFs into podcasts is a tedious process. It involves extracting text, summarizing complex content, and recording audio—all of which take significant time and effort. AI simplifies this by automating these steps, saving you hours of work.” }, { “title”: “How AI Simplifies the Process”, “content”: “AI tools like Mistral OCR and OpenAI TTS streamline the workflow. Mistral OCR extracts text from PDFs with high accuracy, while OpenAI’s models summarize and convert the text into natural-sounding audio. This ensures a seamless and efficient process.” }, { “title”: “The Role of GridDB in Managing Data”, “content”: “GridDB Cloud acts as a robust storage solution for parsed text and audio files. It ensures that your data is organized, easily retrievable, and ready for future use, making the entire system scalable and efficient.” } ], “conclusion”: “In summary, AI-powered tools are transforming the way we interact with content. By automating the conversion of PDFs into podcasts, we save time, enhance accessibility, and create a more engaging learning experience.”, “call_to_action”: “Think about how you can leverage this technology in your own projects. 
Visit our GitHub repository to get started, and don’t forget to share your feedback!” } The gpt-4o model will respond with data under these keys: introduction main_talking_points conclusion call_to_action With this format, it is easier to convert the text to audio for our podcast application. For more information about the data schema code, please look into the openai.ts file in the apps/app/lib folder. Generating Podcast using OpenAI TTS We will use the gpt-4o-mini-tts model from OpenAI to generate speech from text. This model is capable of controlling the voice of the generated audio with additional instructions. We will generate the audio in two passes based on the schema response from the OpenAI model. 1. Process introduction, conclusion, and call_to_action. This code will process the introduction, conclusion, and call to action into audio. const simpleKeys = ['introduction', 'conclusion', 'call_to_action'] as const; for (const key of simpleKeys) { try { const text = podcastData[key]; const fileName = `${key}.${outputFormat}`; const speechFile = path.join(outputDir, fileName); const response = await openai.audio.speech.create({ model, voice, instructions, input: text }); const buffer = Buffer.from(await response.arrayBuffer()); fs.writeFileSync(speechFile, buffer); audioFiles[key] = speechFile; } catch (error) { console.error(`Error processing ${key}:`, error); throw error; } } 2. Process main_talking_points This code will process the main content or talking points into audio. // Process main talking points separately if (Array.isArray(podcastData.main_talking_points)) { for (let i = 0; i < podcastData.main_talking_points.length; i++) { try { const point = podcastData.main_talking_points[i]; const text = point.content; const fileName = `talking_point_${i}.${outputFormat}`; const speechFile = path.join(outputDir, fileName); const response = await openai.audio.speech.create({ model, voice, instructions, input: text }); const buffer = Buffer.from(await response.arrayBuffer()); fs.writeFileSync(speechFile, buffer); audioFiles[`talking_point_${i}`] = speechFile; } catch (error) { console.error(`Error processing talking point ${i}:`, error); throw error; } } } Storing Data to GridDB Cloud The column data or schema for the GridDB database is simple: export interface GridDBData { id: string | number; ocrResponse: Blob; audioScript: string; audioFiles: string; } To save data to GridDB, this project uses the insertData function: async function insertData({ data, containerName = 'podcasts', }: { data: GridDBData; containerName?: string; }): Promise<unknown> { console.log(data); try { const row = [ parseInt(data.id.toString(), 10), data.ocrResponse, data.audioScript, data.audioFiles, ]; const path = `/containers/${containerName}/rows`; return await makeRequest(path, [row], 'PUT'); } catch (error) { if (error instanceof GridDBError) { throw error; } const errorMessage = error instanceof Error ? error.message : 'Unknown error'; throw new GridDBError(`Failed to insert data: ${errorMessage}`, undefined, undefined, error); } } The core code is essentially a single PUT operation on the REST path /containers/podcasts/rows, which makes GridDB straightforward to use in the cloud. The full source code for saving data into GridDB is in the apps/app/lib/griddb.ts file. This file is later used by the route.ts file as an API. User Interface The code for the user interface resides in a single file, page.tsx. It is developed using React and Shadcn components, featuring two primary tabs:
1. Upload & Convert PDF The user will browse and upload a PDF file. Once the file is selected, the user can click the Convert to Podcast button. This action will make the application process the PDF using OCR to extract text and then generate an audio podcast using AI-generated voices. You can test the application using the sample PDF files available in the pdfs directory. These files are sourced from arXiv and contain highly technical content, making them ideal for evaluating the application's capabilities. 2. Podcast Result After a successful conversion, the user can listen to the podcast in the Your Podcasts tab, using the audio player and section list to navigate through the content easily. Possible Enhancements Custom Voice Options: Give users the option of choosing different voices or accents. Podcast Hosting Integration: Connect your podcasts directly to platforms like Spotify, Apple Podcasts, RSS feeds, etc. Improved UI/UX: Give users better controls over file management and audio playback. Resources Project source code. GridDB Cloud Web API, Mistral OCR, OpenAI TTS.
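As a companion to the insertData function above, the Your Podcasts tab ultimately needs to read rows back out of GridDB Cloud. Below is a minimal sketch of such a read using the Web API rows endpoint; the request shape follows the row-acquisition calls used elsewhere in this post, while the environment variable names, query body, and column order are assumptions rather than the project's actual retrieval code.

```typescript
// Sketch: fetch stored podcast rows from GridDB Cloud via the Web API.
// Assumes GRIDDB_WEBAPI_URL, GRIDDB_USERNAME, and GRIDDB_PASSWORD are set,
// and that the 'podcasts' container uses the column order from insertData:
// [id, ocrResponse, audioScript, audioFiles].
type PodcastRow = [number, string, string, string];

async function fetchPodcasts(limit = 20): Promise<PodcastRow[]> {
  const auth = Buffer.from(
    `${process.env.GRIDDB_USERNAME}:${process.env.GRIDDB_PASSWORD}`
  ).toString('base64');

  const response = await fetch(
    `${process.env.GRIDDB_WEBAPI_URL}/containers/podcasts/rows`,
    {
      method: 'POST', // row acquisition takes an offset/limit body
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Basic ${auth}`,
      },
      body: JSON.stringify({ offset: 0, limit }),
    }
  );

  if (!response.ok) {
    throw new Error(`GridDB read failed with status ${response.status}`);
  }

  const payload = await response.json();
  return payload.rows as PodcastRow[]; // each row matches the container schema
}
```

A route such as GET /api/podcasts could wrap this helper and feed the audio player list in the Your Podcasts tab.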

More
GridDB Cloud on Azure Marketplace: How To Migrate (3 Ways!)

If you are thinking about switching to the GridDB Cloud Azure Marketplace instance, first, you can read about how to do that here: GridDB Cloud on Microsoft Azure Marketplace. Second, you may be worried about how to transfer your existing data from your GridDB Free Plan, from your local GridDB CE instance, or even from PostgreSQL. Here are the distinct sections: Migrating from GridDB Free Plan Migrating From PostgreSQL Migrating From GridDB CE In this blog, we will walk through the migration process of moving your data from a GridDB Free Plan, a third-party database (PostgreSQL in this case), and a local GridDB CE instance. The process is different for each one, so let’s go through them one by one. Migrating from GridDB Free Plan First of all, if you are unsure what the GridDB Free Plan is, you can look here: GridDB Cloud Quick Start Guide. This is by far the easiest method of conducting a full-scale migration. The high-level overview is that the TDSL (Toshiba Digital Solutions) support team will handle everything for you. TDSL Support When you sign up for the GridDB Pay As You Go plan, as part of the onboarding process, you will receive an email with the template you will need to use when contacting support for various functions, including data migration! So, grab your pertinent information (contract ID, GridDB ID, etc.) and the template and let’s send an email. Compose an email to tdsl-ms-support AT toshiba-sol.co.jp with the following template: Contract ID: [your id] GridDB ID: [your id] Name: Israel Imru E-mail: imru@fixstars.com Inquiry Details: I would like to migrate from my GridDB Free Plan Instance to my GridDB pay as you go plan Occurrence Date: — Collected Information: — The team will usually respond within one business day to confirm your operation and provide further instructions. In my case, they sent the following: Please perform the following operations in the management GUI of the source system. After completing the operations, inform us of the date and time when the operations were performed. 1. Log in to the management GUI. 2. From the menu on the left side of the screen, click [Query]. 3. Enter the following query in the [QUERY EDITOR]: SELECT 2012 4. Click the [Execute] button Best regards, Toshiba Managed Services Support Desk Once I ran the query they asked for, I clicked on query history, copied the timestamp, and sent that over to them. That was all they needed. Armed with this information, they told me to wait 1-2 business days and they would seamlessly migrate my instance, along with an estimated time slot for when the migration would be completed. Once it was done, all of my data, including the IP Whitelist and my Portal Users, was copied over to my pay as you go plan. Cool! Migrating from PostgreSQL There is no official way of conducting this sort of migration, so for now, we can try simply exporting our tables into CSV files and then importing those files individually into our Cloud instance. Luckily, with the GridDB Cloud CLI tool, this process is much easier than ever before. So let’s first export our data and go from there. Exporting PostgreSQL Data First, the dataset I’m working with here is simply dummy data I ingested using a Python script.
Here’s the script: import psycopg2 import psycopg2.extras from faker import Faker import random import time # — YOUR DATABASE CONNECTION DETAILS — # Replace with your actual database credentials DB_NAME = “template1” DB_USER = “postgres” DB_PASSWORD = “yourpassword” DB_HOST = “localhost” # Or your DB host DB_PORT = “5432” # Default PostgreSQL port # — DATA GENERATION SETTINGS — NUM_RECORDS = 50000 # Initialize Faker fake = Faker() # Generate a list of fake records print(f”Generating {NUM_RECORDS} fake records…”) records_to_insert = [] for _ in range(NUM_RECORDS): name = fake.catch_phrase() # Using a more specific Faker provider quantity = random.randint(1, 1000) price = round(random.uniform(0.50, 500.00), 2) records_to_insert.append((name, quantity, price)) print(“Finished generating records.”) # SQL statements create_table_query = “”” CREATE TABLE IF NOT EXISTS sample_data ( id SERIAL PRIMARY KEY, name VARCHAR(255) NOT NULL, quantity INTEGER, price REAL ); “”” # Using execute_batch is much more efficient for large inserts insert_query = “INSERT INTO sample_data (name, quantity, price) VALUES %s;” conn = None try: # Establish a connection to the database conn = psycopg2.connect( dbname=DB_NAME, user=DB_USER, password=DB_PASSWORD, host=DB_HOST, port=DB_PORT ) # Create a cursor cur = conn.cursor() # Create the table if it doesn’t exist print(“Ensuring ‘sample_data’ table exists…”) cur.execute(create_table_query) # Optional: Clean the table before inserting new data print(“Clearing existing data from the table…”) cur.execute(“TRUNCATE TABLE sample_data RESTART IDENTITY;”) # Start the timer start_time = time.time() print(f”Executing bulk insert of {len(records_to_insert)} records…”) psycopg2.extras.execute_values( cur, insert_query, records_to_insert, template=None, page_size=1000 # The number of rows to send in each batch ) print(“Bulk insert complete.”) # Commit the changes to the database conn.commit() # Stop the timer end_time = time.time() duration = end_time – start_time print(f”Successfully inserted {cur.rowcount} rows in {duration:.2f} seconds.”) # Close the cursor cur.close() except (Exception, psycopg2.DatabaseError) as error: print(f”Error while connecting to or working with PostgreSQL: {error}”) if conn: conn.rollback() # Roll back the transaction on error finally: # Close the connection if it was established if conn is not None: conn.close() print(“Database connection closed.”) Once you run this script, you will have 50k rows in your PSQL instance. Now let’s export this to CSV: $ psql –host 127.0.0.1 –username postgres –password –dbname template1 psql (14.18 (Ubuntu 14.18-0ubuntu0.22.04.1)) SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off) Type “help” for help. template1=# select COUNT(*) from sample_data; count ——- 50000 (1 row) template1=# COPY sample_data TO ‘/tmp/sample.csv’ WITH (FORMAT CSV, HEADER); COPY 50000 template1=# \q And now that we have our CSV data, let’s install the CLI Tool and ingest it. Ingesting CSV Data into GridDB Cloud You can download the latest CLI Tool from the Github releases page: https://github.com/Imisrael/griddb-cloud-cli/releases. 
For me, I installed the .deb file $ wget https://github.com/Imisrael/griddb-cloud-cli/releases/download/v0.1.4/griddb-cloud-cli_0.1.4_linux_amd64.deb $ sudo dpkg -i griddb-cloud-cli_0.1.4_linux_amd64.deb $ vim ~/.griddb.yaml And enter your credentials: cloud_url: “https://cloud97.griddb.com:443/griddb/v2/gs_clustermfclo7/dbs/ZQ8” cloud_username: “kG-israel” cloud_pass: “password” And ingest: $ griddb-cloud-cli ingest /tmp/sample.csv ✔ Does this container already exist? … NO Use CSV Header names as your GridDB Container Col names? id,name,quantity,price ✔ Y/n … YES ✔ Container Name: … migrated_data ✔ Choose: … COLLECTION ✔ Row Key? … true ✔ (id) Column Type … INTEGER ✔ Column Index Type1 … TREE ✔ (name) Column Type … STRING ✔ (quantity) Column Type … INTEGER ✔ (price) Column Type … FLOAT ✔ Make Container? { “container_name”: “migrated_data”, “container_type”: “COLLECTION”, “rowkey”: true, “columns”: [ { “name”: “id”, “type”: “INTEGER”, “index”: [ “TREE” ] }, { “name”: “name”, “type”: “STRING”, “index”: null }, { “name”: “quantity”, “type”: “INTEGER”, “index”: null }, { “name”: “price”, “type”: “FLOAT”, “index”: null } ] } … YES {“container_name”:”migrated_data”,”container_type”:”COLLECTION”,”rowkey”:true,”columns”:[{“name”:”id”,”type”:”INTEGER”,”index”:[“TREE”]},{“name”:”name”,”type”:”STRING”,”index”:null},{“name”:”quantity”,”type”:”INTEGER”,”index”:null},{“name”:”price”,”type”:”FLOAT”,”index”:null}]} 201 Created Container Created. Starting Ingest 0 id id 1 name name 2 quantity quantity 3 price price ✔ Is the above mapping correct? … YES Ingesting. Please wait… Inserting 1000 rows 200 OK Inserting 1000 rows 200 OK And after some time, your data should be ready in your GridDB Cloud instance! $ griddb-cloud-cli sql query -s “SELECT COUNT(*) from migrated_data” [{“stmt”: “SELECT COUNT(*) from migrated_data” }] [[{“Name”:””,”Type”:”LONG”,”Value”:50000}]] And another confirmation $ griddb-cloud-cli read migrated_data -p -l 1 [ { “name”: “migrated_data”, “stmt”: “select * limit 1”, “columns”: null, “hasPartialExecution”: true }] [ [ { “Name”: “id”, “Type”: “INTEGER”, “Value”: 1 }, { “Name”: “name”, “Type”: “STRING”, “Value”: “Enterprise-wide multi-state installation” }, { “Name”: “quantity”, “Type”: “INTEGER”, “Value”: 479 }, { “Name”: “price”, “Type”: “FLOAT”, “Value”: 194.8 } ] ] Migrating from GridDB CE If you want to move all of your local data from GridDB Community Edition over to your GridDB Pay As You Go cloud database, you can now use the GridDB Cloud CLI Tool for the job! You will also of course need to export your CE containers that you wish to migrate. Prereqs As explained above, you will need: the GridDB Cloud CLI Tool from GitHub and the GridDB CE Import/Export Tool installed onto your machine. Step by Step Process of Migrating Let’s run through an example of exporting out entier GridDB CE Database and then running the migration from the CLI tool. This is going to assume you have the Import tool already set up, you can read more about that here: https://griddb.net/en/blog/using-the-griddb-import-export-tools-to-migrate-from-postgresql-to-griddb/ First, you’d run the export tool like so: $ cd expimp/bin $ ./gs_export -u admin/admin -d all –all This command will export all of your containers into a directory called ‘all’. $ ./gs_export -u admin/admin -d all –all Export Start. 
Directory : /home/israel/development/expimp/bin/all Number of target containers : 7 public.p01 : 2 public.p02 : 0 public.p03 : 0 public.device3 : 1015 public.device2 : 1092 public.device1 : 1944 public.col02 : 10000 Number of target containers:7 ( Success:7 Failure:0 ) Export Completed. Next, ensure your GridDB Cloud CLI Tool is set up, and once it is, you can run the migrate command. Let’s look at how it works: $ griddb-cloud-cli migrate -h Use the export tool on your GridDB CE Instance to create the dir output of csv files and a properties file and then migrate those tables to GridDB Cloud Usage: griddb-cloud-cli migrate [flags] Examples: griddb-cloud-cli migrate <directory> Flags: -f, –force Force create (no prompt) -h, –help help for migrate So in our case, we want to use migrate with the -f flag to not show us prompts because we have 7 containers to create and migrate! $ griddb-cloud-cli migrate -f all And here is an example of some of the output: {“container_name”:”device2″,”container_type”:”TIME_SERIES”,”rowkey”:true,”columns”:[{“name”:”ts”,”type”:”TIMESTAMP”,”index”:null},{“name”:”co”,”type”:”DOUBLE”,”index”:null},{“name”:”humidity”,”type”:”DOUBLE”,”index”:null},{“name”:”light”,”type”:”BOOL”,”index”:null},{“name”:”lpg”,”type”:”DOUBLE”,”index”:null},{“name”:”motion”,”type”:”BOOL”,”index”:null},{“name”:”smoke”,”type”:”DOUBLE”,”index”:null},{“name”:”temp”,”type”:”DOUBLE”,”index”:null}]} 201 Created inserting into (device2). csv: all/public.device2_2020-07-12_2020-07-13.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-13_2020-07-14.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-14_2020-07-15.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-15_2020-07-16.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-16_2020-07-17.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-17_2020-07-18.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-18_2020-07-19.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-19_2020-07-20.csv 200 OK inserting into (device2). csv: all/public.device2_2020-07-20_2020-07-21.csv 200 OK {“container_name”:”device3″,”container_type”:”TIME_SERIES”,”rowkey”:true,”columns”:[{“name”:”ts”,”type”:”TIMESTAMP”,”index”:null},{“name”:”co”,”type”:”DOUBLE”,”index”:null},{“name”:”humidity”,”type”:”DOUBLE”,”index”:null},{“name”:”light”,”type”:”BOOL”,”index”:null},{“name”:”lpg”,”type”:”DOUBLE”,”index”:null},{“name”:”motion”,”type”:”BOOL”,”index”:null},{“name”:”smoke”,”type”:”DOUBLE”,”index”:null},{“name”:”temp”,”type”:”DOUBLE”,”index”:null}]} 201 Created inserting into (device3). csv: all/public.device3_2020-07-12_2020-07-13.csv 200 OK inserting into (device3). csv: all/public.device3_2020-07-13_2020-07-14.csv 200 OK inserting into (device3). csv: all/public.device3_2020-07-14_2020-07-15.csv 200 OK inserting into (device3). csv: all/public.device3_2020-07-15_2020-07-16.csv 200 OK inserting into (device3). csv: all/public.device3_2020-07-16_2020-07-17.csv 200 OK inserting into (device3). csv: all/public.device3_2020-07-17_2020-07-18.csv 200 OK inserting into (device3). csv: all/public.device3_2020-07-18_2020-07-19.csv 200 OK inserting into (device3). 
csv: all/public.device3_2020-07-19_2020-07-20.csv 200 OK {"container_name":"p01","container_type":"COLLECTION","rowkey":true,"columns":[{"name":"name","type":"STRING","index":null},{"name":"names","type":"STRING_ARRAY","index":null},{"name":"barr","type":"BOOL_ARRAY","index":null},{"name":"tsarr","type":"TIMESTAMP_ARRAY","index":null}]} 201 Created {"container_name":"p02","container_type":"COLLECTION","rowkey":true,"columns":[{"name":"id","type":"STRING","index":null},{"name":"date","type":"STRING","index":null}]} 201 Created {"container_name":"p03","container_type":"COLLECTION","rowkey":true,"columns":[{"name":"id","type":"LONG","index":null},{"name":"c1","type":"STRING","index":null},{"name":"c2","type":"BOOL","index":null}]} 201 Created inserting into (p01). csv: all/public.p01.csv 200 OK Lastly, we can verify that our containers are in there: $ griddb-cloud-cli show device3 { "container_name": "device3", "container_type": "TIME_SERIES", "rowkey": true, "columns": [ { "name": "ts", "type": "TIMESTAMP", "timePrecision": "MILLISECOND", "index": [] }, { "name": "co", "type": "DOUBLE", "index": [] }, { "name": "humidity", "type": "DOUBLE", "index": [] }, { "name": "light", "type": "BOOL", "index": [] }, { "name": "lpg", "type": "DOUBLE", "index": [] }, { "name": "motion", "type": "BOOL", "index": [] }, { "name": "smoke", "type": "DOUBLE", "index": [] }, { "name": "temp", "type": "DOUBLE", "index": [] } ] } $ griddb-cloud-cli sql query -s "SELECT COUNT(*) FROM device2" [{"stmt": "SELECT COUNT(*) FROM device2" }] [[{"Name":"","Type":"LONG","Value":1092}]] Looks good to me! Conclusion And with that, we have learned three different methods of migrating your existing data over to the new Azure Marketplace GridDB instance.

More
Building Resume Creator with Multi-agent AI

In this blog, we will build an AI-powered resume-creation system that automates the tedious and time-consuming tasks involved in manual resume creation. By leveraging multi-agent AI systems, we will streamline the process of information gathering, and content writing to produce resumes with minimal human intervention. Limitations of Manual Resume Processing Inefficient Information Gathering The manual process of collecting and organizing information is time-consuming and requires significant effort. Inconsistent Formatting Manual resume creation often leads to formatting inconsistencies. The process requires manual adjustments to maintain professional formatting standards, which can be error-prone and time-consuming. Content Writing and Rewriting Challenges The manual process requires significant effort in crafting and editing content. Writing compelling and well-structured content by hand is labor-intensive, requiring multiple revisions and edits. Automating Resume Creation using AI Creating a resume manually involves several steps: Information Gathering: Collecting and organizing your personal details, job history, skills, and education. Formatting: Ensuring the resume looks attractive and professional, often without clear guidelines. Content Writing: Crafting and refining content to make it concise, compelling, and relevant. Proofreading and Editing: Checking for errors and polishing the resume to a professional standard. With the AI system, we can automate these steps using multi-agent systems. Each agent performs a specific task, such as extracting information, generating content, or formatting the resume. By coordinating these agents, we can create a fully automated resume creation system. Running the Resume Creator Before we dive into the technical details, you can run the resume creator system by following these steps: 1) Clone the repository: git clone https://github.com/junwatu/resume-creator-multi-agent-ai.git 2) Install the dependencies: cd resume-creator-multi-agent-ai cd apps npm install 3) Create a .env file in the apps directory and add the following environment variables: OPENAI_API_KEY=api-key-here GRIDDB_WEBAPI_URL= GRIDDB_USERNAME= GRIDDB_PASSWORD= VITE_APP_BASE_URL=http://localhost VITE_PORT=3000 Please refer to the Prerequisites section for more details on obtaining the OpenAI API key and GridDB credentials. 4) Start the server: npm run start 5) Open your browser and go to http://localhost:3000 to access the resume creator system. How it Works? In this blog, we automate the information gathering and content writing for the resume, tasks that are usually manual and time-consuming. This system diagram illustrates the resume creation process discussed in this blog, showcasing the collaboration between two main AI agents: Here’s a brief description: The system starts with User Input and requires an environment setup that includes Team Initialization and OpenAI API Key. Two AI agents work together: Profile Analyst (Agent AI 1): Handles data extraction from user input, breaking down information into categories like Name, Experience, Skills, Education, and Job History. Resume Writer (Agent AI 2): Takes structured information and handles the writing aspect. The workflow follows these key steps: Data Extraction: Organizes raw user input into structured categories. This is the information-gathering step. Structured Information: Stores the organized data into the GridDB Cloud database. Resume Crafting: Combines the structured data with writing capabilities. 
This is the content writing step. Create Resume: Generates the content. Final Resume: Produces the completed document. Prerequisites KaibanJS KaibanJS is the JavaScript framework for building multi-agent AI systems. We will use it to build our resume creation system. OpenAI We will use the o1-mini model from OpenAI. It is a smaller version of the o1 model, suitable for tasks that require complex reasoning and understanding. Create a project, an API key, and enable the o1-mini model in the OpenAI platform. Make sure to save the API key in the .env file. OPENAI_API_KEY=api-key-here GridDB Cloud The GridDB Cloud offers a free plan tier and is officially available worldwide. This database will store the structured information extracted by the Profile Analyst agent and also the final resume generated by the Resume Writer agent. You need these GridDB environment variables in the .env file: GRIDDB_WEBAPI_URL= GRIDDB_USERNAME= GRIDDB_PASSWORD= Check the below section on how to get these values. Sign Up for GridDB Cloud Free Plan If you would like to sign up for a GridDB Cloud Free instance, you can do so in the following link: https://form.ict-toshiba.jp/download_form_griddb_cloud_freeplan_e. After successfully signing up, you will receive a free instance along with the necessary details to access the GridDB Cloud Management GUI, including the GridDB Cloud Portal URL, Contract ID, Login, and Password. GridDB WebAPI URL Go to the GridDB Cloud Portal and copy the WebAPI URL from the Clusters section. It should look like this: GridDB Username and Password Go to the GridDB Users section of the GridDB Cloud portal and create or copy the username for GRIDDB_USERNAME. The password is set when the user is created for the first time, use this as the GRIDDB_PASSWORD. For more details, to get started with GridDB Cloud, please follow this quick start guide. IP Whitelist When running this project, please ensure that the IP address where the project is running is whitelisted. Failure to do so will result in a 403 status code or forbidden access. You can use a website like What Is My IP Address to find your public IP address. To whitelist the IP, go to the GridDB Cloud Admin and navigate to the Network Access menu. Node.js We will use Node.js LTS v22.12.0 to build a server that handles the communication between the user interface, AI agents, and OpenAI API and store data in the GridDB Cloud database. React We will use React to build the user interface for the resume creation system. Where the user can input their details and generate a resume with a click of a button. Building the Resume Creation System Node.js Server We will use Node.js to build the server that handles the communication between the user interface, AI agents, and OpenAI API. The server will also store the structured information in the GridDB Cloud database. This table provides an overview of the API routes defined in the server.js code, including HTTP methods, endpoints, descriptions, and any parameters. HTTP Method Endpoint Description Parameters POST /api/resumes Creates a new resume. Calls the generateResume function to generate content, saves to the database, and returns the response. Body: { content: string } GET /api/resumes Fetches all resumes stored in the database. None GET /api/resumes/:id Fetches a specific resume by its ID. Path: id (Resume ID) DELETE /api/resumes/:id Deletes a specific resume by its ID. 
Path: id (Resume ID) The main route code for the resume creation is as follows: app.post(‘/api/resumes’, async (req, res) => { try { const resumeData = req.body || {}; const result = await generateResume(resumeData.content || undefined); console.log(result); const resume = { id: generateRandomID(), rawContent: resumeData.content, formattedContent: result.result, status: result.status, createdAt: new Date().toISOString(), information: JSON.stringify(result.stats), } // Save resume to database const dbResponse = await dbClient.insertData({ data: resume }); if (result.status === ‘success’) { const all = { message: ‘Resume created successfully’, data: result.result, stats: result.stats, dbStatus: dbResponse } res.status(201).json(all); } else { res.status(400).json({ message: ‘Failed to generate resume’, error: result.error }); } } catch (error) { res.status(500).json({ error: ‘Server error while creating resume’, details: error.message }); } }); When the user submits their data, the server calls the generateResume function to generate the resume content. The result is then saved to the GridDB Cloud database, and the resume content is returned as a response. Multi-agent AI We will use KaibanJS to build the multi-agent AI system for the resume creation process. You can find the agent code in the team.kban.js file and this system consists of two main agents: Profile Analyst (Agent AI 1) The Profile Analyst agent is responsible for extracting structured information from the user input. It categorizes the input into fields such as Name, Experience, Skills, Education, and Job History. The effectiveness of these fields depends on the quality and diversity of the submitted data. const profileAnalyst = new Agent({ name: ‘Carla Smith’, role: ‘Profile Analyst’, goal: ‘Extract structured information from conversational user input.’, background: ‘Data Processor’, tools: [] // Tools are omitted for now }); This profile agent will use this task code to extract user data: const processingTask = new Task({ description: `Extract relevant details such as name, experience, skills, and job history from the user’s ‘aboutMe’ input. aboutMe: {aboutMe}`, expectedOutput: ‘Structured data ready to be used for a resume creation.’, agent: profileAnalyst }); The expectedOutput is the structured data that will be used by the Resume Writer agent to generate the resume content. The description and expectedOutput mimic the prompts if were interact with ChatGPT. However, in this case, this is done by the Profile Analyst agent. Resume Writer (Agent AI 2) The Resume Writer agent is responsible for crafting the resume content based on the structured information provided by the Profile Analyst agent. It generates well-structured, compelling content that effectively showcases the user’s qualifications and achievements. const resumeWriter = new Agent({ name: ‘Alex Morra’, role: ‘Resume Writer’, goal: `Craft compelling, well-structured resumes that effectively showcase job seekers qualifications and achievements.`, background: `Extensive experience in recruiting, copywriting, and human resources, enabling effective resume design that stands out to employers.`, tools: [] }); This resume agent will use this task code to generate the resume content: const resumeCreationTask = new Task({ description: `Utilize the structured data to create a detailed and attractive resume. Enrich the resume content by inferring additional details from the provided information. 
Include sections such as a personal summary, detailed work experience, skills, and educational background.`, expectedOutput: `A professionally formatted resume in raw markdown format, ready for submission to potential employers`, agent: resumeWriter }); The result of this task is markdown-formatted resume content that can be easily converted into a PDF or other formats and is easy to process in the user interface. Save Data to GridDB Cloud Database The GridDB Cloud database stores the structured information extracted by the Profile Analyst agent and the final resume generated by the Resume Writer agent. This is the data schema used to store the resume information in the GridDB Cloud database: { "id": "string", "rawContent": "string", "formattedContent": "string", "status": "string", "createdAt": "string", "information": "string" } Field Type Description id string A unique identifier for each resume. rawContent string The original user input for the resume. formattedContent string The final formatted resume content. status string Indicates the success or failure of the resume generation process. createdAt string The timestamp of when the resume was created. information string The OpenAI token information. GridDB Cloud provides a RESTful API that allows us to interact with the database. We will use this API to store and retrieve the resume information. The griddb-client.js file contains the code to interact with the GridDB Cloud database. It includes functions to insert, retrieve, and delete resume data. To insert new data, you can use the endpoint /containers/${containerName}/rows. This endpoint allows you to add a new row of data to the database: async function insertData({ data, containerName = 'resumes' }) { console.log(data); try { const timestamp = data.createdAt instanceof Date ? data.createdAt.toISOString() : data.createdAt; const row = [ parseInt(data.id), // INTEGER data.rawContent, // STRING data.formattedContent, // STRING data.status, // STRING timestamp, // TIMESTAMP (ISO format) data.information // STRING ]; const path = `/containers/${containerName}/rows`; return await makeRequest(path, [row], 'PUT'); } catch (error) { throw new Error(`Failed to insert data: ${error.message}`); } } GridDB also supports SQL-like queries to interact with the database. Here’s an example of an SQL query to retrieve all resumes from the database: SELECT * FROM resumes; and to retrieve a specific resume by its ID: SELECT * FROM resumes WHERE id = 'resume-id'; Let’s look at an example of how to insert data into the GridDB Cloud database: const sql = "insert into resumes (id, rawContent, formattedContent, status, createdAt, information) values(3, 'raw content here', 'formatted content here', 'success', TIMESTAMP('2025-01-02'), '{tokens: 300}')"; const payload = [{ stmt: sql }]; // SQL statements are sent as a JSON array const response = await fetch(`${process.env.GRIDDB_WEBAPI_URL}/sql/dml/update`, { method: 'POST', headers: { 'Content-Type': 'application/json', 'Authorization': `Basic ${authToken}`, }, body: JSON.stringify(payload), }); const responseText = await response.text(); The code above inserts the resume data into the GridDB Cloud database using the /sql/dml/update endpoint and the SQL query. All these data operations will be handled by the Node.js server and exposed as API endpoints for the user interface to interact with. User Interface The ResumeCreator component is built using React and allows users to input their details in a text area and generate a resume with the click of a button. The user interface is designed to be simple.
import { useState } from ‘react’; import { Card, CardContent } from ‘@/components/ui/card’; import { Button } from ‘@/components/ui/button’; import { Textarea } from ‘@/components/ui/textarea’; import { Alert, AlertDescription } from ‘@/components/ui/alert’; import { ResumeMarkdownRenderer } from ‘./ResumeMarkdownRenderer.tsx’; const ResumeCreator = () => { const [isSubmitting, setIsSubmitting] = useState(false); const [submitStatus, setSubmitStatus] = useState(null); const [resumeText, setResumeText] = useState(“default resume text”); const [markdownContent, setMarkdownContent] = useState(null); const BASE_URL = import.meta.env.VITE_APP_BASE_URL + ‘:’ + import.meta.env.VITE_PORT; const handleSubmit = async () => { setIsSubmitting(true); setSubmitStatus(null); try { const response = await fetch(`${BASE_URL}/api/resumes`, { method: ‘POST’, headers: { ‘Content-Type’: ‘application/json’, }, body: JSON.stringify({ content: resumeText }), }); if (!response.ok) { throw new Error(‘Failed to create resume’); } const aiResume = await response.json(); setMarkdownContent(aiResume.data); setSubmitStatus(‘success’); } catch (error) { console.error(‘Error creating resume:’, error); setSubmitStatus(‘error’); } finally { setIsSubmitting(false); setTimeout(() => setSubmitStatus(null), 5000); } }; return ( <div className=”max-w-4xl mx-auto p-8 space-y-6″> <h1 className=”text-3xl font-bold text-center”> Resume Creator <div className=”w-40 h-1 bg-green-500 mx-auto mt-1″></div> </h1> {submitStatus && ( <Alert className={submitStatus === ‘success’ ? ‘bg-green-50’ : ‘bg-red-50’}> <AlertDescription> {submitStatus === ‘success’ ? ‘Resume created successfully!’ : ‘Failed to create resume. Please try again.’} </AlertDescription> </Alert> )} {markdownContent ? ( <ResumeMarkdownRenderer markdown={markdownContent} /> ) : ( <div className=”space-y-6″> <h2 className=”text-2xl font-semibold”>About Me</h2> <Card className=”border-2″> <CardContent className=”p-6″> <p className=”text-sm text-gray-600 mb-4″> Enter your professional experience, skills, and education. Our AI will help format this into a polished resume. </p> <Textarea value={resumeText} onChange={(e: React.ChangeEvent<HTMLTextAreaElement>) => setResumeText(e.target.value)} className=”min-h-[400px] font-mono” placeholder=”Enter your resume content here…” /> </CardContent> </Card> <div className=”flex justify-center”> <Button onClick={handleSubmit} disabled={isSubmitting} className=”bg-green-500 hover:bg-green-600 text-white px-8 py-2 rounded-md” > {isSubmitting ? ‘Creating…’ : ‘Create Resume’} </Button> </div> </div> )} </div> ); }; export default ResumeCreator; The core functionality of the ResumeCreator component is to create a user resume using AI and render the result. It uses useState to manage input (resumeText), generated markdown (markdownContent), submission status (submitStatus), and submission progress (isSubmitting). The handleSubmit function sends a POST request to the /api/resumes route at the backend (${BASE_URL}/api/resumes), passing the user’s input, and updates the state based on the API’s response. Read here for the Node.js API routes. The UI includes a text area for input, a submit button to trigger the API call, and a markdown renderer ResumeMarkdownRenderer component to display the AI-generated resume. Alerts notify the user of the submission status while loading states to ensure a smooth experience. 
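The ResumeMarkdownRenderer component itself is not shown in this post. Below is a minimal sketch of what such a renderer could look like, assuming the react-markdown package; the actual ResumeMarkdownRenderer.tsx in the repository may be implemented differently.

```tsx
// Sketch: render the AI-generated markdown resume.
// Assumes the react-markdown package; the project's real component may differ.
import ReactMarkdown from 'react-markdown';

interface ResumeMarkdownRendererProps {
  markdown: string;
}

export function ResumeMarkdownRenderer({ markdown }: ResumeMarkdownRendererProps) {
  return (
    <div className="prose max-w-none rounded-md border-2 p-6">
      {/* react-markdown converts the markdown string into React elements */}
      <ReactMarkdown>{markdown}</ReactMarkdown>
    </div>
  );
}
```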
Further Improvements Enhanced Data Extraction: Improve the Profile Analyst agent’s ability to extract and categorize information more accurately and efficiently. Advanced Content Generation: Enhance the Resume Writer agent’s content generation capabilities to produce more compelling and personalized resumes. User Interface Enhancements: Add more features to the user interface, such as resume templates, customization options, and real-time editing. Conclusion In this blog, we have built an AI-powered resume-creation system that automates the tedious and time-consuming tasks involved in manual resume creation. By leveraging multi-agent AI systems, we have streamlined the process of information gathering and content writing to produce resumes with minimal human intervention.

More
Sports Analytics with GridDB

Introduction In modern sports, data-driven decision-making has become essential for gaining a competitive edge. Every step, shot, or lap generates a stream of events that require high-speed ingestion, efficient storage, and rapid querying—challenges that traditional relational databases struggle to handle at scale. To address this, organizations are increasingly looking into alternatives. GridDB, a highly scalable and efficient time-series database, is designed to manage large volumes of continuously generated data such as above. By leveraging GridDB, teams can analyze critical performance metrics such as player speed, fatigue levels, and tactical positioning over time. These insights enable coaches and analysts to make informed decisions on game tactics, substitutions, and training regimens based on real-time and historical data. In this article, we explore how GridDB, integrated within a Spring Boot application, can be used for a soccer analytics use case—optimizing player substitutions and refining game strategies with data-driven precision. Understanding the Use Case A single soccer match generates hundreds of timestamped events—such as a midfielder’s pass at a given time e.g. 20:05:32 or a striker’s shot at time 20:10:15—each enriched with outcomes and metadata. The sequential nature of this data reveals crucial patterns, like player fatigue or shifts in attacking momentum, that static analyses often miss. For engineers, the challenge lies in efficiently managing this high-speed, high-volume data stream. To simulate this type of data, we will use events/15946.json dataset from StatsBomb, which logs an entire match’s events—including passes, shots, and tackles—with millisecond precision. Our Spring Boot application, powered by GridDB, will focus on: Performance Tracking: Monitoring pass accuracy to detect signs of fatigue. Strategy Optimization: Analyzing shot frequency to uncover attacking opportunities. Setting Up GridDB Cluster and Spring Boot Integration Project Structure Here’s a suggested project structure for this application: ├───my-griddb-app │ │ pom.xml │ │ │ ├───src │ │ ├───main │ │ │ ├───java │ │ │ │ └───mycode │ │ │ │ │ MySpringBootApplication.java │ │ │ │ │ │ │ │ │ ├───config │ │ │ │ │ GridDBConfig.txt │ │ │ │ │ │ │ │ │ ├───controller │ │ │ │ │ MatchEventsController.java │ │ │ │ │ │ │ │ │ └───service │ │ │ │ MatchEventsService.java │ │ │ │ MetricsCollectionService.java │ │ │ │ RestTemplateConfig.java │ │ │ │ │ │ │ └───resources │ │ │ │ application.properties │ │ │ │ │ │ │ └───templates │ │ │ pass-accuracy-graph.html This structure separates controllers, models, repositories, services, and the application entry point into distinct layers, enhancing modularity and maintainability. It can be further customized based on individual requirements. Set Up GridDB Cloud For this exercise, we will be using GridDB Cloud version. Start by visiting the GridDB Cloud portal and [signing up](GridDB Cloud Free Plan | TOSHIBA DIGITAL SOLUTIONS CORPORATION) for an account. Based on requirements, either the free plan or a paid plan can be selected for broader access. After registration ,an email will be sent containing essential details, including the Web API URL and login credentials. Once the login details are received, log in to the Management GUI to access the cloud instance. Create Database Credentials Before interacting with the database, we must create a database user: Navigate to Security Settings: In the Management GUI, go to the “GridDB Users” tab. 
Create a Database User: Click “Create Database User,” enter a username and password, and save the credentials. For example, set the username as soccer_admin and a strong password. Store Credentials Securely: These will be used in your application to authenticate with GridDB Cloud. Set Allowed IP Addresses To restrict access to authorized sources, configure the allowed IP settings: Navigate to Security Settings: In the Management GUI, go to the “Network Access” tab and locate the “Allowed IP” section and add the . Add IP Addresses: For development, you can temporarily add your local machine’s IP. Add POM Dependency Here’s an example of how to configure the dependency in thepom.xml file: <project xmlns=”http://maven.apache.org/POM/4.0.0″ xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xsi:schemaLocation=”http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd”> <modelVersion>4.0.0</modelVersion> <groupId>com.example</groupId> <artifactId>my-griddb-app</artifactId> <version>1.0-SNAPSHOT</version> <name>my-griddb-app</name> <url>http://maven.apache.org</url> <parent> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-parent</artifactId> <version>3.2.4</version> <relativePath /> <!– lookup parent from repository –> </parent> <properties> <maven.compiler.source>17</maven.compiler.source> <maven.compiler.target>17</maven.compiler.target> </properties> <dependencies> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-actuator</artifactId> </dependency> <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version>3.8.1</version> <scope>test</scope> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> <exclusions> <exclusion> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-logging</artifactId> </exclusion> </exclusions> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-test</artifactId> <scope>test</scope> </dependency> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-thymeleaf</artifactId> </dependency> <!– JSON processing –> <dependency> <groupId>org.glassfish.jersey.core</groupId> <artifactId>jersey-client</artifactId> <version>2.35</version> </dependency> <dependency> <groupId>org.json</groupId> <artifactId>json</artifactId> <version>20210307</version> </dependency> <dependency> <groupId>com.fasterxml.jackson.core</groupId> <artifactId>jackson-databind</artifactId> <version>2.15.0</version> <!– or the latest version –> </dependency> <!– Lombok –> <dependency> <groupId>org.projectlombok</groupId> <artifactId>lombok</artifactId> <optional>true</optional> </dependency> </dependencies> </project> Technical Implementation Implementing a soccer analytics solution with GridDB and Spring Boot involves three key steps: Ingesting the StatsBomb events/15946.json dataset into GridDB. Querying the data to extract time-series metrics. Visualizing the results to generate actionable insights. Below, we explore each phase in detail, showcasing GridDB’s time-series capabilities and its seamless integration within a Spring Boot architecture. 
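Before walking through each step, it helps to see how the match_events time-series container described under Container Setup below could be created programmatically. The sketch below uses the GridDB Cloud Web API; it is written in TypeScript for brevity even though the application itself is Spring Boot, and the endpoint path, payload shape, and credentials are assumptions modeled on the container definitions that appear elsewhere in this post.

```typescript
// Sketch: create the match_events TIME_SERIES container via the GridDB Cloud Web API.
// GRIDDB_WEBAPI_URL, GRIDDB_USERNAME, and GRIDDB_PASSWORD are placeholder env vars.
const auth = Buffer.from(
  `${process.env.GRIDDB_USERNAME}:${process.env.GRIDDB_PASSWORD}`
).toString('base64');

const containerDefinition = {
  container_name: 'match_events',
  container_type: 'TIME_SERIES',
  rowkey: true, // the timestamp column acts as the row key
  columns: [
    { name: 'timestamp', type: 'TIMESTAMP' },
    { name: 'player_name', type: 'STRING' },
    { name: 'event_type', type: 'STRING' },
    { name: 'event_outcome', type: 'STRING' },
    { name: 'minute', type: 'INTEGER' },
    { name: 'second', type: 'INTEGER' },
    { name: 'team_name', type: 'STRING' },
  ],
};

const response = await fetch(`${process.env.GRIDDB_WEBAPI_URL}/containers`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Basic ${auth}`,
  },
  body: JSON.stringify(containerDefinition),
});

// A 201 response indicates the container was created.
console.log(`Container creation status: ${response.status}`);
```

In the Spring Boot application, the same request can be issued with HttpURLConnection, just as MetricsCollectionService does for row inserts.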
Step 1: Data Ingestion The events/15946.json file logs a sequence of match events—passes, shots, tackles—each record containing essential fields such as: Timestamp (e.g., “2021-06-11T20:05:32.456”) Player Name (player.name) Event Type (type.name) Outcome (e.g., pass.outcome.name as “Complete”, shot.outcome.name as “Goal”) To efficiently store and query this data in GridDB, we first need define a time-series container in GridDb cloud as below. Container Setup We define a container name match_events in GridDB Cloud using the time-series type with timestamp as the row key. Next, we will create schema which will includes the following columns: timestamp (TIMESTAMP, NOT NULL, Row Key) player_name (STRING) event_type (STRING) event_outcome (STRING) minute (INTEGER) second (INTEGER) team_name (STRING) Afterwards we implement MetricsCollectionServicewhich fetches data from JSON file and pushing the data in database. Here the implentation of MetricCollectionService.java : package mycode.service; import org.json.JSONArray; import org.json.JSONObject; import org.springframework.stereotype.Service; import org.springframework.beans.factory.annotation.Value; import java.io.OutputStream; import java.net.HttpURLConnection; import java.net.URL; import java.time.LocalDate; import java.time.format.DateTimeFormatter; import java.util.Scanner; @Service public class MetricsCollectionService { private static String gridDBRestUrl; private static String gridDBApiKey; @Value(“${griddb.rest.url}”) public void setgridDBRestUrl(String in) { gridDBRestUrl = in; } @Value(“${griddb.api.key}”) public void setgridDBApiKey(String in) { gridDBApiKey = in; } public void collect() { try { // Fetch JSON Data from GitHub String jsonResponse = fetchJSONFromGitHub( “https://raw.githubusercontent.com/statsbomb/open-data/master/data/events/15946.json”); JSONArray events = new JSONArray(jsonResponse); // Process and Send Data to GridDB Cloud sendBatchToGridDB(events); } catch (Exception e) { e.printStackTrace(); } } private static String fetchJSONFromGitHub(String urlString) throws Exception { URL url = new URL(urlString); HttpURLConnection conn = (HttpURLConnection) url.openConnection(); conn.setRequestMethod(“GET”); conn.setRequestProperty(“Accept”, “application/json”); if (conn.getResponseCode() != 200) { throw new RuntimeException(“Failed to fetch data: HTTP error code : ” + conn.getResponseCode()); } Scanner scanner = new Scanner(url.openStream()); StringBuilder response = new StringBuilder(); while (scanner.hasNext()) { response.append(scanner.nextLine()); } scanner.close(); return response.toString(); } private static void sendBatchToGridDB(JSONArray events) { JSONArray batchData = new JSONArray(); boolean startProcessing = false; for (int i = 0; i < events.length(); i++) { JSONObject event = events.getJSONObject(i); JSONArray row = new JSONArray(); if (event.has(“index”) && event.getInt(“index”) == 10) { startProcessing = true; } if (!startProcessing) { continue; // Skip records until we reach index == 7 } // Extract and format fields String formattedTimestamp = formatTimestamp(event.optString(“timestamp”, null)); row.put(formattedTimestamp); row.put(event.optJSONObject(“player”) != null ? event.getJSONObject(“player”).optString(“name”, null) : null); row.put(event.optJSONObject(“type”) != null ? 
Afterwards, we implement MetricsCollectionService, which fetches the event data from the JSON file and pushes it into the database. Here is the implementation of MetricsCollectionService.java:

package mycode.service;

import org.json.JSONArray;
import org.json.JSONObject;
import org.springframework.stereotype.Service;
import org.springframework.beans.factory.annotation.Value;

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Scanner;

@Service
public class MetricsCollectionService {

    private static String gridDBRestUrl;
    private static String gridDBApiKey;

    @Value("${griddb.rest.url}")
    public void setgridDBRestUrl(String in) {
        gridDBRestUrl = in;
    }

    @Value("${griddb.api.key}")
    public void setgridDBApiKey(String in) {
        gridDBApiKey = in;
    }

    public void collect() {
        try {
            // Fetch JSON Data from GitHub
            String jsonResponse = fetchJSONFromGitHub(
                    "https://raw.githubusercontent.com/statsbomb/open-data/master/data/events/15946.json");
            JSONArray events = new JSONArray(jsonResponse);

            // Process and Send Data to GridDB Cloud
            sendBatchToGridDB(events);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static String fetchJSONFromGitHub(String urlString) throws Exception {
        URL url = new URL(urlString);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");

        if (conn.getResponseCode() != 200) {
            throw new RuntimeException("Failed to fetch data: HTTP error code : " + conn.getResponseCode());
        }

        Scanner scanner = new Scanner(url.openStream());
        StringBuilder response = new StringBuilder();
        while (scanner.hasNext()) {
            response.append(scanner.nextLine());
        }
        scanner.close();
        return response.toString();
    }

    private static void sendBatchToGridDB(JSONArray events) {
        JSONArray batchData = new JSONArray();
        boolean startProcessing = false;

        for (int i = 0; i < events.length(); i++) {
            JSONObject event = events.getJSONObject(i);
            JSONArray row = new JSONArray();

            if (event.has("index") && event.getInt("index") == 10) {
                startProcessing = true;
            }
            if (!startProcessing) {
                continue; // Skip records until we reach index == 10
            }

            // Extract and format fields in the container's column order
            String formattedTimestamp = formatTimestamp(event.optString("timestamp", null));
            row.put(formattedTimestamp);
            row.put(event.optJSONObject("player") != null
                    ? event.getJSONObject("player").optString("name", null)
                    : null);
            row.put(event.optJSONObject("type") != null
                    ? event.getJSONObject("type").optString("name", null)
                    : null);

            JSONObject passOutcome = event.optJSONObject("pass");
            JSONObject shotOutcome = event.optJSONObject("shot");

            if (passOutcome == null && shotOutcome == null) {
                continue;
            }

            if (passOutcome != null) {
                if (passOutcome.has("outcome")) {
                    row.put(passOutcome.getJSONObject("outcome").optString("name", null));
                } else {
                    row.put(JSONObject.NULL);
                }
            } else if (shotOutcome != null) {
                if (shotOutcome.has("outcome")) {
                    row.put(shotOutcome.getJSONObject("outcome").optString("name", null));
                } else {
                    row.put(JSONObject.NULL);
                }
            } else {
                row.put(JSONObject.NULL);
            }

            row.put(event.optInt("minute", -1));
            row.put(event.optInt("second", -1));
            row.put(event.optJSONObject("team") != null
                    ? event.getJSONObject("team").optString("name", null)
                    : null);

            batchData.put(row);
        }

        sendPutRequest(batchData);
    }

    private static String formatTimestamp(String inputTimestamp) {
        try {
            String todayDate = LocalDate.now().format(DateTimeFormatter.ISO_DATE);
            return todayDate + "T" + inputTimestamp + "Z";
        } catch (Exception e) {
            return "null"; // Default if parsing fails
        }
    }

    private static void sendPutRequest(JSONArray batchData) {
        try {
            URL url = new URL(gridDBRestUrl);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setDoOutput(true);
            conn.setRequestMethod("PUT");
            conn.setRequestProperty("Content-Type", "application/json");
            // gridDBApiKey holds the Basic auth value (base64-encoded username:password)
            conn.setRequestProperty("Authorization", gridDBApiKey);

            // Send JSON Data
            OutputStream os = conn.getOutputStream();
            os.write(batchData.toString().getBytes());
            os.flush();

            int responseCode = conn.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK || responseCode == HttpURLConnection.HTTP_CREATED) {
                System.out.println("Batch inserted successfully.");
            } else {
                System.out.println("Failed to insert batch. Response: " + responseCode);
            }
            conn.disconnect();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
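The @Value placeholders griddb.rest.url and griddb.api.key used above are resolved from the Spring environment. A minimal application.properties sketch, assuming the rows endpoint of the match_events container and a Basic auth token (both values are placeholders for your own GridDB Cloud details):

griddb.rest.url=https://<your-cloud-host>/griddb/v2/<your-cluster>/dbs/<your-db>/containers/match_events/rows
griddb.api.key=Basic <base64(username:password)>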
Step 2: Querying the Data

This step involves fetching the stored rows from GridDB using the REST API and grouping a player's passes into 5-minute intervals.

package mycode.service;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.stereotype.Service;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.HashMap;
import java.util.Map;

@Service
public class MatchEventsService {

    private static final String GRIDDB_URL = "https://cloud5114.griddb.com:443/griddb/v2/gs_clustermfcloud5114/dbs/9UkMCtv4/containers/match_events/rows";
    private static final String AUTH_HEADER = "Basic TTAyY…lhbEAx";

    private final HttpClient httpClient = HttpClient.newHttpClient();
    private final ObjectMapper objectMapper = new ObjectMapper();

    public Map<Integer, Integer> getPassCountByFiveMin(String playerName) {
        try {
            // Build the HTTP request for the GridDB rows acquisition endpoint
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(GRIDDB_URL))
                    .header("Content-Type", "application/json")
                    .header("Authorization", AUTH_HEADER)
                    .POST(HttpRequest.BodyPublishers.ofString("{\"offset\": 0, \"limit\": 55555}"))
                    .build();

            // Fetch the response
            HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
            JsonNode rootNode = objectMapper.readTree(response.body());
            JsonNode rows = rootNode.get("rows");

            // Process data: count passes every 5 minutes
            Map<Integer, Integer> passCountByFiveMin = new HashMap<>();
            for (JsonNode row : rows) {
                String currentPlayer = row.get(1).asText();
                String eventType = row.get(2).asText();
                int minute = row.get(4).asInt();

                if (playerName.equals(currentPlayer) && "Pass".equals(eventType)) {
                    // Group by 5-minute intervals (0-4, 5-9, 10-14, etc.)
                    int fiveMinInterval = (minute / 5) * 5;
                    passCountByFiveMin.merge(fiveMinInterval, 1, Integer::sum);
                }
            }
            return passCountByFiveMin;
        } catch (Exception e) {
            e.printStackTrace();
            return new HashMap<>();
        }
    }
}

Step 3: Visualization

To deliver insights, the Spring Boot application exposes the results through a Spring MVC @Controller:

Endpoint: /pass-accuracy/{playerName} renders a chart of a player's pass counts per 5-minute interval; a similar endpoint could be added for shot frequency per team.
Implementation: The controller calls the query service, adds the sorted time intervals and pass counts to the Model, and renders a Thymeleaf chart template.

package mycode.controller;

import mycode.service.MatchEventsService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.*;

import java.util.Map;

@Controller
public class MatchEventsController {

    @Autowired
    private MatchEventsService matchEventsService;

    @GetMapping("/pass-accuracy/{playerName}")
    public String getPassCountEveryFiveMin(@PathVariable String playerName, Model model) {
        Map<Integer, Integer> passCountByFiveMin = matchEventsService.getPassCountByFiveMin(playerName);

        // Prepare data for the chart
        model.addAttribute("playerName", playerName);
        model.addAttribute("timeIntervals", passCountByFiveMin.keySet().stream().sorted().toList());
        model.addAttribute("passCounts", passCountByFiveMin.values());

        return "pass-accuracy-graph"; // Thymeleaf template name
    }
}

Running the Project

To run the project, execute the following command to build and start the application:

mvn clean install && mvn spring-boot:run
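The dashboard URL below uses port 9090 rather than Spring Boot's default of 8080. This assumes the port has been overridden in the project's Spring configuration, for example:

server.port=9090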
Accessing the Dashboard

After launching the application, open a web browser and navigate to http://localhost:9090/pass-accuracy/{player name}. For example: http://localhost:9090/pass-accuracy/Lionel%20Andr%C3%A9s%20Messi%20Cuccittini

This visualization displays a chart of the player's passing activity over time, grouped into 5-minute intervals. It provides insight into the player's overall activity on the field and can hint at fatigue as the match progresses. Similarly, various other insights can be generated from the stored data, providing valuable analytics for team performance and decision-making. For example:

Player Pass Accuracy Over Time
Data: Count of "Pass" events with outcome.name = "Complete" vs. "Incomplete" per player, bucketed by 5-minute intervals.
Visualization: Line graph with time (x-axis) and pass accuracy percentage (y-axis) for a key player (e.g., a midfielder).
Insight: If pass accuracy drops below 70% late in the game (e.g., after minute 70), the player may be fatigued—time for a substitution.

Goal Proximity Over Time
Data: Count of "Shot" events with shot.outcome.name = "Goal" or near-miss outcomes (e.g., "Off Target"), bucketed by 10-minute intervals.
Visualization: Stacked bar graph with time (x-axis) and shot outcomes (y-axis).
Insight: Periods with frequent near-misses (e.g., minute 30-40) suggest missed opportunities—adjust tactics to capitalize on pressure.

Conclusion

As demonstrated, GridDB efficiently processes the timestamped complexity of soccer events, delivering structured insights with precision. Integrated within a Spring Boot application, its high-performance ingestion, optimized time-series indexing, and rapid querying capabilities enable the extraction of actionable metrics from StatsBomb data—identifying player fatigue and strategic opportunities with millisecond accuracy. As sports technology continues to evolve, such implementations highlight the critical role of specialized databases in unlocking the full potential of temporal data.
