Blog

Saving your IoT Data into GridDB with RabbitMQ

RabbitMQ is a popular message-queuing system used in a variety of systems where message delivery is of utmost importance. For our case, we would like to use RabbitMQ to ensure that sensor data gathered out in the field is reliably delivered to GridDB for later processing. Of course, we could always send data from the field to our main server via other means, namely HTTP, but those methods of data transfer can be finicky and unsafe; how often have you tried listening to a song via Apple Music through a sparsely connected rural part of the state, only to be met with a connection error and then dead silence? Once that connection is broken, it won't come back until the entire handshake process occurs again, and all of the data sent in the interim is completely lost. The goal of RabbitMQ in the context of this project is to ensure that even if there are connection issues, the data will persist until the server acknowledges that the data has been received and saved into GridDB.

The Project

The goal of this article is to create a proof of concept for a very basic IoT message-queue system: we will have one physical sensor out "in the field" reading data from its environment and pushing the readings onto an exchange, which will then push the data onto the queue and finally into our server. Once that server acknowledges that it has received the entirety of the data, it will remove that value from the queue and move on to the next one (if it exists). To accomplish this, first let's talk hardware.

The Hardware

We have set up a Raspberry Pi 4 connected to an Adafruit PMSA003I Air Quality Breakout sensor via a STEMMA Hat and a STEMMA wire; if you are interested in learning more about this particular sensor, you can read about it in the docs page provided by Adafruit. The data will be received from the queue by an Ubuntu server (the specs are not important). Next, let's take a look at the software.

The Software

Of course, we are going to be utilizing RabbitMQ for the pushing and receiving of messages of relevant data. RabbitMQ provides connectors for many programming languages, so we are essentially free to mix and match as we see fit (which is another stealth benefit of utilizing RabbitMQ in your stack). In our case, because Adafruit already provides a Python library with which we can easily read and translate the raw sensor data, we will push the payload data with Python. We could receive our payload data on the server with another Python script with the aid of the GridDB Python Connector, but we will instead opt to receive with Java, as it is GridDB's native interface and doesn't require any additional downloads.

The Plan

Overall, our plan is as follows:

- Install RabbitMQ onto the Ubuntu server
- Read sensor readings and translate them into readable data payloads (Python)
- Push data onto an Exchange/Queue of our creation
- Receive the queue with Java (and RabbitMQ)
- Save received payloads directly into GridDB

How to Run

The Python script can be easily run: install the required libraries and then simply run the script: python3 app.py. For Java, because we have some dependencies on outside libraries, we need to reference them (they're in the lib directory) and then run it that way.
For example:

$ cd lib/
$ export CP=.:amqp-client-5.16.0.jar:slf4j-api-1.7.36.jar:slf4j-simple-1.7.36.jar:gridstore-5.6.0.jar:jackson-databind-2.17.2.jar:jackson-core-2.17.2.jar:jackson-annotations-2.17.2.jar
$ java -cp $CP ../Recv.java

The order of running these two files is not important; the receiver will stay on even if the queue is empty.

Prereqs & Getting Started

Here is a list of what you need if you would like to follow this project 1:1:

- Raspberry Pi
- STEMMA Hat & wire (or other means of connecting to the board)
- Python, RabbitMQ, GridDB, Java, & various other libraries

You can install RabbitMQ from their download page; the instructions are straightforward. The only caveat is that you will need to create a new user and set the permissions properly:

$ sudo rabbitmqctl add_user username password
$ sudo rabbitmqctl set_permissions -p / username ".*" ".*" ".*"

The credentials here will be the same ones used when forging the connection between the data sender and the data receiver. One note: I was unsuccessful in trying to use "special characters" in my password when making my connection, so I'd advise keeping the password simple for now (i.e. just A-z and integers).

Implementation: The Producer

Finally, let's get into specifics. We will first focus on our producer (the Raspberry Pi) and then move on to the consumer (our server). We will also be setting some configs to ensure our messages are delivered and saved into the database.

Python Script for Reading Data

We are using a modified version of the Python script provided by Adafruit to read the sensor data. Essentially, our task is very simple: we read the data, convert it to JSON, and push it to the Exchange/Queue. First, let's look at the hardware part of the code; after that we will get into the code for creating a queue and pushing data to it on the correct machine.

import board
import busio
from adafruit_pm25.i2c import PM25_I2C

reset_pin = None
i2c = busio.I2C(board.SCL, board.SDA, frequency=100000)
# Connect to a PM2.5 sensor over I2C
pm25 = PM25_I2C(i2c, reset_pin)

aqdata = pm25.read()

This snippet of code is all you need to read/translate the sensor readings. With this, assuming everything is connected properly, we save the current values into the variable we called aqdata.

Python Code to Create and Push Data to the RabbitMQ Queue

Next, let's look at the RabbitMQ code. First, we want to establish our connection to our Ubuntu server. We will point the address to the IP of the machine and set the port to the default. We will also use the credentials we made earlier on our Ubuntu server.

import pika

credentials = pika.PlainCredentials('israel', 'israel')
parameters = pika.ConnectionParameters('192.168.50.206', 5672, '/', credentials)
connection = pika.BlockingConnection(parameters)
channel = connection.channel()

Next we want to create and set some parameters for our queue, including how we handle pushing data messages to it.

channel.confirm_delivery()
channel.queue_declare(queue='airQuality', durable=True)

By default, RabbitMQ prioritizes throughput above all else, meaning we need to change some default configuration options to ensure our data is being sent to our server (also known as the broker), even in the case of a weak connection. First, we want to enable confirm delivery. This will produce an exception/error if the producer receives a negative acknowledgement (also referred to as a nack) from our broker. This means that if our data is falling off, we will at least have a log of it.
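As an aside, a dropped network link surfaces in pika as a connection exception rather than a nack, and the script in this article does not handle that case. A rough, hypothetical retry wrapper (reusing the same host, credentials, and queue shown above; the helper names are our own) could look like this:

```python
import time
import pika

def connect():
    # Rebuild the connection, channel, and durable queue declared above
    credentials = pika.PlainCredentials('israel', 'israel')
    parameters = pika.ConnectionParameters('192.168.50.206', 5672, '/', credentials)
    connection = pika.BlockingConnection(parameters)
    channel = connection.channel()
    channel.confirm_delivery()
    channel.queue_declare(queue='airQuality', durable=True)
    return connection, channel

def publish_with_retry(channel, payload, retries=5):
    # Try to publish; if the link drops, wait, reconnect, and try again
    for _ in range(retries):
        try:
            channel.basic_publish(
                exchange='',
                routing_key='airQuality',
                body=payload,
                properties=pika.BasicProperties(delivery_mode=pika.DeliveryMode.Persistent),
                mandatory=True)
            return channel
        except (pika.exceptions.AMQPConnectionError, pika.exceptions.StreamLostError):
            time.sleep(2)
            _, channel = connect()
    raise RuntimeError("could not publish after %d attempts" % retries)
```

Treat this as a sketch only; production code would also need to handle the unroutable/nack case covered below and re-buffer any readings taken while offline.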
Unfortunately for us, there isn’t a very robust handling of failed messages on the Python side; if this were for a production project, we would need to migrate from Python to some other language where you can deal with messages in a variety of ways. Namely, I think, we’d like to add batch processing of messages so that there’s less of a chance of dropped data readings, and an easier time of re-sending dropped efforts. Anyway, working with what we have, the next thing we do is turn on durable which will save the queue in the event of a broker crash/reboot. This means the aqdata won’t need to be re-created but the messages inside of the queue won’t necessarily be saved. After that, we read and send data simultaneously: while True: time.sleep(1) try: aqdata = pm25.read() current_time = datetime.datetime.utcnow().replace(microsecond=0) now = current_time.strftime(‘%Y-%m-%dT%H:%M:%S.%fZ’) aqdata[‘ts’] = now aqdata[‘pm1’] = aqdata.pop(‘pm10 standard’) aqdata[‘pm25’] = aqdata.pop(‘pm25 standard’) aqdata[‘pm10’] = aqdata.pop(‘pm100 standard’) aqdata[‘pm1e’] = aqdata.pop(‘pm10 env’) aqdata[‘pm25e’] = aqdata.pop(‘pm25 env’) aqdata[‘pm10e’] = aqdata.pop(‘pm100 env’) aqdata[‘particles03’] = aqdata.pop(‘particles 03um’) aqdata[‘particles05’] = aqdata.pop(‘particles 05um’) aqdata[‘particles10’] = aqdata.pop(‘particles 10um’) aqdata[‘particles25’] = aqdata.pop(‘particles 25um’) aqdata[‘particles50’] = aqdata.pop(‘particles 50um’) aqdata[‘particles100’] = aqdata.pop(‘particles 100um’) #print(aqdata) except RuntimeError: print(“Unable to read from sensor, retrying…”) continue payload = json.dumps(aqdata) try: channel.basic_publish(exchange=”, routing_key=’airQuality’, body=payload, properties=pika.BasicProperties(delivery_mode=pika.DeliveryMode.Persistent), mandatory=True) print(” [x] Sent payload: ” + payload) except pika.exceptions.UnroutableError: # If the message is not confirmed, it means something went wrong print(“Message could not be confirmed”) For this snippet of code, we are reading the sensor data, changing the column names into the ones we want to use on the consumer side, and then pushing the payload into the channel to our queue we made earlier. Some things to note here: we set the mandatory flag to true and set the delivery mode to persistent. These two settings will try to save our messages into disk if they don’t receive positive acknowledgement from our broker that the messages were safely delivered. The exception occurs if the broker ends back to our producer a nack (negative acknowledgement). And so now every 1 second, our script will read sensor values and push it into the queue. Once the data is confirmed by the broker, the producer no longer cares about that data message. Implementation: The Consumer Our consumer will be written in Java and its job is to read from the Queue in our broker (in our case, the same host machine as our consumer), unmarshal the data into a Java Object, and then save the results into GridDB. Consuming the Queue in Java The consumer portion of the code is rather simple: forge the connection and read from the queue. private final static String QUEUE_NAME = “airQuality”; private final static boolean AUTO_ACK = false; ConnectionFactory factory = new ConnectionFactory(); factory.setHost(“localhost”); Connection connection = factory.newConnection(); Channel channel = connection.createChannel(); channel.queueDeclare(QUEUE_NAME, true, false, false, null); System.out.println(” [*] Waiting for messages. 
To exit press CTRL+C”); Here we are making our connection to our broker (hosted on the same machine as the consumer, hence localhost). We declare the queue we want to read from and set some options; we are using the default values for everything except for the first true which corresponds to durable mode, which we are setting to true, as explained above in the python section, it means that our queue will persist even if the broker goes down. Next, let’s run the actual consume: channel.basicConsume(QUEUE_NAME, AUTO_ACK, deliverCallback, consumerTag -> { }); The only thing I’d like to point out here is that we’ve turned off the AUTO_ACK option (it’s set to FALSE). This means we will need to manually acknowledge either if the message being read from the queue was successful or not. Next, here’s the callback function that is run every-time it reads a new message off of the queue: DeliverCallback deliverCallback = (consumerTag, delivery) -> { byte[] data = delivery.getBody(); try { AirData ad = mapper.readValue(data, AirData.class); String jsonString = mapper.writerWithDefaultPrettyPrinter().writeValueAsString(ad); System.out.println(jsonString); container.put(ad); channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false); } catch (Exception e) { channel.basicNack(delivery.getEnvelope().getDeliveryTag(), false, true); System.out.println(“Setting nack”); } }; Here is what’s going on: we read the message (type of array of bytes), we use the jackson json library to unmarshal the value from raw bytes into a class we declare called AirData: static public class AirData { @JsonProperty(“ts”) @RowKey Date ts; @JsonProperty(“pm1”) double pm1; @JsonProperty(“pm25”) double pm25; @JsonProperty(“pm10”) double pm10; @JsonProperty(“pm1e”) double pm1e; @JsonProperty(“pm25e”) double pm25e; @JsonProperty(“pm10e”) double pm10e; @JsonProperty(“particles03”) double particles03; @JsonProperty(“particles05”) double particles05; @JsonProperty(“particles10”) double particles10; @JsonProperty(“particles25”) double particles25; @JsonProperty(“particles50”) double particles50; @JsonProperty(“particles100”) double particles100; } Next we save that newly made Java object into GridDB and then finally acknowledge to our broker that we received the message. If something goes wrong, we will send a nack and the message will remain in the queue until it gets an ack. GridDB Lastly, let’s go over how GridDB fits into this. We will do our standard connecting to GridDB and then get our timeseries container. In this case, I created the table/container in the shell as it’s easier than writing a one-time use java code. 
$ sudo su gsadm
$ gs_sh
gs> createtimeseries aqdata NO ts timestamp pm1 double pm25 double pm10 double pm1e double pm25e double pm10e double particles03 double particles05 double particles10 double particles25 double particles50 double particles100 double

And now we make our connection in our Java code:

public static GridStore GridDBNoSQL() throws GSException {
    GridStore store = null;
    try {
        Properties props = new Properties();
        props.setProperty("notificationMember", "127.0.0.1:10001");
        props.setProperty("clusterName", "myCluster");
        props.setProperty("user", "admin");
        props.setProperty("password", "admin");
        store = GridStoreFactory.getInstance().getGridStore(props);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return store;
}

Using our AirData class from earlier, we grab our newly made container:

TimeSeries<AirData> container = store.getTimeSeries("aqdata", AirData.class);
System.out.println("Connected to GridDB!");

And then we've already seen this above, but as we receive new payloads, we immediately save to GridDB and then send the positive acknowledgement:

container.put(ad);
channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);

Conclusion

In this article, we set up a robust system in which our IoT data will be safely transferred from our Python producer to an exchange with no name (""), on to our broker which houses our queue called airQuality, and then finally will be read by our Java

More
Automate Slide Creation Using OpenAI and Node.js

With the rise of AI tools, we can automate many manual workloads, including creating presentation slides. Developers can generate slide content programmatically by leveraging OpenAI’s language models and Node.js. This automation surely will save time. By using OpenAI for content generation and Node.js for orchestration, you can effortlessly streamline the process of creating compelling and informative presentations. In this post, we will use the Assistant API model from OpenAI to automate slide content creation, Node.js to create the slide document, and GridDB to save the slide information. Running the Project Clone the source code from this GitHub repository. git clone https://github.com/griddbnet/Blogs.git –branch slides You also need to install Node.js and GridDB for this project to run. If the software requirements are installed, change the directory to the apps project directory and then install all the dependencies: cd apps npm install Create a .env file and copy all environment variables from the .env.example file. You need an OpenAI key for this project, please look in this section on how to get the key. OPENAI_API_KEY=sk-proj-secret VITE_APP_URL=http://localhost:3000 You can change the VITE_APP_URL to your needs and then run the project by running this command: npm run start:build Then open the browser and go to the app URL. Select the data sample and then click the Create Slide button. If the slide presentation is created successfully, a download link will be provided. Getting Started 1. Installing Node.js This project will run on the Node.js platform. You need to install it from here. For this project, we will use the nvm package manager and Node.js v16.20.2 LTS version. # installs nvm (Node Version Manager) curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash # download and install Node.js nvm install 16 # verifies the right Node.js version is in the environment node -v # should print `v16.20.2` # verifies the right NPM version is in the environment npm -v # should print `8.19.4“ To connect Node.js and GridDB database, you need the gridb-node-api npm package which is a Node.js binding developed using GridDB C Client and Node addon API. 2. Setting Up GridDB We will use the GridDB database to save recipes and it’s nutrition analysis. Please look at the guide for detailed installation. We will use Ubuntu 20.04 LTS here. Run GridDB and check if the service is running. Use this command: sudo systemctl status gridstore If not running try to run the database with this command: sudo systemctl start gridstore 3. Setup OpenAI Keys The OpenAI key is on a project basis, so we need to create a project first in the OpenAI platform. To access any OpenAI services, you need a valid key. Go to this link and create a new OpenAI key, make sure to select the right project. You need also to enable any models that you use on a project. For this project, we will need the gpt-4o model. Go to the project settings and then select which models to be enabled. You should save the OpenAI key on the .env file and make sure not to include it in version control by adding it to the .gitignore. 4. Setup AI Assistant This project needs an AI assistant. You need to set it first, go to the project dashboard, and create a new assistant. You need to pay attention to the Instruction field because it will dictate the behavior of the AI assistant. This is the instruction used for this assistant: You are a data scientist assistant. 
When given data and a query, write the proper code and create the proper visualization. Another setup is you need to enable Code Interpreter, which means the assistant will be able to execute code in a sandbox environment, enabling your prompt to execute code. For more information on this feature, please click here. After the AI assistant creation, you need to copy the assistant ID. This ID will be used as a reference in the code where you can send messages to the assistant. const dataScienceAssistantId = “asst_FOqRnMVXw0WShTSGw70NZJAX” Data Examples This project will use JSON data samples from car spare parts sales. The data reside in the data directory. This is the spare part sales data for the year 2020 to the year 2024: [ { “Year”: 2020, “Quarter”: “Q1”, “Distribution channel”: “Online Sales”, “Revenue ($M)”: 2.10, “Costs ($M)”: 1.905643, “Customer count”: 190, “Time”: “2020 Q1”, “Product Category”: “Engine Parts”, “Region”: “North America”, “Units Sold”: 900, “Average Sale Price ($)”: 2333, “Discounts Given ($)”: 14000, “Returns ($)”: 4500, “Customer Satisfaction Rating”: 8.2, “Salesperson”: “SP120”, “Marketing Spend ($)”: 18000 }, { “Year”: 2020, “Quarter”: “Q1”, “Distribution channel”: “Direct Sales”, “Revenue ($M)”: 2.15, “Costs ($M)”: 2.004112, “Customer count”: 200, “Time”: “2020 Q1”, “Product Category”: “Brakes”, “Region”: “Europe”, “Units Sold”: 1000, “Average Sale Price ($)”: 2150, “Discounts Given ($)”: 12000, “Returns ($)”: 5000, “Customer Satisfaction Rating”: 8.0, “Salesperson”: “SP121”, “Marketing Spend ($)”: 19000 }, … { “Year”: 2024, “Quarter”: “Q2”, “Distribution channel”: “Direct Sales”, “Revenue ($M)”: 3.15, “Costs ($M)”: 2.525112, “Customer count”: 390, “Time”: “2024 Q2”, “Product Category”: “Brakes”, “Region”: “Europe”, “Units Sold”: 1500, “Average Sale Price ($)”: 2095, “Discounts Given ($)”: 22000, “Returns ($)”: 17000, “Customer Satisfaction Rating”: 9.1, “Salesperson”: “SP144”, “Marketing Spend ($)”: 38000 } ] Ideally, the data should be uploaded via the user interface. However, for simplicity in this project, the data will be directly processed when you choose the data samples from the data samples dropdown. How can OpenAI process the file directly? The answer is, that you need to upload manually the data sample files first. Go to the project dashboard and upload the files. You need to pay attention to the purpose of the uploaded files. In this project, the data sample files are used as assistants files. Later these file IDs will be used to identify which file is used when the user selects the data sample from the dropdown. Generating Content When the user selects the data sample and clicks the Create Slide button. The Assistant API will generate the image and text for the content slide. These are a few important steps in the code to generate the slide content: 1. Analyze Data Samples OpenAI will analyze the selected data sample then it will calculate the profit by quarter and year then visualize the plot. 
The prompt for this process is: const analyzeDataPrompt = “Calculate the profit (revenue minus cost) by quarter and year, and visualize as a line plot across the distribution channels, where the colors of the lines are green, light red, and light blue” And this code will process the prompt and the selected file (see fileId) const thread = await openai.beta.threads.create({ messages: [ { “role”: “user”, “content”: analyzeDataPrompt, “attachments”: [ { file_id: fileId, tools: [{ type: “code_interpreter” }] } ] } ] }); From this code, you can get the plot image. It will be saved in the public directory and will be used in the slide content. 2. Generate Bullet Points The AI Assistant will give an insight into the data and will generate bullet points. This is the prompt to instruct AI to give two insights about the data: const insightPrompt = `Give me two medium-length sentences (~20-30 words per sentence) of the most important insights from the plot you just created, and save each sentence as an item in one array. Give me a raw array, no formatting, no commentary. These will be used for a slide deck, and they should be about the ‘so what’ behind the data.` 3. Generate Insight Title The last step is generating a title for the insight. This is the prompt that is responsible for that: const titlePrompt = “Given the plot and bullet point you created, come up with a very brief title only for a slide. It should reflect just the main insights you came up with.” The full code for generating slide content is in the libs/ai.js file. Generate Slides This project uses the PptxGenJS package to generate the slides. You can look at the full code in the libs/pptx.js file. This is the code that calls the createPresentation() function when all the AI-generated slide information is ready. //… if (bulletPointsSummary.status === “completed”) { const message = await openai.beta.threads.messages.list(thread.id) const dataVisTitle = message.data[0].content[0].text.value presentationOptions = { title: slideTitle, subtitle: slideSubtitle, dataVisTitle: dataVisTitle, chartImagePath: path.join(__dirname, “public”, `${filename}`), keyInsights: “Key Insights:”, bulletPoints: bulletPoints, outputFilename: path.join(__dirname, ‘public’, pptxFilename) }; try { createPresentation(presentationOptions) } catch (error) { console.log(error) } } //… Just note that the generated presentation file will be saved in the public directory with each a unique name. Slides Information To save the slide information, we will use the GridDB database. These are the database field’s documentation: Field Name Type Description id INTEGER A unique identifier for each record. This is the primary key for the container and must be unique for each entry. title STRING The main title of the slide. It is a short descriptive title summarizing the content of the slide. subtitle STRING A secondary title or subheading providing additional context or a brief description related to the main title. chartImage STRING The URL or path to an image of a chart associated with the slide, used to link visual data representations. bulletPoints STRING A string containing bullet points that summarize key information or highlights of the slide. Each bullet point is typically separated by a special character or newline. pptx STRING The URL or path to the PowerPoint file (.pptx) that contains the slide, used to link the presentation file including the slide. The griddbservices.js and libs/griddb.js files are responsible for saving all the slide information to the database. 
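As a rough illustration of what that save step might look like, here is a griddb-node-api sketch that follows the field table above. The real implementation lives in libs/griddb.js; the container name "slides" and the connection details here are assumptions.

```js
const griddb = require('griddb-node-api');

// Assumed connection details; adjust to your own GridDB setup
const store = griddb.StoreFactory.getInstance().getStore({
  notificationMember: '127.0.0.1:10001',
  clusterName: 'myCluster',
  username: 'admin',
  password: 'admin'
});

// Columns mirror the slide-information fields documented above
const slideContainerInfo = new griddb.ContainerInfo({
  name: 'slides', // hypothetical container name
  columnInfoList: [
    ['id', griddb.Type.INTEGER],
    ['title', griddb.Type.STRING],
    ['subtitle', griddb.Type.STRING],
    ['chartImage', griddb.Type.STRING],
    ['bulletPoints', griddb.Type.STRING],
    ['pptx', griddb.Type.STRING]
  ],
  type: griddb.ContainerType.COLLECTION,
  rowKey: true
});

async function saveSlide(slide) {
  // slide = [id, title, subtitle, chartImage, bulletPoints, pptx]
  const container = await store.putContainer(slideContainerInfo);
  await container.put(slide);
}
```

In the actual project this logic sits behind the saveData() wrapper in griddbservices.js, which the /create/:fileId route calls.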
Server Routes The Node.js server provides a few routes for the client. This is the full documentation for the routes: Method Route Description GET / Serves the index.html file from the ‘dist’ folder GET /create/:fileId Triggers the AI assistant to process a file and create a presentation. Returns the save status and PPTX file name GET /metadata Serves the metadata.json file from the data directory GET /data/files Lists all JSON filenames in the data directory GET /data/files/:filename Serves a specific JSON file from the data directory GET /slides Retrieves all slides data from the database The most important route is /create/:fileId which triggers the AI assistant to analyze data samples, create a presentation, and then save all slide information to the database. app.get(‘/create/:fileId’, async (req, res) => { const fileId = req.params.fileId try { const result = await aiAssistant(fileId) if (result.status === “completed”) { const { title: titlePptx, subtitle: subtitlePptx, dataVisTitle: dataVisTitlePptx, chartImage: chartImagePptx, bulletPoints: bulletPointsPptx, outputFilename: pptxFile } = result.data const saveDataStatus = await saveData({ titlePptx, subtitlePptx, dataVisTitlePptx, chartImagePptx, bulletPointsPptx, pptxFile }) res.json({ save: saveDataStatus, data: result.data, pptx: result.pptx }) } else { res.status(500).json({ error: ‘Task not completed’, status: result.status }) } } catch (error) { console.error(‘Error in AI Assistant:’, error) res.status(500).json({ error: ‘Error in AI Assistant’, details: error.message }) } }) The aiAssistant() function will analyze the data sample, create a presentation return all information about the slide, and then save those slide information to the GridDB database using the saveData() function. To get all the slide data just go to the /slides route and it will respond with all slide data saved in the database. User Interface The main user interface consists of two components: Data Dropdown: To select a data sample. Create Slide Button: To trigge presentation creation. Download Generated Presentation Link: The download link for the presentation .pptx file. Further Enhancements This is a prototype project with static data samples. Ideally, in production, you need to provide a better user interface to upload the data and customize the

More
GridDB’s New v5.6 Features

With the release of GridDB v5.6, we are taking a look at the new features that come bundled with this new update. To read the entirety of the notes, you can read them directly from GitHub: GridDB CE v5.6 Release Notes. You can also read the detailed GridDB documentation, including the new v5.6 updates here: https://www.toshiba-sol.co.jp/en/pro/griddb/docs-en/v5_6/GridDB_FeaturesReference.html Of the new features, today we are focusing on the new data compression algorithm that is now select-able in the gs_node.json config file and automatic time aggregation from the GridDB CLI tool. Prior to v5.6, there were only two methods of compression that were select-able: NO_COMPRESSION and COMPRESSION_ZLIB. Though the default setting is still no compression for all versions, version 5.6 offers a new compression method called COMPRESSION_ZSTD. This compression method promises to be more efficient at compressing your data regularly, and also at compressing the data itself, meaning we can expect a smaller footprint. So, in this article, we will inserting a consistent amount of data into GridDB, comparing the resulting storage space taken up, and then finally comparing between all three compression methods. As for automatic aggregation, we will show a brief demonstration of how it looks at the end of this article. But first, compression. Methodology As explained above, we will need to easily compare between three instances of GridDB with the same dataset. To accomplish this, it seems docker would be the easiest method because we can easily spin up or down new instances and change the compression method for each instance. If we do this, then we simply use the same dataset or the same data generation script for each of the instances. To get a robust enough dataset to really test the compression algorithm differences, we decided on 100 million rows of data. Specifically, we wanted the dataset to be similar enough in some respects that the compression can do its job so that we in turn can effectively measure its effectiveness. The three docker containers will be griddb-server1, griddb-server2, and griddb-server3. The compression levels are set in the docker-compose file, but we will do it the way that makes the most sense to me: server1 is NO_COMPRESSION, server2 is the old compression system (COMPRESSION_ZLIB), and server3 is the new compression system (COMPRESSION_ZSTD). So when we run our gen-script, we can use command line arguments to specify which container we want to target. More on that in the next section. How to Follow Along If you plan to build and test out these methods yourself while you read along, you can grab the source code from our GitHub page: . Once you have the repo, you can start with spinning up your GridDB servers. We will get into how to run the generation data script to push 100m rows of data into your servers in the next section. To get the three servers running, the instructions are laid out in the docker compose file located in the root of the project repository; you can simply run: $ docker compose build $ docker compose up -d If all goes well, you should have three GridDB containers running: griddb-server1, griddb-server2, & griddb-server3. Implementation To implement, we used a node.js script which generated 100m rows of random data. Because our GridDB containers are spun up using Docker, we made all three docker containers for GridDB separate services inside of a docker compose file. We then grabbed that docker network name and used it when running our nodejs script. 
This means that our nodejs script was also built into a docker container and then we used that to push data into the GridDB containers with the following commands: $ docker build -t gen-data . $ docker run –network docker-griddb_default gen griddb-server1:10001 $ docker run –network docker-griddb_default gen griddb-server2:10001 $ docker run –network docker-griddb_default gen griddb-server3:10001 Here is the nodejs script in its entirety: const griddb = require(‘griddb-node-api’); const process = require(‘process’); var fs = require(‘fs’); var factory = griddb.StoreFactory.getInstance(); var store = factory.getStore({ “notificationMember”: process.argv[2], “clusterName”: “myCluster”, “username”: “admin”, “password”: “admin” }); const conInfo = new griddb.ContainerInfo({ ‘name’: “compressionBlog”, ‘columnInfoList’: [ [“timestamp”, griddb.Type.TIMESTAMP], [“location”, griddb.Type.STRING], [“data”, griddb.Type.FLOAT], [“temperature”, griddb.Type.FLOAT], ], ‘type’: griddb.ContainerType.COLLECTION, ‘rowKey’: false }); function getRandomFloat(min, max) { return Math.random() * (max – min) + min; } const putCont = async (sensorCount, data, temperature) => { const rows = generateSensors(sensorCount, data, temperature); try { const cont = await store.putContainer(conInfo) await cont.multiPut(rows); } catch (error) { console.log(“error: “, error) } } const generateSensors = (sensorCount, data, temperature) => { const arr = [] let now = new Date(); for (let i = 1; i <= sensorCount; i++) { let tmp = []; let newTime = now.setMilliseconds(now.getMinutes() + i) tmp.push(newTime) tmp.push(“A1”) tmp.push(data) tmp.push(temperature) arr.push(tmp) } return arr; } const AMTROWS = 10000; const AMTPASSES = 10000; (async () => { try { console.log(“attempting to gen data and push to GridDB”) for (let i = 0; i < AMTPASSES; i++) { const data = parseFloat(getRandomFloat(1, 10).toFixed(2)) const temperature = parseFloat(getRandomFloat(60, 130).toFixed(2)) await putCont(AMTROWS, data, temperature); } console.log(“Finished pushing data!”) } catch (error) { console.log(“Error putting to container”, error); } })(); The code itself is simple and self explanatory but please note that if you plan to follow along, inserting this volume of rows into GridDB takes a long time and you should be prepared to let the script work for ~10-20 minutes, depending on your server’s hardware. Compression Method Results Now that we have our rows of data inside of our three GridDB containers, we can let GridDB handle the actual compressing of the data. This process happens automatically and in the background; you can read more about that here: https://www.toshiba-sol.co.jp/en/pro/griddb/docs-en/v5_6/GridDB_FeaturesReference.html#database-compressionrelease-function. To check how much space your 100 million rows of data are taking up, you can run the following command against each Docker container of GridDB: $ docker exec griddb-server1 du -sh /var/lib/gridstore 16G /var/lib/gridstore/ Which checks the storage space used up by GridDB in total, including any swap files and logs. If you just want the data: $ docker exec griddb-server1 du -sh /var/lib/gridstore/data 12G /var/lib/gridstore/data/ This, of course, must be repeated for all three containers. 
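Rather than typing the same command three times, a small shell loop over the container names above does the job (this loop is our own convenience, not part of the project code):

```bash
for c in griddb-server1 griddb-server2 griddb-server3; do
  echo "== $c =="
  docker exec "$c" du -sh /var/lib/gridstore/data
done
```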
You can also verify the compression method in your GridDB container like so:

$ docker exec griddb-server3 cat /var/lib/gridstore/conf/gs_node.json | grep "storeCompressionMode"
"storeCompressionMode": "COMPRESSION_ZSTD",

Beyond testing the storage space used, we tested how long it took to load the data and how long a query takes. You can see the results in the following table; for every row/cell, a lower value is better and indicates superior user experience and usability.

| | NO_COMPRESSION | COMPRESSION_ZLIB | COMPRESSION_ZSTD (added v5.6) |
|---|---|---|---|
| Search (ms) | 32,644 | 20,666 | 11,475 |
| Aggregation (ms) | 30,261 | 13,302 | 8,402 |
| Storage (gridstore) | 11,968,312 (17GB) | 7,162,824 (6.9GB) | 6,519,520 (6.3GB) |
| Storage (/data) | 17,568,708 (12GB) | 1,141,152 (1.1GB) | 1,140,384 (1.1GB) |
| Insert (m:ss.mmm) | 14:42.452 | 15:02.748 | 15:05.404 |

To test the query speed, we ran both select * and aggregation queries like select AVG(data) from, then took the average of 3 results and placed them into the table. The results are clear: compression helps a lot more than it hurts. It saves on storage space and also helps query speeds. Version 5.6's compression method seems to both save storage space and help query speed by a meaningful amount. All of this was done, of course, on consumer-level hardware.

Automatic Aggregation with CLI

This functionality utilizes cron on your Linux machine to regularly run a script you create. Essentially, what this addition allows is for you to run an aggregation on one of your containers and push all of those values onto another table, allowing you to periodically run new queries, perhaps in the background when your resources aren't in use. This way you can have fresh, updated values on hand without needing to run your aggregations and wait through possibly long calculation times. The way it works is that you can now insert values from one table into another like so:

gs[public]> INSERT OR REPLACE INTO device_output (ts, co) SELECT ts,avg(co) FROM device WHERE ts BETWEEN TIMESTAMP('2020-07-12T00:29:38.905Z') AND TIMESTAMP('2020-07-19T23:58:25.634Z') GROUP BY RANGE(ts) EVERY (20,SECOND);
The 34,468 records had been inserted.

And so, knowing this, we can do some clever things, like writing a GridDB CLI script file (.gsh) that gets the latest values from a table, runs an aggregation, and then pushes the results out into your etl_output table. Once you write that script file, you can set up a cron job to regularly run it in the background. This process will allow your aggregation output table to be regularly updated with new, up-to-date values completely automatically! Here is an example script file directly from the docs page:

# gs_sh script file (sample.gsh)
# If no table exists, create a partitioning table with intervals of 30 days to output data.
CREATE TABLE IF NOT EXISTS etl_output (ts TIMESTAMP PRIMARY KEY, value DOUBLE) PARTITION BY RANGE (ts) EVERY (30, DAY);
# Retrieve the last run time registered. If it does not exist, retrieve the time one hour before the present.
SELECT case when MAX(ts) ISNULL THEN TIMESTAMP_ADD(HOUR,NOW(),-1) else MAX(ts) end AS lasttime FROM etl_output;
# Store the retrieved time in a variable.
getval LastTime
# Set the aggregation range between the time retrieved and the present time and obtain the average value for every 20 seconds. Register or update the results into the output container.
INSERT OR REPLACE INTO etl_output (ts, value) SELECT ts,avg(value) FROM etl_input WHERE ts BETWEEN TIMESTAMP('$LastTime') AND NOW() GROUP BY RANGE(ts) EVERY (20, SECOND);

In this example, we're placing aggregated results from etl_input into etl_output. Pretty

More
GridDB on ARM with Docker

GridDB running via Docker containers isn't a new topic. We have covered it before: https://griddb.net/en/blog/run-a-griddb-server-in-docker-desktop/ & https://griddb.net/en/blog/improve-your-devops-with-griddb-server-and-client-docker-containers/. In this blog, we want to again touch on using GridDB with Docker, but we will focus instead on using GridDB on ARM architecture, namely a Mac with Apple silicon (M1, M2, etc). So, in this blog, we will provide a Docker image which works with ARM devices, as well as walk through how to spin up application containers to work in conjunction with your Docker container service.

Running GridDB & GridDB Applications with Docker

First, you can read the source code that accompanies this article here: https://github.com/griddbnet/griddb-docker-arm. It contains the Docker image itself, which you can build to run on your ARM machine. The image is also available for pulling from the GridDB.net Dockerhub page; the full image/tag name is griddbnet/griddb:arm-5.5.0. The Node.js application repo is also available: griddbnet/nodejs-arm:latest

Running GridDB Server

To pull and run this image:

$ docker network create griddb-net
$ docker pull griddbnet/griddb:arm-5.5.0
$ docker run --name griddb-server \
    --network griddb-net \
    -e GRIDDB_CLUSTER_NAME=myCluster \
    -e GRIDDB_PASSWORD=admin \
    -e NOTIFICATION_MEMBER=1 \
    -d -t griddbnet/griddb:arm-5.5.0

These commands will create a network for your GridDB server and any containers you intend to run with it. They will also download the built image and then run it on your machine. Once you confirm it's running, you can try running application code, using your GridDB container as the data store.

Running Application Containers

First, let's grab the source code and build our Node.js container to run some arbitrary code using GridDB as our connection.

$ git clone https://github.com/griddbnet/Blogs.git --branch docker-arm

Next, here are the commands to run some Node.js GridDB code against your containerized server. First, let's run the sample code that accompanies the official Node.js GridDB repo:

$ cd Blogs/nodejs/node-api
$ docker build -t griddb_node_app .
$ docker run --name griddb-node --network griddb-net -e GRIDDB_CLUSTER_NAME=myCluster -e GRIDDB_USERNAME=admin -e GRIDDB_PASSWORD=admin -e IP_NOTIFICATION_MEMBER=griddb-server griddb_node_app

First, we grab the source code, which contains some files modified from the official source (changes to allow the C Client to run on macOS/ARM, which is required for the non-Java programming language connectors). Then we build the image and run it, setting some options such as the cluster name, the user/password combo, and finally IP_NOTIFICATION_MEMBER, which explicitly tells the container the address of the GridDB server container. Of course, when running this, you are simply running the provided sample code, not your own; but it also lays out the framework for running your own GridDB Node.js code. The flow is as follows: you write your code, build the Docker image, and then run it, explicitly choosing the Docker network and pointing to the correct hostname/IP address. To go along with the Node.js application interface, JDBC and Java have also been tested and confirmed to work with an ARM-based Mac using an M-series chip.
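If you prefer Compose over individual docker run commands, the same two containers can be described in one file. This is only a sketch: the service and network names mirror the commands above, and everything else is an assumption rather than a file shipped with the repo.

```yaml
# docker-compose.yml (sketch)
services:
  griddb-server:
    image: griddbnet/griddb:arm-5.5.0
    environment:
      GRIDDB_CLUSTER_NAME: myCluster
      GRIDDB_PASSWORD: admin
      NOTIFICATION_MEMBER: 1
    networks:
      - griddb-net

  griddb-node:
    image: griddb_node_app   # built below from Blogs/nodejs/node-api
    environment:
      GRIDDB_CLUSTER_NAME: myCluster
      GRIDDB_USERNAME: admin
      GRIDDB_PASSWORD: admin
      IP_NOTIFICATION_MEMBER: griddb-server
    depends_on:
      - griddb-server
    networks:
      - griddb-net

networks:
  griddb-net:
```

Running docker compose up -d would then bring both containers up on the shared network, just like the manual commands.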
Examples of Creating Application Container To build and run your own application in docker, the process is simple: you write the application in your language of choice, write the Dockerfile for that application, and then finally build & run the container, ensuring the use the same network as used when running the GridDB container. Node.js For example, let’s say you wrote a quick node.js script to generate some ‘fake’ data. To keep the application connection agnostic, you can keep the connection details as command line arguments, meaning when you run your docker container, you can simply enter in the docker container you wish to connect to similar to how it was done above. If you enter in the environment details when running the docker container. These details will then be picked up by our entry point script. Here is the Dockerfile for installing the GridDB Node.js connector, along with the c_client connector on an ARM machine. Most of the file is installing everything necessary, including installing the included c_client rpm file. In this instance, we are simply copying over the one file we want to run (gen-data.js) along with the entrypoint script. FROM rockylinux:9.3 ENV GRIDDB_NODE_API_VERSION=0.8.5 ENV NODE_PATH=/root/node-api-${GRIDDB_NODE_API_VERSION} # Install griddb server RUN set -eux \ && dnf update -y \ # Install nodejs version 16.x and c client for griddb nodejs_client && dnf install -y curl make python3 tar –allowerasing \ && dnf groupinstall -y ‘Development Tools’ RUN curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash RUN source ~/.nvm/nvm.sh && nvm install 20 && nvm use 20 COPY ./lib/griddb-c-client-5.5.0-linux.aarch64.rpm / RUN rpm -Uvh /griddb-c-client-5.5.0-linux.aarch64.rpm SHELL [“/bin/bash”, “–login”, “-c”] # Copy entrypoint script and sample for fixlist RUN mkdir /app COPY run-griddb.sh gen-data.js /app/ WORKDIR /root # Install nodejs client RUN curl -L https://github.com/griddb/node-api/archive/refs/tags/${GRIDDB_NODE_API_VERSION}.tar.gz -o ${GRIDDB_NODE_API_VERSION}.tar.gz -sS \ && tar -xzvf ${GRIDDB_NODE_API_VERSION}.tar.gz \ && cd node-api-${GRIDDB_NODE_API_VERSION} WORKDIR /root/node-api-${GRIDDB_NODE_API_VERSION} RUN npm install RUN rm ../${GRIDDB_NODE_API_VERSION}.tar.gz WORKDIR /app # Set permission executable for script RUN chmod a+x run-griddb.sh # Run sample CMD [“/bin/bash”, “run-griddb.sh”] And here is the simple run-griddb.sh script. All it does is basically run the node command with the proper arg details to connect to our GridDB docker container. #!/bin/bash if [ -z “$GRIDDB_CLUSTER_NAME” ]; then GRIDDB_CLUSTER_NAME=’dockerGridDB’ fi if [ -z “$NOTIFICATION_ADDRESS” ]; then NOTIFICATION_ADDRESS=239.0.0.1 fi if [ -z “$NOTIFICATION_PORT” ]; then NOTIFICATION_PORT=31999 fi if [ -z “$GRIDDB_USERNAME” ]; then GRIDDB_USERNAME=’admin’ fi if [ -z “$GRIDDB_PASSWORD” ]; then GRIDDB_PASSWORD=’admin’ fi if [ -z “$IP_NOTIFICATION_MEMBER” ]; then echo “Run GridDB node_api client with GridDB server mode MULTICAST : $NOTIFICATION_ADDRESS $NOTIFICATION_PORT $GRIDDB_CLUSTER_NAME $GRIDDB_USERNAME $GRIDDB_PASSWORD” source ~/.nvm/nvm.sh && nvm use 20 node sample1.js $NOTIFICATION_ADDRESS $NOTIFICATION_PORT $GRIDDB_CLUSTER_NAME $GRIDDB_USERNAME $GRIDDB_PASSWORD else echo “Run GridDB node_api client with GridDB server mode FixedList : $IP_NOTIFICATION_MEMBER:10001 $GRIDDB_CLUSTER_NAME $GRIDDB_USERNAME $GRIDDB_PASSWORD” source ~/.nvm/nvm.sh && nvm use 20. 
node gen-data.js $IP_NOTIFICATION_MEMBER:10001 $GRIDDB_CLUSTER_NAME $GRIDDB_USERNAME $GRIDDB_PASSWORD fi $ docker build -t nodejs-gen-griddb . We are building our current Dockerfile with the tag of nodejs-gen-griddb. Then we run it, specifying the connection details: $ docker run –network griddb-net -e GRIDDB_CLUSTER_NAME=myCluster -e GRIDDB_USERNAME=admin -e GRIDDB_PASSWORD=admin -e IP_NOTIFICATION_MEMBER=griddb-server nodejs-gen-griddb JDBC Here is another example, connecting to our GridDB server using Java and JDBC so that we can run SQL commands. First, we create our java program. In this case, we simply want to make a connection and then create a new table. String notificationMember = args[0]; String clusterName = args[1]; String databaseName = args[2]; // String notificationMember = “griddb-server:20001”; // String clusterName = “myCluster”; // String databaseName = “public”; String username = “admin”; String password = “admin”; String encodeClusterName = URLEncoder.encode(clusterName, “UTF-8”); String encodeDatabaseName = URLEncoder.encode(databaseName, “UTF-8”); String jdbcUrl = “jdbc:gs://” + notificationMember + “/” + encodeClusterName + “/” + encodeDatabaseName; System.out.println(jdbcUrl); Properties prop = new Properties(); prop.setProperty(“user”, username); prop.setProperty(“password”, password); con = DriverManager.getConnection(jdbcUrl, prop); System.out.println(“Connected to cluster via SQL Interface”); String SQL = “CREATE TABLE IF NOT EXISTS devices (ts TIMESTAMP PRIMARY KEY, co DOUBLE, humidity DOUBLE,light BOOL,lpg DOUBLE,motion BOOL,smoke DOUBLE,temp DOUBLE) USING TIMESERIES WITH (expiration_type=’PARTITION’,expiration_time=90,expiration_time_unit=’DAY’) PARTITION BY RANGE (ts) EVERY (60, DAY)SUBPARTITION BY HASH (ts) SUBPARTITIONS 64;”; Statement stmt = con.createStatement(); stmt.executeUpdate(SQL); System.out.println(“Successfully created container called: devices”); And now we create the dockerfile to build this java program to be run against the GridDB server. FROM alpine:3.14 WORKDIR /app RUN apk add –no-cache wget RUN apk add openjdk11 RUN wget https://repo1.maven.org/maven2/com/github/griddb/gridstore-jdbc/5.6.0/gridstore-jdbc-5.6.0.jar ENV CLASSPATH /app/gridstore-jdbc-5.6.0.jar COPY ./src ./src WORKDIR /app/src/main/java/ RUN javac net/griddb/jdbc/Jdbc.java CMD [“java”, “net/griddb/jdbc/Jdbc.java”, “griddb-server:20001”, “myCluster”, “public”] For this build process, we install java and wget, download the latest griddb jdbc driver, add it to our class path environment, and then simply compile and run our java code. If all goes well, you should be able to run the docker image and set the network to be equal to where your GridDB server is connected and have it work that way. In this case, we left the command line arguments within the Dockerfile itself, meaning you can simply change how the code is executed to keep it flexible. Conclusion And now you should be able to run both nodejs and JDBC containers on your ARM devices. If you get other programming languages running ony our machines, please let us know in the GridDB forum:

More
Building Video Summarizer Using AI

With the recent advancements in AI technology, such as OpenAI, it has become possible to automate tasks that were previously too tedious to perform manually. An example of this is a video summarizer. Previously, the process of summarizing video content relied mainly on human vision and hearing. However, with AI models such as GPT-4 and Whisper, it is now possible to automate this task. We will be utilizing the following technologies: OpenAI, Node.js, React, and GridDB. This blog will teach you how to create a basic web application for uploading a video and receiving a summary of its content.

Getting Started

The source code can be found here:

$ git clone https://github.com/griddbnet/Blogs --branch voice_summarizer

This project runs on Ubuntu 20.04 LTS. These are the mandatory requirements you need to run this project:

OpenAI Key

To access any OpenAI services, we need a valid key. Go to this link and create a new OpenAI key. The OpenAI key is on a project basis, so we need to create a project first in the OpenAI platform, and you also need to enable any models that you use on a project. For this project, we will need the gpt-4o and whisper models. The OpenAI key will be saved in the .env file; make sure not to include it in version control by adding it to the .gitignore.

Node.js

This project will run on the Node.js platform. You need to install it from here. For this project, we will use the nvm package manager and the Node.js v16.20.2 LTS version.

# installs nvm (Node Version Manager)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
# download and install Node.js
nvm install 16
# verifies the right Node.js version is in the environment
node -v # should print `v16.20.2`
# verifies the right NPM version is in the environment
npm -v # should print `8.19.4`

To connect Node.js and the GridDB database, you need the griddb-node-api npm package, which is a Node.js binding developed using the GridDB C Client and the Node addon API.

FFmpeg

This project utilizes the fluent-ffmpeg npm package, which requires FFmpeg to be installed on the system. For Ubuntu, you can use the following commands to install it:

sudo apt update
sudo apt install ffmpeg

For more installation information, please go to the FFmpeg official website.

GridDB

To save the video summary and video data, we will use the GridDB database. Please look at the guide for detailed installation; we will use Ubuntu 20.04 LTS here. Run GridDB and check if the service is running with this command:

sudo systemctl status gridstore

If it is not running, try to start the database with this command:

sudo systemctl start gridstore

Run the Project

Pull the Source Code

To run the project, you need to clone the code from this repository. Run this command to pull the source code:

git clone https://github.com/junwatu/video-summarizer-nodejs-griddb.git

Change the directory to the app folder and install the project dependencies using these commands:

cd video-summarizer-nodejs-griddb
npm install

Setup .env

This project needs a few environment variables. Copy the env.example file to the .env file:

cp env.example .env

You need to fill in the environment variables in this file:

OPENAI_API_KEY=sk-....
VITE_API_URL=http://localhost:3000

It is important to note that the project must be restarted every time you change the VITE_API_URL environment variable.

Start the Project

Run this command to run the project:

npm run start:build

Open Web App

Open the web app in the browser. The default URL is http://localhost:3000.
Upload any videos and it’s recommended to upload a short video for fast processing. You can also use the video in the test/video folder. Depending on the video duration, it will take a minute to process. How it Works? The user flow for this project or web app involves opening the web app, uploading the video, waiting for processing, and receiving the summary result. It uses the GPT-4o and Whisper models from OpenAI to summarize the uploaded user video. This project requires two models because OpenAI models cannot process video directly, however they can process images or audio files. On the other side, in Node.js, to separate the video into images and audio files, we use the fluent-ffmpeg npm package. These are the primary preparation steps for videos before we input them into OpenAI models for the summarization process. 1. Video Processing While it’s not possible to directly send a video to the API, GPT-4o can understand videos if you sample frames and then provide them as images. It performs better at this task than the earlier GPT-4 Turbo model. This function, extractFrames(), will extract images from the video file and save them in the frames folder. export function extractFrames(videoPath, secondsPerFrame, outputFolder) { return new Promise((resolve, reject) => { const frameRate = 1 / secondsPerFrame const framePattern = path.join(outputFolder, ‘frame-%03d.png’) ffmpeg(videoPath) .outputOptions([`-vf fps=${frameRate}`]) .output(framePattern) .on(‘end’, () => { fs.readdir(outputFolder, (err, files) => { if (err) { reject(err) } else { const framePaths = files.map(file => path.join(outputFolder, file)) resolve(framePaths) } }) }) .on(‘error’, reject) .run() }) } In the function extractFrames, the parameter secondsPerFrame defines the interval between frames that you want to extract from the video. Specifically, secondsPerFrame determines how many seconds should elapse between each frame that is extracted. Here’s how it works: Frame Rate Calculation: The frame rate is calculated as the reciprocal of secondsPerFrame, i.e., frameRate = 1 / secondsPerFrame. This means: If secondsPerFrame is 1, the frame rate is 1 frame per second. If secondsPerFrame is 0.5, the frame rate is 2 frames per second. If secondsPerFrame is 2, the frame rate is 0.5 frames per second (one frame every 2 seconds). 2. Image Processing The GPT-4o model can directly process images and take intelligent actions based on the image. We can provide images in two formats: Base64 Encoded URL In this project, we will use base64 encoding for the images. The function imageToBase64() will read each image file and convert it into a base64 encoded image. export function imageToBase64(imagePath) { return new Promise((resolve, reject) => { fs.readFile(imagePath, (err, data) => { if (err) { reject(err) } else { const base64String = data.toString(‘base64′) resolve(base64String) } }) }) } 3. Audio Extraction For a better context summarization, we can add audio to the OpenAI model. To extract audio from video, we can also use the fluent-ffmpeg npm. The audio result is in mp3 format and saved in the audio directory. // Function to extract audio from video export function extractAudio(videoPath, audioPath) { return new Promise((resolve, reject) => { ffmpeg(videoPath) .output(audioPath) .audioBitrate(’32k’) .on(‘end’, resolve) .on(‘error’, reject) .run() }) } 4. Audio Transcription After extracting the audio, we need to transcribe it into text using the speech-to-text model Whisper. 
async function transcribeAudio(filePath) { try { const transcription = await openai.audio.transcriptions.create({ file: fs.createReadStream(filePath), model: ‘whisper-1’ }) return transcription.text } catch (error) { throw new Error(`Transcription failed: ${error.message}`) } } The transcribeAudio() will transcribe an audio file to text using the whisper-1 AI model. For more information about how this speech-to-text model works, please read here. The code for video processing, image processing, and audio extraction can be found in the file libs/videoProcessing.js. Video Summarization Process The video summary is created by inputting both the visual and audio transcription elements of the video into the model simultaneously. By providing both of these inputs, the model is expected to produce a more accurate summary as it can perceive the entire video at once. // Generate a summary with visual and audio transcription import OpenAI from “openai”; const openai = new OpenAI({ // eslint-disable-next-line no-undef apiKey: process.env.OPENAI_API_KEY }); async function createVideoSummarization(frames, audioTranscription) { const frameObjects = frames.map(x => ({ type: ‘image_url’, image_url: { url: `data:image/jpg; base64, ${x}`, detail: ‘low’ } })); const response = await openai.chat.completions.create({ model: “gpt-4o”, messages: [{ role: “system”, content: “You are generating a video summary. Please provide a summary of the video. Respond in Markdown.” } , { role: “user”, content: [{ type: ‘text’, text: “These are the frames from the video.” } , …frameObjects, { type: ‘text’, text: `The audio transcription is: ${audioTranscription}` } ], } , ], temperature: 0, }); console.log(response.choices[0].message.content); return response; } export { createVideoSummarization } The content parameter is an array and may contain text or images. Prompts can be added to summarize the video, as image frames addition, and audio text transcription for better context. You can look into the Chat API documentation for more information about the parameters. Save Video Summary to GridDB The GridDB database is utilized to store the video summary, video file path, and audio transcription. The code to save these data resides in the griddbservices.js file. export async function saveData({ filename, audioTranscription, summary }) { const id = generateRandomID(); const videoFilename = String(filename); const audioToText = String(audioTranscription); const videoSummary = String(summary); const packetInfo = [parseInt(id), videoFilename, audioToText, videoSummary]; const saveStatus = await GridDB.insert(packetInfo, collectionDb); return saveStatus; } There are three important fields here, which are: Parameter Type Description filename String The name of the video file. audioTranscription String The transcription of the audio from the video. summary String A summary of the video content. The saveData function is a wrapper to save data to GridDB. You can find the real code that saves the data in the libs/griddb.cjs file. Get All Summaries All the summaries data can be accessed in the route /summaries. The default URL is: http://localhost:3000/summaries The response is JSON data, which is very easy to process on the client if you need further features or enhancements for the project. Limitation This project is a prototype and tested with MP4 videos with a video duration not exceeding 5

More
Interact with GridDB Data Using a LangChain Chatbot

This article demonstrates creating an interactive LangChain chatbot that retrieves information from a GridDB database using natural language queries. We will use the Python LangChain library and the OpenAI GPT-4o LLM (Large Language Model) to convert natural language queries into GridDB queries and interact seamlessly with the database.

Source code And Jupyter Notebook

You can find the source code (a Jupyter notebook) in our GitHub repo:

$ git clone https://github.com/griddbnet/Blogs.git --branch chatbot

Prerequisites

You need to install the following libraries to run the code in this article:

GridDB C Client
GridDB Python client

Follow the instructions on the GridDB Python Package Index (PyPI) page to install these clients. You must also install the LangChain, Numpy, Pandas, and Seaborn libraries. The scripts below install and import the libraries you will need to run the code in this blog.

!pip install langchain
!pip install langchain-core
!pip install langchain-openai
!pip install langchain-experimental
!pip install tabulate

import griddb_python as griddb
import pandas as pd

from langchain_openai import OpenAI
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

from typing import List, Dict

Creating a Connection with GridDB

To interact with GridDB via a LangChain chatbot, you must create a connection with a GridDB instance. To do so, create an object of the StoreFactory class using the get_instance() method. Next, call the get_store() method on the factory object and pass it the database hostname, cluster name, user, and password parameters.

In the following script, we create a connection with a GridDB instance and test whether the connection is successful by creating a container object.

factory = griddb.StoreFactory.get_instance()

DB_HOST = "127.0.0.1:10001"
DB_CLUSTER = "myCluster"
DB_USER = "admin"
DB_PASS = "admin"

try:
    gridstore = factory.get_store(
        notification_member = DB_HOST,
        cluster_name = DB_CLUSTER,
        username = DB_USER,
        password = DB_PASS
    )

    container1 = gridstore.get_container("container1")
    if container1 == None:
        print("Container does not exist")
    print("Successfully connected to GridDB")

except griddb.GSException as e:
    for i in range(e.get_error_stack_size()):
        print("[", i, "]")
        print(e.get_error_code(i))
        print(e.get_location(i))
        print(e.get_message(i))

Output:

Container does not exist
Successfully connected to GridDB

If the connection is successful, you should see the above message. Otherwise, verify your credentials and try again.

Inserting Sample Data Into GridDB

We will create a chatbot that returns information from a GridDB container. The container will hold world population statistics for different countries from 1970 to 2022. You can find more details about the dataset in my previous article on world population data analysis using GridDB. You can download the dataset from Kaggle.

The script below loads the world_population.csv file you downloaded into a Pandas DataFrame.
## Dataset link: https://www.kaggle.com/datasets/iamsouravbanerjee/world-population-dataset

dataset = pd.read_csv(r"/home/mani/GridDB Projects/world_population.csv")
print(dataset.shape)
dataset.head()

Output:

You can see that the dataset contains information such as country population, capital, continent, etc.

The dataset columns contain special characters, which you must remove since GridDB doesn’t allow container column names to have special characters.

dataset.columns = dataset.columns.str.replace('[^a-zA-Z0-9]', '_', regex=True)
dataset.dtypes

Output:

Next, we must map the DataFrame columns to GridDB-compliant column types before inserting data into a GridDB container. The following script inserts the data from the dataset DataFrame into a PopulationStats GridDB container.

# see all GridDB data types: https://docs.griddb.net/architecture/data-model/#data-type
def map_pandas_dtype_to_griddb(dtype):
    if dtype == 'int64':
        return griddb.Type.LONG
    elif dtype == 'float64':
        return griddb.Type.FLOAT
    elif dtype == 'object':
        return griddb.Type.STRING
    # Add more column types if you want
    else:
        raise ValueError(f'Unsupported pandas type: {dtype}')

container_columns = []
for column_name, dtype in dataset.dtypes.items():
    griddb_dtype = map_pandas_dtype_to_griddb(str(dtype))
    container_columns.append([column_name, griddb_dtype])

container_info = griddb.ContainerInfo("PopulationStats",
                                      container_columns,
                                      griddb.ContainerType.COLLECTION, True)

try:
    cont = gridstore.put_container(container_info)
    for index, row in dataset.iterrows():
        cont.put(row.tolist())
    print("All rows have been successfully stored in the GridDB container.")

except griddb.GSException as e:
    for i in range(e.get_error_stack_size()):
        print("[", i, "]")
        print(e.get_error_code(i))
        print(e.get_location(i))
        print(e.get_message(i))

Output:

All rows have been successfully stored in the GridDB container.

Now that we have created a GridDB container containing sample records, we will create a LangChain chatbot that allows you to retrieve information from the sample data container.

Creating a LangChain Chatbot to Interact with GridDB Data

In LangChain, you can create chatbots using a wide range of large language models (LLMs). In this article, we will create a LangChain chatbot using GPT-4o, a state-of-the-art LLM from OpenAI. To use GPT-4o in LangChain, you need to create an object of the ChatOpenAI class and pass it your OpenAI API key.

OPENAI_API_KEY = "YOUR_OPENAI_API_KEY"

llm = ChatOpenAI(api_key = OPENAI_API_KEY,
                 temperature = 0,
                 model_name = "gpt-4o")

Problem with Default LangChain Chains for Creating a Chatbot for Tabular Data

In my previous article, I explained how to perform CRUD operations on GridDB with LangChain. The approach used in that article works well for interacting with GridDB using natural language if you already know the exact names of the GridDB container and its columns. Otherwise, the LLM will attempt to retrieve information using made-up column names. For instance, in the following section, we try to get the names of the top 3 countries with the highest population in 2020.

class SelectData(BaseModel):
    container_name: str = Field(description="the container name from the user query")
    query: str = Field(description="natural language converted to SELECT query")

system_command = """
Convert user commands into SQL queries for Griddb.
"""

user_prompt = ChatPromptTemplate.from_messages([
    ("system", system_command),
    ("user", "{input}")
])

select_chain = user_prompt | llm.with_structured_output(SelectData)

def select_records(query):
    select_data = select_chain.invoke(query)

    container_name = select_data.container_name
    select_query = select_data.query
    print(select_query)

    result_container = gridstore.get_container(container_name)
    query = result_container.query(select_query)
    rs = query.fetch()
    result_data = rs.fetch_rows()
    return result_data

select_records("From the PopulationStats container, return the top 3 countries with the highest population in 2020")

Output:

From the above output, you can see that the LLM generates a query that returns information from the country, population, and year columns. However, looking at the dataset, you will find no year column. Instead, the population information for the year 2020 is stored in the 2020 Population column. To solve this problem, you can use LangChain agents.

LangChain Agents for Interacting with Tabular Data

To use LangChain agents, we will define a BaseModel class and a select_chain that extract the container name and the additional query information from the user query.

class SelectData(BaseModel):
    container_name: str = Field(description="the container name from the user query")
    natural_query: str = Field(description="user query string to retrieve additional information from result returned by the SELECT query")

system_command = """
Convert user commands into SQL queries for Griddb.
"""

user_prompt = ChatPromptTemplate.from_messages([
    ("system", system_command),
    ("user", "{input}")
])

select_chain = user_prompt | llm.with_structured_output(SelectData)

Next, we define the select_records() function, which accepts a user query and calls select_chain to retrieve the container name and the additional query. The select_records() function retrieves the container data as a Pandas DataFrame. The next step is to create a create_pandas_dataframe_agent() agent and pass it the DataFrame containing the container data from the GridDB instance. The additional query is passed to the agent’s invoke() method. The agent then retrieves information from the DataFrame based on the additional user query.

def select_records(query):
    select_data = select_chain.invoke(query)

    container_name = select_data.container_name
    select_query = f"SELECT * FROM {container_name}"
    natural_query = select_data.natural_query

    print(f"Select query: {select_query}")
    print(f"Additional query: {natural_query}")

    result_container = gridstore.get_container(container_name)
    query = result_container.query(select_query)
    rs = query.fetch()
    result_data = rs.fetch_rows()

    agent = create_pandas_dataframe_agent(
        ChatOpenAI(
            api_key = OPENAI_API_KEY,
            temperature = 0,
            model = "gpt-4o"),
        result_data,
        verbose = True,
        agent_type = AgentType.OPENAI_FUNCTIONS,
        allow_dangerous_code = True
    )

    response = agent.invoke(f"Return the following information: {natural_query}")
    return response

Let’s test the select_records method using the following query: From the PopulationStats container, return the top 3 countries with the highest population in 2020.

select_records("From the PopulationStats container, return the top 3 countries with the highest population in 2020")

Output:

The output shows that the SELECT query selects all the records from the PopulationStats container, while the additional query fetches the top 3 countries with the highest population in 2020.
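For intuition, the agent is effectively running a pandas operation on the DataFrame it was given. The snippet below is only a sketch: the renamed column names 2020_Population and Country_Territory are assumptions based on the column-renaming step earlier in this article, not values taken from the article’s output.

# Inside select_records, the agent effectively computes something like this
# on result_data (assumed column names after special characters were replaced).
top3 = result_data.nlargest(3, "2020_Population")[["Country_Territory", "2020_Population"]]
print(top3)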
As the agent’s output shows, it knows the column names of the PopulationStats container because it can access the corresponding result_data DataFrame, and it returns the required information.

Creating a LangChain Chatbot for Interaction with GridDB Data

Now let’s create a chatbot capable of remembering previous interactions. Instead of repeatedly defining an agent inside the select_records function as in the previous script, I recommend fetching the container data into a DataFrame once and then using that DataFrame in a single agent. The following script defines the SelectData base class and the select_records() function to retrieve the container name from the user query.

class SelectData(BaseModel):
    container_name: str = Field(description="the container name from the user query")
    query: str = Field(description="natural language converted to SELECT query")

system_command = """
Convert user commands into SQL queries for Griddb.
"""

user_prompt = ChatPromptTemplate.from_messages([
    ("system", system_command),
    ("user", "{input}")
])

select_chain = user_prompt | llm.with_structured_output(SelectData)

def select_records(query):
    select_data = select_chain.invoke(query)

    container_name = select_data.container_name
    select_query = select_data.query

    result_container = gridstore.get_container(container_name)
    query = result_container.query(select_query)
    rs = query.fetch()
    result_data = rs.fetch_rows()
    return result_data

result_data = select_records("SELECT all records from PopulationStats container")

Next, we define the create_pandas_dataframe_agent agent and the get_response() function, which accepts a user query and returns information about the Pandas DataFrame using the agent. To implement the chatbot functionality, we define the chat_with_agent() function, which runs a while loop that keeps calling get_response() and prints the agent’s response on the console. The loop terminates when the user enters 'bye', 'quit', or 'exit'.

agent = create_pandas_dataframe_agent(
    ChatOpenAI(
        api_key = OPENAI_API_KEY,
        temperature = 0,
        model = "gpt-4"
    ),
    result_data,
    agent_type = AgentType.OPENAI_FUNCTIONS,
    allow_dangerous_code = True,
)

def get_response(natural_query):
    # Get the response from the agent
    response = agent.invoke(f"Return the following information: {natural_query}")
    return response

# Function to chat with the agent
def chat_with_agent():
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['exit', 'quit', 'bye']:
            print("AI: Goodbye!")
            break
        response = get_response(user_input)
        print(f"AI: {response['output']}")

chat_with_agent()

Output:

The above output shows chatbot-like functionality retrieving responses about the world population dataset from the GridDB container.

Conclusion

In this article, you learned how to create a LangChain chatbot to interact with GridDB data using natural language queries. We explored how to connect Python to GridDB, insert sample data into a GridDB container, and retrieve information using LangChain agents. We also demonstrated how to build a chatbot on top of those agents.

GridDB is a highly scalable NoSQL database designed to handle large volumes of real-time data, making it well-suited for Internet of Things (IoT) and big data applications. With advanced in-memory processing capabilities and efficient time series data management, GridDB can effectively manage large volumes of data.

More
Pandas with Python GridDB SQL Queries

We have written before about how to pair pandas DataFrames with GridDB in our article: Using Pandas Dataframes with GridDB. In that article, we read from our GridDB database via the Python API (which uses TQL under the hood) and converted the resulting rows of data to a DataFrame. If you’re unfamiliar with DataFrames, they are the main reason for using a library like Pandas and can be argued to be a superior data structure for analysis and data science.

In this article, we want to revisit converting rows of GridDB data into DataFrames, but this time we would like to showcase using SQL with JDBC during the querying portion of our code. The reason one might want to use SQL instead of TQL is twofold:

You can conduct more intricate queries with SQL because of TQL’s limited functionality
Partitioned tables are sometimes not available to be read by TQL, meaning SQL can be the only option for those specific containers

So, in this article, we will showcase how to connect to GridDB, make SQL queries with Python, and feed those results directly into a pandas DataFrame. Please note that we are not simply using JayDeBeApi as showcased in our previous article, Using Python to interface with GridDB via JDBC with JayDeBeApi, because the results of those SQL queries are not in a valid data type to be read by pandas.

Prerequisites

The code for this article has been containerized into a Docker container. To run it, simply install Docker and then build and run the project. The source code can be found in the griddbnet GitHub repo:

$ git clone https://github.com/griddbnet/Blogs.git --branch sql-pandas

You can take a look at the Dockerfile contained in the repo to see how to run this on bare metal — essentially you just need to install Python and the appropriate SQL/pandas libraries. You will also need Java installed, as Java is what is used to make connections with JDBC (Java Database Connectivity).

Python Libraries

As hinted above, we need a JDBC Python library that produces rows of data that can be fed into pandas’ read_sql method call. According to the Pandas docs, the connection fed into the read_sql method needs to be an “ADBC Connection, SQLAlchemy connectable, str, or sqlite3 connection”. This, of course, rules out JayDeBeApi, but we were able to find a fork of the popular SQLAlchemy library which allows for generic connections to any database that can connect via JDBC; that library can be found here and is what allows this entire premise to work. Other than that, we will of course also need the pandas and numpy libraries to conduct our data analysis.

Making SQL Connection with SQLAlchemy

Reading the docs for SQLAlchemy JDBC Generic (https://pypi.org/project/sqlalchemy-jdbc-generic/) along with the docs for GridDB JDBC (https://github.com/griddb/docs-en/blob/master/manuals/GridDB_JDBC_Driver_UserGuide.md) allowed us to ascertain the proper way of building the JDBC connection string — again, note that it’s not the same process as building the connection string with the JayDeBeApi library. Having set the table, here is how to create that connection string:

from sqlalchemy.engine.url import URL

eng_url = URL.create(
    drivername='sqlajdbc',
    host='/myCluster/public',
    query={
        '_class': 'com.toshiba.mwcloud.gs.sql.Driver',
        '_driver': 'gs',
        'user': 'admin',
        'password': 'admin',
        'notificationMember': 'griddb-server:20001',
        '_jars': '/app/lib/gridstore-jdbc-5.6.0.jar'
    }
)

First, the drivername must be set as sqlajdbc; this is the name of the generic JDBC driver.
Next, the connection order might seem a bit backwards, but this is the correct way of building the URL. One other thing: the _jars option expects the library jar, so please make sure the path points to where you keep your GridDB JDBC jar file. If you are using the included Dockerfile, it already points to the correct path.

One last gotcha when trying to make this connection is that before you feed in the connection details and try to connect to GridDB, you will need to start the JVM (Java Virtual Machine) like so:

import jpype

jpype.startJVM(jpype.getDefaultJVMPath(), "-ea", "-Djava.class.path=/app/lib/gridstore-jdbc-5.6.0.jar")

With all of this set, you can now make the connection and run some queries to be saved into DataFrames:

import pandas as pd
from sqlalchemy import create_engine

eng = create_engine(eng_url)

with eng.connect() as c:
    print("Connected")
    df = pd.read_sql("SELECT * FROM LOG_agent_intrusion WHERE exploit = True", c)
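From here the result behaves like any other pandas DataFrame, so the usual analysis workflow applies. A minimal follow-up sketch (what the columns contain depends on your LOG_agent_intrusion table):

print(df.shape)       # number of rows and columns returned by the SQL query
print(df.head())      # first few intrusion-log records
print(df.describe())  # summary statistics for the numeric columns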

More
Predicting Salaries with Machine Learning and GridDB

Linear regression is a supervised machine learning technique that helps us predict the value of one variable based on the value of another. The variable to be predicted is called the dependent variable, while the variable used for prediction is called the independent variable. Linear regression uses one or more independent variables to estimate the coefficients of a linear equation, producing a straight line that minimizes the differences between the predicted and the expected output values.

In this article, we will implement a linear regression model that predicts the salary of an individual based on their years of experience, using Java and GridDB.

Write the Data into GridDB

The data to be used shows the years of experience and salaries of different individuals. We will store the data in GridDB as it offers benefits such as fast query performance. Let us first import the Java libraries that will help us accomplish this:

import com.toshiba.mwcloud.gs.Collection;
import com.toshiba.mwcloud.gs.GSException;
import com.toshiba.mwcloud.gs.GridStore;
import com.toshiba.mwcloud.gs.GridStoreFactory;
import com.toshiba.mwcloud.gs.Query;
import com.toshiba.mwcloud.gs.RowKey;
import com.toshiba.mwcloud.gs.RowSet;

import java.util.*;

GridDB organizes data into containers, and each container can be represented as a static class in Java. Let us create a static class to represent the container where the data will be stored:

static class SalaryData {
    @RowKey int id;
    double yearsExperience;
    double salary;
}

We have defined the schema for a GridDB container named SalaryData. Think of it as an SQL table with 3 columns. To write the data into GridDB, we first establish a connection to the database. This requires creating a Properties object from the java.util package and passing our GridDB credentials as key/value pairs:

Properties props = new Properties();
props.setProperty("notificationMember", "127.0.1.1:10001");
props.setProperty("clusterName", "myCluster");
props.setProperty("user", "admin");
props.setProperty("password", "admin");
GridStore store = GridStoreFactory.getInstance().getGridStore(props);

We will be using the store variable to interact with the database. Now that we are connected, we can store the data in GridDB.
Let us first define the data rows:

SalaryData row1 = new SalaryData(); row1.id = 1; row1.yearsExperience = 1.1; row1.salary = 39343.00;
SalaryData row2 = new SalaryData(); row2.id = 2; row2.yearsExperience = 1.3; row2.salary = 46205.00;
SalaryData row3 = new SalaryData(); row3.id = 3; row3.yearsExperience = 1.5; row3.salary = 37731.00;
SalaryData row4 = new SalaryData(); row4.id = 4; row4.yearsExperience = 2.0; row4.salary = 43525.00;
SalaryData row5 = new SalaryData(); row5.id = 5; row5.yearsExperience = 2.2; row5.salary = 39891.00;
SalaryData row6 = new SalaryData(); row6.id = 6; row6.yearsExperience = 2.9; row6.salary = 56642.00;
SalaryData row7 = new SalaryData(); row7.id = 7; row7.yearsExperience = 3.0; row7.salary = 60150.00;
SalaryData row8 = new SalaryData(); row8.id = 8; row8.yearsExperience = 3.2; row8.salary = 54445.00;
SalaryData row9 = new SalaryData(); row9.id = 9; row9.yearsExperience = 3.2; row9.salary = 64445.00;
SalaryData row10 = new SalaryData(); row10.id = 10; row10.yearsExperience = 3.7; row10.salary = 57189.00;
SalaryData row11 = new SalaryData(); row11.id = 11; row11.yearsExperience = 3.9; row11.salary = 63218.00;
SalaryData row12 = new SalaryData(); row12.id = 12; row12.yearsExperience = 4.0; row12.salary = 55794.00;
SalaryData row13 = new SalaryData(); row13.id = 13; row13.yearsExperience = 4.0; row13.salary = 56957.00;
SalaryData row14 = new SalaryData(); row14.id = 14; row14.yearsExperience = 4.1; row14.salary = 57081.00;
SalaryData row15 = new SalaryData(); row15.id = 15; row15.yearsExperience = 4.5; row15.salary = 61111.00;
SalaryData row16 = new SalaryData(); row16.id = 16; row16.yearsExperience = 4.9; row16.salary = 67938.00;
SalaryData row17 = new SalaryData(); row17.id = 17; row17.yearsExperience = 5.1; row17.salary = 66029.00;
SalaryData row18 = new SalaryData(); row18.id = 18; row18.yearsExperience = 5.3; row18.salary = 83088.00;
SalaryData row19 = new SalaryData(); row19.id = 19; row19.yearsExperience = 5.9; row19.salary = 81363.00;
SalaryData row20 = new SalaryData(); row20.id = 20; row20.yearsExperience = 6.0; row20.salary = 93940.00;
SalaryData row21 = new SalaryData(); row21.id = 21; row21.yearsExperience = 6.8; row21.salary = 91738.00;
SalaryData row22 = new SalaryData(); row22.id = 22; row22.yearsExperience = 7.1; row22.salary = 98273.00;
SalaryData row23 = new SalaryData(); row23.id = 23; row23.yearsExperience = 7.9; row23.salary = 101302.00;
SalaryData row24 = new SalaryData(); row24.id = 24; row24.yearsExperience = 8.2; row24.salary = 113812.00;
SalaryData row25 = new SalaryData(); row25.id = 25; row25.yearsExperience = 8.7; row25.salary = 109431.00;

Let’s create the SalaryData container where the data is to be stored:

Collection<Integer, SalaryData> sd = store.putCollection("SalaryData", SalaryData.class);

We can now call the put() function to flush the data into the database:

sd.put(row1); sd.put(row2); sd.put(row3); sd.put(row4); sd.put(row5);
sd.put(row6); sd.put(row7); sd.put(row8); sd.put(row9); sd.put(row10);
sd.put(row11); sd.put(row12); sd.put(row13); sd.put(row14); sd.put(row15);
sd.put(row16); sd.put(row17); sd.put(row18); sd.put(row19); sd.put(row20);
sd.put(row21); sd.put(row22); sd.put(row23); sd.put(row24); sd.put(row25);

Retrieve the Data

We should now retrieve the data from GridDB and use it to fit a machine learning model.
We will write a TQL query that selects and returns all the data stored in the SalaryData container:

Query<SalaryData> query = sd.query("select *");
RowSet<SalaryData> rs = query.fetch(false);

// Collect the rows before converting them into the 2 x n array expected below
List<Double> xValues = new ArrayList<Double>();
List<Double> yValues = new ArrayList<Double>();

System.out.println("Training dataset:");
while (rs.hasNext()) {
    SalaryData sd1 = rs.next();
    System.out.println(sd1.yearsExperience + " " + sd1.salary);
    xValues.add(sd1.yearsExperience);
    yValues.add(sd1.salary);
}

double[][] data = new double[2][xValues.size()];
for (int i = 0; i < xValues.size(); i++) {
    data[0][i] = xValues.get(i);   // years of experience
    data[1][i] = yValues.get(i);   // salary
}

The select * is a TQL query that fetches all the data stored in the SalaryData container. The while loop iterates over all the rows of the GridDB container, collecting the yearsExperience and salary values, which are then stored in a two-dimensional array named data: data[0] holds the years of experience and data[1] holds the salaries.

Create Weka Instances

We now want to use the Weka machine learning library to fit a linear regression model, so we have to convert our data into Weka instances. We will first create attributes for the dataset and store them in a FastVector data structure. Next, we will create Weka instances from the dataset. Let’s first create the data structures for storing the attributes and the instances:

int numInstances = data[0].length;
FastVector atts = new FastVector();
List<Instance> instances = new ArrayList<Instance>();

We can now create a for loop and use it to iterate over the data while populating the FastVector with the attributes and filling in the instance values:

for (int dim = 0; dim < 2; dim++) {
    Attribute current = new Attribute("Attribute" + dim, dim);

    if (dim == 0) {
        for (int obj = 0; obj < numInstances; obj++) {
            instances.add(new SparseInstance(numInstances));
        }
    }

    for (int obj = 0; obj < numInstances; obj++) {
        instances.get(obj).setValue(current, data[dim][obj]);
    }

    atts.addElement(current);
}

Instances newDataset = new Instances("Dataset", atts, instances.size());

The loop variable dim iterates over the two attributes (years of experience and salary). For each data item, we create a SparseInstance and set its attribute values; the attributes and instance capacity are used to build an Instances object named newDataset.

Build a Linear Regression Model

The data instances are now ready, so we can use them to build a machine learning model. Let’s first import the necessary libraries from Weka:

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;

Let us set the class attribute for the dataset:

newDataset.setClassIndex(1);

The above line of code sets salary as the class attribute for the dataset. Next, we use the LinearRegression class of the Weka library to build a linear regression classifier.

for (Instance inst : instances)
    newDataset.add(inst);

Classifier classifier = new weka.classifiers.functions.LinearRegression();
classifier.buildClassifier(newDataset);

We first add all the instances to newDataset, then create a LinearRegression object named classifier and call its buildClassifier() function to train it on our dataset.

Make a Prediction

Let’s use our linear regression model to predict the salary of a person based on their years of experience. We will use the last instance of our dataset to make the prediction:

Instance pd = newDataset.lastInstance();
double value = classifier.classifyInstance(pd);
System.out.println(value);

pd is the last instance of our dataset. The classifyInstance() function predicts the value of salary for that instance.
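If you want to predict a salary for a value that is not in the training data, you can build a new instance yourself. The following sketch is not part of the original program; the 10-year figure is purely illustrative, and it reuses the same Weka classes as the code above.

// Hypothetical example: predict the salary for 10 years of experience.
Instance unseen = new SparseInstance(2);                 // two attributes: experience and salary
unseen.setDataset(newDataset);                           // attach the dataset so the attributes resolve
unseen.setValue(newDataset.attribute(0), 10.0);          // attribute 0 holds years of experience
double predictedSalary = classifier.classifyInstance(unseen);
System.out.println("Predicted salary for 10 years of experience: " + predictedSalary);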
Execute the Model

To run the model, first download the Weka API from the following URL: http://www.java2s.com/Code/Jar/w/weka.htm

Choose Weka version 3.7.0. Set the class paths for the gridstore.jar and weka-3-7-0.jar files by running the following commands in the terminal:

export CLASSPATH=$CLASSPATH:/usr/share/java/gridstore.jar
export CLASSPATH=$CLASSPATH:/mnt/c/Users/user/Desktop/weka-3.7.0.jar

The above commands may change depending on the location of your files.

Compile your .java file by running the following command:

javac Salary.java

Run the generated .class file using the following command:

java Salary

The model predicted a salary of 109431.0 for the last instance of the dataset.

More