{"id":46697,"date":"2022-05-05T00:00:00","date_gmt":"2022-05-05T07:00:00","guid":{"rendered":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/"},"modified":"2025-11-13T12:55:56","modified_gmt":"2025-11-13T20:55:56","slug":"building-a-linear-regression-model-for-housing-data-using-python-and-griddb","status":"publish","type":"post","link":"https:\/\/griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/","title":{"rendered":"Building a Linear Regression Model for Housing Data using Python and GridDB"},"content":{"rendered":"<p>In this tutorial, we will explore a housing dataset using Python. We will first prune the dataset to fit our needs. Then we will see how we can build a machine learning model to fit the dataset and make predictions.<\/p>\n<p>The outline of the tutorial is as follows:<\/p>\n<ol>\n<li>Prerequisites<\/li>\n<li>About the Dataset<\/li>\n<li>Importing the Libraries<\/li>\n<li>Loading the Dataset<\/li>\n<li>Data Preprocessing<\/li>\n<li>Data Normalization<\/li>\n<li>Splitting the Dataset<\/li>\n<li>Building the Model<\/li>\n<li>Making Predictions<\/li>\n<li>Model Evaluation<\/li>\n<li>Conclusion<\/li>\n<\/ol>\n<h2>1&#46; Prerequisites<\/h2>\n<p>This tutorial was run in Jupyter Notebook (Anaconda version 4.8.3) with Python 3.8 on Windows 10. The following packages need to be installed before running the code:<\/p>\n<ol>\n<li><a href=\"https:\/\/pandas.pydata.org\/docs\/getting_started\/install.html\">Pandas<\/a><\/li>\n<li><a href=\"https:\/\/pypi.org\/project\/scikit-learn\/\">scikit-learn<\/a><\/li>\n<\/ol>\n<p>If you are using Anaconda, packages can be installed in several ways, such as through the user interface, the command line, or Jupyter Notebooks. The most conventional way to install a Python package is via <code>pip<\/code>. 
If you are using the command line or a terminal, type <code>pip install package-name<\/code>. Alternatively, run <code>conda install package-name<\/code> within the Anaconda environment.<\/p>\n<p>Note that we will cover two ways to load the dataset into the Python environment: using <code>Pandas<\/code> and using <code>GridDB<\/code>. To use GridDB from Python, the following packages are required:<\/p>\n<ol>\n<li><a href=\"https:\/\/github.com\/griddb\/c_client\">GridDB C-client<\/a><\/li>\n<li>SWIG (Simplified Wrapper and Interface Generator)<\/li>\n<li><a href=\"https:\/\/github.com\/griddb\/python_client\">GridDB Python Client<\/a><\/li>\n<\/ol>\n<h2>2&#46; About the Dataset<\/h2>\n<p>We will be using a snapshot of the Melbourne Housing Dataset, which was scraped from public resources and is available on <a href=\"https:\/\/www.kaggle.com\/dansbecker\/melbourne-housing-snapshot\">Kaggle<\/a>. The dataset has been preprocessed to some extent and contains 13,580 instances and 21 attributes. The dependent variable is the price of the property, while the other 20 attributes are independent. Let us now get started on the code.<\/p>\n<h2>3&#46; Importing the Libraries<\/h2>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">import pandas as pd\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import mean_absolute_error<\/code><\/pre>\n<\/div>\n<p>The above cell should execute without any output if the libraries are installed correctly. If you encounter an error, try the following:<\/p>\n<ol>\n<li>Confirm that the installation was successful. If not, try executing <code>pip install package-name<\/code> again. 
<\/li>\n<li>Check if your system is compatible with the package version.<\/li>\n<\/ol>\n<h2>4&#46; Loading the Dataset<\/h2>\n<h3>Using GridDB<\/h3>\n<p><a href=\"https:\/\/griddb.net\/en\/\">GridDB<\/a> is an open-source time-series database designed for handling large amounts of data. It is optimized for IoT and is highly efficient because of its in-memory architecture. Since dealing with files locally can lead to integration issues in a professional environment, using a reliable database becomes important. GridDB provides that reliability and scalability with fault tolerance.<\/p>\n<p>Moreover, GridDB&#8217;s <a href=\"https:\/\/github.com\/griddb\/python_client\">Python client<\/a> makes it much easier to query and manipulate the database directly from Python code. Learn more about the GridDB WebAPI <a href=\"https:\/\/griddb.net\/en\/blog\/griddb-webapi\/\">here<\/a>.<\/p>\n<p>Let us now go ahead and load our dataset.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">import griddb_python as griddb\n\n# NOTE: `container` is not defined in this snippet. It is assumed to be a\n# GridDB connection or container object for `melb_data` that was opened earlier.\nsql_statement = 'SELECT * FROM melb_data'\ndataset = pd.read_sql_query(sql_statement, container)<\/code><\/pre>\n<\/div>\n<p>The <code>dataset<\/code> variable will now hold the data as a pandas dataframe. If you are new to GridDB, a tutorial on <a href=\"https:\/\/griddb.net\/en\/blog\/using-pandas-dataframes-with-griddb\/\">how to insert data into GridDB<\/a> might be helpful.<\/p>\n<h3>Using Pandas<\/h3>\n<p>Another way to load the dataset is to read the CSV file directly with pandas.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">dataset = pd.read_csv(\"melb_data.csv\")<\/code><\/pre>\n<\/div>\n<h2>5&#46; Data Preprocessing<\/h2>\n<p>Great! 
Now that we have our dataset, let&#8217;s see what it actually looks like &#8211;<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">dataset.head()<\/code><\/pre>\n<\/div>\n<div style=\"overflow-y: hidden;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Suburb\n        <\/th>\n<th>\n          Address\n        <\/th>\n<th>\n          Rooms\n        <\/th>\n<th>\n          Type\n        <\/th>\n<th>\n          Price\n        <\/th>\n<th>\n          Method\n        <\/th>\n<th>\n          SellerG\n        <\/th>\n<th>\n          Date\n        <\/th>\n<th>\n          Distance\n        <\/th>\n<th>\n          Postcode\n        <\/th>\n<th>\n          &#8230;\n        <\/th>\n<th>\n          Bathroom\n        <\/th>\n<th>\n          Car\n        <\/th>\n<th>\n          Landsize\n        <\/th>\n<th>\n          BuildingArea\n        <\/th>\n<th>\n          YearBuilt\n        <\/th>\n<th>\n          CouncilArea\n        <\/th>\n<th>\n          Lattitude\n        <\/th>\n<th>\n          Longtitude\n        <\/th>\n<th>\n          Regionname\n        <\/th>\n<th>\n          Propertycount\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          0\n        <\/th>\n<td>\n          Abbotsford\n        <\/td>\n<td>\n          85 Turner St\n        <\/td>\n<td>\n          2\n        <\/td>\n<td>\n          h\n        <\/td>\n<td>\n          1480000.0\n        <\/td>\n<td>\n          S\n        <\/td>\n<td>\n          Biggin\n        <\/td>\n<td>\n          3\/12\/2016\n        <\/td>\n<td>\n          2.5\n        <\/td>\n<td>\n          3067.0\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          1.0\n 
       <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          202.0\n        <\/td>\n<td>\n          NaN\n        <\/td>\n<td>\n          NaN\n        <\/td>\n<td>\n          Yarra\n        <\/td>\n<td>\n          -37.7996\n        <\/td>\n<td>\n          144.9984\n        <\/td>\n<td>\n          Northern Metropolitan\n        <\/td>\n<td>\n          4019.0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          1\n        <\/th>\n<td>\n          Abbotsford\n        <\/td>\n<td>\n          25 Bloomburg St\n        <\/td>\n<td>\n          2\n        <\/td>\n<td>\n          h\n        <\/td>\n<td>\n          1035000.0\n        <\/td>\n<td>\n          S\n        <\/td>\n<td>\n          Biggin\n        <\/td>\n<td>\n          4\/02\/2016\n        <\/td>\n<td>\n          2.5\n        <\/td>\n<td>\n          3067.0\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          0.0\n        <\/td>\n<td>\n          156.0\n        <\/td>\n<td>\n          79.0\n        <\/td>\n<td>\n          1900.0\n        <\/td>\n<td>\n          Yarra\n        <\/td>\n<td>\n          -37.8079\n        <\/td>\n<td>\n          144.9934\n        <\/td>\n<td>\n          Northern Metropolitan\n        <\/td>\n<td>\n          4019.0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          2\n        <\/th>\n<td>\n          Abbotsford\n        <\/td>\n<td>\n          5 Charles St\n        <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          h\n        <\/td>\n<td>\n          1465000.0\n        <\/td>\n<td>\n          SP\n        <\/td>\n<td>\n          Biggin\n        <\/td>\n<td>\n          4\/03\/2017\n        <\/td>\n<td>\n          2.5\n        <\/td>\n<td>\n          3067.0\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          0.0\n        <\/td>\n<td>\n          134.0\n        <\/td>\n<td>\n          150.0\n        <\/td>\n<td>\n          1900.0\n        <\/td>\n<td>\n          Yarra\n        
<\/td>\n<td>\n          -37.8093\n        <\/td>\n<td>\n          144.9944\n        <\/td>\n<td>\n          Northern Metropolitan\n        <\/td>\n<td>\n          4019.0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          3\n        <\/th>\n<td>\n          Abbotsford\n        <\/td>\n<td>\n          40 Federation La\n        <\/td>\n<td>\n          3\n        <\/td>\n<td>\n          h\n        <\/td>\n<td>\n          850000.0\n        <\/td>\n<td>\n          PI\n        <\/td>\n<td>\n          Biggin\n        <\/td>\n<td>\n          4\/03\/2017\n        <\/td>\n<td>\n          2.5\n        <\/td>\n<td>\n          3067.0\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          94.0\n        <\/td>\n<td>\n          NaN\n        <\/td>\n<td>\n          NaN\n        <\/td>\n<td>\n          Yarra\n        <\/td>\n<td>\n          -37.7969\n        <\/td>\n<td>\n          144.9969\n        <\/td>\n<td>\n          Northern Metropolitan\n        <\/td>\n<td>\n          4019.0\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          4\n        <\/th>\n<td>\n          Abbotsford\n        <\/td>\n<td>\n          55a Park St\n        <\/td>\n<td>\n          4\n        <\/td>\n<td>\n          h\n        <\/td>\n<td>\n          1600000.0\n        <\/td>\n<td>\n          VB\n        <\/td>\n<td>\n          Nelson\n        <\/td>\n<td>\n          4\/06\/2016\n        <\/td>\n<td>\n          2.5\n        <\/td>\n<td>\n          3067.0\n        <\/td>\n<td>\n          &#8230;\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          120.0\n        <\/td>\n<td>\n          142.0\n        <\/td>\n<td>\n          2014.0\n        <\/td>\n<td>\n          Yarra\n        <\/td>\n<td>\n          -37.8072\n        <\/td>\n<td>\n          144.9941\n        <\/td>\n<td>\n          Northern Metropolitan\n        <\/td>\n<td>\n          4019.0\n        
<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\n    5 rows \u00d7 21 columns\n  <\/p>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">len(dataset)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    13580<\/code><\/pre>\n<\/div>\n<p>As we can see, there are a lot of columns. Let&#8217;s go ahead and print out the column names to get a better idea of the independent and dependent attributes.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">dataset.columns<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    Index(['Suburb', 'Address', 'Rooms', 'Type', 'Price', 'Method', 'SellerG',\n           'Date', 'Distance', 'Postcode', 'Bedroom2', 'Bathroom', 'Car',\n           'Landsize', 'BuildingArea', 'YearBuilt', 'CouncilArea', 'Lattitude',\n           'Longtitude', 'Regionname', 'Propertycount'],\n          dtype='object')<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">dataset.describe()<\/code><\/pre>\n<\/div>\n<div style=\"overflow-y: hidden;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Rooms\n        <\/th>\n<th>\n          Price\n        <\/th>\n<th>\n          Distance\n        <\/th>\n<th>\n          Postcode\n        <\/th>\n<th>\n          Bedroom2\n        <\/th>\n<th>\n          Bathroom\n        <\/th>\n<th>\n          Car\n        <\/th>\n<th>\n          Landsize\n        <\/th>\n<th>\n          BuildingArea\n        <\/th>\n<th>\n          YearBuilt\n        <\/th>\n<th>\n          Lattitude\n        <\/th>\n<th>\n          Longtitude\n        <\/th>\n<th>\n          
Propertycount\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          count\n        <\/th>\n<td>\n          13580.000000\n        <\/td>\n<td>\n          1.358000e+04\n        <\/td>\n<td>\n          13580.000000\n        <\/td>\n<td>\n          13580.000000\n        <\/td>\n<td>\n          13580.000000\n        <\/td>\n<td>\n          13580.000000\n        <\/td>\n<td>\n          13518.000000\n        <\/td>\n<td>\n          13580.000000\n        <\/td>\n<td>\n          7130.000000\n        <\/td>\n<td>\n          8205.000000\n        <\/td>\n<td>\n          13580.000000\n        <\/td>\n<td>\n          13580.000000\n        <\/td>\n<td>\n          13580.000000\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          mean\n        <\/th>\n<td>\n          2.937997\n        <\/td>\n<td>\n          1.075684e+06\n        <\/td>\n<td>\n          10.137776\n        <\/td>\n<td>\n          3105.301915\n        <\/td>\n<td>\n          2.914728\n        <\/td>\n<td>\n          1.534242\n        <\/td>\n<td>\n          1.610075\n        <\/td>\n<td>\n          558.416127\n        <\/td>\n<td>\n          151.967650\n        <\/td>\n<td>\n          1964.684217\n        <\/td>\n<td>\n          -37.809203\n        <\/td>\n<td>\n          144.995216\n        <\/td>\n<td>\n          7454.417378\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          std\n        <\/th>\n<td>\n          0.955748\n        <\/td>\n<td>\n          6.393107e+05\n        <\/td>\n<td>\n          5.868725\n        <\/td>\n<td>\n          90.676964\n        <\/td>\n<td>\n          0.965921\n        <\/td>\n<td>\n          0.691712\n        <\/td>\n<td>\n          0.962634\n        <\/td>\n<td>\n          3990.669241\n        <\/td>\n<td>\n          541.014538\n        <\/td>\n<td>\n          37.273762\n        <\/td>\n<td>\n          0.079260\n        <\/td>\n<td>\n          0.103916\n        <\/td>\n<td>\n          4378.581772\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          min\n        <\/th>\n<td>\n          
1.000000\n        <\/td>\n<td>\n          8.500000e+04\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          3000.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          1196.000000\n        <\/td>\n<td>\n          -38.182550\n        <\/td>\n<td>\n          144.431810\n        <\/td>\n<td>\n          249.000000\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          25%\n        <\/th>\n<td>\n          2.000000\n        <\/td>\n<td>\n          6.500000e+05\n        <\/td>\n<td>\n          6.100000\n        <\/td>\n<td>\n          3044.000000\n        <\/td>\n<td>\n          2.000000\n        <\/td>\n<td>\n          1.000000\n        <\/td>\n<td>\n          1.000000\n        <\/td>\n<td>\n          177.000000\n        <\/td>\n<td>\n          93.000000\n        <\/td>\n<td>\n          1940.000000\n        <\/td>\n<td>\n          -37.856822\n        <\/td>\n<td>\n          144.929600\n        <\/td>\n<td>\n          4380.000000\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          50%\n        <\/th>\n<td>\n          3.000000\n        <\/td>\n<td>\n          9.030000e+05\n        <\/td>\n<td>\n          9.200000\n        <\/td>\n<td>\n          3084.000000\n        <\/td>\n<td>\n          3.000000\n        <\/td>\n<td>\n          1.000000\n        <\/td>\n<td>\n          2.000000\n        <\/td>\n<td>\n          440.000000\n        <\/td>\n<td>\n          126.000000\n        <\/td>\n<td>\n          1970.000000\n        <\/td>\n<td>\n          -37.802355\n        <\/td>\n<td>\n          145.000100\n        <\/td>\n<td>\n          6555.000000\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          75%\n        <\/th>\n<td>\n          3.000000\n        <\/td>\n<td>\n          1.330000e+06\n        <\/td>\n<td>\n          13.000000\n        <\/td>\n<td>\n          3148.000000\n        
<\/td>\n<td>\n          3.000000\n        <\/td>\n<td>\n          2.000000\n        <\/td>\n<td>\n          2.000000\n        <\/td>\n<td>\n          651.000000\n        <\/td>\n<td>\n          174.000000\n        <\/td>\n<td>\n          1999.000000\n        <\/td>\n<td>\n          -37.756400\n        <\/td>\n<td>\n          145.058305\n        <\/td>\n<td>\n          10331.000000\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          max\n        <\/th>\n<td>\n          10.000000\n        <\/td>\n<td>\n          9.000000e+06\n        <\/td>\n<td>\n          48.100000\n        <\/td>\n<td>\n          3977.000000\n        <\/td>\n<td>\n          20.000000\n        <\/td>\n<td>\n          8.000000\n        <\/td>\n<td>\n          10.000000\n        <\/td>\n<td>\n          433014.000000\n        <\/td>\n<td>\n          44515.000000\n        <\/td>\n<td>\n          2018.000000\n        <\/td>\n<td>\n          -37.408530\n        <\/td>\n<td>\n          145.526350\n        <\/td>\n<td>\n          21650.000000\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>The output of the <code>describe<\/code> function conveys that the value of each attribute has a different scale. 
Therefore, we will need to normalize the data before building our model.<\/p>\n<p>Before normalization, we will take a subset of the attributes that seem most directly related to the price.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">dataset = dataset[[\"Rooms\", \"Price\", \"Bedroom2\", \"Bathroom\", \"Landsize\", \"BuildingArea\", \"YearBuilt\"]]<\/code><\/pre>\n<\/div>\n<p>We also need to make sure that our data does not contain any null values before proceeding to model building.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">dataset.isna().sum()<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    Rooms              0\n    Price              0\n    Bedroom2           0\n    Bathroom           0\n    Landsize           0\n    BuildingArea    6450\n    YearBuilt       5375\n    dtype: int64<\/code><\/pre>\n<\/div>\n<p>As we can see, two attributes, <code>BuildingArea<\/code> and <code>YearBuilt<\/code>, contain thousands of null values. Let&#8217;s go ahead and drop those instances.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">dataset = dataset.dropna()<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">len(dataset)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    6858<\/code><\/pre>\n<\/div>\n<p>We will now create a new attribute called <code>HouseAge<\/code>. Its values are derived by subtracting the <code>YearBuilt<\/code> attribute from the current year (2022 at the time of writing). This is helpful because we do not have to deal with dates anymore. 
All the attributes are now numerical in nature which will help us with the Machine Learning part later on.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">dataset['HouseAge'] = 2022 - dataset[\"YearBuilt\"].astype(int)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">dataset.head()<\/code><\/pre>\n<\/div>\n<div style=\"overflow-y: hidden;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Rooms\n        <\/th>\n<th>\n          Price\n        <\/th>\n<th>\n          Bedroom2\n        <\/th>\n<th>\n          Bathroom\n        <\/th>\n<th>\n          Landsize\n        <\/th>\n<th>\n          BuildingArea\n        <\/th>\n<th>\n          YearBuilt\n        <\/th>\n<th>\n          HouseAge\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          1\n        <\/th>\n<td>\n          2\n        <\/td>\n<td>\n          1035000.0\n        <\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          156.0\n        <\/td>\n<td>\n          79.0\n        <\/td>\n<td>\n          1900.0\n        <\/td>\n<td>\n          122\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          2\n        <\/th>\n<td>\n          3\n        <\/td>\n<td>\n          1465000.0\n        <\/td>\n<td>\n          3.0\n        <\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          134.0\n        <\/td>\n<td>\n          150.0\n        <\/td>\n<td>\n          1900.0\n        <\/td>\n<td>\n          122\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          4\n        <\/th>\n<td>\n          4\n        <\/td>\n<td>\n          1600000.0\n        <\/td>\n<td>\n          3.0\n        
<\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          120.0\n        <\/td>\n<td>\n          142.0\n        <\/td>\n<td>\n          2014.0\n        <\/td>\n<td>\n          8\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          6\n        <\/th>\n<td>\n          3\n        <\/td>\n<td>\n          1876000.0\n        <\/td>\n<td>\n          4.0\n        <\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          245.0\n        <\/td>\n<td>\n          210.0\n        <\/td>\n<td>\n          1910.0\n        <\/td>\n<td>\n          112\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          7\n        <\/th>\n<td>\n          2\n        <\/td>\n<td>\n          1636000.0\n        <\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          256.0\n        <\/td>\n<td>\n          107.0\n        <\/td>\n<td>\n          1890.0\n        <\/td>\n<td>\n          132\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>Great! The <code>YearBuilt<\/code> attribute is not needed anymore. 
So, let&#8217;s go ahead and drop that.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">dataset = dataset.drop(\"YearBuilt\", axis=1)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">dataset.head()<\/code><\/pre>\n<\/div>\n<div style=\"overflow-y: hidden;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Rooms\n        <\/th>\n<th>\n          Price\n        <\/th>\n<th>\n          Bedroom2\n        <\/th>\n<th>\n          Bathroom\n        <\/th>\n<th>\n          Landsize\n        <\/th>\n<th>\n          BuildingArea\n        <\/th>\n<th>\n          HouseAge\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          1\n        <\/th>\n<td>\n          2\n        <\/td>\n<td>\n          1035000.0\n        <\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          156.0\n        <\/td>\n<td>\n          79.0\n        <\/td>\n<td>\n          122\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          2\n        <\/th>\n<td>\n          3\n        <\/td>\n<td>\n          1465000.0\n        <\/td>\n<td>\n          3.0\n        <\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          134.0\n        <\/td>\n<td>\n          150.0\n        <\/td>\n<td>\n          122\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          4\n        <\/th>\n<td>\n          4\n        <\/td>\n<td>\n          1600000.0\n        <\/td>\n<td>\n          3.0\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          120.0\n        <\/td>\n<td>\n          142.0\n        <\/td>\n<td>\n          8\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          6\n        <\/th>\n<td>\n 
         3\n        <\/td>\n<td>\n          1876000.0\n        <\/td>\n<td>\n          4.0\n        <\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          245.0\n        <\/td>\n<td>\n          210.0\n        <\/td>\n<td>\n          112\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          7\n        <\/th>\n<td>\n          2\n        <\/td>\n<td>\n          1636000.0\n        <\/td>\n<td>\n          2.0\n        <\/td>\n<td>\n          1.0\n        <\/td>\n<td>\n          256.0\n        <\/td>\n<td>\n          107.0\n        <\/td>\n<td>\n          132\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<h2>6&#46; Data Normalization<\/h2>\n<p>As we saw before, the attributes have different scales, which can cause features with larger values to dominate those with smaller ones. Therefore, it is important to bring all the values onto one scale. For that, we will use <code>Min-Max Normalization<\/code>. It is one of the most common techniques: the minimum value of each attribute is mapped to 0, the maximum value is mapped to 1, and all other values fall between 0 and 1.<\/p>\n<p>Ready-made normalization utilities exist (for example, scikit-learn&#8217;s <code>MinMaxScaler<\/code>), but by default they return a NumPy array rather than a dataframe. Hence, we lose the column names. 
For that reason, we will define our own method which takes in a dataframe and returns a new normalized dataframe.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">def normalize(df):\n    result = df.copy()\n    for feature_name in df.columns:\n        max_value = df[feature_name].max()\n        min_value = df[feature_name].min()\n        result[feature_name] = (df[feature_name] - min_value) \/ (max_value - min_value)\n    return result<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">df = normalize(dataset)<\/code><\/pre>\n<\/div>\n<p>Let&#8217;s have a look at our normalized dataframe.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">df.head()<\/code><\/pre>\n<\/div>\n<div style=\"overflow-y: hidden;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Rooms\n        <\/th>\n<th>\n          Price\n        <\/th>\n<th>\n          Bedroom2\n        <\/th>\n<th>\n          Bathroom\n        <\/th>\n<th>\n          Landsize\n        <\/th>\n<th>\n          BuildingArea\n        <\/th>\n<th>\n          HouseAge\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          1\n        <\/th>\n<td>\n          0.142857\n        <\/td>\n<td>\n          0.101928\n        <\/td>\n<td>\n          0.222222\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.004216\n        <\/td>\n<td>\n          0.025386\n        <\/td>\n<td>\n          0.143552\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          2\n        <\/th>\n<td>\n          0.285714\n        <\/td>\n<td>\n          0.150412\n        <\/td>\n<td>\n          0.333333\n        <\/td>\n<td>\n  
        0.142857\n        <\/td>\n<td>\n          0.003622\n        <\/td>\n<td>\n          0.048201\n        <\/td>\n<td>\n          0.143552\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          4\n        <\/th>\n<td>\n          0.428571\n        <\/td>\n<td>\n          0.165633\n        <\/td>\n<td>\n          0.333333\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.003243\n        <\/td>\n<td>\n          0.045630\n        <\/td>\n<td>\n          0.004866\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          6\n        <\/th>\n<td>\n          0.285714\n        <\/td>\n<td>\n          0.196753\n        <\/td>\n<td>\n          0.444444\n        <\/td>\n<td>\n          0.142857\n        <\/td>\n<td>\n          0.006622\n        <\/td>\n<td>\n          0.067481\n        <\/td>\n<td>\n          0.131387\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          7\n        <\/th>\n<td>\n          0.142857\n        <\/td>\n<td>\n          0.169692\n        <\/td>\n<td>\n          0.222222\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.006919\n        <\/td>\n<td>\n          0.034383\n        <\/td>\n<td>\n          0.155718\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>As we can see, all the values lie between 0 and 1. It is now time to split our dataset into <code>train<\/code> and <code>test<\/code>.<\/p>\n<h2>7&#46; Splitting the dataset<\/h2>\n<p>We will be doing a <code>70-30<\/code> train-test split. 
In the case of smaller datasets, one can also do <code>80-20<\/code>.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">train, test = train_test_split(df, test_size=0.3)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">len(train)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    4800<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">len(test)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    2058<\/code><\/pre>\n<\/div>\n<p>Let&#8217;s now separate the dependent and independent variables.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">train_y = train[[\"Price\"]]\ntrain_x = train.drop([\"Price\"], axis=1)\ntest_y = test[[\"Price\"]]\ntest_x = test.drop([\"Price\"], axis=1)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">train_x.head()<\/code><\/pre>\n<\/div>\n<div style=\"overflow-y: hidden;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }<\/p>\n<p>    .dataframe tbody tr th {\n        vertical-align: top;\n    }<\/p>\n<p>    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Rooms\n        <\/th>\n<th>\n          Bedroom2\n        <\/th>\n<th>\n          Bathroom\n        <\/th>\n<th>\n          Landsize\n        <\/th>\n<th>\n          BuildingArea\n        <\/th>\n<th>\n          HouseAge\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          4860\n        <\/th>\n<td>\n          0.285714\n        <\/td>\n<td>\n          0.333333\n        <\/td>\n<td>\n          0.142857\n        <\/td>\n<td>\n          0.222054\n        <\/td>\n<td>\n          0.041774\n        <\/td>\n<td>\n          0.027981\n        
<\/td>\n<\/tr>\n<tr>\n<th>\n          3434\n        <\/th>\n<td>\n          0.285714\n        <\/td>\n<td>\n          0.333333\n        <\/td>\n<td>\n          0.142857\n        <\/td>\n<td>\n          0.018459\n        <\/td>\n<td>\n          0.064267\n        <\/td>\n<td>\n          0.027981\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          6048\n        <\/th>\n<td>\n          0.285714\n        <\/td>\n<td>\n          0.333333\n        <\/td>\n<td>\n          0.285714\n        <\/td>\n<td>\n          0.005973\n        <\/td>\n<td>\n          0.049807\n        <\/td>\n<td>\n          0.008516\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          9918\n        <\/th>\n<td>\n          0.142857\n        <\/td>\n<td>\n          0.222222\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.007135\n        <\/td>\n<td>\n          0.031170\n        <\/td>\n<td>\n          0.131387\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          7855\n        <\/th>\n<td>\n          0.142857\n        <\/td>\n<td>\n          0.222222\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.000000\n        <\/td>\n<td>\n          0.030848\n        <\/td>\n<td>\n          0.155718\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">train_y.head()<\/code><\/pre>\n<\/div>\n<div style=\"overflow-y: hidden;\">\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n  <\/style>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr style=\"text-align: right;\">\n<th>\n        <\/th>\n<th>\n          Price\n        <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th>\n          4860\n        <\/th>\n<td>\n          0.103619\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          3434\n        <\/th>\n<td>\n          
0.064494\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          6048\n        <\/th>\n<td>\n          0.055136\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          9918\n        <\/th>\n<td>\n          0.174879\n        <\/td>\n<\/tr>\n<tr>\n<th>\n          7855\n        <\/th>\n<td>\n          0.053219\n        <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<h2>8&#46; Building the Model<\/h2>\n<p>We will use a <code>Linear Regression<\/code> model in this case. Since this is a simple dataset, a Linear Regression model is a reasonable first choice. For a more flexible model, one can also try Decision Trees.<\/p>\n<p>Explore more about Linear Regression with GridDB and Python <a href=\"https:\/\/griddb.net\/en\/blog\/create-a-machine-learning-model-using-griddb\/\">here<\/a>.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">model = LinearRegression()\nmodel.fit(train_x, train_y)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    LinearRegression()<\/code><\/pre>\n<\/div>\n<h2>9&#46; Making Predictions<\/h2>\n<p>Let us now make predictions on our <code>test<\/code> dataset.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">predictions = model.predict(test_x)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">predictions<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    array([[0.0890521 ],\n           [0.06244483],\n           [0.13166691],\n           ...,\n           [0.09182388],\n           [0.20981148],\n           [0.1077662 ]])<\/code><\/pre>\n<\/div>\n<h2>10&#46; Model Evaluation<\/h2>\n<p>To quantify how good our predictions are, we can use one of the several <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/model_evaluation.html\">metrics<\/a> provided by the <code>sklearn<\/code> library. 
We will be using the <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/model_evaluation.html#mean-absolute-error\">mean_absolute_error<\/a> metric, which is one of the most common metrics for regression models.<\/p>\n<div class=\"clipboard\">\n<pre><code class=\"language-python\">mean_absolute_error(test_y, predictions)<\/code><\/pre>\n<\/div>\n<div class=\"clipboard\">\n<pre><code class=\"language-sh\">    0.035125149637253696<\/code><\/pre>\n<\/div>\n<p>Great! Our model has a mean absolute error of about <code>0.035<\/code> on the normalized price scale, which is not a bad start for a Linear Regression model.<\/p>\n<h2>11&#46; Conclusion<\/h2>\n<p>In this tutorial, we saw how we can build a Machine Learning model for a housing dataset. In the beginning, we covered two methods for loading our dataset into the environment &#8211; <code>GridDB<\/code> and Pandas. We also pruned the dataset as per our needs. Later on, we used the <code>LinearRegression<\/code> class provided by the <code>sklearn<\/code> library to fit our dataset.<\/p>\n<p>Learn more about real-time predictions with GridDB and Python <a href=\"https:\/\/griddb.net\/en\/blog\/performing-real-time-predictions-using-machine-learning-griddb-and-python\/\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this tutorial, we will explore a housing dataset using Python. We will first prune the dataset as per our needs. Later, we will see how can we build a Machine Learning model to fit our dataset and make future predictions. 
The outline of the tutorial is as follows: Pre-requisites About the Dataset Importing the [&hellip;]<\/p>\n","protected":false},"author":41,"featured_media":27038,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[121],"tags":[],"class_list":["post-46697","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Building a Linear Regression Model for Housing Data using Python and GridDB | GridDB: Open Source Time Series Database for IoT<\/title>\n<meta name=\"description\" content=\"In this tutorial, we will explore a housing dataset using Python. We will first prune the dataset as per our needs. Later, we will see how can we build a\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Building a Linear Regression Model for Housing Data using Python and GridDB | GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"og:description\" content=\"In this tutorial, we will explore a housing dataset using Python. We will first prune the dataset as per our needs. 
Later, we will see how can we build a\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/\" \/>\n<meta property=\"og:site_name\" content=\"GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/griddbcommunity\/\" \/>\n<meta property=\"article:published_time\" content=\"2022-05-05T07:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-13T20:55:56+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/griddb.net\/wp-content\/uploads\/2020\/11\/architecture_1920x1441.jpeg\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1441\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"griddb-admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:site\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"griddb-admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/\"},\"author\":{\"name\":\"griddb-admin\",\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233\"},\"headline\":\"Building a Linear Regression Model for Housing Data using Python and GridDB\",\"datePublished\":\"2022-05-05T07:00:00+00:00\",\"dateModified\":\"2025-11-13T20:55:56+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/\"},\"wordCount\":1187,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2020\/11\/architecture_1920x1441.jpeg\",\"articleSection\":[\"Blog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/\",\"url\":\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/\",\"name\":\"Building a Linear Regression Model for Housing Data 
using Python and GridDB | GridDB: Open Source Time Series Database for IoT\",\"isPartOf\":{\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/#primaryimage\"},\"thumbnailUrl\":\"\/wp-content\/uploads\/2020\/11\/architecture_1920x1441.jpeg\",\"datePublished\":\"2022-05-05T07:00:00+00:00\",\"dateModified\":\"2025-11-13T20:55:56+00:00\",\"description\":\"In this tutorial, we will explore a housing dataset using Python. We will first prune the dataset as per our needs. Later, we will see how can we build a\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/#primaryimage\",\"url\":\"\/wp-content\/uploads\/2020\/11\/architecture_1920x1441.jpeg\",\"contentUrl\":\"\/wp-content\/uploads\/2020\/11\/architecture_1920x1441.jpeg\",\"width\":1920,\"height\":1441},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#website\",\"url\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/\",\"name\":\"GridDB: Open Source Time Series Database for IoT\",\"description\":\"GridDB is an open source time-series database with the performance of NoSQL and convenience of 
SQL\",\"publisher\":{\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#organization\",\"name\":\"Fixstars\",\"url\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"contentUrl\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"width\":200,\"height\":83,\"caption\":\"Fixstars\"},\"image\":{\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/griddbcommunity\/\",\"https:\/\/x.com\/GridDBCommunity\",\"https:\/\/www.linkedin.com\/company\/griddb-by-toshiba\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233\",\"name\":\"griddb-admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea5
9709f735b63ae30d0342?s=96&d=mm&r=g\",\"caption\":\"griddb-admin\"},\"url\":\"https:\/\/griddb.net\/en\/author\/griddb-admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Building a Linear Regression Model for Housing Data using Python and GridDB | GridDB: Open Source Time Series Database for IoT","description":"In this tutorial, we will explore a housing dataset using Python. We will first prune the dataset as per our needs. Later, we will see how can we build a","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/","og_locale":"en_US","og_type":"article","og_title":"Building a Linear Regression Model for Housing Data using Python and GridDB | GridDB: Open Source Time Series Database for IoT","og_description":"In this tutorial, we will explore a housing dataset using Python. We will first prune the dataset as per our needs. Later, we will see how can we build a","og_url":"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/","og_site_name":"GridDB: Open Source Time Series Database for IoT","article_publisher":"https:\/\/www.facebook.com\/griddbcommunity\/","article_published_time":"2022-05-05T07:00:00+00:00","article_modified_time":"2025-11-13T20:55:56+00:00","og_image":[{"width":1920,"height":1441,"url":"https:\/\/griddb.net\/wp-content\/uploads\/2020\/11\/architecture_1920x1441.jpeg","type":"image\/jpeg"}],"author":"griddb-admin","twitter_card":"summary_large_image","twitter_creator":"@GridDBCommunity","twitter_site":"@GridDBCommunity","twitter_misc":{"Written by":"griddb-admin","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/#article","isPartOf":{"@id":"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/"},"author":{"name":"griddb-admin","@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233"},"headline":"Building a Linear Regression Model for Housing Data using Python and GridDB","datePublished":"2022-05-05T07:00:00+00:00","dateModified":"2025-11-13T20:55:56+00:00","mainEntityOfPage":{"@id":"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/"},"wordCount":1187,"commentCount":0,"publisher":{"@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#organization"},"image":{"@id":"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/#primaryimage"},"thumbnailUrl":"\/wp-content\/uploads\/2020\/11\/architecture_1920x1441.jpeg","articleSection":["Blog"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/","url":"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/","name":"Building a Linear Regression Model for Housing Data using Python and GridDB | GridDB: Open Source Time Series Database for 
IoT","isPartOf":{"@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/#primaryimage"},"image":{"@id":"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/#primaryimage"},"thumbnailUrl":"\/wp-content\/uploads\/2020\/11\/architecture_1920x1441.jpeg","datePublished":"2022-05-05T07:00:00+00:00","dateModified":"2025-11-13T20:55:56+00:00","description":"In this tutorial, we will explore a housing dataset using Python. We will first prune the dataset as per our needs. Later, we will see how can we build a","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.griddb.net\/en\/blog\/building-a-linear-regression-model-for-housing-data-using-python-and-griddb\/#primaryimage","url":"\/wp-content\/uploads\/2020\/11\/architecture_1920x1441.jpeg","contentUrl":"\/wp-content\/uploads\/2020\/11\/architecture_1920x1441.jpeg","width":1920,"height":1441},{"@type":"WebSite","@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#website","url":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/","name":"GridDB: Open Source Time Series Database for IoT","description":"GridDB is an open source time-series database with the performance of NoSQL and convenience of 
SQL","publisher":{"@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#organization","name":"Fixstars","url":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#\/schema\/logo\/image\/","url":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","contentUrl":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","width":200,"height":83,"caption":"Fixstars"},"image":{"@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/griddbcommunity\/","https:\/\/x.com\/GridDBCommunity","https:\/\/www.linkedin.com\/company\/griddb-by-toshiba"]},{"@type":"Person","@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#\/schema\/person\/4fe914ca9576878e82f5e8dd3ba52233","name":"griddb-admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/en\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5bceca1cafc06886a7ba873e2f0a28011a1176c4dea59709f735b63ae30d0342?s=96&d=mm&r=g","caption":"griddb-admin"},"url":"https:\/\/griddb.net\/en\/author\/griddb-admin\/"}]}},"_links":
{"self":[{"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/posts\/46697","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/users\/41"}],"replies":[{"embeddable":true,"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/comments?post=46697"}],"version-history":[{"count":1,"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/posts\/46697\/revisions"}],"predecessor-version":[{"id":51371,"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/posts\/46697\/revisions\/51371"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/media\/27038"}],"wp:attachment":[{"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/media?parent=46697"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/categories?post=46697"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/tags?post=46697"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}