Collect Cryptomarket Data with GridDB

Collecting historical data from the cryptocurrency market using the CoinGecko API and GridDB.

Cryptocurrencies have captured the world’s attention since the main ones, such as Bitcoin, reached record prices. Unlike traditional financial markets, cryptocurrencies are founded on an open ledger called the blockchain, which works thanks to a network of miners that validate transactions and secure them using cryptographic techniques.

Analyzing financial markets is worthwhile because it allows us to understand how they work, make predictions, or prepare for investing with broader and more accurate knowledge.

In particular, the decentralized way of working, the high volatility, and the growing interest in the short- and medium-term behavior of cryptocurrencies motivate us to look for reliable and fast tools for analysis and prediction in this dynamic market.

In this article, we want to use the features of GridDB to store the historical data of the Bitcoin price movements. For this, we will connect to the Coingecko API.

We will also develop a Python script to automate obtaining the information from the API and storing it in our GridDB.

What technologies will we be using?

We will build our tool on top of the following:

  • OS: Ubuntu 18.04
  • Python 3.6
  • GridDB server: v4.5.2 (the latest version available at the time of writing)
  • GridDB C client: v4.5.0
  • GridDB Python client: 0.8.3

Setting up the environment

Our environment preparation starts with Python 3.6 on an Ubuntu 18.04 operating system. We will then need GridDB, both the server and the client.

Following the instructions in the official repository, we first download the .deb package from the official release page:

wget https://github.com/griddb/griddb/releases/download/v4.5.2/griddb_4.5.2_amd64.deb

Then install it as usual:

sudo apt install ./griddb_4.5.2_amd64.deb

It is important to define the environment variables for GridDB. We must also log in as the gsadm user, set a password for the admin user, and start a node.

The next step is to download and install the GridDB C client:

wget https://download.opensuse.org/repositories/home:/knonomura/xUbuntu_18.04/amd64/griddb-c-client_4.5.0_amd64.deb
sudo apt install ./griddb-c-client_4.5.0_amd64.deb

As prerequisites for the GridDB Python client, we need to install NumPy, Pandas, and SWIG. SWIG can be built from source:

wget https://prdownloads.sourceforge.net/swig/swig-3.0.12.tar.gz
tar xvfz swig-3.0.12.tar.gz
cd swig-3.0.12
./configure
make
sudo make install

Conveniently, the Python client is now available on PyPI, so on Ubuntu 18.04 it can be installed simply with pip:

python3.6 -m pip install griddb_python

To confirm that the griddb_python package is properly installed, we can open the Python console and import it.
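
The same check can also be run from a script. The sketch below uses only the standard library's importlib to ask whether a module can be found on the current Python path. The helper name is_installed is ours and is not part of GridDB.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# Module names of the packages installed earlier in this article.
for name in ("griddb_python", "numpy", "pandas"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If any of the names is reported as missing, repeat the corresponding installation step before continuing.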

Getting the data

There are several options for getting the data that is generated every day in the cryptocurrency market. In this case, for illustration purposes, we will focus on the historical data of the main cryptocurrency, Bitcoin.

It is very interesting to study the behavior of the price of Bitcoin with respect to fiat currencies. Bitcoin was officially born in 2009, but there is no price data for its first years because it was not yet traded publicly. There are of course transactions stored in the first blocks of the chain, but the price was not set by the market.

Let’s see how we can get these historical data from Coingecko. We can start by installing the Python client:

python3.6 -m pip install pycoingecko

The pycoingecko repository explains how to use this API.

When writing the script we will import the Coingecko library in this way:

from pycoingecko import CoinGeckoAPI

Let’s import Pandas too, so we can manipulate the data as dataframes:

import pandas as pd

The first thing to do is to create an instance of the CoinGecko API client:

coingecko = CoinGeckoAPI()

Now, we can define the functions we will use.

First, we can get the market data of a given cryptocurrency using the get_coin_by_id API call. We save this information in a dictionary called cryptocurrency_data, which we then convert into a dataframe so that we can write it out as a CSV file.

def get_data(cryptocurrency):
    cryptocurrency_data = coingecko.get_coin_by_id(cryptocurrency, market_data='true', sparkline='true')
    df = pd.DataFrame.from_dict(cryptocurrency_data, orient='index')
    df.to_csv(r'cryptocurrency_data.csv')
    return df

For example, we know that 'bitcoin' is one of the cryptocurrencies available, so the following line will give us the market data available for Bitcoin:

get_data('bitcoin')

And the information we get in the csv looks like this:

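Prices are always quoted against another currency. To find out which quote currencies (such as USD) CoinGecko supports, we can define a function that wraps get_supported_vs_currencies. The identifiers of the cryptocurrencies themselves, such as 'bitcoin', can be listed with the client's get_coins_list call.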

def get_available_curr():
    return coingecko.get_supported_vs_currencies()

To narrow the search, we can define a function that returns the price on a specific date:

# Get the USD price of a cryptocurrency on a specific date.
# a_date must be a string in 'dd-mm-yyyy' format, as the API expects.
def get_price_by_date(cryptocurrency, a_date):
    data = coingecko.get_coin_history_by_id(cryptocurrency, a_date)
    price = data['market_data']['current_price']['usd']
    return price
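
The history endpoint expects the date as a string in day-month-year order (dd-mm-yyyy). A small helper can build that string from a datetime.date so the order is never mixed up. The name to_api_date is hypothetical and is not part of pycoingecko.

```python
from datetime import date


def to_api_date(day):
    """Format a date as 'dd-mm-yyyy', the layout expected for a_date."""
    return day.strftime("%d-%m-%Y")


print(to_api_date(date(2020, 12, 1)))  # 01-12-2020
```

The resulting string can be passed as the a_date argument of get_price_by_date.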

Finally, our goal here is to get the historical data of Bitcoin’s price. A function called get_historical_data will help us achieve it.

According to the documentation, the number of days determines the granularity of the data. For 1 day, the data is returned at intervals of minutes; for between 1 and 90 days, it is returned hourly; and for more than 90 days, it is returned daily.
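
The rule described above can be written down as a small function, which is handy when deciding how many days to request. This is only a sketch of the rule as stated in this article: the function name is ours, and treating exactly 90 days as hourly is an assumption about the boundary.

```python
def expected_granularity(number_of_days):
    """Return the resolution of the data for a given number of days."""
    if number_of_days <= 1:
        return "minutes"
    if number_of_days <= 90:  # assumption: exactly 90 days still counts as hourly
        return "hours"
    return "days"


for days in (1, 5, 365):
    print(days, "->", expected_granularity(days))
```

For the 5-day request used later in this article, the function returns "hours", which matches the number of values in the output.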

def get_historical_data(cryptocurrency, fiat_currency, number_of_days):
    # Each entry of 'prices' is a [timestamp, price] pair; keep only the price.
    historic_price = coingecko.get_coin_market_chart_by_id(cryptocurrency, fiat_currency, number_of_days)
    prices = [price[1] for price in historic_price['prices']]
    return prices

The returned variable prices is a list. For example, the price of Bitcoin in USD over the last 5 days can be obtained with:

print(get_historical_data('bitcoin', 'USD', 5))

And the output is:

[19259.08030258841, 19193.068890063274, 19206.672769667563, 19118.969144726925, 18965.63616668234, 18960.652709654136, 19050.430880588847, 18990.7793634949, 18844.52437618184, 18873.21199220193, 18734.216704469767, 18753.288921684914, 18818.24732423196, 18813.656564419583, 18481.79300558414, 17959.305407026557, 17814.023726259376, 17901.579048072937, 17672.036464087603, 16767.389103553738, 17016.96216829086, 17134.181554772298, 17289.54296210973, 17106.587656105632, 17331.4762052838, 17006.44771520346, 16823.201432127884, 17010.20985595975, 16721.488013156064, 16761.546965177848, 16637.362422462975, 16842.53065248619, 17195.84403474497, 17105.9312737208, 17075.8100346244, 17138.029512395206, 17393.535276669707, 17248.96543684629, 17195.531466234796, 17073.86409154308, 17192.598651467735, 17229.806019685024, 17232.39613479942, 16926.28405241609, 16870.3011306117, 16760.47144801828, 16841.31525546079, 16821.304443182802, 17061.937579188543, 16858.460475364325, 16797.14755015842, 16638.44643803269, 16779.904684746798, 16807.83803236001, 16820.094022073426, 16808.808218210037, 17055.615405688295, 17008.343058249597, 17054.024622349476, 17097.50909797396, 17160.944233312493, 17081.86851776995, 16991.665437686977, 17027.683600569744, 16987.94930527874, 17010.59041345089, 17041.641919991886, 17035.614023573155, 16922.606314865297, 16933.710467900757, 17200.332760677484, 17167.860885749073, 17123.01185956512, 17361.901621171553, 17404.918158919027, 17497.251070586633, 17667.212140138796, 17749.984303735313, 17698.71548702206, 17806.829994664855, 17732.254867093405, 17716.16974243831, 17767.95578065574, 17704.67191022015, 17637.20942656516, 17581.23011590924, 17743.80422275333, 17827.476832589644, 17777.23209954322, 17797.354567421582, 17728.82850859051, 17826.57123916383, 18135.82259357728, 18083.917418061254, 18165.51914403239, 18095.05067283884, 18050.805289775806, 18060.28166508652, 18036.475224790385, 18031.013344442, 18089.37740068174, 18091.290368854792, 
18112.333933562346, 18102.92697464547, 18202.5966538826, 18266.287411393914, 18140.052096296262, 18192.645660559352, 18408.78365577349, 18483.289178463645, 18494.51841080122, 18566.892532959428, 18535.14020869768, 18510.032786248743, 18509.742863375064, 18439.96937840414, 18453.345378206308, 18485.314339104578, 18516.004629775616, 18606.395740319094, 18801.45191322498]
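
The list above holds only prices, but each entry of historic_price['prices'] is a [timestamp, price] pair, with the timestamp in milliseconds since the Unix epoch. To store the series in GridDB we need both values. The sketch below is a minimal example under several assumptions: the container name, the connection settings (the multicast address, port, cluster name, and credentials of a default installation), and the helper names are ours and must be adapted to your own node. The sample timestamps are illustrative placeholders, not real market data.

```python
from datetime import datetime, timezone


def to_rows(price_points):
    """Convert [[timestamp_ms, price], ...] into [[naive UTC datetime, float], ...]."""
    rows = []
    for timestamp_ms, price in price_points:
        moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
        rows.append([moment.replace(tzinfo=None), float(price)])
    return rows


def store_rows(rows, container_name="bitcoin_usd"):
    """Write rows into a GridDB time series container (assumed settings)."""
    import griddb_python as griddb  # imported here so to_rows works without GridDB

    factory = griddb.StoreFactory.get_instance()
    # Assumption: default multicast settings and admin credentials; change as needed.
    store = factory.get_store(
        host="239.0.0.1",
        port=31999,
        cluster_name="defaultCluster",
        username="admin",
        password="admin",
    )
    container_info = griddb.ContainerInfo(
        container_name,
        [["timestamp", griddb.Type.TIMESTAMP], ["price", griddb.Type.DOUBLE]],
        griddb.ContainerType.TIME_SERIES,
        True,  # the timestamp column is the row key
    )
    container = store.put_container(container_info)
    for row in rows:
        container.put(row)


# Placeholder pairs in the API's shape; the timestamps are illustrative only.
sample = [[1606780800000, 19259.08], [1606784400000, 19193.07]]
rows = to_rows(sample)
print(rows[0])
# store_rows(rows)  # uncomment once a GridDB node is running
```

Keeping the conversion separate from the database call makes it possible to check the rows before anything is written. To use real data, pass historic_price['prices'] to to_rows instead of the sample list.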

Conclusion

Every day, cryptocurrencies become more attractive, whether for investment or purely for research. In either case, we will always need cutting-edge technologies to access the data, manipulate it, and extract useful analysis from it.

In this post we have shown how to extract historical data of the Bitcoin price with respect to USD using the CoinGecko API. We have also used GridDB and the GridDB Python client to store the data obtained.

GridDB gives us a great tool for working with this data: we can now perform time series analysis or make predictions based on the insight we have.

If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.