{"id":46556,"date":"2017-09-07T00:00:00","date_gmt":"2017-09-07T07:00:00","guid":{"rendered":"https:\/\/griddb-linux-hte8hndjf8cka8ht.westus-01.azurewebsites.net\/blog\/connector-apache-spark\/"},"modified":"2017-09-07T00:00:00","modified_gmt":"2017-09-07T07:00:00","slug":"connector-apache-spark","status":"publish","type":"post","link":"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/","title":{"rendered":"GridDB Connector for Apache Spark"},"content":{"rendered":"<h2> Introduction <\/h2>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Apache_Spark\">Apache Spark <\/a> now has support to fully integrate GridDB into its workflow. For those unaware, Spark is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Free_and_open-source_software\">FOSS<\/a> which saw its initial release in 2014. Since then, it has very quickly established itself as an important piece of Big Data processing and analyzing. This blog post is meant give instructions on how to install Spark on your GridDB machine and will also go over some brief queries to provide a tangible look at its usage. <\/p>\n<p>As briefly explained before, Apache Spark is a parallel data processing framework meant to provide <i>fast<\/i> data analytics. Using the <a href=\"https:\/\/github.com\/griddb\/griddb_spark\">GridDB connector <\/a> allows a GridDB database to be used as an input source for Spark queries and analytics. Its interactive shell can be used to quickly and easily perform ad-hoc queries by data scientists\/developers or can be built into user-facing business applications. Installation is a fairly simple process. <\/p>\n<p>This blog assumes your machine already has a <a href=\"https:\/\/github.com\/griddb\/griddb_nosql\">GridDB server<\/a>, the GridDB Java Client, and the <a href=\"https:\/\/github.com\/griddb\/griddb_hadoop_mapreduce\">GridDB Hadoop Mapreduce Connector<\/a>. These items all also each have their own sets of dependencies, so I will post a full list below. 
Please note: if you have any issues installing any of these items, leave a comment below or post on the forums for help.<\/p>\n<p>Full list of dependencies: <\/p>\n<ul>\n<li><b>OS:<\/b>         CentOS 6.7 (x64)<\/li>\n<li><b>Maven:<\/b>      apache-maven-3.3.9<\/li>\n<li><b>Java:<\/b>           JDK 1.8.0_101<\/li>\n<li><b>Apache Hadoop:<\/b>  Version 2.6.5<\/li>\n<li><b>Apache Spark:<\/b>   Version 2.1.0<\/li>\n<li><b>Scala:<\/b>          Version 2.11.8<\/li>\n<li><b>GridDB server and Java client:<\/b>                3.0 CE<\/li>\n<li><b>GridDB connector for Apache Hadoop MapReduce:<\/b> 1.0<\/li>\n<\/ul>\n<p>If beginning from scratch, I recommend ensuring all of these items are installed and configured. This tutorial also assumes that your Hadoop, Spark, and Connector are all installed in the <code>[INSTALL_FOLDER]<\/code> directory (I used <code>\/opt<\/code>).<\/p>\n<h2> Installation <\/h2>\n<p>Once verified, please proceed with the steps outlined below:<\/p>\n<p>We start this process by adding the following environment variables to <code>.bashrc<\/code>:<\/p>\n<pre class=\"prettyprint\">$ nano ~\/.bashrc<\/pre>\n<pre class=\"prettyprint\">\n export JAVA_HOME=\/usr\/lib\/jvm\/[JDK folder]\n export HADOOP_HOME=[INSTALL_FOLDER]\/hadoop-2.6.5\n export SPARK_HOME=[INSTALL_FOLDER]\/spark-2.1.0-bin-hadoop2.6\n export GRIDDB_SPARK=[INSTALL_FOLDER]\/griddb_spark\n export GRIDDB_SPARK_PROPERTIES=$GRIDDB_SPARK\/gd-config.xml\n \n export PATH=$HADOOP_HOME\/sbin:$HADOOP_HOME\/bin:$SPARK_HOME\/bin:$PATH\n \n export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME\/lib\/native\n export HADOOP_OPTS=\"$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME\/lib\/native\"\n<\/pre>\n<pre class=\"prettyprint\">$ source ~\/.bashrc<\/pre>\n<p>Once those are added, modify the <code>gd-config.xml<\/code> file.<\/p>\n<pre class=\"prettyprint\">$ cd $GRIDDB_SPARK\n$ nano gd-config.xml<\/pre>\n<pre class=\"prettyprint\">\n &lt;!-- GridDB properties --&gt;\n &lt;property&gt;\n \t 
&lt;name&gt;gs.user&lt;\/name&gt;\n \t &lt;value&gt;[GridDB user]&lt;\/value&gt;\n &lt;\/property&gt;\n &lt;property&gt;\n \t &lt;name&gt;gs.password&lt;\/name&gt;\n \t &lt;value&gt;[GridDB password]&lt;\/value&gt;\n &lt;\/property&gt;\n &lt;property&gt;\n \t &lt;name&gt;gs.cluster.name&lt;\/name&gt;\n \t &lt;value&gt;[GridDB cluster name]&lt;\/value&gt;\n &lt;\/property&gt;\n  &lt;!-- Define the address and port for the multicast method; leave these blank if using another method --&gt;\n &lt;property&gt;\n \t &lt;name&gt;gs.notification.address&lt;\/name&gt;\n \t &lt;value&gt;[GridDB notification address (default is 239.0.0.1)]&lt;\/value&gt;\n &lt;\/property&gt;\n &lt;property&gt;\n \t &lt;name&gt;gs.notification.port&lt;\/name&gt;\n \t &lt;value&gt;[GridDB notification port (default is 31999)]&lt;\/value&gt;\n &lt;\/property&gt;\n<\/pre>\n<h3> Build The Connector + An Example <\/h3>\n<p>Next up, refer to this <a href=\"https:\/\/github.com\/griddb\/griddb_spark\/blob\/master\/Configuration.md\">configuration <\/a> page for a quick definition of each of the GridDB properties.<\/p>\n<p>To make the GridDB Java client and the GridDB connector for Hadoop MapReduce available to the build, place the following files under the <code>$GRIDDB_SPARK\/gs-spark-datasource\/lib<\/code> directory.<\/p>\n<pre class=\"prettyprint\">\ngridstore.jar\ngs-hadoop-mapreduce-client-1.0.0.jar\n<\/pre>\n<p>(Note: these <code>.jar<\/code> files should have been created when you built your GridDB client and the GridDB MapReduce Connector. 
You can find <code>gridstore.jar<\/code> in <code>\/usr\/griddb-X.X.X\/bin<\/code>, for example)<\/p>\n<p>Once that&#8217;s complete, add the SPARK_CLASSPATH to &#8220;spark-env.sh&#8221;:<\/p>\n<pre class=\"prettyprint\">\n$ cd $SPARK_HOME\n$ nano conf\/spark-env.sh<\/pre>\n<pre class=\"prettyprint\">\n SPARK_CLASSPATH=.:$GRIDDB_SPARK\/gs-spark-datasource\/target\/gs-spark-datasource.jar:$GRIDDB_SPARK\/gs-spark-datasource\/lib\/gridstore.jar:$GRIDDB_SPARK\/gs-spark-datasource\/lib\/gs-hadoop-mapreduce-client-1.0.0.jar\n<\/pre>\n<p>Now that we&#8217;ve got the prerequisites out of the way, we can continue on to build the connector and an example to ensure everything is working properly. <\/p>\n<p>To begin, we will need to edit our <code>Init.java<\/code> file to add the correct authentication credentials.<\/p>\n<pre class=\"prettyprint\">\n$ cd $GRIDDB_SPARK\/gs-spark-datasource-example\/src\/\n$ nano Init.java\n<\/pre>\n<p>And add in your credentials:<\/p>\n<pre class=\"prettyprint\">\nProperties props = new Properties();\nprops.setProperty(\"notificationAddress\", \"239.0.0.1\");\nprops.setProperty(\"notificationPort\", \"31999\");\nprops.setProperty(\"clusterName\", \"Spark-Cluster\");\nprops.setProperty(\"user\", \"admin\");\nprops.setProperty(\"password\", \"hunter2\");\nGridStore store = GridStoreFactory.getInstance().getGridStore(props);\n<\/pre>\n<p>And now we can run the mvn command like so: <\/p>\n<pre class=\"prettyprint\">\n$ cd $GRIDDB_SPARK\n$ mvn package\n<\/pre>\n<p>which will create the following <code>.jar<\/code> files: <\/p>\n<pre class=\"prettyprint\">\ngs-spark-datasource\/target\/gs-spark-datasource.jar\ngs-spark-datasource-example\/target\/example.jar\n<\/pre>\n<p>Now proceed with running the example program. First start your GridDB cluster. 
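Starting the cluster can be done with GridDB CE's operating commands. The sequence below is a sketch, assuming a single-node CE setup run as the gsadm user with the default admin/admin credentials; the cluster name placeholder must match the name in your gs_cluster.json:

```shell
# Start the GridDB node daemon and wait for it to come up
# (credentials here are an assumption -- use your own)
gs_startnode -u admin/admin -w

# Join the node to the cluster; the name must match gs_cluster.json
gs_joincluster -c [GridDB cluster name] -u admin/admin -w

# Confirm the node reports SERVICING before continuing
gs_stat -u admin/admin
```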
And then:<\/p>\n<p>Put some data into the server with the GridDB Java client:<\/p>\n<pre class=\"prettyprint\">\n$ cd $GRIDDB_SPARK\n$ java -cp .\/gs-spark-datasource-example\/target\/example.jar:gs-spark-datasource\/lib\/gridstore.jar Init\n<\/pre>\n<h2> Queries <\/h2>\n<p>Now you can run queries with your GridDB connector for Spark:<\/p>\n<pre class=\"prettyprint\">\n$ spark-submit --class Query .\/gs-spark-datasource-example\/target\/example.jar\n<\/pre>\n<p>We will go over some brief examples of Apache Spark&#8217;s API. Examples are pulled from the <a href=\"https:\/\/spark.apache.org\/examples.html\">official page<\/a>. <\/p>\n<p>Spark&#8217;s defining feature is its RDDs (Resilient Distributed Datasets) and the accompanying API. RDDs are immutable data structures that can be operated on in parallel across commodity hardware &#8212; this is essentially what allows Spark to run its queries in parallel and outperform MapReduce. Here&#8217;s a very basic example; it will showcase how to build an RDD of the numbers 1 &#8211; 5<\/p>\n<pre class=\"prettyprint\">\n\/\/ sc is an existing JavaSparkContext\nList&lt;Integer&gt; data = Arrays.asList(1, 2, 3, 4, 5);\nJavaRDD&lt;Integer&gt; distData = sc.parallelize(data);\n<\/pre>\n<p>With this, you can now process that small array in parallel. Pretty cool, huh?<\/p>\n<h4> Command Line Query <\/h4>\n<p>A &#8220;must-run&#8221; query in the Big Data scene is the word count, so here&#8217;s what it looks like in Spark. For this example, let&#8217;s try using the shell (example taken from: <a href=\"https:\/\/www.dezyre.com\/apache-spark-tutorial\/spark-tutorial\">here<\/a>). To run this, please be sure to place a text file <code>input.txt<\/code> into your <code>$GRIDDB_SPARK<\/code> directory. Fill it with whatever text you like; I used the opening chapter of <i>Moby Dick<\/i>. 
Now fire up the spark shell:<\/p>\n<pre class=\"prettyprint\">$ spark-shell <\/pre>\n<p><a href=\"https:\/\/griddb.net\/newen\/wp-content\/uploads\/2017\/08\/Screenshot_7.png\"><img fetchpriority=\"high\" decoding=\"async\" src=\"https:\/\/griddb.net\/newen\/wp-content\/uploads\/2017\/08\/Screenshot_7.png\" alt=\"\" width=\"1030\" height=\"200\" class=\"alignleft size-full wp-image-5563\" srcset=\"\/wp-content\/uploads\/2017\/08\/Screenshot_7.png 1030w, \/wp-content\/uploads\/2017\/08\/Screenshot_7-600x117.png 600w, \/wp-content\/uploads\/2017\/08\/Screenshot_7-300x58.png 300w, \/wp-content\/uploads\/2017\/08\/Screenshot_7-768x149.png 768w, \/wp-content\/uploads\/2017\/08\/Screenshot_7-1024x199.png 1024w\" sizes=\"(max-width: 1030px) 100vw, 1030px\" \/><\/a><\/p>\n<pre class=\"prettyprint\">scala&gt; val inputfile = sc.textFile (\"input.txt\")\ninputfile: org.apache.spark.rdd.RDD[String] = input.txt MapPartitionsRDD[1] at textFile at &lt;console&gt;:24\n\nscala&gt; val counts = inputfile.flatMap (line => line.split (\" \")).map (word => (word, 1)).reduceByKey(_+_)\ncounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at &lt;console&gt;:26\n\nscala&gt; counts.saveAsTextFile (\"output\")\n<\/pre>\n<p>And now if you head back into <code>$GRIDDB_SPARK<\/code>, you should find the <code>output<\/code> dir. Now just run a simple <code>cat<\/code> on the file in there to retrieve the word count results of your text file.<\/p>\n<pre class=\"prettyprint\">$ cd $GRIDDB_SPARK\n$ cd output\n$ cat part-00000 \n(Ah!,1)\n(Let,1)\n(dreamiest,,1)\n(dotings,1)\n(cooled,1)\n(spar,1)\n(previous,2)\n(street,,1)\n(old,6)\n(left,,1)\n(order,2)\n(told,1)\n(marvellous,,1)\n(Now,,1)\n(virtue,1)\n(Take,1)<\/pre>\n<h4> TS Query <\/h4>\n<p>Of course, Spark is also capable of handling much more complex queries. Because GridDB primarily deals in TimeSeries (TS) data, how about we take a look at a TS query? 
Here&#8217;s a sample query taken from <a href=\"http:\/\/sryza.github.io\/spark-timeseries\/0.3.0\/docs\/users.html\">here<\/a>: <\/p>\n<pre class=\"prettyprint\">\nval tsRdd: TimeSeriesRDD = ...\n\n\/\/ Find a sub-slice between two dates \nval zone = ZoneId.systemDefault()\nval subslice = tsRdd.slice(\n  ZonedDateTime.of(LocalDateTime.parse(\"2015-04-10T00:00:00\"), zone),\n  ZonedDateTime.of(LocalDateTime.parse(\"2015-04-14T00:00:00\"), zone))\n\n\/\/ Fill in missing values based on linear interpolation\nval filled = subslice.fill(\"linear\")\n\n\/\/ Use an AR(1) model to remove serial correlations\nval residuals = filled.mapSeries(series => ar(series, 1).removeTimeDependentEffects(series))\n<\/pre>\n<p>Using GridDB as an input source for Spark is easy and should prove very useful. We hope the marriage of the two services yields lots of productive analysis work.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Apache Spark now has support to fully integrate GridDB into its workflow. For those unaware, Spark is FOSS which saw its initial release in 2014. Since then, it has very quickly established itself as an important piece of Big Data processing and analyzing. This blog post is meant to give instructions on how to install [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[121],"tags":[],"class_list":["post-46556","post","type-post","status-publish","format-standard","hentry","category-blog"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.1.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>GridDB Connector for Apache Spark | GridDB: Open Source Time Series Database for IoT<\/title>\n<meta name=\"description\" content=\"Introduction Apache Spark now has support to fully integrate GridDB into its workflow. 
For those unaware, Spark is FOSS which saw its initial release in\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"GridDB Connector for Apache Spark | GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"og:description\" content=\"Introduction Apache Spark now has support to fully integrate GridDB into its workflow. For those unaware, Spark is FOSS which saw its initial release in\" \/>\n<meta property=\"og:url\" content=\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/\" \/>\n<meta property=\"og:site_name\" content=\"GridDB: Open Source Time Series Database for IoT\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/griddbcommunity\/\" \/>\n<meta property=\"article:published_time\" content=\"2017-09-07T07:00:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/griddb.net\/newen\/wp-content\/uploads\/2017\/08\/Screenshot_7.png\" \/>\n<meta name=\"author\" content=\"Israel\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:site\" content=\"@GridDBCommunity\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Israel\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/\"},\"author\":{\"name\":\"Israel\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/c8a430e7156a9e10af73b1fbb46c2740\"},\"headline\":\"GridDB Connector for Apache Spark\",\"datePublished\":\"2017-09-07T07:00:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/\"},\"wordCount\":761,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/griddb.net\/newen\/wp-content\/uploads\/2017\/08\/Screenshot_7.png\",\"articleSection\":[\"Blog\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/\",\"url\":\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/\",\"name\":\"GridDB Connector for Apache Spark | GridDB: Open Source Time Series Database for IoT\",\"isPartOf\":{\"@id\":\"https:\/\/griddb.net\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/griddb.net\/newen\/wp-content\/uploads\/2017\/08\/Screenshot_7.png\",\"datePublished\":\"2017-09-07T07:00:00+00:00\",\"description\":\"Introduction Apache Spark now has support to fully integrate GridDB into its workflow. 
For those unaware, Spark is FOSS which saw its initial release in\",\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/#primaryimage\",\"url\":\"https:\/\/griddb.net\/newen\/wp-content\/uploads\/2017\/08\/Screenshot_7.png\",\"contentUrl\":\"https:\/\/griddb.net\/newen\/wp-content\/uploads\/2017\/08\/Screenshot_7.png\"},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/griddb.net\/en\/#website\",\"url\":\"https:\/\/griddb.net\/en\/\",\"name\":\"GridDB: Open Source Time Series Database for IoT\",\"description\":\"GridDB is an open source time-series database with the performance of NoSQL and convenience of SQL\",\"publisher\":{\"@id\":\"https:\/\/griddb.net\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/griddb.net\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/griddb.net\/en\/#organization\",\"name\":\"Fixstars\",\"url\":\"https:\/\/griddb.net\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"contentUrl\":\"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png\",\"width\":200,\"height\":83,\"caption\":\"Fixstars\"},\"image\":{\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/griddbcommunity\/\",\"https:\/\/x.com\/GridDBCommunity\",\"https:\/\/www.linkedin.com\/company\/griddb-by-toshiba\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/griddb.n
et\/en\/#\/schema\/person\/c8a430e7156a9e10af73b1fbb46c2740\",\"name\":\"Israel\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/griddb.net\/en\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/4df8cfc155402a2928d11f80b0220037b8bd26c4f1b19c4598d826e0306e6307?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/4df8cfc155402a2928d11f80b0220037b8bd26c4f1b19c4598d826e0306e6307?s=96&d=mm&r=g\",\"caption\":\"Israel\"},\"url\":\"https:\/\/griddb.net\/en\/author\/israel\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"GridDB Connector for Apache Spark | GridDB: Open Source Time Series Database for IoT","description":"Introduction Apache Spark now has support to fully integrate GridDB into its workflow. For those unaware, Spark is FOSS which saw its initial release in","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/","og_locale":"en_US","og_type":"article","og_title":"GridDB Connector for Apache Spark | GridDB: Open Source Time Series Database for IoT","og_description":"Introduction Apache Spark now has support to fully integrate GridDB into its workflow. For those unaware, Spark is FOSS which saw its initial release in","og_url":"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/","og_site_name":"GridDB: Open Source Time Series Database for IoT","article_publisher":"https:\/\/www.facebook.com\/griddbcommunity\/","article_published_time":"2017-09-07T07:00:00+00:00","og_image":[{"url":"https:\/\/griddb.net\/newen\/wp-content\/uploads\/2017\/08\/Screenshot_7.png","type":"","width":"","height":""}],"author":"Israel","twitter_card":"summary_large_image","twitter_creator":"@GridDBCommunity","twitter_site":"@GridDBCommunity","twitter_misc":{"Written by":"Israel","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/#article","isPartOf":{"@id":"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/"},"author":{"name":"Israel","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/c8a430e7156a9e10af73b1fbb46c2740"},"headline":"GridDB Connector for Apache Spark","datePublished":"2017-09-07T07:00:00+00:00","mainEntityOfPage":{"@id":"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/"},"wordCount":761,"commentCount":0,"publisher":{"@id":"https:\/\/griddb.net\/en\/#organization"},"image":{"@id":"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/griddb.net\/newen\/wp-content\/uploads\/2017\/08\/Screenshot_7.png","articleSection":["Blog"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/","url":"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/","name":"GridDB Connector for Apache Spark | GridDB: Open Source Time Series Database for IoT","isPartOf":{"@id":"https:\/\/griddb.net\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/#primaryimage"},"image":{"@id":"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/#primaryimage"},"thumbnailUrl":"https:\/\/griddb.net\/newen\/wp-content\/uploads\/2017\/08\/Screenshot_7.png","datePublished":"2017-09-07T07:00:00+00:00","description":"Introduction Apache Spark now has support to fully integrate GridDB into its workflow. 
For those unaware, Spark is FOSS which saw its initial release in","inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/blog\/connector-apache-spark\/#primaryimage","url":"https:\/\/griddb.net\/newen\/wp-content\/uploads\/2017\/08\/Screenshot_7.png","contentUrl":"https:\/\/griddb.net\/newen\/wp-content\/uploads\/2017\/08\/Screenshot_7.png"},{"@type":"WebSite","@id":"https:\/\/griddb.net\/en\/#website","url":"https:\/\/griddb.net\/en\/","name":"GridDB: Open Source Time Series Database for IoT","description":"GridDB is an open source time-series database with the performance of NoSQL and convenience of SQL","publisher":{"@id":"https:\/\/griddb.net\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/griddb.net\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/griddb.net\/en\/#organization","name":"Fixstars","url":"https:\/\/griddb.net\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/","url":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","contentUrl":"https:\/\/griddb.net\/wp-content\/uploads\/2019\/04\/fixstars_logo_web_tagline.png","width":200,"height":83,"caption":"Fixstars"},"image":{"@id":"https:\/\/griddb.net\/en\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/griddbcommunity\/","https:\/\/x.com\/GridDBCommunity","https:\/\/www.linkedin.com\/company\/griddb-by-toshiba"]},{"@type":"Person","@id":"https:\/\/griddb.net\/en\/#\/schema\/person\/c8a430e7156a9e10af73b1fbb46c2740","name":"Israel","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/griddb.net\/en
\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/4df8cfc155402a2928d11f80b0220037b8bd26c4f1b19c4598d826e0306e6307?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/4df8cfc155402a2928d11f80b0220037b8bd26c4f1b19c4598d826e0306e6307?s=96&d=mm&r=g","caption":"Israel"},"url":"https:\/\/griddb.net\/en\/author\/israel\/"}]}},"_links":{"self":[{"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/posts\/46556","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/comments?post=46556"}],"version-history":[{"count":0,"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/posts\/46556\/revisions"}],"wp:attachment":[{"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/media?parent=46556"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/categories?post=46556"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/griddb.net\/en\/wp-json\/wp\/v2\/tags?post=46556"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}