SHC with Spark Structured Streaming

Hi,

I have a Spark Structured Streaming application that should write streaming data to HBase using SHC. It reads from a location where new CSV files are continuously being created. The catalog I defined works when I write a regular (batch) DataFrame with identical data to HBase.
The key components of my streaming application are a DataStreamReader and a DataStreamWriter.

```
val inputDataStream = spark
      .readStream
      .option("sep", ",")
      .schema(schema)
      .csv("/path/to/data/*.csv")

inputDataStream
      .writeStream
      .outputMode("append")
      .options(
        Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "2"))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .start
```

When running the application I'm getting the following message:

`Exception in thread "main" java.lang.UnsupportedOperationException: Data source org.apache.spark.sql.execution.datasources.hbase does not support streamed writing
        at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:285)
        at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:286)
        at my.package.SHCStreamingApplication$.main(SHCStreamingApplication.scala:153)
        at my.package.SHCStreamingApplication.main(SHCStreamingApplication.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)`

Does anyone know a solution or workaround that would still let me use SHC to write Structured Streaming data to HBase?
Thanks in advance!
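
For reference, the exception is raised by `DataSource.createSink` because the SHC data source does not provide a streaming sink. One possible workaround, assuming Spark 2.4 or later (where `DataStreamWriter.foreachBatch` exists), is to hand each micro-batch to the ordinary batch writer that already works with this catalog. This is only a sketch, not something confirmed by the SHC project: the object name `SHCForeachBatchSketch` and the helpers `batchWriter` and `start` are hypothetical, and `schema` and `catalog` are assumed to be the same values used in the snippet above.

```
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog
import org.apache.spark.sql.streaming.StreamingQuery
import org.apache.spark.sql.types.StructType

// Hypothetical wrapper object; names are for illustration only.
object SHCForeachBatchSketch {

  // Builds a function that writes one micro-batch with the regular
  // (non-streaming) SHC writer. It is typed explicitly as a Scala function
  // so that the Scala overload of foreachBatch is selected unambiguously.
  def batchWriter(catalog: String): (DataFrame, Long) => Unit =
    (batch: DataFrame, batchId: Long) => {
      batch.write
        .options(
          Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "2"))
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .save()
    }

  // Same reader as in the question; only the sink side changes.
  def start(spark: SparkSession, schema: StructType, catalog: String): StreamingQuery = {
    val inputDataStream = spark
      .readStream
      .option("sep", ",")
      .schema(schema)
      .csv("/path/to/data/*.csv")

    inputDataStream
      .writeStream
      .outputMode("append")
      .foreachBatch(batchWriter(catalog)) // requires Spark 2.4+ (assumption)
      .start()
  }
}
```

Note that `foreachBatch` may replay a micro-batch after a failure, so rows can be written more than once. Since HBase writes are keyed by row key, a replay should overwrite the same cells rather than duplicate them, but that is worth verifying for the catalog in use.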

SHC with Spark Structured Streaming #205
