diff --git a/docs/modules/ROOT/assets/images/cassandra-source-connector.drawio b/docs/modules/ROOT/assets/images/cassandra-source-connector.drawio deleted file mode 100644 index 89c813d9..00000000 --- a/docs/modules/ROOT/assets/images/cassandra-source-connector.drawio +++ /dev/null @@ -1 +0,0 @@ -5Vtbd9soEP41enSOBLr5MXba5HTbbvakmyZ92YMlbNHKQkE4jvvrF1lIlgSJHUeOL82LYYARfPNpYAbFgMPp0yVDafSFhjg2gBk+GfDCAMCyXFv85JJFIfFtrxBMGAkLkbkS3JDfWI4spTMS4kzKChGnNOYkbQoDmiQ44A0ZYozOm93GNA4bghRNsCK4CVCsSr+TkEfVuvqrhitMJpF8tA/k+qao7CxXkkUopPOaCH4w4JBRyovS9GmI4xy8Ji4fn2mtJsZwwjcZcHeZLv6bAQfPMOn99f364Yf3T8+Ss31E8Uyu+DxFQYSF7HoWZ4iJwpkBXDRNDThIRln+Y+SP2kC0XDZflFgyOktCnE/HEs3ziHB8Ix6Wt84Fe4Qs4tNYNo9JHA9pTNlyLAwR9seBkIcoiyodj5hxIox1HpNJImQjyjmdigYkBaywzCDjjP7CNX1u4OPRuJplHUaJbK4bP9VEEtZLTKeYs4XoIluhI8kgOW5B98wpJPMVZ5ySCFGNLsCRHZHk6aTSvjKlKEhrvsayimGHKMtQEjIkxIl4Ud9mnzb0nKbdoGm5ZhPNErcalpYOS9/cEZRAgTJlNJwFmBWsj3OGjUTFneSlT7df8llMcgi6fAXGfoCDQEfmke/YjtkR/F4LflcDv6mBH+4KfqjAjx/F+rLlVpCSQAFZeNo0LworBTjL1gM9QsGvydI0f894TBIs5V34hn7LN3i+xjdo+ezuClD7VFwDAPaeXYPzgmvQ+4XzE/ELwNVg/65+wT0VGkMI90xj9RS4jsansr1BT4P9u9LYf4nGbfBv6IwJOBX5sAh/BGJvMkkHADvtw/ABbHh9BeIQcXQU5wfPOjw4y9i8hqeCIU7C8zz8FrUgFnwmQRMy/ET4Xa18L8qmWFlRu8hXb5aVRVlJxOzv6pViFHDK+mrcslYOfNYOWfE6rT36c8QmmK8/o+KwkU1QrVqPBzUmK2UMx4iTx2YOQmdH+YRrSnJvXB06QWuvNltUKNYtR9VzBm1F7WDAaSkqgFEULWlVLfsNTFOj2L0yzdsd07yjZFqlp3JPHTENeK3EyK6Zpgb5e2Xa7ojmnATRbKsjokHznYmmpjNOlGhwQ6L5h0U0u3ngAlZ14Ho11dy1qnZNNjXVc6Jk8zckW/+wyNYKlkSIvzXZvLWqdk02NRm2NdlWBLuv82sN2Vb8um/Qa19ksw6KbE6/dVbrb7mFunZLkb1ZVCAsjxa1bmneIXt+wm4rjPHNF6cFXuwuCsXzu+W8moRkGIW9ER5ThntzRriahRShP2/SvryuDAQVMdPkHqckDPPhA4Yz8huNlqpyEksMhV5nYDgXOa3hIEYjHA+qpEQ9ybb805L/5Xe6nbCobtzlXIz6pbYukWGeQWC7DQOV78e2vC670PE4w291adfw4dvV+LPTe0ivrpwouvr35k5z8/d6j7ZtKLnNvru9S9OEn1pE3INyabZnNxhlO/Z2Ls02m4qg35pLd3unFtYOg4Lm1mlttnVaxv7OaUfAMw+26AHsbc9pdpnYqOLP/obntFdvn63UMYSg0w1Ra7c/JuDQpFGOgMjVFWLpME07T58fpc/sMNw4vluII+Qa8PvHSTT1jJ8FEZ6iHsMTkpU3avpLV/NdLl3bXxlB03rPW8LQvP1Bez/h18Un17nE5ldwizQfIFoKTkINSTO8i28FxjTh8nNiy5b1j2
hK4hyibyiiU1T2kvNZ2UIT7Sjmed4WptewheWYGlsAXzVFKevcFGoYo3GVJ2gKu31p4O/XDuopH/4RdnD95pnXKsO27g0hqqtP6osNYPWPCfDD/w== \ No newline at end of file diff --git a/docs/modules/ROOT/examples/extension-start.sh b/docs/modules/ROOT/examples/extension-start.sh deleted file mode 100644 index d7c00928..00000000 --- a/docs/modules/ROOT/examples/extension-start.sh +++ /dev/null @@ -1,2 +0,0 @@ -./bin/pulsar-admin cassandra-cdc backfill --data-dir target/export --export-host 127.0.0.1:9042 \ - --export-username cassandra --export-password cassandra --keyspace ks1 --table table1 \ No newline at end of file diff --git a/docs/modules/ROOT/examples/java-start.sh b/docs/modules/ROOT/examples/java-start.sh deleted file mode 100644 index 2f12ef98..00000000 --- a/docs/modules/ROOT/examples/java-start.sh +++ /dev/null @@ -1,2 +0,0 @@ -java -jar backfill-cli/build/libs/backfill-cli-{version}-all.jar --data-dir target/export --export-host 127.0.0.1:9042 \ - --export-username cassandra --export-password cassandra --keyspace ks1 --table table1 \ No newline at end of file diff --git a/docs/modules/ROOT/pages/backfill-cli.adoc b/docs/modules/ROOT/pages/backfill-cli.adoc index 92b18366..dc5a8900 100644 --- a/docs/modules/ROOT/pages/backfill-cli.adoc +++ b/docs/modules/ROOT/pages/backfill-cli.adoc @@ -11,7 +11,7 @@ Developers can also use the backfill CLI to trigger change events for downstream == Installation The CDC backfill CLI is distributed both as a JAR file and as a Pulsar-admin extension NAR file. -The Pulsar-admin extension is packaged with the DataStax Luna Streaming distribution in the /cliextensions folder, so you don't need to build from source unless you want to make changes to the code. +The Pulsar-admin extension is packaged with the IBM Elite Support for Apache Pulsar distribution in the `/cliextensions` folder, so you don't need to build from source unless you want to make changes to the code. Both artifacts are built with Gradle. 
To build the CLI, run the following commands: @@ -50,19 +50,26 @@ Once the artifacts are generated, you can run the backfill CLI tool as either a Java standalone:: + -- -[source,shell,subs="attributes+"] +[source,shell,subs="attributes+"] ---- -include::example$java-start.sh[] +java -jar backfill-cli/build/libs/backfill-cli-{version}-all.jar --data-dir target/export --export-host 127.0.0.1:9042 \ + --export-username cassandra --export-password cassandra --keyspace ks1 --table table1 ---- -- Pulsar-admin extension:: + -- -include::partial$extension.adoc[] +The Pulsar-admin extension is packaged with the IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) distribution in the `/cliextensions` folder, so you don't need to build from source unless you want to make changes to the code. + +. Move the generated NAR archive to the `/cliextensions` folder of your Pulsar installation (for example, `/pulsar/cliextensions`). +. Modify the `client.conf` file of your Pulsar installation to include `customCommandFactories=cassandra-cdc`. +. Run the following command (this assumes the https://docs.datastax.com/en/installing/docs/installTARdse.html[default installation] of DSE Cassandra): + +[source,shell] ---- -include::example$extension-start.sh[] +./bin/pulsar-admin cassandra-cdc backfill --data-dir target/export --export-host 127.0.0.1:9042 \ + --export-username cassandra --export-password cassandra --keyspace ks1 --table table1 ---- -- ==== @@ -255,64 +262,78 @@ be exported in subdirectories of the data directory specified here; there will be one subdirectory per keyspace inside the data directory, then one subdirectory per table inside each keyspace directory. + |--help, -h |Displays this help message + |--dsbulk-log-dir=PATH, -l |The directory where DSBulk should store its logs. The default is a 'logs' subdirectory in the current working directory. This subdirectory will be created if it does not exist. Each DSBulk operation will create a subdirectory inside the log directory specified here.
This command is not available in the Pulsar-admin extension. + |--export-bundle=PATH -|The path to a secure connect bundle to connect to the Cassandra -cluster, if that cluster is a DataStax Astra cluster. Options ---export-host and --export-bundle are mutually exclusive. +|The path to a Secure Connect Bundle (SCB) to connect to an Astra DB database. Options --export-host and --export-bundle are mutually exclusive. + |--export-consistency=CONSISTENCY |The consistency level to use when exporting data. The default is LOCAL_QUORUM. + |--export-max-concurrent-files=NUM\|AUTO |The maximum number of concurrent files to write to. Must be a positive number or the special value AUTO. The default is AUTO. + |--export-max-concurrent-queries=NUM\|AUTO |The maximum number of concurrent queries to execute. Must be a positive number or the special value AUTO. The default is AUTO. + |--export-splits=NUM\|NC |The maximum number of token range queries to generate. Use the NC syntax to specify a multiple of the number of available cores, e.g. 8C = 8 times the number of available cores. The default is 8C. This is an advanced setting; you should rarely need to modify the default value. + |--export-dsbulk-option=OPT=VALUE |An extra DSBulk option to use when exporting. Any valid DSBulk option can be specified here, and it will be passed as-is to the DSBulk process. DSBulk options, including driver options, must be passed as '--long.option.name='. Short options are not supported. For more DSBulk options, see https://docs.datastax.com/en/dsbulk/docs/reference/commonOptions.html[here]. + |--export-host=HOST[:PORT] |The host name or IP and, optionally, the port of a node from the Cassandra cluster. If the port is not specified, it will default to 9042. This option can be specified multiple times. Options --export-host and --export-bundle are mutually exclusive. + |--export-password |The password to use to authenticate against the origin cluster. 
Options --export-username and --export-password must be provided together, or not at all. Omit the parameter value to be prompted for the password interactively. + |--export-protocol-version=VERSION |The protocol version to use to connect to the Cassandra cluster, e.g. 'V4'. If not specified, the driver will negotiate the highest version supported by both the client and the server. + |--export-username=STRING |The username to use to authenticate against the origin cluster. Options --export-username and --export-password must be provided together, or not at all. + |--keyspace=, -k |The name of the keyspace where the table to be exported exists + |--max-rows-per-second=PATH |The maximum number of rows per second to read from the Cassandra table. Setting this option to any negative value or zero will disable it. The default is -1. + |--table=, -t |The name of the table to export data from for cdc back filling + |--version, -v |Displays version info. |=== diff --git a/docs/modules/ROOT/pages/cdc-cassandra-events.adoc b/docs/modules/ROOT/pages/cdc-cassandra-events.adoc index 64e555b5..bf10fe75 100644 --- a/docs/modules/ROOT/pages/cdc-cassandra-events.adoc +++ b/docs/modules/ROOT/pages/cdc-cassandra-events.adoc @@ -1,6 +1,6 @@ -= CDC for Cassandra Events += CDC for Cassandra Events -The DataStax CDC for Cassandra agent pushes the mutation primary key for the CDC-enabled table into the Apache Pulsar events topic (also called the dirty topic). The messages in the data topic (or clean topic) are keyed messages where both the key and the payload are https://avro.apache.org/docs/current/spec.html#schema_record[AVRO records]: + +The {cdc_cass_first} agent pushes the mutation primary key for the CDC-enabled table into the Apache Pulsar events topic (also called the dirty topic). 
The messages in the data topic (or clean topic) are keyed messages where both the key and the payload are https://avro.apache.org/docs/current/spec.html#schema_record[AVRO records]: * The message key is an AVRO record including all the primary key columns of your Cassandra table. * The message payload is an AVRO record including regular columns from your Cassandra table. @@ -18,9 +18,9 @@ Finally, the following CQL data types are encoded as AVRO logical types: See https://avro.apache.org/docs/current/spec.html#Logical+Types[AVRO Logical Types] for more info on AVRO. -== Change Event’s Key +== Change Event's Key -For a given table, the change event’s key is an AVRO record that contains a field for each column in the primary key of the table at the time the event was created. Both the events and the data topics (also called the dirty and the clean topics) have the same message key, an AVRO record including the primary key columns. +For a given table, the change event's key is an AVRO record that contains a field for each column in the primary key of the table at the time the event was created. Both the events and the data topics (also called the dirty and the clean topics) have the same message key, an AVRO record including the primary key columns. == `INSERT` Event diff --git a/docs/modules/ROOT/pages/cdcExample.adoc b/docs/modules/ROOT/pages/cdcExample.adoc index 4da8771f..4fd6faa2 100644 --- a/docs/modules/ROOT/pages/cdcExample.adoc +++ b/docs/modules/ROOT/pages/cdcExample.adoc @@ -11,7 +11,7 @@ This installation requires the following. 
Latest version artifacts are available ** DSE - use `agent-dse4--all.jar` ** OSS C* - use `agent-c4--all.jar` * Pulsar -** DataStax Luna Streaming - use `agent-dse4--all.jar` +** IBM Elite Support for Apache Pulsar - use `agent-dse4--all.jar` * Pulsar C* source connector (CSC) ** Pulsar Cassandra Source NAR - use `pulsar-cassandra-source-.nar` diff --git a/docs/modules/ROOT/pages/faqs.adoc b/docs/modules/ROOT/pages/faqs.adoc index 21fa7603..4fb3c120 100644 --- a/docs/modules/ROOT/pages/faqs.adoc +++ b/docs/modules/ROOT/pages/faqs.adoc @@ -2,34 +2,30 @@ If you are new to {cdc_cass_first}, these frequently asked questions are for you. -== Introduction - -=== What is {cdc_cass}? +== What is {cdc_cass}? The {cdc_cass} is a an open-source product from DataStax. With {cdc_cass}, updates to data in Apache Cassandra are put into a Pulsar topic, which in turn can write the data to external targets such as Elasticsearch, Snowflake, and other platforms. The {csc_pulsar_first} component is simple, with a 1:1 correspondence between the Cassandra table and a single Pulsar topic. -=== What are the requirements for {cdc_pulsar}? +== What are the requirements for {cdc_pulsar}? Minimum requirements are: * Cassandra version 3.11+ or 4.0+, DSE 6.8.16+ for near real-time event streaming CDC * Cassandra version 3.0 to 3.10 for batch CDC -* Luna Streaming 2.8.0+ or Apache Pulsar 2.8.1+ +* IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) or Apache Pulsar 2.8.1+ * Additional memory and CPU available on all Cassandra nodes [NOTE] ==== -Cassandra has supported batch CDC since Cassandra 3.0, but for near real-time event streaming, Cassandra 3.11+ or DSE 6.8.16+ are required. +Cassandra has supported batch CDC since Cassandra 3.0, but for near real-time event streaming, Cassandra 3.11+ or DSE 6.8.16+ are required. 
==== -// insert link to pulsar cluster system doc - -Depending on the workloads of the CDC enabled C* tables, you may need to increase the CPU and memory specification of the C* nodes. +Depending on the workloads of the CDC enabled C* tables, you may need to increase the CPU and memory specification of the C* nodes. -=== What is the impact of the C* CDC solution on the existing C* cluster? +== What is the impact of the C* CDC solution on the existing C* cluster? For each CDC-enabled C* table, C* needs extra processing cycles and storage to process the CDC commit logs. The impact for dealing with a single CDC-enabled table is small, but when there are a large number of C* tables with CDC enabled, the impact within C* increases. The performance impact occurs within C* itself, not the C* CDC solution with Pulsar. @@ -39,7 +35,7 @@ For each C* write operation (one detected change-event), the Pulsar CSC connecto In a worst-case scenario, where a CDC-enabled C* has 100% write workload, the CDC solution would double the workload by adding the same amount of read workload to C* table. Since the C* read is primary key-based, it will be efficient. -=== What are the {cdc_cass} limitations? +== What are the {cdc_cass} limitations? {cdc_cass} has the following limitations: @@ -50,8 +46,7 @@ In a worst-case scenario, where a CDC-enabled C* has 100% write workload, the CD * Does not support range deletes. * CQL column names must not match a Pulsar primitive type name (ex: INT32) below -==== Table Pulsar primitive types - +.Pulsar primitive types [cols=2*, options=header] [%autowidth] |=== @@ -91,9 +86,9 @@ It stores the number of milliseconds since January 1, 1970, 00:00:00 GMT as an I |=== -=== What happens if Luna Streaming or Apache Pulsar is unavailable? +== What happens if the Apache Pulsar service is unavailable? 
+If the Pulsar cluster is down, the CDC agent on each C* node will periodically try to send the mutations, and will keep the CDC commitlog segments on disk until the mutations are sent successfully. The CDC agent keeps track of the CDC commitlog segment offsets, so the CDC agent knows where to resume sending the mutation messages when the Pulsar cluster is back online. @@ -108,14 +103,14 @@ WARN [CoreThread-5] 2021-10-29 09:12:52,790 NoSpamLogger.java:98 - Rejecting M ---- To avoid or recover from this situation, increase the `cdc_total_space_in_mb` and restart the node. -To prevent hitting this new limit, increase the write throughput to Luna Streaming or Apache Pulsar, or decrease the write throughput to your node. +To prevent hitting this new limit, increase the write throughput to your Apache Pulsar cluster, or decrease the write throughput to your node. -Increasing the Luna Streaming or Apache Pulsar write throughput may involve tuning the change agent configuration (the number of allocated threads, the batching delay, the number of inflight messages), the Luna Streaming or Apache Pulsar configuration (the number of partitions of your topics), or the {cdc_pulsar} configuration (query executors, batching and cache settings, connector parallelism). +Increasing the write throughput to Pulsar may involve tuning the change agent configuration (the number of allocated threads, the batching delay, the number of in-flight messages), the Pulsar cluster configuration (the number of partitions of your topics), or the {cdc_pulsar} configuration (query executors, batching and cache settings, connector parallelism). As a last resort, if losing data is acceptable in your CDC pipeline, remove `commitlog` files from the `cdc_raw` directory. Restarting the node is not needed in this case.
-=== I have multiple Cassandra datacenters. How do I configure {cdc_cass}? +== I have multiple Cassandra datacenters. How do I configure {cdc_cass}? In a multi-datacenter Cassandra configuration, enable CDC and install the change agent in only one datacenter. To ensure the data sent to all datacenters are delivered to the data topic, make sure to configure replication to the datacenter that has CDC enabled on the table. @@ -125,36 +120,30 @@ To ensure all updates in DC2 and DC3 are propagated to the data topic, configure For example, `replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3, 'dc3': 3})`. The data replicated to DC1 will be processed by the change agent and eventually end up in the data topic. -=== Is {cdc_cass} an open-source project? +== Is {cdc_cass} an open-source project? Yes, {cdc_cass} is open source using the Apache 2.0 license. You can find the source code on the GitHub repository https://github.com/datastax/cdc-apache-cassandra[datastax/cdc-apache-cassandra]. -=== What does {cdc_cass} provide that I cannot get with open-source Apache Pulsar? +== What does {cdc_cass} provide that I cannot get with open-source Apache Pulsar? In effect, the {cdc_cass} implements the reverse of Apache Pulsar or DataStax Cassandra Sink Connector. With those sink connectors, data is taken from a Pulsar topic and put into Cassandra. With {cdc_cass}, updates to a Cassandra table are converted into events and put into a data topic. From there, the data can be published to external platforms like Elasticsearch, Snowflake, and other platforms. -//=== Does {cdc_cass} support Kubernetes? - -//Yes. -//You can run the {cdc_pulsar} on Luna Streaming or Apache Pulsar running on Minikube, Google Kubernetes Engine (GKE), Microsoft Azure Kubernetes Service, // Amazon Kubernetes Service (AKS), and other commonly used platforms. -//You can deploy the change agent with Cassandra on Kubernetes with the https://github.com/datastax/cass-operator[cass-operator]. 
- -=== Where is the {cdc_cass} public GitHub repository? +== Where is the {cdc_cass} public GitHub repository? The source for this FAQs document is co-located with the {cdc_cass} repository code. You can access the repository https://github.com/datastax/cdc-apache-cassandra[here]. -=== How do I install {cdc_cass}? +== How do I install {cdc_cass}? Follow the xref:install.adoc[install] instructions. -=== What is Prometheus? +== What is Prometheus? https://prometheus.io/docs/introduction/overview/[Prometheus] is an open-source tool to collect metrics on a running app, providing real-time monitoring and alerts. -=== What is Grafana? +== What is Grafana? -https://grafana.com/[Grafana] is a visualization tool that helps you make sense of metrics and related data coming from your apps via Prometheus. +https://grafana.com/[Grafana] is a visualization tool that helps you make sense of metrics and related data coming from your apps via Prometheus. \ No newline at end of file diff --git a/docs/modules/ROOT/pages/index.adoc b/docs/modules/ROOT/pages/index.adoc index aeae47bb..7b4353c8 100644 --- a/docs/modules/ROOT/pages/index.adoc +++ b/docs/modules/ROOT/pages/index.adoc @@ -1,12 +1,11 @@ = About {cdc_cass} -{cdc_cass_first} is open-source software (OSS) that sends Cassandra mutations -for tables having Change Data Capture (CDC) enabled to https://www.datastax.com/products/luna-streaming[Luna Streaming] or https://pulsar.apache.org/[Apache Pulsar(TM)], which in turn can write the data to platforms such as Elasticsearch(R) or Snowflake(R). +{cdc_cass_first} is open-source software (OSS) that sends Cassandra mutations for tables having Change Data Capture (CDC) enabled to https://www.ibm.com/docs/en/supportforpulsar[IBM Elite Support for Apache Pulsar] or your own self-managed https://pulsar.apache.org/[Apache Pulsar(TM)] deployment, which in turn can write the data to platforms such as Elasticsearch(R) or Snowflake(R). 
== Key Features * Supports Apache Cassandra 3.11+, 4.0+, and Datastax Enterprise Server 6.8.16+ -* Supports Luna Streaming 2.8+ and Apache Pulsar 2.8.1+ +* Supports IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) and Apache Pulsar 2.8.1+ * De-duplicates updates from multiple replicas * Propagates Cassandra schema change to the built-in Pulsar schema registry * Supports AVRO message format @@ -63,7 +62,7 @@ For each update to the table, an MD5 digest is calculated to de-duplicate the up [cols="1,1"] |=== -| Cassandra version | Apache Pulsar/Luna Streaming +| Cassandra version | Apache Pulsar/IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) | Cassandra v3.x | https://github.com/datastax/cdc-apache-cassandra/tree/master/agent-c3[agent-c3] | Cassandra v4.x | https://github.com/datastax/cdc-apache-cassandra/tree/master/agent-c4[agent-c4] | DSE 6.8.16+ | https://github.com/datastax/cdc-apache-cassandra/tree/master/agent-dse4[agent-dse4] @@ -71,14 +70,14 @@ For each update to the table, an MD5 digest is calculated to de-duplicate the up == Supported streaming platforms -* Luna Streaming 2.8 and later (current Luna Streaming version is {luna_version}) +* IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) 2.8 and later (current version is {luna_version}) * Apache Pulsar 2.8.1 and later (current Pulsar version is {pulsar_version}) === Connector deployment matrix [cols="1"] |=== -| Apache Pulsar/Luna Streaming +| Apache Pulsar/IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) | https://github.com/datastax/cdc-apache-cassandra/tree/master/connector[connector] |=== diff --git a/docs/modules/ROOT/pages/install.adoc b/docs/modules/ROOT/pages/install.adoc index 39b0cf47..9997dd1a 100644 --- a/docs/modules/ROOT/pages/install.adoc +++ b/docs/modules/ROOT/pages/install.adoc @@ -34,7 +34,7 @@ tar xvf cassandra-source-agents-.tar == Start Cassandra with the Change Agent for Cassandra All data nodes of 
your Cassandra or DSE datacenter must run the change agent as a JVM agent to send mutations into the events topic of your streaming software. -Start your Cassandra or DSE nodes with the appropriate producer binary matching your Cassandra (3.11 or 4.0) or DSE (6.8.16) version and your streaming platform (Luna Streaming 2.8+ or Apache Pulsar 2.8.1+). +Start your Cassandra or DSE nodes with the appropriate producer binary matching your Cassandra (3.11 or 4.0) or DSE (6.8.16) version and your streaming platform: IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) 2.8+ or Apache Pulsar 2.8.1+. In CDC agent versions *before 1.0.3*, the CDC agent Pulsar connection parameters were provided as extra JVM options after the `jar` file name in the form of a comma-separated list of `paramName=paramValue`, as below: @@ -88,35 +88,26 @@ cdc_total_space_in_mb: 50000 include::partial$agentParams.adoc[] == Download {cdc_pulsar} + IMPORTANT ==== By downloading this DataStax product, you agree to the terms of the open-source https://www.apache.org/licenses/LICENSE-2.0[Apache-2.0 license agreement]. ==== -. Download the `cassandra-source-connectors-.tar` file from the https://downloads.datastax.com/#cassandra-source-connector[DataStax downloads page]. - -The following files are available: - -[cols="1"] -|=== -| Streaming platform | NAR file - -| Apache Pulsar 2.8 and Luna Streaming 2.8 | pulsar-cassandra-source-.nar +Download the `cassandra-source-connectors-.tar` file from the https://downloads.datastax.com/#cassandra-source-connector[DataStax downloads page]. -|=== +For Apache Pulsar and IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) 2.8, the `pulsar-cassandra-source-.nar` file is available. 
-Extract the files from the tar with the following command: +Extract the files from the tar, specifying the version that matches your streaming platform: [source,bash] ---- tar xvf cassandra-source-connectors-.tar ---- -Use the version that matches your streaming platform. - == Deploy {cdc_pulsar} -To deploy the {cdc_pulsar} `NAR` file in your Pulsar cluster, upload it to Luna Streaming or Pulsar using the `pulsar-admin sources create` command. +To deploy the {cdc_pulsar} `NAR` file, upload it to your Pulsar cluster using the `pulsar-admin sources create` command. You need to deploy {cdc_pulsar} for each CDC-enabled table. For each CDC-enabled table, the change agent will send events to the events topic. @@ -190,7 +181,64 @@ include::partial$cfgCassandraSSL.adoc[] -include::partial$cfgCassandraJavaDriverSettings.adoc[] +== Pass {cdc_pulsar} settings directly to the DataStax Java driver + +In your {cdc_pulsar} configuration file, you can directly pass settings to the DataStax Java driver by using the `datastax-java-driver` prefix. +For example: + +[source,console] +---- +datastax-java-driver.basic.request.consistency=ALL +---- + +== Mapping {cdc_pulsar} settings to Java driver properties + +The following table identifies functionally equivalent {cdc_pulsar} and DataStax Java driver settings. + +NOTE: If you define both in your configuration, the {cdc_pulsar} setting takes precedence over the `datastax-java-driver.property-name`. +If you do not provide either in your configuration, {cdc_pulsar} defaults are in effect. + +For information about the Java properties, refer to the link:https://docs.datastax.com/en/developer/java-driver-dse/2.3/manual/core/configuration/[DataStax Java driver documentation].
+ +|=== +| {csc_pulsar_first} | Using datastax-java-driver prefix + +| `contactPoints` +| `datastax-java-driver.basic.contact-points` + +| `loadBalancing.localDc` +| `datastax-java-driver.basic.load-balancing-policy.local-datacenter` + +| `cloud.secureConnectBundle` +| `datastax-java-driver.basic.cloud.secure-connect-bundle` + +| `queryExecutionTimeout` +| `datastax-java-driver.basic.request.timeout` + +| `connectionPoolLocalSize` +| `datastax-java-driver.advanced.connection.pool.local.size` + +| `compression` +| `datastax-java-driver.advanced.protocol.compression` + +| `metricsHighestLatency` +| `datastax-java-driver.advanced.metrics.session.cql-requests.highest-latency` +|=== + +There is a difference between the {cdc_pulsar}'s `contactPoints` setting and the Java driver's `datastax-java-driver.basic.contact-points`. +For {cdc_pulsar}'s `contactPoints`, the same port value is appended to every host listed in this setting. +For `datastax-java-driver.basic.contact-points`, you must provide the fully qualified contact points (`host:port`). + +Passing the Java driver's setting instead gives you more configuration flexibility because you can specify a different port for each host. For example: + +[source,console] +---- +datastax-java-driver.basic.contact-points = 127.0.0.1:9042, 127.0.0.2:9042 +---- + +=== Java driver reference + +For more information, refer to the link:https://docs.datastax.com/en/developer/java-driver/4.3/manual/core/configuration/reference/[Java driver reference configuration] topic.
== Scaling up your configuration diff --git a/docs/modules/ROOT/partials/cfgCassandraJavaDriverSettings.adoc b/docs/modules/ROOT/partials/cfgCassandraJavaDriverSettings.adoc deleted file mode 100644 index 469ff89c..00000000 --- a/docs/modules/ROOT/partials/cfgCassandraJavaDriverSettings.adoc +++ /dev/null @@ -1,58 +0,0 @@ -== Pass {cdc_pulsar} settings directly to the DataStax Java driver - -In your {cdc_pulsar} configuration file, you can directly pass settings to the DataStax Java driver by using the `datastax-java-driver` prefix. -For example: - -[source,console] ----- -datastax-java-driver.basic.request.consistency=ALL ----- - -== Mapping {cdc_pulsar} settings to Java driver properties - -The following table identifies functionally equivalent {cdc_pulsar} and DataStax Java driver settings. - -NOTE: If you define both in your configuration, the {cdc_pulsar} setting take precedence over the `datastax-java-driver.property-name`. -If you do not provide either in your configuration, {cdc_pulsar} defaults are in effect. - -For information about the Java properties, refer to the link:https://docs.datastax.com/en/developer/java-driver-dse/2.3/manual/core/configuration/[DataStax Java driver documentation]. 
- -|=== -| {csc_pulsar_first} | Using datastax-java-driver prefix - -| `contactPoints` -| `datastax-java-driver.basic.contact-points` - -| `loadBalancing.localDc` -| `datastax-java-driver.basic.load-balancing-policy.local-datacenter` - -| `cloud.secureConnectBundle` -| `datastax-java-driver.basic.cloud.secure-connect-bundle` - -| `queryExecutionTimeout` -| `datastax-java-driver.basic.request.timeout` - -| `connectionPoolLocalSize` -| `datastax-java-driver.advanced.connection.pool.local.size` - -| `compression` -| `datastax-java-driver.advanced.protocol.compression` - -| `metricsHighestLatency` -| `datastax-java-driver.advanced.metrics.session.cql-requests.highest-latency` -|=== - -There is a difference between the {cdc_pulsar}'s `contactPoints` setting and the Java driver's `datastax-java-driver.basic.contact-points`. -For {cdc_pulsar}'s `contactPoints`, the value of the port is appended to every host provided by this setting. -For `datastax-java-driver.basic.contact-points`, you must provide the fully qualified contact points (`host:port`). - -By passing in the Java driver's setting, this option gives you more configuration flexibility because you can specify a different port for each host. For example: - -[source,console] ----- -datastax-java-driver.basic.contact-points = 127.0.0.1:9042, 127.0.0.2:9042 ----- - -== Java driver reference - -For more information, refer to the link:https://docs.datastax.com/en/developer/java-driver/4.3/manual/core/configuration/reference/[Java driver reference configuration] topic. diff --git a/docs/modules/ROOT/partials/extension.adoc b/docs/modules/ROOT/partials/extension.adoc deleted file mode 100644 index 0ac839fc..00000000 --- a/docs/modules/ROOT/partials/extension.adoc +++ /dev/null @@ -1,5 +0,0 @@ -The Pulsar-admin extension is packaged with the DataStax Luna Streaming distribution in the /cliextensions folder, so you don't need to build from source unless you want to make changes to the code. - -. 
Move the generated NAR archive to the /cliextensions folder of your Pulsar installation (e.g. /pulsar/cliextensions). -. Modify the client.conf file of your Pulsar installation to include: `customCommandFactories=cassandra-cdc`. -. Run the following command (this assumes the https://docs.datastax.com/en/installing/docs/installTARdse.html[default installation] of DSE Cassandra): \ No newline at end of file
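The `install.adoc` changes above note that the connector's `contactPoints` setting appends one port to every host, while `datastax-java-driver.basic.contact-points` needs fully qualified `host:port` entries. The following is a minimal sketch of that translation, assuming a comma-separated host list and a single port value. The function name `to_driver_contact_points` is hypothetical and is not part of {cdc_cass} or the DataStax Java driver; it only prints a configuration line you could paste into a connector configuration file.

```shell
#!/bin/sh
# Hypothetical helper (not part of CDC for Apache Cassandra):
# expand a comma-separated host list plus one shared port into the
# fully qualified host:port form that
# datastax-java-driver.basic.contact-points expects.
to_driver_contact_points() {
  local hosts="$1" port="$2" out="" h
  # Turn commas into spaces; word splitting then also drops any
  # whitespace around each host and skips empty entries.
  for h in $(printf '%s' "$hosts" | tr ',' ' '); do
    if [ -z "$out" ]; then
      out="$h:$port"
    else
      out="$out, $h:$port"
    fi
  done
  printf 'datastax-java-driver.basic.contact-points = %s\n' "$out"
}
```

For example, `to_driver_contact_points "127.0.0.1,127.0.0.2" 9042` prints the same line as the `contact-points` example in `install.adoc`. Once the hosts are expanded like this, you can edit individual ports by hand, which is the flexibility the driver setting provides.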