Merged
2 changes: 0 additions & 2 deletions docs/modules/ROOT/examples/extension-start.sh

This file was deleted.

2 changes: 0 additions & 2 deletions docs/modules/ROOT/examples/java-start.sh

This file was deleted.

37 changes: 29 additions & 8 deletions docs/modules/ROOT/pages/backfill-cli.adoc
Original file line number Diff line number Diff line change
@@ -11,7 +11,7 @@ Developers can also use the backfill CLI to trigger change events for downstream
== Installation

The CDC backfill CLI is distributed both as a JAR file and as a Pulsar-admin extension NAR file.
The Pulsar-admin extension is packaged with the DataStax Luna Streaming distribution in the /cliextensions folder, so you don't need to build from source unless you want to make changes to the code.
The Pulsar-admin extension is packaged with the IBM Elite Support for Apache Pulsar distribution in the `/cliextensions` folder, so you don't need to build from source unless you want to make changes to the code.

Both artifacts are built with Gradle.
To build the CLI, run the following commands:
@@ -50,19 +50,26 @@ Once the artifacts are generated, you can run the backfill CLI tool as either a
Java standalone::
+
--
[source,shell,subs="attributes+"]
[source,shell,subs="attributes+"]
----
include::example$java-start.sh[]
java -jar backfill-cli/build/libs/backfill-cli-{version}-all.jar --data-dir target/export --export-host 127.0.0.1:9042 \
--export-username cassandra --export-password cassandra --keyspace ks1 --table table1
----
--

Pulsar-admin extension::
+
--
include::partial$extension.adoc[]
The Pulsar-admin extension is packaged with the IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) distribution in the `/cliextensions` folder, so you don't need to build from source unless you want to make changes to the code.

. Move the generated NAR archive to the `/cliextensions` folder of your Pulsar installation (e.g. `/pulsar/cliextensions`).
. Modify the `client.conf` file of your Pulsar installation to include: `customCommandFactories=cassandra-cdc`.
. Run the following command (this assumes the https://docs.datastax.com/en/installing/docs/installTARdse.html[default installation] of DSE Cassandra):
+
[source,shell]
----
include::example$extension-start.sh[]
--data-dir target/export --export-host 127.0.0.1:9042 \
--export-username cassandra --export-password cassandra --keyspace ks1 --table table1
----
--
====
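For step 2 of the extension instructions, the resulting `client.conf` might look like the following sketch. The service URLs are the stock Pulsar defaults and are shown only for context; this procedure adds only the `customCommandFactories` line.

```properties
# /pulsar/conf/client.conf (illustrative)
webServiceUrl=http://localhost:8080/
brokerServiceUrl=pulsar://localhost:6650/
# Registers the backfill CLI NAR as a pulsar-admin custom command
customCommandFactories=cassandra-cdc
```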
@@ -255,64 +262,78 @@ be exported in subdirectories of the data directory specified here;
there will be one subdirectory per keyspace inside the data
directory, then one subdirectory per table inside each keyspace
directory.

|--help, -h
|Displays this help message.

|--dsbulk-log-dir=PATH, -l
|The directory where DSBulk should store its logs. The default is a
'logs' subdirectory in the current working directory. This
subdirectory will be created if it does not exist. Each DSBulk
operation will create a subdirectory inside the log directory
specified here. This option is not available in the Pulsar-admin extension.

|--export-bundle=PATH
|The path to a secure connect bundle to connect to the Cassandra
cluster, if that cluster is a DataStax Astra cluster. Options
--export-host and --export-bundle are mutually exclusive.
|The path to a Secure Connect Bundle (SCB) to connect to an Astra DB database. Options --export-host and --export-bundle are mutually exclusive.

|--export-consistency=CONSISTENCY
|The consistency level to use when exporting data. The default is
LOCAL_QUORUM.

|--export-max-concurrent-files=NUM\|AUTO
|The maximum number of concurrent files to write to. Must be a positive
number or the special value AUTO. The default is AUTO.

|--export-max-concurrent-queries=NUM\|AUTO
|The maximum number of concurrent queries to execute. Must be a
positive number or the special value AUTO. The default is AUTO.

|--export-splits=NUM\|NC
|The maximum number of token range queries to generate. Use the NC
syntax to specify a multiple of the number of available cores, e.g.
8C = 8 times the number of available cores. The default is 8C. This
is an advanced setting; you should rarely need to modify the default
value.

|--export-dsbulk-option=OPT=VALUE
|An extra DSBulk option to use when exporting. Any valid DSBulk option
can be specified here, and it will be passed as-is to the DSBulk
process. DSBulk options, including driver options, must be passed as
'--long.option.name=<value>'. Short options are not supported. For more options, see the https://docs.datastax.com/en/dsbulk/docs/reference/commonOptions.html[DSBulk common options reference].

|--export-host=HOST[:PORT]
|The host name or IP and, optionally, the port of a node from the
Cassandra cluster. If the port is not specified, it will default to
9042. This option can be specified multiple times. Options
--export-host and --export-bundle are mutually exclusive.

|--export-password
|The password to use to authenticate against the origin cluster.
Options --export-username and --export-password must be provided
together, or not at all. Omit the parameter value to be prompted for
the password interactively.

|--export-protocol-version=VERSION
|The protocol version to use to connect to the Cassandra cluster, e.g.
'V4'. If not specified, the driver will negotiate the highest
version supported by both the client and the server.

|--export-username=STRING
|The username to use to authenticate against the origin cluster.
Options --export-username and --export-password must be provided
together, or not at all.

|--keyspace=<keyspace>, -k
|The name of the keyspace containing the table to be exported.

|--max-rows-per-second=NUM
|The maximum number of rows per second to read from the Cassandra
table. Setting this option to any negative value or zero will
disable it. The default is -1.

|--table=<table>, -t
|The name of the table to export data from for CDC backfilling.

|--version, -v
|Displays version info.
|===
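Putting several of the options above together, a Java-standalone invocation that throttles reads and lowers the export consistency might look like the following sketch. The jar version (2.2.9), contact point, and credentials are placeholders; the DSBulk option shown (`dsbulk.log.verbosity`) is just one example of a valid long-form option.

```shell
# Sketch only: substitute your own jar path/version and connection details.
# The command is built into a variable and echoed so it can be inspected
# before actually running it against a cluster.
CMD="java -jar backfill-cli/build/libs/backfill-cli-2.2.9-all.jar \
  --data-dir target/export \
  --export-host 127.0.0.1:9042 \
  --export-username cassandra --export-password cassandra \
  --keyspace ks1 --table table1 \
  --export-consistency=LOCAL_ONE \
  --max-rows-per-second=1000 \
  --export-dsbulk-option=--dsbulk.log.verbosity=2"
echo "$CMD"
```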
8 changes: 4 additions & 4 deletions docs/modules/ROOT/pages/cdc-cassandra-events.adoc
@@ -1,6 +1,6 @@
= CDC for Cassandra Events
= CDC for Cassandra Events

The DataStax CDC for Cassandra agent pushes the mutation primary key for the CDC-enabled table into the Apache Pulsar events topic (also called the dirty topic). The messages in the data topic (or clean topic) are keyed messages where both the key and the payload are https://avro.apache.org/docs/current/spec.html#schema_record[AVRO records]: +
The {cdc_cass_first} agent pushes the mutation primary key for the CDC-enabled table into the Apache Pulsar events topic (also called the dirty topic). The messages in the data topic (or clean topic) are keyed messages where both the key and the payload are https://avro.apache.org/docs/current/spec.html#schema_record[AVRO records]:

* The message key is an AVRO record including all the primary key columns of your Cassandra table.
* The message payload is an AVRO record including regular columns from your Cassandra table.
@@ -18,9 +18,9 @@ Finally, the following CQL data types are encoded as AVRO logical types:

See https://avro.apache.org/docs/current/spec.html#Logical+Types[AVRO Logical Types] for more info on AVRO.

== Change Events Key
== Change Event's Key

For a given table, the change events key is an AVRO record that contains a field for each column in the primary key of the table at the time the event was created. Both the events and the data topics (also called the dirty and the clean topics) have the same message key, an AVRO record including the primary key columns.
For a given table, the change event's key is an AVRO record that contains a field for each column in the primary key of the table at the time the event was created. Both the events and the data topics (also called the dirty and the clean topics) have the same message key, an AVRO record including the primary key columns.
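As an illustration, a hypothetical table `ks1.table1` with primary key `(id, ts)` would produce a message key schema along these lines. The names and types are invented for the example; the `timestamp-millis` logical type follows the CQL-to-AVRO logical type mappings described above.

```json
{
  "type": "record",
  "name": "table1",
  "namespace": "ks1",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "ts", "type": { "type": "long", "logicalType": "timestamp-millis" } }
  ]
}
```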

== `INSERT` Event

2 changes: 1 addition & 1 deletion docs/modules/ROOT/pages/cdcExample.adoc
@@ -11,7 +11,7 @@ This installation requires the following. Latest version artifacts are available
** DSE - use `agent-dse4-<version>-all.jar`
** OSS C* - use `agent-c4-<version>-all.jar`
* Pulsar
** DataStax Luna Streaming - use `agent-dse4-<version>-all.jar`
** IBM Elite Support for Apache Pulsar - use `agent-dse4-<version>-all.jar`
* Pulsar C* source connector (CSC)
** Pulsar Cassandra Source NAR - use `pulsar-cassandra-source-<version>.nar`

51 changes: 20 additions & 31 deletions docs/modules/ROOT/pages/faqs.adoc
@@ -2,34 +2,30 @@

If you are new to {cdc_cass_first}, these frequently asked questions are for you.

== Introduction

=== What is {cdc_cass}?
== What is {cdc_cass}?

The {cdc_cass} is an open-source product from DataStax.

With {cdc_cass}, updates to data in Apache Cassandra are put into a Pulsar topic, which in turn can write the data to external targets such as Elasticsearch, Snowflake, and other platforms.
The {csc_pulsar_first} component is simple, with a 1:1 correspondence between the Cassandra table and a single Pulsar topic.

=== What are the requirements for {cdc_pulsar}?
== What are the requirements for {cdc_pulsar}?

Minimum requirements are:

* Cassandra version 3.11+ or 4.0+, DSE 6.8.16+ for near real-time event streaming CDC
* Cassandra version 3.0 to 3.10 for batch CDC
* Luna Streaming 2.8.0+ or Apache Pulsar 2.8.1+
* IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) or Apache Pulsar 2.8.1+
* Additional memory and CPU available on all Cassandra nodes

[NOTE]
====
Cassandra has supported batch CDC since Cassandra 3.0, but for near real-time event streaming, Cassandra 3.11+ or DSE 6.8.16+ are required.
Cassandra has supported batch CDC since Cassandra 3.0, but for near real-time event streaming, Cassandra 3.11+ or DSE 6.8.16+ are required.
====

// insert link to pulsar cluster system doc

Depending on the workloads of the CDC enabled C* tables, you may need to increase the CPU and memory specification of the C* nodes.
Depending on the workloads of the CDC enabled C* tables, you may need to increase the CPU and memory specification of the C* nodes.

=== What is the impact of the C* CDC solution on the existing C* cluster?
== What is the impact of the C* CDC solution on the existing C* cluster?

For each CDC-enabled C* table, C* needs extra processing cycles and storage to process the CDC commit logs. The impact for dealing with a single CDC-enabled table is small, but when there are a large number of C* tables with CDC enabled, the impact within C* increases. The performance impact occurs within C* itself, not the C* CDC solution with Pulsar.

@@ -39,7 +35,7 @@ For each C* write operation (one detected change-event), the Pulsar CSC connecto

In a worst-case scenario, where a CDC-enabled C* has 100% write workload, the CDC solution would double the workload by adding the same amount of read workload to C* table. Since the C* read is primary key-based, it will be efficient.

=== What are the {cdc_cass} limitations?
== What are the {cdc_cass} limitations?

{cdc_cass} has the following limitations:

@@ -50,8 +46,7 @@ In a worst-case scenario, where a CDC-enabled C* has 100% write workload, the CD
* Does not support range deletes.
* CQL column names must not match a Pulsar primitive type name (e.g. INT32); see the table below.

==== Table Pulsar primitive types

.Pulsar primitive types
[cols=2*, options=header]
[%autowidth]
|===
@@ -91,9 +86,9 @@ It stores the number of milliseconds since January 1, 1970, 00:00:00 GMT as an I

|===

=== What happens if Luna Streaming or Apache Pulsar is unavailable?
== What happens if the Apache Pulsar service is unavailable?

If the Pulsar cluster is down, the CDC agent on each C* node will periodically try to send the mutations, and will keep the CDC commitlog segments on disk until the data sending is successful.
If the Pulsar cluster is down, the CDC agent on each C* node will periodically try to send the mutations, and will keep the CDC commitlog segments on disk until the data sending is successful.

The CDC agent keeps track of the CDC commitlog segment offsets, so the CDC agent knows where to resume sending the mutation messages when the Pulsar cluster is back online.

@@ -108,14 +103,14 @@ WARN  [CoreThread-5] 2021-10-29 09:12:52,790  NoSpamLogger.java:98 - Rejecting M
----

To avoid or recover from this situation, increase the `cdc_total_space_in_mb` and restart the node.
To prevent hitting this new limit, increase the write throughput to Luna Streaming or Apache Pulsar, or decrease the write throughput to your node.
To prevent hitting this new limit, increase the write throughput to your Apache Pulsar cluster, or decrease the write throughput to your node.
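The `cdc_total_space_in_mb` setting lives in `cassandra.yaml` on each affected node; a sketch of the change follows. The 8192 value is only an example, and as noted above the node must be restarted for it to take effect.

```yaml
# cassandra.yaml (per node)
cdc_enabled: true
# Default is 4096 MB (or 1/8 of the volume holding cdc_raw, whichever is
# smaller); raise it to buffer mutations through longer Pulsar outages.
cdc_total_space_in_mb: 8192
```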

Increasing the Luna Streaming or Apache Pulsar write throughput may involve tuning the change agent configuration (the number of allocated threads, the batching delay, the number of inflight messages), the Luna Streaming or Apache Pulsar configuration (the number of partitions of your topics), or the {cdc_pulsar} configuration (query executors, batching and cache settings, connector parallelism).
Increasing the write throughput may involve tuning the change agent configuration (the number of allocated threads, the batching delay, the number of inflight messages), the Pulsar cluster configuration (the number of partitions of your topics), or the {cdc_pulsar} configuration (query executors, batching and cache settings, connector parallelism).

As a last resort, if losing data is acceptable in your CDC pipeline, remove `commitlog` files from the `cdc_raw` directory.
Restarting the node is not needed in this case.

=== I have multiple Cassandra datacenters. How do I configure {cdc_cass}?
== I have multiple Cassandra datacenters. How do I configure {cdc_cass}?

In a multi-datacenter Cassandra configuration, enable CDC and install the change agent in only one datacenter.
To ensure the data sent to all datacenters are delivered to the data topic, make sure to configure replication to the datacenter that has CDC enabled on the table.
@@ -125,36 +120,30 @@ To ensure all updates in DC2 and DC3 are propagated to the data topic, configure
For example, `replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3, 'dc3': 3}`.
The data replicated to DC1 will be processed by the change agent and eventually end up in the data topic.
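Applied to an existing keyspace, the replication change described above would look like this (the keyspace name is illustrative, and a repair is typically run afterwards so existing data reaches the CDC-enabled datacenter):

```sql
-- Replicate ks1 to the CDC-enabled datacenter (dc1) as well as dc2 and dc3.
ALTER KEYSPACE ks1 WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3, 'dc2': 3, 'dc3': 3
};
```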

=== Is {cdc_cass} an open-source project?
== Is {cdc_cass} an open-source project?

Yes, {cdc_cass} is open source using the Apache 2.0 license. You can find the source code on the GitHub repository https://github.com/datastax/cdc-apache-cassandra[datastax/cdc-apache-cassandra].

=== What does {cdc_cass} provide that I cannot get with open-source Apache Pulsar?
== What does {cdc_cass} provide that I cannot get with open-source Apache Pulsar?

In effect, {cdc_cass} implements the reverse of the Apache Pulsar and DataStax Cassandra sink connectors.
With those sink connectors, data is taken from a Pulsar topic and put into Cassandra.
With {cdc_cass}, updates to a Cassandra table are converted into events and put into a data topic.
From there, the data can be published to external platforms like Elasticsearch, Snowflake, and other platforms.

//=== Does {cdc_cass} support Kubernetes?

//Yes.
//You can run the {cdc_pulsar} on Luna Streaming or Apache Pulsar running on Minikube, Google Kubernetes Engine (GKE), Microsoft Azure Kubernetes Service, // Amazon Kubernetes Service (AKS), and other commonly used platforms.
//You can deploy the change agent with Cassandra on Kubernetes with the https://github.com/datastax/cass-operator[cass-operator].

=== Where is the {cdc_cass} public GitHub repository?
== Where is the {cdc_cass} public GitHub repository?

The source for this FAQ document is co-located with the {cdc_cass} repository code.
You can access the repository https://github.com/datastax/cdc-apache-cassandra[here].

=== How do I install {cdc_cass}?
== How do I install {cdc_cass}?

Follow the xref:install.adoc[install] instructions.

=== What is Prometheus?
== What is Prometheus?

https://prometheus.io/docs/introduction/overview/[Prometheus] is an open-source tool to collect metrics on a running app, providing real-time monitoring and alerts.

=== What is Grafana?
== What is Grafana?

https://grafana.com/[Grafana] is a visualization tool that helps you make sense of metrics and related data coming from your apps via Prometheus.
https://grafana.com/[Grafana] is a visualization tool that helps you make sense of metrics and related data coming from your apps via Prometheus.