CockroachDB CDC is built in feature.
set cluster setting kv.rangefeed.enabled = true;
Changefeeds emit messages in JSON format by default. 
Different formats can be used by creating a changefeed with the format option and specifying one of the following:
- json
 - csv
 - avro
 - parquet
 
Creating Changefeed using CDC:
create changefeed for table <table_name> into '<sink_name>'
with
initial_scan='only',
format=json;
create changefeed for table <table_name> into '<sink_name>'
with
initial_scan='only',
format=json;
Creating Changefeed using CDC Queries (formerly CDC Transformations)
create changefeed into '<sink_name>'
with
initial_scan='only',
format=json
as
select stmt;
create changefeed into '<sink_name>'
with
initial_scan='only',
format=json
as
select stmt;
Possible options with initial_scan
- initial_scan='only' > the changefeed job will end with a successful status (succeeded) after the initial scan completes and publish everything to CDC
 - initial_scan='yes' > publish everything to CDC and will listen to changes
 - initial_scan='no' > won't publish anything to CDC but will listen for updates
 
If we have to use, AWS S3 as Sink, it will be in below format
's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'
CDC Queries is the new name for CDC transformations
Scheduling Changefeed:
create schedule <schedule_name> for changefeed into <sink_name> <changefeed options>
create schedule <schedule_name> for changefeed into <sink_name> <changefeed options>
recurring '<cron type schedule>'
with schedule options
	on_execution_failure=retry, 
	on_previous_running=skip; 
- on_execution_failure=retry, #retry Changefeed again upon failure of changefeed
 - on_previous_running=skip; #don't run change feed if previous run is still in progress
 
To monitor the CDC
SELECT job_id,description,running_status FROM [show jobs] WHERE job_type='CHANGEFEED';
SELECT job_id,description,running_status FROM [show jobs] WHERE job_type='CHANGEFEED';
To monitor Scheduled Changefeeds
select id, schedule_status, state, recurrence, command from [show running schedules for changefeed];
We can manage the Changefeeds by checking Changefeed jobs and then we can pause, resume or cancel the jobs
SHOW CHANGEFEED JOBS;
PAUSE JOB job_id; #to pause a Changefeed
RESUME JOB job_id; #to resume a Changefeed
CANCEL JOB job_id;  #to cancel a Changefeed
In for some reason, you don't want Changefeeds to flood CDC because of huge write operations,  use this setting which is valid at session level and this setting will not impact CDC outside of this session.
set disable_changefeed_replication = 'on';
Couple of new features related to Changefeeds 
- can access the cdc_prev hidden column on a table to emit the previous state of a row or column. cdc_prev is a tuple-typed column that contains the table's columns.
 - event_op() - a new function that can be called from Changefeed which call tell the type of action that was performed which created Changefeed
 
No comments:
Post a Comment