CockroachDB - CDC



CockroachDB CDC is built in feature.

In order to work with CockroachDB CDC, below Cluster setting should be done to enable rangefeeds
set cluster setting kv.rangefeed.enabled = true;

Changefeeds emit messages in JSON format by default. 
Different formats can be used by creating a changefeed with the format option and specifying one of the following:
  1. json
  2. csv
  3. avro
  4. parquet

Creating Changefeed using CDC:
create changefeed for table <table_name> into '<sink_name>'
with 
initial_scan='only', 
format=json;

    Creating Changefeed using CDC Queries (formerly CDC Transformations)
    create changefeed into '<sink_name>'
    with 
    initial_scan='only', 
    format=json
    as
    select stmt;

    Possible options with initial_scan
    1. initial_scan='only' > the changefeed job will end with a successful status (succeeded) after the initial scan completes and publish everything to CDC
    2. initial_scan='yes'  > publish everything to CDC and will listen to changes
    3. initial_scan='no'   > won't publish anything to CDC but will listen for updates

    If we have to use, AWS S3 as Sink, it will be in below format
    's3://{BUCKET NAME}/{PATH}?AWS_ACCESS_KEY_ID={KEY ID}&AWS_SECRET_ACCESS_KEY={SECRET ACCESS KEY}'


    CDC Queries is the new name for CDC transformations

    Scheduling Changefeed:
    create schedule <schedule_name> for changefeed into <sink_name> <changefeed options>
    recurring '<cron type schedule>'
    with schedule options
    on_execution_failure=retry, 
    on_previous_running=skip; 
    • on_execution_failure=retry, #retry Changefeed again upon failure of changefeed
    • on_previous_running=skip; #don't run change feed if previous run is still in progress


    To monitor the CDC
    SELECT job_id,description,running_status FROM [show jobs] WHERE job_type='CHANGEFEED';

    To monitor Scheduled Changefeeds
    select id, schedule_status, state, recurrence, command from [show running schedules for changefeed];

    We can manage the Changefeeds by checking Changefeed jobs and then we can pause, resume or cancel the jobs
    SHOW CHANGEFEED JOBS;

    PAUSE JOB job_id; #to pause a Changefeed
    RESUME JOB job_id; #to resume a Changefeed
    CANCEL JOB job_id;  #to cancel a Changefeed


    In for some reason, you don't want Changefeeds to flood CDC because of huge write operations,  use this setting which is valid at session level and this setting will not impact CDC outside of this session.
    set disable_changefeed_replication = 'on';

    Couple of new features related to Changefeeds 
    • can access the cdc_prev hidden column on a table to emit the previous state of a row or column. cdc_prev is a tuple-typed column that contains the table's columns.
    • event_op() - a new function that can be called from Changefeed which call tell the type of action that was performed which created Changefeed


    No comments:

    Post a Comment