Status

Current state: Under Discussion

Discussion threadhere

JIRAhere

Released: <Cassandra Sidecar Version>

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

For those managing Cassandra clusters over extended periods, it's common to encounter the following scenarios:

  • Moving a Cassandra cluster to higher capacity hardware or replacing aging servers
  • Moving a Cassandra cluster to new physical data centres
  • Moving Cassandra clusters to different type of infrastructure

In above scenarios, the Cassandra host/instances can be replaced by a new host/instance using the regular host replacement procedure. Time spent to replace a single host/instance depends on the volume of data each instance has, network bandwidth and other factors. Replacing all required instances could take multiple days, or even weeks depending on the number of instances to migrate while maintaining the availability of the cluster. While replacing a Cassandra host/instance, most of the time is spent on bootstrapping, during which the new instance receives data streams from its replicas.

Instead of bootstrapping, data could be transferred from the instance getting replaced to the new instance, and then new instance could just start with same data; this would save a lot of time. This approach is equivalent to an instance coming up with a different IP and can take full advantage of the file streaming Cassandra Sidecar already has. File streaming using the sidecar is much faster than bootstrapping. Multiple instances can be migrated parallelly without any downtime by migrating instances such that quorum of any replica set is live at any point of time (assuming *QUORUM consistency levels are used).

Audience

Cassandra admins, operators & developers who build solutions around Cassandra.

Goals

  • Migrate one or more healthy Cassandra instances of a cluster fast.
  • Minimize the down time of a Cassandra instance while migrating.

Non-Goals

  • Making/applying changes to configuration of a Cassandra cluster and instances.
  • Managing the Cassandra processes like bringing them up or down while migrating the instances.
  • Migrating non-healthy instances like instance having disk issues. In such cases, they should fall back to normal host replacement.

Out of initial scope

  • All files may not be required to copy from source to destination (such as file related to heapdumps, stack traces etc... or any other files maintained for operational purposes but not related to Cassandra). User may want to exclude some files/directories from migration. Using simple filters to exclude files/directories for now. Using complex expressions for matching files/directories for exclusion could be added later.

Proposed Changes

This approach proposes changes only to Cassandra Sidecar. No changes are expected in Cassandra. Below procedure at high level can be followed for migrating a Cassandra instance (source) to another instance (destination) with the help of Sidecars and without bootstrap.

  • Bring up Sidecar at destination
  • Stop Cassandra process at source
  • Copy data to destination host/instance using Sidecar instances
  • Start the Cassandra process at destination

Data copy could take hours to complete depending on the volume of data each instance has and network bandwidth limitations. If source instance is down and new instance takes multiple hours to come live, then the destination instance would be lagging behind a lot. The time gap between source instance going down and destination instance coming up can be minimized by initiating first round of data copy before bringing the source Cassandra instance down. Source Cassandra process can continue writing data and add/delete/change some files (due to compaction/repairs/writes etc...) during the first data copy. Once the initial data copy is complete, source can be brought down and another data copy (a final one) can be initiated so that destination instance has same data as source. This final data copy will be quick compared to initial data copy as good amount of data already exists at destination. Now this procedure looks as follows:

  • Bring up sidecar at destination
  • Initiate first round of data copy to destination host/instance using Sidecar instances
  • Stop Cassandra process at source
  • Initiate final round of data copy to destination host/instance using Sidecar instances
  • Start the Cassandra process at destination

Here, the migration process starts when the source instance is live (hence it is live migration) and the instance is switched in the ring at the end. Let’s look into the details of how a single Cassandra instance will be migrated to a new instance.

MIGRATING ONE INSTANCE

  1. First identify the destination instance. One may make changes to their configuration like topology to bring-in the destination instance into the cluster. How these configuration changes are made and applied are not in the scope of this document.
  2. Keep the Cassandra process running as usual at source. Do not start the Cassandra process on the destination but start the Sidecar. Submit data copy request to destination Sidecar to pull data from source using Sidecar.
    1. During data copy, the destination Sidecar first pulls the list of files present at the source at that point of time.
    2. Then, the destination sidecar checks the list of files present locally against the list of files at the source.
      1. If a file is present at the source but not at the destination, then that file will be added to the download list.
      2. If a file is present at the destination but does not match by size with the source file, then the local file is deleted and added to the list of files to download. It also avoids filling up the disk with unnecessary files.
      3. For all files existing at source and destination, file digest is calculated and compared. Files for which digest does not match will be deleted and added to the list of files to download. During implementation a flag will be provided to enable it on demand so that users can enable it selectively (user may opt for disabling it during initial data copy and enable it during final data copy).
      4. Downloads the shortlisted files.
    3. While copying the data, source continues to write/update the data leading to add more SSTables or delete existing SSTables due to compaction etc... It is fine to be in this state at this stage.
    4. After downloading the list of files, the destination may not have some files or may have some files that are no longer present at the source. It may not be possible to ensure that 100% of the data at the destination matches the source as the source continues to run and makes changes. We can relax 100% matching of the data to some lower threshold to consider the data copy a successful one. This success threshold can be specified as part of the request payload. This success threshold will be tested before downloading shortlisted files. If the success threshold is met, then the data copy task is considered as success.
    5. It could be possible that, after downloading the files once, the threshold was not met. This process can be repeated multiple times to meet the threshold. The number of iterations can be specified as part of the request payload. 
      1. The first iteration is considered a zeroth iteration as the destination may not have anything at that time and the success threshold will not be met.
      2. If the threshold is met in an iteration, then the data copy operation ends with a success status.
    6. If the threshold is not met even after multiple iterations, then the data copy task fails. The operators can still continue the migration if they want, because the data will be copied again after bringing down the source.
  3. Now, bring down the Cassandra process at source, keeping the Sidecar up and running. Once the source Cassandra process goes down, we can expect that there will be no changes made to the files. How the instances are brought down is not in scope of this document.
  4. Initiate the final data copy at destination with 100% success threshold. Destination pulls a list of files, deletes unnecessary files and the files whose digests are not matching, and downloads the required files. Afterwards, it compares the files with the source and checks if there is a 100% match.
  5. Now, bring up the Cassandra instance at the destination. At this moment, the destination will have the same data as its source and will be equivalent to an instance coming up with a new IP address. The Sidecar at the source can now be brought down.
  6. If the destination has more number of data_file_directories compared to source, then executing nodetool relocatesstables is required.
  7. How to bring up/down Cassandra/Sidecar instances or making/applying config changes are outside the scope of this document.

Below diagram depicts the above procedure.



This procedure is roughly equivalent to doing a rsync multiple times. Sidecar’s file transmission handler FileStreamHandler is pretty fast and this proposal leverages it.

MIGRATING MULTIPLE INSTANCES

If multiple instances need to be migrated, then above procedure can be executed sequentially. To improve speed of migrations, one can pick multiple instances in such a manner that the number of instances that are brought down at any point of time does not impact the quorum of any replica set at any point of time, assuming *_QUORUM consistency level is used. If much stronger consistency levels are used like ALL, then client would observe interruptions in query processing. The degree of parallelism and execution of migrations are at the discretion of the operator.

Operator can optionally initiate a round of data copy tasks across the board even before starting this procedure on multiple instances to reduce the time taken to migrate an instance or set of instances.

New or Changed Public Interfaces

Below new APIs need to be added to Sidecar to facilitate data transfer.

List-files endpoint

Endpoint to list all instance files as URLs by a source, so that a sidecar can download the files using them. Along with a list of files to download, sending total bytes to download will help to check if the success threshold met to proceed further or not. Cassandra handles multiple types of directories for data (data_file_directories), commit log(commitlog_directory), hints(hints_directory), cdc (cdc_raw_directory) etc... 

Each file in those directories listed by sidecar roughly looks like this: "<api_version>/migration/files/<type_of_directory>/<optinal_index>/ + relative" path from the directory.

For example:

If the cassandra.yaml has the config like:

data_file_directories:/var/lib/cassandra/data

commitlog_directory:/var/lib/cassandra_commitlog

And if source has the following files

/var/lib/cassandra/data/ks1/table1/1_Data.db
/var/lib/cassandra_commitlog/1.log

Then the files list exposed by source side car will have the following urls

/api/v1/migration/files/data_file_directories/0/ks1/table1/1_Data.db

/api/v1/migration/files/commitlog_directory/1.log

Notes:

  • data_file_directories can have a list of directories. Hence using index to identify and separate each directory.
  • This endpoint can also accept a list of files/directories to exclude so that only desired files are listed.

Data copy task endpoints

New endpoints will be introduced to create, get and cancel (if required) data copy tasks from the Sidecar. Each data copy task will make use of a file downloader. File downloader takes the responsibility to get the list of files from the source, remove unnecessary local files, validates the integrity of existing files and downloads the files from source. File downloader doesn't make assumptions about how the source is going to list the files, for it, it is just an URL to download. It also makes use of a location mapper (which can be plugged-in) to map source file urls to local file paths. Since files need to be copied to relatively the same location at the destination too, local paths remain unchanged after the directories path configured in cassandra.yaml. This endpoint can also accept a list of files/directories to exclude so that only desired files are listed.

File download/transfer endpoint

To download a single file

File digest endpoint

Returns digest of a single file specified in the path

Security

The proposed endpoints are required only at the time performing migrations otherwise they are not enabled by default. These endpoints should be enabled on demand. There is no point in making them available when a migration is not in progress and enabling for instances for which no migration is planned. To facilitate it, a request filter will be introduced for these endpoints which will allow/deny requests. How the filter is configured goes to the implementation details.

When migration is in progress, the new endpoints can expose files present on the disk which are downloadable. So, access to these endpoints should be strictly restricted to admin/operator/tooling only. Hence, protecting these endpoints is mandatory.

EndpointEnabled at source?Enabled at destination?
List files endpointYesNo (can be enabled for debugging)
Data copy task endpointsNoYes
File digest endpointYesYes
File downloaderYesNo

Compatibility, Deprecation, and Migration Plan

This is a new feature. Compatibility, deprecation and migration plan are not applicable.

Test Plan

Tests will be added in Cassandra Sidecar covering end-to-end flow.

Rejected Alternatives

  • Cassandra Sidecar has a way to transmit SSTables exposed via route /api/v1/keyspaces/:keyspace/tables/:table/snapshots/:snapshot/components/:component . It allows to transmit SSTables of a keyspace. For migrating one instance to another, we need to copy all the files and not just SSTables.
  • One may utilise the "rsync" Linux tool for data transfer. However, it might not be permitted by the administrator or available in various environments such as Kubernetes or virtual instances like EC2. Enabling data transfer through a sidecar facilitates smooth instance migration.



  • No labels