...

  • Not all files need to be copied from source to destination (for example, files related to heap dumps or stack traces, or any other files kept for operational purposes but not related to Cassandra). Users may want to exclude some files/directories from migration. For now, simple filters are used to exclude files/directories; complex expressions for matching files/directories for exclusion could be added later.
  • To verify whether the desired file set matches the source, only file path and size are considered at the moment. Strict binary-level verification is deferred for later.
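
As a rough illustration of the two points above, the following Python sketch excludes files with simple glob-style filters and then compares file sets by path and size only. The pattern list, the function names, and the `{path: size}` representation are assumptions made for this example; the proposal does not define the filter syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns; the proposal only says "simple filters".
EXCLUDE_PATTERNS = ["*.hprof", "logs/*", "heapdumps/*"]


def is_excluded(relative_path, patterns=EXCLUDE_PATTERNS):
    """Return True if the path matches any exclusion pattern."""
    return any(fnmatchcase(relative_path, pattern) for pattern in patterns)


def filter_files(files, patterns=EXCLUDE_PATTERNS):
    """Keep only the {path: size} entries that are not excluded."""
    return {
        path: size
        for path, size in files.items()
        if not is_excluded(path, patterns)
    }


def matches_source(source_files, destination_files):
    """Path-and-size check: same paths, same sizes, no content comparison."""
    return source_files == destination_files
```

Note that `fnmatchcase` lets `*` match across `/`, so `logs/*` also excludes files in nested directories under `logs/`.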

Proposed Changes

This approach proposes changes only to Cassandra Sidecar. No changes are expected in Cassandra. At a high level, the procedure below can be followed to migrate a Cassandra instance (source) to another instance (destination) with the help of Sidecars and without bootstrap.

...

  1. First, identify the destination instance. One may change configuration, such as topology, to bring the destination instance into the cluster. How these configuration changes are made and applied is not in the scope of this document.
  2. Keep the Cassandra process running as usual at the source. Do not start the Cassandra process on the destination, but start the Sidecar on the destination host/instance. Submit a data copy request to the destination Sidecar to pull data from the source using Sidecar.
    1. During data copy, the destination Sidecar first pulls the list of files present at the source at that point of time.
    2. Then, the destination Sidecar checks the list of files present locally against the list of files at the source.
      1. If a file is present at the source but not at the destination, then that file will be added to the download list.
      2. If a file is present at the destination but does not match the source file (by size or timestamp), then the local file is deleted and added to the list of files to download. Deleting the stale copy also avoids filling up the disk with unnecessary files.
      3. For all files existing at both source and destination, a file digest is calculated and compared. Files whose digests do not match are deleted and added to the list of files to download. The implementation will provide a flag to enable this check on demand, so that users can apply it selectively (for example, disabling it during the initial data copy and enabling it during the final data copy).
      4. The destination Sidecar then downloads the shortlisted files.
    3. While the data is being copied, the source continues to write and update data, which adds new SSTables or deletes existing ones (due to compaction, for example). It is fine to be in this state at this stage.
    4. After downloading the list of files, the destination may be missing some files or may have some files that are no longer present at the source. It may not be possible to ensure that 100% of the data at the destination matches the source, as the source continues to run and make changes. We can relax the 100% match requirement to some lower threshold at which the data copy is considered successful. This success threshold can be specified as part of the request payload and is tested before downloading the shortlisted files. If the success threshold is met, then the data copy task is considered a success.
    5. It could be possible that, after downloading the files once, the threshold was not met. This process can be repeated multiple times to meet the threshold. The number of iterations can be specified as part of the request payload. 
      1. The first iteration is considered a zeroth iteration as the destination may not have anything at that time and the success threshold will not be met.
      2. If the threshold is met in an iteration, then the data copy operation ends with a success status.
    6. If the threshold is not met even after multiple iterations, then the data copy task fails. The operators can still continue the migration if they want, because the data will be copied again after bringing down the source.
  3. Now, bring down the Cassandra process at source, keeping the Sidecar up and running. Once the source Cassandra process goes down, we can expect that there will be no changes made to the files. How the instances are brought down is not in scope of this document.
  4. Initiate the final data copy at the destination with a 100% success threshold. The destination pulls the list of files, deletes unnecessary files and files whose digests do not match, and downloads the required files. Afterwards, it compares the files with the source and checks whether there is a 100% match. The less time the final copy takes, the shorter the gap between the source instance going down and the destination coming up.
  5. The operator can ensure that the files at the destination match the source. In the first iteration of this feature, an API is introduced to calculate a digest for the list of file names and their lengths to identify any mismatches. It does not validate the file contents at the binary level, but such a feature can be added later.
  6. Now, bring up the Cassandra instance at the destination. At this moment, the destination will have the same data as its source and will be equivalent to an instance coming up with a new IP address. The Sidecar at the source can now be brought down.
  7. If the destination has more data_file_directories than the source, then running nodetool relocatesstables is required.
  8. How to bring Cassandra/Sidecar instances up or down, and how to make or apply configuration changes, is outside the scope of this document.
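
The file comparison in step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Sidecar implementation: `FileInfo`, `plan_copy`, and the `verify_digest` flag name are hypothetical, file listings are modelled as plain dictionaries, and deleting destination files that no longer exist at the source is taken from the final-copy step.

```python
from collections import namedtuple

# Hypothetical per-file metadata: size in bytes and a precomputed digest string.
FileInfo = namedtuple("FileInfo", ["size", "digest"])


def plan_copy(source, destination, verify_digest=False):
    """Return (to_download, to_delete) as sorted lists of paths.

    source and destination map a relative path to its FileInfo.
    """
    to_download, to_delete = set(), set()
    for path, remote in source.items():
        local = destination.get(path)
        if local is None:
            # Present at the source only: download it.
            to_download.add(path)
        elif local.size != remote.size:
            # Size mismatch: drop the local copy and download again.
            to_delete.add(path)
            to_download.add(path)
        elif verify_digest and local.digest != remote.digest:
            # Optional digest check for files present on both sides.
            to_delete.add(path)
            to_download.add(path)
    for path in destination:
        if path not in source:
            # No longer present at the source (e.g. compacted away).
            to_delete.add(path)
    return sorted(to_download), sorted(to_delete)
```

With `verify_digest=False` the plan relies on size alone, which corresponds to skipping the digest check during the initial copy and enabling it for the final copy.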

The diagram below depicts the above procedure.


[Image: diagram of the migration procedure]


This procedure is roughly equivalent to running rsync multiple times. Sidecar's file transmission handler, FileStreamHandler, is fast, and this proposal leverages it.
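
The repeated-pass behaviour can be sketched as a loop that checks the success threshold before each download pass. Everything here is an assumption for illustration: the function names, the definition of the match ratio (share of source files present at the destination with the same size), and the exact counting of iterations, including the final check after the last pass.

```python
def match_ratio(source, destination):
    """Fraction of source files found at the destination with the same size."""
    if not source:
        return 1.0
    matched = sum(1 for path, size in source.items() if destination.get(path) == size)
    return matched / len(source)


def run_data_copy(list_source, list_destination, sync_once, threshold, max_iterations):
    """Return True once the threshold is met, False if it is never met.

    list_source / list_destination: callables returning {path: size}.
    sync_once: callable that copies the needed files for one pass.
    Iteration 0 is the "zeroth" pass, when the destination may be empty.
    """
    for _ in range(max_iterations + 1):
        source = list_source()
        # The threshold is tested before downloading anything in this pass.
        if match_ratio(source, list_destination()) >= threshold:
            return True
        sync_once(source)
    # Assumption: one last check after the final pass.
    return match_ratio(list_source(), list_destination()) >= threshold
```

For the final copy, the same loop would be run with `threshold=1.0` after the source Cassandra process has been stopped.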

...

  • Data copy task endpoints: to create, manage file downloads, monitor and cancel (if required) data copy tasks
  • List-files endpoint: to list all instance files as Sidecar URLs
  • File download/transfer endpoint: to download a single file
  • File digest endpoint: to calculate the digest of a single file, used to identify any mismatch between source and destination
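
To illustrate what a single-file digest involves, here is a small Python sketch that streams a file through a hash function in chunks. The digest algorithm is not named in this proposal; SHA-256, the chunk size, and the function name are assumptions made for the example.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute the hex digest of one file without loading it fully into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks matters here because SSTable data files can be far larger than available memory.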

Security

The proposed endpoints are required only while a migration is being performed, so they are not enabled by default and should be enabled on demand. There is no point in making them available when no migration is in progress, or in enabling them for instances for which no migration is planned. To facilitate this, a request filter will be introduced for these endpoints to allow or deny requests. How the filter is configured is an implementation detail.
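
One possible shape for such a filter is sketched below in Python. It is purely illustrative: the class name, the per-instance enable/disable calls, and the `/api/v1/live-migration/` path prefix are hypothetical and are not taken from Sidecar.

```python
class MigrationRequestFilter:
    """Deny migration endpoints unless explicitly enabled for an instance."""

    # Hypothetical prefix shared by the migration endpoints.
    PROTECTED_PREFIX = "/api/v1/live-migration/"

    def __init__(self):
        # Nothing is enabled by default.
        self._enabled_instances = set()

    def enable(self, instance_id):
        self._enabled_instances.add(instance_id)

    def disable(self, instance_id):
        self._enabled_instances.discard(instance_id)

    def allows(self, instance_id, path):
        """Allow unrelated routes; gate migration routes per instance."""
        if not path.startswith(self.PROTECTED_PREFIX):
            return True
        return instance_id in self._enabled_instances
```

An operator would enable the filter for the source and destination instances before a migration and disable it again afterwards.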

...

  • The route /api/v1/keyspaces/:keyspace/tables/:table/snapshots/:snapshot/components/:component in Cassandra Sidecar already transmits the SSTables of a keyspace. For migrating one instance to another, we need to copy all the files and not just SSTables.
  • One may utilize the "rsync" Linux tool for data transfer. However, it might not be permitted by the administrator or available in various environments such as Kubernetes or virtual instances like EC2. Enabling data transfer through Sidecar facilitates smooth instance migration.

...