Status

Current state: Under Discussion

Discussion thread: https://lists.apache.org/thread/pow83q92m666nqtwyw4m3b18nnkgj2y8

Slack: https://the-asf.slack.com/archives/CK23JSY2K/p1688662169018449

JIRA: CASSANDRA-19476

Released: Unreleased (5.2)

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

Cassandra uses JMX to expose management commands, such as taking a snapshot of specified keyspaces. Local JMX access is enabled by default, and remote access can be enabled through configuration when needed. Most management tools use JMX as the transport layer for command execution, including `nodetool`, the built-in command line tool that ships with Cassandra, and the Cassandra Sidecar, a standalone JVM process that runs alongside the Cassandra server daemon and is used for configuration management and/or metrics exposure.

This means that the extensive management API that Cassandra has is tied to the JMX MBeans themselves, and this approach has the following drawbacks, some of which are related to JMX itself:

  • JMX Security Concerns: JMX can be a security risk if not properly secured, as it can potentially allow remote code execution, e.g. CASSANDRA-17921;
  • JMX Operational Complexity: Managing a large number of MBeans is complex and cumbersome. While we can't get rid of the MBeans now for backwards-compatibility reasons, it is possible to avoid extending them every time a new command is needed;
  • JMX Classpath Awareness: Any JMX client needs to be aware of Cassandra’s classpath because standard JMX clients don't have certain Cassandra-specific class types. Otherwise, each management command implementation needs extra care to avoid exposing Cassandra-specific types. We encounter such problems from time to time, e.g. CASSANDRA-17668;
  • Absence of Management Command Metadata: The Cassandra server daemon must provide all command metadata needed to create a new command execution query at runtime, including but not limited to command arguments, command argument options, command description, etc. Typically, this metadata should be provided by the server itself, rather than implemented on the `nodetool` and sidecar sides;
  • Maintenance Cost: Cassandra ships built-in command line tools for accessing and administering Cassandra: `cqlsh` and `nodetool`. `nodetool` is used to execute Cassandra management commands and is implemented as a wrapper around the JMX client using the now obsolete Airline library. By moving towards running management commands with `cqlsh` we can reduce the overhead of maintaining `nodetool`.

There is no intention to fix everything in an initial pull request, for reasons of backwards compatibility. However, once the necessary steps to support command execution via CQL have been completed, the adopted deprecation policy will be applied.

Audience

  • Cassandra operators who want to configure the database and/or run adopted cluster commands;
  • Cassandra developers who are involved across the ecosystem building tools for cluster management;

Goals

  • To provide a way to run predefined management commands via CQL;
  • To provide a mechanism for retrieving command definitions and metadata via CQL;
  • To provide information on all available management commands via virtual tables;
  • To provide a registry that stores all C* commands and their metadata accordingly;
  • To provide internal instrumentation and a reasonable plan for migrating cluster management from JMX to CQL, taking into account backward compatibility and adopted deprecation policies;

Non-Goals

Considering how much of the world's infrastructure relies on our current JMX interfaces to orchestrate C* clusters, there is no appetite to update or remove existing MBeans or the CLI tools currently in use. This proposal also doesn't imply that JMX will be deprecated once the design is fully implemented; rather, it moves the project toward the point where such deprecation might be possible.

Proposed Changes

The proposed design describes, from a high-level component perspective, the changes needed to support command execution via CQL. It covers only the changes necessary to understand the solution design and the public API changes, leaving internal classes and interfaces undescribed to allow room for manoeuvre during the implementation.

Basic Requirements

Requirements that are necessary to support the execution of management commands through CQL:

  • Asynchronous execution, preferably with a UUID for each execution to track the execution result;
  • A single command registry, to provide generic command metadata to the API consumers such as CLI tools, or other ecosystem products. This concept is borrowed from the Dropwizard metrics registry and shares the same idea;
  • A dedicated admin port with the native protocol behind it, allowing only admin commands, to address cases where the native protocol is disabled, e.g. after the `disablebinary` command is executed;
  • Management via CQL must be developed alongside existing C* management API such as statically registered JMX MBeans, to ensure backward compatibility for ecosystem products that rely on it. However, this doesn't mean that new commands have to be implemented twice when they are required;
  • Reduce the cost of implementing a new command: once a new command is registered in the command registry, it is automatically available through all of the public interfaces we support, e.g. JMX and CQL. There is no need to implement a new command separately for the CLI tools, such as the nodetool, or for the corresponding MBeans; a new command must be available there out of the box.
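The first requirement, asynchronous execution tracked by a UUID, could be sketched roughly as follows. This is only an illustration: `ExecutionTracker` and its methods are hypothetical names that do not appear in the proposal, and a real implementation would also need result expiry and error reporting.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical tracker, not part of the proposal's API. */
final class ExecutionTracker
{
    private final Map<UUID, CompletableFuture<String>> executions = new ConcurrentHashMap<>();

    /** Starts the task asynchronously and returns the id used to look up its result later. */
    UUID submit(Supplier<String> task)
    {
        UUID id = UUID.randomUUID();
        executions.put(id, CompletableFuture.supplyAsync(task));
        return id;
    }

    /** Returns the future for a known execution id, or null if the id is unknown. */
    CompletableFuture<String> result(UUID id)
    {
        return executions.get(id);
    }
}

final class ExecutionTrackerDemo
{
    public static void main(String[] args)
    {
        ExecutionTracker tracker = new ExecutionTracker();
        UUID id = tracker.submit(() -> "snapshot taken");
        // join() blocks until the asynchronous task has finished.
        System.out.println(id + " -> " + tracker.result(id).join());
    }
}
```

Returning the id immediately lets a client poll for the outcome later instead of holding a connection open for the duration of a long-running command.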

Cassandra Solutions

Currently, there are a few ways to perform management operations:

  • JMX MBeans - a standard way of performing C* management operations that expose an interface consisting of a set of readable or writable attributes, or both.
  • nodetool - a command line utility that ships with C*. It is an essential tool for managing and monitoring a Cassandra cluster;
  • Cassandra Sidecar - a JVM process, distinct and separate in its lifecycle from the C* server process. This sidecar is designed to provide additional functionalities such as health checks, the execution of bulk commands etc. These services are accessible through a REST API, which internally leverages a jmxClient as its transport layer [1].
  • k8ssandra management API - provides a REST API for Cassandra nodes, allowing communication via Unix socket or HTTP(S) with optional TLS client authentication on the local machine. It integrates a Cassandra java-driver client that runs over the configured Unix socket referenced by the C* pid file, and an implementation of the java agent that intercepts and processes CQL queries, translating them into corresponding JMX MBean method calls. This API simplifies cluster management and monitoring by providing RESTful access to management commands and detailed metrics [2].

Other Vendor Solutions

How other vendors have solved the same problem:

  • Apache Ignite offers a variety of management interfaces, including a REST API provided as a database plugin and dynamically generated JMX MBeans. However, the preferred tool for many is the CLI tool, which acts as a wrapper over a standard thin client. This CLI connects to the cluster through a special admin port, typically set to 11211, to perform its management tasks, better known as compute tasks, across the nodes in the cluster. These compute tasks are part of the standard Ignite API for users [3].
  • Apache HBase uses Google’s protobuf to describe the RPC interfaces they expose to clients, for example, the Admin and Connection interfaces. The Admin and Connection interfaces in HBase don’t require a dedicated admin port to initiate a connection for cluster management. These classes utilize the client API to communicate with the HBase cluster over the same ports that are used for regular client access to HBase services [4].

Command Specifications

At present, Apache Cassandra doesn't advocate a unified approach to the specification of management commands that are exposed directly by the C* node itself. The heart of these management commands lies in the JMX MBeans. While each MBean provides operations on the same keyspaces and tables, they differ subtly in aspects such as command parameter names and their order. Additionally, MBeans do not provide fine-grained command metadata to a user. This limitation forces command-line interfaces and other tools that depend on the JMX API to create their own layer for organizing MBean operations, adding an extra layer of grouping to the process.

For example, the same operations on a keyspace are represented differently for the user's public API:

  • StorageServiceMBean exposes
    • forceKeyspaceCompactionForTokenRange - run compaction by token range;
    • forceKeyspaceCompactionForPartitionKey - run compaction by given partition key;
    • forceKeyspaceCompaction - run compaction on given table names;
  • NodeTool CLI tool combines these operations into a single `compact` command and operates based on the command's input arguments provided by a user;
  • k8ssandra management API exposes the command in much the same way as the nodetool CLI, combining the listed MBean methods under the single `compact` command, with small variations in the input arguments;

To solve this problem we can use the same design approach that has been adopted by the Dropwizard Metrics library for metrics [5], choosing the right granularity when creating command metadata. This means that all of the commands that are executed across the same keyspaces must have a common denominator of the command arguments and the order of the arguments.

Command Registry

The starting point for management operations is the CommandRegistry (or OperationManager), which is a collection of all commands or subcommands for C* management commands.

CommandRegistry
public interface CommandRegistry<A, R> extends Command<A, R>
{
    public Command<?, ?> command(String name);
    public Iterator<Map.Entry<String, Command<?, ?>>> commands();
}
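As a rough illustration of the two lookup methods above, a map-backed registry might look like the sketch below. The `Command` interface here is a simplified stand-in, and `InMemoryCommandRegistry` and `register` are hypothetical names introduced only for this example. Unlike the proposed interface, the sketch does not itself extend `Command`.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Simplified stand-in for the proposal's Command<A, R>; only what the sketch needs. */
interface Command<A, R>
{
    String description();

    R execute(A arg);
}

/** Hypothetical map-backed registry exposing the two lookup methods of CommandRegistry. */
final class InMemoryCommandRegistry
{
    // Sorted map so that commands() iterates in a stable, alphabetical order.
    private final Map<String, Command<?, ?>> commands = new ConcurrentSkipListMap<>();

    void register(String name, Command<?, ?> command)
    {
        if (commands.putIfAbsent(name, command) != null)
            throw new IllegalArgumentException("Command already registered: " + name);
    }

    Command<?, ?> command(String name)
    {
        return commands.get(name);
    }

    Iterator<Map.Entry<String, Command<?, ?>>> commands()
    {
        return commands.entrySet().iterator();
    }
}

final class RegistryDemo
{
    public static void main(String[] args)
    {
        InMemoryCommandRegistry registry = new InMemoryCommandRegistry();
        registry.register("echo", new Command<String, String>()
        {
            public String description() { return "Returns its argument"; }
            public String execute(String arg) { return arg; }
        });
        registry.commands().forEachRemaining(e -> System.out.println(e.getKey() + ": " + e.getValue().description()));
    }
}
```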

Command API

The nodetool CLI uses the Airline annotation-based framework [6] to parse input arguments and execute management commands from the command line; these annotations already contain all the necessary command metadata, including input arguments, their descriptions, and general command details. A significant limitation, however, is that the metadata is embedded in the CLI tool itself, making it inaccessible on the C* server node. As a result, the metadata can't be shared with other API consumers involved in management operations. Direct migration of all CLI commands to the CommandRegistry is not feasible either. This is not only due to the obsolescence of the Airline library, but also because it lacks the abstractions needed to reflect the available commands consistently across the CLI, JMX, and the REST API (represented by the k8ssandra management API project). Such a reflection requires a narrower approach to ensure consistency and compatibility across the different management interfaces.

Therefore, the Command API might look like this:

Command<A, R>
public interface Command<A, R>
{
    public String description();

    public Class<? extends A> argClass();

    public R execute(A arg);

    /** Custom output required to preserve backwards compatibility with the nodetool output. */
    public default void printResult(A arg, R res, Consumer<String> printer) {}
}

CompactCommand
public class CompactCommand implements Command<CompactCommandArg, Response>
{
    public String description()
    {
        return "Force a (major) compaction on one or more tables or user-defined compaction on given SSTables";
    }

    public Class<? extends CompactCommandArg> argClass()
    {
        return CompactCommandArg.class;
    }

    // The rest of the class, including execute(CompactCommandArg).
}
CompactCommandArg
@ArgumentGroup(value = {"userDefined", "startToken", "partitionKey"}, optional = true, oneOf = true)
public class CompactCommandArg implements Serializable
{
  @Argument(aliases = {"s", "split-output"}, description = "Use -s to not create a single big file", optional = true)
  public final boolean splitOutput;

  @Argument(aliases = {"user-defined"}, description = "Use --user-defined to submit listed files for user-defined compaction")
  public final boolean userDefined;

  @Argument(aliases = {"st", "start-token"}, description = "Use -st to specify a token at which the compaction range starts (inclusive)", optional = true)
  public final String startToken;

  @Argument(aliases = {"et", "end-token"}, description = "Use -et to specify a token at which compaction range ends (inclusive)", optional = true)
  public final String endToken;

  @Argument(aliases = {"partition", "partition_key"}, description = "String representation of the partition key", optional = true)
  public final String partitionKey;

  @Argument(aliases = {"keyspace"}, description = "The keyspace followed by one or many tables or list of SSTable data files when using --user-defined")
  public final String keyspaceName;

  @Argument(description = "The table names to compact")
  public final List<String> tables;

  // The rest of the class.
}
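To show how annotation-based metadata of this kind could be read back on the server side, the sketch below declares a hypothetical `@Argument` annotation with runtime retention and lists the annotated fields through reflection. The annotation definition, `SampleArg`, and `ArgumentMetadata` are assumptions made for the example; the proposal does not specify them.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical annotation; runtime retention is what makes it visible to reflection. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Argument
{
    String[] aliases() default {};

    String description();

    boolean optional() default false;
}

/** Trimmed-down argument class used only for this sketch. */
final class SampleArg
{
    @Argument(aliases = {"s", "split-output"}, description = "Use -s to not create a single big file", optional = true)
    boolean splitOutput;

    @Argument(aliases = {"keyspace"}, description = "The keyspace to compact")
    String keyspaceName;
}

final class ArgumentMetadata
{
    /** Describes every annotated field as "name (required|optional): description". */
    static List<String> describe(Class<?> argClass)
    {
        List<String> lines = new ArrayList<>();
        for (Field field : argClass.getDeclaredFields())
        {
            Argument arg = field.getAnnotation(Argument.class);
            if (arg == null)
                continue; // not a command argument
            lines.add(field.getName() + " (" + (arg.optional() ? "optional" : "required") + "): " + arg.description());
        }
        return lines;
    }
}

final class ArgumentMetadataDemo
{
    public static void main(String[] args)
    {
        ArgumentMetadata.describe(SampleArg.class).forEach(System.out::println);
    }
}
```

Because the metadata lives on the argument class rather than in the CLI, any adapter that can see the class can derive the same argument list.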

Command Registry Adapters

Once the CommandRegistry is available, some out-of-the-box adapters must be developed to achieve the goals of the proposal and make the commands available through JMX, CQL and indirectly REST API:

  1. CQL Command Adapter - The CQL invoker validates the given arguments based on the command metadata from CommandRegistry and invokes the corresponding command;
  2. Dynamic Command MBean Adapter - New dynamic JMX MBeans for management operations are generated and exposed to public API based on available command metadata for use by the nodetool. The JMX MBeans that are now statically registered are still supported. However, they are deprecated in favour of new ones;
  3. Open API Adapter - an adapter that provides dynamically generated RESTful API endpoints based on the CommandRegistry metadata as well as `openapi.json` specification; 
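The Dynamic Command MBean Adapter could be built on the standard `javax.management.DynamicMBean` interface. The sketch below is a minimal illustration under that assumption: `CommandMBean`, the single `execute` operation, and the object name are all hypothetical, and a command is reduced to a `Function<String, String>` for brevity.

```java
import java.lang.management.ManagementFactory;
import java.util.function.Function;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.MBeanInfo;
import javax.management.MBeanOperationInfo;
import javax.management.MBeanParameterInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.ReflectionException;

/** Hypothetical adapter exposing one command as a single "execute" JMX operation. */
final class CommandMBean implements DynamicMBean
{
    private final String description;
    private final Function<String, String> command;

    CommandMBean(String description, Function<String, String> command)
    {
        this.description = description;
        this.command = command;
    }

    @Override
    public MBeanInfo getMBeanInfo()
    {
        // The operation metadata is built at runtime instead of coming from a static MBean interface.
        MBeanParameterInfo[] signature = { new MBeanParameterInfo("arg", String.class.getName(), "Command argument") };
        MBeanOperationInfo execute = new MBeanOperationInfo("execute", description, signature,
                                                            String.class.getName(), MBeanOperationInfo.ACTION);
        return new MBeanInfo(getClass().getName(), description, null, null, new MBeanOperationInfo[]{ execute }, null);
    }

    @Override
    public Object invoke(String actionName, Object[] params, String[] signature) throws ReflectionException
    {
        if (!"execute".equals(actionName))
            throw new ReflectionException(new NoSuchMethodException(actionName));
        return command.apply((String) params[0]);
    }

    // The sketch exposes no attributes, only the operation above.
    @Override public Object getAttribute(String name) throws AttributeNotFoundException { throw new AttributeNotFoundException(name); }
    @Override public void setAttribute(Attribute attribute) throws AttributeNotFoundException { throw new AttributeNotFoundException(attribute.getName()); }
    @Override public AttributeList getAttributes(String[] names) { return new AttributeList(); }
    @Override public AttributeList setAttributes(AttributeList attributes) { return new AttributeList(); }
}

final class CommandMBeanDemo
{
    public static void main(String[] args) throws Exception
    {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // The object name is made up for the example.
        ObjectName name = new ObjectName("org.example.commands:type=Command,name=echo");
        server.registerMBean(new CommandMBean("Returns its argument", arg -> arg), name);
        Object result = server.invoke(name, "execute", new Object[]{ "hello" }, new String[]{ String.class.getName() });
        System.out.println(result);
        server.unregisterMBean(name);
    }
}
```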

Commands Virtual Table

All the commands and command definitions are available by querying a corresponding new virtual table based on metadata provided by the CommandRegistry.

Management Diagram

CQL Command Syntax

The commands below become valid. While many commands in Apache Cassandra are typically run on keyspaces and tables, and adopting a CQL syntax like `EXECUTE COMMAND rebuild ON keyspace.table` seems logical, it's more practical for the initial implementation to concentrate on accepting command arguments as straightforward key-value pairs or as a JSON string. This approach simplifies the early stages of development. Subsequently, the more intuitive CQL syntax can be introduced as an alias for these commands, enhancing usability and aligning with familiar patterns.

Execute Command

Basic Syntax

EXECUTE COMMAND forcecompact 
    WITH keyspace=distributed_test_keyspace
    AND table=tbl
    AND keys=["k4", "k2", "k7"];
EXECUTE COMMAND rebuild 
    WITH keyspace=distributed_test_keyspace
    AND sourceDataCenterName=datacenter1
    AND tokens=null
    AND specificSources=null
    AND excludeLocalDatacenterNodes=true;
EXECUTE COMMAND setconcurrentcompactors 
    WITH concurrent_compactors=5;
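To make the key-value form concrete, the sketch below splits such a statement into a command name and a map of raw argument values. It is only an illustration: a real implementation would presumably extend the CQL grammar rather than use a regular expression, and `ParsedCommand` and `ExecuteCommandParser` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parse result: the command name plus its raw, unparsed argument values. */
final class ParsedCommand
{
    final String name;
    final Map<String, String> args;

    ParsedCommand(String name, Map<String, String> args)
    {
        this.name = name;
        this.args = args;
    }
}

final class ExecuteCommandParser
{
    // EXECUTE COMMAND <name> [WITH <key>=<value> [AND <key>=<value>]...];
    private static final Pattern STATEMENT =
        Pattern.compile("\\s*EXECUTE\\s+COMMAND\\s+(\\w+)(?:\\s+WITH\\s+(.*?))?\\s*;?\\s*",
                        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static ParsedCommand parse(String statement)
    {
        Matcher m = STATEMENT.matcher(statement);
        if (!m.matches())
            throw new IllegalArgumentException("Not an EXECUTE COMMAND statement: " + statement);

        Map<String, String> args = new LinkedHashMap<>();
        if (m.group(2) != null)
        {
            // Naive split: a value that itself contains " AND " would need a real grammar.
            for (String pair : m.group(2).split("(?i)\\s+AND\\s+"))
            {
                int eq = pair.indexOf('=');
                if (eq < 1)
                    throw new IllegalArgumentException("Expected key=value but got: " + pair);
                args.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return new ParsedCommand(m.group(1), args);
    }
}

final class ExecuteCommandParserDemo
{
    public static void main(String[] args)
    {
        ParsedCommand cmd = ExecuteCommandParser.parse("EXECUTE COMMAND setconcurrentcompactors WITH concurrent_compactors=5;");
        System.out.println(cmd.name + " " + cmd.args);
    }
}
```

The values stay as raw strings here; converting them to typed arguments would be the job of the command metadata in the registry.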

Describe Command

DESCRIBE COMMANDS | COMMAND command_name;

Shows the output depending on the options selected:

  • List of all commands in a cluster;
  • Executable CQL information for a single command, including command attributes and command description;

Nodetool Compatibility

Once the CommandRegistry is in place, our goal is to streamline the process so that simply adding a command to the registry makes it available to the CLI. This leads to a decision point: we can either adopt a new CQL syntax for executing commands or continue utilizing JMX MBeans for managing node states. Opting for the new CQL syntax would necessitate reconfiguring the nodetool to communicate through a new administrative port. However, this change risks breaking backward compatibility for numerous existing deployments. We must consider various deployment complexities, such as potential firewall restrictions blocking the new port, particularly in scenarios involving remote node management. 

Given these considerations, maintaining the current functionality of the nodetool via internal JMX client emerges as a crucial aspect that we need to preserve. Thus to accomplish the goal, we need to change the following:

  • New dynamic JMX MBeans must be created to expose the available commands to the public API in a way that matches the corresponding CQL queries. The APIs provided by the static MBeans and the CLI are too far apart;
  • The nodetool uses a newly created dynamic MBean to achieve both API alignment and API backward compatibility goals;
  • The nodetool parses the input arguments based on the command metadata it receives from the CommandRegistry;

Minimum Viable Product (MVP)

Although the scope has been described quite broadly, the minimum viable product includes the following changes:

  • The CommandRegistry is available;
  • Some commands from the nodetool are migrated to the new CommandRegistry;
  • New dynamic MBeans representing the registered commands are available;
  • Added support for a new CQL syntax that accepts a simple set of K-V parameters required for command execution;
  • Added to the nodetool a new command invoker, which is aware of new commands via the CommandRegistry;

New or Changed Public Interfaces

The new public interfaces are available:

  • The native protocol specification is updated and new CQL syntax is supported;
  • New dynamic JMX MBeans are exposed that are aligned with the CLI and CQL;
  • The `nodetool` commands are adapted based on the metadata available in the CommandRegistry, while the old arguments and argument order are still supported;

Compatibility, Deprecation, and Migration Plan

It's not expected that the proposed changes will be finalized in a single JIRA issue and pull request unless we're focusing on a minimum-viable product. It is planned to implement a deprecation policy for the statically registered MBeans over several major releases. For example, `nodetool` commands will be phased into new JMX MBeans, with each command migrated individually. Throughout this process, we'll ensure that full backward compatibility is maintained for users and the distributed tests as well.

Dynamic JMX MBeans

New dynamic MBeans are available in the public API that match the migrated `nodetool` commands and the way the same commands can be executed via CQL. These endpoints are fully compatible as the statically registered MBeans remain intact.

CQL Syntax

The proposed changes to CQL should be fully backwards compatible since they don’t introduce any regressions to current CQL capabilities.

Internode Messaging

Command execution via CQL using the newly introduced syntax must be rejected until all nodes in the cluster are upgraded to the new version that supports the new syntax.
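The upgrade check described above might be reduced to a comparison over the versions reported by all nodes, as in the sketch below. How node versions are discovered is left out, `UpgradeGate` is a hypothetical name, and the minimum version `5.2` is taken from the "Released" field of this document purely as an example value.

```java
import java.util.Collection;
import java.util.List;

/** Hypothetical gate; how node versions are actually discovered is outside this sketch. */
final class UpgradeGate
{
    /** Compares dotted numeric versions such as "5.1.0" and "5.2"; missing parts count as 0. */
    static int compareVersions(String a, String b)
    {
        String[] left = a.split("\\.");
        String[] right = b.split("\\.");
        for (int i = 0; i < Math.max(left.length, right.length); i++)
        {
            int l = i < left.length ? Integer.parseInt(left[i]) : 0;
            int r = i < right.length ? Integer.parseInt(right[i]) : 0;
            if (l != r)
                return Integer.compare(l, r);
        }
        return 0;
    }

    /** True only when every node runs at least the version that introduced the syntax. */
    static boolean commandsAllowed(Collection<String> nodeVersions, String minVersion)
    {
        return !nodeVersions.isEmpty()
               && nodeVersions.stream().allMatch(v -> compareVersions(v, minVersion) >= 0);
    }
}

final class UpgradeGateDemo
{
    public static void main(String[] args)
    {
        // One node is still on an older version, so command execution is rejected.
        System.out.println(UpgradeGate.commandsAllowed(List.of("5.2.0", "5.1.4"), "5.2"));
    }
}
```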

Test Plan

The nodetool commands are widely used across the distributed tests to perform operations on the node, which in turn means that running the full CI is sufficient to verify that these executions produce the same expected results. 

For the CQL commands a new custom test adapter is implemented that translates an MBean method call into the appropriate CQL syntax for the command that is being executed, so everything is tested via CQL on the same set of `nodetool` tests.
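The translation step of such a test adapter could be as small as rendering a command name and its arguments into the key-value syntax shown earlier. The sketch below assumes the arguments are already available as a map; `CqlCommandBuilder` is a hypothetical name, and value quoting or escaping is not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CqlCommandBuilder
{
    /** Renders "EXECUTE COMMAND <name> WITH k=v AND k=v;" in the iteration order of the map. */
    static String toCql(String command, Map<String, ?> args)
    {
        StringBuilder cql = new StringBuilder("EXECUTE COMMAND ").append(command);
        if (!args.isEmpty())
        {
            StringJoiner pairs = new StringJoiner(" AND ", " WITH ", "");
            args.forEach((key, value) -> pairs.add(key + "=" + value));
            cql.append(pairs);
        }
        return cql.append(';').toString();
    }
}

final class CqlCommandBuilderDemo
{
    public static void main(String[] args)
    {
        // LinkedHashMap keeps insertion order and, unlike Map.of, accepts null values.
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("keyspace", "distributed_test_keyspace");
        params.put("tokens", null);
        params.put("excludeLocalDatacenterNodes", true);
        System.out.println(CqlCommandBuilder.toCql("rebuild", params));
    }
}
```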

Rejected Alternatives

There are not many alternatives in this case, since CQL is the Cassandra-specific protocol. Any other implementation would be burdened with unrelated code and would also greatly increase the amount of change and maintenance required.

Since management via CQL will be available once the changes have been implemented, the question remains whether nodetool should still use the JMX client to trigger management operations or be migrated to CQL instead. The former seems to offer a smoother transition for C* clusters already in use.