...

The Grid Agents are Java processes installed and running on each machine that listen (on a configured port) for commands from a Grid Manager application (Web or CLI) and act upon them by interacting with the local Tomcat instances. No encryption is envisioned on the network channels, since all these machines are considered to be installed on a secured network segment (at least in prod). [Maybe we should reconsider this]

markt: I think this needs to be reconsidered.

vlad: Sure. My take was to use unencrypted connections, since that's fast for me to implement and I don't really know how to implement a secured connection :/ . Maybe we could even offer both options. Anyway, why do you think we need a secured connection? In my experience, I've never needed one in dev/stage/QA/test, and in prod the environments are isolated. Well... I guess there are other scenarios I haven't been exposed to.
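
As a rough illustration of the "offer both options" idea, the sketch below opens the agent's listening socket either in plain TCP or through the JVM's default TLS factory. The class name `AgentListener` and the `secure` flag are hypothetical and not part of the proposal. The TLS branch assumes a keystore has been configured for the JVM (for example through the standard `javax.net.ssl.keyStore` system property); without one the socket opens but handshakes will fail.

```java
import java.io.IOException;
import java.net.ServerSocket;
import javax.net.ServerSocketFactory;
import javax.net.ssl.SSLServerSocketFactory;

/** Hypothetical helper: opens the Grid Agent's listening socket, plain or TLS. */
class AgentListener {

    /**
     * Opens a server socket on the given port (0 lets the OS pick a free port).
     * With secure == true the JVM's default TLS server socket factory is used;
     * it only completes handshakes if a keystore was configured for the JVM.
     */
    static ServerSocket open(int port, boolean secure) throws IOException {
        ServerSocketFactory factory = secure
                ? SSLServerSocketFactory.getDefault()
                : ServerSocketFactory.getDefault();
        return factory.createServerSocket(port);
    }

    public static void main(String[] args) throws IOException {
        // Plain (unencrypted) listener, as in the original proposal.
        try (ServerSocket socket = open(0, false)) {
            System.out.println("Agent listening on port " + socket.getLocalPort());
        }
    }
}
```

Since both factories share the `ServerSocketFactory` type, the rest of the agent would not need to know which kind of channel it was given.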

...

Collectively, Tomcat instances and Grid Agents are "services" since both can be managed.

markt: Managing the agents strikes me as making this significantly more complex. Operating systems have tools to ensure particular services are running and are restarted if they fail. What is the benefit of pulling this into this tool?

vlad: I agree that agents can run as daemons and be up all the time. However, I've seen many "robust" programs that have memory leaks or that, for one reason or another, stop working normally after some time. Maybe after a couple of days, or weeks, or months. Maybe it would be useful to restart them just before a critical operation such as a deployment. I agree that managing agents is more expensive, since it requires the development and maintenance of multiple OS-dependent implementations.

...

Considering all the above, the following phases could be considered as a base line for the road map of the Tomcat Grid.

markt: I'd like to see a little more high level architecture to steer development. For example, is there a common core of management functionality that we then wrap with a web-based GUI and a CLI? That would allow others to write other wrappers to plug this into other tools.

vlad: I would like to provide as much detail as you need, but I'm afraid I still don't quite understand what exactly you refer to as the architecture. Anyway, below is a general overview of the software modules and their responsibilities.

...

In particular, this phase includes no automatic provisioning of Tomcat instances or Grid Agents, no configuration GUI (only pre-configured XML config files), no WAR deployments, no command-line interface, no complex grid operations, no secondary managers, and no collections.

markt: This raises another architectural question. Wouldn't this be more scalable if agents were configured with the location of the primary manager and registered themselves? The manager could persist that registration so an agent would have to be explicitly removed if it was taken off-line permanently.

vlad: Definitely. A basic agent install and registration could "summon" all the necessary software from the primary manager: i.e. Tomcat executables, grid configuration, etc. I thought about something like this as an advanced feature (phase 12), but we can rearrange the priorities if needed.
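
To make the self-registration idea more concrete, here is a minimal manager-side sketch. Everything in it is an assumption for illustration: the class name `AgentRegistry`, the method names, and the choice of an agent name as the key. It keeps registrations in memory only; the persistence markt mentions is left out.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical manager-side registry of agents that announced themselves. */
class AgentRegistry {

    // Agent name -> address where the agent listens. In-memory only;
    // a real manager would also persist this so it survives restarts.
    private final Map<String, InetSocketAddress> agents = new ConcurrentHashMap<>();

    /** Called when an agent announces itself; registering again overwrites the old entry. */
    void register(String agentName, String host, int port) {
        // createUnresolved avoids a DNS lookup at registration time.
        agents.put(agentName, InetSocketAddress.createUnresolved(host, port));
    }

    /** Explicit removal, for an agent taken off-line permanently. Returns false if unknown. */
    boolean remove(String agentName) {
        return agents.remove(agentName) != null;
    }

    /** Looks up where a registered agent can be reached. */
    Optional<InetSocketAddress> lookup(String agentName) {
        return Optional.ofNullable(agents.get(agentName));
    }

    /** Number of currently registered agents. */
    int size() {
        return agents.size();
    }
}
```

An agent that merely stops responding stays registered; only an explicit `remove` drops it, which matches the suggestion that permanent removal be a deliberate action.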

Included features are:

  1. The Web Grid Manager presents a Web interface that shows information about the whole Grid and presents simple buttons to operate the Tomcat instances.
  2. The managing logic must be clearly separated from the Web interface logic, since later on, a Command-Line Grid Manager will be included, and will use the same managing logic.
  3. The available commands for each instance are:
    • status: retrieves the status of a Tomcat instance through the corresponding Grid Agent
    • trigger-start: sends a start request to the Tomcat instance using the corresponding Grid Agent
    • trigger-stop: sends a stop request to the Tomcat instance using the corresponding Grid Agent
    • trigger-kill: sends a kill request to the Tomcat instance using the corresponding Grid Agent
      • markt: A small thing. I think I'd prefer start/stop/kill/
      • vlad: I added these commands in phase 7 and, as I see them, they behave a little bit differently from the trigger ones, especially when it comes to the CLI manager. The trigger-start issues the signal to the Grid Agent and ends. The start keeps on working (and updating the user) until the Tomcat instance is actually up (or fails to start up). On the Web interface the trigger-start could show up as a simple icon (omitting its name). On the CLI I see the start command as far more useful than the trigger-start.
  4. A simple configuration file lists all the machines and their instances so the Grid knows where each instance resides. [This configuration file is probably in XML format]
  5. Grid Agents are installed on each machine and manage all the instances on that machine that belong to the Grid. Grid Agents receive commands from any manager and act accordingly. To manage the instances the Agents use:
    • Shell calls: start an instance, kill an instance.
    • JMX calls to retrieve instance live information.
    • JMX calls to change instance live values, and to request instance shutdown.
    • OS calls for any OS related need.
  6. It's assumed that a port will be accessible from each Grid Manager to each machine where the Grid Agents are serving. The firewall, if present, must allow active server-type sockets on that port.
    • markt: Another architectural question. Which end opens the connection, does it stay open and which protocol is used? For example, agents connect to Manager via WebSocket.
    • vlad: I always considered the Grid Agent would open a server NIO socket, to avoid using ephemeral ports. The Grid Agent is always listening, and the Managers connect when needed.
  7. Multiple Grids (and Grid Agents) can be running on the same set (or subset) of machines. If that's the case, Tomcat instances and Grid Agents run on different ports for each grid. When multiple grids use the same machines they don't interfere with each other and can be operated simultaneously.
  8. The status command shows the following information for each instance:
    • Machine
    • Service (a unique grid-wide name for each instance)
    • State
  9. The state of an instance can be:
    • Active: the instance OS process exists, the instance is serving requests, and it looks healthy [enough].
    • Zoetic [for lack of a better word]: the instance OS process exists, but the instance is unresponsive and doesn't respond to requests for state. It's probably not serving any HTTP requests, does not look healthy, it may be starting, it may be shutting down, it may be overwhelmed. Who knows.
    • Stopped: the instance OS process does not exist, and therefore the instance is not operating at all.
    • Not Available: This is a pseudo state that the manager applications (web and cli) show when a Grid Agent does not respond to requests for status in a timely manner.
    • If possible, it would be great to distinguish different sub cases of the Zoetic state, to help the user determine what's going on and tackle the case accordingly:
      • Starting: The Tomcat instance process exists, and the instance is starting. It's not yet serving HTTP requests.
      • Stopping: The Tomcat instance process exists, and the instance is stopping. It's no longer serving HTTP requests.
      • Unresponsive: The Tomcat instance process exists, but the instance health isn't good, it's not responding to HTTP requests, or it's overwhelmed. It's not even responding to requests for status. This state is different from "Not Available" since in this case the Grid Agent IS active and responding, but the Tomcat instance itself is unresponsive.
      • On second thought, these extra states can actually be distinguished today with the current version of Tomcat, since the Grid Agents know all the trigger commands each local instance has received and can deduce (or make up) the sub case. If the Grid Agent is restarted, some kind of persistence of its state might be needed to "remember" what was going on before the Grid Agent was shut down, in order to make an educated guess.
  10. Grid Agents communicate over unsecured TCP sockets, and assume communication security is enforced by the network architecture (segregated segments/VLANs).
  11. The "trigger"-type commands just deliver the corresponding signal to the instance's Grid Agent and return right away, without waiting for the full operation to complete. It's a kind of "fire and forget". The web user can keep refreshing the web interface to follow the progress of the Tomcat instances' status.
  12. Simple user name/password authentication is implemented to secure the Web interface. [Maybe we'll need to provide more options]
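
The instance states in item 9 can be read as a small decision procedure. The sketch below is one possible reading, not a settled design: the names `InstanceStates`, `derive`, `guessSubCase` and `LastTrigger` are hypothetical, and the three boolean inputs are assumed to come from the manager (did the agent answer in time?) and from the agent (does the OS process exist, and does the instance answer a status request?).

```java
/** Hypothetical sketch of the instance states and Zoetic sub cases from item 9. */
class InstanceStates {

    enum State { ACTIVE, ZOETIC, STOPPED, NOT_AVAILABLE }

    enum ZoeticSubCase { STARTING, STOPPING, UNRESPONSIVE }

    /** Last trigger command the agent delivered to the instance, if any. */
    enum LastTrigger { NONE, START, STOP, KILL }

    /** Maps the three observations to one of the states shown to the user. */
    static State derive(boolean agentResponded, boolean processExists, boolean instanceResponds) {
        if (!agentResponded) {
            return State.NOT_AVAILABLE; // pseudo state: the agent itself is silent
        }
        if (!processExists) {
            return State.STOPPED;
        }
        return instanceResponds ? State.ACTIVE : State.ZOETIC;
    }

    /**
     * Educated guess at the Zoetic sub case, based on the last trigger the
     * agent remembers. Anything other than START or STOP is an assumption
     * here and falls back to UNRESPONSIVE.
     */
    static ZoeticSubCase guessSubCase(LastTrigger last) {
        switch (last) {
            case START:
                return ZoeticSubCase.STARTING;
            case STOP:
                return ZoeticSubCase.STOPPING;
            default:
                return ZoeticSubCase.UNRESPONSIVE;
        }
    }
}
```

If the agent persisted `LastTrigger` per instance, it could still make this guess after its own restart, as the last bullet of item 9 suggests.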

...