Open proposals and issues

A new Tcl script and templates execution model (Massimo)

I've been thinking lately of a rather radical revision of the way Tcl code is currently executed in mod_rivet. I haven't checked the feasibility of every detail, but I don't see any fundamental obstacle to the implementation of such a scheme.

Let's imagine mod_rivet replacing the current C language machinery of script evaluation by simply evaluating this Tcl script

  try {
    <content-generation-script>
  } trap {RIVET ABORTPAGE} {} {
    <abort-script>
  } trap {RIVET THREAD_EXIT} {} {
    <sudden-exit-script>
  } on error {::rivet::error_info ::rivet::error_options} {
    <error-script>
  } finally {
    <after-every-script>
  }

The script could be composed in memory from configuration information, for example omitting the <abort-script> trap or the 'finally' branch if no such handlers have been set up. The error-script handler could also be determined during the configuration stage, so that the 'on error' branch always exists, at least with the default handler.

The content generation script should by default be the concatenation of the sequence
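A minimal sketch of how such a script might be composed in memory. The proc name build_request_script and the dictionary keys (content, abort, exit, error, finally) are made up for illustration and are not existing Rivet configuration names; the default error handler is a placeholder too. Each trap is given an empty variable list, as the syntax of Tcl's try command requires.

```tcl
# Hypothetical helper: compose the request-processing script from a dict of
# configured handlers. Key names are illustrative, not Rivet directives.
proc build_request_script {handlers} {
    # The content-generation script is mandatory.
    set script "try {\n[dict get $handlers content]\n}"

    # Optional traps are emitted only when a handler has been configured.
    if {[dict exists $handlers abort]} {
        append script " trap {RIVET ABORTPAGE} {} {\n[dict get $handlers abort]\n}"
    }
    if {[dict exists $handlers exit]} {
        append script " trap {RIVET THREAD_EXIT} {} {\n[dict get $handlers exit]\n}"
    }

    # The error branch always exists, falling back to a placeholder default.
    if {[dict exists $handlers error]} {
        set error_script [dict get $handlers error]
    } else {
        set error_script {puts stderr "request failed: $::rivet::error_info"}
    }
    append script " on error {::rivet::error_info ::rivet::error_options} {\n$error_script\n}"

    # The 'finally' branch is also optional.
    if {[dict exists $handlers finally]} {
        append script " finally {\n[dict get $handlers finally]\n}"
    }
    return $script
}

# The ::rivet namespace must exist for the error variables to be assignable;
# under mod_rivet it already does.
namespace eval ::rivet {}
```

Calling build_request_script with only a content entry yields a try with just the 'on error' branch, which is the smallest script this scheme would evaluate.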

::Rivet initialize_request
eval <before-script>
namespace eval ::request { <url-script> }
eval <after-script>

which reproduces the current execution sequence. The script referenced in the URL could be stored in the module status and, when needed, accessed through a new command, something like

 eval [::rivet::url_script]

The command should, if needed, fetch the script from the cache. So far the only problems I can see concern performance: I expect the C language machinery to be faster, though in this case not dramatically faster than the 'try....finally' construct (after all, most requests are successfully processed without errors. At least we hope so). Of course there are possible optimizations, like chaining the before-url-after scripts into one script requiring a single call to Tcl_EvalObjEx.
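The chaining optimization could look roughly like this. It is only a sketch: compose_content_script is a hypothetical name, and the stub for ::Rivet is there solely so the example runs outside mod_rivet.

```tcl
# Hypothetical helper: chain the before, URL and after scripts into a single
# script, so that one evaluation runs the whole sequence.
proc compose_content_script {before url_script after} {
    return [join [list \
        {::Rivet initialize_request} \
        $before \
        [list namespace eval ::request $url_script] \
        $after] "\n"]
}

# Stand-in for the real command when running outside mod_rivet.
if {[llength [info commands ::Rivet]] == 0} {
    proc ::Rivet {args} {}
}

set ::steps {}
eval [compose_content_script \
    {lappend ::steps before} \
    {lappend ::steps [namespace current]} \
    {lappend ::steps after}]
# ::steps now holds: before ::request after
```

Building the 'namespace eval' line with list keeps the URL script properly quoted whatever characters it contains.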

Perhaps I've been influenced by years of mod_rewrite usage where the local path is analyzed and the transformation output blended into the list of argument-value pairs to become the final URL query arguments. I ended up having virtual hosts running websites with a single empty root index.rvt and every template is parsed from the main application script. Furthermore the content generation script is shared between different applications and does not reside in any directory tree pointed to by DocumentRoot directives. The bottom line is that the segmentation in 3 scripts has no meaning to me (..and for what I need the URL referenced script is of no interest either. But that's my approach...).

My point is that we could take a further step ahead and make the default request processing script shown above entirely replaceable by a user defined script (either based on try...on error...finally... or on any other solution the user would deem fit for their applications). Some optimization could be attained by introducing some mechanism for building such a script by inclusion of the code, not simply by sourcing or parsing other files. The script could be kept in a special cache providing some methods for tracking its dependencies on the files it was built from, so that the buffer is refreshed when one of them is modified.
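Here is a rough sketch of a cache of that kind, written in plain Tcl. The ::scriptcache namespace and its build and get procs are invented names, and using file modification times as the dependency check is my assumption; the real thing would probably live in the C code of the module.

```tcl
# Hypothetical script cache: a script is built by inlining files and is
# rebuilt whenever one of those files changes on disk.
namespace eval ::scriptcache {
    variable entries [dict create]
}

# Build the script for 'key' by concatenating the contents of 'files' and
# record the mtime of each file as a dependency.
proc ::scriptcache::build {key files} {
    variable entries
    set script ""
    set deps [dict create]
    foreach f $files {
        set fd [open $f r]
        append script [read $fd] "\n"
        close $fd
        dict set deps $f [file mtime $f]
    }
    dict set entries $key [dict create script $script deps $deps]
    return $script
}

# Return the cached script, rebuilding it if any dependency was modified
# or removed since the script was built.
proc ::scriptcache::get {key files} {
    variable entries
    if {[dict exists $entries $key]} {
        set stale 0
        dict for {f mtime} [dict get $entries $key deps] {
            if {![file exists $f] || [file mtime $f] != $mtime} {
                set stale 1
                break
            }
        }
        if {!$stale} {
            return [dict get $entries $key script]
        }
    }
    return [build $key $files]
}
```

A request handler would then call something like ::scriptcache::get with the application key and its file list, and evaluate the result.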

Karl's wishlist: open points

This talk of new capabilities for a new release of Rivet has had me revisiting something I've been thinking about for quite a while...

It's really easy for developers to use forms in a way that they shouldn't, putting stuff in hidden fields, etc, that could easily be monkeyed with by an attacker. That data needs to be kept server side and accessed with an opaque, unguessable session ID cookie or equivalent.

It's also super common for developers to accept input from forms with insufficient checks on the validity of the input data, the source of SQL injection attacks and so forth.

I'm thinking of what you might call a response broker. (This isn't an original idea -- I've read about stuff like this.)

Rather than doing a load_response, the page handling the response would explicitly name the fields it expects to get from the form: you invoke a proc specifying the fields you expect to find, their data types, optional code to validate them, and an array to stuff the validated fields into.

set wantVarList {{username string} {id integer} {uid check_routine validate_uid} {password check_routine validate_password} {hash base64} {email email}}

set status [response_broker $wantVarList response]

Since every page using the response broker would need to check for response broker parse failures it would probably be nice to be able to specify a general handler routine that would run and then abort the page, removing the need to check the return.

Now if I run…

response_broker $wantVarList response

…if the page continues then the response array would contain validated fields found in the form for the variables named in wantVarList and no others. You might want the presence of unexpected fields to also blow out, but that would be likely to bite you pretty often when the reasons are harmless. Maybe log them and include a {field ignore} option that will inhibit logging for expected-but-ignored fields.
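A minimal sketch of what such a broker might look like. Everything here is an assumption made for illustration: the third argument carrying the raw form fields as a dict (inside Rivet they would come from the request itself), the loose regular expressions used for the base64 and email types, and the choice to return the list of fields that failed validation.

```tcl
# Hypothetical response broker. Only fields named in wantVarList can reach
# the target array, and only after passing their type check.
proc response_broker {wantVarList arrayName rawFields} {
    upvar 1 $arrayName response
    array unset response
    set failures {}
    foreach spec $wantVarList {
        lassign $spec field type checker
        # Fields absent from the form are simply skipped.
        if {![dict exists $rawFields $field]} { continue }
        set value [dict get $rawFields $field]
        switch -- $type {
            string  { set ok 1 }
            integer { set ok [string is integer -strict $value] }
            base64  { set ok [regexp {^[A-Za-z0-9+/]*={0,2}$} $value] }
            email   { set ok [regexp {^[^@[:space:]]+@[^@[:space:]]+\.[^@[:space:]]+$} $value] }
            check_routine { set ok [expr {[$checker $value] ? 1 : 0}] }
            ignore  { continue }
            default { error "unknown field type \"$type\"" }
        }
        if {$ok} {
            set response($field) $value
        } else {
            lappend failures $field
        }
    }
    return $failures
}

# Example validation routine (illustrative).
proc validate_uid {value} {
    expr {[string is integer -strict $value] && $value > 0}
}
```

With this shape, a general failure handler could be layered on top: if the returned list is not empty, run the handler and abort the page.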

We've done a form package at FlightAware that has a specific look and feel that allows you to specify both Tcl and Javascript validation code. Of course you still need the Tcl code on the server but it's nice in the modern era to also provide Javascript validation on the browser. We could probably open source if its appearance was genericized or something, but I mention it mainly to point out the usefulness of such an approach both for the developer and the users and a likely need to push Rivet into providing more stuff to support developers making modern websites.

We use separate virtual interpreters for development (for each developer we diddle auto_path to give them source-controlled private copies of all of our packages) and it's really badass. The problem comes in the all-or-nothing approach of SVI. We set up a virtual host for each developer, for both port 80 (well really 8080 + varnish on 80) and 443. This results in 18 interpreters in each httpd process on our development machine, making for very large httpd processes and slow startup time after a graceful.

Right now if one of our developers changes a package, private to them, they still have to do an apachectl graceful to pick up the change. This restarts all of the httpd processes and reinitializes all of the interpreters. Our interpreter initialization is intense. Each FlightAware httpd process loads 468 packages.

I'd like to be able to cause only one vhost's Tcl interpreters to be reloaded by a Tcl_DeleteInterp / Tcl_CreateInterp / Rivet initialization process. Instead of a graceful, you'd be able to specify something like a trigger file for each vhost. Every time a vhost (with separate virtual interpreters) serves a page, it gets the mtime of the trigger file. If the mtime of the trigger file has changed since the last time the interpreter served a page, Rivet deletes the virtual host's interpreter, creates and initializes a new one, and then handles the page. [I tried to write this but kind of lost control of it and was not successful.]

This way, developers could totally reload all their libraries without any httpd processes being stopped or started. Also this will lower overall overhead because a lot of times a httpd process won't have ever handled a page in its lifetime for many to most of the virtual interpreters.

An additional improvement would be to create the ability to not even initialize a vhost's separate virtual interpreter until the first time it is needed.
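The trigger-file reload and the lazy initialization can be sketched at script level with child interpreters. This is only an analogy for what the module would do in C with Tcl_DeleteInterp and Tcl_CreateInterp; the ::vhost namespace, the interp_for proc and its arguments are hypothetical.

```tcl
# Hypothetical per-vhost interpreter manager.
namespace eval ::vhost {
    # vhost name -> {child interpreter, trigger mtime seen at creation}
    variable state [dict create]
}

# Return the interpreter for 'vhost', creating it on first use and
# recreating it when the trigger file's mtime has changed.
proc ::vhost::interp_for {vhost triggerFile initScript} {
    variable state
    set mtime [expr {[file exists $triggerFile] ? [file mtime $triggerFile] : 0}]
    if {[dict exists $state $vhost]} {
        lassign [dict get $state $vhost] child seen
        if {$seen == $mtime} {
            return $child
        }
        # Trigger file touched: throw the old interpreter away.
        interp delete $child
    }
    set child [interp create]
    interp eval $child $initScript
    dict set state $vhost [list $child $mtime]
    return $child
}
```

Nothing is created for a vhost until its first request, and touching one developer's trigger file leaves every other vhost's interpreter alone.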

.rvt templates run within the ::request namespace. It is a sensible practice to parse other templates or to source other scripts from an .rvt file, but that code then also runs within the ::request namespace, which is going to be destroyed after the request has been served. This has consequences on how specific Tcl commands work (e.g. 'package require ...') and has to be documented extensively in the manual (suggested by Karl)
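One consequence can be shown in plain Tcl, outside Rivet: a proc defined with an unqualified name by a script sourced inside ::request lands in that namespace and disappears with it. The temporary file and the proc name helper are just for the demonstration; 'package require' is not covered here and may behave differently.

```tcl
# Write a tiny library defining a proc with an unqualified name.
set fd [file tempfile libpath]
puts $fd {proc helper {} { return "hello" }}
close $fd

# Source it from within ::request, as an .rvt template would.
namespace eval ::request [list source $libpath]
set during [info commands ::request::helper]

# Deleting the namespace (as happens after a request) removes the proc.
namespace delete ::request
set afterwards [info commands ::request::helper]

file delete $libpath
```

Here the variable during holds ::request::helper while afterwards is empty, and the proc never shows up in the global namespace.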

RivetProposals (last edited 2016-09-24 16:08:42 by MassimoManghi)