2024-04-29 Meeting notes

Date

29 Apr 2024 at 9am PT

https://www.timeanddate.com/worldclock/fixedtime.html?msg=Solr+Community+Meetup%2C+April+2024&iso=20240429T09&p1=224&ah=1

meet.google.com/rju-vqau-gqt

Attendees

Alex Zhou
Aparna Suresh
Carlos Ugarte
Christopher Ball
David Smiley
Eric Pugh
Hoss
Jan Hoydahl
Jason Gerlowski
Justin Sweeney
Kevin Liang
Matthew Biscocho
Michael Gibney
Paul McArthur
Richard Lu
Stefan Vodita
Yuntong Qu

Discussion items

Topic	People	Notes
PRS & Distributed-processing (no-Overseer)	David Smiley	Perspectives on Solr 10 using both, perhaps without current non-PRS and Overseer existing
Embedded ZooKeeper Hack Day	Eric P & Jason G	Share what Jason and Eric learned about embedded ZK and an ensemble.
"StartTool.java"?	Eric P	Chatted with Christopher Ball at Haystack, and realized, we don't start Solr via our SolrCLI architecture...
Blog@solr.apache.org	Jason	Now live, open for submissions!

Notes and Action items

PRS

There was some discussion of PRS as a feature, with particular interest in its current state and whether it might be enabled by default (e.g. in 10). David and Aparna discussed some performance tests they did internally (which had partially counter-intuitive results). Justin Sweeney and Michael Gibney discussed their team's use of the feature, its benefits (does a lot to speed up rolling restarts), and its drawbacks (maybe doesn't go far enough in handling numReplicas > 1).

It was pointed out that some tests are set to run only when PRS is disabled, and that better understanding these tests would be a good first step in an effort to move PRS towards being default-ready.

Progress on enabling the feature by default seems stuck in the familiar Catch-22: it's hard to be confident in the feature without adoption, but adoption remains small while the feature is off by default. We discussed other options that might build confidence, such as setting up test jobs to facilitate a longitudinal bake-off between PRS and non-PRS. (This could be done either by creating separate PRS and non-PRS Jenkins jobs, or by using some Gradle Enterprise features to accomplish a similar end.)

Distributed Processing

A similar discussion followed for our two mechanisms for making cluster-state changes in SolrCloud: "distributed processing" vs. "traditional overseer". Much like PRS, maintaining both options in SolrCloud complicates our code and fragments our test coverage.

Some time was spent discussing why the Overseer approach (with its serialized execution and ZK queue) was initially chosen. There were presumably reasons for going this route at the time - have those rationales and pros/cons shifted, or are there still reasons to prefer the Overseer approach? Several attendees offered guesses as to why Overseer was chosen, but there wasn't much context in the room on the original motivations.

There seemed to be openness to enabling distributed processing by default in 10.0 (provided users could use a sysprop to go back to "Overseer Mode").

Embedded-ZooKeeper Ensemble POC

Eric and Jason shared a "spike" (i.e. POC) they'd done during a pair programming day back in March. The spike builds on some previous "Community Meetup" discussions around providing a way for users with a small number of Solr nodes (2 to 3) to run SolrCloud without standing up their own external ZK. The spike shows embedded ZK running on multiple Solr servers in quorum/ensemble mode. (Until now, Solr's embedded ZK has only supported non-ensemble operation.)

There seemed to be openness to including this feature with Solr for those interested in using it, despite its lack of integration with "node roles", etc.

Using Solr's "Tool" interface to Start Solr

Solr uses a Java abstraction called a "Tool" for most of the functionality offered in the "bin/solr" scripts. This abstraction offers a number of niceties: it simplifies showing help text, encapsulates the CLI options, etc. Eric's question: could this be used for starting Solr?

The answer currently appears to be that it would require some effort, but might be doable in the medium term. The main impediment is that what we colloquially call "starting Solr" is really starting a Jetty server that then loads the Solr app on startup. There's been some work around changing how we package and run Jetty, though - particularly the work described in SIP-6.
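For readers unfamiliar with the idea, the kind of "Tool" abstraction discussed above can be sketched roughly as follows. This is an illustrative sketch only: the interface, method names, and the EchoTool class are hypothetical and are not Solr's actual "Tool" API. It only shows how a shared abstraction lets a single dispatcher handle help text and argument passing for every command.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class ToolSketch {

    /** Hypothetical stand-in for a CLI "Tool"; NOT Solr's actual interface. */
    interface Tool {
        String name();             // command name used on the command line
        String help();             // one-line usage text
        String run(String[] args); // executes the tool and returns its output
    }

    /** A trivial example tool (hypothetical) that echoes its arguments. */
    static final class EchoTool implements Tool {
        public String name() { return "echo"; }
        public String help() { return "echo <words...>  Print the given words"; }
        public String run(String[] args) { return String.join(" ", args); }
    }

    // Registry of known tools, kept in registration order for the help listing.
    private static final Map<String, Tool> TOOLS = new LinkedHashMap<>();

    static {
        register(new EchoTool());
    }

    static void register(Tool tool) {
        TOOLS.put(tool.name(), tool);
    }

    /** Looks up the tool named by argv[0]; falls back to listing help text. */
    static String dispatch(String[] argv) {
        if (argv.length == 0 || !TOOLS.containsKey(argv[0])) {
            StringBuilder sb = new StringBuilder("Available tools:\n");
            for (Tool t : TOOLS.values()) {
                sb.append("  ").append(t.help()).append('\n');
            }
            return sb.toString();
        }
        String[] rest = Arrays.copyOfRange(argv, 1, argv.length);
        return TOOLS.get(argv[0]).run(rest);
    }

    public static void main(String[] args) {
        System.out.print(dispatch(args));
    }
}
```

In a design like this, a "start" command would be one more registered tool. The difficulty noted above is that such a tool would have to launch Jetty rather than do its work in-process.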

From here, discussion drifted towards other ways to reduce and simplify the platform-dependent code used to start Solr, which is a huge source of duplication and bugs. Is "Windows Subsystem for Linux" mature enough that we could drop explicit Windows support entirely? Would always assuming a Docker environment help us simplify things here? Could we at least drop the platform-specific "include" files (i.e. solr.in.sh) or replace them with something platform-independent? The last idea seemed the most promising, and the group discussed a previous, nearly completed attempt at doing this in SOLR-7871. The core idea in that ticket is sound and still relevant, and our adoption of BATS tests gives us better confidence in regression detection - it just needs a contributor interested enough to carry the torch forward.

Blog@solr.apache.org

Months after previewing it in a previous Community Meetup, Solr's Blog is now live at solr.apache.org!

The blog is a great place to share information about conferences and community events, highlight underused features, and tell success (or failure) stories using Solr in the wild. Short posts that link out to content elsewhere (e.g. company blogs) are also welcome and encouraged. Blog posts are proposed and reviewed as PRs on the "solr-site" repo; see the walkthrough/tutorial for more details.

Have your voice heard and submit a writeup today!
