Solr Security

First and foremost, Solr does not concern itself with security either at the document level or the communication level. It is strongly recommended that the application server containing Solr be firewalled such that the only clients with access to Solr are your own. A default/example installation of Solr allows any client with access to it to add, update, and delete documents (and of course search/read too), and also gives access to the Solr configuration and schema files and the administrative user interface.

Besides limiting port access to the Solr server, standard Java web security can be added by tuning the container and the Solr web application configuration itself via web.xml. For example, all /update URLs could require HTTP authentication.

If there is a need to provide query access to a Solr server from the open internet, it is highly recommended to use a proxy, such as one of these.

Cross-Site Scripting (XSS)

Solr has no known cross-site scripting vulnerabilities.

Quick XSS tip:

Problem: What if you want the browser to highlight text, but you also want to protect yourself from XSS and escape the HTML output? Solution: One solution is to escape the HTML output and then reapply the em tags. Now the rest of the snippet is safe and the browser will recognize the highlighted text.

For example, with groovy/grails you could have the following in your controller:

snippet = snippet.encodeAsHTML()
snippet = snippet.replaceAll('&lt;em&gt;', '<em>')
snippet = snippet.replaceAll('&lt;/em&gt;', '</em>')
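
The same escape-then-restore idea can be sketched in plain Java. This is only an illustration: the class and method names are made up for this example, the small hand-written escaper stands in for whatever vetted HTML-escaping library your application already uses, and it assumes the highlighter wraps matches in plain <em> and </em> tags.

```java
final class HighlightEscaper {

    // Minimal HTML escaper for illustration only; a real application
    // would normally rely on a well-tested library for this step.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Escape everything first, then restore only the exact highlight tags.
    static String escapeKeepingHighlights(String snippet) {
        return escapeHtml(snippet)
                .replace("&lt;em&gt;", "<em>")
                .replace("&lt;/em&gt;", "</em>");
    }

    public static void main(String[] args) {
        String raw = "<em>ipod</em> <script>alert(1)</script>";
        // prints: <em>ipod</em> &lt;script&gt;alert(1)&lt;/script&gt;
        System.out.println(escapeKeepingHighlights(raw));
    }
}
```

Note that a literal <em> that was already present in the indexed text is restored as well. That is harmless for a bare em tag, but it is the reason only the exact attribute-free tags should be put back.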

Cross-Site Request Forgery (CSRF)

Even if a Solr instance is protected by good firewalls so that "bad guys" have no direct access, that instance may still be at risk of "Cross-Site Request Forgery" based attacks if the following are all true:

  1. Some number of "good guys" have direct access to that Solr instance from their web browsers.
  2. A "bad guy" knows/guesses the host:port/path of the Solr instance (even though they cannot access it directly).
  3. The bad guy can trick one of the good guys into clicking a maliciously crafted URL, or loading a webpage that contains malicious javascript.

This is because Solr's most basic behavior is to receive updates and deletes via HTTP. If you have a firewall or other security measure restricting Solr's /update handler so it only accepts connections from approved hosts/clients, and you are one of those approved clients, then you could inadvertently be tricked into loading a web page that initiates an HTTP connection to Solr on your behalf.

It's important to keep this in mind when thinking about what it means to "secure" an instance of Solr (if you have not already).

A basic technique that can be used to mitigate the risk of a possible CSRF attack like this is to configure your Servlet Container so that access to paths which can modify the index (i.e. /update, /update/csv, etc.) is restricted either to specific client IPs, or using HTTP Authentication.

Path Based Authentication

Path based authentication configured at the servlet container level can be used to restrict access to urls (such as /admin and /update) to only clients specifying the correct credentials.

Using path based authentication to limit certain clients to path based request handlers with "appends" and "invariants" is also a nice way to expose only a subset of the documents and to constrain or default any request parameters.

Consider:

  <requestHandler name="/instock" class="solr.DisMaxRequestHandler" >
    <lst name="appends">
      <str name="fq">inStock:true</str>
    </lst>
    <lst name="invariants">
      <str name="facet.field">cat</str>
    </lst>
  </requestHandler>

Any queries into /instock, such as /instock?q=ipod, will always be limited to documents with an indexed inStock field containing a value of "true", and all responses will include facet counts for the "cat" field.
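
To make the effect of "appends" and "invariants" concrete, here is a small stand-alone Java sketch. It does not use any Solr classes; HandlerParamsSketch and effectiveParams are hypothetical names, and the code only models the merging rules described above: values from "appends" are added next to whatever the client sent, and values from "invariants" replace whatever the client sent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HandlerParamsSketch {

    // Models how handler-level "appends" and "invariants" combine with
    // the parameters sent by the client (illustration only, not Solr code).
    static Map<String, List<String>> effectiveParams(
            Map<String, List<String>> request,
            Map<String, List<String>> appends,
            Map<String, List<String>> invariants) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        request.forEach((name, values) -> result.put(name, new ArrayList<>(values)));
        // "appends": added in addition to the client-supplied values.
        appends.forEach((name, values) ->
                result.computeIfAbsent(name, n -> new ArrayList<>()).addAll(values));
        // "invariants": always win over client-supplied values.
        invariants.forEach((name, values) -> result.put(name, new ArrayList<>(values)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> request = new LinkedHashMap<>();
        request.put("q", List.of("ipod"));
        request.put("fq", List.of("inStock:false"));   // client tries to widen the filter
        request.put("facet.field", List.of("price"));  // client tries to change the facet

        Map<String, List<String>> effective = effectiveParams(
                request,
                Map.of("fq", List.of("inStock:true")),
                Map.of("facet.field", List.of("cat")));

        // prints: {q=[ipod], fq=[inStock:false, inStock:true], facet.field=[cat]}
        System.out.println(effective);
    }
}
```

Because all fq filters are applied together, the client's extra inStock:false cannot widen the result set; it can only narrow it further (here, to nothing).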

NOTE: Solr provides access to request handlers through a general purpose /select?qt=request_handler_name URL. Prior to Solr 1.4 (SOLR-1233), request handlers named with a leading forward-slash could not be reached via /select?qt=/request_handler_name, but had to be requested using /request_handler_name. Solr 1.4 removed the forward-slash restriction and allows /select to work with any request handler name. Externally blocking access to /select is recommended in environments where only path-based access to request handlers is intended.

When using path based authentication, you will most likely want to configure your HTTP client code to use Preemptive Authentication. With SOLR-4470, preemptive authentication is used by default in SolrJ clients, but it can be turned off on a per-request basis by doing the following to your SolrRequest before using it:

solrRequest.setPreemptiveAuthentication(false);

With SOLR-4470 preemptive authentication works for POST requests - it did not before.

Common servlet container example

Add this inside <web-app ..> in WEB-INF/web.xml inside solr.war (or, for Jetty, to the /example/etc/webdefault.xml file):

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr authenticated application</web-resource-name>
      <url-pattern>/core1/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>core1-role</role-name>
    </auth-constraint>
  </security-constraint>

  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Test Realm</realm-name>
  </login-config>

This is standardized for all servlet containers. In this example, every request to a path under /core1/ requires HTTP BASIC authentication against the realm named "Test Realm", and the authenticated user must have the role core1-role.

Unfortunately, how to actually set up a "realm" in a servlet container has not been standardized, so it differs from servlet container to servlet container.

Jetty realm example

Edit jetty.xml

In Jetty 6, add or uncomment this section in /example/etc/jetty.xml

    <Set name="UserRealms">
      <Array type="org.mortbay.jetty.security.UserRealm">
        <Item>
          <New class="org.mortbay.jetty.security.HashUserRealm">
            <Set name="name">Test Realm</Set>
            <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
          </New>
        </Item>
      </Array>
    </Set>

In Jetty 8 (which ships with Solr 4), add this to /example/etc/jetty.xml

    <Call name="addBean">
      <Arg>
        <New class="org.eclipse.jetty.security.HashLoginService">
          <Set name="name">Test Realm</Set>
          <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
          <Set name="refreshInterval">0</Set>
        </New>
      </Arg>
    </Call>

/example/etc/realm.properties

guest: guest, core1-role

Of course you can configure other Realm/LoginService-implementations than HashUserRealm/HashLoginService

Security for inter-solr-node requests

Without SOLR-4470, in a cluster/cloud/distributed installation with several solr-nodes communicating internally, you would in practice not be able to add security constraints on most paths, because inter-solr-node communication (potentially) involves requests to most paths, and solr-nodes were not able to provide credentials in those internal solr-node-to-solr-node requests. SOLR-4470 introduces this ability. Basically there are two strategies for the credentials used in solr-node-to-solr-node requests:

  1. For sub-requests issued as a direct reaction to a request from "the outside", reuse the credentials of that outer request.
  2. For requests that a solr-node issues on its own initiative, use the node's own "internal" credentials.

Using 1) is the most correct way, because it prevents requests from "the outside" from indirectly triggering successful inter-solr-node sub-requests that the user sending the "outside request" is not allowed to perform directly. Therefore 1) should be used whenever feasible. But there are cases where inter-solr-node requests are not a direct reaction to requests coming from "the outside" - e.g. requests around shard-synchronization. In such cases 2) has to be used - there are no other credentials to use, and you may well want to protect e.g. the shard-synchronization paths. There are also border cases, e.g. when a request from "the outside" triggers inter-solr-node requests that are issued asynchronously, such as when a request to the Collections API makes the Overseer send inter-solr-node requests to the CoreAdmin API. In such cases 1) ought to be used, but that would require persisting the credentials from the original request and reusing them when the sub-request to the CoreAdmin API is eventually performed. The current implementation in SOLR-4470 just uses 2) in such cases.

With SOLR-4470 only "basic http authentication" is supported, even though the code has been prepared for also supporting other kinds of authentication.

Default is not to include credentials in solr-node-to-solr-node requests. You will have to activate it. You do that by adding the following to solr.xml

  <security>
    <interSolrNodeRequestAuthCredentialsProviderFactories>
      <directSubRequest>
        <str name="class">... fully qualified name of a class implementing org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.SubRequestFactory - providing credentials for inter-solr-node requests of type 1) ...</str>
      </directSubRequest>
      <internalRequest>
        <str name="class">... fully qualified name of a class implementing org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.InternalRequestFactory - providing credentials for inter-solr-node requests of type 2) ...</str>
      </internalRequest>
    </interSolrNodeRequestAuthCredentialsProviderFactories>
  </security>

org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.SubRequestFactory

  public static interface SubRequestFactory {
    AuthCredentials getFromOuterRequest(SolrQueryRequest outerRequest);
  }

Out-of-the-box Solr comes with one implementation that you can just use - org.apache.solr.security.UseSuperRequestAuthCredentialsSubRequestFactory. The credentials it provides for the sub-inter-solr-node-requests will be a copy of the credentials provided in the outer "super"-request.

org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.InternalRequestFactory

  public static interface InternalRequestFactory {
    AuthCredentials getInternalAuthCredentials();
  }

Out-of-the-box Solr comes with one implementation that you can just use - org.apache.solr.security.SystemPropertiesAuthCredentialsInternalRequestFactory. The credentials it provides for the inter-solr-node-requests will include http basic authentication with username taken from system property "internalAuthCredentialsBasicAuthUsername" and password taken from system property "internalAuthCredentialsBasicAuthPassword".

java -jar ... -DinternalAuthCredentialsBasicAuthUsername=<username> -DinternalAuthCredentialsBasicAuthPassword=<password> ... start.jar

You can write your own implementations of the interfaces and configure Solr to use them in solr.xml. Make sure the jar containing your implementations is present on the Solr classpath (<str name="sharedLib">...</str> in solr.xml).

Providing credentials in requests from your own client

Basically, "basic http authentication" requires you to add a header of the following form to your request:

Authorization: Basic <base64 encoding of "<username>:<password>">
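
As a minimal sketch of how that header value is built, the following stand-alone Java snippet uses only the JDK's java.util.Base64. The class name BasicAuthHeader and its value method are made up for this illustration, and the guest/guest credentials are simply the ones from the realm.properties example on this page.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthHeader {

    // Builds the value of the Authorization header for HTTP basic
    // authentication: "Basic " + base64("<username>:<password>").
    static String value(String username, String password) {
        String pair = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // prints: Authorization: Basic Z3Vlc3Q6Z3Vlc3Q=
        System.out.println("Authorization: " + value("guest", "guest"));
    }
}
```

Keep in mind that base64 is an encoding, not encryption: anyone who can read the request can recover the password, so basic authentication should be combined with a trusted network or HTTPS.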

Fortunately, various tools help you do this.

SolrJ

Before SOLR-4470, SolrJ only supported setting up credentials at the SolrServer/HttpClient level - those credentials are then used for all requests issued through this SolrServer/HttpClient. Basically you need to get hold of the HttpClient used and set it up. E.g. for a CloudSolrServer instance:

HttpClientUtil.setBasicAuth(cloudSolrServer.getLbServer().getHttpClient(), <username>, <password>);

With SOLR-4470 this has changed a little, because a class AuthCredentials encapsulating the abstract concept of credentials was introduced

HttpClientUtil.setAuthCredentials(cloudSolrServer.getLbServer().getHttpClient(), AuthCredentials.createBasicAuthCredentials(<username>, <password>));

With SOLR-4470 you are also able to provide credentials on request level, so that you can use different credentials in different requests issued through one and the same SolrServer/HttpClient. Just set the credentials on the SolrRequest before using it (it will override credentials on SolrServer/HttpClient level if set)

solrRequest.setAuthCredentials(AuthCredentials.createBasicAuthCredentials(<username>, <password>));

or use one of the helper-methods on SolrServer taking an AuthCredentials object as its last argument. E.g.

mySolrServer.add(docs, commitWithinMs, AuthCredentials.createBasicAuthCredentials(<username>, <password>))

With SOLR-4470 you have a final option. Namely overriding method manipulateRequest on your SolrServer. manipulateRequest will be called for every request you fire with this SolrServer and gives you a nice place to implement general credentials rules. E.g.

  CloudSolrServer mySolrServer = new CloudSolrServer(...) {
      @Override
      protected void manipulateRequest( final SolrRequest request ) {
        if (request.getAuthCredentials() == null) {
          String path = request.getPath();
          AuthCredentials credentials = AuthCredentials.createBasicAuthCredentials(<my default username>, <my default password>); 
          if (path.contains("/update")) {
            credentials = AuthCredentials.createBasicAuthCredentials(<my update username>, <my update password>);
          }
          request.setAuthCredentials(credentials);
        }
      }
    };

curl and base64 on linux

Something like this should work:

BASE64_CREDENTIALS=$(echo -n "<username>:<password>" | base64)
curl -i --header "Authorization: Basic ${BASE64_CREDENTIALS}" <url>

javascript (using jquery and jquery-base64)

This shows how to base64 encode the "<username>:<password>" with jquery and jquery-base64. How to easily include the credentials in actual requests issued by jquery is TBD

<html>
    <head>
        <script type="text/javascript" src="http://code.jquery.com/jquery-x.y.z.js"></script>
        <script type="text/javascript" src="https://raw.github.com/carlo/jquery-base64/master/jquery.base64.js"></script>
        <script type="text/javascript">
            $(document).ready(function() {
                alert( $.base64.encode( "<username>:<password>" ) );
            });
        </script>
    </head>
    <body>
        Javascript base64 encoding test
    </body>
</html>

You might want to download jquery-x.y.z.js and jquery.base64.js to be provided by your own web-application, instead of depending on them always being available from code.jquery.com and github.com :-)

Security in Solr on per-operation basis

Due to limitations on "url-pattern"s in web.xml and the structure of URLs in Solr, it is hard to set up path based authentication on a per-type-of-operation basis. A url-pattern can only be an exact path, a path prefix ending in "/*", or an extension pattern starting with "*."; a wildcard cannot appear in the middle of a pattern. Solr URLs, on the other hand, put the collection/core name first and the operation after it, e.g. /core1/update.

These facts make it easy to set up path based authentication on a per-collection/core basis. E.g. url-pattern "/core1/*" matches all operations on core1. On the other hand, they make it hard to set up path based authentication on a per-operation basis. Let's say you want a url-pattern matching "updates" across all cores/collections: you cannot just use url-pattern "/solr/*/update", "*/update" or "*update" - these are not allowed as url-patterns. Different servlet containers provide different solutions to this problem.

You can also implement your own authorization filter and let that deal with the authorization part - leaving only the authentication to the container. E.g. check out the Solr code and find org.apache.solr.servlet.security.RegExpAuthorizationFilter in the test-framework project, make a copy named com.mycompany.security.RegExpAuthorizationFilter, and set things up like this in WEB-INF/web.xml:

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr authenticated application</web-resource-name>
      <url-pattern>/*</url-pattern>  <!-- the container requires authentication for any url -->
    </web-resource-collection>
    <auth-constraint>
      <role-name>*</role-name> <!-- the container authorizes any authenticated user to do anything -->
    </auth-constraint>
  </security-constraint>

  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Test Realm</realm-name>
  </login-config>

In practice you probably want to replace the "<role-name>*</role-name>" line with several lines of the form "<role-name>some_concrete_role</role-name>" - one for each actual role your realm will ever "talk about". This is because some servlet containers (including jetty v8) do not work properly with *.

Now set up RegExpAuthorizationFilter to do the authorization. Insert this filter AS THE FIRST filter in the WEB-INF/web.xml inside solr.war.

<filter>
  <filter-name>RegExpAuthorizationFilter</filter-name>
  <filter-class>com.mycompany.security.RegExpAuthorizationFilter</filter-class>
  <init-param>
    <param-name>search-constraint</param-name>
    <param-value>1|update-role,admin-role|^.*/update$</param-value>
  </init-param>
  <init-param>
    <param-name>admin-constraint</param-name>
    <param-value>2|admin-role|^.*$</param-value>
  </init-param>
</filter>

<filter-mapping>
  <filter-name>RegExpAuthorizationFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>

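The RegExpAuthorizationFilter verifies authorization by matching paths against patterns, but unlike url-pattern it supports regular expression patterns. The patterns and corresponding "allowed roles" are provided to RegExpAuthorizationFilter using init-params. You provide an init-param for every "rule" you want to set up. Each init-param must have a value of the form "<order>|<comma-separated-roles>|<path-regular-expression>", where order is a number determining the sequence in which the rules are evaluated, comma-separated-roles lists the roles that are allowed access, and path-regular-expression is the regular expression matched against the request path.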

RegExpAuthorizationFilter iterates over the "rules" in the given order and matches the request path against each rule's path-regular-expression. If a rule does not match, the filter continues to the next "rule"; once a rule matches, later "rules" are never considered. If no "rules" match, the request is allowed to proceed - it passed authorization, so to speak. In case of a match, the authenticated user is checked against the roles in comma-separated-roles and is only allowed access if they are in one of the roles mentioned. Otherwise the filter returns a response with status code 403 ("Forbidden" in HTTP terms).
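
The evaluation order described above can be sketched in a few lines of stand-alone Java. This is not the code of RegExpAuthorizationFilter itself; RegExpRuleSketch, Rule and isAllowed are hypothetical names, and the choice to require the regular expression to match the whole path is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class RegExpRuleSketch {

    // One rule parsed from "<order>|<comma-separated-roles>|<path-regular-expression>".
    static final class Rule {
        final int order;
        final Set<String> roles;
        final Pattern path;

        Rule(String initParamValue) {
            // Limit of 3 so that a '|' inside the regular expression survives.
            String[] parts = initParamValue.split("\\|", 3);
            order = Integer.parseInt(parts[0].trim());
            roles = new HashSet<>(Arrays.asList(parts[1].trim().split("\\s*,\\s*")));
            path = Pattern.compile(parts[2]);
        }
    }

    // Returns true if the request may proceed, false if it should get a 403.
    static boolean isAllowed(List<String> initParamValues, String path, Set<String> userRoles) {
        List<Rule> rules = new ArrayList<>();
        for (String value : initParamValues) {
            rules.add(new Rule(value));
        }
        rules.sort(Comparator.comparingInt((Rule r) -> r.order));
        for (Rule rule : rules) {
            if (rule.path.matcher(path).matches()) {
                // The first matching rule decides; later rules are ignored.
                for (String role : userRoles) {
                    if (rule.roles.contains(role)) {
                        return true;
                    }
                }
                return false;
            }
        }
        return true; // no rule matched: the request passes authorization
    }

    public static void main(String[] args) {
        List<String> rules = List.of(
                "1|update-role,admin-role|^.*/update$",
                "2|admin-role|^.*$");
        System.out.println(isAllowed(rules, "/core1/update", Set.of("update-role"))); // true
        System.out.println(isAllowed(rules, "/core1/select", Set.of("update-role"))); // false
        System.out.println(isAllowed(rules, "/core1/select", Set.of("admin-role")));  // true
    }
}
```

With the two rules from the web.xml example, a user who only has update-role may call /update on any core, but every other path falls through to the second rule and is reserved for admin-role.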

RegExpAuthorizationFilter is not a supported part of Solr, so use it at your own risk

Resin example

See resin-security and auth-constraint

Here is an example showing how to force login for /update and /admin

      <web-app 
        id="/solr" 
        document-directory="/path/to/where/it/gets/exploded"
        archive-path="/path/to/solr.war"
        character-encoding="utf-8">
            
       <system-property solr.solr.home="/path/to/solr/data" />
      
       <authenticator type="com.caucho.server.security.XmlAuthenticator">
          <init>
            <user>yourusername:yourpassword:user,admin</user>
            <password-digest>none</password-digest>
          </init>
        </authenticator>
       <security-constraint url-pattern='/update/*' role-name='user'/>
       <security-constraint url-pattern='/admin/*' role-name='user'/>
       
     </web-app>

Document Level Security

Manifold CF (Connector Framework)

One way to add document level security to your search is through Apache ManifoldCF. ManifoldCF "defines a security model for target repositories that permits them to enforce source-repository security policies".

It works by adding security tokens from the source repositories as metadata on the indexed documents. Then, at query time, a Search Component adds a filter to all queries, matching only documents the logged-in user is allowed to see. ManifoldCF supports AD security out of the box.

Write Your Own RequestHandler or SearchComponent

*Stub - this is incomplete*

If ManifoldCF does not solve your need, first consider writing a ManifoldCF plugin. Or roll your own.

If you need permission based authentication -- where user A can update document 1 and 2, but not 3 -- you will need to augment the request with user information. Either you can add parameters to the query string (?u=XXX&p=YYY) or use a custom dispatcher filter that augments the context:

public class CustomDispatchFilter extends SolrDispatchFilter 
{
  @Override
  protected void execute( HttpServletRequest req, SolrRequestHandler handler, SolrQueryRequest sreq, SolrQueryResponse rsp) 
  {
    // perhaps the whole request
    sreq.getContext().put( "HttpServletRequest", req );

    // or maybe just the user
    sreq.getContext().put( "user", req.getRemoteUser());

    core.execute( handler, sreq, rsp );
  }
}


public class AuthenticatingHandler extends RequestHandlerBase 
{
  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    
    HttpServletRequest httpreq = (HttpServletRequest)
      req.getContext().get( "HttpServletRequest" );
    
    if( httpreq.isUserInRole( "editor" ) ) {
      ...
    }

    String user = (String)req.getContext().get( "user" );
    ...
  }
  ...
}

Streaming Consideration

If streaming is enabled, you need to make sure Solr is as secure as it needs to be. When streaming is enabled, the parameter "stream.url" makes Solr go to a remote site and download the content. Likewise, "stream.file" makes Solr read a file on disk.

Streaming is disabled by default and is configured from solrconfig.xml

  <requestParsers enableRemoteStreaming="false" ... />

Web Server Level Security

Tomcat Remote Address Valve

You can limit access to the server based on IP address by putting the following in server.xml

<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="127.0.0.1"/>

ZooKeeper security

* Protecting content: Per Steffensen/ZooKeeper protecting content

SolrSecurity (last edited 2014-02-19 13:26:20 by 87)