Differences between revisions 37 and 38
Revision 37 as of 2015-07-20 13:08:38
Size: 25092
Editor: ShawnHeisey
Comment: Updated jetty paths for Solr 5.x with notes for older versions.
Revision 38 as of 2015-08-28 09:22:48
Size: 6070
Editor: JanHoydahl
Comment: Deleted a lot of outdated content and added pointer to RefGuide
First and foremost, Solr does not concern itself with security either at the document level or the communication level. It is strongly recommended that the application server containing Solr be firewalled so that the only clients with access to Solr are your own. A default/example installation of Solr allows any client with access to it to add, update, and delete documents (and of course search/read too), including access to the Solr configuration and schema files and the administrative user interface. The authoritative guide on security is in the [[https://cwiki.apache.org/confluence/display/solr/Securing+Solr|Reference Guide]]. The rest of this page covers tips and tricks beyond what the Reference Guide mentions.
Besides limiting port access to the Solr server, standard Java web security can be added by tuning the container and the Solr web application configuration itself via web.xml. For example, all /update URLs could require HTTP authentication.

<<TableOfContents>>

== Current state of affairs ==

 * SSL support was added in version 4.2 (4.7 for SolrCloud).
 * Protection of ZooKeeper content through ACLs was added in version 5.0.
 * Authentication and Authorization plugin support was added in 5.2 (SolrCloud only).
 * The Basic Auth and Kerberos plugins and the Rule-based Authorization plugin were added in 5.3.

There are (as of 5.3) no role-based restrictions on the Admin UI, so be aware that anyone with access to the Admin UI will be able to do '''anything''' with your system.

== Need for firewall ==

Even if you add SSL or Authentication plugins, it is still strongly recommended that the application server containing Solr be firewalled so that the only clients with access to Solr are your own. A default/example installation of Solr allows any client with access to it to add, update, and delete documents (and of course search/read too), including access to the Solr configuration and schema files and the administrative user interface.


== Path Based Authentication ==

Path based authentication configured at the servlet container level can be used to restrict access to URLs (such as /admin and /update) to clients that provide the correct credentials.

Using path based authentication to limit certain clients to path based request handlers with "appends" and "invariants" is also a good way to expose a subset of the documents and to constrain or default request parameters.

Consider:

{{{
  <requestHandler name="/instock" class="solr.DisMaxRequestHandler" >
    <lst name="appends">
      <str name="fq">inStock:true</str>
    </lst>
    <lst name="invariants">
      <str name="facet.field">cat</str>
    </lst>
  </requestHandler>
}}}

Any queries into /instock, such as /instock?q=ipod, will always be limited to documents with an indexed inStock field containing a value of "true", and all responses will include facet counts for the "cat" field.
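To make the semantics concrete, here is a minimal, hypothetical Java sketch of how "appends" and "invariants" could be combined with the parameters a client sends. It is not Solr's implementation; it only assumes that request parameters are modeled as a map from parameter name to a list of values, and the class name is made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of how "appends" and "invariants" shape client parameters. */
final class HandlerParamsSketch {

    /**
     * appends: values are added after whatever the client sent (the client cannot remove them).
     * invariants: values replace whatever the client sent (the client cannot change them).
     */
    static Map<String, List<String>> effectiveParams(Map<String, List<String>> client,
                                                     Map<String, List<String>> appends,
                                                     Map<String, List<String>> invariants) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        // Start from a defensive copy of the client's parameters.
        for (Map.Entry<String, List<String>> e : client.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        // Appended values are added to any client-supplied values.
        for (Map.Entry<String, List<String>> e : appends.entrySet()) {
            result.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
        // Invariant values overwrite client-supplied values entirely.
        for (Map.Entry<String, List<String>> e : invariants.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> client = new LinkedHashMap<>();
        client.put("q", Collections.singletonList("ipod"));
        client.put("fq", Collections.singletonList("cat:electronics"));
        client.put("facet.field", Collections.singletonList("price"));

        Map<String, List<String>> appends =
                Collections.singletonMap("fq", Collections.singletonList("inStock:true"));
        Map<String, List<String>> invariants =
                Collections.singletonMap("facet.field", Collections.singletonList("cat"));

        System.out.println(effectiveParams(client, appends, invariants));
        // prints {q=[ipod], fq=[cat:electronics, inStock:true], facet.field=[cat]}
    }
}
```

The client's own fq survives next to the appended inStock:true filter, while its facet.field=price is silently replaced by the invariant.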

/!\ NOTE: Solr provides access to request handlers through a general purpose /select?qt=request_handler_name URL. Prior to [[Solr1.4]], request handlers whose names begin with a forward slash could not be invoked as /select?qt=/request_handler_name; they had to be requested directly as /request_handler_name. [[Solr1.4]] (via SOLR-1233) removed this restriction, so /select can dispatch to any request handler name. Externally blocking access to /select is therefore recommended in environments where only path-based access to request handlers is warranted.

When using path based authentication, you will most likely want to configure your HTTP client code to use [[http://hc.apache.org/httpclient-3.x/authentication.html#Preemptive_Authentication|Preemptive Authentication]], i.e. send the credentials with the first request instead of waiting for a 401 challenge from the server. With [[https://issues.apache.org/jira/browse/SOLR-4470|SOLR-4470]], preemptive authentication is used by default in SolrJ clients, but it can be turned off on a per-request basis by calling the following on your SolrRequest before using it:
{{{
solrRequest.setPreemptiveAuthentication(false);
}}}
With [[https://issues.apache.org/jira/browse/SOLR-4470|SOLR-4470]] preemptive authentication works for POST requests - it did not before.

=== Common servlet container example ===

Add this inside <web-app ..> in WEB-INF/web.xml inside solr.war. If you use the Jetty included with Solr, you can add it to server/etc/webdefault.xml instead (example/etc/webdefault.xml for Solr 4.x and earlier):
{{{
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr authenticated application</web-resource-name>
      <url-pattern>/core1/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>core1-role</role-name>
    </auth-constraint>
  </security-constraint>

  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Test Realm</realm-name>
  </login-config>
}}}

This is standardized for all servlet containers and sets up security with the following properties:
 * Authentication: You have to authenticate to access any path starting with "/core1/". Other paths can be accessed without authenticating. Authentication is performed against a "realm" called "Test Realm", which verifies the credentials.
 * Authorization: The configuration above also states that you have to be part of the "role" "core1-role" in order to be authorized for "/core1/" paths. If the credentials were verified successfully, the "realm" also provides the set of "roles" that the authenticated user is part of.

Unfortunately, how to actually set up a "realm" has not been standardized, so it differs from servlet container to servlet container.
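The two steps described above can be modeled in a few lines of Java. This is a hypothetical sketch, not how any servlet container implements realms: a toy in-memory realm authenticates the user and returns their roles, and the decision mirrors the single security-constraint in the web.xml example (paths matching "/core1/*" require core1-role). The 401/403 outcome names reflect the usual HTTP responses for missing credentials versus insufficient roles.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the security-constraint: "/core1/*" requires "core1-role". */
final class ConstraintSketch {

    enum Outcome { ALLOWED, UNAUTHENTICATED_401, FORBIDDEN_403 }

    /** Toy realm: username -> password and username -> roles (no hashing, illustration only). */
    static final class Realm {
        final Map<String, String> passwords = new HashMap<>();
        final Map<String, Set<String>> roles = new HashMap<>();

        /** Returns the user's roles if the credentials verify, otherwise null. */
        Set<String> authenticate(String user, String password) {
            if (user == null || password == null || !password.equals(passwords.get(user))) {
                return null;
            }
            return roles.getOrDefault(user, Collections.emptySet());
        }
    }

    /** path is relative to the web application's context path, e.g. "/core1/select". */
    static Outcome check(Realm realm, String path, String user, String password) {
        // "/core1/*" matches "/core1" itself and everything below "/core1/".
        boolean constrained = path.equals("/core1") || path.startsWith("/core1/");
        if (!constrained) {
            return Outcome.ALLOWED;             // no authentication required
        }
        Set<String> userRoles = realm.authenticate(user, password);
        if (userRoles == null) {
            return Outcome.UNAUTHENTICATED_401; // missing or wrong credentials
        }
        return userRoles.contains("core1-role") ? Outcome.ALLOWED : Outcome.FORBIDDEN_403;
    }
}
```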

=== Jetty realm example ===

Edit jetty.xml

In Jetty 6 (Solr 3.x and older), add or uncomment this section in example/etc/jetty.xml:
{{{
    <Set name="UserRealms">
      <Array type="org.mortbay.jetty.security.UserRealm">
        <Item>
          <New class="org.mortbay.jetty.security.HashUserRealm">
            <Set name="name">Test Realm</Set>
            <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
          </New>
        </Item>
      </Array>
    </Set>
}}}

In Jetty 8 (which ships with Solr 4), add this to server/etc/jetty.xml (example/etc/jetty.xml in Solr 4.x and older):
{{{
    <Call name="addBean">
      <Arg>
        <New class="org.eclipse.jetty.security.HashLoginService">
          <Set name="name">Test Realm</Set>
          <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
          <Set name="refreshInterval">0</Set>
        </New>
      </Arg>
    </Call>
}}}

server/etc/realm.properties (example/etc/realm.properties in Solr 4.x and older):
{{{
guest: guest, core1-role
}}}
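Each line in realm.properties has the form "username: password[,rolename ...]", so the line above defines the user guest with password guest and role core1-role. As a sketch, here is a hypothetical Java parser for one such line. Jetty also understands obfuscated or hashed passwords with prefixes such as OBF:, MD5: and CRYPT:; this sketch does not decode them and simply keeps the prefix as part of the password string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical parser for one "username: password[,rolename ...]" line. */
final class RealmLineSketch {

    static final class Entry {
        final String user;
        final String password;
        final List<String> roles;

        Entry(String user, String password, List<String> roles) {
            this.user = user;
            this.password = password;
            this.roles = roles;
        }
    }

    static Entry parse(String line) {
        // Split at the FIRST colon only, so "OBF:..." style passwords stay intact.
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected 'username: password[,role ...]': " + line);
        }
        String user = line.substring(0, colon).trim();
        // The first comma-separated field after the colon is the password; the rest are roles.
        String[] fields = line.substring(colon + 1).split(",");
        String password = fields[0].trim();
        List<String> roles = new ArrayList<>();
        for (String role : Arrays.copyOfRange(fields, 1, fields.length)) {
            if (!role.trim().isEmpty()) {
                roles.add(role.trim());
            }
        }
        return new Entry(user, password, roles);
    }
}
```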

You can, of course, configure Realm/LoginService implementations other than HashUserRealm/HashLoginService.

=== Security for inter-solr-node requests ===

Without [[https://issues.apache.org/jira/browse/SOLR-4470|SOLR-4470]], in a cluster/cloud/distributed installation with several solr-nodes communicating internally, you would in practice not be able to add security constraints on most paths: inter-solr-node communication (potentially) involves requests to most paths, and solr-nodes were not able to provide credentials in those internal solr-node-to-solr-node requests. [[https://issues.apache.org/jira/browse/SOLR-4470|SOLR-4470]] introduces this ability. There are basically two strategies for choosing the credentials used in solr-node-to-solr-node requests:
 * 1) Just forward credentials from the "super"-request which caused the inter-solr-node "sub"-requests
 * 2) Use "internal credentials" provided to the solr-node by the administrator at startup

Strategy 1) is the most correct one, because it prevents requests from "the outside" from indirectly triggering successful inter-solr-node sub-requests that the user sending the outside request is not allowed to perform directly. Therefore 1) should be used whenever feasible. But some inter-solr-node requests are not a direct reaction to requests from the outside, e.g. requests related to shard synchronization. In such cases 2) has to be used: there are no other credentials available, and you will most likely still want to be able to protect paths such as those used for shard synchronization. There are also border cases, e.g. when a request from the outside triggers inter-solr-node requests that are issued asynchronously, as when a request to the Collections API makes the Overseer send inter-solr-node requests to the CoreAdmin API. In such cases 1) ought to be used, but that would require persisting the credentials from the original request and reusing them when the sub-request to the CoreAdmin API is eventually performed. The current implementation in [[https://issues.apache.org/jira/browse/SOLR-4470|SOLR-4470]] just uses 2) in such cases.
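The choice between the two strategies can be summarized in a short, hypothetical Java sketch. It is not the SOLR-4470 code: the Credentials class and the method names are invented for illustration. It forwards the outer request's credentials when there is an outer request, and otherwise falls back to internal credentials read from the system properties internalAuthCredentialsBasicAuthUsername and internalAuthCredentialsBasicAuthPassword (the property names this page gives for SystemPropertiesAuthCredentialsInternalRequestFactory).

```java
/** Hypothetical illustration of the two credential strategies for inter-node sub-requests. */
final class SubRequestCredentialsSketch {

    /** Minimal stand-in for a username/password pair (not Solr's AuthCredentials class). */
    static final class Credentials {
        final String username;
        final String password;

        Credentials(String username, String password) {
            this.username = username;
            this.password = password;
        }
    }

    /** Strategy 2): internal credentials configured by the administrator at startup, or null. */
    static Credentials internalCredentials() {
        String user = System.getProperty("internalAuthCredentialsBasicAuthUsername");
        String pass = System.getProperty("internalAuthCredentialsBasicAuthPassword");
        return (user != null && pass != null) ? new Credentials(user, pass) : null;
    }

    /**
     * outerRequestCredentials is null when the sub-request was not caused directly by an
     * outside request (e.g. shard synchronization).
     */
    static Credentials forSubRequest(Credentials outerRequestCredentials) {
        if (outerRequestCredentials != null) {
            // Strategy 1): forward the caller's own credentials, so an outside user cannot
            // indirectly trigger sub-requests they are not allowed to make directly.
            return outerRequestCredentials;
        }
        // Strategy 2): no outside caller, so fall back to the node's internal credentials.
        return internalCredentials();
    }
}
```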

With [[https://issues.apache.org/jira/browse/SOLR-4470|SOLR-4470]] only basic HTTP authentication is supported, even though the code has been prepared to support other kinds of authentication as well.

By default, credentials are not included in solr-node-to-solr-node requests; you have to activate this by adding the following to solr.xml:
{{{
  <security>
    <interSolrNodeRequestAuthCredentialsProviderFactories>
      <directSubRequest>
        <str name="class">... fully qualified name of a class implementing org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.SubRequestFactory - providing credentials for inter-solr-node requests of type 1) ...</str>
      </directSubRequest>
      <internalRequest>
        <str name="class">... fully qualified name of a class implementing org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.InternalRequestFactory - providing credentials for inter-solr-node requests of type 2) ...</str>
      </internalRequest>
    </interSolrNodeRequestAuthCredentialsProviderFactories>
  </security>
}}}

org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.SubRequestFactory
{{{
  public static interface SubRequestFactory {
    AuthCredentials getFromOuterRequest(SolrQueryRequest outerRequest);
  }
}}}
Out of the box, Solr comes with one implementation you can use directly: org.apache.solr.security.UseSuperRequestAuthCredentialsSubRequestFactory. The credentials it provides for the inter-solr-node sub-requests are a copy of the credentials provided in the outer "super"-request.

org.apache.solr.security.InterSolrNodeAuthCredentialsFactory.InternalRequestFactory
{{{
  public static interface InternalRequestFactory {
    AuthCredentials getInternalAuthCredentials();
  }
}}}
Out of the box, Solr comes with one implementation you can use directly: org.apache.solr.security.SystemPropertiesAuthCredentialsInternalRequestFactory. The credentials it provides for inter-solr-node requests use basic HTTP authentication, with the username taken from the system property "internalAuthCredentialsBasicAuthUsername" and the password taken from the system property "internalAuthCredentialsBasicAuthPassword":
{{{
java -jar ... -DinternalAuthCredentialsBasicAuthUsername=<username> -DinternalAuthCredentialsBasicAuthPassword=<password> ... start.jar
}}}

You can write your own implementations of these interfaces and configure Solr to use them in solr.xml. Make sure the jar containing your implementations is on the Solr classpath (<str name="sharedLib">...</str> in solr.xml).

=== Providing credentials in requests from your own client ===

Basic HTTP authentication requires you to add a header of the following form to your request:
{{{
Authorization:Basic <base64 encoding of "<username>:<password>">
}}}
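For reference, the header value is just the string "Basic " followed by the Base64 encoding of "username:password". Here is a minimal sketch using only the JDK (java.util.Base64, Java 8 or later). The class and method names are illustrative, and the sketch assumes UTF-8 for the credentials; servers differ in how they treat non-ASCII characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Builds the value of an HTTP Basic "Authorization" header. */
final class BasicAuthHeader {

    static String value(String username, String password) {
        String userPass = username + ":" + password;
        // Encode the UTF-8 bytes of "username:password" as standard Base64 (with padding).
        String encoded = Base64.getEncoder()
                .encodeToString(userPass.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Using the guest user from the realm.properties example on this page.
        System.out.println("Authorization: " + value("guest", "guest"));
        // prints "Authorization: Basic Z3Vlc3Q6Z3Vlc3Q="
    }
}
```

Base64 is an encoding, not encryption, so anyone who can see the request can recover the password. This is one more reason to combine Basic authentication with SSL.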

Fortunately, various tools help you do this.

==== SolrJ ====

Without [[https://issues.apache.org/jira/browse/SOLR-4470|SOLR-4470]], SolrJ only supports setting up credentials at the SolrServer/HttpClient level, i.e. the credentials are used for all requests issued through that SolrServer/HttpClient. You need to get hold of the HttpClient in use and set it up, e.g. for a CloudSolrServer instance:
{{{
HttpClientUtil.setBasicAuth(cloudSolrServer.getLbServer().getHttpClient(), <username>, <password>);
}}}

With [[https://issues.apache.org/jira/browse/SOLR-4470|SOLR-4470]] this has changed a little, because a class AuthCredentials, encapsulating the abstract concept of credentials, was introduced:
{{{
HttpClientUtil.setAuthCredentials(cloudSolrServer.getLbServer().getHttpClient(), AuthCredentials.createBasicAuthCredentials(<username>, <password>));
}}}

With [[https://issues.apache.org/jira/browse/SOLR-4470|SOLR-4470]] you can also provide credentials at the request level, so that you can use different credentials for different requests issued through one and the same SolrServer/HttpClient. Just set the credentials on the SolrRequest before using it (they override any credentials set at the SolrServer/HttpClient level):
{{{
solrRequest.setAuthCredentials(AuthCredentials.createBasicAuthCredentials(<username>, <password>));
}}}
or use one of the helper methods on SolrServer that take an AuthCredentials object as their last argument, e.g.:
{{{
mySolrServer.add(docs, commitWithinMs, AuthCredentials.createBasicAuthCredentials(<username>, <password>));
}}}

With [[https://issues.apache.org/jira/browse/SOLR-4470|SOLR-4470]] you have one more option: override the manipulateRequest method of your SolrServer. manipulateRequest is called for every request you fire through this SolrServer, which makes it a good place to implement general credential rules, e.g.:
{{{
  CloudSolrServer mySolrServer = new CloudSolrServer(...) {
      @Override
      protected void manipulateRequest( final SolrRequest request ) {
        if (request.getAuthCredentials() == null) {
          String path = request.getPath();
          AuthCredentials credentials = AuthCredentials.createBasicAuthCredentials(<my default username>, <my default password>);
          if (path.contains("/update")) {
            credentials = AuthCredentials.createBasicAuthCredentials(<my update username>, <my update password>);
          }
          request.setAuthCredentials(credentials);
        }
      };
    };
}}}

==== curl and base64 on Linux ====

Something like this should work:
{{{
BASE64_CREDENTIALS=$(echo -n "<username>:<password>" | base64)
curl -i --header "Authorization:Basic ${BASE64_CREDENTIALS}" <url>
}}}

==== JavaScript (using jQuery and jquery-base64) ====

This shows how to Base64-encode "<username>:<password>" with jQuery and jquery-base64. How to easily include the credentials in actual requests issued by jQuery is TBD.
{{{
<html>
    <head>
        <script type="text/javascript" src="http://code.jquery.com/jquery-x.y.z.js"></script>
        <script type="text/javascript" src="https://raw.github.com/carlo/jquery-base64/master/jquery.base64.js"></script>
        <script type="text/javascript">
            $(document).ready(function() {
                alert( $.base64.encode( "<username>:<password>" ) );
            });
        </script>
    </head>
    <body>
        Javascript base64 encoding test
    </body>
</html>
}}}
You might want to download jquery-x.y.z.js and jquery.base64.js to be provided by your own web-application, instead of depending on them always being available from code.jquery.com and github.com :-)

=== Security in Solr on per-operation basis ===

Due to limitations of "url-pattern"s in web.xml and the structure of URLs in Solr, it is hard to set up path based authentication on a per-operation basis:
 * url-pattern limitations: Wildcards are only allowed "in the end" (e.g. "/core1/*") or as "extension patterns" (e.g. "*.jsp" - the . is required)
 * Solr URL-structure: Solr URLs are structured as <core-or-collection-name>/<operation> (e.g. /core1/update)
These facts make it easy to set up path based authentication on a per-collection/core basis, e.g. url-pattern "/core1/*" matches all operations on core1. On the other hand, they make it hard to set up path based authentication on a per-operation basis. Say you want a url-pattern matching "updates" across all cores/collections: you cannot just use url-pattern "/solr/*/update", "*/update" or "*update", because wildcards are not supported in those positions. Different servlet containers provide different solutions to this problem.
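The url-pattern rules behind this limitation can be sketched in Java. This is a simplified, hypothetical model of the servlet specification's matching (exact match, path prefix ending in "/*", and extension "*.ext"), not any container's code; real containers also handle the default "/" pattern and pick the most specific match, which this sketch ignores. It shows that a "*" in the middle of a pattern is not a wildcard at all.

```java
/** Simplified, hypothetical model of servlet url-pattern matching (not a container's code). */
final class UrlPatternSketch {

    static boolean matches(String pattern, String path) {
        if (pattern.endsWith("/*")) {
            // Path-prefix pattern: "/core1/*" matches "/core1" and everything below "/core1/".
            String prefix = pattern.substring(0, pattern.length() - 2);
            return path.equals(prefix) || path.startsWith(prefix + "/");
        }
        if (pattern.startsWith("*.")) {
            // Extension pattern: "*.jsp" matches any path whose last segment ends in ".jsp".
            String lastSegment = path.substring(path.lastIndexOf('/') + 1);
            return lastSegment.endsWith(pattern.substring(1));
        }
        // Anything else is treated as an exact match: "/solr/*/update" only matches that literal path.
        return pattern.equals(path);
    }
}
```

So "/core1/*" covers every operation on core1, while "/solr/*/update" never matches "/solr/core1/update".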

You can also implement your own authorization filter and let it deal with the authorization part, leaving only authentication to the container. For example, check out the Solr code, find org.apache.solr.servlet.security.RegExpAuthorizationFilter in the test-framework project, make a copy com.mycompany.security.RegExpAuthorizationFilter and set things up like this in WEB-INF/web.xml:
{{{
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr authenticated application</web-resource-name>
      <url-pattern>/*</url-pattern> <!-- the container requires authentication for any url -->
    </web-resource-collection>
    <auth-constraint>
      <role-name>*</role-name> <!-- the container authorizes any authenticated user to do anything -->
    </auth-constraint>
  </security-constraint>

  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Test Realm</realm-name>
  </login-config>
}}}
In practice you probably want to replace the "<role-name>*</role-name>" line with several lines of the form "<role-name>some_concrete_role</role-name>", one for each role your realm will ever "talk about". This is because some servlet containers (including Jetty 8) do not work properly with *.

Now set up RegExpAuthorizationFilter to do the authorization. Insert this filter '''AS THE FIRST''' filter in the WEB-INF/web.xml inside solr.war.
{{{
<filter>
  <filter-name>RegExpAuthorizationFilter</filter-name>
  <filter-class>com.mycompany.security.RegExpAuthorizationFilter</filter-class>
  <init-param>
    <param-name>update-constraint</param-name>
    <param-value>1|update-role,admin-role|^.*/update$</param-value>
  </init-param>
  <init-param>
    <param-name>admin-constraint</param-name>
    <param-value>2|admin-role|^.*$</param-value>
  </init-param>
</filter>

<filter-mapping>
  <filter-name>RegExpAuthorizationFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>
}}}
RegExpAuthorizationFilter verifies authorization by matching paths against patterns, like url-patterns do, but it supports regular expressions. The patterns and the corresponding "allowed roles" are provided to RegExpAuthorizationFilter using init-params, one init-param for every "rule" you want to set up. Each init-param must have a value of the form "<order>|<comma-separated-roles>|<path-regular-expression>", where
 * ''order'' is the order of this "rule" relative to the other "rules". Unfortunately it is not enough just to make sure the "rules" are ordered correctly in web.xml, because the init-params might not be provided to the filter in that order.
 * ''comma-separated-roles'' is a comma-separated list of "roles" allowed to access paths matching the ''path-regular-expression'' of the same "rule".
 * ''path-regular-expression'' is a regular expression (as understood by java.util.regex.Pattern) matched against the path of a particular request hitting the filter.
RegExpAuthorizationFilter iterates over the "rules" in the given order and matches the request path against each rule's ''path-regular-expression''. If the path does not match, it continues to the next "rule"; if it matches, no later "rule" is considered. If no "rule" matches, the request is allowed to proceed (it passes authorization, so to speak). In case of a match, the authenticated user is checked against the roles in ''comma-separated-roles'' and is allowed access only if the user has one of those roles. Otherwise the filter returns a response with status code 403 (Forbidden).
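The rule evaluation described above can be modeled in a few lines of Java. This is a hypothetical sketch, not the RegExpAuthorizationFilter source: it parses values of the form <order>|<comma-separated-roles>|<path-regular-expression>, sorts the rules by order, and lets the first matching rule decide; paths that match no rule are allowed. It uses Matcher.matches(), which behaves the same as a search for the anchored (^...$) patterns used in the web.xml example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical model of "<order>|<roles>|<regex>" rule evaluation. */
final class RegExpRulesSketch {

    static final class Rule {
        final int order;
        final Set<String> roles;
        final Pattern path;

        Rule(int order, Set<String> roles, Pattern path) {
            this.order = order;
            this.roles = roles;
            this.path = path;
        }
    }

    static Rule parse(String value) {
        // Limit 3 so that a '|' inside the regular expression is kept intact.
        String[] parts = value.split("\\|", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected <order>|<roles>|<regex>: " + value);
        }
        Set<String> roles = new HashSet<>();
        for (String role : parts[1].split(",")) {
            roles.add(role.trim());
        }
        return new Rule(Integer.parseInt(parts[0].trim()), roles, Pattern.compile(parts[2]));
    }

    /** Returns true if a user with the given roles may access the path. */
    static boolean isAllowed(List<String> ruleValues, String path, Set<String> userRoles) {
        List<Rule> rules = new ArrayList<>();
        for (String v : ruleValues) {
            rules.add(parse(v));
        }
        // init-params may arrive in any order, hence the explicit <order> field.
        rules.sort(Comparator.comparingInt((Rule r) -> r.order));
        for (Rule rule : rules) {
            if (rule.path.matcher(path).matches()) {
                // The first matching rule decides; later rules are never considered.
                for (String role : userRoles) {
                    if (rule.roles.contains(role)) {
                        return true;
                    }
                }
                return false; // the real filter would answer with 403
            }
        }
        return true; // no rule matched: the request passes authorization
    }
}
```

With the two rules from the web.xml example, a user with only update-role may call /core1/update but not /core1/select, because the second, catch-all rule requires admin-role.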

'''RegExpAuthorizationFilter is not a supported part of Solr, so use it at your own risk'''

=== Resin example ===

See [[http://caucho.com/resin/doc/resin-security.xtp|resin-security]] and [[http://caucho.com/resin/doc/webapp-tags.xtp#auth-constraint|auth-constraint]]

Here is an example showing how to force login for /update and /admin:
{{{
      <web-app
        id="/solr"
        document-directory="/path/to/where/it/gets/exploded"
        archive-path="/path/to/solr.war"
        character-encoding="utf-8">

       <system-property solr.solr.home="/path/to/solr/data" />

       <authenticator type="com.caucho.server.security.XmlAuthenticator">
          <init>
            <user>yourusername:yourpassword:user,admin</user>
            <password-digest>none</password-digest>
          </init>
        </authenticator>
       <security-constraint url-pattern='/update/*' role-name='user'/>
       <security-constraint url-pattern='/admin/*' role-name='user'/>

     </web-app>
}}}

== Web Server Level Security ==

=== Tomcat Remote Address Valve ===

You can limit access to the server by IP address by putting the following in server.xml. The allow attribute is a regular expression matched against the client's IP address:

{{{
<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="127\.0\.0\.1"/>
}}}

=== ZooKeeper security ===

 * Protecting content: [[Per Steffensen/ZooKeeper protecting content]]

Solr Security

The authoritative guide on security is in the Reference Guide. The rest of this page covers tips and tricks beyond what the Reference Guide mentions.


Need for firewall


If there is a need to provide query access to a Solr server from the open internet, it is highly recommended to use a proxy, such as one of these.

Cross-Site Scripting (XSS)

Solr has no known cross-site scripting vulnerabilities.

Quick XSS tip:

Problem: What if you want the browser to highlight text, but you also want to protect yourself from XSS by escaping the HTML output?

Solution: Escape the HTML output and then reapply the em tags. The rest of the snippet stays safe, and the browser still recognizes the highlighted text.

For example, with groovy/grails you could have the following in your controller:

snippet = snippet.encodeAsHTML()
snippet = snippet.replaceAll('&lt;em&gt;', '<em>')
snippet = snippet.replaceAll('&lt;/em&gt;', '</em>')
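The same escape-then-restore idea, sketched in plain Java for readers not using Grails. The escapeHtml helper is hypothetical and handles only the five characters &, <, >, " and '. The sketch assumes the highlighter wraps terms in <em> and </em>, which are Solr's default hl.simple.pre/hl.simple.post values.

```java
/** Escapes a highlighted snippet, then restores only the highlighter's <em> tags. */
final class SafeSnippet {

    /** Minimal HTML escaping for text content and quoted attribute values. */
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    static String render(String snippet) {
        String escaped = escapeHtml(snippet);
        // String.replace is a literal (non-regex) replacement of every occurrence.
        return escaped.replace("&lt;em&gt;", "<em>").replace("&lt;/em&gt;", "</em>");
    }

    public static void main(String[] args) {
        System.out.println(render("<em>ipod</em> <script>alert(1)</script>"));
        // prints "<em>ipod</em> &lt;script&gt;alert(1)&lt;/script&gt;"
    }
}
```

One caveat: if the indexed text itself contains a literal <em>, it is restored too. Configuring unusual hl.simple.pre/hl.simple.post markers and restoring those instead avoids that ambiguity.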

Cross-Site Request Forgery (CSRF)

Even if a Solr instance is protected by good firewalls so that "bad guys" have no direct access, that instance may be at risk from "Cross-Site Request Forgery" based attacks if all of the following are true:

  1. Some number of "good guys" have direct access to that Solr instance from their web browsers.
  2. A "bad guy" knows or guesses the host:port/path of the Solr instance (even though they cannot access it directly).
  3. The bad guy can trick one of the good guys into clicking a maliciously crafted URL, or loading a web page that contains malicious JavaScript.

This is because Solr's most basic behavior is to receive updates and deletes via HTTP. If a firewall or other security measure restricts Solr's /update handler so that it only accepts connections from approved hosts/clients, and you are one of the approved clients, then you could inadvertently be tricked into loading a web page that initiates an HTTP connection to Solr on your behalf.

It's important to keep this in mind when thinking about what it means to "secure" an instance of Solr (if you have not already).

A basic technique that can be used to mitigate the risk of a CSRF attack like this is to configure your servlet container so that access to paths which can modify the index (i.e. /update, /update/csv, etc.) is restricted either to specific client IPs or by HTTP Authentication.
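The path-plus-client-IP restriction is normally configured in the servlet container or a firewall. The hypothetical Java sketch below only models the decision. It assumes that any path with an /update segment modifies the index and allows such paths only from an allow-list of client addresses; both the path test and the allow-list are illustrative assumptions.

```java
import java.util.Set;

/** Hypothetical decision: index-modifying paths are only served to allow-listed client IPs. */
final class UpdateGuardSketch {

    /** Assumption: any path with an "/update" segment (e.g. /core1/update/csv) modifies the index. */
    static boolean modifiesIndex(String path) {
        return path.endsWith("/update") || path.contains("/update/");
    }

    static boolean allowed(String path, String clientIp, Set<String> updateClients) {
        // Read-only paths are not restricted here; updates require an allow-listed client.
        return !modifiesIndex(path) || updateClients.contains(clientIp);
    }
}
```

This only helps against CSRF if the allow-listed addresses belong to your own application servers. If the machines where users run their web browsers are on the list, a forged request sent from such a browser still passes the check.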

Document Level Security

Manifold CF (Connector Framework)

One way to add document level security to your search is through Apache ManifoldCF. ManifoldCF "defines a security model for target repositories that permits them to enforce source-repository security policies".

It works by adding security tokens from the source repositories as metadata on the indexed documents. Then, at query time, a Search Component adds a filter to all queries, matching only documents the logged-in user is allowed to see. ManifoldCF supports Active Directory (AD) security out of the box.
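Conceptually, the query-time filter restricts results to documents that carry at least one of the user's security tokens. The sketch below builds such an fq string. The field name acl_tokens and the token values are hypothetical; ManifoldCF's Solr plugin uses its own field names and also handles deny tokens. Each token is quoted so that characters such as ':' inside it are not parsed as query syntax.

```java
import java.util.List;

/** Hypothetical builder for a "user may see this document" filter query. */
final class AclFilterSketch {

    /** Wraps a token in double quotes, escaping backslashes and quotes inside it. */
    static String quote(String token) {
        return "\"" + token.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String filterQuery(String field, List<String> userTokens) {
        if (userTokens.isEmpty()) {
            // No tokens: a pure negative query, which matches no documents rather than all of them.
            return "-*:*";
        }
        StringBuilder fq = new StringBuilder(field).append(":(");
        for (int i = 0; i < userTokens.size(); i++) {
            if (i > 0) {
                fq.append(" OR ");
            }
            fq.append(quote(userTokens.get(i)));
        }
        return fq.append(')').toString();
    }
}
```

For example, tokens AD:S-1-5-21 and public would produce acl_tokens:("AD:S-1-5-21" OR "public").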

Write Your Own RequestHandler or SearchComponent

*Stub - this is incomplete*

If ManifoldCF does not meet your needs, first consider writing a ManifoldCF plugin, or else roll your own.

If you need permission based access control (where user A can update documents 1 and 2, but not 3), you will need to augment the request with user information. You can either add parameters to the query string (?u=XXX&p=YYY) or use a custom dispatch filter that augments the request context:

public class CustomDispatchFilter extends SolrDispatchFilter
{
  @Override
  protected void execute( HttpServletRequest req, SolrRequestHandler handler, SolrQueryRequest sreq, SolrQueryResponse rsp)
  {
    // perhaps the whole request
    sreq.getContext().put( "HttpServletRequest", req );

    // or maybe just the user
    sreq.getContext().put( "user", req.getRemoteUser());

    // hand the augmented request back to the standard dispatch logic
    super.execute( req, handler, sreq, rsp );
  }
}


public class AuthenticatingHandler extends RequestHandlerBase
{
  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {

    HttpServletRequest httpreq = (HttpServletRequest)
      req.getContext().get( "HttpServletRequest" );

    // httpreq is null if CustomDispatchFilter is not installed
    if( httpreq != null && httpreq.isUserInRole( "editor" ) ) {
      ...
    }

    String user = (String)req.getContext().get( "user" );
    ...
  }
  ...
}

Streaming Consideration

If streaming is enabled, you need to make sure Solr is as secure as it needs to be. When streaming is enabled, the "stream.url" parameter makes Solr download content from a remote site. Likewise, "stream.file" makes Solr read a file from its local disk.

Streaming is disabled by default and is configured in solrconfig.xml:

  <requestParsers enableRemoteStreaming="false" ... />
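If you must enable remote streaming, one possible second line of defense is to reject requests carrying the streaming parameters before they reach Solr, for example in a proxy or a servlet filter. The hypothetical Java helper below only decides whether a parameter map mentions stream.url or stream.file. The Map<String, String[]> type matches what ServletRequest.getParameterMap() returns; where you hook the check in depends on your setup.

```java
import java.util.Map;

/** Hypothetical guard that flags requests using Solr's remote-streaming parameters. */
final class StreamingParamGuard {

    /** stream.url makes Solr fetch a remote URL; stream.file makes it read a local file. */
    static boolean usesRemoteStreaming(Map<String, String[]> params) {
        return params.containsKey("stream.url") || params.containsKey("stream.file");
    }
}
```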

SolrSecurity (last edited 2015-08-28 09:22:48 by JanHoydahl)