Revision 35 as of 2013-05-30 10:34:21
Solr Security
First and foremost, Solr does not concern itself with security either at the document level or the communication level. It is strongly recommended that the application server containing Solr be firewalled such that the only clients with access to Solr are your own. A default/example installation of Solr allows any client with access to it to add, update, and delete documents (and of course search/read too), and also exposes the Solr configuration and schema files and the administrative user interface.
Besides limiting port access to the Solr server, standard Java web security can be added by tuning the container and the Solr web application configuration itself via web.xml. For example, all /update URLs could require HTTP authentication.
If there is a need to provide query access to a Solr server from the open internet, it is highly recommended to use a proxy, such as one of these.
Cross-Site Scripting (XSS)
Solr has no known cross-site scripting vulnerabilities.
Quick XSS tip:
Problem: What if you want the browser to highlight text, but you also want to protect yourself from XSS and escape the HTML output?
Solution: Escape the HTML output and then reapply the em tags. The rest of the snippet is then safe, and the browser will still recognize the highlighted text.
For example, with groovy/grails you could have the following in your controller:
snippet = snippet.encodeAsHTML()
snippet = snippet.replaceAll('&lt;em&gt;', '<em>')
snippet = snippet.replaceAll('&lt;/em&gt;', '</em>')
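The same escape-then-restore idea can be sketched in plain Java. This is only an illustration: the class and method names are made up for this example, and the hand-written escapeHtml helper stands in for whatever escaping utility your application already uses.

```java
public class HighlightSnippetEscaper {

    // Minimal HTML escaping. '&' is replaced first so that the entities
    // produced by the later replacements are not escaped a second time.
    public static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // Escape everything, then turn the escaped highlighting tags back into real <em> tags.
    public static String escapeKeepingEm(String snippet) {
        return escapeHtml(snippet)
                .replace("&lt;em&gt;", "<em>")
                .replace("&lt;/em&gt;", "</em>");
    }

    public static void main(String[] args) {
        // prints: <em>ipod</em> &lt;script&gt;alert(1)&lt;/script&gt;
        System.out.println(escapeKeepingEm("<em>ipod</em> <script>alert(1)</script>"));
    }
}
```

Note that any literal <em> text that was already present in the indexed document is restored as well, so this approach assumes em tags are harmless in your pages.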
Cross-Site Request Forgery (CSRF)
Even if a Solr instance is protected by good firewalls so that "bad guys" have no direct access, that instance may still be at risk from "Cross-Site Request Forgery" based attacks if the following are all true:
- Some number of "good guys" have direct access to that Solr instance from their web browsers.
- A "bad guy" knows/guesses the host:port/path of the Solr instance (even though they cannot access it directly).
- The bad guy can trick one of the good guys into clicking a maliciously crafted URL, or loading a webpage that contains malicious javascript.
This is because Solr's most basic behavior is to receive updates and deletes via HTTP. Even if a firewall or other security measure restricts Solr's /update handler so that it only accepts connections from approved hosts/clients, you, as an approved client, could inadvertently be tricked into loading a web page that initiates an HTTP connection to Solr on your behalf.
It's important to keep this in mind when thinking about what it means to "secure" an instance of Solr (if you have not already).
A basic technique for mitigating the risk of a CSRF attack like this is to configure your Servlet Container so that access to paths which can modify the index (e.g. /update, /update/csv, etc.) is restricted either to specific client IPs or by HTTP Authentication.
Path Based Authentication
Path based authentication configured at the servlet container level can be used to restrict access to URLs (such as /admin and /update) to only those clients specifying the correct credentials.
Using path based authentication to limit certain clients to path based request handlers with "appends" and "invariants" is also a nice way to expose a subset of the documents and to constrain or default any request parameters.
Consider:
<requestHandler name="/instock" class="solr.DisMaxRequestHandler" >
<lst name="appends">
<str name="fq">inStock:true</str>
</lst>
<lst name="invariants">
<str name="facet.field">cat</str>
</lst>
</requestHandler>
Any queries into /instock, such as /instock?q=ipod, will always be limited to documents with an indexed inStock field containing a value of "true", and all responses will include facet counts for the "cat" field.
NOTE: Solr provides access to request handlers through a general purpose /select?qt=request_handler_name URL. Prior to Solr 1.4, request handlers named with a leading forward-slash could not be used this way (as in /select?qt=/request_handler_name), but had to be requested using /request_handler_name. Solr 1.4 removed the forward-slash restriction (via SOLR-1233) and allows /select to work with any request handler name. Externally blocking access to /select is recommended in environments where only path-based access to request handlers is warranted.
When using path based authentication, you will most likely want to configure your HTTP client code to use Preemptive Authentication. With SOLR-4470, preemptive authentication is used by default, but it can be turned off on a per-request basis by doing the following to your SolrRequest before using it:
solrRequest.setPreemptiveAuthentication(false);
With SOLR-4470 preemptive authentication works for POST requests - it did not before.
Common servlet container example
Add this inside <web-app ..> in WEB-INF/web.xml inside solr.war (or, for Jetty, in the /example/etc/webdefault.xml file):
<security-constraint>
<web-resource-collection>
<web-resource-name>Solr authenticated application</web-resource-name>
<url-pattern>/core1/*</url-pattern>
</web-resource-collection>
<auth-constraint>
<role-name>core1-role</role-name>
</auth-constraint>
</security-constraint>
<login-config>
<auth-method>BASIC</auth-method>
<realm-name>Test Realm</realm-name>
</login-config>
This is standardized for all servlet containers and will set up security with the following properties:
- Authentication: You have to authenticate to access any path starting with "/core1/". Other paths can be accessed without authenticating. Authentication is performed against a "realm" called "Test Realm". The "realm" verifies the credentials.
- Authorization: The configuration above also states that you have to be part of the "role" "core1-role" in order to be authorized for "/core1/" paths. If the credentials were verified successfully, the "realm" also provides the set of "roles" that the authenticated user is part of.
Unfortunately, how to actually set up a "realm" has not been standardized, so it differs from servlet container to servlet container.
Jetty realm example
Edit jetty.xml
In Jetty 6, add or uncomment this section in /example/etc/jetty.xml
<Set name="UserRealms">
<Array type="org.mortbay.jetty.security.UserRealm">
<Item>
<New class="org.mortbay.jetty.security.HashUserRealm">
<Set name="name">Test Realm</Set>
<Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
</New>
</Item>
</Array>
</Set>
In Jetty 8 (which ships with Solr 4) you add this to /example/etc/jetty.xml
<Call name="addBean">
<Arg>
<New class="org.eclipse.jetty.security.HashLoginService">
<Set name="name">Test Realm</Set>
<Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
<Set name="refreshInterval">0</Set>
</New>
</Arg>
</Call>
/example/etc/realm.properties
guest: guest, core1-role
Security for inter-solr-node requests
Without SOLR-4470, in a cluster/cloud/distributed installation with several solr-nodes communicating internally, you would in practice not be able to add security constraints on most paths, because inter-solr-node communication (potentially) involves requests to most paths, and solr-nodes were not able to provide credentials in those internal solr-to-solr requests. SOLR-4470 introduces this ability. Basically there are two strategies for which credentials to use in solr-to-solr requests:
- 1) Just forward credentials from the "super"-request which caused the inter-solr-node "sub"-requests
- 2) Use "internal credentials" provided to the solr-node by the administrator at startup
Using 1) is the most correct way, because it prevents a request from "the outside" from indirectly triggering successful inter-solr-node sub-requests that the user sending the outside request is not allowed to perform directly. Therefore 1) should be used whenever feasible. But there are cases where inter-solr-node requests are not a direct reaction to requests coming from "the outside" - e.g. requests around shard-synchronization. In such cases 2) has to be used - there are no other credentials to use, and you may well want to be able to protect e.g. shard-synchronization paths. There are also border cases, e.g. when a request from "the outside" triggers inter-solr-node requests that are issued asynchronously, such as when a request to the Collections API makes the Overseer send inter-solr-node requests to the CoreAdmin API. In such cases 1) ought to be used, but that would require persisting the credentials from the original request and reusing them when the sub-request to the CoreAdmin API is eventually performed. The current implementation in SOLR-4470 just uses 2) in such cases.
With SOLR-4470 only "basic http authentication" is supported, even though the code has been prepared for also supporting other kinds of authentication.
To provide 2) to the solr-node at startup you have to add a few VM-params
java -jar ... -DinternalAuthCredentialsBasicAuthUsername=<username> -DinternalAuthCredentialsBasicAuthPassword=<password> ... start.jar
Providing credentials in requests
Basically, "basic http authentication" requires you to add a header of the following form to your request
Authorization: Basic <base64 encoding of "<username>:<password>">
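As a rough illustration of what that header value looks like, the following plain Java sketch builds it by hand. The class and method names are invented for this example, and it assumes Java 8 or later for java.util.Base64 (older JVMs would need a different Base64 implementation).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {

    // Builds the value of the Authorization header for basic http authentication:
    // the word "Basic", a space, then base64("<username>:<password>").
    public static String value(String username, String password) {
        String pair = username + ":" + password;
        byte[] bytes = pair.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // prints: Authorization: Basic Z3Vlc3Q6Z3Vlc3Q=
        System.out.println("Authorization: " + value("guest", "guest"));
    }
}
```

Keep in mind that base64 is an encoding, not encryption, so the credentials are only as protected as the connection they travel over.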
Fortunately, various tools can do this for you
SolrJ
Before SOLR-4470, SolrJ only supported setting up credentials at the SolrServer/HttpClient level - those credentials being used for all requests issued through this SolrServer/HttpClient. Basically you need to get hold of the HttpClient used and set it up. E.g. for a CloudSolrServer instance
HttpClientUtil.setBasicAuth(cloudSolrServer.getLbServer().getHttpClient(), <username>, <password>);
With SOLR-4470 this has changed a little, because a class AuthCredentials encapsulating the abstract concept of credentials was introduced
HttpClientUtil.setAuthCredentials(cloudSolrServer.getLbServer().getHttpClient(), AuthCredentials.createBasicAuthCredentials(<username>, <password>));
With SOLR-4470 you are also able to provide credentials on request level, so that you can use different credentials in different requests issued through one and the same SolrServer/HttpClient. Just set the credentials on the SolrRequest before using it (it will override credentials on SolrServer/HttpClient level if set)
solrRequest.setAuthCredentials(AuthCredentials.createBasicAuthCredentials(<username>, <password>));
curl and base64 on linux
Something like this should work
BASE64_CREDENTIALS=$(echo -n "<username>:<password>" | base64)
curl -i --header "Authorization: Basic ${BASE64_CREDENTIALS}" <url>
javascript (using jquery and jquery-base64)
This shows how to base64 encode the "<username>:<password>" with jquery and jquery-base64. How to easily include the credentials in actual requests issued by jquery is TBD
<html>
<head>
<script type="text/javascript" src="http://code.jquery.com/jquery-x.y.z.js"></script>
<script type="text/javascript" src="https://raw.github.com/carlo/jquery-base64/master/jquery.base64.js"></script>
<script type="text/javascript">
$(document).ready(function() {
alert( $.base64.encode( "<username>:<password>" ) );
});
</script>
</head>
<body>
Javascript base64 encoding test
</body>
</html>
You might want to download jquery-x.y.z.js and jquery.base64.js and serve them from your own web application, instead of depending on them always being available from code.jquery.com and github.com.
Security in Solr on per-operation basis
Due to limitations on "url-pattern"s in web.xml and the structure of URLs in Solr, it is hard to set up path based authentication on a per-type-of-operation basis.
- url-pattern limitations: Wildcards are only allowed "in the end" (e.g. "/core1/*") or as "extension patterns" (e.g. "*.jsp" - the . is required)
- Solr URL-structure: Solr URLs are structured as <core-or-collection-name>/<operation> (e.g. /core1/update)
Those facts make it easy to set up path based authentication on a per-collection/core basis. E.g. url-pattern "/core1/*" matches all operations on core1. On the other hand, they make it hard to set up path based authentication on a per-operation basis. Let's say you want a url-pattern matching "updates" across all cores/collections: you cannot just use url-pattern "/solr/*/update", "*/update" or "*update" - it's not allowed in url-patterns. Different servlet containers provide different solutions to this problem, but SOLR-4470 also provides a solution as part of Solr itself.
The solution is provided in org.apache.solr.servlet.security.RegExpAuthorizationFilter. This is a normal filter that can be used to handle the authorization part of security, still leaving authentication to the servlet container (web.xml).
So let's set up web.xml to make the servlet container handle authentication only (basically letting every authenticated user access any path)
<security-constraint>
<web-resource-collection>
<web-resource-name>Solr authenticated application</web-resource-name>
<url-pattern>/*</url-pattern>
</web-resource-collection>
<auth-constraint>
<role-name>*</role-name>
</auth-constraint>
</security-constraint>
<login-config>
<auth-method>BASIC</auth-method>
<realm-name>Test Realm</realm-name>
</login-config>
In practice you probably want to replace the "<role-name>*</role-name>" line with several lines of the form "<role-name>some_concrete_role</role-name>" - one for each actual role your realm will ever "talk about". This is because some servlet containers (including Jetty 8) do not work properly with *.
Now let's set up RegExpAuthorizationFilter to do the authorization. Insert this filter AS THE FIRST filter in the WEB-INF/web.xml inside solr.war.
<filter>
<filter-name>RegExpAuthorizationFilter</filter-name>
<filter-class>org.apache.solr.servlet.security.RegExpAuthorizationFilter</filter-class>
<init-param>
<param-name>search-constraint</param-name>
<param-value>1|update-role,admin-role|^.*/update$</param-value>
</init-param>
<init-param>
<param-name>admin-constraint</param-name>
<param-value>2|admin-role|^.*$</param-value>
</init-param>
</filter>
<filter-mapping>
<filter-name>RegExpAuthorizationFilter</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
The RegExpAuthorizationFilter verifies authorization by matching paths against patterns - but it supports regular expression patterns. The patterns and corresponding "allowed roles" are provided to RegExpAuthorizationFilter using init-params. You provide an init-param for every "rule" you want to set up. Each init-param has to have a value of the form "<order>|<comma-separated-roles>|<path-regular-expression>" where
- order is the order of this "rule" relative to the other "rules". Unfortunately it is not enough just to make sure the "rules" are ordered correctly in the web.xml, because the init-params might not be provided to the filter in that order
- comma-separated-roles is a comma separated list of "roles" allowed to access paths matching path-regular-expression of the same "rule"
- path-regular-expression is a regular expression (as understood by java.util.regex.Pattern) matched against the path of a particular request hitting the filter.
RegExpAuthorizationFilter iterates over the "rules" in the given order and matches the request path against each rule's path-regular-expression. If there is no match, it continues to the next "rule"; on a match, no further "rules" are considered. If no "rules" match, the request is allowed to proceed - it passed authorization, so to speak. On a match, the authenticated user's roles are checked against comma-separated-roles, and access is only allowed if the user is in at least one of the roles mentioned. Otherwise the filter returns a response with status code 403 "Unauthorized".
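The rule evaluation described above can be sketched in a few lines of plain Java. This is not the SOLR-4470 implementation: the class and method names are hypothetical, the servlet plumbing is left out, and matching the whole path with Matcher.matches() is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class RegExpRuleSketch {

    // One "rule", parsed from "<order>|<comma-separated-roles>|<path-regular-expression>".
    static final class Rule {
        final int order;
        final Set<String> roles;
        final Pattern pattern;

        Rule(String paramValue) {
            // Limit of 3 keeps any '|' inside the regular expression intact.
            String[] parts = paramValue.split("\\|", 3);
            order = Integer.parseInt(parts[0].trim());
            roles = new HashSet<>(Arrays.asList(parts[1].trim().split("\\s*,\\s*")));
            pattern = Pattern.compile(parts[2]);
        }
    }

    // Returns true if a user holding userRoles may access path under the given rules.
    public static boolean isAllowed(List<String> paramValues, String path, Set<String> userRoles) {
        List<Rule> rules = new ArrayList<>();
        for (String value : paramValues) {
            rules.add(new Rule(value));
        }
        // Sort explicitly: init-params are not guaranteed to arrive in web.xml order.
        rules.sort(Comparator.comparingInt(r -> r.order));
        for (Rule rule : rules) {
            if (rule.pattern.matcher(path).matches()) {
                // First matching rule decides; later rules are never considered.
                return !Collections.disjoint(rule.roles, userRoles);
            }
        }
        // No rule matched: the request passes authorization.
        return true;
    }

    public static void main(String[] args) {
        List<String> rules = Arrays.asList(
                "2|admin-role|^.*$",
                "1|update-role,admin-role|^.*/update$");
        Set<String> updater = Collections.singleton("update-role");
        System.out.println(isAllowed(rules, "/core1/update", updater)); // prints: true
        System.out.println(isAllowed(rules, "/core1/select", updater)); // prints: false
    }
}
```

With the two rules from the web.xml example, a user who only has update-role can reach /update paths, while every other path falls through to the catch-all rule and requires admin-role.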
Resin example
See resin-security and auth-constraint
Here is an example showing how to force login for /update and /admin
<web-app
id="/solr"
document-directory="/path/to/where/it/gets/exploded"
archive-path="/path/to/solr.war"
character-encoding="utf-8">
<system-property solr.solr.home="/path/to/solr/data" />
<authenticator type="com.caucho.server.security.XmlAuthenticator">
<init>
<user>yourusername:yourpassword:user,admin</user>
<password-digest>none</password-digest>
</init>
</authenticator>
<security-constraint url-pattern='/update/*' role-name='user'/>
<security-constraint url-pattern='/admin/*' role-name='user'/>
</web-app>
Document Level Security
Manifold CF (Connector Framework)
One way to add document level security to your search is through Apache ManifoldCF. ManifoldCF "defines a security model for target repositories that permits them to enforce source-repository security policies".
It works by adding security tokens from the source repositories as metadata on the indexed documents. Then, at query time, a Search Component adds a filter to all queries, matching only documents the logged-in user is allowed to see. ManifoldCF supports AD security out of the box.
Write Your Own RequestHandler or SearchComponent
*Stub - this is incomplete*
If ManifoldCF does not solve your need, first consider writing a ManifoldCF plugin. Or roll your own.
If you need permission based authentication -- where user A can update document 1 and 2, but not 3 -- you will need to augment the request with user information. Either you can add parameters to the query string (?u=XXX&p=YYY) or use a custom dispatcher filter that augments the context:
public class CustomDispatchFilter extends SolrDispatchFilter
{
  @Override
  protected void execute( HttpServletRequest req, SolrRequestHandler handler, SolrQueryRequest sreq, SolrQueryResponse rsp)
  {
    // perhaps the whole request
    sreq.getContext().put( "HttpServletRequest", req );
    // or maybe just the user
    sreq.getContext().put( "user", req.getRemoteUser());
    // hand the augmented request on to the core that owns it
    sreq.getCore().execute( handler, sreq, rsp );
  }
}

public class AuthenticatingHandler extends RequestHandlerBase
{
  @Override
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    // read back what the dispatch filter stored in the request context
    HttpServletRequest httpreq = (HttpServletRequest)
      req.getContext().get( "HttpServletRequest" );
    if( httpreq.isUserInRole( "editor" ) ) {
      ...
    }
    String user = (String)req.getContext().get( "user" );
    ...
  }
  ...
}
Streaming Consideration
If streaming is enabled, you need to make sure Solr is as secure as it needs to be. When streaming is enabled, the parameter "stream.url" makes Solr go to a remote site and download the content. Likewise, "stream.file" makes Solr read a file on disk.
Streaming is disabled by default and is configured from solrconfig.xml
<requestParsers enableRemoteStreaming="false" ... />
Web Server Level Security
Tomcat Remote Address Valve
You can limit access to the server based on IP address by putting the following in server.xml
<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="127.0.0.1"/>
ZooKeeper security
* Protecting content: Per Steffensen/ZooKeeper protecting content