Protocol implementations allow Nutch to fetch documents over different protocols (ftp, http, file, etc.). They are implemented as plugins, which allows users
Simple (no third-party dependencies) but error-tolerant HTTP/HTTPS protocol implementation (HTTP 1.0 and 1.1).
HTTP/HTTPS protocol based on Apache HttpClient, optionally with Basic, Digest and NTLM authentication schemes, form/POST authentication, and support for proxy servers. See HttpAuthenticationSchemes and HttpPostAuthentication.
HTTP/HTTPS protocol based on OkHttp, supports
Nutch provides a couple of protocol plugins that fetch content indirectly, through an intermediate web browser controlled via the Selenium browser automation library.
See README.
See README.
(under development, see NUTCH-2856)
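As a rough illustration of how one of the protocol plugins above is selected: Nutch activates plugins through the plugin.includes property, a regular expression over plugin names, usually overridden in conf/nutch-site.xml. The fragment below is only a sketch. It assumes the OkHttp-based implementation is shipped under the name protocol-okhttp, and the remaining plugins in the value are an abbreviated example rather than a recommended set, so adapt them to the defaults of your Nutch version.

```xml
<!-- conf/nutch-site.xml (sketch; the plugin list is an assumption, adjust to your setup) -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- protocol-okhttp takes the place of the default protocol-http here -->
    <value>protocol-okhttp|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```

To switch to a different implementation, replace the protocol-... entry in the expression with the name of the desired plugin.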