Install Tinyproxy (Ubuntu Dapper)


sudo apt-get install tinyproxy


sudo vi /etc/tinyproxy/tinyproxy.conf

Sample configuration. Make sure you set the Port and Allow directives (here, I'm allowing connections from localhost only):

Port 5555
Allow 127.0.0.1
Filter "/etc/tinyproxy/filter"
FilterURLs On
# Filters will act as a blacklist
FilterDefaultDeny No

User nobody
Group nogroup
ViaProxyName "tinyproxy"
ConnectPort 443
ConnectPort 563
Timeout 600
DefaultErrorFile "/usr/share/tinyproxy/default.html"
StatFile "/usr/share/tinyproxy/stats.html"
Logfile "/var/log/tinyproxy.log"
LogLevel Info
PidFile "/var/run/tinyproxy.pid"
MaxClients 100
MinSpareServers 5
MaxSpareServers 20
StartServers 10
MaxRequestsPerChild 0
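Before restarting, you may want to confirm that the Port and Allow directives are actually present. The following is a minimal Python sketch, assuming the simple one-directive-per-line format shown above; `read_directives` and `check_required` are hypothetical helper names, not part of Tinyproxy.

```python
from pathlib import Path


def read_directives(text: str) -> dict[str, list[str]]:
    """Collect 'Directive value' pairs from a tinyproxy-style config.

    Assumes one directive per line; blank lines and lines starting
    with '#' are skipped. Repeated directives (e.g. several Allow or
    ConnectPort lines) are kept in file order.
    """
    directives: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        value = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(parts[0], []).append(value)
    return directives


def check_required(text: str, required=("Port", "Allow")) -> list[str]:
    """Return the required directives that are missing from the config."""
    found = read_directives(text)
    return [name for name in required if name not in found]


if __name__ == "__main__":
    conf_path = Path("/etc/tinyproxy/tinyproxy.conf")
    if conf_path.exists():
        missing = check_required(conf_path.read_text())
        print("missing directives:", missing or "none")
```

An empty list means both directives were found; anything listed should be added before restarting Tinyproxy.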

Create filters

If necessary, create a filter file. Because FilterDefaultDeny is set to No, its entries act as a blacklist:

sudo vi /etc/tinyproxy/filter

and add the URLs (or URL patterns) of the sites to block, one per line.
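To illustrate what FilterDefaultDeny No means, here is a simplified Python model of the allow/deny decision; it is not Tinyproxy's own code. It assumes each non-blank line of the filter file is a regular expression searched anywhere in the full URL (because FilterURLs is On) and that matching is case-insensitive; Python's `re` syntax only approximates the POSIX regular expressions Tinyproxy uses. `load_patterns` and `is_blocked` are hypothetical names.

```python
import re


def load_patterns(filter_text: str) -> list[re.Pattern]:
    """Compile one case-insensitive pattern per non-blank filter line."""
    return [
        re.compile(line.strip(), re.IGNORECASE)
        for line in filter_text.splitlines()
        if line.strip()
    ]


def is_blocked(url: str, patterns, default_deny: bool = False) -> bool:
    """Decide whether a request for `url` would be refused.

    default_deny=False models 'FilterDefaultDeny No': the filter is a
    blacklist, so a URL is blocked only if some pattern matches it.
    default_deny=True models 'FilterDefaultDeny Yes': the filter is a
    whitelist, so a URL is blocked unless some pattern matches it.
    """
    matched = any(p.search(url) for p in patterns)
    return not matched if default_deny else matched
```

For example, a filter file containing the single line `google\.com` blocks any URL on google.com while letting other sites through.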


Commands to stop, start, and restart Tinyproxy

sudo /etc/init.d/tinyproxy stop
sudo /etc/init.d/tinyproxy start
sudo /etc/init.d/tinyproxy restart
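After starting or restarting, one quick way to confirm that Tinyproxy is accepting connections is to open a TCP connection to its port. A minimal Python sketch, assuming the proxy runs on localhost port 5555 as configured above; `proxy_is_listening` is a hypothetical helper name.

```python
import socket


def proxy_is_listening(host: str = "localhost", port: int = 5555,
                       timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    print("tinyproxy listening:", proxy_is_listening())
```

This only shows that something is listening on the port; the browser test confirms that it actually proxies requests.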

Test the proxy with your browser

  • In Firefox, open Preferences, go to the General tab and click Connection Settings. Select Manual proxy configuration and enter the host and port defined above.
  • If you added google.com to the filter file above, browsing to google.com should be blocked by the proxy.
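The same check can also be scripted instead of done in a browser. The sketch below uses the standard library's `urllib.request.ProxyHandler` to send a request through the proxy and report the HTTP status; it assumes the proxy is at localhost:5555. A URL blocked by the filter should come back as an error status from the proxy rather than the site's page. `fetch_status` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, proxy_host: str = "localhost",
                 proxy_port: int = 5555, timeout: float = 10.0) -> int:
    """Request `url` through an HTTP proxy and return the status code.

    HTTP error responses (for example a page refused by the proxy's
    filter) are returned as their status code instead of raising.
    """
    proxy = f"http://{proxy_host}:{proxy_port}"
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy})
    )
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

For example, `fetch_status("http://www.google.com/")` should return an error code rather than 200 once google.com is in the filter file.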

Configure Nutch (Nutch 0.8)

Copy the proxy properties (see below) from conf/nutch-default.xml to conf/nutch-site.xml and fill in the values for your proxy:

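<property>
  <name>http.proxy.host</name>
  <value>localhost</value>
  <description>The proxy hostname.  If empty, no proxy is used.</description>
</property>

<property>
  <name>http.proxy.port</name>
  <value>5555</value>
  <description>The proxy port.</description>
</property>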
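If you prefer to generate these overrides rather than copy them by hand, the following Python sketch builds a minimal nutch-site.xml body containing only the two proxy properties, using the standard library's `xml.etree.ElementTree`. It assumes the proxy is at localhost:5555 and prints the result instead of writing the file, so an existing nutch-site.xml with other overrides is not overwritten; `proxy_site_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def proxy_site_xml(host: str = "localhost", port: int = 5555) -> str:
    """Build a nutch-site.xml body setting http.proxy.host and http.proxy.port."""
    configuration = ET.Element("configuration")
    for name, value in (("http.proxy.host", host),
                        ("http.proxy.port", str(port))):
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(configuration)  # pretty-print; requires Python 3.9+
    body = ET.tostring(configuration, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body + "\n"


if __name__ == "__main__":
    # Review the output, then merge it into conf/nutch-site.xml.
    print(proxy_site_xml())
```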

Nutch will now fetch pages through your proxy when you crawl. To monitor it during a crawl, watch the Tinyproxy log:

sudo tail -f /var/log/tinyproxy.log

More resources

  • http://ubuntuforums.org/showthread.php?t=122011
  • http://doc.gwos.org/index.php/TinyProxy
  • http://doc.ubuntu-fr.org/serveur/tinyproxy

SetupProxyForNutch (last edited 2014-03-05 22:44:55 by LewisJohnMcgibbney)