Contents
- What is Squid?
- What is Internet object caching?
- Why is it called Squid?
- What is the latest version of Squid?
- Who is responsible for Squid?
- Where can I get Squid?
- What Operating Systems does Squid support?
- Does Squid run on Windows?
- What Squid mailing lists are available?
- I can't figure out how to unsubscribe from your mailing list.
- What other Squid-related documentation is available?
- Does Squid support SSL/HTTPS/TLS?
- What's the legal status of Squid?
- Can I pay someone for Squid support?
- Squid FAQ contributors
- About This Document
- Want to contribute?
- Which file do I download to get Squid?
- Do you have pre-compiled binaries available?
- How do I compile Squid?
- Building Squid on ...
- I see a lot of warnings while compiling Squid.
- undefined reference to __inet_ntoa
- How big of a system do I need to run Squid?
- How do I install Squid?
- How do I start Squid?
- How do I start Squid automatically when the system boots?
- How do I tell if Squid is running?
- squid command line options
- How do I see how Squid works?
- Can Squid benefit from SMP systems?
- Is it okay to use separate drives for Squid?
- Is it okay to use RAID on Squid?
- How do I configure Squid without re-compiling it?
- What does the squid.conf file do?
- Do you have a squid.conf example?
- How do I join a cache hierarchy?
- How do I join NLANR's cache hierarchy?
- Why should I want to join NLANR's cache hierarchy?
- How do I register my cache with NLANR's registration service?
- How do I find other caches close to me and arrange parent/child/sibling relationships with them?
- My cache registration is not appearing in the Tracker database.
- How do I configure Squid to work behind a firewall?
- How do I configure Squid to forward all requests to another proxy?
- I have "dnsserver" processes that aren't being used, should I lower the number in "squid.conf"?
- My ''dnsserver'' average/median service time seems high, how can I reduce it?
- How can I easily change the default HTTP port?
- Is it possible to control how big each ''cache_dir'' is?
- What ''cache_dir'' size should I use?
- I'm adding a new cache_dir. Will I lose my cache?
- Squid and http-gw from the TIS toolkit.
- What is "HTTP_X_FORWARDED_FOR"? Why does squid provide it to WWW servers, and how can I stop it?
- Can Squid anonymize HTTP requests?
- Can I make Squid go direct for some sites?
- Can I make Squid proxy only, without caching anything?
- Can I prevent users from downloading large files?
- How do I enable IPv6?
- Communication between browsers and Squid
- Manual Browser Configuration
- Firefox and Thunderbird manual configuration
- Microsoft Internet Explorer manual configuration
- Netscape manual configuration
- Lynx and Mosaic manual configuration
- Opera 2.12 manual configuration
- Netmanage Internet Chameleon WebSurfer manual configuration
- Partially Automatic Configuration
- Netscape automatic configuration
- Microsoft Internet Explorer
- Fully Automatically Configuring Browsers for WPAD
- Fully Automatically Configuring Browsers for WPAD with DHCP
- Redundant Proxy Auto-Configuration
- Proxy Auto-Configuration with URL Hashing
- How do I tell Squid to use a specific username for FTP urls?
- IE 5.0x crops trailing slashes from FTP URL's
- IE 6.0 SP1 fails when using authentication
- Squid Log Files
- squid.out
- cache.log
- useragent.log
- store.log
- hierarchy.log
- access.log
- access.log native format in detail
- sending access.log to syslog
- customizable access.log
- cache/log (Squid-1.x)
- swap.state (Squid-2.x)
- Which log files can I delete safely?
- How can I disable Squid's log files?
- What is the maximum size of access.log?
- My log files get very big!
- I want to use another tool to maintain the log files.
- Managing log files
- Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?
- What does ERR_LIFETIME_EXP mean?
- Retrieving "lost" files from the cache
- Can I use store.log to figure out if a response was cachable?
- Can I pump the squid access.log directly into a pipe?
- How do I see system level Squid statistics?
- How can I find the biggest objects in my cache?
- I want to restart Squid with a clean cache
- How can I proxy/cache Real Audio?
- How can I purge an object from my cache?
- How can I purge multiple objects from my cache?
- Using ICMP to Measure the Network
- Why are so few requests logged as TCP_IMS_MISS?
- How can I make Squid NOT cache some servers or URLs?
- How can I delete and recreate a cache directory?
- Why can't I run Squid as root?
- Can you tell me a good way to upgrade Squid with minimal downtime?
- Can Squid listen on more than one HTTP port?
- Can I make origin servers see the client's IP address when going through Squid?
- Why does Squid use so much memory!?
- How can I tell how much memory my Squid process is using?
- My Squid process grows without bounds.
- I set cache_mem to XX, but the process grows beyond that!
- How do I analyze memory usage from the cache manager output?
- The "Total memory accounted" value is less than the size of my Squid process.
- xmalloc: Unable to allocate 4096 bytes!
- fork: (12) Cannot allocate memory
- What can I do to reduce Squid's memory usage?
- Using an alternate malloc library
- How much memory do I need in my Squid server?
- Why can't my Squid process grow beyond a certain size?
- What is the cache manager?
- How do you set it up?
- Cache manager configuration for CERN httpd 3.0
- Cache manager configuration for Apache 1.x
- Cache manager configuration for Apache 2.x
- Cache manager configuration for Roxen 2.0 and later
- Cache manager access from squidclient
- Cache manager ACLs in squid.conf
- Why does it say I need a password and a URL?
- I want to shutdown the cache remotely. What's the password?
- How do I make the cache host default to my cache?
- What's the difference between Squid TCP connections and Squid UDP connections?
- It says the storage expiration will happen in 1970!
- What do the Meta Data entries mean?
- In the utilization section, what is Other?
- In the utilization section, why is the Transfer KB/sec column always zero?
- In the utilization section, what is the Object Count?
- In the utilization section, what is the Max/Current/Min KB?
- What is the I/O section about?
- What is the Objects section for?
- What is the VM Objects section for?
- What does AVG RTT mean?
- In the IP cache section, what's the difference between a hit, a negative hit and a miss?
- What do the IP cache contents mean anyway?
- What is the fqdncache and how is it different from the ipcache?
- What does "Page faults with physical i/o: 4897" mean?
- What does the IGNORED field mean in the 'cache server list'?
- ACL elements
- Access Lists
- How do I allow my clients to use the cache?
- How do I configure Squid not to cache a specific server?
- How do I implement an ACL ban list?
- How do I block specific users or groups from accessing my cache?
- Do you have a CGI program which lets users change their own proxy passwords?
- Is there a way to do ident lookups only for a certain host and compare the result with a userlist in squid.conf?
- Common Mistakes
- I set up my access controls, but they don't work! why?
- Proxy-authentication and neighbor caches
- Is there an easy way of banning all Destination addresses except one?
- Does anyone have a ban list of porn sites and such?
- Squid doesn't match my subdomains
- Why does Squid deny some port numbers?
- Does Squid support the use of a database such as mySQL for storing the ACL list?
- How can I allow a single address to access a specific URL?
- How can I allow some clients to use the cache at specific times?
- How can I allow some users to use the cache at specific times?
- Problems with IP ACL's that have complicated netmasks
- Can I set up ACL's based on MAC address rather than IP?
- Can I limit the number of connections from a client?
- I'm trying to deny ''foo.com'', but it's not working.
- I want to customize, or make my own error messages.
- I want to use local time zone in error messages.
- I want to put ACL parameters in an external file.
- I want to authorize users depending on their MS Windows group memberships
- Maximum length of an acl name
- Why am I getting "Proxy Access Denied?"
- I can't get ''local_domain'' to work; ''Squid'' is caching the objects from my local servers.
- Connection Refused when reaching a sibling
- Running out of filedescriptors
- What are these strange lines about removing objects?
- Can I change a Windows NT FTP server to list directories in Unix format?
- Why am I getting "Ignoring MISS from non-peer x.x.x.x?"
- DNS lookups for domain names with underscores (_) always fail.
- Why does Squid say: "Illegal character in hostname; underscores are not allowed"?
- Why am I getting access denied from a sibling cache?
- Cannot bind socket FD NN to *:8080 (125) Address already in use
- icpDetectClientClose: ERROR xxx.xxx.xxx.xxx: (32) Broken pipe
- icpDetectClientClose: FD 135, 255 unexpected bytes
- Does Squid work with NTLM Authentication?
- The ''default'' parent option isn't working!
- "Hotmail" complains about: Intrusion Logged. Access denied.
- My Squid becomes very slow after it has been running for some time.
- WARNING: Failed to start 'dnsserver'
- Sending bug reports to the Squid team
- Debugging Squid
- FATAL: ipcache_init: DNS name lookup tests failed
- FATAL: Failed to make swap directory /var/spool/cache: (13) Permission denied
- FATAL: Cannot open HTTP Port
- FATAL: All redirectors have exited!
- FATAL: file_map_allocate: Exceeded filemap limit
- FATAL: You've run out of swap file numbers.
- I am using up over 95% of the filemap bits?!!
- FATAL: Cannot open /usr/local/squid/logs/access.log: (13) Permission denied
- When using a username and password, I can not access some files.
- pingerOpen: icmp_sock: (13) Permission denied
- What is a forwarding loop?
- accept failure: (71) Protocol error
- storeSwapInFileOpened: ... Size mismatch
- Why do I get ''fwdDispatch: Cannot retrieve 'https://www.buy.com/corp/ordertracking.asp' ''
- Squid can't access URLs like http://3626046468/ab2/cybercards/moreinfo.html
- I get a lot of "URI has whitespace" error messages in my cache log, what should I do?
- commBind: Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address
- Unknown cache_dir type '/var/squid/cache'
- unrecognized: 'cache_dns_program /usr/local/squid/bin/dnsserver'
- Is ''dns_defnames'' broken in Squid-2.3 and later?
- What does "sslReadClient: FD 14: read failure: (104) Connection reset by peer" mean?
- What does ''Connection refused'' mean?
- squid: ERROR: no running copy
- FATAL: getgrnam failed to find groupid for effective group 'nogroup'
- "Unsupported Request Method and Protocol" for ''https'' URLs.
- Squid uses 100% CPU
- Webmin's ''cachemgr.cgi'' crashes the operating system
- Segment Violation at startup or upon first request
- urlParse: Illegal character in hostname 'proxy.mydomain.com:8080proxy.mydomain.com'
- Requests for international domain names do not work
- Why do I sometimes get "Zero Sized Reply"?
- Why do I get "The request or reply is too large" errors?
- Negative or very large numbers in Store Directory Statistics, or constant complaints about cache above limit
- Squid problems with Windows Update v5
- What are cachable objects?
- What is the ICP protocol?
- What is the ''dnsserver''?
- What is a cache hierarchy? What are parents and siblings?
- What is the Squid cache resolution algorithm?
- What features are Squid developers currently working on?
- Tell me more about Internet traffic workloads
- What are the tradeoffs of caching with the NLANR cache system?
- Where can I find out more about firewalls?
- What is the "Storage LRU Expiration Age?"
- What is "Failure Ratio at 1.01; Going into hit-only-mode for 5 minutes"?
- Does squid periodically re-read its configuration file?
- How does ''unlinkd'' work?
- What is an icon URL?
- Can I make my regular FTP clients use a Squid cache?
- Why is the select loop average time so high?
- How does Squid deal with Cookies?
- How does Squid decide when to refresh a cached object?
- What exactly is a ''deferred read''?
- Why is my cache's inbound traffic equal to the outbound traffic?
- How come some objects do not get cached?
- What does ''keep-alive ratio'' mean?
- How does Squid's cache replacement algorithm work?
- What are private and public keys?
- What is FORW_VIA_DB for?
- Does Squid send packets to port 7 (echo)? If so, why?
- What does "WARNING: Reply from unknown nameserver [a.b.c.d]" mean?
- How does Squid distribute cache files among the available directories?
- Why do I see negative byte hit ratio?
- What does "Disabling use of private keys" mean?
- What is a half-closed filedescriptor?
- What does --enable-heap-replacement do?
- Why is actual filesystem space used greater than what Squid thinks?
- How do ''positive_dns_ttl'' and ''negative_dns_ttl'' work?
- What does ''swapin MD5 mismatch'' mean?
- What does ''failed to unpack swapfile meta data'' mean?
- Why doesn't Squid make ''ident'' lookups in interception mode?
- dnsSubmit: queue overload, rejecting blah
- What are FTP passive connections?
- What is Multicast?
- How do I know if my network has multicast?
- Should I be using Multicast ICP?
- How do I configure Squid to send Multicast ICP queries?
- How do I know what Multicast TTL to use?
- How do I configure Squid to receive and respond to Multicast ICP?
- General advice
- FreeBSD
- Solaris
- FreeBSD
- OSF1/3.2
- BSD/OS
- Linux
- IRIX
- SCO-UNIX
- AIX
- What is a redirector?
- Why use a redirector?
- How does it work?
- Do you have any examples?
- Can I use the redirector to return HTTP redirect messages?
- FATAL: All redirectors have exited!
- Redirector interface is broken re IDENT values
- Redirections by origin servers
- What is a Cache Digest?
- How and why are they used?
- What is the theory behind Cache Digests?
- How is the size of the Cache Digest in Squid determined?
- What hash functions (and how many of them) does Squid use?
- How are objects added to the Cache Digest in Squid?
- Does Squid support deletions in Cache Digests? What are diffs/deltas?
- When and how often is the local digest built?
- How are Cache Digests transferred between peers?
- How and where are Cache Digests stored?
- How are the Cache Digest statistics in the Cache Manager to be interpreted?
- What are False Hits and how should they be handled?
- How can Cache Digest related activity be traced/debugged?
- What about ICP?
- Is there a Cache Digest Specification?
- Would it be possible to stagger the timings when cache_digests are retrieved from peers?
- Concepts of Interception Caching
- Requirements and methods for Interception Caching
- Steps involved in configuring Interception Caching
- Configuring Other Operating Systems
- Issues with HotMail
- Does Squid support SNMP?
- Enabling SNMP in Squid
- Configuring Squid
- How can I query the Squid SNMP Agent
- What can I use SNMP and Squid for?
- How can I use SNMP with Squid?
- Where can I get more information/discussion about Squid and SNMP?
- Monitoring Squid with MRTG
- Monitoring Squid with Cacti
- What are the new features in squid 2.X?
- How do I configure 'ssl_proxy' now?
- Adding a new cache disk
- How do I configure proxy authentication?
- Why does proxy-auth reject all users after upgrading from Squid-2.1 or earlier?
- Delay Pools
- Customizable Error Messages
- My squid.conf from version 1.1 doesn't work!
- What is the httpd-accelerator mode?
- How do I set it up?
- Domain based virtual host support
- Sending different requests to different backend web servers
- Running the web server on the same server
- Load balancing of backend servers
- When using an httpd-accelerator, the port number or host name for redirects or CGI-generated content is wrong
- Access to password protected content fails via the reverse proxy
- Clients
- Load Balancers
- HA Clusters
- Logfile Analysis
- Configuration Tools
- Squid add-ons
- Ident Servers
- Cacheability Validators
- What is DISKD?
- Does it perform better?
- How do I use it?
- FATAL: Unknown cache_dir type 'diskd'
- If I use DISKD, do I have to wipe out my current cache?
- How do I configure message queues?
- How do I configure shared memory?
- Sometimes shared memory and message queues aren't released when Squid exits.
- What are the Q1 and Q2 parameters?
- What is COSS?
- Does it perform better?
- How do I use it?
- If I use COSS, do I have to wipe out my current cache?
- What options are required for COSS?
- Are there any other configuration options for COSS?
- Examples
- How does Proxy Authentication work in Squid?
- How do I use authentication in access controls?
- How do I ask for authentication of an already authenticated user?
- Does Squid cache authentication lookups?
- Are passwords stored in clear text or encrypted?
- How do I use the Winbind authenticators?
- Can I use different authentication mechanisms together?
- Can I use more than one user-database?
- References
- Authentication in interception and transparent modes
- Other Resources
- Neighbor
- Regular Expression
- Open-access proxies
- Mail relaying
- Hijackable proxies
- Way Too Many Cache Misses
- Pruning the Cache Down
- Changing the Cache Levels
What is Squid?
Squid is a high-performance proxy caching server for web clients, supporting FTP, gopher, and HTTP data objects. Squid handles all requests in a single, non-blocking, I/O-driven process.
Squid keeps meta data and especially hot objects cached in RAM, caches DNS lookups, supports non-blocking DNS lookups, and implements negative caching of failed requests.
Squid supports SSL, extensive access controls, and full request logging. By using the lightweight Internet Cache Protocol, Squid caches can be arranged in a hierarchy or mesh for additional bandwidth savings.
Squid consists of a main server program squid, an optional Domain Name System lookup program dnsserver (Squid nowadays implements the DNS protocol on its own by default), some optional programs for rewriting requests and performing authentication, and some management and client tools.
Squid was originally derived from the ARPA-funded Harvest project. Since then it has gone through many changes and gained many new features.
What is Internet object caching?
Internet object caching is a way to store requested Internet objects (i.e., data available via the HTTP, FTP, and gopher protocols) on a system closer to the requesting site than to the source. Web browsers can then use the local Squid cache as a proxy HTTP server, reducing access time as well as bandwidth consumption.
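Many proxy-aware command-line clients can be pointed at such a cache through the standard proxy environment variables. A minimal sketch (the host name is a placeholder for illustration; 3128 is Squid's default HTTP port):

```shell
# Route proxy-aware clients (curl, wget, lynx, ...) through a Squid cache.
# "squid.example.com" is a hypothetical cache host; substitute your own.
export http_proxy="http://squid.example.com:3128/"
export ftp_proxy="http://squid.example.com:3128/"
```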
Why is it called Squid?
Harris' Lament says, "All the good ones are taken."
We needed to distinguish this new version from the Harvest cache software. Squid was the code name for initial development, and it stuck.
What is the latest version of Squid?
At the time of writing (August 2006), [Squid-2.6] is the stable version and [Squid-3.0] is under development.
Please see the Squid home page for the most recent versions.
Who is responsible for Squid?
Squid is the result of efforts by numerous individuals from the Internet community. The core team and main contributors list is at WhoWeAre; a list of our excellent contributors can be seen in the CONTRIBUTORS file.
Where can I get Squid?
You can download Squid via FTP from one of the many worldwide mirror sites or the primary FTP site.
Many sushi bars also have Squid.
What Operating Systems does Squid support?
The software is designed to operate on any modern system, and is known to work on at least the following platforms:
- Linux
- FreeBSD
- NetBSD
- OpenBSD
- BSDI
- Mac OS/X
- OSF/Digital Unix/Tru64
- IRIX
- SunOS/Solaris
- NeXTStep
- SCO Unix
- AIX
- HP-UX
- Microsoft Windows Cygwin and MinGW
- OS/2
If you encounter any platform-specific problems, please let us know by registering an entry in our bug database. If you're wondering which OS is best for running Squid, see BestOsForSquid.
Does Squid run on Windows?
Recent versions of Squid will compile and run on Windows NT and later incarnations with the Cygwin / MinGW packages.
GuidoSerassio maintains the native Windows port of Squid (built using the Microsoft toolchain) and is actively working on having the needed changes integrated into the standard Squid distribution. His effort is partially based on an earlier Windows NT port by Romeo Anghelache.
UPDATE: starting from 2.6.STABLE4, Windows MinGW support is available in the standard Squid distribution.
What Squid mailing lists are available?
- <squid-users AT squid-cache DOT org>
hosts general discussions about the Squid cache software. Subscribe via <squid-users-subscribe AT squid-cache DOT org>. Previous messages are available for browsing at the Squid Users Archive, and also at theaimsgroup.com and MarkMail.
squid-users-digest: digested (daily) version of above. Subscribe via <squid-users-digest-subscribe AT squid-cache DOT org>.
- <squid-announce AT squid-cache DOT org>
is a receive-only list for announcements of new versions. Subscribe via <squid-announce-subscribe AT squid-cache DOT org>.
- <squid-bugs AT squid-cache DOT org> is meant for sending us bug reports. Bug reports received here are given priority over those mentioned on squid-users.
- <squid AT squid-cache DOT org>: A closed list for sending us feedback and ideas.
- <squid-faq AT squid-cache DOT org>: A closed list for sending us feedback, updates, and additions to the Squid FAQ.
I can't figure out how to unsubscribe from your mailing list.
All of our mailing lists have "-subscribe" and "-unsubscribe" addresses that you must use for subscribe and unsubscribe requests. To unsubscribe from the squid-users list, you send a message to <squid-users-unsubscribe AT squid-cache DOT org>.
What other Squid-related documentation is available?
The Squid home page for information on the Squid software
Squid: The Definitive Guide written by Duane Wessels and published by O'Reilly and Associates January 2004.
The IRCache Mesh gives information on our operational mesh of caches.
The Squid FAQ (uh, you're reading it).
Squid documentation in German, Turkish, Italian, and two translations in Brazilian Portuguese.
Squid Programmers Guide. Yes, it's extremely incomplete. I assure you this is the most recent version.
RFC 2186 ICPv2 -- Protocol
RFC 2187 ICPv2 -- Application
Does Squid support SSL/HTTPS/TLS?
As of version 2.5, Squid can terminate SSL connections. This is perhaps only useful in a surrogate (HTTP accelerator) configuration. You must run configure with --enable-ssl. See https_port in squid.conf for more information.
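A minimal sketch of such a surrogate setup in squid.conf might look like the following (the certificate paths and site name are assumptions for illustration; this uses Squid-2.5-era directives):

```
https_port 443 cert=/usr/local/squid/etc/cert.pem key=/usr/local/squid/etc/key.pem
httpd_accel_host www.example.com
httpd_accel_port 80
```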
Squid also supports these encrypted protocols by "tunneling" traffic between clients and servers. In this case, Squid can relay the encrypted bits between a client and a server.
Normally, when your browser comes across an https URL, it does one of two things:
- The browser opens an SSL connection directly to the origin server.
- The browser tunnels the request through Squid with the CONNECT request method.
The CONNECT method is a way to tunnel any kind of connection through an HTTP proxy. The proxy doesn't understand or interpret the contents. It just passes bytes back and forth between the client and server. For the gory details on tunnelling and the CONNECT method, please see RFC 2817 and Tunneling TCP based protocols through Web proxy servers (expired).
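On the wire, the tunnel setup looks roughly like this (the host name is illustrative): the client sends a CONNECT request, the proxy opens a TCP connection to the origin server, replies with a 200 status, and from then on simply relays bytes in both directions:

```
CONNECT www.example.com:443 HTTP/1.1
Host: www.example.com:443

HTTP/1.0 200 Connection established

...encrypted traffic relayed verbatim between client and server...
```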
What's the legal status of Squid?
Squid is copyrighted by the University of California San Diego. Squid uses some code developed by others.
Squid is Free Software, licensed under the terms of the GNU General Public License.
Can I pay someone for Squid support?
Yes. Please see Squid Support Services. You can also donate money or equipment to members of the squid core team.
Squid FAQ contributors
The following people have made contributions to this document:
Dodjie Nava, Jonathan Larmour, Cord Beermann, Tony Sterrett, Gerard Hynes, Katayama, Takeo, Duane Wessels, K Claffy, Paul Southworth, Oskar Pearson, Ong Beng Hui, Torsten Sturm, James R Grinter, Rodney van den Oever, Kolics Bertold, Carson Gaspar, Michael O'Reilly, Hume Smith, Richard Ayres, John Saunders, Miquel van Smoorenburg, David J N Begley, Kevin Sartorelli, Andreas Doering, Mark Visser, tom minchin, Jens-S. Vöckler, Andre Albsmeier, Doug Nazar, Henrik Nordstrom, Mark Reynolds, Arjan de Vet, Peter Wemm, John Line, Jason Armistead, Chris Tilbury, Jeff Madison, Mike Batchelor, Bill Bogstad, Radu Greab, F.J. Bosscha, Brian Feeny, Martin Lyons, David Luyer, Chris Foote, Jens Elkner, Simon White, Jerry Murdock, Gerard Eviston, Rob Poe, FrancescoChemolli, ReubenFarrelly
About This Document
The Squid FAQ is copyrighted (2006) by The Squid Core Team.
This FAQ was maintained for a long time as an XML Docbook file. It was converted to a Wiki in March 2006. The wiki is now the authoritative version.
Want to contribute?
We always welcome help keeping the Squid FAQ up-to-date. If you would like to help out, please register with this Wiki and type away. Please also send a note to the wiki operator <wiki AT kinkie DOT it> to inform him of your changes.
Compiling Squid
Contents
Which file do I download to get Squid?
You must download a source archive file of the form squid-x.y.tar.gz or squid-x.y.tar.bz2 (e.g., squid-2.5.STABLE14.tar.bz2). We recommend you first try one of our mirror sites.
Alternatively, the main Squid WWW site www.squid-cache.org, and FTP site ftp.squid-cache.org have these files.
Context diffs are available for upgrading to new versions. These can be applied with the patch program (available from the GNU FTP site or your distribution).
Do you have pre-compiled binaries available?
The squid core team members do not have the resources to make pre-compiled binaries available. Instead, we invest effort into making the source code very portable. Some contributors have made binary packages available. Please see our Platforms Page.
The SGI Freeware site has pre-compiled packages for SGI IRIX.
Squid binaries for FreeBSD on Alpha and Intel.
Squid binaries for NetBSD on everything
Gurkan Sengun has some Sparc/Solaris packages available.
Squid binaries for Windows.
How do I compile Squid?
You must run the configure script yourself before running make. We suggest that you first invoke ./configure --help and make a note of the configure options you need in order to support the features you intend to use. Do not compile in features you do not think you will need.
% tar xzf squid-2.5.RELEASExy.tar.gz
% cd squid-2.5.RELEASExy
% ./configure --with-MYOPTION --with-MYOPTION2 etc
% make
...and finally install:
% make install
By default, Squid installs into /usr/local/squid. If you wish to install somewhere else, see the --prefix option for configure.
What kind of compiler do I need?
To compile Squid, you will need an ANSI C compiler. Almost all modern Unix systems come with pre-installed compilers which work just fine. The old SunOS compilers do not have support for ANSI C, and the Sun compiler for Solaris is a product which must be purchased separately.
If you are uncertain about your system's C compiler, The GNU C compiler is widely available and supplied in almost all operating systems. It is also well tested with Squid. If your OS does not come with GCC you may download it from the GNU FTP site. In addition to gcc, you may also want or need to install the binutils package.
What else do I need to compile Squid?
You will need Perl installed on your system.
How do I apply a patch or a diff?
You need the patch program. You should probably duplicate the entire directory structure before applying the patch. For example, if you are upgrading from squid-2.5.STABLE13 to 2.5.STABLE14, you would run these commands:
cp -rl squid-2.5.STABLE13 squid-2.5.STABLE14
cd squid-2.5.STABLE14
zcat /tmp/squid-2.5.STABLE13-STABLE14.diff.gz | patch -p1
After the patch has been applied, you must rebuild Squid from the very beginning, i.e.:
make distclean
./configure [--option --option...]
make
make install
If your patch program seems to complain or refuses to work, you should get a more recent version, from the GNU FTP site, for example.
Ideally you should use the patch command which comes with your OS.
configure options
The configure script can take numerous options. The most useful is --prefix to install it in a different directory. The default installation directory is /usr/local/squid/. To change the default, you could do:
% cd squid-x.y.z
% ./configure --prefix=/some/other/directory/squid
Type
% ./configure --help
to see all available options. You will need to specify some of these options to enable or disable certain features. Some options which are used often include:
--prefix=PREFIX install architecture-independent files in PREFIX [/usr/local/squid]
--enable-dlmalloc[=LIB] Compile & use the malloc package by Doug Lea
--enable-gnuregex Compile GNUregex
--enable-splaytree Use SPLAY trees to store ACL lists
--enable-xmalloc-debug Do some simple malloc debugging
--enable-xmalloc-debug-trace Detailed trace of memory allocations
--enable-xmalloc-statistics Show malloc statistics in status page
--enable-carp Enable CARP support
--enable-async-io Do ASYNC disk I/O using threads
--enable-icmp Enable ICMP pinging
--enable-delay-pools Enable delay pools to limit bandwidth usage
--enable-mem-gen-trace Do trace of memory stuff
--enable-useragent-log Enable logging of User-Agent header
--enable-kill-parent-hack Kill parent on shutdown
--enable-snmp Enable SNMP monitoring
--enable-cachemgr-hostname[=hostname] Make cachemgr.cgi default to this host
--enable-arp-acl Enable use of ARP ACL lists (ether address)
--enable-htcp Enable HTCP protocol
--enable-forw-via-db Enable Forw/Via database
--enable-cache-digests Use Cache Digests, see http://www.squid-cache.org/Doc/FAQ/FAQ-16.html
--enable-err-language=lang Select language for Error pages (see errors dir)
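For example, a cache that should support SNMP monitoring, delay pools and cache digests might be built like this (the option set is only an illustration; enable just the features you need):

```
% ./configure --prefix=/usr/local/squid --enable-snmp --enable-delay-pools --enable-cache-digests
% make
% make install
```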
Building Squid on ...
BSD/OS or BSDI
Known Problem:
cache_cf.c: In function `parseConfigFile':
cache_cf.c:1353: yacc stack overflow before `token'
...
You may need to upgrade your gcc installation to a more recent version. Check your gcc version with
gcc -v
If it is 2.7.2 or earlier, you should consider upgrading: gcc 2.7.2 is very old and no longer widely supported.
Cygwin (Windows)
In order to compile Squid, you need to have Cygwin fully installed.
WCCP is not available on Windows, so the following configure options are needed to disable it:
--disable-wccp --disable-wccpv2
By default, Squid installs into /usr/local/squid. If you wish to install somewhere else, see the --prefix option for configure.
Now add a new Cygwin user (see the Cygwin user guide) and map it to SYSTEM, or create a new NT user with a matching Cygwin user; this account becomes the user Squid runs as.
Read the squid FAQ on permissions if you are using CYGWIN=ntsec.
Afterwards, run squid -z to create the cache directories. If that succeeds, try squid -N -D -d1; Squid should start. Check that there are no errors. If everything looks good, try browsing through Squid.
Now, configure cygrunsrv to run Squid as a service as the chosen username. You may need to check permissions here.
Debian
From 2.6.STABLE14 onwards, Squid should compile easily on this platform.
There is just one known problem. The Linux system layout differs markedly from the Squid defaults. The following ./configure options are needed to install Squid into the Linux structure properly:
--prefix=/usr
--localstatedir=/var
--libexecdir=${prefix}/lib/squid
--srcdir=.
--datadir=${prefix}/share/squid
--sysconfdir=/etc/squid
From Squid 3.0 the default user can also be set. The Debian package default is:
--with-default-user=proxy
The following patch also needs to be applied since the /var/logs/ directory for logs has no configure option.
--- src/Makefile.am	2007-09-17 14:22:33.000000000 +1200
+++ src/Makefile.am-new	2007-09-12 19:31:53.000000000 +1200
@@ -985,7 +985,7 @@
 DEFAULT_CONFIG_FILE = $(sysconfdir)/squid.conf
 DEFAULT_MIME_TABLE = $(sysconfdir)/mime.conf
 DEFAULT_DNSSERVER = $(libexecdir)/`echo dnsserver | sed '$(transform);s/$$/$(EXEEXT)/'`
-DEFAULT_LOG_PREFIX = $(localstatedir)/logs
+DEFAULT_LOG_PREFIX = $(localstatedir)/log
 DEFAULT_CACHE_LOG = $(DEFAULT_LOG_PREFIX)/cache.log
 DEFAULT_ACCESS_LOG = $(DEFAULT_LOG_PREFIX)/access.log
 DEFAULT_STORE_LOG = $(DEFAULT_LOG_PREFIX)/store.log
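Assuming the diff above has been saved to a file (the file and directory names here are illustrative), it can be applied from the top of the unpacked source tree before running ./configure:

```
% cd squid-3.0.STABLE1
% patch -p0 < /tmp/squid-debian-logdir.diff
```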
FreeBSD, NetBSD, OpenBSD
Squid is developed on FreeBSD. The general build instructions above should be all you need.
MinGW (Windows)
In order to compile squid using the MinGW environment, the packages MSYS, MinGW and msysDTK must be installed. Some additional libraries and tools must be downloaded separately:
libcrypt: MinGW packages repository
db-1.85: TinyCOBOL download area
uudecode: Native Win32 ports of some GNU utilities
Unpack the source archive as usual and run configure.
The following are the recommended minimal options for Windows:
--prefix=c:/squid --disable-wccp --disable-wccpv2 --enable-win32-service --enable-default-hostsfile=none
Then run make and install as usual.
Squid will install into c:\squid. If you wish to install somewhere else, change the --prefix option for configure.
Next, run squid -z. If that succeeds, try squid -N -D -d1; Squid should start. Check that there are no errors. If everything looks good, try browsing through Squid.
Now, to run Squid as a Windows system service, run squid -n; this will create a service named "Squid" with automatic startup. To start it, run net start squid from a command prompt or use the Services administrative applet.
Always check the provided release notes for any version specific detail.
OS/2
by Doug Nazar (<nazard AT man-assoc DOT on DOT ca>).
In order to compile Squid, you need to have a reasonable facsimile of a Unix system installed. This includes bash, make, sed, emx, various file utilities and a few more. I've set up a TVFS drive that matches a Unix file system, but this probably isn't strictly necessary.
I made a few modifications to the pristine EMX 0.9d install.
- added defines for strcasecmp() & strncasecmp() to string.h
- changed all occurrences of time_t to signed long instead of unsigned long
- hacked ld.exe
  - to search for both xxxx.a and libxxxx.a
  - to produce the correct filename when using the -Zexe option
You will need to run scripts/convert.configure.to.os2 (in the Squid source distribution) to modify the configure script so that it can search for the various programs.
Next, you need to set a few environment variables (see EMX docs for meaning):
export EMXOPT="-h256 -c"
export LDFLAGS="-Zexe -Zbin -s"
Now you are ready to configure, make, and install Squid.
Now, don't forget to set EMXOPT before running squid each time. I recommend using the -Y and -N options.
Solaris
Many Squid installations run well on Solaris. There is just one known problem encountered when building.
The following error occurs on Solaris systems using gcc when the Solaris C compiler is not installed:
/usr/bin/rm -f libmiscutil.a
/usr/bin/false r libmiscutil.a rfc1123.o rfc1738.o util.o ...
make[1]: *** [libmiscutil.a] Error 255
make[1]: Leaving directory `/tmp/squid-1.1.11/lib'
make: *** [all] Error 1
Note the /usr/bin/false on the second line. This is supposed to be a path to the ar program. If configure cannot find ar on your system, then it substitutes false.
To fix this you either need to:
Add /usr/ccs/bin to your PATH. This is where the ar command should be. You need to install SUNWbtool if ar is not there. Otherwise,
Install the binutils package from the GNU FTP site. This package includes programs such as ar, as, and ld.
Ubuntu
From 2.6 STABLE 14 Squid should compile easily on this platform. See the Debian build for details on the remaining known problem(s).
Other Platforms
Please let us know about other platforms on which you have built Squid, whether successfully or not.
Please check the page of platforms on which Squid is known to compile. Your problem might be listed there together with a solution. If it isn't listed there, mail us what you are trying, your Squid version, and the problems you encounter.
I see a lot warnings while compiling Squid.
Warnings are usually not a big concern, and can be common with software designed to operate on multiple platforms. The Squid developers do wish to make Squid build without errors or warnings. If you feel like fixing compile-time warnings, please do so and send us the patches.
undefined reference to __inet_ntoa
by Kevin Sartorelli (<SarKev AT topnz DOT ac DOT nz>) and Andreas Doering (<doering AT usf DOT uni-kassel DOT de>).
Probably you've recently installed bind 8.x. There is a mismatch between the header files and DNS library that Squid has found. There are a couple of things you can try.
First, try adding -lbind to XTRA_LIBS in src/Makefile. If -lresolv is already there, remove it.
If that doesn't seem to work, edit your arpa/inet.h file and comment out the following:
#define inet_addr __inet_addr
#define inet_aton __inet_aton
#define inet_lnaof __inet_lnaof
#define inet_makeaddr __inet_makeaddr
#define inet_neta __inet_neta
#define inet_netof __inet_netof
#define inet_network __inet_network
#define inet_net_ntop __inet_net_ntop
#define inet_net_pton __inet_net_pton
#define inet_ntoa __inet_ntoa
#define inet_pton __inet_pton
#define inet_ntop __inet_ntop
#define inet_nsap_addr __inet_nsap_addr
#define inet_nsap_ntoa __inet_nsap_ntoa
Contents
- How big of a system do I need to run Squid?
- How do I install Squid?
- How do I start Squid?
- How do I start Squid automatically when the system boots?
- How do I tell if Squid is running?
- squid command line options
- How do I see how Squid works?
- Can Squid benefit from SMP systems?
- Is it okay to use separate drives for Squid?
- Is it okay to use RAID on Squid?
How big of a system do I need to run Squid?
There are no hard-and-fast rules. The most important resource for Squid is physical memory, so put as much in your Squid box as you can. Your processor does not need to be ultra-fast. We recommend buying whatever is economical at the time.
Your disk system will be the major bottleneck, so fast disks are important for high-volume caches. SCSI disks generally perform better than ATA, if you can afford them. Serial ATA (SATA) performs somewhere between the two. Your system disk and logfile disk can probably be IDE without losing any cache performance.
The ratio of memory-to-disk can be important. We recommend that you have at least 32 MB of RAM for each GB of disk space that you plan to use for caching.
How do I install Squid?
After ../CompilingSquid, you can install it with this simple command:
% make install
If you have enabled the pinger then you will also want to type
% su
# make install-pinger
After installing, you will want to read ../ConfiguringSquid to edit and customize Squid to run the way you want it to.
How do I start Squid?
First you need to check your Squid configuration. The Squid configuration can be found in /usr/local/squid/etc/squid.conf and includes documentation on all directives.
In the Squid distribution there is a small QUICKSTART guide indicating which directives you need to look closer at and why. At an absolute minimum you need to change the http_access configuration to allow access from your clients.
To verify your configuration file you can use the -k parse option
% /usr/local/squid/sbin/squid -k parse
If this outputs any errors then these are syntax errors or other fatal misconfigurations and need to be corrected before you continue. If it is silent and immediately gives back the command prompt then your squid.conf is syntactically correct and could be understood by Squid.
After you've finished editing the configuration file, you can start Squid for the first time. The procedure depends a little bit on which version you are using.
First, you must create the swap directories. Do this by running Squid with the -z option:
% /usr/local/squid/sbin/squid -z
Note: If you run Squid as root then you may need to first create /usr/local/squid/var/logs and your cache_dir directories, and assign ownership of these to the cache_effective_user configured in your squid.conf.
Once the creation of the cache directories completes, you can start Squid and try it out. Probably the best thing to do is run it from your terminal and watch the debugging output. Use this command:
% /usr/local/squid/sbin/squid -NCd1
If everything is working okay, you will see the line:
Ready to serve requests.
If you want to run squid in the background, as a daemon process, just leave off all options:
% /usr/local/squid/sbin/squid
Note: Depending on which http_port you select you may need to start squid as root (http_port < 1024).

Note: In Squid-2.4 and earlier Squid was installed in bin by default, not sbin.
How do I start Squid automatically when the system boots?
by hand
Squid-2 has a restart feature built in. This greatly simplifies starting Squid and means that you don't need to use RunCache or inittab. At the minimum, you only need to enter the pathname to the Squid executable. For example:
/usr/local/squid/sbin/squid
Squid will automatically background itself and then spawn a child process. In your syslog messages file, you should see something like this:
Sep 23 23:55:58 kitty squid[14616]: Squid Parent: child process 14617 started
That means that process ID 14616 is the parent process which monitors the child process (pid 14617). The child process is the one that does all of the work. The parent process just waits for the child process to exit. If the child process exits unexpectedly, the parent will automatically start another child process. In that case, syslog shows:
Sep 23 23:56:02 kitty squid[14616]: Squid Parent: child process 14617 exited with status 1
Sep 23 23:56:05 kitty squid[14616]: Squid Parent: child process 14619 started
If there is some problem, and Squid can not start, the parent process will give up after a while. Your syslog will show:
Sep 23 23:56:12 kitty squid[14616]: Exiting due to repeated, frequent failures
When this happens you should check your syslog messages and cache.log file for error messages.
When you look at a process (ps command) listing, you'll see two squid processes:
24353  ??  Ss     0:00.00 /usr/local/squid/bin/squid
24354  ??  R      0:03.39 (squid) (squid)
The first is the parent process, and the child process is the one called "(squid)". Note that if you accidentally kill the parent process, the child process will not notice.
If you want to run Squid from your terminal and prevent it from backgrounding and spawning a child process, use the -N command line option.
/usr/local/squid/bin/squid -N
from inittab
On systems which have an /etc/inittab file (Digital Unix, Solaris, IRIX, HP-UX, Linux), you can add a line like this:
sq:3:respawn:/usr/local/squid/sbin/squid.sh < /dev/null >> /tmp/squid.log 2>&1
We recommend using a squid.sh shell script, but you could instead call Squid directly with the -N option and other options you may require. A sample squid.sh script is shown below:
#!/bin/sh
C=/usr/local/squid
PATH=/usr/bin:$C/bin
TZ=PST8PDT
export PATH TZ

# User to notify on restarts
notify="root"

# Squid command line options
opts=""

cd $C
umask 022
sleep 10
while [ -f /var/run/nosquid ]; do
        sleep 1
done
/usr/bin/tail -20 $C/logs/cache.log \
        | Mail -s "Squid restart on `hostname` at `date`" $notify
exec bin/squid -N $opts
from rc.local
On BSD-ish systems, you will need to start Squid from the "rc" files, usually /etc/rc.local. For example:
if [ -f /usr/local/squid/sbin/squid ]; then
        echo -n ' Squid'
        /usr/local/squid/sbin/squid
fi
from init.d
Squid ships with an init.d-type startup script in contrib/squid.rc which works on most init.d-type systems. Or you can write your own, using any normal init.d script found on your system as a template and adding the start/stop fragments shown below.
Start:
/usr/local/squid/sbin/squid
Stop:
/usr/local/squid/sbin/squid -k shutdown
n=120
while /usr/local/squid/sbin/squid -k check && [ $n -gt 0 ]; do
        sleep 1
        echo -n .
        n=`expr $n - 1`
done
with daemontools
Create squid service directory, and the log directory (if it does not exist yet).
mkdir -p /usr/local/squid/supervise/log /var/log/squid
chown squid /var/log/squid
Then, change to the service directory,
cd /usr/local/squid/supervise
and create 2 executable scripts: run
#!/bin/sh
rm -f /var/run/squid/squid.pid
exec /usr/local/squid/sbin/squid -N 2>&1
and log/run.
#!/bin/sh
exec /usr/local/bin/multilog t /var/log/squid
Finally, start the squid service by linking it into svscan monitored area.
cd /service
ln -s /usr/local/squid/supervise squid
Squid should start within 5 seconds.
How do I tell if Squid is running?
You can use the squidclient program:
% squidclient http://www.netscape.com/ > test
There are other command-line HTTP client programs available as well. Two that you may find useful are wget and echoping.
Another way is to use Squid itself to see if it can signal a running Squid process:
% squid -k check
And then check the shell's exit status variable.
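A minimal sketch of using that exit status in a script (the path assumes the default /usr/local/squid install prefix; adjust it for your system):

```shell
#!/bin/sh
# Report whether Squid is running, based on the exit code of "squid -k check".
SQUID=/usr/local/squid/sbin/squid
if "$SQUID" -k check 2>/dev/null; then
    status="running"
else
    status="not running"
fi
echo "Squid is $status"
```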
Also, check the log files, most importantly the access.log and cache.log files.
squid command line options
These are the command line options for Squid-2:
-a Specify an alternate port number for incoming HTTP requests. Useful for testing a configuration file on a non-standard port.
-d Debugging level for "stderr" messages. If you use this option, then debugging messages up to the specified level will also be written to stderr.
-f Specify an alternate squid.conf file instead of the pathname compiled into the executable.
-h Prints the usage and help message.
-k reconfigure Sends a HUP signal, which causes Squid to re-read its configuration files.
-k rotate Sends an USR1 signal, which causes Squid to rotate its log files. Note, if logfile_rotate is set to zero, Squid still closes and re-opens all log files.
-k shutdown Sends a TERM signal, which causes Squid to wait briefly for current connections to finish and then exit. The amount of time to wait is specified with shutdown_lifetime.
-k interrupt Sends an INT signal, which causes Squid to shutdown immediately, without waiting for current connections.
-k kill Sends a KILL signal, which causes the Squid process to exit immediately, without closing any connections or log files. Use this only as a last resort.
-k debug Sends an USR2 signal, which causes Squid to generate full debugging messages until the next USR2 signal is received. Obviously very useful for debugging problems.
-k check Sends a "ZERO" signal to the Squid process. This simply checks whether or not the process is actually running.
-s Send debugging (level 0 only) message to syslog.
-u Specify an alternate port number for ICP messages. Useful for testing a configuration file on a non-standard port.
-v Prints the Squid version.
-z Creates disk swap directories. You must use this option when installing Squid for the first time, or when you add or modify the cache_dir configuration.
-D Do not make initial DNS tests. Normally, Squid looks up some well-known DNS hostnames to ensure that your DNS name resolution service is working properly.
-F If the swap.state logs are clean, then the cache is rebuilt in the "foreground" before any requests are served. This will decrease the time required to rebuild the cache, but HTTP requests will not be satisfied during this time.
-N Do not automatically become a background daemon process.
-R Do not set the SO_REUSEADDR option on sockets.
-V Enable virtual host support for the httpd-accelerator mode. This is identical to writing httpd_accel_host virtual in the config file.
-X Enable full debugging while parsing the config file.
-Y Return ICP_OP_MISS_NOFETCH instead of ICP_OP_MISS while the swap.state file is being read. If your cache has mostly child caches which use ICP, this will allow your cache to rebuild faster.
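For example, several of the options above can be combined to try out a new configuration file on a spare port, in the foreground with debug output, without touching a running daemon (the file name and port number are illustrative):

```
% /usr/local/squid/sbin/squid -f /tmp/test-squid.conf -a 8081 -N -d 1
```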
How do I see how Squid works?
Check the cache.log file in your logs directory. It logs interesting (and boring) things as a part of its normal operation.
Install and use the ../CacheManager.
Can Squid benefit from SMP systems?
Squid is a single process application and can not make use of SMP. If you want to make Squid benefit from a SMP system you will need to run multiple instances of Squid and find a way to distribute your users on the different Squid instances just as if you had multiple Squid boxes.
Having two CPUs is indeed nice for running other CPU intensive tasks on the same server as the proxy, such as if you have a lot of logs and need to run various statistics collections during peak hours.
The authentication and group helpers barely use any CPU and do not benefit from a dual-CPU configuration.
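One way to sketch such a multi-instance setup is to give each instance its own configuration file; the ports, PID file paths, and cache directories below are illustrative assumptions, and each instance must have its own:

```
# instance-a.conf
http_port 3128
pid_filename /var/run/squid-a.pid
cache_dir ufs /cache-a 7000 16 256

# instance-b.conf
http_port 3129
pid_filename /var/run/squid-b.pid
cache_dir ufs /cache-b 7000 16 256
```

Start each instance with squid -f pointing at its own file, then distribute clients between the two ports (for example via a PAC file or a load balancer).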
Is it okay to use separate drives for Squid?
Yes. Running Squid on drives separate from the ones your OS runs on is often a very good idea.
Generally seek time is what you want to optimize for Squid, or more precisely the total amount of seeks/s your system can sustain. This is why it is better to have your cache_dir spread over multiple smaller disks than one huge drive (especially with SCSI).
If your system is very I/O bound, you will want to have both your OS and log directories running on separate drives.
Is it okay to use RAID on Squid?
We generally recommend you do not run RAID on the Squid disks especially those on which your cache content is stored.
If you must use RAID:
RAID1 suffers a very slight degradation in write performance but slight improvement in read performance, and you may find it better use of resources to run two separate drives and have double the disk cache space. Cache data is not usually considered critical so generally there is little point in running squid on a RAID1 array. However as pointed out above it may make sense to run your O/S in a RAID-1 configuration.
RAID0 (striping) with Squid only gives you the drawback that if you lose one of the drives the whole stripe set is lost. There is no benefit in performance, as Squid already distributes the load on the drives quite nicely. It is better to configure multiple separate drives with a separate cache_dir entry for each one than one RAID0 partition.
Squid is the worst-case application for RAID5, whether hardware or software, and will absolutely kill its performance. Once the cache has been filled, Squid uses a lot of small random writes, which is the worst-case workload for RAID5, effectively reducing write speed to little more than that of a single drive.
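For example, rather than building one RAID0 array over three drives, mount each drive separately and give each one its own cache_dir line (the mount points and sizes below are illustrative):

```
cache_dir ufs /cache1 7000 16 256
cache_dir ufs /cache2 7000 16 256
cache_dir ufs /cache3 7000 16 256
```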
Configuring Squid
Contents
- Configuring Squid
- How do I configure Squid without re-compiling it?
- What does the squid.conf file do?
- Do you have a squid.conf example?
- How do I join a cache hierarchy?
- How do I join NLANR's cache hierarchy?
- Why should I want to join NLANR's cache hierarchy?
- How do I register my cache with NLANR's registration service?
- How do I find other caches close to me and arrange parent/child/sibling relationships with them?
- My cache registration is not appearing in the Tracker database.
- How do I configure Squid to work behind a firewall?
- How do I configure Squid to forward all requests to another proxy?
- I have "dnsserver" processes that aren't being used, should I lower the number in "squid.conf"?
- My ''dnsserver'' average/median service time seems high, how can I reduce it?
- How can I easily change the default HTTP port?
- Is it possible to control how big each ''cache_dir'' is?
- What ''cache_dir'' size should I use?
- I'm adding a new cache_dir. Will I lose my cache?
- Squid and http-gw from the TIS toolkit.
- What is "HTTP_X_FORWARDED_FOR"? Why does squid provide it to WWW servers, and how can I stop it?
- Can Squid anonymize HTTP requests?
- Can I make Squid go direct for some sites?
- Can I make Squid proxy only, without caching anything?
- Can I prevent users from downloading large files?
- How do I enable IPv6?
How do I configure Squid without re-compiling it?
The squid.conf file. By default, this file is located at /usr/local/squid/etc/squid.conf.
Also, a QUICKSTART guide has been included with the source distribution. Please see the directory where you unpacked the source archive.
What does the squid.conf file do?
The squid.conf file defines the configuration for Squid. The configuration includes (but is not limited to) the HTTP port number, the ICP request port number, incoming and outgoing requests, information about firewall access, and various timeout settings.
Do you have a squid.conf example?
Yes, after you make install, a sample squid.conf file will exist in the etc directory under the Squid installation directory.
The sample squid.conf file contains comments explaining each option.
From 2.6 the Squid developers also provide a set of Configuration Guides online. They list all the options each version of Squid can accept in its squid.conf file, including the current development test releases.
Squid 2.6 Configuration Guide
Squid 2.7 Configuration Guide
Squid 3.0 Configuration Guide
Squid 2-HEAD Configuration Guide
Squid 3-HEAD Configuration Guide
How do I join a cache hierarchy?
To place your cache in a hierarchy, use the cache_peer directive in squid.conf to specify the parent and sibling nodes.
For example, the following squid.conf file on childcache.example.com configures its cache to retrieve data from one parent cache and two sibling caches:
# squid.conf - On the host: childcache.example.com
#
#  Format is: hostname  type  http_port  udp_port
#
cache_peer parentcache.example.com parent  3128 3130
cache_peer childcache2.example.com sibling 3128 3130
cache_peer childcache3.example.com sibling 3128 3130
The cache_peer_domain directive allows you to specify that certain caches are siblings or parents for certain domains:
# squid.conf - On the host: sv.cache.nlanr.net
#
#  Format is: hostname  type  http_port  udp_port
#
cache_peer electraglide.geog.unsw.edu.au parent  3128 3130
cache_peer cache1.nzgate.net.nz          parent  3128 3130
cache_peer pb.cache.nlanr.net            parent  3128 3130
cache_peer it.cache.nlanr.net            parent  3128 3130
cache_peer sd.cache.nlanr.net            parent  3128 3130
cache_peer uc.cache.nlanr.net            sibling 3128 3130
cache_peer bo.cache.nlanr.net            sibling 3128 3130
cache_peer_domain electraglide.geog.unsw.edu.au .au
cache_peer_domain cache1.nzgate.net.nz .au .aq .fj .nz
cache_peer_domain pb.cache.nlanr.net .uk .de .fr .no .se .it
cache_peer_domain it.cache.nlanr.net .uk .de .fr .no .se .it
cache_peer_domain sd.cache.nlanr.net .mx .za .mu .zm
The configuration above indicates that the cache will use pb.cache.nlanr.net and it.cache.nlanr.net for domains uk, de, fr, no, se and it, sd.cache.nlanr.net for domains mx, za, mu and zm, and cache1.nzgate.net.nz for domains au, aq, fj, and nz.
How do I join NLANR's cache hierarchy?
We have a simple set of guidelines for joining the NLANR cache hierarchy.
Why should I want to join NLANR's cache hierarchy?
The NLANR hierarchy can provide you with an initial source for parent or sibling caches. Joining the NLANR global cache system will frequently improve the performance of your caching service.
How do I register my cache with NLANR's registration service?
Just enable these options in your squid.conf and you'll be registered:
cache_announce 24
announce_to sd.cache.nlanr.net:3131
Note: Announcing your cache is not the same thing as joining the NLANR cache hierarchy. You can join the NLANR cache hierarchy without registering, and you can register without joining the NLANR cache hierarchy.
How do I find other caches close to me and arrange parent/child/sibling relationships with them?
Visit the NLANR cache registration database to discover other caches near you. Keep in mind that just because a cache is registered in the database does not mean they are willing to be your parent/sibling/child. But it can't hurt to ask...
My cache registration is not appearing in the Tracker database.
- Your site will not be listed if your cache IP address does not have a DNS PTR record. If we can't map the IP address back to a domain name, it will be listed as "Unknown."
- The registration messages are sent with UDP. We may not be receiving your announcement message due to firewalls which block UDP, or dropped packets due to congestion.
How do I configure Squid to work behind a firewall?
If you are behind a firewall then you can't make direct connections to the outside world, so you must use a parent cache. Normally Squid tries to be smart and only uses cache peers when it makes sense from the perspective of the global hit ratio, so you need to tell Squid when it cannot go direct and must use a parent proxy, even if it knows the request will be a cache miss.
You can use the never_direct access list in squid.conf to specify which requests must be forwarded to your parent cache outside the firewall, and the always_direct access list to specify which requests must not be forwarded. For example, if Squid must connect directly to all servers that end with mydomain.com, but must use the parent for all others, you would write:
acl INSIDE dstdomain .mydomain.com
always_direct allow INSIDE
never_direct allow all
You could also specify internal servers by IP address
acl INSIDE_IP dst 1.2.3.0/24
always_direct allow INSIDE_IP
never_direct allow all
Note, however, that when you use IP addresses Squid must perform a DNS lookup to convert URL hostnames to an address. Your internal DNS servers may not be able to look up external domains.
If you use never_direct and you have multiple parent caches, then you probably will want to mark one of them as a default choice in case Squid can't decide which one to use. That is done with the default keyword on a cache_peer line. For example:
cache_peer xyz.mydomain.com parent 3128 0 no-query default
How do I configure Squid to forward all requests to another proxy?
First, you need to give Squid a parent cache. Second, you need to tell Squid it can not connect directly to origin servers. This is done with three configuration file lines:
cache_peer parentcache.foo.com parent 3128 0 no-query default
acl all src 0.0.0.0/0.0.0.0
never_direct allow all
Note, with this configuration, if the parent cache fails or becomes unreachable, then every request will result in an error message.
In case you want to be able to use direct connections when all the parents go down you should use a different approach:
cache_peer parentcache.foo.com parent 3128 0 no-query
prefer_direct off
The default behaviour of Squid in the absence of positive ICP, HTCP, etc replies is to connect to the origin server instead of using parents. The prefer_direct off directive tells Squid to try parents first.
I have "dnsserver" processes that aren't being used, should I lower the number in "squid.conf"?
The dnsserver processes are used by Squid because the gethostbyname(3) library routine used to convert web site names to their internet addresses blocks until the function returns (i.e., the process that calls it has to wait for a reply). Since there is only one squid process, everyone who uses the cache would have to wait each time the routine was called. This is why the dnsserver is a separate process, so that these processes can block without causing blocking in squid.
It's very important that there are enough dnsserver processes to cope with every access you will need, otherwise squid will stop occasionally. A good rule of thumb is to make sure you have at least the maximum number of dnsservers squid has ever needed on your system, and probably add two to be on the safe side. In other words, if you have only ever seen at most three dnsserver processes in use, make at least five. Remember that a dnsserver is small and, if unused, will be swapped out.
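Following the rule of thumb above: if the busiest you have ever seen is, say, eight dnsserver processes in use, a squid.conf line like this (the value is illustrative) leaves a safety margin:

```
dns_children 10
```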
My ''dnsserver'' average/median service time seems high, how can I reduce it?
First, find out if you have enough dnsserver processes running by looking at the ../CacheManager dns output. Ideally, you should see that the first dnsserver handles a lot of requests, the second one less than the first, etc. The last dnsserver should have serviced relatively few requests. If there is not an obvious decreasing trend, then you need to increase the number of dns_children in the configuration file. If the last dnsserver has zero requests, then you definitely have enough.
Another factor which affects the dnsserver service time is the proximity of your DNS resolver. Normally we do not recommend running Squid and named on the same host. Instead, try to use a DNS resolver (named) on a different host, but on the same LAN. If your DNS traffic must pass through one or more routers, this could be causing unnecessary delays.
How can I easily change the default HTTP port?
Before you run the configure script, simply set the CACHE_HTTP_PORT environment variable.
setenv CACHE_HTTP_PORT 8080
./configure
make
make install
Is it possible to control how big each ''cache_dir'' is?
With Squid-1.1 it is NOT possible. Each cache_dir is assumed to be the same size. The cache_swap setting defines the size of all cache_dirs taken together. If you have N cache_dirs then each one will hold cache_swap / N Megabytes.
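A worked example of that division (the numbers are illustrative):

```shell
# Squid-1.1: cache_swap is split evenly across all cache_dirs.
cache_swap=8000   # total cache size in MB
n_dirs=4          # number of cache_dir lines
per_dir=$((cache_swap / n_dirs))
echo "each cache_dir holds $per_dir MB"
```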
What ''cache_dir'' size should I use?
This section assumes that you are dedicating an entire disk partition to a squid cache_dir, as is often the case.
Generally speaking, setting the cache_dir to be the same size as the disk partition is not a wise choice, for two reasons. The first is that squid is not very tolerant of running out of disk space. On top of the cache_dir size, squid will use some extra space for swap.state and then some more temporary storage as work areas, for instance when rebuilding swap.state. So in any case make sure to leave some extra room for this, or your cache will enter an endless crash-restart cycle.
The second reason is fragmentation (note, this won't apply to the COSS object storage engine - when it will be ready): filesystems can only do so much to avoid fragmentation, and in order to be effective they need to have the space to try and optimize file placement. If the disk is full, optimization is very hard, and when the disk is 100% full optimizing is plain impossible. Get your disk fragmented, and it will most likely be your worst bottleneck, by far offsetting the modest gain you got by having more storage.
Let's see an example: you have a 9 GB disk (disks that small are actually hard to find these days, but they keep the numbers simple). First, manufacturers often overstate disk capacity (the whole megabyte vs mebibyte issue), and the OS needs some space for its accounting structures, so you'll reasonably end up with about 8 GiB of usable space. You then have to account for another 10% in overhead for Squid, and then the space needed for keeping fragmentation at bay. So in the end the recommended cache_dir setting is 6000 to 7000 MB:
cache_dir ... 7000 16 256
It's better to start out with a conservative setting and then, after the cache has been filled, look at the disk usage. If you think there is plenty of unused space, then increase the cache_dir setting a little.
If you're getting "disk full" write errors, then you definitely need to decrease your cache size.
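The sizing arithmetic above can be sketched as a small shell helper. The 80% factor is an illustrative rule of thumb combining the ~10% Squid overhead and the fragmentation headroom discussed above, not an official figure:

```shell
# Hypothetical cache_dir sizing helper. usable_mib is the real usable
# space on the partition (after filesystem overhead); keep roughly 20%
# free for swap.state, work areas and fragmentation headroom.
usable_mib=8192                      # 8 GiB usable, as in the example above
cache_mib=$((usable_mib * 80 / 100))
echo "cache_dir ... $cache_mib 16 256"
```

For the 8 GiB example this prints a value inside the recommended 6000-7000 MB range.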
I'm adding a new cache_dir. Will I lose my cache?
With Squid-2, you will not lose your existing cache. You can add and delete cache_dir lines without affecting any of the others.
Squid and http-gw from the TIS toolkit.
Several people on both the fwtk-users and the squid-users mailing lists asked about using Squid in combination with http-gw from the TIS toolkit. The most elegant way, in my opinion, is to run an internal Squid caching proxy server which handles client requests and let this server forward its requests to the http-gw running on the firewall. Cache hits won't need to be handled by the firewall.
In this example Squid runs on the same server as the http-gw, Squid uses 8000 and http-gw uses 8080 (web). The local domain is home.nl.
Firewall configuration
Either run http-gw as a daemon from the /etc/rc.d/rc.local (Linux Slackware):
exec /usr/local/fwtk/http-gw -daemon 8080
or run it from inetd like this:
web stream tcp nowait.100 root /usr/local/fwtk/http-gw http-gw
I increased the watermark to 100 because a lot of people run into problems with the default value.
Make sure you have at least the following line in /usr/local/etc/netperm-table:
http-gw: hosts 127.0.0.1
You could add the IP-address of your own workstation to this rule and make sure the http-gw by itself works, like:
http-gw: hosts 127.0.0.1 10.0.0.1
Squid configuration
The following settings are important:
http_port 8000
icp_port 0
cache_peer localhost.home.nl parent 8080 0 default
acl HOME dstdomain .home.nl
always_direct allow HOME
never_direct allow all
This tells Squid to use the parent for all domains other than home.nl. Below, access.log entries show what happens if you do a reload on the Squid-homepage:
872739961.631 1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/ - DEFAULT_PARENT/localhost.home.nl -
872739962.976 1266 10.0.0.21 TCP_CLIENT_REFRESH/304 88 GET http://www.nlanr.net/Images/cache_now.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.007 1299 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/Icons/squidnow.gif - DEFAULT_PARENT/localhost.home.nl -
872739963.061 1354 10.0.0.21 TCP_CLIENT_REFRESH/304 83 GET http://www.squid-cache.org/Icons/Squidlogo2.gif - DEFAULT_PARENT/localhost.home.nl
http-gw entries in syslog:
Aug 28 02:46:00 memo http-gw[2052]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:00 memo http-gw[2052]: log host=localhost/127.0.0.1 protocol=HTTP cmd=dir dest=www.squid-cache.org path=/
Aug 28 02:46:01 memo http-gw[2052]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:01 memo http-gw[2053]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2053]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/Squidlogo2.gif
Aug 28 02:46:01 memo http-gw[2054]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2054]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/squidnow.gif
Aug 28 02:46:01 memo http-gw[2055]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
Aug 28 02:46:01 memo http-gw[2055]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.nlanr.net path=/Images/cache_now.gif
Aug 28 02:46:02 memo http-gw[2055]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
Aug 28 02:46:03 memo http-gw[2053]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=2
Aug 28 02:46:04 memo http-gw[2054]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=3
To summarize:
Advantages:
- http-gw allows you to selectively block ActiveX and Java, and its primary design goal is security.
- The firewall doesn't need to run large applications like Squid.
- The internal Squid-server still gives you the benefit of caching.
Disadvantages:
- The internal Squid proxy server can't (and shouldn't) work with other parent or neighbor caches.
- Initial requests are slower because they go through http-gw, and http-gw also does reverse lookups. Run a nameserver on the firewall or use an internal nameserver.
(contributed by Rodney van den Oever)
What is "HTTP_X_FORWARDED_FOR"? Why does squid provide it to WWW servers, and how can I stop it?
When a proxy-cache is used, a server does not see the connection coming from the originating client. Many people like to implement access controls based on the client address. To accommodate these people, Squid adds its own request header called "X-Forwarded-For" which looks like this:
X-Forwarded-For: 128.138.243.150, unknown, 192.52.106.30
Entries are always IP addresses, or the word unknown if the address could not be determined or if it has been disabled with the forwarded_for configuration option.
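For example, a minimal squid.conf sketch to disable the header as described above, so that "unknown" is sent in place of the client address:

```
# squid.conf: send "unknown" instead of the client's IP address
forwarded_for off
```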
We must note that access controls based on this header are extremely weak and simple to fake. Anyone may hand-enter a request with any IP address whatsoever. This is perhaps the reason why client IP addresses have been omitted from the HTTP/1.1 specification.
Because of the weakness of this header, support for access controls based on X-Forwarded-For is not yet available in any officially released version of squid. However, unofficial patches are available from the follow_xff Squid development project and may be integrated into later versions of Squid once a suitable trust model has been developed.
Can Squid anonymize HTTP requests?
Yes it can, however the way of doing it has changed from earlier versions of Squid. As of Squid-2.2 a more customisable method has been introduced. Please follow the instructions for the version of Squid that you are using. By default, no anonymizing is done.
If you choose to use the anonymizer you might wish to investigate the forwarded_for option to prevent the client address being disclosed. Failure to turn off the forwarded_for option will reduce the effectiveness of the anonymizer. Finally, filtering the User-Agent header using the fake_user_agent option can prevent some problems, as some sites require the User-Agent header.
Squid 2.2
With the introduction of Squid 2.2 the anonymizer has become more customisable. It now allows specification of exactly which headers will be allowed to pass. This is further extended in Squid-2.5 to allow headers to be anonymized conditionally.
For details see the documentation of the http_header_access and header_replace directives in squid.conf.default.
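As a sketch of those directives (the replacement string here is an arbitrary example, and the exact directive names should be checked against squid.conf.default for your version):

```
# deny the real User-Agent header and substitute a fixed value
http_header_access User-Agent deny all
header_replace User-Agent Mozilla/4.0 (anonymized)
```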
References: Anonymous WWW
Can I make Squid go direct for some sites?
Sure, just use the always_direct access list.
For example, if you want Squid to connect directly to hotmail.com servers, you can use these lines in your config file:
acl hotmail dstdomain .hotmail.com
always_direct allow hotmail
Can I make Squid proxy only, without caching anything?
Sure, there are a few things you can do.
You can use the cache access list to make Squid never cache any response:
acl all src 0.0.0.0/0
cache deny all
With Squid-2.7, Squid-3.1 and later you can also remove all 'cache_dir' options from your squid.conf to avoid having a cache directory.
With Squid-2.4, 2.5, 2.6, and 3.0 you can do the same by using the "null" storage module:
cache_dir null /tmp
Note: a null cache_dir does not disable caching, but it does save you from creating a cache structure if you have disabled caching with cache. The directory (e.g., /tmp) must exist so that squid can chdir to it, unless you also use the coredump_dir option.
To configure Squid for the "null" storage module, specify it on the configure command line:
--enable-storeio=null,...
Can I prevent users from downloading large files?
You can set the global reply_body_max_size parameter. This option controls the largest HTTP message body that will be sent to a cache client for one request.
If the HTTP response coming from the server has a Content-length header, then Squid compares the content-length value to the reply_body_max_size value. If the content-length is larger, the server connection is closed and the user receives an error message from Squid.
Some responses don't have Content-length headers. In this case, Squid counts how many bytes are written to the client. Once the limit is reached, the client's connection is simply closed.
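As a sketch, limiting replies to roughly 10 MB might look like the following (the exact syntax varies between Squid versions; check squid.conf.default for yours):

```
# older releases take a plain byte count;
# newer ones also accept ACL conditions on this directive
reply_body_max_size 10485760
```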
Note that "creative" user-agents will still be able to download really large files through the cache using HTTP/1.1 range requests.
How do I enable IPv6?
You will need a squid 3.1 release or a daily development snapshot later than 16th Dec 2007 and a computer system with IPv6 capabilities.
IPv6 is available in most Linux 2.6+ kernels, MacOSX 10+, all of the BSD variants, Windows XP/Vista, and others. See your system documentation for its capability and configuration.
IPv6 support in squid needs to be enabled first with
./configure --enable-ipv6
If you are using a packaged version without it, please contact the maintainer about enabling it.
Windows XP users will need:
./configure --enable-ipv6 --with-ipv6-split-stack
When squid is built you will then be able to start squid and see some IPv6 operations. The most active will be DNS, as IPv6 addresses are looked up for each website, and you will see IPv6 addresses in the cachemgr reports and logs.
Note: make sure that your helper scripts can handle IPv6 addresses.
Squid builds with IPv6 but it won't listen for IPv6 requests.
Your squid may be configured to only listen for IPv4.
Each of the port lines in squid.conf (http_port, icp_port, snmp_port, https_port, maybe others) can take either a port, hostname:port, or ip:port combo.
When these lines contain an IPv4 address, or a hostname with only IPv4 addresses, squid will only open on the IPv4 addresses you configured. You can add new port lines for IPv6 using [ipv6]:port, add AAAA records to the hostname in DNS, or use only a port.
When only a port is set it should be opening for IPv6 access as well as IPv4. The one exception to default IPv6-listening are port lines where 'transparent' or 'tproxy' options are set. NAT-interception (commonly called transparent proxy) cannot be done in IPv6 so squid will only listen on IPv4 for that type of traffic.
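For example (the addresses here are illustrative only):

```
http_port 3128                  # plain port: listens on IPv4 and IPv6
http_port 192.0.2.25:3128       # IPv4 only
http_port [2001:db8::25]:3128   # IPv6 only
```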
Again Windows XP users are unique: the geeks out there will notice two ports opening for separate IPv4 and IPv6 access with each plain-port squid.conf line. The effect is the same as with more modern systems.
Your squid may be configured with restrictive ACL.
A good squid configuration will allow only the traffic it has to and deny any other. If you are testing IPv6 using a pre-existing config you may need to update your ACL lines to include the IPv6 addresses or network ranges which should be allowed. src, dst, and other ACL which accept IPv4 addresses or netmasks will also accept IPv6 addresses or CIDR masks now. For example the old ACL to match traffic from localhost is now:
acl localhost src 127.0.0.1/32 ::1/128
Squid listens on IPv6 but says 'Access Denied' or similar.
Your squid may be configured to only connect out through specific IPv4.
A number of networks are known to need tcp_outgoing_address (or various other *_outgoing_address) in their squid.conf. These can force squid to request the website over an IPv4 link when it should be trying an IPv6 link instead. There is a little bit of ACL magic possible with tcp_outgoing_address which will get around this problem.
acl to_ipv6 dst ipv6
tcp_outgoing_address 10.255.0.1 !to_ipv6
tcp_outgoing_address 2001:dead:beef::1 to_ipv6
That will split all outgoing requests into two groups, those headed for IPv4 and those headed for IPv6. It will push the requests out the IP which matches the destination side of the Internet and allow IPv4/IPv6 access with controlled source address exactly as before.
How do I make squid use IPv6 to its helpers?
With squid external ACL helpers there are two new options, ipv4 and ipv6. By default, to work with older setups, helpers are still connected over IPv4. You can add the ipv6 option to use IPv6.
external_acl_type hi ipv6 %DST /etc/squid/hello_world.sh
How do I block IPv6 traffic?
Why you would want to do that without similar limits on IPv4 (using all) is beyond me but here it is. Previously squid defined the all ACL which means the whole Internet. It still does, but now it means both IPv6 and IPv4 so using it will not block IPv6. A new ACL ipv6 has been added to match only IPv6. It can be used directly in a deny or inverted with ! to match IPv4 in an allow.
acl to_ipv6 dst ipv6
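For example, to deny requests to IPv6 destinations (a sketch; place this before your other http_access rules):

```
acl to_ipv6 dst ipv6
http_access deny to_ipv6
```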
Back to the SquidFaq
Contents
- Communication between browsers and Squid
- Manual Browser Configuration
- Firefox and Thunderbird manual configuration
- Microsoft Internet Explorer manual configuration
- Netscape manual configuration
- Lynx and Mosaic manual configuration
- Opera 2.12 manual configuration
- Netmanage Internet Chameleon WebSurfer manual configuration
- Partially Automatic Configuration
- Netscape automatic configuration
- Microsoft Internet Explorer
- Fully Automatically Configuring Browsers for WPAD
- Fully Automatically Configuring Browsers for WPAD with DHCP
- Redundant Proxy Auto-Configuration
- Proxy Auto-Configuration with URL Hashing
- How do I tell Squid to use a specific username for FTP urls?
- IE 5.0x crops trailing slashes from FTP URL's
- IE 6.0 SP1 fails when using authentication
Communication between browsers and Squid
Most web browsers available today support proxying and are easily configured to use a Squid server as a proxy. Some browsers support advanced features such as lists of domains or URL patterns that shouldn't be fetched through the proxy, or JavaScript automatic proxy configuration.
There are three ways to configure browsers to use Squid. The first is to manually configure the proxy in each browser. Alternatively, a proxy.pac URL can be entered into each browser so that it downloads the proxy settings (partially automatic configuration). Lastly, all modern browsers can (and by default are configured to) fully automatically configure themselves if the network is set up to support this.
Manual Browser Configuration
This involves manually specifying the proxy server and port name in each browser.
Firefox and Thunderbird manual configuration
Both Firefox and Thunderbird are configured in the same way. Look in the Tools menu, Options, General and then Connection Settings. The options in there are fairly self explanatory. Firefox and Thunderbird support manually specifying the proxy server, automatically downloading a wpad.dat file from a specified source, and additionally wpad auto-detection.
Thunderbird uses these settings for downloading HTTP images in emails.
In both cases, if you are manually configuring proxies, make sure you add relevant statements for your network in the "No Proxy For" boxes.
Microsoft Internet Explorer manual configuration
Select Options from the View menu. Click on the Connection tab. Tick the Connect through Proxy Server option and hit the Proxy Settings button. For each protocol that your Squid server supports (by default, HTTP, FTP, and gopher) enter the Squid server's hostname or IP address and put the HTTP port number for the Squid server (by default, 3128) in the Port column. For any protocols that your Squid does not support, leave the fields blank.
Netscape manual configuration
Select Network Preferences from the Options menu. On the Proxies page, click the radio button next to Manual Proxy Configuration and then click on the View button. For each protocol that your Squid server supports (by default, HTTP, FTP, and gopher) enter the Squid server's hostname or IP address and put the HTTP port number for the Squid server (by default, 3128) in the Port column. For any protocols that your Squid does not support, leave the fields blank.
Lynx and Mosaic manual configuration
For Mosaic and Lynx, you can set environment variables before starting the application. For example (assuming csh or tcsh):
% setenv http_proxy http://mycache.example.com:3128/
% setenv gopher_proxy http://mycache.example.com:3128/
% setenv ftp_proxy http://mycache.example.com:3128/
For Lynx you can also edit the lynx.cfg file to configure proxy usage. This has the added benefit of causing all Lynx users on a system to access the proxy without making environment variable changes for each user. For example:
http_proxy:http://mycache.example.com:3128/
ftp_proxy:http://mycache.example.com:3128/
gopher_proxy:http://mycache.example.com:3128/
Opera 2.12 manual configuration
by Hume Smith
Select Proxy Servers... from the Preferences menu. Check each protocol that your Squid server supports (by default, HTTP, FTP, and Gopher) and enter the Squid server's address as hostname:port (e.g. mycache.example.com:3128 or 123.45.67.89:3128). Click on Okay to accept the setup.
Notes:
- Opera 2.12 doesn't support gopher on its own, but requires a proxy; therefore Squid's gopher proxying can extend the utility of your Opera immensely.
Unfortunately, Opera 2.12 chokes on some HTTP requests, for example abuse.net.
At the moment I think it has something to do with cookies. If you have trouble with a site, try disabling the HTTP proxying by unchecking that protocol in the Preferences|Proxy Servers... dialogue. Opera will remember the address, so reenabling is easy.
Netmanage Internet Chameleon WebSurfer manual configuration
Netmanage WebSurfer supports manual proxy configuration and exclusion lists for hosts or domains that should not be fetched via proxy (this information is current as of WebSurfer 5.0). Select Preferences from the Settings menu. Click on the Proxies tab. Select the Use Proxy options for HTTP, FTP, and gopher. For each protocol, enter the Squid server's hostname or IP address and put the HTTP port number for the Squid server (by default, 3128) in the Port boxes. For any protocols that your Squid does not support, leave the fields blank.
On the same configuration window, you'll find a button to bring up the exclusion list dialog box, which will let you enter some hosts or domains that you don't want fetched via proxy.
Partially Automatic Configuration
This involves the browser being preconfigured with the location of an autoconfiguration script.
Netscape automatic configuration
Netscape Navigator's proxy configuration can be automated with JavaScript (for Navigator versions 2.0 or higher). Select Network Preferences from the Options menu. On the Proxies page, click the radio button next to Automatic Proxy Configuration and then fill in the URL for your JavaScript proxy configuration file in the text box. The box is too small, but the text will scroll to the right as you go.
You may also wish to consult Netscape's documentation for the Navigator JavaScript proxy configuration
Here is a sample auto configuration file from Oskar Pearson (link to save at the bottom):
//We (www.is.co.za) run a central cache for our customers that they
//access through a firewall - thus if they want to connect to their intranet
//system (or anything in their domain at all) they have to connect
//directly - hence all the "fiddling" to see if they are trying to connect
//to their local domain.
//
//Replace each occurrence of company.com with your domain name
//and if you have some kind of intranet system, make sure
//that you put its name in place of "internal" below.
//
//We also assume that your cache is called "cache.company.com", and
//that it runs on port 8080. Change it down at the bottom.
//
//(C) Oskar Pearson and the Internet Solution (http://www.is.co.za)

function FindProxyForURL(url, host)
{
    //If they have only specified a hostname, go directly.
    if (isPlainHostName(host))
        return "DIRECT";

    //These connect directly if the machine they are trying to
    //connect to starts with "intranet" - ie http://intranet
    //Connect directly if it is intranet.*
    //If you have another machine that you want them to
    //access directly, replace "internal*" with that
    //machine's name
    if (shExpMatch(host, "intranet*") ||
        shExpMatch(host, "internal*"))
        return "DIRECT";

    //Connect directly to our domains (NB for Important News)
    if (dnsDomainIs(host, "company.com") ||
        //If you have another domain that you wish to connect to
        //directly, put it in here
        dnsDomainIs(host, "sistercompany.com"))
        return "DIRECT";

    //So the error message "no such host" will appear through the
    //normal Netscape box - less support queries :)
    if (!isResolvable(host))
        return "DIRECT";

    //We only cache http, ftp and gopher
    if (url.substring(0, 5) == "http:" ||
        url.substring(0, 4) == "ftp:" ||
        url.substring(0, 7) == "gopher:")
        //Change the ":8080" to the port that your cache
        //runs on, and "cache.company.com" to the machine that
        //you run the cache on
        return "PROXY cache.company.com:8080; DIRECT";

    //We don't cache WAIS
    if (url.substring(0, 5) == "wais:")
        return "DIRECT";
    else
        return "DIRECT";
}
Microsoft Internet Explorer
Microsoft Internet Explorer, versions 4.0 and above, supports JavaScript automatic proxy configuration in a Netscape-compatible way. Just select Options from the View menu. Click on the Advanced tab. In the lower left-hand corner, click on the Automatic Configuration button. Fill in the URL for your JavaScript file in the dialog box it presents you. Then exit MSIE and restart it for the changes to take effect. MSIE will reload the JavaScript file every time it starts.
Fully Automatically Configuring Browsers for WPAD
by Mark Reynolds
You may like to start by reading the Expired Internet-Draft that describes WPAD.
After reading the 8 steps below, if you don't understand any of the terms or methods mentioned, you probably shouldn't be doing this. Implementing WPAD requires you to fully understand:
- web server installations and modifications.
- squid proxy server (or others) installation etc.
- Domain Name System maintenance etc.
Please don't bombard the squid list with web server or DNS questions. See your system administrator, or do some more research on those topics.
This is not a recommendation for any product or version. All major browsers out now implement WPAD. I think WPAD is an excellent feature that will return several hours of life per month.
I have only focused on the domain name method, to the exclusion of the DHCP method. I think the dns method might be easier for most people. I don't currently, and may never, fully understand wpad and IE5, but this method worked for me. It may work for you.
But if you'd rather just have a go ...
Create a standard Netscape auto proxy config file. The sample provided above is more than adequate to get you going. No doubt all the other load balancing and backup scripts will be fine also.
Store the resultant file in the document root directory of a handy web server as wpad.dat (Not proxy.pac as you may have previously done.) Andrei Ivanov notes that you should be able to use an HTTP redirect if you want to store the wpad.dat file somewhere else. You can probably even redirect wpad.dat to proxy.pac:
Redirect /wpad.dat http://racoon.riga.lv/proxy.pac
If you do nothing more, a URL like http://www.your.domain.name/wpad.dat should bring up the script text in your browser window.
Insert the following entry into your web server's mime.types file, perhaps in addition to your pac file type if you've done this before.
application/x-ns-proxy-autoconfig dat
Then restart your web server so the new mime type takes effect.
Assuming Internet Explorer 5, under Tools, Internet Options, Connections, Settings or LAN Settings, set ONLY Use Automatic Configuration Script to the URL where your new wpad.dat file can be found, i.e. http://www.your.domain.name/wpad.dat. Test that all works as per your script and network. There's no point continuing until this works...
Create/install/implement a DNS record so that wpad.your.domain.name resolves to the host above where you have a functioning auto config script running. You should now be able to use http://wpad.your.domain.name/wpad.dat as the Auto Config Script location in step 5 above.
And finally, go back to the setup screen detailed in 5 above, and choose nothing but the Automatically Detect Settings option, turning everything else off. Best to restart IE5, as you normally do with any Microsoft product... And it should all work. Did for me anyway.
One final question might be "Which domain name does the client (IE5) use for the wpad... lookup?" It uses the hostname from the control panel setting. It starts the search by prepending the hostname wpad to the current fully-qualified domain name. For instance, a client in a.b.Microsoft.com would search for a WPAD server at wpad.a.b.microsoft.com. If it could not locate one, it would remove the bottom-most domain and try again; for instance, it would try wpad.b.microsoft.com next. IE 5 would stop searching when it found a WPAD server or reached the third-level domain, wpad.microsoft.com.
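That search order can be sketched as follows (an illustrative model of the behaviour described above, not Microsoft's actual code):

```javascript
// Given the client's DNS domain, return the WPAD hostnames tried
// in order, stopping at the second-level domain so that
// "wpad.<second-level>.<tld>" is the last candidate.
function wpadCandidates(domain) {
    var labels = domain.toLowerCase().split(".");
    var out = [];
    while (labels.length >= 2) {
        out.push("wpad." + labels.join("."));
        labels = labels.slice(1);
    }
    return out;
}
```

For the domain a.b.Microsoft.com this yields wpad.a.b.microsoft.com, then wpad.b.microsoft.com, then wpad.microsoft.com, matching the walk described above.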
Anybody using these steps to install and test, please feel free to make notes, corrections or additions for improvements, and post back to the squid list...
There are probably many more tricks and tips which hopefully will be detailed here in the future. Things like wpad.dat files being served from the proxy servers themselves, maybe with a round-robin DNS setup for the WPAD host.
Fully Automatically Configuring Browsers for WPAD with DHCP
You can also use DHCP to configure browsers for WPAD. This technique allows you to set any URL as the PAC URL. For ISC DHCPD, enter a line like this in your dhcpd.conf file:
option wpad code 252 = text;
option wpad "http://www.example.com/proxy.pac";
Replace the hostname with the name or address of your own server.
Ilja Pavkovic notes that the DHCP mode does not work reliably with every version of Internet Explorer. The DNS name method to find wpad.dat is more reliable.
Another user adds that IE 6.01 seems to strip the last character from the URL. By adding a trailing newline, he is able to make it work with both IE 5.0 and 6.0:
option wpad "http://www.example.com/proxy.pac\n";
Redundant Proxy Auto-Configuration
by Rodney van den Oever
There's one nasty side-effect to using auto-proxy scripts: when you start the web browser it will try to load the auto-proxy script.
If your script isn't available either because the web server hosting the script is down or your workstation can't reach the web server (e.g. because you're working off-line with your notebook and just want to read a previously saved HTML-file) you'll get different errors depending on the browser you use.
The Netscape browser will just return an error after a timeout (after that it tries to find the site 'www.proxy.com' if the script you use is called 'proxy.pac').
The Microsoft Internet Explorer, on the other hand, won't even start; no window displays. Only after about a minute will it display a window asking whether to go on with or without proxy configuration.
The point is that your workstations always need to locate the proxy-script. I created some extra redundancy by hosting the script on two web servers (actually Apache web servers on the proxy servers themselves) and adding the following records to my primary nameserver:
proxy   IN  A   10.0.0.1 ; IP address of proxy1
        IN  A   10.0.0.2 ; IP address of proxy2
The clients just refer to 'http://proxy/proxy.pac'. This script looks like this:
function FindProxyForURL(url, host)
{
    // Hostname without domainname or host within our own domain?
    // Try them directly:
    // http://www.domain.com actually lives before the firewall, so
    // make an exception:
    if ((isPlainHostName(host) || dnsDomainIs(host, ".domain.com")) &&
        !localHostOrDomainIs(host, "www.domain.com"))
        return "DIRECT";

    // First try proxy1 then proxy2. One server mostly caches '.com'
    // to make sure both servers are not caching the same data in the
    // normal situation. The other server caches the other domains
    // normally. If one of them is down the client will try the other
    // server.
    else if (shExpMatch(host, "*.com"))
        return "PROXY proxy1.domain.com:8080; PROXY proxy2.domain.com:8081; DIRECT";

    return "PROXY proxy2.domain.com:8081; PROXY proxy1.domain.com:8080; DIRECT";
}
I made sure every client domain has the appropriate 'proxy' entry. The clients are automatically configured with two nameservers using DHCP.
Proxy Auto-Configuration with URL Hashing
The Sharp Super Proxy Script page contains a lot of good information about hash-based proxy auto-configuration scripts. With these you can distribute the load between a number of caching proxies.
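The core idea is deterministic selection: hash the URL so that every browser picks the same proxy for a given URL, spreading distinct URLs across the pool. A toy sketch of the technique (a simple character-sum hash; the real Super Proxy Script uses a different hash function):

```javascript
// Pick a proxy deterministically from the URL, so each URL is
// always fetched (and therefore cached) through the same proxy.
function pickProxy(url, proxies) {
    var h = 0;
    for (var i = 0; i < url.length; i++)
        h = (h + url.charCodeAt(i)) % proxies.length;
    // fall back to a direct connection if the chosen proxy is down
    return "PROXY " + proxies[h] + "; DIRECT";
}
```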
How do I tell Squid to use a specific username for FTP urls?
Insert your username in the host part of the URL, for example:
ftp://joecool@ftp.foo.org/
Squid should then prompt you for your account password. Alternatively, you can specify both your username and password in the URL itself:
ftp://joecool:secret@ftp.foo.org/
However, we certainly do not recommend this, as it could be very easy for someone to see or grab your password.
IE 5.0x crops trailing slashes from FTP URL's
There was a bug in the 5.0x releases of Internet Explorer in which IE cropped any trailing slash off an FTP URL. The URL showed up correctly in the browser's "Address:" field, but squid logs showed that the trailing slash was being taken off.
An example of where this impacted squid: you had a setup where squid would go direct for FTP directory listings but forward requests to a parent for FTP file transfers. This was useful if your upstream proxy was an older version of Squid, or another vendor's software, which displayed directory listings with broken icons, and you wanted your own local squid to generate proper FTP directory listings instead. The workaround is to add a double slash to any directory listing URL in which the trailing slash is important, or else upgrade to IE 5.5. (Or use Firefox if you cannot upgrade your IE.)
IE 6.0 SP1 fails when using authentication
When using authentication with Internet Explorer 6 SP1, you may encounter issues when you first launch Internet Explorer: when you first authenticate you will receive a "Page Cannot Be Displayed" error. However, if you click refresh, the page will be displayed correctly.
This only happens immediately after you authenticate.
This is not a Squid error or bug. Microsoft broke the Basic Authentication when they put out IE6 SP1.
There is a knowledgebase article (KB 331906) regarding this issue, which contains a link to a downloadable "hot fix." They do warn that this code is not "regression tested" but so far there have not been any reports of it breaking anything else. The problematic file is wininet.dll. Please note that this hotfix is included in the latest security update.
Lloyd Parkes notes that the article references another article, KB 312176. He says that you must not have the registry entry that KB 312176 encourages users to add to their registry.
According to Joao Coutinho, this simple solution also corrects the problem:
- Go to Tools/Internet Options/Advanced
- UNSELECT "Show friendly HTTP error messages" under Browsing.
Another possible workaround to these problems is to make the ERR_CACHE_ACCESS_DENIED larger than 1460 bytes. This should trigger IE to handle the authentication in a slightly different manner.
Contents
- Squid Log Files
- squid.out
- cache.log
- useragent.log
- store.log
- hierarchy.log
- access.log
- access.log native format in detail
- sending access.log to syslog
- customizable access.log
- cache/log (Squid-1.x)
- swap.state (Squid-2.x)
- Which log files can I delete safely?
- How can I disable Squid's log files?
- What is the maximum size of access.log?
- My log files get very big!
- I want to use another tool to maintain the log files.
- Managing log files
- Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?
- What does ERR_LIFETIME_EXP mean?
- Retrieving "lost" files from the cache
- Can I use store.log to figure out if a response was cachable?
- Can I pump the squid access.log directly into a pipe?
Squid Log Files
The logs are a valuable source of information about Squid workloads and performance. The logs record not only access information, but also system configuration errors and resource consumption (eg, memory, disk space). There are several log files maintained by Squid. Some have to be explicitly activated at compile time; others can safely be deactivated at run-time.
There are a few basic points common to all log files. The time stamps logged into the log files are usually UTC seconds unless stated otherwise. The initial time stamp usually contains a millisecond extension.
squid.out
If you run your Squid from the RunCache script, a file squid.out contains the Squid startup times, and also all fatal errors, e.g. as produced by an assert() failure. If you are not using RunCache, you will not see such a file.
cache.log
The cache.log file contains the debug and error messages that Squid generates. If you start your Squid using the default RunCache script, or start it with the -s command line option, a copy of certain messages will also go to your syslog facility. It is a matter of personal preference whether to use a separate file for the squid log data.
From the standpoint of automatic log file analysis, the cache.log file does not have much to offer. You will usually look into this file when tracking down error reports, when programming Squid, testing new features, or searching for the reason behind a perceived misbehaviour.
useragent.log
The user agent log file is only maintained if:
- you configured the compile-time --enable-useragent-log option, and
- you pointed the useragent_log configuration option to a file.
From the user agent log file you are able to find out about the distribution of browsers among your clients. Using this option in conjunction with a loaded production squid might not be the best of all ideas.
store.log
The store.log file covers the objects currently kept on disk or removed ones. As a kind of transaction log it is usually used for debugging purposes. A definitive statement on whether an object resides on your disks is only possible after analysing the complete log file, since the release (deletion) of an object may be logged at a later time than the swap out (save to disk).
The store.log file may be of interest to log file analyses which look into the objects on your disks and the time they spend there, or how many times a hot object was accessed. The latter may be covered by another log file, too. With knowledge of the cache_dir configuration option, this log file allows for a URL to filename mapping without recursing your cache disks. However, the Squid developers recommend treating store.log primarily as a debug file, and so should you, unless you know what you are doing.
The print format for a store log entry (one line) consists of thirteen space-separated columns, compare with the storeLog() function in file src/store_log.c:
%9ld.%03d %-7s %02d %08X %s %4d %9ld %9ld %9ld %s %ld/%ld %s %s
time The timestamp when the line was logged in UTC with a millisecond fraction.
action The action the object was submitted to, compare with src/store_log.c:
CREATE Seems to be unused.
RELEASE The object was removed from the cache (see also file number below).
SWAPOUT The object was saved to disk.
SWAPIN The object existed on disk and was read into memory.
dir number The cache_dir number this object was stored into, starting at 0 for your first cache_dir line.
file number The file number for the object storage file. Please note that the path to this file is calculated according to your cache_dir configuration. A file number of FFFFFFFF indicates "memory only" objects. Any action code for such a file number refers to an object which existed only in memory, not on disk. For instance, if a RELEASE code was logged with file number FFFFFFFF, the object existed only in memory, and was released from memory.
hash The hash value used to index the object in the cache. Squid currently uses MD5 for the hash value.
status The HTTP reply status code.
datehdr The value of the HTTP Date reply header.
lastmod The value of the HTTP Last-Modified reply header.
expires The value of the HTTP "Expires: " reply header.
type The HTTP Content-Type major value, or "unknown" if it cannot be determined.
sizes This column consists of two slash separated fields:
- The advertised content length from the HTTP Content-Length reply header.
- The size actually read.
If the advertised (or expected) length is missing, it will be set to zero. If the advertised length is not zero, but not equal to the real length, the object will be released from the cache.
method The request method for the object, e.g. GET.
key The key to the object, usually the URL.
The datehdr, lastmod, and expires values are all expressed in UTC seconds. The actual values are parsed from the HTTP reply headers. An unparsable header is represented by a value of -1, and a missing header is represented by a value of -2.
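As a quick illustration of the column layout above, the action and the key can be picked out with awk. This is a sketch against a fabricated log line, not real output:

```shell
# Sketch: pull the action (column 2) and the key/URL (column 13) out
# of a native store.log line. The log line below is fabricated for
# illustration; field positions follow the format described above.
line='855133060.495 SWAPOUT 00 0000EEC2 D41D8CD98F00B204E9800998ECF8427E 200 855133045 855133045 855133675 text/html 8192/8192 GET http://example.com/index.html'
echo "$line" | awk '{print $2, $13}'
# prints: SWAPOUT http://example.com/index.html
```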
hierarchy.log
This logfile exists for Squid-1.0 only. The format is
[date] URL peerstatus peerhost
access.log
Most log file analysis programs are based on the entries in access.log.
Squid 2.6 allows administrators to configure their logfile format with great flexibility; previous versions offered much more limited functionality.
Previous versions allowed logging accesses either in the native log format (default) or in the HTTP Common Logfile Format (CLF). The latter is enabled by specifying the emulate_httpd_log option in squid.conf.
The common log file format
The Common Logfile Format is used by numerous HTTP servers. This format consists of the following seven fields:
remotehost rfc931 authuser [date] "method URL" status bytes
It is parsable by a variety of tools. The common format contains different information than the native log file format: for example, it logs the HTTP version, which the native log file format does not.
The native log file format
The native format is different for different major versions of Squid. For Squid-1.0 it is:
time elapsed remotehost code/status/peerstatus bytes method URL
For Squid-1.1, the information from the hierarchy.log was moved into access.log. The format is:
time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost type
For Squid-2 the columns stay the same, though the content within may change a little.
The native log file format logs more and different information than the common log file format: the request duration, some timeout information, the next upstream server address, and the content type.
There exist tools which convert one file format into the other. Please mind that even though the log formats share most information, each format contains information which is not part of the other, and that part of the information is lost when converting. In particular, converting back and forth is not possible without loss.
squid2common.pl is a conversion utility, which converts any of the squid log file formats into the old CERN proxy style output. There exist tools to analyse, evaluate and graph results from that format.
access.log native format in detail
We recommend that you use Squid's native log format, because it provides a greater amount of information for later analysis. The print format line for native access.log entries looks like this:
"%9d.%03d %6d %s %s/%03d %d %s %s %s %s%s/%s %s"
Therefore, an access.log entry usually consists of (at least) 10 columns separated by one or more spaces:
time A Unix timestamp as UTC seconds with a millisecond resolution. You can convert Unix timestamps into something more human readable using this short perl script:
#! /usr/bin/perl -p
s/^\d+\.\d+/localtime $&/e;
duration The elapsed time shows how many milliseconds the transaction busied the cache. It differs in interpretation between TCP and UDP:
- For HTTP this is basically the time from having received the request to when Squid finishes sending the last byte of the response.
- For ICP, this is the time between scheduling a reply and actually sending it.
Please note that the entries are logged after the reply finished being sent, not during the lifetime of the transaction.
client address The IP address of the requesting instance, the client IP address. The client_netmask configuration option can mask the client addresses for data protection reasons, but it makes analysis more difficult. Often it is better to use one of the log file anonymizers. Also, the log_fqdn configuration option may log the fully qualified domain name of the client instead of the dotted quad. The use of that option is discouraged due to its performance impact.
result codes This column is made up of two entries separated by a slash. This column encodes the transaction result:
The cache result of the request contains information on the kind of request, how it was satisfied, or in what way it failed. Please refer to Squid result codes for valid symbolic result codes. Several codes from older versions are no longer available, were renamed, or split; especially the ERR_ codes do not seem to appear in the log file any more. Also refer to Squid result codes for details on the codes no longer available in Squid-2.
The NOVM versions and Squid-2 also rely on the Unix buffer cache, thus you will see fewer TCP_MEM_HITs than with a Squid-1. Basically, the NOVM feature relies on read() to obtain an object, but due to the kernel buffer cache, no disk activity is needed. Only small objects (below 8 KByte) are kept in Squid's part of main memory.
The status part contains the HTTP result codes with some Squid specific extensions. Squid uses a subset of the RFC defined error codes for HTTP. Refer to section status codes for details of the status codes recognized by a Squid-2.
bytes The size is the amount of data delivered to the client. Mind that this does not constitute the net object size, as headers are also counted. Also, failed requests may deliver an error page, the size of which is also logged here.
request method The request method to obtain an object. Please refer to section request-methods for available methods. If you turned off log_icp_queries in your configuration, you will not see (and thus unable to analyse) ICP exchanges. The PURGE method is only available, if you have an ACL for "method purge" enabled in your configuration file.
URL This column contains the URL requested. Please note that the log file may contain whitespaces for the URI. The default configuration for uri_whitespace denies whitespaces, though.
rfc931 The eighth column may contain the ident lookups for the requesting client. Since ident lookups have a performance impact, the default configuration turns ident lookups off. If turned off, or if no ident information is available, a "-" will be logged.
hierarchy code The hierarchy information consists of three items:
- Any hierarchy tag may be prefixed with TIMEOUT_, if a timeout occurred while waiting for all ICP replies to return from the neighbours. The timeout is either dynamic, if the icp_query_timeout was not set, or the time configured there has run out.
- A code that explains how the request was handled, e.g. by forwarding it to a peer, or going straight to the source. Refer to Hierarchy Codes for details on hierarchy codes and removed hierarchy codes.
- The IP address or hostname where the request (if a miss) was forwarded. For requests sent to origin servers, this is the origin server's IP address. For requests sent to a neighbor cache, this is the neighbor's hostname. NOTE: older versions of Squid would put the origin server hostname here.
type The content type of the object as seen in the HTTP reply header. Please note that ICP exchanges usually don't have any content type, and thus are logged "-". Also, some weird replies have content types ":" or even empty ones.
There may be two more columns in the access.log, if the (debug) option log_mime_headers is enabled. In this case, the HTTP request headers are logged between a "[" and a "]", and the HTTP reply headers are also logged between "[" and "]". All control characters like CR and LF are URL-escaped, but spaces are not escaped! Parsers should watch out for this.
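For a first overview of a native access.log, the result-code half of the code/status column (the fourth field) can be tallied with awk. A sketch against fabricated sample traffic:

```shell
# Sketch: tally the result-code half of the code/status column
# (field 4) in a native access.log. The three lines below are
# fabricated sample traffic, not real log data.
awk '{split($4, a, "/"); n[a[1]]++} END {for (c in n) print c, n[c]}' <<'EOF' | sort
902837273.307 15 10.0.0.1 TCP_HIT/200 1100 GET http://example.com/ - NONE/- text/html
902837274.113 87 10.0.0.2 TCP_MISS/200 4200 GET http://example.org/ - DIRECT/example.org text/html
902837275.420 12 10.0.0.1 TCP_HIT/200 980 GET http://example.com/a - NONE/- text/html
EOF
# prints: TCP_HIT 2, then TCP_MISS 1
```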
Squid result codes
The TCP_ codes refer to requests on the HTTP port (usually 3128). The UDP_ codes refer to requests on the ICP port (usually 3130). If ICP logging was disabled using the log_icp_queries option, no ICP replies will be logged.
The following result codes were taken from a Squid-2, compare with the log_tags struct in src/access_log.c:
TCP_HIT A valid copy of the requested object was in the cache.
TCP_MISS The requested object was not in the cache.
TCP_REFRESH_HIT The requested object was cached but STALE. The IMS query for the object resulted in "304 not modified".
TCP_REF_FAIL_HIT The requested object was cached but STALE. The IMS query failed and the stale object was delivered.
TCP_REFRESH_MISS The requested object was cached but STALE. The IMS query returned the new content.
TCP_CLIENT_REFRESH_MISS The client issued a "no-cache" pragma, or some analogous cache control command along with the request. Thus, the cache has to refetch the object.
TCP_IMS_HIT The client issued an IMS request for an object which was in the cache and fresh.
TCP_SWAPFAIL_MISS The object was believed to be in the cache, but could not be accessed.
TCP_NEGATIVE_HIT Request for a negatively cached object, e.g. "404 not found", which the cache believes to be inaccessible. Also refer to the explanations for negative_ttl in your squid.conf file.
TCP_MEM_HIT A valid copy of the requested object was in the cache and it was in memory, thus avoiding disk accesses.
TCP_DENIED Access was denied for this request.
TCP_OFFLINE_HIT The requested object was retrieved from the cache during offline mode. The offline mode never validates any object, see offline_mode in squid.conf file.
UDP_HIT A valid copy of the requested object was in the cache.
UDP_MISS The requested object is not in this cache.
UDP_DENIED Access was denied for this request.
UDP_INVALID An invalid request was received.
UDP_MISS_NOFETCH During "-Y" startup, or during frequent failures, a cache in hit only mode will return either UDP_HIT or this code. Neighbours will thus only fetch hits.
NONE Seen with errors and cachemgr requests.
The following codes are no longer available in Squid-2:
ERR_* Errors are now contained in the status code.
TCP_CLIENT_REFRESH See: TCP_CLIENT_REFRESH_MISS.
TCP_SWAPFAIL See: TCP_SWAPFAIL_MISS.
TCP_IMS_MISS Deleted, now replaced with TCP_IMS_HIT.
UDP_HIT_OBJ Refers to an old version that would send cache hits in ICP replies. No longer implemented.
UDP_RELOADING See: UDP_MISS_NOFETCH.
HTTP status codes
These are taken from RFC 2616 and verified for Squid. Squid-2 uses almost all codes except 307 (Temporary Redirect), 416 (Request Range Not Satisfiable), and 417 (Expectation Failed). Extra codes include 0 for a result code being unavailable, and 600 to signal an invalid header, a proxy error. Also, some definitions were added as for RFC 2518 (WebDAV). Yes, there are really two entries for status code 424, compare with http_status in src/enums.h:
 000 Used mostly with UDP traffic.
 100 Continue
 101 Switching Protocols
*102 Processing
 200 OK
 201 Created
 202 Accepted
 203 Non-Authoritative Information
 204 No Content
 205 Reset Content
 206 Partial Content
*207 Multi Status
 300 Multiple Choices
 301 Moved Permanently
 302 Moved Temporarily
 303 See Other
 304 Not Modified
 305 Use Proxy
[307 Temporary Redirect]
 400 Bad Request
 401 Unauthorized
 402 Payment Required
 403 Forbidden
 404 Not Found
 405 Method Not Allowed
 406 Not Acceptable
 407 Proxy Authentication Required
 408 Request Timeout
 409 Conflict
 410 Gone
 411 Length Required
 412 Precondition Failed
 413 Request Entity Too Large
 414 Request URI Too Large
 415 Unsupported Media Type
[416 Request Range Not Satisfiable]
[417 Expectation Failed]
*424 Locked
*424 Failed Dependency
*433 Unprocessable Entity
 500 Internal Server Error
 501 Not Implemented
 502 Bad Gateway
 503 Service Unavailable
 504 Gateway Timeout
 505 HTTP Version Not Supported
*507 Insufficient Storage
 600 Squid header parsing error
Request methods
Squid recognizes several request methods as defined in RFC 2616. Newer versions of Squid (2.2.STABLE5 and above) also recognize RFC 2518 "HTTP Extensions for Distributed Authoring -- WEBDAV" extensions.
method    defined    cachabil.  meaning
--------- ---------- ---------- -------------------------------------------
GET       HTTP/0.9   possibly   object retrieval and simple searches.
HEAD      HTTP/1.0   possibly   metadata retrieval.
POST      HTTP/1.0   CC or Exp. submit data (to a program).
PUT       HTTP/1.1   never      upload data (e.g. to a file).
DELETE    HTTP/1.1   never      remove resource (e.g. file).
TRACE     HTTP/1.1   never      appl. layer trace of request route.
OPTIONS   HTTP/1.1   never      request available comm. options.
CONNECT   HTTP/1.1r3 never      tunnel SSL connection.
ICP_QUERY Squid      never      used for ICP based exchanges.
PURGE     Squid      never      remove object from cache.
PROPFIND  rfc2518    ?          retrieve properties of an object.
PROPATCH  rfc2518    ?          change properties of an object.
MKCOL     rfc2518    never      create a new collection.
COPY      rfc2518    never      create a duplicate of src in dst.
MOVE      rfc2518    never      atomically move src to dst.
LOCK      rfc2518    never      lock an object against modifications.
UNLOCK    rfc2518    never      unlock an object.
Hierarchy Codes
The following hierarchy codes are used with Squid-2:
NONE For TCP HIT, TCP failures, cachemgr requests and all UDP requests, there is no hierarchy information.
DIRECT The object was fetched from the origin server.
SIBLING_HIT The object was fetched from a sibling cache which replied with UDP_HIT.
PARENT_HIT The object was requested from a parent cache which replied with UDP_HIT.
DEFAULT_PARENT No ICP queries were sent. This parent was chosen because it was marked "default" in the config file.
SINGLE_PARENT The object was requested from the only parent appropriate for the given URL.
FIRST_UP_PARENT The object was fetched from the first parent in the list of parents.
NO_PARENT_DIRECT The object was fetched from the origin server, because no parents existed for the given URL.
FIRST_PARENT_MISS The object was fetched from the parent with the fastest (possibly weighted) round trip time.
CLOSEST_PARENT_MISS This parent was chosen because it included the lowest RTT measurement to the origin server. See also the closest-only peer configuration option.
CLOSEST_PARENT The parent selection was based on our own RTT measurements.
CLOSEST_DIRECT Our own RTT measurements returned a shorter time than any parent.
NO_DIRECT_FAIL The object could not be requested because of a firewall configuration, see also never_direct and related material, and no parents were available.
SOURCE_FASTEST The origin site was chosen, because the source ping arrived fastest.
ROUNDROBIN_PARENT No ICP replies were received from any parent. The parent was chosen, because it was marked for round robin in the config file and had the lowest usage count.
CACHE_DIGEST_HIT The peer was chosen, because the cache digest predicted a hit. This option was later replaced in order to distinguish between parents and siblings.
CD_PARENT_HIT The parent was chosen, because the cache digest predicted a hit.
CD_SIBLING_HIT The sibling was chosen, because the cache digest predicted a hit.
NO_CACHE_DIGEST_DIRECT This output seems to be unused?
CARP The peer was selected by CARP.
ANY_PARENT part of src/peer_select.c:hier_strings[].
INVALID CODE part of src/peer_select.c:hier_strings[].
Almost any of these may be preceded by 'TIMEOUT_' if the two-second (default) timeout occurs waiting for all ICP replies to arrive from neighbors, see also the icp_query_timeout configuration option.
The following hierarchy codes were removed from Squid-2:
code                 meaning
-------------------- -------------------------------------------------
PARENT_UDP_HIT_OBJ   hit objects are no longer available.
SIBLING_UDP_HIT_OBJ  hit objects are no longer available.
SSL_PARENT_MISS      SSL can now be handled by squid.
FIREWALL_IP_DIRECT   No special logging for hosts inside the firewall.
LOCAL_IP_DIRECT      No special logging for local networks.
sending access.log to syslog
Squid 2.6 allows sending access.log contents to a local syslog server by specifying syslog as the file path, for example:
access_log syslog squid
customizable access.log
Squid 2.6 and later versions feature a customizable access.log format. To use this feature you must first define a log format name using the logformat directive, then use the extended access_log directive specifying your newly-defined logfile format.
defining a format
FIXME: complete this chapter
using a custom logfile format
FIXME: complete this chapter
cache/log (Squid-1.x)
This file has a rather unfortunate name. It also is often called the swap log. It is a record of every cache object written to disk. It is read when Squid starts up to "reload" the cache. If you remove this file when squid is NOT running, you will effectively wipe out your cache contents. If you remove this file while squid IS running, you can easily recreate it. The safest way is to simply shutdown the running process:
% squid -k shutdown
This will disrupt service, but at least you will have your swap log back. Alternatively, you can tell squid to rotate its log files. This also causes a clean swap log to be written.
% squid -k rotate
For Squid-1.1, there are six fields:
[1] fileno: The swap file number holding the object data. This is mapped to a pathname on your filesystem.
[2] timestamp: This is the time when the object was last verified to be current. The time is a hexadecimal representation of Unix time.
[3] expires: This is the value of the Expires header in the HTTP reply. If an Expires header was not present, this will be -2 or FFFFFFFE. If the Expires header was present, but invalid (unparsable), this will be -1 or FFFFFFFF.
[4] lastmod: Value of the HTTP reply Last-Modified header. If missing it will be -2, if invalid it will be -1.
[5] size: Size of the object, including headers.
[6] url: The URL naming this object.
swap.state (Squid-2.x)
In Squid-2, the swap log file is now called swap.state. This is a binary file that includes MD5 checksums, and StoreEntry fields. Please see the Programmers' Guide for information on the contents and format of that file.
If you remove swap.state while Squid is running, simply send Squid the signal to rotate its log files:
% squid -k rotate
Alternatively, you can tell Squid to shutdown and it will rewrite this file before it exits.
If you remove the swap.state while Squid is not running, you will not lose your entire cache. In this case, Squid will scan all of the cache directories and read each swap file to rebuild the cache. This can take a very long time, so you'll have to be patient.
By default the swap.state file is stored in the top-level of each cache_dir. You can move the logs to a different location with the cache_swap_log option.
Which log files can I delete safely?
You should never delete access.log, store.log, cache.log, or swap.state while Squid is running. Under Unix, you can delete a file while a process has the file opened. However, the filesystem space is not reclaimed until the process closes the file.
If you accidentally delete swap.state while Squid is running, you can recover it by following the instructions in the previous questions. If you delete the others while Squid is running, you can not recover them.
The correct way to maintain your log files is with Squid's "rotate" feature. You should rotate your log files at least once per day. The current log files are closed and then renamed with numeric extensions (.0, .1, etc). If you want to, you can write your own scripts to archive or remove the old log files. If not, Squid will only keep up to logfile_rotate versions of each log file. The logfile rotation procedure also writes a clean swap.state file, but it does not leave numbered versions of the old files.
If you set logfile_rotate to 0, Squid simply closes and then re-opens the logs. This allows third-party logfile management systems, such as newsyslog, to maintain the log files.
To rotate Squid's logs, simply use this command:
squid -k rotate
For example, use this cron entry to rotate the logs at midnight:
0 0 * * * /usr/local/squid/bin/squid -k rotate
How can I disable Squid's log files?
For Squid 2.4:
To disable access.log:
cache_access_log /dev/null
To disable store.log:
cache_store_log none
To disable cache.log:
cache_log /dev/null
For Squid 2.5:
To disable access.log:
cache_access_log none
To disable store.log:
cache_store_log none
To disable cache.log:
cache_log /dev/null
Note: It is a bad idea to disable the cache.log because this file contains many important status and debugging messages. However, if you really want to, you can.
Note: If /dev/null is specified for any of the above log files, logfile_rotate must also be set to 0, or Squid may rotate away /dev/null and turn it into a plain log file.
Note: Instead of disabling the log files, it is advisable to use a smaller value for logfile_rotate and to properly rotate Squid's log files from your cron. That way, your log files are more controllable and self-maintained by your system.
What is the maximum size of access.log?
Squid does not impose a size limit on its log files. Some operating systems have a maximum file size limit, however. If a Squid log file exceeds the operating system's size limit, Squid receives a write error and shuts down. You should regularly rotate Squid's log files so that they do not become very large.
Note: Logging is very important to Squid. In fact, it is so important that it will shut itself down if it can't write to its logfiles. This includes cases such as a full log disk, or logfiles getting too big.
My log files get very big!
You need to rotate your log files with a cron job. For example:
0 0 * * * /usr/local/squid/bin/squid -k rotate
I want to use another tool to maintain the log files.
If you set logfile_rotate to 0, Squid simply closes and then re-opens the logs. This allows third-party logfile management systems, such as newsyslog or logrotate, to maintain the log files.
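With logfile_rotate set to 0, the external tool does the renaming and then asks Squid to reopen its files. A hedged logrotate sketch; the paths are assumptions, so adjust them to your installation:

```
/usr/local/squid/logs/access.log {
    daily
    rotate 7
    compress
    postrotate
        /usr/local/squid/bin/squid -k rotate
    endscript
}
```

Since logfile_rotate is 0, the `squid -k rotate` in postrotate simply closes and re-opens the logs, so the rename done by logrotate takes effect cleanly.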
Managing log files
The preferred log file for analysis is the access.log file in native format. For long term evaluations, the log file should be obtained at regular intervals. Squid offers an easy to use API for rotating log files, in order that they may be moved (or removed) without disturbing the cache operations in progress. The procedures were described above.
Depending on the disk space allocated for log file storage, it is recommended to set up a cron job which rotates the log files every 24, 12, or 8 hours. You will need to set your logfile_rotate to a sufficiently large number. During a time of some idleness, you can safely transfer the log files to your analysis host in one burst.
Before transport, the log files can be compressed during off-peak time. On the analysis host, the log files are concatenated into one file, so the yield is one file per 24 hours. Also note that with log_icp_queries enabled, you might have around 1 GB of uncompressed log information per day per busy cache. Look into your cache manager info page to make an educated guess on the size of your log files.
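The concatenation step is simpler than it sounds, because gzip streams concatenate cleanly. A sketch with throwaway temporary files standing in for the rotated slices of one day:

```shell
# Sketch: rotated-and-compressed log slices from one day concatenate
# into a single valid gzip file, which is handy on the analysis host.
# File names here are throwaway temporaries, not real log paths.
dir=$(mktemp -d)
printf 'slice one\n' | gzip > "$dir/access.0.gz"
printf 'slice two\n' | gzip > "$dir/access.1.gz"
cat "$dir"/access.*.gz > "$dir/day.gz"   # gzip members concatenate cleanly
gunzip -c "$dir/day.gz"                  # prints both slices in order
rm -rf "$dir"
```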
The EU project DESIRE developed some basic rules to obey when handling and processing log files:
- Respect the privacy of your clients when publishing results.
- Keep logs unavailable unless anonymized. Most countries have laws on privacy protection, and some even on how long you are legally allowed to keep certain kinds of information.
- Rotate and process log files at least once a day. Even if you don't process the log files, they will grow quite large, see My log files get very big above. If you rely on processing the log files, reserve a large enough partition solely for log files.
- Keep the size in mind when processing. It might take longer to process log files than to generate them!
- Limit yourself to the numbers you are interested in. There is data beyond your dreams available in your log file, some quite obvious, others by combination of different views. Here are some examples for figures to watch:
- The hosts using your cache.
- The elapsed time for HTTP requests - this is the latency the user sees. Usually, you will want to make a distinction for HITs and MISSes and overall times. Also, medians are preferred over averages.
- The requests handled per interval (e.g. second, minute or hour).
Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?
This message means that the requested object was in "Delete Behind" mode and the user aborted the transfer. An object will go into "Delete Behind" mode if:
- It is larger than maximum_object_size.
- It is being fetched from a neighbor which has the proxy-only option set.
What does ERR_LIFETIME_EXP mean?
This means that a timeout occurred while the object was being transferred. Most likely the retrieval of this object was very slow (or it stalled before finishing) and the user aborted the request. However, depending on your settings for quick_abort, Squid may have continued to try retrieving the object. Squid imposes a maximum amount of time on all open sockets, so after some amount of time the stalled request was aborted and logged with an ERR_LIFETIME_EXP message.
Retrieving "lost" files from the cache
"I've been asked to retrieve an object which was accidentally destroyed at the source for recovery. So, how do I figure out where the things are so I can copy them out and strip off the headers?"
The following method applies only to the Squid-1.1 versions:
Use grep to find the named object (URL) in the cache/log file (the swap log). The first field in this file is an integer file number.
Then, find the file fileno-to-pathname.pl from the "scripts" directory of the Squid source distribution. The usage is
perl fileno-to-pathname.pl [-c squid.conf]
File numbers are read on stdin, and pathnames are printed on stdout.
Can I use store.log to figure out if a response was cachable?
Sort of. You can use store.log to find out if a particular response was cached.
Cached responses are logged with the SWAPOUT tag. Uncached responses are logged with the RELEASE tag.
However, your analysis must also consider that when a cached response is removed from the cache (for example due to cache replacement) it is also logged in store.log with the RELEASE tag. To differentiate these two, you can look at the file number (fourth) field. When an uncachable response is released, the file number is FFFFFFFF (-1). Any other file number indicates that a cached response was released.
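The distinction can be automated with a short awk filter. A sketch against fabricated store.log lines, following the column layout documented earlier:

```shell
# Sketch: classify store.log RELEASE lines. A file number (field 4)
# of FFFFFFFF means the response was never cached; any other value
# means a previously cached object was evicted. Both sample lines
# below are fabricated for illustration.
awk '$2 == "RELEASE" { tag = ($4 == "FFFFFFFF") ? "uncachable" : "evicted"; print tag, $13 }' <<'EOF'
855133060.495 RELEASE 00 FFFFFFFF D41D8CD98F00B204E9800998ECF8427E 200 -1 -1 -1 text/html 500/500 GET http://example.com/nocache
855133061.200 RELEASE 00 0000EEC2 0CC175B9C0F1B6A831C399E269772661 200 -1 -1 -1 text/html 900/900 GET http://example.com/evicted
EOF
# prints: uncachable http://example.com/nocache
#         evicted http://example.com/evicted
```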
Can I pump the squid access.log directly into a pipe?
Several people have asked for this, usually to feed the log into some kind of external database, or to analyze them in real-time.
The answer is No. Well, yes, sorta. But you have to be very careful, and Squid doesn't encourage or help it in any way, as it opens up a whole load of possible problems.
Note: Logging is very important to Squid. In fact, it is so important that it will shut itself down if it can't write to its logfiles.
There's a whole load of possible problems, security risks and DOS scenarios that emerge if Squid allowed writing log files to some external program (for instance via a pipe). For instance, how should Squid behave if the output program crashes? Or if it can't keep up with the load? Or if it blocks? So the safest path was chosen, and that means sticking to writing to files.
There are a few tricks that can be used to work around this:
using the tail -f UNIX command on access.log
It will keep on reading the access.log file and write to stdout (or to a pipe) the lines being added in almost real time. Unfortunately it doesn't behave correctly if the access.log file gets renamed (via link/unlink, which is what squid does on -k rotate): tail will happily keep the old file open, but no one is writing to it anymore.
using the tail -F feature of GNU tail
- GNU tail supports an extra option, which allows it to notice if a file gets renamed and recreated.
using File::Tail from within a PERL script
File::Tail behaves like tail -F. It is however only available in PERL.
It's unfortunately highly unlikely that either of those will work under MS Windows, due to its brain-dead file-sharing semantics.
Note: Anyone with good MS Windows experience or knowing any better is invited to amend the previous sentence.
If you really really want to send your squid logs to some external script, AND you're really really sure you know what you're doing (but then again, if you're doing this, you probably don't know what you're doing), you can use the UNIX command mkfifo to create a named pipe. You need to:
- create a named pipe (i.e. with the command mkfifo /var/log/squid/access.log)
- attach a daemonized text-processor to it (i.e. (/usr/local/sbin/text-processor.pl /var/log/squid/access.log)& )
- start squid
The problem with this approach is that if the text-processor blocks, squid blocks. If it crashes, in the best case squid blocks until the processor is restarted. In the second-best case, squid crashes or aborts. There is no worst case (than this).
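The steps above can be exercised safely on a throwaway fifo, with plain cat standing in for the log processor. Note how the reader must be attached before the writer, or the writer blocks, which is exactly the risk described above:

```shell
# Sketch of the named-pipe steps on a throwaway fifo, NOT the real
# access.log path; 'cat' stands in for the daemonized text-processor.
fifo=$(mktemp -u)
out=$(mktemp)
mkfifo "$fifo"                       # step 1: create the named pipe
cat "$fifo" > "$out" &               # step 2: attach a reader first
echo "sample log line" > "$fifo"     # step 3: the "squid" side writes
wait
cat "$out"                           # prints: sample log line
rm -f "$fifo" "$out"
```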
Contents
- How do I see system level Squid statistics?
- How can I find the biggest objects in my cache?
- I want to restart Squid with a clean cache
- How can I proxy/cache Real Audio?
- How can I purge an object from my cache?
- How can i purge multiple objects from my cache?
- Using ICMP to Measure the Network
- Why are so few requests logged as TCP_IMS_MISS?
- How can I make Squid NOT cache some servers or URLs?
- How can I delete and recreate a cache directory?
- Why can't I run Squid as root?
- Can you tell me a good way to upgrade Squid with minimal downtime?
- Can Squid listen on more than one HTTP port?
- Can I make origin servers see the client's IP address when going through Squid?
How do I see system level Squid statistics?
The Squid distribution includes a CGI utility called cachemgr.cgi which can be used to view squid statistics with a web browser. See ../CacheManager for more information on its usage and installation.
How can I find the biggest objects in my cache?
sort -r -n +4 -5 access.log | awk '{print $5, $7}' | head -25
If your cache processes several hundred hits per second, good luck.
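A portable variant of the same idea is shown below against a fabricated three-line log (field 5 is the reply size and field 7 the URL, as in Squid's native log format); sorting after awk avoids the obsolete `+4 -5` sort key syntax:

```shell
# Build a tiny fake access.log for demonstration purposes only.
log="$(mktemp)"
cat > "$log" <<'EOF'
866351111.111 23 10.0.0.1 TCP_HIT/200 5120 GET http://example.com/big.zip - NONE/- application/zip
866351112.222 12 10.0.0.2 TCP_MISS/200 512 GET http://example.com/small.gif - DIRECT/10.1.1.1 image/gif
866351113.333 40 10.0.0.3 TCP_HIT/200 2048 GET http://example.com/mid.html - NONE/- text/html
EOF

# Print "size URL" pairs, largest first.
biggest=$(awk '{print $5, $7}' "$log" | sort -rn | head -25)
printf '%s\n' "$biggest"
```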
I want to restart Squid with a clean cache
Note: The information here is current for version 2.2 and later.
First of all, you must stop Squid of course. You can use the command:
% squid -k shutdown
The fastest way to restart with an entirely clean cache is to overwrite the swap.state files for each cache_dir in your config file. Note that you cannot simply remove the swap.state file or truncate it to zero size. Instead, you should put just one byte of garbage there. For example:
% echo "" > /cache1/swap.state
Repeat that for every cache_dir, then restart Squid. Be sure to leave the swap.state file with the same owner and permissions that it had before!
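The step above can be sketched as a loop over your cache_dirs; this demonstration uses temporary directories as stand-ins for real locations such as /cache1:

```shell
# Stand-ins for real cache_dir locations.
c1="$(mktemp -d)"; c2="$(mktemp -d)"
printf 'old swap.state contents' > "$c1/swap.state"
printf 'old swap.state contents' > "$c2/swap.state"

for dir in "$c1" "$c2"; do
  # '>' truncates the existing file in place, so its owner and
  # permissions are preserved; echo leaves one newline byte behind.
  echo "" > "$dir/swap.state"
done
```

Truncating in place (rather than rm followed by a new file) is what keeps the owner and permissions intact, as the text requires.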
Another way, which takes longer, is to have squid recreate all the cache_dir directories. But first you must move the existing directories out of the way. For example, you can try this:
% cd /cache1
% mkdir JUNK
% mv ?? swap.state* JUNK
% rm -rf JUNK &
Repeat this for your other cache_dirs, then tell Squid to create new directories:
% squid -z
How can I proxy/cache Real Audio?
by Rodney van den Oever and James R Grinter
Point the RealPlayer at your Squid server's HTTP port (e.g. 3128).
Using the Preferences->Transport tab, select Use specified transports and with the Specified Transports button, select use HTTP Only.
The RealPlayer (and RealPlayer Plus) manual states:
Use HTTP Only Select this option if you are behind a firewall and cannot receive data through TCP. All data will be streamed through HTTP. Note: You may not be able to receive some content if you select this option.
Again, from the documentation:
RealPlayer 4.0 identifies itself to the firewall when making a request for content to a RealServer. The following string is attached to any URL that the Player requests using HTTP GET: /SmpDsBhgRl Thus, to identify an HTTP GET request from the RealPlayer, look for: http://[^/]+/SmpDsBhgRl The Player can also be identified by the mime type in a POST to the RealServer. The RealPlayer POST has the following mime type: "application/x-pncmd"
Note that the first request is a POST, and the second has a '?' in the URL, so standard Squid configurations would treat it as non-cachable. It also looks rather "magic."
HTTP is an alternative delivery mechanism introduced with version 3 players, and it allows a reasonable approximation to "streaming" data - that is playing it as you receive it.
It isn't available in the general case: only if someone has made the realaudio file available via an HTTP server, or they're using a version 4 server, they've switched it on, and you're using a version 4 client. If someone has made the file available via their HTTP server, then it'll be cachable. Otherwise, it won't be (as far as we can tell.)
The more common RealAudio link connects via their own pnm: method and is transferred using their proprietary protocol (via TCP or UDP) and not using HTTP. It can't be cached nor proxied by Squid, and requires something such as the simple proxy that Progressive Networks themselves have made available, if you're in a firewall/no direct route situation. Their product does not cache (and I don't know of any software available that does.)
Some confusion arises because there is also a configuration option to use an HTTP proxy (such as Squid) with the RealAudio/RealVideo players. This is because the players can fetch the ".ram" file that contains the pnm: reference for the audio/video stream. They fetch that .ram file from an HTTP server, using HTTP.
How can I purge an object from my cache?
Squid does not allow you to purge objects unless it is configured with access controls in squid.conf. First you must add something like
acl PURGE method PURGE
acl localhost src 127.0.0.1
http_access allow PURGE localhost
http_access deny PURGE
The above only allows purge requests which come from the local host and denies all other purge requests.
To purge an object, you can use the squidclient program:
squidclient -m PURGE http://www.miscreant.com/
If the purge was successful, you will see a "200 OK" response:
HTTP/1.0 200 OK
Date: Thu, 17 Jul 1997 16:03:32 GMT
Server: Squid/1.1.14
If the object was not found in the cache, you will see a "404 Not Found" response:
HTTP/1.0 404 Not Found
Date: Thu, 17 Jul 1997 16:03:22 GMT
Server: Squid/1.1.14
How can I purge multiple objects from my cache?
It's not possible; you have to purge the objects one by one by URL. This is because Squid doesn't keep in memory the URL of every object it stores, only a compact representation of it (a hash). Finding the hash given the URL is easy; the other way around is not possible.
Purging by wildcard, by domain etc. are unfortunately not possible at this time.
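Since each object must be purged individually, a small loop over a URL list is the usual workaround. This sketch only prints the squidclient commands (a dry run); pipe its output to sh once the list looks right. The URLs are, of course, made up:

```shell
urls='http://example.com/one.html
http://example.com/two.html'

# Emit one PURGE command per URL instead of executing it directly.
cmds=$(printf '%s\n' "$urls" | while IFS= read -r u; do
  printf 'squidclient -m PURGE %s\n' "$u"
done)
printf '%s\n' "$cmds"
```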
Using ICMP to Measure the Network
As of version 1.1.9, Squid is able to utilize ICMP Round-Trip-Time (RTT) measurements to select the optimal location to forward a cache miss. Previously, cache misses would be forwarded to the parent cache which returned the first ICP reply message. These were logged with FIRST_PARENT_MISS in the access.log file. Now we can select the parent which is closest (RTT-wise) to the origin server.
Supporting ICMP in your Squid cache
It is more important that your parent caches enable the ICMP features. If you are acting as a parent, then you may want to enable ICMP on your cache. Also, if your cache makes RTT measurements, it will fetch objects directly if your cache is closer than any of the parents.
If you want your Squid cache to measure RTT's to origin servers, Squid must be compiled with the USE_ICMP option. This is easily accomplished by uncommenting "-DUSE_ICMP=1" in src/Makefile and/or src/Makefile.in.
An external program called pinger is responsible for sending and receiving ICMP packets. It must run with root privileges. After Squid has been compiled, the pinger program must be installed separately. A special Makefile target will install pinger with appropriate permissions.
% make install
% su
# make install-pinger
There are three configuration file options for tuning the measurement database on your cache. netdb_low and netdb_high specify low and high water marks for keeping the database to a certain size (just like with the IP cache). The netdb_ttl option specifies the minimum interval between pings to the same site. If netdb_ttl is set to 300 seconds (5 minutes), then an ICMP packet will not be sent to the same site more than once every five minutes. Note that a site is only pinged when an HTTP request for the site is received.
Another option, minimum_direct_hops can be used to try finding servers which are close to your cache. If the measured hop count to the origin server is less than or equal to minimum_direct_hops, the request will be forwarded directly to the origin server.
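Put together, a squid.conf fragment using these options might look like the following (the specific values are illustrative, not recommendations):

```
netdb_low  900
netdb_high 1000
netdb_ttl  5 minutes
minimum_direct_hops 4
```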
Utilizing your parents database
Your parent caches can be asked to include the RTT measurements in their ICP replies. To do this, you must enable query_icmp in your config file:
query_icmp on
This causes a flag to be set in your outgoing ICP queries.
If your parent caches return ICMP RTT measurements then the eighth column of your access.log will have lines similar to:
CLOSEST_PARENT_MISS/it.cache.nlanr.net
In this case, it means that it.cache.nlanr.net returned the lowest RTT to the origin server. If your cache measured a lower RTT than any of the parents, the request will be logged with
CLOSEST_DIRECT/www.sample.com
Inspecting the database
The measurement database can be viewed from the cachemgr by selecting "Network Probe Database." Hostnames are aggregated into /24 networks. All measurements made are averaged over time. Measurements are made to specific hosts, taken from the URLs of HTTP requests. The recv and sent fields are the number of ICMP packets sent and received. At this time they are only informational.
A typical database entry looks something like this:
Network       recv/sent   RTT  Hops  Hostnames
192.41.10.0     20/ 21    82.3   6.0  www.jisedu.org www.dozo.com
  bo.cache.nlanr.net      42.0   7.0
  uc.cache.nlanr.net      48.0  10.0
  pb.cache.nlanr.net      55.0  10.0
  it.cache.nlanr.net     185.0  13.0
This means we have sent 21 pings to both www.jisedu.org and www.dozo.com. The average RTT is 82.3 milliseconds. The next four lines show the measured values from our parent caches. Since bo.cache.nlanr.net has the lowest RTT, it would be selected as the location to forward a request for a www.jisedu.org or www.dozo.com URL.
Why are so few requests logged as TCP_IMS_MISS?
When Squid receives an If-Modified-Since request, it will not forward the request unless the object needs to be refreshed according to the refresh_pattern rules. If the request does need to be refreshed, then it will be logged as TCP_REFRESH_HIT or TCP_REFRESH_MISS.
If the request is not forwarded, Squid replies to the IMS request according to the object in its cache. If the modification times are the same, then Squid returns TCP_IMS_HIT. If the modification times are different, then Squid returns TCP_IMS_MISS. In most cases, the cached object will not have changed, so the result is TCP_IMS_HIT. Squid will only return TCP_IMS_MISS if some other client causes a newer version of the object to be pulled into the cache.
How can I make Squid NOT cache some servers or URLs?
In Squid-2, you use the cache option to specify uncachable requests. For example, this makes all responses from origin servers in the 10.0.1.0/24 network uncachable:
acl Local dst 10.0.1.0/24
cache deny Local
This example makes all URLs ending in '.html' uncachable:
acl HTML url_regex .html$
cache deny HTML
This example makes a specific URL uncachable:
acl XYZZY url_regex ^http://www.i.suck.com/foo.html$
cache deny XYZZY
This example caches nothing between the hours of 8AM to 11AM:
acl Morning time 08:00-11:00
cache deny Morning
In Squid-1.1, whether or not an object gets cached is controlled by the cache_stoplist and cache_stoplist_pattern options. So, you may add:
cache_stoplist my.domain.com
How can I delete and recreate a cache directory?
Deleting an existing cache directory is not too difficult. Unfortunately, you can't simply change squid.conf and then reconfigure. You can't stop using a cache_dir while Squid is running. Also note that Squid requires at least one cache_dir to run.
Edit your squid.conf file and comment out, or delete the cache_dir line for the cache directory that you want to remove.
If you don't have any cache_dir lines in your squid.conf, then Squid was using the default. You'll need to add a new cache_dir line because Squid will continue to use the default otherwise. You can add a small, temporary directory, for example:
/usr/local/squid/cachetmp ....
If you add a new cache_dir you have to run squid -z to initialize that directory.
Remember that you cannot delete a cache directory from a running Squid process; you cannot simply reconfigure Squid. You must shut down Squid:
squid -k shutdown
Once Squid exits, you may immediately start it up again. Since you deleted the old cache_dir from squid.conf, Squid won't try to access that directory. If you use the RunCache script, Squid should start up again automatically.
Now Squid is no longer using the cache directory that you removed from the config file. You can verify this by checking "Store Directory" information with the cache manager. From the command line, type:
squidclient mgr:storedir
Now that Squid is not using the cache directory, you can rm -rf it, format the disk, build a new filesystem, or whatever.
The procedure for recreating a cache directory is similar:
- Edit squid.conf and add a new cache_dir line.
- Shut down Squid (squid -k shutdown).
- Initialize the new directory by running:
% squid -z
- Start Squid again.
Why can't I run Squid as root?
by Dave J Woolley
If someone were to discover a buffer overrun bug in Squid while it runs as a user other than root, they could only corrupt the files writable by that user; but if it runs as root, they could take over the whole machine. This applies to all programs that don't absolutely need root privileges, not just Squid.
Can you tell me a good way to upgrade Squid with minimal downtime?
Here is a technique that was described by Radu Greab.
Start a second Squid server on an unused HTTP port (say 4128). This instance of Squid probably doesn't need a large disk cache. When this second server has finished reloading the disk store, swap the http_port values in the two squid.conf files. Set the original Squid to use port 5128, and the second one to use 3128. Next, run "squid -k reconfigure" for both Squids. New requests will go to the second Squid, now on port 3128 and the first Squid will finish handling its current requests. After a few minutes, it should be safe to fully shut down the first Squid and upgrade it. Later you can simply repeat this process in reverse.
Can Squid listen on more than one HTTP port?
Note: The information here is current for version 2.3.
Yes, you can specify multiple http_port lines in your squid.conf file. Squid attempts to bind() to each port that you specify. Sometimes Squid may not be able to bind to a port, either because of permissions or because the port is already in use. If Squid can bind to at least one port, then it will continue running. If it can not bind to any of the ports, then Squid stops.
With version 2.3 and later you can specify IP addresses and port numbers together (see the squid.conf comments).
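For example, the following squid.conf fragment (addresses and ports chosen for illustration) makes Squid listen on two wildcard ports plus one specific address:

```
http_port 3128
http_port 8080
http_port 192.168.1.1:3129
```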
Can I make origin servers see the client's IP address when going through Squid?
Normally you cannot. Most TCP/IP stacks do not allow applications to create sockets with the local endpoint assigned to a foreign IP address. However, some folks have some patches to Linux that allow exactly that.
In this situation, you must ensure that all HTTP packets destined for the client IP addresses are routed to the Squid box. If the packets take another path, the real clients will send TCP resets to the origin servers, thereby breaking the connections.
Contents
- Why does Squid use so much memory!?
- How can I tell how much memory my Squid process is using?
- My Squid process grows without bounds.
- I set cache_mem to XX, but the process grows beyond that!
- How do I analyze memory usage from the cache manager output?
- The "Total memory accounted" value is less than the size of my Squid process.
- xmalloc: Unable to allocate 4096 bytes!
- fork: (12) Cannot allocate memory
- What can I do to reduce Squid's memory usage?
- Using an alternate malloc library
- How much memory do I need in my Squid server?
- Why can't my Squid process grow beyond a certain size?
Why does Squid use so much memory!?
Squid uses a lot of memory for performance reasons. It takes much, much longer to read something from disk than it does to read directly from memory.
A small amount of metadata for each cached object is kept in memory. This is the StoreEntry data structure. For Squid-2 this is 56-bytes on "small" pointer architectures (Intel, Sparc, MIPS, etc) and 88-bytes on "large" pointer architectures (Alpha). In addition, there is a 16-byte cache key (MD5 checksum) associated with each StoreEntry. This means there are 72 or 104 bytes of metadata in memory for every object in your cache. A cache with 1,000,000 objects therefore requires 72MB of memory for metadata only. In practice it requires much more than that.
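The arithmetic above can be checked mechanically; this sketch uses the small-pointer figures (56 + 16 bytes) and decimal megabytes:

```shell
objects=1000000            # cached objects
per_object=$((56 + 16))    # StoreEntry + MD5 cache key, small-pointer CPUs
total_mb=$(( objects * per_object / 1000000 ))
echo "${total_mb} MB of metadata for ${objects} objects"
```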
Other uses of memory by Squid include:
- Disk buffers for reading and writing
- Network I/O buffers
- IP Cache contents
- FQDN Cache contents
- Netdb ICMP measurement database
- Per-request state information, including full request and reply headers
- Miscellaneous statistics collection.
- "Hot objects" which are kept entirely in memory.
How can I tell how much memory my Squid process is using?
One way is to simply look at ps output on your system. For BSD-ish systems, you probably want to use the -u option and look at the VSZ and RSS fields:
wessels ~ 236% ps -axuhm
USER     PID %CPU %MEM    VSZ    RSS  TT  STAT STARTED    TIME COMMAND
squid   9631  4.6 26.4 141204 137852  ??  S   10:13PM 78:22.80 squid -NCYs
For SYSV-ish systems, you probably want to use the -l option. When interpreting the ps output, be sure to check your ps manual page. It may not be obvious whether the reported numbers are kbytes or pages (usually 4 kb).
A nicer way to check the memory usage is with a program called top:
last pid: 20128;  load averages:  0.06,  0.12,  0.11             14:10:58
46 processes:  1 running, 45 sleeping
CPU states:     % user,     % nice,     % system,     % interrupt,     % idle
Mem: 187M Active, 1884K Inact, 45M Wired, 268M Cache, 8351K Buf, 1296K Free
Swap: 1024M Total, 256K Used, 1024M Free

  PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
 9631 squid      2    0  138M   135M select  78:45  3.93%  3.93% squid
Finally, you can ask the Squid process to report its own memory usage. This is available on the Cache Manager info page. Your output may vary depending upon your operating system and Squid version, but it looks similar to this:
Resource usage for squid:
Maximum Resident Size: 137892 KB
Memory usage for squid via mstats():
Total space in arena:  140144 KB
Total free:              8153 KB 6%
If your RSS (Resident Set Size) value is much lower than your process size, then your cache performance is most likely suffering due to paging. See also ../CacheManager.
My Squid process grows without bounds.
You might just have your cache_mem parameter set too high. See What can I do to reduce Squid's memory usage? below.
When a process continually grows in size, without levelling off or slowing down, it often indicates a memory leak. A memory leak is when some chunk of memory is used, but not free'd when it is done being used.
Memory leaks are a real problem for programs (like Squid) which do all of their processing within a single process. Historically, Squid has had real memory leak problems. But as the software has matured, we believe almost all of Squid's memory leaks have been eliminated, and new ones are at least easy to identify.
Memory leaks may also be present in your system's libraries, such as libc.a or even libmalloc.a. If you experience the ever-growing process size phenomenon, we suggest you first try an alternate malloc library (see below).
I set cache_mem to XX, but the process grows beyond that!
The cache_mem parameter does NOT specify the maximum size of the process. It only specifies how much memory to use for caching "hot" (very popular) replies. Squid's actual memory usage depends very strongly on your cache size (disk space) and your incoming request load. Reducing cache_mem will usually also reduce the process size, but not necessarily, and there are other ways to reduce Squid's memory usage (see below).
See also How much memory do I need in my Squid server?.
How do I analyze memory usage from the cache manager output?
Note: This information is specific to Squid-1.1 versions
Look at your cachemgr.cgi Cache Information page. For example:
Memory usage for squid via mallinfo():
        Total space in arena:   94687 KB
        Ordinary blocks:        32019 KB 210034 blks
        Small blocks:           44364 KB 569500 blks
        Holding blocks:             0 KB   5695 blks
        Free Small blocks:       6650 KB
        Free Ordinary blocks:   11652 KB
        Total in use:           76384 KB 81%
        Total free:             18302 KB 19%
Meta Data:
StoreEntry                 246043 x 64 bytes   = 15377 KB
IPCacheEntry                  971 x 88 bytes   =    83 KB
Hash link                       2 x 24 bytes   =     0 KB
URL strings                                    = 11422 KB
Pool MemObject structures     514 x 144 bytes  =    72 KB (    70 free)
Pool for Request structur     516 x 4380 bytes =  2207 KB (  2121 free)
Pool for in-memory object    6200 x 4096 bytes = 24800 KB ( 22888 free)
Pool for disk I/O             242 x 8192 bytes =  1936 KB (  1888 free)
Miscellaneous                                  =  2600 KB
total Accounted                                = 58499 KB
First note that mallinfo() reports 94M in "arena." This is pretty close to what top says (97M).
Of that 94M, 81% (76M) is actually being used at the moment. The rest has been freed, or pre-allocated by malloc(3) and not yet used.
Of the 76M in use, we can account for 58.5M (76%). There are some calls to malloc(3) for which we can't account.
The Meta Data list gives the breakdown of where the accounted memory has gone. 45% has gone to StoreEntry and URL strings. Another 42% has gone to buffering objects in VM while they are fetched and relayed to the clients (Pool for in-memory object).
The pool sizes are specified by squid.conf parameters. In version 1.0, these pools are somewhat broken: we keep a stack of unused pages instead of freeing the block. In the Pool for in-memory object, the unused stack size is 1/2 of cache_mem. The Pool for disk I/O is hardcoded at 200. For MemObject and Request it's 1/8 of your system's FD_SETSIZE value.
If you need to lower your process size, we recommend lowering the max object sizes in the 'http', 'ftp' and 'gopher' config lines. You may also want to lower cache_mem to suit your needs. But if you make cache_mem too low, then some objects may not get saved to disk during high-load periods. Newer Squid versions allow you to set memory_pools off to disable the free memory pools.
The "Total memory accounted" value is less than the size of my Squid process.
We are not able to account for all memory that Squid uses. This would require excessive amounts of code to keep track of every last byte. We do our best to account for the major uses of memory.
Also, note that the malloc and free functions have their own overhead. Some additional memory is required to keep track of which chunks are in use, and which are free. Additionally, most operating systems do not allow processes to shrink in size. When a process gives up memory by calling free, the total process size does not shrink. So the process size really represents the maximum size your Squid process has reached.
xmalloc: Unable to allocate 4096 bytes!
Messages like "FATAL: xcalloc: Unable to allocate 4096 blocks of 1 bytes!" appear when Squid can't allocate more memory. On most operating systems (including BSD) there are only two possible reasons:
- The machine is out of swap
- The process' maximum data segment size has been reached
The first case is detected using the normal swap monitoring tools available on the platform (e.g. pstat on SunOS, and perhaps on BSD as well).
To tell if it is the second case, first rule out the first case and then monitor the size of the Squid process. If it dies at a certain size with plenty of swap left, then the maximum data segment size has without doubt been reached.
The data segment size can be limited by two factors:
- Kernel imposed maximum, which no user can go above
- The size set with ulimit, which the user can control.
When Squid starts, it sets the data and file ulimits to the hard level. If you manually tune ulimit before starting Squid, make sure that you set the hard limit and not only the soft limit (by default, ulimit changes only the soft limit). Only root is allowed to raise the hard limit.
This command prints the hard limits:
ulimit -aH
This command sets the data size to unlimited:
ulimit -HSd unlimited
BSD/OS
by Arjan de Vet
The default kernel limit on BSD/OS for datasize is 64MB (at least on 3.0 which I'm using).
Recompile a kernel with larger datasize settings:
maxusers 128
# Support for large inpcb hash tables, e.g. busy WEB servers.
options INET_SERVER
# support for large routing tables, e.g. gated with full Internet routing:
options "KMEMSIZE=\(16*1024*1024\)"
options "DFLDSIZ=\(128*1024*1024\)"
options "DFLSSIZ=\(8*1024*1024\)"
options "SOMAXCONN=128"
options "MAXDSIZ=\(256*1024*1024\)"
See /usr/share/doc/bsdi/config.n for more info.
In /etc/login.conf I have this:
default:\
        :path=/bin /usr/bin /usr/contrib/bin:\
        :datasize-cur=256M:\
        :openfiles-cur=1024:\
        :openfiles-max=1024:\
        :maxproc-cur=1024:\
        :stacksize-cur=64M:\
        :radius-challenge-styles=activ,crypto,skey,snk,token:\
        :tc=auth-bsdi-defaults:\
        :tc=auth-ftp-bsdi-defaults:
#
# Settings used by /etc/rc and root
# This must be set properly for daemons started as root by inetd as well.
# Be sure reset these values back to system defaults in the default class!
#
daemon:\
        :path=/bin /usr/bin /sbin /usr/sbin:\
        :widepasswords:\
        :tc=default:
#       :datasize-cur=128M:\
#       :openfiles-cur=256:\
#       :maxproc-cur=256:\
This should give enough space for a 256MB squid process.
FreeBSD (2.2.X)
by [wessels Duane Wessels]
The procedure is almost identical to that for BSD/OS above. Increase the open filedescriptor limit in /sys/conf/param.c:
int     maxfiles = 4096;
int     maxfilesperproc = 1024;
Increase the maximum and default data segment size in your kernel config file, e.g. /sys/conf/i386/CONFIG:
options "MAXDSIZ=(512*1024*1024)" options "DFLDSIZ=(128*1024*1024)"
We also found it necessary to increase the number of mbuf clusters:
options "NMBCLUSTERS=10240"
And, if you have more than 256 MB of physical memory, you probably have to disable BOUNCE_BUFFERS (whatever that is), so comment out this line:
#options BOUNCE_BUFFERS #include support for DMA bounce buffers
Also, update limits in /etc/login.conf:
# Settings used by /etc/rc
#
daemon:\
        :coredumpsize=infinity:\
        :datasize=infinity:\
        :maxproc=256:\
        :maxproc-cur@:\
        :memoryuse-cur=64M:\
        :memorylocked-cur=64M:\
        :openfiles=4096:\
        :openfiles-cur@:\
        :stacksize=64M:\
        :tc=default:
And don't forget to run "cap_mkdb /etc/login.conf" after editing that file.
OSF, Digital Unix
by Ong Beng Hui
To increase the data size for Digital UNIX, edit the file /etc/sysconfigtab and add the entry...
proc: per-proc-data-size=1073741824
Or, with csh, use the limit command, such as
> limit datasize 1024M
Editing /etc/sysconfigtab requires a reboot, but the limit command doesn't.
fork: (12) Cannot allocate memory
When Squid is reconfigured (SIGHUP) or the logs are rotated (SIGUSR1), some of the helper processes (dnsserver) must be killed and restarted. If your system does not have enough virtual memory, the Squid process may not be able to fork to start the new helper processes. This is because of the UNIX way of starting child processes: the fork() system call temporarily duplicates the whole Squid process, and when many child processes are started rapidly, such as on "squid -k rotate", memory usage can temporarily grow to many times the normal amount because several temporary copies of the whole process exist at once.
The best way to fix this is to increase your virtual memory by adding swap space. Normally your system uses raw disk partitions for swap space, but most operating systems also support swapping on regular files (Digital Unix excepted). See your system manual pages for swap, swapon, and mkfile. Alternatively, you can use the sleep_after_fork directive to make Squid sleep a little while invoking helpers, to allow each helper to start up before the next one is started. This can be helpful if you find that Squid sometimes fails to restart all helpers on "squid -k reconfigure".
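A squid.conf fragment for the second workaround might look like this (the delay value is only an example; sleep_after_fork takes microseconds):

```
# pause 1000 microseconds (1 ms) after each helper fork
sleep_after_fork 1000
```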
What can I do to reduce Squid's memory usage?
If your cache performance is suffering because of memory limitations, you might consider buying more memory. But if that is not an option, there are a number of things to try:
- Try a different malloc library (see below)
- Reduce the cache_mem parameter in the config file. This controls how many "hot" objects are kept in memory. Reducing this parameter will not significantly affect performance, but you may receive some warnings in cache.log if your cache is busy.
- Turn memory_pools off in the config file. This causes Squid to give up unused memory by calling free() instead of holding on to the chunk for potential future use. Generally speaking, this is a bad idea as it will induce heap fragmentation. Use memory_pools_limit instead.
- Reduce the cache_swap parameter in your config file. This will reduce the number of objects Squid keeps. Your overall hit ratio may go down a little, but your cache will perform significantly better.
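The config-file suggestions above might combine into a fragment like the following (the values are illustrative only):

```
cache_mem 8 MB
# prefer capping the pools over disabling them outright
memory_pools on
memory_pools_limit 50 MB
```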
Using an alternate malloc library
Many users have found improved performance and memory utilization when linking Squid with an external malloc library. We recommend either GNU malloc, or dlmalloc.
GNU malloc
To make Squid use GNU malloc follow these simple steps:
- Download the GNU malloc source, available from one of the GNU FTP mirror sites.
- Compile it:
% gzip -dc malloc.tar.gz | tar xf -
% cd malloc
% vi Makefile     # edit as needed
% make
- Copy libmalloc.a to your system's library directory and be sure to name it libgnumalloc.a.
% su
# cp malloc.a /usr/lib/libgnumalloc.a
- (Optional) Copy the GNU malloc.h to your system's include directory and be sure to name it gnumalloc.h. This step is not required, but if you do this, then Squid will be able to use the mstat() function to report memory usage statistics on the cachemgr info page.
# cp malloc.h /usr/include/gnumalloc.h
- Reconfigure and recompile Squid:
% make distclean
% ./configure ...
% make
% make install
As Squid's configure script runs, watch its output. You should find that it locates libgnumalloc.a and optionally gnumalloc.h.
dlmalloc
dlmalloc has been written by Doug Lea. According to Doug:
This is not the fastest, most space-conserving, most portable, or most tunable malloc ever written. However it is among the fastest while also being among the most space-conserving, portable and tunable.
dlmalloc is included with the Squid-2 source distribution. To use this library, you simply give an option to the configure script:
% ./configure --enable-dlmalloc ...
How much memory do I need in my Squid server?
As a rule of thumb, Squid uses approximately 10 MB of RAM per GB of the total of all cache_dirs (more on 64-bit servers such as Alpha), plus your cache_mem setting, plus an additional 10-20 MB. It is recommended to have at least twice this amount of physical RAM available on your Squid server. For a more detailed discussion of Squid's memory usage, see the sections above.
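As a worked example of the rule of thumb, take a hypothetical server with 50 GB of total cache_dir space and a cache_mem of 256 MB:

```shell
cache_dir_gb=50
cache_mem_mb=256
overhead_mb=20   # the "additional 10-20 MB" above, taken at the high end
squid_mb=$(( cache_dir_gb * 10 + cache_mem_mb + overhead_mb ))
ram_mb=$(( squid_mb * 2 ))   # at least twice this amount of physical RAM
echo "Squid: ~${squid_mb} MB; recommended RAM: ~${ram_mb} MB"
```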
The recommended extra RAM besides what is used by Squid is used by the operating system to improve disk I/O performance and by other applications or services running on the server. This will be true even of a server which runs Squid as the only tcp service, since there is a minimum level of memory needed for process management, logging, and other OS level routines.
If you have a low memory server, and a large disk, then you will not necessarily be able to use all the disk space, since as the cache fills the memory available will be insufficient, forcing Squid to swap out memory and affecting performance. A very large cache_dir total and insufficient physical RAM + Swap could cause Squid to stop functioning completely. The solution for larger caches is to get more physical RAM; allocating more to Squid via cache_mem will not help.
Why can't my Squid process grow beyond a certain size?
by [AdrianChadd Adrian Chadd]
A number of people are running Squid with more than a gigabyte of memory. Here are some things to keep in mind.
- The Operating System may put a limit on how much memory is available per process. Check the resource limits (/etc/security/limits.conf or similar under PAM systems, 'ulimit', etc.)
- The Operating System may have a limit on the size of processes. 32-bit platforms are sometimes "split" to be 2gb process/2gb kernel; this can be changed to be 3gb process/1gb kernel through a kernel recompile or boot-time option. Check your operating system's documentation for specific details.
- Some malloc implementations may not support more than 2gb of memory, e.g. dlmalloc. Don't use dlmalloc unless your platform is very broken (and realise that you won't be able to use more than 2gb of RAM with it).
- Make sure the Squid has been compiled to be a 64 bit binary (with modern Unix-like OSes you can use the 'file' command for this); some platforms may have a 64 bit kernel but a 32 bit userland, or the compiler may default to a 32 bit userland.
Contents
- What is the cache manager?
- How do you set it up?
- Cache manager configuration for CERN httpd 3.0
- Cache manager configuration for Apache 1.x
- Cache manager configuration for Apache 2.x
- Cache manager configuration for Roxen 2.0 and later
- Cache manager access from squidclient
- Cache manager ACLs in squid.conf
- Why does it say I need a password and a URL?
- I want to shutdown the cache remotely. What's the password?
- How do I make the cache host default to my cache?
- What's the difference between Squid TCP connections and Squid UDP connections?
- It says the storage expiration will happen in 1970!
- What do the Meta Data entries mean?
- In the utilization section, what is Other?
- In the utilization section, why is the Transfer KB/sec column always zero?
- In the utilization section, what is the Object Count?
- In the utilization section, what is the Max/Current/Min KB?
- What is the I/O section about?
- What is the Objects section for?
- What is the VM Objects section for?
- What does AVG RTT mean?
- In the IP cache section, what's the difference between a hit, a negative hit and a miss?
- What do the IP cache contents mean anyway?
- What is the fqdncache and how is it different from the ipcache?
- What does "Page faults with physical i/o: 4897" mean?
- What does the IGNORED field mean in the 'cache server list'?
Chapter contributed by Jonathan Larmour
What is the cache manager?
The cache manager (cachemgr.cgi) is a CGI utility for displaying statistics about the squid process as it runs. The cache manager is a convenient way to manage the cache and view statistics without logging into the server.
How do you set it up?
That depends on which web server you're using. Below you will find instructions for configuring the CERN and Apache servers to permit cachemgr.cgi usage.
EDITOR'S NOTE: readers are encouraged to submit instructions for configuring cachemgr.cgi on other web server platforms, such as Netscape.
After you edit the server configuration files, you will probably need to either restart your web server or send it a SIGHUP signal to tell it to re-read its configuration files.
When you're done configuring your web server, you'll connect to the cache manager with a web browser, using a URL such as:
http://www.example.com/Squid/cgi-bin/cachemgr.cgi
Cache manager configuration for CERN httpd 3.0
First, you should ensure that only specified workstations can access the cache manager. That is done in your CERN httpd.conf, not in squid.conf.
Protection MGR-PROT { Mask @(workstation.example.com) }
Wildcards and IP addresses are acceptable, and more entries can be added as a comma-separated list. There are many more protection options; your server documentation has the details.
You also need to add:
Protect /Squid/* MGR-PROT
Exec /Squid/cgi-bin/*.cgi /usr/local/squid/bin/*.cgi
This marks the script as executable to those in MGR-PROT.
Cache manager configuration for Apache 1.x
First, make sure the cgi-bin directory you're using is listed with a ScriptAlias in your Apache httpd.conf file like this:
ScriptAlias /Squid/cgi-bin/ /usr/local/squid/cgi-bin/
It's probably a bad idea to ScriptAlias the entire /usr/local/squid/bin/ directory where all the Squid executables live.
Next, you should ensure that only specified workstations can access the cache manager. That is done in your Apache httpd.conf, not in squid.conf. At the bottom of httpd.conf file, insert:
<Location /Squid/cgi-bin/cachemgr.cgi>
order allow,deny
allow from workstation.example.com
</Location>
You can have more than one allow line, and you can allow domains or networks.
Alternately, cachemgr.cgi can be password-protected. You'd add the following to httpd.conf:
<Location /Squid/cgi-bin/cachemgr.cgi>
AuthUserFile /path/to/password/file
AuthGroupFile /dev/null
AuthName User/Password Required
AuthType Basic
require user cachemanager
</Location>
Consult the Apache documentation for information on using htpasswd to set a password for this "user."
Cache manager configuration for Apache 2.x
First, make sure the cgi-bin directory you're using is listed with a ScriptAlias in your Apache config. In the Apache config there is a sub-directory /etc/apache2/conf.d for application specific settings (unrelated to any specific site). Create a file conf.d/squid containing this:
ScriptAlias /Squid/cgi-bin/cachemgr.cgi /usr/local/squid/cgi-bin/cachemgr.cgi
<Location /Squid/cgi-bin/cachemgr.cgi>
order allow,deny
allow from workstation.example.com
</Location>
SECURITY NOTE: It's possible, but a bad idea, to ScriptAlias the entire /usr/local/squid/bin/ directory where all the Squid executables live.
You should ensure that only specified workstations can access the cache manager. That is done in your Apache conf.d/squid <Location> settings, not in squid.conf.
You can have more than one allow line, and you can allow domains or networks.
Alternately, cachemgr.cgi can be password-protected. You'd add the following to conf.d/squid:
<Location /Squid/cgi-bin/cachemgr.cgi>
AuthUserFile /path/to/password/file
AuthGroupFile /dev/null
AuthName User/Password Required
AuthType Basic
require user cachemanager
</Location>
Consult the Apache 2.0 documentation for information on using htpasswd to set a password for this "user."
To further protect the cache-manager on public systems you should consider creating a whole new <VirtualHost> segment in the Apache configuration for the squid manager. This is done by creating a file in the Apache configuration sub-directory .../apache2/sites-enabled/ usually with the domain name of the new site, see the Apache 2.0 documentation for further details for your system.
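As a sketch, such a dedicated virtual host could look like the following (the ServerName, port, and filesystem paths here are assumptions to adapt for your site):

```
<VirtualHost *:8081>
    ServerName cachemgr.example.com
    ScriptAlias /Squid/cgi-bin/cachemgr.cgi /usr/local/squid/cgi-bin/cachemgr.cgi
    <Location /Squid/cgi-bin/cachemgr.cgi>
        order allow,deny
        allow from workstation.example.com
    </Location>
</VirtualHost>
```

Binding the virtual host to a non-standard port and restricting it to management workstations keeps the cache manager off your public site entirely.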
Cache manager configuration for Roxen 2.0 and later
Notice: this is not the way things are best done with Roxen, but it is what you need to do to adhere to the example. Also, knowledge of basic Roxen configuration is required.
This is what's required to start up a fresh Virtual Server, only serving the cache manager. If you already have some Virtual Server you wish to use to host the Cache Manager, just add a new CGI support module to it.
Create a new virtual server, and set it to host http://www.example.com/. Add to it at least the following modules:
- Content Types
- CGI scripting support
In the CGI scripting support module, section Settings, change the following settings:
- CGI-bin path: set to /Squid/cgi-bin/
- Handle *.cgi: set to no
- Run user scripts as owner: set to no
- Search path: set to the directory containing the cachemgr.cgi file
In section Security, set Patterns to:
allow ip=1.2.3.4
where 1.2.3.4 is the IP address for workstation.example.com
Save the configuration, and you're done.
Cache manager access from squidclient
A simple way to test the access to the cache manager is:
% ./squidclient -p 8080 mgr:info@yourcachemanagerpassword
Note that 8080 and yourcachemanagerpassword come from your particular squid configuration. See squidclient -h for more options.
Cache manager ACLs in squid.conf
The default cache manager access configuration in squid.conf is:
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
With the following rules:
http_access deny manager !localhost
http_access allow all
The first ACL is the most important as the cache manager program interrogates squid using a special cache_object protocol. Try it yourself by doing:
telnet mycache.example.com 3128
GET cache_object://mycache.example.com/info HTTP/1.0
The default ACLs say that if the request is for a cache_object, and it isn't the local host, then deny access; otherwise allow access.
In fact, only allowing localhost access means that on the initial cachemgr.cgi form you can only specify the cache host as localhost. We recommend the following:
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl example src 123.123.123.123/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
Where 123.123.123.123 is the IP address of your web server. Then modify the rules like this:
http_access allow manager localhost
http_access allow manager example
http_access deny manager
http_access allow all
If you're using miss_access, then don't forget to also add a miss_access rule for the cache manager:
miss_access allow manager
The default ACLs assume that your web server is on the same machine as squid. Remember that the connection from the cache manager program to squid originates at the web server, not at the browser. So if your web server lives somewhere else, make sure the IP address of the web server that has cachemgr.cgi installed on it is in the example ACL above.
Always be sure to send a SIGHUP signal to squid, or to run squid -k reconfigure, any time you change the squid.conf file.
Why does it say I need a password and a URL?
If you "drop" the list box and browse it, you will see that the password is only required to shut down the cache, and the URL is only required to refresh an object (i.e., retrieve it from its original source again). Otherwise these fields can be left blank: a password is not required to access the informational parts of cachemgr.cgi.
I want to shutdown the cache remotely. What's the password?
See the cachemgr_passwd directive in squid.conf.
How do I make the cache host default to my cache?
When you run configure use the --enable-cachemgr-hostname option:
% ./configure --enable-cachemgr-hostname=`hostname` ...
Note: if you do this after having already installed Squid, you need to make sure cachemgr.cgi gets recompiled. For example:
% cd src
% rm cachemgr.o cachemgr.cgi
% make cachemgr.cgi
Then copy cachemgr.cgi to your HTTP server's cgi-bin directory.
What's the difference between Squid TCP connections and Squid UDP connections?
Browsers and caches use TCP connections to retrieve web objects from web servers or caches. UDP connections are used when another cache using you as a sibling or parent wants to find out if you have an object in your cache that it's looking for. The UDP connections are ICP queries.
It says the storage expiration will happen in 1970!
Don't worry. The default (and sensible) behavior of squid is to expire an object when it happens to overwrite it. It doesn't explicitly garbage collect (unless you tell it to in other ways).
What do the Meta Data entries mean?
- StoreEntry
- Entry describing an object in the cache.
- IPCacheEntry
- An entry in the DNS cache.
- Hash link
- Link in the cache hash table structure.
- URL strings
- The strings of the URLs themselves that map to an object number in the cache, allowing access to the StoreEntry. Basically just like the log file in your cache directory.
- PoolMemObject structures
- Info about objects currently in memory (e.g., in the process of being transferred).
- Pool for Request structures
- Information about each request as it happens.
- Pool for in-memory object
- Space for object data as it is retrieved.
If squid is much smaller than this field, run for cover! Something is very wrong, and you should probably restart squid.
In the utilization section, what is Other?
Other is a default category to track objects which don't fall into one of the defined categories.
In the utilization section, why is the Transfer KB/sec column always zero?
This column contains gross estimations of data transfer rates averaged over the entire time the cache has been running. These numbers are unreliable and mostly useless.
In the utilization section, what is the Object Count?
The number of objects of that type in the cache right now.
In the utilization section, what is the Max/Current/Min KB?
These refer to the size all the objects of this type have grown to/currently are/shrunk to.
What is the I/O section about?
These are histograms on the number of bytes read from the network per read(2) call. Somewhat useful for determining maximum buffer sizes.
What is the Objects section for?
This will download to your browser a list of every URL in the cache and statistics about it. It can be very, very large. Sometimes it will be larger than the amount of available memory in your client! You probably don't need this information anyway.
What is the VM Objects section for?
VM Objects are the objects which are in Virtual Memory. These are objects which are currently being retrieved and those which were kept in memory for fast access (accelerator mode).
What does AVG RTT mean?
Average Round Trip Time. This is how long on average after an ICP ping is sent that a reply is received.
In the IP cache section, what's the difference between a hit, a negative hit and a miss?
A HIT means that the entry was found in the cache. A MISS means that it wasn't found in the cache. A negative hit means that the cache holds an entry recording that the lookup failed, so the name is known not to exist.
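The three outcomes can be illustrated with a toy lookup (this is an illustration only, not Squid code; the hostnames are taken from the sample output below and one made-up name):

```python
# Toy model of ipcache lookup outcomes. A "negative hit" caches the fact
# that a name failed to resolve, so repeated lookups for a bad name can be
# answered without asking DNS again.

ipcache = {
    "gorn.cc.fh-lippe.de": ["193.16.112.73"],  # positively cached
    "no-such-host.invalid": None,              # negatively cached (lookup failed)
}

def lookup(host):
    if host not in ipcache:
        return "MISS"            # not in the cache; must query DNS
    if ipcache[host] is None:
        return "NEGATIVE HIT"    # cached, but known not to resolve
    return "HIT"                 # cached with one or more addresses

print(lookup("gorn.cc.fh-lippe.de"))   # HIT
print(lookup("no-such-host.invalid"))  # NEGATIVE HIT
print(lookup("example.com"))           # MISS
```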
What do the IP cache contents mean anyway?
The hostname is the name that was requested to be resolved.
For the Flags column:
C means positively cached.
N means negatively cached.
P means the request is pending being dispatched.
D means the request has been dispatched and we're waiting for an answer.
L means it is a locked entry because it represents a parent or sibling.
The TTL column represents "Time To Live" (i.e., how long the cache entry is valid). (May be negative if the entry has expired.)
The N column is the number of hostnames which the cache has translations for.
The rest of the line lists all the host names that have been associated with that IP cache entry.
What is the fqdncache and how is it different from the ipcache?
IPCache contains data for the Hostname to IP-Number mapping, and FQDNCache does it the other way round. For example:
IP Cache Contents:
Hostname                   Flags lstref TTL     N [IP-Number]
gorn.cc.fh-lippe.de        C     0      21581   1 193.16.112.73
lagrange.uni-paderborn.de  C     6      21594   1 131.234.128.245
www.altavista.digital.com  C     10     21299   4 204.123.2.75
...
2/ftp.symantec.com         DL    1583   -772855 0
Flags: C --> Cached
       D --> Dispatched
       N --> Negative Cached
       L --> Locked
lstref: Time since last use
TTL: Time-To-Live until information expires
N: Count of addresses
FQDN Cache Contents:
IP-Number       Flags TTL    N Hostname
130.149.17.15   C     -45570 1 andele.cs.tu-berlin.de
194.77.122.18   C     -58133 1 komet.teuto.de
206.155.117.51  N     -73747 0
Flags: C --> Cached
       D --> Dispatched
       N --> Negative Cached
       L --> Locked
TTL: Time-To-Live until information expires
N: Count of names
What does "Page faults with physical i/o: 4897" mean?
This question was asked on the squid-users mailing list, to which there were three excellent replies.
by Jonathan Larmour
You get a "page fault" when your OS tries to access something in memory which is actually swapped to disk. The term "page fault", while correct at the kernel and CPU level, is a bit deceptive to a user, as there's no actual error - this is a normal feature of operation.
Also, this doesn't necessarily mean your squid is swapping by that much. Most operating systems also implement paging for executables, so that only sections of the executable which are actually used are read from disk into memory. Also, whenever squid needs more memory, the fact that the memory was allocated will show up in the page faults.
However, if the number of faults is unusually high, and getting bigger, this could mean that squid is swapping. Another way to verify this is using a program called "vmstat" which is found on most UNIX platforms. If you run this as "vmstat 5" this will update a display every 5 seconds. This can tell you if the system as a whole is swapping a lot (see your local man page for vmstat for more information).
It is very bad for squid to swap, as every single request will be blocked until the requested data is swapped in. It is better to tweak the cache_mem and/or memory_pools setting in squid.conf, or switch to the NOVM versions of squid, than allow this to happen.
by Peter Wemm
There are two different operations at work: paging and swapping. Paging is when individual pages are shuffled (either discarded or swapped to/from disk), while "swapping" generally means the entire process gets sent to/from disk.
Needless to say, swapping a process is a pretty drastic event, usually reserved for when there's a memory crunch and paging out cannot free enough memory quickly enough. Also, there's some variation in how swapping is implemented across OSes: some don't do it at all, or do a hybrid of paging and swapping instead.
As you say, paging out doesn't necessarily involve disk IO, eg: text (code) pages are read-only and can simply be discarded if they are not used (and reloaded if/when needed). Data pages are also discarded if unmodified, and paged out if there's been any changes. Allocated memory (malloc) is always saved to disk since there's no executable file to recover the data from. mmap() memory is variable.. If it's backed from a file, it uses the same rules as the data segment of a file - ie: either discarded if unmodified or paged out.
There's also "demand zeroing" of pages as well that cause faults.. If you malloc memory and it calls brk()/sbrk() to allocate new pages, the chances are that you are allocated demand zero pages. Ie: the pages are not "really" attached to your process yet, but when you access them for the first time, the page fault causes the page to be connected to the process address space and zeroed - this saves unnecessary zeroing of pages that are allocated but never used.
The "page faults with physical IO" comes from the OS via getrusage(). It's highly OS dependent on what it means. Generally, it means that the process accessed a page that was not present in memory (for whatever reason) and there was disk access to fetch it. Many OS's load executables by demand paging as well, so the act of starting squid implicitly causes page faults with disk IO - however, many (but not all) OS's use "read ahead" and "prefault" heuristics to streamline the loading. Some OS's maintain "intent queues" so that pages can be selected as pageout candidates ahead of time. When (say) squid touches a freshly allocated demand zero page and one is needed, the OS can page out one of the candidates on the spot, causing a 'fault with physical IO' with demand zeroing of allocated memory which doesn't happen on many other OS's. (The other OS's generally put the process to sleep while the pageout daemon finds a page for it).
The meaning of "swapping" varies. On FreeBSD for example, swapping out is implemented as unlocking upages, kernel stack, PTD etc for aggressive pageout with the process. The only thing left of the process in memory is the 'struct proc'. The FreeBSD paging system is highly adaptive and can resort to paging in a way that is equivalent to the traditional swapping style operation (ie: entire process). FreeBSD also tries stealing pages from active processes in order to make space for disk cache. I suspect this is why setting 'memory_pools off' on the non-NOVM squids on FreeBSD is reported to work better - the VM/buffer system could be competing with squid to cache the same pages. It's a pity that squid cannot use mmap() to do file IO on the 4K chunks in its memory pool (I can see that this is not a simple thing to do though, but that won't stop me wishing. :-).
by John Line
The comments so far have been about what paging/swapping figures mean in a "traditional" context, but it's worth bearing in mind that on some systems (Sun's Solaris 2, at least), the virtual memory and filesystem handling are unified and what a user process sees as reading or writing a file, the system simply sees as paging something in from disk or a page being updated so it needs to be paged out. (I suppose you could view it as similar to the operating system memory-mapping the files behind-the-scenes.)
The effect of this is that on Solaris 2, paging figures will also include file I/O. Or rather, the figures from vmstat certainly appear to include file I/O, and I presume (but can't quickly test) that figures such as those quoted by Squid will also include file I/O.
To confirm the above (which represents an impression from what I've read and observed, rather than 100% certain facts...), using an otherwise idle Sun Ultra 1 system I just tried using cat (small, shouldn't need to page) to copy (a) one file to another, (b) a file to /dev/null, (c) /dev/zero to a file, and (d) /dev/zero to /dev/null (interrupting the last two with control-C after a while!), while watching with vmstat. I saw 300-600 page-ins or page-outs per second when reading or writing a file (rather than a device), and essentially zero in other cases (and when not cat-ing).
So ... beware assuming that all systems are similar and that paging figures represent *only* program code and data being shuffled to/from disk - they may also include the work in reading/writing all those files you were accessing...
Ok, so what is unusually high?
You'll probably want to compare the number of page faults to the number of HTTP requests. If this ratio approaches or exceeds 1, then Squid is paging too much.
What does the IGNORED field mean in the 'cache server list'?
This refers to ICP replies which Squid ignored, for one of these reasons:
- The URL in the reply could not be found in the cache at all.
- The URL in the reply was already being fetched. Probably this ICP reply arrived too late.
- The URL in the reply did not have a MemObject associated with it. Either the request is already finished, or the user aborted before the ICP reply arrived.
- The reply came from a multicast-responder, but the cache_peer_access configuration does not allow us to forward this request to that neighbor.
- Source-Echo replies from known neighbors are ignored.
- ICP_OP_DENIED replies are ignored after the first 100.
Contents
- ACL elements
- Access Lists
- How do I allow my clients to use the cache?
- How do I configure Squid not to cache a specific server?
- How do I implement an ACL ban list?
- How do I block specific users or groups from accessing my cache?
- Do you have a CGI program which lets users change their own proxy passwords?
- Is there a way to do ident lookups only for a certain host and compare the result with a userlist in squid.conf?
- Common Mistakes
- I set up my access controls, but they don't work! why?
- Proxy-authentication and neighbor caches
- Is there an easy way of banning all Destination addresses except one?
- Does anyone have a ban list of porn sites and such?
- Squid doesn't match my subdomains
- Why does Squid deny some port numbers?
- Does Squid support the use of a database such as mySQL for storing the ACL list?
- How can I allow a single address to access a specific URL?
- How can I allow some clients to use the cache at specific times?
- How can I allow some users to use the cache at specific times?
- Problems with IP ACL's that have complicated netmasks
- Can I set up ACL's based on MAC address rather than IP?
- Can I limit the number of connections from a client?
- I'm trying to deny ''foo.com'', but it's not working.
- I want to customize, or make my own error messages.
- I want to use local time zone in error messages.
- I want to put ACL parameters in an external file.
- I want to authorize users depending on their MS Windows group memberships
- Maximum length of an acl name
Squid's access control scheme is relatively comprehensive and difficult for some people to understand. There are two different components: ACL elements, and access lists. An access list consists of an allow or deny action followed by a number of ACL elements.
ACL elements
The information here is current for version 2.5.
Squid knows about the following types of ACL elements:
src: source (client) IP addresses
dst: destination (server) IP addresses
myip: the local IP address of a client's connection
srcdomain: source (client) domain name
dstdomain: destination (server) domain name
srcdom_regex: source (client) regular expression pattern matching
dstdom_regex: destination (server) regular expression pattern matching
time: time of day, and day of week
url_regex: URL regular expression pattern matching
urlpath_regex: URL-path regular expression pattern matching, leaves out the protocol and hostname
port: destination (server) port number
myport: local port number that client connected to
proto: transfer protocol (http, ftp, etc)
method: HTTP request method (get, post, etc)
browser: regular expression pattern matching on the request's user-agent header
ident: string matching on the user's name
ident_regex: regular expression pattern matching on the user's name
src_as: source (client) Autonomous System number
dst_as: destination (server) Autonomous System number
proxy_auth: user authentication via external processes
proxy_auth_regex: user authentication via external processes
snmp_community: SNMP community string matching
maxconn: a limit on the maximum number of connections from a single client IP address
req_mime_type: regular expression pattern matching on the request content-type header
arp: Ethernet (MAC) address matching
rep_mime_type: regular expression pattern matching on the reply (downloaded content) content-type header. This is only usable in the http_reply_access directive, not http_access.
external: lookup via external acl helper defined by external_acl_type
Notes:
Not all of the ACL elements can be used with all types of access lists (described below). For example, snmp_community is only meaningful when used with snmp_access. The src_as and dst_as types are only used in cache_peer_access access lists.
The arp ACL requires the special configure option --enable-arp-acl. Furthermore, the ARP ACL code is not portable to all operating systems. It works on Linux, Solaris, and some *BSD variants.
The SNMP ACL element and access list require the --enable-snmp configure option.
Some ACL elements can cause processing delays. For example, use of srcdomain and srcdom_regex requires a reverse DNS lookup on the client's IP address. This lookup adds some delay to the request.
Each ACL element is assigned a unique name. A named ACL element consists of a list of values. When checking for a match, the multiple values use OR logic. In other words, an ACL element is matched when any one of its values is a match.
You can't give the same name to two different types of ACL elements. It will generate a syntax error.
You can put different values for the same ACL name on different lines. Squid combines them into one list.
Access Lists
There are a number of different access lists:
http_access: Allows HTTP clients (browsers) to access the HTTP port. This is the primary access control list.
http_reply_access: Allows HTTP clients (browsers) to receive the reply to their request. This further restricts permissions given by http_access, and is primarily intended to be used together with the rep_mime_type acl type for blocking different content types.
icp_access: Allows neighbor caches to query your cache with ICP.
miss_access: Allows certain clients to forward cache misses through your cache. This further restricts permissions given by http_access, and is primarily intended to be used for enforcing sibling relations by denying siblings from forwarding cache misses through your cache.
cache: Defines responses that should not be cached.
redirector_access: Controls which requests are sent through the redirector pool.
ident_lookup_access: Controls which requests need an Ident lookup.
always_direct: Controls which requests should always be forwarded directly to origin servers.
never_direct: Controls which requests should never be forwarded directly to origin servers.
snmp_access: Controls SNMP client access to the cache.
broken_posts: Defines requests for which squid appends an extra CRLF after POST message bodies as required by some broken origin servers.
cache_peer_access: Controls which requests can be forwarded to a given neighbor (peer).
Notes:
An access list rule consists of an allow or deny keyword, followed by a list of ACL element names.
An access list consists of one or more access list rules.
Access list rules are checked in the order they are written. List searching terminates as soon as one of the rules is a match.
If a rule has multiple ACL elements, it uses AND logic. In other words, all ACL elements of the rule must be a match in order for the rule to be a match. This means that it is possible to write a rule that can never be matched. For example, a port number can never be equal to both 80 AND 8000 at the same time.
To summarise the acl logics can be described as:
http_access allow|deny acl AND acl AND ...
OR
http_access allow|deny acl AND acl AND ...
OR
...
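The same evaluation logic can be sketched as a toy model (this is an illustration only, not Squid code): each ACL element OR's its values, each rule AND's its elements, rules are tried in order, and the default is the opposite of the last rule's action.

```python
# Toy model of Squid's http_access evaluation. An ACL element maps a request
# field to a set of acceptable values (OR logic across values); a rule AND's
# its elements; the first matching rule wins.

acls = {
    "ME":  ("src", {"10.0.0.1"}),
    "YOU": ("src", {"10.0.0.2"}),
}

rules = [
    ("allow", ["ME"]),
    ("allow", ["YOU"]),
]

def check(request, rules, acls):
    for action, names in rules:
        # AND logic: every named element must match this request
        if all(request.get(acls[n][0]) in acls[n][1] for n in names):
            return action
    # no rule matched: default is the opposite of the last rule's action
    return "deny" if rules[-1][0] == "allow" else "allow"

print(check({"src": "10.0.0.1"}, rules, acls))  # allow
print(check({"src": "10.0.0.3"}, rules, acls))  # deny (opposite of last rule)

# A single rule naming both elements can never match: one source address
# cannot equal both 10.0.0.1 AND 10.0.0.2 at the same time.
print(check({"src": "10.0.0.1"}, [("allow", ["ME", "YOU"])], acls))  # deny
```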
If none of the rules are matched, then the default action is the opposite of the last rule in the list. It's a good idea to be explicit about the default action. The best way is to use the all ACL. For example:
acl all src 0/0
http_access deny all
How do I allow my clients to use the cache?
Define an ACL that corresponds to your client's IP addresses. For example:
acl myclients src 172.16.5.0/24
Next, allow those clients in the http_access list:
http_access allow myclients
How do I configure Squid not to cache a specific server?
acl someserver dstdomain .someserver.com
cache deny someserver
How do I implement an ACL ban list?
As an example, we will assume that you would like to prevent users from accessing cooking recipes.
One way to implement this would be to deny access to any URLs that contain the words "cooking" or "recipe." You would use these configuration lines:
acl Cooking1 url_regex cooking
acl Recipe1 url_regex recipe
acl myclients src 172.16.5.0/24
http_access deny Cooking1
http_access deny Recipe1
http_access allow myclients
http_access deny all
The url_regex means to search the entire URL for the regular expression you specify. Note that these regular expressions are case-sensitive, so a URL containing "Cooking" would not be denied.
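If you want the match to be case-insensitive, Squid's regex ACL types accept a -i flag. A sketch of the same ban list, matching "cooking", "Cooking", "COOKING", and so on:

```
acl Cooking1 url_regex -i cooking
acl Recipe1 url_regex -i recipe
http_access deny Cooking1
http_access deny Recipe1
```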
Another way is to deny access to specific servers which are known to hold recipes. For example:
acl Cooking2 dstdomain www.gourmet-chef.com http_access deny Cooking2 http_access allow all
The dstdomain means to search the hostname in the URL for the string "www.gourmet-chef.com." Note that when IP addresses are used in URLs (instead of domain names), Squid-1.1 implements relaxed access controls. If a domain name for the IP address has been saved in Squid's "FQDN cache," then Squid can compare the destination domain against the access controls. However, if the domain is not immediately available, Squid allows the request and performs a lookup for the IP address so that it may be available for future requests.
How do I block specific users or groups from accessing my cache?
Using Ident
You can use ident lookups to allow specific users access to your cache. This requires that an ident server process runs on the user's machine(s). In your squid.conf configuration file you would write something like this:
ident_lookup_access allow all
acl friends ident kim lisa frank joe
http_access allow friends
http_access deny all
Using Proxy Authentication
Another option is to use proxy-authentication. In this scheme, you assign usernames and passwords to individuals. When they first use the proxy they are asked to authenticate themselves by entering their username and password.
In Squid v2 this authentication is handled via external processes. For information on how to configure this, please see ../ProxyAuthentication.
Do you have a CGI program which lets users change their own proxy passwords?
Pedro L Orso has adapted Apache's htpasswd into a CGI program called [/htpasswd/chpasswd-cgi.tar.gz chpasswd.cgi].
Is there a way to do ident lookups only for a certain host and compare the result with a userlist in squid.conf?
You can use the ident_lookup_access directive to control the hosts for which Squid will issue ident lookup requests.
Additionally, if you use an ident ACL in squid.conf, then Squid will make sure an ident lookup is performed while evaluating the acl, even if ident_lookup_access does not indicate ident lookups should be performed.
However, Squid does not wait for the lookup to complete unless the ACL rules require it. Consider this configuration:
acl host1 src 10.0.0.1
acl host2 src 10.0.0.2
acl pals ident kim lisa frank joe
http_access allow host1
http_access allow host2 pals
Requests coming from 10.0.0.1 will be allowed immediately because there are no user requirements for that host. However, requests from 10.0.0.2 will be allowed only after the ident lookup completes, and if the username is in the set kim, lisa, frank, or joe.
Common Mistakes
And/Or logic
You've probably noticed (and been frustrated by) the fact that you cannot combine access controls with terms like "and" or "or." These operations are already built in to the access control scheme in a fundamental way which you must understand.
All elements of an acl entry are OR'ed together.
All elements of an access entry are AND'ed together (e.g. http_access and icp_access)
For example, the following access control configuration will never work:
acl ME src 10.0.0.1
acl YOU src 10.0.0.2
http_access allow ME YOU
In order for the request to be allowed, it must match the "ME" acl AND the "YOU" acl. This is impossible because any IP address could only match one or the other. This should instead be rewritten as:
acl ME src 10.0.0.1
acl YOU src 10.0.0.2
http_access allow ME
http_access allow YOU
Or, alternatively, this would also work:
acl US src 10.0.0.1 10.0.0.2
http_access allow US
allow/deny mixups
I have read through my squid.conf numerous times, spoken to my neighbors, read the FAQ and Squid Docs and cannot for the life of me work out why the following will not work.
I can successfully access cachemgr.cgi from our web server machine here, but I would like to use MRTG to monitor various aspects of our proxy. When I try to use 'squidclient' or GET cache_object from the machine the proxy is running on, I always get access denied.
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl server src 1.2.3.4/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
acl ourhosts src 1.2.0.0/255.255.0.0
http_access deny manager !localhost !server
http_access allow ourhosts
http_access deny all
The intent here is to allow cache manager requests from the localhost and server addresses, and deny all others. This policy has been expressed here:
http_access deny manager !localhost !server
The problem here is that for allowable requests, this access rule is not matched. For example, if the source IP address is localhost, then "!localhost" is false and the access rule is not matched, so Squid continues checking the other rules. Cache manager requests from the server address work because server is a subset of ourhosts and the second access rule will match and allow the request. Also note that this means any cache manager request from ourhosts would be allowed.
To implement the desired policy correctly, the access rules should be rewritten as
http_access allow manager localhost
http_access allow manager server
http_access deny manager
http_access allow ourhosts
http_access deny all
If you're using miss_access, then don't forget to also add a miss_access rule for the cache manager:
miss_access allow manager
You may be concerned that having five access rules instead of three may have an impact on cache performance. In our experience this is not the case. Squid is able to handle a moderate amount of access control checking without degrading overall performance. You may like to verify that for yourself, however.
Differences between ''src'' and ''srcdomain'' ACL types
For the srcdomain ACL type, Squid does a reverse lookup of the client's IP address and checks the result with the domains given on the acl line. With the src ACL type, Squid converts hostnames to IP addresses at startup and then only compares the client's IP address. The src ACL is preferred over srcdomain because it does not require address-to-name lookups for each request.
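A sketch contrasting the two types (the network and domain below are hypothetical):

```
acl mynet src 172.16.0.0/255.255.0.0      # compares the client's IP address directly
acl mynet_rdns srcdomain .example.com     # requires a reverse DNS lookup per request
http_access allow mynet
http_access deny all
```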
I set up my access controls, but they don't work! why?
If ACLs are giving you problems and you don't know why they aren't working, you can use this tip to debug them.
In squid.conf enable debugging for section 33 at level 2. For example:
debug_options ALL,1 33,2
Then restart or reconfigure squid.
From now on, your cache.log should contain a line for every request that explains if it was allowed, or denied, and which ACL was the last one that it matched.
If this does not give you sufficient information to nail down the problem you can also enable detailed debug information on ACL processing
debug_options ALL,1 33,2 28,9
Then restart or reconfigure squid as above.
From now on, your cache.log should contain detailed traces of all access list processing. Be warned that this can be quite some lines per request.
See also ../TroubleShooting.
Proxy-authentication and neighbor caches
The problem
        [ Parents ]
       /          \
      /            \
[ Proxy A ] --- [ Proxy B ]
     |
     |
   USER
Proxy A sends an ICP query to Proxy B about an object, and Proxy B replies with an ICP_HIT. Proxy A forwards the HTTP request to Proxy B, but does not pass on the authentication details, therefore the HTTP GET from Proxy A fails.
Only ONE proxy cache in a chain is allowed to "use" the Proxy-Authentication request header. Once the header is used, it must not be passed on to other proxies.
Therefore, you must allow the neighbor caches to request from each other without proxy authentication. This is simply accomplished by listing the neighbor ACL's first in the list of http_access lines. For example:
acl proxy-A src 10.0.0.1
acl proxy-B src 10.0.0.2
acl user_passwords proxy_auth /tmp/user_passwds
http_access allow proxy-A
http_access allow proxy-B
http_access allow user_passwords
http_access deny all
Squid 2.5 allows two exceptions to this rule, by defining the appropriate cache_peer options:
cache_peer parent.foo.com parent login=PASS
This will forward the user's credentials as-is to the parent proxy which will be thus able to authenticate again.
| This will only work with the Basic authentication scheme. If any other scheme is enabled, it will fail. |
cache_peer parent.foo.com parent login=*:somepassword
This will perform Basic authentication against the parent, sending the username of the current client connection and always somepassword as the password. The parent will need to perform authorization against the child cache's IP address, as if there were no authentication forwarding, and it will need to authenticate all usernames against somepassword via a specially-designed authentication helper. The purpose is to log the client cache's usernames into the parent's access.log. You can find an example semi-tested helper of that kind as parent_auth.pl.
Is there an easy way of banning all Destination addresses except one?
acl GOOD dst 10.0.0.1
acl BAD dst 0.0.0.0/0.0.0.0
http_access allow GOOD
http_access deny BAD
Does anyone have a ban list of porn sites and such?
Snerpa, an ISP in Iceland, operates a DNS database of IP addresses of blacklisted sites containing porn, violence, etc., which is used via a small perl-script redirector. Information about this is on the INfilter webpage.
The SquidGuard redirector folks provide a blacklist.
Bill Stearns maintains the sa-blacklist of known spammers. By blocking the spammer web sites in Squid, users can no longer use up bandwidth downloading spam images and HTML. Even more importantly, they can no longer send out requests for things like scripts and gifs that have a unique identifier attached, showing that they opened the email and making their addresses more valuable to the spammer.
The SleezeBall site has a list of patterns that you can download.
Squid doesn't match my subdomains
If you are using Squid-2.4 or later then keep in mind that dstdomain ACLs use different syntax for exact host matches and entire domain matches. www.example.com matches the exact host www.example.com, while .example.com matches the entire domain example.com (including example.com alone).
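A sketch of the difference, using example.com as a stand-in domain:

```
acl exact_host dstdomain www.example.com   # matches only the host www.example.com
acl whole_domain dstdomain .example.com    # matches example.com and every subdomain
http_access deny whole_domain
```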
There are also subtle issues if your dstdomain ACLs contain matches for both an exact host in a domain and the whole domain, where both are in the same domain (i.e. both www.example.com and .example.com). Depending on how your data is ordered, this may cause only the most specific of these (e.g. www.example.com) to be used.
|
Current Squid versions (as of Squid-2.4) will warn you when this kind of configuration is used. If your Squid does not warn you while reading the configuration file, you do not have the problem described below. Also, the configuration here uses the dstdomain syntax of Squid-2.1 or earlier (2.2 and later need to have domains prefixed by a dot). |
There is a subtle problem with domain-name based access controls when a single ACL element has an entry that is a subdomain of another entry. For example, consider this list:
acl FOO dstdomain boulder.co.us vail.co.us co.us
In the first place, the above list is simply wrong because the first two (boulder.co.us and vail.co.us) are unnecessary. Any domain name that matches one of the first two will also match the last one (co.us). Ok, but why does this happen?
The problem stems from the data structure used to index domain names in an access control list. Squid uses Splay trees for lists of domain names. As other tree-based data structures, the searching algorithm requires a comparison function that returns -1, 0, or +1 for any pair of keys (domain names). This is similar to the way that strcmp() works.
The problem is that it is wrong to say that co.us is greater-than, equal-to, or less-than boulder.co.us.
For example, if you said that co.us is LESS than fff.co.us, then the Splay tree searching algorithm might never discover co.us as a match for kkk.co.us.
Similarly, if you said that co.us is GREATER than fff.co.us, then the Splay tree searching algorithm might never discover co.us as a match for bbb.co.us.
The bottom line is that you can't have one entry that is a subdomain of another. Squid-2.2 will warn you if it detects this condition.
Why does Squid deny some port numbers?
It is dangerous to allow Squid to connect to certain port numbers. For example, it has been demonstrated that someone can use Squid as an SMTP (email) relay. As I'm sure you know, SMTP relays are one of the ways that spammers are able to flood our mailboxes. To prevent mail relaying, Squid denies requests when the URL port number is 25. Other ports should be blocked as well, as a precaution.
There are two ways to filter by port number: either allow specific ports, or deny specific ports. By default, Squid does the first. This is the ACL entry that comes in the default squid.conf:
acl Safe_ports port 80 21 443 563 70 210 1025-65535
http_access deny !Safe_ports
The above configuration denies requests when the URL port number is not in the list. The list allows connections to the standard ports for HTTP, FTP, Gopher, SSL, WAIS, and all non-privileged ports.
Another approach is to deny dangerous ports. The dangerous port list should look something like:
acl Dangerous_ports port 7 9 19 22 23 25 53 109 110 119
http_access deny Dangerous_ports
...and probably many others.
Please consult the /etc/services file on your system for a list of known ports and protocols.
Does Squid support the use of a database such as mySQL for storing the ACL list?
Yes, Squid supports ACL interaction with external data sources via the external_acl_type directive. Helpers for LDAP and NT Domain group membership are included in the distribution, and it is easy to write additional helpers to fit your environment.
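A hedged sketch of what such a setup looks like; the helper path and group name are hypothetical, and the helper itself (e.g. a script querying your database) is not shipped with Squid:

```
external_acl_type db_group ttl=300 %LOGIN /usr/local/squid/libexec/check_db_group
acl dbusers external db_group staff
http_access allow dbusers
http_access deny all
```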
How can I allow a single address to access a specific URL?
This example allows only the special_client to access the special_url. Any other client that tries to access the special_url is denied.
acl special_client src 10.1.2.3
acl special_url url_regex ^http://www.squid-cache.org/Doc/FAQ/$
http_access allow special_client special_url
http_access deny special_url
How can I allow some clients to use the cache at specific times?
Let's say you have two workstations that should only be allowed access to the Internet during working hours (8:30 - 17:30). You can use something like this:
acl FOO src 10.1.2.3 10.1.2.4
acl WORKING time MTWHF 08:30-17:30
http_access allow FOO WORKING
http_access deny FOO
How can I allow some users to use the cache at specific times?
acl USER1 proxy_auth Dick
acl USER2 proxy_auth Jane
acl DAY time 06:00-18:00
http_access allow USER1 DAY
http_access deny USER1
http_access allow USER2 !DAY
http_access deny USER2
Problems with IP ACL's that have complicated netmasks
The following ACL entry gives inconsistent or unexpected results:
acl restricted src 10.0.0.128/255.0.0.128 10.85.0.0/16
The reason is that IP access lists are stored in "splay" tree data structures. These trees require the keys to be sortable. When you use a complicated, or non-standard, netmask (255.0.0.128), it confuses the function that compares two address/mask pairs.
The best way to fix this problem is to use separate ACL names for each ACL value. For example, change the above to:
acl restricted1 src 10.0.0.128/255.0.0.128
acl restricted2 src 10.85.0.0/16
Then, of course, you'll have to rewrite your http_access lines as well.
Can I set up ACL's based on MAC address rather than IP?
Yes, for some operating systems. Squid calls these "ARP ACLs" and they are supported on Linux, Solaris, and probably BSD variants.
|
MAC address is only available for clients that are on the same subnet. If the client is on a different subnet, then Squid cannot find out its MAC address, as the MAC is replaced by the router's MAC when a packet is routed. |
To use ARP (MAC) access controls, you first need to compile in the optional code. Do this with the --enable-arp-acl configure option:
% ./configure --enable-arp-acl ...
% make clean
% make
If src/acl.c doesn't compile, then ARP ACLs are probably not supported on your system.
If everything compiles, then you can add some ARP ACL lines to your squid.conf:
acl M1 arp 01:02:03:04:05:06
acl M2 arp 11:12:13:14:15:16
http_access allow M1
http_access allow M2
http_access deny all
Can I limit the number of connections from a client?
Yes, use the maxconn ACL type in conjunction with http_access deny. For example:
acl losers src 1.2.3.0/24
acl 5CONN maxconn 5
http_access deny 5CONN losers
Given the above configuration, when a client whose source IP address is in the 1.2.3.0/24 subnet tries to establish 6 or more connections at once, Squid returns an error page. Unless you use the deny_info feature, the error message will just say "access denied."
The maxconn ACL requires the client_db feature. If you've disabled client_db (for example with client_db off) then maxconn ACLs will not work.
Note, the maxconn ACL type is a bit tricky: the ACL is a match when the number of established connections is greater than the value you specify. Because of that, you don't want to use the maxconn ACL with http_access allow.
Also note that you could use maxconn in conjunction with a user type (ident, proxy_auth), rather than an IP address type.
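For example, a sketch that limits authenticated users rather than subnets (the ACL names and limit are hypothetical):

```
acl authed proxy_auth REQUIRED
acl 10CONN maxconn 10
http_access deny 10CONN authed
http_access allow authed
```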
I'm trying to deny ''foo.com'', but it's not working.
In Squid-2.3 we changed the way that Squid matches subdomains. There is a difference between .foo.com and foo.com. The first matches any domain in foo.com, while the latter matches only "foo.com" exactly. So if you want to deny bar.foo.com, you should write
acl yuck dstdomain .foo.com
http_access deny yuck
I want to customize, or make my own error messages.
You can customize the existing error messages as described in Customizable Error Messages in ../MiscFeatures. You can also create new error messages and use these in conjunction with the deny_info option.
For example, lets say you want your users to see a special message when they request something that matches your pornography list. First, create a file named ERR_NO_PORNO in the /usr/local/squid/etc/errors directory. That file might contain something like this:
Our company policy is to deny requests to known porno sites. If you feel you've received this message in error, please contact the support staff (support@this.company.com, 555-1234).
Next, set up your access controls as follows:
acl porn url_regex "/usr/local/squid/etc/porno.txt"
deny_info ERR_NO_PORNO porn
http_access deny porn
(additional http_access lines ...)
I want to use local time zone in error messages.
Squid, by default, uses GMT as the timestamp in all generated error messages. This is to allow the cache to participate in a hierarchy of caches in different timezones without risking confusion about what the time is.
To change the timestamp in Squid-generated error messages you must change the Squid signature. See Customizable Error Messages in ../MiscFeatures. The signature by default uses %T as the timestamp, but if you like you can use %t instead for a timestamp using the local time zone.
I want to put ACL parameters in an external file.
by Adam Aube
Squid can read ACL parameters from an external file. To do this, first place the acl parameters, one per line, in a file. Then, on the ACL line in squid.conf, put the full path to the file in double quotes.
For example, instead of:
acl trusted_users proxy_auth john jane jim
you would have:
acl trusted_users proxy_auth "/usr/local/squid/etc/trusted_users.txt"
Inside trusted_users.txt, there is:
john
jane
jim
I want to authorize users depending on their MS Windows group memberships
There is an excellent resource over at http://workaround.org/moin/SquidLdap on how to use LDAP-based group membership checking.
Maximum length of an acl name
By default the maximum length of an ACL name is 32-1 = 31 characters, but it can be changed by editing the source: in defines.h
#define ACL_NAME_SZ 32
Contents
- Why am I getting "Proxy Access Denied?"
- I can't get ''local_domain'' to work; ''Squid'' is caching the objects from my local servers.
- Connection Refused when reaching a sibling
- Running out of filedescriptors
- What are these strange lines about removing objects?
- Can I change a Windows NT FTP server to list directories in Unix format?
- Why am I getting "Ignoring MISS from non-peer x.x.x.x?"
- DNS lookups for domain names with underscores (_) always fail.
- Why does Squid say: "Illegal character in hostname; underscores are not allowed?"
- Why am I getting access denied from a sibling cache?
- Cannot bind socket FD NN to *:8080 (125) Address already in use
- icpDetectClientClose: ERROR xxx.xxx.xxx.xxx: (32) Broken pipe
- icpDetectClientClose: FD 135, 255 unexpected bytes
- Does Squid work with NTLM Authentication?
- The ''default'' parent option isn't working!
- "Hotmail" complains about: Intrusion Logged. Access denied.
- My Squid becomes very slow after it has been running for some time.
- WARNING: Failed to start 'dnsserver'
- Sending bug reports to the Squid team
- Debugging Squid
- FATAL: ipcache_init: DNS name lookup tests failed
- FATAL: Failed to make swap directory /var/spool/cache: (13) Permission denied
- FATAL: Cannot open HTTP Port
- FATAL: All redirectors have exited!
- FATAL: file_map_allocate: Exceeded filemap limit
- FATAL: You've run out of swap file numbers.
- I am using up over 95% of the filemap bits?!!
- FATAL: Cannot open /usr/local/squid/logs/access.log: (13) Permission denied
- When using a username and password, I can not access some files.
- pingerOpen: icmp_sock: (13) Permission denied
- What is a forwarding loop?
- accept failure: (71) Protocol error
- storeSwapInFileOpened: ... Size mismatch
- Why do I get ''fwdDispatch: Cannot retrieve 'https://www.buy.com/corp/ordertracking.asp' ''
- Squid can't access URLs like http://3626046468/ab2/cybercards/moreinfo.html
- I get a lot of "URI has whitespace" error messages in my cache log, what should I do?
- commBind: Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address
- Unknown cache_dir type '/var/squid/cache'
- unrecognized: 'cache_dns_program /usr/local/squid/bin/dnsserver'
- Is ''dns_defnames'' broken in Squid-2.3 and later?
- What does "sslReadClient: FD 14: read failure: (104) Connection reset by peer" mean?
- What does ''Connection refused'' mean?
- squid: ERROR: no running copy
- FATAL: getgrnam failed to find groupid for effective group 'nogroup'
- "Unsupported Request Method and Protocol" for ''https'' URLs.
- Squid uses 100% CPU
- Webmin's ''cachemgr.cgi'' crashes the operating system
- Segment Violation at startup or upon first request
- urlParse: Illegal character in hostname 'proxy.mydomain.com:8080proxy.mydomain.com'
- Requests for international domain names does not work
- Why do I sometimes get "Zero Sized Reply"?
- Why do I get "The request or reply is too large" errors?
- Negative or very large numbers in Store Directory Statistics, or constant complaints about cache above limit
- Squid problems with Windows Update v5
Why am I getting "Proxy Access Denied?"
You may need to set up the http_access option to allow requests from your IP addresses. Please see ../SquidAcl for information about that.
If squid is in httpd-accelerator mode, it will accept normal HTTP requests and forward them to an HTTP server, but it will not honor proxy requests. If you want your cache to also accept proxy-HTTP requests then you must enable this feature:
httpd_accel_with_proxy on
Alternately, you may have misconfigured one of your ACLs. Check the access.log and squid.conf files for clues.
I can't get ''local_domain'' to work; ''Squid'' is caching the objects from my local servers.
The local_domain directive does not prevent local objects from being cached. It prevents the use of sibling caches when fetching local objects. If you want to prevent objects from being cached, use the cache_stoplist or http_stop configuration options (depending on your version).
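In Squid-2, for example, preventing caching of local servers is typically expressed with an ACL and no_cache (the domain below is hypothetical):

```
acl local-servers dstdomain .mydomain.example
no_cache deny local-servers
```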
Connection Refused when reaching a sibling
I get Connection Refused when the cache tries to retrieve an object located on a sibling, even though the sibling thinks it delivered the object to my cache.
If the HTTP port number is wrong but the ICP port is correct, you will send ICP queries correctly, and the ICP replies will fool your cache into thinking the configuration is correct. But large objects will fail, since you don't have the correct HTTP port for the sibling in your squid.conf file. If your sibling changed their http_port, you could have this problem for some time before noticing.
Running out of filedescriptors
If you see the Too many open files error message, you are most likely running out of file descriptors. This may be due to running Squid on an operating system with a low filedescriptor limit. This limit is often configurable in the kernel or with other system tuning tools. There are two ways to run out of file descriptors: first, you can hit the per-process limit on file descriptors. Second, you can hit the system limit on total file descriptors for all processes.
Linux
Linux kernel 2.2.12 and later supports an "unlimited" number of open files without patching. So does most of glibc-2.1.1 and later (all areas touched by Squid are safe from what I can tell, even more so in later glibc releases). But you still need to take some action, as the kernel defaults to allowing processes only up to 1024 filedescriptors, and Squid picks up the limit at build time.
Edit /usr/include/bits/types.h to define __FD_SETSIZE to at least the amount of filedescriptors you'd like to support (Not required for Squid-2.5 and later).
Before configuring Squid run "ulimit -HSn ####" (where #### is the number of filedescriptors you need to support). Be sure to run "make clean" before configure if you have already run configure as the script might otherwise have cached the prior result.
- Configure, build and install Squid as usual
Make sure your script for starting Squid contains the above ulimit command to raise the filedescriptor limit. You may also need to allow a larger port span for outgoing connections (set in /proc/sys/net/ipv4/, like in "echo 1024 32768 > /proc/sys/net/ipv4/ip_local_port_range")
Alternatively you can
- Run configure with your needed configure options
- edit include/autoconf.h and define SQUID_MAXFD to your desired limit. Make sure to make it a nice and clean modulo 64 value (multiple of 64) to avoid various bugs in the libc headers.
- build and install Squid as usual
- Set the runtime ulimit as described above when starting Squid.
If running things as root is not an option, then get your sysadmin to install the needed ulimit command in /etc/initscript (see man initscript), install a patched kernel where INR_OPEN in include/linux/fs.h is changed to at least the amount you need, or have them install a small suid program which sets the limit (see link below).
More information can be found from Henriks How to get many filedescriptors on Linux 2.2.X and later page.
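As a quick sanity check, the following sketch prints the filedescriptor limits that a Squid started from the current shell would inherit:

```shell
#!/bin/sh
# print the soft and hard filedescriptor limits of this shell;
# a Squid launched from here inherits them
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft=$soft hard=$hard"
```

If the soft value is lower than what you raised with "ulimit -HSn", your start script is not actually applying the limit.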
Solaris
Add the following to your /etc/system file and reboot to increase your maximum file descriptors per process:
set rlim_fd_max = 4096
Next you should re-run the configure script in the top directory so that it finds the new value. If it does not find the new limit, then you might try editing include/autoconf.h and setting #define DEFAULT_FD_SETSIZE by hand. Note that include/autoconf.h is created from autoconf.h.in every time you run configure. Thus, if you edit it by hand, you might lose your changes later on.
Jens-S. Voeckler advises that you should NOT change the default soft limit (rlim_fd_cur) to anything larger than 256. It will break other programs, such as the license manager needed for the SUN workshop compiler. Jens-S. also says that it should be safe to raise the limit for the Squid process as high as 16,384, except that there may be problems during reconfigure or logrotate if all of the lower 256 filedescriptors are in use at the time of rotate/reconfigure.
FreeBSD
- How do I check my maximum filedescriptors?
Do sysctl -a and look for the value of kern.maxfilesperproc.
- How do I increase them?
sysctl -w kern.maxfiles=XXXX
sysctl -w kern.maxfilesperproc=XXXX
|
You probably want maxfiles > maxfilesperproc if you're going to be pushing the limit. |
- What is the upper limit?
- I don't think there is a formal upper limit inside the kernel. All the data structures are dynamically allocated. In practice there might be unintended metaphenomena (kernel spending too much time searching tables, for example).
General BSD
For most BSD-derived systems (SunOS, 4.4BSD, OpenBSD, FreeBSD, NetBSD, BSD/OS, 386BSD, Ultrix) you can also use the "brute force" method to increase these values in the kernel (requires a kernel rebuild):
- How do I check my maximum filedescriptors?
Do pstat -T and look for the files value, typically expressed as the ratio of current/maximum.
- How do I increase them the easy way?
One way is to increase the value of the maxusers variable in the kernel configuration file and build a new kernel. This method is quick and easy but also has the effect of increasing a wide variety of other variables that you may not need or want increased.
- Is there a more precise method?
Another way is to find the param.c file in your kernel build area and change the arithmetic behind the relationship between maxusers and the maximum number of open files.
Here are a few examples which should lead you in the right direction:
SunOS
Change the value of nfile in /usr/kvm/sys/conf.common/param.c by altering this equation:

int nfile = 16 * (NPROC + 16 + MAXUSERS) / 10 + 64;
Where NPROC is defined by:
#define NPROC (10 + 16 * MAXUSERS)
FreeBSD (from the 2.1.6 kernel)
Very similar to SunOS, edit /usr/src/sys/conf/param.c and alter the relationship between maxusers and the maxfiles and maxfilesperproc variables:
int maxfiles = NPROC*2;
int maxfilesperproc = NPROC*2;
Where NPROC is defined by:

#define NPROC (20 + 16 * MAXUSERS)

The per-process limit can also be adjusted directly in the kernel configuration file with the following directive:

options OPEN_MAX=128
BSD/OS (from the 2.1 kernel)
Edit /usr/src/sys/conf/param.c and adjust the maxfiles math here:
int maxfiles = 3 * (NPROC + MAXUSERS) + 80;
Where NPROC is defined by:

#define NPROC (20 + 16 * MAXUSERS)

You should also set the OPEN_MAX value in your kernel configuration file to change the per-process limit.
Reconfigure afterwards
After you rebuild/reconfigure your kernel with more filedescriptors, you must then recompile Squid. Squid's configure script determines how many filedescriptors are available, so you must make sure the configure script runs again as well. For example:
cd squid-1.1.x
make realclean
./configure --prefix=/usr/local/squid
make
What are these strange lines about removing objects?
For example:
97/01/23 22:31:10| Removed 1 of 9 objects from bucket 3913
97/01/23 22:33:10| Removed 1 of 5 objects from bucket 4315
97/01/23 22:35:40| Removed 1 of 14 objects from bucket 6391
These log entries are normal, and do not indicate that squid has reached cache_swap_high.
Consult your cache information page in cachemgr.cgi for a line like this:
Storage LRU Expiration Age: 364.01 days
Objects which have not been used for that amount of time are removed as a part of the regular maintenance. You can set an upper limit on the LRU Expiration Age value with reference_age in the config file.
Can I change a Windows NT FTP server to list directories in Unix format?
Why, yes you can! Select the following menus:
- Start
- Programs
- Microsoft Internet Server (Common)
- Internet Service Manager
This will bring up a box with icons for your various services. One of them should be a little ftp "folder." Double click on this.
You will then have to select the server (there should only be one). Select that, then choose "Properties" from the menu and choose the "directories" tab along the top.
There will be an option at the bottom saying "Directory listing style." Choose the "Unix" type, not the "MS-DOS" type.
by Oskar Pearson
Why am I getting "Ignoring MISS from non-peer x.x.x.x?"
You are receiving ICP MISSes (via UDP) from a parent or sibling cache whose IP address your cache does not know about. This may happen in two situations.
If the peer is multihomed, it is sending packets out an interface which is not advertised in the DNS. Unfortunately, this is a configuration problem at the peer site. You can tell them to either add the interface's IP address to their DNS, or use Squid's "udp_outgoing_address" option to force the replies out a specific interface. For example, in your parent's squid.conf:
udp_outgoing_address proxy.parent.com
on your squid.conf:
cache_peer proxy.parent.com parent 3128 3130
You can also see this warning when sending ICP queries to multicast addresses. For security reasons, Squid requires your configuration to list all other caches listening on the multicast group address. If an unknown cache listens to that address and sends replies, your cache will log the warning message. To fix this situation, either tell the unknown cache to stop listening on the multicast address, or if they are legitimate, add them to your configuration file.
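A hedged sketch of the sending side (the group address and peer name are examples): a multicast peer entry to send the queries, plus a multicast-responder peer entry for each cache that is allowed to reply:

```
cache_peer 224.0.1.20 multicast 3128 3130 ttl=16
cache_peer peer.example.com sibling 3128 3130 multicast-responder
```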
DNS lookups for domain names with underscores (_) always fail.
The standards for naming hosts ( RFC 952 and RFC 1101) do not allow underscores in domain names:
A "name" (Net, Host, Gateway, or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and period (.).
The resolver library that ships with recent versions of BIND enforces this restriction, returning an error for any host with underscore in the hostname. The best solution is to complain to the hostmaster of the offending site, and ask them to rename their host.
See also the comp.protocols.tcp-ip.domains FAQ.
Some people have noticed that RFC 1033 implies that underscores are allowed. However, this is an informational RFC with a poorly chosen example, and not a standard by any means.
Why does Squid say: "Illegal character in hostname; underscores are not allowed?"
See the above question. The underscore character is not valid for hostnames.
Some DNS resolvers allow the underscore, so yes, the hostname might work fine when you don't use Squid.
To make Squid allow underscores in hostnames, re-run the configure script with this option:
% ./configure --enable-underscores ...
and then recompile:
% make clean
% make
Why am I getting access denied from a sibling cache?
The answer to this is somewhat complicated, so please hold on.
Note: Most of this text is taken from ICP and the Squid Web Cache.
An ICP query does not include any parent or sibling designation, so the receiver really has no indication of how the peer cache is configured to use it. This issue becomes important when a cache is willing to serve cache hits to anyone, but only handle cache misses for its paying users or customers. In other words, whether or not to allow the request depends on if the result is a hit or a miss. To accomplish this, Squid acquired the miss_access feature in October of 1996.
The necessity of "miss access" makes life a little bit complicated, and not only because it was awkward to implement. Miss access means that the ICP query reply must be an extremely accurate prediction of the result of a subsequent HTTP request. Ascertaining this result is actually very hard, if not impossible to do, since the ICP request cannot convey the full HTTP request. Additionally, there are more types of HTTP request results than there are for ICP. The ICP query reply will either be a hit or miss. However, the HTTP request might result in a "304 Not Modified" reply sent from the origin server. Such a reply is not strictly a hit since the peer needed to forward a conditional request to the source. At the same time, it's not strictly a miss either since the local object data is still valid, and the Not-Modified reply is quite small.
One serious problem for cache hierarchies is mismatched freshness parameters. Consider a cache C using "strict" freshness parameters so its users get maximally current data. C has a sibling S with less strict freshness parameters. When an object is requested at C, C might find that S already has the object via an ICP query and ICP HIT response. C then retrieves the object from S.
In an HTTP/1.0 world, C (and C's client) will receive an object that was never subject to its local freshness rules. Neither HTTP/1.0 nor ICP provides any way to ask only for objects less than a certain age. If the retrieved object is stale by C's rules, it will be removed from C's cache, but it will subsequently be fetched from S so long as it remains fresh there. This configuration miscoupling problem is a significant deterrent to establishing both parent and sibling relationships. HTTP/1.1 provides numerous request headers to specify freshness requirements, which actually introduces a different problem for cache hierarchies: ICP still does not include any age information, neither in query nor reply. So in the end, the fundamental problem is that the ICP query does not provide enough information to accurately predict whether the HTTP request will be a hit or miss. In fact, the current ICP Internet Draft is very vague on this subject. What does ICP HIT really mean? Does it mean "I know a little about that URL and have some copy of the object?" Or does it mean "I have a valid copy of that object and you are allowed to get it from me?" So, what can be done about this problem? We really need to change ICP so that freshness parameters are included. Until that happens, the members of a cache hierarchy have only two options to totally eliminate the "access denied" messages from sibling caches:
- Make sure all members have the same refresh_rules parameters.
- Do not use miss_access at all. Promise your sibling cache administrator that your cache is properly configured and that you will not abuse their generosity. The sibling cache administrator can check his log files to make sure you are keeping your word.
If neither of these is realistic, then the sibling relationship should not exist.
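A minimal sketch of the miss_access arrangement described above (the ACL name and network are hypothetical): cache hits are served to anyone, but misses are only handled for your own clients:

```
acl localclients src 172.16.0.0/16
miss_access allow localclients
miss_access deny all
```

With this in place, a sibling requesting an object you do not actually have receives an "access denied" reply, which is exactly the situation discussed above.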
Cannot bind socket FD NN to *:8080 (125) Address already in use
This means that another process is already listening on port 8080 (or whatever you're using). It could mean that you have a Squid process already running, or it could be from another program. To verify, use the netstat command:
netstat -naf inet | grep LISTEN
That will show all sockets in the LISTEN state. You might also try
netstat -naf inet | grep 8080
If you find that some process has bound to your port, but you're not sure which process it is, you might be able to use the excellent lsof program. It will show you which processes own every open file descriptor on your system.
icpDetectClientClose: ERROR xxx.xxx.xxx.xxx: (32) Broken pipe
This means that the client socket was closed by the client before Squid was finished sending data to it. Squid detects this by trying to read(2) some data from the socket. If the read(2) call fails, then Squid knows the socket has been closed. Normally the read(2) call returns ECONNRESET: Connection reset by peer and these are NOT logged. Any other error messages (such as EPIPE: Broken pipe) are logged to cache.log. See the "intro" of section 2 of your Unix manual for a list of all error codes.
icpDetectClientClose: FD 135, 255 unexpected bytes
These are caused by misbehaving Web clients attempting to use persistent connections. Squid-1.1 does not support persistent connections.
Does Squid work with NTLM Authentication?
Version 2.5 supports Microsoft NTLM authentication to authenticate users accessing the proxy server itself (be it in a forward or reverse setup). See ../ProxyAuthentication for further details.
Version 2.6 and onwards also support the kind of infrastructure that's needed to properly allow a user to authenticate against an NTLM-enabled webserver.
As NTLM authentication backends go, the real work is usually done by Samba on Squid's behalf. That being the case, Squid supports any authentication backend supported by Samba, including Samba itself and MS Windows NT 3.51 and onwards Domain Controllers.
NTLM for HTTP is, however, a horrible example of an authentication protocol, and we recommend avoiding it in favour of saner, standards-sanctioned alternatives such as Digest.
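As a hedged sketch of the Samba-backed setup (the helper path is an assumption; it depends on where your distribution installs Samba's ntlm_auth), proxy authentication via NTLM is typically wired up in squid.conf like this:

```
# Path to Samba's ntlm_auth helper is installation-dependent
auth_param ntlm program /usr/bin/ntlm_auth --helper-protocol=squid-2.5-ntlmssp
auth_param ntlm children 5

acl authenticated proxy_auth REQUIRED
http_access allow authenticated
http_access deny all
```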
The ''default'' parent option isn't working!
This message was received at squid-bugs: If you have only one parent, configured as:
cache_peer xxxx parent 3128 3130 no-query default
nothing is sent to the parent; neither UDP packets, nor TCP connections. Simply adding default to a parent does not force all requests to be sent to that parent; it only marks the parent of last resort. If the cache is able to make direct connections, direct will be preferred over default. If you want to force all requests to your parent cache(s), use never_direct:
acl all src 0.0.0.0/0.0.0.0
never_direct allow all
"Hotmail" complains about: Intrusion Logged. Access denied.
Hotmail is proxy-unfriendly and requires all requests to come from the same IP address. You can fix this by adding the following line to your squid.conf:
hierarchy_stoplist hotmail.com
My Squid becomes very slow after it has been running for some time.
This is most likely because Squid is using more memory than it should be for your system. When the Squid process becomes large, it experiences a lot of paging. This will very rapidly degrade the performance of Squid. Memory usage is a complicated problem. There are a number of things to consider.
First, examine the Cache Manager Info output and look at these two lines:
Number of HTTP requests received: 121104
Page faults with physical i/o: 16720
Note: If your system does not have the getrusage() function, then you will not see the page faults line.
Divide the number of page faults by the number of connections. In this case 16720/121104 = 0.14. Ideally this ratio should be in the 0.0 - 0.1 range. It may be acceptable to be in the 0.1 - 0.2 range. Above that, however, and you will most likely find that Squid's performance is unacceptably slow.
If the ratio is too high, you will need to make some changes as detailed in ../SquidMemory.
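The division above can be checked with a one-line awk command; the two numbers are the ones taken from the Cache Manager Info output:

```shell
# page faults / HTTP requests, from the Info page numbers above
echo "16720 121104" | awk '{printf "%.2f\n", $1 / $2}'
```

This prints 0.14, which falls in the "may be acceptable" 0.1 - 0.2 range.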
WARNING: Failed to start 'dnsserver'
This could be a permission problem. Does the Squid userid have permission to execute the dnsserver program? You might also try testing dnsserver directly:
$ echo oceana.nlanr.net | ./dnsserver
Should produce something like:
$name oceana.nlanr.net
$h_name oceana.nlanr.net
$h_len 4
$ipcount 1
132.249.40.200
$aliascount 0
$ttl 82067
$end
Sending bug reports to the Squid team
Bug reports for Squid should be registered in our bug database. Any bug report must include
- The Squid version
- Your Operating System type and version
- A clear description of the bug symptoms.
- If your Squid crashes, the report must include a coredump's stack trace as described below
Please note that bug reports are only processed if they can be reproduced or identified in the current STABLE or development versions of Squid. If you are running an older version of Squid, the first response will be to ask you to upgrade, unless the developer who looks at your bug report can immediately identify that the bug also exists in the current versions. It should also be noted that any patches provided by the Squid developer team will be for the current STABLE version, even if you run an older version.
crashes and core dumps
There are two conditions under which squid will exit abnormally and generate a coredump. First, a SIGSEGV or SIGBUS signal will cause Squid to exit and dump core. Second, many functions include consistency checks. If one of those checks fail, Squid calls abort() to generate a core dump.
Many people report that Squid doesn't leave a coredump anywhere. This may be due to one of the following reasons:
- Resource Limits
- The shell has limits on the size of a coredump file. You may need to increase the limit using ulimit or a similar command (see below)
- sysctl options
- On FreeBSD, you won't get a coredump from programs that call setuid() and/or setgid() (like Squid sometimes does) unless you enable this option:
# sysctl -w kern.sugid_coredump=1
- No debugging symbols
- The Squid binary must have debugging symbols in order to get a meaningful coredump.
- Threads and Linux
- On Linux, threaded applications do not generate core dumps. When you use the aufs cache_dir type, Squid uses threads and you can't get a coredump.
- It did leave a coredump file, you just can't find it.
Resource Limits
These limits can usually be changed in shell scripts. The command to change the resource limits is usually either limit or limits. Sometimes it is a shell-builtin function, and sometimes it is a regular program. Also note that you can set resource limits in the /etc/login.conf file on FreeBSD and maybe other systems. To change the coredumpsize limit you might use a command like:
limit coredumpsize unlimited
or
limits coredump unlimited
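In Bourne-style shells (sh, bash) the equivalent built-in is ulimit; a quick sketch:

```shell
# Remove the core file size limit for this shell and its children
ulimit -c unlimited
# Verify the new limit
ulimit -c
```

This prints "unlimited" once the limit has been lifted. Note the limit only applies to processes started from that shell.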
Debugging Symbols
To see if your Squid binary has debugging symbols, use this command:
% nm /usr/local/squid/bin/squid | head
The binary has debugging symbols if you see gobbledegook like this:
0812abec B AS_tree_head
080a7540 D AclMatchedName
080a73fc D ActionTable
080908a4 r B_BYTES_STR
080908bc r B_GBYTES_STR
080908ac r B_KBYTES_STR
080908b4 r B_MBYTES_STR
080a7550 D Biggest_FD
08097c0c R CacheDigestHashFuncCount
08098f00 r CcAttrs
There are no debugging symbols if you see this instead:
/usr/local/squid/bin/squid: no symbols
Debugging symbols may have been removed by your install program. If you look at the squid binary from the source directory, then it might have the debugging symbols.
Coredump Location
The core dump file will be left in one of the following locations:
- The coredump_dir directory, if you set that option.
- The first cache_dir directory, if you have used the cache_effective_user option.
- The current directory when Squid was started
Recent versions of Squid report their current directory after starting, so look there first:
2000/03/14 00:12:36| Set Current Directory to /usr/local/squid/cache
If you cannot find a core file, then either Squid does not have permission to write in its current directory, or perhaps your shell limits are preventing the core file from being written.
Often you can get a coredump if you run Squid from the command line like this (csh shells and clones):
% limit coredumpsize unlimited
% /usr/local/squid/bin/squid -NCd1
Once you have located the core dump file, use a debugger such as dbx or gdb to generate a stack trace:
tirana-wessels squid/src 270% gdb squid /T2/Cache/core
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.15.1 (hppa1.0-hp-hpux10.10), Copyright 1995 Free Software Foundation, Inc...
Core was generated by `squid'.
Program terminated with signal 6, Aborted.
[...]
(gdb) where
#0 0xc01277a8 in _kill ()
#1 0xc00b2944 in _raise ()
#2 0xc007bb08 in abort ()
#3 0x53f5c in __eprintf (string=0x7b037048 "", expression=0x5f <Address 0x5f out of bounds>, line=8, filename=0x6b <Address 0x6b out of bounds>)
#4 0x29828 in fd_open (fd=10918, type=3221514150, desc=0x95e4 "HTTP Request") at fd.c:71
#5 0x24f40 in comm_accept (fd=2063838200, peer=0x7b0390b0, me=0x6b) at comm.c:574
#6 0x23874 in httpAccept (sock=33, notused=0xc00467a6) at client_side.c:1691
#7 0x25510 in comm_select_incoming () at comm.c:784
#8 0x25954 in comm_select (sec=29) at comm.c:1052
#9 0x3b04c in main (argc=1073745368, argv=0x40000dd8) at main.c:671
If possible, you might keep the coredump file around for a day or two. It is often helpful if we can ask you to send additional debugger output, such as the contents of some variables. But please note that a core file is only useful if paired with the exact same binary as generated the corefile. If you recompile Squid then any coredumps from previous versions will be useless unless you have saved the corresponding Squid binaries, and any attempts to analyze such coredumps will most certainly give misleading information about the cause of the crash.
If you CANNOT get Squid to leave a core file for you then one of the following approaches can be used
The first alternative is to start Squid under the control of GDB:
% gdb /path/to/squid
handle SIGPIPE pass nostop noprint
run -DNYCd3
[wait for crash]
backtrace
quit
The drawback from the above is that it isn't really suitable to run on a production system as Squid then won't restart automatically if it crashes. The good news is that it is fully possible to automate the process above to automatically get the stack trace and then restart Squid. Here is a short automated script that should work:
#!/bin/sh
trap "rm -f $$.gdb" 0
cat <<EOF >$$.gdb
handle SIGPIPE pass nostop noprint
run -DNYCd3
backtrace
quit
EOF
while sleep 2; do
  gdb -x $$.gdb /path/to/squid 2>&1 | tee -a squid.out
done
Other options, if the above cannot be done, are to:
- Build Squid with the --enable-stacktraces option, if support exists for your OS (exists for Linux glibc on Intel, and Solaris with some extra libraries which seems rather impossible to find these days..)
- Run Squid using the "catchsegv" tool. (Linux glibc Intel)
These approaches do not provide nearly as much detail as using gdb.
Debugging Squid
If you believe you have found a non-fatal bug (such as incorrect HTTP processing) please send us a section of your cache.log with debugging to demonstrate the problem. The cache.log file can become very large, so alternatively, you may want to copy it to an FTP or HTTP server where we can download it.
It is very simple to enable full debugging on a running squid process. Simply use the -k debug command line option:
% ./squid -k debug
This causes every debug() statement in the source code to write a line in the cache.log file. You also use the same command to restore Squid to the normal debugging level. To enable selective debugging (e.g. for one source file only), you need to edit squid.conf and set the debug_options directive, for example:
debug_options ALL,1 28,9
Then you have to restart or reconfigure Squid.
Once you have the debugging captured to cache.log, take a look at it yourself and see if you can make sense of the behaviour which you see. If not, please feel free to send your debugging output to the squid-users or squid-bugs lists.
FATAL: ipcache_init: DNS name lookup tests failed
Squid normally tests your system's DNS configuration before it starts serving requests. Squid tries to resolve some common DNS names, as defined in the dns_testnames configuration directive. If Squid cannot resolve these names, it could mean:
- your /etc/resolv.conf file may contain incorrect information.
- your /etc/resolv.conf file may have incorrect permissions, and may be unreadable by Squid.
To disable this feature, use the -D command line option. Note, Squid does NOT use the dnsserver processes for this test; it is performed internally, before the dnsservers start.
FATAL: Failed to make swap directory /var/spool/cache: (13) Permission denied
Starting with version 1.1.15, we have required that you first run
squid -z
to create the swap directories on your filesystem. If you have set the cache_effective_user option, then the Squid process takes on the given userid before making the directories. If the cache_dir directory (e.g. /var/spool/cache) does not exist, and the Squid userid does not have permission to create it, then you will get the "permission denied" error. This can be simply fixed by manually creating the cache directory:
# mkdir /var/spool/cache
# chown <userid> <groupid> /var/spool/cache
# squid -z
Alternatively, if the directory already exists, then your operating system may be returning "Permission Denied" instead of "File Exists" on the mkdir() system call. This [store.c-mkdir.patch patch] by Miquel van Smoorenburg should fix it.
FATAL: Cannot open HTTP Port
Either
- the Squid userid does not have permission to bind to the port, or
- some other process has bound itself to the port
Remember that root privileges are required to open port numbers less than 1024. If you see this message when using a high port number, or even when starting Squid as root, then the port has already been opened by another process.
SELinux can also deny squid access to port 80, even if you are starting squid as root. Configure SELinux to allow squid to open port 80 or disable SELinux in this case.
Maybe you are running in the HTTP Accelerator mode and there is already a HTTP server running on port 80? If you're really stuck, install the way cool lsof utility to show you which process has your port in use.
FATAL: All redirectors have exited!
This is explained in ../SquidRedirectors.
FATAL: file_map_allocate: Exceeded filemap limit
See the next question.
FATAL: You've run out of swap file numbers.
Note: The information here applies to version 2.2 and earlier.
Squid keeps an in-memory bitmap of disk files that are available for use, or are being used. The size of this bitmap is determined at run time, based on two things: the size of your cache, and the average (mean) cache object size.
The size of your cache is specified in squid.conf, on the cache_dir lines. The mean object size can also be specified in squid.conf, with the 'store_avg_object_size' directive. By default, Squid uses 13 Kbytes as the average size. When allocating the bitmaps, Squid allocates this many bits:
2 * cache_size / store_avg_object_size
So, if you exactly specify the correct average object size, Squid should have 50% filemap bits free when the cache is full. You can see how many filemap bits are being used by looking at the 'storedir' cache manager page. It looks like this:
Store Directory #0: /usr/local/squid/cache
First level subdirectories: 4
Second level subdirectories: 4
Maximum Size: 1024000 KB
Current Size: 924837 KB
Percent Used: 90.32%
Filemap bits in use: 77308 of 157538 (49%)
Flags:
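The 157538 figure in that output follows directly from the formula above; as a quick check:

```shell
# 2 * cache_size / store_avg_object_size, for a 1024000 KB cache
# with the default 13 KB average object size
awk 'BEGIN { printf "%d\n", 2 * 1024000 / 13 }'
```

This prints 157538, matching the "Filemap bits in use" line.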
Now, if you see the "You've run out of swap file numbers" message, then it means one of two things:
- You've found a Squid bug.
- Your cache's average file size is much smaller than the 'store_avg_object_size' value.
To check the average file size of object currently in your cache, look at the cache manager 'info' page, and you will find a line like:
Mean Object Size: 11.96 KB
To make the warning message go away, set 'store_avg_object_size' to that value (or lower) and then restart Squid.
I am using up over 95% of the filemap bits?!!
Note: The information here is current for version 2.3.
Calm down, this is now normal. Squid now dynamically allocates filemap bits based on the number of objects in your cache. You won't run out of them, we promise.
FATAL: Cannot open /usr/local/squid/logs/access.log: (13) Permission denied
In Unix, things like processes and files have an owner. For Squid, the process owner and file owner should be the same. If they are not the same, you may get messages like "permission denied." To find out who owns a file, use the ls command:
% ls -l /usr/local/squid/logs/access.log
A process is normally owned by the user who starts it. However, Unix sometimes allows a process to change its owner. If you specified a value for the effective_user option in squid.conf, then that will be the process owner. The files must be owned by this same userid. If all this is confusing, then you probably should not be running Squid until you learn some more about Unix. As a reference, I suggest Learning the UNIX Operating System, 4th Edition.
When using a username and password, I can not access some files.
If I try by way of a test, to access ftp://username:password@ftpserver/somewhere/foo.tar.gz
I get somewhere/foo.tar.gz: Not a directory.
Use this URL instead:
ftp://username:password@ftpserver/%2fsomewhere/foo.tar.gz
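The %2f in that URL is the percent-encoding of the "/" character (ASCII 0x2f, octal 057), which marks the path as absolute rather than relative to the FTP login directory; a quick demonstration of the encoding itself:

```shell
# Print the character with code 057 (octal) = 0x2f = "/"
printf '\057\n'
```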
pingerOpen: icmp_sock: (13) Permission denied
This means your pinger program does not have root privileges. You should either do this:
% su
# make install-pinger
or
# chown root /usr/local/squid/bin/pinger
# chmod 4755 /usr/local/squid/bin/pinger
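The 4755 mode makes pinger setuid-root; a small sketch of what that mode looks like, using a scratch file (the filename is arbitrary):

```shell
# Create a scratch file and give it the same mode pinger should have
f=$(mktemp)
chmod 4755 "$f"
# GNU stat prints the octal mode; expect 4755
stat -c %a "$f"
rm -f "$f"
```

In ls -l output the setuid bit shows up as "s" in place of the owner's "x" (e.g. -rwsr-xr-x).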
What is a forwarding loop?
A forwarding loop is when a request passes through one proxy more than once. You can get a forwarding loop if
- a cache forwards requests to itself. This might happen with interception caching (or server acceleration) configurations.
- a pair or group of caches forward requests to each other. This can happen when Squid uses ICP, Cache Digests, or the ICMP RTT database to select a next-hop cache.
Forwarding loops are detected by examining the Via request header. Each cache which "touches" a request must add its hostname to the Via header. If a cache notices its own hostname in this header for an incoming request, it knows there is a forwarding loop somewhere. Squid may report a forwarding loop if a request goes through two caches that have the same visible_hostname value. If you want to have multiple machines with the same visible_hostname then you must give each machine a different unique_hostname so that forwarding loops are correctly detected. When Squid detects a forwarding loop, it is logged to the cache.log file. One way to reduce forwarding loops is to change a parent relationship to a sibling relationship. Another way is to use cache_peer_access rules. For example:
# Our parent caches
cache_peer A.example.com parent 3128 3130
cache_peer B.example.com parent 3128 3130
cache_peer C.example.com parent 3128 3130
# An ACL list
acl PEERS src A.example.com
acl PEERS src B.example.com
acl PEERS src C.example.com
# Prevent forwarding loops
cache_peer_access A.example.com allow !PEERS
cache_peer_access B.example.com allow !PEERS
cache_peer_access C.example.com allow !PEERS
The above configuration instructs squid to NOT forward a request to parents A, B, or C when a request is received from any one of those caches.
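For the case of multiple machines sharing one public name, a minimal sketch (hostnames hypothetical):

```
# Both load-balanced proxies present the same public name...
visible_hostname proxy.example.com
# ...but each machine needs its own name for Via-based loop detection
unique_hostname proxy1.example.com
```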
accept failure: (71) Protocol error
This error message is seen mostly on Solaris systems. Mark Kennedy gives a great explanation:
Error 71 [EPROTO] is an obscure way of reporting that clients made it onto your server's TCP incoming connection queue but the client tore down the connection before the server could accept it. I.e. your server ignored its clients for too long. We've seen this happen when we ran out of file descriptors. I guess it could also happen if something made squid block for a long time.
storeSwapInFileOpened: ... Size mismatch
Note: These messages are specific to Squid 2.X.
Got these messages in my cache log - I guess it means that the index contents do not match the contents on disk.
1998/09/23 09:31:30| storeSwapInFileOpened: /var/cache/00/00/00000015: Size mismatch: 776(fstat) != 3785(object)
1998/09/23 09:31:31| storeSwapInFileOpened: /var/cache/00/00/00000017: Size mismatch: 2571(fstat) != 4159(object)
What does Squid do in this case? These happen when Squid reads an object from disk for a cache hit. After it opens the file, Squid checks to see if the size is what it expects it should be. If the size doesn't match, the error is printed. In this case, Squid does not send the wrong object to the client. It will re-fetch the object from the source.
Why do I get ''fwdDispatch: Cannot retrieve 'https://www.buy.com/corp/ordertracking.asp' ''
These messages are caused by buggy clients, mostly Netscape Navigator. What happens is, Netscape sends an HTTPS/SSL request over a persistent HTTP connection. Normally, when Squid gets an SSL request, it looks like this:
CONNECT www.buy.com:443 HTTP/1.0
Then Squid opens a TCP connection to the destination host and port, and the real request is sent encrypted over this connection. That's the whole point of SSL: all of the information must be sent encrypted. With this client bug, however, Squid receives a request like this:
GET https://www.buy.com/corp/ordertracking.asp HTTP/1.0
Accept: */*
User-agent: Netscape ...
...
Now, all of the headers, and the message body have been sent, unencrypted to Squid. There is no way for Squid to somehow turn this into an SSL request. The only thing we can do is return the error message. This browser bug does represent a security risk because the browser is sending sensitive information unencrypted over the network.
Squid can't access URLs like http://3626046468/ab2/cybercards/moreinfo.html
by Dave J Woolley (DJW at bts dot co dot uk): These are illegal URLs, generally only used by illegal sites; typically the web site that supports a spammer and is expected to survive a few hours longer than the spamming account. Any browser or proxy that works with them should be considered a security risk. RFC 1738 has this to say about the hostname part of a URL:
The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by ".". Fully qualified domain names take the form as described in Section 3.5 of RFC 1034 [13] and Section 2.1 of RFC 1123 [5]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumerical character and possibly also containing "-" characters. The rightmost domain label will never start with a digit, though, which syntactically distinguishes all domain names from the IP addresses.
I get a lot of "URI has whitespace" error messages in my cache log, what should I do?
Whitespace characters (space, tab, newline, carriage return) are not allowed in URIs and URLs. Unfortunately, a number of Web services generate URLs with whitespace. Of course your favorite browser silently accommodates these bad URLs. The servers (or people) that generate these URLs are in violation of Internet standards. The whitespace characters should be encoded. If you want Squid to accept URLs with whitespace, you have to decide how to handle them. There are four choices that you can set with the uri_whitespace directive:
- DENY
- ALLOW
- ENCODE: The whitespace characters are encoded according to RFC 1738. This can be considered a violation of the HTTP specification.
- CHOP
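Whichever policy you choose, it is a single squid.conf line; for example, to reject such URIs outright:

```
uri_whitespace deny
```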
commBind: Cannot bind socket FD 5 to 127.0.0.1:0: (49) Can't assign requested address
This likely means that your system does not have a loopback network device, or that device is not properly configured. All Unix systems should have a network device named lo0, and it should be configured with the address 127.0.0.1. If not, you may get the above error message. To check your system, run:
% ifconfig lo0
The result should look something like:
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384 inet 127.0.0.1 netmask 0xff000000
If you use FreeBSD, see freebsd-no-lo0
Unknown cache_dir type '/var/squid/cache'
The format of the cache_dir option changed with version 2.3. It now takes a type argument. All you need to do is insert ufs in the line, like this:
cache_dir ufs /var/squid/cache ...
unrecognized: 'cache_dns_program /usr/local/squid/bin/dnsserver'
As of Squid 2.3, the default is to use internal DNS lookup code. The cache_dns_program and dns_children options are not known squid.conf directives in this case. Simply comment out these two options. If you want to use external DNS lookups, build Squid with the --disable-internal-dns configure option.
Is ''dns_defnames'' broken in Squid-2.3 and later?
Sort of. As of Squid 2.3, the default is to use internal DNS lookup code. The dns_defnames option is only used with the external dnsserver processes. If you relied on dns_defnames before, you have three choices:
- See if the append_domain option will work for you instead.
- Build Squid with the --disable-internal-dns configure option to keep using the external dnsserver processes.
- Enhance src/dns_internal.c to understand the search and domain lines from /etc/resolv.conf.
What does "sslReadClient: FD 14: read failure: (104) Connection reset by peer" mean?
"Connection reset by peer" is an error code that Unix operating systems sometimes return for read, write, connect, and other system calls. Connection reset means that the other host, the peer, sent us a RESET packet on a TCP connection. A host sends a RESET when it receives an unexpected packet for a nonexistent connection. For example, if one side sends data at the same time that the other side closes a connection, when the other side receives the data it may send a reset back. The fact that these messages appear in Squid's log might indicate a problem, such as a broken origin server or parent cache. On the other hand, they might be "normal," especially since some applications are known to force connection resets rather than a proper close. You probably don't need to worry about them, unless you receive a lot of user complaints relating to SSL sites. Rick Jones notes that if the server is running a Microsoft TCP stack, clients receive RST segments whenever the listen queue overflows. In other words, if the server is really busy, new connections receive the reset message. This is contrary to rational behaviour, but is unlikely to change.
What does ''Connection refused'' mean?
This is an error message, generated by your operating system, in response to a connect() system call. It happens when there is no server at the other end listening on the port number that we tried to connect to. It's quite easy to generate this error on your own. Simply telnet to a random, high-numbered port:
% telnet localhost 12345
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
It happens because there is no server listening for connections on port 12345.
When you see this in response to a URL request, it probably means the origin server web site is temporarily down. It may also mean that your parent cache is down, if you have one.
squid: ERROR: no running copy
You may get this message when you run commands like squid -krotate. This error message usually means that the squid.pid file is missing. If you accidentally removed the PID file, there are two ways to get it back. One is to run ps and find the Squid process id:
bender-wessels % ps ax | grep squid
83617 ?? Ss 0:00.00 squid -s
83619 ?? S 0:00.48 (squid) -s (squid)
You want the second process id, 83619 in this case. Create the PID file and put the process id number there. For example:
echo 83619 > /usr/local/squid/logs/squid.pid
The second is to use the above technique to find the Squid process id. Send the process a HUP signal, which is the same as squid -kreconfigure: kill -HUP 83619
The reconfigure process creates a new PID file automatically.
FATAL: getgrnam failed to find groupid for effective group 'nogroup'
You are probably starting Squid as root. Squid is trying to find a group-id that doesn't have any special privileges that it will run as. The default is nogroup, but this may not be defined on your system. You need to edit squid.conf and set cache_effective_group to the name of an unprivileged group from /etc/group. There is a good chance that nobody will work for you.
"Unsupported Request Method and Protocol" for ''https'' URLs.
Note: The information here is current for version 2.3.
This is correct. Squid does not know what to do with an https URL. To handle such a URL, Squid would need to speak the SSL protocol. Unfortunately, it does not (yet). Normally, when you type an https URL into your browser, the browser tunnels the request through Squid with the CONNECT request method.
The CONNECT method is a way to tunnel any kind of connection through an HTTP proxy. The proxy doesn't understand or interpret the contents. It just passes bytes back and forth between the client and server. For the gory details on tunnelling and the CONNECT method, please see RFC 2817 and Tunneling TCP based protocols through Web proxy servers (expired IETF draft).
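As a concrete illustration (the hostname and port here are examples, not from the original text), a browser asked to fetch an https URL through a proxy sends something like:

```
CONNECT www.example.com:443 HTTP/1.1
Host: www.example.com:443

```

If the proxy can open a TCP connection to www.example.com port 443, it replies with a 200 status line and from then on simply relays bytes in both directions without interpreting them.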
Squid uses 100% CPU
There may be many causes for this.
Andrew Doroshenko reports that removing /dev/null, or mounting a filesystem with the nodev option, can cause Squid to use 100% of CPU. His suggested solution is to "touch /dev/null."
Webmin's ''cachemgr.cgi'' crashes the operating system
Mikael Andersson reports that clicking on Webmin's cachemgr.cgi link creates numerous instances of cachemgr.cgi that quickly consume all available memory and bring the system to its knees. Joe Cooper reports this to be caused by SSL problems in some browsers (mainly Netscape 6.x/Mozilla) if your Webmin is SSL enabled. Try with another browser such as Netscape 4.x or Microsoft IE, or disable SSL encryption in Webmin.
Segment Violation at startup or upon first request
Some versions of GCC (notably 2.95.1 through 2.95.4 at least) have bugs with compiler optimization. These GCC bugs may cause NULL pointer accesses in Squid, resulting in a "FATAL: Received Segment Violation...dying" message and a core dump. You can work around these GCC bugs by disabling compiler optimization. The best way to do that is to start with a clean source tree and set the CC options specifically:
% cd squid-x.y
% make distclean
% setenv CFLAGS '-g -Wall'
% ./configure ...
To check that you did it right, you can search for AC_CFLAGS in src/Makefile:
% grep AC_CFLAGS src/Makefile
AC_CFLAGS = -g -Wall
Now when you recompile, GCC won't try to optimize anything:
% make
Making all in lib...
gcc -g -Wall -I../include -I../include -c rfc1123.c
...etc...
Some people worry that disabling compiler optimization will negatively impact Squid's performance. The impact should be negligible, unless your cache is really busy and already runs at high CPU usage. For most people, compiler optimization makes little or no difference at all.
urlParse: Illegal character in hostname 'proxy.mydomain.com:8080proxy.mydomain.com'
By Yomler of fnac.net
A combination of a bad configuration of Internet Explorer and any application which uses the Cydoor DLLs will produce this entry in the log. See cydoor.com for a complete list.
The bad configuration of IE is the use of an automatic configuration script (proxy.pac) together with manual proxy settings that are filled in, whether enabled or not. IE will only use the proxy.pac; Cydoor apps will use both and will generate the errors.
Disabling the old proxy settings in IE is not enough; you should delete them completely and use only the proxy.pac, for example.
Requests for international domain names do not work
By HenrikNordström.
Some people have asked why requests for domain names using national symbols, as "supported" by certain domain registrars, do not work in Squid. This is because there is, as yet, no standard on how to manage national characters in the current Internet protocols such as HTTP or DNS. The current Internet standards are very strict about what is an acceptable hostname, and only accept A-Z, a-z, 0-9 and - in Internet hostname labels. Anything outside this is outside the current Internet standards and will cause interoperability issues, such as the problems seen with such names and Squid.
When there is a consensus in the DNS and HTTP standardization groups on how to handle international domain names, Squid will be changed to support it if any changes to Squid are required.
If you are interested in the progress of the standardization process for international domain names please see the IETF IDN working group's dedicated page.
Why do I sometimes get "Zero Sized Reply"?
This happens when Squid makes a TCP connection to an origin server, but for some reason, the connection is closed before Squid reads any data. Depending on various factors, Squid may be able to retry the request. If you see the "Zero Sized Reply" error message, it means that Squid was unable to retry, or that all retry attempts failed.
What causes a connection to close prematurely? It could be a number of things, including:
- An overloaded origin server.
- TCP implementation/interoperability bugs. See the ../SystemWeirdnesses for details.
- Race conditions with HTTP persistent connections.
- Buggy or misconfigured NAT boxes, firewalls, and load-balancers.
- Denial of service attacks.
- Utilizing TCP blackholing on FreeBSD (check ../SystemWeirdnesses).
You may be able to use tcpdump to track down and observe the problem. Some users believe the problem is caused by very large cookies. One user reports that his Zero Sized Reply problem went away when he told Internet Explorer to not accept third-party cookies. Here are some things you can try to reduce the occurrence of the Zero Sized Reply error:
- Disable HTTP persistent connections with the server_persistent_connections and client_persistent_connections directives.
- Disable any advanced TCP features on the Squid system. Disable ECN on Linux with echo 0 > /proc/sys/net/ipv4/tcp_ecn.
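For example, a squid.conf fragment using the two directives named above (a sketch; check your version's defaults before changing them):

```
server_persistent_connections off
client_persistent_connections off
```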
If this error causes serious problems for you and the above does not help, Squid developers would be happy to help you uncover the problem. However, we will require high-quality debugging information from you, such as tcpdump output, server IP addresses, operating system versions, and access.log entries with full HTTP headers. If you want to make Squid give the Zero Sized error on demand, you can use a short C program. Simply compile and start the program on a system that doesn't already have a server running on port 80. Then try to connect to this fake server through Squid:
Why do I get "The request or reply is too large" errors?
by Grzegorz Janoszka
This error message appears when you try downloading a large file using GET, or uploading it using POST/PUT. There are three parameters to look for: request_body_max_size and reply_body_max_size (these two are set to 0 by default now, which means no limit at all; earlier versions of Squid had e.g. a 1 MB request limit), and request_header_max_size, which defaults to 10 kB (earlier versions used 4 or even 2 kB). In some rather rare circumstances even 10 kB is too low, so you can increase this value.
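For example, a squid.conf fragment raising the header limit described above (the 20 KB value is only an example):

```
request_header_max_size 20 KB
```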
Negative or very large numbers in Store Directory Statistics, or constant complaints about cache above limit
In some situations where swap.state has been corrupted Squid can be very confused about how much data it has in the cache. Such corruption may happen after a power failure or similar fatal event. To recover first stop Squid, then delete the swap.state files from each cache directory and then start Squid again. Squid will automatically rebuild the swap.state index from the cached files reasonably well.
If this does not work or causes too high load on your server due to the reindexing of the cache then delete the cache content as explained in ../OperatingSquid.
Squid problems with Windows Update v5
By Janno de Wit
There seem to be some problems with Microsoft Windows accessing the Windows Update website. This is especially a problem when you block all traffic with a firewall and force your users to go through the Squid cache.
Symptom: Windows Update gives error codes like 0x80072EFD and cannot update; automatic updates aren't working either.
Cause: In earlier Windows versions, Windows Update took the proxy settings from Internet Explorer. Since XP SP2 this is no longer guaranteed. My machine ran Windows XP SP1 without Windows Update problems. When I upgraded to SP2, Windows Update started to give errors when searching for updates.
The problem was that WU did not go through the proxy and tried to establish direct HTTP connections to the update servers. Even when I set the proxy in IE again, it didn't help. It isn't Squid's problem that Windows Update doesn't work; the problem is in Windows itself. The solution is to use the 'proxycfg' tool shipped with Windows XP. With this tool you can set the proxy for WinHTTP.
Commands:
C:\> proxycfg                        # show the current connection type.
                                     # Note: 'Direct Connection' does not force WU to bypass the proxy
C:\> proxycfg -d                     # set direct connection
C:\> proxycfg -p wu-proxy.lan:8080   # set the Windows Update proxy to wu-proxy.lan, port 8080
C:\> proxycfg -u                     # set proxy to the Internet Explorer settings
Contents
- What are cachable objects?
- What is the ICP protocol?
- What is the ''dnsserver''?
- What is a cache hierarchy? What are parents and siblings?
- What is the Squid cache resolution algorithm?
- What features are Squid developers currently working on?
- Tell me more about Internet traffic workloads
- What are the tradeoffs of caching with the NLANR cache system?
- Where can I find out more about firewalls?
- What is the "Storage LRU Expiration Age?"
- What is "Failure Ratio at 1.01; Going into hit-only-mode for 5 minutes"?
- Does squid periodically re-read its configuration file?
- How does ''unlinkd'' work?
- What is an icon URL?
- Can I make my regular FTP clients use a Squid cache?
- Why is the select loop average time so high?
- How does Squid deal with Cookies?
- How does Squid decide when to refresh a cached object?
- What exactly is a ''deferred read''?
- Why is my cache's inbound traffic equal to the outbound traffic?
- How come some objects do not get cached?
- What does ''keep-alive ratio'' mean?
- How does Squid's cache replacement algorithm work?
- What are private and public keys?
- What is FORW_VIA_DB for?
- Does Squid send packets to port 7 (echo)? If so, why?
- What does "WARNING: Reply from unknown nameserver [a.b.c.d]" mean?
- How does Squid distribute cache files among the available directories?
- Why do I see negative byte hit ratio?
- What does "Disabling use of private keys" mean?
- What is a half-closed filedescriptor?
- What does --enable-heap-replacement do?
- Why is actual filesystem space used greater than what Squid thinks?
- How do ''positive_dns_ttl'' and ''negative_dns_ttl'' work?
- What does ''swapin MD5 mismatch'' mean?
- What does ''failed to unpack swapfile meta data'' mean?
- Why doesn't Squid make ''ident'' lookups in interception mode?
- dnsSubmit: queue overload, rejecting blah
- What are FTP passive connections?
What are cachable objects?
An Internet Object is a file, document or response to a query for an Internet service such as FTP, HTTP, or gopher. A client requests an Internet object from a caching proxy; if the object is not already cached, the proxy server fetches the object (either from the host specified in the URL or from a parent or sibling cache) and delivers it to the client.
What is the ICP protocol?
ICP is a protocol used for communication among squid caches. The ICP protocol is defined in two Internet RFC's. RFC 2186 describes the protocol itself, while RFC 2187 describes the application of ICP to hierarchical Web caching.
ICP is primarily used within a cache hierarchy to locate specific objects in sibling caches. If a squid cache does not have a requested document, it sends an ICP query to its siblings, and the siblings respond with ICP replies indicating a "HIT" or a "MISS." The cache then uses the replies to choose from which cache to resolve its own MISS.
ICP also supports multiplexed transmission of multiple object streams over a single TCP connection. ICP is currently implemented on top of UDP. Current versions of Squid also support ICP via multicast.
What is the ''dnsserver''?
The dnsserver is a process forked by squid to resolve IP addresses from domain names. This is necessary because the gethostbyname(3) function blocks the calling process until the DNS query is completed.
Squid must use non-blocking I/O at all times, so DNS lookups are implemented external to the main process. The dnsserver processes do not cache DNS lookups; caching is implemented inside the squid process.
The dnsserver program was integrated into the main Squid binary in Squid-2. If you have reason to use the old style dnsserver process you can build it at ./configure time. However we would suggest that you file a bug if you find that the internal DNS process does not work as you would expect.
What is a cache hierarchy? What are parents and siblings?
A cache hierarchy is a collection of caching proxy servers organized in a logical parent/child and sibling arrangement so that caches closest to Internet gateways (closest to the backbone transit entry-points) act as parents to caches at locations farther from the backbone. The parent caches resolve "misses" for their children. In other words, when a cache requests an object from its parent, and the parent does not have the object in its cache, the parent fetches the object, caches it, and delivers it to the child. This ensures that the hierarchy achieves the maximum reduction in bandwidth utilization on the backbone transit links, helps reduce load on Internet information servers outside the network served by the hierarchy, and builds a rich cache on the parents so that the other child caches in the hierarchy will obtain better "hit" rates against their parents.
In addition to the parent-child relationships, squid supports the notion of siblings: caches at the same level in the hierarchy, provided to distribute cache server load. Each cache in the hierarchy independently decides whether to fetch the reference from the object's home site or from parent or sibling caches, using a simple resolution protocol. Siblings will not fetch an object for another sibling to resolve a cache "miss."
What is the Squid cache resolution algorithm?
- Send ICP queries to all appropriate siblings
- Wait for all replies to arrive with a configurable timeout (the default is two seconds).
- Begin fetching the object upon receipt of the first HIT reply, or
- Fetch the object from the first parent which replied with MISS (subject to weighting values), or
- Fetch the object from the source
The algorithm is somewhat more complicated when firewalls are involved.
The single_parent_bypass directive can be used to skip the ICP queries if the only appropriate sibling is a parent cache (i.e., if there's only one place you'd fetch the object from, why bother querying?)
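The selection steps above can be sketched in Python (a simplified model for illustration; the peer names and tuple layout are invented, and the real algorithm also applies weighting values and firewall rules):

```python
# Simplified model of the cache resolution steps above.
# Each reply is (peer_name, "HIT" or "MISS", is_parent), in order of arrival.
def choose_source(replies):
    # Begin fetching upon receipt of the first HIT reply ...
    for peer, result, is_parent in replies:
        if result == "HIT":
            return peer
    # ... or fetch from the first parent which replied with MISS ...
    for peer, result, is_parent in replies:
        if is_parent and result == "MISS":
            return peer
    # ... or fetch the object from the source.
    return "origin-server"
```

For example, a sibling HIT beats a parent MISS that arrived first, and with no parents at all the request goes straight to the origin server.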
What features are Squid developers currently working on?
There are several open issues for the caching project, namely: more automatic load balancing and (both configured and dynamic) selection of parents, routing, multicast cache-to-cache communication, and better recognition of URLs that are not worth caching.
For our other to-do list items, please see our "TODO" file in the recent source distributions.
Prospective developers should review the resources available at the Squid developers corner.
Tell me more about Internet traffic workloads
Workload can be characterized as the burden a client or group of clients imposes on a system. Understanding the nature of workloads is important to managing system capacity.
If you are interested in Internet traffic workloads then NLANR's Network Analysis activities is a good place to start.
What are the tradeoffs of caching with the NLANR cache system?
The NLANR root caches are at the NSF supercomputer centers (SCCs), which are interconnected via NSF's high speed backbone service (vBNS). So inter-cache communication between the NLANR root caches does not cross the Internet.
The benefits of hierarchical caching (namely, reduced network bandwidth consumption, reduced access latency, and improved resiliency) come at a price. Caches higher in the hierarchy must field the misses of their descendents. If the equilibrium hit rate of a leaf cache is 50%, half of all leaf references have to be resolved through a second level cache rather than directly from the object's source. If this second level cache has most of the documents, it is usually still a win, but if higher level caches often don't have the document, or become overloaded, then they could actually increase access latency, rather than reduce it.
Where can I find out more about firewalls?
Please see the Firewalls FAQ information site.
What is the "Storage LRU Expiration Age?"
For example:
Storage LRU Expiration Age: 4.31 days
The LRU expiration age is a dynamically-calculated value. Any objects which have not been accessed for this amount of time will be removed from the cache to make room for new, incoming objects. Another way of looking at this is that it would take your cache approximately this many days to go from empty to full at your current traffic levels.
As your cache becomes more busy, the LRU age becomes lower so that more objects will be removed to make room for the new ones. Ideally, your cache will have an LRU age value in the range of at least 3 days. If the LRU age is lower than 3 days, then your cache is probably not big enough to handle the volume of requests it receives. By adding more disk space you could increase your cache hit ratio.
The configuration parameter reference_age places an upper limit on your cache's LRU expiration age.
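For example, a squid.conf fragment using the directive named above (the value is only an example):

```
reference_age 1 month
```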
What is "Failure Ratio at 1.01; Going into hit-only-mode for 5 minutes"?
Consider a pair of caches named A and B. It may be the case that A can reach B, and vice-versa, but B has poor reachability to the rest of the Internet. In this case, we would like B to recognize that it has poor reachability and somehow convey this fact to its neighbor caches.
Squid will track the ratio of failed-to-successful requests over short time periods. A failed request is one which is logged as ERR_DNS_FAIL, ERR_CONNECT_FAIL, or ERR_READ_ERROR. When the failed-to-successful ratio exceeds 1.0, then Squid will return ICP_MISS_NOFETCH instead of ICP_MISS to neighbors. Note, Squid will still return ICP_HIT for cache hits.
Does squid periodically re-read its configuration file?
No, you must send a HUP signal to have Squid re-read its configuration file, including access control lists. An easy way to do this is with the -k command line option:
squid -k reconfigure
How does ''unlinkd'' work?
unlinkd is an external process used for unlinking unused cache files. Performing the unlink operation in an external process opens up some race-condition problems for Squid. If we are not careful, the following sequence of events could occur:
- An object with swap file number S is removed from the cache.
- We want to unlink file F which corresponds to swap file number S, so we write pathname F to the unlinkd socket. We also mark S as available in the filemap.
- We have a new object to swap out. It is allocated to the first available file number, which happens to be S. Squid opens file F for writing.
- The unlinkd process reads the request to unlink F and issues the actual unlink call.
So, the problem is, how can we guarantee that unlinkd will not remove a cache file that Squid has recently allocated to a new object? The approach we have taken is to have Squid keep a stack of unused (but not deleted!) swap file numbers. The stack size is hard-coded at 128 entries. We only give unlink requests to unlinkd when the unused file number stack is full. Thus, if we ever have to start unlinking files, we have a pool of 128 file numbers to choose from which we know will not be removed by unlinkd.
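The unused-file-number pool described above can be sketched as follows (a toy model, not Squid's C implementation; the 128-entry size comes from the text, everything else is invented for illustration):

```python
class UnusedFilenoStack:
    """Toy model of the stack of unused (but not deleted) swap file numbers."""
    SIZE = 128  # hard-coded stack size, per the text

    def __init__(self, unlink_fn):
        self._stack = []
        self._unlink = unlink_fn   # real unlink request, sent only when full

    def release(self, fileno, pathname):
        if len(self._stack) < self.SIZE:
            # Keep the file on disk; its number may be reused safely later.
            self._stack.append(fileno)
        else:
            # Stack is full: hand an actual unlink request to unlinkd.
            self._unlink(pathname)

    def allocate(self):
        # Reuse a pooled number first; unlinkd will never remove these files.
        return self._stack.pop() if self._stack else None
```

Because allocation draws from the pool first, any file number handed out this way is guaranteed not to be sitting in unlinkd's queue.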
In terms of implementation, the only way to send unlink requests to the unlinkd process is via the storePutUnusedFileno function.
Unfortunately there are times when Squid can not use the unlinkd process but must call unlink(2) directly. One of these times is when the cache swap size is over the high water mark. If we push the released file numbers onto the unused file number stack, and the stack is not full, then no files will be deleted, and the actual disk usage will remain unchanged. So, when we exceed the high water mark, we must call unlink(2) directly.
What is an icon URL?
One of the most unpleasant things Squid must do is generate HTML pages of Gopher and FTP directory listings. For some strange reason, people like to have little icons next to each listing entry, denoting the type of object to which the link refers (image, text file, etc.).
We include a set of icons in the source distribution for this purpose. These icon files are loaded by Squid as cached objects at runtime. Thus, every Squid cache now has its own icons to use in Gopher and FTP listings. Just like other objects available on the web, we refer to the icons with Uniform Resource Locators, or URLs.
Can I make my regular FTP clients use a Squid cache?
Nope, it's not possible. Squid only accepts HTTP requests. It speaks FTP on the server-side, but not on the client-side.
The very cool wget will download FTP URLs via Squid (and probably any other proxy cache).
Why is the select loop average time so high?
Is there any way to speed up the time spent dealing with select? Cachemgr shows:
Select loop called: 885025 times, 714.176 ms avg
This number is NOT how much time it takes to handle filedescriptor I/O. We simply count the number of times select was called, and divide the total process running time by the number of select calls.
This means that, on average, it takes your cache 0.714 seconds to check all the open file descriptors once. But this also includes time select() spends in a wait state when there is no I/O on any file descriptors. My relatively idle workstation cache has similar numbers:
Select loop called: 336782 times, 715.938 ms avg
But my busy caches have much lower times:
Select loop called: 16940436 times, 10.427 ms avg
Select loop called: 80524058 times, 10.030 ms avg
Select loop called: 10590369 times, 8.675 ms avg
Select loop called: 84319441 times, 9.578 ms avg
How does Squid deal with Cookies?
The presence of Cookie headers in requests does not affect whether or not an HTTP reply can be cached. Similarly, the presence of Set-Cookie headers in replies does not affect whether the reply can be cached.
The proper way to deal with Set-Cookie reply headers, according to RFC 2109, is to cache the whole object, EXCEPT the Set-Cookie header lines.
However, we can filter out specific HTTP headers. But instead of filtering them on the receiving-side, we filter them on the sending-side. Thus, Squid does cache replies with Set-Cookie headers, but it filters out the Set-Cookie header itself for cache hits.
How does Squid decide when to refresh a cached object?
When checking the object freshness, we calculate these values:
- OBJ_DATE is the time when the object was given out by the origin server. This is taken from the HTTP Date reply header.
- OBJ_LASTMOD is the time when the object was last modified, given by the HTTP Last-Modified reply header.
- OBJ_AGE is how much the object has aged since it was retrieved: OBJ_AGE = NOW - OBJ_DATE
- LM_AGE is how old the object was when it was retrieved: LM_AGE = OBJ_DATE - OBJ_LASTMOD
- LM_FACTOR is the ratio of OBJ_AGE to LM_AGE: LM_FACTOR = OBJ_AGE / LM_AGE
- CLIENT_MAX_AGE is the (optional) maximum object age the client will accept, as taken from the HTTP/1.1 Cache-Control request header.
- EXPIRES is the (optional) expiry time from the server reply headers.
These values are compared with the parameters of the refresh_pattern rules. The refresh parameters are:
- URL regular expression
- CONF_MIN: The time (in minutes) an object without an explicit expiry time should be considered fresh. The recommended value is 0; any higher value may cause dynamic applications to be erroneously cached unless the application designer has taken appropriate actions.
- CONF_PERCENT: A percentage of the object's age (time since last modification) for which an object without an explicit expiry time will be considered fresh.
- CONF_MAX: An upper limit on how long objects without an explicit expiry time will be considered fresh.
The URL regular expressions are checked in the order listed until a match is found. Then the algorithms below are applied for determining if an object is fresh or stale.
The refresh algorithm used in Squid-2 looks like this:
if (EXPIRES) {
    if (EXPIRES <= NOW)
        return STALE
    else
        return FRESH
}
if (CLIENT_MAX_AGE)
    if (OBJ_AGE > CLIENT_MAX_AGE)
        return STALE
if (OBJ_AGE > CONF_MAX)
    return STALE
if (OBJ_DATE > OBJ_LASTMOD) {
    if (LM_FACTOR < CONF_PERCENT)
        return FRESH
    else
        return STALE
}
if (OBJ_AGE <= CONF_MIN)
    return FRESH
return STALE
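The same algorithm can be written as a small Python function (a sketch for illustration only; times are plain seconds, parameter names follow the values defined above, and the function returns True for FRESH):

```python
def is_fresh(now, obj_date, obj_lastmod, conf_min, conf_percent, conf_max,
             expires=None, client_max_age=None):
    """Sketch of the Squid-2 refresh algorithm; True means FRESH."""
    obj_age = now - obj_date                 # OBJ_AGE = NOW - OBJ_DATE
    if expires is not None:
        return expires > now                 # past its expiry time -> STALE
    if client_max_age is not None and obj_age > client_max_age:
        return False
    if obj_age > conf_max:
        return False
    if obj_date > obj_lastmod:
        lm_age = obj_date - obj_lastmod      # LM_AGE = OBJ_DATE - OBJ_LASTMOD
        return (obj_age / lm_age) < conf_percent   # LM_FACTOR test
    return obj_age <= conf_min
```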
What exactly is a ''deferred read''?
The cachemanager I/O page lists deferred reads for various server-side protocols.
Sometimes reading on the server-side gets ahead of writing to the client-side, especially if your cache is on a fast network and your clients are connected at modem speeds. Squid-1.1 will read up to 256k (per request) ahead before it starts to defer the server-side reads.
Why is my cache's inbound traffic equal to the outbound traffic?
I've been monitoring the traffic on my cache's ethernet adapter and found a behavior I can't explain: the inbound traffic is equal to the outbound traffic. The differences are negligible. The hit ratio reports 40%. Shouldn't the outbound be at least 40% greater than the inbound?
I can't account for the exact behavior you're seeing, but I can offer this advice: whenever you start measuring raw Ethernet or IP traffic on interfaces, you can forget about getting all the numbers to exactly match what Squid reports as the amount of traffic it has sent and received.
Why?
Squid is an application - it counts whatever data is sent to, or received from, the lower-level networking functions; at each successively lower layer, additional traffic is involved (such as header overhead, retransmits and fragmentation, unrelated broadcasts/traffic, etc.). The additional traffic is never seen by Squid and thus isn't counted - but if you run MRTG (or any SNMP/RMON measurement tool) against a specific interface, all this additional traffic will "magically appear".
Also remember that an interface has no concept of upper-layer networking (so an Ethernet interface doesn't distinguish between IP traffic that's entirely internal to your organization, and traffic that's to/from the Internet); this means that when you start measuring an interface, you have to be aware of *what* you are measuring before you can start comparing numbers elsewhere.
It is possible (though by no means guaranteed) that you are seeing roughly equivalent input/output because you're measuring an interface that both retrieves data from the outside world (Internet), *and* serves it to end users (internal clients). That wouldn't be the whole answer, but hopefully it gives you a few ideas to start applying to your own circumstance.
To interpret any statistic, you have to first know what you are measuring; for example, an interface counts inbound and outbound bytes - that's it. The interface doesn't distinguish between inbound bytes from external Internet sites or from internal (to the organization) clients (making requests). If you want that, try looking at RMON2.
Also, if you're talking about a 40% hit rate in terms of object requests/counts then there's absolutely no reason why you should expect a 40% reduction in traffic; after all, not every request/object is going to be the same size so you may be saving a lot in terms of requests but very little in terms of actual traffic.
How come some objects do not get cached?
To determine whether a given object may be cached, Squid takes many things into consideration. The current algorithm (for Squid-2) goes something like this:
Responses with Cache-Control: Private are NOT cachable.
Responses with Cache-Control: No-Cache are NOT cachable.
Responses with Cache-Control: No-Store are NOT cachable.
Responses for requests with an Authorization header are cachable ONLY if the response includes Cache-Control: Public.
- The following HTTP status codes are cachable:
- 200 OK
- 203 Non-Authoritative Information
- 300 Multiple Choices
- 301 Moved Permanently
- 410 Gone
However, if Squid receives one of these responses from a neighbor cache, it will NOT be cached if ALL of the Date, Last-Modified, and Expires reply headers are missing. This prevents such objects from bouncing back-and-forth between siblings forever.
A 302 Moved Temporarily response is cachable ONLY if the response also includes an Expires header.
The following HTTP status codes are "negatively cached" for a short amount of time (configurable):
- 204 No Content
- 305 Use Proxy
- 400 Bad Request
- 403 Forbidden
- 404 Not Found
- 405 Method Not Allowed
- 414 Request-URI Too Large
- 500 Internal Server Error
- 501 Not Implemented
- 502 Bad Gateway
- 503 Service Unavailable
- 504 Gateway Time-out
All other HTTP status codes are NOT cachable, including:
- 206 Partial Content
- 303 See Other
- 304 Not Modified
- 401 Unauthorized
- 407 Proxy Authentication Required
What does ''keep-alive ratio'' mean?
The keep-alive ratio shows up in the server_list cache manager page.
This is a mechanism to try detecting neighbor caches which might not be able to deal with persistent connections. Every time we send a proxy-connection: keep-alive request header to a neighbor, we count how many times the neighbor sent us a proxy-connection: keep-alive reply header. Thus, the keep-alive ratio is the ratio of these two counters.
If the ratio stays above 0.5, then we continue to assume the neighbor properly implements persistent connections. Otherwise, we will stop sending the keep-alive request header to that neighbor.
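The bookkeeping is simple enough to sketch (illustrative only; the function and counter names are invented, not Squid's):

```python
def keep_alive_ratio(requests_sent, replies_received):
    """Ratio of keep-alive reply headers received to request headers sent."""
    if requests_sent == 0:
        return 1.0   # nothing sent yet; assume the neighbor handles it fine
    return replies_received / requests_sent

def send_keep_alive(requests_sent, replies_received):
    # Keep sending the keep-alive request header while the ratio stays above 0.5.
    return keep_alive_ratio(requests_sent, replies_received) > 0.5
```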
How does Squid's cache replacement algorithm work?
Squid uses an LRU (least recently used) algorithm to replace old cache objects. This means objects which have not been accessed for the longest time are removed first. In the source code, the StoreEntry->lastref value is updated every time an object is accessed.
Objects are not necessarily removed "on-demand." Instead, a regularly scheduled event runs to periodically remove objects. Normally this event runs every second.
Squid keeps the cache disk usage between the low and high water marks. By default the low mark is 90%, and the high mark is 95% of the total configured cache size. When the disk usage is close to the low mark, the replacement is less aggressive (fewer objects removed). When the usage is close to the high mark, the replacement is more aggressive (more objects removed).
When selecting objects for removal, Squid examines some number of objects and determines which can be removed and which cannot. A number of factors determine whether or not any given object can be removed. If the object is currently being requested, or retrieved from an upstream site, it will not be removed. If the object is "negatively-cached" it will be removed. If the object has a private cache key, it will be removed (there would be no reason to keep it -- because the key is private, it can never be "found" by subsequent requests). Finally, if the time since last access is greater than the LRU threshold, the object is removed.
The LRU threshold value is dynamically calculated based on the current cache size and the low and high marks. The LRU threshold is scaled exponentially between the high and low water marks. When the store swap size is near the low water mark, the LRU threshold is large. When the store swap size is near the high water mark, the LRU threshold is small. The threshold automatically adjusts to the rate of incoming requests. In fact, when your cache size has stabilized, the LRU threshold represents how long it takes to fill (or fully replace) your cache at the current request rate. Typical values for the LRU threshold are 1 to 10 days.
Back to selecting objects for removal. Obviously it is not possible to check every object in the cache every time we need to remove some of them. We can only check a small subset each time.
Every time an object is accessed, it gets moved to the top of a list. Over time, the least used objects migrate to the bottom of the list. When looking for objects to remove, we only need to check the last 100 or so objects in the list. Unfortunately this approach increases our memory usage because of the need to store three additional pointers per cache object. We also use cache keys with MD5 hashes.
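The list manipulation described above can be sketched with Python's OrderedDict (a toy model for illustration, not Squid's C implementation):

```python
from collections import OrderedDict

class LruIndex:
    """Toy model: accessed objects move to the top of the list;
    removal candidates are examined from the bottom."""
    def __init__(self):
        self._order = OrderedDict()

    def touch(self, key):
        self._order[key] = True
        self._order.move_to_end(key, last=False)   # move to top of the list

    def removal_candidates(self, n=100):
        # The last n entries are the least recently used objects.
        return list(self._order)[-n:]
```

For example, after touching a, b, c and then a again, the least recently used object (the first removal candidate) is b.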
What are private and public keys?
Keys refer to the database keys which Squid uses to index cache objects. Every object in the cache--whether saved on disk or currently being downloaded--has a cache key. We use MD5 checksums for cache keys.
The Squid cache uses the notions of private and public cache keys. An object can start out as being private, but may later be changed to public status. Private objects are associated with only a single client whereas a public object may be sent to multiple clients at the same time. In other words, public objects can be located by any cache client. Private keys can only be located by a single client--the one who requested it.
Objects are changed from private to public after all of the HTTP reply headers have been received and parsed. In some cases, the reply headers will indicate the object should not be made public. For example, if the no-cache Cache-Control directive is used.
What is FORW_VIA_DB for?
We use it to collect data for Plankton.
Does Squid send packets to port 7 (echo)? If so, why?
It may. This is an old feature from the Harvest cache software. The cache would send an ICP "SECHO" message to the echo ports of origin servers. If the SECHO message came back before any of the other ICP replies, then it meant the origin server was probably closer than any neighbor cache. In that case Harvest/Squid sent the request directly to the origin server.
With more attention focused on security, many administrators filter UDP packets to port 7. The Computer Emergency Response Team (CERT) once issued an advisory note ( CA-96.01: UDP Port Denial-of-Service Attack) that says UDP echo and chargen services can be used for a denial of service attack. This made admins extremely nervous about any packets hitting port 7 on their systems, and they made complaints.
The source_ping feature has been disabled in Squid-2. If you're seeing packets to port 7 that are coming from a Squid cache (remote port 3130), then it's probably a very old version of Squid.
What does "WARNING: Reply from unknown nameserver [a.b.c.d]" mean?
It means Squid sent a DNS query to one IP address, but the response came back from a different IP address. By default Squid checks that the addresses match. If not, Squid ignores the response.
There are a number of reasons why this would happen:
1. Your DNS name server just works this way, either because it's been configured to, or because it's stupid and doesn't know any better.
2. You have a weird broadcast address, like 0.0.0.0, in your /etc/resolv.conf file.
3. Somebody is trying to send spoofed DNS responses to your cache.
If you recognize the IP address in the warning as one of your name server hosts, then it's probably reason (1) or (2).
You can make these warnings stop, and allow responses from "unknown" name servers by setting this configuration option:
ignore_unknown_nameservers off
How does Squid distribute cache files among the available directories?
Note: The information here is current for version 2.2.
See storeDirMapAllocate() in the source code.
When Squid wants to create a new disk file for storing an object, it first selects which cache_dir the object will go into. This is done with the storeDirSelectSwapDir() function. If you have N cache directories, the function identifies the 3N/4 (75%) of them with the most available space. These directories are then used, in order of having the most available space. When Squid has stored one URL to each of the 3N/4 cache_dirs, the process repeats and storeDirSelectSwapDir() finds a new set of 3N/4 cache directories with the most available space.
Once the cache_dir has been selected, the next step is to find an available swap file number. This is accomplished by checking the file map, with the file_map_allocate() function. Essentially the swap file numbers are allocated sequentially. For example, if the last number allocated happens to be 1000, then the next one will be the first number after 1000 that is not already being used.
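The two steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual Squid source; the function names `select_swap_dirs` and `allocate_swap_file` are invented for this example.

```python
def select_swap_dirs(cache_dirs):
    """Sketch of cache_dir selection: pick the 3N/4 (75%) of directories
    with the most available space, most-available first. Squid stores one
    URL to each in turn, then recomputes the set.

    cache_dirs: dict mapping directory name -> available bytes.
    """
    n = len(cache_dirs)
    keep = max(1, (3 * n) // 4)  # 75% of the configured cache_dirs
    ranked = sorted(cache_dirs, key=cache_dirs.get, reverse=True)
    return ranked[:keep]

def allocate_swap_file(file_map, last_allocated):
    """Sketch of sequential swap file numbering: the next number is the
    first one after the last allocation that is not already in use."""
    n = last_allocated + 1
    while n in file_map:
        n += 1
    file_map.add(n)
    return n
```

For example, with four cache_dirs the three with the most free space are used in rotation, and if swap file number 1000 was allocated last while 1001 is in use, the next allocation yields 1002.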
Why do I see negative byte hit ratio?
Byte hit ratio is calculated a bit differently than Request hit ratio. Squid counts the number of bytes read from the network on the server-side, and the number of bytes written to the client-side. The byte hit ratio is calculated as
(client_bytes - server_bytes) / client_bytes
If server_bytes is greater than client_bytes, you end up with a negative value.
The server_bytes may be greater than client_bytes for a number of reasons, including:
- Cache Digests and other internally generated requests. Cache Digest messages are quite large. They are counted in the server_bytes, but since they are consumed internally, they do not count in client_bytes.
- User-aborted requests. If your quick_abort setting allows it, Squid sometimes continues to fetch aborted requests from the server-side, without sending any data to the client-side.
- Some range requests, in combination with Squid bugs, can consume more bandwidth on the server-side than on the client-side. In a range request, the client is asking for only some part of the object. Squid may decide to retrieve the whole object anyway, so that it can be used later on. This means downloading more from the server than sending to the client. You can affect this behavior with the range_offset_limit option.
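The calculation above is easy to demonstrate. A minimal sketch (the function name is made up for this example):

```python
def byte_hit_ratio(client_bytes, server_bytes):
    """Byte hit ratio as described above: the fraction of bytes sent to
    clients that did not have to be fetched from origin servers."""
    return (client_bytes - server_bytes) / client_bytes

# Normal case: most bytes were served from cache.
print(byte_hit_ratio(1000, 400))   # 0.6

# Cache Digests, aborted fetches, or range requests can make the
# server-side byte count exceed the client-side one:
print(byte_hit_ratio(1000, 1200))  # -0.2, a negative byte hit ratio
```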
What does "Disabling use of private keys" mean?
First you need to understand the difference between public and private keys.
When Squid sends ICP queries, it uses the ICP 'reqnum' field to hold the private key data. In other words, when Squid gets an ICP reply, it uses the 'reqnum' value to build the private cache key for the pending object.
Some ICP implementations always set the 'reqnum' field to zero when they send a reply. Squid can not use private cache keys with such neighbor caches because Squid will not be able to locate cache keys for those ICP replies. Thus, if Squid detects a neighbor cache that sends zero reqnum's, it disables the use of private cache keys.
Not having private cache keys has some important privacy implications. Two users could receive one response that was meant for only one of the users. This response could contain personal, confidential information. You will need to disable the 'zero reqnum' neighbor if you want Squid to use private cache keys.
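Why the 'reqnum' field matters can be sketched like this. This is an illustrative toy, not Squid's actual data structures; the names `private_key`, `pending`, and `handle_icp_reply` are invented for the example.

```python
def private_key(url, reqnum):
    """Sketch: a private cache key ties an object to one pending request.
    The ICP reply's 'reqnum' field is needed to reconstruct the key."""
    return (reqnum, url)

# One pending object, stored under its private key.
pending = {private_key("http://example.com/", 42): "pending-object"}

def handle_icp_reply(url, reqnum):
    # A neighbor that always sends reqnum=0 makes the private key
    # unrecoverable, so Squid must disable private keys entirely.
    return pending.get(private_key(url, reqnum))
```

A reply carrying the original reqnum locates the pending object; a reply with reqnum zeroed out does not.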
What is a half-closed filedescriptor?
TCP allows connections to be in a "half-closed" state. This is accomplished with the shutdown(2) system call. In Squid, this means that a client has closed its side of the connection for writing, but leaves it open for reading. Half-closed connections are tricky because Squid can't tell the difference between a half-closed connection, and a fully closed one.
If Squid tries to read a connection, and read() returns 0, and Squid knows that the client doesn't have the whole response yet, Squid marks the filedescriptor as half-closed. Most likely the client has aborted the request and the connection is really closed. However, there is a slight chance that the client is using the shutdown() call, and can still read the response.
To disable half-closed connections, simply put this in squid.conf:
half_closed_clients off
Then, Squid will always close its side of the connection instead of marking it as half-closed.
What does --enable-heap-replacement do?
Squid has traditionally used an LRU replacement algorithm. However with Squid version 2.4 and later you should use this configure option:
./configure --enable-removal-policies=heap
Currently, the heap replacement code supports two additional algorithms: LFUDA, and GDS.
Then, in squid.conf, you can select different policies with the cache_replacement_policy option. See the squid.conf comments for details.
The LFUDA and GDS replacement code was contributed by John Dilley and others from Hewlett-Packard. Their work is described in these papers:
- Enhancement and Validation of Squid's Cache Replacement Policy (HP Tech Report).
- Enhancement and Validation of the Squid Cache Replacement Policy (WCW 1999 paper).
Why is actual filesystem space used greater than what Squid thinks?
If you compare df output and cachemgr storedir output, you will notice that actual disk usage is greater than what Squid reports. This may be due to a number of reasons:
- Squid doesn't keep track of the size of the swap.state file, which normally resides on each cache_dir.
- Directory entries take up filesystem space.
- Other applications might be using the same disk partition.
- Your filesystem block size might be larger than what Squid thinks. When calculating total disk usage, Squid rounds file sizes up to a whole number of 1024 byte blocks. If your filesystem uses larger blocks, then some "wasted" space is not accounted for.
- Your cache has suffered some minor corruption and some objects have gotten lost without being removed from the swap.state file. Over time, Squid will detect this and automatically fix it.
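The block-size discrepancy is simple arithmetic. A small sketch (the 8192-byte filesystem block size is an assumed example value):

```python
import math

def squid_accounting(file_size, squid_block=1024):
    """Squid rounds file sizes up to whole 1024-byte blocks."""
    return math.ceil(file_size / squid_block) * squid_block

def filesystem_usage(file_size, fs_block=8192):
    """A filesystem with larger blocks actually consumes more space
    than Squid accounts for."""
    return math.ceil(file_size / fs_block) * fs_block

size = 3000
print(squid_accounting(size))   # 3072 bytes, as counted by Squid
print(filesystem_usage(size))   # 8192 bytes actually used with 8 KB blocks
```

The difference between the two numbers is the "wasted" space that Squid does not account for.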
How do ''positive_dns_ttl'' and ''negative_dns_ttl'' work?
positive_dns_ttl is how long Squid caches a successful DNS lookup. Similarly, negative_dns_ttl is how long Squid caches a failed DNS lookup.
positive_dns_ttl is not always used. It is NOT used in the following cases:
- Squid-2.3 and later versions with internal DNS lookups. Internal lookups are the default for Squid-2.3 and later.
- If you applied the "DNS TTL" patch for BIND as described in ../CompilingSquid.
- If you are using FreeBSD, then it already has the DNS TTL patch built in.
Let's say you have the following settings:
positive_dns_ttl 1 hours
negative_dns_ttl 1 minutes
When Squid looks up a name like www.squid-cache.org, it gets back an IP address like 204.144.128.89. The address is cached for the next hour. That means, when Squid needs to know the address for www.squid-cache.org again, it uses the cached answer for the next hour. After one hour, the cached information expires, and Squid makes a new query for the address of www.squid-cache.org.
If you have the DNS TTL patch, or are using internal lookups, then each hostname has its own TTL value, which was set by the domain name administrator. You can see these values in the 'ipcache' cache manager page. For example:
Hostname                 Flags  lstref     TTL  N
www.squid-cache.org      C       73043   12784  1( 0)  204.144.128.89-OK
www.ircache.net          C       73812   10891  1( 0)  192.52.106.12-OK
polygraph.ircache.net    C      241768 -181261  1( 0)  192.52.106.12-OK
The TTL field shows how many seconds remain until the entry expires. Negative values mean the entry is already expired, and will be refreshed upon next use.
The negative_dns_ttl specifies how long to cache failed DNS lookups. When Squid fails to resolve a hostname, you can be pretty sure that it is a real failure, and you are not likely to get a successful answer within a short time period. Squid retries its lookups many times before declaring a lookup has failed. If you like, you can set negative_dns_ttl to zero.
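The caching behavior of these two directives can be sketched as follows. This is an illustrative toy with fixed TTLs (as with external dnsserver lookups); with internal lookups each hostname would carry its own TTL from the DNS record. The class and method names are invented for this example.

```python
class DnsCache:
    """Toy sketch of positive_dns_ttl / negative_dns_ttl caching."""

    def __init__(self, positive_ttl=3600, negative_ttl=60):
        self.positive_ttl = positive_ttl   # 1 hour, as in the example above
        self.negative_ttl = negative_ttl   # 1 minute
        self.cache = {}                    # host -> (address or None, expiry)

    def lookup(self, host, resolver, now):
        if host in self.cache:
            address, expires = self.cache[host]
            if now < expires:
                return address             # cached answer (maybe a failure)
        address = resolver(host)           # None means the lookup failed
        ttl = self.positive_ttl if address else self.negative_ttl
        self.cache[host] = (address, now + ttl)
        return address
```

A successful answer is reused for an hour; a failure is remembered for only a minute before Squid tries again.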
What does ''swapin MD5 mismatch'' mean?
It means that Squid opened up a disk file to serve a cache hit, but found that the stored object doesn't match the user's request. Squid stores the MD5 digest of the URL at the start of each disk file. When the file is opened, Squid checks that the disk file MD5 matches the MD5 of the URL requested by the user. If they don't match, the warning is printed and Squid forwards the request to the origin server.
You do not need to worry about this warning. It means that Squid is automatically recovering from a corrupted cache directory.
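The check described above amounts to a simple digest comparison. A minimal sketch (illustrative only; Squid's on-disk metadata layout is more involved, and the function name is made up):

```python
import hashlib

def check_swapin(disk_file_digest, requested_url):
    """Compare the MD5 digest stored at the start of the disk file with
    the MD5 of the URL the user requested. A mismatch triggers the
    'swapin MD5 mismatch' warning and the request is forwarded instead."""
    expected = hashlib.md5(requested_url.encode()).digest()
    if disk_file_digest != expected:
        return "MISS"   # forward to the origin server
    return "HIT"        # serve the cached object
```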
What does ''failed to unpack swapfile meta data'' mean?
Each of Squid's disk cache files has a metadata section at the beginning. This header is used to store the URL MD5, some StoreEntry data, and more. When Squid opens a disk file for reading, it looks for the meta data header and unpacks it.
This warning means that Squid couldn't unpack the meta data. This is a non-fatal error, from which Squid can recover. Perhaps the meta data was just missing, or perhaps the file got corrupted.
You do not need to worry about this warning. It means that Squid is double-checking that the disk file matches what Squid thinks should be there, and the check failed. Squid recovers and generates a cache miss in this case.
Why doesn't Squid make ''ident'' lookups in interception mode?
It's a side-effect of the way interception proxying works.
When Squid is configured for interception proxying, the operating system pretends that it is the origin server. That means that the "local" socket address for intercepted TCP connections is really the origin server's IP address. If you run netstat -n on your interception proxy, you'll see a lot of foreign IP addresses in the Local Address column.
When Squid wants to make an ident query, it creates a new TCP socket and binds the local endpoint to the same IP address as the local end of the client's TCP connection. Since the local address isn't really local (it's some faraway origin server's IP address), the bind() system call fails. Squid handles this as a failed ident lookup.
So why bind in that way? If you know you are interception proxying, then why not bind the local endpoint to the host's (intranet) IP address? Why make the masses suffer needlessly?
Because that's just how ident works. Please read RFC 931, in particular the RESTRICTIONS section.
dnsSubmit: queue overload, rejecting blah
This means that you are using external dnsserver processes for lookups, and all processes are busy, and Squid's pending queue is full. Each dnsserver program can only handle one request at a time. When all dnsserver processes are busy, Squid queues up requests, but only to a certain point.
To alleviate this condition, you need to either (1) increase the number of dnsserver processes by changing the value for dns_children in your config file, or (2) switch to using Squid's internal DNS client code.
Note that in some versions, Squid limits dns_children to 32. To increase it beyond that value, you would have to edit the source code.
As we have mentioned previously in this page, you should NOT be running with external DNS processes.
What are FTP passive connections?
by Colin Campbell
Ftp uses two data streams, one for passing commands around, the other for moving data. The command channel is handled by the ftpd listening on port 21.
The data channel varies depending on whether you ask for passive ftp or not. When you request data in a non-passive environment, your client tells the server "I am listening on <ip-address> <port>." The server then connects FROM port 20 to the ip address and port specified by your client. This requires your "security device" to permit any host outside from port 20 to any host inside on any port > 1023. Somewhat of a hole.
In passive mode, when you request a data transfer, the server tells the client "I am listening on <ip address> <port>." Your client then connects to the server on that IP and port and data flows.
Contents
What is Multicast?
Multicast is essentially the ability to send one IP packet to multiple receivers. Multicast is often used for audio and video conferencing systems.
How do I know if my network has multicast?
One way is to ask someone who manages your network. If your network manager doesn't know, or looks at you funny, then you probably don't have it.
Another way is to use the mtrace program, which can be found on the Xerox PARC FTP site. Mtrace is similar to traceroute. It will tell you about the multicast path between your site and another. For example:
> mtrace mbone.ucar.edu
mtrace: WARNING: no multicast group specified, so no statistics printed
Mtrace from 128.117.64.29 to 192.172.226.25 via group 224.2.0.1
Querying full reverse path... * switching to hop-by-hop:
  0  oceana-ether.nlanr.net (192.172.226.25)
 -1  avidya-ether.nlanr.net (192.172.226.57)  DVMRP  thresh^ 1
 -2  mbone.sdsc.edu (198.17.46.39)  DVMRP  thresh^ 1
 -3  * nccosc-mbone.dren.net (138.18.5.224)  DVMRP  thresh^ 48
 -4  * * FIXW-MBONE.NSN.NASA.GOV (192.203.230.243)  PIM/Special  thresh^ 64
 -5  dec3800-2-fddi-0.SanFrancisco.mci.net (204.70.158.61)  DVMRP  thresh^ 64
 -6  dec3800-2-fddi-0.Denver.mci.net (204.70.152.61)  DVMRP  thresh^ 1
 -7  mbone.ucar.edu (192.52.106.7)  DVMRP  thresh^ 64
 -8  mbone.ucar.edu (128.117.64.29)
Round trip time 196 ms; total ttl of 68 required.
Should I be using Multicast ICP?
Short answer: No, probably not.
Reasons why you SHOULD use Multicast:
- It reduces the number of times Squid calls sendto() to put a UDP packet onto the network.
- It's trendy and cool to use Multicast.
Reasons why you SHOULD NOT use Multicast:
- Multicast tunnels/configurations/infrastructure are often unstable. You may lose multicast connectivity but still have unicast connectivity.
- Multicast does not simplify your Squid configuration file. Every trusted neighbor cache must still be specified.
- Multicast does not reduce the number of ICP replies being sent around. It does reduce the number of ICP queries sent, but not the number of replies.
- Multicast exposes your cache to some privacy issues. There are no special permissions required to join a multicast group. Anyone may join your group and eavesdrop on ICP query messages. However, the scope of your multicast traffic can be controlled such that it does not exceed certain boundaries.
We only recommend using Multicast ICP over network infrastructure which you have close control over. In other words, only use Multicast over your local area network, or maybe your wide area network if you are an ISP. We think it is probably a bad idea to use Multicast ICP over congested links or commodity backbones.
How do I configure Squid to send Multicast ICP queries?
To configure Squid to send ICP queries to a Multicast address, you need to create another neighbour cache entry specified as multicast. For example:
cache_peer 224.9.9.9 multicast 3128 3130 ttl=64
224.9.9.9 is a sample multicast group address. multicast indicates that this is a special type of neighbour. The HTTP-port argument (3128) is ignored for multicast peers, but the ICP-port (3130) is very important. The final argument, ttl=64 specifies the multicast TTL value for queries sent to this address. It is probably a good idea to increment the minimum TTL by a few to provide a margin for error and changing conditions.
You must also specify which of your neighbours will respond to your multicast queries, since it would be a bad idea to implicitly trust any ICP reply from an unknown address. Note that ICP replies are sent back to unicast addresses; they are NOT multicast, so Squid has no indication whether a reply is from a regular query or a multicast query. To configure your multicast group neighbours, use the cache_peer directive and the multicast-responder option:
cache_peer cache1 sibling 3128 3130 multicast-responder
cache_peer cache2 sibling 3128 3130 multicast-responder
Here all fields are relevant. The ICP port number (3130) must be the same as in the cache_peer line defining the multicast peer above. The third field must either be parent or sibling to indicate how Squid should treat replies. With the multicast-responder flag set for a peer, Squid will NOT send ICP queries to it directly (i.e. unicast).
How do I know what Multicast TTL to use?
The Multicast TTL (which is specified on the cache_peer line of your multicast group) determines how "far" your ICP queries will go. In the Mbone, there is a certain TTL threshold defined for each network interface or tunnel. A multicast packet's TTL must be larger than the defined TTL for that packet to be forwarded across that link. For example, the mrouted manual page recommends:
32  for links that separate sites within an organization.
64  for links that separate communities or organizations, and are attached to the Internet MBONE.
128 for links that separate continents on the MBONE.
A good way to determine the TTL you need is to run mtrace as shown above and look at the last line. It will show you the minimum TTL required to reach the other host.
If you set your TTL too high, then your ICP messages may travel "too far" and will be subject to eavesdropping by others. If you're only using multicast on your LAN, as we suggest, then your TTL will be quite small, for example ttl=4.
How do I configure Squid to receive and respond to Multicast ICP?
You must tell Squid to join a multicast group address with the mcast_groups directive. For example:
mcast_groups 224.9.9.9
Of course, all members of your Multicast ICP group will need to use the exact same multicast group address.
Choose a multicast group address with care! If two organizations happen to choose the same multicast address, then they may find that their groups "overlap" at some point. This will be especially true if one of the querying caches uses a large TTL value. There are two ways to reduce the risk of group overlap:
- Use a unique group address
- Limit the scope of multicast messages with TTLs or administrative scoping.
Using a unique address is a good idea, but not without some potential problems. If you choose an address randomly, how do you know that someone else will not also randomly choose the same address? NLANR has been assigned a block of multicast addresses by the IANA for use in situations such as this. If you would like to be assigned one of these addresses, please write to us. However, note that neither NLANR nor the IANA has any authority to prevent anyone from using an address assigned to you.
Limiting the scope of your multicast messages is probably a better solution. They can be limited with the TTL value discussed above, or with some newer techniques known as administratively scoped addresses. Here you can configure well-defined boundaries for the traffic to a specific address. The Administratively Scoped IP Multicast RFC describes this.
General advice
The settings detailed in this FAQ chapter are suggestions for operating-system-specific settings which may help when running busy caches. It is recommended to check that the settings have the desired effect by using the Cache Manager.
FreeBSD
Filedescriptors
For busy caches, it makes sense to increase the number of system-wide available filedescriptors by setting, in /etc/sysctl.conf:
kern.maxfilesperproc=8192
Diskd
Note: This information is out-of-date, as with newer FreeBSD versions these parameters can be tuned at runtime via sysctl. We're looking for contributions to update this page.
In order to run diskd you may need to tweak your kernel settings. Try setting in the kernel config file (larger values may be needed for very busy caches):
options MSGMNB=8192     # max # of bytes in a queue
options MSGMNI=40       # number of message queue identifiers
options MSGSEG=512      # number of message segments per queue
options MSGSSZ=64       # size of a message segment
options MSGTQL=2048     # max messages in system
options SHMSEG=16
options SHMMNI=32
options SHMMAX=2097152
options SHMALL=4096
options MAXFILES=16384
Contents
- Solaris
- TCP incompatibility?
- select()
- malloc
- DNS lookups and ''nscd''
- DNS lookups and /etc/nsswitch.conf
- DNS lookups and NIS
- Tuning
- disk write error: (28) No space left on device
- Solaris X86 and IPFilter
- Changing the directory lookup cache size
- The priority_paging algorithm
- assertion failed: StatHist.c:91: `statHistBin(H, max) == H->capacity - 1'
- FreeBSD
- OSF1/3.2
- BSD/OS
- Linux
- FATAL: Don't run Squid as root, set 'cache_effective_user'!
- Large ACL lists make Squid slow
- gethostbyname() leaks memory in RedHat 6.0 with glibc 2.1.1.
- assertion failed: StatHist.c:91: `statHistBin(H, max) == H->capacity - 1' on Alpha system.
- tools.c:605: storage size of `rl' isn't known
- Can't connect to some sites through Squid
- Some sites load extremely slowly or not at all
- IRIX
- SCO-UNIX
- AIX
Solaris
TCP incompatibility?
J.D. Bronson (jb at ktxg dot com) reported that his Solaris box could not talk to certain origin servers, such as moneycentral.msn.com and www.mbnanetaccess.com. J.D. fixed his problem by setting:
tcp_xmit_hiwat 49152
tcp_xmit_lowat 4096
tcp_recv_hiwat 49152
select()
select(3c) won't handle more than 1024 file descriptors. The configure script should enable poll() by default for Solaris. poll() allows you to use many more filedescriptors, probably 8192 or more.
For older Squid versions you can enable poll() manually by changing HAVE_POLL in include/autoconf.h, or by adding -DUSE_POLL=1 to the DEFINES in src/Makefile.
malloc
libmalloc.a is leaky. Squid's configure does not use -lmalloc on Solaris.
DNS lookups and ''nscd''
by David J N Begley.
DNS lookups can be slow because of some mysterious thing called nscd. You should edit /etc/nscd.conf and make it say:
enable-cache hosts no
Apparently nscd serializes DNS queries thus slowing everything down when an application (such as Squid) hits the resolver hard. You may notice something similar if you run a log processor executing many DNS resolver queries - the resolver starts to slow.. right.. down.. . . .
According to Andres Kroonmaa, users of Solaris 2.6 and later should NOT completely disable the nscd daemon. nscd should be running and caching the passwd and group files, although it is suggested to disable hosts caching as it may interfere with DNS lookups.
Several library calls rely on free file descriptors below 256 being available. Systems running without nscd may fail on such calls if the first 256 descriptors are all in use.
Since Solaris 2.6, Sun has changed the way some system calls work, using the nscd daemon to implement them. To communicate with nscd, Solaris uses undocumented door calls. Basically, nscd is used to reduce the memory usage of user-space system libraries that use the passwd and group files. Before 2.6, Solaris cached the full passwd file in library memory on first use, but as this was considered to use up too much RAM on large multiuser systems, Sun decided to move the implementation of these calls out of the libraries and into a single dedicated daemon.
DNS lookups and /etc/nsswitch.conf
by Jason Armistead.
The /etc/nsswitch.conf file determines the order of searches for lookups (amongst other things). You might only have it set up to allow NIS and HOSTS files to work. You definitely want the "hosts:" line to include the word dns, e.g.:
hosts: nis dns [NOTFOUND=return] files
DNS lookups and NIS
by Chris Tilbury.
Our site cache is running on a Solaris 2.6 machine. We use NIS to distribute authentication and local hosts information around and in common with our multiuser systems, we run a slave NIS server on it to help the response of NIS queries.
We were seeing very high name-ip lookup times (avg ~2sec) and ip->name lookup times (avg ~8 sec), although there didn't seem to be that much of a problem with response times for valid sites until the cache was being placed under high load. Then, performance went down the toilet.
After some time, and a bit of detective work, we found the problem. On Solaris 2.6, if you have a local NIS server running (ypserv) and you have NIS in your /etc/nsswitch.conf hosts entry, then check the flags it is being started with. The 2.6 ypstart script checks to see if there is a resolv.conf file present when it starts ypserv. If there is, then it starts it with the -d option.
This has the same effect as putting the YP_INTERDOMAIN key in the hosts table -- namely, that failed NIS host lookups are tried against the DNS by the NIS server.
This is a bad thing(tm)! If NIS itself tries to resolve names using the DNS, then the requests are serialised through the NIS server, creating a bottleneck (This is the same basic problem that is seen with nscd). Thus, one failing or slow lookup can, if you have NIS before DNS in the service switch file (which is the most common setup), hold up every other lookup taking place.
If you're running in this kind of setup, then you will want to make sure that:
- ypserv doesn't start with the -d flag.
- you don't have the YP_INTERDOMAIN key in the hosts table (find the B=-b line in the yp Makefile and change it to B=).
We changed these here, and saw our average lookup times drop by up to an order of magnitude (~150msec for name-ip queries and ~1.5sec for ip-name queries, the latter still so high, I suspect, because more of these fail and timeout since they are not made so often and the entries are frequently non-existent anyway).
Tuning
Have a look at Tuning your TCP/IP stack and more by Jens-S. Voeckler.
disk write error: (28) No space left on device
You might get this error even if your disk is not full, and is not out of inodes. Check your syslog logs (/var/adm/messages, normally) for messages like either of these:
NOTICE: realloccg /proxy/cache: file system full
NOTICE: alloc: /proxy/cache: file system full
In a nutshell, the UFS filesystem used by Solaris can't cope with the workload squid presents to it very well. The filesystem will end up becoming highly fragmented, until it reaches a point where there are insufficient free blocks left to create files with, and only fragments available. At this point, you'll get this error and squid will revise its idea of how much space is actually available to it. You can do a "fsck -n raw_device" (no need to unmount, this checks in read only mode) to look at the fragmentation level of the filesystem. It will probably be quite high (>15%).
Sun suggest two solutions to this problem. One costs money, the other is free but may result in a loss of performance (although Sun do claim it shouldn't, given the already highly random nature of squid disk access).
The first is to buy a copy of VxFS, the Veritas Filesystem. This is an extent-based filesystem and it's capable of having online defragmentation performed on mounted filesystems. This costs money, however (VxFS is not very cheap!)
The second is to change certain parameters of the UFS filesystem. Unmount your cache filesystems and use tunefs to change optimization to "space" and to reduce the "minfree" value to 3-5% (under Solaris 2.6 and higher, very large filesystems will almost certainly have a minfree of 2% already and you shouldn't increase this). You should be able to get fragmentation down to around 3% by doing this, with an accompanied increase in the amount of space available.
Thanks to Chris Tilbury.
Solaris X86 and IPFilter
by Jeff Madison
Important update regarding Squid running on Solaris x86. I have been working for several months to resolve what appeared to be a memory leak in squid when running on Solaris x86 regardless of the malloc that was used. I have made 2 discoveries that anyone running Squid on this platform may be interested in.
Number 1: There is not a memory leak in Squid even though after the system runs for some amount of time, this varies depending on the load the system is under, Top reports that there is very little memory free. True to the claims of the Sun engineer I spoke to this statistic from Top is incorrect. The odd thing is that you do begin to see performance suffer substantially as time goes on and the only way to correct the situation is to reboot the system. This leads me to discovery number 2.
Number 2: There is some type of resource problem, memory or other, with IPFilter on Solaris x86. I have not taken the time to investigate what the problem is because we are no longer using IPFilter. We have switched to an Alteon ACE 180 Gigabit switch which will do the trans-proxy for you. After moving the trans-proxy redirection process out to the Alteon switch, Squid has run for 3 days straight under a huge load with no problem whatsoever. We currently have 2 boxes with 40 GB of cached objects on each box. This 40 GB was accumulated in the 3 days; from this you can see what type of load these boxes are under. Prior to this change we were never able to operate for more than 4 hours.
Because the problem appears to be with IPFilter, I would guess that you would only run into this issue if you are trying to run Squid as an interception proxy using IPFilter. That makes sense. If there is anyone with information that would indicate my findings are incorrect, I am willing to investigate further.
Changing the directory lookup cache size
On Solaris, the kernel variable for the directory name lookup cache size is ncsize. In /etc/system, you might want to try
set ncsize = 8192
or even higher. The kernel variable ufs_inode - which is the size of the inode cache itself - scales with ncsize in Solaris 2.5.1 and later. Previous versions of Solaris required both to be adjusted independently, but now, it is not recommended to adjust ufs_inode directly on 2.5.1 and later.
You can set ncsize quite high, but at some point - dependent on the application - a too-large ncsize will increase the latency of lookups.
Defaults are:
Solaris 2.5.1         : (max_nprocs + 16 + maxusers) + 64
Solaris 2.6/Solaris 7 : 4 * (max_nprocs + maxusers) + 320
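As a quick sanity check, the default formulas above can be worked through in a few lines of Python (a sketch only; the max_nprocs and maxusers values below are illustrative examples, not recommendations - use your own system's settings):

```python
def ncsize_default_sol251(max_nprocs, maxusers):
    # Solaris 2.5.1 default: (max_nprocs + 16 + maxusers) + 64
    return (max_nprocs + 16 + maxusers) + 64

def ncsize_default_sol26(max_nprocs, maxusers):
    # Solaris 2.6 / Solaris 7 default: 4 * (max_nprocs + maxusers) + 320
    return 4 * (max_nprocs + maxusers) + 320

# Example values only; check your own /etc/system settings:
print(ncsize_default_sol251(1000, 64))  # -> 1144
print(ncsize_default_sol26(1000, 64))   # -> 4576
```

Comparing the computed default against a tuned value such as 8192 shows how much headroom the manual setting adds.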
The priority_paging algorithm
Another new tuneable (actually a toggle) in Solaris 2.5.1, 2.6 or Solaris 7 is the priority_paging algorithm. This is actually a complete rewrite of the virtual memory system on Solaris. It will page out application data last, and filesystem pages first, if you turn it on (set priority_paging = 1 in /etc/system). As you may know, the Solaris buffer cache grows to fill available pages, and under the old VM system, applications could get paged out to make way for the buffer cache, which can lead to swap thrashing and degraded application performance. The new priority_paging helps keep application and shared library pages in memory, preventing the buffer cache from paging them out, until memory gets REALLY short. Solaris 2.5.1 requires patch 103640-25 or higher and Solaris 2.6 requires 105181-10 or higher to get priority_paging. Solaris 7 needs no patch, but all versions have it turned off by default.
assertion failed: StatHist.c:91: `statHistBin(H, max) == H->capacity - 1'
by Marc
This crash happens on Solaris when the "math.h" header is not available at compile time. I guess it can happen on any system without the correct include, but I have not verified that.
The configure script just reports "math.h: no" and continues. The math functions are badly declared, and this causes the crash.
For 32bit Solaris, "math.h" is found in the SUNWlibm package.
FreeBSD
T/TCP bugs
We have found that with FreeBSD-2.2.2-RELEASE, there are some bugs with T/TCP. FreeBSD will try to use T/TCP if you've enabled the "TCP Extensions." To disable T/TCP, use sysinstall to disable TCP Extensions, or edit /etc/rc.conf and set
tcp_extensions="NO" # Allow RFC1323 & RFC1544 extensions (or NO).
or add this to your /etc/rc files:
sysctl -w net.inet.tcp.rfc1644=0
mbuf size
We noticed an odd thing with some of Squid's interprocess communication. Often, output from the dnsserver processes would NOT be read in one chunk. With full debugging, it looks like this:
1998/04/02 15:18:48| comm_select: FD 46 ready for reading
1998/04/02 15:18:48| ipcache_dnsHandleRead: Result from DNS ID 2 (100 bytes)
1998/04/02 15:18:48| ipcache_dnsHandleRead: Incomplete reply
....other processing occurs...
1998/04/02 15:18:48| comm_select: FD 46 ready for reading
1998/04/02 15:18:48| ipcache_dnsHandleRead: Result from DNS ID 2 (9 bytes)
1998/04/02 15:18:48| ipcache_parsebuffer: parsing:
$name www.karup.com
$h_name www.karup.inter.net
$h_len 4
$ipcount 2
38.15.68.128
38.15.67.128
$ttl 2348
$end
Interestingly, it is very common to get only 100 bytes on the first read. When two read() calls are required, this adds additional latency to the overall request. On our caches running Digital Unix, the median dnsserver response time was measured at 0.01 seconds. On our FreeBSD cache, however, the median latency was 0.10 seconds.
Here is a simple patch to fix the bug:
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.40
retrieving revision 1.41
diff -p -u -r1.40 -r1.41
--- src/sys/kern/uipc_socket.c  1998/05/15 20:11:30  1.40
+++ /home/ncvs/src/sys/kern/uipc_socket.c  1998/07/06 19:27:14  1.41
@@ -31,7 +31,7 @@
  * SUCH DAMAGE.
  *
  * @(#)uipc_socket.c 8.3 (Berkeley) 4/15/94
- * $Id: FAQ.sgml,v 1.250 2005/04/22 19:29:50 hno Exp $
+ * $Id: FAQ.sgml,v 1.250 2005/04/22 19:29:50 hno Exp $
  */

 #include <sys/param.h>
@@ -491,6 +491,7 @@ restart:
 			mlen = MCLBYTES;
 			len = min(min(mlen, resid), space);
 		} else {
+			atomic = 1;
 nopages:
 			len = min(min(mlen, resid), space);
 			/*
Another technique which may help, but does not fix the bug, is to increase the kernel's mbuf size. The default is 128 bytes. The MSIZE symbol is defined in /usr/include/machine/param.h. However, to change it we added this line to our kernel configuration file:
options MSIZE="256"
Dealing with NIS
/var/yp/Makefile has the following section:
# The following line encodes the YP_INTERDOMAIN key into the hosts.byname
# and hosts.byaddr maps so that ypserv(8) will do DNS lookups to resolve
# hosts not in the current domain. Commenting this line out will disable
# the DNS lookups.
B=-b
You will want to comment out the B=-b line so that ypserv does not do DNS lookups.
FreeBSD 3.3: The lo0 (loop-back) device is not configured on startup
Squid requires the loopback interface to be up and configured. If it is not, you will get errors such as [FAQ-11.html#comm-bind-loopback-fail commBind].
From FreeBSD 3.3 Errata Notes:
Fix: Assuming that you experience this problem at all, edit ''/etc/rc.conf'' and search for where the network_interfaces variable is set. In its value, change the word ''auto'' to ''lo0'' since the auto keyword doesn't bring the loop-back device up properly, for reasons yet to be adequately determined. Since your other interface(s) will already be set in the network_interfaces variable after initial installation, it's reasonable to simply s/auto/lo0/ in rc.conf and move on.
Thanks to at lentil dot org Robert Lister.
FreeBSD 3.x or newer: Speed up disk writes using Softupdates
FreeBSD 3.x and newer support Softupdates. This is a mechanism to speed up disk writes, comparable to mounting ufs volumes async. However, Softupdates achieves performance similar to or better than async mounts without losing safety in the case of a system crash. For more detailed information and the copyright terms see /sys/contrib/softupdates/README and /sys/ufs/ffs/README.softupdate.
To build a system supporting softupdates, you have to build a kernel with options SOFTUPDATES set (see LINT for a commented-out example). After rebooting with the new kernel, you can enable softupdates on a per-filesystem basis with the command:
$ tunefs -n enable /mountpoint
The filesystem in question MUST NOT be mounted at this time. After that, softupdates are permanently enabled and the filesystem can be mounted normally. To verify that the softupdates code is running, simply issue a mount command and an output similar to the following will appear:
$ mount
/dev/da2a on /usr/local/squid/cache (ufs, local, noatime, soft-updates, writes: sync 70 async 225)
Internal DNS problems with jail environment
Some users report problems with running Squid in the jail environment. Specifically, Squid logs messages like:
2001/10/12 02:08:49| comm_udp_sendto: FD 4, 192.168.1.3, port 53: (22) Invalid argument
2001/10/12 02:08:49| idnsSendQuery: FD 4: sendto: (22) Invalid argument
You can eliminate the problem by putting the jail's network interface address in the 'udp_outgoing_addr' configuration option in squid.conf.
"Zero Sized Reply" error due to TCP blackholing
On FreeBSD, make sure that TCP blackholing is not active. You can verify the current setting with:
# /sbin/sysctl net.inet.tcp.blackhole
It should return the following output:
net.inet.tcp.blackhole: 0
If it is set to a positive value (usually, 2), disable it by setting it back to zero with:
# /sbin/sysctl net.inet.tcp.blackhole=0
To make sure the setting survives across reboots, add the following line to the file /etc/sysctl.conf:
net.inet.tcp.blackhole=0
OSF1/3.2
If you compile both libgnumalloc.a and Squid with cc, the mstats() function returns bogus values. However, if you compile libgnumalloc.a with gcc, and Squid with cc, the values are correct.
BSD/OS
gcc/yacc
Some people report [FAQ-2.html#bsdi-compile difficulties compiling squid on BSD/OS].
process priority
I've noticed that my Squid process seems to stick at a nice value of four, and clicks back to that even after I renice it to a higher priority. However, looking through the Squid source, I can't find any instance of a setpriority() call, or anything else that would seem to indicate Squid's adjusting its own priority.
by Bill Bogstad
BSD Unices traditionally have auto-niced non-root processes to 4 after they have used a lot (4 minutes???) of CPU time. My guess is that it's BSD/OS, not Squid, that is doing this. I don't know offhand if there is a way to disable this on BSD/OS.
by Arjan de Vet
You can get around this by starting Squid with nice-level -4 (or another negative value).
by at nl dot compuware dot com Bert Driehuis
The autonice behavior is a leftover from the history of BSD as a university OS. It penalises CPU bound jobs by nicing them after using 600 CPU seconds. Adding
sysctl -w kern.autonicetime=0
to /etc/rc.local will disable the behavior systemwide.
Linux
Generally we recommend you use Squid with an up-to-date Linux distribution, preferably one with a 2.6 kernel. Recent 2.6 kernels have built-in support for features used by new versions of Squid, such as epoll and WCCP/GRE, which give better performance and flexibility. Note that Squid will still function just fine under older Linux kernels. You will need to be mindful of the security implications of running your Squid proxy on the Internet if you are using a very old and unsupported distribution.
There have been issues with GLIBC in some very old distributions, and upgrading or fixing GLIBC is not for the faint of heart.
FATAL: Don't run Squid as root, set 'cache_effective_user'!
Some users have reported that setting cache_effective_user to nobody under Linux does not work. However, it appears that using any cache_effective_user other than nobody will succeed. One solution is to create a user account for Squid and set cache_effective_user to that. Alternately you can change the UID for the nobody account from 65535 to 65534.
Russ Mellon notes that these problems with cache_effective_user are fixed in version 2.2.x of the Linux kernel.
Large ACL lists make Squid slow
The regular expression library which comes with Linux is known to be very slow. Some people report it entirely fails to work after long periods of time.
To fix, use the GNUregex library included with the Squid source code. With Squid-2, use the --enable-gnuregex configure option.
gethostbyname() leaks memory in RedHat 6.0 with glibc 2.1.1.
by at netsoft dot ro Radu Greab
The gethostbyname() function leaks memory in RedHat 6.0 with glibc 2.1.1. The quick fix is to delete nisplus service from hosts entry in /etc/nsswitch.conf. In my tests dnsserver memory use remained stable after I made the above change.
See RedHat bug id 3919.
assertion failed: StatHist.c:91: `statHistBin(H, max) == H->capacity - 1' on Alpha system.
Some early versions of Linux have a kernel bug that causes this. All that is needed is a recent kernel that doesn't have the mentioned bug.
tools.c:605: storage size of `rl' isn't known
This is a bug with some versions of glibc. The glibc headers incorrectly depended on the contents of some kernel headers. Everything broke down when the kernel folks rearranged a bit in the kernel-specific header files.
We think this glibc bug is present in versions 2.1.1 (or 2.1.0) and earlier. There are two solutions:
- Make sure /usr/include/linux and /usr/include/asm are from the kernel version glibc is built/configured for, not any other kernel version. Only compiling of loadable kernel modules outside of the kernel sources depends on having the current versions of these, and for such builds -I/usr/src/linux/include (or wherever the new kernel headers are located) can be used to resolve the matter.
- Upgrade glibc to 2.1.2 or later. This is always a good idea anyway, provided a prebuilt upgrade package exists for the Linux distribution used. Note: Do not attempt to manually build and install glibc from source unless you know exactly what you are doing, as this can easily render the system unusable.
Can't connect to some sites through Squid
When using Squid, some sites may give errors such as "(111) Connection refused" or "(110) Connection timed out" although these sites work fine without going through Squid.
Linux 2.6 implements Explicit Congestion Notification (ECN) support and this can cause some TCP connections to fail when contacting some sites with broken firewalls or broken TCP/IP implementations.
As of June 2006, the number of sites that fail when ECN is enabled is very low and you may find you benefit more from having this feature enabled than globally turning it off.
To work around such broken sites you can disable ECN with the following command:
echo 0 > /proc/sys/net/ipv4/tcp_ecn
HenrikNordstrom explains:
ECN is a standard extension to TCP/IP, making TCP/IP behave better in overload conditions where the available bandwidth is all used up (i.e. the default condition for any WAN link). It is defined in Internet RFC 3168, issued by the Networking Working Group at IETF, the standardization body responsible for the evolution of TCP/IP and other core Internet technologies such as routing. It is implemented by using two previously unused bits (of 6) in the TCP header, plus redefining two bits of the never-standardized TOS field in the IP header (dividing TOS into a 6-bit Diffserv field and a 2-bit ECN field), allowing routers to clearly indicate overload conditions to the participating computers instead of dropping packets and hoping that the computers will realize there is too much traffic. The main problem is the use of those previously unused bits in the TCP header. The TCP/IP standard has always said that those bits are reserved for future use, but many old firewalls assume the bits will never be used and simply drop all traffic using this new feature, thinking it is invalid use of TCP/IP to evolve beyond the original standards from 1981. ECN in its final form was defined in 2001, but earlier specifications were circulated several years before that.
See also the thread on the NANOG mailing list, RFC3168 "The Addition of Explicit Congestion Notification (ECN) to IP, PROPOSED STANDARD" , Sally Floyd's page on ECN and problems related to it or ECN Hall of Shame for more information.
Some sites load extremely slowly or not at all
You may occasionally have problems with TCP Window Scaling on Linux. At first you may be able to TCP connect to the site, but then unable to transfer any data across your connection or that data flows extremely slowly. This is due to some broken firewalls on the Internet (it is not a bug with Linux) mangling the window scaling option when the TCP connection is established. More details and a workaround can be found at lwn.net.
The reason this is experienced with Linux and not most other OSes is that most desktop OSes advertise quite a small window scaling factor, if any at all, and therefore the firewall bug goes unnoticed with them.
IRIX
''dnsserver'' always returns 255.255.255.255
There is a problem with GCC (2.8.1 at least) on Irix 6 which causes it to always return the string 255.255.255.255 for _ANY_ address when calling inet_ntoa(). If this happens to you, compile Squid with the native C compiler instead of GCC.
SCO-UNIX
by F.J. Bosscha
To make squid run comfortable on SCO-unix you need to do the following:
Increase the NOFILES parameter and the NUMSP parameter and recompile squid. Although squid reported in the cache.log file that it had 3000 file descriptors, I still had problems with messages saying that no more file descriptors were available. After I also increased the NUMSP value, the problems were gone.
One thing left is the number of tcp-connections the system can handle. The default is 256, but I increased that as well because of the number of clients we have.
AIX
"shmat failed" errors with ''diskd''
32-bit processes on AIX are restricted by default to a maximum of 11 shared memory segments. This restriction can be removed on AIX 4.2.1 and later by setting the environment variable EXTSHM=ON in the script or shell which starts squid.
Core dumps when squid process grows to 256MB
32-bit processes cannot use more than 256MB of stack and data in the default memory model. To force the loader to use a large address space for squid, either:
- set the LDR_CNTRL environment variable, e.g. LDR_CNTRL="MAXDATA=0x80000000"; or
- link with -bmaxdata:0x80000000; or
- patch the squid binary
See IBM's documentation on large program support for more information, including how to patch an already-compiled program.
What is a redirector?
Squid has the ability to rewrite requested URLs. Implemented as an external process (similar to a dnsserver), Squid can be configured to pass every incoming URL through a redirector process that returns either a new URL, or a blank line to indicate no change.
The redirector program is NOT a standard part of the Squid package. However, some examples are provided below, and in the "contrib/" directory of the source distribution. Since everyone has different needs, it is up to the individual administrators to write their own implementation.
Why use a redirector?
A redirector allows the administrator to control the locations to which users go. Using this in conjunction with interception proxies allows simple but effective porn control.
How does it work?
The redirector program must read URLs (one per line) on standard input, and write rewritten URLs or blank lines on standard output. Note that the redirector program must not use buffered I/O. Squid writes additional information after the URL which a redirector can use to make a decision. The input line consists of four fields:
URL ip-address/fqdn ident method
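As a toy illustration of this interface (not part of the Squid distribution; the host names and rewrite rule here are made up), a redirector could be sketched in Python as follows. The key points are parsing the four-field input line and answering each request immediately with unbuffered output:

```python
import sys

def rewrite(line):
    """Given one redirector input line ("URL ip-address/fqdn ident method"),
    return the rewritten URL, or an empty string to mean "no change"."""
    fields = line.split()
    if not fields:
        return ""
    url = fields[0]
    # Hypothetical rewrite rule: send fromhost.com traffic to tohost.org
    if url.startswith("http://fromhost.com"):
        return url.replace("http://fromhost.com", "http://tohost.org", 1)
    return ""  # blank line tells Squid to leave the URL alone

def main():
    # Squid feeds requests on stdin, one per line, and reads answers
    # on stdout; each answer must be flushed at once (no buffered I/O).
    for line in sys.stdin:
        sys.stdout.write(rewrite(line) + "\n")
        sys.stdout.flush()

print(rewrite("http://fromhost.com/page 192.168.0.5/- - GET"))
```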
Do you have any examples?
A simple, very fast redirector called SQUIRM is a good place to start; it uses the regex library to allow pattern matching.
Also see jesred.
The following Perl script may also be used as a template for writing your own redirector:
#!/usr/bin/perl
$|=1;
while (<>) {
    s@http://fromhost.com@http://tohost.org@;
    print;
}
Can I use the redirector to return HTTP redirect messages?
Normally, the redirector feature is used to rewrite requested URLs. Squid then transparently requests the new URL. However, in some situations, it may be desirable to return an HTTP "301" or "302" redirect message to the client.
Simply modify your redirector program to prepend either "301:" or "302:" before the new URL. For example, the following script might be used to direct external clients to a secure Web server for internal documents:
#!/usr/bin/perl
$|=1;
while (<>) {
    @X = split;
    $url = $X[0];
    if ($url =~ /^http:\/\/internal\.foo\.com/) {
        $url =~ s/^http/https/;
        $url =~ s/internal/secure/;
        print "302:$url\n";
    } else {
        print "$url\n";
    }
}
Please see sections 10.3.2 and 10.3.3 of RFC 2068 for an explanation of the 301 and 302 HTTP reply codes.
FATAL: All redirectors have exited!
A redirector process must exit (stop running) only when its stdin is closed. If you see the "All redirectors have exited" message, it probably means your redirector program has a bug. Maybe it runs out of memory or has memory access errors. You may want to test your redirector program outside of squid with a big input list, taken perhaps from your access.log. Also, check for coredump files from the redirector program (see ../TroubleShooting for where to find them).
Redirector interface is broken re IDENT values
I added a redirector consisting of
#! /bin/sh
/usr/bin/tee /tmp/squid.log
and many of the redirector requests don't have a username in the ident field.
Squid does not delay a request to wait for an ident lookup, unless you use the ident ACLs. Thus, it is very likely that the ident was not available at the time of calling the redirector, but became available by the time the request is complete and logged to access.log.
If you want to block requests waiting for ident lookup, try something like this:
acl foo ident REQUIRED
http_access allow foo
Redirections by origin servers
Redirectors only act on client requests; if you wish to modify server-generated redirections (the HTTP Location header) you have to use a location_rewrite helper.
Contents
- What is a Cache Digest?
- How and why are they used?
- What is the theory behind Cache Digests?
- How is the size of the Cache Digest in Squid determined?
- What hash functions (and how many of them) does Squid use?
- How are objects added to the Cache Digest in Squid?
- Does Squid support deletions in Cache Digests? What are diffs/deltas?
- When and how often is the local digest built?
- How are Cache Digests transferred between peers?
- How and where are Cache Digests stored?
- How are the Cache Digest statistics in the Cache Manager to be interpreted?
- What are False Hits and how should they be handled?
- How can Cache Digest related activity be traced/debugged?
- What about ICP?
- Is there a Cache Digest Specification?
- Would it be possible to stagger the timings when cache_digests are retrieved from peers?
Cache Digest FAQs compiled by Niall Doherty <ndoherty AT eei DOT ericsson DOT se>.
What is a Cache Digest?
A Cache Digest is a summary of the contents of an Internet Object Caching Server. It contains, in a compact (i.e. compressed) format, an indication of whether or not particular URLs are in the cache.
A "lossy" technique is used for compression, which means that very high compression factors can be achieved at the expense of not having 100% correct information.
How and why are they used?
Cache servers periodically exchange their digests with each other.
When a request for an object (URL) is received from a client a cache can use digests from its peers to find out which of its peers (if any) have that object. The cache can then request the object from the closest peer (Squid uses the NetDB database to determine this).
Note that Squid will only make digest queries in those digests that are enabled. It will disable a peer's digest if and only if it cannot fetch a valid digest for that peer, and will enable that peer's digest again when a valid one is fetched.
The checks in the digest are very fast and they eliminate the need for per-request queries to peers. Hence:
- Latency is eliminated and client response time should be improved.
- Network utilisation may be improved.
Note that the use of Cache Digests (for querying the cache contents of peers) and the generation of a Cache Digest (for retrieval by peers) are independent. So, it is possible for a cache to make a digest available for peers, and not use the functionality itself and vice versa.
What is the theory behind Cache Digests?
Cache Digests are based on Bloom Filters - they are a method for representing a set of keys with lookup capabilities; where lookup means "is the key in the filter or not?".
In building a cache digest:
- A vector (1-dimensional array) of m bits is allocated, with all bits initially set to 0.
- A number, k, of independent hash functions are chosen, h1, h2, ..., hk, with range { 1, ..., m } (i.e. a key hashed with any of these functions gives a value between 1 and m inclusive).
- The set of n keys to be operated on are denoted by: A = { a1, a2, a3, ..., an }.
Adding a Key
To add a key the value of each hash function for that key is calculated. So, if the key was denoted by a, then h1(a), h2(a), ..., hk(a) are calculated.
The value of each hash function for that key represents an index into the array and the corresponding bits are set to 1. So, a digest with 6 hash functions would have 6 bits to be set to 1 for each key added.
Note that the addition of a number of different keys could cause one particular bit to be set to 1 multiple times.
Querying a Key
To query for the existence of a key the indices into the array are calculated from the hash functions as above.
- If any of the corresponding bits in the array are 0 then the key is not present.
- If all of the corresponding bits in the array are 1 then the key is likely to be present.
Note the term likely. It is possible that a collision in the digest can occur, whereby the digest incorrectly indicates a key is present. This is the price paid for the compact representation. While the probability of a collision can never be reduced to zero it can be controlled. Larger values for the ratio of the digest size to the number of entries added lower the probability. The number of hash functions chosen also influence the probability.
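The add and query operations above can be sketched in a few lines of Python. This is a toy illustration of the Bloom filter idea, not Squid's implementation; in particular, the k "independent" hash functions here are derived by salting a single MD5, which differs from Squid's scheme described later:

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m = m              # number of bits in the array
        self.k = k              # number of hash functions
        self.bits = [0] * m     # all bits initially 0

    def _indices(self, key):
        # Derive k hash values, each an index into the bit array
        for i in range(self.k):
            h = hashlib.md5(b"%d:%s" % (i, key.encode())).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        # Set the k corresponding bits to 1
        for i in self._indices(key):
            self.bits[i] = 1

    def query(self, key):
        # True means "likely present" (collisions/false positives possible);
        # False means "definitely not present"
        return all(self.bits[i] for i in self._indices(key))

bf = BloomFilter(m=1024, k=4)
bf.add("http://example.com/")
print(bf.query("http://example.com/"))   # True
```

A key never added can still query True if other keys happened to set all of its bits; that is exactly the False Hit discussed below.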
Deleting a Key
To delete a key, it is not possible to simply set the associated bits to 0 since any one of those bits could have been set to 1 by the addition of a different key!
Therefore, to support deletions a counter is required for each bit position in the array. The procedures to follow would be:
- When adding a key, set appropriate bits to 1 and increment the corresponding counters.
- When deleting a key, decrement the appropriate counters (while > 0); if a counter reaches 0 then the corresponding bit is set to 0.
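The counter-based deletion procedure can be sketched as follows (illustration only; as noted later, Squid does not actually implement deletions). A bit is "set" whenever its counter is positive:

```python
import hashlib

class CountingBloom:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counts = [0] * m   # one counter per bit position

    def _indices(self, key):
        # Same salted-hash trick as the plain filter sketch
        for i in range(self.k):
            h = hashlib.md5(b"%d:%s" % (i, key.encode())).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        # "Set bits to 1" by incrementing the corresponding counters
        for i in self._indices(key):
            self.counts[i] += 1

    def delete(self, key):
        # Decrement the counters (while > 0); a bit becomes 0
        # only when its counter drops to 0
        for i in self._indices(key):
            if self.counts[i] > 0:
                self.counts[i] -= 1

    def query(self, key):
        return all(self.counts[i] > 0 for i in self._indices(key))
```

Because counters track how many keys set each bit, deleting one key no longer clears bits still needed by another.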
How is the size of the Cache Digest in Squid determined?
Upon initialisation, the capacity is set to the number of objects that can be (are) stored in the cache. Note that there are upper and lower limits here.
An arbitrary constant, bits_per_entry (currently set to 5), is used to calculate the size of the array using the following formula:
number of bits in array = capacity * bits_per_entry + 7
The size of the digest, in bytes, is therefore:
digest size = int (number of bits in array / 8)
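Worked through in code, using the formulas above with bits_per_entry = 5 (the "+ 7" simply rounds the bit count up to a whole number of bytes):

```python
BITS_PER_ENTRY = 5  # arbitrary constant, per the text above

def digest_size_bytes(capacity):
    # number of bits in array = capacity * bits_per_entry + 7
    bits = capacity * BITS_PER_ENTRY + 7
    # digest size = int(number of bits in array / 8)
    return bits // 8

# e.g. a cache whose index holds 1228800 objects:
print(digest_size_bytes(1228800))  # -> 768000 bytes
```

These example numbers match the Cache Manager statistics shown later in this section (capacity 1228800, digest size 768000 bytes).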
When a digest rebuild occurs, the change in the cache size (capacity) is measured. If the capacity has changed by a large enough amount (10%) then the digest array is freed and reallocated, otherwise the same digest is re-used.
What hash functions (and how many of them) does Squid use?
The protocol design allows for a variable number of hash functions (k). However, Squid employs a very efficient method using a fixed number - four.
Rather than computing a number of independent hash functions over a URL Squid uses a 128-bit MD5 hash of the key (actually a combination of the URL and the HTTP retrieval method) and then splits this into four equal chunks.
Each chunk, modulo the digest size (m), is used as the value for one of the hash functions - i.e. an index into the bit array.
Note: As Squid retrieves objects and stores them in its cache on disk, it adds them to the in-RAM index using a lookup key which is an MD5 hash - the very one discussed above. This means that the values for the Cache Digest hash functions are already available and consequently the operations are extremely efficient!
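A sketch of this chunk-splitting (illustration only: the exact way Squid combines the method and URL into the MD5 key, and its byte order, may differ from what is shown here):

```python
import hashlib

def digest_indices(method, url, m):
    # A 128-bit MD5 over the retrieval method plus URL; the
    # "method:url" combination here is an assumption for illustration.
    key = hashlib.md5((method + ":" + url).encode()).digest()  # 16 bytes
    # Split the 128-bit hash into four 32-bit chunks; each chunk,
    # modulo the digest size m, is one hash-function value
    return [int.from_bytes(key[i:i + 4], "big") % m
            for i in range(0, 16, 4)]

idx = digest_indices("GET", "http://example.com/", 6144000)
print(idx)  # four indices into the bit array
```

Since the MD5 store key already exists for every cached object, deriving the four indices costs only two cheap operations per chunk (a slice and a modulo), which is why this scheme is so efficient.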
Obviously, modifying the code to support a variable number of hash functions would prove a little more difficult and would most likely reduce efficiency.
How are objects added to the Cache Digest in Squid?
Every object referenced in the index in RAM is checked to see if it is suitable for addition to the digest.
A number of objects are not suitable, e.g. those that are private, not cachable, negatively cached etc. and are skipped immediately.
A freshness test is next made in an attempt to guess if the object will expire soon, since if it does, it is not worthwhile adding it to the digest. The object is checked against the refresh patterns for staleness...
Since Squid stores references to objects in its index using the MD5 key discussed earlier, there is no URL actually available for each object - which means that the pattern used will fall back to the default pattern, ".". This is an unfortunate state of affairs, but little can be done about it. A cd_refresh_pattern option will be added to the configuration file soon, which will at least make the situation a little clearer.
Note that it is best to be conservative with your refresh pattern for the Cache Digest, i.e. do not add objects if they might become stale soon. This will reduce the number of False Hits.
Does Squid support deletions in Cache Digests? What are diffs/deltas?
Squid does not support deletions from the digest. Because of this the digest must, periodically, be rebuilt from scratch to erase stale bits and prevent digest pollution.
A more sophisticated option is to use diffs or deltas. These would be created by building a new digest and comparing with the current/old one. They would essentially consist of aggregated deletions and additions since the previous digest.
Since less bandwidth should be required using these it would be possible to have more frequent updates (and hence, more accurate information).
Costs:
- RAM - extra RAM needed to hold two digests while comparisons takes place.
- CPU - probably a negligible amount.
When and how often is the local digest built?
The local digest is built:
- when store_rebuild completes after startup (the cache contents have been indexed in RAM), and
- periodically thereafter. Currently, it is rebuilt every hour (more data and experience is required before other periods, whether fixed or dynamically varying, can "intelligently" be chosen). The good thing is that the local cache decides on the expiry time and peers must obey (see later).
While the (new) digest is being built in RAM the old version (stored on disk) is still valid, and will be returned to any peer requesting it. When the digest has completed building it is then swapped out to disk, overwriting the old version.
The rebuild is CPU intensive, but not overly so. Since Squid is programmed using an event-handling model, the approach taken is to split the digest building task into chunks (i.e. chunks of entries to add) and to register each chunk as an event. If CPU load is overly high, it is possible to extend the build period - as long as it is finished before the next rebuild is due!
It may prove more efficient to implement the digest building as a separate process/thread in the future...
How are Cache Digests transferred between peers?
Cache Digests are fetched from peers using the standard HTTP protocol (note that a pull rather than push technique is used).
After the first access to a peer, a peerDigestValidate event is queued (this event decides if it is time to fetch a new version of a digest from a peer). The queuing delay depends on the number of peers already queued for validation - so that all digests from different peers are not fetched simultaneously.
A peer answering a request for its digest will specify an expiry time for that digest by using the HTTP Expires header. The requesting cache thus knows when it should request a fresh copy of that peer's digest.
Note: requesting caches use an If-Modified-Since request in case the peer has not rebuilt its digest for some reason since the last time it was fetched.
How and where are Cache Digests stored?
Cache Digest built locally
Since the local digest is generated purely for the benefit of its neighbours keeping it in RAM is not strictly required. However, it was decided to keep the local digest in RAM partly because of the following:
- Approximately the same amount of memory will be (re-)allocated on every rebuild of the digest
- the memory requirements are probably quite small (when compared to other requirements of the cache server)
- if ongoing updates of the digest are to be supported (e.g. additions/deletions) it will be necessary to perform these operations on a digest in RAM
- if diffs/deltas are to be supported the "old" digest would have to be swapped into RAM anyway for the comparisons.
When the digest is built in RAM, it is then swapped out to disk, where it is stored as a "normal" cache item - which is how peers request it.
Cache Digest fetched from peer
When a query from a client arrives, fast lookups are required to decide if a request should be made to a neighbour cache. It is therefore required to keep all peer digests in RAM.
Peer digests are also stored on disk for the following reasons:
- Recovery: if stopped and restarted, peer digests can be reused from the local on-disk copy (they will soon be validated using an HTTP IMS request to the appropriate peers, as discussed earlier)
- Sharing: peer digests are stored as normal objects in the cache. This allows them to be given to neighbour caches.
How are the Cache Digest statistics in the Cache Manager to be interpreted?
Cache Digest statistics can be seen from the Cache Manager or through the squidclient utility. The following examples show how to use the squidclient utility to request the list of possible operations from the localhost, local digest statistics from the localhost, refresh statistics from the localhost and local digest statistics from another cache, respectively.
squidclient mgr:menu
squidclient mgr:store_digest
squidclient mgr:refresh
squidclient -h peer mgr:store_digest
The available statistics provide a lot of useful debugging information. The refresh statistics include a section for Cache Digests which explains why items were added (or not) to the digest.
The following example shows local digest statistics for a 16GB cache in a corporate intranet environment (may be a useful reference for the discussion below).
store digest: size: 768000 bytes
entries: count: 588327 capacity: 1228800 util: 48%
deletion attempts: 0
bits: per entry: 5 on: 1953311 capacity: 6144000 util: 32%
bit-seq: count: 2664350 avg.len: 2.31
added: 588327 rejected: 528703 ( 47.33 %) del-ed: 0
collisions: on add: 0.23 % on rej: 0.23 %
entries:capacity is a measure of how many items "are likely" to be added to the digest. It represents the number of items that were in the local cache at the start of digest creation - however, upper and lower limits currently apply. This value is multiplied by bits: per entry (an arbitrary constant) to give bits:capacity, which is the size of the cache digest in bits. Dividing this by 8 will give store digest: size which is the size in bytes.
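These sizing relationships can be cross-checked with a little shell arithmetic; the numbers below are copied from the sample statistics above:

```shell
# Cross-check the digest sizing maths from the sample statistics above.
capacity=1228800        # entries: capacity
bits_per_entry=5        # bits: per entry
bits=$((capacity * bits_per_entry))
echo "bits capacity: $bits"           # matches bits: capacity (6144000)
echo "digest bytes:  $((bits / 8))"   # matches store digest: size (768000)
awk 'BEGIN { printf "entries util:  %.0f%%\n", 100 * 588327 / 1228800 }'  # 48%
awk 'BEGIN { printf "bits util:     %.0f%%\n", 100 * 1953311 / 6144000 }' # 32%
```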
The number of items represented in the digest is given by entries:count. This should be equal to added minus deletion attempts.
Since (currently) no modifications are made to the digest after the initial build (no additions are made and deletions are not supported) deletion attempts will always be 0 and entries:count should simply be equal to added.
entries:util is not really a significant statistic. At most it gives a measure of how many of the items in the store were deemed suitable for entry into the digest compared to how many were "prepared" for.
rej shows how many objects were rejected. Objects will not be added for a number of reasons, the most common being refresh pattern settings. Remember that (currently) the default refresh pattern will be used for checking for entry here and also note that changing this pattern can significantly affect the number of items added to the digest! Too relaxed and False Hits increase, too strict and False Misses increase. Remember also that at time of validation (on the peer) the "real" refresh pattern will be used - so it is wise to keep the default refresh pattern conservative.
bits: on indicates the number of bits in the digest that are set to 1. bits: util gives this figure as a percentage of the total number of bits in the digest. As we saw earlier, a figure of 50% represents the optimal trade-off. Values too high (say > 75%) would cause a larger number of collisions, and hence False Hits, while lower values mean the digest is under-utilised (using unnecessary RAM). Note that low values are normal for caches that are starting to fill up.
A bit sequence is an uninterrupted sequence of bits with the same value. bit-seq: avg.len gives some insight into the quality of the hash functions. Long values indicate a problem, even if bits:util is 50% (> 3 = suspicious, > 10 = very suspicious).
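As a sanity check, the reported average can be reproduced from the sample statistics above: the total number of bits divided by the number of bit sequences gives the average run length.

```shell
# bits: capacity / bit-seq: count from the sample statistics above.
awk 'BEGIN { printf "avg.len: %.2f\n", 6144000 / 2664350 }'   # 2.31, as reported
```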
What are False Hits and how should they be handled?
A False Hit occurs when a cache believes a peer has an object and asks the peer for it but the peer is not able to satisfy the request.
Expiring or stale objects on the peer are frequent causes of False Hits. At the time of the query actual refresh patterns are used on the peer and stale entries are marked for revalidation. However, revalidation is prohibited unless the peer is behaving as a parent, or miss_access is enabled. Thus, clients can receive error messages instead of revalidated objects!
The frequency of False Hits can be reduced but never eliminated completely, so there must be a robust way of handling them when they occur. The philosophy behind the design of Squid is to use lightweight techniques, optimise for the common case, and robustly handle the unusual case (False Hits).
Squid will soon support the HTTP only-if-cached header. Requests for objects made to a peer will use this header and if the objects are not available, the peer can reply appropriately allowing Squid to recognise the situation. The following describes what Squid is aiming towards:
- Cache Digests used to obtain good estimates of where a requested object is located in a Cache Hierarchy
- Persistent HTTP Connections between peers. There will be no TCP startup overhead and both latency and network load will be similar to ICP (i.e. fast).
- HTTP False Hit Recognition using the only-if-cached HTTP header - allowing fall back to another peer or, if no other peers are available with the object, then going direct (or through a parent if behind a firewall).
How can Cache Digest related activity be traced/debugged?
Enabling Cache Digests
If you wish to use Cache Digests (available in Squid version 2) you need to add a configure option, so that the relevant code is compiled in:
./configure --enable-cache-digests ...
What do the access.log entries look like?
If a request is forwarded to a neighbour due to a HIT in that neighbour's Cache Digest, the hierarchy (9th) field of the access.log file for the local cache will look like CACHE_DIGEST_HIT/neighbour. The Log Tag (4th field) should obviously show a MISS.
On the peer cache the request should appear as a normal HTTP request from the first cache.
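If you want to see how often digests drive peer selection, the hierarchy field can be pulled out of access.log with standard tools. A minimal sketch (the log line below is fabricated, but follows Squid's native log field layout):

```shell
# Print the hierarchy code and peer name (9th field) of a native access.log line.
line='887574804.952 1 10.0.0.5 TCP_MISS/200 4312 GET http://example.com/ - CACHE_DIGEST_HIT/neighbour text/html'
echo "$line" | awk '{ split($9, tag, "/"); print "hierarchy:", tag[1], "peer:", tag[2] }'
```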
What does a False Hit look like?
The easiest situation to analyse is when two caches (say A and B) are involved, neither of which uses the other as a parent. In this case, a False Hit would show up as a CACHE_DIGEST_HIT on A and NOT as a TCP_HIT on B (or vice versa). If B does not fetch the object for A then the hierarchy field will look like NONE/- (and A will have received an Access Denied or Forbidden message). This will happen if the object is not "available" on B and B does not have miss_access enabled for A (or is not acting as a parent for A).
How is the cause of a False Hit determined?
Assume A requests a URL from B and receives a False Hit.
Using the squidclient utility PURGE the URL from A, e.g.
squidclient -m PURGE 'URL'
Using the squidclient utility request the object from A, e.g.
squidclient 'URL'
The HTTP headers of the response are available. Two header types are of particular interest:
- X-Cache - this shows whether an object is available or not.
- X-Cache-Lookup - this keeps the result of a store table lookup before refresh-causing rules are checked (i.e. it indicates if the object is available before any validation would be attempted).
The X-Cache and X-Cache-Lookup headers from A should both show MISS.
If A requests the object from B (which it will if the digest lookup indicates B has it - assuming B is the closest peer, of course) then there will be another set of these headers from B.
If the X-Cache header from B shows a MISS a False Hit has occurred. This means that A thought B had an object but B tells A it does not have it available for retrieval. The reason why it is not available for retrieval is indicated by the X-Cache-Lookup header. If:
- X-Cache-Lookup = MISS: either A's (version of B's) digest is out-of-date or corrupt, OR a collision occurred in the digest (very small probability), OR B recently purged the object.
- X-Cache-Lookup = HIT: B had the object, but refresh rules (or A's max-age requirements) prevent A from getting a HIT (validation failed).
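The decision table above can be summarised in a small helper; classify_false_hit is hypothetical (not part of Squid), just a restatement of the rules:

```shell
# Classify a False Hit from B's X-Cache and X-Cache-Lookup header values.
classify_false_hit() {
    xcache=$1; xlookup=$2
    if [ "$xcache" = MISS ] && [ "$xlookup" = HIT ]; then
        echo "B has the object, but refresh rules or max-age denied the HIT"
    elif [ "$xcache" = MISS ] && [ "$xlookup" = MISS ]; then
        echo "stale or corrupt digest, a collision, or a recent purge on B"
    else
        echo "not a False Hit"
    fi
}

classify_false_hit MISS HIT
```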
Use The Source
If there is something else you need to check you can always look at the source code. The main Cache Digest functionality is organised as follows:
- CacheDigest.c (debug section 70): Generic Cache Digest routines
- store_digest.c (debug section 71): Local Cache Digest routines
- peer_digest.c (debug section 72): Peer Cache Digest routines
Note that in the source the term Store Digest refers to the digest created locally. The Cache Digest code is fairly self-explanatory (once you understand how Cache Digests work).
What about ICP?
WANTED
Is there a Cache Digest Specification?
There is now, thanks to Martin Hamilton <martin AT net DOT lut DOT ac DOT uk> and Alex Rousskov.
Cache Digests, as implemented in Squid 2.1.PATCH2, are described in cache-digest-v5.txt.
You'll notice the format is similar to an Internet Draft. We decided not to submit this document as a draft because Cache Digests will likely undergo some important changes before we want to try to make it a standard.
Would it be possible to stagger the timings when cache_digests are retrieved from peers?
The information here is current for version 2.2.
Squid already has code to spread the digest updates. The algorithm is currently controlled by a few hard-coded constants in peer_digest.c. For example, the GlobDigestReqMinGap variable determines the minimum interval between two requests for a digest. You may want to try to increase the value of GlobDigestReqMinGap from 60 seconds to whatever you feel comfortable with (but it should be smaller than hour/number_of_peers, of course).
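For example, keeping the minimum gap under hour/number_of_peers is easy to check; the peer count here is just an assumed example:

```shell
# Upper bound for GlobDigestReqMinGap with an assumed 20 peers.
peers=20
echo "keep GlobDigestReqMinGap below $((3600 / peers)) seconds"   # 180
```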
Note that whatever you do, you still need to give Squid enough time and bandwidth to fetch all the digests. Depending on your environment, that bandwidth may be more or less than an ICP would require. Upcoming digest deltas (x10 smaller than the digests themselves) may be the only way to solve the "big scale" problem.
Or, How can I make my users' browsers use my cache without configuring the browsers for proxying?
Contents
- Concepts of Interception Caching
- Requirements and methods for Interception Caching
- Steps involved in configuring Interception Caching
- Compile a version of Squid which accepts connections for other addresses
- Getting your traffic to the right port on your Squid Cache
- Get the packets from the end clients to your cache server
- Interception Caching packet redirection with Cisco routers using policy routing (NON WCCP)
- Interception Caching packet redirection with Foundry L4 switches
- Interception Caching packet redirection with an Alcatel OmnySwitch 7700
- Interception Caching packet redirection with Cabletron/Entrasys products
- Interception Caching packet redirection with ACC Tigris digital access server
- WCCP - Web Cache Coordination Protocol
- TProxy Interception
- Complete
- Troubleshooting and Questions
- Configuration Examples contributed by users who have working installations
- Further information about configuring Interception Caching with Squid
- Configuring Other Operating Systems
- Issues with HotMail
Concepts of Interception Caching
Interception Caching goes under many names - Interception Caching, Transparent Proxying and Cache Redirection. Interception Caching is the process by which HTTP connections coming from remote clients are redirected to a cache server, without their knowledge or explicit configuration.
There are some good reasons why you may want to use this technique:
- There is no client configuration required. This is the most popular reason for investigating this option.
- You can implement better and more reliable strategies to maintain client access in case of your cache infrastructure going out of service.
However there are also significant disadvantages for this strategy, as outlined by Mark Elsen:
- Intercepting HTTP breaks TCP/IP standards because user agents think they are talking directly to the origin server.
- It causes path-MTU (PMTUD) to fail, possibly making some remote sites inaccessible. This is not usually a problem if your client machines are connected via Ethernet or DSL PPPoATM where the MTU of all links between the cache and client is 1500 or more. If your clients are connecting via DSL PPPoE then this is likely to be a problem as PPPoE links often have a reduced MTU (1472 is very common).
- On IE versions before version 6, the ctrl-reload function did not work as expected.
- Proxy authentication does not work, and IP based authentication conceptually fails because the users are all seen to come from the Interception Cache's own IP address.
- You can't use IDENT lookups (which are inherently very insecure anyway)
- Interception Caching only supports the HTTP protocol, not gopher, SSL or FTP. You cannot setup a redirection-rule to the proxy server for protocols other than HTTP since it will not know how to deal with them.
- Intercepting Caches are incompatible with IP filtering designed to prevent address spoofing.
- Clients are still expected to have full Internet DNS resolving capabilities; in certain intranet/firewalling setups, this is not always wanted.
- Related to the above: suppose the user's browser connects to a site which is down. However, due to the transparent proxying, it gets a connected state to the interceptor. The end user may get wrong error messages or a hung browser, for reasons that seem inexplicable to them.
If you feel that the advantages outweigh the disadvantages in your network, you may choose to continue reading and look at implementing Interception Caching.
Requirements and methods for Interception Caching
- You need to have a good understanding of what you are doing before you start. This involves understanding at a TCP layer what is happening to the connections. This will help you both configure the system and additionally assist you if your end clients experience problems after you have deployed your solution.
- Squid-2.5, Squid-2.6 or Squid-3.0. You should run the latest version of 2.6 or 3.0 that is available at the time.
- A newer OS may make things easier, especially with Linux. Linux 2.6.9 supports WCCP via the native GRE kernel module. This will save you having to build the ip_wccp module by hand later on, and also means that any upgrades to your kernel will not result in a broken binary WCCP module.
- Quite likely you will need a network device which can redirect the traffic to your cache. If your Squid box is also functioning as a router and all traffic from and to your network is in the path, you can skip this step. If your cache is a standalone box on a LAN that does not normally see your clients' web browsing traffic, you will need to choose a method of redirecting the HTTP traffic from your client machines to the cache. This is typically done with a network appliance such as a router or Layer 3 switch which either rewrites the destination MAC address or encapsulates the network traffic in a GRE or WCCP tunnel to your cache.
NB: If you are using Cisco routers and switches in your network you may wish to investigate the use of WCCP. WCCP is an extremely flexible way of redirecting traffic and is intelligent enough to automatically stop redirecting client traffic if your cache goes offline. This may involve you upgrading your router or switch to a release of IOS or an upgraded featureset which supports WCCP. There is a section written specifically on WCCP below.
Steps involved in configuring Interception Caching
- Building a Squid with the correct options to ./configure to support the redirection and handle the clients correctly
- Routing the traffic from port 80 to the port your Squid is configured to accept the connections on
- Decapsulating the traffic that your network device sends to Squid (only if you are using GRE or WCCP to intercept the traffic)
- Configuring your network device to redirect the port 80 traffic.
The first two steps are required and the last two may or may not be required depending on how you intend to route the HTTP traffic to your cache.
It is critical to read the full comments in the squid.conf file and this document in its entirety before you begin. Getting Interception Caching to work with Squid is non-trivial and requires many subsystems of both Squid and your network to be configured exactly right or else you will find that it will not work and your users will not be able to browse at all. You MUST test your configuration out in a non-live environment before you unleash this feature on your end users.
Compile a version of Squid which accepts connections for other addresses
Firstly you need to build Squid with the correct options to ./configure, and then you need to configure squid.conf to support Intercept Caching.
Choosing the right options to pass to ./configure
All supported versions of Squid currently available support Interception Caching, however for this to work properly, your operating system and network also need to be configured. For some operating systems, you need to have configured and built a version of Squid which can recognize the hijacked connections and discern the destination addresses. For Linux this works by configuring Squid with the --enable-linux-netfilter option. For *BSD-based systems, you probably have to configure squid with the --enable-ipf-transparent option if you're using IP Filter, or --enable-pf-transparent if you're using OpenBSD's PF. Do a make clean if you previously configured without that option, or the correct settings may not be present.
Squid-2.6 and Squid-3.0 support both WCCPv1 and WCCPv2 by default (unless explicitly disabled).
Configure Squid to accept and process the redirected port 80 connections
You have to change the Squid configuration settings to recognize the hijacked connections and discern the destination addresses.
For Squid-2.6 and Squid-3.0 you simply need to add the keyword transparent to the http_port that your proxy will receive the redirected requests on; the older httpd_accel directives are not necessary and in fact have been removed in those releases:
http_port 3128 transparent
You can manually configure browsers to connect to the IP address and port which you have specified as transparent. The only drawback is that there will be a very slight (and probably unnoticeable) performance hit as a syscall is done to see if the connection is intercepted. If no interception state is found it is processed just like a normal connection.
For Squid-2.5 and earlier the configuration is a little more complex. Here are the important settings in squid.conf for Squid-2.5 and earlier:
http_port 3128
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
The http_port 3128 in this example assumes you will redirect incoming port 80 packets to port 3128 on your cache machine. You may use any other port like 8080, the most important thing is that the port number matches the interception rules in the local firewall.
In the httpd_accel_host option, use the keyword virtual
The httpd_accel_with_proxy on is required to enable interception proxy mode; essentially in interception proxy mode Squid thinks it is acting both as an accelerator (hence accepting packets for other IPs on port 80) and a caching proxy (hence serving files out of cache.)
You must use httpd_accel_uses_host_header on to get the cache to work properly in interception mode. This enables the cache to index its stored objects under the true hostname, as is done in a normal proxy, rather than under the IP address. This is especially important if you want to use a parent cache hierarchy, or to share cache data between interception proxy users and non-interception proxy users, which you can do with Squid in this configuration.
Getting your traffic to the right port on your Squid Cache
You have to configure your cache host to accept the redirected packets - any IP address, on port 80 - and deliver them to your cache application. This is typically done with IP filtering/forwarding features built into the kernel. On Linux this is called iptables (kernel 2.4 and above), ipchains (2.2.x) or ipfwadm (2.0.x). On FreeBSD it's called ipfw. Other BSD systems may use IP Filter, ipnat or pf.
On most systems, it may require rebuilding the kernel or adding a new loadable kernel module. If you are running a modern Linux distribution and using the vendor supplied kernel you will likely not need to do any rebuilding as the required modules will have been built by default.
Interception Caching packet redirection for Solaris, SunOS, and BSD systems
You don't need to use IP Filter on FreeBSD. Use the built-in ipfw feature instead. See the FreeBSD subsection below.
Install IP Filter
First, get and install the IP Filter package.
Configure ipnat
Put these lines in /etc/ipnat.rules:
# Redirect direct web traffic to local web server.
rdr de0 1.2.3.4/32 port 80 -> 1.2.3.4 port 80 tcp
# Redirect everything else to squid on port 8080
rdr de0 0.0.0.0/0 port 80 -> 1.2.3.4 port 8080 tcp
Modify your startup scripts to enable ipnat. For example, on FreeBSD it looks something like this:
/sbin/modload /lkm/if_ipl.o
/sbin/ipnat -f /etc/ipnat.rules
chgrp nobody /dev/ipnat
chmod 644 /dev/ipnat
Thanks to Quinton Dolan.
Interception Caching packet redirection for OpenBSD PF
After having compiled Squid with the options to accept and process the redirected port 80 connections enumerated above, either manually or with FLAVOR=transparent for /usr/ports/www/squid, one needs to add a redirection rule to pf (/etc/pf.conf). In the following example, sk0 is the interface on which traffic you want transparently redirected will arrive:
i = "sk0"
rdr on $i inet proto tcp from any to any port 80 -> $i port 3128
pass on $i inet proto tcp from $i:network to $i port 3128
Or, depending on how recent your implementation of PF is:
i = "sk0"
rdr pass on $i inet proto tcp to any port 80 -> $i port 3128
Also, see Daniel Hartmeier's page on the subject.
Interception Caching packet redirection for Linux
Specific instructions depend on what version of Linux Kernel you are using.
Interception Caching packet redirection with Linux 2.0 and ipfwadm
Interception proxying does NOT work with Linux-2.0.30! Linux-2.0.29 is known to work well. If you're using a more recent kernel, like 2.2.X, then you should probably use an ipchains configuration, as described below.
This technique has some shortcomings.
If you can live with the side-effects, go ahead and compile your kernel with firewalling and redirection support. Here are the important parameters from
/usr/src/linux/.config:
#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
#
# Networking options
#
CONFIG_FIREWALL=y
# CONFIG_NET_ALIAS is not set
CONFIG_INET=y
CONFIG_IP_FORWARD=y
# CONFIG_IP_MULTICAST is not set
CONFIG_IP_FIREWALL=y
# CONFIG_IP_FIREWALL_VERBOSE is not set
CONFIG_IP_MASQUERADE=y
CONFIG_IP_TRANSPARENT_PROXY=y
CONFIG_IP_ALWAYS_DEFRAG=y
# CONFIG_IP_ACCT is not set
CONFIG_IP_ROUTER=y
You may also need to enable IP Forwarding. One way to do it is to add this line to your startup scripts:
echo 1 > /proc/sys/net/ipv4/ip_forward
Alternatively edit /etc/sysctl.conf
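For example, a line like this in /etc/sysctl.conf makes the setting persistent across reboots:

```
net.ipv4.ip_forward = 1
```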
You can either go to the Linux IP Firewall and Accounting page, obtain the source distribution to ipfwadm and install it OR better still, download a precompiled binary from your distribution. Older versions of ipfwadm may not work. You might need at least version 2.3.0. You'll use ipfwadm to setup the redirection rules. I added this rule to the script that runs from /etc/rc.d/rc.inet1 (Slackware) which sets up the interfaces at boot-time. The redirection should be done before any other Input-accept rule.
To really make sure it worked I disabled the forwarding (masquerading) I normally do.
/etc/rc.d/rc.firewall:
#!/bin/sh
# rc.firewall   Linux kernel firewalling rules
FW=/sbin/ipfwadm

# Flush rules, for testing purposes
for i in I O F # A   # If we enabled accounting too
do
        ${FW} -$i -f
done

# Default policies:
${FW} -I -p rej   # Incoming policy: reject (quick error)
${FW} -O -p acc   # Output policy: accept
${FW} -F -p den   # Forwarding policy: deny

# Input Rules:
# Loopback-interface (local access, eg, to local nameserver):
${FW} -I -a acc -S localhost/32 -D localhost/32
# Local Ethernet-interface:
# Redirect to Squid proxy server:
${FW} -I -a acc -P tcp -D default/0 80 -r 8080
# Accept packets from local network:
${FW} -I -a acc -P all -S localnet/8 -D default/0 -W eth0

# Only required for other types of traffic (FTP, Telnet):
# Forward localnet with masquerading (udp and tcp, no icmp!):
${FW} -F -a m -P tcp -S localnet/8 -D default/0
${FW} -F -a m -P udp -S localnet/8 -D default/0
Here all traffic from the local LAN with any destination gets redirected to the local port 8080. Rules can be viewed like this:
IP firewall input rules, default policy: reject
type  prot source               destination          ports
acc   all  127.0.0.1            127.0.0.1            n/a
acc/r tcp  10.0.0.0/8           0.0.0.0/0            * -> 80 => 8080
acc   all  10.0.0.0/8           0.0.0.0/0            n/a
acc   tcp  0.0.0.0/0            0.0.0.0/0            * -> *
I did some testing on Windows 95 with both Microsoft Internet Explorer 3.01 and Netscape Communicator pre-release and it worked with both browsers with the proxy-settings disabled.
At one time Squid seemed to get in a loop when I pointed the browser to the local port 80. But this could be avoided by adding a reject rule for clients to this address:
${FW} -I -a rej -P tcp -S localnet/8 -D hostname/32 80

IP firewall input rules, default policy: reject
type  prot source               destination          ports
acc   all  127.0.0.1            127.0.0.1            n/a
rej   tcp  10.0.0.0/8           10.0.0.1             * -> 80
acc/r tcp  10.0.0.0/8           0.0.0.0/0            * -> 80 => 8080
acc   all  10.0.0.0/8           0.0.0.0/0            n/a
acc   tcp  0.0.0.0/0            0.0.0.0/0            * -> *
NOTE on resolving names: Instead of just passing the URLs to the proxy server, the browser itself has to resolve the URLs. Make sure the workstations are setup to query a local nameserver, to minimize outgoing traffic.
If you're already running a nameserver at the firewall or proxy server (which is a good idea anyway IMHO) let the workstations use this nameserver.
Additional notes from Richard Ayres
I'm using such a setup. The only issues so far have been that:
- Linux kernel 2.0.30 is a no-no as interception proxying is broken (use 2.0.29 or 2.0.31 or later)
- The Microsoft Network won't authorize its users through a proxy, so I have to specifically *not* redirect those packets (my company is an MSN content provider).
See also Daniel Kiracofe's HOWTO page.
Interception Caching packet redirection with Linux 2.2 and ipchains
by Martin Lyons
You need to configure your kernel for ipchains. Configuring Linux kernels is beyond the scope of this FAQ. One way to do it is:
# cd /usr/src/linux
# make menuconfig
The following shows important kernel features to include:
[*] Network firewalls
[ ] Socket Filtering
[*] Unix domain sockets
[*] TCP/IP networking
[ ] IP: multicasting
[ ] IP: advanced router
[ ] IP: kernel level autoconfiguration
[*] IP: firewalling
[ ] IP: firewall packet netlink device
[*] IP: always defragment (required for masquerading)
[*] IP: transparent proxy support
You must include the IP: always defragment, otherwise it prevents you from using the REDIRECT chain. You can use this script as a template for your own rc.firewall to configure ipchains:
#!/bin/sh
# rc.firewall   Linux kernel firewalling rules
# Leon Brooks (leon at brooks dot fdns dot net)
FW=/sbin/ipchains
ADD="$FW -A"

# Flush rules, for testing purposes
for i in I O F # A   # If we enabled accounting too
do
        ${FW} -F $i
done

# Default policies:
${FW} -P input REJECT     # Incoming policy: reject (quick error)
${FW} -P output ACCEPT    # Output policy: accept
${FW} -P forward DENY     # Forwarding policy: deny

# Input Rules:
# Loopback-interface (local access, eg, to local nameserver):
${ADD} input -j ACCEPT -s localhost/32 -d localhost/32
# Local Ethernet-interface:
# Redirect to Squid proxy server:
${ADD} input -p tcp -d 0/0 80 -j REDIRECT 8080
# Accept packets from local network:
${ADD} input -j ACCEPT -s localnet/8 -d 0/0 -i eth0

# Only required for other types of traffic (FTP, Telnet):
# Forward localnet with masquerading (udp and tcp, no icmp!):
${ADD} forward -j MASQ -p tcp -s localnet/8 -d 0/0
${ADD} forward -j MASQ -p udp -s localnet/8 -d 0/0
Also, Andrew Shipton notes that with 2.0.x kernels you don't need to enable packet forwarding, but with the 2.1.x and 2.2.x kernels using ipchains you do. Edit /etc/sysctl.conf to make this change permanent. Packet forwarding is enabled with the following command:
echo 1 > /proc/sys/net/ipv4/ip_forward
Interception Caching packet redirection with Linux 2.4 or later and Netfilter
NOTE: this information comes from Daniel Kiracofe's Transparent Proxy with Squid HOWTO.
To support Netfilter transparent interception on Linux 2.4 or later, remember Squid must be compiled with the --enable-linux-netfilter option.
If you are running a custom built kernel (rather than one supplied by your Linux distribution), you need to build in support for at least these options:
- Networking support
- Sysctl support
- Network packet filtering
- TCP/IP networking
- Connection tracking (Under "IP: Netfilter Configuration" in menuconfig)
- IP tables support
- Full NAT
- REDIRECT target support
Quite likely you will already have most if not all of those options.
You must say NO to "Fast switching".
After building the kernel, install it and reboot.
You may need to enable packet forwarding (e.g. in your startup scripts):
echo 1 > /proc/sys/net/ipv4/ip_forward
Use the iptables command to make your kernel intercept HTTP connections and send them to Squid:
iptables -t nat -A PREROUTING -i eth0 -d 192.168.0.0/255.255.255.0 -j ACCEPT
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128
Get the packets from the end clients to your cache server
There are several ways to do this. First, if your proxy machine is already in the path of the packets (i.e. it is routing between your proxy users and the Internet) then you don't have to worry about this step as the Interception Caching should now be working. This would be true if you install Squid on a firewall machine, or on a UNIX-based router. If the cache is not in the natural path of the connections, then you have to divert the packets from the normal path to your cache host using a router or switch.
If you are using an external device to route the traffic to your Cache, there are multiple ways of doing this. You may be able to do this with a Cisco router using WCCP, or the "route map" feature. You might also use a so-called layer-4 switch, such as the Alteon ACE-director or the Foundry Networks ServerIron.
Finally, you might be able to use a stand-alone router/load-balancer type product, or routing capabilities of an access server.
Interception Caching packet redirection with Cisco routers using policy routing (NON WCCP)
This works with at least IOS 11.1 and later. If your router is doing anything more complicated than shuffling packets between an ethernet interface and either a serial port or BRI port, then you should think carefully about whether this will work for you.
First define a route map with a name of proxy-redirect (name doesn't matter) and specify the next hop to be the machine Squid runs on.
!
route-map proxy-redirect permit 10
 match ip address 110
 set ip next-hop 203.24.133.2
!
Define an access list to trap HTTP requests. The second line allows the Squid host direct access so a routing loop is not formed. By carefully writing your access list as shown below, common cases are found quickly and this can greatly reduce the load on your router's processor.
!
access-list 110 deny tcp any any neq www
access-list 110 deny tcp host 203.24.133.2 any
access-list 110 permit tcp any any
!
Apply the route map to the ethernet interface.
!
interface FastEthernet0/0
 ip policy route-map proxy-redirect
!
Shortcomings of the cisco ip policy route-map method
Bruce Morgan notes that there is a Cisco bug relating to interception proxying using IP policy route maps, that causes NFS and other applications to break. Apparently there are two bug reports raised in Cisco, but they are not available for public dissemination.
The problem occurs with o/s packets with more than 1472 data bytes. If you try to ping a host with more than 1472 data bytes across a Cisco interface with the access-lists and ip policy route map, the icmp request will fail. The packet will be fragmented, and the first fragment is checked against the access-list and rejected - it goes the "normal path" as it is an icmp packet - however when the second fragment is checked against the access-list it is accepted (it isn't regarded as an icmp packet), and goes to the action determined by the policy route map!
John notes that you may be able to get around this bug by carefully writing your access lists. If the last/default rule is to permit, this bug is a problem; if the last/default rule is to deny, it is not. Presumably fragments other than the first do not carry the information needed to properly policy-route them. Normally TCP packets should not be fragmented; at least my network runs an MTU of 1500 everywhere to avoid fragmentation. So this should affect only UDP and ICMP traffic.
Basically, you will have to choose between living with the bug and better performance. This set has better performance, but suffers from the bug:
access-list 110 deny tcp any any neq www
access-list 110 deny tcp host 10.1.2.3 any
access-list 110 permit tcp any any
Conversely, this set has worse performance, but works for all protocols:
access-list 110 deny tcp host 10.1.2.3 any
access-list 110 permit tcp any any eq www
access-list 110 deny tcp any any
Interception Caching packet redirection with Foundry L4 switches
by Brian Feeny (at shreve dot net).
First, configure Squid for interception caching as detailed at the beginning of this section.
Next, configure the Foundry layer 4 switch to redirect traffic to your Squid box or boxes. By default, the Foundry redirects to port 80 of your squid box. This can be changed to a different port if needed, but won't be covered here.
In addition, the switch does a "health check" of the port to make sure your squid is answering. If your squid does not answer, the switch defaults to sending traffic directly through instead of redirecting it. When the Squid comes back up, it begins redirecting once again.
This example assumes you have two squid caches:
squid1.foo.com 192.168.1.10
squid2.foo.com 192.168.1.11
We will assume you have various workstations, customers, etc, plugged into the switch, whose traffic you want intercepted and sent to Squid. The squid caches themselves should be plugged into the switch as well. Only the interface that the router is connected to is important; where you put the squid caches or other connections does not matter.
This example assumes your router is plugged into interface 17 of the switch. If not, adjust the following commands accordingly.
- Enter configuration mode:
telnet@ServerIron#conf t
- Configure each squid on the Foundry:
telnet@ServerIron(config)# server cache-name squid1 192.168.1.10
telnet@ServerIron(config)# server cache-name squid2 192.168.1.11
- Add the squids to a cache-group:
telnet@ServerIron(config)#server cache-group 1
telnet@ServerIron(config-tc-1)#cache-name squid1
telnet@ServerIron(config-tc-1)#cache-name squid2
- Create a policy for caching http on a local port
telnet@ServerIron(config)# ip policy 1 cache tcp http local
- Enable that policy on the port connected to your router
telnet@ServerIron(config)#int e 17
telnet@ServerIron(config-if-17)# ip-policy 1
Since all outbound traffic to the Internet goes out interface 17 (the router), and interface 17 has the caching policy applied to it, HTTP traffic is going to be intercepted and redirected to the caches you have configured.
The default port to redirect to can be changed. The load balancing algorithm used can be changed (Least Used, Round Robin, etc). Ports can be exempted from caching if needed. Access Lists can be applied so that only certain source IP Addresses are redirected, etc. This information was left out of this document since this was just a quick howto that would apply for most people, not meant to be a comprehensive manual of how to configure a Foundry switch. I can however revise this with any information necessary if people feel it should be included.
Interception Caching packet redirection with an Alcatel OmniSwitch 7700
by Pedro A M Vazquez
On the switch define a network group to be intercepted:
policy network group MyGroup 10.1.1.0 mask 255.255.255.0
Define the tcp services to be intercepted:
policy service web80 destination tcp port 80
policy service web8080 destination tcp port 8080
Define a group of services using the services above:
policy service group WebPorts web80 web8080
And use these to create an intercept condition:
policy condition WebFlow source network group MyGroup service group WebPorts
Now, define an action to redirect the traffic to the host running squid:
policy action Redir alternate gateway ip 10.1.2.3
Finally, create a rule using this condition and the corresponding action:
policy rule Intercept condition WebFlow action Redir
Apply the rules to the QoS system to make them effective:
qos apply
Don't forget that you still need to configure Squid and Squid's operating system to handle the intercepted connections. See above for Squid and OS-specific details.
Interception Caching packet redirection with Cabletron/Enterasys products
By Dave Wintrip, dave at purevanity dot net, June 3, 2004.
I have verified this configuration as working on a Cabletron SmartSwitchRouter 2000, and it should work on any layer-4 aware Cabletron or Enterasys product.
You must first configure Squid to enable interception caching, outlined earlier.
Next, make sure that you have connectivity from the layer-4 device to your squid box, and that squid is correctly configured to intercept port 80 requests thrown its way.
Log into the device, and enter enable mode, as well as configure mode.
ssr> en
Password:
ssr# conf
ssr(conf)#
I generally create two sets of redirect ACLs, one for specifying who to cache, and one for destination addresses that need to bypass the cache. This method of interception is very similar to Cisco's route-map in this way. The ACL cache-skip is a list of destination addresses that we do not want to transparently redirect to squid.
ssr(conf)# acl cache-skip permit tcp any 192.168.1.100/255.255.255.255 any http
The ACL cache-allow is a list of source addresses that will be redirected to Squid.
ssr(conf)# acl cache-allow permit tcp 10.0.22.0/255.255.255.0 any any http
Save your new ACLs to the running configuration.
ssr(conf)# save a
Next, we need to create the ip-policies that will work to perform the redirection. Please note that 10.0.23.2 is my Squid server, and that 10.0.24.1 is my standard default next hop. By pushing the cache-skip ACL to the default gateway, the web request is sent out as if the squid box was not present. This could just as easily be done using the squid configuration, but I would rather Squid not touch the data if it has no reason to.
ssr(conf)# ip-policy cache-allow permit acl cache-allow next-hop-list 10.0.23.2 action policy-only
ssr(conf)# ip-policy cache-skip permit acl cache-skip next-hop-list 10.0.24.1 action policy-only
Apply these new policies into the active configuration.
ssr(conf)# save a
We now need to apply the ip-policies to interfaces we want to cache requests from. Assuming that localnet-gw is the interface name to the network we want to cache requests from, we first apply the cache-skip ACL to intercept requests on our do-not-cache list, and forward them out the default gateway. We then apply the cache-allow ACL to the same interface to redirect all other requests to the cache server.
ssr(conf)# ip-policy cache-skip apply interface localnet-gw
ssr(conf)# ip-policy cache-allow apply interface localnet-gw
We now need to apply and permanently save our changes. Nothing we have done before this point would affect anything without adding the ip-policy applications into the active configuration, so let's try it.
ssr(conf)# save a
ssr(conf)# save s
Provided your Squid box is correctly configured, you should now be able to surf, and be transparently cached if you are using the localnet-gw address as your gateway.
Some Cabletron/Enterasys products include another method of applying a web cache; configuring that is not covered in this document, but it is fairly straightforward.
Also note that if your Squid box is plugged directly into a port on your layer-4 switch, with that port on its own VLAN and its own subnet, then if the port goes down or the address becomes uncontactable, the switch will automatically bypass the ip-policies and forward your web requests through the normal means. This is handy, might I add.
Interception Caching packet redirection with ACC Tigris digital access server
This section covers configuring interception proxying on an ACC Tigris digital access server (similar to a Cisco 5200/5300 or an Ascend MAX 4000). I've found that doing this in the NAS reduces traffic on the LAN and reduces processing load on the Cisco router. The Tigris has ample CPU for filtering.
Step 1 is to create filters that allow local traffic to pass. Add as many as needed for all of your address ranges.
ADD PROFILE IP FILTER ENTRY local1 INPUT 10.0.3.0 255.255.255.0 0.0.0.0 0.0.0.0 NORMAL
ADD PROFILE IP FILTER ENTRY local2 INPUT 10.0.4.0 255.255.255.0 0.0.0.0 0.0.0.0 NORMAL
Step 2 is to create a filter to trap port 80 traffic.
ADD PROFILE IP FILTER ENTRY http INPUT 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 = 0x6 D= 80 NORMAL
Step 3 is to set the "APPLICATION_ID" on port 80 traffic to 80. This causes all packets matching this filter to have ID 80 instead of the default ID of 0.
SET PROFILE IP FILTER APPLICATION_ID http 80
Step 4 is to create a special route that is used for packets with "APPLICATION_ID" set to 80. The routing engine uses the ID to select which routes to use.
ADD IP ROUTE ENTRY 0.0.0.0 0.0.0.0 PROXY-IP 1
SET IP ROUTE APPLICATION_ID 0.0.0.0 0.0.0.0 PROXY-IP 80
Step 5 is to bind everything to a filter ID called transproxy. List all local filters first and the http one last.
ADD PROFILE ENTRY transproxy local1 local2 http
With this in place use your RADIUS server to send back the "Framed-Filter-Id = transproxy" key/value pair to the NAS.
You can check if the filter is being assigned to logins with the following command:
display profile port table
WCCP - Web Cache Coordination Protocol
Contributors: Glenn Chisholm, Lincoln Dale and Reuben Farrelly.
WCCP is a very common and indeed a good way of doing Interception Caching, as it adds additional features and intelligence to the traffic redirection process. WCCP is a dynamic service in which a cache engine communicates to a router about its status, and based on that the router decides whether or not to redirect the traffic. This means that if your cache becomes unavailable, the router will automatically stop attempting to forward traffic to it, and end users will not be affected (and likely not even notice that your cache is out of service).
WCCPv1 is documented in the Internet-Draft draft-forster-wrec-wccp-v1-00.txt and WCCPv2 is documented in draft-wilson-wrec-wccp-v2-00.txt.
For WCCP to work, you firstly need to configure your Squid Cache, and additionally configure the host OS to redirect the HTTP traffic from port 80 to whatever port your Squid box is listening to the traffic on. Once you have done this you can then proceed to configure WCCP on your router.
Does Squid support WCCP?
Cisco's Web Cache Coordination Protocol V1.0 is supported in all current versions of Squid. WCCPv2 is supported by Squid-2.6 and later.
Do I need a cisco router to run WCCP?
No. Originally WCCP support could only be found on cisco devices, but some other network vendors now support WCCP as well. If you have any information on how to configure non-cisco devices, please post this here.
Can I run WCCP with the Windows port of Squid?
Technically it may be possible, but we have not heard of anyone doing so. The easiest way would be to use a Layer 3 switch and doing Layer 2 MAC rewriting to send the traffic to your cache. If you are using a router then you will need to find out a way to decapsulate the GRE/WCCP traffic that the router sends to your Windows cache (this is a function of your OS, not Squid).
Where can I find out more about WCCP?
Cisco have some good content on their website about WCCP. One of the better documents, which lists the features and describes how to configure WCCP on their routers, can be found on their website.
There is also a more technical document which describes the format of the WCCP packets, available from Colasoft.
Cisco router software required for WCCP
This depends on whether you are running a switch or a router.
IOS support in Cisco Routers
Almost all Cisco routers support WCCP provided you are running IOS release 12.0 or above, however some routers running older software require an upgrade to their software feature sets to a 'PLUS' featureset or better. WCCPv2 is supported on almost all routers in recent IPBASE releases.
Cisco's Feature Navigator at http://www.cisco.com/go/fn maintains an up-to-date list of which platforms support WCCPv2.
Generally you should run the latest release train of IOS for your router that you can. We do not recommend you run T or branch releases unless you have fully tested them out in a test environment before deployment as WCCP requires many parts of IOS to work reliably. The latest mainline 12.1, 12.2, 12.3 and 12.4 releases are generally the best ones to use and should be the most trouble free.
Note that you will need to set up a GRE or WCCP tunnel on your cache to decapsulate the packets your router sends to it.
IOS support in Cisco Switches
High end Cisco switches support Layer 2 WCCPv2, which means that instead of a GRE tunnel transport, the ethernet frames have their next hop/destination MAC address rewritten to that of your cache engine. This is far faster to process by the hardware than the router/GRE method of redirection, and in fact on some platforms such as the 6500s may be the only way WCCP can be configured. L2 redirection is supposedly capable of redirecting in excess of 30 million PPS on the high end 6500 Sup cards.
Cisco switches known to be able to do WCCPv2 include the Catalyst 3550 (very basic WCCP only), Catalyst 4500-SUP2 and above, and all models of the 6000/6500.
Note that the Catalyst 2900, 3560, 3750 and 4000 early Supervisors do NOT support WCCP (at all).
Layer 2 WCCP is a WCCPv2 feature and does not exist in cisco's WCCPv1 implementation.
WCCPv2 Layer 2 redirection was added in 12.1E and 12.2S.
It is always advisable to read the release notes for the version of software you are running on your switch before you deploy WCCP.
Software support in Cisco Firewalls (PIX OS)
Version 7.2(1) of the cisco PIX software now also supports WCCP, allowing you to do WCCP redirection with this appliance rather than having to have a router do the redirection.
7.2(1) has been tested and verified to work with Squid-2.6.
What about WCCPv2?
WCCPv2 is a new feature to Squid-2.6 and Squid-3.0. WCCPv2 configuration is similar to the WCCPv1 configuration. The directives in squid.conf are slightly different but are well documented within that file. Router configuration for WCCPv2 is identical except that you must not force the router to use WCCPv1 (it defaults to WCCPv2 unless you tell it otherwise).
Configuring your router
There are two different methods of configuring WCCP on Cisco routers. The first method is for routers that only support V1.0 of the protocol. The second is for routers that support both.
IOS Version 11.x
For very old versions of IOS you will need this config:
conf t
wccp enable
!
interface [Interface carrying Outgoing Traffic]x/x
!
ip wccp web-cache redirect
!
CTRL Z
copy running-config startup-config
IOS Version 12.x
Some of the early versions of 12.x do not have the 'ip wccp version' command. You will need to upgrade your IOS version to use V1.0.
conf t
ip wccp version 1
ip wccp web-cache redirect-list 150
!
interface [Interface carrying Outgoing/Incoming Traffic]x/x
ip wccp web-cache redirect out|in
!
CTRL Z
copy running-config startup-config
IOS defaults to using WCCP version 2 if you do not explicitly specify a version.
Replace 150 with an access list number (either standard or extended) which lists IP addresses which you do not wish to be transparently redirected to your cache. If you wish to redirect all client traffic then do not add the ip wccp web-cache redirect-list command.
WCCP is smart enough that it will automatically bypass your cache from the redirection process, ensuring that your cache does not become redirected back to itself.
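For illustration, a redirect-list might look like the following sketch (192.168.50.0/24 is a hypothetical subnet of clients that should bypass the cache; adjust to your own addressing):

```
! hypothetical: clients in 192.168.50.0/24 are NOT redirected to the cache
access-list 150 deny ip 192.168.50.0 0.0.0.255 any
! redirect everyone else
access-list 150 permit ip any any
```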
IOS 12.x problems
Some people report problems with WCCP and IOS 12.x.
If you find that the redirection does not work properly, try turning off CEF and disabling the route-cache on the interface. WCCP has a nasty habit of sometimes interacting badly with other cisco features. Note that disabling either feature carries a significant performance penalty, so only do so if there is no other way.
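As a sketch, assuming the clients are reached via FastEthernet0/0 (a hypothetical interface name), the relevant commands look like:

```
! disable CEF globally (carries a significant performance penalty)
no ip cef
! disable the fast-switching route cache on the client-facing interface
interface FastEthernet0/0
 no ip route-cache
```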
IOS firewall inspection can also cause problems with WCCP and is worth disabling if you experience problems.
Configuring your cisco PIX to run WCCP
The Cisco PIX is very easy to configure. The configuration format is almost identical to that of a cisco router, which is hardly surprising given that many of the features are common to both. Like cisco routers, the PIX supports the GRE encapsulation method of traffic redirection.
Merely put this in your global config:
wccp web-cache
wccp interface inside web-cache redirect in
There is no interface specific configuration required.
Note that the only supported configuration of WCCP on the PIX is with the WCCP cache engine on the inside of the network (most people want this anyway). The PIX only supports WCCPv2 and not WCCPv1. There are some other limitations of this WCCP support, but this feature has been tested and proven to work with a simple PIX config using version 7.2(1) and Squid-2.6.
You can find more information about configuring this and how the PIX handles WCCP at http://www.cisco.com/en/US/customer/products/ps6120/products_configuration_guide_chapter09186a0080636f31.html#wp1094445
Cache/Host configuration of WCCP
There are two parts to this. Firstly you need to configure Squid to talk WCCP, and additionally you need to configure your operating system to decapsulate the WCCP traffic as it comes from the router.
Configuring Squid to talk WCCP
The configuration directives for this are well documented in squid.conf.
For Squid-2.5 which supports only WCCPv1, you need these directives:
wccp_router a.b.c.d
wccp_version 4
wccp_incoming_address e.f.g.h
wccp_outgoing_address e.f.g.h
- a.b.c.d is the address of your WCCP router
- e.f.g.h is the address that you want your WCCP requests to come and go from. If you are not sure or have only a single IP address on your cache, do not specify these.
For Squid-2.6 and Squid-3.0:
Note: do NOT configure both the WCCPv1 directives (wccp_*) and WCCPv2 (wccp2_*) options at the same time in your squid.conf. Squid 2.6 and above only supports configuration of one version at a time, either WCCPv1 or WCCPv2. With no configuration, the unconfigured version(s) are not enabled. Unpredictable things might happen if you configure both sets of options.
If you are doing WCCPv1, then the configuration is the same as for Squid-2.5. If you wish to run WCCPv2, then you will want something like this:
wccp2_router a.b.c.d
wccp2_version 4
wccp2_forwarding_method 1
wccp2_return_method 1
wccp2_service standard 0
wccp2_outgoing_address e.f.g.h
Use a wccp2_forwarding_method and wccp2_return_method of 1 if you are using a router and GRE/WCCP tunnel, or 2 if you are using a Layer 3 switch to do the forwarding.
Your wccp2_service should be set to standard 0 which is the standard HTTP redirection.
- a.b.c.d is the address of your WCCP router
- e.f.g.h is the address that you want your WCCP requests to come and go from. If you are not sure or have only a single IP address on your cache, do not specify these parameters as they are usually not needed.
Now you need to read on for the details of configuring your operating system to support WCCP.
Configuring FreeBSD
FreeBSD first needs to be configured to receive and strip the GRE encapsulation from the packets from the router. To do this you will need to patch and recompile your kernel. The steps depend on your kernel version.
FreeBSD-3.x
Apply the patch for FreeBSD-3.x kernels:
# cd /usr/src
# patch -s < /tmp/gre.patch
Download gre.c for FreeBSD-3.x. Save this file as /usr/src/sys/netinet/gre.c.
Add "options GRE" to your kernel config file and rebuild your kernel. Note, the opt_gre.h file is created when you run config. Once your kernel is installed you will need to configure FreeBSD for interception proxying (see below).
FreeBSD 4.0 through 4.7
The procedure is nearly identical to the above for 3.x, but the source files are a little different.
Apply the most appropriate patch file from the list of patches for 4.x kernels.
Download gre.c for FreeBSD-4.x. Save this file as /usr/src/sys/netinet/gre.c.
Add "options GRE" to your kernel config file and rebuild your kernel. Note, the opt_gre.h file is created when you run config. Once your kernel is installed you will need to configure FreeBSD for interception proxying.
FreeBSD 4.8 and later
The operating system now comes standard with some GRE support. You need to make a kernel with the GRE code enabled:
pseudo-device gre
And then configure the tunnel so that the router's GRE packets are accepted:
# ifconfig gre0 create
# ifconfig gre0 $squid_ip $router_ip netmask 255.255.255.255 up
# ifconfig gre0 tunnel $squid_ip $router_ip
# route delete $router_ip
Alternatively, you can try it like this:
ifconfig gre0 create
ifconfig gre0 $squid_ip 10.20.30.40 netmask 255.255.255.255 link1 tunnel $squid_ip $router_ip up
Since the WCCP/GRE tunnel is one-way, Squid never sends any packets to 10.20.30.40 and that particular address doesn't matter.
FreeBSD 6.x and later
FreeBSD 6.x has GRE support in the kernel by default. It also supports both WCCPv1 and WCCPv2. From the gre(4) manpage: "Since there is no reliable way to distinguish between WCCP versions, it should be configured manually using the link2 flag. If the link2 flag is not set (default), then WCCP version 1 is selected." The rest of the configuration is the same as for 4.8 and later.
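A minimal sketch of the 6.x setup using the link2 flag, assuming $squid_ip and $router_ip as before and that your router is speaking WCCPv2:

```
# ifconfig gre0 create
# ifconfig gre0 $squid_ip $router_ip netmask 255.255.255.255 link2 tunnel $squid_ip $router_ip up
```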
Standard Linux GRE Tunnel
Linux 2.2 kernels already support GRE, as long as the GRE module is compiled into the kernel. However, WCCP uses a slightly non-standard GRE encapsulation format, and Linux versions earlier than 2.6.9 may need to be patched to support WCCP. That is why we strongly recommend you run a recent version of the Linux kernel; if you do, you simply need to modprobe the module to gain its functionality.
Ensure that the GRE code is either built statically or as a module by choosing the appropriate option in your kernel config. Then rebuild your kernel. If it is a module you will need to:
modprobe ip_gre
The next step is to tell Linux to establish an IP tunnel between the router and your host.
ip tunnel add wccp0 mode gre remote <Router-External-IP> local <Host-IP> dev <interface>
ip addr add <Host-IP>/32 dev wccp0
ip link set wccp0 up
or if using the older network tools
iptunnel add wccp0 mode gre remote <Router-External-IP> local <Host-IP> dev <interface>
ifconfig wccp0 <Host-IP> netmask 255.255.255.255 up
<Router-External-IP> is the external IP address of your router that is intercepting the HTTP packets. <Host-IP> is the IP address of your cache, and <interface> is the network interface that receives those packets (probably eth0).
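Filled in with hypothetical addresses (router 203.0.113.1, cache 192.168.1.10, packets arriving on eth0), the tunnel setup becomes:

```
ip tunnel add wccp0 mode gre remote 203.0.113.1 local 192.168.1.10 dev eth0
ip addr add 192.168.1.10/32 dev wccp0
ip link set wccp0 up
```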
Note that WCCP is incompatible with the rp_filter function in Linux, and you must disable this if it is enabled. If it is, any packets redirected by WCCP and intercepted by Netfilter/iptables will be silently discarded by the TCP/IP stack due to their "unexpected" origin from the gre interface.
echo 0 >/proc/sys/net/ipv4/conf/wccp0/rp_filter
And then you need to tell the Linux NAT kernel to redirect incoming traffic on the wccp0 interface to Squid
iptables -t nat -A PREROUTING -i wccp0 -p tcp --dport 80 -j REDIRECT --to-ports 3128
WCCP Specific Module
This module is not part of the standard Linux distribution. It needs to be compiled as a module and loaded on your system to function. Do not attempt to build this in as a static part of your kernel.
This module is most suited to Linux kernels prior to 2.6.9. Kernels more recent than that support WCCP with the ip_gre module that comes with the kernel.
Download the Linux WCCP module and compile it as you would any Linux network module. In most cases this is just to run make install in the module source directory. Note: Compiling kernel modules requires the kernel development files to be installed.
Finally you will need to load the module:
modprobe ip_wccp
If the WCCP-redirected traffic arrives on a different interface than the one where return traffic to the clients is sent, then you may also need to disable the rp_filter function. If it is enabled, any packets redirected by WCCP will be silently discarded by the TCP/IP stack due to their "unexpected" origin from the other interface.
echo 0 >/proc/sys/net/ipv4/conf/eth0/rp_filter
And finally set up Netfilter/iptables to redirect the intercepted traffic to your Squid port
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-ports 3128
TProxy Interception
TProxy is a new feature in Squid-2.6 which enhances standard Interception Caching so that it further hides the presence of your cache. Normally with Interception Caching the remote server sees your cache engine as the source of the HTTP request. TProxy takes this a step further by hiding your cache engine so that the end client is seen as the source of the request (even though really they aren't).
Here are some notes by StevenWilton on how to get TProxy working properly:
I've got TProxy + WCCPv2 working with squid 2.6. There are a few things that need to be done:
- The kernel and iptables need to be patched with the tproxy patches (and the tproxy include file needs to be placed in /usr/include/linux/netfilter_ipv4/ip_tproxy.h or include/netfilter_ipv4/ip_tproxy.h in the squid src tree).
- The iptables rule needs to use the TPROXY target (instead of the REDIRECT target) to redirect the port 80 traffic to the proxy. ie:
iptables -t tproxy -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j TPROXY --on-port 80
- The kernel must strip the GRE header from the incoming packets (either using the ip_wccp module, or by having a GRE tunnel set up in Linux pointing at the router (no GRE setup is required on the router)).
- Two WCCP services must be used, one for outgoing traffic and an inverse for return traffic from the Internet. We use the following WCCP definitions in squid.conf:
wccp2_service dynamic 80
wccp2_service_info 80 protocol=tcp flags=src_ip_hash priority=240 ports=80
wccp2_service dynamic 90
wccp2_service_info 90 protocol=tcp flags=dst_ip_hash,ports_source priority=240 ports=80
It is highly recommended that the above definitions be used for the two WCCP services, otherwise things will break if you have more than one cache (specifically, you will have problems when the name of a web server resolves to multiple ip addresses).
- The http port that you are redirecting to must have the transparent and tproxy options enabled, as follows (modify the port as appropriate):
http_port 80 transparent tproxy
- There must be a tcp_outgoing_address defined. This will need to be valid to satisfy any non-tproxied connections.
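For example, assuming 192.168.1.10 is a hypothetical routable address on the cache:

```
tcp_outgoing_address 192.168.1.10
```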
On the router, you need to make sure that all traffic going to/from the customer will be processed by both WCCP rules. The way we have implemented this is to apply WCCP service 80 to all traffic coming in from a customer-facing interface, and WCCP service 90 to all traffic going out a customer-facing interface. We have also applied the WCCP exclude-in rule to all traffic coming in from the proxy-facing interface, although this will probably not be necessary if all your caches have registered to the WCCP router. ie:
interface GigabitEthernet0/3.100
 description ADSL customers
 encapsulation dot1Q 502
 ip address x.x.x.x y.y.y.y
 ip wccp 80 redirect in
 ip wccp 90 redirect out
interface GigabitEthernet0/3.101
 description Dialup customers
 encapsulation dot1Q 502
 ip address x.x.x.x y.y.y.y
 ip wccp 80 redirect in
 ip wccp 90 redirect out
interface GigabitEthernet0/3.102
 description proxy servers
 encapsulation dot1Q 506
 ip address x.x.x.x y.y.y.y
 ip wccp redirect exclude in
- It's highly recommended to turn httpd_accel_no_pmtu_disc on in the squid.conf.
The homepage for the TProxy software is at balabit.com.
Complete
By now if you have followed the documentation you should have a working Interception Caching system. Verify this by unconfiguring any proxy settings in your browser and surfing out through your system. You should see entries appearing in your access.log for the sites you are visiting in your browser. If your system does not work as you would expect, you will want to read on to our troubleshooting section below.
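One quick way to watch the requests arrive, assuming the default log location (your squid.conf may put it elsewhere):

```
tail -f /var/log/squid/access.log
```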
Troubleshooting and Questions
It doesn't work. How do I debug it?
- Start by testing your cache. Check to make sure you have configured Squid with the right configure options - squid -v will tell you what options Squid was configured with.
- Can you manually configure your browser to talk to the proxy port? If not, you most likely have a proxy configuration problem.
- Have you tried unloading ALL firewall rules on your cache and/or the inside address of your network device to see if that helps? If your router or cache are inadvertently blocking or dropping either the WCCP control traffic or the GRE, things won't work.
- If you are using WCCP on a cisco router or switch, is the router seeing your cache? Use the command show ip wccp web-cache detail
- Look in your logs both in Squid (cache.log), and on your router/switch where a show log will likely tell you if it has detected your cache engine registering.
- On your Squid cache, set debug_options ALL,1 80,3 or for even more detail debug_options ALL,1 80,5. The output of this will be in your cache.log.
- On your cisco router, turn on WCCP debugging:
router#term mon
router#debug ip wccp events
WCCP events debugging is on
router#debug ip wccp packets
WCCP packet info debugging is on
router#
Do not forget to turn this off after you have finished your debugging session, as it imposes a performance hit on your router.
- Run tcpdump or ethereal on your cache interface and look at the traffic, try and figure out what is going on. You should be seeing UDP packets to and from port 2048 and GRE encapsulated traffic with TCP inside it. If you are seeing messages about "protocol not supported" or "invalid protocol", then your GRE or WCCP module is not loaded, and your cache is rejecting the traffic because it does not know what to do with it.
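A capture along those lines, assuming your cache interface is eth0 (IP protocol 47 is GRE):

```
tcpdump -n -i eth0 'udp port 2048 or ip proto 47'
```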
- Have you configured both wccp_ and wccp2_ options? You should only configure one or the other and NOT BOTH.
- The most common problem people have is that the router and cache are talking to each other and traffic is being redirected from the router but the traffic decapsulation process is either broken or (as is almost always the case) misconfigured. This is often a case of your traffic rewriting rules on your cache not being applied correctly (see section 2 above - Getting your traffic to the right port on your Squid Cache).
- Run the most recent General Deployment (GD) release of the software train you have on your router or switch. Broken IOS releases can also result in broken redirection. A known good version of IOS for routers with no apparent WCCP breakage is 12.3(7)T12. There was extensive damage to WCCP in 12.3(8)T up to and including early 12.4(x) releases. 12.4(8) is known to work fine as long as you are not doing IP firewall inspection on the interface where your cache is located.
If none of these steps yield any useful clues, post the vital information including the versions of your router, proxy, operating system, your traffic redirection rules, debugging output and any other things you have tried to the squid-users mailing list.
Why can't I use authentication together with interception proxying?
Interception Proxying works by having an active agent (the proxy) where there should be none. The browser is not expecting it to be there, and is for all intents and purposes being cheated or, at best, confused. As a user of that browser, I would require it not to give away any credentials to an unexpected party; wouldn't you agree? Especially so when the user-agent can do so without notifying the user, as Microsoft browsers can when the proxy offers any of the Microsoft-designed authentication schemes such as NTLM (see ../ProxyAuthentication and NegotiateAuthentication).
In other words, it's not a squid bug, but a browser security feature.
Can I use ''proxy_auth'' with interception?
No, you cannot. See the answer to the previous question. With interception proxying, the client thinks it is talking to an origin server and would never send the Proxy-authorization request header.
"Connection reset by peer" and Cisco policy routing
Fyodor has tracked down the cause of unusual "connection reset by peer" messages when using Cisco policy routing to hijack HTTP requests.
When the network link between router and the cache goes down for just a moment, the packets that are supposed to be redirected are instead sent out the default route. If this happens, a TCP ACK from the client host may be sent to the origin server, instead of being diverted to the cache. The origin server, upon receiving an unexpected ACK packet, sends a TCP RESET back to the client, which aborts the client's request.
To work around this problem, you can install a static route to the null0 interface for the cache address with a higher metric (lower precedence), such as 250.
Then, when the link goes down, packets from the client just get dropped instead of sent out the default route. For example, if 1.2.3.4 is the IP address of your Squid cache, you may add:
ip route 1.2.3.4 255.255.255.255 Null0 250
This appears to cause the correct behaviour.
Configuration Examples contributed by users who have working installations
Linux 2.0.33 and Cisco policy-routing
By Brian Feeny
Here is how I have Interception proxying working for me, in an environment where my router is a Cisco 2501 running IOS 11.1, and Squid machine is running Linux 2.0.33.
Many thanks to the following individuals and the squid-users list for helping me get redirection and interception proxying working on my Cisco/Linux box.
- Lincoln Dale
- Riccardo Vratogna
- Mark White
First, here is what I added to my Cisco, which is running IOS 11.1. In IOS 11.1 the route-map command is "process switched" as opposed to the faster "fast-switched" route-map which is found in IOS 11.2 and later. Even more recent versions CEF switch for much better performance.
!
interface Ethernet0
 description To Office Ethernet
 ip address 208.206.76.1 255.255.255.0
 no ip directed-broadcast
 no ip mroute-cache
 ip policy route-map proxy-redir
!
access-list 110 deny tcp host 208.206.76.44 any eq www
access-list 110 permit tcp any any eq www
route-map proxy-redir permit 10
 match ip address 110
 set ip next-hop 208.206.76.44
As you can see above, I added the "route-map" declaration and an access list, then enabled the route map on interface e0 with "ip policy route-map proxy-redir". The host above, 208.206.76.44, is the IP address of my Squid host.
My squid box runs Linux, so I had to configure my kernel (2.0.33) like this:
#
# Networking options
#
CONFIG_FIREWALL=y
# CONFIG_NET_ALIAS is not set
CONFIG_INET=y
CONFIG_IP_FORWARD=y
CONFIG_IP_MULTICAST=y
CONFIG_SYN_COOKIES=y
# CONFIG_RST_COOKIES is not set
CONFIG_IP_FIREWALL=y
# CONFIG_IP_FIREWALL_VERBOSE is not set
CONFIG_IP_MASQUERADE=y
# CONFIG_IP_MASQUERADE_IPAUTOFW is not set
CONFIG_IP_MASQUERADE_ICMP=y
CONFIG_IP_TRANSPARENT_PROXY=y
CONFIG_IP_ALWAYS_DEFRAG=y
# CONFIG_IP_ACCT is not set
CONFIG_IP_ROUTER=y
You will need Firewalling and Transparent Proxy turned on at a minimum.
Then some ipfwadm stuff:
# Accept all on loopback
ipfwadm -I -a accept -W lo
# Accept my own IP, to prevent loops (repeat for each interface/alias)
ipfwadm -I -a accept -P tcp -D 208.206.76.44 80
# Send all traffic destined to port 80 to Squid on port 3128
ipfwadm -I -a accept -P tcp -D 0/0 80 -r 3128
This accepts packets on port 80 (redirected from the Cisco) and redirects them to port 3128, where my Squid process listens. I put all of this in /etc/rc.d/rc.local.
I am using [/Versions/1.1/1.1.20/ v1.1.20 of Squid] with Henrik's patch installed.
You will want to install this patch if using a setup similar to mine.
Interception on Linux with Squid and the Browser on the same box
by Joshua N Pritikin
#!/bin/sh
iptables -t nat -F    # clear table

# normal transparent proxy
iptables -t nat -A PREROUTING -p tcp -i eth0 --dport 80 -j REDIRECT --to-port 8080

# handle connections on the same box (192.168.0.2 is a loopback instance)
gid=`id -g proxy`
iptables -t nat -A OUTPUT -p tcp --dport 80 -m owner --gid-owner $gid -j ACCEPT
iptables -t nat -A OUTPUT -p tcp --dport 80 -j DNAT --to-destination 192.168.0.2:8080
Interception Caching with FreeBSD by DuaneWessels
I set out yesterday to make interception caching work with Squid-2 and FreeBSD. It was, uh, fun.
It was relatively easy to configure a cisco to divert port 80 packets to my FreeBSD box. Configuration goes something like this:
access-list 110 deny tcp host 10.0.3.22 any eq www
access-list 110 permit tcp any any eq www
route-map proxy-redirect permit 10
 match ip address 110
 set ip next-hop 10.0.3.22
int eth2/0
 ip policy route-map proxy-redirect
Here, 10.0.3.22 is the IP address of the FreeBSD cache machine.
Once I have packets going to the FreeBSD box, I need to get the kernel to deliver them to Squid. I started on FreeBSD-2.2.7 and downloaded IPFilter. This was a dead end for me. The IPFilter distribution includes patches to the FreeBSD kernel sources, but many of these had conflicts. Then I noticed that the IPFilter page says "It comes as a part of [FreeBSD-2.2 and later]." Fair enough. Unfortunately, you can't hijack connections with the FreeBSD-2.2.X IPFIREWALL code (ipfw), and you can't (or at least I couldn't) do it with natd either.
FreeBSD-3.0 has much better support for connection hijacking, so I suggest you start with that. You need to build a kernel with the following options:
options IPFIREWALL
options IPFIREWALL_FORWARD
Next, it's time to configure the IP firewall rules with ipfw. By default, there are no "allow" rules and all packets are denied. I added this command to /etc/rc.local just to be able to use the machine on my network:
ipfw add 60000 allow all from any to any
But we're still not hijacking connections. To accomplish that, add these rules:
ipfw add 49 allow tcp from 10.0.3.22 to any
ipfw add 50 fwd 127.0.0.1 tcp from any to any 80
The second line (rule 50) is the one which hijacks the connection. The first line ensures we never hit rule 50 for traffic originating from the local machine.
This prevents forwarding loops.
Note that I am not changing the port number here. That is, port 80 packets are simply diverted to Squid on port 80. My Squid configuration is:
http_port 80
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
If you don't want Squid to listen on port 80 (because that requires root privileges) then you can use another port. In that case your ipfw redirect rule looks like:
ipfw add 50 fwd 127.0.0.1,3128 tcp from any to any 80
and the squid.conf lines are:
http_port 3128
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
Interception Caching with Linux 2.6.18, ip_gre, Squid-2.6 and cisco IOS 12.4(6)T2 by ReubenFarrelly
Here's how I do it. My system is a Fedora Core 5 based system, and I am presently running Squid-2.6 with WCCPv2. The cache is located on the same subnet as my router and client PC's.
My Squid proxy is configured like this:
- In /etc/sysconfig/iptables:
-A PREROUTING -s 192.168.0.0/255.255.255.0 -d ! 192.168.0.0/255.255.255.0 -i gre0 -p tcp -m tcp --dport 80 -j DNAT --to-destination 192.168.0.5:3128
- In /etc/sysctl.conf
# Controls IP packet forwarding
net.ipv4.ip_forward = 1

# Controls source route verification
net.ipv4.conf.default.rp_filter = 0

# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0
- In /etc/sysconfig/network-scripts/ifcfg-gre0 I have this:
DEVICE=gre0
BOOTPROTO=static
IPADDR=172.16.1.6
NETMASK=255.255.255.252
ONBOOT=yes
IPV6INIT=no
By configuring the interface like this, it automatically comes up at boot, and the module is loaded automatically. I can additionally ifup or ifdown the interface at will. This is the standard Fedora way of configuring a GRE interface.
- I build customised kernels for my hardware, so I have this set in my kernel .config:
CONFIG_NET_IPGRE=m
However you can optionally build the GRE tunnel into your kernel by selecting 'y' instead.
My router runs cisco IOS 12.4(6)T2 ADVSECURITY, and I have a sub-interface on my FastEthernet port as the switch-router link is a trunk:
!
ip wccp web-cache
ip cef
!
interface FastEthernet0/0.2
 description Link to internal LAN
 encapsulation dot1Q 2
 ip address 192.168.0.1 255.255.255.0
 ip access-group outboundfilters in
 no ip proxy-arp
 ip wccp web-cache redirect in
 ip inspect fw-rules in
 ip nat inside
 ip virtual-reassembly
 no snmp trap link-status
!
Note: in the IOS releases I have run (12.4(6)T2 and 12.4(9)T) you MUST NOT have ip inspect fw-rules in on the same interface as your ip wccp web-cache redirect statement. I opened a TAC case on this as it is clearly a bug and a regression from past behaviour, where WCCP worked fine with IP inspection configured on the same interface. This was confirmed as a bug in IOS, documented as CSCse55959. The cause is TCP fragments being dropped by the IP inspection process - fragments which should not even be inspected in the first place. This bug does not occur on the PIX, which works fine with the same network design and configuration. If you would like this bug fixed, please open a Cisco TAC case referencing this bug report and encourage Cisco to fix it.
If you are running WCCPv1 then you would additionally add:
ip wccp version 1
to your router configuration.
What does it all look like?
- iptables rules looks like this:
[root@tornado squid]# iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source              destination
DNAT       tcp  --  network.reub.net/24 !network.reub.net/24  tcp dpt:http to:192.168.0.5:3128
- my squid.conf looks like this:
http_port tornado.reub.net:3128 transparent
wccp2_router router.reub.net
wccp2_forwarding_method 1
wccp2_return_method 1
wccp2_service standard 0
- my operating system runs a GRE tunnel which looks like this:
[root@tornado squid]# ifconfig gre0
gre0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:172.16.1.6  Mask:255.255.255.252
          UP RUNNING NOARP  MTU:1476  Metric:1
          RX packets:449 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:20917 (20.4 KiB)  TX bytes:0 (0.0 b)
- my router sees the cache engine, and tells me how much traffic it has switched through to the cache:
router#show ip wccp web-cache
Global WCCP information:
    Router information:
        Router Identifier:                 172.16.1.5
        Protocol Version:                  2.0

    Service Identifier: web-cache
        Number of Service Group Clients:   1
        Number of Service Group Routers:   1
        Total Packets s/w Redirected:      1809
          Process:                         203
          Fast:                            1606
          CEF:                             0
        Redirect access-list:              -none-
        Total Packets Denied Redirect:     0
        Total Packets Unassigned:          0
        Group access-list:                 -none-
        Total Messages Denied to Group:    0
        Total Authentication failures:     0
        Total Bypassed Packets Received:   0
router#
router#show ip wccp web-cache detail
WCCP Client information:
        WCCP Client ID:          192.168.0.5
        Protocol Version:        2.0
        State:                   Usable
        Initial Hash Info:       00000000000000000000000000000000
                                 00000000000000000000000000000000
        Assigned Hash Info:      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
                                 FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
        Hash Allotment:          256 (100.00%)
        Packets s/w Redirected:  449
        Connect Time:            13:51:42
        Bypassed Packets
          Process:               0
          Fast:                  0
          CEF:                   0
router#
Joe Cooper's Patch
Joe Cooper has a patch for Linux 2.2.18 kernel on his Squid page.
Further information about configuring Interception Caching with Squid
ReubenFarrelly has written a fairly comprehensive but somewhat incomplete guide to configuring WCCP with cisco routers on his website. You can find it at www.reub.net.
DuaneWessels has written an O'Reilly book about Web Caching which is an invaluable reference guide for Squid (and in fact non-Squid) cache administrators. A sample chapter on "Interception Proxying and Caching" from his book is up online, at http://www.oreilly.com/catalog/webcaching/chapter/ch05.html.
Configuring Other Operating Systems
If you have managed to configure your operating system to support WCCP with Squid please contact us or add the details to this wiki so that others may benefit.
Issues with HotMail
Recent changes at Hotmail.com have led to some users receiving a blank page in response to a login request when browsing through a proxy operating in interception (transparent) mode. This is due to Hotmail incorrectly sending a Transfer-Encoding encoded response when the HTTP/1.0 request has an Accept-Encoding header. (Transfer-Encoding absolutely REQUIRES HTTP/1.1 and is forbidden within HTTP/1.0.)
A workaround is simply to add the following lines to /etc/squid/squid.conf:
acl hotmail_domains dstdomain .hotmail.msn.com
header_access Accept-Encoding deny hotmail_domains
(para-quoted by HenrikNordström from http://www.swelltech.com/news.html)
Back to the SquidFaq
Contents
Contributors: Glenn Chisholm.
Does Squid support SNMP?
Yes. You will need to configure Squid with SNMP support and edit your squid.conf file with appropriate access controls.
Enabling SNMP in Squid
To use SNMP, it must first be enabled with the configure script, and Squid rebuilt. To enable it, run the configure script with the SNMP option:
./configure --enable-snmp [ ... other configure options ]
Next, recompile after cleaning the source tree :
make clean
make all
make install
Once the compile is completed and the new binary is installed the squid.conf file needs to be configured to allow access; the default is to deny all requests.
You may also want to move the Squid mib.txt into your SNMP MIB directory so that you can view the output as text rather than raw OID numbers.
Configuring Squid
To configure SNMP first specify a list of communities that you would like to allow access by using a standard acl of the form:
acl aclname snmp_community string
For example:
acl snmppublic snmp_community public
acl snmpjoebloggs snmp_community joebloggs
This creates two ACLs with two different communities, public and joebloggs. You can name the ACLs and the community strings anything you like.
To specify the port that the agent will listen on, modify the "snmp_port" parameter; it defaults to 3401. The port to which the agent forwards requests it cannot fulfil itself is set by "forward_snmpd_port"; it defaults to off and must be configured for forwarding to work. Remember that since forwarded requests will originate from this agent, you will need to configure your access controls accordingly.
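Putting those two directives together, a minimal squid.conf sketch (the values shown are the defaults described above):

```
snmp_port 3401
# Forward queries this agent cannot answer to another SNMP agent.
# Off by default; set a port here only if you run a local snmpd.
#forward_snmpd_port 0
```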
To allow access to Squid's SNMP agent, define an snmp_access ACL with the community strings that you previously defined. For example:
snmp_access allow snmppublic localhost snmp_access deny all
The above will allow anyone on the localhost who uses the community public to access the agent. It will deny all others access.
If you do not define any snmp_access ACL's, then SNMP access is denied by default.
Finally, Squid allows you to configure the addresses that the agent will bind to for incoming and outgoing traffic. These default to 0.0.0.0; changing them will cause the agent to bind to a specific address on the host rather than all addresses.
snmp_incoming_address 0.0.0.0
snmp_outgoing_address 0.0.0.0
How can I query the Squid SNMP Agent?
You can test if your Squid supports SNMP with the snmpwalk program (snmpwalk is a part of the NET-SNMP project). Note that you have to specify the SNMP port, which in Squid defaults to 3401.
snmpwalk -m /usr/share/squid/mib.txt -v2c -c communitystring hostname:3401 .1.3.6.1.4.1.3495.1.1
If it gives output like:
enterprises.nlanr.squid.cacheSystem.cacheSysVMsize = 7970816
enterprises.nlanr.squid.cacheSystem.cacheSysStorage = 2796142
enterprises.nlanr.squid.cacheSystem.cacheUptime = Timeticks: (766299) 2:07:42.99
or
SNMPv2-SMI::enterprises.3495.1.1.1.0 = INTEGER: 460
SNMPv2-SMI::enterprises.3495.1.1.2.0 = INTEGER: 1566452
SNMPv2-SMI::enterprises.3495.1.1.3.0 = Timeticks: (584627) 1:37:26.27
then it is working ok, and you should be able to make nice statistics out of it.
For an explanation of what every string (OID) does, you should refer to the [/SNMP/ Squid SNMP web pages].
What can I use SNMP and Squid for?
There are a lot of things you can do with SNMP and Squid. It can be useful to some extent for a longer-term overview of how your proxy is doing, and it can also help with troubleshooting. For example: how is your file descriptor usage? How much does your LRU age vary over a day? These are things you can't easily monitor otherwise, short of checking the cachemgr frequently. Why not let MRTG do it for you?
How can I use SNMP with Squid?
There are a number of tools that you can use to monitor Squid via SNMP. Many people use MRTG. Another good combination is NET-SNMP plus RRDTool. You might be able to find more information at the [/SNMP/ Squid SNMP web pages] or ircache rrdtool scripts
Where can I get more information/discussion about Squid and SNMP?
There is an archive of messages from the cache-snmp@ircache.net mailing list.
Subscriptions should be sent to: cache-snmp-request@ircache.net .
Monitoring Squid with MRTG
Some people use MRTG to query Squid through its SNMP interface.
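As a sketch, an mrtg.cfg target for the Squid MIB might look like the following. The hostname and community string are illustrative; the two OIDs are the cacheSysVMsize and cacheSysStorage values shown in the snmpwalk output above, queried on Squid's SNMP port 3401. Check the mib.txt shipped with your Squid for the OIDs you actually want to graph.

```
Target[squid-sys]: 1.3.6.1.4.1.3495.1.1.1.0&1.3.6.1.4.1.3495.1.1.2.0:public@proxy.example.com:3401
Title[squid-sys]: Squid memory and storage
MaxBytes[squid-sys]: 1000000
Options[squid-sys]: gauge, nopercent
YLegend[squid-sys]: KB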
To get instruction on using MRTG with Squid please visit these pages:
Cache Monitoring - How to set up your own monitoring by DFN-Cache
Using MRTG to monitor Squid by ACME Consulting
Using MRTG for Squid monitoring Desire II caching workshop session by Matija Grabnar
How do I monitor my Squid 2 cache using MRTG by The National Janet Web Cache Service
Further examples of Squid MRTG configurations can be found here:
MRTG HOWTO Collection / Squid from MRTG
using mrtg to monitor Squid from MRTG
MRTG & Squid by Glenn Chisholm
Braindump by Joakim Recht
Monitoring Squid with Cacti
Cacti is a software tool based on the same concepts as MRTG, but with a more user-friendly interface and infrastructure. Its home is at http://www.cacti.net/. It allows you to use pre-defined templates to facilitate deployment. Templates for Squid can be found on the Cacti forums.
Contents
What are the new features in squid 2.X?
- persistent connections.
- Lower VM usage; in-transit objects are not held fully in memory.
- Totally independent swap directories.
- Customizable error texts.
- FTP supported internally; no more ftpget.
- Asynchronous disk operations (optional, requires pthreads library).
- Internal icons for FTP and gopher directories.
- snprintf() used everywhere instead of sprintf().
- SNMP
- URN support
- Routing requests based on AS numbers.
- ...and many more!
How do I configure 'ssl_proxy' now?
By default, Squid connects directly to origin servers for SSL requests. But if you must force SSL requests through a parent, first tell Squid it cannot go direct for SSL:
acl SSL method CONNECT
never_direct allow SSL
With this in place, Squid should pick one of your parents to use for SSL requests. If you want it to pick a particular parent, you must use the cache_peer_access configuration:
cache_peer parent1 parent 3128 3130
cache_peer parent2 parent 3128 3130
cache_peer_access parent2 allow !SSL
The above lines tell Squid to NOT use parent2 for SSL, so it should always use parent1.
Adding a new cache disk
Simply add your new cache_dir line to squid.conf, then run squid -z again. Squid will create swap directories on the new disk and leave the existing ones in place.
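For example, if the existing cache lives on /cache1 and the new disk is mounted on /cache2 (the paths and sizes here are illustrative):

```
cache_dir ufs /cache1 10000 16 256    # existing cache disk
cache_dir ufs /cache2 10000 16 256    # newly added disk
```

Running squid -z again creates the swap directories under /cache2 without touching those already on /cache1.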
How do I configure proxy authentication?
Authentication is handled via external processes. Arjan's proxy auth page describes how to set it up. Some simple instructions are given below as well.
- We assume you have configured an ACL entry with proxy_auth, for example:
acl foo proxy_auth REQUIRED
http_access allow foo
You will need to compile and install an external authenticator program. Most people will want to use ncsa_auth. The source for this program is included in the source distribution, in the helpers/basic_auth/NCSA directory.
% cd helpers/basic_auth/NCSA
% make
% make install
You should now have an ncsa_auth program in the <prefix>/libexec/ directory where the helpers for Squid live (usually /usr/local/squid/libexec unless overridden by configure flags). You can also select with the --enable-basic-auth-helpers=... option which helpers should be installed by default when you install Squid.
- You may need to create a password file. If you have been using proxy authentication before, you probably already have such a file. You can use Apache's htpasswd program to create one. Pick a pathname for your password file; we will assume you want to put it in the same directory as your squid.conf.
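With htpasswd the command is simply htpasswd -c passwdfile username. As a sketch of the file format itself, an NCSA-style file is just "username:crypt-hash" lines; the username, password, and path below are illustrative, and whether MD5-crypt hashes are accepted depends on your platform's crypt(), so htpasswd remains the safe choice:

```shell
# Build an NCSA-style password file by hand (illustrative values);
# htpasswd produces the same "username:hash" format.
HASH=$(openssl passwd -1 secret)       # MD5-crypt hash of the password "secret"
printf 'alice:%s\n' "$HASH" > ./passwd
cat ./passwd
```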
Configure the external authenticator in squid.conf. For ncsa_auth you need to give the pathname to the executable and the password file as an argument. For example:
auth_param basic program /usr/local/squid/libexec/ncsa_auth /usr/local/squid/etc/passwd
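A fuller auth_param block, sketched with illustrative values (only the program line is required; the other basic-scheme directives are optional tuning):

```
auth_param basic program /usr/local/squid/libexec/ncsa_auth /usr/local/squid/etc/passwd
auth_param basic children 5
auth_param basic realm Squid proxy-caching web server
auth_param basic credentialsttl 2 hours
```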
After all that, you should be able to start up Squid. If we left something out, or haven't been clear enough, please let us know ( squid-faq@squid-cache.org ).
Why does proxy-auth reject all users after upgrading from Squid-2.1 or earlier?
The ACL for proxy-authentication has changed from:
acl foo proxy_auth timeout
to:
acl foo proxy_auth username
Please update your ACL appropriately - a username of REQUIRED will permit all valid usernames. The timeout is now specified with the configuration option:
auth_param basic credentialsttl timeout
Delay Pools
by David Luyer.
Delay pools provide a way to limit the bandwidth of certain requests based on any list of criteria. The idea came from a Western Australian university who wanted to restrict student traffic costs (without affecting staff traffic, and still getting cache and local peering hits at full speed). There was some early Squid 1.0 code by Central Network Services at Murdoch University, which I then developed (at the University of Western Australia) into a much more complex patch for Squid 1.0 called "DELAY_HACK." I then tried to code it in a much cleaner style and with slightly more generic options than I personally needed, and called this "delay pools" in Squid 2. I almost completely recoded this in Squid 2.2 to provide the greater flexibility requested by people using the feature.
To enable delay pools features in Squid 2.2, you must use the --enable-delay-pools configure option before compilation.
Terminology for this FAQ entry:
- pool
- a collection of bucket groups as appropriate to a given class
- bucket group
- a group of buckets within a pool, such as the per-host bucket group, the per-network bucket group or the aggregate bucket group (the aggregate bucket group is actually a single bucket)
- bucket
- an individual delay bucket represents a traffic allocation which is replenished at a given rate (up to a given limit) and causes traffic to be delayed when empty
- class
- the class of a delay pool determines how the delay is applied, ie, whether the different client IPs are treated separately or as a group (or both)
- class 1
- a class 1 delay pool contains a single unified bucket which is used for all requests from hosts subject to the pool
- class 2
- a class 2 delay pool contains one unified bucket and 255 buckets, one for each host on an 8-bit network (IPv4 class C)
- class 3
- contains 255 buckets for the subnets in a 16-bit network, and individual buckets for every host on these networks (IPv4 class B )
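In squid.conf, the class of each pool is declared with delay_class. The three classes above can be sketched as:

```
delay_pools 3      # three pools in total
delay_class 1 1    # pool 1: single aggregate bucket
delay_class 2 2    # pool 2: aggregate plus per-host buckets
delay_class 3 3    # pool 3: aggregate, per-network, and per-host buckets
```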
Delay pools allow you to limit traffic for clients or client groups, with various features:
- can specify peer hosts which aren't affected by delay pools, ie, local peering or other 'free' traffic (with the no-delay peer option).
- delay behavior is selected by ACLs (low and high priority traffic, staff vs students or student vs authenticated student or so on).
- each group of users has a number of buckets; a bucket has an amount coming into it per second and a maximum amount it can grow to; when it reaches zero, object reads are deferred until one of the object's clients has some traffic allowance.
- any number of pools can be configured with a given class and any set of limits within the pools can be disabled, for example you might only want to use the aggregate and per-host bucket groups of class 3, not the per-network one.
This allows options such as creating a number of class 1 delay pools and allocating a certain amount of bandwidth to given object types (by using URL regular expressions or similar), and many other uses I'm sure I haven't even thought of, beyond the original fair balancing of a relatively small traffic allocation across a large number of users.
There are some limitations of delay pools:
- delay pools are incompatible with slow aborts; quick abort should be set fairly low to prevent objects being retrieved at full speed once there are no clients requesting them (as the traffic allocation is based on the current clients, and when there are no clients attached to the object there is no way to determine the traffic allocation).
- delay pools only limit the actual data transferred and do not account for overheads such as TCP headers, ICP, DNS, ICMP pings, etc.
- it is possible for one connection or a small number of connections to take all the bandwidth from a given bucket and the other connections to be starved completely, which can be a major problem if there are a number of large objects being transferred and the parameters are set in a way that a few large objects will cause all clients to be starved (potentially fixed by a currently experimental patch).
How can I limit Squid's total bandwidth to, say, 512 Kbps?
acl all src 0.0.0.0/0.0.0.0    # might already be defined
delay_pools 1
delay_class 1 1
delay_access 1 allow all
delay_parameters 1 64000/64000    # 512 kbits == 64 kbytes per second
For an explanation of these tags please see the configuration file.
The 1-second buffer (max = restore = 64 kbytes/sec) is used because only a limit was requested, with no responsiveness to bursts. If you want the pool to respond to a burst, increase the aggregate maximum to a larger value and traffic bursts will be absorbed. It is recommended that the maximum be at least twice the restore value - if there is only a single object being downloaded, the download rate can otherwise fall below the requested throughput because the bucket is not empty when it comes to be replenished.
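For instance, keeping the 512 kbps (64 kbyte/s) restore rate but allowing bursts of up to 256 kbytes would look like this (the burst size is an illustrative choice):

```
delay_parameters 1 64000/256000    # restore 64 kbytes/s, bucket max 256 kbytes
```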
How to limit a single connection to 128 Kbps?
You can not limit a single HTTP request's connection speed. You can limit individual hosts to some bandwidth rate. To limit a specific host, define an acl for that host and use the example above. To limit a group of hosts, then you must use a delay pool of class 2 or 3. For example:
acl only128kusers src 192.168.1.0/255.255.192.0
acl all src 0.0.0.0/0.0.0.0
delay_pools 1
delay_class 1 3
delay_access 1 allow only128kusers
delay_access 1 deny all
delay_parameters 1 64000/64000 -1/-1 16000/64000
For an explanation of these tags please see the configuration file.
The above gives a solution where a cache is given a total of 512kbits to operate in, and each IP address gets only 128kbits out of that pool.
How do you personally use delay pools?
We have six local cache peers, all with the options 'proxy-only no-delay' since they are fast machines connected via a fast ethernet and microwave (ATM) network.
For our local access we use a dstdomain ACL, and for delay pool exceptions we use a dst ACL as well since the delay pool ACL processing is done using "fast lookups", which means (among other things) it won't wait for a DNS lookup if it would need one.
Our proxy has two virtual interfaces, one which requires student authentication to connect from machines where a department is not paying for traffic, and one which uses delay pools. Also, users of the main Unix system are allowed to choose slow or fast traffic, but must pay for any traffic they do using the fast cache. Ident lookups are disabled for accesses through the slow cache since they aren't needed. Slow accesses are delayed using a class 3 delay pool to give fairness between departments as well as between users. We recognize users of Lynx on the main host are grouped together in one delay bucket but they are mostly viewing text pages anyway, so this isn't considered a serious problem. If it was we could take those hosts into a class 1 delay pool and give it a larger allocation.
I prefer using a slow restore rate and a large maximum rate to give preference to people who are browsing web pages: their individual bucket fills while they read, while those downloading large objects are disadvantaged. Which approach is right depends on which clients you believe are more important. Also, one individual 8-bit network (a residential college) has paid extra to get more bandwidth.
The relevant parts of my configuration file are (IP addresses, etc, all changed):
# ACL definitions
# Local network definitions, domains a.net, b.net
acl LOCAL-NET dstdomain a.net b.net
# Local network; nets 64 - 127. Also nearby network class A, 10.
acl LOCAL-IP dst 192.168.64.0/255.255.192.0 10.0.0.0/255.0.0.0
# Virtual i/f used for slow access
acl virtual_slowcache myip 192.168.100.13/255.255.255.255
# All permitted slow access, nets 96 - 127
acl slownets src 192.168.96.0/255.255.224.0
# Special 'fast' slow access, net 123
acl fast_slow src 192.168.123.0/255.255.255.0
# User hosts
acl my_user_hosts src 192.168.100.2/255.255.255.254
# "All" ACL
acl all src 0.0.0.0/0.0.0.0

# Don't need ident lookups for billing on (free) slow cache
ident_lookup_access allow my_user_hosts !virtual_slowcache
ident_lookup_access deny all

# Security access checks
http_access [...]

# These people get in for slow cache access
http_access allow virtual_slowcache slownets
http_access deny virtual_slowcache

# Access checks for main cache
http_access [...]

# Delay definitions (read config file for clarification)
delay_pools 2
delay_initial_bucket_level 50

delay_class 1 3
delay_access 1 allow virtual_slowcache !LOCAL-NET !LOCAL-IP !fast_slow
delay_access 1 deny all
delay_parameters 1 8192/131072 1024/65536 256/32768

delay_class 2 2
delay_access 2 allow virtual_slowcache !LOCAL-NET !LOCAL-IP fast_slow
delay_access 2 deny all
delay_parameters 2 2048/65536 512/32768
The same code is also used by some departments, with class 2 delay pools to give them more flexibility in offering different performance to different labs or students.
Where else can I find out about delay pools?
This is also pretty well documented in the configuration file, with examples. Since people seem to lose their config files, here's a copy of the relevant section.
# DELAY POOL PARAMETERS (all require DELAY_POOLS compilation option)
# -----------------------------------------------------------------------------

# TAG: delay_pools
#       This represents the number of delay pools to be used. For example,
#       if you have one class 2 delay pool and one class 3 delay pool, you
#       have a total of 2 delay pools.
#
#       To enable this option, you must use --enable-delay-pools with the
#       configure script.
#delay_pools 0

# TAG: delay_class
#       This defines the class of each delay pool. There must be exactly one
#       delay_class line for each delay pool. For example, to define two
#       delay pools, one of class 2 and one of class 3, the settings above
#       and here would be:
#
#delay_pools 2      # 2 delay pools
#delay_class 1 2    # pool 1 is a class 2 pool
#delay_class 2 3    # pool 2 is a class 3 pool
#
#       The delay pool classes are:
#
#               class 1         Everything is limited by a single aggregate
#                               bucket.
#
#               class 2         Everything is limited by a single aggregate
#                               bucket as well as an "individual" bucket chosen
#                               from bits 25 through 32 of the IP address.
#
#               class 3         Everything is limited by a single aggregate
#                               bucket as well as a "network" bucket chosen
#                               from bits 17 through 24 of the IP address and a
#                               "individual" bucket chosen from bits 17 through
#                               32 of the IP address.
#
#       NOTE: If an IP address is a.b.c.d
#               -> bits 25 through 32 are "d"
#               -> bits 17 through 24 are "c"
#               -> bits 17 through 32 are "c * 256 + d"

# TAG: delay_access
#       This is used to determine which delay pool a request falls into.
#       The first matched delay pool is always used, i.e., if a request falls
#       into delay pool number one, no more delay pools are checked, otherwise
#       the rest are checked in order of their delay pool number until they
#       have all been checked. For example, if you want some_big_clients in
#       delay pool 1 and lotsa_little_clients in delay pool 2:
#
#delay_access 1 allow some_big_clients
#delay_access 1 deny all
#delay_access 2 allow lotsa_little_clients
#delay_access 2 deny all

# TAG: delay_parameters
#       This defines the parameters for a delay pool. Each delay pool has
#       a number of "buckets" associated with it, as explained in the
#       description of delay_class. For a class 1 delay pool, the syntax is:
#
#delay_parameters pool aggregate
#
#       For a class 2 delay pool:
#
#delay_parameters pool aggregate individual
#
#       For a class 3 delay pool:
#
#delay_parameters pool aggregate network individual
#
#       The variables here are:
#
#               pool            a pool number - i.e., a number between 1 and
#                               the number specified in delay_pools as used in
#                               delay_class lines.
#
#               aggregate       the "delay parameters" for the aggregate bucket
#                               (class 1, 2, 3).
#
#               individual      the "delay parameters" for the individual
#                               buckets (class 2, 3).
#
#               network         the "delay parameters" for the network buckets
#                               (class 3).
#
#       A pair of delay parameters is written restore/maximum, where restore is
#       the number of bytes (not bits - modem and network speeds are usually
#       quoted in bits) per second placed into the bucket, and maximum is the
#       maximum number of bytes which can be in the bucket at any time.
#
#       For example, if delay pool number 1 is a class 2 delay pool as in the
#       above example, and is being used to strictly limit each host to 64kbps
#       (plus overheads), with no overall limit, the line is:
#
#delay_parameters 1 -1/-1 8000/8000
#
#       Note that the figure -1 is used to represent "unlimited".
#
#       And, if delay pool number 2 is a class 3 delay pool as in the above
#       example, and you want to limit it to a total of 256kbps (strict limit)
#       with each 8-bit network permitted 64kbps (strict limit) and each
#       individual host permitted 4800bps with a bucket maximum size of 64kb
#       to permit a decent web page to be downloaded at a decent speed
#       (if the network is not being limited due to overuse) but slow down
#       large downloads more significantly:
#
#delay_parameters 2 32000/32000 8000/8000 600/8000
#
#       There must be one delay_parameters line for each delay pool.

# TAG: delay_initial_bucket_level        (percent, 0-100)
#       The initial bucket percentage is used to determine how much is put
#       in each bucket when squid starts, is reconfigured, or first notices
#       a host accessing it (in class 2 and class 3, individual hosts and
#       networks only have buckets associated with them once they have been
#       "seen" by squid).
#
#delay_initial_bucket_level 50
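Putting the directives above together, a minimal working setup might look like the following sketch. The acl name lab_clients and the address range are illustrative assumptions; the class 2 pool limits each individual host to roughly 64kbps with no aggregate limit, as in the commented example above.

```
# Hypothetical example: one class 2 delay pool limiting each host to ~64kbps.
# The acl name and network range are assumptions for illustration.
acl lab_clients src 10.0.0.0/255.255.255.0

delay_pools 1
delay_class 1 2
delay_access 1 allow lab_clients
delay_access 1 deny all
delay_parameters 1 -1/-1 8000/8000
```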
Customizable Error Messages
Squid-2 lets you customize your error messages. The source distribution includes error messages in different languages. You can select the language with the configure option:
--enable-err-language=lang
Furthermore, you can rewrite the error message template files if you like. This list describes the tags which Squid will insert into the messages:
%a:: User identity
%B:: URL with FTP %2f hack
%c:: Squid error code
%d:: seconds elapsed since request received (not yet implemented)
%e:: errno
%E:: strerror()
%f:: FTP request line
%F:: FTP reply line
%g:: FTP server message
%h:: cache hostname
%H:: server host name
%i:: client IP address
%I:: server IP address
%L:: contents of err_html_text config option
%M:: Request Method
%m:: Error message returned by external auth helper
%o:: Message returned by external acl helper
%p:: URL port #
%P:: Protocol
%R:: Full HTTP Request
%S:: squid default signature. Automatically added unless %s is used.
%s:: caching proxy software with version
%t:: local time
%T:: UTC
%U:: URL without password
%u:: URL with password (Squid-2.5 and later only)
%w:: cachemgr email address
%z:: dns server error message
The Squid default signature is added automatically unless %s is used in the error page. To change the signature you must manually append the signature to each error page.
The default signature reads like:
<BR clear="all">
<HR noshade size="1px">
<ADDRESS>
Generated %T by %h (%s)
</ADDRESS>
</BODY></HTML>
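As a sketch of how the tags are used, a minimal custom error template might look like this. The page layout and wording are assumptions for illustration; the %-tags are the ones listed above and are expanded by Squid when the error is served.

```html
<!-- Hypothetical minimal error template; Squid expands the %-tags -->
<HTML><HEAD><TITLE>ERROR: The requested URL could not be retrieved</TITLE></HEAD>
<BODY>
<H1>ERROR</H1>
<P>While trying to retrieve the URL: %U</P>
<P>The following error was encountered: %E</P>
<P>Generated by %h. Contact %w if you believe this is wrong.</P>
</BODY></HTML>
```

Since %s does not appear in this template, Squid appends its default signature automatically.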
My squid.conf from version 1.1 doesn't work!
Yes, a number of configuration directives have been renamed. Here are some of them:
cache_host:: This is now called cache_peer. The old term does not really describe what you are configuring, but the new name tells you that you are configuring a peer for your cache.
cache_host_domain:: Renamed to cache_peer_domain
local_ip, local_domain:: The functionality provided by these directives is now implemented as access control lists. You will use the always_direct and never_direct options. The new squid.conf file has some examples.
cache_stoplist:: This directive also has been reimplemented with access control lists. You will use the cache option since Squid-2.6. For example:
acl Uncachable url_regex cgi \?
cache deny Uncachable
cache_swap:: This option used to specify the cache disk size. Now you specify the disk size on each cache_dir line.
cache_host_acl:: This option has been renamed to cache_peer_access and the syntax has changed. Now this option is a true access control list, and you must include an allow or deny keyword. For example:
acl that-AS dst_as 1241
cache_peer_access thatcache.thatdomain.net allow that-AS
cache_peer_access thatcache.thatdomain.net deny all
This example sends requests to your peer thatcache.thatdomain.net only for origin servers in Autonomous System Number 1241.
units:: In Squid-1.1 many of the configuration options had implied units associated with them. For example, the connect_timeout value may have been in seconds, but the read_timeout value had to be given in minutes. With Squid-2, these directives take units after the numbers, and you will get a warning if you leave off the units. For example, you should now write:
connect_timeout 120 seconds
read_timeout 15 minutes
Contents
- What is the httpd-accelerator mode?
- How do I set it up?
- Domain based virtual host support
- Sending different requests to different backend web servers
- Running the web server on the same server
- Load balancing of backend servers
- When using an httpd-accelerator, the port number or host name for redirects or CGI-generated content is wrong
- Access to password protected content fails via the reverse proxy
What is the httpd-accelerator mode?
Occasionally people have trouble understanding accelerators and proxy caches, usually resulting from mixed up interpretations of "incoming" and "outgoing" data. I think in terms of requests (i.e., an outgoing request is from the local site out to the big bad Internet). The data received in reply is incoming, of course. Others think in the opposite sense of "a request for incoming data".
An accelerator caches incoming requests for outgoing data (i.e., that which you publish to the world). It takes load away from your HTTP server and internal network. You move the server away from port 80 (or whatever your published port is), and substitute the accelerator, which then pulls the HTTP data from the "real" HTTP server (only the accelerator needs to know where the real server is). The outside world sees no difference (apart from an increase in speed, with luck).
Quite apart from taking the load off a site's normal web server, accelerators can also sit outside firewalls or other network bottlenecks and talk to HTTP servers inside, reducing traffic across the bottleneck and simplifying the configuration. Two or more accelerators communicating via ICP can increase the speed and resilience of a web service to any single failure.
The Squid redirector can make one accelerator act as a single front-end for multiple servers. If you need to move parts of your filesystem from one server to another, or if separately administered HTTP servers should logically appear under a single URL hierarchy, the accelerator makes the right thing happen.
If you wish only to cache the "rest of the world" to improve local users browsing performance, then accelerator mode is irrelevant. Sites which own and publish a URL hierarchy use an accelerator to improve access to it from the Internet. Sites wishing to improve their local users' access to other sites' URLs use proxy caches. Many sites, like us, do both and hence run both.
Measurements of the Squid cache and its Harvest counterpart suggest an order of magnitude performance improvement over CERN or other widely available caching software. This order of magnitude performance improvement on hits suggests that the cache can serve as an httpd accelerator, a cache configured to act as a site's primary httpd server (on port 80), forwarding references that miss to the site's real httpd (on port 81).
In such a configuration, the web administrator renames all non-cachable URLs to the httpd's port (81). The cache serves references to cachable objects, such as HTML pages and GIFs, and the true httpd (on port 81) serves references to non-cachable objects, such as queries and cgi-bin programs. If a site's usage characteristics tend toward cachable objects, this configuration can dramatically reduce the site's web workload.
How do I set it up?
First, you have to tell Squid to listen on port 80 (usually), so set the 'http_port' option with the defaultsite option telling Squid it's an accelerator for this site:
http_port 80 accel defaultsite=your.main.website
Next, you need to tell Squid where to find the real web server:
cache_peer ip.of.webserver parent 80 0 no-query originserver
And finally you need to set up access controls to allow access to your site
acl our_sites dstdomain your.main.website http_access allow our_sites
You should now be able to start Squid and it will serve requests as a HTTP server.
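Combining the three fragments above, a complete minimal accelerator configuration might look like the following sketch. The hostnames and addresses are placeholders from the examples above.

```
# Minimal accelerator setup (hostnames/addresses are placeholders)
http_port 80 accel defaultsite=your.main.website
cache_peer ip.of.webserver parent 80 0 no-query originserver

acl our_sites dstdomain your.main.website
http_access allow our_sites
```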
Note: The accel option to http_port is optional and should only be specified for 2.6.STABLE8 and later. In all versions Squid-2.6 and later specifying one of defaultsite or vhost is sufficient.
Accelerator mode in Squid-2.5 worked quite differently, and an upgrade to 2.6 or later is strongly recommended if you still use Squid-2.5.
Domain based virtual host support
If you are using Squid as an accelerator for a domain-based virtual host system then you need to additionally specify the vhost option to http_port:
http_port 80 accel defaultsite=your.main.website vhost
When both defaultsite and vhost are specified, defaultsite specifies the domain name to which old HTTP/1.0 clients not sending a Host header should be sent. Squid will run fine if you only use vhost, but there is still some software out there that does not send Host headers, so it's recommended to specify defaultsite as well. If defaultsite is not specified those clients will get an "Invalid request" error.
Sending different requests to different backend web servers
To control which web server (cache_peer) gets which requests, the cache_peer_access or cache_peer_domain directives are used. These directives limit which requests may be sent to a given peer.
Example mapping different host names to different peers:
www.example.com      -> server 1
example.com          -> server 1
download.example.com -> server 2
.example.net         -> server 2
squid.conf:
cache_peer ip.of.server1 parent 80 0 no-query originserver name=server_1
acl sites_server_1 dstdomain www.example.com example.com
cache_peer_access server_1 allow sites_server_1
cache_peer ip.of.server2 parent 80 0 no-query originserver name=server_2
acl sites_server_2 dstdomain www.example.net download.example.com .example.net
cache_peer_access server_2 allow sites_server_2
Or the same using cache_peer_domain
cache_peer ip.of.server1 parent 80 0 no-query originserver name=server_1
cache_peer_domain server_1 www.example.com example.com
cache_peer ip.of.server2 parent 80 0 no-query originserver name=server_2
cache_peer_domain server_2 download.example.com .example.net
It's also possible to route requests based on criteria other than the host name by using other acl types, such as urlpath_regex.
Example mapping requests based on the URL-path:
/foo     -> server2
the rest -> server1
squid.conf:
cache_peer ip.of.server1 parent 80 0 no-query originserver name=server1
cache_peer ip.of.server2 parent 80 0 no-query originserver name=server2
acl foo urlpath_regex ^/foo
cache_peer_access server2 allow foo
cache_peer_access server1 deny foo
Note: Remember that the cache is keyed on the requested URL, not on which peer the request is forwarded to, so don't use user-dependent acls if the content is cached.
Running the web server on the same server
While not generally recommended it is possible to run both the accelerator and the backend web server on the same host. To do this you need to make them listen on different IP addresses. Usually the loopback address (127.0.0.1) is used for the web server.
In Squid this is done by specifying the IP address in http_port, and using 127.0.0.1 as address to the web server
http_port the.public.ip.address:80 accel defaultsite=your.main.website
cache_peer 127.0.0.1 parent 80 0 no-query originserver
Apache, for example, may be configured in httpd.conf to listen on the loopback address:
Port 80
BindAddress 127.0.0.1
Other web servers use similar directives to specify the address where they should listen for requests. See your web server's manual for details.
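For instance, on Apache 2.x the older Port/BindAddress pair shown above is replaced by a single Listen directive (this is Apache's documented syntax, not part of Squid):

```
Listen 127.0.0.1:80
```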
Load balancing of backend servers
To load balance requests among a set of backend servers, allow requests to be forwarded to more than one cache_peer, and use one of the load balancing options on the cache_peer lines, e.g. the round-robin option.
cache_peer ip.of.server1 parent 80 0 no-query originserver round-robin
cache_peer ip.of.server2 parent 80 0 no-query originserver round-robin
Other load balancing methods are also available. See squid.conf.default for the full description of the cache_peer directive options.
When using an httpd-accelerator, the port number or host name for redirects or CGI-generated content is wrong
This happens if the port or domain name of the accelerated content is different from what the client requested. When your httpd issues a redirect message (e.g. 302 Moved Temporarily) or generates absolute URLs, it only knows the port it's configured on and uses this to build the URL. Then, when the client requests the redirected URL, it bypasses the accelerator.
To fix this, make sure that defaultsite is the site name requested by clients, and that the port number of http_port and the backend web server is the same. You may also need to configure the official site name on the web server.
Alternatively you can also use the location_rewrite helper interface to Squid to fixup redirects on the way out to the client, but this only works for the Location header, not URLs dynamically embedded in the returned content.
Access to password protected content fails via the reverse proxy
If the content on the web servers is password protected then you need to tell the proxy to trust your web server with authentication credentials. This is done via the login= option to cache_peer. Normally you would use login=PASS to have the login information forwarded. The other alternatives are meant to be used when the reverse proxy itself processes the authentication, but you would like information about the authenticated account forwarded to the backend web server.
cache_peer ip.of.server parent 80 0 no-query originserver login=PASS
Contents
Clients
Wget
Wget is a command-line Web client. It supports HTTP and FTP URLs, recursive retrievals, and HTTP proxies.
echoping
If you want to test your Squid cache in batch (from a cron command, for instance), you can use the echoping program, which will tell you (in plain text or via an exit code) if the cache is up or not, and will indicate the response times.
curl-loader
A stress-testing tool for performance analysis, available at http://sourceforge.net/projects/curl-loader
Load Balancers
Pen
Pen is a simple load-balancer with session affinity for TCP-based protocols.
L7SW
Layer-7 switching is a Layer-7 load-balancing engine for Linux. It's a young project, stemming off the more mature Keepalived.
Linux Virtual Server
Linux Virtual Server is a kernel-based layer 3-7 load balancer for Linux
HA Clusters
Keepalived
Keepalived is a software suite that implements HA (via VRRP) and status monitoring with failover capabilities. It's focused on Linux; support for other OSes is unclear.
VRRPd
VRRPd is a simple implementation of VRRPv2.
Logfile Analysis
Rather than maintain the same list in two places, please see the Logfile Analysis Scripts page on the Web server.
Squeezer is logfile analysis software aimed at measuring Squid's performance.
SGI's Performance Co-Pilot
Jan-Frode Myklebust writes:
I use Performance CoPilot from http://oss.sgi.com/projects/pcp/ for keeping track of squid and server performance. It comes by default with a huge number of system performance metrics, and also has a nice plugin (PMDA, Performance Metrics Domain Agent) for collecting metrics from the squid access.log.
i.e., it can collect historical data, or show live, how many requests/s or bytes/s squid is answering, broken down by type:
- total
- get
- head
- post
- other
- size.zero le3k le10k le30k le100k le300k le1m le3m gt3m unknown
- client.total
- cached.total
- cached.size.zero le3k le10k le30k le100k le300k le1m le3m gt3m unknown
- uncached.total
- uncached.size.zero le3k le10k le30k le100k le300k le1m le3m gt3m unknown
and also combine this with system level metrics like load, system cpu time, cpu i/o wait, per partition byte/s, network interface byte/s, and much more..
Because of its historical logs of all this, it's great for collecting the performance numbers during high activity, and then replaying them later to analyse what went wrong.
Configuration Tools
3Dhierarchy.pl
Kenichi Matsui has a simple perl script which generates a 3D hierarchy map (in VRML) from squid.conf. 3Dhierarchy.pl.
Squid add-ons
transproxy
transproxy is a program used in conjunction with the Linux Transparent Proxy networking feature, and ipfwadm, to intercept HTTP and other requests. Transproxy is written by John Saunders.
Iain's redirector package
A redirector package from Iain Lea to allow Intranet (restricted) or Internet (full) access with URL deny and redirection for sites that are not deemed acceptable for a userbase all via a single proxy port.
Junkbusters
Junkbusters Corp has a copyleft privacy-enhancing, ad-blocking proxy server which you can use in conjunction with Squid.
Squirm
Squirm is a configurable, efficient redirector for Squid by Chris Foote. Features:
- Very fast
- Virtually no memory usage
- It can re-read its config files while running by sending it a HUP signal
- Interactive test mode for checking new configs
- Full regular expression matching and replacement
- Config files for patterns and IP addresses.
- If you mess up the config file, Squirm runs in Dodo Mode so your squid keeps working
chpasswd.cgi
Pedro L Orso has adapted Apache's htpasswd into a CGI program called chpasswd.cgi.
jesred
jesred by Jens Elkner.
squidGuard
squidGuard is a free (GPL), flexible and efficient filter and redirector program for squid. It lets you define multiple access rules with different restrictions for different user groups on a squid cache. squidGuard uses squid standard redirector interface.
Central Squid Server
The Smart Neighbour [URL disappeared] (or 'Central Squid Server' - CSS) is a cut-down version of Squid without HTTP or object caching functionality. The CSS deals only with ICP messages. Instead of caching objects, the CSS records the availability of objects in each of its neighbour caches. Caches that have smart neighbours update each smart neighbour with the status of their cache by sending ICP_STORE_NOTIFY/ICP_RELEASE_NOTIFY messages upon storing/releasing an object from their cache. The CSS maintains an up to date 'object map' recording the availability of objects in its neighbouring caches.
Cerberian content filter (subscription service)
The Cerberian content filter is a very flexible URL rating system with full Squid integration provided by MARA Systems AB. The service requires a license (priced by the number of seats) but evaluation licenses are available.
Filter-Modules patch for Squid
It's a patch for squid-2.4 and squid-2.5 to enable inline alteration of data passing through the proxy; available at http://sites.inka.de/~bigred/devel/squid-filter.html
Ident Servers
For Windows NT, Windows 95/98, and Unix.
Cacheability Validators
The Cacheability Engine is a python script that validates a URL, analyzing the clues a web server gives in order to understand how cacheable the served content is. An online tool is available at http://www.ircache.net/cgi-bin/cacheability.py
The external resources on this page are only provided as pointers so that users may find useful information.
Squeezer2 is an access.log analyzer aimed at understanding squid performance
Squid Blog is a blog collecting interesting suggestions extracted from the squid mailing list
Gadgetry is available at CafePress
OtherHttpProxies is an inventory of other HTTP proxy implementations
Contents
- What is DISKD?
- Does it perform better?
- How do I use it?
- FATAL: Unknown cache_dir type 'diskd'
- If I use DISKD, do I have to wipe out my current cache?
- How do I configure message queues?
- How do I configure shared memory?
- Sometimes shared memory and message queues aren't released when Squid exits.
- What are the Q1 and Q2 parameters?
What is DISKD?
DISKD refers to some features in Squid-2.4 and later to improve Disk I/O performance. The basic idea is that each cache_dir has its own diskd child process. The diskd process performs all disk I/O operations (open, close, read, write, unlink) for the cache_dir. Message queues are used to send requests and responses between the Squid and diskd processes. Shared memory is used for chunks of data to be read and written.
Does it perform better?
Yes. We benchmarked Squid-2.4 with DISKD at the Second IRCache Bake-Off. The results are also described here. At the bakeoff, we got 160 req/sec with diskd. Without diskd, we'd have gotten about 40 req/sec.
How do I use it?
You need to run Squid version 2.4 or later. Your operating system must support message queues, and shared memory.
To configure Squid for DISKD, use the --enable-storeio option:
% ./configure --enable-storeio=diskd,ufs
FATAL: Unknown cache_dir type 'diskd'
You didn't put diskd in the list of storeio modules as described above. You need to re-run configure and recompile Squid.
If I use DISKD, do I have to wipe out my current cache?
No. Diskd uses the same storage scheme as the standard "UFS" type. It only changes how I/O is performed.
How do I configure message queues?
Most Unix operating systems have message queue support by default. One way to check is to see if you have an ipcs command.
However, you will likely need to increase the message queue parameters for Squid. Message queue implementations normally have the following parameters:
- MSGMNB
- Maximum number of bytes per message queue.
- MSGMNI
- Maximum number of message queue identifiers (system wide).
- MSGSEG
- Maximum number of message segments per queue.
- MSGSSZ
- Size of a message segment.
- MSGTQL
- Maximum number of messages (system wide).
- MSGMAX
- Maximum size of a whole message. On some systems you may need to increase this limit. On other systems, you may not be able to change it.
The messages between Squid and diskd are 32 bytes for 32-bit CPUs and 40 bytes for 64-bit CPUs. Thus, MSGSSZ should be 32 or greater. You may want to set it to a larger value, just to be safe.
We'll have two queues for each cache_dir -- one in each direction. So, MSGMNI needs to be at least two times the number of cache_dirs. I've found that 75 messages per queue is about the limit of decent performance. If each diskd message consists of just one segment (depending on your value of MSGSSZ), then MSGSEG should be greater than 75. MSGMNB and MSGTQL affect how many messages can be in the queues at one time. Diskd messages shouldn't be more than 40 bytes, but let's use 64 bytes to be safe. MSGMNB should be at least 64*75. I recommend rounding up to the nearest power of two, or 8192. MSGTQL should be at least 75 times the number of cache_dirs.
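The sizing arithmetic above can be sketched as a small shell calculation. N=4 cache_dirs is an assumed example; substitute your own value.

```shell
# Sketch of the diskd message queue sizing rules of thumb from the text.
# N is an assumption for illustration.
N=4                              # assumed number of cache_dirs
echo "MSGMNI >= $((2 * N))"      # two queues (one per direction) per cache_dir
echo "MSGSEG >= 75"              # ~75 messages per queue is the practical limit
echo "MSGMNB >= $((64 * 75))"    # 64-byte messages x 75 per queue; round up to 8192
echo "MSGTQL >= $((75 * N))"     # at least 75 x number of cache_dirs
```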
FreeBSD
Your kernel must have
options SYSVMSG
You can set the parameters in the kernel as follows. This is just an example. Make sure the values are appropriate for your system:
options MSGMNB=8192     # max # of bytes in a queue
options MSGMNI=40       # number of message queue identifiers
options MSGSEG=512      # number of message segments per queue
options MSGSSZ=64       # size of a message segment
options MSGTQL=2048     # max messages in system
OpenBSD
You can set the parameters in the kernel as follows. This is just an example. Make sure the values are appropriate for your system:
option MSGMNB=16384     # max characters per message queue
option MSGMNI=40        # max number of message queue identifiers
option MSGSEG=2048      # max number of message segments in the system
option MSGSSZ=64        # size of a message segment (Must be 2^N)
option MSGTQL=1024      # max amount of messages in the system
Digital Unix
Message queue support seems to be in the kernel by default. Setting the options is as follows:
options MSGMNB="8192"   # max # bytes on queue
options MSGMNI="40"     # # of message queue identifiers
options MSGMAX="2048"   # max message size
options MSGTQL="2048"   # # of system message headers
by (B.C.Phillips at massey dot ac dot nz) Brenden Phillips
If you have a newer version (DU64), then you can probably use sysconfig instead. To see what the current IPC settings are run
# sysconfig -q ipc
To change them make a file like this called ipc.stanza:
ipc:
        msg-max = 2048
        msg-mni = 40
        msg-tql = 2048
        msg-mnb = 8192
then run
# sysconfigdb -a -f ipc.stanza
You have to reboot for the change to take effect.
Linux
Stefan Köpsell reports that if you compile sysctl support into your kernel, then you can change the following values:
- kernel.msgmnb
- kernel.msgmni
- kernel.msgmax
Winfried Truemper reports: The default values should be large enough for most common cases. You can modify the message queue configuration by writing to these files:
- /proc/sys/kernel/msgmax
- /proc/sys/kernel/msgmnb
- /proc/sys/kernel/msgmni
Solaris
Refer to Demangling Message Queues in Sunworld Magazine.
I don't think the above article really tells you how to set the parameters. You do it in /etc/system with lines like this:
set msgsys:msginfo_msgmax=2048
set msgsys:msginfo_msgmnb=8192
set msgsys:msginfo_msgmni=40
set msgsys:msginfo_msgssz=64
set msgsys:msginfo_msgtql=2048
Of course, you must reboot after modifying /etc/system before the changes take effect.
How do I configure shared memory?
Shared memory uses a set of parameters similar to the ones for message queues. The Squid DISKD implementation uses one shared memory area for each cache_dir. Each shared memory area is about 800 kilobytes in size. You may need to modify your system's shared memory parameters:
- SHMSEG
- Maximum number of shared memory segments per process.
- SHMMNI
- Maximum number of shared memory segments for the whole system.
- SHMMAX
- Largest shared memory segment size allowed.
- SHMALL
- Total amount of shared memory that can be used.
For Squid and DISKD, SHMSEG and SHMMNI must be greater than or equal to the number of cache_dir's that you have. SHMMAX must be at least 800 kilobytes. SHMALL must be at least 800 kilobytes multiplied by the number of cache_dir's.
Note that some operating systems express SHMALL in pages, rather than bytes, so be sure to divide the number of bytes by the page size if necessary. Use the pagesize command to determine your system's page size, or use 4096 as a reasonable guess.
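As a worked example of the page arithmetic, assuming 4 cache_dirs and a 4096-byte page size (both values are assumptions for illustration):

```shell
# Worked example of the SHMALL sizing rules from the text.
N=4                          # assumed number of cache_dirs
PAGESIZE=4096                # check with the pagesize command
BYTES=$((800 * 1024 * N))    # ~800 KB shared memory area per cache_dir
echo "SHMALL >= $BYTES bytes, i.e. $((BYTES / PAGESIZE)) pages"
```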
FreeBSD
Your kernel must have
options SYSVSHM
You can set the parameters in the kernel as follows. This is just an example. Make sure the values are appropriate for your system:
options SHMSEG=16       # max shared mem id's per process
options SHMMNI=32       # max shared mem id's per system
options SHMMAX=2097152  # max shared memory segment size (bytes)
options SHMALL=4096     # max amount of shared memory (pages)
OpenBSD
OpenBSD is similar to FreeBSD, except you must use option instead of options, and SHMMAX is in pages instead of bytes:
option SHMSEG=16        # max shared mem id's per process
option SHMMNI=32        # max shared mem id's per system
option SHMMAX=2048      # max shared memory segment size (pages)
option SHMALL=4096      # max amount of shared memory (pages)
Digital Unix
Shared memory support seems to be in the kernel by default. Setting the options is as follows:
options SHMSEG="16"     # max shared mem id's per process
options SHMMNI="32"     # max shared mem id's per system
options SHMMAX="2097152"        # max shared memory segment size (bytes)
options SHMALL=4096     # max amount of shared memory (pages)
by (B.C.Phillips at massey dot ac dot nz) Brenden Phillips
If you have a newer version (DU64), then you can probably use sysconfig instead. To see what the current IPC settings are run
# sysconfig -q ipc
To change them make a file like this called ipc.stanza:
ipc:
        shm-seg = 16
        shm-mni = 32
        shm-max = 2097152
        shm-all = 4096
then run
# sysconfigdb -a -f ipc.stanza
You have to reboot for the change to take effect.
Linux
Winfried Truemper reports: The default values should be large enough for most common cases. You can modify the shared memory configuration by writing to these files:
- /proc/sys/kernel/shmall
- /proc/sys/kernel/shmmax
- /proc/sys/kernel/shmmni
- /proc/sys/kernel/shm-use-bigpages
Stefan Köpsell reports that if you compile sysctl support into your kernel, then you can change the following values:
- kernel.shmall
- kernel.shmmni
- kernel.shmmax
Solaris
Refer to Shared memory uncovered in Sunworld Magazine.
To set the values, you can put these lines in /etc/system:
set shmsys:shminfo_shmmax=2097152
set shmsys:shminfo_shmmni=32
set shmsys:shminfo_shmseg=16
Sometimes shared memory and message queues aren't released when Squid exits.
Yes, this is a little problem sometimes. The operating system doesn't always release shared memory and message queue resources when processes exit, especially if they exit abnormally. To clear the leftover resources "manually", use the ipcs command. Add this command to your RunCache or squid_start script:
ipcs | awk '/squid/ {printf "ipcrm -%s %s\n", $1, $2}' | /bin/sh
What are the Q1 and Q2 parameters?
In the source code, these are called magic1 and magic2. These numbers refer to the number of outstanding requests on a message queue. They are specified on the cache_dir option line, after the L1 and L2 directories:
cache_dir diskd /cache1 1024 16 256 Q1=72 Q2=64
If there are more than Q1 messages outstanding, then Squid will intentionally fail to open disk files for reading and writing. This is a load-shedding mechanism. If your cache gets really really busy and the disks can not keep up, Squid bypasses the disks until the load goes down again.
If there are more than Q2 messages outstanding, then the main Squid process "blocks" for a little bit until the diskd process services some of the messages and sends back some replies.
Reasonable Q1 and Q2 values are 64 and 72. If you would rather have good hit ratio and bad response time, set Q1 > Q2. Otherwise, if you would rather have good response time and bad hit ratio, set Q1 < Q2.
Contents
What is COSS?
COSS is a Cyclic Object storage system originally designed by Eric Stern. COSS works with a single file, and each stripe is a fixed size and in a fixed position in the file. The stripe size is a compile-time option.
As objects are written to a COSS stripe, their place is pre-reserved and data is copied into a memory copy of the stripe. Because of this, the object size must be known before it can be stored in a COSS filesystem. (Hence the max-size requirement with a coss cache_dir.)
When a stripe is filled, the stripe is written to disk, and a new memory stripe is created.
Does it perform better?
Yes. At the time of writing COSS is the fastest performing cache_dir available in Squid. Because COSS cache_dirs can only store small cache objects, they need to be combined with another cache_dir type (aufs, diskd or ufs) in order to allow caching of larger objects. Because COSS takes care of the small objects more efficiently, the non-COSS cache_dirs also perform more efficiently because they have a smaller number of larger objects to deal with.
How do I use it?
You need to run Squid version 2.6 or later to be able to run a stable version of COSS.
To configure Squid for COSS, use the --enable-storeio option (and the --enable-coss-aio-ops to enable async I/O):
% ./configure --enable-storeio=coss,ufs
If I use COSS, do I have to wipe out my current cache?
Yes. COSS uses a single file or direct partition access to store objects. To prepare a file or disk for COSS you need to run the following command:
dd if=/dev/zero bs=1048576 count=<size> of=<outfile>
where:
<size> is the size of the COSS partition in MB
<outfile> is the partition or filename that you want to use as the COSS store
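As a toy-sized illustration (16 MB, written to /tmp so it is safe to run; a real store would use the full size and the actual file or partition you intend to hand to cache_dir):

```shell
# Create a 16 MB zero-filled file as a miniature COSS store.
dd if=/dev/zero bs=1048576 count=16 of=/tmp/coss-demo 2>/dev/null
# The resulting file is exactly 16 * 1048576 = 16777216 bytes.
ls -l /tmp/coss-demo
```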
What options are required for COSS?
The minimum configuration for a COSS partition is as follows:
cache_dir coss <file> <size> max-size=<max-size>
cache_swap_log /var/spool/squid/%s
where:
<file> is the partition or filename that you want to use as the COSS store (you will need to pre-create the file if it doesn't exist)
<size> is the size of the COSS cache_dir in MB
<max-size> is the size of the largest object that this cache_dir can store. This value cannot be larger than 1MB in the default configuration.
The cache_swap_log option should be set to a directory that squid has write access to. This is used to store all the swap.state files for all cache_dirs, and needs to be set when using COSS because COSS does not have a normal filesystem that it can store this information on.
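Putting the pieces together, a hypothetical squid.conf fragment pairing a COSS store for small objects with an aufs store for everything larger (all paths and sizes here are made up for illustration; you may also want a min-size option on the aufs store so the two do not overlap):

```
cache_dir coss /var/spool/squid/coss 1000 max-size=131072
cache_dir aufs /var/spool/squid/cache 9000 16 256
cache_swap_log /var/spool/squid/%s
```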
Are there any other configuration options for COSS?
COSS partitions have a number of different configuration options available. These options are:
block-size=<n>
This will limit the maximum size for a COSS cache_dir (where the size is calculated as the size of the disk space + the size of any membufs) as follows:
- n=512: up to 8192 MB
- n=1024: up to 16384 MB
- n=2048: up to 32768 MB
- n=4096: up to 65536 MB
- n=8192: up to 131072 MB
The default value for block-size is 512 bytes.
overwrite-percent=<n>
This will allow a trade-off between the size a COSS cache_dir will grow to, the accuracy of the LRU algorithm and the amount of disk I/O bandwidth used. <n> must be between 0 and 100.
If it is set to 0, the COSS cache_dir will always copy any cache hits to the current disk stripe. This reduces the amount of unique data that the cache will store and increases the amount of disk bandwidth used, but makes the LRU algorithm work perfectly.
If it is set to 100, the COSS cache_dir will never copy any cache hits to the current stripe. This will mean that all objects will be stored exactly once, reducing the total disk bandwidth used, but it effectively makes the disk a FIFO (ie popular objects only stay in the cache_dir for as long as it takes for COSS to loop back to the original stripe).
The default value for overwrite-percent is 50, a good balance between the two extremes.
max-stripe-waste=<n>
This option sets the maximum amount of space that a COSS cache_dir will waste when writing a stripe to disk. Every time a COSS stripe is written, it can waste up to max-size worth of space. This becomes a problem if max-size is set to a larger value (e.g. if max-size is 512KB when a COSS stripe is 1MB, up to 50% of the space in that stripe could be written to disk with no data). max-stripe-waste overcomes this problem by dynamically reducing the max-size value to ensure that only <n> bytes of space will be wasted on each stripe write.
The max-stripe-waste option is not set by default.
membufs=<n>
This option determines the maximum number of stripes that COSS will use to send cache hits to clients. It is designed to limit the amount of memory that a given COSS cache_dir can cause squid to use. Once squid runs out of membufs, it starts to move all objects to the current disk stripe, effectively ignoring the overwrite-percent setting.
The default value for membufs is 10.
maxfullbufs=<n>
This option sets the maximum number of stripes that are full, but waiting to be freed that this cache_dir will hold in memory. Once again, this is a setting to limit the amount of memory that a given COSS cache_dir can grow to use.
Each cache_dir will reserve the last 2 maxfullbufs for cache hits (ie they will only be used when squid runs out of membufs). This is designed to allow a higher hit rate at the expense of storing new objects in the cache.
The default is to leave the maxfullbufs option as unlimited (ie we can always accept new objects).
Examples
cache_dir coss /var/spool/squid/coss 100 block-size=512 max-size=131072
- This will use a file with the filename /var/spool/squid/coss
- The cache_dir will store up to 100MB worth of data
- The block size is 512 bytes
- Objects that are up to 131072 bytes long will be stored.
cache_dir coss /dev/sdf1 34500 max-size=524288 max-stripe-waste=32768 block-size=4096 maxfullbufs=10
- This will use the /dev/sdf1 partition
- The cache_dir will store up to 34500MB worth of data
- The block size is 4096 bytes
- Objects that are up to 524288 bytes long will be stored.
- If a given stripe has less than 524288 bytes available, this cache_dir will only accept smaller objects until there is less than 32768 bytes available in the stripe.
- If the default stripe size of 1MB is not changed, up to 10MB will be used for stripes that are waiting to be written to disk.
Contents
- How does Proxy Authentication work in Squid?
- How do I use authentication in access controls?
- How do I ask for authentication of an already authenticated user?
- Does Squid cache authentication lookups?
- Are passwords stored in clear text or encrypted?
- How do I use the Winbind authenticators?
- Can I use different authentication mechanisms together?
- Can I use more than one user-database?
- References
- Authentication in interception and transparent modes
- Other Resources
How does Proxy Authentication work in Squid?
Users will be authenticated if squid is configured to use proxy_auth ACLs (see next question).
Browsers send the user's authentication credentials in the Authorization request header.
If Squid gets a request and the http_access rule list gets to a proxy_auth ACL, Squid looks for the Authorization header. If the header is present, Squid decodes it and extracts a username and password.
If the header is missing, Squid returns an HTTP reply with status 407 (Proxy Authentication Required). The user agent (browser) receives the 407 reply and then prompts the user to enter a name and password. The name and password are encoded, and sent in the Authorization header for subsequent requests to the proxy.
NOTE: The name and password are encoded using "base64" (see RFC 2617). However, base64 is a binary-to-text encoding only; it does NOT encrypt the information it encodes. This means that the username and password are essentially "cleartext" between the browser and the proxy. Therefore, you probably should not use the same username and password that you would use for your account login.
Authentication is actually performed outside of the main Squid process. When Squid starts, it spawns a number of authentication subprocesses. These processes read usernames and passwords on stdin, and reply with "OK" or "ERR" on stdout. This technique allows you to use a number of different authentication protocols (called "schemes" in this context). When multiple authentication schemes are offered by the server (Squid in this case), it is up to the User-Agent to choose one and authenticate using it. By RFC it should choose the strongest one it can handle; in practice Microsoft Internet Explorer usually chooses the first one it has been offered that it can handle, and Mozilla browsers are bug-compatible with the Microsoft behaviour in this regard.
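The stdin/stdout line protocol for a Basic helper is simple enough to sketch in a few lines of shell. The user database here (a single hypothetical user "alice") is purely illustrative; a real helper would consult a password file or directory service:

```shell
# auth_check speaks the Squid basic-helper protocol: one
# "username password" pair per input line, answered by OK or ERR.
# The alice/secret credentials are a made-up example.
auth_check() {
    while read user pass; do
        if [ "$user" = "alice" ] && [ "$pass" = "secret" ]; then
            echo OK
        else
            echo ERR
        fi
    done
}

printf 'alice secret\nalice wrong\n' | auth_check
# prints OK, then ERR
```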
The Squid source code comes with a few authentication backends ("helpers") for Basic authentication. These include:
- LDAP: Uses the Lightweight Directory Access Protocol
- NCSA: Uses an NCSA-style username and password file.
- MSNT: Uses a Windows NT authentication domain.
- PAM: Uses the Linux Pluggable Authentication Modules scheme.
- SMB: Uses a SMB server like Windows NT or Samba.
- getpwam: Uses the old-fashioned Unix password file.
- SASL: Uses SASL libraries.
- mswin_sspi: Windows native authenticator
- YP: Uses the NIS database
In addition Squid also supports the NTLM, Negotiate and Digest authentication schemes, which provide more secure authentication methods in that the password is not exchanged in plain text over the wire. Each scheme has its own set of helpers and auth_param settings. Notice that helpers for different authentication schemes use different protocols to talk with squid, so they can't be mixed.
For information on how to set up NTLM authentication see winbind below.
In order to authenticate users, you need to compile and install one of the supplied authentication modules found in the helpers/basic_auth/ directory, one of the others, or supply your own.
You tell Squid which authentication program to use with the auth_param option in squid.conf. You specify the name of the program, plus any command line options if necessary. For example:
auth_param basic program /usr/local/squid/bin/ncsa_auth /usr/local/squid/etc/passwd
How do I use authentication in access controls?
Make sure that your authentication program is installed and working correctly. You can test it by hand.
Add some proxy_auth ACL entries to your squid configuration. For example:
acl foo proxy_auth REQUIRED
acl all src 0/0
http_access allow foo
http_access deny all
The REQUIRED term means that any authenticated user will match the ACL named foo.
Squid allows you to provide fine-grained controls by specifying individual user names. For example:
acl foo proxy_auth REQUIRED
acl bar proxy_auth lisa sarah frank joe
acl daytime time 08:00-17:00
acl all src 0/0
http_access allow bar
http_access allow foo daytime
http_access deny all
In this example, users named lisa, sarah, joe, and frank are allowed to use the proxy at all times. Other users are allowed only during daytime hours.
How do I ask for authentication of an already authenticated user?
If a user is authenticated at the proxy you cannot "log out" and re-authenticate. The user usually has to close and re-open the browser windows to be able to re-login at the proxy. A simple configuration will probably look like this:
acl my_auth proxy_auth REQUIRED
http_access allow my_auth
http_access deny all
But there is a trick which can force the user to authenticate with a different account in certain situations. This happens if you deny access with an authentication related ACL last in the http_access deny statement. Example configuration:
acl my_auth proxy_auth REQUIRED
acl google_users proxy_auth user1 user2 user3
acl google dstdomain .google.com
http_access deny google !google_users
http_access allow my_auth
http_access deny all
In this case if the user requests www.google.com then the first http_access line matches and triggers re-authentication unless the user is one of the listed users. Remember: it's always the last ACL on a http_access line that "matches". If the matching ACL deals with authentication a re-authentication is triggered. If you didn't want that you would need to switch the order of ACLs so that you get http_access deny !google_users google.
You might also run into an authentication loop if you are not careful. Assume that you use LDAP group lookups and want to deny access based on an LDAP group (e.g. only members of a certain LDAP group are allowed to reach certain web sites). In this case you may trigger re-authentication although you don't intend to. This config is likely wrong for you:
acl ldapgroup-allowed external LDAP_group PROXY_ALLOWED
http_access deny !ldapgroup-allowed
http_access allow all
The second http_access line would force the user to re-authenticate time and again if he/she is not member of the PROXY_ALLOWED group. This is perhaps not what you want. You rather wanted to deny access to non-members. So you need to rewrite this http_access line so that an ACL matches that has nothing to do with authentication. This is the correct example:
acl ldapgroup-allowed external LDAP_group PROXY_ALLOWED
acl dummy src 0.0.0.0/0.0.0.0
http_access deny !ldapgroup-allowed dummy
http_access allow all
This way the http_access line still matches. But it's the dummy ACL which is now last in the line. Since dummy is a static ACL (that always matches) and has nothing to do with authentication you will find that the access is just denied.
See also: http://www.squid-cache.org/mail-archive/squid-users/200511/0339.html
Does Squid cache authentication lookups?
It depends on the authentication scheme; Squid does some caching when it can. Successful Basic authentication lookups are cached for one hour by default. That means (in the worst case) it's possible for someone to keep using your cache for up to an hour after they have been removed from the authentication database.
You can control the expiration time with the auth_param basic credentialsttl configuration option.
Note: This has nothing to do with how often the user needs to re-authenticate himself. It is the browser who maintains the session, and re-authentication is a business between the user and his browser, not the browser and Squid. The browser authenticates on behalf of the user on every request sent to Squid. What this parameter controls is only how often Squid will ask the defined helper if the password is still valid.
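For example, to shorten the caching window from the one-hour default so that removed accounts are locked out sooner (at the cost of more frequent helper lookups):

```
auth_param basic credentialsttl 5 minutes
```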
Are passwords stored in clear text or encrypted?
In the basic scheme the password is exchanged in plain text. In the other schemes only cryptographic hashes of the password are exchanged.
Squid stores cleartext passwords in its basic authentication memory cache.
Squid writes cleartext usernames and passwords when talking to the external basic authentication processes. Note, however, that this interprocess communication occurs over TCP connections bound to the loopback interface or private UNIX pipes. Thus, it's not possible for processes on other computers or local users without root privileges to "snoop" on the authentication traffic.
Each authentication program must select its own scheme for persistent storage of passwords and usernames.
For the digest scheme Squid never sees the actual password, but the backend helper needs either plaintext passwords or Digest specific hashes of the same.
In the ntlm or Negotiate schemes Squid also never sees the actual password. Usually this is connected to a Windows realm or Kerberos realm and how these authentication services stores the password is outside of this document but usually it's not in plain text.
How do I use the Winbind authenticators?
by Jerry Murdock
Winbind is a recent addition to Samba providing some impressive capabilities for NT based user accounts. From Squid's perspective winbind provides a robust and efficient engine for both basic and NTLM challenge/response authentication against an NT domain controller.
The winbind authenticators have been used successfully under Linux, FreeBSD, Solaris and Tru64.
Supported Samba Releases
Samba-3.X is supported natively using the ntlm_auth helper shipped as part of Samba. No Squid specific winbind helpers need to be compiled (and even if compiled they won't work with Samba-3.X).
NOTE: Samba 2.2.X reached its End-Of-Life on October 1, 2004. It was supported using the winbind helpers shipped with Squid-2.5 but is no longer supported with later versions, even if using the helper from 2.5 may still work.
For Samba-3.X the winbind helpers which were shipped with Squid should not be used (and won't work if you attempt to do so); instead the ntlm_auth helper shipped as part of the Samba-3 distribution should be used. This helper supports all versions of Squid and both the ntlm and basic authentication schemes. For details on how to use this Samba helper see the Samba documentation. For group membership lookups the wbinfo_group helper shipped with Squid can be used (this is just a wrapper around the samba wbinfo program and works with all versions of Samba).
Configure Samba
For full details on how to configure Samba and joining a domain please see the Samba documentation. The Samba team has quite extensive documentation both on how to join a NT domain and how to join a Active Directory tree.
Samba must be built with these configure options:
--with-winbind
and is normally enabled by default if you installed Samba from a prepackaged distribution.
Then follow the Samba installation instructions. But please note that neither nsswitch nor the pam modules need to be installed for Squid to function; these are only needed if you want your OS to integrate with the domain for UNIX accounts.
Test Samba's winbindd
Edit smb.conf for winbindd functionality. The following entries in the [global] section of smb.conf may be used as a template.
workgroup = mydomain
password server = myPDC
security = domain
winbind uid = 10000-20000
winbind gid = 10000-20000
winbind use default domain = yes
Join the NT domain as outlined in the winbindd man page for your version of samba.
Start nmbd (required to ensure proper operation).
Start winbindd.
Test basic winbindd functionality "wbinfo -t":
# wbinfo -t
Secret is good
Test winbindd user authentication:
# wbinfo -a mydomain\\myuser%mypasswd
plaintext password authentication succeeded
error code was NT_STATUS_OK (0x0)
challenge/response password authentication succeeded
error code was NT_STATUS_OK (0x0)
NOTE: both plaintext and challenge/response should return "succeeded." If there is no "challenge/response" status returned then Samba was not built with "--with-winbind-auth-challenge" and cannot support ntlm authentication.
SMBD and Machine Trust Accounts
The Samba team has incorporated functionality to change the machine trust account password in the new "net" command. A simple daily cron job scheduling "net rpc changetrustpw" is all that is needed, if anything at all.
winbind privileged pipe permissions
ntlm_auth requires access to the privileged winbind pipe in order to function properly. You enable this access by changing the group of the winbind_privileged directory to the group you run Squid as (the cache_effective_group setting in squid.conf):
chgrp squid /path/to/winbind_privileged
Configure Squid
As Samba-3.x has its own authentication helper there is no need to build any of the Squid authentication helpers for use with Samba-3.x (and the helpers provided by Squid won't work if you do). You do however need to enable support for the ntlm scheme if you plan on using this. Also you may want to use the wbinfo_group helper for group lookups:
--enable-auth="ntlm,basic" --enable-external-acl-helpers="wbinfo_group"
Test Squid without auth
Before going further, test basic Squid functionality. Make sure squid is functioning without requiring authorization.
Test the helpers
Testing the winbind ntlm helper is not really possible from the command line, but the winbind basic authenticator can be tested like any other basic helper. Make sure to run the test as your cache_effective_user:
# /usr/local/bin/ntlm_auth --helper-protocol=squid-2.5-basic
mydomain+myuser mypasswd
OK
The helper should return "OK" if given a valid username/password. + is the domain separator set in your smb.conf
Relevant squid.conf parameters
Add the following to enable both the winbind basic and ntlm authenticators. IE will use ntlm and everything else basic:
auth_param ntlm program /usr/local/bin/ntlm_auth --helper-protocol=squid-2.5-ntlmssp
auth_param ntlm children 30
auth_param ntlm max_challenge_reuses 0
auth_param ntlm max_challenge_lifetime 2 minutes
# ntlm_auth from Samba 3 supports NTLM NEGOTIATE packet
auth_param ntlm use_ntlm_negotiate on
auth_param basic program /usr/local/bin/ntlm_auth --helper-protocol=squid-2.5-basic
auth_param basic children 5
auth_param basic realm Squid proxy-caching web server
auth_param basic credentialsttl 2 hours
And the following acl entries to require authentication:
acl AuthorizedUsers proxy_auth REQUIRED
..
http_access allow all AuthorizedUsers
Test Squid with auth
- Internet Explorer, Mozilla, Firefox:
- Test browsing through squid with a NTLM capable browser. If logged into the domain, a password prompt should NOT pop up. Confirm the traffic really is being authorized by tailing access.log. The domain\username should be present.
- Netscape, Mozilla (< 1.4), Opera...:
- Test with a non-NTLM-capable browser. A standard password dialog should appear. Entering the domain should not be required if the user is in the default domain and "winbind use default domain = yes" is set in smb.conf. Otherwise, the username must be entered in "domain+username" format. (where + is the domain separator set in smb.conf)
If no usernames appear in access.log and/or no password dialogs appear in either browser, then the acl/http_access portions of squid.conf are not correct.
Note that when using NTLM authentication, you will see two "TCP_DENIED/407" entries in access.log for every request. This is due to the challenge-response process of NTLM.
Can I use different authentication mechanisms together?
Yes, with limitations.
Commonly deployed user-agents support at least one and up to four different authentication protocols (also called schemes):
- Basic
- Digest
- NTLM
- Negotiate
Those schemes are explained in detail elsewhere (see ../ProxyAuthentication, NegotiateAuthentication and ../TroubleShooting). You can enable more than one at any given moment, just configure the relevant auth_param sections for each different scheme you want to offer to the browsers.
Note: due to a bug in common User-Agents (most notably Microsoft Internet Explorer) the order in which the auth-schemes are configured is relevant. RFC 2617, chapter 4.6, states: "A user agent MUST choose to use the strongest auth-scheme it understands." Microsoft Internet Explorer instead chooses the first auth-scheme (in the order they are offered) that it understands.
In other words, you SHOULD use this order for the auth_params directives:
- negotiate
- ntlm
- digest
- basic
omitting those you do not plan to offer.
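A sketch of this ordering using Samba's ntlm_auth for the first two schemes (helper paths are illustrative, and the Negotiate line assumes a Samba build with gss-spnego support; digest is omitted here as not offered):

```
auth_param negotiate program /usr/local/bin/ntlm_auth --helper-protocol=gss-spnego
auth_param ntlm program /usr/local/bin/ntlm_auth --helper-protocol=squid-2.5-ntlmssp
auth_param basic program /usr/local/bin/ntlm_auth --helper-protocol=squid-2.5-basic
auth_param basic realm Squid proxy-caching web server
```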
Once the admin decides to offer multiple auth-schemes to the clients, Squid can not force the clients to choose one over the other.
Can I use more than one user-database?
Generally speaking, no. The only exception is the Basic authentication scheme, where you can cook a proxy script which relays the requests to different authenticators and applies an 'OR' type of logic. For all other auth-schemes this cannot be done; this is not a limitation in squid, but it's a feature of the authentication protocols themselves: allowing multiple user-databases would open the door for replay attacks to the protocols.
References
Authentication in interception and transparent modes
Simply said, it's not possible to authenticate users using proxy authentication schemes when running in interception or transparent modes. See ../InterceptionProxy for details on why.
Other Resources
Contents
Neighbor
In Squid, neighbor usually means the same thing as peer. A neighbor cache is one that you have defined with the cache_peer configuration option. Neighbor refers to either a parent or a sibling.
In Harvest 1.4, neighbor referred to what Squid calls a sibling. That is, Harvest had parents and neighbors. For backward compatibility, the term neighbor is still accepted in some Squid configuration options.
Regular Expression
Regular expressions are patterns used for matching sequences of characters in text. For more information, see A Tao of Regular Expressions and Newbie's page.
Open-access proxies
Squid's default configuration file denies all client requests. It is the administrator's responsibility to configure Squid to allow access only to trusted hosts and/or users.
If your proxy allows access from untrusted hosts or users, you can be sure that people will find and abuse your service. Some people will use your proxy to make their browsing anonymous. Others will intentionally use your proxy for transactions that may be illegal (such as credit card fraud). A number of web sites exist simply to provide the world with a list of open-access HTTP proxies. You don't want to end up on this list.
Be sure to carefully design your access control scheme. You should also check it from time to time to make sure that it works as you expect.
Mail relaying
SMTP and HTTP are rather similar in design. This, unfortunately, may allow someone to relay an email message through your HTTP proxy. To prevent this, you must make sure that your proxy denies HTTP requests to port 25, the SMTP port.
Squid is configured this way by default. The default squid.conf file lists a small number of trusted ports. See the Safe_ports ACL in squid.conf. Your configuration file should always deny unsafe ports early in the http_access lists:
http_access deny !Safe_ports
(additional http_access lines ...)
Do NOT add port 25 to Safe_ports (unless your goal is to end up in the RBL). You may want to make a cron job that regularly verifies that your proxy blocks access to port 25.
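A naive sketch of such a check, shown here against a throwaway copy of the configuration (a real cron job would point at your actual squid.conf; the pattern is deliberately simple and does not understand port ranges):

```shell
# Stand-in for the real config file (e.g. /etc/squid/squid.conf).
CONF=/tmp/squid.conf.demo
printf 'acl Safe_ports port 80 21 443\nhttp_access deny !Safe_ports\n' > "$CONF"

# Warn if 25 ever appears as a standalone port in a Safe_ports line.
if grep -Eq '^acl Safe_ports port.* 25( |$)' "$CONF"; then
    echo "WARNING: port 25 appears in Safe_ports"
else
    echo "OK: port 25 not in Safe_ports"
fi
```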
Hijackable proxies
Squid's default configuration file denies all client requests. It is the administrator's responsibility to configure Squid to allow access only to trusted hosts and/or users.
If your proxy allows unrestricted access to any ports, some people may use your proxy to make their websites anonymous. A number of websites commonly seen in spam and phishing emails use this method of hiding, and software is available in some circles supporting the automatic detection of these partially-open proxies. Take special care with rules that open extra ports, such as:
acl mycoolapp port 1234
...
http_access allow mycoolapp
Be careful that configuration lines like these are kept behind any lines that block public access to your squid.
acl mycoolapp port 1234
...
http_access deny !localnet
...
http_access allow mycoolapp

Or even better:

acl mycoolapp port 1234
...
http_access allow localnet mycoolapp
Way Too Many Cache Misses
In normal operation Squid logs very few (typically well under 1%) TCP_SWAPFAIL_MISS results, which indicate an object was thought to be in the cache but could not be found. Once in a while, though, these occur very frequently. When lots of these errors occur, the problem is that the Squid cache index (probably in a file named something like swap.state at the top level of the Squid cache directory structure) is out of sync with the actual cache contents.
Here's a script I use to make sure this doesn't happen. It's way too paranoid, doing a lot of unnecessary things including throwing away what's in the cache every time. But it always works.
sample script
#!/bin/bash
# restart Squid
# (probably after making arbitrary config changes)
echo temporarily stopping Dans Guardian [Squid user]
dansguardian -q
while [[ `ps aux | grep dansguardian | wc -l` -gt 1 ]]; do
    sleep 1
done
sleep 2
echo stopping Squid so can make arbitrary changes
squid -k shutdown
while [[ `ps aux | grep squid | wc -l` -gt 1 ]]; do
    sleep 1
done
sleep 2
echo flushing-by-deleting old Squid cache including index
rm -rf /var/spool/squid/*
sleep 2
echo creating new Squid disk cache directories and index
squid -z
sleep 2
echo starting Squid again with new configuration
squid
sleep 2
echo starting Dans Guardian [Squid user] again
dansguardian
Pruning the Cache Down
Clearing the cache can be necessary under some unusual circumstances. Usually if the estimated size of the cache was calculated incorrectly and needs adjusting.
To fix simple cases such as the above, where the cache just needs to have a portion of the total removed, altering squid.conf and reconfiguring Squid is sufficient. Squid will handle the changes automatically and purge the cache down to size again within 10 minutes of the reconfigure.
old squid.conf
cache_dir ufs /squid/cache 1000 255 255
new squid.conf
cache_dir ufs /squid/cache 100 255 255
and reconfigure ...
squid -k reconfigure
Changing the Cache Levels
Altering the cache_dir L1 and L2 sizes has not been tested with the above altering. It is still recommended to manually delete the cache directory and rebuild after altering the configuration.
squid -k shutdown
rm -r /squid/cache/*
squid -z
squid
If your cache directory and state files are at the root level of a partition there are a few system objects you need to be careful with. To get around these you may need to change the rm -r to a safer list of specific squid files:
rm -rf /squid/cache/[0-9]*
rm -f /squid/cache/swap*
rm -f /squid/cache/netdb*
rm -f /squid/cache/*.log
If you wish to try the pruning method with a level change and let us know the results then please do. We would like this page to cover all known resizing requirements and options.
Back to the SquidFaq