Squid: The Definitive Guide
By
Duane Wessels

Publisher: O'Reilly
Pub Date: January 2004
ISBN: 0-596-00162-2
Pages: 496

Squid is the most popular Web caching software in use today, and it works on a variety of
platforms including Linux, FreeBSD, and Windows. Written by Duane Wessels, the creator of
Squid, Squid: The Definitive Guide will help you configure and tune Squid for your particular
situation. Newcomers to Squid will learn how to download, compile, and install code. Seasoned
users of Squid will be interested in the later chapters, which tackle advanced topics such as
high-performance storage options, rewriting requests, HTTP server acceleration, monitoring,
debugging, and troubleshooting Squid.



Copyright


Dedication

Preface


About This Book


Recommended Reading


Conventions Used in This Book


Comments and Questions


Acknowledgments

Chapter 1. Introduction


Section 1.1. Web Caching


Section 1.2. A Brief History of Squid


Section 1.3. Hardware and Operating System Requirements



Section 1.4. Squid Is Open Source


Section 1.5. Squid's Home on the Web


Section 1.6. Getting Help


Section 1.7. Getting Started with Squid


Section 1.8. Exercises

Chapter 2. Getting Squid


Section 2.1. Versions and Releases


Section 2.2. Use the Source, Luke


Section 2.3. Precompiled Binaries


Section 2.4. Anonymous CVS



Section 2.5. devel.squid-cache.org


Section 2.6. Exercises

Chapter 3. Compiling and Installing


Section 3.1. Before You Start


Section 3.2. Unpacking the Source


Section 3.3. Pretuning Your Kernel


Section 3.4. The configure Script


Section 3.5. make


Section 3.6. make Install


Section 3.7. Applying a Patch


Section 3.8. Running configure Later



Section 3.9. Exercises

Chapter 4. Configuration Guide for the Eager


Section 4.1. The squid.conf Syntax


Section 4.2. User IDs


Section 4.3. Port Numbers


Section 4.4. Log File Pathnames


Section 4.5. Access Controls


Section 4.6. Visible Hostname


Section 4.7. Administrative Contact Information


Section 4.8. Next Steps



Section 4.9. Exercises

Chapter 5. Running Squid


Section 5.1. Squid Command-Line Options


Section 5.2. Check Your Configuration File for Errors


Section 5.3. Initializing Cache Directories


Section 5.4. Testing Squid in a Terminal Window


Section 5.5. Running Squid as a Daemon Process


Section 5.6. Boot Scripts


Section 5.7. A chroot Environment


Section 5.8. Stopping Squid



Section 5.9. Reconfiguring a Running Squid Process


Section 5.10. Rotating the Log Files


Section 5.11. Exercises

Chapter 6. All About Access Controls


Section 6.1. Access Control Elements


Section 6.2. Access Control Rules


Section 6.3. Common Scenarios


Section 6.4. Testing Access Controls


Section 6.5. Exercises

Chapter 7. Disk Cache Basics


Section 7.1. The cache_dir Directive



Section 7.2. Disk Space Watermarks


Section 7.3. Object Size Limits


Section 7.4. Allocating Objects to Cache Directories


Section 7.5. Replacement Policies


Section 7.6. Removing Cached Objects


Section 7.7. refresh_pattern


Section 7.8. Exercises

Chapter 8. Advanced Disk Cache Topics


Section 8.1. Do I Have a Disk I/O Bottleneck?


Section 8.2. Filesystem Tuning Options



Section 8.3. Alternative Filesystems


Section 8.4. The aufs Storage Scheme


Section 8.5. The diskd Storage Scheme


Section 8.6. The coss Storage Scheme


Section 8.7. The null Storage Scheme


Section 8.8. Which Is Best for Me?


Section 8.9. Exercises

Chapter 9. Interception Caching


Section 9.1. How It Works


Section 9.2. Why (Not) Intercept?


Section 9.3. The Network Device



Section 9.4. Operating System Tweaks


Section 9.5. Configure Squid


Section 9.6. Debugging Problems


Section 9.7. Exercises

Chapter 10. Talking to Other Squids


Section 10.1. Some Terminology


Section 10.2. Why (Not) Use a Hierarchy?


Section 10.3. Telling Squid About Your Neighbors


Section 10.4. Restricting Requests to Neighbors


Section 10.5. The Network Measurement Database



Section 10.6. Internet Cache Protocol


Section 10.7. Cache Digests


Section 10.8. Hypertext Caching Protocol


Section 10.9. Cache Array Routing Protocol


Section 10.10. Putting It All Together


Section 10.11. How Do I


Section 10.12. Exercises

Chapter 11. Redirectors


Section 11.1. The Redirector Interface


Section 11.2. Some Sample Redirectors



Section 11.3. The Redirector Pool


Section 11.4. Configuring Squid


Section 11.5. Popular Redirectors


Section 11.6. Exercises

Chapter 12. Authentication Helpers


Section 12.1. Configuring Squid


Section 12.2. HTTP Basic Authentication


Section 12.3. HTTP Digest Authentication


Section 12.4. Microsoft NTLM Authentication


Section 12.5. External ACLs


Section 12.6. Exercises


Chapter 13. Log Files


Section 13.1. cache.log


Section 13.2. access.log


Section 13.3. store.log


Section 13.4. referer.log


Section 13.5. useragent.log


Section 13.6. swap.state


Section 13.7. Rotating the Log Files


Section 13.8. Privacy and Security


Section 13.9. Exercises


Chapter 14. Monitoring Squid


Section 14.1. cache.log Warnings


Section 14.2. The Cache Manager


Section 14.3. Using SNMP


Section 14.4. Exercises

Chapter 15. Server Accelerator Mode


Section 15.1. Overview


Section 15.2. Configuring Squid


Section 15.3. Gee, That Was Confusing!


Section 15.4. Access Controls


Section 15.5. Content Negotiation



Section 15.6. Gotchas


Section 15.7. Exercises

Chapter 16. Debugging and Troubleshooting


Section 16.1. Some Common Problems


Section 16.2. Debugging via cache.log


Section 16.3. Core Dumps, Assertions, and Stack Traces


Section 16.4. Replicating Problems


Section 16.5. Reporting a Bug


Section 16.6. Exercises

Appendix A. Config File Reference



http_port


https_port


ssl_unclean_shutdown


icp_port


htcp_port


mcast_groups


udp_incoming_address


udp_outgoing_address


cache_peer


cache_peer_domain



neighbor_type_domain


icp_query_timeout


maximum_icp_query_timeout


mcast_icp_query_timeout


dead_peer_timeout


hierarchy_stoplist


no_cache


cache_access_log


cache_log


cache_store_log



cache_swap_log


emulate_httpd_log


log_ip_on_direct


cache_dir


cache_mem


cache_swap_low


cache_swap_high


maximum_object_size


minimum_object_size


maximum_object_size_in_memory



cache_replacement_policy


memory_replacement_policy


store_dir_select_algorithm


mime_table


ipcache_size


ipcache_low


ipcache_high


fqdncache_size


log_mime_hdrs


useragent_log



referer_log


pid_filename


debug_options


log_fqdn


client_netmask


ftp_user


ftp_list_width


ftp_passive


ftp_sanitycheck


cache_dns_program



dns_children


dns_retransmit_interval


dns_timeout


dns_defnames


dns_nameservers


hosts_file


diskd_program


unlinkd_program


pinger_program


redirect_program



redirect_children


redirect_rewrites_host_header


redirector_access


redirector_bypass


auth_param


authenticate_ttl


authenticate_cache_garbage_interval


authenticate_ip_ttl


external_acl_type


wais_relay_host



wais_relay_port


request_header_max_size


request_body_max_size


refresh_pattern


quick_abort_min


quick_abort_max


quick_abort_pct


negative_ttl


positive_dns_ttl


negative_dns_ttl



range_offset_limit


connect_timeout


peer_connect_timeout


read_timeout


request_timeout


persistent_request_timeout


client_lifetime


half_closed_clients


pconn_timeout


ident_timeout



shutdown_lifetime


acl


http_access


http_reply_access


icp_access


miss_access


cache_peer_access


ident_lookup_access


tcp_outgoing_tos


tcp_outgoing_address



reply_body_max_size


cache_mgr


cache_effective_user


cache_effective_group


visible_hostname


unique_hostname


hostname_aliases


announce_period


announce_host


announce_file



announce_port


httpd_accel_host


httpd_accel_port


httpd_accel_single_host


httpd_accel_with_proxy


httpd_accel_uses_host_header


dns_testnames


logfile_rotate


append_domain


tcp_recv_bufsize



err_html_text


deny_info


memory_pools


memory_pools_limit


forwarded_for


log_icp_queries


icp_hit_stale


minimum_direct_hops


minimum_direct_rtt


cachemgr_passwd



store_avg_object_size


store_objects_per_bucket


client_db


netdb_low


netdb_high


netdb_ping_period


query_icmp


test_reachability


buffered_logs


reload_into_ims



always_direct


never_direct


header_access


header_replace


icon_directory


error_directory


maximum_single_addr_tries


snmp_port


snmp_access


snmp_incoming_address



snmp_outgoing_address


as_whois_server


wccp_router


wccp_version


wccp_incoming_address


wccp_outgoing_address


delay_pools


delay_class


delay_access


delay_parameters



delay_initial_bucket_level


incoming_icp_average


incoming_http_average


incoming_dns_average


min_icp_poll_cnt


min_dns_poll_cnt


min_http_poll_cnt


max_open_disk_fds


offline_mode


uri_whitespace



broken_posts


mcast_miss_addr


mcast_miss_ttl


mcast_miss_port


mcast_miss_encode_key


nonhierarchical_direct


prefer_direct


strip_query_terms


coredump_dir


ignore_unknown_nameservers



digest_generation


digest_bits_per_entry


digest_rebuild_period


digest_rewrite_period


digest_swapout_chunk_size


digest_rebuild_chunk_percentage


chroot


client_persistent_connections


server_persistent_connections


pipeline_prefetch



extension_methods


request_entities


high_response_time_warning


high_page_fault_warning


high_memory_warning


ie_refresh


vary_ignore_expire


sleep_after_fork

Appendix B. The Memory Cache

Appendix C. Delay Pools


Section C.1. Overview



Section C.2. Configuring Squid


Section C.3. Examples


Section C.4. Issues


Section C.5. Monitoring Delay Pools

Appendix D. Filesystem Performance Benchmarks


Section D.1. The Benchmark Environment


Section D.2. General Comments


Section D.3. Linux


Section D.4. FreeBSD


Section D.5. OpenBSD



Section D.6. NetBSD


Section D.7. Solaris


Section D.8. Number of Disk Spindles

Appendix E. Squid on Windows


Section E.1. Cygwin


Section E.2. SquidNT

Appendix F. Configuring Squid Clients


Section F.1. Manually


Section F.2. Proxy Auto-Configuration


Section F.3. WPAD


Section F.4. Summary


Colophon

Index

Copyright © 2004 O'Reilly Media, Inc.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly & Associates books may be purchased for educational, business, or sales promotional
use. Online editions are also available for most titles (
). For more
information, contact our corporate/institutional sales department: (800) 998-9938 or

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered
trademarks of O'Reilly Media, Inc. Squid: The Definitive Guide, the image of a giant squid and
related trade dress are trademarks of O'Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates
was aware of a trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and
authors assume no responsibility for errors or omissions, or for damages resulting from the use
of the information contained herein.


Dedication
To my darling Anne. You have no idea.


Preface
About This Book
Recommended Reading
Conventions Used in This Book
Comments and Questions
Acknowledgments


About This Book
I started the Squid project eight years ago while working at the National Laboratory for Applied
Network Research and the University of California. Back then I certainly enjoyed writing code
and fixing bugs but always felt bad about the lack of decent documentation. This book is my
attempt to rectify that situation. It's been a long time coming and almost didn't happen. Like
they say, "better late than never!"
This book is written for those who are tasked with setting up and maintaining one or more
Squid caches. If you're new to Squid, I'll show you how to download, compile, and install the
code. Those of you who have been using Squid for a while will be more interested in the later
chapters, where I talk about disk cache performance, modifying requests, surrogate mode,
caching hierarchies, monitoring Squid, and more.
In order to use this book, you should have a basic knowledge of Unix systems. Many of the
book's examples are based on free operating systems, such as Linux, FreeBSD, NetBSD, and
OpenBSD. I also have some tips for Solaris users. If you're more comfortable with Windows
systems, you can use Squid under a Unix emulator or give the native NT port a try.
Here's an overview of the book's contents:
Chapter 1, Introduction
This chapter introduces you to Squid and web caching. I give a brief history of the
project, and a few notes on our future work. I explain how you can find additional
support and information, including a FAQ, on the Squid web site.
Chapter 2, Getting Squid
In this chapter, I explain how and why you should download Squid's source code. You
may prefer to install a precompiled binary or use a preconfigured package. I also talk
about staying up to date with Squid using the anonymous CVS server.
Chapter 3, Compiling and Installing
Assuming you've downloaded the source code, this chapter explains how to configure
and compile Squid. In some cases you may need to tune your system before compiling
Squid. For example, your kernel may have relatively low file-descriptor limits that affect
Squid's performance.
Chapter 4, Configuration Guide for the Eager
Here, I give a brief introduction to Squid's configuration file. If you are the impatient
type and can't wait to start using Squid, this chapter will leave you with a minimal
configuration file you can start playing with.
Chapter 5, Running Squid
In this chapter, I explain how to run Squid for the first time and how to test Squid in a
terminal window. Following that, I suggest a number of ways to configure your system
so that Squid starts each time it boots. I also explain how to reconfigure Squid while it is
running and how to safely shut it down.

Chapter 6, All About Access Controls
I talk extensively about access controls in this chapter. Squid has a powerful collection
of access control features and a number of different rule sets that determine how
requests and responses are treated. This is an important chapter because a mistake in
your access controls may leave your cache, or even internal systems, vulnerable to
abuse from outsiders.
Chapter 7, Disk Cache Basics
This chapter is about Squid's primary function: storing cached responses on disk. I
explain how to configure the disk cache, including replacement policies and freshness
controls. I also show you how to manually remove unwanted objects from the cache.
Chapter 8, Advanced Disk Cache Topics
In this chapter, I explain how to improve the performance of Squid's disk cache. I'll talk
about Squid's different storage schemes and a number of filesystem tuning options that
may help. If your Squid cache handles a relatively light load, you probably don't need to
worry about disk performance.
Chapter 9, Interception Caching
Here, I explain how to configure Squid for HTTP interception, sometimes also called
transparent caching. Actually, configuring Squid is the easy part. The difficulty comes
from setting up a router or switch on your network and the host from which Squid is
running. I explain how to configure networking equipment from Cisco, Alteon, Foundry,
and Extreme. I'll also show you how to configure your operating system (Linux,
FreeBSD, NetBSD, OpenBSD, and Solaris) for HTTP interception. Finally, I talk about
WCCP.
Chapter 10, Talking to Other Squids
In this chapter, I cover the ins and outs of cache cooperation, including meshes, arrays,
and hierarchies. You may also find it useful if you simply need to forward requests from
Squid to another proxy or intermediary. I'll talk about the various intercache protocols
supported by Squid (ICP, HTCP, Cache Digests, and CARP) and how Squid chooses the
next-hop location for a given cache miss.
Chapter 11, Redirectors
Redirectors are the best way to make Squid rewrite HTTP requests before forwarding
them. I describe the interface between Squid and a redirector program so that you can
write your own. I also present a few of the more popular third-party redirectors
available.
Chapter 12, Authentication Helpers
In this chapter, I explain how Squid interfaces with external authentication databases
such as LDAP, NT domain controllers, and password files. Squid comes with a number of
authentication helpers and understands Basic, Digest, and NTLM authentication
credentials. I also document the API for each, in case you want to develop your own
helper.
Chapter 13, Log Files
I cover Squid's various log files in this chapter, including access.log, store.log,
cache.log, and others. I explain what each log file contains and how you should periodically
maintain them.
Chapter 14, Monitoring Squid
This chapter provides a lot of information on monitoring Squid's operation. I cover both
SNMP and Squid's own cache manager interface. You'll find it useful for both long-term
monitoring and short-term problem diagnosis.
Chapter 15, Server Accelerator Mode
Squid's server accelerator mode is useful in a number of situations. You can use it to
boost your origin server's poor performance, as a firewall to protect the server, or even
to build your own content delivery network. I show how to set up Squid and make sure
that outsiders can't abuse your service.
Chapter 16, Debugging and Troubleshooting
The book's final chapter explains how to debug and troubleshoot problems with Squid.
You may find that some sites, or some user agents, don't work properly with Squid. I
show how to isolate and reproduce the problem and how to present the information to
Squid developers for assistance.
Appendix A, Config File Reference
This appendix is a reference guide for each of Squid's 200 configuration file directives.
Each has a description, syntax, defaults, and examples.
Appendix B, The Memory Cache
This brief appendix explains a little about Squid's memory cache.
Appendix C, Delay Pools
You can use Squid's delay pools feature to limit bandwidth consumed by web surfers. I
explain how the delay pools work and provide a number of example configurations.
Appendix D, Filesystem Performance Benchmarks
In this appendix, I present the results of numerous filesystem benchmarks. These may
help you make informed decisions regarding particular operating systems, filesystem
features, and Squid's storage techniques.
Appendix E, Squid on Windows
Have a look at this appendix if you'd like to run Squid on your Windows box. I talk about
using Cygwin and about a native port of Squid, called SquidNT.
Appendix F, Configuring Squid Clients
This appendix contains information on how to configure various user agents to use
Squid. I talk about manual configuration, environment variables, Proxy Auto-Configuration
functions, and the Web Proxy Auto Discovery protocol.
As I'm finishing up this book, the latest stable version is Squid-2.5.STABLE4, and the
development version is Squid-3.0. Perhaps the most important difference between the two is
that Squid-3 is being rewritten in C++. You should find that most things are backward-compatible,
although a few new configuration directives have been created. Please read the
release notes carefully if you use Squid-3.0 or later.
I have created a web site for the book, located at
There, you will find
errata, supplemental information, and links to online resources.
Topics Not Covered
Due to a lack of time and space, there are some topics I was unable to cover in this book; they
include:
Non-HTTP protocols
You'll find that I mostly talk about HTTP, even though Squid also supports FTP, Gopher,
and some other relatively obscure protocols.
Customizing error messages
Squid's error messages can be customized and the source distribution includes versions
of the error messages in a number of different languages. You can probably figure out
how to customize the error messages by modifying the default pages or by reading
Squid's source code.
Load balancing Squids
Load balancing is a popular way to increase the capacity of a caching service. Refer to
one of the load balancing books mentioned in the following section if necessary.
What is cachable
HTTP has a number of somewhat complicated rules for determining what may, or may
not be, cached, and for how long. Refer to Web Caching, or HTTP: The Definitive Guide
(for more information, see the next section).
Copyright
A number of nontechnical issues surround web caching. These include copyrights and
privacy.
Modifying the source
I don't go into detail about Squid's source code in this book. The Squid project hosts a
programmers' guide, which is generally incomplete and out of date. If you have
questions about the source code, please join the squid-dev mailing list.
SOCKS
Squid doesn't support the SOCKS protocol at this time.


Recommended Reading
While reading this book, you may want to consult some of these other resources for more
information (I'll refer to them throughout this book):
● The Design and Implementation of the 4.4BSD Operating System by Marshall Kirk
McKusick, Keith Bostic, Michael J. Karels, and John S. Quarterman (Addison-Wesley
Longman)
● DNS and BIND by Paul Albitz and Cricket Liu (O'Reilly & Associates)
● HTTP: The Definitive Guide by David Gourley and Brian Totty (O'Reilly)
● Load Balancing Servers, Firewalls, and Caches by Chandra Kopparapu (John Wiley &
Sons)
● Mastering Regular Expressions by Jeffrey E. F. Friedl (O'Reilly)
● Server Load Balancing by Tony Bourke (O'Reilly)
● Unix System Administration Handbook and Linux System Administration Handbook by
Evi Nemeth, Garth Snyder, Scott Seebass, and Trent R. Hein (Prentice Hall)
● My book, Web Caching (O'Reilly)
● RFC 1413: Identification Protocol
● RFC 1738: Uniform Resource Locators (URL)
● RFC 2186: Internet Cache Protocol (ICP), Version 2
● RFC 2187: Application of Internet Cache Protocol (ICP), Version 2
● RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax
● RFC 2616: Hypertext Transfer Protocol—HTTP/1.1
● RFC 2617: HTTP Authentication: Basic and Digest Access Authentication
● RFC 2756: Hypertext Caching Protocol
● RFC 2817: Upgrading to TLS Within HTTP/1.1
● RFC 3040: Internet Web Replication and Caching Taxonomy
● RFC 3143: Known HTTP Proxy/Caching Problems
● Caching-related web sites, such as and -
cache.com/


Conventions Used in This Book
I use the following typesetting conventions in this book:
Italic
Used for new terms where they are defined, buttons, pages, configuration file directives,
filenames, modules, ACLs, directories, and URI/URLs
Constant width
Used for configuration file examples, program output, HTTP header names and
directives, scripts, options, environment variables, functions, methods, rules, keywords,
libraries, and command names
Constant width italic
Used for replaceable text within examples and code pieces
Constant width bold
Used to indicate commands to be typed verbatim
When displaying a Unix command, I'll include a shell prompt, like this:
% ls -l
If the command is specific to the Bourne shell (sh) or C shell (csh), the prompt will indicate
which you should use:
sh$ ulimit -a
csh% limits
If the command requires super-user privileges, the shell prompt is a hash mark:
# make install
Occasionally, I provide configuration file examples with long lines. If the line is too wide to fit
on the page, it's wrapped around and indented. Squid doesn't accept this sort of syntax, so you
must make sure to place everything on one line.
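To illustrate the wrapping convention just described, here is a minimal Python sketch that turns wrapped-and-indented example lines back into the single-line form Squid expects. It is only a convenience for retyping examples and assumes that any line starting with whitespace continues the line before it. The function name join_wrapped_lines is hypothetical and is not part of Squid.

```python
from typing import Iterable, List


def join_wrapped_lines(lines: Iterable[str]) -> List[str]:
    """Merge indented continuation lines into the line they wrap.

    Assumption (from the book's typesetting convention, not from Squid):
    a non-blank line that begins with a space or tab continues the
    previous non-blank line.
    """
    joined: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        is_continuation = (
            line[:1] in (" ", "\t")        # starts with indentation
            and line.strip() != ""         # is not a blank line
            and bool(joined)               # there is a previous line
            and joined[-1].strip() != ""   # and that line is not blank
        )
        if is_continuation:
            # Append the wrapped text to the previous line, separated by one space.
            joined[-1] = joined[-1].rstrip() + " " + line.strip()
        else:
            joined.append(line)
    return joined
```

Blank lines and unindented lines pass through unchanged, so the helper can be run over a whole pasted example without disturbing lines that were never wrapped.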
This icon signifies a tip, suggestion, or general note.
This icon indicates a warning or caution.


Comments and Questions
Please address comments and questions concerning this book to the publisher:
O'Reilly & Associates, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
(800) 998-9938 (in the United States or Canada)
(707) 829-0515 (international or local)
(707) 829-0104 (fax)
There is a web page for this book, which lists errata, examples, and any additional information.
You can access this page at:
To comment or ask technical questions about this book, send email to:

For more information about books, conferences, Resource Centers, and the O'Reilly Network,
check the O'Reilly web site at:

You can contact the author at



Acknowledgments
Looking back at the events and people that allowed me to write this book makes me feel
extremely humble and grateful. I'm so happy to have been a part of the Harvest project with
Mike Schwartz, Peter Danzig, and the others. That led directly to my work with kc claffy and
Hans-Werner Braun at NLANR/UCSD. The Squid project would never have existed without
their support and the grant from the National Science Foundation.
I'm also very thankful for all the hard work put in by the small crew of Squid developers:
Henrik Nordström, Robert Collins, Adrian Chadd, and everyone else who has contributed time
and code to the project. And I'm sorry that you ever had to read and/or fix any ugly code I
wrote.
To all the reviewers who read the drafts—Joe Cooper, Scott Pepple, Robert Collins, and Adrian
Chadd—thanks for finding my mistakes and suggesting ways to make the book better. I also
owe so much to the people at O'Reilly for making the book possible, and for making it all come
together. My editors Tatiana Diaz and Nat Torkington, the production editor Mary Anne Mayo,
the graphic designer Melanie Wang, the illustrator, Rob Romano, the XML mungers Andrew
Savikas and Joe Wizda, and the countless other folks working behind the scenes for me.
To my good friend, and business partner, Alex Rousskov: thanks for giving me the time and
freedom to see this little project through. Finally, to the members of my new family, Annie and
Blooey, thanks for putting up with the late nights. Can I make it up to you with extra back
scratches?


Chapter 1. Introduction
This long-overdue book is about Squid: a popular open source caching proxy for the Web. With
Squid you can:
● Use less bandwidth on your Internet connection when surfing the Web
● Reduce the amount of time web pages take to load
● Protect the hosts on your internal network by proxying their web traffic
● Collect statistics about web traffic on your network
● Prevent users from visiting inappropriate web sites at work or school
● Ensure that only authorized users can surf the Internet
● Enhance your users' privacy by filtering sensitive information from web requests
● Reduce the load on your own web server(s)
● Convert encrypted (HTTPS) requests on one side, to unencrypted (HTTP) requests on
the other
Squid's job is to be both a proxy and a cache. As a proxy, Squid is an intermediary in a web
transaction. It accepts a request from a client, processes that request, and then forwards the
request to the origin server. The request may be logged, rejected, and even modified before
forwarding. As a cache, Squid stores recently retrieved web content for possible reuse later.
Subsequent requests for the same content may be served from the cache, rather than
contacting the origin server again. You can disable the caching part of Squid if you like, but the
proxying part is essential.
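As a rough illustration of the two roles described above, here is a minimal Python sketch. It is not how Squid is implemented. The class name TinyCachingProxy, the fetch_from_origin callback, and the HIT/MISS labels are all assumptions made up for this example. A real cache must also decide whether a stored response is still fresh, which this toy ignores.

```python
from typing import Callable, Dict, List, Tuple


class TinyCachingProxy:
    """Toy model of the proxy and cache roles; hypothetical, not Squid's code."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 caching: bool = True) -> None:
        # fetch_from_origin stands in for "forward the request to the origin server".
        self.fetch_from_origin = fetch_from_origin
        # The caching part is optional; the proxying part is always active.
        self.caching = caching
        self.store: Dict[str, bytes] = {}
        self.log: List[Tuple[str, str]] = []

    def handle(self, url: str) -> Tuple[str, bytes]:
        """Serve one request, from the cache if possible, else from the origin."""
        # Cache role: reuse a previously stored response for the same URL.
        if self.caching and url in self.store:
            self.log.append(("HIT", url))
            return "HIT", self.store[url]
        # Proxy role: act as the intermediary and contact the origin server.
        body = self.fetch_from_origin(url)
        if self.caching:
            self.store[url] = body
        self.log.append(("MISS", url))
        return "MISS", body
```

With caching enabled, a second request for the same URL is answered from the store and the origin is not contacted again. With caching=False, every request goes to the origin, which mirrors the point that caching can be switched off while proxying cannot.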
Figure 1-1. Squid sits between clients and servers
As Figure 1-1 shows, Squid accepts HTTP (and HTTPS) requests from clients, and speaks a
number of protocols to servers. In particular, Squid knows how to talk to HTTP, FTP, and
Gopher servers.[1] Conceptually, Squid has two "sides." The client-side talks to web clients
(e.g., browsers and user-agents); the server-side talks to HTTP, FTP, and Gopher servers. These
are called origin servers, because they are the origin location for the data they serve.
