About the Book
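Web applications involve many specialists, and web operations staff must make sure every part of an application works properly throughout its entire life cycle. This is the expertise you need when a startup is hit by an unexpected traffic spike, or when a new feature breaks a mature application. In this collection of essays and interviews, web operations veterans Theo Schlossnagle, Baron Schwartz, and Alistair Croll share their insights into this fast-changing field. You will also learn what it takes to make a website thrive, with firsthand accounts from the builders of large-scale websites.
· Learn web operations skills, and understand why they come from experience rather than schooling
· Understand why it is important to collect metrics from both the application and the infrastructure
· Consider common approaches to database architecture and the pitfalls that come with growing scale
· Learn how to handle the human factors involved in outages and degradations
· Find ways to avoid disaster after a sudden flood of traffic
· Work out what went wrong after a problem occurs, and keep it from happening again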
Contents
foreword
preface
1 web operations: the career
theo schlossnagle
why does web operations have it tough?
from apprentice to master
conclusion
2 how picnik uses cloud computing: lessons learned
justin huff
where the cloud fits (and why!)
where the cloud doesn't fit (for picnik)
conclusion
3 infrastructure and application metrics
john allspaw, with matt massie
time resolution and retention concerns
locality of metrics collection and storage
layers of metrics
providing context for anomaly detection and alerts
log lines are metrics, too
correlation with change management and incident timelines
making metrics available to your alerting mechanisms
using metrics to guide load-feedback mechanisms
a metrics collection system, illustrated: ganglia
conclusion
4 continuous deployment
eric ries
small batches mean faster feedback
small batches mean problems are instantly localized
small batches reduce risk
small batches reduce overhead
the quality defenders' lament
getting started
continuous deployment is for mission-critical applications
conclusion
5 infrastructure as code
adam jacob
service-oriented architecture
conclusion
6 monitoring
patrick debois
story: "the start of a journey"
step 1: understand what you are monitoring
step 2: understand normal behavior
step 3: be prepared and learn
conclusion
7 how complex systems fail
john allspaw and richard cook
how complex systems fail
further reading
8 community management and web operations
heather champ and john allspaw
9 dealing with unexpected traffic spikes
brian moon
how it all started
alarms abound
putting out the fire
surviving the weekend
preparing for the future
cdn to the rescue
proxy servers
corralling the stampede
streamlining the codebase
how do we know it works?
the real test
lessons learned
improvements since then
10 dev and ops collaboration and cooperation
paul hammond
deployment
shared, open infrastructure
trust
on-call developers
avoiding blame
conclusion
11 how your visitors feel: user-facing metrics
alistair croll and sean power
why collect user-facing metrics?
what makes a site slow?
measuring delay
building an sla
visitor outcomes: analytics
other metrics marketing cares about
how user experience affects web ops
the future of web monitoring
conclusion
12 relational database strategy and tactics for the web
baron schwartz
requirements for web databases
how typical web databases grow
the yearning for