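A lightweight, non-intrusive approach to website monitoring (an ops perspective)
I'm a Linux ops engineer working in the DevOps/SRE space. Over the past few months, I've been building a small *website monitoring* side project in my spare time: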
https://inostop.com/en/
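Most of the monitoring and ops tools I've built before were used internally within companies. This is my first attempt to turn a relatively complete tool into something publicly usable.
In day-to-day operations, website monitoring usually involves:
- Infrastructure monitoring
- Application / API monitoring
- Partial CDN monitoring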
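These are often built on top of tools like Prometheus or Zabbix, combined with log systems (ELK / OpenObserve) and distributed tracing (OpenTelemetry). While powerful, this stack can feel *heavyweight and overkill* when you just want to quickly monitor a website's availability.
That led me to experiment with a simpler approach:
- Non-intrusive (no code changes and no sidecar required)
- Out-of-band probing to estimate website availability
- Conservative thresholds to reduce false alarms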
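To make the idea concrete, here is a minimal Go sketch of an out-of-band probe combined with a conservative threshold. It is an illustration only, not the project's actual code: the names `probeHTTP`, `newProbeClient`, and `Detector`, the 5-second timeout, and the thresholds (3 failures to go down, 2 successes to recover) are all assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newProbeClient returns an HTTP client with a hard timeout so a hung
// server counts as a failed probe instead of blocking the loop.
func newProbeClient() *http.Client {
	return &http.Client{Timeout: 5 * time.Second}
}

// probeHTTP performs one out-of-band check: a plain GET issued from outside
// the monitored system. It reports true when the final response (after any
// redirects the client follows) has a status code below 400.
func probeHTTP(client *http.Client, url string) bool {
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode < 400
}

// Detector changes state only after several consecutive results that
// contradict the current state, which filters out one-off blips.
type Detector struct {
	FailThreshold    int // consecutive failures needed to mark the site down
	RecoverThreshold int // consecutive successes needed to mark it up again
	down             bool
	streak           int // consecutive results contradicting the current state
}

// Observe records one probe result and returns true if the state changed.
func (d *Detector) Observe(ok bool) bool {
	if ok != d.down { // result agrees with the current state
		d.streak = 0
		return false
	}
	d.streak++
	limit := d.FailThreshold
	if d.down {
		limit = d.RecoverThreshold
	}
	if d.streak < limit {
		return false
	}
	d.down = !d.down
	d.streak = 0
	return true
}

// Down reports whether the detector currently considers the site down.
func (d *Detector) Down() bool { return d.down }

func main() {
	d := &Detector{FailThreshold: 3, RecoverThreshold: 2}
	// Simulated probe results; in a real loop each value would come from
	// probeHTTP(newProbeClient(), url) on a ticker.
	results := []bool{true, false, true, false, false, false, true, true}
	for i, ok := range results {
		if d.Observe(ok) {
			fmt.Printf("probe %d: state changed, down=%v\n", i, d.Down())
		}
	}
}
```

With these thresholds, the single failure at index 1 is ignored. Only the run of three failures flips the state, and two successes in a row are needed before the site counts as recovered. The trade-off is slower detection in exchange for fewer false alarms.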
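So far, the project covers:
- Domain and TLS certificate monitoring, plus Ping and Telnet checks
- Basic alert thresholds and multi-stage alert silencing to reduce alert fatigue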
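There are still open challenges:
- The UX of the website monitoring results still has room to improve (the backend is written in Go).
- AI currently works only as an analysis layer on collected data, rather than actively performing real network probes.
This project is still evolving (I've rewritten parts of it more times than I'd like to admit).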
If you'd like to try it out, there's an early access code *95f40841e4888668c4d5f7e88506075d*, valid for 1 month, mainly for collecting early feedback.
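I'd love to hear feedback from the community:
- Does a lightweight, non-intrusive website monitoring approach make sense in practice?
- Are there better patterns or architectures worth exploring?
- If you're a QA or test engineer, I'd love to hear your thoughts.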