A lightweight, non-intrusive approach to website monitoring (ops perspective)

2 points | by marksugar | 2 days ago
I'm a Linux ops engineer working in the DevOps/SRE space. Over the past few months I've been building a small *website monitoring* side project in my spare time: https://inostop.com/en/

Most of the monitoring and ops tools I've built before were used internally within companies. This is my first attempt to turn a relatively complete tool into something publicly usable.

In day-to-day operations, website monitoring usually involves:

- Infrastructure monitoring
- Application / API monitoring
- Partial CDN monitoring

These are often built on top of tools like Prometheus or Zabbix, combined with log systems (ELK / OpenObserve) and distributed tracing (OpenTelemetry). While powerful, this stack can feel *heavyweight and overkill* when you just want to quickly monitor a website's availability.

That led me to experiment with a simpler approach:

- Non-intrusive (no code changes or sidecars required)
- Out-of-band probing to estimate website availability
- Conservative thresholds to reduce false alarms

So far, the project covers:

- Domain and TLS certificate monitoring, plus Ping and Telnet checks
- Basic alert thresholds and multi-stage alert silencing to reduce alert fatigue

There are still open challenges:

- There's still room to improve the UX of the website monitoring results (the backend is written in Go).
- AI currently works only as an analysis layer on collected data, rather than actively performing real network probes.

This project is still evolving (I've rewritten parts of it more times than I'd like to admit).

If you'd like to try it out, there's an early access code *95f40841e4888668c4d5f7e88506075d*, valid for 1 month, mainly for collecting early feedback.

I'd love to hear feedback from the community:

- Does a lightweight, non-intrusive website monitoring approach make sense in practice?
- Are there better patterns or architectures worth exploring?
- If you're a QA or test engineer, I'd love to hear your thoughts.
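
To make the "out-of-band probing with conservative thresholds" idea concrete, here is a minimal sketch in Go. It is not the project's actual code: the names `Probe`, `NewClient` and `Tracker`, the 5-second timeout, the rule that any status below 500 counts as "up", and the threshold of 3 consecutive failures are all assumptions chosen for illustration.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// NewClient returns an HTTP client with a hard timeout so that a hanging
// site counts as a failed probe. The 5s value is an illustrative assumption.
func NewClient() *http.Client {
	return &http.Client{Timeout: 5 * time.Second}
}

// Probe performs one out-of-band GET against url and reports whether the
// site looked available. Treating any status below 500 as "up" is an
// assumption, not something the original post specifies.
func Probe(ctx context.Context, client *http.Client, url string) bool {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false
	}
	resp, err := client.Do(req)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode < 500
}

// Tracker (hypothetical) raises an alert only after Threshold consecutive
// failed probes, so a single dropped request does not page anyone.
type Tracker struct {
	Threshold int
	failures  int
	alerting  bool
}

// Observe records one probe result and returns "alert" when the threshold
// is first crossed, "recovered" on the first success after an alert, and
// "" otherwise.
func (t *Tracker) Observe(up bool) string {
	if up {
		t.failures = 0
		if t.alerting {
			t.alerting = false
			return "recovered"
		}
		return ""
	}
	t.failures++
	if t.failures >= t.Threshold && !t.alerting {
		t.alerting = true
		return "alert"
	}
	return ""
}

func main() {
	t := &Tracker{Threshold: 3}
	// Simulated probe results; a real loop would call Probe on a ticker.
	results := []bool{true, false, true, false, false, false, false, true}
	for _, up := range results {
		if ev := t.Observe(up); ev != "" {
			fmt.Println(ev)
		}
	}
}
```

With the simulated sequence above, the isolated failure is ignored, the alert fires on the third consecutive failure, and the next success reports recovery. The trade-off is detection latency: with a probe interval of `i` and a threshold of `n`, an outage is reported roughly `n * i` after it starts.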