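Grafana Loki Single-Node Deployment Guide: A Production-Ready Setup on Alibaba Cloud OSS

This article walks you through deploying a production-ready Loki + Grafana log aggregation stack that uses Alibaba Cloud OSS as its storage backend.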

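📊 What the result looks like

Curious what querying actually feels like? See this hands-on article:
👉 Grafana Loki: Stop Talking Architecture, Let Me Show You Real Query Results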

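🚀 10-minute quick deployment (MVP)

Preparation

① Prerequisite checks

# 1. Update the package index and install the base tools (curl is used in the next step)
sudo apt update && sudo apt install -y curl wget gnupg2 software-properties-common
# 2. An OSS bucket and a RAM sub-account already exist (see the previous article)
# 3. Internal or public endpoint, e.g. oss-cn-<region>-internal.aliyuncs.com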

② Install the packages (APT repository)

# Import the GPG key and add the repository
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://apt.grafana.com/gpg.key | sudo gpg --dearmor \
  -o /etc/apt/keyrings/grafana.gpg
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] \
  https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

sudo apt update
sudo apt install -y loki promtail grafana

③ Directories and permissions

sudo mkdir -p /var/lib/loki/{index,index_cache,compactor}
sudo chown -R loki /var/lib/loki
sudo chmod 750 /var/lib/loki

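④ Configure Loki (/etc/loki/config.yml)

To make the file easier to follow, I have kept the relevant excerpts from the official documentation as comments. The appendix at the end of the article also reproduces several official examples.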

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info # use debug at first if you need to troubleshoot

common:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
  # Number of ingesters each stream is replicated to; 1 is enough on a single node.
  replication_factor: 1
  path_prefix: /var/lib/loki

schema_config:
  configs:
    - from: 2020-05-15
      store: tsdb
      object_store: alibabacloud
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
# Configures storing index in an Object Store
# (GCS/S3/Azure/Swift/COS/Filesystem) in a prometheus TSDB-like format. Required
# fields only required when TSDB is defined in config.
  tsdb_shipper:
  # Directory where ingesters would write index files which would then be
  # uploaded by shipper to configured storage
  # CLI flag: -tsdb.shipper.active-index-directory
    active_index_directory: /var/lib/loki/index
  # Cache location for restoring index files from storage for queries
  # CLI flag: -tsdb.shipper.cache-location
    cache_location: /var/lib/loki/index_cache
    cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space

  alibabacloud:
    bucket: xxxxxx
    endpoint: xxxxxxx.aliyuncs.com
    access_key_id: xxxxxxxxxx
    secret_access_key: xxxxxxxxxxx

frontend:
# Defines the encoding for requests to and responses from the scheduler and
# querier. Can be 'json' or 'protobuf' (defaults to 'json').
# CLI flag: -frontend.encoding
  encoding: protobuf
# Enable anonymous usage reporting.
# CLI flag: -reporting.enabled
analytics:
  reporting_enabled: false

compactor:
  # Directory where files can be downloaded for compaction.
  # CLI flag: -compactor.working-directory
  working_directory: /var/lib/loki/compactor
  # Interval at which to re-run the compaction operation.
  # CLI flag: -compactor.compaction-interval
  compaction_interval: 5m

querier:
  # Maximum lookback beyond which queries are not sent to ingester. 0 means all
  # queries are sent to ingester.
  # CLI flag: -querier.query-ingesters-within
  query_ingesters_within: 0s

⚠️ Pitfalls
• Do not append the bucket name to the end of the endpoint.
• If you see AccessDenied, check whether the RAM policy grants sufficient oss:* permissions.
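
The endpoint pitfall is easy to check before starting Loki. A minimal sketch follows; the helper name endpoint_ok and the sample bucket name are illustrative assumptions, and rejecting the bucket-prefixed host form goes slightly beyond the pitfall described above.

```shell
# endpoint_ok <endpoint> <bucket>
# Hypothetical helper: succeeds only if the endpoint is a bare host name.
endpoint_ok() {
  host=${1#*://}   # drop an optional scheme such as https://
  bucket=$2
  case "$host" in
    */*) return 1 ;;           # something follows the host, e.g. /my-bucket
    "$bucket".*) return 1 ;;   # assumption: also reject <bucket>.<endpoint>
  esac
  return 0
}

if endpoint_ok "oss-cn-hangzhou-internal.aliyuncs.com" "loki-logs-prod"; then
  echo "endpoint looks fine"
else
  echo "endpoint must not contain the bucket name" >&2
fi
```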

⑤ Start the services and enable them at boot

sudo systemctl enable --now loki grafana-server
# Verify
curl http://localhost:3100/ready          # returns "ready"
curl http://localhost:3000/api/health    # Grafana is alive
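
Loki may need a little time after start-up before /ready answers successfully, so a single curl can fail even though nothing is wrong. A minimal retry sketch follows; the helper name wait_until and the attempt counts are illustrative assumptions, not part of Loki.

```shell
# wait_until <attempts> <delay_seconds> <command...>
# Hypothetical helper: re-runs a command until it succeeds or attempts run out.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (assumes Loki listens on localhost:3100 as configured above):
#   wait_until 30 2 curl -fsS http://localhost:3100/ready && echo "Loki is ready"
```

The -f flag makes curl exit non-zero on an HTTP error status, which is what lets the loop tell "not ready yet" from "ready".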

⑥ Add the data source in Grafana

Open http://<ECS_IP>:3000 in a browser
Default account admin / admin → change the password immediately →
Configuration → Data Sources → Loki → set URL to http://localhost:3100 → Save & Test ✔

[Screenshot: select Loki]

[Screenshot: enter the Loki address and port]

[Screenshot: click Save & Test]
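
The same data source can also be declared in a file through Grafana's data source provisioning instead of clicking through the UI. A sketch under assumptions: the output path is a scratch location chosen for illustration, and on the APT package the file would normally be copied into /etc/grafana/provisioning/datasources/ (path assumed from the default package layout) before restarting grafana-server.

```shell
# Write a provisioning file that registers Loki as a data source.
out="${TMPDIR:-/tmp}/loki-datasource.yaml"
cat >"$out" <<'EOF'
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://localhost:3100
EOF
echo "wrote $out"
```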

Congratulations! The base environment is ready.
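
A quick end-to-end check at this point is to push one log line through Loki's HTTP push endpoint (/loki/api/v1/push) and then look for it in Grafana Explore. A minimal sketch follows; the helper name push_payload and the job label smoke-test are illustrative assumptions, and the message is inserted without JSON escaping, so it must not contain quotes or backslashes.

```shell
# push_payload <job> <timestamp_ns> <message>
# Hypothetical helper: prints a JSON body for POST /loki/api/v1/push.
push_payload() {
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}\n' "$1" "$2" "$3"
}

# Usage (assumes Loki on localhost:3100; the timestamp is Unix time in nanoseconds):
#   ts="$(date +%s)000000000"
#   push_payload smoke-test "$ts" "hello from curl" |
#     curl -fsS -H 'Content-Type: application/json' \
#       --data-binary @- http://localhost:3100/loki/api/v1/push
```

In Grafana Explore, the line should then show up under the query {job="smoke-test"}.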

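🔧 Configuration deep dive

Why this architecture?

Loki's distinctive design

  • 📊 Label-only indexing: log content is not full-text indexed, which substantially reduces storage cost
  • 🗜️ Efficient compression: logs are stored as compressed chunks; by the author's estimate, storage efficiency is more than 10× that of traditional full-text-indexing solutions
  • 🔄 Cloud native: object storage is supported natively, so scaling is straightforward

Why TSDB + Alibaba Cloud OSS

  • Since Loki 2.8, TSDB has been the officially recommended index store
  • Access to OSS over the Alibaba Cloud internal network incurs no traffic fees
  • A single-node setup is simple to operate and suits small to medium-sized applications

Storage configuration in detail

TSDB index configuration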

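tsdb_shipper:
  active_index_directory: /var/lib/loki/index      # active index directory
  cache_location: /var/lib/loki/index_cache        # index cache directory
  cache_ttl: 24h                                   # how long cached index files are kept

Why these settings

  • active_index_directory: holds the index files for the current period until they are uploaded
  • cache_location: caches historical index files downloaded from OSS to speed up queries
  • cache_ttl: balances disk usage against query performance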

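Alibaba Cloud OSS configuration

Supplying the credentials through environment variables is preferable to hard-coding them.

alibabacloud:
  bucket: loki-logs-prod                           # bucket name
  endpoint: oss-cn-hangzhou-internal.aliyuncs.com # internal endpoint
  access_key_id: LTA***                           # access key ID
  secret_access_key: ***                          # access key secret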
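
One way to keep the keys out of the file is Loki's environment variable expansion: when Loki is started with -config.expand-env=true, ${VAR} references in the configuration are replaced at start-up. A sketch under assumptions: the variable names (OSS_BUCKET and so on) and the scratch output path are hypothetical, and how the flag and variables are passed to the packaged systemd unit is not covered here.

```shell
# Write a storage fragment that references environment variables instead of
# literal secrets. The heredoc delimiter is quoted so the shell leaves ${...}
# untouched; Loki expands it when run with -config.expand-env=true.
fragment="${TMPDIR:-/tmp}/loki-storage-fragment.yml"
cat >"$fragment" <<'EOF'
storage_config:
  alibabacloud:
    bucket: ${OSS_BUCKET}
    endpoint: ${OSS_ENDPOINT}
    access_key_id: ${OSS_ACCESS_KEY_ID}
    secret_access_key: ${OSS_SECRET_ACCESS_KEY}
EOF
echo "wrote $fragment"
```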

Endpoint selection

  • Internal endpoint: oss-cn-{region}-internal.aliyuncs.com (no traffic fees)
  • Public endpoint: oss-cn-{region}.aliyuncs.com (traffic fees apply)
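
The two patterns above differ only in the -internal suffix, which a small helper can encode. This is a sketch; the function name oss_endpoint is an illustrative assumption, and it covers only the oss-cn-{region} pattern shown in this article.

```shell
# oss_endpoint <region> [internal|public]
# Hypothetical helper: prints the endpoint for the pattern described above.
oss_endpoint() {
  region=$1
  network=${2:-internal}
  case "$network" in
    internal) printf 'oss-cn-%s-internal.aliyuncs.com\n' "$region" ;;
    public)   printf 'oss-cn-%s.aliyuncs.com\n' "$region" ;;
    *) echo "unknown network type: $network" >&2; return 1 ;;
  esac
}

oss_endpoint hangzhou internal   # prints oss-cn-hangzhou-internal.aliyuncs.com
```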

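⚠️ Troubleshooting guide

Common errors and fixes

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| permission denied | The loki user lacks permissions | sudo chown -R loki /var/lib/loki |
| failed to create OSS client | OSS misconfiguration | Check the bucket, endpoint, and key settings; set the RAM permission to * |
| context deadline exceeded | Network timeout | Confirm the correct endpoint is in use (internal vs. public) |
| schema config invalid | Configuration file syntax error | Remove all # comments and check the YAML indentation |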

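Debugging tips

Enable debug mode

server:
  log_level: debug  # remember to switch back to info in production

Check the service status

# Follow the detailed logs
sudo journalctl -u loki -f

# Test the API endpoints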
curl http://localhost:3100/ready
curl http://localhost:3100/metrics
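
At debug level the journal gets noisy. Loki writes logfmt lines by default, each carrying a level= field, so the problem lines can be filtered out. A minimal sketch follows; the helper name loki_problems is an illustrative assumption.

```shell
# Keep only warn/error lines from logfmt output read on stdin.
# "|| true" keeps the exit status zero when nothing matches.
loki_problems() {
  grep -E 'level=(warn|error)' || true
}

# Usage (assumes the systemd unit is named loki, as above):
#   sudo journalctl -u loki --since "10 min ago" --no-pager | loki_problems
```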

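Network and performance issues

Bandwidth limits

  • The 99-yuan Alibaba Cloud ECS plan has little bandwidth (ECS-to-OSS traffic is not limited; the 3 Mbps public bandwidth is the bottleneck)
  • Querying 24 hours of logs may transfer several MB of data
  • Recommendation: upgrade the bandwidth or switch to pay-by-traffic bandwidth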
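
To put the 3 Mbps limit in perspective, transfer time is roughly the payload in megabits divided by the bandwidth. A small sketch of that arithmetic follows; the helper name transfer_seconds is illustrative, and it assumes 1 MB = 8 Mb and ignores protocol overhead.

```shell
# transfer_seconds <megabytes> <mbps>
# Hypothetical helper: estimated transfer time in seconds, one decimal place.
transfer_seconds() {
  awk -v mb="$1" -v mbps="$2" 'BEGIN { printf "%.1f\n", mb * 8 / mbps }'
}

transfer_seconds 10 3   # 10 MB over a 3 Mbps link, prints 26.7
```

A query result of 10 MB would therefore take close to half a minute to reach a browser over the public link.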

Storage cost optimization

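limits_config:
  retention_period: 30d  # keep logs for 30 days
compactor:
  retention_enabled: true
  retention_delete_delay: 2h
  delete_request_store: alibabacloud  # Loki 3.x requires this when retention is enabled; value assumed to match object_store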

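🔐 Hardening for production

⚠️ Important: this tutorial does not enable authentication by default. Public access is blocked through the security group instead, so the setup is suitable for internal networks only. Never expose the ports to the public internet.

Option 1: Nginx reverse proxy with authentication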

server {
    listen 80;
    server_name loki.yourdomain.com;
    
    auth_basic "Loki Access";
    auth_basic_user_file /etc/nginx/.htpasswd;
    
    location / {
        proxy_pass http://127.0.0.1:3100;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
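
The auth_basic_user_file referenced above has to exist before Nginx can authenticate anyone. One way to produce an entry without installing the htpasswd tool is openssl's apr1 hashing, which Nginx accepts. A sketch under assumptions: the helper name htpasswd_line and the sample user name are hypothetical, and openssl is assumed to be installed.

```shell
# htpasswd_line <user> <password>
# Hypothetical helper: prints one "user:hash" line for an Nginx htpasswd file.
# The password is fed on stdin so it does not appear in openssl's arguments.
htpasswd_line() {
  user=$1
  pass=$2
  hash=$(printf '%s\n' "$pass" | openssl passwd -apr1 -stdin) || return 1
  printf '%s:%s\n' "$user" "$hash"
}

# Usage (writes the file named in the server block above):
#   htpasswd_line loki 'choose-a-strong-password' | sudo tee /etc/nginx/.htpasswd >/dev/null
#   sudo nginx -t && sudo systemctl reload nginx
```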

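Option 2: Network isolation

  • Use Alibaba Cloud security groups to restrict the allowed sources
  • Allow only internal IP ranges to reach port 3100

🔜 Coming next

With the Loki environment in place, the next step is getting applications to send their logs to Loki.

📋 Appendix: official configuration examples

To make the configuration differences between scenarios easier to compare, here are several official configuration examples for reference:

Local filesystem configuration example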


# This is a complete configuration to deploy Loki backed by the filesystem.
# The index will be shipped to the storage via tsdb-shipper.

auth_enabled: false

server:
  http_listen_port: 3100

common:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /tmp/loki

schema_config:
  configs:
  - from: 2020-05-15
    store: tsdb
    object_store: filesystem
    schema: v13
    index:
      prefix: index_
      period: 24h

storage_config:
  filesystem:
    directory: /tmp/loki/chunks

Official S3-compatible storage example


# This is a complete configuration to deploy Loki backed by a s3-compatible API
# like MinIO for storage.
# Index files will be written locally at /loki/index and, eventually, will be shipped to the storage via tsdb-shipper.

auth_enabled: false

server:
  http_listen_port: 3100

common:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /loki

schema_config:
  configs:
  - from: 2020-05-15
    store: tsdb
    object_store: s3
    schema: v13
    index:
      prefix: index_
      period: 24h

storage_config:
 tsdb_shipper:
   active_index_directory: /loki/index
   cache_location: /loki/index_cache
 aws:
   s3: s3://access_key:secret_access_key@custom_endpoint/bucket_name
   s3forcepathstyle: true

S3 expanded configuration


# S3 configuration supports an expanded configuration.
# Either an `s3` endpoint URL can be used, or an expanded configuration can be used.

storage_config:
  aws:
    bucketnames: bucket_name1, bucket_name2
    endpoint: s3.endpoint.com
    region: s3_region
    access_key_id: s3_access_key_id
    secret_access_key: s3_secret_access_key
    insecure: false
    http_config:
      idle_conn_timeout: 90s
      response_header_timeout: 0s
      insecure_skip_verify: false
    s3forcepathstyle: true
    

Alibaba Cloud OSS storage configuration


# This partial configuration uses Alibaba for chunk storage.
common:
  path_prefix: /tmp/loki

schema_config:
  configs:
  - from: 2020-05-15
    store: tsdb
    object_store: alibabacloud
    schema: v13
    index:
      prefix: index_
      period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
  alibabacloud:
    bucket: <bucket>
    endpoint: <endpoint>
    access_key_id: <access_key_id>
    secret_access_key: <secret_access_key>
