Elasticsearch Core Technology (Part 1): Indices

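1. Definition of an index
2. Index naming rules
3. Creating, deleting, updating, and querying
- 3.1 Creating an index
  - 3.1.1 Creating an empty index
- 3.2 Deleting an index
- 3.3 Document operations
  - 3.3.1 Adding or replacing a document (explicit ID)
  - 3.3.2 Adding a document (auto-generated ID)
  - 3.3.3 Updating a document (partial update)
  - 3.3.4 Deleting a document
- 3.4 Query operations
  - 3.4.1 Getting a single document
  - 3.4.2 Searching documents (simple query)
  - 3.4.3 Compound queries
  - 3.4.4 Aggregations
- 3.5 Bulk operations
  - 3.5.1 Bulk indexing documents
  - 3.5.2 Bulk update and delete
4. Index aliases
- 4.1 What is an index alias?
- 4.2 What index aliases are for
- 4.3 How to use them
  - 4.3.1 Creating an alias
  - 4.3.2 Pointing one alias at multiple indices
  - 4.3.3 Switching an alias (atomic operation)
  - 4.3.4 Filtered aliases
  - 4.3.5 Viewing alias information
- 4.4 Example scenarios
  - Scenario 1: Rebuilding an index with zero-downtime switchover
  - Scenario 2: Querying across partitioned indices
- 4.5 Caveats
- 4.6 A real-world example
  - Managing product indices for an e-commerce platform
5. Index templates
- 5.1 What is an index template?
- 5.2 Problems index templates solve
- 5.3 Template types
- 5.4 How to use them
  - 5.4.1 Creating a simple index template
  - 5.4.2 Using component templates (more modular)
- 5.5 Real-world examples
  - Case 1: Order indices for an e-commerce platform
  - Case 2: Multi-tenant SaaS application
- 5.6 Caveats
- 5.7 Best practices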

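1. Definition of an index

An index is a collection of documents that share the same data structure, identified by a unique index name. A cluster usually holds many indices, and different indices represent different kinds of business data. For example:

Store data collected from different business lines in different indices
- Weibo data: weibo_index
- News data: news_index
Split log indices by date
- Logs for July 2024 go to logs_202407
- Logs for August 2024 go to logs_202408

For example, the following command creates an index named index_00001.

PUT index_00001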
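The date-based naming scheme above is easy to generate in application code. The following is a minimal sketch, assuming a monthly suffix in `YYYYMM` form as in `logs_202407`; the helper name `monthly_index_name` is made up for illustration.

```python
from datetime import date


def monthly_index_name(prefix: str, day: date) -> str:
    """Return the index name for the month that contains `day`."""
    # %Y%m gives a four-digit year followed by a two-digit month.
    return f"{prefix}_{day:%Y%m}"


# Two dates in different months map to two different indices.
print(monthly_index_name("logs", date(2024, 7, 15)))
print(monthly_index_name("logs", date(2024, 8, 1)))
```

Keeping the name calculation in one function means writers and readers of the logs agree on which index a given day belongs to.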

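2. Index naming rules

✅ Index names must follow these rules:

Only lowercase letters are allowed; uppercase letters are not
They cannot contain \ / * ? " < > | , # : or spaces
They cannot start with - _ +
They cannot be . or ..
They cannot be longer than 255 bytes
Chinese names are discouraged

❌ Invalid or discouraged names

Not allowed: PUT INDEX_0002
Not allowed: PUT _index_0003
Not allowed: PUT index?_0004
Not allowed: PUT ..
Discouraged: PUT 索引0006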
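The rules above can be checked on the client before a request is ever sent. The sketch below is only an illustration of those rules as listed in this section, not Elasticsearch's own validation code; the function name `index_name_problems` and the wording of the messages are assumptions.

```python
# Characters the naming rules above forbid anywhere in an index name.
_FORBIDDEN = set('\\/*?"<>| ,#:')


def index_name_problems(name: str) -> list[str]:
    """Return the list of rules that `name` violates (empty if none)."""
    problems = []
    if name in (".", ".."):
        problems.append("must not be . or ..")
    if name != name.lower():
        problems.append("must be lowercase")
    if any(ch in _FORBIDDEN for ch in name):
        problems.append("contains a forbidden character")
    if name[:1] in ("-", "_", "+"):
        problems.append("must not start with -, _ or +")
    # The limit is counted in bytes, so multi-byte characters use it up faster.
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems


for candidate in ["index_00001", "INDEX_0002", "_index_0003", "index?_0004", ".."]:
    print(candidate, index_name_problems(candidate))
```

A name such as 索引0006 passes this check, because the rules only discourage it rather than forbid it.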

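3. Creating, deleting, updating, and querying

The examples below cover the most common index and document operations in Elasticsearch. Adjust the query conditions and parameters to fit your own needs.

3.1 Creating an index

3.1.1 Creating an empty index

PUT /my_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "description": { "type": "text" },
      "price": { "type": "float" },
      "created_at": { "type": "date" }
    }
  }
}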

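3.2 Deleting an index

DELETE /my_index

3.3 Document operations

3.3.1 Adding or replacing a document (explicit ID)

If a document with ID 1 already exists, this request replaces it entirely.

PUT /my_index/_doc/1
{
  "title": "Elasticsearch Guide",
  "description": "A comprehensive guide to Elasticsearch",
  "price": 49.99,
  "created_at": "2023-01-15"
}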

3.3.2 Adding a document (auto-generated ID)

POST /my_index/_doc
{
  "title": "Learning Elasticsearch",
  "description": "Beginner's guide to Elasticsearch",
  "price": 29.99,
  "created_at": "2023-02-20"
}

3.3.3 Updating a document (partial update)

POST /my_index/_update/1
{
  "doc": {
    "price": 39.99
  }
}
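A partial update merges the fields under `doc` into the stored document, whereas the `PUT` in 3.3.1 replaces the whole document. The sketch below imitates that merge locally on plain dictionaries to show the difference. It does not talk to Elasticsearch, and `apply_partial_update` is a hypothetical helper, not part of any client library.

```python
import copy


def apply_partial_update(source: dict, partial: dict) -> dict:
    """Return a new document with `partial` merged into `source`."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = apply_partial_update(merged[key], value)
        else:
            # Scalars and arrays simply replace the old value.
            merged[key] = copy.deepcopy(value)
    return merged


doc = {
    "title": "Elasticsearch Guide",
    "description": "A comprehensive guide to Elasticsearch",
    "price": 49.99,
    "created_at": "2023-01-15",
}
updated = apply_partial_update(doc, {"price": 39.99})
print(updated)
```

Only `price` changes; the other fields survive, which is exactly what a full replacement with the same body would not guarantee.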

3.3.4 Deleting a document

DELETE /my_index/_doc/1

3.4 Query operations

3.4.1 Getting a single document

GET /my_index/_doc/1

3.4.2 Searching documents (simple query)

GET /my_index/_search
{
  "query": {
    "match": {
      "title": "guide"
    }
  }
}

3.4.3 Compound queries

GET /my_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "guide" } }
      ],
      "filter": [
        { "range": { "price": { "lte": 50 } } }
      ]
    }
  },
  "sort": [
    { "created_at": { "order": "desc" } }
  ],
  "from": 0,
  "size": 10
}
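In the request above, the clause under `must` contributes to the relevance score, while the clause under `filter` only includes or excludes documents. Applications usually build this body from parameters. The following is a rough sketch of such a builder; the function name and its page-based interface are assumptions for illustration, and sending the body is left to whichever HTTP client you use.

```python
import json


def build_search_body(text: str, max_price: float, page: int = 0, page_size: int = 10) -> dict:
    """Build a bool query: full-text match on title, price ceiling as a filter."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [{"range": {"price": {"lte": max_price}}}],
            }
        },
        "sort": [{"created_at": {"order": "desc"}}],
        # "from" is an offset in documents, so page N starts at N * page_size.
        "from": page * page_size,
        "size": page_size,
    }


first_page = build_search_body("guide", 50)
third_page = build_search_body("guide", 50, page=2)
print(json.dumps(first_page, indent=2))
```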

3.4.4 Aggregations

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": { "field": "price" }
    },
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 20 },
          { "from": 20, "to": 50 },
          { "from": 50 }
        ]
      }
    }
  }
}
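To make the two aggregations concrete, here is a small local imitation of what they compute. It is a sketch on an in-memory list, not a call to Elasticsearch, and the sample prices are invented. The one detail worth noticing is that in a `range` aggregation `from` is inclusive and `to` is exclusive, so a price of exactly 50 lands in the last bucket.

```python
from statistics import mean


def range_buckets(values, ranges):
    """Count values per range; 'from' is inclusive, 'to' is exclusive."""
    buckets = []
    for r in ranges:
        lo = r.get("from", float("-inf"))
        hi = r.get("to", float("inf"))
        count = sum(1 for v in values if lo <= v < hi)
        buckets.append({"from": r.get("from"), "to": r.get("to"), "doc_count": count})
    return buckets


# Invented sample data standing in for the price field of six documents.
prices = [20.0, 29.99, 45.50, 49.99, 50.0, 59.99]
ranges = [{"to": 20}, {"from": 20, "to": 50}, {"from": 50}]

avg_price = mean(prices)
buckets = range_buckets(prices, ranges)
print(avg_price)
print(buckets)
```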

3.5 Bulk operations

3.5.1 Bulk indexing documents

POST /my_index/_bulk
{ "index": { "_id": "2" } }
{ "title": "Advanced Elasticsearch", "description": "For experienced users", "price": 59.99, "created_at": "2023-03-10" }
{ "index": { "_id": "3" } }
{ "title": "Elasticsearch Cookbook", "description": "Practical recipes", "price": 45.50, "created_at": "2023-04-05" }
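The bulk body is newline-delimited JSON: one action line, then (for `index`) one source line, and the body has to end with a newline. Building it by hand is error-prone, so the sketch below generates it from a dictionary. The helper name `bulk_index_body` is an assumption; when sending the result yourself, the request would normally carry the `Content-Type: application/x-ndjson` header.

```python
import json


def bulk_index_body(docs_by_id: dict) -> str:
    """Serialize {id: source} pairs into an NDJSON body for POST /<index>/_bulk."""
    lines = []
    for doc_id, source in docs_by_id.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # source line
    # The bulk API requires the final line to be terminated by a newline.
    return "\n".join(lines) + "\n"


body = bulk_index_body({
    "2": {"title": "Advanced Elasticsearch", "description": "For experienced users",
          "price": 59.99, "created_at": "2023-03-10"},
    "3": {"title": "Elasticsearch Cookbook", "description": "Practical recipes",
          "price": 45.50, "created_at": "2023-04-05"},
})
print(body, end="")
```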

3.5.2 Bulk update and delete

POST /my_index/_bulk
{ "update": { "_id": "2" } }
{ "doc": { "price": 55.99 } }
{ "delete": { "_id": "3" } }

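4. Index aliases

4.1 What is an index alias?

An index alias is a virtual name in Elasticsearch that points to one or more indices. It works like a pointer or shortcut: you refer to the underlying indices through the alias instead of using their real names.

4.2 What index aliases are for

Simpler index management: give complex index names a short alias.
Seamless index switching: change the underlying index without changing application code.
Zero-downtime maintenance: rebuild an index without interrupting queries.
Grouped queries: query several indices through a single alias.
Access control: give different users different aliases onto the same index.
Index lifecycle strategies: for example, a hot-warm-cold architecture.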

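4.3 How to use them

4.3.1 Creating an alias

# Create an alias for a single index
POST /_aliases
{
  "actions": [
    { "add": { "index": "products_2023", "alias": "current_products" } }
  ]
}

4.3.2 Pointing one alias at multiple indices

POST /_aliases
{
  "actions": [
    { "add": { "index": "products_2023_q1", "alias": "all_products" } },
    { "add": { "index": "products_2023_q2", "alias": "all_products" } }
  ]
}

4.3.3 Switching an alias (atomic operation)

All actions in a single _aliases request are applied atomically, so there is no moment at which the alias is missing or points to both indices.

POST /_aliases
{
  "actions": [
    { "remove": { "index": "products_2023", "alias": "current_products" } },
    { "add": { "index": "products_2024", "alias": "current_products" } }
  ]
}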

4.3.4 Filtered aliases

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "products",
        "alias": "high_value_products",
        "filter": {
          "range": { "price": { "gte": 1000 } }
        }
      }
    }
  ]
}

4.3.5 Viewing alias information

GET /_alias/current_products
GET /products_2023/_alias
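4.4 Example scenarios

Scenario 1: Rebuilding an index with zero-downtime switchover

Create the new index and load data into it

PUT /products_2024_v2
{
  "settings": { /* new settings */ },
  "mappings": { /* new mappings */ }
}
# Load data into the new index...

Atomically switch the alias

POST /_aliases
{
  "actions": [
    { "remove": { "index": "products_2024_v1", "alias": "current_products" } },
    { "add": { "index": "products_2024_v2", "alias": "current_products" } }
  ]
}

Scenario 2: Querying across partitioned indices

# Create one index per month
PUT /logs_2023-01
PUT /logs_2023-02
PUT /logs_2023-03

# Create an alias that covers all of them
# (the wildcard is resolved when this request runs; indices created later are not added automatically)
POST /_aliases
{
  "actions": [
    { "add": { "index": "logs_2023-*", "alias": "logs_2023" } }
  ]
}

# Query every monthly index through the alias
GET /logs_2023/_search
{
  "query": {
    "match_all": {}
  }
}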

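4.5 Caveats

Performance
- When an alias points to multiple indices, a query fans out to all of them
- Too many indices behind one alias can slow queries down
Write restrictions
- An alias accepts writes when it points to exactly one index
- An alias that points to multiple indices is read-only unless one of them is marked with is_write_index: true
Filtered aliases
- The filter is applied to every search through the alias, which adds some cost
- Complex filters can hurt performance
Relationship between aliases and indices
- Deleting an index removes its alias entries too; an alias that pointed only to that index disappears
- An alias cannot exist on its own; it must point to at least one index
Access control
- Give applications access to the alias rather than the underlying indices
- Aliases can be used to isolate data access
Monitoring and maintenance
- Check alias configuration regularly
- An alias cannot point to another alias, and its name must not clash with an existing index name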
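On the write restriction: Elasticsearch lets one index behind a multi-index alias be designated as the write target through the `is_write_index` flag on the `add` action. The following sketch builds such an `_aliases` body; the helper name is hypothetical and the index names are borrowed from 4.3.2 purely as an example.

```python
import json


def alias_actions_with_write_index(alias: str, indices: list[str], write_index: str) -> dict:
    """Build an _aliases body in which exactly one index receives writes."""
    if write_index not in indices:
        raise ValueError(f"{write_index!r} is not one of the aliased indices")
    return {
        "actions": [
            # Only the chosen index gets is_write_index: true; the rest are read-only members.
            {"add": {"index": idx, "alias": alias, "is_write_index": idx == write_index}}
            for idx in indices
        ]
    }


alias_body = alias_actions_with_write_index(
    "all_products", ["products_2023_q1", "products_2023_q2"], "products_2023_q2"
)
print(json.dumps(alias_body, indent=2))
```

Searches through `all_products` still cover both indices, while documents indexed through the alias go to `products_2023_q2`.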

4.6 A real-world example

Managing product indices for an e-commerce platform

Initial setup

PUT /products_v1
POST /_aliases
{
  "actions": [
    { "add": { "index": "products_v1", "alias": "products" } }
  ]
}

Application code always uses the products alias

# Query
GET /products/_search

# Write
POST /products/_doc

When the index needs to be rebuilt

# Create the new index
PUT /products_v2
{
  "settings": {
    "number_of_shards": 5
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "price": { "type": "scaled_float", "scaling_factor": 100 }
    }
  }
}

# Load data into the new index...

# Atomic switch
POST /_aliases
{
  "actions": [
    { "remove": { "index": "products_v1", "alias": "products" } },
    { "add": { "index": "products_v2", "alias": "products" } }
  ]
}

# Optional: delete the old index
DELETE /products_v1

This way the application switches to the new index without any code changes, which gives a zero-downtime index migration.
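The migration can be written down as an ordered list of requests, which makes it easy to review or replay. The sketch below is one possible shape, under the assumption that the "load data" step is done with the `_reindex` API; the article does not say how the data is loaded, so treat that step as a placeholder. The function name `migration_requests` is made up.

```python
import json


def migration_requests(alias: str, old_index: str, new_index: str, new_index_body: dict) -> list:
    """Return (method, path, body) tuples for a create / copy / switch migration."""
    return [
        # 1. Create the new index with its new settings and mappings.
        ("PUT", f"/{new_index}", new_index_body),
        # 2. Copy documents across (assumption: _reindex is used for the load step).
        ("POST", "/_reindex", {"source": {"index": old_index}, "dest": {"index": new_index}}),
        # 3. Move the alias in a single atomic _aliases call.
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]}),
    ]


plan = migration_requests(
    "products", "products_v1", "products_v2",
    {"settings": {"number_of_shards": 5}},
)
for method, path, body in plan:
    print(method, path, json.dumps(body))
```

Deleting the old index is left out on purpose, so that a rollback stays possible until the new index has been verified.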

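5. Index templates

5.1 What is an index template?

An index template is a mechanism in Elasticsearch that automatically applies predefined configuration (settings, mappings, and aliases) to newly created indices. When the name of a new index matches a pattern defined in a template, Elasticsearch applies that template's configuration to the index.

5.2 Problems index templates solve

Standardized configuration: make sure indices follow one agreed structure.
Automated index creation: less manual work and fewer configuration mistakes.
Index management at scale: simplify the handling of large numbers of similar indices.
Dynamic index scenarios: handle indices split by time or business line (such as logs and time-series data).
Consistency: every matching index gets the same settings and mappings.

5.3 Template types

Elasticsearch 7.8+ supports two kinds of templates:

Index templates (composable templates, managed through _index_template): applied to new indices whose names match index_patterns.
Component templates (managed through _component_template): reusable building blocks that index templates reference through composed_of.

The older legacy templates (the _template API) are deprecated in favor of these two.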
5.4 How to use them

5.4.1 Creating a simple index template

PUT /_index_template/logs_template
{
  "index_patterns": ["logs-*"],  // matches every index whose name starts with logs-
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs_policy"  // attach an ILM policy
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "level": { "type": "keyword" },
        "message": { "type": "text" },
        "service": { "type": "keyword" }
      }
    },
    "aliases": {
      "all_logs": {}  // add this alias to every matching index
    }
  },
  "priority": 200,  // higher values take precedence
  "version": 1,
  "_meta": {
    "description": "Log index template"
  }
}

5.4.2 Using component templates (more modular)

# Component template 1: basic settings
PUT /_component_template/logs_settings
{
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs_policy"
    }
  }
}

# Component template 2: log mappings
PUT /_component_template/logs_mappings
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "level": { "type": "keyword" },
        "message": { "type": "text" }
      }
    }
  }
}

# Combine the component templates into an index template
# (if logs_template from 5.4.1 is still installed, give this one a different priority:
#  two templates with overlapping patterns cannot share the same priority)
PUT /_index_template/logs_composite_template
{
  "index_patterns": ["logs-*"],
  "composed_of": ["logs_settings", "logs_mappings"],
  "priority": 200,
  "version": 2
}
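When an index template lists several component templates, they are merged in the order given in `composed_of`, and anything in the index template's own `template` section is applied last. The sketch below imitates that resolution locally with a simple recursive merge. It is a simplification (for instance, it does not normalize dotted setting names such as `index.lifecycle.name`), and the inline `number_of_replicas` override is a made-up addition to show the precedence.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`; override wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_template(index_template: dict, component_templates: dict) -> dict:
    """Merge components in composed_of order, then the template's own section."""
    resolved = {}
    for name in index_template.get("composed_of", []):
        resolved = deep_merge(resolved, component_templates[name]["template"])
    return deep_merge(resolved, index_template.get("template", {}))


component_templates = {
    "logs_settings": {"template": {"settings": {
        "number_of_shards": 3, "number_of_replicas": 1,
        "index.lifecycle.name": "logs_policy"}}},
    "logs_mappings": {"template": {"mappings": {"properties": {
        "@timestamp": {"type": "date"},
        "level": {"type": "keyword"},
        "message": {"type": "text"}}}}},
}
index_template = {
    "index_patterns": ["logs-*"],
    "composed_of": ["logs_settings", "logs_mappings"],
    # Hypothetical inline override, not part of the article's template.
    "template": {"settings": {"number_of_replicas": 2}},
    "priority": 200,
}

resolved = resolve_template(index_template, component_templates)
print(resolved["settings"])
```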

5.5 Real-world examples

Case 1: Order indices for an e-commerce platform

Requirement: create a new index every day to store order data, and make sure all order indices share the same structure

PUT /_index_template/orders_daily_template
{
  "index_patterns": ["orders-*"],
  "template": {
    "settings": {
      "number_of_shards": 5,
      "number_of_replicas": 2,
      "index.refresh_interval": "30s"
    },
    "mappings": {
      "properties": {
        "order_id": { "type": "keyword" },
        "customer_id": { "type": "keyword" },
        "order_date": { "type": "date" },
        "amount": { "type": "double" },
        "items": {
          "type": "nested",
          "properties": {
            "product_id": { "type": "keyword" },
            "quantity": { "type": "integer" },
            "price": { "type": "double" }
          }
        }
      }
    },
    "aliases": {
      "current_orders": {}
    }
  },
  "priority": 100
}
# The application creates one index per day, named by date (for example orders-2023-10-01)
# Because the name matches orders-*, the template configuration is applied automatically

Case 2: Multi-tenant SaaS application

Requirement: give each customer a separate index while keeping one common structure

PUT /_index_template/tenant_data_template
{
  "index_patterns": ["tenant_*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "analysis": {
        "analyzer": {
          "tenant_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase"]
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "tenant_id": { "type": "keyword" },
        "content": {
          "type": "text",
          "analyzer": "tenant_analyzer"
        },
        "created_at": { "type": "date" }
      }
    }
  }
}
# When indices such as tenant_acme or tenant_xyz are created, the template is applied automatically

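5.6 Caveats

Priority conflicts
- Several templates can match the same index name.
- Use the priority field to decide which one wins (higher values take precedence).
- Give overlapping patterns explicitly different priorities.
When templates are applied
- Only at index creation time.
- Changes to a template, or to an existing index, never re-apply the template to indices that already exist.
Version control
- Track template versions with the version field.
- Increment the version number whenever the template changes.
Index pattern design
- Patterns should be specific enough to avoid accidental matches.
- For example, logs-app-* is more specific than logs-*.
Advantages of component templates
- Recommended on 7.8+.
- They make configuration easier to reuse.
- They are easier to maintain and update.
System templates
- Elasticsearch ships built-in system templates (such as .monitoring-*).
- Avoid clashing with them.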
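Composable index templates are not merged with each other: among the templates whose patterns match a new index name, the one with the highest `priority` is used, and a template without a priority is treated as priority 0. The sketch below imitates that selection locally. `logs_app_template` is a hypothetical template added for contrast, and `fnmatchcase` is only an approximation of Elasticsearch's `*` wildcard matching.

```python
from fnmatch import fnmatchcase


def pick_template(index_name: str, templates: dict):
    """Return the name of the highest-priority template matching `index_name`, or None."""
    matching = [
        (name, tpl) for name, tpl in templates.items()
        # fnmatchcase also treats ? and [] specially; only * is used in this example.
        if any(fnmatchcase(index_name, pattern) for pattern in tpl["index_patterns"])
    ]
    if not matching:
        return None
    return max(matching, key=lambda item: item[1].get("priority", 0))[0]


templates = {
    "logs_template": {"index_patterns": ["logs-*"], "priority": 200},
    # Hypothetical, more specific template with a higher priority.
    "logs_app_template": {"index_patterns": ["logs-app-*"], "priority": 300},
    "orders_daily_template": {"index_patterns": ["orders-*"], "priority": 100},
}

for index_name in ["logs-app-2023.10.01", "logs-db-001", "orders-2023-10-01", "metrics-001"]:
    print(index_name, "->", pick_template(index_name, templates))
```

Because the more specific pattern carries the higher priority, `logs-app-*` indices get their own template while every other `logs-*` index falls back to the general one.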

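Viewing and managing templates

# List all templates
GET /_index_template

# Show one template
GET /_index_template/<template_name>

# Delete a template
DELETE /_index_template/<template_name>

Testing and validation:

Simulate index creation to see what a template would produce.

# Resolve the configuration that an index named logs-test-001 would get from the installed templates
POST /_index_template/_simulate_index/logs-test-001

# Or try out a template definition without installing it
POST /_index_template/_simulate
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 2
    }
  }
}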

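5.7 Best practices

Naming
- Give templates descriptive names (such as ecommerce_orders_template).
- Use clear prefixes and suffixes in index patterns (such as metric-*-prod).

Documentation

Use the _meta field to record what the template is for and how it has changed.

"_meta": {
  "description": "Stores all product logs",
  "created_by": "data_team",
  "version": "1.1"
}

Version control
- Keep template definitions in a version control system.
- Set up a CI/CD pipeline for templates.
Monitoring template usage
- Regularly check which indices picked up which templates.
- Monitor how templates are matching.

Combining with ILM policies

"settings": {
  "index.lifecycle.name": "hot_warm_cold_policy",
  "index.lifecycle.rollover_alias": "logs_alias"
}

Used well, index templates make an Elasticsearch cluster noticeably easier to manage, keep index configuration consistent, and automate the handling of dynamically created indices.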