ECK Elasticsearch (9.0.1) Pod Stuck - 'Running' but Never 'Ready' (Local Storage)
Hello Elasticsearch Community,
I'm facing a strange issue with a simple Elasticsearch deployment using ECK on my on-premise, bare-metal Kubernetes cluster. I'm trying to deploy Elasticsearch 9.0.1 with local storage, but the pod consistently gets stuck in the Running state without ever becoming Ready.
I have successfully deployed similar setups in the past.
This problem is consistent in both single-node and multi-node setups:
- Single-Node Cluster (count: 1): The single quickstart-es-default-0 pod starts, but never becomes Ready.
- Three-Node Cluster (count: 3): Two nodes reach the Ready state, but one node consistently remains Running (not Ready).

Here's a summary of my setup and the troubleshooting steps I've taken so far:
Setup:
- Kubernetes: On-prem self-hosted
- ECK Version: 3.0.0
- Elasticsearch Version: 9.0.1
- Storage: Local PersistentVolume on a specific node, mounted via storageClassName (a sketch of the manifests is below)
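For reference, the deployment is essentially the ECK quickstart pointed at a pre-created local PersistentVolume. A minimal sketch of the manifests I'm applying (the node name, capacity, and storageClassName value here are illustrative, not my exact values; the host path matches the one referenced further down):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage                # illustrative name
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-es-pv
spec:
  capacity:
    storage: 10Gi                    # illustrative size
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/data/test-es-pv
  nodeAffinity:                      # pins the PV to the node that owns the disk
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["worker-1"]   # placeholder node name
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 9.0.1
  nodeSets:
    - name: default
      count: 1                       # same symptom with count: 3
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data   # claim name ECK expects for the data volume
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: local-storage
            resources:
              requests:
                storage: 10Gi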
- Pod logs hang:
When tailing kubectl logs -f for the problematic pod, the output consistently stops at the following lines, and no further output is written to stdout/stderr by the container:
compressed ordinary object pointers [true]", "ecs.version":
{"@timestamp":"2025-06-04T12:31:25.645Z", "log.level": "INFO", "message":"Registered local node features [ES_V_8, ES_V_9, cluster.reroute.ignores_metric_param, cluster.stats.source_modes, linear_retriever_supported, lucene_10_1_upgrade, lucene_10_upgrade, security.queryable_built_in_roles, simulate.ignored.fields]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.features.FeatureService","elasticsearch.node.name":"quickstart-es-default-0","elasticsearch.cluster.name":"quickstart"}
{"@timestamp":"2025-06-04T12:31:25.685Z", "log.level": "INFO", "message":"Updated default factory retention to [null]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.cluster.metadata.DataStreamGlobalRetentionSettings","elasticsearch.node.name":"quickstart-es-default-0","elasticsearch.cluster.name":"quickstart"}
{"@timestamp":"2025-06-04T12:31:25.685Z", "log.level": "INFO", "message":"Updated max factory retention to [null]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.cluster.metadata.DataStreamGlobalRetentionSettings","elasticsearch.node.name":"quickstart-es-default-0","elasticsearch.cluster.name":"quickstart"}
On the pods that do become Ready, the logs continue past the point where the problematic pod hangs, e.g.:
ry.RecoverySettings","elasticsearch.node.name":"cortex-index-es-default-0","elasticsearch.cluster.name":"cortex-index"}
{"@timestamp":"2025-06-05T06:56:30.145Z", "log.level": "INFO", "message":"Registered local node features [ES_V_8, ES_V_9, cluster.reroute.ignores_metric_param, cluster.stats.source_modes, linear_retriever_supported, lucene_10_1_upgrade, lucene_10_upgrade, security.queryable_built_in_roles, simulate.ignored.fields]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.features.FeatureService","elasticsearch.node.name":"cortex-index-es-default-0","elasticsearch.cluster.name":"cortex-index"}
{"@timestamp":"2025-06-05T06:56:30.180Z", "log.level": "INFO", "message":"Updated default factory retention to [null]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.cluster.metadata.DataStreamGlobalRetentionSettings","elasticsearch.node.name":"cortex-index-es-default-0","elasticsearch.cluster.name":"cortex-index"}
{"@timestamp":"2025-06-05T06:56:30.181Z", "log.level": "INFO", "message":"Updated max factory retention to [null]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.cluster.metadata.DataStreamGlobalRetentionSettings","elasticsearch.node.name":"cortex-index-es-default-0","elasticsearch.cluster.name":"cortex-index"}
{"@timestamp":"2025-06-05T06:56:30.590Z", "log.level": "INFO", "message":"[controller/106] [Main.cc@123] controller (64 bit): Version 9.0.1 (Build 5ac89bc732bee2) Copyright (c) 2025 Elasticsearch BV", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"ml-cpp-log-tail-thread","log.logger":"org.elasticsearch.xpack.ml.process.logging.CppLogMessageHandler","elasticsearch.node.name":"cortex-index-es-default-0","elasticsearch.cluster.name":"cortex-index"}
{"@timestamp":"2025-06-05T06:56:31.041Z", "log.level": "INFO", "message":"OTel ingest plugin is enabled", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.xpack.oteldata.OTelPlugin","elasticsearch.node.name":"cortex-index-es-default-0","elasticsearch.cluster.name":"cortex-index"}
Pod User & Host Permissions:
- Elasticsearch container runs as uid=1000(elasticsearch) gid=1000(elasticsearch).
- The local storage path (/mnt/data/test-es-pv/) on the host is owned by UID 1000 / GID 1000.
- Added securityContext: { runAsUser: 1000, fsGroup: 1000 } to the pod template (see the sketch after this list).
- Readiness probe: The probe checks port 8080 (the readiness port configured in elasticsearch.yml), but it fails because Elasticsearch never gets far enough to listen on that port.
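For clarity, the securityContext mentioned above is set via the podTemplate in the nodeSet; a sketch of just that fragment (values as in my setup):

nodeSets:
  - name: default
    count: 1
    podTemplate:
      spec:
        securityContext:
          runAsUser: 1000   # matches the elasticsearch user in the container
          fsGroup: 1000     # matches the owner of /mnt/data/test-es-pv on the host

The fsGroup is there so the mounted data path ends up group-accessible to the elasticsearch user.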
I'm at a bit of a loss, as I can't find any clue in the logs as to why Elasticsearch won't start…
Am I doing something completely wrong? Is there a way to get additional logs from the pod to understand what has happened?