Getting Started with Elasticsearch: Performance Tips, Configuration, and Minimum Hardware Requirements?

Hello everyone,

I’m developing an enterprise cybersecurity project focused on Internet-wide scanning, similar to Shodan or Censys, aimed at mapping exposed infrastructure (services, ports, domains, certificates, ICS/SCADA, etc.). Data collection is continuous, and the system needs to support an average ingestion rate of 1 TB per day.

I recently started implementing Elasticsearch as the fast indexing layer for direct search. The idea is to use it for simple and efficient queries, with data organized approximately as follows:

IP → identified ports and services, banners (HTTP, TLS, SSH), status
Domain → resolved IPs, TLS status, DNS records
Port → listening services and fingerprints
Cert_sha256 → list of hosts sharing the same certificate
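To make the IP entity concrete, a first-draft index mapping could look like the sketch below (index and field names are illustrative, not final):

```
PUT ip-observations
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "ip":          { "type": "ip" },
      "ports":       { "type": "integer" },
      "services": {
        "type": "nested",
        "properties": {
          "port":   { "type": "integer" },
          "name":   { "type": "keyword" },
          "banner": { "type": "text", "index": false }
        }
      },
      "cert_sha256": { "type": "keyword" },
      "status":      { "type": "keyword" },
      "last_seen":   { "type": "date" }
    }
  }
}
```

Storing banners with `"index": false` keeps them retrievable without paying the cost of full-text indexing; whether that trade-off is right depends on how they will be queried.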

Entity correlation will be handled by a graph engine (TigerGraph), and raw/historical data will be stored in a data lake using Ceph.

What I would like to better understand:

  1. Elasticsearch cluster sizing
  • How can I estimate the number of data nodes required for a projected volume of, for example, 100 TB of useful data?
  • What is the real overhead to consider (indices, replicas, mappings, etc.)?
  2. Hardware recommendations
  • What are the ideal CPU, RAM, and storage configurations per node for ingestion and search workloads?
  • Are SSD/NVMe mandatory for hot nodes, or is it possible to combine them with magnetic disks in different tiers?
  3. Best practices to scale from the start
  • What optimizations should I apply to mappings and ingestion early in the project?

Thanks in advance.

This depends a lot on the use case, how you will be querying the data, and what latency requirements you have. Based on your description it sounds like your use case may not be a typical one, so I would recommend setting up a POC and measuring rather than relying on generic sizing formulas.
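For a first back-of-envelope estimate before the POC, something like the sketch below can be used. All the parameters are assumptions to be replaced with measured values: the expansion factor (on-disk index size vs. raw data) can be below or well above 1.0 depending on mappings and compression, and the disk watermark reflects that Elasticsearch stops allocating shards to nodes that are nearly full.

```python
import math

def estimate_data_nodes(raw_tb: float,
                        expansion_factor: float = 1.1,
                        replicas: int = 1,
                        disk_per_node_tb: float = 8.0,
                        watermark: float = 0.85) -> int:
    """Rough count of data nodes needed to hold a given raw data volume.

    raw_tb           -- projected volume of useful data, in TB
    expansion_factor -- on-disk index size relative to raw data (measure in POC)
    replicas         -- replica copies per primary shard
    disk_per_node_tb -- raw disk capacity per data node
    watermark        -- usable fraction of disk before allocation stops
    """
    total_tb = raw_tb * expansion_factor * (1 + replicas)
    usable_per_node_tb = disk_per_node_tb * watermark
    return math.ceil(total_tb / usable_per_node_tb)

# Example: 100 TB of useful data, 1 replica, 8 TB nodes
print(estimate_data_nodes(100))  # -> 33
```

This only accounts for storage; CPU, heap, and shard-count limits can all force a larger cluster, which is exactly what the POC should surface.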

Hardware recommendations also depend heavily on the use case and requirements, so there is no single ideal CPU/RAM/storage configuration to give here.

For nodes handling high indexing throughput, NVMe SSDs are recommended, as indexing is very I/O-intensive. The hot-warm (or hot-warm-cold) architecture assumes that one set of nodes handles all indexing, and that older, read-only data is moved to a different set of nodes that hold more data, possibly on less performant hardware. It also assumes that the most recent data is queried most frequently, so it is not clear whether this type of architecture is suitable for your use case. Testing this is also something I would recommend as part of a POC.
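As a sketch of what the hot-warm setup looks like in practice, an ILM policy along the following lines could be part of the POC. The policy name and all thresholds here are placeholders, and this variant uses attribute-based allocation (a custom `data` node attribute); on recent versions the built-in data tiers (`data_hot`/`data_warm`/`data_cold` node roles) handle the allocation step automatically.

```
PUT _ilm/policy/scan-data-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": { "require": { "data": "warm" } },
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": { "require": { "data": "cold" } }
        }
      }
    }
  }
}
```

Rollover keeps individual indices at a manageable shard size during heavy ingestion, and force-merging read-only indices in the warm phase reduces segment count and disk usage.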