ClickHouse Operational Overview
The following ClickHouse background topics are available.
- Shards and Replicas
- ClickHouse Related Processes
- Supervisor/Worker Nodes Running ClickHouse Functions
- ClickHouse Keeper Cluster Considerations
- Event Insertion Flow
- Event Replication Flow
- Query Flow
Shards and Replicas
A shard is a database partition designed to support high insertion and query rates. Events are written to and read from multiple shards in parallel. Choose the number of shards based on your incoming EPS (see the examples below and the latest ClickHouse Sizing Guide in the Fortinet Documents Library).
If you want replication, you can configure replicas within each shard. ClickHouse replicates database writes on a node within a shard to all other replicas in the same shard. A typical choice is a replication factor of 2, meaning each shard contains 2 nodes. Replicas provide (a) faster queries and (b) protection against data loss if a node goes down.
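FortiSIEM creates and manages the underlying tables for you from the GUI. Purely to illustrate the standard ClickHouse pattern behind shards and replicas, the sketch below declares a replicated, sharded table; the cluster name, table name, host name, and the third-party clickhouse-driver Python package are assumptions, not FortiSIEM's actual schema.

```python
from clickhouse_driver import Client  # third-party package, assumed installed

client = Client(host="worker1.example.com")  # hypothetical Data Node

# One ReplicatedMergeTree table per node. The {shard}/{replica} macros
# (assumed configured on each node) give every replica within a shard a
# shared replication path in ClickHouse Keeper.
client.execute("""
    CREATE TABLE IF NOT EXISTS events_local ON CLUSTER fsiem_cluster (
        event_time   DateTime,
        reporting_ip String,
        raw_event    String
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
    ORDER BY event_time
""")

# A Distributed table fans writes and reads out across all shards.
client.execute("""
    CREATE TABLE IF NOT EXISTS events AS events_local
    ENGINE = Distributed(fsiem_cluster, currentDatabase(), events_local, rand())
""")
```

Writes to the Distributed table are spread across shards, while the ReplicatedMergeTree engine handles replication within each shard.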
It is important to understand how ClickHouse insertion, replication, and queries work in FortiSIEM.
If you add a new shard to an existing cluster, the data must be manually rebalanced as described in the Scaling FortiSIEM within ClickHouse section of the Reference Architecture, with detailed steps in Rebalancing Shards.
ClickHouse Related Processes
ClickHouse is a distributed database with replication capabilities. FortiSIEM Supervisor and Worker software images include ClickHouse binaries. The user does not need to install anything else. You can configure a ClickHouse cluster from the FortiSIEM GUI.
There are two main ClickHouse processes:
- ClickHouseServer process: This is the ClickHouse Database Service (a connectivity sketch follows the process lists below).
- ClickHouseKeeper process: This is the ClickHouse Keeper Service, providing replication management.
In addition, two more FortiSIEM processes provide ClickHouse related services:
- phClickHouseMonitor process: This runs on the Supervisor and Worker nodes and provides the following services:
  - On Supervisor/Worker nodes: CMDB Group Query helper, Lookup Table Query helper, and DeviceToCMDBAttr Query helper.
  - On Supervisor node only: Provides Online data display and the list of available ClickHouse nodes.
- phMonitor process: Provides ClickHouse configuration management on the Supervisor node.
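As a quick way to confirm that the ClickHouseServer process on a node is accepting connections, the following minimal sketch assumes the third-party clickhouse-driver Python package and the default native TCP port 9000:

```python
from clickhouse_driver import Client  # third-party package, assumed installed

# Default ClickHouse native protocol port is 9000; adjust if customized.
client = Client(host="localhost", port=9000)

# A trivial query proves the database service is up and responding.
version, = client.execute("SELECT version()")[0]
print(f"ClickHouseServer is up, version {version}")
```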
Supervisor/Worker Nodes Running ClickHouse Functions
A FortiSIEM Supervisor/Worker node can be of 3 types (not mutually exclusive):
- ClickHouse Keeper Node: This node runs the ClickHouse Keeper service, which provides replication management.
- ClickHouse Data Node: This node inserts events into the ClickHouse database.
- ClickHouse Query Node: This node provides ClickHouse query services.
A FortiSIEM Supervisor/Worker node can serve a single function or a mix of the 3 node types. For example:
Small EPS Environments: One Supervisor node that is a Keeper, Data and Query node
Medium EPS Environments:
- Supervisor node as ClickHouse Keeper Node. Note that 2 additional Keeper nodes are recommended (see ClickHouse Keeper Cluster Considerations).
- Worker nodes acting as both ClickHouse Data and Query Nodes.
High EPS Environments (Option 1):
- Supervisor node does not run any ClickHouse service
- 3 separate Worker nodes as ClickHouse Keeper Nodes - these form the ClickHouse Keeper cluster (see ClickHouse Keeper Cluster Considerations).
- N Worker nodes, with each node acting as both ClickHouse Data Node and ClickHouse Query Node - these form the ClickHouse Database cluster.
High EPS environments (Option 2):
- Supervisor node does not run any ClickHouse service
- 3 separate Worker nodes as ClickHouse Keeper Nodes – these form the ClickHouse Keeper cluster (see ClickHouse Keeper Cluster Considerations).
- A ClickHouse Database cluster consisting of:
- N/2 Worker nodes as ClickHouse Data Node only
- N/2 Worker nodes as ClickHouse Query Node only
In Option 1, events are ingested at all N nodes and queries go to all N nodes. In Option 2, events are ingested on N/2 nodes and queried from the other N/2 nodes. Other mixes are possible, for example N/2 Data-only nodes and N Query nodes for better query performance. Option 1 is the most balanced option and has been seen to work well in practice. The sketch below shows how to inspect the resulting cluster layout.
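Whichever option you choose, you can see how shards and replicas are laid out across the nodes by querying the standard system.clusters table. A minimal sketch, assuming the clickhouse-driver package and hypothetical host and cluster names:

```python
from clickhouse_driver import Client  # third-party package, assumed installed

client = Client(host="supervisor.example.com")  # hypothetical host name

# system.clusters is a built-in ClickHouse table listing every shard and
# replica this node knows about, including which entry is the local node.
rows = client.execute("""
    SELECT cluster, shard_num, replica_num, host_name, is_local
    FROM system.clusters
    ORDER BY cluster, shard_num, replica_num
""")
for cluster, shard, replica, host, is_local in rows:
    marker = " (this node)" if is_local else ""
    print(f"{cluster}: shard {shard} replica {replica} -> {host}{marker}")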
ClickHouse Keeper Cluster Considerations
ClickHouse Keeper provides the coordination system for data replication and distributed DDL query execution. Use an odd number of nodes in the Keeper cluster to maintain quorum, although ClickHouse allows an even number of nodes. The sketch after the list below illustrates the quorum arithmetic.
- If you use 3 nodes and lose 1 node, the cluster keeps running without any intervention.
- If you use 2 nodes and lose 1 node, then quorum is lost. Follow the steps in Recovering from Losing Quorum in ClickHouse Cluster to recover quorum.
- If you use 1 node and lose that node, then the ClickHouse event database becomes read-only and insertion stops. Follow the steps in Recovering from Complete Loss of ClickHouse Keeper Cluster to recover the ClickHouse Keeper database.
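The quorum arithmetic behind these three cases, plus a basic Keeper liveness probe, can be sketched as follows. The probe assumes the default Keeper client port 9181 and the ZooKeeper-compatible ruok four-letter command, which is in ClickHouse's default allow list; host names are hypothetical.

```python
import socket

def tolerated_failures(n: int) -> int:
    # Quorum requires a strict majority: floor(n/2) + 1 live nodes.
    return n - (n // 2 + 1)

assert tolerated_failures(3) == 1  # 3-node Keeper cluster survives one loss
assert tolerated_failures(2) == 0  # 2-node cluster loses quorum on any loss
assert tolerated_failures(1) == 0  # a single node is a single point of failure

def keeper_ok(host: str, port: int = 9181) -> bool:
    # A healthy ClickHouse Keeper answers the four-letter command
    # "ruok" with "imok" on its client port (9181 by default).
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(b"ruok")
            return sock.recv(4) == b"imok"
    except OSError:
        return False

for node in ("keeper1.example.com", "keeper2.example.com", "keeper3.example.com"):
    print(node, "ok" if keeper_ok(node) else "DOWN")
```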
Note that for high EPS environments, ClickHouse recommends running the ClickHouse Keeper and Database services on separate nodes to avoid disk and CPU contention between the query and replication management engines. If you have powerful servers with ample CPU, memory, and high-throughput disks, and EPS is not high, it may be reasonable to co-locate ClickHouse Keeper and Data/Query nodes.
See ClickHouse Reference in the Appendix for related information.
Event Insertion Flow
- Collectors send events to the Worker list specified in ADMIN > Settings > System > Event Worker.
- The Data Manager process on each Worker node first selects a ClickHouse Data Node and inserts events into it. The selected node may be local or remote (see the sketch below).
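The Data Manager performs this insert internally. As a rough illustration of what an insert into a ClickHouse Data Node looks like at the SQL level (host, table, and columns are hypothetical, and clickhouse-driver is an assumed third-party package):

```python
from datetime import datetime

from clickhouse_driver import Client  # third-party package, assumed installed

data_node = Client(host="worker2.example.com")  # the selected Data Node

# clickhouse-driver sends batched rows as a parameterless INSERT
# terminated by VALUES, followed by the row data.
data_node.execute(
    "INSERT INTO events_local (event_time, reporting_ip, raw_event) VALUES",
    [(datetime(2024, 1, 1, 0, 0, 0), "10.0.0.5", "<134>Jan 1 00:00:00 fw01 ...")],
)
```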
Event Replication Flow
- After insertion, the ClickHouse Data Node informs a ClickHouse Keeper Node.
- The ClickHouse Keeper Node initiates replication to all other nodes in the same shard (the sketch below shows how to observe replication progress).
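Replication progress can be observed from any Data Node through the standard system.replicas table. A minimal sketch, again assuming clickhouse-driver and a hypothetical host name:

```python
from clickhouse_driver import Client  # third-party package, assumed installed

client = Client(host="worker2.example.com")  # hypothetical Data Node

# system.replicas is a built-in table with one row per replicated table,
# including its replication queue depth and how far it lags its peers.
rows = client.execute("""
    SELECT database, table, is_leader, queue_size, absolute_delay
    FROM system.replicas
""")
for db, table, is_leader, queue, delay in rows:
    print(f"{db}.{table}: leader={bool(is_leader)} queue={queue} delay={delay}s")
```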
Query Flow
- The GUI sends the request to the App Server, which forwards it to the Query Master on the Supervisor.
- The Query Master provides query management. It sends the request to a randomly chosen ClickHouse Query node; each query may go to a different ClickHouse Query node.
- The ClickHouse Query node coordinates the query (like an Elasticsearch coordinating node):
  - It sends the query to the other ClickHouse Query nodes.
  - It generates the final result by combining the partial results obtained from all ClickHouse Query nodes.
  - It sends the result back to the Query Master.
- The Query Master sends the results back to the App Server, which in turn sends them back to the GUI (a sample query is sketched after this list).
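From a client's point of view, the fan-out is invisible: a query against the distributed table on any Query node returns the already merged result. A hedged sketch, reusing the hypothetical table and host names from the earlier examples:

```python
from clickhouse_driver import Client  # third-party package, assumed installed

query_node = Client(host="worker3.example.com")  # any ClickHouse Query Node

# Querying the Distributed table makes this node the coordinator: it
# forwards the query to every shard, then merges the partial results
# before returning them, exactly as in the flow described above.
rows = query_node.execute("""
    SELECT reporting_ip, count() AS events
    FROM events
    GROUP BY reporting_ip
    ORDER BY events DESC
    LIMIT 10
""")
for ip, count in rows:
    print(ip, count)
```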