Rebalancing Shards

Suppose you have been running FortiSIEM on ClickHouse with one shard. The disks have become full, or your overall EPS has increased, and you want to add another shard. Then you should consider moving some data from the current shard (Shard1) to the new Shard (Shard2). Follow these steps to accomplish this objective.

  1. Identify the Shard1 partition(s) that you want to move to Shard2.
  2. For each partition to be moved, take the following steps:
    1. Logon to Supervisor node as admin.
    2. Copy the partition from Shard1 to one of the replicas of Shard2 running the following command. This is a replicated SQL, so you only need run it against one of the destination replicas.
      clickhouse-client -h <Shard2_Replica1_IP> -q "ALTER TABLE fsiem.events_replicated FETCH PARTITION <partition_id> FROM '/clickhouse/tables/<source_shard_id>/fsiem.events'"
    3. Detach the partition from the source shard by running the following command. This is a replicated SQL. You only need run it against one of the source replicas.
      clickhouse-client -h <Shard1_Replica1_IP> -q "ALTER TABLE fsiem.events_replicated DETACH PARTITION <partition_id>"
    4. Attach the partition to one of the destination replicas by running the following command. This is a replicated SQL. You only need run it against one of the destination replicas.
      clickhouse-client -h <Shard2: Replica1_IP> -q "ALTER TABLE fsiem.events_replicated ATTACH PARTITION <partition_id>"
    5. Delete the detached directories to release disk space by logging on to each node of Shard1 as root and running the following command. This is non-replicated SQL. You need run it against all source replicas.
      clickhouse-client -q " select path from system.detached_parts where partition_id = '<partition_id>'" | xargs rm -rf

Note: Data is not available for Query only during Step 2c and 2d above.

An example of shard rebalancing follows:

Suppose the existing ClickHouse Install has one shard (Shard1), and a new shard (Shard2) has been added.

Shard1: Replica1: IP: 172.30.58.242

Shard1: Replica2: IP: 172.30.58.243

Shard2: Replica1: IP: 172.30.58.244

Shard2: Replica2: IP: 172.30.58.245

To identify the partitions in Shard1 with the largest data, run the following command:

SELECT partition, formatReadableSize(sum(bytes_on_disk)) FROM system.parts where table = 'events_replicated' and active group by partition order by partition

 

The following example output appears:

┌─partition────────┬─formatReadableSize(sum(bytes_on_disk))─┐
│ (18250,20231017) │ 3.15 GiB │
│ (18250,20231018) │ 3.15 GiB │
│ (18250,20231019) │ 3.71 GiB │
│ (18250,20231020) │ 7.85 GiB │
│ (18250,20231021) │ 7.99 GiB │
│ (18250,20231022) │ 8.00 GiB │
│ (18250,20231023) │ 5.61 GiB │
└──────────────────┴────────────────────────────────────────┘

Suppose you want to move partition (18250, 20231018) from Shard1 to Shard2. Proceed by taking the following steps:

  1. Copy Partition (Step 2b above):

    clickhouse-client -h 172.30.58.244 -q "ALTER TABLE fsiem.events_replicated FETCH PARTITION (18250, 20231018) FROM '/clickhouse/tables/1/fsiem.events'"

  2. Detach Partition (Step 2c above):

    clickhouse-client -h 172.30.58.242 -q "ALTER TABLE fsiem.events_replicated DETACH PARTITION (18250, 20231018)"

  3. Attach Partition (Step 2d above):

    clickhouse-client -h 172.30.58.244 -q "ALTER TABLE fsiem.events_replicated ATTACH PARTITION (18250, 20231018)"

  4. Delete Detached Directories (Step 2e above):

    1. Logon to Shard 1, Replica 1 (172.30.58.242).

    2. Run the following command as root:

      clickhouse-client -q " select path from system.detached_parts where partition_id = '18250-20231018'" | xargs rm -rf

    3. Logon to Shard 1, Replica 2 (172.30.58.243).

    4. Run the following command as root:

      clickhouse-client -q " select path from system.detached_parts where partition_id = '18250-20231018'" | xargs rm -rf