Tidying up more applies_to tags in the Troubleshooting section #4473
Conversation
Vale Linting Results

Summary: 1 warning, 2 suggestions found

| File | Line | Rule | Message |
|---|---|---|---|
| troubleshoot/elasticsearch/increase-tier-capacity.md | 51 | Elastic.DontUse | Don't use 'Note that'. |
…ded topic as well
::::::{tab-item} {{ech}}
In order to increase the disk capacity of the data nodes in your cluster:
::::::{applies-item} { ess: }
these steps work for ECH and ECE (with the first steps tweaked, you can use these)
the autoscaling UI has also since changed to a multiselect dropdown (both envs), so the screenshots should be removed.
not sure about the limit reached stuff. assume it's right?
::::::{tab-item} Self-managed
In order to increase the data node capacity in your cluster, you will need to calculate the amount of extra disk space needed.
::::::{applies-item} { self: }
missing ECK steps.
cursor tells me that the steps for ECK are different ... would be good to get @eedugon's confirmation here
::::::{tab-item} {{eck}}
In order to increase the disk capacity of data nodes in your {{eck}} cluster, you can either add more data nodes or increase the storage size of existing nodes.
**Option 1: Add more data nodes**
Update the `count` field in your data node NodeSet to add more nodes:
```yaml subs=true
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
name: quickstart
spec:
version: {{version.stack}}
nodeSets:
- name: data-nodes
count: 5 # Increase from previous count
config:
node.roles: ["data"]
volumeClaimTemplates:
- metadata:
name: elasticsearch-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
```
Apply the changes:
```sh
kubectl apply -f your-elasticsearch-manifest.yaml
```
ECK will automatically create the new nodes and {{es}} will relocate shards to balance the load. You can monitor the progress using:
```console
GET /_cat/shards?v&h=state,node&s=state
```
**Option 2: Increase storage size of existing nodes**
If your storage class supports [volume expansion](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#expanding-persistent-volumes-claims), you can increase the storage size in the `volumeClaimTemplates`:
```yaml subs=true
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
name: quickstart
spec:
version: {{version.stack}}
nodeSets:
- name: data-nodes
count: 3
config:
node.roles: ["data"]
volumeClaimTemplates:
- metadata:
name: elasticsearch-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 200Gi # Increased from previous size
```
Apply the changes. If the volume driver supports `ExpandInUsePersistentVolumes`, the filesystem will be resized online without restarting {{es}}. Otherwise, you may need to manually delete the Pods after the resize so they can be recreated with the expanded filesystem.
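As a quick pre-check before editing the manifest, you can confirm that the StorageClass backing the claims allows expansion (a sketch; `standard` is a placeholder for your actual StorageClass name):
```sh
# Prints "true" if the StorageClass allows volume expansion.
# "standard" is a placeholder; substitute the StorageClass your PVCs use.
kubectl get storageclass standard -o jsonpath='{.allowVolumeExpansion}'
```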
For more information, see [Updating deployments](/deploy-manage/deploy/cloud-on-k8s/update-deployments.md) and [Volume claim templates](/deploy-manage/deploy/cloud-on-k8s/volume-claim-templates.md).
::::::
@shainaraskas, the previous content looks good to me!
We could link to our official doc about volume claim templates for ECK and volume expansion: https://www.elastic.co/docs/deploy-manage/deploy/cloud-on-k8s/volume-claim-templates#k8s-volume-claim-templates-update
::::::{tab-item} {{ech}}
In order to get the shards assigned we’ll need to increase the number of shards that can be collocated on a node in the cluster. We’ll achieve this by inspecting the system-wide `cluster.routing.allocation.total_shards_per_node` [cluster setting](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-get-settings) and increasing the configured value.
::::::{applies-item} { ess: }
this setting is invalid in ESS
technically these steps work, but only because the setting is being applied in a non-whitelisted way
@eedugon would we expect people to ever work around non-whitelisted settings in this way?
regardless, this is another case where the ECH and self-managed instructions are very similar. the difference between them raises a red flag for me: you can still add nodes in ECH, so checking the target tier and scaling up that tier should also be done before increasing the total number of shards per node. this is the same fix that is causing us grief over here
@shainaraskas: cluster.routing.allocation.total_shards_per_node is a dynamic setting. When needed, it's recommended to set it with the cluster settings API rather than defining it statically in elasticsearch.yml (as that would require a rolling restart of all nodes).
So, in ECH, even if the setting is not whitelisted, I don't think it's set in an invalid way when set through the cluster settings API.
Anyway, keep this in mind, as I think it's related to the reason this document exists:
In the past the default of that setting was 1000, and the most common reason to need it was as a temporary measure to allow an unexpected number of shards to be allocated. That's why this document was super useful.
Currently that setting defaults to no limit, so it probably won't be needed anymore, except if a user wants to keep the number of shards under strict control and limits.
> would we expect people to ever work around non-whitelisted settings in this way?
IMO, if the setting is dynamic and there are legitimate use cases for it, I'd say yes, without needing to whitelist it at the node config level. But it's just my opinion.
Anyway, this document probably won't be as useful as it was in the past, considering that today cluster.routing.allocation.total_shards_per_node does not have a limit by default.
Of course, if we still want to document this for ECH, we need to ensure the reader doesn't try to configure cluster.routing.allocation.total_shards_per_node as a user setting, because it's not whitelisted; they should do it with the cluster settings API.
Maybe (final thought here) we can rewrite the introduction a bit so users understand that there's a dynamic cluster setting (cluster.routing.allocation.total_shards_per_node) that sets the maximum number of shards a node can handle. In older versions that maximum defaulted to 1000, and that could cause the error Total number of shards per node has been reached.
If that setting (cluster.routing.allocation.total_shards_per_node) is set, the user might need to increase it if they have exceeded the number of shards on any of the nodes.
And the instructions to set it... I'd say they are the same regardless of the deployment type (I'd only suggest the dynamic way in this troubleshooting document).
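For illustration, a sketch of that dynamic way using the cluster settings API (the value 2000 is only an example limit, not a recommendation):
```console
GET /_cluster/settings?include_defaults=true&filter_path=**.total_shards_per_node

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.total_shards_per_node": 2000
  }
}
```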
same issue on this page
To accomplish this, complete the following steps:
:::::::{tab-set}
:::::::{applies-switch}
if the applies_to says stack, we need procedures for all 4 deployment types. likely this will be a split between ECE/ECH and self-managed/ECK, but we would need to verify in more detail
I have a comment / question about the … Don't you think the …
^^ That's regardless of the deployment type! Calculating or knowing how much extra disk you need (and which tier needs disk) is 1) not strictly related to "adding more disk", and 2) independent of the deployment type. The only valid payload of that section is at the end of step 3; look at the 2 bullets:
And step 4 is not really a next step, it's an informational comment in case you're adding more nodes (and again valid for all deployment types!). In short: the self-managed instructions say almost nothing in reality (I'm not saying there's much to say, but the content feels weird as the majority of it is not really for self-managed exclusively). My proposal is:
And then we can offer the instructions to execute the previous tasks in all deployment types, such as:
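As one illustration of such a deployment-independent step (a sketch, not the original proposal), checking current disk usage per node with the cat allocation API:
```console
GET /_cat/allocation?v&h=node,shards,disk.used,disk.avail,disk.percent
```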
Part of #4117