Commit 94fb009

Merge pull request #40021 from github/repo-sync
Repo sync
2 parents a59f03a + c45861d commit 94fb009

33 files changed

+594
-30
lines changed

Lines changed: 276 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,276 @@
1+
---
2+
title: Check system capacity before upgrading
3+
intro: 'Before upgrading {% data variables.product.prodname_ghe_server %}, you should perform these capacity checks and take the recommended steps.'
4+
shortTitle: Check capacity before upgrade
5+
versions:
6+
ghes: '*'
7+
contentType: how-tos
8+
topics:
9+
- Enterprise
10+
- Upgrades
11+
---
12+
13+
Upgrading to newer versions of {% data variables.product.prodname_ghe_server %} typically increases resource consumption. Each feature release adds new functionality, some of it enabled by default and some opt-in, and that functionality requires more processing power. Customer usage patterns also affect demand; for example, enterprises with tens of thousands of organizations may see higher resource usage.
14+
15+
Resource increases most often appear as higher CPU utilization, more I/O operations per second (IOPS), greater memory use, or larger Aqueduct queue backlogs. To prepare for these changes, check your system’s available capacity and apply any remediation recommendations before upgrading. Run these checks during your busiest times of day and week to get the most accurate results.
16+
17+
## Resource requirements
18+
19+
Before upgrading your instance, it's crucial to verify that your system meets the necessary resource requirements:
20+
21+
1. [CPU usage below 70%](#cpu-usage-below-70)
22+
1. [Memory usage below 70%](#memory-usage-below-70)
23+
1. [Disk not saturated](#disk-not-saturated)
24+
1. [Unicorn queue under 200–300](#unicorn-queue-under-200300)
25+
1. [Aqueduct backlog under 1–2 hours](#aqueduct-backlog-under-12-hours)
26+
27+
### CPU usage below 70%
28+
29+
1. **Check CPU utilization.**
30+
In the {% data variables.enterprise.management_console %}, go to the monitor page (`https://HOSTNAME.com:8443/setup/monitor`) and view the `CPU` graph.
31+
32+
* If utilization is **regularly below 70%**, continue to [Memory usage](#memory-usage-below-70).
33+
* If utilization is **regularly above 70%**, the system does not meet the criteria to upgrade.
34+
35+
1. **Compare utilization with CPU load average.**
36+
The comparison helps identify possible disk saturation.
37+
38+
{% ifversion ghes > 3.15 %}
39+
* Go to **Operational Health view** and check the `Load` graph.
40+
* In the matrix, find the value where the `shortterm` row intersects with the `avg` column.
41+
* Calculate load average percentage:
42+
43+
```text
44+
(short-term avg ÷ number of vCPUs) × 100
45+
```
46+
47+
* In the same view, check the `CPU` graph. In the matrix, find the value where the `idle` row intersects with the `avg` column. Subtract this value from 100 to get utilization.
48+
{% endif %}
49+
50+
{% ifversion ghes < 3.16 %}
51+
* On the `Load` graph, click **short-term** to show only the short-term line. Find the peak load value.
52+
* On the `CPU` graph, click **idle** to show only the idle line. Note the idle value at the same timestamp.
53+
* Calculate utilization:
54+
55+
```text
56+
100 – idle
57+
```
58+
59+
* Calculate load average percentage:
60+
61+
```text
62+
(peak load value ÷ number of vCPUs) × 100
63+
```
64+
65+
{% endif %}
66+
67+
1. **Interpret the results.**
68+
69+
If the CPU load average percentage is more than 50% higher than utilization, this likely indicates resource contention. Do not proceed with the upgrade until you have investigated possible disk saturation (see [Disk not saturated](#disk-not-saturated)).
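The comparison in this section can be sketched in a few lines of Python. This is only an illustration, not part of the product: the function names and sample readings are hypothetical, and the check assumes that "more than 50% higher" means a relative difference (load percentage greater than 1.5 times utilization).

```python
def load_average_percent(short_term_load: float, vcpus: int) -> float:
    """(short-term load average / number of vCPUs) x 100."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return short_term_load / vcpus * 100


def cpu_utilization_percent(idle_percent: float) -> float:
    """Utilization is 100 minus the idle value read from the CPU graph."""
    return 100 - idle_percent


def suggests_contention(load_pct: float, util_pct: float) -> bool:
    # Assumption: "more than 50% higher" is treated as a relative difference,
    # that is, load percentage greater than 1.5 x utilization.
    return load_pct > 1.5 * util_pct


# Hypothetical readings: 16 vCPUs, short-term load of 14.4, CPU idle at 45%.
load_pct = load_average_percent(14.4, 16)
util_pct = cpu_utilization_percent(45.0)
print(f"load {load_pct:.1f}%, utilization {util_pct:.1f}%")
print("investigate disk saturation:", suggests_contention(load_pct, util_pct))
```

If you read the threshold as 50 percentage points instead, replace the comparison with `load_pct > util_pct + 50`.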
70+
71+
### Memory usage below 70%
72+
73+
1. **Check memory usage.**
74+
In the {% data variables.enterprise.management_console %}, go to the monitor page (`https://HOSTNAME.com:8443/setup/monitor`) and view the `Memory` graph.
75+
76+
1. **Interpret the results.**
77+
78+
* If memory usage is **regularly below 70%**, continue to [Disk not saturated](#disk-not-saturated).
79+
* If memory usage is **regularly above 70%**, the system does not meet the criteria to upgrade.
80+
81+
### Disk not saturated
82+
83+
1. **Check provider specifications.**
84+
If your cloud or hardware provider offers disk utilization metrics, use them to confirm whether the disk is saturated.
85+
86+
* If metrics are not available, request the disk specifications from your provider, including maximum throughput and maximum IOPS.
87+
* Compare these limits with your observed disk usage. If usage is approaching the maximum values, the disk is saturated.
88+
89+
1. **Check disk graphs in the {% data variables.enterprise.management_console %}.**
90+
Go to the monitor page (`https://HOSTNAME.com:8443/setup/monitor`).
91+
92+
* View the `Disk Operations` and `Disk Traffic` graphs.
93+
* Compare Y-axis values with your provider’s specifications (not the maximum scale shown on the graph).
94+
* Review both data and root disks.
95+
96+
{% ifversion ghes < 3.16 %}
97+
These graphs are available in the default dashboards on the monitor page.
98+
{% endif %}
99+
{% ifversion ghes > 3.15 %}
100+
These graphs are available in the "System & Application Insights" view.
101+
{% endif %}
102+
103+
1. **Interpret the results.**
104+
If disk usage is approaching provider-defined maximums, the disk is saturated. In this case, the system does not meet the criteria to upgrade.
105+
106+
### Unicorn queue under 200–300
107+
108+
1. **Check the queued requests graph.**
109+
In the {% data variables.enterprise.management_console %}, go to the monitor page (`https://HOSTNAME.com:8443/setup/monitor`) and view the `Queued Requests` graph.
110+
111+
{% ifversion ghes < 3.16 %}
112+
This graph is available in the default dashboards on the monitor page.
113+
{% endif %}
114+
{% ifversion ghes > 3.15 %}
115+
This graph is available in the "System & Application Insights" view.
116+
{% endif %}
117+
118+
1. **Interpret the results.**
119+
120+
* If queued requests are **consistently below 200**, continue to [Aqueduct backlog under 1–2 hours](#aqueduct-backlog-under-12-hours).
121+
* If queued requests are **regularly at or above 200–300**, the system does not meet the criteria to upgrade.
122+
123+
1. **Optional: Check unicorn worker utilization.**
124+
From the administrative shell, run:
125+
126+
```shell
127+
ps -ef | grep unicorn | grep -v gitauth | grep -v ".rb" | grep -v init | grep git
128+
```
129+
130+
   Look at the last column of the output. If all processes show more than 90% utilization (the `util` value), more unicorn workers are required.
131+
132+
### Aqueduct backlog under 1–2 hours
133+
134+
1. **Check the Aqueduct queue depth.**
135+
In the {% data variables.enterprise.management_console %}, go to the monitor page (`https://HOSTNAME.com:8443/setup/monitor`) and view the `Aqueduct queue depth` graph.
136+
137+
{% ifversion ghes < 3.16 %}
138+
This graph appears in the default dashboards on the monitor page.
139+
{% endif %}
140+
{% ifversion ghes > 3.15 %}
141+
This graph is available in the "System & Application Insights" view.
142+
{% endif %}
143+
144+
1. **Interpret the results.**
145+
146+
* If the backlog **lasts less than 1–2 hours**, you meet this requirement.
147+
* If the backlog **regularly lasts longer than 1–2 hours**, the system does not meet the criteria to upgrade.
148+
149+
1. **Monitor the `index_high` queue.**
150+
Large deployments may experience significant increases in `index_high` queue depth, which can worsen backlogs. Pay special attention to this queue when monitoring.
151+
152+
If **all criteria** (CPU, memory, disk, unicorn queue, Aqueduct backlog) are met, you can proceed with upgrading to your target feature version. After upgrading, expect resource consumption to increase further.
153+
154+
If **any criteria are not met**, resolve the underlying issues before attempting to upgrade.
155+
156+
## Upgrading hardware and fine-tuning workers
157+
158+
If your system did not meet one or more of the resource requirements, you will need to increase capacity before upgrading. The following sections describe how to add hardware resources and adjust worker configuration to resolve common bottlenecks.
159+
160+
1. [CPU above 70%](#cpu-above-70)
161+
1. [Memory above 70%](#memory-above-70)
162+
1. [Disk saturated](#disk-saturated)
163+
1. [Unicorn queue above 200–300](#unicorn-queue-above-200300)
164+
1. [Aqueduct backlog above 1–2 hours](#aqueduct-backlog-above-12-hours)
165+
166+
### CPU above 70%
167+
168+
If CPU utilization is regularly above 70%:
169+
170+
* **Increase CPU resources.**
171+
Add at least 20% more vCPUs.
172+
* **Account for new workers.**
173+
Allocate 1 vCPU per worker. For example, if you add 5 unicorn workers and 10 Resque workers, increase vCPUs by at least 15.
174+
175+
### Memory above 70%
176+
177+
If memory usage is regularly above 70%:
178+
179+
* **Increase memory.**
180+
Add additional RAM to reduce average usage below 70%.
181+
* **Account for new workers.**
182+
Allocate 1 GB of memory per worker. For example, if you add 5 unicorn workers and 10 Resque workers, increase memory by at least 15 GB.
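The CPU and memory sizing rules above can be worked through with a short Python sketch. The function names are hypothetical, and rounding the 20% increase up to a whole vCPU is an assumption rather than documented behavior.

```python
import math


def extra_resources_for_workers(new_unicorn: int, new_resque: int) -> tuple[int, int]:
    """Return (extra vCPUs, extra GB of RAM) at 1 vCPU and 1 GB per new worker."""
    new_workers = new_unicorn + new_resque
    return new_workers, new_workers


def minimum_cpu_increase(current_vcpus: int) -> int:
    """At least 20% more vCPUs; rounding up to a whole vCPU is an assumption."""
    return math.ceil(current_vcpus / 5)


# The example from the text: 5 new unicorn workers and 10 new Resque workers.
extra_vcpus, extra_gb = extra_resources_for_workers(5, 10)
print(f"add at least {extra_vcpus} vCPUs and {extra_gb} GB of memory for the new workers")

# Hypothetical 16-vCPU instance running regularly above 70% CPU.
print(f"add at least {minimum_cpu_increase(16)} vCPUs for the 20% increase")
```

The text lists the 20% increase and the per-worker allocation as separate recommendations, so the sketch reports them separately instead of guessing how they combine.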
183+
184+
### Disk saturated
185+
186+
If the disk saturation check indicates saturation, upgrade to disks with higher throughput and maximum IOPS.
187+
188+
### Unicorn queue above 200–300
189+
190+
If unicorn requests are consistently queued above 200–300, you may need to add more unicorn workers. Follow these steps to determine the total target number of workers and update your configuration.
191+
192+
#### 1. Estimate additional workers
193+
194+
Run the following command during peak hours to view utilization per worker:
195+
196+
```shell
197+
ps -ef | grep unicorn | grep -v gitauth | grep -v ".rb" | grep -v init | grep git
198+
```
199+
200+
Example output:
201+
202+
```shell
203+
git 3048972 3045762 0 Aug01 ? 00:07:47 unicorn 3-16-nightly.ghe-test.com[6e6ad46] worker[00]: 20491 reqs, 10.8 req/s, 13ms avg, 85.2% util
204+
git 3048979 3045762 0 Aug01 ? 00:07:53 unicorn 3-16-nightly.ghe-test.com[6e6ad46] worker[01]: 20951 reqs, 12.5 req/s, 13ms avg, 80.3% util
205+
git 3048985 3045762 0 Aug01 ? 00:08:04 unicorn 3-16-nightly.ghe-test.com[6e6ad46] worker[02]: 21502 reqs, 10.5 req/s, 15ms avg, 76.5% util
206+
git 3048992 3045762 0 Aug01 ? 00:07:45 unicorn 3-16-nightly.ghe-test.com[6e6ad46] worker[03]: 20249 reqs, 14.2 req/s, 15ms avg, 86.9% util
207+
```
208+
209+
In this example, the average across the four workers is (10.8 + 12.5 + 10.5 + 14.2) ÷ 4 = 12 req/s.
210+
211+
From this output, calculate the average requests per second (req/s).
212+
213+
* In the example above: 12 req/s.
214+
* The target is to reduce queued requests to 100 or fewer.
215+
* Formula:
216+
217+
  ```text
218+
(Queued requests – 100) ÷ avg req/s
219+
```
220+
221+
* Example: (280 – 100) ÷ 12 = 15 additional workers needed.
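As a rough illustration of the estimate above, the following Python sketch extracts the `req/s` values from saved `ps` output and applies the formula. The helper names are hypothetical, and rounding the result up to a whole worker is an assumption; the sample lines are the example output shown earlier.

```python
import math
import re

# Matches the per-worker request rate in the unicorn process title, e.g. "10.8 req/s".
REQ_RATE = re.compile(r"([\d.]+) req/s")


def average_req_per_sec(ps_output: str) -> float:
    """Average the req/s values found in the saved ps output."""
    rates = [float(m.group(1)) for m in REQ_RATE.finditer(ps_output)]
    if not rates:
        raise ValueError("no 'req/s' values found in the output")
    return sum(rates) / len(rates)


def additional_workers(queued_requests: int, avg_req_s: float, target_queue: int = 100) -> int:
    """(queued requests - 100) / avg req/s, rounded up (rounding is an assumption)."""
    excess = max(queued_requests - target_queue, 0)
    # Round to 6 decimals first so floating-point noise cannot add a worker.
    return math.ceil(round(excess / avg_req_s, 6))


sample = """\
git 3048972 3045762 0 Aug01 ? 00:07:47 unicorn 3-16-nightly.ghe-test.com[6e6ad46] worker[00]: 20491 reqs, 10.8 req/s, 13ms avg, 85.2% util
git 3048979 3045762 0 Aug01 ? 00:07:53 unicorn 3-16-nightly.ghe-test.com[6e6ad46] worker[01]: 20951 reqs, 12.5 req/s, 13ms avg, 80.3% util
git 3048985 3045762 0 Aug01 ? 00:08:04 unicorn 3-16-nightly.ghe-test.com[6e6ad46] worker[02]: 21502 reqs, 10.5 req/s, 15ms avg, 76.5% util
git 3048992 3045762 0 Aug01 ? 00:07:45 unicorn 3-16-nightly.ghe-test.com[6e6ad46] worker[03]: 20249 reqs, 14.2 req/s, 15ms avg, 86.9% util
"""

avg = average_req_per_sec(sample)
print(f"average: {avg:.1f} req/s")
print("additional workers for 280 queued requests:", additional_workers(280, avg))
```

Run the `ps` command during peak hours, save its output to a file, and pass the file contents to `average_req_per_sec` in place of `sample`.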
222+
223+
> [!TIP]
> If you want to confirm your findings, you can contact us by visiting {% data variables.contact.contact_ent_support %}, uploading a support bundle, and asking for the total target number of unicorn workers.
224+
225+
#### 2. Check current configuration
226+
227+
Make sure the total number of workers (unicorn + Resque) does not exceed the number of vCPUs. Allocate at least 1 vCPU per worker.
228+
229+
Check current numbers:
230+
231+
* Unicorn workers
232+
233+
```shell
234+
ps -ef | grep unicorn | grep -v gitauth | grep -v ".rb" | grep -v init | grep git | wc -l
235+
```
236+
237+
Add your calculated number of new workers to this value to get the total target.
238+
239+
* Resque workers
240+
241+
```shell
242+
ps -ef | grep aqueduct-1.1.0 | grep -v "grep aqueduct-1.1.0" | wc -l
243+
```
244+
245+
#### 3. Adjust configuration
246+
247+
If the sum of unicorn + Resque workers exceeds the number of vCPUs, add more vCPUs before continuing.
248+
249+
Update the number of unicorn workers:
250+
251+
```shell
252+
ghe-config app.github.github-workers <NUM-WORKERS>
253+
ghe-config-apply
254+
```
255+
256+
Replace `<NUM-WORKERS>` with the total target number of unicorn workers.
257+
258+
### Aqueduct backlog above 1–2 hours
259+
260+
If Aqueduct jobs are regularly backlogged for more than 1–2 hours, add resqued-low workers to reduce the risk of queue backups. This issue often worsens after upgrading.
261+
262+
#### 1. Add resqued-low workers
263+
264+
* Increase the number of workers by **5–10**.
265+
Be mindful of CPU capacity—each worker requires at least **1 vCPU**.
266+
267+
```shell
268+
ghe-config app.github.resqued-low-workers <NUM-WORKERS>
269+
ghe-config-apply
270+
```
271+
272+
Replace `<NUM-WORKERS>` with the new total number of resqued-low workers.
273+
274+
#### 2. Validate total worker count
275+
276+
Ensure the combined number of unicorn + Resque workers does not exceed the total number of vCPUs. See [Unicorn queue above 200–300](#unicorn-queue-above-200300) for instructions on checking current worker configuration.
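The final validation can be expressed as a small check. This is a minimal sketch with hypothetical function names and sample counts; it only restates the rule that each worker needs at least 1 vCPU.

```python
def workers_fit(unicorn_workers: int, resque_workers: int, vcpus: int) -> bool:
    """True when the combined worker count does not exceed the vCPU count."""
    return unicorn_workers + resque_workers <= vcpus


def vcpu_shortfall(unicorn_workers: int, resque_workers: int, vcpus: int) -> int:
    """Number of vCPUs to add so that every worker has at least 1 vCPU."""
    return max(unicorn_workers + resque_workers - vcpus, 0)


# Hypothetical counts: 19 unicorn workers and 20 Resque workers on 32 vCPUs.
print("fits:", workers_fit(19, 20, 32))
print("vCPUs to add:", vcpu_shortfall(19, 20, 32))
```

Use the target worker counts (current workers plus the workers you plan to add), not only the current counts, when you run the check.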

content/admin/upgrading-your-instance/preparing-to-upgrade/index.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,5 +11,6 @@ children:
1111
- /upgrade-requirements
1212
- /enabling-automatic-update-checks
1313
- /taking-a-snapshot
14+
- /check-system-capacity-before-upgrading
1415
shortTitle: Prepare to upgrade
1516
---

content/admin/upgrading-your-instance/preparing-to-upgrade/upgrade-requirements.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@ topics:
3535

3636
## Requirements
3737

38+
* You must perform a capacity check. For more information, see [AUTOTITLE](/admin/upgrading-your-instance/preparing-to-upgrade/check-system-capacity-before-upgrading).
3839
* You must upgrade from a feature release that's **at most** two releases behind. For example, to upgrade to {% data variables.product.prodname_enterprise %} {{ enterpriseServerReleases.latest }}, you must be on {% data variables.product.prodname_enterprise %} {{ enterpriseServerReleases.supported[1] }} or {{ enterpriseServerReleases.supported[2] }}.
3940
* When upgrading using an upgrade package, schedule a maintenance window for {% data variables.product.prodname_ghe_server %} end users.
4041
* {% data reusables.enterprise_installation.hotpatching-explanation %}

content/admin/upgrading-your-instance/troubleshooting-upgrades/known-issues-with-upgrades-to-your-instance.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -21,9 +21,9 @@ redirect_from:
2121

2222
{% data variables.product.company_short %} strongly recommends regular backups of your instance's configuration and data. Before you proceed with any upgrade, back up your instance, then validate the backup in a staging environment. For more information, see [AUTOTITLE](/admin/configuration/configuring-your-enterprise/configuring-backups-on-your-appliance) and [AUTOTITLE](/admin/installation/setting-up-a-github-enterprise-server-instance/setting-up-a-staging-instance).
2323

24-
## Hold off on upgrading to 3.15 and above
24+
## Lifting the pause on upgrades to version 3.15 and above
2525

26-
We have received a few reports of performance issues with {% data variables.product.prodname_ghe_server %} versions 3.15, 3.16, and 3.17. Out of an abundance of caution, we recommend holding off on upgrading to these versions until further notice.
26+
We have lifted the pause on upgrades to versions 3.15, 3.16, and 3.17. You can now upgrade to 3.15.12, 3.16.8, 3.17.5, or later. We do not recommend upgrading to earlier releases of 3.15, 3.16, or 3.17. As an additional step, we recommend that you check system capacity before upgrading. See [AUTOTITLE](/admin/upgrading-your-instance/preparing-to-upgrade/check-system-capacity-before-upgrading).
2727

2828
We are extending the support window for versions 3.14, 3.15, 3.16, and 3.17. The support window for 3.13 remains unchanged. The closing down date for each of 3.14, 3.15, 3.16, and 3.17 has been updated to "Support temporarily extended until further notice". For more information, see [AUTOTITLE](/admin/all-releases#releases-of-github-enterprise-server).
2929

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,50 @@
1+
date: '2025-08-19'
2+
sections:
3+
security_fixes:
4+
- |
5+
**HIGH:** An improper access control vulnerability was identified that allowed authenticated users to obtain code content from private repositories they did not have permission to access. If a user knew the names of a private repository and its branches, tags, or commit SHAs, they could use the compare/diff functionality to retrieve code from those repositories without authorization. Exploiting this vulnerability also required the attacker to have legitimate access to another repository within the same fork network. This vulnerability has been assigned [CVE-2025-8447](https://www.cve.org/cverecord?id=CVE-2025-8447) and was reported through the [GitHub Bug Bounty program](https://bounty.github.com/).
6+
- |
7+
Packages have been updated to the latest security versions.
8+
- |
9+
Elasticsearch packages have been updated to the 8.18.0 security version.
10+
bugs:
11+
- |
12+
After enabling GitHub Actions or performing an upgrade with GitHub Actions enabled, administrators experienced a delay approximately 10 minutes longer than expected because of a faulty connection check. This is fixed for future enablement and upgrades.
13+
- |
14+
After upgrading to GHES 3.14.16, GHES 3.15.11, GHES 3.16.7, or GHES 3.17.4, administrators found that draft pull requests for private repositories were no longer available.
15+
changes:
16+
- |
17+
When administrators run the `ghe-support-bundle` command on an unconfigured node, the output clearly states that metadata collection was skipped, instead of producing misleading `curl` errors. This improves the clarity of support bundle diagnostics.
18+
known_issues:
19+
- |
20+
During the validation phase of a configuration run, a `No such object` error may occur for the Notebook and Viewscreen services. This error can be ignored as the services should still correctly start.
21+
- |
22+
If the root site administrator is locked out of the Management Console after failed login attempts, the account does not unlock automatically after the defined lockout time. Someone with administrative SSH access to the instance must unlock the account using the administrative shell. For more information, see [AUTOTITLE](/admin/configuration/administering-your-instance-from-the-management-console/troubleshooting-access-to-the-management-console#unlocking-the-root-site-administrator-account).
23+
- |
24+
On an instance with the HTTP `X-Forwarded-For` header configured for use behind a load balancer, all client IP addresses in the instance's audit log erroneously appear as 127.0.0.1.
25+
- |
26+
{% data reusables.release-notes.large-adoc-files-issue %}
27+
- |
28+
Admin stats REST API endpoints may time out on appliances with many users or repositories. Retrying the request until data is returned is advised.
29+
- |
30+
When following the steps for [Replacing the primary MySQL node](/admin/monitoring-managing-and-updating-your-instance/configuring-clustering/replacing-a-cluster-node#replacing-the-primary-mysql-node), step 14 (running `ghe-cluster-config-apply`) may fail with errors. If this occurs, re-running `ghe-cluster-config-apply` is expected to succeed.
31+
- |
32+
Running `ghe-cluster-config-apply` as part of the steps for [Replacing a node in an emergency](/admin/monitoring-managing-and-updating-your-instance/configuring-clustering/replacing-a-cluster-node#replacing-a-node-in-an-emergency) may fail with errors if the node being replaced is still reachable. If this occurs, shut down the node and repeat the steps.
33+
- |
34+
{% data reusables.release-notes.2024-06-possible-frontend-5-minute-outage-during-hotpatch-upgrade %}
35+
- |
36+
When restoring data originally backed up from a 3.13 or greater appliance version, the Elasticsearch indices need to be reindexed before some of the data will show up. This happens via a nightly scheduled job. It can also be forced by running `/usr/local/share/enterprise/ghe-es-search-repair`.
37+
- |
38+
An organization-level code scanning configuration page is displayed on instances that do not use GitHub Advanced Security or code scanning.
39+
- |
40+
In the header bar displayed to site administrators, some icons are not available.
41+
- |
42+
When enabling automatic update checks for the first time in the Management Console, the status is not dynamically reflected until the "Updates" page is reloaded.
43+
- |
44+
When restoring from a backup snapshot, a large number of `mapper_parsing_exception` errors may be displayed.
45+
- |
46+
After a restore, existing outside collaborators cannot be added to repositories in a new organization. This issue can be resolved by running `/usr/local/share/enterprise/ghe-es-search-repair` on the appliance.
47+
- |
48+
After a geo-replica is promoted to be a primary by running `ghe-repl-promote`, the actions workflow of a repository does not have any suggested workflows.
49+
- |
50+
Unexpected elements may appear in the UI on the repository overview page for locked repositories.
