diff --git a/content/admin/managing-iam/provisioning-user-accounts-with-scim/user-provisioning-with-scim-on-ghes.md b/content/admin/managing-iam/provisioning-user-accounts-with-scim/user-provisioning-with-scim-on-ghes.md index 5e6795b0283d..95cd8992cdae 100644 --- a/content/admin/managing-iam/provisioning-user-accounts-with-scim/user-provisioning-with-scim-on-ghes.md +++ b/content/admin/managing-iam/provisioning-user-accounts-with-scim/user-provisioning-with-scim-on-ghes.md @@ -67,13 +67,25 @@ During the {% data variables.release-phases.private_preview %}, your account tea When SCIM is enabled, you will no longer be able to delete, suspend, or promote SCIM-provisioned users directly on {% data variables.product.prodname_ghe_server %}. You must manage these processes from your IdP. -## What will happen to existing users on my instance? +To view suspended members, navigate to the "Suspended Members" tab of your enterprise settings. This page will be present when SCIM is enabled on {% data variables.product.prodname_ghe_server %}. -If you currently use SAML SSO, and you are enabling SCIM, you should be aware of what happens to existing users during SCIM provisioning. +{% data reusables.enterprise-accounts.access-enterprise %} +{% data reusables.enterprise-accounts.people-tab %} +1. Click **Suspended Members**. -* When SCIM is enabled, users with SAML-linked identities will **not be able to sign in** until their identities have been provisioned by SCIM.{% ifversion scim-for-ghes-ga %} You will no longer be able to update the SAML `NameID` of existing users in the site admin dashboard.{% endif %} -* When your instance receives a SCIM request, SCIM identities are matched to existing users by **comparing the `userName` SCIM field with the {% data variables.product.prodname_dotcom %} username**. If a user with a matching username doesn't exist, {% data variables.product.prodname_dotcom %} creates a new user. 
-* If {% data variables.product.prodname_dotcom %} successfully identifies a user from the IdP, but account details such as email address, first name, or last name don't match, the instance **overwrites the details** with values from the IdP. Any email addresses other than the primary email provisioned by SCIM will also be deleted from the user account. +## What happens when I enable SCIM? + +If you currently use SAML SSO, and you are enabling SCIM, you should be aware of what happens to existing user accounts on {% data variables.product.prodname_ghe_server %} once SCIM is enabled. + +* Existing users with SAML mappings will **not be able to sign in** until their identities have been provisioned by SCIM. +{%- ifversion scim-for-ghes-ga %} +* {% data variables.product.prodname_ghe_server %} will no longer store SAML mappings for users. Instead, SCIM identities will be stored for users when a user is provisioned. +* You will no longer see the "SAML authentication" section on the `https://HOSTNAME/users/USER/security` site admin page for users. It will not be possible to view or update SAML NameID mappings that were previously visible in this section, since these stored SAML mappings are no longer evaluated during SAML authentication when SCIM is enabled. +{%- endif %} +* When your instance receives a SCIM request, SCIM identities are matched to existing users by **comparing the SCIM `userName` attribute value with the {% data variables.product.prodname_ghe_server %} username**. This means that an existing {% data variables.product.prodname_ghe_server %} user account, regardless of whether it was originally created as a local user account or via SAML JIT-provisioning, can be converted into a SCIM-linked user account if these two values match. + * If a user account with a matching username does exist, {% data variables.product.prodname_ghe_server %} links the SCIM identity to this user account. 
+ * If a user account with a matching username doesn't exist, {% data variables.product.prodname_ghe_server %} creates a new user account and links it to this SCIM identity. +* If {% data variables.product.prodname_dotcom %} successfully matches a user who is authenticating via SAML with an existing user account, but account details such as email address, first name, or last name don't match, the instance **overwrites the details** with values from the IdP. Any email addresses other than the primary email provisioned by SCIM will also be deleted from the user account. ## What happens during SAML authentication? @@ -89,19 +101,26 @@ After an IdP administrator grants a person access to {% data variables.location. ## What happens if I disable SCIM? -SCIM will be disabled on your instance if any of the following things happens. +SCIM will be disabled on {% data variables.product.prodname_ghe_server %} if any of the following things happens. * The **Enable SCIM configuration** checkbox is unselected on the "Authentication security" page in the enterprise settings. * The **SAML** radio button is unselected in the "Authentication" section of the Management Console. * The SAML **Issuer** or **Single sign-on URL** field is updated in the "Authentication" section of the Management Console. -If SCIM is disabled on the instance: +When SCIM is disabled on {% data variables.product.prodname_ghe_server %}: +* All linked SCIM identities and SCIM-provisioned groups will be deleted from the instance. * Requests to the SCIM API endpoints on your instance will no longer succeed. -* SCIM-provisioned users will remain unchanged and will not be suspended. +* All SCIM external identities on {% data variables.product.prodname_ghe_server %} will be deleted. +* All user accounts will remain with the same usernames, and they will not be suspended when SCIM is disabled. +* All of the external groups that were previously provisioned by SCIM will be deleted. 
+* All user accounts, including SCIM-provisioned user accounts, will remain on the instance and will not be suspended. * Site administrators will be able to manage the lifecycle of SCIM-provisioned users, such as suspension and deletion, from the site admin dashboard. * Users will still be able to sign on via SAML, if enabled. -* Users will be unlinked from their external identity record, and the record will be deleted. +* The "Suspended Members" page in your enterprise settings will no longer be present. Suspended members can still be seen in the [Site Admin dashboard](/admin/managing-accounts-and-repositories/managing-users-in-your-enterprise/suspending-and-unsuspending-users#viewing-suspended-users-in-the-site-admin-dashboard). +{%- ifversion scim-for-ghes-ga %} +* You will be able to see the "SAML authentication" section on the `https://HOSTNAME/users/USER/security` site admin page for users. If any SAML mappings were previously created for users on {% data variables.product.prodname_ghe_server %} before SCIM was enabled, it will be possible to once again view and update them in this section. +{%- endif %} {% endif %} diff --git a/content/repositories/archiving-a-github-repository/referencing-and-citing-content.md b/content/repositories/archiving-a-github-repository/referencing-and-citing-content.md index 22beb97d461b..af8884b679ff 100644 --- a/content/repositories/archiving-a-github-repository/referencing-and-citing-content.md +++ b/content/repositories/archiving-a-github-repository/referencing-and-citing-content.md @@ -31,4 +31,4 @@ Zenodo archives your repository and issues a new DOI each time you create a new ## Publicizing and citing research material with Figshare -Academics can use the data management service [Figshare](http://figshare.com) to publicize and cite research material. For more information, see [Figshare's support site](https://info.figshare.com/user-guide/integrations/#github).
+Academics can use the data management service [Figshare](http://figshare.com) to publicize and cite research material. For more information, see [Figshare's support site](https://info.figshare.com/user-guide/how-to-connect-figshare-with-your-github-account/). diff --git a/src/shielding/lib/fastly-ips.ts b/src/shielding/lib/fastly-ips.ts new file mode 100644 index 000000000000..f010031ec62d --- /dev/null +++ b/src/shielding/lib/fastly-ips.ts @@ -0,0 +1,81 @@ +// Logic to get and store the current list of public Fastly IPs from the Fastly API: https://www.fastly.com/documentation/reference/api/utils/public-ip-list/ + +// Default returned from ➜ curl "https://api.fastly.com/public-ip-list" +export const DEFAULT_FASTLY_IPS: string[] = [ + '23.235.32.0/20', + '43.249.72.0/22', + '103.244.50.0/24', + '103.245.222.0/23', + '103.245.224.0/24', + '104.156.80.0/20', + '140.248.64.0/18', + '140.248.128.0/17', + '146.75.0.0/17', + '151.101.0.0/16', + '157.52.64.0/18', + '167.82.0.0/17', + '167.82.128.0/20', + '167.82.160.0/20', + '167.82.224.0/20', + '172.111.64.0/18', + '185.31.16.0/22', + '199.27.72.0/21', + '199.232.0.0/16', +] + +let ipCache: string[] = [] + +export async function getPublicFastlyIPs(): Promise<string[]> { + // Don't fetch the list in dev & testing, just use the defaults + if (process.env.NODE_ENV !== 'production') { + ipCache = DEFAULT_FASTLY_IPS + } + + if (ipCache.length) { + return ipCache + } + + const endpoint = 'https://api.fastly.com/public-ip-list' + let ips: string[] = [] + let attempt = 0 + + while (attempt < 3) { + try { + const response = await fetch(endpoint) + if (!response.ok) { + throw new Error(`Failed to fetch: ${response.status}`) + } + const data = await response.json() + if (data && Array.isArray(data.addresses)) { + ips = data.addresses + break + } else { + throw new Error('Invalid response structure') + } + } catch (error: any) { + console.error( + `Failed to fetch Fastly IPs: ${error.message}.
Retrying ${2 - attempt} more times`, + ) + attempt++ + if (attempt >= 3) { + ips = DEFAULT_FASTLY_IPS + } + } + } + + ipCache = ips + return ips +} + +// The IPs we check in the rate-limiter are in the form `X.X.X.X` +// But the IPs returned from the Fastly API are in the form `X.X.X.X/Y` +// For an IP in the rate-limiter, we want `X.X.X.*` to match `X.X.X.X/Y` +export async function isFastlyIP(ip: string): Promise<boolean> { + // If IPs aren't initialized, fetch them + if (!ipCache.length) { + await getPublicFastlyIPs() + } + const parts = ip.split('.') + const prefix = parts.slice(0, 3).join('.') + return ipCache.some((fastlyIP) => fastlyIP.startsWith(prefix)) +} diff --git a/src/shielding/middleware/index.ts b/src/shielding/middleware/index.ts index a22f952c520d..c0ea035c1125 100644 --- a/src/shielding/middleware/index.ts +++ b/src/shielding/middleware/index.ts @@ -6,9 +6,11 @@ import handleOldNextDataPaths from './handle-old-next-data-paths' import handleInvalidQuerystringValues from './handle-invalid-query-string-values' import handleInvalidNextPaths from './handle-invalid-nextjs-paths' import handleInvalidHeaders from './handle-invalid-headers' +import { createRateLimiter } from './rate-limit' const router = express.Router() +router.use(createRateLimiter()) router.use(handleInvalidQuerystrings) router.use(handleInvalidPaths) router.use(handleOldNextDataPaths) diff --git a/src/shielding/middleware/rate-limit.ts b/src/shielding/middleware/rate-limit.ts new file mode 100644 index 000000000000..e22f8a0f44f3 --- /dev/null +++ b/src/shielding/middleware/rate-limit.ts @@ -0,0 +1,168 @@ +import type { Request } from 'express' + +import rateLimit from 'express-rate-limit' + +import statsd from '@/observability/lib/statsd.js' +import { noCacheControl } from '@/frame/middleware/cache-control.js' +import { isFastlyIP } from '@/shielding/lib/fastly-ips' + +const EXPIRES_IN_AS_SECONDS = 60 + +const MAX = process.env.RATE_LIMIT_MAX ?
parseInt(process.env.RATE_LIMIT_MAX, 10) : 50 +if (isNaN(MAX)) { + throw new Error(`process.env.RATE_LIMIT_MAX (${process.env.RATE_LIMIT_MAX}) not a number`) +} + +// We apply this rate limiter to _all_ routes in src/shielding/middleware/index.ts except for `/api/*` routes +export function createRateLimiter(max = MAX, isAPILimiter = false) { + return rateLimit({ + // 1 minute + windowMs: EXPIRES_IN_AS_SECONDS * 1000, + // limit each IP to X requests per windowMs + // We currently have about 12 instances in production. That's routed + // in Moda to spread the requests to each healthy instance. + // So, the true rate limit, per `windowMs`, is this number multiplied + // by the current number of instances. + max: max, + + // Return rate limit info in the `RateLimit-*` headers + standardHeaders: true, + // Disable the `X-RateLimit-*` headers + legacyHeaders: false, + + keyGenerator: (req) => { + return getClientIPFromReq(req) + }, + + skip: async (req) => { + const ip = getClientIPFromReq(req) + if (await isFastlyIP(ip)) { + return true + } + // IP is empty when we are in a non-production (not behind Fastly) environment + // In these environments, we don't want to rate limit (including tests) + // However, if you want to test rate limiting locally, you can manually set + // the `fastly-client-ip` header to your IP address to bypass this check + if (ip === '') { + return true + } + + // We handle /api/* routes with a separate rate limiter + // When it is a separate rate limiter, isAPILimiter will be passed as true + if (req.path.startsWith('/api/') || isAPILimiter) { + return false + } + + // If the request is not suspicious, don't rate limit it + if (!isSuspiciousRequest(req)) { + return true + } + + // At this point, a request is suspicious.
We want to track how many there are in Datadog + const tags = [`url:${req.url}`, `ip:${ip}`, `path:${req.path}`, `qs:${req.url.split('?')[1]}`] + statsd.increment('middleware.rate_limit_dont_skip', 1, tags) + + return false + }, + + handler: (req, res, next, options) => { + const tags = [`url:${req.url}`, `ip:${req.ip}`, `path:${req.path}`] + statsd.increment('middleware.rate_limit', 1, tags) + noCacheControl(res) + res.status(options.statusCode).send(options.message) + }, + + // Temporary so that we can see what is coming from Fastly vs. app level + statusCode: 418, // "I'm a teapot" + }) +} + +function getClientIPFromReq(req: Request) { + // Moda forwards the client's IP using the `fastly-client-ip` header. + // However, in non-fastly environments, this header is not present. + // Staging is behind Okta, so we don't need to rate limit there. + let ip = req?.headers?.['fastly-client-ip'] || '' + // This is to satisfy TypeScript since a header could be a string array, but fastly-client-ip is not + if (typeof ip !== 'string') { + ip = '' + } + return ip +} + +const RECOGNIZED_KEYS_BY_PREFIX = { + '/_next/data/': ['versionId', 'productId', 'restPage', 'apiVersion', 'category', 'subcategory'], + '/api/search': ['query', 'language', 'version', 'page', 'product', 'autocomplete', 'limit'], + '/api/anchor-redirect': ['hash', 'path'], + '/api/webhooks': ['category', 'version'], + '/api/pageinfo': ['pathname'], +} + +const RECOGNIZED_KEYS = { + search: ['query', 'page'], +} + +const MISC_KEYS = [ + // Learning track pages + 'learn', + 'learnProduct', + + // Platform picker + 'platform', + + // Tool picker + 'tool', + + // When apiVersion isn't the only one. E.g. ?apiVersion=XXX&tool=vscode + 'apiVersion', + + // Lowercase for rest pages + 'apiversion', + + // We use the query param "feature" to enable experiments in the browser + 'feature', +] + +/** + * Return true if the request looks like a DoS request. I.e. suspicious.
+ * + * We've seen lots of requests slip past the CDN and its edge rate limiter + * that clearly are not realistic URLs that you'd get in a browser. + * For example `?action=octrh&api=h9vcd&immagine=jzs3c&lang=xb0kp&m=rrmek` + * There are certain URLs that have query strings that are valid, but + * have one or more extra query string keys. In particular the `/api/..` endpoints. + * + * Remember, just because this function might return true, it doesn't mean + * the request will be rate limited. It has to be both suspicious AND + * have lots and lots of requests. + * + * @param {Request} req + * @returns boolean + */ +function isSuspiciousRequest(req: Request) { + const keys = Object.keys(req.query) + + // Since this function can only speculate by query strings (at the + // moment), if the URL doesn't have any query strings it's not suspicious. + if (!keys.length) { + return false + } + + // E.g. `/en/rest/actions?apiVersion=YYYY-MM-DD` + if (keys.length === 1 && keys[0] === 'apiVersion') return false + + // Now check what query string keys are *left* based on a list of + // recognized keys per different prefixes. + for (const [prefix, recognizedKeys] of Object.entries(RECOGNIZED_KEYS_BY_PREFIX)) { + if (req.path.startsWith(prefix)) { + return keys.filter((key) => !recognizedKeys.includes(key)).length > 0 + } + } + + // E.g.
`/fr/search?query=foo` + if (req.path.split('/')[2] === 'search') { + return keys.filter((key) => !RECOGNIZED_KEYS.search.includes(key)).length > 0 + } + + const unrecognizedKeys = keys.filter((key) => !MISC_KEYS.includes(key)) + return unrecognizedKeys.length > 0 +} diff --git a/src/shielding/tests/shielding.ts b/src/shielding/tests/shielding.ts index a68bfca4043d..41e4279d0815 100644 --- a/src/shielding/tests/shielding.ts +++ b/src/shielding/tests/shielding.ts @@ -2,6 +2,7 @@ import { describe, expect, test } from 'vitest' import { SURROGATE_ENUMS } from '@/frame/middleware/set-fastly-surrogate-key.js' import { get } from '@/tests/helpers/e2etest.js' +import { DEFAULT_FASTLY_IPS } from '@/shielding/lib/fastly-ips' describe('honeypotting', () => { test('any GET with survey-vote and survey-token query strings is 400', async () => { @@ -94,6 +95,73 @@ describe('index.md and .md suffixes', () => { }) }) +describe('rate limiting', () => { + // We can't actually trigger a full rate limit because + // then all other tests would fail. And we can't rely on this + // test always being run last. + + test('only happens if you have junk query strings', async () => { + const res = await get('/robots.txt?foo=bar', { + headers: { + // Rate limiting only happens in production, so we need to + // make the environment look like production.
+ 'fastly-client-ip': 'abc', + }, + }) + expect(res.statusCode).toBe(200) + const limit = parseInt(res.headers['ratelimit-limit']) + const remaining = parseInt(res.headers['ratelimit-remaining']) + expect(limit).toBeGreaterThan(0) + expect(remaining).toBeLessThan(limit) + + // A second request + { + const res = await get('/robots.txt?foo=buzz', { + headers: { + 'fastly-client-ip': 'abc', + }, + }) + expect(res.statusCode).toBe(200) + const newLimit = parseInt(res.headers['ratelimit-limit']) + const newRemaining = parseInt(res.headers['ratelimit-remaining']) + expect(newLimit).toBe(limit) + // Can't rely on `newRemaining == remaining - 1` because of + // concurrency of test-running. + expect(newRemaining).toBeLessThan(remaining) + } + }) + + test('nothing happens if no unrecognized query string', async () => { + const res = await get('/robots.txt') + expect(res.statusCode).toBe(200) + expect(res.headers['ratelimit-limit']).toBeUndefined() + expect(res.headers['ratelimit-remaining']).toBeUndefined() + }) + + test('Fastly IPs are not rate limited', async () => { + // Fastly IPs are in the form `X.X.X.X/Y` + // Rate limited IPs are in the form `X.X.X.X` + // Where the last X here is a random 1-2 digit number (0-99) + const mockFastlyIP = + DEFAULT_FASTLY_IPS[0].split('.').slice(0, 3).join('.') + `.${Math.floor(Math.random() * 100)}` + // The `/api/cookies` endpoint only allows 1 request per minute + const res1 = await get('/api/cookies', { + headers: { + 'fastly-client-ip': mockFastlyIP, + }, + }) + expect(res1.statusCode).toBe(200) + + // A second request shouldn't be rate limited because it's from a Fastly IP + const res2 = await get('/api/cookies', { + headers: { + 'fastly-client-ip': mockFastlyIP, + }, + }) + expect(res2.statusCode).toBe(200) + }) +}) + describe('404 pages and their content-type', () => { const exampleNonLanguage404plain = ['/_next/image/foo'] test.each(exampleNonLanguage404plain)(