
Bug Report: org:search:dump fails on large sources with RangeError: Invalid array length #1536

@an-d-uu

Description


coveo org:search:dump consistently crashes when attempting to export a very large source (~4M items). The failure always occurs after ~600k–700k results and produces:

RangeError: Invalid array length

This is caused by the CLI aggregating all field names from every result into a single array (aggregatedFieldsWithDupes), which eventually exceeds JavaScript’s maximum array length. The issue is structural and unrelated to memory exhaustion or Node heap size.


Steps To Reproduce

Steps to reproduce the behavior:

  1. Run a source dump on a large source, for example:
    coveo org:search:dump --source "YourSourceName" --destination ./dump
  2. Allow the dump to progress past ~600k results.
  3. Observe the CLI terminating with:
    RangeError: Invalid array length
    
  4. Check the stack trace, which shows the failure in:
    • extractFieldsFromAggregatedResults
    • dumpAggregatedResults
    • aggregateResults
    • fetchResults

Expected behavior

org:search:dump should:

  • Successfully export large sources (millions of items).
  • Stream results directly to disk without accumulating unbounded arrays.
  • Track unique field names incrementally using a Set or similar structure.
  • Avoid exceeding JavaScript’s array-length limits.

The dump should complete regardless of source size or number of fields.


Screenshots

Stack trace excerpt illustrating the error:

RangeError: Invalid array length
    at Array.push (<anonymous>)
    at Dump.extractFieldsFromAggregatedResults (.../dump.js:162:40)
    at Dump.dumpAggregatedResults (.../dump.js:157:14)
    at Dump.aggregateResults (.../dump.js:149:18)

Full output attached: error.log


Desktop:

  • OS: Windows 11 (23H2)
  • Browser: N/A (CLI operation)
  • CLI Version: Latest version as of 2025-12-09
  • Local Node version: e.g., 18.x
  • Local NPM version: e.g., 9.x

Where the problem occurs

The issue originates in dump.ts:

private extractFieldsFromAggregatedResults() {
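  // Collects every field name of every aggregated result, duplicates included,
  // into one ever-growing array; once that array hits the length limit, the
  // push below throws RangeError: Invalid array length.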
  this.aggregatedFieldsWithDupes.push(
    ...this.aggregatedResults.flatMap(Object.keys)
  );
}

Call chain:

extractFieldsFromAggregatedResults
  → dumpAggregatedResults
    → aggregateResults
      → fetchResults

Because aggregatedFieldsWithDupes grows:

  • with every result processed,
  • across the entire dump,
  • with duplicates retained,
  • and with potentially thousands of fields per item (dynamic fields, dictionary fields, system fields),

…the array eventually crosses JavaScript’s array-length ceiling (~2³²−1). The spread operator push(...hugeArray) triggers:

RangeError: Invalid array length

Increasing Node’s heap size does not affect this outcome.
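
The length ceiling itself is easy to reproduce in isolation (a minimal Node.js/TypeScript sketch for illustration, independent of the CLI code):

const arr: string[] = [];
arr.length = 2 ** 32 - 1; // maximum valid array length; the array stays sparse, so nothing is allocated
try {
  arr.push('one-more-field'); // the new length would be 2^32
} catch (e) {
  console.log((e as Error).message); // "Invalid array length" on V8 (Node.js)
}

In the CLI the same limit is reached gradually, as field names accumulate across the whole dump.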


Why this design fails at scale

  • JavaScript array lengths are capped at 2³²−1 (a 32-bit limit).
  • aggregatedFieldsWithDupes grows unbounded as the dump progresses.
  • Large sources with many fields multiply the size of this array quickly.
  • The CLI attempts to aggregate all field names for all items before writing output, which is not feasible for multi-million-item dumps.

Thus, the failure is inherent to the current design rather than an environmental or memory constraint.


Impact

  • org:search:dump cannot export large enterprise sources.
  • The crash occurs reliably around 600k–700k items processed.
  • Prevents use of the CLI for:
    • updating permanentid mappings (ID_MAPPING) across associated machine learning models,
    • audits,
    • analytics extraction.
  • --fieldsToExclude helps only in limited cases; many sources contain high-cardinality dynamic fields where broad exclusion is not feasible.

Proposed fix

Switch from aggregate-then-write to a streaming write model

Rather than accumulating all field names and all results in memory, modify the algorithm as follows (a sketch appears after the list):

  1. Write each page of results directly to disk on retrieval.
  2. Track field names using a Set instead of one giant array that is only deduplicated when output is written.
  3. Avoid ever using push(...largeArray).
  4. Keep memory usage constant regardless of source size.
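
A rough sketch of this shape, for illustration only (fetchPage, the ResultPage type, the JSON Lines output format, and the file names below are placeholders, not the CLI's actual API or output format):

import {createWriteStream} from 'node:fs';
import {writeFile} from 'node:fs/promises';

// Hypothetical page shape; the real CLI works with Coveo search results.
interface ResultPage {
  results: Record<string, unknown>[];
}

async function dumpSource(
  fetchPage: (page: number) => Promise<ResultPage>,
  destination: string
): Promise<void> {
  const out = createWriteStream(`${destination}/results.jsonl`);
  const uniqueFields = new Set<string>(); // bounded by the number of distinct fields, not by result count
  let page = 0;

  for (;;) {
    const {results} = await fetchPage(page++);
    if (results.length === 0) break;

    for (const result of results) {
      // Track field names incrementally; no unbounded array, no push(...spread).
      for (const field of Object.keys(result)) {
        uniqueFields.add(field);
      }
      // Write each result to disk as soon as it is retrieved.
      // (Backpressure handling omitted for brevity.)
      out.write(JSON.stringify(result) + '\n');
    }
  }

  out.end();
  // Persist the deduplicated field list once, after the dump completes.
  await writeFile(
    `${destination}/fields.json`,
    JSON.stringify([...uniqueFields].sort(), null, 2)
  );
}

Memory usage stays proportional to one page of results plus the set of distinct field names, so source size no longer matters.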

Benefits

  • Eliminates array-length overflow.
  • Enables dumping extremely large sources.
  • Reduces memory footprint dramatically.
  • Matches proven durable patterns used in log processing, ETL tools, and database dump pipelines.
