Contributor

Copilot AI commented Nov 21, 2025

Create CHANGELOG.md files for Microsoft.Extensions.DataIngestion* projects ✅

Successfully created CHANGELOG.md files for all 4 DataIngestion projects documenting their public-facing APIs.

Completed:

  • Explore the DataIngestion projects and understand their public APIs
  • Review PRs merged before "Introduce SectionChunker" (#7015, merged Nov 10, 2025) to understand initial APIs
  • Create CHANGELOG.md for Microsoft.Extensions.DataIngestion.Abstractions with 10.0.0-preview.1 entry
  • Create CHANGELOG.md for Microsoft.Extensions.DataIngestion with 10.0.0-preview.1 and 10.1.0-preview.1 entries
  • Create CHANGELOG.md for Microsoft.Extensions.DataIngestion.MarkItDown with 10.0.0-preview.1 entry
  • Create CHANGELOG.md for Microsoft.Extensions.DataIngestion.Markdig with 10.0.0-preview.1 entry
  • Verify CHANGELOG format matches existing patterns in the repository
  • Review and finalize all CHANGELOGs
  • Code review completed
  • Security checks completed
  • Address feedback: Remove internal types (MarkdownParser and ElementsChunker)

Summary:

  1. Microsoft.Extensions.DataIngestion.Abstractions/CHANGELOG.md - Documents all abstraction types including IngestionDocument, IngestionChunk, IngestionChunker, IngestionDocumentReader, and all IngestionDocumentElement types (Section, Paragraph, Header, Footer, Table, Image)

  2. Microsoft.Extensions.DataIngestion/CHANGELOG.md - Documents concrete implementations:

    • Version 10.1.0-preview.1: SectionChunker (from PR #7015, "Introduce SectionChunker")
    • Version 10.0.0-preview.1: IngestionPipeline, chunkers (HeaderChunker, SemanticSimilarityChunker), enrichers (Classification, Keyword, Sentiment, Summary, ImageAlternativeText), and VectorStoreWriter
  3. Microsoft.Extensions.DataIngestion.MarkItDown/CHANGELOG.md - Documents MarkItDownReader and MarkItDownMcpReader classes

  4. Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md - Documents MarkdownReader class

All CHANGELOGs follow the same format as existing files in the repository and only document public APIs (internal types removed based on feedback).

Original prompt

This section details on the original issue you should resolve

<issue_title>Create CHANGELOG.md files for Microsoft.Extensions.DataIngestion* projects</issue_title>
<issue_description>For the 10.0.0-preview.1 version, briefly describe every public facing API that was merged before #7015 was merged.

For 10.0.1-preview.1, mention just #7015 for the core Microsoft.Extensions.DataIngestion project</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title from "[WIP] Add CHANGELOG.md files for DataIngestion projects" to "Add CHANGELOG.md files for Microsoft.Extensions.DataIngestion* projects" Nov 21, 2025
Copilot AI requested a review from adamsitnik November 21, 2025 13:09
Copilot finished work on behalf of adamsitnik November 21, 2025 13:09
Comment on lines +6 to +20
- Introduced `IngestionDocument` class for representing format-agnostic document containers
- Introduced `IngestionDocumentElement` abstract base class for document elements
- Introduced document element types:
  - `IngestionDocumentSection` - Represents a section or page in a document
  - `IngestionDocumentParagraph` - Represents a paragraph
  - `IngestionDocumentHeader` - Represents a header with optional level
  - `IngestionDocumentFooter` - Represents a footer
  - `IngestionDocumentTable` - Represents a table with 2D cell array
  - `IngestionDocumentImage` - Represents an image with optional binary content and alternative text
- Introduced `IngestionChunk<T>` class for representing content chunks
- Introduced `IngestionChunker<T>` abstract base class for splitting documents into chunks
- Introduced `IngestionDocumentReader` abstract base class for reading source content and converting to documents
- Introduced `IngestionDocumentProcessor` abstract base class for processing documents
- Introduced `IngestionChunkProcessor<T>` abstract base class for processing chunks
- Introduced `IngestionChunkWriter<T>` abstract base class for writing chunks to storage
Member

@stephentoub stephentoub Nov 21, 2025

@adamsitnik, it's up to you, but I don't know that all of this detail is necessary; it's part of "Initial preview release". Same goes for the others. i.e. this doesn't need to be documentation for what composes the library.

Member

@stephentoub to be honest I am not sure myself, but on the other hand saying that it was just "Initial preview release" is not enough (for me) and I am not sure what middle ground should look like. With what Copilot wrote here, it will be easy to monitor upcoming changes. And to use this file to answer question "what exactly was included in XYZ release".

Member

@adamsitnik adamsitnik left a comment

@copilot It looks great, we just need two minor corrections.

Copilot AI requested a review from adamsitnik November 21, 2025 13:25
Copilot finished work on behalf of adamsitnik November 21, 2025 13:25
Member

@adamsitnik adamsitnik left a comment

LGTM!

@stephentoub
Member

> to be honest I am not sure myself, but on the other hand saying that it was just "Initial preview release" is not enough (for me)

Can you elaborate on why? This is a change log. Saying that it's the initial implementation is what changed. It doesn't need to document the full shape of what was added, and frankly I don't think anyone will care, because they look to a change log to understand how various changes impact their previous adoption, and there isn't any.

This is what the other AI library changelogs did, e.g.
image

Subsequent changes can be detailed, covering the diff from the previous source.

Up to you, though.

@stephentoub stephentoub marked this pull request as ready for review November 21, 2025 15:35
Copilot AI review requested due to automatic review settings November 21, 2025 15:35
Copilot finished reviewing on behalf of stephentoub November 21, 2025 15:38
Contributor

Copilot AI left a comment

Pull request overview

This PR adds CHANGELOG.md files for the four Microsoft.Extensions.DataIngestion* projects to document their initial public-facing APIs. The CHANGELOGs follow the repository's established format and document all public APIs introduced in the 10.0.0-preview.1 release, with the core DataIngestion project also documenting the SectionChunker addition from 10.1.0-preview.1 (PR #7015).

Key changes:

  • Documents public APIs for the DataIngestion abstractions, core implementations, and two reader packages
  • Correctly excludes internal types like MarkdownParser and ElementsChunker from documentation
  • Follows the repository's CHANGELOG format and conventions

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md | Documents core library APIs including pipeline, chunkers, enrichers, and vector store writer across two versions (10.0.0-preview.1 and 10.1.0-preview.1) |
| src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/CHANGELOG.md | Documents all abstraction types including IngestionDocument, IngestionChunk, base classes for chunkers/readers/processors/writers, and document element types |
| src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/CHANGELOG.md | Documents MarkItDownReader and MarkItDownMcpReader classes for converting documents via MarkItDown tool and MCP server |
| src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md | Documents MarkdownReader class for reading markdown documents |
