Add CHANGELOG.md files for Microsoft.Extensions.DataIngestion* projects #7072
base: main
Co-authored-by: adamsitnik <[email protected]>
- Introduced `IngestionDocument` class for representing format-agnostic document containers
- Introduced `IngestionDocumentElement` abstract base class for document elements
- Introduced document element types:
  - `IngestionDocumentSection` - Represents a section or page in a document
  - `IngestionDocumentParagraph` - Represents a paragraph
  - `IngestionDocumentHeader` - Represents a header with optional level
  - `IngestionDocumentFooter` - Represents a footer
  - `IngestionDocumentTable` - Represents a table with 2D cell array
  - `IngestionDocumentImage` - Represents an image with optional binary content and alternative text
- Introduced `IngestionChunk<T>` class for representing content chunks
- Introduced `IngestionChunker<T>` abstract base class for splitting documents into chunks
- Introduced `IngestionDocumentReader` abstract base class for reading source content and converting to documents
- Introduced `IngestionDocumentProcessor` abstract base class for processing documents
- Introduced `IngestionChunkProcessor<T>` abstract base class for processing chunks
- Introduced `IngestionChunkWriter<T>` abstract base class for writing chunks to storage
@adamsitnik, it's up to you, but I don't know that all of this detail is necessary; it's part of "Initial preview release". Same goes for the others. i.e. this doesn't need to be documentation for what composes the library.
@stephentoub to be honest I am not sure myself, but on the other hand saying that it was just "Initial preview release" is not enough (for me) and I am not sure what middle ground should look like. With what Copilot wrote here, it will be easy to monitor upcoming changes. And to use this file to answer question "what exactly was included in XYZ release".
adamsitnik left a comment
@copilot It looks great, we just need two minor corrections.
src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md
…nker) Co-authored-by: adamsitnik <[email protected]>
adamsitnik left a comment
LGTM!
Pull request overview
This PR adds CHANGELOG.md files for the four Microsoft.Extensions.DataIngestion* projects to document their initial public-facing APIs. The CHANGELOGs follow the repository's established format and document all public APIs introduced in the 10.0.0-preview.1 release, with the core DataIngestion project also documenting the SectionChunker addition from 10.1.0-preview.1 (PR #7015).
Key changes:
- Documents public APIs for the DataIngestion abstractions, core implementations, and two reader packages
- Correctly excludes internal types like `MarkdownParser` and `ElementsChunker` from documentation
- Follows the repository's CHANGELOG format and conventions
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/Libraries/Microsoft.Extensions.DataIngestion/CHANGELOG.md | Documents core library APIs including pipeline, chunkers, enrichers, and vector store writer across two versions (10.0.0-preview.1 and 10.1.0-preview.1) |
| src/Libraries/Microsoft.Extensions.DataIngestion.Abstractions/CHANGELOG.md | Documents all abstraction types including IngestionDocument, IngestionChunk, base classes for chunkers/readers/processors/writers, and document element types |
| src/Libraries/Microsoft.Extensions.DataIngestion.MarkItDown/CHANGELOG.md | Documents MarkItDownReader and MarkItDownMcpReader classes for converting documents via MarkItDown tool and MCP server |
| src/Libraries/Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md | Documents MarkdownReader class for reading markdown documents |

Create CHANGELOG.md files for Microsoft.Extensions.DataIngestion* projects ✅
Successfully created CHANGELOG.md files for all 4 DataIngestion projects documenting their public-facing APIs.
Completed:
Summary:
- Microsoft.Extensions.DataIngestion.Abstractions/CHANGELOG.md - Documents all abstraction types including `IngestionDocument`, `IngestionChunk`, `IngestionChunker`, `IngestionDocumentReader`, and all `IngestionDocumentElement` types (Section, Paragraph, Header, Footer, Table, Image)
- Microsoft.Extensions.DataIngestion/CHANGELOG.md - Documents concrete implementations:
- Microsoft.Extensions.DataIngestion.MarkItDown/CHANGELOG.md - Documents `MarkItDownReader` and `MarkItDownMcpReader` classes
- Microsoft.Extensions.DataIngestion.Markdig/CHANGELOG.md - Documents `MarkdownReader` class
All CHANGELOGs follow the same format as existing files in the repository and only document public APIs (internal types removed based on feedback).