Skip to content

Conversation

@0xFirestorm
Copy link

Description

This PR replaces three GPL/LGPL licensed dependencies with permissively licensed alternatives. All replacements maintain full backward compatibility with existing functionality.
Dependencies Replaced:

  • aioconsole → questionary (GPL-3.0 → MIT)
    Used in XPIA manual workflow for async user input
    File: pyrit/executor/workflow/xpia.py

  • pycountry → iso3166 (LGPL-2.1 → Apache-2.0)
    Provides country/region names for bias testing datasets
    File: pyrit/datasets/seclists_bias_testing_dataset.py
    Added hardcoded subdivision list extracted from pycountry's subdivision list, since iso3166 only provides countries

  • fpdf2 → ReportLab (LGPL-3.0 → BSD)
    Generates PDFs and overlays text on existing PDFs
    File: pyrit/prompt_converter/pdf_converter.py
    Implemented custom text wrapping to match original behavior.

Issue no:
#1123

@romanlutz

Tests and Documentation

Unit test for the three modified files:
xpia
seclists(pycountry)
fpdf

Copy link
Contributor

@romanlutz romanlutz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for coming up with replacement suggestions! I responded on all three. Let me knwo what you think.

subdivisions = list(pycountry.subdivisions)
# NEW:
# Build country names list from iso3166
country_names = [c.name for c in countries]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we're just doing a list like this for subdivision names why not for countries as well? Perhaps that's simpler than adding a dependency

Comment on lines 626 to 627
prompt = "Please trigger the processing target's execution and paste the output here:"
return await questionary.text(prompt).ask_async() # type: ignore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reluctant to introduce yet another dependency just for this one line...

So far, this workflow is something we run by itself anyway so blocking with input(...) would be ok. Or maybe something like

await asyncio.to_thread(input, prompt)

is also fine. Not sure if that messes with jupyter, though.

@0xFirestorm
Copy link
Author

@0xFirestorm please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

@0xFirestorm
Copy link
Author

Hi @romanlutz ,
I’ve removed the krippendorff dependency and implemented the functionality myself. I’ve pushed the changes. Could you please review them when you get a chance?

@romanlutz
Copy link
Contributor

Sounds good. I wanted to review but it seems like you pulled in th changes from the Entra PR. I'll pull when that's merged to make it easier to see the actual diff

Copy link
Contributor

@romanlutz romanlutz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marked as "request changes" just because of krippendorff which we need to address differently as discussed via Discord. Thanks for the excellent contribution and please feel free to ping me on Discord anytime.

pyproject.toml Outdated
"feedgen>=1.0.0",
"flake8>=7.2.0",
"flake8-copyright>=0.2.4",
"fpdf2>=2.8.3",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be removed?

Comment on lines 129 to 139
Parameters
----------
n_v : ndarray, with shape (V,)
Number of pairable elements for each value.
dtype : data-type
Result and computation data-type.
Returns
-------
e : ndarray, with shape (V, V)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Across this file, the doc style isn't what we're using elsewhere ("Google style"). Can you adjust that?

from fpdf import FPDF
from pypdf import PageObject, PdfReader, PdfWriter

# from fpdf import FPDF
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# from fpdf import FPDF

Comment on lines 177 to 195
# def _generate_pdf(self, content: str) -> bytes:
# """
# Generates a PDF with the given content.

# Args:
# content (str): The text content to include in the PDF.

# Returns:
# bytes: The generated PDF content in bytes.
# """
# pdf = FPDF(format=(self._page_width, self._page_height)) # Use custom page size
# pdf.add_page()
# pdf.set_font(self._font_type, size=self._font_size) # Use custom font settings
# pdf.multi_cell(self._column_width, self._row_height, content) # Use configurable cell dimensions

# pdf_bytes = BytesIO()
# pdf.output(pdf_bytes)
# return pdf_bytes.getvalue()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this and the references to FPDF in the code below. I'll validate this with an example I have and will report back.

from pyrit.score.scorer_evaluation.metrics_type import MetricsType
from pyrit.score.true_false.true_false_scorer import TrueFalseScorer

from .krippendorff import alpha as _krippendorff_alpha
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably just call it that from the start

@@ -0,0 +1,154 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
"""Krippendorff's alpha for ordinal data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this adapted from LightTag/simpledorff? If so, it would be nice to give them a shoutout in the docstring.

We should also add the unit tests they have to make sure the implementation doesn't have a small bug anywhere.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was inspired by LightTag/simpledorff’s decomposition but reimplemented in numpy based form for ordinal data. I’ll add a docstring acknowledgment and create equivalent unit tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great if the function was broken up a bit more to help with readability and to make it easier to understand. If you want I can provide suggestions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I’d appreciate some suggestions on that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I should be able to get to this by Thursday.

Copy link
Contributor

@romanlutz romanlutz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is essentially ready, just needs a few small tweaks. We can merge it this week and include in our upcoming release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants