@takeruhukushima

  • Changed select_consensus_statements to return up to 100 results by default
  • Added comprehensive test coverage for multi-group consensus
  • Implemented proper sorting by z-score (p-test) for representativeness
  • Enhanced test output to show detailed metrics for all results

Signed-off-by: takeru.fukushima <[email protected]>
@nicobao
Member

nicobao commented Sep 8, 2025

Thank you @takeruhukushima for your PR :)

It's on the right track!

I think, however, that we shouldn't modify the existing API, as it is designed to be 1:1 identical with the pol.is API.

Instead, we should probably either create a new function for it, or add a configuration variable that controls what the returned list looks like (probably better).

Finally, our need is not only to provide the top 100, but to provide the list of ALL statements ranked by representativeness. I think it's probably not necessary to add another API. We can just add the relevant info in statements_df so that the library consumer can re-order the list of statements based on the representativeness for each group.
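
As a rough sketch of what that re-ordering could look like on the consumer side: the keys `statement_id`, `group_id` and `repness` below are hypothetical, and plain dicts stand in for rows of statements_df, so this only illustrates the idea (group, then sort), not the library's actual schema.

```python
from collections import defaultdict

# Hypothetical rows standing in for statements_df. The key names are
# illustrative assumptions, not the library's real column names.
rows = [
    {"statement_id": 1, "group_id": 0, "repness": 2.1},
    {"statement_id": 2, "group_id": 0, "repness": 3.4},
    {"statement_id": 1, "group_id": 1, "repness": 0.7},
    {"statement_id": 2, "group_id": 1, "repness": 0.2},
    {"statement_id": 3, "group_id": 1, "repness": 1.9},
]


def rank_by_group(rows):
    """Return {group_id: [statement_id, ...]}, most representative first."""
    by_group = defaultdict(list)
    for row in rows:
        by_group[row["group_id"]].append(row)
    return {
        gid: [
            r["statement_id"]
            for r in sorted(group, key=lambda r: r["repness"], reverse=True)
        ]
        for gid, group in by_group.items()
    }


ranking = rank_by_group(rows)
```

With this shape, no statement is dropped: every group gets the full list, and the consumer decides how many to show.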

What we could do is:

Takeru, @patcon is the main maintainer of the library so I'll address him to see what he thinks :)

Hey @patcon, just a heads up: Takeru did the Japanese translation in Agora, and he's a young and motivated student, very eager to learn and contribute to civic tech tools :)

I told him we're interested at Agora in being able to retrieve more than just 5 representative opinions, but I didn't detail much.

Let me know what you think of the requirements, which I think we've already briefly discussed last time we spoke!

@nicobao
Member

nicobao commented Sep 8, 2025

Also @takeruhukushima, could you join the Polis User Discord group that @patcon manages?
https://discord.com/invite/wFWB8kzQpP
We can discuss things in the red-dwarf-polis-library channel as well

@nicobao changed the title from "feat(consensus): Return up to 100 opinions ranked by representativeness" to "feat(stats): allow users to rank all opinions by representativeness" on Sep 8, 2025
@nicobao changed the title from "feat(stats): allow users to rank all opinions by representativeness" to "feat(stats): allow users to rank all opinions by representativeness for each group" on Sep 8, 2025
@takeruhukushima
Author

I'm sorry I messed with the existing functions.

I picked the top 100 entries because I thought there might be a theoretical risk of crashes or similar issues. That was my own arbitrary judgment, and I should have asked first.

@nicobao
Member

nicobao commented Sep 8, 2025

> I'm sorry I messed with the existing functions.
>
> I picked the top 100 entries because I thought there might be a theoretical risk of crashes or similar issues. That was my own arbitrary judgment, and I should have asked first.

It's chill, thanks for your efforts Takeru, it's on the right track!

I'll wait for @patcon's feedback first, and I'll give you more detailed feedback on your code and on the requirements in a few days, if you don't mind!

- Changed pick_max, confidence and prob_threshold parameters to be Optional
- Updated functions in consensus.py, stats.py, and base.py

Signed-off-by: takeru.fukushima <[email protected]>
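
For illustration only, an optional cap of this kind could behave as sketched below. `select_top` is a hypothetical helper, not a function from consensus.py, stats.py or base.py, and the default of 5 is an assumption based on the "5 representative opinions" mentioned earlier in the thread.

```python
from typing import List, Optional, Tuple


def select_top(
    scored: List[Tuple[int, float]],
    pick_max: Optional[int] = 5,
) -> List[Tuple[int, float]]:
    """Hypothetical helper: sort (statement_id, score) pairs, best first.

    pick_max=None means "no cap": the full ranked list is returned.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return ranked if pick_max is None else ranked[:pick_max]


scored = [(1, 0.2), (2, 0.9), (3, 0.5)]
everything = select_top(scored, pick_max=None)
top_two = select_top(scored, pick_max=2)
```

Keeping the original default means existing callers see no change, while passing `None` opts in to the full list.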
@nicobao
Member

nicobao commented Sep 19, 2025

Hi @takeruhukushima
I am reviewing this today.

@nicobao
Member

nicobao commented Sep 23, 2025

Hi @takeruhukushima
Sorry for the long delay. I'm taking over the role of maintainer for the time being.
This PR is a bit too early, as we haven't defined the scope as to what should be done. We've discussed it on Monday during the 1st RedDwarf Open Call, but we're still not sure how to proceed.

In general, @patcon, what I'd like is to have all of these values available: pa, pat, ra, rat, pd, pdt, rd and rdt, so I can rank all the statements.
We'd modify statements_df to have all these values for each cluster.
We could then add a score that end users (library consumers) can use to rank the statements accordingly. The score can be calculated via the following function:
https://github.com/nicobao/red-dwarf/blob/fix-repful-for/reddwarf/utils/stats.py#L507-L509
We'd also need to expose a way to calculate whether a statement is repful_for "agree", "disagree" or nothing at all. Better still, we'd want to create granular functions that distinguish "agree", "strong_agree", "disagree", "strong_disagree", "divisive" and "strong_divisive". Logically, the score calculated above would rank the "strong" repful_for values first. And of course, sometimes repful_for would be "not_representative" or just undefined.
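
To make the discussion concrete, here is a minimal sketch of how these per-cluster quantities could be computed. It follows my understanding of the Polis convention (a smoothed probability, a ratio against the rest of the participants, and a z-like test for each). That reading is an assumption and is not confirmed in this thread. All function names, the product score and the 1.2816 cutoff are hypothetical, and the score is not necessarily what the linked function does.

```python
import math

Z_90 = 1.2816  # one-sided 90% z value; using it as the cutoff is an assumption


def prop_test(successes, n):
    """z-like test of a proportion against 0.5, with +1 pseudo-counts."""
    s, m = successes + 1, n + 1
    return 2 * math.sqrt(m) * (s / m - 0.5)


def two_prop_test(s_in, s_out, n_in, n_out):
    """Two-proportion z-score (cluster vs. everyone else), +1 pseudo-counts."""
    s_in, s_out, n_in, n_out = s_in + 1, s_out + 1, n_in + 1, n_out + 1
    pooled = (s_in + s_out) / (n_in + n_out)
    if pooled >= 1.0:  # no variance left, so the test is undefined
        return 0.0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_in + 1 / n_out))
    return (s_in / n_in - s_out / n_out) / se


def side_metrics(k_in, n_in, k_out, n_out):
    """Metrics for one side (agree or disagree): k votes on that side out of n."""
    p = (k_in + 1) / (n_in + 2)         # pa or pd
    p_out = (k_out + 1) / (n_out + 2)
    return {
        "p": p,
        "pt": prop_test(k_in, n_in),    # pat or pdt
        "r": p / p_out,                 # ra or rd
        "rt": two_prop_test(k_in, k_out, n_in, n_out),  # rat or rdt
    }


def score(m):
    """Hypothetical ranking score; only meaningful when pt and rt are positive."""
    return m["r"] * m["rt"] * m["p"] * m["pt"]


def repful_for(agree, disagree):
    """Pick the side with the larger ratio test, if both of its tests pass."""
    side, m = ("agree", agree) if agree["rt"] > disagree["rt"] else ("disagree", disagree)
    if m["pt"] > Z_90 and m["rt"] > Z_90:
        return side
    return "not_representative"


# Cluster: 9 agree and 1 disagree out of 10; everyone else: 2 agree, 7 disagree out of 10.
agree = side_metrics(9, 10, 2, 10)
disagree = side_metrics(1, 10, 7, 10)
label = repful_for(agree, disagree)
```

Under these assumptions, "strong" variants could be expressed as a stricter cutoff on the same tests, which is one reason to expose the raw values rather than only the final label.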

@patcon could you provide feedback on this?

I'd also like more clarity on what "confidence" means in the source code. And what exactly do pat and rat represent, as opposed to pa and ra? (Same question for disagree.)
Finally, how do we identify a "divisive statement for a specific cluster"?

@takeruhukushima I am sorry, we need a little more time to figure out the details; this is a complex feature. I'll come back to you about your "edit" PR in Agora, which should be easier to get through. Thank you again!
