Duplicate entry in results table if max_ll is the same across two model runs #36

@grst

I have a case where a certain gene appears twice in the results table. This is annoying, because in that case I can't merge it back into an AnnData object.

| | FSV | M | g | l | max_delta | max_ll | max_mu_hat | max_s2_t_hat | model | n | s2_FSV | s2_logdelta | time | BIC | max_ll_null | LLR | pval | qval |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 311 | 2.06039e-09 | 4 | ENSG00000117090 | 54 | 4.85165e+08 | 1719.84 | 0.0110949 | 2.31245e-11 | SE | 2068 | 0.0197839 | 3.37435e+15 | 0.00366592 | -3409.13 | 1719.84 | -0.000104498 | 1 | 1 |
| 312 | 2.04339e-09 | 4 | ENSG00000117090 | 181.915 | 4.85165e+08 | 1719.84 | 0.0110949 | 2.31245e-11 | SE | 2068 | 0.0194589 | 3.37435e+15 | 0.00116491 | -3409.13 | 1719.84 | -0.000104498 | 1 | 1 |

I think I tracked it down to

model_results = model_results[model_results.groupby(['g'])['max_ll'].transform(max) == model_results['max_ll']]

where, for each gene, the result from the model run with the highest max_ll is chosen. In this case, the max_ll value is identical across two model runs, so both rows pass the filter and the gene appears twice.

I'm unsure what the best solution is here. Just pick the first one?
The entries seem almost the same anyway, except for the FSV and `l` values.
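
For illustration, here is a minimal sketch of the "pick the first one" option, not a tested patch against the library. It assumes the results are a pandas DataFrame with the `g` and `max_ll` columns shown above. The toy frame is cut down to three columns, and `GENE_B` is a made-up second gene added only to show the non-tied case.

```python
import pandas as pd

# Toy stand-in for model_results; only the columns relevant to the tie.
# The ENSG00000117090 rows reuse the l and max_ll values from the table above;
# GENE_B is hypothetical.
model_results = pd.DataFrame({
    "g": ["ENSG00000117090", "ENSG00000117090", "GENE_B", "GENE_B"],
    "l": [54.0, 181.915, 10.0, 20.0],
    "max_ll": [1719.84, 1719.84, 5.0, 7.0],
})

# Current selection: every row that equals the per-gene maximum is kept,
# so a tie in max_ll yields two rows for the same gene.
tied = model_results[
    model_results.groupby("g")["max_ll"].transform("max") == model_results["max_ll"]
]

# Possible alternative: idxmax returns the label of the first row holding the
# maximum in each group, so exactly one row per gene survives.
best = model_results.loc[model_results.groupby("g")["max_ll"].idxmax()]

print(len(tied), len(best))
```

With this approach the tie is broken by row order, so the run that appears first in the table wins (here the one with `l` = 54). Whether that is the right run to keep is a separate question.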
