@bikramkhastgir

Issue: Addresses #109 (Error loading document: Found multiple competing main tex files)

This PR enhances the find_main_tex_file function in functions/import_pipeline/latex_utils.py so that it handles more edge cases when identifying the master document in multi-file LaTeX projects.

Fix Description:

  • Rule 1 is unchanged. valid_main_path now stores each file's content alongside its path, as a nested list, so the files do not have to be reread.
  • Added two more rule-based heuristics.
    • Rule 2: Check for \input or \include, since either one indicates a main file. If multiple such files are found, apply Rule 3.
    • Rule 3: Exclude files that use document classes typically meant for figures or parts (e.g. 'standalone', 'tikz').
  • If no file or more than one file still remains, raise ValueError (as in the original logic).
  • Added sets of exclude and include constants for future enhancements.
  • Changed the return type and logic to handle nested lists.
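
As a rough illustration of how Rules 2 and 3 could narrow the candidates, here is a minimal sketch. It is not the code from this PR: the names narrow_main_candidates, strip_comments and EXCLUDED_DOCUMENT_CLASSES are hypothetical, Rule 1 is not reproduced (the input is assumed to be the [path, content] pairs it already produced), and the sketch assumes Rule 2 is skipped when no candidate contains \input or \include.

```python
import re

# Hypothetical constant; the PR's actual names and values are not shown here.
EXCLUDED_DOCUMENT_CLASSES = {"standalone", "tikz"}

# Matches \input{...} or \include{...}, but not \includegraphics{...}.
# The brace-less form "\input file" is deliberately not handled in this sketch.
INCLUDE_RE = re.compile(r"\\(?:input|include)\s*\{")
DOCCLASS_RE = re.compile(r"\\documentclass\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}")


def strip_comments(content):
    """Remove LaTeX comments (an unescaped % up to the end of the line)."""
    return re.sub(r"(?<!\\)%.*", "", content)


def narrow_main_candidates(candidates):
    """Pick one [path, content] pair from candidates that passed Rule 1."""
    if len(candidates) > 1:
        # Rule 2: prefer files that pull in other files.
        with_includes = [
            c for c in candidates if INCLUDE_RE.search(strip_comments(c[1]))
        ]
        if with_includes:  # assumption: keep all candidates if none match
            candidates = with_includes

    if len(candidates) > 1:
        # Rule 3: drop files whose document class is meant for figures or parts.
        kept = []
        for path, content in candidates:
            match = DOCCLASS_RE.search(strip_comments(content))
            doc_class = match.group(1).strip() if match else None
            if doc_class not in EXCLUDED_DOCUMENT_CLASSES:
                kept.append([path, content])
        candidates = kept

    if len(candidates) != 1:
        # Same outcome as the original logic: refuse to guess.
        raise ValueError(
            f"Expected exactly one main tex file, found {len(candidates)}"
        )
    return candidates[0]
```

Keeping the content next to the path means each rule can work on text that is already in memory, which is the point of the nested-list change described above.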

Test Data Example:
arXiv submissions containing multiple .tex files, with and without a conventionally named main file (main.tex):

  1. https://arxiv.org/abs/2510.14090v1 - Quantum Low-Density Parity-Check Codes
  2. https://arxiv.org/abs/2412.15163 - Operationalising Rawlsian Ethics for Fairness in Norm-Learning Agents
  3. https://arxiv.org/abs/2511.10409 - Explaining Decentralized Multi-Agent Reinforcement Learning Policies
  4. https://arxiv.org/abs/2511.09724 - PALMS+: Modular Image-Based Floor Plan Localization Leveraging Depth Foundation Model (has main.tex along with multiple .tex files)

Changes:
File: functions/import_pipeline/latex_utils.py
Both the new constants and the enhanced find_main_tex_file function are in this file.
