Automatically sync translations across QuantEcon lecture repositories using Claude Sonnet 4.5

QuantEcon/action-translation-sync

Translation Sync Action

A GitHub Action that automatically synchronizes translations across repositories using Claude Sonnet 4.5.

Version: v0.5.1 | Status: Testing & Development ✅

Overview

This action monitors a source repository for merged pull requests and automatically translates changed MyST Markdown files to a target repository, creating pull requests for review.

Key Features:

  • 🌐 Language-Extensible: Easy configuration for multiple target languages
  • 🗺️ Heading-Map System: Robust cross-language section matching
  • 🔄 Intelligent Diff Translation: Only translates changed sections
  • ✏️ MyST Markdown Support: Preserves code blocks, math equations, and directives
  • 📚 Glossary Support: Built-in glossaries for consistent terminology
  • ✅ GPT5 Validated: 100% pass rate on 21 comprehensive test scenarios

Features

  • 🌐 Language Configuration (v0.5.1): Extensible system for language-specific rules (punctuation, typography)
  • 🗺️ Heading-Map System: Robust cross-language section matching that survives reordering
  • 🔄 Intelligent Diff Translation: Only translates changed sections, preserving existing translations
  • 📄 Full File Translation: Handles new files with complete translation
  • ✏️ MyST Markdown Support: Preserves code blocks, math equations, and MyST directives
  • 📚 Glossary Support: Built-in glossaries for consistent technical terminology (355 terms for zh-cn)
  • 📑 Automatic TOC Updates: Updates _toc.yml when new files are added
  • 🔍 PR-Based Workflow: All translations go through pull request review
  • ♻️ Recursive Subsections: Full support for nested headings at any depth (##-######)
  • ✅ Extensively Tested: 147 unit tests passing, 24 GitHub integration test scenarios
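
The recursive subsection support can be pictured as a small tree builder over the heading levels ## to ######. The sketch below is an illustration only: `Section` and `parseSections` are hypothetical names, not the API of src/parser.ts, and the fence handling is deliberately simple (it toggles on any line that opens with three backticks or tildes, so nested directives with longer fences are not covered).

```typescript
// Hypothetical section shape; the real parser's types may differ.
interface Section {
  level: number;       // 2 for "##" up to 6 for "######"
  heading: string;     // heading text without the leading #'s
  body: string[];      // lines that belong directly to this section
  children: Section[]; // nested subsections
}

function parseSections(markdown: string): Section[] {
  const roots: Section[] = [];
  const stack: Section[] = []; // currently open sections, outermost first
  let inFence = false;

  for (const line of markdown.split("\n")) {
    // A "## ..." line inside a fenced code block is a comment, not a heading.
    if (/^\s*(```|~~~)/.test(line)) {
      inFence = !inFence;
    }

    const match = inFence ? null : /^(#{2,6})\s+(.*)$/.exec(line);
    if (match === null) {
      // Ordinary line: attach it to the innermost open section.
      // Lines before the first heading (the preamble) are dropped here.
      if (stack.length > 0) {
        stack[stack.length - 1]!.body.push(line);
      }
      continue;
    }

    const section: Section = {
      level: match[1]!.length,
      heading: match[2]!.trim(),
      body: [],
      children: [],
    };

    // Close every open section at the same or a deeper level.
    while (stack.length > 0 && stack[stack.length - 1]!.level >= section.level) {
      stack.pop();
    }

    if (stack.length > 0) {
      stack[stack.length - 1]!.children.push(section);
    } else {
      roots.push(section);
    }
    stack.push(section);
  }
  return roots;
}
```

Because each section keeps its own children, a change in a #### subsection can be located without touching its siblings, which is the property the diff translation relies on.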

Usage

Basic Setup

Add this workflow to your source repository (e.g., .github/workflows/sync-translations.yml):

name: Sync Translations

on:
  pull_request:
    types: [closed]
    paths:
      - 'lectures/**/*.md'

jobs:
  sync-to-chinese:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    
    steps:
      - uses: quantecon/action-translation-sync@v1
        with:
          target-repo: 'quantecon/lecture-python.zh-cn'
          target-language: 'zh-cn'
          docs-folder: 'lectures/'
          source-language: 'en'
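          # Optional: omit glossary-path to use the built-in glossary
          glossary-path: '.github/translation-glossary.json'
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          # Note: the default GITHUB_TOKEN is scoped to the repository running the workflow;
          # opening PRs in target-repo typically requires a token with write access to that repository
          github-token: ${{ secrets.GITHUB_TOKEN }}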
          # Optional: Request reviewers for translation PRs
          pr-reviewers: 'username1,username2'
          pr-team-reviewers: 'translation-team'
          pr-labels: 'translation,automated,needs-review'

Inputs

| Input | Required | Default | Description |
|-------|----------|---------|-------------|
| target-repo | Yes | - | Target repository (format: owner/repo) |
| target-language | Yes | - | Target language code (e.g., zh-cn, ja, es) |
| docs-folder | No | lectures/ | Documentation folder to monitor |
| source-language | No | en | Source language code |
| glossary-path | No | - | Path to custom glossary (built-in glossary used by default) |
| toc-file | No | _toc.yml | Table of contents file name |
| anthropic-api-key | Yes | - | Anthropic API key for Claude |
| claude-model | No | claude-sonnet-4-5-20250929 | Claude model to use for translation |
| github-token | Yes | - | GitHub token for API access |
| pr-labels | No | translation-sync,automated | Comma-separated PR labels |
| pr-reviewers | No | - | Comma-separated GitHub usernames (e.g., user1,user2) |
| pr-team-reviewers | No | - | Comma-separated GitHub team slugs (e.g., team1,team2) |
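
Several inputs are plain strings with an implied structure: target-repo uses the owner/repo format, and pr-labels, pr-reviewers, and pr-team-reviewers are comma-separated lists. A minimal sketch of how such values could be normalized follows. The helper names `parseList` and `parseRepo` are assumptions made for illustration and are not taken from src/inputs.ts.

```typescript
// Hypothetical helper: split a comma-separated input into clean entries.
// Empty or missing inputs yield an empty list.
function parseList(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// Hypothetical helper: validate the "owner/repo" format used by target-repo.
function parseRepo(value: string): { owner: string; repo: string } {
  const [owner, repo, ...rest] = value.trim().split("/");
  if (!owner || !repo || rest.length > 0) {
    throw new Error(`target-repo must look like "owner/repo", got "${value}"`);
  }
  return { owner, repo };
}

// Example values taken from the workflow above.
const labels = parseList("translation,automated,needs-review");
const target = parseRepo("quantecon/lecture-python.zh-cn");
console.log(labels, target.owner, target.repo);
```

Trimming each entry means that 'user1, user2' and 'user1,user2' behave the same, which avoids a common source of silently ignored reviewers.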

Outputs

| Output | Description |
|--------|-------------|
| pr-url | URL of the created pull request |
| files-synced | Number of files synchronized |

Glossary Format

The action includes built-in glossaries for consistent translation across all QuantEcon lectures.

Location: glossary/{language}.json

Current glossaries:

  • glossary/zh-cn.json - Simplified Chinese (355 terms) ✅
  • glossary/ja.json - Japanese (planned)
  • glossary/es.json - Spanish (planned)

The built-in glossary is automatically used - no configuration needed!

See glossary/README.md for details on the glossary structure and how to contribute.

Custom Glossary (Optional)

If you need to add project-specific terms, you can provide a custom glossary:

with:
  glossary-path: '.github/custom-glossary.json'

Glossary format:

{
  "version": "1.0",
  "terms": [
    {
      "en": "household",
      "zh-cn": "家庭",
      "context": "economics"
    },
    {
      "en": "equilibrium",
      "zh-cn": "均衡"
    }
    }
  ],
  "style_guide": {
    "preserve_code_blocks": true,
    "preserve_math": true,
    "preserve_citations": true,
    "preserve_myst_directives": true
  }
}
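
To make the format concrete, here is a hedged sketch of how a glossary in this shape might be typed and queried. The `Glossary` and `GlossaryTerm` interfaces mirror the JSON above, while `relevantTerms` is a hypothetical helper: how the action actually selects terms and passes them to Claude is not described here. Matching is a plain case-insensitive substring test, so "household" also matches "households".

```typescript
// Types mirroring the JSON format shown above.
interface GlossaryTerm {
  en: string;
  context?: string;
  // Translations keyed by language code, e.g. "zh-cn".
  [language: string]: string | undefined;
}

interface Glossary {
  version: string;
  terms: GlossaryTerm[];
  style_guide?: Record<string, boolean>;
}

// Hypothetical helper: pick the glossary entries whose English term
// occurs in the text and that have a translation for the target language.
function relevantTerms(
  glossary: Glossary,
  text: string,
  targetLanguage: string
): Array<[string, string]> {
  const haystack = text.toLowerCase();
  const pairs: Array<[string, string]> = [];
  for (const term of glossary.terms) {
    const translation = term[targetLanguage];
    if (translation && haystack.indexOf(term.en.toLowerCase()) !== -1) {
      pairs.push([term.en, translation]);
    }
  }
  return pairs;
}

const exampleGlossary: Glossary = {
  version: "1.0",
  terms: [
    { en: "household", "zh-cn": "家庭", context: "economics" },
    { en: "equilibrium", "zh-cn": "均衡" },
  ],
};

console.log(relevantTerms(exampleGlossary, "Each household chooses consumption.", "zh-cn"));
```

Filtering to the terms that occur in a section keeps the prompt short even with a 355-term glossary, at the cost of missing inflected forms that a plain substring test does not catch.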

How It Works

  1. Trigger: Activates when a PR is merged in the source repository
  2. Detection: Identifies changed MyST Markdown files
  3. Analysis: For each file:
    • If file exists in target: Detects specific changes (diff mode)
    • If file is new: Translates entire file (full mode)
  4. Section Matching: Uses heading-map system for robust cross-language matching
  5. Translation: Uses Claude Sonnet 4.5 with glossary support
  6. Heading-Map Update: Automatically maintains English→Translation mappings
  7. Validation: Verifies MyST syntax of translated content
  8. PR Creation: Opens a pull request in the target repository
  9. Review: Team reviews and merges the translation
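
Steps 3 and 4 can be sketched as two small decisions: choose diff or full mode per file, then classify section-level changes as ADD, MODIFY, or DELETE. This is a simplified illustration under stated assumptions, not the logic in src/diff-detector.ts: sections are reduced to a heading-to-content record, and the names `chooseMode` and `detectChanges` are hypothetical.

```typescript
type ChangeType = "ADD" | "MODIFY" | "DELETE";

interface SectionChange {
  type: ChangeType;
  heading: string;
}

// Simplification: a file is a record from section heading to section text.
type Sections = Record<string, string>;

function hasKey(sections: Sections, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(sections, key);
}

// Step 3: files that already exist in the target are diffed,
// new files are translated in full.
function chooseMode(existsInTarget: boolean): "diff" | "full" {
  return existsInTarget ? "diff" : "full";
}

// Compare the source file before and after the merged PR.
function detectChanges(before: Sections, after: Sections): SectionChange[] {
  const changes: SectionChange[] = [];
  for (const heading of Object.keys(after)) {
    if (!hasKey(before, heading)) {
      changes.push({ type: "ADD", heading });
    } else if (before[heading] !== after[heading]) {
      changes.push({ type: "MODIFY", heading });
    }
  }
  for (const heading of Object.keys(before)) {
    if (!hasKey(after, heading)) {
      changes.push({ type: "DELETE", heading });
    }
  }
  return changes;
}

const before: Sections = { Introduction: "a", "Economic Model": "b", "Python Setup": "c" };
const after: Sections = { Introduction: "a", "Economic Model": "b (revised)", Results: "d" };
console.log(chooseMode(true), detectChanges(before, after));
```

Only the sections reported as ADD or MODIFY would need to be sent for translation; unchanged sections keep their existing translation.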

Heading-Map System (v0.4.0)

The action uses a heading-map system to reliably match sections across language versions:

---
title: Dynamic Programming
heading-map:
  Introduction: 简介
  Economic Model: 经济模型
  Python Setup: Python 设置
---

Benefits:

  • 🎯 Robust matching: Finds sections even if reordered or restructured
  • 🔄 Self-maintaining: Automatically populated and updated
  • 👁️ Transparent: Visible in document frontmatter
  • 📖 Human-readable: Easy to inspect and manually correct if needed

See docs/HEADING-MAPS.md for detailed guide.
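
The reason a heading map survives reordering is that lookup goes by heading text rather than by position. A minimal sketch, assuming the frontmatter has already been parsed into a plain record: `HeadingMap`, `TargetSection`, and `findTargetSection` are hypothetical names, and what the action does when a heading is missing from the map is left to the caller here.

```typescript
// English heading -> translated heading, as in the frontmatter above.
type HeadingMap = Record<string, string>;

interface TargetSection {
  heading: string; // translated heading as it appears in the target file
  content: string;
}

// Hypothetical lookup: find the translated section for an English heading,
// regardless of where it sits in the target document.
function findTargetSection(
  sourceHeading: string,
  headingMap: HeadingMap,
  targetSections: TargetSection[]
): TargetSection | undefined {
  if (!Object.prototype.hasOwnProperty.call(headingMap, sourceHeading)) {
    return undefined; // no mapping recorded; the caller decides how to fall back
  }
  const translated = headingMap[sourceHeading];
  for (const section of targetSections) {
    if (section.heading === translated) {
      return section;
    }
  }
  return undefined;
}

const headingMap: HeadingMap = {
  Introduction: "简介",
  "Economic Model": "经济模型",
  "Python Setup": "Python 设置",
};

// The target file lists its sections in a different order than the source.
const targetSections: TargetSection[] = [
  { heading: "经济模型", content: "..." },
  { heading: "简介", content: "..." },
];

console.log(findTargetSection("Introduction", headingMap, targetSections)?.heading);
```

A position-based match would pair "Introduction" with the first target section and get it wrong in this example; the map avoids that.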

Documentation

For comprehensive documentation, see the docs/ directory.

Companion Tools

This project includes two standalone tools for different stages of the translation workflow:

1. Bulk Translator Tool

Purpose: One-time bulk translation for initial repository setup

📦 tool-bulk-translator/ - Standalone CLI tool

Features:

  • Translates entire lecture series in one operation
  • One-lecture-at-a-time approach for optimal quality and context
  • Preserves complete Jupyter Book structure
  • Auto-generates heading-maps for all sections
  • Dry-run mode to preview before translating (no API costs)

Use case: Creating a new lecture-python.zh-cn from existing lecture-python

After bulk translation, use the main action for incremental updates.

2. GitHub Action Test Tool

Purpose: Testing and validation of the translation sync action

🧪 tool-test-action-on-github/ - Automated testing framework

Features:

  • 24 comprehensive test scenarios
  • Real GitHub PR workflow testing
  • Dry-run mode for validation without API costs
  • GPT5 evaluation reports

Test coverage:

  • Basic changes (intro, title, content, reordering)
  • Structural changes (add/delete sections, subsections)
  • Scientific content (code cells, math equations)
  • Document lifecycle (create, delete, rename, multi-file)
  • Edge cases (preamble-only, deep nesting, special chars, empty sections)

Use case: Validating changes to the action before deployment

Development

Prerequisites

  • Node.js 20+
  • npm or yarn

Setup

# Install dependencies
npm install

# Build the action
npm run build

# Run tests
npm test

# Lint code
npm run lint

# Format code
npm run format

Project Structure

.
├── src/                          # Main action source code
│   ├── index.ts                  # GitHub Actions entry point
│   ├── parser.ts                 # MyST Markdown parser (section-based)
│   ├── diff-detector.ts          # Change detection (ADD/MODIFY/DELETE)
│   ├── translator.ts             # Claude API integration
│   ├── file-processor.ts         # Translation orchestration
│   ├── heading-map.ts            # Heading-map system
│   ├── language-config.ts        # Language-specific rules (v0.5.1)
│   ├── types.ts                  # TypeScript type definitions
│   └── inputs.ts                 # GitHub Actions input handling
├── docs/                         # Comprehensive documentation
├── glossary/                     # Built-in translation glossaries
│   ├── zh-cn.json                # Simplified Chinese (355 terms)
│   └── README.md                 # Glossary format and contribution guide
├── tool-bulk-translator/         # Standalone CLI for bulk translation
│   ├── src/bulk-translate.ts     # Main CLI implementation
│   ├── examples/                 # Usage examples
│   └── README.md                 # Tool documentation
├── tool-test-action-on-github/   # GitHub integration testing
│   ├── test-action-on-github.sh  # Test script (24 scenarios)
│   ├── test-action-on-github-data/  # Test fixtures
│   └── reports/                  # GPT5 evaluation reports
├── examples/                     # Example workflow configurations
├── action.yml                    # GitHub Action metadata
└── package.json                  # Dependencies and scripts

License

MIT

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

For development guidelines, see:

Acknowledgements

We would like to thank the following contributors for their valuable reviews and contributions to this project:
