Indicate: Transliterate Indic Languages to English

Transliterations to/from Indian languages are still generally low quality. One problem is access to data. Another is that there is no standard transliteration.

For Hindi--English, we build a novel dataset of names using ESPNcricinfo. For instance, see here for the Hindi version of the English scorecard.

We also create a dataset from election affidavits and exploit the Google Dakshina dataset.

To overcome the fact that there isn't one standard way of transliteration, we provide k-best transliterations.
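
To make the idea of k-best output concrete, here is a minimal sketch of selecting the k highest-scoring candidates for one input. It is an illustration of the concept only: the `k_best` helper, the candidate spellings, and the scores are all hypothetical, and this README does not show how (or whether) the package exposes k-best results through its API.

```python
import heapq
from typing import Dict, List


def k_best(scored_candidates: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k highest-scoring candidates, best first."""
    top = heapq.nlargest(k, scored_candidates.items(), key=lambda item: item[1])
    return [candidate for candidate, _score in top]


# Hypothetical candidates and scores for a single Hindi name.
# The numbers are made up for illustration, not model output.
candidates = {"saurabh": 0.52, "sourabh": 0.31, "saurav": 0.09, "sorabh": 0.05}

print(k_best(candidates, k=2))  # ['saurabh', 'sourabh']
```

Returning several plausible spellings lets a downstream step (for example, record matching) try each one instead of relying on a single romanization.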

Install

We strongly recommend installing indicate inside a Python virtual environment (see the venv documentation).

Requirements: Python 3.11 or 3.12 (TensorFlow does not yet support Python 3.13)

pip install indicate

Usage

Python API

from indicate import transliterate
english_transliterated = transliterate.hindi2english("हिंदी")
print(english_transliterated)
# Output: hindi
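
When transliterating many names, the same string often appears more than once. The sketch below wraps any single-string transliteration function with a small cache so each distinct input is processed only once. The `transliterate_all` helper is hypothetical and not part of the package; it assumes only that the function passed in takes one string and returns one string, as `transliterate.hindi2english` is described as doing.

```python
from typing import Callable, Dict, Iterable, List


def transliterate_all(names: Iterable[str], fn: Callable[[str], str]) -> List[str]:
    """Apply fn to each name, calling it once per distinct (stripped) input."""
    cache: Dict[str, str] = {}
    results: List[str] = []
    for name in names:
        key = name.strip()
        if key not in cache:
            cache[key] = fn(key)
        results.append(cache[key])
    return results


# Intended use (requires `pip install indicate`):
#   from indicate import transliterate
#   transliterate_all(["हिंदी", "गौरव सूद", "हिंदी"], transliterate.hindi2english)
```

Passing the function in as an argument keeps the helper independent of the model, so it can be exercised with a stand-in function in tests.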

Command Line Interface

The package provides both a modern and a legacy command-line interface:

Modern CLI (Recommended)

# Basic usage
indicate hindi2english "राजशेखर चिंतालपति"

# From file
indicate hindi2english --input hindi.txt --output english.txt

# From stdin
echo "गौरव सूद" | indicate hindi2english

# Batch processing for large files
indicate hindi2english --input large_file.txt --batch --quiet

# Get help
indicate hindi2english --help

# Package information
indicate info
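
The file-based commands above can also be driven from a Python script. The following is a minimal sketch that assembles the argument list from the flags shown in this section and runs it with `subprocess`. The helper names `build_command` and `run_transliteration` are hypothetical, and the sketch assumes that `--output`, `--batch`, and `--quiet` can be combined in one call, which the examples above do not confirm.

```python
import subprocess
from typing import List, Optional


def build_command(input_path: str, output_path: Optional[str] = None,
                  batch: bool = False, quiet: bool = False) -> List[str]:
    """Assemble an `indicate hindi2english` invocation as an argument list."""
    cmd = ["indicate", "hindi2english", "--input", input_path]
    if output_path:
        cmd += ["--output", output_path]
    if batch:
        cmd.append("--batch")
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_transliteration(input_path: str, output_path: Optional[str] = None,
                        batch: bool = False, quiet: bool = False):
    """Run the CLI; raises CalledProcessError if it exits with a non-zero status."""
    cmd = build_command(input_path, output_path, batch=batch, quiet=quiet)
    return subprocess.run(cmd, check=True)


print(build_command("large_file.txt", batch=True, quiet=True))
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.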

Legacy CLI (Backward Compatibility)

# Still supported for backward compatibility
hindi2english --type hin2eng --input "हिंदी"

Functions

We expose one function, which takes Hindi text and transliterates it to English.

transliterate.hindi2english(input)
- What it does: Converts the given Hindi text into the English alphabet
- Output: Returns the transliterated text in English

Testing Locally

To test the package locally, follow these steps:

Clone the repository:

git clone https://github.com/in-rolls/indicate.git
cd indicate

Install with uv (recommended):

uv sync

Or with pip:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -e .

Run tests:

# Run all tests
python -m unittest discover tests/

# Run specific test
python -m unittest tests.test_010_hindi_translate

Test the transliteration:

# Modern CLI
indicate hindi2english "हिंदी"

# Legacy CLI
hindi2english --type hin2eng --input "हिंदी"

# Python usage
python -c "from indicate import transliterate; print(transliterate.hindi2english('हिंदी'))"

Data

The datasets used to train the model:

Indian election affidavits
Google Dakshina dataset
ESPNcricinfo (Hindi versions of the English scorecards)
IIT Bombay English-Hindi Corpus

Evaluation

The model was evaluated on the test split of the Google Dakshina dataset, where it produced exact matches for 73.64% of inputs. Indic-trans produced exact matches for 63.12% of inputs on the Google Dakshina dataset.

Edit distance metrics were also computed on the test dataset (0.0 means an exact match; the farther a value is from 0.0, the greater the difference between the predicted text and the actual text).
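
For readers who want to compute comparable numbers on their own data, here is a minimal sketch of the two metrics mentioned above. It assumes that "edit distance" means plain Levenshtein distance (insertions, deletions, and substitutions each cost 1); the project's evaluation code may use a different or normalized variant. The function names and the sample predictions are hypothetical.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def exact_match_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions identical to their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(references)


# Made-up predictions and references, for illustration only.
predictions = ["hindi", "gaurav", "sud"]
references = ["hindi", "gaurav", "sood"]

distances: List[int] = [levenshtein(p, r) for p, r in zip(predictions, references)]
print(distances)  # [0, 0, 2]
print(exact_match_rate(predictions, references))
```

A distance of 0 corresponds to an exact match, so the exact-match rate is the share of pairs whose distance is 0.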

Authors

Rajashekar Chintalapati and Gaurav Sood

Contributor Code of Conduct

The project welcomes contributions from everyone! In fact, it depends on it. To maintain this welcoming atmosphere, and to collaborate in a fun and productive way, we expect contributors to the project to abide by the Contributor Code of Conduct.

License

The package is released under the MIT License.
