Commit 772bf07

[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
1 parent 81e2fcd commit 772bf07

13 files changed: +295 -158 lines changed

code_of_conduct.md

Lines changed: 22 additions & 0 deletions
@@ -1,5 +1,27 @@
 # Contributor Covenant Code of Conduct
 
+<!--TOC-->
+
+______________________________________________________________________
+
+**Table of Contents**
+
+- [Our Pledge](#our-pledge)
+- [Our Standards](#our-standards)
+- [Enforcement Responsibilities](#enforcement-responsibilities)
+- [Scope](#scope)
+- [Enforcement](#enforcement)
+- [Enforcement Guidelines](#enforcement-guidelines)
+- [1. Correction](#1-correction)
+- [2. Warning](#2-warning)
+- [3. Temporary Ban](#3-temporary-ban)
+- [4. Permanent Ban](#4-permanent-ban)
+- [Attribution](#attribution)
+
+______________________________________________________________________
+
+<!--TOC-->
+
 ## Our Pledge
 
 We as members, contributors, and leaders pledge to make participation in our

data/tabular/ld50_catmos/meta.yaml

Lines changed: 135 additions & 136 deletions
Large diffs are not rendered by default.

data/tabular/mona/example_processing_and_templates.ipynb

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,6 @@
2020
"from tqdm import tqdm\n",
2121
"\n",
2222
"# import datasets\n",
23-
"import rdkit\n",
2423
"import rdkit.Chem as Chem\n",
2524
"import rdkit.RDLogger as RDLogger"
2625
]
@@ -1444,7 +1443,7 @@
 " k = md[\"name\"]\n",
 " v = md.get(\"value\", np.nan)\n",
 " df_row[\"md_\" + transform_key(k)] = v\n",
-" if not (v is np.nan):\n",
+" if v is not np.nan:\n",
 " md_keys.append(k)\n",
 " md_key_counter.update(md_keys)\n",
 " compounds = entry.get(\"compound\", [])\n",

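A note on the `v is not np.nan` check in the hunk above: the rewrite preserves behavior, and both forms test object identity rather than whether the value is NaN. That works in this notebook because `md.get("value", np.nan)` hands back the `np.nan` object itself when the key is missing, but a NaN created elsewhere would slip through. The sketch below illustrates the difference using only the standard library; `math.nan` stands in for `np.nan` (both are plain Python floats), and `is_missing` is a hypothetical helper, not something defined in the notebook.

```python
import math

# Stand-in for np.nan: a single module-level float NaN object (assumption for illustration).
SENTINEL_NAN = math.nan


def is_missing(value):
    """Hypothetical helper: True for None or any float NaN, regardless of object identity."""
    if value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


metadata = {"name": "exact mass"}              # no "value" key present
default = metadata.get("value", SENTINEL_NAN)  # returns the sentinel object itself
computed = float("nan")                        # a NaN produced elsewhere, e.g. by parsing

# Identity checks only recognise the sentinel object.
identity_default = default is SENTINEL_NAN
identity_computed = computed is SENTINEL_NAN

# A value-based check recognises both.
value_default = is_missing(default)
value_computed = is_missing(computed)
```

Whether the notebook needs the value-based form depends on whether NaN values can arrive from the source data rather than only from the `get` default.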
data/tabular/ocp/transform.py

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -21,8 +21,8 @@ def uniCode2Latex(text: str) -> str:
 text = text.replace(chr(code), f"$_{code-8320}$")
 
 text = text.replace("\u0305", "$^-$")
-text = text.replace("\u207A", "$^+$")
-text = text.replace("\u207B", "$^-$")
+text = text.replace("\u207a", "$^+$")
+text = text.replace("\u207b", "$^-$")
 text = text.replace("\u2074", "$^4$")
 text = text.replace("\u2070", "$^0$")
 text = text.replace("\u2078", "$^1$")
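The two changed lines differ only in the case of the hex digits inside the escape sequences. Python reads both spellings as the same code point, so the strings being replaced are unchanged. A small standard-library check, offered as a sketch: `superscript_digit_to_latex` is a hypothetical helper, not part of `transform.py`, and it assumes the character carries a Unicode digit value.

```python
import unicodedata

# Hex digits in \u escapes are case-insensitive: both literals are the same one-character string.
upper_case_escape = "\u207A"
lower_case_escape = "\u207a"
same_string = upper_case_escape == lower_case_escape

# Look up the official Unicode names of the characters handled in the hunk.
names = {ch: unicodedata.name(ch) for ch in ("\u207a", "\u207b", "\u2078")}


def superscript_digit_to_latex(ch):
    """Hypothetical helper: map a superscript digit to LaTeX via its Unicode digit value.

    Raises ValueError if the character has no digit value.
    """
    return f"$^{unicodedata.digit(ch)}$"


for ch, name in names.items():
    print(f"U+{ord(ch):04X} {name}")
```

By this lookup, U+2078 is SUPERSCRIPT EIGHT, so the unchanged context line that maps `"\u2078"` to `$^1$` may be worth a second look.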

data/tabular/orbnet_denali/develop_transform.ipynb

Lines changed: 0 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -25,11 +25,7 @@
2525
"metadata": {},
2626
"outputs": [],
2727
"source": [
28-
"from pathlib import Path\n",
2928
"from rdkit import Chem\n",
30-
"import matplotlib.pyplot as plt\n",
31-
"import numpy as np\n",
32-
"import os\n",
3329
"import pandas as pd\n",
3430
"from glob import glob"
3531
]
@@ -474,7 +470,6 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from rdkit.Chem import rdDetermineBonds\n",
 "from chemnlp.utils import xyz_to_mol"
 ]
 },

docs/CONTRIBUTING.md

Lines changed: 18 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,23 @@
11
# Contributing to ChemNLP
22

3+
<!--TOC-->
4+
5+
______________________________________________________________________
6+
7+
**Table of Contents**
8+
9+
- [Getting Started](#getting-started)
10+
- [Implementing a Dataset](#implementing-a-dataset)
11+
- [meta.yaml Structure](#metayaml-structure)
12+
- [transform.py Guidelines](#transformpy-guidelines)
13+
- [Text Templates](#text-templates)
14+
- [Testing Your Contribution](#testing-your-contribution)
15+
- [Submitting Your Contribution](#submitting-your-contribution)
16+
17+
______________________________________________________________________
18+
19+
<!--TOC-->
20+
321
Thank you for your interest in contributing to ChemNLP! There are many ways to contribute, including implementing datasets, improving code, and enhancing documentation.
422

523
## Getting Started
@@ -17,7 +35,6 @@ One of the most valuable contributions is implementing a dataset. Here's how to
 1. Choose a dataset from our [awesome list](https://github.com/kjappelbaum/awesome-chemistry-datasets) or add a new one there.
 2. Create an issue in this repository stating your intention to add the dataset.
 3. Make a Pull Request (PR) that adds a new folder in `data` with the following files:
-
 - `meta.yaml`: Describes the dataset (see structure below).
 - `transform.py`: Python code to transform the original dataset into a usable form.

docs/api/meta_yaml_augmentor.md

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,25 @@
11
# Meta YAML Augmenter
22

3+
<!--TOC-->
4+
5+
______________________________________________________________________
6+
7+
**Table of Contents**
8+
9+
- [Overview](#overview)
10+
- [generate_augmented_meta_yaml](#generate_augmented_meta_yaml)
11+
- [CLI Interface](#cli-interface)
12+
- [Usage](#usage)
13+
- [Arguments](#arguments)
14+
- [Example](#example)
15+
- [Augmentation Process](#augmentation-process)
16+
- [Notes](#notes)
17+
- [Example Usage in Python](#example-usage-in-python)
18+
19+
______________________________________________________________________
20+
21+
<!--TOC-->
22+
323
## Overview
424

525
The Meta YAML Augmenter is a tool designed to enhance existing `meta.yaml` files for chemical datasets. It uses Large Language Models (LLMs) to generate additional templates and improve the metadata structure, particularly focusing on advanced sampling methods and template formats.

docs/api/meta_yaml_generator.md

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,19 @@
11
# Meta YAML Generator
22

3+
<!--TOC-->
4+
5+
______________________________________________________________________
6+
7+
**Table of Contents**
8+
9+
- [Overview](#overview)
10+
- [`generate_meta_yaml`](#generate_meta_yaml)
11+
- [Usage Example](#usage-example)
12+
13+
______________________________________________________________________
14+
15+
<!--TOC-->
16+
317
## Overview
418

519
The Meta YAML Generator is a tool designed to automatically create a `meta.yaml` file for chemical datasets using Large Language Models (LLMs). It analyzes the structure of a given DataFrame and generates a comprehensive metadata file, including advanced sampling methods and template formats.

docs/api/sampler.md

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,28 @@
11
# Sampler Module
22

3+
<!--TOC-->
4+
5+
______________________________________________________________________
6+
7+
**Table of Contents**
8+
9+
- [Overview](#overview)
10+
- [TemplateSampler](#templatesampler)
11+
- [Class: TemplateSampler](#class-templatesampler)
12+
- [Initialization](#initialization)
13+
- [Configuration Options](#configuration-options)
14+
- [Main Methods](#main-methods)
15+
- [`sample`](#sample)
16+
- [`enable_class_balancing`](#enable_class_balancing)
17+
- [`disable_class_balancing`](#disable_class_balancing)
18+
- [Identifier Wrapping](#identifier-wrapping)
19+
- [Usage Examples](#usage-examples)
20+
- [Notes](#notes)
21+
22+
______________________________________________________________________
23+
24+
<!--TOC-->
25+
326
## Overview
427

528
The `sampler` module provides functionality for generating text samples based on templates and data. It is primarily used for creating datasets for natural language processing tasks in chemistry and related fields. The main class in this module is `TemplateSampler`, which allows for flexible text generation with support for multiple choice questions, class balancing, and identifier wrapping.

docs/api/sampler_cli.md

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,31 @@
11
# Sampler CLI
22

3+
<!--TOC-->
4+
5+
______________________________________________________________________
6+
7+
**Table of Contents**
8+
9+
- [Overview](#overview)
10+
- [Usage](#usage)
11+
- [Arguments](#arguments)
12+
- [Options](#options)
13+
- [Detailed Option Descriptions](#detailed-option-descriptions)
14+
- [`chunksize`](#chunksize)
15+
- [`class_balanced`](#class_balanced)
16+
- [`benchmarking`](#benchmarking)
17+
- [`multiple_choice`](#multiple_choice)
18+
- [`additional_templates`](#additional_templates)
19+
- [`use_standard_templates`](#use_standard_templates)
20+
- [`wrap_identifiers`](#wrap_identifiers)
21+
- [Examples](#examples)
22+
- [Notes](#notes)
23+
- [Troubleshooting](#troubleshooting)
24+
25+
______________________________________________________________________
26+
27+
<!--TOC-->
28+
329
## Overview
430

531
The Sampler CLI is a command-line interface tool designed to process chemical datasets using the `TemplateSampler`. It allows for flexible text generation based on templates, with support for various sampling scenarios including class balancing, benchmarking, and multiple-choice questions.
