Add HuggingFace support #245

ryanDing26 · 2025-10-19T18:25:27Z

Brief Description of Changes

LangChain has a HuggingFace integration that works much like the existing Chat[PROVIDER] model objects in agent/llm.py. Since HuggingFace hosts a wide variety of custom models, I figured it would be worth adding: some users may want models that are not supported through the other channels, and some are more familiar with HuggingFace calls than with the other Chat models.

Additions

pip install langchain-huggingface added to new_software_v007.sh
A few lines of code added to llm.py:

# Line 9: HuggingFace as possible source type after Groq
SourceType = Literal["OpenAI", "AzureOpenAI", "Anthropic", "Ollama", "Gemini", "Bedrock", "Groq", "HuggingFace", "Custom"]
...
# Line 29: Updated documentation
"""source (str): Source provider: "OpenAI", "AzureOpenAI", "Anthropic", "Ollama", "Gemini", "Bedrock", "Groq", "HuggingFace", or "Custom"."""
...
# Lines 199-213: actual model configuration
elif source == "HuggingFace":
    try:
        from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
    except ImportError:
        raise ImportError(  # noqa: B904
            "langchain-huggingface package is required for HuggingFace models. Install with: pip install langchain-huggingface"
        )
    # Wrap the hosted endpoint in the chat interface used by the other providers
    return ChatHuggingFace(
        llm=HuggingFaceEndpoint(
            repo_id=model,
            temperature=temperature,
            stop_sequences=stop_sequences,
            huggingfacehub_api_token=os.getenv("HUGGINGFACE_API_KEY"),
        )
    )
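
As a rough illustration of how the new branch could be guarded, the sketch below validates a source string against the `SourceType` literal and checks that `HUGGINGFACE_API_KEY` is set before a model is built. Both `validate_source` and `require_hf_token` are hypothetical helpers that are not part of this PR or of llm.py; only the literal values and the environment variable name come from the change above.

```python
import os
from typing import Literal, Mapping, get_args

# Same literal values as the updated SourceType in llm.py.
SourceType = Literal[
    "OpenAI", "AzureOpenAI", "Anthropic", "Ollama",
    "Gemini", "Bedrock", "Groq", "HuggingFace", "Custom",
]


def validate_source(source: str) -> str:
    """Hypothetical helper: reject strings that are not SourceType values."""
    allowed = get_args(SourceType)
    if source not in allowed:
        raise ValueError(f"Unknown source {source!r}; expected one of {allowed}")
    return source


def require_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: fail fast when HUGGINGFACE_API_KEY is missing or empty."""
    token = env.get("HUGGINGFACE_API_KEY")
    if not token:
        raise RuntimeError(
            "Set HUGGINGFACE_API_KEY (for example in .env) before using source='HuggingFace'"
        )
    return token


# Usage with an explicit mapping so the example does not depend on the real environment.
source = validate_source("HuggingFace")
token = require_hf_token({"HUGGINGFACE_API_KEY": "hf_example_token"})
print(source, "token set:", bool(token))
```

Passing the environment as a mapping keeps the check easy to exercise in tests; in real use the default `os.environ` would apply.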

Sample Prompt

Note that the actual output isn't great because of the model selected; this is just to show that the integration works. For testing purposes, Mistral's Ollama and Bedrock compatibility was removed to ensure the model wasn't using those sources instead. I manually replaced the triple backticks around the <execute> blocks with triple double quotes so they can be formatted nicely here:

>>> from biomni.agent import A1
Loaded environment variables from .env
>>> agent = A1(llm="mistralai/Mistral-7B-Instruct-v0.3", source="HuggingFace")
🎓 Academic mode: Using all datasets (including non-commercial)

==================================================
🔧 BIOMNI CONFIGURATION
==================================================
📋 DEFAULT CONFIG (Including Database LLM):
  Path: ./data
  Timeout Seconds: 600
  Llm: claude-sonnet-4-5
  Temperature: 0.7
  Use Tool Retriever: True
  Commercial Mode: Academic (all datasets)

🤖 AGENT LLM (Constructor Override):
  LLM Model: mistralai/Mistral-7B-Instruct-v0.3
  Source: HuggingFace
==================================================

Checking and downloading missing data lake files...
>>> agent.go("Generate a drug-likeliness profile for C1CO[C@@H]1CN2C3=C(C=CC(=C3)C(=O)O)N=C2CN4CCC(CC4)C5=NC(=CC=C5)OCC6=C(C=C(C=C6)C#N)F")
Using prompt-based retrieval with the agent's LLM
================================ Human Message =================================

Generate a drug-likeliness profile for C1CO[C@@H]1CN2C3=C(C=CC(=C3)C(=O)O)N=C2CN4CCC(CC4)C5=NC(=CC=C5)OCC6=C(C=C(C=C6)C#N)F
parsing error...
================================ Human Message =================================

Each response must include thinking process followed by either <execute> or <solution> tag. But there are no tags in the current response. Please follow the instruction, fix and regenerate the response again.
================================== Ai Message ==================================

Apologies for the oversight. Here's the corrected response:

To generate a drug-likeliness profile for the given compound, I will follow these steps:

1. [ ] Load the necessary libraries and functions
2. [ ] Preprocess the given compound SMILES string
3. [ ] Calculate the molecular descriptors for the preprocessed compound
4. [ ] Use the calculated descriptors to predict the drug-likeliness score

Let's start by loading the necessary libraries and functions:

1. [ ] Load the necessary libraries and functions

"""python
<execute>
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.Chem.rdMolDescriptors import CalcMolFormula
from rdkit.Chem.rdMolDescriptors import CalcMolLogP
from rdkit.Chem.rdMolDescriptors import CalcMolWt
from rdkit.Chem.rdMolDescriptors import CalcNumRotatableBonds
from rdkit.Chem.rdMolDescriptors import CalcFractionCsp3
from rdkit.Chem.rdMolDescriptors import CalcMolPolarSurfaceArea
from rdkit.Chem.rdMolDescriptors import CalcAlogP
from rdkit.Chem.rdMolDescriptors import CalcMolHydrogenBondDonors
from rdkit.Chem.rdMolDescriptors import CalcMolHydrogenBondAcceptors
from rdkit.Chem.rdMolDescriptors import CalcNumRings
<execute>
"""

Now, let's preprocess the given compound SMILES string:

2. [ ] Preprocess the given compound SMILES string

"""python
<execute>
smiles = "C1CO[C@@H]1CN2C3=C(C=CC(=C3)C(=O)O)N=C2CN4CCC(CC4)C5=NC(=CC=C5)OCC6=C(C=C(C=C6)C#N)F"
mol = Chem.MolFromSmiles(smiles)
<execute>
"""

Next, we will calculate the molecular descriptors for the preprocessed compound:

3. [ ] Calculate the molecular descriptors for the preprocessed compound

"""python
<execute>
mol_formula = CalcMolFormula(mol)
mol_logP = CalcMolLogP(mol)
mol_weight = CalcMolWt(mol)
num_rotatable_bonds = CalcNumRotatableBonds(mol)
fraction_csp3 = CalcFractionCsp3(mol)
polar_surface_area = CalcMolPolarSurfaceArea(mol)
alogP = CalcAlogP(mol)
hydrogen_bond_donors = CalcMolHydrogenBondDonors(mol)
hydrogen_bond_acceptors = CalcMolHydrogenBondAcceptors(mol)
num_rings = CalcNumRings(mol)
<execute>
"""

Finally, we will use the calculated descriptors to predict the drug-likeliness score:

4. [ ] Use the calculated descriptors to predict the drug-likeliness score

"""python
<execute>
# Load the drug-likeliness model
from sklearn.externals.joblib import load
model = load("drug_likeliness_model.pkl")

# Calculate the descriptors for the given compound
descriptors = [mol_formula, mol_logP, mol_weight, num_rotatable_bonds,
               fraction_csp3, polar_surface_area, alogP,
               hydrogen_bond_donors, hydrogen_bond_acceptors, num_rings]

# Predict the drug-likeliness score
score = model.predict([descriptors])

# Print the drug-likeliness score
print("Drug-likeliness score:", score[0])
<execute>
"""

The drug-likeliness score for the given compound will be printed out.</execute>
================================== Ai Message ==================================

<observation>Error: unterminated string literal (detected at line 16) (<string>, line 16)</observation>
================================== Ai Message ==================================

<solution>The error occurred due to an unclosed string literal in the SMILES string. Let's fix the SMILES string by adding a missing closing parenthesis:

1. [ ] Load the necessary libraries and functions
2. [ ] Preprocess the given compound SMILES string
3. [ ] Calculate the molecular descriptors for the preprocessed compound
4. [ ] Use the calculated descriptors to predict the drug-likeliness score

Let's start by loading the necessary libraries and functions:

1. [ ] Load the necessary libraries and functions

"""python
<execute>
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.Chem.rdMolDescriptors import CalcMolFormula
from rdkit.Chem.rdMolDescriptors import CalcMolLogP
from rdkit.Chem.rdMolDescriptors import CalcMolWt
from rdkit.Chem.rdMolDescriptors import CalcNumRotatableBonds
from rdkit.Chem.rdMolDescriptors import CalcFractionCsp3
from rdkit.Chem.rdMolDescriptors import CalcMolPolarSurfaceArea
from rdkit.Chem.rdMolDescriptors import CalcAlogP
from rdkit.Chem.rdMolDescriptors import CalcMolHydrogenBondDonors
from rdkit.Chem.rdMolDescriptors import CalcMolHydrogenBondAcceptors
from rdkit.Chem.rdMolDescriptors import CalcNumRings
<execute>
"""

Now, let's preprocess the given compound SMILES string with the corrected SMILES string:

2. [ ] Preprocess the given compound SMILES string

"""python
<execute>
smiles = "C1CO[C@@H]1CN2C3=C(C=CC(=C3)C(=O)O)N=C2CN4CCC(CC4)C5=NC(=CC=C5)OCC6=C(C=C(C=C6)C#N)F"
mol = Chem.MolFromSmiles(smiles)
<execute>
"""

Next, we will calculate the molecular descriptors for the preprocessed compound:

3. [ ] Calculate the molecular descriptors for the preprocessed compound

"""python
<execute>
mol_formula = CalcMolFormula(mol)
mol_logP = CalcMolLogP(mol)
mol_weight = CalcMolWt(mol)
num_rotatable_bonds = CalcNumRotatableBonds(mol)
fraction_csp3 = CalcFractionCsp3(mol)
polar_surface_area = CalcMolPolarSurfaceArea(mol)
alogP = CalcAlogP(mol)
hydrogen_bond_donors = CalcMolHydrogenBondDonors(mol)
hydrogen_bond_acceptors = CalcMolHydrogenBondAcceptors(mol)
num_rings = CalcNumRings(mol)
<execute>
"""

Finally, we will use the calculated descriptors to predict the drug-likeliness score:

4. [ ] Use the calculated descriptors to predict the drug-likeliness score

"""python
<execute>
# Load the drug-likeliness model
from sklearn.externals.joblib import load
model = load("drug_likeliness_model.pkl")

# Calculate the descriptors for the given compound
descriptors = [mol_formula, mol_logP, mol_weight, num_rotatable_bonds,
               fraction_csp3, polar_surface_area, alogP,
               hydrogen_bond_donors, hydrogen_bond_acceptors, num_rings]

# Predict the drug-likeliness score
score = model.predict([descriptors])

# Print the drug-likeliness score
print("Drug-likeliness score:", score[0])
<execute>
"""

The drug-likeliness score for the given compound will be printed out.</solution></execute>
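
One plausible reading of the `unterminated string literal` observation in the transcript above, offered as an assumption rather than a confirmed diagnosis: the model closed each code block with `<execute>` instead of `</execute>`, so everything up to the single final `</execute>` may have been treated as one block, prose included. The sketch below uses a hypothetical regex (not Biomni's actual parser) and a shortened stand-in response to show how that shape of output fails to compile, and how a simple tag-balance check could flag it.

```python
import re
from typing import Optional

# Hypothetical extraction pattern; the agent's real parsing logic may differ.
EXECUTE_RE = re.compile(r"<execute>(.*?)</execute>", re.DOTALL)


def extract_execute(response: str) -> Optional[str]:
    """Return the text between the first <execute> and the next </execute>."""
    match = EXECUTE_RE.search(response)
    return match.group(1).strip() if match else None


def has_balanced_execute_tags(response: str) -> bool:
    """True when the number of openers equals the number of closers."""
    return response.count("<execute>") == response.count("</execute>")


# Shortened stand-in with the same shape as the transcript: blocks are
# "closed" with <execute>, and only the very end has a real </execute>.
response = (
    "<execute>\n"
    "x = 1\n"
    "<execute>\n"
    "\n"
    "Now, let's preprocess the input:\n"
    "\n"
    "<execute>\n"
    "y = 2\n"
    "<execute>\n"
    "Done.</execute>"
)

code = extract_execute(response)
balanced = has_balanced_execute_tags(response)

# Compile only (nothing is executed) to see whether the captured text is valid Python.
try:
    compile(code, "<string>", "exec")
    syntax_error = None
except SyntaxError as exc:
    syntax_error = exc

print("balanced tags:", balanced)
print("prose captured as code:", "Now, let's" in code)
print("syntax error raised:", syntax_error is not None)
```

A check like `has_balanced_execute_tags` could be run before execution to return a more specific correction prompt to the model, though whether that fits the agent's retry loop is a design question outside this PR.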


ryanDing26 · 2025-10-19T18:28:21Z

I'm not sure what to do about the failed check, as it corresponds to a file outside the scope of my change. I just wanted to make a note of that. Thanks!

ryanDing26 and others added 3 commits October 19, 2025 11:02

Add HuggingFace support

2215da3

Add pip installation to new software script

e6cc6cc

[pre-commit.ci] auto fixes from pre-commit.com hooks

e1a9cc3

for more information, see https://pre-commit.ci
