@ryanDing26

Brief Description of Changes

LangChain has a HuggingFace integration that works much like the existing Chat[PROVIDER] model objects in agent/llm.py. Because HuggingFace hosts a wide range of custom models, I figured it would be a useful source to add: some users may want models that are not supported through the other channels, and some may be more familiar with HuggingFace calls than with the other Chat models.

Additions

  • Added pip install langchain-huggingface to new_software_v007.sh
  • Added a few lines of code to llm.py:
# Line 9: HuggingFace as possible source type after Groq
SourceType = Literal["OpenAI", "AzureOpenAI", "Anthropic", "Ollama", "Gemini", "Bedrock", "Groq", "HuggingFace", "Custom"]
...
# Line 29: Updated documentation
"""source (str): Source provider: "OpenAI", "AzureOpenAI", "Anthropic", "Ollama", "Gemini", "Bedrock", "HuggingFace", or "Custom""""
...
# Lines 199-213: actual model configuration
elif source == "HuggingFace":
    try:
        from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
    except ImportError:
        raise ImportError(  # noqa: B904
            "langchain-huggingface package is required for HuggingFace models. Install with: pip install langchain-huggingface"
        )
    return ChatHuggingFace(
        llm=HuggingFaceEndpoint(
            repo_id=model,
            temperature=temperature,
            stop_sequences=stop_sequences,
            huggingfacehub_api_token=os.getenv("HUGGINGFACE_API_KEY"),
        )
    )
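
One thing the branch above leaves implicit is what happens when HUGGINGFACE_API_KEY is not set: os.getenv returns None and the failure only shows up later, at request time. As a rough sketch (not part of this PR), the endpoint arguments could be assembled and checked up front. The helper name build_hf_endpoint_kwargs and its env parameter are hypothetical and exist only for illustration; the keyword names mirror the ones passed to HuggingFaceEndpoint above.

```python
import os
from typing import Mapping, Optional, Sequence


def build_hf_endpoint_kwargs(
    model: str,
    temperature: float = 0.7,
    stop_sequences: Optional[Sequence[str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Collect the keyword arguments used for HuggingFaceEndpoint in llm.py.

    Hypothetical helper: fails early with a clear message when the API key
    is missing instead of passing None through to the endpoint.
    """
    # Allow an explicit mapping so the check can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    token = env.get("HUGGINGFACE_API_KEY")
    if not token:
        raise ValueError(
            "HUGGINGFACE_API_KEY is not set; add it to .env or export it "
            "before using source='HuggingFace'."
        )
    return {
        "repo_id": model,
        "temperature": temperature,
        # Assumption: an empty list is used when no stop sequences are given.
        "stop_sequences": list(stop_sequences or []),
        "huggingfacehub_api_token": token,
    }


if __name__ == "__main__":
    kwargs = build_hf_endpoint_kwargs(
        "mistralai/Mistral-7B-Instruct-v0.3",
        env={"HUGGINGFACE_API_KEY": "hf_example"},
    )
    print(sorted(kwargs))
```

The resulting dictionary could then be unpacked as HuggingFaceEndpoint(**kwargs) inside the existing branch; whether that check belongs in llm.py or in the caller is a design choice I have not settled on.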

Sample Prompt

Note that the actual output isn't great because of the model I selected; this is only meant to show that the integration works. For testing purposes, I removed Mistral's Ollama and Bedrock compatibility to make sure the model wasn't being served through those sources instead. I manually replaced the triple backticks in the <execute> blocks with triple double quotes so they format nicely here:

>>> from biomni.agent import A1
Loaded environment variables from .env
>>> agent = A1(llm="mistralai/Mistral-7B-Instruct-v0.3", source="HuggingFace")
🎓 Academic mode: Using all datasets (including non-commercial)

==================================================
🔧 BIOMNI CONFIGURATION
==================================================
📋 DEFAULT CONFIG (Including Database LLM):
  Path: ./data
  Timeout Seconds: 600
  Llm: claude-sonnet-4-5
  Temperature: 0.7
  Use Tool Retriever: True
  Commercial Mode: Academic (all datasets)

🤖 AGENT LLM (Constructor Override):
  LLM Model: mistralai/Mistral-7B-Instruct-v0.3
  Source: HuggingFace
==================================================

Checking and downloading missing data lake files...
>>> agent.go("Generate a drug-likeliness profile for C1CO[C@@H]1CN2C3=C(C=CC(=C3)C(=O)O)N=C2CN4CCC(CC4)C5=NC(=CC=C5)OCC6=C(C=C(C=C6)C#N)F")
Using prompt-based retrieval with the agent's LLM
================================ Human Message =================================

Generate a drug-likeliness profile for C1CO[C@@H]1CN2C3=C(C=CC(=C3)C(=O)O)N=C2CN4CCC(CC4)C5=NC(=CC=C5)OCC6=C(C=C(C=C6)C#N)F
parsing error...
================================ Human Message =================================

Each response must include thinking process followed by either <execute> or <solution> tag. But there are no tags in the current response. Please follow the instruction, fix and regenerate the response again.
================================== Ai Message ==================================

Apologies for the oversight. Here's the corrected response:

To generate a drug-likeliness profile for the given compound, I will follow these steps:

1. [ ] Load the necessary libraries and functions
2. [ ] Preprocess the given compound SMILES string
3. [ ] Calculate the molecular descriptors for the preprocessed compound
4. [ ] Use the calculated descriptors to predict the drug-likeliness score

Let's start by loading the necessary libraries and functions:

1. [ ] Load the necessary libraries and functions

"""python
<execute>
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.Chem.rdMolDescriptors import CalcMolFormula
from rdkit.Chem.rdMolDescriptors import CalcMolLogP
from rdkit.Chem.rdMolDescriptors import CalcMolWt
from rdkit.Chem.rdMolDescriptors import CalcNumRotatableBonds
from rdkit.Chem.rdMolDescriptors import CalcFractionCsp3
from rdkit.Chem.rdMolDescriptors import CalcMolPolarSurfaceArea
from rdkit.Chem.rdMolDescriptors import CalcAlogP
from rdkit.Chem.rdMolDescriptors import CalcMolHydrogenBondDonors
from rdkit.Chem.rdMolDescriptors import CalcMolHydrogenBondAcceptors
from rdkit.Chem.rdMolDescriptors import CalcNumRings
<execute>
"""

Now, let's preprocess the given compound SMILES string:

2. [ ] Preprocess the given compound SMILES string

"""python
<execute>
smiles = "C1CO[C@@H]1CN2C3=C(C=CC(=C3)C(=O)O)N=C2CN4CCC(CC4)C5=NC(=CC=C5)OCC6=C(C=C(C=C6)C#N)F"
mol = Chem.MolFromSmiles(smiles)
<execute>
"""

Next, we will calculate the molecular descriptors for the preprocessed compound:

3. [ ] Calculate the molecular descriptors for the preprocessed compound

"""python
<execute>
mol_formula = CalcMolFormula(mol)
mol_logP = CalcMolLogP(mol)
mol_weight = CalcMolWt(mol)
num_rotatable_bonds = CalcNumRotatableBonds(mol)
fraction_csp3 = CalcFractionCsp3(mol)
polar_surface_area = CalcMolPolarSurfaceArea(mol)
alogP = CalcAlogP(mol)
hydrogen_bond_donors = CalcMolHydrogenBondDonors(mol)
hydrogen_bond_acceptors = CalcMolHydrogenBondAcceptors(mol)
num_rings = CalcNumRings(mol)
<execute>
"""

Finally, we will use the calculated descriptors to predict the drug-likeliness score:

4. [ ] Use the calculated descriptors to predict the drug-likeliness score

"""python
<execute>
# Load the drug-likeliness model
from sklearn.externals.joblib import load
model = load("drug_likeliness_model.pkl")

# Calculate the descriptors for the given compound
descriptors = [mol_formula, mol_logP, mol_weight, num_rotatable_bonds,
               fraction_csp3, polar_surface_area, alogP,
               hydrogen_bond_donors, hydrogen_bond_acceptors, num_rings]

# Predict the drug-likeliness score
score = model.predict([descriptors])

# Print the drug-likeliness score
print("Drug-likeliness score:", score[0])
<execute>
"""

The drug-likeliness score for the given compound will be printed out.</execute>
================================== Ai Message ==================================

<observation>Error: unterminated string literal (detected at line 16) (<string>, line 16)</observation>
================================== Ai Message ==================================

<solution>The error occurred due to an unclosed string literal in the SMILES string. Let's fix the SMILES string by adding a missing closing parenthesis:

1. [ ] Load the necessary libraries and functions
2. [ ] Preprocess the given compound SMILES string
3. [ ] Calculate the molecular descriptors for the preprocessed compound
4. [ ] Use the calculated descriptors to predict the drug-likeliness score

Let's start by loading the necessary libraries and functions:

1. [ ] Load the necessary libraries and functions

"""python
<execute>
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.Chem.rdMolDescriptors import CalcMolFormula
from rdkit.Chem.rdMolDescriptors import CalcMolLogP
from rdkit.Chem.rdMolDescriptors import CalcMolWt
from rdkit.Chem.rdMolDescriptors import CalcNumRotatableBonds
from rdkit.Chem.rdMolDescriptors import CalcFractionCsp3
from rdkit.Chem.rdMolDescriptors import CalcMolPolarSurfaceArea
from rdkit.Chem.rdMolDescriptors import CalcAlogP
from rdkit.Chem.rdMolDescriptors import CalcMolHydrogenBondDonors
from rdkit.Chem.rdMolDescriptors import CalcMolHydrogenBondAcceptors
from rdkit.Chem.rdMolDescriptors import CalcNumRings
<execute>
"""

Now, let's preprocess the given compound SMILES string with the corrected SMILES string:

2. [ ] Preprocess the given compound SMILES string

"""python
<execute>
smiles = "C1CO[C@@H]1CN2C3=C(C=CC(=C3)C(=O)O)N=C2CN4CCC(CC4)C5=NC(=CC=C5)OCC6=C(C=C(C=C6)C#N)F"
mol = Chem.MolFromSmiles(smiles)
<execute>
"""

Next, we will calculate the molecular descriptors for the preprocessed compound:

3. [ ] Calculate the molecular descriptors for the preprocessed compound

"""python
<execute>
mol_formula = CalcMolFormula(mol)
mol_logP = CalcMolLogP(mol)
mol_weight = CalcMolWt(mol)
num_rotatable_bonds = CalcNumRotatableBonds(mol)
fraction_csp3 = CalcFractionCsp3(mol)
polar_surface_area = CalcMolPolarSurfaceArea(mol)
alogP = CalcAlogP(mol)
hydrogen_bond_donors = CalcMolHydrogenBondDonors(mol)
hydrogen_bond_acceptors = CalcMolHydrogenBondAcceptors(mol)
num_rings = CalcNumRings(mol)
<execute>
"""

Finally, we will use the calculated descriptors to predict the drug-likeliness score:

4. [ ] Use the calculated descriptors to predict the drug-likeliness score

"""python
<execute>
# Load the drug-likeliness model
from sklearn.externals.joblib import load
model = load("drug_likeliness_model.pkl")

# Calculate the descriptors for the given compound
descriptors = [mol_formula, mol_logP, mol_weight, num_rotatable_bonds,
               fraction_csp3, polar_surface_area, alogP,
               hydrogen_bond_donors, hydrogen_bond_acceptors, num_rings]

# Predict the drug-likeliness score
score = model.predict([descriptors])

# Print the drug-likeliness score
print("Drug-likeliness score:", score[0])
<execute>
"""

The drug-likeliness score for the given compound will be printed out.</solution></execute>
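
The "unterminated string literal" observation in the transcript is consistent with the model closing each code block with <execute> rather than </execute>, so that everything up to the single real </execute> (prose included) gets treated as one block of code. The sketch below reproduces that effect on a small made-up response. The regular expression is an assumption for illustration only and is not taken from Biomni's actual parser.

```python
import re
from typing import Optional

# Assumed pattern: take everything between the first <execute> and the
# first </execute>. Biomni's real parsing logic may differ.
EXECUTE_BLOCK = re.compile(r"<execute>(.*?)</execute>", re.DOTALL)


def extract_execute_block(response: str) -> Optional[str]:
    """Return the text of the first <execute>...</execute> block, if any."""
    match = EXECUTE_BLOCK.search(response)
    return match.group(1) if match else None


def is_valid_python(code: str) -> bool:
    """Check whether the extracted text compiles as Python source."""
    try:
        compile(code, "<string>", "exec")
    except SyntaxError:
        return False
    return True


# Made-up response that mimics the transcript: inner blocks are "closed"
# with <execute>, and only the very end has a real </execute>.
malformed = (
    "<execute>\n"
    "from math import sqrt\n"
    "<execute>\n"
    "\n"
    "Now, let's compute a value:\n"
    "\n"
    "<execute>\n"
    "print(sqrt(16))\n"
    "<execute>\n"
    "Done.</execute>"
)

# The same code with a single, properly closed block.
well_formed = "<execute>\nfrom math import sqrt\nprint(sqrt(16))\n</execute>"

if __name__ == "__main__":
    for name, response in [("malformed", malformed), ("well_formed", well_formed)]:
        code = extract_execute_block(response)
        print(name, "compiles:", code is not None and is_valid_python(code))
```

In the malformed case the extracted text contains the prose line with an apostrophe, which is the kind of input that produces a string-literal syntax error; a model that follows the tag format more reliably would avoid this.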

@ryanDing26

I'm not sure what to do about the failed check: it corresponds to a file outside the scope of my change, so I wanted to make a note of that here. Thanks!
