(Jump directly to How to Test)
Codes and input files are kept in the genai directory. All unit tests are kept in the test directory.
The initial task has three subtasks: hello world, split PDF and count word frequency in PDF. The first one is pretty straightforward. The script initial_task.py helps execute the second and third one.
The script hello.py runs a default ‘Hello World’ job in Ganga on Local backend.
[Note: A visual tree of the working directories may help you easily follow the different code dependencies mentioned below.]
(Go back to Testing)
initial_task.py → task execution script (submits ganga job)
run_initial_task.sh → wrapper script that invokes the actual task script
split_pdf.py → splits PDF file
- The script
initial_task.pycreates a bash script calledrun_initial_task.shand submits a ganga job that executes this script as anExecutableapplication. - This wrapper script, when invoked by the ganga job ‘split_pdf’, calls the python script
split_pdf.pythat splits the PDF fileLHC.pdfinto 29 separate PDFs to account for the 29 pages. - These extracted files are stored in the folder
extracted_pagesinside thegenaidirectory.
(Go back to Testing)
initial_task.py → task execution script (submits ganga job)
run_initial_task.sh → wrapper script that invokes the actual task script
count_it.py → counts the number of occurences of the word ‘it’ in LHC.pdf
- The same script
initial_task.pysubmits a ganga job named ‘count_it’ that invokes the bash scriptrun_initial_task.sh. - However, this time the job passes individual page numbers and the target word ‘it’ as arguments to the bash script using
ArgSplitter. As a result,run_initial_task.shgets called 29 times to account for every page inLHC.pdf. - Each time
run_initial_task.shgets called, it invokes the Python scriptcount_it.pywith a different page number as one of the arguments. - The Python script then counts the word frequency of ‘it’ in that page and prints it out. Ganga’s
ArgSplittersaves the output to a file calledstdoutin the user's local ganga workspace directory. - Then the job calls
TextMergerto merge the 29 stdout files. - Finally,
count_it.pyparses the merged output by singling out the word counts, adds them up and stores the final count to a text file calledcount_it.txtin thegenaidirectory.
- There are 4 test files that contain 17 unit tests.
- The files
test_Hello.py,test_SplitPDF.pyandtest_CountIt.pycontain tests that demonstrate if each unit that contribute to executing the tasks is working. - The file
test_CompleteSystem.pycontains 2 unit tests. These tests make complete system calls to demonstrate if the subtasks split PDF and Count word frequency are getting executed properly. - I used the
sleep_until_completedfunction from ganga’s core testing framework to wait for job completion before making post-job assertions.
For this task, I chose the LLM deepseek-coder-1.3b-instruct. This model is trained for code generation and completion.
I used the Extractum LLM search directory, which has details on about 30,000 LLMs, to make a list of models that I would be able to test locally as well as on online notebook platforms Google Collab and Kaggle for free. These online platforms provide free GPU time that helped expedite testing time.
After shortlisting, I retrieved the models from Huggingface and created a test script to test their performance on the prompt.
I tested 33 LLMs (see Appendix C: List of LLMs tested).
Based on quality of output, the best model was deepseek-coder-1.3b-instruct. It was consistently able to generate a perfect Python code snippet to approximate Pi using accept-reject simulation, generate another snippet to submit the Ganga job and also a wrapper bash script.
While testing the models, I was also able to fine tune my prompt (Appendix B shows this version).
I faced some drawbacks and challenges while testing the model.
- The LLM could not generate proper import statements for ganga.
- It would not use the bash script that it wrote as an argument to the ganga job. Instead, it kept passing the Python script as the argument to
Fileor started hallucinating. - It used different types of markers to delineate the different code snippets. This issue made parsing its output to extract only the codes somewhat challenging.
With the LLM selected and a working prompt crafted, I created two Python scripts, InterfaceGanga.py and run_InterfaceGanga.py, to programmatically generate output from the LLM. I also created a test file test_GangaLLM.py that executes a unit test to examine if the proposed code by the LLM tries to execute the job in Ganga.
-
InterfaceGanga.pyContains the class
InterfaceGangathat contains methods to:- Initialize model parameters
- Run inference on the LLM to generate output
- Store the output
- Extract necessary code snippets from the output
- Write the snippets to appropriate scripts
-
run_InterfaceGanga.pyCreates an
InterfaceGangaobject to generate code for the task using the LLM and store them as scripts in thegenaidirectory. -
test_GangaLLM.pyExecutes
run_InterfaceGanga.pyand checks if the code generated by the LLM attempts to execute the proposed code in Ganga. This test file is kept in thetestdirectory.
The setup.py file includes all the required packages to run and test my code.
1.setup.webm
-
In the Linux terminal in your favourite directory, clone this repository by replacing
[PAT]in the command below with your GitHub PAT (Personal Access Token) and enter theGangaGSoC2024project directory.git clone https://[PAT]@github.com/dg1223/GangaGSoC2024.git cd GangaGSoC2024 -
Set up a virtual environment
python3 -m venv GSoC cd GSoC/ . bin/activate
-
Install dependencies (note the double dots in the second command - we need to be in the project's root directory to install additional packages)
python -m pip install --upgrade pip wheel setuptools python -m pip install ..
-
Activate ganga
activate-ganga.webm
./bin/ganga
You can run all three subtasks in the ganga prompt.
Demonstrate that you can run a simple Hello World Ganga job that executes on a Local backend.
This task is executed by the script hello.py.
2.hello-world.webm
In the ganga prompt, first go to the genai directory which should be at the same level as the GSoC directory.
cd ../genaiThen run:
ganga hello.pyIt will run the Hello World Ganga job. If the job runs successfully, you should see the following output:
To check the job's stdout, run the command: jobs(job_id).peek('stdout')
Let’s say the job_id is 100. Running the command shown in the output should show you the stdout of the job:
jobs(100).peek('stdout')You should see:
Hello World
/path_to_your_ganga_workspace/user/LocalXML/100/output/stdout (END)
Press q to get back to ganga prompt.
Create a job in Ganga that demonstrates splitting a job into multiple pieces and then collates the results at the end.
This task is executed by the script initial_task.py which takes the script split_pdf.py as an argument. split_pdf.py contains the logic for this subtask.
3.split-pdf.webm
In the ganga prompt, run:
ganga initial_task.py split_pdf.pyIf the job runs successfully, you should see the following output:
Extracted pages from LHC.pdf have been saved in the folder /path_to_this_git_repo/GangaGSoC2024/genai/extracted_pages
For a detailed stdout, run the command: jobs(job_id).peek('stdout') in ganga prompt.
Let’s say the job_id is 101. Running the command shown in the output should show you the stdout of the job:
jobs(101).peek('stdout')You should see 29 lines in the output. Each one of them should look like the following:
Extracted page 1 from /path_to_this_git_repo/GangaGSoC2024/genai/LHC.pdf and saved as /path_to_this_git_repo/GangaGSoC2024/genai/extracted_pages/LHC_page_1.pdf
Press q to get back to ganga prompt.
Check output
In the genai directory, you should see a new folder called extracted_pages. In this folder, there should be 29 PDF files. Page 1 of LHC.pdf has been extracted as LHC_page_1.pdf, page 2 as LHC_page_2.pdf and so on up to LHC_page_29.pdf.
Create a a second job in Ganga that will count the number of occurences of the word "it" in the text of the PDF file.
This task is executed by the script initial_task.py which takes the script count_it.py as an argument. count_it.py contains the logic for this subtask.
4.count_it.webm
In the ganga prompt, run:
ganga initial_task.py count_it.pyAs the job executes, you should see the following output that includes a timer. This job times out after 1 minute. It should not take more than a few seconds for this job to finish.
Waiting for job to finish. Maximum wait time: 1 minute
00:01
If the job runs successfully, you should see the following output:
>>> Frequency of the word 'it' = 31 <<<
The word count has been stored in the same directory as this script: /path_to_this_git_repo/GangaGSoC2024/genai/count_it.txt
Run this command to see the stored result: cat /path_to_this_git_repo/GangaGSoC2024/genai/count_it.txt
Run this command to check the output from TextMerger: jobs(746).peek('stdout')
Check output
As the second line in the output suggests, the job should have created a text file called count_it.txt in the genai directory. Open this file to check the word count. It should read 31.
Alternatively, you can check the content of this file by running the command shown by the third line of the job’s stdout:
cat /path_to_this_git_repo/GangaGSoC2024/genai/count_it.txt
You should see 31 in the ganga prompt.
Let’s say the job_id is 102. Running the command shown in the last line of the output should display the stdout from the job:
jobs(101).peek('stdout')You should see the merged output from Ganga TextMergeTool.
# Ganga TextMergeTool - [date_and_timestamp] #
# Start of file /path_to_your_ganga_workspace/user/LocalXML/746/0/output/stdout #
5
...
# Ganga Merge Ended Successfully #
(END)
Press q to get back to ganga prompt.
This is all there is to checking if the ‘initial task’ was successfully executed. How to execute the unit tests is shown in the last section below.
Quit ganga and get back to the GSoC directory:
quitEdge Cases to Consider in Counting Word Frequency
There were 2 edge cases that I needed to address to get the correct word count. I used the most popular PDF processing library pypdf to extract text from LHC.pdf. Upon examining the extracted text, I found the following edge cases:
- The word ‘It’ appears after a line break and a bullet point in page 3.
- It is already known that…’
- Citation markers (square brackets
[])- page 8: safety systems to contain it.[85]
- page 16: TV series based on it.[177]
The purpose is to demonstrate that you can communicate with a Large Language Model in a programmatic way.
The most straightforward way to test this task is to run the corresponding unit test. If it passes, then the task is complete.
However, this test takes time to complete if it is run on a CPU. In this case, I suggest running all the tests together (see Running unit tests) to save time. The test first executes the run script run_InterfaceGanga.py that automatically detects if the system has a CUDA compatible GPU or not.
If you want to run this unit test spefically from test_GangaLLM.py, go to the test directory (assuming you are in GSoC):
cd ../testNow run:
python -m pytest test_GangaLLM.pyIf it passes, it means the test tried to execute the code in ganga that was proposed by the LLM.
The test actually calls the function run_ganga_llm() from run_InterfaceGanga.py.
Success
If the LLM remains consistent with the type of answers it produced when I tested it locally, you should see 3 scripts in the test directory:
estimate_pi.pyorpi_estimation.py, or a Python script with the same name as the function name that the LLM generated for the Pi approximation code.run_ganga.sh: This script is supposed to be theExecutableapplication for the ganga job that invokes the Python script to estimate Pi’s value.run_ganga_job.py: This is the main script that submits the ganga job.
Failure
The test will fail if the script run_ganga_job.py is not found. It means the LLM either provided the code snippets in a different style than what it did during my testing or it hallucinated.
The test will also fail if it fails in its attempt to run the ganga job.
Depending on the system configuration. the test takes 8-25 minutes to finish on a CPU (at least Intel Core i5 3rd generation) or less than a minute on a CUDA compatible GPU such as the NVIDIA Tesla P100. Minimum memory requirements are 16GB RAM and 8GB vRAM (if run on GPU).
(Go back to Subtask 3 or ‘Interfacing Ganga’)
5.test.webm
Assuming you are in the test directory of the project, all of the 18 unit tests can be run by executing:
python -m pytestTest scripts:
test_ArgSplitter.py
test_CompleteSystem.py
test_CountIt.py
test_GangaLLM.py
test_Hello.py
test_SplitPDF.py
test_trivial.py
(Go back to Preparation)
https://github.com/jncraton/languagemodels
languagemodels API documentation
👋 Welcome to MLC LLM — mlc-llm 0.1.0 documentation
New localllm lets you develop gen AI apps locally, without GPUs | Google Cloud Blog
Large Language Models for Code Generation
Which Quantization Method is Right for You? (GPTQ vs. GGUF vs. AWQ)
GitHub - GoogleCloudPlatform/localllm
(Go back to Initial task)
genai
./genai
├── count_it.py
├── hello.py
├── initial_task.py
├── __init__.py
├── InterfaceGanga.py
├── LHC.pdf
├── run_InterfaceGanga.py
└── split_pdf.py
test
./test
├── __init__.py
├── LHC.pdf
├── test_ArgSplitter.py
├── test_CompleteSystem.py
├── test_CountIt.py
├── test_GangaLLM.py
├── test_Hello.py
├── test_SplitPDF.py
└── test_trivial.py
(Go back to Preparation)
I want to use Ganga to calculate an approximation to the number pi using an accept-reject simulation method with one million simulations. I would like to perform this calculation through a Ganga job. The job should be split into a number of subjobs that each do thousand simulations.The code should be written in Python.
Here are some instructions that you can follow.
- Write code to calculate the approximation of pi using the above-mentioned method.
- Write a bash script that will execute the code above.
- Run a ganga job using local backend: j = Job(name=job_name, backend=Local())
- Run the Bash script as an Executable application: j.application = Executable() j.application.exe = File(the_script_to_run)
- Use ArgSplitter to split the job: j.splitter = ArgSplitter(args=splitter_args) It should split the job into a number of subjobs that each do thousand simulations.
- Merge output from the splitter using TextMerger: j.postprocessors.append(TextMerger(files=['stdout']))
- Run the ganga job: j.submit()
Do not give me code as IPython or Jupyter prompts. Give me the python script.
(Go back to Choose the best model)
List of LLMs that were tested:
# 33 models
deepseek-coder-1.3b-base
deepseek-coder-1.3b-instruct
deepseek-coder-6.7b-base
deepseek-coder-6.7b-instruct
Deci/DeciCoder-1b
ramgpt/deepseek-coder-6.7B-GPTQ
mlx-community/stable-code-3b-mlx
mlx-community/CodeLlama-7b-Python-4bit-MLX
mlx-community/CodeLlama-7b-Instruct-hf-4bit-MLX
stabilityai/stable-code-3b
stabilityai/stablecode-instruct-alpha-3b
stabilityai/stablecode-completion-alpha-3b
TheBloke/CodeLlama-7B-GGUF
TheBloke/CodeLlama-7B-GGML
TheBloke/Llama-2-Coder-7B-GGUF
TheBloke/deepseek-coder-1.3b-base-AWQ
TheBloke/deepseek-coder-6.7B-base-GGUF
TheBloke/deepseek-coder-6.7B-instruct-GGUF
TheBloke/stablecode-instruct-alpha-3b-GGML
Salesforce/codegen2-1B
microsoft/phi-1
LoneStriker/deepseek-coder-6.7b-instruct-4.0bpw-h6-exl2-2
casperhansen/mpt-7b-8k-chat-gptq
smangrul/codellama-hugcoder-merged
unsloth/gemma-2b-bnb-4bit
unsloth/codellama-13b-bnb-4bit
davzoku/cria-llama2-7b-v1.3-q4-mlx
Deci/DeciCoder-1b
WizardLM/WizardCoder-1B-V1.0
WizardLM/WizardCoder-3B-V1.0
codellama/CodeLlama-7b-hf
codellama/CodeLlama-7b-Python-hf
smallcloudai/Refact-1_6B-fim