A multi-threaded ReAct agent system designed to extract responsibilities, requirements, and skills from job advertisements using deep-learning agent principles and dynamic tool calling. Extractions are compared to a skill taxonomy (ESCO).
Extracting responsibilities, requirements, and skills from job advertisements is complicated by the following challenges:
- Parsing Complexity:
- Job ads vary widely in quality and structure.
- No single parsing rule can reliably extract or organise the information from all ads.
- Job “Theme” Quality:
- The richness and quality of a job ad depend on its domain.
- For example, Trade & Services and Self-Employment ads can possess significantly less context when compared with other themes.
- Company “Lingo”:
- A company can use their own cultural lingo in the job ad.
- Skills, Responsibilities, and Requirements:
- Job ads may not contain all of the fields required for mapping, depending on the theme quality.
- Regional & Cultural Variability:
- Job ad terminology and expectations differ across regions and cultures.
- Domain and jargon drift:
- Skill definitions evolve rapidly across domains.
- Example: Context Engineering is emerging with entirely new skill sets.
- Skill blurring:
- A skill can represent both hard and soft skills, depending on the context.
- Example: “Communication” may refer to a technical process (hard) or interpersonal ability (soft).
- Overlapping skills:
- Similar skills may have overlapping meanings.
- Example: “Charting” vs “Visualising” may describe the same capability in different contexts.
- Description columns define either an occupation or a skill.
- Descriptions don’t always align with actual requirements or responsibilities, which can shift with context.
- Example: The occupation Data Analyst can describe the responsibility to “Create visual reports to communicate insights” ➡️ What someone does
- Meanwhile, the skill Data Visualisation can be described as ”Designing visuals using tools” ➡️ What someone must know
- Occupation Blurring: The same occupation may be interpreted differently by companies and taxonomies, leading to contextual mismatches.
- ReAct agents fully exhaust a task before progressing to the next, ensuring thorough and coherent reasoning.
- Deep Learning ReAct agents implement the following principles:
- 🗃️ Planning: For example, creating a TODO list to plan and restate the objective (and to record why the agent decided to "tick off a task").
- 📝 Offload context: Capturing notes to assist the agent in accomplishing a task.
- 🤝 Task delegation: Can delegate tasks to sub-agents that specialise in the task at hand.
- 💬 Use Careful & Extensive Prompt Engineering: Building prompts to describe, constrain, and clarify subsequent processes to the Agent as a “system prompt” (see example here).
Due to their capacity to reason and act iteratively, powered by a strong LLM foundation, ReAct agents excel in interpreting nuanced, context-rich data. In job advertisement analysis, where skills, responsibilities, and requirements are often vague or context-dependent, ReAct agents overcome this by:
- Actively reasoning through the context of the job ad and the implied meaning.
- Decomposing ambiguous statements/job-ad sections into clear elements.
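The reason-act-observe cycle described above can be sketched as a plain loop. This is a minimal illustration, not the project's actual implementation: the function names (`react_loop`, `choose_tool`) and the string-based context are all hypothetical stand-ins for the LangGraph machinery the project actually uses.

```python
def react_loop(job_ad: str, tools: dict, choose_tool, max_steps: int = 10):
    """Minimal ReAct sketch: the agent alternates between reasoning
    (choosing a tool) and acting (calling it), feeding each observation
    back into the context until it decides it is done."""
    context = [job_ad]
    for _ in range(max_steps):
        tool_name, tool_input = choose_tool(context)      # "reason" step
        if tool_name is None:                             # agent signals completion
            break
        observation = tools[tool_name](tool_input)        # "act" step
        context.append(f"{tool_name} -> {observation}")   # "observe" step
    return context
```

Because each observation is appended before the next reasoning step, a task is fully exhausted (its result is visible in context) before the agent moves on.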
| # | Name | Purpose |
|---|---|---|
| 1 | update_content | A tool to extract raw context from a job advertisement. |
| 2 | write_todo | A tool to write concise TASKS to inform and track your progress. |
| 3 | read_todos | A tool to read the TODOs to remind the ReAct agent of the plan. |
| 4 | extract_soft_skills | A tool to extract soft skill entities from a job advertisement. |
| 5 | extract_hard_skills | A tool to extract hard skill entities from a job advertisement. |
| 6 | check_for_bothskills | A tool to validate skills previously categorised as 'hard' or 'soft' and resolve those that overlap both categories. |
| 7 | extract_responsibilities | A tool to extract responsibilities from a job advertisement. |
| 8 | extract_requirements | A tool to extract requirements from a job advertisement. |
| 9 | evaluate_correctness | A tool to perform G-Eval for Correctness. |
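To make the planning tools (nos. 2-3) concrete, here is a hedged, stand-alone sketch of `write_todo` and `read_todos` operating on a shared state dict. The real project wires state through LangGraph's injected-state mechanism; the signatures and state layout below are illustrative assumptions.

```python
def write_todo(state: dict, task: str, done: bool = False) -> str:
    """Record a concise task in shared agent state (illustrative layout)."""
    state.setdefault("todos", []).append({"task": task, "done": done})
    return f"Recorded task: {task}"

def read_todos(state: dict) -> str:
    """Replay the TODO list so the agent can re-anchor on its plan."""
    todos = state.get("todos", [])
    if not todos:
        return "No TODOs recorded yet."
    return "\n".join(f"[{'x' if t['done'] else ' '}] {t['task']}" for t in todos)
```

Keeping the plan outside the prompt window like this is what lets the agent "offload context" and still recall its objective many steps later.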
- Entity extractions with LangExtract
- A Google tool optimised for long-document entity extraction.
- Extracts hard skills, soft skills, years of experience, and contact persons with high recall by using chunking, parallel processing, and multi-pass strategies.
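The chunking, parallel-processing, and multi-pass strategy mentioned above can be illustrated with a toy implementation. This is not LangExtract's API, just a sketch of the recall-oriented idea: split the document into overlapping chunks, extract from each in parallel, and union results across passes.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(text: str, size: int = 400, overlap: int = 50):
    """Split a long job ad into overlapping character chunks so that
    entities straddling a boundary are still seen whole in one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def multi_pass_extract(text: str, extractor, passes: int = 2) -> set:
    """Run the extractor over every chunk in parallel, several times,
    and union the results -- extra passes trade cost for recall."""
    found = set()
    with ThreadPoolExecutor() as pool:
        for _ in range(passes):
            for entities in pool.map(extractor, chunk(text)):
                found.update(entities)
    return found
```

Overlapping chunks plus a union over passes is why this style of extraction favours recall: an entity only needs to be caught once, in any chunk, on any pass.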
- Context offloading with `InjectedToolCallId` and `InjectedState`.
- Prompt-based extraction tools.
- Custom G-Eval functionality for Correctness using `Jinja2`.
- LLM hyperparameter optimisation: optimising `recursion_limit` & `remaining_steps`.
- Semantic evaluation: evaluating and comparing results between ReAct extractions and the skill taxonomy.
  - Embedding generation: `TechWolf/JobBERT-v2` sentence-transformer model.
  - Comparison methodology: cosine similarity.
  - Leveraging the `Datasets.map()` utility.
- Deep agents principles: 🗃️ Planning, 📝 Offload context, 💬 Careful & Extensive Prompt Engineering
- Taxonomy Skill Concatenation:
- Purpose: Combine all possible labels for occupations and skills — including both preferred labels and alternative labels.
- Reason: Ensures that the evaluation considers all available labels in the taxonomy, capturing variations and synonyms in skill or occupation naming.
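The concatenation step can be sketched as follows, assuming ESCO-style rows with one preferred label and a newline-separated alternative-label field (the column names `preferredLabel` and `altLabels` are assumptions about the CSV layout, not verified against the project's code).

```python
def collect_labels(rows: list[dict]) -> list[str]:
    """Flatten preferred + alternative labels into one candidate list,
    so taxonomy matching sees every synonym for a skill or occupation."""
    labels = []
    for row in rows:
        labels.append(row["preferredLabel"])
        alt = row.get("altLabels") or ""
        labels.extend(a.strip() for a in alt.split("\n") if a.strip())
    return labels
```

Matching against the full synonym list is what lets "charting" and "visualising" both resolve to the same taxonomy skill.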
- ReAct – Threshold Alpha on “Correctness”:
- Correctness (GEval measure): Evaluates how well the agent’s output aligns with the expected results.
- In Context: Measures whether skills extracted from job ads are accurate and contextually correct.
- High Alignment Range: 0.8–1.0 (as per GEval definition).
- Filtering: Only extracted results with Correctness ≥ 0.8 are considered for evaluation to ensure the taxonomy mapping reflects only valid extractions.
- Cosine Similarity:
- Computes the similarity between extracted results and taxonomy embeddings.
- Maps each extracted skill or label to the K closest items in the embedding space.
- Ensures a semantic alignment between extracted results and the taxonomy, even if exact wording differs.
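The K-nearest mapping above reduces to cosine similarity over embedding vectors. The project encodes text with the `TechWolf/JobBERT-v2` sentence-transformer; the sketch below skips the model and uses toy vectors to show only the similarity and top-K logic (function names are illustrative).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, taxonomy: dict, k: int = 3):
    """Map one extracted-skill embedding to its K nearest taxonomy labels."""
    ranked = sorted(taxonomy.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [label for label, _ in ranked[:k]]
```

Because similarity is computed in embedding space, an extracted phrase maps to the right taxonomy label even when the exact wording differs.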
.
├── LICENSE
├── notebooks
│ ├── ads.ipynb
│ ├── data
│ │ ├── ads_preprocessed.csv
│ │ ├── ads-50k.json
│ │ ├── occupations_en.csv
│ │ ├── sample_results.csv
│ │ └── skills_en.csv
│ ├── output_images
│ │ ├── geval_class.png
│ │ ├── geval_correctness.png
│ │ ├── langextract_json.png
│ │ └── output.png
│ ├── ReAct.ipynb
│ ├── semantical_eval.ipynb
│ ├── softskills
│ │ ├── softskills.html
│ │ └── softskills.jsonl
│ └── utils
│ ├── __init__.py
│ ├── __pycache__
│ │ ├── __init__.cpython-312.pyc
│ │ └── print_utils.cpython-312.pyc
│ └── print_utils.py
├── pyproject.toml
├── README.md
├── src
│ ├── agent
│ │ ├── __init__.py
│ │ ├── evaluation.py
│ │ ├── preprocess_utils.py
│ │ ├── prompts.py
│ │ ├── req_and_res.py
│ │ ├── skill_utils.py
│ │ ├── state.py
│ │ ├── studio
│ │ │ ├── langgraph.json
│ │ │ ├── react.py
│ │ │ └── requirements.txt
│ │ └── todo_utils.py
│ └── data
│ ├── hard_skills.json
│ └── soft_skills.json
├── .gitignore
└── uv.lock