A proof of concept of a full data pipeline to index data from LSE websites and make /search available as an API endpoint

LSE-DSI/chat-lse

💬 ChatLSE

💡 About the Project

The ChatLSE project is a proof of concept of a full data pipeline to index data from LSE websites. In this project, we gathered all public LSE documents and webpages into a database and then developed a chat interface on top of it using a large language model (LLM). Think of it as a ChatGPT designed to be particularly knowledgeable about LSE documents. Using retrieval-augmented generation (RAG), the ChatLSE chatbot can answer queries from staff and students by consulting relevant LSE documents and regulations.
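To make the "retrieve, then augment" idea behind RAG concrete, here is a minimal Python sketch. It is not ChatLSE's implementation: the three-document corpus, the function names and the prompt wording are hypothetical, and simple word-count cosine similarity stands in for the embedding model and vector database a production pipeline would normally use. The resulting prompt would then be sent to an LLM, a step omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for the indexed LSE documents.
DOCUMENTS = {
    "exam-regs": "Students must register for exams before the deadline set by the School.",
    "library": "The library opens at 8am and offers study spaces for students and staff.",
    "fees": "Tuition fees are payable in instalments; late payment may incur a charge.",
}


def tokenize(text: str) -> Counter:
    """Lower-case word counts; a stand-in (assumption) for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc_id: cosine(q, tokenize(DOCUMENTS[doc_id])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, k: int = 2) -> str:
    """Augment the user's question with the retrieved passages before calling an LLM."""
    context = "\n".join(f"[{doc_id}] {DOCUMENTS[doc_id]}" for doc_id in retrieve(query, k))
    return (
        "Answer using only the LSE documents below.\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
```

For example, `build_prompt("When do I register for exams?", k=1)` places only the `exam-regs` passage in the context, because it shares the most words with the question.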

As all parts of this application are completely open-source, this project also aims to serve as a blueprint for a fully open-source RAG solution. The full workflow of the project is illustrated below:

(Figure: overall workflow of the project)

The workflow improves on vanilla RAG implementations by adding two components: a query rewriter and a query classifier. Together they make the chatbot behave more naturally with users: it can handle follow-up questions by referring to earlier context, and it knows when to decline questions that fall outside its intended scope.
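The sketch below illustrates one way a query rewriter and a query classifier might sit in front of retrieval. It assumes the rewriter runs first, so that the classifier sees the full context, and it replaces both components with toy keyword rules; in practice each step would more likely be an LLM call. All names (`rewrite_query`, `is_in_scope`, `route`, the keyword sets) are hypothetical.

```python
import re

# Hypothetical scope vocabulary; a real classifier would more likely be an LLM call.
IN_SCOPE_TERMS = {
    "lse", "exam", "exams", "fees", "library", "course", "courses",
    "student", "students", "regulations",
}
# Hypothetical pronouns that signal a follow-up question.
FOLLOW_UP_MARKERS = {"it", "that", "this", "they", "those"}

REFUSAL = "Sorry, I can only answer questions about LSE."


def words(text: str) -> set[str]:
    """Lower-cased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rewrite_query(history: list[str], query: str) -> str:
    """Toy rewriter: attach the previous user turn when the query looks like a follow-up."""
    if history and words(query) & FOLLOW_UP_MARKERS:
        return f"{query} (context: {history[-1]})"
    return query


def is_in_scope(query: str) -> bool:
    """Toy classifier: in scope if the query mentions any LSE-related term."""
    return bool(words(query) & IN_SCOPE_TERMS)


def route(history: list[str], query: str) -> str:
    """Rewrite first (assumed order), then refuse out-of-scope queries."""
    standalone = rewrite_query(history, query)
    if not is_in_scope(standalone):
        return REFUSAL
    return standalone  # would be passed on to retrieval and generation
```

With the history `["When are the LSE exams?"]`, the follow-up "Where are they held?" is expanded with that earlier question before classification, whereas "What is the capital of France?" is refused as out of scope.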

🧑‍💻 The Team

ChatLSE was initially created by a small team from the LSE Data Science Institute over the summer of 2024. We now hope to make it open-source and community-driven to allow everyone to contribute to this project. Everyone who contributes to this project, no matter how small or big their contributions are, is recognised in this project as a contributor and a community member.

The project is coordinated and managed by Jonathan Cardoso-Silva.

Please see the Contributors Table for the GitHub profiles of all our contributors.

🔧 Contributing

This repository is always a work in progress and everyone is encouraged to help us build something that is useful to the many.

Everyone who joins the project should check out our contributing guidelines for more information on how to get started.

Community members are provided with opportunities to learn new skills, share their ideas and collaborate with others.

✉️ Get in Touch

You can contact the ChatLSE team by emailing [email protected].

✨ Contributors

Thanks go to these wonderful people (emoji key):

Terry Zhou: 🐛 💻 🔣 📖 🤔 👀
Jon Cardoso-Silva: 💻 📖 🤔 👀 🧑‍🏫 📆
akshsabherwal: 🐛 💻 📖 🤔 👀
KristinaD1910: 🐛 💻 🔣 📖 🤔 👀
Jinshuai Ma: 💻 📖 💡 🚇 🧑‍🏫
Riya Chhikara: 💻 🔣 📖 🧑‍🏫
Kylin Gao: 🤔 ⚠️
Alexey Burmistrov: 🤔 ⚠️
This project follows the all-contributors specification. Contributions of any kind welcome!
