@NewdlDewdl NewdlDewdl commented Nov 7, 2025

Add Discount Programs Scraper

Summary

Implements a scraper and parser for the UTD Student Government discount programs page (https://sg.utdallas.edu/discount/).

Extracts 205 discount programs across 13 categories.

Closes #109

Dependencies

⚠️ Depends on nebula-api PR #307 being merged first (adds DiscountProgram schema)

About go.mod Replace Directive

This PR includes a temporary replace directive for local development:

replace github.com/UTDNebula/nebula-api/api => ../nebula-api/api

After nebula-api PR #307 merges, remove this line and update the dependency:

go get github.com/UTDNebula/nebula-api/api@latest
go mod tidy

Changes

New Files

  • scrapers/discounts.go - Scraper (saves raw HTML)
  • parser/discountsParser.go - Parser (extracts to JSON)
  • parser/discountsParser_test.go - Unit tests (7 functions, 26 test cases)
  • DISCOUNT_SCRAPER.md - Documentation

Modified Files

  • main.go - Added -discounts flag
  • go.mod - Added replace directive
  • README.md - Updated commands
  • runners/weekly.sh - Added to weekly schedule

Features

  • ✅ Headless mode compatible
  • ✅ Suppressed browser errors for clean output
  • ✅ Validates and cleans data (HTML entities, newlines)
  • ✅ 100% extraction success (205/205 entries)
  • ✅ All tests passing
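The cleaning step listed above (HTML entities, newlines) could look roughly like the sketch below. This is not the code from parser/discountsParser.go. The helper name cleanText and the sample input are hypothetical, and only standard library calls are used.

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// cleanText is a hypothetical helper (not taken from the PR). It decodes HTML
// entities, then collapses newlines and runs of whitespace into single spaces.
func cleanText(raw string) string {
	decoded := html.UnescapeString(raw)
	// strings.Fields splits on any Unicode whitespace, including \n and \t,
	// and drops leading and trailing whitespace.
	return strings.Join(strings.Fields(decoded), " ")
}

func main() {
	// Made-up sample input with two entities and an embedded newline.
	fmt.Println(cleanText("Ben &amp; Jerry&#39;s\n  10% off"))
	// prints: Ben & Jerry's 10% off
}
```

Decoding entities before collapsing whitespace means that whitespace produced by an entity is normalized in the same pass.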

Usage

# Scrape
./api-tools -scrape -discounts -o ./data -headless

# Parse
./api-tools -parse -discounts -i ./data -o ./data

# Test
go test ./parser -run Discount -v
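
For reviewers who have not opened main.go, the flags used in the commands above could be wired up along these lines with the standard flag package. This is a sketch under assumptions: the options struct, the parseArgs function, the defaults, and the help strings are hypothetical, and the real main.go may be organized differently.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the flags shown in the usage commands. The struct and its
// field names are hypothetical.
type options struct {
	scrape    bool
	parse     bool
	discounts bool
	headless  bool
	inDir     string
	outDir    string
}

// parseArgs parses args (without the program name) into options.
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("api-tools", flag.ContinueOnError)
	fs.BoolVar(&o.scrape, "scrape", false, "run a scraper")
	fs.BoolVar(&o.parse, "parse", false, "run a parser")
	fs.BoolVar(&o.discounts, "discounts", false, "target the discount programs page")
	fs.BoolVar(&o.headless, "headless", false, "run the browser without a window")
	fs.StringVar(&o.inDir, "i", "", "input directory")
	fs.StringVar(&o.outDir, "o", "", "output directory")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

Using a dedicated FlagSet instead of the global one keeps the parsing testable without touching os.Args.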

Testing

✅ 7 test functions
✅ 26 test cases
✅ All passing

Covers: parsing, validation, HTML entities, phone detection, email extraction
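
As an illustration of the phone detection and email extraction that the tests cover, here is a minimal regexp-based sketch. Both patterns and the function names hasPhone and extractEmail are assumptions chosen for the example. They are not the expressions used in the PR, and they only handle common US-style numbers and simple addresses.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are illustrative assumptions, not the ones used in the PR.
var (
	// Matches US-style numbers such as 972-555-0100, (972) 555-0100, 972.555.0100.
	phoneRe = regexp.MustCompile(`\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}`)
	// Matches simple addresses; it does not try to cover every valid email form.
	emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
)

// hasPhone reports whether s contains something that looks like a phone number.
func hasPhone(s string) bool {
	return phoneRe.MatchString(s)
}

// extractEmail returns the first email-like substring in s, or "" if none.
func extractEmail(s string) string {
	return emailRe.FindString(s)
}

func main() {
	// Made-up sample strings.
	fmt.Println(hasPhone("Call (972) 555-0100 today"))
	fmt.Println(extractEmail("Contact deals@example.com."))
}
```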

Related PRs

  • nebula-api PR #307: Adds DiscountProgram schema (merge first)

rohin-sudo and others added 2 commits November 7, 2025 15:13
@NewdlDewdl
Contributor Author

Note: The checks failed this time because the discount schema does not exist in the base nebula-api repo yet (the nebula-api PR has to merge first). The replace line has now been removed. To test locally before confirming the merge, use the previous commit of this PR together with the nebula-api version from that pull request.

@NewdlDewdl
Contributor Author

🔄 Status Update

@mikehquan19 This PR is ready to merge after nebula-api PR #307.

Current Failing Checks - Expected ✅

The build is failing because the DiscountProgram schema doesn't exist in the base nebula-api repo yet:

undefined: schema.DiscountProgram

After PR #307 Merges:

  1. I'll update the go.mod dependency: go get github.com/UTDNebula/nebula-api/api@latest
  2. All checks will pass ✅
  3. Ready for final merge

What's Ready Now:

  • ✅ All 26 tests passing locally
  • ✅ 205/205 programs extracted successfully
  • ✅ Full documentation included
  • ✅ Code review ready

Just waiting on the schema PR to unlock this! Should be quick after that.

Development

Successfully merging this pull request may close these issues.

Add the scraper for Comet Discount Program

2 participants