
Scholscan

Filters academic articles by scoring their titles with TF-IDF features and a logistic regression classifier.

Build

go build -o scholscan .

Usage

# Train model from articles you like
./scholscan train positives.jsonl --rss-feeds feeds.txt > model.json

# Score new RSS feed
./scholscan scan --url RSS_URL --model model.json > results.jsonl

# Run web server
./scholscan serve --port 8080 --model model.json --rss-world rss_world.txt

Endpoints

  • GET / - redirect to live feed
  • GET /live-feed - filtered articles web UI
  • GET /tools - score individual articles
  • POST /score - API for scoring titles
  • POST /scan - API for scanning RSS
  • GET /api/filtered/feed - JSON feed
  • GET /api/filtered/rss - RSS feed
  • GET /api/health - health check

Model settings

  • TF-IDF: unigrams + bigrams, MinDF=2, MaxDF=0.8
  • Logistic regression: λ=0.001, L2 regularization
  • Class balancing: downsample majority to 1:1 ratio