Scholscan
Filters academic articles by scoring their titles with TF-IDF features and a logistic regression classifier.
Build
go build -o scholscan .
Usage
# Train model from articles you like
./scholscan train positives.jsonl --rss-feeds feeds.txt > model.json
# Score new RSS feed
./scholscan scan --url RSS_URL --model model.json > results.jsonl
# Run web server
./scholscan serve --port 8080 --model model.json --rss-world rss_world.txt
Endpoints
- GET / - redirect to live feed
- GET /live-feed - filtered articles web UI
- GET /tools - score individual articles
- POST /score - API for scoring titles
- POST /scan - API for scanning RSS
- GET /api/filtered/feed - JSON feed
- GET /api/filtered/rss - RSS feed
- GET /api/health - health check
Model settings
- TF-IDF: unigrams + bigrams, MinDF=2, MaxDF=0.8
- Logistic regression: L2 regularization with λ=0.001
- Class balancing: downsample majority to 1:1 ratio
