# Scholscan

Scholscan filters academic articles by scoring their titles with TF-IDF features and a logistic regression classifier.

## Build
```
go build -o scholscan .
```

## Usage
```
# Train model from articles you like
./scholscan train positives.jsonl --rss-feeds feeds.txt > model.json

# Score new RSS feed
./scholscan scan --url RSS_URL --model model.json > results.jsonl

# Run web server
./scholscan serve --port 8080 --model model.json --rss-world rss_world.txt
```

## Endpoints

- GET `/` - redirect to live feed
- GET `/live-feed` - filtered articles web UI
- GET `/tools` - score individual articles
- POST `/score` - API for scoring titles
- POST `/scan` - API for scanning RSS
- GET `/api/filtered/feed` - JSON feed
- GET `/api/filtered/rss` - RSS feed
- GET `/api/health` - health check
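
As an illustration of calling one of these endpoints, the sketch below polls `/api/health` from Go. It assumes the server was started with `--port 8080` on localhost and that a 200 status means healthy; the response body format is not documented here, so it is ignored. The helper names `healthURL` and `checkHealth` are hypothetical and not part of scholscan.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// healthURL joins a base URL with the /api/health path from the endpoint list.
func healthURL(base string) string {
	return strings.TrimRight(base, "/") + "/api/health"
}

// checkHealth reports whether the server answers the health check with 200 OK.
// Treating 200 as "healthy" is an assumption of this sketch.
func checkHealth(client *http.Client, base string) (bool, error) {
	resp, err := client.Get(healthURL(base))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK, nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	// Assumed address: matches `./scholscan serve --port 8080` on the same host.
	ok, err := checkHealth(client, "http://localhost:8080")
	if err != nil {
		fmt.Println("health check failed:", err)
		return
	}
	fmt.Println("healthy:", ok)
}
```

A short client timeout keeps a monitoring loop from hanging if the server is down.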

## Model settings

- TF-IDF: unigrams and bigrams; a term must appear in at least 2 titles (MinDF=2) and in at most 80% of titles (MaxDF=0.8)
- Logistic regression: L2 regularization with λ=0.001
- Class balancing: the majority class is downsampled to a 1:1 ratio
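
To make these settings concrete, here is a minimal sketch of how unigram and bigram extraction, document-frequency filtering, and logistic scoring could fit together. It is not scholscan's actual code: the whitespace tokenizer, the smoothed IDF formula, and the names `ngrams`, `buildIDF`, and `score` are all assumptions made for illustration. Training (the L2-regularized fit and the 1:1 downsampling) is left out.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// ngrams returns the unigrams followed by the bigrams of a title.
// Assumption: lowercase and split on whitespace; the real tokenizer may differ.
func ngrams(title string) []string {
	words := strings.Fields(strings.ToLower(title))
	out := append([]string{}, words...)
	for i := 0; i+1 < len(words); i++ {
		out = append(out, words[i]+" "+words[i+1])
	}
	return out
}

// buildIDF keeps terms found in at least minDF titles and in at most a
// maxDF fraction of all titles, and returns an IDF weight for each kept term.
func buildIDF(titles []string, minDF int, maxDF float64) map[string]float64 {
	df := map[string]int{}
	for _, t := range titles {
		seen := map[string]bool{}
		for _, g := range ngrams(t) {
			if !seen[g] {
				seen[g] = true
				df[g]++ // count each term once per title
			}
		}
	}
	n := float64(len(titles))
	idf := map[string]float64{}
	for term, c := range df {
		if c >= minDF && float64(c)/n <= maxDF {
			// Smoothed IDF; the exact formula is an assumption.
			idf[term] = math.Log((1+n)/(1+float64(c))) + 1
		}
	}
	return idf
}

// score returns sigmoid(w·x + b), where x is the title's TF-IDF vector.
// Terms missing from idf or weights contribute zero.
func score(title string, idf, weights map[string]float64, bias float64) float64 {
	z := bias
	for _, g := range ngrams(title) {
		z += idf[g] * weights[g] // repeated terms add up, giving tf*idf*w
	}
	return 1 / (1 + math.Exp(-z))
}

func main() {
	titles := []string{
		"Deep learning for protein folding",
		"Protein folding with graph networks",
		"Deep learning for image segmentation",
		"A survey of graph networks",
		"Image segmentation benchmarks",
	}
	idf := buildIDF(titles, 2, 0.8)
	// Hypothetical weights; a trained model would supply these.
	weights := map[string]float64{"protein folding": 1.5, "image": -0.5}
	fmt.Println("vocabulary size:", len(idf))
	fmt.Printf("score: %.3f\n", score("Protein folding at scale", idf, weights, 0))
}
```

With MinDF=2, a term that occurs in only one title is dropped, which keeps one-off words and bigrams out of the model. MaxDF=0.8 does the opposite and removes terms so common that they carry little signal.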