Merged
22 commits
742b527  backend blur  (ema-png, Feb 26, 2026)
24d8f77  Merge remote-tracking branch 'origin/main' into backend-blur  (p1an0guy, Feb 26, 2026)
4508294  Department UI  (Noah-Gullo, Feb 26, 2026)
afb6474  fix: add pdf2pic type declarations for upload preview  (p1an0guy, Feb 26, 2026)
431d426  supabase  (Noah-Gullo, Feb 26, 2026)
cf87335  Merge main into pr-71 and resolve conflicts  (p1an0guy, Feb 26, 2026)
d87b6de  Merge pull request #71 from codebox-calpoly/backend-blur  (p1an0guy, Feb 26, 2026)
3fcff2d  fix: consistent wording in profile page  (p1an0guy, Feb 26, 2026)
c8b8b3e  feat: department request flow with email notification  (joshuapanicker, Feb 26, 2026)
38bc492  fix: department request use name only, add schema migrations  (joshuapanicker, Feb 26, 2026)
41f4cac  feat: browse sidebar, leaderboard dark mode, catalog terms fallback  (joshuapanicker, Mar 3, 2026)
e66e5fc  fix: note preview modal UI and remove bottom Close button  (joshuapanicker, Mar 3, 2026)
5f92709  feat: remove Link resource type, add type filters on class notes page  (joshuapanicker, Mar 3, 2026)
3eeff74  feat: profile page show nickname and email, avatar from nickname  (joshuapanicker, Mar 3, 2026)
03ea6d4  feat: TEXTEXTRACTOR API, note search by content, PDF preview, preview…  (joshuapanicker, Mar 3, 2026)
993e17b  fix: load note preview via auth fetch + blob URL; reduce blur  (joshuapanicker, Mar 3, 2026)
f9f4050  fix: note search UI and upload approval flow  (joshuapanicker, Mar 3, 2026)
0a12bd9  fix: navbar logo link styling in dark mode  (joshuapanicker, Mar 3, 2026)
572981a  favorites button + favorites on profile  (ema-png, Mar 5, 2026)
4f68a2a  add favorites button to note cards  (p1an0guy, Mar 5, 2026)
c14e0d1  Merge pull request #75 from codebox-calpoly/favorites  (p1an0guy, Mar 5, 2026)
d538f1a  Merge branch 'origin/request-department' into codex/merge-pr-74-reque…  (p1an0guy, Mar 5, 2026)
159 changes: 159 additions & 0 deletions docs/TEXTEXTRACTOR_SETUP.md
@@ -0,0 +1,159 @@
# TEXTEXTRACTOR — Full setup and usage guide

This guide walks you through the **entire** TEXTEXTRACTOR flow: what it does, how to use the API, how it’s built, and optional external services for better handwriting support.

---

## What it does

- **Typed/digital PDFs**
  Extracts text from the PDF’s built-in text layer (no OCR). Fast and accurate.

- **Scanned / handwritten PDFs**
  When there’s little or no text in the file, the pipeline falls back to **OCR**:
  1. Renders PDF pages to images (using `pdf2pic`).
  2. Runs **Tesseract.js** on each image to recognize text (including handwriting, with varying quality).

So one API handles both: you send a PDF and get back plain text, with an indication of whether it came from the text layer or OCR.

---

## Using the API

### Endpoint

- **URL:** `POST /api/textextractor`
- **Auth:** None required for this route (add auth in your app if you want).
- **Body:** `multipart/form-data` with a PDF file.

### Form fields

| Field | Required | Description |
|------------------|----------|-------------|
| `file` or `pdf` | Yes | The PDF file. |
| `force_ocr` | No | If `"true"` or `"1"`, skip typed extraction and run OCR only (useful for known scanned docs). |
| `max_ocr_pages` | No | Max number of pages to run OCR on (default 10, cap 50). |
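
The two optional fields could be read along these lines. This is a minimal sketch, not the code in `route.ts`: `parseOcrOptions`, `FieldSource`, and the fallback to the default on invalid input are assumptions made for illustration. Only the accepted `force_ocr` values, the default of 10, and the cap of 50 come from the table above.

```typescript
// Hypothetical helper: the name, types, and fallback behavior are assumptions,
// not code from route.ts. `FormData` satisfies `FieldSource` structurally.
type FieldSource = { get(name: string): unknown };
type OcrOptions = { forceOcr: boolean; maxOcrPages: number };

const DEFAULT_MAX_OCR_PAGES = 10;
const MAX_OCR_PAGES_CAP = 50;

function parseOcrOptions(fields: FieldSource): OcrOptions {
  // Only the strings "true" and "1" enable forced OCR.
  const rawForce = fields.get("force_ocr");
  const forceOcr = rawForce === "true" || rawForce === "1";

  const rawMax = fields.get("max_ocr_pages");
  const parsed = typeof rawMax === "string" ? Number.parseInt(rawMax, 10) : Number.NaN;

  // Missing, non-numeric, or non-positive values fall back to the default (assumed);
  // anything above the cap is clamped to 50.
  const maxOcrPages =
    Number.isFinite(parsed) && parsed > 0
      ? Math.min(parsed, MAX_OCR_PAGES_CAP)
      : DEFAULT_MAX_OCR_PAGES;

  return { forceOcr, maxOcrPages };
}
```

Passing the `FormData` returned by `await request.formData()` works because the helper only relies on a `get` method.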

### Example (curl)

```bash
curl -X POST http://localhost:3000/api/textextractor \
  -F "file=@/path/to/notes.pdf"
```

Force OCR for a PDF you know is scanned:

```bash
curl -X POST http://localhost:3000/api/textextractor \
  -F "file=@/path/to/scan.pdf" \
  -F "force_ocr=true"
```

### Example (JavaScript)

```javascript
// Wrapped in an async function so that `await` and the early `return` are valid.
async function extractText(pdfFile) {
  const form = new FormData();
  form.append("file", pdfFile); // File from <input type="file"> or fetch

  const res = await fetch("/api/textextractor", {
    method: "POST",
    body: form,
  });
  const data = await res.json();

  if (!res.ok) {
    console.error(data.error, data.details);
    return;
  }

  console.log("Method:", data.method); // "typed" or "ocr"
  console.log("Text:", data.text);
  console.log("Pages processed:", data.pages_processed);
  console.log("Total pages:", data.total_pages);
  if (data.ocr_confidence != null) {
    console.log("OCR confidence:", data.ocr_confidence);
  }
}
```

### Response shape

- **Success (200):**
- `text` (string) — Extracted text.
- `method` (`"typed"` | `"ocr"`) — How the text was obtained.
- `pages_processed` (number) — Pages used for extraction/OCR.
- `total_pages` (number) — Total pages in the PDF.
- `ocr_confidence` (number, optional) — Only when `method === "ocr"`; average confidence from Tesseract (0–100).

- **Error (4xx/5xx):**
- `error` (string) — Short message.
- `details` (string, optional) — Extra info (e.g. stack or internal message).
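
For typed clients, the two response shapes could be modeled roughly as follows. The field names are taken from the lists above; the type names, the `isExtractFailure` guard, and `summarize` are hypothetical and exist only to illustrate narrowing between success and error bodies.

```typescript
// Field names follow the response shape above; type and function names are illustrative.
type ExtractSuccess = {
  text: string;
  method: "typed" | "ocr";
  pages_processed: number;
  total_pages: number;
  ocr_confidence?: number; // only present when method === "ocr"
};

type ExtractFailure = { error: string; details?: string };

// Treats any object with a string `error` field as an error body.
function isExtractFailure(body: unknown): body is ExtractFailure {
  return (
    typeof body === "object" &&
    body !== null &&
    typeof (body as { error?: unknown }).error === "string"
  );
}

// Builds a one-line description of either kind of body.
function summarize(body: ExtractSuccess | ExtractFailure): string {
  if (isExtractFailure(body)) {
    return `error: ${body.error}`;
  }
  const confidence =
    body.ocr_confidence != null ? `, confidence ${body.ocr_confidence}` : "";
  return `${body.method}: ${body.pages_processed}/${body.total_pages} pages${confidence}`;
}
```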

---

## How it’s built (in this repo)

1. **Dependencies**
   - **pdfjs-dist** — Reads the text layer from a PDF buffer (typed PDFs); already in the project.
   - **pdf2pic** — Renders PDF pages to PNG/JPEG (already used for previews).
   - **tesseract.js** — OCR on images (works in Node; no separate Tesseract install).

2. **Library: `frontend/lib/textextractor.ts`**
   - `extractTyped(buffer)` — Uses `pdfjs-dist` to get text and page count.
   - `extractOcr(pdfPath, maxPages)` — Uses `pdf2pic` + Tesseract.js to OCR pages (writes PDF to a temp file because `pdf2pic` needs a path).
   - `extract(buffer, options)` — Tries typed first; if the average text per page is below a threshold (default 20 chars), runs OCR. Options: `maxOcrPages`, `minCharsPerPage`, `forceOcr`.

3. **API route: `frontend/app/api/textextractor/route.ts`**
   - Accepts `multipart/form-data` with `file` or `pdf`.
   - Validates type (PDF) and size (max 25 MB).
   - Calls `extract()` and returns the JSON above.

No external API keys are required for the built-in flow.
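
The typed-versus-OCR decision that `extract()` is described as making can be sketched as a small pure function. This is an illustration under stated assumptions, not the contents of `frontend/lib/textextractor.ts`: `shouldRunOcr` is a hypothetical name, and trimming whitespace before counting characters and treating a zero page count as "needs OCR" are guesses. The option names and the default of 20 characters per page come from the description above.

```typescript
// Illustrative only; the real logic in textextractor.ts may differ.
type ExtractDecisionOptions = { minCharsPerPage?: number; forceOcr?: boolean };

function shouldRunOcr(
  typedText: string,
  totalPages: number,
  options: ExtractDecisionOptions = {},
): boolean {
  const { minCharsPerPage = 20, forceOcr = false } = options;

  // Callers can skip the text layer entirely.
  if (forceOcr) return true;

  // Assumed: with no readable pages, the text layer is not usable.
  if (totalPages <= 0) return true;

  // Assumed: surrounding whitespace does not count as extracted text.
  const avgCharsPerPage = typedText.trim().length / totalPages;
  return avgCharsPerPage < minCharsPerPage;
}
```

Keeping the decision separate from the extraction calls makes the threshold easy to unit test without loading a PDF.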

---

## Optional: external APIs for better handwriting

The built-in pipeline uses **Tesseract.js** for OCR. It’s free and runs locally but handwriting quality can be mixed. If you need better results on handwritten notes, you can:

1. **Keep the current API** for typed PDFs and “good enough” handwritten/scanned text.
2. **Add an optional external OCR step** (e.g. only when `force_ocr=true` or when Tesseract confidence is low) that calls one of the services below and then merges its output with, or substitutes it for, the Tesseract text.

### Free / freemium options

| Service | Free tier / notes |
|----------------------|--------------------|
| **Google Cloud Vision** | $300 free credit; Document AI has document + handwriting models. |
| **OCR.space** | 25,000 requests/month free; Engine 3 supports handwriting; API key required. |
| **OCRAPI.cloud** | 250 requests/month free; supports handwritten + typed. |
| **Azure Document Intelligence** | Free tier (e.g. 500 pages/month); good for documents and handwriting. |

### Wiring an external OCR provider

1. Add an env var for the API key (e.g. `OCR_API_KEY`, `GOOGLE_VISION_KEY`).
2. In `lib/textextractor.ts` (or a separate `lib/ocr-external.ts`):
   - When you want to use the external API (e.g. `forceOcr` and key is set), render pages with `pdf2pic`, then send each image (or a multi-page PDF) to the provider’s REST API.
   - Map their response to your existing `ExtractResult` shape (e.g. set `method: "ocr"`, `text`, `pagesProcessed`, optional `ocrConfidence`).
3. In the API route, pass a flag or option so the extractor can choose “internal OCR vs external OCR” (e.g. query param `ocr=external` or env-based).

This way you keep one **TEXTEXTRACTOR** API; only the backend implementation of “OCR path” changes when the key is set.
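
One way to express the "internal OCR vs external OCR" choice is a small policy function. Everything here is hypothetical: the name `chooseOcrBackend`, the parameter names, and especially the confidence threshold of 60 are assumptions for illustration and are not defined anywhere in this repo.

```typescript
type OcrBackend = "internal" | "external";

// Hypothetical policy; the default threshold of 60 is an assumed value.
function chooseOcrBackend(params: {
  apiKey: string | undefined; // e.g. the value of an OCR_API_KEY env var
  forceOcr: boolean;
  internalConfidence?: number; // Tesseract average confidence, 0-100
  lowConfidenceThreshold?: number;
}): OcrBackend {
  const { apiKey, forceOcr, internalConfidence, lowConfidenceThreshold = 60 } = params;

  // Without a key, the built-in Tesseract.js path is the only option.
  if (!apiKey || apiKey.trim() === "") return "internal";

  // A caller that forces OCR and has a key configured gets the external provider.
  if (forceOcr) return "external";

  // Otherwise escalate only when the built-in result looks weak.
  if (internalConfidence != null && internalConfidence < lowConfidenceThreshold) {
    return "external";
  }
  return "internal";
}
```

With this shape, the route only needs to pass the env value and the request flags; the public response format stays the same whichever backend runs.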

---

## Limits and tips

- **Size:** Max 25 MB per PDF (configurable in `route.ts`).
- **OCR pages:** OCR runs on at most 10 pages by default; change this with `max_ocr_pages` (capped at 50). The cap keeps requests fast and limits cost if you later add a paid API.
- **Handwriting:** Tesseract.js is best on clear, printed text; handwriting support is limited. For heavy handwriting, use an external API as above.
- **Performance:** Typed extraction is quick; OCR is slower (especially first run while Tesseract.js loads language data). Consider a “processing” status in the UI for large or OCR-only requests.
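
The type and size limits could be checked before any extraction work starts. The sketch below is an assumption about how such a check might look, not the validation in `route.ts`: `checkPdfUpload`, the fallback to the `.pdf` extension, the 400/413 status codes, and reading "25 MB" as 25 × 1024 × 1024 bytes are all illustrative choices.

```typescript
// Assumes binary megabytes; the real constant in route.ts may be defined differently.
const MAX_PDF_BYTES = 25 * 1024 * 1024;

type UploadCheck = { ok: true } | { ok: false; status: number; error: string };

// Accepts anything with the `type`, `size`, and `name` fields of a browser `File`.
function checkPdfUpload(file: { type: string; size: number; name: string }): UploadCheck {
  // Assumed: accept either the PDF MIME type or a .pdf extension.
  const looksLikePdf =
    file.type === "application/pdf" || file.name.toLowerCase().endsWith(".pdf");
  if (!looksLikePdf) {
    return { ok: false, status: 400, error: "File must be a PDF." };
  }
  if (file.size > MAX_PDF_BYTES) {
    return { ok: false, status: 413, error: "PDF exceeds the 25 MB limit." };
  }
  return { ok: true };
}
```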

---

## Quick checklist

- [ ] Run `npm install` in `frontend` (adds `tesseract.js`; `pdfjs-dist` and `pdf2pic` are already present).
- [ ] Start dev server: `npm run dev` in `frontend`.
- [ ] Call `POST /api/textextractor` with a PDF in `file` or `pdf`.
- [ ] (Optional) Add auth to the route if the app is not public.
- [ ] (Optional) Integrate an external OCR API for better handwriting and set env keys.

You now have a single API that extracts text from both typed and handwritten/scanned PDF notes, with the option to plug in a stronger OCR service later.
101 changes: 101 additions & 0 deletions frontend/app/api/admin/approve-latest-resource/route.ts
@@ -0,0 +1,101 @@
import { NextResponse } from "next/server";
import { headers } from "next/headers";
import { createClient } from "@supabase/supabase-js";
import { createClient as createServerClient } from "@/utils/supabaseServerClient";

type ResourceRow = {
  id: string;
  title: string;
  status: string;
};

export async function POST() {
  const headerStore = await headers();
  const authHeader = headerStore.get("authorization");
  const bearerToken = authHeader?.toLowerCase().startsWith("bearer ")
    ? authHeader.split(" ")[1]?.trim()
    : null;

  const supabaseUrl = process.env.NEXT_PUBLIC_SUPABASE_URL;
  const supabaseServiceRoleKey = process.env.SUPABASE_SERVICE_ROLE_KEY;

  if (!supabaseUrl || !supabaseServiceRoleKey) {
    return NextResponse.json(
      { error: "Supabase environment variables are not configured." },
      { status: 500 },
    );
  }

  const supabase = await createServerClient(bearerToken);
  const {
    data: { user },
    error: userError,
  } = await supabase.auth.getUser();

  if (userError || !user) {
    return NextResponse.json({ error: "Not authenticated." }, { status: 401 });
  }

  const adminClient = createClient(supabaseUrl, supabaseServiceRoleKey);

  const { data: roles } = await adminClient
    .from("user_roles")
    .select("role")
    .eq("profile_id", user.id)
    .in("role", ["admin", "moderator", "developer"]);

  if (!roles || roles.length === 0) {
    return NextResponse.json(
      { error: "Only admins or moderators can approve notes." },
      { status: 403 },
    );
  }

  // Find the most recently created pending resource and approve it.
  const { data: rows, error: listError } = await adminClient
    .from("resources")
    .select("id, title, status")
    .eq("status", "pending")
    .order("created_at", { ascending: false })
    .limit(1)
    .returns<ResourceRow[]>();

  if (listError) {
    return NextResponse.json(
      { error: "Failed to list resources.", details: listError.message },
      { status: 500 },
    );
  }

  const resource = rows?.[0];
  if (!resource) {
    return NextResponse.json(
      { ok: false, message: "No pending resources to approve." },
      { status: 200 },
    );
  }

  const { error: updateError } = await adminClient
    .from("resources")
    .update({ status: "active" })
    .eq("id", resource.id);

  if (updateError) {
    return NextResponse.json(
      { error: "Failed to approve resource.", details: updateError.message },
      { status: 500 },
    );
  }

  return NextResponse.json(
    {
      ok: true,
      id: resource.id,
      title: resource.title,
      previousStatus: resource.status,
      newStatus: "active",
    },
    { status: 200 },
  );
}

21 changes: 21 additions & 0 deletions frontend/app/api/catalog/default-catalog-terms.ts
@@ -0,0 +1,21 @@
/**
 * Default catalog terms (2026-2028). Used when catalog_terms table is empty or unavailable.
 * Single source of truth for the backend; DB table overrides when populated.
 */
export type CatalogTermItem = {
  id: string;
  label: string;
  term: string;
  year: number;
};

export const DEFAULT_CATALOG_TERMS: CatalogTermItem[] = [
  { id: "default-fall-2026", label: "Fall 2026", term: "Fall", year: 2026 },
  { id: "default-winter-2027", label: "Winter 2027", term: "Winter", year: 2027 },
  { id: "default-spring-2027", label: "Spring 2027", term: "Spring", year: 2027 },
  { id: "default-summer-2027", label: "Summer 2027", term: "Summer", year: 2027 },
  { id: "default-fall-2027", label: "Fall 2027", term: "Fall", year: 2027 },
  { id: "default-winter-2028", label: "Winter 2028", term: "Winter", year: 2028 },
  { id: "default-spring-2028", label: "Spring 2028", term: "Spring", year: 2028 },
  { id: "default-summer-2028", label: "Summer 2028", term: "Summer", year: 2028 },
];
4 changes: 3 additions & 1 deletion frontend/app/api/catalog/terms/route.ts
@@ -1,6 +1,7 @@
 import { NextResponse } from "next/server";
 import { headers } from "next/headers";
 import { createClient } from "@/utils/supabaseServerClient";
+import { DEFAULT_CATALOG_TERMS } from "../default-catalog-terms";

 type TermRow = {
   id: string;
@@ -46,12 +47,13 @@ export async function GET() {
     return NextResponse.json({ error: error.message }, { status: 500 });
   }

-  const terms = (rows ?? []).map((t) => ({
+  const fromDb = (rows ?? []).map((t) => ({
     id: t.id,
     label: t.label,
     term: t.term,
     year: t.year,
   }));

+  const terms = fromDb.length > 0 ? fromDb : DEFAULT_CATALOG_TERMS;
   return NextResponse.json({ terms }, { status: 200 });
 }
3 changes: 1 addition & 2 deletions frontend/app/api/course-submissions/route.ts
@@ -101,8 +101,7 @@ export async function POST(request: Request) {
   const resendApiKey = process.env.RESEND_API_KEY?.trim();
   const fromEmail = process.env.COURSE_REQUEST_NOTIFY_FROM_EMAIL?.trim();

-  const hasEmailConfig = Boolean(notifyEmail && resendApiKey && fromEmail);
-  if (!hasEmailConfig) {
+  if (!notifyEmail || !resendApiKey || !fromEmail) {
     const missing: string[] = [];
     if (!notifyEmail) missing.push("COURSE_REQUEST_NOTIFY_EMAIL");
     if (!resendApiKey) missing.push("RESEND_API_KEY");
24 changes: 24 additions & 0 deletions frontend/app/api/credits/route.ts
@@ -48,10 +48,34 @@ export async function GET() {
     return NextResponse.json({ error: voucherError.message }, { status: 500 });
   }

+  // count uploads rewarded to this profile
+  const { count: uploadCount, error: uploadError } = await supabase
+    .from("credits_ledger")
+    .select("id", { count: "exact", head: true })
+    .eq("profile_id", user.id)
+    .eq("metadata->>reason", "upload_reward");
+
+  if (uploadError) {
+    return NextResponse.json({ error: uploadError.message }, { status: 500 });
+  }
+
+  // count upvotes cast by this profile
+  const { count: upvoteCount, error: upvoteError } = await supabase
+    .from("votes")
+    .select("id", { count: "exact", head: true })
+    .eq("profile_id", user.id)
+    .eq("value", 1);
+
+  if (upvoteError) {
+    return NextResponse.json({ error: upvoteError.message }, { status: 500 });
+  }
+
   return NextResponse.json(
     {
       credits: creditData?.credit_score ?? 0,
       freeDownloads: voucherCount ?? 0,
+      uploadCount: uploadCount ?? 0,
+      upvoteCount: upvoteCount ?? 0,
     },
     { status: 200 },
   );