Scribe

AI-powered cold emails for research positions, with real citations and anti-hallucination validation

Scribe Demo

Three years ago, when the GPT API was released, I wrote a scrappy Python script to help myself cold email professors. It worked well enough that friends started asking for it. One of them, Gurnoor, used it to land research positions at Harvard and Stanford. He now works for a Nobel Laureate at Berkeley. That script stayed the same until I nuked the codebase and rebuilt the entire thing from scratch.

Scribe is now a full platform. Upload your resume and it generates a template for you. Add placeholders like {{professor_most_recent_paper}} to tell the system exactly what to research. Submit up to 100 professors at once and watch the queue work through them.
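As an illustration, a template might look like the one below. The placeholder names other than {{professor_most_recent_paper}} are hypothetical, not necessarily ones Scribe defines:

```
Dear Professor {{professor_last_name}},

I'm an undergraduate studying CS, and I recently read
{{professor_most_recent_paper}}. I'd love to contribute to your
lab's work on {{professor_research_area}}.
```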

Over 100 students have used it to break into research labs.

Architecture

I'm genuinely proud of how the architecture came together. The system runs a 4-step AI pipeline that transforms a simple template into a personalized email. The key mechanism is the set of placeholders in the email template (like {{professor_most_recent_paper}}), which tell the system where to search and what information to extract. The four steps:

  1. Template Parser: Analyzes your template, extracts search terms, and classifies the type as RESEARCH, BOOK, or GENERAL. Each type triggers different pipeline behavior downstream.
  2. Web Scraper: Exa Search runs two queries, one for background (affiliations, bio) and one for publications. Playwright renders the pages, then a two-tier summarization system condenses everything without losing details.
  3. ArXiv Enricher: If the template is RESEARCH type, pulls the professor's actual papers and scores them for relevance. Skipped entirely for BOOK and GENERAL.
  4. Email Composer: Writes the final email, validated up to 3 times to catch hallucinations.
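The four steps can be sketched as a chain of functions passing one context object along. This is an illustrative sketch only; every name below is an assumption, not Scribe's actual code:

```python
import re
from dataclasses import dataclass, field

# Illustrative sketch of the 4-step pipeline. All names here are
# assumptions for explanation, not the actual Scribe implementation.

@dataclass
class PipelineContext:
    template: str
    placeholders: list = field(default_factory=list)
    template_type: str = "GENERAL"   # RESEARCH | BOOK | GENERAL
    facts: dict = field(default_factory=dict)
    email: str = ""

def parse_template(ctx):
    """Step 1: extract {{placeholder}} names and classify the template."""
    ctx.placeholders = re.findall(r"\{\{(\w+)\}\}", ctx.template)
    joined = " ".join(ctx.placeholders)
    if "paper" in joined or "research" in joined:
        ctx.template_type = "RESEARCH"
    elif "book" in joined:
        ctx.template_type = "BOOK"
    return ctx

def scrape_web(ctx):
    """Step 2: in the real system, Exa Search + Playwright fill these in."""
    for p in ctx.placeholders:
        ctx.facts[p] = f"<scraped value for {p}>"
    return ctx

def enrich_arxiv(ctx):
    """Step 3: pull and score real papers, but only for RESEARCH templates."""
    if ctx.template_type == "RESEARCH":
        ctx.facts["arxiv_papers"] = "<relevance-scored paper list>"
    return ctx

def compose_email(ctx):
    """Step 4: substitute gathered facts back into the template."""
    ctx.email = re.sub(r"\{\{(\w+)\}\}",
                       lambda m: ctx.facts[m.group(1)], ctx.template)
    return ctx

def run_pipeline(template):
    ctx = PipelineContext(template=template)
    for step in (parse_template, scrape_web, enrich_arxiv, compose_email):
        ctx = step(ctx)
    return ctx
```

The single context object is also what makes the stateless, in-memory design possible: nothing touches a database until the final email exists.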

The whole thing is stateless: all pipeline data lives in memory and hits the database only once, at the very end. Workers scale horizontally without fighting over database locks, and Logfire captures the entire execution trace for debugging.

FastAPI handles requests, which go into a database-backed queue and get picked up by Celery workers running the pipeline asynchronously. This prevents HTTP timeouts since generation takes 10-25 seconds, and because the queue lives in the database rather than memory, it survives worker restarts and tab closes.
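The database-backed queue can be sketched with stdlib sqlite3. This is an assumed design for illustration; Scribe's real queue lives in its main database and is drained by Celery workers:

```python
import json
import sqlite3
import uuid

# Sketch of a database-backed task queue (an assumed design, using
# sqlite3 here for illustration only).

class DBQueue:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            " id TEXT PRIMARY KEY,"
            " payload TEXT NOT NULL,"
            " status TEXT NOT NULL DEFAULT 'queued')"
        )

    def enqueue(self, payload):
        """HTTP side: persist the task and return immediately,
        so the request never waits out a 10-25 second generation."""
        task_id = str(uuid.uuid4())
        self.db.execute("INSERT INTO tasks (id, payload) VALUES (?, ?)",
                        (task_id, json.dumps(payload)))
        self.db.commit()
        return task_id

    def claim_next(self):
        """Worker side: take the oldest queued task (FIFO via rowid).
        A real multi-worker setup needs an atomic claim, e.g.
        UPDATE ... RETURNING or row-level locking."""
        row = self.db.execute(
            "SELECT id, payload FROM tasks WHERE status = 'queued' "
            "ORDER BY rowid LIMIT 1").fetchone()
        if row is None:
            return None
        self.db.execute("UPDATE tasks SET status = 'running' WHERE id = ?",
                        (row[0],))
        self.db.commit()
        return row[0], json.loads(row[1])

    def complete(self, task_id):
        self.db.execute("UPDATE tasks SET status = 'done' WHERE id = ?",
                        (task_id,))
        self.db.commit()
```

Because tasks are rows rather than in-memory objects, queued work survives a worker restart or a closed browser tab: a fresh worker simply reads the table again.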

The anti-hallucination system was the trickiest part. The scraper tags every extracted fact with a [PAGE X] marker tracking which source it came from. Facts that only appear on one page get flagged [UNCERTAIN]. A synthesis step then uses chain-of-thought reasoning to cross-verify before anything makes it into the email. Then the email composer runs validation loops that check whether the generated email actually references the professor's real work before saving.
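The tagging and validation-loop ideas can be sketched like this. Function names and structure are assumptions for illustration, not Scribe's internals:

```python
from collections import defaultdict

# Sketch of the source-tagging and validation-loop ideas (assumed
# structure, not the actual Scribe implementation).

def tag_facts(pages):
    """pages maps page number -> list of extracted fact strings.
    Each fact gets [PAGE X] markers for every page it appeared on;
    facts seen on only one page are flagged [UNCERTAIN]."""
    sources = defaultdict(set)
    for page_no, facts in pages.items():
        for fact in facts:
            sources[fact].add(page_no)
    tagged = []
    for fact, page_nos in sources.items():
        markers = " ".join(f"[PAGE {n}]" for n in sorted(page_nos))
        if len(page_nos) == 1:
            markers += " [UNCERTAIN]"
        tagged.append(f"{fact} {markers}")
    return tagged

def compose_with_validation(compose, validate, max_attempts=3):
    """Run the composer, re-prompting up to 3 times when validation
    fails; in Scribe the validator checks that the draft actually
    references the professor's real work before anything is saved."""
    for _ in range(max_attempts):
        email = compose()
        if validate(email):
            return email
    raise ValueError("email failed hallucination checks after retries")
```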

Deployment

The backend runs on a Raspberry Pi 3B+ at home, exposed through a Cloudflare Tunnel. The 1GB RAM constraint is why the pipeline is stateless and processes one task at a time.

If you're interested in how this was implemented, here's the repo: https://github.com/Mishra-Manit/raspberry-pi-hosting

Scribe is open source. If you're curious about how the AI pipeline works or want to contribute, check out the GitHub repository.

Try It

If you're a student looking to break into research, or know someone who is: scribe.manitmishra.com

Thanks to credit grants, Scribe is free. No paywalls. Built for students, by a student.