
Pipeline Diagram

The Indie Cini

Hosted on Render • Public URL • Code private

Data Collection

Python Ingestion

Selenium • BeautifulSoup • Requests

• Cinema venue pages
• Metacritic review pages
• Robust logging & validation

Automation

Render • Cron

• Scheduled daily runs
• Remote execution

Storage

Object Storage

AWS S3

• CSV / PKL intermediate outputs
• Select raw HTML archives

SQL Database

PostgreSQL (Render)

• Ingested relational tables
• Canonical current-state warehouse
• Append-only usage event storage

Transformation

Modeling Layer

dbt

• Staging, intermediate, marts
• Standardized venue + review data
• Join keys + review deduplication
• Frontend-facing marts

Frontend

Frontend UI

Next.js • React

• Screening calendar
• Review dashboard
• Filters & highlighting
• Fast, responsive UX

User Events

Usage Analytics

Next.js API route

• User interaction events
• Page views + outbound clicks
• Stored in analytics_events
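
The append-only pattern behind analytics_events can be illustrated with a small sketch. The diagram says events are written by a Next.js API route into PostgreSQL; the version below is only a Python stand-in that uses an in-memory-capable SQLite connection so it runs anywhere. The column names, the `ALLOWED_EVENT_TYPES` set, and the `record_event` helper are all assumptions made for illustration, not the project's actual schema.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: only the table name comes from the diagram.
SCHEMA = """
CREATE TABLE IF NOT EXISTS analytics_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    event_type TEXT NOT NULL,
    path TEXT NOT NULL,
    payload TEXT NOT NULL,
    created_at TEXT NOT NULL
)
"""

# Assumed event vocabulary, based on the bullets above.
ALLOWED_EVENT_TYPES = {"page_view", "outbound_click", "interaction"}


def record_event(conn, event_type, path, payload=None):
    """Append one event row; existing rows are never updated or deleted."""
    if event_type not in ALLOWED_EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    created_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO analytics_events (event_type, path, payload, created_at) "
            "VALUES (?, ?, ?, ?)",
            (event_type, path, json.dumps(payload or {}), created_at),
        )
    return cur.lastrowid
```

Because the helper only ever inserts, the table keeps a full history of interactions, and any aggregation (daily page views, click-through counts) can be derived downstream rather than maintained in place.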

Collection

• Venue + Metacritic scraping
• Rate limiting + polite crawling
• Defensive checks + structured logs
• Automated daily runs via Render cron job
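
As a rough illustration of the "rate limiting + polite crawling" item, the sketch below enforces a minimum delay between requests to the same host. The `PoliteThrottle` class, its `min_interval` default, and the injectable `clock` and `sleep` parameters are hypothetical names chosen for this example; the project's actual scraper may handle throttling differently.

```python
import time
from urllib.parse import urlparse


class PoliteThrottle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last_hit = {}  # host -> time of the most recent request

    def wait(self, url):
        """Block until it is polite to request `url`; return seconds slept."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_hit.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when the request is actually allowed to go out.
        self._last_hit[host] = now + delay
        return delay
```

A scraper would call `throttle.wait(url)` immediately before each HTTP request. Tracking the delay per host means venue pages and Metacritic pages are throttled independently, so one slow site does not hold up the other.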

Transformation

• dbt staging + intermediate + marts
• Surrogate key construction
• Review deduplication + aggregation
• dbt tests for integrity assumptions
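
The surrogate key and deduplication steps are implemented in dbt according to the diagram; the Python sketch below only illustrates the general idea. It hashes a normalized natural key and keeps the most recently scraped row per key. The field names (`film_title`, `critic`, `publication`, `scraped_at`) and both helper functions are assumptions for this example, not the project's actual columns or models.

```python
import hashlib


def surrogate_key(*parts):
    """Build a deterministic key by hashing normalized natural-key fields."""
    normalized = ["" if p is None else str(p).strip().lower() for p in parts]
    return hashlib.md5("-".join(normalized).encode("utf-8")).hexdigest()


def dedupe_reviews(reviews):
    """Keep the most recently scraped row per (film, critic, publication)."""
    latest = {}
    for row in reviews:
        key = surrogate_key(row["film_title"], row["critic"], row["publication"])
        kept = latest.get(key)
        # ISO 8601 timestamps in the same format compare correctly as strings.
        if kept is None or row["scraped_at"] > kept["scraped_at"]:
            latest[key] = {**row, "review_key": key}
    return list(latest.values())
```

Normalizing case and whitespace before hashing means trivially different spellings of the same review collapse to one key, which is the kind of assumption a uniqueness test on the key column would then verify.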

Frontend

• Calendar + review dashboard
• Dynamic filters + highlighting
• Responsive React interface
• Backend-driven data serving

Technologies & Tools

• Python: Scrapers & ingestion
• PostgreSQL: Warehouse
• dbt: Modeling layer
• Next.js: Frontend
• Render & S3: Cloud deployment