# CitedRank

> CitedRank is a free, no-signup web toolkit for marketers, SEO consultants, and AI engineers who need to understand any website quickly. It bundles six tools — sitemap discovery, single-page content extraction, on-page SEO audit, generative-engine-optimization (GEO) readiness check, design system extraction, and structured-field extraction — into one URL-paste interface. Built for the AI-search era: every tool outputs LLM-friendly Markdown or JSON.

## Platform overview

CitedRank runs every tool on the same backend (Crawl4AI + Scrapling).
The single-page tools (Crawl, SEO, Clone) share a unified page cache, so
running any one of them on a URL lets the others respond in under 200 ms
for that URL. The same functionality is available through the REST API and the UI.

## Tools

### Sitemap — https://citedrank.co/sitemap-extractor

The free sitemap extractor that pulls every URL from any XML sitemap — even the hidden paths your sitemap.xml viewer misses.

**Output:** URL list.  
**Scope:** whole domain.

### Crawl — https://citedrank.co/crawl

Scrape any webpage to clean Markdown — for RAG, AI agents, and offline archiving. Plus image URLs and MHTML snapshots in one paste.

**Output:** content (text + images).  
**Scope:** single URL.

### SEO — https://citedrank.co/seo-checker

The free SEO checker that extracts every SEO signal from any page — title, meta, schema, headings, Open Graph, hreflang, link graph, media, content, tech.

**Output:** audit (title / meta / schema / links).  
**Scope:** single URL.

### GEO Audit — https://citedrank.co/geo-audit

The free GEO audit that scores your AI search visibility — for ChatGPT, Perplexity, Gemini, and Google AI Overviews, before competitors steal the citations.

**Output:** GEO audit report.  
**Scope:** single URL.

### Clone — https://citedrank.co/clone

Free design system extractor — pull colors, fonts, CSS variables, and Tailwind config from any website in 15 seconds.

**Output:** design system (colors / fonts / CSS).  
**Scope:** single URL.

### Extract — https://citedrank.co/extract

The free AI web scraper that turns any URL into structured JSON — pick a template or write CSS selectors, no code.

**Output:** structured JSON (rows + fields).  
**Scope:** single URL.

## Use cases

- **Competitive content audit** — feed a competitor URL to Crawl + SEO, get their content as Markdown plus their meta-tag strategy in one pass.
- **AI search visibility check (GEO)** — see whether ChatGPT, Perplexity, or Google AI Overviews can extract clean facts from a page; identify missing JSON-LD, FAQ schema, definition-first paragraphs.
- **RAG ingestion** — Crawl a marketing or docs site to clean Markdown for vector-database ingestion. MHTML offline snapshots include images for multimodal RAG.
- **Design system extraction** — Clone any production site to its core colors, fonts, and CSS variables; export as Tailwind config or CSS variables.
- **Structured data scraping** — Extract product cards, search results, or Reddit threads as JSON via CSS selectors. Built-in templates for Hacker News, GitHub Trending, Product Hunt, Reddit, generic blogs.
- **Sitemap discovery for migrations** — find every URL on a legacy site before a redesign so no page is lost to 404s.
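
For the migration use case, the comparison step can be sketched as below. It assumes you already have the legacy and new URL lists (for example, exported from the Sitemap tool). The helper names `normalize_path` and `missing_after_migration` and the normalization rules are illustrative assumptions, not part of CitedRank.

```python
from urllib.parse import urlsplit


def normalize_path(url: str) -> str:
    """Reduce a URL to a comparable path: lowercase, no trailing slash.

    Lowercasing is an assumption; skip it if the site has case-sensitive paths.
    """
    path = urlsplit(url).path.lower().rstrip("/")
    return path or "/"


def missing_after_migration(legacy_urls, new_urls):
    """Return legacy paths with no counterpart on the new site, sorted."""
    new_paths = {normalize_path(u) for u in new_urls}
    legacy_paths = {normalize_path(u) for u in legacy_urls}
    return sorted(legacy_paths - new_paths)


# Hypothetical sample data; replace with real sitemap exports.
legacy = [
    "https://old.example.com/",
    "https://old.example.com/pricing/",
    "https://old.example.com/blog/launch",
]
new = [
    "https://www.example.com/",
    "https://www.example.com/pricing",
]
print(missing_after_migration(legacy, new))  # ['/blog/launch']
```

Each path in the result is a candidate for a redirect rule or a restored page.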

## Pricing

CitedRank is currently free during development. No signup required for sitemap discovery; the other tools accept a free API token retrieved via Google sign-in.

## FAQ

### What is GEO (Generative Engine Optimization)?

GEO is the practice of structuring web content so AI search engines — ChatGPT, Claude, Perplexity, Google AI Overviews, Bing Copilot — can extract, summarize, and cite it accurately. It overlaps SEO but emphasizes machine-readable signals: JSON-LD schema (FAQPage, HowTo, Article), llms.txt presence, definition-first paragraphs, numbered HowTo steps, and tabular data over prose.
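
As an illustration of one machine-readable signal named above, here is a minimal sketch that builds a schema.org `FAQPage` JSON-LD object. The helper name `faq_jsonld` and the sample question are illustrative; only the schema.org types and properties are standard.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


doc = faq_jsonld([
    (
        "What is llms.txt?",
        "A Markdown file at /llms.txt that gives LLM crawlers context about a site.",
    ),
])
# Embed the printed JSON in the page inside <script type="application/ld+json">.
print(json.dumps(doc, indent=2))
```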

### What is the difference between Crawl and SEO in CitedRank?

Crawl returns the **page's content** (clean Markdown body plus image URLs) for AI ingestion. SEO returns an **audit report** (title length, meta description, heading hierarchy, schema types, link graph) for diagnosis. They share a backend cache, so running either one on a URL lets the other answer from cache for that URL.

### What is the difference between Crawl and Extract?

Crawl gives you the **whole page as Markdown** (one large text blob). Extract gives you **structured JSON rows** by applying CSS selectors to repeating items — e.g., 30 product cards as `[{title, price, rating}, ...]`. Use Crawl for content; Extract for tabular data.
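
To make the "selectors over repeating items" idea concrete, here is a local sketch using only the Python standard library. It is not CitedRank's implementation: it swaps CSS selectors for ElementTree's limited XPath subset, requires well-formed markup, and the sample HTML and field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup with two repeating items.
HTML = """
<ul>
  <li class="card"><span class="title">Lamp</span><span class="price">19.00</span></li>
  <li class="card"><span class="title">Desk</span><span class="price">129.00</span></li>
</ul>
"""

ITEM_PATH = ".//li[@class='card']"  # matches each repeating item
FIELD_PATHS = {                     # evaluated relative to each item
    "title": "span[@class='title']",
    "price": "span[@class='price']",
}


def extract_rows(markup, item_path, field_paths):
    """Return one dict per matched item; missing fields become None."""
    root = ET.fromstring(markup.strip())
    rows = []
    for item in root.findall(item_path):
        row = {}
        for name, path in field_paths.items():
            node = item.find(path)
            row[name] = node.text if node is not None else None
        rows.append(row)
    return rows


rows = extract_rows(HTML, ITEM_PATH, FIELD_PATHS)
print(rows)
# [{'title': 'Lamp', 'price': '19.00'}, {'title': 'Desk', 'price': '129.00'}]
```

One item selector plus a set of per-field selectors yields one row per item, which is the row shape described for Extract.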

### What is llms.txt?

llms.txt is an emerging convention (https://llmstxt.org) — a Markdown file at `/llms.txt` that gives LLM crawlers structured context about a site. CitedRank's GEO tool flags missing llms.txt as a fixable issue.
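
A minimal sketch of such a presence check, using only the standard library. The function names are illustrative, and this is an assumption about how a check could work, not a description of how the GEO tool does it.

```python
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url: str) -> str:
    """Return the root-level /llms.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


def has_llms_txt(page_url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether /llms.txt answers with 200.

    Some servers reject HEAD; fall back to GET if that matters for your case.
    """
    request = urllib.request.Request(llms_txt_url(page_url), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


print(llms_txt_url("https://example.com/blog/post?id=1"))
# https://example.com/llms.txt
```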

### Can I use CitedRank from Claude Code, Cursor, or Claude Desktop?

Yes. Three ways:

1. **MCP server** at `https://citedrank.co/mcp` (Streamable HTTP transport). Add it to your Cursor / Continue / Cline config and the agent gets five tools (sitemap, crawl, seo, clone, extract) registered automatically. For Claude Desktop, wrap it with `npx mcp-remote`.
2. **Claude Code skill** at `https://citedrank.co/skills/citedrank/SKILL.md` — single-file integration spec.
3. **Raw REST API** — see the API section below.
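
For the Claude Desktop route, a hedged sketch of what the `mcp-remote` wrapper entry might look like, generated here from Python so it can be pasted into a config file. It assumes the `mcpServers` layout commonly used in Claude Desktop's config; the server label `citedrank` is arbitrary, and you should confirm the exact file location and schema against your client's documentation.

```python
import json

MCP_URL = "https://citedrank.co/mcp"

# Assumed config shape: a stdio command that proxies to the remote MCP server.
claude_desktop_entry = {
    "mcpServers": {
        "citedrank": {  # label is arbitrary
            "command": "npx",
            "args": ["mcp-remote", MCP_URL],
        }
    }
}

print(json.dumps(claude_desktop_entry, indent=2))
```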

## API

- POST `https://citedrank.co/api/sitemap?domain=X` — discover URLs
- POST `https://citedrank.co/api/fetch_one` body `{url, refresh?}` — Crawl single page
- POST `https://citedrank.co/api/seo` body `{url, refresh?}` — SEO audit
- POST `https://citedrank.co/api/clone` body `{url, refresh?}` — design system extract
- POST `https://citedrank.co/api/extract` body `{url, item_selector, fields[]}` — structured fields
- All endpoints require an `Authorization: Bearer <token>` header. The token is free and issued after sign-in.
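
A minimal sketch of calling these endpoints from Python with the standard library. The endpoint paths and the Bearer header come from the list above; the JSON request encoding, the `Content-Type` header, the boolean type of `refresh`, and a JSON response are assumptions. `YOUR_TOKEN` is a placeholder.

```python
import json
import urllib.request

API_BASE = "https://citedrank.co/api"


def build_request(endpoint: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request with a JSON body."""
    return urllib.request.Request(
        f"{API_BASE}/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed encoding
        },
    )


def call(endpoint: str, token: str, body: dict, timeout: float = 30.0):
    """Send the request and decode the response, assuming it is JSON."""
    request = build_request(endpoint, token, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Build an SEO audit request without sending it.
req = build_request("seo", "YOUR_TOKEN", {"url": "https://example.com", "refresh": False})
print(req.get_method(), req.full_url)
# POST https://citedrank.co/api/seo
```

To actually run an audit, call `call("seo", token, {"url": "https://example.com"})` with a real token. The same helper works for `fetch_one` and `clone`, which take the same body.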

## Links

- Home: https://citedrank.co
- MCP server: https://citedrank.co/mcp
- Claude Code skill: https://citedrank.co/skills/citedrank/SKILL.md
- Sitemap: https://citedrank.co/sitemap.xml
- robots.txt: https://citedrank.co/robots.txt

_Last updated: 2026-05-20._