Build a scraper from the terminal in three commands: bdata login, then bdata scraper create and bdata scraper run. Run the CLI on demand with npx, so there is nothing to install. This tutorial walks you through building a Hacker News top-stories scraper end to end. The CLI runs unchanged inside the embedded terminal of any coding agent, such as Claude Code, Cursor or Codex.
Time to complete: about 10 minutes (AI generation runs in the background)
Prerequisites
- A Bright Data account (sign up free, no card required)
- A terminal. Any embedded terminal works too: Claude Code, Cursor, Codex, VS Code
Run the Bright Data CLI with npx
You do not install anything. Run the CLI on demand with npx, which fetches the latest version each time:
bdata for your shell session. Otherwise, prefix any bdata command below with npx -p @brightdata/cli:
@brightdata/cli on npm. The bdata and brightdata commands are interchangeable. Prefer a permanent command instead of npx? Install it globally with npm install -g @brightdata/cli.
Build your first scraper from the terminal
Log in
Run bdata login. The CLI opens a browser tab so you can authorize it against your Bright Data account, then stores your API key locally. You do not paste or copy a key.
Expected result:
The two zones (cli_unlocker and cli_browser) are the Web Unlocker API and Browser API endpoints the CLI uses when running scrapers. Bright Data creates them automatically on first login.
Create the scraper
Pass a target URL and one sentence describing the data you want. Bright Data’s AI Agent generates the output schema, writes the scraper code and returns a Collector ID.
The AI pipeline runs in eight stages, printed live: user_intent_analyzer, planner, collector_maintainer, output_schema_generator, code_generator, input_schema_generator, preview_runner and preview_picker. Typical wall-clock time is 5 to 15 minutes; complex targets can take up to 25 minutes.
Expected result:
Save the Collector ID (the c_* string). It is the stable handle for every subsequent run, schedule or API call on this scraper.
Run the scraper
Pass the Collector ID and a URL. Use --pretty to format the JSON output.
The CLI tries realtime mode first. If the scraper triggers more pages than the realtime limit allows, the CLI automatically falls back to batch mode (POST /dca/trigger then poll GET /dca/dataset) and continues. No flag needed.
Expected result: a JSON array, one row per result.
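The realtime-then-batch behavior described above can be sketched in a few lines of Python. This is an illustration of the control flow only, not the CLI's actual implementation: the RealtimePageLimitExceeded exception and the run_realtime and run_batch callables are hypothetical stand-ins for whatever issues the real HTTP calls.

```python
from typing import Callable, Dict, List

Rows = List[Dict]


class RealtimePageLimitExceeded(Exception):
    """Hypothetical signal that a run hit the realtime page-load cap."""


def run_with_fallback(
    run_realtime: Callable[[], Rows],
    run_batch: Callable[[], Rows],
    log: Callable[[str], None] = print,
) -> Rows:
    """Try realtime mode first; if the page limit is hit, rerun in batch mode."""
    try:
        return run_realtime()
    except RealtimePageLimitExceeded:
        # Same inputs, different transport: the caller sees the same row shape.
        log("Realtime page limit exceeded, switching to batch mode...")
        return run_batch()
```

Because both paths return the same list of rows, code that consumes the result does not need to know which mode produced it.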
How do I use this from Claude Code, Cursor or Codex?
The Bright Data CLI runs inside any embedded terminal as-is. The coding agent is not building the scraper itself; the CLI calls Bright Data’s AI Agent, and the coding agent calls the CLI on your behalf. Two integrations make the CLI feel native inside a coding agent:
Pin the Collector ID in the agent’s rules file (CLAUDE.md or CODEX.md) so the agent re-uses your scraper across sessions instead of building a fresh one every time.
brightdata add mcp. The MCP server is separate from the Scraper Studio CLI but gives the agent additional scraping tools (scrape_as_markdown, search_engine and others) it can call directly:
What just happened?
Three CLI commands mapped to four Bright Data Scraper Studio API endpoints. Use this table to translate the CLI flow into raw HTTP calls when you are ready to integrate without the CLI:
| You ran | Bright Data API endpoint behind it |
|---|---|
| bdata login | Local credential store. Stores the API key from Account Settings. |
| bdata scraper create | POST /dca/collector then POST /dca/collectors/{c_*}/automate_template |
| bdata scraper run (small input) | POST /dca/trigger_immediate then GET /dca/get_result |
| bdata scraper run (large input) | POST /dca/trigger then poll GET /dca/dataset?id=j_* |
How do I fix a scraper when the site changes?
When a target site is redesigned and a scraper starts returning null or missing fields, fix it in place with bdata scraper heal. Self-healing keeps the same Collector ID, so every trigger, schedule and integration that references the scraper keeps working. The flow is run, inspect, heal, approve, re-run.
Heal the scraper
Pass the Collector ID and a plain-language description of what broke. Keep the prompt under 1,000 characters.
By default, heal stops at an approval gate so you can review the proposed fix before it goes live.
Expected result: the command returns an envelope with status: "awaiting_approval" and a preview_result showing sample output from the proposed fix, plus a next_step hint with the command to run next.
Approve or reject the fix
Review the preview. If it looks right, approve it. The fix commits to the existing scraper and the Collector ID does not change.
To discard the proposed fix and try a sharper prompt instead, reject it.
Expected result: on approval, status advances to done. On reject, the scraper is left unchanged so you can run heal again with a clearer prompt.
Frequently asked questions
Why did `bdata scraper create` take longer than 10 minutes?
AI generation timing depends on target complexity. Simple single-page scrapers finish in 5 to 10 minutes. Pages with lazy-load, pagination or anti-bot challenges can take 15 to 25 minutes. The CLI polls Bright Data’s AI Flow API every five seconds and prints the current stage, so you can leave it running and check back. No action is needed while you wait.
Why did the CLI switch from realtime to batch mode mid-run?
Realtime mode caps the number of page loads per request. When a scraper triggers more pages than the realtime limit allows, the CLI prints "Realtime page limit exceeded, switching to batch mode...", submits the same inputs to POST /dca/trigger, and polls GET /dca/dataset?id=j_* until the snapshot is ready. The switch is automatic and the final JSON shape is identical. See Scraper Studio specifications for the page-load limits.
Why are some rows missing fields like `points` or `comment_count`?
The AI Agent’s generated schema is per-row best-effort, not strict. Job posts, “Show HN” entries and very new submissions on Hacker News do not always have a points value or comment count yet, so the scraper returns the row with those fields omitted rather than inventing a value. Treat missing fields as null in your own code. To enforce a stricter schema, open the scraper in Scraper Studio or rewrite the schema with the Self-Healing tool.
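A minimal sketch of treating omitted fields as null on the consuming side, in Python. The field names points and comment_count come from the question above; the function name and the sample rows are made up for illustration.

```python
from typing import Any, Dict, Iterable, List, Sequence

# Fields the scraper may omit on some rows (names taken from this FAQ entry).
OPTIONAL_FIELDS = ("points", "comment_count")


def normalize_rows(
    rows: Iterable[Dict[str, Any]],
    optional_fields: Sequence[str] = OPTIONAL_FIELDS,
) -> List[Dict[str, Any]]:
    """Return copies of the rows with every optional field present, defaulting to None."""
    defaults = {field: None for field in optional_fields}
    # Values already present in a row win over the None defaults.
    return [{**defaults, **row} for row in rows]
```

Downstream code can then read row["points"] on every row without checking for the key first, and a value of None still distinguishes "not available yet" from zero.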
Can I trigger this scraper from my own code instead of the CLI?
Yes. The Collector ID returned by bdata scraper create (the c_* string) is the same handle the Bright Data Scraper Studio API uses. Pass it to POST /dca/trigger from any HTTP client. See the Bright Data Scraper Studio API quickstart for cURL, Python and Node.js examples.
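As a rough sketch of what such a call could look like with Python's standard library, the snippet below builds the request without sending it. Only the POST /dca/trigger path and the Collector ID come from this page. The base URL, the collector query parameter, the bearer-token header and the JSON array body are assumptions here, so confirm them against the API quickstart before relying on them.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

# Assumption: API host. Check the Scraper Studio API quickstart for the real value.
API_BASE = "https://api.brightdata.com"


def build_trigger_request(
    collector_id: str, inputs: List[Dict], api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a POST /dca/trigger request for one scraper."""
    # Assumption: the Collector ID travels as a `collector` query parameter.
    query = urllib.parse.urlencode({"collector": collector_id})
    return urllib.request.Request(
        url=f"{API_BASE}/dca/trigger?{query}",
        # Assumption: inputs are posted as a JSON array of objects.
        data=json.dumps(inputs).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: bearer auth
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Keeping construction separate from sending makes the request easy to inspect or log first.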
How do I fix the scraper when the target site changes?
Fix it in place with bdata scraper heal, which keeps the same Collector ID. See How do I fix a scraper when the site changes? above for the full run, heal, approve, re-run flow. Two alternatives:
- Control panel: use the Self-Healing tool to describe the fix in plain language.
- Direct API: the heal and approve commands wrap a three-call loop. POST /dca/collectors/{c_*}/refactor_template with the prompt, poll GET .../refactor_template/progress until status is pending_answer, then POST .../resume_automation_job to approve or reject. See Trigger Self-Healing and Resume Self-Healing Job. For a Node.js implementation, see the Scraper Studio Self-Healing demo.
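The three-call loop can be outlined in Python as follows. To avoid guessing at URLs or payload fields, the three HTTP calls are passed in as callables (start_heal, get_progress and resume are hypothetical names), and the five-second poll interval and the poll cap are assumptions rather than documented values.

```python
import time
from typing import Any, Callable, Dict


def heal_loop(
    start_heal: Callable[[str], Any],          # wraps the refactor_template call
    get_progress: Callable[[], Dict],          # wraps the progress call
    resume: Callable[[bool], Dict],            # wraps the resume_automation_job call
    prompt: str,
    approve: bool,
    poll_interval: float = 5.0,                # assumption, not a documented value
    max_polls: int = 120,                      # assumption: give up eventually
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Start a heal, wait for the approval gate, then approve or reject."""
    if len(prompt) >= 1000:
        raise ValueError("Keep the heal prompt under 1,000 characters.")
    start_heal(prompt)
    for _ in range(max_polls):
        if get_progress().get("status") == "pending_answer":
            # The proposed fix is ready for review: resolve it either way.
            return resume(approve)
        sleep(poll_interval)
    raise TimeoutError("Heal job never reached pending_answer.")
```

Injecting the calls keeps the loop testable without network access and leaves the exact request details to the API reference pages named above.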
Does `bdata login` work without a browser, for example in CI?
The bdata login command requires a browser callback. For headless environments, export your API key as BRIGHTDATA_API_KEY and the CLI uses it directly without a login step. Copy the key from Account Settings.
Related
Build with the AI Agent
Build the same scraper from the Bright Data control panel instead of the terminal
Scraper Studio API quickstart
Trigger an existing scraper from cURL, Python or Node.js
Self-Healing tool
Fix a scraper with a plain-language prompt when a target site changes
Bright Data CLI overview
Every bdata command, with examples