Prerequisites
- Basic JavaScript familiarity (variables, functions, async control flow)
- An active Bright Data account
Basic scraper flow
A Scraper Studio IDE scraper usually follows this flow:
- The scraper receives an input, such as a URL, keyword, location, or custom value.
- Interaction code uses that input to open a page, send a request, click, scroll, paginate, or create the next stage.
- Parser code extracts structured data from the loaded page or response.
- The scraper saves records to the output dataset with collect() or set_lines().
- The final output is delivered according to the scraper’s delivery preferences.
Inputs
Inputs are the values passed into a scraper run. A scraper might use:
- url: for page-based scraping
- keyword: for search or discovery flows
- location: for location-based searches
- country, date, category, or any other custom field
Interaction code reads these values from the input object, for example input.url.
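The sketch below shows how interaction code might use different inputs to decide where a run starts. It is a plain JavaScript illustration, not Scraper Studio code. The input objects, the buildStartUrl() helper, and the example.com search URL with its q and loc parameters are all hypothetical. In the IDE, the platform passes each run its own input object.

```javascript
// Hypothetical inputs for two runs. In Scraper Studio the platform passes
// one such object to each run as `input`.
const pageInput = { url: 'https://www.example.com/product/123' };
const searchInput = { keyword: 'running shoes', location: 'Berlin' };

// Hypothetical helper: decide which page a run should open first.
// Page-based runs use input.url directly; discovery runs build a search URL
// from input.keyword and the optional input.location.
function buildStartUrl(input) {
  if (input.url) return input.url;
  if (input.keyword) {
    const params = new URLSearchParams({ q: input.keyword });
    if (input.location) params.set('loc', input.location);
    return `https://www.example.com/search?${params}`;
  }
  throw new Error('Input must include url or keyword');
}

console.log(buildStartUrl(pageInput));
console.log(buildStartUrl(searchInput));
```

Keeping page-based (url) and discovery (keyword) inputs on separate paths makes it easy to see which flow a given run took.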
Interaction code
Interaction code controls how the scraper reaches the data. It can:
- Navigate to a page
- Send HTTP requests
- Wait for page elements
- Click buttons
- Type into forms
- Scroll through a page
- Handle pagination
- Create additional crawl stages
In a typical page-based interaction script:
- navigate(input.url) opens the page from the input.
- wait('.product-title') waits until the expected page element appears.
- parse() runs the parser code.
- collect(data) saves the extracted record.
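Here is a minimal local simulation of that four-step script. navigate(), wait(), parse(), and collect() are the Scraper Studio commands named in this section. The versions here are hypothetical stubs over an in-memory fakeSite object, so the example runs in plain Node.js. The real commands act on a live page or response, and their behavior (including errors and timeouts) may differ.

```javascript
// Hypothetical stand-ins so this sketch runs outside Scraper Studio.
// In the IDE these commands are provided by the platform.
const fakeSite = {
  'https://www.example.com/product/123': {
    selectors: ['.product-title', '.price'],
    fields: { title: 'Trail Runner 2', price: '89.99' },
  },
};
const output = []; // stands in for the output dataset
let currentPage = null;

function navigate(url) {
  currentPage = fakeSite[url];
  if (!currentPage) throw new Error(`Navigation failed: ${url}`);
}

function wait(selector) {
  // A real wait() waits on a live page; this stub only checks a static list.
  if (!currentPage.selectors.includes(selector)) {
    throw new Error(`Element never appeared: ${selector}`);
  }
}

function parse() {
  // Stands in for running the parser code on the current page.
  return { ...currentPage.fields };
}

function collect(record) {
  output.push(record);
}

// Interaction code: the part you would write in the IDE.
function runInteraction(input) {
  navigate(input.url);    // open the page from the input
  wait('.product-title'); // wait for the expected element
  const data = parse();   // run the parser code
  collect(data);          // save the extracted record
}

runInteraction({ url: 'https://www.example.com/product/123' });
console.log(output);
```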
Parser code
Parser code extracts structured fields from the page HTML or response. It commonly uses Cheerio, a jQuery-like API, to read page elements.
parse() and collect()
parse() and collect() connect interaction code, parser code, and final output.
parse()
parse() runs the parser code for the current page or response and returns its result.
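The sketch below pairs a small parser with a stand-in parse(). The parser reads elements through a jQuery-style $, as Cheerio-based parser code does. loadFakeCheerio() is a hypothetical stand-in that supports only .text() and .attr(). The selectors (.product-title, .price, img.main) are assumed for an imaginary product page. In Scraper Studio, $ is provided for you and parse() is the platform's own command.

```javascript
// Hypothetical stand-in for Cheerio's $: supports only .text() and .attr(),
// enough for this sketch to run without dependencies.
function loadFakeCheerio(elements) {
  return (selector) => {
    const el = elements[selector] || { text: '', attrs: {} };
    return { text: () => el.text, attr: (name) => el.attrs[name] };
  };
}

// Parser code: jQuery-style selectors in, one structured object out.
// (Wrapped in a function here so it can run on its own.)
function parserCode($) {
  const priceText = $('.price').text().replace(/[^0-9.]/g, '');
  return {
    title: $('.product-title').text().trim(),
    price: priceText ? Number(priceText) : null, // null when no price was found
    image: $('img.main').attr('src'),
  };
}

// Hypothetical parse(): runs the parser code against the current page and
// returns its result to the interaction code.
const currentPage = {
  '.product-title': { text: '  Trail Runner 2 \n', attrs: {} },
  '.price': { text: '$89.99', attrs: {} },
  'img.main': { text: '', attrs: { src: '/img/123.jpg' } },
};
function parse() {
  return parserCode(loadFakeCheerio(currentPage));
}

const data = parse();
console.log(data.title, data.price); // Trail Runner 2 89.99
```

The parser returns null for a missing price rather than Number(''), which evaluates to 0. This keeps a parser miss distinguishable from a real zero price.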
collect()
collect() appends one record to the output dataset.
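Because collect() appends one record per call, a list page that yields several items usually calls it once per item. In this sketch, collect() is a hypothetical stub that pushes into a local array. The parsed.items shape is an assumption about what the parser code returns.

```javascript
// Hypothetical stub: collect() appends exactly one record per call.
const outputDataset = [];
function collect(record) {
  outputDataset.push(record);
}

// Assumed parser result for a list page: an array of items.
const parsed = {
  items: [
    { title: 'Trail Runner 2', price: 89.99 },
    { title: 'Road Racer', price: 120 },
    { title: 'Court Classic', price: 64.5 },
  ],
};

// One collect() per item gives one output record per product,
// instead of a single record that holds the whole array.
for (const item of parsed.items) {
  collect(item);
}

console.log(outputDataset.length); // 3
```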
Multi-stage scrapers
Some scrapers need more than one step to reach the final data. For example:
- Start from a search page.
- Discover result pages.
- Open each result page.
- Extract final details.
next_stage() sends new input values to the next stage.
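Here is a local simulation of a two-stage scraper. A search stage sends one input per result URL with next_stage(), and a detail stage reads input.url and collects a record. next_stage(), collect(), the stage queues, and the hard-coded result URLs are hypothetical stand-ins. In Scraper Studio, the platform turns each next_stage() input into a child crawl.

```javascript
// Hypothetical stand-ins: next_stage() queues an input object for the next
// stage, and collect() saves a record.
const stageQueues = [[], []]; // inputs waiting for stage 1 and stage 2
const records = [];
let currentStage = 0;
const next_stage = (nextInput) => stageQueues[currentStage + 1].push(nextInput);
const collect = (record) => records.push(record);

// Stage 1: search page. The result URLs are hard-coded here; a real scraper
// would get them from parse().
function searchStage(input) {
  const resultUrls = ['/p/1', '/p/2'].map((p) => `https://www.example.com${p}`);
  for (const url of resultUrls) next_stage({ url, keyword: input.keyword });
}

// Stage 2: detail page. It reads the url that stage 1 sent.
function detailStage(input) {
  collect({ url: input.url, keyword: input.keyword });
}

const stages = [searchStage, detailStage];
stageQueues[0].push({ keyword: 'running shoes' });
for (currentStage = 0; currentStage < stages.length; currentStage++) {
  for (const input of stageQueues[currentStage]) stages[currentStage](input);
}
console.log(records);
```

Passing keyword along with url is a design choice. It keeps the search term available in the detail records.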
The next stage receives these values as its input, for example a url input. Common multi-stage flows include:
- Search results → detail pages
- Category pages → product pages
- Listing pages → profile pages
- Pagination flows
- Discovery workflows where one page creates many child pages
Parent and child crawls
When a scraper uses next_stage(), it creates a parent-child relationship between crawls.
For example:
- The search page is the parent crawl.
- Each result page created from it is a child crawl.
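One way to picture the parent-child relationship is as a tree keyed by the crawl that called next_stage(). The crawl log below, including its id and parentId fields, is hypothetical. It only illustrates the relationship and is not Scraper Studio's actual data format.

```javascript
// Hypothetical crawl log: each entry records which crawl created it.
const crawls = [
  { id: 'c1', parentId: null, url: 'https://www.example.com/search?q=shoes' },
  { id: 'c2', parentId: 'c1', url: 'https://www.example.com/p/1' },
  { id: 'c3', parentId: 'c1', url: 'https://www.example.com/p/2' },
  { id: 'c4', parentId: 'c1', url: 'https://www.example.com/p/3' },
];

// Group child crawls under the parent crawl that created them.
function childrenByParent(log) {
  const tree = new Map();
  for (const crawl of log) {
    if (crawl.parentId === null) continue; // the parent (root) crawl itself
    if (!tree.has(crawl.parentId)) tree.set(crawl.parentId, []);
    tree.get(crawl.parentId).push(crawl.url);
  }
  return tree;
}

const tree = childrenByParent(crawls);
console.log(tree.get('c1').length); // 3
```

Counting children per parent is a quick way to confirm that a search page produced as many child crawls as it had results.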
Code workers and Browser workers
Scraper Studio supports two worker types.
Code worker
A Code worker uses HTTP requests and raw responses. Use a Code worker when:
- The data is available in the raw HTML
- The data is available from a public JSON endpoint
- The page does not require browser interaction
- You want faster and more cost-efficient scraping
Browser worker
A Browser worker uses a real headless browser. Use a Browser worker when:
- The page renders data with JavaScript
- You need to click, scroll, type, or interact with the page
- You need to wait for elements to appear
- You need to capture browser network traffic
- The site requires browser-like behavior
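The two checklists can be condensed into a small decision function. chooseWorker() and its requirement flags are hypothetical and only restate the lists above. Scraper Studio does not provide such a function.

```javascript
// Hypothetical helper: any Browser worker requirement wins; otherwise the
// faster, more cost-efficient Code worker is enough.
function chooseWorker(page) {
  const needsBrowser =
    page.rendersWithJavaScript ||
    page.needsInteraction || // click, scroll, or type
    page.waitsForElements ||
    page.needsNetworkCapture ||
    page.requiresBrowserBehavior;
  return needsBrowser ? 'Browser worker' : 'Code worker';
}

// Data is in the raw HTML or a public JSON endpoint.
console.log(chooseWorker({})); // Code worker
// Data only appears after client-side rendering.
console.log(chooseWorker({ rendersWithJavaScript: true })); // Browser worker
```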
How does Scraper Studio handle blocking and CAPTCHAs?
Scraping at scale can trigger site defenses such as:
- IP blocking
- Rate limits
- CAPTCHAs
- Fingerprinting
- Bot detection
Scraper Studio helps handle these defenses. It can:
- Route requests through Bright Data proxy infrastructure
- Retry blocked requests
- Use browser-like fingerprints for Browser workers
- Support CAPTCHA-solving workflows with solve_captcha() when applicable
How schema fits into scraper structure
The scraper code and schema work together.
Input schema
The input schema defines what values the scraper can receive, such as url, keyword, or location. Interaction code reads these fields from the input object.
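To illustrate what an input schema does, the sketch declares two fields and checks input objects against them. The inputSchema format and validateInput() are hypothetical. Scraper Studio manages the real input schema in the IDE, and its rules may differ.

```javascript
// Hypothetical input schema format: field name -> expected type and whether
// the field is required.
const inputSchema = {
  url: { type: 'string', required: true },
  country: { type: 'string', required: false },
};

// Return a list of problems found when checking an input object against a schema.
function validateInput(input, schema) {
  const problems = [];
  for (const [name, rule] of Object.entries(schema)) {
    if (!(name in input)) {
      if (rule.required) problems.push(`missing required field: ${name}`);
    } else if (typeof input[name] !== rule.type) {
      problems.push(`${name} should be a ${rule.type}`);
    }
  }
  for (const name of Object.keys(input)) {
    if (!(name in schema)) problems.push(`unexpected field: ${name}`);
  }
  return problems;
}

console.log(validateInput({ url: 'https://www.example.com/p/1', country: 'US' }, inputSchema)); // []
console.log(validateInput({ country: 42 }, inputSchema));
```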
Output schema
The output schema defines what fields the scraper returns. It is usually generated from the records passed to collect().
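The output schema is usually generated from the records passed to collect(), so it helps to see how record shapes map to field types. inferSchema() is a hypothetical illustration of that idea, not Scraper Studio's generator. The sample records are made up.

```javascript
// Hypothetical illustration: list the value types seen for each field
// across collected records.
function inferSchema(records) {
  const seen = {};
  for (const record of records) {
    for (const [field, value] of Object.entries(record)) {
      const type =
        value === null ? 'null' : Array.isArray(value) ? 'array' : typeof value;
      if (!seen[field]) seen[field] = new Set();
      seen[field].add(type);
    }
  }
  // Convert each Set to a sorted array for stable, readable output.
  return Object.fromEntries(
    Object.entries(seen).map(([field, types]) => [field, [...types].sort()]),
  );
}

// Made-up records as they might be passed to collect().
const collected = [
  { title: 'Trail Runner 2', price: 89.99, sizes: ['42', '43'] },
  { title: 'Road Racer', price: null, sizes: [] },
];
console.log(inferSchema(collected));
```

Here price comes out as both null and number. That is a hint that the parser sometimes misses the price element.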
Common scraper patterns
Single-page scraper
Use this pattern when each input maps to one final page. Examples:
- Product pages
- Profile pages
- Article pages
- Detail pages
Search or discovery scraper
Use this pattern when an input creates multiple result pages. Examples:
- Search pages
- Category pages
- Directory listings
- Marketplace discovery
Multi-page detail scraper
Use this pattern when a scraper must move from a list page to many detail pages.
Example flow: Input keyword → Search page → Result URLs → Detail pages → Output records
Best for:
- Product discovery
- Lead generation
- Local business listings
- Review collection
- Job listings
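The example flow for the multi-page detail scraper can be traced end to end with plain functions over an in-memory site. The site object, URLs, and function names are hypothetical. Comments mark where a real Scraper Studio scraper would call next_stage() and collect() instead of returning values directly.

```javascript
// Hypothetical in-memory site so the flow runs offline.
const site = {
  search: { 'running shoes': ['/p/1', '/p/2'] },
  detail: {
    '/p/1': { title: 'Trail Runner 2', price: 89.99 },
    '/p/2': { title: 'Road Racer', price: 120 },
  },
};

// Input keyword -> Search page -> Result URLs
function searchStage(keyword) {
  return site.search[keyword] || [];
}

// Detail page -> Output record
function detailStage(url) {
  return { url, ...site.detail[url] };
}

function runFlow(input) {
  const resultUrls = searchStage(input.keyword); // in the IDE: one next_stage() per URL
  return resultUrls.map((url) => detailStage(url)); // in the IDE: collect() in each detail stage
}

console.log(runFlow({ keyword: 'running shoes' }));
```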
Debugging mental model
When a scraper does not return the expected data, check the flow in order:
- Input: Did the scraper receive the expected input values?
- Interaction code: Did it reach the correct page or response?
- Parser code: Did the selectors extract the expected fields?
- Stages: Did next_stage() create the expected child pages?
- Output records: Did collect() emit the expected structure?
- Schema: Does the output schema include the returned fields?
- Worker type: Does the page require Browser worker behavior?
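Several of these checks can be partly automated by comparing collected records with the fields you expect. checkRecords() is a hypothetical helper. The symptom-to-cause mapping in its messages is a rough heuristic, not a rule.

```javascript
// Hypothetical debugging helper: compare collected records with the fields
// you expect, to narrow down which step of the flow went wrong.
function checkRecords(records, expectedFields) {
  if (records.length === 0) {
    return ['no records collected: check input, interaction code, and stages'];
  }
  const issues = [];
  records.forEach((record, i) => {
    for (const field of expectedFields) {
      if (!(field in record)) {
        issues.push(`record ${i}: missing ${field} (check collect() and the schema)`);
      } else if (record[field] === '' || record[field] === null) {
        issues.push(`record ${i}: empty ${field} (check the parser selectors)`);
      }
    }
  });
  return issues;
}

console.log(checkRecords([], ['title']));
console.log(checkRecords([{ title: '', price: 89.99 }, { price: 120 }], ['title', 'price']));
```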
Related
- Worker types: Choose between Browser worker and Code worker
- Scraper Studio functions: Full reference for interaction and parser commands
- Develop a scraper: Step-by-step walkthrough of building a scraper in the IDE
- Best practices: Recommended patterns for fast, reliable scrapers