Scrapers FAQs - Bright Data Docs

What is Scrapers?

Scrapers lets you extract fresh data on demand from websites using pre-built scrapers. You can use it to automate data collection and integrate with other systems.

Who can benefit from using Scrapers?

Data analysts, data scientists, engineers, and developers who need an efficient way to collect and analyze web data for AI, ML, big data applications, and more, without building their own scrapers, will find Scrapers particularly useful.

How do I get started with Scrapers?

Getting started with Scrapers is straightforward. Once you open your Bright Data account, generate an API key from your account settings. With your key in hand, refer to our API documentation for detailed instructions on making your first API call.

What is the difference between the scrapers?

Each scraper can require different inputs. There are two main types of scrapers:

PDP
These scrapers require URLs as inputs. A PDP (Product Detail Page) scraper extracts detailed product information, such as specifications, pricing, and features, from web pages.
Discovery / Discovery+PDP
Discovery scrapers let you explore and find new entities or products through search, categories, keywords, and more.

Why are there different discovery APIs for the same domain?

Each discovery API lets you find the desired data using a different method: by keyword, by category URL, or even by location.

How do I authenticate with Scrapers?

Authentication is done using an API key. Include the API key in the Authorization header of your requests as follows: Authorization: Bearer YOUR_API_KEY.
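As a minimal sketch of the header described above, the helper below builds the request headers in Python. The function name auth_headers is hypothetical, and the Content-Type value is taken from the curl examples elsewhere in this FAQ rather than from a stated requirement.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Build request headers carrying the API key as a Bearer token."""
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    return {
        # Format from the answer above: Authorization: Bearer YOUR_API_KEY
        "Authorization": f"Bearer {api_key.strip()}",
        # Assumption: JSON content type, as used in this FAQ's curl examples.
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    print(auth_headers("YOUR_API_KEY"))
```

Pass the returned dictionary as the headers of whichever HTTP client you use.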

How do I customize my request and trigger it?

Once you pick the API you want to run, you can customize your request using our detailed API parameters documentation, which specifies the parameter types, expected inputs, and responses.

Do you offer a free trial?

You’ll get a $2 credit to explore and test our services.

How do I test the API?

You can quickly test the product by customizing the code on the control panel (Demo video)

Pick your desired API from the variety of APIs

Enter your inputs

Enter your API key

Select your preferred delivery method

Using a webhook: update the webhook URL, copy the “trigger data collection” code, and run it on your client.
Using an API: fill out the credentials and information required by the delivery setting you chose (S3, GCP, Pub/Sub, and more), then copy the code and run it after the collection ends.

Copy the code and run it on your client

All of the above can also be done with free tools such as Webhook.site and Postman.

We also offer additional management APIs, under the Management APIs tab, for checking the collection status and fetching a list of all your snapshots.

What data formats does Scrapers support?

Scrapers supports data extraction in several formats, including JSON, NDJSON, JSONL, and CSV. Specify your desired format in the request parameters.

What are the rates for Scrapers?

We charge based on the number of records delivered, so you only pay for what you get. Note that unsuccessful attempts resulting from incorrect user inputs are still billed: because the failure to retrieve data was due to user input rather than Bright Data’s performance, resources were still consumed in processing the request. The rate per record depends on your subscription plan (starting from $0.70 per 1,000 records). Check our pricing plans or your account details for specific rates.

What should I do if my API key expires?

For account admins: If your API key expires, you need to create a new one in your account settings.

For account users: If your API key expires, please contact your account admin to issue a new API key.

How do Scraper APIs manage large-scale data extraction tasks?

Featuring capabilities for high concurrency and batch processing, Scraper APIs excel in large-scale data extraction scenarios. This ensures developers can scale their scraping operations efficiently, accommodating massive volumes of requests with high throughput.

How can I upgrade my subscription plan?

To upgrade your subscription plan, visit the billing section of your account dashboard and select the desired plan. For further assistance, contact our support team.

What specific use cases are Scraper APIs optimized for?

Scraper APIs support a wide range of use cases, including competitive benchmarking, market trend analysis, dynamic pricing algorithms, sentiment extraction, and feeding data into machine learning pipelines. They are especially useful for e-commerce, fintech, and social media analytics, where they help developers implement data-driven strategies.

How fast is Scrapers?

Scrapers that take URLs as inputs support real-time requests for up to 20 inputs per call. Requests with more than 20 inputs use batch support, regardless of the scraper type. Real-time response times vary by domain, and the data is always fresh rather than served from a cache.

Scrapers that discover new records (e.g., “Discover by keyword,” “Discover by hashtag”) generally take longer and use batch support. Actual response times are influenced by several factors, including the target URL’s load time and the execution duration of user-defined Page Interactions.

An indication of the average response time for each scraper can be found on the specific scraper page.

How do I cancel an API call?

You can cancel a run using the following endpoint:

curl -H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: application/json" -k "https://api.brightdata.com/datasets/v3/snapshot/SNAPSHOT_ID/cancel" -X POST

Replace SNAPSHOT_ID with the ID of the snapshot you want to cancel.

Note: If you cancel the run, no data will be delivered to you. A snapshot can’t be canceled after it has finished collecting.
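The same cancel call can be sketched in Python with the standard library. This only builds the request and does not send it. The helper name build_cancel_request is hypothetical, and the Bearer header format is assumed from the authentication answer in this FAQ.

```python
import urllib.request

API_BASE = "https://api.brightdata.com/datasets/v3"


def build_cancel_request(snapshot_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that cancels a snapshot."""
    if not snapshot_id:
        raise ValueError("snapshot_id is required")
    return urllib.request.Request(
        f"{API_BASE}/snapshot/{snapshot_id}/cancel",
        method="POST",
        headers={
            # Assumption: Bearer format, as in the authentication answer.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_cancel_request("SNAPSHOT_ID", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # To actually cancel the run: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to double-check the snapshot ID before the run is stopped.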

What is the difference between notify URL and webhook URL configurations?

Both are used for asynchronous communication, but they differ in purpose and payload:

Notify URL: The system sends a notification to the specified URL when a task is completed or an event occurs. The notification is typically lightweight and doesn’t include detailed data, but it may provide a reference or status for further action (e.g., “Job completed, check logs for details”).
Webhook URL: More data-centric. The system pushes detailed, real-time data payloads to the specified URL when a specific event occurs. Webhooks provide direct, actionable information without requiring the client to poll the system.

Example use case: A notify URL might be used to inform you that a scraping job is finished. A webhook URL could send the actual scraped data or detailed metadata about the completion directly to you.

How long is a snapshot available after I trigger a collection?

The snapshot is available for 30 days. During this period, you can retrieve it using the delivery API options and the snapshot ID.

Are there any limitations for specific scrapers or domains?

There are certain limitations on these platforms:

Facebook


Posts (by profile URL)
Comments
Reels

Instagram


Posts (by keyword)
Posts (by profile URL)
Comments
Reels

Media Links expiring after 24 hours.


Profiles
Posts (by keyword)
Posts (by profile URL)


Posts (by keyword)
Comments

TikTok


Profiles (by search URL)
Comments
Posts (by keyword)
Posts (by profile URL)

Quora


Posts

Vimeo


Posts (by keyword)
Posts (by URL)

X (Twitter)


Posts

YouTube


Profiles
Posts (by keyword)
Posts (by URL)
Posts (by search filters)

TikTok

Media only accessible with a generated token in the same session.

Posts are limited to the number shown publicly on the profile (e.g., 10).

What does it mean when a snapshot is marked as empty?

When a snapshot is marked as empty, it means there are no valid or usable records in the snapshot. However, this does not imply the snapshot is completely devoid of content. In most cases, it contains information such as errors or dead pages:

Errors: Issues encountered during the data collection process, such as invalid inputs, system errors, or access restrictions.
Dead Pages: Pages that could not be accessed for reasons like 404 errors (page not found), removed content (e.g., unavailable products), or restricted access.

To view these details, you can use the parameter include_errors=true in your request, which will display the errors and information about the dead pages in the snapshot. This helps you diagnose and understand the issues within the snapshot.
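A minimal sketch of adding that parameter to a snapshot request URL follows. Only the include_errors=true parameter comes from the answer above. The download path /datasets/v3/snapshot/{snapshot_id} and the helper name snapshot_url are assumptions, so check the API reference for the exact endpoint.

```python
from urllib.parse import urlencode

API_BASE = "https://api.brightdata.com/datasets/v3"


def snapshot_url(snapshot_id: str, include_errors: bool = True) -> str:
    """Return a snapshot URL, optionally asking for errors and dead pages."""
    # Assumed path: /snapshot/{snapshot_id}; not stated in this FAQ.
    url = f"{API_BASE}/snapshot/{snapshot_id}"
    if include_errors:
        # Parameter named in the answer above.
        url += "?" + urlencode({"include_errors": "true"})
    return url


if __name__ == "__main__":
    print(snapshot_url("SNAPSHOT_ID"))
```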

How to stop a web scraper task?

You can stop a running collection by utilizing the following API call: https://docs.brightdata.com/api-reference/scrapers/management-apis/cancel-snapshot

Which domains do you provide scrapers for?

ae.com
airbnb.com
amazon.com
apps.apple.com
ashleyfurniture.com
asos.com
balenciaga.com
bbc.com
berluti.com
bestbuy.com
booking.com
bottegaveneta.com
bsky.app
carsales.com.au
carters.com
celine.com
chanel.com
chileautos.cl
crateandbarrel.com
creativecommons.org
crunchbase.com
delvaux.com
digikey.com
dior.com
ebay.com
edition.cnn.com
en.wikipedia.org
enricheddata.com
espn.com
etsy.com
example.com
facebook.com
fanatics.com
fendi.com
finance.yahoo.com
g2.com
github.com
glassdoor.com
global.llbean.com
goodreads.com
google.com
hermes.com
homedepot.ca
homedepot.com
ikea.com
imdb.com
indeed.com
infocasas.com.uy
inmuebles24.com
instagram.com
la-z-boy.com
lazada.com.my
lazada.sg
lazada.vn
lego.com
linkedin.com
loewe.com
lowes.com
manta.com
martindale.com
massimodutti.com
mattressfirm.com
mediamarkt.de
metrocuadrado.com
montblanc.com
mouser.com
moynat.com
mybobs.com
myntra.com
news.google.com
nordstrom.com
olx.com
otodom.pl
owler.com
ozon.ru
pinterest.com
pitchbook.com
play.google.com
prada.com
properati.com.co
raymourflanigan.com
realestate.com.au
reddit.com
revenuebase.ai
sephora.fr
shop.mango.com
shopee.co.id
sleepnumber.com
slintel.com
target.com
tiktok.com
toctoc.com
tokopedia.com
toysrus.com
trustpilot.com
trustradius.com
unashamedcataddicts.quora.com
us.shein.com
ventureradar.com
vimeo.com
walmart.com
wayfair.com
webmotors.com.br
wildberries.ru
worldpopulationreview.com
worldpostalcode.com
www2.hm.com
x.com
xing.com
yapo.cl
yelp.com
youtube.com
ysl.com
zalando.de
zara.com
zarahome.com
zillow.com
zonaprop.com.ar
zoominfo.com
zoopla.co.uk

If your target domain is not on this list, we can develop a custom scraper specifically for you.

How can I use Bright Data to access hotel data through an API?

We don’t provide dedicated scrapers specifically for hotels, but we do offer a Booking.com scraper and the option to create a custom scraper tailored to your specific requirements.

How do I get the data I need?

Here’s a quick guide to help you get started and choose the right solution for your needs:

Option 1: Enriched, Pre-Collected Data – Explore Our Datasets Marketplace

If you’re looking for ready-to-use, high-quality data, our Datasets Marketplace is the perfect place to start. We’ve already done the heavy lifting by collecting and enriching vast amounts of data from a variety of sources. These datasets are designed to save you time and effort, so you can focus on analyzing the data and making smarter decisions.

Simply browse our marketplace, find the dataset that fits your needs, and start using it right away.

Option 2: Web Scrapers for Fresh and Real-Time Data

If your project requires fresh data or highly specific information that isn’t available in our Datasets Marketplace, we offer powerful tools to help you collect fresh and real-time data directly from the web. Here’s how you can get started:

Pre-Built Web Scrapers
We offer a wide range of pre-built web scrapers for popular websites, allowing you to collect data quickly and efficiently. These scrapers are ready to use and require minimal setup, making them a great choice for users who want to hit the ground running.

Managed Services
Can’t find your target website in our list of pre-built scrapers? No problem! We can create a managed service tailored specifically to your needs. The Bright Data team of experts will work with you to design a solution that collects the exact data you’re looking for.

Build Your Own Scraper
For users with JavaScript knowledge or access to developer resources, we also offer the option to build your own scraper using our Integrated Development Environment (IDE). This gives you full control and flexibility to create a scraper that meets your unique requirements.

Have questions or need assistance? The Bright Data team of experts is always here to help. Let’s get started!

How do I scrape data from Google Maps?

Find the “Google Maps reviews” scraper on the dashboard and choose if you want to run it as an API request or initiate it using the “No code” option from the control panel
Enter the input parameters (the place page URL and the number of days to retrieve reviews from)
Configure the needed request parameters if using an API
Initiate the run and collect the data

How do I cancel a running snapshot?

To cancel a running snapshot, use one of the following methods:

API Request:
- Send a POST request to the endpoint: POST /datasets/v3/snapshot/{snapshot_id}/cancel (playground)
- Replace {snapshot_id} with the ID of the snapshot you want to cancel.
Control Panel:
- Go to the Logs tab of the scraper.
- Locate the running snapshot.
- Hover over the specific run and click the “X” to cancel it.

Both methods will stop the snapshot process if it is currently running.

Does the ChatGPT scraper work with “SearchGPT” active?

Yes, the Bright Data ChatGPT scraper always runs with the “Search” function active.

Can I view the code behind the scraper?

Scrapers available in the Web Scrapers Library are pre-built solutions, and their underlying code is not accessible for modification or viewing.
For those interested in seeing how scrapers work, the Web Scraper IDE provides several example templates when you create a new scraper. These examples serve as practical references to help you understand scraping techniques and build your own custom solutions.

Can I get the results delivered directly to my machine or software while using Scrapers?

Yes. Scrapers can return the scraped data directly in the response to your request, using the following endpoint: POST api.brightdata.com/datasets/v3/scrape
This endpoint lets you fetch data efficiently and integrate it directly into your applications or workflows.

How does it work?
The API lets you send a scraping request and receive the results directly at the request point. This removes the need for a separate retrieval step or for delivery to external storage, which streamlines your data collection process.

Limitations

For long collection operations, the best practice is to use our /trigger endpoint. If a collection request takes too long on the /scrape endpoint, you will receive a snapshot ID, which you can use to download the data once it is ready.

What is a dataset id and where can I find it?

A dataset ID is a unique identifier used in Scrapers requests. It’s included in the request URL to specify which web scraper you want to access, so that your API call retrieves data from the correct scraper in Bright Data. Here is how it is used: https://api.brightdata.com/datasets/v3/trigger?dataset_id=DATASET_ID_HERE

A dataset ID looks like gd_XXXXXXXXXXXXXXXXX, for example: gd_l1viktl72bvl7bjuj0

You can find the exact dataset ID in two places:

In the browser URL bar when viewing a scraper page, it appears as /cp/scrapers/gd_xxx
In the Code examples panel on the scraper’s Configuration tab, it is pre-filled in the curl command, ready to copy.

Note: An ID that looks like s_XXXXXXXXXXXXXXXXXX (for example, s_m7hm4et0141r2rhojq) is not a dataset ID. It is a snapshot ID. A snapshot is the collection of data gathered by a single Scrapers request.
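To make the distinction concrete, here is a small sketch that tells the two kinds of ID apart by prefix and builds the trigger URL shown above. The helper names are hypothetical. The prefix check is a convenience inferred from the examples in this FAQ (gd_ for datasets, s_ and sd_ for snapshots), not a documented validation rule.

```python
TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"


def classify_id(identifier: str) -> str:
    """Guess whether an ID is a dataset ID or a snapshot ID from its prefix."""
    if identifier.startswith("gd_"):
        return "dataset"
    # Snapshot examples in this FAQ use both the s_ and sd_ prefixes.
    if identifier.startswith(("s_", "sd_")):
        return "snapshot"
    return "unknown"


def trigger_url(dataset_id: str) -> str:
    """Build the trigger URL, refusing IDs that do not look like dataset IDs."""
    if classify_id(dataset_id) != "dataset":
        raise ValueError(f"{dataset_id!r} does not look like a dataset ID")
    return f"{TRIGGER_URL}?dataset_id={dataset_id}"


if __name__ == "__main__":
    print(classify_id("s_m7hm4et0141r2rhojq"))
    print(trigger_url("gd_l1viktl72bvl7bjuj0"))
```

A check like this catches the common mistake of pasting a snapshot ID where a dataset ID is expected before any request is sent.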

What is 'Discovery only' mode?

In Discovery-only mode, the results obtained during the discovery phase are returned as the final output and do not proceed to the PDP (Product Detail Page) stage.
For example, if an Amazon product discovery scraper is initiated in Discovery-only mode, it will return only the product URLs found during the discovery phase. When this mode is turned off, the scraper will continue to visit and extract data from each individual product page identified during discovery.

Is there an option to rerun failed snapshots?

Yes, you can rerun a snapshot using the rerun API.

Example:

curl --request POST --url https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}/rerun --header 'Authorization: Bearer YOUR_API_KEY'

How to retrieve your original inputs?

You can use the following API call to get your input data back:

curl -H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: application/json" "https://api.brightdata.com/datasets/v3/snapshot/sd_XXXX/input" -k

Simply replace:

YOUR_API_KEY with your actual API key
sd_XXXX with your snapshot ID

What is the difference between 'Synchronous (Real-time)' and 'Asynchronous' in the UI?

The Control Panel uses two UI labels that map directly to API endpoints:

UI Label	API Endpoint	Use when
Synchronous (Real-time)	`POST /datasets/v3/scrape`	You need instant results for 1–20 URLs
Asynchronous	`POST /datasets/v3/trigger`	You’re processing bulk URLs, discovery tasks, or large datasets
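The table can be turned into a simple routing rule. The sketch below is one possible reading of it: the function name choose_endpoint and the is_discovery flag are hypothetical, and the 20-input threshold is taken from the table above.

```python
REALTIME_MAX_INPUTS = 20  # upper bound for synchronous requests, per the table

SCRAPE_ENDPOINT = "/datasets/v3/scrape"    # Synchronous (Real-time)
TRIGGER_ENDPOINT = "/datasets/v3/trigger"  # Asynchronous


def choose_endpoint(inputs: list[str], is_discovery: bool = False) -> str:
    """Pick the endpoint path for a request based on its size and type."""
    if not inputs:
        raise ValueError("at least one input is required")
    # Discovery tasks and bulk URL lists go through the asynchronous endpoint.
    if is_discovery or len(inputs) > REALTIME_MAX_INPUTS:
        return TRIGGER_ENDPOINT
    return SCRAPE_ENDPOINT


if __name__ == "__main__":
    print(choose_endpoint(["https://example.com/item/1"]))
    print(choose_endpoint(["shoes"], is_discovery=True))
```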

Where do I get a Snapshot ID?

Snapshot IDs are returned when you trigger a collection (POST /datasets/v3/trigger), filter a dataset (POST /datasets/filter), or via a dataset subscription. You can also list all your snapshots with GET /datasets/v3/snapshots.

What do I do with the delivery job ID in the response?

The id in the response is a delivery job ID. Use it to track delivery progress by calling GET /datasets/v3/delivery/{delivery_id}. Poll until status is “done”.
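A polling loop for this could look like the sketch below. It is deliberately transport-agnostic: fetch_status is a hypothetical callable that you would implement to call GET /datasets/v3/delivery/{delivery_id} and return the status string. Only the “done” status comes from the answer above. The interval, the timeout, and the "working" placeholder status in the demo are assumptions.

```python
import time
from typing import Callable


def wait_for_delivery(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status repeatedly until it returns "done" or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "done":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"delivery not done after {timeout_s}s (last status: {status})")
        sleep(interval_s)


if __name__ == "__main__":
    # Demo with canned statuses instead of real API calls.
    fake_statuses = iter(["working", "working", "done"])
    print(wait_for_delivery(lambda: next(fake_statuses), interval_s=0.0))
```

This sketch only recognizes “done” as a terminal state. If the delivery API reports a failure status, you would want to stop polling on that as well.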

Does the snapshot need to be in a specific status?

Yes. The snapshot must be in ready status. Check with GET /datasets/snapshots/{id} before calling deliver. Possible statuses: scheduled, building, ready, failed.

Can I deliver the same snapshot to multiple destinations?

Yes. Call this endpoint multiple times with different delivery configurations for the same snapshot ID.

What file formats are supported?

json, jsonl, and csv.

How do I split large snapshots into smaller files?

Use the batch_size parameter to set the number of records per file. Each file (batch) must stay under the 5GB hard limit. Estimate batch_size by dividing 5GB by your average record size, then start lower and adjust based on the actual file size you get.

For example, if your average record is ~5KB, a batch_size of 1,000,000 records lands right at the ~5GB limit. Start at 500,000 records (~2.5GB) to stay safely under, then raise or lower batch_size based on the file size you receive.
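The arithmetic above can be written as a short helper. This is a sketch under stated assumptions: the function name estimate_batch_size and the safety_factor parameter are hypothetical, and sizes use decimal units (1KB = 1,000 bytes, 5GB = 5,000,000,000 bytes), which is what makes the 5KB example come out to 1,000,000 records.

```python
FILE_LIMIT_BYTES = 5 * 10**9  # 5GB hard limit per file (decimal units assumed)


def estimate_batch_size(avg_record_bytes: int, safety_factor: float = 0.5) -> int:
    """Estimate a starting batch_size that keeps each file under the limit.

    safety_factor=0.5 starts at half the theoretical maximum, mirroring the
    "start lower and adjust" advice above.
    """
    if avg_record_bytes <= 0:
        raise ValueError("avg_record_bytes must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    max_records = FILE_LIMIT_BYTES // avg_record_bytes
    return max(1, int(max_records * safety_factor))


if __name__ == "__main__":
    # ~5KB records: theoretical maximum, then the safer starting point.
    print(estimate_batch_size(5_000, safety_factor=1.0))
    print(estimate_batch_size(5_000))
```

Treat the result as a starting point only, since real record sizes vary and the delivered file size is what counts against the limit.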

Why did my request return a 400 error?

The most common cause is that your batch_size produces a file larger than 5GB. For example, if your average record size is ~5KB, a batch_size of 1,000,000 produces a ~5GB file that may exceed the limit. Lower your batch_size (e.g., to 100,000) and retry.

Can I compress the output?

Yes. Set compress: true to receive gzip-compressed files.

What is the maximum file size per batch?

5GB. This is a hard limit per delivered file. Use batch_size to control how many records go into each file and ensure each stays under this threshold.