In today's quickly evolving business world, staying competitive requires tracking price fluctuations regularly so that your strategies stay current. Manually gathering pricing information consumes time and effort you cannot afford when you plan to grow your business.
Competitor pricing data scraping is the ethical collection of real-time pricing information from target platforms such as websites, ecommerce stores, and marketplaces. The goal is to help businesses monitor and analyze the prices of similar products and services their competitors offer.
With this data, it becomes easier to make informed pricing decisions that maintain the right balance and boost profit margins. Combined with a dynamic pricing model, you also stand a better chance of beating the competition by delivering affordable services to your target customers.
Business dynamics continuously shift, and real-time data analysis keeps your strategy ahead of them. Here are some simple steps to scrape and compare prices for the same product across different platforms:
Here are the libraries we require for the scraping:
httpx: sends HTTP requests to the web pages and retrieves the responses as HTML.
parsel: parses the HTML and extracts data using CSS and XPath selectors.
asyncio: runs the scrapers asynchronously, which boosts the speed of web scraping.
loguru: handles monitoring and logging for the competitor price tracker.
Since asyncio comes pre-installed with Python, you only need to install the remaining libraries with this command:
pip install httpx parsel loguru
We will scrape data from three competitors, Walmart, Amazon, and BestBuy, to compare PlayStation 5 prices. The search keyword on each platform will be “PS5 Digital Edition.” Let us start by scraping the data from Walmart.
import urllib.parse
import asyncio
import json

from httpx import AsyncClient, Response
from parsel import Selector
from typing import Dict, List
from loguru import logger as log

# create an HTTP client with headers that look like a real web browser
client = AsyncClient(
    headers={
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
    },
    follow_redirects=True,
    http2=True
)


async def scrape_walmart(search_query: str) -> List[Dict]:
    """scrape Walmart search pages"""

    def parse_walmart(response: Response) -> List[Dict]:
        """parse Walmart search pages"""
        selector = Selector(response.text)
        data = []
        # grab the first result box on the search page
        product_box = selector.xpath("//div[@data-testid='item-stack']/div[1]")
        link = product_box.xpath(".//a[@link-identifier]/@link-identifier").get()
        title = product_box.xpath(".//a[@link-identifier]/span/text()").get()
        price = product_box.xpath(".//div[@data-automation-id='product-price']/span/text()").get()
        # keep the digits after the "$" sign, dropping the trailing character
        price = float(price[price.find("$") + 1:-1]) if price else None
        rate = product_box.xpath(".//span[@data-testid='product-ratings']/@data-value").get()
        review_count = product_box.xpath(".//span[@data-testid='product-reviews']/@data-value").get()
        data.append({
            "link": "https://www.walmart.com/ip/" + link,
            "title": title,
            "price": price,
            "rate": float(rate) if rate else None,
            "review_count": int(review_count) if review_count else None
        })
        return data

    search_url = "https://www.walmart.com/search?q=" + urllib.parse.quote_plus(search_query) + "&sort=best_seller"
    response = await client.get(search_url)
    if response.status_code == 403:
        raise Exception("Walmart requests are blocked")
    data = parse_walmart(response)
    log.success(f"scraped {len(data)} products from Walmart")
    return data
async def run():
    data = await scrape_walmart(
        search_query="PS5 digital edition"
    )
    # print the data in JSON format
    print(json.dumps(data, indent=2))


if __name__ == "__main__":
    asyncio.run(run())
In the above code, we defined two functions: scrape_walmart, which builds the search URL, requests it, and checks whether the request was blocked; and the nested parse_walmart, which extracts each product's link, title, price, rating, and review count from the HTML. Next, let's apply the same approach to Amazon search pages:
import urllib.parse
import asyncio
import json

from httpx import AsyncClient, Response
from parsel import Selector
from typing import Dict, List
from loguru import logger as log

# create an HTTP client with headers that look like a real web browser
client = AsyncClient(
    headers={
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
    },
    follow_redirects=True,
    http2=True
)


async def scrape_amazon(search_query: str) -> List[Dict]:
    """scrape Amazon search pages"""

    def parse_amazon(response: Response) -> List[Dict]:
        """parse Amazon search pages"""
        selector = Selector(response.text)
        data = []
        # grab the first organic result on the search page
        product_box = selector.xpath("//div[contains(@class, 'search-results')]/div[@data-component-type='s-search-result']")
        # the product ID (ASIN) is embedded in the product URL after /dp/
        product_id = product_box.xpath(".//div[@data-cy='title-recipe']/h2/a[contains(@class, 'a-link-normal')]/@href").get().split("/dp/")[-1].split("/")[0]
        title = product_box.xpath(".//div[@data-cy='title-recipe']/h2/a/span/text()").get()
        price = product_box.xpath(".//span[@class='a-price']/span/text()").get()
        price = float(price.replace("$", "")) if price else None
        rate = product_box.xpath(".//span[contains(@aria-label, 'stars')]/@aria-label").re_first(r"(\d+\.*\d*) out")
        review_count = product_box.xpath(".//div[contains(@data-csa-c-content-id, 'ratings-count')]/span/@aria-label").get()
        data.append({
            "link": f"https://www.amazon.com/dp/{product_id}",
            "title": title,
            "price": price,
            "rate": float(rate) if rate else None,
            "review_count": int(review_count.replace(',', '')) if review_count else None,
        })
        return data

    search_url = "https://www.amazon.com/s?k=" + urllib.parse.quote_plus(search_query)
    response = await client.get(search_url)
    if response.status_code in (403, 503):
        raise Exception("Amazon requests are blocked")
    data = parse_amazon(response)
    log.success(f"scraped {len(data)} products from Amazon")
    return data
async def run():
    amazon_data = await scrape_amazon(
        search_query="PS5 digital edition"
    )
    # print the data in JSON format
    print(json.dumps(amazon_data, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    asyncio.run(run())

Finally, let's collect the same fields from BestBuy search pages:
import urllib.parse
import asyncio
import json

from httpx import AsyncClient, Response
from parsel import Selector
from typing import Dict, List
from loguru import logger as log

# create an HTTP client with headers that look like a real web browser
client = AsyncClient(
    headers={
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.35",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9,lt;q=0.8,et;q=0.7,de;q=0.6",
    },
    follow_redirects=True,
    http2=True
)


async def scrape_bestbuy(search_query: str) -> List[Dict]:
    """scrape BestBuy search pages"""

    def parse_bestbuy(response: Response) -> List[Dict]:
        """parse BestBuy search pages"""
        selector = Selector(response.text)
        data = []
        # grab the first SKU item on the search page
        product_box = selector.xpath("//ol[contains(@class, 'sku-item-list')]/li[@class='sku-item']")
        # the SKU ID is embedded in the product URL after ?skuId=
        product_id = product_box.xpath(".//h4[@class='sku-title']/a/@href").get().split("?skuId=")[-1]
        title = product_box.xpath(".//h4[@class='sku-title']/a/text()").get()
        price = product_box.xpath(".//div[contains(@class, 'priceView')]/span/text()").get()
        price = float(price.replace("$", "")) if price else None
        rate = product_box.xpath(".//div[contains(@class, 'ratings-reviews')]/p/text()").get()
        review_count = product_box.xpath(".//span[@class='c-reviews ']/text()").get()
        data.append({
            "link": f"https://www.bestbuy.com/site/{product_id}.p",
            "title": title,
            "price": price,
            # the rating text looks like "Rating 4.8 out of 5 stars"
            "rate": float(rate.split()[1]) if rate else None,
            # the review count text looks like "(769)"
            "review_count": int(review_count[1:-1].replace(",", "")) if review_count else None
        })
        return data

    search_url = "https://www.bestbuy.com/site/searchpage.jsp?st=" + urllib.parse.quote_plus(search_query)
    response = await client.get(search_url)
    if response.status_code == 403:
        raise Exception("BestBuy requests are blocked")
    data = parse_bestbuy(response)
    log.success(f"scraped {len(data)} products from BestBuy")
    return data
async def run():
    bestbuy_data = await scrape_bestbuy(
        search_query="PS5 digital edition"
    )
    # print the data in JSON format
    print(json.dumps(bestbuy_data, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    asyncio.run(run())
In this step, we will combine all the scraping logic into a single competitor price tracker:
async def track_competitor_prices(search_query: str):
    """scrape products from different competitors"""
    data = {}
    data["walmart"] = await scrape_walmart(search_query=search_query)
    data["amazon"] = await scrape_amazon(search_query=search_query)
    data["bestbuy"] = await scrape_bestbuy(search_query=search_query)

    product_count = sum(len(products) for products in data.values())
    log.success(f"successfully scraped {product_count} products")

    # save the results into a JSON file
    with open("data.json", "w", encoding="utf-8") as file:
        json.dump(data, file, indent=2, ensure_ascii=False)


async def run():
    await track_competitor_prices(
        search_query="PS5 digital edition"
    )


if __name__ == "__main__":
    asyncio.run(run())
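Since the three scrapers are independent of each other, you can also run them concurrently with asyncio.gather, which is where the asynchronous design pays off in speed. Here is a minimal sketch of that variation, reusing the scrape_* functions, logger, and imports from the scripts above:

async def track_competitor_prices_concurrently(search_query: str):
    """run all three scrapers at the same time and save the combined results"""
    walmart, amazon, bestbuy = await asyncio.gather(
        scrape_walmart(search_query=search_query),
        scrape_amazon(search_query=search_query),
        scrape_bestbuy(search_query=search_query),
    )
    data = {"walmart": walmart, "amazon": amazon, "bestbuy": bestbuy}
    log.success(f"successfully scraped {sum(len(p) for p in data.values())} products")
    # save the combined results into a JSON file
    with open("data.json", "w", encoding="utf-8") as file:
        json.dump(data, file, indent=2, ensure_ascii=False)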
All the results will be organized in a single JSON file:
{ "walmart": [ { "link": "https://www.walmart.com/ip/5113183757", "title": "Sony PlayStation 5 (PS5) Digital Console Slim", "price": 449.0, "rate": 4.6, "review_count": 369 } ], "amazon": [ { "link": "https://www.amazon.com/dp/B0CL5KNB9M", "title": "PlayStation®5 Digital Edition (slim)", "price": 449.0, "rate": 4.7, "review_count": 2521 } ], "bestbuy": [ { "link": "https://www.bestbuy.com/site/6566040.p", "title": "Sony - PlayStation 5 Slim Console Digital Edition - White", "price": 449.99, "rate": 4.8, "review_count": 769 } ] }
The scraped product data can now be analyzed for insights into competitors' performance. Here is a simple monitoring function to analyze the information:
def generate_insights(data):
    """analyze the data for insight values"""

    def calculate_average(lst):
        # average the non-None values, rounded to two decimals
        non_none_values = [value for value in lst if value is not None]
        return round(sum(non_none_values) / len(non_none_values), 2) if non_none_values else None

    # extract all products across competitors
    all_products = [product for products in data.values() for product in products]

    # calculate overall averages
    overall_average_price = calculate_average([product["price"] for product in all_products])
    overall_average_rate = calculate_average([product["rate"] for product in all_products])
    overall_average_review_count = calculate_average([product["review_count"] for product in all_products])

    # find the lowest priced, highest priced, highest rated, and most reviewed
    # products across all competitors (treating missing values as worst case)
    lowest_priced_product = min(all_products, key=lambda x: x["price"] if x["price"] is not None else float("inf"))
    highest_priced_product = max(all_products, key=lambda x: x["price"] if x["price"] is not None else float("-inf"))
    highest_rated_product = max(all_products, key=lambda x: x["rate"] if x["rate"] is not None else 0)
    highest_reviewed_product = max(all_products, key=lambda x: x["review_count"] if x["review_count"] is not None else 0)

    # map each product's domain name (e.g. "walmart") back to the retailer key
    website_names = {products[0]["link"].split(".")[1]: retailer for retailer, products in data.items()}

    insights = {
        "Overall Average Price": overall_average_price,
        "Overall Average Rate": overall_average_rate,
        "Overall Average Review Count": overall_average_review_count,
        "Lowest Priced Product": {
            "Product": lowest_priced_product,
            "Competitor": website_names.get(lowest_priced_product["link"].split(".")[1])
        },
        "Highest Priced Product": {
            "Product": highest_priced_product,
            "Competitor": website_names.get(highest_priced_product["link"].split(".")[1])
        },
        "Highest Rated Product": {
            "Product": highest_rated_product,
            "Competitor": website_names.get(highest_rated_product["link"].split(".")[1])
        },
        "Highest Reviewed Product": {
            "Product": highest_reviewed_product,
            "Competitor": website_names.get(highest_reviewed_product["link"].split(".")[1])
        }
    }

    # save the insights to a JSON file
    with open("insights.json", "w", encoding="utf-8") as json_file:
        json.dump(insights, json_file, indent=2, ensure_ascii=False)
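To run the analysis on results you scraped earlier, load data.json and pass it to the function. A minimal usage example:

import json

# load the previously scraped results and write insights.json
with open("data.json", "r", encoding="utf-8") as file:
    data = json.load(file)

generate_insights(data)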
We have introduced the generate_insights function, which calculates various metrics: the overall average price, rating, and review count, plus the lowest-priced, highest-priced, highest-rated, and most-reviewed products across all competitors.
The insight data below presents these statistics as plain numbers, making analysis easier. You can now compare product prices from various competitors:
{ "Overall Average Price": 449.33, "Overall Average Rate": 4.7, "Overall Average Review Count": 1219.67, "Lowest Priced Product": { "Product": { "link": "https://www.walmart.com/ip/5113183757", "title": "Sony PlayStation 5 (PS5) Digital Console Slim", "price": 449.0, "rate": 4.6, "review_count": 369 }, "Competitor": "walmart" }, "Highest Priced Product": { "Product": { "link": "https://www.bestbuy.com/site/6566040.p", "title": "Sony - PlayStation 5 Slim Console Digital Edition - White", "price": 449.99, "rate": 4.8, "review_count": 769 }, "Competitor": "bestbuy" }, "Highest Rated Product": { "Product": { "link": "https://www.bestbuy.com/site/6566040.p", "title": "Sony - PlayStation 5 Slim Console Digital Edition - White", "price": 449.99, "rate": 4.8, "review_count": 769 }, "Competitor": "bestbuy" }, "Highest Reviewed Product": { "Product": { "link": "https://www.amazon.com/dp/B0CL5KNB9M", "title": "PlayStation 5 Digital Edition (slim)", "price": 449.0, "rate": 4.7, "review_count": 2521 }, "Competitor": "amazon" } }
Competing in this dynamic market comes with hurdles that require advanced solutions to handle efficiently. Here are some common challenges you might face while extracting and analyzing competitor pricing data:
Prices on ecommerce websites change frequently based on stock levels, demand, and competitors' own pricing moves. This makes gathering information in real time, or even every few hours, technically challenging; one common remedy is to re-run the tracker on a fixed schedule, as sketched below.
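Here is a minimal scheduling sketch. The four-hour interval is an assumption to tune for how fast your market moves, and it reuses the track_competitor_prices function and loguru logger from the tutorial above:

import asyncio

SCRAPE_INTERVAL_HOURS = 4  # assumption: tune to how fast prices change in your market

async def run_tracker_forever():
    """re-run the competitor price tracker on a fixed schedule"""
    while True:
        try:
            await track_competitor_prices(search_query="PS5 digital edition")
        except Exception as exc:
            log.error(f"tracking run failed: {exc}")  # keep the loop alive on failures
        await asyncio.sleep(SCRAPE_INTERVAL_HOURS * 3600)

if __name__ == "__main__":
    asyncio.run(run_tracker_forever())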
Product pages on online stores carry much more than pricing, including product descriptions, reviews, related products, ratings, and more. This requires a capable scraper that can identify and extract only the specific data you need.
Many ecommerce sellers use dynamic pricing, where prices change based on browsing history, location, time of day, and market fluctuations. Accounting for these changes while keeping your own pricing model up to date and profitable is difficult.
Prices also vary by geographic location due to different tax rates, regional strategies, and shipping costs. This makes it essential to have scrapers that can simulate being in different locations using VPNs or proxy servers.
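For example, httpx can route requests through a proxy so pages are fetched as if from a specific region. This is only a sketch: the proxy URL below is a placeholder for your own provider's gateway, and depending on your httpx version the argument is proxy (newer releases) or proxies (older ones):

from httpx import AsyncClient

# placeholder credentials and endpoint; substitute your proxy provider's gateway
US_PROXY = "http://username:password@us.proxy.example.com:8000"

geo_client = AsyncClient(
    proxy=US_PROXY,  # on older httpx releases, use proxies=US_PROXY instead
    headers={"Accept-Language": "en-US,en;q=0.9"},
    follow_redirects=True,
)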
Ecommerce platforms often sell the same product in different variations, such as size, color, packaging, or seller, each with its own price. It is critical that your competitor data scraping tool captures the correct variant so that you compare prices like for like; a simple filter such as the sketch below can help.
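One naive, illustrative way to guard against mixing variants is to filter scraped titles for required and forbidden keywords before comparing prices. The keyword lists here are assumptions chosen for the PS5 Digital Edition example, and products stands for a list of product dicts like the ones scraped above:

def matches_variant(title, required_terms=("digital",), excluded_terms=("disc",)):
    """naive variant filter: keep titles containing every required term
    and none of the excluded terms (case-insensitive)"""
    lowered = title.lower()
    return all(term in lowered for term in required_terms) and not any(
        term in lowered for term in excluded_terms
    )

# example: drop disc-edition listings before comparing prices
filtered = [p for p in products if p["title"] and matches_variant(p["title"])]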
Competitor price data scraping has become essential for businesses looking to outpace the competition and earn better returns. Here are some reasons to invest in scraping pricing data from your competitors:
Knowing competitor prices lets you set a pricing model that attracts your target audience's attention. If a competitor constantly offers deals, your business can seize the opportunity to provide attractive discounts.
Gathering pricing information over a period of time helps you identify common patterns and seasonal changes. Analyzed with professional scraping tools, this historical data helps business owners anticipate seasonal shifts and adjust pricing accordingly.
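For instance, you could append each scraping run to a timestamped history file and study the trend later. A minimal sketch, assuming data has the same {retailer: [products]} shape as data.json above and the price_history.csv filename is an arbitrary choice:

import csv
from datetime import datetime, timezone

def append_price_snapshot(data, path="price_history.csv"):
    """append one timestamped row per product for later trend analysis"""
    timestamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        for retailer, products in data.items():
            for product in products:
                writer.writerow([timestamp, retailer, product["title"], product["price"]])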
Monitoring competitors' stock levels helps you set the right prices across your own inventory. Maintaining the right balance of popular and less in-demand products becomes effortless while you deliver the best customer service.
We have shared essential insights into scraping and analyzing competitor pricing data with professional help. At Scraping Intelligence, you can access advanced technologies and the latest strategies to gather up-to-date information from your competitors.
Web scraping is a powerful solution for performing competitive analysis, extracting valuable and current information from target websites. We respect data privacy and terms of service to uphold our ethical responsibilities while extracting information.