Case study
Amazon Reviews ETL Pipeline
Processed 34.7M JSON review records using Pig, Hive, SSIS, SQL Server, and Power BI to create analysis-ready datasets for sentiment and product trend reporting.
Business Impact
Reduced stakeholder data preparation time by 70%.
Tools
Pig, Hive, SSIS, SQL Server, Power BI
34.7M records processed
70% data prep time reduction
Analysis-ready reporting dataset
Problem
Large-scale JSON review data was too raw and noisy for fast stakeholder analysis or reporting.
Data / Architecture
The pipeline uses Pig and Hive for large-scale processing of the raw JSON, SSIS and SQL Server for structured loads, and Power BI for reporting.
Process
Processed nested JSON review records into structured analytical tables.
Built ETL transformations for sentiment and product trend reporting.
Created Power BI outputs for stakeholder-ready exploration.
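The flattening and trend steps above can be illustrated with a minimal Python sketch. This is not the project's Pig or Hive code. The field names (asin, overall, unixReviewTime, helpful, reviewText, reviewerID) and both helper functions are assumptions chosen for illustration, since the actual record schema is not described here.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def flatten_review(raw_line):
    """Parse one JSON line into a flat row, or return None if it is unusable.

    All field names are assumed for illustration; adjust to the real schema.
    """
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # noisy input: not valid JSON
    if not isinstance(record, dict):
        return None

    product_id = record.get("asin")
    rating = record.get("overall")
    timestamp = record.get("unixReviewTime")
    if product_id is None or rating is None or timestamp is None:
        return None  # skip records missing the fields reporting depends on

    # The nested [helpful, total] vote pair becomes two flat columns.
    helpful = record.get("helpful") or [0, 0]
    review_month = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m")

    return {
        "product_id": product_id,
        "reviewer_id": record.get("reviewerID"),
        "rating": float(rating),
        "review_month": review_month,
        "helpful_votes": helpful[0],
        "total_votes": helpful[1],
        "review_text": (record.get("reviewText") or "").strip(),
    }


def monthly_rating_trend(rows):
    """Aggregate flat rows into average rating and review count per product and month."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [rating sum, review count]
    for row in rows:
        key = (row["product_id"], row["review_month"])
        totals[key][0] += row["rating"]
        totals[key][1] += 1
    return {
        key: {"avg_rating": rating_sum / count, "review_count": count}
        for key, (rating_sum, count) in totals.items()
    }
```

In a pipeline like the one described, the flat rows would correspond to the structured tables loaded into SQL Server, and the per-month aggregate to the kind of trend table a Power BI report reads.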
Business Impact
Reduced stakeholder data preparation time by 70% and enabled faster review trend analysis from large-scale customer feedback.
