Maintain Profitability with Returns and Refunds — An SQL Exercise

Ryan Lin
5 min readAug 2, 2021

Product returns have always been a major expense for e-commerce companies, with the typical return rate easily reaching 30 to 40% during the holiday season. Furthermore, returns are a tremendous strain on resources and customer support teams, and they cause pollution to our environment due to masses of discarded items and packaging.

So, how can we decrease returns and improve reverse logistics?

In this exercise, we will perform some data analysis on a specific dataset in the hopes of identifying initiatives or improvements to the process or policy.

Context Of The Dataset

Here will be the dataset used for this exercise — Pakistan Largest Ecommerce Dataset¹
It contains ~ 1 million records with columns — shipping method, payment method and order status etc.

Fig 1. Highlighted the few notable columns

Data Wrangling

1. Clean up on columns, dropping the last 5 columns since it contained only NULL value

ALTER TABLE [dbo].[EcomDataset]
DROP COLUMN [column22], [column23], [column24], [column25], [column26];

2. Drop rows with column that contains only NULL value

DELETE FROM [dbo].[EcomDataset]
WHERE [item_id] IS NULL
AND [status] IS NULL
AND [created_at] IS NULL
AND [sku] IS NULL
AND [price] IS NULL
AND [qty_ordered] IS NULL
AND [grand_total] IS NULL
AND [increment_id] IS NULL
AND [category_name_1] IS NULL
AND [sales_commission_code] IS NULL
AND [discount_amount] IS NULL
AND [payment_method] IS NULL
AND [Working_Date] IS NULL
AND [BI_Status] IS NULL
AND [MV] IS NULL
AND [Year] IS NULL
AND [Month] IS NULL
AND [Customer_Since] IS NULL
AND [M_Y] IS NULL
AND [FY] IS NULL
AND [Customer_ID] IS NULL;

3. Cast columns to the correct data type

ALTER TABLE [dbo].[EcomDataset]
ALTER COLUMN [Customer_ID] VARCHAR(50);

ALTER TABLE [dbo].[EcomDataset]
ALTER COLUMN [item_id] VARCHAR(50);

ALTER TABLE [dbo].[EcomDataset]
ALTER COLUMN [qty_ordered] INT;

ALTER TABLE [dbo].[EcomDataset]
ALTER COLUMN [Year] INT;

ALTER TABLE [dbo].[EcomDataset]
ALTER COLUMN [Month] INT;

Exploratory Data Analysis

A. Product categories that top the following status: order_refunded and refund

Fig 2. No. of orders with status “order_refunded”, categorized by product categories
Fig 3. No. of orders with status “refund”, categorized by product categories
SELECT [category_name_1], [status], COUNT([status]) AS 'status_cnt'
, DENSE_RANK() OVER
(PARTITION BY [status]
ORDER BY COUNT([status]) DESC) AS 'status_rnk'
FROM [dbo].[EcomDataset]
GROUP BY [category_name_1], [status]
HAVING [status] IN ('order_refunded', 'refund')
ORDER BY 'status_rnk', [status]

From the plot, Men’s fashion or fashion, in general, had the most refund requests for its orders. It may seem apparent, but a product with high-quality images, a thorough description, and even short films showcasing the product in 360° are essentials in reducing returns and refunds. Implementing a simple and easy-to-use sizing guide and fitting tools is also critical to ensure customers obtain the correct product fit.

B. The best selling product and its constitution to the following status: order_refunded and refund

Fig 4. Top 10 selling products and its average selling price
Fig 5. Top 10 selling products and its count of orders with status “order_refunded” and “refund”
SELECT DISTINCT TOP 10 [sku], [category_name_1]
, SUM([qty_ordered]) OVER (PARTITION BY [sku]) AS 'ordered_cnt'
, (
SELECT ISNULL(SUM([qty_ordered]), 0) FROM [dbo].[EcomDataset]
WHERE [sku] = A.[sku] AND [status] = 'order_refunded'
) AS 'order_refunded_cnt'
, (
SELECT ISNULL(SUM([qty_ordered]), 0) FROM [dbo].[EcomDataset]
WHERE [sku] = A.[sku] AND [status] = 'refund'
) AS 'refund_cnt'
FROM [dbo].[EcomDataset] A
WHERE [status] IN ('complete', 'paid')
ORDER BY 'ordered_cnt' DESC

The best-selling products constitute mostly inexpensive items. For low-priced items or large ones that would incur hefty shipping fees, it is typically cheaper to refund the purchase price and allow buyers to retain the items, especially when the product cannot be replenished or resold, or when the expense of enabling returns is significant.

C. Customers with a high refund rate, are they also the most profitable customers?

Fig 6. Total refund cost by customers. Highlighted (in gold) are the top spenders.
Fig 7. (Continues) Total refund cost by customers. Highlighted (in gold) are the top spenders.
SELECT DISTINCT A.[Customer_ID]
, ROUND(ISNULL(SUM(A.[grand_total]) OVER (PARTITION BY A.[Customer_ID]), 0), 2) AS 'refund_total'
, B.TOTAL
, DENSE_RANK() OVER (ORDER BY B.TOTAL DESC) AS 'RNK_TOP_CUST'
FROM [dbo].[EcomDataset] A
INNER JOIN (
SELECT DISTINCT [Customer_ID], SUM([grand_total]) OVER (PARTITION BY [Customer_ID]) AS 'TOTAL'
FROM [dbo].[EcomDataset]
WHERE [status] IN ('complete', 'paid')
) B ON A.[Customer_ID] = B.[Customer_ID]
WHERE [status] IN ('order_refunded', 'refund')
ORDER BY 'refund_total' DESC

In terms of the top 100 customers who did the most refund, value wise, approximately 44 percent are among the top 100 spenders, and depending on how we analyze this — high spenders naturally incur a higher percentage of the refund — we may want to look into lowering the refund rate for this group of customers to increase business revenue.

Alternatively, emphasize the most remarkable customer experience regarding refund procedure to encourage more lavish spending overall for consumers. Companies like ASOS are a fantastic example where they take pleasure in offering a tremendous returns experience.

Conclusion

Overall, while we were limited by the availability of data for this study, such as information on the reasons for returns or refunds, we discovered some intriguing facts that can be based on to begin adjustments to the process and policy to match company profitability.

Awesome!

I appreciate your time in reading this paper, and I know you’re busy, so I’ll try to make it concise and simple to understand. I hope you find this study informative in terms of my data abilities, and I’d be happy to discuss it more and hear your thoughts.

Applied Software:
Azure Data Studio 1.29.0
Tableau Public 2021.1
Microsoft Excel 16.51 (Build 21071101)
Google Doc

¹The dataset contains detailed information of millions of e-commerce orders in Pakistan from March 2016 to August 2018. https://www.kaggle.com/zusmani/pakistans-largest-ecommerce-dataset

--

--

Ryan Lin
0 Followers

Possess great enthusiasm to be a rockstar in product management. Interested in telling stories from my learnings. http://www.imryanlin.com/