This last summer, I did an internship for Lands’ End with the Global Inventory Strategy group. At Lands’ End, they plan on starting a line of clothing at full price, and then periodically marking down the price throughout the season. This is to ensure they move through their entire lot of inventory in the time period they have allocated. Right now, the decision about when and how much the markdowns should be is entirely arbitrary based on past experience, but they’d like to introduce more data analysis into the decision. The ability to optimize the markdown schedule will potentially make a difference of tens of millions of dollars per year.

Lands’ End has been working off-and-on for the last few years trying (unsuccessfully) to create a demand model showing price elasticities. This will tell them if they change the price by X% that the sales units should change by Y%. They have amassed an enormous amount of sales data, but there are so many factors that affect demand that they’ve been unable to build a model with the data they have (*e.g.*, time of year, time of month, temperature, marketing efforts, email campaigns, ad campaigns, competitor behaviors, clothing attributes, etc.). They looked at purchasing a commercial piece of software to accomplish this, but with a price tag of $1M, they didn’t have the budget for it. That’s where I come in…

I first brought in over 260M records of sales data. I then wrote a script to scrape historical weather data from Wunderground.com, and used Topo.ly to match the weather stations to the nearest zip code. I then pulled all this weather-data-by-zip-code into my massive data set. I then took all the marketing data and pulled in media drop dates, catalog circulation quantities, etc. Since I was only interested in price elasticity of demand and not actual demand, I had to come up with some way I could compare change in quantity of a $5 item with a $100 item. So I normalized the prices for each item by it’s average sales price over the last 6 years and thus created a price index. I also had to perform a bunch of other data manipulations to get all the variables I wanted to consider.

With this massive data set (over 100GB of data), I built a multivariate regression script to perform a multi-threaded regression on all 260M data points with all 60 variables. I looked at all first- and second-order variables and all first-order interactions, giving over 2000 inputs to the model. I then checked things like goodness-of-fit, correlation, heteroskedasticity, statistical significance, etc. The regression used least-squares, and I had three different regression algorithms (linear algebra, simulated annealing optimization, and quasi-Newton optimization). I had to run this script using a massive server with 128GB of memory and 16 CPU cores, which I essentially maxed out for about a week.

Upon completion of my regression, I then built a Markdown Price Optimization Tool (as in the screenshot above) that would import this statistical model, convert it from being based on marginal sales data to a cumulative overall demand curve, index the output quantities, and then perform an optimization on the actual markdown schedule. To use the tool, I input the clothing category, projected future media drops, average selling price, the quantity I have in inventory, the start date, and the number of weeks I have left to move through my inventory. I then click the “Optimize” button, and the tool performs an optimization that tells me how much and when I should markdown the price in order to maximize my revenue while still ensuring I move through all my inventory in the time allotted. The output is given in 5 different charts (weekly price, weekly, revenue, cumulative revenue, weekly quantity, and inventory level) and in a data table that can be copied to Excel.

This is a really fun and very data-intensive project to work on. I’m quite pleased with the result. I had the privilege of showing it to the vice president that oversees the planning teams, and nobody had ever seen him so excited about anything before. It has the potential to make a difference of tens of millions of dollars.