Tuesday, February 18, 2014
Data is playing an increasingly important role in the business world and in our daily lives. Almost everyone generates data at an explosive rate. From the moment you wake up, your alarm clock app records your wake-up time; while you eat breakfast and read the news on your iPad, your news app records which stories you click and other information about you. This goes on all day. Companies have collected massive amounts of data, and their data sets keep exploding in size. So the big problem companies now face is: how do they deal with all this data?
A typical reason companies are interested in this data is that they want to learn about their customers so they can provide better service and reduce costs. Better service can mean better content, better timing, and fully stocked racks; cost reductions can come from lower inventory thanks to more accurate forecasts, or from better-targeted marketing initiatives that yield a better ROI, higher conversion rates, and so on. But for many companies this is still just a beautiful dream and vision. Let’s take a look at why.
First, data collection. Companies try very hard to collect data from various sources. Let’s take a CPG (consumer packaged goods) company as our model. It collects most of its customer and POS (point-of-sale) data from retailers. Additionally, it purchases data from the market, from sources such as Nielsen data, social media data and so on. There is a high chance that data collected this way is not ready to use. By that, I mean there are missing values, wrong values, and different types of data mixed together. Until those problems are fixed, companies aren’t going to gain any insights from it.
To give an example, I worked on a predictive modelling project with one of the major CPG companies. The data they provided was of very low quality: there was no clear documentation of the data, there were lots of missing values, lots of related data was stored separately, and so on. Processing such data carries the risk of losing some of its value. Sometimes analysts have to replace missing values with the mean of the data set; sometimes they have to remove whole sets of data because of obvious outliers or missing key values. So the first problem comes from the data collection process.
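To make those two clean-up steps concrete, here is a minimal sketch in Python. It is not the code from that project: the function names `impute_with_mean` and `remove_outliers`, the sample numbers, and the 1.5 × IQR cutoff for "obvious outliers" are all assumptions chosen for illustration.

```python
from statistics import mean, quantiles


def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR].

    The k=1.5 cutoff is an assumed rule of thumb, not a rule from the project.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    spread = q3 - q1                    # interquartile range (IQR)
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


# Hypothetical weekly sales figures with one gap and one suspicious spike.
print(impute_with_mean([10, None, 14]))
print(remove_outliers([10, 12, 11, 13, 12, 500]))
```

Both steps throw information away: the filled-in value hides the fact that a measurement was missing, and the dropped spike might have been a real event. That is the risk of losing value during processing.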
Second, data utilization. In many cases, companies do not trust the data when making big decisions. This is a result of bad data quality, but it is also a reason why data quality stays poor. Success in this field requires awareness of, and emphasis on, data analytics from senior management. At the company mentioned above, senior management had only just green-lighted an experimental predictive modelling project with the CMU team. That is good news, but a lot of companies did this long ago.
Third, integration of the supply chain system. Take that CPG company as an example: if data analysis and visibility exist only at the corporate level, rather than being integrated with all parts of the supply chain, the impact of the analysis will be weakened. Bringing visibility to all supply chain stakeholders requires huge investment and effort.
My question is: who will be the biggest driving force behind this revolution? Is it the companies in the industry, the data analysis experts, or the customers?