Data science teams are not so different from other teams, you might think. So why do so many of their projects go wrong?
A quick Google search for “why data science projects fail” makes two things immediately clear:
- You are not the only one asking this question!
- The listed reasons are the ones you would expect whenever a team in a business fails: goals were set badly, or communication between teams is poor.
Why is it that data science teams seem particularly vulnerable to failure? A Gartner survey puts the failure rate of data projects at nearly 85 percent, suggesting that the problem runs deep.
1. The key is communication
First and foremost, poor communication causes data projects to fail. For example, if the project's intentions do not match the goals of the executives and no effort is made to reconcile different expectations within the company, there is little chance of completing the project successfully.
In addition to sometimes excessive expectations and uncoordinated goals and intentions, there is also the opposite problem: although "Big Data" is one of the biggest buzzwords of the 2010s, data science is too often treated as something a company does simply to avoid falling behind a trend.
2. Growth, but at what cost?
A blog post on the Dataiku website noted that the Glassdoor portal named data scientist the best job in the US. The immense growth in data-related fields in recent years also deserves emphasis: LinkedIn recently published a study showing nearly tenfold growth in machine learning jobs between 2012 and 2017, and sixfold growth in the number of jobs for data scientists.
Recruiting new employees and assembling teams are the first hurdles to overcome. It may be tempting to staff the data area exclusively with PhD-level experts. But complementary skills and individual perspectives offer many opportunities that remain untapped if the teams are too homogeneous.
3. Technological problems
Another major weakness in data projects is the use of technologies that are inappropriate for the particular project. For example, Hadoop is only useful in certain circumstances, depending on the amount of data stored, the types of data structures, and the purpose of the data. Anyone who picks a SQL BI tool even though it fits the project poorly ultimately hurts themselves. The same goes for programming languages and other data infrastructure decisions.
In addition, many data teams start modeling even though they have committed the deadly sin of data analysis: using unclean data.
Kaggle recently conducted a survey in which nearly half of respondents named dirty data as a significant barrier at work. Models trained on dirty data cannot provide meaningful insights, and improper data cleansing almost guarantees a failed data project.
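As a rough illustration of what checking for dirty data can look like before any modeling starts, here is a minimal sketch in Python. The function name `audit_rows`, the list of missing-value markers, and the sample customer records are all assumptions made up for this example; a real project would adapt the checks to its own data.

```python
from collections import Counter

# Strings treated as "missing" in this sketch (an assumption; adjust per dataset).
MISSING_MARKERS = {"", "na", "n/a", "null", "none", "nan"}


def is_missing(value):
    """Return True if the value is None or looks like a missing-value marker."""
    return value is None or str(value).strip().lower() in MISSING_MARKERS


def audit_rows(rows, numeric_fields=()):
    """Count missing values, non-numeric entries, and exact duplicate rows.

    rows is a list of dicts (one dict per record); numeric_fields names the
    columns that are expected to hold numbers.
    """
    fields = sorted({key for row in rows for key in row})
    missing = Counter()
    non_numeric = Counter()

    for row in rows:
        for field in fields:
            value = row.get(field)
            if is_missing(value):
                missing[field] += 1
            elif field in numeric_fields:
                try:
                    float(value)
                except (TypeError, ValueError):
                    non_numeric[field] += 1

    # A row is a duplicate if an identical row appeared earlier.
    seen = Counter(tuple(row.get(field) for field in fields) for row in rows)
    duplicate_rows = sum(count - 1 for count in seen.values())

    return {
        "rows": len(rows),
        "missing": dict(missing),
        "non_numeric": dict(non_numeric),
        "duplicate_rows": duplicate_rows,
    }


# Hypothetical sample records, invented for illustration only.
sample = [
    {"customer_id": "1", "age": "34", "country": "DE"},
    {"customer_id": "2", "age": "unknown", "country": "FR"},
    {"customer_id": "3", "age": "", "country": "N/A"},
    {"customer_id": "1", "age": "34", "country": "DE"},
]

report = audit_rows(sample, numeric_fields=("age",))
print(report)
```

A report like this does not clean anything by itself, but it makes the extent of the problem visible before a model is trained on the data.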
4. All roads lead to o16n
Ultimately, all these difficulties come together to create one big problem: too often, data teams are formed and assigned tasks without any goal of operationalization (o16n) in mind. If data is analyzed as an end in itself, without any tangible business value, nobody will ever be able to really use the results.
So it's up to companies to make sure they do not set their teams up to fail. It's well worth the effort, because the field of data science will continue to grow.