Data Science – what it is and what it isn’t

Over the last few years, the term “data science” has become ubiquitous across a range of sectors and situations. Data science addresses the need to find insights in the huge amounts of data that have been piling up within companies such as Yahoo, Facebook, and LinkedIn.

Who is the data scientist

Data scientists do the “data wrangling” – hunting for sources of data, joining them together, and cleaning large data sets. They then use their subject matter expertise to analyse the data, extract insights, and share those insights with decision-makers, who can decide to act on them or to create new features for operations or customers.
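The wrangling steps described above (joining sources and cleaning records) can be sketched in a few lines of Python. This is a minimal illustration only: the two in-memory "sources", the field names `customer_id`, `name`, and `amount`, and the helper functions are all hypothetical, and real projects would typically work with much larger data sets and dedicated tooling.

```python
# Hypothetical records from two sources; all names and values are illustrative.
customers = [
    {"customer_id": "1", "name": " Alice "},  # stray whitespace
    {"customer_id": "2", "name": "Bob"},
    {"customer_id": "2", "name": "Bob"},      # duplicate row
    {"customer_id": "3", "name": ""},         # missing name
]
orders = [
    {"customer_id": "1", "amount": "20.50"},
    {"customer_id": "2", "amount": "n/a"},    # unparseable amount
    {"customer_id": "1", "amount": "4.50"},
]


def clean_customers(rows):
    """Trim whitespace, drop rows with no name, and keep the first row per id."""
    cleaned = {}
    for row in rows:
        cid = row["customer_id"].strip()
        name = row["name"].strip()
        if not name or cid in cleaned:
            continue
        cleaned[cid] = name
    return cleaned


def parse_amount(text):
    """Return the amount as a float, or None when the text is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


def join_and_total(customer_rows, order_rows):
    """Join orders to cleaned customers on customer_id and sum valid amounts."""
    names = clean_customers(customer_rows)
    totals = {}
    for order in order_rows:
        cid = order["customer_id"].strip()
        amount = parse_amount(order["amount"])
        if cid not in names or amount is None:
            continue  # skip orders that cannot be joined or parsed
        totals[names[cid]] = totals.get(names[cid], 0.0) + amount
    return totals


totals = join_and_total(customers, orders)
print(totals)
```

The resulting totals per customer are the kind of cleaned, joined summary an analyst might then interpret and share with decision-makers.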


As enterprises mature their big data capabilities, they are finding it increasingly difficult to extract value from their data. Companies that want to capitalize on their early data science success need to address a few drivers. First, they have to consolidate data into a single data lake to avoid data sprawl. For organizations that have already consolidated data into a centralized lake, the next challenge is providing the right level of access to that data. To perform advanced analytics, data scientists require a few things: access to large amounts of data, the ability to augment the existing data with outside data sources, and the ability to model the data using cutting-edge tools and libraries.

Without governance and structure, data lakes quickly become uninhabitable data swamps, with lagoons of unsupported tables. The key here is to find the right balance between giving users the freedom to use certain tools and the ability to experiment while providing a consistent quality of service to the operational environment. Far too many organizations, early in their big data deployments, move quickly to establish data platforms and make technology choices without considering the business strategy along the way.

How to measure the success of data science

Financial – Value measurements are often the easiest way to communicate the success of a data science initiative across organizations. End-to-end data science projects also have software deliverables, which can be measured with software metrics such as defect count.

Where is data science heading in 2021

In the open-source arena supporting data science, detecting trends is largely a matter of monitoring the types of packages being released, especially for data scientists using the R language. New and updated R packages are released at a fast pace.

