Using big data to unlock big business potential – a six-step guide
Big data is a powerful tool for organisations to unlock operational efficiencies, inform new product and service creation, and improve the overall customer experience. Yet many organisations struggle to maximise the potential of their data, most commonly because business leaders fall into the trap of investing in one-off solutions that are driven by technology rather than goals. In this blog, our Director of Data Science, Finn Wheatley, looks at the six steps to building a data science capability throughout your business.
Step 1: Getting started
We recommend starting with one small project. Setting out to solve one incremental problem will help you show value and secure buy-in from the top.
Any organisational change requires careful planning, and setting up a data science team is no exception. It is important to emphasise that data science will enhance the quality of your employees’ work, and that automation will allow them to focus on the elements of their work where human skills are most needed.
The best way to demonstrate value is to find a business problem where you can build a firm case for improvement, identifying measurable and sustained value with high confidence. Often, a good place to start is your customer data, typically a large and under-used dataset. This data could allow you to identify use cases such as classifying high-value customers for marketing purposes. Then, create your first prototype to test the pipeline, develop your team’s skills and generate buy-in from stakeholders. This self-contained product will incorporate the major elements of your project, and provide a platform to develop the professional and technical capability within the team.
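To make the high-value-customer use case concrete, here is a minimal sketch of the kind of rule a first prototype might test. All field names and the quartile cut-off are illustrative assumptions, not a prescribed method – a real project might replace the threshold rule with a trained model.

```python
# Illustrative sketch: flag "high value" customers from simple
# transaction records. Customer names, order values and the
# top-quartile rule are all assumed for illustration.
from statistics import quantiles

customers = {
    "alice": [120.0, 80.0, 200.0],
    "bob": [15.0, 22.0],
    "carol": [300.0, 450.0],
    "dave": [10.0],
}

# Total spend per customer.
totals = {name: sum(orders) for name, orders in customers.items()}

# Label the top quartile of spenders as "high value".
cutoff = quantiles(totals.values(), n=4)[-1]  # 75th percentile
high_value = {name for name, total in totals.items() if total >= cutoff}
```

Even a simple rule like this is enough to exercise the full pipeline end to end – data in, a labelled segment out – which is what the first prototype is for.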
Step 2: Building a data science team
An effective data science team will be made up of people with diverse but overlapping skillsets. This includes data engineers, data visualisation developers, DevOps engineers, software engineers, data architects and analysts, as well as data scientists. Using Agile methods can improve your team’s ways of working and increase throughput. It is important that your team maintains close alignment with your users, as this allows them to respond quickly to the users’ changing priorities.
Agile teams that work in this way can rapidly create proofs of concept and show results – which offers the ability to fail fast and pivot to a more productive line of work if necessary. It is also important to ensure that your team fits your organisation and balances the requisite skills and experience.
Step 3: Building the foundations of your data-driven future
Infrastructure is a critical success factor in building your data science capability: it makes it easier to explore data and to build and deploy data products. There are a number of areas your infrastructure should support:
- Data processing
- ETL pipelines
- Test and deployment
- Data exploration and research
- Software development
- Data visualisation
- Enforcing prescribed compliance, audit, and governance standards
The foundation of your data platform will be a data lake – a central repository which, unlike a traditional data warehouse, accepts many data types and file formats. It is important that metadata are logged on ingest to the data lake, to ensure the data remain discoverable and usable.
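As a sketch of what "metadata logged on ingest" might look like in practice, the snippet below builds a small catalogue record for a file landing in the lake. The field names, path and source-system label are assumptions for illustration; real platforms typically delegate this to a managed data catalogue.

```python
# Minimal sketch (field names assumed): record metadata alongside each
# file ingested into the data lake, so later users can discover the
# file, trace its origin and verify its integrity.
import hashlib
import json
from datetime import datetime, timezone

def ingest_metadata(path: str, raw: bytes, source: str) -> dict:
    """Build a metadata record for a file landing in the data lake."""
    return {
        "path": path,
        "source_system": source,
        "size_bytes": len(raw),
        "sha256": hashlib.sha256(raw).hexdigest(),  # content fingerprint
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

record = ingest_metadata("landing/crm/customers.csv", b"id,name\n1,Alice\n", "crm")
catalog_line = json.dumps(record)  # e.g. append to a metadata catalogue
```

The checksum and timestamp make it possible to answer, months later, where a dataset came from and whether it has changed – exactly the usability the metadata is there to protect.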
Whether to use cloud or on-premise systems for your data platform is an important consideration. We generally recommend the cloud, especially given the managed services that cloud providers offer to support rapid set-up, configurability and scalability.
From the outset, you should also consider the deployment and maintenance of your data product. Addressing this early, rather than in the final stages of development, ensures you can seamlessly integrate and deploy your product. The choice of infrastructure can play a role here. A large advantage of cloud deployment is that it is often much easier to deploy attractive and user-friendly dashboards and visualisations to help demonstrate the value of your first project.
Step 4: The data engineer
Data engineers build pipelines to move data from source systems to the data lake. They also manipulate the raw data to create data assets. The data asset is a key concept in data science, and must meet a number of qualifying criteria:
- It is subject to a data quality and validation process to ensure integrity
- It is typically subject to a further level of processing or aggregation
- It meets a clear business need
- It is usually a composite, drawn from two or more data sources
- It is live data, which is updated on a frequent, usually daily, basis
- It must be a single source of truth – i.e. each data element must be generated in a uniform way.
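The first and last criteria – a validation process and a single source of truth – can be sketched as a quality gate that every refresh of an asset must pass before downstream teams can use it. The schema (`customer_id`, `total_spend`) and the specific rules below are assumptions for illustration.

```python
# Illustrative quality gate for a data asset (schema and rules assumed):
# reject missing or duplicate keys and invalid values before publishing.
def validate_asset(rows: list) -> list:
    """Return a list of data-quality failures; an empty list means the asset is valid."""
    errors = []
    seen_ids = set()
    for i, row in enumerate(rows):
        cid = row.get("customer_id")
        if cid is None:
            errors.append(f"row {i}: missing customer_id")
        elif cid in seen_ids:  # duplicates break the single source of truth
            errors.append(f"row {i}: duplicate customer_id {cid}")
        else:
            seen_ids.add(cid)
        spend = row.get("total_spend")
        if not isinstance(spend, (int, float)) or spend < 0:
            errors.append(f"row {i}: invalid total_spend")
    return errors

clean = [{"customer_id": 1, "total_spend": 400.0}]
dirty = [{"customer_id": 1, "total_spend": -5},
         {"customer_id": 1, "total_spend": 10}]
```

Running the gate on every refresh, rather than once at creation, is what keeps a frequently updated asset trustworthy.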
These data assets can then be combined, resulting in layers of assets that are each in a more highly aggregated and processed form. An example is a customer master data asset, which can be used to answer almost any question asked about a customer, from which marketing techniques are most likely to succeed, to how to mitigate the risk of complaint. It is important for data engineers to be able to update assets as and when new requirements emerge; this brings substantial value to your business by sparing data scientists from manipulating old and messy data each time they develop a new product.
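The layering idea can be sketched as a join of two lower-level assets into a customer master. The asset names, fields and join key below are illustrative assumptions; in production this would typically be a pipeline job over tables in the lake rather than in-memory dictionaries.

```python
# Hypothetical sketch: combine two lower-level assets (spend summary and
# complaints summary) into a higher-level "customer master" asset.
# Field names and the join key are assumed for illustration.
spend_by_customer = {1: 400.0, 2: 37.0, 3: 750.0}
complaints_by_customer = {1: 0, 3: 2}  # customers absent here had no complaints

customer_master = [
    {
        "customer_id": cid,
        "total_spend": spend_by_customer[cid],
        "complaints": complaints_by_customer.get(cid, 0),
    }
    for cid in sorted(spend_by_customer)
]
```

Because the master asset is built once, by the data engineers, every downstream product reads the same pre-joined, validated view instead of re-deriving it from raw sources.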
Step 5: Making the user experience simple
Returning your results in a convenient and intuitive form is a critical factor that often distinguishes data science from analysis or statistics. It is therefore important to spend time ensuring that your results are clear, especially where the dataset is large and complex. The data science team can provide visualisations such as graphs, maps and charts, as well as text, to clearly demonstrate the business value of the product.
Step 6: Call in the experts
Setting up a data science capability offers the potential to deliver substantial operational efficiencies, and it is critical to look at the big picture and focus on the end goal rather than getting distracted by the detail. Building a data science function can show results within the first six months, and enlisting the help of specialists will allow you to focus on the results you need.
Learn more about building a data science function in your business by downloading our whitepaper.
About the author
Finn Wheatley, Director of Data Science
Finn has over a decade of experience working in lead data science and quantitative roles in both the public and private sectors. Following his undergraduate degree from King’s College London, Finn worked for several years in the hedge fund industry in risk management and portfolio management roles. Subsequent to an MSc in Computer Science from University College London, he joined the civil service and helped to establish the data science team at the Department for Work and Pensions (DWP), delivering innovative analytical projects for senior departmental leaders. Since joining Whitehat Analytics, he has been involved in establishing the data science team at EDF Energy.