Data Ops

Data Ops - The Ops terminology for Data

Automation of data pipelines and collaboration between teams are two of the key characteristics of “modern” DataOps.

DataOps seeks to reduce the end-to-end cycle time of data analytics, from the origin of ideas to the literal creation of charts, graphs, and models that create value. It is inspired by DevOps: concepts like continuous integration, delivery, and operations are now being applied to the process of productionizing data work (engineering, science, etc.).

DataOps - Principles

Minimal Disruption

Minimize disruption to data producers in how they deliver their data

Configuration (80/20)

Focus on the 80% of use cases that can be satisfied with configurable components

Tool Decoupling

Today’s tool may not be the right tool tomorrow
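
One way to keep pipeline logic independent of a specific tool is to code against a small interface and keep each tool behind its own adapter. The sketch below is only an illustration: `Storage`, `InMemoryStorage`, and `copy_table` are hypothetical names, not part of any particular product.

```python
from typing import Protocol


class Storage(Protocol):
    """Hypothetical interface: the only thing pipeline code is allowed to know about."""

    def write(self, name: str, rows: list) -> None: ...

    def read(self, name: str) -> list: ...


class InMemoryStorage:
    """Stand-in adapter; a warehouse or object-store adapter would expose the same methods."""

    def __init__(self):
        self._tables = {}

    def write(self, name, rows):
        self._tables[name] = list(rows)

    def read(self, name):
        return list(self._tables[name])


def copy_table(storage: Storage, source: str, target: str) -> int:
    """Pipeline step written against the interface, not against a specific tool."""
    rows = storage.read(source)
    storage.write(target, rows)
    return len(rows)
```

Swapping tools then means writing a new adapter with the same two methods; `copy_table` itself would not need to change.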

Data Residency

Users should access the data where it lives, regardless of where the users are located

Right Tool for the Right Job

Process drives tooling, not the other way around

DataOps - Steps

Add Data and Logic Tests

Are your data inputs free of issues?
Is your business logic still correct?
Are your outputs consistent?
Document of Interest of Data Sources (DIDS)
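
The three questions above could be turned into automated checks along the following lines. This is a minimal sketch under stated assumptions: the `Order` record, the revenue rule, and the 50% tolerance are all hypothetical and stand in for whatever data and logic a real pipeline has.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical input record used only for this illustration.
    order_id: int
    quantity: int
    unit_price: float


def check_inputs(orders):
    """Input test: return a list of problems found in the raw records."""
    problems = []
    seen = set()
    for o in orders:
        if o.order_id in seen:
            problems.append(f"duplicate order_id {o.order_id}")
        seen.add(o.order_id)
        if o.quantity <= 0:
            problems.append(f"non-positive quantity in order {o.order_id}")
        if o.unit_price < 0:
            problems.append(f"negative unit_price in order {o.order_id}")
    return problems


def total_revenue(orders):
    """Business logic under test: revenue is quantity times unit price."""
    return sum(o.quantity * o.unit_price for o in orders)


def check_output(revenue, previous_revenue, tolerance=0.5):
    """Output test: True if revenue moved at most `tolerance` (a fraction) since the last run."""
    if previous_revenue == 0:
        return revenue == 0
    return abs(revenue - previous_revenue) / previous_revenue <= tolerance
```

A pipeline run would stop, or raise an alert, when `check_inputs` returns any problems or `check_output` returns `False`.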

End to End Version Control

Use Git. Analytic work is just code

Branch & Merge

Use Git branching techniques. Branching and merging enable people to safely work on their own tasks

Use CI/CD and Environments

Automation between DEV > QA > PROD (Continuous Integration). Team members need a controlled environment for their experiments
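
The DEV > QA > PROD flow can be pictured as a promotion gate: a change moves to the next environment only when its automated tests pass. The sketch below is a simplified illustration, not a real CI/CD configuration; `ENVIRONMENTS`, `next_environment`, and `promote` are hypothetical names.

```python
# Ordered list of environments, taken from the DEV > QA > PROD flow.
ENVIRONMENTS = ["DEV", "QA", "PROD"]


def next_environment(current):
    """Return the environment after `current`, or None if it is the last one."""
    idx = ENVIRONMENTS.index(current)
    return ENVIRONMENTS[idx + 1] if idx + 1 < len(ENVIRONMENTS) else None


def promote(current, tests_passed):
    """Return the environment a change should be in after this step."""
    if not tests_passed:
        return current  # failing tests block promotion
    target = next_environment(current)
    return target if target is not None else current
```

In practice a CI/CD service would run the tests and perform the deployment; the function only captures the rule that promotion is conditional on passing tests.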

Parametrization

With parameters, you can vary the inputs, outputs, and steps in the data pipeline
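
As a rough sketch of this idea, a pipeline can take its input, its output, and its list of steps as parameters instead of hard-coding them. The step names and the `run_pipeline` function below are hypothetical and chosen only for illustration.

```python
def drop_empty(rows):
    """Example step: remove empty strings."""
    return [r for r in rows if r]


def uppercase(rows):
    """Example step: upper-case every value."""
    return [r.upper() for r in rows]


# Registry of available steps, looked up by name.
STEPS = {"drop_empty": drop_empty, "uppercase": uppercase}


def run_pipeline(read_input, write_output, step_names):
    """Run the named steps in order between a caller-supplied input and output."""
    rows = read_input()
    for name in step_names:
        rows = STEPS[name](rows)
    write_output(rows)
    return rows


# Usage: the same function runs against different inputs, outputs, and steps.
collected = []
result = run_pipeline(lambda: ["a", "", "b"], collected.extend, ["drop_empty", "uppercase"])
```

Because nothing is hard-coded, the same pipeline definition can run against a test file in one environment and a production source in another.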