
How Observability Designed for Data Teams Can Unlock the Promise of DataOps




Today, it is not an exaggeration to say that every company is a data company. And if they aren't, they need to be. That's why many organizations are investing in the modern data stack (think: Databricks, Snowflake, Amazon EMR, BigQuery, Dataproc).

However, these new technologies and the growing business importance of data initiatives present significant challenges. Today's data teams not only have to deal with huge volumes of data ingested daily from a variety of sources, but they also have to manage and monitor thousands of interconnected and interdependent data applications.

The biggest challenge is managing the complexity of the interwoven systems we call the modern data stack. And as anyone who has spent time working with data knows, decoding data application performance, controlling cloud costs, and mitigating data quality issues is no easy task.

When something breaks in these Byzantine data pipelines, without a single source of truth to refer back to, the finger-pointing starts: data scientists blame operations, operations blames engineering, engineering blames developers, and so on.


Is it the code? Insufficient infrastructure resources? A scheduling problem? There is no single source of truth for everyone to rally around; everyone uses their own tools and works in silos. Different tools give different answers, and untangling the threads to get to the root of the problem takes hours (even days).

Why modern data teams need a modern approach

Today's data teams face many of the same challenges that software teams have faced: fragmented teams working in silos, under the gun to deliver more, faster, without enough people, in an increasingly complex environment.

Software teams have successfully tackled those obstacles through the discipline of DevOps. A key ingredient of successful DevOps teams is the observability provided by a new generation of application performance management (APM) tools. Software teams can accurately and efficiently diagnose the root cause of problems, collaborate from a single source of truth, and empower developers to address problems early, before software even goes into production, without having to throw problems over the fence to another team.

So why do data teams struggle where software teams have succeeded? After all, they are essentially using the same kinds of tools to solve the same kinds of problems.

Because, despite surface similarities, observability for data teams is a completely different animal than observability for software teams.

Cost control is critical

First, consider that in addition to understanding the performance and reliability of data pipelines, data teams also grapple with the question of data quality: how can they be confident that they are providing high-quality inputs to their analytics engines? And, as more workloads move to public clouds, it's important that teams can understand their data pipelines through a cost lens.

Unfortunately, data teams have a hard time getting the information they need. Different groups have different questions they need answered, and each is narrowly focused on solving its own particular piece of the puzzle, using its own particular tools and strategies. Different tools give different answers.

Troubleshooting is a challenge. A problem can occur anywhere along a highly interconnected, complex application/pipeline, for any of a thousand reasons. And while web application observability tools have their place, they were never meant to capture and correlate the performance details buried in the components of a modern data stack, or to untangle the wires between the upstream and downstream dependencies of a data application.

Additionally, as more data workloads move to the cloud, the costs of operating data pipelines can quickly spiral out of control. An organization running more than 100,000 data jobs in the cloud has countless decisions to make about where, when, and how to run them. And each decision carries a price tag.

As organizations relinquish centralized control over infrastructure, it is essential for both data engineers and FinOps teams to understand where the money is going and to identify opportunities to reduce and control costs.

Much of the needed visibility is hidden in plain sight

To gain insight into data performance, cost, and quality, data teams are forced to aggregate information from a variety of tools. And as organizations scale their data stacks, the sheer volume of information (and resources) makes it extremely difficult to see the whole forest when you're sitting in a tree.

Most of the necessary details are available; unfortunately, they are often hidden in plain sight. Each tool provides some essential information, but not all of it. What is needed is observability that brings all these details together and presents them in a context that makes sense and speaks the language of data teams.

Observability designed from the ground up specifically for data teams allows them to see how everything fits together as a whole. And while there are many proprietary, open-source, and cloud-vendor-specific observability tools that provide insight into a particular layer or system, the ideal solution is full-stack: monitoring that brings it all together into one workload-aware context. Solutions that leverage AI can also show not only where and why a problem exists, but also how that problem affects other data pipelines, and ultimately, what to do about it.

Just as DevOps observability provides the underlying foundation for improving the speed and reliability of the software development lifecycle, DataOps observability can do the same for the data application/pipeline lifecycle. But (and this is a big but) DataOps observability as a technology must be designed from the ground up to meet the distinct needs of data teams.

DataOps observability cuts across multiple domains:

  • Application/pipeline/data model observability ensures that data analytics applications/pipelines run on time, every time, without errors.
  • Operational observability enables data teams to understand how the entire platform is performing end to end, providing a unified view of how everything works together, both horizontally and vertically.
  • Business observability consists of two parts: value and cost. The first is about ROI: monitoring and correlating the performance of data applications with business results. The second is FinOps observability, where organizations use real-time data to manage and control their cloud costs, understand where money is being spent, establish budget guardrails, and identify optimization opportunities to reduce costs.
  • Data observability examines the datasets themselves, running quality checks to ensure accurate results. It tracks the lineage, usage, integrity, and quality of the data.
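
To make the "data observability" bullet concrete, here is a minimal, hypothetical sketch of a batch-level quality check: validating a set of records against simple rules (row count, null rate) before they feed downstream analytics. The function name, field name, and thresholds are illustrative assumptions, not from any specific tool.

```python
def check_batch(rows, min_rows=100, max_null_rate=0.05):
    """Return a list of human-readable quality violations (empty list = pass)."""
    violations = []
    # Rule 1: the batch should contain at least min_rows records.
    if len(rows) < min_rows:
        violations.append(f"row count {len(rows)} below minimum {min_rows}")
    # Rule 2: the 'amount' field should be null in at most max_null_rate of rows.
    if rows:
        null_rate = sum(1 for r in rows if r.get("amount") is None) / len(rows)
        if null_rate > max_null_rate:
            violations.append(f"null rate {null_rate:.1%} exceeds {max_null_rate:.0%}")
    return violations

# A batch of 153 records, 3 of which have a null 'amount' (~2% null rate): passes.
batch = [{"amount": i} for i in range(150)] + [{"amount": None}] * 3
print(check_batch(batch))  # → []
```

In practice these checks would run automatically on every pipeline stage, with violations surfaced alongside lineage so teams can see which downstream applications a bad batch would affect.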

Data teams cannot tackle these domains in isolation, because the problems in the modern data stack are interrelated. Without a unified view of the entire data estate, the promise of DataOps will go unfulfilled.

Observability for the modern data stack

Extracting, correlating, and analyzing everything at a foundational level in a data-centric, workload-aware context delivers five capabilities that are the hallmarks of complete DataOps observability:

  • End-to-end visibility correlates telemetry and metadata from the entire data stack to provide unified, in-depth insight into the behavior, performance, cost, and health of your data and data workflows.
  • Situational awareness puts this aggregated information into a meaningful context.
  • Actionable intelligence tells you not only what is happening but also why. Next-generation observability platforms go a step further and provide prescriptive, AI-powered recommendations on what to do next.
  • Automation: everything above either happens through, or enables, a high degree of automation.
  • Proactive governance puts this into practice, as the system automatically applies the recommendations, with no human intervention required.
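
As a rough illustration of the "end-to-end visibility" capability, the sketch below correlates per-run telemetry from a scheduler with per-job cost records from a billing export, keyed by job name, and flags jobs whose latest run is both markedly slower and markedly more expensive than its baseline. All names, data shapes, and thresholds are assumptions for illustration; a real platform would ingest this from many systems automatically.

```python
telemetry = [  # hypothetical per-run performance data, e.g. from a scheduler
    {"job": "daily_etl", "runtime_min": 42, "baseline_min": 30},
    {"job": "ml_feature_build", "runtime_min": 18, "baseline_min": 20},
]
costs = {  # hypothetical per-job cloud spend, e.g. from a billing export
    "daily_etl": {"usd": 95.0, "baseline_usd": 60.0},
    "ml_feature_build": {"usd": 12.0, "baseline_usd": 15.0},
}

def flag_regressions(telemetry, costs, slowdown=1.2, overspend=1.2):
    """Flag jobs that are both >20% slower and >20% more expensive than baseline."""
    flagged = []
    for run in telemetry:
        cost = costs.get(run["job"])
        if cost is None:
            continue  # no cost record to correlate with; skip this run
        slow = run["runtime_min"] > slowdown * run["baseline_min"]
        pricey = cost["usd"] > overspend * cost["baseline_usd"]
        if slow and pricey:
            flagged.append(run["job"])
    return flagged

print(flag_regressions(telemetry, costs))  # → ['daily_etl']
```

The point is the correlation itself: neither the runtime numbers nor the billing numbers alone tell the story; joined in one workload-aware context, they pinpoint where performance and cost problems coincide.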

As more innovative technologies make their way into the modern data stack, and more workloads move to the cloud, a unified DataOps observability platform with the flexibility to absorb growing complexity and the intelligence to provide solutions becomes ever more necessary. That's true DataOps observability.

Chris Santiago is VP of solutions engineering at Unravel.

DataDecisionMakers

Welcome to the VentureBeat community!

DataDecisionMakers is a place where experts, including those who work with data, can share data-related insights and innovations.

If you want to read about cutting-edge ideas and updates, best practices, and the future of data and data technology, join us at DataDecisionMakers.

You can even consider contributing an article of your own!

Read more from DataDecisionMakers
