When it comes to data, sharing isn’t always caring.
Yes, freer data flows to departments like marketing, sales, and human resources can do much to support better decision-making, enhance the customer experience, and, ultimately, improve business outcomes. But they also have serious implications for security and compliance.
This article will discuss why, then present three core principles for secure data integration.
Democratizing access to data: An important note
On the market today is an incredible range of no-code and low-code tools for moving, sharing, and analyzing data. Extract, transform, load (ETL) and extract, load, transform (ELT) platforms, iPaaS platforms, data visualization applications, and databases as a service can all be used relatively easily by non-technical professionals with minimal oversight from administrators.
Furthermore, the number of SaaS applications that businesses use today is continuously growing, so the need for self-service integration will likely only increase.
Many such applications, such as CRM and ERP systems, contain sensitive customer data, payroll data, invoicing data, etc. These tend to have tightly controlled access levels, so as long as the data stays inside them, there shouldn’t be much of a security risk.
However, once you extract the data from these environments and feed it to downstream systems with completely different access controls, you get what we might call “access control bias.”
People who work with ERP data in one warehouse, for example, may not have the same level of trust from company management as the original ERP operators. So by simply connecting your app to a data warehouse — which is becoming increasingly necessary — you run the risk of leaking sensitive data.
This can lead to violations of regulations such as GDPR in Europe or HIPAA in the US, as well as of requirements for data security certifications such as SOC 2 Type 2, not to mention the loss of stakeholder trust.
Three principles for secure data integration
How do you prevent unnecessary flows of sensitive data to downstream systems? How do you keep data safe when it does need to be shared? And in the event of a potential security incident, how do you ensure that any damage is minimized?
These questions will be addressed by the following three principles.
Separate data storage, processing, and visualization

By separating the functions of data storage, processing, and visualization, businesses can reduce the risk of data breaches. Let’s illustrate how this works with an example.
Imagine that you are an e-commerce company. Your main production database, connected to your CRM, payment gateway, and other apps, stores all your inventory, customer, and order information. As your company grows, you decide it’s time to hire your first data scientist. Naturally, the first thing they do is request access to a dataset with all of the above information so they can build data models, such as how weather impacts the ordering process or which items are most popular in a particular category.
However, it is not very practical to give the data scientist direct access to your main database. Even with the best of intentions, they could, for example, export sensitive customer data from that database to a dashboard viewable by unauthorized users. Also, running analytic queries on the production database can slow it down to the point of inoperability.
The solution to this problem is to clearly define which types of data need to be analyzed, then use a data replication technique to copy that data into a secondary repository designed specifically for analytics workloads, such as Redshift, BigQuery, or Snowflake.
This way, you prevent sensitive data from flowing downstream to the data scientist, while providing them with a secure sandbox environment that is completely separate from your production database.
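As a rough illustration of this separation, here is a minimal Python sketch that replicates only non-sensitive order columns into a dedicated analytics store. SQLite stands in for a real production database and warehouse, and all table and column names are hypothetical:

```python
import sqlite3

def replicate_for_analytics(prod, analytics):
    """Copy order rows, minus customer PII, into the analytics store."""
    analytics.execute(
        "CREATE TABLE IF NOT EXISTS orders_analytics "
        "(order_id INTEGER, item TEXT, amount REAL, order_date TEXT)"
    )
    # The SELECT deliberately leaves out the customer_email column.
    rows = prod.execute(
        "SELECT order_id, item, amount, order_date FROM orders"
    ).fetchall()
    analytics.executemany(
        "INSERT INTO orders_analytics VALUES (?, ?, ?, ?)", rows
    )
    analytics.commit()
    return len(rows)

# Simulated production database, including a sensitive column.
prod = sqlite3.connect(":memory:")
prod.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_email TEXT, "
    "item TEXT, amount REAL, order_date TEXT)"
)
prod.execute(
    "INSERT INTO orders VALUES "
    "(1, 'jane@example.com', 'umbrella', 12.5, '2023-04-01')"
)

analytics = sqlite3.connect(":memory:")
copied = replicate_for_analytics(prod, analytics)
```

The analytics database never even has a column for the email address, so nothing downstream of it can leak that field.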
Use data exclusion and data masking techniques
These two processes also help separate concerns because they completely block the flow of sensitive information to downstream systems.
In fact, most data security and compliance issues can be resolved at the very moment data is extracted from an application. After all, if there’s no good reason to send a customer’s phone number from your CRM to your production database, why bother?
The idea of data exclusion is simple: If you have a system in place that allows you to select subsets of data to extract, such as an ETL tool, you simply don’t select the subsets containing sensitive data.
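A minimal sketch of what data exclusion looks like at extraction time. The field names and the denylist are assumptions; in a real ETL tool this selection would typically be configured declaratively rather than coded by hand:

```python
# Hypothetical denylist of sensitive fields; in practice this would be
# driven by your data classification policy.
SENSITIVE_FIELDS = {"email", "phone", "ssn"}

def exclude_sensitive(record):
    """Drop sensitive fields from a record before it leaves the source system."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}

crm_record = {"customer_id": 42, "email": "jane@example.com", "plan": "pro"}
safe_record = exclude_sensitive(crm_record)  # {"customer_id": 42, "plan": "pro"}
```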
Of course, there are some situations when sensitive data needs to be extracted and shared. This is where data masking/hashing comes in.
For example, let’s say you want to calculate a customer health score, and the only logical identifier is the customer’s email address. This will require you to extract this information from your CRM to your downstream systems. To keep it secure end to end, you can mask or hash it upon extraction. This preserves the uniqueness of the information while making the sensitive part unreadable.
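The email-hashing step could be sketched like this. The secret key and the lowercasing rule are assumptions; a keyed hash (HMAC) is used rather than plain SHA-256 so the result is harder to reverse with a dictionary of known emails:

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical pipeline secret

def hash_email(email):
    """Return a stable, unreadable identifier derived from an email address."""
    normalized = email.strip().lower()  # same customer -> same hash
    return hmac.new(SECRET_KEY, normalized.encode(), hashlib.sha256).hexdigest()
```

Because the hash is deterministic, downstream systems can still join records and count unique customers, but the address itself never leaves the source system in readable form.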
Both data exclusion and data masking/hashing can be achieved with an ETL engine.
As an additional note, it’s worth mentioning that ETL tools are generally considered more secure than ELT tools because they allow data to be masked or hashed before it is loaded into the target system. For more information, refer to this detailed comparison of ETL and ELT tools.
Keep robust auditing and logging systems in place
Finally, be sure to have systems in place that allow you to understand who is accessing the data and how and where the data is being transferred.
This is, of course, important for compliance as many regulations require organizations to demonstrate that they are tracking access to sensitive data. But it’s also necessary to quickly detect and react to any suspicious behavior.
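As a rough illustration, an application-side audit trail might record who accessed what and when. This is an in-memory sketch with made-up names; a production system would ship these events to durable, tamper-evident storage:

```python
from datetime import datetime, timezone

class AuditTrail:
    """Minimal in-memory audit log of data-access events."""

    def __init__(self):
        self.events = []

    def record(self, user, action, resource):
        """Append one timestamped access event."""
        self.events.append({
            "user": user,
            "action": action,
            "resource": resource,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def events_for(self, resource):
        """Answer forensic questions like: who touched this dataset?"""
        return [e for e in self.events if e["resource"] == resource]

trail = AuditTrail()
trail.record("data_scientist_1", "read", "orders_analytics")
trail.record("etl_service", "write", "orders_analytics")
trail.record("admin", "read", "billing")
```

With even this much in place, a suspicious query against `orders_analytics` can be traced back to a specific user and time.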
Auditing and logging is both the internal responsibility of the companies themselves and the responsibility of the data tool providers, such as pipeline solutions, data warehouses, and analytics platforms.
So, when evaluating such tools for inclusion in your data stack, it is important to check whether they offer audit logging capabilities, role-based access control, and other security mechanisms such as multi-factor authentication (MFA). SOC 2 Type 2 certification is also something to look for, as it is the standard for how digital companies should handle customer data.
This way, if a potential security incident occurs, you’ll be able to conduct forensic analysis and minimize the damage.
Access vs. Security: Not a zero-sum game
Over time, businesses will increasingly face both the need to share data and the need to keep it safe. Fortunately, meeting one of these needs does not mean neglecting the other.
The three principles outlined above can underpin a secure data integration strategy in organizations of all sizes.
First determine what data can be shared, then copy it to a secure sandbox environment.
Second, whenever possible, keep sensitive datasets in the source system by excluding them from pipelines, and make sure to hash or mask any sensitive data that does need to be extracted.
Third, ensure that your business itself and the tools in your data stack have robust logging systems in place, so that if something goes wrong, you can minimize damage and investigate properly.
Petr Nemeth is the founder and CEO of Dataddo.