In the second part of our series, we illustrate the effort and cost traps that can result from data redundancy, and the important role data virtualization plays in avoiding them.
“The cheapest part of copying data is the hardware on which the data is stored.”
Copying data is expensive, with the most apparent cost factor – the infrastructure for storing the data – often representing the smallest portion of direct and indirect costs. The significantly larger part consists of the subsequent costs arising from the redundant storage of company data. In this post, we will take a closer look at these subsequent costs.
Data Governance Costs
Data governance costs arise because corporate data cannot simply be placed in a data repository to await further use. The requirements for data and its retention are extensive for both business and legal reasons. When storing data, it is crucial to adhere to all relevant compliance regulations (confidentiality of data, privacy of personal data, etc.) and to enforce them through appropriate authorization concepts, their management, and auditing.
For business-critical data, it is particularly important that integrity and availability are ensured at all times. Data security must be guaranteed not only through the authorization checks mentioned above but also through all common and standard IT measures. In addition to encrypting data exchange, this may also include encrypting the data itself.
Data Ownership
Another major issue when copying data is the question of "Data Ownership." This includes not only the responsibility for data correctness but also the authority over the interpretation and consistency ("one version of truth") of the data and logic.
Copying data often requires copying associated business logic, which must be applied separately. This results in multiple copies of seemingly identical logic. However, when the logic is stored multiple times and in a decentralized manner, different strands can evolve independently over time, leading to inconsistencies in the calculation and interpretation of key figures.
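How such drift can look in practice is sketched below. The two functions, their names, and the figures are purely hypothetical; they stand for two copies of the "same" key-figure logic that were maintained separately after the data was copied.

```python
def net_revenue_sales_copy(gross, discount, returns):
    # Copy 1 of the logic (hypothetical): returns are deducted.
    return gross - discount - returns


def net_revenue_finance_copy(gross, discount, returns):
    # Copy 2 (hypothetical), changed later and independently:
    # returns are no longer deducted here.
    return gross - discount


# Illustrative input values, not taken from any real system.
order = {"gross": 1000.0, "discount": 50.0, "returns": 100.0}

sales_figure = net_revenue_sales_copy(**order)
finance_figure = net_revenue_finance_copy(**order)

# Both reports claim to show "net revenue" for the same order.
print(sales_figure, finance_figure)
```

Both copies started from the same definition, yet the two reports now disagree on the same input, which is exactly the kind of inconsistency that later has to be investigated and reconciled.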
Operating Costs of Infrastructure
We have already mentioned the costs for the data storage itself. However, these are only a part – and probably often the smallest part – of the total operating costs of the IT infrastructure. Additional costs include expenses for application servers where the data is stored, costs for necessary network connections and firewalls, as well as costs for monitoring this infrastructure.
The applications themselves also cause initial and recurring costs. Right at the start, in addition to installation costs, there may be licensing costs for the operating system, application servers, and/or database systems, which may be one-time or recurring depending on the licensing model.
Additional costs arise when applying necessary and potentially security-critical updates to the systems, as well as monitoring all software components.
Maintenance and Monitoring of Loading Processes
A significant portion of the costs arises from the monitoring and maintenance of loading processes (ETL: "Extract, Transform, Load"), since these must keep running reliably for the entire duration of data retention.
Loading processes are, depending on the use case, sometimes complex, recurring program executions that process data and copy it from one place to another based on predefined rules and implemented logic. These program executions need to be monitored because, for example, changes in the data source or target can lead to errors in execution, necessitating corrections to the loading process.
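The following minimal sketch shows why such monitoring is needed. It assumes a hypothetical loading step with an expected set of source columns; the column names, the function `load_rows`, and the transformation are illustrative assumptions, not part of any specific ETL tool.

```python
# Hypothetical column contract between source and target (an assumption).
EXPECTED_COLUMNS = {"customer_id", "order_date", "amount"}


def load_rows(source_rows, target):
    """Copy rows into target and fail loudly if the source schema has changed."""
    for row in source_rows:
        missing = EXPECTED_COLUMNS - set(row)
        if missing:
            # Without this check, a renamed source column would only surface
            # later as a KeyError or, worse, as silently incomplete data.
            raise ValueError(
                f"source schema changed, missing columns: {sorted(missing)}"
            )
        target.append({
            "customer_id": row["customer_id"],
            "order_date": row["order_date"],
            "amount": round(float(row["amount"]), 2),  # example transformation
        })
    return len(target)
```

If the source system renames `amount` to `total`, for instance, this load stops with an error that someone has to notice, analyze, and fix. That recurring effort exists for every loading process and for as long as the copy is kept.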
No hardware has eternal life. This applies to hard drives as well as network cards, routers, or memory. Disaster recovery is therefore mandatory preparation for a potential failure and, in turn, incurs its own costs for infrastructure and operations.
Costs of Neglecting Necessary Governance or Infrastructure Tasks
Due to the high effort involved in the tasks described above, they often fall victim to prioritization and cost-cutting measures. However, neglecting these tasks can pose significant risks, leading to enormous costs or even sustained reputational damage.
Data privacy and security receive significant public attention, and failure to adhere to the relevant regulations can also result in substantial penalties. When things go wrong, the media and social networks are usually quick to respond, and the lasting damage to reputation is often difficult to repair.
Missing Disaster Recovery
Not only malicious external attacks but also improper data storage can lead to costs. Inconsistencies in reporting, whether originating from the data or the applied logic, cause expenses. Either they go unnoticed, potentially leading to wrong decisions, or they are detected, requiring time-consuming and costly investigations, discussions, and corrections within the company. The same holds true for aborted or faulty loading processes, which, if unnoticed, can result in outdated or even incorrect data.
If adequate disaster recovery capabilities for systems are neglected, this can result in effort and costs on various levels. When data cannot be restored because it relies on unsaved historical information, time-dependent logic, or manual inputs, irreversible damage occurs without appropriate backup solutions. Even if the data can be recovered using other data sources, it usually involves significant effort and costs.
The Importance of Software Updates
Software updates for operating systems, applications, or database systems typically serve two purposes: updating and fixing system functionality and addressing security-critical issues. In the latter case, missing updates can lead to unauthorized access to the data.
However, even in the first case, permanently skipping updates can result in software and data becoming incompatible with subsequent versions of the software, making a future upgrade impossible. Such an upgrade may become necessary because system manufacturers typically limit how long they support a given software version.
After a predefined point in time, no more updates are provided for that version, and any critical security vulnerabilities cannot be addressed thereafter. To maintain compatibility between operating systems, applications, and database systems, regular software updates should also be considered.
Data Virtualization Reduces Implementation and Operational Effort
It is a seemingly small, inconspicuous step that directly and irreversibly triggers the aforementioned consequences: the data is copied from the source to a new environment. The fact that, with data virtualization, the data remains in its original source avoids the mentioned cost and effort drivers.
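The difference between a copy and virtual access can be illustrated with a deliberately simplified sketch. It uses a view inside a single in-memory SQLite database as a stand-in; real data virtualization platforms provide access across separate source systems, and the table and view names here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO source_orders VALUES (1, 100.0)")

# Physical copy: a snapshot that needs a loading process to stay current.
conn.execute("CREATE TABLE copied_orders AS SELECT * FROM source_orders")

# Virtual access: the view stores no data, only the query against the source.
conn.execute("CREATE VIEW virtual_orders AS SELECT id, amount FROM source_orders")

# The source changes after the copy was taken.
conn.execute("UPDATE source_orders SET amount = 120.0 WHERE id = 1")

copied_amount = conn.execute("SELECT amount FROM copied_orders").fetchone()[0]
virtual_amount = conn.execute("SELECT amount FROM virtual_orders").fetchone()[0]

print(copied_amount, virtual_amount)  # 100.0 120.0
```

The copy still shows the old value until a loading process runs again, while the virtual access reflects the source immediately and holds no second set of data that would have to be governed, secured, and backed up.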
The tasks of data governance remain in the source system, and authenticated and authorized access to the data of all connected systems is ensured through secured access with Single Sign-On.
There are no loading processes. If an error occurs due to a change in the source system, it is detected by the users and can be corrected by them. This eliminates errors due to missing updates or unnoticed interruptions.
Since the data remains in the source system and meets all requirements for compliant and secure storage, none of these labor-intensive tasks need to be repeated elsewhere.
There are no discussions about the authority to interpret the data or its calculations because business logic can be centrally stored, accessible in one place, and properly managed for everyone's use.
These aspects remain valid as long as the criteria of the virtual data space are maintained:
No data persistence in the virtual space, even if introduced "only" to improve performance.
Consistent and exclusive use of Single Sign-On for the transfer of user identities and no circumvention using technical users.
Therefore, with the help of data virtualization, effort and costs can be significantly reduced, enabling a secure, reliable, and consistent handling of company data.