from DSSResources.com


How do data storage strategies differ among uses?

by Daniel J. Power
Editor, DSSResources.COM

Everything that happens is being digitized. Customers are data, processes are data, supply chains are data, employees are data, and things generate and use data. Digital disruption caused by emerging digital technologies and innovative business models is changing how organizations, managers and other people act and function. Until recently, data was stored for two primary purposes: processing transactions and providing decision support. Now real-time data, unstructured data, business rules, and other current and historical data are processed and stored for other purposes, including automating and recommending decisions. For many years, database designers and data modelers made assumptions about the purpose of data storage that rested on the answers to three questions. Those answers are changing, and more questions must be asked, considered and answered.

Increasing data volume, variety, volatility and velocity mean that storing data presents challenging problems and raises practical questions. In most organizations data is not integrated, data silos are still a problem, and a single version of the truth is more a goal than a reality (cf. Devo Technology, 2018). A data storage strategy refers to resolving questions of what to store, where to store the data, when to store the data, how to store the data, how to access the data, who can access the data, and other related questions.

Let's explore data and data storage strategies. Data may be quantitative (numbers), dates, video, sounds, or text strings. Data may be static or changing. Data may be entered, received and captured slowly or quickly. At the machine level all data is digital data, a series of 0s and 1s. Digitizing or digitization is the process of converting analog data such as sounds, handwriting, photos and other images into machine-usable data. So what questions help us decide how to store data? What are the contemporary answers to those questions?

Q1. How often should the same data value or "piece of information" be stored?

In a transaction processing environment the assumption has traditionally been to store a transaction once and once only. Decision support or data warehouse data storage is non-volatile and hence the assumption has been that multiple copies of the same data and information should be stored when the duplication improves query performance. Storing multiple copies is a problem if the data must be corrected or changed. Real-time data may have redundancy, but it is not changed and the redundancy may be important to algorithmic decision making.
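The trade-off between storing a value once and storing a duplicate for faster queries can be illustrated with a minimal sketch. It assumes an in-memory SQLite database and hypothetical table names (`sales`, `customer_totals`) that are not from any particular system: each sale is stored once, while a precomputed summary holds a second copy of the same information. Correcting a transaction leaves the duplicate stale until it is rebuilt.

```python
import sqlite3

# Hypothetical schema: "sales" stores each transaction once;
# "customer_totals" is a duplicated, precomputed copy kept for fast reads.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    CREATE TABLE customer_totals (customer TEXT PRIMARY KEY, total REAL);
""")
conn.executemany(
    "INSERT INTO sales (sale_id, customer, amount) VALUES (?, ?, ?)",
    [(1, "acme", 100.0), (2, "acme", 50.0), (3, "zenith", 75.0)],
)


def refresh_totals(conn):
    """Rebuild the duplicated summary from the single source of record."""
    conn.execute("DELETE FROM customer_totals")
    conn.execute(
        "INSERT INTO customer_totals (customer, total) "
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer"
    )


def stored_total(conn, customer):
    """Read the precomputed total without scanning the sales table."""
    row = conn.execute(
        "SELECT total FROM customer_totals WHERE customer = ?", (customer,)
    ).fetchone()
    return row[0] if row else None


refresh_totals(conn)
total_before = stored_total(conn, "acme")

# Correct one transaction: the summary copy is now out of date.
conn.execute("UPDATE sales SET amount = 120.0 WHERE sale_id = 1")
total_stale = stored_total(conn, "acme")

# The duplicate agrees with the source again only after a refresh.
refresh_totals(conn)
total_after = stored_total(conn, "acme")

print(total_before, total_stale, total_after)
```

In this sketch the summary is cheap to read, but every correction to `sales` must be followed by a refresh, which is the cost of the extra copy.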

Q2. Who will access the data?

If the data will be accessed and retrieved by sophisticated users, then using Structured Query Language (SQL) is acceptable. If the data will be accessed and retrieved by managers, then use a simple storage and retrieval scheme. Metadata, data about the data, is extremely important when managers use data directly. If the data will be accessed by an algorithm, then retrieval speed is the most important consideration. Computing machines read, process and share data very quickly.

Q3. What level of performance is required to retrieve data?

If fast, real-time retrieval is required, then a well-tuned, parallel-processing relational database or a specialized data store is required. If one is asking unplanned, ad hoc questions, then performance matters less than in other data retrieval situations. Many factors related to the data model, the database software, and the hardware affect data retrieval performance.

These three questions have become less important and the answers have become more ambiguous in data-rich business environments. Also, data storage has become more complex with cloud storage. First, data storage may serve an expanded archival purpose because storage costs have declined and continue to decline. Second, more people need to retrieve data for many diverse reasons, including someone retrieving and reading Facebook posts, a salesperson checking a customer's contact information and purchase history from a smartphone, and a person paying for an item at a retail store using a smartphone payment app. Finally, more computers are directly accessing data, and devices associated with the Internet of Things (IoT) are generating and using data.

Perhaps we need to expand the data storage questions we ask, and perhaps we need to recognize that the answers are not binary or dichotomous, but rather multiple, equally good alternatives or even a range of values. So the best answer to Question 1 may be two (2) in many situations: once in the transaction database and once in the data backup and recovery archive. Perhaps the best answer to Question 2 is all stakeholders. Finally, perhaps the answer to Question 3 is now always real time, with no delays.

Some new questions to think about are: 1) How will privacy be ensured? 2) How long will the data be stored? 3) Ultimately, how large might the data store become? and 4) Is data backup necessary?
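The retention and size questions are linked by simple arithmetic. The following back-of-the-envelope sketch assumes a constant daily record count and a fixed average record size. The function names and the example figures are hypothetical, and real stores add overhead for indexes, metadata and compression that this estimate ignores.

```python
def estimate_store_bytes(records_per_day, bytes_per_record, retention_days, copies=1):
    """Rough size of a data store: daily volume x retention period x copies kept.

    Assumes constant daily volume and a fixed average record size
    (simplifying assumptions for illustration only).
    """
    values = (records_per_day, bytes_per_record, retention_days, copies)
    if any(v < 0 for v in values):
        raise ValueError("all inputs must be non-negative")
    return records_per_day * bytes_per_record * retention_days * copies


def to_gigabytes(num_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 1e9


# Hypothetical example: one million 500-byte records per day, kept for one
# year, stored twice (transaction database plus backup archive).
size_bytes = estimate_store_bytes(1_000_000, 500, 365, copies=2)
print(f"{to_gigabytes(size_bytes):.0f} GB")
```

Doubling the retention period or the number of copies doubles the estimate, so answers to the retention and backup questions directly drive the answer to the size question.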

Our assumptions for data storage should be regularly revisited. Data storage is no longer limited to implementing a Relational Transaction Processing Database Management System or a static Data Warehouse (DW). Often data storage involves heterogeneous file structures and distributed processing. Data storage is better understood now because we have been doing it for 70 years, and data storage is easier and more robust because we have new technologies, including post-relational and mixed workload translytical databases in distributed off-premises computing environments.

In some situations, the best data storage design is to store data in "tiers," with data ranges or fields assigned to a tier based on their organizational value or worth, access requirements and retention needs. Partitioning storage can also improve security for sensitive data fields like Social Security numbers.
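Tier assignment of this kind can be sketched as a small rule set. The tier names, the 1 to 5 value scale and every threshold below are hypothetical choices made for illustration. An actual policy would come from an organization's own governance rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldProfile:
    """Describes one data field or range; all attributes are illustrative."""
    name: str
    business_value: int   # hypothetical scale: 1 (low) to 5 (high)
    reads_per_day: int    # access requirement
    retention_years: int  # retention need
    sensitive: bool = False


def assign_tier(profile):
    """Map a field to a storage tier using assumed thresholds."""
    if profile.sensitive:
        # Sensitive fields go to a separate, access-controlled partition.
        return "restricted"
    if profile.reads_per_day >= 1000 or profile.business_value >= 4:
        return "hot"
    if profile.retention_years >= 7 and profile.reads_per_day < 10:
        return "archive"
    return "warm"


fields = [
    FieldProfile("social_security_number", 5, 20, 7, sensitive=True),
    FieldProfile("order_status", 5, 50_000, 2),
    FieldProfile("old_invoice_image", 2, 1, 10),
    FieldProfile("marketing_notes", 2, 100, 2),
]

tiers = {f.name: assign_tier(f) for f in fields}
for name, tier in tiers.items():
    print(f"{name}: {tier}")
```

Checking sensitivity first means a sensitive field is partitioned away from general storage regardless of how often it is read.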

A data storage strategy is important because it largely determines how data can be used and secured. Levy (2018) explains five key components of a data strategy: 1) identifying data and understanding its meaning, 2) storing data for easy, shared access and processing, 3) provisioning or packaging data so it can be reused and shared, 4) processing data so there is a unified, consistent data view, and 5) governing and controlling data use and storage with policies. A comprehensive data storage strategy should address all five of these topics.

Managers should continually think about data capture possibilities and uses. This task is important because options and possibilities are rapidly evolving. Perhaps equally important is to periodically investigate how to capture high-quality data once at the point of creation or origin and then how to store and process that data appropriately. Data is a record of what is and was happening, and data from the past may help us diagnose problems and predict what will or should happen in the future.

Data is growing rapidly and data storage is increasingly important, whether on-premises, on a local machine, using a distributed file system, or in the cloud. Also, more people want access to data in real time, and often thousands of concurrent users are analyzing and querying the same data store. Most users want instantaneous responses when data is requested. Data is in demand and technology is improving to meet the demand.

Digital business is dependent upon digital data capture, digitization and digital data storage, but only extensive use of data-based decision making can transform an organization and its culture so that data creates value for customers and stakeholders. Internal and external data must be captured, analyzed and acted upon to enhance and support the mission of a business. Managers must then use the analyses to support and inform their decision making. Information Technology (IT) leaders and experts should help assess what is possible in data capture and storage and the cost/benefit trade-off. Digital thinking means trying new technologies and assessing what works and what doesn't in an endless Do-Repeat loop. With technology change it is important to try, assess and then try again. Finally, digital transformation is about better serving your customer and the market. To use data effectively, know your customer and ask widely for feedback.

References

Devo Technology, "Survey reveals most organizations struggle to extract value from operational data," Devo Technology Press Release, August 8, 2018 at URL http://dssresources.com/news/5013.php

Levy, E., "The 5 Essential Components of a Data Strategy," SAS White Paper, 2018 at URL https://www.sas.com/content/dam/SAS/en_us/doc/whitepaper1/5-essential-components-of-data-strategy-108109.pdf

NUODB, "Understanding Today's Database Landscape," at URL https://www.nuodb.com/product/database-comparison

Ross, J., "Don’t Confuse Digital With Digitization," MIT Sloan Management Review, September 29, 2017 at URL https://sloanreview.mit.edu/article/dont-confuse-digital-with-digitization/

Various opinions, "Computers are becoming faster and faster, but their speed is still limited by the physical restrictions of an electron moving through matter. What technologies are emerging to break through this speed barrier?" Scientific American, at URL https://www.scientificamerican.com/article/computers-are-becoming-fa/

Last update: 2018-08-13 10:53
Author: Daniel Power

Copyright © 1995-2015 by D. J. Power.
DSSResources.COM is maintained by Daniel J. Power. Please contact him at djpower1950@gmail.com with questions.

