If left unmanaged, your data can become overwhelming, making it difficult to
find the information you need when you need it. While software is designed to
address archiving, discovery, compliance, etc., the overarching goal is almost
always the same: to make managing and maintaining data a feasible task. In this
post, you’ll see two types of data you’re accustomed to working with, paying
close attention to the differences between structured and unstructured data.
Unstructured Data
“Unstructured data refers to information that either does not have a
pre-defined data model and/or is not organized in a predefined manner.”
In short, unstructured data is not useful when forced into a schema or table.
I’ll use email as an example. Certain values from an email can be fit into a
table: sender, recipient, email body, etc. Although you can have a column for
the email body, the information stored in that column would be useless when
analyzed in such a way. What questions could analysts ask of all data entries
in the “email body” column? Could they be answered? The answer is no.
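The email example can be sketched in a few lines of Python. This is only an illustration, assuming a made-up message and hypothetical column names (`sender`, `recipient`, `subject`, `body`); it uses the standard library's `email` module to split a message into header fields and body.

```python
from email import message_from_string

# Hypothetical message, invented for this illustration.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Subject: Q3 invoice

Hi Bob, the invoice looks off by a few dollars. Can you check it?
"""


def email_to_row(raw: str) -> dict:
    """Split an email into table-friendly header fields and a free-text body."""
    msg = message_from_string(raw)
    return {
        "sender": msg["From"],        # fits a column: one well-defined value
        "recipient": msg["To"],       # fits a column
        "subject": msg["Subject"],    # short text, still easy to filter on
        "body": msg.get_payload(),    # free text: storable, but not queryable by meaning
    }


row = email_to_row(RAW_EMAIL)

# A structured question has a direct answer.
is_from_alice = row["sender"] == "alice@example.com"
```

The header fields answer questions such as "who sent this?" with a simple comparison. The `body` value sits in a column just as easily, but a question like "is this customer unhappy?" cannot be answered by comparing or aggregating that column.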
Looking at the illustration, it's obvious that social media plays a heavy role
in unstructured data. In addition to social media, there are many other common
forms of unstructured data:
- Word Docs, PDFs and Other Text Files - Books, letters, other written
documents, audio and video transcripts
- Audio Files - Customer service recordings, voicemails, 911 phone calls
- Presentations - PowerPoints, SlideShares
- Videos - Police dash cam, personal video, YouTube uploads
- Images - Pictures, illustrations, memes
- Messaging - Instant messages, text messages
Unstructured data is a valuable piece of the data pie of any business.
Tools that are widely accessible today can help businesses use this data to its
greatest potential.
Structured Data
In contrast to unstructured data, structured data is data that can be
easily organized. Despite its simplicity, most experts in today’s data
industry estimate that structured data accounts for only 20% of the data
available. It is clean, analytical and usually stored in databases.
Today, big data tools and apps have allowed for the exploration of structured
data that was once too expensive to gather and store.
Some examples of structured data:
- Machine Generated: Sensor data, point-of-sale data, call detail records, web
server logs (page requests, other server activity)
- Human Generated: Input data, meaning any data entered into a computer: age,
zip code, gender, etc.
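To show what "easily organized" means in practice, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` table and its rows are invented for the illustration; only the fields (age, zip code, gender) come from the examples above.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age INTEGER, zip_code TEXT, gender TEXT)")

# Hypothetical human-generated input data.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (34, "30301", "F"),
        (51, "30301", "M"),
        (27, "94105", "F"),
    ],
)

# Because every row follows the same schema, an aggregate question is one query.
avg_age_by_zip = dict(
    conn.execute(
        "SELECT zip_code, AVG(age) FROM customers GROUP BY zip_code"
    ).fetchall()
)
conn.close()
```

Each value has a known type and position, so "what is the average age per zip code?" needs no interpretation of the data, only a query.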
Although it's outnumbered by its unstructured brother, structured data
has always played and will always play a critical role in data analytics. It
functions as a backbone to critical business insights. Without structured data,
it is difficult to know where to find insights hiding in your unstructured data
sets.
Limitations of Data Warehousing
Data warehousing is evolving but does have certain limitations.
1. Extra Reporting Work:
Traditionally, data warehousing involves a 'scheduled push' of data periodically (for example, once a day) from operational data sources into the data warehouse architecture. However, there is a growing demand for real-time analysis and reporting. Data warehousing must adapt to this demand.
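The gap between a scheduled push and real-time reporting can be illustrated with a small Python sketch. Everything here is hypothetical: `scheduled_push`, the row layout, and the timestamps are assumptions made for the example, not part of any particular warehouse product.

```python
from datetime import datetime


def scheduled_push(operational_rows, warehouse_rows, last_push):
    """Copy rows created after the previous push into the warehouse.

    Returns the new watermark (timestamp of the latest row copied).
    """
    new_rows = [r for r in operational_rows if r["created_at"] > last_push]
    warehouse_rows.extend(new_rows)
    return max((r["created_at"] for r in new_rows), default=last_push)


# Hypothetical operational data: two sales recorded before the nightly run.
operational = [
    {"sale_id": 1, "created_at": datetime(2024, 1, 1, 9, 0)},
    {"sale_id": 2, "created_at": datetime(2024, 1, 1, 17, 30)},
]
warehouse = []

# Nightly run: both sales are copied.
watermark = scheduled_push(operational, warehouse, datetime(2023, 12, 31, 23, 59))

# A sale arrives the next morning; the warehouse does not see it yet.
operational.append({"sale_id": 3, "created_at": datetime(2024, 1, 2, 8, 15)})
rows_before_next_push = len(warehouse)

# The next scheduled run picks it up.
watermark = scheduled_push(operational, warehouse, watermark)
```

Between the two runs, any report built on `warehouse` is missing the third sale. That window of staleness is what the demand for real-time reporting is pushing against.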
2. Analyzing unstructured data:
The data warehouse model described in the previous section does have a module (like Hadoop) to store unstructured data. However, data warehouses frequently need to analyze unstructured data together with structured data to produce meaningful results at a particular grain. This is still an ongoing design effort.
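As a rough illustration of analyzing both kinds of data "at a particular grain", the sketch below rolls structured order amounts and unstructured support-ticket text up to one row per customer. The data, the `summarize_by_customer` helper, and the naive keyword match are all assumptions made for the example; real text analysis would be considerably more involved.

```python
from collections import Counter

# Structured: hypothetical order records.
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 45.0},
]

# Unstructured: hypothetical free-text support tickets.
tickets = [
    {"customer_id": 1, "text": "My refund never arrived."},
    {"customer_id": 2, "text": "Great service, thanks!"},
    {"customer_id": 1, "text": "Still waiting on that refund"},
]


def summarize_by_customer(orders, tickets, keyword="refund"):
    """Combine both sources at the customer grain: one summary per customer."""
    total_spend = Counter()
    for order in orders:
        total_spend[order["customer_id"]] += order["amount"]

    # Naive stand-in for text analysis: count tickets mentioning the keyword.
    mentions = Counter()
    for ticket in tickets:
        if keyword in ticket["text"].lower():
            mentions[ticket["customer_id"]] += 1

    customer_ids = sorted(set(total_spend) | set(mentions))
    return {
        cid: {"total_spend": total_spend[cid], "keyword_mentions": mentions[cid]}
        for cid in customer_ids
    }


summary = summarize_by_customer(orders, tickets)
```

The structured side aggregates directly, while the text has to be reduced to a number before the two can share a row. Choosing that reduction is where most of the design difficulty lies.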
3. Data Ownership Concerns:
As businesses become cross-functional and data is distributed, security is an ongoing concern. Clear business processes have to be defined within enterprises with regard to the security of disparate data.
4. Cost/Benefit Ratio:
The costs of building, integrating and maintaining data warehouses are still high. While in the past there was a cost associated with storage, nowadays there is a cost of integration and maintenance.
Future of Data Warehousing
Data analytics can move beyond the limitations imposed by the lack of structure in unstructured data and can now seamlessly use all forms of data together in a single context for analytics. Such a capability holds tremendous promise for the future of analytics.
The video below from the CEO of Xurmo Technologies gives us better insight.
“In the past, companies couldn’t integrate these disparate technologies with the data warehouse because each technology required different file formats and data schemas,” says Stackowiak. “Today, you can integrate these technologies, and the result is that companies can access more of their data—not just the 20 percent from enterprise systems—and convert it into valuable, profitable information.”
Companies interested in building out their traditional data warehouse infrastructures may consider starting with reporting, if they don’t already have reporting capabilities in place, suggests Solari. Then, they can begin integrating analytics technologies into their reporting framework.
“When companies start bringing this data together and federating it inside a data warehouse, the total cost of ownership for the data warehouse may begin to go down while the ROI goes up,” says Solari. “The ability to integrate big data technologies, analytics technologies, back office systems, and traditional data warehouses has the potential to fundamentally change the economics of data warehousing for the better.”
References
http://smartdatacollective.com/michelenemschoff/206391/quick-guide-structured-and-unstructured-data
http://www.sherpasoftware.com/blog/structured-and-unstructured-data-what-is-it
http://www.computerweekly.com/feature/How-to-manage-unstructured-data-for-business-benefit
http://deloitte.wsj.com/cio/2013/07/17/the-future-of-data-warehouses-in-the-age-of-big-data/