Wednesday, February 18, 2015

Big Unstructured Data vs. Structured Relational Data

If left unmanaged, your data can become overwhelming, making it difficult to find the information you need when you need it. While software is designed to address archiving, discovery, compliance, and so on, the overarching goal is almost always the same: to make managing and maintaining data a feasible task. In this post, you’ll see two types of data you’re accustomed to working with, with close attention to the differences between structured and unstructured data.

Unstructured Data

“Unstructured data refers to information that either does not have a pre-defined data model and/or is not organized in a predefined manner.”
In short, unstructured data is not useful when forced into a schema or table. I’ll use email as an example. Certain values from an email fit into a table: sender, recipient, email body, and so on. Although you can have a column for the email body, the information stored in that column would be useless when analyzed that way. What questions could analysts ask of all the entries in the “email body” column, and could they be answered? The answer is no.
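To make the email example concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name, column names, and sample messages are all hypothetical and exist only for illustration. It shows that the sender column answers an aggregate question directly, while grouping on the body column produces one group per message and summarizes nothing.

```python
import sqlite3

# Hypothetical sample emails; table and column names are illustrative only.
EMAILS = [
    ("alice@example.com", "bob@example.com", "Lunch on Friday? The usual place works for me."),
    ("alice@example.com", "carol@example.com", "Attached is the Q3 report we discussed."),
    ("dave@example.com", "bob@example.com", "Can you call me back about the invoice?"),
]


def build_email_table(rows):
    """Load emails into an in-memory SQLite table with one column per field."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE email (sender TEXT, recipient TEXT, body TEXT)")
    conn.executemany("INSERT INTO email VALUES (?, ?, ?)", rows)
    return conn


def emails_per_sender(conn):
    """A structured column answers an aggregate question directly."""
    query = "SELECT sender, COUNT(*) FROM email GROUP BY sender ORDER BY sender"
    return conn.execute(query).fetchall()


def distinct_bodies(conn):
    """Free text is unique per message, so counting distinct bodies just counts messages."""
    return conn.execute("SELECT COUNT(DISTINCT body) FROM email").fetchone()[0]


conn = build_email_table(EMAILS)
print(emails_per_sender(conn))  # messages sent, per sender
print(distinct_bodies(conn))    # as many distinct bodies as there are messages
```

Getting anything useful out of the body column would take text analysis rather than a relational query, which is the point of the example.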
Social media plays a heavy role in unstructured data. In addition to social media, there are many other common forms of unstructured data:
  • Word Docs, PDFs and Other Text Files - Books, letters, other written documents, audio and video transcripts
  • Audio Files - Customer service recordings, voicemails, 911 phone calls
  • Presentations - PowerPoints, SlideShares
  • Videos - Police dash cam, personal video, YouTube uploads
  • Images - Pictures, illustrations, memes
  • Messaging - Instant messages, text messages
Unstructured data is a valuable piece of the data pie for any business. Tools that are widely accessible today can help businesses use this data to its greatest potential.

Structured Data
In contrast to unstructured data, structured data is data that can be easily organized. Despite its simplicity, most experts in today’s data industry estimate that structured data accounts for only 20% of the data available. It is clean, analytical and usually stored in databases.
Today, big data tools and apps have allowed for the exploration of structured data that was once too expensive to gather and store. Some examples of structured data:
  • Machine Generated - Sensor data, point-of-sale data, call detail records, web server logs (page requests, other server activity)
  • Human Generated - Input data, meaning any data typed into a computer: age, zip code, gender, etc.
Although it's outnumbered by its unstructured brother, structured data has always played and will always play a critical role in data analytics. It functions as a backbone to critical business insights. Without structured data, it is difficult to know where to find insights hiding in your unstructured data sets.



Limitations of Data Warehousing
Data Warehousing is evolving but does have certain limitations. 


1. Extra Reporting Work:
Traditionally, data warehousing involves a periodic 'scheduled push' of data (for example, once a day) from operational data sources into the data warehouse architecture. However, there is a growing demand for analysis and reporting in real time. Data warehousing must adapt to this demand.
2. Analyzing unstructured data:
The Data Warehouse model described in the previous section does have a module (like Hadoop) to store unstructured data. However, data warehouses frequently need to analyze unstructured data together with structured data to produce meaningful results at a particular grain. Designing for this is still an ongoing effort.
3. Data Ownership Concerns:
As businesses become cross-functional and data is distributed, security is an ongoing concern. Clear business processes have to be defined within enterprises with regard to the security of disparate data.
4. Cost/Benefit Ratio:
The cost of building, integrating and maintaining data warehouses is still high. While in the past the cost was associated with storage, nowadays it comes from integration and maintenance.

Future of Data Warehousing
Data analytics can move beyond the limitations imposed by the lack of structure in unstructured data and can now seamlessly use all forms of data together in a single context. Such a capability holds tremendous promise for the future of analytics.
The below video from the CEO of Xurmo Technologies gives us a better insight.


“In the past, companies couldn’t integrate these disparate technologies with the data warehouse because each technology required different file formats and data schemas,” says Stackowiak. “Today, you can integrate these technologies, and the result is that companies can access more of their data—not just the 20 percent from enterprise systems—and convert it into valuable, profitable information.”

Companies interested in building out their traditional data warehouse infrastructures may consider starting with reporting, if they don’t already have reporting capabilities in place, suggests Solari. Then, they can begin integrating analytics technologies into their reporting framework.

“When companies start bringing this data together and federating it inside a data warehouse, the total cost of ownership for the data warehouse may begin to go down while the ROI goes up,” says Solari. “The ability to integrate big data technologies, analytics technologies, back office systems, and traditional data warehouses has the potential to fundamentally change the economics of data warehousing for the better.”

References
http://smartdatacollective.com/michelenemschoff/206391/quick-guide-structured-and-unstructured-data

http://www.sherpasoftware.com/blog/structured-and-unstructured-data-what-is-it

http://www.computerweekly.com/feature/How-to-manage-unstructured-data-for-business-benefit


http://deloitte.wsj.com/cio/2013/07/17/the-future-of-data-warehouses-in-the-age-of-big-data/

Tuesday, February 3, 2015

Evaluation of BI Vendors (Blogging Assignment)

This post compares some of the most popular BI tools in use today across organizations. It gives an overview of the ‘Leaders’ among BI & Analytics platforms named in Gartner’s 2014 report. The tools are evaluated against a set of criteria, with an appropriate weight given to each category, to determine the best tool.

The tools considered for comparison in this blog are:

1.) Tableau
2.) Qlikview
3.) TIBCO Spotfire
4.) SAS
5.) Microstrategy

The above tools were critiqued on 5 factors that were determined to be crucial to their relevance in the industry.

1.) Data Visualization
2.) Analytical Insight
3.) Integration
4.) Customer Support
5.) Cost & Miscellaneous

Criteria Analysis

1.) Data Visualization
This criterion refers to how insight is delivered, that is, the extent to which data can be represented in a tool with minimal effort from the user. It includes multiple factors such as Reporting, Dashboards and Mobile Interface.

Reporting is one of a tool’s primary abilities: creating complex reports from multiple data sources.

Dashboarding is the feature that allows a visual representation of the analysis performed using various graphics such as charts, plots etc.

Since an increasing number of users are going mobile for various tasks, a mobile interface lets customers carry their BI tasks over to mobile devices, to the extent the features provided allow.

This category was highly competitive among all the tools under consideration for this review. Visualization is an important factor in today’s world, and it is a good measure of how easy a tool is for a customer to use.

Qlikview is strong on dashboards but is sluggish when importing and exporting documents for aggregating reports. SAS provides industry standard reports and works faster with database applications and Microsoft Office applications, hence it has a higher rating in comparison. Tableau comes out as the best BI tool in the weighted score analysis because it provides its users with best-in-class reporting and dashboard facilities. Microstrategy’s data visualization is ranked lowest due to its inconsistent dashboards and less interactive visualization.


2.) Analytical Insight

This section was further divided across multiple categories:

   a) Predictive analytics: Represents the extent to which tools support statistical modelling to forecast and predict trends.
   b) Scorecarding: Refers to the ability of the tool to create and depict different scorecards such as Six Sigma, Balanced Scorecards and key performance indicators for measuring the performance of the company.
   c) OLAP (Online Analytical Processing): Mainly deals with the performance of the tools with respect to querying, pivoting and sorting capabilities of the tool.

Qlikview scored the lowest in this section since it doesn’t host a predictive analytics module and has very primitive OLAP capabilities. Microstrategy & Tableau score about the same as each other, and better than Qlikview. SAS has been a dominant player in this section. Its analytics include unique features that are integrated directly into the BI tool. SAS has a 36% share of the advanced analytics market, more than the share of the next 10 vendors combined.

Hence SAS comes out as the top vendor in this category due to its offering in the advanced analytics domain. Spotfire also deserves an honorable mention as it offers a lot of flexibility in applying analytics functions.



3.) Integration

The integration feature involves taking into consideration the following aspects.

   a) Workflow Engines: Addresses the ability of the tool to model workflows for applications as per the needs of the user.
   b) Developer Tools: Indicates the sophistication level of the SDK provided to developers to customize the application, add new features or fix bugs.
   c) Big Data Support: Ability of the tool to handle large, unstructured data (NoSQL).

Microstrategy ranks among the highest in this category due to its integrated Intelligence server which supports a wide range of BI applications. Qlikview has the lowest ranking as it does not handle Big Data at all and relies on a third party tool to introduce workflows in BI applications. SAS & Spotfire also provide extensive support for big data by ensuring high compatibility with big data sources.



4.) Customer Support

Customer interaction & support forms a crucial part of any organization/vendor. It is imperative to obtain customer feedback to learn about user experiences & address their queries. This feedback completes the loop by acting as an input to the vendors, prompting them to consider adding new features to the tool and improving its performance.

Tableau offers support services for all of their products, including an extensive online database of resources that users can search for answers to their issues. Qlikview offers non-technical support in the period immediately following the sale of their product through Qoncierge, an international, online support staff that helps to address issues such as license-related questions, portal access issues, download issues, and other general questions. MicroStrategy provides training on not just its solutions, but on Business Intelligence as a whole. SAS also operates on similar lines. Tibco Spotfire offers a variety of training for the software. Spotfire also offers discounts on the training packages in the form of Educational Passports. 



5.) Cost & Miscellaneous
Here we’ll consider the cost associated with each tool and, in some cases, the offline support provided by vendors such as Qlikview & Tableau, which is a nice feature to have.

Tableau turns out to be the cheapest option, whereas SAS & Microstrategy turn out to be on the expensive side when purchasing a license, either for an individual or an enterprise. Spotfire & Qlikview are reasonably priced and provide good value for money.


Overall Score and Assessment:
The table below summarizes my analysis. Considering all aspects, Tableau scores the highest, which is also supported by Gartner’s report. There seems to be a competition between Tableau and Microstrategy in the Enterprise Informatics space. Tableau is currently more flexible with its pricing and offers more value-added options, which gives it a significant advantage over Microstrategy in the market.

Qlikview’s customer base is significantly different: they focus on small to medium enterprises and are doing extremely well in this market. Spotfire and SAS continue to invest heavily in the BI space and are improving their product range continuously.


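| Criteria             | Weight | Tableau | Spotfire | Qlikview | SAS   | Microstrategy |
|----------------------|--------|---------|----------|----------|-------|---------------|
| Data Visualization   | 25%    | 9       | 8        | 8.5      | 8     | 7             |
| Analytical Insight   | 20%    | 8       | 8.5      | 7        | 9     | 8             |
| Integration          | 15%    | 8       | 8.5      | 7        | 8     | 9             |
| Customer Support     | 15%    | 8.5     | 7        | 8        | 8.5   | 8             |
| Cost & Miscellaneous | 25%    | 9       | 8        | 8.5      | 6     | 7.5           |
| Points               | 100%   | 8.575   | 8.025    | 7.9      | 7.775 | 7.775         |
| Rank                 |        | 1       | 2        | 3        | 4     | 4             |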
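
The Points and Rank rows above can be reproduced with a short calculation. The sketch below is in Python and copies the weights and ratings straight from the table. The function names are my own, and I am assuming that tied tools share a rank (so the two tools at 7.775 are both ranked 4), which matches the Rank row.

```python
# Criteria weights (in percent) and ratings, copied from the summary table.
CRITERIA_WEIGHTS = [
    ("Data Visualization", 25),
    ("Analytical Insight", 20),
    ("Integration", 15),
    ("Customer Support", 15),
    ("Cost & Miscellaneous", 25),
]

# Ratings per tool, listed in the same order as CRITERIA_WEIGHTS.
RATINGS = {
    "Tableau": [9, 8, 8, 8.5, 9],
    "Spotfire": [8, 8.5, 8.5, 7, 8],
    "Qlikview": [8.5, 7, 7, 8, 8.5],
    "SAS": [8, 9, 8, 8.5, 6],
    "Microstrategy": [7, 8, 9, 8, 7.5],
}


def weighted_score(ratings):
    """Weighted average of the ratings, using the percent weights above."""
    total = sum(weight * rating
                for (_, weight), rating in zip(CRITERIA_WEIGHTS, ratings))
    return total / 100


def rank_tools(scores):
    """Rank tools by score; tools with equal scores share the same rank."""
    return {
        tool: 1 + sum(other > score for other in scores.values())
        for tool, score in scores.items()
    }


scores = {tool: weighted_score(ratings) for tool, ratings in RATINGS.items()}
ranks = rank_tools(scores)
for tool in sorted(scores, key=scores.get, reverse=True):
    print(f"{tool}: {scores[tool]:.3f} (rank {ranks[tool]})")
```

Changing a weight in CRITERIA_WEIGHTS (keeping the total at 100) is a quick way to see how sensitive the ranking is to the weighting.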

Final recommended tool for BI: Tableau