Monday, March 30, 2015

Blog Assignment #4 ’s Law, Cloud Computing, and DW/BI

Cloud computing is creating a new era for IT by providing a set of services that appear to have infinite capacity, immediate deployment, and high availability at trivial cost. The cloud appeals to organizations struggling with expanding data volumes, low utilization of IT assets, and lack of self-service business analytics. 



Breakthrough 1
Microsoft 2012 Parallel Data Warehouse (PDW)

Parallel Data Warehouse (PDW) is a next-generation platform built to run your data analysis fast and to scale storage from a few terabytes to over 6 petabytes in a single appliance. PDW ships to your data center as an appliance with hardware and software pre-installed and pre-configured for maximum performance. With PDW’s Massively Parallel Processing (MPP) design, queries run in minutes instead of hours, and in seconds instead of minutes in comparison to today’s Symmetric Multi-Processing (SMP) databases.

Breakthrough 2
Microsoft Azure

Microsoft Azure, Microsoft's cloud platform, is an open and flexible cloud computing system that lets you build infrastructure, develop modern applications, gain insights from data, and manage identity and access.

Azure is the only major cloud platform ranked by Gartner as an industry leader for both infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS). This powerful combination of managed and unmanaged services lets you build, deploy, and manage applications any way you like for unmatched productivity.

Availability is not merely disaster recovery in the cloud; it is the empowering connection between a datacenter and the cloud for protection and value creation. With the announcement of March 26, 2015, Microsoft provides comprehensive Availability on Demand in Azure for hybrid and heterogeneous environments, allowing organizations to harness a nearly unlimited amount of compute and storage in the cloud for dev/test, cloud bursting, migration, reporting/analytics, recovery, backup, and long-term data retention.


Citation:

http://azure.microsoft.com/en-us/campaigns/azure-vs-aws/
http://azure.microsoft.com/en-us/overview/what-is-azure/
http://azure.microsoft.com/blog/2015/03/26/hybrid-cloud-without-the-hassle-simply-connect-to-azure-with-availability-on-demand/
http://www.teradata.com/Resources/Web-Casts/Exploring-Cloud-Computing-Options-for-Data-Warehousing/#sthash.oiD5lWki.dpuf
https://msdn.microsoft.com/en-us/library/dn520808.aspx

Wednesday, March 4, 2015

MIS 587 Blog 3 Presentation and Visualization Methods


Financial Industry

For the financial industry, I focus on visualizing personal bank accounts. There are many ways to display what a user's expenses look like.
1.       A bar graph in which each bar represents the accumulated expense in one category, such as education or travel, over a period of time or over the whole lifetime of the account.
2.       A line chart showing the trend of the user's total expenses over a period of time. For example, the total expense for each week or month becomes a point on the graph, and a line links all the points.
3.       A monthly statement that contains the details of all transactions.
My recommendation for bank account users is a pie chart showing the percentage of expenses in each category. The example below from the Bank of America dashboard illustrates this.

In this way, customers can easily understand how their expenses are distributed, which is very important. Most users care more about how they spend their money than about when they spend it, so a pie chart is better than a line chart. Moreover, it is more user-friendly than a bar chart or a data-intensive statement.
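The percentages behind such a pie chart are simple to compute. The sketch below is only an illustration: the transaction list and the function name `category_shares` are made up for this example and do not come from any bank's data or API.

```python
from collections import defaultdict

# Hypothetical transactions: (category, amount in dollars). Not real bank data.
transactions = [
    ("education", 400.0),
    ("travel", 250.0),
    ("groceries", 200.0),
    ("travel", 150.0),
]


def category_shares(transactions):
    """Return each category's percentage of total spending (one pie slice each)."""
    totals = defaultdict(float)
    for category, amount in transactions:
        totals[category] += amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        category: 100.0 * amount / grand_total
        for category, amount in totals.items()
    }


shares = category_shares(transactions)
# Print the slices from largest to smallest.
for category, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{category}: {pct:.1f}%")
```

Each value in `shares` is the size of one slice, so the values always add up to 100 percent regardless of how many transactions fall into a category.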

Insurance Industry

From the customer's perspective, each insurance policy they buy probably amounts to nothing more than a regular monthly payment email or letter, unless an accident happens. Those bills do not answer the question many customers ask: why is my premium so expensive even though I have never broken the law? This question seems most frequent with car insurance companies. Thus, visualizing users' performance data is critical and would be a competitive advantage in this industry.
A bar graph is good for this task. It can show how each factor, such as braking or sharp turns, affects the driver's evaluation and the premium he or she pays. Drivers will then get a good idea of what a good driver is.

In addition, users can get a similar bar-graph-based report showing how they actually perform, such as how many miles they travel, how often they drive at night, and how many times they brake suddenly.

These visualizations work well because together they let drivers know exactly what the criteria for being a good driver are and how they actually perform, which they can use to improve their driving habits and become safer.

Telecommunication Industry

Almost everyone receives a phone bill every month. Because many Americans do not have a fixed phone plan and instead choose a prepaid or pay-per-use plan, it helps customers to see how much they use and how much they have to pay over time. It is even better to show the trend and the average bill amount as well. A combination of a bar chart and a line chart works well for this. Each bar represents the total bill for one month, and the segments of each bar represent the categories of charges, such as phone calls, texts, and data. The green line represents the average of the total phone bills over time.



In this way, customers can see what their monthly phone bill and usage look like, and the budget function (the green line) makes it convenient for them to estimate their monthly budget.
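The numbers behind the combined chart can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the charges are invented, the names `bill_totals` and `average_bill` are hypothetical, and the green line is assumed to be a single overall average rather than a running one.

```python
# Hypothetical monthly charges in dollars, split by category (not real billing data).
monthly_charges = {
    "Jan": {"calls": 20.0, "text": 5.0, "data": 15.0},
    "Feb": {"calls": 25.0, "text": 5.0, "data": 30.0},
    "Mar": {"calls": 15.0, "text": 5.0, "data": 30.0},
}


def bill_totals(charges):
    """Total bill per month: the full height of each stacked bar."""
    return {month: sum(parts.values()) for month, parts in charges.items()}


def average_bill(totals):
    """Average of the monthly totals: the level of the budget line."""
    return sum(totals.values()) / len(totals) if totals else 0.0


totals = bill_totals(monthly_charges)
average = average_bill(totals)

for month, total in totals.items():
    # Compare each month's bar with the budget line.
    position = "above" if total > average else "at or below"
    print(f"{month}: ${total:.2f} ({position} the average of ${average:.2f})")
```

The per-category values in `monthly_charges` are the segments of each bar, `totals` gives the bar heights, and `average` is the value a customer could use as a monthly budget estimate.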

Thursday, February 19, 2015

Big Unstructured Data vs. Structured Relational Data


This blog post introduces the differences between structured data and unstructured data and what they mean for organizations in the current context of big data and data warehouses.

Structured Data vs. Unstructured Data
There are many ways to tell the two apart. Let us watch the video below first.

The video offers several ways to tell the difference. The first is whether the data fits into a pre-defined model. If it does, the data is structured; otherwise, it is unstructured. A pre-defined model can be a table, a database, and so on.

Unstructured data, on the other hand, is typically text-heavy information that is not organized in any pre-defined manner.
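The "does it fit a pre-defined model" test can be sketched in Python. This is only an illustration of the idea, not a method from the video: the schema `CUSTOMER_SCHEMA`, the function `fits_model`, and the sample values are all hypothetical.

```python
# Hypothetical pre-defined model: field name -> expected Python type.
CUSTOMER_SCHEMA = {"customer_id": int, "name": str, "balance": float}


def fits_model(record, schema):
    """Return True if record is a dict with exactly the schema's fields and types."""
    if not isinstance(record, dict):
        return False
    if set(record) != set(schema):
        return False
    return all(
        isinstance(record[field], expected) for field, expected in schema.items()
    )


# A table-like row that matches the model, and a free-text email body that does not.
row = {"customer_id": 42, "name": "Ann", "balance": 120.5}
email_body = "Hi, I was charged twice last month. Can you check my account?"

print(fits_model(row, CUSTOMER_SCHEMA))         # the row is treated as structured
print(fits_model(email_body, CUSTOMER_SCHEMA))  # the email is treated as unstructured
```

The row passes because every field in the model is present with the expected type, while the email body is just a string with no fields to check.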

Another way to determine whether data is structured is to ask whether the pieces of data are meaningful to us. Data is just raw material waiting for us to dig useful information out of it. Until we obtain that information, the raw data does not make much sense to us; at this stage, it is what we call unstructured data. Information, on the other hand, is meaningful data. It can also be described as structured, organized, and processed data. The last approach is to rely on experience. Normally, the types of data listed below are unstructured.
1. Word documents, PDFs, and text files
   • Unstructured data
   • Examples: books, articles
2. Audio files
   • Unstructured data
   • Example: call center conversations
3. Email body
   • Unstructured data
   • Example: you don’t need an example here!
4. Videos
   • Unstructured data
   • Example: video footage of a criminal interrogation

On the other hand, the types of data below are usually structured.
1. A data mart or data warehouse
   • Structured data

Data & Organizations & Warehouses

Having understood both structured and unstructured data, the next fact to remember is that data is extremely important to business nowadays, whatever type it is.

  • 80 percent of business-relevant information originates in unstructured form. 

                         –  Justin Langseth

With the cost of storage falling dramatically, the data warehouse, as a product born in the context of big data, can extract, transform, load, and store huge amounts of varied data for business organizations. Built on ETL, OLAP, and other BI applications, a data warehouse is the best assistant for dealing with big unstructured data, and users can easily mine useful resources from the big data pool.

Limitations
However, data warehouses also have some drawbacks, especially when it comes to analyzing different types of data.

•  Data is hosted on various systems, which creates silos of information. It is time-consuming to get the data and compile it. Some of the data comes in the form of Excel spreadsheets or PowerPoint presentations. There is no easy way to access the data, and gathering it and creating reports requires intensive manual processing. There is no ability to perform custom analysis or to drill down.
•  There is no central place to view the data required for reporting and analysis.
•  There are no automated ways to get reports, and no dashboards are available.
•  Reports are all Excel-based, and spreadsheets silo and skew the information.
•  There is no ability to do a quick analysis or “what if” modeling.

Outlook of Data Warehousing
There is no doubt that big data is the term everyone is talking about right now. It reflects the way people are gradually coming to see and understand the world: the world is composed of unstructured things, and unstructured data is simply what reality looks like. In the past, people always emphasized the accuracy of the dataset and the sampling method, because the amount of data was so limited that we had to take great care of it. However, things are changing. With the development of cloud computing and the decreasing cost of computing and storage, people can see the big picture of the world by accepting and even embracing unstructured data and missing values.
Data warehousing is the best vehicle for storing big data and getting the greatest use from it. In the next few years, it will be one of the most powerful tools not only in statistics and business analytics, but also across business, medicine, and science.

Monday, February 2, 2015

Business Intelligence & Analysis Products Scan & Evaluation

MIS 587
Blog 1
Due: 2/03/2015
Rui He

The figure below is called “Magic Quadrant”, a scatter plot developed by Gartner to show how the major players in the current Business Intelligence and Analysis market perform.

Figure 1. Magic Quadrant for Business Intelligence and Analytics Platforms

In this figure, Gartner analysts use two criteria, ability to execute and completeness of vision, to judge the vendors. The figure gives a rough idea of how the major players perform against these two criteria.
Today, I will choose five BI analysis products from the figure (not the top five) and compare their overall performance on five criteria.

The five products I will examine are Tableau, QlikView, MicroStrategy, Information Builders, and Jaspersoft.
The five criteria used for judgment are Functionality, Ease of Use, Performance, Productivity, and Cost.

Below is the table that displays my evaluation of the five chosen products according to my criteria.

Criterion      Weight  Tableau  QlikView  MicroStrategy  Information Builders  Jaspersoft
Functionality  25%     8        9         9              9                     8
Ease of Use    15%     8        10        8              8                     8
Performance    30%     9        8         7              8                     8
Productivity   20%     8        8         7              9                     7
Cost           10%     7        8         7              9                     9
Points         100%    8.20     8.30      7.40           8.30                  7.90
Rank                   3        1         5              1                     4
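The Points row appears to be a weighted sum of the five criterion scores; this is an assumption, since the post does not state the formula. A minimal Python sketch of that calculation, using the weights and scores from the table, is below. Recomputed this way, Tableau (8.20) and Jaspersoft (7.90) match the Points row, while QlikView, MicroStrategy, and Information Builders come out at 8.55, 7.65, and 8.55, each 0.25 above the listed value. The ranking is the same either way.

```python
# Weights in percent and scores (1-10), copied from the evaluation table.
WEIGHTS = {
    "Functionality": 25,
    "Ease of Use": 15,
    "Performance": 30,
    "Productivity": 20,
    "Cost": 10,
}

SCORES = {
    "Tableau": {"Functionality": 8, "Ease of Use": 8, "Performance": 9, "Productivity": 8, "Cost": 7},
    "QlikView": {"Functionality": 9, "Ease of Use": 10, "Performance": 8, "Productivity": 8, "Cost": 8},
    "MicroStrategy": {"Functionality": 9, "Ease of Use": 8, "Performance": 7, "Productivity": 7, "Cost": 7},
    "Information Builders": {"Functionality": 9, "Ease of Use": 8, "Performance": 8, "Productivity": 9, "Cost": 9},
    "Jaspersoft": {"Functionality": 8, "Ease of Use": 8, "Performance": 8, "Productivity": 7, "Cost": 9},
}


def weighted_score(product_scores, weights):
    """Weighted sum of criterion scores, with weights given in percent."""
    return sum(weights[c] * product_scores[c] for c in weights) / 100


points = {product: weighted_score(s, WEIGHTS) for product, s in SCORES.items()}

# List the products from highest to lowest weighted score.
for product, score in sorted(points.items(), key=lambda item: -item[1]):
    print(f"{product}: {score:.2f}")
```

Keeping the weights as whole percentages and dividing once at the end avoids small floating-point drift in the totals.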

What does each criterion mean here?
1.      Functionality
Functionality means capability and extensibility. Capability refers to the range of BI functions that the product can support. The more functions the product has, the higher its score on this criterion. Extensibility here mainly means the ability to integrate. For example, the more easily the product can be integrated into a web portal, the higher its score. Likewise, the more applications and languages, such as R and Java, that the product can be embedded into, the higher its score.
2.      Ease of Use
It includes three aspects: how difficult the product is to install, how intuitive its GUI/user interface design is, and how convenient future upgrades and maintenance are. The easier the product is to install, the more intuitive it is to explore and use, and the easier it is to maintain and upgrade, the higher its score on this criterion.
3.      Performance
It includes two aspects. The first is the level of business functions that the product can support. This is a different requirement from functionality. Functionality focuses on BI analysis technology such as ETL and OLAP, which are standard features. Performance, on the other hand, concentrates on extra or special features, such as ad-hoc slice and dice and other embedded insights across the product family. The second is stability, which is also an important part of performance. The more stable the product is, the higher its score.
4.      Productivity
This criterion focuses on the soft technologies that enable users to improve their productivity effectively. Business dashboards and data visualizations are very good examples of what productivity means here.
5.      Cost
Cost is also a critical factor to evaluate when considering which BI tool to choose, regardless of the size of the organization or company.

Detailed Evaluations and Explanations

Tableau
Tableau is often the first supplier that comes to mind when businesses consider data visualization tools. While the product is easy to use and produces very attractive visuals, it is not particularly sophisticated and may prove inadequate as needs mature. It is still a classic choice between ease of use and sophistication. However, Tableau is likely to lose market share as more competitors enter the market in the future.

QlikView
The QlikView BI platform has the ability to be all things to all people, and will satisfy business users, developers and enterprise needs. It sets the right balance between ease-of-use and sophistication. Great extensibility and the ability to create new chart types and BI apps make QlikView win the game.


MicroStrategy
MicroStrategy is in many ways a meeting of the old and new in business intelligence, and takes the positives from both. It is truly an enterprise solution meeting the less glamorous demands for regular reporting, complex dashboards and extensive admin, while offering up the sexier self-service BI users now expect from a BI solution. It is expensive, and for organizations with less demanding requirements other options will be more economical.


Information Builders      
Information Builders is a long-established supplier of BI, analytics, and integration technologies.
The integration of BI and data mining is quite unique and puts IB ahead of the crowd. Its maturity, sophistication, and value for money are very hard to beat.


Jaspersoft
Jaspersoft does not particularly distinguish itself in any way, but neither does it have any striking inadequacies. This BI suite will address the BI needs of many organizations without a great deal of fuss, and can be extended to meet bespoke requirements.


Conclusion
Overall, I recommend either QlikView or Information Builders as your BI analysis tool.