March 28, 2013

A Hard Nut to Crack on Filtering the Grouped Data


The spreadsheet has not provided any good solution to the filtering on the grouped data, which make the business users suffer because they will have to resort to the IT stuffs while making a business analysis.
 
Suppose that you need to filter out the countries in which the sales are above 200 million US dollars. It seems that you can just collapse the data to the 2nd level and then filter directly. However, the filtered result is:
 

The summary value in the subtotal column turns out to be 0, and the Switzerland is expanded additionally. It is obvious not the result we have expected. The reason to these phenomena is that the common spreadsheet cannot tell the summary section from the detail section. Filtering on these summary values will affect the details and double the mess for both the detail and the summary.

Then, let’s check the 2nd filtering: filter out the outstanding distributer by the half of the highest sales. Considering this way of filtering, we can see that the first trouble is to calculate the half of the total sales in each country. If collapsing the data section, entering the expression in B3, and dragging it to other cells, then you will find that the original detailed data in the column B will be replaced with “#Value!”:

       
Again, this is also because the summary section and the detail section cannot be differentiated. In this case, the formula can only be copied one by one manually. For these nearly 200 countries, it is almost impossible to make no error.

Then, it is more cumbersome to filter the details by the data in the column B. You can only filter respectively one group after another and enter the filtered value manually at the first. Then, copy the filtered result to the new sheet for future use, or the filtering cannot be carried on. This job involves great workload, and almost impossible to finish manually. Shall we return to the Office Primitivation?

esCalc can achieve the Office Automation for such kind of grouping problems. esCalc introduces the concept of “homocell”, which can have the data of the same business status to be calculated automatically, and the data of different status will not be interfered mutually.  Thus, the filtering in the summary zone will not affect the detail, and vice versa. For example, to filter by the subtotal of sales, simply collapse the data and filter in the summary section: 


Once expanded, you will find that the detail data will not be affected:


If filtering out the outstanding distributor by the half of the total sales, it will be much easier. Enter the formula in the detail zone of any group like the group 1:


Just input this formula once, and it will be executed in other groups automatically. The formula will be adapted automatically with no human intervention required. The final result is shown below:

It is originally posted on it.toolbox.com.

March 26, 2013

Problem: Further Computations on Grouped Data in Spreadsheet



Spreadsheets can usually carry out grouping on data easily, but the further computing on grouped data is much more difficult. For example, a data (wine inventory) is after shown as below after group and summary :


Say, if we are to compute the total amount of each wine or even each type. In the spreadsheet, it is a bit difficult to carry on these calculations or actions based on the above: calculate the ”UnitPrice*Stock” amount; sort the data in the group by inventory level; add an additional level of grouping by Name to calculate the inventory of the same type of wine; calculate the average unit price of each type of wine; and even highlight the subtotal section such as the F4 and F10. 

To calculate the resulting amount of ”UnitPrice*Stock”, a new column G need to be appended along with an expression of “=E4*F4”, and then drag the expression to every row of this column. Because UnitPrice is unavailable in the subtotal row, the formula is invalid there and must be removed for each row manually, which is error-prone since there are many types of wines.

To sort the data in a group by the inventory level, you can only sort manually on each group of data since the data is allocated to several irrelevant groups. For example, the 1st group of data is from the row 4~9. It is error-prone if the data volume is huge. Another solution is to ungroup and present the data in a 2D sheet without subtotal, sort by Type and Stock, and then summarize in groups. In this solution, if there is any data in the subtotal row like “proportion of subtotal to total”, then users will have to recalculate these data once the row of subtotal disappears in the procedure. All in all, both solutions suffer from obvious drawbacks.

Add another level of grouping by Name to calculate the inventory of a same type of wine. Because the data is not sorted by Name, the data cannot be grouped directly and we have to go back to solve the previous problem.

To calculate the average unit price of each type of wine, both the 2 solutions available are quite cumbersome. The first solution is to ungroup and paste the data into 2 pieces, either of which will respectively go through the average unit price calculation or the inventory summary calculation, then join the summary values of these 2 pieces of data. The second solution is to write the formula in every subtotal section, for example “=SUBTOTAL(1,E4:E9)” in E3. Since the data in various groups differ, users will have to write N different formulas again and again and errors may quite easily occur.

To highlight the subtotal section, users have no choice but to repeat the simple action on each group one after another manually. Without any means to automate, it is so sarcastic when considering the Office Automation.

In facts, the above problem can be solved with esCalc. esCalc introduces the concept of homocell to present the business relations between cells so as to enable the auto-processing on data of various business senses. For example, if inputting the formula =E4*F4 in G4 for once, then the formula will be pasted to all rows of the details data except for the summary row, as shown in the below sheet:


The data in the group is sorted by the inventory level. In the same way, just sort once in the F4, F5 or F11 cell, the respective positions will be auto-identified, and the data in each group will also be auto-sorted, as shown in the below sheet: 


As you know, the esCalc homocell mechanism can solve other problems automatically, and we will not dwell on it for it’s the topic of another essay.

Related Articles:

March 21, 2013

An Example Where Circles are Useful in Graphs

The originally post by Naomi Robbins,: http://www.forbes.com/sites/naomirobbins/2013/01/15/an-example-where-circles-are-useful-in-graphs/

I recently saw the infographic in Figure 1 and thought to myself, “Another application of bubbles in infographics.” After all, I criticized circles and bubbles in Misleading Graphs: Figures Not Drawn to Scale. In The Functional Art, Alberto Cairo wrote “No fashion plague is more prevalent as I write this book than the bubble.” I didn’t study Figure 1 carefully since the text is in Italian, a language that I don’t read.


This figure was originally published on July 22, 2012 in La Lettura, the Sunday cultural supplement of Corriere della Sera, the highest circulation newspaper of Italy. It was produced by Accurat, a design agency in Milan and New York that does amazing work.

Later I discovered Figure 1 was translated into English and discussed in Parson’s Journal for Information Mapping. There I learned that the circles represent the distance you can travel underground from the center of cities using their subway systems. Overlapping symbols show that you can get from one city to the other underground. I can’t think of a better way to show distances from the center of a city than with a circle.

The visualization then provides lots of other information by which each city’s subway system could be compared. Pictograms show the number of passengers, the stroke width of the circles shows the cost of a one-way ticket, and the little colored squares show the colors of the lines.

There is more than one takeaway from this story. First, our first impressions are not always correct. Second, exceptions exist for many graphical principles. There are examples where circles are an excellent choice to show the data we want to display.

March 13, 2013

Is it Possible to Realize Self-reporting for Users without IT Expertise?

The reporting tool is the software to present the table and statistics chart dynamically. It is the basic requirements for business, a basic means of running business, and the basis for boosting competitiveness. It is the most common tools for business personnel. However, using the traditional reporting tool requires a relatively stronger technical background. The average business personnel cannot handle it independently, and thus lots of problems are brought about: The business personnel is quite inefficient in preparing the report, the time-spanning is quite great, and the commercial opportunity can be slipped away easily; The report designed by technical personnel can easily bring about the errors on business logics, and mislead the enterprise decision; As the result of cooperation between the technical personnel and the business personnel, the report may deviate from the actual business, and fails to reflect the actual state of the businesses affairs.

For example, Bill is the sales director of a certain pharmaceutical company. He is in urgent need to prepare a certain product sales report to impress the clients in an important bidding event. This report requires presenting the monthly sales value, monthly growth, and year-over-year growth of these 3 products, as shown in the below figure:

However, he found that this report does not exist in the information management system of the company. Therefore, he himself and sales team tries to prepare this report by themselves. Quite soon, they found that many obstacles are unconquerable, for example, they needs SQL statements to retrieve the data, the complex expressions for filtering and processing, and the abstract expression for month-over-month comparison or link relative ratio calculation. Bill can do nothing but abandon the report, which greatly reduced the eventual effect of presentation.

There are still other similar cases available:

A retailer newly launched a supply-chain management system, and arranged the business personnel for fabricating the report. The business personnel have grasped a great many of fabrication skills for traditional reports. However, they still act slowly when preparing the simple report, and cannot fabricate the complex table.

A bank improves its business procedure that requires the modification on the corresponding reports. However, from the IT team point of view, the report modification so complex that the procedure will have to be simplified. As a result of compromising, the procedure is partly simplified, and some reports are modified.

When preparing the report related to Clinical Practice, the IT technician from the report vendor mixed up with the AE (Adverse Event) and SAE (Serious Adverse Event) concepts, which leads to the error of the key indicators.

As can be seen from the above examples, the technical obstacles have hindered the development of traditional reporting tools because of the native drawbacks of traditional reporting tools:

To retrieve the numbers, SQL statements become a must

For example, the 1st step to prepare the report is to retrieve the data, and the data is usually stored in the database. Therefore, we need to compose SQL statements.

In the case of "Specific Product Sales Report", the relevant data is stored in the 2 physical tables. The primary data of mediSales table is as follows:


The primary data of mediInfo table is as follows:


There are multiple ways to retrieve the data through SQL. In the extreme cases, although a long SQL statement can be used to calculate out the final report result, it is a bit too complex and the existence of reporting tool becomes ultimately meaningless. The simplest way is to split it into 2 statements. Firstly, retrieve the sales data from Jul to Dec of this year from mediSales table, as shown below:

SELECT mediSales.OrderID,mediSales.OrderDate,mediSales.Quantity,mediSales.mediID from mediSales
where ( mediSales.OrderDate>=#7/1/09# and mediSales.OrderDate<=#2009-12-31#)
or( mediSales.OrderDate>=#7/1/10# and mediSales.OrderDate<=#2010-12-31#);

The result is:


Then, retrieve the specific 3 products from mediInfo, as shown in the below figure:

SELECT mediInfo.MediID, mediInfo.MediName ,mediInfo.UnitPrice from mediInfo
where mediInfo.MediName in('ApXXXX','GpXXXX','JiXXXX');


The efficiency will be much higher if combining both of these 2 actions, and SQL makes it a bit more complex:

SELECT mediSales.OrderID,mediSales.OrderDate,mediSales.Quantity,mediInfo.MediName,mediInfo.UnitPrice from mediSales,mediInfo
where(( mediSales.OrderDate>=#7/1/09# and mediSales.OrderDate<=#2009-12-31#)
or( mediSales.OrderDate>=#7/1/10# and mediSales.OrderDate<=#2010-12-31#))
and mediInfo.MediID=mediSales.MediID and mediInfo.MediName in('ApXXXX','GpXXXX','JiXXXX');

SQL is the commonest database query language. Theoretically, they can be used to complete any complex data processing procedure. However, since it is a tool specially designed for IT technician, it becomes the impassable technical obstacle for business personnel as they will have to learn the primary key and foreign key, BCNF, association, nested query, etc. For the business personnel, learning these contents out the range of their major expertise is not only wasting their energies, but also of the bad effect. They needs a tool highly SQL visualized.

In addition, as for the data from text file and other non-databases, the senior languages like Perl and Java are often required to retrieve the data that requires the relatively higher technical standards.

The obstacle to data retrieval blocks in the way to reporting by the business personnel themselves.

There are also other native drawbacks of traditional reporting tools:
  • Data management requires the complex scripts
  • The commonest calculation cannot be implemented without involving the IT technicians
  • There are many limitations on the band-style report
What's the solution? For example, esCalc has been recognized in business computation powered totally by users primarily through word of mouth. It excels in visualized data computing, smarter spreadsheet calculating and professional self-service reports.

Related News about Raqsoft:

Get a Standalone Alternative for Excel at $0
Truth behind Ticket Purchasing Rush: Statistical Analysis Works
What Makes Self-service Statistical Computing Tools So Important?