August 28, 2014

Methods of Grouping and Summarizing in R Language

The operation of grouping and summarizing consists of grouping two-dimensional structured data by one or more fields and then summarizing fields of each group. The following introduces methods of grouping and summarizing in R language through an example. In order to make the example more typical, we'll use two grouping fields and two summarizing operations.

Case description:
Please group the data frame orders by CLIENT and SELLERID, and then sum field AMOUNT and find its maximum value in each group.


Note: orders contains records of sales orders. Its source can be a database or a file, such as orders<-read.table("sales.txt",sep="\t",header=TRUE). The first rows of data are as follows:

Method 1: aggregate function
Code:
    result1<-aggregate(orders$AMOUNT, orders[,c("SELLERID","CLIENT")],sum)   # sum of AMOUNT per group
    result2<-aggregate(orders$AMOUNT, orders[,c("SELLERID","CLIENT")],max)   # max of AMOUNT per group
    result<-cbind(result1,result2$x)                                         # combine the two results

Part of the computed result:

Code interpretation:
1. The name aggregate implies that it is a function specializing in grouping and summarizing. Both its input parameters and its computed result are data frames, and its usage is relatively simple.
2. The aggregate function cannot perform multiple summarizing operations on grouped data, so two lines of code are required to compute sum and max respectively, and their results are then combined with cbind. Obviously, the code is not satisfactory in performance or usability.
3. The aggregate function has a strange requirement on the order of the grouping fields: they must be written in reversed order. So, to group by CLIENT first and then SELLERID, the code has to be written as orders[,c("SELLERID","CLIENT")]. Code written in the natural order would be incorrect.
4. Not only is the code written in an unnatural way, but the computed result is also odd, putting field SELLERID before CLIENT. In practice, the code should be improved so that the result conforms to the business logic, as in the sketch below.
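For example, a minimal sketch of such an improvement (the summary column names are illustrative assumptions):

    result<-cbind(result1,result2$x)
    names(result)<-c("SELLERID","CLIENT","SUM_AMOUNT","MAX_AMOUNT")     # readable column names
    result<-result[,c("CLIENT","SELLERID","SUM_AMOUNT","MAX_AMOUNT")]   # restore the natural order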

Summary:
The aggregate function manages to perform the task after a fashion, but it is weak in performance and usability because the way of coding, the computed result, and the business logic are inconsistent with one another.

Method 2: split function + lapply function
Code:
    sp<-split(orders,orders[,c("SELLERID","CLIENT")],drop=TRUE)
    result1<-lapply(sp,FUN=function(x) sum(x$AMOUNT))   # sum AMOUNT within each group
    result2<-lapply(sp,FUN=function(x) max(x$AMOUNT))   # max AMOUNT within each group
    result<-cbind(result1,result2)                      # combine the two summaries

Part of the computed result:

Code interpretation:
1. The role of the split function is to group the data frame by the specified fields; no further computation is involved. The lapply function performs the same operation on each group. Working together, split and lapply fulfill the task.
2. Because the grouped data can be reused, this method performs better than the one using the aggregate function.
3. As the lapply function doesn't support multiple statistical operations either, two lines of code are still required to compute sum and max respectively, with cbind combining the results. What's more, this method requires an extra split call, so instead of enhancing usability, it reduces it.
4. The grouping order is still unnatural, and the code has to be written reversely as orders[,c("SELLERID","CLIENT")].
5. The computed result needs a lot of postprocessing, which brings great inconvenience. The first column of the computed result is, in fact, "SELLERID.CLIENT". It needs to be split into two columns whose order should then be exchanged, as in the sketch below.
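A hedged sketch of that postprocessing (it assumes the group keys are the row names produced above and that neither field value contains a dot; the new column names are illustrative):

    keys<-do.call(rbind,strsplit(rownames(result),".",fixed=TRUE))   # split the "SELLERID.CLIENT" keys
    result<-data.frame(CLIENT=keys[,2],SELLERID=keys[,1],            # exchange the column order
                       SUM_AMOUNT=unlist(result[,1]),
                       MAX_AMOUNT=unlist(result[,2]))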

Summary:
This method improves performance somewhat, but its usability is obviously poor, with the way of coding, the business logic, and the computed result inconsistent with one another.

lapply belongs to the apply family of functions. Similar functions include sapply and tapply, whose usages differ in parameters. For example:
    sp<-split(orders,orders[,c("SELLERID","CLIENT")],drop=TRUE)
    result1<-sapply(sp,simplify=FALSE,FUN=function(x) sum(x$AMOUNT))
    result2<-sapply(sp,simplify=FALSE,FUN=function(x) max(x$AMOUNT))
    result<-cbind(result1,result2)

tapply is intended for grouped summarizing and, by rights, should be the most suitable function for this task. In fact it isn't: it applies only when a single field is grouped. When two fields are grouped together, the result is a two-dimensional matrix, which users have to process further in complicated ways, as in the sketch below. For example, the computed result of the line of code tapply(orders$AMOUNT, orders[,c("SELLERID","CLIENT")],function(x) sum(x)) is as follows:
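To turn that matrix back into a grouped table, further processing along these lines is needed (a hedged base-R sketch; the NA cells are seller/client combinations that never occur):

    m<-tapply(orders$AMOUNT, orders[,c("SELLERID","CLIENT")],sum)
    flat<-as.data.frame(as.table(m))   # long format: SELLERID, CLIENT, Freq
    flat<-flat[!is.na(flat$Freq),]     # drop the absent combinations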

Third-party library functions
There are various disadvantages in using R's built-in functions to group and summarize. In response, we may consider third-party library functions, such as reshape, stack, etc. But the stability and computational efficiency of these library functions are generally not as good as those of the built-in functions, and documentation on their use is scarce, so it is difficult for them to fulfill the task. We won't go into examples of their use here.

Third-party languages
Python, esProc and Perl can also be employed to fulfill this task. All of them can perform grouping and summarizing as well as structured data computing, as R can. We'll briefly introduce the esProc and Python solutions.

esProc
esProc can fulfill this task with the groups function alone. Its syntax is concise, easy to understand, and in line with the natural way of thinking. The code is as follows:
    result=orders.groups(CLIENT,SELLERID;sum(Amount),max(Amount))

The computed result, syntax and business logic are highly consistent with each other in esProc. Some of the computed results are as follows:

Python (pandas)
If Python's built-in functions were used for this task, the code would be rather complicated. Here pandas, a third-party library, comes to help: it first groups with the groupby function, then summarizes with the agg function. Its code, simpler than R's but not as concise as esProc's, is written like this:
    result=orders.groupby(['CLIENT','SELLERID']).agg({'AMOUNT':[sum,max]})
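For completeness, a hedged, self-contained version of that snippet (the file name and separator are taken from the case description above):

    import pandas as pd
    orders = pd.read_csv("sales.txt", sep="\t")   # read the tab-separated sales file
    result = orders.groupby(['CLIENT','SELLERID']).agg({'AMOUNT':[sum,max]})
    print(result)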

Pandas' computed result and syntax are highly consistent with the business logic. Part of the computed result is as follows:

August 27, 2014

Code Examples of Common In-Memory Grouping with esProc

It is convenient to realize some common kinds of in-memory grouping with esProc, such as equal grouping, alignment grouping and enumeration grouping. They are illustrated with the following examples.

Equal grouping

The grouping basis of equal grouping consists of certain fields (or columns computed from fields) within the data set. Each group is a subset of the original data set.
Case description: Group sales orders by year.

Data description: Data of sales orders are shown below: 
The above data set (table sequence) can be accessed from a database or a file. For example:
A1=file("E:/sales.txt").import@t()

esProc code:
         A2=A1.group(year(OrderDate))

Computed results:
Code interpretation:

1. In this example, the grouping basis comes from OrderDate. The date of each sales order is converted into the year through year(OrderDate), and data of the same year are grouped together.

2. There may be multiple grouping fields. For example, to group the data by both year and seller, the code is:

A1.group(year(OrderDate),SellerId)

3. Often, the grouped data are used in aggregation operations, such as computing each year's sales amount from A2. The code is:
A2.new(year(OrderDate):y,~.sum(Amount):a)

Computed results are:

Or, combine grouping and summarizing into one step with groups function:
A1.groups(year(OrderDate):y; sum(Amount):a)

Of course, sometimes we have to execute grouping and summarizing separately in order to reuse the grouped data and improve computational efficiency. For example, we may filter one of the groups of A2 and perform an association computation on another. Another situation: if, after summarizing, the data of a certain group look unusual and deserve further study, that group can be used directly for follow-up computation, with no need to filter it out again. A hedged sketch of such reuse follows.
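For instance, a minimal sketch of reusing the groups in A2 (the select call over the group sequence and the threshold are illustrative assumptions):

    A3=A2.select(~.sum(Amount)>10000)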

4. By default, esProc's group function groups data with a hash algorithm. But for ordered data, comparing adjacent rows gives higher performance; this can be requested with the @o option of the group function. For example:
A1.group@o(year(OrderDate),SellerId)

Alignment grouping


The criteria used for equal grouping come from within a data set. But sometimes they originate from outside it, such as fields of other data sets, user-created arrays, parameter lists and so on. This is where alignment grouping comes in.
   
Different from equal grouping, alignment grouping may produce empty subsets, that is, groups with no corresponding members in the data. It may also produce an incomplete grouping, meaning that some data are not included in any group. Neither happens with equal grouping.

Case description: A table of the top 10 sellers has been worked out according to performance; please group the sales orders following the order of that table.
The data set before grouping:

The sales orders are the same as those in the above example. Data are stored in A1.

Table of the top 10 sellers is stored in B1 as follows: 

Sellers table may come from a temporary table, or be generated by a piece of code. The generating process is not the focus of this example.
esProc code:
    A1.align@a(B1:empID,SellerId)

Computed results:

Code interpretation:
1. In this example, the grouping basis (the sellers table) comes from outside the data set being grouped. After grouping, each group contains only the data of one seller, and the groups are sorted according to the sellers table.

2. Because the sellers appearing in the sales orders outnumber those in the sellers table, some orders won't appear in any group. If we want one more group to hold these orders, we can use function option @n as follows:
A1.align@a@n(B1:empID,SellerId)

The extra group is appended at the end, as follows:

3. Sometimes the grouping basis is not contained in the data set being grouped at all, as with a "newly-employed sellers table". In this case it's normal to produce empty groups. For example, if the first record of the sellers table is modified to empID=100, the computed results will be:

Enumeration grouping 


The grouping basis for enumeration grouping is more flexible still: it can be any Boolean expressions. The records satisfying the same expression get into the same group.

Similar to alignment grouping, enumeration grouping is also a kind of incomplete grouping, possibly producing empty subsets or leaving some records outside every group. In addition, with this grouping method, some records may appear in more than one group.

Case description: Divide sales orders into four groups: A. order amount is less than 1,000; B. order amount is less than 2,000; C. order amount is less than 3,000; D. order amount is less than 10,000. Note that the data cannot be grouped repeatedly; that is, if an order is already in group A, it must not be put into group B, C or D.

The data set before grouping:
The sales orders are the same as those in the above example. Data are stored in A1.

esProc code:
         A2=["?<=1000","?<=2000","?<=3000","?<=10000"]
         A3=A1.enum(A2,Amount)

Computed results:
Code interpretation:

1. In this example, the grouping basis is a set of flexible expressions. Each record is compared with the expressions, and records satisfying the same expression are put into the same group. The groups are sorted according to the order of the grouping basis as well.

2. By default, enumeration grouping does not put the same record into more than one group. As shown in the above example, after group A's data are selected, only the remaining data are compared with expression B. With function option @r, all data are compared with expression B, so duplicate members may appear across groups. For example, with A3=A1.enum@r(A2,Amount), the computed results are:

3. Similar to alignment grouping, if an expression of enumeration grouping matches none of the data to be grouped, an empty group appears. Besides, if some data match no expression at all, function option @n can be used to put them into a surplus group, as in the one-line sketch below.
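Following the pattern of the align@a@n example above, a hedged one-liner (assuming @n combines with enum in the same way):

    A3=A1.enum@n(A2,Amount)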

August 26, 2014

Method of Filtering Data of Big Text Files with R Language

It's common to use R to filter data in files. Sometimes we need to filter data of big files which cannot be loaded into memory all at once. The solution is to import and filter in batches, then merge the results. Here is an example illustrating the method of filtering data of big files with R.

There is a 1GB file, sales.txt, containing a great number of sales order records. Please filter out the records in which the value of field AMOUNT is between 2,000 and 3,000. The column separator used in the file is "\t". The first rows of data are as follows:

R's solution:
con <- file("E:\\sales.txt", "r")
readLines(con,n=1)
result=read.table(con,nrows=100000,sep="\t")
result<-result[result$V4>=2000 & result$V4<=3000,]
while(length(databatch<-read.table(con,header=FALSE,nrows=100000,sep="\t"))!=0) {
  result<-rbind(result,databatch)
}
close(con)

These are some of the computed results:

Code interpretation:
The 1st row: Open the file handle.

The 2nd row: Skip the first row, i.e. the column names.

The 3rd~4th rows: Import the first batch of 100,000 rows of data, then perform filtering and save the result in result.

The 5th~8th rows: Import by loop, 100,000 rows of data per batch. Filter each batch, append the filtering result to the result variable, and then process the next batch.

The 9th row: Close the file handle.

Points for attention:

For small files, a single line of code is enough to import the data, and the first row can be set as the column names of the data frame. But this won't work with big files, whose data must be imported in batches; from the second batch on, the first row cannot be treated as the column names. The column names then default to V1, V2, V3 and so forth.

In order to import data from big files in batches, R has to use a while statement, and the column names are inconvenient to use. Both facts make for fairly complicated code.

Alternative methods:
Python, esProc and Perl can also do this task. Like R, all of them can filter file data and perform structured data computing. The following briefly introduces the methods used by esProc and Python.

esProc can automatically process data in batches, so it's not necessary for programmers to control the operation manually with loop statements, and its code is quite simple:
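A minimal sketch of what that code can look like (hedged: it assumes esProc's cursor-style select and fetch functions and the field name AMOUNT from the file header):

    A1=file("E:/sales.txt").cursor@t()
    A2=A1.select(AMOUNT>=2000 && AMOUNT<=3000)
    A3=A2.fetch()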


Cursor is an esProc data type used for structured data computing. It has almost the same usage as the data frame but is better at processing big files and performing complex computations. What's more, a cursor can set the first row of the file as the column names with the @t option.

Python's code structure is similar to that of R; both require manual loop control. But Python has no structured data type like the data frame or cursor, so its code works at a lower level:
result = []
myfile = open("E:\\sales.txt",'r')
BUFSIZE = 10240000
myfile.readline()                      # skip the header row
lines = myfile.readlines(BUFSIZE)      # read a batch of lines
while lines:
    for line in lines:
        record = line.split('\t')
        AMOUNT = float(record[3])
        if AMOUNT >= 2000 and AMOUNT <= 3000:
            result.append(record)      # keep the records in range
    lines = myfile.readlines(BUFSIZE)  # read the next batch
myfile.close()


Though Python can perform the above operation with third-party packages like pandas, which has a structured data object similar to the data frame, pandas lacks sufficient ability in processing big files, so it is hard to simplify the code much further.

August 25, 2014

A Method of Grouping and Summarizing Data of Big Text Files in R Language

It is common to use R to group and summarize data in files. Sometimes the files are comparatively big: the computed result is small, but the source data are too big to be loaded into memory at once. The only solution is to import and compute in batches, then merge the results. The following example illustrates the way R groups and summarizes data from big text files.

Here is a 1GB file, sales.txt, containing a great number of sales order records. We want to group the data by field CLIENT and sum field AMOUNT. "\t" is used in the file as the column separator. The first rows of data are as follows:
R's solution:
con <- file("E: \\sales.txt", "r")
result=read.table(con,nrows=100000,sep="\t",header=TRUE)
result<-aggregate(result[,4],list(result[,2]),sum)
while(nrow(databatch<-read.table(con,header=FALSE,nrows=100000,sep="\t",col.names=c("
ORDERID","Group.1","SELLERID","x","ORDERDATE")))!=0) {
databatch<-databatch[,c(2,4)]
  result<-rbind(result,databatch)
  result<-aggregate(result[,2],list(result[,1]),sum)
}
   
close(con)

Part of the computed result:

Code interpretation:
The 1st line: Open the file handle.
The 2nd~3rd lines: Import the first batch of 100,000 rows of data, group and summarize them, and save the result in result.
The 4th~8th lines: Import data by loop, 100,000 rows per batch, into the variable databatch. Then take the second and fourth fields, i.e. "CLIENT" and "AMOUNT", merge databatch into result, and execute the grouping and summarizing again.
It can be seen that, at any given moment, only databatch, which holds 100,000 rows of data, and result, the summarizing result, occupy memory. Usually the latter is small and will not cause a memory overflow.
The 9th line: Close the file handle.

Matters needing attention:
Data frame. Because R's data frame cannot directly compute over big files, a loop statement is needed to do the job here: import a batch of data and merge it into the data frame result, group and summarize result, then import the next batch. You can see that this looping code is a little complicated.

Column names. As the first row of the file holds the column names, header=TRUE can be used for the first batch to set them directly. But the subsequent batches have no header row and must be imported with header=FALSE, which by default yields the column names V1, V2 and so forth. After grouping and summarizing, however, the default column names become Group.1 and x, so col.names is needed to rename the incoming columns, keeping the structure consistent before and after grouping and summarizing and setting the stage for the subsequent merging. This column-name handling deserves attention because it is easy to get wrong.

Alternative solutions:
Python, esProc and Perl can also perform this operation. Like R, they can group and summarize data from big text files and carry out the subsequent structured data computing. We'll briefly introduce the coding methods used by esProc and Python.
esProc can process data in batches automatically, which requires no manual loop control from programmers and produces quite simple code:
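A minimal sketch of what that code can look like (hedged: it assumes the cursor-style groups function and the field names CLIENT and AMOUNT from the file header):

    A1=file("E:/sales.txt").cursor@t()
    A2=A1.groups(CLIENT;sum(AMOUNT))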

Cursor is a data type used for structured data computing in esProc. Its usage is similar to that of the data frame, but it is better at processing big files and performing complicated computations. What's more, the @t option in the code indicates that the first line of the file holds the column names, so the column names can be used directly in subsequent computation.

Python's code structure, which also requires manual loop control, is similar to that of R. But Python itself has no structured data type like the data frame or cursor, so its code works at a lower level:
from itertools import groupby
from operator import itemgetter

result = []
myfile = open("E:\\sales.txt",'r')
BUFSIZE = 10240000
myfile.readline()                                     # skip the header row
lines = myfile.readlines(BUFSIZE)                     # read a batch of lines
while lines:
    for line in lines:
        record = line.split('\t')
        result.append([record[1],float(record[3])])   # keep CLIENT and AMOUNT
    result = sorted(result,key=lambda x:x[0])         # sort before grouping
    batch = []
    for key, items in groupby(result, itemgetter(0)): # group with groupby
        value = 0
        for subitem in items:
            value += subitem[1]
        batch.append([key,value])                     # one summarized row per group
    result = batch                                    # carry the summary into the next batch
    lines = myfile.readlines(BUFSIZE)
myfile.close()

Besides the above two-dimensional-array approach, Python can execute the operation with third-party packages. For example, pandas has a structured data object similar to the data frame and simplifies the code in a similar way to R. But it lacks sufficient ability to compute over big files, so a loop statement is still needed when programming.

August 21, 2014

Basic Data Types in Data Processing Programming Languages

Programming languages focus on different basic data types, subject to their different design goals. Languages such as Java and C# are designed to develop common applications; their basic data types are character strings, numbers, booleans and other atomic types, arrays, and common objects. SQL, PowerBuilder, R, esProc, and other such languages are designed to process data, so their basic data type is the structured 2-dimensional data table object. Take this SQL statement, for example: SELECT T1.id,T1.name,T1.value FROM T1 LEFT JOIN T2 ON T1.id=T2.id. Here T1, T2, and the computed result all use such a data type. With multiple fields forming one record and multiple records forming 2-dimensional data, the combination of such data and its field names is the structured 2-dimensional data table object.

Why not use atomic data types and common objects as the basic data type for data processing languages? If you represent T1 and T2 from the above SQL statement with arrays or ArrayList objects, you will find that the complexity increases several times over and the length of the code grows sharply, by dozens of times, as the sketch below suggests.
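To make that concrete, here is a hedged Python sketch of the same left join written against plain lists of dicts (T1 and T2 are assumed sample tables; a SQL engine does all of this implicitly):

    # sample data standing in for tables T1 and T2
    T1 = [{"id": 1, "name": "a", "value": 10}, {"id": 2, "name": "b", "value": 20}]
    T2 = [{"id": 1}, {"id": 3}]

    # LEFT JOIN T2 ON T1.id=T2.id, selecting T1.id, T1.name, T1.value
    result = []
    for r1 in T1:
        matches = [r2 for r2 in T2 if r2["id"] == r1["id"]]
        if matches:
            for r2 in matches:                  # one output row per match
                result.append((r1["id"], r1["name"], r1["value"]))
        else:                                   # unmatched T1 rows are kept as well
            result.append((r1["id"], r1["name"], r1["value"]))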

The basic data type of data processing languages is the structured 2-dimensional data table object. This is not a coincidence; there are subtle reasons behind it.

Correspond to actual business. In the real world, most business data are structured. For example, a payroll list has the employee number, employee name, department, date, pre-tax salary, and post-tax salary; a retail record has the order time, outlet number, checkout counter number, cashier number, product name, and unit price; a website log comprises the browse time, URL, visitor IP, browser version, and other properties. These properties are equivalent to fields, and each record has the same structure. Though such data may be stored in text rather than a database, they are still structured data in nature, so it is only natural to represent them with a 2-dimensional data table. The structured 2-dimensional data table object represents business data intuitively and faithfully, whether for storage, computing, exchange, or sharing, and such data are the easiest for users to understand.

Easy for massive processing. Business data are mostly data of the same structure, as in the payroll table, retail records, and website logs mentioned above. In processing such data we occasionally handle a single value of a single record, but in most cases we take records as the unit and process all the data: computing the after-tax wage from the pre-tax wage, computing the amount from the unit price and quantity of a commodity, counting the daily on-line duration for each IP. This processing mode is exactly massive data processing. To implement it, we can traverse every member of an array in loops by row and column number, as in Java. Alternatively, we can operate on the data directly through business field names, as in SQL and esProc. The latter solution is simpler and easier to use, with no loop statements to write; programmers can operate on data intuitively from the business perspective, and the corresponding code becomes more concise and readable.

Compatible with the Relational Algebra. Relational algebra is the underlying theory developed for data processing and query. With its basic operations along with the join, aggregation, and division operations, the associations and laws of operations among business data can be expressed in full detail. Theoretically, any data processing or data query problem, of any degree of difficulty, can be expressed and solved in relational algebra. Because relational algebra is concise and complete, databases are largely designed on this theory, and E.F. Codd is therefore called the father of the relational database. The structured 2-dimensional data table object is just the data type recommended by E.F. Codd; it can express the various operations of relational algebra and thus solve data processing problems easily. In fact, the database result set is the earliest structured 2-dimensional data table object.
As can be seen, all kinds of programming languages adopt the structured 2-dimensional data table object as the basic data type because it corresponds to real business data, makes massive computation easy to implement, and is compatible with relational algebra theory. With the 2-dimensional data table object, code can be simple and easy to understand, and development efficiency is improved. Let me explain with a few more examples below:

Result set of SQL (resultSet): Group by the book type to compute the average price of the books whose average price is greater than 15 yuan.
    select avg(price),type from books group by type having avg(price)>15

Table sequence of esProc (TSeq): Group by department to find the top 10 best sellers for each department.
    products.group(department).(~.top(quantity;10))

Data window of PowerBuilder (datawindow): Sort the order by price
    Order.SetSort('value d')
    Order.Sort()

R language data frame (data.frame): Left-join the orders table and customer table by customerID.
    merge(A1,B1,by.x="CustomerID",by.y="CustomerID",all.x=TRUE)

SQL, esProc, and R code comparison: Group the order data by department, and summarize the order data and sales amount of each department.
SQL:
    Select count(*),sum(sales) from orders group by Dept

esProc:
    orders.groups(Dept; count(~), sum(sales))

R language:
    result<-aggregate(orders$sales,list(orders$Dept),sum)
    result$count<-tapply(orders$sales,orders$Dept,length)
        
Let's take a closer look at the result set, table sequence, data window, and data frame. Although they are all structured 2-dimensional data table objects with basically the same functions, there are slight differences between them.
The SQL result set is backed by abundant reference materials, widely applied, universal, and simple to use; it is the mainstream data type among data processing languages. However, SQL does not implement relational algebra in full, making some computations, such as set division, a bit inconvenient.

DataWindow usually retrieves data through SQL and returns the final result to the database. It mainly serves to break the barrier between data and UI controls, so that programmers can quickly design and deliver highly interactive database applications. Another major function of DataWindow is to render and edit data. It can only be used for form computation, and its data processing capability is relatively poor.

The data frame is capable of structured computation to some extent. As can be seen from the above example, its syntax is obscure, and implementing the same functions with it is relatively complex. This is because R's major functions are scientific and statistical computing, focused on the array and matrix data types; the data frame was introduced later, as an additional data type, to implement structured data computing. In this respect, the data frame is not as dedicated as the other three objects.

TSeq is quite dedicated to data processing. Having incorporated the common strong points of SQL result sets, TSeq implements relational algebra fully and completely. TSeq is ordered and generic, which makes it especially fit for order-related complex computations in data processing, for example: yearly link relative ratio, year-on-year comparison, ranking, relative position computing, and interval computing. Being generic also makes it easier to establish relations between data and to access data with multi-level associations through objects. Compared with SQL, though, TSeq cannot directly process big data because it is a pure in-memory object.

As can be seen, the structured 2-dimensional data table object is directly related to how dedicated a data processing language is: the more powerful the former, the more dedicated the latter, and vice versa. If a programming language lacks the structured 2-dimensional data table object, it can hardly be regarded as dedicated to data processing. To judge whether a programming language can be used to develop data analysis and processing applications efficiently, the key is whether it offers a dedicated 2-dimensional data table object and the appropriate class library.

Perl is often used to process character strings and is capable of processing data to some extent. However, since its code is lengthy and complex, it is not a dedicated data processing language. For example, to complete the simplest grouping-and-summarizing algorithm, the Perl code looks like this:
    %groups=();                                  # hash: group name => rows of the group
    foreach (@carts){
        $name = $_->[1];
        if(!defined $groups{$name}){             # first record of this group
            $groups{$name}=[$_];
        }
        else{
            push(@{$groups{$name}},$_);
        }
    }
    my @result=();
    foreach (keys(%groups)){                     # summarize each group
        $value=0;
        while($row=pop @{$groups{$_}}){
            $value += $row->[2];
        }
        push @result,[$_,$value];
    }

Python is a bit simpler to write, but still an order of magnitude less efficient to develop in than SQL, esProc, and R. The sample code is shown below:
    from itertools import groupby
    from operator import itemgetter

    # data: rows of [key, value1, value2], already sorted by the grouping key
    result=[]
    for key, items in groupby(data, itemgetter(0)):
        value1=0
        value2=0
        for subitem in items:                    # accumulate both measures
            value1+=subitem[1]
            value2+=subitem[2]
        result.append([key,value1,value2])
    print(result)

Perl and Python are not dedicated data processing tools. Most importantly, they lack the structured 2-dimensional data table object.

The TSeq of esProc is not only a structured 2-dimensional data table object but is also characterized by being ordered and generic and by supporting step-by-step computation, making it more dedicated than similar languages. For example, consider a relatively complex computational goal: find the shares that have been rising for 5 consecutive days. The esProc solution code is shown below: