August 28, 2015

esProc Simplifies SQL-style Computations – Transpose Rows and Columns

During database application development, we often need to deal with complicated SQL-style computations. The transposition of rows and columns is one of them. Oracle uses pivot function to realize this computation. The other databases haven’t the counterparts to realize it directly, which makes the corresponding code difficult to write, understand and maintain. Besides, even the pivot function can only transpose the fixed columns, but is powerless about the unfixed ones. So are the other databases. Generally all of them must resort to the high level programming languages to realize the dynamic SQL.

However, coding this computation with esProc will be concise and easy to understand. We’ll use an an example to illustrate this.

The following figure shows part of the SALES – a database table where order data are stored.

It is required to compute the total order amount, the maximum and minimum order amount, and the total number of orders of each month of the year 2013, and then transpose the data into a table with thirteen columns and four rows, in which the four operations occupy the first column, with subtotal being the column name, and every month covers a column, with the column names being 1, 2, 3, 4… The first five fields are as follows:

esProc code:

A1:Execute the SQL statement of selecting the data of the year 2013 and grouping and summarizing the data by the month. Result is as follows:

This simple SQL statement for data grouping and summarizing is supported by any database. The difficulty is the transposition of rows and columns following it.

A2=create(subtotal).record(["OSum","OMAX","OMIN","OCount"])

This line of code creates an empty table sequence where there is only one field: subtotal, as shown below:


Note: A table sequence is a data type in esProc. It is a structured two-dimension table similar to the data table of in SQL, but with more powerful function and more flexible usage. By the way, the result of A1 is a table sequence as well.

B2=A2.derive(${to(A1.len()).string()}).

This line of code adds twelve columns to the table sequence in A2 and thus forms the data structure after the transposition, as shown below:


derive function is used to add new columns to an existing table sequence so as to form a new one. For example, derive(1) means adding one column, where 1 is the field name and the field value is the same as the column name. derive(0:field1, null:field2) means adding two columns, where, respectively, field names are field1 and field2 and field values are 0 and null.

According to the requirement of transposition, twelve columns should be added here, for which the code should be derive(1,2,3,4,5,6,7,8,9,10,11,12). A macro, that is ${}, whose role is to convert a string into an expression, is used here in order to generate the code dynamically. to(A1.len()) in the macro is a sequence, whose value is [1,2,3,4,5,6,7,8,9,10,11,12]. The function string() is used to convert the sequence into the string “1,2,3,4,5,6,7,8,9,10,11,12”.

A3-A5:Perform loop on A1, accessing one record each time, rearranging it vertically and, at the same time, modifying the corresponding column in the table sequence in B2. Please note the working range of the loop statement can be represented by the indentation, with no need of using braces ({}), or begin/end. So both B4 and B5 are in the working range and neither A4 nor A5 is in it.

Note: In esProc’s loop body, the loop variable is the cell where for statement is entered. In other word, A3 can be used to reference the current record and A3.MONTH can be used to reference the MONTH field of the current record.

B4=A3.OSum | A3.OMAX | A3.OMIN | A3.OCount

This line of code concatenates the summarized fields of the current record in columns. The operator “|” represents concatenation. For example, the records of December in A1 should be like this after being concatenated:


A3.OSum in the code represents the OSum field of the current record. Since OSum is the second field of the records, it can be referenced by its sequence number, thus the code shall be written as A3.#2. 
Equally, the above line of code can be put as B4=A3.#2 | A3.#3 | A3.#4 | A3.#5.

B5=eval(“B2.run(B4(#):#”+ string(#A3+1)+ “)”).

This line of code means modifying the fields in B2 based on the result of B4.

eval function parses strings into expressions dynamically. For example, the computed result of eval(“2+3″) is 5; and here B2.run(B4(#): #13), the loop code for December, in eval function means inserting members of B4 in order into the 13th column (i.e. December) in B2 according to the sequence numbers of the records in B2.

run function is used to modify the fields. For instance, run(field1+field2:field1, 0:#2) means modifying the value of field1 into field1+field2 and the value of the second field (i.e. #2) into 0.

#A3 means the current loop number. Its value is 1 when the first loop is being executed, and the value is 2 when the second one is being executed, and so on and so forth.

When the loop statement in A3-B5 is executed, the final result will be got in B2. The first several columns are as follows:
In addition, esProc program can be called by the reporting tool or a Java program in a way similar to that in which a Java program calls an ordinary database. The JDBC provided by esProc can be used to return a computed result of the form of ResultSet to the Java main program. For more details, please refer to the related documents。

August 27, 2015

esProc Simplifies SQL-style Computations– Get Top N Rows of Each Group

The operation of getting the top N rows of data is common, such as the biggest salary raise of each employee, the three lowest scores in playing golf and the five days of each month when each product has its biggest sales. SQL resorts to advanced techniques, like window functions and keep/top/row number, to solve this kind of problem. Therefore the code is quite difficult. And as many databases (like MYSQL) doesn’t support these advanced techniques, they can only realize this operation using complicated JOIN statement and nested subquery. If multi-layered groups or multi-level relationships are involved, the computation will become even more complicated.

Yet esProc’s top function can select the top N rows of data of each group according to the row numbers as well as the maximum and minimum values, thus the computation will be made simpler. The following example will teach you how esProc works.  

golf, a database table, stores scores members have got in playing golf. Please select the three highest scores each member has had. Some of the data are as follows:

esProc Code: 

A1:Select data from the database. Click the cell and you can see the result in detail:

A2:=A1.group(User_ID). The code groups the selected data in A1. Result is as follows:

As shown in the above figure, the selected data have been divided into multiple groups by User_ID, with each row representing a group. Click the hyperlinks in blue and you can see members of each group: 
A3:=A2.(~.top(-Score;3)). The code gets the top three records of Score field of each group. Here “~” represents each group of data and ~.top() represents that top function is used to compute each group of data in order. top function can get the top N records of a data set. For example, top(Score;3) means sorting the records in ascending order by Score field and getting the top 3 records (i.e. the three smallest values); top(-Scroe;3) means sorting the records in descending order by Score field and getting the top 3 records (i.e. the three biggest values); and top(#;3) means getting the top three records according to the original order. The computed result of this step is as follows: 

A4:=A3.union(). It performs union operation on each group of data. Result is as follows: 
The above four steps can also be combined into one step for the convenience of maintenance and debugging. Thus the code will be db.query(“select * from golf”).group(User_ID). (~.top(-Score;3)).union().
In addition, esProc program can be called by the reporting tool or a Java program in a way similar to that in which a Java program calls an ordinary database. The JDBC provided by esProc can be used to return a computed result in the form of ResultSet to the Java main program. Please refer to related documents for more details.

August 26, 2015

esProc Simplifies SQL-style Computations – Data Grouping with Fixed Criteria

During database application development, we often need to handle complicated SQL-style computations. Data grouping with fixed criteria is just one type of these. It refers to the computation whose grouping criteria not come from the to-be-grouped data, but from the outside, such as another table, the external parameters or a list of conditions. For the operation of data grouping with fixed criteria in certain cases, like the one in which the grouping criteria are defined by another table and no specific grouping order is required, it can be realized easily in SQL. But for the other cases, SQL has difficulty in realizing the operation. esProc, however, provides easy solutions to various problems of this operation, which will be illustrated through the following examples.

A table – sales - stores the records of orders, in which column CLIENT contains names of clients and column AMOUNT contains order amounts. Now group the table according to the “list of potential clients” and sum up column AMOUNT of each group. Some of the data of sales are displayed in the following figure:

Example 1:The “list of potential clients” comes from the Std field of another table - potential – and only has four records, which are TAS, DSGC, GC and ZTOZ in order. Client ZTOZ is not in sales. It is required to group and summarize the data according to these four records in their order.
If no specific order is required, this operation can be realized easily with SQL. The corresponding code is as follows:

select potential.std as client, sum(sales.amount) as amount from potential left join client on potential.std=sales.client group by potential.std.

But since a specific order is required in this example, a field for sorting the data must be added if SQL is used to realize the operation and this field should finally be removed using subquery . By contrast, it will be much easier to perform the operation in esProc, as the following code shows:


A1,B1:Query data from the database, and name the two groups of selected data respectively as sales and potential, as shown below:

A3:=sales.align@a(potential:Std,Client) This line of code uses align function to divide the Client field of sales into four groups according to the Std field of potential, as shown in the following figure:

It can be seen that the data of the first three groups have existed in sales, while those in the fourth group are not included in it. So the value of the fourth group is empty. The @a option of align function means fetching all the eligible data of each group; without it only the first eligible data of each group will be fetched.

A4:=pjoin(potential.(Std),A3.(~.sum(Amount)))

This line of code joins two groups of data using pjoin function. One part is potential.(Std), which represents the Std field of potential, and the other part is A3.(~.sum(Amount)), which means summing up the Amount field of each group in A3. The final result of this example is as follows:


Example 2:The list of potential clients has fixed values but there are many clients in it many. If there are only a small number of clients, union statement can be used in SQL to combine all the clients into a pseudo table. But the method is not desirable if there are a large number of clients, and a new table must be created to store the data permanently. Yet the use of esProc can save us the trouble of creating the table. The esProc code is as follows:

In the above code, A2 contains strings which are separated from each other by commas and which represent a great number of fixed values conveniently.

Example 3:The list of potential client is the external parameters, like TAS, BON, CHO, ZTOZ.
As the parameters change frequently, it is inconvenient to create a pseudo table using union statement in SQL. Alternatively, we have to create a temporary table, parse the parameters and insert the parsed parameters into this table for later computation. In esProc, it is unnecessary to create a temporary table for this case. The realization process is as follows:

First, define a parameter arg1, as shown in the following figure:


Then modify the script file, as shown below:

Run the s esProc_sql_fixedcriteria_cript and enter the parameter values, which are assumed to be “TAS, BON, CHO, ZTOZ“, as shown below:

Because the grouping criteria are the same as those in example 1, their final results are the same.
Note: The code in A2 converts the strings “TAS, BON, CHO, ZTOZ” into the sequence ["TAS","DSGC","GC","ZTOZ"]. But this step of conversion can be omitted if the parameters are entered and ["TAS","DSGC","GC","ZTOZ"] is got directly.

Example 4:The criteria for the operation of data grouping with fixed criteria can be either the numerical values or conditions. For example, divide the order amounts into four sections by 1000, 2000 and 4000, with each section being a group of orders, and sum up the total amount of each group.
If the criteria have been given, they can be fixed in SQL; if the criteria are the dynamic external parameters, high-level programming languages, like Java, should be used to write a program to compose a SQL statement and the process will be rather complicated. By supporting dynamic expressions, esProc can realize the operation in this example easily. The following shows the esProc code:


In the above code, the variable byFac represents the grouping criteria in this example, which include four string conditions. byFac can be external parameters, or a view or a table in a database as well. The final result of A4 s as follows:

Example 5:In the previous operations of conditional grouping, it happens that there are no overlapped conditions. But actually it is common to find that conditions overlap with each other. For example, group the order amounts according to the following conditions:
1,000~4,000:Common orders - r14
Less than 2000:Non-important orders - r2
Above 3000:Important orders - r3

Here both r2 and r3 overlap with r14. In this case, if we don’t want the overlapped data, we can first select the data satisfying r14 and then select from the rest of the data those satisfying r2, and so forth.
esProc provides enum function to perform the conditional grouping whose conditions overlap with each other, as shown below:


The grouping result of A3 is as follows:

Computed result is as follows:

But sometimes overlapped data is needed. For this purpose, data satisfying r14 will be selected from sales – one of the two groups of selected data mentioned above, and then data satisfying r2 will be selected from the original table of sales, and so forth. Here @r option will be used in enum function to modify the code in A3 into =sales.enum@r(byFac,Amount). Now the grouping result is as follows: 

In the above figure, the data in red boxes overlap with each other. The final computed result is as follows: 

In addition, esProc program can be called by the reporting tool or a Java program in a way similar to that in which a Java program calls an ordinary database. The JDBC provided by esProc can be used to return a computed result of the form of ResultSet to the Java main program. For more details, please refer to the related documents.

August 25, 2015

esProc Simplifies SQL-style Computations – Relational Computation on Groups

During developing database applications, we are often faced with complicated SQL-style computations like relational computation on multilayered groups. As SQL requires summarizing during data grouping and it doesn’t support object-style relational access, it is difficult to deal with these computations. We have to resort to an advanced technique of window functions nested with multilayered subqueries. esProc, however, can handle them more easily by supporting real grouping and visual object-style relational access.

In practice, there are a lot of cases in which relational computation on groups is needed, such as the one presented on the website http://forums.bit-tech.net/showthread.php?t=207052. Based on this practical example, we have designed a more common one for illustrating in detail how esProc works to realize the relational computation on groups. 

Computing target: Query branches of a DVD store where there are less than four categories of DVD copies. 

Data Structure:Branch Table stores information of the branch stores. DVD Table stores titles and categories of DVDs, in which DVDs, like Transformers IV, are virtual data items instead of physical disks. DVDCopy Table stores physical DVD copies branch stores have. Note: DVDCopy Table is related to Branch Table through BranchID field, and to DVD Table through DVDID field. The following is part of the data:

Branch Table:

DVD Table: 

DVDCopy Table: 

Description:
1. Computed results should be certain records in Branch Table.

2. That the Status field of a record in DVDCopy shows Miss means the DVD is missing, and that a record’s LastDateReturned field is empty means the DVD has been rented out but not returned. Obviously the DVDs that are missing and haven’t been returned are outside of our computing objects and should be filtered away.

3. We should consider the situation that there may be certain branch stores whose information isn’t included in DVDCopy Table, though it is rarely seen.

Solution:
1. Select valid, existing DVD copies the branch stores have from DVDCopy Table.

2. Group DVDCopy Table by BID. Each group will contain all DVD copies a branch store has.

3. Select the DVDs corresponding to the DVD copies each branch store has, and compute the number of categories to which these DVDs belong.

4. Select branch stores where the number of categories of existing DVDs is less than four. These branch stores are eligible.

5. Select branch stores which DVDCopy Table hasn’t. They are also eligible.

6. Combine the two kinds of eligible branch stores.

Code written in esProc:


A1-A3:Query data from three tables in the database. The three tables are made variables which are named respectively as Branch, DVD and DVDCopy. Computed results are as follows:

A4: Switch the DVDID field and BID field in DVDCopy Table to corresponding records in DVD Table and Branch Table respectively. Note: This step is the basis of object-style relational access, which requires the use of switch function. After computing, DVDCopy becomes as follows:


Fields in blue have corresponding records. Click one and you can see the details, as shown in the following figure:

Now we can perform object-style relational access only with the operator “.”. For instance, DVDCopy.(DVDID)(Category) represents the category of each DVD copy, and DVDCopy.(BID) gets the detailed information (complete record) about the branch store corresponding to each DVD copy.

A5:=DVDCopy.select(Statues!=”Miss” && LastDateReturend!=null). This line of code aims to filter away the DVD copies that are missing and that haven’t been returned. After the code is executed, values of A5 are shown as follows:


A6:=A5.group(BID) is to group data in A5 by BID, with each row representing all DVD copies a branch store has. The result is as follows:


Click the data in blue and you’ll see members of each group:

It can be seen that group function only groups data, but doesn’t summarize the data at the same time. In this point, it is different from the function for grouping in SQL. Sometimes, we need to further process the grouped data, rather than simply summarizing them. To do this, esProc’s group function is more convenient to use, as shown in the code in A7:

A7:=A6.new( ~.BID:BonList, ~.(DVDID).id(Category).count():CatCount )

The above line of code computes the number of categories of DVD copies to which each branch store corresponds. new function can generate a new object A7 based on the data in A6. A7 has two columns: BonList and CatCount. BonList originates directly from column BID of the grouped data in A6, and CatCount originates from column DVDID of the grouped data. There are three steps to compute CatCount: ~.(DVDID) finds the DVD records corresponding to all DVD copies each branch store has; id(Category) removes repeated records of Category from these DVD records; and count() computes the number of categories. The result is as follows:


That is, branch store B002 has three categories of DVD copies, B003 also has three categories and B001 has four categories.

A8:A7.select(CatCount). This line of code makes query to select branch stores whose CatCount is less than 4. The result is as follows:


The above branch stores that are in short supply are computed according to DVDCopy Table. But maybe some branch stores with serious supply shortage are not in the DVDCopy, such as the cases that all the DVD copies in the branch store has been rented out, or that the branch store hasn’t any DVD copies. So these branch stores should also be counted. The code for this step is as follows:

A9:=A8.(BonList) | (Branch \ A7.(BonList))

In the above code, operator “|” represents the union operation of two data sets (which can be replaced by union function); operator “\” represents the complement operation (which can be replaced by diff function). A8.(BonList), Branch and A7.(BonList) represent respectively branch stores with supply shortage in DVDCopy Table, all branch stores and branch stores which are included in DVDCopy Table. Their respective values are:


A9 computes the final result of this example. Its values are:

Or we can realize the computation indirectly. For instance, first compute “branch stores that are not in short supply”, and then compute the complement of the result and Branch Table. The final result should be the same as that of A9.

Please note variables like A8 or Branch cannot be used to represent a data set in SQL because it doesn’t support explicit set. Thus the simple code in the above have to be replaced by lines of complicated SQL statements.

Besides, esProc can be called by a Java program. The method is similar to that with which a Java program calls an ordinary database. The JDBC provided by esProc can be used to return a computed result of the form of ResultSet to Java main program. For more details, please refer to the related documents .

August 24, 2015

esProc Simplifies SQL-style Computations – Records Corresponding to Max Value

In developing database applications, usually it is the records corresponding to the max/min value that we need to retrieve, instead of the value itself. For example, the occasion in which each employee gets his/her biggest pay raise; the three lowest scores ever got in golf; the five days in each month when each product gets its highest sales amount; and so on. As the max function of SQL can only retrieve the max value, instead of the records to which the max value corresponds, it is quite complicated to handle the computation in SQL with the aid of some advanced techniques, such as window functions or the nested sub-queries or keep/top/row number. If multi-layered grouping or relations are involved, the computation will become even more complicated.

With the top function in esProc, the records corresponding to the max value can be retrieved and the computation becomes much easier. The following is such an example.

The database table golf contains the scores of members in a golf club. Please select the best three scores each member has ever got. Part of the data is as follows:

The code written in esProc: 

A1: Retrieve data from the database. If the data come from a structured text file, the following equivalent code can be used: =file(“\\golf”).import@t(). Click the cell and we can check the retrieving result:  

A2:=A1.group(User_ID), i.e., group the result of A1. The result is as follows:  

As shown in the above figure, the data have been separated into multiple groups by User_ID and each row is a group. Click the blue hyperlink and members of the group will be shown as follows:

A3:=A2.(~.top(-Score;3)). The code is to compute the records of each group of data whose field Score is in the top three. Here “~” represents each group of data. ~.top() represents that top function will work on every group of data in turn. The top function can retrieve the N biggest/smallest records from a data set. For example, top(Score;3) represents sorting by Score in ascending order and fetching the first three records (i.e. min values); top(-Score;3) represents sorting in descending order and fetching the first three records (i.e. max values). The result of this step is as follows:

A4:=A3.union(), which means union data of every group. The result is as follows:

In the above, the computation is performed step by step. But the steps can be integrated into one for the convenience of maintenance and debugging: db.query(“select * from golf”).group(User_ID). (~.top(-Score;3)).union().

The method with which a Java program calls esProc is similar to that with which it calls an ordinary database. The JDBC provided by esProc can be used to return a computed result of the form of ResultSet to Java main program. For more details, please refer to the related documents .