July 24, 2014

The Branch Statement in esProc

In branch statement, we need to check one or multiple conditions, execute different code according to the results or return different results. In esProc, the branch statement if/else has the following form:

1. if x … else … at the same line
When condition x after if is true, execute the statement after it. Otherwise execute the statement after else. The else part can be omitted, else and if must be in the same row. After execution, the value of the cell in which if is located is the computed result of x.
 

Sometimes, there is no else part, we only need to decide whether the code after if statement should  be executed:
The below logical connector can be used in the judgment state ments:


  • a&&b

For the "a and b", the result is true only on condition that both a and b are true.

  • a||b

For the "a or b", the result is true only on condition that either a or b is true.

  • !a

For the "not a", the result is true only on condition that a is false.


2. Function if(x,a,b)
When x is true, compute expression a and return the result; otherwise, compute expression b and return the result. Sometimes, the previously mentioned if…else… code can be replaced by function if in order to make the statement more concise.
 

3. if x … else … in the code block
If x is true, then execute the code block of if; otherwise, execute the code block of else. You can certainly ignore the else part. else and if must be in the same column. Different from some other program languages, esProc employs code block, in stead of symbols such as {} or statements like end if, to determine the statement’s valid scope.


As shown in the example, code block of if, or code block of else, is executed alternatively based on the judging results of B4 to compute the name list and total number of male employees, as well as those of female employees. Results can be seen in B2, B3, C2 and C3 after the computations are over:
 
4. Multiple blocks of if x … else if y …
This is code for multi-branch statement in esProc, which can be written repeatedly forever. If x is true, then execute the code block of if; otherwise, if y is true, execute the code block of else if…., else and if must be in the same cell. Again the point is emphasized that there is no such corresponding statement of end if, and esProc employs the scope of code block to determine when if statement will be over.


5. The Nested Branch Statement
Branch statement can be used to make further judgment in the code block of if or else.
 
In the above example, a character string of year and month, with a separator "-", is input in A1, and the code after it computes the total days of the month. The computation should first judge which month it is. For February, further judgment is needed to determine whether it is a leap year. The computed result in A8 is as follows:
 

esProc’s Multilayer Parameters

esProc provides a large number of functions, many of which use many parameters. In order to clearly judge positions of these function parameters, and make writing and reading easier, esProc is specially equipped with multilayer separators of function parameters.

1.Separators of function parameters

Colon (:), comma (,) and semicolon (;) are used as separators of function parameters in esProc. Their priority decreases in turn.
The most common method is to use commas to separate parameters, which is in line with function syntax in most programming languages. For example:
In this example, parameters of if(), create() and T.insert() function in A3, A4 and A5 are separated from each other by commas. 

Some functions have “coupled” parameters, which are closely related or work together. In this case, a colon is often used to separate them. 

For example, for if() function in A3, each condition corresponds to a returned result and colons are used to separate the results. In the in() function used in one condition, 150 and 180 are also separated by a colon and together they form a numerical interval [150,180]. In A5's T.insert() function, field value and field name also come in pairs with colons in between to separate them.  

In some functions, indicative parameters can be added to certain parameters to change the computational method relating to them. In this case, colons are usually used as separators. See below:


Both A2 and A3 sort records of cities according to state ID first, then sort by name if cities belong to the same state. Difference is that -1 is appended after NAME in A3’s function, meaning that sorting by name is in a descending order.

Sometimes, parameters in function can be divided into different parts according to their roles. Semicolons are usually used to separate these parts.

In A2’s groups() function, the parameter before the semicolon is used for grouping, and those after it are for summarizing computation, whose parameters are separated by a colon to define name of the summarizing field. In A3's top() function, the parameter after semicolon defines that the top 5 records are fetched.

In some functions, parameters are quite many. Usually these parameters are divided into several groups which are separated by semicolons: 

T.switch() function in A4 transforms different fields into records of another table sequence, and a semicolon is used here. Besides, comma, colon and semicolon are all used in A4 as separators. This kind of code writing creates clear layers for function parameters. 

2.Omission of function parameters
Some esProc parameter functions have default values and, therefore, can be omitted, making functions more concise.

The parameter after a colon is generally used to complement another’s computational model. If default mode is used, the parameter can be omitted. For example: 

In A5, parameters after colon are used to designate field names corresponding to certain values when inserting records. Parameters for designating field names can be omitted in A6 because it uses default field names to set field values one by one.

But colons cannot be omitted when they are used to separate intervals:
In A3, in(B1+B2,180:) and in(B1+B2,:120) represent respectively B1+B2>=180 and B1+B2<=120, in which the colons cannot be omitted.

For parameters separated by semicolons, the semicolon can be omitted if there is no parameter after it. For example, if n is not set in A.top() function and only the top one is needed; or when the transformation of a certain field is not needed in T.switch() function. 

When commas are used as separators, they should generally be retained even if parameters are set by default. For example:

Expression in A2 equals to =A1.to(4,A1.len()), expression in A3 equals to =A1.to(1,4). 

July 23, 2014

Differences and Similarities between esProc TSeq and SQL Data Table

esProc TSeq and SQL data table are all the structured 2-dimensional data objects. The records, indexes, and primary keys can all be applied to the structured data computing. Although both of them can be applied to the structured data computing, their application scenarios differ distinctly. TSeq is fit for solving the complex computing problem involving not-so-big data volume, while data table is fit for the common computation involving possible big data volume.

Their differences are determined by the underlying mechanism.

TSeq is ordered. Each record and data in each column has a specific sequence number. TSeq supports the explicit set, and set operations can be performed directly between multiple TSeq; TSeq is also the generic set. The basic element can be numeric value, reference, or another set.

SQL data table does not have the above features. But SQL data table is transparent to the memory and external storage. A consistent syntax is used to access to the memory, external storage, or blended data.

In the below sections, we will further discuss their differences and similarities.

Common basic functions

Both esProc TSeq and SQL data table are the structured 2-dimensional data objects. In other words, they are row-style 2-dimensional table based on records with one row for one record. The 2-dimensional table together with multiple column names forms a complete data structure. Because the structures are largely similar, their basic usages are almost the same.
Example 1:Query the data in the object. Find the orders in which the number of Freight is greater than 100 and placed before 2013.

SQL: SELECT * FROM Orders WHERE Freight > 100 AND OrderDate<'2013-01-01'
TSeq: =Orders.select(Freight > 100 &&OrderDate< date("'2013-01-01"))

Note:In this example, the data object name is Orders, and another data object of Customers will be used in the subsequent sections.

Example 2: Sort. Sort the orders by EmployeeID in ascending order, then sort by Freight in reverse order.

SQL: SELECT * FROM Orders ORDER BY EmployeeID ,Freight DESC
TSeq: =Orders.sort(EmployeeID,Freight:-1)

Example 3:Grouping and summarizing. Group by employee, summarize the freight charge, and count the orders.

SQL: SELECT EmployeeID, COUNT(OrderID), SUM(Freight) FROM Orders GROUP BY EmployeeID
TSeq: = Orders.groups(EmployeeID;sum(Freight),count(OrderID))

Example 4:Join. Join the two data objects Orders and Customers to form a new data object. Use the left join and the join field is CustomerID.

SQL: Select * from Orders left join Customers on Orders. CustomerID =Customers. CustomerID
TSeq: =join@1(Orders:, CustomerID;  Customers:, CustomerID)

Besides the above several basic usages, esProc TSeq and SQL data table are very alike in the algorithms like distinct, count, sum up, average, maximum, and minimum, just a few example here.

Difference of being in order

Thanks to the ordered record set of TSeq, the order-related computing can be solved easily. SQL data table lacks the sequence number and sequence-number-related access method, making it a bit inconvenient to handle the order-related computing.

Example 1: As for the sales data object, compute the sales increment of this month compared with that of the previous month.
SQL:
select salesAmount, salesMonth,
     (case when
prev_price ! =0 then ((salesAmount)/prev_price)-1
else 0
end) compValue
from (select salesMonth, salesAmount,
lag(salesAmount,1,0) over(order by salesMonth) prev_price
from sales) t
TSeq:
sales.derive(salesAmount / salesAmount [-1]-1: compValue)

The monthly sales have nothing to do with orders. It is represented as salesAmount in the TSeq and the data table. The one related to the order is the Sales amount of previous month, which is equivalent to the salesAmount of the previous record relative to the current record. TSeq is ordered, sosalesAmount[-1] can be used to represent the sales of previous month. By comparison, SQL data table is not ordered. Since SQL2003 standard was introduced, the order-related features are added to the window functions. It is so tedious to use the unavoidable  complex method lag(salesAmount,1,0) over(order by salesMonth)to compute the sales of the previous month.

TSeq can also be used to represent the relative interval, for example, the total five months comprising the current month, the two months before, and two months after the current month. With TSeq, it can be represented like this: salesAmount{-2,2}. In SQL, the window function can also be used to represent the summarizing, but much more troublesome.

Example 2: For the sales data table, find the first 10 records with the highest sales for each product.
SQL:
select salesMan, product ,amount
from ( select salesMan, product ,amount, rank() over (partition by product order by amount desc ) ranking from sales)
where ranking <=10
TSeq:
= sales.group(product).(~.top(-amount;10))

In this example, the most intuitive thinking pattern to implement is to group data by product, and then perform the order-related computations in the groups. The simplest method is to retrieve the first ten records with the highest amount in the group. The intuitive method is to sort the data in the group in reverse order by amount, and then get the records whose sequence numbers in the group is from 1 to 10.
TSeq supports the order-related computation quite well. For TSeq, not only the function top can be used to implement the first algorithm, as demonstrated in the example, but also the sort function and the sequence number of record can be used to implement the second algorithm, as shown below: =sales.group(product).(~.sort(Amount:-1)).(~([to(10)])).

The record in the SQL data table is not ordered. We must compute out a sequence number or field to act as the sequence number, such as ranking. The algorithm adopted in the above example is to compute the data rankings in the group, and then get the records among the top 10. Obviously, SQL syntax is a bit zigzag and difficult to understand. SQL users will have to use the window function over (partition by…… order by……) and the sub-query which is hard to track and debug.

By comparison, TSeq is more straightforward and simpler for easy track and debug. For another example, programmers can firstly write =sales.group(product) code for test. This code represents that the data will be grouped. This code snippet can run and display the result independently. If the grouping results meet your expectation, then programmers can proceed to add the second paragraph of codes: sort the data in the group in reverse order, that is, .(~.sort(Amount:-1)). In which, the “.” indicates the former computing result will be taken as a whole for further processing, “~” indicates the data in the current group, and “-1” indicates the reverse order. The current code line is =sales.group(product).(~.sort(Amount:-1)), which can run and display the result independently. If the grouping results meet your expectation, then programmers can proceed to add the third paragraph of codes: retrieve the records whose sequence numbers in the group is 1-10, i.e. .(~([to(10)])).

As can be seen, the computational procedure of TSeq can be pushed forward step by step. In solving the complex computing problems, the computation can be simplified to avoid errors. In facts, the above three continuous paragraphs of code can be written into three rows of codes step by step, so as to decompose the computing goal more clearly:

        
One thing worthy of noticing is that SQL cannot perform data sorting before order-related computing,even with the temporary table.The two steps must be combined into one step. The reason for this is that the data table does not support the explicit set and generic set. It is rather weak regarding its syntax representation.

In addition, although SQL uses the window function up to the ANSI standard, the database vendor does not implement it fully to the standard. The writing styles vary for various databases and some databases does not provide the window function at all. The function syntax of TSeq is independent of data source. For whatever data sources such as database, TXT file, Excel file, and binary file, programmers are not required to modify the code when using TSeq for computing.

Difference between explicit sets

Although SQL has the concept of set, the explicit set is not provided, and cannot be taken as the independent variable. The set operations can only be implemented with temporary table. TSeq is the true explicit set for implementing the set operation.

Example: Compute over the Contract data object. Suppose the contract with over 40 order quantity is the big contract, and the unit price over 2000 is the important contract. Please find the contract settled in this year which are both the big contract and the important contract, and all other contracts except these contracts.

SQL:
         select SellDate,Quantity,Amount,Client from Contract where to_char(SellDate,'yyyy')='2014' and quantity>=40 and AMOUNT>=2000
         select SellDate,Quantity,Amount,Client from Contract where not(to_char(SellDate,'yyyy')='2014' and quantity>=40 and AMOUNT>=2000)

TSeq:
         =thisYear= Contract.select(year(SellDate)=2014)
         =big= Contract.select(Quantity>40)
         =importance = Contract.select(AMOUNT>2000)
         =answer=thieYear^big^ importance
         =others= Contract\answer

Comparison and Analysis:

The typical natural thinking pattern is to find the contracts that are big contract and also the important contract of this year. The intersection operation is most intuitive. If the big contract is defined as big, the important contract is defined as importance, and the contract in the current year is defined as thisYear, then we can write the pseudocode easily: big∩importance∩thisYear. TSeq is the explicit set capable of representing the equivalent expression very intuitively, i.e. thieYear^big^importance. SQL does not allow for representing it with the variables for set. So, we have to find another way, for example, converting it to the Boolean condition, as demonstrated in the example: to_char(SellDate,'yyyy')='2014' and quantity>=40 and AMOUNT>=2000.
It is easy to solve the first problem because the respective development difficulty with these two tools are not great. With the further research into the problems, their differences become evident.

Problem 2: “Other contacts except for those big and important contracts” - it is also a typical thinking pattern - the computing goal can be achieved with difference set in one step. TSeq expression:Contract\answer, very intuitive. Using Boolean conditions, SQL can also be used to compute out the answer. But the expression writing style and business description vary greatly, as shown below: not(to_char(SellDate,'yyyy')='2014' and quantity>=40 and AMOUNT>=2000).

With SQL, we can solve it using the set operation, and the algorithm is also quite intuitive. However, the code may appear very lengthy since the data table cannot be represented with the set variables.

(select SellDate,Quantity,Amount,Client from Contract)
minus
(Select select SellDate,Quantity,Amount,Client from Contract from(
    (select SellDate,Quantity,Amount,Client from Contract where to_char(SellDate,'yyyy')='2012')
    Intersect
    (select SellDate,Quantity,Amount,Client from Contract where quantity>=40)
    Intersect
    (select SellDate,Quantity,Amount,Client from Contract where AMOUNT>=2000))

Because the code is lengthy, many people would rather use Boolean condition to implement the set operations indirectly.

Undoubtedly, in most cases, it is more convenient for data table to use operation set than Boolean condition, for example, the set operations between multiple physical tables, or the set operations between multi-level sub-queries. In this case, the cost would be high if converting the set operation to the Boolean conditions, and programmers have to adopt the lengthy set operation.

Difference between generic sets

TSeq is the generic set to store both the physic data and the reference to the associated data. With this advantage, TSeq can implement the associated computing through the intuitive object reference. By comparison, the data table can only be used to store the physical data. The equivalent computation can only be completed using the associated complex statements.

Example:

Please compute which department managers received the President Award, and of these managers, which subordinates are the annual outstanding employees. This involves two data object:department and employee, in which, the deptName field of department and the empDept field of employee is in the one-to-one relation. Still, the manager field in department and the empName field in employee is also the one-to-one relation. In addition, the code for president award is PA; and the code for annual outstanding employee is EOY.

Data table:
    SELECT A.*
    FROM employee A,department B,employee C 
    WHERE A.empDept=B.deptName AND B.manager=C.empName AND A.empHonor=‘EOY’ AND C.empHornor=‘PA’

TSeq:
    employee.select(empHonor: "EOY",empDept.manager.empHornor:"PA")

Comparison and Analysis:
The SQL solution can definitely work out.But the relevant associating statement is fairly complex that the average programmer could feel hard to comprehend. By comparison, TSeq solution is rather intuitive, empHonor:"EOY"is one of the conditions: Who are Year Outstanding Employees, while empDept.manager.empHornor indicates the “the department.the department manager.awards received by the department manager”. Obviously, if this value is PA, then the condition 2 in the problem is met: “Department managers who received the president award”. This is the object reference by definition.
The object reference enables programmers to use “.” operator to make reference to the related data. By this way, the business association relation can be translated into the computer language intuitively, so that multilevel relations can be represented conveniently, and the associating computation can be performed intuitively.

Differences regarding Memory and External Storage Transparencies

SQL
Because SQL data table does not support the generic and set data,no data will be lost when writing the data from memory into the external storage. Thus, it is transparent to the computation over memory and external storage. On accessing the data table for the first time, data are usually from the external storage; then, for the subsequent accesses to the same data table, data can be from the memory cache; for the data table with relatively large volume of data, part of the data can be from the external storage, and part from the memory. No matter the data is from memory or external storage, and data volume is big or small, the syntax for retrieving data table is always the same, and programmers need not to write different SQL statements for it.

TSeq
TSeq supports the generic type (reference in particular) and set data. Data could lost when writing the data from memory to the external storage, and can not always ensure to be loaded properly, which making the computation intransparent to the memory or external storage. TSeq is the pure memory data object, and is only capable of handling limited data volume; If the data volume is relatively big, then cursor (another data object of esProc) should be used for the computation in external storage, with different syntaxes for the cursor and the TSeq; esProc programmers also must convert the data between cursor and TSeq if they want to improve performance or handle the computation involving complex business logics.

Comparisons
TSeq is not transparent to the memory and external storage. esProc programmers need to write different codes to meet the requirement of computing in memory, external storage, or mixed computing. In addition, they need to modify the codes to meet the needs of data volume increase. Therefore, the workload for the initial design and post-maintenance is relatively great. SQL data table is transparent to the memory and external storage. Programmers only need to write one set of code to fit various scale of data. The designing and maintenance workloads are relatively small.

Through the above comparison, we can conclude: Characterized with the ordered data, explicit set, and generic set, TSeq can easily solve the order-related complex problem and reduce the complexity of set operations, with support for the intuitive object reference to handle the complex multi-table association. SQL data table is transparent to the memory and external storage, and SQL code is commoner.


July 22, 2014

Basic Computation of esProc Table Sequence and Record Sequence

esProc table sequence is a structured two-dimensional table, boasting concepts like field, record, primary key and reference. These concepts originate from data table of relational database. A table sequence is also an explicit set of genericity and orderliness, which can make computation of structured data more flexibly. Record sequence is the reference of table sequence. They are closely related and their usages are almost the same. The article will explain their basic computation from aspects of accessing, maintenance, loop function, aggregate function and sets operations.

Accessing

1.Create objects
Description: Read two-dimensional structured data from a file, create table sequence objects and store them in cell A1. Create record sequence objects by referring to A1 and store them in B1.
Code
     A1=file("e:/sales.txt").import@t()
     B1=A1.select(Amount>2000)


Results are: the first ten records in A1(table sequence):  



the first ten records in B1(record sequence):


Note: table sequence can be created according to a database or a file, or be created by inserting records within an empty object. A record sequence originates from a table sequence but it doesn't store physical records. It only stores references of some records in the table sequence.

2. Access field values
Description: Get Client field of the fifth record in from A1 - the table sequence objects - and store it in cell A2. Get Client field of the first record from B1 - the record sequence objects - and store it in cell B2.
Code
     A2=A1(5).(Client)     /computed resultDSG
     B2=B1(1).(Client)     /computed resultDSG
Note
1. Since the first record in B1 correspond to the fifth record in A1, both of them have the same computed results.
2.  Both table sequence and record sequence have completely same syntax for accessing field. 
3.  A field name can be replaced by the field’s sequence number and the result won’t change. For instance: A1(5).#2. Because this kind of replacement is employed universally in esProc, we won’t go into details about it.
3.Access column data
Description: Fetch column Client according to column name and store it in A3. Fetch column Client and column Amount according to column names and store them in A4. The record sequence and table sequence in this example have the same expression, and only the latter is selected for our illustration.
Code
    A3=A1.(Client)                            
    A4=A1.new(Client,Amount)

Results are
 A3,A4:


Note
1. With the syntax “table sequence.(field name)” , only a column of data can be fetched, and the result is sequence without structured column name. With new function, however, a column or more columns of data can be fetched, and the result is table sequence with structured column name. 
2. Whether the computing object is a table sequence or a sequence, new function will create a new table sequence, that is, the computed result of B1.new(ClientAmount) is also a table sequence.

4. Access row data
Description: Fetch the first two records according to row number. The record sequence and table sequence in this example have the same expression.
Code    =A1([1,2])



Results are


Maintenance

5. Add records
Description: Append a record r1 to table sequence A1, insert r2, of which only the OrderID field has value and the other fields are null, into the second row. Execute the same operation in record sequence B1.
Table sequence
A6=A1.insert(0,152:OrderID,"CA":Client,5:SellerId,2961.40:Amount,"2010-12-5 0:00:00":OrderDate)
A7=A1.insert(2,153:OrderID)

Record sequence:
B6=create(OrderID,Client,SellerId,Amount,OrderDate)                                             /empty table sequence B6
B7=B6.record([152,"CA",5,2961.40,"2010-12-5 00:00:00"])     /insert the first record r1 into B6
B8=B1.insert(0,B6(1))                 /add r1 to B1
B9=B6.record([153,,,,])               /insert the second record r2 into B6
B10=B1.insert(2,B6(2))                /insert r2 into the second row of B1


Note:
1. The syntax of table sequence and record sequence has a lot of difference when new records are added to them. What is added to table sequence is physical records, and insert function can be directly used in table sequence A1. While for record sequence, only records’ references can be added to it, so the physical records must be there before making any references. In the example, the physical records are stored in B6(or A1 and B1).
2. After those computations are done, the records in B1 originate from two table sequences: A1 and B6.
3. If insert function’s first parameter is zero, add records at the end; if not, insert records into designated places. The rule applies in both table sequence and record sequence.

6.  Delete records
Description: delete the record in the second row.
Table sequence=A1.delete(2)
Record sequence=B1.delete(2)
NoteWhat is deleted in table sequence is physical records; while those deleted in record sequence are references of records and the original table sequence won’t be affected with this operation.
7.Modify records
Description: Change the Amount field in the second record to 2000, and the OderDate field to 2009-07-01 00:00:00.
Table sequence=A1(2).modify(2000:Amount,datetime("2009-07-01 00:00:00"):OrderDate)



Record sequenceRecord modification is forbidden in a sequence, it can only be executed in the original table sequence.
NoteIn the example, modify function executes modification of a single record. But, it can do modification in batches in a table sequence.

8.Add fields
Description: Add two new fields - type and orderYear, in which type is null and orderYear is derived from the year in original OderDate field. The record sequence and table sequence in this example have the same expression.
Code=A1.derive(:type, year(OrderDate):orderYear)


NoteWhether the computing object is a table sequence or a sequence, derive function will create a new table sequence. 

Loop function

Loop function can compute each record of table sequence/record sequence, express complex loop statement with simple function. For instance, select is used to make query, sort to sequence, id to merge repeated records, pselect to fetch sequence number of eligible records and max if to read the maximum value from eligible records. Here the most basic ones - select function and sort function- will be illustrated.

9. Query
Description: Query out records whose Amount field is greater than or equal to 2000 and whose OrderDate is the year 2009. The record sequence and table sequence in this example have the same expression.
Table sequence=A1.select(Amount>=2000 && year(OrderDate)==2009)
Results are



NoteWhether the computing object is table sequence or record sequence, the computed result of select function will always be a record sequence, that is, the references of records instead of physical records.

10. Sort
Description: Sort records in an ascending order according to SellerID fields. If the results of SellerID are the same, sort records in a descending order according to OrderDate fields. The record sequence and table sequence in this example have the same expression.
Code=A1.sort(SellerId,OrderDate:-1)
Results are


NoteWhether the computing object is table sequence or record sequence, the computed result of sort function will always be a record sequence. In fact, most of the functions for table sequence and record sequence can be employed universally unless the records are modified. 

Aggregate function

11. Seek maximum value
DescriptionSeek the maximum value of Amount field. The record sequence and table sequence in this example have the same expression.
CodeA1.max(Amount)
Computed resultA2=29600.0
NoteSimilar functions include min(minimum value), sum(summation), avg(average value), count(count), etc.

12. Sum by grouping
DescriptionSum Amount in each group of data according to SellerID and the year, and count orders of each group. The record sequence and table sequence in this example have the same expression.
CodeA1.groups(SellerId,year(OrderDate);sum(Amount),count(~))
Results are


Note
1.  groups function will create a new table sequence.
2.  "~" in expression count(~) represents the current group. count(~) can also be written as count(OrderID). Besides, we don’t designate field names of computed results in writing code, so default field names like year(OrderDate) will appear. A colon could be used in designated field names, such as =A1.groups(SellerId,year(OrderDate):OrderYear;sum(Amount),count(OrderID)) .

Operations between sets

Operations between sets include intersection "^", union "&", complement "\" and concatenate "|", etc.
13. Intersection and union operations
DescriptionStore orders whose Amount is greater than and equal to 2000 in the year 2009 in A2, and store those whose SellerID is equal to 1 or 2 in A3. Now seek intersection and complement of A2 and A3, then store results respectively in A4 and A5.

Record sequence
    A2=A1.select(Amount>=2000 && year(OrderDate)==2009)   /A2 is record sequence
    A3=A1.select(SellerId==1 || SellerId==2)                                              /A3 is record sequence

    A4=A2^A3                      /intersection, the result is record sequence
    A5=A2\A3                    /complement, remove members of A3 from A2
Table sequence
     A6=A2.new(OrderID,Client,SellerId,Amount,OrderDate)/table sequence created from A2
     A7=A3.new(OrderID,Client,SellerId,Amount,OrderDate)       /table sequence created from A3
     A8=A6^A7                  /Intersection, result is empty. A table sequence is a collection of physical members. Different table sequences always have different members, so the intersection operation of two table sequences is definitely empty. This has no practical significance in business.
     A9=A6\A7                  /Complement. Because members of two table sequences are always different, the computed result of complement is still A6.

NoteOnly sets operations between record sequences originating from the same table sequence have practical significance in business. Usually, the intersection and complement operations between different table sequences or record sequences originating from different table sequences make no sense in business.

14. Union and concatenate operations
DescriptionStore orders of which SellerID equals 2 and 10 in A2, and store those of which SellerID equals 3 and 10 in A3. Now seek the union and concatenate of A2 and A3 and store results respectively in A4 and A5.
Record sequence
      A2=A1.select(SellerId==2 || SellerId==10)
      A3=A1.select(SellerId==3 || SellerId==10)
      A4=A2&A3    /Union. Members of A2 and A3 will combine in order and repeated records will be removed. 
       A5=A2|A3         /concatenate. Members of A2 and A3 will combine in order and repeated records won’t be removed. 
Table sequence:
     A6=A2.new(OrderID,Client,SellerId,Amount,OrderDate)       /table sequence created from A2
     A7=A3.new(OrderID,Client,SellerId,Amount,OrderDate)       /table sequence created from A2
     A8=A6&A7       /Union. Members of two table sequences are completely different, so the union operation means a simple union-all of the two.  


     A9=A6|A7         /concatenate




July 21, 2014

Basic Computations of esProc Sequences

A sequence is an ordered set consisting of some data, which are called members of the sequence. A sequence is similar to an array in high-level language, but the data type of its members not have to be the same. The following will explain its basic computation through creation, access, operators and functions.

Creation

1.Create with Constant
Bracketing members with “[]” represents sequence constant, e.g.

A5=[15.2,"b",1]        /Members of a sequence can be float, string and integer
B5=[A1:C3]             /Members are cell area, i.e.


C5=[3,A5,B4]            /A member of a sequence can also be a sequence,i.e.


A6=[1,2,3,3]              /Members can be repeated
B6=[]                      / A empty sequence
C6=[[]]                /A non-empty sequence that its member is an empty sequence

Note: Members of a sequence can be any data types, including atom type, another sequences, records, etc. A sequence that all its members are integers is called an integer sequence.

2.Create with functions
to(2,6)                         / Numbers of the integer sequence are [2,3,4,5,6]. A integer sequence beginning with 1 can be expressed with to(6)
"1,a,b,c".array()      /Split a character string into a sequence [1,a,b,c], a reversed joint can be expressed with [1,a,b,c].string()
periods@y("2012-08-10 12:00:00",now(),1)            /Create a integer sequence for a period of time by dividing it into smaller time periods whose starting and ending points are in adjacent years, sequential value is ["2012-08-10 12:00:00","2013-01-01 00:00:00","2014-01-01 00:00:00","2014-07-01 10:10:41"]
file("e:/sales.txt").import@t()                                              /Import records from structured text files to form a sequence, sequential value is: 


A sequence whose members are records is table sequence, which is often used to make computations of structured data. It’s not the focus in this article, for more information please refer to “Basic Computation of esProc Table Sequence and Record Sequence”.

3. Create by computing
Description: read the text file sales.txt into table sequence A1, fetch column Client to create sequence A2; group the records according to Client to create sequence A3.

A1=file("e:/sales.txt").import@t()
A2=A1.(Client)
A3=A1.group(Client)

Note:

1.Value of sequence A2:


2.Value of sequence A3:


We can see that members of sequence A3 are a number of sequences whose members are records.

Accessing
4.Access members according to serial numbers

A1=[a,b,c,d,e,f,g] /Sequence A1
A1(2) /Fetch the second member whose value is string “b”, which equals to A1.m(2)
A1([2,3,4]) /Fetch members from the second to the fourth, whose value is expressed by the sequence [b,c,d]. Note that [2,3,4] is also a sequence(integer sequence). Intervals can be used to rewritten the expression as A1(to(2,4)).
A1.m(-1) /Fetch the last member. Note that m function must be used when fetching members backwards, the expression cannot be abbreviated to A1(-1). 

5.Assignment and modification
A1(2)=r /Modify the second member to r, now value of sequence A1 is [a,r,c,d,e,f,g]
A1([2,4])=["r","s"] /Modify the second and the fourth member. A1=[a,r,c,s,d,e,f,g]
A1.Modify(2,["r","s"]) /Modify in turn from the second member, expression A1= [a,r,s,d,e,f,g] equals to A1([2,3])=["r","s"]

6.Add members
A1.Insert(0,"r") /Add members at the end of the sequence, A1=[a,b,c,d,e,f,g,r]
A1.Insert(2,["r","s","t"]) /Insert three members consecutively before the second member, A1=[ a,r,s,t,b,c,d,e,f,g,r]

7.Delete members
A1.Delete(2) /Delete the second member
A1.Delete([2,4]) /Delete the second and the fourth member

Operators
8.Sets computation
Sets computation include ^ intersection, & union, \complement, and |concatenate, etc. For example:
A1=["a","b",1,2,3,4] /Sequence A1
B1= ["d","b",10,12,3,4]      /Sequence A2
A1^B1 /Intersection, return the sequence made up of members of both thetwo sequences, value is ["a","b",3,4]
A1\B1 /Complement, a new sequence created by successively removing from A1 the members of B1, value is ["a",1,2]
A1&B1 /Union, value is["a","b",1,2,3,4,"d",10,12]
A1|B1 /Concatenate, value is ["a","b",1,2,3,4,"d","b",10,12,3,4]
Note: Both union and concatenate are created by combining members of two sequences in order. Common members only appear once in union while, in concatenate, all of them will appear. 

9.Alignment arithmetic operation
Two sequences of the same length can make alignment operation according to members and return the sequence. The operation includes ++ (add), -- (subtract), ** (multiply), // (divide) and %% (complementation). For example: 

A1=[1,2,3,4] /Sequence A1
B1= [10,12,3,4] /Sequence A2
A1++B1 /Counterpoint addition, value is [11,14,16,18]

10.Boolean operation
Two sequences can compare in alignment, the result is a Boolean type.
[1,2,3]==[1,2,3] /Comparative result is true
[1,B,3]<=[1,b,4] /Comparative result is true, because B is less than b
[1,2,3]<[1,3,4] /Result is true, because the second member of [1,2,3] is “2” , which is smaller than the second member “3” of [1,3,4]
Note: “in” function is used to judge the inclusion relation between sequences.

Functions
11.Aggregate function
Functions for sequences include sum, avg, max, variance,etc. For example:
A1=[2,4,6] /Sequence
A1.sum() /Summation, result is 12
A1.sum(~*~) /Quadratic sum, which equals to 2*2+4*4+6*6, result is 56. ~ represents each member of a sequence.

12.Loop function
Loop function can make computation aiming at every member of a sequence, and express complex loop statement with simple functions, including loop computation, filter, locate, look up, rank, sort, etc.
A1=[2,4,-6] /Construct sequence A1 
A1=(~+1) /Add 1 to every member, result is [3,5,-5]
A1.select(~>1) /Filter out members that are greater than 1, result is [2,4]
A1.pselect@a(~>1) /Locate serial numbers of members that are greater than 1, result is [1,2]
A1.pos([-6,2]) /Look up serial numbers of members -6 and 2 in A1, result is [3,1]
A1.rank() /Rank of members of the sequence, result is [2,1,3]
A1.sort() /Sort in ascending order, result is [-6,2,4]; [2,4,-6].sort(~:-1) is the expression when sorting in descending order