November 23, 2012

Vector Computing: Which Is More Powerful, R Language or esProc?

Do you find vector computing tiresome in statistical computing tools? Here is a vector computing comparison: R language vs. esProc. To me, one of the most attractive features of both R and esProc is that their code is agile: a few lines are enough to implement a lot of functionality. For example, both let you compose vector computing expressions, simplify conditional statements, extend basic functions into more advanced ones, and work with generic types. In vector computing specifically, both process whole batches of data through functions and operators, so explicit loop statements can be avoided. Users gain two advantages: first, the code is easy for business experts to grasp, which keeps the learning cost low; second, the computation is easier to parallelize, which can improve performance.
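To make the "avoid the loop statement" point concrete, here is a minimal R sketch. The price and qty vectors are made-up illustration data, not taken from any example in this article.

```r
# Hypothetical data: unit prices and quantities for five order lines
price <- c(10, 20, 30, 40, 50)
qty   <- c(1, 2, 3, 4, 5)

# Loop version: compute each amount one element at a time
amount_loop <- numeric(length(price))
for (i in seq_along(price)) {
  amount_loop[i] <- price[i] * qty[i]
}

# Vectorized version: a single expression operates on the whole vectors
amount_vec <- price * qty

# Both approaches produce the same values
print(identical(amount_loop, amount_vec))
```

The vectorized form states what is computed rather than how to iterate, which is the property both tools rely on.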

To show the subtle differences between R and esProc in vector computing, let's walk through several examples.

First, let's check the most basic operations: getting and assigning vector values. For example, get the 5 values of a vector whose subscripts run from 6 to 10, and replace them with another 5 values.

R solution:
A1<-c(51,52,53,54,55,56,57,58,59,60)
A2<-A1[6:10]
A1[6:10]<-seq(1,5)

esProc solution:
A1    =[51,52,53,54,55,56,57,58,59,60]
A2    =A1(to(6,10))
A3    >A1(to(6,10))=to(1,5)

Comments: Both let users get and assign values easily, with almost the same usage. Subjectively, though, I prefer R's ":" for representing interval ranges. It looks more intuitive and agile.
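As a quick check of the R side, the sketch below reruns the same three statements and prints what each variable holds afterwards. It also compares 6:10 with seq(6,10) to show that ":" is simply a shorthand for a unit-step sequence.

```r
A1 <- c(51, 52, 53, 54, 55, 56, 57, 58, 59, 60)
A2 <- A1[6:10]         # elements 6 through 10 (R subscripts start at 1)
A1[6:10] <- seq(1, 5)  # overwrite the same five positions

print(A2)  # 56 57 58 59 60
print(A1)  # 51 52 53 54 55 1 2 3 4 5

# ":" and seq() describe the same range of subscripts
print(all(6:10 == seq(6, 10)))
```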

Next, let's compare arithmetic operations on vectors.

R solution:
A4<-c(1,2,3)
A5<-c(2,4,6)
A4*A5    # multiply the vectors element by element; result: [1] 2 8 18
A4+2    # add a constant to the vector; result: [1] 3 4 5
ifelse(A4>1,A4+2,A4-2)    # conditional evaluation; result: [1] -1 4 5
sum(A4)    # aggregate: sum the vector members; result: [1] 6
sort(A4,decreasing = TRUE)    # sort in descending order; result: [1] 3 2 1

esProc solution:
A4    =[1,2,3]
A5    =[2,4,6]
A6    =A4**A5    'multiply the vectors member by member; result: 2 8 18
A7    =A4.(~+2)    'add a constant to the vector; result: 3 4 5
A8    =A4.(if(~>1,~+2,~-2))    'conditional evaluation; result: -1 4 5
A9    =A4.sum()    'aggregate: sum the vector members; result: 6
A10    =A4.sort(~:-1)    'sort in descending order; result: 3 2 1

Comments: As the examples show, both R and esProc handle vector arithmetic, aggregation, and sorting well, and their syntax is very similar. One thing worth noting is that esProc code looks more "object-oriented", while R is truly object-oriented at its core. The former is better suited to direct use by business experts and is popular in general business settings; the latter is better suited to programmers who write their own extension packages and is more widely accepted in scientific fields.

Now let's look at vector computing on structured data, using the Orders table from the Northwind database:
Query the orders with freight from 200 to 300.
Query the orders dated 1997.
Compute the intersection of the two sets above, i.e. orders with freight from 200 to 300 that were also placed in 1997.
Group the result of the previous step by EmployeeID, and average the freight for each employee.

R solution:
# result is assumed to hold the Orders table as a data.frame (the loading step is not shown)
A2<-result[result$Freight>=200 & result$Freight<=300,]
A3<-result[format(result$OrderDate,'%Y')=="1997",]
A4<-result[result$Freight>=200 & result$Freight<=300 & format(result$OrderDate,'%Y')=="1997",]
A5<-tapply(A4$Freight,INDEX=A4$EmployeeID,FUN=mean)

esProc solution:
A2    =A1.select(Freight>=200 && Freight<=300)
A3    =A1.select(year(OrderDate)==1997)
A4    =A2^A3
A5    =A4.group(EmployeeID;~.avg(Freight))

Comments: R is good at querying and at grouped statistics. For set operations, however, R is weaker than esProc. In the R example above, the intersection is obtained indirectly by running a combined query rather than by a set operation.

In base R, set operations work on simple vectors, for example intersect(A2$OrderID,A3$OrderID), and cannot be applied directly to the rows of structured data such as a data.frame.
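One common workaround, sketched below, is to intersect a key column and then filter the rows by the surviving keys. The orders data frame is a made-up stand-in for the Northwind Orders table, and the sketch assumes that OrderID uniquely identifies each row.

```r
# Hypothetical stand-in for the Northwind Orders table (values are made up)
orders <- data.frame(
  OrderID    = c(1, 2, 3, 4, 5),
  EmployeeID = c(1, 1, 2, 2, 3),
  Freight    = c(250, 120, 210, 290, 260),
  OrderDate  = as.Date(c("1997-03-01", "1997-05-10", "1996-07-04",
                         "1997-11-20", "1997-12-31"))
)

# The two separate queries
A2 <- orders[orders$Freight >= 200 & orders$Freight <= 300, ]
A3 <- orders[format(orders$OrderDate, "%Y") == "1997", ]

# Intersect the key vectors, then keep the rows whose key survived
common_ids <- intersect(A2$OrderID, A3$OrderID)
A4 <- A2[A2$OrderID %in% common_ids, ]

# Average freight per employee over the intersected rows
A5 <- tapply(A4$Freight, INDEX = A4$EmployeeID, FUN = mean)
print(A4)
print(A5)
```

This reuses the two query results instead of repeating both conditions, but it still goes through a plain vector of keys, which is the indirection the comments above describe.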

Of course, this is not to say that R is weak in vector computing. In fact, R is easier to use than esProc for matrix-related computation. For example, to find the eigenvalues of a matrix A, R users can simply call eigen(A), while esProc offers no function that expresses this directly. This suggests that esProc is better suited to business computing, while R is better at scientific computation.
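For illustration, here is a minimal sketch of the eigen(A) call on a small diagonal matrix. The matrix is chosen only because its eigenvalues can be read off the diagonal, so the output is easy to verify by eye.

```r
# A 2x2 diagonal matrix; its eigenvalues are the diagonal entries
A <- matrix(c(2, 0,
              0, 3), nrow = 2, byrow = TRUE)

e <- eigen(A)

print(e$values)   # eigenvalues, largest first: 3 2
print(e$vectors)  # one eigenvector per column, matching the order of e$values
```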

In conclusion, both R and esProc perform well in basic vector computing. More specifically, R is second to none in matrix computation, while esProc beats R in handling structured data.
