January 14, 2015

esProc Helps with Computation in MongoDB – Subquery

MongoDB doesn’t support the complex subquery which can only be realized by retrieving the data out first and then performing further computation. The operation is the same complex even if Java or other programming languages are used to write the program. In view of this, we can consider using esProc to help MongoDB with the computation, the method of which will be illustrated in detail through an example.

In MongoDB, there are two docments: orders, which stores the orders data, and employee, which stores the employee data. They are as follows:
> db.orders.find();
{ "_id" : ObjectId("5434f88dd00ab5276493e270"), "ORDERID" : 1, "CLIENT" : "UJRNP
", "SELLERID" : 17, "AMOUNT" : 392, "ORDERDATE" : "2008/11/2 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e271"), "ORDERID" : 2, "CLIENT" : "SJCH"
, "SELLERID" : 6, "AMOUNT" : 4802, "ORDERDATE" : "2008/11/9 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e272"), "ORDERID" : 3, "CLIENT" : "UJRNP
", "SELLERID" : 16, "AMOUNT" : 13500, "ORDERDATE" : "2008/11/5 15:28" }
{ "_id" : ObjectId("5434f88dd00ab5276493e273"), "ORDERID" : 4, "CLIENT" : "PWQ",
 "SELLERID" : 9, "AMOUNT" : 26100, "ORDERDATE" : "2008/11/8 15:28" }
> db.employee.find();
{ "_id" : ObjectId("5437413513bdf2a4048f3480"), "EID" : 1, "NAME" : "Rebecca", "
SURNAME" : "Moore", "GENDER" : "F", "STATE" : "California", "BIRTHDAY" : "1974-1
1-20", "HIREDATE" : "2005-03-11", "DEPT" : "R&D", "SALARY" : 7000 }
{ "_id" : ObjectId("5437413513bdf2a4048f3481"), "EID" : 2, "NAME" : "Ashley", "S
URNAME" : "Wilson", "GENDER" : "F", "STATE" : "New York", "BIRTHDAY" : "1980-07-
19", "HIREDATE" : "2008-03-16", "DEPT" : "Finance", "SALARY" : 11000 }
{ "_id" : ObjectId("5437413513bdf2a4048f3482"), "EID" : 3, "NAME" : "Rachel", "S
URNAME" : "Johnson", "GENDER" : "F", "STATE" : "New Mexico", "BIRTHDAY" : "1970-
12-17", "HIREDATE" : "2010-12-01", "DEPT" : "Sales", "SALARY" : 9000 }
It is required to select the orders data in which SELLERID is the EID of records of employees where STATE is California in employee. This can be expressed in SQL like this:
Select * from orders where orders.sellerid in (select eid from employee where employee.state=’ California’)

The data of orders cannot be loaded entirely because their size is big, while the data size of both employee and the final result are not big.


The esProc script used to help MongoDB realize this subquery is as follows:

A1: Connect to MongoDB. localhost:27017 is the IP and the port number. test is the database name, user name as well as the password.

A2: Use find function to fetch the data from MongoDB and create a cursor. orders is the collection, with a filtering condition being null and _id , the specified key, not being fetched. Here esProc uses the same parameter format in find function as that in find statement of MongoDB. As esProc’s cursor supports fetching and processing data in batches, the memory overflow caused by importing big data all at once can thus be avoided.

A3: Fetch the desired data from employee according to the condition that STATE is California.

A4: Sort the EID field of A3’s employee data.

A5: Select the desired data from orders according to the condition requiring that SELLERID must exist in the table sequence in A4. That is, the condition can be expressed as SELLERID in A4. The resulting data can be loaded all at once. But if the data are too big, they can be fetched in batches. To fetch the top 1,000 rows, for example, can be expressed as fetch(1000).

The result is as follows:

NoteesProc isn’t equipped with a Java driver included in MongoDB. So to access MongoDB using esProc, you must put MongoDB’s Java driver (a version of 2.12.2 or above is required for esProc, e.g. mongo-java-driver-2.12.2.jar) into [esProc installation directory]\common\jdbc beforehand.

The esProc script used to help MongoDB with the computation is easy to be integrated into the Java program. You just need to add another line of code - result A6 to output a result in the form of resultset to Java program. For the detailed code, please refer to esProc Tutorial. In the same way, MongoDB’s Java driver must be put into the classpath of a Java program before the latter accesses MongoDB by calling an esProc program.

No comments:

Post a Comment