Aggregation in MongoDB
In the last article we talked about how MongoDB works with Node.js. In this article we’ll take a look at aggregation in MongoDB. So what is aggregation? Fundamentally, it’s a calculation performed over a set of results, such as a count, a sum, or an average.
There are a few ways to use aggregation in MongoDB. The first is the pipeline. The big picture here is that we take a set of documents and pass them through a stage (a "pipe"), which runs a calculation on them. We can then pass the output of that stage into another stage, and so on.
Confused a bit? Don’t be. In order to run some examples, we’re going to create a very simple database. Each document has a number named x, a type from A to E, and a name. We’ll build it like so:
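To make the idea of chained stages concrete, here is a minimal sketch in plain JavaScript, with no database involved. It only models the concept and is not how MongoDB implements it; runPipeline, onlyB and sumX are hypothetical names invented for this illustration.

```javascript
// Conceptual model: each stage is a function that takes an array of
// documents and returns a new array, which feeds the next stage.
function runPipeline(docs, stages) {
    return stages.reduce(function (current, stage) {
        return stage(current);
    }, docs);
}

var docs = [
    { x: 1, type: 'A' },
    { x: 2, type: 'B' },
    { x: 3, type: 'B' }
];

// First stage: keep only the documents of type B.
var onlyB = function (input) {
    return input.filter(function (doc) { return doc.type === 'B'; });
};

// Second stage: collapse what is left into a single summary document.
var sumX = function (input) {
    var total = input.reduce(function (sum, doc) { return sum + doc.x; }, 0);
    return [{ total: total }];
};

var result = runPipeline(docs, [onlyB, sumX]);
console.log(result); // [ { total: 5 } ]
```

The order of the stages matters: the second stage only ever sees what the first one let through.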
use orders;
// Return a random element of the given array.
function pickRandom(values) {
    return values[Math.floor(Math.random() * values.length)];
}
function generateName() {
    return pickRandom(['Moshe', 'Ran', 'Haim', 'Michal', 'Tamar']);
}
function generateType() {
    return pickRandom(['A', 'B', 'C', 'D', 'E']);
}
// Insert 100 documents, with x running from 1 to 100.
for (var i = 1; i <= 100; i++) {
    db.testData.insert({ x: i, name: generateName(), type: generateType() });
}
Here’s a simple example with find that I’m running:
> db.testData.find({})
{ "_id" : ObjectId("543f91343d35c56ad957aac6"), "x" : 1, "name" : "Tamar", "type" : "E" }
{ "_id" : ObjectId("543f91353d35c56ad957aac7"), "x" : 2, "name" : "Moshe", "type" : "E" }
{ "_id" : ObjectId("543f91353d35c56ad957aac8"), "x" : 3, "name" : "Moshe", "type" : "A" }
The first pipe that we’ll use is of the type match and, as its name suggests, we can use it to filter results that match certain conditions. For example, all those of type B. How do we do this? Like so:
db.testData.aggregate({ $match: {"type": "B"} });
Here we call the aggregate method and pass it a $match stage. We can put more than one condition in the same stage, for instance:
db.testData.aggregate({ $match: {"type": "B", "name":"Moshe"} });
Like this we get those that are of type B and are also named Moshe. So far, there’s nothing we don’t already know here. Match is one type of pipe, and we can pass it quite a few parameters, just as we can with find. For instance, we can require that the type be A or B:
db.testData.aggregate({ $match: {"type": {$in: [ 'A', 'B' ] }, "name":"Moshe"} });
How do we know this? All of the various pipe types are outlined in the documentation. The section on match states explicitly that you can pass it the same query parameters as a regular document query, and these are all listed in the documentation too. There you can find how to use IN, NOT EQUAL, and so on. In this next query I’m using NOT IN to get everyone named Moshe whose type is neither D nor A:
db.testData.aggregate({ $match: {"type": {$nin: [ 'D','A' ] }, "name":"Moshe"} });
There’s no need to go through all the operators, as there are quite a few. But it’s a good idea to have a look at the documentation.
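As a rough illustration of what these query operators mean, the following plain JavaScript sketch evaluates a query document against in-memory documents. It is a simplified model covering only plain equality, $in, $nin and $ne, not MongoDB's actual matching code; matches and sample are hypothetical names made up for this example.

```javascript
// Simplified model of evaluating a $match query document.
// Every field in the query must be satisfied for the document to pass.
function matches(doc, query) {
    return Object.keys(query).every(function (field) {
        var cond = query[field];
        var value = doc[field];
        if (cond !== null && typeof cond === 'object') {
            // Operator form, e.g. { $in: ['A', 'B'] }
            return Object.keys(cond).every(function (op) {
                if (op === '$in') return cond[op].indexOf(value) !== -1;
                if (op === '$nin') return cond[op].indexOf(value) === -1;
                if (op === '$ne') return value !== cond[op];
                throw new Error('Operator not modelled here: ' + op);
            });
        }
        // Plain form, e.g. { name: 'Moshe' }
        return value === cond;
    });
}

var sample = [
    { x: 1, name: 'Moshe', type: 'A' },
    { x: 2, name: 'Moshe', type: 'B' },
    { x: 3, name: 'Ran', type: 'B' },
    { x: 4, name: 'Moshe', type: 'D' }
];

var notDorA = sample.filter(function (doc) {
    return matches(doc, { type: { $nin: ['D', 'A'] }, name: 'Moshe' });
});
console.log(notDorA); // only the document with x: 2 remains
```

Note that the conditions on different fields are combined with AND, which is why the Ran document is dropped even though its type passes.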
Whatever we collect with one pipe we can pass to an additional pipe. But which one? Any one you want! For example group, which groups the data by a field.
db.testData.aggregate([{
    $match: {
        "type": "B"
    }
}, {
    $group: {
        _id: '$name',
        'Count': {
            $sum: 1
        }
    }
}]);
Wow! What’s going on here? Wait, it’s actually really simple! First we have an array in aggregate that contains all of the pipes. The first member is match, which we’ve already been acquainted with. The second member is group, which is a bit more complicated. First I put in the _id that I want to group by, in this case the name. After this I put in the data I want. For instance Count (a name I chose), which the $sum operator fills in by adding 1 for every document in the group. And the results?
{ "_id" : "Haim", "Count" : 1 }
{ "_id" : "Michal", "Count" : 4 }
{ "_id" : "Moshe", "Count" : 6 }
{ "_id" : "Tamar", "Count" : 8 }
{ "_id" : "Ran", "Count" : 2 }
I can keep adding more and more computed fields. For example, the average of x:
db.testData.aggregate([{
    $match: {
        "type": "B"
    }
}, {
    $group: {
        _id: '$name',
        'Count': {
            $sum: 1
        },
        'X Average': {
            $avg: '$x'
        }
    }
}]);
See how easy? Everything is in the documentation.
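If the group stage still feels opaque, this plain JavaScript sketch models what grouping by name with a count and an average amounts to. It is only an approximation for illustration, not MongoDB's implementation; groupByName and the typeB sample data are hypothetical.

```javascript
// Model of: $group: { _id: '$name', Count: { $sum: 1 }, 'X Average': { $avg: '$x' } }
function groupByName(docs) {
    var groups = {};
    docs.forEach(function (doc) {
        if (!groups[doc.name]) {
            groups[doc.name] = { _id: doc.name, Count: 0, total: 0 };
        }
        groups[doc.name].Count += 1;     // what $sum: 1 does per document
        groups[doc.name].total += doc.x; // running total needed for the average
    });
    return Object.keys(groups).map(function (key) {
        var g = groups[key];
        return { _id: g._id, Count: g.Count, 'X Average': g.total / g.Count };
    });
}

// Pretend these are the documents that survived the $match on type B.
var typeB = [
    { x: 2, name: 'Moshe', type: 'B' },
    { x: 6, name: 'Moshe', type: 'B' },
    { x: 9, name: 'Tamar', type: 'B' }
];

var grouped = groupByName(typeB);
console.log(grouped);
// [ { _id: 'Moshe', Count: 2, 'X Average': 4 },
//   { _id: 'Tamar', Count: 1, 'X Average': 9 } ]
```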
And we can continue to push the results into other pipes. For instance, we can sort the results, or even write them into a collection!
db.testData.aggregate([{
    $match: {
        "type": "B"
    }
}, {
    $group: {
        _id: '$name',
        'Count': {
            $sum: 1
        },
        'X Average': {
            $avg: '$x'
        }
    }
}, {
    $out: "DaResults"
}]);
The code above will put all the results inside a collection called DaResults.
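Sorting was mentioned above but not shown, so here is a sketch of what a $sort pipe could look like on the same data, ordering the groups from the highest Count to the lowest. The pipeline variable is just a name chosen for this example, and the typeof guard is only there so the snippet can also be loaded outside the mongo shell, where db does not exist.

```javascript
// Same match and group as before, followed by a $sort stage.
// -1 means descending order, 1 would mean ascending.
var pipeline = [{
    $match: {
        "type": "B"
    }
}, {
    $group: {
        _id: '$name',
        'Count': {
            $sum: 1
        }
    }
}, {
    $sort: {
        'Count': -1
    }
}];

// Only run the query when a shell-provided db object is available.
if (typeof db !== 'undefined') {
    db.testData.aggregate(pipeline).forEach(printjson);
}
```

The sort has to come after the group here, because the Count field it sorts on does not exist until the group stage has produced it.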
Each pipe is essentially a member in an array, and all the pipes can be found in the documentation. So simple and easy, right?
There is another way to implement aggregation called Map Reduce. But we’re not going to cover it here, because I don’t know it very well.
For simple calculations, we can use Single Purpose Aggregation Operations, which are very similar to pipeline aggregation. It’s just that they are simpler and more limited.
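Two such operations are distinct and count. The sketch below wraps them in a hypothetical helper named quickStats so that it can be tried against any collection object. It assumes a shell recent enough to offer countDocuments (older shells expose count instead), and again guards the call so the snippet does nothing where db is not defined.

```javascript
// Hypothetical helper: gather two quick facts about a collection
// without building a pipeline.
function quickStats(collection) {
    return {
        // All the different values that appear in the type field.
        types: collection.distinct('type'),
        // How many documents are of type B.
        typeBCount: collection.countDocuments({ type: 'B' })
    };
}

// Only run against the database when a shell-provided db object exists.
if (typeof db !== 'undefined') {
    printjson(quickStats(db.testData));
}
```

Each call answers exactly one question, which is what makes these operations simpler than a pipeline, and also why they cannot be chained like pipes.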
Coming up in the next article, we’ll learn all about replication.
About the author: Ran Bar-Zik is an experienced web developer whose personal blog, Internet Israel, features articles and guides on Node.js, MongoDB, Git, SASS, jQuery, HTML 5, MySQL, and more. Translation of the original article by Aaron Raizen.