[MongoDB in Action] Chapter 6 Summary

Reading 2020. 9. 9. 07:12

Aggregation Framework
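An advanced query language that transforms and combines data from multiple documents to produce new information.
Example) Monthly sales, sales per product, order totals per user => results that SQL obtains with GROUP BY, JOIN, and so on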

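6.1 Aggregation framework overview
Aggregation pipeline: the collective name for the operations between the input documents and the output documents (the output of operation 1 becomes the input of operation 2)
List of operations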

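$project	Specifies the fields to place in the output document (projected), i.e. chooses the fields the next operation will use
$match	Selects the documents to process. Similar to find()
$limit	Limits the number of documents
$skip	Skips the given number of documents
$unwind	Expands an array (join)
$group	Grouping (group by)
$sort	Sorting (order by)
$geoNear	Selects documents near a geospatial location
$out	Writes the results to a collection
$redact	Controls access to specific data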

6.2 E-commerce aggregation example

Aggregation example
db.products.aggregate([ {$match: ...}, {$group: ...}, {$sort: ...}] )
The stages in the {} braces inside the aggregate method run in the order they are written
The input is the products collection
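
The "output of one stage is the input of the next" idea can be modeled in plain JavaScript without a database. This is only a sketch of the concept, not how MongoDB implements it: runPipeline, match, sortBy, and limit are hypothetical helper names, and the product data is made up.

```javascript
// Plain-JavaScript model of a pipeline: each stage is a function from an
// array of documents to an array of documents. Concept sketch only.
function runPipeline(docs, stages) {
  // Feed the output of each stage into the next one, in order.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical stand-ins for $match, $sort and $limit.
const match = (predicate) => (docs) => docs.filter(predicate);
const sortBy = (key, direction) => (docs) =>
  [...docs].sort((a, b) => direction * (a[key] - b[key]));
const limit = (n) => (docs) => docs.slice(0, n);

// Made-up sample data.
const products = [
  { name: "a", price: 30 },
  { name: "b", price: 10 },
  { name: "c", price: 20 },
  { name: "d", price: 5 },
];

const cheapestTwoOverFive = runPipeline(products, [
  match((d) => d.price > 5), // drops "d"
  sortBy("price", 1),        // ascending by price
  limit(2),                  // keeps the first two
]);

console.log(cheapestTwoOverFive.map((d) => d.name)); // [ 'b', 'c' ]
```

Swapping the order of the stages changes the result, which is why the order of the {} entries in the array matters.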

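// Assumes `product` already holds a document from the products collection
ratingSummary = db.reviews.aggregate([
    // Keep only the reviews for this product
    {$match: {'product_id': product['_id']}},
    // Average rating and number of reviews for the product
    {$group: {_id: '$product_id',
              average: {$avg: '$rating'},
              count: {$sum: 1}}}
]).next();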

Given an aggregation query like the one above, $match is written before $group for two reasons:

1) If $group runs first, it may have to read every document in the collection.

2) If $match runs first, it cuts down the number of documents before they are grouped.
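
The two orderings can be compared with a small plain-JavaScript sketch. It does not use MongoDB; groupByProduct is a hypothetical stand-in for the $group stage of the ratingSummary query, the review data is made up, and docsRead simply counts how many documents the grouping step had to look at.

```javascript
// Made-up reviews; field names mirror the ratingSummary query.
const reviews = [
  { product_id: 1, rating: 4 },
  { product_id: 1, rating: 5 },
  { product_id: 2, rating: 2 },
  { product_id: 3, rating: 3 },
];

// Stand-in for {$group: {_id: '$product_id', average: {$avg: '$rating'}, count: {$sum: 1}}}.
// Also reports how many documents the grouping step processed.
function groupByProduct(docs) {
  const groups = new Map();
  for (const d of docs) {
    const g = groups.get(d.product_id) || { _id: d.product_id, total: 0, count: 0 };
    g.total += d.rating;
    g.count += 1;
    groups.set(d.product_id, g);
  }
  const result = [...groups.values()].map((g) => ({
    _id: g._id,
    average: g.total / g.count,
    count: g.count,
  }));
  return { result, docsRead: docs.length };
}

// $match first: only product 1's reviews reach the grouping step.
const matchFirst = groupByProduct(reviews.filter((d) => d.product_id === 1));

// $group first: every review is grouped, then unwanted groups are thrown away.
const groupFirst = groupByProduct(reviews);
const groupFirstResult = groupFirst.result.filter((g) => g._id === 1);

console.log(matchFirst.result, matchFirst.docsRead); // [ { _id: 1, average: 4.5, count: 2 } ] 2
console.log(groupFirstResult, groupFirst.docsRead);  // [ { _id: 1, average: 4.5, count: 2 } ] 4
```

Both orderings give the same summary, but grouping first does work on documents that are discarded afterwards.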

6.3 Aggregation pipeline operators

Beyond what is covered above, only the functions available in $group remain to be checked.

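$group functions
$addToSet	Builds an array of a field's values within the group (*important: no duplicates)
$first	First value in the group
$last	Last value in the group
$max	Maximum value of the field within the group
$min	Minimum value of the field within the group
$avg	Average value of the field within the group
$push	Returns all values of a field within the group (*important: duplicates allowed)
$sum	Sum of the field within the group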

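Difference between $addToSet and $push: both create a new array, and the arrays they create do not affect the original documents. That much they have in common.

$addToSet inserts only the distinct values, while $push pushes every value.

The queries below are for checking this difference.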


// room_type uses $push, property_type uses $addToSet
db.listingsAndReviews.aggregate([
    {$project: {name:1, review_scores:1, last_review:1, room_type:1, property_type:1}},
    {$limit: 100},
    {$match: {last_review: {$gte: new Date(2019,0,1)}}},
    {$group: {_id: '$review_scores.review_scores_rating', rooms: {$push: '$room_type'}, properties: {$addToSet: '$property_type'}, review_count: {$sum:1}}},
    {$sort: {'_id':-1}},
    {$out: 'aggregateResult'}
]);

// room_type uses $addToSet, property_type uses $push
db.listingsAndReviews.aggregate([
    {$project: {name:1, review_scores:1, last_review:1, room_type:1, property_type:1}},
    {$limit: 100},
    {$match: {last_review: {$gte: new Date(2019,0,1)}}},
    {$group: {_id: '$review_scores.review_scores_rating', rooms: {$addToSet: '$room_type'}, properties: {$push: '$property_type'}, review_count: {$sum:1}}},
    {$sort: {'_id':-1}},
    {$out: 'aggregateResult'}
]);
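
For a quick look at the difference without running the queries, here is a plain-JavaScript sketch of the two accumulators. It is not MongoDB code: groupScores is a hypothetical helper, the listings are made up, and the field names pushed and asSet exist only for this illustration. Note also that MongoDB does not guarantee the order of elements in an $addToSet array, whereas this sketch happens to keep first-seen order.

```javascript
// Made-up listings sharing one rating score.
const listings = [
  { score: 90, room_type: "Entire home/apt" },
  { score: 90, room_type: "Private room" },
  { score: 90, room_type: "Entire home/apt" },
];

// Groups by score and collects room_type two ways.
function groupScores(docs) {
  const groups = new Map();
  for (const d of docs) {
    if (!groups.has(d.score)) {
      groups.set(d.score, { _id: d.score, pushed: [], asSet: [] });
    }
    const g = groups.get(d.score);
    // Like $push: every value is appended, duplicates included.
    g.pushed.push(d.room_type);
    // Like $addToSet: a value is appended only if it is not there yet.
    if (!g.asSet.includes(d.room_type)) g.asSet.push(d.room_type);
  }
  return [...groups.values()];
}

const grouped = groupScores(listings);
console.log(grouped[0].pushed); // [ 'Entire home/apt', 'Private room', 'Entire home/apt' ]
console.log(grouped[0].asSet);  // [ 'Entire home/apt', 'Private room' ]
```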
 

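$unwind splits apart the elements of an array inside a document and re-creates each element as its own document: the output holds one copy of the original document per array element, with the array field replaced by that element.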
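
The effect of $unwind can be sketched in plain JavaScript. This is an illustration, not MongoDB's implementation: unwind is a hypothetical helper and the documents are made up. It mirrors the default $unwind behavior of producing no output for a document whose array is missing or empty.

```javascript
// Emits one copy of each document per element of doc[field],
// with the array replaced by that single element.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((item) => ({ ...doc, [field]: item }))
  );
}

// Made-up sample documents.
const items = [
  { _id: 1, name: "hammer", tags: ["tools", "hardware"] },
  { _id: 2, name: "nail", tags: [] }, // empty array: produces no output
];

const unwound = unwind(items, "tags");
console.log(unwound);
// [ { _id: 1, name: 'hammer', tags: 'tools' },
//   { _id: 1, name: 'hammer', tags: 'hardware' } ]
```

The original documents are untouched; only the stream of documents flowing to the next stage changes.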

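6.4 Reshaping documents

This means changing how a document's fields are laid out.

It covers renaming fields, restructuring them, creating new fields, and so on.

An example using string functions: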


db.restaurants.aggregate([
    {$project: {sumName : {$concat:['$borough', '-', '$cuisine', '-', '$name']}}},
    {$project: {sumName : {$toLower:'$sumName'}}}
])

The MongoDB Documentation introduces the arithmetic functions with the following example.


db.sales.aggregate(
   [
     { $project: { item: 1, total: { $add: [ "$price", "$fee" ] } } }
   ]
)

The MongoDB Documentation introduces the logical functions with the following example.


db.inventory.aggregate(
   [
      {
         $project:
           {
             item: 1,
             discount:
               {
                 $cond: { if: { $gte: [ "$qty", 250 ] }, then: 30, else: 20 }
               }
           }
      }
   ]
)
//https://docs.mongodb.com/manual/reference/operator/aggregation/cond/#exp._S_cond


The MongoDB Documentation introduces the set functions with the following example.


db.experiments.aggregate(
   [
     { $project: { A: 1, B: 1, sameElements: { $setEquals: [ "$A", "$B" ] }, _id: 0 } }
   ]
)
//https://docs.mongodb.com/manual/reference/operator/aggregation/setEquals/#exp._S_setEquals
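
What $setEquals compares can be sketched in plain JavaScript: two arrays count as equal sets when they hold the same distinct elements, regardless of order or duplicates. The setEquals function below is a hypothetical helper for illustration, and it only handles primitive values such as strings and numbers.

```javascript
// True when both arrays contain the same distinct values.
// Order and duplicate entries are ignored. Primitive values only.
function setEquals(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  return setA.size === setB.size && [...setA].every((x) => setB.has(x));
}

console.log(setEquals(["red", "blue"], ["blue", "red", "blue"])); // true
console.log(setEquals(["red", "blue"], ["red"]));                 // false
console.log(setEquals([], []));                                   // true
```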


 
6.5 allowDiskUse
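
allowDiskUse is an option passed as the second argument to aggregate(); it lets pipeline stages that would exceed their memory limit spill to temporary files on disk. A minimal sketch of passing it follows. aggregateWithDiskUse is a hypothetical wrapper, and the only assumption about `collection` is that it exposes aggregate(pipeline, options) the way a mongo shell collection does.

```javascript
// Hypothetical helper: runs a pipeline with allowDiskUse turned on.
// `collection` is assumed to expose aggregate(pipeline, options),
// as a mongo shell collection does.
function aggregateWithDiskUse(collection, pipeline) {
  return collection.aggregate(pipeline, { allowDiskUse: true });
}
```

In the mongo shell this would be called with a real collection, for example aggregateWithDiskUse(db.listingsAndReviews, [{$sort: {'_id': -1}}]), which is equivalent to writing the options object directly in the aggregate call.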
 

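Summary and additions

The Aggregation Pipeline is MongoDB's framework for producing processed results and replaces MapReduce; an aggregate call is made up of an ordered sequence of operations.

Before version 2.6, aggregate returned its result as a document of at most 16MB; since cursors were introduced, the results of an aggregation are read through a cursor.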
