I have a table with 36 million records and I need to update fields based on some conditions, but each query takes about 10 minutes and I have many queries to run:
MERGE INTO account b
USING ( SELECT id, COUNT(Account) as numAcc
        FROM account WHERE Status = 'ACTIVATED' GROUP BY id) ba
ON (ba.id = b.id ) WHEN MATCHED THEN UPDATE SET b.numAcc = ba.numAcc;
The above is a sample query; the other queries look the same but with different conditions. The following is the explain plan:
-------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 36M| 892M| | 610K (1)| 00:00:24 |
| 1 | MERGE | ACCOUNT | | | | | |
| 2 | VIEW | | | | | | |
|* 3 | HASH JOIN | | 36M| 19G| 511M| 610K (1)| 00:00:24 |
| 4 | VIEW | | 998K| 500M| | 365K (1)| 00:00:15 |
| 5 | SORT GROUP BY | | 998K| 502M| | 365K (1)| 00:00:15 |
| 6 | VIEW | VW_DAG_1 | 25M| 12G| | 365K (1)| 00:00:15 |
| 7 | SORT GROUP BY | | 25M| 1025M| 1346M| 365K (1)| 00:00:15 |
|* 8 | TABLE ACCESS FULL| ACCOUNT | 25M| 1025M| | 91408 (1)| 00:00:04 |
| 9 | TABLE ACCESS FULL | ACCOUNT | 36M| 2162M| | 91501 (1)| 00:00:04 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("BA"."ID"="B"."ID")
8 - filter("Status "='ACTIVATED')
Sample Data:
id account status month
3141243131 1162278310 CLOSED 1
3141243131 1014532510 ACTIVATED 1
3141243131 1094289210 ACTIVATED 1
3141243131 1162278310 CLOSED 2
3141243131 1094289210 ACTIVATED 2
3141243131 1014532510 ACTIVATED 2
3141243131 1094289210 ACTIVATED 3
3141243131 1162278310 CLOSED 3
3141243131 1014532510 ACTIVATED 3
5432523522 1111231231 CLOSED 1
5432523522 1014532510 ACTIVATED 1
5432523522 1094589210 ACTIVATED 1
5432523522 1111231231 CLOSED 2
5432523522 1094289210 ACTIVATED 2
5432523522 1094589210 ACTIVATED 2
5432523522 1094289210 CLOSED 3
5432523522 1111231231 CLOSED 3
5432523522 1094589210 ACTIVATED 3
The slowness is almost certainly not due to row retrieval in this case, so using ROWID or some other nested-loops join is not going to help (it will actually make it worse – a full scan with hash join is in fact the best option here, as you aren’t limiting the rows significantly with a high cardinality predicate). Other than the fact that you aren’t using any parallelism, your execution plan looks good for what you’re doing.
The problem with updates
Your real problem is the # of rows you are updating: updates require a lot of work.
- The blocks have to be read in current mode rather than in consistent or direct-path mode, which means reading current blocks from disk, buffering them in the cache, and going through the various concurrency mechanisms surrounding cache buffers.
- It has to lock the rows (and wait on any blocking locks held by other users).
- It has to reformat the block.
- It has to generate undo records.
- It has to generate redo records and wait for those to flush to disk.
- If the volume is high enough, your cache small enough and there are too few DBWR processes, DBWR can fall behind flushing those dirty buffers; you then run out of available buffers in the cache and the update stalls, with DBWR as your bottleneck.
- Lastly, if you are updating any indexed columns, you have to modify the index blocks as well.
At large volumes, updates (and deletes) are your enemy, to be avoided as much as possible.
#1 Reduce Row Updates
So, your first priority is to reduce the # of rows updated. As written, your query will update every row of every id that has the given status, whether or not the value actually needs to change:
MERGE INTO account b
USING ( SELECT id, COUNT( Account) as numAcc
FROM account
WHERE Status = 'ACTIVATED'
GROUP BY id) ba
ON (ba.id = b.id )
WHEN MATCHED THEN UPDATE SET b.numAcc = ba.numAcc;
1. Don’t update at all
If 100% of those rows have the wrong or a NULL numAcc and you are populating it initially, then you must update them all. But it would be far better to rearchitect your loading process so that the correct numAcc value is set in the initial INSERT, so you never have to update it. If you have to, use a work table, then join it to account and use that to load (INSERT SELECT) the final table. With large row volumes, this approach is always better than a large-scale update operation. The best kind of update is no update at all.
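As a rough sketch of that idea (account_stage and numacc_work below are hypothetical names, and the column list is taken from your sample data):

CREATE TABLE numacc_work AS                 -- work table with the pre-computed counts
  SELECT id, COUNT(account) AS numAcc
  FROM account_stage
  WHERE status = 'ACTIVATED'
  GROUP BY id;

INSERT /*+ APPEND */ INTO account (id, account, status, month, numAcc)
  SELECT s.id, s.account, s.status, s.month, w.numAcc   -- direct-path load with numAcc already set
  FROM account_stage s
  LEFT JOIN numacc_work w ON w.id = s.id;

That way the value is correct from the start and the 36-million-row update never has to happen.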
2. Update fewer rows
If, however, only some (a small percentage) of those rows need changing, and most already have the correct value, then you need to weed out the rows that don't need changing within the USING clause (don't try to do it simply with the WHERE sub-clause underneath UPDATE). Usually that requires pre-joining your source and target tables within the USING clause, but in your case it's the same table, so we don't need a join; just compare the current value with the new one using an analytic function and an outer query block (with DISTINCT so the MERGE sees only one source row per id):
MERGE INTO account b
USING (SELECT DISTINCT id, numAcc   -- one row per id, avoids ORA-30926 from duplicate source rows
FROM (SELECT id, numAcc old_numAcc, COUNT(*) OVER (PARTITION BY id) as numAcc
FROM account
WHERE Status = 'ACTIVATED')
WHERE NVL(old_numAcc,-1) != numAcc) ba
ON (ba.id = b.id )
WHEN MATCHED THEN UPDATE SET b.numAcc = ba.numAcc;
#2 Eliminate any dependent objects impacting updates
Once you have reduced the # of rows you are actually updating, the second priority is to eliminate any constraint or sub-object slowing it down. This would be any index on the numAcc column (there really shouldn't be an index on a column like that, but check to be sure), as well as any foreign key (FK) on the column. Also look for triggers that fire on update. All these will significantly slow down an update operation. Disable them if possible.
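The data dictionary will tell you what is there; something along these lines (the object names in the ALTER statements are made up for illustration):

SELECT index_name                            -- any index covering numAcc
FROM user_ind_columns
WHERE table_name = 'ACCOUNT' AND column_name = 'NUMACC';

SELECT trigger_name                          -- any trigger that fires on UPDATE
FROM user_triggers
WHERE table_name = 'ACCOUNT' AND triggering_event LIKE '%UPDATE%';

ALTER INDEX account_numacc_ix UNUSABLE;                     -- hypothetical index name
ALTER TRIGGER account_upd_trg DISABLE;                      -- hypothetical trigger name
ALTER TABLE account DISABLE CONSTRAINT account_numacc_fk;   -- hypothetical FK name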
#3 Employ parallelism for the update
The third priority is to throw muscle at the operation. Something like this should be parallelized with PDML (parallel-DML) to take advantage of multiple CPUs. This requires enabling parallel DML at the session level as well as hinting it:
ALTER SESSION ENABLE PARALLEL DML;
MERGE /*+ parallel(b,16) */ INTO account b
USING (SELECT DISTINCT id, numAcc   -- one row per id, avoids ORA-30926 from duplicate source rows
FROM (SELECT id, numAcc old_numAcc, COUNT(*) OVER (PARTITION BY id) as numAcc
FROM account
WHERE Status = 'ACTIVATED')
WHERE NVL(old_numAcc,-1) != numAcc) ba
ON (ba.id = b.id )
WHEN MATCHED THEN UPDATE SET b.numAcc = ba.numAcc;
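To sanity-check that the DML was actually parallelized for the session, you can look at v$pq_sesstat afterwards (statistic names can vary slightly by version):

SELECT statistic, last_query, session_total
FROM v$pq_sesstat
WHERE statistic IN ('DML Parallelized', 'Queries Parallelized');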
Finally, I’ll comment that I tried to answer your question about performance. However, it should also be pointed out that you really shouldn’t be doing this in the first place. Stamping a row with the # of neighboring rows is a violation of normalization rules and creates ambiguity. What happens if you add a new row? Are you going to redo the update operation every time so that the value is in sync? We need to avoid situations like this. Aim to never have a column in a table that depends on other rows for its value, especially other rows from the same table. It should be solely determined by the key of its own row (not even a non-key column from its own row) and that alone.
So, you're much better off not having a numAcc column in your table at all. If you need to know the count, compute it with COUNT(*) at query time. Remember you can use analytical functions like COUNT(*) OVER (PARTITION BY id) to easily get this denormalized in a query without having to do a GROUP BY and change the granularity of your results.
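For example, a query can derive the value on the fly instead of reading a stored column (the column list is taken from your sample data):

SELECT id, account, status, month,
       COUNT(CASE WHEN status = 'ACTIVATED' THEN 1 END)
         OVER (PARTITION BY id) AS numAcc    -- computed at query time, always in sync
FROM account;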
Rather than aggregating and correlating on the id column, you can try using an analytic function and correlating on the ROWID pseudo-column (which is effectively a pointer to the row):
MERGE INTO account dst
USING (
SELECT activated_count
FROM (
SELECT numacc,
COUNT(CASE WHEN Status = 'ACTIVATED' THEN account END)
OVER (PARTITION BY id) AS activated_count
FROM account
)
WHERE activated_count > 0 -- Your query effectively includes this filter but I'm not sure
-- whether you intended that behaviour or not.
AND numacc != activated_count
) src
ON (
dst.ROWID = src.ROWID
)
WHEN MATCHED THEN
UPDATE SET dst.numAcc = src.activated_count;
fiddle