Merge tikv 7.5 #400
base: raftstore-proxy-7.5
close tikv#11161 Add back the heap profile HTTP API and make it secure. The API was removed by tikv#11162 due to a security issue that allowed access to arbitrary files on the server. This PR makes the API show only the file name instead of the absolute path, and adds a paranoid check to make sure the passed file name is in the set of heap profiles. Signed-off-by: Connor1996 <zbk602423539@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
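The paranoid check described above could look roughly like the following. This is a minimal sketch, not the code from the PR: the function name `resolve_heap_profile`, the `known_profiles` set, and the profile directory are all hypothetical names chosen for illustration.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Resolve a client-supplied profile name to a path inside `profile_dir`.
/// Returns `None` unless `name` is a bare file name that also appears in
/// the set of known heap profiles. (Hypothetical helper, for illustration.)
pub fn resolve_heap_profile(
    profile_dir: &Path,
    known_profiles: &HashSet<String>,
    name: &str,
) -> Option<PathBuf> {
    let as_path = Path::new(name);
    // A bare file name has exactly one component, so its `file_name()` equals
    // the whole input. This rejects "../x", "/etc/passwd", "a/b" and "..".
    let is_bare = as_path
        .file_name()
        .map_or(false, |f| f == as_path.as_os_str());
    if !is_bare {
        return None;
    }
    // Paranoid check: only serve names the server itself listed as profiles.
    if !known_profiles.contains(name) {
        return None;
    }
    Some(profile_dir.join(name))
}
```

Checking membership in the known set is what actually closes the hole; the bare-name test is an extra guard so that a path never leaves the profile directory even if the set were populated incorrectly.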
… tablet (tikv#15332) ref tikv#12842 - Fix a bug in compact range that causes a dirty tablet to be reported as clean. - Add an additional check to ensure trim's correctness. - Fix a bug where some tablets are not destroyed and block peer destroy progress. Signed-off-by: tabokie <xy.tao@outlook.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…shutting down (tikv#15426) ref tikv#15202 Do not panic on an unexpectedly dropped channel when shutting down. Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
…ikv#15427) close tikv#15282 disable duplicated mvcc key check compaction by default Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
close tikv#15357 Correct the raft_router/apply_router's alive and leak metrics. Signed-off-by: tonyxuqqi <tonyxuqi@outlook.com>
…ikv#15440) close tikv#15438 fix unwrap panic of region_compact_redundant_rows_percent Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
close tikv#15430 Use concurrent hashmap to avoid router cache occupying too much memory Signed-off-by: Connor1996 <zbk602423539@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#13311 Fix the possible meta inconsistency issue. Signed-off-by: cfzjywxk <lsswxrxr@163.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#14864 This is the first PR to fix OOM caused by Resolver tracking large txns. Resolver checks memory quota before tracking a lock, and returns false if it exceeds memory quota. Signed-off-by: Neil Shen <overvenus@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
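The quota check before tracking a lock can be sketched as follows. This is an illustration under stated assumptions: the `MemoryQuota` and `Resolver` types here are simplified stand-ins written for this example, not TiKV's actual types, and the per-lock size estimate (key length plus a `u64` timestamp) is an assumption.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Hypothetical byte-based quota; a stand-in, not TiKV's real type.
pub struct MemoryQuota {
    capacity: usize,
    in_use: usize,
}

impl MemoryQuota {
    pub fn new(capacity: usize) -> Self {
        MemoryQuota { capacity, in_use: 0 }
    }

    /// Reserve `bytes`; returns false (and reserves nothing) if over capacity.
    pub fn alloc(&mut self, bytes: usize) -> bool {
        match self.in_use.checked_add(bytes) {
            Some(total) if total <= self.capacity => {
                self.in_use = total;
                true
            }
            _ => false,
        }
    }

    pub fn free(&mut self, bytes: usize) {
        self.in_use = self.in_use.saturating_sub(bytes);
    }
}

/// Simplified resolver that maps a locked key to its start timestamp.
pub struct Resolver {
    quota: MemoryQuota,
    locks: HashMap<Vec<u8>, u64>,
}

impl Resolver {
    pub fn new(quota_bytes: usize) -> Self {
        Resolver { quota: MemoryQuota::new(quota_bytes), locks: HashMap::new() }
    }

    // Assumed cost of one tracked lock: the key bytes plus one timestamp.
    fn lock_size(key: &[u8]) -> usize {
        key.len() + size_of::<u64>()
    }

    /// Track a lock; returns false if doing so would exceed the quota.
    pub fn track_lock(&mut self, start_ts: u64, key: Vec<u8>) -> bool {
        if let Some(ts) = self.locks.get_mut(&key) {
            // Already tracked and already charged: only update the timestamp.
            *ts = start_ts;
            return true;
        }
        if !self.quota.alloc(Self::lock_size(&key)) {
            return false;
        }
        self.locks.insert(key, start_ts);
        true
    }

    /// Stop tracking a lock and return its bytes to the quota.
    pub fn untrack_lock(&mut self, key: &[u8]) {
        if self.locks.remove(key).is_some() {
            self.quota.free(Self::lock_size(key));
        }
    }
}
```

Returning `false` instead of panicking or allocating leaves the decision to the caller, which matches the commit message's description of the behavior.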
…tikv#15425) close tikv#15424 Signed-off-by: glorv <glorvs@163.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…5421) ref tikv#15409 Signed-off-by: bufferflies <1045931706@qq.com> Co-authored-by: Spade A <71589810+SpadeA-Tang@users.noreply.github.com>
close tikv#14864 Fix resolved ts OOM caused by Resolver tracking large txns. `ObserveRegion` is deregistered if it exceeds memory quota. It may cause higher CPU usage because of scanning locks, but it's better than OOM. Signed-off-by: Neil Shen <overvenus@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…15453) ref tikv#12842 support column family based write buffer manager Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
ref tikv/pd#6556, close tikv#15428 pd_client: add store-level backoff for the reconnect retries Signed-off-by: nolouch <nolouch@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
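The commit message does not describe the mechanism, so the following is only a generic sketch of what a per-store reconnect backoff can look like. The `StoreBackoff` type, its methods, and the exponential doubling policy are assumptions made for illustration and are not taken from the PR.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-store exponential backoff for reconnect attempts.
pub struct StoreBackoff {
    base: Duration,
    max: Duration,
    // store_id -> (current delay, earliest time of the next attempt)
    state: HashMap<u64, (Duration, Instant)>,
}

impl StoreBackoff {
    pub fn new(base: Duration, max: Duration) -> Self {
        StoreBackoff { base, max, state: HashMap::new() }
    }

    /// True if this store is allowed to attempt a reconnect at `now`.
    pub fn should_retry(&self, store_id: u64, now: Instant) -> bool {
        self.state.get(&store_id).map_or(true, |(_, next)| now >= *next)
    }

    /// Record a failed attempt: double the delay (capped at `max`) and
    /// return the delay that now applies to this store.
    pub fn on_failure(&mut self, store_id: u64, now: Instant) -> Duration {
        let delay = match self.state.get(&store_id) {
            Some((prev, _)) => (*prev * 2).min(self.max),
            None => self.base,
        };
        self.state.insert(store_id, (delay, now + delay));
        delay
    }

    /// A successful reconnect clears the backoff state for the store.
    pub fn on_success(&mut self, store_id: u64) {
        self.state.remove(&store_id);
    }
}
```

Keeping the state keyed by store means one misbehaving store backs off without delaying reconnects from the others.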
close tikv#15405 Signed-off-by: bufferflies <1045931706@qq.com> Co-authored-by: Spade A <71589810+SpadeA-Tang@users.noreply.github.com>
ref tikv#12842 - Initialize `persisted_apply_index` on startup. Signed-off-by: tabokie <xy.tao@outlook.com>
…for mvcc scan (tikv#15455) ref tikv#14654 Consider the mismatch between region range and tablet range for mvcc scan
close tikv#12304 Add logs for assertion failure Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#15403 1. Support updating the split config dynamically. Previously, the `optimize_for` function made the config immutable. Signed-off-by: bufferflies <1045931706@qq.com>
ref tikv#15409 reuse failpoint tests in async_io_test Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
close tikv#15490 Avoid duplicated `Instant::now` calls Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
close tikv#15458 Resolver owns a hash map that tracks lock and unlock events in order to calculate resolved ts. However, it does not shrink the map even after all locks are removed, which may result in OOM if there are transactions that modify many rows across many regions. The total memory usage is proportional to the number of modified rows. Signed-off-by: Neil Shen <overvenus@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
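The underlying issue is that a Rust `HashMap` keeps its allocated capacity after entries are removed. A minimal sketch of shrinking on removal follows; the `LockTable` type, the `MIN_SHRINK_CAPACITY` threshold, and the "less than a quarter full" heuristic are assumptions for illustration, not the policy used in the PR.

```rust
use std::collections::HashMap;

// Assumed threshold: do not bother shrinking small maps.
const MIN_SHRINK_CAPACITY: usize = 1024;

/// Hypothetical lock table that releases memory as locks are removed.
pub struct LockTable {
    locks: HashMap<Vec<u8>, u64>,
}

impl LockTable {
    pub fn new() -> Self {
        LockTable { locks: HashMap::new() }
    }

    pub fn track(&mut self, key: Vec<u8>, start_ts: u64) {
        self.locks.insert(key, start_ts);
    }

    pub fn untrack(&mut self, key: &[u8]) {
        self.locks.remove(key);
        let cap = self.locks.capacity();
        // Removing entries never frees memory by itself, so shrink
        // explicitly once the map is mostly empty.
        if cap > MIN_SHRINK_CAPACITY && self.locks.len() < cap / 4 {
            let target = self.locks.len() * 2;
            self.locks.shrink_to(target);
        }
    }

    pub fn capacity(&self) -> usize {
        self.locks.capacity()
    }

    pub fn is_empty(&self) -> bool {
        self.locks.is_empty()
    }
}
```

Shrinking to twice the current length rather than to the exact length leaves headroom, so a map hovering around the threshold does not reallocate on every insert and remove.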
close tikv#15468 Return `RegionNotFound` when the peer cannot be found in the current store. Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#8235 Signed-off-by: Neil Shen <overvenus@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…15504) close tikv#15503 fix panic of dynamic changing write-buffer-limit Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
close tikv#15487 Signed-off-by: qupeng <qupeng@pingcap.com>
ref tikv#15409 reuse failpoint tests in test_early_apply Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
…15456) close tikv#15457 There are three triggers that split regions: 1. Load split, based on size, keys, load, etc. In this case, the new region should contain data after the split. 2. TiDB splitting tables or partitioned tables, such as `create table test.t1(id int,b int) shard_row_id_bits=4 partition by hash(id) partitions 2000`. In this case, the new region shouldn't contain any data after the split. Signed-off-by: bufferflies <1045931706@qq.com>
ref tikv#15461 limit the flush times during server stop Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
ref tikv#14864 * Fix resolved ts OOM caused by adding large txns locks to `ResolverStatus`. * Add initial scan backoff duration metrics. Signed-off-by: Neil Shen <overvenus@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com> Co-authored-by: Connor <zbk602423539@gmail.com>
…ocksdb compaction (tikv#17431) (tikv#17435) close tikv#17269 compaction-filter: consider mvcc.delete as redundant key to trigger Rocksdb compaction Signed-off-by: Shirly <AndreMouche@126.com> Co-authored-by: Shirly <AndreMouche@126.com>
close tikv#17471 Add a script to renew certificates and fix the flaky test `test_security_status_service_without_cn` . Signed-off-by: Neil Shen <overvenus@gmail.com> Co-authored-by: Neil Shen <overvenus@gmail.com>
…ethod (tikv#17357) (tikv#17481) close tikv#17368 * Add a log to indicate that the memory quota is freed when the `Drain` is dropped. * Free the memory quota of truncated scanned events. * Refactor the `finish_scan_lock` method to remove the else branch. * Row size calculation should also consider the old value. * Remove some outdated TODOs. Signed-off-by: 3AceShowHand <jinl1037@hotmail.com> Co-authored-by: 3AceShowHand <jinl1037@hotmail.com> Co-authored-by: Ling Jin <7138436+3AceShowHand@users.noreply.github.com>
…sts (tikv#17500) (tikv#17517) close tikv#17394 lock_manager: Skip updating lock wait info for non-fair-locking requests This is a simpler and lower-risk fix for the OOM issue tikv#17394 on released branches, as an alternative to tikv#17451. With this change, for acquire_pessimistic_lock requests that do not enable fair locking, update_wait_for becomes a no-op. So if fair locking is globally disabled, the behavior is equivalent to versions before 7.0. Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com> Co-authored-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
close tikv#17356 Make the diskfull check mechanism compatible with the configuration `raft-engine.spill-dir`. Signed-off-by: lucasliang <nkcs_lykx@hotmail.com> Co-authored-by: glorv <glorvs@163.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17272 TiKV no longer names bloom filter blocks with suffix like "FullBloom" or "Ribbon". Signed-off-by: Yang Zhang <yang.zhang@pingcap.com> Co-authored-by: Yang Zhang <yang.zhang@pingcap.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…tikv#17566) close tikv#17469 The commit fixes a panic in TiKV that occurs in a rare scenario that involves region splits and immediate removal of the new peer. When a region splits, the new peer on a follower can be created in two ways: (1) By receiving a Raft message from the new region (`fn maybe_create_peer`) (2) By applying the split operation locally (`fn on_ready_split_region`). Depending on timing, a new peer might first be created by a Raft message and then again when the split is applied. This is a known situation. When it happens, the second peer replaces the first, and the first peer is discarded. However, the discarded peer may continue processing existing messages, leading to unexpected states. The panic can be reproduced with the following sequence of events: 1. The first peer is created by a Raft message and is waiting for a Raft snapshot. 2. The second peer (of the same region) is created by `on_ready_split_region` when the split operation is applied, replacing the first peer and closing its mailbox (as expected). 3. The second peer is immediately removed. This removes the region metadata. 4. The first peer continues processing the Raft snapshot message, expecting the metadata of the region to exist, causing the panic. Signed-off-by: Bisheng Huang <hbisheng@gmail.com> Co-authored-by: Bisheng Huang <hbisheng@gmail.com>
…#17458) (tikv#17565) close tikv#17304 Fix unexpected flow control after unsafe destroy range Flow controller detects pending compaction bytes jump before and after unsafe destroy range. If there is a jump, the controller enters a state that would ignore the high pending compaction bytes until it falls back to normal. Previously, the controller may not enter the state if the pending compaction bytes is lower than the threshold while long term average pending bytes is still high. Then it would trigger flow control mistakenly. Signed-off-by: Connor1996 <zbk602423539@gmail.com> Co-authored-by: Connor1996 <zbk602423539@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…#17326) (tikv#17567) close tikv#16229 Reduce the memory usage of peers' message channel Signed-off-by: lucasliang <nkcs_lykx@hotmail.com> Co-authored-by: lucasliang <nkcs_lykx@hotmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…he log task (tikv#17317) (tikv#17570) close tikv#17316 Clean `pause-guard-gc-safepoint` when unregistering the log task Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com> Co-authored-by: Jianjun Liao <36503113+Leavrth@users.noreply.github.com> Co-authored-by: Jianjun Liao <jianjun.liao@outlook.com>
…#17591) close tikv#17579 Fix inaccurate storage async write duration metric, which mistakenly included task wait time in the scheduler worker pool. This occurs because the metric is observed in a future running on the scheduler worker pool, leading to inflated values, especially under load. This can be misleading and cause confusion during troubleshooting. This commit corrects the metric by observing it in the async write callback. Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Signed-off-by: lucasliang <nkcs_lykx@hotmail.com> Co-authored-by: Neil Shen <overvenus@gmail.com> Co-authored-by: lucasliang <nkcs_lykx@hotmail.com>
close tikv#17224 Add a disk usage check when executing the `download` and `apply` RPCs from BR. When the disk usage is not `Normal`, the request is rejected. Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Signed-off-by: hillium <yujuncen@pingcap.com> Co-authored-by: ris <79858083+RidRisR@users.noreply.github.com> Co-authored-by: hillium <yujuncen@pingcap.com>
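A guard of this shape can be sketched as below. Only the `Normal` state is named in the commit message; the other enum variants, the function name `check_disk_for_import`, and the error text are assumptions made for this example.

```rust
/// Disk usage states. Only `Normal` is named in the commit message;
/// the other two variants are assumed here for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DiskUsage {
    Normal,
    AlmostFull,
    AlreadyFull,
}

/// Hypothetical guard run at the start of a `download` or `apply` handler.
/// Any state other than `Normal` rejects the request before work starts.
pub fn check_disk_for_import(rpc: &str, usage: DiskUsage) -> Result<(), String> {
    match usage {
        DiskUsage::Normal => Ok(()),
        other => Err(format!("{} rejected: disk usage is {:?}", rpc, other)),
    }
}
```

Rejecting up front means a restore does not write more data onto a disk that is already close to full.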
…) (tikv#17598) close tikv#17589 Add some metrics for resource control priority resource limiter. Also adjust the build parameters of QuotaLimiter in resource control module to avoid triggering wait too frequently. Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Signed-off-by: glorv <glorvs@163.com> Co-authored-by: glorv <glorvs@163.com>
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
…kv#17656) close tikv#16601, close tikv#17620 cdc: filter events with the observed range before loading old values Signed-off-by: qupeng <qupeng@pingcap.com>
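The idea of filtering by range before doing the expensive old-value lookup can be sketched as follows. The `ObservedRange` type, the convention that an empty end key means unbounded, and the `(key, value)` event shape are assumptions for this example rather than the CDC component's real types.

```rust
/// Hypothetical half-open key range `[start, end)`.
/// An empty `end` is assumed to mean "unbounded".
pub struct ObservedRange {
    pub start: Vec<u8>,
    pub end: Vec<u8>,
}

impl ObservedRange {
    pub fn contains(&self, key: &[u8]) -> bool {
        key >= self.start.as_slice()
            && (self.end.is_empty() || key < self.end.as_slice())
    }
}

/// Drop events outside the observed range first, so that old values are
/// only loaded for keys that will actually be sent downstream.
pub fn filter_observed(
    range: &ObservedRange,
    events: Vec<(Vec<u8>, Vec<u8>)>,
) -> Vec<(Vec<u8>, Vec<u8>)> {
    events
        .into_iter()
        .filter(|(key, _)| range.contains(key))
        .collect()
}
```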
close tikv#17808 Use rust-rocksdb tikv-7.5 for 7.5 release Signed-off-by: Yang Zhang <yang.zhang@pingcap.com>
close tikv#17689 Fixing yanked futures-util 0.3.15 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Signed-off-by: glorv <glorvs@163.com> Co-authored-by: Yang Zhang <yang.zhang@pingcap.com> Co-authored-by: glorv <glorvs@163.com>
close tikv#17852 expr: fix panic when using radians and degree Signed-off-by: gengliqi <gengliqiii@gmail.com> Co-authored-by: gengliqi <gengliqiii@gmail.com>
…17841) (tikv#17848) close tikv#17840 Skip handling the remaining raft messages after the peer fsm is stopped. This avoids a potential panic if a raft message needs to read the raft log from raft engine. Signed-off-by: glorv <glorvs@163.com> Co-authored-by: glorv <glorvs@163.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
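The pattern of dropping the rest of a message batch once the state machine has stopped can be sketched like this. `PeerMsg`, `PeerFsm`, and `handle_msgs` here are simplified, hypothetical stand-ins and not the raftstore types.

```rust
/// Simplified message type for illustration only.
#[derive(Debug, PartialEq)]
pub enum PeerMsg {
    Raft(u64), // carries a log index the handler would need to read
    Destroy,
}

/// Hypothetical peer state machine.
pub struct PeerFsm {
    stopped: bool,
    handled: Vec<u64>,
}

impl PeerFsm {
    pub fn new() -> Self {
        PeerFsm { stopped: false, handled: Vec::new() }
    }

    /// Handle a batch of messages; returns how many were skipped
    /// because the fsm had already stopped.
    pub fn handle_msgs(&mut self, msgs: Vec<PeerMsg>) -> usize {
        let mut skipped = 0;
        for msg in msgs {
            if self.stopped {
                // After stop, the peer's state may be gone, so later
                // messages in the same batch must not touch it.
                skipped += 1;
                continue;
            }
            match msg {
                PeerMsg::Raft(index) => self.handled.push(index),
                PeerMsg::Destroy => self.stopped = true,
            }
        }
        skipped
    }

    pub fn handled(&self) -> &[u64] {
        &self.handled
    }
}
```

The check sits inside the loop rather than before it because the stop can be triggered by a message in the middle of the batch.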
…individual disk performance factors.(tikv#17801) (tikv#17901) close tikv#17884 This PR introduces an additional, separate inspector to detect I/O hang issues on the kvdb disk when kvdb is deployed on a separate mount path. Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
…v#17885) close tikv#17876, fix tikv#17876, close tikv#17877 cdc: skip loading old values for un-observed ranges Signed-off-by: qupeng <qupeng@pingcap.com> Co-authored-by: qupeng <qupeng@pingcap.com>
…ikv#17924) close tikv#17701 add write batch limit for raft command batch Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au> Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com> Co-authored-by: SpadeA-Tang <u6748471@anu.edu.au> Co-authored-by: SpadeA-Tang <tangchenjie1210@gmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
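The commit message gives no detail on how the limit is applied, so the following is only a generic sketch of size-limited batching. The `CommandBatcher` type, the byte-based limit, and the flush-before-overflow policy are assumptions for illustration.

```rust
/// Hypothetical batcher that caps the total bytes in one batch.
pub struct CommandBatcher {
    limit_bytes: usize,
    pending: Vec<Vec<u8>>,
    pending_bytes: usize,
}

impl CommandBatcher {
    pub fn new(limit_bytes: usize) -> Self {
        CommandBatcher { limit_bytes, pending: Vec::new(), pending_bytes: 0 }
    }

    /// Add a command. If it would push the current batch over the limit,
    /// the current batch is returned first and the command starts a new one.
    pub fn push(&mut self, cmd: Vec<u8>) -> Option<Vec<Vec<u8>>> {
        let over = self.pending_bytes + cmd.len() > self.limit_bytes;
        let flushed = if !self.pending.is_empty() && over {
            self.pending_bytes = 0;
            Some(std::mem::take(&mut self.pending))
        } else {
            None
        };
        self.pending_bytes += cmd.len();
        self.pending.push(cmd);
        flushed
    }

    /// Take whatever is still pending as the final batch.
    pub fn finish(&mut self) -> Vec<Vec<u8>> {
        self.pending_bytes = 0;
        std::mem::take(&mut self.pending)
    }
}
```

A single command larger than the limit still goes through as a batch of one in this sketch, since refusing it would stall progress.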
…ikv#17765) (tikv#17921) close tikv#17383, close tikv#17760 Address the corner case where a read thread panics because it reads with a stale index from the `Memtable` in raft-engine after a background thread has already purged the stale logs and updated the `Memtable`. Signed-off-by: lucasliang <nkcs_lykx@hotmail.com> Co-authored-by: lucasliang <nkcs_lykx@hotmail.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED