DNM temp #414

Closed

Conversation

CalvinNeo
Member

What is changed and how it works?

Issue Number: Close #xxx

What's Changed:


Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Release note


SpadeA-Tang and others added 30 commits October 31, 2024 02:12
ref tikv#16141

rearrange parts of metrics panel

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#16141

Add test to simulate insertion of 200MB (logical size) of TiDB unique
index and secondary index records and measure SkiplistEngine memory
usage.
Test results:
* For secondary index
  * The key-value encoding amplification is approximately 3.10
  * SkiplistEngine amplification is approximately 7.66
* For unique index
  * The key-value encoding amplification is approximately 3.38
  * SkiplistEngine amplification is approximately 8.19
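To make these factors concrete, here is a minimal arithmetic sketch. It assumes each amplification factor is the ratio of the measured footprint to the logical data size; the commit message does not spell out the denominator, so treat that as an assumption. `amplified_mb` is a hypothetical helper, not part of the test itself.

```rust
/// Estimated footprint in MB for `logical_mb` of logical data, assuming
/// `amplification` means footprint / logical size (an assumption).
fn amplified_mb(logical_mb: f64, amplification: f64) -> f64 {
    logical_mb * amplification
}
```

Under that assumption, 200MB of secondary index records would occupy roughly 620MB once encoded as key-value pairs and roughly 1532MB in SkiplistEngine; for the unique index the figures would be roughly 676MB and 1638MB.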

Signed-off-by: Neil Shen <overvenus@gmail.com>
…v#17629)

ref tikv#17459

Track the number of locks of large txns in resolver

Signed-off-by: ekexium <eke@fastmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
)

close tikv#12587, fix tikv#16001

To fix the issue where slow region destruction can block snapshot
generation, this PR moves the snapshot generation logic out of the
region worker. A new worker is added to handle snap gen requests but it 
reuses the existing snap generator pool, so the change doesn't 
introduce any new threads.   

This is a simpler approach than the earlier attempt because it doesn't 
deal with the interactions between snapshot apply and destroy. Since 
snapshot generation has always been an independent task handled by its 
own thread pool, this change does not add significant complexity.

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990

Add fsm schedule related metrics

Signed-off-by: Connor <zbk602423539@gmail.com>
Signed-off-by: Connor1996 <zbk602423539@gmail.com>

Co-authored-by: Bisheng Huang <hbisheng@gmail.com>
close tikv#12371

* switch kms to aws_sdk lib
* switch s3 to aws_sdk lib

Signed-off-by: Andrey Koshchiy <an.koshchiy@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…kv#17747)

ref tikv#16141

use stop-load-threshold for loading new regions

Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17711

Deprecate write_global_seq, since it is by default false.

Signed-off-by: Yang Zhang <yang.zhang@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#16141

Signed-off-by: Neil Shen <overvenus@gmail.com>
…17730)

close tikv#17728

Use min_lock_ts-1 as the candidate of resolved-ts, to ensure resolved_ts < lock.min_commit_ts( <= commit_ts).
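A minimal sketch of the rule, not the resolver's actual code: `resolved_ts_candidate` and `latest_ts` are hypothetical names, and capping the candidate by a separately obtained latest timestamp is an assumption made for illustration.

```rust
/// Pick a resolved-ts candidate that stays strictly below the smallest
/// tracked lock timestamp (hypothetical helper, for illustration only).
fn resolved_ts_candidate(min_lock_ts: Option<u64>, latest_ts: u64) -> u64 {
    match min_lock_ts {
        // Using min_lock_ts - 1 keeps the candidate strictly below the lock.
        Some(lock_ts) => latest_ts.min(lock_ts.saturating_sub(1)),
        // No tracked locks: the latest timestamp is used as-is.
        None => latest_ts,
    }
}
```

With `min_lock_ts = 100` and `latest_ts = 200`, the candidate is 99, so it can never equal the timestamp of the lock it is derived from.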

Signed-off-by: ekexium <eke@fastmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: you06 <you1474600@gmail.com>
ref tikv#16141, close tikv#17762

Generate the default values of the in_memory_engine configs `evict-threshold`
and `stop-load-threshold` from `capacity`.
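The idea can be sketched as follows. The commit message does not state the ratios, so the percentages are passed in as parameters; `ImeThresholds`, `resolve_thresholds`, and both percentage arguments are hypothetical names used only for this illustration.

```rust
/// Resolved thresholds in bytes (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct ImeThresholds {
    evict_threshold: u64,
    stop_load_threshold: u64,
}

/// Compute `pct` percent of `capacity` without overflowing u64.
fn percent_of(capacity: u64, pct: u64) -> u64 {
    (capacity as u128 * pct as u128 / 100) as u64
}

/// Use explicitly configured values when present; otherwise derive the
/// default from `capacity`. The percentages are assumptions, not TiKV's.
fn resolve_thresholds(
    capacity: u64,
    evict: Option<u64>,
    stop_load: Option<u64>,
    evict_pct: u64,
    stop_load_pct: u64,
) -> ImeThresholds {
    ImeThresholds {
        evict_threshold: evict.unwrap_or_else(|| percent_of(capacity, evict_pct)),
        stop_load_threshold: stop_load.unwrap_or_else(|| percent_of(capacity, stop_load_pct)),
    }
}
```

Deriving the defaults this way means changing `capacity` alone keeps the two thresholds in proportion, while an explicitly configured value still takes precedence.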

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…v#17771)

close tikv#17767

IME observes all peer destroy events so that it can evict regions promptly.
When a new peer is added, the old, uninitialized peer is destroyed, and IME
must not panic in this situation.

Signed-off-by: Neil Shen <overvenus@gmail.com>
close tikv#17572

Signed-off-by: RidRisR <79858083+RidRisR@users.noreply.github.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990

Raft waterfall metrics track the duration of individual requests, all 
beginning from the same starting point (when the async write request is 
scheduled) but ending at various stages of the write process. Previous 
descriptions did not make that clear and may confuse the readers. This 
commit improves the grafana descriptions for clarity.

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
)

close tikv#17696

* Count CDC tasks against the memory quota to prevent TiKV OOM caused by too many pending tasks
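As a rough illustration of quota accounting for pending tasks, the sketch below reserves bytes before a task is queued and releases them when it finishes. `SimpleQuota` is a hypothetical stand-in written for this example; it is not TiKV's own quota type.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// A tiny byte quota (hypothetical; for illustration only).
struct SimpleQuota {
    capacity: usize,
    in_use: AtomicUsize,
}

impl SimpleQuota {
    fn new(capacity: usize) -> Self {
        SimpleQuota { capacity, in_use: AtomicUsize::new(0) }
    }

    /// Try to reserve `bytes`; returns false when the quota would be exceeded.
    fn alloc(&self, bytes: usize) -> bool {
        let mut cur = self.in_use.load(Ordering::Relaxed);
        loop {
            let new = match cur.checked_add(bytes) {
                Some(n) if n <= self.capacity => n,
                _ => return false,
            };
            match self.in_use.compare_exchange_weak(cur, new, Ordering::AcqRel, Ordering::Relaxed) {
                Ok(_) => return true,
                Err(actual) => cur = actual,
            }
        }
    }

    /// Release bytes previously reserved with `alloc`.
    fn free(&self, bytes: usize) {
        self.in_use.fetch_sub(bytes, Ordering::AcqRel);
    }
}
```

A caller would reject or delay a task when `alloc` returns false, which bounds the memory held by pending tasks instead of letting the queue grow without limit.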

Signed-off-by: Neil Shen <overvenus@gmail.com>
Signed-off-by: 3AceShowHand <jinl1037@hotmail.com>

Co-authored-by: Neil Shen <overvenus@gmail.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…tikv#17643)

close tikv#17363

Allow leader transfer if conf change applied on transferee.

Signed-off-by: hhwyt <hhwyt1@gmail.com>

Co-authored-by: Bisheng Huang <hbisheng@gmail.com>
…ikv#17765)

close tikv#17383, close tikv#17760

Address the corner case where a read thread panics because it reads with a stale index from the `Memtable` in raft-engine, after a background thread has already purged the stale logs and updated the `Memtable`.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17788

Avoid calling `on_gc_finished` when a new GC task is not run because another task is still unfinished.

Signed-off-by: glorv <glorvs@163.com>
…ikv#17515)

ref tikv#16141

This commit adjusts the following in-memory-engine defaults:

* `capacity`: Now IME uses 10% of the block cache and takes an equal
  amount of memory from the system. This is based on tests showing that
  the IME rarely fills its full capacity.
* `mvcc_amplification_threshold`: Changed from 100 to 10, which benefits
  common workloads such as TPC-C (50 warehouses), saving approximately 20%
  of unified read pool CPU usage.

Also, it addresses two security issues:

* Remove ignore of RUSTSEC-2024-0006, as vulnerable shlex 0.1.1 is
  removed by tikv#13814
* Upgrade hashbrown from yanked 0.15.0 to 0.15.1

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…d twice (tikv#17798)

close tikv#17797

If the last call to `prepare_for_region` returns `NotInCache`,
`clear_written_regions` can be called twice, in both `write_impl` and
`clear`, which causes a panic. This PR changes `clear_written_regions`
to consume `self.written_regions` to avoid this kind of duplicate clear.
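The consume-on-clear pattern can be sketched with `std::mem::take`. The `WriteBatch` struct below is a simplified, hypothetical stand-in; only the names `clear_written_regions` and `written_regions` come from the commit message.

```rust
use std::mem;

/// Simplified stand-in for the write batch (hypothetical fields).
struct WriteBatch {
    written_regions: Vec<u64>,
}

impl WriteBatch {
    /// Take ownership of the written regions, leaving an empty list behind,
    /// so a second call finds nothing left to clear.
    fn clear_written_regions(&mut self) -> Vec<u64> {
        mem::take(&mut self.written_regions)
    }
}
```

Because the first call empties the field, a second call from another code path sees an empty list and becomes a harmless no-op.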

Signed-off-by: glorv <glorvs@163.com>
ref tikv#16141

handle error when getting regions info

Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
close tikv#17701

add write batch limit for raft command batch

Signed-off-by: SpadeA-Tang <u6748471@anu.edu.au>
Signed-off-by: SpadeA-Tang <tangchenjie1210@gmail.com>
close tikv#17631

Added a new crate named `compact-log-backup`. It can merge log files generated by log backup and convert them into SSTs.

Signed-off-by: hillium <yujuncen@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17836

This commit adds metrics to track Raft snapshots that are dropped 
during sending or receiving due to concurrency limits. These metrics 
help identify bottlenecks during scaling.

Signed-off-by: Bisheng Huang <hbisheng@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…kv#17625)

close tikv#12410

This PR makes the `campaign` of newly split regions trigger in time, once the leadership of the parent region is stable after `on_role_changed`.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…17841)

close tikv#17840

Skip handling the remaining raft messages after the peer fsm is stopped. This avoids a potential panic if a raft message needs to read the raft log from raft engine.
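A minimal sketch of the guard, using hypothetical `PeerMsg` and `PeerFsm` types rather than the real raftstore ones: once the fsm is marked stopped, every later message in the batch is skipped instead of handled.

```rust
/// Hypothetical message type for illustration.
enum PeerMsg {
    Raft(u64),
    Destroy,
}

/// Hypothetical peer state machine for illustration.
struct PeerFsm {
    stopped: bool,
    handled: Vec<u64>,
}

/// Handle a batch of messages; returns how many were skipped because the
/// fsm had already stopped.
fn handle_msgs(fsm: &mut PeerFsm, msgs: Vec<PeerMsg>) -> usize {
    let mut skipped = 0;
    for msg in msgs {
        if fsm.stopped {
            // Do not touch raft state (for example, the raft log) after stop.
            skipped += 1;
            continue;
        }
        match msg {
            PeerMsg::Raft(id) => fsm.handled.push(id),
            PeerMsg::Destroy => fsm.stopped = true,
        }
    }
    skipped
}
```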

Signed-off-by: glorv <glorvs@163.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17852

expr: fix panic when using radians and degree

Signed-off-by: gengliqi <gengliqiii@gmail.com>
)

close tikv#17830

Signed-off-by: joccau <zak.zhao@pingcap.cn>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…ax_batch_size. (tikv#17821)

close tikv#17101

Increase the default raft_client_queue_size and raft_msg_max_batch_size.

This PR addresses an issue where too many Raft messages can delay 
sending, increasing the commit log duration and the heartbeat latency. 
The delayed heartbeats can lead to leader drops, especially during PD 
restarts that trigger a surge of hibernated regions. See tikv#17101 for
more details about this scenario.

We increased the raft_client_queue_size to prevent Raft messages from
being dropped when the RaftClient queue becomes full under heavy message
load. Additionally, we increased the raft_msg_max_batch_size to improve
the efficiency of Raft message sending.
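To illustrate why the queue size matters, the sketch below uses the standard library's bounded channel as a stand-in for the RaftClient queue (an assumption; the real queue is a different type). When the buffer is full, the message is dropped and counted rather than blocking the sender. `enqueue_or_drop` is a hypothetical helper.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Try to queue `msg`; on a full queue, drop it and bump `dropped`.
/// Returns true only if the message was queued.
fn enqueue_or_drop<T>(tx: &SyncSender<T>, msg: T, dropped: &mut u64) -> bool {
    match tx.try_send(msg) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) => {
            *dropped += 1;
            false
        }
        // The receiver is gone; nothing can be delivered.
        Err(TrySendError::Disconnected(_)) => false,
    }
}
```

With a capacity of 2, the third message sent before the receiver drains anything is dropped; a larger capacity absorbs longer bursts at the cost of more buffered memory.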

Signed-off-by: hhwyt <hhwyt1@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…sts (tikv#17500) (tikv#17870)

close tikv#17394

lock_manager: Skip updating lock wait info for non-fair-locking requests

This is a simpler and lower-risk fix of the OOM issue tikv#17394 for release branches, as an alternative solution to tikv#17451.
With this change, for acquire_pessimistic_lock requests that do not enable fair locking, update_wait_for becomes a no-op. As a result, if fair locking is globally disabled, the behavior is equivalent to versions before 7.0.

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
overvenus and others added 26 commits December 25, 2024 06:01
close tikv#18046

Avoid loading a region into IME when it is uninitialized, to prevent a panic
when encoding the region end key. This can happen because
`MsgPreLoadRegionRequest` is sent before the leader issues a transfer-leader
request.

Signed-off-by: Neil Shen <overvenus@gmail.com>
ref tikv#15990

build: bump tikv pkg version

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
…by (tikv#18061)

close tikv#18060

Use a regular expression in the panel seriesOverrides so that it is compatible with the optional "additional_groupby" alias.

Signed-off-by: glorv <glorvs@163.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…f invalid max-ts update (tikv#18057)

close tikv#18055

concurrency_manager: double check via PD TSO before reporting error of invalid max-ts update

Signed-off-by: ekexium <eke@fastmail.com>
close tikv#17618

Fix a bug that wrongly truncates the string when the charset is gbk/gb18030

Signed-off-by: cbcwestwolf <1004626265@qq.com>
…tered (tikv#18066)

close tikv#18065

Print more information in logs when a default not found error is encountered.

Signed-off-by: cfzjywxk <cfzjywxk@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990

Export the number of currently running background jobs to help diagnose
potential compaction bottlenecks.

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: Bisheng Huang <hbisheng@gmail.com>
…ction (tikv#18085)

close tikv#18084

`min_input_ts` and `max_input_ts` will be present in a log file compaction.

Signed-off-by: hillium <yu745514916@live.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#15990

Fixed a typo: `Migartion` -> `Migration`.

Signed-off-by: hillium <yu745514916@live.com>
ref tikv#18055

When validating max-ts updates, do not report error or panic unless confirmed by PD TSO.
This reduces both false positive and false negative cases.
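The double-check can be sketched as below. This is an illustration under assumptions, not the concurrency manager's code: `MaxTsCheck`, `check_max_ts_update`, `local_limit`, and the `fetch_pd_tso` closure are hypothetical names, and the sketch only shows how a recheck avoids reporting an error that PD does not confirm.

```rust
/// Outcome of validating a max-ts update (hypothetical type).
#[derive(Debug, PartialEq)]
enum MaxTsCheck {
    Valid,
    /// PD could not be consulted, so no error is reported.
    Unconfirmed,
    /// PD TSO confirms the timestamp is ahead of what PD has allocated.
    Invalid,
}

/// Validate `new_ts` against a locally cached limit first, and ask PD
/// only when the local check looks suspicious.
fn check_max_ts_update<F>(new_ts: u64, local_limit: u64, fetch_pd_tso: F) -> MaxTsCheck
where
    F: FnOnce() -> Option<u64>,
{
    if new_ts <= local_limit {
        return MaxTsCheck::Valid;
    }
    match fetch_pd_tso() {
        // The cached limit was merely stale; the update is fine.
        Some(pd_tso) if new_ts <= pd_tso => MaxTsCheck::Valid,
        Some(_) => MaxTsCheck::Invalid,
        None => MaxTsCheck::Unconfirmed,
    }
}
```

Only the `Invalid` outcome would justify reporting an error or panicking; a stale local limit or an unreachable PD does not.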

Signed-off-by: ekexium <eke@fastmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17894

build: update Dockerfile for build and test

Signed-off-by: wuhuizuo <wuhuizuo@126.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
close tikv#18026

Added a new RPC endpoint `flush_now` for the service `LogBackup`.

Signed-off-by: 山岚 <36239017+YuJuncen@users.noreply.github.com>
Signed-off-by: hillium <yu745514916@live.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#18105, ref pingcap/tidb#58238

Adapt the ignore rules so that the download can skip keys larger than a specified timestamp

Signed-off-by: 3pointer <luancheng@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…tial risk to affect data correctness (tikv#18092)

close tikv#18091

gc_worker: Do not run delete_files_in_range on the lock CF, which risks affecting data correctness

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#17989

If tso fetch fails, skip updating last_pd_tso.
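In sketch form, with a hypothetical `update_last_pd_tso` helper and a `String` error type chosen only for illustration: the cached value changes only when the fetch succeeds.

```rust
/// Update the cached PD TSO only on a successful fetch.
/// Returns true if the cache was updated.
fn update_last_pd_tso(last_pd_tso: &mut Option<u64>, fetched: Result<u64, String>) -> bool {
    match fetched {
        Ok(ts) => {
            *last_pd_tso = Some(ts);
            true
        }
        // Keep the previous value untouched on failure.
        Err(_) => false,
    }
}
```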

Signed-off-by: ekexium <eke@fastmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#16818

Fix duplicated keys returned when scanning locks.

Signed-off-by: cfzjywxk <cfzjywxk@gmail.com>
…ikv#18095)

close tikv#18117

Introduce a new field `use_one_pc` to the `Lock` struct to indicate whether the txn uses 1pc, and use it to prevent locks from being skipped when reading with max-ts.

Signed-off-by: zyguan <zhongyangguan@gmail.com>
…8099)

ref tikv#15990

* Increase task wait metrics upper limit from 2.5s to 42s to capture
  long task wait records that are crucial for investigating high
  latency issues
* Add description for end-point-memory-quota configuration

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#14474

Fix the logic that checks whether a request source is external or internal

Signed-off-by: cfzjywxk <cfzjywxk@gmail.com>
…ikv#18102)

close tikv#17995

Address clock-skew issues.

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>
close tikv#18125

Fix incorrect mapped allocation per thread metric

Not all thread builders are hooked by `thread_allocate_exclusive_arena`, so some threads use a shared arena, which makes the per-thread allocation metric incorrect.

Signed-off-by: Connor1996 <zbk602423539@gmail.com>
close tikv#18111

Support scalar function from_unixtime in tikv

Signed-off-by: wshwsh12 <793703860@qq.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#18113

Support customized raft message rejection logic

Signed-off-by: Calvin Neo <CalvinNeo@users.noreply.github.com>
Signed-off-by: Calvin Neo <calvinneo1995@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: glorv <glorvs@163.com>
 

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: ekexium <eke@fastmail.com>
Signed-off-by: Calvin Neo <calvinneo1995@gmail.com>

Co-authored-by: MyonKeminta <9948422+MyonKeminta@users.noreply.github.com>
Co-authored-by: ekexium <eke@fastmail.com>
Signed-off-by: Calvin Neo <calvinneo1995@gmail.com>

ti-chi-bot bot commented Jan 22, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from calvinneo, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XXL label Jan 22, 2025
@CLAassistant

CLAassistant commented Jan 22, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
12 out of 28 committers have signed the CLA.

✅ glorv
✅ 3AceShowHand
✅ ekexium
✅ joccau
✅ gengliqi
✅ wshwsh12
✅ wuhuizuo
✅ SpadeA-Tang
✅ Defined2014
✅ CalvinNeo
✅ ti-chi-bot
✅ CbcWestwolf
❌ overvenus
❌ hbisheng
❌ Connor1996
❌ LykxSassinator
❌ RidRisR
❌ v01dstar
❌ YuJuncen
❌ akoshchiy
❌ hazel1225
❌ Tristan1900
❌ 3pointer
❌ cfzjywxk
❌ hicqu
❌ zyguan
❌ hhwyt
❌ MyonKeminta
You have signed the CLA already but the status is still pending? Let us recheck it.

@CalvinNeo CalvinNeo closed this Jan 22, 2025