
Pika consistency development #2944

Open
6 of 8 tasks
chejinge opened this issue Nov 8, 2024 · 13 comments
Labels
✏️ Feature New feature or request

Comments

@chejinge
Collaborator

chejinge commented Nov 8, 2024

Which PikiwiDB functionalities are relevant/related to the feature request?

No response

Description

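  • Confirm the consistency of the current single-key apply-DB path (起龙)
  • Master: incremental binlog sync and refactoring (起龙)
  • Master sends ACK requests to slaves (起龙)
  • Slave: incremental sync refactoring (浩宇)
  • Slave sends an ACK to the master after writing the binlog (浩宇)
  • Refactor previously unreasonable parts (wyyyyy)
  • Testing and load testing (wyyyyy)
  • Review: 小帅, 俊华, 少一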
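The ACK items above could be modeled roughly as follows. This is a minimal sketch under assumptions the issue does not state (the actual design is in #2938): the master assigns monotonically increasing binlog offsets, each slave acknowledges the highest offset it has durably written, and a write counts as committed once a chosen number of slaves have acknowledged it. `AckTracker` and its methods are hypothetical names, not Pika APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch, not Pika's API: tracks, per slave, the highest
// binlog offset that slave has acknowledged as written.
class AckTracker {
 public:
  // Master side: record a new binlog entry and return its offset.
  uint64_t OnAppend() { return ++last_offset_; }

  // Slave reports that everything up to `offset` is in its binlog.
  void OnSlaveAck(const std::string& slave_id, uint64_t offset) {
    uint64_t& acked = acked_[slave_id];
    if (offset > acked) acked = offset;  // ACKs may arrive late; keep the max
  }

  // A write is treated as committed once at least `quorum` slaves acked it.
  bool IsCommitted(uint64_t offset, std::size_t quorum) const {
    assert(offset >= 1 && offset <= last_offset_ && "offset was never appended");
    std::size_t count = 0;
    for (const auto& entry : acked_) {
      if (entry.second >= offset) ++count;
    }
    return count >= quorum;
  }

 private:
  uint64_t last_offset_ = 0;
  std::map<std::string, uint64_t> acked_;
};
```

Keeping only the maximum acknowledged offset per slave means a stale or duplicated ACK can never move a slave's progress backwards.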

Schedule: roughly one month; progress is reviewed at the end of each weekly meeting.

Proposed solution

Issue log:

Alternatives considered

### Solution
Design document: #2938

### Development branch
https://github.com/OpenAtomFoundation/pika/tree/consistency

@chejinge chejinge added the ✏️ Feature New feature or request label Nov 8, 2024

@QlQlqiqi
Contributor

QlQlqiqi commented Nov 12, 2024

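Confirming the consistency of the current single-key apply-DB path

The current design is correct: binlogs for the same DB (and therefore for the same key) are dispatched to the same worker. That worker is a binlog worker with only one bg_thread that dequeues and applies tasks, so there is no consistency problem. The call sequence is as follows:

  1. PikaReplClientConn::DealMessage classifies the response;
  2. PikaReplClientConn::DispatchBinlogRes processes the binlog in batches;
  3. PikaReplicaManager::ScheduleWriteBinlogTask;
  4. PikaReplClient::ScheduleWriteBinlogTask selects a binlog worker based on the db name.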
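The dispatch rule above (same db name, same binlog worker, one background thread per worker) can be sketched as follows. This is a minimal illustration, not Pika's implementation: `SingleThreadWorker`, `BinlogDispatcher`, and hashing the db name with `std::hash` modulo the worker count are assumptions chosen to show why per-DB FIFO order is preserved.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical: each worker owns exactly one background thread, so the
// tasks it receives are executed strictly in FIFO order.
class SingleThreadWorker {
 public:
  SingleThreadWorker() : bg_thread_([this] { Loop(); }) {}
  ~SingleThreadWorker() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      stop_ = true;
    }
    cv_.notify_one();
    bg_thread_.join();  // the loop drains remaining tasks before exiting
  }
  void Schedule(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Loop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stop requested and queue drained
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stop_ = false;
  std::thread bg_thread_;  // declared last: starts after the members above exist
};

// Hypothetical dispatcher: the same db name always maps to the same worker.
class BinlogDispatcher {
 public:
  explicit BinlogDispatcher(std::size_t n) {
    assert(n > 0 && "need at least one worker");
    for (std::size_t i = 0; i < n; ++i) {
      workers_.push_back(std::make_unique<SingleThreadWorker>());
    }
  }
  std::size_t WorkerIndex(const std::string& db) const {
    return std::hash<std::string>{}(db) % workers_.size();
  }
  void Schedule(const std::string& db, std::function<void()> task) {
    workers_[WorkerIndex(db)]->Schedule(std::move(task));
  }

 private:
  std::vector<std::unique_ptr<SingleThreadWorker>> workers_;
};
```

Because every task for one DB lands on one thread, no extra locking is needed to keep that DB's binlog entries in order; parallelism comes only from different DBs using different workers.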


@pro-spild
Collaborator

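> The current design is correct: binlogs for the same DB (and therefore for the same key) are dispatched to the same worker. That worker is a binlog worker with only one bg_thread that dequeues and applies tasks, so there is no consistency problem. The call sequence...

That explains why binlog writing is consistent. Per-key consistency is guaranteed in PikaReplClient::ScheduleWriteDBTask, which takes the first key of the redis_command and hashes it into write_db_workers_. Writes to a single key are therefore always executed by the same write_db_workers_ thread, which preserves their order.
But I have a question: this seems to guarantee consistency only for single-key commands such as Set and Del. What about SUnionStore? Consider the following scenario:

  1. SAdd set1 "1"
  2. SAdd set2 "2"
  3. SRem set1 "1"
  4. SUnionStore setUnion set1 set2
  5. SMembers setUnion
    Because operations 3 and 4 have different dispatch keys (set1 and setUnion respectively), they may be assigned to different write_db_workers_ when the slave applies them to its DB. Their relative order determines the result of operation 5, so could this break master/slave consistency?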
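The reordering in this scenario can be reproduced with a tiny in-memory model. It assumes, as described above, that the dispatch key is the command's first key argument; the `Db` type and the helper functions are hypothetical and model only the set semantics needed here. Applying operation 3 before operation 4 (the master's order) leaves `setUnion` as {"2"}, while the reversed order leaves {"1", "2"}.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Set = std::set<std::string>;
using Db = std::map<std::string, Set>;

// Minimal models of the two commands from the scenario.
void SRem(Db& db, const std::string& key, const std::string& member) {
  db[key].erase(member);
}

void SUnionStore(Db& db, const std::string& dest,
                 const std::vector<std::string>& srcs) {
  Set result;
  for (const auto& s : srcs) {
    auto it = db.find(s);
    if (it != db.end()) result.insert(it->second.begin(), it->second.end());
  }
  db[dest] = result;
}

// Assumed dispatch rule: hash on the first key argument (argv[1]).
// For SUnionStore that is the destination, not the source sets.
std::string DispatchKey(const std::vector<std::string>& argv) {
  assert(argv.size() >= 2 && "command has no key argument");
  return argv[1];
}

// Applies ops 3 and 4 in the given order, starting after ops 1 and 2,
// and returns what op 5 (SMembers setUnion) would see.
Set ApplyInOrder(bool srem_first) {
  Db db{{"set1", {"1"}}, {"set2", {"2"}}};
  if (srem_first) {
    SRem(db, "set1", "1");
    SUnionStore(db, "setUnion", {"set1", "set2"});
  } else {
    SUnionStore(db, "setUnion", {"set1", "set2"});
    SRem(db, "set1", "1");
  }
  return db["setUnion"];
}
```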


@chejinge

This comment was marked as resolved.

This will indeed be inconsistent. The current design does not yet account for the idempotence of commands like this; it is an intermediate state. Making such commands consistent would require applying the binlog synchronously, and the performance cost would be too high.

@cheniujh
Collaborator


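You're absolutely right; that is a careful observation, well done!
This problem was fixed earlier. If you have time, please check whether the fix is sound, and feel free to point out any gaps:
#1658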


@pro-spild
Collaborator


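Got it: the DB is written first, then the binlog. SUnionStore computes the members with SUnion and writes them with SAdd, so the binlog sent to the slave is Del + SAdd, and the slave no longer has a consistency problem. Thanks.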
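The fix described here can be sketched as follows, assuming the master rewrites SUnionStore into a `Del dest` entry followed by a `SAdd dest <members>` entry (the exact code in #1658 may differ, and the function names below are hypothetical). Both entries have `dest` as their first key, so they hash to the same worker and stay in order, and replaying them on the slave no longer reads `set1` or `set2`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Set = std::set<std::string>;
using Db = std::map<std::string, Set>;
using Argv = std::vector<std::string>;

// Hypothetical: the master executes SUnionStore on its own DB first, then
// emits binlog entries that touch only `dest`.
std::vector<Argv> SUnionStoreToBinlog(Db& db, const std::string& dest,
                                      const Argv& srcs) {
  Set result;
  for (const auto& s : srcs) {
    auto it = db.find(s);
    if (it != db.end()) result.insert(it->second.begin(), it->second.end());
  }
  db[dest] = result;  // 1) write the DB on the master

  std::vector<Argv> binlog;
  binlog.push_back(Argv{"DEL", dest});  // 2) then log DEL dest ...
  if (!result.empty()) {
    Argv sadd{"SADD", dest};
    sadd.insert(sadd.end(), result.begin(), result.end());
    binlog.push_back(sadd);  // ... followed by SADD dest <members>
  }
  return binlog;
}

// Hypothetical slave-side replay of the rewritten entries.
void ApplyOnSlave(Db& db, const std::vector<Argv>& binlog) {
  for (const auto& argv : binlog) {
    assert(argv.size() >= 2 && "entry has no key");
    if (argv[0] == "DEL") {
      db.erase(argv[1]);
    } else if (argv[0] == "SADD") {
      db[argv[1]].insert(argv.begin() + 2, argv.end());
    }
  }
}
```

Logging the computed result instead of the original command makes each binlog entry larger, but the slave's outcome no longer depends on when the source sets are modified.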


@chejinge
Collaborator Author

chejinge commented Dec 16, 2024

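Self-healing failure cases:
1. After the master fails, a slave is automatically promoted to master; then either the new slave completes a full sync, or, after it goes offline, a replacement slave instance is attached.
2. After a slave fails, master and slave re-establish the connection and data can be written normally; if the slave does not recover for a long time, it is taken offline.
3. After the master fails and a slave is promoted, verify master/slave data consistency.
4. If a slave fails to complete writes for a long time, or goes offline, it is removed from the cluster.
5. If a slave does not respond for a long time, a new slave is attached; once its full sync completes, the master/slave cluster is restored.
6. If the master does not respond (e.g. because of network problems) and the slave has not written successfully (incomplete data), neither master nor slave may serve external requests.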
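Cases 2, 4 and 5 amount to a timeout policy on slave acknowledgements. A minimal sketch, with hypothetical names and placeholder thresholds (the issue does not specify how "a long time" is measured or configured):

```cpp
#include <cassert>
#include <chrono>

// Hypothetical actions; not part of Pika's codebase.
enum class SlaveAction { kKeep, kReconnect, kRemoveAndReplace };

// Decides what to do with a slave given how long it has gone without an ACK.
SlaveAction DecideSlaveAction(std::chrono::seconds since_last_ack,
                              std::chrono::seconds reconnect_after,
                              std::chrono::seconds remove_after) {
  assert(reconnect_after < remove_after && "thresholds must be ordered");
  if (since_last_ack >= remove_after) {
    return SlaveAction::kRemoveAndReplace;  // cases 4 and 5
  }
  if (since_last_ack >= reconnect_after) {
    return SlaveAction::kReconnect;  // case 2
  }
  return SlaveAction::kKeep;
}
```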


5 participants