Mon 153321 mariadb connection with errno0 24.10 (#1884)
* fix(broker/sql): two issues in the mysql object

* A possible segfault is fixed.
* Errors raised by mariadb can now have errno=0; this case is handled.
* enh(tests): new tests on database connection
* enh(tests): new test that can lead to a segfault with the mysql object
* fix(cmake): missing dependency on pb_neb_lib

REFS: MON-153321
bouda1 authored Nov 22, 2024
1 parent ab77f77 commit b7df296
Showing 7 changed files with 355 additions and 32 deletions.
2 changes: 1 addition & 1 deletion broker/CMakeLists.txt
@@ -468,7 +468,7 @@ target_link_libraries(

# Standalone binary.
add_executable(cbd ${SRC_DIR}/main.cc)
-add_dependencies(cbd multiplexing centreon_common)
+add_dependencies(cbd multiplexing centreon_common pb_neb_lib)

# Flags needed to include all symbols in binary.
target_link_libraries(
27 changes: 18 additions & 9 deletions broker/core/sql/src/mysql_connection.cc
@@ -16,6 +16,7 @@
* For more information : contact@centreon.com
*/
#include <errmsg.h>
+#include <mysqld_error.h>

#include "com/centreon/broker/config/applier/init.hh"
#include "com/centreon/broker/misc/misc.hh"
@@ -460,18 +461,26 @@ void mysql_connection::_statement(mysql_task* t) {
"mysql_connection {:p}: execute statement {:x} attempt {}: {}",
static_cast<const void*>(this), task->statement_id, attempts, query);
if (mysql_stmt_execute(stmt)) {
-std::string err_msg(
-fmt::format("{} errno={} {}", mysql_error::msg[task->error_code],
-::mysql_errno(_conn), ::mysql_stmt_error(stmt)));
-SPDLOG_LOGGER_ERROR(_logger,
-"connection fail to execute statement {:p}: {}",
-static_cast<const void*>(this), err_msg);
-if (_server_error(::mysql_stmt_errno(stmt))) {
+int32_t err_code = ::mysql_stmt_errno(stmt);
+std::string err_msg(fmt::format("{} errno={} {}",
+mysql_error::msg[task->error_code],
+err_code, ::mysql_stmt_error(stmt)));
+if (err_code == 0) {
+SPDLOG_LOGGER_ERROR(_logger,
+"mysql_connection: errno=0, so we simulate a "
+"server error CR_SERVER_LOST");
+err_code = CR_SERVER_LOST;
+} else {
+SPDLOG_LOGGER_ERROR(_logger,
+"connection fail to execute statement {:p}: {}",
+static_cast<const void*>(this), err_msg);
+}
+if (_server_error(err_code)) {
set_error_message(err_msg);
break;
}
-if (mysql_stmt_errno(stmt) != 1213 &&
-mysql_stmt_errno(stmt) != 1205) // Dead Lock error
+if (err_code != ER_LOCK_DEADLOCK &&
+err_code != ER_LOCK_WAIT_TIMEOUT) // Dead Lock error
attempts = MAX_ATTEMPTS;

if (mysql_commit(_conn)) {
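
Reviewer note: the decision logic of the hunk above can be sketched in isolation. This is a minimal sketch, not the broker's code. The constants are local stand-ins carrying the numeric values of CR_SERVER_LOST, CR_SERVER_GONE_ERROR, ER_LOCK_DEADLOCK and ER_LOCK_WAIT_TIMEOUT so that it builds without the MariaDB headers. `is_server_error()` is a hypothetical simplification of `_server_error()`, whose real list of codes is not shown in this diff, and the `next_step` names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the MariaDB client/server error codes.
constexpr int32_t kServerGone = 2006;       // CR_SERVER_GONE_ERROR
constexpr int32_t kServerLost = 2013;       // CR_SERVER_LOST
constexpr int32_t kLockWaitTimeout = 1205;  // ER_LOCK_WAIT_TIMEOUT
constexpr int32_t kLockDeadlock = 1213;     // ER_LOCK_DEADLOCK

// Illustrative outcome names (not taken from the broker sources).
enum class next_step { connection_error, retry, give_up };

// Assumption: only these two codes count as a broken connection here.
constexpr bool is_server_error(int32_t code) {
  return code == kServerLost || code == kServerGone;
}

// A failed execute that reports errno=0 is handled as a lost connection.
constexpr int32_t normalize(int32_t stmt_errno) {
  return stmt_errno == 0 ? kServerLost : stmt_errno;
}

// Mirrors the order of the checks in the hunk: server errors first,
// then lock conflicts (worth retrying), everything else stops retrying.
constexpr next_step classify(int32_t stmt_errno) {
  const int32_t code = normalize(stmt_errno);
  if (is_server_error(code))
    return next_step::connection_error;
  if (code == kLockDeadlock || code == kLockWaitTimeout)
    return next_step::retry;
  return next_step::give_up;
}

// Usage example: the three outcomes.
inline void check_examples() {
  assert(classify(0) == next_step::connection_error);
  assert(classify(kLockDeadlock) == next_step::retry);
  assert(classify(1064) == next_step::give_up);  // any other error code
}
```

Normalizing the code once into a local variable, as the commit does with `err_code`, keeps the later checks consistent with each other instead of calling `mysql_stmt_errno()` several times.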
6 changes: 5 additions & 1 deletion broker/core/sql/src/mysql_multi_insert.cc
@@ -132,7 +132,11 @@ void bulk_or_multi::execute(mysql& connexion,
my_error::code ec,
int thread_id) {
if (_bulk_stmt) {
-if (!_bulk_bind->empty()) {
+/* If the database connection is lost, we can have this issue */
+if (!_bulk_bind) {
+_bulk_bind = _bulk_stmt->create_bind();
+_bulk_bind->reserve(_bulk_row);
+} else if (!_bulk_bind->empty()) {
_bulk_stmt->set_bind(std::move(_bulk_bind));
connexion.run_statement(*_bulk_stmt, ec, thread_id);
_bulk_bind = _bulk_stmt->create_bind();
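
Reviewer note: the guard added above protects against dereferencing an empty `std::unique_ptr`. The sketch below reproduces the pattern with hypothetical stand-in types (`bind_t`, `fake_stmt`, `bulk_buffer`); none of them are the broker's classes. How `_bulk_bind` actually ends up null is not shown in this diff (the code comment attributes it to a lost database connection), so the sketch simulates that state with an invented `drop_bind()` method.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a bulk bind: just a list of row values.
using bind_t = std::vector<int>;

// Hypothetical stand-in for a prepared statement able to create binds.
struct fake_stmt {
  std::unique_ptr<bind_t> create_bind() const {
    return std::make_unique<bind_t>();
  }
};

class bulk_buffer {
  fake_stmt _stmt;
  std::unique_ptr<bind_t> _bind;  // may be null, as in the fixed code path
  std::size_t _rows;

  void _renew_bind() {
    _bind = _stmt.create_bind();
    _bind->reserve(_rows);
  }

 public:
  explicit bulk_buffer(std::size_t rows) : _rows(rows) {}

  bool has_bind() const { return _bind != nullptr; }

  // Invented helper: simulates a path that leaves the bind null.
  void drop_bind() { _bind.reset(); }

  void push(int value) {
    if (!_bind)
      _renew_bind();
    _bind->push_back(value);
  }

  // Returns the number of rows handed over for execution.
  std::size_t execute() {
    if (!_bind) {  // the guard: never dereference a null bind
      _renew_bind();
      return 0;
    }
    if (_bind->empty())
      return 0;
    std::unique_ptr<bind_t> sent = std::move(_bind);  // _bind is null now
    _renew_bind();
    return sent->size();
  }
};
```

Without the `if (!_bind)` branch, calling `execute()` right after `drop_bind()` would evaluate `_bind->empty()` on a null pointer, which is undefined behavior and a plausible source of the segfault mentioned in the commit message.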
156 changes: 149 additions & 7 deletions tests/broker-engine/services-and-bulk-stmt.robot
@@ -29,7 +29,7 @@ EBBPS1
${start} Get Current Date
${start_broker} Get Current Date
Ctn Start Broker
-Ctn Start engine
+Ctn Start Engine
Ctn Wait For Engine To Be Ready ${start}

FOR ${i} IN RANGE ${1000}
@@ -52,6 +52,7 @@ EBBPS1
IF "${output}" == "((0,),)" BREAK
END
Should Be Equal As Strings ${output} ((0,),)
+Disconnect From Database

FOR ${i} IN RANGE ${1000}
Ctn Process Service Check Result host_1 service_${i+1} 2 warning${i}
@@ -89,6 +90,7 @@ EBBPS1
IF "${output}" == "((0,),)" BREAK
END
Should Be Equal As Strings ${output} ((0,),)
+Disconnect From Database

EBBPS2
[Documentation] 1000 service check results are sent to the poller. The test is done with the unified_sql stream, no service status is lost, we find the 1000 results in the database: table services.
@@ -109,7 +111,7 @@ EBBPS2
${start} Get Current Date
${start_broker} Get Current Date
Ctn Start Broker
-Ctn Start engine
+Ctn Start Engine
${content} Create List INITIAL SERVICE STATE: host_1;service_1000;
${result} Ctn Find In Log With Timeout ${engineLog0} ${start} ${content} 30
Should Be True
@@ -135,6 +137,7 @@ EBBPS2
IF "${output}" == "((0,),)" BREAK
END
Should Be Equal As Strings ${output} ((0,),)
+Disconnect From Database

FOR ${i} IN RANGE ${1000}
Ctn Process Service Check Result host_1 service_${i+1} 2 critical${i}
@@ -171,6 +174,7 @@ EBBPS2
IF "${output}" == "((0,),)" BREAK
END
Should Be Equal As Strings ${output} ((0,),)
+Disconnect From Database

EBMSSM
[Documentation] 1000 services are configured with 100 metrics each. The rrd output is removed from the broker configuration. GetSqlManagerStats is called to measure writes into data_bin.
@@ -191,7 +195,7 @@ EBMSSM
Ctn Clear Retention
${start} Get Current Date
Ctn Start Broker
-Ctn Start engine
+Ctn Start Engine
Ctn Broker Set Sql Manager Stats 51001 5 5

# Let's wait for the external command check start
@@ -217,6 +221,7 @@ EBMSSM
Sleep 1s
END
Should Be True ${output[0][0]} >= 100000
+Disconnect From Database

EBPS2
[Documentation] 1000 services are configured with 20 metrics each. The rrd output is removed from the broker configuration to avoid to write too many rrd files. While metrics are written in bulk, the database is stopped. This must not crash broker.
@@ -240,7 +245,7 @@ EBPS2

${start} Get Current Date
Ctn Start Broker
-Ctn Start engine
+Ctn Start Engine
# Let's wait for the external command check start
${content} Create List check_for_external_commands()
${result} Ctn Find In Log With Timeout ${engineLog0} ${start} ${content} 60
@@ -294,7 +299,7 @@ RLCode
${start} Get Current Date

Ctn Start Broker
-Ctn Start engine
+Ctn Start Engine

${content} Create List check_for_external_commands()
${result} Ctn Find In Log With Timeout ${engineLog0} ${start} ${content} 60
@@ -364,7 +369,7 @@ metric_mapping
${start} Get Current Date

Ctn Start Broker
-Ctn Start engine
+Ctn Start Engine

${content} Create List check_for_external_commands()
${result} Ctn Find In Log With Timeout ${engineLog0} ${start} ${content} 60
@@ -404,7 +409,7 @@ Services_and_bulks_${id}

${start} Get Current Date
Ctn Start Broker
-Ctn Start engine
+Ctn Start Engine
Ctn Broker Set Sql Manager Stats 51001 5 5

# Let's wait for the external command check start
@@ -435,6 +440,143 @@ Services_and_bulks_${id}
... 1 1020
... 2 150

+EBMSSMDBD
+[Documentation] 1000 services are configured with 100 metrics each.
+... The rrd output is removed from the broker configuration.
+... While metrics are written in the database, we stop the database and then restart it.
+... Broker must recover its connection to the database and continue to write metrics.
+[Tags] broker engine unified_sql MON-152743
+Ctn Clear Metrics
+Ctn Config Engine ${1} ${1} ${1000}
+# We want all the services to be passive to avoid parasite checks during our test.
+Ctn Set Services Passive ${0} service_.*
+Ctn Config Broker central
+Ctn Config Broker rrd
+Ctn Config Broker module ${1}
+Ctn Config BBDO3 1
+Ctn Broker Config Log central core error
+Ctn Broker Config Log central tcp error
+Ctn Broker Config Log central sql debug
+Ctn Config Broker Sql Output central unified_sql
+Ctn Config Broker Remove Rrd Output central
+Ctn Clear Retention
+${start} Get Current Date
+Ctn Start Broker
+Ctn Start Engine
+
+Ctn Wait For Engine To Be Ready ${start} 1
+
+${start} Ctn Get Round Current Date
+# Let's wait for one "INSERT INTO data_bin" to appear in stats.
+Log To Console Many service checks with 100 metrics each are processed.
+FOR ${i} IN RANGE ${1000}
+Ctn Process Service Check Result With Metrics host_1 service_${i+1} 1 warning${i} 100
+END
+
+Log To Console We wait for at least one metric to be written in the database.
+# Let's wait for all force checks to be in the storage database.
+Connect To Database pymysql ${DBName} ${DBUser} ${DBPass} ${DBHost} ${DBPort}
+FOR ${i} IN RANGE ${500}
+${output} Query
+... SELECT COUNT(s.last_check) FROM metrics m LEFT JOIN index_data i ON m.index_id = i.id LEFT JOIN services s ON s.host_id = i.host_id AND s.service_id = i.service_id WHERE metric_name LIKE "metric_%%" AND s.last_check >= ${start}
+IF ${output[0][0]} >= 1 BREAK
+Sleep 1s
+END
+Disconnect From Database
+
+Log To Console Let's start some database manipulation...
+${start} Get Current Date
+
+FOR ${i} IN RANGE ${3}
+Ctn Stop Mysql
+Sleep 10s
+Ctn Start Mysql
+${content} Create List could not insert data in data_bin
+${result} Ctn Find In Log With Timeout ${centralLog} ${start} ${content} 10
+Log To Console ${result}
+END
+
+EBMSSMPART
+[Documentation] 1000 services are configured with 100 metrics each.
+... The rrd output is removed from the broker configuration.
+... The data_bin table is configured with two partitions p1 and p2 such
+... that p1 contains old data and p2 contains current data.
+... While metrics are written in the database, we remove the p2 partition.
+... Once the p2 partition is recreated, broker must recover its connection
+... to the database and continue to write metrics.
+... To check that last point, we force a last service check and we check
+... that its metrics are written in the database.
+[Tags] broker engine unified_sql MON-152743
+Ctn Clear Metrics
+Ctn Config Engine ${1} ${1} ${1000}
+# We want all the services to be passive to avoid parasite checks during our test.
+Ctn Set Services Passive ${0} service_.*
+Ctn Config Broker central
+Ctn Config Broker rrd
+Ctn Config Broker module ${1}
+Ctn Config BBDO3 1
+Ctn Broker Config Log central core error
+Ctn Broker Config Log central tcp error
+Ctn Broker Config Log central sql trace
+Ctn Config Broker Sql Output central unified_sql
+Ctn Config Broker Remove Rrd Output central
+Ctn Clear Retention
+
+Ctn Prepare Partitions For Data Bin
+${start} Get Current Date
+Ctn Start Broker
+Ctn Start Engine
+
+Ctn Wait For Engine To Be Ready ${start} 1
+
+${start} Ctn Get Round Current Date
+# Let's wait for one "INSERT INTO data_bin" to appear in stats.
+Log To Console Many service checks with 100 metrics each are processed.
+FOR ${i} IN RANGE ${1000}
+Ctn Process Service Check Result With Metrics host_1 service_${i+1} 1 warning${i} 100
+END
+
+Log To Console We wait for at least one metric to be written in the database.
+# Let's wait for all force checks to be in the storage database.
+Connect To Database pymysql ${DBName} ${DBUser} ${DBPass} ${DBHost} ${DBPort}
+FOR ${i} IN RANGE ${500}
+${output} Query
+... SELECT COUNT(s.last_check) FROM metrics m LEFT JOIN index_data i ON m.index_id = i.id LEFT JOIN services s ON s.host_id = i.host_id AND s.service_id = i.service_id WHERE metric_name LIKE "metric_%%" AND s.last_check >= ${start}
+IF ${output[0][0]} >= 1 BREAK
+Sleep 1s
+END
+Disconnect From Database
+
+Log To Console Let's start some database manipulation...
+Ctn Remove P2 From Data Bin
+${start} Get Current Date
+
+${content} Create List errno=
+FOR ${i} IN RANGE ${6}
+${result} Ctn Find In Log With Timeout ${centralLog} ${start} ${content} 10
+IF ${result} BREAK
+END
+
+Log To Console Let's recreate the p2 partition...
+Ctn Add P2 To Data Bin
+
+${start} Ctn Get Round Current Date
+Ctn Process Service Check Result With Metrics host_1 service_1 0 Last Output OK 100
+
+Log To Console Let's wait for the last service check to be in the database...
+Connect To Database pymysql ${DBName} ${DBUser} ${DBPass} ${DBHost} ${DBPort}
+FOR ${i} IN RANGE ${120}
+${output} Query SELECT count(*) FROM data_bin WHERE ctime >= ${start} - 10
+Log To Console ${output}
+IF ${output[0][0]} >= 100 BREAK
+Sleep 1s
+END
+Log To Console ${output}
+Should Be True ${output[0][0]} >= 100
+Disconnect From Database
+
+Ctn Init Data Bin Without Partition
+

*** Keywords ***
Ctn Test Clean
4 changes: 2 additions & 2 deletions tests/broker-engine/services-increased.robot
@@ -42,7 +42,7 @@ EBNSVC1
${result} Ctn Check Number Of Resources Monitored By Poller Is ${3} ${nb_res} 30
Should Be True ${result} Poller 3 should monitor ${nb_srv} services and 16 hosts.
END
-Ctn Stop engine
+Ctn Stop Engine
Ctn Kindly Stop Broker

Service_increased_huge_check_interval
@@ -154,4 +154,4 @@ Service_increased_huge_check_interval
... rra[0].pdp_per_row must be equal to 5400 for metric ${m}
END

-[Teardown] Run Keywords Ctn Stop engine AND Ctn Kindly Stop Broker
+[Teardown] Run Keywords Ctn Stop Engine AND Ctn Kindly Stop Broker

1 comment on commit b7df296

@github-actions

Robot Results

| ✅ Passed | ❌ Failed | ⏭️ Skipped | Total | Pass % | ⏱️ Duration |
|---|---|---|---|---|---|
| 14 | 1 | 0 | 15 | 93.33 | 27m17.823085999s |

Failed Tests

| Name | Message | ⏱️ Duration | Suite |
|---|---|---|---|
| BENCH_1000STATUS | AttributeError: 'NoneType' object has no attribute 'query_read_bytes' | 85.365 s | Bench |