-
Notifications
You must be signed in to change notification settings - Fork 0
Re-implement Doublewrite buffer encryption #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: ps-8.0.20-merge
Are you sure you want to change the base?
Re-implement Doublewrite buffer encryption #3
Conversation
…ace encryption) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-3822 "InnoDB system tablespace encryption" https://jira.percona.com/browse/PS-3822 (commit 78b6114) to make parallel doublewrite part of the upstream 8.0.20 merge easier. Temporarily disabled the following MTR test cases: - 'innodb.percona_parallel_dblwr_encrypt' - 'innodb.percona_sys_tablespace_encrypt' - 'innodb.percona_sys_tablespace_encrypt_dblwr' - 'sys_vars.innodb_parallel_dblwr_encrypt_basic' - 'sys_vars.innodb_sys_tablespace_encrypt_basic'
…b_doublewrite file when innodb_doublewrite is disabled) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-3411 "LP #1570682: Parallel doublewrite buffer file created when skip-innodb_doublewrite is set" https://jira.percona.com/browse/PS-3411 (commit 14318e4) to make parallel doublewrite part of the upstream 8.0.20 merge easier.
…must crash server on I/O error) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-5678 "Parallel doublewrite must crash server on I/O error" https://jira.percona.com/browse/PS-5678 (commit 0f810d7) to make parallel doublewrite part of the upstream 8.0.20 merge easier.
…rotation. ALPHA) https://jira.percona.com/browse/PS-6789 Temporarily reverted 'buf0dblwr.cc' part of the PS-3829 "Innodb key rotation. ALPHA" https://jira.percona.com/browse/PS-3829 (commit c7f44ee) to make parallel doublewrite part of the upstream 8.0.20 merge easier.
…d to set O_DIRECT on xb_doublewrite when running MTR test cases) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-1068 "Fix bug 1669414 (Failed to set O_DIRECT on xb_doublewrite when running MTR test cases)" https://jira.percona.com/browse/PS-1068 (commit 7f41824) to make parallel doublewrite part of the upstream 8.0.20 merge easier.
…lel doublewrite memory not freed with innodb_fast_shutdown=2) https://jira.percona.com/browse/PS-6789 Temporarily reverted PS-1707 "LP #1578139: Parallel doublewrite memory not freed with innodb_fast_shutdown=2" https://jira.percona.com/browse/PS-1707 (commit 8a53ed7) to make parallel doublewrite part of the upstream 8.0.20 merge easier.
… implementation (Implement parallel doublewrite) https://jira.percona.com/browse/PS-6789 Reverted 'parallel-doublewrite' blueprint implementation "Implement parallel doublewrite" https://blueprints.launchpad.net/percona-server/+spec/parallel-doublewrite (commit 4596aaa) to make parallel doublewrite part of the upstream 8.0.20 merge easier. Temporarily disabled the following MTR test cases: - 'sys_vars.innodb_parallel_doublewrite_path_basic' - 'innodb.percona_doublewrite'
https://jira.percona.com/browse/PS-6789 *** Updated man pages from MySQL Server 8.0.20 source tarball. *** Updated 'scripts/fill_help_tables.sql' from MySQL Server 8.0.20 source tarball.
https://jira.percona.com/browse/PS-6789 *** Reverted our fix for PS-6094 "Handler fails to trigger on Error 1049 or SQLSTATE 42000 or plain sqlexception" (https://jira.percona.com/browse/PS-6094) (commit 31b5c73) in favor of the upstream fix for the Bug #30561920 / #97682 "Handler fails to trigger on Error 1049 or SQLSTATE 42000 or plain sqlexception" (https://bugs.mysql.com/bug.php?id=97682) (commit mysql/mysql-server@72c6171). *** Reverted our fix for PS-3630 "LP #1660255: Test innodb.innodb_mysql is unstable" (https://jira.percona.com/browse/PS-3630) (commit e0b5050) in favor of the upstream fix for the Bug #30810572 "FIX INNODB-MYSQL TEST" (commit mysql/mysql-server@2692669). *** Reverted our 8.0.17 merge postfix "PS-5363 (Merge MySQL 8.0.17): fixed regexps in the rpl.rpl_perfschema_threads_processlist_status MTR test case" (https://jira.percona.com/browse/PS-5363) (commit 8d7dd4a) affecting 'rpl.rpl_perfschema_threads_processlist_status' MTR test case in favor of the changes made by upstream in WL#3549 "Binlog Compression" (commit mysql/mysql-server@1e5ae34). *** Reverted our 8.0.18 merge postfix "PS-5674: gen_lex_token generator reworked" (https://jira.percona.com/browse/PS-5674) (commit 214212a) in favor of the changes made by upstream Bug #30765691 "FREE TOKEN SLOTS ARE EXHAUSTED IN GEN_LEX_TOKEN.CC" (commit mysql/mysql-server@17ca03f). 'SYM_PERCONA()' macro preserved and made a synonym for upstream's 'SYM()'. Percona Server 5.7-specific tokens - CHANGED_PAGE_BITMAPS_SYM - CLIENT_STATS_SYM - CLUSTERING_SYM - COMPRESSION_DICTIONARY_SYM - INDEX_STATS_SYM - TABLE_STATS_SYM - THREAD_STATS_SYM - USER_STATS_SYM - ENCRYPTION_KEY_ID_SYM explicitly assigned values starting from 1300. The same values were assigned to them implicitly in Percona Server 8.0.19. Percona Server 8.0-specific tokens - EFFECTIVE_SYM - SEQUENCE_TABLE_SYM explicitly assigned values starting from 1350. This group has different values than in Percona Server 8.0.19. *** Similarly to other 'innodb.log_encrypt_<n>' MTR test cases 'innodb.log_encrypt_7' coming from upstream 8.0.20 cloned into two 'innodb.log_encrypt_7_mk' and 'innodb.log_encrypt_7_rk'. *** Similarly to other 'innodb.table_encrypt_<n>' MTR test cases 'innodb.table_encrypt_6' coming from upstream 8.0.20 cloned into three 'innodb.table_encrypt_6', 'keyring_vault.table_encrypt_6' and 'keyring_vault.table_encrypt_6_directory'. *** VERSION raised to "8.0.20-11". univ.i version raised to "11".
https://jira.percona.com/browse/PS-6789 In the fix for Bug #30508721 "MTR DOESN'T KEEP TRACK OF THE STATE OF INNODB MONITORS" (commit mysql/mysql-server@abd33c2) Oracle extended MTR 'check-testcase' procedure with additional comparison of data from InnoDB metrics state. They also introduced 'mysql-test/include/innodb_monitor_restore.inc' MTR include file that is supposed to reset InnoDB monitors to their default state. 'mysql-test/include/innodb_monitor_restore.inc' extended with enabling Percona-specific monitors, those that are enabled (defined with 'MONITOR_DEFAULT_ON' flag) by default. Similarly to what was done in the upstream patch "SET GLOBAL innodb_monitor_enable=default;" "SET GLOBAL innodb_monitor_disable=default;" "SET GLOBAL innodb_monitor_reset_all=default;" statement sequences were substituted with '--source include/innodb_monitor_restore.inc' all over the test code. As the result, fixed the following MTR test cases: - 'innodb.innodb_idle_flush_pct' - 'innodb.lock_contention_big' - 'innodb.monitor' - 'innodb.percona_ahi_partitions' - 'innodb.percona_changed_page_bmp_flush_5446' - 'innodb.transportable_tbsp-debug' - 'innodb_zip.transportable_tbsp_debug_zip' - 'sys_vars.innodb_monitor_disable_basic' - 'sys_vars.innodb_monitor_enable_basic' - 'sys_vars.innodb_monitor_reset_all_basic' - 'sys_vars.innodb_monitor_reset_basic' - 'sys_vars.innodb_purge_run_now_basic' - 'sys_vars.innodb_purge_stop_now_basic'
…ated MTR test cases https://jira.percona.com/browse/PS-6789 The following MTR test cases re-recorded because of the 'filesort' improvements introduced in the fix for Oracle's Bug #30776132 "MAKE FILESORT KEYS CONSISTENT BETWEEN FIELDS AND ITEMS" (commit mysql/mysql-server@6d587a6) - 'main.pool_of_threads' - 'main.pool_of_threads_high_prio_tickets'. The following MTR test cases re-recorded because of the changed execution plan (more hash joins instead of nested blok loops) introduced in these improvements Bug #30528604 "DELETE THE PRE-ITERATOR EXECUTOR" (commit mysql/mysql-server@ef166f8), Bug #30473261 "CONVERT THE INDEX SUBQUERY ENGINES INTO USING THE ITERATOR EXECUTOR" (commit mysql/mysql-server@cb4116e) (commit mysql/mysql-server@629b549) (commit mysql/mysql-server@5a41fba) (commit mysql/mysql-server@31bd903) (commit mysql/mysql-server@75bbe1b) (commit mysql/mysql-server@6226c1a) (commit mysql/mysql-server@0b45e96) (commit mysql/mysql-server@8e45d7e) (commit mysql/mysql-server@7493ae4) (commit mysql/mysql-server@a5f60bf) (commit mysql/mysql-server@609b86e), Bug #30912972 "ASSERTION `KEYLEN == M_START_KEY.LENGTH' FAILED" (commit mysql/mysql-server@b28bea5) - 'audit_log.audit_log_filter_db' - 'main.pool_of_threads' - 'main.pool_of_threads_high_prio_tickets' - 'main.percona_expand_fast_index_creation' - 'main.percona_sequence_table'
https://jira.percona.com/browse/PS-6789 Re-recorded 'main.bug74778' MTR test case because of the new 'SHOW_ROUTINE' privilege implemented by Oracle in WL #9049 "Add a dynamic privilege for stored routine backup" (https://dev.mysql.com/worklog/task/?id=9049) (commit mysql/mysql-server@3e41e44)
… MTR test case https://jira.percona.com/browse/PS-6789 Re-recorded 'main.backup_locks_mysqldump' MTR test case because of the new default 'mysqldump' network timeout introduced in the fix for Oracle Bug #30755992 / #98203 "mysql dump sufficiently long network timeout too short" (https://bugs.mysql.com/bug.php?id=98203) (commit mysql/mysql-server@1f90fad)
https://jira.percona.com/browse/PS-6789 Re-recorded 'main.bug88797' MTR test case because of the new deprecation warning introduced in the implementation of WL #13325 "Deprecate VALUES syntax in INSERT ... ON DUPLICATE KEY UPDATE" (https://dev.mysql.com/worklog/task/?id=13325) (commit mysql/mysql-server@6f3b9df)
…test cases with explicit binlog positions https://jira.percona.com/browse/PS-6789 Fixed/re-recorded the following MTR test cases because of the changes in the implementation of WL percona#3549 "Binlog: compression" (https://dev.mysql.com/worklog/task/?id=3549) (commit mysql/mysql-server@1e5ae34) that caused increasing 'Format_description_event' binlog event size and therefore some pre-recorded binary log positions in the '.result' files. - 'main.ackup_safe_binlog_info' - 'main.mysqldump-max' - 'binlog.percona_binlog_consistent_mixed' - 'binlog.percona_binlog_consistent_row' - 'binlog.percona_binlog_consistent_stmt' - 'binlog.percona_binlog_consistent_debug'
…space encryption) https://jira.percona.com/browse/PS-6789 1. Re-enable system tablespace encryption again after 8.0.20 upstream merge that has new parallel doublewrite implementation (https://jira.percona.com/browse/PS-3822). 2. Removed 'innodb.percona_sys_tablespace_encrypt_dblwr' MTR test case as there is no doublewrite buffer in system tablespace anymore.
…29 (Innodb key rotation. ALPHA) https://jira.percona.com/browse/PS-6789 Restored 'buf0dblwr.cc' part of the PS-3829 "Innodb key rotation. ALPHA" https://jira.percona.com/browse/PS-3829 (commit c7f44ee) after upstream 8.0.20 merge. The following MTR test cases do not crash anymore - 'encryption.upgrade_crypt_data_57_v1' - 'encryption.upgrade_crypt_data_v1' - 'innodb.innodb_scrub' - 'main.percona_dd_upgrade_encrypted'
…in.percona_signal_handling_threadpool MTR test cases https://jira.percona.com/browse/PS-6789 Fixed and re-recorded 'main.percona_signal_handling' and 'main.percona_signal_handling_threadpool' MTR test in response to the changes in the Bug #30578923 "SENDING SIGHUP CAUSES A LOT OF GARBAGE TO BE PRINTED" (commit mysql/mysql-server@b90a1b3). Removed "Status information:" log section is now simulated via 'DBUG_EXECUTE_IF()'. MTR test cases made debug-only.
./mtr --mem innodb.percona_parallel_dblwr_encrypt{,,,} --parallel=4 --repeat=20 [ 93%] innodb.percona_parallel_dblwr_encrypt w4 [ pass ] 11147
|
3708733
to
f6f0876
Compare
81e3869
to
6c5dc40
Compare
…o: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded Problem ======= Running mtr with ASAN build on Gentoo tests fails since the path to libtirpc is not /lib64/libtirpc.so which is the path mtr uses for preloading the library. Further more the libasan path in Gentoo may contain also underscores and minus which mtr safe_process does not recognize. Fails on Gentoo since /lib64/libtirpc.so do not exist +ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. Fails on Gentoo since /usr/lib64/libtirpc.so is a GNU LD script +ERROR: ld.so: object '/usr/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (invalid ELF header): ignored. Need to preload /lib64/libtirpc.so.3 on gentoo. When compiling with GNU C++ libasan path also include minus and underscores: $ less mysql-test/lib/My/SafeProcess/ldd_asan_test_result linux-vdso.so.1 (0x00007ffeba962000) libasan.so.4 => /usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so.4 (0x00007f3c2e827000) Tests that been affected in different ways are for example: $ ./mtr group_replication.gr_clone_integration_clone_not_installed [100%] group_replication.gr_clone_integration_clone_not_installed w3 [ fail ] ... ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. mysqltest: At line 21: Query 'START GROUP_REPLICATION' failed. ERROR 2013 (HY000): Lost connection to MySQL server during query ... ASAN:DEADLYSIGNAL ================================================================= ==11970==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f0e5cecfb8c bp 0x7f0e340f1650 sp 0x7f0e340f0dc8 T44) ==11970==The signal is caused by a READ memory access. ==11970==Hint: address points to the zero page. #0 0x7f0e5cecfb8b in xdr_uint32_t (/lib64/libc.so.6+0x13cb8b) #1 0x7f0e5fbe6d43 (/usr/lib/gcc/x86_64-pc-linux-gnu/7.3.0/libasan.so.4+0x87d43) #2 0x7f0e3c675e59 in xdr_node_no plugin/group_replication/libmysqlgcs/xdr_gen/xcom_vp_xdr.c:88 #3 0x7f0e3c67744d in xdr_pax_msg_1_6 plugin/group_replication/libmysqlgcs/xdr_gen/xcom_vp_xdr.c:852 ... $ ./mtr ndb.ndb_config [100%] ndb.ndb_config [ fail ] ... --- /.../src/mysql-test/suite/ndb/r/ndb_config.result 2019-06-25 21:19:08.308997942 +0300 +++ /.../bld/mysql-test/var/log/ndb_config.reject 2019-06-26 11:58:11.718512944 +0300 @@ -30,16 +30,22 @@ == 16 == bug44689 192.168.0.1 192.168.0.2 192.168.0.3 192.168.0.4 192.168.0.1 192.168.0.1 == 17 == bug49400 +ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. +ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. ERROR -- at line 25: TCP connection is a duplicate of the existing TCP link from line 14 ERROR -- at line 25: Could not store section of configuration file. $ ./mtr ndb.ndb_basic [100%] ndb.ndb_basic [ pass ] 34706 ERROR: ld.so: object '/usr/lib/gcc/x86' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. ERROR: ld.so: object '/lib64/libtirpc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored. Solution ======== In safe_process use same trick for libtirpc as for libasan to determine path to library for pre loading. Also allow underscores and minus in paths. In addition also add some memory leak suppressions for perl. Change-Id: Ia02e354a20cf8b279eb2573f3f8c2c39776343dc (cherry picked from commit e88706d)
To call a service implementation one needs to: 1. query the registry to get a reference to the service needed 2. call the service via the reference 3. call the registry to release the reference While #2 is very fast (just a function pointer call) #1 and #3 can be expensive since they'd need to interact with the registry's global structure in a read/write fashion. Hence if the above sequence is to be repeated in a quick succession it'd be beneficial to do steps #1 and #3 just once and aggregate as many #2 steps in a single sequence. This will usually mean to cache the service reference received in #1 and delay 3 for as much as possible. But since there's an active reference held to the service implementation until 3 is taken special handling is needed to make sure that: The references are released at regular intervals so changes in the registry can become effective. There is a way to mark a service implementation as "inactive" ("dying") so that until all of the active references to it are released no new ones are possible. All of the above is part of the current audit API machinery, but needs to be isolated into a separate service suite and made generally available to all services. This is what this worklog aims to implement. RB#24806
TABLESPACE STATE DOES NOT CHANGE THE SPACE TO EMPTY After the commit for Bug#31991688, it was found that an idle system may not ever get around to truncating an undo tablespace when it is SET INACTIVE. Actually, it takes about 128 seconds before the undo tablespace is finally truncated. There are three main tasks for the function trx_purge(). 1) Process the undo logs and apply changes to the data files. (May be multiple threads) 2) Clean up the history list by freeing old undo logs and rollback segments. 3) Truncate undo tablespaces that have grown too big or are SET INACTIVE explicitly. Bug#31991688 made sure that steps 2 & 3 are not done too often. Concentrating this effort keeps the purge lag from growing too large. By default, trx_purge() does step#1 128 times before attempting steps #2 & #3 which are called 'truncate' steps. This is set by the setting innodb_purge_rseg_truncate_frequency. On an idle system, trx_purge() is called once per second if it has nothing to do in step 1. After 128 seconds, it will finally do steps 2 (truncating the undo logs and rollback segments which reduces the history list to zero) and step 3 (truncating any undo tablespaces that need it). The function that the purge coordinator thread uses to make these repeated calls to trx_purge() is called srv_do_purge(). When trx_purge() returns having done nothing, srv_do_purge() returns to srv_purge_coordinator_thread() which will put the purge thread to sleep. It is woke up again once per second by the master thread in srv_master_do_idle_tasks() if not sooner by any of several of other threads and activities. This is how an idle system can wait 128 seconds before the truncate steps are done and an undo tablespace that was SET INACTIVE can finally become 'empty'. The solution in this patch is to modify srv_do_purge() so that if trx_purge() did nothing and there is an undo space that was explicitly set to inactive, it will immediately call trx_purge again with do_truncate=true so that steps #2 and #3 will be done. This does not affect the effort by Bug#31991688 to keep the purge lag from growing too big on sysbench UPDATE NO_KEY. With this change, the purge lag has to be zero and there must be a pending explicit undo space truncate before this extra call to trx_purge is done. Approved by Sunny in RB#25311
Some minor refactoring of function find_item_in_list() before the fix. - Code is generally aligned with coding standard. - Error handling is separated out, now is is false for success and true for error. - The found item is now an output argument, and a null pointer means the item was not found (along with the other two out arguments). - The report_error argument is removed since it was always used as REPORT_EXCEPT_NOT_FOUND. - A local variable "find_ident" is introduced, since it better represents that we are searching for a column reference than having separate field_name, table_name and db_name variables. Item_field::replace_with_derived_expr_ref() - Redundant tests were removed. Function resolve_ref_in_select_and_group() has also been changed so that success/error is now returned as false/true, and the found item is an out argument. Function Item_field::fix_fields() - The value of thd->lex->current_query_block() is cached in a local variable. - Since single resolving was introduced, a test for "field" equal to nullptr was redundant and could be eliminated, along with the indented code block that followed. - A code block for checking bitmaps if the above test was false could also be removed. Change-Id: I3cd4bd6a23dd07faff773bdf11940bcfd847c903
Two problems were identified for this bug. The first is seen by looking at the reduced query: select subq_1.c1 as c1 from (select subq_0.c0 as c0, subq_0.c0 as c1, 90 as c2, subq_0.c1 as c3 from (select (select v2 from table1) as c0, ref_0.v4 as c1 from table1 as ref_0 ) as subq_0 ) as subq_1 where EXISTS (select subq_1.c0 as c2, case when EXISTS (select (select v0 from table1) as c1 from table1 as ref_8 where EXISTS (select subq_1.c2 as c7 from table1 as ref_9 ) ) then subq_1.c3 end as c5 from table1 as ref_7); In the innermost EXISTS predicate, a column subq_1.c2 is looked up. It is erroneously found as the column subq_1.c0 with alias c2 in the query block of the outermost EXISTS predicate. But this resolving is not according to SQL standard: A table name cannot be part of a column alias, it has to be a simple identifier, and any referencing column must also be a simple identifier. By changing item_ref->item_name to item_ref->field_name in a test in find_item_in_list, we ensure that the match is against a table (view) name and column name and not an alias. But there is also another problem. The EXISTS predicate contains a few selected columns that are resolved and then immediately deleted since they are redundant in EXISTS processing. But if these columns are outer references and defined in a derived table, we may actually de-reference them before the initial reference increment. Thus, those columns are removed before they are possibly used later. This happens to subq_1.c2 which is resolved in the outer-most query block and coming from a derived table. We prevent this problem by incrementing the reference count of selected expressions from derived tables earlier, and we try to prevent this problem from re-occuring by adding an "m_abandoned" field in class Item, which is set to true when the reference count is decremented to zero and prevents the reference count from ever be incremented after that. Change-Id: Idda48ae726a580c1abdc000371b49a753e197bc6
https://jira.percona.com/browse/PS-8592 Description ----------- GR suffered from 9E19 problems caused by the security probes and network scanner processes connecting to the group replication communication port. This usually is not a problem, but poses a serious threat when another member tries to join the cluster by initialting a connection to the member which is affected by external processes using the port dedicated for group communication for longer durations. On such activites by external processes, the SSL enabled server stalled forever on the SSL_accept() call waiting for handshake data. Below is the stacktrace: Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)): #0 in read () #1 in sock_read () #2 in BIO_read () #3 in ssl23_read_bytes () #4 in ssl23_get_client_hello () #5 in ssl23_accept () #6 in xcom_tcp_server_startup(Xcom_network_provider*) () When the server stalled in the above path forever, it prohibited other members to join the cluster resulting in the following messages on the joiner server's logs. [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group' [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.' Solution -------- This patch adds two new variables 1. group_replication_xcom_ssl_socket_timeout It is a file-descriptor level timeout in seconds for both accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, say for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value has been set to 0 (waits infinitely) for backward compatibility. This variable is effective only when GR is configred with SSL. 2. group_replication_xcom_ssl_accept_retries It defines the number of retries to be performed before closing the socket. For each retry the server thread calls SSL_accept() with timeout defined by the group_replication_xcom_ssl_socket_timeout for the SSL handshake process once the connection has been accepted by the first accept() call. The default value has been set to 10. This variable is effective only when GR is configred with SSL. Note: - Both of the above variables are dynamically configurable, but will become effective only on START GROUP_REPLICATION.
…ocal DDL executed https://perconadev.atlassian.net/browse/PS-9018 Problem ------- In high concurrency scenarios, MySQL replica can enter into a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit. Analysis -------- It needs at least 3 threads for this deadlock to happen 1. One client thread 2. Two replica applier threads How this deadlock happens? -------------------------- 0. Binlog is enabled on replica, but log_replica_updates is disabled. 1. Initially, both "Commit Order" and "Binlog Flush" queues are empty. 2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1 3.1. Becomes leader (In Commit_stage_manager::enroll_for()). 3.2. Registers in the commit order queue. 3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log. 3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released. NOTE: SE commit for applier thread is already done by the time it reaches here. 4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2 5.1. Becomes leader (In Commit_stage_manager::enroll_for()) 5.2. Registers in the commit order queue. 5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released. 6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue. 7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues. 8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads(). 9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log. 9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader. 9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the cond_var `m_stage_cond_leader` and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (will be done in the Commit_stage_manager::process_final_stage_for_ordered_commit_group() called by client thread from fetch_and_process_flush_stage_queue()). 10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update GTID of applier thread 2. 10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log. 10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues. 10.3. Fetches the "Commit Order" and "Binlog Flush" queues. 10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue(). 10.5. Proceeds to update the GTID of threads in "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group(). 11. At this point, we will have - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and - Applier thread 1 performing GTID update for itself (from step 8). Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently. In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos` which is used to track the lock status of sidnos. This concurrent access to `update_commit_group()` can cause a lock-leak resulting in one thread acquiring the sidno lock and not releasing at all. ----------------------------------------------------------------------------------------------------------- Client thread Applier Thread 1 ----------------------------------------------------------------------------------------------------------- update_commit_group() => global_sid_lock->rdlock(); update_commit_group() => global_sid_lock->rdlock(); calls update_gtids_impl_lock_sidnos() calls update_gtids_impl_lock_sidnos() set commit_group_sidno[2] = true set commit_group_sidno[2] = true lock_sidno(2) -> successful lock_sidno(2) -> waits update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { unlock_sidno(2); commit_group_sidnos[2] = false; } Applier thread continues.. lock_sidno(2) -> successful update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { <=== this check fails and lock is not released. unlock_sidno(2); commit_group_sidnos[2] = false; } Client thread continues without releasing the lock ----------------------------------------------------------------------------------------------------------- 12. As the above lock-leak can also happen the other way i.e, the applier thread fails to unlock, there can be different consequences hereafter. 13. If the client thread continues without releasing the lock, then at a later stage, it can enter into a deadlock with the applier thread performing a GTID update with stack trace. Client_thread ------------- #1 __GI___lll_lock_wait #2 ___pthread_mutex_lock #3 native_mutex_lock <= waits for commit lock while holding sidno lock #4 Commit_stage_manager::enroll_for #5 MYSQL_BIN_LOG::change_stage #6 MYSQL_BIN_LOG::ordered_commit #7 MYSQL_BIN_LOG::commit #8 ha_commit_trans percona#9 trans_commit_implicit percona#10 mysql_create_like_table percona#11 Sql_cmd_create_table::execute percona#12 mysql_execute_command percona#13 dispatch_sql_command Applier thread -------------- #1 ___pthread_mutex_lock #2 native_mutex_lock #3 safe_mutex_lock #4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock #5 Gtid_state::update_commit_group #6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here #7 Commit_order_manager::finish #8 Commit_order_manager::wait_and_finish percona#9 ha_commit_low percona#10 trx_coordinator::commit_in_engines percona#11 MYSQL_BIN_LOG::commit percona#12 ha_commit_trans percona#13 trans_commit percona#14 Xid_log_event::do_commit percona#15 Xid_apply_log_event::do_apply_event_worker percona#16 Slave_worker::slave_worker_exec_event percona#17 slave_worker_exec_job_group percona#18 handle_slave_worker 14. If the applier thread continues without releasing the lock, then at a later stage, it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds the above case hits the assertion `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the replica applier thread when it tries to re-acquire the lock. Solution -------- In the above problematic example, when seen from each thread individually, we can conclude that there is no problem in the order of lock acquisition, thus there is no need to change the lock order. However, the root cause for this problem is that multiple threads can concurrently access to the array `Gtid_state::commit_group_sidnos`. In its initial implementation, it was expected that threads should hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it was not considered when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired when the client thread (binlog flush leader) when it tries to perform GTID update on behalf of threads waiting in "Commit Order" queue, thus providing a guarantee that `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
Part of WL#15135 Certificate Architecture This patch adds an instance of TlsKeyManager to class TransporterRegistry. This TlsKeyManager will handle certificate authentication in all node types. A new method TransporterRegistry::init_tls() configures TLS at node startup time. Change-Id: I1f9d3fff21ea7f2d9f009cce48823304c2baead7
Add a unit test, an NdbApi test, and an MTR test. The unit test is testNdbProcess-t The NdbApi test is testMgmd -n SshKeySigning The MTR test is sign_keys in suite ndb_tls Create the ndb_tls test suite. Create the ndb-tls subdirectory in std_data. Create a CA key and certificate in std_data/ndb-tls/. Change-Id: Icec0fa4a9031be11facbd346d09debe8bc8bfe68
Negotiate TLS in SocketAuthTls SocketAuthenticator. On the server side, TransporterRegistry always instantiates a SocketAuthTls authenticator. Change-Id: I826390b545ef96ec4224ff25bc66d9fdb7a5cf7a
Add the final bit of code into TransporterRegsitry to start TLS before transporter upgrade, and update the MTR test results. The tls_required and tls_off_certs tests will show TLS in use for transporter connections to MGMD. Change-Id: I2683447c02b27e498873fee77e0382c609a477cd
…ocal DDL executed https://perconadev.atlassian.net/browse/PS-9018 Problem ------- In high concurrency scenarios, MySQL replica can enter into a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit. Analysis -------- It needs at least 3 threads for this deadlock to happen 1. One client thread 2. Two replica applier threads How this deadlock happens? -------------------------- 0. Binlog is enabled on replica, but log_replica_updates is disabled. 1. Initially, both "Commit Order" and "Binlog Flush" queues are empty. 2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1 3.1. Becomes leader (In Commit_stage_manager::enroll_for()). 3.2. Registers in the commit order queue. 3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log. 3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released. NOTE: SE commit for applier thread is already done by the time it reaches here. 4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2 5.1. Becomes leader (In Commit_stage_manager::enroll_for()) 5.2. Registers in the commit order queue. 5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released. 6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue. 7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues. 8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads(). 9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log. 9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader. 9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the cond_var `m_stage_cond_leader` and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (will be done in the Commit_stage_manager::process_final_stage_for_ordered_commit_group() called by client thread from fetch_and_process_flush_stage_queue()). 10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update GTID of applier thread 2. 10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log. 10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues. 10.3. Fetches the "Commit Order" and "Binlog Flush" queues. 10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue(). 10.5. Proceeds to update the GTID of threads in "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group(). 11. At this point, we will have - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and - Applier thread 1 performing GTID update for itself (from step 8). Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently. In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos` which is used to track the lock status of sidnos. This concurrent access to `update_commit_group()` can cause a lock-leak resulting in one thread acquiring the sidno lock and not releasing at all. ----------------------------------------------------------------------------------------------------------- Client thread Applier Thread 1 ----------------------------------------------------------------------------------------------------------- update_commit_group() => global_sid_lock->rdlock(); update_commit_group() => global_sid_lock->rdlock(); calls update_gtids_impl_lock_sidnos() calls update_gtids_impl_lock_sidnos() set commit_group_sidno[2] = true set commit_group_sidno[2] = true lock_sidno(2) -> successful lock_sidno(2) -> waits update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { unlock_sidno(2); commit_group_sidnos[2] = false; } Applier thread continues.. lock_sidno(2) -> successful update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { <=== this check fails and lock is not released. unlock_sidno(2); commit_group_sidnos[2] = false; } Client thread continues without releasing the lock ----------------------------------------------------------------------------------------------------------- 12. As the above lock-leak can also happen the other way i.e, the applier thread fails to unlock, there can be different consequences hereafter. 13. If the client thread continues without releasing the lock, then at a later stage, it can enter into a deadlock with the applier thread performing a GTID update with stack trace. Client_thread ------------- #1 __GI___lll_lock_wait #2 ___pthread_mutex_lock #3 native_mutex_lock <= waits for commit lock while holding sidno lock #4 Commit_stage_manager::enroll_for #5 MYSQL_BIN_LOG::change_stage #6 MYSQL_BIN_LOG::ordered_commit #7 MYSQL_BIN_LOG::commit #8 ha_commit_trans percona#9 trans_commit_implicit percona#10 mysql_create_like_table percona#11 Sql_cmd_create_table::execute percona#12 mysql_execute_command percona#13 dispatch_sql_command Applier thread -------------- #1 ___pthread_mutex_lock #2 native_mutex_lock #3 safe_mutex_lock #4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock #5 Gtid_state::update_commit_group #6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here #7 Commit_order_manager::finish #8 Commit_order_manager::wait_and_finish percona#9 ha_commit_low percona#10 trx_coordinator::commit_in_engines percona#11 MYSQL_BIN_LOG::commit percona#12 ha_commit_trans percona#13 trans_commit percona#14 Xid_log_event::do_commit percona#15 Xid_apply_log_event::do_apply_event_worker percona#16 Slave_worker::slave_worker_exec_event percona#17 slave_worker_exec_job_group percona#18 handle_slave_worker 14. If the applier thread continues without releasing the lock, then at a later stage, it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds the above case hits the assertion `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the replica applier thread when it tries to re-acquire the lock. Solution -------- In the above problematic example, when seen from each thread individually, we can conclude that there is no problem in the order of lock acquisition, thus there is no need to change the lock order. However, the root cause for this problem is that multiple threads can concurrently access to the array `Gtid_state::commit_group_sidnos`. In its initial implementation, it was expected that threads should hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it was not considered when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired when the client thread (binlog flush leader) when it tries to perform GTID update on behalf of threads waiting in "Commit Order" queue, thus providing a guarantee that `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
…ocal DDL executed https://perconadev.atlassian.net/browse/PS-9018 Merge remote-tracking branch 'venki/PS-9018-8.0-gca' into HEAD Problem ------- In high concurrency scenarios, MySQL replica can enter into a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit. Analysis -------- It needs at least 3 threads for this deadlock to happen 1. One client thread 2. Two replica applier threads How this deadlock happens? -------------------------- 0. Binlog is enabled on replica, but log_replica_updates is disabled. 1. Initially, both "Commit Order" and "Binlog Flush" queues are empty. 2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1 3.1. Becomes leader (In Commit_stage_manager::enroll_for()). 3.2. Registers in the commit order queue. 3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log. 3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released. NOTE: SE commit for applier thread is already done by the time it reaches here. 4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2 5.1. Becomes leader (In Commit_stage_manager::enroll_for()) 5.2. Registers in the commit order queue. 5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released. 6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue. 7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues. 8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads(). 9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log. 9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader. 9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the cond_var `m_stage_cond_leader` and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (will be done in the Commit_stage_manager::process_final_stage_for_ordered_commit_group() called by client thread from fetch_and_process_flush_stage_queue()). 10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update GTID of applier thread 2. 10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log. 10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues. 10.3. Fetches the "Commit Order" and "Binlog Flush" queues. 10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue(). 10.5. Proceeds to update the GTID of threads in "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group(). 11. At this point, we will have - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and - Applier thread 1 performing GTID update for itself (from step 8). Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently. In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos` which is used to track the lock status of sidnos. This concurrent access to `update_commit_group()` can cause a lock-leak resulting in one thread acquiring the sidno lock and not releasing at all. ----------------------------------------------------------------------------------------------------------- Client thread Applier Thread 1 ----------------------------------------------------------------------------------------------------------- update_commit_group() => global_sid_lock->rdlock(); update_commit_group() => global_sid_lock->rdlock(); calls update_gtids_impl_lock_sidnos() calls update_gtids_impl_lock_sidnos() set commit_group_sidno[2] = true set commit_group_sidno[2] = true lock_sidno(2) -> successful lock_sidno(2) -> waits update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { unlock_sidno(2); commit_group_sidnos[2] = false; } Applier thread continues.. lock_sidno(2) -> successful update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { <=== this check fails and lock is not released. unlock_sidno(2); commit_group_sidnos[2] = false; } Client thread continues without releasing the lock ----------------------------------------------------------------------------------------------------------- 12. As the above lock-leak can also happen the other way i.e, the applier thread fails to unlock, there can be different consequences hereafter. 13. If the client thread continues without releasing the lock, then at a later stage, it can enter into a deadlock with the applier thread performing a GTID update with stack trace. Client_thread ------------- #1 __GI___lll_lock_wait #2 ___pthread_mutex_lock #3 native_mutex_lock <= waits for commit lock while holding sidno lock #4 Commit_stage_manager::enroll_for #5 MYSQL_BIN_LOG::change_stage #6 MYSQL_BIN_LOG::ordered_commit #7 MYSQL_BIN_LOG::commit #8 ha_commit_trans percona#9 trans_commit_implicit percona#10 mysql_create_like_table percona#11 Sql_cmd_create_table::execute percona#12 mysql_execute_command percona#13 dispatch_sql_command Applier thread -------------- #1 ___pthread_mutex_lock #2 native_mutex_lock #3 safe_mutex_lock #4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock #5 Gtid_state::update_commit_group #6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here #7 Commit_order_manager::finish #8 Commit_order_manager::wait_and_finish percona#9 ha_commit_low percona#10 trx_coordinator::commit_in_engines percona#11 MYSQL_BIN_LOG::commit percona#12 ha_commit_trans percona#13 trans_commit percona#14 Xid_log_event::do_commit percona#15 Xid_apply_log_event::do_apply_event_worker percona#16 Slave_worker::slave_worker_exec_event percona#17 slave_worker_exec_job_group percona#18 handle_slave_worker 14. If the applier thread continues without releasing the lock, then at a later stage, it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds the above case hits the assertion `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the replica applier thread when it tries to re-acquire the lock. Solution -------- In the above problematic example, when seen from each thread individually, we can conclude that there is no problem in the order of lock acquisition, thus there is no need to change the lock order. However, the root cause for this problem is that multiple threads can concurrently access to the array `Gtid_state::commit_group_sidnos`. In its initial implementation, it was expected that threads should hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it was not considered when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired when the client thread (binlog flush leader) when it tries to perform GTID update on behalf of threads waiting in "Commit Order" queue, thus providing a guarantee that `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
…nt on Windows and posix [#3] Adding a new method NdbProcess::create_via_ssh to be used when invoking programs over ssh. Currently no extra quoting of arguments will be done due to the ssh indirection. To properly provide that one would need to know about what shell will ssh use to execute command on remote host, and on Windows one also need to know what quoting the program expects. Change-Id: Iff588581c2afa6f599e6055f916dafb5d3cff602
Problem: Starting ´ndb_mgmd --bind-address´ may potentially cause abnormal program termination in MgmtSrvr destructor when ndb_mgmd restart itself. Core was generated by `ndb_mgmd --defa'. Program terminated with signal SIGABRT, Aborted. #0 0x00007f8ce4066b8f in raise () from /lib64/libc.so.6 #1 0x00007f8ce4039ea5 in abort () from /lib64/libc.so.6 #2 0x00007f8ce40a7d97 in __libc_message () from /lib64/libc.so.6 #3 0x00007f8ce40af08c in malloc_printerr () from /lib64/libc.so.6 #4 0x00007f8ce40b132d in _int_free () from /lib64/libc.so.6 #5 0x00000000006e9ffe in MgmtSrvr::~MgmtSrvr (this=0x28de4b0) at mysql/8.0/storage/ndb/src/mgmsrv/MgmtSrvr.cpp: 890 #6 0x00000000006ea09e in MgmtSrvr::~MgmtSrvr (this=0x2) at mysql/8.0/ storage/ndb/src/mgmsrv/MgmtSrvr.cpp:849 #7 0x0000000000700d94 in mgmd_run () at mysql/8.0/storage/ndb/src/mgmsrv/main.cpp:260 #8 0x0000000000700775 in mgmd_main (argc=<optimized out>, argv=0x28041d0) at mysql/8.0/storage/ndb/src/ mgmsrv/main.cpp:479 Analysis: While starting up, the ndb_mgmd will allocate memory for bind_address in order to potentially rewrite the parameter. When ndb_mgmd restart itself the memory will be released and dangling pointer causing double free. Fix: Drop support for bind_address=[::], it is not documented anywhere, is not useful and doesn't work. This means the need to rewrite bind_address is gone and bind_address argument need neither alloc or free. Change-Id: I7797109b9d8391394587188d64d4b1f398887e94
… for connection xxx'. The new iterator based explains are not impacted. The issue here is a race condition. More than one thread is using the query term iterator at the same time (whoch is neithe threas safe nor reantrant), and part of its state is in the query terms being visited which leads to interference/race conditions. a) the explain thread uses an iterator here: Sql_cmd_explain_other_thread::execute is inspecting the Query_expression of the running query calling master_query_expression()->find_blocks_query_term which uses an iterator over the query terms in the query expression: for (auto qt : query_terms<>()) { if (qt->query_block() == qb) { return qt; } } the above search fails to find qb due to the interference of the thread b), see below, and then tries to access a nullpointer: * thread percona#36, name = ‘connection’, stop reason = EXC_BAD_ACCESS (code=1, address=0x0) frame #0: 0x000000010bb3cf0d mysqld`Query_block::type(this=0x00007f8f82719088) const at sql_lex.cc:4441:11 frame #1: 0x000000010b83763e mysqld`(anonymous namespace)::Explain::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:792:50 frame #2: 0x000000010b83cc4d mysqld`(anonymous namespace)::Explain_join::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:1487:21 frame #3: 0x000000010b837c34 mysqld`(anonymous namespace)::Explain::prepare_columns(this=0x00007000020611b8) at opt_explain.cc:744:26 frame #4: 0x000000010b83ea0e mysqld`(anonymous namespace)::Explain_join::explain_qep_tab(this=0x00007000020611b8, tabnum=0) at opt_explain.cc:1415:32 frame #5: 0x000000010b83ca0a mysqld`(anonymous namespace)::Explain_join::shallow_explain(this=0x00007000020611b8) at opt_explain.cc:1364:9 frame #6: 0x000000010b83379b mysqld`(anonymous namespace)::Explain::send(this=0x00007000020611b8) at opt_explain.cc:770:14 frame #7: 0x000000010b834147 mysqld`explain_query_specification(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, query_term=0x00007f8f82719088, ctx=CTX_JOIN) at opt_explain.cc:2088:20 frame #8: 0x000000010bd36b91 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f82719088) at sql_union.cc:1519:11 frame percona#9: 0x000000010bd36c68 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f8271d748) at sql_union.cc:1526:13 frame percona#10: 0x000000010bd373f7 mysqld`Query_expression::explain(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00) at sql_union.cc:1591:7 frame percona#11: 0x000000010b835820 mysqld`mysql_explain_query_expression(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2392:17 frame percona#12: 0x000000010b835400 mysqld`explain_query(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2353:13 * frame percona#13: 0x000000010b8363e4 mysqld`Sql_cmd_explain_other_thread::execute(this=0x00007f8fba585b68, thd=0x00007f8fbb111e00) at opt_explain.cc:2531:11 frame percona#14: 0x000000010bba7d8b mysqld`mysql_execute_command(thd=0x00007f8fbb111e00, first_level=true) at sql_parse.cc:4648:29 frame percona#15: 0x000000010bb9e230 mysqld`dispatch_sql_command(thd=0x00007f8fbb111e00, parser_state=0x0000700002065de8) at sql_parse.cc:5303:19 frame percona#16: 0x000000010bb9a4cb mysqld`dispatch_command(thd=0x00007f8fbb111e00, com_data=0x0000700002066e38, command=COM_QUERY) at sql_parse.cc:2135:7 frame percona#17: 0x000000010bb9c846 mysqld`do_command(thd=0x00007f8fbb111e00) at sql_parse.cc:1464:18 frame percona#18: 0x000000010b2f2574 mysqld`handle_connection(arg=0x0000600000e34200) at connection_handler_per_thread.cc:304:13 frame percona#19: 0x000000010e072fc4 mysqld`pfs_spawn_thread(arg=0x00007f8fba8160b0) at pfs.cc:3051:3 frame percona#20: 0x00007ff806c2b202 libsystem_pthread.dylib`_pthread_start + 99 frame percona#21: 0x00007ff806c26bab libsystem_pthread.dylib`thread_start + 15 b) the query thread being explained is itself performing LEX::cleanup and as part of the iterates over the query terms, but still allows EXPLAIN of the query plan since thd->query_plan.set_query_plan(SQLCOM_END, ...) hasn't been called yet. 20:frame: Query_terms<(Visit_order)1, (Visit_leaves)0>::Query_term_iterator::operator++() (in mysqld) (query_term.h:613) 21:frame: Query_expression::cleanup(bool) (in mysqld) (sql_union.cc:1861) 22:frame: LEX::cleanup(bool) (in mysqld) (sql_lex.h:4286) 30:frame: Sql_cmd_dml::execute(THD*) (in mysqld) (sql_select.cc:799) 31:frame: mysql_execute_command(THD*, bool) (in mysqld) (sql_parse.cc:4648) 32:frame: dispatch_sql_command(THD*, Parser_state*) (in mysqld) (sql_parse.cc:5303) 33:frame: dispatch_command(THD*, COM_DATA const*, enum_server_command) (in mysqld) (sql_parse.cc:2135) 34:frame: do_command(THD*) (in mysqld) (sql_parse.cc:1464) 57:frame: handle_connection(void*) (in mysqld) (connection_handler_per_thread.cc:304) 58:frame: pfs_spawn_thread(void*) (in mysqld) (pfs.cc:3053) 65:frame: _pthread_start (in libsystem_pthread.dylib) + 99 66:frame: thread_start (in libsystem_pthread.dylib) + 15 Solution: This patch solves the issue by removing iterator state from Query_term, making the query_term iterators thread safe. This solution labels every child query_term with its index in its parent's m_children vector. The iterator can therefore easily compute the next child to visit based on Query_term::m_sibling_idx. A unit test case is added to check reentrancy. One can also manually verify that we have no remaining race condition by running two client connections files (with \. <file>) with a big number of copies of the repro query in one connection and a big number of EXPLAIN format=json FOR <connection>, e.g. EXPLAIN FORMAT=json FOR CONNECTION 8\G in the other. The actual connection number would need to verified in con CEB7 nection one, of course. Change-Id: Ie7d56610914738ccbbecf399ccc4f465f7d26ea7
This is a combination of 5 commits. This is the 1st commit message: WL#15746: TLS Enhancements for HeatWave-AutoML & Dask Comm. Upgrade Problem: -------- - HeatWave-AutoML communication was unauthenticated, unauthorized, and unencrypted. - Dask communication utilized TCP, not aligning with FedRamp guidelines. Solution: --------- - Introduced TLS and mTLS in HeatWave-AutoML's plugin and driver for authentication, authorization, and encryption. - Applied TLS to Dask to ensure authentication, encryption, and authorization. Dask Authorization (OCID-based): -------------------------------- 1. For each DBsystem: - MySQL node sends OCIDs of authorized nodes to the head driver via: a. rapid_net_nodes b. rapid_net_allowed_ocids (older API, mainly for MTR tests) - Scenarios: a. All OCIDs provided: Dask authorizes. b. Any OCID absent: ML call fails with message. 2. During Dask worker registration to the Dask scheduler, a script is dispatched to the Dask worker for execution, retrieving the worker's OCID for authorization purposes. - If the OCID isn't approved, the connection is denied, terminating the worker and causing the ML query to fail. 3. For every Dask worker (both as listener and connector), an OCID- based authorization is performed post SSL/TLS connection handshake. The process compares the OCID from the peer's certificate against the allowed_ocids received from the HeatWave-AutoML MySQL plugin. HWAML Plugin Changes: --------------------- - Sourced certificate data and SSL setup from disk, incorporating SSL/TLS for HWAML. - Reused "keystore" variable to specify disk location for certificate retrieval. - Certificates and keys expected in PKCS12 format. - Introduced "have_ml_encryption" variable (default=0). > Acts as a switch to explicitly deactivate HWAML network encryption, akin to "disable_net_encryption" affecting network encryption for HeatWave. Set to 1 to enable. - Introduced a customized verifier function for verify_callback to be set in SSL_CTX_set_verify and used in the handshake process of SSL/TLS. The customized verifier function will perform instance id (OCID) based authorization on the plugin side during standard SSL/TLS handshake process. - CRL (Certificate Revocation List) checks are also conducted if CRL Distribution Points are present and accessible in the provided certificate. HWAML Driver Changes & OCID-based Authorization: ------------------------------------------------ - Introduced "enable_encryption" (default=0). > Set to 1 to enable encryption. - When receiving a new connection request and encryption is on, the driver performs OCID-based self-checking, comparing OCID retrieved from its own instance principal with the OCID in the provided certificate on disk. - The driver compares OCID from "mysql_compute_id" and extracted OCID from mTLS certificate during connection. - Introduced "cert_dir" argument for certificate directory specification. - Expected files: cert_chain.pem, certificate.pem, private_key.pem. > OCID should be in the userID (UID) or CN field of the certificate.pem subject. - CRL (Certificate Revocation List) checks are also conducted post handshake, if CRL Distribution Points are present and accessible in the provided certificate, alongside OCID authorization. Encryption Behavior: -------------------- - If encryption is deactivated on both plugin and driver side, HWAML will work without encryption as it was before this commit. Enabling Encryption: -------------------- - By default, "have_ml_encryption" and "enable_encryption" are set to 0 > Encryption is disabled by default. - For the HWAML plugin: > "have_ml_encryption" set to 1 (default is 0). > Specify the .pfx file's path using the "keystore". - For the HWAML Driver: > "enable_encryption" set to 1 (default is 0) > Specify "mysql_instance_id" and "cert_dir". Testing: -------- - MTR has been modified for the encryption setup. > Runs with encryption if "OCI_INSTANCE_ID" is set to a valid value. - On OCI (when "OLRAPID_KEYSTORE" is not set): > Certificates and keys are generated; PEMs for driver and PKCS12 for plugin. - On AWS (when "OLRAPID_KEYSTORE" is set as the path to PKCS12 keystore files): > PEM files are extracted from the provided PKCS12 and used for the driver. The plugin uses the provided PKCS12 keystore file. Change-Id: I553ca135241e03484db6debbe186e6d34d582bf4 This is the commit message #2: WL#15746 - Adding ML encryption support to BM Enabling ML encryption on Baumeister: - Certificates are generated on MySQLd during initialization - Needed certicates for workers are packaged and sent to worker nodes - Workers use packaged files to generate their certificates - Arguments are added to driver.py invoke - Keystore path is added to mysql config Change-Id: I11a5cc5926488ff4fbf91bb6c10a091358db7dc9 This is the commit message #3: WL#15746: Enhanced CRL Daemon Checker Issue ===== The previous design assumed a plain HTTPS link for the CRL distribution point, accessible to all. This assumption no longer holds, as public accessibility for CRL distribution contradicts OCI guidelines. Now, the CRL distribution point in certificates provided by the control plane is expected to be protected by OCI Instance Principal Authentication. However, using this authentication method introduces a delay of several seconds, which is impractical for HeatWave-AutoML. Solution ======== The CRL fetching code now uses OCI Instance Principal Authentication. To mitigate performance issues, the CRL checking process has been redesigned. Instead of enforcing CRL checks per connection in MySQL Plugin and HeatWave-AutoML Driver communications, a daemon thread in HeatWave-AutoML Driver, Dask scheduler, and Dask Worker now periodically fetches and verifies the CRL against all active connections. This separation minimizes performance impacts. Consequently, MySQL Plugin's CRL checks have been removed, as checks in the Driver, Scheduler, and Worker sufficiently cover all cluster nodes. Changes ======= - Implemented CRL checker as a daemon thread in Driver, Scheduler, and Worker. - Each connection/socket has an associated CRL checker. - CRL checks occur periodically at set intervals. - Skips CRL check if the CRL is temporarily unavailable. - Failing a CRL check results in the associated connection/socket being closed. On the Driver, a stop event is triggered (akin to CTRL-C). Change-Id: Id998cfe9e15d9236291b0ae420d65c2197837966 This is the commit message #4: WL#15746: Fix Dask workers being shutdown without releasing address Issue ===== Dask workers getting shutting but not releasing the address used properly sometimes. Solution ======== Reverted some changes in heatwave_cluster.py in dask worker shutdown function. Hopefully this will fix the address issue Change-Id: I5a6749b5a25b0ccb73ba7369e545bc010da1b84f This is the commit message #5: WL#15746: Implement Dask Worker Join Timeout for Head Node Issue: ====== In the cluster_shutdown method, the join operation on the head node's worker process lacked a timeout. This led to potential indefinite waiting and eventual hanging of the head node. Solution: ========= A timeout has been introduced for the worker process join on the head node. Unlike non-head nodes, which rely on worker join to complete Dask tasks and cannot have a timeout, the head node can safely implement this. Now, if the worker process on the head node fails to shut down post-join, indicating a timeout, it will be manually terminated. This ensures proper release of associated resources and prevents hanging of the head node. Additional Change: ================== Added Cert Rotation Guard for DASK clusters. This feature initiates on the first plugin-driver connection when the DASK cluster is off, recording the certificate's expiry date. During driver idle times, it checks the current cert's expiry against this date. If it detects a change, indicating a certificate rotation, it shuts down the DASK cluster. The cluster restarts on the next connection request, ensuring the use of the latest certificate. Change-Id: Ie63a2e2b7664e05e1622d8bd6503663e13fa73cb
This is a combination of 5 commits. This is the 1st commit message: WL#15746: TLS Enhancements for HeatWave-AutoML & Dask Comm. Upgrade Problem: -------- - HeatWave-AutoML communication was unauthenticated, unauthorized, and unencrypted. - Dask communication utilized TCP, not aligning with FedRamp guidelines. Solution: --------- - Introduced TLS and mTLS in HeatWave-AutoML's plugin and driver for authentication, authorization, and encryption. - Applied TLS to Dask to ensure authentication, encryption, and authorization. Dask Authorization (OCID-based): -------------------------------- 1. For each DBsystem: - MySQL node sends OCIDs of authorized nodes to the head driver via: a. rapid_net_nodes b. rapid_net_allowed_ocids (older API, mainly for MTR tests) - Scenarios: a. All OCIDs provided: Dask authorizes. b. Any OCID absent: ML call fails with message. 2. During Dask worker registration to the Dask scheduler, a script is dispatched to the Dask worker for execution, retrieving the worker's OCID for authorization purposes. - If the OCID isn't approved, the connection is denied, terminating the worker and causing the ML query to fail. 3. For every Dask worker (both as listener and connector), an OCID- based authorization is performed post SSL/TLS connection handshake. The process compares the OCID from the peer's certificate against the allowed_ocids received from the HeatWave-AutoML MySQL plugin. HWAML Plugin Changes: --------------------- - Sourced certificate data and SSL setup from disk, incorporating SSL/TLS for HWAML. - Reused "keystore" variable to specify disk location for certificate retrieval. - Certificates and keys expected in PKCS12 format. - Introduced "have_ml_encryption" variable (default=0). > Acts as a switch to explicitly deactivate HWAML network encryption, akin to "disable_net_encryption" affecting network encryption for HeatWave. Set to 1 to enable. - Introduced a customized verifier function for verify_callback to be set in SSL_CTX_set_verify and used in the handshake process of SSL/TLS. The customized verifier function will perform instance id (OCID) based authorization on the plugin side during standard SSL/TLS handshake process. - CRL (Certificate Revocation List) checks are also conducted if CRL Distribution Points are present and accessible in the provided certificate. HWAML Driver Changes & OCID-based Authorization: ------------------------------------------------ - Introduced "enable_encryption" (default=0). > Set to 1 to enable encryption. - When receiving a new connection request and encryption is on, the driver performs OCID-based self-checking, comparing OCID retrieved from its own instance principal with the OCID in the provided certificate on disk. - The driver compares OCID from "mysql_compute_id" and extracted OCID from mTLS certificate during connection. - Introduced "cert_dir" argument for certificate directory specification. - Expected files: cert_chain.pem, certificate.pem, private_key.pem. > OCID should be in the userID (UID) or CN field of the certificate.pem subject. - CRL (Certificate Revocation List) checks are also conducted post handshake, if CRL Distribution Points are present and accessible in the provided certificate, alongside OCID authorization. Encryption Behavior: -------------------- - If encryption is deactivated on both plugin and driver side, HWAML will work without encryption as it was before this commit. Enabling Encryption: -------------------- - By default, "have_ml_encryption" and "enable_encryption" are set to 0 > Encryption is disabled by default. - For the HWAML plugin: > "have_ml_encryption" set to 1 (default is 0). > Specify the .pfx file's path using the "keystore". - For the HWAML Driver: > "enable_encryption" set to 1 (default is 0) > Specify "mysql_instance_id" and "cert_dir". Testing: -------- - MTR has been modified for the encryption setup. > Runs with encryption if "OCI_INSTANCE_ID" is set to a valid value. - On OCI (when "OLRAPID_KEYSTORE" is not set): > Certificates and keys are generated; PEMs for driver and PKCS12 for plugin. - On AWS (when "OLRAPID_KEYSTORE" is set as the path to PKCS12 keystore files): > PEM files are extracted from the provided PKCS12 and used for the driver. The plugin uses the provided PKCS12 keystore file. Change-Id: I553ca135241e03484db6debbe186e6d34d582bf4 This is the commit message #2: WL#15746 - Adding ML encryption support to BM Enabling ML encryption on Baumeister: - Certificates are generated on MySQLd during initialization - Needed certicates for workers are packaged and sent to worker nodes - Workers use packaged files to generate their certificates - Arguments are added to driver.py invoke - Keystore path is added to mysql config Change-Id: I11a5cc5926488ff4fbf91bb6c10a091358db7dc9 This is the commit message #3: WL#15746: Enhanced CRL Daemon Checker Issue ===== The previous design assumed a plain HTTPS link for the CRL distribution point, accessible to all. This assumption no longer holds, as public accessibility for CRL distribution contradicts OCI guidelines. Now, the CRL distribution point in certificates provided by the control plane is expected to be protected by OCI Instance Principal Authentication. However, using this authentication method introduces a delay of several seconds, which is impractical for HeatWave-AutoML. Solution ======== The CRL fetching code now uses OCI Instance Principal Authentication. To mitigate performance issues, the CRL checking process has been redesigned. Instead of enforcing CRL checks per connection in MySQL Plugin and HeatWave-AutoML Driver communications, a daemon thread in HeatWave-AutoML Driver, Dask scheduler, and Dask Worker now periodically fetches and verifies the CRL against all active connections. This separation minimizes performance impacts. Consequently, MySQL Plugin's CRL checks have been removed, as checks in the Driver, Scheduler, and Worker sufficiently cover all cluster nodes. Changes ======= - Implemented CRL checker as a daemon thread in Driver, Scheduler, and Worker. - Each connection/socket has an associated CRL checker. - CRL checks occur periodically at set intervals. - Skips CRL check if the CRL is temporarily unavailable. - Failing a CRL check results in the associated connection/socket being closed. On the Driver, a stop event is triggered (akin to CTRL-C). Change-Id: Id998cfe9e15d9236291b0ae420d65c2197837966 This is the commit message #4: WL#15746: Fix Dask workers being shutdown without releasing address Issue ===== Dask workers getting shutting but not releasing the address used properly sometimes. Solution ======== Reverted some changes in heatwave_cluster.py in dask worker shutdown function. Hopefully this will fix the address issue Change-Id: I5a6749b5a25b0ccb73ba7369e545bc010da1b84f This is the commit message #5: WL#15746: Implement Dask Worker Join Timeout for Head Node Issue: ====== In the cluster_shutdown method, the join operation on the head node's worker process lacked a timeout. This led to potential indefinite waiting and eventual hanging of the head node. Solution: ========= A timeout has been introduced for the worker process join on the head node. Unlike non-head nodes, which rely on worker join to complete Dask tasks and cannot have a timeout, the head node can safely implement this. Now, if the worker process on the head node fails to shut down post-join, indicating a timeout, it will be manually terminated. This ensures proper release of associated resources and prevents hanging of the head node. Additional Change: ================== Added Cert Rotation Guard for DASK clusters. This feature initiates on the first plugin-driver connection when the DASK cluster is off, recording the certificate's expiry date. During driver idle times, it checks the current cert's expiry against this date. If it detects a change, indicating a certificate rotation, it shuts down the DASK cluster. The cluster restarts on the next connection request, ensuring the use of the latest certificate. Change-Id: Ie63a2e2b7664e05e1622d8bd6503663e13fa73cb
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3 PS-5217 : Merge fb-prod201803 Summary: Original report: https://jira.mariadb.org/browse/MDEV-15816 To reproduce this bug just following below steps, client 1: USE test; CREATE TABLE t1 (i INT) ENGINE=MyISAM; HANDLER t1 OPEN h; CREATE TABLE t2 (i INT) ENGINE=RocksDB; LOCK TABLES t2 WRITE; client 2: FLUSH TABLES WITH READ LOCK; client 1: INSERT INTO t2 VALUES (1); So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE. Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly set to RDB_LOCK_NONE, as below ``` #0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE) #1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2) #2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0) #3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true) #4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8) #5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2) #6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0) ``` Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE' failed in myrocks::ha_rocksdb::write_row() Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE. Closes facebook/mysql-5.6#838 Pull Request resolved: facebook/mysql-5.6#871 Differential Revision: D9417382 Pulled By: lth fbshipit-source-id: c36c164e06c
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6 PS-6865 : Merge fb-prod201903 Summary: Currently during primary key's value encode, its ttl value can be from either one of these 3 cases 1. ttl column in primary key 2. non-ttl column a. old record(update case) b. current timestamp 3. ttl column in non-key field Workflow #1: first in Rdb_key_def::pack_record() find and store pk_offset, then in value encode try to parse key slice to fetch ttl value by using pk_offset. Workflow #3: fetch ttl value from ttl column The change is to merge #1 and #3 by always fetching TTL value from ttl column, not matter whether the ttl column is in primary key or not. Of course, remove pk_offset, since it isn't used. BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is stored by primary value encoding. Reviewed By: yizhang82 Differential Revision: D14662716 fbshipit-source-id: 6b4e5f044fd
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID: facebook/mysql-5.6@3366bd9d91b2 PS-9395: Merge percona-202401 (https://jira.percona.com/browse/PS-9395) Summary: some MTR failed in ubsan due to num of rows/records calculated in records_in_range() is a negative values which is caused by m_actual_disk_size is a negative value ``` storage/rocksdb/ha_rocksdb.cc:: runtime error: -56.8272 is outside the range of representable values of type 'unsigned long long' #0 myrocks::ha_rocksdb::records_in_range_internal(unsigned int, key_range*, key_range*, long, long, unsigned long long*, unsigned long long*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14855:16 #1 myrocks::ha_rocksdb::records_in_range(unsigned int, key_range*, key_range*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14760:3 #2 handler::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/sql/handler.cc:6608:26 #3 myrocks::ha_rocksdb::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:19002:18 #4 check_quick_select(THD*, RANGE_OPT_P ``` Due to m_actual_disk_size is an estimated value, always reset to 0 if it i becomes negative. Differential Revision: D50531919
PS-5741: Incorrect use of memset_s in keyring_vault. Fixed the usage of memset_s. The arguments should be: void memset_s(void *dest, size_t dest_max, int c, size_t n) where the 2nd argument is size of buffer and the 3rd is argument is character to fill. --------------------------------------------------------------------------- PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate --- *Problem:* `st_mysql_value::val_str` might return a pointer to `buf` which after the function called is deleted. Therefore the value in `save`, after reuturnin from the function, is invalid. In this particular case, the error is not manifesting as val_str` returns memory allocated with `thd_strmake` and it does not use `buf`. *Solution:* Allocate memory with `thd_strmake` so the memory in `save` is not local. --------------------------------------------------------------------------- Fix test main.bug12969156 when WITH_ASAN=ON *Problem:* ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`: ``` ==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478 WRITE of size 24 at 0x7fe746d06d14 thread T16777215 Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62 This frame has 4 object(s): [48, 56) 'result' (line 66) [80, 112) '_db_stack_frame_' (line 63) [144, 200) 'tm_tmp' (line 67) [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork (longjmp and C++ exceptions *are* supported) Thread T26 created by T25 here: #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216 #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104 #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148 #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279 #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279 #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664 #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160 #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952 #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544 percona#9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065 percona#10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325 percona#11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198 percona#12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473 ``` The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above. This is a benign error as all the variables are on the stack. *Solution*: Finish the thread in orderly way by using a signalling variable. --------------------------------------------------------------------------- PS-8204: Fix XML escape rules for audit plugin https://jira.percona.com/browse/PS-8204 There was a wrong length specified for some XML escape rules. As a result of this terminating null symbol from replacement rule was copied into resulting string. This lead to quer text truncation in audit log file. In addition added empty replacement rules for '\b' and 'f' symbols which just remove them from resulting string. These symboles are not supported in XML 1.0. --------------------------------------------------------------------------- PS-8854: Add main.percona_udf MTR test Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions. --------------------------------------------------------------------------- PS-9369: Fix currently processed query comparison in audit_log https://perconadev.atlassian.net/browse/PS-9369 The audit_log uses stack to keep track of table access operations being performed in scope of one query. It compares last known table access query string stored on top of this stack with actual query in audit event being processed at the moment to decide if new record should be pushed to stack or it is time to clean records from the stack. Currently audit_log simply compares char* variables to decide if this is the same query string. This approach doesn't work. As a result plugin looses control of the stack size and it starts growing with the time consuming memory. This issue is not noticable on short term server connections as memory is freed once connection is closed. At the same time this leads to extra memory consumption for long running server connections. The following is done to fix the issue: - Query is sent along with audit event as MYSQL_LEX_CSTRING structure. It is not correct to ignore MYSQL_LEX_CSTRING.length comparison as sometimes MYSQL_LEX_CSTRING.str pointer may be not iniialised properly. Added string length check to make sure structure contains any valid string. - Used strncmp to compare actual strings instead of comparing char* variables.
…n read() syscall over network https://jira.percona.com/browse/PS-8592 Description ----------- GR suffered from problems caused by the security probes and network scanner processes connecting to the group replication communication port. This usually is not a problem, but poses a serious threat when another member tries to join the cluster by initialting a connection to the member which is affected by external processes using the port dedicated for group communication for longer durations. On such activites by external processes, the SSL enabled server stalled forever on the SSL_accept() call waiting for handshake data. Below is the stacktrace: Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)): #0 in read () #1 in sock_read () #2 in BIO_read () #3 in ssl23_read_bytes () #4 in ssl23_get_client_hello () #5 in ssl23_accept () #6 in xcom_tcp_server_startup(Xcom_network_provider*) () When the server stalled in the above path forever, it prohibited other members to join the cluster resulting in the following messages on the joiner server's logs. [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group' [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.' Solution -------- This patch adds two new variables 1. group_replication_xcom_ssl_socket_timeout It is a file-descriptor level timeout in seconds for both accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, say for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value has been set to 0 (waits infinitely) for backward compatibility. This variable is effective only when GR is configred with SSL. 2. group_replication_xcom_ssl_accept_retries It defines the number of retries to be performed before closing the socket. For each retry the server thread calls SSL_accept() with timeout defined by the group_replication_xcom_ssl_socket_timeout for the SSL handshake process once the connection has been accepted by the first accept() call. The default value has been set to 10. This variable is effective only when GR is configred with SSL. Note: - Both of the above variables are dynamically configurable, but will become effective only on START GROUP_REPLICATION. ------------------------------------------------------------------------------- PS-8844: Fix the failing main.mysqldump_gtid_purged https://jira.percona.com/browse/PS-8844 This patch fixes the test failure of main.mysqldump_gtid_purged that failed due to the uninitialized variable $redirect_stderr in the start_proc_in_background.inc.
…ocal DDL executed https://perconadev.atlassian.net/browse/PS-9018 Problem ------- In high concurrency scenarios, MySQL replica can enter into a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit. Analysis -------- It needs at least 3 threads for this deadlock to happen 1. One client thread 2. Two replica applier threads How this deadlock happens? -------------------------- 0. Binlog is enabled on replica, but log_replica_updates is disabled. 1. Initially, both "Commit Order" and "Binlog Flush" queues are empty. 2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1 3.1. Becomes leader (In Commit_stage_manager::enroll_for()). 3.2. Registers in the commit order queue. 3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log. 3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released. NOTE: SE commit for applier thread is already done by the time it reaches here. 4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2 5.1. Becomes leader (In Commit_stage_manager::enroll_for()) 5.2. Registers in the commit order queue. 5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released. 6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue. 7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues. 8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads(). 9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log. 9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader. 9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the cond_var `m_stage_cond_leader` and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (will be done in the Commit_stage_manager::process_final_stage_for_ordered_commit_group() called by client thread from fetch_and_process_flush_stage_queue()). 10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update GTID of applier thread 2. 10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log. 10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues. 10.3. Fetches the "Commit Order" and "Binlog Flush" queues. 10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue(). 10.5. Proceeds to update the GTID of threads in "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group(). 11. At this point, we will have - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and - Applier thread 1 performing GTID update for itself (from step 8). Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently. In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos` which is used to track the lock status of sidnos. This concurrent access to `update_commit_group()` can cause a lock-leak resulting in one thread acquiring the sidno lock and not releasing at all. ----------------------------------------------------------------------------------------------------------- Client thread Applier Thread 1 ----------------------------------------------------------------------------------------------------------- update_commit_group() => global_sid_lock->rdlock(); update_commit_group() => global_sid_lock->rdlock(); calls update_gtids_impl_lock_sidnos() calls update_gtids_impl_lock_sidnos() set commit_group_sidno[2] = true set commit_group_sidno[2] = true lock_sidno(2) -> successful lock_sidno(2) -> waits update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { unlock_sidno(2); commit_group_sidnos[2] = false; } Applier thread continues.. lock_sidno(2) -> successful update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { <=== this check fails and lock is not released. unlock_sidno(2); commit_group_sidnos[2] = false; } Client thread continues without releasing the lock ----------------------------------------------------------------------------------------------------------- 12. As the above lock-leak can also happen the other way i.e, the applier thread fails to unlock, there can be different consequences hereafter. 13. If the client thread continues without releasing the lock, then at a later stage, it can enter into a deadlock with the applier thread performing a GTID update with stack trace. Client_thread ------------- #1 __GI___lll_lock_wait #2 ___pthread_mutex_lock #3 native_mutex_lock <= waits for commit lock while holding sidno lock #4 Commit_stage_manager::enroll_for #5 MYSQL_BIN_LOG::change_stage #6 MYSQL_BIN_LOG::ordered_commit #7 MYSQL_BIN_LOG::commit #8 ha_commit_trans percona#9 trans_commit_implicit percona#10 mysql_create_like_table percona#11 Sql_cmd_create_table::execute percona#12 mysql_execute_command percona#13 dispatch_sql_command Applier thread -------------- #1 ___pthread_mutex_lock #2 native_mutex_lock #3 safe_mutex_lock #4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock #5 Gtid_state::update_commit_group #6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here #7 Commit_order_manager::finish #8 Commit_order_manager::wait_and_finish percona#9 ha_commit_low percona#10 trx_coordinator::commit_in_engines percona#11 MYSQL_BIN_LOG::commit percona#12 ha_commit_trans percona#13 trans_commit percona#14 Xid_log_event::do_commit percona#15 Xid_apply_log_event::do_apply_event_worker percona#16 Slave_worker::slave_worker_exec_event percona#17 slave_worker_exec_job_group percona#18 handle_slave_worker 14. If the applier thread continues without releasing the lock, then at a later stage, it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds the above case hits the assertion `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the replica applier thread when it tries to re-acquire the lock. Solution -------- In the above problematic example, when seen from each thread individually, we can conclude that there is no problem in the order of lock acquisition, thus there is no need to change the lock order. However, the root cause for this problem is that multiple threads can concurrently access to the array `Gtid_state::commit_group_sidnos`. In its initial implementation, it was expected that threads should hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it was not considered when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired when the client thread (binlog flush leader) when it tries to perform GTID update on behalf of threads waiting in "Commit Order" queue, thus providing a guarantee that `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3 PS-5217 : Merge fb-prod201803 Summary: Original report: https://jira.mariadb.org/browse/MDEV-15816 To reproduce this bug just following below steps, client 1: USE test; CREATE TABLE t1 (i INT) ENGINE=MyISAM; HANDLER t1 OPEN h; CREATE TABLE t2 (i INT) ENGINE=RocksDB; LOCK TABLES t2 WRITE; client 2: FLUSH TABLES WITH READ LOCK; client 1: INSERT INTO t2 VALUES (1); So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE. Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly set to RDB_LOCK_NONE, as below ``` #0 myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE) #1 get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2) #2 mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0) #3 THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true) #4 MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8) #5 MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2) #6 Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0) ``` Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE' failed in myrocks::ha_rocksdb::write_row() Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE. Closes facebook/mysql-5.6#838 Pull Request resolved: facebook/mysql-5.6#871 Differential Revision: D9417382 Pulled By: lth fbshipit-source-id: c36c164e06c
Upstream commit ID : fb-mysql-5.6.35/77032004ad23d21a4c386f8136ecfbb071ea42d6 PS-6865 : Merge fb-prod201903 Summary: Currently during primary key's value encode, its ttl value can be from either one of these 3 cases 1. ttl column in primary key 2. non-ttl column a. old record(update case) b. current timestamp 3. ttl column in non-key field Workflow #1: first in Rdb_key_def::pack_record() find and store pk_offset, then in value encode try to parse key slice to fetch ttl value by using pk_offset. Workflow #3: fetch ttl value from ttl column The change is to merge #1 and #3 by always fetching TTL value from ttl column, not matter whether the ttl column is in primary key or not. Of course, remove pk_offset, since it isn't used. BTW, for secondary keys, its ttl value is always from m_ttl_bytes, which is stored by primary value encoding. Reviewed By: yizhang82 Differential Revision: D14662716 fbshipit-source-id: 6b4e5f044fd
Upstream commit ID : fb-mysql-5.6.35/e025cf1c47e63aada985d78e4083f2e02fba434f PS-7731 : Merge percona-202102 Summary: Today in `SELECT count(*)` MyRocks would still decode every single column due to this check, despite the readset being empty: ``` // bitmap is cleared on index merge, but it still needs to decode columns bool field_requested = decode_all_fields || m_verify_row_debug_checksums || bitmap_is_set(field_map, m_table->field[i]->field_index); ``` As a result MyRocks is significantly slower than InnoDB in this particular scenario. Turns out in index merge, when it tries to reset, it calls ha_index_init with an empty column_bitmap, so our field decoders didn't know it needs to decode anything, so the entire query would return nothing. This is discussed in [this commit](facebook/mysql-5.6@70f2bcd), and [issue 624](facebook/mysql-5.6#624) and [PR 626](facebook/mysql-5.6#626). So the workaround we had at that time is to simply treat empty map as implicitly everything, and the side effect is massively slowed down count(*). We have a few options to address this: 1. Fix index merge optimizer - looking at the code in QUICK_RANGE_SELECT::init_ror_merged_scan, it actually fixes up the column_bitmap properly, but after init/reset, so the fix would simply be moving the bitmap set code up. For secondary keys, prepare_for_position will automatically call `mark_columns_used_by_index_no_reset(s->primary_key, read_set)` if HA_PRIMARY_KEY_REQUIRED_FOR_POSITION is set (true for both InnoDB and MyRocks), so we would know correctly that we need to unpack PK when walking SK during index merge. 2. Overriding `column_bitmaps_signal` and setup decoders whenever the bitmap changes - however this doesn't work by itself. Because no storage engine today actually use handler::column_bitmaps_signal this path haven't been tested properly in index merge. In this case, QUICK_RANGE_SELECT::init_ror_merged_scan should call set_column_bitmaps_no_signal to avoid resetting the correct read/write set of head since head is used as first handler (reuses_handler=true) and subsequent place holders for read/write set updates (reuse_handler=false). 3. Follow InnoDB's solution - InnoDB delays it actually initialize its template again in index_read for the 2nd time (relying on `prebuilt->sql_stat_start`), and during index_read `QUICK_RANGE_SELECT::column_bitmap` is already fixed up and the table read/write set is switched to it, so the new template would be built correctly. In order to make it easier to maintain and port, after discussing with Manuel, I'm going with a simplified version of #3 that delays decoder creation until the first read operation (index_*, rnd_*, range_read_*, multi_range_read_*), and setting the delay flag in index_init / rnd_init / multi_range_read_init. Also, I ran into a bug with truncation_partition where Rdb_converter's tbl_def is stale (we only update ha_rocksdb::m_tbl_def), but it is fine because it is not being used after table open. But my change moves the lookup_bitmap initialization into Rdb_converter which takes a dependency on Rdb_converter::m_tbl_def so now we need to reset it properly. Reference Patch: facebook/mysql-5.6@44d6a8d --------- Porting Note: Due to 8.0's new counting infra (handler::record & handler::record_with_index), this only helps PK counting. Will send out a better fix that works better with 8.0 new counting infra. Reviewed By: Pushapgl Differential Revision: D26265470 fbshipit-source-id: f142be681ab
Upstream commit ID: facebook/mysql-5.6@3366bd9d91b2 PS-9395: Merge percona-202401 (https://jira.percona.com/browse/PS-9395) Summary: some MTR failed in ubsan due to num of rows/records calculated in records_in_range() is a negative values which is caused by m_actual_disk_size is a negative value ``` storage/rocksdb/ha_rocksdb.cc:: runtime error: -56.8272 is outside the range of representable values of type 'unsigned long long' #0 myrocks::ha_rocksdb::records_in_range_internal(unsigned int, key_range*, key_range*, long, long, unsigned long long*, unsigned long long*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14855:16 #1 myrocks::ha_rocksdb::records_in_range(unsigned int, key_range*, key_range*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14760:3 #2 handler::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/sql/handler.cc:6608:26 #3 myrocks::ha_rocksdb::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:19002:18 #4 check_quick_select(THD*, RANGE_OPT_P ``` Due to m_actual_disk_size is an estimated value, always reset to 0 if it i becomes negative. Differential Revision: D50531919
PS-5741: Incorrect use of memset_s in keyring_vault. Fixed the usage of memset_s. The arguments should be: void memset_s(void *dest, size_t dest_max, int c, size_t n) where the 2nd argument is size of buffer and the 3rd is argument is character to fill. --------------------------------------------------------------------------- PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate --- *Problem:* `st_mysql_value::val_str` might return a pointer to `buf` which after the function called is deleted. Therefore the value in `save`, after reuturnin from the function, is invalid. In this particular case, the error is not manifesting as val_str` returns memory allocated with `thd_strmake` and it does not use `buf`. *Solution:* Allocate memory with `thd_strmake` so the memory in `save` is not local. --------------------------------------------------------------------------- Fix test main.bug12969156 when WITH_ASAN=ON *Problem:* ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`: ``` ==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478 WRITE of size 24 at 0x7fe746d06d14 thread T16777215 Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62 This frame has 4 object(s): [48, 56) 'result' (line 66) [80, 112) '_db_stack_frame_' (line 63) [144, 200) 'tm_tmp' (line 67) [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork (longjmp and C++ exceptions *are* supported) Thread T26 created by T25 here: #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216 #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104 #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148 #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279 #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279 #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664 #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160 #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952 #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544 percona#9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065 percona#10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325 percona#11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198 percona#12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473 ``` The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above. This is a benign error as all the variables are on the stack. *Solution*: Finish the thread in orderly way by using a signalling variable. --------------------------------------------------------------------------- PS-8204: Fix XML escape rules for audit plugin https://jira.percona.com/browse/PS-8204 There was a wrong length specified for some XML escape rules. As a result of this terminating null symbol from replacement rule was copied into resulting string. This lead to quer text truncation in audit log file. In addition added empty replacement rules for '\b' and 'f' symbols which just remove them from resulting string. These symboles are not supported in XML 1.0. --------------------------------------------------------------------------- PS-8854: Add main.percona_udf MTR test Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions. --------------------------------------------------------------------------- PS-9369: Fix currently processed query comparison in audit_log https://perconadev.atlassian.net/browse/PS-9369 The audit_log uses stack to keep track of table access operations being performed in scope of one query. It compares last known table access query string stored on top of this stack with actual query in audit event being processed at the moment to decide if new record should be pushed to stack or it is time to clean records from the stack. Currently audit_log simply compares char* variables to decide if this is the same query string. This approach doesn't work. As a result plugin looses control of the stack size and it starts growing with the time consuming memory. This issue is not noticable on short term server connections as memory is freed once connection is closed. At the same time this leads to extra memory consumption for long running server connections. The following is done to fix the issue: - Query is sent along with audit event as MYSQL_LEX_CSTRING structure. It is not correct to ignore MYSQL_LEX_CSTRING.length comparison as sometimes MYSQL_LEX_CSTRING.str pointer may be not iniialised properly. Added string length check to make sure structure contains any valid string. - Used strncmp to compare actual strings instead of comparing char* variables.
…n read() syscall over network https://jira.percona.com/browse/PS-8592 Description ----------- GR suffered from problems caused by the security probes and network scanner processes connecting to the group replication communication port. This usually is not a problem, but poses a serious threat when another member tries to join the cluster by initialting a connection to the member which is affected by external processes using the port dedicated for group communication for longer durations. On such activites by external processes, the SSL enabled server stalled forever on the SSL_accept() call waiting for handshake data. Below is the stacktrace: Thread 55 (Thread 0x7f7bb77ff700 (LWP 2198598)): #0 in read () #1 in sock_read () #2 in BIO_read () #3 in ssl23_read_bytes () #4 in ssl23_get_client_hello () #5 in ssl23_accept () #6 in xcom_tcp_server_startup(Xcom_network_provider*) () When the server stalled in the above path forever, it prohibited other members to join the cluster resulting in the following messages on the joiner server's logs. [ERROR] [MY-011640] [Repl] Plugin group_replication reported: 'Timeout on wait for view after joining group' [ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] The member is already leaving or joining a group.' Solution -------- This patch adds two new variables 1. group_replication_xcom_ssl_socket_timeout It is a file-descriptor level timeout in seconds for both accept() and SSL_accept() calls when group replication is listening on the xcom port. When set to a valid value, say for example 5 seconds, both accept() and SSL_accept() return after 5 seconds. The default value has been set to 0 (waits infinitely) for backward compatibility. This variable is effective only when GR is configred with SSL. 2. group_replication_xcom_ssl_accept_retries It defines the number of retries to be performed before closing the socket. For each retry the server thread calls SSL_accept() with timeout defined by the group_replication_xcom_ssl_socket_timeout for the SSL handshake process once the connection has been accepted by the first accept() call. The default value has been set to 10. This variable is effective only when GR is configred with SSL. Note: - Both of the above variables are dynamically configurable, but will become effective only on START GROUP_REPLICATION. ------------------------------------------------------------------------------- PS-8844: Fix the failing main.mysqldump_gtid_purged https://jira.percona.com/browse/PS-8844 This patch fixes the test failure of main.mysqldump_gtid_purged that failed due to the uninitialized variable $redirect_stderr in the start_proc_in_background.inc.
…ocal DDL executed https://perconadev.atlassian.net/browse/PS-9018 Problem ------- In high concurrency scenarios, MySQL replica can enter into a deadlock due to a race condition between the replica applier thread and the client thread performing a binlog group commit. Analysis -------- It needs at least 3 threads for this deadlock to happen 1. One client thread 2. Two replica applier threads How this deadlock happens? -------------------------- 0. Binlog is enabled on replica, but log_replica_updates is disabled. 1. Initially, both "Commit Order" and "Binlog Flush" queues are empty. 2. Replica applier thread 1 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier thread 1 3.1. Becomes leader (In Commit_stage_manager::enroll_for()). 3.2. Registers in the commit order queue. 3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log. 3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is not yet released. NOTE: SE commit for applier thread is already done by the time it reaches here. 4. Replica applier thread 2 enters the group commit pipeline to register in the "Commit Order" queue since `log-replica-updates` is disabled on the replica node. 5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the applier thread 2 5.1. Becomes leader (In Commit_stage_manager::enroll_for()) 5.2. Registers in the commit order queue. 5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier thread 1 it will wait until the lock is released. 6. Client thread enters the group commit pipeline to register in the "Binlog Flush" queue. 7. Since "Commit Order" queue is not empty (there is applier thread 2 in the queue), it enters the conditional wait `m_stage_cond_leader` with an intention to become the leader for both the "Binlog Flush" and "Commit Order" queues. 8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update the GTID by calling gtid_state->update_commit_group() from Commit_order_manager::flush_engine_and_signal_threads(). 9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log. 9.1. It checks if there is any thread waiting in the "Binlog Flush" queue to become the leader. Here it finds the client thread waiting to be the leader. 9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the cond_var `m_stage_cond_leader` and enters a conditional wait until the thread's `tx_commit_pending` is set to false by the client thread (will be done in the Commit_stage_manager::process_final_stage_for_ordered_commit_group() called by client thread from fetch_and_process_flush_stage_queue()). 10. The client thread wakes up from the cond_var `m_stage_cond_leader`. The thread has now become a leader and it is its responsibility to update GTID of applier thread 2. 10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log. 10.2. Returns from `enroll_for()` and proceeds to process the "Commit Order" and "Binlog Flush" queues. 10.3. Fetches the "Commit Order" and "Binlog Flush" queues. 10.4. Performs the storage engine flush by calling ha_flush_logs() from fetch_and_process_flush_stage_queue(). 10.5. Proceeds to update the GTID of threads in "Commit Order" queue by calling gtid_state->update_commit_group() from Commit_stage_manager::process_final_stage_for_ordered_commit_group(). 11. At this point, we will have - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and - Applier thread 1 performing GTID update for itself (from step 8). Due to the lack of proper synchronization between the above two threads, there exists a time window where both threads can call gtid_state->update_commit_group() concurrently. In subsequent steps, both threads simultaneously try to modify the contents of the array `commit_group_sidnos` which is used to track the lock status of sidnos. This concurrent access to `update_commit_group()` can cause a lock-leak resulting in one thread acquiring the sidno lock and not releasing at all. ----------------------------------------------------------------------------------------------------------- Client thread Applier Thread 1 ----------------------------------------------------------------------------------------------------------- update_commit_group() => global_sid_lock->rdlock(); update_commit_group() => global_sid_lock->rdlock(); calls update_gtids_impl_lock_sidnos() calls update_gtids_impl_lock_sidnos() set commit_group_sidno[2] = true set commit_group_sidno[2] = true lock_sidno(2) -> successful lock_sidno(2) -> waits update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { unlock_sidno(2); commit_group_sidnos[2] = false; } Applier thread continues.. lock_sidno(2) -> successful update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()` if (commit_group_sidnos[2]) { <=== this check fails and lock is not released. unlock_sidno(2); commit_group_sidnos[2] = false; } Client thread continues without releasing the lock ----------------------------------------------------------------------------------------------------------- 12. As the above lock-leak can also happen the other way i.e, the applier thread fails to unlock, there can be different consequences hereafter. 13. If the client thread continues without releasing the lock, then at a later stage, it can enter into a deadlock with the applier thread performing a GTID update with stack trace. Client_thread ------------- #1 __GI___lll_lock_wait #2 ___pthread_mutex_lock #3 native_mutex_lock <= waits for commit lock while holding sidno lock #4 Commit_stage_manager::enroll_for #5 MYSQL_BIN_LOG::change_stage #6 MYSQL_BIN_LOG::ordered_commit #7 MYSQL_BIN_LOG::commit #8 ha_commit_trans percona#9 trans_commit_implicit percona#10 mysql_create_like_table percona#11 Sql_cmd_create_table::execute percona#12 mysql_execute_command percona#13 dispatch_sql_command Applier thread -------------- #1 ___pthread_mutex_lock #2 native_mutex_lock #3 safe_mutex_lock #4 Gtid_state::update_gtids_impl_lock_sidnos <= waits for sidno lock #5 Gtid_state::update_commit_group #6 Commit_order_manager::flush_engine_and_signal_threads <= acquires commit lock here #7 Commit_order_manager::finish #8 Commit_order_manager::wait_and_finish percona#9 ha_commit_low percona#10 trx_coordinator::commit_in_engines percona#11 MYSQL_BIN_LOG::commit percona#12 ha_commit_trans percona#13 trans_commit percona#14 Xid_log_event::do_commit percona#15 Xid_apply_log_event::do_apply_event_worker percona#16 Slave_worker::slave_worker_exec_event percona#17 slave_worker_exec_job_group percona#18 handle_slave_worker 14. If the applier thread continues without releasing the lock, then at a later stage, it can perform recursive locking while setting the GTID for the next transaction (in set_gtid_next()). In debug builds the above case hits the assertion `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the replica applier thread when it tries to re-acquire the lock. Solution -------- In the above problematic example, when seen from each thread individually, we can conclude that there is no problem in the order of lock acquisition, thus there is no need to change the lock order. However, the root cause for this problem is that multiple threads can concurrently access to the array `Gtid_state::commit_group_sidnos`. In its initial implementation, it was expected that threads should hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it was not considered when upstream implemented WL#7846 (MTS: slave-preserve-commit-order when log-slave-updates/binlog is disabled). With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired when the client thread (binlog flush leader) when it tries to perform GTID update on behalf of threads waiting in "Commit Order" queue, thus providing a guarantee that `Gtid_state::commit_group_sidnos` array is never accessed without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
Upstream commit ID: facebook/mysql-5.6@3366bd9d91b2 PS-9395: Merge percona-202401 (https://jira.percona.com/browse/PS-9395) Summary: some MTR failed in ubsan due to num of rows/records calculated in records_in_range() is a negative values which is caused by m_actual_disk_size is a negative value ``` storage/rocksdb/ha_rocksdb.cc:: runtime error: -56.8272 is outside the range of representable values of type 'unsigned long long' #0 myrocks::ha_rocksdb::records_in_range_internal(unsigned int, key_range*, key_range*, long, long, unsigned long long*, unsigned long long*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14855:16 #1 myrocks::ha_rocksdb::records_in_range(unsigned int, key_range*, key_range*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:14760:3 #2 handler::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/sql/handler.cc:6608:26 #3 myrocks::ha_rocksdb::multi_range_read_info_const(unsigned int, RANGE_SEQ_IF*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*) /data/sandcastle/boxes/trunk-git-mysql/storage/rocksdb/ha_rocksdb.cc:19002:18 #4 check_quick_select(THD*, RANGE_OPT_P ``` Due to m_actual_disk_size is an estimated value, always reset to 0 if it i becomes negative. Differential Revision: D50531919
No description provided.