OBCE V3 培训实验:应用到数据库全链路分析(下)

2023年 10月 23日 34.5k 0

本文主要分享 OBCE V3 培训的理论和实验知识第三篇,接上篇继续介绍应用到 OceanBase 全链路分析的实验部分。个人观点,理解不当欢迎指出。

实验部分

OBCE培训材料和实验官网地址:https://www.oceanbase.com/training/detail?level=OBCE  。

从应用到 OceanBase 全链路分析涉及到的理论知识不难理解,实际运用的时候却往往非常耗费精力,主要原因就是日志的收集、定位和分析。练习这个需要一个合适的应用,还要模拟应用或网络出问题,有点困难,所以实际的实验要求只是练习一下网络抓包 tcpdump 命令,通过抓包观察 OB 连接和 SQL 执行耗时信息。由于网络处于技术栈最底层,实际抓包时数据量会非常大。在实际问题诊断中除非怀疑网络稳定性和性能有问题,否则不会先选这个方法。抓到的报文文件建议通过 Wireshark 图形化软件去分析,使用体验会好一些。

本文不分享这个实验的抓包过程(网上有很多介绍),而是分析 OB 客户端连接正常和异常情况下 OBProxy 的日志内容以及 业务查询和事务对应的 OBProxy 日志。了解 OBProxy 日志特点,将来分析真实的线上链路问题时,就不会因为问题现象已经消失而无法排查,而是通过 OBProxy 日志反向推测链路上发生什么问题。

OB 数据库连接的 OBProxy 日志

场景一:客户端会话正常建立和退出。

场景描述:通过 mysql 客户端发起一个租户连接,执行业务查询,然后正常退出。观察 OBProxy 对应的日志。

首先连接后发起一笔应用查询会话,确保 OBProxy 建立对应的后端连接。

    MySQL [tpccdb]> select * from bmsql_warehouse;
    +------+------------+--------+------------+---------------------+----------------------+----------------------+---------+-----------+
    | w_id | w_ytd | w_tax | w_name | w_street_1 | w_street_2 | w_city | w_state | w_zip |
    +------+------------+--------+------------+---------------------+----------------------+----------------------+---------+-----------+
    | 1 | 4062863.16 | 0.1873 | vPJUyrE7G | yjDKpFuiShMAoNA | fOrVS6Q7iAzP | xvNWSBM8SRz9IQ4 | NX | 637211111 |

    | 10 | 3796057.29 | 0.0304 | DP98DH32zb | boSEcSBN8GcAm6se8M | ozsg0xZMeAu1vn | EgXsvSzIwucP | PJ | 620111111 |
    +------+------------+--------+------------+---------------------+----------------------+----------------------+---------+-----------+
    10 rows in set (0.01 sec)

    在客户端观察自己的连接相关信息,可以用于后面分析诊断用。

      MySQL [tpccdb]> select last_trace_id();
      +-----------------------------------+
      | last_trace_id() |
      +-----------------------------------+
      | YB420A00003D-000607FAEB753695-0-0 |
      +-----------------------------------+
      1 row in set (0.00 sec)


      MySQL [tpccdb]> show processlist;
      +------+---------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+
      | Id | Tenant | User | Host | db | trans_count | svr_session_count | state | tid | pid |
      +------+---------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+
      | 16 | obmysql | tpcc | 10.0.0.66:45843 | tpccdb | 0 | 1 | MCS_ACTIVE_READER | 89116 | 89116 |
      +------+---------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+
      1 row in set (0.01 sec)


      MySQL [tpccdb]> show proxysession;
      +--------------------+------+---------+---------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+-----------+
      | proxy_sessid | Id | Cluster | Tenant | User | Host | db | trans_count | svr_session_count | state | tid | pid | using_ssl |
      +--------------------+------+---------+---------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+-----------+
      | 720576208556654611 | 16 | obdemo | obmysql | tpcc | 10.0.0.66:45843 | tpccdb | 0 | 1 | MCS_ACTIVE_READER | 89116 | 89116 | 0 |
      +--------------------+------+---------+---------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+-----------+
      1 row in set (0.00 sec)

      其中有用的信息包括:

      • last_trace_id ,上一次执行 SQL的 trace_id,可用于在所有 OB 节点里定位该SQL 执行期间的日志,用以观察 OB 内部各个模块的工作信息。这里不用这个信息。

      • processlist 中的 Id,这个就是 OBProxy 原理提到的客户端连接的 cs_id ,不同 OBProxy 节点上的 cs_id 可能会重复。

      • proxy_sessid ,obproxy 后端连接的标识,这个在具体的 OBProxy 内部是唯一的。

      • Host 信息是客户端连接的 IP 和 TCP 端口。可以在主机层面查看网络 TCP 连接时用该端口定位 TCP 连接状态。

      下面就是在客户端服务器上定位客户端连接的状态。OBProxy 是 10.0.0.62:28830 。当前连接状态是 ESTABLISHED,进程号 6399 。

        [root@ora12c ~]# netstat -antlp |grep 10.0.0.62 |grep 45843                                                                                                                                                                                                                            
        tcp 0 0 172.27.0.2:45843 10.0.0.62:28830 ESTABLISHED 6399/mysql
        [root@ora12c ~]#

        在 OBProxy 服务器上也可以看到这个 TCP 连接。

          [root@server062 ~]# netstat -anltp |grep 10.0.0.66 |grep 45843
          tcp 0 0 10.0.0.62:28830 10.0.0.66:45843 ESTABLISHED 89116/bin/obproxy
          [root@server062 ~]#

          在 OBProxy 管理界面查看前端和后端会话信息。

            MySQL [(none)]> show processlist;
            +--------+----------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+
            | Id | Tenant | User | Host | db | trans_count | svr_session_count | state | tid | pid |
            +--------+----------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+
            | 262147 | proxysys | root | 127.0.0.1:45170 | NULL | 0 | 0 | MCS_ACTIVE_READER | 89127 | 89116 |
            | 16 | obmysql | tpcc | 10.0.0.66:45843 | tpccdb | 0 | 1 | MCS_ACTIVE_READER | 89116 | 89116 |
            +--------+----------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+
            2 rows in set (0.01 sec)


            MySQL [(none)]> show proxysession attribute 16;
            +----------------------------------+---------------------+----------------+
            | attribute_name | value | info |
            +----------------------------------+---------------------+----------------+
            | proxy_sessid | 720576208556654611 | cs common |
            | cs_id | 16 | cs common |
            | cluster | obdemo | cs common |
            | tenant | obmysql | cs common |
            | user | tpcc | cs common |
            | host_ip | 10.0.0.66 | cs common |
            | host_port | 45843 | cs common |
            | db | tpccdb | cs common |
            | total_trans_cnt | 0 | cs common |
            | svr_session_cnt | 1 | cs common |
            | active | true | cs common |
            | read_state | MCS_ACTIVE_READER | cs common |
            | tid | 89116 | cs common |
            | pid | 89116 | cs common |
            | idc_name | | cs common |
            | modified_time | 0 | cs stat |
            | reported_time | 0 | cs stat |
            | hot_sys_var_version | 2 | cs var version |
            | sys_var_version | 4 | cs var version |
            | user_var_version | 0 | cs var version |
            | last_insert_id_version | 0 | cs var version |
            | db_name_version | 1 | cs var version |
            | server_ip | 10.0.0.61 | last used ss |
            | server_port | 2881 | last used ss |
            | server_sessid | 3221613756 | last used ss |
            | ss_id | 15878 | last used ss |
            | state | MSS_KA_CLIENT_SLAVE | last used ss |
            | transact_count | 4 | last used ss |
            | server_trans_stat | 0 | last used ss |
            | hot_sys_var_version | 2 | last used ss |
            | sys_var_version | 4 | last used ss |
            | user_var_version | 0 | last used ss |
            | last_insert_id_version | 0 | last used ss |
            | db_name_version | 1 | last used ss |
            | is_checksum_supported | 1 | last used ss |
            | is_safe_read_weak_supported | 0 | last used ss |
            | is_checksum_switch_supported | 1 | last used ss |
            | checksum_switch | 1 | last used ss |
            | enable_extra_ok_packet_for_stats | 1 | last used ss |
            +----------------------------------+---------------------+----------------+
            39 rows in set (0.00 sec)

            后面这个命令看到前端会话(cs_id=16) 对应的后端会话信息(ss_id=15878)。

            以上客户端的 IP 和端口信息、cs_id 和 ss_id 信息等都在 OBProxy.log 里有详细的记录,并用于定位该会话相关的日志。

              ```bash
              [2023-10-20 14:16:16.870082] INFO [PROXY.SOCK] ob_connection.cpp:350 [89147][Y0-0] [lt=46] [dc=0] connection accepted(client={10.0.0.66:45843}, server={0.0.0.0:28830}, accepted_fd=60, listen_fd=57)
              [2023-10-20 14:16:16.870200] INFO [PROXY.NET] ob_mysql_session_accept.cpp:36 [89116][Y0-7FB059DAA2E0] [lt=7] [dc=0] [ObMysqlSessionAccept:main_event] accepted connection(netvc=0x7fb059da3d20, client_ip={10.0.0.66:45843})
              [2023-10-20 14:16:16.870290] INFO [PROXY.NET] ob_unix_net_vconnection.cpp:1243 [89116][Y0-7FB059DAA2E0] [lt=21] [dc=0] vip connect(protocol=6, fd=60, vid=0, vaddr={10.0.0.66:2883}, caddr={10.0.0.66:45843}, daddr={10.0.0.62:28830})
              [2023-10-20 14:16:16.870311] INFO [PROXY.CS] ob_mysql_client_session.cpp:365 [89116][Y0-7FB059DAA2E0] [lt=17] [dc=0] client session born(cs_id=16, proxy_sessid=0, is_local_connection=false, client_vc=0x7fb059da3d20, client_fd=60, client_addr="10.0.0.66:45843")
              [2023-10-20 14:16:16.870344] INFO [PROXY.CS] ob_mysql_client_session.cpp:236 [89116][Y0-7FB059DAA2E0] [lt=11] [dc=0] Starting new transaction using sm(cs_id=16, get_transact_count()=0, sm_id=21)


              [2023-10-20 14:16:16.874525] INFO [PROXY.SM] ob_mysql_sm.cpp:2098 [89116][Y0-7FB059DAA2E0] [lt=20] [dc=0] client login audit(client_addr={10.0.0.66:45843}, hsr.cluster_name_=obdemo, hsr.tenant_name_=obmysql, hsr.user_name_=tpcc, status="success")
              [2023-10-20 14:16:16.874886] INFO [PROXY.SS] ob_mysql_server_session.cpp:105 [89116][Y0-7FB059DAA2E0] [lt=13] [dc=0] server session born(ss_id=15878, server_ip={10.0.0.61:2881}, cs_id=16, proxy_sessid=0, server_type=1)
              [2023-10-20 14:16:16.875110] INFO [PROXY.TXN] ob_mysql_transact.cpp:3873 [89116][Y0-7FB059DAA2E0] [lt=30] [dc=0] succ to set proxy_sessid(cs_id=16, proxy_sessid=720576208556654611, server_ip={10.0.0.61:2881}, ss_id=15878, server_sessid=3221613756, is_proxy_mysql_client=false, ss_fd=62, client_addr="10.0.0.66:45843")

              上面这段日志记录了 OBProxy 客户端会话的建立过程。先建立前端会话连接(client session born),后建立后端会话连接(server session born)。其中前端会话中 caddr 是客户端连接信息, 但不一定是真正的客户端的地址。线上环境,这个有可能是 F5 的地址。daddr 是连接的 OBProxy 地址。日志里也记录了连接的集群、租户名和用户名,方便进一步识别应用的连接。最后 2 笔日志显式了后端会话建立,目标端口是 server_ip={10.0.0.61:2881} 。

              故意构造一笔慢查询。

                MySQL [tpccdb]> select sleep(3) from bmsql_warehouse;
                +----------+
                | sleep(3) |
                +----------+
                | 0 |

                | 0 |
                +----------+
                10 rows in set (30.16 sec)

                随后 OBProxy 里出现一笔日志。

                  [2023-10-20 14:22:22.555636] WARN [PROXY.SM] update_cmd_stats (ob_mysql_sm.cpp:8747) [89116][Y0-7FB059DAA2E0] [lt=15] [dc=0] Slow Query: ((client_ip={10.0.0.66:45843}, server_ip={10.0.0.61:2881}, obproxy_client_port={10.0.0.62:49399}, server_trace_id=YB420A00003D-000607FAEBE53389-0-0, route_type=ROUTE_TYPE_LEADER, user_name=tpcc, tenant_name=obmysql, cluster_name=obdemo, logic_database_name=, logic_tenant_name=, ob_proxy_protocol=0, cs_id=16, proxy_sessid=720576208556654611, ss_id=15878, server_sessid=3221613756, sm_id=21, cmd_size_stats={client_request_bytes:41, server_request_bytes:59, server_response_bytes:0, client_response_bytes:117}, cmd_time_stats={client_transaction_idle_time_us=321401977, client_request_read_time_us=41, client_request_analyze_time_us=38, cluster_resource_create_time_us=0, pl_lookup_time_us=0, pl_process_time_us=0, bl_lookup_time_us=0, bl_process_time_us=0, congestion_control_time_us=0, congestion_process_time_us=0, do_observer_open_time_us=12, server_connect_time_us=0, server_sync_session_variable_time_us=0, server_send_saved_login_time_us=0, server_send_use_database_time_us=0, server_send_session_variable_time_us=0, server_send_all_session_variable_time_us=0, server_send_last_insert_id_time_us=0, server_send_start_trans_time_us=0, server_send_xa_start_time_us=0, build_server_request_time_us=12, plugin_compress_request_time_us=0, prepare_send_request_to_server_time_us=82, server_request_write_time_us=29, server_process_request_time_us=30136850, server_response_read_time_us=8, plugin_decompress_response_time_us=26, server_response_analyze_time_us=10, ok_packet_trim_time_us=0, client_response_write_time_us=44, request_total_time_us=30137123}, sql=select sleep(3) from bmsql_warehouse)

                  这是常见的 Slow query 语句,字段 request_total_time_us 表示该 Query 的总耗时。因为超过了 OBProxy 的参数 slow_query_time_threshold 值,所以出现在 OBProxy 日志里。由于默认参数阈值是 500ms,很容易导致这类 Slow query 语句日志很多,影响排查。

                  接下来,客户端会话正常退出,发出 exit;命令。观察此时的 OBProxy 日志。

                    [2023-10-20 14:24:12.487129] INFO  [PROXY.SS] ob_mysql_server_session.cpp:145 [89116][Y0-7FB059DAA2E0] [lt=6] [dc=0] server session do_io_close((*this={ss_id:15878, server_sessid:3221613756, server_ip:{10.0.0.61:2881}, is_inited:true, magic:19136237, state:1, server_vc:0x7fb05c6c2720, client_session:{this:0x7fb05ab88030, is_proxy_mysql_client:false, is_waiting_trans_first_request:false, need_delete_cluster:false, is_first_dml_sql_got:false, vc_ready_killed:false, active:true, magic:19132429, conn_decrease:true, current_tid:89116,cs_id:16, proxy_sessid:720576208556654611, session_info:{is_inited:true, priv_info:{has_all_privilege:false, cs_id:16, user_priv_set:512, cluster_name:"obdemo", tenant_name:"obmysql", user_name:"tpcc"}, version:{common_hot_sys_var_version:0, common_sys_var_version:0,mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:2, sys_var_version:4, user_var_version:0, db_name_version:1, last_insert_id_version:0, sess_info_version:0}, hash_version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:0, sys_var_version:0, user_var_version:0, db_name_version:0, last_insert_id_version:0, sess_info_version:0}, val_hash:{common_hot_sys_var_hash:0, common_cold_sys_var_hash:0, mysql_hot_sys_var_hash:0, mysql_cold_sys_var_hash:0, hot_sys_var_hash:0, cold_sys_var_hash:0, user_var_hash:0}, global_vars_version:0, is_global_vars_changed:false, is_trans_specified:false, is_user_idc_name_set:false, is_read_consistency_set:false, idc_name:"", cluster_id:0, real_meta_cluster_name:"", safe_read_snapshot:1, syncing_safe_read_snapshot:0, route_policy:1, proxy_route_policy:3, user_identity:0, global_vars_version:0, is_read_only_user:false, is_request_follower_user:false}, dummy_ldc:{use_ldc:false, idc_name:"", item_count:3, site_start_index_array:[[0]0, [1]3, [2]3, [3]3], item_array:[[0]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.0.0.62:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:true}, [1]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.0.0.63:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:false}, [2]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.0.0.61:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:true}], pl:null, ts:0x7fb05aebfec0, readonly_exist_status:"READONLY_ZONE_NOT_EXIST"}, dummy_entry:0x7fb05c699340, server_state_version:2, cur_ss:0x7fb05a552590, bound_ss:null, lii_ss:null, cluster_resource:{this:0x7fb059594080, ref_count:10, is_inited:true, cluster_info_key:{cluster_name:{config_string:"obdemo"}, cluster_id:0}, cr_state:"CR_AVAIL", version:146461, last_access_time_ns:1697782582556246864, deleting_completed_thread_num:0, fetch_rslist_task_count:0, fetch_idc_list_task_count:0, last_idc_list_refresh_time_ns:1697779627710187785, last_rslist_refresh_time_ns:0, server_state_version:2}, client_vc:0x7fb059da3d20, using_ldg:false, trace_stats:NULL}, transact_count:6, server_trans_stat:1, session_info:{cap:2639, ob_server:, version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:2, sys_var_version:4, user_var_version:0, db_name_version:1, last_insert_id_version:0, sess_info_version:0}, val_hash:{common_hot_sys_var_hash:0, common_cold_sys_var_hash:0, mysql_hot_sys_var_hash:0, mysql_cold_sys_var_hash:0, hot_sys_var_hash:0, cold_sys_var_hash:0, user_var_hash:0}, checksum_switch:1, server_type:1}}, server_vc_=0x7fb05c6c2720, this=0x7fb05a552590)
                    [2023-10-20 14:24:12.487403] INFO [PROXY.SS] ob_mysql_server_session.cpp:177 [89116][Y0-7FB059DAA2E0] [lt=261] [dc=0] server session is closing(ss_id=15878, server_sessid=3221613756, server_ip={10.0.0.61:2881}, cs_id=16, proxy_sessid=720576208556654611)
                    [2023-10-20 14:24:12.487434] INFO [PROXY.TXN] ob_mysql_transact.cpp:4692 [89116][Y0-7FB059DAA2E0] [lt=8] [dc=0] [ObMysqlTransact::handle_server_connection_break](client_ip={10.0.0.66:45843}, server_ip={10.0.0.61:2881}, cs_id=16, proxy_sessid=720576208556654611, ss_id=0, server_sessid=0, sm_id=21, proxy_user_name=tpcc@obmysql#obdemo, database_name=tpccdb, server_state="INTERNAL_ERROR", request_cmd="Quit", sql_cmd="Quit", sql=)
                    [2023-10-20 14:24:12.487447] INFO [PROXY.SM] ob_mysql_sm.cpp:7619 [89116][Y0-7FB059DAA2E0] [lt=11] [dc=0] [setup_error_transfer] Now closing connection caused by COM_QUIT(sm_id=21, request_cmd="Quit", sql_cmd="Quit", sql=)
                    [2023-10-20 14:24:12.487459] WARN [PROXY.SM] update_stats (ob_mysql_sm.cpp:8873) [89116][Y0-7FB059DAA2E0] [lt=4] [dc=0] Slow transaction: ((client_ip={10.0.0.66:45843}, server_ip={10.0.0.61:2881}, server_trace_id=YB420A00003D-000607FAEBE53389-0-0, route_type=ROUTE_TYPE_LEADER, ob_proxy_protocol=0, cs_id=16, proxy_sessid=720576208556654611, ss_id=0, server_sessid=0, sm_id=21, state=14, trans_stats_={client_requests:6, server_responses:4, pl_lookup_retries:0, server_retries:0, client_request_bytes:150, server_request_bytes:179, server_response_bytes:0, client_response_bytes:2992, client_transaction_idle_time_us=439774381, client_process_request_time_us=352, client_request_read_time_us=212, client_request_analyze_time_us=194, cluster_resource_create_time_us=0, pl_lookup_time_us=8, pl_process_time_us=11, congestion_control_time_us=12, congestion_process_time_us=0, do_observer_open_time_us=43, server_connect_time_us=0, sync_session_variable_time_us=0, send_saved_login_time_us=0, send_use_database_time_us=0, send_session_vars_time_us=0, send_all_session_vars_time_us=0, send_last_insert_id_time_us=0, send_start_trans_time_us=0, send_xa_start_time_us=0, build_server_request_time_us=42, plugin_compress_request_time_us=0, prepare_send_request_to_server_time_us=277, server_request_write_time_us=139, server_process_request_time_us=30143475, server_response_read_time_us=75, plugin_decompress_response_time_us=82, server_response_analyze_time_us=83, ok_packet_trim_time_us=0, client_response_write_time_us=579, trans_time_us=469927320}, last_sql=)
                    [2023-10-20 14:24:12.487656] INFO  [PROXY.SS] ob_mysql_client_session.cpp:647 [89116][Y0-7FB059DAA2E0] [lt=26] [dc=0] client session do_io_close((*this={this:0x7fb05ab88030, is_proxy_mysql_client:false, is_waiting_trans_first_request:false, need_delete_cluster:false, is_first_dml_sql_got:false, vc_ready_killed:false, active:true, magic:19132429, conn_decrease:true, current_tid:89116, cs_id:16, proxy_sessid:720576208556654611, session_info:{is_inited:true, priv_info:{has_all_privilege:false, cs_id:16, user_priv_set:512, cluster_name:"obdemo", tenant_name:"obmysql", user_name:"tpcc"}, version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:2, sys_var_version:4, user_var_version:0, db_name_version:1, last_insert_id_version:0, sess_info_version:0}, hash_version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:0, sys_var_version:0, user_var_version:0, db_name_version:0, last_insert_id_version:0, sess_info_version:0}, val_hash:{common_hot_sys_var_hash:0, common_cold_sys_var_hash:0, mysql_hot_sys_var_hash:0, mysql_cold_sys_var_hash:0, hot_sys_var_hash:0, cold_sys_var_hash:0, user_var_hash:0}, global_vars_version:0, is_global_vars_changed:false, is_trans_specified:false, is_user_idc_name_set:false, is_read_consistency_set:false, idc_name:"", cluster_id:0, real_meta_cluster_name:"", safe_read_snapshot:1, syncing_safe_read_snapshot:0, route_policy:1, proxy_route_policy:3, user_identity:0, global_vars_version:0, is_read_only_user:false, is_request_follower_user:false}, dummy_ldc:{use_ldc:false, idc_name:"", item_count:3, site_start_index_array:[[0]0, [1]3, [2]3, [3]3], item_array:[[0]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.0.0.62:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:true}, [1]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.0.0.63:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:false}, [2]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.0.0.61:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:true}], pl:null, ts:0x7fb05aebfec0, readonly_exist_status:"READONLY_ZONE_NOT_EXIST"}, dummy_entry:0x7fb05c699340, server_state_version:2, cur_ss:null, bound_ss:null, lii_ss:null, cluster_resource:{this:0x7fb059594080, ref_count:10, is_inited:true, cluster_info_key:{cluster_name:{config_string:"obdemo"}, cluster_id:0}, cr_state:"CR_AVAIL", version:146461, last_access_time_ns:1697782582556246864, deleting_completed_thread_num:0, fetch_rslist_task_count:0, fetch_idc_list_task_count:0, last_idc_list_refresh_time_ns:1697779627710187785, last_rslist_refresh_time_ns:0, server_state_version:2}, client_vc:0x7fb059da3d20, using_ldg:false, trace_stats:NULL}, client_vc_=0x7fb059da3d20, this=0x7fb05ab88030)
                    [2023-10-20 14:24:12.487946] INFO [PROXY.CS] ob_mysql_client_session.cpp:93 [89116][Y0-7FB059DAA2E0] [lt=245] [dc=0] client session destroy(cs_id=16, proxy_sessid=720576208556654611, client_vc=NULL)
                    [2023-10-20 14:24:12.488015] INFO [PROXY.SM] ob_mysql_sm.cpp:8179 [89116][Y0-7FB059DAA2E0] [lt=11] [dc=0] deallocating sm(sm_id=21)

                    分析发现:

                    • 连接正常退出时,日志里有 COM_QUIT 事件。

                    • 正常退出时,日志里显示的调用顺序:先有 `server session do_io_close` 后有 `client session do_io_close` 并且有 `deallocating sm` 。

                    场景二:客户端会话正常建立和异常退出。

                    客户端相关过程和信息如下。

                      [root@ora12c ~]# mysql -h10.0.0.62 -utpcc@obmysql#obdemo -P28830 -pabcABC123 -c -A tpccdb


                      MySQL [tpccdb]> select * from bmsql_warehouse;
                      +------+------------+--------+------------+---------------------+----------------------+----------------------+---------+-----------+
                      | w_id | w_ytd | w_tax | w_name | w_street_1 | w_street_2 | w_city | w_state | w_zip |
                      +------+------------+--------+------------+---------------------+----------------------+----------------------+---------+-----------+
                      | 1 | 4062863.16 | 0.1873 | vPJUyrE7G | yjDKpFuiShMAoNA | fOrVS6Q7iAzP | xvNWSBM8SRz9IQ4 | NX | 637211111 |

                      | 10 | 3796057.29 | 0.0304 | DP98DH32zb | boSEcSBN8GcAm6se8M | ozsg0xZMeAu1vn | EgXsvSzIwucP | PJ | 620111111 |
                      +------+------------+--------+------------+---------------------+----------------------+----------------------+---------+-----------+
                      10 rows in set (0.11 sec)


                      MySQL [tpccdb]> select last_trace_id();
                      +-----------------------------------+
                      | last_trace_id() |
                      +-----------------------------------+
                      | YB420A00003D-000607FAEEE531BC-0-0 |
                      +-----------------------------------+
                      1 row in set (0.00 sec)


                      MySQL [tpccdb]> show processlist;
                      +------+---------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+
                      | Id | Tenant | User | Host | db | trans_count | svr_session_count | state | tid | pid |
                      +------+---------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+
                      | 15 | obmysql | tpcc | 10.0.0.66:40207 | tpccdb | 0 | 1 | MCS_ACTIVE_READER | 89116 | 89116 |
                      +------+---------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+
                      1 row in set (0.02 sec)


                      MySQL [tpccdb]> show proxysession;
                      +--------------------+------+---------+---------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+-----------+
                      | proxy_sessid | Id | Cluster | Tenant | User | Host | db | trans_count | svr_session_count | state | tid | pid | using_ssl |
                      +--------------------+------+---------+---------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+-----------+
                      | 720576208556654610 | 15 | obdemo | obmysql | tpcc | 10.0.0.66:40207 | tpccdb | 0 | 1 | MCS_ACTIVE_READER | 89116 | 89116 | 0 |
                      +--------------------+------+---------+---------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+-----------+
                      1 row in set (0.03 sec)


                      MySQL [tpccdb]> show proxysession attribute 15;
                      +----------------------------------+---------------------+----------------+
                      | attribute_name | value | info |
                      +----------------------------------+---------------------+----------------+
                      | proxy_sessid | 720576208556654610 | cs common |
                      | cs_id | 15 | cs common |
                      | cluster | obdemo | cs common |
                      | tenant | obmysql | cs common |
                      | user | tpcc | cs common |
                      | host_ip | 10.0.0.66 | cs common |
                      | host_port | 40207 | cs common |
                      | db | tpccdb | cs common |
                      | total_trans_cnt | 0 | cs common |
                      | svr_session_cnt | 1 | cs common |
                      | active | true | cs common |
                      | read_state | MCS_ACTIVE_READER | cs common |
                      | tid | 89116 | cs common |
                      | pid | 89116 | cs common |
                      | idc_name | | cs common |
                      | modified_time | 0 | cs stat |
                      | reported_time | 0 | cs stat |
                      | hot_sys_var_version | 2 | cs var version |
                      | sys_var_version | 4 | cs var version |
                      | user_var_version | 0 | cs var version |
                      | last_insert_id_version | 0 | cs var version |
                      | db_name_version | 1 | cs var version |
                      | server_ip | 10.0.0.61 | last used ss |
                      | server_port | 2881 | last used ss |
                      | server_sessid | 3221714274 | last used ss |
                      | ss_id | 15580 | last used ss |
                      | state | MSS_KA_CLIENT_SLAVE | last used ss |
                      | transact_count | 4 | last used ss |
                      | server_trans_stat | 0 | last used ss |
                      | hot_sys_var_version | 2 | last used ss |
                      | sys_var_version | 4 | last used ss |
                      | user_var_version | 0 | last used ss |
                      | last_insert_id_version | 0 | last used ss |
                      | db_name_version | 1 | last used ss |
                      | is_checksum_supported | 1 | last used ss |
                      | is_safe_read_weak_supported | 0 | last used ss |
                      | is_checksum_switch_supported | 1 | last used ss |
                      | checksum_switch | 1 | last used ss |
                      | enable_extra_ok_packet_for_stats | 1 | last used ss |
                      +----------------------------------+---------------------+----------------+
                      39 rows in set (0.00 sec)

                      OBProxy 端收集的信息如下。

                        MySQL [(none)]> show processlist;
                        +--------+----------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+
                        | Id | Tenant | User | Host | db | trans_count | svr_session_count | state | tid | pid |
                        +--------+----------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+
                        | 262147 | proxysys | root | 127.0.0.1:45170 | NULL | 0 | 0 | MCS_ACTIVE_READER | 89127 | 89116 |
                        | 15 | obmysql | tpcc | 10.0.0.66:40207 | tpccdb | 0 | 1 | MCS_ACTIVE_READER | 89116 | 89116 |
                        +--------+----------+------+-----------------+--------+-------------+-------------------+-------------------+-------+-------+
                        2 rows in set (0.00 sec)


                        SELECT
                        tenant, id, user, host, db, command, time, state, info, svr_ip, proxy_sessid, user_client_ip
                        FROM
                        oceanbase.__all_virtual_processlist
                        WHERE tenant='obmysql'
                        and user='TPCC'
                        ORDER BY
                        user_client_ip;

                        这个视图也非常有用。上面记录表示有客户端从 66 通过 62 上的代理间接连接到 OB 节点 61 。39741 是 OBProxy 维护后端会话的本地临时端口。

                        OBProxy 日志里的连接建立相关日志。

                          [2023-10-20 13:47:36.727119] INFO [PROXY.SOCK] ob_connection.cpp:350 [89148][Y0-0] [lt=50] [dc=0] connection accepted(client={10.0.0.66:40207}, server={0.0.0.0:28830}, accepted_fd=58, listen_fd=57)


                          [2023-10-20 13:47:36.727478] INFO [PROXY.NET] ob_unix_net_vconnection.cpp:1243 [89116][Y0-7FB05A5A83A0] [lt=32] [dc=0] vip connect(protocol=6, fd=58, vid=0, vaddr={10.0.0.66:2883}, caddr={10.0.0.66:40207}, daddr={10.0.0.62:28830})
                          [2023-10-20 13:47:36.727505] INFO [PROXY.CS] ob_mysql_client_session.cpp:365 [89116][Y0-7FB05A5A83A0] [lt=22] [dc=0] client session born(cs_id=15, proxy_sessid=0, is_local_connection=false, client_vc=0x7fb05a598120, client_fd=58, client_addr="10.0.0.66:40207")
                          [2023-10-20 13:47:36.727531] INFO [PROXY.CS] ob_mysql_client_session.cpp:236 [89116][Y0-7FB05A5A83A0] [lt=12] [dc=0] Starting new transaction using sm(cs_id=15, get_transact_count()=0, sm_id=20)


                          [2023-10-20 13:47:36.731806] INFO [PROXY.SM] ob_mysql_sm.cpp:2098 [89116][Y0-7FB05A5A83A0] [lt=38] [dc=0] client login audit(client_addr={10.0.0.66:40207}, hsr.cluster_name_=obdemo, hsr.tenant_name_=obmysql, hsr.user_name_=tpcc, status="success")
                          [2023-10-20 13:47:36.732324] INFO [PROXY.SS] ob_mysql_server_session.cpp:105 [89116][Y0-7FB05A5A83A0] [lt=16] [dc=0] server session born(ss_id=15580, server_ip={10.0.0.61:2881}, cs_id=15, proxy_sessid=0, server_type=1)
                          [2023-10-20 13:47:36.735767] INFO [PROXY.TXN] ob_mysql_transact.cpp:3873 [89116][Y0-7FB05A5A83A0] [lt=27] [dc=0] succ to set proxy_sessid(cs_id=15, proxy_sessid=720576208556654610, server_ip={10.0.0.61:2881}, ss_id=15580, server_sessid=3221714274, is_proxy_mysql_client=false, ss_fd=62, client_addr="10.0.0.66:40207")

                          模拟客户端杀会话。真实的业务场景里杀连接的通常是防火墙或者 F5/A10/SLB/LVS 等软硬件负载均衡产品,杀连接的原因是连接空闲超出这些设备的最大容忍设置。而 OBProxy 作为有责任的代理,从来不主动杀连接(前端会话和后端会话)(OB 会话空闲超时会杀连接)。

                            [root@ora12c ~]# kill -9 4013
                            [root@ora12c ~]# netstat -antlp |grep 10.0.0.62
                            tcp 0 0 172.27.0.2:40207 10.0.0.62:28830 TIME_WAIT -
                            [root@ora12c ~]#

                            会话杀掉后,TCP 连接状态会很快变为 TIME_WAIT,然后很快连接就释放了。线上清空如果网络链路有问题,可能会观察到大量的 TIME_WAIT 状态的连接。

                            观察 OBProxy.log 日志里连接断开的日志。

                              [2023-10-20 14:01:48.980972] INFO [PROXY.SM] ob_mysql_sm.cpp:524 [89116][Y0-7FB05A5A83A0] [lt=7] [dc=0] ObMysqlSM::state_client_request_read(event="VC_EVENT_EOS", sm_id=20, client_vc=0x7fb05a598120)
                              [2023-10-20 14:01:48.981009] INFO [PROXY.TXN] ob_mysql_transact.cpp:80 [89116][Y0-7FB05A5A83A0] [lt=35] [dc=0] [ObMysqlTransact::bad_request] parser marked request bad
                              [2023-10-20 14:01:48.981125] WARN  [PROXY.SM] update_stats (ob_mysql_sm.cpp:8873) [89116][Y0-7FB05A5A83A0] [lt=58] [dc=0] Slow transaction: ((client_ip={10.0.0.66:40207}, server_ip={10.0.0.61:2881}, server_trace_id=, route_type=ROUTE_TYPE_LEADER, ob_proxy_protocol=0, cs_id=15, proxy_sessid=720576208556654610, ss_id=0, server_sessid=0, sm_id=20, state=9, trans_stats_={client_requests:8, server_responses:2, pl_lookup_retries:0, server_retries:0, client_request_bytes:217, server_request_bytes:97, server_response_bytes:0, client_response_bytes:7001, client_transaction_idle_time_us=825806214, client_process_request_time_us=366, client_request_read_time_us=304, client_request_analyze_time_us=281, cluster_resource_create_time_us=0, pl_lookup_time_us=8, pl_process_time_us=11, congestion_control_time_us=12, congestion_process_time_us=0, do_observer_open_time_us=25, server_connect_time_us=0, sync_session_variable_time_us=0, send_saved_login_time_us=0, send_use_database_time_us=0, send_session_vars_time_us=0, send_all_session_vars_time_us=0, send_last_insert_id_time_us=0, send_start_trans_time_us=0, send_xa_start_time_us=0, build_server_request_time_us=22, plugin_compress_request_time_us=0, prepare_send_request_to_server_time_us=161, server_request_write_time_us=62, server_process_request_time_us=15910, server_response_read_time_us=16, plugin_decompress_response_time_us=56, server_response_analyze_time_us=20, ok_packet_trim_time_us=0, client_response_write_time_us=1360, trans_time_us=825858946}, last_sql=show proxysession attribute 15)
                              [2023-10-20 14:01:48.981281] INFO  [PROXY.SS] ob_mysql_client_session.cpp:647 [89116][Y0-7FB05A5A83A0] [lt=46] [dc=0] client session do_io_close((*this={this:0x7fb05ab88030, is_proxy_mysql_client:false, is_waiting_trans_first_request:false, need_delete_cluster:false,is_first_dml_sql_got:false, vc_ready_killed:false, active:true, magic:19132429, conn_decrease:true, current_tid:89116, cs_id:15, proxy_sessid:720576208556654610, session_info:{is_inited:true, priv_info:{has_all_privilege:false, cs_id:15, user_priv_set:512, cluster_name:"obdemo", tenant_name:"obmysql", user_name:"tpcc"}, version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:2, sys_var_version:4, user_var_version:0, db_name_version:1, last_insert_id_version:0, sess_info_version:0}, hash_version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:0, sys_var_version:0, user_var_version:0, db_name_version:0, last_insert_id_version:0, sess_info_version:0}, val_hash:{common_hot_sys_var_hash:0, common_cold_sys_var_hash:0, mysql_hot_sys_var_hash:0, mysql_cold_sys_var_hash:0, hot_sys_var_hash:0, cold_sys_var_hash:0, user_var_hash:0}, global_vars_version:0, is_global_vars_changed:false, is_trans_specified:false, is_user_idc_name_set:false, is_read_consistency_set:false, idc_name:"", cluster_id:0, real_meta_cluster_name:"", safe_read_snapshot:1, syncing_safe_read_snapshot:0, route_policy:1, proxy_route_policy:3, user_identity:0, global_vars_version:0, is_read_only_user:false, is_request_follower_user:false}, dummy_ldc:{use_ldc:false, idc_name:"", item_count:3, site_start_index_array:[[0]0, [1]3, [2]3, [3]3], item_array:[[0]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.0.0.62:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:true}, [1]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.0.0.61:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:true}, [2]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.0.0.63:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:false}], pl:null, ts:0x7fb05aebfec0, readonly_exist_status:"READONLY_ZONE_NOT_EXIST"}, dummy_entry:0x7fb05c699340, server_state_version:2, cur_ss:null, bound_ss:0x7fb05a552590, lii_ss:null, cluster_resource:{this:0x7fb059594080, ref_count:10, is_inited:true, cluster_info_key:{cluster_name:{config_string:"obdemo"}, cluster_id:0}, cr_state:"CR_AVAIL", version:146461, last_access_time_ns:1697780883121620704, deleting_completed_thread_num:0, fetch_rslist_task_count:0, fetch_idc_list_task_count:0, last_idc_list_refresh_time_ns:1697779627710187785, last_rslist_refresh_time_ns:0, server_state_version:2}, client_vc:0x7fb05a598120, using_ldg:false, trace_stats:NULL}, client_vc_=0x7fb05a598120, this=0x7fb05ab88030)
                              [2023-10-20 14:01:48.981411] INFO [PROXY.SS] ob_mysql_server_session.cpp:145 [89116][Y0-7FB05A5A83A0] [lt=124] [dc=0] server session do_io_close((*this={ss_id:15580, server_sessid:3221714274, server_ip:{10.0.0.61:2881}, is_inited:true, magic:19136237, state:2, server_vc:0x7fb05c6c1870, client_session:{this:0x7fb05ab88030, is_proxy_mysql_client:false, is_waiting_trans_first_request:false, need_delete_cluster:false, is_first_dml_sql_got:false, vc_ready_killed:false, active:false, magic:19132429, conn_decrease:true, current_tid:89116, cs_id:15, proxy_sessid:720576208556654610, session_info:{is_inited:true, priv_info:{has_all_privilege:false, cs_id:15, user_priv_set:512, cluster_name:"obdemo", tenant_name:"obmysql", user_name:"tpcc"}, version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:2, sys_var_version:4, user_var_version:0, db_name_version:1, last_insert_id_version:0, sess_info_version:0}, hash_version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:0, sys_var_version:0, user_var_version:0, db_name_version:0, last_insert_id_version:0, sess_info_version:0}, val_hash:{common_hot_sys_var_hash:0, common_cold_sys_var_hash:0, mysql_hot_sys_var_hash:0, mysql_cold_sys_var_hash:0, hot_sys_var_hash:0, cold_sys_var_hash:0, user_var_hash:0}, global_vars_version:0, is_global_vars_changed:false, is_trans_specified:false, is_user_idc_name_set:false, is_read_consistency_set:false, idc_name:"", cluster_id:0, real_meta_cluster_name:"", safe_read_snapshot:1, syncing_safe_read_snapshot:0, route_policy:1, proxy_route_policy:3, user_identity:0, global_vars_version:0, is_read_only_user:false, is_request_follower_user:false}, dummy_ldc:{use_ldc:false, idc_name:"", item_count:3, site_start_index_array:[[0]0, [1]3, [2]3, [3]3], item_array:[[0]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.0.0.62:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:true}, [1]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.0.0.61:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:true}, [2]{idc_type:"SAME_IDC", zone_type:"ReadWrite", replica:{server:"10.0.0.63:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, is_merging:false, is_partition_server:false, is_force_congested:false, is_used:false}], pl:null, ts:0x7fb05aebfec0, readonly_exist_status:"READONLY_ZONE_NOT_EXIST"}, dummy_entry:0x7fb05c699340, server_state_version:2, cur_ss:null, bound_ss:0x7fb05a552590, lii_ss:null, cluster_resource:{this:0x7fb059594080, ref_count:10, is_inited:true, cluster_info_key:{cluster_name:{config_string:"obdemo"}, cluster_id:0}, cr_state:"CR_AVAIL", version:146461, last_access_time_ns:1697780883121620704, deleting_completed_thread_num:0, fetch_rslist_task_count:0, fetch_idc_list_task_count:0, last_idc_list_refresh_time_ns:1697779627710187785, last_rslist_refresh_time_ns:0, server_state_version:2}, client_vc:0x7fb05a598120, using_ldg:false, trace_stats:NULL}, transact_count:4, server_trans_stat:0, session_info:{cap:2639, ob_server:, version:{common_hot_sys_var_version:0, common_sys_var_version:0, mysql_hot_sys_var_version:0, mysql_sys_var_version:0, hot_sys_var_version:2, sys_var_version:4, user_var_version:0, db_name_version:1, last_insert_id_version:0, sess_info_version:0}, val_hash:{common_hot_sys_var_hash:0, common_cold_sys_var_hash:0, mysql_hot_sys_var_hash:0, mysql_cold_sys_var_hash:0, hot_sys_var_hash:0, cold_sys_var_hash:0, user_var_hash:0}, checksum_switch:1, server_type:1}}, server_vc_=0x7fb05c6c1870, this=0x7fb05a552590)
                              [2023-10-20 14:01:48.981592] INFO [PROXY.SS] ob_mysql_server_session.cpp:177 [89116][Y0-7FB05A5A83A0] [lt=95] [dc=0] server session is closing(ss_id=15580, server_sessid=3221714274, server_ip={10.0.0.61:2881}, cs_id=15, proxy_sessid=720576208556654610)

                              分析发现:

                              • 客户端异常退出,会有个事件关键字 VC_EVENT_EOS 。

                              • 客户端异常退出时 OBProxy.lo 先有 `client session do_io_close` 后有 `server session do_io_close`。

                              • 没有 deallocating sm 日志。

                              OB SQL 审计视图

                              场景三:应用只读查询场景。

                              通过 BenchmarkSQL 压测 OB ,开启 2 个并发,跑 1 笔交易就停止,便于在 OB 端抓取相应的 SQL 和日志。下面是通过 GV$OB_SQL_AUDIT 抓取的 BenchmarkSQL 其中一个场景会话的 SQL 。

                              GV$OB_SQL_AUDIT 信息也很丰富,对推测业务场景有帮助。分析如下:

                              • 这个应用会话主要是发起了两个查询。应用的 OB JDBC 驱动会发出四条 SQL。

                              • 列 `SID` 是会话 ID,具体是指 OBProxy 跟 OBServer 节点的会话在 OBServer 端的ID。这个可以用于 OBProxy 日志里定位相关会话记录(OBProxy.log 里搜索 server_sessid )。

                              • 列 `SVR_IP` 是这条 SQL 被路由到的节点,也是生成执行计划并执行的节点, 列 `PLAN_TYPE` 是执行计划类型。0 是非查询和修改类 SQL,1 表示本地执行计划,2 表示远程执行,3表示分布式执行计划。

                              • 第 3 条 SQL 是 JDBC 驱动获取 OB 时区设置的。3.2  OB 部分版本这个视图为空,可能导致应用时区设置不当。

                              • 列 `TRANSACTION_HASH` 为 0 表示没有开启事务;大于 0 表示事务标识。第 5 和 6 两条 SQL 都是只读查询,没有开启事务(具体由 OBServer 端节点参数 `ob_proxy_readonly_transaction_routing_policy` 控制。`false` 表示只读查询,不开启事务)。

                              • 列 `USER_CLIENT_IP` 显式的是 SQL 请求客户端的真实 IP。有些应用是 K8S 里的容器,有时候能看到容器的真实 IP,也可能看到的是容器所在物理机的 IP,这取决于容器的网络模式。这个特点留待以后再研究。

                              对应的 OBProxy.log 日志如下。

                                [2023-10-18 19:51:39.741420] INFO  [PROXY.NET] ob_mysql_session_accept.cpp:36 [89116][Y0-7FB059DA83A0] [lt=7] [dc=0] [ObMysqlSessionAccept:main_event] accepted connection(netvc=0x7fb059d98120, client_ip={10.0.0.65:39364})
                                [2023-10-18 19:51:39.741626] INFO [PROXY.NET] ob_unix_net_vconnection.cpp:1243 [89116][Y0-7FB059DA83A0] [lt=22] [dc=0] vip connect(protocol=6, fd=62, vid=0, vaddr={10.0.0.65:2883}, caddr={10.0.0.65:39364}, daddr={10.0.0.62:28830})
                                [2023-10-18 19:51:39.741648] INFO  [PROXY.CS] ob_mysql_client_session.cpp:365 [89116][Y0-7FB059DA83A0] [lt=18] [dc=0] client session born(cs_id=5, proxy_sessid=0, is_local_connection=false, client_vc=0x7fb059d98120, client_fd=62, client_addr="10.0.0.65:39364")
                                [2023-10-18 19:51:39.778824] INFO  [PROXY] ob_vip_tenant_conn.cpp:144 [89116][Y0-7FB059DA83A0] [lt=15] [dc=0] vip tenant connect is not in vip_tenant_conn_cache(key_name=oboracle#obdemo|, ret=-4018)
                                [2023-10-18 19:51:39.778849] INFO  [PROXY.SM] ob_mysql_sm.cpp:2098 [89116][Y0-7FB059DA83A0] [lt=23] [dc=0] client login audit(client_addr={10.0.0.65:39364}, hsr.cluster_name_=obdemo, hsr.tenant_name_=oboracle, hsr.user_name_=tpcc, status="success")

                                日志简单分析如下:

                                • 这是 OBProxy 收到监听开始建连接日志。

                                • 地址 `vaddr={10.0.0.65:2883}` 意思不明,65 是客户端,不是 OBProxy 服务器, 上面没有 2883 这个监听。这个估计是代码错误。

                                • 地址 `caddr={10.0.0.65:39364}` 是客户端地址和端口。

                                • 地址 `daddr={10.0.0.62:28830}` 是目标端(OBProxy)连接地址和端口。

                                • OBProxy 跟应用客户端连接建立,标识 `cs_id=5`, 跟 OBServer 端连接还没有建立,所以 `proxy_sessid=0` 。

                                • 错误 `ret=-4018` 跟 OBProxy 一个用法有关,这个错误可以忽略。后面这类错误也不少。

                                  [2023-10-18 19:51:39.741671] INFO [PROXY.CS] ob_mysql_client_session.cpp:236 [89116][Y0-7FB059DA83A0] [lt=10] [dc=0] Starting new transaction using sm(cs_id=5, get_transact_count()=0, sm_id=7)

                                  [2023-10-18 19:51:39.741703] WARN [PROXY] get_cluster_name (ob_session_field_mgr.cpp:617) [89116][Y0-7FB059DA83A0] [lt=7] [dc=0] fail to get cluster name(ret=-4018)
                                  [2023-10-18 19:51:39.778824] INFO [PROXY] ob_vip_tenant_conn.cpp:144 [89116][Y0-7FB059DA83A0] [lt=15] [dc=0] vip tenant connect is not in vip_tenant_conn_cache(key_name=oboracle#obdemo|, ret=-4018)
                                  [2023-10-18 19:51:39.778849] INFO [PROXY.SM] ob_mysql_sm.cpp:2098 [89116][Y0-7FB059DA83A0] [lt=23] [dc=0] client login audit(client_addr={10.0.0.65:39364}, hsr.cluster_name_=obdemo, hsr.tenant_name_=oboracle, hsr.user_name_=tpcc, status="success")
                                  [2023-10-18 19:51:39.785488] INFO [PROXY.SM] ob_mysql_sm.cpp:8726 [89116][Y0-7FB059DA83A0] [lt=11] [dc=0] Trace Stat: ((client_ip={*Not IP address [0]*:0}, server_ip={10.0.0.63:2881}, obproxy_client_port={10.0.0.62:40963}, server_trace_id=, route_type=ROUTE_TYPE_NONPARTITION_UNMERGE_LOCAL, user_name=proxyro, tenant_name=sys, cluster_name=obdemo, logic_database_name=, logic_tenant_name=, ob_proxy_protocol=0, cs_id=2147483652, proxy_sessid=720576208556654595, ss_id=48, server_sessid=3222012880, sm_id=4, cmd_size_stats={client_request_bytes:256, server_request_bytes:274, server_response_bytes:0, client_response_bytes:2844}, cmd_time_stats={client_transaction_idle_time_us=0, client_request_read_time_us=26, client_request_analyze_time_us=26, cluster_resource_create_time_us=0, pl_lookup_time_us=7, pl_process_time_us=20, bl_lookup_time_us=0, bl_process_time_us=0, congestion_control_time_us=4, congestion_process_time_us=0, do_observer_open_time_us=3, server_connect_time_us=0, server_sync_session_variable_time_us=0, server_send_saved_login_time_us=0, server_send_use_database_time_us=0, server_send_session_variable_time_us=0, server_send_all_session_variable_time_us=0, server_send_last_insert_id_time_us=0, server_send_start_trans_time_us=0, server_send_xa_start_time_us=0, build_server_request_time_us=6, plugin_compress_request_time_us=0, prepare_send_request_to_server_time_us=74, server_request_write_time_us=54, server_process_request_time_us=6190, server_response_read_time_us=8, plugin_decompress_response_time_us=25, server_response_analyze_time_us=8, ok_packet_trim_time_us=0, client_response_write_time_us=89, request_total_time_us=6514}, sql=SELECT *+READ_CONSISTENCY(WEAK)*/ * FROM oceanbase.__all_virtual_proxy_schema WHERE tenant_name = 'oboracle' AND database_name = 'oceanbase' AND table_name = '__all_dummy' AND sql_port > 0 ORDER BY partition_id ASC, role ASC LIMIT 9223372036854775807)
                                  [2023-10-18 19:51:39.785611] INFO [PROXY] ob_route_utils.cpp:414 [89116][Y0-7FB059DA83A0] [lt=66] [dc=0] this is all_dummy table, use tenant servers((entry={{this:0x7fb05c698d10, ref_count:1, cr_version:87, cr_id:0, create_time_us:1697629899785588, last_valid_time_us:1697629899785397, last_access_time_us:1697629899785397, last_update_time_us:0, schema_version:336, tenant_version:0, time_for_expired:0, state:"AVAIL"}, this:0x7fb05c698d10, is_inited:true, is_dummy_entry:true, is_entry_from_rslist:false, is_empty_entry_allowed:false, is_need_force_flush:false, has_dup_replica:false, cr_id:0, name:{cluster_name:"obdemo", tenant_name:"oboracle", database_name:"oceanbase", package_name:"", table_name:"__all_dummy"}, table_id:1102810162659463, table_type:"UNKNOWN", part_num:1, replica_num:3, buf_len:34, buf_start:0x7fb05c698e18, tenant_servers:{this:0x7fb05aebfe60, is_inited:true, server_count:3, replica_count:3, partition_count:1, next_partition_idx:0, server_array:0x7fb05d5c4810, server_array_:[[0]{server:"10.0.0.63:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, [1]{server:"10.0.0.61:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, [2]{server:"10.0.0.62:2881",is_dup_replica:false, role:"FOLLOWER", type:"FULL"}]}}, server_list=[[0]{server:"10.0.0.61:2881", is_dup_replica:false, role:"LEADER", type:"FULL"}, [1]{server:"10.0.0.62:2881", is_dup_replica:false, role:"FOLLOWER", type:"FULL"}, [2]{server:"10.0.0.63:2881",is_dup_replica:false, role:"FOLLOWER", type:"FULL"}])
                                  [2023-10-18 19:51:39.786431] INFO [PROXY.SS] ob_mysql_server_session.cpp:105 [89116][Y0-7FB059DA83A0] [lt=75] [dc=0] server session born(ss_id=49, server_ip={10.0.0.61:2881}, cs_id=5, proxy_sessid=0, server_type=1)
                                  [2023-10-18 19:51:39.792219] INFO [PROXY.TXN] ob_mysql_transact.cpp:3873 [89116][Y0-7FB059DA83A0] [lt=18] [dc=0] succ to set proxy_sessid(cs_id=5, proxy_sessid=720576208556654599, server_ip={10.0.0.61:2881}, ss_id=49, server_sessid=3221663711, is_proxy_mysql_client=false, ss_fd=59, client_addr="10.0.0.65:39364")

                                  这段日志主要是 OBProxy 连接 OB 集群获取 RS 服务的节点列表,SQL 如下。

                                    SELECT /*+READ_CONSISTENCY(WEAK)*/ tenant_name,database_name,table_name,partition_id,svr_ip,ROLE FROM oceanbase.__all_virtual_proxy_schema WHERE tenant_name = 'oboracle' AND database_name = 'oceanbase' AND table_name = '__all_dummy' AND sql_port > 0 ORDER BY partition_id ASC, role ASC LIMIT 9223372036854775807;

                                    查询成功后,OBProxy 跟 OBServer 后端会话创建成功(ss_id=49)。

                                    有时候会在这个环节看到 Slow query 示例,这个多出现在集群不稳定或者压力太大性能变差或者网络变差时。

                                      [2023-10-18 19:51:39.794179] INFO [PROXY.SM] ob_mysql_sm.cpp:8726 [89116][Y0-7FB059DA83A0] [lt=14] [dc=0] Slow Query: ((client_ip={10.0.0.65:39364}, server_ip={10.0.0.61:2881}, obproxy_client_port={10.0.0.62:56751}, server_trace_id=Y0-0, route_type=ROUTE_TYPE_MAX, user_name=tpcc, tenant_name=oboracle, cluster_name=obdemo, logic_database_name=, logic_tenant_name=, ob_proxy_protocol=0, cs_id=5, proxy_sessid=720576208556654599, ss_id=49,server_sessid=3221663711, sm_id=7, cmd_size_stats={client_request_bytes:364, server_request_bytes:298, server_response_bytes:152, client_response_bytes:152}, cmd_time_stats={client_transaction_idle_time_us=36535, client_request_read_time_us=305, client_request_analyze_time_us=283,cluster_resource_create_time_us=0, pl_lookup_time_us=6956, pl_process_time_us=45, bl_lookup_time_us=0, bl_process_time_us=0, congestion_control_time_us=5, congestion_process_time_us=0, do_observer_open_time_us=556, server_connect_time_us=503, server_sync_session_variable_time_us=0, server_send_saved_login_time_us=0, server_send_use_database_time_us=0, server_send_session_variable_time_us=0, server_send_all_session_variable_time_us=0, server_send_last_insert_id_time_us=0, server_send_start_trans_time_us=0, server_send_xa_start_time_us=0, build_server_request_time_us=3, plugin_compress_request_time_us=0, prepare_send_request_to_server_time_us=13779, server_request_write_time_us=55, server_process_request_time_us=1298, server_response_read_time_us=12, plugin_decompress_response_time_us=0, server_response_analyze_time_us=7, ok_packet_trim_time_us=392, client_response_write_time_us=435, request_total_time_us=15595}, sql=COM_LOGIN)

                                      last_sql 是 COM_LOGIN 表示这个慢 SQL 是 OBServer 登录。通常不应该是慢 SQL。当登录都慢的时候,下面接着获取时区的 SQL 也容易成为慢 SQL。这个查询同时也触发了后端新会话的建立(ss_id=50,OBServer 节点跟前面不一样)。所以,这里可以看到 OBProxy 一个前端会话建立 2 个后端会话的过程。

                                        [2023-10-18 19:51:39.840578] INFO [PROXY.TXN] ob_mysql_transact.cpp:302 [89116][Y0-7FB059DA83A0] [lt=28] [dc=0] user first dml got(cs_id=5, sql=select * from SYS.ALL_VIRTUAL_TENANT_TIME_ZONE_NAME_REAL_AGENT where rownum < 2;)
                                        [2023-10-18 19:51:39.840902] INFO [PROXY.SS] ob_mysql_server_session.cpp:105 [89116][Y0-7FB059DA83A0] [lt=31] [dc=0] server session born(ss_id=50, server_ip={10.0.0.62:2881}, cs_id=5, proxy_sessid=720576208556654599, server_type=2)
                                        [2023-10-18 19:51:39.860309] INFO [PROXY.SM] ob_mysql_sm.cpp:8726 [89116][Y0-7FB059DA83A0] [lt=42] [dc=0] Slow Query: ((client_ip={10.0.0.65:39364}, server_ip={10.0.0.62:2881}, obproxy_client_port={10.0.0.62:44769}, server_trace_id=, route_type=ROUTE_TYPE_MAX, user_name=tpcc, tenant_name=oboracle, cluster_name=obdemo, logic_database_name=, logic_tenant_name=, ob_proxy_protocol=0, cs_id=5, proxy_sessid=720576208556654599, ss_id=50, server_sessid=3221949588, sm_id=7, cmd_size_stats={client_request_bytes:85, server_request_bytes:103, server_response_bytes:0, client_response_bytes:582}, cmd_time_stats={client_transaction_idle_time_us=0, client_request_read_time_us=41, client_request_analyze_time_us=39, cluster_resource_create_time_us=0, pl_lookup_time_us=6, pl_process_time_us=63, bl_lookup_time_us=0, bl_process_time_us=0, congestion_control_time_us=7, congestion_process_time_us=0, do_observer_open_time_us=295, server_connect_time_us=196, server_sync_session_variable_time_us=14955, server_send_saved_login_time_us=3014, server_send_use_database_time_us=11081, server_send_session_variable_time_us=860, server_send_all_session_variable_time_us=0, server_send_last_insert_id_time_us=0, server_send_start_trans_time_us=0, server_send_xa_start_time_us=0, build_server_request_time_us=68,plugin_compress_request_time_us=0, prepare_send_request_to_server_time_us=15987, server_request_write_time_us=33, server_process_request_time_us=3620, server_response_read_time_us=6, plugin_decompress_response_time_us=20, server_response_analyze_time_us=11,ok_packet_trim_time_us=0, client_response_write_time_us=43, request_total_time_us=19769}, sql=select * from SYS.ALL_VIRTUAL_TENANT_TIME_ZONE_NAME_REAL_AGENT where rownum < 2;)

                                        接着 BenchmarkSQL 正常断开连接,对应 OBProxy.log 里的日志如下。先关闭两个后端会话,再关闭客户端会话。

                                          [2023-10-18 19:51:42.252010] INFO [PROXY.SM] ob_mysql_sm.cpp:524 [89116][Y0-7FB059DA83A0] [lt=91] [dc=0] ObMysqlSM::state_client_request_read(event="VC_EVENT_EOS", sm_id=7, client_vc=0x7fb059d98120)
                                          [2023-10-18 19:51:42.252037] INFO [PROXY.TXN] ob_mysql_transact.cpp:80 [89116][Y0-7FB059DA83A0] [lt=25] [dc=0] [ObMysqlTransact::bad_request] parser marked request bad
                                          [2023-10-18 19:51:42.252044] WARN [PROXY.SM] setup_error_transfer (ob_mysql_sm.cpp:7627) [89116][Y0-7FB059DA83A0] [lt=5] [dc=0] [setup_error_transfer] Now closing connection(sm_id=7, request_cmd="Sleep", sql_cmd="GET PIECE DATA", sql=)
                                          [2023-10-18 19:51:42.252092] INFO [PROXY.SS] ob_mysql_client_session.cpp:647 [89116][Y0-7FB059DA83A0] [lt=12] [dc=0] client session do_io_close((*this={this:0x7fb05 .......
                                          [2023-10-18 19:51:42.252191] INFO [PROXY.SS] ob_mysql_server_session.cpp:145 [89116][Y0-7FB059DA83A0] [lt=97] [dc=0] server session do_io_close((*this={ss_id:50, server_sessid:3221949588, server_ip:{10.0.0.62:2881}, ......
                                          [2023-10-18 19:51:42.252317] INFO [PROXY.SS] ob_mysql_server_session.cpp:177 [89116][Y0-7FB059DA83A0] [lt=76] [dc=0] server session is closing(ss_id=50, server_sessid=3221949588, server_ip={10.0.0.62:2881}, cs_id=5, proxy_sessid=720576208556654599)
                                          [2023-10-18 19:51:42.252357] INFO [PROXY.SS] ob_mysql_server_session.cpp:145 [89116][Y0-7FB059DA83A0] [lt=8] [dc=0] server session do_io_close((*this={ss_id:49, server_sessid:3221663711, server_ip:{10.0.0.61:2881}, is_inited:true, ......
                                          [2023-10-18 19:51:42.252521] INFO [PROXY.SS] ob_mysql_server_session.cpp:177 [89116][Y0-7FB059DA83A0] [lt=117] [dc=0] server session is closing(ss_id=49, server_sessid=3221663711, server_ip={10.0.0.61:2881}, cs_id=5, proxy_sessid=720576208556654599)
                                          [2023-10-18 19:51:42.252554] INFO [PROXY.CS] ob_mysql_client_session.cpp:93 [89116][Y0-7FB059DA83A0] [lt=15] [dc=0] client session destroy(cs_id=5, proxy_sessid=720576208556654599, client_vc=NULL)
                                          [2023-10-18 19:51:42.252624] INFO [PROXY.SM] ob_mysql_sm.cpp:8179 [89116][Y0-7FB059DA83A0] [lt=7] [dc=0] deallocating sm(sm_id=7)

                                          场景四:应用事务场景。同样是 BenchmarkSQL 里的场景。通过 GV$OB_SQL_AUDIT 抓取事务记录如下。

                                          分析如下:

                                          • SET、SHOW、SELECT @@ 、ALTER 语句不访问具体的表,不开启事务,随机路由。普通的 SELECT 查询不开启事务。
                                          • SELECT FOR UPDATE 语句开启事务,transaction_hash 值不为 0 . 此后在事务没有提交之前该事务所有语句的 transaction_hash 保持一致(非0)。
                                          • 开启事务后,事务所有 SQL 发往同一个节点 `SVR_IP` 。表 `bmsql_customer` 是非分区表,表 `bmsql_customer`、`bmsql_district` 是分区表,分区分布在两个节点上,所以这里碰巧对应的分区不在节点 `10.0.0.62` 上,所以 SQL 执行计划类型有远程的,也有分布式的。
                                          • 使用列 `SID` 的值去 OBProxy 日志里查看相关记录(server_sessid=xxxxx)。

                                          这个时候的 OBProxy.log 里对 SQL记录的并不完整,因为 OBProxy.log 只记录慢SQL。这里就不分析了。

                                          这里我还没有使用表分组。使用表分组后,上面场景的分布式执行计划和远程执行计划应该能消除。

                                          关于 JAVA 应用的日志特点,利用 BenchmarkSQL 也能看出一点。这里以内存报错时的信息为例。

                                          在Java 的数据库异常中,如果有(conn=xxx)字眼,表示当前连接发生异常。连接的ID xxx 就对应 OBProxy 日志里的 cs_id 字眼。通过这个信息也可以查到相关 OBProxy 日志,进而获取更多信息再去查看 OBServer 日志。当然内存报错问题通过 OB 集群的SQL 审计视图也能逐步定位到相应报错地方。

                                            [admin@server062 log]$ cat obproxy.log|grep "cs_id=524292" obproxy.log |vim -

                                            总结OBProxy 连接包含几个有用的 ID 信息,可以跟应用客户端日志、OB 上 SQL 审计视图、OB 内部日志信息关联。极大的方便从日志里诊断链路异常问题。
                                            更多内容参考:

                                            • OBCE V3 培训实验:应用到数据库全链路分析(上)
                                            • OBCE V3 培训试验:内存管理分享
                                            • OceanBase 2.x体验:用BenchmarkSQL跑TPC-C

                                            相关文章

                                            Oracle如何使用授予和撤销权限的语法和示例
                                            Awesome Project: 探索 MatrixOrigin 云原生分布式数据库
                                            下载丨66页PDF,云和恩墨技术通讯(2024年7月刊)
                                            社区版oceanbase安装
                                            Oracle 导出CSV工具-sqluldr2
                                            ETL数据集成丨快速将MySQL数据迁移至Doris数据库

                                            发布评论