欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  IT编程

SQL Server一个特殊的阻塞案例分析2

程序员文章站 2022-11-04 09:10:25
最近发现一个非常奇怪的阻塞问题,如下截图所示(来自监控工具DPA),会话583被会话1036阻塞,而且阻塞发生在tempdb,被阻塞的SQL如下截图所示,会话等待类型为LCK_M_S 因为DPA工具不好截取全部信息,使用下面SQL语句获取了阻塞会话的详细信息如下,来自Microsoft SQL Se... ......

最近发现一个非常奇怪的阻塞问题,如下截图所示(来自监控工具dpa),会话583被会话1036阻塞,而且阻塞发生在tempdb,被阻塞的sql如下截图所示,会话等待类型为lck_m_s

 

 

 

 

 

 

 

因为dpa工具不好截取全部信息,使用下面sql语句获取了阻塞会话的详细信息如下,来自microsoft sql server management studio - transact-sql intellisense的的sql被阻塞了

 

select  s.session_id ,
        r.status ,
        r.blocking_session_id 'blk by' ,
        r.wait_type ,
        wait_resource ,
        r.wait_time / ( 1000 * 60 ) 'wait m' ,
        r.cpu_time ,
        r.logical_reads ,
        r.reads ,
        r.writes ,
        r.total_elapsed_time / ( 1000 * 60 ) 'elaps m' ,
        substring(st.text, ( r.statement_start_offset / 2 ) + 1,
                  ( ( case r.statement_end_offset
                        when -1 then datalength(st.text)
                        else r.statement_end_offset
                      end - r.statement_start_offset ) / 2 ) + 1) as statement_text ,
        coalesce(quotename(db_name(st.dbid)) + n'.'
                 + quotename(object_schema_name(st.objectid, st.dbid)) + n'.'
                 + quotename(object_name(st.objectid, st.dbid)), '') as command_text ,
        r.command ,
        s.login_name ,
        s.host_name ,
        s.program_name ,
        s.last_request_end_time ,
        s.login_time ,
        r.open_transaction_count
from    sys.dm_exec_sessions as s
        join sys.dm_exec_requests as r on r.session_id = s.session_id
        cross apply sys.dm_exec_sql_text(r.sql_handle) as st
where r.session_id =583
order by r.cpu_time desc;

 

 

 

而会话1036处于睡眠(sleeping)状态,有一个open的事务。通过wait_resource的值key: 2:1125899909070848 (79c68d10aa42)我们定位到是一个系统表sysschobj.它是sys.triggers这个系统视图的基表。如下所示:

 

 

 

 

 

 

 

set quoted_identifier on
set ansi_nulls on
go
create view sys.triggers as
    select o.name,
        object_id = o.id,
        parent_class = o.pclass,
        parent_class_desc = pc.name,
        parent_id = o.pid,
        type = o.type,
        type_desc = n.name,
        create_date = o.created,
        modify_date = o.modified,
        is_ms_shipped = sysconv(bit, o.status & 1),         -- objall_msshipped
        is_disabled = sysconv(bit, o.status & 256),         -- objtrg_disabled
        is_not_for_replication = sysconv(bit, o.status & 512), -- objtrg_notforrepl
        is_instead_of_trigger = sysconv(bit, o.status & 1024)     -- objtrg_insteadof
    from sys.sysschobjs o
    left join sys.syspalnames n on n.class = 'obty' and n.value = o.type
    left join sys.syspalvalues pc on pc.class = 'uncl' and pc.value = o.pclass
    where o.type in ('ta','tr') and o.pclass <> 100
        and has_access('tr', o.id, o.pid, o.nsclass) = 1
 
go

 

 

 

使用下面脚本可以查到对应的锁信息(其实,正确的做法应该是加上条件过滤spid),从而可以判断,会话1036可能因为事务中,间接涉及对基表sysschobj的dml操作(有可能是衍生的系统视图),而由于事务没有提交也没有释放,导致在基表上的锁没有释放,从而导致出现这样一个阻塞。个人也很好奇是什么样的sql会导致这样一个阻塞出现,不过这种阻塞源头是sleeping状态的会话,已经无法定位阻塞源头sql,只能通过profile去跟踪捕获。但是这种阻塞出现时机和场景不清楚,很难一下子捕获到。

 

 
use tempdb
go
select  request_session_id as spid ,
        db_name(resource_database_id) as dbname ,
        case when resource_type = 'object'
             then object_name(resource_associated_entity_id)
             when resource_associated_entity_id = 0 then 'n/a'
             else object_name(p.object_id)
        end as entity_name ,
        index_id ,
        resource_type as resource ,
        resource_description as description ,
        request_mode as mode ,
        request_status as status
from    sys.dm_tran_locks t
left join sys.partitions p
on      p.partition_id = t.resource_associated_entity_id
where   resource_database_id = db_id()

 

 

 

 

今天早上有发现同样的阻塞,猜测是sql里面有创建临时表或对临时表做ddl或dml操作的迹象。不过依然没有找到源头sql,只能静候下次机会。找出罪魁祸首。不过让我头痛的是,对于这种特殊阻塞情况,即使是使用profile跟踪也是相当麻烦,因为不知道它什么时间点出现、出现在那个会话。

 

select  tl.resource_type as [resource type] ,
        db_name(tl.resource_database_id) as [db name] ,
        case tl.resource_type
          when 'object'
          then object_name(tl.resource_associated_entity_id,
                           tl.resource_database_id)
          when 'database' then 'db'
          else case when tl.resource_database_id = db_id()
                    then ( select   object_name(object_id,
                                                tl.resource_database_id)
                           from     sys.partitions
                           where    hobt_id = tl.resource_associated_entity_id
                         )
                    else '(run under db context)'
               end
        end as [object] ,
        tl.resource_description as [resource] ,
        tl.request_session_id as [session] ,
        tl.request_mode as [mode] ,
        tl.request_status as [status] ,
        wt.wait_duration_ms as [wait (ms)] ,
        qi.sql ,
        qi.query_plan
from    sys.dm_tran_locks tl with ( nolock )
left outer join sys.dm_os_waiting_tasks wt with ( nolock )
on      tl.lock_owner_address = wt.resource_address
        and tl.request_status = 'wait'
outer apply ( select    substring(s.text,
                                  ( er.statement_start_offset / 2 ) + 1,
                                  ( ( case er.statement_end_offset
                                        when -1 then datalength(s.text)
                                        else er.statement_end_offset
                                      end - er.statement_start_offset ) / 2 )
                                  + 1) as sql ,
                        qp.query_plan
              from      sys.dm_exec_requests er with ( nolock )
              cross apply sys.dm_exec_sql_text(er.sql_handle) s
              outer apply sys.dm_exec_query_plan(er.plan_handle) qp
              where     tl.request_session_id = er.session_id
            ) qi
where   tl.request_session_id =134

 

 

 

链接资料

一个特殊的sql server阻塞案例分析

sql server一个特殊的阻塞案例分析2