欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  IT编程

SQL Server Alwayson架构下 服务器 各虚拟IP漂移监控告警的功能实现 -1(服务器视角)

程序员文章站 2023-01-01 15:36:23
1.需求概括 我们知道,在SQL Server Alwayson 架构中,有多种虚拟IP,例如 WindowsCluster IP,ListenIP,角色高可用性IP(类似于侦听IP)。在某些条件下,例如系统故障,会触发虚拟IP的漂移,如何高效率、低延迟、更好地监控IP漂移情况,是我们DB的一个重要 ......

1.需求概括

我们知道,在sql server alwayson 架构中,有多种虚拟ip,例如 windowscluster ip,listenip,角色高可用性ip(类似于侦听ip)。在某些条件下,例如系统故障,会触发虚拟ip的漂移,如何高效率、低延迟、更好地监控ip漂移情况,是我们db的一个重要工作。

下面是我们的一个通过sql server 存储过程来实现的方案。

2.基本原理 

 周期性收集当前server 上相应的ip地址,并与上个周期收集的结果比较判断,看那些ip发生了漂移变化。其主要流程图如下;

SQL Server Alwayson架构下 服务器 各虚拟IP漂移监控告警的功能实现 -1(服务器视角)

 

 

3.代码实现

表 dba_serveripdatabase_overcheck,主要存储 当前 (本收集周期) server的信息(主要是ip信息、servername信息等),其创建脚本如下;

/****** object:  table [dbo].[dba_serveripdatabase_overcheck]    script date: 2019/6/27 16:01:27 ******/
set ansi_nulls on
go

set quoted_identifier on
go

set ansi_padding on
go

create table [dbo].[dba_serveripdatabase_overcheck](
    [localserverip] [varchar](20) null,
    [clustername] [varchar](50) null,
    [serverip] [varchar](20) null,
    [servername] [varchar](100) null,
    [serverfullname] [varchar](100) null,
    [serveriptype] [varchar](20) null,
    [databasename] [varchar](300) null,
    [disabledflag] [varchar](1) null,
    [createtime] [datetime] null,
    [createby] [varchar](50) null,
    [modifytime] [datetime] null,
    [modifyby] [varchar](50) null
) on [primary]

go

set ansi_padding off
go

exec sys.sp_addextendedproperty @name=n'ms_description', @value=n'windows集群名称' , @level0type=n'schema',@level0name=n'dbo', @level1type=n'table',@level1name=n'dba_serveripdatabase_overcheck', @level2type=n'column',@level2name=n'clustername'
go

exec sys.sp_addextendedproperty @name=n'ms_description', @value=n'ip地址' , @level0type=n'schema',@level0name=n'dbo', @level1type=n'table',@level1name=n'dba_serveripdatabase_overcheck', @level2type=n'column',@level2name=n'serverip'
go

exec sys.sp_addextendedproperty @name=n'ms_description', @value=n'计算机对象名称' , @level0type=n'schema',@level0name=n'dbo', @level1type=n'table',@level1name=n'dba_serveripdatabase_overcheck', @level2type=n'column',@level2name=n'servername'
go

exec sys.sp_addextendedproperty @name=n'ms_description', @value=n'计算机对象全称' , @level0type=n'schema',@level0name=n'dbo', @level1type=n'table',@level1name=n'dba_serveripdatabase_overcheck', @level2type=n'column',@level2name=n'serverfullname'
go

exec sys.sp_addextendedproperty @name=n'ms_description', @value=n'计算机对象全称' , @level0type=n'schema',@level0name=n'dbo', @level1type=n'table',@level1name=n'dba_serveripdatabase_overcheck', @level2type=n'column',@level2name=n'serveriptype'
go

exec sys.sp_addextendedproperty @name=n'ms_description', @value=n'0实时有效,1第一次失效,2第二次失效,3第三次失效,4第四次失效,5第五次失效,彻底删除' , @level0type=n'schema',@level0name=n'dbo', @level1type=n'table',@level1name=n'dba_serveripdatabase_overcheck', @level2type=n'column',@level2name=n'disabledflag'
go

exec sys.sp_addextendedproperty @name=n'ms_description', @value=n'创建时间' , @level0type=n'schema',@level0name=n'dbo', @level1type=n'table',@level1name=n'dba_serveripdatabase_overcheck', @level2type=n'column',@level2name=n'createtime'
go

exec sys.sp_addextendedproperty @name=n'ms_description', @value=n'创建人' , @level0type=n'schema',@level0name=n'dbo', @level1type=n'table',@level1name=n'dba_serveripdatabase_overcheck', @level2type=n'column',@level2name=n'createby'
go

 

表 dba_serveripdatabase_overcheckoriginorigin,主要存储 以前 (前一个收集周期) server的信息,是用来比较变化的基准表,其表结构 与表dba_serveripdatabase_overcheck相同,创建脚本如下:

 

/****** object:  table [dbo].[dba_serveripdatabase_overcheckoriginorigin]    script date: 2019/6/27 16:56:28 ******/
set ansi_nulls on
go

set quoted_identifier on
go

set ansi_padding on
go

create table [dbo].[dba_serveripdatabase_overcheckoriginorigin](
    [localserverip] [varchar](20) null,
    [clustername] [varchar](50) null,
    [serverip] [varchar](20) null,
    [servername] [varchar](100) null,
    [serverfullname] [varchar](100) null,
    [serveriptype] [varchar](20) null,
    [databasename] [varchar](300) null,
    [disabledflag] [varchar](1) null,
    [createtime] [datetime] null,
    [createby] [varchar](50) null,
    [modifytime] [datetime] null,
    [modifyby] [varchar](50) null
) on [primary]

go

set ansi_padding off
go

 

具体的存储过程 usp_dba_serveripdatabase_overcheck,用来实现 收集、比较、告警等功能,代码实现如下:

 

/****** object:  storedprocedure [dbo].[usp_dba_serveripdatabase_overcheck]    script date: 2019/6/27 15:26:06 ******/
set ansi_nulls on
go

set quoted_identifier on
go


create procedure [dbo].[usp_dba_serveripdatabase_overcheck]
as
begin 

declare @ipline varchar(200) 
        declare @ip varchar(20)
        declare @pingname varchar(40)
        declare @servername varchar(100)
        declare @serverfullname varchar(100)

        declare @currdatetime nvarchar(20)
        declare @prediffdatetime nvarchar(20) =''

        set @currdatetime=convert(varchar(19),getdate(),120) 
 
        set nocount on

        declare @serverip varchar(20)
        set @servername=@@servername
        set @serverip = cast(connectionproperty('local_net_address') as varchar(20)) 

        ----set @serverip ='xxx.xxx.xxx.xxx'   --如果不准确的话,请手动定义
            
        delete from dba_serveripdatabase_overcheck 
        where createtime< convert(varchar(19),dateadd( hh,-10,getdate()),120) 

        set @ip = null 
        if object_id('tempdb..#tempserverip') is not null
            drop table #tempserverip 
        create table #tempserverip ( ipline varchar(200) ) 
        insert  #tempserverip
                exec master..xp_cmdshell 'ipconfig /all ' 
        
        insert  into dba_serveripdatabase_overcheck
                ( localserverip,serverip,disabledflag,createtime,createby
                )
                select @serverip, p.serverip,'0'as disabledflag,getdate() as createtime ,@@servername as createby
                from    dbo.dba_serveripdatabase_overcheck i
                        right join ( select rtrim(ltrim(replace(substring(ipline,
                                                              charindex(':',
                                                              ipline) + 1, 20),
                                                              '(首选)', ''))) as 'serverip'
                                     from   #tempserverip
                                     where  upper(ipline) like '%ipv4 地址%'--这里需要注意一下,系统不同这里的匹配值就不同 
                                            and upper(ipline) not like '%192.168.%'
                                            and upper(ipline) not like '%169.254.%'
                                   ) p on i.serverip = p.serverip
                where   i.serverip is null  --只关注漂来飘往数据

    --0002 -20180530 针对20180530持续告警问题,发现告警时间超过预期,进行优化。聚焦点再表dba_serveripdatabase_overcheck中的createtime栏位,精准更新  begin
    update i set i.createtime=getdate()
    from    dbo.dba_serveripdatabase_overcheck i
                        right join ( select rtrim(ltrim(replace(substring(ipline,
                                                              charindex(':',
                                                              ipline) + 1, 20),
                                                              '(首选)', ''))) as 'serverip'
                                     from   #tempserverip
                                     where  upper(ipline) like '%ipv4 地址%'--这里需要注意一下,系统不同这里的匹配值就不同 
                                            and upper(ipline) not like '%192.168.%'
                                            and upper(ipline) not like '%169.254.%'
                                   ) p on i.serverip = p.serverip

    ------
        declare ip cursor
        for
            select  serverip
            from    dbo.dba_serveripdatabase_overcheck where disabledflag is not null and disabledflag=0
        open ip
        fetch next from ip into @ip
        while @@fetch_status = 0
            begin
                --set @pingname = 'ping -a ' + @ip
                set @pingname = 'ping -a ' + @ip + ' -n 1 -l 10'

                truncate table #tempserverip
                insert  #tempserverip
                        exec master..xp_cmdshell @pingname

                select  @servername = replace(rtrim(ltrim(substring(ipline, 8,
                                                              charindex('[',
                                                              ipline) - 8))),
                                              '.xxxxxx.com', '') ,-----加域的电脑,计算机名字可能带有域名,请根据实际情况替换
                        @serverfullname = rtrim(ltrim(substring(ipline, 8,
                                                              charindex('[',
                                                              ipline) - 8)))
                from    #tempserverip
                where   ipline like '%正在 ping%'

                update  dbo.dba_serveripdatabase_overcheck
                set     servername = @servername ,
                        serverfullname = @serverfullname
                where   serverip = @ip

                fetch next from ip into @ip
            end
        close ip 
        deallocate ip

        update  dbo.dba_serveripdatabase_overcheck
        set     serveriptype = 'localhost'
        where   servername = @@servername


        if object_id('sys.availability_group_listener_ip_addresses') is not null 
        begin
          update a set a.serveriptype='listenip'
          from dba_serveripdatabase_overcheck a inner join sys.availability_group_listener_ip_addresses b
          on a.serverip=b.ip_address
        end 

        if object_id('sys.dm_hadr_cluster') is not null
            begin
                update dbo.dba_serveripdatabase_overcheck set clustername=(select cluster_name from sys.dm_hadr_cluster)
                update dbo.dba_serveripdatabase_overcheck set serveriptype='windowscluster' where servername=(select cluster_name from sys.dm_hadr_cluster)
            end
        if object_id('sys.dm_hadr_cluster') is null
            begin
                update dbo.dba_serveripdatabase_overcheck set clustername='not cluster'
            end
--0002 -20180530  针对20180530持续告警问题,发现告警时间超过预期,进行优化。聚焦点再表dba_serveripdatabase_overcheck中的createtime栏位,精准更新  begin
        update dba_serveripdatabase_overcheck set  databasename=stuff(
            (select ',' + name from sys.databases  
                               where name not in ('master'
                                 ,'tempdb','model','msdb','reportserver','reportservertempdb','distribution')
                               for xml path('')   ),
                            1,1,'')
-----002 end 
        --the mail alarm 
           declare @sql as varchar(200)
        declare @subject as varchar(200)=n'db server ip 有漂移,请检查确认!'
        declare @body as nvarchar(max)=''

        select @prediffdatetime= createtime  from dba_serveripdatabase_overcheckoriginorigin
        order by createtime
        
        select  top 0 a.*  into #temp_dba_serveripdatabase_overcheck_diff from dba_serveripdatabase_overcheck a inner join dba_serveripdatabase_overcheckoriginorigin  b
         on a.serverip =b.serverip  

        if exists(select * from dba_serveripdatabase_overcheck a left join dba_serveripdatabase_overcheckoriginorigin  b on a.serverip =b.serverip  where b.serverip is null)
        begin

           insert  into  #temp_dba_serveripdatabase_overcheck_diff
           select a.*  from dba_serveripdatabase_overcheck a left join dba_serveripdatabase_overcheckoriginorigin  b on a.serverip =b.serverip  where b.serverip is null

            if exists(    select * from #temp_dba_serveripdatabase_overcheck_diff  ) 
            begin
            set @body= n'<html>' 
                    + n'<style type="text/css">' 
                    + n' td {border:solid #9ec9ec;  border-width:1px 1px 1px 1px; padding:4px 0px;}' 
                    + n' table {border:1px solid #9ec9ec;width:80%;border-width:0px 0px 0px 0px;font-size:14px}' 
                    + n'</style>'
                    + n'<h1 style="color:#ff0000;font-size:14px"></h1>' 
            set @body=@body+'<body><font color=#0000cc>dear all,<br><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;此list是监控到过去10 min server ip 异常情况,'+@serverip +'服务器有新ip创建声明(新增), 请及时check。具体数据如下:;<br><br><table>' 
            set @body=@body+'<tr bgcolor=#ffff00 align="center"><td>clustername</td><td>serverip</td><td>servername</td><td>serveriptype</td><td>databasename</td><td>disabledflag</td><td>当前时间</td><td>差异采样时间</td></tr>'    
            select @body=@body+'<tr><td>'+clustername+'</td><td>'+serverip+'</td><td>'+servername+'</td><td>'+serveriptype+'</td><td>'+databasename+'</td><td>'+ disabledflag+'</td><td>'+@currdatetime+'</td><td>'+ @prediffdatetime+'</td></tr>'
             from #temp_dba_serveripdatabase_overcheck_diff     

             set @body = @body +'</table><font color=#0000cc><br><br>dba<br>best wishes</body><html>'
            end 
       end


        if exists(select * from dba_serveripdatabase_overcheck a right join dba_serveripdatabase_overcheckoriginorigin  b on a.serverip =b.serverip  where a.serverip is null)
        begin

           delete from #temp_dba_serveripdatabase_overcheck_diff
           insert  into  #temp_dba_serveripdatabase_overcheck_diff
           select a.*  from dba_serveripdatabase_overcheck a right join dba_serveripdatabase_overcheckoriginorigin  b on a.serverip =b.serverip  where a.serverip is null

            if exists(    select * from #temp_dba_serveripdatabase_overcheck_diff  ) 
            begin
            set @body= n'<html>' 
                    + n'<style type="text/css">' 
                    + n' td {border:solid #9ec9ec;  border-width:1px 1px 1px 1px; padding:4px 0px;}' 
                    + n' table {border:1px solid #9ec9ec;width:80%;border-width:0px 0px 0px 0px;font-size:14px}' 
                    + n'</style>'
                    + n'<h1 style="color:#ff0000;font-size:14px"></h1>' 
            set @body=@body+'<body><font color=#0000cc>dear all,<br><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;此list是监控到过去10 min server ip 异常情况,'+@serverip +'服务器有ip漂移(消减), 请及时check。具体数据如下:;<br><br><table>' 
            set @body=@body+'<tr bgcolor=#ffff00 align="center"><td>clustername</td><td>serverip</td><td>servername</td><td>serveriptype</td><td>databasename</td><td>disabledflag</td><td>当前时间</td><td>差异采样时间</td></tr>'    
            select @body=@body+'<tr><td>'+clustername+'</td><td>'+serverip+'</td><td>'+servername+'</td><td>'+serveriptype+'</td><td>'+databasename+'</td><td>'+ disabledflag+'</td><td>'+@currdatetime+'</td><td>'+ @prediffdatetime+'</td></tr>'
             from #temp_dba_serveripdatabase_overcheck_diff     

             set @body = @body +'</table><font color=#0000cc><br><br>dba<br>best wishes</body><html>'
            end 

       end  
            set @body=replace(@body,'''','')
            if replace(@body,' ','')<>''
                begin

                    declare @allemailtoaddress varchar(3000)=''
                    declare @allemailccaddress varchar(3000)=''
                    select @allemailtoaddress='hanmeimei;xiaoming;lilei'

                    select @allemailccaddress='laoban'


                    exec msdb..sp_send_dbmail @profile_name =  'automail'               -- profile 名称,请检查此参数,根据实际情况进行替换 
                     ,@recipients   =  @allemailtoaddress         -- 收件人邮箱 
                     ,@copy_recipients=@allemailccaddress
                     ,@subject      =  @subject -- 邮件标题 
                     ,@body         =  @body            -- 邮件内容 
                     ,@body_format  =  'html'                      -- 邮件格式 
                     ,@file_attachments=''
                     ,@importance = 'high' -- varchar(10) 告警级别
                end


        ------------------新增立即插入-----------

        insert into [dbo].[dba_serveripdatabase_overcheckoriginorigin]
        select a.* from  dba_serveripdatabase_overcheck a left join dba_serveripdatabase_overcheckoriginorigin b
        on a.serverip=b.serverip
        where b.serverip is null

        ---漂移后,指定时间段后直接删除过时数据,暂定八个小时。【即如果有漂移(减少),减少的ip信息,则在指定时间后,删除。】

        delete b
        from  dba_serveripdatabase_overcheck a right join dba_serveripdatabase_overcheckoriginorigin b
        on a.serverip=b.serverip
        where a.serverip is null
        and b.createtime< convert(varchar(19),dateadd( hh,-3,getdate()),120)

-----将数据插入到远程server db中,远程server有一个sp过程,用来判断漂移前/后db是否有变化。远程的sp主要是依据 localserverip 和 serverip 对应关系变化情况来判断。这段代码省略,下次再描述 set nocount off end go

 

 4.功能实现

例如当 服务器有新ip创建声明(新增)时,其发出的告警邮件如下:

 

SQL Server Alwayson架构下 服务器 各虚拟IP漂移监控告警的功能实现 -1(服务器视角)

 

 

本文版权归作者所有,未经作者同意不得转载,谢谢配合!!!

本文版权归作者所有,未经作者同意不得转载,谢谢配合!!!

本文版权归作者所有,未经作者同意不得转载,谢谢配合!!!