Alibaba second-round interview: how do you do a graceful release with Nacos as the registry?

July 18, 2023

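Hi everyone, I'm Jun Ge. I'm reposting this article today.

Today let's talk about how to do a graceful release when Nacos is used as the registry.

As with other registries, Nacos works as a registry as shown in the figure below:

[Figure: Nacos as a registry]

After a Service Provider starts, it registers with Nacos Server. A Service Consumer pulls the service list from Nacos Server and, according to some algorithm, picks one Service Provider to send the request to.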
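The "some algorithm" step can be as simple as a weighted random pick. The sketch below only illustrates that idea and is not the rule Ribbon or Nacos actually applies; `Provider` and `WeightedRandomChooser` are hypothetical names introduced here.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for one entry of the service list. */
final class Provider {
    final String ip;
    final int port;
    final double weight;

    Provider(String ip, int port, double weight) {
        this.ip = ip;
        this.port = port;
        this.weight = weight;
    }
}

final class WeightedRandomChooser {
    /** Picks one provider with probability proportional to its weight. */
    static Provider choose(List<Provider> providers, Random random) {
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }
        double total = 0;
        for (Provider p : providers) {
            total += p.weight;
        }
        if (total <= 0) {
            // All weights are zero: fall back to a uniform pick.
            return providers.get(random.nextInt(providers.size()));
        }
        double point = random.nextDouble() * total;
        for (Provider p : providers) {
            point -= p.weight;
            if (point < 0) {
                return p;
            }
        }
        // Only reachable through floating-point rounding.
        return providers.get(providers.size() - 1);
    }
}
```

A provider whose weight is 0 is never picked as long as at least one other provider has a positive weight.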

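1. Requirements for a graceful release

For a graceful release, the requirement is that once a Service Provider is online (registered with Nacos) it can receive and process requests normally, and once a Service Provider has stopped it no longer receives requests. That gives two requirements:

  • Graceful startup: until the Service Provider has finished deploying, a Service Consumer should not be able to pull its address from the service list;
  • Graceful shutdown: once the Service Provider has gone offline, a Service Consumer can no longer pull its address from the service list.

Solve these two problems and a graceful release is achievable.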

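2. Setting up the environment

The environment is there so that we can read the Nacos logs and use them to find the corresponding source code. The environment used in this article is shown below:

[Figure: test environment]

2.1 Starting the provider

Start the springboot-provider application so that it registers with Nacos. The startup log is as follows: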

    2023-06-11 18:58:10,120 [main] [INFO] com.alibaba.nacos.client.naming - [BEAT] adding beat: BeatInfo{port=8083, ip='192.168.31.94', weight=1.0, serviceName='DEFAULT_GROUP@@springboot-provider', cluster='DEFAULT', metadata={management.endpoints.web.base-path=/actuator, management.port=18082, preserved.register.source=SPRING_CLOUD, management.address=127.0.0.1}, scheduled=false, period=5000, stopped=false} to beat map.
    2023-06-11 18:58:10,121 [main] [INFO] com.alibaba.nacos.client.naming - [REGISTER-SERVICE] public registering service DEFAULT_GROUP@@springboot-provider with instance: Instance{instanceId='null', ip='192.168.31.94', port=8083, weight=1.0, healthy=true, enabled=true, ephemeral=true, clusterName='DEFAULT', serviceName='null', metadata={management.endpoints.web.base-path=/actuator, management.port=18082, preserved.register.source=SPRING_CLOUD, management.address=127.0.0.1}}
    2023-06-11 18:58:10,133 [main] [INFO] com.alibaba.cloud.nacos.registry.NacosServiceRegistry - nacos registry, DEFAULT_GROUP springboot-provider 192.168.31.94:8083 register finished
    2023-06-11 18:58:10,221 [main] [INFO] org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat initialized with port(s): 18082 (http)
    2023-06-11 18:58:10,222 [main] [INFO] org.apache.coyote.http11.Http11NioProtocol - Initializing ProtocolHandler ["http-nio-127.0.0.1-18082"]
    2023-06-11 18:58:10,223 [main] [INFO] org.apache.catalina.core.StandardService - Starting service [Tomcat]
    2023-06-11 18:58:10,223 [main] [INFO] org.apache.catalina.core.StandardEngine - Starting Servlet engine: [Apache Tomcat/9.0.21]
    2023-06-11 18:58:10,239 [main] [INFO] org.apache.catalina.core.ContainerBase.[Tomcat-1].[localhost].[/] - Initializing Spring embedded WebApplicationContext
    2023-06-11 18:58:10,239 [main] [INFO] org.springframework.web.context.ContextLoader - Root WebApplicationContext: initialization completed in 99 ms
    2023-06-11 18:58:10,268 [main] [INFO] org.springframework.boot.actuate.endpoint.web.EndpointLinksResolver - Exposing 22 endpoint(s) beneath base path '/actuator'
    2023-06-11 18:58:10,336 [main] [INFO] org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-127.0.0.1-18082"]
    2023-06-11 18:58:10,340 [main] [INFO] org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 18082 (http) with context path ''
    2023-06-11 18:58:10,342 [main] [INFO] boot.Application - Started Application in 7.051 seconds (JVM running for 7.874)
    2023-06-11 18:58:10,358 [main] [INFO] com.alibaba.nacos.client.config.impl.ClientWorker - [fixed-39.105.183.91_8848] [subscribe] springboot-provider.properties+DEFAULT_GROUP
    2023-06-11 18:58:10,359 [main] [INFO] com.alibaba.nacos.client.config.impl.CacheData - [fixed-39.105.183.91_8848] [add-listener] ok, tenant=, dataId=springboot-provider.properties, group=DEFAULT_GROUP, cnt=1
    2023-06-11 18:58:10,359 [main] [INFO] com.alibaba.nacos.client.config.impl.ClientWorker - [fixed-39.105.183.91_8848] [subscribe] springboot-provider-dev.properties+DEFAULT_GROUP
    2023-06-11 18:58:10,359 [main] [INFO] com.alibaba.nacos.client.config.impl.CacheData - [fixed-39.105.183.91_8848] [add-listener] ok, tenant=, dataId=springboot-provider-dev.properties, group=DEFAULT_GROUP, cnt=1
    2023-06-11 18:58:10,360 [main] [INFO] com.alibaba.nacos.client.config.impl.ClientWorker - [fixed-39.105.183.91_8848] [subscribe] springboot-provider+DEFAULT_GROUP
    2023-06-11 18:58:10,360 [main] [INFO] com.alibaba.nacos.client.config.impl.CacheData - [fixed-39.105.183.91_8848] [add-listener] ok, tenant=, dataId=springboot-provider, group=DEFAULT_GROUP, cnt=1
    2023-06-11 18:58:10,639 [RMI TCP Connection(1)-192.168.31.94] [INFO] org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring DispatcherServlet 'dispatcherServlet'
    2023-06-11 18:58:10,839 [com.alibaba.nacos.client.naming.updater] [INFO] com.alibaba.nacos.client.naming - [BEAT] adding beat: BeatInfo{port=8083, ip='192.168.31.94', weight=1.0, serviceName='DEFAULT_GROUP@@springboot-provider', cluster='DEFAULT', metadata={management.endpoints.web.base-path=/actuator, management.port=18082, preserved.register.source=SPRING_CLOUD, management.address=127.0.0.1}, scheduled=false, period=5000, stopped=false} to beat map.
    2023-06-11 18:58:10,840 [com.alibaba.nacos.client.naming.updater] [INFO] com.alibaba.nacos.client.naming - modified ips(1) service: DEFAULT_GROUP@@springboot-provider@@DEFAULT -> [{"instanceId":"192.168.31.94#8083#DEFAULT#DEFAULT_GROUP@@springboot-provider","ip":"192.168.31.94","port":8083,"weight":1.0,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"DEFAULT_GROUP@@springboot-provider","metadata":{"management.endpoints.web.base-path":"/actuator","management.port":"18082","preserved.register.source":"SPRING_CLOUD","management.address":"127.0.0.1"},"ipDeleteTimeout":30000,"instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000}]
    2023-06-11 18:58:10,841 [com.alibaba.nacos.client.naming.updater] [INFO] com.alibaba.nacos.client.naming - current ips:(1) service: DEFAULT_GROUP@@springboot-provider@@DEFAULT -> [{"instanceId":"192.168.31.94#8083#DEFAULT#DEFAULT_GROUP@@springboot-provider","ip":"192.168.31.94","port":8083,"weight":1.0,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"DEFAULT_GROUP@@springboot-provider","metadata":{"management.endpoints.web.base-path":"/actuator","management.port":"18082","preserved.register.source":"SPRING_CLOUD","management.address":"127.0.0.1"},"ipDeleteTimeout":30000,"instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000}]

Now look at the Nacos server log. The file here is naming-server.log, and the log is as follows:

    2023-06-11 18:58:09,723 INFO Client connection 192.168.31.94:51885#true connect
    2023-06-11 18:58:10,105 INFO Client change for service Service{namespace='public', group='DEFAULT_GROUP', name='springboot-provider', ephemeral=true, revision=1}, 192.168.31.94:8083#true
    2023-06-11 18:58:18,204 INFO Client connection 192.168.31.94:60850#true disconnect, remove instances and subscribers

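After springboot-provider has started successfully, the Nacos console shows the following:

[Figure: springboot-provider in the Nacos console]

2.2 Taking the provider offline

After the service goes offline, the Nacos log is as follows: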

    2023-06-11 19:01:03,375 INFO Client connection 192.168.31.94:51885#true disconnect, remove instances and subscribers
    2023-06-11 19:01:05,048 INFO [AUTO-DELETE-IP] service: Service{namespace='public', group='DEFAULT_GROUP', name='springboot-provider', ephemeral=true, revision=2}, ip: {"ip":"192.168.31.94","port":8083,"healthy":false,"cluster":"DEFAULT","extendDatum":{"management.endpoints.web.base-path":"/actuator","management.port":"18082","preserved.register.source":"SPRING_CLOUD","management.address":"127.0.0.1","customInstanceId":"192.168.31.94#8083#DEFAULT#DEFAULT_GROUP@@springboot-provider"},"lastHeartBeatTime":1686481231604,"metadataId":"192.168.31.94:8083:DEFAULT"}
    2023-06-11 19:01:05,048 INFO Client remove for service Service{namespace='public', group='DEFAULT_GROUP', name='springboot-provider', ephemeral=true, revision=2}, 192.168.31.94:8083#true
    2023-06-11 19:01:08,379 INFO Client connection 192.168.31.94:8083#true disconnect, remove instances and subscribers

2.3 Calling the service

On springboot-consumer, run a unit test case that uses a FeignClient to call the method below:

    @FeignClient(value = "springboot-provider", configuration = FeignMultipartSupportConfig.class)
    public interface FeignAsEurekaClient {
    
        @PostMapping("/employee/save")
        String saveEmployeebyName(@RequestBody Employee employee);
    
    }

The log is as follows:

    2023-06-11 19:15:47,694 [main] [INFO] org.springframework.test.context.transaction.TransactionContext - Began transaction (1) for test context [DefaultTestContext@5bf0d49 testClass = TestFeignAsEurekaClient, testInstance = boot.service.TestFeignAsEurekaClient@10683d9d, testMethod = testPostEmployByFeign@TestFeignAsEurekaClient, testException = [null], mergedContextConfiguration = [WebMergedContextConfiguration@5b7a5baa testClass = TestFeignAsEurekaClient, locations = '{}', classes = '{class boot.Application, class boot.Application}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{org.springframework.boot.test.context.SpringBootTestContextBootstrapper=true, server.port=0}', contextCustomizers = set[org.springframework.boot.test.context.filter.ExcludeFilterContextCustomizer@166fa74d, org.springframework.boot.test.json.DuplicateJsonObjectContextCustomizerFactory$DuplicateJsonObjectContextCustomizer@588df31b, org.springframework.boot.test.mock.mockito.MockitoContextCustomizer@0, org.springframework.boot.test.web.client.TestRestTemplateContextCustomizer@7fad8c79, org.springframework.boot.test.autoconfigure.properties.PropertyMappingContextCustomizer@0, org.springframework.boot.test.autoconfigure.web.servlet.WebDriverContextCustomizerFactory$Customizer@10b48321], resourceBasePath = 'src/main/webapp', contextLoader = 'org.springframework.boot.test.context.SpringBootContextLoader', parent = [null]], attributes = map['org.springframework.test.context.web.ServletTestExecutionListener.activateListener' -> false]]; transaction manager [org.springframework.jdbc.datasource.DataSourceTransactionManager@693676d]; rollback [true]
    2023-06-11 19:15:47,941 [main] [INFO] com.netflix.config.ChainedDynamicProperty - Flipping property: springboot-provider.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
    2023-06-11 19:15:47,962 [main] [INFO] com.netflix.loadbalancer.BaseLoadBalancer - Client: springboot-provider instantiated a LoadBalancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=springboot-provider,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
    2023-06-11 19:15:47,969 [main] [INFO] com.netflix.loadbalancer.DynamicServerListLoadBalancer - Using serverListUpdater PollingServerListUpdater
    2023-06-11 19:15:48,064 [main] [INFO] com.netflix.config.ChainedDynamicProperty - Flipping property: springboot-provider.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
    2023-06-11 19:15:48,064 [main] [INFO] com.netflix.loadbalancer.DynamicServerListLoadBalancer - DynamicServerListLoadBalancer for client springboot-provider initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=springboot-provider,current list of Servers=[192.168.31.94:8083],Load balancer stats=Zone stats: {unknown=[Zone:unknown; Instance count:1; Active connections count: 0; Circuit breaker tripped count: 0; Active connections per server: 0.0;]
    },Server stats: [[Server:192.168.31.94:8083; Zone:UNKNOWN; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Thu Jan 01 08:00:00 CST 1970; First connection made: Thu Jan 01 08:00:00 CST 1970; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]
    ]}ServerList:com.alibaba.cloud.nacos.ribbon.NacosServerList@24d998ba

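Note that OpenFeign is used here, and it relies on Ribbon for load balancing, so we also have to account for how often Ribbon refreshes its local server list. According to the source code, the refresh period is 30s, as shown below:

[Figure: Ribbon refresh interval in the source code]

Ribbon's cache refresh logic is in the following code: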

    public synchronized void start(final UpdateAction updateAction) {
     if (isActive.compareAndSet(false, true)) {
      final Runnable wrapperRunnable = new Runnable() {
       @Override
       public void run() {
        //...
       }
      };
    
      scheduledFuture = getRefreshExecutor().scheduleWithFixedDelay(
        wrapperRunnable,
        initialDelayMs,
        refreshIntervalMs, // defined as 30s here
        TimeUnit.MILLISECONDS
      );
     }//...
    }
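To make the effect of that interval concrete, here is a minimal sketch of a polling cache built on the same `scheduleWithFixedDelay` call. `PollingServerListCache` and its `source` supplier are hypothetical names, not Ribbon classes. The point is only that the cached list stays stale until the next refresh fires.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class PollingServerListCache {
    private final Supplier<List<String>> source; // hypothetical: asks the registry for addresses
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "server-list-refresher");
                t.setDaemon(true); // do not keep the JVM alive just for refreshing
                return t;
            });
    private volatile List<String> servers = List.of();

    PollingServerListCache(Supplier<List<String>> source) {
        this.source = source;
    }

    /** Copies the current registry view into the local cache. */
    void refreshOnce() {
        servers = List.copyOf(source.get());
    }

    /** Refreshes repeatedly, waiting refreshIntervalMs after each run finishes. */
    void start(long initialDelayMs, long refreshIntervalMs) {
        executor.scheduleWithFixedDelay(() -> {
            try {
                refreshOnce();
            } catch (RuntimeException e) {
                // Keep the old list; an uncaught exception would cancel future runs.
            }
        }, initialDelayMs, refreshIntervalMs, TimeUnit.MILLISECONDS);
    }

    List<String> servers() {
        return servers;
    }

    void stop() {
        executor.shutdownNow();
    }
}
```

In this sketch, with `start(1000, 30_000)`, a provider that leaves the registry right after a refresh can remain in `servers()` for roughly another 30s.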

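3. Graceful release

As mentioned in section 1, a graceful release has two requirements: graceful startup and graceful shutdown.

When a Service Consumer initializes, it fetches the service list from Nacos Server and updates its local cache. It also subscribes to the service list on Nacos Server, so that Nacos Server actively notifies the Service Consumer when the list changes. After that it pulls the service list periodically (every 1s by default) and updates the local cache. The code is as follows: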

    // NacosNamingService class
    public List<Instance> selectInstances(String serviceName, String groupName, List<String> clusters, boolean healthy,
      boolean subscribe) throws NacosException {
     
     ServiceInfo serviceInfo;
     String clusterString = StringUtils.join(clusters, ",");
     if (subscribe) {
      serviceInfo = serviceInfoHolder.getServiceInfo(serviceName, groupName, clusterString);
      if (null == serviceInfo) {
       serviceInfo = clientProxy.subscribe(serviceName, groupName, clusterString);
      }
     } else {
      serviceInfo = clientProxy.queryInstancesOfService(serviceName, groupName, clusterString, 0, false);
     }
     return selectInstances(serviceInfo, healthy);
    }

The subscribe code also schedules the periodic update of the service list, as follows:

    // NamingClientProxyDelegate class
    public ServiceInfo subscribe(String serviceName, String groupName, String clusters) throws NacosException {
     NAMING_LOGGER.info("[SUBSCRIBE-SERVICE] service:{}, group:{}, clusters:{} ", serviceName, groupName, clusters);
     String serviceNameWithGroup = NamingUtils.getGroupedName(serviceName, groupName);
     String serviceKey = ServiceInfo.getKey(serviceNameWithGroup, clusters);
     serviceInfoUpdateService.scheduleUpdateIfAbsent(serviceName, groupName, clusters);
     ServiceInfo result = serviceInfoHolder.getServiceInfoMap().get(serviceKey);
     if (null == result || !isSubscribed(serviceName, groupName, clusters)) {
      result = grpcClientProxy.subscribe(serviceName, groupName, clusters);
     }
     serviceInfoHolder.processServiceInfo(result);
     return result;
    }
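Because the same local cache is fed by both server pushes and periodic pulls, the two can arrive out of order. One way to keep the cache consistent is to stamp every snapshot and ignore anything that is not newer. This is a sketch of that idea with hypothetical names (`Snapshot`, `ConsumerServiceCache`), not a copy of what `ServiceInfoHolder` does.

```java
import java.util.List;

/** Hypothetical service-list snapshot; a larger version means newer data. */
final class Snapshot {
    final long version;
    final List<String> addresses;

    Snapshot(long version, List<String> addresses) {
        this.version = version;
        this.addresses = List.copyOf(addresses);
    }
}

final class ConsumerServiceCache {
    private Snapshot current = new Snapshot(0, List.of());

    /** Called by both the push callback and the periodic pull; returns true if the cache changed. */
    synchronized boolean accept(Snapshot incoming) {
        if (incoming.version <= current.version) {
            return false; // stale or duplicate data, keep what we have
        }
        current = incoming;
        return true;
    }

    synchronized List<String> addresses() {
        return current.addresses;
    }
}
```

With this rule, a slow pull that still carries a provider which a later push already removed cannot bring that provider back into the cache.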

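Nacos Server periodically (every 5s) checks whether each Service Provider is healthy, based on its heartbeats. If no heartbeat has arrived for 15s (the default, which is configurable), it marks the instance as unhealthy and notifies the Service Consumer. The code is as follows:

    // UnhealthyInstanceChecker class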
    public void doCheck(Client client, Service service, HealthCheckInstancePublishInfo instance) {
     if (instance.isHealthy() && isUnhealthy(service, instance)) {
      changeHealthyStatus(client, service, instance);
     }
    }
    
    private void changeHealthyStatus(Client client, Service service, HealthCheckInstancePublishInfo instance) {
     instance.setHealthy(false);
    
     NotifyCenter.publishEvent(new ServiceEvent.ServiceChangedEvent(service));
     NotifyCenter.publishEvent(new ClientEvent.ClientChangedEvent(client));
     NotifyCenter.publishEvent(new HealthStateChangeTraceEvent(System.currentTimeMillis(),
       service.getNamespace(), service.getGroup(), service.getName(), instance.getIp(), instance.getPort(),
       false, "client_beat"));
    }
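The provider log in 2.1 shows the defaults `instanceHeartBeatTimeOut=15000` and `ipDeleteTimeout=30000`. Assuming the server simply compares the time since the last heartbeat against those two thresholds, the decision can be sketched as below. `HeartbeatClassifier` and `InstanceState` are hypothetical names used only for this illustration.

```java
enum InstanceState { HEALTHY, UNHEALTHY, REMOVED }

final class HeartbeatClassifier {
    private final long heartBeatTimeoutMs;
    private final long ipDeleteTimeoutMs;

    HeartbeatClassifier(long heartBeatTimeoutMs, long ipDeleteTimeoutMs) {
        if (ipDeleteTimeoutMs < heartBeatTimeoutMs) {
            throw new IllegalArgumentException("delete timeout must not be shorter than heartbeat timeout");
        }
        this.heartBeatTimeoutMs = heartBeatTimeoutMs;
        this.ipDeleteTimeoutMs = ipDeleteTimeoutMs;
    }

    /** Classifies an instance by how long it has been silent. */
    InstanceState classify(long nowMs, long lastBeatMs) {
        long silentFor = nowMs - lastBeatMs;
        if (silentFor > ipDeleteTimeoutMs) {
            return InstanceState.REMOVED;
        }
        if (silentFor > heartBeatTimeoutMs) {
            return InstanceState.UNHEALTHY;
        }
        return InstanceState.HEALTHY;
    }
}
```

Since the check itself only runs every 5s, the actual state change can lag behind these thresholds by up to one check interval.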

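3.1 Graceful startup

The main problem with graceful startup is that requests can arrive after the Service Provider has registered with Nacos but before it has finished initializing. This mostly happens because the Service Provider registers with Nacos as soon as it starts, while the interfaces it exposes may not be initialized yet.

The fix is to turn off automatic registration: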

    spring.cloud.nacos.discovery.registerEnabled=false

Then register manually in code once the service has finished initializing, as follows:

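    import java.util.Properties;
    import com.alibaba.nacos.api.NacosFactory;
    import com.alibaba.nacos.api.PropertyKeyConst;
    import com.alibaba.nacos.api.naming.NamingService;

    // Run this after the application has finished initializing.
    Properties setting8 = new Properties();
    String serverIp8 = "127.0.0.1:8848";
    setting8.put(PropertyKeyConst.SERVER_ADDR, serverIp8);
    setting8.put(PropertyKeyConst.USERNAME, "nacos");
    setting8.put(PropertyKeyConst.PASSWORD, "nacos");
    // Create the naming client from the Properties built above.
    NamingService inaming8 = NacosFactory.createNamingService(setting8);
    inaming8.registerInstance("springboot-provider", "192.168.31.94", 8083);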
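What matters is the ordering: registration must come last. As a rough sketch of that ordering, the class below runs every warm-up step first and only then invokes the registration callback. `DeferredRegistration` is a hypothetical helper, and the `register` callback stands in for a wrapper around the `registerInstance` call above (a real wrapper would also have to handle `NacosException`).

```java
import java.util.List;

final class DeferredRegistration {
    private final List<Runnable> warmUpTasks; // hypothetical: cache loading, connection pools, ...
    private final Runnable register;          // hypothetical: wraps the registerInstance(...) call
    private boolean registered;

    DeferredRegistration(List<Runnable> warmUpTasks, Runnable register) {
        this.warmUpTasks = List.copyOf(warmUpTasks);
        this.register = register;
    }

    /** Runs every warm-up task, then registers; does nothing once registered. */
    synchronized void start() {
        if (registered) {
            return;
        }
        for (Runnable task : warmUpTasks) {
            task.run(); // an exception here aborts start() before registration
        }
        register.run();
        registered = true;
    }

    synchronized boolean isRegistered() {
        return registered;
    }
}
```

If any warm-up step throws, the instance never appears in the service list, so consumers cannot route traffic to a half-initialized provider.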

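3.2 Graceful shutdown

For a normal shutdown, once Nacos Server receives the deregistration request from the Provider it notifies the subscribed Service Consumers, and each Consumer also refreshes its local service list every 1s. This is already very close to a graceful shutdown.

For an abnormal shutdown, Nacos Server relies on heartbeat detection to update the service list. The heartbeat period is 5s, and Nacos Server only marks the instance as unhealthy after 15s without a heartbeat.

3.2.1 Normal shutdown

For a normal shutdown, the most graceful approach is to send the deregistration to Nacos Server first and stop the service some time later (for example 5s). For instance, add an API endpoint and have a preStopHook call it before the service is taken down. Sample code for the endpoint: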

    @GetMapping(value = "/nacos/deregisterInstance")
    public String deregisterInstance() throws NacosException {
     // Connection settings for the Nacos Server that holds the registration.
     Properties prop = new Properties();
     prop.setProperty("serverAddr", "localhost");
     prop.put(PropertyKeyConst.NAMESPACE, "test");
     NacosNamingService client = new NacosNamingService(prop);
     // Remove this instance from the service list before the process stops.
     client.deregisterInstance("springboot-provider", "192.168.31.94", 8083);
     return "success";
    }

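When Ribbon is in use, its mechanism for refreshing the locally cached server list has to be taken into account as well: after deregistering manually, wait another 30s before stopping the service.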
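The whole sequence (deregister, wait for consumers to drop the address, then stop) can be sketched as follows. `GracefulShutdown` and `Sleeper` are hypothetical names. The `deregister` and `stopServer` callbacks stand in for whatever the application really does at those two steps, and the sleeper is injectable so the wait can be skipped in tests.

```java
final class GracefulShutdown {
    /** Abstraction over Thread.sleep so the wait can be replaced in tests. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /** Deregisters, waits drainMillis for consumers to refresh, then stops the server. */
    static void run(Runnable deregister, long drainMillis, Sleeper sleeper, Runnable stopServer) {
        try {
            deregister.run();
            sleeper.sleep(drainMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop right away
        } finally {
            stopServer.run(); // always stop, even if deregistration failed
        }
    }
}
```

In production the call could look like `GracefulShutdown.run(deregister, 30_000, Thread::sleep, stopServer)`, where 30s is the Ribbon refresh period discussed earlier.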

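3.2.2 Service failure

When a service fails, Nacos Server has to rely on heartbeats to tell whether the service is online. It only marks the instance as unhealthy after 15s without a heartbeat, and only removes it from the list after 30s without a heartbeat. These timings can be tuned: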

    # Heartbeat interval: 5s -> 1s
    spring.cloud.nacos.discovery.metadata.preserved.heart.beat.interval=1000
    # Heartbeat timeout: 15s -> 3s
    spring.cloud.nacos.discovery.metadata.preserved.heart.beat.timeout=3000
    # Instance removal timeout: 30s -> 5s
    spring.cloud.nacos.discovery.metadata.preserved.ip.delete.timeout=5000

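However, when a Service Provider fails, even with tuned settings it is very hard to keep the Service Consumer from noticing.

In extreme cases, a Provider may no longer be able to serve some of its functions yet still keep sending heartbeats to Nacos Server. In that case, the service's own health check can be used to tell Nacos Server to take the instance offline.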
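One possible shape for such a self-check is a monitor that counts consecutive probe failures and deregisters once a threshold is reached. This is a sketch under assumptions: `SelfHealthMonitor` is a hypothetical class, `probe` is whatever health check the service already has, and `deregister` stands in for a wrapper around the Nacos deregistration call.

```java
import java.util.function.BooleanSupplier;

final class SelfHealthMonitor {
    private final BooleanSupplier probe;   // hypothetical: the service's own health check
    private final int failureThreshold;
    private final Runnable deregister;     // hypothetical: wraps the deregistration call
    private int consecutiveFailures;
    private boolean deregistered;

    SelfHealthMonitor(BooleanSupplier probe, int failureThreshold, Runnable deregister) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be >= 1");
        }
        this.probe = probe;
        this.failureThreshold = failureThreshold;
        this.deregister = deregister;
    }

    /** Intended to be called periodically, for example from a scheduled task. */
    synchronized void checkOnce() {
        if (deregistered) {
            return;
        }
        boolean healthy;
        try {
            healthy = probe.getAsBoolean();
        } catch (RuntimeException e) {
            healthy = false; // a probe that blows up counts as a failure
        }
        if (healthy) {
            consecutiveFailures = 0;
            return;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            deregister.run();
            deregistered = true;
        }
    }

    synchronized boolean isDeregistered() {
        return deregistered;
    }
}
```

Requiring several consecutive failures keeps a single slow probe from pulling a healthy instance out of the service list.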

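4. Summary

Whichever registry you use, a graceful release comes down to graceful startup and graceful shutdown. This article has walked through graceful release with Nacos based on how Nacos works. I hope it helps.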
