Redis Connection Pool Management

Redis connection pool management and idle connection eviction

Posted by Link on May 18, 2021

Background

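It all started when a service deployed in production reported the Redis error JedisConnectionException: Unexpected end of stream. The service receives messages and records access statistics with the EHINCRBY command, and the error below showed up occasionally. The business had not officially launched yet, so the write QPS was very low (roughly one request every few minutes), and the Redis cluster was dedicated to this service, so a performance bottleneck was unlikely. That made the error puzzling.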

redis.clients.jedis.exceptions.JedisConnectionException: Unexpected end of stream.
    at redis.clients.jedis.util.RedisInputStream.ensureFill(RedisInputStream.java:202) ~[jedis-xmly-ext-3.1.1-SNAPSHOT.jar!/:na]
    at redis.clients.jedis.util.RedisInputStream.readByte(RedisInputStream.java:43) ~[jedis-xmly-ext-3.1.1-SNAPSHOT.jar!/:na]
    at redis.clients.jedis.Protocol.process(Protocol.java:154) ~[jedis-xmly-ext-3.1.1-SNAPSHOT.jar!/:na]
    at redis.clients.jedis.Protocol.read(Protocol.java:219) ~[jedis-xmly-ext-3.1.1-SNAPSHOT.jar!/:na]
    at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:309) ~[jedis-xmly-ext-3.1.1-SNAPSHOT.jar!/:na]
    at redis.clients.jedis.Connection.getIntegerReply(Connection.java:260) ~[jedis-xmly-ext-3.1.1-SNAPSHOT.jar!/:na]
    at redis.clients.jedis.Jedis.ehincrBy(Jedis.java:955) ~[jedis-xmly-ext-3.1.1-SNAPSHOT.jar!/:na]
    ...
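
For context, the top frame of the trace (RedisInputStream.ensureFill) is where the client reads the reply from the socket, and "end of stream" is what a reader sees when the peer has already closed the connection. The following is a minimal sketch of that situation using only plain Java sockets, not Jedis; the class and method names are made up for illustration, and the "server" is just a local ServerSocket that drops the connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class EndOfStreamSketch {

    // Opens a loopback connection, lets the "server" side close it,
    // then reads on the client side and returns what read() reports.
    public static int readAfterServerClose() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {

            // The server drops the connection, as it would for an idle client.
            accepted.close();

            // The client still holds its socket; a read now reports end of stream.
            InputStream in = client.getInputStream();
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // InputStream.read() returns -1 at end of stream.
        System.out.println("read() returned " + readAfterServerClose());
    }
}
```

A client that expects a reply and gets -1 instead has no data to parse, which matches the wording of the exception.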

Investigation

  1. Checking the server side

    I searched Google for the exception, and the pages I read all pointed to the following causes.

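    1. Buffer full
      1. The request volume was tiny, so the buffer was never full
    2. Unreasonable server timeout setting
      1. timeout was set to 75s
    3. Unstable network

    So I had the Redis server side checked. The server showed that every EHINCRBY command had completed normally, so I turned to the client connection instead.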
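  2. Checking the client connection

    This exception generally appears when a connection has failed. Given how few requests there were, I suspected that idle connections were being closed by the Redis server while the client kept using them. The relevant pool settings are the idle resource detection properties: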

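    1. maxIdle / minIdle: the maximum and minimum number of idle connections. If maxIdle is 0, no idle connections are kept; a connection is created for every request and destroyed afterwards.
    2. testWhileIdle: when true, idle connections are validated during idle detection.
    3. timeBetweenEvictionRunsMillis: the idle detection period in milliseconds. Default: -1, meaning no detection. Set a reasonable value so that the detection task runs periodically.
    4. softMinEvictableIdleTimeMillis: the soft minimum idle time in milliseconds. A connection is removed once it has been idle this long and the number of idle connections is greater than minIdle (soft eviction).
    5. minEvictableIdleTimeMillis: the minimum idle time of a resource in the pool, in milliseconds. Default: 30 minutes (1000L * 60L * 30L). Once this is reached the idle resource is removed. Set it according to your own workload.
    6. numTestsPerEvictionRun: the number of connections sampled per idle detection run. Default: 3. Tune it to the number of connections your application holds; -1 means every idle connection is checked.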

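    The root cause turned out to be that the project configured timeBetweenEvictionRunsMillis as -1 with maxIdle at 20. Idle connections were therefore allowed to exist but were never cleaned up, so the client ended up sending commands over connections the server had already closed. Changing the parameter to 30s, shorter than the 75s server timeout, fixed the problem.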
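
The fix can be sketched as a pool configuration. This is only a sketch, assuming the stock JedisPoolConfig from Jedis 3.x is on the classpath (the service in this post used an in-house build). The class name PoolConfigSketch and the testWhileIdle choice are illustrative assumptions; only maxIdle = 20 and the 30s interval come from the incident.

```java
import redis.clients.jedis.JedisPoolConfig;

public class PoolConfigSketch {

    // Builds a pool config whose evictor runs more often than the
    // server-side idle timeout (75s in this incident).
    public static JedisPoolConfig buildConfig() {
        JedisPoolConfig config = new JedisPoolConfig();

        // From the incident: up to 20 idle connections may sit in the pool.
        config.setMaxIdle(20);

        // From the incident: was -1 (evictor never started), now 30s.
        config.setTimeBetweenEvictionRunsMillis(30_000L);

        // Assumption, not stated in the post: validate idle connections
        // on each evictor run so broken ones are discarded.
        config.setTestWhileIdle(true);

        return config;
    }

    public static void main(String[] args) {
        JedisPoolConfig config = buildConfig();
        // The config would then be passed to new JedisPool(config, host, port).
        System.out.println("evictor period (ms): " + config.getTimeBetweenEvictionRunsMillis());
    }
}
```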

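Redis pool connection management

Prompted by this problem, I went through how the Redis pool manages its connections. Jedis builds its pool on GenericObjectPool from Apache Commons Pool, so the code below comes from that class.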

  1. Creating the connection pool

         public GenericObjectPool(final PooledObjectFactory<T> factory,
                 final GenericObjectPoolConfig<T> config) {
    
             super(config, ONAME_BASE, config.getJmxNamePrefix());
    
             if (factory == null) {
                 jmxUnregister(); // tidy up
                 throw new IllegalArgumentException("factory may not be null");
             }
             this.factory = factory;
    
             idleObjects = new LinkedBlockingDeque<>(config.getFairness());
    
             setConfig(config);
         }
    
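  2. Starting the evictor (our problem was that the Evictor was never started here)

    This method is called from setConfig -> setTimeBetweenEvictionRunsMillis in step 1, and uses the timeBetweenEvictionRunsMillis setting to create the evictor that checks for stale connections. With a value of -1 the delay > 0 branch is never taken, so no Evictor is ever scheduled.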

         final void startEvictor(final long delay) {
             synchronized (evictionLock) {
                 if (evictor == null) { // Starting evictor for the first time or after a cancel
                     if (delay > 0) {   // Starting new evictor
                         evictor = new Evictor();
                         EvictionTimer.schedule(evictor, delay, delay);
                     }
                 } else {  // Stop or restart of existing evictor
                     if (delay > 0) { // Restart
                         synchronized (EvictionTimer.class) { // Ensure no cancel can happen between cancel / schedule calls
                             EvictionTimer.cancel(evictor, evictorShutdownTimeoutMillis, TimeUnit.MILLISECONDS, true);
                             evictor = null;
                             evictionIterator = null;
                             evictor = new Evictor();
                             EvictionTimer.schedule(evictor, delay, delay);
                         }
                     } else { // Stopping evictor
                         EvictionTimer.cancel(evictor, evictorShutdownTimeoutMillis, TimeUnit.MILLISECONDS, false);
                     }
                 }
             }
         }
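
To make the gating on delay easier to see, here is a stripped-down sketch of the same idea built on a plain ScheduledExecutorService. It is not the Commons Pool implementation: the class EvictorGateSketch and its methods are hypothetical, and it leaves out the class-loader handling and shutdown timeouts of the real code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class EvictorGateSketch {
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> evictor; // null means no eviction task is scheduled

    // Same shape as startEvictor: cancel any existing task,
    // then schedule a new one only when the delay is positive.
    public synchronized void startEvictor(long delayMillis, Runnable evictTask) {
        if (evictor != null) {
            evictor.cancel(false);
            evictor = null;
        }
        if (delayMillis > 0) {
            evictor = timer.scheduleWithFixedDelay(
                    evictTask, delayMillis, delayMillis, TimeUnit.MILLISECONDS);
        }
    }

    public synchronized boolean isEvictorRunning() {
        return evictor != null;
    }

    public void shutdown() {
        timer.shutdownNow();
    }

    public static void main(String[] args) {
        EvictorGateSketch pool = new EvictorGateSketch();

        pool.startEvictor(-1, () -> { });      // the misconfiguration from this post
        System.out.println(pool.isEvictorRunning()); // prints false

        pool.startEvictor(30_000, () -> { });  // the fix: run every 30s
        System.out.println(pool.isEvictorRunning()); // prints true

        pool.shutdown();
    }
}
```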
    
  3. Periodic evictor invocation

    The run method of the Evictor:

             @Override
             public void run() {
                 final ClassLoader savedClassLoader =
                         Thread.currentThread().getContextClassLoader();
                 try {
                     if (factoryClassLoader != null) {
                         // Set the class loader for the factory
                         final ClassLoader cl = factoryClassLoader.get();
                         if (cl == null) {
                             // The pool has been dereferenced and the class loader
                             // GC'd. Cancel this timer so the pool can be GC'd as
                             // well.
                             cancel();
                             return;
                         }
                         Thread.currentThread().setContextClassLoader(cl);
                     }
    
                     // Evict from the pool
                     try {
                         evict();
                     } catch(final Exception e) {
                         swallowException(e);
                     } catch(final OutOfMemoryError oome) {
                         // Log problem but give evictor thread a chance to continue
                         // in case error is recoverable
                         oome.printStackTrace(System.err);
                     }
                     // Re-create idle instances.
                     try {
                         ensureMinIdle();
                     } catch (final Exception e) {
                         swallowException(e);
                     }
                 } finally {
                     // Restore the previous CCL
                     Thread.currentThread().setContextClassLoader(savedClassLoader);
                 }
             }
    
  4. The actual check

    evict() is implemented by GenericObjectPool and calls EvictionPolicy.evict

     public void evict() throws Exception {
    
         //...
         PooledObject<T> underTest = null;
         final EvictionPolicy<T> evictionPolicy = getEvictionPolicy();
    
         synchronized (evictionLock) {
             final EvictionConfig evictionConfig = new EvictionConfig(
                     getMinEvictableIdleTimeMillis(),
                     getSoftMinEvictableIdleTimeMillis(),
                     getMinIdle());
    
             final boolean testWhileIdle = getTestWhileIdle();
    
             for (int i = 0, m = getNumTests(); i < m; i++) {
                 if (evictionIterator == null || !evictionIterator.hasNext()) {
                     evictionIterator = new EvictionIterator(idleObjects);
                 }
                 if (!evictionIterator.hasNext()) {
                     // Pool exhausted, nothing to do here
                     return;
                 }
    
                 try {
                     underTest = evictionIterator.next();
                 } catch (final NoSuchElementException nsee) {
                     // Object was borrowed in another thread
                     // Don't count this as an eviction test so reduce i;
                     i--;
                     evictionIterator = null;
                     continue;
                 }
    
                 if (!underTest.startEvictionTest()) {
                     // Object was borrowed in another thread
                     // Don't count this as an eviction test so reduce i;
                     i--;
                     continue;
                 }
    
                 // User provided eviction policy could throw all sorts of
                 // crazy exceptions. Protect against such an exception
                 // killing the eviction thread.
                 boolean evict;
                 try {
                     evict = evictionPolicy.evict(evictionConfig, underTest,
                             idleObjects.size());
                 } catch (final Throwable t) {
                     // Slightly convoluted as SwallowedExceptionListener
                     // uses Exception rather than Throwable
                     PoolUtils.checkRethrow(t);
                     swallowException(new Exception(t));
                     // Don't evict on error conditions
                     evict = false;
                 }
                 //...
         }
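
The loop bound getNumTests() is where numTestsPerEvictionRun comes in. The sketch below shows how that value is commonly described as being derived: a non-negative setting is capped by the number of idle objects, and a negative setting -n checks roughly one n-th of them, so -1 checks all. The formula is written from memory of the Commons Pool source and should be treated as an assumption; the class and method names are hypothetical.

```java
public class NumTestsSketch {

    // How many idle objects one eviction run examines.
    public static int numTests(int numTestsPerEvictionRun, int idleCount) {
        if (numTestsPerEvictionRun >= 0) {
            // Never test more objects than are actually idle.
            return Math.min(numTestsPerEvictionRun, idleCount);
        }
        // Negative value -n: test ceil(idleCount / n) objects.
        return (int) Math.ceil(idleCount / Math.abs((double) numTestsPerEvictionRun));
    }

    public static void main(String[] args) {
        System.out.println(numTests(3, 20));  // prints 3
        System.out.println(numTests(3, 2));   // prints 2
        System.out.println(numTests(-1, 20)); // prints 20
        System.out.println(numTests(-2, 5));  // prints 3
    }
}
```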
    
  5. The eviction decision

    As the code shows, a connection is evicted when either of the following holds:

    1. idle time > IdleSoftEvictTime and idleCount > MinIdle
    2. idle time > IdleEvictTime

         @Override
         public boolean evict(final EvictionConfig config, final PooledObject<T> underTest,
                 final int idleCount) {
    
             if ((config.getIdleSoftEvictTime() < underTest.getIdleTimeMillis() &&
                     config.getMinIdle() < idleCount) ||
                     config.getIdleEvictTime() < underTest.getIdleTimeMillis()) {
                 return true;
             }
             return false;
         }
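
The two conditions can be tried out with concrete numbers. The sketch below restates the same comparison as a standalone method; the class name, parameter names, and sample values (a 60s soft limit, a 30 minute hard limit, minIdle of 2) are illustrative assumptions rather than settings from the service.

```java
public class EvictDecisionSketch {

    // Same comparison as the policy above, with plain parameters.
    public static boolean shouldEvict(long idleMillis, int idleCount,
                                      long idleEvictTime, long idleSoftEvictTime,
                                      int minIdle) {
        // Soft rule: idle long enough AND more idle objects than minIdle.
        boolean softExpired = idleSoftEvictTime < idleMillis && minIdle < idleCount;
        // Hard rule: idle longer than the hard limit, regardless of minIdle.
        boolean hardExpired = idleEvictTime < idleMillis;
        return softExpired || hardExpired;
    }

    public static void main(String[] args) {
        long hard = 30L * 60L * 1000L; // 30 minutes
        long soft = 60L * 1000L;       // 60 seconds
        int minIdle = 2;

        // Idle 90s with 5 idle connections: the soft rule applies.
        System.out.println(shouldEvict(90_000L, 5, hard, soft, minIdle));    // prints true
        // Idle 90s with only 2 idle connections: kept to preserve minIdle.
        System.out.println(shouldEvict(90_000L, 2, hard, soft, minIdle));    // prints false
        // Idle past 30 minutes: the hard rule applies even at minIdle.
        System.out.println(shouldEvict(2_000_000L, 1, hard, soft, minIdle)); // prints true
    }
}
```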
    

References

[Article](https://www.modb.pro/db/21488)