In some scenarios, we might observe issues related to cache-fs where reading or writing to cachefs is slower due to which uploads/downloads are taking time to process. You can follow the below debugging steps to troubleshoot this issue further.
1. Applying the below debug logger to /opt/jfrog/artifcatory/var/etc/artifactory/logback.xml file related to cachefs provider.
<appender name="storageprovider" class="ch.qos.logback.core.rolling.RollingFileAppender"> <File>${log.dir}/artifactory-cachefs.log</File> <rollingPolicy class="org.jfrog.common.logging.logback.rolling.FixedWindowWithDateRollingPolicy"> <FileNamePattern>${log.dir.archived}/artifactory-storageprovider.%i.log.gz</FileNamePattern> <maxIndex>10</maxIndex> </rollingPolicy> <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy"> <MaxFileSize>25MB</MaxFileSize> </triggeringPolicy> <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder"> <layout class="org.jfrog.common.logging.logback.layout.BackTracePatternLayout"> <pattern>%date{yyyy-MM-dd'T'HH:mm:ss.SSS, UTC}Z [jfrt ] [%-5p] [%-16X{uber-trace-id}] [%-30.30(%c{3}:%L)] [%-20.20thread] - %m%n</pattern> </layout> </encoder> </appender> <logger name="org.jfrog.storage.binstore.providers" additivity="false"> <level value="trace"/> <appender-ref ref="storageprovider"/> </logger>
NOTE: Debug loggers are resource intensive, once the issue is reproduced, please remove the debug logger.
The above debugger logger will help you in tracking the behavior of the cache-fs layer when a new upload/download is made to the cache-fs provider.
2. Collecting thread dump as explained here.
In some scenarios collecting multiple thread dumps will help you in further understanding if there are any blocked or long-running threads when reading from / writing to cache-fs.
Below is the example thread:
PRIORITY : 5 THREAD ID : 0X00007F4A687474D0 NATIVE ID : 0X261013 NATIVE ID (DECIMAL) : 2494483 STATE : BLOCKED stackTrace: java.lang.Thread.State: BLOCKED (on object monitor) at org.jfrog.storage.binstore.multi.ConcurrentReadWhileWriteStreamReader.advanceWriterIfNeeded(ConcurrentReadWhileWriteStreamReader.java:72) - waiting to lock <0x000000044fb522e0> (a org.jfrog.storage.binstore.providers.cachefs.ConcurrentReadWhileWriteStreamWriter) at org.jfrog.storage.binstore.multi.ConcurrentReadWhileWriteStreamReader.read(ConcurrentReadWhileWriteStreamReader.java:55) at java.io.BufferedInputStream.read1(java.base@17.0.6/BufferedInputStream.java:282) at java.io.BufferedInputStream.read(java.base@17.0.6/BufferedInputStream.java:343) - locked <0x000000044f2d6950> (a org.jfrog.storage.binstore.common.ReaderTrackingInputStream) at java.io.FilterInputStream.read(java.base@17.0.6/FilterInputStream.java:106) at org.artifactory.util.LazyStreamLoader.read(LazyStreamLoader.java:37) at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:81) at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1158) at org.apache.commons.io.IOUtils.copy(IOUtils.java:878) at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1135) at org.artifactory.api.request.ArtifactoryResponseBase.sendStream(ArtifactoryResponseBase.java:72) at org.artifactory.api.request.ArtifactoryResponseBase.sendStream(ArtifactoryResponseBase.java:61) at org.artifactory.request.RequestResponseHelper.sendStream(RequestResponseHelper.java:435) at org.artifactory.request.RequestResponseHelper.sendBodyResponse(RequestResponseHelper.java:157) at org.artifactory.engine.DownloadServiceImpl.sendStreamOrRedirect(DownloadServiceImpl.java:1000) at org.artifactory.engine.DownloadServiceImpl.respondFoundResource(DownloadServiceImpl.java:751) at org.artifactory.engine.DownloadServiceImpl.respond(DownloadServiceImpl.java:547) at org.artifactory.engine.DownloadServiceImpl.processDownload(DownloadServiceImpl.java:377) at org.artifactory.engine.DownloadServiceImpl$$Lambda$5816/0x0000000803444000.execute(Unknown Source) at org.artifactory.api.request.GracefulShutdownAware.track(GracefulShutdownAware.java:36) at org.artifactory.engine.DownloadServiceImpl.process(DownloadServiceImpl.java:300) at jdk.internal.reflect.GeneratedMethodAccessor692.invoke(Unknown Source) at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.6/DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(java.base@17.0.6/Method.java:568) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) at org.artifactory.request.aop.RequestAdvice.invoke(RequestAdvice.java:67) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:220) at jdk.proxy5.$Proxy452.process(jdk.proxy5/Unknown Source) at org.artifactory.webapp.servlet.RepoFilter.doDownload(RepoFilter.java:332) at org.artifactory.webapp.servlet.RepoFilter.execute(RepoFilter.java:182) at org.artifactory.webapp.servlet.RepoFilter.doFilter(RepoFilter.java:108) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:178) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:153) at org.artifactory.webapp.servlet.AccessFilter.useAuthentication(AccessFilter.java:627) at org.artifactory.webapp.servlet.AccessFilter.useAnonymousIfPossible(AccessFilter.java:586) at org.artifactory.webapp.servlet.AccessFilter.doFilterInternal(AccessFilter.java:314) at org.artifactory.webapp.servlet.AccessFilter.doFilter(AccessFilter.java:226) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:178) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:153) at org.artifactory.webapp.servlet.RequestFilter.lambda$doFilter$0(RequestFilter.java:89) at org.artifactory.webapp.servlet.RequestFilter$$Lambda$5304/0x0000000803333c40.run(Unknown Source) at org.artifactory.throttling.qrl.service.QueryRateLimiterServiceImpl.wrapRequestWithLowPriority(QueryRateLimiterServiceImpl.java:364) at org.artifactory.webapp.servlet.RequestFilter.doFilter(RequestFilter.java:89) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:178) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:153) at org.artifactory.webapp.servlet.ArtifactoryCsrfFilter.doFilter(ArtifactoryCsrfFilter.java:83) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:178) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:153) at org.springframework.session.web.http.SessionRepositoryFilter.doFilterInternal(SessionRepositoryFilter.java:164) at org.springframework.session.web.http.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:80) at org.artifactory.webapp.servlet.SessionFilter.doFilter(SessionFilter.java:67) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:178) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:153) at org.artifactory.webapp.servlet.ArtifactoryTracingFilter.doFilter(ArtifactoryTracingFilter.java:38) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:178) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:153) at org.artifactory.webapp.servlet.ArtifactoryFilter.doFilter(ArtifactoryFilter.java:126) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:178) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:153) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:481) at org.apache.catalina.valves.rewrite.RewriteValve.invoke(RewriteValve.java:289) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:130) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343) at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:390) at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63) at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:960) at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1791) at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52) at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191) at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(java.base@17.0.6/Thread.java:833) Locked ownable synchronizers: - <0x0000000209bb9048> (a org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker) - <0x000000044f2debe8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
In the above thread, there is a lock created when performing ConcurrentReadWhileWriteStreamWriter from the cache-fs which eventually leads to a thread moving to the blocked state and causing a system-down issue with the Artifactory. As a next step, we should check the following things,
a. Disk read/write speed to understand if there are any slowness:
Disk read/write speed refers to the rate at which data can be read from or written to a storage device, such as a hard disk drive (HDD), solid-state drive (SSD), or external storage device. It is typically measured in units of data per second, such as megabytes per second (MB/s) or gigabytes per second (GB/s).
Write speed:
dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct
Read speed:
dd if=testfile of=/dev/null bs=1G count=1 iflag=direct
b. Cache-fs size:
Check if the storage allocated for cache-fs is sufficient to store the artifacts and check if it is following the formulae of 10 percent of the binary size in Artifactory.
c. Disabling MultiRead functionality:
By default, when simultaneous requests for the same binary are made, one of the nodes will request the upstream filestore(e.g. S3) and the other nodes will wait to read the item from that node's cache on receipt. This causes the other node's threads to wait and increase CPU. We can prevent this behavior from occurring by disabling this behavior and allowing each node to request binaries directly from S3 and prevent additional waiting. To do so, you need to update the "cache-fs" provider in the binarystore.xml to disable the "multiReadEnabled" attribute:
<provider id="cache-fs" type="cache-fs"> <cacheProviderDir>/opt/jfrog-cache/artifactory</cacheProviderDir> <maxCacheSize>483183820800</maxCacheSize> <multiReadEnabled>false</multiReadEnabled> </provider>
NOTE: A restart is required for the changes to take effect.