Introduction
When retrieving PyPI packages using older versions of Artifactory, we may encounter error messages such as the following:
[ERROR] [598740fcb096b9da] [.r.PypiRemoteIndexProvider:205] [ttp-nio-8081-exec-59] - Could not retrieve remote index from https://pypi.org/simple/certifi/: More data read than expected: dataLength=16384; expectedLength=13365; includeSkipped=false; in.getClass()=class com.amazonaws.internal.ReleasableInputStream; markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; resetCount=0
When we perform HEAD requests, we send the Accept-Encoding: gzip header, so the response reports the content length of the gzip-encoded object rather than of the uncompressed data:
Received status code 200 and caught exception: More data read than expected: dataLength=524292; expectedLength=471451; includeSkipped=false; in.getClass()=class com.amazonaws.internal.ReleasableInputStream; markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; resetCount=0
When we upload to AWS, the AWS client expects the actual size of the object, not the gzip-compressed size. The AWS SDK compares the content length with the number of bytes actually read. Since pypi.org appears to serve gzip-encoded responses, you may encounter this bug if you are using an S3 storage configuration.
Overall, this occurs because the AmazonS3Client verifies the data size against the 'content-length' in the response from the upstream. If the actual uploaded file (once decompressed) is larger than that value, it throws the exception "More data read than expected". This bug was fixed in later versions of the AmazonS3Client, and the Artifactory code libraries were updated accordingly.
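The mismatch can be reproduced in isolation. The following is a minimal Python sketch, not the AWS SDK's actual code: LengthCheckedReader and upload are hypothetical names, and the sample body is made up. It only illustrates what happens when the declared length comes from the gzip-encoded response while the bytes being read are the decompressed data.

```python
import gzip
import io


class LengthCheckedReader:
    """Hypothetical stand-in for a client-side length check (not AWS SDK code)."""

    def __init__(self, stream, expected_length):
        self.stream = stream
        self.expected_length = expected_length
        self.data_length = 0

    def read(self, size=16384):
        chunk = self.stream.read(size)
        self.data_length += len(chunk)
        # Fail as soon as more bytes were read than the declared length.
        if self.data_length > self.expected_length:
            raise IOError(
                "More data read than expected: dataLength=%d; expectedLength=%d"
                % (self.data_length, self.expected_length)
            )
        return chunk


def upload(stream, expected_length):
    """Drain the stream through the length check and return the byte count."""
    reader = LengthCheckedReader(stream, expected_length)
    total = 0
    while True:
        chunk = reader.read()
        if not chunk:
            return total
        total += len(chunk)


# Made-up index page; repetitive content compresses well, like a real index.
body = b"<a href='certifi-1.0.tar.gz'>certifi-1.0.tar.gz</a>\n" * 500
compressed = gzip.compress(body)

try:
    # Declared length is the gzip size, but the decompressed bytes are read.
    upload(io.BytesIO(body), expected_length=len(compressed))
except IOError as err:
    print(err)
```

When the declared length matches the bytes actually read (for example, upload(io.BytesIO(body), len(body))), the same check passes.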
Resolution
The optimal solution is to upgrade Artifactory to the latest version. This will not only resolve the current issue but also ensure you are using a supported version, as older versions may have reached their End of Life.
As a workaround, if upgrading is not feasible at the moment, we can set the following property in the artifactory.system.properties file, located at:
$JFROG_HOME/artifactory/var/etc/artifactory/artifactory.system.properties
You will need to add the following parameter (artifactory.http.acceptEncoding.gzip) to the artifactory.system.properties file and set it to 'false':
## Send the Accept-Encoding:gzip header to remote repositories and handle gzip stream responses
artifactory.http.acceptEncoding.gzip=false
Following this change, we will need to perform a full restart of Artifactory for the new setting to take effect.
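Before restarting, it can be worth confirming that the property was saved as intended. The following Python sketch is only an illustration: gzip_accept_encoding_disabled and check_file are hypothetical helper names, and the parsing handles only simple key=value lines, not the full Java properties syntax.

```python
KEY = "artifactory.http.acceptEncoding.gzip"


def gzip_accept_encoding_disabled(properties_text):
    """Return True if the property is present and set to 'false'.

    Simplified parsing: only 'key=value' lines are handled, comment lines
    starting with '#' or '!' are skipped, and the last occurrence wins.
    """
    value = None
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        name, sep, val = line.partition("=")
        if sep and name.strip() == KEY:
            value = val.strip()
    return value is not None and value.lower() == "false"


def check_file(path):
    """Read a properties file from the given path and run the check."""
    with open(path, encoding="utf-8") as handle:
        return gzip_accept_encoding_disabled(handle.read())
```

For example, calling check_file with the path to artifactory.system.properties shown above (with $JFROG_HOME expanded) should return True once the workaround is in place.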