Best Practices for Managing Your Artifactory Database

How to monitor your Artifactory Database and improve performance

By Batel Tova and Eran Blumenthal

INTRODUCTION

One of the key underlying JFrog Artifactory features is checksum based storage, which enables unique artifact storage that optimizes many aspects of repository management. This is done by storing the required artifact metadata in a dedicated database, and mapping out to its physical storage (NAS/SAN/FileSystem/Blob Storage/etc.). This relationship entails the significance of the connection and communication between the two, and dictates the availability of resources for the database. For example, a stable and fast connection between Artifactory and the database is important in order to prevent issues such as slower response time for requests (GET, search, certain API calls, and even UI slowness), dropped connection, connection pool depletion, timeouts, and in some cases node synchronisation issues (since Artifactory High Availability nodes synchronize via the database).

This white paper provides a foundation for understanding and managing the database used by Artifactory, including setup, optimized configuration, and tuning for working at scale.

Each binary is located in the filestore (local disk by default) and identified by its checksum (sha1), with all the metadata (such as package metadata, artifact names, sizes, creation dates, repo locations, sha256 values, and signatures) saved in the database.

Learn more > about best practices for managing your Artifactory filestore

What is the Artifactory Database?

Let’s first take a look at the relationship between Artifactory and its database. As mentioned, Artifactory utilizes a checksum based storage. This implies that each binary is stored and renamed on the filesystem by its checksum, once and only once. In order to support this, Artifactory uses the database to map the required references between the logical representation of a file identified by its checksum to its location in a virtual file system.

Artifactory utilizing a checksum based storage

In addition to supporting checksum based storage, the Artifactory database contains additional information, including:

  • Propertieskey:value entries, part of the available artifact metadata
  • Security entities – users, groups, permission targets, etc
  • File checksum – md5, SHA1, SHA2
  • Archive indices – internal structure mapping
  • File stats – created/modified dates, size, download count, deployer, etc.

Storing this information in the database means that all Artifactory requests are translated into database queries. For example, a single REST API call to Artifactory can translate into six database queries; Artifactory uses the database for standard OLTP usage, short lived and very efficient transactions. This is true from upload/download requests to the most complex AQL queries. Artifactory also keeps application-related content in the database. This includes security, general configurations, and information such as storage info details.

> Learn More about the JFrog Platform resources.

Preliminary Considerations

The following list includes important considerations when working with and monitoring your Artifactory database.

  • Ensure sufficient resources for your database, including compute, storage and networking.
  • Establish backup and restore procedures in advance to protect your database. [1]
  • Plan regular monitoring and tuning in advance for your system
  • Establish troubleshooting and support availability
  • Create a disaster recovery (DR) plan

[1] When considering the infrastructure, one of the most common questions is whether it is recommended to install the Artifactory database on a dedicated server or share it with other applications. While this depends on your organization’s needs, it is important to take into consideration that there is continuous communication between Artifactory and the database which, in most cases, will result in high database utilization.

Database connections

Currently Artifactory uses JDBC queries. JDBC is a Java API that allows you to connect and execute the query with a database. The JDBC API uses JDBC drivers to connect with the database as in the following diagram.

Database Connection

Configurable parameters

Defaults

The default number of configured concurrent database connections is 100 for every service. This translates to 300 open database connections from Artifactory; 100 for the Artifactory application, 100 for the Access service and 100 for the database locking mechanism. The maximum number of idle database connections that Artifactory can hold is 10.

Parameters

The number of idle and database connections can be modified using the system.yaml (for Artifactory version 7.x) and db.properties file (for Artifactory version 6.x) using the following parameters:

pool.max.idle = <Number_Of_Idle_Connections>
pool.max.active = <Number_Of_Connections>

Timeout

The following exception is returned, and will be displayed in the artifactory.log, in case Artifactory fails to reserve a connection.

org.springframework.transaction.CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is org.apache.tomcat.jdbc.pool.PoolExhaustedException: [art-exec-672866] Timeout: Pool empty. Unable to fetch a connection in 120 seconds, none available[size:100; busy:100; idle:0; lastwait:120000].
Important Note: Make sure to monitor your wait time for available connections. A several second wait can result in higher response time and should be avoided.

 

Database performance factors

Estimating the number of concurrent database connections and size requires taking into consideration the daily usage of Artifactory.

This sections describes some of the common factors that will affect your database performance you’ll need to take into consideration, including:

  • AQL queries
  • Indexed archive entries
  • Database size
  • Total properties per artifact
  • IOPS value
  • Database resources
  • Tomcat configuration

Artifactory Query Language (AQL)

If you are running an enhanced AQL query for every build, and this query is running for almost 30 seconds, as AQL uses streaming, Artifactory will hold the database connection for 30 seconds for every build.

Indexed archived entries

In order to allow stored archive artifacts (zip, tar, tar.gz) to be searchable through class search and browsable from the UI, Artifactory maintains the indexed_archives_entries table, which represents an index of files that are contained within artifact files.

indexed_archives_entries

When using the archive indexing functionality, the indexed_archives_entries will become larger. Moreover, when a new archive file is deployed to Artifactory, its content is indexed and the table is updated. These archive files can be jar, war, zip, or other archive types, as defined in the ${ARTIFACTORY_HOME}/etc/mimetypes.xml file.

Entries are deleted from this table whenever an indexed artifact is deleted by the garbage collector, so this table should not hold any indexes of non-existing archives.

To disable archive indexing for a specific file type, edit the mimetypes.xml and change the value of the “index” attribute of the file type from “true” to “false”. Disabling future indexing of a specific mime-type will not delete existing indexes from the database.

This is a useful behaviour automatically enabled by Artifactory, but if it is not valuable for the customer use cases, it can be disabled.

Database size

The database server may be sized according to the following ratio: 1/100 of the filestore.
This is not a strict formula to follow, it is meant to give you an initial sizing. Make sure to monitor your database storage.

When sizing your database storage, keep in mind that during Artifactory upgrades, the database schema may be modified and temporary tables will be created, which will increase the database storage you’ll need.

Total properties per artifact

The database size can also be affected when Artifactory has many artifacts with a large number of properties. Today Artifactory keeps a 4,000 varchar (approximately 4k) for every property value. This means that if there are ten properties per artifact, Artifactory keeps 0.04 MB for every artifact in the database.

Obviously, JFrog encourages leveraging the properties’ capabilities as much as possible, since metadata is at the heart of DevOps, yet, extensive use of metadata might impose performance and usage impact on the database, thus this table should be monitored.

Total properties per artifact

IOPS value

The number of IOPS (Input/Output Operations Per Second) should also be taken into consideration when configuring the Artifactory database. For example, when using the AWS RDS. IOPS are popular performance metric used to distinguish between one storage type to another. As IOPS values increase, performance needs and costs rise. 

We received a number of use cases where the user reached the limit of the IOPS and experienced a 40% decrease in performance.

To increase the number of connections on the database side perform the following calculation:

Total number of connections = (number of nodes) * 3 * (pool.max.active) + 50

Database tuning

In addition to modifying database connections, there are other settings that need to be taken into consideration when it comes to database performance. For fine tuning examples, our MySQL documentation includes suggestions for setting up your my.cnf file to use the InnoDB engine

For PostgreSQL, you can find out more about tuning in the PostgreSQL wiki and also learn about how to increase your max_connections setting.

Tomcat configuration

Artifactory runs on Tomcat which uses the Apache HTTP client to handle HTTP requests. Tomcat controls a pool of threads and allocates a single thread for each incoming request. As the pool of threads is reduced, Tomcat will create more connections to handle additional requests that come in. If the number of requests exceeds the maxThreads value, requests will be queued. In case that the acceptCount limit is also reached, Tomcat will throw connection refused errors for any additional requests until the required resources are eventually released.

You can increase the values of these attributes in Tomcat’s server.xml file in the  Artifactory_Home/tomcat/conf file.

The default Tomcat maxThreads value is 200, while acceptCount is set to 100.

Recommended Database hardware 

Concurrent Connections   Max Connections Tomcat

maxThreads

Tomcat  acceptCount Storage Processor RAM (xmx)
1 – 20 100 (default settings) 200 100 At least 20% from the Artifactory storage  4 cores  8 GB
20 – 50 100 (default settings) 200 100 At least 20% from the Artifactory storage  4 Cores 16 GB 
50  – 100 200  200 100 At least 20% from the Artifactory storage  8 Cores 32 GB
100 – 200 400 400 200 At least 20% from the Artifactory storage  16 Cores 64 GB
200 – 500  800  800 400 At least 20% from the Artifactory storage  32 Cores 128 GB
500 + Please contact JFrog Support for a recommended setup.

 

FAQ

1. How many queries does Artifactory send to the database per user request?

Approximately 1 to 8 queries run against the database per request to the Artifactory server.

2. Does Artifactory support High Availability databases?

Artifactory supports a single URL for the Database connections as part of the database configuration file (db.properties file for Artifactory version 6.x and below).

In general, Artifactory is unaware of the clustered topology of the database setup.

Working with an active/passive database set up through a single database endpoint for high availability and resilience is supported. This is also the case for JFrog cloud solutions that works with various cloud solutions such as Amazon RDS or others (PostgreSQL, MSSql, MySql, etc.), using active/passive topology behind the scenes.

Working with an active-active database set-up is highly dependent on the internal database implementation and therefore currently not fully supported for all vendors and all solutions. However, alternative solutions such as Oracle RAC, mssql active-active clusters, and additional solutions are also successful.

It is recommended to first validate the requirement for the additional database nodes: 

For high availability, use a supported active-passive setup. For load balancing, setup your database in a way that will work for your needs and use cases.

In summary, an active-passive solution is recommended.

3. What is the recommended latency between Artifactory and the Database?

The network latency must be a few milliseconds at the most.

In an HA cluster, the nodes require a LAN quality network connectivity, a stable network with low latency and high throughput. This means that different availability zones within the same region are supported, but cross zone topology for an HA cluster is not.

The reason for this requirement is that JFrog Artifactory is chatty with the database. In a best case scenario, slow connection to the database is directly correlated with slower response time for requests (such as GET, search, certain API calls, and even UI slowness). In other cases, this can lead to dropped connections, connection pool depletion, timeouts, and in some scenarios can even lead to node synchronisation issues (nodes synchronize via the database). 

Following this, it recommended to have a cluster in every data center with replication between clusters instead of having an extended cluster in two data centers. 

4. Do we have any caching mechanism for the database?

Artifactory doesn’t cache results but every database has its own mechanism for caching, such as for PostgreSQL.

5. Is it recommended to use the bundled database or an externalized one?

Artifactory comes with an embedded Derby Database out-of-the-box which it is pre-configured to use, however, for better performance and to reuse existing infrastructures you may have, you can configure Artifactory to work with alternative supported databases.

In general, the bundled DerbyDB is a very good, fast and reliable database for supporting up to  ~700K artifacts. After this point we noticed performance degradation.

Pros and Cons for utilizing a bundled database:

Pros:

  • Speed: Derby is a very lean and fast database which is improved by sharing the same JVM as Artifactory (no “connection over HTTP”).
  • Management: Using the bundled database reduces maintaining additional products/components (aka external database).

Cons:

  • Resources: Since the database and Artifactory live in the same JVM, there is a risk of competition over resources. This is especially true as the number of artifacts in the system increases. Not as scalable as an external database.
  • Management: Although there is less management in a bundled database, maintenance and other operations are limited. For example, connection to the database is possible only when the Artifactory application is turned off.

6. Which database is recommended? Oracle, MS, MySQL, or other?

There is no official database recommendation. Choosing your database mainly depends on your organization requirements and needs.

7. Can I modify configuration during an upgrade of an HA cluster?

In an Artifactory HA setup, it is possible to continue the regular, day-to-day, operations during a system upgrade. But it is not possible to modify configuration settings.

This is because there are many changes that could be happening in the background during a system upgrade. For example, new tables/columns, configuration files are created or modified

To prevent conflicts and data corruption, until all Artifactory nodes are on the same version, no configuration changes can be submitted.

Notice that an Artifactory upgrade is a relatively short process, and should take minutes.

 

Release Fast Or Die