Troubleshooting Artifactory during startup
Video Transcript:
Hello everyone and welcome to another video. My name is Patrick and I'm with the JFrog support team. Today we will be going over how to troubleshoot a startup problem. Let's go ahead and get started.
We will be covering the startup sequence, then we will move on to how to troubleshoot systemd errors, followed by how to troubleshoot microservice start errors, and finally we will discuss where to go when you see the "all services started successfully" line in the logs.
First, let's take a look at what's supposed to happen during a normal bootup sequence. When an administrator or the system itself calls the start command, systemd runs the artifactoryManage.sh script. The script triggers all of the microservices in the JFrog platform to start. At that point they're not ready to serve requests just yet. JFrog Router, one of the core microservices, performs health checks and readiness checks on the other microservices. As they check in and fully boot, the router goes from a "not ready" state to a "ready" state. When all services have started successfully, the application is ready to work.
One of the key early stages of an application startup is the systemd process. If the systemd process is unable to start the application, it means there's probably a fundamental problem with the installation or the installation files. When the artifactoryManage.sh script is not available, for example, you will see an error and a hint to check the journalctl log.
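The "not ready" to "ready" transition can be pictured with a minimal Python sketch. It is only a model of the idea, not the Router's actual implementation; the service names and the `router_state` helper are hypothetical.

```python
from typing import Dict


def router_state(service_ready: Dict[str, bool]) -> str:
    """Return "ready" only when every tracked service reports ready."""
    if service_ready and all(service_ready.values()):
        return "ready"
    return "not ready"


# Hypothetical snapshot taken partway through a boot.
snapshot = {"access": True, "artifactory": False, "frontend": False}
print(router_state(snapshot))  # not ready

# Later, the remaining services have checked in.
snapshot.update(artifactory=True, frontend=True)
print(router_state(snapshot))  # ready
```

A single service that has not checked in is enough to keep the whole platform in the "not ready" state, which is why one broken microservice blocks the entire startup.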
To troubleshoot these kinds of problems, follow the hint and run the journalctl -xe command. You then need to parse the journal logs in order to determine where the problem is.
Let's see if we can solve this systemd issue. Over here I have a standard installation of JFrog Artifactory on Ubuntu. Let me try to call the systemctl start artifactory command. As you can see, it is currently not working correctly. It is encountering a problem. So, in order to troubleshoot this, I am going to run the suggested journalctl -xe command.
In the journalctl output, you can use the arrow keys to scroll around. See if you can find the Artifactory error line in the journal printout. Here we are: it says it failed to start Artifactory. It looks like it is unable to do so because the artifactoryManage.sh script is nowhere to be found. You can press Q to quit out of the journalctl printout.
In order to inspect a systemd service, you can call systemctl status artifactory, and you can see that the process has exited and failed. Importantly, you can inspect the artifactory.service service file by copying the path to it (printed in the status command) and then running a cat command on this path. This will show you all of the details of how the service is set up in systemd.
As you can see, the artifactoryManage.sh script has been installed and is being used by the ExecStart directive. Let's head over to /opt/jfrog/artifactory/app and take a look around. Oh look, there's an artifactoryManage.sh script one level above where it's supposed to be. Let's put it back.
Let's call systemctl stop artifactory. This will stop the service as far as systemd is concerned. This kind of clears out the buffers, if you will. And now let's try calling systemctl start artifactory. The issue is solved: no errors this time.
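If the journal is long, scrolling by hand gets tedious. Here is a small Python sketch that filters saved journal text for lines that mention the unit and look like failures. The keyword list and the sample lines are invented for illustration; real journal entries will be worded differently.

```python
from typing import List


def find_unit_failures(journal_text: str, unit: str = "artifactory") -> List[str]:
    """Return journal lines that mention the unit and contain a failure keyword."""
    keywords = ("failed", "error", "no such file")
    hits = []
    for line in journal_text.splitlines():
        lowered = line.lower()
        if unit in lowered and any(word in lowered for word in keywords):
            hits.append(line.strip())
    return hits


# Invented sample, shaped roughly like journal output.
sample = """\
Jan 01 10:00:00 host systemd[1]: Starting Artifactory service...
Jan 01 10:00:00 host systemd[1]: artifactory.service: Failed to locate executable: No such file or directory
Jan 01 10:00:00 host systemd[1]: Started Daily apt download activities.
"""

for hit in find_unit_failures(sample):
    print(hit)
```

You could feed it the text you get from redirecting the journalctl output to a file, which keeps the sketch free of any assumptions about how the command is invoked.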
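The same check can be scripted. This Python sketch pulls the executable path out of a unit file's ExecStart line and tests whether that file exists. The unit text below is a made-up example, and the `bin` subdirectory in the path is an assumption; use the contents of your own artifactory.service file.

```python
import os
import shlex
from typing import Optional


def exec_start_path(unit_text: str) -> Optional[str]:
    """Return the executable named by the first ExecStart= line, if any."""
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith("ExecStart="):
            # systemd allows special prefix characters before the path.
            command = line[len("ExecStart="):].lstrip("-@+!:")
            parts = shlex.split(command)
            return parts[0] if parts else None
    return None


def script_is_present(unit_text: str) -> bool:
    """True when the ExecStart executable exists on this machine."""
    path = exec_start_path(unit_text)
    return path is not None and os.path.isfile(path)


# Hypothetical unit file contents; the exact path is an assumption.
sample_unit = """\
[Service]
Type=forking
ExecStart=/opt/jfrog/artifactory/app/bin/artifactoryManage.sh start
"""

print(exec_start_path(sample_unit))
print(script_is_present(sample_unit))
```

If `script_is_present` returns False, systemd is being pointed at a file that is not there, which matches the failure seen in the demo.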
Let's head back to the presentation. Let's move on to discussing troubleshooting the microservices themselves.
During a bootup sequence, the JFrog Router will track the status of the different microservices. In this example right here, JFRT (short for Artifactory), JFE (short for front end), and JFOB (short for observability) have all failed to pass their readiness API check. This is expected—sometimes applications and microservices need a minute to boot.
In this example right here, the JFrog Access microservice is unable to check in with the Router. This is expected. It has tried six times before this seventh attempt, and it will try again. If too many of these readiness checks fail, the entire startup sequence will fail. If you find errors in the logs after the health checks have given up, you need to investigate the source of those errors.
After the script boots up all the microservices, they'll usually take between 25 and 100 seconds to fully boot up. If it takes longer than that, it's an indication that there could be a problem with one of the microservices.
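The retry behavior can be sketched as a bounded loop. This is a generic Python illustration of "try again until a limit is reached", not the Router's real code; the attempt limit and delay are hypothetical values.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    max_attempts: int,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call check() until it returns True; return the attempt number that passed.

    Raises TimeoutError when max_attempts checks have all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if attempt < max_attempts:
            sleep(delay_seconds)
    raise TimeoutError(f"service not ready after {max_attempts} attempts")


# Simulate a service that fails six checks and passes on the seventh.
results = iter([False] * 6 + [True])
attempt = wait_until_ready(lambda: next(results), max_attempts=10, sleep=lambda s: None)
print(attempt)  # 7
```

Early failed attempts are harmless as long as a later one passes; only running out of attempts turns into a startup failure.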
These microservices, in this order, are crucial to a proper startup of the JFrog platform. To begin with, we have JFrog Router. This application is responsible for coordinating the other microservices and connecting them successfully. If it fails to boot, none of the other microservices are going to be able to communicate.
JFrog Access is typically the second microservice to start in a proper startup sequence. It is responsible for the secure registration of the other microservices. When a microservice boots up, it gets a join key from the Access program in order to join the cluster and authenticate properly. You can look early on in the startup logs to see if the other microservices are unable to join the cluster because there's no join key—it's an indicator of a problem with the Access program.
JFrog Artifactory is the main event. It is behind the running of the repositories and other core functions of the JFrog platform. If it is failing to boot, you will see errors in the JFrog Router logs.
Finally, JFrog Frontend, which directly depends on JFrog Artifactory, is the last microservice to boot. When it starts, the frontend loads and the entire platform comes online. There are other microservices not mentioned here that depend on these four programs in order for the application to run successfully.
I would advise checking the console log for any errors related to these four core microservices. If there is an error, you should find it in this merged, aggregated log file.
Let's take a look and see if we can diagnose the startup issue. So I have successfully started Artifactory after the systemd error. Let's head over to the logs and see if the system is booting correctly. This will be over in /var/opt/jfrog/artifactory.
Over here we have a log directory containing all of the application's logs. Let's start with the console.log. I'm going to be doing a tail -f command to track the startup in real time. Let me go ahead and cancel that.
As you can see, there are some problems. There is a 404 "not found" page being reported by JFrog Router. If we take a look at some of the other entries, there have been 70 failed ping requests from the front end to Artifactory, and importantly we don't see any Artifactory events in the console log.
Let's go ahead and run a tail on the artifactory-service.log file to see what's going on. So this is not a good sign. At the very bottom is an error entry stating that Artifactory could not be initialized. Above that is the makings of a Java stack trace.
Instead of a tail command, let's switch over to the less text viewer. I'm going to use Shift+G to jump to the bottom and take a look at the error. There is a "Caused by" line, and I'm going to go ahead and scroll up here to the top. It states that it failed to initialize home, there was an exception, and it looks like there is a problem with the URL. It looks fairly straightforward here, and other errors may be more complicated than this example.
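In a Java stack trace, each "Caused by" line names the exception underneath the previous one, so the last "Caused by" is usually the root cause. A short Python sketch can pull it out of a saved trace. The trace below is invented for illustration, including the class names and messages.

```python
from typing import Optional

PREFIX = "Caused by: "


def root_cause(stack_trace: str) -> Optional[str]:
    """Return the text of the last 'Caused by:' line, or None if there is none."""
    causes = [
        line.strip()[len(PREFIX):]
        for line in stack_trace.splitlines()
        if line.strip().startswith(PREFIX)
    ]
    return causes[-1] if causes else None


# Invented trace; real Artifactory traces use different classes and messages.
sample_trace = """\
java.lang.IllegalStateException: Artifactory could not be initialized
    at com.example.Boot.start(Boot.java:10)
Caused by: java.lang.RuntimeException: Failed to initialize home
    at com.example.Home.init(Home.java:42)
Caused by: java.lang.IllegalArgumentException: Invalid database URL
    at com.example.Db.parse(Db.java:7)
"""

print(root_cause(sample_trace))
# java.lang.IllegalArgumentException: Invalid database URL
```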
Let's correct the error and see if the application starts. I'm going to edit the etc/system.yaml file, where the line complained about in the log is located. As reported in the error, there is a "your DB URL here" placeholder where a real database URL should be. I'm going to go ahead and get rid of it. I happen to have set up a Postgres on the localhost in this location.
I'm going to save and quit with :wq, and let's try restarting Artifactory.
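A leftover placeholder like this is easy to catch before a restart. The Python sketch below scans YAML text for `url:` keys whose value does not look like a JDBC URL. The `jdbc:` heuristic, the key layout, and the sample contents are all assumptions made for illustration; it is a plain line scan, not a YAML parser.

```python
from typing import List, Tuple


def find_suspect_urls(yaml_text: str) -> List[Tuple[int, str]]:
    """Return (line number, value) for url: entries that do not look like JDBC URLs."""
    suspects = []
    for number, raw in enumerate(yaml_text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("#"):
            continue  # skip comment lines
        key, sep, value = line.partition(":")
        if not sep or key.strip() != "url":
            continue
        value = value.strip().strip("'\"")
        # Assumed heuristic: a usable value starts with "jdbc:" and has no spaces.
        if not value.startswith("jdbc:") or " " in value:
            suspects.append((number, value))
    return suspects


# Hypothetical file contents with a placeholder left in place.
broken = """\
shared:
  database:
    type: postgresql
    url: your DB URL here
"""

print(find_suspect_urls(broken))  # [(4, 'your DB URL here')]
```

An empty list means nothing looked suspicious under this heuristic; it does not prove the database is reachable.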
This is an important sequence. If you suspect a startup problem, look in the console.log file first. It will contain the entries from all the microservices merged together. If the Router is complaining about a specific microservice, you can then check that microservice’s logs to diagnose the issue.
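That triage order can be sketched in Python: count ERROR entries per microservice in the merged log, then open the log of whichever service stands out. The bracketed `[service] [LEVEL]` layout and the sample lines are assumptions about the log format; adjust the pattern to match what your console.log actually contains.

```python
import re
from collections import Counter

# Assumed layout: "... [service] [LEVEL] ... - message", with optional padding.
LINE_PATTERN = re.compile(r"\[(?P<service>[a-z]+)\s*\]\s*\[(?P<level>[A-Z]+)\s*\]")


def errors_by_service(log_text: str) -> Counter:
    """Count ERROR-level lines per service label in merged log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_PATTERN.search(line)
        if match and match.group("level") == "ERROR":
            counts[match.group("service")] += 1
    return counts


# Invented sample lines; service labels and messages are illustrative only.
sample_log = """\
2024-01-01T10:00:00.000Z [jfrou] [ERROR] - request to artifactory returned 404
2024-01-01T10:00:01.000Z [jffe ] [WARN ] - ping to artifactory failed
2024-01-01T10:00:02.000Z [jfrou] [ERROR] - request to artifactory returned 404
"""

for service, count in errors_by_service(sample_log).most_common():
    print(service, count)
```

Warnings are deliberately left out of the count, since a few of them are normal while services are still booting.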
Just as before, I'm going to now run a tail -f on the log. This is looking a lot healthier. As you can see, there are these INFO-level events occurring across the board. This indicates that the startup sequence is going well.
We are also seeing events from the JFRT microservice shown in green. As explained earlier, you will see some warnings indicating that some of the microservices have not started yet. As long as they are not full-blown errors, it is nothing to worry about.
And there we are. This blurb right here indicates that all microservices within the JFrog platform have started passing their readiness checks. Thus, the platform’s UI has unlocked and the application is ready to work. The startup has succeeded.
As shown in the demo, we got an "all services started successfully" message. This means the JFrog platform has booted completely. You should be able to load the JFrog web page over on port 8082. If you're not able to load the web page, consider trying a curl command to localhost:8082 on the server itself. If you get back a success message from that test, the platform is up and the problem is somewhere in the network between you and the server. You might need to check things like the reverse proxy or any firewalls before you can load the JFrog platform.
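The reasoning behind the local test can be written as a small Python sketch. The `probe` helper does roughly what a plain curl request does, and `diagnose` encodes the decision. Both helper names and the result wording are hypothetical; the last branch is an inference rather than something stated in the video.

```python
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True when the URL answers with a non-error HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def diagnose(local_ok: bool, remote_ok: bool) -> str:
    """Map the two test results to the next troubleshooting step."""
    if local_ok and remote_ok:
        return "platform reachable"
    if local_ok:
        return "platform is up; check the reverse proxy, firewalls, or network path"
    return "platform is not answering locally; go back to the startup logs"


# Example: the request on the server works, but the browser page does not load.
print(diagnose(local_ok=True, remote_ok=False))
```

On the server you would call `probe("http://localhost:8082")` for the local result; the remote result comes from whatever address your users normally go through.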
That's everything I wanted to cover in this short video. Thank you so much for watching, and if you need further assistance, please look into the Help Center or leave a comment below. Thank you.